All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/9] drm/i915: Make user contexts bannable again!
@ 2019-02-17 16:11 Chris Wilson
  2019-02-17 16:11 ` [PATCH 2/9] drm/i915: Prevent user context creation while wedged Chris Wilson
                   ` (16 more replies)
  0 siblings, 17 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Mika Kuoppala, stable

Since moving the bannable boolean into the context flags, we lost the
default setting of contexts being bannable. Oops.

Sadly because we have multi-level banning scheme, our testcase for being
banned cannot distinguish between the expected ban on the context and
the applied banned via the fd.

Fixes: 6095868a271d ("drm/i915: Complete kerneldoc for struct i915_gem_context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.11+
---
 drivers/gpu/drm/i915/i915_gem_context.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 280813a4bf82..102866967998 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -364,6 +364,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	list_add_tail(&ctx->link, &dev_priv->contexts.list);
 	ctx->i915 = dev_priv;
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
+	ctx->user_flags = BIT(UCONTEXT_BANNABLE);
 
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
 		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/9] drm/i915: Prevent user context creation while wedged
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
@ 2019-02-17 16:11 ` Chris Wilson
  2019-02-17 16:11 ` [PATCH 3/9] drm/i915: Optionally disable automatic recovery after a GPU reset Chris Wilson
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx

Introduce a new ABI method for detecting a wedged driver by reporting
-EIO from DRM_IOCTL_I915_GEM_CONTEXT_CREATE.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 102866967998..f6c48274fafb 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -797,18 +797,23 @@ static bool client_is_banned(struct drm_i915_file_private *file_priv)
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_private *i915 = to_i915(dev);
 	struct drm_i915_gem_context_create *args = data;
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
 	int ret;
 
-	if (!DRIVER_CAPS(dev_priv)->has_logical_contexts)
+	if (!DRIVER_CAPS(i915)->has_logical_contexts)
 		return -ENODEV;
 
 	if (args->pad != 0)
 		return -EINVAL;
 
+	if (i915_terminally_wedged(&i915->gpu_error)) {
+		DRM_DEBUG("driver is wedged; banning new ctx!\n");
+		return -EIO;
+	}
+
 	if (client_is_banned(file_priv)) {
 		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
 			  current->comm,
@@ -821,7 +826,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	ctx = i915_gem_create_context(dev_priv, file_priv);
+	ctx = i915_gem_create_context(i915, file_priv);
 	mutex_unlock(&dev->struct_mutex);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/9] drm/i915: Optionally disable automatic recovery after a GPU reset
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
  2019-02-17 16:11 ` [PATCH 2/9] drm/i915: Prevent user context creation while wedged Chris Wilson
@ 2019-02-17 16:11 ` Chris Wilson
  2019-02-17 17:05   ` [PATCH] " Chris Wilson
  2019-02-17 16:11 ` [PATCH 4/9] drm/i915/selftests: Move local mock_ggtt allocations to the heap Chris Wilson
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Kenneth Graunke, Mika Kuoppala

Some clients, such as mesa, may only emit minimal incremental batches
that rely on the logical context state from previous batches. They know
that recovery is impossible after a hang as their required GPU state is
lost, and that each in flight and subsequent batch will hang (resetting
the context image back to default perpetuating the problem).

To avoid getting into the state in the first place, we can allow clients
to opt out of automatic recovery and elect to ban any guilty context
following a hang. This prevents the continual stream of hangs and allows
the client to recreate their context and rebuild the state from scratch.

v2: Prefer calling it recoverable rather than unrecoverable.

References: https://lists.freedesktop.org/archives/mesa-dev/2019-February/215431.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> # for mesa
---
 drivers/gpu/drm/i915/i915_gem_context.c | 14 +++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_reset.c       |  3 ++-
 include/uapi/drm/i915_drm.h             | 20 ++++++++++++++++++++
 4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f6c48274fafb..b6564f3faf85 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -364,7 +364,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	list_add_tail(&ctx->link, &dev_priv->contexts.list);
 	ctx->i915 = dev_priv;
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
-	ctx->user_flags = BIT(UCONTEXT_BANNABLE);
+	ctx->user_flags = BIT(UCONTEXT_BANNABLE) | BIT(UCONTEXT_RECOVERABLE);
 
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
 		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
@@ -957,6 +957,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		args->size = 0;
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_RECOVERABLE:
+		args->value = i915_gem_context_is_recoverable(ctx);
+		break;
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->size = 0;
 		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
@@ -1291,6 +1294,15 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 			i915_gem_context_clear_bannable(ctx);
 		break;
 
+	case I915_CONTEXT_PARAM_RECOVERABLE:
+		if (args->size)
+			ret = -EINVAL;
+		else if (args->value)
+			i915_gem_context_set_recoverable(ctx);
+		else
+			i915_gem_context_clear_recoverable(ctx);
+		break;
+
 	case I915_CONTEXT_PARAM_PRIORITY:
 		{
 			s64 priority = args->value;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index ca150a764c24..071108d34ae0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -134,6 +134,7 @@ struct i915_gem_context {
 #define UCONTEXT_NO_ZEROMAP		0
 #define UCONTEXT_NO_ERROR_CAPTURE	1
 #define UCONTEXT_BANNABLE		2
+#define UCONTEXT_RECOVERABLE		3
 
 	/**
 	 * @flags: small set of booleans
@@ -270,6 +271,21 @@ static inline void i915_gem_context_clear_bannable(struct i915_gem_context *ctx)
 	clear_bit(UCONTEXT_BANNABLE, &ctx->user_flags);
 }
 
+static inline bool i915_gem_context_is_recoverable(const struct i915_gem_context *ctx)
+{
+	return test_bit(UCONTEXT_RECOVERABLE, &ctx->user_flags);
+}
+
+static inline void i915_gem_context_set_recoverable(struct i915_gem_context *ctx)
+{
+	set_bit(UCONTEXT_RECOVERABLE, &ctx->user_flags);
+}
+
+static inline void i915_gem_context_clear_recoverable(struct i915_gem_context *ctx)
+{
+	clear_bit(UCONTEXT_RECOVERABLE, &ctx->user_flags);
+}
+
 static inline bool i915_gem_context_is_banned(const struct i915_gem_context *ctx)
 {
 	return test_bit(CONTEXT_BANNED, &ctx->flags);
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 5a067a4b3d5d..1911e00d2581 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -66,7 +66,8 @@ static bool context_mark_guilty(struct i915_gem_context *ctx)
 
 	bannable = i915_gem_context_is_bannable(ctx);
 	score = atomic_add_return(CONTEXT_SCORE_GUILTY, &ctx->ban_score);
-	banned = score >= CONTEXT_SCORE_BAN_THRESHOLD;
+	banned = (!i915_gem_context_is_recoverable(ctx) ||
+		  score >= CONTEXT_SCORE_BAN_THRESHOLD);
 
 	/* Cool contexts don't accumulate client ban score */
 	if (!bannable)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 397810fa2d33..c890b7992d5c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1491,6 +1491,26 @@ struct drm_i915_gem_context_param {
 	 * drm_i915_gem_context_param_sseu.
 	 */
 #define I915_CONTEXT_PARAM_SSEU		0x7
+
+/*
+ * Not all clients may want to attempt automatic recover of a context after
+ * a hang (for example, some clients may only submit very small incremental
+ * batches relying on known logical state of previous batches which will never
+ * recover correctly and each attempt will hang), and so would prefer that
+ * the context is forever banned instead.
+ *
+ * If set to false (0), after a reset, subsequent (and in flight) rendering
+ * from this context is discarded, and the client will need to create a new
+ * context to use instead.
+ *
+ * If set to true (1), the kernel will automatically attempt to recover the
+ * context by skipping the hanging batch and executing the next batch starting
+ * from the default context state (discarding the incomplete logical context
+ * state lost due to the reset).
+ *
+ * On creation, all new contexts are marked as recoverable.
+ */
+#define I915_CONTEXT_PARAM_RECOVERABLE	0x8
 	__u64 value;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/9] drm/i915/selftests: Move local mock_ggtt allocations to the heap
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
  2019-02-17 16:11 ` [PATCH 2/9] drm/i915: Prevent user context creation while wedged Chris Wilson
  2019-02-17 16:11 ` [PATCH 3/9] drm/i915: Optionally disable automatic recovery after a GPU reset Chris Wilson
@ 2019-02-17 16:11 ` Chris Wilson
  2019-02-17 18:40   ` Matthew Auld
  2019-02-17 16:11 ` [PATCH 5/9] drm/i915: Move verify_wm_state() to heap Chris Wilson
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

This struct appears quite large and pushes our stack frame over
1024 bytes -- too high for conservative setups. So move the mock_ggtt
struct to the heap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++++++++++-----
 drivers/gpu/drm/i915/selftests/i915_vma.c     | 15 ++++++++++-----
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 3850ef4a5ec8..73aafcb4c673 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1681,25 +1681,30 @@ int i915_gem_gtt_mock_selftests(void)
 		SUBTEST(igt_gtt_insert),
 	};
 	struct drm_i915_private *i915;
-	struct i915_ggtt ggtt;
+	struct i915_ggtt *ggtt;
 	int err;
 
 	i915 = mock_gem_device();
 	if (!i915)
 		return -ENOMEM;
 
-	mock_init_ggtt(i915, &ggtt);
+	ggtt = kmalloc(sizeof(*ggtt), GFP_KERNEL);
+	if (!ggtt) {
+		err = -ENOMEM;
+		goto out_put;
+	}
+	mock_init_ggtt(i915, ggtt);
 
 	mutex_lock(&i915->drm.struct_mutex);
-	err = i915_subtests(tests, &ggtt);
+	err = i915_subtests(tests, ggtt);
 	mock_device_flush(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	i915_gem_drain_freed_objects(i915);
 
-	mock_fini_ggtt(&ggtt);
+	mock_fini_ggtt(ggtt);
+out_put:
 	drm_dev_put(&i915->drm);
-
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index cf1de82741fa..4a1c8575e425 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -725,24 +725,29 @@ int i915_vma_mock_selftests(void)
 		SUBTEST(igt_vma_partial),
 	};
 	struct drm_i915_private *i915;
-	struct i915_ggtt ggtt;
+	struct i915_ggtt *ggtt;
 	int err;
 
 	i915 = mock_gem_device();
 	if (!i915)
 		return -ENOMEM;
 
-	mock_init_ggtt(i915, &ggtt);
+	ggtt = kmalloc(sizeof(*ggtt), GFP_KERNEL);
+	if (!ggtt) {
+		err = -ENOMEM;
+		goto out_put;
+	}
+	mock_init_ggtt(i915, ggtt);
 
 	mutex_lock(&i915->drm.struct_mutex);
-	err = i915_subtests(tests, &ggtt);
+	err = i915_subtests(tests, ggtt);
 	mock_device_flush(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	i915_gem_drain_freed_objects(i915);
 
-	mock_fini_ggtt(&ggtt);
+	mock_fini_ggtt(ggtt);
+out_put:
 	drm_dev_put(&i915->drm);
-
 	return err;
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/9] drm/i915: Move verify_wm_state() to heap
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (2 preceding siblings ...)
  2019-02-17 16:11 ` [PATCH 4/9] drm/i915/selftests: Move local mock_ggtt allocations to the heap Chris Wilson
@ 2019-02-17 16:11 ` Chris Wilson
  2019-02-17 16:11 ` [PATCH 6/9] drm/i915: Avoid reset lock in writing fence registers Chris Wilson
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx

The stack usage exceeded 1024 bytes prompting warnings on conservative
setups, so move the temporary allocation for HW readback onto the heap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 48 ++++++++++++++++++----------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index afa21daaae51..e10cab1c8c65 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12230,12 +12230,15 @@ static void verify_wm_state(struct drm_crtc *crtc,
 			    struct drm_crtc_state *new_state)
 {
 	struct drm_i915_private *dev_priv = to_i915(crtc->dev);
-	struct skl_ddb_allocation hw_ddb, *sw_ddb;
-	struct skl_pipe_wm hw_wm, *sw_wm;
-	struct skl_plane_wm *hw_plane_wm, *sw_plane_wm;
+	struct skl_hw_state {
+		struct skl_ddb_entry ddb_y[I915_MAX_PLANES];
+		struct skl_ddb_entry ddb_uv[I915_MAX_PLANES];
+		struct skl_ddb_allocation ddb;
+		struct skl_pipe_wm wm;
+	} *hw;
+	struct skl_ddb_allocation *sw_ddb;
+	struct skl_pipe_wm *sw_wm;
 	struct skl_ddb_entry *hw_ddb_entry, *sw_ddb_entry;
-	struct skl_ddb_entry hw_ddb_y[I915_MAX_PLANES];
-	struct skl_ddb_entry hw_ddb_uv[I915_MAX_PLANES];
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	const enum pipe pipe = intel_crtc->pipe;
 	int plane, level, max_level = ilk_wm_max_level(dev_priv);
@@ -12243,22 +12246,29 @@ static void verify_wm_state(struct drm_crtc *crtc,
 	if (INTEL_GEN(dev_priv) < 9 || !new_state->active)
 		return;
 
-	skl_pipe_wm_get_hw_state(intel_crtc, &hw_wm);
+	hw = kzalloc(sizeof(*hw), GFP_KERNEL);
+	if (!hw)
+		return;
+
+	skl_pipe_wm_get_hw_state(intel_crtc, &hw->wm);
 	sw_wm = &to_intel_crtc_state(new_state)->wm.skl.optimal;
 
-	skl_pipe_ddb_get_hw_state(intel_crtc, hw_ddb_y, hw_ddb_uv);
+	skl_pipe_ddb_get_hw_state(intel_crtc, hw->ddb_y, hw->ddb_uv);
 
-	skl_ddb_get_hw_state(dev_priv, &hw_ddb);
+	skl_ddb_get_hw_state(dev_priv, &hw->ddb);
 	sw_ddb = &dev_priv->wm.skl_hw.ddb;
 
-	if (INTEL_GEN(dev_priv) >= 11)
-		if (hw_ddb.enabled_slices != sw_ddb->enabled_slices)
-			DRM_ERROR("mismatch in DBUF Slices (expected %u, got %u)\n",
-				  sw_ddb->enabled_slices,
-				  hw_ddb.enabled_slices);
+	if (INTEL_GEN(dev_priv) >= 11 &&
+	    hw->ddb.enabled_slices != sw_ddb->enabled_slices)
+		DRM_ERROR("mismatch in DBUF Slices (expected %u, got %u)\n",
+			  sw_ddb->enabled_slices,
+			  hw->ddb.enabled_slices);
+
 	/* planes */
 	for_each_universal_plane(dev_priv, pipe, plane) {
-		hw_plane_wm = &hw_wm.planes[plane];
+		struct skl_plane_wm *hw_plane_wm, *sw_plane_wm;
+
+		hw_plane_wm = &hw->wm.planes[plane];
 		sw_plane_wm = &sw_wm->planes[plane];
 
 		/* Watermarks */
@@ -12290,7 +12300,7 @@ static void verify_wm_state(struct drm_crtc *crtc,
 		}
 
 		/* DDB */
-		hw_ddb_entry = &hw_ddb_y[plane];
+		hw_ddb_entry = &hw->ddb_y[plane];
 		sw_ddb_entry = &to_intel_crtc_state(new_state)->wm.skl.plane_ddb_y[plane];
 
 		if (!skl_ddb_entry_equal(hw_ddb_entry, sw_ddb_entry)) {
@@ -12308,7 +12318,9 @@ static void verify_wm_state(struct drm_crtc *crtc,
 	 * once the plane becomes visible, we can skip this check
 	 */
 	if (1) {
-		hw_plane_wm = &hw_wm.planes[PLANE_CURSOR];
+		struct skl_plane_wm *hw_plane_wm, *sw_plane_wm;
+
+		hw_plane_wm = &hw->wm.planes[PLANE_CURSOR];
 		sw_plane_wm = &sw_wm->planes[PLANE_CURSOR];
 
 		/* Watermarks */
@@ -12340,7 +12352,7 @@ static void verify_wm_state(struct drm_crtc *crtc,
 		}
 
 		/* DDB */
-		hw_ddb_entry = &hw_ddb_y[PLANE_CURSOR];
+		hw_ddb_entry = &hw->ddb_y[PLANE_CURSOR];
 		sw_ddb_entry = &to_intel_crtc_state(new_state)->wm.skl.plane_ddb_y[PLANE_CURSOR];
 
 		if (!skl_ddb_entry_equal(hw_ddb_entry, sw_ddb_entry)) {
@@ -12350,6 +12362,8 @@ static void verify_wm_state(struct drm_crtc *crtc,
 				  hw_ddb_entry->start, hw_ddb_entry->end);
 		}
 	}
+
+	kfree(hw);
 }
 
 static void
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/9] drm/i915: Avoid reset lock in writing fence registers
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (3 preceding siblings ...)
  2019-02-17 16:11 ` [PATCH 5/9] drm/i915: Move verify_wm_state() to heap Chris Wilson
@ 2019-02-17 16:11 ` Chris Wilson
  2019-02-17 16:11 ` [PATCH 7/9] drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault() Chris Wilson
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx

The idea of taking the reset lock around writing the fence register was
to serialise the mmio write we also perform during the reset where those
registers get clobbered. However, the lock is overkill as write tearing
between reset and fence_update() is harmless; the final value of the
fence register is the same. A race between revoke_fences() and
fence_update() is also harmless at this point as on the fault path where
this is necessary, we acquire the reset lock to coordinate ourselves in
the upper layer.

The danger of acquiring the reset lock again in fence_update() is that
we may recurse from the shrinker along the i915_gem_fault() path.

<4> [125.739646] ============================================
<4> [125.739652] WARNING: possible recursive locking detected
<4> [125.739659] 5.0.0-rc6-ga6e4cbf00557-drmtip_223+ #1 Tainted: G     U
<4> [125.739666] --------------------------------------------
<4> [125.739672] gem_mmap_gtt/1017 is trying to acquire lock:
<4> [125.739679] 00000000a730190a (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x0/0x310 [i915]
<4> [125.739848]
but task is already holding lock:
<4> [125.739854] 00000000a730190a (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x192/0x310 [i915]
<4> [125.739918]
other info that might help us debug this:
<4> [125.739925]  Possible unsafe locking scenario:

<4> [125.739930]        CPU0
<4> [125.739934]        ----
<4> [125.739937]   lock(&dev_priv->gpu_error.reset_backoff_srcu);
<4> [125.739944]   lock(&dev_priv->gpu_error.reset_backoff_srcu);
<4> [125.739950]
 *** DEADLOCK ***

<4> [125.739958]  May be due to missing lock nesting notation

<4> [125.739966] 5 locks held by gem_mmap_gtt/1017:
<4> [125.739972]  #0: 00000000471f682c (&mm->mmap_sem){++++}, at: __do_page_fault+0x133/0x500
<4> [125.739987]  #1: 0000000026542685 (&dev->struct_mutex){+.+.}, at: i915_gem_fault+0x1f6/0x860 [i915]
<4> [125.740061]  #2: 00000000a730190a (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x192/0x310 [i915]
<4> [125.740126]  #3: 00000000c828eb4f (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.25+0x0/0x30
<4> [125.740140]  #4: 000000002d360d65 (shrinker_rwsem){++++}, at: shrink_slab+0x1cb/0x2c0
<4> [125.740151]
stack backtrace:
<4> [125.740159] CPU: 1 PID: 1017 Comm: gem_mmap_gtt Tainted: G     U            5.0.0-rc6-ga6e4cbf00557-drmtip_223+ #1
<4> [125.740170] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
<4> [125.740180] Call Trace:
<4> [125.740189]  dump_stack+0x67/0x9b
<4> [125.740199]  __lock_acquire+0xc75/0x1b00
<4> [125.740209]  ? arch_tlb_finish_mmu+0x2a/0xa0
<4> [125.740216]  ? tlb_finish_mmu+0x1a/0x30
<4> [125.740222]  ? zap_page_range_single+0xe2/0x130
<4> [125.740230]  ? lock_acquire+0xa6/0x1c0
<4> [125.740237]  lock_acquire+0xa6/0x1c0
<4> [125.740296]  ? i915_clear_error_registers+0x280/0x280 [i915]
<4> [125.740357]  i915_reset_trylock+0x44/0x310 [i915]
<4> [125.740417]  ? i915_clear_error_registers+0x280/0x280 [i915]
<4> [125.740426]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4> [125.740434]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [125.740499]  fence_update+0x218/0x470 [i915]
<4> [125.740571]  i915_vma_unbind+0xa6/0x550 [i915]
<4> [125.740640]  i915_gem_object_unbind+0xfa/0x190 [i915]
<4> [125.740711]  i915_gem_shrink+0x2dc/0x590 [i915]
<4> [125.740722]  ? ___preempt_schedule+0x16/0x18
<4> [125.740792]  ? i915_gem_shrinker_scan+0xc9/0x130 [i915]
<4> [125.740861]  i915_gem_shrinker_scan+0xc9/0x130 [i915]
<4> [125.740870]  do_shrink_slab+0x143/0x3f0
<4> [125.740878]  shrink_slab+0x228/0x2c0
<4> [125.740886]  shrink_node+0x167/0x450
<4> [125.740894]  do_try_to_free_pages+0xc4/0x340
<4> [125.740902]  try_to_free_pages+0xdc/0x2e0
<4> [125.740911]  __alloc_pages_nodemask+0x662/0x1110
<4> [125.740921]  ? reacquire_held_locks+0xb5/0x1b0
<4> [125.740928]  ? reacquire_held_locks+0xb5/0x1b0
<4> [125.740986]  ? i915_reset_trylock+0x192/0x310 [i915]
<4> [125.741045]  ? i915_memcpy_init_early+0x30/0x30 [i915]
<4> [125.741054]  pte_alloc_one+0x12/0x70
<4> [125.741060]  __pte_alloc+0x11/0xf0
<4> [125.741067]  apply_to_page_range+0x37e/0x440
<4> [125.741127]  remap_io_mapping+0x6c/0x100 [i915]
<4> [125.741196]  i915_gem_fault+0x5a9/0x860 [i915]
<4> [125.741204]  ? ptlock_alloc+0x15/0x30
<4> [125.741212]  __do_fault+0x2c/0xb0
<4> [125.741218]  __handle_mm_fault+0x8ee/0xfa0
<4> [125.741227]  handle_mm_fault+0x196/0x3a0
<4> [125.741235]  __do_page_fault+0x246/0x500
<4> [125.741243]  ? page_fault+0x8/0x30
<4> [125.741250]  page_fault+0x1e/0x30
<4> [125.741256] RIP: 0033:0x55d0cc456e12
<4> [125.741264] Code: b0 df ff ff 89 c2 8b 85 70 df ff ff 01 c2 8b 85 70 df ff ff 48 98 48 8d 0c 85 00 00 00 00 48 8b 85 e0 df ff ff 48 01 c8 f7 d2 <89> 10 83 85 70 df ff ff 01 81 bd 70 df ff ff ff 03 00 00 7e be 48
<4> [125.741280] RSP: 002b:00007ffc1bab7ab0 EFLAGS: 00010206
<4> [125.741287] RAX: 00007fc787cb6000 RBX: 0000000000000000 RCX: 0000000000000000
<4> [125.741295] RDX: 00000000ffffffff RSI: 0000000000005401 RDI: 0000000000000002
<4> [125.741303] RBP: 00007ffc1bab9b70 R08: 00007ffc1bab7920 R09: 000000000000001b
<4> [125.741310] R10: 7165722074736554 R11: 0000000000000246 R12: 000055d0cc454a80
<4> [125.741318] R13: 00007ffc1bab9f60 R14: 0000000000000000 R15: 0000000000000000

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_fence_reg.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index 1ec1417cf8b4..65624b8e4d15 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -270,24 +270,16 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 		return 0;
 	}
 
-	ret = i915_reset_trylock(fence->i915);
-	if (ret < 0)
-		goto out_rpm;
-
+	WRITE_ONCE(fence->vma, vma);
 	fence_write(fence, vma);
-	fence->vma = vma;
 
 	if (vma) {
 		vma->fence = fence;
 		list_move_tail(&fence->link, &fence->i915->mm.fence_list);
 	}
 
-	i915_reset_unlock(fence->i915, ret);
-	ret = 0;
-
-out_rpm:
 	intel_runtime_pm_put(fence->i915, wakeref);
-	return ret;
+	return 0;
 }
 
 /**
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/9] drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (4 preceding siblings ...)
  2019-02-17 16:11 ` [PATCH 6/9] drm/i915: Avoid reset lock in writing fence registers Chris Wilson
@ 2019-02-17 16:11 ` Chris Wilson
  2019-02-17 16:11 ` [PATCH 8/9] drm/i915: Trim i915_do_reset() to minimum delays Chris Wilson
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Annoyingly, struct_mutex was not entirely eliminated from the reset
pathway; for reasons of its own, intel_display_resumes() requires
struct_mutex to prepare the planes it already captured. To avoid the
immediate problem of a deadlock between the struct_mutex and the reset
srcu, we have to acquire the reset_lock before struct_mutex in
i915_gem_fault(). Now any wait underneath struct_mutex will result us in
having to forcibly reset all inflight rendering, less than ideal.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..b3129cb0a04d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1834,9 +1834,15 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 
 	wakeref = intel_runtime_pm_get(dev_priv);
 
+	srcu = i915_reset_trylock(dev_priv);
+	if (srcu < 0) {
+		ret = srcu;
+		goto err_rpm;
+	}
+
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
-		goto err_rpm;
+		goto err_reset;
 
 	/* Access to snoopable pages through the GTT is incoherent. */
 	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(dev_priv)) {
@@ -1885,12 +1891,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	if (ret)
 		goto err_unpin;
 
-	srcu = i915_reset_trylock(dev_priv);
-	if (srcu < 0) {
-		ret = srcu;
-		goto err_fence;
-	}
-
 	/* Finally, remap it using the new GTT offset */
 	ret = remap_io_mapping(area,
 			       area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT),
@@ -1898,7 +1898,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 			       min_t(u64, vma->size, area->vm_end - area->vm_start),
 			       &ggtt->iomap);
 	if (ret)
-		goto err_reset;
+		goto err_fence;
 
 	/* Mark as being mmapped into userspace for later revocation */
 	assert_rpm_wakelock_held(dev_priv);
@@ -1908,14 +1908,14 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 
 	i915_vma_set_ggtt_write(vma);
 
-err_reset:
-	i915_reset_unlock(dev_priv, srcu);
 err_fence:
 	i915_vma_unpin_fence(vma);
 err_unpin:
 	__i915_vma_unpin(vma);
 err_unlock:
 	mutex_unlock(&dev->struct_mutex);
+err_reset:
+	i915_reset_unlock(dev_priv, srcu);
 err_rpm:
 	intel_runtime_pm_put(dev_priv, wakeref);
 	i915_gem_object_unpin_pages(obj);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 8/9] drm/i915: Trim i915_do_reset() to minimum delays
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (5 preceding siblings ...)
  2019-02-17 16:11 ` [PATCH 7/9] drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault() Chris Wilson
@ 2019-02-17 16:11 ` Chris Wilson
  2019-02-17 16:11 ` [PATCH 9/9] drm/i915: Trim delays for wedging Chris Wilson
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx

Remove the "safety-factor" in our udelays for i915_do_reset(). If we are
told to hold the line high for 20us, do it only for 20us plus the tiny
bit of udelay latency.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_reset.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 1911e00d2581..280f0e35cb3d 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -162,12 +162,12 @@ static int i915_do_reset(struct drm_i915_private *i915,
 
 	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
-	udelay(50);
+	udelay(20);
 	err = wait_for_atomic(i915_in_reset(pdev), 50);
 
 	/* Clear the reset request. */
 	pci_write_config_byte(pdev, I915_GDRST, 0);
-	udelay(50);
+	udelay(20);
 	if (!err)
 		err = wait_for_atomic(!i915_in_reset(pdev), 50);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 9/9] drm/i915: Trim delays for wedging
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (6 preceding siblings ...)
  2019-02-17 16:11 ` [PATCH 8/9] drm/i915: Trim i915_do_reset() to minimum delays Chris Wilson
@ 2019-02-17 16:11 ` Chris Wilson
  2019-02-17 16:36 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/i915: Make user contexts bannable again! Patchwork
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 16:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

CI still reports the occasional multi-second delay for resets, in
particular along the wedge+recovery paths. As the likely, and unbounded,
delay here is from sync_rcu, use the expedited variant instead.

Testcase: igt/gem_eio/unwedge-stress
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_reset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 280f0e35cb3d..c334089a7476 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -830,7 +830,7 @@ static void __i915_gem_set_wedged(struct drm_i915_private *i915)
 	 * either this call here to intel_engine_write_global_seqno, or the one
 	 * in nop_submit_request.
 	 */
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	/* Mark all executing requests as skipped */
 	for_each_engine(engine, i915, id)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/i915: Make user contexts bannable again!
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (7 preceding siblings ...)
  2019-02-17 16:11 ` [PATCH 9/9] drm/i915: Trim delays for wedging Chris Wilson
@ 2019-02-17 16:36 ` Patchwork
  2019-02-17 16:39 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2019-02-17 16:36 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915: Make user contexts bannable again!
URL   : https://patchwork.freedesktop.org/series/56809/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
b5b63e011d58 drm/i915: Make user contexts bannable again!
0bad7186178e drm/i915: Prevent user context creation while wedged
1105d9066d77 drm/i915: Optionally disable automatic recovery after a GPU reset
-:20: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#20: 
References: https://lists.freedesktop.org/archives/mesa-dev/2019-February/215431.html

total: 0 errors, 1 warnings, 0 checks, 95 lines checked
fc8819eea289 drm/i915/selftests: Move local mock_ggtt allocations to the heap
cca5b459de06 drm/i915: Move verify_wm_state() to heap
3dc9280a028c drm/i915: Avoid reset lock in writing fence registers
-:23: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#23: 
<4> [125.739679] 00000000a730190a (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x0/0x310 [i915]

total: 0 errors, 1 warnings, 0 checks, 26 lines checked
b3e220ab636d drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()
28f9b341fa1d drm/i915: Trim i915_do_reset() to minimum delays
-:21: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt
#21: FILE: drivers/gpu/drm/i915/i915_reset.c:165:
+	udelay(20);

-:27: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt
#27: FILE: drivers/gpu/drm/i915/i915_reset.c:170:
+	udelay(20);

total: 0 errors, 0 warnings, 2 checks, 14 lines checked
1b1e5512337e drm/i915: Trim delays for wedging

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [1/9] drm/i915: Make user contexts bannable again!
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (8 preceding siblings ...)
  2019-02-17 16:36 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/i915: Make user contexts bannable again! Patchwork
@ 2019-02-17 16:39 ` Patchwork
  2019-02-17 17:19 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2019-02-17 16:39 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915: Make user contexts bannable again!
URL   : https://patchwork.freedesktop.org/series/56809/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Make user contexts bannable again!
Okay!

Commit: drm/i915: Prevent user context creation while wedged
Okay!

Commit: drm/i915: Optionally disable automatic recovery after a GPU reset
Okay!

Commit: drm/i915/selftests: Move local mock_ggtt allocations to the heap
Okay!

Commit: drm/i915: Move verify_wm_state() to heap
Okay!

Commit: drm/i915: Avoid reset lock in writing fence registers
Okay!

Commit: drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()
-O:drivers/gpu/drm/i915/i915_gem.c:1898:32: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem.c:1898:32: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:1898:32: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:1898:32: warning: expression using sizeof(void)

Commit: drm/i915: Trim i915_do_reset() to minimum delays
Okay!

Commit: drm/i915: Trim delays for wedging
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] drm/i915: Optionally disable automatic recovery after a GPU reset
  2019-02-17 16:11 ` [PATCH 3/9] drm/i915: Optionally disable automatic recovery after a GPU reset Chris Wilson
@ 2019-02-17 17:05   ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 17:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Kenneth Graunke, Mika Kuoppala

Some clients, such as mesa, may only emit minimal incremental batches
that rely on the logical context state from previous batches. They know
that recovery is impossible after a hang as their required GPU state is
lost, and that each in flight and subsequent batch will hang (resetting
the context image back to default perpetuating the problem).

To avoid getting into the state in the first place, we can allow clients
to opt out of automatic recovery and elect to ban any guilty context
following a hang. This prevents the continual stream of hangs and allows
the client to recreate their context and rebuild the state from scratch.

v2: Prefer calling it recoverable rather than unrecoverable.

References: https://lists.freedesktop.org/archives/mesa-dev/2019-February/215431.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> # for mesa
---
Fixup setting args->size = 0 (silent rebase error).
---
 drivers/gpu/drm/i915/i915_gem_context.c | 15 ++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_reset.c       |  3 ++-
 include/uapi/drm/i915_drm.h             | 20 ++++++++++++++++++++
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f6c48274fafb..7fdc273bd57c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -364,7 +364,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	list_add_tail(&ctx->link, &dev_priv->contexts.list);
 	ctx->i915 = dev_priv;
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
-	ctx->user_flags = BIT(UCONTEXT_BANNABLE);
+	ctx->user_flags = BIT(UCONTEXT_BANNABLE) | BIT(UCONTEXT_RECOVERABLE);
 
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
 		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
@@ -957,6 +957,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		args->size = 0;
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_RECOVERABLE:
+		args->size = 0;
+		args->value = i915_gem_context_is_recoverable(ctx);
+		break;
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->size = 0;
 		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
@@ -1291,6 +1295,15 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 			i915_gem_context_clear_bannable(ctx);
 		break;
 
+	case I915_CONTEXT_PARAM_RECOVERABLE:
+		if (args->size)
+			ret = -EINVAL;
+		else if (args->value)
+			i915_gem_context_set_recoverable(ctx);
+		else
+			i915_gem_context_clear_recoverable(ctx);
+		break;
+
 	case I915_CONTEXT_PARAM_PRIORITY:
 		{
 			s64 priority = args->value;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index ca150a764c24..071108d34ae0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -134,6 +134,7 @@ struct i915_gem_context {
 #define UCONTEXT_NO_ZEROMAP		0
 #define UCONTEXT_NO_ERROR_CAPTURE	1
 #define UCONTEXT_BANNABLE		2
+#define UCONTEXT_RECOVERABLE		3
 
 	/**
 	 * @flags: small set of booleans
@@ -270,6 +271,21 @@ static inline void i915_gem_context_clear_bannable(struct i915_gem_context *ctx)
 	clear_bit(UCONTEXT_BANNABLE, &ctx->user_flags);
 }
 
+static inline bool i915_gem_context_is_recoverable(const struct i915_gem_context *ctx)
+{
+	return test_bit(UCONTEXT_RECOVERABLE, &ctx->user_flags);
+}
+
+static inline void i915_gem_context_set_recoverable(struct i915_gem_context *ctx)
+{
+	set_bit(UCONTEXT_RECOVERABLE, &ctx->user_flags);
+}
+
+static inline void i915_gem_context_clear_recoverable(struct i915_gem_context *ctx)
+{
+	clear_bit(UCONTEXT_RECOVERABLE, &ctx->user_flags);
+}
+
 static inline bool i915_gem_context_is_banned(const struct i915_gem_context *ctx)
 {
 	return test_bit(CONTEXT_BANNED, &ctx->flags);
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index a6fb6754f621..a171465b1c04 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -66,7 +66,8 @@ static bool context_mark_guilty(struct i915_gem_context *ctx)
 
 	bannable = i915_gem_context_is_bannable(ctx);
 	score = atomic_add_return(CONTEXT_SCORE_GUILTY, &ctx->ban_score);
-	banned = score >= CONTEXT_SCORE_BAN_THRESHOLD;
+	banned = (!i915_gem_context_is_recoverable(ctx) ||
+		  score >= CONTEXT_SCORE_BAN_THRESHOLD);
 
 	/* Cool contexts don't accumulate client ban score */
 	if (!bannable)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 397810fa2d33..c890b7992d5c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1491,6 +1491,26 @@ struct drm_i915_gem_context_param {
 	 * drm_i915_gem_context_param_sseu.
 	 */
 #define I915_CONTEXT_PARAM_SSEU		0x7
+
+/*
+ * Not all clients may want to attempt automatic recover of a context after
+ * a hang (for example, some clients may only submit very small incremental
+ * batches relying on known logical state of previous batches which will never
+ * recover correctly and each attempt will hang), and so would prefer that
+ * the context is forever banned instead.
+ *
+ * If set to false (0), after a reset, subsequent (and in flight) rendering
+ * from this context is discarded, and the client will need to create a new
+ * context to use instead.
+ *
+ * If set to true (1), the kernel will automatically attempt to recover the
+ * context by skipping the hanging batch and executing the next batch starting
+ * from the default context state (discarding the incomplete logical context
+ * state lost due to the reset).
+ *
+ * On creation, all new contexts are marked as recoverable.
+ */
+#define I915_CONTEXT_PARAM_RECOVERABLE	0x8
 	__u64 value;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/9] drm/i915: Make user contexts bannable again!
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (9 preceding siblings ...)
  2019-02-17 16:39 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-02-17 17:19 ` Patchwork
  2019-02-17 17:31 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2) Patchwork
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2019-02-17 17:19 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915: Make user contexts bannable again!
URL   : https://patchwork.freedesktop.org/series/56809/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5616 -> Patchwork_12239
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56809/revisions/1/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12239 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-byt-clapper:     PASS -> INCOMPLETE [fdo#102657]

  * igt@pm_rpm@module-reload:
    - fi-skl-6260u:       PASS -> INCOMPLETE [fdo#107807]

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  * igt@i915_selftest@live_execlists:
    - fi-apl-guc:         INCOMPLETE [fdo#103927] -> PASS
    - {fi-icl-y}:         INCOMPLETE [fdo#109567] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102657]: https://bugs.freedesktop.org/show_bug.cgi?id=102657
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#107807]: https://bugs.freedesktop.org/show_bug.cgi?id=107807
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109567]: https://bugs.freedesktop.org/show_bug.cgi?id=109567


Participating hosts (40 -> 36)
------------------------------

  Additional (1): fi-glk-j4005 
  Missing    (5): fi-kbl-soraka fi-ilk-m540 fi-bsw-n3050 fi-byt-squawks fi-bsw-cyan 


Build changes
-------------

    * Linux: CI_DRM_5616 -> Patchwork_12239

  CI_DRM_5616: 3479bf1d93b4e556be2804ff39ad41293fc48e4f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4833: 7802324e86ddf947cba847e910f75b1a8affe8d7 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12239: 1b1e5512337ec75d9db7e58615cdf4b47b71d826 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

1b1e5512337e drm/i915: Trim delays for wedging
28f9b341fa1d drm/i915: Trim i915_do_reset() to minimum delays
b3e220ab636d drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()
3dc9280a028c drm/i915: Avoid reset lock in writing fence registers
cca5b459de06 drm/i915: Move verify_wm_state() to heap
fc8819eea289 drm/i915/selftests: Move local mock_ggtt allocations to the heap
1105d9066d77 drm/i915: Optionally disable automatic recovery after a GPU reset
0bad7186178e drm/i915: Prevent user context creation while wedged
b5b63e011d58 drm/i915: Make user contexts bannable again!

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12239/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2)
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (10 preceding siblings ...)
  2019-02-17 17:19 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-17 17:31 ` Patchwork
  2019-02-17 17:34 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2019-02-17 17:31 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2)
URL   : https://patchwork.freedesktop.org/series/56809/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
f1a2277064d5 drm/i915: Make user contexts bannable again!
801c337002f2 drm/i915: Prevent user context creation while wedged
dad779ca76b6 drm/i915: Optionally disable automatic recovery after a GPU reset
-:20: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#20: 
References: https://lists.freedesktop.org/archives/mesa-dev/2019-February/215431.html

total: 0 errors, 1 warnings, 0 checks, 96 lines checked
ee4d03caf0ea drm/i915/selftests: Move local mock_ggtt allocations to the heap
655613b3ee20 drm/i915: Move verify_wm_state() to heap
258f897a1c79 drm/i915: Avoid reset lock in writing fence registers
-:23: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#23: 
<4> [125.739679] 00000000a730190a (&dev_priv->gpu_error.reset_backoff_srcu){+.+.}, at: i915_reset_trylock+0x0/0x310 [i915]

total: 0 errors, 1 warnings, 0 checks, 26 lines checked
d1862023dd64 drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()
f6b521db8e4e drm/i915: Trim i915_do_reset() to minimum delays
-:21: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt
#21: FILE: drivers/gpu/drm/i915/i915_reset.c:165:
+	udelay(20);

-:27: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt
#27: FILE: drivers/gpu/drm/i915/i915_reset.c:170:
+	udelay(20);

total: 0 errors, 0 warnings, 2 checks, 14 lines checked
3be34136a0eb drm/i915: Trim delays for wedging

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2)
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (11 preceding siblings ...)
  2019-02-17 17:31 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2) Patchwork
@ 2019-02-17 17:34 ` Patchwork
  2019-02-17 17:51 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2019-02-17 17:34 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2)
URL   : https://patchwork.freedesktop.org/series/56809/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Make user contexts bannable again!
Okay!

Commit: drm/i915: Prevent user context creation while wedged
Okay!

Commit: drm/i915: Optionally disable automatic recovery after a GPU reset
Okay!

Commit: drm/i915/selftests: Move local mock_ggtt allocations to the heap
Okay!

Commit: drm/i915: Move verify_wm_state() to heap
Okay!

Commit: drm/i915: Avoid reset lock in writing fence registers
Okay!

Commit: drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()
-O:drivers/gpu/drm/i915/i915_gem.c:1898:32: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem.c:1898:32: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:1898:32: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:1898:32: warning: expression using sizeof(void)

Commit: drm/i915: Trim i915_do_reset() to minimum delays
Okay!

Commit: drm/i915: Trim delays for wedging
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2)
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (12 preceding siblings ...)
  2019-02-17 17:34 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-02-17 17:51 ` Patchwork
  2019-02-17 23:12 ` ✗ Fi.CI.IGT: failure " Patchwork
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2019-02-17 17:51 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2)
URL   : https://patchwork.freedesktop.org/series/56809/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5617 -> Patchwork_12240
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56809/revisions/2/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12240 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-blb-e6850:       PASS -> INCOMPLETE [fdo#107718]

  * igt@i915_selftest@live_execlists:
    - fi-apl-guc:         PASS -> INCOMPLETE [fdo#103927]

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       PASS -> FAIL [fdo#109485]

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  * igt@pm_rpm@basic-rte:
    - fi-bsw-kefka:       PASS -> FAIL [fdo#108800]

  
#### Possible fixes ####

  * igt@i915_selftest@live_evict:
    - fi-bsw-kefka:       DMESG-WARN [fdo#107709] -> PASS

  * igt@kms_pipe_crc_basic@read-crc-pipe-a:
    - fi-byt-clapper:     FAIL [fdo#107362] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#105998]: https://bugs.freedesktop.org/show_bug.cgi?id=105998
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107709]: https://bugs.freedesktop.org/show_bug.cgi?id=107709
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#108622]: https://bugs.freedesktop.org/show_bug.cgi?id=108622
  [fdo#108800]: https://bugs.freedesktop.org/show_bug.cgi?id=108800
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109294]: https://bugs.freedesktop.org/show_bug.cgi?id=109294
  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109316]: https://bugs.freedesktop.org/show_bug.cgi?id=109316
  [fdo#109485]: https://bugs.freedesktop.org/show_bug.cgi?id=109485
  [fdo#109527]: https://bugs.freedesktop.org/show_bug.cgi?id=109527
  [fdo#109528]: https://bugs.freedesktop.org/show_bug.cgi?id=109528
  [fdo#109530]: https://bugs.freedesktop.org/show_bug.cgi?id=109530
  [fdo#109635 ]: https://bugs.freedesktop.org/show_bug.cgi?id=109635 


Participating hosts (37 -> 36)
------------------------------

  Additional (4): fi-icl-y fi-glk-j4005 fi-icl-u2 fi-whl-u 
  Missing    (5): fi-kbl-soraka fi-ilk-m540 fi-byt-j1900 fi-byt-squawks fi-bsw-cyan 


Build changes
-------------

    * Linux: CI_DRM_5617 -> Patchwork_12240

  CI_DRM_5617: 75d902e598dbb8f95b42abc70dfcf41d275f478f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4833: 7802324e86ddf947cba847e910f75b1a8affe8d7 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12240: 3be34136a0eb1a3262399e2b7478074dededbb6a @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

3be34136a0eb drm/i915: Trim delays for wedging
f6b521db8e4e drm/i915: Trim i915_do_reset() to minimum delays
d1862023dd64 drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()
258f897a1c79 drm/i915: Avoid reset lock in writing fence registers
655613b3ee20 drm/i915: Move verify_wm_state() to heap
ee4d03caf0ea drm/i915/selftests: Move local mock_ggtt allocations to the heap
dad779ca76b6 drm/i915: Optionally disable automatic recovery after a GPU reset
801c337002f2 drm/i915: Prevent user context creation while wedged
f1a2277064d5 drm/i915: Make user contexts bannable again!

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12240/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/9] drm/i915/selftests: Move local mock_ggtt allocations to the heap
  2019-02-17 16:11 ` [PATCH 4/9] drm/i915/selftests: Move local mock_ggtt allocations to the heap Chris Wilson
@ 2019-02-17 18:40   ` Matthew Auld
  2019-02-17 20:22     ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Matthew Auld @ 2019-02-17 18:40 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development, Matthew Auld

On Sun, 17 Feb 2019 at 16:13, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> This struct appears quite large and pushes our stack frame over
> 1024 bytes -- too high for conservative setups. So move the mock_ggtt
> struct to the heap.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>

Missing kfree() somewhere?

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/9] drm/i915/selftests: Move local mock_ggtt allocations to the heap
  2019-02-17 18:40   ` Matthew Auld
@ 2019-02-17 20:22     ` Chris Wilson
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-17 20:22 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development, Matthew Auld

Quoting Matthew Auld (2019-02-17 18:40:05)
> On Sun, 17 Feb 2019 at 16:13, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > This struct appears quite large and pushes our stack frame over
> > 1024 bytes -- too high for conservative setups. So move the mock_ggtt
> > struct to the heap.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> 
> Missing kfree() somewhere?

Or two.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2)
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (13 preceding siblings ...)
  2019-02-17 17:51 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-17 23:12 ` Patchwork
  2019-02-18 10:45   ` Mika Kuoppala
  2019-02-22 15:24 ` Sasha Levin
  16 siblings, 0 replies; 23+ messages in thread
From: Patchwork @ 2019-02-17 23:12 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2)
URL   : https://patchwork.freedesktop.org/series/56809/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5617_full -> Patchwork_12240_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12240_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12240_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12240_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_ctx_param@invalid-param-set:
    - shard-kbl:          PASS -> FAIL
    - shard-hsw:          PASS -> FAIL
    - shard-snb:          PASS -> FAIL
    - shard-glk:          PASS -> FAIL
    - shard-iclb:         PASS -> FAIL
    - shard-apl:          PASS -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_12240_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_param@invalid-param-get:
    - shard-apl:          PASS -> FAIL [fdo#109559]
    - shard-iclb:         PASS -> FAIL [fdo#109559]
    - shard-glk:          PASS -> FAIL [fdo#109559]
    - shard-snb:          PASS -> FAIL [fdo#109559]
    - shard-hsw:          PASS -> FAIL [fdo#109559]
    - shard-kbl:          PASS -> FAIL [fdo#109559]

  * igt@kms_color@pipe-c-legacy-gamma:
    - shard-glk:          PASS -> FAIL [fdo#104782]

  * igt@kms_cursor_crc@cursor-256x256-dpms:
    - shard-apl:          PASS -> FAIL [fdo#103232]

  * igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions:
    - shard-hsw:          PASS -> FAIL [fdo#103355]

  * igt@kms_fbcon_fbt@fbc:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#109593]

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-mmap-wc:
    - shard-glk:          PASS -> FAIL [fdo#103167] +3

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-mmap-wc:
    - shard-iclb:         PASS -> FAIL [fdo#103167] +2

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
    - shard-apl:          PASS -> FAIL [fdo#103166]

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-x:
    - shard-glk:          PASS -> FAIL [fdo#103166] +1

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-yf:
    - shard-iclb:         PASS -> FAIL [fdo#103166]

  * igt@kms_vblank@pipe-c-ts-continuation-dpms-rpm:
    - shard-apl:          PASS -> FAIL [fdo#104894]

  * igt@pm_rpm@dpms-lpsp:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#107724] +2

  
#### Possible fixes ####

  * igt@i915_selftest@live_workarounds:
    - shard-iclb:         DMESG-FAIL [fdo#108954] -> PASS

  * igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-a:
    - shard-glk:          DMESG-WARN [fdo#107956] -> PASS

  * igt@kms_cursor_crc@cursor-128x42-sliding:
    - shard-apl:          FAIL [fdo#103232] -> PASS +7

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic:
    - shard-glk:          FAIL [fdo#102670] -> PASS

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-iclb:         FAIL [fdo#105363] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt:
    - shard-apl:          FAIL [fdo#103167] -> PASS +3

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-move:
    - shard-iclb:         FAIL [fdo#103167] -> PASS +1

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-fullscreen:
    - shard-glk:          FAIL [fdo#103167] -> PASS

  * igt@kms_plane@plane-position-covered-pipe-b-planes:
    - shard-glk:          FAIL [fdo#103166] -> PASS

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-x:
    - shard-iclb:         FAIL [fdo#103166] -> PASS

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-y:
    - shard-apl:          FAIL [fdo#103166] -> PASS +1

  * igt@kms_sysfs_edid_timing:
    - shard-iclb:         FAIL [fdo#100047] -> PASS

  * igt@pm_rpm@cursor-dpms:
    - shard-iclb:         DMESG-WARN [fdo#107724] -> PASS +2

  * igt@pm_rpm@universal-planes:
    - shard-iclb:         DMESG-WARN [fdo#108654] / [fdo#108756] -> PASS

  
#### Warnings ####

  * igt@pm_backlight@fade_with_suspend:
    - shard-iclb:         FAIL [fdo#107847] -> DMESG-FAIL [fdo#107724] / [fdo#107847]

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#100047]: https://bugs.freedesktop.org/show_bug.cgi?id=100047
  [fdo#102670]: https://bugs.freedesktop.org/show_bug.cgi?id=102670
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103355]: https://bugs.freedesktop.org/show_bug.cgi?id=103355
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
  [fdo#104894]: https://bugs.freedesktop.org/show_bug.cgi?id=104894
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#107847]: https://bugs.freedesktop.org/show_bug.cgi?id=107847
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108654]: https://bugs.freedesktop.org/show_bug.cgi?id=108654
  [fdo#108756]: https://bugs.freedesktop.org/show_bug.cgi?id=108756
  [fdo#108954]: https://bugs.freedesktop.org/show_bug.cgi?id=108954
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109281]: https://bugs.freedesktop.org/show_bug.cgi?id=109281
  [fdo#109559]: https://bugs.freedesktop.org/show_bug.cgi?id=109559
  [fdo#109593]: https://bugs.freedesktop.org/show_bug.cgi?id=109593


Participating hosts (7 -> 6)
------------------------------

  Missing    (1): shard-skl 


Build changes
-------------

    * Linux: CI_DRM_5617 -> Patchwork_12240

  CI_DRM_5617: 75d902e598dbb8f95b42abc70dfcf41d275f478f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4833: 7802324e86ddf947cba847e910f75b1a8affe8d7 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12240: 3be34136a0eb1a3262399e2b7478074dededbb6a @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12240/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/9] drm/i915: Make user contexts bannable again!
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
@ 2019-02-18 10:45   ` Mika Kuoppala
  2019-02-17 16:11 ` [PATCH 3/9] drm/i915: Optionally disable automatic recovery after a GPU reset Chris Wilson
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Mika Kuoppala @ 2019-02-18 10:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson, stable

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Since moving the bannable boolean into the context flags, we lost the
> default setting of contexts being bannable. Oops.
>
> Sadly because we have multi-level banning scheme, our testcase for being
> banned cannot distinguish between the expected ban on the context and
> the applied banned via the fd.
>
> Fixes: 6095868a271d ("drm/i915: Complete kerneldoc for struct i915_gem_context")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v4.11+
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 280813a4bf82..102866967998 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -364,6 +364,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>  	list_add_tail(&ctx->link, &dev_priv->contexts.list);
>  	ctx->i915 = dev_priv;
>  	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
> +	ctx->user_flags = BIT(UCONTEXT_BANNABLE);
>

But it is there, after setting the ring size.

-Mika

>  	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
>  		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/9] drm/i915: Make user contexts bannable again!
@ 2019-02-18 10:45   ` Mika Kuoppala
  0 siblings, 0 replies; 23+ messages in thread
From: Mika Kuoppala @ 2019-02-18 10:45 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, stable

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Since moving the bannable boolean into the context flags, we lost the
> default setting of contexts being bannable. Oops.
>
> Sadly because we have multi-level banning scheme, our testcase for being
> banned cannot distinguish between the expected ban on the context and
> the applied banned via the fd.
>
> Fixes: 6095868a271d ("drm/i915: Complete kerneldoc for struct i915_gem_context")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v4.11+
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 280813a4bf82..102866967998 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -364,6 +364,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>  	list_add_tail(&ctx->link, &dev_priv->contexts.list);
>  	ctx->i915 = dev_priv;
>  	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
> +	ctx->user_flags = BIT(UCONTEXT_BANNABLE);
>

But it is there, after setting the ring size.

-Mika

>  	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
>  		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/9] drm/i915: Make user contexts bannable again!
  2019-02-18 10:45   ` Mika Kuoppala
  (?)
@ 2019-02-18 10:51   ` Chris Wilson
  -1 siblings, 0 replies; 23+ messages in thread
From: Chris Wilson @ 2019-02-18 10:51 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx; +Cc: stable

Quoting Mika Kuoppala (2019-02-18 10:45:32)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Since moving the bannable boolean into the context flags, we lost the
> > default setting of contexts being bannable. Oops.
> >
> > Sadly because we have multi-level banning scheme, our testcase for being
> > banned cannot distinguish between the expected ban on the context and
> > the applied banned via the fd.
> >
> > Fixes: 6095868a271d ("drm/i915: Complete kerneldoc for struct i915_gem_context")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: <stable@vger.kernel.org> # v4.11+
> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 280813a4bf82..102866967998 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -364,6 +364,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
> >       list_add_tail(&ctx->link, &dev_priv->contexts.list);
> >       ctx->i915 = dev_priv;
> >       ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
> > +     ctx->user_flags = BIT(UCONTEXT_BANNABLE);
> >
> 
> But it is there, after setting the ring size.

Hmm. However my mesa context didn't get banned until the fd did,
which is not what was intended. Odd.

So which is preferrable setting user_flags explicitly or using the
helper, probably the latter since that's the style we already have.
-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/9] drm/i915: Make user contexts bannable again!
  2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
                   ` (15 preceding siblings ...)
  2019-02-18 10:45   ` Mika Kuoppala
@ 2019-02-22 15:24 ` Sasha Levin
  16 siblings, 0 replies; 23+ messages in thread
From: Sasha Levin @ 2019-02-22 15:24 UTC (permalink / raw)
  To: Sasha Levin, Chris Wilson, intel-gfx; +Cc: stable

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 6095868a271d drm/i915: Complete kerneldoc for struct i915_gem_context.

The bot has tested the following trees: v4.20.11, v4.19.24, v4.14.102.

v4.20.11: Failed to apply! Possible dependencies:
    7651a4452ddf ("drm/i915: Reserve some priority bits for internal use")

v4.19.24: Failed to apply! Possible dependencies:
    7651a4452ddf ("drm/i915: Reserve some priority bits for internal use")

v4.14.102: Failed to apply! Possible dependencies:
    0a53bc07f044 ("drm/i915/gvt: Separate cmd scan from request allocation")
    0cce2823ed37 ("drm/i915/gvt: Refine error handling for prepare_execlist_workload")
    1406a14b0ed9 ("drm/i915/gvt: Introduce intel_vgpu_submission")
    1603660b3342 ("drm/i915/gvt: set max priority for gvt context")
    465c403cb508 ("drm/i915: introduce simple gemfs")
    54cff6479fd8 ("drm/i915/gvt: Make elsp_dwords in the right order")
    7651a4452ddf ("drm/i915: Reserve some priority bits for internal use")
    9a9829e9eb8b ("drm/i915/gvt: Move workload cache init/clean into intel_vgpu_{setup, clean}_submission()")
    a3cfdca920b2 ("drm/i915/gvt: Add error handling for intel_gvt_scan_and_shadow_workload")
    b7268c5eed0a ("drm/i915: Pack params to engine->schedule() into a struct")
    d8235b5e5584 ("drm/i915/gvt: Move common workload preparation into prepare_workload()")
    e61e0f51ba79 ("drm/i915: Rename drm_i915_gem_request to i915_request")
    e91ef99b9543 ("drm/i915/selftests: Remember to create the fake preempt context")
    f2880e04f3a5 ("drm/i915/gvt: Move request alloc to dispatch_workload path only")


How should we proceed with this patch?

--
Thanks,
Sasha
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-02-22 15:24 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-17 16:11 [PATCH 1/9] drm/i915: Make user contexts bannable again! Chris Wilson
2019-02-17 16:11 ` [PATCH 2/9] drm/i915: Prevent user context creation while wedged Chris Wilson
2019-02-17 16:11 ` [PATCH 3/9] drm/i915: Optionally disable automatic recovery after a GPU reset Chris Wilson
2019-02-17 17:05   ` [PATCH] " Chris Wilson
2019-02-17 16:11 ` [PATCH 4/9] drm/i915/selftests: Move local mock_ggtt allocations to the heap Chris Wilson
2019-02-17 18:40   ` Matthew Auld
2019-02-17 20:22     ` Chris Wilson
2019-02-17 16:11 ` [PATCH 5/9] drm/i915: Move verify_wm_state() to heap Chris Wilson
2019-02-17 16:11 ` [PATCH 6/9] drm/i915: Avoid reset lock in writing fence registers Chris Wilson
2019-02-17 16:11 ` [PATCH 7/9] drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault() Chris Wilson
2019-02-17 16:11 ` [PATCH 8/9] drm/i915: Trim i915_do_reset() to minimum delays Chris Wilson
2019-02-17 16:11 ` [PATCH 9/9] drm/i915: Trim delays for wedging Chris Wilson
2019-02-17 16:36 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/i915: Make user contexts bannable again! Patchwork
2019-02-17 16:39 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-02-17 17:19 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-17 17:31 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/9] drm/i915: Make user contexts bannable again! (rev2) Patchwork
2019-02-17 17:34 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-02-17 17:51 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-17 23:12 ` ✗ Fi.CI.IGT: failure " Patchwork
2019-02-18 10:45 ` [PATCH 1/9] drm/i915: Make user contexts bannable again! Mika Kuoppala
2019-02-18 10:45   ` Mika Kuoppala
2019-02-18 10:51   ` Chris Wilson
2019-02-22 15:24 ` Sasha Levin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.