* [RFC 0/6] Default request/fence expiry + watchdog
From: Tvrtko Ursulin @ 2021-03-12 15:46 UTC
  To: Intel-gfx; +Cc: Daniel Vetter, dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

"Watchdog" aka "restoring hangcheck" aka default request/fence expiry - first
post of a somewhat controversial feature so may be somewhat rough in commit
messages, commentary and implementation. So only RFC for now.

I put "watchdog" in quotes because in the classical sense a watchdog would
allow userspace to ping it and so remain alive.

I parenthesise "restoring hangcheck" because this series, contrary to the old
hangcheck, is not looking at whether the workload is making any progress from
the kernel side either. (Althoguh disclaimer my memory may be leaky - Daniel
suspects old hangcheck had some stricter, more indiscriminatory, angles to it.
But apart from being prone to both false negatives and false positives I can't
remember that myself.)

Short version - the ask is to fail any user submission after a set time
period. In this RFC that time is ten seconds.

Time counts from the moment a user submission becomes "runnable" (implicit and
explicit dependencies have been cleared) and keeps counting regardless of any
GPU contention caused by other users of the system. So the semantics are a bit
weak but, again, I understand this is really wanted by the DRM core.

As an attempt to compensate for this brutish nature, I propose adding
extendable configurability via a context param as part of the series. That
could allow userspace to pick different semantics (always more restrictive
than the system default) and so implement interesting things like the long
desired media watchdog, modulo the trickiness of the implementation there.
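
Purely for illustration, and with the exact uapi being what this RFC is meant
to discuss, opting a context into a stricter limit via the
I915_CONTEXT_PARAM_WATCHDOG param added in patch 4/6 could look roughly like
the sketch below from userspace (names and semantics tentative):

#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Sketch: cap all batches on this context at timeout_us of wall time. */
static int set_context_watchdog(int fd, uint32_t ctx_id, uint64_t timeout_us)
{
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_WATCHDOG,
		.size = 0,		/* plain value, not a struct */
		.value = timeout_us,	/* timeout in micro-seconds */
	};

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}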

Test-with: 20210312093329.1639502-1-tvrtko.ursulin@linux.intel.com
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Chris Wilson (1):
  drm/i915: Individual request cancellation

Tvrtko Ursulin (5):
  drm/i915: Restrict sentinel requests further
  drm/i915: Request watchdog infrastructure
  drm/i915: Allow userspace to configure the watchdog
  drm/i915: Fail too long user submissions by default
  drm/i915: Allow configuring default request expiry via modparam

 drivers/gpu/drm/i915/Kconfig.profile          |   8 +
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  92 ++++++
 .../gpu/drm/i915/gem/i915_gem_context_types.h |   4 +
 drivers/gpu/drm/i915/gt/intel_context_param.h |  11 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
 .../drm/i915/gt/intel_execlists_submission.c  |  11 +-
 .../drm/i915/gt/intel_execlists_submission.h  |   2 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |   3 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |   2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |  21 ++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |   7 +
 drivers/gpu/drm/i915/i915_params.c            |   5 +
 drivers/gpu/drm/i915/i915_params.h            |   1 +
 drivers/gpu/drm/i915/i915_request.c           | 129 +++++++-
 drivers/gpu/drm/i915/i915_request.h           |  12 +-
 drivers/gpu/drm/i915/selftests/i915_request.c | 275 ++++++++++++++++++
 include/uapi/drm/i915_drm.h                   |   5 +-
 18 files changed, 584 insertions(+), 9 deletions(-)

-- 
2.27.0

* [RFC 1/6] drm/i915: Individual request cancellation
From: Tvrtko Ursulin @ 2021-03-12 15:46 UTC
  To: Intel-gfx; +Cc: Tvrtko Ursulin, dri-devel, Chris Wilson

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently, we cancel outstanding requests within a context when the
context is closed. We may also want to cancel individual requests using
the same graceful preemption mechanism.
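
As an illustration only (not part of the patch), a call site would mirror
what the new selftests below do - assuming an already submitted request and
the usual reference handling around the wait:

/* Sketch: cancel a single request, leaving the context itself intact. */
static void cancel_one_request(struct i915_request *rq)
{
	i915_request_get(rq);

	i915_request_cancel(rq, -EINTR);

	/* The fence completes and carries -EINTR in rq->fence.error. */
	if (i915_request_wait(rq, 0, HZ) < 0)
		pr_err("request cancellation did not complete\n");

	i915_request_put(rq);
}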

v2 (Tvrtko):
 * Cancel waiters carefully considering no timeline lock and RCU.
 * Fixed selftests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
 .../drm/i915/gt/intel_execlists_submission.c  |   9 +-
 drivers/gpu/drm/i915/i915_request.c           |  77 ++++-
 drivers/gpu/drm/i915/i915_request.h           |   4 +-
 drivers/gpu/drm/i915/selftests/i915_request.c | 275 ++++++++++++++++++
 5 files changed, 360 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 0b062fad1837..e2fb3ae2aaf3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -314,6 +314,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
 		mutex_unlock(&ce->timeline->mutex);
 	}
 
+	intel_engine_flush_scheduler(engine);
 	intel_engine_pm_put(engine);
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 85ff5fe861b4..4c2acb5a6c0a 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -421,6 +421,11 @@ static void reset_active(struct i915_request *rq,
 	ce->lrc.lrca = lrc_update_regs(ce, engine, head);
 }
 
+static bool bad_request(const struct i915_request *rq)
+{
+	return rq->fence.error && i915_request_started(rq);
+}
+
 static struct intel_engine_cs *
 __execlists_schedule_in(struct i915_request *rq)
 {
@@ -433,7 +438,7 @@ __execlists_schedule_in(struct i915_request *rq)
 		     !intel_engine_has_heartbeat(engine)))
 		intel_context_set_banned(ce);
 
-	if (unlikely(intel_context_is_banned(ce)))
+	if (unlikely(intel_context_is_banned(ce) || bad_request(rq)))
 		reset_active(rq, engine);
 
 	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
@@ -1112,7 +1117,7 @@ static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
 		return 0;
 
 	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
-	if (unlikely(intel_context_is_banned(rq->context)))
+	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
 		return 1;
 
 	return READ_ONCE(engine->props.preempt_timeout_ms);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e7b4c4bc41a6..fb9c5bb1fe41 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -33,7 +33,10 @@
 #include "gem/i915_gem_context.h"
 #include "gt/intel_breadcrumbs.h"
 #include "gt/intel_context.h"
+#include "gt/intel_engine.h"
+#include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gpu_commands.h"
+#include "gt/intel_reset.h"
 #include "gt/intel_ring.h"
 #include "gt/intel_rps.h"
 
@@ -429,20 +432,22 @@ void __i915_request_skip(struct i915_request *rq)
 	rq->infix = rq->postfix;
 }
 
-void i915_request_set_error_once(struct i915_request *rq, int error)
+bool i915_request_set_error_once(struct i915_request *rq, int error)
 {
 	int old;
 
 	GEM_BUG_ON(!IS_ERR_VALUE((long)error));
 
 	if (i915_request_signaled(rq))
-		return;
+		return false;
 
 	old = READ_ONCE(rq->fence.error);
 	do {
 		if (fatal_error(old))
-			return;
+			return false;
 	} while (!try_cmpxchg(&rq->fence.error, &old, error));
+
+	return true;
 }
 
 struct i915_request *i915_request_mark_eio(struct i915_request *rq)
@@ -609,6 +614,72 @@ void i915_request_unsubmit(struct i915_request *request)
 	spin_unlock_irqrestore(&se->lock, flags);
 }
 
+static struct intel_engine_cs *active_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched.lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched.lock);
+		locked = engine;
+		spin_lock(&locked->sched.lock);
+	}
+
+	engine = NULL;
+	if (i915_request_is_active(rq) && !__i915_request_is_complete(rq))
+		engine = locked;
+
+	spin_unlock_irq(&locked->sched.lock);
+
+	return engine;
+}
+
+static void __cancel_request(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = active_engine(rq);
+
+	if (engine && intel_engine_pulse(engine))
+		intel_gt_handle_error(engine->gt, engine->mask, 0,
+				      "request cancellation by %s",
+				      current->comm);
+}
+
+void i915_request_cancel(struct i915_request *rq, int error)
+{
+	if (!i915_request_set_error_once(rq, error))
+		return;
+
+	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
+
+	if (i915_sw_fence_signaled(&rq->submit)) {
+		struct i915_dependency *p;
+
+restart:
+		rcu_read_lock();
+		for_each_waiter(p, rq) {
+			struct i915_request *w =
+				container_of(p->waiter, typeof(*w), sched);
+
+			if (__i915_request_is_complete(w) ||
+			    fatal_error(w->fence.error))
+				continue;
+
+			w = i915_request_get(w);
+			rcu_read_unlock();
+			/* Recursion bound by the number of engines */
+			i915_request_cancel(w, error);
+			i915_request_put(w);
+
+			/* Restart after having to drop rcu lock. */
+			goto restart;
+		}
+		rcu_read_unlock();
+	}
+
+	__cancel_request(rq);
+}
+
 static int __i915_sw_fence_call
 submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 {
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index dd10a6db3d21..64869a313b3e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -312,7 +312,7 @@ struct i915_request * __must_check
 i915_request_create(struct intel_context *ce);
 
 void __i915_request_skip(struct i915_request *rq);
-void i915_request_set_error_once(struct i915_request *rq, int error);
+bool i915_request_set_error_once(struct i915_request *rq, int error);
 struct i915_request *i915_request_mark_eio(struct i915_request *rq);
 
 struct i915_request *__i915_request_commit(struct i915_request *request);
@@ -368,6 +368,8 @@ void i915_request_submit(struct i915_request *request);
 void __i915_request_unsubmit(struct i915_request *request);
 void i915_request_unsubmit(struct i915_request *request);
 
+void i915_request_cancel(struct i915_request *rq, int error);
+
 long i915_request_wait(struct i915_request *rq,
 		       unsigned int flags,
 		       long timeout)
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 8035ea7565ed..e63609ec5b97 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -608,6 +608,280 @@ static int live_nop_request(void *arg)
 	return err;
 }
 
+static int __cancel_inactive(struct intel_engine_cs *engine)
+{
+	struct intel_context *ce;
+	struct igt_spinner spin;
+	struct i915_request *rq;
+	int err = 0;
+
+	if (igt_spinner_init(&spin, engine->gt))
+		return -ENOMEM;
+
+	ce = intel_context_create(engine);
+	if (IS_ERR(ce)) {
+		err = PTR_ERR(ce);
+		goto out_spin;
+	}
+
+	rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto out_ce;
+	}
+
+	pr_debug("%s: Cancelling inactive request\n", engine->name);
+	i915_request_cancel(rq, -EINTR);
+	i915_request_get(rq);
+	i915_request_add(rq);
+
+	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+		struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+		pr_err("%s: Failed to cancel inactive request\n", engine->name);
+		intel_engine_dump(engine, &p, "%s\n", engine->name);
+		err = -ETIME;
+		goto out_rq;
+	}
+
+	if (rq->fence.error != -EINTR) {
+		pr_err("%s: fence not cancelled (%u)\n",
+		       engine->name, rq->fence.error);
+		err = -EINVAL;
+	}
+
+out_rq:
+	i915_request_put(rq);
+out_ce:
+	intel_context_put(ce);
+out_spin:
+	igt_spinner_fini(&spin);
+	if (err)
+		pr_err("%s: __cancel_inactive error %d\n", engine->name, err);
+	return err;
+}
+
+static int __cancel_active(struct intel_engine_cs *engine)
+{
+	struct intel_context *ce;
+	struct igt_spinner spin;
+	struct i915_request *rq;
+	int err = 0;
+
+	if (igt_spinner_init(&spin, engine->gt))
+		return -ENOMEM;
+
+	ce = intel_context_create(engine);
+	if (IS_ERR(ce)) {
+		err = PTR_ERR(ce);
+		goto out_spin;
+	}
+
+	rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto out_ce;
+	}
+
+	pr_debug("%s: Cancelling active request\n", engine->name);
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (!igt_wait_for_spinner(&spin, rq)) {
+		struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+		pr_err("Failed to start spinner on %s\n", engine->name);
+		intel_engine_dump(engine, &p, "%s\n", engine->name);
+		err = -ETIME;
+		goto out_rq;
+	}
+	i915_request_cancel(rq, -EINTR);
+
+	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+		struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+		pr_err("%s: Failed to cancel active request\n", engine->name);
+		intel_engine_dump(engine, &p, "%s\n", engine->name);
+		err = -ETIME;
+		goto out_rq;
+	}
+
+	if (rq->fence.error != -EINTR) {
+		pr_err("%s: fence not cancelled (%u)\n",
+		       engine->name, rq->fence.error);
+		err = -EINVAL;
+	}
+
+out_rq:
+	i915_request_put(rq);
+out_ce:
+	intel_context_put(ce);
+out_spin:
+	igt_spinner_fini(&spin);
+	if (err)
+		pr_err("%s: __cancel_active error %d\n", engine->name, err);
+	return err;
+}
+
+static int __cancel_active_chain(struct intel_engine_cs *engine)
+{
+	struct intel_context *ce;
+	struct igt_spinner spin;
+	struct i915_request *rq[2];
+	int err = 0;
+
+	if (igt_spinner_init(&spin, engine->gt))
+		return -ENOMEM;
+
+	ce = intel_context_create(engine);
+	if (IS_ERR(ce)) {
+		err = PTR_ERR(ce);
+		goto out_spin;
+	}
+
+	rq[0] = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+	if (IS_ERR(rq[0])) {
+		err = PTR_ERR(rq[0]);
+		goto out_ce;
+	}
+	i915_request_get(rq[0]);
+	i915_request_add(rq[0]);
+
+	rq[1] = intel_context_create_request(ce);
+	if (IS_ERR(rq[1])) {
+		err = PTR_ERR(rq[1]);
+		goto out_spinner;
+	}
+	i915_request_get(rq[1]);
+	i915_request_add(rq[1]);
+
+	pr_debug("%s: Cancelling active chain\n", engine->name);
+	intel_engine_flush_scheduler(engine);
+	i915_request_cancel(rq[0], -EINTR);
+	igt_spinner_end(&spin);
+
+	if (i915_request_wait(rq[1], 0, HZ / 5) < 0) {
+		struct drm_printer p = drm_info_printer(engine->i915->drm.dev);
+
+		pr_err("%s: Failed to cancel chained request\n", engine->name);
+		intel_engine_dump(engine, &p, "%s\n", engine->name);
+		err = -ETIME;
+		goto out_waiter;
+	}
+
+	if (rq[0]->fence.error != -EINTR) {
+		pr_err("%s: first fence not cancelled (%u)\n",
+		       engine->name, rq[0]->fence.error);
+		err = -EINVAL;
+	}
+
+	if (rq[1]->fence.error != -EINTR) {
+		pr_err("%s: second fence not cancelled (%u)\n",
+		       engine->name, rq[1]->fence.error);
+		err = -EINVAL;
+	}
+
+out_waiter:
+	i915_request_put(rq[1]);
+out_spinner:
+	i915_request_put(rq[0]);
+out_ce:
+	intel_context_put(ce);
+out_spin:
+	igt_spinner_fini(&spin);
+	if (err)
+		pr_err("%s: __cancel_active_chain error %d\n",
+		       engine->name, err);
+	return err;
+}
+
+static int __cancel_completed(struct intel_engine_cs *engine)
+{
+	struct intel_context *ce;
+	struct igt_spinner spin;
+	struct i915_request *rq;
+	int err = 0;
+
+	if (igt_spinner_init(&spin, engine->gt))
+		return -ENOMEM;
+
+	ce = intel_context_create(engine);
+	if (IS_ERR(ce)) {
+		err = PTR_ERR(ce);
+		goto out_spin;
+	}
+
+	rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto out_ce;
+	}
+	igt_spinner_end(&spin);
+	i915_request_get(rq);
+	i915_request_add(rq);
+
+	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+		err = -ETIME;
+		goto out_rq;
+	}
+
+	pr_debug("%s: Cancelling completed request\n", engine->name);
+	i915_request_cancel(rq, -EINTR);
+	if (rq->fence.error) {
+		pr_err("%s: fence not cancelled (%u)\n",
+		       engine->name, rq->fence.error);
+		err = -EINVAL;
+	}
+
+out_rq:
+	i915_request_put(rq);
+out_ce:
+	intel_context_put(ce);
+out_spin:
+	igt_spinner_fini(&spin);
+	if (err)
+		pr_err("%s: __cancel_completed error %d\n", engine->name, err);
+	return err;
+}
+
+static int live_cancel_request(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+
+	/*
+	 * Check cancellation of requests. We expect to be able to immediately
+	 * cancel active requests, even if they are currently on the GPU.
+	 */
+
+	for_each_uabi_engine(engine, i915) {
+		struct igt_live_test t;
+		int err, err2;
+
+		if (!intel_engine_has_preemption(engine))
+			continue;
+
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
+		if (err)
+			return err;
+
+		err = __cancel_inactive(engine);
+		if (err == 0)
+			err = __cancel_active(engine);
+		if (err == 0)
+			err = __cancel_active_chain(engine);
+		if (err == 0)
+			err = __cancel_completed(engine);
+
+		err2 = igt_live_test_end(&t);
+		if (err)
+			return err;
+		if (err2)
+			return err2;
+	}
+
+	return 0;
+}
+
 static struct i915_vma *empty_batch(struct drm_i915_private *i915)
 {
 	struct drm_i915_gem_object *obj;
@@ -1485,6 +1759,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_sequential_engines),
 		SUBTEST(live_parallel_engines),
 		SUBTEST(live_empty_request),
+		SUBTEST(live_cancel_request),
 		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
-- 
2.27.0

* [RFC 2/6] drm/i915: Restrict sentinel requests further
From: Tvrtko Ursulin @ 2021-03-12 15:46 UTC
  To: Intel-gfx; +Cc: dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Disallow sentinel requests from following previous sentinels, to make request
cancellation work better when faced with a chain of requests which have all
been marked as in error.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 4c2acb5a6c0a..4b870eca9693 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -896,7 +896,7 @@ static bool can_merge_rq(const struct i915_request *prev,
 	if (__i915_request_is_complete(next))
 		return true;
 
-	if (unlikely((i915_request_flags(prev) ^ i915_request_flags(next)) &
+	if (unlikely((i915_request_flags(prev) | i915_request_flags(next)) &
 		     (BIT(I915_FENCE_FLAG_NOPREEMPT) |
 		      BIT(I915_FENCE_FLAG_SENTINEL))))
 		return false;
-- 
2.27.0

* [RFC 3/6] drm/i915: Request watchdog infrastructure
From: Tvrtko Ursulin @ 2021-03-12 15:46 UTC
  To: Intel-gfx; +Cc: Daniel Vetter, dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Prepare the plumbing for setting a request/fence expiration time. All code is
put in place but is never activated, due to the still missing ability to
actually configure the timer.

Outline of the basic operation:

A timer is started when the request becomes ready for execution. If the
request completes (retires) before the timer fires, the timer is cancelled
and nothing further happens.

If the timer fires, the request is added to a lockless list and a worker is
queued. The purpose of this is twofold: a) it allows request cancellation
from a more friendly context and b) it coalesces multiple expirations into a
single pass over the list.

The worker locklessly consumes the list of expired requests and cancels them
all using the previously added i915_request_cancel().

The associated timeout value is stored in rq->context->watchdog.timeout_us.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |  4 ++
 .../drm/i915/gt/intel_execlists_submission.h  |  2 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |  3 ++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++++++++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  7 +++
 drivers/gpu/drm/i915/i915_request.c           | 52 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_request.h           |  8 +++
 8 files changed, 99 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 0ea18c9e2aca..65a5730a4f5b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -99,6 +99,10 @@ struct intel_context {
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
 
+	struct {
+		u64 timeout_us;
+	} watchdog;
+
 	u32 *lrc_reg_state;
 	union {
 		struct {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index f7bd3fccfee8..4ca9b475e252 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -6,6 +6,7 @@
 #ifndef __INTEL_EXECLISTS_SUBMISSION_H__
 #define __INTEL_EXECLISTS_SUBMISSION_H__
 
+#include <linux/llist.h>
 #include <linux/types.h>
 
 struct drm_printer;
@@ -13,6 +14,7 @@ struct drm_printer;
 struct i915_request;
 struct intel_context;
 struct intel_engine_cs;
+struct intel_gt;
 
 enum {
 	INTEL_CONTEXT_SCHEDULE_IN = 0,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index ca76f93bc03d..8d77dcbad059 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -31,6 +31,9 @@ void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
+	init_llist_head(&gt->watchdog.list);
+	INIT_WORK(&gt->watchdog.work, intel_gt_watchdog_work);
+
 	intel_gt_init_buffer_pool(gt);
 	intel_gt_init_reset(gt);
 	intel_gt_init_requests(gt);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index a17bd8b3195f..7ec395cace69 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -78,4 +78,6 @@ static inline bool intel_gt_is_wedged(const struct intel_gt *gt)
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p);
 
+void intel_gt_watchdog_work(struct work_struct *work);
+
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 36ec97f79174..df12fc7f0aa7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -8,6 +8,7 @@
 #include "i915_drv.h" /* for_each_engine() */
 #include "i915_request.h"
 #include "intel_engine_heartbeat.h"
+#include "intel_execlists_submission.h"
 #include "intel_gt.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
@@ -242,4 +243,24 @@ void intel_gt_fini_requests(struct intel_gt *gt)
 {
 	/* Wait until the work is marked as finished before unloading! */
 	cancel_delayed_work_sync(&gt->requests.retire_work);
+
+	flush_work(&gt->watchdog.work);
+}
+
+void intel_gt_watchdog_work(struct work_struct *work)
+{
+	struct intel_gt *gt =
+		container_of(work, typeof(*gt), watchdog.work);
+	struct i915_request *rq, *rn;
+	struct llist_node *first;
+
+	first = llist_del_all(&gt->watchdog.list);
+	if (!first)
+		return;
+
+	llist_for_each_entry_safe(rq, rn, first, watchdog.link) {
+		if (!i915_request_completed(rq))
+			i915_request_cancel(rq, -EINTR);
+		i915_request_put(rq);
+	}
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 626af37c7790..d70ebcc6f19f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -8,10 +8,12 @@
 
 #include <linux/ktime.h>
 #include <linux/list.h>
+#include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include "uc/intel_uc.h"
 
@@ -62,6 +64,11 @@ struct intel_gt {
 		struct delayed_work retire_work;
 	} requests;
 
+	struct {
+		struct llist_head list;
+		struct work_struct work;
+	} watchdog;
+
 	struct intel_wakeref wakeref;
 	atomic_t user_wakeref;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index fb9c5bb1fe41..e6b9edaca13f 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -277,6 +277,53 @@ static void remove_from_engine(struct i915_request *rq)
 	__notify_execute_cb_imm(rq);
 }
 
+static void __rq_init_watchdog(struct i915_request *rq)
+{
+	rq->watchdog.timer.function = NULL;
+}
+
+static enum hrtimer_restart __rq_watchdog_expired(struct hrtimer *hrtimer)
+{
+	struct i915_request *rq =
+		container_of(hrtimer, struct i915_request, watchdog.timer);
+	struct intel_gt *gt = rq->engine->gt;
+
+	if (!i915_request_completed(rq)) {
+		if (llist_add(&rq->watchdog.link, &gt->watchdog.list))
+			schedule_work(&gt->watchdog.work);
+	} else {
+		i915_request_put(rq);
+	}
+
+	return HRTIMER_NORESTART;
+}
+
+static void __rq_arm_watchdog(struct i915_request *rq)
+{
+	struct i915_request_watchdog *wdg = &rq->watchdog;
+	struct intel_context *ce = rq->context;
+
+	if (!ce->watchdog.timeout_us)
+		return;
+
+	hrtimer_init(&wdg->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	wdg->timer.function = __rq_watchdog_expired;
+	hrtimer_start_range_ns(&wdg->timer,
+			       ns_to_ktime(ce->watchdog.timeout_us *
+					   NSEC_PER_USEC),
+			       NSEC_PER_MSEC, /* FIXME check if it gives the "not sooner" guarantee or slack is both ways */
+			       HRTIMER_MODE_REL);
+	i915_request_get(rq);
+}
+
+static void __rq_cancel_watchdog(struct i915_request *rq)
+{
+	struct i915_request_watchdog *wdg = &rq->watchdog;
+
+	if (wdg->timer.function && hrtimer_try_to_cancel(&wdg->timer) > 0)
+		i915_request_put(rq);
+}
+
 bool i915_request_retire(struct i915_request *rq)
 {
 	if (!__i915_request_is_complete(rq))
@@ -288,6 +335,8 @@ bool i915_request_retire(struct i915_request *rq)
 	trace_i915_request_retire(rq);
 	i915_request_mark_complete(rq);
 
+	__rq_cancel_watchdog(rq);
+
 	/*
 	 * We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
@@ -692,6 +741,8 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 
 		if (unlikely(fence->error))
 			i915_request_set_error_once(request, fence->error);
+		else
+			__rq_arm_watchdog(request);
 
 		/*
 		 * We need to serialize use of the submit_request() callback
@@ -879,6 +930,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 
 	/* No zalloc, everything must be cleared after use */
 	rq->batch = NULL;
+	__rq_init_watchdog(rq);
 	GEM_BUG_ON(rq->capture_list);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 64869a313b3e..294f16e2163d 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -26,7 +26,9 @@
 #define I915_REQUEST_H
 
 #include <linux/dma-fence.h>
+#include <linux/hrtimer.h>
 #include <linux/irq_work.h>
+#include <linux/llist.h>
 #include <linux/lockdep.h>
 
 #include "gem/i915_gem_context_types.h"
@@ -289,6 +291,12 @@ struct i915_request {
 	/** timeline->request entry for this request */
 	struct list_head link;
 
+	/** Watchdog support fields. */
+	struct i915_request_watchdog {
+		struct llist_node link;
+		struct hrtimer timer;
+	} watchdog;
+
 	I915_SELFTEST_DECLARE(struct {
 		struct list_head link;
 		unsigned long delay;
-- 
2.27.0

* [RFC 4/6] drm/i915: Allow userspace to configure the watchdog
  2021-03-12 15:46 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-03-12 15:46   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 27+ messages in thread
From: Tvrtko Ursulin @ 2021-03-12 15:46 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter, dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The idea here is to make the watchdog mechanism more useful than just the
default request/fence expiry.

To this effect a new context param, I915_CONTEXT_PARAM_WATCHDOG, is added,
where the value field allows passing in a timeout in microseconds.

This allows userspace to set a limit on how long they expect their batches
to take; batches exceeding it will be cancelled and userspace notified via
one of the available mechanisms.

The main attraction of adding uapi here is the option to extend the proposal
by passing in a structure instead of a single value, for illustration only:

struct drm_i915_gem_context_watchdog {
	__u64 flags;
 #define I915_CONTEXT_WATCHDOG_WALL_TIME	BIT(0)
 #define I915_CONTEXT_WATCHDOG_GPU_TIME		BIT(1)
 #define I915_CONTEXT_WATCHDOG_FROM_SUBMIT	BIT(2)
 #define I915_CONTEXT_WATCHDOG_FROM_RUNNABLE	BIT(3)
	__u64 timeout_us;
};

The point being to prepare the uapi for different semantics from the start,
given that no single semantic makes complete sense for all use cases, and
perhaps also to satisfy the long wanted media watchdog feature request.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 57 +++++++++++++++++++
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  4 ++
 drivers/gpu/drm/i915/gt/intel_context_param.h | 11 +++-
 include/uapi/drm/i915_drm.h                   |  5 +-
 4 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ca37d93ef5e7..32b05af4fc8f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -233,6 +233,8 @@ static void intel_context_set_gem(struct intel_context *ce,
 	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
 	    intel_engine_has_timeslices(ce->engine))
 		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
+
+	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
 }
 
 static void __free_engines(struct i915_gem_engines *e, unsigned int count)
@@ -1397,6 +1399,28 @@ static int set_ringsize(struct i915_gem_context *ctx,
 				 __intel_context_ring_size(args->value));
 }
 
+static int __apply_watchdog(struct intel_context *ce, void *timeout_us)
+{
+	return intel_context_set_watchdog_us(ce, (uintptr_t)timeout_us);
+}
+
+static int set_watchdog(struct i915_gem_context *ctx,
+			struct drm_i915_gem_context_param *args)
+{
+	int ret;
+
+	if (args->size)
+		return -EINVAL;
+
+	ret = context_apply_all(ctx, __apply_watchdog,
+				(void *)(uintptr_t)args->value);
+
+	if (!ret)
+		ctx->watchdog.timeout_us = args->value;
+
+	return ret;
+}
+
 static int __get_ringsize(struct intel_context *ce, void *arg)
 {
 	long sz;
@@ -1426,6 +1450,17 @@ static int get_ringsize(struct i915_gem_context *ctx,
 	return 0;
 }
 
+static int get_watchdog(struct i915_gem_context *ctx,
+			struct drm_i915_gem_context_param *args)
+{
+	if (args->size)
+		return -EINVAL;
+
+	args->value = ctx->watchdog.timeout_us;
+
+	return 0;
+}
+
 int
 i915_gem_user_to_context_sseu(struct intel_gt *gt,
 			      const struct drm_i915_gem_context_param_sseu *user,
@@ -2075,6 +2110,10 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
 		ret = set_ringsize(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_WATCHDOG:
+		ret = set_watchdog(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
@@ -2196,6 +2235,19 @@ static int clone_schedattr(struct i915_gem_context *dst,
 	return 0;
 }
 
+static int clone_watchdog(struct i915_gem_context *dst,
+			  struct i915_gem_context *src)
+{
+	int ret;
+
+	ret = context_apply_all(dst, __apply_watchdog,
+				(void *)(uintptr_t)src->watchdog.timeout_us);
+	if (!ret)
+		dst->watchdog = src->watchdog;
+
+	return ret;
+}
+
 static int clone_sseu(struct i915_gem_context *dst,
 		      struct i915_gem_context *src)
 {
@@ -2279,6 +2331,7 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
 		MAP(SSEU, clone_sseu),
 		MAP(TIMELINE, clone_timeline),
 		MAP(VM, clone_vm),
+		MAP(WATCHDOG, clone_watchdog),
 #undef MAP
 	};
 	struct drm_i915_gem_context_create_ext_clone local;
@@ -2532,6 +2585,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		ret = get_ringsize(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_WATCHDOG:
+		ret = get_watchdog(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index d5bc75508048..f17da7e26c43 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -150,6 +150,10 @@ struct i915_gem_context {
 	 */
 	atomic_t active_count;
 
+	struct {
+		u64 timeout_us;
+	} watchdog;
+
 	/**
 	 * @hang_timestamp: The last time(s) this context caused a GPU hang
 	 */
diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.h b/drivers/gpu/drm/i915/gt/intel_context_param.h
index f053d8633fe2..3ecacc675f41 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_param.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_param.h
@@ -6,9 +6,18 @@
 #ifndef INTEL_CONTEXT_PARAM_H
 #define INTEL_CONTEXT_PARAM_H
 
-struct intel_context;
+#include <linux/types.h>
+
+#include "intel_context.h"
 
 int intel_context_set_ring_size(struct intel_context *ce, long sz);
 long intel_context_get_ring_size(struct intel_context *ce);
 
+static inline int
+intel_context_set_watchdog_us(struct intel_context *ce, u64 timeout_us)
+{
+	ce->watchdog.timeout_us = timeout_us;
+	return 0;
+}
+
 #endif /* INTEL_CONTEXT_PARAM_H */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 1987e2ea79a3..a4c65780850c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1694,6 +1694,8 @@ struct drm_i915_gem_context_param {
  * Default is 16 KiB.
  */
 #define I915_CONTEXT_PARAM_RINGSIZE	0xc
+
+#define I915_CONTEXT_PARAM_WATCHDOG	0xd
 /* Must be kept compact -- no holes and well documented */
 
 	__u64 value;
@@ -1863,7 +1865,8 @@ struct drm_i915_gem_context_create_ext_clone {
 #define I915_CONTEXT_CLONE_SSEU		(1u << 3)
 #define I915_CONTEXT_CLONE_TIMELINE	(1u << 4)
 #define I915_CONTEXT_CLONE_VM		(1u << 5)
-#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
+#define I915_CONTEXT_CLONE_WATCHDOG	(1u << 6)
+#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_WATCHDOG << 1)
 	__u64 rsvd;
 };
 
-- 
2.27.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 5/6] drm/i915: Fail too long user submissions by default
  2021-03-12 15:46 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-03-12 15:46   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 27+ messages in thread
From: Tvrtko Ursulin @ 2021-03-12 15:46 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter, dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A new Kconfig option CONFIG_DRM_I915_REQUEST_TIMEOUT is added, defaulting
to 10s, and this timeout is applied to _all_ contexts using the previously
added watchdog facility.

The result is that any user submission will simply fail after this time,
either causing a reset (for non-preemptible workloads) or incomplete results.

This can have the effect that workloads which used to work fine suddenly
start failing.

When the default expiry is active userspace will not be allowed to disable
the timeout or increase it beyond the default via the context param setting;
it may only be lowered.
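
As a rough illustration of those semantics (the helper below is an
assumption for illustration, not part of the patch), userspace can query the
inherited default with GETPARAM and only ever lower it:

/*
 * Sketch only - under this patch GETPARAM returns the inherited default,
 * lowering it is allowed, while passing zero (disable) or a value above
 * the default is expected to fail with -EPERM.
 */
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

#ifndef I915_CONTEXT_PARAM_WATCHDOG
#define I915_CONTEXT_PARAM_WATCHDOG 0xd
#endif

static int lower_context_watchdog_us(int fd, uint32_t ctx_id, uint64_t timeout_us)
{
	struct drm_i915_gem_context_param p;

	memset(&p, 0, sizeof(p));
	p.ctx_id = ctx_id;
	p.param = I915_CONTEXT_PARAM_WATCHDOG;

	if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p))
		return -errno;

	/* p.value now holds the current (default) timeout in microseconds. */
	if (p.value && (!timeout_us || timeout_us > p.value))
		return -EPERM;	/* the kernel would reject this anyway */

	p.value = timeout_us;

	return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p) ? -errno : 0;
}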

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/Kconfig.profile        |  8 ++++
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 47 ++++++++++++++++++---
 2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 35bbe2b80596..55e157ffff73 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -1,3 +1,11 @@
+config DRM_I915_REQUEST_TIMEOUT
+	int "Default timeout for requests (ms)"
+	default 10000 # milliseconds
+	help
+	  ...
+
+	  May be 0 to disable the timeout.
+
 config DRM_I915_FENCE_TIMEOUT
 	int "Timeout for unsignaled foreign fences (ms, jiffy granularity)"
 	default 10000 # milliseconds
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 32b05af4fc8f..21c0176e27a0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -854,6 +854,25 @@ static void __assign_timeline(struct i915_gem_context *ctx,
 	context_apply_all(ctx, __apply_timeline, timeline);
 }
 
+static int
+__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us);
+
+static void __set_default_fence_expiry(struct i915_gem_context *ctx)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	int ret;
+
+	if (!IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT))
+		return;
+
+	/* Default expiry for user fences. */
+	ret = __set_watchdog(ctx, CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000);
+	if (ret)
+		drm_notice(&i915->drm,
+			   "Failed to configure default fence expiry! (%d)",
+			   ret);
+}
+
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
 {
@@ -898,6 +917,8 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
 		intel_timeline_put(timeline);
 	}
 
+	__set_default_fence_expiry(ctx);
+
 	trace_i915_context_create(ctx);
 
 	return ctx;
@@ -1404,23 +1425,35 @@ static int __apply_watchdog(struct intel_context *ce, void *timeout_us)
 	return intel_context_set_watchdog_us(ce, (uintptr_t)timeout_us);
 }
 
-static int set_watchdog(struct i915_gem_context *ctx,
-			struct drm_i915_gem_context_param *args)
+static int
+__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us)
 {
 	int ret;
 
-	if (args->size)
-		return -EINVAL;
-
 	ret = context_apply_all(ctx, __apply_watchdog,
-				(void *)(uintptr_t)args->value);
+				(void *)(uintptr_t)timeout_us);
 
 	if (!ret)
-		ctx->watchdog.timeout_us = args->value;
+		ctx->watchdog.timeout_us = timeout_us;
 
 	return ret;
 }
 
+static int set_watchdog(struct i915_gem_context *ctx,
+			struct drm_i915_gem_context_param *args)
+{
+	if (args->size)
+		return -EINVAL;
+
+	/* Disallow disabling or configuring longer watchdog than default. */
+	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
+	    (!args->value ||
+	     args->value > CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000))
+		return -EPERM;
+
+	return __set_watchdog(ctx, args->value);
+}
+
 static int __get_ringsize(struct intel_context *ce, void *arg)
 {
 	long sz;
-- 
2.27.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 6/6] drm/i915: Allow configuring default request expiry via modparam
  2021-03-12 15:46 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-03-12 15:46   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 27+ messages in thread
From: Tvrtko Ursulin @ 2021-03-12 15:46 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter, dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A module parameter (request_timeout_ms) is added to allow configuring the
default request/fence expiry.

The default value is inherited from CONFIG_DRM_I915_REQUEST_TIMEOUT.
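
The effective default can be inspected (and, given the 0600 permissions in
the patch, adjusted) at runtime via the parameter's sysfs node. A minimal
sketch reading it, assuming the parameter name and path from this patch:

/*
 * Sketch only - reads the effective default request expiry from the new
 * i915 module parameter via /sys/module/i915/parameters.
 */
#include <stdio.h>

int main(void)
{
	unsigned int timeout_ms;
	FILE *f = fopen("/sys/module/i915/parameters/request_timeout_ms", "r");

	if (!f)
		return 1;	/* parameter not present */

	if (fscanf(f, "%u", &timeout_ms) == 1)
		printf("Default request expiry: %u ms%s\n",
		       timeout_ms, timeout_ms ? "" : " (disabled)");

	fclose(f);
	return 0;
}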

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 +++++---
 drivers/gpu/drm/i915/i915_params.c          | 5 +++++
 drivers/gpu/drm/i915/i915_params.h          | 1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 21c0176e27a0..1dae5e2514a9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -866,7 +866,7 @@ static void __set_default_fence_expiry(struct i915_gem_context *ctx)
 		return;
 
 	/* Default expiry for user fences. */
-	ret = __set_watchdog(ctx, CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000);
+	ret = __set_watchdog(ctx, i915->params.request_timeout_ms * 1000);
 	if (ret)
 		drm_notice(&i915->drm,
 			   "Failed to configure default fence expiry! (%d)",
@@ -1442,13 +1442,15 @@ __set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us)
 static int set_watchdog(struct i915_gem_context *ctx,
 			struct drm_i915_gem_context_param *args)
 {
+	struct drm_i915_private *i915 = ctx->i915;
+
 	if (args->size)
 		return -EINVAL;
 
 	/* Disallow disabling or configuring longer watchdog than default. */
-	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
+	if (i915->params.request_timeout_ms &&
 	    (!args->value ||
-	     args->value > CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000))
+	     args->value > i915->params.request_timeout_ms * 1000))
 		return -EPERM;
 
 	return __set_watchdog(ctx, args->value);
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 6939634e56ed..0320878d96b0 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -197,6 +197,11 @@ i915_param_named_unsafe(fake_lmem_start, ulong, 0400,
 	"Fake LMEM start offset (default: 0)");
 #endif
 
+#if CONFIG_DRM_I915_REQUEST_TIMEOUT
+i915_param_named_unsafe(request_timeout_ms, uint, 0600,
+			"Default request/fence/batch buffer expiration timeout.");
+#endif
+
 static __always_inline void _print_param(struct drm_printer *p,
 					 const char *name,
 					 const char *type,
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 48f47e44e848..34ebb0662547 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -72,6 +72,7 @@ struct drm_printer;
 	param(int, enable_dpcd_backlight, -1, 0600) \
 	param(char *, force_probe, CONFIG_DRM_I915_FORCE_PROBE, 0400) \
 	param(unsigned long, fake_lmem_start, 0, 0400) \
+	param(unsigned int, request_timeout_ms, CONFIG_DRM_I915_REQUEST_TIMEOUT, 0600) \
 	/* leave bools at the end to not create holes */ \
 	param(bool, enable_hangcheck, true, 0600) \
 	param(bool, load_detect_test, false, 0600) \
-- 
2.27.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Default request/fence expiry + watchdog
  2021-03-12 15:46 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  (?)
@ 2021-03-12 16:22 ` Patchwork
  -1 siblings, 0 replies; 27+ messages in thread
From: Patchwork @ 2021-03-12 16:22 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Default request/fence expiry + watchdog
URL   : https://patchwork.freedesktop.org/series/87930/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
865f5ed6eec8 drm/i915: Individual request cancellation
-:256: WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using '__cancel_inactive', this function's name, in a string
#256: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:660:
+		pr_err("%s: __cancel_inactive error %d\n", engine->name, err);

-:317: WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using '__cancel_active', this function's name, in a string
#317: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:721:
+		pr_err("%s: __cancel_active error %d\n", engine->name, err);

-:388: WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using '__cancel_active_chain', this function's name, in a string
#388: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:792:
+		pr_err("%s: __cancel_active_chain error %d\n",

-:438: WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using '__cancel_completed', this function's name, in a string
#438: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:842:
+		pr_err("%s: __cancel_completed error %d\n", engine->name, err);

total: 0 errors, 4 warnings, 0 checks, 444 lines checked
5b101fa80277 drm/i915: Restrict sentinel requests further
c4f617682a6d drm/i915: Request watchdog infrastructure
-:197: WARNING:LONG_LINE_COMMENT: line length of 124 exceeds 100 columns
#197: FILE: drivers/gpu/drm/i915/i915_request.c:314:
+			       NSEC_PER_MSEC, /* FIXME check if it gives the "not sooner" guarantee or slack is both ways */

total: 0 errors, 1 warnings, 0 checks, 190 lines checked
9e458012bb44 drm/i915: Allow userspace to configure the watchdog
5d0b88dfca39 drm/i915: Fail too long user submissions by default
7f4fb6d644b5 drm/i915: Allow configuring default request expiry via modparam


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for Default request/fence expiry + watchdog
  2021-03-12 15:46 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  (?)
@ 2021-03-12 16:48 ` Patchwork
  -1 siblings, 0 replies; 27+ messages in thread
From: Patchwork @ 2021-03-12 16:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx
== Series Details ==

Series: Default request/fence expiry + watchdog
URL   : https://patchwork.freedesktop.org/series/87930/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_9854 -> Patchwork_19789
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/index.html

Known issues
------------

  Here are the changes found in Patchwork_19789 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_gttfill@basic:
    - fi-kbl-8809g:       [PASS][1] -> [TIMEOUT][2] ([i915#3145])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/fi-kbl-8809g/igt@gem_exec_gttfill@basic.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/fi-kbl-8809g/igt@gem_exec_gttfill@basic.html

  * igt@runner@aborted:
    - fi-bdw-5557u:       NOTRUN -> [FAIL][3] ([i915#1602] / [i915#2029] / [i915#2369])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/fi-bdw-5557u/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s0:
    - fi-tgl-u2:          [FAIL][4] ([i915#1888]) -> [PASS][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/fi-tgl-u2/igt@gem_exec_suspend@basic-s0.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/fi-tgl-u2/igt@gem_exec_suspend@basic-s0.html

  * igt@gem_tiled_blits@basic:
    - fi-kbl-8809g:       [TIMEOUT][6] ([i915#2502] / [i915#3145]) -> [PASS][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/fi-kbl-8809g/igt@gem_tiled_blits@basic.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/fi-kbl-8809g/igt@gem_tiled_blits@basic.html

  
  [i915#1602]: https://gitlab.freedesktop.org/drm/intel/issues/1602
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#2029]: https://gitlab.freedesktop.org/drm/intel/issues/2029
  [i915#2369]: https://gitlab.freedesktop.org/drm/intel/issues/2369
  [i915#2502]: https://gitlab.freedesktop.org/drm/intel/issues/2502
  [i915#3145]: https://gitlab.freedesktop.org/drm/intel/issues/3145


Participating hosts (46 -> 41)
------------------------------

  Missing    (5): fi-ilk-m540 fi-hsw-4200u fi-bsw-cyan fi-ctg-p8600 fi-bdw-samus 


Build changes
-------------

  * IGT: IGT_6031 -> TrybotIGT_303
  * Linux: CI_DRM_9854 -> Patchwork_19789

  CI-20190529: 20190529
  CI_DRM_9854: 4483074e9d0683cba71600dec27241fffef7b2d6 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6031: 6ab78f9da7621b62c162929013772b3c6ac87dbd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_19789: 7f4fb6d644b5b89887d99fda13cc3a42fa38516a @ git://anongit.freedesktop.org/gfx-ci/linux
  TrybotIGT_303: https://intel-gfx-ci.01.org/tree/drm-tip/TrybotIGT_303/index.html


== Linux commits ==

7f4fb6d644b5 drm/i915: Allow configuring default request expiry via modparam
5d0b88dfca39 drm/i915: Fail too long user submissions by default
9e458012bb44 drm/i915: Allow userspace to configure the watchdog
c4f617682a6d drm/i915: Request watchdog infrastructure
5b101fa80277 drm/i915: Restrict sentinel requests further
865f5ed6eec8 drm/i915: Individual request cancellation

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for Default request/fence expiry + watchdog
  2021-03-12 15:46 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  (?)
@ 2021-03-12 18:25 ` Patchwork
  -1 siblings, 0 replies; 27+ messages in thread
From: Patchwork @ 2021-03-12 18:25 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx
== Series Details ==

Series: Default request/fence expiry + watchdog
URL   : https://patchwork.freedesktop.org/series/87930/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_9854_full -> Patchwork_19789_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_19789_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_19789_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_19789_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_ctx_ringsize@active@bcs0:
    - shard-skl:          [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-skl9/igt@gem_ctx_ringsize@active@bcs0.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl4/igt@gem_ctx_ringsize@active@bcs0.html

  * igt@gem_ctx_ringsize@idle@bcs0:
    - shard-skl:          NOTRUN -> [INCOMPLETE][3]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl5/igt@gem_ctx_ringsize@idle@bcs0.html

  * igt@gem_exec_balancer@bonded-true-hang:
    - shard-tglb:         [PASS][4] -> [INCOMPLETE][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-tglb8/igt@gem_exec_balancer@bonded-true-hang.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb7/igt@gem_exec_balancer@bonded-true-hang.html

  * igt@gem_userptr_blits@vma-merge:
    - shard-snb:          NOTRUN -> [FAIL][6] +2 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-snb7/igt@gem_userptr_blits@vma-merge.html

  * igt@i915_hangman@engine-hang@rcs0:
    - shard-tglb:         NOTRUN -> [FAIL][7]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb7/igt@i915_hangman@engine-hang@rcs0.html

  * igt@i915_hangman@error-state-capture@bcs0:
    - shard-snb:          [PASS][8] -> [FAIL][9] +2 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-snb2/igt@i915_hangman@error-state-capture@bcs0.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-snb6/igt@i915_hangman@error-state-capture@bcs0.html

  * igt@kms_universal_plane@universal-plane-pipe-b-functional:
    - shard-glk:          [PASS][10] -> [FAIL][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-glk1/igt@kms_universal_plane@universal-plane-pipe-b-functional.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-glk9/igt@kms_universal_plane@universal-plane-pipe-b-functional.html
    - shard-apl:          NOTRUN -> [FAIL][12]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl1/igt@kms_universal_plane@universal-plane-pipe-b-functional.html
    - shard-kbl:          [PASS][13] -> [FAIL][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-kbl3/igt@kms_universal_plane@universal-plane-pipe-b-functional.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl7/igt@kms_universal_plane@universal-plane-pipe-b-functional.html
    - shard-skl:          [PASS][15] -> [FAIL][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-skl4/igt@kms_universal_plane@universal-plane-pipe-b-functional.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl4/igt@kms_universal_plane@universal-plane-pipe-b-functional.html

  
#### Warnings ####

  * igt@gem_exec_reloc@basic-wide-active@rcs0:
    - shard-tglb:         [FAIL][17] ([i915#2389]) -> [FAIL][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-tglb3/igt@gem_exec_reloc@basic-wide-active@rcs0.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb2/igt@gem_exec_reloc@basic-wide-active@rcs0.html

  * igt@gem_exec_reloc@basic-wide-active@vcs0:
    - shard-snb:          [FAIL][19] ([i915#2389]) -> [FAIL][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-snb5/igt@gem_exec_reloc@basic-wide-active@vcs0.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-snb5/igt@gem_exec_reloc@basic-wide-active@vcs0.html

  * igt@gem_userptr_blits@vma-merge:
    - shard-apl:          [INCOMPLETE][21] ([i915#2502] / [i915#2667]) -> [FAIL][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-apl7/igt@gem_userptr_blits@vma-merge.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl2/igt@gem_userptr_blits@vma-merge.html
    - shard-iclb:         [INCOMPLETE][23] ([i915#2502] / [i915#2667]) -> [FAIL][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-iclb4/igt@gem_userptr_blits@vma-merge.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb2/igt@gem_userptr_blits@vma-merge.html
    - shard-glk:          [INCOMPLETE][25] ([i915#2502] / [i915#2667]) -> [FAIL][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-glk9/igt@gem_userptr_blits@vma-merge.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-glk6/igt@gem_userptr_blits@vma-merge.html
    - shard-kbl:          [INCOMPLETE][27] ([i915#2502] / [i915#2667]) -> [FAIL][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-kbl6/igt@gem_userptr_blits@vma-merge.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl4/igt@gem_userptr_blits@vma-merge.html
    - shard-tglb:         [INCOMPLETE][29] ([i915#2502] / [i915#2667]) -> [FAIL][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-tglb2/igt@gem_userptr_blits@vma-merge.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb5/igt@gem_userptr_blits@vma-merge.html
    - shard-skl:          [INCOMPLETE][31] ([i915#2502] / [i915#2667]) -> [FAIL][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-skl6/igt@gem_userptr_blits@vma-merge.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl7/igt@gem_userptr_blits@vma-merge.html

  
New tests
---------

  New tests have been introduced between CI_DRM_9854_full and Patchwork_19789_full:

### New IGT tests (4) ###

  * igt@gem_watchdog@default-physical:
    - Statuses : 5 pass(s) 1 skip(s)
    - Exec time: [0.00, 1.26] s

  * igt@gem_watchdog@default-virtual:
    - Statuses : 1 fail(s) 5 pass(s)
    - Exec time: [0.10, 1.26] s

  * igt@gem_watchdog@watchdog-physical:
    - Statuses : 6 pass(s) 1 skip(s)
    - Exec time: [0.00, 0.84] s

  * igt@gem_watchdog@watchdog-virtual:
    - Statuses : 1 fail(s) 6 pass(s)
    - Exec time: [0.09, 0.79] s

  

Known issues
------------

  Here are the changes found in Patchwork_19789_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@chamelium:
    - shard-tglb:         NOTRUN -> [SKIP][33] ([fdo#111827])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb6/igt@feature_discovery@chamelium.html
    - shard-iclb:         NOTRUN -> [SKIP][34] ([fdo#111827])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb3/igt@feature_discovery@chamelium.html

  * igt@feature_discovery@display-4x:
    - shard-tglb:         NOTRUN -> [SKIP][35] ([i915#1839])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb7/igt@feature_discovery@display-4x.html

  * igt@gem_create@create-clear:
    - shard-glk:          [PASS][36] -> [FAIL][37] ([i915#1888] / [i915#3160])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-glk6/igt@gem_create@create-clear.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-glk8/igt@gem_create@create-clear.html
    - shard-skl:          [PASS][38] -> [FAIL][39] ([i915#3160])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-skl9/igt@gem_create@create-clear.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl1/igt@gem_create@create-clear.html

  * igt@gem_create@create-massive:
    - shard-iclb:         NOTRUN -> [DMESG-WARN][40] ([i915#3002])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb6/igt@gem_create@create-massive.html
    - shard-snb:          NOTRUN -> [DMESG-WARN][41] ([i915#3002])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-snb2/igt@gem_create@create-massive.html

  * igt@gem_ctx_persistence@legacy-engines-persistence:
    - shard-snb:          NOTRUN -> [SKIP][42] ([fdo#109271] / [i915#1099]) +1 similar issue
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-snb2/igt@gem_ctx_persistence@legacy-engines-persistence.html

  * igt@gem_eio@unwedge-stress:
    - shard-tglb:         [PASS][43] -> [TIMEOUT][44] ([i915#2369] / [i915#3063])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-tglb5/igt@gem_eio@unwedge-stress.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb1/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_balancer@hang:
    - shard-iclb:         [PASS][45] -> [INCOMPLETE][46] ([i915#1895] / [i915#3031])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-iclb6/igt@gem_exec_balancer@hang.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb4/igt@gem_exec_balancer@hang.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-kbl:          [PASS][47] -> [FAIL][48] ([i915#2846])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-kbl2/igt@gem_exec_fair@basic-deadline.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl1/igt@gem_exec_fair@basic-deadline.html
    - shard-skl:          NOTRUN -> [FAIL][49] ([i915#2846])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl2/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none@vcs1:
    - shard-kbl:          [PASS][50] -> [FAIL][51] ([i915#2842])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-kbl3/igt@gem_exec_fair@basic-none@vcs1.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl2/igt@gem_exec_fair@basic-none@vcs1.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-tglb:         [PASS][52] -> [FAIL][53] ([i915#2842])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-tglb5/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb6/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][54] ([i915#2842])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb4/igt@gem_exec_fair@basic-pace@vcs1.html

  * igt@gem_exec_flush@basic-batch-kernel-default-cmd:
    - shard-tglb:         NOTRUN -> [SKIP][55] ([fdo#109313])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb1/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html

  * igt@gem_exec_reloc@basic-many-active@rcs0:
    - shard-tglb:         NOTRUN -> [FAIL][56] ([i915#2389]) +4 similar issues
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb7/igt@gem_exec_reloc@basic-many-active@rcs0.html

  * igt@gem_exec_reloc@basic-wide-active@bcs0:
    - shard-apl:          NOTRUN -> [FAIL][57] ([i915#2389]) +3 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl6/igt@gem_exec_reloc@basic-wide-active@bcs0.html

  * igt@gem_exec_schedule@u-fairslice@rcs0:
    - shard-skl:          [PASS][58] -> [DMESG-WARN][59] ([i915#1610] / [i915#2803])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-skl4/igt@gem_exec_schedule@u-fairslice@rcs0.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl5/igt@gem_exec_schedule@u-fairslice@rcs0.html

  * igt@gem_exec_suspend@basic-s3:
    - shard-kbl:          [PASS][60] -> [INCOMPLETE][61] ([i915#155])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-kbl3/igt@gem_exec_suspend@basic-s3.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl4/igt@gem_exec_suspend@basic-s3.html
    - shard-skl:          [PASS][62] -> [INCOMPLETE][63] ([i915#198])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-skl4/igt@gem_exec_suspend@basic-s3.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl3/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_exec_whisper@basic-queues-all:
    - shard-glk:          [PASS][64] -> [DMESG-WARN][65] ([i915#118] / [i915#95]) +2 similar issues
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-glk2/igt@gem_exec_whisper@basic-queues-all.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-glk4/igt@gem_exec_whisper@basic-queues-all.html

  * igt@gem_render_copy@yf-tiled-to-vebox-x-tiled:
    - shard-iclb:         NOTRUN -> [SKIP][66] ([i915#768]) +1 similar issue
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb5/igt@gem_render_copy@yf-tiled-to-vebox-x-tiled.html

  * igt@gem_softpin@evict-snoop:
    - shard-iclb:         NOTRUN -> [SKIP][67] ([fdo#109312])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb1/igt@gem_softpin@evict-snoop.html
    - shard-tglb:         NOTRUN -> [SKIP][68] ([fdo#109312])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb1/igt@gem_softpin@evict-snoop.html

  * igt@gem_userptr_blits@input-checking:
    - shard-apl:          NOTRUN -> [DMESG-WARN][69] ([i915#3002])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl6/igt@gem_userptr_blits@input-checking.html

  * igt@gem_workarounds@suspend-resume:
    - shard-skl:          NOTRUN -> [INCOMPLETE][70] ([i915#198])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl9/igt@gem_workarounds@suspend-resume.html

  * igt@gen3_render_tiledy_blits:
    - shard-tglb:         NOTRUN -> [SKIP][71] ([fdo#109289]) +1 similar issue
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb1/igt@gen3_render_tiledy_blits.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-tglb:         NOTRUN -> [SKIP][72] ([fdo#112306]) +2 similar issues
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb6/igt@gen9_exec_parse@allowed-single.html
    - shard-skl:          NOTRUN -> [DMESG-WARN][73] ([i915#1436] / [i915#716])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl5/igt@gen9_exec_parse@allowed-single.html

  * igt@gen9_exec_parse@batch-invalid-length:
    - shard-snb:          NOTRUN -> [SKIP][74] ([fdo#109271]) +347 similar issues
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-snb2/igt@gen9_exec_parse@batch-invalid-length.html

  * igt@gen9_exec_parse@bb-chained:
    - shard-iclb:         NOTRUN -> [SKIP][75] ([fdo#112306]) +1 similar issue
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb2/igt@gen9_exec_parse@bb-chained.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-snb:          [PASS][76] -> [INCOMPLETE][77] ([i915#2880])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-snb2/igt@i915_module_load@reload-with-fault-injection.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-snb7/igt@i915_module_load@reload-with-fault-injection.html

  * igt@i915_pm_rpm@modeset-non-lpsp:
    - shard-iclb:         NOTRUN -> [SKIP][78] ([fdo#110892])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb5/igt@i915_pm_rpm@modeset-non-lpsp.html
    - shard-tglb:         NOTRUN -> [SKIP][79] ([fdo#111644] / [i915#1397] / [i915#2411])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb3/igt@i915_pm_rpm@modeset-non-lpsp.html

  * igt@i915_pm_rpm@pc8-residency:
    - shard-tglb:         NOTRUN -> [SKIP][80] ([fdo#109506] / [i915#2411])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb2/igt@i915_pm_rpm@pc8-residency.html

  * igt@kms_atomic@plane-primary-overlay-mutable-zpos:
    - shard-iclb:         NOTRUN -> [SKIP][81] ([i915#404])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb3/igt@kms_atomic@plane-primary-overlay-mutable-zpos.html

  * igt@kms_atomic_transition@plane-all-modeset-transition-fencing:
    - shard-iclb:         NOTRUN -> [SKIP][82] ([i915#1769])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb3/igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html
    - shard-tglb:         NOTRUN -> [SKIP][83] ([i915#1769])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb6/igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html

  * igt@kms_big_fb@linear-64bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][84] ([fdo#110725] / [fdo#111614])
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb1/igt@kms_big_fb@linear-64bpp-rotate-270.html

  * igt@kms_big_fb@y-tiled-64bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][85] ([fdo#111614]) +2 similar issues
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb8/igt@kms_big_fb@y-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@yf-tiled-64bpp-rotate-180:
    - shard-tglb:         NOTRUN -> [SKIP][86] ([fdo#111615]) +2 similar issues
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb8/igt@kms_big_fb@yf-tiled-64bpp-rotate-180.html

  * igt@kms_big_fb@yf-tiled-64bpp-rotate-90:
    - shard-iclb:         NOTRUN -> [SKIP][87] ([fdo#110723]) +1 similar issue
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb5/igt@kms_big_fb@yf-tiled-64bpp-rotate-90.html

  * igt@kms_big_joiner@basic:
    - shard-tglb:         NOTRUN -> [SKIP][88] ([i915#2705]) +1 similar issue
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb5/igt@kms_big_joiner@basic.html
    - shard-apl:          NOTRUN -> [SKIP][89] ([fdo#109271] / [i915#2705])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl2/igt@kms_big_joiner@basic.html

  * igt@kms_big_joiner@invalid-modeset:
    - shard-skl:          NOTRUN -> [SKIP][90] ([fdo#109271] / [i915#2705]) +1 similar issue
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl1/igt@kms_big_joiner@invalid-modeset.html
    - shard-iclb:         NOTRUN -> [SKIP][91] ([i915#2705])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb7/igt@kms_big_joiner@invalid-modeset.html
    - shard-kbl:          NOTRUN -> [SKIP][92] ([fdo#109271] / [i915#2705])
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl2/igt@kms_big_joiner@invalid-modeset.html
    - shard-glk:          NOTRUN -> [SKIP][93] ([fdo#109271] / [i915#2705])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-glk1/igt@kms_big_joiner@invalid-modeset.html

  * igt@kms_ccs@pipe-c-ccs-on-another-bo:
    - shard-skl:          NOTRUN -> [SKIP][94] ([fdo#109271] / [fdo#111304])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl10/igt@kms_ccs@pipe-c-ccs-on-another-bo.html

  * igt@kms_chamelium@hdmi-aspect-ratio:
    - shard-skl:          NOTRUN -> [SKIP][95] ([fdo#109271] / [fdo#111827]) +12 similar issues
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl5/igt@kms_chamelium@hdmi-aspect-ratio.html

  * igt@kms_chamelium@hdmi-hpd-storm:
    - shard-kbl:          NOTRUN -> [SKIP][96] ([fdo#109271] / [fdo#111827]) +11 similar issues
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl4/igt@kms_chamelium@hdmi-hpd-storm.html

  * igt@kms_color@pipe-a-ctm-0-25:
    - shard-iclb:         NOTRUN -> [FAIL][97] ([i915#1149] / [i915#315])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb6/igt@kms_color@pipe-a-ctm-0-25.html
    - shard-tglb:         NOTRUN -> [FAIL][98] ([i915#1149] / [i915#315])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb6/igt@kms_color@pipe-a-ctm-0-25.html

  * igt@kms_color@pipe-a-degamma:
    - shard-iclb:         NOTRUN -> [FAIL][99] ([i915#1149]) +1 similar issue
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb6/igt@kms_color@pipe-a-degamma.html

  * igt@kms_color@pipe-b-degamma:
    - shard-tglb:         NOTRUN -> [FAIL][100] ([i915#1149]) +1 similar issue
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb2/igt@kms_color@pipe-b-degamma.html

  * igt@kms_color_chamelium@pipe-a-ctm-limited-range:
    - shard-apl:          NOTRUN -> [SKIP][101] ([fdo#109271] / [fdo#111827]) +21 similar issues
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl3/igt@kms_color_chamelium@pipe-a-ctm-limited-range.html

  * igt@kms_color_chamelium@pipe-a-ctm-red-to-blue:
    - shard-iclb:         NOTRUN -> [SKIP][102] ([fdo#109284] / [fdo#111827]) +8 similar issues
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb7/igt@kms_color_chamelium@pipe-a-ctm-red-to-blue.html

  * igt@kms_color_chamelium@pipe-invalid-ctm-matrix-sizes:
    - shard-snb:          NOTRUN -> [SKIP][103] ([fdo#109271] / [fdo#111827]) +21 similar issues
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-snb2/igt@kms_color_chamelium@pipe-invalid-ctm-matrix-sizes.html

  * igt@kms_color_chamelium@pipe-invalid-degamma-lut-sizes:
    - shard-tglb:         NOTRUN -> [SKIP][104] ([fdo#109284] / [fdo#111827]) +13 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb1/igt@kms_color_chamelium@pipe-invalid-degamma-lut-sizes.html
    - shard-glk:          NOTRUN -> [SKIP][105] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-glk3/igt@kms_color_chamelium@pipe-invalid-degamma-lut-sizes.html

  * igt@kms_content_protection@atomic:
    - shard-apl:          NOTRUN -> [TIMEOUT][106] ([i915#1319])
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl3/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@content_type_change:
    - shard-iclb:         NOTRUN -> [SKIP][107] ([fdo#109300] / [fdo#111066])
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb8/igt@kms_content_protection@content_type_change.html
    - shard-tglb:         NOTRUN -> [SKIP][108] ([fdo#111828])
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb8/igt@kms_content_protection@content_type_change.html

  * igt@kms_cursor_crc@pipe-a-cursor-256x256-random:
    - shard-skl:          [PASS][109] -> [FAIL][110] ([i915#54]) +1 similar issue
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-skl10/igt@kms_cursor_crc@pipe-a-cursor-256x256-random.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl5/igt@kms_cursor_crc@pipe-a-cursor-256x256-random.html

  * igt@kms_cursor_crc@pipe-c-cursor-512x170-sliding:
    - shard-iclb:         NOTRUN -> [SKIP][111] ([fdo#109278] / [fdo#109279])
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb2/igt@kms_cursor_crc@pipe-c-cursor-512x170-sliding.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x512-sliding:
    - shard-tglb:         NOTRUN -> [SKIP][112] ([fdo#109279]) +2 similar issues
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb7/igt@kms_cursor_crc@pipe-d-cursor-512x512-sliding.html

  * igt@kms_cursor_edge_walk@pipe-d-64x64-left-edge:
    - shard-kbl:          NOTRUN -> [SKIP][113] ([fdo#109271]) +84 similar issues
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl1/igt@kms_cursor_edge_walk@pipe-d-64x64-left-edge.html
    - shard-iclb:         NOTRUN -> [SKIP][114] ([fdo#109278]) +8 similar issues
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb7/igt@kms_cursor_edge_walk@pipe-d-64x64-left-edge.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-legacy:
    - shard-iclb:         NOTRUN -> [SKIP][115] ([fdo#109274] / [fdo#109278]) +2 similar issues
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb3/igt@kms_cursor_legacy@cursorb-vs-flipb-legacy.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-skl:          NOTRUN -> [FAIL][116] ([i915#2346] / [i915#533])
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl10/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic:
    - shard-apl:          NOTRUN -> [DMESG-FAIL][117] ([IGT#6])
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl6/igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic.html

  * igt@kms_cursor_legacy@flip-vs-cursor-toggle:
    - shard-tglb:         NOTRUN -> [FAIL][118] ([i915#2346])
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb8/igt@kms_cursor_legacy@flip-vs-cursor-toggle.html

  * igt@kms_cursor_legacy@pipe-d-single-bo:
    - shard-kbl:          NOTRUN -> [SKIP][119] ([fdo#109271] / [i915#533])
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl3/igt@kms_cursor_legacy@pipe-d-single-bo.html

  * igt@kms_flip@2x-blocking-absolute-wf_vblank-interruptible:
    - shard-tglb:         NOTRUN -> [SKIP][120] ([fdo#111825]) +33 similar issues
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb6/igt@kms_flip@2x-blocking-absolute-wf_vblank-interruptible.html

  * igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible@a-edp1:
    - shard-tglb:         NOTRUN -> [FAIL][121] ([i915#2122])
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb8/igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible@a-edp1.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-hdmi-a2:
    - shard-glk:          [PASS][122] -> [FAIL][123] ([i915#2122])
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-glk3/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-hdmi-a2.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-glk8/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-hdmi-a2.html

  * igt@kms_flip@flip-vs-suspend@c-dp1:
    - shard-kbl:          [PASS][124] -> [DMESG-WARN][125] ([i915#180]) +2 similar issues
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-kbl3/igt@kms_flip@flip-vs-suspend@c-dp1.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl2/igt@kms_flip@flip-vs-suspend@c-dp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile:
    - shard-glk:          NOTRUN -> [SKIP][126] ([fdo#109271] / [i915#2642])
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-glk1/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile.html
    - shard-kbl:          NOTRUN -> [SKIP][127] ([fdo#109271] / [i915#2642])
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl4/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile:
    - shard-tglb:         NOTRUN -> [SKIP][128] ([i915#2587])
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb5/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile.html

  * igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt:
    - shard-skl:          NOTRUN -> [SKIP][129] ([fdo#109271]) +136 similar issues
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl6/igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-shrfb-draw-mmap-cpu:
    - shard-iclb:         NOTRUN -> [SKIP][130] ([fdo#109280]) +14 similar issues
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-iclb6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-shrfb-draw-mmap-cpu.html

  * igt@kms_hdr@static-toggle:
    - shard-tglb:         NOTRUN -> [SKIP][131] ([i915#1187])
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb6/igt@kms_hdr@static-toggle.html

  * igt@kms_lease@simple_lease:
    - shard-skl:          [PASS][132] -> [SKIP][133] ([fdo#109271]) +3 similar issues
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-skl2/igt@kms_lease@simple_lease.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-skl6/igt@kms_lease@simple_lease.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-d:
    - shard-apl:          NOTRUN -> [SKIP][134] ([fdo#109271] / [i915#533])
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-apl3/igt@kms_pipe_crc_basic@read-crc-pipe-d.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-d:
    - shard-tglb:         [PASS][135] -> [INCOMPLETE][136] ([i915#1436] / [i915#1982])
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-tglb1/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-d.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-tglb2/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-d.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes:
    - shard-kbl:          [PASS][137] -> [DMESG-WARN][138] ([i915#180] / [i915#533])
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9854/shard-kbl6/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/shard-kbl2/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb:
    -

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_19789/index.html

[-- Attachment #1.2: Type: text/html, Size: 33728 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Intel-gfx] [RFC 1/6] drm/i915: Individual request cancellation
  2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-03-15 17:37     ` Tvrtko Ursulin
  0 siblings, 0 replies; 27+ messages in thread
From: Tvrtko Ursulin @ 2021-03-15 17:37 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel


On 12/03/2021 15:46, Tvrtko Ursulin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Currently, we cancel outstanding requests within a context when the
> context is closed. We may also want to cancel individual requests using
> the same graceful preemption mechanism.
> 
> v2 (Tvrtko):
>   * Cancel waiters carefully considering no timeline lock and RCU.
>   * Fixed selftests.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

[snip]

> +void i915_request_cancel(struct i915_request *rq, int error)
> +{
> +	if (!i915_request_set_error_once(rq, error))
> +		return;
> +
> +	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> +
> +	if (i915_sw_fence_signaled(&rq->submit)) {
> +		struct i915_dependency *p;
> +
> +restart:
> +		rcu_read_lock();
> +		for_each_waiter(p, rq) {
> +			struct i915_request *w =
> +				container_of(p->waiter, typeof(*w), sched);
> +
> +			if (__i915_request_is_complete(w) ||
> +			    fatal_error(w->fence.error))
> +				continue;
> +
> +			w = i915_request_get(w);
> +			rcu_read_unlock();
> +			/* Recursion bound by the number of engines */
> +			i915_request_cancel(w, error);
> +			i915_request_put(w);
> +
> +			/* Restart after having to drop rcu lock. */
> +			goto restart;
> +		}

So I need to fix this error propagation to waiters in order to avoid 
potential stack overflow caught in shards (gem_ctx_ringsize).
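
Something along these lines is what I have in mind - untested sketch only,
flattening the recursion into a worklist. The "cancel_link" list_head is
made up purely for illustration, and locking against concurrent retirement
is hand-waved:

void i915_request_cancel(struct i915_request *rq, int error)
{
	LIST_HEAD(worklist);

	if (!i915_request_set_error_once(rq, error))
		return;

	list_add_tail(&i915_request_get(rq)->cancel_link, &worklist);

	while (!list_empty(&worklist)) {
		struct i915_dependency *p;

		rq = list_first_entry(&worklist, typeof(*rq), cancel_link);
		list_del(&rq->cancel_link);

		set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);

		if (i915_sw_fence_signaled(&rq->submit)) {
			rcu_read_lock();
			for_each_waiter(p, rq) {
				struct i915_request *w =
					container_of(p->waiter, typeof(*w),
						     sched);

				if (__i915_request_is_complete(w) ||
				    fatal_error(w->fence.error))
					continue;

				/* Marks the error; queues each waiter once. */
				if (!i915_request_set_error_once(w, error))
					continue;

				/* Queue instead of recursing. */
				list_add_tail(&i915_request_get(w)->cancel_link,
					      &worklist);
			}
			rcu_read_unlock();
		}

		__cancel_request(rq);
		i915_request_put(rq);
	}
}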

Or alternatively we decide not to propagate fence errors at all. Not sure
the consequences either way are particularly better or worse. Things will
break anyway, because which userspace actually checks for unexpected fence
errors?!

So rendering corruption, more or less. Whether it can cause a further
stream of GPU hangs I am not sure - only if there is an inter-engine data
dependency involving data more complex than images/textures.

Regards,

Tvrtko

> +		rcu_read_unlock();
> +	}
> +
> +	__cancel_request(rq);
> +}
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Intel-gfx] [RFC 1/6] drm/i915: Individual request cancellation
  2021-03-15 17:37     ` Tvrtko Ursulin
@ 2021-03-16 10:02       ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2021-03-16 10:02 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, dri-devel

On Mon, Mar 15, 2021 at 05:37:27PM +0000, Tvrtko Ursulin wrote:
> 
> On 12/03/2021 15:46, Tvrtko Ursulin wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > Currently, we cancel outstanding requests within a context when the
> > context is closed. We may also want to cancel individual requests using
> > the same graceful preemption mechanism.
> > 
> > v2 (Tvrtko):
> >   * Cancel waiters carefully considering no timeline lock and RCU.
> >   * Fixed selftests.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> [snip]
> 
> > +void i915_request_cancel(struct i915_request *rq, int error)
> > +{
> > +	if (!i915_request_set_error_once(rq, error))
> > +		return;
> > +
> > +	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> > +
> > +	if (i915_sw_fence_signaled(&rq->submit)) {
> > +		struct i915_dependency *p;
> > +
> > +restart:
> > +		rcu_read_lock();
> > +		for_each_waiter(p, rq) {
> > +			struct i915_request *w =
> > +				container_of(p->waiter, typeof(*w), sched);
> > +
> > +			if (__i915_request_is_complete(w) ||
> > +			    fatal_error(w->fence.error))
> > +				continue;
> > +
> > +			w = i915_request_get(w);
> > +			rcu_read_unlock();
> > +			/* Recursion bound by the number of engines */
> > +			i915_request_cancel(w, error);
> > +			i915_request_put(w);
> > +
> > +			/* Restart after having to drop rcu lock. */
> > +			goto restart;
> > +		}
> 
> So I need to fix this error propagation to waiters in order to avoid
> potential stack overflow caught in shards (gem_ctx_ringsize).
> 
> Or alternatively we decide not to propagate fence errors at all. Not sure
> the consequences either way are particularly better or worse. Things will
> break anyway, because which userspace actually checks for unexpected fence
> errors?!

Fence error propagation is one of these "sounds like a good idea" things
that turned into a can of worms. See the recent revert Jason submitted; I
replied there with a more in-depth discussion.

So I'd say if we don't need this internally somehow for scheduler state,
remove it. Maybe even the entire scaffolding we have for the forwarding.

Maybe best if you sync with Jason here; we need to stuff Jason's patch
into -fixes since there's a pretty bad regression going on. I think Jason
also said there's a pile of igts to remove once we give up on fence error
propagation.

> So rendering corruption, more or less. Whether it can cause a further
> stream of GPU hangs I am not sure - only if there is an inter-engine data
> dependency involving data more complex than images/textures.

Yup. Also at least on modern-ish hw our userspace goes with
non-recoverable contexts anyway, because everything needs to be
reconstructed. vk is even more brutal: it just hands you back a
vk_device_lost and everything is gone (textures, data, all api objects,
really everything afaiui). Trying to continue is something only old
userspace is doing, because they fully emit the entire ctx state at the
start of each batch anyway.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 6/6] drm/i915: Allow configuring default request expiry via modparam
  2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-03-16 10:03     ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2021-03-16 10:03 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx, dri-devel, Tvrtko Ursulin

On Fri, Mar 12, 2021 at 03:46:22PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Module parameter is added (request_timeout_ms) to allow configuring the
> default request/fence expiry.
> 
> Default value is inherited from CONFIG_DRM_I915_REQUEST_TIMEOUT.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Yeah I think this makes sense for debugging and testing (e.g. in igt we
can crank down the timeout to make stuff fail real fast, which could help
with runtime on some tests).
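
Something as simple as this from the test side should do - illustrative
only, assuming the new request_timeout_ms parameter shows up in the usual
/sys/module/i915/parameters/ location and the test has enough privilege
to write it:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Lower (or restore) the default request expiry before a test section. */
static int set_request_timeout_ms(unsigned int ms)
{
	const char *path = "/sys/module/i915/parameters/request_timeout_ms";
	char buf[16];
	int fd, len, ret;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	len = snprintf(buf, sizeof(buf), "%u", ms);
	ret = write(fd, buf, len) == len ? 0 : -errno;
	close(fd);

	return ret;
}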

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Daniel

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 +++++---
>  drivers/gpu/drm/i915/i915_params.c          | 5 +++++
>  drivers/gpu/drm/i915/i915_params.h          | 1 +
>  3 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 21c0176e27a0..1dae5e2514a9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -866,7 +866,7 @@ static void __set_default_fence_expiry(struct i915_gem_context *ctx)
>  		return;
>  
>  	/* Default expiry for user fences. */
> -	ret = __set_watchdog(ctx, CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000);
> +	ret = __set_watchdog(ctx, i915->params.request_timeout_ms * 1000);
>  	if (ret)
>  		drm_notice(&i915->drm,
>  			   "Failed to configure default fence expiry! (%d)",
> @@ -1442,13 +1442,15 @@ __set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us)
>  static int set_watchdog(struct i915_gem_context *ctx,
>  			struct drm_i915_gem_context_param *args)
>  {
> +	struct drm_i915_private *i915 = ctx->i915;
> +
>  	if (args->size)
>  		return -EINVAL;
>  
>  	/* Disallow disabling or configuring longer watchdog than default. */
> -	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
> +	if (i915->params.request_timeout_ms &&
>  	    (!args->value ||
> -	     args->value > CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000))
> +	     args->value > i915->params.request_timeout_ms * 1000))
>  		return -EPERM;
>  
>  	return __set_watchdog(ctx, args->value);
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index 6939634e56ed..0320878d96b0 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -197,6 +197,11 @@ i915_param_named_unsafe(fake_lmem_start, ulong, 0400,
>  	"Fake LMEM start offset (default: 0)");
>  #endif
>  
> +#if CONFIG_DRM_I915_REQUEST_TIMEOUT
> +i915_param_named_unsafe(request_timeout_ms, uint, 0600,
> +			"Default request/fence/batch buffer expiration timeout.");
> +#endif
> +
>  static __always_inline void _print_param(struct drm_printer *p,
>  					 const char *name,
>  					 const char *type,
> diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
> index 48f47e44e848..34ebb0662547 100644
> --- a/drivers/gpu/drm/i915/i915_params.h
> +++ b/drivers/gpu/drm/i915/i915_params.h
> @@ -72,6 +72,7 @@ struct drm_printer;
>  	param(int, enable_dpcd_backlight, -1, 0600) \
>  	param(char *, force_probe, CONFIG_DRM_I915_FORCE_PROBE, 0400) \
>  	param(unsigned long, fake_lmem_start, 0, 0400) \
> +	param(unsigned int, request_timeout_ms, CONFIG_DRM_I915_REQUEST_TIMEOUT, 0600) \
>  	/* leave bools at the end to not create holes */ \
>  	param(bool, enable_hangcheck, true, 0600) \
>  	param(bool, load_detect_test, false, 0600) \
> -- 
> 2.27.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 4/6] drm/i915: Allow userspace to configure the watchdog
  2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-03-16 10:09     ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2021-03-16 10:09 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx, dri-devel, Tvrtko Ursulin

On Fri, Mar 12, 2021 at 03:46:20PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Idea here is to make the watchdog mechanism more useful than for just
> default request/fence expiry.
> 
> To this effect a new context param I915_CONTEXT_PARAM_WATCHDOG is added
> where the value fields allows passing in a timeout in micro-seconds.
> 
> This allows userspace to set a limit to how long they expect their batches
> to take, or otherwise they will be cancelled, and userspace notified via
> one of the available mechanisms.
> 
> Main attractiveness of adding uapi here is perhaps to extend the proposal
> by passing in a structure instead of a single value, like for illustration
> only:
> 
> struct drm_i915_gem_context_watchdog {
> 	__u64 flags;
>  #define I915_CONTEXT_WATCHDOG_WALL_TIME	BIT(0)
>  #define I915_CONTEXT_WATCHDOG_GPU_TIME		BIT(1)
>  #define I915_CONTEXT_WATCHDOG_FROM_SUBMIT	BIT(2)
>  #define I915_CONTEXT_WATCHDOG_FROM_RUNNABLE	BIT(3)
> 	__64 timeout_us;
> };
> 
> Point being to prepare the uapi for different semantics from the start.
> Given how not a single one makes complete sense for all use cases. And
> also perhaps satisfy the long wanted media watchdog feature request.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

I think what's been discussed forever is a gpu time watchdog for media.
Otherwise I don't think userspace should be allowed to configure anything
here, and even the watchdog really should only allow userspace to fail
faster.

Now for the workloads that we do break with the next patch (and those
exist), the real solution is the preempt-ctx dma_fence that amdkfd does,
and which we're also working on. But unfortunately that's a lot more work
than a quick simple patch.

So please drop this one from the next round. If media wants a fast
gpu timeout then we should roll that in as a separate series, with
userspace and igt and all the usual bells&whistles.
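
For completeness, poking the proposed param from userspace would look
roughly like this - sketch only, using the I915_CONTEXT_PARAM_WATCHDOG
value this patch adds, and roughly what the igt coverage would need to
exercise as well:

#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Request a per-context batch timeout in microseconds (size stays 0). */
static int set_context_watchdog_us(int fd, uint32_t ctx_id,
				   uint64_t timeout_us)
{
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_WATCHDOG,
		.value = timeout_us,
	};

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}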
-Daniel

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 57 +++++++++++++++++++
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |  4 ++
>  drivers/gpu/drm/i915/gt/intel_context_param.h | 11 +++-
>  include/uapi/drm/i915_drm.h                   |  5 +-
>  4 files changed, 75 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index ca37d93ef5e7..32b05af4fc8f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -233,6 +233,8 @@ static void intel_context_set_gem(struct intel_context *ce,
>  	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
>  	    intel_engine_has_timeslices(ce->engine))
>  		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
> +
> +	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
>  }
>  
>  static void __free_engines(struct i915_gem_engines *e, unsigned int count)
> @@ -1397,6 +1399,28 @@ static int set_ringsize(struct i915_gem_context *ctx,
>  				 __intel_context_ring_size(args->value));
>  }
>  
> +static int __apply_watchdog(struct intel_context *ce, void *timeout_us)
> +{
> +	return intel_context_set_watchdog_us(ce, (uintptr_t)timeout_us);
> +}
> +
> +static int set_watchdog(struct i915_gem_context *ctx,
> +			struct drm_i915_gem_context_param *args)
> +{
> +	int ret;
> +
> +	if (args->size)
> +		return -EINVAL;
> +
> +	ret = context_apply_all(ctx, __apply_watchdog,
> +				(void *)(uintptr_t)args->value);
> +
> +	if (!ret)
> +		ctx->watchdog.timeout_us = args->value;
> +
> +	return ret;
> +}
> +
>  static int __get_ringsize(struct intel_context *ce, void *arg)
>  {
>  	long sz;
> @@ -1426,6 +1450,17 @@ static int get_ringsize(struct i915_gem_context *ctx,
>  	return 0;
>  }
>  
> +static int get_watchdog(struct i915_gem_context *ctx,
> +			struct drm_i915_gem_context_param *args)
> +{
> +	if (args->size)
> +		return -EINVAL;
> +
> +	args->value = ctx->watchdog.timeout_us;
> +
> +	return 0;
> +}
> +
>  int
>  i915_gem_user_to_context_sseu(struct intel_gt *gt,
>  			      const struct drm_i915_gem_context_param_sseu *user,
> @@ -2075,6 +2110,10 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
>  		ret = set_ringsize(ctx, args);
>  		break;
>  
> +	case I915_CONTEXT_PARAM_WATCHDOG:
> +		ret = set_watchdog(ctx, args);
> +		break;
> +
>  	case I915_CONTEXT_PARAM_BAN_PERIOD:
>  	default:
>  		ret = -EINVAL;
> @@ -2196,6 +2235,19 @@ static int clone_schedattr(struct i915_gem_context *dst,
>  	return 0;
>  }
>  
> +static int clone_watchdog(struct i915_gem_context *dst,
> +			  struct i915_gem_context *src)
> +{
> +	int ret;
> +
> +	ret = context_apply_all(dst, __apply_watchdog,
> +				(void *)(uintptr_t)src->watchdog.timeout_us);
> +	if (!ret)
> +		dst->watchdog = src->watchdog;
> +
> +	return ret;
> +}
> +
>  static int clone_sseu(struct i915_gem_context *dst,
>  		      struct i915_gem_context *src)
>  {
> @@ -2279,6 +2331,7 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
>  		MAP(SSEU, clone_sseu),
>  		MAP(TIMELINE, clone_timeline),
>  		MAP(VM, clone_vm),
> +		MAP(WATCHDOG, clone_watchdog),
>  #undef MAP
>  	};
>  	struct drm_i915_gem_context_create_ext_clone local;
> @@ -2532,6 +2585,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>  		ret = get_ringsize(ctx, args);
>  		break;
>  
> +	case I915_CONTEXT_PARAM_WATCHDOG:
> +		ret = get_watchdog(ctx, args);
> +		break;
> +
>  	case I915_CONTEXT_PARAM_BAN_PERIOD:
>  	default:
>  		ret = -EINVAL;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> index d5bc75508048..f17da7e26c43 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> @@ -150,6 +150,10 @@ struct i915_gem_context {
>  	 */
>  	atomic_t active_count;
>  
> +	struct {
> +		u64 timeout_us;
> +	} watchdog;
> +
>  	/**
>  	 * @hang_timestamp: The last time(s) this context caused a GPU hang
>  	 */
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_param.h b/drivers/gpu/drm/i915/gt/intel_context_param.h
> index f053d8633fe2..3ecacc675f41 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_param.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_param.h
> @@ -6,9 +6,18 @@
>  #ifndef INTEL_CONTEXT_PARAM_H
>  #define INTEL_CONTEXT_PARAM_H
>  
> -struct intel_context;
> +#include <linux/types.h>
> +
> +#include "intel_context.h"
>  
>  int intel_context_set_ring_size(struct intel_context *ce, long sz);
>  long intel_context_get_ring_size(struct intel_context *ce);
>  
> +static inline int
> +intel_context_set_watchdog_us(struct intel_context *ce, u64 timeout_us)
> +{
> +	ce->watchdog.timeout_us = timeout_us;
> +	return 0;
> +}
> +
>  #endif /* INTEL_CONTEXT_PARAM_H */
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 1987e2ea79a3..a4c65780850c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1694,6 +1694,8 @@ struct drm_i915_gem_context_param {
>   * Default is 16 KiB.
>   */
>  #define I915_CONTEXT_PARAM_RINGSIZE	0xc
> +
> +#define I915_CONTEXT_PARAM_WATCHDOG	0xd
>  /* Must be kept compact -- no holes and well documented */
>  
>  	__u64 value;
> @@ -1863,7 +1865,8 @@ struct drm_i915_gem_context_create_ext_clone {
>  #define I915_CONTEXT_CLONE_SSEU		(1u << 3)
>  #define I915_CONTEXT_CLONE_TIMELINE	(1u << 4)
>  #define I915_CONTEXT_CLONE_VM		(1u << 5)
> -#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
> +#define I915_CONTEXT_CLONE_WATCHDOG	(1u << 6)
> +#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_WATCHDOG << 1)
>  	__u64 rsvd;
>  };
>  
> -- 
> 2.27.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 5/6] drm/i915: Fail too long user submissions by default
  2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-03-16 10:10     ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2021-03-16 10:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx, dri-devel, Tvrtko Ursulin

On Fri, Mar 12, 2021 at 03:46:21PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A new Kconfig option CONFIG_DRM_I915_REQUEST_TIMEOUT is added, defaulting
> to 10s, and this timeout is applied to _all_ contexts using the previously
> added watchdog facility.
> 
> Result of this is that any user submission will simply fail after this
> time, either causing a reset (for non-preemptable) or incomplete results.
> 
> This can have an effect that workloads which used to work fine will
> suddenly start failing.
> 
> When the default expiry is active userspace will not be allowed to
> decrease the timeout using the context param setting.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

I think this should explain that it will break long-running compute
workloads, and that maybe the modparam in the next patch can paper over
that until we've implemented proper long-running compute workload support
in upstream, which is unfortunately still some ways off.

Otherwise this all makes sense to me. Maybe, if you want, also copy some
of the discussion from your cover letter into this commit message; I think
there's some good stuff there.
-Daniel

> ---
>  drivers/gpu/drm/i915/Kconfig.profile        |  8 ++++
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 47 ++++++++++++++++++---
>  2 files changed, 48 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> index 35bbe2b80596..55e157ffff73 100644
> --- a/drivers/gpu/drm/i915/Kconfig.profile
> +++ b/drivers/gpu/drm/i915/Kconfig.profile
> @@ -1,3 +1,11 @@
> +config DRM_I915_REQUEST_TIMEOUT
> +	int "Default timeout for requests (ms)"
> +	default 10000 # milliseconds
> +	help
> +	  ...
> +
> +	  May be 0 to disable the timeout.
> +
>  config DRM_I915_FENCE_TIMEOUT
>  	int "Timeout for unsignaled foreign fences (ms, jiffy granularity)"
>  	default 10000 # milliseconds
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 32b05af4fc8f..21c0176e27a0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -854,6 +854,25 @@ static void __assign_timeline(struct i915_gem_context *ctx,
>  	context_apply_all(ctx, __apply_timeline, timeline);
>  }
>  
> +static int
> +__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us);
> +
> +static void __set_default_fence_expiry(struct i915_gem_context *ctx)
> +{
> +	struct drm_i915_private *i915 = ctx->i915;
> +	int ret;
> +
> +	if (!IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT))
> +		return;
> +
> +	/* Default expiry for user fences. */
> +	ret = __set_watchdog(ctx, CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000);
> +	if (ret)
> +		drm_notice(&i915->drm,
> +			   "Failed to configure default fence expiry! (%d)",
> +			   ret);
> +}
> +
>  static struct i915_gem_context *
>  i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
>  {
> @@ -898,6 +917,8 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
>  		intel_timeline_put(timeline);
>  	}
>  
> +	__set_default_fence_expiry(ctx);
> +
>  	trace_i915_context_create(ctx);
>  
>  	return ctx;
> @@ -1404,23 +1425,35 @@ static int __apply_watchdog(struct intel_context *ce, void *timeout_us)
>  	return intel_context_set_watchdog_us(ce, (uintptr_t)timeout_us);
>  }
>  
> -static int set_watchdog(struct i915_gem_context *ctx,
> -			struct drm_i915_gem_context_param *args)
> +static int
> +__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us)
>  {
>  	int ret;
>  
> -	if (args->size)
> -		return -EINVAL;
> -
>  	ret = context_apply_all(ctx, __apply_watchdog,
> -				(void *)(uintptr_t)args->value);
> +				(void *)(uintptr_t)timeout_us);
>  
>  	if (!ret)
> -		ctx->watchdog.timeout_us = args->value;
> +		ctx->watchdog.timeout_us = timeout_us;
>  
>  	return ret;
>  }
>  
> +static int set_watchdog(struct i915_gem_context *ctx,
> +			struct drm_i915_gem_context_param *args)
> +{
> +	if (args->size)
> +		return -EINVAL;
> +
> +	/* Disallow disabling or configuring longer watchdog than default. */
> +	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
> +	    (!args->value ||
> +	     args->value > CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000))
> +		return -EPERM;
> +
> +	return __set_watchdog(ctx, args->value);
> +}
> +
>  static int __get_ringsize(struct intel_context *ce, void *arg)
>  {
>  	long sz;
> -- 
> 2.27.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Intel-gfx] [RFC 5/6] drm/i915: Fail too long user submissions by default
@ 2021-03-16 10:10     ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2021-03-16 10:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx, dri-devel

On Fri, Mar 12, 2021 at 03:46:21PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> A new Kconfig option CONFIG_DRM_I915_REQUEST_TIMEOUT is added, defaulting
> to 10s, and this timeout is applied to _all_ contexts using the previously
> added watchdog facility.
> 
> Result of this is that any user submission will simply fail after this
> time, either causing a reset (for non-preemptable) or incomplete results.
> 
> This can have an effect that workloads which used to work fine will
> suddenly start failing.
> 
> When the default expiry is active userspace will not be allowed to
> decrease the timeout using the context param setting.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

I think this should explain that it will break long running compute
workloads, and that maybe the modparam in the next patch can paper over
that until we've implemented proper long running compute workload support
in upstream. Which is unfortunately still some ways off.

Otherwise makes all sense to me. Maybe if you want also copy some of the
discussion from your cover letter into this commit message, and think
there's some good stuff there.
-Daniel

> ---
>  drivers/gpu/drm/i915/Kconfig.profile        |  8 ++++
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 47 ++++++++++++++++++---
>  2 files changed, 48 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> index 35bbe2b80596..55e157ffff73 100644
> --- a/drivers/gpu/drm/i915/Kconfig.profile
> +++ b/drivers/gpu/drm/i915/Kconfig.profile
> @@ -1,3 +1,11 @@
> +config DRM_I915_REQUEST_TIMEOUT
> +	int "Default timeout for requests (ms)"
> +	default 10000 # milliseconds
> +	help
> +	  ...
> +
> +	  May be 0 to disable the timeout.
> +
>  config DRM_I915_FENCE_TIMEOUT
>  	int "Timeout for unsignaled foreign fences (ms, jiffy granularity)"
>  	default 10000 # milliseconds
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 32b05af4fc8f..21c0176e27a0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -854,6 +854,25 @@ static void __assign_timeline(struct i915_gem_context *ctx,
>  	context_apply_all(ctx, __apply_timeline, timeline);
>  }
>  
> +static int
> +__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us);
> +
> +static void __set_default_fence_expiry(struct i915_gem_context *ctx)
> +{
> +	struct drm_i915_private *i915 = ctx->i915;
> +	int ret;
> +
> +	if (!IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT))
> +		return;
> +
> +	/* Default expiry for user fences. */
> +	ret = __set_watchdog(ctx, CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000);
> +	if (ret)
> +		drm_notice(&i915->drm,
> +			   "Failed to configure default fence expiry! (%d)",
> +			   ret);
> +}
> +
>  static struct i915_gem_context *
>  i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
>  {
> @@ -898,6 +917,8 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
>  		intel_timeline_put(timeline);
>  	}
>  
> +	__set_default_fence_expiry(ctx);
> +
>  	trace_i915_context_create(ctx);
>  
>  	return ctx;
> @@ -1404,23 +1425,35 @@ static int __apply_watchdog(struct intel_context *ce, void *timeout_us)
>  	return intel_context_set_watchdog_us(ce, (uintptr_t)timeout_us);
>  }
>  
> -static int set_watchdog(struct i915_gem_context *ctx,
> -			struct drm_i915_gem_context_param *args)
> +static int
> +__set_watchdog(struct i915_gem_context *ctx, unsigned long timeout_us)
>  {
>  	int ret;
>  
> -	if (args->size)
> -		return -EINVAL;
> -
>  	ret = context_apply_all(ctx, __apply_watchdog,
> -				(void *)(uintptr_t)args->value);
> +				(void *)(uintptr_t)timeout_us);
>  
>  	if (!ret)
> -		ctx->watchdog.timeout_us = args->value;
> +		ctx->watchdog.timeout_us = timeout_us;
>  
>  	return ret;
>  }
>  
> +static int set_watchdog(struct i915_gem_context *ctx,
> +			struct drm_i915_gem_context_param *args)
> +{
> +	if (args->size)
> +		return -EINVAL;
> +
> +	/* Disallow disabling or configuring longer watchdog than default. */
> +	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
> +	    (!args->value ||
> +	     args->value > CONFIG_DRM_I915_REQUEST_TIMEOUT * 1000))
> +		return -EPERM;
> +
> +	return __set_watchdog(ctx, args->value);
> +}
> +
>  static int __get_ringsize(struct intel_context *ce, void *arg)
>  {
>  	long sz;
> -- 
> 2.27.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

end of thread, other threads:[~2021-03-16 10:11 UTC | newest]

Thread overview: 27+ messages
2021-03-12 15:46 [RFC 0/6] Default request/fence expiry + watchdog Tvrtko Ursulin
2021-03-12 15:46 ` [Intel-gfx] " Tvrtko Ursulin
2021-03-12 15:46 ` [RFC 1/6] drm/i915: Individual request cancellation Tvrtko Ursulin
2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-15 17:37   ` Tvrtko Ursulin
2021-03-15 17:37     ` Tvrtko Ursulin
2021-03-16 10:02     ` Daniel Vetter
2021-03-16 10:02       ` Daniel Vetter
2021-03-12 15:46 ` [RFC 2/6] drm/i915: Restrict sentinel requests further Tvrtko Ursulin
2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-12 15:46 ` [RFC 3/6] drm/i915: Request watchdog infrastructure Tvrtko Ursulin
2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-12 15:46 ` [RFC 4/6] drm/i915: Allow userspace to configure the watchdog Tvrtko Ursulin
2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-16 10:09   ` Daniel Vetter
2021-03-16 10:09     ` [Intel-gfx] " Daniel Vetter
2021-03-12 15:46 ` [RFC 5/6] drm/i915: Fail too long user submissions by default Tvrtko Ursulin
2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-16 10:10   ` Daniel Vetter
2021-03-16 10:10     ` [Intel-gfx] " Daniel Vetter
2021-03-12 15:46 ` [RFC 6/6] drm/i915: Allow configuring default request expiry via modparam Tvrtko Ursulin
2021-03-12 15:46   ` [Intel-gfx] " Tvrtko Ursulin
2021-03-16 10:03   ` Daniel Vetter
2021-03-16 10:03     ` [Intel-gfx] " Daniel Vetter
2021-03-12 16:22 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Default request/fence expiry + watchdog Patchwork
2021-03-12 16:48 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-03-12 18:25 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
