From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH 04/12] drm/i915/gt: Expose reset stop timeout via sysfs
Date: Tue, 22 Oct 2019 23:38:23 +0100
Message-ID: <20191022223831.22677-4-chris@chris-wilson.co.uk>
In-Reply-To: <20191022223831.22677-1-chris@chris-wilson.co.uk>

Allow ourselves to sleep for a few milliseconds before a GPU reset.
This gives userspace a chance to quiesce gracefully before being shot
down. By stopping submission and then waiting for the current requests
to complete, we are less likely to disrupt innocent contexts by
performing the reset while they are active. The longer we wait, the
more likely we are to spare innocents, but the more disruptive the
reset itself will be to interactive or other low-latency tasks.

The timeout can be adjusted using

	/sys/class/drm/card?/engine/*/stop_timeout_ms

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
---
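Not part of the patch: a minimal userspace sketch of how the new
attribute might be read and adjusted. The helper names are hypothetical,
and the caller supplies the full path (for example
/sys/class/drm/card0/engine/rcs0/stop_timeout_ms, where card0 and rcs0
are assumptions about the local system). Writing needs sufficient
privileges, since the attribute is created with mode 0644.

```c
#include <errno.h>
#include <stdio.h>

/* Read a millisecond value from a sysfs-style attribute file. */
static int read_timeout_ms(const char *path, unsigned long *ms)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -errno;

	ok = fscanf(f, "%lu", ms) == 1;
	fclose(f);

	return ok ? 0 : -EINVAL;
}

/* Write a millisecond value; errors may only surface on flush/close. */
static int write_timeout_ms(const char *path, unsigned long ms)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;

	if (fprintf(f, "%lu\n", ms) < 0)
		err = -EIO;
	if (fclose(f) == EOF && !err)
		err = -errno;

	return err;
}
```

A caller would typically read the current value, write the new one, and
check for a negative return, which mirrors the error codes the store
handler can hand back (such as -EINVAL for an out-of-range value).
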
 drivers/gpu/drm/i915/Kconfig.profile         | 14 +++++++++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 19 +++++++++++-
 drivers/gpu/drm/i915/gt/intel_engine_sysfs.c | 31 ++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  1 +
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index b8df80bc0b47..274bdd208e0d 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -26,6 +26,20 @@ config DRM_I915_SPIN_REQUEST
 	  the cost of enabling the interrupt (if currently disabled) to be
 	  a few microseconds.
 
+config DRM_I915_STOP_TIMEOUT
+	int "How long to wait for an engine to quiesce gracefully before reset (ms)"
+	default 100 # milliseconds
+	help
+	  By stopping submission and sleeping for a short time before resetting
+	  the GPU, we allow other, innocent contexts on the system to quiesce.
+	  It is then less likely for a hanging context to cause collateral
+	  damage as the system is reset in order to recover. The corollary is
+	  that the reset itself may take longer and so be more disruptive to
+	  interactive or low latency workloads.
+
+	  This is adjustable via
+	  /sys/class/drm/card?/engine/*/stop_timeout_ms
+
 config DRM_I915_TIMESLICE_DURATION
 	int "Scheduling quantum for userspace batches (ms, jiffy granularity)"
 	default 1 # milliseconds
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 98dbaaaaf3db..5f8dad04915f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -308,6 +308,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	engine->instance = info->instance;
 	__sprint_engine_name(engine);
 
+	engine->props.stop_timeout_ms =
+		CONFIG_DRM_I915_STOP_TIMEOUT;
 	engine->props.timeslice_duration_ms =
 		CONFIG_DRM_I915_TIMESLICE_DURATION;
 
@@ -878,6 +880,21 @@ u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine)
 	return bbaddr;
 }
 
+static unsigned long stop_timeout(const struct intel_engine_cs *engine)
+{
+	if (in_atomic() || irqs_disabled()) /* inside atomic preempt-reset? */
+		return 0; /* Going too fast, can't stop now! */
+
+	/*
+	 * If we are doing a normal GPU reset, we can take our time and allow
+	 * the engine to quiesce. We've stopped submission to the engine, and
+	 * if we wait long enough an innocent context should complete and
+	 * leave the engine idle. So they should not be caught unaware by
+	 * the forthcoming GPU reset (which usually follows the stop_cs)!
+	 */
+	return READ_ONCE(engine->props.stop_timeout_ms);
+}
+
 int intel_engine_stop_cs(struct intel_engine_cs *engine)
 {
 	struct intel_uncore *uncore = engine->uncore;
@@ -895,7 +912,7 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
 	err = 0;
 	if (__intel_wait_for_register_fw(uncore,
 					 mode, MODE_IDLE, MODE_IDLE,
-					 1000, 0,
+					 1000, stop_timeout(engine),
 					 NULL)) {
 		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
 		err = -ETIMEDOUT;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_sysfs.c b/drivers/gpu/drm/i915/gt/intel_engine_sysfs.c
index 55ae81769a8e..73a755f44cce 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_sysfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_sysfs.c
@@ -184,6 +184,36 @@ timeslice_store(struct kobject *kobj, struct kobj_attribute *attr,
 static struct kobj_attribute timeslice_duration_attr =
 __ATTR(timeslice_duration_ms, 0644, timeslice_show, timeslice_store);
 
+static ssize_t
+stop_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+	struct intel_engine_cs *engine = kobj_to_engine(kobj);
+
+	return sprintf(buf, "%lu\n", engine->props.stop_timeout_ms);
+}
+
+static ssize_t
+stop_store(struct kobject *kobj, struct kobj_attribute *attr,
+	   const char *buf, size_t count)
+{
+	struct intel_engine_cs *engine = kobj_to_engine(kobj);
+	unsigned long long duration;
+	int err;
+
+	err = kstrtoull(buf, 0, &duration);
+	if (err)
+		return err;
+
+	if (duration > jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT))
+		return -EINVAL;
+
+	WRITE_ONCE(engine->props.stop_timeout_ms, duration);
+	return count;
+}
+
+static struct kobj_attribute stop_timeout_attr =
+__ATTR(stop_timeout_ms, 0644, stop_show, stop_store);
+
 static void kobj_engine_release(struct kobject *kobj)
 {
 	kfree(kobj);
@@ -224,6 +254,7 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 		&mmio_attr.attr,
 		&caps_attr.attr,
 		&all_caps_attr.attr,
+		&stop_timeout_attr.attr,
 		NULL
 	};
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 89a9616e8539..76ef8a63f921 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -544,6 +544,7 @@ struct intel_engine_cs {
 	} stats;
 
 	struct {
+		unsigned long stop_timeout_ms;
 		unsigned long timeslice_duration_ms;
 	} props;
 };
-- 
2.24.0.rc0
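
Not part of the patch: a userspace sketch of the validation that
stop_store() performs, for anyone wanting to pre-check a value before
writing it. The function name and the max_ms parameter are hypothetical;
in the patch the bound is jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT), which
userspace cannot query directly. Like kstrtoull() with base 0, the
sketch auto-detects decimal, hex and octal and tolerates one trailing
newline. It is slightly stricter in that it also rejects a leading '+'.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse buf as an unsigned millisecond count no larger than max_ms.
 * The caller is assumed to pass a max_ms that fits in unsigned long.
 */
static int parse_timeout_ms(const char *buf, unsigned long long max_ms,
			    unsigned long *out)
{
	unsigned long long val;
	char *end;

	/* strtoull() skips blanks and accepts a sign; require a digit. */
	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;

	errno = 0;
	val = strtoull(buf, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;

	/* Allow a single trailing newline, as sysfs writes usually have. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val > max_ms)
		return -EINVAL;

	*out = (unsigned long)val;
	return 0;
}
```
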
