All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michel Thierry <michel.thierry@intel.com>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH v8 18/20] drm/i915: Watchdog timeout: DRM kernel interface to set the timeout
Date: Mon, 22 May 2017 10:46:39 -0700	[thread overview]
Message-ID: <20170522174641.25354-19-michel.thierry@intel.com> (raw)
In-Reply-To: <20170522174641.25354-1-michel.thierry@intel.com>

Final enablement patch for GPU hang detection using watchdog timeout.
Using the gem_context_setparam ioctl, users can specify the desired
timeout value in microseconds, and the driver will do the conversion to
'timestamps'.

The recommended default watchdog threshold for video engines is 60000 us,
since this has been _empirically determined_ to be a good compromise for
low-latency requirements and low rate of false positives. The default
register value is ~106000us and the theoretical max value (all 1s) is
353 seconds.

Note, UABI engine ids and i915 engine ids are different, and this patch
uses the i915 ones. Some kind of mapping table [1] is required if we
decide to use the UABI engine ids.

[1] http://patchwork.freedesktop.org/patch/msgid/20170329135831.30254-2-chris@chris-wilson.co.uk

v2: Fixed get api to return values in microseconds. Threshold updated to
be per context engine. Check for u32 overflow. Capture ctx threshold
value in error state.

v3: Add a way to get array size, short-cut to disable all thresholds,
return EFAULT / EINVAL as needed. Move the capture of the threshold
value in the error state into a new patch. BXT has a different
timestamp base (because why not?).

v4: Checking if watchdog is available should be the first thing to
do, instead of giving false hopes to abi users; remove unnecessary & in
set_watchdog; ignore args->size in getparam.

v5: GEN9-LP platforms have a different crystal clock frequency, use the
right timestamp base for them (magic 8-ball predicts this will change
again later on, so future-proof it). (Daniele)

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 56 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c | 95 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  5 +-
 include/uapi/drm/i915_drm.h             |  1 +
 4 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d7141d65c3a9..3a757ef7dec0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2191,6 +2191,9 @@ struct drm_i915_private {
 	struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */
 	int num_fence_regs; /* 8 on pre-965, 16 otherwise */
 
+	/* Command stream timestamp base - helps define watchdog threshold */
+	u32 cs_timestamp_base;
+
 	unsigned int fsb_freq, mem_freq, is_ddr3;
 	unsigned int skl_preferred_vco_freq;
 	unsigned int max_cdclk_freq;
@@ -3554,6 +3557,59 @@ i915_gem_context_lookup_timeline(struct i915_gem_context *ctx,
 	return &vm->timeline.engine[engine->id];
 }
 
+/*
+ * BDW, CHV & SKL+ Timestamp timer resolution = 0.080 uSec,
+ * or 12500000 counts per second, or ~12 counts per microsecond.
+ *
+ * But BXT/GLK Timestamp timer resolution is different, 0.052 uSec,
+ * or 19200000 counts per second, or ~19 counts per microsecond.
+ *
+ * Future-proofing, some day it won't be as simple as just GEN & IS_LP.
+ */
+#define GEN8_TIMESTAMP_CNTS_PER_USEC 12
+#define GEN9_LP_TIMESTAMP_CNTS_PER_USEC 19
+static inline u32 cs_timestamp_in_us(struct drm_i915_private *dev_priv)
+{
+	u32 cs_timestamp_base = dev_priv->cs_timestamp_base;
+
+	if (cs_timestamp_base)
+		return cs_timestamp_base;
+
+	switch (INTEL_GEN(dev_priv)) {
+	default:
+		MISSING_CASE(INTEL_GEN(dev_priv));
+		/* fall through */
+	case 9:
+		cs_timestamp_base = IS_GEN9_LP(dev_priv) ?
+					GEN9_LP_TIMESTAMP_CNTS_PER_USEC :
+					GEN8_TIMESTAMP_CNTS_PER_USEC;
+		break;
+	case 8:
+		cs_timestamp_base = GEN8_TIMESTAMP_CNTS_PER_USEC;
+		break;
+	}
+
+	dev_priv->cs_timestamp_base = cs_timestamp_base;
+	return cs_timestamp_base;
+}
+
+static inline u32
+watchdog_to_us(struct drm_i915_private *dev_priv, u32 value_in_clock_counts)
+{
+	return value_in_clock_counts / cs_timestamp_in_us(dev_priv);
+}
+
+static inline u32
+watchdog_to_clock_counts(struct drm_i915_private *dev_priv, u64 value_in_us)
+{
+	u64 threshold = value_in_us * cs_timestamp_in_us(dev_priv);
+
+	if (overflows_type(threshold, u32))
+		return -EINVAL;
+
+	return threshold;
+}
+
 int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 210d6638e97b..a56e79430082 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -337,6 +337,95 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	return ctx;
 }
 
+/* Return the timer count threshold in microseconds. */
+int i915_gem_context_get_watchdog(struct i915_gem_context *ctx,
+				  struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 threshold_in_us[I915_NUM_ENGINES];
+
+	if (!dev_priv->engine[VCS]->emit_start_watchdog)
+		return -ENODEV;
+
+	for_each_engine(engine, dev_priv, id) {
+		struct intel_context *ce = &ctx->engine[id];
+
+		threshold_in_us[id] = watchdog_to_us(dev_priv,
+						     ce->watchdog_threshold);
+	}
+
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+	if (__copy_to_user(u64_to_user_ptr(args->value),
+			   &threshold_in_us,
+			   sizeof(threshold_in_us))) {
+		mutex_lock(&dev_priv->drm.struct_mutex);
+		return -EFAULT;
+	}
+	mutex_lock(&dev_priv->drm.struct_mutex);
+
+	args->size = sizeof(threshold_in_us);
+
+	return 0;
+}
+
+/*
+ * Based on time out value in microseconds (us) calculate
+ * timer count thresholds needed based on core frequency.
+ * Watchdog can be disabled by setting it to 0.
+ */
+int i915_gem_context_set_watchdog(struct i915_gem_context *ctx,
+				  struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 threshold[I915_NUM_ENGINES];
+
+	if (!dev_priv->engine[VCS]->emit_start_watchdog)
+		return -ENODEV;
+
+	memset(threshold, 0, sizeof(threshold));
+
+	/* shortcut to disable in all engines */
+	if (args->size == 0)
+		goto set_watchdog;
+
+	if (args->size < sizeof(threshold))
+		return -EFAULT;
+
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+	if (copy_from_user(threshold,
+			   u64_to_user_ptr(args->value),
+			   sizeof(threshold))) {
+		mutex_lock(&dev_priv->drm.struct_mutex);
+		return -EFAULT;
+	}
+	mutex_lock(&dev_priv->drm.struct_mutex);
+
+	/* not supported in blitter engine */
+	if (threshold[BCS] != 0)
+		return -EINVAL;
+
+	for_each_engine(engine, dev_priv, id) {
+		threshold[id] = watchdog_to_clock_counts(dev_priv,
+							 threshold[id]);
+
+		if (threshold[id] == -EINVAL)
+			return -EINVAL;
+	}
+
+set_watchdog:
+	for_each_engine(engine, dev_priv, id) {
+		struct intel_context *ce = &ctx->engine[id];
+
+		ce->watchdog_threshold = threshold[id];
+	}
+
+	return 0;
+}
+
 int i915_gem_context_init(struct drm_i915_private *dev_priv)
 {
 	struct i915_gem_context *ctx;
@@ -957,6 +1046,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_BANNABLE:
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_WATCHDOG:
+		ret = i915_gem_context_get_watchdog(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1014,6 +1106,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_WATCHDOG:
+		ret = i915_gem_context_set_watchdog(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d7a6ba48cd18..5765b6d7da26 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1573,7 +1573,7 @@ static inline u32 get_watchdog_disable(struct intel_engine_cs *engine)
 		return GEN8_XCS_WATCHDOG_DISABLE;
 }
 
-#define GEN8_WATCHDOG_1000US 0x2ee0 //XXX: Temp, replace with helper function
+#define GEN8_WATCHDOG_1000US(dev_priv) watchdog_to_clock_counts(dev_priv, 1000)
 static void gen8_watchdog_irq_handler(unsigned long data)
 {
 	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
@@ -1603,7 +1603,8 @@ static void gen8_watchdog_irq_handler(unsigned long data)
 	} else {
 		engine->hangcheck.watchdog = current_seqno;
 		/* Re-start the counter, if really hung, it will expire again */
-		I915_WRITE_FW(RING_THRESH(engine->mmio_base), GEN8_WATCHDOG_1000US);
+		I915_WRITE_FW(RING_THRESH(engine->mmio_base),
+			      GEN8_WATCHDOG_1000US(dev_priv));
 		I915_WRITE_FW(RING_CNTR(engine->mmio_base), GEN8_WATCHDOG_ENABLE);
 	}
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index fadedefba6db..18bc0ec618dd 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1308,6 +1308,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
 #define I915_CONTEXT_PARAM_BANNABLE	0x5
+#define I915_CONTEXT_PARAM_WATCHDOG	0x6
 	__u64 value;
 };
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2017-05-22 17:46 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-22 17:46 [PATCH v8 00/20] Gen8+ engine-reset Michel Thierry
2017-05-22 17:46 ` [PATCH v8 01/20] drm/i915: Look for active requests earlier in the reset path Michel Thierry
2017-05-22 17:46 ` [PATCH v8 02/20] drm/i915: Update i915.reset to handle engine resets Michel Thierry
2017-05-22 17:46 ` [PATCH v8 03/20] drm/i915: Modify error handler for per engine hang recovery Michel Thierry
2017-06-02 20:16   ` Chris Wilson
2017-06-02 20:38     ` Michel Thierry
2017-06-02 21:29       ` Chris Wilson
2017-06-04 11:47     ` Chris Wilson
2017-06-04 12:06       ` Chris Wilson
2017-06-06  0:40         ` Michel Thierry
2017-06-06 10:16           ` Chris Wilson
2017-06-06 18:51             ` Michel Thierry
2017-06-06  0:45   ` [PATCH v9] " Michel Thierry
2017-06-06 13:03     ` kbuild test robot
2017-06-06 13:05     ` kbuild test robot
2017-05-22 17:46 ` [PATCH v8 04/20] drm/i915: Add support for per engine reset recovery Michel Thierry
2017-05-22 17:46 ` [PATCH v8 05/20] drm/i915: Add engine reset count to error state Michel Thierry
2017-05-22 17:46 ` [PATCH v8 06/20] drm/i915: Export per-engine reset count info to debugfs Michel Thierry
2017-05-22 17:46 ` [PATCH v8 07/20] drm/i915: Carry on with reset even if hw engine is not ready Michel Thierry
2017-05-22 17:46 ` [PATCH v8 08/20] drm/i915: Enable Engine reset and recovery support Michel Thierry
2017-05-22 17:46 ` [PATCH v8 09/20] drm/i915: Add engine reset count in get-reset-stats ioctl Michel Thierry
2017-05-22 17:46 ` [PATCH v8 10/20] drm/i915/selftests: reset engine self tests Michel Thierry
2017-05-22 17:46 ` [PATCH v8 11/20] drm/i915/guc: fix mmio whitelist mmio_start offset and add reminder Michel Thierry
2017-05-22 17:46 ` [PATCH v8 12/20] drm/i915/guc: Provide register list to be saved/restored during engine reset Michel Thierry
2017-05-22 17:46 ` [PATCH v8 13/20] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
2017-05-22 17:46 ` [PATCH v8 14/20] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
2017-05-22 17:46 ` [PATCH v8 15/20] drm/i915: Watchdog timeout: Pass GuC shared data structure during param load Michel Thierry
2017-05-22 17:46 ` [PATCH v8 16/20] drm/i915: Watchdog timeout: IRQ handler for gen8+ Michel Thierry
2017-05-22 17:46 ` [PATCH v8 17/20] drm/i915: Watchdog timeout: Ringbuffer command emission " Michel Thierry
2017-05-22 17:46 ` Michel Thierry [this message]
2017-05-22 17:46 ` [PATCH v8 19/20] drm/i915: Watchdog timeout: Include threshold value in error state Michel Thierry
2017-05-22 17:46 ` [PATCH v8 20/20] drm/i915: Watchdog timeout: Export media reset count from GuC to debugfs Michel Thierry
2017-05-22 18:04 ` ✓ Fi.CI.BAT: success for Gen8+ engine-reset (rev11) Patchwork
2017-06-06  1:02 ` ✗ Fi.CI.BAT: failure for Gen8+ engine-reset (rev12) Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170522174641.25354-19-michel.thierry@intel.com \
    --to=michel.thierry@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.