All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] GuC based reset engine
@ 2017-10-30 18:56 Michel Thierry
  2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 18:56 UTC (permalink / raw)
  To: intel-gfx

We've been supporting reset-engine in execlist submission mode for a
while, but with GuC, the resubmission path had to be different because we
used to re-enable the engines before GuC... so we've been using full gpu
reset when GuC submission is enabled (which reset the fw).

Thanks to Michal Winiarski GuC preemption changes, the firmware is now
enabled before restarting the engines, so the replay part after
resetting an engine is no longer different.

The only change now is that the reset-engine is requested to the
firmware via a H2G command, and GuC is the one in charge of acquiring the
forcewake and idling the engine before reseting it.

(The first patch is a cosmetic change, which I'm sure Tvrtko doesn't
remember he reviewed [1]).

[1] https://lists.freedesktop.org/archives/intel-gfx/2017-April/126813.html

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

Michal Wajdeczko (1):
  HAX enable GuC submission for CI

Michel Thierry (2):
  drm/i915/guc: Rename the function that resets the GuC
  drm/i915/guc: Add support for reset engine using GuC commands

 drivers/gpu/drm/i915/i915_drv.c       |  9 +++++++--
 drivers/gpu/drm/i915/i915_drv.h       |  3 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  8 ++------
 drivers/gpu/drm/i915/i915_params.h    |  4 ++--
 drivers/gpu/drm/i915/intel_guc.c      | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_fwif.h |  1 +
 drivers/gpu/drm/i915/intel_uc.c       |  4 ++--
 drivers/gpu/drm/i915/intel_uncore.c   |  7 +------
 8 files changed, 41 insertions(+), 19 deletions(-)

-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC
  2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
@ 2017-10-30 18:56 ` Michel Thierry
  2017-10-30 21:02   ` Chris Wilson
  2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 18:56 UTC (permalink / raw)
  To: intel-gfx

intel_guc_reset sounds more like the microcontroller is the one performing
a reset, while in this case is the opposite. intel_reset_guc not only
makes it clearer, it follows the other intel_reset functions available.

v2: Print error message in English (Tvrtko).

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h     | 2 +-
 drivers/gpu/drm/i915/intel_uc.c     | 4 ++--
 drivers/gpu/drm/i915/intel_uncore.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d760f995a22a..0ed06ca07568 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3325,7 +3325,7 @@ extern int i915_reset_engine(struct intel_engine_cs *engine,
 			     unsigned int flags);
 
 extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
-extern int intel_guc_reset(struct drm_i915_private *dev_priv);
+extern int intel_reset_guc(struct drm_i915_private *dev_priv);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c
index 25bd162f38d2..aec295470e0d 100644
--- a/drivers/gpu/drm/i915/intel_uc.c
+++ b/drivers/gpu/drm/i915/intel_uc.c
@@ -33,9 +33,9 @@ static int __intel_uc_reset_hw(struct drm_i915_private *dev_priv)
 	int ret;
 	u32 guc_status;
 
-	ret = intel_guc_reset(dev_priv);
+	ret = intel_reset_guc(dev_priv);
 	if (ret) {
-		DRM_ERROR("GuC reset failed, ret = %d\n", ret);
+		DRM_ERROR("Failed to reset GuC, ret = %d\n", ret);
 		return ret;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 96ee6b2754be..d629b2e40013 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1803,7 +1803,7 @@ bool intel_has_reset_engine(struct drm_i915_private *dev_priv)
 		i915_modparams.reset >= 2);
 }
 
-int intel_guc_reset(struct drm_i915_private *dev_priv)
+int intel_reset_guc(struct drm_i915_private *dev_priv)
 {
 	int ret;
 
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
  2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
  2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
@ 2017-10-30 18:56 ` Michel Thierry
  2017-10-30 20:58   ` Chris Wilson
                     ` (2 more replies)
  2017-10-30 18:56 ` [PATCH 3/3] HAX enable GuC submission for CI Michel Thierry
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 18:56 UTC (permalink / raw)
  To: intel-gfx

This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.

In the case of i915 directly submission to ELSP, driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before actually
resetting it.

Once the reset is successful, i915 takes over again and handles resubmission.
The scheduler in i915 knows which requests are pending so after resetting
a engine, pending workloads/requests are resubmitted again.

v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc funtion names.

v3: Removed debug message about engine restarting from which request,
since the new baseline do it regardless of submission mode. (Chris)

v4: Rebase.

v5: Do not pass unnecessary reporting flags to the fw (Jeff);
tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c       |  9 +++++++--
 drivers/gpu/drm/i915/i915_drv.h       |  1 +
 drivers/gpu/drm/i915/intel_guc.c      | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_fwif.h |  1 +
 drivers/gpu/drm/i915/intel_uncore.c   |  5 -----
 5 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index af745749509c..02fb35744f66 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1984,10 +1984,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
 		goto out;
 	}
 
-	ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
+	if (!engine->i915->guc.execbuf_client)
+		ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
+	else
+		ret = intel_guc_reset_engine(engine);
+
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
+		DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
+				 (engine->i915->guc.execbuf_client ? "GUC " : ""),
 				 engine->name, ret);
 		goto out;
 	}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0ed06ca07568..b77fd9720ec2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3326,6 +3326,7 @@ extern int i915_reset_engine(struct intel_engine_cs *engine,
 
 extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
 extern int intel_reset_guc(struct drm_i915_private *dev_priv);
+extern int intel_guc_reset_engine(struct intel_engine_cs *engine);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index f74d50fdaeb0..f7b9b6b91499 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -24,6 +24,7 @@
 
 #include "intel_guc.h"
 #include "i915_drv.h"
+#include "i915_guc_submission.h"
 
 static void gen8_guc_raise_irq(struct intel_guc *guc)
 {
@@ -283,6 +284,29 @@ int intel_guc_suspend(struct drm_i915_private *dev_priv)
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
 
+/**
+ * intel_guc_reset_engine() - ask GuC to reset an engine
+ * @engine:	engine to be reset
+ */
+int intel_guc_reset_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct intel_guc *guc = &dev_priv->guc;
+	u32 data[7];
+
+	GEM_BUG_ON(!guc->execbuf_client);
+
+	data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
+	data[1] = engine->guc_id;
+	data[2] = 0;
+	data[3] = 0;
+	data[4] = 0;
+	data[5] = guc->execbuf_client->stage_id;
+	data[6] = guc_ggtt_offset(guc->shared_data);
+
+	return intel_guc_send(guc, data, ARRAY_SIZE(data));
+}
+
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @dev_priv:	i915 device private
diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
index e24dbec2ead8..6a10aa6f04d3 100644
--- a/drivers/gpu/drm/i915/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
@@ -574,6 +574,7 @@ struct guc_shared_ctx_data {
 enum intel_guc_action {
 	INTEL_GUC_ACTION_DEFAULT = 0x0,
 	INTEL_GUC_ACTION_REQUEST_PREEMPTION = 0x2,
+	INTEL_GUC_ACTION_REQUEST_ENGINE_RESET = 0x3,
 	INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x6,
 	INTEL_GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
 	INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d629b2e40013..0564d87ad416 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1792,14 +1792,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
 	return intel_get_gpu_reset(dev_priv) != NULL;
 }
 
-/*
- * When GuC submission is enabled, GuC manages ELSP and can initiate the
- * engine reset too. For now, fall back to full GPU reset if it is enabled.
- */
 bool intel_has_reset_engine(struct drm_i915_private *dev_priv)
 {
 	return (dev_priv->info.has_reset_engine &&
-		!dev_priv->guc.execbuf_client &&
 		i915_modparams.reset >= 2);
 }
 
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/3] HAX enable GuC submission for CI
  2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
  2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
  2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
@ 2017-10-30 18:56 ` Michel Thierry
  2017-10-30 20:05 ` ✓ Fi.CI.BAT: success for GuC based reset engine Patchwork
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 18:56 UTC (permalink / raw)
  To: intel-gfx

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Also revert ("drm/i915/guc: Assert that we switch between
known ggtt->invalidate functions")

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 8 ++------
 drivers/gpu/drm/i915/i915_params.h  | 4 ++--
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5eaa6893daaa..7dcf931abe37 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3568,17 +3568,13 @@ int i915_ggtt_enable_hw(struct drm_i915_private *dev_priv)
 
 void i915_ggtt_enable_guc(struct drm_i915_private *i915)
 {
-	GEM_BUG_ON(i915->ggtt.invalidate != gen6_ggtt_invalidate);
-
 	i915->ggtt.invalidate = guc_ggtt_invalidate;
 }
 
 void i915_ggtt_disable_guc(struct drm_i915_private *i915)
 {
-	/* We should only be called after i915_ggtt_enable_guc() */
-	GEM_BUG_ON(i915->ggtt.invalidate != guc_ggtt_invalidate);
-
-	i915->ggtt.invalidate = gen6_ggtt_invalidate;
+	if (i915->ggtt.invalidate == guc_ggtt_invalidate)
+		i915->ggtt.invalidate = gen6_ggtt_invalidate;
 }
 
 void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index c7292268ed43..c38cef07b9fe 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -44,8 +44,8 @@
 	param(int, disable_power_well, -1) \
 	param(int, enable_ips, 1) \
 	param(int, invert_brightness, 0) \
-	param(int, enable_guc_loading, 0) \
-	param(int, enable_guc_submission, 0) \
+	param(int, enable_guc_loading, 1) \
+	param(int, enable_guc_submission, 1) \
 	param(int, guc_log_level, -1) \
 	param(char *, guc_firmware_path, NULL) \
 	param(char *, huc_firmware_path, NULL) \
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* ✓ Fi.CI.BAT: success for GuC based reset engine
  2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
                   ` (2 preceding siblings ...)
  2017-10-30 18:56 ` [PATCH 3/3] HAX enable GuC submission for CI Michel Thierry
@ 2017-10-30 20:05 ` Patchwork
  2017-10-30 21:14   ` Chris Wilson
  2017-10-30 23:20 ` ✗ Fi.CI.IGT: warning " Patchwork
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Patchwork @ 2017-10-30 20:05 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

== Series Details ==

Series: GuC based reset engine
URL   : https://patchwork.freedesktop.org/series/32859/
State : success

== Summary ==

Series 32859v1 GuC based reset engine
https://patchwork.freedesktop.org/api/1.0/series/32859/revisions/1/mbox/

Test chamelium:
        Subgroup dp-crc-fast:
                pass       -> FAIL       (fi-kbl-7500u) fdo#102514
Test gem_exec_suspend:
        Subgroup basic-s3:
                pass       -> DMESG-WARN (fi-cfl-s) fdo#103186
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-c:
                incomplete -> PASS       (fi-bxt-dsi)
Test pm_rpm:
        Subgroup basic-pci-d3-state:
                incomplete -> FAIL       (fi-bxt-j4205)

fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514
fdo#103186 https://bugs.freedesktop.org/show_bug.cgi?id=103186

fi-bdw-5557u     total:289  pass:266  dwarn:0   dfail:0   fail:2   skip:21  time:459s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:454s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:371s
fi-bsw-n3050     total:289  pass:241  dwarn:0   dfail:0   fail:2   skip:46  time:540s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:265s
fi-bxt-dsi       total:289  pass:257  dwarn:0   dfail:0   fail:2   skip:30  time:517s
fi-bxt-j4205     total:289  pass:258  dwarn:0   dfail:0   fail:2   skip:29  time:520s
fi-byt-j1900     total:289  pass:251  dwarn:1   dfail:0   fail:2   skip:35  time:510s
fi-byt-n2820     total:289  pass:247  dwarn:1   dfail:0   fail:2   skip:39  time:499s
fi-cfl-s         total:289  pass:251  dwarn:4   dfail:0   fail:2   skip:32  time:571s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:423s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:249s
fi-glk-1         total:289  pass:259  dwarn:0   dfail:0   fail:2   skip:28  time:592s
fi-hsw-4770      total:289  pass:260  dwarn:0   dfail:0   fail:2   skip:27  time:444s
fi-hsw-4770r     total:289  pass:260  dwarn:0   dfail:0   fail:2   skip:27  time:442s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:421s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:488s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:458s
fi-kbl-7500u     total:289  pass:261  dwarn:1   dfail:0   fail:3   skip:24  time:500s
fi-kbl-7560u     total:289  pass:268  dwarn:0   dfail:0   fail:2   skip:19  time:582s
fi-kbl-7567u     total:289  pass:267  dwarn:0   dfail:0   fail:2   skip:20  time:489s
fi-kbl-r         total:289  pass:260  dwarn:0   dfail:0   fail:2   skip:27  time:588s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:537s
fi-skl-6260u     total:289  pass:267  dwarn:0   dfail:0   fail:2   skip:20  time:474s
fi-skl-6600u     total:289  pass:260  dwarn:0   dfail:0   fail:2   skip:27  time:593s
fi-skl-6700hq    total:289  pass:261  dwarn:0   dfail:0   fail:2   skip:26  time:652s
fi-skl-6700k     total:289  pass:263  dwarn:0   dfail:0   fail:2   skip:24  time:537s
fi-skl-6770hq    total:289  pass:267  dwarn:0   dfail:0   fail:2   skip:20  time:515s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:459s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:554s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:415s

302b1723bec5e7f43e423c975b9b87705fb9fdd6 drm-tip: 2017y-10m-30d-18h-28m-39s UTC integration manifest
e25d1ee5b093 HAX enable GuC submission for CI
c3a835c2c17c drm/i915/guc: Add support for reset engine using GuC commands
195f89fa1970 drm/i915/guc: Rename the function that resets the GuC

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
  2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
@ 2017-10-30 20:58   ` Chris Wilson
  2017-10-30 21:08     ` Michel Thierry
  2017-10-30 21:09   ` Chris Wilson
  2017-10-31 22:53   ` [PATCH v6] " Michel Thierry
  2 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-10-30 20:58 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Quoting Michel Thierry (2017-10-30 18:56:15)
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index af745749509c..02fb35744f66 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1984,10 +1984,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
>                 goto out;
>         }
>  
> -       ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> +       if (!engine->i915->guc.execbuf_client)
> +               ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> +       else
> +               ret = intel_guc_reset_engine(engine);

While redundant, the interface cries out for
	intel_guc_reset_engine(guc, engine);
even though, in this case we would be using
	intel_guc_reset_engine(&engine->i915->guc, engine);

I would also be tempted to change intel_gpu_reset to
	intel_gpu_reset_engine(engine->i915, engine);

intel_gt_reset_engine?

Just trying to make our language consistent along each path.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC
  2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
@ 2017-10-30 21:02   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-10-30 21:02 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Quoting Michel Thierry (2017-10-30 18:56:14)
> intel_guc_reset sounds more like the microcontroller is the one performing
> a reset, while in this case is the opposite. intel_reset_guc not only
> makes it clearer, it follows the other intel_reset functions available.
> 
> v2: Print error message in English (Tvrtko).
> 
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
  2017-10-30 20:58   ` Chris Wilson
@ 2017-10-30 21:08     ` Michel Thierry
  0 siblings, 0 replies; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 21:08 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 30/10/17 13:58, Chris Wilson wrote:
> Quoting Michel Thierry (2017-10-30 18:56:15)
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> index af745749509c..02fb35744f66 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -1984,10 +1984,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
>>                  goto out;
>>          }
>>   
>> -       ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
>> +       if (!engine->i915->guc.execbuf_client)
>> +               ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
>> +       else
>> +               ret = intel_guc_reset_engine(engine);
> 
> While redundant, the interface cries out for
> 	intel_guc_reset_engine(guc, engine);
> even though, in this case we would be using
> 	intel_guc_reset_engine(&engine->i915->guc, engine);
> 

I'll change the name and pass the guc as parameter.

> I would also be tempted to change intel_gpu_reset to
> 	intel_gpu_reset_engine(engine->i915, engine);
> 
> intel_gt_reset_engine?
> 
And then it becomes a wrapper of intel_gpu_reset? (intel_gpu_reset is 
used for full reset path and in older platforms too).

> Just trying to make our language consistent along each path.
> 

It's true that it would look better as you suggest:

if (!guc_client)
	intel_gt_reset_engine(i915, engine);
else
	intel_guc_reset_engine(guc, engine);

Thanks,

-Michel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
  2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
  2017-10-30 20:58   ` Chris Wilson
@ 2017-10-30 21:09   ` Chris Wilson
  2017-10-31  4:38     ` Michel Thierry
  2017-10-31 22:53   ` [PATCH v6] " Michel Thierry
  2 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-10-30 21:09 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Quoting Michel Thierry (2017-10-30 18:56:15)
> This patch adds per engine reset and recovery (TDR) support when GuC is
> used to submit workloads to GPU.
> 
> In the case of i915 directly submission to ELSP, driver manages hang
> detection, recovery and resubmission. With GuC submission these tasks
> are shared between driver and GuC. i915 is still responsible for detecting
> a hang, and when it does it only requests GuC to reset that Engine. GuC
> internally manages acquiring forcewake and idling the engine before actually
> resetting it.
> 
> Once the reset is successful, i915 takes over again and handles resubmission.
> The scheduler in i915 knows which requests are pending so after resetting
> a engine, pending workloads/requests are resubmitted again.
> 
> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> non-guc funtion names.
> 
> v3: Removed debug message about engine restarting from which request,
> since the new baseline do it regardless of submission mode. (Chris)
> 
> v4: Rebase.
> 
> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.

In your experience, how did our test coverage fare?

Could you use live_hangcheck effectively? (The drv_selftest would need
some hand holding to pass along guc options. But for livetesting we
should probably get to the point of being able to load/unload the guc
interface so that we cover both execlists and guc.) Did you find 
"gem_exec_whisper --r hang*", did you try gem_concurrent_all?
 
> +/**
> + * intel_guc_reset_engine() - ask GuC to reset an engine
> + * @engine:    engine to be reset
> + */
> +int intel_guc_reset_engine(struct intel_engine_cs *engine)
> +{
> +       struct drm_i915_private *dev_priv = engine->i915;
> +       struct intel_guc *guc = &dev_priv->guc;
> +       u32 data[7];
> +
> +       GEM_BUG_ON(!guc->execbuf_client);
> +
> +       data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
> +       data[1] = engine->guc_id;
> +       data[2] = 0;
> +       data[3] = 0;
> +       data[4] = 0;
> +       data[5] = guc->execbuf_client->stage_id;
> +       data[6] = guc_ggtt_offset(guc->shared_data);
> +
> +       return intel_guc_send(guc, data, ARRAY_SIZE(data));

Is this a synchronous action? We expect that following the completion of
the reset routine, we are ready to reinit the hw. The same rule needs to
apply the guc, I think.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ✓ Fi.CI.BAT: success for GuC based reset engine
  2017-10-30 20:05 ` ✓ Fi.CI.BAT: success for GuC based reset engine Patchwork
@ 2017-10-30 21:14   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-10-30 21:14 UTC (permalink / raw)
  To: Patchwork, Michel Thierry; +Cc: intel-gfx

Quoting Patchwork (2017-10-30 20:05:05)
> == Series Details ==
> 
> Series: GuC based reset engine
> URL   : https://patchwork.freedesktop.org/series/32859/
> State : success
> 
> == Summary ==
> 
> Series 32859v1 GuC based reset engine
> https://patchwork.freedesktop.org/api/1.0/series/32859/revisions/1/mbox/
> 
> Test chamelium:
>         Subgroup dp-crc-fast:
>                 pass       -> FAIL       (fi-kbl-7500u) fdo#102514
> Test gem_exec_suspend:
>         Subgroup basic-s3:
>                 pass       -> DMESG-WARN (fi-cfl-s) fdo#103186
> Test kms_pipe_crc_basic:
>         Subgroup suspend-read-crc-pipe-c:
>                 incomplete -> PASS       (fi-bxt-dsi)
> Test pm_rpm:
>         Subgroup basic-pci-d3-state:
>                 incomplete -> FAIL       (fi-bxt-j4205)

35686a44e4b8 drm/i915: Use intel_ddi_get_config() for MST
1939ba51fd05 drm/i915: Pass a crtc state to ddi post_disable from MST code
bb911536f07e drm/i915: Eliminate pll->state usage from bxt_calc_pll_link()
0fce04c8764b drm/i915: Nuke intel_ddi_get_encoder_port()
7e732cacb1ae drm/i915: Stop frobbing with DDI encoder->type
e1214b95ed83 drm/i915: Populate output_types from .get_config()
d6038611aa3d drm/i915: Parse max HDMI TMDS clock from VBT
6e8fbf8d19e4 drm/i915/vbt: Fix HDMI level shifter and max data rate bitfield sizes
2dd14a4ad87f drm-tip: 2017y-10m-30d-13h-40m-22s UTC integration manifest
0b07194bb55e Linux 4.14-rc7

From the rc7 patchset, my guess would be
0cc2b4e5a020 PM / QoS: Fix device resume latency PM QoS

Anyway, someone has a fun bisect task on their hang; it should be very
quick.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* ✗ Fi.CI.IGT: warning for GuC based reset engine
  2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
                   ` (3 preceding siblings ...)
  2017-10-30 20:05 ` ✓ Fi.CI.BAT: success for GuC based reset engine Patchwork
@ 2017-10-30 23:20 ` Patchwork
  2017-10-31 10:20   ` Chris Wilson
  2017-10-31 23:50 ` ✓ Fi.CI.BAT: success for GuC based reset engine (rev2) Patchwork
  2017-11-01  0:59 ` ✗ Fi.CI.IGT: failure " Patchwork
  6 siblings, 1 reply; 22+ messages in thread
From: Patchwork @ 2017-10-30 23:20 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

== Series Details ==

Series: GuC based reset engine
URL   : https://patchwork.freedesktop.org/series/32859/
State : warning

== Summary ==

Test drv_module_reload:
        Subgroup basic-no-display:
                pass       -> DMESG-WARN (shard-hsw) fdo#102707
Test gem_softpin:
        Subgroup noreloc-S3:
                pass       -> SKIP       (shard-hsw)

fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707

shard-hsw        total:2539 pass:1390 dwarn:2   dfail:0   fail:52  skip:1095 time:9075s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
  2017-10-30 21:09   ` Chris Wilson
@ 2017-10-31  4:38     ` Michel Thierry
  2017-10-31 10:17       ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-10-31  4:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 30/10/17 14:09, Chris Wilson wrote:
> Quoting Michel Thierry (2017-10-30 18:56:15)
>> This patch adds per engine reset and recovery (TDR) support when GuC is
>> used to submit workloads to GPU.
>>
>> In the case of i915 directly submission to ELSP, driver manages hang
>> detection, recovery and resubmission. With GuC submission these tasks
>> are shared between driver and GuC. i915 is still responsible for detecting
>> a hang, and when it does it only requests GuC to reset that Engine. GuC
>> internally manages acquiring forcewake and idling the engine before actually
>> resetting it.
>>
>> Once the reset is successful, i915 takes over again and handles resubmission.
>> The scheduler in i915 knows which requests are pending so after resetting
>> a engine, pending workloads/requests are resubmitted again.
>>
>> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
>> non-guc funtion names.
>>
>> v3: Removed debug message about engine restarting from which request,
>> since the new baseline do it regardless of submission mode. (Chris)
>>
>> v4: Rebase.
>>
>> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
>> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> 
> In your experience, how did our test coverage fare?
> 
> Could you use live_hangcheck effectively? (The drv_selftest would need
> some hand holding to pass along guc options. But for livetesting we
> should probably get to the point of being able to load/unload the guc
> interface so that we cover both execlists and guc.) Did you find
> "gem_exec_whisper --r hang*", did you try gem_concurrent_all?
>   

live_hangcheck runs ok with guc (as long as i915_params.h has guc 
submission enabled). Do you see a benefit on adding an option in 
drv_selftest to override the submission mode? I can add it to my list.

You got me in gem_concurrent_all, I forgot to schedule it a few weeks ago.

>> +/**
>> + * intel_guc_reset_engine() - ask GuC to reset an engine
>> + * @engine:    engine to be reset
>> + */
>> +int intel_guc_reset_engine(struct intel_engine_cs *engine)
>> +{
>> +       struct drm_i915_private *dev_priv = engine->i915;
>> +       struct intel_guc *guc = &dev_priv->guc;
>> +       u32 data[7];
>> +
>> +       GEM_BUG_ON(!guc->execbuf_client);
>> +
>> +       data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
>> +       data[1] = engine->guc_id;
>> +       data[2] = 0;
>> +       data[3] = 0;
>> +       data[4] = 0;
>> +       data[5] = guc->execbuf_client->stage_id;
>> +       data[6] = guc_ggtt_offset(guc->shared_data);
>> +
>> +       return intel_guc_send(guc, data, ARRAY_SIZE(data));
> 
> Is this a synchronous action? We expect that following the completion of
> the reset routine, we are ready to reinit the hw. The same rule needs to
> apply the guc, I think.

Right now the action is synchronous, the fw won't reply to the action 
until all the steps are completed. It also is fast enough, I haven't 
seen it time out (which would be promoted to full reset and reload the 
fw). But, do you have a crystal ball?

> -Chris
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
  2017-10-31  4:38     ` Michel Thierry
@ 2017-10-31 10:17       ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-10-31 10:17 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Quoting Michel Thierry (2017-10-31 04:38:30)
> On 30/10/17 14:09, Chris Wilson wrote:
> > Quoting Michel Thierry (2017-10-30 18:56:15)
> >> This patch adds per engine reset and recovery (TDR) support when GuC is
> >> used to submit workloads to GPU.
> >>
> >> In the case of i915 directly submission to ELSP, driver manages hang
> >> detection, recovery and resubmission. With GuC submission these tasks
> >> are shared between driver and GuC. i915 is still responsible for detecting
> >> a hang, and when it does it only requests GuC to reset that Engine. GuC
> >> internally manages acquiring forcewake and idling the engine before actually
> >> resetting it.
> >>
> >> Once the reset is successful, i915 takes over again and handles resubmission.
> >> The scheduler in i915 knows which requests are pending so after resetting
> >> a engine, pending workloads/requests are resubmitted again.
> >>
> >> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> >> non-guc funtion names.
> >>
> >> v3: Removed debug message about engine restarting from which request,
> >> since the new baseline do it regardless of submission mode. (Chris)
> >>
> >> v4: Rebase.
> >>
> >> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> >> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> > 
> > In your experience, how did our test coverage fare?
> > 
> > Could you use live_hangcheck effectively? (The drv_selftest would need
> > some hand holding to pass along guc options. But for livetesting we
> > should probably get to the point of being able to load/unload the guc
> > interface so that we cover both execlists and guc.) Did you find
> > "gem_exec_whisper --r hang*", did you try gem_concurrent_all?
> >   
> 
> live_hangcheck runs ok with guc (as long as i915_params.h has guc 
> submission enabled). Do you see a benefit on adding an option in 
> drv_selftest to override the submission mode? I can add it to my list.

I'd rather take the path of loading/unloading guc in the kernel to
ensure we have coverage of both on all platforms where that makes sense.
That way the kernel remains in control of the coverage it deems
necessary.

The alternative would be a for_each_param_value() loop in drv_selftests
which doesn't inspired me with forward/backward testing confidence.

For the time being, knowing that you did check live_hangcheck is enough
for me to cross it off the checklist.

The larger question behind "how did our test coverage fare" is did it
find any bugs during development? If not, we need more tests ;)
 
> You got me in gem_concurrent_all, I forgot to schedule it a few weeks ago.
> 
> >> +/**
> >> + * intel_guc_reset_engine() - ask GuC to reset an engine
> >> + * @engine:    engine to be reset
> >> + */
> >> +int intel_guc_reset_engine(struct intel_engine_cs *engine)
> >> +{
> >> +       struct drm_i915_private *dev_priv = engine->i915;
> >> +       struct intel_guc *guc = &dev_priv->guc;
> >> +       u32 data[7];
> >> +
> >> +       GEM_BUG_ON(!guc->execbuf_client);
> >> +
> >> +       data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
> >> +       data[1] = engine->guc_id;
> >> +       data[2] = 0;
> >> +       data[3] = 0;
> >> +       data[4] = 0;
> >> +       data[5] = guc->execbuf_client->stage_id;
> >> +       data[6] = guc_ggtt_offset(guc->shared_data);
> >> +
> >> +       return intel_guc_send(guc, data, ARRAY_SIZE(data));
> > 
> > Is this a synchronous action? We expect that following the completion of
> > the reset routine, we are ready to reinit the hw. The same rule needs to
> > apply the guc, I think.
> 
> Right now the action is synchronous, the fw won't reply to the action 
> until all the steps are completed. It also is fast enough, I haven't 
> seen it time out (which would be promoted to full reset and reload the 
> fw). But, do you have a crystal ball?

Just REQUEST is a red flag that there may be an asynchronous REPLY/ACK.

Aside: we need to start planning/adding fault_injection tests.
intel_guc_send() seems a good choice for an if(should_fail()).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ✗ Fi.CI.IGT: warning for GuC based reset engine
  2017-10-30 23:20 ` ✗ Fi.CI.IGT: warning " Patchwork
@ 2017-10-31 10:20   ` Chris Wilson
  2017-10-31 20:56     ` Michel Thierry
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-10-31 10:20 UTC (permalink / raw)
  To: Patchwork, Michel Thierry; +Cc: intel-gfx

Quoting Patchwork (2017-10-30 23:20:13)
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shards.html

https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shard-kbl1/igt@prime_busy@wait-hang-render.html
is suspicious.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ✗ Fi.CI.IGT: warning for GuC based reset engine
  2017-10-31 10:20   ` Chris Wilson
@ 2017-10-31 20:56     ` Michel Thierry
  2017-10-31 21:31       ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-10-31 20:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Patchwork

On 10/31/2017 3:20 AM, Chris Wilson wrote:
> Quoting Patchwork (2017-10-30 23:20:13)
>> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shards.html
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shard-kbl1/igt@prime_busy@wait-hang-render.html
> is suspicious.
> -Chris
> 

It is suspicious (I just double checked that prime_busy runs fine in my 
skl). I don't know if the log is complete or it was killed before 
starting the subtest. Instead of:
[  ] [IGT] prime_busy: executing
[  ] [IGT] prime_busy: starting subtest wait-hang-render

the log only has:

<7>[   59.430514] [IGT] kms_plane_lowres: exiting, ret=77
<7>[   59.522502] [IGT] prime_busy: executing
������������������������������

(and a few tests were returning EBADFD).
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ✗ Fi.CI.IGT: warning for GuC based reset engine
  2017-10-31 20:56     ` Michel Thierry
@ 2017-10-31 21:31       ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-10-31 21:31 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx, Patchwork

Quoting Michel Thierry (2017-10-31 20:56:03)
> On 10/31/2017 3:20 AM, Chris Wilson wrote:
> > Quoting Patchwork (2017-10-30 23:20:13)
> >> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shards.html
> > 
> > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shard-kbl1/igt@prime_busy@wait-hang-render.html
> > is suspicious.
> > -Chris
> > 
> 
> It is suspicious (I just double checked that prime_busy runs fine in my 
> skl). I don't know if the log is complete or it was killed before 
> starting the subtest. Instead of:
> [  ] [IGT] prime_busy: executing
> [  ] [IGT] prime_busy: starting subtest wait-hang-render
> 
> the log only has:
> 
> <7>[   59.430514] [IGT] kms_plane_lowres: exiting, ret=77
> <7>[   59.522502] [IGT] prime_busy: executing
> ������������������������������

Yup, that just tells us that the machine was rebooted before the dmesg
was written to disk. Usually a sign of a panic, but no pstore oops.

It was only that this purported to be a hang test, and until very
recently broke intel_gpu_reset, that I worried.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands
  2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
  2017-10-30 20:58   ` Chris Wilson
  2017-10-30 21:09   ` Chris Wilson
@ 2017-10-31 22:53   ` Michel Thierry
  2017-11-01 13:58     ` Chris Wilson
  2 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-10-31 22:53 UTC (permalink / raw)
  To: intel-gfx

This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.

In the case of i915 directly submission to ELSP, driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before
resetting it.

Once the reset is successful, i915 takes over again and handles the
resubmission. The scheduler in i915 knows which requests are pending so
after resetting a engine, pending workloads/requests are resubmitted
again.

v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc function names.

v3: Removed debug message about engine restarting from which request,
since the new baseline do it regardless of submission mode. (Chris)

v4: Rebase.

v5: Do not pass unnecessary reporting flags to the fw (Jeff);
tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.

v6: Rename the existing reset engine function and share a similar
interface between guc and non-guc paths (Chris).

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c       | 15 +++++++++++++--
 drivers/gpu/drm/i915/i915_drv.h       |  2 ++
 drivers/gpu/drm/i915/intel_guc.c      | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_fwif.h |  1 +
 drivers/gpu/drm/i915/intel_uncore.c   |  5 -----
 5 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index af745749509c..359333a423cf 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
 	goto finish;
 }
 
+static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
+					struct intel_engine_cs *engine)
+{
+	return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
+}
+
 /**
  * i915_reset_engine - reset GPU engine to recover from a hang
  * @engine: engine to reset
@@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
 		goto out;
 	}
 
-	ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
+	if (!engine->i915->guc.execbuf_client)
+		ret = intel_gt_reset_engine(engine->i915, engine);
+	else
+		ret = intel_guc_reset_engine(&engine->i915->guc, engine);
+
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
+		DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
+				 (engine->i915->guc.execbuf_client ? "GUC ":""),
 				 engine->name, ret);
 		goto out;
 	}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cff1b57598c3..ce2725696187 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3330,6 +3330,8 @@ extern int i915_reset_engine(struct intel_engine_cs *engine,
 
 extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
 extern int intel_reset_guc(struct drm_i915_private *dev_priv);
+extern int intel_guc_reset_engine(struct intel_guc *guc,
+				  struct intel_engine_cs *engine);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index f74d50fdaeb0..9678630a1c70 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -24,6 +24,7 @@
 
 #include "intel_guc.h"
 #include "i915_drv.h"
+#include "i915_guc_submission.h"
 
 static void gen8_guc_raise_irq(struct intel_guc *guc)
 {
@@ -283,6 +284,29 @@ int intel_guc_suspend(struct drm_i915_private *dev_priv)
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
 
+/**
+ * intel_guc_reset_engine() - ask GuC to reset an engine
+ * @guc:	intel_guc structure
+ * @engine:	engine to be reset
+ */
+int intel_guc_reset_engine(struct intel_guc *guc,
+			   struct intel_engine_cs *engine)
+{
+	u32 data[7];
+
+	GEM_BUG_ON(!guc->execbuf_client);
+
+	data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
+	data[1] = engine->guc_id;
+	data[2] = 0;
+	data[3] = 0;
+	data[4] = 0;
+	data[5] = guc->execbuf_client->stage_id;
+	data[6] = guc_ggtt_offset(guc->shared_data);
+
+	return intel_guc_send(guc, data, ARRAY_SIZE(data));
+}
+
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @dev_priv:	i915 device private
diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
index e24dbec2ead8..6a10aa6f04d3 100644
--- a/drivers/gpu/drm/i915/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
@@ -574,6 +574,7 @@ struct guc_shared_ctx_data {
 enum intel_guc_action {
 	INTEL_GUC_ACTION_DEFAULT = 0x0,
 	INTEL_GUC_ACTION_REQUEST_PREEMPTION = 0x2,
+	INTEL_GUC_ACTION_REQUEST_ENGINE_RESET = 0x3,
 	INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x6,
 	INTEL_GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
 	INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d629b2e40013..0564d87ad416 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1792,14 +1792,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
 	return intel_get_gpu_reset(dev_priv) != NULL;
 }
 
-/*
- * When GuC submission is enabled, GuC manages ELSP and can initiate the
- * engine reset too. For now, fall back to full GPU reset if it is enabled.
- */
 bool intel_has_reset_engine(struct drm_i915_private *dev_priv)
 {
 	return (dev_priv->info.has_reset_engine &&
-		!dev_priv->guc.execbuf_client &&
 		i915_modparams.reset >= 2);
 }
 
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* ✓ Fi.CI.BAT: success for GuC based reset engine (rev2)
  2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
                   ` (4 preceding siblings ...)
  2017-10-30 23:20 ` ✗ Fi.CI.IGT: warning " Patchwork
@ 2017-10-31 23:50 ` Patchwork
  2017-11-01  0:59 ` ✗ Fi.CI.IGT: failure " Patchwork
  6 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2017-10-31 23:50 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

== Series Details ==

Series: GuC based reset engine (rev2)
URL   : https://patchwork.freedesktop.org/series/32859/
State : success

== Summary ==

Series 32859v2 GuC based reset engine
https://patchwork.freedesktop.org/api/1.0/series/32859/revisions/2/mbox/

Test chamelium:
        Subgroup dp-crc-fast:
                fail       -> PASS       (fi-kbl-7500u) fdo#102514
Test drv_module_reload:
        Subgroup basic-no-display:
                fail       -> PASS       (fi-hsw-4770r)

fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:450s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:453s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:380s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:553s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:276s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:515s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:507s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:513s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:492s
fi-cfl-s         total:289  pass:253  dwarn:4   dfail:0   fail:0   skip:32  time:557s
fi-cnl-y         total:217  pass:196  dwarn:0   dfail:0   fail:0   skip:20 
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:435s
fi-gdg-551       total:289  pass:178  dwarn:1   dfail:0   fail:1   skip:109 time:265s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:587s
fi-glk-dsi       total:289  pass:258  dwarn:0   dfail:0   fail:1   skip:30  time:495s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:435s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:426s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:425s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:506s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:465s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:489s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:574s
fi-kbl-7567u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:472s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:584s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:569s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:455s
fi-skl-6600u     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:591s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:653s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:521s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:496s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:463s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:567s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:421s

98f067e087ab9294555a9bdd6a056c4156777117 drm-tip: 2017y-10m-31d-20h-56m-13s UTC integration manifest
2fda8dacaf77 HAX enable GuC submission for CI
2f3b79915935 drm/i915/guc: Add support for reset engine using GuC commands
614ac0cbdb82 drm/i915/guc: Rename the function that resets the GuC

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6284/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* ✗ Fi.CI.IGT: failure for GuC based reset engine (rev2)
  2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
                   ` (5 preceding siblings ...)
  2017-10-31 23:50 ` ✓ Fi.CI.BAT: success for GuC based reset engine (rev2) Patchwork
@ 2017-11-01  0:59 ` Patchwork
  6 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2017-11-01  0:59 UTC (permalink / raw)
  To: Michel Thierry; +Cc: intel-gfx

== Series Details ==

Series: GuC based reset engine (rev2)
URL   : https://patchwork.freedesktop.org/series/32859/
State : failure

== Summary ==

Test drv_module_reload:
        Subgroup basic-reload-inject:
                pass       -> DMESG-WARN (shard-hsw) fdo#102707
Test kms_sysfs_edid_timing:
                pass       -> FAIL       (shard-hsw) fdo#100047
Test kms_flip:
        Subgroup flip-vs-dpms-interruptible:
                pass       -> INCOMPLETE (shard-hsw)

fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707
fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047

shard-hsw        total:2536 pass:1426 dwarn:2   dfail:0   fail:10  skip:1097 time:9009s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6284/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands
  2017-10-31 22:53   ` [PATCH v6] " Michel Thierry
@ 2017-11-01 13:58     ` Chris Wilson
  2017-11-01 20:41       ` Jeff McGee
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-11-01 13:58 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Quoting Michel Thierry (2017-10-31 22:53:09)
> This patch adds per engine reset and recovery (TDR) support when GuC is
> used to submit workloads to GPU.
> 
> In the case of i915 directly submission to ELSP, driver manages hang
> detection, recovery and resubmission. With GuC submission these tasks
> are shared between driver and GuC. i915 is still responsible for detecting
> a hang, and when it does it only requests GuC to reset that Engine. GuC
> internally manages acquiring forcewake and idling the engine before
> resetting it.
> 
> Once the reset is successful, i915 takes over again and handles the
> resubmission. The scheduler in i915 knows which requests are pending so
> after resetting a engine, pending workloads/requests are resubmitted
> again.
> 
> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> non-guc function names.
> 
> v3: Removed debug message about engine restarting from which request,
> since the new baseline do it regardless of submission mode. (Chris)
> 
> v4: Rebase.
> 
> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> 
> v6: Rename the existing reset engine function and share a similar
> interface between guc and non-guc paths (Chris).
> 
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.c       | 15 +++++++++++++--
>  drivers/gpu/drm/i915/i915_drv.h       |  2 ++
>  drivers/gpu/drm/i915/intel_guc.c      | 24 ++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_guc_fwif.h |  1 +
>  drivers/gpu/drm/i915/intel_uncore.c   |  5 -----
>  5 files changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index af745749509c..359333a423cf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>         goto finish;
>  }
>  
> +static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
> +                                       struct intel_engine_cs *engine)
> +{
> +       return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> +}
> +
>  /**
>   * i915_reset_engine - reset GPU engine to recover from a hang
>   * @engine: engine to reset
> @@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
>                 goto out;
>         }
>  
> -       ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> +       if (!engine->i915->guc.execbuf_client)
> +               ret = intel_gt_reset_engine(engine->i915, engine);
> +       else
> +               ret = intel_guc_reset_engine(&engine->i915->guc, engine);
> +
>         if (ret) {
>                 /* If we fail here, we expect to fallback to a global reset */
> -               DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
> +               DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> +                                (engine->i915->guc.execbuf_client ? "GUC ":""),

A bit overkill on the parentheses there ;)

Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
interaction?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands
  2017-11-01 13:58     ` Chris Wilson
@ 2017-11-01 20:41       ` Jeff McGee
  2017-11-02  8:43         ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff McGee @ 2017-11-01 20:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Nov 01, 2017 at 01:58:04PM +0000, Chris Wilson wrote:
> Quoting Michel Thierry (2017-10-31 22:53:09)
> > This patch adds per engine reset and recovery (TDR) support when GuC is
> > used to submit workloads to GPU.
> > 
> > In the case of i915 directly submission to ELSP, driver manages hang
> > detection, recovery and resubmission. With GuC submission these tasks
> > are shared between driver and GuC. i915 is still responsible for detecting
> > a hang, and when it does it only requests GuC to reset that Engine. GuC
> > internally manages acquiring forcewake and idling the engine before
> > resetting it.
> > 
> > Once the reset is successful, i915 takes over again and handles the
> > resubmission. The scheduler in i915 knows which requests are pending so
> > after resetting a engine, pending workloads/requests are resubmitted
> > again.
> > 
> > v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> > non-guc function names.
> > 
> > v3: Removed debug message about engine restarting from which request,
> > since the new baseline do it regardless of submission mode. (Chris)
> > 
> > v4: Rebase.
> > 
> > v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> > tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> > 
> > v6: Rename the existing reset engine function and share a similar
> > interface between guc and non-guc paths (Chris).
> > 
> > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c       | 15 +++++++++++++--
> >  drivers/gpu/drm/i915/i915_drv.h       |  2 ++
> >  drivers/gpu/drm/i915/intel_guc.c      | 24 ++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/intel_guc_fwif.h |  1 +
> >  drivers/gpu/drm/i915/intel_uncore.c   |  5 -----
> >  5 files changed, 40 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index af745749509c..359333a423cf 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> >         goto finish;
> >  }
> >  
> > +static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
> > +                                       struct intel_engine_cs *engine)
> > +{
> > +       return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> > +}
> > +
> >  /**
> >   * i915_reset_engine - reset GPU engine to recover from a hang
> >   * @engine: engine to reset
> > @@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
> >                 goto out;
> >         }
> >  
> > -       ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> > +       if (!engine->i915->guc.execbuf_client)
> > +               ret = intel_gt_reset_engine(engine->i915, engine);
> > +       else
> > +               ret = intel_guc_reset_engine(&engine->i915->guc, engine);
> > +
> >         if (ret) {
> >                 /* If we fail here, we expect to fallback to a global reset */
> > -               DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
> > +               DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> > +                                (engine->i915->guc.execbuf_client ? "GUC ":""),
> 
> A bit overkill on the parentheses there ;)
> 
> Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
> interaction?
> -Chris

There is one small change needed in the GuC preemption protocol to make it
compatible with GuC engine reset. I will send that shortly.

There are also a couple of corner case bugs with engine reset in our current
firmware versions. We are planning a firmware update to address those. But
the host-side code here is fine. So...

Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands
  2017-11-01 20:41       ` Jeff McGee
@ 2017-11-02  8:43         ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-11-02  8:43 UTC (permalink / raw)
  To: Jeff McGee; +Cc: intel-gfx

Quoting Jeff McGee (2017-11-01 20:41:13)
> On Wed, Nov 01, 2017 at 01:58:04PM +0000, Chris Wilson wrote:
> > Quoting Michel Thierry (2017-10-31 22:53:09)
> > > This patch adds per engine reset and recovery (TDR) support when GuC is
> > > used to submit workloads to GPU.
> > > 
> > > In the case of i915 directly submission to ELSP, driver manages hang
> > > detection, recovery and resubmission. With GuC submission these tasks
> > > are shared between driver and GuC. i915 is still responsible for detecting
> > > a hang, and when it does it only requests GuC to reset that Engine. GuC
> > > internally manages acquiring forcewake and idling the engine before
> > > resetting it.
> > > 
> > > Once the reset is successful, i915 takes over again and handles the
> > > resubmission. The scheduler in i915 knows which requests are pending so
> > > after resetting a engine, pending workloads/requests are resubmitted
> > > again.
> > > 
> > > v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> > > non-guc function names.
> > > 
> > > v3: Removed debug message about engine restarting from which request,
> > > since the new baseline do it regardless of submission mode. (Chris)
> > > 
> > > v4: Rebase.
> > > 
> > > v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> > > tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> > > 
> > > v6: Rename the existing reset engine function and share a similar
> > > interface between guc and non-guc paths (Chris).
> > > 
> > > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.c       | 15 +++++++++++++--
> > >  drivers/gpu/drm/i915/i915_drv.h       |  2 ++
> > >  drivers/gpu/drm/i915/intel_guc.c      | 24 ++++++++++++++++++++++++
> > >  drivers/gpu/drm/i915/intel_guc_fwif.h |  1 +
> > >  drivers/gpu/drm/i915/intel_uncore.c   |  5 -----
> > >  5 files changed, 40 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > > index af745749509c..359333a423cf 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > @@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> > >         goto finish;
> > >  }
> > >  
> > > +static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
> > > +                                       struct intel_engine_cs *engine)
> > > +{
> > > +       return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> > > +}
> > > +
> > >  /**
> > >   * i915_reset_engine - reset GPU engine to recover from a hang
> > >   * @engine: engine to reset
> > > @@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
> > >                 goto out;
> > >         }
> > >  
> > > -       ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> > > +       if (!engine->i915->guc.execbuf_client)
> > > +               ret = intel_gt_reset_engine(engine->i915, engine);
> > > +       else
> > > +               ret = intel_guc_reset_engine(&engine->i915->guc, engine);
> > > +
> > >         if (ret) {
> > >                 /* If we fail here, we expect to fallback to a global reset */
> > > -               DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
> > > +               DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> > > +                                (engine->i915->guc.execbuf_client ? "GUC ":""),
> > 
> > A bit overkill on the parentheses there ;)
> > 
> > Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
> > interaction?
> > -Chris
> 
> There is one small change needed in the GuC preemption protocol to make it
> compatible with GuC engine reset. I will send that shortly.
> 
> There are also a couple of corner case bugs with engine reset in our current
> firmware versions. We are planning a firmware update to address those. But
> the host-side code here is fine. So...
> 
> Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>

Pushed this series, along with the preempt interaction fix.
Thanks,
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-11-02  8:43 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
2017-10-30 21:02   ` Chris Wilson
2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
2017-10-30 20:58   ` Chris Wilson
2017-10-30 21:08     ` Michel Thierry
2017-10-30 21:09   ` Chris Wilson
2017-10-31  4:38     ` Michel Thierry
2017-10-31 10:17       ` Chris Wilson
2017-10-31 22:53   ` [PATCH v6] " Michel Thierry
2017-11-01 13:58     ` Chris Wilson
2017-11-01 20:41       ` Jeff McGee
2017-11-02  8:43         ` Chris Wilson
2017-10-30 18:56 ` [PATCH 3/3] HAX enable GuC submission for CI Michel Thierry
2017-10-30 20:05 ` ✓ Fi.CI.BAT: success for GuC based reset engine Patchwork
2017-10-30 21:14   ` Chris Wilson
2017-10-30 23:20 ` ✗ Fi.CI.IGT: warning " Patchwork
2017-10-31 10:20   ` Chris Wilson
2017-10-31 20:56     ` Michel Thierry
2017-10-31 21:31       ` Chris Wilson
2017-10-31 23:50 ` ✓ Fi.CI.BAT: success for GuC based reset engine (rev2) Patchwork
2017-11-01  0:59 ` ✗ Fi.CI.IGT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.