* [PATCH 0/3] GuC based reset engine
@ 2017-10-30 18:56 Michel Thierry
2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
` (6 more replies)
0 siblings, 7 replies; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 18:56 UTC (permalink / raw)
To: intel-gfx
We've been supporting reset-engine in execlist submission mode for a
while, but with GuC, the resubmission path had to be different because we
used to re-enable the engines before GuC... so we've been using full gpu
reset when GuC submission is enabled (which reset the fw).
Thanks to Michal Winiarski GuC preemption changes, the firmware is now
enabled before restarting the engines, so the replay part after
resetting an engine is no longer different.
The only change now is that the reset-engine is requested to the
firmware via a H2G command, and GuC is the one in charge of acquiring the
forcewake and idling the engine before reseting it.
(The first patch is a cosmetic change, which I'm sure Tvrtko doesn't
remember he reviewed [1]).
[1] https://lists.freedesktop.org/archives/intel-gfx/2017-April/126813.html
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Michal Wajdeczko (1):
HAX enable GuC submission for CI
Michel Thierry (2):
drm/i915/guc: Rename the function that resets the GuC
drm/i915/guc: Add support for reset engine using GuC commands
drivers/gpu/drm/i915/i915_drv.c | 9 +++++++--
drivers/gpu/drm/i915/i915_drv.h | 3 ++-
drivers/gpu/drm/i915/i915_gem_gtt.c | 8 ++------
drivers/gpu/drm/i915/i915_params.h | 4 ++--
drivers/gpu/drm/i915/intel_guc.c | 24 ++++++++++++++++++++++++
drivers/gpu/drm/i915/intel_guc_fwif.h | 1 +
drivers/gpu/drm/i915/intel_uc.c | 4 ++--
drivers/gpu/drm/i915/intel_uncore.c | 7 +------
8 files changed, 41 insertions(+), 19 deletions(-)
--
2.14.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
@ 2017-10-30 18:56 ` Michel Thierry
2017-10-30 21:02 ` Chris Wilson
2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
` (5 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 18:56 UTC (permalink / raw)
To: intel-gfx
intel_guc_reset sounds more like the microcontroller is the one performing
a reset, while in this case is the opposite. intel_reset_guc not only
makes it clearer, it follows the other intel_reset functions available.
v2: Print error message in English (Tvrtko).
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 2 +-
drivers/gpu/drm/i915/intel_uc.c | 4 ++--
drivers/gpu/drm/i915/intel_uncore.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d760f995a22a..0ed06ca07568 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3325,7 +3325,7 @@ extern int i915_reset_engine(struct intel_engine_cs *engine,
unsigned int flags);
extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
-extern int intel_guc_reset(struct drm_i915_private *dev_priv);
+extern int intel_reset_guc(struct drm_i915_private *dev_priv);
extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c
index 25bd162f38d2..aec295470e0d 100644
--- a/drivers/gpu/drm/i915/intel_uc.c
+++ b/drivers/gpu/drm/i915/intel_uc.c
@@ -33,9 +33,9 @@ static int __intel_uc_reset_hw(struct drm_i915_private *dev_priv)
int ret;
u32 guc_status;
- ret = intel_guc_reset(dev_priv);
+ ret = intel_reset_guc(dev_priv);
if (ret) {
- DRM_ERROR("GuC reset failed, ret = %d\n", ret);
+ DRM_ERROR("Failed to reset GuC, ret = %d\n", ret);
return ret;
}
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 96ee6b2754be..d629b2e40013 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1803,7 +1803,7 @@ bool intel_has_reset_engine(struct drm_i915_private *dev_priv)
i915_modparams.reset >= 2);
}
-int intel_guc_reset(struct drm_i915_private *dev_priv)
+int intel_reset_guc(struct drm_i915_private *dev_priv)
{
int ret;
--
2.14.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
@ 2017-10-30 18:56 ` Michel Thierry
2017-10-30 20:58 ` Chris Wilson
` (2 more replies)
2017-10-30 18:56 ` [PATCH 3/3] HAX enable GuC submission for CI Michel Thierry
` (4 subsequent siblings)
6 siblings, 3 replies; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 18:56 UTC (permalink / raw)
To: intel-gfx
This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.
In the case of i915 directly submission to ELSP, driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before actually
resetting it.
Once the reset is successful, i915 takes over again and handles resubmission.
The scheduler in i915 knows which requests are pending so after resetting
a engine, pending workloads/requests are resubmitted again.
v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc funtion names.
v3: Removed debug message about engine restarting from which request,
since the new baseline do it regardless of submission mode. (Chris)
v4: Rebase.
v5: Do not pass unnecessary reporting flags to the fw (Jeff);
tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_drv.c | 9 +++++++--
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/intel_guc.c | 24 ++++++++++++++++++++++++
drivers/gpu/drm/i915/intel_guc_fwif.h | 1 +
drivers/gpu/drm/i915/intel_uncore.c | 5 -----
5 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index af745749509c..02fb35744f66 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1984,10 +1984,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
goto out;
}
- ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
+ if (!engine->i915->guc.execbuf_client)
+ ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
+ else
+ ret = intel_guc_reset_engine(engine);
+
if (ret) {
/* If we fail here, we expect to fallback to a global reset */
- DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
+ DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
+ (engine->i915->guc.execbuf_client ? "GUC " : ""),
engine->name, ret);
goto out;
}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0ed06ca07568..b77fd9720ec2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3326,6 +3326,7 @@ extern int i915_reset_engine(struct intel_engine_cs *engine,
extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
extern int intel_reset_guc(struct drm_i915_private *dev_priv);
+extern int intel_guc_reset_engine(struct intel_engine_cs *engine);
extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index f74d50fdaeb0..f7b9b6b91499 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -24,6 +24,7 @@
#include "intel_guc.h"
#include "i915_drv.h"
+#include "i915_guc_submission.h"
static void gen8_guc_raise_irq(struct intel_guc *guc)
{
@@ -283,6 +284,29 @@ int intel_guc_suspend(struct drm_i915_private *dev_priv)
return intel_guc_send(guc, data, ARRAY_SIZE(data));
}
+/**
+ * intel_guc_reset_engine() - ask GuC to reset an engine
+ * @engine: engine to be reset
+ */
+int intel_guc_reset_engine(struct intel_engine_cs *engine)
+{
+ struct drm_i915_private *dev_priv = engine->i915;
+ struct intel_guc *guc = &dev_priv->guc;
+ u32 data[7];
+
+ GEM_BUG_ON(!guc->execbuf_client);
+
+ data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
+ data[1] = engine->guc_id;
+ data[2] = 0;
+ data[3] = 0;
+ data[4] = 0;
+ data[5] = guc->execbuf_client->stage_id;
+ data[6] = guc_ggtt_offset(guc->shared_data);
+
+ return intel_guc_send(guc, data, ARRAY_SIZE(data));
+}
+
/**
* intel_guc_resume() - notify GuC resuming from suspend state
* @dev_priv: i915 device private
diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
index e24dbec2ead8..6a10aa6f04d3 100644
--- a/drivers/gpu/drm/i915/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
@@ -574,6 +574,7 @@ struct guc_shared_ctx_data {
enum intel_guc_action {
INTEL_GUC_ACTION_DEFAULT = 0x0,
INTEL_GUC_ACTION_REQUEST_PREEMPTION = 0x2,
+ INTEL_GUC_ACTION_REQUEST_ENGINE_RESET = 0x3,
INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x6,
INTEL_GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d629b2e40013..0564d87ad416 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1792,14 +1792,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
return intel_get_gpu_reset(dev_priv) != NULL;
}
-/*
- * When GuC submission is enabled, GuC manages ELSP and can initiate the
- * engine reset too. For now, fall back to full GPU reset if it is enabled.
- */
bool intel_has_reset_engine(struct drm_i915_private *dev_priv)
{
return (dev_priv->info.has_reset_engine &&
- !dev_priv->guc.execbuf_client &&
i915_modparams.reset >= 2);
}
--
2.14.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/3] HAX enable GuC submission for CI
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
@ 2017-10-30 18:56 ` Michel Thierry
2017-10-30 20:05 ` ✓ Fi.CI.BAT: success for GuC based reset engine Patchwork
` (3 subsequent siblings)
6 siblings, 0 replies; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 18:56 UTC (permalink / raw)
To: intel-gfx
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
Also revert ("drm/i915/guc: Assert that we switch between
known ggtt->invalidate functions")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
drivers/gpu/drm/i915/i915_gem_gtt.c | 8 ++------
drivers/gpu/drm/i915/i915_params.h | 4 ++--
2 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5eaa6893daaa..7dcf931abe37 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3568,17 +3568,13 @@ int i915_ggtt_enable_hw(struct drm_i915_private *dev_priv)
void i915_ggtt_enable_guc(struct drm_i915_private *i915)
{
- GEM_BUG_ON(i915->ggtt.invalidate != gen6_ggtt_invalidate);
-
i915->ggtt.invalidate = guc_ggtt_invalidate;
}
void i915_ggtt_disable_guc(struct drm_i915_private *i915)
{
- /* We should only be called after i915_ggtt_enable_guc() */
- GEM_BUG_ON(i915->ggtt.invalidate != guc_ggtt_invalidate);
-
- i915->ggtt.invalidate = gen6_ggtt_invalidate;
+ if (i915->ggtt.invalidate == guc_ggtt_invalidate)
+ i915->ggtt.invalidate = gen6_ggtt_invalidate;
}
void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index c7292268ed43..c38cef07b9fe 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -44,8 +44,8 @@
param(int, disable_power_well, -1) \
param(int, enable_ips, 1) \
param(int, invert_brightness, 0) \
- param(int, enable_guc_loading, 0) \
- param(int, enable_guc_submission, 0) \
+ param(int, enable_guc_loading, 1) \
+ param(int, enable_guc_submission, 1) \
param(int, guc_log_level, -1) \
param(char *, guc_firmware_path, NULL) \
param(char *, huc_firmware_path, NULL) \
--
2.14.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 22+ messages in thread
* ✓ Fi.CI.BAT: success for GuC based reset engine
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
` (2 preceding siblings ...)
2017-10-30 18:56 ` [PATCH 3/3] HAX enable GuC submission for CI Michel Thierry
@ 2017-10-30 20:05 ` Patchwork
2017-10-30 21:14 ` Chris Wilson
2017-10-30 23:20 ` ✗ Fi.CI.IGT: warning " Patchwork
` (2 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: Patchwork @ 2017-10-30 20:05 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
== Series Details ==
Series: GuC based reset engine
URL : https://patchwork.freedesktop.org/series/32859/
State : success
== Summary ==
Series 32859v1 GuC based reset engine
https://patchwork.freedesktop.org/api/1.0/series/32859/revisions/1/mbox/
Test chamelium:
Subgroup dp-crc-fast:
pass -> FAIL (fi-kbl-7500u) fdo#102514
Test gem_exec_suspend:
Subgroup basic-s3:
pass -> DMESG-WARN (fi-cfl-s) fdo#103186
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-c:
incomplete -> PASS (fi-bxt-dsi)
Test pm_rpm:
Subgroup basic-pci-d3-state:
incomplete -> FAIL (fi-bxt-j4205)
fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514
fdo#103186 https://bugs.freedesktop.org/show_bug.cgi?id=103186
fi-bdw-5557u total:289 pass:266 dwarn:0 dfail:0 fail:2 skip:21 time:459s
fi-bdw-gvtdvm total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:454s
fi-blb-e6850 total:289 pass:223 dwarn:1 dfail:0 fail:0 skip:65 time:371s
fi-bsw-n3050 total:289 pass:241 dwarn:0 dfail:0 fail:2 skip:46 time:540s
fi-bwr-2160 total:289 pass:183 dwarn:0 dfail:0 fail:0 skip:106 time:265s
fi-bxt-dsi total:289 pass:257 dwarn:0 dfail:0 fail:2 skip:30 time:517s
fi-bxt-j4205 total:289 pass:258 dwarn:0 dfail:0 fail:2 skip:29 time:520s
fi-byt-j1900 total:289 pass:251 dwarn:1 dfail:0 fail:2 skip:35 time:510s
fi-byt-n2820 total:289 pass:247 dwarn:1 dfail:0 fail:2 skip:39 time:499s
fi-cfl-s total:289 pass:251 dwarn:4 dfail:0 fail:2 skip:32 time:571s
fi-elk-e7500 total:289 pass:229 dwarn:0 dfail:0 fail:0 skip:60 time:423s
fi-gdg-551 total:289 pass:178 dwarn:1 dfail:0 fail:1 skip:109 time:249s
fi-glk-1 total:289 pass:259 dwarn:0 dfail:0 fail:2 skip:28 time:592s
fi-hsw-4770 total:289 pass:260 dwarn:0 dfail:0 fail:2 skip:27 time:444s
fi-hsw-4770r total:289 pass:260 dwarn:0 dfail:0 fail:2 skip:27 time:442s
fi-ilk-650 total:289 pass:228 dwarn:0 dfail:0 fail:0 skip:61 time:421s
fi-ivb-3520m total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:488s
fi-ivb-3770 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:458s
fi-kbl-7500u total:289 pass:261 dwarn:1 dfail:0 fail:3 skip:24 time:500s
fi-kbl-7560u total:289 pass:268 dwarn:0 dfail:0 fail:2 skip:19 time:582s
fi-kbl-7567u total:289 pass:267 dwarn:0 dfail:0 fail:2 skip:20 time:489s
fi-kbl-r total:289 pass:260 dwarn:0 dfail:0 fail:2 skip:27 time:588s
fi-pnv-d510 total:289 pass:222 dwarn:1 dfail:0 fail:0 skip:66 time:537s
fi-skl-6260u total:289 pass:267 dwarn:0 dfail:0 fail:2 skip:20 time:474s
fi-skl-6600u total:289 pass:260 dwarn:0 dfail:0 fail:2 skip:27 time:593s
fi-skl-6700hq total:289 pass:261 dwarn:0 dfail:0 fail:2 skip:26 time:652s
fi-skl-6700k total:289 pass:263 dwarn:0 dfail:0 fail:2 skip:24 time:537s
fi-skl-6770hq total:289 pass:267 dwarn:0 dfail:0 fail:2 skip:20 time:515s
fi-skl-gvtdvm total:289 pass:266 dwarn:0 dfail:0 fail:0 skip:23 time:459s
fi-snb-2520m total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:554s
fi-snb-2600 total:289 pass:249 dwarn:0 dfail:0 fail:0 skip:40 time:415s
302b1723bec5e7f43e423c975b9b87705fb9fdd6 drm-tip: 2017y-10m-30d-18h-28m-39s UTC integration manifest
e25d1ee5b093 HAX enable GuC submission for CI
c3a835c2c17c drm/i915/guc: Add support for reset engine using GuC commands
195f89fa1970 drm/i915/guc: Rename the function that resets the GuC
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
@ 2017-10-30 20:58 ` Chris Wilson
2017-10-30 21:08 ` Michel Thierry
2017-10-30 21:09 ` Chris Wilson
2017-10-31 22:53 ` [PATCH v6] " Michel Thierry
2 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-10-30 20:58 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Quoting Michel Thierry (2017-10-30 18:56:15)
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index af745749509c..02fb35744f66 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1984,10 +1984,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
> goto out;
> }
>
> - ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> + if (!engine->i915->guc.execbuf_client)
> + ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> + else
> + ret = intel_guc_reset_engine(engine);
While redundant, the interface cries out for
intel_guc_reset_engine(guc, engine);
even though, in this case we would be using
intel_guc_reset_engine(&engine->i915->guc, engine);
I would also be tempted to change intel_gpu_reset to
intel_gpu_reset_engine(engine->i915, engine);
intel_gt_reset_engine?
Just trying to make our language consistent along each path.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC
2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
@ 2017-10-30 21:02 ` Chris Wilson
0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-10-30 21:02 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Quoting Michel Thierry (2017-10-30 18:56:14)
> intel_guc_reset sounds more like the microcontroller is the one performing
> a reset, while in this case is the opposite. intel_reset_guc not only
> makes it clearer, it follows the other intel_reset functions available.
>
> v2: Print error message in English (Tvrtko).
>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
2017-10-30 20:58 ` Chris Wilson
@ 2017-10-30 21:08 ` Michel Thierry
0 siblings, 0 replies; 22+ messages in thread
From: Michel Thierry @ 2017-10-30 21:08 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 30/10/17 13:58, Chris Wilson wrote:
> Quoting Michel Thierry (2017-10-30 18:56:15)
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> index af745749509c..02fb35744f66 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -1984,10 +1984,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
>> goto out;
>> }
>>
>> - ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
>> + if (!engine->i915->guc.execbuf_client)
>> + ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
>> + else
>> + ret = intel_guc_reset_engine(engine);
>
> While redundant, the interface cries out for
> intel_guc_reset_engine(guc, engine);
> even though, in this case we would be using
> intel_guc_reset_engine(&engine->i915->guc, engine);
>
I'll change the name and pass the guc as parameter.
> I would also be tempted to change intel_gpu_reset to
> intel_gpu_reset_engine(engine->i915, engine);
>
> intel_gt_reset_engine?
>
And then it becomes a wrapper of intel_gpu_reset? (intel_gpu_reset is
used for full reset path and in older platforms too).
> Just trying to make our language consistent along each path.
>
It's true that it would look better as you suggest:
if (!guc_client)
intel_gt_reset_engine(i915, engine);
else
intel_guc_reset_engine(guc, engine);
Thanks,
-Michel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
2017-10-30 20:58 ` Chris Wilson
@ 2017-10-30 21:09 ` Chris Wilson
2017-10-31 4:38 ` Michel Thierry
2017-10-31 22:53 ` [PATCH v6] " Michel Thierry
2 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-10-30 21:09 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Quoting Michel Thierry (2017-10-30 18:56:15)
> This patch adds per engine reset and recovery (TDR) support when GuC is
> used to submit workloads to GPU.
>
> In the case of i915 directly submission to ELSP, driver manages hang
> detection, recovery and resubmission. With GuC submission these tasks
> are shared between driver and GuC. i915 is still responsible for detecting
> a hang, and when it does it only requests GuC to reset that Engine. GuC
> internally manages acquiring forcewake and idling the engine before actually
> resetting it.
>
> Once the reset is successful, i915 takes over again and handles resubmission.
> The scheduler in i915 knows which requests are pending so after resetting
> a engine, pending workloads/requests are resubmitted again.
>
> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> non-guc funtion names.
>
> v3: Removed debug message about engine restarting from which request,
> since the new baseline do it regardless of submission mode. (Chris)
>
> v4: Rebase.
>
> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
In your experience, how did our test coverage fare?
Could you use live_hangcheck effectively? (The drv_selftest would need
some hand holding to pass along guc options. But for livetesting we
should probably get to the point of being able to load/unload the guc
interface so that we cover both execlists and guc.) Did you find
"gem_exec_whisper --r hang*", did you try gem_concurrent_all?
> +/**
> + * intel_guc_reset_engine() - ask GuC to reset an engine
> + * @engine: engine to be reset
> + */
> +int intel_guc_reset_engine(struct intel_engine_cs *engine)
> +{
> + struct drm_i915_private *dev_priv = engine->i915;
> + struct intel_guc *guc = &dev_priv->guc;
> + u32 data[7];
> +
> + GEM_BUG_ON(!guc->execbuf_client);
> +
> + data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
> + data[1] = engine->guc_id;
> + data[2] = 0;
> + data[3] = 0;
> + data[4] = 0;
> + data[5] = guc->execbuf_client->stage_id;
> + data[6] = guc_ggtt_offset(guc->shared_data);
> +
> + return intel_guc_send(guc, data, ARRAY_SIZE(data));
Is this a synchronous action? We expect that following the completion of
the reset routine, we are ready to reinit the hw. The same rule needs to
apply the guc, I think.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ✓ Fi.CI.BAT: success for GuC based reset engine
2017-10-30 20:05 ` ✓ Fi.CI.BAT: success for GuC based reset engine Patchwork
@ 2017-10-30 21:14 ` Chris Wilson
0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-10-30 21:14 UTC (permalink / raw)
To: Patchwork, Michel Thierry; +Cc: intel-gfx
Quoting Patchwork (2017-10-30 20:05:05)
> == Series Details ==
>
> Series: GuC based reset engine
> URL : https://patchwork.freedesktop.org/series/32859/
> State : success
>
> == Summary ==
>
> Series 32859v1 GuC based reset engine
> https://patchwork.freedesktop.org/api/1.0/series/32859/revisions/1/mbox/
>
> Test chamelium:
> Subgroup dp-crc-fast:
> pass -> FAIL (fi-kbl-7500u) fdo#102514
> Test gem_exec_suspend:
> Subgroup basic-s3:
> pass -> DMESG-WARN (fi-cfl-s) fdo#103186
> Test kms_pipe_crc_basic:
> Subgroup suspend-read-crc-pipe-c:
> incomplete -> PASS (fi-bxt-dsi)
> Test pm_rpm:
> Subgroup basic-pci-d3-state:
> incomplete -> FAIL (fi-bxt-j4205)
35686a44e4b8 drm/i915: Use intel_ddi_get_config() for MST
1939ba51fd05 drm/i915: Pass a crtc state to ddi post_disable from MST code
bb911536f07e drm/i915: Eliminate pll->state usage from bxt_calc_pll_link()
0fce04c8764b drm/i915: Nuke intel_ddi_get_encoder_port()
7e732cacb1ae drm/i915: Stop frobbing with DDI encoder->type
e1214b95ed83 drm/i915: Populate output_types from .get_config()
d6038611aa3d drm/i915: Parse max HDMI TMDS clock from VBT
6e8fbf8d19e4 drm/i915/vbt: Fix HDMI level shifter and max data rate bitfield sizes
2dd14a4ad87f drm-tip: 2017y-10m-30d-13h-40m-22s UTC integration manifest
0b07194bb55e Linux 4.14-rc7
From the rc7 patchset, my guess would be
0cc2b4e5a020 PM / QoS: Fix device resume latency PM QoS
Anyway, someone has a fun bisect task on their hang; it should be very
quick.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* ✗ Fi.CI.IGT: warning for GuC based reset engine
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
` (3 preceding siblings ...)
2017-10-30 20:05 ` ✓ Fi.CI.BAT: success for GuC based reset engine Patchwork
@ 2017-10-30 23:20 ` Patchwork
2017-10-31 10:20 ` Chris Wilson
2017-10-31 23:50 ` ✓ Fi.CI.BAT: success for GuC based reset engine (rev2) Patchwork
2017-11-01 0:59 ` ✗ Fi.CI.IGT: failure " Patchwork
6 siblings, 1 reply; 22+ messages in thread
From: Patchwork @ 2017-10-30 23:20 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
== Series Details ==
Series: GuC based reset engine
URL : https://patchwork.freedesktop.org/series/32859/
State : warning
== Summary ==
Test drv_module_reload:
Subgroup basic-no-display:
pass -> DMESG-WARN (shard-hsw) fdo#102707
Test gem_softpin:
Subgroup noreloc-S3:
pass -> SKIP (shard-hsw)
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707
shard-hsw total:2539 pass:1390 dwarn:2 dfail:0 fail:52 skip:1095 time:9075s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
2017-10-30 21:09 ` Chris Wilson
@ 2017-10-31 4:38 ` Michel Thierry
2017-10-31 10:17 ` Chris Wilson
0 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-10-31 4:38 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 30/10/17 14:09, Chris Wilson wrote:
> Quoting Michel Thierry (2017-10-30 18:56:15)
>> This patch adds per engine reset and recovery (TDR) support when GuC is
>> used to submit workloads to GPU.
>>
>> In the case of i915 directly submission to ELSP, driver manages hang
>> detection, recovery and resubmission. With GuC submission these tasks
>> are shared between driver and GuC. i915 is still responsible for detecting
>> a hang, and when it does it only requests GuC to reset that Engine. GuC
>> internally manages acquiring forcewake and idling the engine before actually
>> resetting it.
>>
>> Once the reset is successful, i915 takes over again and handles resubmission.
>> The scheduler in i915 knows which requests are pending so after resetting
>> a engine, pending workloads/requests are resubmitted again.
>>
>> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
>> non-guc funtion names.
>>
>> v3: Removed debug message about engine restarting from which request,
>> since the new baseline do it regardless of submission mode. (Chris)
>>
>> v4: Rebase.
>>
>> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
>> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
>
> In your experience, how did our test coverage fare?
>
> Could you use live_hangcheck effectively? (The drv_selftest would need
> some hand holding to pass along guc options. But for livetesting we
> should probably get to the point of being able to load/unload the guc
> interface so that we cover both execlists and guc.) Did you find
> "gem_exec_whisper --r hang*", did you try gem_concurrent_all?
>
live_hangcheck runs ok with guc (as long as i915_params.h has guc
submission enabled). Do you see a benefit on adding an option in
drv_selftest to override the submission mode? I can add it to my list.
You got me in gem_concurrent_all, I forgot to schedule it a few weeks ago.
>> +/**
>> + * intel_guc_reset_engine() - ask GuC to reset an engine
>> + * @engine: engine to be reset
>> + */
>> +int intel_guc_reset_engine(struct intel_engine_cs *engine)
>> +{
>> + struct drm_i915_private *dev_priv = engine->i915;
>> + struct intel_guc *guc = &dev_priv->guc;
>> + u32 data[7];
>> +
>> + GEM_BUG_ON(!guc->execbuf_client);
>> +
>> + data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
>> + data[1] = engine->guc_id;
>> + data[2] = 0;
>> + data[3] = 0;
>> + data[4] = 0;
>> + data[5] = guc->execbuf_client->stage_id;
>> + data[6] = guc_ggtt_offset(guc->shared_data);
>> +
>> + return intel_guc_send(guc, data, ARRAY_SIZE(data));
>
> Is this a synchronous action? We expect that following the completion of
> the reset routine, we are ready to reinit the hw. The same rule needs to
> apply the guc, I think.
Right now the action is synchronous, the fw won't reply to the action
until all the steps are completed. It also is fast enough, I haven't
seen it time out (which would be promoted to full reset and reload the
fw). But, do you have a crystal ball?
> -Chris
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands
2017-10-31 4:38 ` Michel Thierry
@ 2017-10-31 10:17 ` Chris Wilson
0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-10-31 10:17 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Quoting Michel Thierry (2017-10-31 04:38:30)
> On 30/10/17 14:09, Chris Wilson wrote:
> > Quoting Michel Thierry (2017-10-30 18:56:15)
> >> This patch adds per engine reset and recovery (TDR) support when GuC is
> >> used to submit workloads to GPU.
> >>
> >> In the case of i915 directly submission to ELSP, driver manages hang
> >> detection, recovery and resubmission. With GuC submission these tasks
> >> are shared between driver and GuC. i915 is still responsible for detecting
> >> a hang, and when it does it only requests GuC to reset that Engine. GuC
> >> internally manages acquiring forcewake and idling the engine before actually
> >> resetting it.
> >>
> >> Once the reset is successful, i915 takes over again and handles resubmission.
> >> The scheduler in i915 knows which requests are pending so after resetting
> >> a engine, pending workloads/requests are resubmitted again.
> >>
> >> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> >> non-guc funtion names.
> >>
> >> v3: Removed debug message about engine restarting from which request,
> >> since the new baseline do it regardless of submission mode. (Chris)
> >>
> >> v4: Rebase.
> >>
> >> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> >> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> >
> > In your experience, how did our test coverage fare?
> >
> > Could you use live_hangcheck effectively? (The drv_selftest would need
> > some hand holding to pass along guc options. But for livetesting we
> > should probably get to the point of being able to load/unload the guc
> > interface so that we cover both execlists and guc.) Did you find
> > "gem_exec_whisper --r hang*", did you try gem_concurrent_all?
> >
>
> live_hangcheck runs ok with guc (as long as i915_params.h has guc
> submission enabled). Do you see a benefit on adding an option in
> drv_selftest to override the submission mode? I can add it to my list.
I'd rather take the path of loading/unloading guc in the kernel to
ensure we have coverage of both on all platforms where that makes sense.
That way the kernel remains in control of the coverage it deems
necessary.
The alternative would be a for_each_param_value() loop in drv_selftests
which doesn't inspired me with forward/backward testing confidence.
For the time being, knowing that you did check live_hangcheck is enough
for me to cross it off the checklist.
The larger question behind "how did our test coverage fare" is did it
find any bugs during development? If not, we need more tests ;)
> You got me in gem_concurrent_all, I forgot to schedule it a few weeks ago.
>
> >> +/**
> >> + * intel_guc_reset_engine() - ask GuC to reset an engine
> >> + * @engine: engine to be reset
> >> + */
> >> +int intel_guc_reset_engine(struct intel_engine_cs *engine)
> >> +{
> >> + struct drm_i915_private *dev_priv = engine->i915;
> >> + struct intel_guc *guc = &dev_priv->guc;
> >> + u32 data[7];
> >> +
> >> + GEM_BUG_ON(!guc->execbuf_client);
> >> +
> >> + data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
> >> + data[1] = engine->guc_id;
> >> + data[2] = 0;
> >> + data[3] = 0;
> >> + data[4] = 0;
> >> + data[5] = guc->execbuf_client->stage_id;
> >> + data[6] = guc_ggtt_offset(guc->shared_data);
> >> +
> >> + return intel_guc_send(guc, data, ARRAY_SIZE(data));
> >
> > Is this a synchronous action? We expect that following the completion of
> > the reset routine, we are ready to reinit the hw. The same rule needs to
> > apply the guc, I think.
>
> Right now the action is synchronous, the fw won't reply to the action
> until all the steps are completed. It also is fast enough, I haven't
> seen it time out (which would be promoted to full reset and reload the
> fw). But, do you have a crystal ball?
Just REQUEST is a red flag that there may be an asynchronous REPLY/ACK.
Aside: we need to start planning/adding fault_injection tests.
intel_guc_send() seems a good choice for an if(should_fail()).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ✗ Fi.CI.IGT: warning for GuC based reset engine
2017-10-30 23:20 ` ✗ Fi.CI.IGT: warning " Patchwork
@ 2017-10-31 10:20 ` Chris Wilson
2017-10-31 20:56 ` Michel Thierry
0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-10-31 10:20 UTC (permalink / raw)
To: Patchwork, Michel Thierry; +Cc: intel-gfx
Quoting Patchwork (2017-10-30 23:20:13)
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shards.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shard-kbl1/igt@prime_busy@wait-hang-render.html
is suspicious.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ✗ Fi.CI.IGT: warning for GuC based reset engine
2017-10-31 10:20 ` Chris Wilson
@ 2017-10-31 20:56 ` Michel Thierry
2017-10-31 21:31 ` Chris Wilson
0 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-10-31 20:56 UTC (permalink / raw)
To: Chris Wilson, intel-gfx, Patchwork
On 10/31/2017 3:20 AM, Chris Wilson wrote:
> Quoting Patchwork (2017-10-30 23:20:13)
>> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shards.html
>
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shard-kbl1/igt@prime_busy@wait-hang-render.html
> is suspicious.
> -Chris
>
It is suspicious (I just double checked that prime_busy runs fine in my
skl). I don't know if the log is complete or it was killed before
starting the subtest. Instead of:
[ ] [IGT] prime_busy: executing
[ ] [IGT] prime_busy: starting subtest wait-hang-render
the log only has:
<7>[ 59.430514] [IGT] kms_plane_lowres: exiting, ret=77
<7>[ 59.522502] [IGT] prime_busy: executing
������������������������������
(and a few tests were returning EBADFD).
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ✗ Fi.CI.IGT: warning for GuC based reset engine
2017-10-31 20:56 ` Michel Thierry
@ 2017-10-31 21:31 ` Chris Wilson
0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-10-31 21:31 UTC (permalink / raw)
To: Michel Thierry, intel-gfx, Patchwork
Quoting Michel Thierry (2017-10-31 20:56:03)
> On 10/31/2017 3:20 AM, Chris Wilson wrote:
> > Quoting Patchwork (2017-10-30 23:20:13)
> >> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shards.html
> >
> > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6267/shard-kbl1/igt@prime_busy@wait-hang-render.html
> > is suspicious.
> > -Chris
> >
>
> It is suspicious (I just double checked that prime_busy runs fine in my
> skl). I don't know if the log is complete or it was killed before
> starting the subtest. Instead of:
> [ ] [IGT] prime_busy: executing
> [ ] [IGT] prime_busy: starting subtest wait-hang-render
>
> the log only has:
>
> <7>[ 59.430514] [IGT] kms_plane_lowres: exiting, ret=77
> <7>[ 59.522502] [IGT] prime_busy: executing
> ������������������������������
Yup, that just tells us that the machine was rebooted before the dmesg
was written to disk. Usually a sign of a panic, but no pstore oops.
It was only that this purported to be a hang test, and until very
recently broke intel_gpu_reset, that I worried.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands
2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
2017-10-30 20:58 ` Chris Wilson
2017-10-30 21:09 ` Chris Wilson
@ 2017-10-31 22:53 ` Michel Thierry
2017-11-01 13:58 ` Chris Wilson
2 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-10-31 22:53 UTC (permalink / raw)
To: intel-gfx
This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.
In the case of i915 directly submission to ELSP, driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before
resetting it.
Once the reset is successful, i915 takes over again and handles the
resubmission. The scheduler in i915 knows which requests are pending so
after resetting a engine, pending workloads/requests are resubmitted
again.
v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc function names.
v3: Removed debug message about engine restarting from which request,
since the new baseline do it regardless of submission mode. (Chris)
v4: Rebase.
v5: Do not pass unnecessary reporting flags to the fw (Jeff);
tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
v6: Rename the existing reset engine function and share a similar
interface between guc and non-guc paths (Chris).
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_drv.c | 15 +++++++++++++--
drivers/gpu/drm/i915/i915_drv.h | 2 ++
drivers/gpu/drm/i915/intel_guc.c | 24 ++++++++++++++++++++++++
drivers/gpu/drm/i915/intel_guc_fwif.h | 1 +
drivers/gpu/drm/i915/intel_uncore.c | 5 -----
5 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index af745749509c..359333a423cf 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
goto finish;
}
+static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
+ struct intel_engine_cs *engine)
+{
+ return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
+}
+
/**
* i915_reset_engine - reset GPU engine to recover from a hang
* @engine: engine to reset
@@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
goto out;
}
- ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
+ if (!engine->i915->guc.execbuf_client)
+ ret = intel_gt_reset_engine(engine->i915, engine);
+ else
+ ret = intel_guc_reset_engine(&engine->i915->guc, engine);
+
if (ret) {
/* If we fail here, we expect to fallback to a global reset */
- DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
+ DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
+ (engine->i915->guc.execbuf_client ? "GUC ":""),
engine->name, ret);
goto out;
}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cff1b57598c3..ce2725696187 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3330,6 +3330,8 @@ extern int i915_reset_engine(struct intel_engine_cs *engine,
extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
extern int intel_reset_guc(struct drm_i915_private *dev_priv);
+extern int intel_guc_reset_engine(struct intel_guc *guc,
+ struct intel_engine_cs *engine);
extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index f74d50fdaeb0..9678630a1c70 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -24,6 +24,7 @@
#include "intel_guc.h"
#include "i915_drv.h"
+#include "i915_guc_submission.h"
static void gen8_guc_raise_irq(struct intel_guc *guc)
{
@@ -283,6 +284,29 @@ int intel_guc_suspend(struct drm_i915_private *dev_priv)
return intel_guc_send(guc, data, ARRAY_SIZE(data));
}
+/**
+ * intel_guc_reset_engine() - ask GuC to reset an engine
+ * @guc: intel_guc structure
+ * @engine: engine to be reset
+ */
+int intel_guc_reset_engine(struct intel_guc *guc,
+ struct intel_engine_cs *engine)
+{
+ u32 data[7];
+
+ GEM_BUG_ON(!guc->execbuf_client);
+
+ data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
+ data[1] = engine->guc_id;
+ data[2] = 0;
+ data[3] = 0;
+ data[4] = 0;
+ data[5] = guc->execbuf_client->stage_id;
+ data[6] = guc_ggtt_offset(guc->shared_data);
+
+ return intel_guc_send(guc, data, ARRAY_SIZE(data));
+}
+
/**
* intel_guc_resume() - notify GuC resuming from suspend state
* @dev_priv: i915 device private
diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
index e24dbec2ead8..6a10aa6f04d3 100644
--- a/drivers/gpu/drm/i915/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
@@ -574,6 +574,7 @@ struct guc_shared_ctx_data {
enum intel_guc_action {
INTEL_GUC_ACTION_DEFAULT = 0x0,
INTEL_GUC_ACTION_REQUEST_PREEMPTION = 0x2,
+ INTEL_GUC_ACTION_REQUEST_ENGINE_RESET = 0x3,
INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x6,
INTEL_GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d629b2e40013..0564d87ad416 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1792,14 +1792,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
return intel_get_gpu_reset(dev_priv) != NULL;
}
-/*
- * When GuC submission is enabled, GuC manages ELSP and can initiate the
- * engine reset too. For now, fall back to full GPU reset if it is enabled.
- */
bool intel_has_reset_engine(struct drm_i915_private *dev_priv)
{
return (dev_priv->info.has_reset_engine &&
- !dev_priv->guc.execbuf_client &&
i915_modparams.reset >= 2);
}
--
2.14.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 22+ messages in thread
* ✓ Fi.CI.BAT: success for GuC based reset engine (rev2)
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
` (4 preceding siblings ...)
2017-10-30 23:20 ` ✗ Fi.CI.IGT: warning " Patchwork
@ 2017-10-31 23:50 ` Patchwork
2017-11-01 0:59 ` ✗ Fi.CI.IGT: failure " Patchwork
6 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2017-10-31 23:50 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
== Series Details ==
Series: GuC based reset engine (rev2)
URL : https://patchwork.freedesktop.org/series/32859/
State : success
== Summary ==
Series 32859v2 GuC based reset engine
https://patchwork.freedesktop.org/api/1.0/series/32859/revisions/2/mbox/
Test chamelium:
Subgroup dp-crc-fast:
fail -> PASS (fi-kbl-7500u) fdo#102514
Test drv_module_reload:
Subgroup basic-no-display:
fail -> PASS (fi-hsw-4770r)
fdo#102514 https://bugs.freedesktop.org/show_bug.cgi?id=102514
fi-bdw-5557u total:289 pass:268 dwarn:0 dfail:0 fail:0 skip:21 time:450s
fi-bdw-gvtdvm total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:453s
fi-blb-e6850 total:289 pass:223 dwarn:1 dfail:0 fail:0 skip:65 time:380s
fi-bsw-n3050 total:289 pass:243 dwarn:0 dfail:0 fail:0 skip:46 time:553s
fi-bwr-2160 total:289 pass:183 dwarn:0 dfail:0 fail:0 skip:106 time:276s
fi-bxt-dsi total:289 pass:259 dwarn:0 dfail:0 fail:0 skip:30 time:515s
fi-bxt-j4205 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:507s
fi-byt-j1900 total:289 pass:253 dwarn:1 dfail:0 fail:0 skip:35 time:513s
fi-byt-n2820 total:289 pass:249 dwarn:1 dfail:0 fail:0 skip:39 time:492s
fi-cfl-s total:289 pass:253 dwarn:4 dfail:0 fail:0 skip:32 time:557s
fi-cnl-y total:217 pass:196 dwarn:0 dfail:0 fail:0 skip:20
fi-elk-e7500 total:289 pass:229 dwarn:0 dfail:0 fail:0 skip:60 time:435s
fi-gdg-551 total:289 pass:178 dwarn:1 dfail:0 fail:1 skip:109 time:265s
fi-glk-1 total:289 pass:261 dwarn:0 dfail:0 fail:0 skip:28 time:587s
fi-glk-dsi total:289 pass:258 dwarn:0 dfail:0 fail:1 skip:30 time:495s
fi-hsw-4770 total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:435s
fi-hsw-4770r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:426s
fi-ilk-650 total:289 pass:228 dwarn:0 dfail:0 fail:0 skip:61 time:425s
fi-ivb-3520m total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:506s
fi-ivb-3770 total:289 pass:260 dwarn:0 dfail:0 fail:0 skip:29 time:465s
fi-kbl-7500u total:289 pass:264 dwarn:1 dfail:0 fail:0 skip:24 time:489s
fi-kbl-7560u total:289 pass:270 dwarn:0 dfail:0 fail:0 skip:19 time:574s
fi-kbl-7567u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:472s
fi-kbl-r total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:584s
fi-pnv-d510 total:289 pass:222 dwarn:1 dfail:0 fail:0 skip:66 time:569s
fi-skl-6260u total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:455s
fi-skl-6600u total:289 pass:262 dwarn:0 dfail:0 fail:0 skip:27 time:591s
fi-skl-6700hq total:289 pass:263 dwarn:0 dfail:0 fail:0 skip:26 time:653s
fi-skl-6700k total:289 pass:265 dwarn:0 dfail:0 fail:0 skip:24 time:521s
fi-skl-6770hq total:289 pass:269 dwarn:0 dfail:0 fail:0 skip:20 time:496s
fi-skl-gvtdvm total:289 pass:266 dwarn:0 dfail:0 fail:0 skip:23 time:463s
fi-snb-2520m total:289 pass:250 dwarn:0 dfail:0 fail:0 skip:39 time:567s
fi-snb-2600 total:289 pass:249 dwarn:0 dfail:0 fail:0 skip:40 time:421s
98f067e087ab9294555a9bdd6a056c4156777117 drm-tip: 2017y-10m-31d-20h-56m-13s UTC integration manifest
2fda8dacaf77 HAX enable GuC submission for CI
2f3b79915935 drm/i915/guc: Add support for reset engine using GuC commands
614ac0cbdb82 drm/i915/guc: Rename the function that resets the GuC
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6284/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* ✗ Fi.CI.IGT: failure for GuC based reset engine (rev2)
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
` (5 preceding siblings ...)
2017-10-31 23:50 ` ✓ Fi.CI.BAT: success for GuC based reset engine (rev2) Patchwork
@ 2017-11-01 0:59 ` Patchwork
6 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2017-11-01 0:59 UTC (permalink / raw)
To: Michel Thierry; +Cc: intel-gfx
== Series Details ==
Series: GuC based reset engine (rev2)
URL : https://patchwork.freedesktop.org/series/32859/
State : failure
== Summary ==
Test drv_module_reload:
Subgroup basic-reload-inject:
pass -> DMESG-WARN (shard-hsw) fdo#102707
Test kms_sysfs_edid_timing:
pass -> FAIL (shard-hsw) fdo#100047
Test kms_flip:
Subgroup flip-vs-dpms-interruptible:
pass -> INCOMPLETE (shard-hsw)
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707
fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
shard-hsw total:2536 pass:1426 dwarn:2 dfail:0 fail:10 skip:1097 time:9009s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6284/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands
2017-10-31 22:53 ` [PATCH v6] " Michel Thierry
@ 2017-11-01 13:58 ` Chris Wilson
2017-11-01 20:41 ` Jeff McGee
0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-11-01 13:58 UTC (permalink / raw)
To: Michel Thierry, intel-gfx
Quoting Michel Thierry (2017-10-31 22:53:09)
> This patch adds per engine reset and recovery (TDR) support when GuC is
> used to submit workloads to GPU.
>
> In the case of i915 directly submission to ELSP, driver manages hang
> detection, recovery and resubmission. With GuC submission these tasks
> are shared between driver and GuC. i915 is still responsible for detecting
> a hang, and when it does it only requests GuC to reset that Engine. GuC
> internally manages acquiring forcewake and idling the engine before
> resetting it.
>
> Once the reset is successful, i915 takes over again and handles the
> resubmission. The scheduler in i915 knows which requests are pending so
> after resetting a engine, pending workloads/requests are resubmitted
> again.
>
> v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> non-guc function names.
>
> v3: Removed debug message about engine restarting from which request,
> since the new baseline do it regardless of submission mode. (Chris)
>
> v4: Rebase.
>
> v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
>
> v6: Rename the existing reset engine function and share a similar
> interface between guc and non-guc paths (Chris).
>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 15 +++++++++++++--
> drivers/gpu/drm/i915/i915_drv.h | 2 ++
> drivers/gpu/drm/i915/intel_guc.c | 24 ++++++++++++++++++++++++
> drivers/gpu/drm/i915/intel_guc_fwif.h | 1 +
> drivers/gpu/drm/i915/intel_uncore.c | 5 -----
> 5 files changed, 40 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index af745749509c..359333a423cf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> goto finish;
> }
>
> +static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
> + struct intel_engine_cs *engine)
> +{
> + return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> +}
> +
> /**
> * i915_reset_engine - reset GPU engine to recover from a hang
> * @engine: engine to reset
> @@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
> goto out;
> }
>
> - ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> + if (!engine->i915->guc.execbuf_client)
> + ret = intel_gt_reset_engine(engine->i915, engine);
> + else
> + ret = intel_guc_reset_engine(&engine->i915->guc, engine);
> +
> if (ret) {
> /* If we fail here, we expect to fallback to a global reset */
> - DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
> + DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> + (engine->i915->guc.execbuf_client ? "GUC ":""),
A bit overkill on the parentheses there ;)
Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
interaction?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands
2017-11-01 13:58 ` Chris Wilson
@ 2017-11-01 20:41 ` Jeff McGee
2017-11-02 8:43 ` Chris Wilson
0 siblings, 1 reply; 22+ messages in thread
From: Jeff McGee @ 2017-11-01 20:41 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
On Wed, Nov 01, 2017 at 01:58:04PM +0000, Chris Wilson wrote:
> Quoting Michel Thierry (2017-10-31 22:53:09)
> > This patch adds per engine reset and recovery (TDR) support when GuC is
> > used to submit workloads to GPU.
> >
> > In the case of i915 directly submission to ELSP, driver manages hang
> > detection, recovery and resubmission. With GuC submission these tasks
> > are shared between driver and GuC. i915 is still responsible for detecting
> > a hang, and when it does it only requests GuC to reset that Engine. GuC
> > internally manages acquiring forcewake and idling the engine before
> > resetting it.
> >
> > Once the reset is successful, i915 takes over again and handles the
> > resubmission. The scheduler in i915 knows which requests are pending so
> > after resetting a engine, pending workloads/requests are resubmitted
> > again.
> >
> > v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> > non-guc function names.
> >
> > v3: Removed debug message about engine restarting from which request,
> > since the new baseline do it regardless of submission mode. (Chris)
> >
> > v4: Rebase.
> >
> > v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> > tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> >
> > v6: Rename the existing reset engine function and share a similar
> > interface between guc and non-guc paths (Chris).
> >
> > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/i915_drv.c | 15 +++++++++++++--
> > drivers/gpu/drm/i915/i915_drv.h | 2 ++
> > drivers/gpu/drm/i915/intel_guc.c | 24 ++++++++++++++++++++++++
> > drivers/gpu/drm/i915/intel_guc_fwif.h | 1 +
> > drivers/gpu/drm/i915/intel_uncore.c | 5 -----
> > 5 files changed, 40 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index af745749509c..359333a423cf 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> > goto finish;
> > }
> >
> > +static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
> > + struct intel_engine_cs *engine)
> > +{
> > + return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> > +}
> > +
> > /**
> > * i915_reset_engine - reset GPU engine to recover from a hang
> > * @engine: engine to reset
> > @@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
> > goto out;
> > }
> >
> > - ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> > + if (!engine->i915->guc.execbuf_client)
> > + ret = intel_gt_reset_engine(engine->i915, engine);
> > + else
> > + ret = intel_guc_reset_engine(&engine->i915->guc, engine);
> > +
> > if (ret) {
> > /* If we fail here, we expect to fallback to a global reset */
> > - DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
> > + DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> > + (engine->i915->guc.execbuf_client ? "GUC ":""),
>
> A bit overkill on the parentheses there ;)
>
> Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
> interaction?
> -Chris
There is one small change needed in the GuC preemption protocol to make it
compatible with GuC engine reset. I will send that shortly.
There are also a couple of corner case bugs with engine reset in our current
firmware versions. We are planning a firmware update to address those. But
the host-side code here is fine. So...
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v6] drm/i915/guc: Add support for reset engine using GuC commands
2017-11-01 20:41 ` Jeff McGee
@ 2017-11-02 8:43 ` Chris Wilson
0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-11-02 8:43 UTC (permalink / raw)
To: Jeff McGee; +Cc: intel-gfx
Quoting Jeff McGee (2017-11-01 20:41:13)
> On Wed, Nov 01, 2017 at 01:58:04PM +0000, Chris Wilson wrote:
> > Quoting Michel Thierry (2017-10-31 22:53:09)
> > > This patch adds per engine reset and recovery (TDR) support when GuC is
> > > used to submit workloads to GPU.
> > >
> > > In the case of i915 directly submission to ELSP, driver manages hang
> > > detection, recovery and resubmission. With GuC submission these tasks
> > > are shared between driver and GuC. i915 is still responsible for detecting
> > > a hang, and when it does it only requests GuC to reset that Engine. GuC
> > > internally manages acquiring forcewake and idling the engine before
> > > resetting it.
> > >
> > > Once the reset is successful, i915 takes over again and handles the
> > > resubmission. The scheduler in i915 knows which requests are pending so
> > > after resetting a engine, pending workloads/requests are resubmitted
> > > again.
> > >
> > > v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
> > > non-guc function names.
> > >
> > > v3: Removed debug message about engine restarting from which request,
> > > since the new baseline do it regardless of submission mode. (Chris)
> > >
> > > v4: Rebase.
> > >
> > > v5: Do not pass unnecessary reporting flags to the fw (Jeff);
> > > tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
> > >
> > > v6: Rename the existing reset engine function and share a similar
> > > interface between guc and non-guc paths (Chris).
> > >
> > > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > > drivers/gpu/drm/i915/i915_drv.c | 15 +++++++++++++--
> > > drivers/gpu/drm/i915/i915_drv.h | 2 ++
> > > drivers/gpu/drm/i915/intel_guc.c | 24 ++++++++++++++++++++++++
> > > drivers/gpu/drm/i915/intel_guc_fwif.h | 1 +
> > > drivers/gpu/drm/i915/intel_uncore.c | 5 -----
> > > 5 files changed, 40 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > > index af745749509c..359333a423cf 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.c
> > > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > > @@ -1950,6 +1950,12 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> > > goto finish;
> > > }
> > >
> > > +static inline int intel_gt_reset_engine(struct drm_i915_private *dev_priv,
> > > + struct intel_engine_cs *engine)
> > > +{
> > > + return intel_gpu_reset(dev_priv, intel_engine_flag(engine));
> > > +}
> > > +
> > > /**
> > > * i915_reset_engine - reset GPU engine to recover from a hang
> > > * @engine: engine to reset
> > > @@ -1984,10 +1990,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, unsigned int flags)
> > > goto out;
> > > }
> > >
> > > - ret = intel_gpu_reset(engine->i915, intel_engine_flag(engine));
> > > + if (!engine->i915->guc.execbuf_client)
> > > + ret = intel_gt_reset_engine(engine->i915, engine);
> > > + else
> > > + ret = intel_guc_reset_engine(&engine->i915->guc, engine);
> > > +
> > > if (ret) {
> > > /* If we fail here, we expect to fallback to a global reset */
> > > - DRM_DEBUG_DRIVER("Failed to reset %s, ret=%d\n",
> > > + DRM_DEBUG_DRIVER("%sFailed to reset %s, ret=%d\n",
> > > + (engine->i915->guc.execbuf_client ? "GUC ":""),
> >
> > A bit overkill on the parentheses there ;)
> >
> > Lgtm, can you please ping, say, Jeff or Daniele for an r-b on the guc
> > interaction?
> > -Chris
>
> There is one small change needed in the GuC preemption protocol to make it
> compatible with GuC engine reset. I will send that shortly.
>
> There are also a couple of corner case bugs with engine reset in our current
> firmware versions. We are planning a firmware update to address those. But
> the host-side code here is fine. So...
>
> Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Pushed this series, along with the preempt interaction fix.
Thanks,
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2017-11-02 8:43 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-30 18:56 [PATCH 0/3] GuC based reset engine Michel Thierry
2017-10-30 18:56 ` [PATCH 1/3] drm/i915/guc: Rename the function that resets the GuC Michel Thierry
2017-10-30 21:02 ` Chris Wilson
2017-10-30 18:56 ` [PATCH 2/3] drm/i915/guc: Add support for reset engine using GuC commands Michel Thierry
2017-10-30 20:58 ` Chris Wilson
2017-10-30 21:08 ` Michel Thierry
2017-10-30 21:09 ` Chris Wilson
2017-10-31 4:38 ` Michel Thierry
2017-10-31 10:17 ` Chris Wilson
2017-10-31 22:53 ` [PATCH v6] " Michel Thierry
2017-11-01 13:58 ` Chris Wilson
2017-11-01 20:41 ` Jeff McGee
2017-11-02 8:43 ` Chris Wilson
2017-10-30 18:56 ` [PATCH 3/3] HAX enable GuC submission for CI Michel Thierry
2017-10-30 20:05 ` ✓ Fi.CI.BAT: success for GuC based reset engine Patchwork
2017-10-30 21:14 ` Chris Wilson
2017-10-30 23:20 ` ✗ Fi.CI.IGT: warning " Patchwork
2017-10-31 10:20 ` Chris Wilson
2017-10-31 20:56 ` Michel Thierry
2017-10-31 21:31 ` Chris Wilson
2017-10-31 23:50 ` ✓ Fi.CI.BAT: success for GuC based reset engine (rev2) Patchwork
2017-11-01 0:59 ` ✗ Fi.CI.IGT: failure " Patchwork
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.