* [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
@ 2017-12-06 14:19 Chris Wilson
  2017-12-06 14:37 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Chris Wilson @ 2017-12-06 14:19 UTC (permalink / raw)
  To: intel-gfx

Since capturing the error state requires fiddling around with the GGTT
to read arbitrary buffers and is itself run under stop_machine(), it
deadlocks the machine (effectively a hard hang) when run in conjunction
with Broxton's VTd workaround to serialize GGTT access.

Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: John Harrison <john.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 48418fb81066..e6c7e8e53815 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1813,6 +1813,10 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
 	if (!i915_modparams.error_capture)
 		return;
 
+	/* Prevent recursively calling stop_machine() and deadlocking. */
+	if (intel_ggtt_update_needs_vtd_wa(dev_priv))
+		return;
+
 	if (READ_ONCE(dev_priv->gpu_error.first_error))
 		return;
 
-- 
2.15.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
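
The deadlock described in the commit message can be illustrated outside the kernel. What follows is a minimal userspace sketch, not i915 code: `mock_stop_machine()`, `mock_ggtt_insert_page()` and `mock_capture_error_state()` are hypothetical stand-ins, and the mock returns -EDEADLK (or -ENODEV when the capture is skipped) purely so the outcome is observable. The real stop_machine() would hang instead, and the real capture function returns void.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; none of this is real i915 code. */
static bool machine_stopped;	/* true while a "stop_machine" callback runs */
static bool needs_vtd_wa;	/* models intel_ggtt_update_needs_vtd_wa() */

typedef int (*stop_fn_t)(void *data);

/*
 * Mock of stop_machine(): not re-entrant. The kernel would hang on a
 * nested call; the mock returns -EDEADLK so the problem can be observed.
 */
static int mock_stop_machine(stop_fn_t fn, void *data)
{
	int ret;

	assert(fn != NULL);
	if (machine_stopped)
		return -EDEADLK;

	machine_stopped = true;
	ret = fn(data);
	machine_stopped = false;
	return ret;
}

static int do_insert_page(void *data)
{
	(void)data;
	return 0;	/* pretend the PTE was written */
}

/* With the workaround active, every GGTT update goes via stop_machine. */
static int mock_ggtt_insert_page(void)
{
	if (needs_vtd_wa)
		return mock_stop_machine(do_insert_page, NULL);
	return do_insert_page(NULL);
}

static int do_capture(void *data)
{
	(void)data;
	/* Reading arbitrary buffers means remapping them through the GGTT. */
	return mock_ggtt_insert_page();
}

/*
 * Models the guard added by this patch: skip the capture entirely when
 * the workaround is active. Returns 0 on success, -ENODEV if skipped,
 * -EDEADLK if a nested stop was attempted.
 */
static int mock_capture_error_state(bool with_guard)
{
	if (with_guard && needs_vtd_wa)
		return -ENODEV;
	return mock_stop_machine(do_capture, NULL);
}
```

With `needs_vtd_wa` set, calling `mock_capture_error_state(false)` reports the nested stop, while `mock_capture_error_state(true)` returns before any stop is attempted.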


* ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
@ 2017-12-06 14:37 ` Patchwork
  2017-12-06 14:43 ` [PATCH] " Daniel Vetter
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2017-12-06 14:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
URL   : https://patchwork.freedesktop.org/series/34969/
State : success

== Summary ==

Series 34969v1 drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
https://patchwork.freedesktop.org/api/1.0/series/34969/revisions/1/mbox/

Test debugfs_test:
        Subgroup read_all_entries:
                dmesg-warn -> PASS       (fi-elk-e7500) fdo#103989 +1
Test gem_mmap_gtt:
        Subgroup basic-small-bo-tiledx:
                pass       -> FAIL       (fi-gdg-551) fdo#102575

fdo#103989 https://bugs.freedesktop.org/show_bug.cgi?id=103989
fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:436s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:383s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:523s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:284s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:503s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:509s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:488s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:471s
fi-elk-e7500     total:224  pass:163  dwarn:15  dfail:0   fail:0   skip:45 
fi-gdg-551       total:288  pass:178  dwarn:1   dfail:0   fail:1   skip:108 time:266s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:538s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:369s
fi-hsw-4770r     total:288  pass:224  dwarn:0   dfail:0   fail:0   skip:64  time:259s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:471s
fi-ivb-3770      total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:445s
fi-kbl-7560u     total:288  pass:269  dwarn:0   dfail:0   fail:0   skip:19  time:528s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:472s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:535s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:586s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:451s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:540s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:575s
fi-skl-6700k     total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:518s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:496s
fi-snb-2520m     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:552s
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:415s
Blacklisted hosts:
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:614s
fi-cnl-y         total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:633s
fi-glk-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:487s
fi-kbl-7500u failed to connect after reboot

1a0d67efb4cc5611887c79adc5c3315790f78df5 drm-tip: 2017y-12m-06d-00h-51m-07s UTC integration manifest
1cb49b831f1d drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7426/

* Re: [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
  2017-12-06 14:37 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-12-06 14:43 ` Daniel Vetter
  2017-12-06 14:48   ` Chris Wilson
  2017-12-06 15:26 ` ✗ Fi.CI.IGT: warning for " Patchwork
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2017-12-06 14:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Dec 06, 2017 at 02:19:03PM +0000, Chris Wilson wrote:
> Since capturing the error state requires fiddling around with the GGTT
> to read arbitrary buffers and is itself run under stop_machine(), it
> deadlocks the machine (effectively a hard hang) when run in conjunction
> with Broxton's VTd workaround to serialize GGTT access.
> 
> Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: John Harrison <john.C.Harrison@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 48418fb81066..e6c7e8e53815 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1813,6 +1813,10 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
>  	if (!i915_modparams.error_capture)
>  		return;
>  
> +	/* Prevent recursively calling stop_machine() and deadlocking. */
> +	if (intel_ggtt_update_needs_vtd_wa(dev_priv))
> +		return;

I'd put this closer to the stop_machine() call, at the head of
i915_capture_gpu_state(). If the bogus debug output is annoying, we could
switch that to an ERR_PTR return value. But I guess this here is
ok too, so either way:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +
>  	if (READ_ONCE(dev_priv->gpu_error.first_error))
>  		return;
>  
> -- 
> 2.15.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 14:43 ` [PATCH] " Daniel Vetter
@ 2017-12-06 14:48   ` Chris Wilson
  2017-12-06 14:51     ` Daniel Vetter
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2017-12-06 14:48 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

Quoting Daniel Vetter (2017-12-06 14:43:39)
> On Wed, Dec 06, 2017 at 02:19:03PM +0000, Chris Wilson wrote:
> > Since capturing the error state requires fiddling around with the GGTT
> > to read arbitrary buffers and is itself run under stop_machine(), it
> > deadlocks the machine (effectively a hard hang) when run in conjunction
> > with Broxton's VTd workaround to serialize GGTT access.
> > 
> > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> > Cc: John Harrison <john.C.Harrison@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 48418fb81066..e6c7e8e53815 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1813,6 +1813,10 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> >       if (!i915_modparams.error_capture)
> >               return;
> >  
> > +     /* Prevent recursively calling stop_machine() and deadlocking. */
> > +     if (intel_ggtt_update_needs_vtd_wa(dev_priv))
> > +             return;
> 
> I'd put this closer to the stop_machine() call, at the head of
> i915_capture_gpu_state(). If the bogus debug output is annoying, we could
> switch that to an ERR_PTR return value. But I guess this here is
> ok too, so either way:

I was considering doing some of the capture, skipping the buffers, but
nowadays those buffers tend to be the crux of triaging. My only real concern
is how to explain to the user that the error state cannot exist, for
which we could add -ENODEV to sysfs/debugfs just to be clear.
-Chris

* Re: [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 14:48   ` Chris Wilson
@ 2017-12-06 14:51     ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2017-12-06 14:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Dec 06, 2017 at 02:48:36PM +0000, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-12-06 14:43:39)
> > On Wed, Dec 06, 2017 at 02:19:03PM +0000, Chris Wilson wrote:
> > > Since capturing the error state requires fiddling around with the GGTT
> > > to read arbitrary buffers and is itself run under stop_machine(), it
> > > deadlocks the machine (effectively a hard hang) when run in conjunction
> > > with Broxton's VTd workaround to serialize GGTT access.
> > > 
> > > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> > > Cc: John Harrison <john.C.Harrison@intel.com>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > index 48418fb81066..e6c7e8e53815 100644
> > > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > @@ -1813,6 +1813,10 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> > >       if (!i915_modparams.error_capture)
> > >               return;
> > >  
> > > +     /* Prevent recursively calling stop_machine() and deadlocking. */
> > > +     if (intel_ggtt_update_needs_vtd_wa(dev_priv))
> > > +             return;
> > 
> > I'd put this closer to the stop_machine() call, at the head of
> > i915_capture_gpu_state(). If the bogus debug output is annoying, we could
> > switch that to an ERR_PTR return value. But I guess this here is
> > ok too, so either way:
> 
> I was considering doing some of the capture, skipping the buffers, but
> nowadays those buffers tend to be the crux of triaging. My only real concern
> is how to explain to the user that the error state cannot exist, for
> which we could add -ENODEV to sysfs/debugfs just to be clear.

Fancy idea: store the ERR_PTR in ->first_error and return that? Would
address both my bikeshed and your suggestion.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
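
For readers unfamiliar with the idiom discussed here, the idea is that one pointer-sized slot can hold NULL, a valid pointer, or a small negative errno encoded as a pointer. The following is a userspace sketch under stated assumptions: `err_ptr()`, `ptr_err()` and `is_err()` are local re-implementations modelled on the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(), and `struct gpu_state` and `read_error_state()` are hypothetical placeholders, not the driver's types or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* errnos occupy the top 4095 addresses */

struct gpu_state { int dummy; };	/* placeholder for the captured state */

/* Encode a negative errno as a pointer value. */
static inline void *err_ptr(long err)
{
	assert(err < 0 && err >= -MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Decode the errno back out of an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the reserved errno range. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* One slot, three states: NULL, a valid state, or an encoded errno. */
static struct gpu_state *first_error;

/*
 * Hypothetical reader, as a sysfs/debugfs handler might do it: returns
 * a negative errno if capture is unavailable, otherwise 0 with *out set
 * (possibly to NULL when nothing has been captured yet).
 */
static int read_error_state(struct gpu_state **out)
{
	if (is_err(first_error))
		return (int)ptr_err(first_error);

	*out = first_error;
	return 0;
}
```

The appeal of this design is that the reader needs no extra flag: "capture is impossible on this machine" and "no error captured yet" are distinguished by the value already stored in the slot.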

* ✗ Fi.CI.IGT: warning for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
  2017-12-06 14:37 ` ✓ Fi.CI.BAT: success for " Patchwork
  2017-12-06 14:43 ` [PATCH] " Daniel Vetter
@ 2017-12-06 15:26 ` Patchwork
  2017-12-06 15:37 ` [PATCH v2] " Chris Wilson
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2017-12-06 15:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
URL   : https://patchwork.freedesktop.org/series/34969/
State : warning

== Summary ==

Test kms_plane:
        Subgroup plane-position-covered-pipe-c-planes:
                skip       -> PASS       (shard-hsw)
Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-offscren-pri-shrfb-draw-blt:
                fail       -> PASS       (shard-snb) fdo#101623 +1
Test kms_rotation_crc:
        Subgroup cursor-rotation-180:
                pass       -> SKIP       (shard-snb)
                pass       -> SKIP       (shard-hsw)
Test perf:
        Subgroup oa-exponents:
                fail       -> PASS       (shard-hsw) fdo#102254
Test kms_flip:
        Subgroup vblank-vs-suspend-interruptible:
                incomplete -> PASS       (shard-hsw) fdo#100368

fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#102254 https://bugs.freedesktop.org/show_bug.cgi?id=102254
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368

shard-hsw        total:2679 pass:1535 dwarn:1   dfail:0   fail:10  skip:1133 time:9424s
shard-snb        total:2679 pass:1308 dwarn:1   dfail:0   fail:11  skip:1359 time:8079s
Blacklisted hosts:
shard-apl        total:2679 pass:1677 dwarn:2   dfail:0   fail:23  skip:977 time:13525s
shard-kbl        total:2571 pass:1722 dwarn:6   dfail:0   fail:22  skip:820 time:10349s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7426/shards.html

* [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (2 preceding siblings ...)
  2017-12-06 15:26 ` ✗ Fi.CI.IGT: warning for " Patchwork
@ 2017-12-06 15:37 ` Chris Wilson
  2017-12-06 17:01   ` Bloomfield, Jon
  2017-12-06 16:11 ` ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2) Patchwork
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2017-12-06 15:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Since capturing the error state requires fiddling around with the GGTT
to read arbitrary buffers and is itself run under stop_machine(), it
deadlocks the machine (effectively a hard hang) when run in conjunction
with Broxton's VTd workaround to serialize GGTT access.

v2: Store the ERR_PTR in first_error so that the error can be reported
to the user via sysfs.

Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: John Harrison <john.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h       |  8 +++++++-
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  3 +++
 drivers/gpu/drm/i915/i915_gpu_error.c | 15 ++++++++++++++-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 594fd14e66c5..1eca4f954050 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3990,6 +3990,7 @@ static inline void i915_gpu_state_put(struct i915_gpu_state *gpu)
 
 struct i915_gpu_state *i915_first_error_state(struct drm_i915_private *i915);
 void i915_reset_error_state(struct drm_i915_private *i915);
+void i915_disable_error_state(struct drm_i915_private *i915, int err);
 
 #else
 
@@ -4002,13 +4003,18 @@ static inline void i915_capture_error_state(struct drm_i915_private *dev_priv,
 static inline struct i915_gpu_state *
 i915_first_error_state(struct drm_i915_private *i915)
 {
-	return NULL;
+	return ERR_PTR(-ENODEV);
 }
 
 static inline void i915_reset_error_state(struct drm_i915_private *i915)
 {
 }
 
+static inline void i915_disable_error_state(struct drm_i915_private *i915,
+					    int err)
+{
+}
+
 #endif
 
 const char *i915_cache_level_str(struct drm_i915_private *i915, int type);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f3c35e826321..0264d88b4cff 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3373,6 +3373,9 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 		ggtt->base.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
 		if (ggtt->base.clear_range != nop_clear_range)
 			ggtt->base.clear_range = bxt_vtd_ggtt_clear_range__BKL;
+
+		/* Prevent recursively calling stop_machine() and deadlocks. */
+		i915_disable_error_state(dev_priv, -ENODEV);
 	}
 
 	ggtt->invalidate = gen6_ggtt_invalidate;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 48418fb81066..0b45d28624b7 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -633,6 +633,9 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		return 0;
 	}
 
+	if (IS_ERR(error))
+		return PTR_ERR(error);
+
 	if (*error->error_msg)
 		err_printf(m, "%s\n", error->error_msg);
 	err_printf(m, "Kernel: " UTS_RELEASE "\n");
@@ -1819,6 +1822,7 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
 	error = i915_capture_gpu_state(dev_priv);
 	if (!error) {
 		DRM_DEBUG_DRIVER("out of memory, not capturing error state\n");
+		i915_disable_error_state(dev_priv, -ENOMEM);
 		return;
 	}
 
@@ -1874,5 +1878,14 @@ void i915_reset_error_state(struct drm_i915_private *i915)
 	i915->gpu_error.first_error = NULL;
 	spin_unlock_irq(&i915->gpu_error.lock);
 
-	i915_gpu_state_put(error);
+	if (!IS_ERR(error))
+		i915_gpu_state_put(error);
+}
+
+void i915_disable_error_state(struct drm_i915_private *i915, int err)
+{
+	spin_lock_irq(&i915->gpu_error.lock);
+	if (!i915->gpu_error.first_error)
+		i915->gpu_error.first_error = ERR_PTR(err);
+	spin_unlock_irq(&i915->gpu_error.lock);
 }
-- 
2.15.1
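
The lifecycle that this revision gives to first_error can be exercised in isolation. The following is a single-threaded userspace sketch, not the driver code: `disable_error_state()`, `reset_error_state()` and `state_put()` are simplified stand-ins for the functions in the diff, the spinlock is omitted, and a plain counter replaces the kref.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

struct gpu_state { int refcount; };	/* stand-in for the kref'd state */

static struct gpu_state *first_error;

/* Local equivalents of the kernel's IS_ERR() and ERR_PTR(). */
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static void *err_ptr(long err)
{
	assert(err < 0 && err >= -MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Drop one reference; NULL is tolerated, as in the driver's helper. */
static void state_put(struct gpu_state *state)
{
	if (state)
		state->refcount--;
}

/* Set-once: never overwrite a state (or error) that is already stored. */
static void disable_error_state(int err)
{
	/* the driver holds gpu_error.lock around this check-and-set */
	if (!first_error)
		first_error = err_ptr(err);
}

/* Clear the slot, releasing the state only if it is a real object. */
static void reset_error_state(void)
{
	struct gpu_state *error = first_error;

	first_error = NULL;
	if (!is_err(error))
		state_put(error);
}
```

In this sketch, as in the diff above, a reset clears the slot unconditionally, so a stored error code does not survive a reset.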


* ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2)
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (3 preceding siblings ...)
  2017-12-06 15:37 ` [PATCH v2] " Chris Wilson
@ 2017-12-06 16:11 ` Patchwork
  2017-12-06 17:43 ` ✓ Fi.CI.IGT: " Patchwork
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2017-12-06 16:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2)
URL   : https://patchwork.freedesktop.org/series/34969/
State : success

== Summary ==

Series 34969v2 drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
https://patchwork.freedesktop.org/api/1.0/series/34969/revisions/2/mbox/

Test debugfs_test:
        Subgroup read_all_entries:
                dmesg-warn -> DMESG-FAIL (fi-elk-e7500) fdo#103989
Test gem_mmap_gtt:
        Subgroup basic-small-bo-tiledx:
                fail       -> PASS       (fi-gdg-551) fdo#102575

fdo#103989 https://bugs.freedesktop.org/show_bug.cgi?id=103989
fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:438s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:383s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:513s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:281s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:501s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:509s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:489s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:472s
fi-elk-e7500     total:224  pass:163  dwarn:14  dfail:1   fail:0   skip:45 
fi-gdg-551       total:288  pass:179  dwarn:1   dfail:0   fail:0   skip:108 time:272s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:538s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:383s
fi-hsw-4770r     total:288  pass:224  dwarn:0   dfail:0   fail:0   skip:64  time:261s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:480s
fi-ivb-3770      total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:451s
fi-kbl-7560u     total:288  pass:269  dwarn:0   dfail:0   fail:0   skip:19  time:531s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:476s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:529s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:448s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:544s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:567s
fi-skl-6700k     total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:517s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:499s
fi-snb-2520m     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:550s
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:419s
Blacklisted hosts:
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:631s
fi-cnl-y         total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:619s
fi-glk-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:491s

01b30547063a8ba25114041e6caf41fc98ea7ddb drm-tip: 2017y-12m-06d-15h-18m-33s UTC integration manifest
fce7cec98532 drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7429/

* Re: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 15:37 ` [PATCH v2] " Chris Wilson
@ 2017-12-06 17:01   ` Bloomfield, Jon
  2017-12-06 17:25     ` Bloomfield, Jon
  0 siblings, 1 reply; 19+ messages in thread
From: Bloomfield, Jon @ 2017-12-06 17:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Wednesday, December 6, 2017 7:38 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson <chris@chris-wilson.co.uk>; Bloomfield, Jon
> <jon.bloomfield@intel.com>; Harrison, John C <john.c.harrison@intel.com>;
> Ursulin, Tvrtko <tvrtko.ursulin@intel.com>; Joonas Lahtinen
> <joonas.lahtinen@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>
> Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a
> and error capture
> 
> Since capturing the error state requires fiddling around with the GGTT
> to read arbitrary buffers and is itself run under stop_machine(), it
> deadlocks the machine (effectively a hard hang) when run in conjunction
> with Broxton's VTd workaround to serialize GGTT access.
> 
> v2: Store the ERR_PTR in first_error so that the error can be reported
> to the user via sysfs.
> 
> Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: John Harrison <john.C.Harrison@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

It's a real shame to lose error capture on BXT. Can we wrap stop_machine() to make it recursive?

Something like...

static cpumask_t sm_mask;

struct sm_args {
        cpu_stop_fn_t fn;       /* already a function-pointer typedef */
        void *data;
};

static int do_recursive_stop(void *sm_arg_data)
{
        struct sm_args *args = sm_arg_data;
        int ret;

        /* We're stopped - flag the fact to prevent recursion */
        cpumask_set_cpu(smp_processor_id(), &sm_mask);

        ret = args->fn(args->data);

        /* Re-enable recursion */
        cpumask_clear_cpu(smp_processor_id(), &sm_mask);

        return ret;
}

int recursive_stop_machine(cpu_stop_fn_t fn, void *data)
{
        struct sm_args args = { fn, data };

        /* We were already stopped, so can just call directly */
        if (cpumask_test_cpu(smp_processor_id(), &sm_mask))
                return fn(data);

        /* Our CPU is not currently stopped */
        return stop_machine(do_recursive_stop, &args, NULL);
}

* Re: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 17:01   ` Bloomfield, Jon
@ 2017-12-06 17:25     ` Bloomfield, Jon
  0 siblings, 0 replies; 19+ messages in thread
From: Bloomfield, Jon @ 2017-12-06 17:25 UTC (permalink / raw)
  To: Bloomfield, Jon, Chris Wilson, intel-gfx; +Cc: Daniel Vetter

> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Bloomfield, Jon
> Sent: Wednesday, December 6, 2017 9:01 AM
> To: Chris Wilson <chris@chris-wilson.co.uk>; intel-gfx@lists.freedesktop.org
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Subject: Re: [Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from
> Broxton's vtd w/a and error capture
> 
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> > Sent: Wednesday, December 6, 2017 7:38 AM
> > To: intel-gfx@lists.freedesktop.org
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>; Bloomfield, Jon
> > <jon.bloomfield@intel.com>; Harrison, John C
> <john.c.harrison@intel.com>;
> > Ursulin, Tvrtko <tvrtko.ursulin@intel.com>; Joonas Lahtinen
> > <joonas.lahtinen@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>
> > Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd
> w/a
> > and error capture
> >
> > Since capturing the error state requires fiddling around with the GGTT
> > to read arbitrary buffers and is itself run under stop_machine(), it
> > deadlocks the machine (effectively a hard hang) when run in conjunction
> > with Broxton's VTd workaround to serialize GGTT access.
> >
> > v2: Store the ERR_PTR in first_error so that the error can be reported
> > to the user via sysfs.
> >
> > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> > Cc: John Harrison <john.C.Harrison@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> It's a real shame to lose error capture on BXT. Can we wrap stop_machine() to
> make it recursive?
> 
> Something like...
> 
> static cpumask_t sm_mask;
> 
> struct sm_args {
>         cpu_stop_fn_t fn;       /* already a function-pointer typedef */
>         void *data;
> };
> 
> static int do_recursive_stop(void *sm_arg_data)
> {
>         struct sm_args *args = sm_arg_data;
>         int ret;
> 
>         /* We're stopped - flag the fact to prevent recursion */
>         cpumask_set_cpu(smp_processor_id(), &sm_mask);
> 
>         ret = args->fn(args->data);
> 
>         /* Re-enable recursion */
>         cpumask_clear_cpu(smp_processor_id(), &sm_mask);
> 
>         return ret;
> }
> 
> int recursive_stop_machine(cpu_stop_fn_t fn, void *data)
> {
>         struct sm_args args = { fn, data };
> 
>         /* We were already stopped, so can just call directly */
>         if (cpumask_test_cpu(smp_processor_id(), &sm_mask))
>                 return fn(data);
> 
>         /* Our CPU is not currently stopped */
>         return stop_machine(do_recursive_stop, &args, NULL);
> }

... I think a single bool is sufficient in place of the cpumask, since it is set and cleared
within stop_machine - I started out trying to set/clear outside.
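
The single-flag variant suggested here can be tried in userspace. This is a sketch under stated assumptions, not kernel code: `mock_stop_machine()` is a hypothetical single-threaded stand-in that only asserts it is never re-entered, so it does not model other CPUs being held; `in_stop`, `real_stops`, `inner()` and `outer()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*stop_fn_t)(void *data);

static bool in_stop;	/* set only while the stopped callback runs */
static int real_stops;	/* how many times the mock was actually entered */

struct sm_args {
	stop_fn_t fn;
	void *data;
};

/* Stand-in for stop_machine(): must never be entered while stopped. */
static int mock_stop_machine(stop_fn_t fn, void *data)
{
	assert(!in_stop);	/* the real call would deadlock here */
	real_stops++;
	return fn(data);
}

/* Trampoline: mark the stopped region, then run the caller's function. */
static int do_recursive_stop(void *data)
{
	struct sm_args *args = data;
	int ret;

	in_stop = true;
	ret = args->fn(args->data);
	in_stop = false;
	return ret;
}

static int recursive_stop_machine(stop_fn_t fn, void *data)
{
	struct sm_args args = { fn, data };

	/* Already inside a stopped region: just call directly. */
	if (in_stop)
		return fn(data);

	return mock_stop_machine(do_recursive_stop, &args);
}

/* Demo callbacks: outer() nests a second stop request via inner(). */
static int inner(void *data)
{
	*(int *)data += 1;
	return 0;
}

static int outer(void *data)
{
	*(int *)data += 10;
	return recursive_stop_machine(inner, data);
}
```

Calling `recursive_stop_machine(outer, &v)` enters the mock once; the nested request made from `outer()` takes the direct-call path, which is the behaviour the wrapper is meant to provide.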

* ✓ Fi.CI.IGT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2)
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (4 preceding siblings ...)
  2017-12-06 16:11 ` ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2) Patchwork
@ 2017-12-06 17:43 ` Patchwork
  2018-10-11 11:21 ` ✗ Fi.CI.BAT: failure " Patchwork
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2017-12-06 17:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2)
URL   : https://patchwork.freedesktop.org/series/34969/
State : success

== Summary ==

Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-offscren-pri-shrfb-draw-render:
                pass       -> FAIL       (shard-snb) fdo#101623
Test kms_flip:
        Subgroup vblank-vs-dpms-suspend-interruptible:
                incomplete -> PASS       (shard-hsw) fdo#103706 +1
        Subgroup vblank-vs-modeset-suspend:
                skip       -> PASS       (shard-hsw)
Test kms_setmode:
        Subgroup basic:
                pass       -> FAIL       (shard-hsw) fdo#99912

fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#103706 https://bugs.freedesktop.org/show_bug.cgi?id=103706
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912

shard-hsw        total:2527 pass:1455 dwarn:1   dfail:0   fail:10  skip:1061 time:8932s
shard-snb        total:2679 pass:1308 dwarn:1   dfail:0   fail:12  skip:1358 time:8101s
Blacklisted hosts:
shard-apl        total:2679 pass:1676 dwarn:2   dfail:0   fail:24  skip:977 time:13526s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7429/shards.html

* ✗ Fi.CI.BAT: failure for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2)
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (5 preceding siblings ...)
  2017-12-06 17:43 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-10-11 11:21 ` Patchwork
  2018-10-11 11:37 ` [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-10-11 11:21 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2)
URL   : https://patchwork.freedesktop.org/series/34969/
State : failure

== Summary ==

Applying: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_drv.h
M	drivers/gpu/drm/i915/i915_gem_gtt.c
M	drivers/gpu/drm/i915/i915_gpu_error.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_gpu_error.c
Auto-merging drivers/gpu/drm/i915/i915_gem_gtt.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_gem_gtt.c
Auto-merging drivers/gpu/drm/i915/i915_drv.h
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_drv.h
error: Failed to merge in the changes.
Patch failed at 0001 drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7429/

* [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (6 preceding siblings ...)
  2018-10-11 11:21 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2018-10-11 11:37 ` Chris Wilson
  2018-10-11 22:03   ` kbuild test robot
  2018-10-12  1:13   ` kbuild test robot
  2018-10-11 11:43 ` ✗ Fi.CI.BAT: failure for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev3) Patchwork
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2018-10-11 11:37 UTC (permalink / raw)
  To: intel-gfx

Since capturing the error state requires fiddling around with the GGTT
to read arbitrary buffers and is itself run under stop_machine(), it
deadlocks the machine (effectively a hard hang) when run in conjunction
with Broxton's VTd workaround to serialize GGTT access.

v2: Store the ERR_PTR in first_error so that the error can be reported
to the user via sysfs.

Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: John Harrison <john.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  3 +++
 drivers/gpu/drm/i915/i915_gpu_error.c | 15 ++++++++++++++-
 drivers/gpu/drm/i915/i915_gpu_error.h |  8 +++++++-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 29ca9007a704..47b003daa6f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3339,6 +3339,9 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 		ggtt->vm.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
 		if (ggtt->vm.clear_range != nop_clear_range)
 			ggtt->vm.clear_range = bxt_vtd_ggtt_clear_range__BKL;
+
+		/* Prevent recursively calling stop_machine() and deadlocks. */
+		i915_disable_error_state(dev_priv, -ENODEV);
 	}
 
 	ggtt->invalidate = gen6_ggtt_invalidate;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index c8d8f79688a8..f5b9914e9c6d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -648,6 +648,9 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		return 0;
 	}
 
+	if (IS_ERR(error))
+		return PTR_ERR(error);
+
 	if (*error->error_msg)
 		err_printf(m, "%s\n", error->error_msg);
 	err_printf(m, "Kernel: " UTS_RELEASE "\n");
@@ -1867,6 +1870,7 @@ void i915_capture_error_state(struct drm_i915_private *i915,
 	error = i915_capture_gpu_state(i915);
 	if (!error) {
 		DRM_DEBUG_DRIVER("out of memory, not capturing error state\n");
+		i915_disable_error_state(dev_priv, -ENOMEM);
 		return;
 	}
 
@@ -1922,5 +1926,14 @@ void i915_reset_error_state(struct drm_i915_private *i915)
 	i915->gpu_error.first_error = NULL;
 	spin_unlock_irq(&i915->gpu_error.lock);
 
-	i915_gpu_state_put(error);
+	if (!IS_ERR(error))
+		i915_gpu_state_put(error);
+}
+
+void i915_disable_error_state(struct drm_i915_private *i915, int err)
+{
+	spin_lock_irq(&i915->gpu_error.lock);
+	if (!i915->gpu_error.first_error)
+		i915->gpu_error.first_error = ERR_PTR(err);
+	spin_unlock_irq(&i915->gpu_error.lock);
 }
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 8710fb18ed74..3ec89a504de5 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -343,6 +343,7 @@ static inline void i915_gpu_state_put(struct i915_gpu_state *gpu)
 
 struct i915_gpu_state *i915_first_error_state(struct drm_i915_private *i915);
 void i915_reset_error_state(struct drm_i915_private *i915);
+void i915_disable_error_state(struct drm_i915_private *i915, int err);
 
 #else
 
@@ -355,13 +356,18 @@ static inline void i915_capture_error_state(struct drm_i915_private *dev_priv,
 static inline struct i915_gpu_state *
 i915_first_error_state(struct drm_i915_private *i915)
 {
-	return NULL;
+	return ERR_PTR(-ENODEV);
 }
 
 static inline void i915_reset_error_state(struct drm_i915_private *i915)
 {
 }
 
+static inline void i915_disable_error_state(struct drm_i915_private *i915,
+					    int err)
+{
+}
+
 #endif /* IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) */
 
 #endif /* _I915_GPU_ERROR_H_ */
-- 
2.19.1


* ✗ Fi.CI.BAT: failure for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev3)
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (7 preceding siblings ...)
  2018-10-11 11:37 ` [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
@ 2018-10-11 11:43 ` Patchwork
  2018-10-11 11:51 ` [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-10-11 11:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev3)
URL   : https://patchwork.freedesktop.org/series/34969/
State : failure

== Summary ==

CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
  CC [M]  drivers/gpu/drm/i915/i915_gpu_error.o
drivers/gpu/drm/i915/i915_gpu_error.c: In function ‘i915_capture_error_state’:
drivers/gpu/drm/i915/i915_gpu_error.c:1873:28: error: ‘dev_priv’ undeclared (first use in this function); did you mean ‘dev_crit’?
   i915_disable_error_state(dev_priv, -ENOMEM);
                            ^~~~~~~~
                            dev_crit
drivers/gpu/drm/i915/i915_gpu_error.c:1873:28: note: each undeclared identifier is reported only once for each function it appears in
scripts/Makefile.build:305: recipe for target 'drivers/gpu/drm/i915/i915_gpu_error.o' failed
make[4]: *** [drivers/gpu/drm/i915/i915_gpu_error.o] Error 1
scripts/Makefile.build:546: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:546: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:546: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1050: recipe for target 'drivers' failed
make: *** [drivers] Error 2


* [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (8 preceding siblings ...)
  2018-10-11 11:43 ` ✗ Fi.CI.BAT: failure for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev3) Patchwork
@ 2018-10-11 11:51 ` Chris Wilson
  2018-10-11 12:29 ` ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev4) Patchwork
  2018-10-11 17:13 ` ✓ Fi.CI.IGT: " Patchwork
  11 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-10-11 11:51 UTC (permalink / raw)
  To: intel-gfx

Since capturing the error state requires fiddling around with the GGTT
to read arbitrary buffers and is itself run under stop_machine(), it
deadlocks the machine (effectively a hard hang) when run in conjunction
with Broxton's VTd workaround to serialize GGTT access.

v2: Store the ERR_PTR in first_error so that the error can be reported
to the user via sysfs.

Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: John Harrison <john.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  3 +++
 drivers/gpu/drm/i915/i915_gpu_error.c | 15 ++++++++++++++-
 drivers/gpu/drm/i915/i915_gpu_error.h |  8 +++++++-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 29ca9007a704..47b003daa6f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3339,6 +3339,9 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 		ggtt->vm.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
 		if (ggtt->vm.clear_range != nop_clear_range)
 			ggtt->vm.clear_range = bxt_vtd_ggtt_clear_range__BKL;
+
+		/* Prevent recursively calling stop_machine() and deadlocks. */
+		i915_disable_error_state(dev_priv, -ENODEV);
 	}
 
 	ggtt->invalidate = gen6_ggtt_invalidate;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index c8d8f79688a8..21b5c8765015 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -648,6 +648,9 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		return 0;
 	}
 
+	if (IS_ERR(error))
+		return PTR_ERR(error);
+
 	if (*error->error_msg)
 		err_printf(m, "%s\n", error->error_msg);
 	err_printf(m, "Kernel: " UTS_RELEASE "\n");
@@ -1867,6 +1870,7 @@ void i915_capture_error_state(struct drm_i915_private *i915,
 	error = i915_capture_gpu_state(i915);
 	if (!error) {
 		DRM_DEBUG_DRIVER("out of memory, not capturing error state\n");
+		i915_disable_error_state(i915, -ENOMEM);
 		return;
 	}
 
@@ -1922,5 +1926,14 @@ void i915_reset_error_state(struct drm_i915_private *i915)
 	i915->gpu_error.first_error = NULL;
 	spin_unlock_irq(&i915->gpu_error.lock);
 
-	i915_gpu_state_put(error);
+	if (!IS_ERR(error))
+		i915_gpu_state_put(error);
+}
+
+void i915_disable_error_state(struct drm_i915_private *i915, int err)
+{
+	spin_lock_irq(&i915->gpu_error.lock);
+	if (!i915->gpu_error.first_error)
+		i915->gpu_error.first_error = ERR_PTR(err);
+	spin_unlock_irq(&i915->gpu_error.lock);
 }
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 8710fb18ed74..3ec89a504de5 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -343,6 +343,7 @@ static inline void i915_gpu_state_put(struct i915_gpu_state *gpu)
 
 struct i915_gpu_state *i915_first_error_state(struct drm_i915_private *i915);
 void i915_reset_error_state(struct drm_i915_private *i915);
+void i915_disable_error_state(struct drm_i915_private *i915, int err);
 
 #else
 
@@ -355,13 +356,18 @@ static inline void i915_capture_error_state(struct drm_i915_private *dev_priv,
 static inline struct i915_gpu_state *
 i915_first_error_state(struct drm_i915_private *i915)
 {
-	return NULL;
+	return ERR_PTR(-ENODEV);
 }
 
 static inline void i915_reset_error_state(struct drm_i915_private *i915)
 {
 }
 
+static inline void i915_disable_error_state(struct drm_i915_private *i915,
+					    int err)
+{
+}
+
 #endif /* IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) */
 
 #endif /* _I915_GPU_ERROR_H_ */
-- 
2.19.1
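
For readers following along, a minimal userspace sketch of the idea in the v2 note: parking an ERR_PTR in first_error so that later captures bail out early and the reader sees an errno instead of a dump. The ERR_PTR/IS_ERR/PTR_ERR helpers are re-created here after include/linux/err.h; struct gpu_state, struct gpu_error, the function names, and the return value 1 for "a capture is available" are simplified stand-ins invented for this sketch, and the spinlock from the patch is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Simplified stand-ins for the driver structures (assumptions). */
struct gpu_state {
	const char *error_msg;
};

struct gpu_error {
	struct gpu_state *first_error;	/* NULL, a capture, or an ERR_PTR */
};

/* Park an error code in the slot, unless something is already there. */
void disable_error_state(struct gpu_error *e, int err)
{
	assert(err < 0 && err >= -MAX_ERRNO);

	if (!e->first_error)
		e->first_error = ERR_PTR(err);
}

/* Any non-NULL slot, including an ERR_PTR, makes capture bail out early. */
int capture_error_state(struct gpu_error *e, struct gpu_state *state)
{
	if (e->first_error)
		return -1;

	e->first_error = state;
	return 0;
}

/* Reader side: 0 = nothing captured, <0 = capture disabled, 1 = capture. */
int error_state_status(const struct gpu_error *e)
{
	if (!e->first_error)
		return 0;

	if (IS_ERR(e->first_error))
		return (int)PTR_ERR(e->first_error);

	return 1;
}
```

As far as the patch shows, the existing READ_ONCE(first_error) check in i915_capture_error_state() already returns on any non-NULL value, so seeding the slot at probe time appears to be enough to keep capture from running under the Broxton workaround.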


* ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev4)
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (9 preceding siblings ...)
  2018-10-11 11:51 ` [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
@ 2018-10-11 12:29 ` Patchwork
  2018-10-11 17:13 ` ✓ Fi.CI.IGT: " Patchwork
  11 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-10-11 12:29 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev4)
URL   : https://patchwork.freedesktop.org/series/34969/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4969 -> Patchwork_10427 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/34969/revisions/4/mbox/

== Known issues ==

  Here are the changes found in Patchwork_10427 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@basic-flip-vs-modeset:
      fi-hsw-4770r:       PASS -> DMESG-WARN (fdo#105602) +1

    igt@kms_frontbuffer_tracking@basic:
      {fi-icl-u2}:        SKIP -> FAIL (fdo#103167)
      fi-byt-clapper:     PASS -> FAIL (fdo#103167)

    igt@kms_pipe_crc_basic@read-crc-pipe-b:
      fi-byt-clapper:     PASS -> FAIL (fdo#107362)

    
    ==== Possible fixes ====

    igt@gem_exec_suspend@basic-s3:
      fi-cfl-8109u:       INCOMPLETE (fdo#107187, fdo#108126) -> PASS

    igt@kms_chamelium@dp-edid-read:
      fi-kbl-7500u:       WARN (fdo#102672) -> PASS

    igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
      fi-byt-clapper:     FAIL (fdo#107362, fdo#103191) -> PASS +1

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#107187 https://bugs.freedesktop.org/show_bug.cgi?id=107187
  fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
  fdo#108126 https://bugs.freedesktop.org/show_bug.cgi?id=108126


== Participating hosts (44 -> 39) ==

  Missing    (5): fi-bsw-cyan fi-ilk-m540 fi-byt-squawks fi-gdg-551 fi-pnv-d510 


== Build changes ==

    * Linux: CI_DRM_4969 -> Patchwork_10427

  CI_DRM_4969: 1121d2889e57dedacc0885deaaa9de614832e62f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4673: 54cb1aeb4e50dea9f3abae632e317875d147c4ab @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10427: ea61179bb5e951736f94721fa7359e98e78a3906 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ea61179bb5e9 drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10427/issues.html

* ✓ Fi.CI.IGT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev4)
  2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
                   ` (10 preceding siblings ...)
  2018-10-11 12:29 ` ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev4) Patchwork
@ 2018-10-11 17:13 ` Patchwork
  11 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-10-11 17:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev4)
URL   : https://patchwork.freedesktop.org/series/34969/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4969_full -> Patchwork_10427_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_10427_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10427_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10427_full:

  === IGT changes ===

    ==== Warnings ====

    igt@perf_pmu@rc6:
      shard-kbl:          PASS -> SKIP

    igt@pm_rc6_residency@rc6-accuracy:
      shard-snb:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_10427_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_hangman@error-state-capture-render:
      shard-glk:          PASS -> INCOMPLETE (k.org#198133, fdo#103359)

    igt@gem_exec_schedule@pi-ringfull-blt:
      shard-skl:          NOTRUN -> FAIL (fdo#103158) +2

    igt@gem_ppgtt@blt-vs-render-ctxn:
      shard-skl:          NOTRUN -> TIMEOUT (fdo#108039)

    igt@gem_userptr_blits@readonly-unsync:
      shard-skl:          NOTRUN -> INCOMPLETE (fdo#108074)

    igt@kms_available_modes_crc@available_mode_test_crc:
      shard-apl:          PASS -> FAIL (fdo#106641)

    igt@kms_busy@extended-pageflip-hang-newfb-render-a:
      shard-hsw:          PASS -> DMESG-WARN (fdo#102614)

    igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-c:
      shard-skl:          NOTRUN -> DMESG-WARN (fdo#107956)

    igt@kms_cursor_crc@cursor-128x128-random:
      shard-apl:          PASS -> FAIL (fdo#103232)

    igt@kms_cursor_crc@cursor-64x64-dpms:
      shard-glk:          PASS -> FAIL (fdo#103232) +1

    igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-xtiled:
      shard-skl:          PASS -> FAIL (fdo#103184)

    igt@kms_fbcon_fbt@psr:
      shard-skl:          NOTRUN -> FAIL (fdo#107882)

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-cpu:
      shard-skl:          NOTRUN -> FAIL (fdo#103167)

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-fullscreen:
      shard-apl:          PASS -> FAIL (fdo#103167) +1

    igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-mmap-cpu:
      shard-glk:          PASS -> FAIL (fdo#103167) +2

    igt@kms_frontbuffer_tracking@fbc-stridechange:
      shard-skl:          NOTRUN -> FAIL (fdo#105683)

    igt@kms_panel_fitting@legacy:
      shard-skl:          NOTRUN -> FAIL (fdo#105456)

    igt@kms_pipe_crc_basic@read-crc-pipe-c:
      shard-skl:          NOTRUN -> FAIL (fdo#107362) +1

    igt@kms_plane@plane-position-covered-pipe-c-planes:
      shard-glk:          PASS -> FAIL (fdo#103166) +1

    {igt@kms_plane_alpha_blend@pipe-b-coverage-7efc}:
      shard-skl:          NOTRUN -> FAIL (fdo#108146)

    {igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min}:
      shard-skl:          NOTRUN -> FAIL (fdo#108145) +1

    igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
      shard-apl:          PASS -> FAIL (fdo#103166) +1

    igt@perf_pmu@rc6-runtime-pm:
      shard-glk:          PASS -> FAIL (fdo#105010)
      shard-apl:          PASS -> FAIL (fdo#105010)

    
    ==== Possible fixes ====

    igt@gem_exec_await@wide-contexts:
      shard-glk:          DMESG-FAIL (fdo#106680) -> PASS

    igt@kms_cursor_crc@cursor-128x128-dpms:
      shard-apl:          FAIL (fdo#103232) -> PASS +1

    igt@kms_flip@flip-vs-expired-vblank:
      shard-kbl:          FAIL (fdo#105363, fdo#102887) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-cpu:
      shard-glk:          FAIL (fdo#103167) -> PASS +3

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-mmap-wc:
      shard-apl:          FAIL (fdo#103167) -> PASS

    {igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max}:
      shard-glk:          FAIL (fdo#108145) -> PASS

    igt@pm_rpm@dpms-non-lpsp:
      shard-skl:          INCOMPLETE (fdo#107807) -> SKIP

    
    ==== Warnings ====

    igt@kms_vblank@pipe-b-wait-forked:
      shard-snb:          DMESG-WARN (fdo#107469) -> INCOMPLETE (fdo#105411)

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103158 https://bugs.freedesktop.org/show_bug.cgi?id=103158
  fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103184 https://bugs.freedesktop.org/show_bug.cgi?id=103184
  fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#105010 https://bugs.freedesktop.org/show_bug.cgi?id=105010
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105456 https://bugs.freedesktop.org/show_bug.cgi?id=105456
  fdo#105683 https://bugs.freedesktop.org/show_bug.cgi?id=105683
  fdo#106641 https://bugs.freedesktop.org/show_bug.cgi?id=106641
  fdo#106680 https://bugs.freedesktop.org/show_bug.cgi?id=106680
  fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362
  fdo#107469 https://bugs.freedesktop.org/show_bug.cgi?id=107469
  fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
  fdo#107882 https://bugs.freedesktop.org/show_bug.cgi?id=107882
  fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
  fdo#108039 https://bugs.freedesktop.org/show_bug.cgi?id=108039
  fdo#108074 https://bugs.freedesktop.org/show_bug.cgi?id=108074
  fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
  fdo#108146 https://bugs.freedesktop.org/show_bug.cgi?id=108146
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (6 -> 6) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4969 -> Patchwork_10427

  CI_DRM_4969: 1121d2889e57dedacc0885deaaa9de614832e62f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4673: 54cb1aeb4e50dea9f3abae632e317875d147c4ab @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10427: ea61179bb5e951736f94721fa7359e98e78a3906 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10427/shards.html

* Re: [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2018-10-11 11:37 ` [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
@ 2018-10-11 22:03   ` kbuild test robot
  2018-10-12  1:13   ` kbuild test robot
  1 sibling, 0 replies; 19+ messages in thread
From: kbuild test robot @ 2018-10-11 22:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3795 bytes --]

Hi Chris,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on v4.19-rc7 next-20181011]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Chris-Wilson/drm-i915-Prevent-machine-hang-from-Broxton-s-vtd-w-a-and-error-capture/20181012-053134
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-randconfig-x019-201840 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/gpu/drm/i915/i915_gpu_error.c: In function 'i915_capture_error_state':
>> drivers/gpu/drm/i915/i915_gpu_error.c:1827:28: error: 'dev_priv' undeclared (first use in this function); did you mean 'dev_crit'?
      i915_disable_error_state(dev_priv, -ENOMEM);
                               ^~~~~~~~
                               dev_crit
   drivers/gpu/drm/i915/i915_gpu_error.c:1827:28: note: each undeclared identifier is reported only once for each function it appears in

vim +1827 drivers/gpu/drm/i915/i915_gpu_error.c

  1798	
  1799	/**
  1800	 * i915_capture_error_state - capture an error record for later analysis
  1801	 * @i915: i915 device
  1802	 * @engine_mask: the mask of engines triggering the hang
  1803	 * @error_msg: a message to insert into the error capture header
  1804	 *
  1805	 * Should be called when an error is detected (either a hang or an error
  1806	 * interrupt) to capture error state from the time of the error.  Fills
  1807	 * out a structure which becomes available in debugfs for user level tools
  1808	 * to pick up.
  1809	 */
  1810	void i915_capture_error_state(struct drm_i915_private *i915,
  1811				      u32 engine_mask,
  1812				      const char *error_msg)
  1813	{
  1814		static bool warned;
  1815		struct i915_gpu_state *error;
  1816		unsigned long flags;
  1817	
  1818		if (!i915_modparams.error_capture)
  1819			return;
  1820	
  1821		if (READ_ONCE(i915->gpu_error.first_error))
  1822			return;
  1823	
  1824		error = i915_capture_gpu_state(i915);
  1825		if (!error) {
  1826			DRM_DEBUG_DRIVER("out of memory, not capturing error state\n");
> 1827			i915_disable_error_state(dev_priv, -ENOMEM);
  1828			return;
  1829		}
  1830	
  1831		i915_error_capture_msg(i915, error, engine_mask, error_msg);
  1832		DRM_INFO("%s\n", error->error_msg);
  1833	
  1834		if (!error->simulated) {
  1835			spin_lock_irqsave(&i915->gpu_error.lock, flags);
  1836			if (!i915->gpu_error.first_error) {
  1837				i915->gpu_error.first_error = error;
  1838				error = NULL;
  1839			}
  1840			spin_unlock_irqrestore(&i915->gpu_error.lock, flags);
  1841		}
  1842	
  1843		if (error) {
  1844			__i915_gpu_state_free(&error->ref);
  1845			return;
  1846		}
  1847	
  1848		if (!warned &&
  1849		    ktime_get_real_seconds() - DRIVER_TIMESTAMP < DAY_AS_SECONDS(180)) {
  1850			DRM_INFO("GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.\n");
  1851			DRM_INFO("Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel\n");
  1852			DRM_INFO("drm/i915 developers can then reassign to the right component if it's not a kernel issue.\n");
  1853			DRM_INFO("The gpu crash dump is required to analyze gpu hangs, so please always attach it.\n");
  1854			DRM_INFO("GPU crash dump saved to /sys/class/drm/card%d/error\n",
  1855				 i915->drm.primary->index);
  1856			warned = true;
  1857		}
  1858	}
  1859	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30015 bytes --]


* Re: [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  2018-10-11 11:37 ` [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
  2018-10-11 22:03   ` kbuild test robot
@ 2018-10-12  1:13   ` kbuild test robot
  1 sibling, 0 replies; 19+ messages in thread
From: kbuild test robot @ 2018-10-12  1:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3753 bytes --]

Hi Chris,

Thank you for the patch! There is still something to improve:

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on v4.19-rc7 next-20181011]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Chris-Wilson/drm-i915-Prevent-machine-hang-from-Broxton-s-vtd-w-a-and-error-capture/20181012-053134
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-randconfig-s1-10111203 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/gpu//drm/i915/i915_gpu_error.c: In function 'i915_capture_error_state':
>> drivers/gpu//drm/i915/i915_gpu_error.c:1827:28: error: 'dev_priv' undeclared (first use in this function)
      i915_disable_error_state(dev_priv, -ENOMEM);
                               ^~~~~~~~
   drivers/gpu//drm/i915/i915_gpu_error.c:1827:28: note: each undeclared identifier is reported only once for each function it appears in

vim +/dev_priv +1827 drivers/gpu//drm/i915/i915_gpu_error.c

  1798	
  1799	/**
  1800	 * i915_capture_error_state - capture an error record for later analysis
  1801	 * @i915: i915 device
  1802	 * @engine_mask: the mask of engines triggering the hang
  1803	 * @error_msg: a message to insert into the error capture header
  1804	 *
  1805	 * Should be called when an error is detected (either a hang or an error
  1806	 * interrupt) to capture error state from the time of the error.  Fills
  1807	 * out a structure which becomes available in debugfs for user level tools
  1808	 * to pick up.
  1809	 */
  1810	void i915_capture_error_state(struct drm_i915_private *i915,
  1811				      u32 engine_mask,
  1812				      const char *error_msg)
  1813	{
  1814		static bool warned;
  1815		struct i915_gpu_state *error;
  1816		unsigned long flags;
  1817	
  1818		if (!i915_modparams.error_capture)
  1819			return;
  1820	
  1821		if (READ_ONCE(i915->gpu_error.first_error))
  1822			return;
  1823	
  1824		error = i915_capture_gpu_state(i915);
  1825		if (!error) {
  1826			DRM_DEBUG_DRIVER("out of memory, not capturing error state\n");
> 1827			i915_disable_error_state(dev_priv, -ENOMEM);
  1828			return;
  1829		}
  1830	
  1831		i915_error_capture_msg(i915, error, engine_mask, error_msg);
  1832		DRM_INFO("%s\n", error->error_msg);
  1833	
  1834		if (!error->simulated) {
  1835			spin_lock_irqsave(&i915->gpu_error.lock, flags);
  1836			if (!i915->gpu_error.first_error) {
  1837				i915->gpu_error.first_error = error;
  1838				error = NULL;
  1839			}
  1840			spin_unlock_irqrestore(&i915->gpu_error.lock, flags);
  1841		}
  1842	
  1843		if (error) {
  1844			__i915_gpu_state_free(&error->ref);
  1845			return;
  1846		}
  1847	
  1848		if (!warned &&
  1849		    ktime_get_real_seconds() - DRIVER_TIMESTAMP < DAY_AS_SECONDS(180)) {
  1850			DRM_INFO("GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.\n");
  1851			DRM_INFO("Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel\n");
  1852			DRM_INFO("drm/i915 developers can then reassign to the right component if it's not a kernel issue.\n");
  1853			DRM_INFO("The gpu crash dump is required to analyze gpu hangs, so please always attach it.\n");
  1854			DRM_INFO("GPU crash dump saved to /sys/class/drm/card%d/error\n",
  1855				 i915->drm.primary->index);
  1856			warned = true;
  1857		}
  1858	}
  1859	
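
The error above is a naming mismatch: in this tree the function's parameter
is called i915, while the added line 1827 still passes dev_priv. A minimal,
self-contained sketch of the corrected call follows. It is not the driver
code: the struct layout, both stub functions, and the name
capture_error_state_sketch() are hypothetical stand-ins so that the fragment
compiles outside the kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the real driver structure. */
struct drm_i915_private {
	int capture_disabled_err;	/* 0, or the -errno that disabled capture */
};

/* Hypothetical stub: remember why error capture was disabled. */
static void i915_disable_error_state(struct drm_i915_private *i915, int err)
{
	if (!i915->capture_disabled_err)
		i915->capture_disabled_err = err;
}

/* Hypothetical stub: always fails, to simulate running out of memory. */
static void *i915_capture_gpu_state(struct drm_i915_private *i915)
{
	(void)i915;
	return NULL;
}

/* Mirrors lines 1824-1829 of the listing, with the argument renamed. */
void capture_error_state_sketch(struct drm_i915_private *i915)
{
	void *error = i915_capture_gpu_state(i915);

	if (!error) {
		/* The parameter is i915 here; dev_priv is not declared. */
		i915_disable_error_state(i915, -ENOMEM);
		return;
	}
}
```

With the argument renamed, the call refers to an identifier that is actually
in scope, which is all the compiler is complaining about at line 1827.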

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29926 bytes --]


Thread overview: 19+ messages
2017-12-06 14:19 [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
2017-12-06 14:37 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-12-06 14:43 ` [PATCH] " Daniel Vetter
2017-12-06 14:48   ` Chris Wilson
2017-12-06 14:51     ` Daniel Vetter
2017-12-06 15:26 ` ✗ Fi.CI.IGT: warning for " Patchwork
2017-12-06 15:37 ` [PATCH v2] " Chris Wilson
2017-12-06 17:01   ` Bloomfield, Jon
2017-12-06 17:25     ` Bloomfield, Jon
2017-12-06 16:11 ` ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev2) Patchwork
2017-12-06 17:43 ` ✓ Fi.CI.IGT: " Patchwork
2018-10-11 11:21 ` ✗ Fi.CI.BAT: failure " Patchwork
2018-10-11 11:37 ` [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
2018-10-11 22:03   ` kbuild test robot
2018-10-12  1:13   ` kbuild test robot
2018-10-11 11:43 ` ✗ Fi.CI.BAT: failure for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev3) Patchwork
2018-10-11 11:51 ` [PATCH] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture Chris Wilson
2018-10-11 12:29 ` ✓ Fi.CI.BAT: success for drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture (rev4) Patchwork
2018-10-11 17:13 ` ✓ Fi.CI.IGT: " Patchwork
