* [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
@ 2018-02-07 15:13 Chris Wilson
  2018-02-07 15:59 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Chris Wilson @ 2018-02-07 15:13 UTC (permalink / raw)
  To: intel-gfx

Reduce the window of opportunity for set-wedged being called
concurrently with reset (after i915_reset() has performed the
i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
complete the inflight requests. When i915_reset() is being blocked on a
request, such completion may allow it to start and begin resetting
the GPU before i915_gem_set_wedged() has finished (and so before
set-wedged will have marked the device as wedged). As such,
i915_gem_init_hw() may see a wedged device even from inside
i915_reset().

References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c1b80cd52f9e..06f0456699af 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3205,6 +3205,9 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 			intel_engine_dump(engine, &p, "%s\n", engine->name);
 	}
 
+	set_bit(I915_WEDGED, &i915->gpu_error.flags);
+	smp_mb__after_atomic();
+
 	/*
 	 * First, stop submission to hw, but do not yet complete requests by
 	 * rolling the global seqno forward (since this would complete requests
@@ -3241,7 +3244,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 	for_each_engine(engine, i915, id) {
 		unsigned long flags;
 
-		/* Mark all pending requests as complete so that any concurrent
+		/*
+		 * Mark all pending requests as complete so that any concurrent
 		 * (lockless) lookup doesn't try and wait upon the request as we
 		 * reset it.
 		 */
@@ -3251,7 +3255,6 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 		spin_unlock_irqrestore(&engine->timeline->lock, flags);
 	}
 
-	set_bit(I915_WEDGED, &i915->gpu_error.flags);
 	wake_up_all(&i915->gpu_error.reset_queue);
 }
 
-- 
2.16.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* ✓ Fi.CI.BAT: success for drm/i915: Mark the device as wedged from the beginning of set-wedged
  2018-02-07 15:13 [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged Chris Wilson
@ 2018-02-07 15:59 ` Patchwork
  2018-02-07 21:41 ` ✓ Fi.CI.IGT: " Patchwork
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2018-02-07 15:59 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Mark the device as wedged from the beginning of set-wedged
URL   : https://patchwork.freedesktop.org/series/37819/
State : success

== Summary ==

Series 37819v1 drm/i915: Mark the device as wedged from the beginning of set-wedged
https://patchwork.freedesktop.org/api/1.0/series/37819/revisions/1/mbox/

Test gem_mmap_gtt:
        Subgroup basic-small-bo-tiledx:
                fail       -> PASS       (fi-gdg-551) fdo#102575

fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:421s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:424s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:375s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:487s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:286s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:480s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:483s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:467s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:468s
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:569s
fi-cnl-y3        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:567s
fi-elk-e7500     total:288  pass:229  dwarn:0   dfail:0   fail:0   skip:59  time:415s
fi-gdg-551       total:288  pass:180  dwarn:0   dfail:0   fail:0   skip:108 time:284s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:509s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:388s
fi-ilk-650       total:288  pass:228  dwarn:0   dfail:0   fail:0   skip:60  time:412s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:461s
fi-ivb-3770      total:288  pass:255  dwarn:0   dfail:0   fail:0   skip:33  time:417s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:456s
fi-kbl-7560u     total:288  pass:269  dwarn:0   dfail:0   fail:0   skip:19  time:494s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:454s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:504s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:603s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:430s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:514s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:527s
fi-skl-6700k2    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:489s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:488s
fi-skl-guc       total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:415s
fi-skl-gvtdvm    total:288  pass:265  dwarn:0   dfail:0   fail:0   skip:23  time:434s
fi-snb-2520m     total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:524s
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:394s
Blacklisted hosts:
fi-glk-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:472s

94ca1ebb0652da416401b22e227d0d95eb382b22 drm-tip: 2018y-02m-07d-13h-38m-30s UTC integration manifest
1fcb8ccd3c6e drm/i915: Mark the device as wedged from the beginning of set-wedged

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7924/issues.html

* ✓ Fi.CI.IGT: success for drm/i915: Mark the device as wedged from the beginning of set-wedged
  2018-02-07 15:13 [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged Chris Wilson
  2018-02-07 15:59 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-02-07 21:41 ` Patchwork
  2018-02-08  8:39 ` [PATCH] " Chris Wilson
  2018-02-08 11:35 ` Mika Kuoppala
  3 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2018-02-07 21:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Mark the device as wedged from the beginning of set-wedged
URL   : https://patchwork.freedesktop.org/series/37819/
State : success

== Summary ==

Test gem_eio:
        Subgroup in-flight:
                dmesg-warn -> PASS       (shard-snb) fdo#104058
Test kms_flip:
        Subgroup modeset-vs-vblank-race-interruptible:
                fail       -> PASS       (shard-hsw) fdo#103060
        Subgroup 2x-flip-vs-expired-vblank:
                fail       -> PASS       (shard-hsw) fdo#102887
Test kms_sysfs_edid_timing:
                pass       -> WARN       (shard-apl) fdo#100047
Test perf:
        Subgroup buffer-fill:
                fail       -> PASS       (shard-apl) fdo#103755

fdo#104058 https://bugs.freedesktop.org/show_bug.cgi?id=104058
fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
fdo#103755 https://bugs.freedesktop.org/show_bug.cgi?id=103755

shard-apl        total:3419 pass:1768 dwarn:1   dfail:0   fail:22  skip:1626 time:12312s
shard-hsw        total:3442 pass:1759 dwarn:1   dfail:0   fail:10  skip:1671 time:11892s
shard-snb        total:3442 pass:1350 dwarn:1   dfail:0   fail:10  skip:2081 time:6522s
Blacklisted hosts:
shard-kbl        total:3442 pass:1909 dwarn:1   dfail:0   fail:21  skip:1511 time:9621s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7924/shards.html

* Re: [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
  2018-02-07 15:13 [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged Chris Wilson
  2018-02-07 15:59 ` ✓ Fi.CI.BAT: success for " Patchwork
  2018-02-07 21:41 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-02-08  8:39 ` Chris Wilson
  2018-02-08 11:30   ` Joonas Lahtinen
  2018-02-08 11:35 ` Mika Kuoppala
  3 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2018-02-08  8:39 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-02-07 15:13:50)
> Reduce the window of opportunity for set-wedged being called
> concurrently with reset (after i915_reset() has performed the
> i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> complete the inflight requests. When i915_reset() is being blocked on a
> request, such completion may allow it to start and begin resetting
> the GPU before i915_gem_set_wedged() has finished (and so before
> set-wedged will have marked the device as wedged). As such,
> i915_gem_init_hw() may see a wedged device even from inside
> i915_reset().

So I'm 99% certain this is the problem on blb/pnv. As we break the
modeset deadlock using set-wedged from a timer, the reset springs into
action on the other CPU and races with set_bit(I915_WEDGED). Flagging
I915_WEDGED first will force i915_reset() to serialise via
i915_gem_unset_wedged(). (Well that's the plan at least.)
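
The ordering argument can be modelled in userspace. What follows is only
a minimal sketch, not the driver code: struct model, reset_waiter(),
set_wedged_flag_first() and waiter_sees_wedged() are hypothetical names,
C11 atomics loosely stand in for set_bit() + smp_mb__after_atomic(), and
a second atomic flag stands in for "inflight requests completed". It
shows that if the wedged flag is published before the completion, a
waiter released by that completion must observe the flag.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in i915. */
struct model {
	atomic_bool wedged;	/* models the I915_WEDGED bit */
	atomic_bool completed;	/* models "inflight requests completed" */
	bool seen_wedged;	/* what the waiter observed once released */
};

/* Models a reset path blocked on a request: wait, then sample the flag. */
static void *reset_waiter(void *arg)
{
	struct model *m = arg;

	/* The acquire load pairs with the release store in the setter. */
	while (!atomic_load_explicit(&m->completed, memory_order_acquire))
		sched_yield();

	m->seen_wedged = atomic_load_explicit(&m->wedged,
					      memory_order_relaxed);
	return NULL;
}

/* Models the patched ordering: flag first, completion second. */
static void set_wedged_flag_first(struct model *m)
{
	/* Assumption: seq_cst store approximates set_bit() + barrier. */
	atomic_store_explicit(&m->wedged, true, memory_order_seq_cst);
	atomic_store_explicit(&m->completed, true, memory_order_release);
}

/* Runs one waiter against one setter; true if the waiter saw the flag. */
static bool waiter_sees_wedged(void)
{
	struct model m;
	pthread_t thread;

	atomic_init(&m.wedged, false);
	atomic_init(&m.completed, false);
	m.seen_wedged = false;

	if (pthread_create(&thread, NULL, reset_waiter, &m))
		return false;

	set_wedged_flag_first(&m);
	pthread_join(thread, NULL);	/* makes seen_wedged safe to read */

	return m.seen_wedged;
}
```

Swapping the two stores in set_wedged_flag_first() models the old
ordering: the waiter may then be released before the flag is written and
can observe either value, which is the window described above.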
 
> References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1b80cd52f9e..06f0456699af 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3205,6 +3205,9 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>                         intel_engine_dump(engine, &p, "%s\n", engine->name);
>         }
>  
> +       set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +       smp_mb__after_atomic();
> +
>         /*
>          * First, stop submission to hw, but do not yet complete requests by
>          * rolling the global seqno forward (since this would complete requests
> @@ -3241,7 +3244,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>         for_each_engine(engine, i915, id) {
>                 unsigned long flags;
>  
> -               /* Mark all pending requests as complete so that any concurrent
> +               /*
> +                * Mark all pending requests as complete so that any concurrent
>                  * (lockless) lookup doesn't try and wait upon the request as we
>                  * reset it.
>                  */
> @@ -3251,7 +3255,6 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>                 spin_unlock_irqrestore(&engine->timeline->lock, flags);
>         }
>  
> -       set_bit(I915_WEDGED, &i915->gpu_error.flags);
>         wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
> -- 
> 2.16.1
> 

* Re: [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
  2018-02-08  8:39 ` [PATCH] " Chris Wilson
@ 2018-02-08 11:30   ` Joonas Lahtinen
  0 siblings, 0 replies; 7+ messages in thread
From: Joonas Lahtinen @ 2018-02-08 11:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-02-08 10:39:05)
> Quoting Chris Wilson (2018-02-07 15:13:50)
> > Reduce the window of opportunity for set-wedged being called
> > concurrently with reset (after i915_reset() has performed the
> > i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> > complete the inflight requests. When i915_reset() is being blocked on a
> > request, such completion may allow it to start and begin resetting
> > the GPU before i915_gem_set_wedged() has finished (and so before
> > set-wedged will have marked the device as wedged). As such,
> > i915_gem_init_hw() may see a wedged device even from inside
> > i915_reset().
> 
> So I'm 99% certain this is the problem on blb/pnv. As we break the
> modeset deadlock using set-wedged from a timer, the reset springs into
> action on the other CPU and races with set_bit(I915_WEDGED). Flagging
> I915_WEDGED first will force i915_reset() to serialise via
> i915_gem_unset_wedged(). (Well that's the plan at least.)
>  
> > References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

* Re: [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
  2018-02-07 15:13 [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged Chris Wilson
                   ` (2 preceding siblings ...)
  2018-02-08  8:39 ` [PATCH] " Chris Wilson
@ 2018-02-08 11:35 ` Mika Kuoppala
  2018-02-08 11:45   ` Chris Wilson
  3 siblings, 1 reply; 7+ messages in thread
From: Mika Kuoppala @ 2018-02-08 11:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Reduce the window of opportunity for set-wedged being called
> concurrently with reset (after i915_reset() has performed the
> i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> complete the inflight requests. When i915_reset() is being blocked on a
> request, such completion may allow it to start and begin resetting
> the GPU before i915_gem_set_wedged() has finished (and so before
> set-wedged will have marked the device as wedged). As such,
> i915_gem_init_hw() may see a wedged device even from inside
> i915_reset().
>
> References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1b80cd52f9e..06f0456699af 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3205,6 +3205,9 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  			intel_engine_dump(engine, &p, "%s\n", engine->name);
>  	}
>  
> +	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +	smp_mb__after_atomic();
> +
>  	/*
>  	 * First, stop submission to hw, but do not yet complete requests by
>  	 * rolling the global seqno forward (since this would complete requests
> @@ -3241,7 +3244,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  	for_each_engine(engine, i915, id) {
>  		unsigned long flags;
>  
> -		/* Mark all pending requests as complete so that any concurrent
> +		/*
> +		 * Mark all pending requests as complete so that any concurrent
>  		 * (lockless) lookup doesn't try and wait upon the request as we
>  		 * reset it.
>  		 */
> @@ -3251,7 +3255,6 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  		spin_unlock_irqrestore(&engine->timeline->lock, flags);
>  	}
>  
> -	set_bit(I915_WEDGED, &i915->gpu_error.flags);
>  	wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
> -- 
> 2.16.1

* Re: [PATCH] drm/i915: Mark the device as wedged from the beginning of set-wedged
  2018-02-08 11:35 ` Mika Kuoppala
@ 2018-02-08 11:45   ` Chris Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Wilson @ 2018-02-08 11:45 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-02-08 11:35:09)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Reduce the window of opportunity for set-wedged being called
> > concurrently with reset (after i915_reset() has performed the
> > i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
> > complete the inflight requests. When i915_reset() is being blocked on a
> > request, such completion may allow it to start and begin resetting
> > the GPU before i915_gem_set_wedged() has finished (and so before
> > set-wedged will have marked the device as wedged). As such,
> > i915_gem_init_hw() may see a wedged device even from inside
> > i915_reset().
> >
> > References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Thank you both kindly for the review,
-Chris
