All of lore.kernel.org
* [PATCH] drm/i915: Drain the device workqueue on unload
@ 2017-06-28 15:39 Chris Wilson
  2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Chris Wilson @ 2017-06-28 15:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

Workers on the i915->wq may rearm themselves, so for completeness we need
to replace our flush_workqueue() with a call to drain_workqueue() before
unloading the device.

References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c                  | 2 +-
 drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 9167a73f3c69..3f998d7102f7 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
 
 static void i915_gem_fini(struct drm_i915_private *dev_priv)
 {
-	flush_workqueue(dev_priv->wq);
+	drain_workqueue(dev_priv->wq);
 
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_uc_fini_hw(dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 47613d20bba8..4beed89b51e6 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
 
 	cancel_delayed_work_sync(&i915->gt.retire_work);
 	cancel_delayed_work_sync(&i915->gt.idle_work);
-	flush_workqueue(i915->wq);
+	drain_workqueue(i915->wq);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	for_each_engine(engine, i915, id)
-- 
2.13.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload
  2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson
@ 2017-06-28 16:03 ` Patchwork
  2017-06-29  9:07 ` [PATCH] " Mika Kuoppala
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2017-06-28 16:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Drain the device workqueue on unload
URL   : https://patchwork.freedesktop.org/series/26494/
State : success

== Summary ==

Series 26494v1 drm/i915: Drain the device workqueue on unload
https://patchwork.freedesktop.org/api/1.0/series/26494/revisions/1/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                fail       -> PASS       (fi-snb-2600) fdo#100007
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-a:
                dmesg-warn -> PASS       (fi-byt-j1900) fdo#101517

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#101517 https://bugs.freedesktop.org/show_bug.cgi?id=101517

fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:443s
fi-bdw-gvtdvm    total:279  pass:257  dwarn:8   dfail:0   fail:0   skip:14  time:424s
fi-blb-e6850     total:279  pass:224  dwarn:1   dfail:0   fail:0   skip:54  time:354s
fi-bsw-n3050     total:279  pass:242  dwarn:1   dfail:0   fail:0   skip:36  time:539s
fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:518s
fi-byt-j1900     total:279  pass:254  dwarn:1   dfail:0   fail:0   skip:24  time:489s
fi-byt-n2820     total:279  pass:249  dwarn:2   dfail:0   fail:0   skip:28  time:483s
fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:602s
fi-hsw-4770      total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:435s
fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:410s
fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:414s
fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:499s
fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:471s
fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:464s
fi-kbl-7560u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:570s
fi-kbl-r         total:279  pass:260  dwarn:1   dfail:0   fail:0   skip:18  time:581s
fi-pnv-d510      total:279  pass:223  dwarn:1   dfail:0   fail:0   skip:55  time:557s
fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:461s
fi-skl-6700hq    total:279  pass:223  dwarn:1   dfail:0   fail:30  skip:24  time:340s
fi-skl-6700k     total:279  pass:257  dwarn:4   dfail:0   fail:0   skip:18  time:462s
fi-skl-6770hq    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:479s
fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:436s
fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:536s
fi-snb-2600      total:279  pass:250  dwarn:0   dfail:0   fail:0   skip:29  time:414s

85a692e2c6a7cf93082044d776e838cb9e9b2146 drm-tip: 2017y-06m-28d-14h-24m-59s UTC integration manifest
99d97d4 drm/i915: Drain the device workqueue on unload

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5061/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: Drain the device workqueue on unload
  2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson
  2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-06-29  9:07 ` Mika Kuoppala
  2017-06-29  9:49   ` Chris Wilson
  2017-07-18 13:41 ` [PATCH v2] " Chris Wilson
  2017-07-18 13:58 ` ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2) Patchwork
  3 siblings, 1 reply; 10+ messages in thread
From: Mika Kuoppala @ 2017-06-29  9:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthew Auld

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Workers on the i915->wq may rearm themselves so for completeness we need
> to replace our flush_workqueue() with a call to drain_workqueue() before
> unloading the device.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c                  | 2 +-
>  drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 9167a73f3c69..3f998d7102f7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
>  
>  static void i915_gem_fini(struct drm_i915_private *dev_priv)
>  {
> -	flush_workqueue(dev_priv->wq);
> +	drain_workqueue(dev_priv->wq);

There will be a superfluous drain_workqueue() in driver_unload.

Also, the destroy will drain by itself, but here we want
to drain before taking the mutex?

-Mika

>  
>  	mutex_lock(&dev_priv->drm.struct_mutex);
>  	intel_uc_fini_hw(dev_priv);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 47613d20bba8..4beed89b51e6 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
>  
>  	cancel_delayed_work_sync(&i915->gt.retire_work);
>  	cancel_delayed_work_sync(&i915->gt.idle_work);
> -	flush_workqueue(i915->wq);
> +	drain_workqueue(i915->wq);
>  
>  	mutex_lock(&i915->drm.struct_mutex);
>  	for_each_engine(engine, i915, id)
> -- 
> 2.13.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: Drain the device workqueue on unload
  2017-06-29  9:07 ` [PATCH] " Mika Kuoppala
@ 2017-06-29  9:49   ` Chris Wilson
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2017-06-29  9:49 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld

Quoting Mika Kuoppala (2017-06-29 10:07:04)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Workers on the i915->wq may rearm themselves so for completeness we need
> > to replace our flush_workqueue() with a call to drain_workqueue() before
> > unloading the device.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c                  | 2 +-
> >  drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 9167a73f3c69..3f998d7102f7 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
> >  
> >  static void i915_gem_fini(struct drm_i915_private *dev_priv)
> >  {
> > -     flush_workqueue(dev_priv->wq);
> > +     drain_workqueue(dev_priv->wq);
> 
> There will be a superfluous drain_workqueue() in driver_unload.
> 
> Also, the destroy will drain by itself, but here we want
> to drain before taking the mutex?

Yes. Some fini functions (e.g. i915_gem_contexts_fini) rely on there
being no pending work left, so they can safely destroy the parent structures.
-Chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] drm/i915: Drain the device workqueue on unload
  2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson
  2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork
  2017-06-29  9:07 ` [PATCH] " Mika Kuoppala
@ 2017-07-18 13:41 ` Chris Wilson
  2017-07-19 11:18   ` Mika Kuoppala
  2017-07-18 13:58 ` ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2) Patchwork
  3 siblings, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2017-07-18 13:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

Workers on the i915->wq may rearm themselves, so for completeness we need
to replace our flush_workqueue() with a call to drain_workqueue() before
unloading the device.

v2: Reinforce the drain_workqueue with a preceding rcu_barrier(), as a
few of the tasks that need to be drained may first be armed by RCU.

References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c                  |  6 ++----
 drivers/gpu/drm/i915/i915_drv.h                  | 20 ++++++++++++++++++++
 drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 +-
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4b62fd012877..41c5b11a7c8f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
 
 static void i915_gem_fini(struct drm_i915_private *dev_priv)
 {
-	flush_workqueue(dev_priv->wq);
+	/* Flush any outstanding unpin_work. */
+	i915_gem_drain_workqueue(dev_priv);
 
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_uc_fini_hw(dev_priv);
@@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
 	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
 	i915_reset_error_state(dev_priv);
 
-	/* Flush any outstanding unpin_work. */
-	drain_workqueue(dev_priv->wq);
-
 	i915_gem_fini(dev_priv);
 	intel_uc_fini_fw(dev_priv);
 	intel_fbc_cleanup_cfb(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 667fb5c44483..e9a4b96dc775 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
 	} while (flush_work(&i915->mm.free_work));
 }
 
+static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
+{
+	/*
+	 * Similar to objects above (see i915_gem_drain_freed_objects()), in
+	 * general we have workers that are armed by RCU and then rearm
+	 * themselves in their callbacks. To be paranoid, we need to
+	 * drain the workqueue a second time after waiting for the RCU
+	 * grace period so that we catch work queued via RCU from the first
+	 * pass. As neither drain_workqueue() nor flush_workqueue() report
+	 * a result, we assume that no more than two passes are needed to
+	 * catch all recursive RCU delayed work.
+	 *
+	 */
+	int pass = 2;
+	do {
+		rcu_barrier();
+		drain_workqueue(i915->wq);
+	} while (--pass);
+}
+
 struct i915_vma * __must_check
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 47613d20bba8..7a468cb30946 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
 
 	cancel_delayed_work_sync(&i915->gt.retire_work);
 	cancel_delayed_work_sync(&i915->gt.idle_work);
-	flush_workqueue(i915->wq);
+	i915_gem_drain_workqueue(i915);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	for_each_engine(engine, i915, id)
-- 
2.13.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2)
  2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson
                   ` (2 preceding siblings ...)
  2017-07-18 13:41 ` [PATCH v2] " Chris Wilson
@ 2017-07-18 13:58 ` Patchwork
  3 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2017-07-18 13:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Drain the device workqueue on unload (rev2)
URL   : https://patchwork.freedesktop.org/series/26494/
State : success

== Summary ==

Series 26494v2 drm/i915: Drain the device workqueue on unload
https://patchwork.freedesktop.org/api/1.0/series/26494/revisions/2/mbox/

Test gem_exec_suspend:
        Subgroup basic-s4-devices:
                pass       -> DMESG-WARN (fi-kbl-7560u) k.org#196399
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                pass       -> FAIL       (fi-snb-2600) fdo#100215
Test kms_flip:
        Subgroup basic-flip-vs-modeset:
                skip       -> PASS       (fi-skl-x1585l) fdo#101781
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-a:
                pass       -> DMESG-WARN (fi-pnv-d510) fdo#101597
        Subgroup suspend-read-crc-pipe-b:
                dmesg-warn -> PASS       (fi-byt-j1900) fdo#101705

k.org#196399 https://bugzilla.kernel.org/show_bug.cgi?id=196399
fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#101781 https://bugs.freedesktop.org/show_bug.cgi?id=101781
fdo#101597 https://bugs.freedesktop.org/show_bug.cgi?id=101597
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:440s
fi-bdw-gvtdvm    total:279  pass:265  dwarn:0   dfail:0   fail:0   skip:14  time:426s
fi-blb-e6850     total:279  pass:224  dwarn:1   dfail:0   fail:0   skip:54  time:353s
fi-bsw-n3050     total:279  pass:243  dwarn:0   dfail:0   fail:0   skip:36  time:533s
fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:505s
fi-byt-j1900     total:279  pass:255  dwarn:0   dfail:0   fail:0   skip:24  time:493s
fi-byt-n2820     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:488s
fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:610s
fi-hsw-4770      total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:440s
fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:414s
fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:414s
fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:502s
fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:470s
fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:468s
fi-kbl-7560u     total:279  pass:268  dwarn:1   dfail:0   fail:0   skip:10  time:569s
fi-kbl-r         total:279  pass:260  dwarn:1   dfail:0   fail:0   skip:18  time:579s
fi-pnv-d510      total:279  pass:222  dwarn:2   dfail:0   fail:0   skip:55  time:569s
fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:452s
fi-skl-6700hq    total:279  pass:262  dwarn:0   dfail:0   fail:0   skip:17  time:585s
fi-skl-6700k     total:279  pass:257  dwarn:4   dfail:0   fail:0   skip:18  time:467s
fi-skl-6770hq    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:472s
fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:441s
fi-skl-x1585l    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:482s
fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:539s
fi-snb-2600      total:279  pass:249  dwarn:0   dfail:0   fail:1   skip:29  time:411s

10de1e17faaab452782e5a1baffd1b30a639a261 drm-tip: 2017y-07m-18d-10h-08m-42s UTC integration manifest
b859b5d drm/i915: Drain the device workqueue on unload

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5219/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload
  2017-07-18 13:41 ` [PATCH v2] " Chris Wilson
@ 2017-07-19 11:18   ` Mika Kuoppala
  2017-07-19 11:30     ` Chris Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: Mika Kuoppala @ 2017-07-19 11:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthew Auld

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Workers on the i915->wq may rearm themselves so for completeness we need
> to replace our flush_workqueue() with a call to drain_workqueue() before
> unloading the device.
>
> v2: Reinforce the drain_workqueue with a preceding rcu_barrier(), as a
> few of the tasks that need to be drained may first be armed by RCU.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c                  |  6 ++----
>  drivers/gpu/drm/i915/i915_drv.h                  | 20 ++++++++++++++++++++
>  drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 +-
>  3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4b62fd012877..41c5b11a7c8f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
>  
>  static void i915_gem_fini(struct drm_i915_private *dev_priv)
>  {
> -	flush_workqueue(dev_priv->wq);
> +	/* Flush any outstanding unpin_work. */
> +	i915_gem_drain_workqueue(dev_priv);
>  
>  	mutex_lock(&dev_priv->drm.struct_mutex);
>  	intel_uc_fini_hw(dev_priv);
> @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
>  	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
>  	i915_reset_error_state(dev_priv);
>  
> -	/* Flush any outstanding unpin_work. */
> -	drain_workqueue(dev_priv->wq);
> -
>  	i915_gem_fini(dev_priv);
>  	intel_uc_fini_fw(dev_priv);
>  	intel_fbc_cleanup_cfb(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 667fb5c44483..e9a4b96dc775 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
>  	} while (flush_work(&i915->mm.free_work));
>  }
>  
> +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
> +{
> +	/*
> +	 * Similar to objects above (see i915_gem_drain_freed-objects), in
> +	 * general we have workers that are armed by RCU and then rearm
> +	 * themselves in their callbacks. To be paranoid, we need to
> +	 * drain the workqueue a second time after waiting for the RCU
> +	 * grace period so that we catch work queued via RCU from the first
> +	 * pass. As neither drain_workqueue() nor flush_workqueue() report
> +	 * a result, we make an assumption that we only don't require more
> +	 * than 2 passes to catch all recursive RCU delayed work.
> +	 *
> +	 */
> +	int pass = 2;
> +	do {
> +		rcu_barrier();
> +		drain_workqueue(i915->wq);

I am fine with the paranoia, and it covers the case below. Still if we do:

drain_workqueue();
rcu_barrier();

With draining in progress, only chained queueing is allowed. I understand
this to mean that when it returns, all the ctx pointers are unreferenced
but not yet freed.

Thus the rcu_barrier() after it cleans up the trash and we are good to
be unloaded, with one pass.

I guess it comes down to how to read the comment, so could you
elaborate on 'we have workers that are armed by RCU and then rearm
themselves'? As per the drain_workqueue() description, this should be covered.

Thanks,
-Mika

> +	} while (--pass);
> +}
> +
>  struct i915_vma * __must_check
>  i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
>  			 const struct i915_ggtt_view *view,
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 47613d20bba8..7a468cb30946 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
>  
>  	cancel_delayed_work_sync(&i915->gt.retire_work);
>  	cancel_delayed_work_sync(&i915->gt.idle_work);
> -	flush_workqueue(i915->wq);
> +	i915_gem_drain_workqueue(i915);
>  
>  	mutex_lock(&i915->drm.struct_mutex);
>  	for_each_engine(engine, i915, id)
> -- 
> 2.13.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload
  2017-07-19 11:18   ` Mika Kuoppala
@ 2017-07-19 11:30     ` Chris Wilson
  2017-07-19 11:51       ` Mika Kuoppala
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2017-07-19 11:30 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld

Quoting Mika Kuoppala (2017-07-19 12:18:47)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Workers on the i915->wq may rearm themselves so for completeness we need
> > to replace our flush_workqueue() with a call to drain_workqueue() before
> > unloading the device.
> >
> > v2: Reinforce the drain_workqueue with a preceding rcu_barrier(), as a
> > few of the tasks that need to be drained may first be armed by RCU.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c                  |  6 ++----
> >  drivers/gpu/drm/i915/i915_drv.h                  | 20 ++++++++++++++++++++
> >  drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 +-
> >  3 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 4b62fd012877..41c5b11a7c8f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
> >  
> >  static void i915_gem_fini(struct drm_i915_private *dev_priv)
> >  {
> > -     flush_workqueue(dev_priv->wq);
> > +     /* Flush any outstanding unpin_work. */
> > +     i915_gem_drain_workqueue(dev_priv);
> >  
> >       mutex_lock(&dev_priv->drm.struct_mutex);
> >       intel_uc_fini_hw(dev_priv);
> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
> >       cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> >       i915_reset_error_state(dev_priv);
> >  
> > -     /* Flush any outstanding unpin_work. */
> > -     drain_workqueue(dev_priv->wq);
> > -
> >       i915_gem_fini(dev_priv);
> >       intel_uc_fini_fw(dev_priv);
> >       intel_fbc_cleanup_cfb(dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 667fb5c44483..e9a4b96dc775 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
> >       } while (flush_work(&i915->mm.free_work));
> >  }
> >  
> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
> > +{
> > +     /*
> > +      * Similar to objects above (see i915_gem_drain_freed-objects), in
> > +      * general we have workers that are armed by RCU and then rearm
> > +      * themselves in their callbacks. To be paranoid, we need to
> > +      * drain the workqueue a second time after waiting for the RCU
> > +      * grace period so that we catch work queued via RCU from the first
> > +      * pass. As neither drain_workqueue() nor flush_workqueue() report
> > +      * a result, we make an assumption that we only don't require more
> > +      * than 2 passes to catch all recursive RCU delayed work.
> > +      *
> > +      */
> > +     int pass = 2;
> > +     do {
> > +             rcu_barrier();
> > +             drain_workqueue(i915->wq);
> 
> I am fine with the paranoia, and it covers the case below. Still if we do:
> 
> drain_workqueue();
> rcu_barrier();
> 
> With draining in progress, only chain queuing is allowed. I understand
> this so that when it returns, all the ctx pointers are now unreferenced
> but not freed.
> 
> Thus the rcu_barrier() after it cleans the trash and we are good to
> be unloaded. With one pass.
> 
> I guess it comes to how to understand the comment, so could you
> elaborate the 'we have workers that are armed by RCU and then rearm
> themselves'?. As from drain_workqueue desc, this should be covered.

I'm considering that they may be rearmed via RCU in the general case:
e.g. a context close frees an object, which goes onto an RCU list that,
once processed, kicks off a new worker and so requires another round of
drain_workqueue(). We are in module unload, so a few extra belt-and-braces
delays are ok until somebody notices it takes a few minutes to run a
reload test ;)
-Chris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload
  2017-07-19 11:30     ` Chris Wilson
@ 2017-07-19 11:51       ` Mika Kuoppala
  2017-07-19 12:23         ` Chris Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: Mika Kuoppala @ 2017-07-19 11:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthew Auld

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2017-07-19 12:18:47)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > Workers on the i915->wq may rearm themselves so for completeness we need
>> > to replace our flush_workqueue() with a call to drain_workqueue() before
>> > unloading the device.
>> >
>> > v2: Reinforce the drain_workqueue with a preceding rcu_barrier(), as a
>> > few of the tasks that need to be drained may first be armed by RCU.
>> >
>> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Matthew Auld <matthew.auld@intel.com>
>> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_drv.c                  |  6 ++----
>> >  drivers/gpu/drm/i915/i915_drv.h                  | 20 ++++++++++++++++++++
>> >  drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 +-
>> >  3 files changed, 23 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> > index 4b62fd012877..41c5b11a7c8f 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.c
>> > +++ b/drivers/gpu/drm/i915/i915_drv.c
>> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
>> >  
>> >  static void i915_gem_fini(struct drm_i915_private *dev_priv)
>> >  {
>> > -     flush_workqueue(dev_priv->wq);
>> > +     /* Flush any outstanding unpin_work. */
>> > +     i915_gem_drain_workqueue(dev_priv);
>> >  
>> >       mutex_lock(&dev_priv->drm.struct_mutex);
>> >       intel_uc_fini_hw(dev_priv);
>> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
>> >       cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
>> >       i915_reset_error_state(dev_priv);
>> >  
>> > -     /* Flush any outstanding unpin_work. */
>> > -     drain_workqueue(dev_priv->wq);
>> > -
>> >       i915_gem_fini(dev_priv);
>> >       intel_uc_fini_fw(dev_priv);
>> >       intel_fbc_cleanup_cfb(dev_priv);
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> > index 667fb5c44483..e9a4b96dc775 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.h
>> > +++ b/drivers/gpu/drm/i915/i915_drv.h
>> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
>> >       } while (flush_work(&i915->mm.free_work));
>> >  }
>> >  
>> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
>> > +{
>> > +     /*
>> > +      * Similar to objects above (see i915_gem_drain_freed-objects), in
>> > +      * general we have workers that are armed by RCU and then rearm
>> > +      * themselves in their callbacks. To be paranoid, we need to
>> > +      * drain the workqueue a second time after waiting for the RCU
>> > +      * grace period so that we catch work queued via RCU from the first
>> > +      * pass. As neither drain_workqueue() nor flush_workqueue() report
>> > +      * a result, we assume that we don't require more
>> > +      * than 2 passes to catch all recursive RCU delayed work.
>> > +      *
>> > +      */
>> > +     int pass = 2;
>> > +     do {
>> > +             rcu_barrier();
>> > +             drain_workqueue(i915->wq);
>> 
>> I am fine with the paranoia, and it covers the case below. Still if we do:
>> 
>> drain_workqueue();
>> rcu_barrier();
>> 
>> With draining in progress, only chained queuing is allowed. I understand
>> this so that when it returns, all the ctx pointers are now unreferenced
>> but not freed.
>> 
>> Thus the rcu_barrier() after it cleans the trash and we are good to
>> be unloaded. With one pass.
>> 
>> I guess it comes to how to understand the comment, so could you
>> elaborate the 'we have workers that are armed by RCU and then rearm
>> themselves'? As per the drain_workqueue() description, this should be covered.
>
> I'm considering that they may be rearmed via RCU in the general case,
> e.g. context close frees an object and so goes onto an RCU list that
> once processed kicks off a new worker and so requires another round of
> drain_workqueue. We are in module unload so a few extra delays to belts
> and braces are ok until somebody notices it takes a few minutes to run a
> reload test ;)

Ok. Patch is
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload
  2017-07-19 11:51       ` Mika Kuoppala
@ 2017-07-19 12:23         ` Chris Wilson
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Wilson @ 2017-07-19 12:23 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld

Quoting Mika Kuoppala (2017-07-19 12:51:04)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Mika Kuoppala (2017-07-19 12:18:47)
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> 
> >> > Workers on the i915->wq may rearm themselves so for completeness we need
> >> > to replace our flush_workqueue() with a call to drain_workqueue() before
> >> > unloading the device.
> >> >
> >> > v2: Reinforce the drain_workqueue with a preceding rcu_barrier() as a
> >> > few of the tasks that need to be drained may first be armed by RCU.
> >> >
> >> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> > Cc: Matthew Auld <matthew.auld@intel.com>
> >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_drv.c                  |  6 ++----
> >> >  drivers/gpu/drm/i915/i915_drv.h                  | 20 ++++++++++++++++++++
> >> >  drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 +-
> >> >  3 files changed, 23 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> >> > index 4b62fd012877..41c5b11a7c8f 100644
> >> > --- a/drivers/gpu/drm/i915/i915_drv.c
> >> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> >> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
> >> >  
> >> >  static void i915_gem_fini(struct drm_i915_private *dev_priv)
> >> >  {
> >> > -     flush_workqueue(dev_priv->wq);
> >> > +     /* Flush any outstanding unpin_work. */
> >> > +     i915_gem_drain_workqueue(dev_priv);
> >> >  
> >> >       mutex_lock(&dev_priv->drm.struct_mutex);
> >> >       intel_uc_fini_hw(dev_priv);
> >> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
> >> >       cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> >> >       i915_reset_error_state(dev_priv);
> >> >  
> >> > -     /* Flush any outstanding unpin_work. */
> >> > -     drain_workqueue(dev_priv->wq);
> >> > -
> >> >       i915_gem_fini(dev_priv);
> >> >       intel_uc_fini_fw(dev_priv);
> >> >       intel_fbc_cleanup_cfb(dev_priv);
> >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >> > index 667fb5c44483..e9a4b96dc775 100644
> >> > --- a/drivers/gpu/drm/i915/i915_drv.h
> >> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> >> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
> >> >       } while (flush_work(&i915->mm.free_work));
> >> >  }
> >> >  
> >> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
> >> > +{
> >> > +     /*
> >> > +      * Similar to objects above (see i915_gem_drain_freed_objects), in
> >> > +      * general we have workers that are armed by RCU and then rearm
> >> > +      * themselves in their callbacks. To be paranoid, we need to
> >> > +      * drain the workqueue a second time after waiting for the RCU
> >> > +      * grace period so that we catch work queued via RCU from the first
> >> > +      * pass. As neither drain_workqueue() nor flush_workqueue() report
> >> > +      * a result, we assume that we don't require more
> >> > +      * than 2 passes to catch all recursive RCU delayed work.
> >> > +      *
> >> > +      */
> >> > +     int pass = 2;
> >> > +     do {
> >> > +             rcu_barrier();
> >> > +             drain_workqueue(i915->wq);
> >> 
> >> I am fine with the paranoia, and it covers the case below. Still if we do:
> >> 
> >> drain_workqueue();
> >> rcu_barrier();
> >> 
> >> With draining in progress, only chained queuing is allowed. I understand
> >> this so that when it returns, all the ctx pointers are now unreferenced
> >> but not freed.
> >> 
> >> Thus the rcu_barrier() after it cleans the trash and we are good to
> >> be unloaded. With one pass.
> >> 
> >> I guess it comes to how to understand the comment, so could you
> >> elaborate the 'we have workers that are armed by RCU and then rearm
> >> themselves'? As per the drain_workqueue() description, this should be covered.
> >
> > I'm considering that they may be rearmed via RCU in the general case,
> > e.g. context close frees an object and so goes onto an RCU list that
> > once processed kicks off a new worker and so requires another round of
> > drain_workqueue. We are in module unload so a few extra delays to belts
> > and braces are ok until somebody notices it takes a few minutes to run a
> > reload test ;)
> 
> Ok. Patch is
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

Thanks, I'm optimistic this will silence the bug, so marking it as
resolved. Pushed,
-Chris


end of thread, other threads:[~2017-07-19 12:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson
2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-06-29  9:07 ` [PATCH] " Mika Kuoppala
2017-06-29  9:49   ` Chris Wilson
2017-07-18 13:41 ` [PATCH v2] " Chris Wilson
2017-07-19 11:18   ` Mika Kuoppala
2017-07-19 11:30     ` Chris Wilson
2017-07-19 11:51       ` Mika Kuoppala
2017-07-19 12:23         ` Chris Wilson
2017-07-18 13:58 ` ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2) Patchwork
