* [PATCH] drm/i915: Drain the device workqueue on unload
@ 2017-06-28 15:39 Chris Wilson
From: Chris Wilson @ 2017-06-28 15:39 UTC (permalink / raw)
To: intel-gfx; +Cc: Matthew Auld
Workers on the i915->wq may rearm themselves, so for completeness we need
to replace our flush_workqueue() with a call to drain_workqueue() before
unloading the device.
References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/drm/i915/i915_drv.c | 2 +-
drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 9167a73f3c69..3f998d7102f7 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
static void i915_gem_fini(struct drm_i915_private *dev_priv)
{
- flush_workqueue(dev_priv->wq);
+ drain_workqueue(dev_priv->wq);
mutex_lock(&dev_priv->drm.struct_mutex);
intel_uc_fini_hw(dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 47613d20bba8..4beed89b51e6 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
cancel_delayed_work_sync(&i915->gt.retire_work);
cancel_delayed_work_sync(&i915->gt.idle_work);
- flush_workqueue(i915->wq);
+ drain_workqueue(i915->wq);
mutex_lock(&i915->drm.struct_mutex);
for_each_engine(engine, i915, id)
--
2.13.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload
From: Patchwork @ 2017-06-28 16:03 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Drain the device workqueue on unload
URL : https://patchwork.freedesktop.org/series/26494/
State : success
== Summary ==
Series 26494v1 drm/i915: Drain the device workqueue on unload
https://patchwork.freedesktop.org/api/1.0/series/26494/revisions/1/mbox/
Test gem_exec_flush:
Subgroup basic-batch-kernel-default-uc:
fail -> PASS (fi-snb-2600) fdo#100007
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-a:
dmesg-warn -> PASS (fi-byt-j1900) fdo#101517
fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#101517 https://bugs.freedesktop.org/show_bug.cgi?id=101517
fi-bdw-5557u total:279 pass:268 dwarn:0 dfail:0 fail:0 skip:11 time:443s
fi-bdw-gvtdvm total:279 pass:257 dwarn:8 dfail:0 fail:0 skip:14 time:424s
fi-blb-e6850 total:279 pass:224 dwarn:1 dfail:0 fail:0 skip:54 time:354s
fi-bsw-n3050 total:279 pass:242 dwarn:1 dfail:0 fail:0 skip:36 time:539s
fi-bxt-j4205 total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:518s
fi-byt-j1900 total:279 pass:254 dwarn:1 dfail:0 fail:0 skip:24 time:489s
fi-byt-n2820 total:279 pass:249 dwarn:2 dfail:0 fail:0 skip:28 time:483s
fi-glk-2a total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:602s
fi-hsw-4770 total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:435s
fi-hsw-4770r total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:410s
fi-ilk-650 total:279 pass:229 dwarn:0 dfail:0 fail:0 skip:50 time:414s
fi-ivb-3520m total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:499s
fi-ivb-3770 total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:471s
fi-kbl-7500u total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:464s
fi-kbl-7560u total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:570s
fi-kbl-r total:279 pass:260 dwarn:1 dfail:0 fail:0 skip:18 time:581s
fi-pnv-d510 total:279 pass:223 dwarn:1 dfail:0 fail:0 skip:55 time:557s
fi-skl-6260u total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:461s
fi-skl-6700hq total:279 pass:223 dwarn:1 dfail:0 fail:30 skip:24 time:340s
fi-skl-6700k total:279 pass:257 dwarn:4 dfail:0 fail:0 skip:18 time:462s
fi-skl-6770hq total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:479s
fi-skl-gvtdvm total:279 pass:266 dwarn:0 dfail:0 fail:0 skip:13 time:436s
fi-snb-2520m total:279 pass:251 dwarn:0 dfail:0 fail:0 skip:28 time:536s
fi-snb-2600 total:279 pass:250 dwarn:0 dfail:0 fail:0 skip:29 time:414s
85a692e2c6a7cf93082044d776e838cb9e9b2146 drm-tip: 2017y-06m-28d-14h-24m-59s UTC integration manifest
99d97d4 drm/i915: Drain the device workqueue on unload
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5061/
* Re: [PATCH] drm/i915: Drain the device workqueue on unload
From: Mika Kuoppala @ 2017-06-29 9:07 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: Matthew Auld
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Workers on the i915->wq may rearm themselves so for completeness we need
> to replace our flush_workqueue() with a call to drain_workqueue() before
> unloading the device.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 2 +-
> drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 9167a73f3c69..3f998d7102f7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
>
> static void i915_gem_fini(struct drm_i915_private *dev_priv)
> {
> - flush_workqueue(dev_priv->wq);
> + drain_workqueue(dev_priv->wq);
There will be a superfluous drain_workqueue() in driver_unload.
Also, the destroy will drain by itself, but here we want
to drain before taking the mutex?
-Mika
>
> mutex_lock(&dev_priv->drm.struct_mutex);
> intel_uc_fini_hw(dev_priv);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 47613d20bba8..4beed89b51e6 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
>
> cancel_delayed_work_sync(&i915->gt.retire_work);
> cancel_delayed_work_sync(&i915->gt.idle_work);
> - flush_workqueue(i915->wq);
> + drain_workqueue(i915->wq);
>
> mutex_lock(&i915->drm.struct_mutex);
> for_each_engine(engine, i915, id)
> --
> 2.13.1
>
* Re: [PATCH] drm/i915: Drain the device workqueue on unload
From: Chris Wilson @ 2017-06-29 9:49 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld
Quoting Mika Kuoppala (2017-06-29 10:07:04)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Workers on the i915->wq may rearm themselves so for completeness we need
> > to replace our flush_workqueue() with a call to drain_workqueue() before
> > unloading the device.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_drv.c | 2 +-
> > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 9167a73f3c69..3f998d7102f7 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
> >
> > static void i915_gem_fini(struct drm_i915_private *dev_priv)
> > {
> > - flush_workqueue(dev_priv->wq);
> > + drain_workqueue(dev_priv->wq);
>
> There will be superfluous drain_workqueue in driver_unload.
>
> Also the destroy will drain byitself but in here we want
> to drain before taking mutex?
Yes. Some fini functions (e.g. i915_gem_contexts_fini) rely on there
being no pending work left, so they can safely destroy the parent structures.
-Chris
* [PATCH v2] drm/i915: Drain the device workqueue on unload
From: Chris Wilson @ 2017-07-18 13:41 UTC (permalink / raw)
To: intel-gfx; +Cc: Matthew Auld
Workers on the i915->wq may rearm themselves, so for completeness we need
to replace our flush_workqueue() with a call to drain_workqueue() before
unloading the device.
v2: Reinforce the drain_workqueue() with a preceding rcu_barrier(), as a
few of the tasks that need to be drained may first be armed by RCU.
References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
drivers/gpu/drm/i915/i915_drv.c | 6 ++----
drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++
drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4b62fd012877..41c5b11a7c8f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
static void i915_gem_fini(struct drm_i915_private *dev_priv)
{
- flush_workqueue(dev_priv->wq);
+ /* Flush any outstanding unpin_work. */
+ i915_gem_drain_workqueue(dev_priv);
mutex_lock(&dev_priv->drm.struct_mutex);
intel_uc_fini_hw(dev_priv);
@@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
i915_reset_error_state(dev_priv);
- /* Flush any outstanding unpin_work. */
- drain_workqueue(dev_priv->wq);
-
i915_gem_fini(dev_priv);
intel_uc_fini_fw(dev_priv);
intel_fbc_cleanup_cfb(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 667fb5c44483..e9a4b96dc775 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
} while (flush_work(&i915->mm.free_work));
}
+static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
+{
+ /*
+ * Similar to the objects above (see i915_gem_drain_freed_objects()), in
+ * general we have workers that are armed by RCU and then rearm
+ * themselves in their callbacks. To be paranoid, we need to
+ * drain the workqueue a second time after waiting for the RCU
+ * grace period, so that we catch work queued via RCU from the first
+ * pass. As neither drain_workqueue() nor flush_workqueue() report
+ * a result, we assume that no more than two passes are needed to
+ * catch all recursive RCU-delayed work.
+ */
+ int pass = 2;
+ do {
+ rcu_barrier();
+ drain_workqueue(i915->wq);
+ } while (--pass);
+}
+
struct i915_vma * __must_check
i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
const struct i915_ggtt_view *view,
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 47613d20bba8..7a468cb30946 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
cancel_delayed_work_sync(&i915->gt.retire_work);
cancel_delayed_work_sync(&i915->gt.idle_work);
- flush_workqueue(i915->wq);
+ i915_gem_drain_workqueue(i915);
mutex_lock(&i915->drm.struct_mutex);
for_each_engine(engine, i915, id)
--
2.13.3
* ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2)
From: Patchwork @ 2017-07-18 13:58 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Drain the device workqueue on unload (rev2)
URL : https://patchwork.freedesktop.org/series/26494/
State : success
== Summary ==
Series 26494v2 drm/i915: Drain the device workqueue on unload
https://patchwork.freedesktop.org/api/1.0/series/26494/revisions/2/mbox/
Test gem_exec_suspend:
Subgroup basic-s4-devices:
pass -> DMESG-WARN (fi-kbl-7560u) k.org#196399
Test kms_cursor_legacy:
Subgroup basic-busy-flip-before-cursor-atomic:
pass -> FAIL (fi-snb-2600) fdo#100215
Test kms_flip:
Subgroup basic-flip-vs-modeset:
skip -> PASS (fi-skl-x1585l) fdo#101781
Test kms_pipe_crc_basic:
Subgroup hang-read-crc-pipe-a:
pass -> DMESG-WARN (fi-pnv-d510) fdo#101597
Subgroup suspend-read-crc-pipe-b:
dmesg-warn -> PASS (fi-byt-j1900) fdo#101705
k.org#196399 https://bugzilla.kernel.org/show_bug.cgi?id=196399
fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#101781 https://bugs.freedesktop.org/show_bug.cgi?id=101781
fdo#101597 https://bugs.freedesktop.org/show_bug.cgi?id=101597
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705
fi-bdw-5557u total:279 pass:268 dwarn:0 dfail:0 fail:0 skip:11 time:440s
fi-bdw-gvtdvm total:279 pass:265 dwarn:0 dfail:0 fail:0 skip:14 time:426s
fi-blb-e6850 total:279 pass:224 dwarn:1 dfail:0 fail:0 skip:54 time:353s
fi-bsw-n3050 total:279 pass:243 dwarn:0 dfail:0 fail:0 skip:36 time:533s
fi-bxt-j4205 total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:505s
fi-byt-j1900 total:279 pass:255 dwarn:0 dfail:0 fail:0 skip:24 time:493s
fi-byt-n2820 total:279 pass:251 dwarn:0 dfail:0 fail:0 skip:28 time:488s
fi-glk-2a total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:610s
fi-hsw-4770 total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:440s
fi-hsw-4770r total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:414s
fi-ilk-650 total:279 pass:229 dwarn:0 dfail:0 fail:0 skip:50 time:414s
fi-ivb-3520m total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:502s
fi-ivb-3770 total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:470s
fi-kbl-7500u total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:468s
fi-kbl-7560u total:279 pass:268 dwarn:1 dfail:0 fail:0 skip:10 time:569s
fi-kbl-r total:279 pass:260 dwarn:1 dfail:0 fail:0 skip:18 time:579s
fi-pnv-d510 total:279 pass:222 dwarn:2 dfail:0 fail:0 skip:55 time:569s
fi-skl-6260u total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:452s
fi-skl-6700hq total:279 pass:262 dwarn:0 dfail:0 fail:0 skip:17 time:585s
fi-skl-6700k total:279 pass:257 dwarn:4 dfail:0 fail:0 skip:18 time:467s
fi-skl-6770hq total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:472s
fi-skl-gvtdvm total:279 pass:266 dwarn:0 dfail:0 fail:0 skip:13 time:441s
fi-skl-x1585l total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:482s
fi-snb-2520m total:279 pass:251 dwarn:0 dfail:0 fail:0 skip:28 time:539s
fi-snb-2600 total:279 pass:249 dwarn:0 dfail:0 fail:1 skip:29 time:411s
10de1e17faaab452782e5a1baffd1b30a639a261 drm-tip: 2017y-07m-18d-10h-08m-42s UTC integration manifest
b859b5d drm/i915: Drain the device workqueue on unload
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5219/
* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload
From: Mika Kuoppala @ 2017-07-19 11:18 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: Matthew Auld
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Workers on the i915->wq may rearm themselves so for completeness we need
> to replace our flush_workqueue() with a call to drain_workqueue() before
> unloading the device.
>
> v2: Reinforce the drain_workqueue with an preceeding rcu_barrier() as a
> few of the tasks that need to be drained may first be armed by RCU.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 6 ++----
> drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++
> drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
> 3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4b62fd012877..41c5b11a7c8f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
>
> static void i915_gem_fini(struct drm_i915_private *dev_priv)
> {
> - flush_workqueue(dev_priv->wq);
> + /* Flush any outstanding unpin_work. */
> + i915_gem_drain_workqueue(dev_priv);
>
> mutex_lock(&dev_priv->drm.struct_mutex);
> intel_uc_fini_hw(dev_priv);
> @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
> cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> i915_reset_error_state(dev_priv);
>
> - /* Flush any outstanding unpin_work. */
> - drain_workqueue(dev_priv->wq);
> -
> i915_gem_fini(dev_priv);
> intel_uc_fini_fw(dev_priv);
> intel_fbc_cleanup_cfb(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 667fb5c44483..e9a4b96dc775 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
> } while (flush_work(&i915->mm.free_work));
> }
>
> +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
> +{
> + /*
> + * Similar to objects above (see i915_gem_drain_freed-objects), in
> + * general we have workers that are armed by RCU and then rearm
> + * themselves in their callbacks. To be paranoid, we need to
> + * drain the workqueue a second time after waiting for the RCU
> + * grace period so that we catch work queued via RCU from the first
> + * pass. As neither drain_workqueue() nor flush_workqueue() report
> + * a result, we make an assumption that we only don't require more
> + * than 2 passes to catch all recursive RCU delayed work.
> + *
> + */
> + int pass = 2;
> + do {
> + rcu_barrier();
> + drain_workqueue(i915->wq);
I am fine with the paranoia, and it covers the case below. Still, if we do:
drain_workqueue();
rcu_barrier();
then with draining in progress only chained queueing is allowed. I understand
this to mean that when drain_workqueue() returns, all the ctx pointers are
unreferenced but not yet freed.
Thus the rcu_barrier() after it cleans up the trash and we are good to
be unloaded, with one pass.
I guess it comes down to how to read the comment, so could you
elaborate on 'we have workers that are armed by RCU and then rearm
themselves'? Going by the drain_workqueue() description, this should be covered.
Thanks,
-Mika
> + } while (--pass);
> +}
> +
> struct i915_vma * __must_check
> i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
> const struct i915_ggtt_view *view,
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 47613d20bba8..7a468cb30946 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev)
>
> cancel_delayed_work_sync(&i915->gt.retire_work);
> cancel_delayed_work_sync(&i915->gt.idle_work);
> - flush_workqueue(i915->wq);
> + i915_gem_drain_workqueue(i915);
>
> mutex_lock(&i915->drm.struct_mutex);
> for_each_engine(engine, i915, id)
> --
> 2.13.3
* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload
From: Chris Wilson @ 2017-07-19 11:30 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld
Quoting Mika Kuoppala (2017-07-19 12:18:47)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Workers on the i915->wq may rearm themselves so for completeness we need
> > to replace our flush_workqueue() with a call to drain_workqueue() before
> > unloading the device.
> >
> > v2: Reinforce the drain_workqueue with an preceeding rcu_barrier() as a
> > few of the tasks that need to be drained may first be armed by RCU.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_drv.c | 6 ++----
> > drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++
> > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
> > 3 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 4b62fd012877..41c5b11a7c8f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
> >
> > static void i915_gem_fini(struct drm_i915_private *dev_priv)
> > {
> > - flush_workqueue(dev_priv->wq);
> > + /* Flush any outstanding unpin_work. */
> > + i915_gem_drain_workqueue(dev_priv);
> >
> > mutex_lock(&dev_priv->drm.struct_mutex);
> > intel_uc_fini_hw(dev_priv);
> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
> > cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> > i915_reset_error_state(dev_priv);
> >
> > - /* Flush any outstanding unpin_work. */
> > - drain_workqueue(dev_priv->wq);
> > -
> > i915_gem_fini(dev_priv);
> > intel_uc_fini_fw(dev_priv);
> > intel_fbc_cleanup_cfb(dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 667fb5c44483..e9a4b96dc775 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
> > } while (flush_work(&i915->mm.free_work));
> > }
> >
> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
> > +{
> > + /*
> > + * Similar to objects above (see i915_gem_drain_freed-objects), in
> > + * general we have workers that are armed by RCU and then rearm
> > + * themselves in their callbacks. To be paranoid, we need to
> > + * drain the workqueue a second time after waiting for the RCU
> > + * grace period so that we catch work queued via RCU from the first
> > + * pass. As neither drain_workqueue() nor flush_workqueue() report
> > + * a result, we make an assumption that we only don't require more
> > + * than 2 passes to catch all recursive RCU delayed work.
> > + *
> > + */
> > + int pass = 2;
> > + do {
> > + rcu_barrier();
> > + drain_workqueue(i915->wq);
>
> I am fine with the paranoia, and it covers the case below. Still if we do:
>
> drain_workqueue();
> rcu_barrier();
>
> With drawining in progress, only chain queuing is allowed. I understand
> this so that when it returns, all the ctx pointers are now unreferenced
> but not freed.
>
> Thus the rcu_barrier() after it cleans the trash and we are good to
> be unloaded. With one pass.
>
> I guess it comes to how to understand the comment, so could you
> elaborate the 'we have workers that are armed by RCU and then rearm
> themselves'?. As from drain_workqueue desc, this should be covered.
I'm considering that they may be rearmed via RCU in the general case,
e.g. a context close frees an object, which goes onto an RCU list that,
once processed, kicks off a new worker and so requires another round of
drain_workqueue(). We are in module unload, so a few extra belt-and-braces
delays are ok until somebody notices it takes a few minutes to run a
reload test ;)
-Chris
* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload
From: Mika Kuoppala @ 2017-07-19 11:51 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: Matthew Auld
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Quoting Mika Kuoppala (2017-07-19 12:18:47)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > Workers on the i915->wq may rearm themselves so for completeness we need
>> > to replace our flush_workqueue() with a call to drain_workqueue() before
>> > unloading the device.
>> >
>> > v2: Reinforce the drain_workqueue with an preceeding rcu_barrier() as a
>> > few of the tasks that need to be drained may first be armed by RCU.
>> >
>> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Matthew Auld <matthew.auld@intel.com>
>> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > ---
>> > drivers/gpu/drm/i915/i915_drv.c | 6 ++----
>> > drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++
>> > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
>> > 3 files changed, 23 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> > index 4b62fd012877..41c5b11a7c8f 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.c
>> > +++ b/drivers/gpu/drm/i915/i915_drv.c
>> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
>> >
>> > static void i915_gem_fini(struct drm_i915_private *dev_priv)
>> > {
>> > - flush_workqueue(dev_priv->wq);
>> > + /* Flush any outstanding unpin_work. */
>> > + i915_gem_drain_workqueue(dev_priv);
>> >
>> > mutex_lock(&dev_priv->drm.struct_mutex);
>> > intel_uc_fini_hw(dev_priv);
>> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
>> > cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
>> > i915_reset_error_state(dev_priv);
>> >
>> > - /* Flush any outstanding unpin_work. */
>> > - drain_workqueue(dev_priv->wq);
>> > -
>> > i915_gem_fini(dev_priv);
>> > intel_uc_fini_fw(dev_priv);
>> > intel_fbc_cleanup_cfb(dev_priv);
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> > index 667fb5c44483..e9a4b96dc775 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.h
>> > +++ b/drivers/gpu/drm/i915/i915_drv.h
>> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
>> > } while (flush_work(&i915->mm.free_work));
>> > }
>> >
>> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
>> > +{
>> > + /*
>> > + * Similar to objects above (see i915_gem_drain_freed-objects), in
>> > + * general we have workers that are armed by RCU and then rearm
>> > + * themselves in their callbacks. To be paranoid, we need to
>> > + * drain the workqueue a second time after waiting for the RCU
>> > + * grace period so that we catch work queued via RCU from the first
>> > + * pass. As neither drain_workqueue() nor flush_workqueue() report
>> > + * a result, we make an assumption that we only don't require more
>> > + * than 2 passes to catch all recursive RCU delayed work.
>> > + *
>> > + */
>> > + int pass = 2;
>> > + do {
>> > + rcu_barrier();
>> > + drain_workqueue(i915->wq);
>>
>> I am fine with the paranoia, and it covers the case below. Still if we do:
>>
>> drain_workqueue();
>> rcu_barrier();
>>
>> With drawining in progress, only chain queuing is allowed. I understand
>> this so that when it returns, all the ctx pointers are now unreferenced
>> but not freed.
>>
>> Thus the rcu_barrier() after it cleans the trash and we are good to
>> be unloaded. With one pass.
>>
>> I guess it comes to how to understand the comment, so could you
>> elaborate the 'we have workers that are armed by RCU and then rearm
>> themselves'?. As from drain_workqueue desc, this should be covered.
>
> I'm considering that they may be rearmed via RCU in the general case,
> e.g. context close frees an object and so goes onto an RCU list that
> once processed kicks off a new worker and so requires another round of
> drain_workqueue. We are in module unload so a few extra delays to belts
> and braces are ok until somebody notices it takes a few minutes to run a
> reload test ;)
Ok. Patch is
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> -Chris
* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload
From: Chris Wilson @ 2017-07-19 12:23 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld
Quoting Mika Kuoppala (2017-07-19 12:51:04)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Quoting Mika Kuoppala (2017-07-19 12:18:47)
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >>
> >> > Workers on the i915->wq may rearm themselves so for completeness we need
> >> > to replace our flush_workqueue() with a call to drain_workqueue() before
> >> > unloading the device.
> >> >
> >> > v2: Reinforce the drain_workqueue with a preceding rcu_barrier() as a
> >> > few of the tasks that need to be drained may first be armed by RCU.
> >> >
> >> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> > Cc: Matthew Auld <matthew.auld@intel.com>
> >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> > ---
> >> > drivers/gpu/drm/i915/i915_drv.c | 6 ++----
> >> > drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++
> >> > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +-
> >> > 3 files changed, 23 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> >> > index 4b62fd012877..41c5b11a7c8f 100644
> >> > --- a/drivers/gpu/drm/i915/i915_drv.c
> >> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> >> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
> >> >
> >> > static void i915_gem_fini(struct drm_i915_private *dev_priv)
> >> > {
> >> > - flush_workqueue(dev_priv->wq);
> >> > + /* Flush any outstanding unpin_work. */
> >> > + i915_gem_drain_workqueue(dev_priv);
> >> >
> >> > mutex_lock(&dev_priv->drm.struct_mutex);
> >> > intel_uc_fini_hw(dev_priv);
> >> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
> >> > cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> >> > i915_reset_error_state(dev_priv);
> >> >
> >> > - /* Flush any outstanding unpin_work. */
> >> > - drain_workqueue(dev_priv->wq);
> >> > -
> >> > i915_gem_fini(dev_priv);
> >> > intel_uc_fini_fw(dev_priv);
> >> > intel_fbc_cleanup_cfb(dev_priv);
> >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >> > index 667fb5c44483..e9a4b96dc775 100644
> >> > --- a/drivers/gpu/drm/i915/i915_drv.h
> >> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> >> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
> >> > } while (flush_work(&i915->mm.free_work));
> >> > }
> >> >
> >> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
> >> > +{
> >> > + /*
> >> > + * Similar to objects above (see i915_gem_drain_freed_objects), in
> >> > + * general we have workers that are armed by RCU and then rearm
> >> > + * themselves in their callbacks. To be paranoid, we need to
> >> > + * drain the workqueue a second time after waiting for the RCU
> >> > + * grace period so that we catch work queued via RCU from the first
> >> > + * pass. As neither drain_workqueue() nor flush_workqueue() report
> >> > + * a result, we make the assumption that we require no more
> >> > + * than 2 passes to catch all recursive RCU delayed work.
> >> > + *
> >> > + */
> >> > + int pass = 2;
> >> > + do {
> >> > + rcu_barrier();
> >> > + drain_workqueue(i915->wq);
> >>
> >> I am fine with the paranoia, and it covers the case below. Still if we do:
> >>
> >> drain_workqueue();
> >> rcu_barrier();
> >>
> >> With draining in progress, only chain queuing is allowed. My understanding
> >> is that when it returns, all the ctx pointers are unreferenced but not
> >> yet freed.
> >>
> >> Thus the rcu_barrier() after it cleans the trash and we are good to
> >> be unloaded. With one pass.
> >>
> >> I guess it comes down to how to understand the comment, so could you
> >> elaborate on 'we have workers that are armed by RCU and then rearm
> >> themselves'? Going by the drain_workqueue() description, this should be covered.
> >
> > I'm considering that they may be rearmed via RCU in the general case,
> > e.g. context close frees an object and so goes onto an RCU list that
> > once processed kicks off a new worker and so requires another round of
> > drain_workqueue. We are in module unload so a few extra belt-and-braces
> > delays are ok until somebody notices it takes a few minutes to run a
> > reload test ;)
>
> Ok. Patch is
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Thanks, I'm optimistic this will silence the bug, so marking it as
resolved. Pushed,
-Chris
end of thread, other threads:[~2017-07-19 12:23 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson
2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-06-29 9:07 ` [PATCH] " Mika Kuoppala
2017-06-29 9:49 ` Chris Wilson
2017-07-18 13:41 ` [PATCH v2] " Chris Wilson
2017-07-19 11:18 ` Mika Kuoppala
2017-07-19 11:30 ` Chris Wilson
2017-07-19 11:51 ` Mika Kuoppala
2017-07-19 12:23 ` Chris Wilson
2017-07-18 13:58 ` ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2) Patchwork