* [PATCH] drm/i915: Drain the device workqueue on unload @ 2017-06-28 15:39 Chris Wilson 2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Chris Wilson @ 2017-06-28 15:39 UTC (permalink / raw) To: intel-gfx; +Cc: Matthew Auld Workers on the i915->wq may rearm themselves so for completeness we need to replace our flush_workqueue() with a call to drain_workqueue() before unloading the device. References: https://bugs.freedesktop.org/show_bug.cgi?id=101627 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 2 +- drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 9167a73f3c69..3f998d7102f7 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { static void i915_gem_fini(struct drm_i915_private *dev_priv) { - flush_workqueue(dev_priv->wq); + drain_workqueue(dev_priv->wq); mutex_lock(&dev_priv->drm.struct_mutex); intel_uc_fini_hw(dev_priv); diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c index 47613d20bba8..4beed89b51e6 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev) cancel_delayed_work_sync(&i915->gt.retire_work); cancel_delayed_work_sync(&i915->gt.idle_work); - flush_workqueue(i915->wq); + drain_workqueue(i915->wq); mutex_lock(&i915->drm.struct_mutex); for_each_engine(engine, i915, id) -- 2.13.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ 
permalink raw reply related [flat|nested] 10+ messages in thread
* ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload 2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson @ 2017-06-28 16:03 ` Patchwork 2017-06-29 9:07 ` [PATCH] " Mika Kuoppala ` (2 subsequent siblings) 3 siblings, 0 replies; 10+ messages in thread From: Patchwork @ 2017-06-28 16:03 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx == Series Details == Series: drm/i915: Drain the device workqueue on unload URL : https://patchwork.freedesktop.org/series/26494/ State : success == Summary == Series 26494v1 drm/i915: Drain the device workqueue on unload https://patchwork.freedesktop.org/api/1.0/series/26494/revisions/1/mbox/ Test gem_exec_flush: Subgroup basic-batch-kernel-default-uc: fail -> PASS (fi-snb-2600) fdo#100007 Test kms_pipe_crc_basic: Subgroup suspend-read-crc-pipe-a: dmesg-warn -> PASS (fi-byt-j1900) fdo#101517 fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007 fdo#101517 https://bugs.freedesktop.org/show_bug.cgi?id=101517 fi-bdw-5557u total:279 pass:268 dwarn:0 dfail:0 fail:0 skip:11 time:443s fi-bdw-gvtdvm total:279 pass:257 dwarn:8 dfail:0 fail:0 skip:14 time:424s fi-blb-e6850 total:279 pass:224 dwarn:1 dfail:0 fail:0 skip:54 time:354s fi-bsw-n3050 total:279 pass:242 dwarn:1 dfail:0 fail:0 skip:36 time:539s fi-bxt-j4205 total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:518s fi-byt-j1900 total:279 pass:254 dwarn:1 dfail:0 fail:0 skip:24 time:489s fi-byt-n2820 total:279 pass:249 dwarn:2 dfail:0 fail:0 skip:28 time:483s fi-glk-2a total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:602s fi-hsw-4770 total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:435s fi-hsw-4770r total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:410s fi-ilk-650 total:279 pass:229 dwarn:0 dfail:0 fail:0 skip:50 time:414s fi-ivb-3520m total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:499s fi-ivb-3770 total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:471s fi-kbl-7500u total:279 pass:261 
dwarn:0 dfail:0 fail:0 skip:18 time:464s fi-kbl-7560u total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:570s fi-kbl-r total:279 pass:260 dwarn:1 dfail:0 fail:0 skip:18 time:581s fi-pnv-d510 total:279 pass:223 dwarn:1 dfail:0 fail:0 skip:55 time:557s fi-skl-6260u total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:461s fi-skl-6700hq total:279 pass:223 dwarn:1 dfail:0 fail:30 skip:24 time:340s fi-skl-6700k total:279 pass:257 dwarn:4 dfail:0 fail:0 skip:18 time:462s fi-skl-6770hq total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:479s fi-skl-gvtdvm total:279 pass:266 dwarn:0 dfail:0 fail:0 skip:13 time:436s fi-snb-2520m total:279 pass:251 dwarn:0 dfail:0 fail:0 skip:28 time:536s fi-snb-2600 total:279 pass:250 dwarn:0 dfail:0 fail:0 skip:29 time:414s 85a692e2c6a7cf93082044d776e838cb9e9b2146 drm-tip: 2017y-06m-28d-14h-24m-59s UTC integration manifest 99d97d4 drm/i915: Drain the device workqueue on unload == Logs == For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5061/ _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] drm/i915: Drain the device workqueue on unload 2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson 2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork @ 2017-06-29 9:07 ` Mika Kuoppala 2017-06-29 9:49 ` Chris Wilson 2017-07-18 13:41 ` [PATCH v2] " Chris Wilson 2017-07-18 13:58 ` ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2) Patchwork 3 siblings, 1 reply; 10+ messages in thread From: Mika Kuoppala @ 2017-06-29 9:07 UTC (permalink / raw) To: Chris Wilson, intel-gfx; +Cc: Matthew Auld Chris Wilson <chris@chris-wilson.co.uk> writes: > Workers on the i915->wq may rearm themselves so for completeness we need > to replace our flush_workqueue() with a call to drain_workqueue() before > unloading the device. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Matthew Auld <matthew.auld@intel.com> > --- > drivers/gpu/drm/i915/i915_drv.c | 2 +- > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 9167a73f3c69..3f998d7102f7 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { > > static void i915_gem_fini(struct drm_i915_private *dev_priv) > { > - flush_workqueue(dev_priv->wq); > + drain_workqueue(dev_priv->wq); There will be a superfluous drain_workqueue() in driver_unload. Also, the destroy will drain by itself, but here we want to drain before taking the mutex? 
-Mika > > mutex_lock(&dev_priv->drm.struct_mutex); > intel_uc_fini_hw(dev_priv); > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c > index 47613d20bba8..4beed89b51e6 100644 > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c > @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev) > > cancel_delayed_work_sync(&i915->gt.retire_work); > cancel_delayed_work_sync(&i915->gt.idle_work); > - flush_workqueue(i915->wq); > + drain_workqueue(i915->wq); > > mutex_lock(&i915->drm.struct_mutex); > for_each_engine(engine, i915, id) > -- > 2.13.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] drm/i915: Drain the device workqueue on unload 2017-06-29 9:07 ` [PATCH] " Mika Kuoppala @ 2017-06-29 9:49 ` Chris Wilson 0 siblings, 0 replies; 10+ messages in thread From: Chris Wilson @ 2017-06-29 9:49 UTC (permalink / raw) To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld Quoting Mika Kuoppala (2017-06-29 10:07:04) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > Workers on the i915->wq may rearm themselves so for completeness we need > > to replace our flush_workqueue() with a call to drain_workqueue() before > > unloading the device. > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627 > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Matthew Auld <matthew.auld@intel.com> > > --- > > drivers/gpu/drm/i915/i915_drv.c | 2 +- > > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +- > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > index 9167a73f3c69..3f998d7102f7 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -592,7 +592,7 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { > > > > static void i915_gem_fini(struct drm_i915_private *dev_priv) > > { > > - flush_workqueue(dev_priv->wq); > > + drain_workqueue(dev_priv->wq); > > There will be superfluous drain_workqueue in driver_unload. > > Also the destroy will drain byitself but in here we want > to drain before taking mutex? Yes. Some fini functions (e.g. i915_gem_contexts_fini) rely on there being no pending work left and can safely destroy the parent structures. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] drm/i915: Drain the device workqueue on unload 2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson 2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork 2017-06-29 9:07 ` [PATCH] " Mika Kuoppala @ 2017-07-18 13:41 ` Chris Wilson 2017-07-19 11:18 ` Mika Kuoppala 2017-07-18 13:58 ` ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2) Patchwork 3 siblings, 1 reply; 10+ messages in thread From: Chris Wilson @ 2017-07-18 13:41 UTC (permalink / raw) To: intel-gfx; +Cc: Matthew Auld Workers on the i915->wq may rearm themselves so for completeness we need to replace our flush_workqueue() with a call to drain_workqueue() before unloading the device. v2: Reinforce the drain_workqueue with a preceding rcu_barrier() as a few of the tasks that need to be drained may first be armed by RCU. References: https://bugs.freedesktop.org/show_bug.cgi?id=101627 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 6 ++---- drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++ drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +- 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4b62fd012877..41c5b11a7c8f 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { static void i915_gem_fini(struct drm_i915_private *dev_priv) { - flush_workqueue(dev_priv->wq); + /* Flush any outstanding unpin_work. 
*/ + i915_gem_drain_workqueue(dev_priv); mutex_lock(&dev_priv->drm.struct_mutex); intel_uc_fini_hw(dev_priv); @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev) cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); i915_reset_error_state(dev_priv); - /* Flush any outstanding unpin_work. */ - drain_workqueue(dev_priv->wq); - i915_gem_fini(dev_priv); intel_uc_fini_fw(dev_priv); intel_fbc_cleanup_cfb(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 667fb5c44483..e9a4b96dc775 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915) } while (flush_work(&i915->mm.free_work)); } +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915) +{ + /* + * Similar to objects above (see i915_gem_drain_freed_objects), in + * general we have workers that are armed by RCU and then rearm + * themselves in their callbacks. To be paranoid, we need to + * drain the workqueue a second time after waiting for the RCU + * grace period so that we catch work queued via RCU from the first + * pass. As neither drain_workqueue() nor flush_workqueue() report + * a result, we make an assumption that we don't require more + * than 2 passes to catch all recursive RCU delayed work. 
+ * + */ + int pass = 2; + do { + rcu_barrier(); + drain_workqueue(i915->wq); + } while (--pass); +} + struct i915_vma * __must_check i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, const struct i915_ggtt_view *view, diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c index 47613d20bba8..7a468cb30946 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev) cancel_delayed_work_sync(&i915->gt.retire_work); cancel_delayed_work_sync(&i915->gt.idle_work); - flush_workqueue(i915->wq); + i915_gem_drain_workqueue(i915); mutex_lock(&i915->drm.struct_mutex); for_each_engine(engine, i915, id) -- 2.13.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload 2017-07-18 13:41 ` [PATCH v2] " Chris Wilson @ 2017-07-19 11:18 ` Mika Kuoppala 2017-07-19 11:30 ` Chris Wilson 0 siblings, 1 reply; 10+ messages in thread From: Mika Kuoppala @ 2017-07-19 11:18 UTC (permalink / raw) To: Chris Wilson, intel-gfx; +Cc: Matthew Auld Chris Wilson <chris@chris-wilson.co.uk> writes: > Workers on the i915->wq may rearm themselves so for completeness we need > to replace our flush_workqueue() with a call to drain_workqueue() before > unloading the device. > > v2: Reinforce the drain_workqueue with an preceeding rcu_barrier() as a > few of the tasks that need to be drained may first be armed by RCU. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Matthew Auld <matthew.auld@intel.com> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_drv.c | 6 ++---- > drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++ > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +- > 3 files changed, 23 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 4b62fd012877..41c5b11a7c8f 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { > > static void i915_gem_fini(struct drm_i915_private *dev_priv) > { > - flush_workqueue(dev_priv->wq); > + /* Flush any outstanding unpin_work. */ > + i915_gem_drain_workqueue(dev_priv); > > mutex_lock(&dev_priv->drm.struct_mutex); > intel_uc_fini_hw(dev_priv); > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev) > cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); > i915_reset_error_state(dev_priv); > > - /* Flush any outstanding unpin_work. 
*/ > - drain_workqueue(dev_priv->wq); > - > i915_gem_fini(dev_priv); > intel_uc_fini_fw(dev_priv); > intel_fbc_cleanup_cfb(dev_priv); > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 667fb5c44483..e9a4b96dc775 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915) > } while (flush_work(&i915->mm.free_work)); > } > > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915) > +{ > + /* > + * Similar to objects above (see i915_gem_drain_freed_objects), in > + * general we have workers that are armed by RCU and then rearm > + * themselves in their callbacks. To be paranoid, we need to > + * drain the workqueue a second time after waiting for the RCU > + * grace period so that we catch work queued via RCU from the first > + * pass. As neither drain_workqueue() nor flush_workqueue() report > + * a result, we make an assumption that we don't require more > + * than 2 passes to catch all recursive RCU delayed work. > + * > + */ > + int pass = 2; > + do { > + rcu_barrier(); > + drain_workqueue(i915->wq); I am fine with the paranoia, and it covers the case below. Still, if we do: drain_workqueue(); rcu_barrier(); With draining in progress, only chained queueing is allowed. I understand this to mean that when it returns, all the ctx pointers are unreferenced but not yet freed. Thus the rcu_barrier() after it cleans up the trash and we are good to be unloaded. With one pass. I guess it comes down to how to understand the comment, so could you elaborate on 'we have workers that are armed by RCU and then rearm themselves'? As per the drain_workqueue() description, this should be covered. 
Thanks, -Mika > + } while (--pass); > +} > + > struct i915_vma * __must_check > i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, > const struct i915_ggtt_view *view, > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c > index 47613d20bba8..7a468cb30946 100644 > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c > @@ -57,7 +57,7 @@ static void mock_device_release(struct drm_device *dev) > > cancel_delayed_work_sync(&i915->gt.retire_work); > cancel_delayed_work_sync(&i915->gt.idle_work); > - flush_workqueue(i915->wq); > + i915_gem_drain_workqueue(i915); > > mutex_lock(&i915->drm.struct_mutex); > for_each_engine(engine, i915, id) > -- > 2.13.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload 2017-07-19 11:18 ` Mika Kuoppala @ 2017-07-19 11:30 ` Chris Wilson 2017-07-19 11:51 ` Mika Kuoppala 0 siblings, 1 reply; 10+ messages in thread From: Chris Wilson @ 2017-07-19 11:30 UTC (permalink / raw) To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld Quoting Mika Kuoppala (2017-07-19 12:18:47) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > Workers on the i915->wq may rearm themselves so for completeness we need > > to replace our flush_workqueue() with a call to drain_workqueue() before > > unloading the device. > > > > v2: Reinforce the drain_workqueue with an preceeding rcu_barrier() as a > > few of the tasks that need to be drained may first be armed by RCU. > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627 > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Matthew Auld <matthew.auld@intel.com> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > --- > > drivers/gpu/drm/i915/i915_drv.c | 6 ++---- > > drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++ > > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +- > > 3 files changed, 23 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > index 4b62fd012877..41c5b11a7c8f 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { > > > > static void i915_gem_fini(struct drm_i915_private *dev_priv) > > { > > - flush_workqueue(dev_priv->wq); > > + /* Flush any outstanding unpin_work. 
*/ > > + i915_gem_drain_workqueue(dev_priv); > > > > mutex_lock(&dev_priv->drm.struct_mutex); > > intel_uc_fini_hw(dev_priv); > > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev) > > cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); > > i915_reset_error_state(dev_priv); > > > > - /* Flush any outstanding unpin_work. */ > > - drain_workqueue(dev_priv->wq); > > - > > i915_gem_fini(dev_priv); > > intel_uc_fini_fw(dev_priv); > > intel_fbc_cleanup_cfb(dev_priv); > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > > index 667fb5c44483..e9a4b96dc775 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915) > > } while (flush_work(&i915->mm.free_work)); > > } > > > > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915) > > +{ > > + /* > > + * Similar to objects above (see i915_gem_drain_freed-objects), in > > + * general we have workers that are armed by RCU and then rearm > > + * themselves in their callbacks. To be paranoid, we need to > > + * drain the workqueue a second time after waiting for the RCU > > + * grace period so that we catch work queued via RCU from the first > > + * pass. As neither drain_workqueue() nor flush_workqueue() report > > + * a result, we make an assumption that we only don't require more > > + * than 2 passes to catch all recursive RCU delayed work. > > + * > > + */ > > + int pass = 2; > > + do { > > + rcu_barrier(); > > + drain_workqueue(i915->wq); > > I am fine with the paranoia, and it covers the case below. Still if we do: > > drain_workqueue(); > rcu_barrier(); > > With drawining in progress, only chain queuing is allowed. I understand > this so that when it returns, all the ctx pointers are now unreferenced > but not freed. > > Thus the rcu_barrier() after it cleans the trash and we are good to > be unloaded. 
With one pass. > > I guess it comes to how to understand the comment, so could you > elaborate the 'we have workers that are armed by RCU and then rearm > themselves'?. As from drain_workqueue desc, this should be covered. I'm considering that they may be rearmed via RCU in the general case, e.g. context close frees an object and so goes onto an RCU list that once processed kicks off a new worker and so requires another round of drain_workqueue. We are in module unload so a few extra delays to belts and braces are ok until somebody notices it takes a few minutes to run a reload test ;) -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload 2017-07-19 11:30 ` Chris Wilson @ 2017-07-19 11:51 ` Mika Kuoppala 2017-07-19 12:23 ` Chris Wilson 0 siblings, 1 reply; 10+ messages in thread From: Mika Kuoppala @ 2017-07-19 11:51 UTC (permalink / raw) To: Chris Wilson, intel-gfx; +Cc: Matthew Auld Chris Wilson <chris@chris-wilson.co.uk> writes: > Quoting Mika Kuoppala (2017-07-19 12:18:47) >> Chris Wilson <chris@chris-wilson.co.uk> writes: >> >> > Workers on the i915->wq may rearm themselves so for completeness we need >> > to replace our flush_workqueue() with a call to drain_workqueue() before >> > unloading the device. >> > >> > v2: Reinforce the drain_workqueue with an preceeding rcu_barrier() as a >> > few of the tasks that need to be drained may first be armed by RCU. >> > >> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627 >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> > Cc: Matthew Auld <matthew.auld@intel.com> >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> >> > --- >> > drivers/gpu/drm/i915/i915_drv.c | 6 ++---- >> > drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++ >> > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +- >> > 3 files changed, 23 insertions(+), 5 deletions(-) >> > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c >> > index 4b62fd012877..41c5b11a7c8f 100644 >> > --- a/drivers/gpu/drm/i915/i915_drv.c >> > +++ b/drivers/gpu/drm/i915/i915_drv.c >> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { >> > >> > static void i915_gem_fini(struct drm_i915_private *dev_priv) >> > { >> > - flush_workqueue(dev_priv->wq); >> > + /* Flush any outstanding unpin_work. 
*/ >> > + i915_gem_drain_workqueue(dev_priv); >> > >> > mutex_lock(&dev_priv->drm.struct_mutex); >> > intel_uc_fini_hw(dev_priv); >> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev) >> > cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); >> > i915_reset_error_state(dev_priv); >> > >> > - /* Flush any outstanding unpin_work. */ >> > - drain_workqueue(dev_priv->wq); >> > - >> > i915_gem_fini(dev_priv); >> > intel_uc_fini_fw(dev_priv); >> > intel_fbc_cleanup_cfb(dev_priv); >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h >> > index 667fb5c44483..e9a4b96dc775 100644 >> > --- a/drivers/gpu/drm/i915/i915_drv.h >> > +++ b/drivers/gpu/drm/i915/i915_drv.h >> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915) >> > } while (flush_work(&i915->mm.free_work)); >> > } >> > >> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915) >> > +{ >> > + /* >> > + * Similar to objects above (see i915_gem_drain_freed-objects), in >> > + * general we have workers that are armed by RCU and then rearm >> > + * themselves in their callbacks. To be paranoid, we need to >> > + * drain the workqueue a second time after waiting for the RCU >> > + * grace period so that we catch work queued via RCU from the first >> > + * pass. As neither drain_workqueue() nor flush_workqueue() report >> > + * a result, we make an assumption that we only don't require more >> > + * than 2 passes to catch all recursive RCU delayed work. >> > + * >> > + */ >> > + int pass = 2; >> > + do { >> > + rcu_barrier(); >> > + drain_workqueue(i915->wq); >> >> I am fine with the paranoia, and it covers the case below. Still if we do: >> >> drain_workqueue(); >> rcu_barrier(); >> >> With drawining in progress, only chain queuing is allowed. I understand >> this so that when it returns, all the ctx pointers are now unreferenced >> but not freed. 
>> >> Thus the rcu_barrier() after it cleans the trash and we are good to >> be unloaded. With one pass. >> >> I guess it comes to how to understand the comment, so could you >> elaborate the 'we have workers that are armed by RCU and then rearm >> themselves'?. As from drain_workqueue desc, this should be covered. > > I'm considering that they may be rearmed via RCU in the general case, > e.g. context close frees an object and so goes onto an RCU list that > once processed kicks off a new worker and so requires another round of > drain_workqueue. We are in module unload so a few extra delays to belts > and braces are ok until somebody notices it takes a few minutes to run a > reload test ;) Ok. Patch is Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> > -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] drm/i915: Drain the device workqueue on unload 2017-07-19 11:51 ` Mika Kuoppala @ 2017-07-19 12:23 ` Chris Wilson 0 siblings, 0 replies; 10+ messages in thread From: Chris Wilson @ 2017-07-19 12:23 UTC (permalink / raw) To: Mika Kuoppala, intel-gfx; +Cc: Matthew Auld Quoting Mika Kuoppala (2017-07-19 12:51:04) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > Quoting Mika Kuoppala (2017-07-19 12:18:47) > >> Chris Wilson <chris@chris-wilson.co.uk> writes: > >> > >> > Workers on the i915->wq may rearm themselves so for completeness we need > >> > to replace our flush_workqueue() with a call to drain_workqueue() before > >> > unloading the device. > >> > > >> > v2: Reinforce the drain_workqueue with an preceeding rcu_barrier() as a > >> > few of the tasks that need to be drained may first be armed by RCU. > >> > > >> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627 > >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >> > Cc: Matthew Auld <matthew.auld@intel.com> > >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > >> > --- > >> > drivers/gpu/drm/i915/i915_drv.c | 6 ++---- > >> > drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++++ > >> > drivers/gpu/drm/i915/selftests/mock_gem_device.c | 2 +- > >> > 3 files changed, 23 insertions(+), 5 deletions(-) > >> > > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > >> > index 4b62fd012877..41c5b11a7c8f 100644 > >> > --- a/drivers/gpu/drm/i915/i915_drv.c > >> > +++ b/drivers/gpu/drm/i915/i915_drv.c > >> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { > >> > > >> > static void i915_gem_fini(struct drm_i915_private *dev_priv) > >> > { > >> > - flush_workqueue(dev_priv->wq); > >> > + /* Flush any outstanding unpin_work. 
*/ > >> > + i915_gem_drain_workqueue(dev_priv); > >> > > >> > mutex_lock(&dev_priv->drm.struct_mutex); > >> > intel_uc_fini_hw(dev_priv); > >> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev) > >> > cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); > >> > i915_reset_error_state(dev_priv); > >> > > >> > - /* Flush any outstanding unpin_work. */ > >> > - drain_workqueue(dev_priv->wq); > >> > - > >> > i915_gem_fini(dev_priv); > >> > intel_uc_fini_fw(dev_priv); > >> > intel_fbc_cleanup_cfb(dev_priv); > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > >> > index 667fb5c44483..e9a4b96dc775 100644 > >> > --- a/drivers/gpu/drm/i915/i915_drv.h > >> > +++ b/drivers/gpu/drm/i915/i915_drv.h > >> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915) > >> > } while (flush_work(&i915->mm.free_work)); > >> > } > >> > > >> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915) > >> > +{ > >> > + /* > >> > + * Similar to objects above (see i915_gem_drain_freed-objects), in > >> > + * general we have workers that are armed by RCU and then rearm > >> > + * themselves in their callbacks. To be paranoid, we need to > >> > + * drain the workqueue a second time after waiting for the RCU > >> > + * grace period so that we catch work queued via RCU from the first > >> > + * pass. As neither drain_workqueue() nor flush_workqueue() report > >> > + * a result, we make an assumption that we only don't require more > >> > + * than 2 passes to catch all recursive RCU delayed work. > >> > + * > >> > + */ > >> > + int pass = 2; > >> > + do { > >> > + rcu_barrier(); > >> > + drain_workqueue(i915->wq); > >> > >> I am fine with the paranoia, and it covers the case below. Still if we do: > >> > >> drain_workqueue(); > >> rcu_barrier(); > >> > >> With drawining in progress, only chain queuing is allowed. 
I understand > >> this so that when it returns, all the ctx pointers are now unreferenced > >> but not freed. > >> > >> Thus the rcu_barrier() after it cleans the trash and we are good to > >> be unloaded. With one pass. > >> > >> I guess it comes to how to understand the comment, so could you > >> elaborate the 'we have workers that are armed by RCU and then rearm > >> themselves'?. As from drain_workqueue desc, this should be covered. > > > > I'm considering that they may be rearmed via RCU in the general case, > > e.g. context close frees an object and so goes onto an RCU list that > > once processed kicks off a new worker and so requires another round of > > drain_workqueue. We are in module unload so a few extra delays to belts > > and braces are ok until somebody notices it takes a few minutes to run a > > reload test ;) > > Ok. Patch is > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Thanks, I'm optimistic this will silence the bug, so marking it as resolved. Pushed, -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2)
  2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson
                   ` (2 preceding siblings ...)
  2017-07-18 13:41 ` [PATCH v2] " Chris Wilson
  @ 2017-07-18 13:58 ` Patchwork
  3 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2017-07-18 13:58 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Drain the device workqueue on unload (rev2)
URL   : https://patchwork.freedesktop.org/series/26494/
State : success

== Summary ==

Series 26494v2 drm/i915: Drain the device workqueue on unload
https://patchwork.freedesktop.org/api/1.0/series/26494/revisions/2/mbox/

Test gem_exec_suspend:
        Subgroup basic-s4-devices:
                pass       -> DMESG-WARN (fi-kbl-7560u) k.org#196399
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                pass       -> FAIL       (fi-snb-2600) fdo#100215
Test kms_flip:
        Subgroup basic-flip-vs-modeset:
                skip       -> PASS       (fi-skl-x1585l) fdo#101781
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-a:
                pass       -> DMESG-WARN (fi-pnv-d510) fdo#101597
        Subgroup suspend-read-crc-pipe-b:
                dmesg-warn -> PASS       (fi-byt-j1900) fdo#101705

k.org#196399 https://bugzilla.kernel.org/show_bug.cgi?id=196399
fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#101781 https://bugs.freedesktop.org/show_bug.cgi?id=101781
fdo#101597 https://bugs.freedesktop.org/show_bug.cgi?id=101597
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:440s
fi-bdw-gvtdvm    total:279  pass:265  dwarn:0   dfail:0   fail:0   skip:14  time:426s
fi-blb-e6850     total:279  pass:224  dwarn:1   dfail:0   fail:0   skip:54  time:353s
fi-bsw-n3050     total:279  pass:243  dwarn:0   dfail:0   fail:0   skip:36  time:533s
fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:505s
fi-byt-j1900     total:279  pass:255  dwarn:0   dfail:0   fail:0   skip:24  time:493s
fi-byt-n2820     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:488s
fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:610s
fi-hsw-4770      total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:440s
fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:414s
fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:414s
fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:502s
fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:470s
fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:468s
fi-kbl-7560u     total:279  pass:268  dwarn:1   dfail:0   fail:0   skip:10  time:569s
fi-kbl-r         total:279  pass:260  dwarn:1   dfail:0   fail:0   skip:18  time:579s
fi-pnv-d510      total:279  pass:222  dwarn:2   dfail:0   fail:0   skip:55  time:569s
fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:452s
fi-skl-6700hq    total:279  pass:262  dwarn:0   dfail:0   fail:0   skip:17  time:585s
fi-skl-6700k     total:279  pass:257  dwarn:4   dfail:0   fail:0   skip:18  time:467s
fi-skl-6770hq    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:472s
fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:441s
fi-skl-x1585l    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:482s
fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:539s
fi-snb-2600      total:279  pass:249  dwarn:0   dfail:0   fail:1   skip:29  time:411s

10de1e17faaab452782e5a1baffd1b30a639a261 drm-tip: 2017y-07m-18d-10h-08m-42s UTC integration manifest
b859b5d drm/i915: Drain the device workqueue on unload

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_5219/
end of thread, other threads: [~2017-07-19 12:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-28 15:39 [PATCH] drm/i915: Drain the device workqueue on unload Chris Wilson
2017-06-28 16:03 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-06-29  9:07 ` [PATCH] " Mika Kuoppala
2017-06-29  9:49   ` Chris Wilson
2017-07-18 13:41 ` [PATCH v2] " Chris Wilson
2017-07-19 11:18   ` Mika Kuoppala
2017-07-19 11:30     ` Chris Wilson
2017-07-19 11:51       ` Mika Kuoppala
2017-07-19 12:23         ` Chris Wilson
2017-07-18 13:58 ` ✓ Fi.CI.BAT: success for drm/i915: Drain the device workqueue on unload (rev2) Patchwork