From: Andrea Arcangeli <aarcange@redhat.com>
To: Martin Kepplinger <martink@posteo.de>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	daniel.vetter@intel.com,
	Dave Airlie <airlied@gmail.com>,
	Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Subject: [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock
Date: Fri, 7 Apr 2017 01:23:43 +0200
Message-ID: <20170406232347.988-2-aarcange@redhat.com>
In-Reply-To: <20170406232347.988-1-aarcange@redhat.com>

synchronize_rcu(), synchronize_sched() and synchronize_rcu_expedited()
will hang until their own workqueues have run. The i915 gem workqueues
in turn wait for the struct_mutex to be released. So we cannot wait for
an RCU quiescent state with those primitives while holding the
struct_mutex, or we create a circular lock dependency that results in
kernel hangs (reproducible, but undetected by lockdep).

This started with commit 3d3d18f086cdda72ee18a454db70ca72c6e3246c;
apparently lockdep did not detect it.

kswapd0         D    0   700      2 0x00000000
Call Trace:
 ? __schedule+0x1a5/0x660
 ? schedule+0x36/0x80
 ? _synchronize_rcu_expedited.constprop.65+0x2ef/0x300
 ? wake_up_bit+0x20/0x20
 ? rcu_stall_kick_kthreads.part.54+0xc0/0xc0
 ? rcu_exp_wait_wake+0x530/0x530
 ? i915_gem_shrink+0x34b/0x4b0
 ? i915_gem_shrinker_scan+0x7c/0x90
 ? i915_gem_shrinker_scan+0x7c/0x90
 ? shrink_slab.part.61.constprop.72+0x1c1/0x3a0
 ? shrink_zone+0x154/0x160
 ? kswapd+0x40a/0x720
 ? kthread+0xf4/0x130
 ? try_to_free_pages+0x450/0x450
 ? kthread_create_on_node+0x40/0x40
 ? ret_from_fork+0x23/0x30

plasmashell     D    0  4657   4614 0x00000000
Call Trace:
 ? __schedule+0x1a5/0x660
 ? schedule+0x36/0x80
 ? schedule_preempt_disabled+0xe/0x10
 ? __mutex_lock.isra.4+0x1c9/0x790
 ? i915_gem_close_object+0x26/0xc0
 ? i915_gem_close_object+0x26/0xc0
 ? drm_gem_object_release_handle+0x48/0x90
 ? drm_gem_handle_delete+0x50/0x80
 ? drm_ioctl+0x1fa/0x420
 ? drm_gem_handle_create+0x40/0x40
 ? pipe_write+0x391/0x410
 ? __vfs_write+0xc6/0x120
 ? do_vfs_ioctl+0x8b/0x5d0
 ? SyS_ioctl+0x3b/0x70
 ? entry_SYSCALL_64_fastpath+0x13/0x94

kworker/0:0     D    0 29186      2 0x00000000
Workqueue: events __i915_gem_free_work
Call Trace:
 ? __schedule+0x1a5/0x660
 ? schedule+0x36/0x80
 ? schedule_preempt_disabled+0xe/0x10
 ? __mutex_lock.isra.4+0x1c9/0x790
 ? del_timer_sync+0x44/0x50
 ? update_curr+0x57/0x110
 ? __i915_gem_free_objects+0x31/0x300
 ? __i915_gem_free_objects+0x31/0x300
 ? __i915_gem_free_work+0x2d/0x40
 ? process_one_work+0x13a/0x3b0
 ? worker_thread+0x4a/0x460
 ? kthread+0xf4/0x130
 ? process_one_work+0x3b0/0x3b0
 ? kthread_create_on_node+0x40/0x40
 ? ret_from_fork+0x23/0x30

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/gpu/drm/i915/i915_gem.c          |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 14 ++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 67b1fc5..3982489 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4742,6 +4742,13 @@ int i915_gem_freeze(struct drm_i915_private *dev_priv)
 	i915_gem_shrink_all(dev_priv);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	/*
+	 * Cannot call synchronize_rcu() inside the struct_mutex
+	 * because it may block until workqueues complete, and the
+	 * running workqueue may wait on the struct_mutex.
+	 */
+	synchronize_rcu(); /* wait for our earlier RCU delayed slab frees */
+
 	intel_runtime_pm_put(dev_priv);
 
 	return 0;
@@ -4781,6 +4788,8 @@ int i915_gem_freeze_late(struct drm_i915_private *dev_priv)
 	}
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	synchronize_rcu_expedited();
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index d5d2b4c..fea1454 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -235,9 +235,6 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 	if (unlock)
 		mutex_unlock(&dev_priv->drm.struct_mutex);
 
-	/* expedite the RCU grace period to free some request slabs */
-	synchronize_rcu_expedited();
-
 	return count;
 }
 
@@ -263,7 +260,6 @@ unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 				I915_SHRINK_BOUND |
 				I915_SHRINK_UNBOUND |
 				I915_SHRINK_ACTIVE);
-	synchronize_rcu(); /* wait for our earlier RCU delayed slab frees */
 
 	return freed;
 }
@@ -324,6 +320,16 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 	if (unlock)
 		mutex_unlock(&dev->struct_mutex);
 
+	if (likely(__mutex_owner(&dev->struct_mutex) != current))
+		/*
+		 * If reclaim was invoked by an allocation done while
+		 * holding the struct mutex, we cannot call
+		 * synchronize_rcu_expedited() as it depends on
+		 * workqueues to run but the running workqueue may be
+		 * blocked waiting on us to release struct_mutex.
+		 */
+		synchronize_rcu_expedited();
+
 	return freed;
}
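
[Editorial note, not part of the patch] To make the circular dependency
described in the commit message easier to see, here is a minimal
userspace sketch of the same locking pattern. All names are hypothetical
stand-ins: the pthread "struct_mutex" plays the role of the real
struct_mutex, worker() stands in for the __i915_gem_free_work workqueue
item, and wait_for_worker() stands in for synchronize_rcu() /
synchronize_rcu_expedited(), which cannot return until the pending work
has run. Build with "cc -pthread".

/* Minimal model of the deadlock fixed by this patch (userspace, not kernel code). */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t struct_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t done_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cond    = PTHREAD_COND_INITIALIZER;
static bool worker_done;

/* Like the gem free worker: it must take struct_mutex to make progress. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&struct_mutex);
	/* ... free objects ... */
	pthread_mutex_unlock(&struct_mutex);

	pthread_mutex_lock(&done_lock);
	worker_done = true;
	pthread_cond_signal(&done_cond);
	pthread_mutex_unlock(&done_lock);
	return NULL;
}

/* Like synchronize_rcu(): cannot return until the worker has run. */
static void wait_for_worker(void)
{
	pthread_mutex_lock(&done_lock);
	while (!worker_done)
		pthread_cond_wait(&done_cond, &done_lock);
	pthread_mutex_unlock(&done_lock);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, NULL);

	pthread_mutex_lock(&struct_mutex);
	/* ... shrink/freeze work done under the lock ... */
	/*
	 * Buggy ordering (what the old code did): calling
	 * wait_for_worker() here deadlocks, because the worker is
	 * blocked on struct_mutex while we are blocked on the worker.
	 */
	pthread_mutex_unlock(&struct_mutex);

	/* Fixed ordering (what the patch does): wait only after unlocking. */
	wait_for_worker();

	pthread_join(t, NULL);
	printf("no deadlock: waited for the worker after dropping the lock\n");
	return 0;
}

The shrinker hunk applies the same reasoning at run time: it only calls
synchronize_rcu_expedited() when the current task is not the
struct_mutex owner, since a reclaiming allocation made under the lock
would otherwise recreate exactly this cycle.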