From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Andrea Arcangeli <aarcange@redhat.com>,
	Martin Kepplinger <martink@posteo.de>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	daniel.vetter@intel.com, Dave Airlie <airlied@gmail.com>,
	Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 1/5] i915: avoid kernel hang caused by synchronize rcu struct_mutex deadlock
Date: Fri, 07 Apr 2017 12:05:22 +0300
Message-ID: <1491555922.3493.18.camel@linux.intel.com>
In-Reply-To: <20170406232347.988-2-aarcange@redhat.com>

On Fri, 2017-04-07 at 01:23 +0200, Andrea Arcangeli wrote:
> synchronize_rcu/synchronize_sched/synchronize_rcu_expedited() will
> hang until its own workqueues are run. The i915 gem workqueues will
> wait on the struct_mutex to be released. So we cannot wait for a
> quiescent state using those rcu primitives while holding the
> struct_mutex or it creates a circular lock dependency resulting in
> kernel hangs (which is reproducible but goes undetected by lockdep).
> 
> This started in commit 3d3d18f086cdda72ee18a454db70ca72c6e3246c and
> lockdep didn't detect it apparently.

The right format is:

Fixes: 3d3d18f086cd ("drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker)")

> @@ -324,6 +320,16 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
>  	if (unlock)
>  		mutex_unlock(&dev->struct_mutex);
>  
> +	if (likely(__mutex_owner(&dev->struct_mutex) != current))

This check can be dropped, and synchronize_rcu_expedited() should be
moved directly into the if (unlock) branch, as that is functionally
equivalent. The same can be applied to all the unlock cases, not just
this one. That should be the correct way to avoid the deadlock. I've
sent a patch to do this (Cc'd you); can you verify that it gets rid of
the problem for you?
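
To illustrate the control flow suggested above, here is a rough userspace
sketch, not the actual patch. It assumes that 'unlock' is set only when
the shrinker itself acquired struct_mutex, which is what makes the
ownership check redundant. All mock_* names and the call counter are
hypothetical stand-ins for the kernel primitives.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for struct_mutex; illustration only. */
struct mock_mutex {
	bool locked;
};

/* Counts how often the stubbed RCU wait was invoked. */
static int rcu_sync_calls;

static void mock_mutex_unlock(struct mock_mutex *m)
{
	assert(m->locked);	/* unlocking an unheld mutex would be a bug */
	m->locked = false;
}

static void mock_synchronize_rcu_expedited(void)
{
	/* The real primitive blocks for a grace period; here we only count. */
	rcu_sync_calls++;
}

/*
 * Tail of a shrinker scan. Assumption: 'unlock' is true only when this
 * call took the mutex itself, so once it has been dropped the caller is
 * known not to hold it and waiting for RCU cannot deadlock on it.
 */
static unsigned long mock_shrinker_scan_tail(struct mock_mutex *struct_mutex,
					     bool unlock, unsigned long freed)
{
	if (unlock) {
		mock_mutex_unlock(struct_mutex);
		mock_synchronize_rcu_expedited();
	}

	return freed;
}
```

When 'unlock' is false the caller still holds the mutex, so the RCU wait
is skipped entirely, which is the same outcome the __mutex_owner() check
was meant to produce.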

> +		/*
> +		 * If reclaim was invoked by an allocation done while
> +		 * holding the struct mutex, we cannot call
> +		 * synchronize_rcu_expedited() as it depends on
> +		 * workqueues to run but the running workqueue may be
> +		 * blocked waiting on us to release struct_mutex.
> +		 */
> +		synchronize_rcu_expedited();
> +
>  	return freed;
>  }
>  
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
