[PATCH] drm/i915: remove user GTT mappings early during runtime suspend

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] drm/i915: remove user GTT mappings early during runtime suspend
@ 2014-05-06 11:28 Imre Deak
  2014-05-06 11:40 ` Chris Wilson
  2014-05-07 16:57 ` [PATCH v2] " Imre Deak
  0 siblings, 2 replies; 10+ messages in thread
From: Imre Deak @ 2014-05-06 11:28 UTC (permalink / raw)
  To: intel-gfx

Currently user space can access GEM buffers mapped to GTT through
existing mappings concurrently while the platform specific suspend
handlers are running.  Since these handlers may change the HW state in a
way that would break such accesses, remove the mappings before calling
the handlers.

Suggested by Ville.

Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4024e16..2d4bb48 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1321,6 +1321,7 @@ static int intel_runtime_suspend(struct device *device)
 	 */
 	cancel_work_sync(&dev_priv->rps.work);
 	intel_runtime_pm_disable_interrupts(dev);
+	i915_gem_release_all_mmaps(dev_priv);
 
 	if (IS_GEN6(dev)) {
 		ret = 0;
@@ -1340,8 +1341,6 @@ static int intel_runtime_suspend(struct device *device)
 		return ret;
 	}
 
-	i915_gem_release_all_mmaps(dev_priv);
-
 	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
 	dev_priv->pm.suspended = true;
 
-- 
1.8.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-06 11:28 [PATCH] drm/i915: remove user GTT mappings early during runtime suspend Imre Deak
@ 2014-05-06 11:40 ` Chris Wilson
  2014-05-06 11:42   ` Imre Deak
  2014-05-07 16:57 ` [PATCH v2] " Imre Deak
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2014-05-06 11:40 UTC (permalink / raw)
  To: Imre Deak; +Cc: intel-gfx

On Tue, May 06, 2014 at 02:28:50PM +0300, Imre Deak wrote:
> Currently user space can access GEM buffers mapped to GTT through
> existing mappings concurrently while the platform specific suspend
> handlers are running.  Since these handlers may change the HW state in a
> way that would break such accesses, remove the mappings before calling
> the handlers.

Hmm, but you never locked the device, so what is preventing those
concurrent accesses from refaulting in GTT entires anyway. Please explain
the context under which the runtime suspend code executes, and leave
that explanation within easy reach of intel_runtime_suspend() -
preferrably with testing of those assumptions.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-06 11:40 ` Chris Wilson
@ 2014-05-06 11:42   ` Imre Deak
  2014-05-06 11:59     ` Chris Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: Imre Deak @ 2014-05-06 11:42 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 914 bytes --]

On Tue, 2014-05-06 at 12:40 +0100, Chris Wilson wrote:
> On Tue, May 06, 2014 at 02:28:50PM +0300, Imre Deak wrote:
> > Currently user space can access GEM buffers mapped to GTT through
> > existing mappings concurrently while the platform specific suspend
> > handlers are running.  Since these handlers may change the HW state in a
> > way that would break such accesses, remove the mappings before calling
> > the handlers.
> 
> Hmm, but you never locked the device, so what is preventing those
> concurrent accesses from refaulting in GTT entires anyway. Please explain
> the context under which the runtime suspend code executes, and leave
> that explanation within easy reach of intel_runtime_suspend() -
> preferrably with testing of those assumptions.

During faulting we take an RPM reference, so that avoids concurrent
re-faults. I could add a comment about this to the code.

--Imre


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-06 11:42   ` Imre Deak
@ 2014-05-06 11:59     ` Chris Wilson
  2014-05-06 14:46       ` Imre Deak
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2014-05-06 11:59 UTC (permalink / raw)
  To: Imre Deak; +Cc: intel-gfx

On Tue, May 06, 2014 at 02:42:26PM +0300, Imre Deak wrote:
> On Tue, 2014-05-06 at 12:40 +0100, Chris Wilson wrote:
> > On Tue, May 06, 2014 at 02:28:50PM +0300, Imre Deak wrote:
> > > Currently user space can access GEM buffers mapped to GTT through
> > > existing mappings concurrently while the platform specific suspend
> > > handlers are running.  Since these handlers may change the HW state in a
> > > way that would break such accesses, remove the mappings before calling
> > > the handlers.
> > 
> > Hmm, but you never locked the device, so what is preventing those
> > concurrent accesses from refaulting in GTT entires anyway. Please explain
> > the context under which the runtime suspend code executes, and leave
> > that explanation within easy reach of intel_runtime_suspend() -
> > preferrably with testing of those assumptions.
> 
> During faulting we take an RPM reference, so that avoids concurrent
> re-faults. I could add a comment about this to the code.

You are still iterating over lists that are not safe, right? Or are
there more magic rpm references that prevent ioctls whilst
intel_runtime_suspend is being called?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-06 11:59     ` Chris Wilson
@ 2014-05-06 14:46       ` Imre Deak
  2014-05-06 19:27         ` Daniel Vetter
  0 siblings, 1 reply; 10+ messages in thread
From: Imre Deak @ 2014-05-06 14:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2044 bytes --]

On Tue, 2014-05-06 at 12:59 +0100, Chris Wilson wrote:
> On Tue, May 06, 2014 at 02:42:26PM +0300, Imre Deak wrote:
> > On Tue, 2014-05-06 at 12:40 +0100, Chris Wilson wrote:
> > > On Tue, May 06, 2014 at 02:28:50PM +0300, Imre Deak wrote:
> > > > Currently user space can access GEM buffers mapped to GTT through
> > > > existing mappings concurrently while the platform specific suspend
> > > > handlers are running.  Since these handlers may change the HW state in a
> > > > way that would break such accesses, remove the mappings before calling
> > > > the handlers.
> > > 
> > > Hmm, but you never locked the device, so what is preventing those
> > > concurrent accesses from refaulting in GTT entires anyway. Please explain
> > > the context under which the runtime suspend code executes, and leave
> > > that explanation within easy reach of intel_runtime_suspend() -
> > > preferrably with testing of those assumptions.
> > 
> > During faulting we take an RPM reference, so that avoids concurrent
> > re-faults. I could add a comment about this to the code.
> 
> You are still iterating over lists that are not safe, right? Or are
> there more magic rpm references that prevent ioctls whilst
> intel_runtime_suspend is being called?

Tbh I haven't checked this, since moving the unmapping earlier doesn't
make a difference in this regard.

But it's a good point, I tried to audit now those paths. Currently the
assumption is that we hold an RPM reference everywhere where those lists
are changed. On the exec buffer path this is true, but for example in
i915_drop_caches_set() it's not.

We could fix this by taking struct_mutex around
i915_gem_release_all_mmaps() in intel_runtime_suspend(). Doing that
needs some more auditing to make sure we can't deadlock somehow. At
first glance it seems that the driver always schedules a deferred work
and calls intel_runtime_suspend() from that, so I think it's fine.

I suggest applying this patch regardless since the two issues are
separate.

--Imre

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-06 14:46       ` Imre Deak
@ 2014-05-06 19:27         ` Daniel Vetter
  2014-05-06 20:03           ` Imre Deak
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Vetter @ 2014-05-06 19:27 UTC (permalink / raw)
  To: Imre Deak; +Cc: intel-gfx

On Tue, May 06, 2014 at 05:46:01PM +0300, Imre Deak wrote:
> On Tue, 2014-05-06 at 12:59 +0100, Chris Wilson wrote:
> > On Tue, May 06, 2014 at 02:42:26PM +0300, Imre Deak wrote:
> > > On Tue, 2014-05-06 at 12:40 +0100, Chris Wilson wrote:
> > > > On Tue, May 06, 2014 at 02:28:50PM +0300, Imre Deak wrote:
> > > > > Currently user space can access GEM buffers mapped to GTT through
> > > > > existing mappings concurrently while the platform specific suspend
> > > > > handlers are running.  Since these handlers may change the HW state in a
> > > > > way that would break such accesses, remove the mappings before calling
> > > > > the handlers.
> > > > 
> > > > Hmm, but you never locked the device, so what is preventing those
> > > > concurrent accesses from refaulting in GTT entires anyway. Please explain
> > > > the context under which the runtime suspend code executes, and leave
> > > > that explanation within easy reach of intel_runtime_suspend() -
> > > > preferrably with testing of those assumptions.
> > > 
> > > During faulting we take an RPM reference, so that avoids concurrent
> > > re-faults. I could add a comment about this to the code.
> > 
> > You are still iterating over lists that are not safe, right? Or are
> > there more magic rpm references that prevent ioctls whilst
> > intel_runtime_suspend is being called?
> 
> Tbh I haven't checked this, since moving the unmapping earlier doesn't
> make a difference in this regard.
> 
> But it's a good point, I tried to audit now those paths. Currently the
> assumption is that we hold an RPM reference everywhere where those lists
> are changed. On the exec buffer path this is true, but for example in
> i915_drop_caches_set() it's not.
> 
> We could fix this by taking struct_mutex around
> i915_gem_release_all_mmaps() in intel_runtime_suspend(). Doing that
> needs some more auditing to make sure we can't deadlock somehow. At
> first glance it seems that the driver always schedules a deferred work
> and calls intel_runtime_suspend() from that, so I think it's fine.
> 
> I suggest applying this patch regardless since the two issues are
> separate.

If I understand the situation correctly the runtime suspend function only
ever gets called from a worker thread after the hysteris timeout expired.
Which means we should be able to wrap _just_ the gtt pte shotdown with
dev->struct_mutex and nothing else. Which is good since profileration of
dev->struct_mutex is awful.

On the resume side we don't need any locking since the gtt fault handler
will first grab the runtime reference and also dev->struct_mutex.

One issue which is looming is that this might deadlock. We might need a
trylock in the runtime suspend function and abort the runtime suspend if
we can't get the lock. Please test that lockdep catches this before we
commit to a design.

Just a very quick analysis, I didn't check the details so this might be
horribly wrong.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-06 19:27         ` Daniel Vetter
@ 2014-05-06 20:03           ` Imre Deak
  0 siblings, 0 replies; 10+ messages in thread
From: Imre Deak @ 2014-05-06 20:03 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, 2014-05-06 at 21:27 +0200, Daniel Vetter wrote:
> On Tue, May 06, 2014 at 05:46:01PM +0300, Imre Deak wrote:
> > On Tue, 2014-05-06 at 12:59 +0100, Chris Wilson wrote:
> > > On Tue, May 06, 2014 at 02:42:26PM +0300, Imre Deak wrote:
> > > > On Tue, 2014-05-06 at 12:40 +0100, Chris Wilson wrote:
> > > > > On Tue, May 06, 2014 at 02:28:50PM +0300, Imre Deak wrote:
> > > > > > Currently user space can access GEM buffers mapped to GTT through
> > > > > > existing mappings concurrently while the platform specific suspend
> > > > > > handlers are running.  Since these handlers may change the HW state in a
> > > > > > way that would break such accesses, remove the mappings before calling
> > > > > > the handlers.
> > > > > 
> > > > > Hmm, but you never locked the device, so what is preventing those
> > > > > concurrent accesses from refaulting in GTT entires anyway. Please explain
> > > > > the context under which the runtime suspend code executes, and leave
> > > > > that explanation within easy reach of intel_runtime_suspend() -
> > > > > preferrably with testing of those assumptions.
> > > > 
> > > > During faulting we take an RPM reference, so that avoids concurrent
> > > > re-faults. I could add a comment about this to the code.
> > > 
> > > You are still iterating over lists that are not safe, right? Or are
> > > there more magic rpm references that prevent ioctls whilst
> > > intel_runtime_suspend is being called?
> > 
> > Tbh I haven't checked this, since moving the unmapping earlier doesn't
> > make a difference in this regard.
> > 
> > But it's a good point, I tried to audit now those paths. Currently the
> > assumption is that we hold an RPM reference everywhere where those lists
> > are changed. On the exec buffer path this is true, but for example in
> > i915_drop_caches_set() it's not.
> > 
> > We could fix this by taking struct_mutex around
> > i915_gem_release_all_mmaps() in intel_runtime_suspend(). Doing that
> > needs some more auditing to make sure we can't deadlock somehow. At
> > first glance it seems that the driver always schedules a deferred work
> > and calls intel_runtime_suspend() from that, so I think it's fine.
> > 
> > I suggest applying this patch regardless since the two issues are
> > separate.
> 
> If I understand the situation correctly the runtime suspend function only
> ever gets called from a worker thread after the hysteris timeout expired.
> Which means we should be able to wrap _just_ the gtt pte shotdown with
> dev->struct_mutex and nothing else. Which is good since profileration of
> dev->struct_mutex is awful.
>
> On the resume side we don't need any locking since the gtt fault handler
> will first grab the runtime reference and also dev->struct_mutex.
> 
> One issue which is looming is that this might deadlock. We might need a
> trylock in the runtime suspend function and abort the runtime suspend if
> we can't get the lock. Please test that lockdep catches this before we
> commit to a design.

After looking some more at different possible solutions and the RPM core
this looks the easiest way. There could be cleaner ones, for example
rearranging the order everywhere of getting struct_mutex and RPM ref in
the same order, so that we always get the RPM ref outside of
struct_mutex. By this we could just take the mutex during runtime
suspend without the possibility of deadlocking. But this would need a
lot of change due to the RPM get in i915_gem_free_object(). 

It's also clear that we need the trylock, since an RPM get with a struct
mutex already held can can block on a pending RPM put, and so getting
the mutex in the suspend handler could deadlock. In this case we can
fail the suspend with EAGAIN, so it'll get scheduled again. 

--Imre

> Just a very quick analysis, I didn't check the details so this might be
> horribly wrong.

> -Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-06 11:28 [PATCH] drm/i915: remove user GTT mappings early during runtime suspend Imre Deak
  2014-05-06 11:40 ` Chris Wilson
@ 2014-05-07 16:57 ` Imre Deak
  2014-05-07 17:11   ` Imre Deak
  2014-05-22 11:48   ` Robert Beckett
  1 sibling, 2 replies; 10+ messages in thread
From: Imre Deak @ 2014-05-07 16:57 UTC (permalink / raw)
  To: intel-gfx

Currently user space can access GEM buffers mapped to GTT through
existing mappings concurrently while the platform specific suspend
handlers are running. Since these handlers may change the HW state in a
way that would break such accesses, remove the mappings before calling
the handlers. Spotted by Ville.

Also Chris pointed out that the lists that i915_gem_release_all_mmaps()
walks through need dev->struct_mutex, so take this lock. There is a
potential deadlock against a concurrent RPM resume, resolve this by
aborting and rescheduling the suspend (Daniel).

v2:
- take struct_mutex around i915_gem_release_all_mmaps() (Chris, Daniel)

Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4024e16..0c9858c 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -36,6 +36,7 @@
 
 #include <linux/console.h>
 #include <linux/module.h>
+#include <linux/pm_runtime.h>
 #include <drm/drm_crtc_helper.h>
 
 static struct drm_driver driver;
@@ -1315,6 +1316,30 @@ static int intel_runtime_suspend(struct device *device)
 	DRM_DEBUG_KMS("Suspending device\n");
 
 	/*
+	 * We could deadlock here in case another thread holding struct_mutex
+	 * calls RPM suspend concurrently, since the RPM suspend will wait
+	 * first for this RPM suspend to finish. In this case the concurrent
+	 * RPM resume will be followed by its RPM suspend counterpart. Still
+	 * for consistency return -EAGAIN, which will reschedule this suspend.
+	 */
+	if (!mutex_trylock(&dev->struct_mutex)) {
+		DRM_DEBUG_KMS("device lock contention, deffering suspend\n");
+		/*
+		 * Bump the expiration timestamp, otherwise the suspend won't
+		 * be rescheduled.
+		 */
+		pm_runtime_mark_last_busy(device);
+
+		return -EAGAIN;
+	}
+	/*
+	 * We are safe here against re-faults, since the fault handler takes
+	 * an RPM reference.
+	 */
+	i915_gem_release_all_mmaps(dev_priv);
+	mutex_unlock(&dev->struct_mutex);
+
+	/*
 	 * rps.work can't be rearmed here, since we get here only after making
 	 * sure the GPU is idle and the RPS freq is set to the minimum. See
 	 * intel_mark_idle().
@@ -1340,8 +1365,6 @@ static int intel_runtime_suspend(struct device *device)
 		return ret;
 	}
 
-	i915_gem_release_all_mmaps(dev_priv);
-
 	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
 	dev_priv->pm.suspended = true;
 
-- 
1.8.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-07 16:57 ` [PATCH v2] " Imre Deak
@ 2014-05-07 17:11   ` Imre Deak
  2014-05-22 11:48   ` Robert Beckett
  1 sibling, 0 replies; 10+ messages in thread
From: Imre Deak @ 2014-05-07 17:11 UTC (permalink / raw)
  To: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2851 bytes --]

On Wed, 2014-05-07 at 19:57 +0300, Imre Deak wrote:
> Currently user space can access GEM buffers mapped to GTT through
> existing mappings concurrently while the platform specific suspend
> handlers are running. Since these handlers may change the HW state in a
> way that would break such accesses, remove the mappings before calling
> the handlers. Spotted by Ville.
> 
> Also Chris pointed out that the lists that i915_gem_release_all_mmaps()
> walks through need dev->struct_mutex, so take this lock. There is a
> potential deadlock against a concurrent RPM resume, resolve this by
> aborting and rescheduling the suspend (Daniel).
> 
> v2:
> - take struct_mutex around i915_gem_release_all_mmaps() (Chris, Daniel)
> 
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4024e16..0c9858c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -36,6 +36,7 @@
>  
>  #include <linux/console.h>
>  #include <linux/module.h>
> +#include <linux/pm_runtime.h>
>  #include <drm/drm_crtc_helper.h>
>  
>  static struct drm_driver driver;
> @@ -1315,6 +1316,30 @@ static int intel_runtime_suspend(struct device *device)
>  	DRM_DEBUG_KMS("Suspending device\n");
>  
>  	/*
> +	 * We could deadlock here in case another thread holding struct_mutex
> +	 * calls RPM suspend concurrently, since the RPM suspend will wait
                   resume^                             resume^
> +	 * first for this RPM suspend to finish. In this case the concurrent
> +	 * RPM resume will be followed by its RPM suspend counterpart. Still
> +	 * for consistency return -EAGAIN, which will reschedule this suspend.
> +	 */
> +	if (!mutex_trylock(&dev->struct_mutex)) {
> +		DRM_DEBUG_KMS("device lock contention, deffering suspend\n");
> +		/*
> +		 * Bump the expiration timestamp, otherwise the suspend won't
> +		 * be rescheduled.
> +		 */
> +		pm_runtime_mark_last_busy(device);
> +
> +		return -EAGAIN;
> +	}
> +	/*
> +	 * We are safe here against re-faults, since the fault handler takes
> +	 * an RPM reference.
> +	 */
> +	i915_gem_release_all_mmaps(dev_priv);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	/*
>  	 * rps.work can't be rearmed here, since we get here only after making
>  	 * sure the GPU is idle and the RPS freq is set to the minimum. See
>  	 * intel_mark_idle().
> @@ -1340,8 +1365,6 @@ static int intel_runtime_suspend(struct device *device)
>  		return ret;
>  	}
>  
> -	i915_gem_release_all_mmaps(dev_priv);
> -
>  	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
>  	dev_priv->pm.suspended = true;
>  


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] drm/i915: remove user GTT mappings early during runtime suspend
  2014-05-07 16:57 ` [PATCH v2] " Imre Deak
  2014-05-07 17:11   ` Imre Deak
@ 2014-05-22 11:48   ` Robert Beckett
  1 sibling, 0 replies; 10+ messages in thread
From: Robert Beckett @ 2014-05-22 11:48 UTC (permalink / raw)
  To: intel-gfx

On 07/05/2014 17:57, Imre Deak wrote:
> Currently user space can access GEM buffers mapped to GTT through
> existing mappings concurrently while the platform specific suspend
> handlers are running. Since these handlers may change the HW state in a
> way that would break such accesses, remove the mappings before calling
> the handlers. Spotted by Ville.
>
> Also Chris pointed out that the lists that i915_gem_release_all_mmaps()
> walks through need dev->struct_mutex, so take this lock. There is a
> potential deadlock against a concurrent RPM resume, resolve this by
> aborting and rescheduling the suspend (Daniel).
>
> v2:
> - take struct_mutex around i915_gem_release_all_mmaps() (Chris, Daniel)
>
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c | 27 +++++++++++++++++++++++++--
>   1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4024e16..0c9858c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -36,6 +36,7 @@
>
>   #include <linux/console.h>
>   #include <linux/module.h>
> +#include <linux/pm_runtime.h>
>   #include <drm/drm_crtc_helper.h>
>
>   static struct drm_driver driver;
> @@ -1315,6 +1316,30 @@ static int intel_runtime_suspend(struct device *device)
>   	DRM_DEBUG_KMS("Suspending device\n");
>
>   	/*
> +	 * We could deadlock here in case another thread holding struct_mutex
> +	 * calls RPM suspend concurrently, since the RPM suspend will wait
> +	 * first for this RPM suspend to finish. In this case the concurrent
> +	 * RPM resume will be followed by its RPM suspend counterpart. Still
> +	 * for consistency return -EAGAIN, which will reschedule this suspend.
> +	 */
> +	if (!mutex_trylock(&dev->struct_mutex)) {
> +		DRM_DEBUG_KMS("device lock contention, deffering suspend\n");
> +		/*
> +		 * Bump the expiration timestamp, otherwise the suspend won't
> +		 * be rescheduled.
> +		 */
> +		pm_runtime_mark_last_busy(device);
> +
> +		return -EAGAIN;
> +	}
> +	/*
> +	 * We are safe here against re-faults, since the fault handler takes
> +	 * an RPM reference.
> +	 */
> +	i915_gem_release_all_mmaps(dev_priv);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	/*
>   	 * rps.work can't be rearmed here, since we get here only after making
>   	 * sure the GPU is idle and the RPS freq is set to the minimum. See
>   	 * intel_mark_idle().
> @@ -1340,8 +1365,6 @@ static int intel_runtime_suspend(struct device *device)
>   		return ret;
>   	}
>
> -	i915_gem_release_all_mmaps(dev_priv);
> -
>   	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
>   	dev_priv->pm.suspended = true;
>
>

Looks good to me.

Reviewed-by: Robert Beckett <robert.beckett@intel.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-05-22 11:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-06 11:28 [PATCH] drm/i915: remove user GTT mappings early during runtime suspend Imre Deak
2014-05-06 11:40 ` Chris Wilson
2014-05-06 11:42   ` Imre Deak
2014-05-06 11:59     ` Chris Wilson
2014-05-06 14:46       ` Imre Deak
2014-05-06 19:27         ` Daniel Vetter
2014-05-06 20:03           ` Imre Deak
2014-05-07 16:57 ` [PATCH v2] " Imre Deak
2014-05-07 17:11   ` Imre Deak
2014-05-22 11:48   ` Robert Beckett

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.