All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Unpin last_context at reset
@ 2014-06-18 19:04 ville.syrjala
  2014-06-18 19:04 ` [PATCH igt] tests/gem_ctx_exec: Add reset-pin-leak subtest ville.syrjala
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: ville.syrjala @ 2014-06-18 19:04 UTC (permalink / raw)
  To: intel-gfx

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

We're forgetting to unpin the last_context from the ggtt at GPU reset
time. This leads to the vma pin_count leaking at every reset if the
last context wasn't the ring default context. Further use of the same
context will trigger the pin_count check in i915_gem_object_pin() and
userspace will be faced with EBUSY as a result.

This plaques kms_flip rather badly since it performs lots of resets,
and every fd has its own default context these days.

Fix the problem by properly unpinning the last context at reset.

Testcase: igt/gem_ctx_exec/reset-pin-leak
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3ffe308..e362c96 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -382,6 +382,9 @@ void i915_gem_context_reset(struct drm_device *dev)
 			dctx->obj->active = 0;
 		}
 
+		if (ring->last_context->obj && i == RCS)
+			i915_gem_object_ggtt_unpin(ring->last_context->obj);
+
 		i915_gem_context_unreference(ring->last_context);
 		i915_gem_context_reference(dctx);
 		ring->last_context = dctx;
-- 
1.8.5.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH igt] tests/gem_ctx_exec: Add reset-pin-leak subtest
  2014-06-18 19:04 [PATCH] drm/i915: Unpin last_context at reset ville.syrjala
@ 2014-06-18 19:04 ` ville.syrjala
  2014-06-19  7:47 ` [PATCH] drm/i915: Unpin last_context at reset Chris Wilson
  2014-06-23 10:07 ` Ville Syrjälä
  2 siblings, 0 replies; 6+ messages in thread
From: ville.syrjala @ 2014-06-18 19:04 UTC (permalink / raw)
  To: intel-gfx

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Add a subtest to make sure the kernel doesn't leak the vma
pin_count for the last context on reset.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 tests/gem_ctx_exec.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/tests/gem_ctx_exec.c b/tests/gem_ctx_exec.c
index da49a2f..da7412c 100644
--- a/tests/gem_ctx_exec.c
+++ b/tests/gem_ctx_exec.c
@@ -46,6 +46,7 @@
 #include "ioctl_wrappers.h"
 #include "drmtest.h"
 #include "igt_aux.h"
+#include "igt_debugfs.h"
 
 struct local_drm_i915_gem_context_destroy {
 	__u32 ctx_id;
@@ -92,7 +93,6 @@ static int exec(int fd, uint32_t handle, int ring, int ctx_id)
 
 	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2,
 			&execbuf);
-	gem_sync(fd, handle);
 
 	return ret;
 }
@@ -188,15 +188,43 @@ igt_main
 	igt_subtest("basic") {
 		ctx_id = gem_context_create(fd);
 		igt_assert(exec(fd, handle, I915_EXEC_RENDER, ctx_id) == 0);
+		gem_sync(fd, handle);
 		context_destroy(fd, ctx_id);
 
 		ctx_id = gem_context_create(fd);
 		igt_assert(exec(fd, handle, I915_EXEC_RENDER, ctx_id) == 0);
+		gem_sync(fd, handle);
 		context_destroy(fd, ctx_id);
 
 		igt_assert(exec(fd, handle, I915_EXEC_RENDER, ctx_id) < 0);
+		gem_sync(fd, handle);
 	}
 
 	igt_subtest("eviction")
 		big_exec(fd, handle, I915_EXEC_RENDER);
+
+	igt_subtest("reset-pin-leak") {
+		int i;
+
+		/*
+		 * Use an explicit context to isolate the test from
+		 * any major code changes related to the per-file
+		 * default context (eg. if they would be eliminated).
+		 */
+		ctx_id = gem_context_create(fd);
+
+		/*
+		 * Iterate enough times that the kernel will
+		 * become unhappy if the ggtt pin count for
+		 * the last context is leaked at every reset.
+		 */
+		for (i = 0; i < 20; i++) {
+			igt_set_stop_rings(STOP_RING_DEFAULTS);
+			igt_assert(exec(fd, handle, I915_EXEC_RENDER, 0) == 0);
+			igt_assert(exec(fd, handle, I915_EXEC_RENDER, ctx_id) == 0);
+			gem_sync(fd, handle);
+		}
+
+		context_destroy(fd, ctx_id);
+	}
 }
-- 
1.8.5.5

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Unpin last_context at reset
  2014-06-18 19:04 [PATCH] drm/i915: Unpin last_context at reset ville.syrjala
  2014-06-18 19:04 ` [PATCH igt] tests/gem_ctx_exec: Add reset-pin-leak subtest ville.syrjala
@ 2014-06-19  7:47 ` Chris Wilson
  2014-06-19 16:21   ` Mateo Lozano, Oscar
  2014-06-23 10:07 ` Ville Syrjälä
  2 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2014-06-19  7:47 UTC (permalink / raw)
  To: ville.syrjala; +Cc: intel-gfx

On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> We're forgetting to unpin the last_context from the ggtt at GPU reset
> time. This leads to the vma pin_count leaking at every reset if the
> last context wasn't the ring default context. Further use of the same
> context will trigger the pin_count check in i915_gem_object_pin() and
> userspace will be faced with EBUSY as a result.
> 
> This plaques kms_flip rather badly since it performs lots of resets,
> and every fd has its own default context these days.
> 
> Fix the problem by properly unpinning the last context at reset.

Ah, the context reset here is faked because we never restore the default
context state.  Hmm, in fact, I get the impression that we should just
delete i915_gem_context_reset(), and make i915_gem_context_enable()
function correctly after the GPU is reset.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Unpin last_context at reset
  2014-06-19  7:47 ` [PATCH] drm/i915: Unpin last_context at reset Chris Wilson
@ 2014-06-19 16:21   ` Mateo Lozano, Oscar
  0 siblings, 0 replies; 6+ messages in thread
From: Mateo Lozano, Oscar @ 2014-06-19 16:21 UTC (permalink / raw)
  To: Chris Wilson, ville.syrjala; +Cc: intel-gfx

> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Chris Wilson
> Sent: Thursday, June 19, 2014 8:47 AM
> To: ville.syrjala@linux.intel.com
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Unpin last_context at reset
> 
> On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com
> wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > We're forgetting to unpin the last_context from the ggtt at GPU reset
> > time. This leads to the vma pin_count leaking at every reset if the
> > last context wasn't the ring default context. Further use of the same
> > context will trigger the pin_count check in i915_gem_object_pin() and
> > userspace will be faced with EBUSY as a result.
> >
> > This plaques kms_flip rather badly since it performs lots of resets,
> > and every fd has its own default context these days.
> >
> > Fix the problem by properly unpinning the last context at reset.
> 
> Ah, the context reset here is faked because we never restore the default
> context state.  Hmm, in fact, I get the impression that we should just delete
> i915_gem_context_reset(), and make i915_gem_context_enable() function
> correctly after the GPU is reset.
> -Chris

I´m doing something similar in a parallel conversation (see http://lists.freedesktop.org/archives/intel-gfx/2014-June/047744.html). My proposal was to make both the APPGTT switch and default context switch completely synchronous (RING_PP_DIR_BASE/GEN8_RING_PDP for the PPGTT, CCID for the context) in i915_gem_context_enable(). That way we get rid of i915_gem_context_reset() and we don´t need all the gpu reset special casing inside ppgtt->switch_mm.
Daniel wants a completely asynchronous thing, which seems complex without intel_ring_begin, ring->flush, intel_ring_advance, etc..

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Unpin last_context at reset
  2014-06-18 19:04 [PATCH] drm/i915: Unpin last_context at reset ville.syrjala
  2014-06-18 19:04 ` [PATCH igt] tests/gem_ctx_exec: Add reset-pin-leak subtest ville.syrjala
  2014-06-19  7:47 ` [PATCH] drm/i915: Unpin last_context at reset Chris Wilson
@ 2014-06-23 10:07 ` Ville Syrjälä
  2014-07-07 14:38   ` Daniel Vetter
  2 siblings, 1 reply; 6+ messages in thread
From: Ville Syrjälä @ 2014-06-23 10:07 UTC (permalink / raw)
  To: intel-gfx

On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> We're forgetting to unpin the last_context from the ggtt at GPU reset
> time. This leads to the vma pin_count leaking at every reset if the
> last context wasn't the ring default context. Further use of the same
> context will trigger the pin_count check in i915_gem_object_pin() and
> userspace will be faced with EBUSY as a result.
> 
> This plaques kms_flip rather badly since it performs lots of resets,
> and every fd has its own default context these days.
> 
> Fix the problem by properly unpinning the last context at reset.
> 
> Testcase: igt/gem_ctx_exec/reset-pin-leak
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

I pushed the igt, and also kms_flip hits this easily.

It looks to me that the bug has been there since the introduction of
i915_gem_context_reset() in [1], so I think we may want this patch
merged with cc: stable. Afterwards people can work on killing
i915_gem_context_reset() if they so wish.

[1]
 commit acce9ffa4807027965ebd948456fa8385bbee32e
 Author: Ben Widawsky <ben@bwidawsk.net>
 Date:   Fri Dec 6 14:11:03 2013 -0800

    drm/i915: Better reset handling for contexts

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3ffe308..e362c96 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -382,6 +382,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>  			dctx->obj->active = 0;
>  		}
>  
> +		if (ring->last_context->obj && i == RCS)
> +			i915_gem_object_ggtt_unpin(ring->last_context->obj);
> +
>  		i915_gem_context_unreference(ring->last_context);
>  		i915_gem_context_reference(dctx);
>  		ring->last_context = dctx;
> -- 
> 1.8.5.5

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/i915: Unpin last_context at reset
  2014-06-23 10:07 ` Ville Syrjälä
@ 2014-07-07 14:38   ` Daniel Vetter
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Vetter @ 2014-07-07 14:38 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Mon, Jun 23, 2014 at 01:07:50PM +0300, Ville Syrjälä wrote:
> On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > We're forgetting to unpin the last_context from the ggtt at GPU reset
> > time. This leads to the vma pin_count leaking at every reset if the
> > last context wasn't the ring default context. Further use of the same
> > context will trigger the pin_count check in i915_gem_object_pin() and
> > userspace will be faced with EBUSY as a result.
> > 
> > This plaques kms_flip rather badly since it performs lots of resets,
> > and every fd has its own default context these days.
> > 
> > Fix the problem by properly unpinning the last context at reset.
> > 
> > Testcase: igt/gem_ctx_exec/reset-pin-leak
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> I pushed the igt, and also kms_flip hits this easily.
> 
> It looks to me that the bug has been there since the introduction of
> i915_gem_context_reset() in [1], so I think we may want this patch
> merged with cc: stable. Afterwards people can work on killing
> i915_gem_context_reset() if they so wish.
> 
> [1]
>  commit acce9ffa4807027965ebd948456fa8385bbee32e
>  Author: Ben Widawsky <ben@bwidawsk.net>
>  Date:   Fri Dec 6 14:11:03 2013 -0800
> 
>     drm/i915: Better reset handling for contexts

I've added this regression marker but forgone the cc: stable - haven't
heard anyone complain about this loud enough yet ... So just merged to
dinq.

We can do the proper fix of unifying the paths here a bit more later on.

Thanks for the patch.
-Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 3ffe308..e362c96 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -382,6 +382,9 @@ void i915_gem_context_reset(struct drm_device *dev)
> >  			dctx->obj->active = 0;
> >  		}
> >  
> > +		if (ring->last_context->obj && i == RCS)
> > +			i915_gem_object_ggtt_unpin(ring->last_context->obj);
> > +
> >  		i915_gem_context_unreference(ring->last_context);
> >  		i915_gem_context_reference(dctx);
> >  		ring->last_context = dctx;
> > -- 
> > 1.8.5.5
> 
> -- 
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-07-07 14:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-18 19:04 [PATCH] drm/i915: Unpin last_context at reset ville.syrjala
2014-06-18 19:04 ` [PATCH igt] tests/gem_ctx_exec: Add reset-pin-leak subtest ville.syrjala
2014-06-19  7:47 ` [PATCH] drm/i915: Unpin last_context at reset Chris Wilson
2014-06-19 16:21   ` Mateo Lozano, Oscar
2014-06-23 10:07 ` Ville Syrjälä
2014-07-07 14:38   ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.