[PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC
@ 2015-10-19 22:05 yu.dai
  2015-10-20  7:45 ` Daniel Vetter
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: yu.dai @ 2015-10-19 22:05 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

There is a memory leak warning message from i915_gem_context_clean
when GuC submission is enabled. The reason is that the request (so
the LRC associated with it) is freed early than moving the vma list
to inactive. When retire a gem object, this patch moves its vma
list to inactive first to avoid the false alert of memory leak.

We are not seeing this in ExecList (non-GuC) mode because the gem
request is moved to execlist_retired_req_list queue. The management
of this queue, therefore free of LRC, happens after retire of vma
list (i915_gem_retire_requests_ring).

Signed-off-by: Alex Dai <yu.dai@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7d6b0c8..a903d45 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2377,28 +2377,31 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
 	RQ_BUG_ON(!(obj->active & (1 << ring)));
 
+	obj->active &= ~(1 << ring);
+	if (!obj->active) {
+		/* Bump our place on the bound list to keep it roughly in LRU
+		 * order so that we don't steal from recently used but inactive
+		 * objects (unless we are forced to ofc!)
+		 */
+		list_move_tail(&obj->global_list,
+				&to_i915(obj->base.dev)->mm.bound_list);
+
+		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+			if (!list_empty(&vma->mm_list))
+				list_move_tail(&vma->mm_list,
+						&vma->vm->inactive_list);
+		}
+	}
+
 	list_del_init(&obj->ring_list[ring]);
 	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 
 	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
 		i915_gem_object_retire__write(obj);
 
-	obj->active &= ~(1 << ring);
 	if (obj->active)
 		return;
 
-	/* Bump our place on the bound list to keep it roughly in LRU order
-	 * so that we don't steal from recently used but inactive objects
-	 * (unless we are forced to ofc!)
-	 */
-	list_move_tail(&obj->global_list,
-		       &to_i915(obj->base.dev)->mm.bound_list);
-
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (!list_empty(&vma->mm_list))
-			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
-	}
-
 	i915_gem_request_assign(&obj->last_fenced_req, NULL);
 	drm_gem_object_unreference(&obj->base);
 }
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC
  2015-10-19 22:05 [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC yu.dai
@ 2015-10-20  7:45 ` Daniel Vetter
  2015-10-21 18:27 ` yu.dai
  2015-11-20  0:10 ` [PATCH v1] drm/i915: " yu.dai
  2 siblings, 0 replies; 14+ messages in thread
From: Daniel Vetter @ 2015-10-20  7:45 UTC (permalink / raw)
  To: yu.dai; +Cc: intel-gfx

On Mon, Oct 19, 2015 at 03:05:46PM -0700, yu.dai@intel.com wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> There is a memory leak warning message from i915_gem_context_clean
> when GuC submission is enabled. The reason is that the request (so
> the LRC associated with it) is freed early than moving the vma list
> to inactive. When retire a gem object, this patch moves its vma
> list to inactive first to avoid the false alert of memory leak.
> 
> We are not seeing this in ExecList (non-GuC) mode because the gem
> request is moved to execlist_retired_req_list queue. The management
> of this queue, therefore free of LRC, happens after retire of vma
> list (i915_gem_retire_requests_ring).

Instead of hacking up the core active tracking code can we just fix lrc
context object tracking instead? This patch here seems to be supremely
fragile, and I really don't want it.
-Daniel

> 
> Signed-off-by: Alex Dai <yu.dai@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7d6b0c8..a903d45 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2377,28 +2377,31 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>  	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
>  	RQ_BUG_ON(!(obj->active & (1 << ring)));
>  
> +	obj->active &= ~(1 << ring);
> +	if (!obj->active) {
> +		/* Bump our place on the bound list to keep it roughly in LRU
> +		 * order so that we don't steal from recently used but inactive
> +		 * objects (unless we are forced to ofc!)
> +		 */
> +		list_move_tail(&obj->global_list,
> +				&to_i915(obj->base.dev)->mm.bound_list);
> +
> +		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +			if (!list_empty(&vma->mm_list))
> +				list_move_tail(&vma->mm_list,
> +						&vma->vm->inactive_list);
> +		}
> +	}
> +
>  	list_del_init(&obj->ring_list[ring]);
>  	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>  
>  	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>  		i915_gem_object_retire__write(obj);
>  
> -	obj->active &= ~(1 << ring);
>  	if (obj->active)
>  		return;
>  
> -	/* Bump our place on the bound list to keep it roughly in LRU order
> -	 * so that we don't steal from recently used but inactive objects
> -	 * (unless we are forced to ofc!)
> -	 */
> -	list_move_tail(&obj->global_list,
> -		       &to_i915(obj->base.dev)->mm.bound_list);
> -
> -	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> -		if (!list_empty(&vma->mm_list))
> -			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
> -	}
> -
>  	i915_gem_request_assign(&obj->last_fenced_req, NULL);
>  	drm_gem_object_unreference(&obj->base);
>  }
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC
  2015-10-19 22:05 [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC yu.dai
  2015-10-20  7:45 ` Daniel Vetter
@ 2015-10-21 18:27 ` yu.dai
  2015-10-23 21:40   ` Dave Gordon
  2015-11-20  0:10 ` [PATCH v1] drm/i915: " yu.dai
  2 siblings, 1 reply; 14+ messages in thread
From: yu.dai @ 2015-10-21 18:27 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

There is a memory leak warning message from i915_gem_context_clean
when GuC submission is enabled. The reason is that gem_request (so
the LRC associated with it) is freed early than moving the vma list
to inactive.

We are not seeing this in ExecList (non-GuC) mode because the
gem_request is tracked by execlist_retired_req_list. The management
of this queue, therefore free of LRC, happens after retire of vma
list. In this patch, we use the same gem_request management for GuC
submission. Because the context switch interrupt is handled by
firmware, intel_guc_retire_requests is introduced to move retired
gem_request to execlist_retired_req_list then be released later in
workqueue.

Signed-off-by: Alex Dai <yu.dai@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c            |  1 +
 drivers/gpu/drm/i915/i915_guc_submission.c | 35 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_guc.h           |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c           | 10 ++++-----
 4 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7d6b0c8..6d8a0f1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2879,6 +2879,7 @@ i915_gem_retire_requests(struct drm_device *dev)
 			idle &= list_empty(&ring->execlist_queue);
 			spin_unlock_irqrestore(&ring->execlist_lock, flags);
 
+			intel_guc_retire_requests(ring);
 			intel_execlists_retire_requests(ring);
 		}
 	}
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 737b4f5..a35cfee 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -597,7 +597,8 @@ int i915_guc_submit(struct i915_guc_client *client,
 		    struct drm_i915_gem_request *rq)
 {
 	struct intel_guc *guc = client->guc;
-	enum intel_ring_id ring_id = rq->ring->id;
+	struct intel_engine_cs *ring = rq->ring;
+	enum intel_ring_id ring_id = ring->id;
 	unsigned long flags;
 	int q_ret, b_ret;
 
@@ -628,9 +629,41 @@ int i915_guc_submit(struct i915_guc_client *client,
 	guc->last_seqno[ring_id] = rq->seqno;
 	spin_unlock(&guc->host2guc_lock);
 
+	spin_lock_irq(&ring->execlist_lock);
+	list_add_tail(&rq->execlist_link, &ring->execlist_queue);
+	spin_unlock_irq(&ring->execlist_lock);
+
 	return q_ret;
 }
 
+void intel_guc_retire_requests(struct intel_engine_cs *ring)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+
+	if (!dev_priv->guc.execbuf_client)
+		return;
+
+	spin_lock_irq(&ring->execlist_lock);
+
+	while (!list_empty(&ring->execlist_queue)) {
+		struct drm_i915_gem_request *request;
+
+		request = list_first_entry(&ring->execlist_queue,
+				struct drm_i915_gem_request,
+				execlist_link);
+
+		if (!i915_gem_request_completed(request, true))
+			break;
+
+		list_del(&request->execlist_link);
+		list_add_tail(&request->execlist_link,
+			&ring->execlist_retired_req_list);
+
+	}
+
+	spin_unlock(&ring->execlist_lock);
+}
+
 /*
  * Everything below here is concerned with setup & teardown, and is
  * therefore not part of the somewhat time-critical batch-submission
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 8c5f82f..4c647b9 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -129,4 +129,6 @@ int i915_guc_submit(struct i915_guc_client *client,
 void i915_guc_submission_disable(struct drm_device *dev);
 void i915_guc_submission_fini(struct drm_device *dev);
 
+void intel_guc_retire_requests(struct intel_engine_cs *ring);
+
 #endif
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 98389bd..05620f3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -566,11 +566,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 	struct drm_i915_gem_request *cursor;
 	int num_elements = 0;
 
-	if (request->ctx != ring->default_context)
-		intel_lr_context_pin(request);
-
-	i915_gem_request_reference(request);
-
 	spin_lock_irq(&ring->execlist_lock);
 
 	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
@@ -732,6 +727,11 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 	if (intel_ring_stopped(ring))
 		return;
 
+	if (request->ctx != ring->default_context)
+		intel_lr_context_pin(request);
+
+	i915_gem_request_reference(request);
+
 	if (dev_priv->guc.execbuf_client)
 		i915_guc_submit(dev_priv->guc.execbuf_client, request);
 	else
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC
  2015-10-21 18:27 ` yu.dai
@ 2015-10-23 21:40   ` Dave Gordon
  2015-10-24  8:52     ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Gordon @ 2015-10-23 21:40 UTC (permalink / raw)
  To: intel-gfx, Dai, Yu

On 21/10/15 19:27, yu.dai@intel.com wrote:
> From: Alex Dai <yu.dai@intel.com>
>
> There is a memory leak warning message from i915_gem_context_clean
> when GuC submission is enabled. The reason is that gem_request (so
> the LRC associated with it) is freed early than moving the vma list
> to inactive.
>
> We are not seeing this in ExecList (non-GuC) mode because the
> gem_request is tracked by execlist_retired_req_list. The management
> of this queue, therefore free of LRC, happens after retire of vma
> list. In this patch, we use the same gem_request management for GuC
> submission. Because the context switch interrupt is handled by
> firmware, intel_guc_retire_requests is introduced to move retired
> gem_request to execlist_retired_req_list then be released later in
> workqueue.
>
> Signed-off-by: Alex Dai <yu.dai@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c            |  1 +
>   drivers/gpu/drm/i915/i915_guc_submission.c | 35 +++++++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_guc.h           |  2 ++
>   drivers/gpu/drm/i915/intel_lrc.c           | 10 ++++-----
>   4 files changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7d6b0c8..6d8a0f1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2879,6 +2879,7 @@ i915_gem_retire_requests(struct drm_device *dev)
>   			idle &= list_empty(&ring->execlist_queue);
>   			spin_unlock_irqrestore(&ring->execlist_lock, flags);
>
> +			intel_guc_retire_requests(ring);
>   			intel_execlists_retire_requests(ring);
>   		}
>   	}
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 737b4f5..a35cfee 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -597,7 +597,8 @@ int i915_guc_submit(struct i915_guc_client *client,
>   		    struct drm_i915_gem_request *rq)
>   {
>   	struct intel_guc *guc = client->guc;
> -	enum intel_ring_id ring_id = rq->ring->id;
> +	struct intel_engine_cs *ring = rq->ring;
> +	enum intel_ring_id ring_id = ring->id;
>   	unsigned long flags;
>   	int q_ret, b_ret;
>
> @@ -628,9 +629,41 @@ int i915_guc_submit(struct i915_guc_client *client,
>   	guc->last_seqno[ring_id] = rq->seqno;
>   	spin_unlock(&guc->host2guc_lock);
>
> +	spin_lock_irq(&ring->execlist_lock);
> +	list_add_tail(&rq->execlist_link, &ring->execlist_queue);
> +	spin_unlock_irq(&ring->execlist_lock);
> +
>   	return q_ret;
>   }
>
> +void intel_guc_retire_requests(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +
> +	if (!dev_priv->guc.execbuf_client)
> +		return;
> +
> +	spin_lock_irq(&ring->execlist_lock);
> +
> +	while (!list_empty(&ring->execlist_queue)) {
> +		struct drm_i915_gem_request *request;
> +
> +		request = list_first_entry(&ring->execlist_queue,
> +				struct drm_i915_gem_request,
> +				execlist_link);
> +
> +		if (!i915_gem_request_completed(request, true))
> +			break;
> +
> +		list_del(&request->execlist_link);
> +		list_add_tail(&request->execlist_link,
> +			&ring->execlist_retired_req_list);
> +
> +	}
> +
> +	spin_unlock(&ring->execlist_lock);
> +}
> +
>   /*
>    * Everything below here is concerned with setup & teardown, and is
>    * therefore not part of the somewhat time-critical batch-submission
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index 8c5f82f..4c647b9 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -129,4 +129,6 @@ int i915_guc_submit(struct i915_guc_client *client,
>   void i915_guc_submission_disable(struct drm_device *dev);
>   void i915_guc_submission_fini(struct drm_device *dev);
>
> +void intel_guc_retire_requests(struct intel_engine_cs *ring);
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 98389bd..05620f3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -566,11 +566,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
>   	struct drm_i915_gem_request *cursor;
>   	int num_elements = 0;
>
> -	if (request->ctx != ring->default_context)
> -		intel_lr_context_pin(request);
> -
> -	i915_gem_request_reference(request);
> -
>   	spin_lock_irq(&ring->execlist_lock);
>
>   	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
> @@ -732,6 +727,11 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
>   	if (intel_ring_stopped(ring))
>   		return;
>
> +	if (request->ctx != ring->default_context)
> +		intel_lr_context_pin(request);
> +
> +	i915_gem_request_reference(request);
> +
>   	if (dev_priv->guc.execbuf_client)
>   		i915_guc_submit(dev_priv->guc.execbuf_client, request);
>   	else

Not entirely happy about this. If an extra ref is needed for both 
execlist and GuC mode (of which I'm not convinced) what about legacy 
ringbuffer mode? I thought execlists needed the extra ref only because 
it added an extra queue to hold work not yet submitted to hardware.
That queue isn't needed in ringbuffer OR GuC mode. OTOH if an extra ref 
is needed in ALL modes it should be in common core code, not replicated 
into all submission paths :)

You've got intel_logical_ring_advance_and_submit() taking the extra 
reference, but each of the execlist and GuC paths adding the request to 
the "execlist_queue" :( And then two separate but very similar functions 
reversing these two operations :(

Surely the VMA should not be freed while there are any outstanding 
(not-yet-retired) requests referring to it? Perhaps 
i915_gem_context_clean() is called in the wrong place (too early?). 
Shouldn't we have waited for all requests to complete and be retired 
BEFORE freeing the context they're using?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC
  2015-10-23 21:40   ` Dave Gordon
@ 2015-10-24  8:52     ` Chris Wilson
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2015-10-24  8:52 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Oct 23, 2015 at 10:40:11PM +0100, Dave Gordon wrote:
> >@@ -732,6 +727,11 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
> >  	if (intel_ring_stopped(ring))
> >  		return;
> >
> >+	if (request->ctx != ring->default_context)
> >+		intel_lr_context_pin(request);
> >+
> >+	i915_gem_request_reference(request);
> >+
> >  	if (dev_priv->guc.execbuf_client)
> >  		i915_guc_submit(dev_priv->guc.execbuf_client, request);
> >  	else
> 
> Not entirely happy about this. If an extra ref is needed for both
> execlist and GuC mode (of which I'm not convinced) what about legacy
> ringbuffer mode? I thought execlists needed the extra ref only
> because it added an extra queue to hold work not yet submitted to
> hardware.
> That queue isn't needed in ringbuffer OR GuC mode. OTOH if an extra
> ref is needed in ALL modes it should be in common core code, not
> replicated into all submission paths :)

Indeed that extra ref here is bogus.
 
> You've got intel_logical_ring_advance_and_submit() taking the extra
> reference, but each of the execlist and GuC paths adding the request
> to the "execlist_queue" :( And then two separate but very similar
> functions reversing these two operations :(
> 
> Surely the VMA should not be freed while there are any outstanding
> (not-yet-retired) requests referring to it? Perhaps
> i915_gem_context_clean() is called in the wrong place (too early?).
> Shouldn't we have waited for all requests to complete and be retired
> BEFORE freeing the context they're using?

The issue is that the context_clean() is being called whilst the
bookkeeping is in an inconsistent state. My preferred solution here is
to revert that bogus patch as it doesn't fix the issue it was purported
to.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-10-19 22:05 [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC yu.dai
  2015-10-20  7:45 ` Daniel Vetter
  2015-10-21 18:27 ` yu.dai
@ 2015-11-20  0:10 ` yu.dai
  2015-11-20  8:31   ` Daniel Vetter
  2 siblings, 1 reply; 14+ messages in thread
From: yu.dai @ 2015-11-20  0:10 UTC (permalink / raw)
  To: intel-gfx

From: Alex Dai <yu.dai@intel.com>

There is a memory leak warning message from i915_gem_context_clean
when GuC submission is enabled. The reason is that when LRC is
released, its ppgtt could be still referenced. The assumption that
all VMAs are unbound during release of LRC is not true.

v1: Move the code inside i915_gem_context_clean() to where ppgtt is
released because it is not cleaning context anyway but ppgtt.

Signed-off-by: Alex Dai <yu.dai@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 24 ------------------------
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 12 ++++++++++++
 2 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 204dc7c..cc5c8e6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -133,23 +133,6 @@ static int get_context_size(struct drm_device *dev)
 	return ret;
 }
 
-static void i915_gem_context_clean(struct intel_context *ctx)
-{
-	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
-	struct i915_vma *vma, *next;
-
-	if (!ppgtt)
-		return;
-
-	WARN_ON(!list_empty(&ppgtt->base.active_list));
-
-	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
-				 mm_list) {
-		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
-			break;
-	}
-}
-
 void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
@@ -159,13 +142,6 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	if (i915.enable_execlists)
 		intel_lr_context_free(ctx);
 
-	/*
-	 * This context is going away and we need to remove all VMAs still
-	 * around. This is to handle imported shared objects for which
-	 * destructor did not run when their handles were closed.
-	 */
-	i915_gem_context_clean(ctx);
-
 	i915_ppgtt_put(ctx->ppgtt);
 
 	if (ctx->legacy_hw_ctx.rcs_state)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 016739e..d36943c 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2214,6 +2214,7 @@ void  i915_ppgtt_release(struct kref *kref)
 {
 	struct i915_hw_ppgtt *ppgtt =
 		container_of(kref, struct i915_hw_ppgtt, ref);
+	struct i915_vma *vma, *next;
 
 	trace_i915_ppgtt_release(&ppgtt->base);
 
@@ -2221,6 +2222,17 @@ void  i915_ppgtt_release(struct kref *kref)
 	WARN_ON(!list_empty(&ppgtt->base.active_list));
 	WARN_ON(!list_empty(&ppgtt->base.inactive_list));
 
+	/*
+	 * This ppgtt is going away and we need to remove all VMAs still
+	 * around. This is to handle imported shared objects for which
+	 * destructor did not run when their handles were closed.
+	 */
+	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
+				 mm_list) {
+		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
+			break;
+	}
+
 	list_del(&ppgtt->base.global_link);
 	drm_mm_takedown(&ppgtt->base.mm);
 
-- 
2.5.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-11-20  0:10 ` [PATCH v1] drm/i915: " yu.dai
@ 2015-11-20  8:31   ` Daniel Vetter
  2015-11-20 18:38     ` Yu Dai
  2015-11-23 10:34     ` Tvrtko Ursulin
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel Vetter @ 2015-11-20  8:31 UTC (permalink / raw)
  To: yu.dai; +Cc: intel-gfx

On Thu, Nov 19, 2015 at 04:10:26PM -0800, yu.dai@intel.com wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> There is a memory leak warning message from i915_gem_context_clean
> when GuC submission is enabled. The reason is that when LRC is
> released, its ppgtt could be still referenced. The assumption that
> all VMAs are unbound during release of LRC is not true.
> 
> v1: Move the code inside i915_gem_context_clean() to where ppgtt is
> released because it is not cleaning context anyway but ppgtt.
> 
> Signed-off-by: Alex Dai <yu.dai@intel.com>

retire__read drops the ctx (and hence ppgtt) reference too early,
resulting in us hitting the WARNING. See the giant thread with Tvrtko,
Chris and me:

http://www.spinics.net/lists/intel-gfx/msg78918.html

Would be great if someone could test the diff I posted in there.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 24 ------------------------
>  drivers/gpu/drm/i915/i915_gem_gtt.c     | 12 ++++++++++++
>  2 files changed, 12 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 204dc7c..cc5c8e6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -133,23 +133,6 @@ static int get_context_size(struct drm_device *dev)
>  	return ret;
>  }
>  
> -static void i915_gem_context_clean(struct intel_context *ctx)
> -{
> -	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
> -	struct i915_vma *vma, *next;
> -
> -	if (!ppgtt)
> -		return;
> -
> -	WARN_ON(!list_empty(&ppgtt->base.active_list));
> -
> -	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
> -				 mm_list) {
> -		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
> -			break;
> -	}
> -}
> -
>  void i915_gem_context_free(struct kref *ctx_ref)
>  {
>  	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
> @@ -159,13 +142,6 @@ void i915_gem_context_free(struct kref *ctx_ref)
>  	if (i915.enable_execlists)
>  		intel_lr_context_free(ctx);
>  
> -	/*
> -	 * This context is going away and we need to remove all VMAs still
> -	 * around. This is to handle imported shared objects for which
> -	 * destructor did not run when their handles were closed.
> -	 */
> -	i915_gem_context_clean(ctx);
> -
>  	i915_ppgtt_put(ctx->ppgtt);
>  
>  	if (ctx->legacy_hw_ctx.rcs_state)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 016739e..d36943c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2214,6 +2214,7 @@ void  i915_ppgtt_release(struct kref *kref)
>  {
>  	struct i915_hw_ppgtt *ppgtt =
>  		container_of(kref, struct i915_hw_ppgtt, ref);
> +	struct i915_vma *vma, *next;
>  
>  	trace_i915_ppgtt_release(&ppgtt->base);
>  
> @@ -2221,6 +2222,17 @@ void  i915_ppgtt_release(struct kref *kref)
>  	WARN_ON(!list_empty(&ppgtt->base.active_list));
>  	WARN_ON(!list_empty(&ppgtt->base.inactive_list));
>  
> +	/*
> +	 * This ppgtt is going away and we need to remove all VMAs still
> +	 * around. This is to handle imported shared objects for which
> +	 * destructor did not run when their handles were closed.
> +	 */
> +	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
> +				 mm_list) {
> +		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
> +			break;
> +	}
> +
>  	list_del(&ppgtt->base.global_link);
>  	drm_mm_takedown(&ppgtt->base.mm);
>  
> -- 
> 2.5.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-11-20  8:31   ` Daniel Vetter
@ 2015-11-20 18:38     ` Yu Dai
  2015-11-23 10:34     ` Tvrtko Ursulin
  1 sibling, 0 replies; 14+ messages in thread
From: Yu Dai @ 2015-11-20 18:38 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



On 11/20/2015 12:31 AM, Daniel Vetter wrote:
> On Thu, Nov 19, 2015 at 04:10:26PM -0800, yu.dai@intel.com wrote:
> > From: Alex Dai <yu.dai@intel.com>
> >
> > There is a memory leak warning message from i915_gem_context_clean
> > when GuC submission is enabled. The reason is that when LRC is
> > released, its ppgtt could be still referenced. The assumption that
> > all VMAs are unbound during release of LRC is not true.
> >
> > v1: Move the code inside i915_gem_context_clean() to where ppgtt is
> > released because it is not cleaning context anyway but ppgtt.
> >
> > Signed-off-by: Alex Dai <yu.dai@intel.com>
>
> retire__read drops the ctx (and hence ppgtt) reference too early,
> resulting in us hitting the WARNING. See the giant thread with Tvrtko,
> Chris and me:
>
> http://www.spinics.net/lists/intel-gfx/msg78918.html
>
> Would be great if someone could test the diff I posted in there.

I have to recall my patch because of calling vma_unbind inside 
ppgtt_release is not right.

> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c | 24 ------------------------
> >  drivers/gpu/drm/i915/i915_gem_gtt.c     | 12 ++++++++++++
> >  2 files changed, 12 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 204dc7c..cc5c8e6 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -133,23 +133,6 @@ static int get_context_size(struct drm_device *dev)
> >  	return ret;
> >  }
> >
> > -static void i915_gem_context_clean(struct intel_context *ctx)
> > -{
> > -	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
> > -	struct i915_vma *vma, *next;
> > -
> > -	if (!ppgtt)
> > -		return;
> > -
> > -	WARN_ON(!list_empty(&ppgtt->base.active_list));
> > -
> > -	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
> > -				 mm_list) {
> > -		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
> > -			break;
> > -	}
> > -}
> > -
> >  void i915_gem_context_free(struct kref *ctx_ref)
> >  {
> >  	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
> > @@ -159,13 +142,6 @@ void i915_gem_context_free(struct kref *ctx_ref)
> >  	if (i915.enable_execlists)
> >  		intel_lr_context_free(ctx);
> >
> > -	/*
> > -	 * This context is going away and we need to remove all VMAs still
> > -	 * around. This is to handle imported shared objects for which
> > -	 * destructor did not run when their handles were closed.
> > -	 */
> > -	i915_gem_context_clean(ctx);
> > -
> >  	i915_ppgtt_put(ctx->ppgtt);
> >
> >  	if (ctx->legacy_hw_ctx.rcs_state)
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 016739e..d36943c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -2214,6 +2214,7 @@ void  i915_ppgtt_release(struct kref *kref)
> >  {
> >  	struct i915_hw_ppgtt *ppgtt =
> >  		container_of(kref, struct i915_hw_ppgtt, ref);
> > +	struct i915_vma *vma, *next;
> >
> >  	trace_i915_ppgtt_release(&ppgtt->base);
> >
> > @@ -2221,6 +2222,17 @@ void  i915_ppgtt_release(struct kref *kref)
> >  	WARN_ON(!list_empty(&ppgtt->base.active_list));
> >  	WARN_ON(!list_empty(&ppgtt->base.inactive_list));
> >
> > +	/*
> > +	 * This ppgtt is going away and we need to remove all VMAs still
> > +	 * around. This is to handle imported shared objects for which
> > +	 * destructor did not run when their handles were closed.
> > +	 */
> > +	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
> > +				 mm_list) {
> > +		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
> > +			break;
> > +	}
> > +
> >  	list_del(&ppgtt->base.global_link);
> >  	drm_mm_takedown(&ppgtt->base.mm);
> >
> > --
> > 2.5.0
> >
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-11-20  8:31   ` Daniel Vetter
  2015-11-20 18:38     ` Yu Dai
@ 2015-11-23 10:34     ` Tvrtko Ursulin
  2015-11-23 22:30       ` Yu Dai
  2015-11-24 10:57       ` Daniel Vetter
  1 sibling, 2 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2015-11-23 10:34 UTC (permalink / raw)
  To: Daniel Vetter, yu.dai; +Cc: intel-gfx


On 20/11/15 08:31, Daniel Vetter wrote:
> On Thu, Nov 19, 2015 at 04:10:26PM -0800, yu.dai@intel.com wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> There is a memory leak warning message from i915_gem_context_clean
>> when GuC submission is enabled. The reason is that when LRC is
>> released, its ppgtt could be still referenced. The assumption that
>> all VMAs are unbound during release of LRC is not true.
>>
>> v1: Move the code inside i915_gem_context_clean() to where ppgtt is
>> released because it is not cleaning context anyway but ppgtt.
>>
>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>
> retire__read drops the ctx (and hence ppgtt) reference too early,
> resulting in us hitting the WARNING. See the giant thread with Tvrtko,
> Chris and me:
>
> http://www.spinics.net/lists/intel-gfx/msg78918.html
>
> Would be great if someone could test the diff I posted in there.

It doesn't work - I have posted my IGT snippet which I thought explained it.

Problem req unreference in obj->active case. When it hits that path it 
will not move the VMA to inactive and the 
intel_execlists_retire_requests will be the last unreference from the 
retire worker which will trigger the WARN.

I posted an IGT which hits that -> 
http://patchwork.freedesktop.org/patch/65369/

And posted one give up on the active VMA mem leak patch -> 
http://patchwork.freedesktop.org/patch/65529/

I have no idea yet of GuC implications, I just spotted this parallel thread.

And Mika has proposed something interesting - that we could just clean 
up the active VMA in context cleanup since we know it is a false one.

However, again I don't know how that interacts with the GuC. Surely it 
cannot be freeing the context with stuff genuinely still active in the GuC?

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-11-23 10:34     ` Tvrtko Ursulin
@ 2015-11-23 22:30       ` Yu Dai
  2015-11-24 10:46         ` Tvrtko Ursulin
  2015-11-24 10:57       ` Daniel Vetter
  1 sibling, 1 reply; 14+ messages in thread
From: Yu Dai @ 2015-11-23 22:30 UTC (permalink / raw)
  To: Tvrtko Ursulin, Daniel Vetter; +Cc: intel-gfx



On 11/23/2015 02:34 AM, Tvrtko Ursulin wrote:
> On 20/11/15 08:31, Daniel Vetter wrote:
> > On Thu, Nov 19, 2015 at 04:10:26PM -0800, yu.dai@intel.com wrote:
> >> From: Alex Dai <yu.dai@intel.com>
> >>
> >> There is a memory leak warning message from i915_gem_context_clean
> >> when GuC submission is enabled. The reason is that when LRC is
> >> released, its ppgtt could be still referenced. The assumption that
> >> all VMAs are unbound during release of LRC is not true.
> >>
> >> v1: Move the code inside i915_gem_context_clean() to where ppgtt is
> >> released because it is not cleaning context anyway but ppgtt.
> >>
> >> Signed-off-by: Alex Dai <yu.dai@intel.com>
> >
> > retire__read drops the ctx (and hence ppgtt) reference too early,
> > resulting in us hitting the WARNING. See the giant thread with Tvrtko,
> > Chris and me:
> >
> > http://www.spinics.net/lists/intel-gfx/msg78918.html
> >
> > Would be great if someone could test the diff I posted in there.
>
> It doesn't work - I have posted my IGT snippet which I thought explained it.

I thought moving the VMA list clean up into i915_ppgtt_release() should 
work. However, it creates a chicken & egg problem. ppgtt_release() rely 
on vma_unbound() to be called first to decrease its refcount. So calling 
vma_unbound() inside ppgtt_release() is not right.
> Problem req unreference in obj->active case. When it hits that path it
> will not move the VMA to inactive and the
> intel_execlists_retire_requests will be the last unreference from the
> retire worker which will trigger the WARN.

I still think the problem comes from the assumption that when lrc is 
released, its all VMAs should be unbound. Precisely I mean the comments 
made for i915_gem_context_clean() - "This context is going away and we 
need to remove all VMAs still around." Really the lrc life cycle is 
different from ppgtt / VMAs. Check the line after 
i915_gem_context_clean(). It is ppgtt_put(). In the case lrc is freed 
early, It won't release ppgtt anyway because it is still referenced by 
VMAs. An it will be freed when no ref of GEM obj.

> I posted an IGT which hits that ->
> http://patchwork.freedesktop.org/patch/65369/
>
> And posted one give up on the active VMA mem leak patch ->
> http://patchwork.freedesktop.org/patch/65529/

This patch will silent the warning. But I think the 
i915_gem_context_clean() itself is unnecessary. I don't see any issue by 
deleting it. The check of VMA list is inside ppgtt_release() and the 
unbound should be aligned to GEM obj's life cycle but not lrc life cycle.
> I have no idea yet of GuC implications, I just spotted this parallel thread.
>
> And Mika has proposed something interesting - that we could just clean
> up the active VMA in context cleanup since we know it is a false one.
>
> However, again I don't know how that interacts with the GuC. Surely it
> cannot be freeing the context with stuff genuinely still active in the GuC?
>

There is no interacts with GuC though. Just very easy to see the warning 
when GuC is enabled, says when run gem_close_race. The reason is that 
GuC does not use the execlist_queue (execlist_retired_req_list) which is 
deferred to retire worker. Same as ring submission mode, when GuC is 
enabled, whenever driver submits a new batch, it will try to release 
previous request. I don't know why  intel_execlists_retire_requests is 
not called for this case. Probably because of the unpin. Deferring the 
retirement may just hide the issue. I bet you will see the warning more 
often if you change i915_gem_retire_requests_ring() to 
i915_gem_retire_requests() in i915_gem_execbuffer_reserve().

Thanks,
Alex
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-11-23 22:30       ` Yu Dai
@ 2015-11-24 10:46         ` Tvrtko Ursulin
  0 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2015-11-24 10:46 UTC (permalink / raw)
  To: Yu Dai, Daniel Vetter; +Cc: intel-gfx


On 23/11/15 22:30, Yu Dai wrote:
> On 11/23/2015 02:34 AM, Tvrtko Ursulin wrote:
>> On 20/11/15 08:31, Daniel Vetter wrote:
>> > On Thu, Nov 19, 2015 at 04:10:26PM -0800, yu.dai@intel.com wrote:
>> >> From: Alex Dai <yu.dai@intel.com>
>> >>
>> >> There is a memory leak warning message from i915_gem_context_clean
>> >> when GuC submission is enabled. The reason is that when LRC is
>> >> released, its ppgtt could be still referenced. The assumption that
>> >> all VMAs are unbound during release of LRC is not true.
>> >>
>> >> v1: Move the code inside i915_gem_context_clean() to where ppgtt is
>> >> released because it is not cleaning context anyway but ppgtt.
>> >>
>> >> Signed-off-by: Alex Dai <yu.dai@intel.com>
>> >
>> > retire__read drops the ctx (and hence ppgtt) reference too early,
>> > resulting in us hitting the WARNING. See the giant thread with Tvrtko,
>> > Chris and me:
>> >
>> > http://www.spinics.net/lists/intel-gfx/msg78918.html
>> >
>> > Would be great if someone could test the diff I posted in there.
>>
>> It doesn't work - I have posted my IGT snippet which I thought
>> explained it.
>
> I thought moving the VMA list clean up into i915_ppgtt_release() should
> work. However, it creates a chicken & egg problem. ppgtt_release() rely
> on vma_unbound() to be called first to decrease its refcount. So calling
> vma_unbound() inside ppgtt_release() is not right.
>> Problem req unreference in obj->active case. When it hits that path it
>> will not move the VMA to inactive and the
>> intel_execlists_retire_requests will be the last unreference from the
>> retire worker which will trigger the WARN.
>
> I still think the problem comes from the assumption that when lrc is
> released, its all VMAs should be unbound. Precisely I mean the comments
> made for i915_gem_context_clean() - "This context is going away and we
> need to remove all VMAs still around." Really the lrc life cycle is
> different from ppgtt / VMAs. Check the line after
> i915_gem_context_clean(). It is ppgtt_put(). In the case lrc is freed
> early, It won't release ppgtt anyway because it is still referenced by
> VMAs. An it will be freed when no ref of GEM obj.

Yes, but the last is just a consequence of how the code was laid out, 
not a hard requirement as far as I understand it.

When context destructor runs that means two things:

1. GPU is done with all VMAs belonging to the VM.
2. USerspace also cannot reach them any more either.

So VMAs have no reason to exist beyond that point. If they do, they can 
be left dangling under certain circumstances. See below.

>> I posted an IGT which hits that ->
>> http://patchwork.freedesktop.org/patch/65369/
>>
>> And posted one give up on the active VMA mem leak patch ->
>> http://patchwork.freedesktop.org/patch/65529/
>
> This patch will silent the warning. But I think the
> i915_gem_context_clean() itself is unnecessary. I don't see any issue by
> deleting it. The check of VMA list is inside ppgtt_release() and the
> unbound should be aligned to GEM obj's life cycle but not lrc life cycle.

i915_gem_context_clean solves a memory leak which is explained in the 
commit.

If there is an extra reference on the GEM obj, like in the flink case, 
VMAs will not get unbound.

Apart from various IGTs, you can also demonstrate this leak with fbcon 
and Xorg. Every time you re-start Xorg one VMA will be leaked since it 
imports the fbcon fb via flink.

Second part of the story is that the WARN in i915_gem_context_clean is 
wrong because of how retirement works. So if we removed the WARN, 
cleanup still has some value to avoid memory leak in the above described 
Xorg case, or in any case where flink is in the picture and VMAs have 
been retired to inactive.

Bug left standing would be that if the VMA happens to be flinked and on 
the active list, because of say being shared between rings and contexts, 
it would still be leaked. But it is less leaking than without the cleanup.

I proposed another way of fixing that: 
http://patchwork.freedesktop.org/patch/65861/

>> I have no idea yet of GuC implications, I just spotted this parallel
>> thread.
>>
>> And Mika has proposed something interesting - that we could just clean
>> up the active VMA in context cleanup since we know it is a false one.
>>
>> However, again I don't know how that interacts with the GuC. Surely it
>> cannot be freeing the context with stuff genuinely still active in the
>> GuC?
>>
>
> There is no interacts with GuC though. Just very easy to see the warning
> when GuC is enabled, says when run gem_close_race. The reason is that
> GuC does not use the execlist_queue (execlist_retired_req_list) which is
> deferred to retire worker. Same as ring submission mode, when GuC is
> enabled, whenever driver submits a new batch, it will try to release
> previous request. I don't know why  intel_execlists_retire_requests is
> not called for this case. Probably because of the unpin. Deferring the
> retirement may just hide the issue. I bet you will see the warning more
> often if you change i915_gem_retire_requests_ring() to
> i915_gem_retire_requests() in i915_gem_execbuffer_reserve().

Hmmm.. gem_close_race uses flink, but why would it trigger context 
destruction with active VMAs? How does the backtrace looks like from the 
context cleanup WARN?

Regards,

Tvrtko





_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-11-23 10:34     ` Tvrtko Ursulin
  2015-11-23 22:30       ` Yu Dai
@ 2015-11-24 10:57       ` Daniel Vetter
  2015-11-24 12:50         ` Chris Wilson
  2015-11-24 12:51         ` Chris Wilson
  1 sibling, 2 replies; 14+ messages in thread
From: Daniel Vetter @ 2015-11-24 10:57 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Nov 23, 2015 at 10:34:59AM +0000, Tvrtko Ursulin wrote:
> 
> On 20/11/15 08:31, Daniel Vetter wrote:
> >On Thu, Nov 19, 2015 at 04:10:26PM -0800, yu.dai@intel.com wrote:
> >>From: Alex Dai <yu.dai@intel.com>
> >>
> >>There is a memory leak warning message from i915_gem_context_clean
> >>when GuC submission is enabled. The reason is that when LRC is
> >>released, its ppgtt could be still referenced. The assumption that
> >>all VMAs are unbound during release of LRC is not true.
> >>
> >>v1: Move the code inside i915_gem_context_clean() to where ppgtt is
> >>released because it is not cleaning context anyway but ppgtt.
> >>
> >>Signed-off-by: Alex Dai <yu.dai@intel.com>
> >
> >retire__read drops the ctx (and hence ppgtt) reference too early,
> >resulting in us hitting the WARNING. See the giant thread with Tvrtko,
> >Chris and me:
> >
> >http://www.spinics.net/lists/intel-gfx/msg78918.html
> >
> >Would be great if someone could test the diff I posted in there.
> 
> It doesn't work - I have posted my IGT snippet which I thought explained it.
> 
> Problem req unreference in obj->active case. When it hits that path it will
> not move the VMA to inactive and the intel_execlists_retire_requests will be
> the last unreference from the retire worker which will trigger the WARN.
> 
> I posted an IGT which hits that ->
> http://patchwork.freedesktop.org/patch/65369/
> 
> And posted one give up on the active VMA mem leak patch ->
> http://patchwork.freedesktop.org/patch/65529/

Ok, I get things now. Fundamentally the problem is that we track active
per-obj, but we want it per-vma. In a way this patch just diggs us deeper
into that hole, which doesn't make me all too happy. Oh well.

I'll pull in the warning removal.
-Daniel

> I have no idea yet of GuC implications, I just spotted this parallel thread.
> 
> And Mika has proposed something interesting - that we could just clean up
> the active VMA in context cleanup since we know it is a false one.
> 
> However, again I don't know how that interacts with the GuC. Surely it
> cannot be freeing the context with stuff genuinely still active in the GuC?
> 
> Regards,
> 
> Tvrtko
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-11-24 10:57       ` Daniel Vetter
@ 2015-11-24 12:50         ` Chris Wilson
  2015-11-24 12:51         ` Chris Wilson
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2015-11-24 12:50 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Nov 24, 2015 at 11:57:22AM +0100, Daniel Vetter wrote:
> On Mon, Nov 23, 2015 at 10:34:59AM +0000, Tvrtko Ursulin wrote:
> > 
> > On 20/11/15 08:31, Daniel Vetter wrote:
> > >On Thu, Nov 19, 2015 at 04:10:26PM -0800, yu.dai@intel.com wrote:
> > >>From: Alex Dai <yu.dai@intel.com>
> > >>
> > >>There is a memory leak warning message from i915_gem_context_clean
> > >>when GuC submission is enabled. The reason is that when LRC is
> > >>released, its ppgtt could be still referenced. The assumption that
> > >>all VMAs are unbound during release of LRC is not true.
> > >>
> > >>v1: Move the code inside i915_gem_context_clean() to where ppgtt is
> > >>released because it is not cleaning context anyway but ppgtt.
> > >>
> > >>Signed-off-by: Alex Dai <yu.dai@intel.com>
> > >
> > >retire__read drops the ctx (and hence ppgtt) reference too early,
> > >resulting in us hitting the WARNING. See the giant thread with Tvrtko,
> > >Chris and me:
> > >
> > >http://www.spinics.net/lists/intel-gfx/msg78918.html
> > >
> > >Would be great if someone could test the diff I posted in there.
> > 
> > It doesn't work - I have posted my IGT snippet which I thought explained it.
> > 
> > Problem req unreference in obj->active case. When it hits that path it will
> > not move the VMA to inactive and the intel_execlists_retire_requests will be
> > the last unreference from the retire worker which will trigger the WARN.
> > 
> > I posted an IGT which hits that ->
> > http://patchwork.freedesktop.org/patch/65369/
> > 
> > And posted one give up on the active VMA mem leak patch ->
> > http://patchwork.freedesktop.org/patch/65529/
> 
> Ok, I get things now. Fundamentally the problem is that we track active
> per-obj, but we want it per-vma. In a way this patch just diggs us deeper
> into that hole, which doesn't make me all too happy. Oh well.
> 
> I'll pull in the warning removal.

Just revert the incomplete patch, rather than the warning that the
patch is suspect.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1] drm/i915: Fix a false alert of memory leak when free LRC
  2015-11-24 10:57       ` Daniel Vetter
  2015-11-24 12:50         ` Chris Wilson
@ 2015-11-24 12:51         ` Chris Wilson
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2015-11-24 12:51 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Nov 24, 2015 at 11:57:22AM +0100, Daniel Vetter wrote:
> Ok, I get things now. Fundamentally the problem is that we track active
> per-obj, but we want it per-vma. In a way this patch just diggs us deeper
> into that hole, which doesn't make me all too happy. Oh well.

Why not, I've been chasing reviewers for vma tracking for the last 15
months.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-11-24 12:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-19 22:05 [PATCH] drm/i915/guc: Fix a false alert of memory leak when free LRC yu.dai
2015-10-20  7:45 ` Daniel Vetter
2015-10-21 18:27 ` yu.dai
2015-10-23 21:40   ` Dave Gordon
2015-10-24  8:52     ` Chris Wilson
2015-11-20  0:10 ` [PATCH v1] drm/i915: " yu.dai
2015-11-20  8:31   ` Daniel Vetter
2015-11-20 18:38     ` Yu Dai
2015-11-23 10:34     ` Tvrtko Ursulin
2015-11-23 22:30       ` Yu Dai
2015-11-24 10:46         ` Tvrtko Ursulin
2015-11-24 10:57       ` Daniel Vetter
2015-11-24 12:50         ` Chris Wilson
2015-11-24 12:51         ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.