All of lore.kernel.org
* [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
@ 2015-10-26 11:05 Tvrtko Ursulin
  2015-10-26 11:23 ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-10-26 11:05 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In the following commit:

    commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
    Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Date:   Mon Oct 5 13:26:36 2015 +0100

        drm/i915: Clean up associated VMAs on context destruction

I added a WARN_ON assertion that the VM's active list must be
empty at the time the owning context is freed, but that turned
out to be a wrong assumption.

Due to the ordering of operations in i915_gem_object_retire__read,
where contexts are unreferenced before VMAs are moved to the
inactive list, the described situation can in fact happen.

It feels wrong to do things in that order, so this fix makes sure
a reference to the context is held until the move to the inactive
list is completed.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9b2048c7077d..6cbe3fdbca96 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2373,19 +2373,27 @@ static void
 i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
 	struct i915_vma *vma;
+	struct intel_context *ctx;
 
 	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
 	RQ_BUG_ON(!(obj->active & (1 << ring)));
 
 	list_del_init(&obj->ring_list[ring]);
+
+	/* Ensure context cannot be destroyed with VMAs on the active list. */
+	ctx = obj->last_read_req[ring]->ctx;
+	i915_gem_context_reference(ctx);
+
 	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 
 	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
 		i915_gem_object_retire__write(obj);
 
 	obj->active &= ~(1 << ring);
-	if (obj->active)
+	if (obj->active) {
+		i915_gem_context_unreference(ctx);
 		return;
+	}
 
 	/* Bump our place on the bound list to keep it roughly in LRU order
 	 * so that we don't steal from recently used but inactive objects
@@ -2399,6 +2407,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
 	}
 
+	i915_gem_context_unreference(ctx);
+
 	i915_gem_request_assign(&obj->last_fenced_req, NULL);
 	drm_gem_object_unreference(&obj->base);
 }
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-10-26 11:05 [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed Tvrtko Ursulin
@ 2015-10-26 11:23 ` Chris Wilson
  2015-10-26 12:00   ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2015-10-26 11:23 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> In the following commit:
> 
>     commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>     Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Date:   Mon Oct 5 13:26:36 2015 +0100
> 
>         drm/i915: Clean up associated VMAs on context destruction
> 
> I added a WARN_ON assertion that VM's active list must be empty
> at the time of owning context is getting freed, but that turned
> out to be a wrong assumption.
> 
> Due ordering of operations in i915_gem_object_retire__read, where
> contexts are unreferenced before VMAs are moved to the inactive
> list, the described situation can in fact happen.

The context is being unreferenced indirectly. Adding a direct reference
here is even more bizarre.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-10-26 11:23 ` Chris Wilson
@ 2015-10-26 12:00   ` Tvrtko Ursulin
  2015-10-26 12:10     ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-10-26 12:00 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin, Michel Thierry


On 26/10/15 11:23, Chris Wilson wrote:
> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> In the following commit:
>>
>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>      Date:   Mon Oct 5 13:26:36 2015 +0100
>>
>>          drm/i915: Clean up associated VMAs on context destruction
>>
>> I added a WARN_ON assertion that VM's active list must be empty
>> at the time of owning context is getting freed, but that turned
>> out to be a wrong assumption.
>>
>> Due ordering of operations in i915_gem_object_retire__read, where
>> contexts are unreferenced before VMAs are moved to the inactive
>> list, the described situation can in fact happen.
>
> The context is being unreferenced indirectly. Adding a direct reference
> here is even more bizarre.

Perhaps it is not the prettiest, but it sounds logical to me to ensure
that the destruction order of the object hierarchy involved goes from
the bottom up and is not interleaved.

If you consider the active/inactive list position to be part of the
retire process, then doing this at the very place in the code, and on
the very object, that looked to be destroyed out of sequence sounded
logical to me.

How would you do it? Can you think of a better way?

Regards,

Tvrtko

* Re: [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-10-26 12:00   ` Tvrtko Ursulin
@ 2015-10-26 12:10     ` Chris Wilson
  2015-10-26 13:10       ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2015-10-26 12:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
> 
> On 26/10/15 11:23, Chris Wilson wrote:
> >On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>In the following commit:
> >>
> >>     commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
> >>     Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>     Date:   Mon Oct 5 13:26:36 2015 +0100
> >>
> >>         drm/i915: Clean up associated VMAs on context destruction
> >>
> >>I added a WARN_ON assertion that VM's active list must be empty
> >>at the time of owning context is getting freed, but that turned
> >>out to be a wrong assumption.
> >>
> >>Due ordering of operations in i915_gem_object_retire__read, where
> >>contexts are unreferenced before VMAs are moved to the inactive
> >>list, the described situation can in fact happen.
> >
> >The context is being unreferenced indirectly. Adding a direct reference
> >here is even more bizarre.
> 
> Perhaps is not the prettiest, but it sounds logical to me to ensure
> that order of destruction of involved object hierarchy goes from the
> bottom-up and is not interleaved.
> 
> If you consider the active/inactive list position as part of the
> retire process, doing it at the very place in code, and the very
> object that looked to be destroyed out of sequence, to me sounded
> logical.
> 
> How would you do it, can you think of a better way?

The reference is via the request. Since we are handling requests, it
makes more sense for you to take the reference on the request.

I would just revert the patch; it doesn't fix the problem you tried to
solve and just adds more.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-10-26 12:10     ` Chris Wilson
@ 2015-10-26 13:10       ` Tvrtko Ursulin
  2015-11-03 10:48         ` Tvrtko Ursulin
  2015-11-03 10:55         ` Chris Wilson
  0 siblings, 2 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-10-26 13:10 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin, Michel Thierry


On 26/10/15 12:10, Chris Wilson wrote:
> On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
>>
>> On 26/10/15 11:23, Chris Wilson wrote:
>>> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> In the following commit:
>>>>
>>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>
>>>>          drm/i915: Clean up associated VMAs on context destruction
>>>>
>>>> I added a WARN_ON assertion that VM's active list must be empty
>>>> at the time of owning context is getting freed, but that turned
>>>> out to be a wrong assumption.
>>>>
>>>> Due ordering of operations in i915_gem_object_retire__read, where
>>>> contexts are unreferenced before VMAs are moved to the inactive
>>>> list, the described situation can in fact happen.
>>>
>>> The context is being unreferenced indirectly. Adding a direct reference
>>> here is even more bizarre.
>>
>> Perhaps is not the prettiest, but it sounds logical to me to ensure
>> that order of destruction of involved object hierarchy goes from the
>> bottom-up and is not interleaved.
>>
>> If you consider the active/inactive list position as part of the
>> retire process, doing it at the very place in code, and the very
>> object that looked to be destroyed out of sequence, to me sounded
>> logical.
>>
>> How would you do it, can you think of a better way?
> 
> The reference is via the request. We are handling requests, it makes
> more sense that you take the reference on the request.

Hm, so you would be happy with:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9b2048c7077d..c238481a8090 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2373,19 +2373,26 @@ static void
 i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
        struct i915_vma *vma;
+       struct drm_i915_gem_request *req;
 
        RQ_BUG_ON(obj->last_read_req[ring] == NULL);
        RQ_BUG_ON(!(obj->active & (1 << ring)));
 
        list_del_init(&obj->ring_list[ring]);
+
+       /* Ensure context cannot be destroyed with VMAs on the active list. */
+       req = i915_gem_request_reference(obj->last_read_req[ring]);
+
        i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 
        if (obj->last_write_req && obj->last_write_req->ring->id == ring)
                i915_gem_object_retire__write(obj);
 
        obj->active &= ~(1 << ring);
-       if (obj->active)
+       if (obj->active) {
+               i915_gem_request_unreference(req);
                return;
+       }
 
        /* Bump our place on the bound list to keep it roughly in LRU order
         * so that we don't steal from recently used but inactive objects
@@ -2399,6 +2406,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
                        list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
        }
 
+       i915_gem_request_unreference(req);
+
        i915_gem_request_assign(&obj->last_fenced_req, NULL);
        drm_gem_object_unreference(&obj->base);
 }
 
> I would just revert the patch, it doesn't fix the problem you tried to
> solve and just adds more.

It solves one problem, just not all of them.

Regards,

Tvrtko



* Re: [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-10-26 13:10       ` Tvrtko Ursulin
@ 2015-11-03 10:48         ` Tvrtko Ursulin
  2015-11-03 10:55         ` Chris Wilson
  1 sibling, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-03 10:48 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin, Michel Thierry


On 26/10/15 13:10, Tvrtko Ursulin wrote:
>
> On 26/10/15 12:10, Chris Wilson wrote:
>> On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 26/10/15 11:23, Chris Wilson wrote:
>>>> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> In the following commit:
>>>>>
>>>>>       commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>       Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>       Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>
>>>>>           drm/i915: Clean up associated VMAs on context destruction
>>>>>
>>>>> I added a WARN_ON assertion that VM's active list must be empty
>>>>> at the time of owning context is getting freed, but that turned
>>>>> out to be a wrong assumption.
>>>>>
>>>>> Due ordering of operations in i915_gem_object_retire__read, where
>>>>> contexts are unreferenced before VMAs are moved to the inactive
>>>>> list, the described situation can in fact happen.
>>>>
>>>> The context is being unreferenced indirectly. Adding a direct reference
>>>> here is even more bizarre.
>>>
>>> Perhaps is not the prettiest, but it sounds logical to me to ensure
>>> that order of destruction of involved object hierarchy goes from the
>>> bottom-up and is not interleaved.
>>>
>>> If you consider the active/inactive list position as part of the
>>> retire process, doing it at the very place in code, and the very
>>> object that looked to be destroyed out of sequence, to me sounded
>>> logical.
>>>
>>> How would you do it, can you think of a better way?
>>
>> The reference is via the request. We are handling requests, it makes
>> more sense that you take the reference on the request.
>
> Hm, so you would be happy with:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9b2048c7077d..c238481a8090 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2373,19 +2373,26 @@ static void
>   i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   {
>          struct i915_vma *vma;
> +       struct drm_i915_gem_request *req;
>
>          RQ_BUG_ON(obj->last_read_req[ring] == NULL);
>          RQ_BUG_ON(!(obj->active & (1 << ring)));
>
>          list_del_init(&obj->ring_list[ring]);
> +
> +       /* Ensure context cannot be destroyed with VMAs on the active list. */
> +       req = i915_gem_request_reference(obj->last_read_req[ring]);
> +
>          i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>
>          if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>                  i915_gem_object_retire__write(obj);
>
>          obj->active &= ~(1 << ring);
> -       if (obj->active)
> +       if (obj->active) {
> +               i915_gem_request_unreference(req);
>                  return;
> +       }
>
>          /* Bump our place on the bound list to keep it roughly in LRU order
>           * so that we don't steal from recently used but inactive objects
> @@ -2399,6 +2406,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>                          list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
>          }
>
> +       i915_gem_request_unreference(req);
> +
>          i915_gem_request_assign(&obj->last_fenced_req, NULL);
>          drm_gem_object_unreference(&obj->base);
>   }
>


Ping on this?

Regards,

Tvrtko

* Re: [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-10-26 13:10       ` Tvrtko Ursulin
  2015-11-03 10:48         ` Tvrtko Ursulin
@ 2015-11-03 10:55         ` Chris Wilson
  2015-11-03 11:08           ` Tvrtko Ursulin
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2015-11-03 10:55 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Oct 26, 2015 at 01:10:19PM +0000, Tvrtko Ursulin wrote:
> 
> On 26/10/15 12:10, Chris Wilson wrote:
> > On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
> >>
> >> On 26/10/15 11:23, Chris Wilson wrote:
> >>> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> In the following commit:
> >>>>
> >>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
> >>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
> >>>>
> >>>>          drm/i915: Clean up associated VMAs on context destruction
> >>>>
> >>>> I added a WARN_ON assertion that VM's active list must be empty
> >>>> at the time of owning context is getting freed, but that turned
> >>>> out to be a wrong assumption.
> >>>>
> >>>> Due ordering of operations in i915_gem_object_retire__read, where
> >>>> contexts are unreferenced before VMAs are moved to the inactive
> >>>> list, the described situation can in fact happen.
> >>>
> >>> The context is being unreferenced indirectly. Adding a direct reference
> >>> here is even more bizarre.
> >>
> >> Perhaps is not the prettiest, but it sounds logical to me to ensure
> >> that order of destruction of involved object hierarchy goes from the
> >> bottom-up and is not interleaved.
> >>
> >> If you consider the active/inactive list position as part of the
> >> retire process, doing it at the very place in code, and the very
> >> object that looked to be destroyed out of sequence, to me sounded
> >> logical.
> >>
> >> How would you do it, can you think of a better way?
> > 
> > The reference is via the request. We are handling requests, it makes
> > more sense that you take the reference on the request.
> 
> Hm, so you would be happy with:

Go up another level. There is just one callsite where the reference
needs to be added across the call.

And no, I would not be happy, as I see this as just further increasing
the technical debt.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-03 10:55         ` Chris Wilson
@ 2015-11-03 11:08           ` Tvrtko Ursulin
  2015-11-17 15:53             ` [PATCH v2] " Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-03 11:08 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin, Michel Thierry


On 03/11/15 10:55, Chris Wilson wrote:
> On Mon, Oct 26, 2015 at 01:10:19PM +0000, Tvrtko Ursulin wrote:
>>
>> On 26/10/15 12:10, Chris Wilson wrote:
>>> On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 26/10/15 11:23, Chris Wilson wrote:
>>>>> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> In the following commit:
>>>>>>
>>>>>>       commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>>       Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>       Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>>
>>>>>>           drm/i915: Clean up associated VMAs on context destruction
>>>>>>
>>>>>> I added a WARN_ON assertion that VM's active list must be empty
>>>>>> at the time of owning context is getting freed, but that turned
>>>>>> out to be a wrong assumption.
>>>>>>
>>>>>> Due ordering of operations in i915_gem_object_retire__read, where
>>>>>> contexts are unreferenced before VMAs are moved to the inactive
>>>>>> list, the described situation can in fact happen.
>>>>>
>>>>> The context is being unreferenced indirectly. Adding a direct reference
>>>>> here is even more bizarre.
>>>>
>>>> Perhaps is not the prettiest, but it sounds logical to me to ensure
>>>> that order of destruction of involved object hierarchy goes from the
>>>> bottom-up and is not interleaved.
>>>>
>>>> If you consider the active/inactive list position as part of the
>>>> retire process, doing it at the very place in code, and the very
>>>> object that looked to be destroyed out of sequence, to me sounded
>>>> logical.
>>>>
>>>> How would you do it, can you think of a better way?
>>>
>>> The reference is via the request. We are handling requests, it makes
>>> more sense that you take the reference on the request.
>>
>> Hm, so you would be happy with:
>
> Go up another level. There is just one callsite where the reference
> needs to be added across the call.

i915_gem_retire_requests_ring? Why do you think that is more logical?

To me it sounds really clean to do it in the place which deals with
moving VMAs to the inactive list. It is localized and clear then that
it is fixing the illogical step of allowing the context destructor to
run with VMAs still on the active list.

> And no, I would not be happy as I see this as just futher increasing the
> technical debt.

I thought we had agreed it is better to fix up what we have quickly,
to the extent that is feasible, and work towards the rewrite over time.

Regards,

Tvrtko

* [PATCH v2] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-03 11:08           ` Tvrtko Ursulin
@ 2015-11-17 15:53             ` Tvrtko Ursulin
  2015-11-17 16:04               ` Chris Wilson
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-17 15:53 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In the following commit:

    commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
    Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Date:   Mon Oct 5 13:26:36 2015 +0100

        drm/i915: Clean up associated VMAs on context destruction

I added a WARN_ON assertion that the VM's active list must be
empty at the time the owning context is freed, but that turned
out to be a wrong assumption.

Due to the ordering of operations in i915_gem_object_retire__read,
where contexts are unreferenced before VMAs are moved to the
inactive list, the described situation can in fact happen.

It feels wrong to do things in that order, so this fix makes sure
a reference to the context is held until the move to the inactive
list is completed.

v2: Rather than holding a temporary context reference, move the
    request unreference to be the last operation. (Daniel Vetter)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 98c83286ab68..e2248601e997 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2404,29 +2404,31 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 	RQ_BUG_ON(!(obj->active & (1 << ring)));
 
 	list_del_init(&obj->ring_list[ring]);
-	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 
 	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
 		i915_gem_object_retire__write(obj);
 
 	obj->active &= ~(1 << ring);
-	if (obj->active)
-		return;
 
-	/* Bump our place on the bound list to keep it roughly in LRU order
-	 * so that we don't steal from recently used but inactive objects
-	 * (unless we are forced to ofc!)
-	 */
-	list_move_tail(&obj->global_list,
-		       &to_i915(obj->base.dev)->mm.bound_list);
+	if (!obj->active) {
+		/* Bump our place on the bound list to keep it roughly in LRU order
+		* so that we don't steal from recently used but inactive objects
+		* (unless we are forced to ofc!)
+		*/
+		list_move_tail(&obj->global_list,
+			&to_i915(obj->base.dev)->mm.bound_list);
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (!list_empty(&vma->mm_list))
-			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
+		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+			if (!list_empty(&vma->mm_list))
+				list_move_tail(&vma->mm_list,
+					       &vma->vm->inactive_list);
+		}
+
+		i915_gem_request_assign(&obj->last_fenced_req, NULL);
+		drm_gem_object_unreference(&obj->base);
 	}
 
-	i915_gem_request_assign(&obj->last_fenced_req, NULL);
-	drm_gem_object_unreference(&obj->base);
+	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 }
 
 static int
-- 
1.9.1

* Re: [PATCH v2] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 15:53             ` [PATCH v2] " Tvrtko Ursulin
@ 2015-11-17 16:04               ` Chris Wilson
  2015-11-17 16:27                 ` [PATCH v3] " Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Wilson @ 2015-11-17 16:04 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Tue, Nov 17, 2015 at 03:53:24PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> In the following commit:
> 
>     commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>     Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Date:   Mon Oct 5 13:26:36 2015 +0100
> 
>         drm/i915: Clean up associated VMAs on context destruction
> 
> I added a WARN_ON assertion that VM's active list must be empty
> at the time of owning context is getting freed, but that turned
> out to be a wrong assumption.
> 
> Due ordering of operations in i915_gem_object_retire__read, where
> contexts are unreferenced before VMAs are moved to the inactive
> list, the described situation can in fact happen.
> 
> It feels wrong to do things in such order so this fix makes sure
> a reference to context is held until the move to inactive list
> is completed.
> 
> v2: Rather than hold a temporary context reference move the
>     request unreference to be the last operation. (Daniel Vetter)

Because that is a use-after-free.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 16:04               ` Chris Wilson
@ 2015-11-17 16:27                 ` Tvrtko Ursulin
  2015-11-17 16:39                   ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-17 16:27 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In the following commit:

    commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
    Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Date:   Mon Oct 5 13:26:36 2015 +0100

        drm/i915: Clean up associated VMAs on context destruction

I added a WARN_ON assertion that the VM's active list must be
empty at the time the owning context is freed, but that turned
out to be a wrong assumption.

Due to the ordering of operations in i915_gem_object_retire__read,
where contexts are unreferenced before VMAs are moved to the
inactive list, the described situation can in fact happen.

It feels wrong to do things in that order, so this fix makes sure
a reference to the context is held until the move to the inactive
list is completed.

v2: Rather than holding a temporary context reference, move the
    request unreference to be the last operation. (Daniel Vetter)

v3: Fix a use-after-free. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 98c83286ab68..094ac17a712d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 	RQ_BUG_ON(!(obj->active & (1 << ring)));
 
 	list_del_init(&obj->ring_list[ring]);
-	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 
 	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
 		i915_gem_object_retire__write(obj);
 
 	obj->active &= ~(1 << ring);
-	if (obj->active)
-		return;
 
-	/* Bump our place on the bound list to keep it roughly in LRU order
-	 * so that we don't steal from recently used but inactive objects
-	 * (unless we are forced to ofc!)
-	 */
-	list_move_tail(&obj->global_list,
-		       &to_i915(obj->base.dev)->mm.bound_list);
+	if (!obj->active) {
+		/* Bump our place on the bound list to keep it roughly in LRU order
+		* so that we don't steal from recently used but inactive objects
+		* (unless we are forced to ofc!)
+		*/
+		list_move_tail(&obj->global_list,
+			&to_i915(obj->base.dev)->mm.bound_list);
 
-	list_for_each_entry(vma, &obj->vma_list, vma_link) {
-		if (!list_empty(&vma->mm_list))
-			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
-	}
+		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+			if (!list_empty(&vma->mm_list))
+				list_move_tail(&vma->mm_list,
+					       &vma->vm->inactive_list);
+		}
 
-	i915_gem_request_assign(&obj->last_fenced_req, NULL);
-	drm_gem_object_unreference(&obj->base);
+		i915_gem_request_assign(&obj->last_fenced_req, NULL);
+		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
+		drm_gem_object_unreference(&obj->base);
+	} else {
+		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
+	}
 }
 
 static int
-- 
1.9.1


* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 16:27                 ` [PATCH v3] " Tvrtko Ursulin
@ 2015-11-17 16:39                   ` Daniel Vetter
  2015-11-17 16:54                     ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2015-11-17 16:39 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> In the following commit:
> 
>     commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>     Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Date:   Mon Oct 5 13:26:36 2015 +0100
> 
>         drm/i915: Clean up associated VMAs on context destruction
> 
> I added a WARN_ON assertion that VM's active list must be empty
> at the time of owning context is getting freed, but that turned
> out to be a wrong assumption.
> 
> Due ordering of operations in i915_gem_object_retire__read, where
> contexts are unreferenced before VMAs are moved to the inactive
> list, the described situation can in fact happen.
> 
> It feels wrong to do things in such order so this fix makes sure
> a reference to context is held until the move to inactive list
> is completed.
> 
> v2: Rather than hold a temporary context reference move the
>     request unreference to be the last operation. (Daniel Vetter)
> 
> v3: Fix use after free. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
>  1 file changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 98c83286ab68..094ac17a712d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>  	RQ_BUG_ON(!(obj->active & (1 << ring)));
>  
>  	list_del_init(&obj->ring_list[ring]);
> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>  
>  	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>  		i915_gem_object_retire__write(obj);
>  
>  	obj->active &= ~(1 << ring);
> -	if (obj->active)
> -		return;

	if (obj->active) {
		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
		return;
	}

This would result in less churn in the code and drop the unnecessary
indent level. A comment is also missing as to why we need to do things
in a specific order.
-Daniel

>  
> -	/* Bump our place on the bound list to keep it roughly in LRU order
> -	 * so that we don't steal from recently used but inactive objects
> -	 * (unless we are forced to ofc!)
> -	 */
> -	list_move_tail(&obj->global_list,
> -		       &to_i915(obj->base.dev)->mm.bound_list);
> +	if (!obj->active) {
> +		/* Bump our place on the bound list to keep it roughly in LRU order
> +		* so that we don't steal from recently used but inactive objects
> +		* (unless we are forced to ofc!)
> +		*/
> +		list_move_tail(&obj->global_list,
> +			&to_i915(obj->base.dev)->mm.bound_list);
>  
> -	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> -		if (!list_empty(&vma->mm_list))
> -			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
> -	}
> +		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +			if (!list_empty(&vma->mm_list))
> +				list_move_tail(&vma->mm_list,
> +					       &vma->vm->inactive_list);
> +		}
>  
> -	i915_gem_request_assign(&obj->last_fenced_req, NULL);
> -	drm_gem_object_unreference(&obj->base);
> +		i915_gem_request_assign(&obj->last_fenced_req, NULL);
> +		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> +		drm_gem_object_unreference(&obj->base);
> +	} else {
> +		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> +	}
>  }
>  
>  static int
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 16:39                   ` Daniel Vetter
@ 2015-11-17 16:54                     ` Tvrtko Ursulin
  2015-11-17 17:08                       ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-17 16:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx


On 17/11/15 16:39, Daniel Vetter wrote:
> On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> In the following commit:
>>
>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>      Date:   Mon Oct 5 13:26:36 2015 +0100
>>
>>          drm/i915: Clean up associated VMAs on context destruction
>>
>> I added a WARN_ON assertion that VM's active list must be empty
>> at the time of owning context is getting freed, but that turned
>> out to be a wrong assumption.
>>
>> Due ordering of operations in i915_gem_object_retire__read, where
>> contexts are unreferenced before VMAs are moved to the inactive
>> list, the described situation can in fact happen.
>>
>> It feels wrong to do things in such order so this fix makes sure
>> a reference to context is held until the move to inactive list
>> is completed.
>>
>> v2: Rather than hold a temporary context reference move the
>>      request unreference to be the last operation. (Daniel Vetter)
>>
>> v3: Fix use after free. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
>> Cc: Michel Thierry <michel.thierry@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>   drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
>>   1 file changed, 18 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 98c83286ab68..094ac17a712d 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>>   	RQ_BUG_ON(!(obj->active & (1 << ring)));
>>
>>   	list_del_init(&obj->ring_list[ring]);
>> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>
>>   	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>>   		i915_gem_object_retire__write(obj);
>>
>>   	obj->active &= ~(1 << ring);
>> -	if (obj->active)
>> -		return;
>
> 	if (obj->active) {
> 		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> 		return;
> 	}
>
> Would result in less churn in the code and drop the unecessary indent
> level. Also comment is missing as to why we need to do things in a
> specific order.

Actually, I think I have changed my mind and that v1 is the way to go.

Just re-ordering the code here still makes it possible, I think, for the
context destructor to run with VMAs on the active list.

If we hold a reference to the context then it is 100% clear that is not
possible.

Regards,

Tvrtko

* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 16:54                     ` Tvrtko Ursulin
@ 2015-11-17 17:08                       ` Daniel Vetter
  2015-11-17 17:24                         ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2015-11-17 17:08 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
> 
> On 17/11/15 16:39, Daniel Vetter wrote:
> >On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>In the following commit:
> >>
> >>     commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
> >>     Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>     Date:   Mon Oct 5 13:26:36 2015 +0100
> >>
> >>         drm/i915: Clean up associated VMAs on context destruction
> >>
> >>I added a WARN_ON assertion that VM's active list must be empty
> >>at the time of owning context is getting freed, but that turned
> >>out to be a wrong assumption.
> >>
> >>Due ordering of operations in i915_gem_object_retire__read, where
> >>contexts are unreferenced before VMAs are moved to the inactive
> >>list, the described situation can in fact happen.
> >>
> >>It feels wrong to do things in such order so this fix makes sure
> >>a reference to context is held until the move to inactive list
> >>is completed.
> >>
> >>v2: Rather than hold a temporary context reference move the
> >>     request unreference to be the last operation. (Daniel Vetter)
> >>
> >>v3: Fix use after free. (Chris Wilson)
> >>
> >>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
> >>Cc: Michel Thierry <michel.thierry@intel.com>
> >>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>---
> >>  drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
> >>  1 file changed, 18 insertions(+), 15 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>index 98c83286ab68..094ac17a712d 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>@@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
> >>  	RQ_BUG_ON(!(obj->active & (1 << ring)));
> >>
> >>  	list_del_init(&obj->ring_list[ring]);
> >>-	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>
> >>  	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
> >>  		i915_gem_object_retire__write(obj);
> >>
> >>  	obj->active &= ~(1 << ring);
> >>-	if (obj->active)
> >>-		return;
> >
> >	if (obj->active) {
> >		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >		return;
> >	}
> >
> >Would result in less churn in the code and drop the unecessary indent
> >level. Also comment is missing as to why we need to do things in a
> >specific order.
> 
> Actually I think I changed my mind and that v1 is the way to go.
> 
> Just re-ordering the code here still makes it possible for the context
> destructor to run with VMAs on the active list I think.
> 
> If we hold the context then it is 100% clear it is not possible.

request_assign _is_ the function which adjusts the refcounts for us, which
means if we drop that reference too early then grabbing a temp reference
is just papering over the real bug.

Written out, your patch looks something like

	a_reference(a);
	a_unreference(a);

	/* more cleanup code that should get run before a_unreference but isn't */

	a_unreference(a); /* for real this time */

Unfortunately foo_assign is a new pattern and not well-established, so
that connection isn't clear. Maybe we should rename it to
foo_reference_assign to make it more obvious. Or just drop the pretense
and open-code it since we unconditionally assign NULL as the new pointer
value, and we know the current value of the pointer is non-NULL. So
there's really no benefit to the helper here, it only obfuscates. And
since that obfuscation tripped you up it's time to remove it ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 17:08                       ` Daniel Vetter
@ 2015-11-17 17:24                         ` Tvrtko Ursulin
  2015-11-17 17:32                           ` Tvrtko Ursulin
  2015-11-17 17:56                           ` Daniel Vetter
  0 siblings, 2 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-17 17:24 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx


On 17/11/15 17:08, Daniel Vetter wrote:
> On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
>>
>> On 17/11/15 16:39, Daniel Vetter wrote:
>>> On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> In the following commit:
>>>>
>>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>
>>>>          drm/i915: Clean up associated VMAs on context destruction
>>>>
>>>> I added a WARN_ON assertion that VM's active list must be empty
>>>> at the time of owning context is getting freed, but that turned
>>>> out to be a wrong assumption.
>>>>
>>>> Due ordering of operations in i915_gem_object_retire__read, where
>>>> contexts are unreferenced before VMAs are moved to the inactive
>>>> list, the described situation can in fact happen.
>>>>
>>>> It feels wrong to do things in such order so this fix makes sure
>>>> a reference to context is held until the move to inactive list
>>>> is completed.
>>>>
>>>> v2: Rather than hold a temporary context reference move the
>>>>      request unreference to be the last operation. (Daniel Vetter)
>>>>
>>>> v3: Fix use after free. (Chris Wilson)
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
>>>> Cc: Michel Thierry <michel.thierry@intel.com>
>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
>>>>   1 file changed, 18 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index 98c83286ab68..094ac17a712d 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>>>>   	RQ_BUG_ON(!(obj->active & (1 << ring)));
>>>>
>>>>   	list_del_init(&obj->ring_list[ring]);
>>>> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>
>>>>   	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>>>>   		i915_gem_object_retire__write(obj);
>>>>
>>>>   	obj->active &= ~(1 << ring);
>>>> -	if (obj->active)
>>>> -		return;
>>>
>>> 	if (obj->active) {
>>> 		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>> 		return;
>>> 	}
>>>
>>> Would result in less churn in the code and drop the unecessary indent
>>> level. Also comment is missing as to why we need to do things in a
>>> specific order.
>>
>> Actually I think I changed my mind and that v1 is the way to go.
>>
>> Just re-ordering the code here still makes it possible for the context
>> destructor to run with VMAs on the active list I think.
>>
>> If we hold the context then it is 100% clear it is not possible.
>
> request_assign _is_ the function which adjust the refcounts for us, which
> means if we drop that reference too early then grabbing a temp reference
> is just papering over the real bug.
>
> Written out your patch looks something like
>
> 	a_reference(a);
> 	a_unreference(a);
>
> 	/* more cleanup code that should get run before a_unreference but isn't */
>
> 	a_unrefernce(a); /* for real this time */
>
> Unfortunately foo_assign is a new pattern and not well-established, so
> that connection isn't clear. Maybe we should rename it to
> foo_reference_assign to make it more obvious. Or just drop the pretense
> and open-code it since we unconditionally assign NULL as the new pointer
> value, and we know the current value of the pointer is non-NULL. So
> there's really no benefit to the helper here, it only obfuscates. And
> since that obfuscation tripped you up it's time to remove it ;-)

Then foo_reference_unreference_assign. :)

But seriously, I think it is more complicated than that...

The thing it trips over is that moving VMAs to inactive does not 
correspond in time to request retirement. In fact, VMAs are moved to 
inactive only when all requests associated with an object are done.

This is the unintuitive thing I was working around: making sure that 
when the context destructor runs there are no active VMAs for that VM.

I don't know how to guarantee that with what you propose. Perhaps I am 
missing something?

Regards,

Tvrtko



* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 17:24                         ` Tvrtko Ursulin
@ 2015-11-17 17:32                           ` Tvrtko Ursulin
  2015-11-17 17:34                             ` Tvrtko Ursulin
  2015-11-17 17:56                           ` Daniel Vetter
  1 sibling, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-17 17:32 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx


On 17/11/15 17:24, Tvrtko Ursulin wrote:
>
> On 17/11/15 17:08, Daniel Vetter wrote:
>> On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 17/11/15 16:39, Daniel Vetter wrote:
>>>> On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> In the following commit:
>>>>>
>>>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>
>>>>>          drm/i915: Clean up associated VMAs on context destruction
>>>>>
>>>>> I added a WARN_ON assertion that VM's active list must be empty
>>>>> at the time of owning context is getting freed, but that turned
>>>>> out to be a wrong assumption.
>>>>>
>>>>> Due ordering of operations in i915_gem_object_retire__read, where
>>>>> contexts are unreferenced before VMAs are moved to the inactive
>>>>> list, the described situation can in fact happen.
>>>>>
>>>>> It feels wrong to do things in such order so this fix makes sure
>>>>> a reference to context is held until the move to inactive list
>>>>> is completed.
>>>>>
>>>>> v2: Rather than hold a temporary context reference move the
>>>>>      request unreference to be the last operation. (Daniel Vetter)
>>>>>
>>>>> v3: Fix use after free. (Chris Wilson)
>>>>>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
>>>>> Cc: Michel Thierry <michel.thierry@intel.com>
>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_gem.c | 33
>>>>> ++++++++++++++++++---------------
>>>>>   1 file changed, 18 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>>>> b/drivers/gpu/drm/i915/i915_gem.c
>>>>> index 98c83286ab68..094ac17a712d 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct
>>>>> drm_i915_gem_object *obj, int ring)
>>>>>       RQ_BUG_ON(!(obj->active & (1 << ring)));
>>>>>
>>>>>       list_del_init(&obj->ring_list[ring]);
>>>>> -    i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>>
>>>>>       if (obj->last_write_req && obj->last_write_req->ring->id ==
>>>>> ring)
>>>>>           i915_gem_object_retire__write(obj);
>>>>>
>>>>>       obj->active &= ~(1 << ring);
>>>>> -    if (obj->active)
>>>>> -        return;
>>>>
>>>>     if (obj->active) {
>>>>         i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>         return;
>>>>     }
>>>>
>>>> Would result in less churn in the code and drop the unecessary indent
>>>> level. Also comment is missing as to why we need to do things in a
>>>> specific order.
>>>
>>> Actually I think I changed my mind and that v1 is the way to go.
>>>
>>> Just re-ordering the code here still makes it possible for the context
>>> destructor to run with VMAs on the active list I think.
>>>
>>> If we hold the context then it is 100% clear it is not possible.
>>
>> request_assign _is_ the function which adjust the refcounts for us, which
>> means if we drop that reference too early then grabbing a temp reference
>> is just papering over the real bug.
>>
>> Written out your patch looks something like
>>
>>     a_reference(a);
>>     a_unreference(a);
>>
>>     /* more cleanup code that should get run before a_unreference but
>> isn't */
>>
>>     a_unrefernce(a); /* for real this time */
>>
>> Unfortunately foo_assign is a new pattern and not well-established, so
>> that connection isn't clear. Maybe we should rename it to
>> foo_reference_assign to make it more obvious. Or just drop the pretense
>> and open-code it since we unconditionally assign NULL as the new pointer
>> value, and we know the current value of the pointer is non-NULL. So
>> there's really no benefit to the helper here, it only obfuscates. And
>> since that obfuscation tripped you up it's time to remove it ;-)
>
> Then foo_reference_unreference_assign. :)
>
> But seriously, I think it is more complicated that..
>
> The thing it trips over is that moving VMAs to inactive does not
> correspond in time to request retirement. But in fact VMAs are moved to
> inactive only when all requests associated with an object are done.
>
> This is the unintuitive thing I was working around. To make sure when
> context destructor runs there are not active VMAs for that VM.
>
> I don't know how to guarantee that with what you propose. Perhaps I am
> missing something?

Maybe a completely different approach would be to find the VMA belonging 
to req->ctx->vm in i915_gem_request_free and move it to the inactive 
list before the context unreference there?

It should work, I think; if the request is going away, it means the VMA 
definitely needs to go to inactive.

Regards,

Tvrtko







* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 17:32                           ` Tvrtko Ursulin
@ 2015-11-17 17:34                             ` Tvrtko Ursulin
  0 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-17 17:34 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx



On 17/11/15 17:32, Tvrtko Ursulin wrote:
>
> On 17/11/15 17:24, Tvrtko Ursulin wrote:
>>
>> On 17/11/15 17:08, Daniel Vetter wrote:
>>> On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 17/11/15 16:39, Daniel Vetter wrote:
>>>>> On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> In the following commit:
>>>>>>
>>>>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>>
>>>>>>          drm/i915: Clean up associated VMAs on context destruction
>>>>>>
>>>>>> I added a WARN_ON assertion that VM's active list must be empty
>>>>>> at the time of owning context is getting freed, but that turned
>>>>>> out to be a wrong assumption.
>>>>>>
>>>>>> Due ordering of operations in i915_gem_object_retire__read, where
>>>>>> contexts are unreferenced before VMAs are moved to the inactive
>>>>>> list, the described situation can in fact happen.
>>>>>>
>>>>>> It feels wrong to do things in such order so this fix makes sure
>>>>>> a reference to context is held until the move to inactive list
>>>>>> is completed.
>>>>>>
>>>>>> v2: Rather than hold a temporary context reference move the
>>>>>>      request unreference to be the last operation. (Daniel Vetter)
>>>>>>
>>>>>> v3: Fix use after free. (Chris Wilson)
>>>>>>
>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
>>>>>> Cc: Michel Thierry <michel.thierry@intel.com>
>>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>>> ---
>>>>>>   drivers/gpu/drm/i915/i915_gem.c | 33
>>>>>> ++++++++++++++++++---------------
>>>>>>   1 file changed, 18 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>>>>> b/drivers/gpu/drm/i915/i915_gem.c
>>>>>> index 98c83286ab68..094ac17a712d 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct
>>>>>> drm_i915_gem_object *obj, int ring)
>>>>>>       RQ_BUG_ON(!(obj->active & (1 << ring)));
>>>>>>
>>>>>>       list_del_init(&obj->ring_list[ring]);
>>>>>> -    i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>>>
>>>>>>       if (obj->last_write_req && obj->last_write_req->ring->id ==
>>>>>> ring)
>>>>>>           i915_gem_object_retire__write(obj);
>>>>>>
>>>>>>       obj->active &= ~(1 << ring);
>>>>>> -    if (obj->active)
>>>>>> -        return;
>>>>>
>>>>>     if (obj->active) {
>>>>>         i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>>         return;
>>>>>     }
>>>>>
>>>>> Would result in less churn in the code and drop the unecessary indent
>>>>> level. Also comment is missing as to why we need to do things in a
>>>>> specific order.
>>>>
>>>> Actually I think I changed my mind and that v1 is the way to go.
>>>>
>>>> Just re-ordering the code here still makes it possible for the context
>>>> destructor to run with VMAs on the active list I think.
>>>>
>>>> If we hold the context then it is 100% clear it is not possible.
>>>
>>> request_assign _is_ the function which adjust the refcounts for us,
>>> which
>>> means if we drop that reference too early then grabbing a temp reference
>>> is just papering over the real bug.
>>>
>>> Written out your patch looks something like
>>>
>>>     a_reference(a);
>>>     a_unreference(a);
>>>
>>>     /* more cleanup code that should get run before a_unreference but
>>> isn't */
>>>
>>>     a_unrefernce(a); /* for real this time */
>>>
>>> Unfortunately foo_assign is a new pattern and not well-established, so
>>> that connection isn't clear. Maybe we should rename it to
>>> foo_reference_assign to make it more obvious. Or just drop the pretense
>>> and open-code it since we unconditionally assign NULL as the new pointer
>>> value, and we know the current value of the pointer is non-NULL. So
>>> there's really no benefit to the helper here, it only obfuscates. And
>>> since that obfuscation tripped you up it's time to remove it ;-)
>>
>> Then foo_reference_unreference_assign. :)
>>
>> But seriously, I think it is more complicated that..
>>
>> The thing it trips over is that moving VMAs to inactive does not
>> correspond in time to request retirement. But in fact VMAs are moved to
>> inactive only when all requests associated with an object are done.
>>
>> This is the unintuitive thing I was working around. To make sure when
>> context destructor runs there are not active VMAs for that VM.
>>
>> I don't know how to guarantee that with what you propose. Perhaps I am
>> missing something?
>
> Maybe completely different approach would be to find the VMA belonging
> to req->ctx->vm in i915_gem_request_free and move it to the inactive
> list before context unreference there?
>
> Should would I think, if request is going away means VMA definitely
> needs to go to inactive..

Or not... there is no link to the correct object from there. OK, calling it a day.

Tvrtko


* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 17:24                         ` Tvrtko Ursulin
  2015-11-17 17:32                           ` Tvrtko Ursulin
@ 2015-11-17 17:56                           ` Daniel Vetter
  2015-11-18 17:18                             ` Tvrtko Ursulin
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2015-11-17 17:56 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Tue, Nov 17, 2015 at 05:24:01PM +0000, Tvrtko Ursulin wrote:
> 
> On 17/11/15 17:08, Daniel Vetter wrote:
> >On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 17/11/15 16:39, Daniel Vetter wrote:
> >>>On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
> >>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>>In the following commit:
> >>>>
> >>>>     commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
> >>>>     Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>     Date:   Mon Oct 5 13:26:36 2015 +0100
> >>>>
> >>>>         drm/i915: Clean up associated VMAs on context destruction
> >>>>
> >>>>I added a WARN_ON assertion that VM's active list must be empty
> >>>>at the time of owning context is getting freed, but that turned
> >>>>out to be a wrong assumption.
> >>>>
> >>>>Due ordering of operations in i915_gem_object_retire__read, where
> >>>>contexts are unreferenced before VMAs are moved to the inactive
> >>>>list, the described situation can in fact happen.
> >>>>
> >>>>It feels wrong to do things in such order so this fix makes sure
> >>>>a reference to context is held until the move to inactive list
> >>>>is completed.
> >>>>
> >>>>v2: Rather than hold a temporary context reference move the
> >>>>     request unreference to be the last operation. (Daniel Vetter)
> >>>>
> >>>>v3: Fix use after free. (Chris Wilson)
> >>>>
> >>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
> >>>>Cc: Michel Thierry <michel.thierry@intel.com>
> >>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>>>---
> >>>>  drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
> >>>>  1 file changed, 18 insertions(+), 15 deletions(-)
> >>>>
> >>>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>>index 98c83286ab68..094ac17a712d 100644
> >>>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>>@@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
> >>>>  	RQ_BUG_ON(!(obj->active & (1 << ring)));
> >>>>
> >>>>  	list_del_init(&obj->ring_list[ring]);
> >>>>-	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>>>
> >>>>  	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
> >>>>  		i915_gem_object_retire__write(obj);
> >>>>
> >>>>  	obj->active &= ~(1 << ring);
> >>>>-	if (obj->active)
> >>>>-		return;
> >>>
> >>>	if (obj->active) {
> >>>		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>>		return;
> >>>	}
> >>>
> >>>Would result in less churn in the code and drop the unecessary indent
> >>>level. Also comment is missing as to why we need to do things in a
> >>>specific order.
> >>
> >>Actually I think I changed my mind and that v1 is the way to go.
> >>
> >>Just re-ordering the code here still makes it possible for the context
> >>destructor to run with VMAs on the active list I think.
> >>
> >>If we hold the context then it is 100% clear it is not possible.
> >
> >request_assign _is_ the function which adjusts the refcounts for us, which
> >means if we drop that reference too early then grabbing a temp reference
> >is just papering over the real bug.
> >
> >Written out your patch looks something like
> >
> >	a_reference(a);
> >	a_unreference(a);
> >
> >	/* more cleanup code that should get run before a_unreference but isn't */
> >
> >	a_unreference(a); /* for real this time */
> >
> >Unfortunately foo_assign is a new pattern and not well-established, so
> >that connection isn't clear. Maybe we should rename it to
> >foo_reference_assign to make it more obvious. Or just drop the pretense
> >and open-code it since we unconditionally assign NULL as the new pointer
> >value, and we know the current value of the pointer is non-NULL. So
> >there's really no benefit to the helper here, it only obfuscates. And
> >since that obfuscation tripped you up it's time to remove it ;-)
> 
> Then foo_reference_unreference_assign. :)
> 
> But seriously, I think it is more complicated than that..
> 
> The thing it trips over is that moving VMAs to inactive does not correspond
> in time to request retirement. But in fact VMAs are moved to inactive only
> when all requests associated with an object are done.
> 
> This is the unintuitive thing I was working around. To make sure that when
> the context destructor runs there are no active VMAs for that VM.
> 
> I don't know how to guarantee that with what you propose. Perhaps I am
> missing something?

Ok, my example was slightly off, since we have 2 objects:

	b_reference(a->b);
	a_unreference(a); /* might unref a->b if it's the last reference */

	/* more cleanup code that should get run before a_unreference but isn't */

	b_unreference(a->b); /* for real this time */

Holding the ref to a makes sure that b doesn't disappear. We rely on that
in a fundamental way (a request really needs the ctx to stick around), and
the bug really is that we drop the ref to a too early. That it's the
releasing of a->b which is eventually blowing things up doesn't really
matter.
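To make the ordering point concrete, here is a minimal, self-contained C sketch
of the pattern described above. The obj_a/obj_b names and the alive flag are
illustrative stand-ins, not i915 code: 'a' models the request holding the only
reference to 'b' (the context), so any cleanup that touches a->b has to run
before the final a_unreference().

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins, not i915 types: 'a' models the request, 'b'
 * the context it keeps alive. */
struct obj_b { int refcount; int alive; };
struct obj_a { int refcount; struct obj_b *b; };

static void b_unreference(struct obj_b *b)
{
	if (--b->refcount == 0)
		b->alive = 0;		/* stand-in for kfree() */
}

/* Dropping the last reference to 'a' also drops its reference to 'b',
 * which is why this has to be the final operation of the cleanup. */
static void a_unreference(struct obj_a *a)
{
	if (--a->refcount == 0) {
		b_unreference(a->b);
		free(a);
	}
}

static struct obj_a *a_create(struct obj_b *b)
{
	struct obj_a *a = malloc(sizeof(*a));

	a->refcount = 1;
	b->refcount++;		/* 'a' takes a reference on 'b' */
	a->b = b;
	return a;
}
```

With this shape, any code that still needs 'b' must simply run before the
a_unreference() call; taking a temporary extra reference on 'b' would mask
the mis-ordering rather than fix it.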

Btw would it be possible to have an igt for this? It should be possible to
hit this with some variant of gem_unref_active_buffers.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-17 17:56                           ` Daniel Vetter
@ 2015-11-18 17:18                             ` Tvrtko Ursulin
  2015-11-19  9:17                               ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-18 17:18 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx


On 17/11/15 17:56, Daniel Vetter wrote:
> On Tue, Nov 17, 2015 at 05:24:01PM +0000, Tvrtko Ursulin wrote:
>>
>> On 17/11/15 17:08, Daniel Vetter wrote:
>>> On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 17/11/15 16:39, Daniel Vetter wrote:
>>>>> On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> In the following commit:
>>>>>>
>>>>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>>
>>>>>>          drm/i915: Clean up associated VMAs on context destruction
>>>>>>
>>>>>> I added a WARN_ON assertion that VM's active list must be empty
>>>>>> at the time the owning context is being freed, but that turned
>>>>>> out to be a wrong assumption.
>>>>>>
>>>>>> Due to the ordering of operations in i915_gem_object_retire__read, where
>>>>>> contexts are unreferenced before VMAs are moved to the inactive
>>>>>> list, the described situation can in fact happen.
>>>>>>
>>>>>> It feels wrong to do things in such order so this fix makes sure
>>>>>> a reference to context is held until the move to inactive list
>>>>>> is completed.
>>>>>>
>>>>>> v2: Rather than hold a temporary context reference move the
>>>>>>      request unreference to be the last operation. (Daniel Vetter)
>>>>>>
>>>>>> v3: Fix use after free. (Chris Wilson)
>>>>>>
>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
>>>>>> Cc: Michel Thierry <michel.thierry@intel.com>
>>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>>> ---
>>>>>>   drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
>>>>>>   1 file changed, 18 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>>> index 98c83286ab68..094ac17a712d 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>>>>>>   	RQ_BUG_ON(!(obj->active & (1 << ring)));
>>>>>>
>>>>>>   	list_del_init(&obj->ring_list[ring]);
>>>>>> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>>>
>>>>>>   	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>>>>>>   		i915_gem_object_retire__write(obj);
>>>>>>
>>>>>>   	obj->active &= ~(1 << ring);
>>>>>> -	if (obj->active)
>>>>>> -		return;
>>>>>
>>>>> 	if (obj->active) {
>>>>> 		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>> 		return;
>>>>> 	}
>>>>>
>>>>> Would result in less churn in the code and drop the unnecessary indent
>>>>> level. Also comment is missing as to why we need to do things in a
>>>>> specific order.
>>>>
>>>> Actually I think I changed my mind and that v1 is the way to go.
>>>>
>>>> Just re-ordering the code here still makes it possible for the context
>>>> destructor to run with VMAs on the active list I think.
>>>>
>>>> If we hold the context then it is 100% clear it is not possible.
>>>
>>> request_assign _is_ the function which adjusts the refcounts for us, which
>>> means if we drop that reference too early then grabbing a temp reference
>>> is just papering over the real bug.
>>>
>>> Written out your patch looks something like
>>>
>>> 	a_reference(a);
>>> 	a_unreference(a);
>>>
>>> 	/* more cleanup code that should get run before a_unreference but isn't */
>>>
>>> 	a_unreference(a); /* for real this time */
>>>
>>> Unfortunately foo_assign is a new pattern and not well-established, so
>>> that connection isn't clear. Maybe we should rename it to
>>> foo_reference_assign to make it more obvious. Or just drop the pretense
>>> and open-code it since we unconditionally assign NULL as the new pointer
>>> value, and we know the current value of the pointer is non-NULL. So
>>> there's really no benefit to the helper here, it only obfuscates. And
>>> since that obfuscation tripped you up it's time to remove it ;-)
>>
>> Then foo_reference_unreference_assign. :)
>>
>> But seriously, I think it is more complicated than that..
>>
>> The thing it trips over is that moving VMAs to inactive does not correspond
>> in time to request retirement. But in fact VMAs are moved to inactive only
>> when all requests associated with an object are done.
>>
>> This is the unintuitive thing I was working around. To make sure that when
>> the context destructor runs there are no active VMAs for that VM.
>>
>> I don't know how to guarantee that with what you propose. Perhaps I am
>> missing something?
> 
> Ok, my example was slightly off, since we have 2 objects:
> 
> 	b_reference(a->b);
> 	a_unreference(a); /* might unref a->b if it's the last reference */
> 
> 	/* more cleanup code that should get run before a_unreference but isn't */
> 
> 	b_unreference(a->b); /* for real this time */
> 
> Holding the ref to a makes sure that b doesn't disappear. We rely on that
> in a fundamental way (a request really needs the ctx to stick around), and
> the bug really is that we drop the ref to a too early. That it's the
> releasing of a->b which is eventually blowing things up doesn't really
> matter.
> 
> Btw would it be possible to have an igt for this? It should be possible to
> hit this with some variant of gem_unref_active_buffers.

I was trying to do that today and it is proving to be a bit tricky.

I need a blitter workload which will run for long enough for the retire
worker to run. So I'll try to build a big batch buffer tomorrow which will do that.

Alternatively I did this:

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 6ed7d63a0688..db51e4b42a20 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -699,6 +699,8 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
        int retry;
 
        i915_gem_retire_requests_ring(ring);
+       if (i915.enable_execlists)
+               intel_execlists_retire_requests(ring);
 
        vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;

And that enables me to trigger the WARN from my igt even with the current
shorter blitter workload (copy 64Mb).

Going back to the original problem, how about something like this hunk for a fix?

@@ -2413,19 +2416,36 @@ static void
 i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
        struct i915_vma *vma;
+       struct i915_hw_ppgtt *ppgtt;
 
        RQ_BUG_ON(obj->last_read_req[ring] == NULL);
        RQ_BUG_ON(!(obj->active & (1 << ring)));
 
        list_del_init(&obj->ring_list[ring]);
+
+       ppgtt = obj->last_read_req[ring]->ctx->ppgtt;
+       if (ppgtt) {
+               list_for_each_entry(vma, &obj->vma_list, vma_link) {
+                       if (vma->vm == &ppgtt->base &&
+                           !list_empty(&vma->mm_list)) {
+                               list_move_tail(&vma->mm_list,
+                                              &vma->vm->inactive_list);
+                       }
+               }
+       }
+
        i915_gem_request_assign(&obj->last_read_req[ring], NULL);

This moves VMAs immediately to inactive as requests are retired and avoids
the problem of them staying on the active list for an undefined amount of time.
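As a rough sketch of what the hunk above does, here is a toy reimplementation
of the kernel list primitives it relies on (simplified stand-ins, not the real
<linux/list.h>): retiring a request walks the object's VMAs and uses
list_move_tail() to move each matching one from the VM's active list onto its
inactive list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy versions of the kernel list primitives used by the hunk above. */
struct list_head { struct list_head *prev, *next; };

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->prev = h->next = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void __list_del_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* Unlink 'e' from whatever list it is currently on and append it to 'h',
 * which is what the hunk does with vma->mm_list and vm->inactive_list. */
static void list_move_tail(struct list_head *e, struct list_head *h)
{
	__list_del_entry(e);
	list_add_tail(e, h);
}
```

The !list_empty(&vma->mm_list) test in the hunk guards against moving a VMA
that is not currently on any VM list.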

Regards,

Tvrtko


* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-18 17:18                             ` Tvrtko Ursulin
@ 2015-11-19  9:17                               ` Daniel Vetter
  2015-11-19  9:42                                 ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2015-11-19  9:17 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Wed, Nov 18, 2015 at 05:18:30PM +0000, Tvrtko Ursulin wrote:
> 
> On 17/11/15 17:56, Daniel Vetter wrote:
> > On Tue, Nov 17, 2015 at 05:24:01PM +0000, Tvrtko Ursulin wrote:
> >>
> >> On 17/11/15 17:08, Daniel Vetter wrote:
> >>> On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>> On 17/11/15 16:39, Daniel Vetter wrote:
> >>>>> On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
> >>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>>
> >>>>>> In the following commit:
> >>>>>>
> >>>>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
> >>>>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
> >>>>>>
> >>>>>>          drm/i915: Clean up associated VMAs on context destruction
> >>>>>>
> >>>>>> I added a WARN_ON assertion that VM's active list must be empty
> >>>>>> at the time the owning context is being freed, but that turned
> >>>>>> out to be a wrong assumption.
> >>>>>>
> >>>>>> Due to the ordering of operations in i915_gem_object_retire__read, where
> >>>>>> contexts are unreferenced before VMAs are moved to the inactive
> >>>>>> list, the described situation can in fact happen.
> >>>>>>
> >>>>>> It feels wrong to do things in such order so this fix makes sure
> >>>>>> a reference to context is held until the move to inactive list
> >>>>>> is completed.
> >>>>>>
> >>>>>> v2: Rather than hold a temporary context reference move the
> >>>>>>      request unreference to be the last operation. (Daniel Vetter)
> >>>>>>
> >>>>>> v3: Fix use after free. (Chris Wilson)
> >>>>>>
> >>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
> >>>>>> Cc: Michel Thierry <michel.thierry@intel.com>
> >>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>>>>> ---
> >>>>>>   drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
> >>>>>>   1 file changed, 18 insertions(+), 15 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>>>> index 98c83286ab68..094ac17a712d 100644
> >>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>>>> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
> >>>>>>   	RQ_BUG_ON(!(obj->active & (1 << ring)));
> >>>>>>
> >>>>>>   	list_del_init(&obj->ring_list[ring]);
> >>>>>> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>>>>>
> >>>>>>   	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
> >>>>>>   		i915_gem_object_retire__write(obj);
> >>>>>>
> >>>>>>   	obj->active &= ~(1 << ring);
> >>>>>> -	if (obj->active)
> >>>>>> -		return;
> >>>>>
> >>>>> 	if (obj->active) {
> >>>>> 		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>>>> 		return;
> >>>>> 	}
> >>>>>
> >>>>> Would result in less churn in the code and drop the unnecessary indent
> >>>>> level. Also comment is missing as to why we need to do things in a
> >>>>> specific order.
> >>>>
> >>>> Actually I think I changed my mind and that v1 is the way to go.
> >>>>
> >>>> Just re-ordering the code here still makes it possible for the context
> >>>> destructor to run with VMAs on the active list I think.
> >>>>
> >>>> If we hold the context then it is 100% clear it is not possible.
> >>>
> >>> request_assign _is_ the function which adjusts the refcounts for us, which
> >>> means if we drop that reference too early then grabbing a temp reference
> >>> is just papering over the real bug.
> >>>
> >>> Written out your patch looks something like
> >>>
> >>> 	a_reference(a);
> >>> 	a_unreference(a);
> >>>
> >>> 	/* more cleanup code that should get run before a_unreference but isn't */
> >>>
> >>> 	a_unreference(a); /* for real this time */
> >>>
> >>> Unfortunately foo_assign is a new pattern and not well-established, so
> >>> that connection isn't clear. Maybe we should rename it to
> >>> foo_reference_assign to make it more obvious. Or just drop the pretense
> >>> and open-code it since we unconditionally assign NULL as the new pointer
> >>> value, and we know the current value of the pointer is non-NULL. So
> >>> there's really no benefit to the helper here, it only obfuscates. And
> >>> since that obfuscation tripped you up it's time to remove it ;-)
> >>
> >> Then foo_reference_unreference_assign. :)
> >>
> >> But seriously, I think it is more complicated than that..
> >>
> >> The thing it trips over is that moving VMAs to inactive does not correspond
> >> in time to request retirement. But in fact VMAs are moved to inactive only
> >> when all requests associated with an object are done.
> >>
> >> This is the unintuitive thing I was working around. To make sure that when
> >> the context destructor runs there are no active VMAs for that VM.
> >>
> >> I don't know how to guarantee that with what you propose. Perhaps I am
> >> missing something?
> > 
> > Ok, my example was slightly off, since we have 2 objects:
> > 
> > 	b_reference(a->b);
> > 	a_unreference(a); /* might unref a->b if it's the last reference */
> > 
> > 	/* more cleanup code that should get run before a_unreference but isn't */
> > 
> > 	b_unreference(a->b); /* for real this time */
> > 
> > Holding the ref to a makes sure that b doesn't disappear. We rely on that
> > in a fundamental way (a request really needs the ctx to stick around), and
> > the bug really is that we drop the ref to a too early. That it's the
> > releasing of a->b which is eventually blowing things up doesn't really
> > matter.
> > 
> > Btw would it be possible to have an igt for this? It should be possible to
> > hit this with some variant of gem_unref_active_buffers.
> 
> I was trying to do that today and it is proving to be a bit tricky.
> 
> I need a blitter workload which will run for long enough for the retire
> worker to run. So I'll try to build a big batch buffer tomorrow which will do that.
> 
> Alternatively I did this:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 6ed7d63a0688..db51e4b42a20 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -699,6 +699,8 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
>         int retry;
>  
>         i915_gem_retire_requests_ring(ring);
> +       if (i915.enable_execlists)
> +               intel_execlists_retire_requests(ring);
>  
>         vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
> 
> And that enables me to trigger the WARN from my igt even with the current
> shorter blitter workload (copy 64Mb).
> 
> Going back to the original problem, how about something like this hunk for a fix?
> 
> @@ -2413,19 +2416,36 @@ static void
>  i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>  {
>         struct i915_vma *vma;
> +       struct i915_hw_ppgtt *ppgtt;
>  
>         RQ_BUG_ON(obj->last_read_req[ring] == NULL);
>         RQ_BUG_ON(!(obj->active & (1 << ring)));
>  
>         list_del_init(&obj->ring_list[ring]);
> +
> +       ppgtt = obj->last_read_req[ring]->ctx->ppgtt;
> +       if (ppgtt) {
> +               list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +                       if (vma->vm == &ppgtt->base &&
> +                           !list_empty(&vma->mm_list)) {
> +                               list_move_tail(&vma->mm_list,
> +                                              &vma->vm->inactive_list);
> +                       }
> +               }
> +       }
> +
>         i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> 
> This moves VMAs immediately to inactive as requests are retired and avoids
> the problem of them staying on the active list for an undefined amount of time.

You can't put active objects onto the inactive list, i.e. the obj->active
check is non-optional. And the if (ppgtt) case is an abstraction violation.
I really don't get why we can't just move the unref to the right place ...
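A minimal sketch of that constraint, with simplified stand-in types (not the
real drm_i915_gem_object): the per-ring retire may only move the object to the
inactive list once obj->active has no bits left set, because another ring may
still have reads outstanding.

```c
#include <assert.h>

/* Simplified stand-ins for the real structures. */
struct toy_obj {
	unsigned int active;	/* bitmask of rings with outstanding reads */
	int on_inactive_list;	/* models the VMAs sitting on inactive */
};

/* Retire the read on one ring; list state may only change once the
 * object is idle on *all* rings. */
static void toy_retire_read(struct toy_obj *obj, int ring)
{
	obj->active &= ~(1u << ring);
	if (obj->active)
		return;		/* still active on another ring */
	obj->on_inactive_list = 1;
}
```

Skipping the obj->active guard, as the proposed hunk does, would move VMAs to
inactive while another ring still holds a read on the object.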
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-19  9:17                               ` Daniel Vetter
@ 2015-11-19  9:42                                 ` Tvrtko Ursulin
  2015-11-19 12:13                                   ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-19  9:42 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx


On 19/11/15 09:17, Daniel Vetter wrote:
> On Wed, Nov 18, 2015 at 05:18:30PM +0000, Tvrtko Ursulin wrote:
>>
>> On 17/11/15 17:56, Daniel Vetter wrote:
>>> On Tue, Nov 17, 2015 at 05:24:01PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 17/11/15 17:08, Daniel Vetter wrote:
>>>>> On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 17/11/15 16:39, Daniel Vetter wrote:
>>>>>>> On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
>>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>
>>>>>>>> In the following commit:
>>>>>>>>
>>>>>>>>       commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>>>>       Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>       Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>>>>
>>>>>>>>           drm/i915: Clean up associated VMAs on context destruction
>>>>>>>>
>>>>>>>> I added a WARN_ON assertion that VM's active list must be empty
>>>>>>>> at the time the owning context is being freed, but that turned
>>>>>>>> out to be a wrong assumption.
>>>>>>>>
>>>>>>>> Due to the ordering of operations in i915_gem_object_retire__read, where
>>>>>>>> contexts are unreferenced before VMAs are moved to the inactive
>>>>>>>> list, the described situation can in fact happen.
>>>>>>>>
>>>>>>>> It feels wrong to do things in such order so this fix makes sure
>>>>>>>> a reference to context is held until the move to inactive list
>>>>>>>> is completed.
>>>>>>>>
>>>>>>>> v2: Rather than hold a temporary context reference move the
>>>>>>>>       request unreference to be the last operation. (Daniel Vetter)
>>>>>>>>
>>>>>>>> v3: Fix use after free. (Chris Wilson)
>>>>>>>>
>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
>>>>>>>> Cc: Michel Thierry <michel.thierry@intel.com>
>>>>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>>>>> ---
>>>>>>>>    drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
>>>>>>>>    1 file changed, 18 insertions(+), 15 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>> index 98c83286ab68..094ac17a712d 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>>>>>>>>    	RQ_BUG_ON(!(obj->active & (1 << ring)));
>>>>>>>>
>>>>>>>>    	list_del_init(&obj->ring_list[ring]);
>>>>>>>> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>>>>>
>>>>>>>>    	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>>>>>>>>    		i915_gem_object_retire__write(obj);
>>>>>>>>
>>>>>>>>    	obj->active &= ~(1 << ring);
>>>>>>>> -	if (obj->active)
>>>>>>>> -		return;
>>>>>>>
>>>>>>> 	if (obj->active) {
>>>>>>> 		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>>>> 		return;
>>>>>>> 	}
>>>>>>>
>>>>>>> Would result in less churn in the code and drop the unnecessary indent
>>>>>>> level. Also comment is missing as to why we need to do things in a
>>>>>>> specific order.
>>>>>>
>>>>>> Actually I think I changed my mind and that v1 is the way to go.
>>>>>>
>>>>>> Just re-ordering the code here still makes it possible for the context
>>>>>> destructor to run with VMAs on the active list I think.
>>>>>>
>>>>>> If we hold the context then it is 100% clear it is not possible.
>>>>>
>>>>> request_assign _is_ the function which adjusts the refcounts for us, which
>>>>> means if we drop that reference too early then grabbing a temp reference
>>>>> is just papering over the real bug.
>>>>>
>>>>> Written out your patch looks something like
>>>>>
>>>>> 	a_reference(a);
>>>>> 	a_unreference(a);
>>>>>
>>>>> 	/* more cleanup code that should get run before a_unreference but isn't */
>>>>>
>>>>> 	a_unreference(a); /* for real this time */
>>>>>
>>>>> Unfortunately foo_assign is a new pattern and not well-established, so
>>>>> that connection isn't clear. Maybe we should rename it to
>>>>> foo_reference_assign to make it more obvious. Or just drop the pretense
>>>>> and open-code it since we unconditionally assign NULL as the new pointer
>>>>> value, and we know the current value of the pointer is non-NULL. So
>>>>> there's really no benefit to the helper here, it only obfuscates. And
>>>>> since that obfuscation tripped you up it's time to remove it ;-)
>>>>
>>>> Then foo_reference_unreference_assign. :)
>>>>
>>>> But seriously, I think it is more complicated than that..
>>>>
>>>> The thing it trips over is that moving VMAs to inactive does not correspond
>>>> in time to request retirement. But in fact VMAs are moved to inactive only
>>>> when all requests associated with an object are done.
>>>>
>>>> This is the unintuitive thing I was working around. To make sure that when
>>>> the context destructor runs there are no active VMAs for that VM.
>>>>
>>>> I don't know how to guarantee that with what you propose. Perhaps I am
>>>> missing something?
>>>
>>> Ok, my example was slightly off, since we have 2 objects:
>>>
>>> 	b_reference(a->b);
>>> 	a_unreference(a); /* might unref a->b if it's the last reference */
>>>
>>> 	/* more cleanup code that should get run before a_unreference but isn't */
>>>
>>> 	b_unreference(a->b); /* for real this time */
>>>
>>> Holding the ref to a makes sure that b doesn't disappear. We rely on that
>>> in a fundamental way (a request really needs the ctx to stick around), and
>>> the bug really is that we drop the ref to a too early. That it's the
>>> releasing of a->b which is eventually blowing things up doesn't really
>>> matter.
>>>
>>> Btw would it be possible to have an igt for this? It should be possible to
>>> hit this with some variant of gem_unref_active_buffers.
>>
>> I was trying to do that today and it is proving to be a bit tricky.
>>
>> I need a blitter workload which will run for long enough for the retire
>> worker to run. So I'll try to build a big batch buffer tomorrow which will do that.
>>
>> Alternatively I did this:
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index 6ed7d63a0688..db51e4b42a20 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -699,6 +699,8 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
>>          int retry;
>>
>>          i915_gem_retire_requests_ring(ring);
>> +       if (i915.enable_execlists)
>> +               intel_execlists_retire_requests(ring);
>>
>>          vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
>>
>> And that enables me to trigger the WARN from my igt even with the current
>> shorter blitter workload (copy 64Mb).
>>
>> Going back to the original problem, how about something like this hunk for a fix?
>>
>> @@ -2413,19 +2416,36 @@ static void
>>   i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>>   {
>>          struct i915_vma *vma;
>> +       struct i915_hw_ppgtt *ppgtt;
>>
>>          RQ_BUG_ON(obj->last_read_req[ring] == NULL);
>>          RQ_BUG_ON(!(obj->active & (1 << ring)));
>>
>>          list_del_init(&obj->ring_list[ring]);
>> +
>> +       ppgtt = obj->last_read_req[ring]->ctx->ppgtt;
>> +       if (ppgtt) {
>> +               list_for_each_entry(vma, &obj->vma_list, vma_link) {
>> +                       if (vma->vm == &ppgtt->base &&
>> +                           !list_empty(&vma->mm_list)) {
>> +                               list_move_tail(&vma->mm_list,
>> +                                              &vma->vm->inactive_list);
>> +                       }
>> +               }
>> +       }
>> +
>>          i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>
>> This moves VMAs immediately to inactive as requests are retired and avoids
>> the problem of them staying on the active list for an undefined amount of time.
>
> You can't put active objects onto the inactive list, i.e. the obj->active

Hm, it is inactive in this VM, so who would care?

Perhaps then the fix is simply to remove the 
WARN_ON(!list_empty(&ppgtt->base.active_list)) from the context destructor.

If there are active VMAs at that point, they'll get cleaned up when they 
are retired and there is no leak.

> check is non-optional. And the if (ppgtt) case is an abstraction violation.
> I really don't get why we can't just move the unref to the right place ...

I don't see where.

Regards,

Tvrtko


* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-19  9:42                                 ` Tvrtko Ursulin
@ 2015-11-19 12:13                                   ` Daniel Vetter
  2015-11-19 12:28                                     ` Tvrtko Ursulin
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2015-11-19 12:13 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Thu, Nov 19, 2015 at 09:42:17AM +0000, Tvrtko Ursulin wrote:
> 
> On 19/11/15 09:17, Daniel Vetter wrote:
> >On Wed, Nov 18, 2015 at 05:18:30PM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 17/11/15 17:56, Daniel Vetter wrote:
> >>>On Tue, Nov 17, 2015 at 05:24:01PM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 17/11/15 17:08, Daniel Vetter wrote:
> >>>>>On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
> >>>>>>
> >>>>>>On 17/11/15 16:39, Daniel Vetter wrote:
> >>>>>>>On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
> >>>>>>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>>>>
> >>>>>>>>In the following commit:
> >>>>>>>>
> >>>>>>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
> >>>>>>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
> >>>>>>>>
> >>>>>>>>          drm/i915: Clean up associated VMAs on context destruction
> >>>>>>>>
> >>>>>>>>I added a WARN_ON assertion that VM's active list must be empty
> >>>>>>>>at the time the owning context is being freed, but that turned
> >>>>>>>>out to be a wrong assumption.
> >>>>>>>>
> >>>>>>>>Due to the ordering of operations in i915_gem_object_retire__read, where
> >>>>>>>>contexts are unreferenced before VMAs are moved to the inactive
> >>>>>>>>list, the described situation can in fact happen.
> >>>>>>>>
> >>>>>>>>It feels wrong to do things in such order so this fix makes sure
> >>>>>>>>a reference to context is held until the move to inactive list
> >>>>>>>>is completed.
> >>>>>>>>
> >>>>>>>>v2: Rather than hold a temporary context reference move the
> >>>>>>>>      request unreference to be the last operation. (Daniel Vetter)
> >>>>>>>>
> >>>>>>>>v3: Fix use after free. (Chris Wilson)
> >>>>>>>>
> >>>>>>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>>>>Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
> >>>>>>>>Cc: Michel Thierry <michel.thierry@intel.com>
> >>>>>>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>>>>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>>>>>>>---
> >>>>>>>>   drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
> >>>>>>>>   1 file changed, 18 insertions(+), 15 deletions(-)
> >>>>>>>>
> >>>>>>>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>>>>>>index 98c83286ab68..094ac17a712d 100644
> >>>>>>>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>>>>>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>>>>>>@@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
> >>>>>>>>   	RQ_BUG_ON(!(obj->active & (1 << ring)));
> >>>>>>>>
> >>>>>>>>   	list_del_init(&obj->ring_list[ring]);
> >>>>>>>>-	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>>>>>>>
> >>>>>>>>   	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
> >>>>>>>>   		i915_gem_object_retire__write(obj);
> >>>>>>>>
> >>>>>>>>   	obj->active &= ~(1 << ring);
> >>>>>>>>-	if (obj->active)
> >>>>>>>>-		return;
> >>>>>>>
> >>>>>>>	if (obj->active) {
> >>>>>>>		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>>>>>>		return;
> >>>>>>>	}
> >>>>>>>
> >>>>>>>This would result in less churn in the code and drop the unnecessary
> >>>>>>>indent level. Also a comment is missing as to why we need to do things
> >>>>>>>in a specific order.
> >>>>>>
> >>>>>>Actually I think I changed my mind and that v1 is the way to go.
> >>>>>>
> >>>>>>Just re-ordering the code here still makes it possible for the context
> >>>>>>destructor to run with VMAs on the active list I think.
> >>>>>>
> >>>>>>If we hold the context then it is 100% clear it is not possible.
> >>>>>
> >>>>>request_assign _is_ the function which adjusts the refcounts for us, which
> >>>>>means if we drop that reference too early then grabbing a temp reference
> >>>>>is just papering over the real bug.
> >>>>>
> >>>>>Written out your patch looks something like
> >>>>>
> >>>>>	a_reference(a);
> >>>>>	a_unreference(a);
> >>>>>
> >>>>>	/* more cleanup code that should get run before a_unreference but isn't */
> >>>>>
> >>>>>	a_unreference(a); /* for real this time */
> >>>>>
> >>>>>Unfortunately foo_assign is a new pattern and not well-established, so
> >>>>>that connection isn't clear. Maybe we should rename it to
> >>>>>foo_reference_assign to make it more obvious. Or just drop the pretense
> >>>>>and open-code it since we unconditionally assign NULL as the new pointer
> >>>>>value, and we know the current value of the pointer is non-NULL. So
> >>>>>there's really no benefit to the helper here, it only obfuscates. And
> >>>>>since that obfuscation tripped you up it's time to remove it ;-)
> >>>>
> >>>>Then foo_reference_unreference_assign. :)
> >>>>
> >>>>But seriously, I think it is more complicated than that...
> >>>>
> >>>>The thing it trips over is that moving VMAs to inactive does not correspond
> >>>>in time to request retirement. But in fact VMAs are moved to inactive only
> >>>>when all requests associated with an object are done.
> >>>>
> >>>>This is the unintuitive thing I was working around: to make sure that
> >>>>when the context destructor runs there are no active VMAs for that VM.
> >>>>
> >>>>I don't know how to guarantee that with what you propose. Perhaps I am
> >>>>missing something?
> >>>
> >>>Ok, my example was slightly off, since we have 2 objects:
> >>>
> >>>	b_reference(a->b);
> >>>	a_unreference(a); /* might unref a->b if it's the last reference */
> >>>
> >>>	/* more cleanup code that should get run before a_unreference but isn't */
> >>>
> >>>	b_unreference(a->b); /* for real this time */
> >>>
> >>>Holding the ref to a makes sure that b doesn't disappear. We rely on that
> >>>in a fundamental way (a request really needs the ctx to stick around), and
> >>>the bug really is that we drop the ref to a too early. That it's the
> >>>releasing of a->b which is eventually blowing things up doesn't really
> >>>matter.
> >>>
> >>>Btw would it be possible to have an igt for this? It should be possible
> >>>to hit this with some variant of gem_unref_active_buffers.
> >>
> >>I was trying to do that today and it is proving to be a bit tricky.
> >>
> >>I need a blitter workload which will run for long enough for the retire
> >>worker to run. So I'll try and build a big bb tomorrow which will do that.
> >>
> >>Alternatively I did this:
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>index 6ed7d63a0688..db51e4b42a20 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>@@ -699,6 +699,8 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
> >>         int retry;
> >>
> >>         i915_gem_retire_requests_ring(ring);
> >>+       if (i915.enable_execlists)
> >>+               intel_execlists_retire_requests(ring);
> >>
> >>         vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
> >>
> >>And that enables me to trigger the WARN from my igt even with the current
> >>shorter blitter workload (copy 64Mb).
> >>
> >>Going back to the original problem, how about something like this hunk for a fix?
> >>
> >>@@ -2413,19 +2416,36 @@ static void
> >>  i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
> >>  {
> >>         struct i915_vma *vma;
> >>+       struct i915_hw_ppgtt *ppgtt;
> >>
> >>         RQ_BUG_ON(obj->last_read_req[ring] == NULL);
> >>         RQ_BUG_ON(!(obj->active & (1 << ring)));
> >>
> >>         list_del_init(&obj->ring_list[ring]);
> >>+
> >>+       ppgtt = obj->last_read_req[ring]->ctx->ppgtt;
> >>+       if (ppgtt) {
> >>+               list_for_each_entry(vma, &obj->vma_list, vma_link) {
> >>+                       if (vma->vm == &ppgtt->base &&
> >>+                           !list_empty(&vma->mm_list)) {
> >>+                               list_move_tail(&vma->mm_list,
> >>+                                              &vma->vm->inactive_list);
> >>+                       }
> >>+               }
> >>+       }
> >>+
> >>         i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>
> >>This moves VMAs immediately to inactive as requests are retired and avoids
> >>the problem of them staying active for an undefined amount of time.
> >
> >You can't put active objects onto the inactive list, i.e. the obj->active
> 
> Hm it is inactive in this VM so who would care?

The shrinker will fall over because of some long-standing design mistake.
Well, that's only the obj->active vs. vma->active problem.

The real one is that you can have piles of concurrent read requests even
on a specific vma, and retiring the first one of those doesn't make the
vma inactive. So even with the active-tracking design mistake fixed, your
suggestion wouldn't work.

> Perhaps then the fix is simply to remove the
> WARN_ON(!list_empty(&ppgtt->base.active_list)) from the context destructor.
> 
> If there are active VMAs at that point, they'll get cleaned up when they are
> retired and there is no leak.
> 
> >check is non-optional. And the if (ppgtt) case is abstraction violation.
> >I really don't get why we can't just move the unref to the right place ...
> 
> I don't see where.

Please explain why the below change (which is the one I've been proposing,
and which Chris suggested too) doesn't work:


diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9552647a925d..d16b5ca042fa 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2375,14 +2375,15 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 	RQ_BUG_ON(!(obj->active & (1 << ring)));
 
 	list_del_init(&obj->ring_list[ring]);
-	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 
 	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
 		i915_gem_object_retire__write(obj);
 
 	obj->active &= ~(1 << ring);
-	if (obj->active)
+	if (obj->active) {
+		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 		return;
+	}
 
 	/* Bump our place on the bound list to keep it roughly in LRU order
 	 * so that we don't steal from recently used but inactive objects
@@ -2396,6 +2397,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
 	}
 
+	/* Only unref once we're on the inactive list. */
+	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 	i915_gem_request_assign(&obj->last_fenced_req, NULL);
 	drm_gem_object_unreference(&obj->base);
 }


Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH v3] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed
  2015-11-19 12:13                                   ` Daniel Vetter
@ 2015-11-19 12:28                                     ` Tvrtko Ursulin
  0 siblings, 0 replies; 23+ messages in thread
From: Tvrtko Ursulin @ 2015-11-19 12:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx


Hi,

On 19/11/15 12:13, Daniel Vetter wrote:
> On Thu, Nov 19, 2015 at 09:42:17AM +0000, Tvrtko Ursulin wrote:
>>
>> On 19/11/15 09:17, Daniel Vetter wrote:
>>> On Wed, Nov 18, 2015 at 05:18:30PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 17/11/15 17:56, Daniel Vetter wrote:
>>>>> On Tue, Nov 17, 2015 at 05:24:01PM +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 17/11/15 17:08, Daniel Vetter wrote:
>>>>>>> On Tue, Nov 17, 2015 at 04:54:50PM +0000, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 17/11/15 16:39, Daniel Vetter wrote:
>>>>>>>>> On Tue, Nov 17, 2015 at 04:27:12PM +0000, Tvrtko Ursulin wrote:
>>>>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>>>
>>>>>>>>>> In the following commit:
>>>>>>>>>>
>>>>>>>>>>       commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>>>>>>       Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>>>       Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>>>>>>
>>>>>>>>>>           drm/i915: Clean up associated VMAs on context destruction
>>>>>>>>>>
>>>>>>>>>> I added a WARN_ON assertion that the VM's active list must be
>>>>>>>>>> empty at the time the owning context is freed, but that turned
>>>>>>>>>> out to be a wrong assumption.
>>>>>>>>>>
>>>>>>>>>> Due to the ordering of operations in i915_gem_object_retire__read,
>>>>>>>>>> where contexts are unreferenced before VMAs are moved to the
>>>>>>>>>> inactive list, the described situation can in fact happen.
>>>>>>>>>>
>>>>>>>>>> It feels wrong to do things in that order, so this fix makes sure
>>>>>>>>>> a reference to the context is held until the move to the inactive
>>>>>>>>>> list is completed.
>>>>>>>>>>
>>>>>>>>>> v2: Rather than hold a temporary context reference move the
>>>>>>>>>>       request unreference to be the last operation. (Daniel Vetter)
>>>>>>>>>>
>>>>>>>>>> v3: Fix use after free. (Chris Wilson)
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
>>>>>>>>>> Cc: Michel Thierry <michel.thierry@intel.com>
>>>>>>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>>>>>>> ---
>>>>>>>>>>    drivers/gpu/drm/i915/i915_gem.c | 33 ++++++++++++++++++---------------
>>>>>>>>>>    1 file changed, 18 insertions(+), 15 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>>>> index 98c83286ab68..094ac17a712d 100644
>>>>>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>>>>> @@ -2404,29 +2404,32 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>>>>>>>>>>    	RQ_BUG_ON(!(obj->active & (1 << ring)));
>>>>>>>>>>
>>>>>>>>>>    	list_del_init(&obj->ring_list[ring]);
>>>>>>>>>> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>>>>>>>
>>>>>>>>>>    	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>>>>>>>>>>    		i915_gem_object_retire__write(obj);
>>>>>>>>>>
>>>>>>>>>>    	obj->active &= ~(1 << ring);
>>>>>>>>>> -	if (obj->active)
>>>>>>>>>> -		return;
>>>>>>>>>
>>>>>>>>> 	if (obj->active) {
>>>>>>>>> 		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>>>>>> 		return;
>>>>>>>>> 	}
>>>>>>>>>
>>>>>>>>> This would result in less churn in the code and drop the unnecessary
>>>>>>>>> indent level. Also a comment is missing as to why we need to do things
>>>>>>>>> in a specific order.
>>>>>>>>
>>>>>>>> Actually I think I changed my mind and that v1 is the way to go.
>>>>>>>>
>>>>>>>> Just re-ordering the code here still makes it possible for the context
>>>>>>>> destructor to run with VMAs on the active list I think.
>>>>>>>>
>>>>>>>> If we hold the context then it is 100% clear it is not possible.
>>>>>>>
>>>>>>> request_assign _is_ the function which adjusts the refcounts for us, which
>>>>>>> means if we drop that reference too early then grabbing a temp reference
>>>>>>> is just papering over the real bug.
>>>>>>>
>>>>>>> Written out your patch looks something like
>>>>>>>
>>>>>>> 	a_reference(a);
>>>>>>> 	a_unreference(a);
>>>>>>>
>>>>>>> 	/* more cleanup code that should get run before a_unreference but isn't */
>>>>>>>
>>>>>>> 	a_unreference(a); /* for real this time */
>>>>>>>
>>>>>>> Unfortunately foo_assign is a new pattern and not well-established, so
>>>>>>> that connection isn't clear. Maybe we should rename it to
>>>>>>> foo_reference_assign to make it more obvious. Or just drop the pretense
>>>>>>> and open-code it since we unconditionally assign NULL as the new pointer
>>>>>>> value, and we know the current value of the pointer is non-NULL. So
>>>>>>> there's really no benefit to the helper here, it only obfuscates. And
>>>>>>> since that obfuscation tripped you up it's time to remove it ;-)
>>>>>>
>>>>>> Then foo_reference_unreference_assign. :)
>>>>>>
>>>>>> But seriously, I think it is more complicated than that...
>>>>>>
>>>>>> The thing it trips over is that moving VMAs to inactive does not correspond
>>>>>> in time to request retirement. But in fact VMAs are moved to inactive only
>>>>>> when all requests associated with an object are done.
>>>>>>
>>>>>> This is the unintuitive thing I was working around: to make sure that
>>>>>> when the context destructor runs there are no active VMAs for that VM.
>>>>>>
>>>>>> I don't know how to guarantee that with what you propose. Perhaps I am
>>>>>> missing something?
>>>>>
>>>>> Ok, my example was slightly off, since we have 2 objects:
>>>>>
>>>>> 	b_reference(a->b);
>>>>> 	a_unreference(a); /* might unref a->b if it's the last reference */
>>>>>
>>>>> 	/* more cleanup code that should get run before a_unreference but isn't */
>>>>>
>>>>> 	b_unreference(a->b); /* for real this time */
>>>>>
>>>>> Holding the ref to a makes sure that b doesn't disappear. We rely on that
>>>>> in a fundamental way (a request really needs the ctx to stick around), and
>>>>> the bug really is that we drop the ref to a too early. That it's the
>>>>> releasing of a->b which is eventually blowing things up doesn't really
>>>>> matter.
>>>>>
>>>>> Btw would it be possible to have an igt for this? It should be possible
>>>>> to hit this with some variant of gem_unref_active_buffers.
>>>>
>>>> I was trying to do that today and it is proving to be a bit tricky.
>>>>
>>>> I need a blitter workload which will run for long enough for the retire
>>>> worker to run. So I'll try and build a big bb tomorrow which will do that.
>>>>
>>>> Alternatively I did this:
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> index 6ed7d63a0688..db51e4b42a20 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>> @@ -699,6 +699,8 @@ i915_gem_execbuffer_reserve(struct intel_engine_cs *ring,
>>>>          int retry;
>>>>
>>>>          i915_gem_retire_requests_ring(ring);
>>>> +       if (i915.enable_execlists)
>>>> +               intel_execlists_retire_requests(ring);
>>>>
>>>>          vm = list_first_entry(vmas, struct i915_vma, exec_list)->vm;
>>>>
>>>> And that enables me to trigger the WARN from my igt even with the current
>>>> shorter blitter workload (copy 64Mb).
>>>>
>>>> Going back to the original problem, how about something like this hunk for a fix?
>>>>
>>>> @@ -2413,19 +2416,36 @@ static void
>>>>   i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>>>>   {
>>>>          struct i915_vma *vma;
>>>> +       struct i915_hw_ppgtt *ppgtt;
>>>>
>>>>          RQ_BUG_ON(obj->last_read_req[ring] == NULL);
>>>>          RQ_BUG_ON(!(obj->active & (1 << ring)));
>>>>
>>>>          list_del_init(&obj->ring_list[ring]);
>>>> +
>>>> +       ppgtt = obj->last_read_req[ring]->ctx->ppgtt;
>>>> +       if (ppgtt) {
>>>> +               list_for_each_entry(vma, &obj->vma_list, vma_link) {
>>>> +                       if (vma->vm == &ppgtt->base &&
>>>> +                           !list_empty(&vma->mm_list)) {
>>>> +                               list_move_tail(&vma->mm_list,
>>>> +                                              &vma->vm->inactive_list);
>>>> +                       }
>>>> +               }
>>>> +       }
>>>> +
>>>>          i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>>>>
>>>> This moves VMAs immediately to inactive as requests are retired and avoids
>>>> the problem of them staying active for an undefined amount of time.
>>>
>>> You can't put active objects onto the inactive list, i.e. the obj->active
>>
>> Hm it is inactive in this VM so who would care?
>
> The shrinker will fall over because of some long-standing design mistake.
> Well, that's only the obj->active vs. vma->active problem.
>
> The real one is that you can have piles of concurrent read requests even
> on a specific vma, and retiring the first one of those doesn't make the
> vma inactive. So even with the active-tracking design mistake fixed, your
> suggestion wouldn't work.

But it is *last_read_req* and it is being assigned NULL, meaning there
are no other ones at that point.

>> Perhaps then the fix is simply to remove the
>> WARN_ON(!list_empty(&ppgtt->base.active_list)) from the context destructor.
>>
>> If there are active VMAs at that point, they'll get cleaned up when they are
>> retired and there is no leak.
>>
>>> check is non-optional. And the if (ppgtt) case is abstraction violation.
>>> I really don't get why we can't just move the unref to the right place ...
>>
>> I don't see where.
>
> Please explain why the below change (which is the one I've been proposing,
> and which Chris suggested too) doesn't work:
>
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9552647a925d..d16b5ca042fa 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2375,14 +2375,15 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   	RQ_BUG_ON(!(obj->active & (1 << ring)));
>
>   	list_del_init(&obj->ring_list[ring]);
> -	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>
>   	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>   		i915_gem_object_retire__write(obj);
>
>   	obj->active &= ~(1 << ring);
> -	if (obj->active)
> +	if (obj->active) {
> +		i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>   		return;

Effectively my test code, a bit annotated:

static void
test_retire_vma_not_inactive(int fd)
{
	uint32_t ctx_id;
	uint32_t src, dst, dst2;
	uint32_t blit_bb, store_bb;

	igt_require(HAS_BLT_RING(intel_get_drm_devid(fd)));

	ctx_id = gem_context_create(fd);
	igt_assert(ctx_id);

	src = gem_create(fd, BO_SIZE);
	igt_assert(src);
	dst = gem_create(fd, BO_SIZE);
	igt_assert(dst);
	dst2 = gem_create(fd, 4096);
	igt_assert(dst2);

	blit_bb = blit(fd, dst, src, 0);

// The above blit takes a long time to complete; src bo is last_read_req.

	store_bb = store(fd, dst2, src, ctx_id);

// This is a very quick one which puts the same src bo as last_read_req
// on the render ring.

	gem_sync(fd, store_bb);
	gem_sync(fd, dst2);
	gem_close(fd, store_bb);
	store_bb = store(fd, dst2, src, ctx_id);
	gem_sync(fd, store_bb);
	gem_sync(fd, dst2);
	gem_close(fd, store_bb);
	gem_close(fd, dst2);

// I was doing two in the hope of triggering retirement, but it turns out
// the key is in execlist retirement, so never mind. The key is that the
// req is retired but the VMA is not put on the inactive list.

	gem_context_destroy(fd, ctx_id);

// Now context destruction runs with VMA on the active list -> WARN.

	gem_sync(fd, blit_bb);
	gem_close(fd, blit_bb);

	gem_close(fd, src);
	gem_close(fd, dst);
}

> +	}
>
>   	/* Bump our place on the bound list to keep it roughly in LRU order
>   	 * so that we don't steal from recently used but inactive objects
> @@ -2396,6 +2397,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
>   	}
>
> +	/* Only unref once we're on the inactive list. */
> +	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>   	i915_gem_request_assign(&obj->last_fenced_req, NULL);
>   	drm_gem_object_unreference(&obj->base);
>   }


Regards,

Tvrtko



end of thread, other threads:[~2015-11-19 12:28 UTC | newest]

Thread overview: 23+ messages
2015-10-26 11:05 [PATCH] drm/i915: Ensure associated VMAs are inactive when contexts are destroyed Tvrtko Ursulin
2015-10-26 11:23 ` Chris Wilson
2015-10-26 12:00   ` Tvrtko Ursulin
2015-10-26 12:10     ` Chris Wilson
2015-10-26 13:10       ` Tvrtko Ursulin
2015-11-03 10:48         ` Tvrtko Ursulin
2015-11-03 10:55         ` Chris Wilson
2015-11-03 11:08           ` Tvrtko Ursulin
2015-11-17 15:53             ` [PATCH v2] " Tvrtko Ursulin
2015-11-17 16:04               ` Chris Wilson
2015-11-17 16:27                 ` [PATCH v3] " Tvrtko Ursulin
2015-11-17 16:39                   ` Daniel Vetter
2015-11-17 16:54                     ` Tvrtko Ursulin
2015-11-17 17:08                       ` Daniel Vetter
2015-11-17 17:24                         ` Tvrtko Ursulin
2015-11-17 17:32                           ` Tvrtko Ursulin
2015-11-17 17:34                             ` Tvrtko Ursulin
2015-11-17 17:56                           ` Daniel Vetter
2015-11-18 17:18                             ` Tvrtko Ursulin
2015-11-19  9:17                               ` Daniel Vetter
2015-11-19  9:42                                 ` Tvrtko Ursulin
2015-11-19 12:13                                   ` Daniel Vetter
2015-11-19 12:28                                     ` Tvrtko Ursulin
