* Question about page table updates at BO destroy
@ 2017-03-22 15:06 Nicolai Hähnle
From: Nicolai Hähnle @ 2017-03-22 15:06 UTC
  To: amd-gfx mailing list

Hi all,

there's a bit of a puzzle where I'm wondering whether there's a subtle 
bug in the amdgpu kernel module.

Basically, the concern is that a buggy user space driver might trigger a 
sequence like this:

1. Submit a CS that accesses some BO _without_ adding that BO to the 
buffer list.
2. Free that BO.
3. Some other task re-uses the memory underlying the BO.
4. The CS is submitted to the hardware and accesses memory that is now 
already in use by somebody else, since there has been no update to the 
page tables to reflect the freed BO.

Obviously there's a user space bug in step 1, but the kernel must still 
prevent the conflicting memory accesses, and I don't see where it does.

amdgpu_gem_object_close takes a reservation of the BO and the page 
directory, but then simply backs off that reservation rather than adding 
a fence, which I suspect is necessary.

I believe that whenever we remove a BO from a VM, we must 
unconditionally add the most recent page directory fence(?) to the BO. 
Does that sound right?

Cheers,
Nicolai
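A minimal userspace toy model may help make the ordering in steps 1 to 4 and in the proposed fix concrete. Everything below (`toy_fence`, `toy_vm`, `toy_bo`, `toy_gem_object_close()`, `toy_memory_reusable()`) is hypothetical and only loosely mirrors the thread; it is not amdgpu code. As a simplifying assumption, every in-flight CS that uses the VM is taken to be ordered before a single "most recent page directory fence".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not amdgpu code. */
struct toy_fence {
    bool signaled;
};

struct toy_vm {
    /* Most recent fence on the page directory. Assumption: every in-flight
     * CS that uses this VM is ordered before it. */
    const struct toy_fence *last_pd_fence;
};

struct toy_bo {
    bool closed;                  /* user space has freed the BO */
    const struct toy_fence *wait; /* must signal before the memory is reused */
};

/* attach_pd_fence == false models the close path as described in the mail:
 * reserve, then back off without adding a fence.
 * attach_pd_fence == true models the proposal: fence the BO with the most
 * recent page directory fence. */
static void toy_gem_object_close(const struct toy_vm *vm, struct toy_bo *bo,
                                 bool attach_pd_fence)
{
    assert(!bo->closed); /* closing twice would be a bug in the model */
    bo->closed = true;
    bo->wait = attach_pd_fence ? vm->last_pd_fence : NULL;
}

/* The memory manager may hand the BO's pages to another task only once the
 * BO is closed and any attached fence has signaled. */
static bool toy_memory_reusable(const struct toy_bo *bo)
{
    return bo->closed && (bo->wait == NULL || bo->wait->signaled);
}
```

Without the fence, the BO's memory is reusable immediately even though a CS may still be pending (steps 3 and 4); with the fence attached, reuse waits until that fence signals.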

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: Question about page table updates at BO destroy
@ 2017-03-22 15:47 Christian König
From: Christian König @ 2017-03-22 15:47 UTC
  To: Nicolai Hähnle, amd-gfx mailing list

Hi Nicolai,

yeah, that is a known issue.

You don't necessarily need to add all fences from the PD to the released 
BO, but immediately starting to clear the PTEs would be a good idea.

amdgpu_gem_object_close() should call amdgpu_vm_clear_freed() if the 
PD/PT are swapped in at that moment.

This leaves only a very small window where the application could access 
freed-up memory while the PTEs are being cleared.

If we want to close even that window, we could let amdgpu_vm_clear_freed() 
return the fence of the clear operation and add that to the BO in question.

Regards,
Christian.
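The variant suggested here (start clearing the PTEs at close time and fence the BO with the clear operation) can be sketched with a toy model in the same spirit. `toy_vm_clear_freed()` is a hypothetical stand-in for amdgpu_vm_clear_freed() returning a fence, as proposed; the real function's signature and behaviour are not reproduced. The model also assumes that the GPU cannot use a VM whose page tables are swapped out until they are revalidated, so that case is left unfenced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not amdgpu code. */
#define TOY_NUM_PTES 8
#define TOY_NO_PAGE (-1)

struct toy_fence { bool signaled; };

struct toy_vm {
    int pte[TOY_NUM_PTES];        /* GPU page index -> physical page */
    bool pt_resident;             /* PD/PTs currently swapped in? */
    int pending_clear;            /* GPU page with a queued clear, or -1 */
    struct toy_fence clear_fence; /* fence of the queued clear */
};

struct toy_bo {
    int gpu_page;                 /* where the BO is mapped in the VM */
    const struct toy_fence *wait; /* must signal before memory reuse */
};

/* Stand-in for a clear_freed() that returns a fence: queue the PTE clear
 * and hand back the fence of that operation. */
static const struct toy_fence *toy_vm_clear_freed(struct toy_vm *vm, int gpu_page)
{
    assert(gpu_page >= 0 && gpu_page < TOY_NUM_PTES);
    assert(vm->pending_clear == TOY_NO_PAGE); /* one queued clear at a time */
    vm->pending_clear = gpu_page;
    vm->clear_fence.signaled = false;
    return &vm->clear_fence;
}

/* The GPU executes the queued page-table update and signals its fence. */
static void toy_gpu_run_clear(struct toy_vm *vm)
{
    if (vm->pending_clear != TOY_NO_PAGE) {
        vm->pte[vm->pending_clear] = TOY_NO_PAGE;
        vm->pending_clear = TOY_NO_PAGE;
    }
    vm->clear_fence.signaled = true;
}

/* Close path: if the page tables are resident, start clearing right away
 * and fence the BO with the clear, so its memory is not reused while the
 * stale PTE may still be visible to the GPU. Otherwise the clear is
 * deferred to a later page-table update (not modeled here). */
static void toy_gem_object_close(struct toy_vm *vm, struct toy_bo *bo)
{
    bo->wait = NULL;
    if (vm->pt_resident)
        bo->wait = toy_vm_clear_freed(vm, bo->gpu_page);
}

static bool toy_memory_reusable(const struct toy_bo *bo)
{
    return bo->wait == NULL || bo->wait->signaled;
}
```

Between the close and the completed clear, the stale PTE is still visible (the small window mentioned above), but the BO's memory only becomes reusable once the clear fence signals.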


* Re: Question about page table updates at BO destroy
@ 2017-03-23  2:26 Zhang, Jerry (Junwei)
From: Zhang, Jerry (Junwei) @ 2017-03-23  2:26 UTC
  To: Nicolai Hähnle, amd-gfx mailing list

On 03/22/2017 11:06 PM, Nicolai Hähnle wrote:
> Hi all,
>
> there's a bit of a puzzle where I'm wondering whether there's a subtle bug in
> the amdgpu kernel module.
>
> Basically, the concern is that a buggy user space driver might trigger a
> sequence like this:
>
> 1. Submit a CS that accesses some BO _without_ adding that BO to the buffer list.
> 2. Free that BO.

As I understand it, user space should call unmap when it frees a BO.
In that case, the unmap path calls amdgpu_gem_va_update_vm() to clear
the PTEs related to the BO.
Right?

Or are you imagining a scenario where there is no unmap at all?

Jerry

* Re: Question about page table updates at BO destroy
@ 2017-03-23 13:07 Nicolai Hähnle
From: Nicolai Hähnle @ 2017-03-23 13:07 UTC
  To: Zhang, Jerry (Junwei), amd-gfx mailing list

Hi Jerry,

On 23.03.2017 03:26, Zhang, Jerry (Junwei) wrote:
> On 03/22/2017 11:06 PM, Nicolai Hähnle wrote:
>> Hi all,
>>
>> there's a bit of a puzzle where I'm wondering whether there's a subtle
>> bug in the amdgpu kernel module.
>>
>> Basically, the concern is that a buggy user space driver might trigger a
>> sequence like this:
>>
>> 1. Submit a CS that accesses some BO _without_ adding that BO to the
>> buffer list.
>> 2. Free that BO.
>
> As I understand it, user space should call unmap when it frees a BO.
> In that case, the unmap path calls amdgpu_gem_va_update_vm() to clear
> the PTEs related to the BO.
> Right?
>
> Or are you imagining a scenario where there is no unmap at all?

I'm thinking of the scenario without an unmap, i.e. broken / malicious 
user space. I haven't looked into the unmap case yet, but I will. I have 
a WIP patch for this and will give it a proper test drive later.

Cheers,
Nicolai

-- 
Learn what the world is really like,
but never forget what it ought to be.

* Re: Question about page table updates at BO destroy
@ 2017-03-24  3:49 Zhang, Jerry (Junwei)
From: Zhang, Jerry (Junwei) @ 2017-03-24  3:49 UTC
  To: Nicolai Hähnle, amd-gfx mailing list

On 03/23/2017 09:07 PM, Nicolai Hähnle wrote:
> Hi Jerry,
>
> On 23.03.2017 03:26, Zhang, Jerry (Junwei) wrote:
>> On 03/22/2017 11:06 PM, Nicolai Hähnle wrote:
>>> Hi all,
>>>
>>> there's a bit of a puzzle where I'm wondering whether there's a subtle
>>> bug in the amdgpu kernel module.
>>>
>>> Basically, the concern is that a buggy user space driver might trigger a
>>> sequence like this:
>>>
>>> 1. Submit a CS that accesses some BO _without_ adding that BO to the
>>> buffer list.
>>> 2. Free that BO.
>>
>> As I understand it, user space should call unmap when it frees a BO.
>> In that case, the unmap path calls amdgpu_gem_va_update_vm() to clear
>> the PTEs related to the BO.
>> Right?
>>
>> Or are you imagining a scenario where there is no unmap at all?
>
> I'm thinking of the scenario without an unmap, i.e. broken / malicious user
> space. I haven't looked into the unmap case yet, but I will. I have a WIP
> patch for this and will give it a proper test drive later.

If so, it will happen.
I have reviewed them all.

Jerry


Thread overview: 5+ messages
2017-03-22 15:06 Question about page table updates at BO destroy Nicolai Hähnle
2017-03-22 15:47   ` Christian König
2017-03-23  2:26   ` Zhang, Jerry (Junwei)
2017-03-23 13:07       ` Nicolai Hähnle
2017-03-24  3:49           ` Zhang, Jerry (Junwei)
