All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-06-26 14:31 ` Michel Dänzer
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Dänzer @ 2018-06-26 14:31 UTC (permalink / raw)
  To: Sumit Semwal; +Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx

From: Michel Dänzer <michel.daenzer@amd.com>

Fixes the BUG_ON spuriously triggering under the following
circumstances:

* ttm_eu_reserve_buffers processes a list containing multiple BOs using
  the same reservation object, so it calls
  reservation_object_reserve_shared with that reservation object once
  for each such BO.
* In reservation_object_reserve_shared, old->shared_count ==
  old->shared_max - 1, so obj->staged is freed in preparation of an
  in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
  once for each of the BOs above, always with the same fence.
* The first call adds the fence in the remaining free slot, after which
  old->shared_count == old->shared_max.

In the next call to reservation_object_add_shared_fence, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since the fence is already in the
reservation object.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).

Cc: stable@vger.kernel.org
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/dma-buf/reservation.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..532545b9488e 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 	if (signaled) {
 		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
 	} else {
+		BUG_ON(fobj->shared_count >= fobj->shared_max);
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
 	}
@@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
 	old = reservation_object_get_list(obj);
 	obj->staged = NULL;
 
-	if (!fobj) {
-		BUG_ON(old->shared_count >= old->shared_max);
+	if (!fobj)
 		reservation_object_add_shared_inplace(obj, old, fence);
-	} else
+	else
 		reservation_object_add_shared_replace(obj, old, fobj, fence);
 }
 EXPORT_SYMBOL(reservation_object_add_shared_fence);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-06-26 14:31 ` Michel Dänzer
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Dänzer @ 2018-06-26 14:31 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-media-u79uwXL29TY76Z2rM5mHXA

From: Michel Dänzer <michel.daenzer@amd.com>

Fixes the BUG_ON spuriously triggering under the following
circumstances:

* ttm_eu_reserve_buffers processes a list containing multiple BOs using
  the same reservation object, so it calls
  reservation_object_reserve_shared with that reservation object once
  for each such BO.
* In reservation_object_reserve_shared, old->shared_count ==
  old->shared_max - 1, so obj->staged is freed in preparation of an
  in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
  once for each of the BOs above, always with the same fence.
* The first call adds the fence in the remaining free slot, after which
  old->shared_count == old->shared_max.

In the next call to reservation_object_add_shared_fence, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since the fence is already in the
reservation object.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).

Cc: stable@vger.kernel.org
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/dma-buf/reservation.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..532545b9488e 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 	if (signaled) {
 		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
 	} else {
+		BUG_ON(fobj->shared_count >= fobj->shared_max);
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
 	}
@@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
 	old = reservation_object_get_list(obj);
 	obj->staged = NULL;
 
-	if (!fobj) {
-		BUG_ON(old->shared_count >= old->shared_max);
+	if (!fobj)
 		reservation_object_add_shared_inplace(obj, old, fence);
-	} else
+	else
 		reservation_object_add_shared_replace(obj, old, fobj, fence);
 }
 EXPORT_SYMBOL(reservation_object_add_shared_fence);
-- 
2.18.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
  2018-06-26 14:31 ` Michel Dänzer
  (?)
@ 2018-06-27 11:50 ` Chris Wilson
  2018-06-27 16:04     ` Michel Dänzer
  -1 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2018-06-27 11:50 UTC (permalink / raw)
  To: Michel Dänzer, Sumit Semwal
  Cc: linaro-mm-sig, amd-gfx, linux-kernel, dri-devel, linux-media,
	Christian König

Quoting Michel Dänzer (2018-06-26 15:31:47)
> From: Michel Dänzer <michel.daenzer@amd.com>
> 
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
> 
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>   the same reservation object, so it calls
>   reservation_object_reserve_shared with that reservation object once
>   for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
>   old->shared_max - 1, so obj->staged is freed in preparation of an
>   in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>   once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
>   old->shared_count == old->shared_max.
> 
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
> 
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

I've convinced myself (or rather have not found a valid argument
against) that being able to call reserve_shared + add_shared multiple
times for the same fence is an intended part of reservation_object API 

I'd double check with Christian though.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

>  drivers/dma-buf/reservation.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>         if (signaled) {
>                 RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>         } else {
> +               BUG_ON(fobj->shared_count >= fobj->shared_max);

Personally I would just let kasan detect this and throw away the BUG_ON
or at least move it behind some DMABUF_BUG_ON().
-Chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-06-27 16:04     ` Michel Dänzer
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Dänzer @ 2018-06-27 16:04 UTC (permalink / raw)
  To: Chris Wilson, Sumit Semwal
  Cc: linux-kernel, dri-devel, linaro-mm-sig, amd-gfx,
	Christian König, linux-media

On 2018-06-27 01:50 PM, Chris Wilson wrote:
> Quoting Michel Dänzer (2018-06-26 15:31:47)
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>   the same reservation object, so it calls
>>   reservation_object_reserve_shared with that reservation object once
>>   for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>   old->shared_max - 1, so obj->staged is freed in preparation of an
>>   in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>   once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>   old->shared_count == old->shared_max.
>>
>> In the next call to reservation_object_add_shared_fence, the BUG_ON
>> triggers. However, nothing bad would happen in
>> reservation_object_add_shared_inplace, since the fence is already in the
>> reservation object.
>>
>> Prevent this by moving the BUG_ON to where an overflow would actually
>> happen (e.g. if a buggy caller didn't call
>> reservation_object_reserve_shared before).
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> 
> I've convinced myself (or rather have not found a valid argument
> against) that being able to call reserve_shared + add_shared multiple
> times for the same fence is an intended part of reservation_object API 
> 
> I'd double check with Christian though.

Right, I'm interested in Christian's feedback.


> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks!


>>  drivers/dma-buf/reservation.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index 314eb1071cce..532545b9488e 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>         if (signaled) {
>>                 RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>>         } else {
>> +               BUG_ON(fobj->shared_count >= fobj->shared_max);
> 
> Personally I would just let kasan detect this and throw away the BUG_ON
> or at least move it behind some DMABUF_BUG_ON().

Hmm. Normally, I'm not a fan of BUG(_ON) either. But in this case, it's
clear that the caller is buggy, and proceeding to write beyond the end
of the array could have far-reaching consequences. I'm leaving that to
somebody else.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-06-27 16:04     ` Michel Dänzer
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Dänzer @ 2018-06-27 16:04 UTC (permalink / raw)
  To: Chris Wilson, Sumit Semwal
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Christian König,
	linux-media-u79uwXL29TY76Z2rM5mHXA

On 2018-06-27 01:50 PM, Chris Wilson wrote:
> Quoting Michel Dänzer (2018-06-26 15:31:47)
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>   the same reservation object, so it calls
>>   reservation_object_reserve_shared with that reservation object once
>>   for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>   old->shared_max - 1, so obj->staged is freed in preparation of an
>>   in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>   once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>   old->shared_count == old->shared_max.
>>
>> In the next call to reservation_object_add_shared_fence, the BUG_ON
>> triggers. However, nothing bad would happen in
>> reservation_object_add_shared_inplace, since the fence is already in the
>> reservation object.
>>
>> Prevent this by moving the BUG_ON to where an overflow would actually
>> happen (e.g. if a buggy caller didn't call
>> reservation_object_reserve_shared before).
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> 
> I've convinced myself (or rather have not found a valid argument
> against) that being able to call reserve_shared + add_shared multiple
> times for the same fence is an intended part of reservation_object API 
> 
> I'd double check with Christian though.

Right, I'm interested in Christian's feedback.


> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks!


>>  drivers/dma-buf/reservation.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
>> index 314eb1071cce..532545b9488e 100644
>> --- a/drivers/dma-buf/reservation.c
>> +++ b/drivers/dma-buf/reservation.c
>> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>         if (signaled) {
>>                 RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>>         } else {
>> +               BUG_ON(fobj->shared_count >= fobj->shared_max);
> 
> Personally I would just let kasan detect this and throw away the BUG_ON
> or at least move it behind some DMABUF_BUG_ON().

Hmm. Normally, I'm not a fan of BUG(_ON) either. But in this case, it's
clear that the caller is buggy, and proceeding to write beyond the end
of the array could have far-reaching consequences. I'm leaving that to
somebody else.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-07-04  8:31   ` Christian König
  0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2018-07-04  8:31 UTC (permalink / raw)
  To: Michel Dänzer, Sumit Semwal
  Cc: linaro-mm-sig, amd-gfx, linux-kernel, dri-devel, linux-media

Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
>
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>    the same reservation object, so it calls
>    reservation_object_reserve_shared with that reservation object once
>    for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
>    old->shared_max - 1, so obj->staged is freed in preparation of an
>    in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>    once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
>    old->shared_count == old->shared_max.

Well, the explanation here is not correct. For multiple BOs using the 
same reservation object we won't call 
reservation_object_add_shared_fence() multiple times because we move 
those to the duplicates list in ttm_eu_reserve_buffers().

But this bug can still happen because we call 
reservation_object_add_shared_fence() manually with fences for the same 
context in a couple of places.

One prominent case which comes to my mind are for the VM BOs during 
updates. Another possibility are VRAM BOs which need to be cleared.

>
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>.

Regards,
Christian.

> ---
>   drivers/dma-buf/reservation.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>   	if (signaled) {
>   		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>   	} else {
> +		BUG_ON(fobj->shared_count >= fobj->shared_max);
>   		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
>   		fobj->shared_count++;
>   	}
> @@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
>   	old = reservation_object_get_list(obj);
>   	obj->staged = NULL;
>   
> -	if (!fobj) {
> -		BUG_ON(old->shared_count >= old->shared_max);
> +	if (!fobj)
>   		reservation_object_add_shared_inplace(obj, old, fence);
> -	} else
> +	else
>   		reservation_object_add_shared_replace(obj, old, fobj, fence);
>   }
>   EXPORT_SYMBOL(reservation_object_add_shared_fence);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-07-04  8:31   ` Christian König
  0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2018-07-04  8:31 UTC (permalink / raw)
  To: Michel Dänzer, Sumit Semwal
  Cc: linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-media-u79uwXL29TY76Z2rM5mHXA

Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> Fixes the BUG_ON spuriously triggering under the following
> circumstances:
>
> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>    the same reservation object, so it calls
>    reservation_object_reserve_shared with that reservation object once
>    for each such BO.
> * In reservation_object_reserve_shared, old->shared_count ==
>    old->shared_max - 1, so obj->staged is freed in preparation of an
>    in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>    once for each of the BOs above, always with the same fence.
> * The first call adds the fence in the remaining free slot, after which
>    old->shared_count == old->shared_max.

Well, the explanation here is not correct. For multiple BOs using the 
same reservation object we won't call 
reservation_object_add_shared_fence() multiple times because we move 
those to the duplicates list in ttm_eu_reserve_buffers().

But this bug can still happen because we call 
reservation_object_add_shared_fence() manually with fences for the same 
context in a couple of places.

One prominent case which comes to my mind are for the VM BOs during 
updates. Another possibility are VRAM BOs which need to be cleared.

>
> In the next call to reservation_object_add_shared_fence, the BUG_ON
> triggers. However, nothing bad would happen in
> reservation_object_add_shared_inplace, since the fence is already in the
> reservation object.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>.

Regards,
Christian.

> ---
>   drivers/dma-buf/reservation.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 314eb1071cce..532545b9488e 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>   	if (signaled) {
>   		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
>   	} else {
> +		BUG_ON(fobj->shared_count >= fobj->shared_max);
>   		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
>   		fobj->shared_count++;
>   	}
> @@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
>   	old = reservation_object_get_list(obj);
>   	obj->staged = NULL;
>   
> -	if (!fobj) {
> -		BUG_ON(old->shared_count >= old->shared_max);
> +	if (!fobj)
>   		reservation_object_add_shared_inplace(obj, old, fence);
> -	} else
> +	else
>   		reservation_object_add_shared_replace(obj, old, fobj, fence);
>   }
>   EXPORT_SYMBOL(reservation_object_add_shared_fence);

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
  2018-07-04  8:31   ` Christian König
@ 2018-07-04  9:09     ` Michel Dänzer
  -1 siblings, 0 replies; 13+ messages in thread
From: Michel Dänzer @ 2018-07-04  9:09 UTC (permalink / raw)
  To: christian.koenig, Sumit Semwal
  Cc: linaro-mm-sig, dri-devel, linux-kernel, amd-gfx, linux-media

On 2018-07-04 10:31 AM, Christian König wrote:
> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>    the same reservation object, so it calls
>>    reservation_object_reserve_shared with that reservation object once
>>    for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>    old->shared_max - 1, so obj->staged is freed in preparation of an
>>    in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>    once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>    old->shared_count == old->shared_max.
> 
> Well, the explanation here is not correct. For multiple BOs using the
> same reservation object we won't call
> reservation_object_add_shared_fence() multiple times because we move
> those to the duplicates list in ttm_eu_reserve_buffers().
> 
> But this bug can still happen because we call
> reservation_object_add_shared_fence() manually with fences for the same
> context in a couple of places.
> 
> One prominent case which comes to my mind are for the VM BOs during
> updates. Another possibility are VRAM BOs which need to be cleared.

Thanks. How about the following:

* ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
* In reservation_object_reserve_shared, shared_count == shared_max - 1,
  so obj->staged is freed in preparation of an in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
  after which shared_count == shared_max.
* The amdgpu driver also calls reservation_object_add_shared_fence for
  the same reservation object, and the BUG_ON triggers.

However, nothing bad would happen in
reservation_object_add_shared_inplace, since all fences use the same
context, so they can only occupy a single slot.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).


Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
as I suspect this fix is necessary under the circumstances described
there as well.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-07-04  9:09     ` Michel Dänzer
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Dänzer @ 2018-07-04  9:09 UTC (permalink / raw)
  To: christian.koenig, Sumit Semwal
  Cc: linaro-mm-sig, amd-gfx, linux-kernel, dri-devel, linux-media

On 2018-07-04 10:31 AM, Christian König wrote:
> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> Fixes the BUG_ON spuriously triggering under the following
>> circumstances:
>>
>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>    the same reservation object, so it calls
>>    reservation_object_reserve_shared with that reservation object once
>>    for each such BO.
>> * In reservation_object_reserve_shared, old->shared_count ==
>>    old->shared_max - 1, so obj->staged is freed in preparation of an
>>    in-place update.
>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>    once for each of the BOs above, always with the same fence.
>> * The first call adds the fence in the remaining free slot, after which
>>    old->shared_count == old->shared_max.
> 
> Well, the explanation here is not correct. For multiple BOs using the
> same reservation object we won't call
> reservation_object_add_shared_fence() multiple times because we move
> those to the duplicates list in ttm_eu_reserve_buffers().
> 
> But this bug can still happen because we call
> reservation_object_add_shared_fence() manually with fences for the same
> context in a couple of places.
> 
> One prominent case which comes to my mind are for the VM BOs during
> updates. Another possibility are VRAM BOs which need to be cleared.

Thanks. How about the following:

* ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
* In reservation_object_reserve_shared, shared_count == shared_max - 1,
  so obj->staged is freed in preparation of an in-place update.
* ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
  after which shared_count == shared_max.
* The amdgpu driver also calls reservation_object_add_shared_fence for
  the same reservation object, and the BUG_ON triggers.

However, nothing bad would happen in
reservation_object_add_shared_inplace, since all fences use the same
context, so they can only occupy a single slot.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).


Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
as I suspect this fix is necessary under the circumstances described
there as well.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-07-04  9:30       ` Christian König
  0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2018-07-04  9:30 UTC (permalink / raw)
  To: Michel Dänzer, Sumit Semwal
  Cc: linaro-mm-sig, dri-devel, linux-kernel, amd-gfx, linux-media

Am 04.07.2018 um 11:09 schrieb Michel Dänzer:
> On 2018-07-04 10:31 AM, Christian König wrote:
>> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> Fixes the BUG_ON spuriously triggering under the following
>>> circumstances:
>>>
>>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>>     the same reservation object, so it calls
>>>     reservation_object_reserve_shared with that reservation object once
>>>     for each such BO.
>>> * In reservation_object_reserve_shared, old->shared_count ==
>>>     old->shared_max - 1, so obj->staged is freed in preparation of an
>>>     in-place update.
>>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>>     once for each of the BOs above, always with the same fence.
>>> * The first call adds the fence in the remaining free slot, after which
>>>     old->shared_count == old->shared_max.
>> Well, the explanation here is not correct. For multiple BOs using the
>> same reservation object we won't call
>> reservation_object_add_shared_fence() multiple times because we move
>> those to the duplicates list in ttm_eu_reserve_buffers().
>>
>> But this bug can still happen because we call
>> reservation_object_add_shared_fence() manually with fences for the same
>> context in a couple of places.
>>
>> One prominent case which comes to my mind are for the VM BOs during
>> updates. Another possibility are VRAM BOs which need to be cleared.
> Thanks. How about the following:
>
> * ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
> * In reservation_object_reserve_shared, shared_count == shared_max - 1,
>    so obj->staged is freed in preparation of an in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
>    after which shared_count == shared_max.
> * The amdgpu driver also calls reservation_object_add_shared_fence for
>    the same reservation object, and the BUG_ON triggers.

I would rather completely drop the reference to the ttm_eu_* functions, 
cause those wrappers are completely unrelated to the problem.

Instead let's just note something like the following:

* When reservation_object_reserve_shared is called with shared_count == 
shared_max - 1,
   so obj->staged is freed in preparation of an in-place update.

* Now reservation_object_add_shared_fence is called with the first fence 
and after that shared_count == shared_max.

* After that  reservation_object_add_shared_fence can be called with 
follow up fences from the same context, but since shared_count == 
shared_max we would run into this BUG_ON.

> However, nothing bad would happen in
> reservation_object_add_shared_inplace, since all fences use the same
> context, so they can only occupy a single slot.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
>
> Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
> as I suspect this fix is necessary under the circumstances described
> there as well.

The rest sounds good to me.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-07-04  9:30       ` Christian König
  0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2018-07-04  9:30 UTC (permalink / raw)
  To: Michel Dänzer, Sumit Semwal
  Cc: linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-media-u79uwXL29TY76Z2rM5mHXA

Am 04.07.2018 um 11:09 schrieb Michel Dänzer:
> On 2018-07-04 10:31 AM, Christian König wrote:
>> Am 26.06.2018 um 16:31 schrieb Michel Dänzer:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> Fixes the BUG_ON spuriously triggering under the following
>>> circumstances:
>>>
>>> * ttm_eu_reserve_buffers processes a list containing multiple BOs using
>>>     the same reservation object, so it calls
>>>     reservation_object_reserve_shared with that reservation object once
>>>     for each such BO.
>>> * In reservation_object_reserve_shared, old->shared_count ==
>>>     old->shared_max - 1, so obj->staged is freed in preparation of an
>>>     in-place update.
>>> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence
>>>     once for each of the BOs above, always with the same fence.
>>> * The first call adds the fence in the remaining free slot, after which
>>>     old->shared_count == old->shared_max.
>> Well, the explanation here is not correct. For multiple BOs using the
>> same reservation object we won't call
>> reservation_object_add_shared_fence() multiple times because we move
>> those to the duplicates list in ttm_eu_reserve_buffers().
>>
>> But this bug can still happen because we call
>> reservation_object_add_shared_fence() manually with fences for the same
>> context in a couple of places.
>>
>> One prominent case which comes to my mind are for the VM BOs during
>> updates. Another possibility are VRAM BOs which need to be cleared.
> Thanks. How about the following:
>
> * ttm_eu_reserve_buffers calls reservation_object_reserve_shared.
> * In reservation_object_reserve_shared, shared_count == shared_max - 1,
>    so obj->staged is freed in preparation of an in-place update.
> * ttm_eu_fence_buffer_objects calls reservation_object_add_shared_fence,
>    after which shared_count == shared_max.
> * The amdgpu driver also calls reservation_object_add_shared_fence for
>    the same reservation object, and the BUG_ON triggers.

I would rather completely drop the reference to the ttm_eu_* functions, 
cause those wrappers are completely unrelated to the problem.

Instead let's just note something like the following:

* When reservation_object_reserve_shared is called with shared_count == 
shared_max - 1,
   so obj->staged is freed in preparation of an in-place update.

* Now reservation_object_add_shared_fence is called with the first fence 
and after that shared_count == shared_max.

* After that  reservation_object_add_shared_fence can be called with 
follow up fences from the same context, but since shared_count == 
shared_max we would run into this BUG_ON.

> However, nothing bad would happen in
> reservation_object_add_shared_inplace, since all fences use the same
> context, so they can only occupy a single slot.
>
> Prevent this by moving the BUG_ON to where an overflow would actually
> happen (e.g. if a buggy caller didn't call
> reservation_object_reserve_shared before).
>
>
> Also, I'll add a reference to https://bugs.freedesktop.org/106418 in v2,
> as I suspect this fix is necessary under the circumstances described
> there as well.

The rest sounds good to me.

Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-07-04 15:14   ` Michel Dänzer
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Dänzer @ 2018-07-04 15:14 UTC (permalink / raw)
  To: Christian König, Sumit Semwal
  Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, amd-gfx

From: Michel Dänzer <michel.daenzer@amd.com>

Fixes the BUG_ON spuriously triggering under the following
circumstances:

* reservation_object_reserve_shared is called with shared_count ==
  shared_max - 1, so obj->staged is freed in preparation of an in-place
  update.

* reservation_object_add_shared_fence is called with the first fence,
  after which shared_count == shared_max.

* reservation_object_add_shared_fence is called with a follow-up fence
  from the same context.

In the second reservation_object_add_shared_fence call, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since both fences are from the
same context, so they only occupy a single slot.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).

v2:
* Fix description of breaking scenario (Christian König)
* Add bugzilla reference

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/106418
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
Reviewed-by: Christian König <christian.koenig@amd.com> # v1
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/dma-buf/reservation.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..532545b9488e 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 	if (signaled) {
 		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
 	} else {
+		BUG_ON(fobj->shared_count >= fobj->shared_max);
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
 	}
@@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
 	old = reservation_object_get_list(obj);
 	obj->staged = NULL;
 
-	if (!fobj) {
-		BUG_ON(old->shared_count >= old->shared_max);
+	if (!fobj)
 		reservation_object_add_shared_inplace(obj, old, fence);
-	} else
+	else
 		reservation_object_add_shared_replace(obj, old, fobj, fence);
 }
 EXPORT_SYMBOL(reservation_object_add_shared_fence);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace
@ 2018-07-04 15:14   ` Michel Dänzer
  0 siblings, 0 replies; 13+ messages in thread
From: Michel Dänzer @ 2018-07-04 15:14 UTC (permalink / raw)
  To: Christian König, Sumit Semwal
  Cc: linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-media-u79uwXL29TY76Z2rM5mHXA

From: Michel Dänzer <michel.daenzer@amd.com>

Fixes the BUG_ON spuriously triggering under the following
circumstances:

* reservation_object_reserve_shared is called with shared_count ==
  shared_max - 1, so obj->staged is freed in preparation of an in-place
  update.

* reservation_object_add_shared_fence is called with the first fence,
  after which shared_count == shared_max.

* reservation_object_add_shared_fence is called with a follow-up fence
  from the same context.

In the second reservation_object_add_shared_fence call, the BUG_ON
triggers. However, nothing bad would happen in
reservation_object_add_shared_inplace, since both fences are from the
same context, so they only occupy a single slot.

Prevent this by moving the BUG_ON to where an overflow would actually
happen (e.g. if a buggy caller didn't call
reservation_object_reserve_shared before).

v2:
* Fix description of breaking scenario (Christian König)
* Add bugzilla reference

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/106418
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
Reviewed-by: Christian König <christian.koenig@amd.com> # v1
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/dma-buf/reservation.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..532545b9488e 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -141,6 +141,7 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 	if (signaled) {
 		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
 	} else {
+		BUG_ON(fobj->shared_count >= fobj->shared_max);
 		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
 		fobj->shared_count++;
 	}
@@ -230,10 +231,9 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
 	old = reservation_object_get_list(obj);
 	obj->staged = NULL;
 
-	if (!fobj) {
-		BUG_ON(old->shared_count >= old->shared_max);
+	if (!fobj)
 		reservation_object_add_shared_inplace(obj, old, fence);
-	} else
+	else
 		reservation_object_add_shared_replace(obj, old, fobj, fence);
 }
 EXPORT_SYMBOL(reservation_object_add_shared_fence);
-- 
2.18.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-07-04 15:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-26 14:31 [PATCH] dma-buf: Move BUG_ON from _add_shared_fence to _add_shared_inplace Michel Dänzer
2018-06-26 14:31 ` Michel Dänzer
2018-06-27 11:50 ` Chris Wilson
2018-06-27 16:04   ` Michel Dänzer
2018-06-27 16:04     ` Michel Dänzer
2018-07-04  8:31 ` Christian König
2018-07-04  8:31   ` Christian König
2018-07-04  9:09   ` Michel Dänzer
2018-07-04  9:09     ` Michel Dänzer
2018-07-04  9:30     ` Christian König
2018-07-04  9:30       ` Christian König
2018-07-04 15:14 ` [PATCH v2] " Michel Dänzer
2018-07-04 15:14   ` Michel Dänzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.