All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Two bugs found while rebasing my HMM branch
@ 2021-03-13  2:43 Felix Kuehling
  2021-03-13  2:43 ` [PATCH 1/2] drm/amdkfd: Fix recursive lock warnings Felix Kuehling
  2021-03-13  2:43 ` [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization Felix Kuehling
  0 siblings, 2 replies; 7+ messages in thread
From: Felix Kuehling @ 2021-03-13  2:43 UTC (permalink / raw)
  To: amd-gfx

Felix Kuehling (2):
  drm/amdkfd: Fix recursive lock warnings
  drm/amdkfd: Fix resource cursor initialization

 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c         | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

-- 
2.30.2

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] drm/amdkfd: Fix recursive lock warnings
  2021-03-13  2:43 [PATCH 0/2] Two bugs found while rebasing my HMM branch Felix Kuehling
@ 2021-03-13  2:43 ` Felix Kuehling
  2021-03-15 10:17   ` Christian König
  2021-03-13  2:43 ` [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization Felix Kuehling
  1 sibling, 1 reply; 7+ messages in thread
From: Felix Kuehling @ 2021-03-13  2:43 UTC (permalink / raw)
  To: amd-gfx

memalloc_nofs_save/restore are no longer sufficient to prevent recursive
lock warnings when holding locks that can be taken in MMU notifiers. Use
memalloc_noreclaim_save/restore instead.

Fixes: f920e413ff9c ("mm: track mmu notifiers in fs_reclaim_acquire/release")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 318eeea577b5..bc3951b71079 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -93,13 +93,13 @@ struct amdgpu_prt_cb {
 static inline void amdgpu_vm_eviction_lock(struct amdgpu_vm *vm)
 {
 	mutex_lock(&vm->eviction_lock);
-	vm->saved_flags = memalloc_nofs_save();
+	vm->saved_flags = memalloc_noreclaim_save();
 }
 
 static inline int amdgpu_vm_eviction_trylock(struct amdgpu_vm *vm)
 {
 	if (mutex_trylock(&vm->eviction_lock)) {
-		vm->saved_flags = memalloc_nofs_save();
+		vm->saved_flags = memalloc_noreclaim_save();
 		return 1;
 	}
 	return 0;
@@ -107,7 +107,7 @@ static inline int amdgpu_vm_eviction_trylock(struct amdgpu_vm *vm)
 
 static inline void amdgpu_vm_eviction_unlock(struct amdgpu_vm *vm)
 {
-	memalloc_nofs_restore(vm->saved_flags);
+	memalloc_noreclaim_restore(vm->saved_flags);
 	mutex_unlock(&vm->eviction_lock);
 }
 
-- 
2.30.2

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization
  2021-03-13  2:43 [PATCH 0/2] Two bugs found while rebasing my HMM branch Felix Kuehling
  2021-03-13  2:43 ` [PATCH 1/2] drm/amdkfd: Fix recursive lock warnings Felix Kuehling
@ 2021-03-13  2:43 ` Felix Kuehling
  2021-03-15 10:22   ` Christian König
  1 sibling, 1 reply; 7+ messages in thread
From: Felix Kuehling @ 2021-03-13  2:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Christian König

Make sure the cur->size doesn't exceed cur->remaining. Otherwise the
first call to amdgpu_res_next will trigger the BUG_ON in that function.

Fixes: 3af0a018a728 ("drm/amdgpu: new resource cursor")
CC: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 1335e098510f..b49a61d07d60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -68,7 +68,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
 		start -= node++->size << PAGE_SHIFT;
 
 	cur->start = (node->start << PAGE_SHIFT) + start;
-	cur->size = (node->size << PAGE_SHIFT) - start;
+	cur->size = min((node->size << PAGE_SHIFT) - start, size);
 	cur->remaining = size;
 	cur->node = node;
 }
-- 
2.30.2

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] drm/amdkfd: Fix recursive lock warnings
  2021-03-13  2:43 ` [PATCH 1/2] drm/amdkfd: Fix recursive lock warnings Felix Kuehling
@ 2021-03-15 10:17   ` Christian König
  0 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2021-03-15 10:17 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx

Am 13.03.21 um 03:43 schrieb Felix Kuehling:
> memalloc_nofs_save/restore are no longer sufficient to prevent recursive
> lock warnings when holding locks that can be taken in MMU notifiers. Use
> memalloc_noreclaim_save/restore instead.
>
> Fixes: f920e413ff9c ("mm: track mmu notifiers in fs_reclaim_acquire/release")
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com> for this one.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 318eeea577b5..bc3951b71079 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -93,13 +93,13 @@ struct amdgpu_prt_cb {
>   static inline void amdgpu_vm_eviction_lock(struct amdgpu_vm *vm)
>   {
>   	mutex_lock(&vm->eviction_lock);
> -	vm->saved_flags = memalloc_nofs_save();
> +	vm->saved_flags = memalloc_noreclaim_save();
>   }
>   
>   static inline int amdgpu_vm_eviction_trylock(struct amdgpu_vm *vm)
>   {
>   	if (mutex_trylock(&vm->eviction_lock)) {
> -		vm->saved_flags = memalloc_nofs_save();
> +		vm->saved_flags = memalloc_noreclaim_save();
>   		return 1;
>   	}
>   	return 0;
> @@ -107,7 +107,7 @@ static inline int amdgpu_vm_eviction_trylock(struct amdgpu_vm *vm)
>   
>   static inline void amdgpu_vm_eviction_unlock(struct amdgpu_vm *vm)
>   {
> -	memalloc_nofs_restore(vm->saved_flags);
> +	memalloc_noreclaim_restore(vm->saved_flags);
>   	mutex_unlock(&vm->eviction_lock);
>   }
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization
  2021-03-13  2:43 ` [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization Felix Kuehling
@ 2021-03-15 10:22   ` Christian König
  2021-03-15 13:08     ` Felix Kuehling
  0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2021-03-15 10:22 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx; +Cc: Christian König

Am 13.03.21 um 03:43 schrieb Felix Kuehling:
> Make sure the cur->size doesn't exceed cur->remaining. Otherwise the
> first call to amdgpu_res_next will trigger the BUG_ON in that function.

Mhm the BUG_ON is correct since the function complains that we want to 
move the cursor forward by more than originally expected.

The problem is rather that somebody is using cur->size which is the size 
of the current segment as parameter for amdgpu_res_next().

Do you have a backtrace of this?

Thanks,
Christian.

>
> Fixes: 3af0a018a728 ("drm/amdgpu: new resource cursor")
> CC: Christian König <christian.koenig@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> index 1335e098510f..b49a61d07d60 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> @@ -68,7 +68,7 @@ static inline void amdgpu_res_first(struct ttm_resource *res,
>   		start -= node++->size << PAGE_SHIFT;
>   
>   	cur->start = (node->start << PAGE_SHIFT) + start;
> -	cur->size = (node->size << PAGE_SHIFT) - start;
> +	cur->size = min((node->size << PAGE_SHIFT) - start, size);
>   	cur->remaining = size;
>   	cur->node = node;
>   }

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization
  2021-03-15 10:22   ` Christian König
@ 2021-03-15 13:08     ` Felix Kuehling
  2021-03-15 13:15       ` Christian König
  0 siblings, 1 reply; 7+ messages in thread
From: Felix Kuehling @ 2021-03-15 13:08 UTC (permalink / raw)
  To: Christian König, amd-gfx; +Cc: Christian König


Am 2021-03-15 um 6:22 a.m. schrieb Christian König:
> Am 13.03.21 um 03:43 schrieb Felix Kuehling:
>> Make sure the cur->size doesn't exceed cur->remaining. Otherwise the
>> first call to amdgpu_res_next will trigger the BUG_ON in that function.
>
> Mhm the BUG_ON is correct since the function complains that we want to
> move the cursor forward by more than originally expected.
>
> The problem is rather that somebody is using cur->size which is the
> size of the current segment as parameter for amdgpu_res_next().
>
> Do you have a backtrace of this?
I didn't save the backtrace. The problem was in
amdgpu_vm_bo_update_mapping. num_entries is based on cursor.size and
later used in the amdpu_res_next call.

But I think that should be OK. If cursor.size is the current segment
size, it should not exceed cursor.remaining. Otherwise every user of the
cursor would have to check both cursor.size and cursor.remaining all the
time, which would be inconvenient. amdgpu_res_next ensures that with
cur->size = min(node->size << PAGE_SHIFT, cur->remaining). I think
amdgpu_res_first should do the same.

Regards,
  Felix


>
> Thanks,
> Christian.
>
>>
>> Fixes: 3af0a018a728 ("drm/amdgpu: new resource cursor")
>> CC: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> index 1335e098510f..b49a61d07d60 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> @@ -68,7 +68,7 @@ static inline void amdgpu_res_first(struct
>> ttm_resource *res,
>>           start -= node++->size << PAGE_SHIFT;
>>         cur->start = (node->start << PAGE_SHIFT) + start;
>> -    cur->size = (node->size << PAGE_SHIFT) - start;
>> +    cur->size = min((node->size << PAGE_SHIFT) - start, size);
>>       cur->remaining = size;
>>       cur->node = node;
>>   }
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization
  2021-03-15 13:08     ` Felix Kuehling
@ 2021-03-15 13:15       ` Christian König
  0 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2021-03-15 13:15 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx; +Cc: Christian König



Am 15.03.21 um 14:08 schrieb Felix Kuehling:
> Am 2021-03-15 um 6:22 a.m. schrieb Christian König:
>> Am 13.03.21 um 03:43 schrieb Felix Kuehling:
>>> Make sure the cur->size doesn't exceed cur->remaining. Otherwise the
>>> first call to amdgpu_res_next will trigger the BUG_ON in that function.
>> Mhm the BUG_ON is correct since the function complains that we want to
>> move the cursor forward by more than originally expected.
>>
>> The problem is rather that somebody is using cur->size which is the
>> size of the current segment as parameter for amdgpu_res_next().
>>
>> Do you have a backtrace of this?
> I didn't save the backtrace. The problem was in
> amdgpu_vm_bo_update_mapping. num_entries is based on cursor.size and
> later used in the amdpu_res_next call.

Yeah, found that in the meantime as well.

> But I think that should be OK. If cursor.size is the current segment
> size, it should not exceed cursor.remaining. Otherwise every user of the
> cursor would have to check both cursor.size and cursor.remaining all the
> time, which would be inconvenient. amdgpu_res_next ensures that with
> cur->size = min(node->size << PAGE_SHIFT, cur->remaining). I think
> amdgpu_res_first should do the same.

Ok, good point. Patch is Reviewed-by: Christian König 
<christian.koenig@amd.com> then.

Regards,
Christian.

>
> Regards,
>    Felix
>
>
>> Thanks,
>> Christian.
>>
>>> Fixes: 3af0a018a728 ("drm/amdgpu: new resource cursor")
>>> CC: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>>> index 1335e098510f..b49a61d07d60 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>>> @@ -68,7 +68,7 @@ static inline void amdgpu_res_first(struct
>>> ttm_resource *res,
>>>            start -= node++->size << PAGE_SHIFT;
>>>          cur->start = (node->start << PAGE_SHIFT) + start;
>>> -    cur->size = (node->size << PAGE_SHIFT) - start;
>>> +    cur->size = min((node->size << PAGE_SHIFT) - start, size);
>>>        cur->remaining = size;
>>>        cur->node = node;
>>>    }

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-03-15 13:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-13  2:43 [PATCH 0/2] Two bugs found while rebasing my HMM branch Felix Kuehling
2021-03-13  2:43 ` [PATCH 1/2] drm/amdkfd: Fix recursive lock warnings Felix Kuehling
2021-03-15 10:17   ` Christian König
2021-03-13  2:43 ` [PATCH 2/2] drm/amdkfd: Fix resource cursor initialization Felix Kuehling
2021-03-15 10:22   ` Christian König
2021-03-15 13:08     ` Felix Kuehling
2021-03-15 13:15       ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.