* [PATCH v4 0/2] fix gmc page fault on navi1X
@ 2020-03-13 16:09 xinhui pan
  2020-03-13 16:09 ` [PATCH v4 1/2] drm_amdgpu: Add job fence to resv conditionally xinhui pan
  2020-03-13 16:09 ` [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit xinhui pan
  0 siblings, 2 replies; 10+ messages in thread
From: xinhui pan @ 2020-03-13 16:09 UTC (permalink / raw)
  To: amd-gfx; +Cc: xinhui pan

We hit a gmc page fault on navi1X.
UMR reports that the physical address of the PTE is bad.

Two issues:
1) We did not sync the job scheduler fence while updating a mapping.
Fix it by adding the bo fence to the resv after every job submit.

2) We might unref a page table bo during update_ptes while a job is
still pending on that bo, and then submit another job in commit after
the bo is freed. We need to free the bo only after all fences have been
added to it.

Changes from v3:
use vm to get the root bo resv.
add the vm zombies list head.

Changes from v2:
use the correct page table bo resv.

Changes from v1:
fix a rebase issue.

xinhui pan (2):
  drm_amdgpu: Add job fence to resv conditionally
  drm/amdgpu: unref pt bo after job submit

 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      | 24 ++++++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  4 +++-
 3 files changed, 27 insertions(+), 4 deletions(-)

-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH v4 1/2] drm_amdgpu: Add job fence to resv conditionally
  2020-03-13 16:09 [PATCH v4 0/2] fix gmc page fault on navi1X xinhui pan
@ 2020-03-13 16:09 ` xinhui pan
  2020-03-13 16:09 ` [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit xinhui pan
  1 sibling, 0 replies; 10+ messages in thread
From: xinhui pan @ 2020-03-13 16:09 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, xinhui pan, Christian König

The job fence on a page table update should be a shared fence, so add
it to the root page table bo's resv.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 4cc7881f438c..c094654b233c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -107,8 +107,10 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	tmp = dma_fence_get(f);
 	if (p->direct)
 		swap(p->vm->last_direct, tmp);
-	else
+	else {
+		dma_resv_add_shared_fence(p->vm->root.base.bo->tbo.base.resv, tmp);
 		swap(p->vm->last_delayed, tmp);
+	}
 	dma_fence_put(tmp);
 
 	if (fence && !p->direct)
-- 
2.17.1



* [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
  2020-03-13 16:09 [PATCH v4 0/2] fix gmc page fault on navi1X xinhui pan
  2020-03-13 16:09 ` [PATCH v4 1/2] drm_amdgpu: Add job fence to resv conditionally xinhui pan
@ 2020-03-13 16:09 ` xinhui pan
  2020-03-13 17:36   ` Felix Kuehling
  1 sibling, 1 reply; 10+ messages in thread
From: xinhui pan @ 2020-03-13 16:09 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, xinhui pan, Christian König

Freeing the page table bo before the job submit is unsafe:
we might touch invalid memory while the job is running.

We now have an individualized bo resv during bo release,
so any fences added to the root PT bo are not actually honored when
a normal PT bo is released.

We might hit a gmc page fault, or memory could get overwritten.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 73398831196f..346e2f753474 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -937,6 +937,21 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
 	return r;
 }
 
+static void amdgpu_vm_free_zombie_bo(struct amdgpu_device *adev,
+		struct amdgpu_vm *vm)
+{
+	struct amdgpu_vm_pt *entry;
+
+	while (!list_empty(&vm->zombies)) {
+		entry = list_first_entry(&vm->zombies, struct amdgpu_vm_pt,
+				base.vm_status);
+		list_del(&entry->base.vm_status);
+
+		amdgpu_bo_unref(&entry->base.bo->shadow);
+		amdgpu_bo_unref(&entry->base.bo);
+	}
+}
+
 /**
  * amdgpu_vm_free_table - fre one PD/PT
  *
@@ -945,10 +960,9 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
 static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
 {
 	if (entry->base.bo) {
+		list_move(&entry->base.vm_status,
+				&entry->base.bo->vm_bo->vm->zombies);
 		entry->base.bo->vm_bo = NULL;
-		list_del(&entry->base.vm_status);
-		amdgpu_bo_unref(&entry->base.bo->shadow);
-		amdgpu_bo_unref(&entry->base.bo);
 	}
 	kvfree(entry->entries);
 	entry->entries = NULL;
@@ -1624,6 +1638,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	r = vm->update_funcs->commit(&params, fence);
 
 error_unlock:
+	amdgpu_vm_free_zombie_bo(adev, vm);
 	amdgpu_vm_eviction_unlock(vm);
 	return r;
 }
@@ -2807,6 +2822,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	INIT_LIST_HEAD(&vm->invalidated);
 	spin_lock_init(&vm->invalidated_lock);
 	INIT_LIST_HEAD(&vm->freed);
+	INIT_LIST_HEAD(&vm->zombies);
 
 
 	/* create scheduler entities for page table updates */
@@ -3119,6 +3135,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	}
 
 	amdgpu_vm_free_pts(adev, vm, NULL);
+	amdgpu_vm_free_zombie_bo(adev, vm);
+
 	amdgpu_bo_unreserve(root);
 	amdgpu_bo_unref(&root);
 	WARN_ON(vm->root.base.bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index b5705fcfc935..9baf44fa16f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -269,6 +269,9 @@ struct amdgpu_vm {
 	/* BO mappings freed, but not yet updated in the PT */
 	struct list_head	freed;
 
+	/* BO will be freed soon */
+	struct list_head	zombies;
+
 	/* contains the page directory */
 	struct amdgpu_vm_pt     root;
 	struct dma_fence	*last_update;
-- 
2.17.1



* Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
  2020-03-13 16:09 ` [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit xinhui pan
@ 2020-03-13 17:36   ` Felix Kuehling
  2020-03-13 18:05     ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Felix Kuehling @ 2020-03-13 17:36 UTC (permalink / raw)
  To: xinhui pan, amd-gfx; +Cc: Alex Deucher, Christian König

This seems weird. This means that we update a page table, and then free 
it in the same amdgpu_vm_update_ptes call? That means the update is 
redundant. Can we eliminate the redundant PTE update if the page table 
is about to be freed anyway?

Regards,
   Felix

On 2020-03-13 12:09, xinhui pan wrote:
> Free page table bo before job submit is insane.
> We might touch invalid memory while job is runnig.
>
> we now have individualized bo resv during bo releasing.
> So any fences added to root PT bo is actually untested when
> a normal PT bo is releasing.
>
> We might hit gmc page fault or memory just got overwrited.
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++++++++++++++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +++
>   2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 73398831196f..346e2f753474 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -937,6 +937,21 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>   	return r;
>   }
>   
> +static void amdgpu_vm_free_zombie_bo(struct amdgpu_device *adev,
> +		struct amdgpu_vm *vm)
> +{
> +	struct amdgpu_vm_pt *entry;
> +
> +	while (!list_empty(&vm->zombies)) {
> +		entry = list_first_entry(&vm->zombies, struct amdgpu_vm_pt,
> +				base.vm_status);
> +		list_del(&entry->base.vm_status);
> +
> +		amdgpu_bo_unref(&entry->base.bo->shadow);
> +		amdgpu_bo_unref(&entry->base.bo);
> +	}
> +}
> +
>   /**
>    * amdgpu_vm_free_table - fre one PD/PT
>    *
> @@ -945,10 +960,9 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>   static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
>   {
>   	if (entry->base.bo) {
> +		list_move(&entry->base.vm_status,
> +				&entry->base.bo->vm_bo->vm->zombies);
>   		entry->base.bo->vm_bo = NULL;
> -		list_del(&entry->base.vm_status);
> -		amdgpu_bo_unref(&entry->base.bo->shadow);
> -		amdgpu_bo_unref(&entry->base.bo);
>   	}
>   	kvfree(entry->entries);
>   	entry->entries = NULL;
> @@ -1624,6 +1638,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>   	r = vm->update_funcs->commit(&params, fence);
>   
>   error_unlock:
> +	amdgpu_vm_free_zombie_bo(adev, vm);
>   	amdgpu_vm_eviction_unlock(vm);
>   	return r;
>   }
> @@ -2807,6 +2822,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   	INIT_LIST_HEAD(&vm->invalidated);
>   	spin_lock_init(&vm->invalidated_lock);
>   	INIT_LIST_HEAD(&vm->freed);
> +	INIT_LIST_HEAD(&vm->zombies);
>   
>   
>   	/* create scheduler entities for page table updates */
> @@ -3119,6 +3135,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>   	}
>   
>   	amdgpu_vm_free_pts(adev, vm, NULL);
> +	amdgpu_vm_free_zombie_bo(adev, vm);
> +
>   	amdgpu_bo_unreserve(root);
>   	amdgpu_bo_unref(&root);
>   	WARN_ON(vm->root.base.bo);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index b5705fcfc935..9baf44fa16f0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -269,6 +269,9 @@ struct amdgpu_vm {
>   	/* BO mappings freed, but not yet updated in the PT */
>   	struct list_head	freed;
>   
> +	/* BO will be freed soon */
> +	struct list_head	zombies;
> +
>   	/* contains the page directory */
>   	struct amdgpu_vm_pt     root;
>   	struct dma_fence	*last_update;


* Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
  2020-03-13 17:36   ` Felix Kuehling
@ 2020-03-13 18:05     ` Christian König
  2020-03-14 13:06       ` Pan, Xinhui
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2020-03-13 18:05 UTC (permalink / raw)
  To: Felix Kuehling, xinhui pan, amd-gfx; +Cc: Alex Deucher

The page table is not updated and then freed. A higher level PDE is
updated, and because of this the lower level page table is freed.

Without this it could be that the memory backing the freed page table is
reused while the PDE is still pointing to it.

Rather unlikely that this causes problems, but better safe than sorry.

Regards,
Christian.

Am 13.03.20 um 18:36 schrieb Felix Kuehling:
> This seems weird. This means that we update a page table, and then 
> free it in the same amdgpu_vm_update_ptes call? That means the update 
> is redundant. Can we eliminate the redundant PTE update if the page 
> table is about to be freed anyway?
>
> Regards,
>   Felix
>
> On 2020-03-13 12:09, xinhui pan wrote:
>> Free page table bo before job submit is insane.
>> We might touch invalid memory while job is runnig.
>>
>> we now have individualized bo resv during bo releasing.
>> So any fences added to root PT bo is actually untested when
>> a normal PT bo is releasing.
>>
>> We might hit gmc page fault or memory just got overwrited.
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++++++++++++++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +++
>>   2 files changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 73398831196f..346e2f753474 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -937,6 +937,21 @@ static int amdgpu_vm_alloc_pts(struct 
>> amdgpu_device *adev,
>>       return r;
>>   }
>>   +static void amdgpu_vm_free_zombie_bo(struct amdgpu_device *adev,
>> +        struct amdgpu_vm *vm)
>> +{
>> +    struct amdgpu_vm_pt *entry;
>> +
>> +    while (!list_empty(&vm->zombies)) {
>> +        entry = list_first_entry(&vm->zombies, struct amdgpu_vm_pt,
>> +                base.vm_status);
>> +        list_del(&entry->base.vm_status);
>> +
>> +        amdgpu_bo_unref(&entry->base.bo->shadow);
>> +        amdgpu_bo_unref(&entry->base.bo);
>> +    }
>> +}
>> +
>>   /**
>>    * amdgpu_vm_free_table - fre one PD/PT
>>    *
>> @@ -945,10 +960,9 @@ static int amdgpu_vm_alloc_pts(struct 
>> amdgpu_device *adev,
>>   static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
>>   {
>>       if (entry->base.bo) {
>> +        list_move(&entry->base.vm_status,
>> + &entry->base.bo->vm_bo->vm->zombies);
>>           entry->base.bo->vm_bo = NULL;
>> -        list_del(&entry->base.vm_status);
>> -        amdgpu_bo_unref(&entry->base.bo->shadow);
>> -        amdgpu_bo_unref(&entry->base.bo);
>>       }
>>       kvfree(entry->entries);
>>       entry->entries = NULL;
>> @@ -1624,6 +1638,7 @@ static int amdgpu_vm_bo_update_mapping(struct 
>> amdgpu_device *adev,
>>       r = vm->update_funcs->commit(&params, fence);
>>     error_unlock:
>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>       amdgpu_vm_eviction_unlock(vm);
>>       return r;
>>   }
>> @@ -2807,6 +2822,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, 
>> struct amdgpu_vm *vm,
>>       INIT_LIST_HEAD(&vm->invalidated);
>>       spin_lock_init(&vm->invalidated_lock);
>>       INIT_LIST_HEAD(&vm->freed);
>> +    INIT_LIST_HEAD(&vm->zombies);
>>           /* create scheduler entities for page table updates */
>> @@ -3119,6 +3135,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, 
>> struct amdgpu_vm *vm)
>>       }
>>         amdgpu_vm_free_pts(adev, vm, NULL);
>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>> +
>>       amdgpu_bo_unreserve(root);
>>       amdgpu_bo_unref(&root);
>>       WARN_ON(vm->root.base.bo);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index b5705fcfc935..9baf44fa16f0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -269,6 +269,9 @@ struct amdgpu_vm {
>>       /* BO mappings freed, but not yet updated in the PT */
>>       struct list_head    freed;
>>   +    /* BO will be freed soon */
>> +    struct list_head    zombies;
>> +
>>       /* contains the page directory */
>>       struct amdgpu_vm_pt     root;
>>       struct dma_fence    *last_update;



* Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
  2020-03-13 18:05     ` Christian König
@ 2020-03-14 13:06       ` Pan, Xinhui
  2020-03-16  8:15         ` Tao, Yintian
  0 siblings, 1 reply; 10+ messages in thread
From: Pan, Xinhui @ 2020-03-14 13:06 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Deucher, Alexander, Kuehling, Felix, Pan, Xinhui, amd-gfx

Hi all,
I think I found the root cause. Here is what happened:

user: alloc/mapping memory
	kernel: validate memory, update the bo mapping, and update the page table
		-> amdgpu_vm_bo_update_mapping
			-> amdgpu_vm_update_ptes
				-> amdgpu_vm_alloc_pts
					-> amdgpu_vm_clear_bo // it submits a job and we have a fence, BUT the fence is NOT added to the resv
user: free/unmapping memory
	kernel: unmap memory and update the page table
		-> amdgpu_vm_bo_update_mapping
		// sync the last_delayed fence if flags & AMDGPU_PTE_VALID; of course we do not sync it here, as this is an unmapping
			-> amdgpu_vm_update_ptes
				-> amdgpu_vm_free_pts // unref the page table bo

So from the sequence above, there is a race between bo releasing and bo
clearing: the bo might have been released before the job runs.
We can fix it in several ways:

1) Sync last_delayed in both the mapping and unmapping cases.
   Christian, you only sync last_delayed in the mapping case; would it be OK to sync it in the unmapping case as well?

2) Always add the fence to the resv after commit.
   This is done by patchset v4, and only patch 1 is needed; there is no need to move the bo unref after commit.

3) Move the bo unref after commit, and add the last_delayed fence to the resv.
   This is done by patchset v1.


Any ideas?

thanks
xinhui

> On March 14, 2020 at 02:05, Koenig, Christian <Christian.Koenig@amd.com> wrote:
> 
> The page table is not updated and then freed. A higher level PDE is updated and because of this the lower level page tables is freed.
> 
> Without this it could be that the memory backing the freed page table is reused while the PDE is still pointing to it.
> 
> Rather unlikely that this causes problems, but better save than sorry.
> 
> Regards,
> Christian.
> 
> Am 13.03.20 um 18:36 schrieb Felix Kuehling:
>> This seems weird. This means that we update a page table, and then free it in the same amdgpu_vm_update_ptes call? That means the update is redundant. Can we eliminate the redundant PTE update if the page table is about to be freed anyway?
>> 
>> Regards,
>>   Felix
>> 
>> On 2020-03-13 12:09, xinhui pan wrote:
>>> Free page table bo before job submit is insane.
>>> We might touch invalid memory while job is runnig.
>>> 
>>> we now have individualized bo resv during bo releasing.
>>> So any fences added to root PT bo is actually untested when
>>> a normal PT bo is releasing.
>>> 
>>> We might hit gmc page fault or memory just got overwrited.
>>> 
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++++++++++++++---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +++
>>>   2 files changed, 24 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 73398831196f..346e2f753474 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -937,6 +937,21 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>       return r;
>>>   }
>>>   +static void amdgpu_vm_free_zombie_bo(struct amdgpu_device *adev,
>>> +        struct amdgpu_vm *vm)
>>> +{
>>> +    struct amdgpu_vm_pt *entry;
>>> +
>>> +    while (!list_empty(&vm->zombies)) {
>>> +        entry = list_first_entry(&vm->zombies, struct amdgpu_vm_pt,
>>> +                base.vm_status);
>>> +        list_del(&entry->base.vm_status);
>>> +
>>> +        amdgpu_bo_unref(&entry->base.bo->shadow);
>>> +        amdgpu_bo_unref(&entry->base.bo);
>>> +    }
>>> +}
>>> +
>>>   /**
>>>    * amdgpu_vm_free_table - fre one PD/PT
>>>    *
>>> @@ -945,10 +960,9 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>   static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
>>>   {
>>>       if (entry->base.bo) {
>>> +        list_move(&entry->base.vm_status,
>>> + &entry->base.bo->vm_bo->vm->zombies);
>>>           entry->base.bo->vm_bo = NULL;
>>> -        list_del(&entry->base.vm_status);
>>> -        amdgpu_bo_unref(&entry->base.bo->shadow);
>>> -        amdgpu_bo_unref(&entry->base.bo);
>>>       }
>>>       kvfree(entry->entries);
>>>       entry->entries = NULL;
>>> @@ -1624,6 +1638,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>       r = vm->update_funcs->commit(&params, fence);
>>>     error_unlock:
>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>>       amdgpu_vm_eviction_unlock(vm);
>>>       return r;
>>>   }
>>> @@ -2807,6 +2822,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>       INIT_LIST_HEAD(&vm->invalidated);
>>>       spin_lock_init(&vm->invalidated_lock);
>>>       INIT_LIST_HEAD(&vm->freed);
>>> +    INIT_LIST_HEAD(&vm->zombies);
>>>           /* create scheduler entities for page table updates */
>>> @@ -3119,6 +3135,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>       }
>>>         amdgpu_vm_free_pts(adev, vm, NULL);
>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>> +
>>>       amdgpu_bo_unreserve(root);
>>>       amdgpu_bo_unref(&root);
>>>       WARN_ON(vm->root.base.bo);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index b5705fcfc935..9baf44fa16f0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -269,6 +269,9 @@ struct amdgpu_vm {
>>>       /* BO mappings freed, but not yet updated in the PT */
>>>       struct list_head    freed;
>>>   +    /* BO will be freed soon */
>>> +    struct list_head    zombies;
>>> +
>>>       /* contains the page directory */
>>>       struct amdgpu_vm_pt     root;
>>>       struct dma_fence    *last_update;
> 



* RE: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
  2020-03-14 13:06       ` Pan, Xinhui
@ 2020-03-16  8:15         ` Tao, Yintian
  2020-03-16  9:51           ` Pan, Xinhui
  2020-03-16 12:15           ` Christian König
  0 siblings, 2 replies; 10+ messages in thread
From: Tao, Yintian @ 2020-03-16  8:15 UTC (permalink / raw)
  To: Pan, Xinhui, Koenig, Christian
  Cc: Deucher, Alexander, Kuehling, Felix, Pan, Xinhui, amd-gfx

Hi Xinhui


I encountered the same problem (a page fault) when testing the vk_example benchmark.
I used your first option, which fixes the problem. Can you help submit a patch?


-       if (flags & AMDGPU_PTE_VALID) {
-               struct amdgpu_bo *root = vm->root.base.bo;
-               if (!dma_fence_is_signaled(vm->last_direct))
-                       amdgpu_bo_fence(root, vm->last_direct, true);
+       if (!dma_fence_is_signaled(vm->last_direct))
+               amdgpu_bo_fence(root, vm->last_direct, true);
 
-               if (!dma_fence_is_signaled(vm->last_delayed))
-                       amdgpu_bo_fence(root, vm->last_delayed, true);
-       }
+       if (!dma_fence_is_signaled(vm->last_delayed))
+               amdgpu_bo_fence(root, vm->last_delayed, true);


Best Regards
Yintian Tao

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Pan, Xinhui
Sent: March 14, 2020 21:07
To: Koenig, Christian <Christian.Koenig@amd.com>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit

hi, All
I think I found the root cause. here is what happened.

user: alloc/mapping memory
	 	kernel: validate memory and update the bo mapping, and update the page table
			-> amdgpu_vm_bo_update_mapping
				-> amdgpu_vm_update_ptes
					-> amdgpu_vm_alloc_pts
						-> amdgpu_vm_clear_bo // it will submit a job and we have a fence. BUT it is NOT added in resv.
user: free/unmapping memory
		kernel: unmapping mmeory and udpate the page table
			-> amdgpu_vm_bo_update_mapping
			sync last_delay fence if flag & AMDGPU_PTE_VALID // of source we did not sync it here, as this is unmapping.
				-> amdgpu_vm_update_ptes
					-> amdgpu_vm_free_pts // unref page table bo.

So from the sequence above, we know there is a race betwen bo releasing and bo clearing.
bo might have been released before job running.

we can fix it in several ways,
1) sync last_delay in both mapping and unmapping case.
 Chris, you just sync last_delay in mapping case, should it be ok to sync it also in unmapping case?

2) always add fence to resv after commit. 
 this is done by patchset v4. And only need patch 1. no need to move unref bo after commit.

3) move unref bo after commit, and add the last delay fence to resv. 
This is done by patchset V1. 


any ideas?

thanks
xinhui

> On March 14, 2020 at 02:05, Koenig, Christian <Christian.Koenig@amd.com> wrote:
> 
> The page table is not updated and then freed. A higher level PDE is updated and because of this the lower level page tables is freed.
> 
> Without this it could be that the memory backing the freed page table is reused while the PDE is still pointing to it.
> 
> Rather unlikely that this causes problems, but better save than sorry.
> 
> Regards,
> Christian.
> 
> Am 13.03.20 um 18:36 schrieb Felix Kuehling:
>> This seems weird. This means that we update a page table, and then free it in the same amdgpu_vm_update_ptes call? That means the update is redundant. Can we eliminate the redundant PTE update if the page table is about to be freed anyway?
>> 
>> Regards,
>>   Felix
>> 
>> On 2020-03-13 12:09, xinhui pan wrote:
>>> Free page table bo before job submit is insane.
>>> We might touch invalid memory while job is runnig.
>>> 
>>> we now have individualized bo resv during bo releasing.
>>> So any fences added to root PT bo is actually untested when a normal 
>>> PT bo is releasing.
>>> 
>>> We might hit gmc page fault or memory just got overwrited.
>>> 
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++++++++++++++---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +++
>>>   2 files changed, 24 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 73398831196f..346e2f753474 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -937,6 +937,21 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>       return r;
>>>   }
>>>   +static void amdgpu_vm_free_zombie_bo(struct amdgpu_device *adev,
>>> +        struct amdgpu_vm *vm)
>>> +{
>>> +    struct amdgpu_vm_pt *entry;
>>> +
>>> +    while (!list_empty(&vm->zombies)) {
>>> +        entry = list_first_entry(&vm->zombies, struct amdgpu_vm_pt,
>>> +                base.vm_status);
>>> +        list_del(&entry->base.vm_status);
>>> +
>>> +        amdgpu_bo_unref(&entry->base.bo->shadow);
>>> +        amdgpu_bo_unref(&entry->base.bo);
>>> +    }
>>> +}
>>> +
>>>   /**
>>>    * amdgpu_vm_free_table - fre one PD/PT
>>>    *
>>> @@ -945,10 +960,9 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>   static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
>>>   {
>>>       if (entry->base.bo) {
>>> +        list_move(&entry->base.vm_status, 
>>> + &entry->base.bo->vm_bo->vm->zombies);
>>>           entry->base.bo->vm_bo = NULL;
>>> -        list_del(&entry->base.vm_status);
>>> -        amdgpu_bo_unref(&entry->base.bo->shadow);
>>> -        amdgpu_bo_unref(&entry->base.bo);
>>>       }
>>>       kvfree(entry->entries);
>>>       entry->entries = NULL;
>>> @@ -1624,6 +1638,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>       r = vm->update_funcs->commit(&params, fence);
>>>     error_unlock:
>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>>       amdgpu_vm_eviction_unlock(vm);
>>>       return r;
>>>   }
>>> @@ -2807,6 +2822,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>       INIT_LIST_HEAD(&vm->invalidated);
>>>       spin_lock_init(&vm->invalidated_lock);
>>>       INIT_LIST_HEAD(&vm->freed);
>>> +    INIT_LIST_HEAD(&vm->zombies);
>>>           /* create scheduler entities for page table updates */ @@ 
>>> -3119,6 +3135,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>       }
>>>         amdgpu_vm_free_pts(adev, vm, NULL);
>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>> +
>>>       amdgpu_bo_unreserve(root);
>>>       amdgpu_bo_unref(&root);
>>>       WARN_ON(vm->root.base.bo);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index b5705fcfc935..9baf44fa16f0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -269,6 +269,9 @@ struct amdgpu_vm {
>>>       /* BO mappings freed, but not yet updated in the PT */
>>>       struct list_head    freed;
>>>   +    /* BO will be freed soon */
>>> +    struct list_head    zombies;
>>> +
>>>       /* contains the page directory */
>>>       struct amdgpu_vm_pt     root;
>>>       struct dma_fence    *last_update;
> 



* Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
  2020-03-16  8:15         ` Tao, Yintian
@ 2020-03-16  9:51           ` Pan, Xinhui
  2020-03-16  9:54             ` Tao, Yintian
  2020-03-16 12:15           ` Christian König
  1 sibling, 1 reply; 10+ messages in thread
From: Pan, Xinhui @ 2020-03-16  9:51 UTC (permalink / raw)
  To: Koenig, Christian, Tao, Yintian
  Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx




I still hit the page fault with option 1 while running the oclperf test.
It looks like we need to sync the fence after commit.
________________________________
From: Tao, Yintian <Yintian.Tao@amd.com>
Sent: Monday, March 16, 2020 4:15:01 PM
To: Pan, Xinhui <Xinhui.Pan@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Subject: RE: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit

Hi Xinhui


I encountered the same problem (page fault) when testing the vk_example benchmark.
I used your first option, which fixes the problem. Can you help submit one patch?


-       if (flags & AMDGPU_PTE_VALID) {
-               struct amdgpu_bo *root = vm->root.base.bo;
-               if (!dma_fence_is_signaled(vm->last_direct))
-                       amdgpu_bo_fence(root, vm->last_direct, true);
+       if (!dma_fence_is_signaled(vm->last_direct))
+               amdgpu_bo_fence(root, vm->last_direct, true);

-               if (!dma_fence_is_signaled(vm->last_delayed))
-                       amdgpu_bo_fence(root, vm->last_delayed, true);
-       }
+       if (!dma_fence_is_signaled(vm->last_delayed))
+               amdgpu_bo_fence(root, vm->last_delayed, true);


Best Regards
Yintian Tao

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Pan, Xinhui
Sent: 2020年3月14日 21:07
To: Koenig, Christian <Christian.Koenig@amd.com>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit

hi, All
I think I found the root cause. Here is what happened.

user: alloc/mapping memory
          kernel: validate memory and update the bo mapping, and update the page table
                        -> amdgpu_vm_bo_update_mapping
                                -> amdgpu_vm_update_ptes
                                        -> amdgpu_vm_alloc_pts
                                                -> amdgpu_vm_clear_bo // it will submit a job and we have a fence. BUT it is NOT added in resv.
user: free/unmapping memory
                kernel: unmap memory and update the page table
                        -> amdgpu_vm_bo_update_mapping
                        sync last_delay fence if flag & AMDGPU_PTE_VALID // of course we did not sync it here, as this is unmapping.
                                -> amdgpu_vm_update_ptes
                                        -> amdgpu_vm_free_pts // unref page table bo.

So from the sequence above, we know there is a race between bo releasing and bo clearing.
The bo might have been released before the job runs.

we can fix it in several ways,
1) sync last_delay in both mapping and unmapping case.
 Chris, you just sync last_delay in mapping case, should it be ok to sync it also in unmapping case?

2) always add fence to resv after commit.
 this is done by patchset v4. And only need patch 1. no need to move unref bo after commit.

3) move unref bo after commit, and add the last delay fence to resv.
This is done by patchset V1.


any ideas?

thanks
xinhui

> 2020年3月14日 02:05,Koenig, Christian <Christian.Koenig@amd.com> 写道:
>
> The page table is not updated and then freed. A higher level PDE is updated and because of this the lower level page table is freed.
>
> Without this it could be that the memory backing the freed page table is reused while the PDE is still pointing to it.
>
> Rather unlikely that this causes problems, but better safe than sorry.
>
> Regards,
> Christian.
>
> Am 13.03.20 um 18:36 schrieb Felix Kuehling:
>> This seems weird. This means that we update a page table, and then free it in the same amdgpu_vm_update_ptes call? That means the update is redundant. Can we eliminate the redundant PTE update if the page table is about to be freed anyway?
>>
>> Regards,
>>   Felix
>>
>> On 2020-03-13 12:09, xinhui pan wrote:
>>> Freeing the page table bo before job submit is insane.
>>> We might touch invalid memory while the job is running.
>>>
>>> we now have individualized bo resv during bo releasing.
>>> So any fences added to the root PT bo are actually untested when a normal
>>> PT bo is releasing.
>>>
>>> We might hit a gmc page fault or memory just gets overwritten.
>>>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++++++++++++++---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +++
>>>   2 files changed, 24 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 73398831196f..346e2f753474 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -937,6 +937,21 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>       return r;
>>>   }
>>>   +static void amdgpu_vm_free_zombie_bo(struct amdgpu_device *adev,
>>> +        struct amdgpu_vm *vm)
>>> +{
>>> +    struct amdgpu_vm_pt *entry;
>>> +
>>> +    while (!list_empty(&vm->zombies)) {
>>> +        entry = list_first_entry(&vm->zombies, struct amdgpu_vm_pt,
>>> +                base.vm_status);
>>> +        list_del(&entry->base.vm_status);
>>> +
>>> +        amdgpu_bo_unref(&entry->base.bo->shadow);
>>> +        amdgpu_bo_unref(&entry->base.bo);
>>> +    }
>>> +}
>>> +
>>>   /**
>>>    * amdgpu_vm_free_table - free one PD/PT
>>>    *
>>> @@ -945,10 +960,9 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>   static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
>>>   {
>>>       if (entry->base.bo) {
>>> +        list_move(&entry->base.vm_status,
>>> + &entry->base.bo->vm_bo->vm->zombies);
>>>           entry->base.bo->vm_bo = NULL;
>>> -        list_del(&entry->base.vm_status);
>>> -        amdgpu_bo_unref(&entry->base.bo->shadow);
>>> -        amdgpu_bo_unref(&entry->base.bo);
>>>       }
>>>       kvfree(entry->entries);
>>>       entry->entries = NULL;
>>> @@ -1624,6 +1638,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>       r = vm->update_funcs->commit(&params, fence);
>>>     error_unlock:
>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>>       amdgpu_vm_eviction_unlock(vm);
>>>       return r;
>>>   }
>>> @@ -2807,6 +2822,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>       INIT_LIST_HEAD(&vm->invalidated);
>>>       spin_lock_init(&vm->invalidated_lock);
>>>       INIT_LIST_HEAD(&vm->freed);
>>> +    INIT_LIST_HEAD(&vm->zombies);
>>>           /* create scheduler entities for page table updates */ @@
>>> -3119,6 +3135,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>       }
>>>         amdgpu_vm_free_pts(adev, vm, NULL);
>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>> +
>>>       amdgpu_bo_unreserve(root);
>>>       amdgpu_bo_unref(&root);
>>>       WARN_ON(vm->root.base.bo);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index b5705fcfc935..9baf44fa16f0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -269,6 +269,9 @@ struct amdgpu_vm {
>>>       /* BO mappings freed, but not yet updated in the PT */
>>>       struct list_head    freed;
>>>   +    /* BO will be freed soon */
>>> +    struct list_head    zombies;
>>> +
>>>       /* contains the page directory */
>>>       struct amdgpu_vm_pt     root;
>>>       struct dma_fence    *last_update;
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[-- Attachment #1.2: Type: text/html, Size: 15521 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
  2020-03-16  9:51           ` Pan, Xinhui
@ 2020-03-16  9:54             ` Tao, Yintian
  0 siblings, 0 replies; 10+ messages in thread
From: Tao, Yintian @ 2020-03-16  9:54 UTC (permalink / raw)
  To: Pan, Xinhui, Koenig, Christian
  Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx


[-- Attachment #1.1: Type: text/plain, Size: 9681 bytes --]

Hi  Xinhui



Sure, can you submit one patch for it? I want to test it on my local server. Thanks in advance.


Best Regards
Yintian Tao

From: Pan, Xinhui <Xinhui.Pan@amd.com>
Sent: 2020年3月16日 17:51
To: Koenig, Christian <Christian.Koenig@amd.com>; Tao, Yintian <Yintian.Tao@amd.com>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit


[AMD Official Use Only - Internal Distribution Only]

I still hit page fault with option 1 while running oclperf test.
Looks like we need to sync the fence after commit.
________________________________
From: Tao, Yintian <Yintian.Tao@amd.com<mailto:Yintian.Tao@amd.com>>
Sent: Monday, March 16, 2020 4:15:01 PM
To: Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>; Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>
Subject: RE: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit

Hi Xinhui


I encountered the same problem (page fault) when testing the vk_example benchmark.
I used your first option, which fixes the problem. Can you help submit one patch?


-       if (flags & AMDGPU_PTE_VALID) {
-               struct amdgpu_bo *root = vm->root.base.bo;
-               if (!dma_fence_is_signaled(vm->last_direct))
-                       amdgpu_bo_fence(root, vm->last_direct, true);
+       if (!dma_fence_is_signaled(vm->last_direct))
+               amdgpu_bo_fence(root, vm->last_direct, true);

-               if (!dma_fence_is_signaled(vm->last_delayed))
-                       amdgpu_bo_fence(root, vm->last_delayed, true);
-       }
+       if (!dma_fence_is_signaled(vm->last_delayed))
+               amdgpu_bo_fence(root, vm->last_delayed, true);


Best Regards
Yintian Tao

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Pan, Xinhui
Sent: 2020年3月14日 21:07
To: Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>; Pan, Xinhui <Xinhui.Pan@amd.com<mailto:Xinhui.Pan@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit

hi, All
I think I found the root cause. Here is what happened.

user: alloc/mapping memory
          kernel: validate memory and update the bo mapping, and update the page table
                        -> amdgpu_vm_bo_update_mapping
                                -> amdgpu_vm_update_ptes
                                        -> amdgpu_vm_alloc_pts
                                                -> amdgpu_vm_clear_bo // it will submit a job and we have a fence. BUT it is NOT added in resv.
user: free/unmapping memory
                kernel: unmap memory and update the page table
                        -> amdgpu_vm_bo_update_mapping
                        sync last_delay fence if flag & AMDGPU_PTE_VALID // of course we did not sync it here, as this is unmapping.
                                -> amdgpu_vm_update_ptes
                                        -> amdgpu_vm_free_pts // unref page table bo.

So from the sequence above, we know there is a race between bo releasing and bo clearing.
The bo might have been released before the job runs.

we can fix it in several ways,
1) sync last_delay in both mapping and unmapping case.
 Chris, you just sync last_delay in mapping case, should it be ok to sync it also in unmapping case?

2) always add fence to resv after commit.
 this is done by patchset v4. And only need patch 1. no need to move unref bo after commit.

3) move unref bo after commit, and add the last delay fence to resv.
This is done by patchset V1.


any ideas?

thanks
xinhui

> 2020年3月14日 02:05,Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>> 写道:
>
> The page table is not updated and then freed. A higher level PDE is updated and because of this the lower level page table is freed.
>
> Without this it could be that the memory backing the freed page table is reused while the PDE is still pointing to it.
>
> Rather unlikely that this causes problems, but better safe than sorry.
>
> Regards,
> Christian.
>
> Am 13.03.20 um 18:36 schrieb Felix Kuehling:
>> This seems weird. This means that we update a page table, and then free it in the same amdgpu_vm_update_ptes call? That means the update is redundant. Can we eliminate the redundant PTE update if the page table is about to be freed anyway?
>>
>> Regards,
>>   Felix
>>
>> On 2020-03-13 12:09, xinhui pan wrote:
>>> Freeing the page table bo before job submit is insane.
>>> We might touch invalid memory while the job is running.
>>>
>>> we now have individualized bo resv during bo releasing.
>>> So any fences added to the root PT bo are actually untested when a normal
>>> PT bo is releasing.
>>>
>>> We might hit a gmc page fault or memory just gets overwritten.
>>>
>>> Cc: Christian König <christian.koenig@amd.com<mailto:christian.koenig@amd.com>>
>>> Cc: Alex Deucher <alexander.deucher@amd.com<mailto:alexander.deucher@amd.com>>
>>> Cc: Felix Kuehling <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>
>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com<mailto:xinhui.pan@amd.com>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++++++++++++++---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +++
>>>   2 files changed, 24 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 73398831196f..346e2f753474 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -937,6 +937,21 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>       return r;
>>>   }
>>>   +static void amdgpu_vm_free_zombie_bo(struct amdgpu_device *adev,
>>> +        struct amdgpu_vm *vm)
>>> +{
>>> +    struct amdgpu_vm_pt *entry;
>>> +
>>> +    while (!list_empty(&vm->zombies)) {
>>> +        entry = list_first_entry(&vm->zombies, struct amdgpu_vm_pt,
>>> +                base.vm_status);
>>> +        list_del(&entry->base.vm_status);
>>> +
>>> +        amdgpu_bo_unref(&entry->base.bo->shadow);
>>> +        amdgpu_bo_unref(&entry->base.bo);
>>> +    }
>>> +}
>>> +
>>>   /**
>>>    * amdgpu_vm_free_table - free one PD/PT
>>>    *
>>> @@ -945,10 +960,9 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>   static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
>>>   {
>>>       if (entry->base.bo) {
>>> +        list_move(&entry->base.vm_status,
>>> + &entry->base.bo->vm_bo->vm->zombies);
>>>           entry->base.bo->vm_bo = NULL;
>>> -        list_del(&entry->base.vm_status);
>>> -        amdgpu_bo_unref(&entry->base.bo->shadow);
>>> -        amdgpu_bo_unref(&entry->base.bo);
>>>       }
>>>       kvfree(entry->entries);
>>>       entry->entries = NULL;
>>> @@ -1624,6 +1638,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>       r = vm->update_funcs->commit(&params, fence);
>>>     error_unlock:
>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>>       amdgpu_vm_eviction_unlock(vm);
>>>       return r;
>>>   }
>>> @@ -2807,6 +2822,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>       INIT_LIST_HEAD(&vm->invalidated);
>>>       spin_lock_init(&vm->invalidated_lock);
>>>       INIT_LIST_HEAD(&vm->freed);
>>> +    INIT_LIST_HEAD(&vm->zombies);
>>>           /* create scheduler entities for page table updates */ @@
>>> -3119,6 +3135,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>       }
>>>         amdgpu_vm_free_pts(adev, vm, NULL);
>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>> +
>>>       amdgpu_bo_unreserve(root);
>>>       amdgpu_bo_unref(&root);
>>>       WARN_ON(vm->root.base.bo);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index b5705fcfc935..9baf44fa16f0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -269,6 +269,9 @@ struct amdgpu_vm {
>>>       /* BO mappings freed, but not yet updated in the PT */
>>>       struct list_head    freed;
>>>   +    /* BO will be freed soon */
>>> +    struct list_head    zombies;
>>> +
>>>       /* contains the page directory */
>>>       struct amdgpu_vm_pt     root;
>>>       struct dma_fence    *last_update;
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[-- Attachment #1.2: Type: text/html, Size: 21870 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
  2020-03-16  8:15         ` Tao, Yintian
  2020-03-16  9:51           ` Pan, Xinhui
@ 2020-03-16 12:15           ` Christian König
  1 sibling, 0 replies; 10+ messages in thread
From: Christian König @ 2020-03-16 12:15 UTC (permalink / raw)
  To: Tao, Yintian, Pan, Xinhui; +Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx

Hi Xinhui,

well that is quite an impressive analysis of the problem.

> should it be ok to sync it also in unmapping case?
Short answer no, but I think I need to get back to the drawing board 
with the direct unmapping case anyway.

> 2) always add fence to resv after commit.
Yes, let's stick with that for now.

We also need to reinstate ignoring VM updates during synchronization
to get good performance back, but I can take care of this.

Regards,
Christian.

Am 16.03.20 um 09:15 schrieb Tao, Yintian:
> Hi Xinhui
>
>
> I encountered the same problem (page fault) when testing the vk_example benchmark.
> I used your first option, which fixes the problem. Can you help submit one patch?
>
>
> -       if (flags & AMDGPU_PTE_VALID) {
> -               struct amdgpu_bo *root = vm->root.base.bo;
> -               if (!dma_fence_is_signaled(vm->last_direct))
> -                       amdgpu_bo_fence(root, vm->last_direct, true);
> +       if (!dma_fence_is_signaled(vm->last_direct))
> +               amdgpu_bo_fence(root, vm->last_direct, true);
>   
> -               if (!dma_fence_is_signaled(vm->last_delayed))
> -                       amdgpu_bo_fence(root, vm->last_delayed, true);
> -       }
> +       if (!dma_fence_is_signaled(vm->last_delayed))
> +               amdgpu_bo_fence(root, vm->last_delayed, true);
>
>
> Best Regards
> Yintian Tao
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Pan, Xinhui
> Sent: 2020年3月14日 21:07
> To: Koenig, Christian <Christian.Koenig@amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit
>
> hi, All
> I think I found the root cause. here is what happened.
>
> user: alloc/mapping memory
> 	 	kernel: validate memory and update the bo mapping, and update the page table
> 			-> amdgpu_vm_bo_update_mapping
> 				-> amdgpu_vm_update_ptes
> 					-> amdgpu_vm_alloc_pts
> 						-> amdgpu_vm_clear_bo // it will submit a job and we have a fence. BUT it is NOT added in resv.
> user: free/unmapping memory
> 		kernel: unmap memory and update the page table
> 			-> amdgpu_vm_bo_update_mapping
> 			sync last_delay fence if flag & AMDGPU_PTE_VALID // of course we did not sync it here, as this is unmapping.
> 				-> amdgpu_vm_update_ptes
> 					-> amdgpu_vm_free_pts // unref page table bo.
>
> So from the sequence above, we know there is a race between bo releasing and bo clearing.
> The bo might have been released before the job runs.
>
> we can fix it in several ways,
> 1) sync last_delay in both mapping and unmapping case.
>   Chris, you just sync last_delay in mapping case, should it be ok to sync it also in unmapping case?
>
> 2) always add fence to resv after commit.
>   this is done by patchset v4. And only need patch 1. no need to move unref bo after commit.
>
> 3) move unref bo after commit, and add the last delay fence to resv.
> This is done by patchset V1.
>
>
> any ideas?
>
> thanks
> xinhui
>
>> 2020年3月14日 02:05,Koenig, Christian <Christian.Koenig@amd.com> 写道:
>>
>> The page table is not updated and then freed. A higher level PDE is updated and because of this the lower level page table is freed.
>>
>> Without this it could be that the memory backing the freed page table is reused while the PDE is still pointing to it.
>>
>> Rather unlikely that this causes problems, but better safe than sorry.
>>
>> Regards,
>> Christian.
>>
>> Am 13.03.20 um 18:36 schrieb Felix Kuehling:
>>> This seems weird. This means that we update a page table, and then free it in the same amdgpu_vm_update_ptes call? That means the update is redundant. Can we eliminate the redundant PTE update if the page table is about to be freed anyway?
>>>
>>> Regards,
>>>    Felix
>>>
>>> On 2020-03-13 12:09, xinhui pan wrote:
>>>> Freeing the page table bo before job submit is insane.
>>>> We might touch invalid memory while the job is running.
>>>>
>>>> we now have individualized bo resv during bo releasing.
>>>> So any fences added to the root PT bo are actually untested when a normal
>>>> PT bo is releasing.
>>>>
>>>> We might hit a gmc page fault or memory just gets overwritten.
>>>>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++++++++++++++---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +++
>>>>    2 files changed, 24 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 73398831196f..346e2f753474 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -937,6 +937,21 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>>        return r;
>>>>    }
>>>>    +static void amdgpu_vm_free_zombie_bo(struct amdgpu_device *adev,
>>>> +        struct amdgpu_vm *vm)
>>>> +{
>>>> +    struct amdgpu_vm_pt *entry;
>>>> +
>>>> +    while (!list_empty(&vm->zombies)) {
>>>> +        entry = list_first_entry(&vm->zombies, struct amdgpu_vm_pt,
>>>> +                base.vm_status);
>>>> +        list_del(&entry->base.vm_status);
>>>> +
>>>> +        amdgpu_bo_unref(&entry->base.bo->shadow);
>>>> +        amdgpu_bo_unref(&entry->base.bo);
>>>> +    }
>>>> +}
>>>> +
>>>>    /**
>>>>     * amdgpu_vm_free_table - free one PD/PT
>>>>     *
>>>> @@ -945,10 +960,9 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
>>>>    static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
>>>>    {
>>>>        if (entry->base.bo) {
>>>> +        list_move(&entry->base.vm_status,
>>>> + &entry->base.bo->vm_bo->vm->zombies);
>>>>            entry->base.bo->vm_bo = NULL;
>>>> -        list_del(&entry->base.vm_status);
>>>> -        amdgpu_bo_unref(&entry->base.bo->shadow);
>>>> -        amdgpu_bo_unref(&entry->base.bo);
>>>>        }
>>>>        kvfree(entry->entries);
>>>>        entry->entries = NULL;
>>>> @@ -1624,6 +1638,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>        r = vm->update_funcs->commit(&params, fence);
>>>>      error_unlock:
>>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>>>        amdgpu_vm_eviction_unlock(vm);
>>>>        return r;
>>>>    }
>>>> @@ -2807,6 +2822,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>        INIT_LIST_HEAD(&vm->invalidated);
>>>>        spin_lock_init(&vm->invalidated_lock);
>>>>        INIT_LIST_HEAD(&vm->freed);
>>>> +    INIT_LIST_HEAD(&vm->zombies);
>>>>            /* create scheduler entities for page table updates */ @@
>>>> -3119,6 +3135,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>>        }
>>>>          amdgpu_vm_free_pts(adev, vm, NULL);
>>>> +    amdgpu_vm_free_zombie_bo(adev, vm);
>>>> +
>>>>        amdgpu_bo_unreserve(root);
>>>>        amdgpu_bo_unref(&root);
>>>>        WARN_ON(vm->root.base.bo);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> index b5705fcfc935..9baf44fa16f0 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>> @@ -269,6 +269,9 @@ struct amdgpu_vm {
>>>>        /* BO mappings freed, but not yet updated in the PT */
>>>>        struct list_head    freed;
>>>>    +    /* BO will be freed soon */
>>>> +    struct list_head    zombies;
>>>> +
>>>>        /* contains the page directory */
>>>>        struct amdgpu_vm_pt     root;
>>>>        struct dma_fence    *last_update;
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-03-16 12:15 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13 16:09 [PATCH v4 0/2] fix gmc page fault on navi1X xinhui pan
2020-03-13 16:09 ` [PATCH v4 1/2] drm_amdgpu: Add job fence to resv conditionally xinhui pan
2020-03-13 16:09 ` [PATCH v4 2/2] drm/amdgpu: unref pt bo after job submit xinhui pan
2020-03-13 17:36   ` Felix Kuehling
2020-03-13 18:05     ` Christian König
2020-03-14 13:06       ` Pan, Xinhui
2020-03-16  8:15         ` Tao, Yintian
2020-03-16  9:51           ` Pan, Xinhui
2020-03-16  9:54             ` Tao, Yintian
2020-03-16 12:15           ` Christian König
