* [PATCH 0/2] fix gmc page fault on navi1X
@ 2020-03-13  7:43 xinhui pan
  2020-03-13  7:43 ` [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock xinhui pan
  2020-03-13  7:43 ` [PATCH 2/2] drm/amdgpu: unref the bo after job submit xinhui pan
  0 siblings, 2 replies; 10+ messages in thread
From: xinhui pan @ 2020-03-13  7:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: xinhui pan

We hit a GMC page fault on navi1X.
UMR shows that the physical address of the PTE is bad.
Two issues:
1) We did not sync the job's scheduled fence while updating the mapping.
We sync the resv, but the last fence is not in it, so any wait on the
resv finishes before the job completes; for example, the BO may have
been released while the job is still running.
2) We might unref a page table BO during the PTE update while a job is
still pending on that BO, and then submit a job in ->commit after the
BO has been freed. We need to free the BO only after all fences have
been added to it.

xinhui pan (2):
  drm/amdgpu: Always sync fence before unlock eviction_lock
  drm/amdgpu: unref the bo after job submit

 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 46 +++++++++++++++++++-------
 1 file changed, 34 insertions(+), 12 deletions(-)

-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock
  2020-03-13  7:43 [PATCH 0/2] fix gmc page fault on navi1X xinhui pan
@ 2020-03-13  7:43 ` xinhui pan
  2020-03-13  8:52   ` Christian König
  2020-03-13  7:43 ` [PATCH 2/2] drm/amdgpu: unref the bo after job submit xinhui pan
  1 sibling, 1 reply; 10+ messages in thread
From: xinhui pan @ 2020-03-13  7:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, xinhui pan, Christian König

The fence generated in ->commit is a shared one, so add it to the resv.
And we need to do that with the eviction lock held.

Currently we only sync last_direct/last_delayed before ->prepare, but we
fail to sync the last fence generated by ->commit. That causes problems
if eviction happens later, because it does not sync that last fence.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 73398831196f..f424b5969930 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1582,6 +1582,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	struct amdgpu_vm_update_params params;
 	enum amdgpu_sync_mode sync_mode;
 	int r;
+	struct amdgpu_bo *root = vm->root.base.bo;
 
 	memset(&params, 0, sizeof(params));
 	params.adev = adev;
@@ -1604,8 +1605,6 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	}
 
 	if (flags & AMDGPU_PTE_VALID) {
-		struct amdgpu_bo *root = vm->root.base.bo;
-
 		if (!dma_fence_is_signaled(vm->last_direct))
 			amdgpu_bo_fence(root, vm->last_direct, true);
 
@@ -1623,6 +1622,12 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 
 	r = vm->update_funcs->commit(&params, fence);
 
+	if (!dma_fence_is_signaled(vm->last_direct))
+		amdgpu_bo_fence(root, vm->last_direct, true);
+
+	if (!dma_fence_is_signaled(vm->last_delayed))
+		amdgpu_bo_fence(root, vm->last_delayed, true);
+
 error_unlock:
 	amdgpu_vm_eviction_unlock(vm);
 	return r;
-- 
2.17.1


* [PATCH 2/2] drm/amdgpu: unref the bo after job submit
  2020-03-13  7:43 [PATCH 0/2] fix gmc page fault on navi1X xinhui pan
  2020-03-13  7:43 ` [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock xinhui pan
@ 2020-03-13  7:43 ` xinhui pan
  1 sibling, 0 replies; 10+ messages in thread
From: xinhui pan @ 2020-03-13  7:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Deucher, Felix Kuehling, xinhui pan, Christian König

Otherwise we might free BOs before the job has completed.
We add the fence to the BO after ->commit, so free the BOs after that.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 37 +++++++++++++++++++-------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f424b5969930..513f3b77d3da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -942,13 +942,17 @@ static int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
  *
  * @entry: PDE to free
  */
-static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
+static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry,
+					struct list_head *head)
 {
 	if (entry->base.bo) {
 		entry->base.bo->vm_bo = NULL;
 		list_del(&entry->base.vm_status);
-		amdgpu_bo_unref(&entry->base.bo->shadow);
-		amdgpu_bo_unref(&entry->base.bo);
+		if (!head) {
+			amdgpu_bo_unref(&entry->base.bo->shadow);
+			amdgpu_bo_unref(&entry->base.bo);
+		} else
+			list_add(&entry->base.vm_status, head);
 	}
 	kvfree(entry->entries);
 	entry->entries = NULL;
@@ -965,7 +969,8 @@ static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
  */
 static void amdgpu_vm_free_pts(struct amdgpu_device *adev,
 			       struct amdgpu_vm *vm,
-			       struct amdgpu_vm_pt_cursor *start)
+			       struct amdgpu_vm_pt_cursor *start,
+				   struct list_head *head)
 {
 	struct amdgpu_vm_pt_cursor cursor;
 	struct amdgpu_vm_pt *entry;
@@ -973,10 +978,10 @@ static void amdgpu_vm_free_pts(struct amdgpu_device *adev,
 	vm->bulk_moveable = false;
 
 	for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry)
-		amdgpu_vm_free_table(entry);
+		amdgpu_vm_free_table(entry, head);
 
 	if (start)
-		amdgpu_vm_free_table(start->entry);
+		amdgpu_vm_free_table(start->entry, head);
 }
 
 /**
@@ -1428,7 +1433,8 @@ static void amdgpu_vm_fragment(struct amdgpu_vm_update_params *params,
  */
 static int amdgpu_vm_update_ptes(struct amdgpu_vm_update_params *params,
 				 uint64_t start, uint64_t end,
-				 uint64_t dst, uint64_t flags)
+				 uint64_t dst, uint64_t flags,
+				 struct list_head *head)
 {
 	struct amdgpu_device *adev = params->adev;
 	struct amdgpu_vm_pt_cursor cursor;
@@ -1539,7 +1545,7 @@ static int amdgpu_vm_update_ptes(struct amdgpu_vm_update_params *params,
 			 * completely covered by the range and so potentially still in use.
 			 */
 			while (cursor.pfn < frag_start) {
-				amdgpu_vm_free_pts(adev, params->vm, &cursor);
+				amdgpu_vm_free_pts(adev, params->vm, &cursor, head);
 				amdgpu_vm_pt_next(adev, &cursor);
 			}
 
@@ -1583,6 +1589,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	enum amdgpu_sync_mode sync_mode;
 	int r;
 	struct amdgpu_bo *root = vm->root.base.bo;
+	struct list_head head;
 
 	memset(&params, 0, sizeof(params));
 	params.adev = adev;
@@ -1590,6 +1597,8 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	params.direct = direct;
 	params.pages_addr = pages_addr;
 
+	INIT_LIST_HEAD(&head);
+
 	/* Implicitly sync to command submissions in the same VM before
 	 * unmapping. Sync to moving fences before mapping.
 	 */
@@ -1616,7 +1625,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	if (r)
 		goto error_unlock;
 
-	r = amdgpu_vm_update_ptes(&params, start, last + 1, addr, flags);
+	r = amdgpu_vm_update_ptes(&params, start, last + 1, addr, flags, &head);
 	if (r)
 		goto error_unlock;
 
@@ -1629,6 +1638,14 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 		amdgpu_bo_fence(root, vm->last_delayed, true);
 
 error_unlock:
+	while (!list_empty(&head)) {
+		struct amdgpu_vm_pt *entry = list_first_entry(&head,
+				struct amdgpu_vm_pt, base.vm_status);
+		list_del(&entry->base.vm_status);
+
+		amdgpu_bo_unref(&entry->base.bo->shadow);
+		amdgpu_bo_unref(&entry->base.bo);
+	}
 	amdgpu_vm_eviction_unlock(vm);
 	return r;
 }
@@ -3123,7 +3140,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		amdgpu_vm_free_mapping(adev, vm, mapping, NULL);
 	}
 
-	amdgpu_vm_free_pts(adev, vm, NULL);
+	amdgpu_vm_free_pts(adev, vm, NULL, NULL);
 	amdgpu_bo_unreserve(root);
 	amdgpu_bo_unref(&root);
 	WARN_ON(vm->root.base.bo);
-- 
2.17.1


* Re: [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock
  2020-03-13  7:43 ` [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock xinhui pan
@ 2020-03-13  8:52   ` Christian König
  2020-03-13  9:29     ` Pan, Xinhui
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2020-03-13  8:52 UTC (permalink / raw)
  To: xinhui pan, amd-gfx; +Cc: Alex Deucher, Felix Kuehling

On 13.03.20 08:43, xinhui pan wrote:
> The fence generated in ->commit is a shared one, so add it to the resv.
> And we need to do that with the eviction lock held.
>
> Currently we only sync last_direct/last_delayed before ->prepare, but we
> fail to sync the last fence generated by ->commit. That causes problems
> if eviction happens later, because it does not sync that last fence.

NAK, that won't work.

We can only add fences when the dma_resv object is locked and that is 
only the case when validating.

I'm considering just partially reverting the patch that originally
stopped adding fences, and instead only skipping them when invalidating
in a direct submit.

Christian.



* Re: [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock
  2020-03-13  8:52   ` Christian König
@ 2020-03-13  9:29     ` Pan, Xinhui
  2020-03-13  9:55       ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Pan, Xinhui @ 2020-03-13  9:29 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Deucher, Alexander, Kuehling, Felix, Pan, Xinhui, amd-gfx



> On 13 Mar 2020, at 16:52, Koenig, Christian <Christian.Koenig@amd.com> wrote:
> 
> On 13.03.20 08:43, xinhui pan wrote:
>> The fence generated in ->commit is a shared one, so add it to the resv.
>> And we need to do that with the eviction lock held.
>> 
>> Currently we only sync last_direct/last_delayed before ->prepare, but we
>> fail to sync the last fence generated by ->commit. That causes problems
>> if eviction happens later, because it does not sync that last fence.
> 
> NAK, that won't work.
> 
> We can only add fences when the dma_resv object is locked and that is only the case when validating.
> 
Well, that is true.
But considering this is a PT BO, only eviction races on it AFAIK.
As for the individualized resv in BO release, we unref the PT BO just after that.
I am still thinking of other races in the real world.

thanks
xinhui



* Re: [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock
  2020-03-13  9:29     ` Pan, Xinhui
@ 2020-03-13  9:55       ` Christian König
  2020-03-13 10:21         ` Pan, Xinhui
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2020-03-13  9:55 UTC (permalink / raw)
  To: Pan, Xinhui; +Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx

On 13.03.20 10:29, Pan, Xinhui wrote:
>
>> On 13 Mar 2020, at 16:52, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>
>> On 13.03.20 08:43, xinhui pan wrote:
>>> The fence generated in ->commit is a shared one, so add it to the resv.
>>> And we need to do that with the eviction lock held.
>>>
>>> Currently we only sync last_direct/last_delayed before ->prepare, but we
>>> fail to sync the last fence generated by ->commit. That causes problems
>>> if eviction happens later, because it does not sync that last fence.
>> NAK, that won't work.
>>
>> We can only add fences when the dma_resv object is locked and that is only the case when validating.
>>
> Well, that is true.
> But considering this is a PT BO, only eviction races on it AFAIK.
> As for the individualized resv in BO release, we unref the PT BO just after that.
> I am still thinking of other races in the real world.

We should probably just add all pipelined/delayed submissions directly 
to the reservation object in amdgpu_vm_sdma_commit().

Only the direct and invalidating submissions can't be added because we 
can't grab the reservation object in the MMU notifier.

Can you prepare a patch for this?

Regards,
Christian.



* Re: [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock
  2020-03-13  9:55       ` Christian König
@ 2020-03-13 10:21         ` Pan, Xinhui
  2020-03-13 10:23           ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Pan, Xinhui @ 2020-03-13 10:21 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Deucher, Alexander, Kuehling, Felix, Pan, Xinhui, amd-gfx



> On 13 Mar 2020, at 17:55, Koenig, Christian <Christian.Koenig@amd.com> wrote:
> 
> On 13.03.20 10:29, Pan, Xinhui wrote:
>> 
>>> On 13 Mar 2020, at 16:52, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>> 
>>> On 13.03.20 08:43, xinhui pan wrote:
>>>> The fence generated in ->commit is a shared one, so add it to the resv.
>>>> And we need to do that with the eviction lock held.
>>>> 
>>>> Currently we only sync last_direct/last_delayed before ->prepare, but we
>>>> fail to sync the last fence generated by ->commit. That causes problems
>>>> if eviction happens later, because it does not sync that last fence.
>>> NAK, that won't work.
>>> 
>>> We can only add fences when the dma_resv object is locked and that is only the case when validating.
>>> 
>> Well, that is true.
>> But considering this is a PT BO, only eviction races on it AFAIK.
>> As for the individualized resv in BO release, we unref the PT BO just after that.
>> I am still thinking of other races in the real world.
> 
> We should probably just add all pipelined/delayed submissions directly to the reservation object in amdgpu_vm_sdma_commit().
> 
> Only the direct and invalidating submissions can't be added because we can't grab the reservation object in the MMU notifier.
> 
> Can you prepare a patch for this?
> 
yep, I can.
Doesn't adding the fence to the BO resv in every commit introduce a little overhead? We only need to add the last fence to the resv, given that the job scheduler ring is FIFO.
Yes, the code should be simple anyway, as long as it works.

thanks
xinhui



* Re: [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock
  2020-03-13 10:21         ` Pan, Xinhui
@ 2020-03-13 10:23           ` Christian König
  2020-03-13 10:40             ` Pan, Xinhui
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2020-03-13 10:23 UTC (permalink / raw)
  To: Pan, Xinhui; +Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx

On 13.03.20 11:21, Pan, Xinhui wrote:
>
>> On 13 Mar 2020, at 17:55, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>
>> On 13.03.20 10:29, Pan, Xinhui wrote:
>>>> On 13 Mar 2020, at 16:52, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>>>
>>>> On 13.03.20 08:43, xinhui pan wrote:
>>>>> The fence generated in ->commit is a shared one, so add it to the resv.
>>>>> And we need to do that with the eviction lock held.
>>>>>
>>>>> Currently we only sync last_direct/last_delayed before ->prepare, but we
>>>>> fail to sync the last fence generated by ->commit. That causes problems
>>>>> if eviction happens later, because it does not sync that last fence.
>>>> NAK, that won't work.
>>>>
>>>> We can only add fences when the dma_resv object is locked and that is only the case when validating.
>>>>
>>> Well, that is true.
>>> But considering this is a PT BO, only eviction races on it AFAIK.
>>> As for the individualized resv in BO release, we unref the PT BO just after that.
>>> I am still thinking of other races in the real world.
>> We should probably just add all pipelined/delayed submissions directly to the reservation object in amdgpu_vm_sdma_commit().
>>
>> Only the direct and invalidating submissions can't be added because we can't grab the reservation object in the MMU notifier.
>>
>> Can you prepare a patch for this?
>>
> yep, I can.
> Doesn't adding the fence to the BO resv in every commit introduce a little overhead?

Yes it does, but we used to have this before and it wasn't really 
measurable.

With the unusual exception of mapping really large chunks of fragmented 
system memory, we only use one commit for anything <1GB anyway.

Christian.



* Re: [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock
  2020-03-13 10:23           ` Christian König
@ 2020-03-13 10:40             ` Pan, Xinhui
  2020-03-13 10:44               ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Pan, Xinhui @ 2020-03-13 10:40 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Deucher, Alexander, Kuehling, Felix, Pan, Xinhui, amd-gfx



> On 13 Mar 2020, at 18:23, Koenig, Christian <Christian.Koenig@amd.com> wrote:
> 
> On 13.03.20 11:21, Pan, Xinhui wrote:
>> 
>>> On 13 Mar 2020, at 17:55, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>> 
>>> On 13.03.20 10:29, Pan, Xinhui wrote:
>>>>> On 13 Mar 2020, at 16:52, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>>>> 
>>>>> On 13.03.20 08:43, xinhui pan wrote:
>>>>>> The fence generated in ->commit is a shared one, so add it to the resv.
>>>>>> And we need to do that with the eviction lock held.
>>>>>> 
>>>>>> Currently we only sync last_direct/last_delayed before ->prepare, but we
>>>>>> fail to sync the last fence generated by ->commit. That causes problems
>>>>>> if eviction happens later, because it does not sync that last fence.
>>>>> NAK, that won't work.
>>>>> 
>>>>> We can only add fences when the dma_resv object is locked and that is only the case when validating.
>>>>> 
>>>> Well, that is true.
>>>> But considering this is a PT BO, only eviction races on it AFAIK.
>>>> As for the individualized resv in BO release, we unref the PT BO just after that.
>>>> I am still thinking of other races in the real world.
>>> We should probably just add all pipelined/delayed submissions directly to the reservation object in amdgpu_vm_sdma_commit().
>>> 
>>> Only the direct and invalidating submissions can't be added because we can't grab the reservation object in the MMU notifier.

Wait, I see that amdgpu_vm_handle_fault() grabs the resv lock of the root BO.
So there is no race then?



* Re: [PATCH 1/2] drm/amdgpu: Always sync fence before unlock eviction_lock
  2020-03-13 10:40             ` Pan, Xinhui
@ 2020-03-13 10:44               ` Christian König
  0 siblings, 0 replies; 10+ messages in thread
From: Christian König @ 2020-03-13 10:44 UTC (permalink / raw)
  To: Pan, Xinhui; +Cc: Deucher, Alexander, Kuehling, Felix, amd-gfx

On 13.03.20 at 11:40, Pan, Xinhui wrote:
>
>> On 13 Mar 2020, at 18:23, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>
>> On 13.03.20 at 11:21, Pan, Xinhui wrote:
>>>> On 13 Mar 2020, at 17:55, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>>>
>>>> On 13.03.20 at 10:29, Pan, Xinhui wrote:
>>>>>> On 13 Mar 2020, at 16:52, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>>>>>
>>>>>> On 13.03.20 at 08:43, xinhui pan wrote:
>>>>>>> The fence generated in ->commit is a shared one, so add it to resv.
>>>>>>> And we need do that with eviction lock hold.
>>>>>>>
>>>>>>> Currently we only sync last_direct/last_delayed before ->prepare. But we
>>>>>>> fail to sync the last fence generated by ->commit. That causes problems
>>>>>>> if eviction happens later, but it does not sync the last fence.
>>>>>> NAK, that won't work.
>>>>>>
>>>>>> We can only add fences when the dma_resv object is locked and that is only the case when validating.
>>>>>>
>>>>> well, that is true.
>>>>> but considering this is a PT BO, only eviction races on it AFAIK.
>>>>> as for the individualized resv in bo release, we unref the PT BO just after that.
>>>>> I am still thinking of other races in the real world.
>>>> We should probably just add all pipelined/delayed submissions directly to the reservation object in amdgpu_vm_sdma_commit().
>>>>
>>>> Only the direct and invalidating submissions can't be added because we can't grab the reservation object in the MMU notifier.
> wait, I see amdgpu_vm_handle_fault will grab the resv lock of root BO.
> so no race then?

No, not in this case.

But Philip, Alejandro and Felix are working on invalidation from an MMU 
callback. And in this case grabbing the lock doesn't work.

But this is a very special use case which can be handled differently.

Christian.

>
>>>> Can you prepare a patch for this?
>>>>
>>> yep, I can.
>>> Adding the fence to the bo resv in every commit introduces a little overhead?
>> Yes it does, but we used to have this before and it wasn't really measurable.
>>
>> With the unusual exception of mapping really large chunks of fragmented system memory, we only use one commit for anything <1GB anyway.
>>
>> Christian.
>>
>>> As we only need to add the last fence to the resv, given that the job scheduler ring is FIFO.
>>> yes, code should be simple anyway as long as it works.
>>>
>>> thanks
>>> xinhui
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> thanks
>>>>> xinhui
>>>>>
>>>>>> I'm considering to just partially revert the patch originally stopping to add fences and instead only not add them when invalidating in a direct submit.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>>>>>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++--
>>>>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>> index 73398831196f..f424b5969930 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>> @@ -1582,6 +1582,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>>>>  	struct amdgpu_vm_update_params params;
>>>>>>>  	enum amdgpu_sync_mode sync_mode;
>>>>>>>  	int r;
>>>>>>> +	struct amdgpu_bo *root = vm->root.base.bo;
>>>>>>>  
>>>>>>>  	memset(&params, 0, sizeof(params));
>>>>>>>  	params.adev = adev;
>>>>>>> @@ -1604,8 +1605,6 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>>>>  	}
>>>>>>>  
>>>>>>>  	if (flags & AMDGPU_PTE_VALID) {
>>>>>>> -		struct amdgpu_bo *root = vm->root.base.bo;
>>>>>>> -
>>>>>>>  		if (!dma_fence_is_signaled(vm->last_direct))
>>>>>>>  			amdgpu_bo_fence(root, vm->last_direct, true);
>>>>>>>  
>>>>>>> @@ -1623,6 +1622,12 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>>>>>>>  
>>>>>>>  	r = vm->update_funcs->commit(&params, fence);
>>>>>>>  
>>>>>>> +	if (!dma_fence_is_signaled(vm->last_direct))
>>>>>>> +		amdgpu_bo_fence(root, vm->last_direct, true);
>>>>>>> +
>>>>>>> +	if (!dma_fence_is_signaled(vm->last_delayed))
>>>>>>> +		amdgpu_bo_fence(root, vm->last_delayed, true);
>>>>>>> +
>>>>>>>  error_unlock:
>>>>>>>  	amdgpu_vm_eviction_unlock(vm);
>>>>>>>  	return r;


end of thread, other threads:[~2020-03-13 10:45 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13  7:43 [PATCH 0/2] fix gmc page fault on navi1X xinhui pan
2020-03-13  7:43 ` [PATCH 1/2] drm//amdgpu: Always sync fence before unlock eviction_lock xinhui pan
2020-03-13  8:52   ` Christian König
2020-03-13  9:29     ` Pan, Xinhui
2020-03-13  9:55       ` Christian König
2020-03-13 10:21         ` Pan, Xinhui
2020-03-13 10:23           ` Christian König
2020-03-13 10:40             ` Pan, Xinhui
2020-03-13 10:44               ` Christian König
2020-03-13  7:43 ` [PATCH 2/2] drm/amdgpu: unref the bo after job submit xinhui pan
