* [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-15 22:38 ` Alex Sierra
0 siblings, 0 replies; 14+ messages in thread
From: Alex Sierra @ 2019-11-15 22:38 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Alex Sierra
After a retry-fault happens, the fault handler will modify the UTCL2 to
set PTE bits to force a no-retry-fault. This will cause the wave to
enter the trap handler.
Change-Id: I177102891f715068f15605957ff23b0cab862603
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3c0bd6472a46..9ad7345d315d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
uint64_t addr)
{
struct amdgpu_bo *root;
- uint64_t value, flags;
+ uint64_t value = 0;
+ uint64_t flags;
struct amdgpu_vm *vm;
long r;
@@ -3196,17 +3197,15 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
goto error_unlock;
addr /= AMDGPU_GPU_PAGE_SIZE;
- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
- AMDGPU_PTE_SYSTEM;
if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
- /* Redirect the access to the dummy page */
- value = adev->dummy_page_addr;
- flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
- AMDGPU_PTE_WRITEABLE;
+ /* Setting PTE flags to trigger a no-retry-fault */
+ flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
+ AMDGPU_PTE_TF;
} else {
/* Let the hw retry silently on the PTE */
- value = 0;
+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
+ AMDGPU_PTE_SYSTEM;
}
r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
--
2.17.1
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-15 22:38 ` Alex Sierra
0 siblings, 0 replies; 14+ messages in thread
From: Alex Sierra @ 2019-11-15 22:38 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Sierra
After a retry-fault happens, the fault handler will modify the UTCL2 to
set PTE bits to force a no-retry-fault. This will cause the wave to
enter the trap handler.
Change-Id: I177102891f715068f15605957ff23b0cab862603
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3c0bd6472a46..9ad7345d315d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
uint64_t addr)
{
struct amdgpu_bo *root;
- uint64_t value, flags;
+ uint64_t value = 0;
+ uint64_t flags;
struct amdgpu_vm *vm;
long r;
@@ -3196,17 +3197,15 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
goto error_unlock;
addr /= AMDGPU_GPU_PAGE_SIZE;
- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
- AMDGPU_PTE_SYSTEM;
if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
- /* Redirect the access to the dummy page */
- value = adev->dummy_page_addr;
- flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
- AMDGPU_PTE_WRITEABLE;
+ /* Setting PTE flags to trigger a no-retry-fault */
+ flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
+ AMDGPU_PTE_TF;
} else {
/* Let the hw retry silently on the PTE */
- value = 0;
+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
+ AMDGPU_PTE_SYSTEM;
}
r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
--
2.17.1
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-16 9:32 ` Christian König
0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2019-11-16 9:32 UTC (permalink / raw)
To: Alex Sierra, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Am 15.11.19 um 23:38 schrieb Alex Sierra:
> After a retry-fault happens, the fault handler will modify the UTCL2 to
> set PTE bits to force a no-retry-fault. This will cause the wave to
> enter the trap handler.
NAK, you can't do this unconditionally since that behavior is not wanted
for graphics.
Christian.
>
> Change-Id: I177102891f715068f15605957ff23b0cab862603
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3c0bd6472a46..9ad7345d315d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> uint64_t addr)
> {
> struct amdgpu_bo *root;
> - uint64_t value, flags;
> + uint64_t value = 0;
> + uint64_t flags;
> struct amdgpu_vm *vm;
> long r;
>
> @@ -3196,17 +3197,15 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> goto error_unlock;
>
> addr /= AMDGPU_GPU_PAGE_SIZE;
> - flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> - AMDGPU_PTE_SYSTEM;
>
> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> - /* Redirect the access to the dummy page */
> - value = adev->dummy_page_addr;
> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> - AMDGPU_PTE_WRITEABLE;
> + /* Setting PTE flags to trigger a no-retry-fault */
> + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> + AMDGPU_PTE_TF;
> } else {
> /* Let the hw retry silently on the PTE */
> - value = 0;
> + flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> + AMDGPU_PTE_SYSTEM;
> }
>
> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-16 9:32 ` Christian König
0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2019-11-16 9:32 UTC (permalink / raw)
To: Alex Sierra, amd-gfx
Am 15.11.19 um 23:38 schrieb Alex Sierra:
> After a retry-fault happens, the fault handler will modify the UTCL2 to
> set PTE bits to force a no-retry-fault. This will cause the wave to
> enter the trap handler.
NAK, you can't do this unconditionally since that behavior is not wanted
for graphics.
Christian.
>
> Change-Id: I177102891f715068f15605957ff23b0cab862603
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3c0bd6472a46..9ad7345d315d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> uint64_t addr)
> {
> struct amdgpu_bo *root;
> - uint64_t value, flags;
> + uint64_t value = 0;
> + uint64_t flags;
> struct amdgpu_vm *vm;
> long r;
>
> @@ -3196,17 +3197,15 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> goto error_unlock;
>
> addr /= AMDGPU_GPU_PAGE_SIZE;
> - flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> - AMDGPU_PTE_SYSTEM;
>
> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> - /* Redirect the access to the dummy page */
> - value = adev->dummy_page_addr;
> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> - AMDGPU_PTE_WRITEABLE;
> + /* Setting PTE flags to trigger a no-retry-fault */
> + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> + AMDGPU_PTE_TF;
> } else {
> /* Let the hw retry silently on the PTE */
> - value = 0;
> + flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> + AMDGPU_PTE_SYSTEM;
> }
>
> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-18 14:12 ` Zeng, Oak
0 siblings, 0 replies; 14+ messages in thread
From: Zeng, Oak @ 2019-11-18 14:12 UTC (permalink / raw)
To: Koenig, Christian, Yang, Philip, Sierra Guiza, Alejandro (Alex),
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Thank you Christian. An intentional use of invalid PTE setting sounds reasonable for debugger case. I wasn't able to understand because I though this is preparation work for page migration.
Hi Alex,
Please add some comments in the commit message to say that this is intentionally invalid PTE settings to generate no-retry interrupt and this is for debugger use case where we don't want endless translation retry. Endless translation retry is designed for page migration. A s_trap instruction will be inserted in the debugger case to let the wave to enter trap handler to save context.
Regards,
Oak
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@gmail.com>
Sent: Monday, November 18, 2019 8:47 AM
To: Zeng, Oak <Oak.Zeng@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
Hi Oak,
well your questions are really valid. This is a follow up from an internal discussion on how to handle the debug trap handler on Vega and newer.
The key missing information in the commit message is that we are intentionally using an invalid combination of flags (F and P set at the same time) to trigger a no-retry fault.
This is used with the debug trap handler to enter debugging when an application accesses an invalid address on the GPU.
Otherwise we would just re-try the access over and over again because the hardware thinks that this is for page migration.
Regards,
Christian.
Am 18.11.19 um 04:42 schrieb Zeng, Oak:
> Hi Philip/Alex,
>
> I found I can't understand this patch without more details in the commit message. Is this preparation work for the page migration? Why setting the translation further bit can force a no-retry-fault? Won't setting this bit cause UTCL2 treat the PTE as a PDE and continue to walk to the PTE that this PTE pointing to? But from the patch below, the next level of PTE (pseudoly the 5th level page table) is not set.
>
> Why this setting will cause the wave to enter trap handler? As I understand it, wave enters trap handler (saving context) either it encounter a s_trap instruction, or CP can initialize suspension to preempt the running waves, or context switch can be triggered by certain counter (monitored by SPI).
>
> Further question will be, what is the planned page migration process? Halt wave/save context -> migrate page/update page table-> restore wave?
>
> Also when I check navi10 GPUVM programming guide, it clearly says that currently the Translate Further (F) option cannot be used in concert with the PDE as PTE (P) option. But this patch set F and P.
>
> Sorry for too many questions.
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Philip Yang
> Sent: Friday, November 15, 2019 1:04 PM
> To: Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>;
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault
> after a retry-fault
>
> One suggestion below, move the flags setting for else path into else path looks better.
>
> Philip
>
> On 2019-11-15 12:43 p.m., Alex Sierra wrote:
>> After a retry-fault happens, the fault handler will modify the UTCL2
>> to set PTE bits to force a no-retry-fault. This will cause the wave
>> to enter the trap handler.
>>
>> Change-Id: I177102891f715068f15605957ff23b0cab862603
>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
>> 1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 3c0bd6472a46..e4d1a8fc97d5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> uint64_t addr)
>> {
>> struct amdgpu_bo *root;
>> - uint64_t value, flags;
>> + uint64_t value = 0;
>> + uint64_t flags;
>> struct amdgpu_vm *vm;
>> long r;
>>
>> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> AMDGPU_PTE_SYSTEM;
>>
> >- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> >- AMDGPU_PTE_SYSTEM;
> >
>> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> - /* Redirect the access to the dummy page */
>> - value = adev->dummy_page_addr;
>> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> - AMDGPU_PTE_WRITEABLE;
> > + /* Setting PTE flags to trigger a no-retry-fault */
> > + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> > + AMDGPU_PTE_TF;
> } else {
>> - value = 0;
> >+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> >+ AMDGPU_PTE_SYSTEM;
> > }
>
>>
>> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr
>> + 1,
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-18 14:12 ` Zeng, Oak
0 siblings, 0 replies; 14+ messages in thread
From: Zeng, Oak @ 2019-11-18 14:12 UTC (permalink / raw)
To: Koenig, Christian, Yang, Philip, Sierra Guiza, Alejandro (Alex), amd-gfx
Thank you Christian. An intentional use of invalid PTE setting sounds reasonable for debugger case. I wasn't able to understand because I though this is preparation work for page migration.
Hi Alex,
Please add some comments in the commit message to say that this is intentionally invalid PTE settings to generate no-retry interrupt and this is for debugger use case where we don't want endless translation retry. Endless translation retry is designed for page migration. A s_trap instruction will be inserted in the debugger case to let the wave to enter trap handler to save context.
Regards,
Oak
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@gmail.com>
Sent: Monday, November 18, 2019 8:47 AM
To: Zeng, Oak <Oak.Zeng@amd.com>; Yang, Philip <Philip.Yang@amd.com>; Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
Hi Oak,
well your questions are really valid. This is a follow up from an internal discussion on how to handle the debug trap handler on Vega and newer.
The key missing information in the commit message is that we are intentionally using an invalid combination of flags (F and P set at the same time) to trigger a no-retry fault.
This is used with the debug trap handler to enter debugging when an application accesses an invalid address on the GPU.
Otherwise we would just re-try the access over and over again because the hardware thinks that this is for page migration.
Regards,
Christian.
Am 18.11.19 um 04:42 schrieb Zeng, Oak:
> Hi Philip/Alex,
>
> I found I can't understand this patch without more details in the commit message. Is this preparation work for the page migration? Why setting the translation further bit can force a no-retry-fault? Won't setting this bit cause UTCL2 treat the PTE as a PDE and continue to walk to the PTE that this PTE pointing to? But from the patch below, the next level of PTE (pseudoly the 5th level page table) is not set.
>
> Why this setting will cause the wave to enter trap handler? As I understand it, wave enters trap handler (saving context) either it encounter a s_trap instruction, or CP can initialize suspension to preempt the running waves, or context switch can be triggered by certain counter (monitored by SPI).
>
> Further question will be, what is the planned page migration process? Halt wave/save context -> migrate page/update page table-> restore wave?
>
> Also when I check navi10 GPUVM programming guide, it clearly says that currently the Translate Further (F) option cannot be used in concert with the PDE as PTE (P) option. But this patch set F and P.
>
> Sorry for too many questions.
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Philip Yang
> Sent: Friday, November 15, 2019 1:04 PM
> To: Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>;
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault
> after a retry-fault
>
> One suggestion below, move the flags setting for else path into else path looks better.
>
> Philip
>
> On 2019-11-15 12:43 p.m., Alex Sierra wrote:
>> After a retry-fault happens, the fault handler will modify the UTCL2
>> to set PTE bits to force a no-retry-fault. This will cause the wave
>> to enter the trap handler.
>>
>> Change-Id: I177102891f715068f15605957ff23b0cab862603
>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
>> 1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 3c0bd6472a46..e4d1a8fc97d5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> uint64_t addr)
>> {
>> struct amdgpu_bo *root;
>> - uint64_t value, flags;
>> + uint64_t value = 0;
>> + uint64_t flags;
>> struct amdgpu_vm *vm;
>> long r;
>>
>> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> AMDGPU_PTE_SYSTEM;
>>
> >- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> >- AMDGPU_PTE_SYSTEM;
> >
>> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> - /* Redirect the access to the dummy page */
>> - value = adev->dummy_page_addr;
>> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> - AMDGPU_PTE_WRITEABLE;
> > + /* Setting PTE flags to trigger a no-retry-fault */
> > + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> > + AMDGPU_PTE_TF;
> } else {
>> - value = 0;
> >+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> >+ AMDGPU_PTE_SYSTEM;
> > }
>
>>
>> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr
>> + 1,
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-18 13:46 ` Christian König
0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2019-11-18 13:46 UTC (permalink / raw)
To: Zeng, Oak, Yang, Philip, Sierra Guiza, Alejandro (Alex),
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Hi Oak,
well your questions are really valid. This is a follow up from an
internal discussion on how to handle the debug trap handler on Vega and
newer.
The key missing information in the commit message is that we are
intentionally using an invalid combination of flags (F and P set at the
same time) to trigger a no-retry fault.
This is used with the debug trap handler to enter debugging when an
application accesses an invalid address on the GPU.
Otherwise we would just re-try the access over and over again because
the hardware thinks that this is for page migration.
Regards,
Christian.
Am 18.11.19 um 04:42 schrieb Zeng, Oak:
> Hi Philip/Alex,
>
> I found I can't understand this patch without more details in the commit message. Is this preparation work for the page migration? Why setting the translation further bit can force a no-retry-fault? Won't setting this bit cause UTCL2 treat the PTE as a PDE and continue to walk to the PTE that this PTE pointing to? But from the patch below, the next level of PTE (pseudoly the 5th level page table) is not set.
>
> Why this setting will cause the wave to enter trap handler? As I understand it, wave enters trap handler (saving context) either it encounter a s_trap instruction, or CP can initialize suspension to preempt the running waves, or context switch can be triggered by certain counter (monitored by SPI).
>
> Further question will be, what is the planned page migration process? Halt wave/save context -> migrate page/update page table-> restore wave?
>
> Also when I check navi10 GPUVM programming guide, it clearly says that currently the Translate Further (F) option cannot be used in concert with the PDE as PTE (P) option. But this patch set F and P.
>
> Sorry for too many questions.
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Philip Yang
> Sent: Friday, November 15, 2019 1:04 PM
> To: Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
>
> One suggestion below, move the flags setting for else path into else path looks better.
>
> Philip
>
> On 2019-11-15 12:43 p.m., Alex Sierra wrote:
>> After a retry-fault happens, the fault handler will modify the UTCL2
>> to set PTE bits to force a no-retry-fault. This will cause the wave to
>> enter the trap handler.
>>
>> Change-Id: I177102891f715068f15605957ff23b0cab862603
>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
>> 1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 3c0bd6472a46..e4d1a8fc97d5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> uint64_t addr)
>> {
>> struct amdgpu_bo *root;
>> - uint64_t value, flags;
>> + uint64_t value = 0;
>> + uint64_t flags;
>> struct amdgpu_vm *vm;
>> long r;
>>
>> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> AMDGPU_PTE_SYSTEM;
>>
> >- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> >- AMDGPU_PTE_SYSTEM;
> >
>> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> - /* Redirect the access to the dummy page */
>> - value = adev->dummy_page_addr;
>> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> - AMDGPU_PTE_WRITEABLE;
> > + /* Setting PTE flags to trigger a no-retry-fault */
> > + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> > + AMDGPU_PTE_TF;
> } else {
>> - value = 0;
> >+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> >+ AMDGPU_PTE_SYSTEM;
> > }
>
>>
>> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr +
>> 1,
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-18 13:46 ` Christian König
0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2019-11-18 13:46 UTC (permalink / raw)
To: Zeng, Oak, Yang, Philip, Sierra Guiza, Alejandro (Alex), amd-gfx
Hi Oak,
well your questions are really valid. This is a follow up from an
internal discussion on how to handle the debug trap handler on Vega and
newer.
The key missing information in the commit message is that we are
intentionally using an invalid combination of flags (F and P set at the
same time) to trigger a no-retry fault.
This is used with the debug trap handler to enter debugging when an
application accesses an invalid address on the GPU.
Otherwise we would just re-try the access over and over again because
the hardware thinks that this is for page migration.
Regards,
Christian.
Am 18.11.19 um 04:42 schrieb Zeng, Oak:
> Hi Philip/Alex,
>
> I found I can't understand this patch without more details in the commit message. Is this preparation work for the page migration? Why setting the translation further bit can force a no-retry-fault? Won't setting this bit cause UTCL2 treat the PTE as a PDE and continue to walk to the PTE that this PTE pointing to? But from the patch below, the next level of PTE (pseudoly the 5th level page table) is not set.
>
> Why this setting will cause the wave to enter trap handler? As I understand it, wave enters trap handler (saving context) either it encounter a s_trap instruction, or CP can initialize suspension to preempt the running waves, or context switch can be triggered by certain counter (monitored by SPI).
>
> Further question will be, what is the planned page migration process? Halt wave/save context -> migrate page/update page table-> restore wave?
>
> Also when I check navi10 GPUVM programming guide, it clearly says that currently the Translate Further (F) option cannot be used in concert with the PDE as PTE (P) option. But this patch set F and P.
>
> Sorry for too many questions.
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Philip Yang
> Sent: Friday, November 15, 2019 1:04 PM
> To: Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
>
> One suggestion below, move the flags setting for else path into else path looks better.
>
> Philip
>
> On 2019-11-15 12:43 p.m., Alex Sierra wrote:
>> After a retry-fault happens, the fault handler will modify the UTCL2
>> to set PTE bits to force a no-retry-fault. This will cause the wave to
>> enter the trap handler.
>>
>> Change-Id: I177102891f715068f15605957ff23b0cab862603
>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
>> 1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 3c0bd6472a46..e4d1a8fc97d5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> uint64_t addr)
>> {
>> struct amdgpu_bo *root;
>> - uint64_t value, flags;
>> + uint64_t value = 0;
>> + uint64_t flags;
>> struct amdgpu_vm *vm;
>> long r;
>>
>> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> AMDGPU_PTE_SYSTEM;
>>
> >- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> >- AMDGPU_PTE_SYSTEM;
> >
>> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> - /* Redirect the access to the dummy page */
>> - value = adev->dummy_page_addr;
>> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> - AMDGPU_PTE_WRITEABLE;
> > + /* Setting PTE flags to trigger a no-retry-fault */
> > + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> > + AMDGPU_PTE_TF;
> } else {
>> - value = 0;
> >+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
> >+ AMDGPU_PTE_SYSTEM;
> > }
>
>>
>> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr +
>> 1,
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-18 3:42 ` Zeng, Oak
0 siblings, 0 replies; 14+ messages in thread
From: Zeng, Oak @ 2019-11-18 3:42 UTC (permalink / raw)
To: Yang, Philip, Sierra Guiza, Alejandro (Alex),
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Hi Philip/Alex,
I found I can't understand this patch without more details in the commit message. Is this preparation work for the page migration? Why setting the translation further bit can force a no-retry-fault? Won't setting this bit cause UTCL2 treat the PTE as a PDE and continue to walk to the PTE that this PTE pointing to? But from the patch below, the next level of PTE (pseudoly the 5th level page table) is not set.
Why this setting will cause the wave to enter trap handler? As I understand it, wave enters trap handler (saving context) either it encounter a s_trap instruction, or CP can initialize suspension to preempt the running waves, or context switch can be triggered by certain counter (monitored by SPI).
Further question will be, what is the planned page migration process? Halt wave/save context -> migrate page/update page table-> restore wave?
Also when I check navi10 GPUVM programming guide, it clearly says that currently the Translate Further (F) option cannot be used in concert with the PDE as PTE (P) option. But this patch set F and P.
Sorry for too many questions.
Regards,
Oak
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Philip Yang
Sent: Friday, November 15, 2019 1:04 PM
To: Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
One suggestion below, move the flags setting for else path into else path looks better.
Philip
On 2019-11-15 12:43 p.m., Alex Sierra wrote:
> After a retry-fault happens, the fault handler will modify the UTCL2
> to set PTE bits to force a no-retry-fault. This will cause the wave to
> enter the trap handler.
>
> Change-Id: I177102891f715068f15605957ff23b0cab862603
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3c0bd6472a46..e4d1a8fc97d5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> uint64_t addr)
> {
> struct amdgpu_bo *root;
> - uint64_t value, flags;
> + uint64_t value = 0;
> + uint64_t flags;
> struct amdgpu_vm *vm;
> long r;
>
> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> AMDGPU_PTE_SYSTEM;
>
>- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>- AMDGPU_PTE_SYSTEM;
>
> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> - /* Redirect the access to the dummy page */
> - value = adev->dummy_page_addr;
> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> - AMDGPU_PTE_WRITEABLE;
> + /* Setting PTE flags to trigger a no-retry-fault */
> + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> + AMDGPU_PTE_TF;
} else {
> - value = 0;
>+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>+ AMDGPU_PTE_SYSTEM;
> }
>
> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr +
> 1,
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-18 3:42 ` Zeng, Oak
0 siblings, 0 replies; 14+ messages in thread
From: Zeng, Oak @ 2019-11-18 3:42 UTC (permalink / raw)
To: Yang, Philip, Sierra Guiza, Alejandro (Alex), amd-gfx
Hi Philip/Alex,
I found I can't understand this patch without more details in the commit message. Is this preparation work for the page migration? Why setting the translation further bit can force a no-retry-fault? Won't setting this bit cause UTCL2 treat the PTE as a PDE and continue to walk to the PTE that this PTE pointing to? But from the patch below, the next level of PTE (pseudoly the 5th level page table) is not set.
Why this setting will cause the wave to enter trap handler? As I understand it, wave enters trap handler (saving context) either it encounter a s_trap instruction, or CP can initialize suspension to preempt the running waves, or context switch can be triggered by certain counter (monitored by SPI).
Further question will be, what is the planned page migration process? Halt wave/save context -> migrate page/update page table-> restore wave?
Also when I check navi10 GPUVM programming guide, it clearly says that currently the Translate Further (F) option cannot be used in concert with the PDE as PTE (P) option. But this patch set F and P.
Sorry for too many questions.
Regards,
Oak
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Philip Yang
Sent: Friday, November 15, 2019 1:04 PM
To: Sierra Guiza, Alejandro (Alex) <Alex.Sierra@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
One suggestion below, move the flags setting for else path into else path looks better.
Philip
On 2019-11-15 12:43 p.m., Alex Sierra wrote:
> After a retry-fault happens, the fault handler will modify the UTCL2
> to set PTE bits to force a no-retry-fault. This will cause the wave to
> enter the trap handler.
>
> Change-Id: I177102891f715068f15605957ff23b0cab862603
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3c0bd6472a46..e4d1a8fc97d5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> uint64_t addr)
> {
> struct amdgpu_bo *root;
> - uint64_t value, flags;
> + uint64_t value = 0;
> + uint64_t flags;
> struct amdgpu_vm *vm;
> long r;
>
> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> AMDGPU_PTE_SYSTEM;
>
>- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>- AMDGPU_PTE_SYSTEM;
>
> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> - /* Redirect the access to the dummy page */
> - value = adev->dummy_page_addr;
> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> - AMDGPU_PTE_WRITEABLE;
> + /* Setting PTE flags to trigger a no-retry-fault */
> + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> + AMDGPU_PTE_TF;
} else {
> - value = 0;
>+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>+ AMDGPU_PTE_SYSTEM;
> }
>
> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr +
> 1,
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-15 18:03 ` Philip Yang
0 siblings, 0 replies; 14+ messages in thread
From: Philip Yang @ 2019-11-15 18:03 UTC (permalink / raw)
To: Alex Sierra, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
One suggestion below, move the flags setting for else path into else
path looks better.
Philip
On 2019-11-15 12:43 p.m., Alex Sierra wrote:
> After a retry-fault happens, the fault handler will modify the UTCL2 to
> set PTE bits to force a no-retry-fault. This will cause the wave to
> enter the trap handler.
>
> Change-Id: I177102891f715068f15605957ff23b0cab862603
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3c0bd6472a46..e4d1a8fc97d5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> uint64_t addr)
> {
> struct amdgpu_bo *root;
> - uint64_t value, flags;
> + uint64_t value = 0;
> + uint64_t flags;
> struct amdgpu_vm *vm;
> long r;
>
> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> AMDGPU_PTE_SYSTEM;
>
>- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>- AMDGPU_PTE_SYSTEM;
>
> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> - /* Redirect the access to the dummy page */
> - value = adev->dummy_page_addr;
> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> - AMDGPU_PTE_WRITEABLE;
> + /* Setting PTE flags to trigger a no-retry-fault */
> + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> + AMDGPU_PTE_TF;
} else {
> - value = 0;
>+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>+ AMDGPU_PTE_SYSTEM;
> }
>
> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-15 18:03 ` Philip Yang
0 siblings, 0 replies; 14+ messages in thread
From: Philip Yang @ 2019-11-15 18:03 UTC (permalink / raw)
To: Alex Sierra, amd-gfx
One suggestion below, move the flags setting for else path into else
path looks better.
Philip
On 2019-11-15 12:43 p.m., Alex Sierra wrote:
> After a retry-fault happens, the fault handler will modify the UTCL2 to
> set PTE bits to force a no-retry-fault. This will cause the wave to
> enter the trap handler.
>
> Change-Id: I177102891f715068f15605957ff23b0cab862603
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
> 1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3c0bd6472a46..e4d1a8fc97d5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> uint64_t addr)
> {
> struct amdgpu_bo *root;
> - uint64_t value, flags;
> + uint64_t value = 0;
> + uint64_t flags;
> struct amdgpu_vm *vm;
> long r;
>
> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> AMDGPU_PTE_SYSTEM;
>
>- flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>- AMDGPU_PTE_SYSTEM;
>
> if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> - /* Redirect the access to the dummy page */
> - value = adev->dummy_page_addr;
> - flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> - AMDGPU_PTE_WRITEABLE;
> + /* Setting PTE flags to trigger a no-retry-fault */
> + flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> + AMDGPU_PTE_TF;
} else {
> - value = 0;
>+ flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>+ AMDGPU_PTE_SYSTEM;
> }
>
> r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-15 17:43 ` Alex Sierra
0 siblings, 0 replies; 14+ messages in thread
From: Alex Sierra @ 2019-11-15 17:43 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Alex Sierra
After a retry-fault happens, the fault handler will modify the UTCL2 to
set PTE bits to force a no-retry-fault. This will cause the wave to
enter the trap handler.
Change-Id: I177102891f715068f15605957ff23b0cab862603
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3c0bd6472a46..e4d1a8fc97d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
uint64_t addr)
{
struct amdgpu_bo *root;
- uint64_t value, flags;
+ uint64_t value = 0;
+ uint64_t flags;
struct amdgpu_vm *vm;
long r;
@@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
AMDGPU_PTE_SYSTEM;
if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
- /* Redirect the access to the dummy page */
- value = adev->dummy_page_addr;
- flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
- AMDGPU_PTE_WRITEABLE;
- } else {
- /* Let the hw retry silently on the PTE */
- value = 0;
+ /* Setting PTE flags to trigger a no-retry-fault */
+ flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
+ AMDGPU_PTE_TF;
}
r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
--
2.17.1
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
@ 2019-11-15 17:43 ` Alex Sierra
0 siblings, 0 replies; 14+ messages in thread
From: Alex Sierra @ 2019-11-15 17:43 UTC (permalink / raw)
To: amd-gfx; +Cc: Alex Sierra
After a retry-fault happens, the fault handler will modify the UTCL2 to
set PTE bits to force a no-retry-fault. This will cause the wave to
enter the trap handler.
Change-Id: I177102891f715068f15605957ff23b0cab862603
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3c0bd6472a46..e4d1a8fc97d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
uint64_t addr)
{
struct amdgpu_bo *root;
- uint64_t value, flags;
+ uint64_t value = 0;
+ uint64_t flags;
struct amdgpu_vm *vm;
long r;
@@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
AMDGPU_PTE_SYSTEM;
if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
- /* Redirect the access to the dummy page */
- value = adev->dummy_page_addr;
- flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
- AMDGPU_PTE_WRITEABLE;
- } else {
- /* Let the hw retry silently on the PTE */
- value = 0;
+ /* Setting PTE flags to trigger a no-retry-fault */
+ flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
+ AMDGPU_PTE_TF;
}
r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
--
2.17.1
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-11-18 14:12 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-15 22:38 [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault Alex Sierra
2019-11-15 22:38 ` Alex Sierra
[not found] ` <20191115223830.29769-1-alex.sierra-5C7GfCeVMHo@public.gmane.org>
2019-11-16 9:32 ` Christian König
2019-11-16 9:32 ` Christian König
-- strict thread matches above, loose matches on Subject: below --
2019-11-15 17:43 Alex Sierra
2019-11-15 17:43 ` Alex Sierra
[not found] ` <20191115174314.73446-1-alex.sierra-5C7GfCeVMHo@public.gmane.org>
2019-11-15 18:03 ` Philip Yang
2019-11-15 18:03 ` Philip Yang
[not found] ` <bd9f76ca-9dc7-6c29-9c41-fc3e0d3c0bf1-5C7GfCeVMHo@public.gmane.org>
2019-11-18 3:42 ` Zeng, Oak
2019-11-18 3:42 ` Zeng, Oak
[not found] ` <BL0PR12MB258076EDAE60740E94C92972804D0-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-11-18 13:46 ` Christian König
2019-11-18 13:46 ` Christian König
[not found] ` <b391a2bb-a078-d397-50f9-9f4cd2824cb3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-11-18 14:12 ` Zeng, Oak
2019-11-18 14:12 ` Zeng, Oak
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.