From: Felix Kuehling <felix.kuehling@amd.com>
To: Alex Sierra <alex.sierra@amd.com>, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault
Date: Mon, 18 Nov 2019 17:46:45 -0500
Message-ID: <f60eeb60-712f-6aa4-2660-86970b92c637@amd.com>
In-Reply-To: <20191118222435.93134-2-alex.sierra@amd.com>

On 2019-11-18 17:24, Alex Sierra wrote:
> Only for the debugger use case.
>
> [why]
> Avoid endless translation retries, after an invalid address access has
> been issued to the GPU. Instead, the trap handler is forced to enter by
> generating a no-retry-fault.
> A s_trap instruction is inserted in the debugger case to let the wave to
> enter trap handler to save context.
>
> [how]
> Intentionally using an invalid flag combination (F and P set at the same
> time) to trigger a no-retry-fault, after a retry-fault happens. This is
> only valid under compute context.
>
> Change-Id: I4180c30e2631dc0401cbd6171f8a6776e4733c9a
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index d51ac8771ae0..358a4f50fcfb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3207,6 +3207,12 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>   		value = adev->dummy_page_addr;
>   		flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>   			AMDGPU_PTE_WRITEABLE;
> +
> +		if (vm->is_compute_context) {
> +			/* Setting PTE flags to trigger a no-retry-fault  */
> +			flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
> +				AMDGPU_PTE_TF;

Hmm, this looks like you're setting flags twice in the compute case: the
new assignment overwrites the flags that were just OR-ed in above it. I
was also expecting something more like this:

if (vm->is_compute_context) {
     ...
} else if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
     ...
} else {
     ...
}

I.e., for compute contexts we do our compute-specific thing; otherwise
the behaviour depends on the amdgpu_vm_fault_stop setting.

Regards,
   Felix


> +		}
>   	} else {
>   		/* Let the hw retry silently on the PTE */
>   		value = 0;
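
The three-way branch suggested in the reply can be sketched as a small
standalone C function. This is only an illustration, not the driver code:
every sketch_* / SKETCH_* name is hypothetical, the bit positions are
placeholders rather than the kernel's real AMDGPU_PTE_* definitions, and
the thread does not say what value should be in the compute branch, so
leaving it at 0 there is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for illustration only; these are NOT the
 * kernel's AMDGPU_PTE_* / AMDGPU_PDE_PTE definitions. */
#define SKETCH_PTE_VALID      (1ULL << 0)
#define SKETCH_PTE_EXECUTABLE (1ULL << 1)
#define SKETCH_PTE_READABLE   (1ULL << 2)
#define SKETCH_PTE_WRITEABLE  (1ULL << 3)
#define SKETCH_PDE_PTE        (1ULL << 4)
#define SKETCH_PTE_TF         (1ULL << 5)

/* Hypothetical stand-in for the amdgpu_vm_fault_stop module setting. */
enum sketch_fault_stop {
	SKETCH_FAULT_STOP_NEVER,
	SKETCH_FAULT_STOP_OTHER,
};

struct sketch_pte {
	uint64_t value;
	uint64_t flags;
};

/* Pick the PTE value and flags with one branch per case, so the flags
 * are assigned exactly once on each path. */
static struct sketch_pte sketch_pick_pte(bool is_compute_context,
					 enum sketch_fault_stop fault_stop,
					 uint64_t dummy_page_addr,
					 uint64_t base_flags)
{
	struct sketch_pte pte = { .value = 0, .flags = base_flags };

	if (is_compute_context) {
		/* Flag combination taken from the patch, meant to force
		 * a no-retry-fault. value stays 0 (assumption). */
		pte.flags = SKETCH_PTE_EXECUTABLE | SKETCH_PDE_PTE |
			    SKETCH_PTE_TF;
	} else if (fault_stop == SKETCH_FAULT_STOP_NEVER) {
		/* Redirect the access to the dummy page. */
		pte.value = dummy_page_addr;
		pte.flags |= SKETCH_PTE_EXECUTABLE | SKETCH_PTE_READABLE |
			     SKETCH_PTE_WRITEABLE;
	} else {
		/* Let the hw retry silently on the PTE. */
		pte.value = 0;
	}

	return pte;
}
```

With this shape the compute path no longer inherits the OR-ed
executable/readable/writeable flags only to overwrite them, which is the
double assignment the reply points out.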
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
