From: "Sharma, Shashank" <shashank.sharma@amd.com>
To: "Lazar, Lijo" <lijo.lazar@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Somalapuram Amaranath" <Amaranath.Somalapuram@amd.com>,
	"Christian König" <Christian.Koenig@amd.com>
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
Date: Fri, 4 Feb 2022 17:38:41 +0100
Message-ID: <817df2c3-e7af-92cb-53f8-8bc70b69b988@amd.com>
In-Reply-To: <a0693436-619c-efa1-b3f1-2fca6377e2fe@amd.com>

Hey Lijo,
I somehow missed responding to this comment; please find my reply inline:

Regards
Shashank

On 1/22/2022 7:42 AM, Lazar, Lijo wrote:
> 
> 
> On 1/22/2022 2:04 AM, Sharma, Shashank wrote:
>>  From 899ec6060eb7d8a3d4d56ab439e4e6cdd74190a4 Mon Sep 17 00:00:00 2001
>> From: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>> Date: Fri, 21 Jan 2022 14:19:42 +0530
>> Subject: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>>
>> This patch adds a GPU reset handler for Navi ASIC family, which
>> typically dumps some of the registers and sends a trace event.
>>
>> V2: Accommodated call to work function to send uevent
>>
>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/nv.c | 28 ++++++++++++++++++++++++++++
>>   1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>> index 01efda4398e5..ada35d4c5245 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -528,10 +528,38 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>       }
>>   }
>>
>> +static void amdgpu_reset_dumps(struct amdgpu_device *adev)
>> +{
>> +    int r = 0, i;
>> +
>> +    /* original raven doesn't have full asic reset */
>> +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> +        !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> +        return;
>> +    for (i = 0; i < adev->num_ip_blocks; i++) {
>> +        if (!adev->ip_blocks[i].status.valid)
>> +            continue;
>> +        if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> +            continue;
>> +        r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> +        if (r)
>> +            DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>> +                    adev->ip_blocks[i].version->funcs->name, r);
>> +    }
>> +
>> +    /* Schedule work to send uevent */
>> +    if (!queue_work(system_unbound_wq, &adev->gpu_reset_work))
>> +        DRM_ERROR("failed to add GPU reset work\n");
>> +
>> +    dump_stack();
>> +}
>> +
>>   static int nv_asic_reset(struct amdgpu_device *adev)
>>   {
>>       int ret = 0;
>>
>> +    amdgpu_reset_dumps(adev);
> 
> Had a comment on this before. Now there are different reasons (or even 
> no reason like a precautionary reset) to perform reset. A user would be 
> interested in a trace only if the reason is valid.
> 
> To clarify on why a work shouldn't be scheduled on every reset, check 
> here -
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2188 
In the example you pointed to, they have a criterion for deciding what
counts as a valid reset in their context, on the kernel side itself. So
they can decide whether or not to act on it.

But in our case, we want to send the trace_event to userspace with some
register values on every reset, and it is up to the profiling app to
interpret it (including what it wants to call a GPU reset). So I don't
think this causes considerable overhead.
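
To make that split concrete, here is a minimal user-space sketch in plain C.
It is not driver code and not what the patch implements; it only models the
policy described above. The struct reset_event layout, the status_reg field,
and the functions forward_all(), count_reported() and app_wants_event() are
all hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event payload: one dumped register value per reset.
 * The real trace event format is not defined in this thread. */
struct reset_event {
    uint32_t status_reg;
};

/* Kernel-side model: forward every reset unconditionally, no filtering. */
static size_t forward_all(const struct reset_event *in, size_t n,
                          struct reset_event *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

/* App-side policy (assumed rule, for illustration only): a reset is
 * worth reporting if any bit is set in the dumped register. */
static bool app_wants_event(const struct reset_event *ev)
{
    assert(ev != NULL);
    return ev->status_reg != 0;
}

/* Profiler-side model: count the forwarded events that pass the
 * application's own filter. */
static size_t count_reported(const struct reset_event *evs, size_t n,
                             bool (*filter)(const struct reset_event *))
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (filter(&evs[i]))
            hits++;
    return hits;
}
```

The point of the sketch is only that the producer stays policy-free, while
the decision about what counts as a "real" reset lives in the filter that
the consumer supplies.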

- Shashank
> 
> 
> 
> Thanks,
> Lijo
> 
>>       switch (nv_asic_reset_method(adev)) {
>>       case AMD_RESET_METHOD_PCI:
>>           dev_info(adev->dev, "PCI reset\n");
