From: "Somalapuram, Amaranath" <asomalap@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Alex Deucher" <alexdeucher@gmail.com>,
	"Somalapuram Amaranath" <Amaranath.Somalapuram@amd.com>
Cc: "Deucher, Alexander" <alexander.deucher@amd.com>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	Shashank Sharma <shashank.sharma@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: add reset register trace function on GPU reset
Date: Thu, 10 Feb 2022 10:59:55 +0530
Message-ID: <49e24f9f-4657-d3ce-e84e-abbaa56d3181@amd.com>
In-Reply-To: <6a7ca5ae-6d78-b8fd-cba8-cd2dca4418f4@amd.com>


On 2/9/2022 1:17 PM, Christian König wrote:
> Am 08.02.22 um 16:28 schrieb Alex Deucher:
>> On Tue, Feb 8, 2022 at 3:17 AM Somalapuram Amaranath
>> <Amaranath.Somalapuram@amd.com> wrote:
>>> Dump the list of register values to trace event on GPU reset.
>>>
>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 ++++++++++++++++++++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h  | 19 +++++++++++++++++++
>>>   2 files changed, 39 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 1e651b959141..057922fb7e37 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -4534,6 +4534,23 @@ int amdgpu_device_pre_asic_reset(struct 
>>> amdgpu_device *adev,
>>>          return r;
>>>   }
>>>
>>> +static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
>>> +{
>>> +       int i;
>>> +       uint32_t reg_value[128];
>>> +
>>> +       for (i = 0; adev->reset_dump_reg_list[i] != 0; i++) {
>>> +               if (adev->asic_type >= CHIP_NAVI10)
>> This check should be against CHIP_VEGA10.  Also, this only allows for
>> GC registers.  If we wanted to dump other registers, we'd need a
>> different macro.  Might be better to just use RREG32 here for
>> everything and then encode the full offset using
>> SOC15_REG_ENTRY_OFFSET() or a similar macro.  Also, we need to think
>> about how to handle gfxoff in this case.  gfxoff needs to be disabled
>> or we'll hang the chip if we try and read GC or SDMA registers via
>> MMIO which will adversely affect the hang signature.
>
> Well this should execute right before a GPU reset, so I think it 
> shouldn't matter if we hang the chip or not as long as the read comes 
> back correctly (I remember a very long UVD debug session because of 
> this).
>
> But in general I agree, we should just use RREG32() here and always 
> encode the full register offset.
>
> Regards,
> Christian.
>
Can I use something like this instead?

+                       reg_value[i] = RREG32(
+                               adev->reg_offset[adev->reset_dump_reg_list[i][0]]
+                                               [adev->reset_dump_reg_list[i][1]]
+                                               [adev->reset_dump_reg_list[i][2]]
+                               + adev->reset_dump_reg_list[i][3]);

where each entry in the list holds:

ip       --> adev->reset_dump_reg_list[i][0]
inst     --> adev->reset_dump_reg_list[i][1]
BASE_IDX --> adev->reset_dump_reg_list[i][2]
reg      --> adev->reset_dump_reg_list[i][3]

This requires user space to supply 4 values for each register.

We cannot use an existing macro such as RREG32_SOC15** here, because those
macros build their arguments by token pasting at compile time (for example
ip##_HWIP or reg##_BASE_IDX), so values that arrive from user space at run
time cannot be passed to them.
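
To make the proposal concrete, here is a minimal user-space sketch of the
lookup, not driver code. It assumes a simplified three-dimensional
reg_offset table (the real adev->reg_offset layout may differ), and every
name in it (mock_device, reset_dump_entry, resolve_offset, dump_regs,
fake_rreg32, and the MAX_* limits) is hypothetical. It also bounds-checks
the indices, since they would come from user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits for illustration; the driver has its own. */
#define MAX_HWIP      4
#define MAX_INSTANCE  2
#define MAX_SEGMENT   3
#define MAX_DUMP_REGS 128

/* One user-supplied entry: the four values listed above. */
struct reset_dump_entry {
	uint32_t ip;       /* hardware IP index */
	uint32_t inst;     /* instance of that IP */
	uint32_t base_idx; /* base-address segment index */
	uint32_t reg;      /* register offset within the segment */
};

/* Stand-in for the parts of the device used here (assumed layout). */
struct mock_device {
	uint32_t reg_offset[MAX_HWIP][MAX_INSTANCE][MAX_SEGMENT];
	uint32_t (*rreg32)(uint32_t offset); /* stand-in for RREG32() */
};

/* Fake MMIO read so the sketch runs without hardware. */
static uint32_t fake_rreg32(uint32_t offset)
{
	return offset ^ 0xA5A50000u;
}

/* Turn (ip, inst, base_idx, reg) into a full offset; -1 if out of range. */
static int resolve_offset(const struct mock_device *dev,
			  const struct reset_dump_entry *e, uint32_t *out)
{
	assert(dev != NULL && e != NULL && out != NULL);

	if (e->ip >= MAX_HWIP || e->inst >= MAX_INSTANCE ||
	    e->base_idx >= MAX_SEGMENT)
		return -1;

	*out = dev->reg_offset[e->ip][e->inst][e->base_idx] + e->reg;
	return 0;
}

/* Read every valid entry; returns how many values were stored. */
static size_t dump_regs(const struct mock_device *dev,
			const struct reset_dump_entry *list, size_t count,
			uint32_t *values)
{
	size_t i, n = 0;

	for (i = 0; i < count && n < MAX_DUMP_REGS; i++) {
		uint32_t offset;

		if (resolve_offset(dev, &list[i], &offset) != 0)
			continue; /* skip entries with bad indices */
		values[n++] = dev->rreg32(offset);
	}
	return n;
}
```

Skipping invalid entries rather than failing the whole dump is only one
possible policy; rejecting them when the list is written might be
preferable.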

>
>>
>> Alex
>>
>>> +                       reg_value[i] = RREG32_SOC15_IP(GC, 
>>> adev->reset_dump_reg_list[i]);
>>> +               else
>>> +                       reg_value[i] = 
>>> RREG32(adev->reset_dump_reg_list[i]);
>>> +       }
>>> +
>>> + trace_amdgpu_reset_reg_dumps(adev->reset_dump_reg_list, reg_value, 
>>> i);
>>> +
>>> +       return 0;
>>> +}
>>> +
>>>   int amdgpu_do_asic_reset(struct list_head *device_list_handle,
>>>                           struct amdgpu_reset_context *reset_context)
>>>   {
>>> @@ -4567,8 +4584,10 @@ int amdgpu_do_asic_reset(struct list_head 
>>> *device_list_handle,
>>> tmp_adev->gmc.xgmi.pending_reset = false;
>>>                                  if (!queue_work(system_unbound_wq, 
>>> &tmp_adev->xgmi_reset_work))
>>>                                          r = -EALREADY;
>>> -                       } else
>>> +                       } else {
>>> + amdgpu_reset_reg_dumps(tmp_adev);
>>>                                  r = amdgpu_asic_reset(tmp_adev);
>>> +                       }
>>>
>>>                          if (r) {
>>>                                  dev_err(tmp_adev->dev, "ASIC reset 
>>> failed with error, %d for drm dev, %s",
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> index d855cb53c7e0..3fe33de3564a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> @@ -537,6 +537,25 @@ TRACE_EVENT(amdgpu_ib_pipe_sync,
>>>                        __entry->seqno)
>>>   );
>>>
>>> +TRACE_EVENT(amdgpu_reset_reg_dumps,
>>> +           TP_PROTO(long *address, uint32_t *value, int length),
>>> +           TP_ARGS(address, value, length),
>>> +           TP_STRUCT__entry(
>>> +                            __array(long, address, 128)
>>> +                            __array(uint32_t, value, 128)
>>> +                            __field(int, len)
>>> +                            ),
>>> +           TP_fast_assign(
>>> +                          memcpy(__entry->address, address, 128);
>>> +                          memcpy(__entry->value,  value, 128);
>>> +                          __entry->len = length;
>>> +                          ),
>>> +           TP_printk("amdgpu register dump offset: %s value: %s ",
>>> +                     __print_array(__entry->address, __entry->len, 8),
>>> +                     __print_array(__entry->value, __entry->len, 8)
>>> +                    )
>>> +);
>>> +
>>>   #undef AMDGPU_JOB_GET_TIMELINE_NAME
>>>   #endif
>>>
>>> -- 
>>> 2.25.1
>>>
>
