From: "Sharma, Shashank" <shashank.sharma@amd.com>
To: "Somalapuram, Amaranath" <asomalap@amd.com>,
	Shashank Sharma <contactshashanksharma@gmail.com>,
	amd-gfx@lists.freedesktop.org
Cc: Alexander Deucher <alexander.deucher@amd.com>,
	amaranath.somalapuram@amd.com,
	Christian Koenig <christian.koenig@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
Date: Mon, 7 Mar 2022 18:05:52 +0100
Message-ID: <7eed2410-235b-a016-2ca6-20b8ccb1637c@amd.com>
In-Reply-To: <b5f0fbd2-f599-8f4b-dd12-8f18734d52d8@amd.com>

Hey Amar,

On 3/7/2022 5:46 PM, Somalapuram, Amaranath wrote:
> 
> On 3/7/2022 9:56 PM, Shashank Sharma wrote:
>> From: Shashank Sharma <shashank.sharma@amd.com>
>>
>> This patch adds a work function, which will get scheduled
>> in the event of a GPU reset, and will send a uevent to userspace with
>> some reset context information, like a PID and some flags.
>>
>> The userspace can do some recovery and post-processing work
>> based on this event.
>>
>> V2:
>> - Changed the name of the work to gpu_reset_event_work
>>    (Christian)
>> - Added a structure to accommodate some additional information
>>    (like a PID and some flags)
>>
>> Cc: Alexander Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  7 +++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
>>   2 files changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index d8b854fcbffa..7df219fe363f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -813,6 +813,11 @@ struct amd_powerplay {
>>   #define AMDGPU_RESET_MAGIC_NUM 64
>>   #define AMDGPU_MAX_DF_PERFMONS 4
>>   #define AMDGPU_PRODUCT_NAME_LEN 64
>> +struct amdgpu_reset_event_ctx {
>> +    uint64_t pid;
>> +    uint32_t flags;
>> +};
>> +
>>   struct amdgpu_device {
>>       struct device            *dev;
>>       struct pci_dev            *pdev;
>> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>>       int asic_reset_res;
>>       struct work_struct        xgmi_reset_work;
>> +    struct work_struct        gpu_reset_event_work;
>>       struct list_head        reset_list;
>>       long                gfx_timeout;
>> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
>>       pci_channel_state_t        pci_channel_state;
>>       struct amdgpu_reset_control     *reset_cntl;
>> +    struct amdgpu_reset_event_ctx   reset_event_ctx;
>>       uint32_t                        ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>>       bool                ram_is_direct_mapped;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ed077de426d9..c43d099da06d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -73,6 +73,7 @@
>>   #include <linux/pm_runtime.h>
>>   #include <drm/drm_drv.h>
>> +#include <drm/drm_sysfs.h>
>>   MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>>   MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
>>       return amdgpu_device_asic_has_dc_support(adev->asic_type);
>>   }
>> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
>> +{
>> +    struct amdgpu_device *adev = container_of(__work, struct amdgpu_device,
>> +                          gpu_reset_event_work);
> 
> I am trying the same thing, but the adev context is lost.
> 
> I call schedule_work() in amdgpu_do_asic_reset() after reading
> vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
> 

I am not sure I understand your point correctly, but this patch
introduces a struct amdgpu_reset_event_ctx, which is already a part of
adev and has space to store both the PID and the flags.

So all you have to do is:
- GPU reset happens
- Save the PID in adev->reset_event_ctx.pid
- If the VRAM is valid, record that in the flags:
  adev->reset_event_ctx.flags |= BIT(0)

and then schedule this work function. If the data is saved properly, it
will reach the work function and the event will be sent.
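
The steps above can be sketched in userspace C. This is only a rough
illustration, not kernel code: every mock_* name is hypothetical, the
"work" runs synchronously instead of being deferred the way the kernel's
schedule_work() does it, and the uevent is replaced by fields that record
what would have been sent. It only shows that, because the work item is
embedded in the device structure, the handler can get back to adev and
its reset_event_ctx with container_of().

```c
/* Userspace mock of the flow; NOT kernel code. All mock_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's container_of(): recover the enclosing struct. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MOCK_VRAM_VALID (1u << 0)	/* stand-in for the "BIT0" flag */

struct mock_work {
	void (*func)(struct mock_work *work);
};

struct mock_reset_event_ctx {		/* mirrors amdgpu_reset_event_ctx */
	uint64_t pid;
	uint32_t flags;
};

struct mock_device {			/* stand-in for amdgpu_device */
	struct mock_work gpu_reset_event_work;
	struct mock_reset_event_ctx reset_event_ctx;
	/* What the "uevent" reported, recorded here for inspection. */
	uint64_t sent_pid;
	uint32_t sent_flags;
	int events_sent;
};

/* Work handler: gets only the work pointer, recovers the device from it. */
static void mock_reset_event_func(struct mock_work *work)
{
	struct mock_device *adev =
		mock_container_of(work, struct mock_device, gpu_reset_event_work);
	struct mock_reset_event_ctx *ctx = &adev->reset_event_ctx;

	/* The patch calls drm_sysfs_reset_event() here; the mock just records. */
	adev->sent_pid = ctx->pid;
	adev->sent_flags = ctx->flags;
	adev->events_sent++;
}

/* Counterpart of the INIT_WORK() call in amdgpu_device_init(). */
static void mock_init_device(struct mock_device *adev)
{
	*adev = (struct mock_device){0};
	adev->gpu_reset_event_work.func = mock_reset_event_func;
}

/* Runs the handler immediately; the kernel's schedule_work() defers it. */
static void mock_schedule_work(struct mock_work *work)
{
	assert(work->func != NULL);
	work->func(work);
}

/* Reset path: save PID and VRAM status in the context, then schedule. */
static void mock_handle_reset(struct mock_device *adev, uint64_t pid,
			      int vram_valid)
{
	adev->reset_event_ctx.pid = pid;
	adev->reset_event_ctx.flags = 0;
	if (vram_valid)
		adev->reset_event_ctx.flags |= MOCK_VRAM_VALID;
	mock_schedule_work(&adev->gpu_reset_event_work);
}
```

Calling mock_init_device() once and then mock_handle_reset(&dev, pid, 1)
should leave dev.sent_pid and dev.sent_flags holding the values saved
before the work was scheduled.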

- Shashank

> Regards,
> 
> S.Amarnath
> 
>> +    struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
>> +
>> +    /*
>> +     * A GPU reset has happened; notify userspace and pass the
>> +     * following information:
>> +     *    - pid of the process involved,
>> +     *    - whether the VRAM is valid or not,
>> +     *    - indicate that userspace may want to collect the ftrace event
>> +     * data from the trace event.
>> +     */
>> +    drm_sysfs_reset_event(&adev->ddev, event_ctx->pid, event_ctx->flags);
>> +}
>> +
>>   static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
>>   {
>>       struct amdgpu_device *adev =
>> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>                 amdgpu_device_delay_enable_gfx_off);
>>       INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
>> +    INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func);
>>       adev->gfx.gfx_off_req_count = 1;
>>       adev->pm.ac_power = power_supply_is_system_supplied() > 0;
