From: "Christian König" <christian.koenig@amd.com>
To: Shashank Sharma <contactshashanksharma@gmail.com>,
	amd-gfx@lists.freedesktop.org
Cc: Alexander Deucher <alexander.deucher@amd.com>,
	amaranath.somalapuram@amd.com, shashank.sharma@amd.com
Subject: Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
Date: Tue, 8 Mar 2022 08:15:37 +0100
Message-ID: <8d35cf70-0dc3-6fd5-6768-9530d729ae63@amd.com>
In-Reply-To: <20220307162631.2496286-2-contactshashanksharma@gmail.com>

On 07.03.22 at 17:26, Shashank Sharma wrote:
> From: Shashank Sharma <shashank.sharma@amd.com>
>
> This patch adds a work function, which will get scheduled
> in event of a GPU reset, and will send a uevent to user with
> some reset context information, like a PID and some flags.
>
> The userspace can do some recovery and post-processing work
> based on this event.
>
> V2:
> - Changed the name of the work to gpu_reset_event_work
>    (Christian)
> - Added a structure to accommodate some additional information
>    (like a PID and some flags)
>
> Cc: Alexander Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  7 +++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
>   2 files changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index d8b854fcbffa..7df219fe363f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -813,6 +813,11 @@ struct amd_powerplay {
>   #define AMDGPU_RESET_MAGIC_NUM 64
>   #define AMDGPU_MAX_DF_PERFMONS 4
>   #define AMDGPU_PRODUCT_NAME_LEN 64
> +struct amdgpu_reset_event_ctx {
> +	uint64_t pid;
> +	uint32_t flags;
> +};
> +

Please don't put any new structures into amdgpu.h. If I'm not completely
mistaken, Andrey has created a new header for all the reset-related stuff.

I would also reconsider the name, at least drop the _ctx suffix.
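
As a rough illustration of the two points above, the structure could look
something like the sketch below. Both the name amdgpu_reset_event and the
idea that it sits in a reset-specific header are assumptions made for
illustration only; neither the patch nor this review settles the final name
or location, and the fields are simply the ones from the hunk above.

```c
#include <stdint.h>

/*
 * Hypothetical placement: a reset-specific header instead of amdgpu.h.
 * Hypothetical name: the _ctx suffix is dropped, as suggested above.
 * The fields are copied unchanged from the patch.
 */
struct amdgpu_reset_event {
	uint64_t pid;	/* PID of the process involved in the reset */
	uint32_t flags;	/* reset flags, as described in the commit message */
};
```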

Regards,
Christian.

>   struct amdgpu_device {
>   	struct device			*dev;
>   	struct pci_dev			*pdev;
> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>   
>   	int asic_reset_res;
>   	struct work_struct		xgmi_reset_work;
> +	struct work_struct		gpu_reset_event_work;
>   	struct list_head		reset_list;
>   
>   	long				gfx_timeout;
> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
>   	pci_channel_state_t		pci_channel_state;
>   
>   	struct amdgpu_reset_control     *reset_cntl;
> +	struct amdgpu_reset_event_ctx   reset_event_ctx;
>   	uint32_t                        ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>   
>   	bool				ram_is_direct_mapped;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index ed077de426d9..c43d099da06d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -73,6 +73,7 @@
>   #include <linux/pm_runtime.h>
>   
>   #include <drm/drm_drv.h>
> +#include <drm/drm_sysfs.h>
>   
>   MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>   MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
>   	return amdgpu_device_asic_has_dc_support(adev->asic_type);
>   }
>   
> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
> +{
> +	struct amdgpu_device *adev = container_of(__work, struct amdgpu_device,
> +						  gpu_reset_event_work);
> +	struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
> +
> +	/*
> +	 * A GPU reset has happened, indicate the userspace and pass the
> +	 * following information:
> +	 *	- pid of the process involved,
> +	 *	- if the VRAM is valid or not,
> +	 *	- indicate that userspace may want to collect the ftrace event
> +	 * data from the trace event.
> +	 */
> +	drm_sysfs_reset_event(&adev->ddev, event_ctx->pid, event_ctx->flags);
> +}
> +
>   static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
>   {
>   	struct amdgpu_device *adev =
> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   			  amdgpu_device_delay_enable_gfx_off);
>   
>   	INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
> +	INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func);
>   
>   	adev->gfx.gfx_off_req_count = 1;
>   	adev->pm.ac_power = power_supply_is_system_supplied() > 0;
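
For readers less familiar with the work-item pattern used in the hunk above,
here is a minimal userspace sketch of how the handler gets from the embedded
work item back to its device via container_of(). Everything in it is a
stand-in: the mock_* names, the simplified container_of() macro, and the
synchronous mock_run_work() are assumptions for illustration, not the kernel
workqueue API and not code from the patch. In the real driver the handler
calls drm_sysfs_reset_event() instead of recording the values.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock work item: only carries the handler to call. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Same fields as the structure added by the patch (name is hypothetical). */
struct reset_event {
	uint64_t pid;
	uint32_t flags;
};

/* Mock device embedding the work item, like amdgpu_device does. */
struct mock_device {
	int id;
	struct work_struct gpu_reset_event_work;
	struct reset_event reset_event;
	uint64_t last_reported_pid;	/* what the handler "sent" */
	uint32_t last_reported_flags;
};

/* Handler: recover the device from the embedded work item. */
static void mock_reset_event_func(struct work_struct *work)
{
	struct mock_device *dev =
		container_of(work, struct mock_device, gpu_reset_event_work);

	/* The real handler would emit a uevent here; the mock just records. */
	dev->last_reported_pid = dev->reset_event.pid;
	dev->last_reported_flags = dev->reset_event.flags;
}

/* Stand-in for INIT_WORK(): bind a handler to the work item. */
static void mock_init_work(struct work_struct *work,
			   void (*func)(struct work_struct *))
{
	work->func = func;
}

/* Stand-in for scheduling the work: runs the handler synchronously. */
static void mock_run_work(struct work_struct *work)
{
	work->func(work);
}
```

The point of the pattern is that the handler receives only a pointer to the
work item, so the reset details (PID and flags) have to be stored in the
enclosing device structure before the work is queued.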

