* [PATCH 1/2] drm: Add GPU reset sysfs event
@ 2022-03-07 16:26 Shashank Sharma
2022-03-07 16:26 ` [PATCH 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
` (3 more replies)
0 siblings, 4 replies; 32+ messages in thread
From: Shashank Sharma @ 2022-03-07 16:26 UTC (permalink / raw)
To: amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram, Christian Koenig,
shashank.sharma
From: Shashank Sharma <shashank.sharma@amd.com>
This patch adds a new sysfs event, which will indicate
the userland about a GPU reset, and can also provide
some information like:
- which PID was involved in the GPU reset
- what was the GPU status (using flags)
This patch also introduces the first flag of the flags
bitmap, which can be appended as and when required.
Cc: Alexandar Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
include/drm/drm_sysfs.h | 3 +++
2 files changed, 27 insertions(+)
diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index 430e00b16eec..52a015161431 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
}
EXPORT_SYMBOL(drm_sysfs_hotplug_event);
+/**
+ * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
+ * @dev: DRM device
+ * @pid: The process ID involve with the reset
+ * @flags: Any other information about the GPU status
+ *
+ * Send a uevent for the DRM device specified by @dev. This indicates
+ * user that a GPU reset has occurred, so that the interested client
+ * can take any recovery or profiling measure, when required.
+ */
+void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t flags)
+{
+ unsigned char pid_str[21], flags_str[15];
+ unsigned char reset_str[] = "RESET=1";
+ char *envp[] = { reset_str, pid_str, flags_str, NULL };
+
+ DRM_DEBUG("generating reset event\n");
+
+ snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
+ snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
+ kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+EXPORT_SYMBOL(drm_sysfs_reset_event);
+
/**
* drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
* change
diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
index 6273cac44e47..63f00fe8054c 100644
--- a/include/drm/drm_sysfs.h
+++ b/include/drm/drm_sysfs.h
@@ -2,6 +2,8 @@
#ifndef _DRM_SYSFS_H_
#define _DRM_SYSFS_H_
+#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
+
struct drm_device;
struct device;
struct drm_connector;
@@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
void drm_class_device_unregister(struct device *dev);
void drm_sysfs_hotplug_event(struct drm_device *dev);
+void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t reset_flags);
void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
void drm_sysfs_connector_status_event(struct drm_connector *connector,
struct drm_property *property);
--
2.32.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-07 16:26 [PATCH 1/2] drm: Add GPU reset sysfs event Shashank Sharma
@ 2022-03-07 16:26 ` Shashank Sharma
2022-03-07 16:46 ` Somalapuram, Amaranath
` (2 more replies)
2022-03-08 6:32 ` [PATCH 1/2] drm: Add GPU reset sysfs event Somalapuram, Amaranath
` (2 subsequent siblings)
3 siblings, 3 replies; 32+ messages in thread
From: Shashank Sharma @ 2022-03-07 16:26 UTC (permalink / raw)
To: amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram, Christian Koenig,
shashank.sharma
From: Shashank Sharma <shashank.sharma@amd.com>
This patch adds a work function, which will get scheduled
in event of a GPU reset, and will send a uevent to user with
some reset context infomration, like a PID and some flags.
The userspace can do some recovery and post-processing work
based on this event.
V2:
- Changed the name of the work to gpu_reset_event_work
(Christian)
- Added a structure to accommodate some additional information
(like a PID and some flags)
Cc: Alexander Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d8b854fcbffa..7df219fe363f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -813,6 +813,11 @@ struct amd_powerplay {
#define AMDGPU_RESET_MAGIC_NUM 64
#define AMDGPU_MAX_DF_PERFMONS 4
#define AMDGPU_PRODUCT_NAME_LEN 64
+struct amdgpu_reset_event_ctx {
+ uint64_t pid;
+ uint32_t flags;
+};
+
struct amdgpu_device {
struct device *dev;
struct pci_dev *pdev;
@@ -1063,6 +1068,7 @@ struct amdgpu_device {
int asic_reset_res;
struct work_struct xgmi_reset_work;
+ struct work_struct gpu_reset_event_work;
struct list_head reset_list;
long gfx_timeout;
@@ -1097,6 +1103,7 @@ struct amdgpu_device {
pci_channel_state_t pci_channel_state;
struct amdgpu_reset_control *reset_cntl;
+ struct amdgpu_reset_event_ctx reset_event_ctx;
uint32_t ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
bool ram_is_direct_mapped;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ed077de426d9..c43d099da06d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -73,6 +73,7 @@
#include <linux/pm_runtime.h>
#include <drm/drm_drv.h>
+#include <drm/drm_sysfs.h>
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
@@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
return amdgpu_device_asic_has_dc_support(adev->asic_type);
}
+static void amdgpu_device_reset_event_func(struct work_struct *__work)
+{
+ struct amdgpu_device *adev = container_of(__work, struct amdgpu_device,
+ gpu_reset_event_work);
+ struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
+
+ /*
+ * A GPU reset has happened, indicate the userspace and pass the
+ * following information:
+ * - pid of the process involved,
+ * - if the VRAM is valid or not,
+ * - indicate that userspace may want to collect the ftrace event
+ * data from the trace event.
+ */
+ drm_sysfs_reset_event(&adev->ddev, event_ctx->pid, event_ctx->flags);
+}
+
static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
{
struct amdgpu_device *adev =
@@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
amdgpu_device_delay_enable_gfx_off);
INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+ INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func);
adev->gfx.gfx_off_req_count = 1;
adev->pm.ac_power = power_supply_is_system_supplied() > 0;
--
2.32.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-07 16:26 ` [PATCH 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
@ 2022-03-07 16:46 ` Somalapuram, Amaranath
2022-03-07 17:05 ` Sharma, Shashank
2022-03-08 7:15 ` Christian König
2022-03-08 16:26 ` Andrey Grodzovsky
2 siblings, 1 reply; 32+ messages in thread
From: Somalapuram, Amaranath @ 2022-03-07 16:46 UTC (permalink / raw)
To: Shashank Sharma, amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram, Christian Koenig,
shashank.sharma
On 3/7/2022 9:56 PM, Shashank Sharma wrote:
> From: Shashank Sharma <shashank.sharma@amd.com>
>
> This patch adds a work function, which will get scheduled
> in event of a GPU reset, and will send a uevent to user with
> some reset context infomration, like a PID and some flags.
>
> The userspace can do some recovery and post-processing work
> based on this event.
>
> V2:
> - Changed the name of the work to gpu_reset_event_work
> (Christian)
> - Added a structure to accommodate some additional information
> (like a PID and some flags)
>
> Cc: Alexander Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index d8b854fcbffa..7df219fe363f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -813,6 +813,11 @@ struct amd_powerplay {
> #define AMDGPU_RESET_MAGIC_NUM 64
> #define AMDGPU_MAX_DF_PERFMONS 4
> #define AMDGPU_PRODUCT_NAME_LEN 64
> +struct amdgpu_reset_event_ctx {
> + uint64_t pid;
> + uint32_t flags;
> +};
> +
> struct amdgpu_device {
> struct device *dev;
> struct pci_dev *pdev;
> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>
> int asic_reset_res;
> struct work_struct xgmi_reset_work;
> + struct work_struct gpu_reset_event_work;
> struct list_head reset_list;
>
> long gfx_timeout;
> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
> pci_channel_state_t pci_channel_state;
>
> struct amdgpu_reset_control *reset_cntl;
> + struct amdgpu_reset_event_ctx reset_event_ctx;
> uint32_t ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>
> bool ram_is_direct_mapped;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index ed077de426d9..c43d099da06d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -73,6 +73,7 @@
> #include <linux/pm_runtime.h>
>
> #include <drm/drm_drv.h>
> +#include <drm/drm_sysfs.h>
>
> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
> return amdgpu_device_asic_has_dc_support(adev->asic_type);
> }
>
> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
> +{
> + struct amdgpu_device *adev = container_of(__work, struct amdgpu_device,
> + gpu_reset_event_work);
I am trying same thing but adev context is lost.
schedule_work() in amdgpu_do_asic_reset after getting/reading vram_lost
= amdgpu_device_check_vram_lost(tmp_adev);
Regards,
S.Amarnath
> + struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
> +
> + /*
> + * A GPU reset has happened, indicate the userspace and pass the
> + * following information:
> + * - pid of the process involved,
> + * - if the VRAM is valid or not,
> + * - indicate that userspace may want to collect the ftrace event
> + * data from the trace event.
> + */
> + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid, event_ctx->flags);
> +}
> +
> static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
> {
> struct amdgpu_device *adev =
> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> amdgpu_device_delay_enable_gfx_off);
>
> INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
> + INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func);
>
> adev->gfx.gfx_off_req_count = 1;
> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-07 16:46 ` Somalapuram, Amaranath
@ 2022-03-07 17:05 ` Sharma, Shashank
0 siblings, 0 replies; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-07 17:05 UTC (permalink / raw)
To: Somalapuram, Amaranath, Shashank Sharma, amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram, Christian Koenig
Hey Amar,
On 3/7/2022 5:46 PM, Somalapuram, Amaranath wrote:
>
> On 3/7/2022 9:56 PM, Shashank Sharma wrote:
>> From: Shashank Sharma <shashank.sharma@amd.com>
>>
>> This patch adds a work function, which will get scheduled
>> in event of a GPU reset, and will send a uevent to user with
>> some reset context infomration, like a PID and some flags.
>>
>> The userspace can do some recovery and post-processing work
>> based on this event.
>>
>> V2:
>> - Changed the name of the work to gpu_reset_event_work
>> (Christian)
>> - Added a structure to accommodate some additional information
>> (like a PID and some flags)
>>
>> Cc: Alexander Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
>> 2 files changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index d8b854fcbffa..7df219fe363f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -813,6 +813,11 @@ struct amd_powerplay {
>> #define AMDGPU_RESET_MAGIC_NUM 64
>> #define AMDGPU_MAX_DF_PERFMONS 4
>> #define AMDGPU_PRODUCT_NAME_LEN 64
>> +struct amdgpu_reset_event_ctx {
>> + uint64_t pid;
>> + uint32_t flags;
>> +};
>> +
>> struct amdgpu_device {
>> struct device *dev;
>> struct pci_dev *pdev;
>> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>> int asic_reset_res;
>> struct work_struct xgmi_reset_work;
>> + struct work_struct gpu_reset_event_work;
>> struct list_head reset_list;
>> long gfx_timeout;
>> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
>> pci_channel_state_t pci_channel_state;
>> struct amdgpu_reset_control *reset_cntl;
>> + struct amdgpu_reset_event_ctx reset_event_ctx;
>> uint32_t
>> ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>> bool ram_is_direct_mapped;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ed077de426d9..c43d099da06d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -73,6 +73,7 @@
>> #include <linux/pm_runtime.h>
>> #include <drm/drm_drv.h>
>> +#include <drm/drm_sysfs.h>
>> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct
>> amdgpu_device *adev)
>> return amdgpu_device_asic_has_dc_support(adev->asic_type);
>> }
>> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
>> +{
>> + struct amdgpu_device *adev = container_of(__work, struct
>> amdgpu_device,
>> + gpu_reset_event_work);
>
> I am trying same thing but adev context is lost.
>
> schedule_work() in amdgpu_do_asic_reset after getting/reading vram_lost
> = amdgpu_device_check_vram_lost(tmp_adev);
>
I am not sure if I understand your point correctly, but in this patch I
have introduced a struct amdgpu_reset_event_ctx, which already is a part
of adev, and has space to store both PID as well as flags.
So all you have to do is:
- GPU reset happens
- Save the pid in adev.reset_event_ctx.pid
- If VRAM valid, Save the vram status in adev.reset_event_ctx->flags |=
(BIT0)
and schedule this work function. If the data is saved properly, it will
reach the work function and the event will be sent.
- Shashank
> Regards,
>
> S.Amarnath
>
>> + struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
>> +
>> + /*
>> + * A GPU reset has happened, indicate the userspace and pass the
>> + * following information:
>> + * - pid of the process involved,
>> + * - if the VRAM is valid or not,
>> + * - indicate that userspace may want to collect the ftrace event
>> + * data from the trace event.
>> + */
>> + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid,
>> event_ctx->flags);
>> +}
>> +
>> static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
>> {
>> struct amdgpu_device *adev =
>> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>> amdgpu_device_delay_enable_gfx_off);
>> INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
>> + INIT_WORK(&adev->gpu_reset_event_work,
>> amdgpu_device_reset_event_func);
>> adev->gfx.gfx_off_req_count = 1;
>> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-07 16:26 [PATCH 1/2] drm: Add GPU reset sysfs event Shashank Sharma
2022-03-07 16:26 ` [PATCH 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
@ 2022-03-08 6:32 ` Somalapuram, Amaranath
2022-03-08 7:06 ` Christian König
2022-03-08 16:25 ` Andrey Grodzovsky
3 siblings, 0 replies; 32+ messages in thread
From: Somalapuram, Amaranath @ 2022-03-08 6:32 UTC (permalink / raw)
To: Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram, Christian Koenig,
shashank.sharma
[-- Attachment #1: Type: text/plain, Size: 3039 bytes --]
On 3/7/2022 9:56 PM, Shashank Sharma wrote:
> From: Shashank Sharma<shashank.sharma@amd.com>
>
> This patch adds a new sysfs event, which will indicate
> the userland about a GPU reset, and can also provide
> some information like:
> - which PID was involved in the GPU reset
> - what was the GPU status (using flags)
>
> This patch also introduces the first flag of the flags
> bitmap, which can be appended as and when required.
>
> Cc: Alexandar Deucher<alexander.deucher@amd.com>
> Cc: Christian Koenig<christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma<shashank.sharma@amd.com>
> ---
> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
> include/drm/drm_sysfs.h | 3 +++
> 2 files changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index 430e00b16eec..52a015161431 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
> }
> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
> + * @dev: DRM device
> + * @pid: The process ID involve with the reset
> + * @flags: Any other information about the GPU status
> + *
> + * Send a uevent for the DRM device specified by @dev. This indicates
> + * user that a GPU reset has occurred, so that the interested client
> + * can take any recovery or profiling measure, when required.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t flags)
we need process name
char process_name[TASK_COMM_LEN];
> +{
> + unsigned char pid_str[21], flags_str[15];
> + unsigned char reset_str[] = "RESET=1";
> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
> +
> + DRM_DEBUG("generating reset event\n");
> +
> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_sysfs_reset_event);
> +
> /**
> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
> * change
> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> index 6273cac44e47..63f00fe8054c 100644
> --- a/include/drm/drm_sysfs.h
> +++ b/include/drm/drm_sysfs.h
> @@ -2,6 +2,8 @@
> #ifndef _DRM_SYSFS_H_
> #define _DRM_SYSFS_H_
>
> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
> +
> struct drm_device;
> struct device;
> struct drm_connector;
> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
> void drm_class_device_unregister(struct device *dev);
>
> void drm_sysfs_hotplug_event(struct drm_device *dev);
> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t reset_flags);
> void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
> void drm_sysfs_connector_status_event(struct drm_connector *connector,
> struct drm_property *property);
[-- Attachment #2: Type: text/html, Size: 4138 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-07 16:26 [PATCH 1/2] drm: Add GPU reset sysfs event Shashank Sharma
2022-03-07 16:26 ` [PATCH 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
2022-03-08 6:32 ` [PATCH 1/2] drm: Add GPU reset sysfs event Somalapuram, Amaranath
@ 2022-03-08 7:06 ` Christian König
2022-03-08 9:31 ` Sharma, Shashank
2022-03-08 16:25 ` Andrey Grodzovsky
3 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-03-08 7:06 UTC (permalink / raw)
To: Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram, Christian Koenig,
shashank.sharma
Am 07.03.22 um 17:26 schrieb Shashank Sharma:
> From: Shashank Sharma <shashank.sharma@amd.com>
>
> This patch adds a new sysfs event, which will indicate
> the userland about a GPU reset, and can also provide
> some information like:
> - which PID was involved in the GPU reset
> - what was the GPU status (using flags)
>
> This patch also introduces the first flag of the flags
> bitmap, which can be appended as and when required.
Make sure to CC the dri-devel mailing list when reviewing this.
>
> Cc: Alexandar Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
> include/drm/drm_sysfs.h | 3 +++
> 2 files changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index 430e00b16eec..52a015161431 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
> }
> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
> + * @dev: DRM device
> + * @pid: The process ID involve with the reset
> + * @flags: Any other information about the GPU status
> + *
> + * Send a uevent for the DRM device specified by @dev. This indicates
> + * user that a GPU reset has occurred, so that the interested client
> + * can take any recovery or profiling measure, when required.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t flags)
The PID is usually only 32bit, but even better would be to use pid_t.
> +{
> + unsigned char pid_str[21], flags_str[15];
> + unsigned char reset_str[] = "RESET=1";
> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
> +
> + DRM_DEBUG("generating reset event\n");
> +
> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_sysfs_reset_event);
> +
> /**
> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
> * change
> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> index 6273cac44e47..63f00fe8054c 100644
> --- a/include/drm/drm_sysfs.h
> +++ b/include/drm/drm_sysfs.h
> @@ -2,6 +2,8 @@
> #ifndef _DRM_SYSFS_H_
> #define _DRM_SYSFS_H_
>
> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
Probably better to define that the other way around, e.g.
DRM_GPU_RESET_FLAG_VRAM_LOST.
Apart from that looks good to me.
Christian.
> +
> struct drm_device;
> struct device;
> struct drm_connector;
> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
> void drm_class_device_unregister(struct device *dev);
>
> void drm_sysfs_hotplug_event(struct drm_device *dev);
> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t reset_flags);
> void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
> void drm_sysfs_connector_status_event(struct drm_connector *connector,
> struct drm_property *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-07 16:26 ` [PATCH 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
2022-03-07 16:46 ` Somalapuram, Amaranath
@ 2022-03-08 7:15 ` Christian König
2022-03-08 9:35 ` Sharma, Shashank
2022-03-08 16:26 ` Andrey Grodzovsky
2 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-03-08 7:15 UTC (permalink / raw)
To: Shashank Sharma, amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram, shashank.sharma
Am 07.03.22 um 17:26 schrieb Shashank Sharma:
> From: Shashank Sharma <shashank.sharma@amd.com>
>
> This patch adds a work function, which will get scheduled
> in event of a GPU reset, and will send a uevent to user with
> some reset context infomration, like a PID and some flags.
>
> The userspace can do some recovery and post-processing work
> based on this event.
>
> V2:
> - Changed the name of the work to gpu_reset_event_work
> (Christian)
> - Added a structure to accommodate some additional information
> (like a PID and some flags)
>
> Cc: Alexander Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index d8b854fcbffa..7df219fe363f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -813,6 +813,11 @@ struct amd_powerplay {
> #define AMDGPU_RESET_MAGIC_NUM 64
> #define AMDGPU_MAX_DF_PERFMONS 4
> #define AMDGPU_PRODUCT_NAME_LEN 64
> +struct amdgpu_reset_event_ctx {
> + uint64_t pid;
> + uint32_t flags;
> +};
> +
Please don't put any new structures into amdgpu.h. If I'm not completely
mistaken Andrey has created a new header for all the reset related stuff.
I would also reconsider the name, at least drop the _ctx suffix.
Regards,
Christian.
> struct amdgpu_device {
> struct device *dev;
> struct pci_dev *pdev;
> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>
> int asic_reset_res;
> struct work_struct xgmi_reset_work;
> + struct work_struct gpu_reset_event_work;
> struct list_head reset_list;
>
> long gfx_timeout;
> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
> pci_channel_state_t pci_channel_state;
>
> struct amdgpu_reset_control *reset_cntl;
> + struct amdgpu_reset_event_ctx reset_event_ctx;
> uint32_t ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>
> bool ram_is_direct_mapped;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index ed077de426d9..c43d099da06d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -73,6 +73,7 @@
> #include <linux/pm_runtime.h>
>
> #include <drm/drm_drv.h>
> +#include <drm/drm_sysfs.h>
>
> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
> return amdgpu_device_asic_has_dc_support(adev->asic_type);
> }
>
> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
> +{
> + struct amdgpu_device *adev = container_of(__work, struct amdgpu_device,
> + gpu_reset_event_work);
> + struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
> +
> + /*
> + * A GPU reset has happened, indicate the userspace and pass the
> + * following information:
> + * - pid of the process involved,
> + * - if the VRAM is valid or not,
> + * - indicate that userspace may want to collect the ftrace event
> + * data from the trace event.
> + */
> + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid, event_ctx->flags);
> +}
> +
> static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
> {
> struct amdgpu_device *adev =
> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> amdgpu_device_delay_enable_gfx_off);
>
> INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
> + INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func);
>
> adev->gfx.gfx_off_req_count = 1;
> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 7:06 ` Christian König
@ 2022-03-08 9:31 ` Sharma, Shashank
2022-03-08 10:32 ` Christian König
0 siblings, 1 reply; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 9:31 UTC (permalink / raw)
To: Christian König, Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram, Christian Koenig
On 3/8/2022 8:06 AM, Christian König wrote:
> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>> From: Shashank Sharma <shashank.sharma@amd.com>
>>
>> This patch adds a new sysfs event, which will indicate
>> the userland about a GPU reset, and can also provide
>> some information like:
>> - which PID was involved in the GPU reset
>> - what was the GPU status (using flags)
>>
>> This patch also introduces the first flag of the flags
>> bitmap, which can be appended as and when required.
>
> Make sure to CC the dri-devel mailing list when reviewing this.
Got it,
I was also curious if we want to move the reset_ctx structure itself to
DRM layer, like
drm_reset_event_ctx {
u32 pid;
u32 flags;
char process_name[64];
};
and then:
void drm_sysfs_reset_event(struct drm_device *dev, drm_reset_event_ctx
*ctx);
>
>>
>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>> include/drm/drm_sysfs.h | 3 +++
>> 2 files changed, 27 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>> index 430e00b16eec..52a015161431 100644
>> --- a/drivers/gpu/drm/drm_sysfs.c
>> +++ b/drivers/gpu/drm/drm_sysfs.c
>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
>> }
>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>> +/**
>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
>> + * @dev: DRM device
>> + * @pid: The process ID involve with the reset
>> + * @flags: Any other information about the GPU status
>> + *
>> + * Send a uevent for the DRM device specified by @dev. This indicates
>> + * user that a GPU reset has occurred, so that the interested client
>> + * can take any recovery or profiling measure, when required.
>> + */
>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>> uint32_t flags)
>
> The PID is usually only 32bit, but even better would be to use pid_t.
>
>> +{
>> + unsigned char pid_str[21], flags_str[15];
>> + unsigned char reset_str[] = "RESET=1";
>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>> +
>> + DRM_DEBUG("generating reset event\n");
>> +
>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>> +}
>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>> +
>> /**
>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any
>> connector
>> * change
>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>> index 6273cac44e47..63f00fe8054c 100644
>> --- a/include/drm/drm_sysfs.h
>> +++ b/include/drm/drm_sysfs.h
>> @@ -2,6 +2,8 @@
>> #ifndef _DRM_SYSFS_H_
>> #define _DRM_SYSFS_H_
>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>
> Probably better to define that the other way around, e.g.
> DRM_GPU_RESET_FLAG_VRAM_LOST.
>
> Apart from that looks good to me.
>
Got it, noted.
- Shashank
> Christian.
>
>> +
>> struct drm_device;
>> struct device;
>> struct drm_connector;
>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>> void drm_class_device_unregister(struct device *dev);
>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>> uint32_t reset_flags);
>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>> *connector);
>> void drm_sysfs_connector_status_event(struct drm_connector *connector,
>> struct drm_property *property);
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-08 7:15 ` Christian König
@ 2022-03-08 9:35 ` Sharma, Shashank
0 siblings, 0 replies; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 9:35 UTC (permalink / raw)
To: Christian König, Shashank Sharma, amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram
On 3/8/2022 8:15 AM, Christian König wrote:
> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>> From: Shashank Sharma <shashank.sharma@amd.com>
>>
>> This patch adds a work function, which will get scheduled
>> in event of a GPU reset, and will send a uevent to user with
>> some reset context infomration, like a PID and some flags.
>>
>> The userspace can do some recovery and post-processing work
>> based on this event.
>>
>> V2:
>> - Changed the name of the work to gpu_reset_event_work
>> (Christian)
>> - Added a structure to accommodate some additional information
>> (like a PID and some flags)
>>
>> Cc: Alexander Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
>> 2 files changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index d8b854fcbffa..7df219fe363f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -813,6 +813,11 @@ struct amd_powerplay {
>> #define AMDGPU_RESET_MAGIC_NUM 64
>> #define AMDGPU_MAX_DF_PERFMONS 4
>> #define AMDGPU_PRODUCT_NAME_LEN 64
>> +struct amdgpu_reset_event_ctx {
>> + uint64_t pid;
>> + uint32_t flags;
>> +};
>> +
>
> Please don't put any new structures into amdgpu.h. If I'm not completely
> mistaken Andrey has created a new header for all the reset related stuff. >
> I would also reconsider the name, at least drop the _ctx suffix.
>
(Same as comment from other patch) should we move this structure in drm
level (drm_sysfs.h) like
drm_reset_event {
u32 pid;
u32 flags;
char pname[64];
};
- Shashank
> Regards,
> Christian.
>
>> struct amdgpu_device {
>> struct device *dev;
>> struct pci_dev *pdev;
>> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>> int asic_reset_res;
>> struct work_struct xgmi_reset_work;
>> + struct work_struct gpu_reset_event_work;
>> struct list_head reset_list;
>> long gfx_timeout;
>> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
>> pci_channel_state_t pci_channel_state;
>> struct amdgpu_reset_control *reset_cntl;
>> + struct amdgpu_reset_event_ctx reset_event_ctx;
>> uint32_t
>> ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>> bool ram_is_direct_mapped;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ed077de426d9..c43d099da06d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -73,6 +73,7 @@
>> #include <linux/pm_runtime.h>
>> #include <drm/drm_drv.h>
>> +#include <drm/drm_sysfs.h>
>> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct
>> amdgpu_device *adev)
>> return amdgpu_device_asic_has_dc_support(adev->asic_type);
>> }
>> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
>> +{
>> + struct amdgpu_device *adev = container_of(__work, struct
>> amdgpu_device,
>> + gpu_reset_event_work);
>> + struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
>> +
>> + /*
>> + * A GPU reset has happened, indicate the userspace and pass the
>> + * following information:
>> + * - pid of the process involved,
>> + * - if the VRAM is valid or not,
>> + * - indicate that userspace may want to collect the ftrace event
>> + * data from the trace event.
>> + */
>> + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid,
>> event_ctx->flags);
>> +}
>> +
>> static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
>> {
>> struct amdgpu_device *adev =
>> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>> amdgpu_device_delay_enable_gfx_off);
>> INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
>> + INIT_WORK(&adev->gpu_reset_event_work,
>> amdgpu_device_reset_event_func);
>> adev->gfx.gfx_off_req_count = 1;
>> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 9:31 ` Sharma, Shashank
@ 2022-03-08 10:32 ` Christian König
2022-03-08 11:56 ` Sharma, Shashank
0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-03-08 10:32 UTC (permalink / raw)
To: Sharma, Shashank, Christian König, Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram
Am 08.03.22 um 10:31 schrieb Sharma, Shashank:
>
>
> On 3/8/2022 8:06 AM, Christian König wrote:
>> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>
>>> This patch adds a new sysfs event, which will indicate
>>> the userland about a GPU reset, and can also provide
>>> some information like:
>>> - which PID was involved in the GPU reset
>>> - what was the GPU status (using flags)
>>>
>>> This patch also introduces the first flag of the flags
>>> bitmap, which can be appended as and when required.
>>
>> Make sure to CC the dri-devel mailing list when reviewing this.
> Got it,
>
> I was also curious if we want to move the reset_ctx structure itself
> to DRM layer, like
> drm_reset_event_ctx {
> u32 pid;
> u32 flags;
> char process_name[64];
> };
I was entertaining that thought as well.
But if we do this I would go even a step further and also move the reset
work item into the DRM layer as well.
You might also look like into migrating the exiting i915 code which uses
udev to signal GPU resets to this function as well.
Regards,
Christian.
>
> and then:
> void drm_sysfs_reset_event(struct drm_device *dev, drm_reset_event_ctx
> *ctx);
>
>>
>>>
>>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> ---
>>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>> include/drm/drm_sysfs.h | 3 +++
>>> 2 files changed, 27 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>>> index 430e00b16eec..52a015161431 100644
>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device
>>> *dev)
>>> }
>>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>> +/**
>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
>>> + * @dev: DRM device
>>> + * @pid: The process ID involve with the reset
>>> + * @flags: Any other information about the GPU status
>>> + *
>>> + * Send a uevent for the DRM device specified by @dev. This indicates
>>> + * user that a GPU reset has occurred, so that the interested client
>>> + * can take any recovery or profiling measure, when required.
>>> + */
>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>> uint32_t flags)
>>
>> The PID is usually only 32bit, but even better would be to use pid_t.
>>
>>> +{
>>> + unsigned char pid_str[21], flags_str[15];
>>> + unsigned char reset_str[] = "RESET=1";
>>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>> +
>>> + DRM_DEBUG("generating reset event\n");
>>> +
>>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>> +}
>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>> +
>>> /**
>>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
>>> any connector
>>> * change
>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>> index 6273cac44e47..63f00fe8054c 100644
>>> --- a/include/drm/drm_sysfs.h
>>> +++ b/include/drm/drm_sysfs.h
>>> @@ -2,6 +2,8 @@
>>> #ifndef _DRM_SYSFS_H_
>>> #define _DRM_SYSFS_H_
>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>
>> Probably better to define that the other way around, e.g.
>> DRM_GPU_RESET_FLAG_VRAM_LOST.
>>
>> Apart from that looks good to me.
>>
> Got it, noted.
> - Shashank
>
>> Christian.
>>
>>> +
>>> struct drm_device;
>>> struct device;
>>> struct drm_connector;
>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>> void drm_class_device_unregister(struct device *dev);
>>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>> uint32_t reset_flags);
>>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>>> *connector);
>>> void drm_sysfs_connector_status_event(struct drm_connector
>>> *connector,
>>> struct drm_property *property);
>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 10:32 ` Christian König
@ 2022-03-08 11:56 ` Sharma, Shashank
2022-03-08 15:37 ` Somalapuram, Amaranath
2022-03-08 16:40 ` Sharma, Shashank
0 siblings, 2 replies; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 11:56 UTC (permalink / raw)
To: Christian König, Christian König, Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram
On 3/8/2022 11:32 AM, Christian König wrote:
> Am 08.03.22 um 10:31 schrieb Sharma, Shashank:
>>
>>
>> On 3/8/2022 8:06 AM, Christian König wrote:
>>> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>>
>>>> This patch adds a new sysfs event, which will indicate
>>>> the userland about a GPU reset, and can also provide
>>>> some information like:
>>>> - which PID was involved in the GPU reset
>>>> - what was the GPU status (using flags)
>>>>
>>>> This patch also introduces the first flag of the flags
>>>> bitmap, which can be appended as and when required.
>>>
>>> Make sure to CC the dri-devel mailing list when reviewing this.
>> Got it,
>>
>> I was also curious if we want to move the reset_ctx structure itself
>> to DRM layer, like
>> drm_reset_event_ctx {
>> u32 pid;
>> u32 flags;
>> char process_name[64];
>> };
>
> I was entertaining that thought as well.
>
> But if we do this I would go even a step further and also move the reset
> work item into the DRM layer as well.
>
> You might also look like into migrating the exiting i915 code which uses
> udev to signal GPU resets to this function as well.
>
> Regards,
> Christian.
That seems like a good idea, let me quickly dive into i915 and check
this out.
Shashank
>
>>
>> and then:
>> void drm_sysfs_reset_event(struct drm_device *dev, drm_reset_event_ctx
>> *ctx);
>>
>>>
>>>>
>>>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> ---
>>>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>> include/drm/drm_sysfs.h | 3 +++
>>>> 2 files changed, 27 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>>>> index 430e00b16eec..52a015161431 100644
>>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device
>>>> *dev)
>>>> }
>>>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>> +/**
>>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
>>>> + * @dev: DRM device
>>>> + * @pid: The process ID involve with the reset
>>>> + * @flags: Any other information about the GPU status
>>>> + *
>>>> + * Send a uevent for the DRM device specified by @dev. This indicates
>>>> + * user that a GPU reset has occurred, so that the interested client
>>>> + * can take any recovery or profiling measure, when required.
>>>> + */
>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>> uint32_t flags)
>>>
>>> The PID is usually only 32bit, but even better would be to use pid_t.
>>>
>>>> +{
>>>> + unsigned char pid_str[21], flags_str[15];
>>>> + unsigned char reset_str[] = "RESET=1";
>>>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>> +
>>>> + DRM_DEBUG("generating reset event\n");
>>>> +
>>>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>> +
>>>> /**
>>>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
>>>> any connector
>>>> * change
>>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>> index 6273cac44e47..63f00fe8054c 100644
>>>> --- a/include/drm/drm_sysfs.h
>>>> +++ b/include/drm/drm_sysfs.h
>>>> @@ -2,6 +2,8 @@
>>>> #ifndef _DRM_SYSFS_H_
>>>> #define _DRM_SYSFS_H_
>>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>
>>> Probably better to define that the other way around, e.g.
>>> DRM_GPU_RESET_FLAG_VRAM_LOST.
>>>
>>> Apart from that looks good to me.
>>>
>> Got it, noted.
>> - Shashank
>>
>>> Christian.
>>>
>>>> +
>>>> struct drm_device;
>>>> struct device;
>>>> struct drm_connector;
>>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>> void drm_class_device_unregister(struct device *dev);
>>>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>> uint32_t reset_flags);
>>>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>> *connector);
>>>> void drm_sysfs_connector_status_event(struct drm_connector
>>>> *connector,
>>>> struct drm_property *property);
>>>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 11:56 ` Sharma, Shashank
@ 2022-03-08 15:37 ` Somalapuram, Amaranath
2022-03-09 8:02 ` Christian König
2022-03-08 16:40 ` Sharma, Shashank
1 sibling, 1 reply; 32+ messages in thread
From: Somalapuram, Amaranath @ 2022-03-08 15:37 UTC (permalink / raw)
To: Sharma, Shashank, Christian König, Christian König,
Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram
On 3/8/2022 5:26 PM, Sharma, Shashank wrote:
>
>
> On 3/8/2022 11:32 AM, Christian König wrote:
>> Am 08.03.22 um 10:31 schrieb Sharma, Shashank:
>>>
>>>
>>> On 3/8/2022 8:06 AM, Christian König wrote:
>>>> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>>>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>>>
>>>>> This patch adds a new sysfs event, which will indicate
>>>>> the userland about a GPU reset, and can also provide
>>>>> some information like:
>>>>> - which PID was involved in the GPU reset
>>>>> - what was the GPU status (using flags)
>>>>>
>>>>> This patch also introduces the first flag of the flags
>>>>> bitmap, which can be appended as and when required.
>>>>
>>>> Make sure to CC the dri-devel mailing list when reviewing this.
>>> Got it,
>>>
>>> I was also curious if we want to move the reset_ctx structure itself
>>> to DRM layer, like
>>> drm_reset_event_ctx {
>>> u32 pid;
>>> u32 flags;
>>> char process_name[64];
>>> };
>>
>> I was entertaining that thought as well.
>>
>> But if we do this I would go even a step further and also move the
>> reset work item into the DRM layer as well.
>>
>> You might also look like into migrating the exiting i915 code which
>> uses udev to signal GPU resets to this function as well.
>>
Hi Christian,
Can we access adev in common drm (even if we can access adev it will not
be common code)
move work function to drm need to be protected(i.e reset_domain->sem),
adding something like reset_sem to drm_device?
Regards,
S.Amarnath
>> Regards,
>> Christian.
>
> That seems like a good idea, let me quickly dive into i915 and check
> this out.
>
> Shashank
>>
>>>
>>> and then:
>>> void drm_sysfs_reset_event(struct drm_device *dev,
>>> drm_reset_event_ctx *ctx);
>>>
>>>>
>>>>>
>>>>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>> include/drm/drm_sysfs.h | 3 +++
>>>>> 2 files changed, 27 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_sysfs.c
>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>> index 430e00b16eec..52a015161431 100644
>>>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>>>> drm_device *dev)
>>>>> }
>>>>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>> +/**
>>>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU
>>>>> reset
>>>>> + * @dev: DRM device
>>>>> + * @pid: The process ID involve with the reset
>>>>> + * @flags: Any other information about the GPU status
>>>>> + *
>>>>> + * Send a uevent for the DRM device specified by @dev. This
>>>>> indicates
>>>>> + * user that a GPU reset has occurred, so that the interested client
>>>>> + * can take any recovery or profiling measure, when required.
>>>>> + */
>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>> uint32_t flags)
>>>>
>>>> The PID is usually only 32bit, but even better would be to use pid_t.
>>>>
>>>>> +{
>>>>> + unsigned char pid_str[21], flags_str[15];
>>>>> + unsigned char reset_str[] = "RESET=1";
>>>>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>> +
>>>>> + DRM_DEBUG("generating reset event\n");
>>>>> +
>>>>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>> +
>>>>> /**
>>>>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
>>>>> any connector
>>>>> * change
>>>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>> index 6273cac44e47..63f00fe8054c 100644
>>>>> --- a/include/drm/drm_sysfs.h
>>>>> +++ b/include/drm/drm_sysfs.h
>>>>> @@ -2,6 +2,8 @@
>>>>> #ifndef _DRM_SYSFS_H_
>>>>> #define _DRM_SYSFS_H_
>>>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>
>>>> Probably better to define that the other way around, e.g.
>>>> DRM_GPU_RESET_FLAG_VRAM_LOST.
>>>>
>>>> Apart from that looks good to me.
>>>>
>>> Got it, noted.
>>> - Shashank
>>>
>>>> Christian.
>>>>
>>>>> +
>>>>> struct drm_device;
>>>>> struct device;
>>>>> struct drm_connector;
>>>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>>> void drm_class_device_unregister(struct device *dev);
>>>>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>> uint32_t reset_flags);
>>>>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>>> *connector);
>>>>> void drm_sysfs_connector_status_event(struct drm_connector
>>>>> *connector,
>>>>> struct drm_property *property);
>>>>
>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-07 16:26 [PATCH 1/2] drm: Add GPU reset sysfs event Shashank Sharma
` (2 preceding siblings ...)
2022-03-08 7:06 ` Christian König
@ 2022-03-08 16:25 ` Andrey Grodzovsky
2022-03-08 16:35 ` Sharma, Shashank
2022-03-08 16:36 ` Lazar, Lijo
3 siblings, 2 replies; 32+ messages in thread
From: Andrey Grodzovsky @ 2022-03-08 16:25 UTC (permalink / raw)
To: Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram, Christian Koenig,
shashank.sharma
On 2022-03-07 11:26, Shashank Sharma wrote:
> From: Shashank Sharma <shashank.sharma@amd.com>
>
> This patch adds a new sysfs event, which will indicate
> the userland about a GPU reset, and can also provide
> some information like:
> - which PID was involved in the GPU reset
> - what was the GPU status (using flags)
>
> This patch also introduces the first flag of the flags
> bitmap, which can be appended as and when required.
I am reminding again about another important piece of info which you can add
here and that is Smart Trace Buffer dump [1]. The buffer size is HW
specific but
from what I see there is no problem to just amend it as part of envp[]
initialization.
bellow.
The interface to get the buffer is smu_stb_collect_info and usage can be
seen from
frebugfs interface in smu_stb_debugfs_open
[1] - https://www.spinics.net/lists/amd-gfx/msg70751.html
Andrey
>
> Cc: Alexandar Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
> include/drm/drm_sysfs.h | 3 +++
> 2 files changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index 430e00b16eec..52a015161431 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
> }
> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
> + * @dev: DRM device
> + * @pid: The process ID involve with the reset
> + * @flags: Any other information about the GPU status
> + *
> + * Send a uevent for the DRM device specified by @dev. This indicates
> + * user that a GPU reset has occurred, so that the interested client
> + * can take any recovery or profiling measure, when required.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t flags)
> +{
> + unsigned char pid_str[21], flags_str[15];
> + unsigned char reset_str[] = "RESET=1";
> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
> +
> + DRM_DEBUG("generating reset event\n");
> +
> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_sysfs_reset_event);
> +
> /**
> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
> * change
> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> index 6273cac44e47..63f00fe8054c 100644
> --- a/include/drm/drm_sysfs.h
> +++ b/include/drm/drm_sysfs.h
> @@ -2,6 +2,8 @@
> #ifndef _DRM_SYSFS_H_
> #define _DRM_SYSFS_H_
>
> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
> +
> struct drm_device;
> struct device;
> struct drm_connector;
> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
> void drm_class_device_unregister(struct device *dev);
>
> void drm_sysfs_hotplug_event(struct drm_device *dev);
> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t reset_flags);
> void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
> void drm_sysfs_connector_status_event(struct drm_connector *connector,
> struct drm_property *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-07 16:26 ` [PATCH 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
2022-03-07 16:46 ` Somalapuram, Amaranath
2022-03-08 7:15 ` Christian König
@ 2022-03-08 16:26 ` Andrey Grodzovsky
2022-03-08 16:30 ` Sharma, Shashank
2 siblings, 1 reply; 32+ messages in thread
From: Andrey Grodzovsky @ 2022-03-08 16:26 UTC (permalink / raw)
To: Shashank Sharma, amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram, Christian Koenig,
shashank.sharma
On 2022-03-07 11:26, Shashank Sharma wrote:
> From: Shashank Sharma <shashank.sharma@amd.com>
>
> This patch adds a work function, which will get scheduled
> in event of a GPU reset, and will send a uevent to user with
> some reset context infomration, like a PID and some flags.
Where is the actual scheduling of the work function ? Shouldn't
there be a patch for that too ?
Andrey
>
> The userspace can do some recovery and post-processing work
> based on this event.
>
> V2:
> - Changed the name of the work to gpu_reset_event_work
> (Christian)
> - Added a structure to accommodate some additional information
> (like a PID and some flags)
>
> Cc: Alexander Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index d8b854fcbffa..7df219fe363f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -813,6 +813,11 @@ struct amd_powerplay {
> #define AMDGPU_RESET_MAGIC_NUM 64
> #define AMDGPU_MAX_DF_PERFMONS 4
> #define AMDGPU_PRODUCT_NAME_LEN 64
> +struct amdgpu_reset_event_ctx {
> + uint64_t pid;
> + uint32_t flags;
> +};
> +
> struct amdgpu_device {
> struct device *dev;
> struct pci_dev *pdev;
> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>
> int asic_reset_res;
> struct work_struct xgmi_reset_work;
> + struct work_struct gpu_reset_event_work;
> struct list_head reset_list;
>
> long gfx_timeout;
> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
> pci_channel_state_t pci_channel_state;
>
> struct amdgpu_reset_control *reset_cntl;
> + struct amdgpu_reset_event_ctx reset_event_ctx;
> uint32_t ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>
> bool ram_is_direct_mapped;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index ed077de426d9..c43d099da06d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -73,6 +73,7 @@
> #include <linux/pm_runtime.h>
>
> #include <drm/drm_drv.h>
> +#include <drm/drm_sysfs.h>
>
> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
> return amdgpu_device_asic_has_dc_support(adev->asic_type);
> }
>
> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
> +{
> + struct amdgpu_device *adev = container_of(__work, struct amdgpu_device,
> + gpu_reset_event_work);
> + struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
> +
> + /*
> + * A GPU reset has happened, indicate the userspace and pass the
> + * following information:
> + * - pid of the process involved,
> + * - if the VRAM is valid or not,
> + * - indicate that userspace may want to collect the ftrace event
> + * data from the trace event.
> + */
> + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid, event_ctx->flags);
> +}
> +
> static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
> {
> struct amdgpu_device *adev =
> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> amdgpu_device_delay_enable_gfx_off);
>
> INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
> + INIT_WORK(&adev->gpu_reset_event_work, amdgpu_device_reset_event_func);
>
> adev->gfx.gfx_off_req_count = 1;
> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-08 16:26 ` Andrey Grodzovsky
@ 2022-03-08 16:30 ` Sharma, Shashank
2022-03-08 17:20 ` Somalapuram, Amaranath
0 siblings, 1 reply; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 16:30 UTC (permalink / raw)
To: Andrey Grodzovsky, Shashank Sharma, amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram, Christian Koenig
Hello Andrey
On 3/8/2022 5:26 PM, Andrey Grodzovsky wrote:
>
> On 2022-03-07 11:26, Shashank Sharma wrote:
>> From: Shashank Sharma <shashank.sharma@amd.com>
>>
>> This patch adds a work function, which will get scheduled
>> in event of a GPU reset, and will send a uevent to user with
>> some reset context infomration, like a PID and some flags.
>
>
> Where is the actual scheduling of the work function ? Shouldn't
> there be a patch for that too ?
>
Yes, Amar is working on that patch, on top of these patches. They should
be out soon. I thought it was a good idea to get quick feedback on the
basic patches before we build something on top of it.
- Shashank
> Andrey
>
>
>>
>> The userspace can do some recovery and post-processing work
>> based on this event.
>>
>> V2:
>> - Changed the name of the work to gpu_reset_event_work
>> (Christian)
>> - Added a structure to accommodate some additional information
>> (like a PID and some flags)
>>
>> Cc: Alexander Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
>> 2 files changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index d8b854fcbffa..7df219fe363f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -813,6 +813,11 @@ struct amd_powerplay {
>> #define AMDGPU_RESET_MAGIC_NUM 64
>> #define AMDGPU_MAX_DF_PERFMONS 4
>> #define AMDGPU_PRODUCT_NAME_LEN 64
>> +struct amdgpu_reset_event_ctx {
>> + uint64_t pid;
>> + uint32_t flags;
>> +};
>> +
>> struct amdgpu_device {
>> struct device *dev;
>> struct pci_dev *pdev;
>> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>> int asic_reset_res;
>> struct work_struct xgmi_reset_work;
>> + struct work_struct gpu_reset_event_work;
>> struct list_head reset_list;
>> long gfx_timeout;
>> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
>> pci_channel_state_t pci_channel_state;
>> struct amdgpu_reset_control *reset_cntl;
>> + struct amdgpu_reset_event_ctx reset_event_ctx;
>> uint32_t
>> ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>> bool ram_is_direct_mapped;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ed077de426d9..c43d099da06d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -73,6 +73,7 @@
>> #include <linux/pm_runtime.h>
>> #include <drm/drm_drv.h>
>> +#include <drm/drm_sysfs.h>
>> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct
>> amdgpu_device *adev)
>> return amdgpu_device_asic_has_dc_support(adev->asic_type);
>> }
>> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
>> +{
>> + struct amdgpu_device *adev = container_of(__work, struct
>> amdgpu_device,
>> + gpu_reset_event_work);
>> + struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
>> +
>> + /*
>> + * A GPU reset has happened, indicate the userspace and pass the
>> + * following information:
>> + * - pid of the process involved,
>> + * - if the VRAM is valid or not,
>> + * - indicate that userspace may want to collect the ftrace event
>> + * data from the trace event.
>> + */
>> + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid,
>> event_ctx->flags);
>> +}
>> +
>> static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
>> {
>> struct amdgpu_device *adev =
>> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>> amdgpu_device_delay_enable_gfx_off);
>> INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
>> + INIT_WORK(&adev->gpu_reset_event_work,
>> amdgpu_device_reset_event_func);
>> adev->gfx.gfx_off_req_count = 1;
>> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:25 ` Andrey Grodzovsky
@ 2022-03-08 16:35 ` Sharma, Shashank
2022-03-08 16:36 ` Andrey Grodzovsky
2022-03-08 16:36 ` Lazar, Lijo
1 sibling, 1 reply; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 16:35 UTC (permalink / raw)
To: Andrey Grodzovsky, Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram, Christian Koenig
On 3/8/2022 5:25 PM, Andrey Grodzovsky wrote:
>
> On 2022-03-07 11:26, Shashank Sharma wrote:
>> From: Shashank Sharma <shashank.sharma@amd.com>
>>
>> This patch adds a new sysfs event, which will indicate
>> the userland about a GPU reset, and can also provide
>> some information like:
>> - which PID was involved in the GPU reset
>> - what was the GPU status (using flags)
>>
>> This patch also introduces the first flag of the flags
>> bitmap, which can be appended as and when required.
>
>
> I am reminding again about another important piece of info which you can
> add
> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
> specific but
> from what I see there is no problem to just amend it as part of envp[]
> initialization.
> bellow.
>
> The interface to get the buffer is smu_stb_collect_info and usage can be
> seen from
> frebugfs interface in smu_stb_debugfs_open
>
> [1] - https://www.spinics.net/lists/amd-gfx/msg70751.html
>
Noted Andrey, thank for the reminder. As you can see, this patch is
going into DRM layer, so as of now we are accommodating the PID and VRAM
validity information, which is common to all the DRM drivers (not only
AMDGPU). But as a next step, we will extend this interface to provide
driver specific custom data as well, and that is where we will start
digging into STB.
- Shashank
> Andrey
>
>
>>
>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> ---
>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>> include/drm/drm_sysfs.h | 3 +++
>> 2 files changed, 27 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>> index 430e00b16eec..52a015161431 100644
>> --- a/drivers/gpu/drm/drm_sysfs.c
>> +++ b/drivers/gpu/drm/drm_sysfs.c
>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
>> }
>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>> +/**
>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
>> + * @dev: DRM device
>> + * @pid: The process ID involve with the reset
>> + * @flags: Any other information about the GPU status
>> + *
>> + * Send a uevent for the DRM device specified by @dev. This indicates
>> + * user that a GPU reset has occurred, so that the interested client
>> + * can take any recovery or profiling measure, when required.
>> + */
>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>> uint32_t flags)
>> +{
>> + unsigned char pid_str[21], flags_str[15];
>> + unsigned char reset_str[] = "RESET=1";
>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>> +
>> + DRM_DEBUG("generating reset event\n");
>> +
>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>> +}
>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>> +
>> /**
>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any
>> connector
>> * change
>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>> index 6273cac44e47..63f00fe8054c 100644
>> --- a/include/drm/drm_sysfs.h
>> +++ b/include/drm/drm_sysfs.h
>> @@ -2,6 +2,8 @@
>> #ifndef _DRM_SYSFS_H_
>> #define _DRM_SYSFS_H_
>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>> +
>> struct drm_device;
>> struct device;
>> struct drm_connector;
>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>> void drm_class_device_unregister(struct device *dev);
>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>> uint32_t reset_flags);
>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>> *connector);
>> void drm_sysfs_connector_status_event(struct drm_connector *connector,
>> struct drm_property *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:35 ` Sharma, Shashank
@ 2022-03-08 16:36 ` Andrey Grodzovsky
0 siblings, 0 replies; 32+ messages in thread
From: Andrey Grodzovsky @ 2022-03-08 16:36 UTC (permalink / raw)
To: Sharma, Shashank, Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram, Christian Koenig
On 2022-03-08 11:35, Sharma, Shashank wrote:
>
>
> On 3/8/2022 5:25 PM, Andrey Grodzovsky wrote:
>>
>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>
>>> This patch adds a new sysfs event, which will indicate
>>> the userland about a GPU reset, and can also provide
>>> some information like:
>>> - which PID was involved in the GPU reset
>>> - what was the GPU status (using flags)
>>>
>>> This patch also introduces the first flag of the flags
>>> bitmap, which can be appended as and when required.
>>
>>
>> I am reminding again about another important piece of info which you
>> can add
>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>> specific but
>> from what I see there is no problem to just amend it as part of
>> envp[] initialization.
>> bellow.
>>
>> The interface to get the buffer is smu_stb_collect_info and usage can
>> be seen from
>> frebugfs interface in smu_stb_debugfs_open
>>
>> [1] - https://www.spinics.net/lists/amd-gfx/msg70751.html
>>
>
> Noted Andrey, thank for the reminder. As you can see, this patch is
> going into DRM layer, so as of now we are accommodating the PID and
> VRAM validity information, which is common to all the DRM drivers (not
> only AMDGPU). But as a next step, we will extend this interface to
> provide driver specific custom data as well, and that is where we
> will start digging into STB.
> - Shashank
Got it.
Andrey
>
>> Andrey
>>
>>
>>>
>>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> ---
>>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>> include/drm/drm_sysfs.h | 3 +++
>>> 2 files changed, 27 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>>> index 430e00b16eec..52a015161431 100644
>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device
>>> *dev)
>>> }
>>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>> +/**
>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
>>> + * @dev: DRM device
>>> + * @pid: The process ID involve with the reset
>>> + * @flags: Any other information about the GPU status
>>> + *
>>> + * Send a uevent for the DRM device specified by @dev. This indicates
>>> + * user that a GPU reset has occurred, so that the interested client
>>> + * can take any recovery or profiling measure, when required.
>>> + */
>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>> uint32_t flags)
>>> +{
>>> + unsigned char pid_str[21], flags_str[15];
>>> + unsigned char reset_str[] = "RESET=1";
>>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>> +
>>> + DRM_DEBUG("generating reset event\n");
>>> +
>>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>> +}
>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>> +
>>> /**
>>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
>>> any connector
>>> * change
>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>> index 6273cac44e47..63f00fe8054c 100644
>>> --- a/include/drm/drm_sysfs.h
>>> +++ b/include/drm/drm_sysfs.h
>>> @@ -2,6 +2,8 @@
>>> #ifndef _DRM_SYSFS_H_
>>> #define _DRM_SYSFS_H_
>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>> +
>>> struct drm_device;
>>> struct device;
>>> struct drm_connector;
>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>> void drm_class_device_unregister(struct device *dev);
>>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>> uint32_t reset_flags);
>>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>>> *connector);
>>> void drm_sysfs_connector_status_event(struct drm_connector
>>> *connector,
>>> struct drm_property *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:25 ` Andrey Grodzovsky
2022-03-08 16:35 ` Sharma, Shashank
@ 2022-03-08 16:36 ` Lazar, Lijo
2022-03-08 16:39 ` Andrey Grodzovsky
1 sibling, 1 reply; 32+ messages in thread
From: Lazar, Lijo @ 2022-03-08 16:36 UTC (permalink / raw)
To: Grodzovsky, Andrey, Shashank Sharma, amd-gfx
Cc: Deucher, Alexander, Limonciello, Mario, Somalapuram, Amaranath,
Koenig, Christian, Sharma, Shashank
[-- Attachment #1: Type: text/plain, Size: 4709 bytes --]
[AMD Official Use Only]
+Mario
I guess that means the functionality needs to be present in amdgpu for APUs also. Presently, this is taken care by PMC driver for APUs.
Thanks,
Lijo
________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Sent: Tuesday, March 8, 2022 9:55:03 PM
To: Shashank Sharma <contactshashanksharma@gmail.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
Subject: Re: [PATCH 1/2] drm: Add GPU reset sysfs event
On 2022-03-07 11:26, Shashank Sharma wrote:
> From: Shashank Sharma <shashank.sharma@amd.com>
>
> This patch adds a new sysfs event, which will indicate
> the userland about a GPU reset, and can also provide
> some information like:
> - which PID was involved in the GPU reset
> - what was the GPU status (using flags)
>
> This patch also introduces the first flag of the flags
> bitmap, which can be appended as and when required.
I am reminding again about another important piece of info which you can add
here and that is Smart Trace Buffer dump [1]. The buffer size is HW
specific but
from what I see there is no problem to just amend it as part of envp[]
initialization.
bellow.
The interface to get the buffer is smu_stb_collect_info and usage can be
seen from
frebugfs interface in smu_stb_debugfs_open
[1] - https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
Andrey
>
> Cc: Alexandar Deucher <alexander.deucher@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> ---
> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
> include/drm/drm_sysfs.h | 3 +++
> 2 files changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index 430e00b16eec..52a015161431 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
> }
> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
> + * @dev: DRM device
> + * @pid: The process ID involve with the reset
> + * @flags: Any other information about the GPU status
> + *
> + * Send a uevent for the DRM device specified by @dev. This indicates
> + * user that a GPU reset has occurred, so that the interested client
> + * can take any recovery or profiling measure, when required.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t flags)
> +{
> + unsigned char pid_str[21], flags_str[15];
> + unsigned char reset_str[] = "RESET=1";
> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
> +
> + DRM_DEBUG("generating reset event\n");
> +
> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +EXPORT_SYMBOL(drm_sysfs_reset_event);
> +
> /**
> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
> * change
> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> index 6273cac44e47..63f00fe8054c 100644
> --- a/include/drm/drm_sysfs.h
> +++ b/include/drm/drm_sysfs.h
> @@ -2,6 +2,8 @@
> #ifndef _DRM_SYSFS_H_
> #define _DRM_SYSFS_H_
>
> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
> +
> struct drm_device;
> struct device;
> struct drm_connector;
> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
> void drm_class_device_unregister(struct device *dev);
>
> void drm_sysfs_hotplug_event(struct drm_device *dev);
> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t reset_flags);
> void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
> void drm_sysfs_connector_status_event(struct drm_connector *connector,
> struct drm_property *property);
[-- Attachment #2: Type: text/html, Size: 7442 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:36 ` Lazar, Lijo
@ 2022-03-08 16:39 ` Andrey Grodzovsky
2022-03-08 16:46 ` Sharma, Shashank
0 siblings, 1 reply; 32+ messages in thread
From: Andrey Grodzovsky @ 2022-03-08 16:39 UTC (permalink / raw)
To: Lazar, Lijo, Shashank Sharma, amd-gfx
Cc: Deucher, Alexander, Limonciello, Mario, Somalapuram, Amaranath,
Koenig, Christian, Sharma, Shashank
[-- Attachment #1: Type: text/plain, Size: 5736 bytes --]
As long as PMC driver provides clear interface to retrieve the info
there should be no issue to call either amdgpu interface or PMC
interface using IS_APU (or something alike in the code)
We probably should add a wrapper function around this logic in amdgpu.
Andrey
On 2022-03-08 11:36, Lazar, Lijo wrote:
>
> [AMD Official Use Only]
>
>
> +Mario
>
> I guess that means the functionality needs to be present in amdgpu for
> APUs also. Presently, this is taken care by PMC driver for APUs.
>
> Thanks,
> Lijo
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
> Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram,
> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>
> On 2022-03-07 11:26, Shashank Sharma wrote:
> > From: Shashank Sharma <shashank.sharma@amd.com>
> >
> > This patch adds a new sysfs event, which will indicate
> > the userland about a GPU reset, and can also provide
> > some information like:
> > - which PID was involved in the GPU reset
> > - what was the GPU status (using flags)
> >
> > This patch also introduces the first flag of the flags
> > bitmap, which can be appended as and when required.
>
>
> I am reminding again about another important piece of info which you
> can add
> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
> specific but
> from what I see there is no problem to just amend it as part of envp[]
> initialization.
> bellow.
>
> The interface to get the buffer is smu_stb_collect_info and usage can be
> seen from
> frebugfs interface in smu_stb_debugfs_open
>
> [1] -
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0>
>
> Andrey
>
>
> >
> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
> > Cc: Christian Koenig <christian.koenig@amd.com>
> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> > ---
> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
> > include/drm/drm_sysfs.h | 3 +++
> > 2 files changed, 27 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> > index 430e00b16eec..52a015161431 100644
> > --- a/drivers/gpu/drm/drm_sysfs.c
> > +++ b/drivers/gpu/drm/drm_sysfs.c
> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device
> *dev)
> > }
> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
> >
> > +/**
> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
> > + * @dev: DRM device
> > + * @pid: The process ID involve with the reset
> > + * @flags: Any other information about the GPU status
> > + *
> > + * Send a uevent for the DRM device specified by @dev. This indicates
> > + * user that a GPU reset has occurred, so that the interested client
> > + * can take any recovery or profiling measure, when required.
> > + */
> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
> uint32_t flags)
> > +{
> > + unsigned char pid_str[21], flags_str[15];
> > + unsigned char reset_str[] = "RESET=1";
> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
> > +
> > + DRM_DEBUG("generating reset event\n");
> > +
> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> > +}
> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
> > +
> > /**
> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
> any connector
> > * change
> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> > index 6273cac44e47..63f00fe8054c 100644
> > --- a/include/drm/drm_sysfs.h
> > +++ b/include/drm/drm_sysfs.h
> > @@ -2,6 +2,8 @@
> > #ifndef _DRM_SYSFS_H_
> > #define _DRM_SYSFS_H_
> >
> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
> > +
> > struct drm_device;
> > struct device;
> > struct drm_connector;
> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
> > void drm_class_device_unregister(struct device *dev);
> >
> > void drm_sysfs_hotplug_event(struct drm_device *dev);
> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
> uint32_t reset_flags);
> > void drm_sysfs_connector_hotplug_event(struct drm_connector
> *connector);
> > void drm_sysfs_connector_status_event(struct drm_connector *connector,
> > struct drm_property *property);
[-- Attachment #2: Type: text/html, Size: 11451 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 11:56 ` Sharma, Shashank
2022-03-08 15:37 ` Somalapuram, Amaranath
@ 2022-03-08 16:40 ` Sharma, Shashank
2022-03-09 8:05 ` Christian König
1 sibling, 1 reply; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 16:40 UTC (permalink / raw)
To: Christian König, Christian König, Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram
On 3/8/2022 12:56 PM, Sharma, Shashank wrote:
>
>
> On 3/8/2022 11:32 AM, Christian König wrote:
>> Am 08.03.22 um 10:31 schrieb Sharma, Shashank:
>>>
>>>
>>> On 3/8/2022 8:06 AM, Christian König wrote:
>>>> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>>>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>>>
>>>>> This patch adds a new sysfs event, which will indicate
>>>>> the userland about a GPU reset, and can also provide
>>>>> some information like:
>>>>> - which PID was involved in the GPU reset
>>>>> - what was the GPU status (using flags)
>>>>>
>>>>> This patch also introduces the first flag of the flags
>>>>> bitmap, which can be appended as and when required.
>>>>
>>>> Make sure to CC the dri-devel mailing list when reviewing this.
>>> Got it,
>>>
>>> I was also curious if we want to move the reset_ctx structure itself
>>> to DRM layer, like
>>> drm_reset_event_ctx {
>>> u32 pid;
>>> u32 flags;
>>> char process_name[64];
>>> };
>>
>> I was entertaining that thought as well.
>>
>> But if we do this I would go even a step further and also move the
>> reset work item into the DRM layer as well.
>>
>> You might also look like into migrating the exiting i915 code which
>> uses udev to signal GPU resets to this function as well.
>>
>> Regards,
>> Christian.
>
> That seems like a good idea, let me quickly dive into i915 and check
> this out.
>
> Shashank
I had a quick look at I915, and it looks like both I915 and AMDGPU
drivers have very different methods of passing the data to the work
function, via different internal structures. Which means it would be
much additional work in both the drivers to move the work function
itself in the DRM layer.
To me, now it seems like it would be better if we can just provide this
interface to send the uevent and its structure, and the drivers can
collect their information and pass it to WQ in their own way.
How do you feel about it ?
- Shashank
>>
>>>
>>> and then:
>>> void drm_sysfs_reset_event(struct drm_device *dev,
>>> drm_reset_event_ctx *ctx);
>>>
>>>>
>>>>>
>>>>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>> include/drm/drm_sysfs.h | 3 +++
>>>>> 2 files changed, 27 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>>>>> index 430e00b16eec..52a015161431 100644
>>>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device
>>>>> *dev)
>>>>> }
>>>>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>> +/**
>>>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU
>>>>> reset
>>>>> + * @dev: DRM device
>>>>> + * @pid: The process ID involve with the reset
>>>>> + * @flags: Any other information about the GPU status
>>>>> + *
>>>>> + * Send a uevent for the DRM device specified by @dev. This indicates
>>>>> + * user that a GPU reset has occurred, so that the interested client
>>>>> + * can take any recovery or profiling measure, when required.
>>>>> + */
>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>> uint32_t flags)
>>>>
>>>> The PID is usually only 32bit, but even better would be to use pid_t.
>>>>
>>>>> +{
>>>>> + unsigned char pid_str[21], flags_str[15];
>>>>> + unsigned char reset_str[] = "RESET=1";
>>>>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>> +
>>>>> + DRM_DEBUG("generating reset event\n");
>>>>> +
>>>>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>> +
>>>>> /**
>>>>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
>>>>> any connector
>>>>> * change
>>>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>> index 6273cac44e47..63f00fe8054c 100644
>>>>> --- a/include/drm/drm_sysfs.h
>>>>> +++ b/include/drm/drm_sysfs.h
>>>>> @@ -2,6 +2,8 @@
>>>>> #ifndef _DRM_SYSFS_H_
>>>>> #define _DRM_SYSFS_H_
>>>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>
>>>> Probably better to define that the other way around, e.g.
>>>> DRM_GPU_RESET_FLAG_VRAM_LOST.
>>>>
>>>> Apart from that looks good to me.
>>>>
>>> Got it, noted.
>>> - Shashank
>>>
>>>> Christian.
>>>>
>>>>> +
>>>>> struct drm_device;
>>>>> struct device;
>>>>> struct drm_connector;
>>>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>>> void drm_class_device_unregister(struct device *dev);
>>>>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>> uint32_t reset_flags);
>>>>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>>> *connector);
>>>>> void drm_sysfs_connector_status_event(struct drm_connector
>>>>> *connector,
>>>>> struct drm_property *property);
>>>>
>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:39 ` Andrey Grodzovsky
@ 2022-03-08 16:46 ` Sharma, Shashank
2022-03-08 16:55 ` Andrey Grodzovsky
0 siblings, 1 reply; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 16:46 UTC (permalink / raw)
To: Andrey Grodzovsky, Lazar, Lijo, Shashank Sharma, amd-gfx
Cc: Deucher, Alexander, Somalapuram, Amaranath, Koenig, Christian,
Limonciello, Mario
I have a very limited understanding of PMC driver and its interfaces, so
I would just go ahead and rely on Andrey's judgement/recommendation on
this :)
- Shashank
On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
> As long as PMC driver provides clear interface to retrieve the info
> there should be no issue to call either amdgpu interface or PMC
> interface using IS_APU (or something alike in the code)
> We probably should add a wrapper function around this logic in amdgpu.
>
> Andrey
>
> On 2022-03-08 11:36, Lazar, Lijo wrote:
>>
>> [AMD Official Use Only]
>>
>>
>> +Mario
>>
>> I guess that means the functionality needs to be present in amdgpu for
>> APUs also. Presently, this is taken care by PMC driver for APUs.
>>
>> Thanks,
>> Lijo
>> ------------------------------------------------------------------------
>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
>> Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
>> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram,
>> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
>> <Christian.Koenig@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>>
>> On 2022-03-07 11:26, Shashank Sharma wrote:
>> > From: Shashank Sharma <shashank.sharma@amd.com>
>> >
>> > This patch adds a new sysfs event, which will indicate
>> > the userland about a GPU reset, and can also provide
>> > some information like:
>> > - which PID was involved in the GPU reset
>> > - what was the GPU status (using flags)
>> >
>> > This patch also introduces the first flag of the flags
>> > bitmap, which can be appended as and when required.
>>
>>
>> I am reminding again about another important piece of info which you
>> can add
>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>> specific but
>> from what I see there is no problem to just amend it as part of envp[]
>> initialization.
>> bellow.
>>
>> The interface to get the buffer is smu_stb_collect_info and usage can be
>> seen from
>> frebugfs interface in smu_stb_debugfs_open
>>
>> [1] -
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0>
>>
>> Andrey
>>
>>
>> >
>> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
>> > Cc: Christian Koenig <christian.koenig@amd.com>
>> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>> > ---
>> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>> > include/drm/drm_sysfs.h | 3 +++
>> > 2 files changed, 27 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>> > index 430e00b16eec..52a015161431 100644
>> > --- a/drivers/gpu/drm/drm_sysfs.c
>> > +++ b/drivers/gpu/drm/drm_sysfs.c
>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device
>> *dev)
>> > }
>> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>> >
>> > +/**
>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
>> > + * @dev: DRM device
>> > + * @pid: The process ID involve with the reset
>> > + * @flags: Any other information about the GPU status
>> > + *
>> > + * Send a uevent for the DRM device specified by @dev. This indicates
>> > + * user that a GPU reset has occurred, so that the interested client
>> > + * can take any recovery or profiling measure, when required.
>> > + */
>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>> uint32_t flags)
>> > +{
>> > + unsigned char pid_str[21], flags_str[15];
>> > + unsigned char reset_str[] = "RESET=1";
>> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>> > +
>> > + DRM_DEBUG("generating reset event\n");
>> > +
>> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>> > +}
>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
>> > +
>> > /**
>> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
>> any connector
>> > * change
>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>> > index 6273cac44e47..63f00fe8054c 100644
>> > --- a/include/drm/drm_sysfs.h
>> > +++ b/include/drm/drm_sysfs.h
>> > @@ -2,6 +2,8 @@
>> > #ifndef _DRM_SYSFS_H_
>> > #define _DRM_SYSFS_H_
>> >
>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>> > +
>> > struct drm_device;
>> > struct device;
>> > struct drm_connector;
>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>> > void drm_class_device_unregister(struct device *dev);
>> >
>> > void drm_sysfs_hotplug_event(struct drm_device *dev);
>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>> uint32_t reset_flags);
>> > void drm_sysfs_connector_hotplug_event(struct drm_connector
>> *connector);
>> > void drm_sysfs_connector_status_event(struct drm_connector *connector,
>> > struct drm_property *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:46 ` Sharma, Shashank
@ 2022-03-08 16:55 ` Andrey Grodzovsky
2022-03-08 16:57 ` Sharma, Shashank
0 siblings, 1 reply; 32+ messages in thread
From: Andrey Grodzovsky @ 2022-03-08 16:55 UTC (permalink / raw)
To: Sharma, Shashank, Lazar, Lijo, Shashank Sharma, amd-gfx
Cc: Deucher, Alexander, Somalapuram, Amaranath, Koenig, Christian,
Limonciello, Mario
You can read on their side here -
https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-Linux-5.17
and see their patch. THey don't have as clean
interface as we do to retrieve the buffer and currently it's hard-coded
for debugfs dump but it looks like pretty straight forward to expose
their buffer to external
client like amdgpu.
Andrey
On 2022-03-08 11:46, Sharma, Shashank wrote:
> I have a very limited understanding of PMC driver and its interfaces,
> so I would just go ahead and rely on Andrey's judgement/recommendation
> on this :)
>
> - Shashank
>
> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
>> As long as PMC driver provides clear interface to retrieve the info
>> there should be no issue to call either amdgpu interface or PMC
>> interface using IS_APU (or something alike in the code)
>> We probably should add a wrapper function around this logic in amdgpu.
>>
>> Andrey
>>
>> On 2022-03-08 11:36, Lazar, Lijo wrote:
>>>
>>> [AMD Official Use Only]
>>>
>>>
>>> +Mario
>>>
>>> I guess that means the functionality needs to be present in amdgpu
>>> for APUs also. Presently, this is taken care by PMC driver for APUs.
>>>
>>> Thanks,
>>> Lijo
>>> ------------------------------------------------------------------------
>>>
>>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
>>> Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
>>> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
>>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>>> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram,
>>> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
>>> <Christian.Koenig@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>>>
>>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>> > From: Shashank Sharma <shashank.sharma@amd.com>
>>> >
>>> > This patch adds a new sysfs event, which will indicate
>>> > the userland about a GPU reset, and can also provide
>>> > some information like:
>>> > - which PID was involved in the GPU reset
>>> > - what was the GPU status (using flags)
>>> >
>>> > This patch also introduces the first flag of the flags
>>> > bitmap, which can be appended as and when required.
>>>
>>>
>>> I am reminding again about another important piece of info which you
>>> can add
>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>>> specific but
>>> from what I see there is no problem to just amend it as part of envp[]
>>> initialization.
>>> bellow.
>>>
>>> The interface to get the buffer is smu_stb_collect_info and usage
>>> can be
>>> seen from
>>> frebugfs interface in smu_stb_debugfs_open
>>>
>>> [1] -
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
>>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0>
>>>
>>>
>>> Andrey
>>>
>>>
>>> >
>>> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>> > Cc: Christian Koenig <christian.koenig@amd.com>
>>> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> > ---
>>> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>> > include/drm/drm_sysfs.h | 3 +++
>>> > 2 files changed, 27 insertions(+)
>>> >
>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c
>>> b/drivers/gpu/drm/drm_sysfs.c
>>> > index 430e00b16eec..52a015161431 100644
>>> > --- a/drivers/gpu/drm/drm_sysfs.c
>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>> drm_device *dev)
>>> > }
>>> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>> >
>>> > +/**
>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU
>>> reset
>>> > + * @dev: DRM device
>>> > + * @pid: The process ID involve with the reset
>>> > + * @flags: Any other information about the GPU status
>>> > + *
>>> > + * Send a uevent for the DRM device specified by @dev. This
>>> indicates
>>> > + * user that a GPU reset has occurred, so that the interested client
>>> > + * can take any recovery or profiling measure, when required.
>>> > + */
>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>> uint32_t flags)
>>> > +{
>>> > + unsigned char pid_str[21], flags_str[15];
>>> > + unsigned char reset_str[] = "RESET=1";
>>> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>> > +
>>> > + DRM_DEBUG("generating reset event\n");
>>> > +
>>> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>> > +}
>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>> > +
>>> > /**
>>> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
>>> any connector
>>> > * change
>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>> > index 6273cac44e47..63f00fe8054c 100644
>>> > --- a/include/drm/drm_sysfs.h
>>> > +++ b/include/drm/drm_sysfs.h
>>> > @@ -2,6 +2,8 @@
>>> > #ifndef _DRM_SYSFS_H_
>>> > #define _DRM_SYSFS_H_
>>> >
>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>> > +
>>> > struct drm_device;
>>> > struct device;
>>> > struct drm_connector;
>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>> > void drm_class_device_unregister(struct device *dev);
>>> >
>>> > void drm_sysfs_hotplug_event(struct drm_device *dev);
>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>> uint32_t reset_flags);
>>> > void drm_sysfs_connector_hotplug_event(struct drm_connector
>>> *connector);
>>> > void drm_sysfs_connector_status_event(struct drm_connector
>>> *connector,
>>> > struct drm_property *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:55 ` Andrey Grodzovsky
@ 2022-03-08 16:57 ` Sharma, Shashank
2022-03-08 17:04 ` Somalapuram, Amaranath
2022-03-08 17:27 ` Limonciello, Mario
0 siblings, 2 replies; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 16:57 UTC (permalink / raw)
To: Andrey Grodzovsky, Lazar, Lijo, Shashank Sharma, amd-gfx
Cc: Deucher, Alexander, Somalapuram, Amaranath, Koenig, Christian,
Limonciello, Mario
On 3/8/2022 5:55 PM, Andrey Grodzovsky wrote:
> You can read on their side here -
> https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-Linux-5.17
> and see their patch. THey don't have as clean
> interface as we do to retrieve the buffer and currently it's hard-coded
> for debugfs dump but it looks like pretty straight forward to expose
> their buffer to external
> client like amdgpu.
Noted, thanks for the pointer.
- Shashank
>
> Andrey
>
> On 2022-03-08 11:46, Sharma, Shashank wrote:
>> I have a very limited understanding of PMC driver and its interfaces,
>> so I would just go ahead and rely on Andrey's judgement/recommendation
>> on this :)
>>
>> - Shashank
>>
>> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
>>> As long as PMC driver provides clear interface to retrieve the info
>>> there should be no issue to call either amdgpu interface or PMC
>>> interface using IS_APU (or something alike in the code)
>>> We probably should add a wrapper function around this logic in amdgpu.
>>>
>>> Andrey
>>>
>>> On 2022-03-08 11:36, Lazar, Lijo wrote:
>>>>
>>>> [AMD Official Use Only]
>>>>
>>>>
>>>> +Mario
>>>>
>>>> I guess that means the functionality needs to be present in amdgpu
>>>> for APUs also. Presently, this is taken care by PMC driver for APUs.
>>>>
>>>> Thanks,
>>>> Lijo
>>>> ------------------------------------------------------------------------
>>>>
>>>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of
>>>> Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
>>>> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
>>>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>>>> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram,
>>>> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
>>>> <Christian.Koenig@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
>>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>>>>
>>>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>>> > From: Shashank Sharma <shashank.sharma@amd.com>
>>>> >
>>>> > This patch adds a new sysfs event, which will indicate
>>>> > the userland about a GPU reset, and can also provide
>>>> > some information like:
>>>> > - which PID was involved in the GPU reset
>>>> > - what was the GPU status (using flags)
>>>> >
>>>> > This patch also introduces the first flag of the flags
>>>> > bitmap, which can be appended as and when required.
>>>>
>>>>
>>>> I am reminding again about another important piece of info which you
>>>> can add
>>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>>>> specific but
>>>> from what I see there is no problem to just amend it as part of envp[]
>>>> initialization.
>>>> bellow.
>>>>
>>>> The interface to get the buffer is smu_stb_collect_info and usage
>>>> can be
>>>> seen from
>>>> frebugfs interface in smu_stb_debugfs_open
>>>>
>>>> [1] -
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
>>>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0>
>>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>> >
>>>> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>> > Cc: Christian Koenig <christian.koenig@amd.com>
>>>> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> > ---
>>>> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>> > include/drm/drm_sysfs.h | 3 +++
>>>> > 2 files changed, 27 insertions(+)
>>>> >
>>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c
>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>> > index 430e00b16eec..52a015161431 100644
>>>> > --- a/drivers/gpu/drm/drm_sysfs.c
>>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
>>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>>> drm_device *dev)
>>>> > }
>>>> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>> >
>>>> > +/**
>>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU
>>>> reset
>>>> > + * @dev: DRM device
>>>> > + * @pid: The process ID involve with the reset
>>>> > + * @flags: Any other information about the GPU status
>>>> > + *
>>>> > + * Send a uevent for the DRM device specified by @dev. This
>>>> indicates
>>>> > + * user that a GPU reset has occurred, so that the interested client
>>>> > + * can take any recovery or profiling measure, when required.
>>>> > + */
>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>> uint32_t flags)
>>>> > +{
>>>> > + unsigned char pid_str[21], flags_str[15];
>>>> > + unsigned char reset_str[] = "RESET=1";
>>>> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>> > +
>>>> > + DRM_DEBUG("generating reset event\n");
>>>> > +
>>>> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>> > +}
>>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>> > +
>>>> > /**
>>>> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent for
>>>> any connector
>>>> > * change
>>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>> > index 6273cac44e47..63f00fe8054c 100644
>>>> > --- a/include/drm/drm_sysfs.h
>>>> > +++ b/include/drm/drm_sysfs.h
>>>> > @@ -2,6 +2,8 @@
>>>> > #ifndef _DRM_SYSFS_H_
>>>> > #define _DRM_SYSFS_H_
>>>> >
>>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>> > +
>>>> > struct drm_device;
>>>> > struct device;
>>>> > struct drm_connector;
>>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>> > void drm_class_device_unregister(struct device *dev);
>>>> >
>>>> > void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>> uint32_t reset_flags);
>>>> > void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>> *connector);
>>>> > void drm_sysfs_connector_status_event(struct drm_connector
>>>> *connector,
>>>> > struct drm_property *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:57 ` Sharma, Shashank
@ 2022-03-08 17:04 ` Somalapuram, Amaranath
2022-03-08 17:17 ` Andrey Grodzovsky
2022-03-08 17:27 ` Limonciello, Mario
1 sibling, 1 reply; 32+ messages in thread
From: Somalapuram, Amaranath @ 2022-03-08 17:04 UTC (permalink / raw)
To: Sharma, Shashank, Andrey Grodzovsky, Lazar, Lijo,
Shashank Sharma, amd-gfx
Cc: Deucher, Alexander, Somalapuram, Amaranath, Koenig, Christian,
Limonciello, Mario
On 3/8/2022 10:27 PM, Sharma, Shashank wrote:
>
>
> On 3/8/2022 5:55 PM, Andrey Grodzovsky wrote:
>> You can read on their side here -
>> https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-Linux-5.17
>> and see their patch. THey don't have as clean
>> interface as we do to retrieve the buffer and currently it's
>> hard-coded for debugfs dump but it looks like pretty straight forward
>> to expose their buffer to external
>> client like amdgpu.
>
Customer requirement is to get reset notification for there daemon with
other info (like PID process name vram status).
Regards,
S.Amarnath
> Noted, thanks for the pointer.
> - Shashank
>>
>> Andrey
>>
>> On 2022-03-08 11:46, Sharma, Shashank wrote:
>>> I have a very limited understanding of PMC driver and its
>>> interfaces, so I would just go ahead and rely on Andrey's
>>> judgement/recommendation on this :)
>>>
>>> - Shashank
>>>
>>> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
>>>> As long as PMC driver provides clear interface to retrieve the info
>>>> there should be no issue to call either amdgpu interface or PMC
>>>> interface using IS_APU (or something alike in the code)
>>>> We probably should add a wrapper function around this logic in amdgpu.
>>>>
>>>> Andrey
>>>>
>>>> On 2022-03-08 11:36, Lazar, Lijo wrote:
>>>>>
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>> +Mario
>>>>>
>>>>> I guess that means the functionality needs to be present in amdgpu
>>>>> for APUs also. Presently, this is taken care by PMC driver for APUs.
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf
>>>>> of Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
>>>>> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
>>>>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>>>>> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram,
>>>>> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
>>>>> <Christian.Koenig@amd.com>; Sharma, Shashank
>>>>> <Shashank.Sharma@amd.com>
>>>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>>>>>
>>>>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>>>> > From: Shashank Sharma <shashank.sharma@amd.com>
>>>>> >
>>>>> > This patch adds a new sysfs event, which will indicate
>>>>> > the userland about a GPU reset, and can also provide
>>>>> > some information like:
>>>>> > - which PID was involved in the GPU reset
>>>>> > - what was the GPU status (using flags)
>>>>> >
>>>>> > This patch also introduces the first flag of the flags
>>>>> > bitmap, which can be appended as and when required.
>>>>>
>>>>>
>>>>> I am reminding again about another important piece of info which
>>>>> you can add
>>>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>>>>> specific but
>>>>> from what I see there is no problem to just amend it as part of
>>>>> envp[]
>>>>> initialization.
>>>>> bellow.
>>>>>
>>>>> The interface to get the buffer is smu_stb_collect_info and usage
>>>>> can be
>>>>> seen from
>>>>> frebugfs interface in smu_stb_debugfs_open
>>>>>
>>>>> [1] -
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
>>>>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0>
>>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>> >
>>>>> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>>> > Cc: Christian Koenig <christian.koenig@amd.com>
>>>>> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>> > ---
>>>>> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>> > include/drm/drm_sysfs.h | 3 +++
>>>>> > 2 files changed, 27 insertions(+)
>>>>> >
>>>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c
>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>> > index 430e00b16eec..52a015161431 100644
>>>>> > --- a/drivers/gpu/drm/drm_sysfs.c
>>>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>>>> drm_device *dev)
>>>>> > }
>>>>> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>> >
>>>>> > +/**
>>>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate
>>>>> GPU reset
>>>>> > + * @dev: DRM device
>>>>> > + * @pid: The process ID involve with the reset
>>>>> > + * @flags: Any other information about the GPU status
>>>>> > + *
>>>>> > + * Send a uevent for the DRM device specified by @dev. This
>>>>> indicates
>>>>> > + * user that a GPU reset has occurred, so that the interested
>>>>> client
>>>>> > + * can take any recovery or profiling measure, when required.
>>>>> > + */
>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
>>>>> pid, uint32_t flags)
>>>>> > +{
>>>>> > + unsigned char pid_str[21], flags_str[15];
>>>>> > + unsigned char reset_str[] = "RESET=1";
>>>>> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>> > +
>>>>> > + DRM_DEBUG("generating reset event\n");
>>>>> > +
>>>>> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u",
>>>>> flags);
>>>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>> > +}
>>>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>> > +
>>>>> > /**
>>>>> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent
>>>>> for any connector
>>>>> > * change
>>>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>> > index 6273cac44e47..63f00fe8054c 100644
>>>>> > --- a/include/drm/drm_sysfs.h
>>>>> > +++ b/include/drm/drm_sysfs.h
>>>>> > @@ -2,6 +2,8 @@
>>>>> > #ifndef _DRM_SYSFS_H_
>>>>> > #define _DRM_SYSFS_H_
>>>>> >
>>>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>> > +
>>>>> > struct drm_device;
>>>>> > struct device;
>>>>> > struct drm_connector;
>>>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device
>>>>> *dev);
>>>>> > void drm_class_device_unregister(struct device *dev);
>>>>> >
>>>>> > void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
>>>>> pid, uint32_t reset_flags);
>>>>> > void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>>> *connector);
>>>>> > void drm_sysfs_connector_status_event(struct drm_connector
>>>>> *connector,
>>>>> > struct drm_property
>>>>> *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 17:04 ` Somalapuram, Amaranath
@ 2022-03-08 17:17 ` Andrey Grodzovsky
0 siblings, 0 replies; 32+ messages in thread
From: Andrey Grodzovsky @ 2022-03-08 17:17 UTC (permalink / raw)
To: Somalapuram, Amaranath, Sharma, Shashank, Lazar, Lijo,
Shashank Sharma, amd-gfx
Cc: Deucher, Alexander, Somalapuram, Amaranath, Koenig, Christian,
Limonciello, Mario
On 2022-03-08 12:04, Somalapuram, Amaranath wrote:
>
> On 3/8/2022 10:27 PM, Sharma, Shashank wrote:
>>
>>
>> On 3/8/2022 5:55 PM, Andrey Grodzovsky wrote:
>>> You can read on their side here -
>>> https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-Linux-5.17
>>> and see their patch. THey don't have as clean
>>> interface as we do to retrieve the buffer and currently it's
>>> hard-coded for debugfs dump but it looks like pretty straight
>>> forward to expose their buffer to external
>>> client like amdgpu.
>>
> Customer requirement is to get reset notification for there daemon
> with other info (like PID process name vram status).
In general when a failure happens we want to have all debug info
possible to have better ability to root cause the problem. Since this is
an open forum I am not sure how
much i can disclose about the data in the buffer but i guarantee you it
is very useful for debugging GPU hang causes.
Andrey
>
> Regards,
> S.Amarnath
>> Noted, thanks for the pointer.
>> - Shashank
>>>
>>> Andrey
>>>
>>> On 2022-03-08 11:46, Sharma, Shashank wrote:
>>>> I have a very limited understanding of PMC driver and its
>>>> interfaces, so I would just go ahead and rely on Andrey's
>>>> judgement/recommendation on this :)
>>>>
>>>> - Shashank
>>>>
>>>> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
>>>>> As long as PMC driver provides clear interface to retrieve the
>>>>> info there should be no issue to call either amdgpu interface or
>>>>> PMC interface using IS_APU (or something alike in the code)
>>>>> We probably should add a wrapper function around this logic in
>>>>> amdgpu.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 2022-03-08 11:36, Lazar, Lijo wrote:
>>>>>>
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>>
>>>>>> +Mario
>>>>>>
>>>>>> I guess that means the functionality needs to be present in
>>>>>> amdgpu for APUs also. Presently, this is taken care by PMC driver
>>>>>> for APUs.
>>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf
>>>>>> of Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
>>>>>> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
>>>>>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>>>>>> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>;
>>>>>> Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; Koenig,
>>>>>> Christian <Christian.Koenig@amd.com>; Sharma, Shashank
>>>>>> <Shashank.Sharma@amd.com>
>>>>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>>>>>>
>>>>>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>>>>> > From: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> >
>>>>>> > This patch adds a new sysfs event, which will indicate
>>>>>> > the userland about a GPU reset, and can also provide
>>>>>> > some information like:
>>>>>> > - which PID was involved in the GPU reset
>>>>>> > - what was the GPU status (using flags)
>>>>>> >
>>>>>> > This patch also introduces the first flag of the flags
>>>>>> > bitmap, which can be appended as and when required.
>>>>>>
>>>>>>
>>>>>> I am reminding again about another important piece of info which
>>>>>> you can add
>>>>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>>>>>> specific but
>>>>>> from what I see there is no problem to just amend it as part of
>>>>>> envp[]
>>>>>> initialization.
>>>>>> bellow.
>>>>>>
>>>>>> The interface to get the buffer is smu_stb_collect_info and usage
>>>>>> can be
>>>>>> seen from
>>>>>> frebugfs interface in smu_stb_debugfs_open
>>>>>>
>>>>>> [1] -
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
>>>>>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0>
>>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>> >
>>>>>> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>>>> > Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> > ---
>>>>>> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>>> > include/drm/drm_sysfs.h | 3 +++
>>>>>> > 2 files changed, 27 insertions(+)
>>>>>> >
>>>>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c
>>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>>> > index 430e00b16eec..52a015161431 100644
>>>>>> > --- a/drivers/gpu/drm/drm_sysfs.c
>>>>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>>>>> drm_device *dev)
>>>>>> > }
>>>>>> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>>> >
>>>>>> > +/**
>>>>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate
>>>>>> GPU reset
>>>>>> > + * @dev: DRM device
>>>>>> > + * @pid: The process ID involve with the reset
>>>>>> > + * @flags: Any other information about the GPU status
>>>>>> > + *
>>>>>> > + * Send a uevent for the DRM device specified by @dev. This
>>>>>> indicates
>>>>>> > + * user that a GPU reset has occurred, so that the interested
>>>>>> client
>>>>>> > + * can take any recovery or profiling measure, when required.
>>>>>> > + */
>>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
>>>>>> pid, uint32_t flags)
>>>>>> > +{
>>>>>> > + unsigned char pid_str[21], flags_str[15];
>>>>>> > + unsigned char reset_str[] = "RESET=1";
>>>>>> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>>> > +
>>>>>> > + DRM_DEBUG("generating reset event\n");
>>>>>> > +
>>>>>> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>>> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u",
>>>>>> flags);
>>>>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
>>>>>> envp);
>>>>>> > +}
>>>>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>>> > +
>>>>>> > /**
>>>>>> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent
>>>>>> for any connector
>>>>>> > * change
>>>>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>>> > index 6273cac44e47..63f00fe8054c 100644
>>>>>> > --- a/include/drm/drm_sysfs.h
>>>>>> > +++ b/include/drm/drm_sysfs.h
>>>>>> > @@ -2,6 +2,8 @@
>>>>>> > #ifndef _DRM_SYSFS_H_
>>>>>> > #define _DRM_SYSFS_H_
>>>>>> >
>>>>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>>> > +
>>>>>> > struct drm_device;
>>>>>> > struct device;
>>>>>> > struct drm_connector;
>>>>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device
>>>>>> *dev);
>>>>>> > void drm_class_device_unregister(struct device *dev);
>>>>>> >
>>>>>> > void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
>>>>>> pid, uint32_t reset_flags);
>>>>>> > void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>>>> *connector);
>>>>>> > void drm_sysfs_connector_status_event(struct drm_connector
>>>>>> *connector,
>>>>>> > struct drm_property
>>>>>> *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-08 16:30 ` Sharma, Shashank
@ 2022-03-08 17:20 ` Somalapuram, Amaranath
2022-03-08 18:53 ` Andrey Grodzovsky
0 siblings, 1 reply; 32+ messages in thread
From: Somalapuram, Amaranath @ 2022-03-08 17:20 UTC (permalink / raw)
To: Sharma, Shashank, Andrey Grodzovsky, Shashank Sharma, amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram, Christian Koenig
[-- Attachment #1: Type: text/plain, Size: 5202 bytes --]
On 3/8/2022 10:00 PM, Sharma, Shashank wrote:
> Hello Andrey
>
> On 3/8/2022 5:26 PM, Andrey Grodzovsky wrote:
>>
>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>
>>> This patch adds a work function, which will get scheduled
>>> in event of a GPU reset, and will send a uevent to user with
>>> some reset context infomration, like a PID and some flags.
>>
>>
>> Where is the actual scheduling of the work function ? Shouldn't
>> there be a patch for that too ?
>>
>
> Yes, Amar is working on that patch, on top of these patches. They
> should be out soon. I thought it was a good idea to get quick feedback
> on the basic patches before we build something on top of it.
>
schedule_work() will be called in the function amdgpu_do_asic_reset ()
after getting vram_lost info:
vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
update amdgpu_reset_event_ctx and call schedule_work()
* vram_lost
* reset_context->job->vm->task_info.process_name
* reset_context->job->vm->task_info.pid
Regards,
S.Amarnath
> - Shashank
>
>> Andrey
>>
>>
>>>
>>> The userspace can do some recovery and post-processing work
>>> based on this event.
>>>
>>> V2:
>>> - Changed the name of the work to gpu_reset_event_work
>>> (Christian)
>>> - Added a structure to accommodate some additional information
>>> (like a PID and some flags)
>>>
>>> Cc: Alexander Deucher <alexander.deucher@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
>>> 2 files changed, 26 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index d8b854fcbffa..7df219fe363f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -813,6 +813,11 @@ struct amd_powerplay {
>>> #define AMDGPU_RESET_MAGIC_NUM 64
>>> #define AMDGPU_MAX_DF_PERFMONS 4
>>> #define AMDGPU_PRODUCT_NAME_LEN 64
>>> +struct amdgpu_reset_event_ctx {
>>> + uint64_t pid;
>>> + uint32_t flags;
>>> +};
>>> +
>>> struct amdgpu_device {
>>> struct device *dev;
>>> struct pci_dev *pdev;
>>> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>>> int asic_reset_res;
>>> struct work_struct xgmi_reset_work;
>>> + struct work_struct gpu_reset_event_work;
>>> struct list_head reset_list;
>>> long gfx_timeout;
>>> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
>>> pci_channel_state_t pci_channel_state;
>>> struct amdgpu_reset_control *reset_cntl;
>>> + struct amdgpu_reset_event_ctx reset_event_ctx;
>>> uint32_t ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>>> bool ram_is_direct_mapped;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index ed077de426d9..c43d099da06d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -73,6 +73,7 @@
>>> #include <linux/pm_runtime.h>
>>> #include <drm/drm_drv.h>
>>> +#include <drm/drm_sysfs.h>
>>> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>>> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>>> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct
>>> amdgpu_device *adev)
>>> return amdgpu_device_asic_has_dc_support(adev->asic_type);
>>> }
>>> +static void amdgpu_device_reset_event_func(struct work_struct *__work)
>>> +{
>>> + struct amdgpu_device *adev = container_of(__work, struct
>>> amdgpu_device,
>>> + gpu_reset_event_work);
>>> + struct amdgpu_reset_event_ctx *event_ctx = &adev->reset_event_ctx;
>>> +
>>> + /*
>>> + * A GPU reset has happened, indicate the userspace and pass the
>>> + * following information:
>>> + * - pid of the process involved,
>>> + * - if the VRAM is valid or not,
>>> + * - indicate that userspace may want to collect the ftrace
>>> event
>>> + * data from the trace event.
>>> + */
>>> + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid,
>>> event_ctx->flags);
>>> +}
>>> +
>>> static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
>>> {
>>> struct amdgpu_device *adev =
>>> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device
>>> *adev,
>>> amdgpu_device_delay_enable_gfx_off);
>>> INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
>>> + INIT_WORK(&adev->gpu_reset_event_work,
>>> amdgpu_device_reset_event_func);
>>> adev->gfx.gfx_off_req_count = 1;
>>> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
[-- Attachment #2: Type: text/html, Size: 9872 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:57 ` Sharma, Shashank
2022-03-08 17:04 ` Somalapuram, Amaranath
@ 2022-03-08 17:27 ` Limonciello, Mario
2022-03-08 18:08 ` Sharma, Shashank
1 sibling, 1 reply; 32+ messages in thread
From: Limonciello, Mario @ 2022-03-08 17:27 UTC (permalink / raw)
To: Sharma, Shashank, Andrey Grodzovsky, Lazar, Lijo,
Shashank Sharma, amd-gfx, Shyam-sundar S-k, Sanket Goswami
Cc: Deucher, Alexander, Somalapuram, Amaranath, Koenig, Christian
On 3/8/2022 10:57, Sharma, Shashank wrote:
>
>
> On 3/8/2022 5:55 PM, Andrey Grodzovsky wrote:
>> You can read on their side here -
>> https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-Linux-5.17
>> and see their patch. THey don't have as clean
>> interface as we do to retrieve the buffer and currently it's
>> hard-coded for debugfs dump but it looks like pretty straight forward
>> to expose their buffer to external
>> client like amdgpu.
>
> Noted, thanks for the pointer.
> - Shashank
Yeah I think this is a great idea for APU, but need to keep in mind
amd-pmc is only activated if APU is configured for s0i3. So in the
IS_APU check you will need to test for set for s0i3 and return an error
code if not set.
For APU it's currently fetched on demand from debugfs. If you can
"easily" export a symbol I say go for it and include a patch in your
series. If not my suggestion is to stub this out and let Shyam and
Sanket fill in the stub after they can sort out the exporting it to
another driver.
>>
>> Andrey
>>
>> On 2022-03-08 11:46, Sharma, Shashank wrote:
>>> I have a very limited understanding of PMC driver and its interfaces,
>>> so I would just go ahead and rely on Andrey's
>>> judgement/recommendation on this :)
>>>
>>> - Shashank
>>>
>>> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
>>>> As long as PMC driver provides clear interface to retrieve the info
>>>> there should be no issue to call either amdgpu interface or PMC
>>>> interface using IS_APU (or something alike in the code)
>>>> We probably should add a wrapper function around this logic in amdgpu.
>>>>
>>>> Andrey
>>>>
>>>> On 2022-03-08 11:36, Lazar, Lijo wrote:
>>>>>
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>> +Mario
>>>>>
>>>>> I guess that means the functionality needs to be present in amdgpu
>>>>> for APUs also. Presently, this is taken care by PMC driver for APUs.
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf
>>>>> of Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
>>>>> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
>>>>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>>>>> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram,
>>>>> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
>>>>> <Christian.Koenig@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
>>>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>>>>>
>>>>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>>>> > From: Shashank Sharma <shashank.sharma@amd.com>
>>>>> >
>>>>> > This patch adds a new sysfs event, which will indicate
>>>>> > the userland about a GPU reset, and can also provide
>>>>> > some information like:
>>>>> > - which PID was involved in the GPU reset
>>>>> > - what was the GPU status (using flags)
>>>>> >
>>>>> > This patch also introduces the first flag of the flags
>>>>> > bitmap, which can be appended as and when required.
>>>>>
>>>>>
>>>>> I am reminding again about another important piece of info which
>>>>> you can add
>>>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>>>>> specific but
>>>>> from what I see there is no problem to just amend it as part of envp[]
>>>>> initialization.
>>>>> bellow.
>>>>>
>>>>> The interface to get the buffer is smu_stb_collect_info and usage
>>>>> can be
>>>>> seen from
>>>>> frebugfs interface in smu_stb_debugfs_open
>>>>>
>>>>> [1] -
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
>>>>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0>
>>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>> >
>>>>> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>>> > Cc: Christian Koenig <christian.koenig@amd.com>
>>>>> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>> > ---
>>>>> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>> > include/drm/drm_sysfs.h | 3 +++
>>>>> > 2 files changed, 27 insertions(+)
>>>>> >
>>>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c
>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>> > index 430e00b16eec..52a015161431 100644
>>>>> > --- a/drivers/gpu/drm/drm_sysfs.c
>>>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>>>> drm_device *dev)
>>>>> > }
>>>>> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>> >
>>>>> > +/**
>>>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU
>>>>> reset
>>>>> > + * @dev: DRM device
>>>>> > + * @pid: The process ID involve with the reset
>>>>> > + * @flags: Any other information about the GPU status
>>>>> > + *
>>>>> > + * Send a uevent for the DRM device specified by @dev. This
>>>>> indicates
>>>>> > + * user that a GPU reset has occurred, so that the interested
>>>>> client
>>>>> > + * can take any recovery or profiling measure, when required.
>>>>> > + */
>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>> uint32_t flags)
>>>>> > +{
>>>>> > + unsigned char pid_str[21], flags_str[15];
>>>>> > + unsigned char reset_str[] = "RESET=1";
>>>>> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>> > +
>>>>> > + DRM_DEBUG("generating reset event\n");
>>>>> > +
>>>>> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>> > +}
>>>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>> > +
>>>>> > /**
>>>>> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent
>>>>> for any connector
>>>>> > * change
>>>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>> > index 6273cac44e47..63f00fe8054c 100644
>>>>> > --- a/include/drm/drm_sysfs.h
>>>>> > +++ b/include/drm/drm_sysfs.h
>>>>> > @@ -2,6 +2,8 @@
>>>>> > #ifndef _DRM_SYSFS_H_
>>>>> > #define _DRM_SYSFS_H_
>>>>> >
>>>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>> > +
>>>>> > struct drm_device;
>>>>> > struct device;
>>>>> > struct drm_connector;
>>>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>>> > void drm_class_device_unregister(struct device *dev);
>>>>> >
>>>>> > void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>> uint32_t reset_flags);
>>>>> > void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>>> *connector);
>>>>> > void drm_sysfs_connector_status_event(struct drm_connector
>>>>> *connector,
>>>>> > struct drm_property *property);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 17:27 ` Limonciello, Mario
@ 2022-03-08 18:08 ` Sharma, Shashank
2022-03-08 18:10 ` Limonciello, Mario
0 siblings, 1 reply; 32+ messages in thread
From: Sharma, Shashank @ 2022-03-08 18:08 UTC (permalink / raw)
To: Limonciello, Mario, Andrey Grodzovsky, Lazar, Lijo,
Shashank Sharma, amd-gfx, Shyam-sundar S-k, Sanket Goswami
Cc: Deucher, Alexander, Somalapuram, Amaranath, Koenig, Christian
Hey Mario,
On 3/8/2022 6:27 PM, Limonciello, Mario wrote:
> On 3/8/2022 10:57, Sharma, Shashank wrote:
>>
>>
>> On 3/8/2022 5:55 PM, Andrey Grodzovsky wrote:
>>> You can read on their side here -
>>> https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-Linux-5.17 and
>>> see their patch. THey don't have as clean
>>> interface as we do to retrieve the buffer and currently it's
>>> hard-coded for debugfs dump but it looks like pretty straight forward
>>> to expose their buffer to external
>>> client like amdgpu.
>>
>> Noted, thanks for the pointer.
>> - Shashank
>
> Yeah I think this is a great idea for APU, but need to keep in mind
> amd-pmc is only activated if APU is configured for s0i3. So in the
> IS_APU check you will need to test for set for s0i3 and return an error
> code if not set.
>
> For APU it's currently fetched on demand from debugfs. If you can
> "easily" export a symbol I say go for it and include a patch in your
> series.
Yep, if the info is available via debugfs, I don't see a reason why
can't we fetch that directly from its lower layers. May I know the name
of this debugfs entry ?
- Shashank
If not my suggestion is to stub this out and let Shyam and
> Sanket fill in the stub after they can sort out the exporting it to
> another driver.
>
>>>
>>> Andrey
>>>
>>> On 2022-03-08 11:46, Sharma, Shashank wrote:
>>>> I have a very limited understanding of PMC driver and its
>>>> interfaces, so I would just go ahead and rely on Andrey's
>>>> judgement/recommendation on this :)
>>>>
>>>> - Shashank
>>>>
>>>> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
>>>>> As long as PMC driver provides clear interface to retrieve the info
>>>>> there should be no issue to call either amdgpu interface or PMC
>>>>> interface using IS_APU (or something alike in the code)
>>>>> We probably should add a wrapper function around this logic in amdgpu.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 2022-03-08 11:36, Lazar, Lijo wrote:
>>>>>>
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>>
>>>>>> +Mario
>>>>>>
>>>>>> I guess that means the functionality needs to be present in amdgpu
>>>>>> for APUs also. Presently, this is taken care by PMC driver for APUs.
>>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf
>>>>>> of Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
>>>>>> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
>>>>>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
>>>>>> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram,
>>>>>> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
>>>>>> <Christian.Koenig@amd.com>; Sharma, Shashank
>>>>>> <Shashank.Sharma@amd.com>
>>>>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>>>>>>
>>>>>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>>>>> > From: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> >
>>>>>> > This patch adds a new sysfs event, which will indicate
>>>>>> > the userland about a GPU reset, and can also provide
>>>>>> > some information like:
>>>>>> > - which PID was involved in the GPU reset
>>>>>> > - what was the GPU status (using flags)
>>>>>> >
>>>>>> > This patch also introduces the first flag of the flags
>>>>>> > bitmap, which can be appended as and when required.
>>>>>>
>>>>>>
>>>>>> I am reminding again about another important piece of info which
>>>>>> you can add
>>>>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>>>>>> specific but
>>>>>> from what I see there is no problem to just amend it as part of
>>>>>> envp[]
>>>>>> initialization.
>>>>>> bellow.
>>>>>>
>>>>>> The interface to get the buffer is smu_stb_collect_info and usage
>>>>>> can be
>>>>>> seen from
>>>>>> frebugfs interface in smu_stb_debugfs_open
>>>>>>
>>>>>> [1] -
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0
>>>>>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0>
>>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>> >
>>>>>> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>>>> > Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> > ---
>>>>>> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>>> > include/drm/drm_sysfs.h | 3 +++
>>>>>> > 2 files changed, 27 insertions(+)
>>>>>> >
>>>>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c
>>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>>> > index 430e00b16eec..52a015161431 100644
>>>>>> > --- a/drivers/gpu/drm/drm_sysfs.c
>>>>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>>>>> drm_device *dev)
>>>>>> > }
>>>>>> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>>> >
>>>>>> > +/**
>>>>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate
>>>>>> GPU reset
>>>>>> > + * @dev: DRM device
>>>>>> > + * @pid: The process ID involve with the reset
>>>>>> > + * @flags: Any other information about the GPU status
>>>>>> > + *
>>>>>> > + * Send a uevent for the DRM device specified by @dev. This
>>>>>> indicates
>>>>>> > + * user that a GPU reset has occurred, so that the interested
>>>>>> client
>>>>>> > + * can take any recovery or profiling measure, when required.
>>>>>> > + */
>>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
>>>>>> pid, uint32_t flags)
>>>>>> > +{
>>>>>> > + unsigned char pid_str[21], flags_str[15];
>>>>>> > + unsigned char reset_str[] = "RESET=1";
>>>>>> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>>> > +
>>>>>> > + DRM_DEBUG("generating reset event\n");
>>>>>> > +
>>>>>> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>>> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u",
>>>>>> flags);
>>>>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>>> > +}
>>>>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>>> > +
>>>>>> > /**
>>>>>> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent
>>>>>> for any connector
>>>>>> > * change
>>>>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>>> > index 6273cac44e47..63f00fe8054c 100644
>>>>>> > --- a/include/drm/drm_sysfs.h
>>>>>> > +++ b/include/drm/drm_sysfs.h
>>>>>> > @@ -2,6 +2,8 @@
>>>>>> > #ifndef _DRM_SYSFS_H_
>>>>>> > #define _DRM_SYSFS_H_
>>>>>> >
>>>>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>>> > +
>>>>>> > struct drm_device;
>>>>>> > struct device;
>>>>>> > struct drm_connector;
>>>>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device
>>>>>> *dev);
>>>>>> > void drm_class_device_unregister(struct device *dev);
>>>>>> >
>>>>>> > void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
>>>>>> pid, uint32_t reset_flags);
>>>>>> > void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>>>> *connector);
>>>>>> > void drm_sysfs_connector_status_event(struct drm_connector
>>>>>> *connector,
>>>>>> > struct drm_property
>>>>>> *property);
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 18:08 ` Sharma, Shashank
@ 2022-03-08 18:10 ` Limonciello, Mario
0 siblings, 0 replies; 32+ messages in thread
From: Limonciello, Mario @ 2022-03-08 18:10 UTC (permalink / raw)
To: Sharma, Shashank, Grodzovsky, Andrey, Lazar, Lijo,
Shashank Sharma, amd-gfx, S-k, Shyam-sundar, Goswami, Sanket
Cc: Deucher, Alexander, Somalapuram, Amaranath, Koenig, Christian
[AMD Official Use Only]
> -----Original Message-----
> From: Sharma, Shashank <Shashank.Sharma@amd.com>
> Sent: Tuesday, March 8, 2022 12:09
> To: Limonciello, Mario <Mario.Limonciello@amd.com>; Grodzovsky, Andrey
> <Andrey.Grodzovsky@amd.com>; Lazar, Lijo <Lijo.Lazar@amd.com>; Shashank
> Sharma <contactshashanksharma@gmail.com>; amd-
> gfx@lists.freedesktop.org; S-k, Shyam-sundar <Shyam-sundar.S-k@amd.com>;
> Goswami, Sanket <Sanket.Goswami@amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Somalapuram,
> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>
> Subject: Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>
> Hey Mario,
>
> On 3/8/2022 6:27 PM, Limonciello, Mario wrote:
> > On 3/8/2022 10:57, Sharma, Shashank wrote:
> >>
> >>
> >> On 3/8/2022 5:55 PM, Andrey Grodzovsky wrote:
> >>> You can read on their side here -
> >>> https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-
> Linux-5.17 and
> >>> see their patch. THey don't have as clean
> >>> interface as we do to retrieve the buffer and currently it's
> >>> hard-coded for debugfs dump but it looks like pretty straight forward
> >>> to expose their buffer to external
> >>> client like amdgpu.
>
> >>
> >> Noted, thanks for the pointer.
> >> - Shashank
> >
> > Yeah I think this is a great idea for APU, but need to keep in mind
> > amd-pmc is only activated if APU is configured for s0i3. So in the
> > IS_APU check you will need to test for set for s0i3 and return an error
> > code if not set.
> >
> > For APU it's currently fetched on demand from debugfs. If you can
> > "easily" export a symbol I say go for it and include a patch in your
> > series.
>
> Yep, if the info is available via debugfs, I don't see a reason why
> can't we fetch that directly from its lower layers. May I know the name
> of this debugfs entry ?
It's called "stb_read" from debugfs.
>
> - Shashank
>
> If not my suggestion is to stub this out and let Shyam and
> > Sanket fill in the stub after they can sort out the exporting it to
> > another driver.
> >
> >>>
> >>> Andrey
> >>>
> >>> On 2022-03-08 11:46, Sharma, Shashank wrote:
> >>>> I have a very limited understanding of PMC driver and its
> >>>> interfaces, so I would just go ahead and rely on Andrey's
> >>>> judgement/recommendation on this :)
> >>>>
> >>>> - Shashank
> >>>>
> >>>> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
> >>>>> As long as PMC driver provides clear interface to retrieve the info
> >>>>> there should be no issue to call either amdgpu interface or PMC
> >>>>> interface using IS_APU (or something alike in the code)
> >>>>> We probably should add a wrapper function around this logic in amdgpu.
> >>>>>
> >>>>> Andrey
> >>>>>
> >>>>> On 2022-03-08 11:36, Lazar, Lijo wrote:
> >>>>>>
> >>>>>> [AMD Official Use Only]
> >>>>>>
> >>>>>>
> >>>>>> +Mario
> >>>>>>
> >>>>>> I guess that means the functionality needs to be present in amdgpu
> >>>>>> for APUs also. Presently, this is taken care by PMC driver for APUs.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Lijo
> >>>>>> ------------------------------------------------------------------------
> >>>>>>
> >>>>>> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf
> >>>>>> of Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>>>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
> >>>>>> *To:* Shashank Sharma <contactshashanksharma@gmail.com>;
> >>>>>> amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> >>>>>> *Cc:* Deucher, Alexander <Alexander.Deucher@amd.com>;
> Somalapuram,
> >>>>>> Amaranath <Amaranath.Somalapuram@amd.com>; Koenig, Christian
> >>>>>> <Christian.Koenig@amd.com>; Sharma, Shashank
> >>>>>> <Shashank.Sharma@amd.com>
> >>>>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
> >>>>>>
> >>>>>> On 2022-03-07 11:26, Shashank Sharma wrote:
> >>>>>> > From: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>> >
> >>>>>> > This patch adds a new sysfs event, which will indicate
> >>>>>> > the userland about a GPU reset, and can also provide
> >>>>>> > some information like:
> >>>>>> > - which PID was involved in the GPU reset
> >>>>>> > - what was the GPU status (using flags)
> >>>>>> >
> >>>>>> > This patch also introduces the first flag of the flags
> >>>>>> > bitmap, which can be appended as and when required.
> >>>>>>
> >>>>>>
> >>>>>> I am reminding again about another important piece of info which
> >>>>>> you can add
> >>>>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
> >>>>>> specific but
> >>>>>> from what I see there is no problem to just amend it as part of
> >>>>>> envp[]
> >>>>>> initialization.
> >>>>>> bellow.
> >>>>>>
> >>>>>> The interface to get the buffer is smu_stb_collect_info and usage
> >>>>>> can be
> >>>>>> seen from
> >>>>>> frebugfs interface in smu_stb_debugfs_open
> >>>>>>
> >>>>>> [1] -
> >>>>>>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.sp
> inics.net%2Flists%2Famd-
> gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc
> 3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7
> C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC
> 4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&a
> mp;sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&re
> served=0
> >>>>>>
> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.s
> pinics.net%2Flists%2Famd-
> gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc
> 3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7
> C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC
> 4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&a
> mp;sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&re
> served=0>
> >>>>>>
> >>>>>>
> >>>>>> Andrey
> >>>>>>
> >>>>>>
> >>>>>> >
> >>>>>> > Cc: Alexandar Deucher <alexander.deucher@amd.com>
> >>>>>> > Cc: Christian Koenig <christian.koenig@amd.com>
> >>>>>> > Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>> > ---
> >>>>>> > drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
> >>>>>> > include/drm/drm_sysfs.h | 3 +++
> >>>>>> > 2 files changed, 27 insertions(+)
> >>>>>> >
> >>>>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c
> >>>>>> b/drivers/gpu/drm/drm_sysfs.c
> >>>>>> > index 430e00b16eec..52a015161431 100644
> >>>>>> > --- a/drivers/gpu/drm/drm_sysfs.c
> >>>>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
> >>>>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
> >>>>>> drm_device *dev)
> >>>>>> > }
> >>>>>> > EXPORT_SYMBOL(drm_sysfs_hotplug_event);
> >>>>>> >
> >>>>>> > +/**
> >>>>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate
> >>>>>> GPU reset
> >>>>>> > + * @dev: DRM device
> >>>>>> > + * @pid: The process ID involve with the reset
> >>>>>> > + * @flags: Any other information about the GPU status
> >>>>>> > + *
> >>>>>> > + * Send a uevent for the DRM device specified by @dev. This
> >>>>>> indicates
> >>>>>> > + * user that a GPU reset has occurred, so that the interested
> >>>>>> client
> >>>>>> > + * can take any recovery or profiling measure, when required.
> >>>>>> > + */
> >>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
> >>>>>> pid, uint32_t flags)
> >>>>>> > +{
> >>>>>> > + unsigned char pid_str[21], flags_str[15];
> >>>>>> > + unsigned char reset_str[] = "RESET=1";
> >>>>>> > + char *envp[] = { reset_str, pid_str, flags_str, NULL };
> >>>>>> > +
> >>>>>> > + DRM_DEBUG("generating reset event\n");
> >>>>>> > +
> >>>>>> > + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
> >>>>>> > + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u",
> >>>>>> flags);
> >>>>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
> envp);
> >>>>>> > +}
> >>>>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
> >>>>>> > +
> >>>>>> > /**
> >>>>>> > * drm_sysfs_connector_hotplug_event - generate a DRM uevent
> >>>>>> for any connector
> >>>>>> > * change
> >>>>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> >>>>>> > index 6273cac44e47..63f00fe8054c 100644
> >>>>>> > --- a/include/drm/drm_sysfs.h
> >>>>>> > +++ b/include/drm/drm_sysfs.h
> >>>>>> > @@ -2,6 +2,8 @@
> >>>>>> > #ifndef _DRM_SYSFS_H_
> >>>>>> > #define _DRM_SYSFS_H_
> >>>>>> >
> >>>>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
> >>>>>> > +
> >>>>>> > struct drm_device;
> >>>>>> > struct device;
> >>>>>> > struct drm_connector;
> >>>>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device
> >>>>>> *dev);
> >>>>>> > void drm_class_device_unregister(struct device *dev);
> >>>>>> >
> >>>>>> > void drm_sysfs_hotplug_event(struct drm_device *dev);
> >>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
> >>>>>> pid, uint32_t reset_flags);
> >>>>>> > void drm_sysfs_connector_hotplug_event(struct drm_connector
> >>>>>> *connector);
> >>>>>> > void drm_sysfs_connector_status_event(struct drm_connector
> >>>>>> *connector,
> >>>>>> > struct drm_property
> >>>>>> *property);
> >
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add work function for GPU reset event
2022-03-08 17:20 ` Somalapuram, Amaranath
@ 2022-03-08 18:53 ` Andrey Grodzovsky
0 siblings, 0 replies; 32+ messages in thread
From: Andrey Grodzovsky @ 2022-03-08 18:53 UTC (permalink / raw)
To: Somalapuram, Amaranath, Sharma, Shashank, Shashank Sharma, amd-gfx
Cc: Alexander Deucher, amaranath.somalapuram, Christian Koenig
[-- Attachment #1: Type: text/plain, Size: 5672 bytes --]
On 2022-03-08 12:20, Somalapuram, Amaranath wrote:
>
>
> On 3/8/2022 10:00 PM, Sharma, Shashank wrote:
>> Hello Andrey
>>
>> On 3/8/2022 5:26 PM, Andrey Grodzovsky wrote:
>>>
>>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>>
>>>> This patch adds a work function, which will get scheduled
>>>> in event of a GPU reset, and will send a uevent to user with
>>>> some reset context infomration, like a PID and some flags.
>>>
>>>
>>> Where is the actual scheduling of the work function ? Shouldn't
>>> there be a patch for that too ?
>>>
>>
>> Yes, Amar is working on that patch, on top of these patches. They
>> should be out soon. I thought it was a good idea to get quick
>> feedback on the basic patches before we build something on top of it.
>>
> schedule_work() will be called in the function amdgpu_do_asic_reset ()
>
I didn't follow closely on the requirements and so I don't know but,
what about
job timeout that was able to soft recover - do you need to cover this
too ? Or
in this case no need to restart user application and you hence don't care ?
Andrey
> after getting vram_lost info:
>
> vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
>
> update amdgpu_reset_event_ctx and call schedule_work()
>
> * vram_lost
> * reset_context->job->vm->task_info.process_name
> * reset_context->job->vm->task_info.pid
>
> Regards,
> S.Amarnath
>> - Shashank
>>
>>> Andrey
>>>
>>>
>>>>
>>>> The userspace can do some recovery and post-processing work
>>>> based on this event.
>>>>
>>>> V2:
>>>> - Changed the name of the work to gpu_reset_event_work
>>>> (Christian)
>>>> - Added a structure to accommodate some additional information
>>>> (like a PID and some flags)
>>>>
>>>> Cc: Alexander Deucher <alexander.deucher@amd.com>
>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++++++
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
>>>> 2 files changed, 26 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index d8b854fcbffa..7df219fe363f 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -813,6 +813,11 @@ struct amd_powerplay {
>>>> #define AMDGPU_RESET_MAGIC_NUM 64
>>>> #define AMDGPU_MAX_DF_PERFMONS 4
>>>> #define AMDGPU_PRODUCT_NAME_LEN 64
>>>> +struct amdgpu_reset_event_ctx {
>>>> + uint64_t pid;
>>>> + uint32_t flags;
>>>> +};
>>>> +
>>>> struct amdgpu_device {
>>>> struct device *dev;
>>>> struct pci_dev *pdev;
>>>> @@ -1063,6 +1068,7 @@ struct amdgpu_device {
>>>> int asic_reset_res;
>>>> struct work_struct xgmi_reset_work;
>>>> + struct work_struct gpu_reset_event_work;
>>>> struct list_head reset_list;
>>>> long gfx_timeout;
>>>> @@ -1097,6 +1103,7 @@ struct amdgpu_device {
>>>> pci_channel_state_t pci_channel_state;
>>>> struct amdgpu_reset_control *reset_cntl;
>>>> + struct amdgpu_reset_event_ctx reset_event_ctx;
>>>> uint32_t ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
>>>> bool ram_is_direct_mapped;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index ed077de426d9..c43d099da06d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -73,6 +73,7 @@
>>>> #include <linux/pm_runtime.h>
>>>> #include <drm/drm_drv.h>
>>>> +#include <drm/drm_sysfs.h>
>>>> MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>>>> MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>>>> @@ -3277,6 +3278,23 @@ bool amdgpu_device_has_dc_support(struct
>>>> amdgpu_device *adev)
>>>> return amdgpu_device_asic_has_dc_support(adev->asic_type);
>>>> }
>>>> +static void amdgpu_device_reset_event_func(struct work_struct
>>>> *__work)
>>>> +{
>>>> + struct amdgpu_device *adev = container_of(__work, struct
>>>> amdgpu_device,
>>>> + gpu_reset_event_work);
>>>> + struct amdgpu_reset_event_ctx *event_ctx =
>>>> &adev->reset_event_ctx;
>>>> +
>>>> + /*
>>>> + * A GPU reset has happened, indicate the userspace and pass the
>>>> + * following information:
>>>> + * - pid of the process involved,
>>>> + * - if the VRAM is valid or not,
>>>> + * - indicate that userspace may want to collect the ftrace
>>>> event
>>>> + * data from the trace event.
>>>> + */
>>>> + drm_sysfs_reset_event(&adev->ddev, event_ctx->pid,
>>>> event_ctx->flags);
>>>> +}
>>>> +
>>>> static void amdgpu_device_xgmi_reset_func(struct work_struct
>>>> *__work)
>>>> {
>>>> struct amdgpu_device *adev =
>>>> @@ -3525,6 +3543,7 @@ int amdgpu_device_init(struct amdgpu_device
>>>> *adev,
>>>> amdgpu_device_delay_enable_gfx_off);
>>>> INIT_WORK(&adev->xgmi_reset_work,
>>>> amdgpu_device_xgmi_reset_func);
>>>> + INIT_WORK(&adev->gpu_reset_event_work,
>>>> amdgpu_device_reset_event_func);
>>>> adev->gfx.gfx_off_req_count = 1;
>>>> adev->pm.ac_power = power_supply_is_system_supplied() > 0;
[-- Attachment #2: Type: text/html, Size: 10138 bytes --]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 15:37 ` Somalapuram, Amaranath
@ 2022-03-09 8:02 ` Christian König
0 siblings, 0 replies; 32+ messages in thread
From: Christian König @ 2022-03-09 8:02 UTC (permalink / raw)
To: Somalapuram, Amaranath, Sharma, Shashank, Christian König,
Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram
Am 08.03.22 um 16:37 schrieb Somalapuram, Amaranath:
>
> On 3/8/2022 5:26 PM, Sharma, Shashank wrote:
>>
>>
>> On 3/8/2022 11:32 AM, Christian König wrote:
>>> Am 08.03.22 um 10:31 schrieb Sharma, Shashank:
>>>>
>>>>
>>>> On 3/8/2022 8:06 AM, Christian König wrote:
>>>>> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>>>>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>
>>>>>> This patch adds a new sysfs event, which will indicate
>>>>>> the userland about a GPU reset, and can also provide
>>>>>> some information like:
>>>>>> - which PID was involved in the GPU reset
>>>>>> - what was the GPU status (using flags)
>>>>>>
>>>>>> This patch also introduces the first flag of the flags
>>>>>> bitmap, which can be appended as and when required.
>>>>>
>>>>> Make sure to CC the dri-devel mailing list when reviewing this.
>>>> Got it,
>>>>
>>>> I was also curious if we want to move the reset_ctx structure
>>>> itself to DRM layer, like
>>>> drm_reset_event_ctx {
>>>> u32 pid;
>>>> u32 flags;
>>>> char process_name[64];
>>>> };
>>>
>>> I was entertaining that thought as well.
>>>
>>> But if we do this I would go even a step further and also move the
>>> reset work item into the DRM layer as well.
>>>
>>> You might also look like into migrating the exiting i915 code which
>>> uses udev to signal GPU resets to this function as well.
>>>
> Hi Christian,
>
> Can we access adev in common drm (even if we can access adev it will
> not be common code)
No, that won't work.
>
> move work function to drm need to be protected(i.e reset_domain->sem),
> adding something like reset_sem to drm_device?
Hui what? Why do you think the work function needs any protection? That
doesn't make much sense.
Regards,
Christian.
>
> Regards,
>
> S.Amarnath
>
>>> Regards,
>>> Christian.
>>
>> That seems like a good idea, let me quickly dive into i915 and check
>> this out.
>>
>> Shashank
>>>
>>>>
>>>> and then:
>>>> void drm_sysfs_reset_event(struct drm_device *dev,
>>>> drm_reset_event_ctx *ctx);
>>>>
>>>>>
>>>>>>
>>>>>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>>> include/drm/drm_sysfs.h | 3 +++
>>>>>> 2 files changed, 27 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_sysfs.c
>>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>>> index 430e00b16eec..52a015161431 100644
>>>>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>>>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>>>>> drm_device *dev)
>>>>>> }
>>>>>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>>> +/**
>>>>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU
>>>>>> reset
>>>>>> + * @dev: DRM device
>>>>>> + * @pid: The process ID involve with the reset
>>>>>> + * @flags: Any other information about the GPU status
>>>>>> + *
>>>>>> + * Send a uevent for the DRM device specified by @dev. This
>>>>>> indicates
>>>>>> + * user that a GPU reset has occurred, so that the interested
>>>>>> client
>>>>>> + * can take any recovery or profiling measure, when required.
>>>>>> + */
>>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>>> uint32_t flags)
>>>>>
>>>>> The PID is usually only 32bit, but even better would be to use pid_t.
>>>>>
>>>>>> +{
>>>>>> + unsigned char pid_str[21], flags_str[15];
>>>>>> + unsigned char reset_str[] = "RESET=1";
>>>>>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>>> +
>>>>>> + DRM_DEBUG("generating reset event\n");
>>>>>> +
>>>>>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>>> +
>>>>>> /**
>>>>>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent
>>>>>> for any connector
>>>>>> * change
>>>>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>>> index 6273cac44e47..63f00fe8054c 100644
>>>>>> --- a/include/drm/drm_sysfs.h
>>>>>> +++ b/include/drm/drm_sysfs.h
>>>>>> @@ -2,6 +2,8 @@
>>>>>> #ifndef _DRM_SYSFS_H_
>>>>>> #define _DRM_SYSFS_H_
>>>>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>>
>>>>> Probably better to define that the other way around, e.g.
>>>>> DRM_GPU_RESET_FLAG_VRAM_LOST.
>>>>>
>>>>> Apart from that looks good to me.
>>>>>
>>>> Got it, noted.
>>>> - Shashank
>>>>
>>>>> Christian.
>>>>>
>>>>>> +
>>>>>> struct drm_device;
>>>>>> struct device;
>>>>>> struct drm_connector;
>>>>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>>>> void drm_class_device_unregister(struct device *dev);
>>>>>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>>> uint32_t reset_flags);
>>>>>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>>>> *connector);
>>>>>> void drm_sysfs_connector_status_event(struct drm_connector
>>>>>> *connector,
>>>>>> struct drm_property *property);
>>>>>
>>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
2022-03-08 16:40 ` Sharma, Shashank
@ 2022-03-09 8:05 ` Christian König
0 siblings, 0 replies; 32+ messages in thread
From: Christian König @ 2022-03-09 8:05 UTC (permalink / raw)
To: Sharma, Shashank, Christian König, Shashank Sharma, amd-gfx
Cc: Alexandar Deucher, amaranath.somalapuram
Am 08.03.22 um 17:40 schrieb Sharma, Shashank:
>
>
> On 3/8/2022 12:56 PM, Sharma, Shashank wrote:
>>
>>
>> On 3/8/2022 11:32 AM, Christian König wrote:
>>> Am 08.03.22 um 10:31 schrieb Sharma, Shashank:
>>>>
>>>>
>>>> On 3/8/2022 8:06 AM, Christian König wrote:
>>>>> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>>>>>> From: Shashank Sharma <shashank.sharma@amd.com>
>>>>>>
>>>>>> This patch adds a new sysfs event, which will indicate
>>>>>> the userland about a GPU reset, and can also provide
>>>>>> some information like:
>>>>>> - which PID was involved in the GPU reset
>>>>>> - what was the GPU status (using flags)
>>>>>>
>>>>>> This patch also introduces the first flag of the flags
>>>>>> bitmap, which can be appended as and when required.
>>>>>
>>>>> Make sure to CC the dri-devel mailing list when reviewing this.
>>>> Got it,
>>>>
>>>> I was also curious if we want to move the reset_ctx structure
>>>> itself to DRM layer, like
>>>> drm_reset_event_ctx {
>>>> u32 pid;
>>>> u32 flags;
>>>> char process_name[64];
>>>> };
>>>
>>> I was entertaining that thought as well.
>>>
>>> But if we do this I would go even a step further and also move the
>>> reset work item into the DRM layer as well.
>>>
>>> You might also look like into migrating the exiting i915 code which
>>> uses udev to signal GPU resets to this function as well.
>>>
>>> Regards,
>>> Christian.
>>
>> That seems like a good idea, let me quickly dive into i915 and check
>> this out.
>>
>> Shashank
>
> I had a quick look at I915, and it looks like both I915 and AMDGPU
> drivers have very different methods of passing the data to the work
> function, via different internal structures. Which means it would be
> much additional work in both the drivers to move the work function
> itself in the DRM layer.
>
> To me, now it seems like it would be better if we can just provide
> this interface to send the uevent and its structure, and the drivers
> can collect their information and pass it to WQ in their own way.
>
> How do you feel about it ?
That does not sounds like a good approach to me. If we add common drm
functionality then we need to take the existing drivers into account.
What driver specific information does i915 pass to the work function?
Christian.
>
> - Shashank
>
>>>
>>>>
>>>> and then:
>>>> void drm_sysfs_reset_event(struct drm_device *dev,
>>>> drm_reset_event_ctx *ctx);
>>>>
>>>>>
>>>>>>
>>>>>> Cc: Alexandar Deucher <alexander.deucher@amd.com>
>>>>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>>> include/drm/drm_sysfs.h | 3 +++
>>>>>> 2 files changed, 27 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_sysfs.c
>>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>>> index 430e00b16eec..52a015161431 100644
>>>>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>>>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
>>>>>> drm_device *dev)
>>>>>> }
>>>>>> EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>>> +/**
>>>>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU
>>>>>> reset
>>>>>> + * @dev: DRM device
>>>>>> + * @pid: The process ID involve with the reset
>>>>>> + * @flags: Any other information about the GPU status
>>>>>> + *
>>>>>> + * Send a uevent for the DRM device specified by @dev. This
>>>>>> indicates
>>>>>> + * user that a GPU reset has occurred, so that the interested
>>>>>> client
>>>>>> + * can take any recovery or profiling measure, when required.
>>>>>> + */
>>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>>> uint32_t flags)
>>>>>
>>>>> The PID is usually only 32bit, but even better would be to use pid_t.
>>>>>
>>>>>> +{
>>>>>> + unsigned char pid_str[21], flags_str[15];
>>>>>> + unsigned char reset_str[] = "RESET=1";
>>>>>> + char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>>> +
>>>>>> + DRM_DEBUG("generating reset event\n");
>>>>>> +
>>>>>> + snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>>> + snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>>> +
>>>>>> /**
>>>>>> * drm_sysfs_connector_hotplug_event - generate a DRM uevent
>>>>>> for any connector
>>>>>> * change
>>>>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>>> index 6273cac44e47..63f00fe8054c 100644
>>>>>> --- a/include/drm/drm_sysfs.h
>>>>>> +++ b/include/drm/drm_sysfs.h
>>>>>> @@ -2,6 +2,8 @@
>>>>>> #ifndef _DRM_SYSFS_H_
>>>>>> #define _DRM_SYSFS_H_
>>>>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>>
>>>>> Probably better to define that the other way around, e.g.
>>>>> DRM_GPU_RESET_FLAG_VRAM_LOST.
>>>>>
>>>>> Apart from that looks good to me.
>>>>>
>>>> Got it, noted.
>>>> - Shashank
>>>>
>>>>> Christian.
>>>>>
>>>>>> +
>>>>>> struct drm_device;
>>>>>> struct device;
>>>>>> struct drm_connector;
>>>>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>>>> void drm_class_device_unregister(struct device *dev);
>>>>>> void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid,
>>>>>> uint32_t reset_flags);
>>>>>> void drm_sysfs_connector_hotplug_event(struct drm_connector
>>>>>> *connector);
>>>>>> void drm_sysfs_connector_status_event(struct drm_connector
>>>>>> *connector,
>>>>>> struct drm_property *property);
>>>>>
>>>
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2022-03-09 8:05 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-07 16:26 [PATCH 1/2] drm: Add GPU reset sysfs event Shashank Sharma
2022-03-07 16:26 ` [PATCH 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
2022-03-07 16:46 ` Somalapuram, Amaranath
2022-03-07 17:05 ` Sharma, Shashank
2022-03-08 7:15 ` Christian König
2022-03-08 9:35 ` Sharma, Shashank
2022-03-08 16:26 ` Andrey Grodzovsky
2022-03-08 16:30 ` Sharma, Shashank
2022-03-08 17:20 ` Somalapuram, Amaranath
2022-03-08 18:53 ` Andrey Grodzovsky
2022-03-08 6:32 ` [PATCH 1/2] drm: Add GPU reset sysfs event Somalapuram, Amaranath
2022-03-08 7:06 ` Christian König
2022-03-08 9:31 ` Sharma, Shashank
2022-03-08 10:32 ` Christian König
2022-03-08 11:56 ` Sharma, Shashank
2022-03-08 15:37 ` Somalapuram, Amaranath
2022-03-09 8:02 ` Christian König
2022-03-08 16:40 ` Sharma, Shashank
2022-03-09 8:05 ` Christian König
2022-03-08 16:25 ` Andrey Grodzovsky
2022-03-08 16:35 ` Sharma, Shashank
2022-03-08 16:36 ` Andrey Grodzovsky
2022-03-08 16:36 ` Lazar, Lijo
2022-03-08 16:39 ` Andrey Grodzovsky
2022-03-08 16:46 ` Sharma, Shashank
2022-03-08 16:55 ` Andrey Grodzovsky
2022-03-08 16:57 ` Sharma, Shashank
2022-03-08 17:04 ` Somalapuram, Amaranath
2022-03-08 17:17 ` Andrey Grodzovsky
2022-03-08 17:27 ` Limonciello, Mario
2022-03-08 18:08 ` Sharma, Shashank
2022-03-08 18:10 ` Limonciello, Mario
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.