* [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
@ 2022-01-17 6:33 Somalapuram Amaranath
2022-01-17 7:27 ` Christian König
2022-01-17 11:54 ` Lazar, Lijo
0 siblings, 2 replies; 13+ messages in thread
From: Somalapuram Amaranath @ 2022-01-17 6:33 UTC (permalink / raw)
To: amd-gfx
Cc: alexander.deucher, Somalapuram Amaranath, christian.koenig,
shashank.sharma
AMDGPURESET uevent added to notify userspace, collect dump_stack and trace
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
---
drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 2ec1ffb36b1f..b73147ae41fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
}
}
+/**
+ * drm_sysfs_reset_event - generate a DRM uevent
+ * @dev: DRM device
+ *
+ * Send a uevent for the DRM device specified by @dev. Currently we only
+ * set AMDGPURESET=1 in the uevent environment, but this could be expanded to
+ * deal with other types of events.
+ *
+ * Any new uapi should be using the drm_sysfs_connector_status_event()
+ * for uevents on connector status change.
+ */
+void drm_sysfs_reset_event(struct drm_device *dev)
+{
+ char *event_string = "AMDGPURESET=1";
+ char *envp[2] = { event_string, NULL };
+
+ kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+
+void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+ struct drm_device *ddev = adev_to_drm(adev);
+ int r = 0, i;
+
+ /* original raven doesn't have full asic reset */
+ if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+ !(adev->apu_flags & AMD_APU_IS_RAVEN2))
+ return;
+ for (i = 0; i < adev->num_ip_blocks; i++) {
+ if (!adev->ip_blocks[i].status.valid)
+ continue;
+ if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+ continue;
+ r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+ if (r)
+ DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+ adev->ip_blocks[i].version->funcs->name, r);
+ }
+
+ drm_sysfs_reset_event(ddev);
+ dump_stack();
+}
+
static int nv_asic_reset(struct amdgpu_device *adev)
{
int ret = 0;
+ amdgpu_reset_dumps(adev);
switch (nv_asic_reset_method(adev)) {
case AMD_RESET_METHOD_PCI:
dev_info(adev->dev, "PCI reset\n");
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 6:33 [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset Somalapuram Amaranath
@ 2022-01-17 7:27 ` Christian König
2022-01-17 10:01 ` Somalapuram, Amaranath
2022-01-17 11:54 ` Lazar, Lijo
1 sibling, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 7:27 UTC (permalink / raw)
To: Somalapuram Amaranath, amd-gfx; +Cc: alexander.deucher, shashank.sharma
Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
> AMDGPURESET uevent added to notify userspace, collect dump_stack and trace
>
> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 2ec1ffb36b1f..b73147ae41fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
> }
> }
>
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent
> + * @dev: DRM device
> + *
> + * Send a uevent for the DRM device specified by @dev. Currently we only
> + * set AMDGPURESET=1 in the uevent environment, but this could be expanded to
> + * deal with other types of events.
> + *
> + * Any new uapi should be using the drm_sysfs_connector_status_event()
> + * for uevents on connector status change.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev)
This should probably go directly into the DRM subsystem.
> +{
> + char *event_string = "AMDGPURESET=1";
> + char *envp[2] = { event_string, NULL };
> +
> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
As I said this approach is a clear NAK. We can't allocate any memory
when we do a reset.
Regards,
Christian.
> +}
> +
> +void amdgpu_reset_dumps(struct amdgpu_device *adev)
> +{
> + struct drm_device *ddev = adev_to_drm(adev);
> + int r = 0, i;
> +
> + /* original raven doesn't have full asic reset */
> + if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
> + !(adev->apu_flags & AMD_APU_IS_RAVEN2))
> + return;
> + for (i = 0; i < adev->num_ip_blocks; i++) {
> + if (!adev->ip_blocks[i].status.valid)
> + continue;
> + if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
> + continue;
> + r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
> +
> + if (r)
> + DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
> + adev->ip_blocks[i].version->funcs->name, r);
> + }
> +
> + drm_sysfs_reset_event(ddev);
> + dump_stack();
> +}
> +
> static int nv_asic_reset(struct amdgpu_device *adev)
> {
> int ret = 0;
>
> + amdgpu_reset_dumps(adev);
> switch (nv_asic_reset_method(adev)) {
> case AMD_RESET_METHOD_PCI:
> dev_info(adev->dev, "PCI reset\n");
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 7:27 ` Christian König
@ 2022-01-17 10:01 ` Somalapuram, Amaranath
2022-01-17 10:03 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Somalapuram, Amaranath @ 2022-01-17 10:01 UTC (permalink / raw)
To: Koenig, Christian, amd-gfx; +Cc: Deucher, Alexander, Sharma, Shashank
[AMD Official Use Only]
-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Monday, January 17, 2022 12:57 PM
To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
> AMDGPURESET uevent added to notify userspace, collect dump_stack and
> trace
>
> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
> 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
> }
> }
>
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent
> + * @dev: DRM device
> + *
> + * Send a uevent for the DRM device specified by @dev. Currently we
> +only
> + * set AMDGPURESET=1 in the uevent environment, but this could be
> +expanded to
> + * deal with other types of events.
> + *
> + * Any new uapi should be using the
> +drm_sysfs_connector_status_event()
> + * for uevents on connector status change.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev)
This should probably go directly into the DRM subsystem.
> +{
> + char *event_string = "AMDGPURESET=1";
> + char *envp[2] = { event_string, NULL };
> +
> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
Regards,
Christian.
Can I do something like this:
void drm_sysfs_reset_event(struct drm_device *dev)
{
- char *event_string = "AMDGPURESET=1";
- char *envp[2] = { event_string, NULL };
+ char **envp;
+
+ envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
+ envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
+ envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
+
+ strcpy(envp[0], "AMDGPURESET=1");
+ strcpy(envp[1], "");
+
kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+
+ kfree(envp[0]);
+ kfree(envp[1]);
+ kfree(envp);
}
Regards,
S.Amarnath
> +}
> +
> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
> + struct drm_device *ddev = adev_to_drm(adev);
> + int r = 0, i;
> +
> + /* original raven doesn't have full asic reset */
> + if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
> + !(adev->apu_flags & AMD_APU_IS_RAVEN2))
> + return;
> + for (i = 0; i < adev->num_ip_blocks; i++) {
> + if (!adev->ip_blocks[i].status.valid)
> + continue;
> + if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
> + continue;
> + r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
> +
> + if (r)
> + DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
> + adev->ip_blocks[i].version->funcs->name, r);
> + }
> +
> + drm_sysfs_reset_event(ddev);
> + dump_stack();
> +}
> +
> static int nv_asic_reset(struct amdgpu_device *adev)
> {
> int ret = 0;
>
> + amdgpu_reset_dumps(adev);
> switch (nv_asic_reset_method(adev)) {
> case AMD_RESET_METHOD_PCI:
> dev_info(adev->dev, "PCI reset\n");
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 10:01 ` Somalapuram, Amaranath
@ 2022-01-17 10:03 ` Christian König
2022-01-17 10:09 ` Somalapuram, Amaranath
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 10:03 UTC (permalink / raw)
To: Somalapuram, Amaranath, amd-gfx; +Cc: Deucher, Alexander, Sharma, Shashank
Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
> [AMD Official Use Only]
>
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, January 17, 2022 12:57 PM
> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
>
> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>> AMDGPURESET uevent added to notify userspace, collect dump_stack and
>> trace
>>
>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>> 1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>> 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>> }
>> }
>>
>> +/**
>> + * drm_sysfs_reset_event - generate a DRM uevent
>> + * @dev: DRM device
>> + *
>> + * Send a uevent for the DRM device specified by @dev. Currently we
>> +only
>> + * set AMDGPURESET=1 in the uevent environment, but this could be
>> +expanded to
>> + * deal with other types of events.
>> + *
>> + * Any new uapi should be using the
>> +drm_sysfs_connector_status_event()
>> + * for uevents on connector status change.
>> + */
>> +void drm_sysfs_reset_event(struct drm_device *dev)
> This should probably go directly into the DRM subsystem.
>
>> +{
>> + char *event_string = "AMDGPURESET=1";
>> + char *envp[2] = { event_string, NULL };
>> +
>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
>
> Regards,
> Christian.
>
> Can I do something like this:
>
> void drm_sysfs_reset_event(struct drm_device *dev)
> {
> - char *event_string = "AMDGPURESET=1";
> - char *envp[2] = { event_string, NULL };
> + char **envp;
> +
> + envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
> + envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
> + envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
No, not really. kobject_uevent_env() will still allocate memory without
GFP_ATOMIC.
I think the whole approach of using udev won't work for this.
Regards,
Christian.
> +
> + strcpy(envp[0], "AMDGPURESET=1");
> + strcpy(envp[1], "");
> +
>
> kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +
> + kfree(envp[0]);
> + kfree(envp[1]);
> + kfree(envp);
> }
>
> Regards,
> S.Amarnath
>
>> +}
>> +
>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>> + struct drm_device *ddev = adev_to_drm(adev);
>> + int r = 0, i;
>> +
>> + /* original raven doesn't have full asic reset */
>> + if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> + !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> + return;
>> + for (i = 0; i < adev->num_ip_blocks; i++) {
>> + if (!adev->ip_blocks[i].status.valid)
>> + continue;
>> + if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> + continue;
>> + r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> + if (r)
>> + DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>> + adev->ip_blocks[i].version->funcs->name, r);
>> + }
>> +
>> + drm_sysfs_reset_event(ddev);
>> + dump_stack();
>> +}
>> +
>> static int nv_asic_reset(struct amdgpu_device *adev)
>> {
>> int ret = 0;
>>
>> + amdgpu_reset_dumps(adev);
>> switch (nv_asic_reset_method(adev)) {
>> case AMD_RESET_METHOD_PCI:
>> dev_info(adev->dev, "PCI reset\n");
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 10:03 ` Christian König
@ 2022-01-17 10:09 ` Somalapuram, Amaranath
2022-01-17 10:19 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Somalapuram, Amaranath @ 2022-01-17 10:09 UTC (permalink / raw)
To: Koenig, Christian, amd-gfx; +Cc: Deucher, Alexander, Sharma, Shashank
[AMD Official Use Only]
-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Monday, January 17, 2022 3:33 PM
To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
> [AMD Official Use Only]
>
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, January 17, 2022 12:57 PM
> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>;
> amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank
> <Shashank.Sharma@amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU
> reset
>
> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>> AMDGPURESET uevent added to notify userspace, collect dump_stack and
>> trace
>>
>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>> 1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>> 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>> }
>> }
>>
>> +/**
>> + * drm_sysfs_reset_event - generate a DRM uevent
>> + * @dev: DRM device
>> + *
>> + * Send a uevent for the DRM device specified by @dev. Currently we
>> +only
>> + * set AMDGPURESET=1 in the uevent environment, but this could be
>> +expanded to
>> + * deal with other types of events.
>> + *
>> + * Any new uapi should be using the
>> +drm_sysfs_connector_status_event()
>> + * for uevents on connector status change.
>> + */
>> +void drm_sysfs_reset_event(struct drm_device *dev)
> This should probably go directly into the DRM subsystem.
>
>> +{
>> + char *event_string = "AMDGPURESET=1";
>> + char *envp[2] = { event_string, NULL };
>> +
>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
>
> Regards,
> Christian.
>
> Can I do something like this:
>
> void drm_sysfs_reset_event(struct drm_device *dev)
> {
> - char *event_string = "AMDGPURESET=1";
> - char *envp[2] = { event_string, NULL };
> + char **envp;
> +
> + envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
> + envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
> + envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
No, not really. kobject_uevent_env() will still allocate memory without GFP_ATOMIC.
I think the whole approach of using udev won't work for this.
Regards,
Christian.
I have tested it with sample applications:
Gpu reset:
sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
And I never missed the AMDGPURESET=1 event in user space,
even with continues resets with sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover .
Regards,
S.Amarnath
> +
> + strcpy(envp[0], "AMDGPURESET=1");
> + strcpy(envp[1], "");
> +
>
> kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
> envp);
> +
> + kfree(envp[0]);
> + kfree(envp[1]);
> + kfree(envp);
> }
>
> Regards,
> S.Amarnath
>
>> +}
>> +
>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>> + struct drm_device *ddev = adev_to_drm(adev);
>> + int r = 0, i;
>> +
>> + /* original raven doesn't have full asic reset */
>> + if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> + !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> + return;
>> + for (i = 0; i < adev->num_ip_blocks; i++) {
>> + if (!adev->ip_blocks[i].status.valid)
>> + continue;
>> + if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> + continue;
>> + r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> + if (r)
>> + DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>> + adev->ip_blocks[i].version->funcs->name, r);
>> + }
>> +
>> + drm_sysfs_reset_event(ddev);
>> + dump_stack();
>> +}
>> +
>> static int nv_asic_reset(struct amdgpu_device *adev)
>> {
>> int ret = 0;
>>
>> + amdgpu_reset_dumps(adev);
>> switch (nv_asic_reset_method(adev)) {
>> case AMD_RESET_METHOD_PCI:
>> dev_info(adev->dev, "PCI reset\n");
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 10:09 ` Somalapuram, Amaranath
@ 2022-01-17 10:19 ` Christian König
2022-01-17 10:34 ` Somalapuram, Amaranath
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 10:19 UTC (permalink / raw)
To: Somalapuram, Amaranath, amd-gfx; +Cc: Deucher, Alexander, Sharma, Shashank
Am 17.01.22 um 11:09 schrieb Somalapuram, Amaranath:
> [AMD Official Use Only]
>
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, January 17, 2022 3:33 PM
> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
>
> Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
>> [AMD Official Use Only]
>>
>>
>>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Monday, January 17, 2022 12:57 PM
>> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>;
>> amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank
>> <Shashank.Sharma@amd.com>
>> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU
>> reset
>>
>> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>>> AMDGPURESET uevent added to notify userspace, collect dump_stack and
>>> trace
>>>
>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>>> 1 file changed, 45 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>>> 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>> }
>>> }
>>>
>>> +/**
>>> + * drm_sysfs_reset_event - generate a DRM uevent
>>> + * @dev: DRM device
>>> + *
>>> + * Send a uevent for the DRM device specified by @dev. Currently we
>>> +only
>>> + * set AMDGPURESET=1 in the uevent environment, but this could be
>>> +expanded to
>>> + * deal with other types of events.
>>> + *
>>> + * Any new uapi should be using the
>>> +drm_sysfs_connector_status_event()
>>> + * for uevents on connector status change.
>>> + */
>>> +void drm_sysfs_reset_event(struct drm_device *dev)
>> This should probably go directly into the DRM subsystem.
>>
>>> +{
>>> + char *event_string = "AMDGPURESET=1";
>>> + char *envp[2] = { event_string, NULL };
>>> +
>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>> As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
>>
>> Regards,
>> Christian.
>>
>> Can I do something like this:
>>
>> void drm_sysfs_reset_event(struct drm_device *dev)
>> {
>> - char *event_string = "AMDGPURESET=1";
>> - char *envp[2] = { event_string, NULL };
>> + char **envp;
>> +
>> + envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
>> + envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
>> + envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
> No, not really. kobject_uevent_env() will still allocate memory without GFP_ATOMIC.
>
> I think the whole approach of using udev won't work for this.
>
> Regards,
> Christian.
>
> I have tested it with sample applications:
> Gpu reset:
> sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>
> And I never missed the AMDGPURESET=1 event in user space,
That's not the problem. Allocating memory when we need to do a reset can
cause a *HARD* kernel deadlock.
This is absolutely not something we can do and Daniel even tried to add
a few lockdep annotations for this.
So automated testing scripts will complain that this won't work.
Regards,
Christian.
> even with continues resets with sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover .
>
> Regards,
> S.Amarnath
>> +
>> + strcpy(envp[0], "AMDGPURESET=1");
>> + strcpy(envp[1], "");
>> +
>>
>> kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
>> envp);
>> +
>> + kfree(envp[0]);
>> + kfree(envp[1]);
>> + kfree(envp);
>> }
>>
>> Regards,
>> S.Amarnath
>>
>>> +}
>>> +
>>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>>> + struct drm_device *ddev = adev_to_drm(adev);
>>> + int r = 0, i;
>>> +
>>> + /* original raven doesn't have full asic reset */
>>> + if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>> + !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>> + return;
>>> + for (i = 0; i < adev->num_ip_blocks; i++) {
>>> + if (!adev->ip_blocks[i].status.valid)
>>> + continue;
>>> + if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>> + continue;
>>> + r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>> +
>>> + if (r)
>>> + DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>>> + adev->ip_blocks[i].version->funcs->name, r);
>>> + }
>>> +
>>> + drm_sysfs_reset_event(ddev);
>>> + dump_stack();
>>> +}
>>> +
>>> static int nv_asic_reset(struct amdgpu_device *adev)
>>> {
>>> int ret = 0;
>>>
>>> + amdgpu_reset_dumps(adev);
>>> switch (nv_asic_reset_method(adev)) {
>>> case AMD_RESET_METHOD_PCI:
>>> dev_info(adev->dev, "PCI reset\n");
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 10:19 ` Christian König
@ 2022-01-17 10:34 ` Somalapuram, Amaranath
2022-01-17 10:37 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Somalapuram, Amaranath @ 2022-01-17 10:34 UTC (permalink / raw)
To: Christian König, Somalapuram, Amaranath, amd-gfx
Cc: Deucher, Alexander, Sharma, Shashank
[-- Attachment #1: Type: text/plain, Size: 5725 bytes --]
On 1/17/2022 3:49 PM, Christian König wrote:
> Am 17.01.22 um 11:09 schrieb Somalapuram, Amaranath:
>> [AMD Official Use Only]
>>
>>
>>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Monday, January 17, 2022 3:33 PM
>> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>;
>> amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank
>> <Shashank.Sharma@amd.com>
>> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD
>> GPU reset
>>
>> Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: Monday, January 17, 2022 12:57 PM
>>> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>;
>>> amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank
>>> <Shashank.Sharma@amd.com>
>>> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU
>>> reset
>>>
>>> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>>>> AMDGPURESET uevent added to notify userspace, collect dump_stack and
>>>> trace
>>>>
>>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/nv.c | 45
>>>> +++++++++++++++++++++++++++++++++
>>>> 1 file changed, 45 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>>>> 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>>> }
>>>> }
>>>> +/**
>>>> + * drm_sysfs_reset_event - generate a DRM uevent
>>>> + * @dev: DRM device
>>>> + *
>>>> + * Send a uevent for the DRM device specified by @dev. Currently we
>>>> +only
>>>> + * set AMDGPURESET=1 in the uevent environment, but this could be
>>>> +expanded to
>>>> + * deal with other types of events.
>>>> + *
>>>> + * Any new uapi should be using the
>>>> +drm_sysfs_connector_status_event()
>>>> + * for uevents on connector status change.
>>>> + */
>>>> +void drm_sysfs_reset_event(struct drm_device *dev)
>>> This should probably go directly into the DRM subsystem.
>>>
>>>> +{
>>>> + char *event_string = "AMDGPURESET=1";
>>>> + char *envp[2] = { event_string, NULL };
>>>> +
>>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>> As I said this approach is a clear NAK. We can't allocate any memory
>>> when we do a reset.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Can I do something like this:
>>>
>>> void drm_sysfs_reset_event(struct drm_device *dev)
>>> {
>>> - char *event_string = "AMDGPURESET=1";
>>> - char *envp[2] = { event_string, NULL };
>>> + char **envp;
>>> +
>>> + envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
>>> + envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
>>> + envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
>> No, not really. kobject_uevent_env() will still allocate memory
>> without GFP_ATOMIC.
>>
>> I think the whole approach of using udev won't work for this.
>>
>> Regards,
>> Christian.
>>
>> I have tested it with sample applications:
>> Gpu reset:
>> sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>>
>> And I never missed the AMDGPURESET=1 event in user space,
>
> That's not the problem. Allocating memory when we need to do a reset
> can cause a *HARD* kernel deadlock.
>
> This is absolutely not something we can do and Daniel even tried to
> add a few lockdep annotations for this.
>
> So automated testing scripts will complain that this won't work.
>
> Regards,
> Christian.
Any suggestion how we can notify user space during this situation.
Regards,
S.Amarnath
>
>> even with continues resets with sudo cat
>> /sys/kernel/debug/dri/0/amdgpu_gpu_recover .
>
>
>>
>> Regards,
>> S.Amarnath
>>> +
>>> + strcpy(envp[0], "AMDGPURESET=1");
>>> + strcpy(envp[1], "");
>>> +
>>>
>>> kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
>>> envp);
>>> +
>>> + kfree(envp[0]);
>>> + kfree(envp[1]);
>>> + kfree(envp);
>>> }
>>>
>>> Regards,
>>> S.Amarnath
>>>
>>>> +}
>>>> +
>>>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>>>> + struct drm_device *ddev = adev_to_drm(adev);
>>>> + int r = 0, i;
>>>> +
>>>> + /* original raven doesn't have full asic reset */
>>>> + if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>>> + !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>>> + return;
>>>> + for (i = 0; i < adev->num_ip_blocks; i++) {
>>>> + if (!adev->ip_blocks[i].status.valid)
>>>> + continue;
>>>> + if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>>> + continue;
>>>> + r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>>> +
>>>> + if (r)
>>>> + DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>>>> + adev->ip_blocks[i].version->funcs->name, r);
>>>> + }
>>>> +
>>>> + drm_sysfs_reset_event(ddev);
>>>> + dump_stack();
>>>> +}
>>>> +
>>>> static int nv_asic_reset(struct amdgpu_device *adev)
>>>> {
>>>> int ret = 0;
>>>> + amdgpu_reset_dumps(adev);
>>>> switch (nv_asic_reset_method(adev)) {
>>>> case AMD_RESET_METHOD_PCI:
>>>> dev_info(adev->dev, "PCI reset\n");
>
[-- Attachment #2: Type: text/html, Size: 12158 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 10:34 ` Somalapuram, Amaranath
@ 2022-01-17 10:37 ` Christian König
2022-01-17 11:43 ` Sharma, Shashank
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 10:37 UTC (permalink / raw)
To: Somalapuram, Amaranath, Somalapuram, Amaranath, amd-gfx
Cc: Deucher, Alexander, Sharma, Shashank
[-- Attachment #1: Type: text/plain, Size: 518 bytes --]
Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
> [SNIP]
> Any suggestion how we can notify user space during this situation.
Well trace points should work. They use a ring buffer and are specially
made to work in such situations.
Alternative would be to audit the udev code and allow giving a GFP mask
to the function to make sure that we don't run into the deadlock.
Another alternative is a debugfs file which just blocks for the next
reset to happen.
Regards,
Christian.
> Regards,
>
> S.Amarnath
>
[-- Attachment #2: Type: text/html, Size: 1023 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 10:37 ` Christian König
@ 2022-01-17 11:43 ` Sharma, Shashank
2022-01-17 11:57 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Sharma, Shashank @ 2022-01-17 11:43 UTC (permalink / raw)
To: Christian König, Somalapuram, Amaranath, Somalapuram,
Amaranath, amd-gfx
Cc: Deucher, Alexander
Hello Christian,
On 1/17/2022 11:37 AM, Christian König wrote:
> Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
>> [SNIP]
>> Any suggestion how we can notify user space during this situation.
>
> Well trace points should work. They use a ring buffer and are specially
> made to work in such situations.
We are anyways sending a trace event with data, but can trace event be a
wake up source for an application (like a tracer) ? This DRM event is
just to indicate that reset happened, so app has to read from trace file.
>
> Alternative would be to audit the udev code and allow giving a GFP mask
> to the function to make sure that we don't run into the deadlock.
>
> Another alternative is a debugfs file which just blocks for the next
> reset to happen.
>
- Shashank
> Regards,
> Christian.
>
>> Regards,
>>
>> S.Amarnath
>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 6:33 [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset Somalapuram Amaranath
2022-01-17 7:27 ` Christian König
@ 2022-01-17 11:54 ` Lazar, Lijo
1 sibling, 0 replies; 13+ messages in thread
From: Lazar, Lijo @ 2022-01-17 11:54 UTC (permalink / raw)
To: Somalapuram Amaranath, amd-gfx
Cc: alexander.deucher, christian.koenig, shashank.sharma
On 1/17/2022 12:03 PM, Somalapuram Amaranath wrote:
> AMDGPURESET uevent added to notify userspace, collect dump_stack and trace
>
> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 2ec1ffb36b1f..b73147ae41fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
> }
> }
>
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent
> + * @dev: DRM device
> + *
> + * Send a uevent for the DRM device specified by @dev. Currently we only
> + * set AMDGPURESET=1 in the uevent environment, but this could be expanded to
> + * deal with other types of events.
> + *
> + * Any new uapi should be using the drm_sysfs_connector_status_event()
> + * for uevents on connector status change.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev)
> +{
> + char *event_string = "AMDGPURESET=1";
> + char *envp[2] = { event_string, NULL };
> +
> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +
> +void amdgpu_reset_dumps(struct amdgpu_device *adev)
> +{
> + struct drm_device *ddev = adev_to_drm(adev);
> + int r = 0, i;
> +
> + /* original raven doesn't have full asic reset */
> + if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
> + !(adev->apu_flags & AMD_APU_IS_RAVEN2))
> + return;
> + for (i = 0; i < adev->num_ip_blocks; i++) {
> + if (!adev->ip_blocks[i].status.valid)
> + continue;
> + if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
> + continue;
> + r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
> +
> + if (r)
> + DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
> + adev->ip_blocks[i].version->funcs->name, r);
> + }
> +
> + drm_sysfs_reset_event(ddev);
> + dump_stack();
> +}
> +
> static int nv_asic_reset(struct amdgpu_device *adev)
> {
> int ret = 0;
>
> + amdgpu_reset_dumps(adev);
Alex recently added a patch to reset GPU on suspend. It doesn't make
sense to send an event in such cases, guess the original intention is
for gpu recovery related cases.
Thanks,
Lijo
> switch (nv_asic_reset_method(adev)) {
> case AMD_RESET_METHOD_PCI:
> dev_info(adev->dev, "PCI reset\n");
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 11:43 ` Sharma, Shashank
@ 2022-01-17 11:57 ` Christian König
2022-01-17 14:19 ` Somalapuram, Amaranath
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 11:57 UTC (permalink / raw)
To: Sharma, Shashank, Somalapuram, Amaranath, Somalapuram, Amaranath,
amd-gfx
Cc: Deucher, Alexander
Am 17.01.22 um 12:43 schrieb Sharma, Shashank:
> Hello Christian,
>
> On 1/17/2022 11:37 AM, Christian König wrote:
>> Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
>>> [SNIP]
>>> Any suggestion how we can notify user space during this situation.
>>
>> Well trace points should work. They use a ring buffer and are
>> specially made to work in such situations.
>
> We are anyways sending a trace event with data, but can trace event be
> a wake up source for an application (like a tracer) ? This DRM event
> is just to indicate that reset happened, so app has to read from trace
> file.
Yeah, that's a really good question I can't fully answer. As far as I
know you can filter in userspace what you get from the tracer, but I
don't know if you can block on a specific event.
Christian.
>
>>
>> Alternative would be to audit the udev code and allow giving a GFP
>> mask to the function to make sure that we don't run into the deadlock.
>>
>> Another alternative is a debugfs file which just blocks for the next
>> reset to happen.
>>
>
> - Shashank
>
>> Regards,
>> Christian.
>>
>>> Regards,
>>>
>>> S.Amarnath
>>>
>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 11:57 ` Christian König
@ 2022-01-17 14:19 ` Somalapuram, Amaranath
2022-01-17 16:07 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Somalapuram, Amaranath @ 2022-01-17 14:19 UTC (permalink / raw)
To: Christian König, Sharma, Shashank, Somalapuram, Amaranath, amd-gfx
Cc: Deucher, Alexander
Hi Christian,
if sending udev event during reset is going to create problem, we can
move this code from reset sequence to re-int (after GPU reset succeeded).
Regards,
S.Amarnath
On 1/17/2022 5:27 PM, Christian König wrote:
> Am 17.01.22 um 12:43 schrieb Sharma, Shashank:
>> Hello Christian,
>>
>> On 1/17/2022 11:37 AM, Christian König wrote:
>>> Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
>>>> [SNIP]
>>>> Any suggestion how we can notify user space during this situation.
>>>
>>> Well trace points should work. They use a ring buffer and are
>>> specially made to work in such situations.
>>
>> We are anyways sending a trace event with data, but can trace event
>> be a wake up source for an application (like a tracer) ? This DRM
>> event is just to indicate that reset happened, so app has to read
>> from trace file.
>
> Yeah, that's a really good question I can't fully answer. As far as I
> know you can filter in userspace what you get from the tracer, but I
> don't know if you can block on a specific event.
>
> Christian.
>
>>
>>>
>>> Alternative would be to audit the udev code and allow giving a GFP
>>> mask to the function to make sure that we don't run into the deadlock.
>>>
>>> Another alternative is a debugfs file which just blocks for the next
>>> reset to happen.
>>>
>>
>> - Shashank
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>>
>>>> S.Amarnath
>>>>
>>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
2022-01-17 14:19 ` Somalapuram, Amaranath
@ 2022-01-17 16:07 ` Christian König
0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2022-01-17 16:07 UTC (permalink / raw)
To: Somalapuram, Amaranath, Sharma, Shashank, Somalapuram, Amaranath,
amd-gfx
Cc: Deucher, Alexander
Hi Amarnath,
not a bad idea, but that won't work either because you really need to
return to make sure that a potential next reset can run.
What could be done is to have a work item which does that, but then I
think it would just be easier to teach the udev function a GFP mask.
Regards,
Christian.
Am 17.01.22 um 15:19 schrieb Somalapuram, Amaranath:
> Hi Christian,
>
> if sending udev event during reset is going to create problem, we can
> move this code from reset sequence to re-int (after GPU reset
> succeeded).
>
> Regards,
>
> S.Amarnath
>
> On 1/17/2022 5:27 PM, Christian König wrote:
>> Am 17.01.22 um 12:43 schrieb Sharma, Shashank:
>>> Hello Christian,
>>>
>>> On 1/17/2022 11:37 AM, Christian König wrote:
>>>> Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
>>>>> [SNIP]
>>>>> Any suggestion how we can notify user space during this situation.
>>>>
>>>> Well trace points should work. They use a ring buffer and are
>>>> specially made to work in such situations.
>>>
>>> We are anyways sending a trace event with data, but can trace event
>>> be a wake up source for an application (like a tracer) ? This DRM
>>> event is just to indicate that reset happened, so app has to read
>>> from trace file.
>>
>> Yeah, that's a really good question I can't fully answer. As far as I
>> know you can filter in userspace what you get from the tracer, but I
>> don't know if you can block on a specific event.
>>
>> Christian.
>>
>>>
>>>>
>>>> Alternative would be to audit the udev code and allow giving a GFP
>>>> mask to the function to make sure that we don't run into the deadlock.
>>>>
>>>> Another alternative is a debugfs file which just blocks for the
>>>> next reset to happen.
>>>>
>>>
>>> - Shashank
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Regards,
>>>>>
>>>>> S.Amarnath
>>>>>
>>>>
>>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-01-17 16:07 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-17 6:33 [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset Somalapuram Amaranath
2022-01-17 7:27 ` Christian König
2022-01-17 10:01 ` Somalapuram, Amaranath
2022-01-17 10:03 ` Christian König
2022-01-17 10:09 ` Somalapuram, Amaranath
2022-01-17 10:19 ` Christian König
2022-01-17 10:34 ` Somalapuram, Amaranath
2022-01-17 10:37 ` Christian König
2022-01-17 11:43 ` Sharma, Shashank
2022-01-17 11:57 ` Christian König
2022-01-17 14:19 ` Somalapuram, Amaranath
2022-01-17 16:07 ` Christian König
2022-01-17 11:54 ` Lazar, Lijo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.