All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
@ 2022-01-17  6:33 Somalapuram Amaranath
  2022-01-17  7:27 ` Christian König
  2022-01-17 11:54 ` Lazar, Lijo
  0 siblings, 2 replies; 13+ messages in thread
From: Somalapuram Amaranath @ 2022-01-17  6:33 UTC (permalink / raw)
  To: amd-gfx
  Cc: alexander.deucher, Somalapuram Amaranath, christian.koenig,
	shashank.sharma

AMDGPURESET uevent added to notify userspace, collect dump_stack and trace

Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 2ec1ffb36b1f..b73147ae41fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
 	}
 }
 
+/**
+ * drm_sysfs_reset_event - generate a DRM uevent
+ * @dev: DRM device
+ *
+ * Send a uevent for the DRM device specified by @dev.  Currently we only
+ * set AMDGPURESET=1 in the uevent environment, but this could be expanded to
+ * deal with other types of events.
+ *
+ * Any new uapi should be using the drm_sysfs_connector_status_event()
+ * for uevents on connector status change.
+ */
+void drm_sysfs_reset_event(struct drm_device *dev)
+{
+	char *event_string = "AMDGPURESET=1";
+	char *envp[2] = { event_string, NULL };
+
+	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+
+void amdgpu_reset_dumps(struct amdgpu_device *adev)
+{
+	struct drm_device *ddev = adev_to_drm(adev);
+	int r = 0, i;
+
+	/* original raven doesn't have full asic reset */
+	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
+		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
+		return;
+	for (i = 0; i < adev->num_ip_blocks; i++) {
+		if (!adev->ip_blocks[i].status.valid)
+			continue;
+		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
+			continue;
+		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
+
+		if (r)
+			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
+					adev->ip_blocks[i].version->funcs->name, r);
+	}
+
+	drm_sysfs_reset_event(ddev);
+	dump_stack();
+}
+
 static int nv_asic_reset(struct amdgpu_device *adev)
 {
 	int ret = 0;
 
+	amdgpu_reset_dumps(adev);
 	switch (nv_asic_reset_method(adev)) {
 	case AMD_RESET_METHOD_PCI:
 		dev_info(adev->dev, "PCI reset\n");
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17  6:33 [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset Somalapuram Amaranath
@ 2022-01-17  7:27 ` Christian König
  2022-01-17 10:01   ` Somalapuram, Amaranath
  2022-01-17 11:54 ` Lazar, Lijo
  1 sibling, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17  7:27 UTC (permalink / raw)
  To: Somalapuram Amaranath, amd-gfx; +Cc: alexander.deucher, shashank.sharma

Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
> AMDGPURESET uevent added to notify userspace, collect dump_stack and trace
>
> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>   1 file changed, 45 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 2ec1ffb36b1f..b73147ae41fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>   	}
>   }
>   
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent
> + * @dev: DRM device
> + *
> + * Send a uevent for the DRM device specified by @dev.  Currently we only
> + * set AMDGPURESET=1 in the uevent environment, but this could be expanded to
> + * deal with other types of events.
> + *
> + * Any new uapi should be using the drm_sysfs_connector_status_event()
> + * for uevents on connector status change.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev)

This should probably go directly into the DRM subsystem.

> +{
> +	char *event_string = "AMDGPURESET=1";
> +	char *envp[2] = { event_string, NULL };
> +
> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);

As I said this approach is a clear NAK. We can't allocate any memory 
when we do a reset.

Regards,
Christian.

> +}
> +
> +void amdgpu_reset_dumps(struct amdgpu_device *adev)
> +{
> +	struct drm_device *ddev = adev_to_drm(adev);
> +	int r = 0, i;
> +
> +	/* original raven doesn't have full asic reset */
> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
> +		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
> +		return;
> +	for (i = 0; i < adev->num_ip_blocks; i++) {
> +		if (!adev->ip_blocks[i].status.valid)
> +			continue;
> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
> +			continue;
> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
> +
> +		if (r)
> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
> +					adev->ip_blocks[i].version->funcs->name, r);
> +	}
> +
> +	drm_sysfs_reset_event(ddev);
> +	dump_stack();
> +}
> +
>   static int nv_asic_reset(struct amdgpu_device *adev)
>   {
>   	int ret = 0;
>   
> +	amdgpu_reset_dumps(adev);
>   	switch (nv_asic_reset_method(adev)) {
>   	case AMD_RESET_METHOD_PCI:
>   		dev_info(adev->dev, "PCI reset\n");


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17  7:27 ` Christian König
@ 2022-01-17 10:01   ` Somalapuram, Amaranath
  2022-01-17 10:03     ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Somalapuram, Amaranath @ 2022-01-17 10:01 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx; +Cc: Deucher, Alexander, Sharma, Shashank

[AMD Official Use Only]



-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com> 
Sent: Monday, January 17, 2022 12:57 PM
To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset

Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
> AMDGPURESET uevent added to notify userspace, collect dump_stack and 
> trace
>
> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>   1 file changed, 45 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb 
> 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>   	}
>   }
>   
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent
> + * @dev: DRM device
> + *
> + * Send a uevent for the DRM device specified by @dev.  Currently we 
> +only
> + * set AMDGPURESET=1 in the uevent environment, but this could be 
> +expanded to
> + * deal with other types of events.
> + *
> + * Any new uapi should be using the 
> +drm_sysfs_connector_status_event()
> + * for uevents on connector status change.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev)

This should probably go directly into the DRM subsystem.

> +{
> +	char *event_string = "AMDGPURESET=1";
> +	char *envp[2] = { event_string, NULL };
> +
> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);

As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.

Regards,
Christian.

Can I do something like this:

void drm_sysfs_reset_event(struct drm_device *dev)
 {
-       char *event_string = "AMDGPURESET=1";
-       char *envp[2] = { event_string, NULL };
+       char **envp;
+
+       envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
+       envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
+       envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
+
+       strcpy(envp[0], "AMDGPURESET=1");
+       strcpy(envp[1], "");
+

        kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+
+       kfree(envp[0]);
+       kfree(envp[1]);
+       kfree(envp);
 }

Regards,
S.Amarnath

> +}
> +
> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
> +	struct drm_device *ddev = adev_to_drm(adev);
> +	int r = 0, i;
> +
> +	/* original raven doesn't have full asic reset */
> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
> +		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
> +		return;
> +	for (i = 0; i < adev->num_ip_blocks; i++) {
> +		if (!adev->ip_blocks[i].status.valid)
> +			continue;
> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
> +			continue;
> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
> +
> +		if (r)
> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
> +					adev->ip_blocks[i].version->funcs->name, r);
> +	}
> +
> +	drm_sysfs_reset_event(ddev);
> +	dump_stack();
> +}
> +
>   static int nv_asic_reset(struct amdgpu_device *adev)
>   {
>   	int ret = 0;
>   
> +	amdgpu_reset_dumps(adev);
>   	switch (nv_asic_reset_method(adev)) {
>   	case AMD_RESET_METHOD_PCI:
>   		dev_info(adev->dev, "PCI reset\n");

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 10:01   ` Somalapuram, Amaranath
@ 2022-01-17 10:03     ` Christian König
  2022-01-17 10:09       ` Somalapuram, Amaranath
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 10:03 UTC (permalink / raw)
  To: Somalapuram, Amaranath, amd-gfx; +Cc: Deucher, Alexander, Sharma, Shashank

Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
> [AMD Official Use Only]
>
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, January 17, 2022 12:57 PM
> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
>
> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>> AMDGPURESET uevent added to notify userspace, collect dump_stack and
>> trace
>>
>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>>    1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>> 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>    	}
>>    }
>>    
>> +/**
>> + * drm_sysfs_reset_event - generate a DRM uevent
>> + * @dev: DRM device
>> + *
>> + * Send a uevent for the DRM device specified by @dev.  Currently we
>> +only
>> + * set AMDGPURESET=1 in the uevent environment, but this could be
>> +expanded to
>> + * deal with other types of events.
>> + *
>> + * Any new uapi should be using the
>> +drm_sysfs_connector_status_event()
>> + * for uevents on connector status change.
>> + */
>> +void drm_sysfs_reset_event(struct drm_device *dev)
> This should probably go directly into the DRM subsystem.
>
>> +{
>> +	char *event_string = "AMDGPURESET=1";
>> +	char *envp[2] = { event_string, NULL };
>> +
>> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
>
> Regards,
> Christian.
>
> Can I do something like this:
>
> void drm_sysfs_reset_event(struct drm_device *dev)
>   {
> -       char *event_string = "AMDGPURESET=1";
> -       char *envp[2] = { event_string, NULL };
> +       char **envp;
> +
> +       envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
> +       envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
> +       envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);

No, not really. kobject_uevent_env() will still allocate memory without 
GFP_ATOMIC.

I think the whole approach of using udev won't work for this.

Regards,
Christian.

> +
> +       strcpy(envp[0], "AMDGPURESET=1");
> +       strcpy(envp[1], "");
> +
>
>          kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +
> +       kfree(envp[0]);
> +       kfree(envp[1]);
> +       kfree(envp);
>   }
>
> Regards,
> S.Amarnath
>
>> +}
>> +
>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>> +	struct drm_device *ddev = adev_to_drm(adev);
>> +	int r = 0, i;
>> +
>> +	/* original raven doesn't have full asic reset */
>> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> +		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> +		return;
>> +	for (i = 0; i < adev->num_ip_blocks; i++) {
>> +		if (!adev->ip_blocks[i].status.valid)
>> +			continue;
>> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> +			continue;
>> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> +		if (r)
>> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>> +					adev->ip_blocks[i].version->funcs->name, r);
>> +	}
>> +
>> +	drm_sysfs_reset_event(ddev);
>> +	dump_stack();
>> +}
>> +
>>    static int nv_asic_reset(struct amdgpu_device *adev)
>>    {
>>    	int ret = 0;
>>    
>> +	amdgpu_reset_dumps(adev);
>>    	switch (nv_asic_reset_method(adev)) {
>>    	case AMD_RESET_METHOD_PCI:
>>    		dev_info(adev->dev, "PCI reset\n");


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 10:03     ` Christian König
@ 2022-01-17 10:09       ` Somalapuram, Amaranath
  2022-01-17 10:19         ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Somalapuram, Amaranath @ 2022-01-17 10:09 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx; +Cc: Deucher, Alexander, Sharma, Shashank

[AMD Official Use Only]



-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com> 
Sent: Monday, January 17, 2022 3:33 PM
To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset

Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
> [AMD Official Use Only]
>
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, January 17, 2022 12:57 PM
> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; 
> amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank 
> <Shashank.Sharma@amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU 
> reset
>
> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>> AMDGPURESET uevent added to notify userspace, collect dump_stack and 
>> trace
>>
>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>>    1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>> 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>    	}
>>    }
>>    
>> +/**
>> + * drm_sysfs_reset_event - generate a DRM uevent
>> + * @dev: DRM device
>> + *
>> + * Send a uevent for the DRM device specified by @dev.  Currently we 
>> +only
>> + * set AMDGPURESET=1 in the uevent environment, but this could be 
>> +expanded to
>> + * deal with other types of events.
>> + *
>> + * Any new uapi should be using the
>> +drm_sysfs_connector_status_event()
>> + * for uevents on connector status change.
>> + */
>> +void drm_sysfs_reset_event(struct drm_device *dev)
> This should probably go directly into the DRM subsystem.
>
>> +{
>> +	char *event_string = "AMDGPURESET=1";
>> +	char *envp[2] = { event_string, NULL };
>> +
>> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
>
> Regards,
> Christian.
>
> Can I do something like this:
>
> void drm_sysfs_reset_event(struct drm_device *dev)
>   {
> -       char *event_string = "AMDGPURESET=1";
> -       char *envp[2] = { event_string, NULL };
> +       char **envp;
> +
> +       envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
> +       envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
> +       envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);

No, not really. kobject_uevent_env() will still allocate memory without GFP_ATOMIC.

I think the whole approach of using udev won't work for this.

Regards,
Christian.

I have tested it with sample applications: 
Gpu reset:
sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

And I never missed the AMDGPURESET=1 event in user space, 
even with continues resets with sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover .

Regards,
S.Amarnath
> +
> +       strcpy(envp[0], "AMDGPURESET=1");
> +       strcpy(envp[1], "");
> +
>
>          kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, 
> envp);
> +
> +       kfree(envp[0]);
> +       kfree(envp[1]);
> +       kfree(envp);
>   }
>
> Regards,
> S.Amarnath
>
>> +}
>> +
>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>> +	struct drm_device *ddev = adev_to_drm(adev);
>> +	int r = 0, i;
>> +
>> +	/* original raven doesn't have full asic reset */
>> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>> +		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
>> +		return;
>> +	for (i = 0; i < adev->num_ip_blocks; i++) {
>> +		if (!adev->ip_blocks[i].status.valid)
>> +			continue;
>> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>> +			continue;
>> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>> +
>> +		if (r)
>> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>> +					adev->ip_blocks[i].version->funcs->name, r);
>> +	}
>> +
>> +	drm_sysfs_reset_event(ddev);
>> +	dump_stack();
>> +}
>> +
>>    static int nv_asic_reset(struct amdgpu_device *adev)
>>    {
>>    	int ret = 0;
>>    
>> +	amdgpu_reset_dumps(adev);
>>    	switch (nv_asic_reset_method(adev)) {
>>    	case AMD_RESET_METHOD_PCI:
>>    		dev_info(adev->dev, "PCI reset\n");

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 10:09       ` Somalapuram, Amaranath
@ 2022-01-17 10:19         ` Christian König
  2022-01-17 10:34           ` Somalapuram, Amaranath
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 10:19 UTC (permalink / raw)
  To: Somalapuram, Amaranath, amd-gfx; +Cc: Deucher, Alexander, Sharma, Shashank

Am 17.01.22 um 11:09 schrieb Somalapuram, Amaranath:
> [AMD Official Use Only]
>
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, January 17, 2022 3:33 PM
> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
>
> Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
>> [AMD Official Use Only]
>>
>>
>>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Monday, January 17, 2022 12:57 PM
>> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>;
>> amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank
>> <Shashank.Sharma@amd.com>
>> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU
>> reset
>>
>> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>>> AMDGPURESET uevent added to notify userspace, collect dump_stack and
>>> trace
>>>
>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>>>     1 file changed, 45 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>>> 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>>     	}
>>>     }
>>>     
>>> +/**
>>> + * drm_sysfs_reset_event - generate a DRM uevent
>>> + * @dev: DRM device
>>> + *
>>> + * Send a uevent for the DRM device specified by @dev.  Currently we
>>> +only
>>> + * set AMDGPURESET=1 in the uevent environment, but this could be
>>> +expanded to
>>> + * deal with other types of events.
>>> + *
>>> + * Any new uapi should be using the
>>> +drm_sysfs_connector_status_event()
>>> + * for uevents on connector status change.
>>> + */
>>> +void drm_sysfs_reset_event(struct drm_device *dev)
>> This should probably go directly into the DRM subsystem.
>>
>>> +{
>>> +	char *event_string = "AMDGPURESET=1";
>>> +	char *envp[2] = { event_string, NULL };
>>> +
>>> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>> As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
>>
>> Regards,
>> Christian.
>>
>> Can I do something like this:
>>
>> void drm_sysfs_reset_event(struct drm_device *dev)
>>    {
>> -       char *event_string = "AMDGPURESET=1";
>> -       char *envp[2] = { event_string, NULL };
>> +       char **envp;
>> +
>> +       envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
>> +       envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
>> +       envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
> No, not really. kobject_uevent_env() will still allocate memory without GFP_ATOMIC.
>
> I think the whole approach of using udev won't work for this.
>
> Regards,
> Christian.
>
> I have tested it with sample applications:
> Gpu reset:
> sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>
> And I never missed the AMDGPURESET=1 event in user space,

That's not the problem. Allocating memory when we need to do a reset can 
cause a *HARD* kernel deadlock.

This is absolutely not something we can do and Daniel even tried to add 
a few lockdep annotations for this.

So automated testing scripts will complain that this won't work.

Regards,
Christian.

> even with continues resets with sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover .


>
> Regards,
> S.Amarnath
>> +
>> +       strcpy(envp[0], "AMDGPURESET=1");
>> +       strcpy(envp[1], "");
>> +
>>
>>           kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
>> envp);
>> +
>> +       kfree(envp[0]);
>> +       kfree(envp[1]);
>> +       kfree(envp);
>>    }
>>
>> Regards,
>> S.Amarnath
>>
>>> +}
>>> +
>>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>>> +	struct drm_device *ddev = adev_to_drm(adev);
>>> +	int r = 0, i;
>>> +
>>> +	/* original raven doesn't have full asic reset */
>>> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>> +		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>> +		return;
>>> +	for (i = 0; i < adev->num_ip_blocks; i++) {
>>> +		if (!adev->ip_blocks[i].status.valid)
>>> +			continue;
>>> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>> +			continue;
>>> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>> +
>>> +		if (r)
>>> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>>> +					adev->ip_blocks[i].version->funcs->name, r);
>>> +	}
>>> +
>>> +	drm_sysfs_reset_event(ddev);
>>> +	dump_stack();
>>> +}
>>> +
>>>     static int nv_asic_reset(struct amdgpu_device *adev)
>>>     {
>>>     	int ret = 0;
>>>     
>>> +	amdgpu_reset_dumps(adev);
>>>     	switch (nv_asic_reset_method(adev)) {
>>>     	case AMD_RESET_METHOD_PCI:
>>>     		dev_info(adev->dev, "PCI reset\n");


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 10:19         ` Christian König
@ 2022-01-17 10:34           ` Somalapuram, Amaranath
  2022-01-17 10:37             ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Somalapuram, Amaranath @ 2022-01-17 10:34 UTC (permalink / raw)
  To: Christian König, Somalapuram, Amaranath, amd-gfx
  Cc: Deucher, Alexander, Sharma, Shashank

[-- Attachment #1: Type: text/plain, Size: 5725 bytes --]


On 1/17/2022 3:49 PM, Christian König wrote:
> Am 17.01.22 um 11:09 schrieb Somalapuram, Amaranath:
>> [AMD Official Use Only]
>>
>>
>>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Monday, January 17, 2022 3:33 PM
>> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; 
>> amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank 
>> <Shashank.Sharma@amd.com>
>> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD 
>> GPU reset
>>
>> Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: Monday, January 17, 2022 12:57 PM
>>> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>;
>>> amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank
>>> <Shashank.Sharma@amd.com>
>>> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU
>>> reset
>>>
>>> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>>>> AMDGPURESET uevent added to notify userspace, collect dump_stack and
>>>> trace
>>>>
>>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>>>> ---
>>>>     drivers/gpu/drm/amd/amdgpu/nv.c | 45 
>>>> +++++++++++++++++++++++++++++++++
>>>>     1 file changed, 45 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>>>> 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>>>         }
>>>>     }
>>>>     +/**
>>>> + * drm_sysfs_reset_event - generate a DRM uevent
>>>> + * @dev: DRM device
>>>> + *
>>>> + * Send a uevent for the DRM device specified by @dev. Currently we
>>>> +only
>>>> + * set AMDGPURESET=1 in the uevent environment, but this could be
>>>> +expanded to
>>>> + * deal with other types of events.
>>>> + *
>>>> + * Any new uapi should be using the
>>>> +drm_sysfs_connector_status_event()
>>>> + * for uevents on connector status change.
>>>> + */
>>>> +void drm_sysfs_reset_event(struct drm_device *dev)
>>> This should probably go directly into the DRM subsystem.
>>>
>>>> +{
>>>> +    char *event_string = "AMDGPURESET=1";
>>>> +    char *envp[2] = { event_string, NULL };
>>>> +
>>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>> As I said this approach is a clear NAK. We can't allocate any memory 
>>> when we do a reset.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Can I do something like this:
>>>
>>> void drm_sysfs_reset_event(struct drm_device *dev)
>>>    {
>>> -       char *event_string = "AMDGPURESET=1";
>>> -       char *envp[2] = { event_string, NULL };
>>> +       char **envp;
>>> +
>>> +       envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
>>> +       envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
>>> +       envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
>> No, not really. kobject_uevent_env() will still allocate memory 
>> without GFP_ATOMIC.
>>
>> I think the whole approach of using udev won't work for this.
>>
>> Regards,
>> Christian.
>>
>> I have tested it with sample applications:
>> Gpu reset:
>> sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>>
>> And I never missed the AMDGPURESET=1 event in user space,
>
> That's not the problem. Allocating memory when we need to do a reset 
> can cause a *HARD* kernel deadlock.
>
> This is absolutely not something we can do and Daniel even tried to 
> add a few lockdep annotations for this.
>
> So automated testing scripts will complain that this won't work.
>
> Regards,
> Christian.
Any suggestion how we can notify user space during this situation.

Regards,

S.Amarnath

>
>> even with continues resets with sudo cat 
>> /sys/kernel/debug/dri/0/amdgpu_gpu_recover .
>
>
>>
>> Regards,
>> S.Amarnath
>>> +
>>> +       strcpy(envp[0], "AMDGPURESET=1");
>>> +       strcpy(envp[1], "");
>>> +
>>>
>>> kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
>>> envp);
>>> +
>>> +       kfree(envp[0]);
>>> +       kfree(envp[1]);
>>> +       kfree(envp);
>>>    }
>>>
>>> Regards,
>>> S.Amarnath
>>>
>>>> +}
>>>> +
>>>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>>>> +    struct drm_device *ddev = adev_to_drm(adev);
>>>> +    int r = 0, i;
>>>> +
>>>> +    /* original raven doesn't have full asic reset */
>>>> +    if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>>> +        !(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>>> +        return;
>>>> +    for (i = 0; i < adev->num_ip_blocks; i++) {
>>>> +        if (!adev->ip_blocks[i].status.valid)
>>>> +            continue;
>>>> +        if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>>> +            continue;
>>>> +        r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>>> +
>>>> +        if (r)
>>>> +            DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>>>> + adev->ip_blocks[i].version->funcs->name, r);
>>>> +    }
>>>> +
>>>> +    drm_sysfs_reset_event(ddev);
>>>> +    dump_stack();
>>>> +}
>>>> +
>>>>     static int nv_asic_reset(struct amdgpu_device *adev)
>>>>     {
>>>>         int ret = 0;
>>>>     +    amdgpu_reset_dumps(adev);
>>>>         switch (nv_asic_reset_method(adev)) {
>>>>         case AMD_RESET_METHOD_PCI:
>>>>             dev_info(adev->dev, "PCI reset\n");
>

[-- Attachment #2: Type: text/html, Size: 12158 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 10:34           ` Somalapuram, Amaranath
@ 2022-01-17 10:37             ` Christian König
  2022-01-17 11:43               ` Sharma, Shashank
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 10:37 UTC (permalink / raw)
  To: Somalapuram, Amaranath, Somalapuram, Amaranath, amd-gfx
  Cc: Deucher, Alexander, Sharma, Shashank

[-- Attachment #1: Type: text/plain, Size: 518 bytes --]

Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
> [SNIP]
> Any suggestion how we can notify user space during this situation. 

Well trace points should work. They use a ring buffer and are specially 
made to work in such situations.

Alternative would be to audit the udev code and allow giving a GFP mask 
to the function to make sure that we don't run into the deadlock.

Another alternative is a debugfs file which just blocks for the next 
reset to happen.

Regards,
Christian.

> Regards,
>
> S.Amarnath
>


[-- Attachment #2: Type: text/html, Size: 1023 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 10:37             ` Christian König
@ 2022-01-17 11:43               ` Sharma, Shashank
  2022-01-17 11:57                 ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Sharma, Shashank @ 2022-01-17 11:43 UTC (permalink / raw)
  To: Christian König, Somalapuram, Amaranath, Somalapuram,
	Amaranath, amd-gfx
  Cc: Deucher, Alexander

Hello Christian,

On 1/17/2022 11:37 AM, Christian König wrote:
> Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
>> [SNIP]
>> Any suggestion how we can notify user space during this situation. 
> 
> Well trace points should work. They use a ring buffer and are specially 
> made to work in such situations.

We are anyways sending a trace event with data, but can trace event be a 
wake up source for an application (like a tracer) ? This DRM event is 
just to indicate that reset happened, so app has to read from trace file.

> 
> Alternative would be to audit the udev code and allow giving a GFP mask 
> to the function to make sure that we don't run into the deadlock.
> 
> Another alternative is a debugfs file which just blocks for the next 
> reset to happen.
> 

- Shashank

> Regards,
> Christian.
> 
>> Regards,
>>
>> S.Amarnath
>>
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17  6:33 [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset Somalapuram Amaranath
  2022-01-17  7:27 ` Christian König
@ 2022-01-17 11:54 ` Lazar, Lijo
  1 sibling, 0 replies; 13+ messages in thread
From: Lazar, Lijo @ 2022-01-17 11:54 UTC (permalink / raw)
  To: Somalapuram Amaranath, amd-gfx
  Cc: alexander.deucher, christian.koenig, shashank.sharma



On 1/17/2022 12:03 PM, Somalapuram Amaranath wrote:
> AMDGPURESET uevent added to notify userspace, collect dump_stack and trace
> 
> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>   1 file changed, 45 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 2ec1ffb36b1f..b73147ae41fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>   	}
>   }
>   
> +/**
> + * drm_sysfs_reset_event - generate a DRM uevent
> + * @dev: DRM device
> + *
> + * Send a uevent for the DRM device specified by @dev.  Currently we only
> + * set AMDGPURESET=1 in the uevent environment, but this could be expanded to
> + * deal with other types of events.
> + *
> + * Any new uapi should be using the drm_sysfs_connector_status_event()
> + * for uevents on connector status change.
> + */
> +void drm_sysfs_reset_event(struct drm_device *dev)
> +{
> +	char *event_string = "AMDGPURESET=1";
> +	char *envp[2] = { event_string, NULL };
> +
> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
> +}
> +
> +void amdgpu_reset_dumps(struct amdgpu_device *adev)
> +{
> +	struct drm_device *ddev = adev_to_drm(adev);
> +	int r = 0, i;
> +
> +	/* original raven doesn't have full asic reset */
> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
> +		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
> +		return;
> +	for (i = 0; i < adev->num_ip_blocks; i++) {
> +		if (!adev->ip_blocks[i].status.valid)
> +			continue;
> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
> +			continue;
> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
> +
> +		if (r)
> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
> +					adev->ip_blocks[i].version->funcs->name, r);
> +	}
> +
> +	drm_sysfs_reset_event(ddev);
> +	dump_stack();
> +}
> +
>   static int nv_asic_reset(struct amdgpu_device *adev)
>   {
>   	int ret = 0;
>   
> +	amdgpu_reset_dumps(adev);

Alex recently added a patch to reset GPU on suspend. It doesn't make 
sense to send an event in such cases, guess the original intention is 
for gpu recovery related cases.

Thanks,
Lijo

>   	switch (nv_asic_reset_method(adev)) {
>   	case AMD_RESET_METHOD_PCI:
>   		dev_info(adev->dev, "PCI reset\n");
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 11:43               ` Sharma, Shashank
@ 2022-01-17 11:57                 ` Christian König
  2022-01-17 14:19                   ` Somalapuram, Amaranath
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2022-01-17 11:57 UTC (permalink / raw)
  To: Sharma, Shashank, Somalapuram, Amaranath, Somalapuram, Amaranath,
	amd-gfx
  Cc: Deucher, Alexander

Am 17.01.22 um 12:43 schrieb Sharma, Shashank:
> Hello Christian,
>
> On 1/17/2022 11:37 AM, Christian König wrote:
>> Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
>>> [SNIP]
>>> Any suggestion how we can notify user space during this situation. 
>>
>> Well trace points should work. They use a ring buffer and are 
>> specially made to work in such situations.
>
> We are anyways sending a trace event with data, but can trace event be 
> a wake up source for an application (like a tracer) ? This DRM event 
> is just to indicate that reset happened, so app has to read from trace 
> file.

Yeah, that's a really good question I can't fully answer. As far as I 
know you can filter in userspace what you get from the tracer, but I 
don't know if you can block on a specific event.

Christian.

>
>>
>> Alternative would be to audit the udev code and allow giving a GFP 
>> mask to the function to make sure that we don't run into the deadlock.
>>
>> Another alternative is a debugfs file which just blocks for the next 
>> reset to happen.
>>
>
> - Shashank
>
>> Regards,
>> Christian.
>>
>>> Regards,
>>>
>>> S.Amarnath
>>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 11:57                 ` Christian König
@ 2022-01-17 14:19                   ` Somalapuram, Amaranath
  2022-01-17 16:07                     ` Christian König
  0 siblings, 1 reply; 13+ messages in thread
From: Somalapuram, Amaranath @ 2022-01-17 14:19 UTC (permalink / raw)
  To: Christian König, Sharma, Shashank, Somalapuram, Amaranath, amd-gfx
  Cc: Deucher, Alexander

Hi Christian,

if sending udev event during reset is going to create problem, we can 
move this code from reset sequence to re-int  (after GPU reset succeeded).

Regards,

S.Amarnath

On 1/17/2022 5:27 PM, Christian König wrote:
> Am 17.01.22 um 12:43 schrieb Sharma, Shashank:
>> Hello Christian,
>>
>> On 1/17/2022 11:37 AM, Christian König wrote:
>>> Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
>>>> [SNIP]
>>>> Any suggestion how we can notify user space during this situation. 
>>>
>>> Well trace points should work. They use a ring buffer and are 
>>> specially made to work in such situations.
>>
>> We are anyways sending a trace event with data, but can trace event 
>> be a wake up source for an application (like a tracer) ? This DRM 
>> event is just to indicate that reset happened, so app has to read 
>> from trace file.
>
> Yeah, that's a really good question I can't fully answer. As far as I 
> know you can filter in userspace what you get from the tracer, but I 
> don't know if you can block on a specific event.
>
> Christian.
>
>>
>>>
>>> Alternative would be to audit the udev code and allow giving a GFP 
>>> mask to the function to make sure that we don't run into the deadlock.
>>>
>>> Another alternative is a debugfs file which just blocks for the next 
>>> reset to happen.
>>>
>>
>> - Shashank
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>>
>>>> S.Amarnath
>>>>
>>>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
  2022-01-17 14:19                   ` Somalapuram, Amaranath
@ 2022-01-17 16:07                     ` Christian König
  0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2022-01-17 16:07 UTC (permalink / raw)
  To: Somalapuram, Amaranath, Sharma, Shashank, Somalapuram, Amaranath,
	amd-gfx
  Cc: Deucher, Alexander

Hi Amarnath,

not a bad idea, but that won't work either because you really need to 
return to make sure that a potential next reset can run.

What could be done is to have a work item which does that, but then I 
think it would just be easier to teach the udev function a GFP mask.

Regards,
Christian.


Am 17.01.22 um 15:19 schrieb Somalapuram, Amaranath:
> Hi Christian,
>
> if sending udev event during reset is going to create problem, we can 
> move this code from reset sequence to re-int  (after GPU reset 
> succeeded).
>
> Regards,
>
> S.Amarnath
>
> On 1/17/2022 5:27 PM, Christian König wrote:
>> Am 17.01.22 um 12:43 schrieb Sharma, Shashank:
>>> Hello Christian,
>>>
>>> On 1/17/2022 11:37 AM, Christian König wrote:
>>>> Am 17.01.22 um 11:34 schrieb Somalapuram, Amaranath:
>>>>> [SNIP]
>>>>> Any suggestion how we can notify user space during this situation. 
>>>>
>>>> Well trace points should work. They use a ring buffer and are 
>>>> specially made to work in such situations.
>>>
>>> We are anyways sending a trace event with data, but can trace event 
>>> be a wake up source for an application (like a tracer) ? This DRM 
>>> event is just to indicate that reset happened, so app has to read 
>>> from trace file.
>>
>> Yeah, that's a really good question I can't fully answer. As far as I 
>> know you can filter in userspace what you get from the tracer, but I 
>> don't know if you can block on a specific event.
>>
>> Christian.
>>
>>>
>>>>
>>>> Alternative would be to audit the udev code and allow giving a GFP 
>>>> mask to the function to make sure that we don't run into the deadlock.
>>>>
>>>> Another alternative is a debugfs file which just blocks for the 
>>>> next reset to happen.
>>>>
>>>
>>> - Shashank
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Regards,
>>>>>
>>>>> S.Amarnath
>>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-01-17 16:07 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-17  6:33 [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset Somalapuram Amaranath
2022-01-17  7:27 ` Christian König
2022-01-17 10:01   ` Somalapuram, Amaranath
2022-01-17 10:03     ` Christian König
2022-01-17 10:09       ` Somalapuram, Amaranath
2022-01-17 10:19         ` Christian König
2022-01-17 10:34           ` Somalapuram, Amaranath
2022-01-17 10:37             ` Christian König
2022-01-17 11:43               ` Sharma, Shashank
2022-01-17 11:57                 ` Christian König
2022-01-17 14:19                   ` Somalapuram, Amaranath
2022-01-17 16:07                     ` Christian König
2022-01-17 11:54 ` Lazar, Lijo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.