All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: "Somalapuram, Amaranath" <Amaranath.Somalapuram@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Sharma, Shashank" <Shashank.Sharma@amd.com>
Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
Date: Mon, 17 Jan 2022 11:19:42 +0100	[thread overview]
Message-ID: <62372e77-31c8-211f-0d9c-01a0f880badf@amd.com> (raw)
In-Reply-To: <DM6PR12MB3897D33226DBA88264B5BBF8F8579@DM6PR12MB3897.namprd12.prod.outlook.com>

Am 17.01.22 um 11:09 schrieb Somalapuram, Amaranath:
> [AMD Official Use Only]
>
>
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Monday, January 17, 2022 3:33 PM
> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank <Shashank.Sharma@amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset
>
> Am 17.01.22 um 11:01 schrieb Somalapuram, Amaranath:
>> [AMD Official Use Only]
>>
>>
>>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: Monday, January 17, 2022 12:57 PM
>> To: Somalapuram, Amaranath <Amaranath.Somalapuram@amd.com>;
>> amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Sharma, Shashank
>> <Shashank.Sharma@amd.com>
>> Subject: Re: [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU
>> reset
>>
>> Am 17.01.22 um 07:33 schrieb Somalapuram Amaranath:
>>> AMDGPURESET uevent added to notify userspace, collect dump_stack and
>>> trace
>>>
>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/nv.c | 45 +++++++++++++++++++++++++++++++++
>>>     1 file changed, 45 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> b/drivers/gpu/drm/amd/amdgpu/nv.c index 2ec1ffb36b1f..b73147ae41fb
>>> 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>> @@ -529,10 +529,55 @@ nv_asic_reset_method(struct amdgpu_device *adev)
>>>     	}
>>>     }
>>>     
>>> +/**
>>> + * drm_sysfs_reset_event - generate a DRM uevent
>>> + * @dev: DRM device
>>> + *
>>> + * Send a uevent for the DRM device specified by @dev.  Currently we
>>> +only
>>> + * set AMDGPURESET=1 in the uevent environment, but this could be
>>> +expanded to
>>> + * deal with other types of events.
>>> + *
>>> + * Any new uapi should be using the
>>> +drm_sysfs_connector_status_event()
>>> + * for uevents on connector status change.
>>> + */
>>> +void drm_sysfs_reset_event(struct drm_device *dev)
>> This should probably go directly into the DRM subsystem.
>>
>>> +{
>>> +	char *event_string = "AMDGPURESET=1";
>>> +	char *envp[2] = { event_string, NULL };
>>> +
>>> +	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>> As I said this approach is a clear NAK. We can't allocate any memory when we do a reset.
>>
>> Regards,
>> Christian.
>>
>> Can I do something like this:
>>
>> void drm_sysfs_reset_event(struct drm_device *dev)
>>    {
>> -       char *event_string = "AMDGPURESET=1";
>> -       char *envp[2] = { event_string, NULL };
>> +       char **envp;
>> +
>> +       envp = kcalloc(2,sizeof(char *), GFP_ATOMIC);
>> +       envp[0] = kcalloc(30, sizeof(char), GFP_ATOMIC);
>> +       envp[1] = kcalloc(30, sizeof(char), GFP_ATOMIC);
> No, not really. kobject_uevent_env() will still allocate memory without GFP_ATOMIC.
>
> I think the whole approach of using udev won't work for this.
>
> Regards,
> Christian.
>
> I have tested it with sample applications:
> Gpu reset:
> sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover
>
> And I never missed the AMDGPURESET=1 event in user space,

That's not the problem. Allocating memory when we need to do a reset can 
cause a *HARD* kernel deadlock.

This is absolutely not something we can do and Daniel even tried to add 
a few lockdep annotations for this.

So automated testing scripts will complain that this won't work.

Regards,
Christian.

> even with continues resets with sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover .


>
> Regards,
> S.Amarnath
>> +
>> +       strcpy(envp[0], "AMDGPURESET=1");
>> +       strcpy(envp[1], "");
>> +
>>
>>           kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
>> envp);
>> +
>> +       kfree(envp[0]);
>> +       kfree(envp[1]);
>> +       kfree(envp);
>>    }
>>
>> Regards,
>> S.Amarnath
>>
>>> +}
>>> +
>>> +void amdgpu_reset_dumps(struct amdgpu_device *adev) {
>>> +	struct drm_device *ddev = adev_to_drm(adev);
>>> +	int r = 0, i;
>>> +
>>> +	/* original raven doesn't have full asic reset */
>>> +	if ((adev->apu_flags & AMD_APU_IS_RAVEN) &&
>>> +		!(adev->apu_flags & AMD_APU_IS_RAVEN2))
>>> +		return;
>>> +	for (i = 0; i < adev->num_ip_blocks; i++) {
>>> +		if (!adev->ip_blocks[i].status.valid)
>>> +			continue;
>>> +		if (!adev->ip_blocks[i].version->funcs->reset_reg_dumps)
>>> +			continue;
>>> +		r = adev->ip_blocks[i].version->funcs->reset_reg_dumps(adev);
>>> +
>>> +		if (r)
>>> +			DRM_ERROR("reset_reg_dumps of IP block <%s> failed %d\n",
>>> +					adev->ip_blocks[i].version->funcs->name, r);
>>> +	}
>>> +
>>> +	drm_sysfs_reset_event(ddev);
>>> +	dump_stack();
>>> +}
>>> +
>>>     static int nv_asic_reset(struct amdgpu_device *adev)
>>>     {
>>>     	int ret = 0;
>>>     
>>> +	amdgpu_reset_dumps(adev);
>>>     	switch (nv_asic_reset_method(adev)) {
>>>     	case AMD_RESET_METHOD_PCI:
>>>     		dev_info(adev->dev, "PCI reset\n");


  reply	other threads:[~2022-01-17 10:20 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-17  6:33 [PATCH 2/2] drm/amdgpu: add AMDGPURESET uevent on AMD GPU reset Somalapuram Amaranath
2022-01-17  7:27 ` Christian König
2022-01-17 10:01   ` Somalapuram, Amaranath
2022-01-17 10:03     ` Christian König
2022-01-17 10:09       ` Somalapuram, Amaranath
2022-01-17 10:19         ` Christian König [this message]
2022-01-17 10:34           ` Somalapuram, Amaranath
2022-01-17 10:37             ` Christian König
2022-01-17 11:43               ` Sharma, Shashank
2022-01-17 11:57                 ` Christian König
2022-01-17 14:19                   ` Somalapuram, Amaranath
2022-01-17 16:07                     ` Christian König
2022-01-17 11:54 ` Lazar, Lijo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62372e77-31c8-211f-0d9c-01a0f880badf@amd.com \
    --to=christian.koenig@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Amaranath.Somalapuram@amd.com \
    --cc=Shashank.Sharma@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.