From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: "Deng, Emily" <Emily.Deng@amd.com>,
	"Chen, Jiansong (Simon)" <Jiansong.Chen@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
Date: Tue, 30 Mar 2021 10:37:56 +0200
Message-ID: <df01f5bf-b4f0-9e93-0ef7-0caee4633fb7@gmail.com>
In-Reply-To: <BY5PR12MB41156DDEF8EAF5CE77B2AD598F7D9@BY5PR12MB4115.namprd12.prod.outlook.com>

Hi Emily,

As I said, add a WARN_ON() and look at the backtrace.

It could be that the backtrace then just shows the general cleanup 
functions, but it is at least a start.
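
For illustration only, a minimal userspace sketch of that idea. In the
kernel, WARN_ON() prints a warning with a backtrace when its condition is
true and then lets execution continue. The names fake_adev, FAKE_WARN_ON
and fake_copy_mem_to_mem below are hypothetical stand-ins, not amdgpu code,
and the stand-in macro only reports file and line instead of a backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct amdgpu_device; only the flag used here. */
struct fake_adev {
	bool shutdown;
};

/* Counts how often the warning fired, so the effect can be observed. */
static int warn_count;

/* Report the failing condition and its location, then return the condition. */
static bool fake_warn_on(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

/* Userspace imitation of WARN_ON(); the real macro also dumps a backtrace. */
#define FAKE_WARN_ON(cond) fake_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/*
 * Hypothetical copy path: warn when a move is requested during shutdown,
 * but do not skip the move, so the caller that triggers it becomes visible.
 */
static int fake_copy_mem_to_mem(struct fake_adev *adev)
{
	assert(adev != NULL);
	FAKE_WARN_ON(adev->shutdown);
	/* ... the actual copy would happen here ... */
	return 0;
}
```

Calling fake_copy_mem_to_mem() with shutdown set prints one warning and
still returns 0. The point of the design is that the warning identifies
who requested the move instead of hiding it.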

On the other hand, if you only see this sometimes, then we have some kind
of race condition and need to dig deeper.

Christian.

On 30.03.21 at 10:19, Deng, Emily wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Christian,
>       Yes, I agree with both of you. But the issue occurs randomly, only during driver unload, and at a fairly low rate, so it is hard to find where the memory leak is.
> Could you give some suggestions on how to debug this issue?
>
>
> Best wishes
> Emily Deng
>
>
>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>> Sent: Tuesday, March 30, 2021 3:11 PM
>> To: Deng, Emily <Emily.Deng@amd.com>; Chen, Jiansong (Simon)
>> <Jiansong.Chen@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>
>> Good morning,
>>
>> Yes, Jiansong is right: that patch is really not a good idea.
>>
>> Moving buffers can indeed happen during shutdown while some memory is
>> still referenced.
>>
>> Just ignoring the move is not the right approach; you need to find out why the
>> memory is moved in the first place.
>>
>> You could add something like WARN_ON(adev->shutdown);
>>
>> Regards,
>> Christian.
>>
>> On 30.03.21 at 09:05, Deng, Emily wrote:
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> Hi Jiansong,
>>>        It does happen; maybe there is a race condition?
>>>
>>>
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Chen, Jiansong (Simon) <Jiansong.Chen@amd.com>
>>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>
>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>> I still wonder how the issue takes place. As far as I understand the
>>>> driver model, the reference count of the kobject for the device will
>>>> not reach zero while there is still some device memory access, so
>>>> shutdown should not happen.
>>>>
>>>> Regards,
>>>> Jiansong
>>>> -----Original Message-----
>>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>>>> Emily Deng
>>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>>> To: amd-gfx@lists.freedesktop.org
>>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>
>>>> During driver unload there is no need to copy memory; doing so
>>>> triggers call traces. For example, once sa_manager has been freed,
>>>> the copy triggers a warning call trace in amdgpu_sa_bo_new.
>>>>
>>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index e00263bcc88b..f0546a489e0d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>  	struct dma_fence *fence = NULL;
>>>>  	int r = 0;
>>>>
>>>> +	if (adev->shutdown)
>>>> +		return 0;
>>>> +
>>>>  	if (!adev->mman.buffer_funcs_enabled) {
>>>>  		DRM_ERROR("Trying to move memory with ring turned off.\n");
>>>>  		return -EINVAL;
>>>> --
>>>> 2.25.1
>>>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

