From: "Liu, Monk" <Monk.Liu-5C7GfCeVMHo@public.gmane.org>
To: "Koenig,
	Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: RE: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
Date: Wed, 20 Sep 2017 02:27:21 +0000
Message-ID: <BLUPR12MB04498A808CB3E4513CC4637C84610@BLUPR12MB0449.namprd12.prod.outlook.com>
In-Reply-To: <45fa4145-41a4-6186-4f35-4f3347bad601-5C7GfCeVMHo@public.gmane.org>

Oh, I see your point, but that actually calls for a separate cleanup patch. Mine adds a condition to fix the memory leak. I think the two serve different purposes and should be kept separate.

I can add one more patch that cleans this up with "bo_create_kernel" to make the code tighter and cleaner.

BR Monk

-----Original Message-----
From: Koenig, Christian 
Sent: September 18, 2017 19:35
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset

On 18.09.2017 at 12:47, Liu, Monk wrote:
> I didn't get your point... how could bo_create_kernel solve my issue ?

It doesn't solve the underlying issue, you just need less code for your workaround.

With bo_create_kernel you can do create/pin/kmap in just one function call.

>
> The thing here is that during GPU reset we invoke hw_init for every hw
> component, and by design hw_init shouldn't do anything software
> related, so allocating the BO in hw_init is wrong,

Yeah, but your patch doesn't fix that either as far as I can see.

> Even switching to bo_create_kernel won't address the issue ...

See the implementation of bo_create_kernel():
>         if (!*bo_ptr) {
>                 r = amdgpu_bo_create(adev, size, align, true, domain,
....
>         }
....
>         r = amdgpu_bo_pin(*bo_ptr, domain, gpu_addr);
...
>         if (cpu_addr) {
>                 r = amdgpu_bo_kmap(*bo_ptr, cpu_addr);
...
>         }

Creating is actually optional, but the function always pins the BO once more and figures out its CPU address.

As far as I can see that should solve your problem for now.
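
To illustrate the "create only if missing, always pin and map" flow quoted above, here is a minimal user-space sketch. It is not the kernel implementation: struct mock_bo, mock_bo_create_kernel(), mock_bo_free() and mock_allocations are hypothetical names invented for this example, and pinning is reduced to a counter.

```c
#include <assert.h>	/* only needed when checking the behaviour as in the note below */
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct amdgpu_bo; not the kernel type. */
struct mock_bo {
	size_t size;	/* size of the backing storage in bytes */
	int pin_count;	/* how many times the buffer has been pinned */
	void *backing;	/* stands in for the kmap'ed CPU address */
};

/* Counts real allocations so a caller can check that nothing leaked. */
static int mock_allocations;

/*
 * Mirrors the quoted flow: allocate only when *bo_ptr is still NULL,
 * then always pin once more and report the CPU address if requested.
 */
static int mock_bo_create_kernel(size_t size, struct mock_bo **bo_ptr,
				 void **cpu_addr)
{
	if (!*bo_ptr) {
		struct mock_bo *bo = calloc(1, sizeof(*bo));

		if (!bo)
			return -ENOMEM;
		bo->backing = calloc(1, size);
		if (!bo->backing) {
			free(bo);
			return -ENOMEM;
		}
		bo->size = size;
		*bo_ptr = bo;
		mock_allocations++;
	}

	(*bo_ptr)->pin_count++;		/* the "pin" step, run on every call */

	if (cpu_addr)
		*cpu_addr = (*bo_ptr)->backing;	/* the "kmap" step */

	return 0;
}

/* Releases the mock buffer and clears the caller's pointer. */
static void mock_bo_free(struct mock_bo **bo_ptr)
{
	if (!*bo_ptr)
		return;
	free((*bo_ptr)->backing);
	free(*bo_ptr);
	*bo_ptr = NULL;
}
```

Calling mock_bo_create_kernel() twice with the same bo pointer, as a second hw_init during reset would, allocates only once and hands back the same CPU address. In this mock the pin counter still goes up on each call, which matches the "pins the BO once more" behaviour described above.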

Christian.


>
>
> BR Monk
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: September 18, 2017 17:13
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
>
> On 18.09.2017 at 08:11, Monk Liu wrote:
>> doing gpu reset will rerun all hw_init and thus ucode_init_bo is 
>> invoked again, so we need to skip the fw_buf allocation during sriov 
>> gpu reset to avoid memory leak.
>>
>> Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  3 ++
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++++++++++++++----------------
>>    2 files changed, 35 insertions(+), 32 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 6ff2959..3d0c633 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1185,6 +1185,9 @@ struct amdgpu_firmware {
>>    
>>    	/* gpu info firmware data pointer */
>>    	const struct firmware *gpu_info_fw;
>> +
>> +	void *fw_buf_ptr;
>> +	uint64_t fw_buf_mc;
>>    };
>>    
>>    /*
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> index f306374..6564902 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
>>    int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    {
>>    	struct amdgpu_bo **bo = &adev->firmware.fw_buf;
>> -	uint64_t fw_mc_addr;
>> -	void *fw_buf_ptr = NULL;
>>    	uint64_t fw_offset = 0;
>>    	int i, err;
>>    	struct amdgpu_firmware_info *ucode = NULL;
>> @@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    		return 0;
>>    	}
>>    
>> -	err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
>> -				amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> -				AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
>> -				NULL, NULL, 0, bo);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
>> -		goto failed;
>> -	}
>> +	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {
> Instead of all this, better use amdgpu_bo_create_kernel(); it should already include most of the handling necessary here.
>
> Christian.
>
>> +		err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
>> +					amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> +					AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
>> +					NULL, NULL, 0, bo);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
>> +			goto failed;
>> +		}
>>    
>> -	err = amdgpu_bo_reserve(*bo, false);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
>> -		goto failed_reserve;
>> -	}
>> +		err = amdgpu_bo_reserve(*bo, false);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
>> +			goto failed_reserve;
>> +		}
>>    
>> -	err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> -				&fw_mc_addr);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
>> -		goto failed_pin;
>> -	}
>> +		err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> +					&adev->firmware.fw_buf_mc);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
>> +			goto failed_pin;
>> +		}
>>    
>> -	err = amdgpu_bo_kmap(*bo, &fw_buf_ptr);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
>> -		goto failed_kmap;
>> -	}
>> +		err = amdgpu_bo_kmap(*bo, &adev->firmware.fw_buf_ptr);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
>> +			goto failed_kmap;
>> +		}
>>    
>> -	amdgpu_bo_unreserve(*bo);
>> +		amdgpu_bo_unreserve(*bo);
>> +	}
>>    
>> -	memset(fw_buf_ptr, 0, adev->firmware.fw_size);
>> +	memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size);
>>    
>>    	/*
>>    	 * if SMU loaded firmware, it needn't add SMC, UVD, and VCE
>> @@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    		ucode = &adev->firmware.ucode[i];
>>    		if (ucode->fw) {
>>    			header = (const struct common_firmware_header *)ucode->fw->data;
>> -			amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + fw_offset,
>> -						    (void *)((uint8_t *)fw_buf_ptr + fw_offset));
>> +			amdgpu_ucode_init_single_fw(adev, ucode, adev->firmware.fw_buf_mc + fw_offset,
>> +						    adev->firmware.fw_buf_ptr + fw_offset);
>>    			if (i == AMDGPU_UCODE_ID_CP_MEC1 &&
>>    			    adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
>>    				const struct gfx_firmware_header_v1_0 *cp_hdr;
>>    				cp_hdr = (const struct gfx_firmware_header_v1_0 *)ucode->fw->data;
>> -				amdgpu_ucode_patch_jt(ucode, fw_mc_addr + fw_offset,
>> -						    fw_buf_ptr + fw_offset);
>> +				amdgpu_ucode_patch_jt(ucode,  adev->firmware.fw_buf_mc + fw_offset,
>> +						    adev->firmware.fw_buf_ptr + fw_offset);
>>    				fw_offset += ALIGN(le32_to_cpu(cp_hdr->jt_size) << 2, PAGE_SIZE);
>>    			}
>>    			fw_offset += ALIGN(ucode->ucode_size, PAGE_SIZE);
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
