* [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
@ 2017-06-06  6:24 Huang Rui
       [not found] ` <1496730269-27140-1-git-send-email-ray.huang-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Huang Rui @ 2017-06-06  6:24 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Alex Deucher,
	Christian König
  Cc: Ken Wang, Huang Rui, Alvin Huan

The gpu_info firmware is released as soon as its data has been used. However,
when the system enters suspend, the upper-level class driver caches all
firmware names. At that point, loading gpu_info fails. This appears to be an
issue in the upper-level class driver, so we should not release the gpu_info
firmware until the device is finalized.

[  903.236589] cache_firmware: amdgpu/vega10_sdma1.bin
[  903.236590] fw_set_page_data: fw-amdgpu/vega10_sdma1.bin buf=ffff88041eee10c0 data=ffffc90002561000 size=17408
[  903.236591] cache_firmware: amdgpu/vega10_sdma1.bin ret=0
[  903.464160] __allocate_fw_buf: fw-amdgpu/vega10_gpu_info.bin buf=ffff88041eee2c00
[  903.471815] (NULL device *): loading /lib/firmware/updates/4.11.0-custom/amdgpu/vega10_gpu_info.bin failed with error -2
[  903.482870] (NULL device *): loading /lib/firmware/updates/amdgpu/vega10_gpu_info.bin failed with error -2
[  903.492716] (NULL device *): loading /lib/firmware/4.11.0-custom/amdgpu/vega10_gpu_info.bin failed with error -2
[  903.503156] (NULL device *): direct-loading amdgpu/vega10_gpu_info.bin

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 18 ++++++++++--------
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8658643..54ee050 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1272,6 +1272,9 @@ struct amdgpu_firmware {
 	const struct amdgpu_psp_funcs *funcs;
 	struct amdgpu_bo *rbuf;
 	struct mutex mutex;
+
+	/* gpu info firmware data pointer */
+	const struct firmware *fw;
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6883fe1..af8f8b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1426,12 +1426,13 @@ static void amdgpu_device_enable_virtual_display(struct amdgpu_device *adev)
 
 static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
 {
-	const struct firmware *fw;
 	const char *chip_name;
 	char fw_name[30];
 	int err;
 	const struct gpu_info_firmware_header_v1_0 *hdr;
 
+	adev->firmware.fw = NULL;
+
 	switch (adev->asic_type) {
 	case CHIP_TOPAZ:
 	case CHIP_TONGA:
@@ -1466,14 +1467,14 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
 	}
 
 	snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_gpu_info.bin", chip_name);
-	err = request_firmware(&fw, fw_name, adev->dev);
+	err = request_firmware(&adev->firmware.fw, fw_name, adev->dev);
 	if (err) {
 		dev_err(adev->dev,
 			"Failed to load gpu_info firmware \"%s\"\n",
 			fw_name);
 		goto out;
 	}
-	err = amdgpu_ucode_validate(fw);
+	err = amdgpu_ucode_validate(adev->firmware.fw);
 	if (err) {
 		dev_err(adev->dev,
 			"Failed to validate gpu_info firmware \"%s\"\n",
@@ -1481,14 +1482,14 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
 		goto out;
 	}
 
-	hdr = (const struct gpu_info_firmware_header_v1_0 *)fw->data;
+	hdr = (const struct gpu_info_firmware_header_v1_0 *)adev->firmware.fw->data;
 	amdgpu_ucode_print_gpu_info_hdr(&hdr->header);
 
 	switch (hdr->version_major) {
 	case 1:
 	{
 		const struct gpu_info_firmware_v1_0 *gpu_info_fw =
-			(const struct gpu_info_firmware_v1_0 *)(fw->data +
+			(const struct gpu_info_firmware_v1_0 *)(adev->firmware.fw->data +
 								le32_to_cpu(hdr->header.ucode_array_offset_bytes));
 
 		adev->gfx.config.max_shader_engines = gpu_info_fw->gc_num_se;
@@ -1513,9 +1514,6 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev)
 		goto out;
 	}
 out:
-	release_firmware(fw);
-	fw = NULL;
-
 	return err;
 }
 
@@ -2313,6 +2311,10 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 	amdgpu_fence_driver_fini(adev);
 	amdgpu_fbdev_fini(adev);
 	r = amdgpu_fini(adev);
+	if (adev->firmware.fw) {
+		release_firmware(adev->firmware.fw);
+		adev->firmware.fw = NULL;
+	}
 	adev->accel_working = false;
 	/* free i2c buses */
 	if (!amdgpu_device_has_dc_support(adev))
-- 
2.7.4
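
The lifetime change made by the patch above can be illustrated with a minimal
userspace sketch. It is not driver code: every mock_* name, the live_images
counter, and the one-byte "firmware" layout are assumptions made up for this
illustration. The only point is that the image is requested at init, kept in
the device structure, and released at fini rather than right after parsing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct firmware (hypothetical mock). */
struct mock_firmware {
	size_t size;
	unsigned char *data;
};

/* Stand-in for the device; only the pointer the patch adds is modelled. */
struct mock_device {
	const struct mock_firmware *fw;	/* held from init until fini */
	unsigned int max_shader_engines;
};

static int live_images;	/* number of firmware images currently held */

/* Copy a blob into a freshly allocated image; returns 0 on success. */
static int mock_request_firmware(const struct mock_firmware **out,
				 const unsigned char *blob, size_t size)
{
	struct mock_firmware *fw;

	*out = NULL;
	if (size == 0)
		return -1;
	fw = malloc(sizeof(*fw));
	if (!fw)
		return -1;
	fw->data = malloc(size);
	if (!fw->data) {
		free(fw);
		return -1;
	}
	memcpy(fw->data, blob, size);
	fw->size = size;
	live_images++;
	*out = fw;
	return 0;
}

static void mock_release_firmware(const struct mock_firmware *fw)
{
	if (!fw)
		return;
	free(fw->data);
	free((void *)fw);
	live_images--;
}

/* Parse at init but, unlike the old code, do not release the image here. */
int mock_device_init(struct mock_device *dev,
		     const unsigned char *blob, size_t size)
{
	int err;

	dev->fw = NULL;
	err = mock_request_firmware(&dev->fw, blob, size);
	if (err)
		return err;
	dev->max_shader_engines = dev->fw->data[0];
	return 0;
}

/* Release exactly once at teardown; safe to call again afterwards. */
void mock_device_fini(struct mock_device *dev)
{
	if (dev->fw) {
		mock_release_firmware(dev->fw);
		dev->fw = NULL;
	}
}
```

Clearing the pointer after release mirrors the NULL check added to
amdgpu_device_fini() in the patch, so a second teardown call is harmless.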

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
       [not found] ` <1496730269-27140-1-git-send-email-ray.huang-5C7GfCeVMHo@public.gmane.org>
@ 2017-06-06  8:00   ` Christian König
       [not found]     ` <9f43e1be-6d92-cba9-80e0-274d07109c5f-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2017-06-06  8:00 UTC (permalink / raw)
  To: Huang Rui, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Alex Deucher, Christian König
  Cc: Ken Wang, Alvin Huan

Hi Ray,

mhm, indeed a nice catch.

But why do we need to load the gpu info after resume in the first place?

I mean we already know what GPU we have, loading it again looks 
superfluous to me.

Regards,
Christian.


* Re: [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
       [not found]     ` <9f43e1be-6d92-cba9-80e0-274d07109c5f-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-06-06  8:33       ` Huang Rui
  2017-06-06 11:22         ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Huang Rui @ 2017-06-06  8:33 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, Huan, Alvin, Wang, Ken, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Tue, Jun 06, 2017 at 04:00:29PM +0800, Christian König wrote:
> Hi Ray,
> 
> mhm, indeed a nice catch.
> 
> But why do we need to load the gpu info after resume in the first place?
> 
> I mean we already know what GPU we have, loading it again looks
> superfluous to me.
> 

Yes, I agree with you. That was also my original opinion.
But we encountered a random bug when calling
device_cache_fw_images.

[  558.288976] cache_firmware: amdgpu/vega10_sdma1.bin
[  558.288976] cache_firmware: amdgpu/vega10_sdma.bin ret=0
[  558.288981] fw_set_page_data: fw-amdgpu/vega10_sdma1.bin buf=ffff8803f1e64a80 data=ffffc90002411000 size=17408
[  558.288981] cache_firmware: amdgpu/vega10_sdma1.bin ret=0
[  558.288997] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  558.289001] IP: devres_for_each_res+0x5e/0x100
[  558.289001] PGD 0
[  558.289002] Oops: 0000 [#3] SMP
[  558.289003] Modules linked in: joydev hid_generic usbhid amdgpu(OE) ttm(OE) drm_kms_helper(OE) drm(OE) i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt rpcsec_gss_krb5 nfsv4 nfs fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core intel_rapl snd_hwdep x86_pkg_temp_thermal intel_powerclamp snd_pcm kvm_intel snd_seq_midi kvm snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer irqbypass snd crct10dif_pclmul soundcore crc32_pclmul ghash_clmulni_intel pcbc mei_me aesni_intel shpchp mei aes_x86_64 crypto_simd glue_helper mac_hid cryptd acpi_pad tpm_infineon nfsd auth_rpcgss nfs_acl coretemp lockd grace sunrpc parport_pc ppdev lp parport autofs4 e1000e ptp nvme mxm_wmi ahci i2c_hid pps_core libahci nvme_core wmi video hid
[  558.289027] CPU: 0 PID: 3742 Comm: pm-suspend Tainted: G      D    OE   4.11.0-custom #7
[  558.289027] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016
[  558.289027] task: ffff8803ebdcd940 task.stack: ffffc900029b0000
[  558.289029] RIP: 0010:devres_for_each_res+0x5e/0x100
[  558.289029] RSP: 0018:ffffc900029b3bc8 EFLAGS: 00010086
[  558.289030] RAX: 000000000000001d RBX: ffff880426aa0c18 RCX: 0000000000000000
[  558.289030] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000092
[  558.289031] RBP: ffffc900029b3c20 R08: 000000000000001d R09: ffffffff821ee601
[  558.289031] R10: 000000000000141d R11: 0000000000000000 R12: ffffffff81566590
[  558.289032] R13: ffffffff81566870 R14: ffffc900029b3c30 R15: ffff880426aa0e98
[  558.289032] FS:  00007f63006c3700(0000) GS:ffff88043ec00000(0000) knlGS:0000000000000000
[  558.289033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  558.289033] CR2: 0000000000000008 CR3: 00000003f257e000 CR4: 00000000003406f0
[  558.289034] Call Trace:
[  558.289036]  ? alloc_fw_cache_entry+0x60/0x60
[  558.289037]  ? request_firmware_nowait+0x140/0x140
[  558.289038]  dev_cache_fw_image+0x46/0x120
[  558.289039]  ? request_firmware_nowait+0x140/0x140
[  558.289040]  dpm_for_each_dev+0x44/0x70
[  558.289041]  fw_pm_notify+0x164/0x190
[  558.289043]  ? prepare_to_wait_event+0x110/0x110
[  558.289044]  notifier_call_chain+0x49/0x70
[  558.289046]  __blocking_notifier_call_chain+0x4d/0x70
[  558.289047]  __pm_notifier_call_chain+0x1f/0x40
[  558.289047]  pm_suspend+0x27f/0x3a0
[  558.289048]  state_store+0x80/0xf0
[  558.289050]  kobj_attr_store+0xf/0x20
[  558.289051]  sysfs_kf_write+0x3a/0x50
[  558.289053]  kernfs_fop_write+0xff/0x180
[  558.289054]  __vfs_write+0x28/0x120
[  558.289056]  ? apparmor_file_permission+0x1a/0x20

So I then checked these functions and found the gpu_info errors. The random
bug cannot be reproduced consistently, but we expect the system to pass more
than 30 cycles of S3 suspend and resume. Any ideas?
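
As a side note on reading the trace above: a fault address of
0000000000000008 is what a 64-bit kernel reports when code reads a field at
offset 8 through a NULL base pointer. The sketch below only shows that
arithmetic with a two-pointer list node. The mock_list_head type and the
helper are assumptions for illustration, and this is not a diagnosis of
which pointer was NULL in devres_for_each_res.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-pointer list node, laid out like a typical doubly linked list head. */
struct mock_list_head {
	struct mock_list_head *next;
	struct mock_list_head *prev;
};

/*
 * Address that would be touched when reading `prev` through a given base
 * address. Nothing is dereferenced here; this is pure address arithmetic.
 */
uintptr_t mock_prev_field_address(uintptr_t base)
{
	return base + offsetof(struct mock_list_head, prev);
}
```

With a base of 0, the helper returns sizeof(void *), which is 8 on x86-64
and matches the CR2 value in the oops.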

Thanks,
Ray

* Re: [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
  2017-06-06  8:33       ` Huang Rui
@ 2017-06-06 11:22         ` Christian König
       [not found]           ` <33c86a80-454e-7fb2-2e25-0c9a686bb3da-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2017-06-06 11:22 UTC (permalink / raw)
  To: Huang Rui, Christian König
  Cc: Deucher, Alexander, Huan, Alvin, Wang, Ken,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

> Yes, I agree with you. That was also my original opinion.
> But we encountered a random bug when calling
> device_cache_fw_images.
That looks like an upstream bug in device_cache_fw_images.

We should probably open a bug report and ping the maintainer. Most
likely we are either not using the FW interface correctly or triggering
a rare bug, or something like that.

> So I then checked these functions and found the gpu_info errors. The random
> bug cannot be reproduced consistently, but we expect the system to pass more
> than 30 cycles of S3 suspend and resume. Any ideas?
I think the real solution is to just stop calling 
amdgpu_device_parse_gpu_info_fw() during resume.

That function just sets up the adev->gfx.config fields and that is 
unnecessary after resume.
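
A minimal userspace sketch of that suggestion follows, assuming a shared
early-init step that both driver load and resume pass through. All mock_*
names, the in_resume flag, and the placeholder value 4 are hypothetical and
are not taken from the driver. The sketch only shows the config being parsed
once and then reused on resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the adev->gfx.config fields. */
struct mock_gfx_config {
	unsigned int max_shader_engines;
};

struct mock_device {
	struct mock_gfx_config config;
	bool in_resume;		/* set only while the resume path runs */
	int parse_calls;	/* how many times the firmware was parsed */
};

/* Pretend to load and parse gpu_info; 4 is a placeholder value. */
static int mock_parse_gpu_info_fw(struct mock_device *dev)
{
	dev->parse_calls++;
	dev->config.max_shader_engines = 4;
	return 0;
}

/* Init step shared by load and resume in this model. */
int mock_early_init(struct mock_device *dev)
{
	if (dev->in_resume)
		return 0;	/* config parsed at load time is still valid */
	return mock_parse_gpu_info_fw(dev);
}

/* Resume re-runs early init but skips the firmware parse. */
int mock_device_resume(struct mock_device *dev)
{
	int r;

	dev->in_resume = true;
	r = mock_early_init(dev);
	dev->in_resume = false;
	return r;
}
```

After one load and any number of resume cycles, parse_calls stays at 1 and
the config fields keep the values set at load time.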

Regards,
Christian.


* Re: [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
       [not found]           ` <33c86a80-454e-7fb2-2e25-0c9a686bb3da-5C7GfCeVMHo@public.gmane.org>
@ 2017-06-06 14:03             ` Alex Deucher
       [not found]               ` <CADnq5_PmrZg-Ov_Zr=48fwcrMkNhsiup0vnmpc--0gFOztmzCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Deucher @ 2017-06-06 14:03 UTC (permalink / raw)
  To: Christian König
  Cc: Huan, Alvin, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Christian König, Huang Rui, Deucher, Alexander, Wang, Ken

On Tue, Jun 6, 2017 at 7:22 AM, Christian König
<christian.koenig@amd.com> wrote:
>> Yes, I agree with you. That was also my original opinion.
>> But we encountered a random bug when calling
>> device_cache_fw_images.
>
> That looks like an upstream bug in device_cache_fw_images.
>
> We should probably open a bug report and ping the maintainer. Most likely we
> are either not using the FW interface correctly or triggering a rare bug, or
> something like that.
>
>> So I then checked these functions and found the gpu_info errors. The random
>> bug cannot be reproduced consistently, but we expect the system to pass more
>> than 30 cycles of S3 suspend and resume. Any ideas?
>
> I think the real solution is to just stop calling
> amdgpu_device_parse_gpu_info_fw() during resume.

Right. We only need to parse the firmware once, during startup.

Alex


* Re: [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
       [not found]               ` <CADnq5_PmrZg-Ov_Zr=48fwcrMkNhsiup0vnmpc--0gFOztmzCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-06 14:45                 ` Alex Deucher
       [not found]                   ` <CADnq5_OqUVNPrQWth2Y_jaoJ9jzbe948Wg-vjpQT0+GV9Q=uDg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Deucher @ 2017-06-06 14:45 UTC (permalink / raw)
  To: Christian König
  Cc: Huan, Alvin, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Christian König, Huang Rui, Deucher, Alexander, Wang, Ken

On Tue, Jun 6, 2017 at 10:03 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
> On Tue, Jun 6, 2017 at 7:22 AM, Christian König
> <christian.koenig@amd.com> wrote:
>>> Yes, I agree with you. That was also my original opinion.
>>> But we encountered a random bug when calling
>>> device_cache_fw_images.
>>
>> That looks like an upstream bug in device_cache_fw_images.
>>
>> We should probably open a bug report and ping the maintainer. Most likely we
>> are either not using the FW interface correctly or triggering a rare bug, or
>> something like that.
>>
>>> So I then checked these functions and found the gpu_info errors. The random
>>> bug cannot be reproduced consistently, but we expect the system to pass more
>>> than 30 cycles of S3 suspend and resume. Any ideas?
>>
>> I think the real solution is to just stop calling
>> amdgpu_device_parse_gpu_info_fw() during resume.
>
> Right. We only need to parse the firmware once, during startup.

How are we hitting this on resume? amdgpu_device_parse_gpu_info_fw() is
called indirectly from amdgpu_device_init(), which is only called once,
at driver load time.

Alex


>
> Alex
>
>>
>> That function just sets up the adev->gfx.config fields and that is
>> unnecessary after resume.
>>
>> Regards,
>> Christian.
>>
>>
>> Am 06.06.2017 um 10:33 schrieb Huang Rui:
>>>
>>> On Tue, Jun 06, 2017 at 04:00:29PM +0800, Christian König wrote:
>>>>
>>>> Hi Ray,
>>>>
>>>> mhm, indeed a nice catch.
>>>>
>>>> But why do we need to load the gpu info after resume in the first place?
>>>>
>>>> I mean we already know what GPU we have, loading it again looks
>>>> superfluous to me.
>>>>
>>> Yes, I agree with you. That's also my orignal opinion.
>>> But we encountered a random buggy when we were calling
>>> device_cache_fw_images.
>>>
>>> [  558.288976] cache_firmware: amdgpu/vega10_sdma1.bin
>>> [  558.288976] cache_firmware: amdgpu/vega10_sdma.bin ret=0
>>> [  558.288981] fw_set_page_data: fw-amdgpu/vega10_sdma1.bin
>>> buf=ffff8803f1e64a80 data=ffffc90002411000 size=17408
>>> [  558.288981] cache_firmware: amdgpu/vega10_sdma1.bin ret=0
>>> [  558.288997] BUG: unable to handle kernel NULL pointer dereference at
>>> 0000000000000008
>>> [  558.289001] IP: devres_for_each_res+0x5e/0x100
>>> [  558.289001] PGD 0
>>> [  558.289002] Oops: 0000 [#3] SMP
>>> [  558.289003] Modules linked in: joydev hid_generic usbhid amdgpu(OE)
>>> ttm(OE) drm_kms_helper(OE) drm(OE) i2c_algo_bit fb_sys_fops syscopyarea
>>> sysfillrect sysimgblt rpcsec_gss_krb5 nfsv4 nfs fscache
>>> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel
>>> snd_hda_codec snd_hda_core intel_rapl snd_hwdep x86_pkg_temp_thermal
>>> intel_powerclamp snd_pcm kvm_intel snd_seq_midi kvm snd_seq_midi_event
>>> snd_rawmidi snd_seq snd_seq_device snd_timer irqbypass snd crct10dif_pclmul
>>> soundcore crc32_pclmul ghash_clmulni_intel pcbc mei_me aesni_intel shpchp
>>> mei aes_x86_64 crypto_simd glue_helper mac_hid cryptd acpi_pad tpm_infineon
>>> nfsd auth_rpcgss nfs_acl coretemp lockd grace sunrpc parport_pc ppdev lp
>>> parport autofs4 e1000e ptp nvme mxm_wmi ahci i2c_hid pps_core libahci
>>> nvme_core wmi video hid
>>> [  558.289027] CPU: 0 PID: 3742 Comm: pm-suspend Tainted: G      D    OE
>>> 4.11.0-custom #7
>>> [  558.289027] Hardware name: Gigabyte Technology Co., Ltd.
>>> Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016
>>> [  558.289027] task: ffff8803ebdcd940 task.stack: ffffc900029b0000
>>> [  558.289029] RIP: 0010:devres_for_each_res+0x5e/0x100
>>> [  558.289029] RSP: 0018:ffffc900029b3bc8 EFLAGS: 00010086
>>> [  558.289030] RAX: 000000000000001d RBX: ffff880426aa0c18 RCX:
>>> 0000000000000000
>>> [  558.289030] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
>>> 0000000000000092
>>> [  558.289031] RBP: ffffc900029b3c20 R08: 000000000000001d R09:
>>> ffffffff821ee601
>>> [  558.289031] R10: 000000000000141d R11: 0000000000000000 R12:
>>> ffffffff81566590
>>> [  558.289032] R13: ffffffff81566870 R14: ffffc900029b3c30 R15:
>>> ffff880426aa0e98
>>> [  558.289032] FS:  00007f63006c3700(0000) GS:ffff88043ec00000(0000)
>>> knlGS:0000000000000000
>>> [  558.289033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  558.289033] CR2: 0000000000000008 CR3: 00000003f257e000 CR4:
>>> 00000000003406f0
>>> [  558.289034] Call Trace:
>>> [  558.289036]  ? alloc_fw_cache_entry+0x60/0x60
>>> [  558.289037]  ? request_firmware_nowait+0x140/0x140
>>> [  558.289038]  dev_cache_fw_image+0x46/0x120
>>> [  558.289039]  ? request_firmware_nowait+0x140/0x140
>>> [  558.289040]  dpm_for_each_dev+0x44/0x70
>>> [  558.289041]  fw_pm_notify+0x164/0x190
>>> [  558.289043]  ? prepare_to_wait_event+0x110/0x110
>>> [  558.289044]  notifier_call_chain+0x49/0x70
>>> [  558.289046]  __blocking_notifier_call_chain+0x4d/0x70
>>> [  558.289047]  __pm_notifier_call_chain+0x1f/0x40
>>> [  558.289047]  pm_suspend+0x27f/0x3a0
>>> [  558.289048]  state_store+0x80/0xf0
>>> [  558.289050]  kobj_attr_store+0xf/0x20
>>> [  558.289051]  sysfs_kf_write+0x3a/0x50
>>> [  558.289053]  kernfs_fop_write+0xff/0x180
>>> [  558.289054]  __vfs_write+0x28/0x120
>>> [  558.289056]  ? apparmor_file_permission+0x1a/0x20
>>>
>>> So then I check these functions and find gpu_info errors. The random buggy
>>> cannot be reproduced constantly.But we expected it can pass more than 30
>>> cycles
>>> of S3 suspend and resume. Any ideas?
>>>
>>> Thanks,
>>> Ray
>>
>>
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
  2017-06-06 14:52                     ` Huang Rui
@ 2017-06-06 14:52                       ` Alex Deucher
  2017-06-06 14:55                       ` Huang Rui
  1 sibling, 0 replies; 9+ messages in thread
From: Alex Deucher @ 2017-06-06 14:52 UTC (permalink / raw)
  To: Huang Rui
  Cc: Huan, Alvin, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Christian König, Deucher, Alexander, Wang, Ken, Koenig,
	Christian

On Tue, Jun 6, 2017 at 10:52 AM, Huang Rui <ray.huang@amd.com> wrote:
> On Tue, Jun 06, 2017 at 10:45:42PM +0800, Alex Deucher wrote:
>> On Tue, Jun 6, 2017 at 10:03 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
>> > On Tue, Jun 6, 2017 at 7:22 AM, Christian König
>> > <christian.koenig@amd.com> wrote:
>> >>> Yes, I agree with you. That's also my original opinion.
>> >>> But we encountered a random bug when we were calling
>> >>> device_cache_fw_images.
>> >>
>> >> That looks like an upstream bug in device_cache_fw_images.
>> >>
>> >> We should probably open a bug report and ping the maintainer. Most likely we
>> >> are not correctly using the FW interface or are triggering a rare bug or
>> >> something like this.
>> >>
>> >>> So then I checked these functions and found the gpu_info errors. The random bug
>> >>> cannot be reproduced consistently. But we expect it to pass more than 30 cycles
>> >>> of S3 suspend and resume. Any ideas?
>> >>
>> >> I think the real solution is to just stop calling
>> >> amdgpu_device_parse_gpu_info_fw() during resume.
>> >
>> > Right. We only need to parse the firmware once during startup.
>>
>> How are we hitting this on resume?  amdgpu_device_parse_gpu_info_fw() is
>> called indirectly from amdgpu_device_init() which is only called once
>> at driver load time.
>>
>
> Yes, I also noted it. So I am confused about why firmware_class will still
> cache it during suspend.

I guess request_firmware expects the driver to keep the firmware
around until the driver is unloaded.  Just one comment about the
patch:

@@ -1272,6 +1272,9 @@ struct amdgpu_firmware {
        const struct amdgpu_psp_funcs *funcs;
        struct amdgpu_bo *rbuf;
        struct mutex mutex;
+
+       /* gpu info firmware data pointer */
+       const struct firmware *fw;
 };

Call this gpu_info_fw rather than just fw.  With that, the patch is:
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

Alex
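
The lifetime rule discussed above (keep the gpu_info firmware until the
device is torn down instead of releasing it right after parsing) can be
sketched in user space as follows. This is only an illustration under
stated assumptions, not the actual patch: struct firmware here is a
simplified mock, and mock_request_firmware(), mock_release_firmware(),
struct demo_firmware_state, demo_device_init() and demo_device_fini() are
hypothetical names that stand in for the kernel's request_firmware(),
release_firmware() and the amdgpu structures. Only the field name
gpu_info_fw comes from the review comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified user-space mock of the kernel's struct firmware (assumption). */
struct firmware {
	size_t size;
	unsigned char *data;
};

/* Hypothetical stand-in for request_firmware(): 0 on success, -1 on failure. */
static int mock_request_firmware(const struct firmware **fw, const char *name)
{
	struct firmware *f = malloc(sizeof(*f));

	if (!f)
		return -1;
	f->size = strlen(name) + 1;
	f->data = malloc(f->size);
	if (!f->data) {
		free(f);
		return -1;
	}
	/* The mock payload is just the firmware name, for demonstration. */
	memcpy(f->data, name, f->size);
	*fw = f;
	return 0;
}

/* Hypothetical stand-in for release_firmware(); accepts NULL like the real one. */
static void mock_release_firmware(const struct firmware *fw)
{
	if (!fw)
		return;
	free(fw->data);
	free((void *)fw);
}

/* Hypothetical driver state; the field name follows the review suggestion. */
struct demo_firmware_state {
	/* gpu info firmware data pointer, held for the device lifetime */
	const struct firmware *gpu_info_fw;
};

/* Load once at init time and keep the handle instead of releasing it. */
static int demo_device_init(struct demo_firmware_state *st)
{
	int err;

	assert(st != NULL);
	err = mock_request_firmware(&st->gpu_info_fw, "amdgpu/vega10_gpu_info.bin");
	if (err) {
		st->gpu_info_fw = NULL;
		return err;
	}
	/* Parsing of st->gpu_info_fw->data would happen here; no release yet. */
	return 0;
}

/* Release only when the device goes away. */
static void demo_device_fini(struct demo_firmware_state *st)
{
	assert(st != NULL);
	mock_release_firmware(st->gpu_info_fw);
	st->gpu_info_fw = NULL;
}
```

A caller would pair demo_device_init() with demo_device_fini(), so the
handle stays valid across any suspend/resume cycle that happens in between.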

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
       [not found]                   ` <CADnq5_OqUVNPrQWth2Y_jaoJ9jzbe948Wg-vjpQT0+GV9Q=uDg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-06 14:52                     ` Huang Rui
  2017-06-06 14:52                       ` Alex Deucher
  2017-06-06 14:55                       ` Huang Rui
  0 siblings, 2 replies; 9+ messages in thread
From: Huang Rui @ 2017-06-06 14:52 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Huan, Alvin, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Christian König, Deucher, Alexander, Wang, Ken, Koenig,
	Christian

On Tue, Jun 06, 2017 at 10:45:42PM +0800, Alex Deucher wrote:
> On Tue, Jun 6, 2017 at 10:03 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
> > On Tue, Jun 6, 2017 at 7:22 AM, Christian König
> > <christian.koenig@amd.com> wrote:
> >>> Yes, I agree with you. That's also my original opinion.
> >>> But we encountered a random bug when we were calling
> >>> device_cache_fw_images.
> >>
> >> That looks like an upstream bug in device_cache_fw_images.
> >>
> >> We should probably open a bug report and ping the maintainer. Most likely we
> >> are not correctly using the FW interface or are triggering a rare bug or
> >> something like this.
> >>
> >>> So then I checked these functions and found the gpu_info errors. The random bug
> >>> cannot be reproduced consistently. But we expect it to pass more than 30 cycles
> >>> of S3 suspend and resume. Any ideas?
> >>
> >> I think the real solution is to just stop calling
> >> amdgpu_device_parse_gpu_info_fw() during resume.
> >
> > Right. We only need to parse the firmware once during startup.
>
> How are we hitting this on resume?  amdgpu_device_parse_gpu_info_fw() is
> called indirectly from amdgpu_device_init() which is only called once
> at driver load time.
>

Yes, I also noted it. So I am confused about why firmware_class will still
cache it during suspend.

Thanks,
Ray

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
  2017-06-06 14:52                     ` Huang Rui
  2017-06-06 14:52                       ` Alex Deucher
@ 2017-06-06 14:55                       ` Huang Rui
  1 sibling, 0 replies; 9+ messages in thread
From: Huang Rui @ 2017-06-06 14:55 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Huan, Alvin, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Christian König, Deucher, Alexander, Wang, Ken, Koenig,
	Christian

On Tue, Jun 06, 2017 at 10:52:46PM +0800, Huang Rui wrote:
> On Tue, Jun 06, 2017 at 10:45:42PM +0800, Alex Deucher wrote:
> > On Tue, Jun 6, 2017 at 10:03 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
> > > On Tue, Jun 6, 2017 at 7:22 AM, Christian König
> > > <christian.koenig@amd.com> wrote:
> > >>> Yes, I agree with you. That's also my original opinion.
> > >>> But we encountered a random bug when we were calling
> > >>> device_cache_fw_images.
> > >>
> > >> That looks like an upstream bug in device_cache_fw_images.
> > >>
> > >> We should probably open a bug report and ping the maintainer. Most likely we
> > >> are not correctly using the FW interface or are triggering a rare bug or
> > >> something like this.
> > >>
> > >>> So then I checked these functions and found the gpu_info errors. The random bug
> > >>> cannot be reproduced consistently. But we expect it to pass more than 30 cycles
> > >>> of S3 suspend and resume. Any ideas?
> > >>
> > >> I think the real solution is to just stop calling
> > >> amdgpu_device_parse_gpu_info_fw() during resume.
> > >
> > > Right. We only need to parse the firmware once during startup.
> >
> > How are we hitting this on resume?  amdgpu_device_parse_gpu_info_fw() is
> > called indirectly from amdgpu_device_init() which is only called once
> > at driver load time.
> >
>
> Yes, I also noted it. So I am confused about why firmware_class will still
> cache it during suspend.
>

At that time, we had already released the gpu_info firmware data. It seems to
be a bug in the upper layer.

Thanks,
Ray

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-06-06 14:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-06  6:24 [PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3 Huang Rui
     [not found] ` <1496730269-27140-1-git-send-email-ray.huang-5C7GfCeVMHo@public.gmane.org>
2017-06-06  8:00   ` Christian König
     [not found]     ` <9f43e1be-6d92-cba9-80e0-274d07109c5f-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-06-06  8:33       ` Huang Rui
2017-06-06 11:22         ` Christian König
     [not found]           ` <33c86a80-454e-7fb2-2e25-0c9a686bb3da-5C7GfCeVMHo@public.gmane.org>
2017-06-06 14:03             ` Alex Deucher
     [not found]               ` <CADnq5_PmrZg-Ov_Zr=48fwcrMkNhsiup0vnmpc--0gFOztmzCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-06 14:45                 ` Alex Deucher
     [not found]                   ` <CADnq5_OqUVNPrQWth2Y_jaoJ9jzbe948Wg-vjpQT0+GV9Q=uDg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-06 14:52                     ` Huang Rui
2017-06-06 14:52                       ` Alex Deucher
2017-06-06 14:55                       ` Huang Rui

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.