All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Yang, Philip" <Philip.Yang-5C7GfCeVMHo@public.gmane.org>
To: "Deng, Emily" <Emily.Deng-5C7GfCeVMHo@public.gmane.org>,
	"Russell, Kent" <Kent.Russell-5C7GfCeVMHo@public.gmane.org>,
	"Quan, Evan" <Evan.Quan-5C7GfCeVMHo@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: Re: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset
Date: Fri, 14 Jun 2019 14:15:38 +0000	[thread overview]
Message-ID: <9bfdbf20-b914-6063-5517-8352e652e203@amd.com> (raw)
In-Reply-To: <MN2PR12MB29750834B9F6C548E831DE798FEE0-rweVpJHSKToFlvJWC7EAqwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

Hi Emily,

I am not familiar with vbios and driver init part, just based on my 
experience, the patch don't modify amdgpu_get_bios but move 
amdgpu_get_bios to amdgpu_device_ip_early_init from amdgpu_device_init, 
so amdgpu_get_bios is executed earlier. The kernel error message "BUG: 
kernel NULL pointer dereference" means something is not initialized. 
Please review the change. This issue blocks rocm release now.

Regards,
Philip

On 2019-06-13 11:19 p.m., Deng, Emily wrote:
> Hi Russell,
>       This patch will read vbios, and parse vbios to get the baco reset feature bit.  From the call trace, it shows error in " amdgpu_get_bios ", but this patch don't modify amdgpu_get_bios, and code before amdgpu_get_bios. Please first check why it will has error when read vbios.
> 
> Best wishes
> Emily Deng
> 
> 
> 
>> -----Original Message-----
>> From: Russell, Kent <Kent.Russell@amd.com>
>> Sent: Thursday, June 13, 2019 7:11 PM
>> To: Quan, Evan <Evan.Quan@amd.com>; Deng, Emily
>> <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Deng, Emily <Emily.Deng@amd.com>
>> Subject: RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset
>>
>> Hi Emily,
>>
>> This patch caused a regression on MI25 (Chip 6860, VBIOS 113-D0513700-001)
>> machines where the driver would not boot. Note that this was not seen on
>> Vega10 Frontier (Chip 6863, VBIOS 113-D0501100-X09) or Radeon64 (Chip
>> 697f). Reverting this patch resolved the issue with no other work required and
>> was confirmed on all 3 machines.
>>
>> Here is the dmesg:
>>
>> [    3.908653] amdgpu 0000:23:00.0: BAR 6: can't assign [??? 0x00000000 flags
>> 0x20000000] (bogus alignment)
>> [    3.908692] BUG: kernel NULL pointer dereference, address:
>> 0000000000000008
>> [    3.908716] #PF: supervisor read access in kernel mode
>> [    3.908734] #PF: error_code(0x0000) - not-present page
>> [    3.908753] PGD 0 P4D 0
>> [    3.908767] Oops: 0000 [#1] SMP NOPTI
>> [    3.909293] CPU: 8 PID: 409 Comm: kworker/8:1 Not tainted 5.2.0-rc1-kfd-
>> compute-roc-master-10734 #1
>> [    3.909753] Hardware name: Inventec     P47                              WC2071019001
>> /P47             , BIOS 0.64 04/09/2018
>> [    3.910534] Workqueue: events work_for_cpu_fn
>> [    3.910953] RIP: 0010:amdgpu_get_bios+0x3aa/0x580 [amdgpu]
>> [    3.911314] Code: c0 48 c7 44 24 5c 00 00 00 00 48 c7 84 24 90 00 00 00 00 00
>> 00 00 48 89 d9 48 29 f9 83 c1 3c c1 e9 03 f3 48 ab 49 8b 44 24 38 <48> 8b 40 08
>> 48 85 c0 74 24 ba 3c 00 00 00 48 89 de 4c 89 e7 ff d0
>> [    3.912069] RSP: 0018:ffffa27dce28fc50 EFLAGS: 00010212
>> [    3.912502] RAX: 0000000000000000 RBX: ffffa27dce28fcac RCX:
>> 0000000000000000
>> [    3.912980] RDX: 0000000000000000 RSI: 0000000000000082 RDI:
>> ffffa27dce28fce8
>> [    3.913467] RBP: 0000000000000000 R08: 0000000000000001 R09:
>> 000000000000079a
>> [    3.913940] R10: 0000000000000000 R11: 0000000000000001 R12:
>> ffff88d657af0000
>> [    3.914349] R13: ffffffffc0c38120 R14: ffffa27dce28fc68 R15:
>> ffff88d657af0000
>> [    3.914767] FS:  0000000000000000(0000) GS:ffff88d65f400000(0000)
>> knlGS:0000000000000000
>> [    3.915203] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    3.915637] CR2: 0000000000000008 CR3: 0000003e7540a000 CR4:
>> 00000000003406e0
>> [    3.916075] Call Trace:
>> [    3.916522]  ? pcie_capability_clear_and_set_word+0x53/0x80
>> [    3.917014]  amdgpu_device_init+0x923/0x1820 [amdgpu]
>> [    3.917515]  amdgpu_driver_load_kms+0x71/0x310 [amdgpu]
>> [    3.917997]  drm_dev_register+0x113/0x1a0 [drm]
>> [    3.918514]  amdgpu_pci_probe+0xb8/0x150 [amdgpu]
>> [    3.919003]  ? __pm_runtime_resume+0x54/0x70
>> [    3.919270] usb 1-2: New USB device found, idVendor=14dd, idProduct=1005,
>> bcdDevice= 0.00
>> [    3.919498]  local_pci_probe+0x3d/0x90
>> [    3.919503]  ? __schedule+0x3de/0x690
>> [    3.920374] usb 1-2: New USB device strings: Mfr=1, Product=2,
>> SerialNumber=3
>> [    3.921137]  work_for_cpu_fn+0x10/0x20
>> [    3.922028] usb 1-2: Product: D2CIM-VUSB
>> [    3.922815]  process_one_work+0x159/0x3e0
>> [    3.923633] usb 1-2: Manufacturer: Raritan
>> [    3.923635] usb 1-2: SerialNumber: EFFB212D0A6E32F
>> [    3.924416]  worker_thread+0x22b/0x440
>> [    3.924419]  ? rescuer_thread+0x350/0x350
>> [    3.927812]  kthread+0xf6/0x130
>> [    3.928157] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> [    3.928365]  ? kthread_destroy_worker+0x40/0x40
>> [    3.929401] ata1.00: ATA-10: INTEL SSDSC2KG960G7, SCV10100, max
>> UDMA/133
>> [    3.930101]  ret_from_fork+0x1f/0x30
>> [    3.930103] Modules linked in: amdgpu(+) crct10dif_pclmul crc32_pclmul
>> ghash_clmulni_intel ast amd_iommu_v2 aesni_intel gpu_sched i2c_algo_bit
>> aes_x86_64 ttm crypto_simd drm_kms_helper cryptd glue_helper
>> syscopyarea sysfillrect ahci sysimgblt libahci fb_sys_fops ixgbe(+) drm dca
>> mdio
>> [    3.930965] ata1.00: 1875385008 sectors, multi 1: LBA48 NCQ (depth 32)
>> [    3.931085] ata1.00: configured for UDMA/133
>> [    3.931809] CR2: 0000000000000008
>> [    3.934723] scsi 0:0:0:0: Direct-Access     ATA      INTEL SSDSC2KG96 0100 PQ:
>> 0 ANSI: 5
>>
>>
>> Thanks!
>>
>> Kent
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Quan,
>> Evan
>> Sent: Monday, May 27, 2019 9:17 PM
>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Deng, Emily <Emily.Deng@amd.com>
>> Subject: RE: [PATCH] drm/amdgpu: Need to set the baco cap before baco reset
>>
>> Reviewed-by: Evan Quan <evan.quan@amd.com>
>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>>> Emily Deng
>>> Sent: Monday, May 27, 2019 3:43 PM
>>> To: amd-gfx@lists.freedesktop.org
>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>> Subject: [PATCH] drm/amdgpu: Need to set the baco cap before baco
>>> reset
>>>
>>> For passthrough, after rebooted the VM, driver will do a baco reset
>>> before doing other driver initialization during loading  driver. For
>>> doing the baco reset, it will first check the baco reset capability.
>>> So first need to set the cap from the vbios information or baco reset won't
>> be enabled.
>>>
>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c         | 24 ++++++++++-----
>>> -------
>>>   drivers/gpu/drm/amd/amdgpu/soc15.c                 |  3 ++-
>>>   drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c |  4 ++++
>>>   .../amd/powerplay/hwmgr/vega10_processpptables.c   | 24
>>> ++++++++++++++++++++++
>>>   .../amd/powerplay/hwmgr/vega10_processpptables.h   |  1 +
>>>   5 files changed, 42 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index b6ded84..7a8c220 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -1541,6 +1541,17 @@ static int amdgpu_device_ip_early_init(struct
>>> amdgpu_device *adev)
>>>   	if (amdgpu_sriov_vf(adev))
>>>   		adev->pm.pp_feature &= ~PP_GFXOFF_MASK;
>>>
>>> +	/* Read BIOS */
>>> +	if (!amdgpu_get_bios(adev))
>>> +		return -EINVAL;
>>> +
>>> +	r = amdgpu_atombios_init(adev);
>>> +	if (r) {
>>> +		dev_err(adev->dev, "amdgpu_atombios_init failed\n");
>>> +		amdgpu_vf_error_put(adev,
>>> AMDGIM_ERROR_VF_ATOMBIOS_INIT_FAIL, 0, 0);
>>> +		return r;
>>> +	}
>>> +
>>>   	for (i = 0; i < adev->num_ip_blocks; i++) {
>>>   		if ((amdgpu_ip_block_mask & (1 << i)) == 0) {
>>>   			DRM_ERROR("disabled ip block: %d <%s>\n", @@ -
>>> 2591,19 +2602,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>   		goto fence_driver_init;
>>>   	}
>>>
>>> -	/* Read BIOS */
>>> -	if (!amdgpu_get_bios(adev)) {
>>> -		r = -EINVAL;
>>> -		goto failed;
>>> -	}
>>> -
>>> -	r = amdgpu_atombios_init(adev);
>>> -	if (r) {
>>> -		dev_err(adev->dev, "amdgpu_atombios_init failed\n");
>>> -		amdgpu_vf_error_put(adev,
>>> AMDGIM_ERROR_VF_ATOMBIOS_INIT_FAIL, 0, 0);
>>> -		goto failed;
>>> -	}
>>> -
>>>   	/* detect if we are with an SRIOV vbios */
>>>   	amdgpu_device_detect_sriov_bios(adev);
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> index 78bd4fc..d9fdd95 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -764,7 +764,8 @@ static bool soc15_need_reset_on_init(struct
>>> amdgpu_device *adev)
>>>   	/* Just return false for soc15 GPUs.  Reset does not seem to
>>>   	 * be necessary.
>>>   	 */
>>> -	return false;
>>> +	if (!amdgpu_passthrough(adev))
>>> +		return false;
>>>
>>>   	if (adev->flags & AMD_IS_APU)
>>>   		return false;
>>> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
>>> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
>>> index ce6aeb5..1d9bb29 100644
>>> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
>>> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c
>>> @@ -5311,8 +5311,12 @@ static const struct pp_hwmgr_func
>>> vega10_hwmgr_funcs = {
>>>
>>>   int vega10_hwmgr_init(struct pp_hwmgr *hwmgr)  {
>>> +	struct amdgpu_device *adev = hwmgr->adev;
>>> +
>>>   	hwmgr->hwmgr_func = &vega10_hwmgr_funcs;
>>>   	hwmgr->pptable_func = &vega10_pptable_funcs;
>>> +	if (amdgpu_passthrough(adev))
>>> +		return vega10_baco_set_cap(hwmgr);
>>>
>>>   	return 0;
>>>   }
>>> diff --git
>>> a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
>>> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
>>> index b6767d7..83d22cd 100644
>>> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
>>> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
>>> @@ -1371,3 +1371,27 @@ int vega10_get_powerplay_table_entry(struct
>>> pp_hwmgr *hwmgr,
>>>
>>>   	return result;
>>>   }
>>> +
>>> +int vega10_baco_set_cap(struct pp_hwmgr *hwmgr) {
>>> +	int result = 0;
>>> +
>>> +	const ATOM_Vega10_POWERPLAYTABLE *powerplay_table;
>>> +
>>> +	powerplay_table = get_powerplay_table(hwmgr);
>>> +
>>> +	PP_ASSERT_WITH_CODE((powerplay_table != NULL),
>>> +		"Missing PowerPlay Table!", return -1);
>>> +
>>> +	result = check_powerplay_tables(hwmgr, powerplay_table);
>>> +
>>> +	PP_ASSERT_WITH_CODE((result == 0),
>>> +			    "check_powerplay_tables failed", return result);
>>> +
>>> +	set_hw_cap(
>>> +			hwmgr,
>>> +			0 != (le32_to_cpu(powerplay_table->ulPlatformCaps)
>>> & ATOM_VEGA10_PP_PLATFORM_CAP_BACO),
>>> +			PHM_PlatformCaps_BACO);
>>> +	return result;
>>> +}
>>> +
>>> diff --git
>>> a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.h
>>> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.h
>>> index d83ed2a..da5fbec 100644
>>> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.h
>>> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.h
>>> @@ -59,4 +59,5 @@ extern int
>>> vega10_get_number_of_powerplay_table_entries(struct pp_hwmgr
>> *hwmgr);
>>> extern int vega10_get_powerplay_table_entry(struct pp_hwmgr *hwmgr,
>>> uint32_t entry_index,
>>>   		struct pp_power_state *power_state, int
>> (*call_back_func)(struct
>>> pp_hwmgr *, void *,
>>>   				struct pp_power_state *, void *, uint32_t));
>>> +extern int vega10_baco_set_cap(struct pp_hwmgr *hwmgr);
>>>   #endif
>>> --
>>> 2.7.4
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  parent reply	other threads:[~2019-06-14 14:15 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-27  7:43 [PATCH] drm/amdgpu: Need to set the baco cap before baco reset Emily Deng
     [not found] ` <1558943005-16557-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org>
2019-05-28  1:17   ` Quan, Evan
     [not found]     ` <MN2PR12MB3344E228ADA29C20F5BA9DDBE41E0-rweVpJHSKToDMgCC8P//OwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-06-13 11:11       ` Russell, Kent
     [not found]         ` <BN6PR12MB1618E4322B84944963895A8E85EF0-/b2+HYfkarRqaFUXYJa4HgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-06-14  3:19           ` Deng, Emily
     [not found]             ` <MN2PR12MB29750834B9F6C548E831DE798FEE0-rweVpJHSKToFlvJWC7EAqwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-06-14 14:15               ` Yang, Philip [this message]
     [not found]                 ` <9bfdbf20-b914-6063-5517-8352e652e203-5C7GfCeVMHo@public.gmane.org>
2019-06-17  2:50                   ` Deng, Emily
     [not found]                     ` <MN2PR12MB2975EAAD825DA9842876B4A38FEB0-rweVpJHSKToFlvJWC7EAqwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-06-17  3:52                       ` Deng, Emily
     [not found]                         ` <MN2PR12MB29750900DC4B43AD00254D888FEB0-rweVpJHSKToFlvJWC7EAqwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-06-17 11:08                           ` Russell, Kent
     [not found]                             ` <BN6PR12MB16188EBC282A9D8BEA715AB185EB0-/b2+HYfkarRqaFUXYJa4HgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-06-17 14:39                               ` Alex Deucher
2019-06-17 11:39                           ` Russell, Kent
  -- strict thread matches above, loose matches on Subject: below --
2019-05-23 10:22 Emily Deng
     [not found] ` <1558606921-1537-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org>
2019-05-23 16:08   ` Alex Deucher
     [not found]     ` <CADnq5_NhONFSC2C=0npCx4AKGKAphAsWq1Kq7jZhmeY+ODQ6-Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-05-24  2:29       ` Deng, Emily
     [not found]         ` <MWHPR12MB132625FCE9DF15D24746CFD28F020-Gy0DoCVfaSVaj5rVPFIlogdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-05-24  8:28           ` Deng, Emily
2019-05-24 16:59           ` Alex Deucher
     [not found]             ` <CADnq5_Of2+kLtG3tRWkJXG3Ghbs9VFu8g4mDbKvdD7cXyuysRA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-05-27  2:57               ` Deng, Emily
2019-05-22 10:06 Emily Deng
     [not found] ` <1558519591-30349-1-git-send-email-Emily.Deng-5C7GfCeVMHo@public.gmane.org>
2019-05-23  3:11   ` Quan, Evan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9bfdbf20-b914-6063-5517-8352e652e203@amd.com \
    --to=philip.yang-5c7gfcevmho@public.gmane.org \
    --cc=Emily.Deng-5C7GfCeVMHo@public.gmane.org \
    --cc=Evan.Quan-5C7GfCeVMHo@public.gmane.org \
    --cc=Kent.Russell-5C7GfCeVMHo@public.gmane.org \
    --cc=amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.