From: "Koenig, Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
To: "Zhu, Changfeng" <Changfeng.Zhu-5C7GfCeVMHo@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Cc: "Deucher,
	Alexander" <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org>,
	"Pelloux-prayer,
	Pierre-eric"
	<Pierre-eric.Pelloux-prayer-5C7GfCeVMHo@public.gmane.org>,
	"Huang, Ray" <Ray.Huang-5C7GfCeVMHo@public.gmane.org>,
	"Tuikov, Luben" <Luben.Tuikov-5C7GfCeVMHo@public.gmane.org>
Subject: Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay
Date: Mon, 28 Oct 2019 10:46:39 +0000	[thread overview]
Message-ID: <924c7758-92ed-caf6-8068-ca12d7d77ed7@amd.com> (raw)
In-Reply-To: <MN2PR12MB2896E32084545C8EB240BC45FD660-rweVpJHSKToIQ/pRnFqe/QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

Hi Changfeng,

> So how can we deal with the firmware between mec version(402) and mec version(421)?
Well, off hand I see only two options: either print a warning or
completely reject loading the driver.

Completely rejecting loading the driver is probably not a good idea,
and the issue is actually extremely unlikely to cause any problems.

So printing a warning that the user should update their firmware is 
probably the best approach.
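
A minimal sketch of what that could look like, assuming the warning goes
into the existing Raven firmware check in gfx_v9_0.c and that MEC #421
corresponds to mec_fw_version 0x1a5 (an assumed mapping, not confirmed in
this thread):

	/* Sketch: warn when the CP firmware uses the integrated write/wait
	 * packet but predates the dummy read fix.
	 * 0x192 == MEC #402; 0x1a5 == MEC #421 is an assumed mapping.
	 */
	if ((adev->gfx.mec_fw_version >= 0x00000192) &&
	    (adev->gfx.mec_fw_version < 0x000001a5) &&
	    (adev->gfx.mec_feature_version >= 42))
		DRM_WARN("CP MEC firmware lacks the GRBM dummy read fix, please update your firmware\n");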

Regards,
Christian.

Am 28.10.19 um 04:01 schrieb Zhu, Changfeng:
> Hi Christian,
>
> Re: that won't work; you can't add this to
> amdgpu_ring_emit_reg_write_reg_wait_helper or you will break all read-triggered registers (like the semaphore ones).
>
> Do you mean that I should use a reg_wait (wait_reg_mem) packet, as Luben does, to replace the read-triggered registers when adding the dummy read?
>
> Re: in addition to that, it will never work on GFX9, since the CP firmware there uses the integrated write/wait command and you can't add an additional dummy read there.
>
> Yes, I see the integrated write/wait command; it is implemented in gfx_v9_0_wait_reg_mem.
> Emily's patch,
> "drm/amdgpu: Remove the sriov checking and add firmware checking",
> decides when to go into gfx_v9_0_wait_reg_mem and when to go into amdgpu_ring_emit_reg_write_reg_wait_helper.
>
> However, there are two problems now.
> 1. Before the fw_version_ok firmware versions, the code goes into amdgpu_ring_emit_reg_write_reg_wait_helper. In this case, shouldn't we add the dummy read in amdgpu_ring_emit_reg_write_reg_wait_helper?
> 2. After the fw_version_ok firmware versions, the code goes into gfx_v9_0_wait_reg_mem, which implements the write/wait command in firmware. How can we add the dummy read then? According to Yang, Zilong, the CP firmware implements the dummy read starting from these CLs:
> Vega20 CL#1762470 @3/27/2019
> Navi10 CL#1761300 @3/25/2019
> According to CL#1762470,
> the firmware which implements the dummy read is (Raven for example):
> MEC version:
> #define F32_MEC_UCODE_VERSION "#421"
> #define F32_MEC_FEATURE_VERSION 46
> PFP version:
> #define F32_PFP_UCODE_VERSION "#183"
> #define F32_PFP_FEATURE_VERSION 46
> In Emily's patch:
> the CP firmware which uses the integrated write/wait command begins from these versions:
> +       case CHIP_RAVEN:
> +               if ((adev->gfx.me_fw_version >= 0x0000009c) &&
> +                   (adev->gfx.me_feature_version >= 42) &&
> +                   (adev->gfx.pfp_fw_version >= 0x000000b1 /* 177 */) &&
> +                   (adev->gfx.pfp_feature_version >= 42))
> +                       adev->gfx.me_fw_write_wait = true;
> +
> +               if ((adev->gfx.mec_fw_version >= 0x00000192 /* 402 */) &&
> +                   (adev->gfx.mec_feature_version >= 42))
> +                       adev->gfx.mec_fw_write_wait = true;
> +               break;
>
> So how can we deal with the firmware between MEC version 402 and MEC version 421?
> It implements the write/wait command in CP firmware but it doesn't have the dummy read.
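>
> For reference, 0x192 is 402 in decimal and, assuming the same numbering scheme, MEC #421 would be 0x1a5; so the affected window is roughly mec_fw_version >= 0x00000192 && mec_fw_version < 0x000001a5.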
>
> BR,
> Changfeng.
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: Friday, October 25, 2019 11:54 PM
> To: Zhu, Changfeng <Changfeng.Zhu@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Pelloux-prayer, Pierre-eric <Pierre-eric.Pelloux-prayer@amd.com>; Huang, Ray <Ray.Huang@amd.com>; Tuikov, Luben <Luben.Tuikov@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: GFX9, GFX10: GRBM requires 1-cycle delay
>
> Hi Changfeng,
>
> that won't work; you can't add this to
> amdgpu_ring_emit_reg_write_reg_wait_helper or you will break all read-triggered registers (like the semaphore ones).
>
> In addition to that, it will never work on GFX9, since the CP firmware there uses the integrated write/wait command and you can't add an additional dummy read there.
>
> Regards,
> Christian.
>
> Am 25.10.19 um 16:22 schrieb Zhu, Changfeng:
>> I tried to write a patch based on the patch from Tuikov, Luben.
>>
>> Inspired by Luben, here is the patch:
>>
>> From 1980d8f1ed44fb9a84a5ea1f6e2edd2bc25c629a Mon Sep 17 00:00:00 2001
>> From: changzhu <Changfeng.Zhu@amd.com>
>> Date: Thu, 10 Oct 2019 11:02:33 +0800
>> Subject: [PATCH] drm/amdgpu: add dummy read by engines for some GCVM status
>>    registers
>>
>> The GRBM register interface is now capable of bursting 1 cycle per
>> register (wr->wr, wr->rd), which is much faster than the previous
>> multicycle-per-transaction-done interface. This has caused a problem
>> where status registers requiring HW to update have a 1-cycle delay,
>> due to the register update having to go through GRBM.
>>
>> SW may operate on an incorrect value if it writes a register and
>> immediately checks the corresponding status register.
>>
>> Registers requiring HW to clear or set fields may be delayed by 1 cycle.
>> For example:
>>
>> 1. Write VM_INVALIDATE_ENG0_REQ mask = 5a
>> 2. Read VM_INVALIDATE_ENG0_ACK till the ack is the same as the request mask = 5a
>> 	a. HW will reset VM_INVALIDATE_ENG0_ACK = 0 until invalidation is complete
>> 3. Write VM_INVALIDATE_ENG0_REQ mask = 5a
>> 4. Read VM_INVALIDATE_ENG0_ACK till the ack is the same as the request mask = 5a
>> 	a. First read of VM_INVALIDATE_ENG0_ACK = 5a instead of 0
>> 	b. Second read of VM_INVALIDATE_ENG0_ACK = 0 because the remote GRBM h/w
>> 	   register takes one extra cycle to be cleared
>> 	c. In this case, SW will see a false ACK if it exits on the first read
>>
>> Affected registers (only GC variant)  | Recommended Dummy Read
>> --------------------------------------+----------------------------
>> VM_INVALIDATE_ENG*_ACK		      |  VM_INVALIDATE_ENG*_REQ
>> VM_L2_STATUS			      |  VM_L2_STATUS
>> VM_L2_PROTECTION_FAULT_STATUS	      |  VM_L2_PROTECTION_FAULT_STATUS
>> VM_L2_PROTECTION_FAULT_ADDR_HI/LO32   |  VM_L2_PROTECTION_FAULT_ADDR_HI/LO32
>> VM_L2_IH_LOG_BUSY		      |  VM_L2_IH_LOG_BUSY
>> MC_VM_L2_PERFCOUNTER_HI/LO	      |  MC_VM_L2_PERFCOUNTER_HI/LO
>> ATC_L2_PERFCOUNTER_HI/LO	      |  ATC_L2_PERFCOUNTER_HI/LO
>> ATC_L2_PERFCOUNTER2_HI/LO	      |  ATC_L2_PERFCOUNTER2_HI/LO
>>
>> A dummy read by the engines is therefore also needed for these GC registers.
>>
>> Change-Id: Ie028f37eb789966d4593984bd661b248ebeb1ac3
>> Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  5 +++++
>>    drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   |  2 ++
>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    |  2 ++
>>    drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c   |  4 ++++
>>    drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   | 18 ++++++++++++++++++
>>    5 files changed, 31 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> index 4b3f58dbf36f..c2fbf6087ecf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
>> @@ -392,6 +392,11 @@ void amdgpu_ring_emit_reg_write_reg_wait_helper(struct amdgpu_ring *ring,
>>    						uint32_t ref, uint32_t mask)
>>    {
>>    	amdgpu_ring_emit_wreg(ring, reg0, ref);
>> +
>> +	/* wait for a cycle to reset vm_inv_eng0_ack */
>> +	if (ring->funcs->vmhub == AMDGPU_GFXHUB_0)
>> +		amdgpu_ring_emit_rreg(ring, reg0);
>> +
>>    	amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask);
>>    }
>>    
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index ef1975a5323a..104c47734316 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -5155,6 +5155,7 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_gfx = {
>>    	.patch_cond_exec = gfx_v10_0_ring_emit_patch_cond_exec,
>>    	.preempt_ib = gfx_v10_0_ring_preempt_ib,
>>    	.emit_tmz = gfx_v10_0_ring_emit_tmz,
>> +	.emit_rreg = gfx_v10_0_ring_emit_rreg,
>>    	.emit_wreg = gfx_v10_0_ring_emit_wreg,
>>    	.emit_reg_wait = gfx_v10_0_ring_emit_reg_wait,
>>    };
>> @@ -5188,6 +5189,7 @@ static const struct amdgpu_ring_funcs gfx_v10_0_ring_funcs_compute = {
>>    	.test_ib = gfx_v10_0_ring_test_ib,
>>    	.insert_nop = amdgpu_ring_insert_nop,
>>    	.pad_ib = amdgpu_ring_generic_pad_ib,
>> +	.emit_rreg = gfx_v10_0_ring_emit_rreg,
>>    	.emit_wreg = gfx_v10_0_ring_emit_wreg,
>>    	.emit_reg_wait = gfx_v10_0_ring_emit_reg_wait,
>>    };
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 2f03bf533d41..d00b53de0fdc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -6253,6 +6253,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
>>    	.init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
>>    	.patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
>>    	.emit_tmz = gfx_v9_0_ring_emit_tmz,
>> +	.emit_rreg = gfx_v9_0_ring_emit_rreg,
>>    	.emit_wreg = gfx_v9_0_ring_emit_wreg,
>>    	.emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
>>    	.emit_reg_write_reg_wait = gfx_v9_0_ring_emit_reg_write_reg_wait,
>> @@ -6289,6 +6290,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = {
>>    	.insert_nop = amdgpu_ring_insert_nop,
>>    	.pad_ib = amdgpu_ring_generic_pad_ib,
>>    	.set_priority = gfx_v9_0_ring_set_priority_compute,
>> +	.emit_rreg = gfx_v9_0_ring_emit_rreg,
>>    	.emit_wreg = gfx_v9_0_ring_emit_wreg,
>>    	.emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
>>    	.emit_reg_write_reg_wait = gfx_v9_0_ring_emit_reg_write_reg_wait,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> index 3b00bce14cfb..dce6b651da1f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> @@ -346,6 +346,10 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,
>>    
>>    	amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_req + eng, req);
>>    
>> +	/* wait for a cycle to reset vm_inv_eng0_ack */
>> +	if (ring->funcs->vmhub == AMDGPU_GFXHUB_0)
>> +		amdgpu_ring_emit_rreg(ring, hub->vm_inv_eng0_req + eng);
>> +
>>    	/* wait for the invalidate to complete */
>>    	amdgpu_ring_emit_reg_wait(ring, hub->vm_inv_eng0_ack + eng,
>>    				  1 << vmid, 1 << vmid);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> index 3460c00f3eaa..baaa33467882 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> @@ -38,6 +38,7 @@
>>    #include "navi10_sdma_pkt_open.h"
>>    #include "nbio_v2_3.h"
>>    #include "sdma_v5_0.h"
>> +#include "nvd.h"
>>    
>>    MODULE_FIRMWARE("amdgpu/navi10_sdma.bin");
>>    MODULE_FIRMWARE("amdgpu/navi10_sdma1.bin");
>> @@ -1147,6 +1148,22 @@ static void sdma_v5_0_ring_emit_vm_flush(struct amdgpu_ring *ring,
>>    	amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pd_addr);
>>    }
>>    
>> +static void sdma_v5_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>> +{
>> +	struct amdgpu_device *adev = ring->adev;
>> +
>> +	amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>> +	amdgpu_ring_write(ring, 0 | /* src: register*/
>> +				(5 << 8) |  /* dst: memory */
>> +				(1 << 20)); /* write confirm */
>> +	amdgpu_ring_write(ring, reg);
>> +	amdgpu_ring_write(ring, 0);
>> +	amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>> +				adev->virt.reg_val_offs * 4));
>> +	amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>> +				adev->virt.reg_val_offs * 4));
>> +}
>> +
>>    static void sdma_v5_0_ring_emit_wreg(struct amdgpu_ring *ring,
>>    				     uint32_t reg, uint32_t val)
>>    {
>> @@ -1597,6 +1614,7 @@ static const struct amdgpu_ring_funcs sdma_v5_0_ring_funcs = {
>>    	.test_ib = sdma_v5_0_ring_test_ib,
>>    	.insert_nop = sdma_v5_0_ring_insert_nop,
>>    	.pad_ib = sdma_v5_0_ring_pad_ib,
>> +	.emit_rreg = sdma_v5_0_ring_emit_rreg,
>>    	.emit_wreg = sdma_v5_0_ring_emit_wreg,
>>    	.emit_reg_wait = sdma_v5_0_ring_emit_reg_wait,
>>    	.init_cond_exec = sdma_v5_0_ring_init_cond_exec,

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
