Timeout issues are complicated. These patches fix the driver-side issue, and with them the Arcturus SPG timeout issue is resolved. Other types of timeout issues are still under investigation.

Thanks & Best Regards!


James Zhu


From: Liu, Leo <Leo.Liu@amd.com>
Sent: Wednesday, February 12, 2020 10:11 AM
To: Zhu, James <James.Zhu@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Subject: RE: [PATCH 1/2] drm/amdgpu/vcn: fix race condition issue for vcn start
 

With your patches, I am still seeing the hang with multiple processes running decode, encode, and transcode.

I think we need to find the root cause of that and provide a comprehensive fix from the driver side, the firmware side, or both.

 

Regards,

Leo

 

From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Zhu, James
Sent: Wednesday, February 12, 2020 9:28 AM
To: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/amdgpu/vcn: fix race condition issue for vcn start

 


 

ping

 


From: Zhu, James <James.Zhu@amd.com>
Sent: Monday, February 10, 2020 1:06 PM
To: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Zhu, James <James.Zhu@amd.com>
Subject: [PATCH 1/2] drm/amdgpu/vcn: fix race condition issue for vcn start

 

Fix a race condition that occurs when multiple VCN starts are called concurrently.

Signed-off-by: James Zhu <James.Zhu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 4 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index f96464e..aa7663f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -63,6 +63,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
         int i, r;
 
         INIT_DELAYED_WORK(&adev->vcn.idle_work, amdgpu_vcn_idle_work_handler);
+       mutex_init(&adev->vcn.vcn_pg_lock);
 
         switch (adev->asic_type) {
         case CHIP_RAVEN:
@@ -210,6 +211,7 @@ int amdgpu_vcn_sw_fini(struct amdgpu_device *adev)
         }
 
         release_firmware(adev->vcn.fw);
+       mutex_destroy(&adev->vcn.vcn_pg_lock);
 
         return 0;
 }
@@ -321,6 +323,7 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
         struct amdgpu_device *adev = ring->adev;
         bool set_clocks = !cancel_delayed_work_sync(&adev->vcn.idle_work);
 
+       mutex_lock(&adev->vcn.vcn_pg_lock);
         if (set_clocks) {
                 amdgpu_gfx_off_ctrl(adev, false);
                 amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN,
@@ -345,6 +348,7 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
 
                 adev->vcn.pause_dpg_mode(adev, ring->me, &new_state);
         }
+       mutex_unlock(&adev->vcn.vcn_pg_lock);
 }
 
 void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
index 6fe0573..2ae110d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
@@ -200,6 +200,7 @@ struct amdgpu_vcn {
         struct drm_gpu_scheduler *vcn_dec_sched[AMDGPU_MAX_VCN_INSTANCES];
         uint32_t                 num_vcn_enc_sched;
         uint32_t                 num_vcn_dec_sched;
+       struct mutex             vcn_pg_lock;
 
         unsigned        harvest_config;
         int (*pause_dpg_mode)(struct amdgpu_device *adev,
--
2.7.4
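
The locking pattern this patch applies in amdgpu_vcn_ring_begin_use() can be illustrated outside the kernel. The following is only a minimal userspace sketch, assuming POSIX threads in place of the kernel mutex API; struct fake_vcn, fake_vcn_begin_use() and run_concurrent_starts() are hypothetical names invented for the illustration and are not part of amdgpu. It shows the general idea that the "is the block already started?" check and the start sequence are done under one lock, so concurrent callers cannot both run the start sequence.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical stand-in for the VCN state; not the real amdgpu structures. */
struct fake_vcn {
	pthread_mutex_t pg_lock;  /* plays the role of vcn_pg_lock */
	bool powered;             /* true once the start sequence has run */
	int start_count;          /* number of times the start sequence ran */
};

static void fake_vcn_init(struct fake_vcn *v)
{
	pthread_mutex_init(&v->pg_lock, NULL);
	v->powered = false;
	v->start_count = 0;
}

static void fake_vcn_fini(struct fake_vcn *v)
{
	pthread_mutex_destroy(&v->pg_lock);
}

/* Check and start under the lock, so only one caller performs the start. */
static void fake_vcn_begin_use(struct fake_vcn *v)
{
	pthread_mutex_lock(&v->pg_lock);
	if (!v->powered) {
		v->start_count++;   /* stands in for the power-up sequence */
		v->powered = true;
	}
	pthread_mutex_unlock(&v->pg_lock);
}

static void *worker(void *arg)
{
	fake_vcn_begin_use(arg);
	return NULL;
}

/* Run n_threads concurrent begin_use calls; return how many starts ran. */
static int run_concurrent_starts(int n_threads)
{
	struct fake_vcn v;
	pthread_t tids[MAX_THREADS];
	int i, created = 0, count;

	assert(n_threads >= 0);
	if (n_threads > MAX_THREADS)
		n_threads = MAX_THREADS;

	fake_vcn_init(&v);
	for (i = 0; i < n_threads; i++) {
		if (pthread_create(&tids[i], NULL, worker, &v) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(tids[i], NULL);

	count = v.start_count;
	fake_vcn_fini(&v);
	return count;
}
```

With the lock held across both the check and the update, run_concurrent_starts() reports a single start no matter how many threads call in. This sketch deliberately simplifies the driver: in the patch, set_clocks is computed from cancel_delayed_work_sync() before the lock is taken, which the sketch does not model.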