[PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12

* [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12
@ 2021-03-30  4:41 Emily Deng
  2021-03-30  4:41 ` [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc Emily Deng
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Emily Deng @ 2021-03-30  4:41 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily Deng, Frank . Min

The VCN decode ring is not required, so just disable it.

Signed-off-by: Frank.Min <Frank.Min@amd.com>
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 +++-
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c   | 29 ++++++++++++-------------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 8844f650b17f..5d5c41c9d5aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -427,7 +427,9 @@ static int amdgpu_hw_ip_info(struct amdgpu_device *adev,
 			if (adev->uvd.harvest_config & (1 << i))
 				continue;
 
-			if (adev->vcn.inst[i].ring_dec.sched.ready)
+			if (adev->vcn.inst[i].ring_dec.sched.ready ||
+				(adev->asic_type == CHIP_NAVI12 &&
+				amdgpu_sriov_vf(adev)))
 				++num_rings;
 		}
 		ib_start_alignment = 16;
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
index 116b9643d5ba..e4b61f3a45fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
@@ -220,21 +220,20 @@ static int vcn_v2_0_hw_init(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 	struct amdgpu_ring *ring = &adev->vcn.inst->ring_dec;
-	int i, r;
+	int i, r = -1;
 
 	adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
 					     ring->doorbell_index, 0);
 
-	if (amdgpu_sriov_vf(adev))
+	if (amdgpu_sriov_vf(adev)) {
 		vcn_v2_0_start_sriov(adev);
-
-	r = amdgpu_ring_test_helper(ring);
-	if (r)
-		goto done;
-
-	//Disable vcn decode for sriov
-	if (amdgpu_sriov_vf(adev))
-		ring->sched.ready = false;
+		if (adev->asic_type == CHIP_NAVI12)
+			ring->sched.ready = false;
+	} else {
+		r = amdgpu_ring_test_helper(ring);
+		if (r)
+			goto done;
+	}
 
 	for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
 		ring = &adev->vcn.inst->ring_enc[i];
@@ -245,8 +244,11 @@ static int vcn_v2_0_hw_init(void *handle)
 
 done:
 	if (!r)
-		DRM_INFO("VCN decode and encode initialized successfully(under %s).\n",
-			(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)?"DPG Mode":"SPG Mode");
+		DRM_INFO("VCN %s encode initialized successfully(under %s).\n",
+			(adev->asic_type == CHIP_NAVI12 &&
+				amdgpu_sriov_vf(adev))?"":"decode and",
+			(adev->pg_flags &
+				AMD_PG_SUPPORT_VCN_DPG)?"DPG Mode":"SPG Mode");
 
 	return r;
 }
@@ -1719,9 +1721,6 @@ int vcn_v2_0_dec_ring_test_ring(struct amdgpu_ring *ring)
 	unsigned i;
 	int r;
 
-	if (amdgpu_sriov_vf(adev))
-		return 0;
-
 	WREG32(adev->vcn.inst[ring->me].external.scratch9, 0xCAFEDEAD);
 	r = amdgpu_ring_alloc(ring, 4);
 	if (r)
-- 
2.25.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc
  2021-03-30  4:41 [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Emily Deng
@ 2021-03-30  4:41 ` Emily Deng
  2021-03-31  9:00   ` Deng, Emily
  2021-03-30  4:41 ` [PATCH 3/6] drm/amdgpu: Restore msix after FLR Emily Deng
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Emily Deng @ 2021-03-30  4:41 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily Deng

Set num_types equal to the number of enabled CRTCs (num_crtc).

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
index 5c11144da051..c03a83a2b7cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
@@ -768,7 +768,7 @@ static const struct amdgpu_irq_src_funcs dce_virtual_crtc_irq_funcs = {
 
 static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)
 {
-	adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
+	adev->crtc_irq.num_types = adev->mode_info.num_crtc;
 	adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;
 }
 
-- 
2.25.1

* [PATCH 3/6] drm/amdgpu: Restore msix after FLR
  2021-03-30  4:41 [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Emily Deng
  2021-03-30  4:41 ` [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc Emily Deng
@ 2021-03-30  4:41 ` Emily Deng
  2021-03-30  5:37   ` Chen, Guchun
  2021-03-30  4:41 ` [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov Emily Deng
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Emily Deng @ 2021-03-30  4:41 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily.Deng

From: "Emily.Deng" <Emily.Deng@amd.com>

After FLR, MSI-X is cleared, so it needs to be re-enabled.

v2:
Change name with amdgpu_irq prefix, remove #ifdef.

Signed-off-by: Emily.Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 03412543427a..8936589bd7f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device *adev)
 	return true;
 }
 
+void amdgpu_irq_restore_msix(struct amdgpu_device *adev)
+{
+	u16 ctrl;
+
+	pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
+	ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
+	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
+	ctrl |= PCI_MSIX_FLAGS_ENABLE;
+	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
+}
+
 /**
  * amdgpu_irq_init - initialize interrupt handling
  *
@@ -558,6 +569,7 @@ void amdgpu_irq_gpu_reset_resume_helper(struct amdgpu_device *adev)
 {
 	int i, j, k;
 
+	amdgpu_irq_restore_msix(adev);
 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
 		if (!adev->irq.client[i].sources)
 			continue;
-- 
2.25.1

* [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov
  2021-03-30  4:41 [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Emily Deng
  2021-03-30  4:41 ` [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc Emily Deng
  2021-03-30  4:41 ` [PATCH 3/6] drm/amdgpu: Restore msix after FLR Emily Deng
@ 2021-03-30  4:41 ` Emily Deng
  2021-03-31  9:01   ` Deng, Emily
  2021-03-30  4:41 ` [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12 Emily Deng
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Emily Deng @ 2021-03-30  4:41 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily Deng

This fixes the issue of the board disappearing.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 46d4bbabce75..48dc171bc759 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -693,6 +693,10 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
 		adev->nbio.funcs = &nbio_v2_3_funcs;
 		adev->nbio.hdp_flush_reg = &nbio_v2_3_hdp_flush_reg;
 	}
+
+	if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_NAVI12)
+		amdgpu_discovery = 0;
+
 	adev->hdp.funcs = &hdp_v5_0_funcs;
 
 	if (adev->asic_type >= CHIP_SIENNA_CICHLID)
-- 
2.25.1

* [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
  2021-03-30  4:41 [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Emily Deng
                   ` (2 preceding siblings ...)
  2021-03-30  4:41 ` [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov Emily Deng
@ 2021-03-30  4:41 ` Emily Deng
  2021-03-30  7:12   ` Christian König
  2021-03-30  4:41 ` [PATCH 6/6] drm/amdgpu: Fix driver unload issue Emily Deng
  2021-03-31  9:00 ` [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Deng, Emily
  5 siblings, 1 reply; 25+ messages in thread
From: Emily Deng @ 2021-03-30  4:41 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily Deng

SDMA randomly hangs, pending on UTCL2 address translation when it
accesses the RPTR polling address.

According to the SDMA firmware team, the RPTR writeback is done by
hardware automatically and hits an issue when clock gating occurs. So
stop using the RPTR writeback for sdma5.0.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 920fc6d4a127..63e4a78181b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -298,13 +298,19 @@ static void sdma_v5_0_ring_patch_cond_exec(struct amdgpu_ring *ring,
  */
 static uint64_t sdma_v5_0_ring_get_rptr(struct amdgpu_ring *ring)
 {
-	u64 *rptr;
+	struct amdgpu_device *adev = ring->adev;
+	u64 rptr;
+	u32 lowbit, highbit;
+
+	lowbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_RPTR));
+	highbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_RPTR_HI));
 
-	/* XXX check if swapping is necessary on BE */
-	rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]);
+	rptr = highbit;
+	rptr = rptr << 32;
+	rptr |= lowbit;
 
-	DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr);
-	return ((*rptr) >> 2);
+	DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr);
+	return (rptr >> 2);
 }
 
 /**
@@ -702,7 +708,7 @@ static int sdma_v5_0_gfx_resume(struct amdgpu_device *adev)
 		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_RPTR_ADDR_LO),
 		       lower_32_bits(adev->wb.gpu_addr + wb_offset) & 0xFFFFFFFC);
 
-		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RPTR_WRITEBACK_ENABLE, 1);
+		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RPTR_WRITEBACK_ENABLE, 0);
 
 		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_BASE), ring->gpu_addr >> 8);
 		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_BASE_HI), ring->gpu_addr >> 40);
-- 
2.25.1

* [PATCH 6/6] drm/amdgpu: Fix driver unload issue
  2021-03-30  4:41 [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Emily Deng
                   ` (3 preceding siblings ...)
  2021-03-30  4:41 ` [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12 Emily Deng
@ 2021-03-30  4:41 ` Emily Deng
  2021-03-30  6:49   ` Chen, Jiansong (Simon)
  2021-03-31  9:00 ` [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Deng, Emily
  5 siblings, 1 reply; 25+ messages in thread
From: Emily Deng @ 2021-03-30  4:41 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily Deng

During driver unload there is no need to copy memory; doing so can
trigger call traces, for example a warning in amdgpu_sa_bo_new once
sa_manager has been freed.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e00263bcc88b..f0546a489e0d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
 	struct dma_fence *fence = NULL;
 	int r = 0;
 
+	if (adev->shutdown)
+		return 0;
+
 	if (!adev->mman.buffer_funcs_enabled) {
 		DRM_ERROR("Trying to move memory with ring turned off.\n");
 		return -EINVAL;
-- 
2.25.1

* RE: [PATCH 3/6] drm/amdgpu: Restore msix after FLR
  2021-03-30  4:41 ` [PATCH 3/6] drm/amdgpu: Restore msix after FLR Emily Deng
@ 2021-03-30  5:37   ` Chen, Guchun
  2021-03-30  8:07     ` Deng, Emily
  0 siblings, 1 reply; 25+ messages in thread
From: Chen, Guchun @ 2021-03-30  5:37 UTC (permalink / raw)
  To: Deng, Emily, amd-gfx; +Cc: Deng, Emily

[AMD Public Use]

Should amdgpu_irq_restore_msix be a static function?

Regards,
Guchun

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Emily Deng
Sent: Tuesday, March 30, 2021 12:42 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily <Emily.Deng@amd.com>
Subject: [PATCH 3/6] drm/amdgpu: Restore msix after FLR

From: "Emily.Deng" <Emily.Deng@amd.com>

After FLR, the msix will be cleared, so need to re-enable it.

v2:
Change name with amdgpu_irq prefix, remove #ifdef.

Signed-off-by: Emily.Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 03412543427a..8936589bd7f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device *adev)
 	return true;
 }
 
+void amdgpu_irq_restore_msix(struct amdgpu_device *adev)
+{
+	u16 ctrl;
+
+	pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
+	ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
+	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
+	ctrl |= PCI_MSIX_FLAGS_ENABLE;
+	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
+}
+
 /**
  * amdgpu_irq_init - initialize interrupt handling
  *
@@ -558,6 +569,7 @@ void amdgpu_irq_gpu_reset_resume_helper(struct amdgpu_device *adev)
 {
 	int i, j, k;
 
+	amdgpu_irq_restore_msix(adev);
 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
 		if (!adev->irq.client[i].sources)
 			continue;
--
2.25.1

* RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
  2021-03-30  4:41 ` [PATCH 6/6] drm/amdgpu: Fix driver unload issue Emily Deng
@ 2021-03-30  6:49   ` Chen, Jiansong (Simon)
  2021-03-30  7:05     ` Deng, Emily
  0 siblings, 1 reply; 25+ messages in thread
From: Chen, Jiansong (Simon) @ 2021-03-30  6:49 UTC (permalink / raw)
  To: Deng, Emily, amd-gfx; +Cc: Deng, Emily

[AMD Official Use Only - Internal Distribution Only]

I still wonder how this issue occurs. As far as I understand the driver model, the reference count of the device's kobject
does not reach zero while device memory is still being accessed, so shutdown should not happen.

Regards,
Jiansong
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Emily Deng
Sent: Tuesday, March 30, 2021 12:42 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily <Emily.Deng@amd.com>
Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue

During driver unloading, don't need to copy mem, or it will introduce some call trace, such as when sa_manager is freed, it will introduce warn call trace in amdgpu_sa_bo_new.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e00263bcc88b..f0546a489e0d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
 	struct dma_fence *fence = NULL;
 	int r = 0;
 
+	if (adev->shutdown)
+		return 0;
+
 	if (!adev->mman.buffer_funcs_enabled) {
 		DRM_ERROR("Trying to move memory with ring turned off.\n");
 		return -EINVAL;
--
2.25.1

* RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
  2021-03-30  6:49   ` Chen, Jiansong (Simon)
@ 2021-03-30  7:05     ` Deng, Emily
  2021-03-30  7:10       ` Christian König
  0 siblings, 1 reply; 25+ messages in thread
From: Deng, Emily @ 2021-03-30  7:05 UTC (permalink / raw)
  To: Chen, Jiansong (Simon), amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Hi Jiansong,
     It does happen; maybe there is a race condition?


Best wishes
Emily Deng



>-----Original Message-----
>From: Chen, Jiansong (Simon) <Jiansong.Chen@amd.com>
>Sent: Tuesday, March 30, 2021 2:49 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>
>Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>[AMD Official Use Only - Internal Distribution Only]
>
>I still wonder how the issue takes place? According to my humble knowledge
>in driver model, the reference count of the kobject for the device will not
>reach zero when there is still some device mem access, and shutdown should
>not happen.
>
>Regards,
>Jiansong
>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Emily
>Deng
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>
>Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>During driver unloading, don't need to copy mem, or it will introduce some
>call trace, such as when sa_manager is freed, it will introduce warn call trace
>in amdgpu_sa_bo_new.
>
>Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>index e00263bcc88b..f0546a489e0d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>@@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> 	struct dma_fence *fence = NULL;
> 	int r = 0;
>
>+	if (adev->shutdown)
>+		return 0;
>+
> 	if (!adev->mman.buffer_funcs_enabled) {
> 		DRM_ERROR("Trying to move memory with ring turned off.\n");
> 		return -EINVAL;
>--
>2.25.1

* Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
  2021-03-30  7:05     ` Deng, Emily
@ 2021-03-30  7:10       ` Christian König
  2021-03-30  8:19         ` Deng, Emily
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2021-03-30  7:10 UTC (permalink / raw)
  To: Deng, Emily, Chen, Jiansong (Simon), amd-gfx

Good morning,

Yes, Jiansong is right; that patch is really not a good idea.

Moving buffers can indeed happen during shutdown while some memory is
still referenced.

Just ignoring the move is not the right approach; you need to find out
why the memory is moved in the first place.

You could add something like WARN_ON(adev->shutdown);

Regards,
Christian.
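
The difference between silently skipping the move and warning about it
can be sketched in user space as below. This is only an illustration
under stated assumptions: struct fake_device, fake_warn_on() and both
copy functions are hypothetical stand-ins, not the real amdgpu or kernel
definitions, and whether the copy should still proceed after the warning
is one reading of the suggestion above, not something the thread settles.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not the real amdgpu_device. */
struct fake_device {
	bool shutdown;
	int copies_done;
	int warnings;
};

/*
 * Mimics the effect of the kernel's WARN_ON(): report an unexpected
 * condition (the kernel would also print a backtrace) and return it.
 */
static bool fake_warn_on(struct fake_device *dev, bool cond, const char *what)
{
	if (cond) {
		dev->warnings++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Behaviour of the posted patch: the move is skipped without any trace. */
static int copy_silently_skipped(struct fake_device *dev)
{
	if (dev->shutdown)
		return 0;
	dev->copies_done++;
	return 0;
}

/* Alternative: keep the copy, but make the unexpected caller visible. */
static int copy_with_warning(struct fake_device *dev)
{
	fake_warn_on(dev, dev->shutdown, "buffer move during shutdown");
	dev->copies_done++;
	return 0;
}
```

With the warning variant the offending call path shows up in the log,
which is what is needed to find out why the memory is moved at all.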

Am 30.03.21 um 09:05 schrieb Deng, Emily:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Jiansong,
>       It does happen,  maybe have the race condition?
>
>
> Best wishes
> Emily Deng
>
>
>
>> -----Original Message-----
>> From: Chen, Jiansong (Simon) <Jiansong.Chen@amd.com>
>> Sent: Tuesday, March 30, 2021 2:49 PM
>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Deng, Emily <Emily.Deng@amd.com>
>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> I still wonder how the issue takes place? According to my humble knowledge
>> in driver model, the reference count of the kobject for the device will not
>> reach zero when there is still some device mem access, and shutdown should
>> not happen.
>>
>> Regards,
>> Jiansong
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Emily
>> Deng
>> Sent: Tuesday, March 30, 2021 12:42 PM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Deng, Emily <Emily.Deng@amd.com>
>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>
>> During driver unloading, don't need to copy mem, or it will introduce some
>> call trace, such as when sa_manager is freed, it will introduce warn call trace
>> in amdgpu_sa_bo_new.
>>
>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index e00263bcc88b..f0546a489e0d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>  	struct dma_fence *fence = NULL;
>>  	int r = 0;
>>
>> +	if (adev->shutdown)
>> +		return 0;
>> +
>>  	if (!adev->mman.buffer_funcs_enabled) {
>>  		DRM_ERROR("Trying to move memory with ring turned off.\n");
>>  		return -EINVAL;
>> --
>> 2.25.1

* Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
  2021-03-30  4:41 ` [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12 Emily Deng
@ 2021-03-30  7:12   ` Christian König
  2021-03-30  7:20     ` Deng, Emily
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2021-03-30  7:12 UTC (permalink / raw)
  To: Emily Deng, amd-gfx



Am 30.03.21 um 06:41 schrieb Emily Deng:
> It will hit ramdomly sdma hang, and pending on utcl2
> address translation when access the RPTR polling address.
>
> According sdma firmware team mentioned, the RPTR writeback is done by
> hardware automatically, and will hit issue when clock gating occurs. So
> stop using the rptr write back for sdma5.0.
>
> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 ++++++++++++------
>   1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> index 920fc6d4a127..63e4a78181b8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> @@ -298,13 +298,19 @@ static void sdma_v5_0_ring_patch_cond_exec(struct amdgpu_ring *ring,
>    */
>   static uint64_t sdma_v5_0_ring_get_rptr(struct amdgpu_ring *ring)
>   {
> -	u64 *rptr;
> +	struct amdgpu_device *adev = ring->adev;
> +	u64 rptr;
> +	u32 lowbit, highbit;
> +
> +	lowbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_RPTR));
> +	highbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_RPTR_HI));

That won't work like this.

We have the read pointer writeback because we otherwise can't guarantee
that the two 32-bit values read from the registers are coherent.

In other words, the hi rptr may already have wrapped around while the
lo still holds the old value.

Why exactly doesn't the writeback work?

Christian.
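
The coherency problem described above can be reproduced with a small
user-space model. This is a sketch under stated assumptions: struct
fake_ring and the read_lo()/read_hi() helpers are hypothetical and simply
advance a 64-bit counter on every access to mimic the engine making
progress between two register reads. The hi-lo-hi retry loop is a common
generic technique for split counters; it is not proposed in this thread
and may not be applicable to this hardware.

```c
#include <stdint.h>

/*
 * Hypothetical 64-bit counter exposed as two 32-bit halves. Every read
 * advances the counter by `step`, as if the hardware kept running.
 */
struct fake_ring {
	uint64_t rptr;
	uint64_t step;
};

static uint32_t read_lo(struct fake_ring *r)
{
	uint32_t v = (uint32_t)r->rptr;

	r->rptr += r->step;
	return v;
}

static uint32_t read_hi(struct fake_ring *r)
{
	uint32_t v = (uint32_t)(r->rptr >> 32);

	r->rptr += r->step;
	return v;
}

/* Two independent reads: lo and hi may come from different counter values. */
static uint64_t read_rptr_naive(struct fake_ring *r)
{
	uint32_t lo = read_lo(r);
	uint32_t hi = read_hi(r);

	return ((uint64_t)hi << 32) | lo;
}

/* Retry until the hi half is unchanged across the lo read. */
static uint64_t read_rptr_retry(struct fake_ring *r)
{
	uint32_t hi, lo, hi2;

	do {
		hi = read_hi(r);
		lo = read_lo(r);
		hi2 = read_hi(r);
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}
```

Starting at 0xFFFFFFFF with a step of 1, the naive read combines the old
lo with the new hi and yields a value the counter never held, while the
retry variant only returns once both halves belong together.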

>   
> -	/* XXX check if swapping is necessary on BE */
> -	rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]);
> +	rptr = highbit;
> +	rptr = rptr << 32;
> +	rptr |= lowbit;
>   
> -	DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr);
> -	return ((*rptr) >> 2);
> +	DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr);
> +	return (rptr >> 2);
>   }
>   
>   /**
> @@ -702,7 +708,7 @@ static int sdma_v5_0_gfx_resume(struct amdgpu_device *adev)
>   		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_RPTR_ADDR_LO),
>   		       lower_32_bits(adev->wb.gpu_addr + wb_offset) & 0xFFFFFFFC);
>   
> -		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RPTR_WRITEBACK_ENABLE, 1);
> +		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RPTR_WRITEBACK_ENABLE, 0);
>   
>   		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_BASE), ring->gpu_addr >> 8);
>   		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_BASE_HI), ring->gpu_addr >> 40);

* RE: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
  2021-03-30  7:12   ` Christian König
@ 2021-03-30  7:20     ` Deng, Emily
  2021-03-30  7:24       ` Christian König
  0 siblings, 1 reply; 25+ messages in thread
From: Deng, Emily @ 2021-03-30  7:20 UTC (permalink / raw)
  To: Christian König, amd-gfx

[AMD Official Use Only - Internal Distribution Only]

>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken@gmail.com>
>Sent: Tuesday, March 30, 2021 3:13 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
>
>
>
>Am 30.03.21 um 06:41 schrieb Emily Deng:
>> It will hit ramdomly sdma hang, and pending on utcl2 address
>> translation when access the RPTR polling address.
>>
>> According sdma firmware team mentioned, the RPTR writeback is done by
>> hardware automatically, and will hit issue when clock gating occurs.
>> So stop using the rptr write back for sdma5.0.
>>
>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 ++++++++++++------
>>   1 file changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> index 920fc6d4a127..63e4a78181b8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>> @@ -298,13 +298,19 @@ static void sdma_v5_0_ring_patch_cond_exec(struct amdgpu_ring *ring,
>>    */
>>   static uint64_t sdma_v5_0_ring_get_rptr(struct amdgpu_ring *ring)
>>   {
>> -	u64 *rptr;
>> +	struct amdgpu_device *adev = ring->adev;
>> +	u64 rptr;
>> +	u32 lowbit, highbit;
>> +
>> +	lowbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_RPTR));
>> +	highbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_RPTR_HI));
>
>That won't work like this.
>
>We have the readpointer writeback because we otherwise can't guarantee
>that the two 32bit values read from the registers are coherent.
>
>In other words it can be that the hi rptr is already wrapped around while the
>lo is still the old value.
>
>Why exactly doesn't the writeback work?
>
>Christian.
The issue occurs when clock gating happens at the same time as the RPTR writeback; at that point the UTCL2 translation hangs.
>
>>
>> -	/* XXX check if swapping is necessary on BE */
>> -	rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]);
>> +	rptr = highbit;
>> +	rptr = rptr << 32;
>> +	rptr |= lowbit;
>>
>> -	DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr);
>> -	return ((*rptr) >> 2);
>> +	DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr);
>> +	return (rptr >> 2);
>>   }
>>
>>   /**
>> @@ -702,7 +708,7 @@ static int sdma_v5_0_gfx_resume(struct amdgpu_device *adev)
>>   		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_RPTR_ADDR_LO),
>>   		       lower_32_bits(adev->wb.gpu_addr + wb_offset) & 0xFFFFFFFC);
>>
>> -		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RPTR_WRITEBACK_ENABLE, 1);
>> +		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RPTR_WRITEBACK_ENABLE, 0);
>>
>>   		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_BASE), ring->gpu_addr >> 8);
>>   		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_BASE_HI), ring->gpu_addr >> 40);

* Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
  2021-03-30  7:20     ` Deng, Emily
@ 2021-03-30  7:24       ` Christian König
  2021-03-30  7:40         ` Deng, Emily
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2021-03-30  7:24 UTC (permalink / raw)
  To: Deng, Emily, amd-gfx, Alex Deucher

Am 30.03.21 um 09:20 schrieb Deng, Emily:
> [AMD Official Use Only - Internal Distribution Only]
>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>> Sent: Tuesday, March 30, 2021 3:13 PM
>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
>>
>>
>>
>> Am 30.03.21 um 06:41 schrieb Emily Deng:
>>> It will hit ramdomly sdma hang, and pending on utcl2 address
>>> translation when access the RPTR polling address.
>>>
>>> According sdma firmware team mentioned, the RPTR writeback is done by
>>> hardware automatically, and will hit issue when clock gating occurs.
>>> So stop using the rptr write back for sdma5.0.
>>>
>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 ++++++++++++------
>>>    1 file changed, 12 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>> index 920fc6d4a127..63e4a78181b8 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>> @@ -298,13 +298,19 @@ static void
>> sdma_v5_0_ring_patch_cond_exec(struct amdgpu_ring *ring,
>>>     */
>>>    static uint64_t sdma_v5_0_ring_get_rptr(struct amdgpu_ring *ring)
>>>    {
>>> -u64 *rptr;
>>> +struct amdgpu_device *adev = ring->adev;
>>> +u64 rptr;
>>> +u32 lowbit, highbit;
>>> +
>>> +lowbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me,
>> mmSDMA0_GFX_RB_RPTR));
>>> +highbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me,
>>> +mmSDMA0_GFX_RB_RPTR_HI));
>> That won't work like this.
>>
>> We have the readpointer writeback because we otherwise can't guarantee
>> that the two 32bit values read from the registers are coherent.
>>
>> In other words it can be that the hi rptr is already wrapped around while the
>> lo is still the old value.
>>
>> Why exactly doesn't the writeback work?
>>
>> Christian.
> Issue occurs, when occurs clockgating, at the same time, the rptr write back occurs. At this time, the utcl2 translation will hang.

Mhm, crap. Alex are you up to date on this bug?

I'm not an expert on the SDMA, but as far as I know writeback is 
mandatory when we use 64bit rptr/wptr.

Otherwise we need a workaround for reading a consistent 64bit rptr from 
two 32bit registers.

Can you check the register documentation for any double 
buffering or something like that?

Christian.
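
[Editor's sketch] The coherency problem described above, and one common generic workaround (read hi, read lo, re-read hi, retry if hi changed), can be illustrated in plain userspace C. This is only an illustration under stated assumptions: read32_fn, fake_sdma, fake_read32 and the REG_RPTR_* indices are hypothetical stand-ins for RREG32 and the SDMA registers, and nothing in this thread confirms that the hi/lo/hi pattern is sufficient on the real hardware.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit MMIO read such as RREG32(). */
typedef uint32_t (*read32_fn)(void *ctx, int reg);

enum { REG_RPTR_LO = 0, REG_RPTR_HI = 1 };

/* Naive read: lo then hi. If lo carries into hi between the two
 * reads, the combined value is one the counter never held. */
static uint64_t read_rptr64_naive(read32_fn rd, void *ctx)
{
	uint32_t lo = rd(ctx, REG_RPTR_LO);
	uint32_t hi = rd(ctx, REG_RPTR_HI);

	return ((uint64_t)hi << 32) | lo;
}

/* hi/lo/hi read: accept lo only if hi was the same before and after.
 * A real driver would probably bound the number of retries. */
static uint64_t read_rptr64(read32_fn rd, void *ctx)
{
	uint32_t hi = rd(ctx, REG_RPTR_HI);
	uint32_t lo, hi2;

	for (;;) {
		lo = rd(ctx, REG_RPTR_LO);
		hi2 = rd(ctx, REG_RPTR_HI);
		if (hi2 == hi)
			break;
		hi = hi2;	/* hi moved under us, sample lo again */
	}
	return ((uint64_t)hi << 32) | lo;
}

/* Fake 64-bit pointer that advances by one after every register
 * read, to model hardware moving between the two accesses. */
struct fake_sdma {
	uint64_t rptr;
};

static uint32_t fake_read32(void *ctx, int reg)
{
	struct fake_sdma *s = ctx;
	uint32_t v = (reg == REG_RPTR_HI) ? (uint32_t)(s->rptr >> 32)
					  : (uint32_t)s->rptr;

	s->rptr += 1;
	return v;
}
```

Starting the fake pointer at 0xFFFFFFFF, the naive read returns 0x1FFFFFFFF, a torn value, while the hi/lo/hi read returns a value the pointer actually passed through. Whether the real registers behave this way, or are double buffered, is exactly the open question above.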

>>> -/* XXX check if swapping is necessary on BE */
>>> -rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]);
>>> +rptr = highbit;
>>> +rptr = rptr << 32;
>>> +rptr |= lowbit;
>>>
>>> -DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr);
>>> -return ((*rptr) >> 2);
>>> +DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr);
>>> +return (rptr >> 2);
>>>    }
>>>
>>>    /**
>>> @@ -702,7 +708,7 @@ static int sdma_v5_0_gfx_resume(struct
>> amdgpu_device *adev)
>>>    WREG32(sdma_v5_0_get_reg_offset(adev, i,
>> mmSDMA0_GFX_RB_RPTR_ADDR_LO),
>>>           lower_32_bits(adev->wb.gpu_addr + wb_offset) &
>> 0xFFFFFFFC);
>>> -rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
>> RPTR_WRITEBACK_ENABLE, 1);
>>> +rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
>>> +RPTR_WRITEBACK_ENABLE, 0);
>>>
>>>    WREG32(sdma_v5_0_get_reg_offset(adev, i,
>> mmSDMA0_GFX_RB_BASE), ring->gpu_addr >> 8);
>>>    WREG32(sdma_v5_0_get_reg_offset(adev, i,
>> mmSDMA0_GFX_RB_BASE_HI),
>>> ring->gpu_addr >> 40);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
  2021-03-30  7:24       ` Christian König
@ 2021-03-30  7:40         ` Deng, Emily
  0 siblings, 0 replies; 25+ messages in thread
From: Deng, Emily @ 2021-03-30  7:40 UTC (permalink / raw)
  To: Christian König, amd-gfx, Deucher, Alexander

[AMD Official Use Only - Internal Distribution Only]

>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken@gmail.com>
>Sent: Tuesday, March 30, 2021 3:24 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org;
>Deucher, Alexander <Alexander.Deucher@amd.com>
>Subject: Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
>
>Am 30.03.21 um 09:20 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>>> Sent: Tuesday, March 30, 2021 3:13 PM
>>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 5/6] drm/amdgpu: Disable RPTR write back for
>>> navi12
>>>
>>>
>>>
>>> Am 30.03.21 um 06:41 schrieb Emily Deng:
>>>> It will hit ramdomly sdma hang, and pending on utcl2 address
>>>> translation when access the RPTR polling address.
>>>>
>>>> According sdma firmware team mentioned, the RPTR writeback is done
>>>> by hardware automatically, and will hit issue when clock gating occurs.
>>>> So stop using the rptr write back for sdma5.0.
>>>>
>>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 ++++++++++++------
>>>>    1 file changed, 12 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>>> index 920fc6d4a127..63e4a78181b8 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
>>>> @@ -298,13 +298,19 @@ static void
>>> sdma_v5_0_ring_patch_cond_exec(struct amdgpu_ring *ring,
>>>>     */
>>>>    static uint64_t sdma_v5_0_ring_get_rptr(struct amdgpu_ring *ring)
>>>>    {
>>>> -u64 *rptr;
>>>> +struct amdgpu_device *adev = ring->adev;
>>>> +u64 rptr;
>>>> +u32 lowbit, highbit;
>>>> +
>>>> +lowbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me,
>>> mmSDMA0_GFX_RB_RPTR));
>>>> +highbit = RREG32(sdma_v5_0_get_reg_offset(adev, ring->me,
>>>> +mmSDMA0_GFX_RB_RPTR_HI));
>>> That won't work like this.
>>>
>>> We have the readpointer writeback because we otherwise can't
>>> guarantee that the two 32bit values read from the registers are coherent.
>>>
>>> In other words it can be that the hi rptr is already wrapped around
>>> while the lo is still the old value.
>>>
>>> Why exactly doesn't the writeback work?
>>>
>>> Christian.
>> Issue occurs, when occurs clockgating, at the same time, the rptr write back
>occurs. At this time, the utcl2 translation will hang.
>
>Mhm, crap. Alex are you up to date on this bug?
>
>I'm not an expert on the SDMA, but my last status is that writeback is
>mandatory when we use 64bit rptr/wptr.
>
>Otherwise we need a workaround how to read a consistent 64bit rptr from
>two 32bit registers.
>
>Can you check the register documentation if there is any double buffering or
>stuff like that?
>
>Christian.
Hi Christian,
     Thanks for pointing out the consistency problem with the 64-bit register read. Please ignore this patch. We will try to fix the issue in the SDMA firmware.

Best wishes
Emily Deng


>
>>>> -/* XXX check if swapping is necessary on BE */
>>>> -rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]);
>>>> +rptr = highbit;
>>>> +rptr = rptr << 32;
>>>> +rptr |= lowbit;
>>>>
>>>> -DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr);
>>>> -return ((*rptr) >> 2);
>>>> +DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr);
>>>> +return (rptr >> 2);
>>>>    }
>>>>
>>>>    /**
>>>> @@ -702,7 +708,7 @@ static int sdma_v5_0_gfx_resume(struct
>>> amdgpu_device *adev)
>>>>    WREG32(sdma_v5_0_get_reg_offset(adev, i,
>>> mmSDMA0_GFX_RB_RPTR_ADDR_LO),
>>>>           lower_32_bits(adev->wb.gpu_addr + wb_offset) &
>>> 0xFFFFFFFC);
>>>> -rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
>>> RPTR_WRITEBACK_ENABLE, 1);
>>>> +rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
>>>> +RPTR_WRITEBACK_ENABLE, 0);
>>>>
>>>>    WREG32(sdma_v5_0_get_reg_offset(adev, i,
>>> mmSDMA0_GFX_RB_BASE), ring->gpu_addr >> 8);
>>>>    WREG32(sdma_v5_0_get_reg_offset(adev, i,
>>> mmSDMA0_GFX_RB_BASE_HI),
>>>> ring->gpu_addr >> 40);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 3/6] drm/amdgpu: Restore msix after FLR
  2021-03-30  5:37   ` Chen, Guchun
@ 2021-03-30  8:07     ` Deng, Emily
  0 siblings, 0 replies; 25+ messages in thread
From: Deng, Emily @ 2021-03-30  8:07 UTC (permalink / raw)
  To: Chen, Guchun, amd-gfx

Hi Guchun,
    OK, I will make it a static function.

>-----Original Message-----
>From: Chen, Guchun <Guchun.Chen@amd.com>
>Sent: Tuesday, March 30, 2021 1:38 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>
>Subject: RE: [PATCH 3/6] drm/amdgpu: Restore msix after FLR
>
>[AMD Public Use]
>
>amdgpu_irq_restore_msix should be one static function?
>
>Regards,
>Guchun
>
>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Emily
>Deng
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>
>Subject: [PATCH 3/6] drm/amdgpu: Restore msix after FLR
>
>From: "Emily.Deng" <Emily.Deng@amd.com>
>
>After FLR, the msix will be cleared, so need to re-enable it.
>
>v2:
>Change name with amdgpu_irq prefix, remove #ifdef.
>
>Signed-off-by: Emily.Deng <Emily.Deng@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>index 03412543427a..8936589bd7f9 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>@@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device
>*adev)
> 	return true;
> }
>
>+void amdgpu_irq_restore_msix(struct amdgpu_device *adev)
>+{
>+	u16 ctrl;
>+
>+	pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
>+	ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
>+	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
>+	ctrl |= PCI_MSIX_FLAGS_ENABLE;
>+	pci_write_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, ctrl);
>+}
>+
> /**
>  * amdgpu_irq_init - initialize interrupt handling
>  *
>@@ -558,6 +569,7 @@ void amdgpu_irq_gpu_reset_resume_helper(struct
>amdgpu_device *adev)  {
> 	int i, j, k;
>
>+	amdgpu_irq_restore_msix(adev);
> 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
> 		if (!adev->irq.client[i].sources)
> 			continue;
>--
>2.25.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
  2021-03-30  7:10       ` Christian König
@ 2021-03-30  8:19         ` Deng, Emily
  2021-03-30  8:37           ` Christian König
  0 siblings, 1 reply; 25+ messages in thread
From: Deng, Emily @ 2021-03-30  8:19 UTC (permalink / raw)
  To: Christian König, Chen, Jiansong (Simon), amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
     Yes, I agree with both of you. But the issue occurs randomly, only during driver unload, and at a fairly low rate, so it is hard to find where the memory leak is.
Could you suggest how to debug this issue?


Best wishes
Emily Deng



>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken@gmail.com>
>Sent: Tuesday, March 30, 2021 3:11 PM
>To: Deng, Emily <Emily.Deng@amd.com>; Chen, Jiansong (Simon)
><Jiansong.Chen@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>Good morning,
>
>yes Jiansong is right that patch is really not a good idea.
>
>Moving buffers can indeed happen during shutdown while some memory is
>still referenced.
>
>Just ignoring the move is not the right approach, you need to find out why the
>memory is moved in the first place.
>
>You could add something like WARN_ON(adev->shutdown);
>
>Regards,
>Christian.
>
>Am 30.03.21 um 09:05 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Jiansong,
>>       It does happen,  maybe have the race condition?
>>
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -----Original Message-----
>>> From: Chen, Jiansong (Simon) <Jiansong.Chen@amd.com>
>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> I still wonder how the issue takes place? According to my humble
>>> knowledge in driver model, the reference count of the kobject for the
>>> device will not reach zero when there is still some device mem
>>> access, and shutdown should not happen.
>>>
>>> Regards,
>>> Jiansong
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>>> Emily Deng
>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>> To: amd-gfx@lists.freedesktop.org
>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>
>>> During driver unloading, don't need to copy mem, or it will introduce
>>> some call trace, such as when sa_manager is freed, it will introduce
>>> warn call trace in amdgpu_sa_bo_new.
>>>
>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index e00263bcc88b..f0546a489e0d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>> struct dma_fence *fence = NULL;
>>> int r = 0;
>>>
>>> +if (adev->shutdown)
>>> +return 0;
>>> +
>>> if (!adev->mman.buffer_funcs_enabled) {
>>> DRM_ERROR("Trying to move memory with ring turned off.\n");
>>> return -EINVAL;
>>> --
>>> 2.25.1
>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
  2021-03-30  8:19         ` Deng, Emily
@ 2021-03-30  8:37           ` Christian König
  2021-03-30  9:11             ` Deng, Emily
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2021-03-30  8:37 UTC (permalink / raw)
  To: Deng, Emily, Chen, Jiansong (Simon), amd-gfx

Hi Emily,

as I said add a WARN_ON() and look at the backtrace.

It could be that the backtrace then just shows the general cleanup 
functions, but it is at least a start.

On the other hand if you only see this sometimes then we have some kind 
of race condition and need to dig deeper.

Christian.
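
[Editor's sketch] The suggestion above amounts to keeping the copy path intact and only adding a warning, so that the resulting backtrace shows who still moves memory during shutdown. The snippet below is a userspace illustration, not driver code: WARN_ON here is a hypothetical stand-in that only prints the location (the kernel macro also dumps a stack trace), and fake_adev and fake_copy_mem_to_mem are made-up names that mirror only the shape of amdgpu_ttm_copy_mem_to_mem.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how often the warning fired, for inspection in this demo. */
static int warn_count;

/* Userspace stand-in for the kernel's WARN_ON(): reports the
 * condition and the calling function, then evaluates to the
 * condition. The real macro additionally prints a backtrace. */
static bool warn_on_impl(bool cond, const char *expr, const char *func)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s in %s()\n", expr, func);
	}
	return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __func__)

/* Hypothetical, reduced device state: only the two flags that the
 * quoted hunk looks at. */
struct fake_adev {
	bool shutdown;
	bool buffer_funcs_enabled;
};

/* Warn when called during shutdown, but do not skip the copy. */
static int fake_copy_mem_to_mem(struct fake_adev *adev)
{
	WARN_ON(adev->shutdown);

	if (!adev->buffer_funcs_enabled)
		return -EINVAL;

	/* ... the actual copy would be scheduled here ... */
	return 0;
}
```

The difference from the posted patch is that the function no longer returns early when shutdown is set, so the move still happens and the warning only serves to identify the caller.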

Am 30.03.21 um 10:19 schrieb Deng, Emily:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi Christian,
>       Yes, I agree both with you. But the issue occurs randomly and in unload driver and in fairly low rate. It is hard to debug where is the memory leak. Could you give some suggestion about how
> to debug this issue?
>
>
> Best wishes
> Emily Deng
>
>
>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>> Sent: Tuesday, March 30, 2021 3:11 PM
>> To: Deng, Emily <Emily.Deng@amd.com>; Chen, Jiansong (Simon)
>> <Jiansong.Chen@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>
>> Good morning,
>>
>> yes Jiansong is right that patch is really not a good idea.
>>
>> Moving buffers can indeed happen during shutdown while some memory is
>> still referenced.
>>
>> Just ignoring the move is not the right approach, you need to find out why the
>> memory is moved in the first place.
>>
>> You could add something like WARN_ON(adev->shutdown);
>>
>> Regards,
>> Christian.
>>
>> Am 30.03.21 um 09:05 schrieb Deng, Emily:
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> Hi Jiansong,
>>>        It does happen,  maybe have the race condition?
>>>
>>>
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Chen, Jiansong (Simon) <Jiansong.Chen@amd.com>
>>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>
>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>> I still wonder how the issue takes place? According to my humble
>>>> knowledge in driver model, the reference count of the kobject for the
>>>> device will not reach zero when there is still some device mem
>>>> access, and shutdown should not happen.
>>>>
>>>> Regards,
>>>> Jiansong
>>>> -----Original Message-----
>>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>>>> Emily Deng
>>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>>> To: amd-gfx@lists.freedesktop.org
>>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>
>>>> During driver unloading, don't need to copy mem, or it will introduce
>>>> some call trace, such as when sa_manager is freed, it will introduce
>>>> warn call trace in amdgpu_sa_bo_new.
>>>>
>>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index e00263bcc88b..f0546a489e0d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>> struct dma_fence *fence = NULL;
>>>> int r = 0;
>>>>
>>>> +if (adev->shutdown)
>>>> +return 0;
>>>> +
>>>> if (!adev->mman.buffer_funcs_enabled) {
>>>> DRM_ERROR("Trying to move memory with ring turned off.\n");
>>>> return -EINVAL;
>>>> --
>>>> 2.25.1
>>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
  2021-03-30  8:37           ` Christian König
@ 2021-03-30  9:11             ` Deng, Emily
  0 siblings, 0 replies; 25+ messages in thread
From: Deng, Emily @ 2021-03-30  9:11 UTC (permalink / raw)
  To: Christian König, Chen, Jiansong (Simon), amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
     OK, I will investigate the memory leak further. But even if I fix this leak now, that cannot guarantee there will be no more leaks in the future. A memory leak should not
crash the kernel and leave it unusable.

Best wishes
Emily Deng



>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken@gmail.com>
>Sent: Tuesday, March 30, 2021 4:38 PM
>To: Deng, Emily <Emily.Deng@amd.com>; Chen, Jiansong (Simon)
><Jiansong.Chen@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>Hi Emily,
>
>as I said add a WARN_ON() and look at the backtrace.
>
>It could be that the backtrace then just shows the general cleanup functions,
>but it is at least a start.
>
>On the other hand if you only see this sometimes then we have some kind of
>race condition and need to dig deeper.
>
>Christian.
>
>Am 30.03.21 um 10:19 schrieb Deng, Emily:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>> Hi Christian,
>>       Yes, I agree both with you. But the issue occurs randomly and in
>> unload driver and in fairly low rate. It is hard to debug where is the memory
>leak. Could you give some suggestion about how to debug this issue?
>>
>>
>> Best wishes
>> Emily Deng
>>
>>
>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>>> Sent: Tuesday, March 30, 2021 3:11 PM
>>> To: Deng, Emily <Emily.Deng@amd.com>; Chen, Jiansong (Simon)
>>> <Jiansong.Chen@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>
>>> Good morning,
>>>
>>> yes Jiansong is right that patch is really not a good idea.
>>>
>>> Moving buffers can indeed happen during shutdown while some memory
>is
>>> still referenced.
>>>
>>> Just ignoring the move is not the right approach, you need to find
>>> out why the memory is moved in the first place.
>>>
>>> You could add something like WARN_ON(adev->shutdown);
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 30.03.21 um 09:05 schrieb Deng, Emily:
>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>
>>>> Hi Jiansong,
>>>>        It does happen,  maybe have the race condition?
>>>>
>>>>
>>>> Best wishes
>>>> Emily Deng
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Chen, Jiansong (Simon) <Jiansong.Chen@amd.com>
>>>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>>>> To: Deng, Emily <Emily.Deng@amd.com>; amd-
>gfx@lists.freedesktop.org
>>>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>>
>>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>>
>>>>> I still wonder how the issue takes place? According to my humble
>>>>> knowledge in driver model, the reference count of the kobject for
>>>>> the device will not reach zero when there is still some device mem
>>>>> access, and shutdown should not happen.
>>>>>
>>>>> Regards,
>>>>> Jiansong
>>>>> -----Original Message-----
>>>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>>>>> Emily Deng
>>>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>>>> To: amd-gfx@lists.freedesktop.org
>>>>> Cc: Deng, Emily <Emily.Deng@amd.com>
>>>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>>
>>>>> During driver unloading, don't need to copy mem, or it will
>>>>> introduce some call trace, such as when sa_manager is freed, it
>>>>> will introduce warn call trace in amdgpu_sa_bo_new.
>>>>>
>>>>> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index e00263bcc88b..f0546a489e0d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>> struct dma_fence *fence = NULL;
>>>>> int r = 0;
>>>>>
>>>>> +if (adev->shutdown)
>>>>> +return 0;
>>>>> +
>>>>> if (!adev->mman.buffer_funcs_enabled) {
>>>>> DRM_ERROR("Trying to move memory with ring turned off.\n");
>>>>> return -EINVAL;
>>>>> --
>>>>> 2.25.1
>>>>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12
  2021-03-30  4:41 [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Emily Deng
                   ` (4 preceding siblings ...)
  2021-03-30  4:41 ` [PATCH 6/6] drm/amdgpu: Fix driver unload issue Emily Deng
@ 2021-03-31  9:00 ` Deng, Emily
  2021-04-01  6:03   ` Deng, Emily
  5 siblings, 1 reply; 25+ messages in thread
From: Deng, Emily @ 2021-03-31  9:00 UTC (permalink / raw)
  To: Deng, Emily, amd-gfx; +Cc: Min, Frank

[AMD Official Use Only - Internal Distribution Only]

Ping......

>-----Original Message-----
>From: Emily Deng <Emily.Deng@amd.com>
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>; Min, Frank <Frank.Min@amd.com>
>Subject: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12
>
>Since vcn decoding ring is not required, so just disable it.
>
>Signed-off-by: Frank.Min <Frank.Min@amd.com>
>Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 +++-
> drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c   | 29 ++++++++++++-------------
> 2 files changed, 17 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>index 8844f650b17f..5d5c41c9d5aa 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>@@ -427,7 +427,9 @@ static int amdgpu_hw_ip_info(struct amdgpu_device
>*adev,
> if (adev->uvd.harvest_config & (1 << i))
> continue;
>
>-if (adev->vcn.inst[i].ring_dec.sched.ready)
>+if (adev->vcn.inst[i].ring_dec.sched.ready ||
>+(adev->asic_type == CHIP_NAVI12 &&
>+amdgpu_sriov_vf(adev)))
> ++num_rings;
> }
> ib_start_alignment = 16;
>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>index 116b9643d5ba..e4b61f3a45fb 100644
>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>@@ -220,21 +220,20 @@ static int vcn_v2_0_hw_init(void *handle)  {
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> struct amdgpu_ring *ring = &adev->vcn.inst->ring_dec;
>-int i, r;
>+int i, r = -1;
>
> adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
>      ring->doorbell_index, 0);
>
>-if (amdgpu_sriov_vf(adev))
>+if (amdgpu_sriov_vf(adev)) {
> vcn_v2_0_start_sriov(adev);
>-
>-r = amdgpu_ring_test_helper(ring);
>-if (r)
>-goto done;
>-
>-//Disable vcn decode for sriov
>-if (amdgpu_sriov_vf(adev))
>-ring->sched.ready = false;
>+if (adev->asic_type == CHIP_NAVI12)
>+ring->sched.ready = false;
>+} else {
>+r = amdgpu_ring_test_helper(ring);
>+if (r)
>+goto done;
>+}
>
> for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
> ring = &adev->vcn.inst->ring_enc[i];
>@@ -245,8 +244,11 @@ static int vcn_v2_0_hw_init(void *handle)
>
> done:
> if (!r)
>-DRM_INFO("VCN decode and encode initialized
>successfully(under %s).\n",
>-(adev->pg_flags &
>AMD_PG_SUPPORT_VCN_DPG)?"DPG Mode":"SPG Mode");
>+DRM_INFO("VCN %s encode initialized
>successfully(under %s).\n",
>+(adev->asic_type == CHIP_NAVI12 &&
>+amdgpu_sriov_vf(adev))?"":"decode and",
>+(adev->pg_flags &
>+AMD_PG_SUPPORT_VCN_DPG)?"DPG
>Mode":"SPG Mode");
>
> return r;
> }
>@@ -1719,9 +1721,6 @@ int vcn_v2_0_dec_ring_test_ring(struct
>amdgpu_ring *ring)
> unsigned i;
> int r;
>
>-if (amdgpu_sriov_vf(adev))
>-return 0;
>-
> WREG32(adev->vcn.inst[ring->me].external.scratch9, 0xCAFEDEAD);
> r = amdgpu_ring_alloc(ring, 4);
> if (r)
>--
>2.25.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc
  2021-03-30  4:41 ` [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc Emily Deng
@ 2021-03-31  9:00   ` Deng, Emily
  2021-04-01  6:03     ` Deng, Emily
  0 siblings, 1 reply; 25+ messages in thread
From: Deng, Emily @ 2021-03-31  9:00 UTC (permalink / raw)
  To: Deng, Emily, amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Ping......

>-----Original Message-----
>From: Emily Deng <Emily.Deng@amd.com>
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>
>Subject: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc
>
>Set the num_types equal to the enabled num_crtc.
>
>Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index 5c11144da051..c03a83a2b7cd 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -768,7 +768,7 @@ static const struct amdgpu_irq_src_funcs dce_virtual_crtc_irq_funcs = {
>
> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)
> {
>-adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
>+adev->crtc_irq.num_types = adev->mode_info.num_crtc;
> adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;
> }
>
>--
>2.25.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov
  2021-03-30  4:41 ` [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov Emily Deng
@ 2021-03-31  9:01   ` Deng, Emily
  2021-04-01  6:03     ` Deng, Emily
  0 siblings, 1 reply; 25+ messages in thread
From: Deng, Emily @ 2021-03-31  9:01 UTC (permalink / raw)
  To: Deng, Emily, amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Ping .....

>-----Original Message-----
>From: Emily Deng <Emily.Deng@amd.com>
>Sent: Tuesday, March 30, 2021 12:42 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Deng, Emily <Emily.Deng@amd.com>
>Subject: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for
>navi12 sriov
>
>To fix the board disappear issue.
>
>Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/nv.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>index 46d4bbabce75..48dc171bc759 100644
>--- a/drivers/gpu/drm/amd/amdgpu/nv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>@@ -693,6 +693,10 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
> adev->nbio.funcs = &nbio_v2_3_funcs;
> adev->nbio.hdp_flush_reg = &nbio_v2_3_hdp_flush_reg;
> }
>+
>+if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_NAVI12)
>+amdgpu_discovery = 0;
>+
> adev->hdp.funcs = &hdp_v5_0_funcs;
>
> if (adev->asic_type >= CHIP_SIENNA_CICHLID)
>--
>2.25.1

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12
  2021-03-31  9:00 ` [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Deng, Emily
@ 2021-04-01  6:03   ` Deng, Emily
  0 siblings, 0 replies; 25+ messages in thread
From: Deng, Emily @ 2021-04-01  6:03 UTC (permalink / raw)
  To: Liu, Monk; +Cc: Min, Frank, amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Hi Monk,
     Could you help to review this patch?

Best wishes
Emily Deng


>-----Original Message-----
>From: Deng, Emily <Emily.Deng@amd.com>
>Sent: Wednesday, March 31, 2021 5:01 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Min, Frank <Frank.Min@amd.com>
>Subject: RE: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov
>navi12
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping......
>
>>-----Original Message-----
>>From: Emily Deng <Emily.Deng@amd.com>
>>Sent: Tuesday, March 30, 2021 12:42 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily <Emily.Deng@amd.com>; Min, Frank <Frank.Min@amd.com>
>>Subject: [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov
>>navi12
>>
>>Since vcn decoding ring is not required, so just disable it.
>>
>>Signed-off-by: Frank.Min <Frank.Min@amd.com>
>>Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  4 +++-
>> drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c   | 29 ++++++++++++-------------
>> 2 files changed, 17 insertions(+), 16 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>index 8844f650b17f..5d5c41c9d5aa 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>@@ -427,7 +427,9 @@ static int amdgpu_hw_ip_info(struct amdgpu_device *adev,
>> if (adev->uvd.harvest_config & (1 << i))
>> continue;
>>
>>-if (adev->vcn.inst[i].ring_dec.sched.ready)
>>+if (adev->vcn.inst[i].ring_dec.sched.ready ||
>>+(adev->asic_type == CHIP_NAVI12 &&
>>+amdgpu_sriov_vf(adev)))
>> ++num_rings;
>> }
>> ib_start_alignment = 16;
>>diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>index 116b9643d5ba..e4b61f3a45fb 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
>>@@ -220,21 +220,20 @@ static int vcn_v2_0_hw_init(void *handle)
>> {
>> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> struct amdgpu_ring *ring = &adev->vcn.inst->ring_dec;
>>-int i, r;
>>+int i, r = -1;
>>
>> adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
>>      ring->doorbell_index, 0);
>>
>>-if (amdgpu_sriov_vf(adev))
>>+if (amdgpu_sriov_vf(adev)) {
>> vcn_v2_0_start_sriov(adev);
>>-
>>-r = amdgpu_ring_test_helper(ring);
>>-if (r)
>>-goto done;
>>-
>>-//Disable vcn decode for sriov
>>-if (amdgpu_sriov_vf(adev))
>>-ring->sched.ready = false;
>>+if (adev->asic_type == CHIP_NAVI12)
>>+ring->sched.ready = false;
>>+} else {
>>+r = amdgpu_ring_test_helper(ring);
>>+if (r)
>>+goto done;
>>+}
>>
>> for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>> ring = &adev->vcn.inst->ring_enc[i];
>>@@ -245,8 +244,11 @@ static int vcn_v2_0_hw_init(void *handle)
>>
>> done:
>> if (!r)
>>-DRM_INFO("VCN decode and encode initialized successfully(under %s).\n",
>>-(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)?"DPG Mode":"SPG Mode");
>>+DRM_INFO("VCN %s encode initialized successfully(under %s).\n",
>>+(adev->asic_type == CHIP_NAVI12 &&
>>+amdgpu_sriov_vf(adev))?"":"decode and",
>>+(adev->pg_flags &
>>+AMD_PG_SUPPORT_VCN_DPG)?"DPG Mode":"SPG Mode");
>>
>> return r;
>> }
>>@@ -1719,9 +1721,6 @@ int vcn_v2_0_dec_ring_test_ring(struct amdgpu_ring *ring)
>> unsigned i;
>> int r;
>>
>>-if (amdgpu_sriov_vf(adev))
>>-return 0;
>>-
>> WREG32(adev->vcn.inst[ring->me].external.scratch9, 0xCAFEDEAD);
>> r = amdgpu_ring_alloc(ring, 4);
>> if (r)
>>--
>>2.25.1
>
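
For readers following the amdgpu_kms.c hunk quoted above, the ring-counting condition can be sketched in isolation. This is only an illustration under stated assumptions: `struct vcn_inst_state`, `count_dec_rings()`, and the boolean parameters are hypothetical stand-ins for the driver's `adev` fields and `amdgpu_sriov_vf()`, not real kernel interfaces.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-instance state the driver reads. */
struct vcn_inst_state {
	bool harvested;       /* models the uvd.harvest_config bit for this instance */
	bool dec_ring_ready;  /* models ring_dec.sched.ready */
};

/*
 * Count decode rings the way the patched condition does: a ring is
 * counted if its scheduler is ready, or unconditionally when running
 * as an SR-IOV VF on Navi12. Harvested instances are skipped.
 */
static unsigned int count_dec_rings(const struct vcn_inst_state *inst,
				    unsigned int num_inst,
				    bool is_navi12, bool is_sriov_vf)
{
	unsigned int i, num_rings = 0;

	for (i = 0; i < num_inst; ++i) {
		if (inst[i].harvested)
			continue;
		if (inst[i].dec_ring_ready || (is_navi12 && is_sriov_vf))
			++num_rings;
	}
	return num_rings;
}
```

With one non-harvested instance whose decode ring is not ready, the sketch returns 1 only when both `is_navi12` and `is_sriov_vf` are true, which matches the effect of the added `||` clause in the patch.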

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc
  2021-03-31  9:00   ` Deng, Emily
@ 2021-04-01  6:03     ` Deng, Emily
  0 siblings, 0 replies; 25+ messages in thread
From: Deng, Emily @ 2021-04-01  6:03 UTC (permalink / raw)
  To: Liu, Monk; +Cc: amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Hi Monk,
     Could you help to review this patch?

Best wishes
Emily Deng

>-----Original Message-----
>From: Deng, Emily <Emily.Deng@amd.com>
>Sent: Wednesday, March 31, 2021 5:01 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping......
>
>>-----Original Message-----
>>From: Emily Deng <Emily.Deng@amd.com>
>>Sent: Tuesday, March 30, 2021 12:42 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily <Emily.Deng@amd.com>
>>Subject: [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual
>>ctrc
>>
>>Set the num_types equal to the enabled num_crtc.
>>
>>Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>index 5c11144da051..c03a83a2b7cd 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>@@ -768,7 +768,7 @@ static const struct amdgpu_irq_src_funcs dce_virtual_crtc_irq_funcs = {
>>
>> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev)
>> {
>>-adev->crtc_irq.num_types = AMDGPU_CRTC_IRQ_VBLANK6 + 1;
>>+adev->crtc_irq.num_types = adev->mode_info.num_crtc;
>> adev->crtc_irq.funcs = &dce_virtual_crtc_irq_funcs;
>> }
>>
>>--
>>2.25.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov
  2021-03-31  9:01   ` Deng, Emily
@ 2021-04-01  6:03     ` Deng, Emily
  0 siblings, 0 replies; 25+ messages in thread
From: Deng, Emily @ 2021-04-01  6:03 UTC (permalink / raw)
  To: Liu, Monk; +Cc: amd-gfx

[AMD Official Use Only - Internal Distribution Only]

Hi Monk,
     Could you help to review this patch?

Best wishes
Emily Deng

>-----Original Message-----
>From: Deng, Emily <Emily.Deng@amd.com>
>Sent: Wednesday, March 31, 2021 5:01 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: RE: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from
>vram for navi12 sriov
>
>[AMD Official Use Only - Internal Distribution Only]
>
>Ping .....
>
>>-----Original Message-----
>>From: Emily Deng <Emily.Deng@amd.com>
>>Sent: Tuesday, March 30, 2021 12:42 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Deng, Emily <Emily.Deng@amd.com>
>>Subject: [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram
>>for
>>navi12 sriov
>>
>>To fix the board disappear issue.
>>
>>Signed-off-by: Emily Deng <Emily.Deng@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/nv.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>index 46d4bbabce75..48dc171bc759 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>@@ -693,6 +693,10 @@ int nv_set_ip_blocks(struct amdgpu_device *adev)
>> adev->nbio.funcs = &nbio_v2_3_funcs;
>> adev->nbio.hdp_flush_reg = &nbio_v2_3_hdp_flush_reg;
>> }
>>+
>>+if (amdgpu_sriov_vf(adev) && adev->asic_type == CHIP_NAVI12)
>>+amdgpu_discovery = 0;
>>+
>> adev->hdp.funcs = &hdp_v5_0_funcs;
>>
>> if (adev->asic_type >= CHIP_SIENNA_CICHLID)
>>--
>>2.25.1
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12
  2021-03-29  7:49 Emily Deng
@ 2021-03-29  7:49 ` Emily Deng
  0 siblings, 0 replies; 25+ messages in thread
From: Emily Deng @ 2021-03-29  7:49 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily Deng

SDMA hangs randomly, with the engine pending on UTCL2 address
translation when it accesses the RPTR polling address.

According to the SDMA firmware team, the RPTR writeback is done by
hardware automatically and runs into problems when clock gating occurs.
So stop using the RPTR writeback for sdma5.0.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 920fc6d4a127..6d268c70857c 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -298,13 +298,19 @@ static void sdma_v5_0_ring_patch_cond_exec(struct amdgpu_ring *ring,
  */
 static uint64_t sdma_v5_0_ring_get_rptr(struct amdgpu_ring *ring)
 {
-	u64 *rptr;
+	struct amdgpu_device *adev = ring->adev;
+	u64 rptr;
+	u32 lowbit, highbit;
+
+	lowbit = RREG32_RLC(sdma_v5_0_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_RPTR));
+	highbit = RREG32_RLC(sdma_v5_0_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_RPTR_HI));
 
-	/* XXX check if swapping is necessary on BE */
-	rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]);
+	rptr = highbit;
+	rptr = rptr << 32;
+	rptr |= lowbit;
 
-	DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr);
-	return ((*rptr) >> 2);
+	DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr);
+	return (rptr >> 2);
 }
 
 /**
@@ -702,7 +708,7 @@ static int sdma_v5_0_gfx_resume(struct amdgpu_device *adev)
 		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_RPTR_ADDR_LO),
 		       lower_32_bits(adev->wb.gpu_addr + wb_offset) & 0xFFFFFFFC);
 
-		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RPTR_WRITEBACK_ENABLE, 1);
+		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RPTR_WRITEBACK_ENABLE, 0);
 
 		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_BASE), ring->gpu_addr >> 8);
 		WREG32(sdma_v5_0_get_reg_offset(adev, i, mmSDMA0_GFX_RB_BASE_HI), ring->gpu_addr >> 40);
-- 
2.25.1
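
The register-based read in sdma_v5_0_ring_get_rptr() above boils down to joining two 32-bit halves and shifting. A minimal standalone sketch of that arithmetic follows; `rptr_from_regs()` is a hypothetical helper, and its two parameters stand in for the values the patch reads via RREG32_RLC() from mmSDMA0_GFX_RB_RPTR and mmSDMA0_GFX_RB_RPTR_HI. The shift by 2 simply mirrors the patch; reading it as a byte-offset to dword-index conversion is an assumption.

```c
#include <stdint.h>

/*
 * Combine the low and high 32-bit register values into one 64-bit
 * read pointer, then shift right by 2 as the patch does.
 * The cast before the shift matters: shifting a 32-bit value by 32
 * would be undefined behaviour in C.
 */
static uint64_t rptr_from_regs(uint32_t lowbit, uint32_t highbit)
{
	uint64_t rptr = (uint64_t)highbit << 32;

	rptr |= lowbit;
	return rptr >> 2;
}
```

For example, `rptr_from_regs(0x10, 0x1)` builds 0x100000010 and returns 0x40000004.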

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-04-01  6:03 UTC | newest]

Thread overview: 25+ messages
2021-03-30  4:41 [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Emily Deng
2021-03-30  4:41 ` [PATCH 2/6] drm/amdgpu: Correct the irq numbers for virtual ctrc Emily Deng
2021-03-31  9:00   ` Deng, Emily
2021-04-01  6:03     ` Deng, Emily
2021-03-30  4:41 ` [PATCH 3/6] drm/amdgpu: Restore msix after FLR Emily Deng
2021-03-30  5:37   ` Chen, Guchun
2021-03-30  8:07     ` Deng, Emily
2021-03-30  4:41 ` [PATCH 4/6] drm/amdgpu: Disable fetch discovery data from vram for navi12 sriov Emily Deng
2021-03-31  9:01   ` Deng, Emily
2021-04-01  6:03     ` Deng, Emily
2021-03-30  4:41 ` [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12 Emily Deng
2021-03-30  7:12   ` Christian König
2021-03-30  7:20     ` Deng, Emily
2021-03-30  7:24       ` Christian König
2021-03-30  7:40         ` Deng, Emily
2021-03-30  4:41 ` [PATCH 6/6] drm/amdgpu: Fix driver unload issue Emily Deng
2021-03-30  6:49   ` Chen, Jiansong (Simon)
2021-03-30  7:05     ` Deng, Emily
2021-03-30  7:10       ` Christian König
2021-03-30  8:19         ` Deng, Emily
2021-03-30  8:37           ` Christian König
2021-03-30  9:11             ` Deng, Emily
2021-03-31  9:00 ` [PATCH 1/6] drm/amdgpu: Disable vcn decode ring for sriov navi12 Deng, Emily
2021-04-01  6:03   ` Deng, Emily
  -- strict thread matches above, loose matches on Subject: below --
2021-03-29  7:49 Emily Deng
2021-03-29  7:49 ` [PATCH 5/6] drm/amdgpu: Disable RPTR write back for navi12 Emily Deng
