* [PATCH 00/18] *** misc patches for SRIOV ***
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Found a lot of patches that were missed in 4.12 staging.

Horace Chen (2):
  drm/amdgpu: Fix amdgpu reload failure under SRIOV
  drm/amdgpu: increate mailbox polling timeout to 12s.

Monk Liu (16):
  drm/amdgpu/sriov:fix missing error handling
  drm/amdgpu:no kiq in IH
  drm/amdgpu/sriov:move in_reset to adev and rename
  drm/amdgpu/sriov:don't load psp fw during gpu reset
  drm/amdgpu:make ctx_add_fence interruptible
  drm/amdgpu/sriov:fix memory leak after gpu reset
  drm/amdgpu:add hdp golden setting register name hint
  drm/amdgpu:halt when vm fault
  drm/amdgpu:insert TMZ_BEGIN
  drm/amdgpu:hdp flush should be put it initialized
  drm/amdgpu:add vgt_flush for gfx9
  drm/amdgpu:use formal register to trigger hdp invalidate
  drm/amdgpu:fix driver unloading bug
  drm/amdgpu/sriov: fix page fault issue of driver unload
  drm/amdgpu:fix uvd ring fini routine
  drm/amdgpu/sriov:init csb for gfxv9

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     |  12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c    |  14 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   8 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    |   5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  10 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    |  15 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  64 +++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c    |  12 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      |   7 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 100 +++++++++++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c   |   6 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      |  32 ++++-----
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c    |   7 ++
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h      |   2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h      |   2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c     |   2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c     |   4 +-
 20 files changed, 226 insertions(+), 93 deletions(-)

-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH 01/18] drm/amdgpu/sriov:fix missing error handling
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Change-Id: Ifc6942ed0221f3134bfba4d66fde743484191da3
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e390c01..d1ac27d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -841,8 +841,11 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	if (amdgpu_sriov_vf(adev)) {
 		r = amdgpu_map_static_csa(adev, &fpriv->vm, &fpriv->csa_va);
-		if (r)
+		if (r) {
+			amdgpu_vm_fini(adev, &fpriv->vm);
+			kfree(fpriv);
 			goto out_suspend;
+		}
 	}
 
 	mutex_init(&fpriv->bo_list_lock);
-- 
2.7.4


* [PATCH 02/18] drm/amdgpu:no kiq in IH
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Change-Id: I4deb65675d2531236b2f4e2bc6f015c657546464
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 67610f7..c291e33 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -219,9 +219,9 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev)
 			wptr, adev->irq.ih.rptr, tmp);
 		adev->irq.ih.rptr = tmp;
 
-		tmp = RREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
+		tmp = RREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
 		tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
-		WREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
+		WREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
 	}
 	return (wptr & adev->irq.ih.ptr_mask);
 }
-- 
2.7.4


* [PATCH 03/18] drm/amdgpu/sriov:move in_reset to adev and rename
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Currently in_reset is only used in the SRIOV GPU reset path, and it
will later be used for other non-gfx hardware components such as the
PSP, so move it from gfx to adev and rename it to in_sriov_reset,
which makes more sense.

Change-Id: Ibb8546f6e4635a1cca740e57f6244f158c70a1e6
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 6 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 6 +++---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a34c4cb..cc9a232 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1019,7 +1019,6 @@ struct amdgpu_gfx {
 	/* reset mask */
 	uint32_t                        grbm_soft_reset;
 	uint32_t                        srbm_soft_reset;
-	bool                            in_reset;
 	/* s3/s4 mask */
 	bool                            in_suspend;
 	/* NGG */
@@ -1588,6 +1587,7 @@ struct amdgpu_device {
 
 	/* record last mm index being written through WREG32*/
 	unsigned long last_mm_index;
+	bool                            in_sriov_reset;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3467179..298a241 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2757,7 +2757,7 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job)
 
 	mutex_lock(&adev->virt.lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
-	adev->gfx.in_reset = true;
+	adev->in_sriov_reset = true;
 
 	/* block TTM */
 	resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
@@ -2868,7 +2868,7 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job)
 		dev_info(adev->dev, "GPU reset successed!\n");
 	}
 
-	adev->gfx.in_reset = false;
+	adev->in_sriov_reset = false;
 	mutex_unlock(&adev->virt.lock_reset);
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 6ee348e..3f511a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4810,7 +4810,7 @@ static int gfx_v8_0_kiq_init_queue(struct amdgpu_ring *ring)
 
 	gfx_v8_0_kiq_setting(ring);
 
-	if (adev->gfx.in_reset) { /* for GPU_RESET case */
+	if (adev->in_sriov_reset) { /* for GPU_RESET case */
 		/* reset MQD to a clean status */
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
@@ -4847,7 +4847,7 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
 	struct vi_mqd *mqd = ring->mqd_ptr;
 	int mqd_idx = ring - &adev->gfx.compute_ring[0];
 
-	if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
+	if (!adev->in_sriov_reset && !adev->gfx.in_suspend) {
 		memset((void *)mqd, 0, sizeof(struct vi_mqd_allocation));
 		((struct vi_mqd_allocation *)mqd)->dynamic_cu_mask = 0xFFFFFFFF;
 		((struct vi_mqd_allocation *)mqd)->dynamic_rb_mask = 0xFFFFFFFF;
@@ -4859,7 +4859,7 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
 
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(adev->gfx.mec.mqd_backup[mqd_idx], mqd, sizeof(struct vi_mqd_allocation));
-	} else if (adev->gfx.in_reset) { /* for GPU_RESET case */
+	} else if (adev->in_sriov_reset) { /* for GPU_RESET case */
 		/* reset MQD to a clean status */
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index c133c85..21838f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2698,7 +2698,7 @@ static int gfx_v9_0_kiq_init_queue(struct amdgpu_ring *ring)
 
 	gfx_v9_0_kiq_setting(ring);
 
-	if (adev->gfx.in_reset) { /* for GPU_RESET case */
+	if (adev->in_sriov_reset) { /* for GPU_RESET case */
 		/* reset MQD to a clean status */
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct v9_mqd_allocation));
@@ -2736,7 +2736,7 @@ static int gfx_v9_0_kcq_init_queue(struct amdgpu_ring *ring)
 	struct v9_mqd *mqd = ring->mqd_ptr;
 	int mqd_idx = ring - &adev->gfx.compute_ring[0];
 
-	if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
+	if (!adev->in_sriov_reset && !adev->gfx.in_suspend) {
 		memset((void *)mqd, 0, sizeof(struct v9_mqd_allocation));
 		((struct v9_mqd_allocation *)mqd)->dynamic_cu_mask = 0xFFFFFFFF;
 		((struct v9_mqd_allocation *)mqd)->dynamic_rb_mask = 0xFFFFFFFF;
@@ -2748,7 +2748,7 @@ static int gfx_v9_0_kcq_init_queue(struct amdgpu_ring *ring)
 
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(adev->gfx.mec.mqd_backup[mqd_idx], mqd, sizeof(struct v9_mqd_allocation));
-	} else if (adev->gfx.in_reset) { /* for GPU_RESET case */
+	} else if (adev->in_sriov_reset) { /* for GPU_RESET case */
 		/* reset MQD to a clean status */
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct v9_mqd_allocation));
-- 
2.7.4


* [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

At least under SRIOV, we found that reloading the PSP firmware
during a GPU reset causes the PSP to hang.

Change-Id: I5f273187a10bb8571b77651dfba7656ce0429af0
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 8a1ee97..4eee2ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -253,15 +253,18 @@ static int psp_asd_load(struct psp_context *psp)
 
 static int psp_hw_start(struct psp_context *psp)
 {
+	struct amdgpu_device *adev = psp->adev;
 	int ret;
 
-	ret = psp_bootloader_load_sysdrv(psp);
-	if (ret)
-		return ret;
+	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {
+		ret = psp_bootloader_load_sysdrv(psp);
+		if (ret)
+			return ret;
 
-	ret = psp_bootloader_load_sos(psp);
-	if (ret)
-		return ret;
+		ret = psp_bootloader_load_sos(psp);
+		if (ret)
+			return ret;
+	}
 
 	ret = psp_ring_create(psp, PSP_RING_TYPE__KM);
 	if (ret)
-- 
2.7.4
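
The commit message above describes skipping the PSP bootloader stages
only while an SRIOV VF is going through a GPU reset. A minimal sketch of
that decision as a pure predicate, assuming that reading of the message;
should_load_psp_bootloader() and check_truth_table() are hypothetical
helpers for illustration, not driver functions:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the PSP bootloader stages
 * (sysdrv and sos) should be loaded. Only the combination
 * "SRIOV VF and currently in SRIOV reset" skips the load.
 */
bool should_load_psp_bootloader(bool is_sriov_vf, bool in_sriov_reset)
{
	return !is_sriov_vf || !in_sriov_reset;
}

/* Walk the four input combinations of the predicate. */
void check_truth_table(void)
{
	assert(should_load_psp_bootloader(false, false)); /* bare metal, cold init */
	assert(should_load_psp_bootloader(false, true));  /* bare metal: flag is ignored */
	assert(should_load_psp_bootloader(true, false));  /* VF, first init */
	assert(!should_load_psp_bootloader(true, true));  /* VF in reset: skip */
}
```

Written this way, bare-metal initialization keeps loading the bootloader
stages regardless of the reset flag.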


* [PATCH 05/18] drm/amdgpu:make ctx_add_fence interruptible
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Otherwise, a GPU hang leaves the application impossible to kill.

Change-Id: I6051b5b3ae1188983f49325a2438c84a6c12374a
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 12 ++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 14 +++++++++-----
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cc9a232..6ff2959 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -736,8 +736,8 @@ struct amdgpu_ctx_mgr {
 struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id);
 int amdgpu_ctx_put(struct amdgpu_ctx *ctx);
 
-uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
-			      struct dma_fence *fence);
+int amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
+			      struct dma_fence *fence, uint64_t *seq);
 struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
 				   struct amdgpu_ring *ring, uint64_t seq);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b59749d..4ac7a92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1043,6 +1043,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	struct amd_sched_entity *entity = &p->ctx->rings[ring->idx].entity;
 	struct amdgpu_job *job;
 	unsigned i;
+	uint64_t seq;
+
 	int r;
 
 	amdgpu_mn_lock(p->mn);
@@ -1071,8 +1073,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	job->owner = p->filp;
 	job->fence_ctx = entity->fence_context;
 	p->fence = dma_fence_get(&job->base.s_fence->finished);
-	cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
-	job->uf_sequence = cs->out.handle;
+	r = amdgpu_ctx_add_fence(p->ctx, ring, p->fence, &seq);
+	if (r) {
+		dma_fence_put(p->fence);
+		return r;
+	}
+
+	cs->out.handle = seq;
+	job->uf_sequence = seq;
 	amdgpu_job_free_resources(job);
 
 	trace_amdgpu_cs_ioctl(job);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index a11e443..97f8be4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -246,8 +246,8 @@ int amdgpu_ctx_put(struct amdgpu_ctx *ctx)
 	return 0;
 }
 
-uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
-			      struct dma_fence *fence)
+int amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
+			      struct dma_fence *fence, uint64_t* handler)
 {
 	struct amdgpu_ctx_ring *cring = & ctx->rings[ring->idx];
 	uint64_t seq = cring->sequence;
@@ -258,9 +258,11 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
 	other = cring->fences[idx];
 	if (other) {
 		signed long r;
-		r = dma_fence_wait_timeout(other, false, MAX_SCHEDULE_TIMEOUT);
-		if (r < 0)
+		r = dma_fence_wait_timeout(other, true, MAX_SCHEDULE_TIMEOUT);
+		if (r < 0) {
 			DRM_ERROR("Error (%ld) waiting for fence!\n", r);
+			return -ERESTARTSYS;
+		}
 	}
 
 	dma_fence_get(fence);
@@ -271,8 +273,10 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
 	spin_unlock(&ctx->ring_lock);
 
 	dma_fence_put(other);
+	if (handler)
+		*handler = seq;
 
-	return seq;
+	return 0;
 }
 
 struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
-- 
2.7.4
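
The interface change in this patch (return an error code, hand the
sequence number back through a pointer) can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct ctx_ring, NUM_SLOTS, and the 'interrupted' parameter are
hypothetical stand-ins for the context ring and for a signal arriving
during the fence wait, and -EINTR stands in for -ERESTARTSYS, which is
not available outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 4 /* hypothetical ring depth */

struct ctx_ring {
	uint64_t sequence;   /* next sequence number to hand out */
	int busy[NUM_SLOTS]; /* nonzero while the slot's old fence is unsignaled */
};

/*
 * Returns 0 and stores the sequence number in *seq_out on success.
 * Returns -EINTR if the wait on the slot's previous fence is
 * interrupted; in that case the ring and *seq_out are left unchanged.
 */
int ctx_add_fence(struct ctx_ring *ring, int interrupted, uint64_t *seq_out)
{
	size_t idx;

	assert(ring != NULL);
	idx = (size_t)(ring->sequence % NUM_SLOTS);

	if (ring->busy[idx]) {
		if (interrupted)
			return -EINTR;
		ring->busy[idx] = 0; /* the old fence signaled; the wait is over */
	}

	ring->busy[idx] = 1;
	if (seq_out)
		*seq_out = ring->sequence;
	ring->sequence++;
	return 0;
}
```

Because the return value no longer doubles as the sequence number, a
caller can tell an interrupted wait apart from a valid handle and
propagate the error instead of publishing a bogus sequence.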


* [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

A GPU reset reruns every hw_init callback, so ucode_init_bo is
invoked again. Skip the fw_buf allocation during an SRIOV GPU
reset to avoid a memory leak.

Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++++++++++++++----------------
 2 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6ff2959..3d0c633 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1185,6 +1185,9 @@ struct amdgpu_firmware {
 
 	/* gpu info firmware data pointer */
 	const struct firmware *gpu_info_fw;
+
+	void *fw_buf_ptr;
+	uint64_t fw_buf_mc;
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index f306374..6564902 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
 int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
 {
 	struct amdgpu_bo **bo = &adev->firmware.fw_buf;
-	uint64_t fw_mc_addr;
-	void *fw_buf_ptr = NULL;
 	uint64_t fw_offset = 0;
 	int i, err;
 	struct amdgpu_firmware_info *ucode = NULL;
@@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
 		return 0;
 	}
 
-	err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
-				amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-				AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
-				NULL, NULL, 0, bo);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
-		goto failed;
-	}
+	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {
+		err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
+					amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
+					AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
+					NULL, NULL, 0, bo);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
+			goto failed;
+		}
 
-	err = amdgpu_bo_reserve(*bo, false);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
-		goto failed_reserve;
-	}
+		err = amdgpu_bo_reserve(*bo, false);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
+			goto failed_reserve;
+		}
 
-	err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-				&fw_mc_addr);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
-		goto failed_pin;
-	}
+		err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
+					&adev->firmware.fw_buf_mc);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
+			goto failed_pin;
+		}
 
-	err = amdgpu_bo_kmap(*bo, &fw_buf_ptr);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
-		goto failed_kmap;
-	}
+		err = amdgpu_bo_kmap(*bo, &adev->firmware.fw_buf_ptr);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
+			goto failed_kmap;
+		}
 
-	amdgpu_bo_unreserve(*bo);
+		amdgpu_bo_unreserve(*bo);
+	}
 
-	memset(fw_buf_ptr, 0, adev->firmware.fw_size);
+	memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size);
 
 	/*
 	 * if SMU loaded firmware, it needn't add SMC, UVD, and VCE
@@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
 		ucode = &adev->firmware.ucode[i];
 		if (ucode->fw) {
 			header = (const struct common_firmware_header *)ucode->fw->data;
-			amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + fw_offset,
-						    (void *)((uint8_t *)fw_buf_ptr + fw_offset));
+			amdgpu_ucode_init_single_fw(adev, ucode, adev->firmware.fw_buf_mc + fw_offset,
+						    adev->firmware.fw_buf_ptr + fw_offset);
 			if (i == AMDGPU_UCODE_ID_CP_MEC1 &&
 			    adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
 				const struct gfx_firmware_header_v1_0 *cp_hdr;
 				cp_hdr = (const struct gfx_firmware_header_v1_0 *)ucode->fw->data;
-				amdgpu_ucode_patch_jt(ucode, fw_mc_addr + fw_offset,
-						    fw_buf_ptr + fw_offset);
+				amdgpu_ucode_patch_jt(ucode,  adev->firmware.fw_buf_mc + fw_offset,
+						    adev->firmware.fw_buf_ptr + fw_offset);
 				fw_offset += ALIGN(le32_to_cpu(cp_hdr->jt_size) << 2, PAGE_SIZE);
 			}
 			fw_offset += ALIGN(ucode->ucode_size, PAGE_SIZE);
-- 
2.7.4
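
The allocate-once pattern in this patch can be sketched outside the
kernel as follows. This is a simplified illustration, not the driver
code: struct fw_state and fw_init_buffer() are hypothetical names, and
malloc() stands in for the create/reserve/pin/kmap sequence on the
firmware buffer object.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct fw_state {
	void *buf;           /* kept in long-lived state so a reset can reuse it */
	size_t size;
	bool is_sriov_vf;
	bool in_sriov_reset;
};

/*
 * Allocate the firmware buffer unless this is an SRIOV VF that is
 * currently in GPU reset; on that path the buffer from the first
 * init is reused. The buffer is cleared in both cases.
 */
int fw_init_buffer(struct fw_state *fw)
{
	assert(fw != NULL && fw->size > 0);

	if (!fw->is_sriov_vf || !fw->in_sriov_reset) {
		fw->buf = malloc(fw->size);
		if (!fw->buf)
			return -ENOMEM;
	}

	/* A reset before any successful init would have nothing to reuse. */
	if (!fw->buf)
		return -EINVAL;

	memset(fw->buf, 0, fw->size);
	return 0;
}
```

The pointer has to live in per-device state rather than in a local
variable, which is why the patch adds fw_buf_ptr and fw_buf_mc to
struct amdgpu_firmware.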


* [PATCH 07/18] drm/amdgpu:add hdp golden setting register name hint
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Change-Id: I3a43901f5757b9fab629824a74ad9a4770a47b38
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7ca9cbe..7a20ba8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -59,16 +59,16 @@
 
 static const u32 golden_settings_vega10_hdp[] =
 {
-	0xf64, 0x0fffffff, 0x00000000,
-	0xf65, 0x0fffffff, 0x00000000,
-	0xf66, 0x0fffffff, 0x00000000,
-	0xf67, 0x0fffffff, 0x00000000,
-	0xf68, 0x0fffffff, 0x00000000,
-	0xf6a, 0x0fffffff, 0x00000000,
-	0xf6b, 0x0fffffff, 0x00000000,
-	0xf6c, 0x0fffffff, 0x00000000,
-	0xf6d, 0x0fffffff, 0x00000000,
-	0xf6e, 0x0fffffff, 0x00000000,
+	0xf64, 0x0fffffff, 0x00000000,//surface0_low_bound
+	0xf65, 0x0fffffff, 0x00000000,//surface0_upper_bound
+	0xf66, 0x0fffffff, 0x00000000,//surface0_base
+	0xf67, 0x0fffffff, 0x00000000,//surface0_info
+	0xf68, 0x0fffffff, 0x00000000,//surface0_base_hi
+	0xf6a, 0x0fffffff, 0x00000000,//surface1_low_bound
+	0xf6b, 0x0fffffff, 0x00000000,//surface1_upper_bound
+	0xf6c, 0x0fffffff, 0x00000000,//surface1_base
+	0xf6d, 0x0fffffff, 0x00000000,//surface1_info
+	0xf6e, 0x0fffffff, 0x00000000,//surface1_base_hi
 };
 
 static int gmc_v9_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
-- 
2.7.4


* [PATCH 08/18] drm/amdgpu:halt when vm fault
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Only this way can we debug the VMC page fault issue.

Change-Id: Ifc8373c3c3c40d54ae94dedf1be74d6314faeb10
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 6c8040e..c17996e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -319,6 +319,12 @@ void gfxhub_v1_0_set_fault_enable_default(struct amdgpu_device *adev,
 			WRITE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
 	tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
 			EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
+	if (!value) {
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_NO_RETRY_FAULT, 1);
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_RETRY_FAULT, 1);
+	}
 	WREG32_SOC15(GC, 0, mmVM_L2_PROTECTION_FAULT_CNTL, tmp);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 7ff7076..cc21c4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -561,6 +561,13 @@ void mmhub_v1_0_set_fault_enable_default(struct amdgpu_device *adev, bool value)
 			WRITE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
 	tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
 			EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
+	if (!value) {
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_NO_RETRY_FAULT, 1);
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_RETRY_FAULT, 1);
+	}
+
 	WREG32_SOC15(MMHUB, 0, mmVM_L2_PROTECTION_FAULT_CNTL, tmp);
 }
 
-- 
2.7.4


* [PATCH 09/18] drm/amdgpu:insert TMZ_BEGIN
From: Monk Liu @ 2017-09-18  6:11 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

FRAME_CONTROL(begin) is needed on vega10 because of a ucode logic change;
it fixes some random CTS failures when gfx preemption is enabled.

Change-Id: I0442337f6cde13ed2a33f033badcb522e0f35e2d
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 21838f4..3306667 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3764,6 +3764,12 @@ static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring)
 	amdgpu_ring_write_multiple(ring, (void *)&de_payload, sizeof(de_payload) >> 2);
 }
 
+static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
+{
+	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
+	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
+}
+
 static void gfx_v9_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
 {
 	uint32_t dw2 = 0;
@@ -3771,6 +3777,8 @@ static void gfx_v9_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
 	if (amdgpu_sriov_vf(ring->adev))
 		gfx_v9_0_ring_emit_ce_meta(ring);
 
+	gfx_v9_0_ring_emit_tmz(ring, true);
+
 	dw2 |= 0x80000000; /* set load_enable otherwise this package is just NOPs */
 	if (flags & AMDGPU_HAVE_CTX_SWITCH) {
 		/* set load_global_config & load_global_uconfig */
@@ -3821,12 +3829,6 @@ static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
 		ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
 }
 
-static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
-{
-	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
-	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
-}
-
 static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
 {
 	struct amdgpu_device *adev = ring->adev;
-- 
2.7.4
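
For readers following along, here is a minimal sketch of what the two
dwords emitted by gfx_v9_0_ring_emit_tmz() look like. The macro values
below are assumptions modeled on the PM4 type-3 packet layout and should
be checked against the driver headers; toy_ring and its helpers are
hypothetical stand-ins, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings; verify against the driver headers before relying
 * on the exact values. */
#define PACKET_TYPE3		3u
#define PACKET3(op, n)		((PACKET_TYPE3 << 30) | \
				 (((uint32_t)(op) & 0xFF) << 8) | \
				 (((uint32_t)(n) & 0x3FFF) << 16))
#define PACKET3_FRAME_CONTROL	0x90u			/* assumed opcode */
#define FRAME_CMD(x)		((uint32_t)(x) << 28)	/* assumed: 0 = begin, 1 = end */

/* Hypothetical stand-in for a ring: a flat dword buffer. */
struct toy_ring {
	uint32_t buf[16];
	unsigned int wptr;
};

static void toy_ring_write(struct toy_ring *ring, uint32_t v)
{
	assert(ring->wptr < 16);
	ring->buf[ring->wptr++] = v;
}

/* Same shape as gfx_v9_0_ring_emit_tmz(): packet header, then command. */
static void toy_emit_frame_control(struct toy_ring *ring, int start)
{
	toy_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
	toy_ring_write(ring, FRAME_CMD(start ? 0 : 1));
}
```

Calling toy_emit_frame_control(&ring, 1) corresponds to the "begin" frame
that this patch now emits at the start of emit_cntxcntl.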

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* [PATCH 10/18] drm/amdgpu:hdp flush should be put it initialized
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-09-18  6:11   ` [PATCH 09/18] drm/amdgpu:insert TMZ_BEGIN Monk Liu
@ 2017-09-18  6:11   ` Monk Liu
       [not found]     ` <1505715122-23904-11-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  6:11   ` [PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9 Monk Liu
                     ` (7 subsequent siblings)
  17 siblings, 1 reply; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:11 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Change-Id: I635271ba4c89189017daa302a7fe5cd65c3eef06
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7a20ba8..3d035a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -696,12 +696,6 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
 	if (r)
 		return r;
 
-	/* After HDP is initialized, flush HDP.*/
-	if (adev->flags & AMD_IS_APU)
-		nbio_v7_0_hdp_flush(adev);
-	else
-		nbio_v6_1_hdp_flush(adev);
-
 	switch (adev->asic_type) {
 	case CHIP_RAVEN:
 		mmhub_v1_0_initialize_power_gating(adev);
@@ -724,6 +718,12 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
 	tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL);
 	WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp);
 
+	/* After HDP is initialized, flush HDP.*/
+	if (adev->flags & AMD_IS_APU)
+		nbio_v7_0_hdp_flush(adev);
+	else
+		nbio_v6_1_hdp_flush(adev);
+
 	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_ALWAYS)
 		value = false;
 	else
-- 
2.7.4


* [PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-09-18  6:11   ` [PATCH 10/18] drm/amdgpu:hdp flush should be put it initialized Monk Liu
@ 2017-09-18  6:11   ` Monk Liu
       [not found]     ` <1505715122-23904-12-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  6:11   ` [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate Monk Liu
                     ` (6 subsequent siblings)
  17 siblings, 1 reply; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:11 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Change-Id: I584572cfb9145ee1b8d11d69ba2989bd6acfd706
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 3306667..f201510 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3499,6 +3499,17 @@ static void gfx_v9_0_ring_set_wptr_gfx(struct amdgpu_ring *ring)
 	}
 }
 
+static void gfx_v9_0_ring_emit_vgt_flush(struct amdgpu_ring *ring)
+{
+	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
+	amdgpu_ring_write(ring, EVENT_TYPE(VS_PARTIAL_FLUSH) |
+		EVENT_INDEX(4));
+
+	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
+	amdgpu_ring_write(ring, EVENT_TYPE(VGT_FLUSH) |
+		EVENT_INDEX(0));
+}
+
 static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 {
 	u32 ref_and_mask, reg_mem_engine;
@@ -3530,6 +3541,9 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 			      nbio_hf_reg->hdp_flush_req_offset,
 			      nbio_hf_reg->hdp_flush_done_offset,
 			      ref_and_mask, ref_and_mask, 0x20);
+
+	if (ring->funcs->type == AMDGPU_RING_TYPE_GFX)
+		gfx_v9_0_ring_emit_vgt_flush(ring);
 }
 
 static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
-- 
2.7.4


* [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2017-09-18  6:11   ` [PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9 Monk Liu
@ 2017-09-18  6:11   ` Monk Liu
       [not found]     ` <1505715122-23904-13-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  6:11   ` [PATCH 13/18] drm/amdgpu:fix driver unloading bug Monk Liu
                     ` (5 subsequent siblings)
  17 siblings, 1 reply; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:11 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index f201510..44960b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
 {
 	gfx_v9_0_write_data_to_reg(ring, 0, true,
-				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
+				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
 }
 
 static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index fd7c72a..d5f3848 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
 {
 	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
 			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
-	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
+	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
 	amdgpu_ring_write(ring, 1);
 }
 
-- 
2.7.4


* [PATCH 13/18] drm/amdgpu:fix driver unloading bug
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (11 preceding siblings ...)
  2017-09-18  6:11   ` [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate Monk Liu
@ 2017-09-18  6:11   ` Monk Liu
       [not found]     ` <1505715122-23904-14-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  6:11   ` [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV Monk Liu
                     ` (4 subsequent siblings)
  17 siblings, 1 reply; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:11 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Horace Chen, Monk Liu

[SWDEV-126631] - fix the hypervisor save_vf failure that occurred
after the driver was removed:
1. Because the KIQ and KCQ were not unmapped, save_vf fails if the driver has freed the mqd of the KIQ and KCQ.
2. The KIQ can't be unmapped since RLCV always needs it, so the bo_free on the KIQ should be skipped.
3. The KCQ can be unmapped, and should be unmapped during hw_fini.
4. RLCV still needs to access other mc addresses from some hw even after the driver is unloaded,
   so we should not unbind gart for a VF.

Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
Signed-off-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++++++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index f437008..2fee071 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
  */
 void amdgpu_gart_fini(struct amdgpu_device *adev)
 {
-	if (adev->gart.ready) {
+	/* gart is still used by other hw under SRIOV, don't unbind it */
+	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
 		/* unbind pages */
 		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 4f6c68f..bf6656f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
 				      &ring->mqd_ptr);
 	}
 
+	/* don't deallocate the KIQ mqd because the bo is still used by RLCV
+	 * even after the guest VM is shut down */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
 	ring = &adev->gfx.kiq.ring;
 	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
 	amdgpu_bo_free_kernel(&ring->mqd_obj,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 44960b3..a577bbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
 	return r;
 }
 
+static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring,struct amdgpu_ring *ring)
+{
+	struct amdgpu_device *adev = kiq_ring->adev;
+	uint32_t scratch, tmp = 0;
+	int r, i;
+
+	r = amdgpu_gfx_scratch_get(adev, &scratch);
+	if (r) {
+		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
+		return r;
+	}
+	WREG32(scratch, 0xCAFEDEAD);
+
+	r = amdgpu_ring_alloc(kiq_ring, 10);
+	if (r) {
+		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
+		amdgpu_gfx_scratch_free(adev, scratch);
+		return r;
+	}
+
+	/* unmap queues */
+	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
+	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
+						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
+						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
+						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
+						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
+	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
+	amdgpu_ring_write(kiq_ring, 0);
+	amdgpu_ring_write(kiq_ring, 0);
+	amdgpu_ring_write(kiq_ring, 0);
+	/* write to scratch for completion */
+	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
+	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
+	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
+	amdgpu_ring_commit(kiq_ring);
+
+	for (i = 0; i < adev->usec_timeout; i++) {
+		tmp = RREG32(scratch);
+		if (tmp == 0xDEADBEEF)
+			break;
+		DRM_UDELAY(1);
+	}
+	if (i >= adev->usec_timeout) {
+		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
+		r = -EINVAL;
+	}
+	amdgpu_gfx_scratch_free(adev, scratch);
+	return r;
+}
+
+
 static int gfx_v9_0_hw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+	int i, r;
 
 	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
 	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
 	if (amdgpu_sriov_vf(adev)) {
-		pr_debug("For SRIOV client, shouldn't do anything.\n");
+		/* disable KCQ so that CPC does not touch memory that is no longer valid */
+		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
+			r = gfx_v9_0_kcq_disable(&adev->gfx.kiq.ring, &adev->gfx.compute_ring[i]);
+			if (r)
+				return r;
+		}
 		return 0;
 	}
 	gfx_v9_0_cp_enable(adev, false);
-- 
2.7.4
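
The completion check in gfx_v9_0_kcq_disable() is a scratch-register
handshake: seed the register with one marker, ask the queue to overwrite
it with another, then poll with a bounded budget. A minimal sketch of
just that polling step follows. fake_dev and its read helper are
hypothetical stand-ins for the hardware, and the poll count plays the
role of adev->usec_timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SCRATCH_INIT	0xCAFEDEADu
#define SCRATCH_DONE	0xDEADBEEFu

/* Hypothetical fake device: the "register" flips to SCRATCH_DONE after
 * a fixed number of reads, standing in for the KIQ finishing its work. */
struct fake_dev {
	uint32_t scratch;
	int reads_until_done;	/* negative: never completes */
};

static uint32_t fake_read_scratch(struct fake_dev *dev)
{
	if (dev->reads_until_done == 0)
		dev->scratch = SCRATCH_DONE;
	else if (dev->reads_until_done > 0)
		dev->reads_until_done--;
	return dev->scratch;
}

/* Poll until the completion marker shows up or the budget runs out. */
static int wait_for_scratch(struct fake_dev *dev, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		if (fake_read_scratch(dev) == SCRATCH_DONE)
			return 0;
		/* a real driver would delay here, e.g. 1 us per iteration */
	}
	return -EINVAL;
}
```

Returning an error on timeout rather than spinning forever is what lets
hw_fini bail out of the per-ring loop when one KCQ fails to unmap.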


* [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (12 preceding siblings ...)
  2017-09-18  6:11   ` [PATCH 13/18] drm/amdgpu:fix driver unloading bug Monk Liu
@ 2017-09-18  6:11   ` Monk Liu
       [not found]     ` <1505715122-23904-15-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  6:11   ` [PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload Monk Liu
                     ` (3 subsequent siblings)
  17 siblings, 1 reply; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:11 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Horace Chen, Monk Liu

From: Horace Chen <horace.chen@amd.com>

The kernel sets the PCI power state to UNKNOWN after the driver is
unloaded. Since SRIOV has a faked PCI config space, the UNKNOWN state
is kept forever.

On driver reload, enabling MSI fails if the power state is UNKNOWN.

Forcibly set it to D0 for SRIOV to work around this kernel flaw.

Change-Id: I6a72d5fc9b653b21c3c98167515a511c5edeb91c
Signed-off-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 914c5bf..345406a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -229,7 +229,15 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	adev->irq.msi_enabled = false;
 
 	if (amdgpu_msi_ok(adev)) {
-		int ret = pci_enable_msi(adev->pdev);
+		int ret;
+		if (amdgpu_sriov_vf(adev) &&
+		    adev->pdev->current_state == PCI_UNKNOWN){
+			/* If the pci power state is unknown on the SRIOV platform,
+			 * it may have been set when the device was removed. We need
+			 * to force it to D0 to enable the msi */
+			adev->pdev->current_state = PCI_D0;
+		}
+		ret = pci_enable_msi(adev->pdev);
 		if (!ret) {
 			adev->irq.msi_enabled = true;
 			dev_info(adev->dev, "amdgpu: using MSI.\n");
-- 
2.7.4
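
A toy model of the reload problem may help here. It assumes, as the
commit message implies, that MSI setup is refused unless the cached
power state is D0. All toy_* names are hypothetical and none of this is
the kernel's PCI API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the states involved; not the kernel's types. */
enum toy_pci_state { TOY_PCI_D0, TOY_PCI_D3HOT, TOY_PCI_UNKNOWN };

struct toy_pdev {
	enum toy_pci_state current_state;
	bool is_sriov_vf;
	bool msi_enabled;
};

/* Assumption: MSI setup is refused unless the device is tracked as D0. */
static int toy_enable_msi(struct toy_pdev *pdev)
{
	if (pdev->current_state != TOY_PCI_D0)
		return -EINVAL;
	pdev->msi_enabled = true;
	return 0;
}

/* Shape of the patched init path: fix up the cached state, then enable. */
static int toy_irq_init(struct toy_pdev *pdev)
{
	if (pdev->is_sriov_vf && pdev->current_state == TOY_PCI_UNKNOWN)
		pdev->current_state = TOY_PCI_D0;
	return toy_enable_msi(pdev);
}
```

The fixup is deliberately limited to the VF case, so bare-metal devices
keep whatever state the PCI core tracks for them.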


* [PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (13 preceding siblings ...)
  2017-09-18  6:11   ` [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV Monk Liu
@ 2017-09-18  6:11   ` Monk Liu
       [not found]     ` <1505715122-23904-16-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  6:12   ` [PATCH 16/18] drm/amdgpu: increate mailbox polling timeout to 12s Monk Liu
                     ` (2 subsequent siblings)
  17 siblings, 1 reply; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:11 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Horace Chen, Monk Liu

Freeing the csa bo in amdgpu_fini is too late because ttm is already
finished by that time.
Move the bo_free earlier to avoid the page fault.

Change-Id: Id9c3f6aa8720cabbc9936ce21d8cf98af6e23bee
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Horace Chen <horace.chen@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 1 +
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 298a241..e0a17bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1795,10 +1795,8 @@ static int amdgpu_fini(struct amdgpu_device *adev)
 		adev->ip_blocks[i].status.late_initialized = false;
 	}
 
-	if (amdgpu_sriov_vf(adev)) {
-		amdgpu_bo_free_kernel(&adev->virt.csa_obj, &adev->virt.csa_vmid0_addr, NULL);
+	if (amdgpu_sriov_vf(adev))
 		amdgpu_virt_release_full_gpu(adev, false);
-	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 3f511a9..40e5865 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -2113,6 +2113,7 @@ static int gfx_v8_0_sw_fini(void *handle)
 	amdgpu_gfx_compute_mqd_sw_fini(adev);
 	amdgpu_gfx_kiq_free_ring(&adev->gfx.kiq.ring, &adev->gfx.kiq.irq);
 	amdgpu_gfx_kiq_fini(adev);
+	amdgpu_bo_free_kernel(&adev->virt.csa_obj, &adev->virt.csa_vmid0_addr, NULL);
 
 	gfx_v8_0_mec_fini(adev);
 	gfx_v8_0_rlc_fini(adev);
-- 
2.7.4


* [PATCH 16/18] drm/amdgpu: increate mailbox polling timeout to 12s.
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (14 preceding siblings ...)
  2017-09-18  6:11   ` [PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload Monk Liu
@ 2017-09-18  6:12   ` Monk Liu
       [not found]     ` <1505715122-23904-17-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  6:12   ` [PATCH 17/18] drm/amdgpu:fix uvd ring fini routine Monk Liu
  2017-09-18  6:12   ` [PATCH 18/18] drm/amdgpu/sriov:init csb for gfxv9 Monk Liu
  17 siblings, 1 reply; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:12 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Horace Chen

From: Horace Chen <horace.chen@amd.com>

Because multiple FLRs may be waiting to complete, the wait for an
event can be long. Raise the timeout to 12s to reduce timeout
failures.

Change-Id: I6b33170ba7dedf781b99ba6095127efce403af81
Signed-off-by: Horace Chen <horace.chen@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h | 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
index 1e91b9a..67e7857 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
@@ -24,7 +24,7 @@
 #ifndef __MXGPU_AI_H__
 #define __MXGPU_AI_H__
 
-#define AI_MAILBOX_TIMEDOUT	5000
+#define AI_MAILBOX_TIMEDOUT	12000
 
 enum idh_request {
 	IDH_REQ_GPU_INIT_ACCESS = 1,
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
index c791d73..f13dc6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
@@ -23,7 +23,7 @@
 #ifndef __MXGPU_VI_H__
 #define __MXGPU_VI_H__
 
-#define VI_MAILBOX_TIMEDOUT	5000
+#define VI_MAILBOX_TIMEDOUT	12000
 #define VI_MAILBOX_RESET_TIME	12
 
 /* VI mailbox messages request */
-- 
2.7.4


* [PATCH 17/18] drm/amdgpu:fix uvd ring fini routine
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (15 preceding siblings ...)
  2017-09-18  6:12   ` [PATCH 16/18] drm/amdgpu: increate mailbox polling timeout to 12s Monk Liu
@ 2017-09-18  6:12   ` Monk Liu
       [not found]     ` <1505715122-23904-18-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  6:12   ` [PATCH 18/18] drm/amdgpu/sriov:init csb for gfxv9 Monk Liu
  17 siblings, 1 reply; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:12 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Fix the missing fini of the uvd enc rings, and avoid finishing a uvd
ring that was never initialized.

Change-Id: Ib74237ca5adcb3b128c9b751fced0b7db7b09e86
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 331e34a..63b00eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -269,6 +269,8 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 
 int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
 {
+	struct amdgpu_ring *ring;
+	int i;
 	kfree(adev->uvd.saved_bo);
 
 	amd_sched_entity_fini(&adev->uvd.ring.sched, &adev->uvd.entity);
@@ -277,7 +279,15 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
 			      &adev->uvd.gpu_addr,
 			      (void **)&adev->uvd.cpu_addr);
 
-	amdgpu_ring_fini(&adev->uvd.ring);
+	ring = &adev->uvd.ring;
+	if (ring->adev)
+		amdgpu_ring_fini(ring);
+
+	for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i) {
+		ring = &adev->uvd.ring_enc[i];
+		if (ring->adev)
+			amdgpu_ring_fini(ring);
+	}
 
 	release_firmware(adev->uvd.fw);
 
-- 
2.7.4
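
The guarded teardown can be sketched in isolation as below. The toy_*
types are hypothetical; the owner pointer plays the role of ring->adev,
which the patch uses as the "was this ring ever initialized" marker.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ENC_RINGS 2	/* hypothetical; stands in for AMDGPU_MAX_UVD_ENC_RINGS */

struct toy_ring {
	void *owner;	/* non-NULL only once the ring was initialized */
	int fini_calls;
};

struct toy_uvd {
	struct toy_ring ring;
	struct toy_ring ring_enc[TOY_MAX_ENC_RINGS];
};

static void toy_ring_fini(struct toy_ring *ring)
{
	ring->owner = NULL;
	ring->fini_calls++;
}

/* Tear down every ring that was actually set up, and only those.
 * Returns the number of rings finished. */
static int toy_uvd_sw_fini(struct toy_uvd *uvd)
{
	int i, torn_down = 0;

	if (uvd->ring.owner) {
		toy_ring_fini(&uvd->ring);
		torn_down++;
	}
	for (i = 0; i < TOY_MAX_ENC_RINGS; ++i) {
		if (uvd->ring_enc[i].owner) {
			toy_ring_fini(&uvd->ring_enc[i]);
			torn_down++;
		}
	}
	return torn_down;
}
```

Checking the marker per ring also makes the routine safe to call twice,
since a finished ring looks the same as one that was never set up.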


* [PATCH 18/18] drm/amdgpu/sriov:init csb for gfxv9
       [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
                     ` (16 preceding siblings ...)
  2017-09-18  6:12   ` [PATCH 17/18] drm/amdgpu:fix uvd ring fini routine Monk Liu
@ 2017-09-18  6:12   ` Monk Liu
  17 siblings, 0 replies; 61+ messages in thread
From: Monk Liu @ 2017-09-18  6:12 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

RLC needs the CSB registers initialized under SRIOV during world switch;
otherwise the clear state buffer behavior will not be restored to the
current VF's scheme after switching back.

Change-Id: I3afd82875564c233060b740724bd8031095780f6
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index a577bbc..8d677cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2044,8 +2044,10 @@ static int gfx_v9_0_rlc_resume(struct amdgpu_device *adev)
 {
 	int r;
 
-	if (amdgpu_sriov_vf(adev))
+	if (amdgpu_sriov_vf(adev)) {
+		gfx_v9_0_init_csb(adev);
 		return 0;
+	}
 
 	gfx_v9_0_rlc_stop(adev);
 
-- 
2.7.4


* Re: [PATCH 01/18] drm/amdgpu/sriov:fix missing error handling
       [not found]     ` <1505715122-23904-2-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:04       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:04 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> Change-Id: Ifc6942ed0221f3134bfba4d66fde743484191da3
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index e390c01..d1ac27d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -841,8 +841,11 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
>   
>   	if (amdgpu_sriov_vf(adev)) {
>   		r = amdgpu_map_static_csa(adev, &fpriv->vm, &fpriv->csa_va);
> -		if (r)
> +		if (r) {
> +			amdgpu_vm_fini(adev, &fpriv->vm);
> +			kfree(fpriv);
>   			goto out_suspend;
> +		}
>   	}
>   
>   	mutex_init(&fpriv->bo_list_lock);



* Re: [PATCH 02/18] drm/amdgpu:no kiq in IH
       [not found]     ` <1505715122-23904-3-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:05       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:05 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> Change-Id: I4deb65675d2531236b2f4e2bc6f015c657546464
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> index 67610f7..c291e33 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> @@ -219,9 +219,9 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev)
>   			wptr, adev->irq.ih.rptr, tmp);
>   		adev->irq.ih.rptr = tmp;
>   
> -		tmp = RREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
> +		tmp = RREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
>   		tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
> -		WREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
> +		WREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
>   	}
>   	return (wptr & adev->irq.ih.ptr_mask);
>   }



* Re: [PATCH 03/18] drm/amdgpu/sriov:move in_reset to adev and rename
       [not found]     ` <1505715122-23904-4-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:05       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:05 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> Currently in_reset is only used in sriov gpu reset, and it
> will later be used for other non-gfx hw components such as
> PSP, so moving it from gfx to adev and renaming it to
> in_sriov_reset makes more sense.
>
> Change-Id: Ibb8546f6e4635a1cca740e57f6244f158c70a1e6
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 6 +++---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 6 +++---
>   4 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index a34c4cb..cc9a232 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1019,7 +1019,6 @@ struct amdgpu_gfx {
>   	/* reset mask */
>   	uint32_t                        grbm_soft_reset;
>   	uint32_t                        srbm_soft_reset;
> -	bool                            in_reset;
>   	/* s3/s4 mask */
>   	bool                            in_suspend;
>   	/* NGG */
> @@ -1588,6 +1587,7 @@ struct amdgpu_device {
>   
>   	/* record last mm index being written through WREG32*/
>   	unsigned long last_mm_index;
> +	bool                            in_sriov_reset;
>   };
>   
>   static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 3467179..298a241 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2757,7 +2757,7 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job)
>   
>   	mutex_lock(&adev->virt.lock_reset);
>   	atomic_inc(&adev->gpu_reset_counter);
> -	adev->gfx.in_reset = true;
> +	adev->in_sriov_reset = true;
>   
>   	/* block TTM */
>   	resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> @@ -2868,7 +2868,7 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job)
>   		dev_info(adev->dev, "GPU reset successed!\n");
>   	}
>   
> -	adev->gfx.in_reset = false;
> +	adev->in_sriov_reset = false;
>   	mutex_unlock(&adev->virt.lock_reset);
>   	return r;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 6ee348e..3f511a9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4810,7 +4810,7 @@ static int gfx_v8_0_kiq_init_queue(struct amdgpu_ring *ring)
>   
>   	gfx_v8_0_kiq_setting(ring);
>   
> -	if (adev->gfx.in_reset) { /* for GPU_RESET case */
> +	if (adev->in_sriov_reset) { /* for GPU_RESET case */
>   		/* reset MQD to a clean status */
>   		if (adev->gfx.mec.mqd_backup[mqd_idx])
>   			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
> @@ -4847,7 +4847,7 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
>   	struct vi_mqd *mqd = ring->mqd_ptr;
>   	int mqd_idx = ring - &adev->gfx.compute_ring[0];
>   
> -	if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
> +	if (!adev->in_sriov_reset && !adev->gfx.in_suspend) {
>   		memset((void *)mqd, 0, sizeof(struct vi_mqd_allocation));
>   		((struct vi_mqd_allocation *)mqd)->dynamic_cu_mask = 0xFFFFFFFF;
>   		((struct vi_mqd_allocation *)mqd)->dynamic_rb_mask = 0xFFFFFFFF;
> @@ -4859,7 +4859,7 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
>   
>   		if (adev->gfx.mec.mqd_backup[mqd_idx])
>   			memcpy(adev->gfx.mec.mqd_backup[mqd_idx], mqd, sizeof(struct vi_mqd_allocation));
> -	} else if (adev->gfx.in_reset) { /* for GPU_RESET case */
> +	} else if (adev->in_sriov_reset) { /* for GPU_RESET case */
>   		/* reset MQD to a clean status */
>   		if (adev->gfx.mec.mqd_backup[mqd_idx])
>   			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index c133c85..21838f4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -2698,7 +2698,7 @@ static int gfx_v9_0_kiq_init_queue(struct amdgpu_ring *ring)
>   
>   	gfx_v9_0_kiq_setting(ring);
>   
> -	if (adev->gfx.in_reset) { /* for GPU_RESET case */
> +	if (adev->in_sriov_reset) { /* for GPU_RESET case */
>   		/* reset MQD to a clean status */
>   		if (adev->gfx.mec.mqd_backup[mqd_idx])
>   			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct v9_mqd_allocation));
> @@ -2736,7 +2736,7 @@ static int gfx_v9_0_kcq_init_queue(struct amdgpu_ring *ring)
>   	struct v9_mqd *mqd = ring->mqd_ptr;
>   	int mqd_idx = ring - &adev->gfx.compute_ring[0];
>   
> -	if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
> +	if (!adev->in_sriov_reset && !adev->gfx.in_suspend) {
>   		memset((void *)mqd, 0, sizeof(struct v9_mqd_allocation));
>   		((struct v9_mqd_allocation *)mqd)->dynamic_cu_mask = 0xFFFFFFFF;
>   		((struct v9_mqd_allocation *)mqd)->dynamic_rb_mask = 0xFFFFFFFF;
> @@ -2748,7 +2748,7 @@ static int gfx_v9_0_kcq_init_queue(struct amdgpu_ring *ring)
>   
>   		if (adev->gfx.mec.mqd_backup[mqd_idx])
>   			memcpy(adev->gfx.mec.mqd_backup[mqd_idx], mqd, sizeof(struct v9_mqd_allocation));
> -	} else if (adev->gfx.in_reset) { /* for GPU_RESET case */
> +	} else if (adev->in_sriov_reset) { /* for GPU_RESET case */
>   		/* reset MQD to a clean status */
>   		if (adev->gfx.mec.mqd_backup[mqd_idx])
>   			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct v9_mqd_allocation));


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset
       [not found]     ` <1505715122-23904-5-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:06       ` Christian König
       [not found]         ` <2cd93ffd-91a6-77c6-b07c-c68188a340a5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18  9:06 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> At least for SRIOV, we found that reloading the PSP fw during
> gpu reset causes a PSP hang.
>
> Change-Id: I5f273187a10bb8571b77651dfba7656ce0429af0
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 8a1ee97..4eee2ef 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -253,15 +253,18 @@ static int psp_asd_load(struct psp_context *psp)
>   
>   static int psp_hw_start(struct psp_context *psp)
>   {
> +	struct amdgpu_device *adev = psp->adev;
>   	int ret;
>   
> -	ret = psp_bootloader_load_sysdrv(psp);
> -	if (ret)
> -		return ret;
> +	if (amdgpu_sriov_vf(adev) && !adev->in_sriov_reset) {
> +		ret = psp_bootloader_load_sysdrv(psp);
> +		if (ret)
> +			return ret;
>   
> -	ret = psp_bootloader_load_sos(psp);
> -	if (ret)
> -		return ret;
> +		ret = psp_bootloader_load_sos(psp);
> +		if (ret)
> +			return ret;
> +	}
>   
>   	ret = psp_ring_create(psp, PSP_RING_TYPE__KM);
>   	if (ret)
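
For comparison, the guard in the hunk above can be checked against the stated intent (skip the bootloader steps only during an SRIOV gpu reset) by enumerating both flags. This is a standalone sketch, not driver code: the two bool parameters stand in for amdgpu_sriov_vf(adev) and adev->in_sriov_reset, and the function names are made up for illustration. As written, the posted guard also skips the load on bare metal, while the form used in patch 06 of this series (!vf || !reset) skips it only for a VF in reset.

```c
#include <stdbool.h>

/* Guard as posted in this patch: load only when running as a VF outside reset. */
static bool load_guard_posted(bool is_vf, bool in_sriov_reset)
{
	return is_vf && !in_sriov_reset;
}

/* Guard in the style of patch 06: skip only when a VF is in reset. */
static bool load_guard_skip_only_in_vf_reset(bool is_vf, bool in_sriov_reset)
{
	return !is_vf || !in_sriov_reset;
}
```

Which of the two matches the hardware requirement is for the patch author to confirm; the sketch only makes the difference between the predicates explicit.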


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/18] drm/amdgpu:make ctx_add_fence interruptible
       [not found]     ` <1505715122-23904-6-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:10       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:10 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> otherwise a gpu hang will leave the application unkillable
>
> Change-Id: I6051b5b3ae1188983f49325a2438c84a6c12374a
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 12 ++++++++++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 14 +++++++++-----
>   3 files changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index cc9a232..6ff2959 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -736,8 +736,8 @@ struct amdgpu_ctx_mgr {
>   struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id);
>   int amdgpu_ctx_put(struct amdgpu_ctx *ctx);
>   
> -uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
> -			      struct dma_fence *fence);
> +int amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
> +			      struct dma_fence *fence, uint64_t *seq);
>   struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
>   				   struct amdgpu_ring *ring, uint64_t seq);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index b59749d..4ac7a92 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1043,6 +1043,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   	struct amd_sched_entity *entity = &p->ctx->rings[ring->idx].entity;
>   	struct amdgpu_job *job;
>   	unsigned i;
> +	uint64_t seq;
> +
>   	int r;
>   
>   	amdgpu_mn_lock(p->mn);
> @@ -1071,8 +1073,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>   	job->owner = p->filp;
>   	job->fence_ctx = entity->fence_context;
>   	p->fence = dma_fence_get(&job->base.s_fence->finished);
> -	cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
> -	job->uf_sequence = cs->out.handle;
> +	r = amdgpu_ctx_add_fence(p->ctx, ring, p->fence, &seq);
> +	if (r) {
> +		dma_fence_put(p->fence);
> +		return r;

This will leak the job, and you need to call amdgpu_mn_unlock() 
before returning.
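
A rough sketch of the unwind order being asked for, written as a standalone user-space program rather than driver code. Everything prefixed demo_ is hypothetical; the lock flag stands in for amdgpu_mn_lock()/amdgpu_mn_unlock() and the stubbed result stands in for the return value of amdgpu_ctx_add_fence().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the notifier lock and the job bookkeeping. */
struct demo_ctx {
	bool locked;
	int live_jobs;
	int add_fence_result;	/* what the stubbed add_fence step returns */
};

struct demo_job {
	struct demo_ctx *ctx;
};

static struct demo_job *demo_job_alloc(struct demo_ctx *ctx)
{
	struct demo_job *job = malloc(sizeof(*job));

	if (job) {
		job->ctx = ctx;
		ctx->live_jobs++;
	}
	return job;
}

static void demo_job_free(struct demo_job *job)
{
	job->ctx->live_jobs--;
	free(job);
}

/* Submit path: every early exit goes through the same unwind labels. */
static int demo_submit(struct demo_ctx *ctx)
{
	struct demo_job *job;
	int r;

	ctx->locked = true;		/* stands in for amdgpu_mn_lock() */

	job = demo_job_alloc(ctx);
	if (!job) {
		r = -ENOMEM;
		goto out_unlock;
	}

	r = ctx->add_fence_result;	/* stands in for amdgpu_ctx_add_fence() */
	if (r)
		goto out_free_job;

	/* Success: a real driver would hand the job to the scheduler here;
	 * the demo frees it so the counter stays balanced. */
	demo_job_free(job);
	ctx->locked = false;
	return 0;

out_free_job:
	demo_job_free(job);
out_unlock:
	ctx->locked = false;		/* stands in for amdgpu_mn_unlock() */
	return r;
}
```

The point is only that the error return releases what was taken before it, in reverse order; the real function has more resources to consider than this sketch models.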

> +	}
> +
> +	cs->out.handle = seq;
> +	job->uf_sequence = seq;
>   	amdgpu_job_free_resources(job);
>   
>   	trace_amdgpu_cs_ioctl(job);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index a11e443..97f8be4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -246,8 +246,8 @@ int amdgpu_ctx_put(struct amdgpu_ctx *ctx)
>   	return 0;
>   }
>   
> -uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
> -			      struct dma_fence *fence)
> +int amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
> +			      struct dma_fence *fence, uint64_t* handler)
>   {
>   	struct amdgpu_ctx_ring *cring = & ctx->rings[ring->idx];
>   	uint64_t seq = cring->sequence;
> @@ -258,9 +258,11 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
>   	other = cring->fences[idx];
>   	if (other) {
>   		signed long r;
> -		r = dma_fence_wait_timeout(other, false, MAX_SCHEDULE_TIMEOUT);
> -		if (r < 0)
> +		r = dma_fence_wait_timeout(other, true, MAX_SCHEDULE_TIMEOUT);
> +		if (r < 0) {
>   			DRM_ERROR("Error (%ld) waiting for fence!\n", r);

Drop the extra error message here. Receiving a signal is not something 
that should trigger an extra message in the logs.

> +			return -ERESTARTSYS;

And return the original error code here.

Apart from that looks good to me,
Christian.
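
The two comments above amount to something like the following standalone sketch. The demo_ names are hypothetical, the simulated wait result stands in for what dma_fence_wait_timeout() returns, and EINTR is used in the usage below only because ERESTARTSYS is not available in user space.

```c
#include <errno.h>

/* Hypothetical stand-in for an interruptible wait: returns the remaining
 * time (> 0) on success, or a negative errno if interrupted or failed. */
static long demo_wait_interruptible(long simulated_result)
{
	return simulated_result;
}

/* Returns 0 on success, or whatever negative code the wait reported. */
static int demo_add_fence(long simulated_wait_result,
			  unsigned long long *seq_out)
{
	static unsigned long long next_seq = 1;
	long r;

	r = demo_wait_interruptible(simulated_wait_result);
	if (r < 0)
		return (int)r;	/* pass the original code through, no log line */

	if (seq_out)
		*seq_out = next_seq;
	next_seq++;
	return 0;
}
```

Returning r unchanged lets the caller tell an interrupted wait apart from other failures, which a hard-coded error value would hide.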

> +		}
>   	}
>   
>   	dma_fence_get(fence);
> @@ -271,8 +273,10 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
>   	spin_unlock(&ctx->ring_lock);
>   
>   	dma_fence_put(other);
> +	if (handler)
> +		*handler = seq;
>   
> -	return seq;
> +	return 0;
>   }
>   
>   struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV
       [not found]     ` <1505715122-23904-15-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:10       ` Yu, Xiangliang
  2017-09-18  9:31       ` Christian König
  1 sibling, 0 replies; 61+ messages in thread
From: Yu, Xiangliang @ 2017-09-18  9:10 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chen, Horace, Liu, Monk

NAK. Tonga does not have this problem; please keep the patch in the internal branch for now.


-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Monk Liu
Sent: Monday, September 18, 2017 2:12 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chen, Horace <Horace.Chen@amd.com>; Liu, Monk <Monk.Liu@amd.com>
Subject: [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV

From: Horace Chen <horace.chen@amd.com>

The kernel sets the PCI power state to UNKNOWN after unloading. Since SRIOV has a faked PCI config space, the UNKNOWN state is kept forever.

On driver reload, if the power state is UNKNOWN, enabling MSI will fail.

Forcibly set it to D0 for SRIOV to work around this kernel flaw.

Change-Id: I6a72d5fc9b653b21c3c98167515a511c5edeb91c
Signed-off-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 914c5bf..345406a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -229,7 +229,15 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	adev->irq.msi_enabled = false;
 
 	if (amdgpu_msi_ok(adev)) {
-		int ret = pci_enable_msi(adev->pdev);
+		int ret;
+		if (amdgpu_sriov_vf(adev) &&
+		    adev->pdev->current_state == PCI_UNKNOWN){
+			/* If pci power state is unknown on the SRIOV platform,
+			 * it may be set in the remove device. We need to forcely
+			 * set it to D0 to enable the msi*/
+			adev->pdev->current_state = PCI_D0;
+		}
+		ret = pci_enable_msi(adev->pdev);
 		if (!ret) {
 			adev->irq.msi_enabled = true;
 			dev_info(adev->dev, "amdgpu: using MSI.\n");
--
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
       [not found]     ` <1505715122-23904-7-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:12       ` Christian König
       [not found]         ` <f96a1189-2fe3-6466-df1b-557f87319cb9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18  9:12 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> doing gpu reset will rerun all hw_init and thus
> ucode_init_bo is invoked again, so we need to skip
> the fw_buf allocation during sriov gpu reset to avoid
> memory leak.
>
> Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  3 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++++++++++++++----------------
>   2 files changed, 35 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 6ff2959..3d0c633 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1185,6 +1185,9 @@ struct amdgpu_firmware {
>   
>   	/* gpu info firmware data pointer */
>   	const struct firmware *gpu_info_fw;
> +
> +	void *fw_buf_ptr;
> +	uint64_t fw_buf_mc;
>   };
>   
>   /*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> index f306374..6564902 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
>   int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>   {
>   	struct amdgpu_bo **bo = &adev->firmware.fw_buf;
> -	uint64_t fw_mc_addr;
> -	void *fw_buf_ptr = NULL;
>   	uint64_t fw_offset = 0;
>   	int i, err;
>   	struct amdgpu_firmware_info *ucode = NULL;
> @@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>   		return 0;
>   	}
>   
> -	err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
> -				amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> -				AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
> -				NULL, NULL, 0, bo);
> -	if (err) {
> -		dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
> -		goto failed;
> -	}
> +	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {

Instead of all this, better to use amdgpu_bo_create_kernel(); it should 
already include most of the handling necessary here.

Christian.
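
The idea behind the suggestion can be sketched as a "create once, reuse on re-init" helper. This is a user-space illustration with hypothetical demo_ names, and it assumes the kernel helper only allocates when the pointer it is given is still NULL; that behaviour is an assumption about amdgpu_bo_create_kernel(), not something this thread confirms.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical firmware-buffer state that survives a reset. */
struct demo_fw {
	void *buf;		/* stays valid once allocated */
	size_t size;
	int allocations;	/* counts real allocations, for illustration */
};

/* Allocate only if no buffer exists yet; safe to call again after a reset. */
static int demo_fw_buf_get(struct demo_fw *fw)
{
	if (!fw->buf) {
		fw->buf = malloc(fw->size);
		if (!fw->buf)
			return -ENOMEM;
		fw->allocations++;
	}
	memset(fw->buf, 0, fw->size);	/* contents are rebuilt on every init */
	return 0;
}

static void demo_fw_buf_put(struct demo_fw *fw)
{
	free(fw->buf);
	fw->buf = NULL;
}
```

With a helper of this shape the caller needs no in_sriov_reset check around the allocation, because a second call simply reuses the existing buffer.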

> +		err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
> +					amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> +					AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
> +					NULL, NULL, 0, bo);
> +		if (err) {
> +			dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
> +			goto failed;
> +		}
>   
> -	err = amdgpu_bo_reserve(*bo, false);
> -	if (err) {
> -		dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
> -		goto failed_reserve;
> -	}
> +		err = amdgpu_bo_reserve(*bo, false);
> +		if (err) {
> +			dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
> +			goto failed_reserve;
> +		}
>   
> -	err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> -				&fw_mc_addr);
> -	if (err) {
> -		dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
> -		goto failed_pin;
> -	}
> +		err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> +					&adev->firmware.fw_buf_mc);
> +		if (err) {
> +			dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
> +			goto failed_pin;
> +		}
>   
> -	err = amdgpu_bo_kmap(*bo, &fw_buf_ptr);
> -	if (err) {
> -		dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
> -		goto failed_kmap;
> -	}
> +		err = amdgpu_bo_kmap(*bo, &adev->firmware.fw_buf_ptr);
> +		if (err) {
> +			dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
> +			goto failed_kmap;
> +		}
>   
> -	amdgpu_bo_unreserve(*bo);
> +		amdgpu_bo_unreserve(*bo);
> +	}
>   
> -	memset(fw_buf_ptr, 0, adev->firmware.fw_size);
> +	memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size);
>   
>   	/*
>   	 * if SMU loaded firmware, it needn't add SMC, UVD, and VCE
> @@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>   		ucode = &adev->firmware.ucode[i];
>   		if (ucode->fw) {
>   			header = (const struct common_firmware_header *)ucode->fw->data;
> -			amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + fw_offset,
> -						    (void *)((uint8_t *)fw_buf_ptr + fw_offset));
> +			amdgpu_ucode_init_single_fw(adev, ucode, adev->firmware.fw_buf_mc + fw_offset,
> +						    adev->firmware.fw_buf_ptr + fw_offset);
>   			if (i == AMDGPU_UCODE_ID_CP_MEC1 &&
>   			    adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
>   				const struct gfx_firmware_header_v1_0 *cp_hdr;
>   				cp_hdr = (const struct gfx_firmware_header_v1_0 *)ucode->fw->data;
> -				amdgpu_ucode_patch_jt(ucode, fw_mc_addr + fw_offset,
> -						    fw_buf_ptr + fw_offset);
> +				amdgpu_ucode_patch_jt(ucode,  adev->firmware.fw_buf_mc + fw_offset,
> +						    adev->firmware.fw_buf_ptr + fw_offset);
>   				fw_offset += ALIGN(le32_to_cpu(cp_hdr->jt_size) << 2, PAGE_SIZE);
>   			}
>   			fw_offset += ALIGN(ucode->ucode_size, PAGE_SIZE);


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/18] drm/amdgpu:add hdp golden setting register name hint
       [not found]     ` <1505715122-23904-8-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:13       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:13 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> Change-Id: I3a43901f5757b9fab629824a74ad9a4770a47b38
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 20 ++++++++++----------
>   1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 7ca9cbe..7a20ba8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -59,16 +59,16 @@
>   
>   static const u32 golden_settings_vega10_hdp[] =
>   {
> -	0xf64, 0x0fffffff, 0x00000000,
> -	0xf65, 0x0fffffff, 0x00000000,
> -	0xf66, 0x0fffffff, 0x00000000,
> -	0xf67, 0x0fffffff, 0x00000000,
> -	0xf68, 0x0fffffff, 0x00000000,
> -	0xf6a, 0x0fffffff, 0x00000000,
> -	0xf6b, 0x0fffffff, 0x00000000,
> -	0xf6c, 0x0fffffff, 0x00000000,
> -	0xf6d, 0x0fffffff, 0x00000000,
> -	0xf6e, 0x0fffffff, 0x00000000,
> +	0xf64, 0x0fffffff, 0x00000000,//surface0_low_bound
> +	0xf65, 0x0fffffff, 0x00000000,//surface0_upper_bound
> +	0xf66, 0x0fffffff, 0x00000000,//surface0_base
> +	0xf67, 0x0fffffff, 0x00000000,//surface0_info
> +	0xf68, 0x0fffffff, 0x00000000,//surface0_base_hi
> +	0xf6a, 0x0fffffff, 0x00000000,//surface1_low_bound
> +	0xf6b, 0x0fffffff, 0x00000000,//surface1_upper_bound
> +	0xf6c, 0x0fffffff, 0x00000000,//surface1_base
> +	0xf6d, 0x0fffffff, 0x00000000,//surface1_info
> +	0xf6e, 0x0fffffff, 0x00000000,//surface1_base_hi

Don't use "//" in kernel code.

Christian.
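
For illustration, the same table with block comments instead of "//". The register names are taken from the patch as posted; the u32 typedef is only a stand-in so the snippet compiles outside the kernel tree.

```c
#include <stdint.h>

typedef uint32_t u32;	/* stand-in for the kernel's u32 */

/* Each row is: register offset, mask, value. */
static const u32 golden_settings_vega10_hdp[] = {
	0xf64, 0x0fffffff, 0x00000000, /* surface0_low_bound */
	0xf65, 0x0fffffff, 0x00000000, /* surface0_upper_bound */
	0xf66, 0x0fffffff, 0x00000000, /* surface0_base */
	0xf67, 0x0fffffff, 0x00000000, /* surface0_info */
	0xf68, 0x0fffffff, 0x00000000, /* surface0_base_hi */
	0xf6a, 0x0fffffff, 0x00000000, /* surface1_low_bound */
	0xf6b, 0x0fffffff, 0x00000000, /* surface1_upper_bound */
	0xf6c, 0x0fffffff, 0x00000000, /* surface1_base */
	0xf6d, 0x0fffffff, 0x00000000, /* surface1_info */
	0xf6e, 0x0fffffff, 0x00000000, /* surface1_base_hi */
};
```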

>   };
>   
>   static int gmc_v9_0_vm_fault_interrupt_state(struct amdgpu_device *adev,


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 08/18] drm/amdgpu:halt when vm fault
       [not found]     ` <1505715122-23904-9-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:14       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:14 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> only this way can we debug the VMC page fault issue
>
> Change-Id: Ifc8373c3c3c40d54ae94dedf1be74d6314faeb10
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Please make this behavior depend on the vm_fault_stop module parameter 
just like it is handled on older generations.

Apart from that it looks like a really good idea to me,
Christian.
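
One plausible reading of this request, as a standalone sketch: set the crash bits only when the module parameter asks to always stop on a fault. The enum mirrors the AMDGPU_VM_FAULT_STOP_* naming but is a local stand-in, and the bit positions are invented for the example; the real fields live in VM_L2_PROTECTION_FAULT_CNTL.

```c
#include <stdint.h>

/* Local stand-in for the vm_fault_stop module parameter values. */
enum demo_fault_stop {
	DEMO_FAULT_STOP_NEVER,
	DEMO_FAULT_STOP_FIRST,
	DEMO_FAULT_STOP_ALWAYS,
};

/* Hypothetical bit positions, chosen only for this example. */
#define DEMO_CRASH_ON_NO_RETRY_FAULT	(1u << 0)
#define DEMO_CRASH_ON_RETRY_FAULT	(1u << 1)

/* Return the control value with the crash bits set only in "always" mode. */
static uint32_t demo_fault_cntl(uint32_t tmp, enum demo_fault_stop mode)
{
	if (mode == DEMO_FAULT_STOP_ALWAYS)
		tmp |= DEMO_CRASH_ON_NO_RETRY_FAULT | DEMO_CRASH_ON_RETRY_FAULT;
	return tmp;
}
```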

> ---
>   drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 6 ++++++
>   drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 7 +++++++
>   2 files changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> index 6c8040e..c17996e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> @@ -319,6 +319,12 @@ void gfxhub_v1_0_set_fault_enable_default(struct amdgpu_device *adev,
>   			WRITE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
>   	tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
>   			EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
> +	if (!value) {
> +		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
> +				CRASH_ON_NO_RETRY_FAULT, 1);
> +		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
> +				CRASH_ON_RETRY_FAULT, 1);
> +    }
>   	WREG32_SOC15(GC, 0, mmVM_L2_PROTECTION_FAULT_CNTL, tmp);
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> index 7ff7076..cc21c4b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> @@ -561,6 +561,13 @@ void mmhub_v1_0_set_fault_enable_default(struct amdgpu_device *adev, bool value)
>   			WRITE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
>   	tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
>   			EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
> +	if (!value) {
> +		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
> +				CRASH_ON_NO_RETRY_FAULT, 1);
> +		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
> +				CRASH_ON_RETRY_FAULT, 1);
> +    }
> +
>   	WREG32_SOC15(MMHUB, 0, mmVM_L2_PROTECTION_FAULT_CNTL, tmp);
>   }
>   


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/18] drm/amdgpu:insert TMZ_BEGIN
       [not found]     ` <1505715122-23904-10-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:15       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:15 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> FRAME_CONTROL(begin) is needed for vega10 due to a ucode logic change;
> it fixes some random CTS failures when gfx preemption is enabled.
>
> Change-Id: I0442337f6cde13ed2a33f033badcb522e0f35e2d
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++++++++------
>   1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 21838f4..3306667 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3764,6 +3764,12 @@ static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring)
>   	amdgpu_ring_write_multiple(ring, (void *)&de_payload, sizeof(de_payload) >> 2);
>   }
>   
> +static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
> +{
> +	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
> +	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
> +}
> +
>   static void gfx_v9_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
>   {
>   	uint32_t dw2 = 0;
> @@ -3771,6 +3777,8 @@ static void gfx_v9_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
>   	if (amdgpu_sriov_vf(ring->adev))
>   		gfx_v9_0_ring_emit_ce_meta(ring);
>   
> +	gfx_v9_0_ring_emit_tmz(ring, true);
> +
>   	dw2 |= 0x80000000; /* set load_enable otherwise this package is just NOPs */
>   	if (flags & AMDGPU_HAVE_CTX_SWITCH) {
>   		/* set load_global_config & load_global_uconfig */
> @@ -3821,12 +3829,6 @@ static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>   		ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
>   }
>   
> -static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
> -{
> -	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
> -	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
> -}
> -
>   static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>   {
>   	struct amdgpu_device *adev = ring->adev;


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/18] drm/amdgpu:hdp flush should be put it initialized
       [not found]     ` <1505715122-23904-11-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:16       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:16 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> Change-Id: I635271ba4c89189017daa302a7fe5cd65c3eef06
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 7a20ba8..3d035a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -696,12 +696,6 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
>   	if (r)
>   		return r;
>   
> -	/* After HDP is initialized, flush HDP.*/
> -	if (adev->flags & AMD_IS_APU)
> -		nbio_v7_0_hdp_flush(adev);
> -	else
> -		nbio_v6_1_hdp_flush(adev);
> -
>   	switch (adev->asic_type) {
>   	case CHIP_RAVEN:
>   		mmhub_v1_0_initialize_power_gating(adev);
> @@ -724,6 +718,12 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
>   	tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL);
>   	WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp);
>   
> +	/* After HDP is initialized, flush HDP.*/
> +	if (adev->flags & AMD_IS_APU)
> +		nbio_v7_0_hdp_flush(adev);
> +	else
> +		nbio_v6_1_hdp_flush(adev);
> +
>   	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_ALWAYS)
>   		value = false;
>   	else


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9
       [not found]     ` <1505715122-23904-12-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:18       ` Christian König
       [not found]         ` <34ac878c-5bf7-7735-1787-b5d3c1691fd2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18  9:18 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Olsak, Marek

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> Change-Id: I584572cfb9145ee1b8d11d69ba2989bd6acfd706
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

I could be wrong, but wasn't the consensus that this should be done by 
the UMD?

Marek, please comment.

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 3306667..f201510 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3499,6 +3499,17 @@ static void gfx_v9_0_ring_set_wptr_gfx(struct amdgpu_ring *ring)
>   	}
>   }
>   
> +static void gfx_v9_0_ring_emit_vgt_flush(struct amdgpu_ring *ring)
> +{
> +	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
> +	amdgpu_ring_write(ring, EVENT_TYPE(VS_PARTIAL_FLUSH) |
> +		EVENT_INDEX(4));
> +
> +	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
> +	amdgpu_ring_write(ring, EVENT_TYPE(VGT_FLUSH) |
> +		EVENT_INDEX(0));
> +}
> +
>   static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>   {
>   	u32 ref_and_mask, reg_mem_engine;
> @@ -3530,6 +3541,9 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>   			      nbio_hf_reg->hdp_flush_req_offset,
>   			      nbio_hf_reg->hdp_flush_done_offset,
>   			      ref_and_mask, ref_and_mask, 0x20);
> +
> +	if (ring->funcs->type == AMDGPU_RING_TYPE_GFX)
> +		gfx_v9_0_ring_emit_vgt_flush(ring);
>   }
>   
>   static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]     ` <1505715122-23904-13-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:19       ` Christian König
       [not found]         ` <2f11f862-6022-7a97-17ab-ae2c634f0061-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18  9:19 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Please scan the code once more; we have most likely used mmHDP_DEBUG0 
for this in even more places.

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index f201510..44960b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>   static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>   {
>   	gfx_v9_0_write_data_to_reg(ring, 0, true,
> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>   }
>   
>   static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index fd7c72a..d5f3848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>   {
>   	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>   			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>   	amdgpu_ring_write(ring, 1);
>   }
>   


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload
       [not found]     ` <1505715122-23904-16-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:22       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:22 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Horace Chen

Am 18.09.2017 um 08:11 schrieb Monk Liu:
> Freeing the csa bo in amdgpu_fini is too late because by that
> time ttm has already finished.
> Move it earlier to avoid the page fault.
>
> Change-Id: Id9c3f6aa8720cabbc9936ce21d8cf98af6e23bee
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> Signed-off-by: Horace Chen <horace.chen@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +---
>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 1 +
>   2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 298a241..e0a17bd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1795,10 +1795,8 @@ static int amdgpu_fini(struct amdgpu_device *adev)
>   		adev->ip_blocks[i].status.late_initialized = false;
>   	}
>   
> -	if (amdgpu_sriov_vf(adev)) {
> -		amdgpu_bo_free_kernel(&adev->virt.csa_obj, &adev->virt.csa_vmid0_addr, NULL);
> +	if (amdgpu_sriov_vf(adev))
>   		amdgpu_virt_release_full_gpu(adev, false);
> -	}
>   
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 3f511a9..40e5865 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -2113,6 +2113,7 @@ static int gfx_v8_0_sw_fini(void *handle)
>   	amdgpu_gfx_compute_mqd_sw_fini(adev);
>   	amdgpu_gfx_kiq_free_ring(&adev->gfx.kiq.ring, &adev->gfx.kiq.irq);
>   	amdgpu_gfx_kiq_fini(adev);
> +	amdgpu_bo_free_kernel(&adev->virt.csa_obj, &adev->virt.csa_vmid0_addr, NULL);
>   
>   	gfx_v8_0_mec_fini(adev);
>   	gfx_v8_0_rlc_fini(adev);



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 16/18] drm/amdgpu: increate mailbox polling timeout to 12s.
       [not found]     ` <1505715122-23904-17-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:23       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:23 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Horace Chen

On 18.09.2017 at 08:12, Monk Liu wrote:
> From: Horace Chen <horace.chen@amd.com>
>
> Because multiple FLRs may be waiting to complete, the wait for
> events can be long. Increase the timeout to 12s to reduce timeout
> failures.
>
> Change-Id: I6b33170ba7dedf781b99ba6095127efce403af81
> Signed-off-by: Horace Chen <horace.chen@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>
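
The effect of the larger budget can be shown with a small user-space model of a bounded polling loop. This is only a sketch: struct fake_mailbox, mailbox_event_ready(), mailbox_poll() and the 10 ms poll step are hypothetical and are not the mxgpu code; the only value taken from the patch is 12000, assumed to be in milliseconds because the subject says 12s.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_TIMEOUT_MS 12000 /* value from the patch, assumed to be ms */
#define POLL_STEP_MS       10    /* hypothetical poll interval */

/* Hypothetical event source: the event arrives after ready_at_ms of waiting. */
struct fake_mailbox {
	int ready_at_ms;
};

static bool mailbox_event_ready(const struct fake_mailbox *mb, int elapsed_ms)
{
	return elapsed_ms >= mb->ready_at_ms;
}

/* Returns 0 if the event arrives within timeout_ms, -1 on timeout. */
static int mailbox_poll(const struct fake_mailbox *mb, int timeout_ms)
{
	int elapsed_ms;

	assert(mb != NULL);
	for (elapsed_ms = 0; elapsed_ms <= timeout_ms; elapsed_ms += POLL_STEP_MS) {
		if (mailbox_event_ready(mb, elapsed_ms))
			return 0;
		/* a real driver would sleep for POLL_STEP_MS here */
	}
	return -1;
}
```

In this model an event that needs 8 seconds times out with the old 5000 budget and succeeds with 12000, which is the failure mode the commit message describes.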

> ---
>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h | 2 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
> index 1e91b9a..67e7857 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
> @@ -24,7 +24,7 @@
>   #ifndef __MXGPU_AI_H__
>   #define __MXGPU_AI_H__
>   
> -#define AI_MAILBOX_TIMEDOUT	5000
> +#define AI_MAILBOX_TIMEDOUT	12000
>   
>   enum idh_request {
>   	IDH_REQ_GPU_INIT_ACCESS = 1,
> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
> index c791d73..f13dc6c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
> @@ -23,7 +23,7 @@
>   #ifndef __MXGPU_VI_H__
>   #define __MXGPU_VI_H__
>   
> -#define VI_MAILBOX_TIMEDOUT	5000
> +#define VI_MAILBOX_TIMEDOUT	12000
>   #define VI_MAILBOX_RESET_TIME	12
>   
>   /* VI mailbox messages request */



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 17/18] drm/amdgpu:fix uvd ring fini routine
       [not found]     ` <1505715122-23904-18-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:25       ` Christian König
  0 siblings, 0 replies; 61+ messages in thread
From: Christian König @ 2017-09-18  9:25 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 18.09.2017 at 08:12, Monk Liu wrote:
> Finish the uvd enc_ring, which was previously missed, and fix the incorrect finish of the uvd ring.
>
> Change-Id: Ib74237ca5adcb3b128c9b751fced0b7db7b09e86
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index 331e34a..63b00eb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -269,6 +269,8 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
>   
>   int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
>   {
> +	struct amdgpu_ring *ring;
> +	int i;

There should be an empty line between the declarations and the code.

>   	kfree(adev->uvd.saved_bo);
>   
>   	amd_sched_entity_fini(&adev->uvd.ring.sched, &adev->uvd.entity);
> @@ -277,7 +279,15 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
>   			      &adev->uvd.gpu_addr,
>   			      (void **)&adev->uvd.cpu_addr);
>   
> -	amdgpu_ring_fini(&adev->uvd.ring);
> +	ring = &adev->uvd.ring;
> +	if (ring->adev)

No need for that: the first thing amdgpu_ring_fini() does is check
ring->adev, so this check is just a duplicate.

Reviewed-by: Christian König <christian.koenig@amd.com> with those two 
minor issues fixed.

Regards,
Christian.
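
The second remark can be illustrated with a small user-space model. The names struct ring, ring_fini() and fini_all() are hypothetical stand-ins, not the amdgpu definitions; the sketch only assumes what the review states, namely that the fini routine itself returns early for a ring that was never initialized. Clearing the owner pointer after teardown is a choice of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ring; 'owner' plays the role of ring->adev. */
struct ring {
	void *owner;    /* NULL means the ring was never initialized */
	int fini_calls; /* counts how often real teardown work ran */
};

/* Teardown that guards itself, so callers need no extra check. */
static void ring_fini(struct ring *ring)
{
	assert(ring != NULL);
	if (!ring->owner)
		return;         /* nothing to tear down */
	ring->fini_calls++;
	ring->owner = NULL;     /* model choice: a second call is a no-op */
}

/* The caller can finish every ring unconditionally. */
static void fini_all(struct ring *rings, size_t count)
{
	size_t i;

	for (i = 0; i < count; ++i)
		ring_fini(&rings[i]);
}
```

With the guard inside the callee, the loop over the encode rings needs no if (ring->adev) of its own.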

> +		amdgpu_ring_fini(ring);
> +
> +	for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i) {
> +		ring = &adev->uvd.ring_enc[i];
> +		if (ring->adev)
> +			amdgpu_ring_fini(ring);
> +	}
>   
>   	release_firmware(adev->uvd.fw);
>   



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
       [not found]     ` <1505715122-23904-14-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-18  9:27       ` Christian König
       [not found]         ` <1821bf91-83d8-c933-704d-fcd8db07def1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18  9:27 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Horace Chen

On 18.09.2017 at 08:11, Monk Liu wrote:
> [SWDEV-126631] - fix hypervisor save_vf failure that occurred
> after the driver was removed:
> 1. Because the KIQ and KCQ were not unmapped, save_vf will fail if the driver freed the MQD of KIQ and KCQ.
> 2. KIQ can't be unmapped since RLCV always needs it, so the bo_free on KIQ should be skipped.
> 3. KCQ can be unmapped, and should be unmapped during hw_fini.
> 4. RLCV still needs to access other MC addresses from some hw even after the driver is unloaded,
>     so we should not unbind the GART for a VF.
>
> Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
> Signed-off-by: Horace Chen <horace.chen@amd.com>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

I absolutely can't judge if this is correct or not, but keeping the GART 
and KIQ alive after the driver is unloaded sounds really fishy to me.

Isn't there any other clean way of handling this?

Christian.
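
For readers following the quoted patch below: gfx_v9_0_kcq_disable() detects completion by seeding a scratch register with 0xCAFEDEAD, queuing a write of 0xDEADBEEF behind the unmap packet, and polling until the marker appears or the timeout expires. A minimal user-space model of that handshake follows. struct fake_dev, read_scratch() and wait_for_completion() are hypothetical and only imitate the control flow; the two magic values are the ones in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_PENDING 0xCAFEDEADu /* sentinel written before submission */
#define SCRATCH_DONE    0xDEADBEEFu /* marker written by the queued packet */

/* Hypothetical device model: the marker lands after 'latency' reads. */
struct fake_dev {
	uint32_t scratch;
	int latency; /* reads before the queued write lands; negative = never */
	int polls;
};

static uint32_t read_scratch(struct fake_dev *dev)
{
	if (dev->latency >= 0 && dev->polls++ >= dev->latency)
		dev->scratch = SCRATCH_DONE;
	return dev->scratch;
}

/* Returns 0 once the marker shows up, -1 if max_polls reads never see it. */
static int wait_for_completion(struct fake_dev *dev, int max_polls)
{
	int i;

	assert(dev != NULL);
	dev->scratch = SCRATCH_PENDING;
	dev->polls = 0;
	for (i = 0; i < max_polls; i++) {
		if (read_scratch(dev) == SCRATCH_DONE)
			return 0;
		/* the driver delays one microsecond between reads */
	}
	return -1;
}
```

Seeding the register with a value different from the marker matters: without it, a stale 0xDEADBEEF left by an earlier use could be mistaken for completion.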

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++++++++++++++++++++++++++++++-
>   3 files changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index f437008..2fee071 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>    */
>   void amdgpu_gart_fini(struct amdgpu_device *adev)
>   {
> -	if (adev->gart.ready) {
> +	/* gart is still used by other hw under SRIOV, don't unbind it */
> +	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
>   		/* unbind pages */
>   		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
>   	}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 4f6c68f..bf6656f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
>   				      &ring->mqd_ptr);
>   	}
>   
> +	/* don't deallocate KIQ mqd because the bo is still used by RLCV even
> +	the guest VM is shutdown */
> +	if (amdgpu_sriov_vf(adev))
> +		return;
> +
>   	ring = &adev->gfx.kiq.ring;
>   	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
>   	amdgpu_bo_free_kernel(&ring->mqd_obj,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 44960b3..a577bbc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
>   	return r;
>   }
>   
> +static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring,struct amdgpu_ring *ring)
> +{
> +	struct amdgpu_device *adev = kiq_ring->adev;
> +	uint32_t scratch, tmp = 0;
> +	int r, i;
> +
> +	r = amdgpu_gfx_scratch_get(adev, &scratch);
> +	if (r) {
> +		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
> +		return r;
> +	}
> +	WREG32(scratch, 0xCAFEDEAD);
> +
> +	r = amdgpu_ring_alloc(kiq_ring, 10);
> +	if (r) {
> +		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
> +		amdgpu_gfx_scratch_free(adev, scratch);
> +		return r;
> +	}
> +
> +	/* unmap queues */
> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
> +	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
> +						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
> +						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
> +						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
> +						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
> +	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
> +	amdgpu_ring_write(kiq_ring, 0);
> +	amdgpu_ring_write(kiq_ring, 0);
> +	amdgpu_ring_write(kiq_ring, 0);
> +	/* write to scratch for completion */
> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
> +	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
> +	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
> +	amdgpu_ring_commit(kiq_ring);
> +
> +	for (i = 0; i < adev->usec_timeout; i++) {
> +		tmp = RREG32(scratch);
> +		if (tmp == 0xDEADBEEF)
> +			break;
> +		DRM_UDELAY(1);
> +	}
> +	if (i >= adev->usec_timeout) {
> +		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
> +		r = -EINVAL;
> +	}
> +	amdgpu_gfx_scratch_free(adev, scratch);
> +	return r;
> +}
> +
> +
>   static int gfx_v9_0_hw_fini(void *handle)
>   {
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	int i, r;
>   
>   	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>   	if (amdgpu_sriov_vf(adev)) {
> -		pr_debug("For SRIOV client, shouldn't do anything.\n");
> +		/* disable KCQ to avoid CPC touch memory not valid anymore */
> +		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> +			r = gfx_v9_0_kcq_disable(&adev->gfx.kiq.ring, &adev->gfx.compute_ring[i]);
> +			if (r)
> +				return r;
> +		}
>   		return 0;
>   	}
>   	gfx_v9_0_cp_enable(adev, false);



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV
       [not found]     ` <1505715122-23904-15-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
  2017-09-18  9:10       ` Yu, Xiangliang
@ 2017-09-18  9:31       ` Christian König
       [not found]         ` <0951ed06-954a-0f31-6b6e-ba923be008a2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18  9:31 UTC (permalink / raw)
  To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Horace Chen

On 18.09.2017 at 08:11, Monk Liu wrote:
> From: Horace Chen <horace.chen@amd.com>
>
> The kernel sets the PCI power state to UNKNOWN after unloading.
> Since SRIOV has a faked PCI config space, the UNKNOWN state
> is kept forever.
>
> On driver reload, enabling MSI fails if the power state is
> UNKNOWN.
>
> Forcibly set it to D0 for SRIOV to work around this kernel flaw.
>
> Change-Id: I6a72d5fc9b653b21c3c98167515a511c5edeb91c
> Signed-off-by: Horace Chen <horace.chen@amd.com>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Acked-by: Christian König <christian.koenig@amd.com>, but better wait
for Alex to have a look at this as well before pushing it.

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 914c5bf..345406a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -229,7 +229,15 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>   	adev->irq.msi_enabled = false;
>   
>   	if (amdgpu_msi_ok(adev)) {
> -		int ret = pci_enable_msi(adev->pdev);
> +		int ret;
> +		if (amdgpu_sriov_vf(adev) &&
> +		    adev->pdev->current_state == PCI_UNKNOWN){
> +			/* If pci power state is unknown on the SRIOV platform,
> +			 * it may be set in the remove device. We need to forcely
> +			 * set it to D0 to enable the msi*/
> +			adev->pdev->current_state = PCI_D0;
> +		}
> +		ret = pci_enable_msi(adev->pdev);
>   		if (!ret) {
>   			adev->irq.msi_enabled = true;
>   			dev_info(adev->dev, "amdgpu: using MSI.\n");



^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
       [not found]         ` <1821bf91-83d8-c933-704d-fcd8db07def1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-18 10:12           ` Liu, Monk
       [not found]             ` <BLUPR12MB0449D3944109EA4A7D151A2684630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Liu, Monk @ 2017-09-18 10:12 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chen, Horace

Christian,

Let's discuss this patch together with the following one, which skips freeing the KIQ MQD to avoid the SAVE_FAIL issue.

For the patch that skips the KIQ MQD deallocation, I think I will drop it and take a different approach:
we allocate the KIQ MQD in the VRAM domain, so this BO can be safely freed after the driver is unloaded, because *usually* nobody changes the data in this BO after unload.
The exception would be, e.g., a root app that maps visible VRAM and alters the values in it.

As for this patch, which skips unbinding the GART mapping to keep the KIQ MQD always valid:
the hypervisor side always has a couple of hw components running, and they rely on the GMC being kept alive, so this is very different from bare metal. In other words, this is the only way we can do it.

Besides, more patches will follow for L1 secure mode, which forbids the VF from accessing GMC registers. Under L1 secure mode the driver will always skip GMC programming under SRIOV, in both init and fini, but that will come later.

BR Monk



-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com] 
Sent: September 18, 2017 17:28
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Chen, Horace <Horace.Chen@amd.com>
Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug

On 18.09.2017 at 08:11, Monk Liu wrote:
> [SWDEV-126631] - fix hypervisor save_vf failure that occurred
> after the driver was removed:
> 1. Because the KIQ and KCQ were not unmapped, save_vf will fail if the driver freed the MQD of KIQ and KCQ.
> 2. KIQ can't be unmapped since RLCV always needs it, so the bo_free on KIQ should be skipped.
> 3. KCQ can be unmapped, and should be unmapped during hw_fini.
> 4. RLCV still needs to access other MC addresses from some hw even after the driver is unloaded,
>     so we should not unbind the GART for a VF.
>
> Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
> Signed-off-by: Horace Chen <horace.chen@amd.com>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

I absolutely can't judge if this is correct or not, but keeping the GART and KIQ alive after the driver is unloaded sounds really fishy to me.

Isn't there any other clean way of handling this?

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++++++++++++++++++++++++++++++-
>   3 files changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index f437008..2fee071 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>    */
>   void amdgpu_gart_fini(struct amdgpu_device *adev)
>   {
> -	if (adev->gart.ready) {
> +	/* gart is still used by other hw under SRIOV, don't unbind it */
> +	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
>   		/* unbind pages */
>   		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
>   	}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 4f6c68f..bf6656f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
>   				      &ring->mqd_ptr);
>   	}
>   
> +	/* don't deallocate KIQ mqd because the bo is still used by RLCV even
> +	the guest VM is shutdown */
> +	if (amdgpu_sriov_vf(adev))
> +		return;
> +
>   	ring = &adev->gfx.kiq.ring;
>   	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
>   	amdgpu_bo_free_kernel(&ring->mqd_obj,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 44960b3..a577bbc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
>   	return r;
>   }
>   
> +static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring,struct amdgpu_ring *ring)
> +{
> +	struct amdgpu_device *adev = kiq_ring->adev;
> +	uint32_t scratch, tmp = 0;
> +	int r, i;
> +
> +	r = amdgpu_gfx_scratch_get(adev, &scratch);
> +	if (r) {
> +		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
> +		return r;
> +	}
> +	WREG32(scratch, 0xCAFEDEAD);
> +
> +	r = amdgpu_ring_alloc(kiq_ring, 10);
> +	if (r) {
> +		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
> +		amdgpu_gfx_scratch_free(adev, scratch);
> +		return r;
> +	}
> +
> +	/* unmap queues */
> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
> +	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
> +						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
> +						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
> +						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
> +						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
> +	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
> +	amdgpu_ring_write(kiq_ring, 0);
> +	amdgpu_ring_write(kiq_ring, 0);
> +	amdgpu_ring_write(kiq_ring, 0);
> +	/* write to scratch for completion */
> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
> +	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
> +	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
> +	amdgpu_ring_commit(kiq_ring);
> +
> +	for (i = 0; i < adev->usec_timeout; i++) {
> +		tmp = RREG32(scratch);
> +		if (tmp == 0xDEADBEEF)
> +			break;
> +		DRM_UDELAY(1);
> +	}
> +	if (i >= adev->usec_timeout) {
> +		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
> +		r = -EINVAL;
> +	}
> +	amdgpu_gfx_scratch_free(adev, scratch);
> +	return r;
> +}
> +
> +
>   static int gfx_v9_0_hw_fini(void *handle)
>   {
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	int i, r;
>   
>   	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>   	if (amdgpu_sriov_vf(adev)) {
> -		pr_debug("For SRIOV client, shouldn't do anything.\n");
> +		/* disable KCQ to avoid CPC touch memory not valid anymore */
> +		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
> +			r = gfx_v9_0_kcq_disable(&adev->gfx.kiq.ring, &adev->gfx.compute_ring[i]);
> +			if (r)
> +				return r;
> +		}
>   		return 0;
>   	}
>   	gfx_v9_0_cp_enable(adev, false);



^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
       [not found]         ` <f96a1189-2fe3-6466-df1b-557f87319cb9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-18 10:47           ` Liu, Monk
       [not found]             ` <BLUPR12MB0449D8D7812A4C80EDA2253D84630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Liu, Monk @ 2017-09-18 10:47 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I didn't get your point... how could bo_create_kernel solve my issue?

The point is that during GPU reset we invoke hw_init for every hardware component, and by design hw_init shouldn't do anything software related, so allocating the BO in hw_init is wrong.

Even switching to bo_create_kernel won't address the issue...


BR Monk

-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com] 
Sent: September 18, 2017 17:13
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset

On 18.09.2017 at 08:11, Monk Liu wrote:
> doing gpu reset will rerun all hw_init and thus ucode_init_bo is 
> invoked again, so we need to skip the fw_buf allocation during sriov 
> gpu reset to avoid memory leak.
>
> Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  3 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++++++++++++++----------------
>   2 files changed, 35 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 6ff2959..3d0c633 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1185,6 +1185,9 @@ struct amdgpu_firmware {
>   
>   	/* gpu info firmware data pointer */
>   	const struct firmware *gpu_info_fw;
> +
> +	void *fw_buf_ptr;
> +	uint64_t fw_buf_mc;
>   };
>   
>   /*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> index f306374..6564902 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
>   int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>   {
>   	struct amdgpu_bo **bo = &adev->firmware.fw_buf;
> -	uint64_t fw_mc_addr;
> -	void *fw_buf_ptr = NULL;
>   	uint64_t fw_offset = 0;
>   	int i, err;
>   	struct amdgpu_firmware_info *ucode = NULL;
> @@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>   		return 0;
>   	}
>   
> -	err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
> -				amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> -				AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
> -				NULL, NULL, 0, bo);
> -	if (err) {
> -		dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
> -		goto failed;
> -	}
> +	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {

Instead of all this, better use amdgpu_bo_create_kernel(); it should already include most of the handling needed here.

Christian.

> +		err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
> +					amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> +					AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
> +					NULL, NULL, 0, bo);
> +		if (err) {
> +			dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
> +			goto failed;
> +		}
>   
> -	err = amdgpu_bo_reserve(*bo, false);
> -	if (err) {
> -		dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
> -		goto failed_reserve;
> -	}
> +		err = amdgpu_bo_reserve(*bo, false);
> +		if (err) {
> +			dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
> +			goto failed_reserve;
> +		}
>   
> -	err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> -				&fw_mc_addr);
> -	if (err) {
> -		dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
> -		goto failed_pin;
> -	}
> +		err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
> +					&adev->firmware.fw_buf_mc);
> +		if (err) {
> +			dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
> +			goto failed_pin;
> +		}
>   
> -	err = amdgpu_bo_kmap(*bo, &fw_buf_ptr);
> -	if (err) {
> -		dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
> -		goto failed_kmap;
> -	}
> +		err = amdgpu_bo_kmap(*bo, &adev->firmware.fw_buf_ptr);
> +		if (err) {
> +			dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
> +			goto failed_kmap;
> +		}
>   
> -	amdgpu_bo_unreserve(*bo);
> +		amdgpu_bo_unreserve(*bo);
> +	}
>   
> -	memset(fw_buf_ptr, 0, adev->firmware.fw_size);
> +	memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size);
>   
>   	/*
>   	 * if SMU loaded firmware, it needn't add SMC, UVD, and VCE
> @@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>   		ucode = &adev->firmware.ucode[i];
>   		if (ucode->fw) {
>   			header = (const struct common_firmware_header *)ucode->fw->data;
> -			amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + fw_offset,
> -						    (void *)((uint8_t *)fw_buf_ptr + fw_offset));
> +			amdgpu_ucode_init_single_fw(adev, ucode, adev->firmware.fw_buf_mc + fw_offset,
> +						    adev->firmware.fw_buf_ptr + fw_offset);
>   			if (i == AMDGPU_UCODE_ID_CP_MEC1 &&
>   			    adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
>   				const struct gfx_firmware_header_v1_0 *cp_hdr;
>   				cp_hdr = (const struct gfx_firmware_header_v1_0 *)ucode->fw->data;
> -				amdgpu_ucode_patch_jt(ucode, fw_mc_addr + fw_offset,
> -						    fw_buf_ptr + fw_offset);
> +				amdgpu_ucode_patch_jt(ucode,  adev->firmware.fw_buf_mc + fw_offset,
> +						    adev->firmware.fw_buf_ptr + fw_offset);
>   				fw_offset += ALIGN(le32_to_cpu(cp_hdr->jt_size) << 2, PAGE_SIZE);
>   			}
>   			fw_offset += ALIGN(ucode->ucode_size, PAGE_SIZE);



^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]         ` <2f11f862-6022-7a97-17ab-ae2c634f0061-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-18 11:03           ` Liu, Monk
       [not found]             ` <BLUPR12MB04497CDE395DCE35F830DD4F84630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Liu, Monk @ 2017-09-18 11:03 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Only vega10 has this register 

-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com] 
Sent: September 18, 2017 17:20
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

On 18.09.2017 at 08:11, Monk Liu wrote:
> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Please scan the code once more; we have most likely used mmHDP_DEBUG0 for this in even more places.

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index f201510..44960b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>   static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>   {
>   	gfx_v9_0_write_data_to_reg(ring, 0, true,
> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>   }
>   
>   static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index fd7c72a..d5f3848 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>   {
>   	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>   			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>   	amdgpu_ring_write(ring, 1);
>   }
>   



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
       [not found]             ` <BLUPR12MB0449D8D7812A4C80EDA2253D84630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-09-18 11:34               ` Christian König
       [not found]                 ` <45fa4145-41a4-6186-4f35-4f3347bad601-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18 11:34 UTC (permalink / raw)
  To: Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 18.09.2017 at 12:47, Liu, Monk wrote:
> I didn't get your point... how could bo_create_kernel solve my issue ?

It doesn't solve the underlying issue, you just need less code for your 
workaround.

With bo_create_kernel you can do create/pin/kmap in just one function call.

>
> The point is that during GPU reset we invoke hw_init for every hardware component, and by design hw_init shouldn't do anything software related, so allocating the BO in hw_init is wrong.

Yeah, but your patch doesn't fix that either as far as I can see.

> Even switch to bo_create_kernel won't address the issue ...

See the implementation of bo_create_kernel():
>         if (!*bo_ptr) {
>                 r = amdgpu_bo_create(adev, size, align, true, domain,
....
>         }
....
>         r = amdgpu_bo_pin(*bo_ptr, domain, gpu_addr);
...
>         if (cpu_addr) {
>                 r = amdgpu_bo_kmap(*bo_ptr, cpu_addr);
...
>         }

Creating is actually optional, but the function always pins the BO once
more and figures out its CPU address.

As far as I can see that should solve your problem for now.

Christian.
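
The create-if-missing behaviour described above can be modelled in user space. This is only a sketch: struct fake_bo and fake_bo_create_kernel() are hypothetical and mirror nothing but the control flow of the excerpt (create only when the pointer is still NULL, then always pin, then map if a CPU address is requested). It is not the amdgpu implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer object with just enough state for the model. */
struct fake_bo {
	size_t size;
	int pin_count;
	void *cpu_addr;
};

static int allocations; /* how many times backing memory was really created */

/*
 * Allocates only when *bo_ptr is still NULL, then always pins and,
 * if requested, returns the CPU address.
 */
static int fake_bo_create_kernel(struct fake_bo **bo_ptr, size_t size,
				 void **cpu_addr)
{
	assert(bo_ptr != NULL);
	if (!*bo_ptr) {
		struct fake_bo *bo = calloc(1, sizeof(*bo));

		if (!bo)
			return -1;
		bo->cpu_addr = malloc(size);
		if (!bo->cpu_addr) {
			free(bo);
			return -1;
		}
		bo->size = size;
		allocations++;
		*bo_ptr = bo;
	}
	(*bo_ptr)->pin_count++;         /* pinned once more on every call */
	if (cpu_addr)
		*cpu_addr = (*bo_ptr)->cpu_addr;
	return 0;
}
```

Calling the helper a second time, as a re-run of hw_init after a reset would, hands back the same buffer without a new allocation. The pin count still grows on every call in this model, which matches the remark that the function pins the BO once more.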


>
>
> BR Monk
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: September 18, 2017 17:13
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
>
> On 18.09.2017 at 08:11, Monk Liu wrote:
>> doing gpu reset will rerun all hw_init and thus ucode_init_bo is
>> invoked again, so we need to skip the fw_buf allocation during sriov
>> gpu reset to avoid memory leak.
>>
>> Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  3 ++
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++++++++++++++----------------
>>    2 files changed, 35 insertions(+), 32 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 6ff2959..3d0c633 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1185,6 +1185,9 @@ struct amdgpu_firmware {
>>    
>>    	/* gpu info firmware data pointer */
>>    	const struct firmware *gpu_info_fw;
>> +
>> +	void *fw_buf_ptr;
>> +	uint64_t fw_buf_mc;
>>    };
>>    
>>    /*
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> index f306374..6564902 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
>>    int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    {
>>    	struct amdgpu_bo **bo = &adev->firmware.fw_buf;
>> -	uint64_t fw_mc_addr;
>> -	void *fw_buf_ptr = NULL;
>>    	uint64_t fw_offset = 0;
>>    	int i, err;
>>    	struct amdgpu_firmware_info *ucode = NULL;
>> @@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    		return 0;
>>    	}
>>    
>> -	err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
>> -				amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> -				AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
>> -				NULL, NULL, 0, bo);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
>> -		goto failed;
>> -	}
>> +	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {
> Instead of all this better use amdgpu_bo_create_kernel(), this should already include most of the handling necessary here.
>
> Christian.
>
>> +		err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
>> +					amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> +					AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
>> +					NULL, NULL, 0, bo);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
>> +			goto failed;
>> +		}
>>    
>> -	err = amdgpu_bo_reserve(*bo, false);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
>> -		goto failed_reserve;
>> -	}
>> +		err = amdgpu_bo_reserve(*bo, false);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
>> +			goto failed_reserve;
>> +		}
>>    
>> -	err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> -				&fw_mc_addr);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
>> -		goto failed_pin;
>> -	}
>> +		err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> +					&adev->firmware.fw_buf_mc);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
>> +			goto failed_pin;
>> +		}
>>    
>> -	err = amdgpu_bo_kmap(*bo, &fw_buf_ptr);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
>> -		goto failed_kmap;
>> -	}
>> +		err = amdgpu_bo_kmap(*bo, &adev->firmware.fw_buf_ptr);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
>> +			goto failed_kmap;
>> +		}
>>    
>> -	amdgpu_bo_unreserve(*bo);
>> +		amdgpu_bo_unreserve(*bo);
>> +	}
>>    
>> -	memset(fw_buf_ptr, 0, adev->firmware.fw_size);
>> +	memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size);
>>    
>>    	/*
>>    	 * if SMU loaded firmware, it needn't add SMC, UVD, and VCE
>> @@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    		ucode = &adev->firmware.ucode[i];
>>    		if (ucode->fw) {
>>    			header = (const struct common_firmware_header *)ucode->fw->data;
>> -			amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + fw_offset,
>> -						    (void *)((uint8_t *)fw_buf_ptr + fw_offset));
>> +			amdgpu_ucode_init_single_fw(adev, ucode, adev->firmware.fw_buf_mc + fw_offset,
>> +						    adev->firmware.fw_buf_ptr + fw_offset);
>>    			if (i == AMDGPU_UCODE_ID_CP_MEC1 &&
>>    			    adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
>>    				const struct gfx_firmware_header_v1_0 *cp_hdr;
>>    				cp_hdr = (const struct gfx_firmware_header_v1_0 *)ucode->fw->data;
>> -				amdgpu_ucode_patch_jt(ucode, fw_mc_addr + fw_offset,
>> -						    fw_buf_ptr + fw_offset);
>> +				amdgpu_ucode_patch_jt(ucode,  adev->firmware.fw_buf_mc + fw_offset,
>> +						    adev->firmware.fw_buf_ptr + fw_offset);
>>    				fw_offset += ALIGN(le32_to_cpu(cp_hdr->jt_size) << 2, PAGE_SIZE);
>>    			}
>>    			fw_offset += ALIGN(ucode->ucode_size, PAGE_SIZE);
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]             ` <BLUPR12MB04497CDE395DCE35F830DD4F84630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-09-18 11:39               ` Christian König
       [not found]                 ` <4de1beaf-95c0-ba6e-da79-1070074f82e8-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18 11:39 UTC (permalink / raw)
  To: Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:

> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
> {
>         amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 0));
>         amdgpu_ring_write(ring, 1);
> }

That should probably be fixed as well.
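
A rough user-space sketch of what the corrected emit function might look like, mirroring the change the patch makes for gfx_v9_0 and sdma_v4_0. Everything here is an assumption for illustration: the toy_ ring, the TOY_PACKET0 bit layout and in particular the TOY_HDP_READ_CACHE_INVALIDATE offset are placeholders, not values taken from the driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder offset, NOT the real register address on any ASIC. */
#define TOY_HDP_READ_CACHE_INVALIDATE 0x1234u

/*
 * Assumed type-0 packet layout for this sketch: packet type in bits
 * 31:30, dword count in bits 29:16, register offset in bits 15:0.
 */
#define TOY_PACKET0(reg, n) \
	((0u << 30) | (((uint32_t)(n) & 0x3FFFu) << 16) | \
	 ((uint32_t)(reg) & 0xFFFFu))

#define TOY_RING_DWORDS 16

/* Toy ring that just records the dwords written to it. */
struct toy_ring {
	uint32_t buf[TOY_RING_DWORDS];
	size_t wptr;
};

static void toy_ring_write(struct toy_ring *ring, uint32_t v)
{
	assert(ring->wptr < TOY_RING_DWORDS);
	ring->buf[ring->wptr++] = v;
}

/* Same two-dword sequence as quoted above, but aimed at the formal
 * invalidate register instead of the debug register. */
static void toy_uvd_emit_hdp_invalidate(struct toy_ring *ring)
{
	toy_ring_write(ring, TOY_PACKET0(TOY_HDP_READ_CACHE_INVALIDATE, 0));
	toy_ring_write(ring, 1);
}
```

The only intended difference from the quoted function is the register named in the first dword; the payload of 1 stays as it is.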

Regards,
Christian.

On 18.09.2017 at 13:03, Liu, Monk wrote:
> Only vega10 has this register
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: September 18, 2017 17:20
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
>
> On 18.09.2017 at 08:11, Monk Liu wrote:
>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Please scan the code once more; we have most likely used mmHDP_DEBUG0 for this in even more places.
>
> Christian.
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>    drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>    2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index f201510..44960b3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>    static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>    {
>>    	gfx_v9_0_write_data_to_reg(ring, 0, true,
>> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>>    }
>>    
>>    static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index fd7c72a..d5f3848 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>    {
>>    	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>    			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>>    	amdgpu_ring_write(ring, 1);
>>    }
>>    
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
       [not found]             ` <BLUPR12MB0449D3944109EA4A7D151A2684630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-09-18 11:53               ` Christian König
       [not found]                 ` <fade2e70-6594-9a6e-9d5a-d488d360363e-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-18 11:53 UTC (permalink / raw)
  To: Liu, Monk, Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Chen, Horace

On 18.09.2017 at 12:12, Liu, Monk wrote:
> Christian,
>
> Let's discuss this patch and the one that follows, which skips the KIQ MQD free to avoid the SAVE_FAIL issue.
>
>
> As for the patch that skips the KIQ MQD deallocation, I think I will drop it and use a new approach:
> We allocate the KIQ MQD in the VRAM domain, and this BO can be safely freed after the driver is unloaded, because after unload no one will *usually* change the data in this BO.
> (An exception would be, e.g., some root app mapping visible VRAM and altering the value in it.)

That sounds at least a bit better. But my question is why doesn't this
work like it does on Tonga, i.e. correctly clean things up?

>
> For this patch, "skip unbinding the GART mapping to keep the KIQ MQD always valid":
> The hypervisor side always has a couple of hw components working, and they rely on the GMC being kept alive, so this is very different from BARE-METAL. That is to say, this is the only way we can do it.

Yeah, but keeping the GART mapping alive is complete nonsense. When the
driver unloads, all memory should be returned to the OS.

So we either keep a GART mapping to pages which are about to be reused
and overwritten, or we leak memory on driver shutdown.

Neither option sounds very good,
Christian.

>
> Besides, we'll have more patches in the future for L1 secure mode, which forbids the VF from accessing GMC registers, so under L1 secure mode the driver will always skip GMC programming under SRIOV in both init and fini, but that will come later.
>
> BR Monk
>
>
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: September 18, 2017 17:28
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Chen, Horace <Horace.Chen@amd.com>
> Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
>
> On 18.09.2017 at 08:11, Monk Liu wrote:
>> [SWDEV-126631] - fix hypervisor save_vf failure that occurred after the
>> driver was removed:
>> 1. Because the KIQ and KCQ were not unmapped, save_vf will fail if the
>>    driver freed the MQDs of the KIQ and KCQ.
>> 2. The KIQ can't be unmapped since RLCV always needs it, so the bo_free
>>    on the KIQ should be skipped.
>> 3. The KCQ can be unmapped, and should be unmapped during hw_fini.
>> 4. RLCV still needs to access other MC addresses from some hw even after
>>    the driver is unloaded, so we should not unbind the GART for a VF.
>>
>> Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
>> Signed-off-by: Horace Chen <horace.chen@amd.com>
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> I absolutely can't judge if this is correct or not, but keeping the GART and KIQ alive after the driver is unloaded sounds really fishy to me.
>
> Isn't there any other clean way of handling this?
>
> Christian.
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++++++++++++++++++++++++++++++-
>>    3 files changed, 66 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> index f437008..2fee071 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> @@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>>     */
>>    void amdgpu_gart_fini(struct amdgpu_device *adev)
>>    {
>> -	if (adev->gart.ready) {
>> +	/* gart is still used by other hw under SRIOV, don't unbind it */
>> +	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
>>    		/* unbind pages */
>>    		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
>>    	}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index 4f6c68f..bf6656f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
>>    				      &ring->mqd_ptr);
>>    	}
>>    
>> +	/* don't deallocate KIQ mqd because the bo is still used by RLCV even
>> +	the guest VM is shutdown */
>> +	if (amdgpu_sriov_vf(adev))
>> +		return;
>> +
>>    	ring = &adev->gfx.kiq.ring;
>>    	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
>>    	amdgpu_bo_free_kernel(&ring->mqd_obj,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 44960b3..a577bbc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
>>    	return r;
>>    }
>>    
>> +static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring,struct amdgpu_ring *ring)
>> +{
>> +	struct amdgpu_device *adev = kiq_ring->adev;
>> +	uint32_t scratch, tmp = 0;
>> +	int r, i;
>> +
>> +	r = amdgpu_gfx_scratch_get(adev, &scratch);
>> +	if (r) {
>> +		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
>> +		return r;
>> +	}
>> +	WREG32(scratch, 0xCAFEDEAD);
>> +
>> +	r = amdgpu_ring_alloc(kiq_ring, 10);
>> +	if (r) {
>> +		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
>> +		amdgpu_gfx_scratch_free(adev, scratch);
>> +		return r;
>> +	}
>> +
>> +	/* unmap queues */
>> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
>> +	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
>> +						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
>> +						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
>> +						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
>> +						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
>> +	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
>> +	amdgpu_ring_write(kiq_ring, 0);
>> +	amdgpu_ring_write(kiq_ring, 0);
>> +	amdgpu_ring_write(kiq_ring, 0);
>> +	/* write to scratch for completion */
>> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
>> +	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
>> +	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
>> +	amdgpu_ring_commit(kiq_ring);
>> +
>> +	for (i = 0; i < adev->usec_timeout; i++) {
>> +		tmp = RREG32(scratch);
>> +		if (tmp == 0xDEADBEEF)
>> +			break;
>> +		DRM_UDELAY(1);
>> +	}
>> +	if (i >= adev->usec_timeout) {
>> +		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
>> +		r = -EINVAL;
>> +	}
>> +	amdgpu_gfx_scratch_free(adev, scratch);
>> +	return r;
>> +}
>> +
>> +
>>    static int gfx_v9_0_hw_fini(void *handle)
>>    {
>>    	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> +	int i, r;
>>    
>>    	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>>    	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>>    	if (amdgpu_sriov_vf(adev)) {
>> -		pr_debug("For SRIOV client, shouldn't do anything.\n");
>> +		/* disable KCQ to avoid CPC touch memory not valid anymore */
>> +		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>> +			r = gfx_v9_0_kcq_disable(&adev->gfx.kiq.ring, &adev->gfx.compute_ring[i]);
>> +			if (r)
>> +				return r;
>> +		}
>>    		return 0;
>>    	}
>>    	gfx_v9_0_cp_enable(adev, false);
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9
       [not found]         ` <34ac878c-5bf7-7735-1787-b5d3c1691fd2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-18 15:48           ` Marek Olšák
  0 siblings, 0 replies; 61+ messages in thread
From: Marek Olšák @ 2017-09-18 15:48 UTC (permalink / raw)
  To: Christian König; +Cc: Olsak, Marek, amd-gfx mailing list, Monk Liu

Yes, the UMD does it.

Marek

On Mon, Sep 18, 2017 at 11:18 AM, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> On 18.09.2017 at 08:11, Monk Liu wrote:
>>
>> Change-Id: I584572cfb9145ee1b8d11d69ba2989bd6acfd706
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>
>
> I could be wrong, but wasn't the consensus that this should be done by the
> UMD?
>
> Marek, please comment.
>
> Christian.
>
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 3306667..f201510 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3499,6 +3499,17 @@ static void gfx_v9_0_ring_set_wptr_gfx(struct
>> amdgpu_ring *ring)
>>         }
>>   }
>>   +static void gfx_v9_0_ring_emit_vgt_flush(struct amdgpu_ring *ring)
>> +{
>> +       amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
>> +       amdgpu_ring_write(ring, EVENT_TYPE(VS_PARTIAL_FLUSH) |
>> +               EVENT_INDEX(4));
>> +
>> +       amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
>> +       amdgpu_ring_write(ring, EVENT_TYPE(VGT_FLUSH) |
>> +               EVENT_INDEX(0));
>> +}
>> +
>>   static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>   {
>>         u32 ref_and_mask, reg_mem_engine;
>> @@ -3530,6 +3541,9 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct
>> amdgpu_ring *ring)
>>                               nbio_hf_reg->hdp_flush_req_offset,
>>                               nbio_hf_reg->hdp_flush_done_offset,
>>                               ref_and_mask, ref_and_mask, 0x20);
>> +
>> +       if (ring->funcs->type == AMDGPU_RING_TYPE_GFX)
>> +               gfx_v9_0_ring_emit_vgt_flush(ring);
>>   }
>>     static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring
>> *ring)
>
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV
       [not found]         ` <0951ed06-954a-0f31-6b6e-ba923be008a2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-18 21:07           ` Alex Deucher
       [not found]             ` <CADnq5_Nj5Kqp4CXtFLLz-cPynvchBV-RLFFpB6e5D-OCyPXQiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Alex Deucher @ 2017-09-18 21:07 UTC (permalink / raw)
  To: Christian Koenig; +Cc: Horace Chen, amd-gfx list, Monk Liu

On Mon, Sep 18, 2017 at 5:31 AM, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> On 18.09.2017 at 08:11, Monk Liu wrote:
>>
>> From: Horace Chen <horace.chen@amd.com>
>>
>> The kernel sets the PCI power state to UNKNOWN after the driver is
>> unloaded. Since SRIOV has a faked PCI config space, the UNKNOWN state
>> is kept forever.
>>
>> On driver reload, enabling MSI fails if the power state is UNKNOWN.
>>
>> Forcibly set it to D0 for SRIOV to fix this kernel flaw.
>>
>> Change-Id: I6a72d5fc9b653b21c3c98167515a511c5edeb91c
>> Signed-off-by: Horace Chen <horace.chen@amd.com>
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>
>
> Acked-by: Christian König <christian.koenig@amd.com>, but better wait for
> Alex to have a look as well on this before pushing it.
>

Seems reasonable to me barring Xiangliang's comment.
Acked-by: Alex Deucher <alexander.deucher@amd.com>

> Christian.
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 914c5bf..345406a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -229,7 +229,15 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>         adev->irq.msi_enabled = false;
>>         if (amdgpu_msi_ok(adev)) {
>> -               int ret = pci_enable_msi(adev->pdev);
>> +               int ret;
>> +               if (amdgpu_sriov_vf(adev) &&
>> +                   adev->pdev->current_state == PCI_UNKNOWN){
>> +                       /* If pci power state is unknown on the SRIOV platform,
>> +                        * it may be set in the remove device. We need to forcely
>> +                        * set it to D0 to enable the msi*/
>> +                       adev->pdev->current_state = PCI_D0;
>> +               }
>> +               ret = pci_enable_msi(adev->pdev);
>>                 if (!ret) {
>>                         adev->irq.msi_enabled = true;
>>                         dev_info(adev->dev, "amdgpu: using MSI.\n");
>
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV
       [not found]             ` <CADnq5_Nj5Kqp4CXtFLLz-cPynvchBV-RLFFpB6e5D-OCyPXQiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-09-19  1:52               ` Yu, Xiangliang
  0 siblings, 0 replies; 61+ messages in thread
From: Yu, Xiangliang @ 2017-09-19  1:52 UTC (permalink / raw)
  To: Alex Deucher, Koenig, Christian; +Cc: Chen, Horace, Liu, Monk, amd-gfx list

pci_enable_device will set the power state to D0. This patch just works around the issue; it does not address the root cause.
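
A toy model of the ordering argument, assuming the two claims made in this thread hold: enabling the device puts it into D0, and enabling MSI is refused while the state is UNKNOWN. All toy_ names are hypothetical and none of this is the PCI core code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_pci_power { TOY_PCI_D0, TOY_PCI_UNKNOWN };

/* Toy stand-in for the two fields of a PCI device that matter here. */
struct toy_pci_dev {
	enum toy_pci_power current_state;
	bool msi_enabled;
};

/* Models the claim above: enabling the device leaves it in D0. */
static int toy_pci_enable_device(struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	dev->current_state = TOY_PCI_D0;
	return 0;
}

/* Models the failure from the commit message: no MSI unless in D0. */
static int toy_pci_enable_msi(struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	if (dev->current_state != TOY_PCI_D0)
		return -EINVAL;
	dev->msi_enabled = true;
	return 0;
}

/* Load path; enable_first selects whether the device is enabled
 * before interrupt setup is attempted. */
static int toy_driver_load(struct toy_pci_dev *dev, bool enable_first)
{
	if (enable_first) {
		int r = toy_pci_enable_device(dev);

		if (r)
			return r;
	}
	return toy_pci_enable_msi(dev);
}
```

In this model a reload that goes through device enabling first never sees the UNKNOWN state when it reaches MSI setup, which is why forcing current_state to D0 by hand looks like a workaround rather than a root-cause fix.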


-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Alex Deucher
Sent: Tuesday, September 19, 2017 5:07 AM
To: Koenig, Christian <Christian.Koenig@amd.com>
Cc: Chen, Horace <Horace.Chen@amd.com>; amd-gfx list <amd-gfx@lists.freedesktop.org>; Liu, Monk <Monk.Liu@amd.com>
Subject: Re: [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV

On Mon, Sep 18, 2017 at 5:31 AM, Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
> On 18.09.2017 at 08:11, Monk Liu wrote:
>>
>> From: Horace Chen <horace.chen@amd.com>
>>
>> The kernel sets the PCI power state to UNKNOWN after the driver is
>> unloaded. Since SRIOV has a faked PCI config space, the UNKNOWN state
>> is kept forever.
>>
>> On driver reload, enabling MSI fails if the power state is UNKNOWN.
>>
>> Forcibly set it to D0 for SRIOV to fix this kernel flaw.
>>
>> Change-Id: I6a72d5fc9b653b21c3c98167515a511c5edeb91c
>> Signed-off-by: Horace Chen <horace.chen@amd.com>
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>
>
> Acked-by: Christian König <christian.koenig@amd.com>, but better wait 
> for Alex to have a look as well on this before pushing it.
>

Seems reasonable to me barring Xiangliang's comment.
Acked-by: Alex Deucher <alexander.deucher@amd.com>

> Christian.
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 914c5bf..345406a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -229,7 +229,15 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>         adev->irq.msi_enabled = false;
>>         if (amdgpu_msi_ok(adev)) {
>> -               int ret = pci_enable_msi(adev->pdev);
>> +               int ret;
>> +               if (amdgpu_sriov_vf(adev) &&
>> +                   adev->pdev->current_state == PCI_UNKNOWN){
>> +                       /* If pci power state is unknown on the SRIOV platform,
>> +                        * it may be set in the remove device. We need to forcely
>> +                        * set it to D0 to enable the msi*/
>> +                       adev->pdev->current_state = PCI_D0;
>> +               }
>> +               ret = pci_enable_msi(adev->pdev);
>>                 if (!ret) {
>>                         adev->irq.msi_enabled = true;
>>                         dev_info(adev->dev, "amdgpu: using MSI.\n");
>
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                 ` <4de1beaf-95c0-ba6e-da79-1070074f82e8-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-19  4:04                   ` Liu, Monk
       [not found]                     ` <BLUPR12MB0449D86C880B4B15A4FD916884600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Liu, Monk @ 2017-09-19  4:04 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Yeah, vcn_v1_0 and uvd_v7_0

Thanks 

-----Original Message-----
From: Koenig, Christian 
Sent: September 18, 2017 19:39
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:

> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
> {
>         amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 0));
>         amdgpu_ring_write(ring, 1);
> }

That should probably be fixed as well.

Regards,
Christian.

On 18.09.2017 at 13:03, Liu, Monk wrote:
> Only vega10 has this register
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: September 18, 2017 17:20
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger 
> hdp invalidate
>
> On 18.09.2017 at 08:11, Monk Liu wrote:
>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Please scan the code once more; we have most likely used mmHDP_DEBUG0 for this in even more places.
>
> Christian.
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>    drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>    2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index f201510..44960b3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>    static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>    {
>>    	gfx_v9_0_write_data_to_reg(ring, 0, true,
>> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>>    }
>>    
>>    static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index fd7c72a..d5f3848 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>    {
>>    	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>    			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>>    	amdgpu_ring_write(ring, 1);
>>    }
>>    
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
       [not found]                 ` <fade2e70-6594-9a6e-9d5a-d488d360363e-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-19  4:14                   ` Liu, Monk
       [not found]                     ` <BLUPR12MB04498EEB2BF374C72EF7CF5384600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Liu, Monk @ 2017-09-19  4:14 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chen, Horace

Christian, 

> That sounds at least a bit better. But my question is why doesn't this work like it does on Tonga, i.e. correctly clean things up?
Tonga also suffers from this issue; it is just that we fixed it in the branch for the CSP customer, and the staging code is usually behind our private branch ...

> Yeah, but keeping the GART mapping alive is complete nonsense. When the driver unloads, all memory should be returned to the OS.

> So we either keep a GART mapping to pages which are about to be reused and overwritten, or we leak memory on driver shutdown.

> Neither option sounds very good,

Unbinding the GART mapping makes the CPC hang if it runs MQD commands, and the CPC must run MQD commands because RLCV always
requires the CPC to do so when RLCV handles SAVE_VF commands.

Do you have a way to break this cycle?


BR Monk


-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com] 
Sent: September 18, 2017 19:54
To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Chen, Horace <Horace.Chen@amd.com>
Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug

On 18.09.2017 at 12:12, Liu, Monk wrote:
> Christian,
>
> Let's discuss this patch and the one that follows, which skips the KIQ MQD free to avoid the SAVE_FAIL issue.
>
>
> As for the patch that skips the KIQ MQD deallocation, I think I will drop it and use a new approach:
> We allocate the KIQ MQD in the VRAM domain, and this BO can be safely freed after the driver is unloaded, because after unload no one will *usually* change the data in this BO.
> (An exception would be, e.g., some root app mapping visible VRAM and altering the value in it.)

That sounds at least a bit better. But my question is why doesn't this work like it does on Tonga, i.e. correctly clean things up?

>
> For this patch, "skip unbinding the GART mapping to keep the KIQ MQD always valid":
> The hypervisor side always has a couple of hw components working, and they rely on the GMC being kept alive, so this is very different from BARE-METAL. That is to say, this is the only way we can do it.

Yeah, but keeping the GART mapping alive is complete nonsense. When the driver unloads, all memory should be returned to the OS.

So we either keep a GART mapping to pages which are about to be reused and overwritten, or we leak memory on driver shutdown.

Neither option sounds very good,
Christian.

>
> Besides, we'll have more patches in the future for L1 secure mode, which
> forbids the VF from accessing GMC registers, so under L1 secure mode the
> driver will always skip GMC programming under SRIOV in both init and fini,
> but that will come later.
>
> BR Monk
>
>
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: September 18, 2017 17:28
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Chen, Horace <Horace.Chen@amd.com>
> Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
>
> Am 18.09.2017 um 08:11 schrieb Monk Liu:
>> [SWDEV-126631] - fix hypervisor save_vf failure that occurred after the
>> driver was removed:
>> 1. Because the KIQ and KCQ were not unmapped, save_vf will fail if the driver freed the MQD of KIQ and KCQ.
>> 2. KIQ can't be unmapped since RLCV always needs it, so the bo_free on KIQ should be skipped.
>> 3. KCQ can be unmapped, and should be unmapped during hw_fini.
>> 4. RLCV still needs to access other MC addresses from some hw even after the driver is unloaded,
>>      so we should not unbind GART for VF.
>>
>> Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
>> Signed-off-by: Horace Chen <horace.chen@amd.com>
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> I absolutely can't judge if this is correct or not, but keeping the GART and KIQ alive after the driver is unloaded sounds really fishy to me.
>
> Isn't there any other clean way of handling this?
>
> Christian.
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++++++++++++++++++++++++++++++-
>>    3 files changed, 66 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> index f437008..2fee071 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>> @@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>>     */
>>    void amdgpu_gart_fini(struct amdgpu_device *adev)
>>    {
>> -	if (adev->gart.ready) {
>> +	/* gart is still used by other hw under SRIOV, don't unbind it */
>> +	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
>>    		/* unbind pages */
>>    		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
>>    	}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index 4f6c68f..bf6656f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
>>    				      &ring->mqd_ptr);
>>    	}
>>    
>> +	/* don't deallocate KIQ mqd because the bo is still used by RLCV even
>> +	the guest VM is shutdown */
>> +	if (amdgpu_sriov_vf(adev))
>> +		return;
>> +
>>    	ring = &adev->gfx.kiq.ring;
>>    	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
>>    	amdgpu_bo_free_kernel(&ring->mqd_obj,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 44960b3..a577bbc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
>>    	return r;
>>    }
>>    
>> +static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring, struct amdgpu_ring *ring)
>> +{
>> +	struct amdgpu_device *adev = kiq_ring->adev;
>> +	uint32_t scratch, tmp = 0;
>> +	int r, i;
>> +
>> +	r = amdgpu_gfx_scratch_get(adev, &scratch);
>> +	if (r) {
>> +		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
>> +		return r;
>> +	}
>> +	WREG32(scratch, 0xCAFEDEAD);
>> +
>> +	r = amdgpu_ring_alloc(kiq_ring, 10);
>> +	if (r) {
>> +		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
>> +		amdgpu_gfx_scratch_free(adev, scratch);
>> +		return r;
>> +	}
>> +
>> +	/* unmap queues */
>> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
>> +	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
>> +						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
>> +						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
>> +						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
>> +						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
>> +	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
>> +	amdgpu_ring_write(kiq_ring, 0);
>> +	amdgpu_ring_write(kiq_ring, 0);
>> +	amdgpu_ring_write(kiq_ring, 0);
>> +	/* write to scratch for completion */
>> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
>> +	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
>> +	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
>> +	amdgpu_ring_commit(kiq_ring);
>> +
>> +	for (i = 0; i < adev->usec_timeout; i++) {
>> +		tmp = RREG32(scratch);
>> +		if (tmp == 0xDEADBEEF)
>> +			break;
>> +		DRM_UDELAY(1);
>> +	}
>> +	if (i >= adev->usec_timeout) {
>> +		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
>> +		r = -EINVAL;
>> +	}
>> +	amdgpu_gfx_scratch_free(adev, scratch);
>> +	return r;
>> +}
>> +
>> +
>>    static int gfx_v9_0_hw_fini(void *handle)
>>    {
>>    	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> +	int i, r;
>>    
>>    	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>>    	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>>    	if (amdgpu_sriov_vf(adev)) {
>> -		pr_debug("For SRIOV client, shouldn't do anything.\n");
>> +		/* disable KCQ to avoid CPC touch memory not valid anymore */
>> +		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>> +			r = gfx_v9_0_kcq_disable(&adev->gfx.kiq.ring, &adev->gfx.compute_ring[i]);
>> +			if (r)
>> +				return r;
>> +		}
>>    		return 0;
>>    	}
>>    	gfx_v9_0_cp_enable(adev, false);
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
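
The completion check in gfx_v9_0_kcq_disable() quoted above follows a common pattern: write a known value to a scratch register, submit work that overwrites it on completion, then poll with a bounded timeout. Below is a minimal user-space sketch of just that pattern. It is not amdgpu code: struct fake_engine, fake_read_scratch() and wait_for_completion() are hypothetical names, and the "engine" simply completes after a fixed number of reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENTINEL_BEFORE 0xCAFEDEADu /* value written before submission */
#define SENTINEL_DONE   0xDEADBEEFu /* value the engine writes when done */

/* Hypothetical stand-in for a scratch register plus an engine that
 * finishes after a fixed number of reads (0 means it never finishes). */
struct fake_engine {
	uint32_t scratch;
	unsigned int polls_until_done;
};

/* Hypothetical register read: each read advances the fake engine by one step. */
static uint32_t fake_read_scratch(struct fake_engine *e)
{
	if (e->polls_until_done > 0 && --e->polls_until_done == 0)
		e->scratch = SENTINEL_DONE;
	return e->scratch;
}

/* Returns true if the completion value appeared within max_polls reads. */
static bool wait_for_completion(struct fake_engine *e, unsigned int max_polls)
{
	unsigned int i;

	assert(e != NULL);
	e->scratch = SENTINEL_BEFORE; /* arm the scratch value */
	/* a real driver would submit its ring packets at this point */
	for (i = 0; i < max_polls; i++) {
		if (fake_read_scratch(e) == SENTINEL_DONE)
			return true;
		/* a real driver would delay briefly between reads */
	}
	return false; /* timed out; the caller decides how to report it */
}
```

The bounded loop is the important part: when the sentinel never changes, the caller gets a failure result instead of hanging during unload.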


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                     ` <BLUPR12MB0449D86C880B4B15A4FD916884600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-09-19  4:25                       ` Zhou, David(ChunMing)
       [not found]                         ` <MWHPR1201MB020621C233AA2C12F6127C61B4600-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Zhou, David(ChunMing) @ 2017-09-19  4:25 UTC (permalink / raw)
  To: Liu, Monk, Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Please answer the question I raised in another thread; otherwise I will give a NAK on this!

Regards,
David Zhou

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Liu, Monk
Sent: Tuesday, September 19, 2017 12:04 PM
To: Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

Yeah, vcn_v1_0 and uvd_v7_0

Thanks 

-----Original Message-----
From: Koenig, Christian
Sent: September 18, 2017 19:39
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:

> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
> {
>         amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 0));
>         amdgpu_ring_write(ring, 1);
> }

That should probably be fixed as well.

Regards,
Christian.

Am 18.09.2017 um 13:03 schrieb Liu, Monk:
> Only vega10 has this register
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: September 18, 2017 17:20
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger 
> hdp invalidate
>
> Am 18.09.2017 um 08:11 schrieb Monk Liu:
>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Please scan the code once more, we most likely have used mmHDP_DEBUG0 for this at even more places.
>
> Christian.
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>    drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>    2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index f201510..44960b3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>    static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>    {
>>    	gfx_v9_0_write_data_to_reg(ring, 0, true,
>> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>>    }
>>    
>>    static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring, 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index fd7c72a..d5f3848 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>    {
>>    	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>    			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>>    	amdgpu_ring_write(ring, 1);
>>    }
>>    
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                         ` <MWHPR1201MB020621C233AA2C12F6127C61B4600-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-09-19  6:46                           ` Liu, Monk
       [not found]                             ` <BLUPR12MB0449F560B6A658DC4C120EC084600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Liu, Monk @ 2017-09-19  6:46 UTC (permalink / raw)
  To: Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

What question? Please reply here.

-----Original Message-----
From: Zhou, David(ChunMing) 
Sent: September 19, 2017 12:25
To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

Please answer my question as I raised in another thread, otherwise I will give a NAK on this!

Regards,
David Zhou

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Liu, Monk
Sent: Tuesday, September 19, 2017 12:04 PM
To: Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

Yeah, vnc1_0 and uvd_v7_0

Thanks 

-----Original Message-----
From: Koenig, Christian
Sent: September 18, 2017 19:39
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:

> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring
> *ring) {
>         amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0, 
> mmHDP_DEBUG0), 0));
>         amdgpu_ring_write(ring, 1);
> }

That should probably be fixed as well.

Regards,
Christian.

Am 18.09.2017 um 13:03 schrieb Liu, Monk:
> Only vega10 has this register
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: September 18, 2017 17:20
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger 
> hdp invalidate
>
> Am 18.09.2017 um 08:11 schrieb Monk Liu:
>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Please scan the code once more, we most likely have used mmHDP_DEBUG0 for this at even more places.
>
> Christian.
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>    drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>    2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index f201510..44960b3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>    static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>    {
>>    	gfx_v9_0_write_data_to_reg(ring, 0, true,
>> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>>    }
>>    
>>    static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring, 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index fd7c72a..d5f3848 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>    {
>>    	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>    			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>>    	amdgpu_ring_write(ring, 1);
>>    }
>>    
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                             ` <BLUPR12MB0449F560B6A658DC4C120EC084600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-09-19  6:50                               ` zhoucm1
       [not found]                                 ` <baa9518f-d2b1-cfb8-8f98-c3557e3ef8fe-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: zhoucm1 @ 2017-09-19  6:50 UTC (permalink / raw)
  To: Liu, Monk, Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


The change seems more proper, but where did you find
mmHDP_READ_CACHE_INVALIDATE? Could you double-check whether the Windows driver
has changed to use this?
I'm confused, since the mmHDP_DEBUG0 implementation comes from Windows as
well.
I can't even find mmHDP_READ_CACHE_INVALIDATE in the register spec.

Regards,
David Zhou
On 2017-09-19 14:46, Liu, Monk wrote:
> What question ? please reply here
>
> -----Original Message-----
> From: Zhou, David(ChunMing)
> Sent: September 19, 2017 12:25
> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
>
> Please answer my question as I raised in another thread, otherwise I will give a NAK on this!
>
> Regards,
> David Zhou
>
> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Liu, Monk
> Sent: Tuesday, September 19, 2017 12:04 PM
> To: Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
>
> Yeah, vnc1_0 and uvd_v7_0
>
> Thanks
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: September 18, 2017 19:39
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
>
> Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:
>
>> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring
>> *ring) {
>>          amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0,
>> mmHDP_DEBUG0), 0));
>>          amdgpu_ring_write(ring, 1);
>> }
> That should probably be fixed as well.
>
> Regards,
> Christian.
>
> Am 18.09.2017 um 13:03 schrieb Liu, Monk:
>> Only vega10 has this register
>>
>> -----Original Message-----
>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>> Sent: September 18, 2017 17:20
>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>> hdp invalidate
>>
>> Am 18.09.2017 um 08:11 schrieb Monk Liu:
>>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Please scan the code once more, we most likely have used mmHDP_DEBUG0 for this at even more places.
>>
>> Christian.
>>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>>     drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>>     2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> index f201510..44960b3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>>     static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>     {
>>>     	gfx_v9_0_write_data_to_reg(ring, 0, true,
>>> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>>> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>>>     }
>>>     
>>>     static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index fd7c72a..d5f3848 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>     {
>>>     	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>>     			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>>> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>>> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>>>     	amdgpu_ring_write(ring, 1);
>>>     }
>>>     
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                                 ` <baa9518f-d2b1-cfb8-8f98-c3557e3ef8fe-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-19  7:00                                   ` Liu, Monk
       [not found]                                     ` <BLUPR12MB0449775C4245A708B15E9D0B84600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Liu, Monk @ 2017-09-19  7:00 UTC (permalink / raw)
  To: Zhou, David(ChunMing),
	Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Deucher, Alexander

First, I didn't check whether Windows does it this way, because I'm not sure Windows always does the right thing. E.g. for GFX preemption I didn't copy the Windows scheme, and we found a couple of bugs in Windows that are not in Linux ...
So please don't assume we should copy from Windows unless its implementation is rock solid.

Second, this register originally comes from the definition file "hdp_4_0_offset.h"; it was not recently introduced by me or anyone else. Using this register to replace MM_HDP_DEBUG0 was suggested by an HDP HW engineer when
I was working on the PAL/Vulkan preemption hang issue in Orlando. Sorry, I don't remember that engineer's name ...

@Deucher, Alexander do you know who works on HDP HW? We can confirm with him.


If you're uncomfortable with this change, I can add an "if sriov" condition to all of it so that bare-metal stays unchanged. Is that okay?

BR Monk
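
The "if sriov" fallback offered above could look roughly like the following. This is only a sketch under stated assumptions: the register offsets are placeholders (the real values live in the HDP register headers and are not reproduced here), and struct fake_ring, fake_ring_write(), hdp_invalidate_reg() and emit_hdp_invalidate() are hypothetical helpers, not amdgpu functions. It mirrors the "write register offset, then write 1" shape of the SDMA hunk in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder offsets for illustration only, NOT the real register values. */
#define EXAMPLE_HDP_DEBUG0                0x1000u
#define EXAMPLE_HDP_READ_CACHE_INVALIDATE 0x2000u
#define FAKE_RING_SIZE                    8

/* Hypothetical ring buffer standing in for a command ring. */
struct fake_ring {
	uint32_t buf[FAKE_RING_SIZE];
	unsigned int wptr;
};

static void fake_ring_write(struct fake_ring *ring, uint32_t v)
{
	assert(ring->wptr < FAKE_RING_SIZE);
	ring->buf[ring->wptr++] = v;
}

/* Pick the register used to trigger the invalidate: the formal register
 * for an SR-IOV VF, the previous register on bare metal. */
static uint32_t hdp_invalidate_reg(bool is_sriov_vf)
{
	return is_sriov_vf ? EXAMPLE_HDP_READ_CACHE_INVALIDATE
			   : EXAMPLE_HDP_DEBUG0;
}

/* Emit "register offset, value 1" into the ring. */
static void emit_hdp_invalidate(struct fake_ring *ring, bool is_sriov_vf)
{
	fake_ring_write(ring, hdp_invalidate_reg(is_sriov_vf));
	fake_ring_write(ring, 1);
}
```

Keeping the choice in one helper would mean each ring type's emit function does not need its own SR-IOV check.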


-----Original Message-----
From: Zhou, David(ChunMing) 
Sent: September 19, 2017 14:51
To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate


Seems the change is more proper, but where do you find mmHDP_READ_CACHE_INVALIDATE? Could you double check if Windows driver has changed to use this?
I'm confusing it, since mmHDP_DEBUG0 implementation is from windows as well.
I even don't find mmHDP_READ_CACHE_INVALIDATE in register spec.

Regards,
David Zhou
On 2017-09-19 14:46, Liu, Monk wrote:
> What question ? please reply here
>
> -----Original Message-----
> From: Zhou, David(ChunMing)
> Sent: September 19, 2017 12:25
> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian 
> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger 
> hdp invalidate
>
> Please answer my question as I raised in another thread, otherwise I will give a NAK on this!
>
> Regards,
> David Zhou
>
> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf 
> Of Liu, Monk
> Sent: Tuesday, September 19, 2017 12:04 PM
> To: Koenig, Christian <Christian.Koenig@amd.com>; 
> amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger 
> hdp invalidate
>
> Yeah, vnc1_0 and uvd_v7_0
>
> Thanks
>
> -----Original Message-----
> From: Koenig, Christian
> Sent: September 18, 2017 19:39
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger 
> hdp invalidate
>
> Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:
>
>> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring
>> *ring) {
>>          amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0, 
>> mmHDP_DEBUG0), 0));
>>          amdgpu_ring_write(ring, 1);
>> }
> That should probably be fixed as well.
>
> Regards,
> Christian.
>
> Am 18.09.2017 um 13:03 schrieb Liu, Monk:
>> Only vega10 has this register
>>
>> -----Original Message-----
>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>> Sent: September 18, 2017 17:20
>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger 
>> hdp invalidate
>>
>> Am 18.09.2017 um 08:11 schrieb Monk Liu:
>>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Please scan the code once more, we most likely have used mmHDP_DEBUG0 for this at even more places.
>>
>> Christian.
>>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>>     drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>>     2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> index f201510..44960b3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>>     static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>     {
>>>     	gfx_v9_0_write_data_to_reg(ring, 0, true,
>>> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>>> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>>>     }
>>>     
>>>     static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring, 
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index fd7c72a..d5f3848 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>     {
>>>     	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>>     			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>>> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>>> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>>>     	amdgpu_ring_write(ring, 1);
>>>     }
>>>     
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                                     ` <BLUPR12MB0449775C4245A708B15E9D0B84600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-09-19  7:02                                       ` zhoucm1
       [not found]                                         ` <5367a2b2-3044-7388-08ff-6f0a620d5aa8-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: zhoucm1 @ 2017-09-19  7:02 UTC (permalink / raw)
  To: Liu, Monk, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Deucher, Alexander

 >using this register to replace MM_HDP_DEBUG0 is suggested from a HDP 
HW guys

I'm OK with this line.

Thanks for explaining.
David Zhou

On 2017-09-19 15:00, Liu, Monk wrote:
> First, I didn't check if windows did this way or not, because I don't sure if windows is always doing the right thing, e.g. for GFX preemption I didn't copy windows scheme and we found couple bugs in windows but not in linux ...
> So please don't assume we should copy from windows, unless it's solid like a dead bone
>
> Second, this register is originally comes from the definition file "hdp_4_0_offset.h", not recently introduced by me or someone else, and using this register to replace MM_HDP_DEBUG0 is suggested from a HDP HW guys when
> I was working on the PAL/VULKAN preemption hang issue in Orlando, sorry I missed that guy's name ...
>
> @Deucher, Alexander do you know who is on hdp hw ? we can confirm with him
>
>
> If you're feeling bad about this change, I can add "if sriov" condition to all of it, so bare-metal will keep still,  is that okay ?
>
> BR Monk
>
>
> -----Original Message-----
> From: Zhou, David(ChunMing)
> Sent: September 19, 2017 14:51
> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
>
>
> Seems the change is more proper, but where do you find mmHDP_READ_CACHE_INVALIDATE? Could you double check if Windows driver has changed to use this?
> I'm confusing it, since mmHDP_DEBUG0 implementation is from windows as well.
> I even don't find mmHDP_READ_CACHE_INVALIDATE in register spec.
>
> Regards,
> David Zhou
> On 2017-09-19 14:46, Liu, Monk wrote:
>> What question ? please reply here
>>
>> -----Original Message-----
>> From: Zhou, David(ChunMing)
>> Sent: September 19, 2017 12:25
>> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian
>> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>> hdp invalidate
>>
>> Please answer my question as I raised in another thread, otherwise I will give a NAK on this!
>>
>> Regards,
>> David Zhou
>>
>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
>> Of Liu, Monk
>> Sent: Tuesday, September 19, 2017 12:04 PM
>> To: Koenig, Christian <Christian.Koenig@amd.com>;
>> amd-gfx@lists.freedesktop.org
>> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>> hdp invalidate
>>
>> Yeah, vnc1_0 and uvd_v7_0
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Koenig, Christian
>> Sent: September 18, 2017 19:39
>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>> hdp invalidate
>>
>> Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:
>>
>>> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring
>>> *ring) {
>>>           amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0,
>>> mmHDP_DEBUG0), 0));
>>>           amdgpu_ring_write(ring, 1);
>>> }
>> That should probably be fixed as well.
>>
>> Regards,
>> Christian.
>>
>> Am 18.09.2017 um 13:03 schrieb Liu, Monk:
>>> Only vega10 has this register
>>>
>>> -----Original Message-----
>>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>>> Sent: September 18, 2017 17:20
>>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>> hdp invalidate
>>>
>>> Am 18.09.2017 um 08:11 schrieb Monk Liu:
>>>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>
>>> Please scan the code once more, we most likely have used mmHDP_DEBUG0 for this at even more places.
>>>
>>> Christian.
>>>
>>>> ---
>>>>      drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>>>      drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>>>      2 files changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>> index f201510..44960b3 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>>>      static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>>      {
>>>>      	gfx_v9_0_write_data_to_reg(ring, 0, true,
>>>> -				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>>>> +				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>>>>      }
>>>>      
>>>>      static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>> index fd7c72a..d5f3848 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>>      {
>>>>      	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>>>      			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>>>> -	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>>>> +	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>>>>      	amdgpu_ring_write(ring, 1);
>>>>      }
>>>>      
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
       [not found]                     ` <BLUPR12MB04498EEB2BF374C72EF7CF5384600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-09-19  8:26                       ` Christian König
       [not found]                         ` <69a1e774-6a9e-31c6-8b30-dfbd430062c8-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-19  8:26 UTC (permalink / raw)
  To: Liu, Monk, Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Chen, Horace

On 19.09.2017 at 06:14, Liu, Monk wrote:
> Christian,
>
>> That sounds at least a bit better. But my question is why doesn't this work like it does on Tonga, e.g. correctly clean things up?
> Tonga also suffers from this issue; we just fixed it in the branch for the CSP customer, and the staging code usually lags behind our private branch ...
>
>> Yeah, but keeping the GART mapping alive is complete nonsense. When the driver unloads all memory should be returned to the OS.
>> So we either keep a GART mapping to pages which are about to be reused and overwritten, or we leak memory on driver shutdown.
>> Neither option sounds very good,
> Unbinding the GART mapping makes the CPC hang if it runs MQD commands, and the CPC must run MQD commands because RLCV always
> requires the CPC to do that when RLCV is executing SAVE_VF commands.
>
> Do you have a way to break the above cycle?

Well, the question is why the CPC still needs the MQD commands and
how we can prevent that.

The point is that if we need to keep the GART alive to avoid a crash after
driver unload, we also need to keep alive the pages the GART points to.

This means that either the pages get overwritten or we get massive
complaints from the MM that we are leaking pages here.

If it's not possible to turn off the CPC on driver unload, the only
alternative I can see is to reprogram it so that the MQD commands come
from VRAM instead of GART.
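
To make the trade-off above concrete, here is a minimal toy model in plain C. It is only a sketch, not driver code: struct toy_dev, toy_dev_unload() and toy_mqd_fetch_ok() are hypothetical names, and the model assumes that an MQD placed in VRAM stays reachable once the GART is unbound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_GART_ENTRIES 4

enum toy_mqd_domain { TOY_MQD_IN_GART, TOY_MQD_IN_VRAM };

/* Hypothetical stand-in for the device state; not the amdgpu structures. */
struct toy_dev {
	bool gart_bound[TOY_GART_ENTRIES]; /* GART entry still maps its page */
	bool page_held[TOY_GART_ENTRIES];  /* page not yet returned to the OS */
	enum toy_mqd_domain mqd_domain;    /* where the MQD lives */
	size_t mqd_entry;                  /* GART entry backing the MQD, if in GART */
};

static void toy_dev_init(struct toy_dev *d, enum toy_mqd_domain domain)
{
	assert(d != NULL);
	for (size_t i = 0; i < TOY_GART_ENTRIES; i++) {
		d->gart_bound[i] = true;
		d->page_held[i] = true;
	}
	d->mqd_domain = domain;
	d->mqd_entry = 0;
}

/* Tear the device down; returns the number of system pages that could not
 * be given back because a GART entry still points at them. */
static size_t toy_dev_unload(struct toy_dev *d, bool keep_gart)
{
	size_t leaked = 0;

	assert(d != NULL);
	for (size_t i = 0; i < TOY_GART_ENTRIES; i++) {
		if (keep_gart) {
			leaked++; /* mapping stays, so the page must stay too */
			continue;
		}
		d->gart_bound[i] = false; /* unbind first ... */
		d->page_held[i] = false;  /* ... then return the page */
	}
	return leaked;
}

/* Would an MQD fetch after unload still hit valid memory? */
static bool toy_mqd_fetch_ok(const struct toy_dev *d)
{
	assert(d != NULL);
	if (d->mqd_domain == TOY_MQD_IN_VRAM)
		return true; /* assumption: VRAM stays reachable without the GART */
	return d->gart_bound[d->mqd_entry] && d->page_held[d->mqd_entry];
}
```

In this model, keeping the GART leaks every page, unbinding it with the MQD in GART breaks the fetch, and only the VRAM placement gives both a clean unload and a valid fetch.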

Regards,
Christian.

>
>
> BR Monk
>
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: 2017-09-18 19:54
> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Chen, Horace <Horace.Chen@amd.com>
> Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
>
> On 18.09.2017 at 12:12, Liu, Monk wrote:
>> Christian,
>>
>> Let's discuss this patch and the one that follows, which skips the KIQ MQD free to avoid the SAVE_FAIL issue.
>>
>>
>> For the patch that skips the KIQ MQD deallocation, I think I will drop it and use a new approach:
>> We allocate the KIQ MQD in the VRAM domain, and this BO can be safely freed after the driver is unloaded, because after unload no one will *usually* change the data in this BO.
>> (The exception: e.g. some root app can map visible VRAM and alter the values in it.)
> That sounds at least a bit better. But my question is why doesn't this work like it does on Tonga, e.g. correctly clean things up?
>
>> For this patch, "skip unbinding the GART mapping to keep the KIQ MQD always valid":
>> The hypervisor side always has a couple of hw components working, and they rely on the GMC being kept alive, so this is very different from bare metal. That is to say, we can only do it this way.
> Yeah, but keeping the GART mapping alive is complete nonsense. When the driver unloads all memory should be returned to the OS.
>
> So we either keep a GART mapping to pages which are about to be reused and overwritten, or we leak memory on driver shutdown.
>
> Neither option sounds very good,
> Christian.
>
>> Besides, we'll have more patches in the future for L1 secure mode, which
>> forbids VF access to GMC registers, so under L1 secure mode the driver will
>> always skip GMC programming under SRIOV in both init and fini, but that
>> will come later.
>>
>> BR Monk
>>
>>
>>
>> -----Original Message-----
>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>> Sent: 2017-09-18 17:28
>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Chen, Horace <Horace.Chen@amd.com>
>> Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
>>
>> On 18.09.2017 at 08:11, Monk Liu wrote:
>>> [SWDEV-126631] - fix hypervisor save_vf failure that occurred after the
>>> driver was removed:
>>> 1. Because the KIQ and KCQ were not unmapped, save_vf will fail if the driver freed the MQD of the KIQ and KCQ.
>>> 2. The KIQ can't be unmapped since RLCV always needs it, so the bo_free on the KIQ should be skipped.
>>> 3. The KCQ can be unmapped, and should be unmapped during hw_fini.
>>> 4. RLCV still needs to access other MC addresses from some hw even after the driver is unloaded,
>>>       so we should not unbind the GART for a VF.
>>>
>>> Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
>>> Signed-off-by: Horace Chen <horace.chen@amd.com>
>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> I absolutely can't judge if this is correct or not, but keeping the GART and KIQ alive after the driver is unloaded sounds really fishy to me.
>>
>> Isn't there any other clean way of handling this?
>>
>> Christian.
>>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
>>>     drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++++++++++++++++++++++++++++++-
>>>     3 files changed, 66 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> index f437008..2fee071 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> @@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>>>      */
>>>     void amdgpu_gart_fini(struct amdgpu_device *adev)
>>>     {
>>> -	if (adev->gart.ready) {
>>> +	/* gart is still used by other hw under SRIOV, don't unbind it */
>>> +	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
>>>     		/* unbind pages */
>>>     		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
>>>     	}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> index 4f6c68f..bf6656f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> @@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
>>>     				      &ring->mqd_ptr);
>>>     	}
>>>     
>>> +	/* don't deallocate KIQ mqd because the bo is still used by RLCV even
>>> +	the guest VM is shutdown */
>>> +	if (amdgpu_sriov_vf(adev))
>>> +		return;
>>> +
>>>     	ring = &adev->gfx.kiq.ring;
>>>     	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
>>>     	amdgpu_bo_free_kernel(&ring->mqd_obj,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> index 44960b3..a577bbc 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> @@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
>>>     	return r;
>>>     }
>>>     
>>> +static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring, struct amdgpu_ring *ring)
>>> +{
>>> +	struct amdgpu_device *adev = kiq_ring->adev;
>>> +	uint32_t scratch, tmp = 0;
>>> +	int r, i;
>>> +
>>> +	r = amdgpu_gfx_scratch_get(adev, &scratch);
>>> +	if (r) {
>>> +		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
>>> +		return r;
>>> +	}
>>> +	WREG32(scratch, 0xCAFEDEAD);
>>> +
>>> +	r = amdgpu_ring_alloc(kiq_ring, 10);
>>> +	if (r) {
>>> +		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
>>> +		amdgpu_gfx_scratch_free(adev, scratch);
>>> +		return r;
>>> +	}
>>> +
>>> +	/* unmap queues */
>>> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
>>> +	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
>>> +						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
>>> +						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
>>> +						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
>>> +						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
>>> +	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
>>> +	amdgpu_ring_write(kiq_ring, 0);
>>> +	amdgpu_ring_write(kiq_ring, 0);
>>> +	amdgpu_ring_write(kiq_ring, 0);
>>> +	/* write to scratch for completion */
>>> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
>>> +	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
>>> +	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
>>> +	amdgpu_ring_commit(kiq_ring);
>>> +
>>> +	for (i = 0; i < adev->usec_timeout; i++) {
>>> +		tmp = RREG32(scratch);
>>> +		if (tmp == 0xDEADBEEF)
>>> +			break;
>>> +		DRM_UDELAY(1);
>>> +	}
>>> +	if (i >= adev->usec_timeout) {
>>> +		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
>>> +		r = -EINVAL;
>>> +	}
>>> +	amdgpu_gfx_scratch_free(adev, scratch);
>>> +	return r;
>>> +}
>>> +
>>> +
>>>     static int gfx_v9_0_hw_fini(void *handle)
>>>     {
>>>     	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>> +	int i, r;
>>>     
>>>     	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>>>     	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>>>     	if (amdgpu_sriov_vf(adev)) {
>>> -		pr_debug("For SRIOV client, shouldn't do anything.\n");
>>> +		/* disable KCQ to avoid CPC touch memory not valid anymore */
>>> +		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>>> +			r = gfx_v9_0_kcq_disable(&adev->gfx.kiq.ring, &adev->gfx.compute_ring[i]);
>>> +			if (r)
>>> +				return r;
>>> +		}
>>>     		return 0;
>>>     	}
>>>     	gfx_v9_0_cp_enable(adev, false);

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                                         ` <5367a2b2-3044-7388-08ff-6f0a620d5aa8-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-19  8:30                                           ` Christian König
       [not found]                                             ` <28fa17b6-ebb0-99c7-042a-19289d858f64-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Christian König @ 2017-09-19  8:30 UTC (permalink / raw)
  To: zhoucm1, Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Deucher, Alexander

I don't know why, but the HDP is generally not part of the register spec.

So as far as I know, you can find neither HDP_DEBUG0 nor
HDP_READ_CACHE_INVALIDATE in it.

The point is that the HDP invalidates its read cache on any register write
(the register itself doesn't matter). So far we have used the HDP_DEBUG0
register because it is otherwise unused, but having a dedicated register
just for this job is clearly a good idea.
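
As an illustration of the behavior described above, here is a toy model in plain C. It is a sketch only: struct toy_hdp, toy_hdp_host_read() and toy_hdp_write() are hypothetical names, the register indices are made up (the real offsets come from hdp_4_0_offset.h via SOC15_REG_OFFSET()), and the "any write invalidates" rule is taken from the description above rather than from a register spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices for this toy model only. */
enum toy_hdp_reg {
	TOY_HDP_DEBUG0 = 0,
	TOY_HDP_READ_CACHE_INVALIDATE = 1,
	TOY_HDP_NUM_REGS
};

struct toy_hdp {
	uint32_t regs[TOY_HDP_NUM_REGS];
	bool read_cache_valid;      /* true while the read cache holds data */
	unsigned int invalidations; /* number of invalidates seen so far */
};

/* Model a host read that populates the HDP read cache. */
static void toy_hdp_host_read(struct toy_hdp *hdp)
{
	assert(hdp != NULL);
	hdp->read_cache_valid = true;
}

/* Model a register write: any write to the block drops the read cache,
 * whichever register is addressed. */
static void toy_hdp_write(struct toy_hdp *hdp, enum toy_hdp_reg reg, uint32_t val)
{
	assert(hdp != NULL);
	assert(reg < TOY_HDP_NUM_REGS);
	hdp->regs[reg] = val;
	hdp->read_cache_valid = false;
	hdp->invalidations++;
}
```

In this model a write to either register has the same effect on the cache, so switching from the debug register to the dedicated one changes which register gets touched, not what happens to the cache.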

Regards,
Christian.

On 19.09.2017 at 09:02, zhoucm1 wrote:
> >using this register to replace MM_HDP_DEBUG0 is suggested from a HDP 
> HW guys
>
> I'm OK with this line.
>
> Thanks for the explanation.
> David Zhou
>
> On 2017-09-19 15:00, Liu, Monk wrote:
>> First, I didn't check whether Windows does it this way or not, because I'm not
>> sure Windows is always doing the right thing, e.g. for GFX
>> preemption I didn't copy the Windows scheme and we found a couple of bugs in
>> Windows but not in Linux ...
>> So please don't assume we should copy from Windows unless it's rock
>> solid.
>>
>> Second, this register originally comes from the definition file
>> "hdp_4_0_offset.h"; it was not recently introduced by me or someone else,
>> and using this register to replace MM_HDP_DEBUG0 was suggested by one of the
>> HDP HW guys when
>> I was working on the PAL/VULKAN preemption hang issue in Orlando,
>> sorry, I forgot that guy's name ...
>>
>> @Deucher, Alexander do you know who is on HDP HW? We can confirm
>> with him.
>>
>>
>> If you're uneasy about this change, I can add an "if sriov"
>> condition to all of it, so bare metal stays unchanged. Is that okay?
>>
>> BR Monk
>>
>>
>> -----Original Message-----
>> From: Zhou, David(ChunMing)
>> Sent: 2017-09-19 14:51
>> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian 
>> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger 
>> hdp invalidate
>>
>>
>> The change seems more proper, but where did you find
>> mmHDP_READ_CACHE_INVALIDATE? Could you double check whether the Windows driver
>> has changed to use this?
>> I'm confused by it, since the mmHDP_DEBUG0 implementation comes from Windows
>> as well.
>> I can't even find mmHDP_READ_CACHE_INVALIDATE in the register spec.
>>
>> Regards,
>> David Zhou
>> On 2017-09-19 14:46, Liu, Monk wrote:
>>> What question? Please reply here.
>>>
>>> -----Original Message-----
>>> From: Zhou, David(ChunMing)
>>> Sent: 2017-09-19 12:25
>>> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian
>>> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>> hdp invalidate
>>>
>>> Please answer the question I raised in the other thread, otherwise I
>>> will NAK this!
>>>
>>> Regards,
>>> David Zhou
>>>
>>> -----Original Message-----
>>> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
>>> Of Liu, Monk
>>> Sent: Tuesday, September 19, 2017 12:04 PM
>>> To: Koenig, Christian <Christian.Koenig@amd.com>;
>>> amd-gfx@lists.freedesktop.org
>>> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>> hdp invalidate
>>>
>>> Yeah, vcn_v1_0 and uvd_v7_0
>>>
>>> Thanks
>>>
>>> -----Original Message-----
>>> From: Koenig, Christian
>>> Sent: 2017-09-18 19:39
>>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>> hdp invalidate
>>>
>>> Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:
>>>
>>>> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring
>>>> *ring) {
>>>>           amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0,
>>>> mmHDP_DEBUG0), 0));
>>>>           amdgpu_ring_write(ring, 1);
>>>> }
>>> That should probably be fixed as well.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 18.09.2017 at 13:03, Liu, Monk wrote:
>>>> Only vega10 has this register
>>>>
>>>> -----Original Message-----
>>>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>>>> Sent: 2017-09-18 17:20
>>>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>> hdp invalidate
>>>>
>>>> On 18.09.2017 at 08:11, Monk Liu wrote:
>>>>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>>>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>
>>>> Please scan the code once more, we most likely have used 
>>>> mmHDP_DEBUG0 for this at even more places.
>>>>
>>>> Christian.
>>>>
>>>>> ---
>>>>>      drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>>>>      drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>>>>      2 files changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>> index f201510..44960b3 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>> @@ -3549,7 +3549,7 @@ static void 
>>>>> gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>>>>      static void gfx_v9_0_ring_emit_hdp_invalidate(struct 
>>>>> amdgpu_ring *ring)
>>>>>      {
>>>>>          gfx_v9_0_write_data_to_reg(ring, 0, true,
>>>>> -                   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>>>>> +                   SOC15_REG_OFFSET(HDP, 0, 
>>>>> mmHDP_READ_CACHE_INVALIDATE), 1);
>>>>>      }
>>>>>           static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring 
>>>>> *ring,
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>> index fd7c72a..d5f3848 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>> @@ -398,7 +398,7 @@ static void 
>>>>> sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>>>      {
>>>>>          amdgpu_ring_write(ring, 
>>>>> SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>>>> SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>>>>> -    amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>>>>> +    amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0,
>>>>> +mmHDP_READ_CACHE_INVALIDATE));
>>>>>          amdgpu_ring_write(ring, 1);
>>>>>      }

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                                             ` <28fa17b6-ebb0-99c7-042a-19289d858f64-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-19  9:34                                               ` Zhang, Jerry (Junwei)
  2017-09-19 13:42                                               ` Alex Deucher
  1 sibling, 0 replies; 61+ messages in thread
From: Zhang, Jerry (Junwei) @ 2017-09-19  9:34 UTC (permalink / raw)
  To: Christian König, zhoucm1, Liu, Monk,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Deucher, Alexander

On 09/19/2017 04:30 PM, Christian König wrote:
> I don't know why, but the HDP is generally not part of the register spec.

AFAIK, these registers may be used by the HW guys to debug some special cases.
Usually it is not recommended to touch them in regular code.
(E.g. GFX cannot access a PRT unmapped range, but with the debug bit set it can;
however, that's not the normal way.)

Jerry


>
> So as far as I know, you can find neither HDP_DEBUG0 nor
> HDP_READ_CACHE_INVALIDATE in it.
>
> The point is that the HDP invalidates its read cache on any register write (the
> register itself doesn't matter). So far we have used the HDP_DEBUG0 register because
> it is otherwise unused, but having a dedicated register just for this job is
> clearly a good idea.
>
> Regards,
> Christian.
>
> On 19.09.2017 at 09:02, zhoucm1 wrote:
>> >using this register to replace MM_HDP_DEBUG0 is suggested from a HDP HW guys
>>
>> I'm OK with this line.
>>
>> Thanks for the explanation.
>> David Zhou
>>
>> On 2017-09-19 15:00, Liu, Monk wrote:
>>> First, I didn't check whether Windows does it this way or not, because I'm not sure
>>> Windows is always doing the right thing, e.g. for GFX preemption I didn't
>>> copy the Windows scheme and we found a couple of bugs in Windows but not in Linux ...
>>> So please don't assume we should copy from Windows unless it's rock
>>> solid.
>>>
>>> Second, this register originally comes from the definition file
>>> "hdp_4_0_offset.h"; it was not recently introduced by me or someone else, and using
>>> this register to replace MM_HDP_DEBUG0 was suggested by one of the HDP HW guys when
>>> I was working on the PAL/VULKAN preemption hang issue in Orlando, sorry, I
>>> forgot that guy's name ...
>>>
>>> @Deucher, Alexander do you know who is on HDP HW? We can confirm with him.
>>>
>>>
>>> If you're uneasy about this change, I can add an "if sriov" condition to
>>> all of it, so bare metal stays unchanged. Is that okay?
>>>
>>> BR Monk
>>>
>>>
>>> -----Original Message-----
>>> From: Zhou, David(ChunMing)
>>> Sent: 2017-09-19 14:51
>>> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian
>>> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp
>>> invalidate
>>>
>>>
>>> The change seems more proper, but where did you find
>>> mmHDP_READ_CACHE_INVALIDATE? Could you double check whether the Windows driver has
>>> changed to use this?
>>> I'm confused by it, since the mmHDP_DEBUG0 implementation comes from Windows as well.
>>> I can't even find mmHDP_READ_CACHE_INVALIDATE in the register spec.
>>>
>>> Regards,
>>> David Zhou
>>> On 2017-09-19 14:46, Liu, Monk wrote:
>>>> What question? Please reply here.
>>>>
>>>> -----Original Message-----
>>>> From: Zhou, David(ChunMing)
>>>> Sent: 2017-09-19 12:25
>>>> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian
>>>> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>> hdp invalidate
>>>>
>>>> Please answer my question as I raised in another thread, otherwise I will
>>>> give a NAK on this!
>>>>
>>>> Regards,
>>>> David Zhou
>>>>
>>>> -----Original Message-----
>>>> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
>>>> Of Liu, Monk
>>>> Sent: Tuesday, September 19, 2017 12:04 PM
>>>> To: Koenig, Christian <Christian.Koenig@amd.com>;
>>>> amd-gfx@lists.freedesktop.org
>>>> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>> hdp invalidate
>>>>
>>>> Yeah, vnc1_0 and uvd_v7_0
>>>>
>>>> Thanks
>>>>
>>>> -----Original Message-----
>>>> From: Koenig, Christian
>>>> Sent: 2017-09-18 19:39
>>>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>> hdp invalidate
>>>>
>>>> Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:
>>>>
>>>>> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring
>>>>> *ring) {
>>>>>           amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0,
>>>>> mmHDP_DEBUG0), 0));
>>>>>           amdgpu_ring_write(ring, 1);
>>>>> }
>>>> That should probably be fixed as well.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 18.09.2017 at 13:03, Liu, Monk wrote:
>>>>> Only vega10 has this register
>>>>>
>>>>> -----Original Message-----
>>>>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>>>>> Sent: 2017-09-18 17:20
>>>>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>>>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>>> hdp invalidate
>>>>>
>>>>> On 18.09.2017 at 08:11, Monk Liu wrote:
>>>>>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>>>>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>
>>>>> Please scan the code once more, we most likely have used mmHDP_DEBUG0 for
>>>>> this at even more places.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> ---
>>>>>>      drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>>>>>      drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>>>>>      2 files changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> index f201510..44960b3 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct
>>>>>> amdgpu_ring *ring)
>>>>>>      static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>>>>      {
>>>>>>          gfx_v9_0_write_data_to_reg(ring, 0, true,
>>>>>> -                   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>>>>>> +                   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE),
>>>>>> 1);
>>>>>>      }
>>>>>>           static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>>> index fd7c72a..d5f3848 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct
>>>>>> amdgpu_ring *ring)
>>>>>>      {
>>>>>>          amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>>>>> SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>>>>>> -    amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>>>>>> +    amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0,
>>>>>> +mmHDP_READ_CACHE_INVALIDATE));
>>>>>>          amdgpu_ring_write(ring, 1);
>>>>>>      }

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
       [not found]                         ` <69a1e774-6a9e-31c6-8b30-dfbd430062c8-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-19 11:37                           ` Liu, Monk
  0 siblings, 0 replies; 61+ messages in thread
From: Liu, Monk @ 2017-09-19 11:37 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chen, Horace

> Well the question is why does the CPC still needs the MQD commands and how can we prevent that?

You are right, I'll see what I can do.


-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com] 
Sent: 2017-09-19 16:27
To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Chen, Horace <Horace.Chen@amd.com>
Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug

On 19.09.2017 at 06:14, Liu, Monk wrote:
> Christian,
>
>> That sounds at least a bit better. But my question is why doesn't this work like it does on Tonga, e.g. correctly clean things up?
> Tonga also suffers from this issue; we just fixed it in the branch for the CSP customer, and the staging code usually lags behind our private branch ...
>
>> Yeah, but keeping the GART mapping alive is complete nonsense. When the driver unloads all memory should be returned to the OS.
>> So we either keep a GART mapping to pages which are about to be reused and overwritten, or we leak memory on driver shutdown.
>> Neither option sounds very good,
> Unbinding the GART mapping makes the CPC hang if it runs MQD commands, and
> the CPC must run MQD commands because RLCV always requires the CPC to do that
> when RLCV is executing SAVE_VF commands.
>
> Do you have a way to break the above cycle?

Well, the question is why the CPC still needs the MQD commands and how we can prevent that.

The point is that if we need to keep the GART alive to avoid a crash after driver unload, we also need to keep alive the pages the GART points to.

This means that either the pages get overwritten or we get massive complaints from the MM that we are leaking pages here.

If it's not possible to turn off the CPC on driver unload, the only alternative I can see is to reprogram it so that the MQD commands come from VRAM instead of GART.

Regards,
Christian.

>
>
> BR Monk
>
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: 2017-09-18 19:54
> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian 
> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Chen, Horace <Horace.Chen@amd.com>
> Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
>
> On 18.09.2017 at 12:12, Liu, Monk wrote:
>> Christian,
>>
>> Let's discuss this patch and the one that follows, which skips the KIQ MQD free to avoid the SAVE_FAIL issue.
>>
>>
>> For the patch that skips the KIQ MQD deallocation, I think I will drop it and use a new approach:
>> We allocate the KIQ MQD in the VRAM domain, and this BO can be safely freed after the driver is unloaded, because after unload no one will *usually* change the data in this BO.
>> (The exception: e.g. some root app can map visible VRAM and alter the values in it.)
> That sounds at least a bit better. But my question is why doesn't this work like it does on Tonga, e.g. correctly clean things up?
>
>> For this patch, "skip unbinding the GART mapping to keep the KIQ MQD always valid":
>> The hypervisor side always has a couple of hw components working, and they rely on the GMC being kept alive, so this is very different from bare metal. That is to say, we can only do it this way.
> Yeah, but keeping the GART mapping alive is complete nonsense. When the driver unloads all memory should be returned to the OS.
>
> So we either keep a GART mapping to pages which are about to be reused and overwritten, or we leak memory on driver shutdown.
>
> Neither option sounds very good,
> Christian.
>
>> Besides, we'll have more patches in the future for L1 secure mode, which
>> forbids VF access to GMC registers, so under L1 secure mode the driver
>> will always skip GMC programming under SRIOV in both init and fini,
>> but that will come later.
>>
>> BR Monk
>>
>>
>>
>> -----Original Message-----
>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>> Sent: 2017-09-18 17:28
>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Chen, Horace <Horace.Chen@amd.com>
>> Subject: Re: [PATCH 13/18] drm/amdgpu:fix driver unloading bug
>>
>> On 18.09.2017 at 08:11, Monk Liu wrote:
>>> [SWDEV-126631] - fix hypervisor save_vf failure that occurred after the
>>> driver was removed:
>>> 1. Because the KIQ and KCQ were not unmapped, save_vf will fail if the driver freed the MQD of the KIQ and KCQ.
>>> 2. The KIQ can't be unmapped since RLCV always needs it, so the bo_free on the KIQ should be skipped.
>>> 3. The KCQ can be unmapped, and should be unmapped during hw_fini.
>>> 4. RLCV still needs to access other MC addresses from some hw even after the driver is unloaded,
>>>       so we should not unbind the GART for a VF.
>>>
>>> Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
>>> Signed-off-by: Horace Chen <horace.chen@amd.com>
>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> I absolutely can't judge if this is correct or not, but keeping the GART and KIQ alive after the driver is unloaded sounds really fishy to me.
>>
>> Isn't there any other clean way of handling this?
>>
>> Christian.
>>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
>>>     drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++++++++++++++++++++++++++++++-
>>>     3 files changed, 66 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> index f437008..2fee071 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
>>> @@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
>>>      */
>>>     void amdgpu_gart_fini(struct amdgpu_device *adev)
>>>     {
>>> -	if (adev->gart.ready) {
>>> +	/* gart is still used by other hw under SRIOV, don't unbind it */
>>> +	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
>>>     		/* unbind pages */
>>>     		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
>>>     	}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> index 4f6c68f..bf6656f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> @@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
>>>     				      &ring->mqd_ptr);
>>>     	}
>>>     
>>> +	/* don't deallocate KIQ mqd because the bo is still used by RLCV even
>>> +	the guest VM is shutdown */
>>> +	if (amdgpu_sriov_vf(adev))
>>> +		return;
>>> +
>>>     	ring = &adev->gfx.kiq.ring;
>>>     	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
>>>     	amdgpu_bo_free_kernel(&ring->mqd_obj,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> index 44960b3..a577bbc 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> @@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
>>>     	return r;
>>>     }
>>>     
>>> +static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring,struct amdgpu_ring *ring)
>>> +{
>>> +	struct amdgpu_device *adev = kiq_ring->adev;
>>> +	uint32_t scratch, tmp = 0;
>>> +	int r, i;
>>> +
>>> +	r = amdgpu_gfx_scratch_get(adev, &scratch);
>>> +	if (r) {
>>> +		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
>>> +		return r;
>>> +	}
>>> +	WREG32(scratch, 0xCAFEDEAD);
>>> +
>>> +	r = amdgpu_ring_alloc(kiq_ring, 10);
>>> +	if (r) {
>>> +		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
>>> +		amdgpu_gfx_scratch_free(adev, scratch);
>>> +		return r;
>>> +	}
>>> +
>>> +	/* unmap queues */
>>> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
>>> +	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
>>> +						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
>>> +						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
>>> +						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
>>> +						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
>>> +	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
>>> +	amdgpu_ring_write(kiq_ring, 0);
>>> +	amdgpu_ring_write(kiq_ring, 0);
>>> +	amdgpu_ring_write(kiq_ring, 0);
>>> +	/* write to scratch for completion */
>>> +	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
>>> +	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
>>> +	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
>>> +	amdgpu_ring_commit(kiq_ring);
>>> +
>>> +	for (i = 0; i < adev->usec_timeout; i++) {
>>> +		tmp = RREG32(scratch);
>>> +		if (tmp == 0xDEADBEEF)
>>> +			break;
>>> +		DRM_UDELAY(1);
>>> +	}
>>> +	if (i >= adev->usec_timeout) {
>>> +		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
>>> +		r = -EINVAL;
>>> +	}
>>> +	amdgpu_gfx_scratch_free(adev, scratch);
>>> +	return r;
>>> +}
>>> +
>>> +
>>>     static int gfx_v9_0_hw_fini(void *handle)
>>>     {
>>>     	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>> +	int i, r;
>>>     
>>>     	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>>>     	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>>>     	if (amdgpu_sriov_vf(adev)) {
>>> -		pr_debug("For SRIOV client, shouldn't do anything.\n");
>>> +		/* disable KCQ to avoid CPC touch memory not valid anymore */
>>> +		for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>>> +			r = gfx_v9_0_kcq_disable(&adev->gfx.kiq.ring, &adev->gfx.compute_ring[i]);
>>> +			if (r)
>>> +				return r;
>>> +		}
>>>     		return 0;
>>>     	}
>>>     	gfx_v9_0_cp_enable(adev, false);
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
       [not found]                                             ` <28fa17b6-ebb0-99c7-042a-19289d858f64-5C7GfCeVMHo@public.gmane.org>
  2017-09-19  9:34                                               ` Zhang, Jerry (Junwei)
@ 2017-09-19 13:42                                               ` Alex Deucher
  1 sibling, 0 replies; 61+ messages in thread
From: Alex Deucher @ 2017-09-19 13:42 UTC (permalink / raw)
  To: Christian König
  Cc: Deucher, Alexander, zhoucm1,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Liu, Monk

On Tue, Sep 19, 2017 at 4:30 AM, Christian König
<christian.koenig@amd.com> wrote:
> I don't know why, but the HDP is generally not part of the register spec.
>
> So you can neither find HDP_DEBUG0 nor HDP_READ_CACHE_INVALIDATE in it as
> far as I know.
>
> Point is that the HDP invalidates its read cache on any register write (the
> register itself doesn't matter). So far we used the HDP_DEBUG0 register
> because it is unused otherwise, but having a dedicated register just for
> this job is clearly a good idea.

Both show up in the register spec for me and the descriptions both say
writing 1 to the register invalidates the read cache.

Alex

>
> Regards,
> Christian.
>
>
> Am 19.09.2017 um 09:02 schrieb zhoucm1:
>>
>> >using this register to replace MM_HDP_DEBUG0 is suggested from a HDP HW
>> > guys
>>
>> I'm OK with this line.
>>
>> Thanks for the explanation.
>> David Zhou
>>
>> On 2017年09月19日 15:00, Liu, Monk wrote:
>>>
>>> First, I didn't check whether Windows does it this way, because I'm not
>>> sure Windows is always doing the right thing. E.g. for GFX preemption I
>>> didn't copy the Windows scheme, and we found a couple of bugs in Windows
>>> but not in Linux ...
>>> So please don't assume we should copy from Windows unless it's rock
>>> solid.
>>>
>>> Second, this register originally comes from the definition file
>>> "hdp_4_0_offset.h"; it was not recently introduced by me or someone else. And using
>>> this register to replace MM_HDP_DEBUG0 was suggested by an HDP HW guy when
>>> I was working on the PAL/VULKAN preemption hang issue in Orlando. Sorry, I
>>> forgot that guy's name ...
>>>
>>> @Deucher, Alexander do you know who is on hdp hw ? we can confirm with
>>> him
>>>
>>>
>>> If you're uncomfortable with this change, I can put all of it under an
>>> "if sriov" condition so bare-metal stays unchanged. Is that okay?
>>>
>>> BR Monk
>>>
>>>
>>> -----Original Message-----
>>> From: Zhou, David(ChunMing)
>>> Sent: 2017年9月19日 14:51
>>> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian
>>> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp
>>> invalidate
>>>
>>>
>>> The change seems more proper, but where did you find
>>> mmHDP_READ_CACHE_INVALIDATE? Could you double check whether the Windows driver has
>>> changed to use this?
>>> I'm confused, since the mmHDP_DEBUG0 implementation comes from Windows as
>>> well.
>>> I can't even find mmHDP_READ_CACHE_INVALIDATE in the register spec.
>>>
>>> Regards,
>>> David Zhou
>>> On 2017年09月19日 14:46, Liu, Monk wrote:
>>>>
>>>> What question ? please reply here
>>>>
>>>> -----Original Message-----
>>>> From: Zhou, David(ChunMing)
>>>> Sent: 2017年9月19日 12:25
>>>> To: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian
>>>> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>> hdp invalidate
>>>>
>>>> Please answer my question as I raised in another thread, otherwise I
>>>> will give a NAK on this!
>>>>
>>>> Regards,
>>>> David Zhou
>>>>
>>>> -----Original Message-----
>>>> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
>>>> Of Liu, Monk
>>>> Sent: Tuesday, September 19, 2017 12:04 PM
>>>> To: Koenig, Christian <Christian.Koenig@amd.com>;
>>>> amd-gfx@lists.freedesktop.org
>>>> Subject: RE: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>> hdp invalidate
>>>>
>>>> Yeah, vcn_v1_0 and uvd_v7_0
>>>>
>>>> Thanks
>>>>
>>>> -----Original Message-----
>>>> From: Koenig, Christian
>>>> Sent: 2017年9月18日 19:39
>>>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>> hdp invalidate
>>>>
>>>> Yeah, but Vega10 has UVD7 and in uvd_v7_0.c we have:
>>>>
>>>>> static void uvd_v7_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>>> {
>>>>>           amdgpu_ring_write(ring, PACKET0(SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 0));
>>>>>           amdgpu_ring_write(ring, 1);
>>>>> }
>>>>
>>>> That should probably be fixed as well.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 18.09.2017 um 13:03 schrieb Liu, Monk:
>>>>>
>>>>> Only vega10 has this register
>>>>>
>>>>> -----Original Message-----
>>>>> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
>>>>> Sent: 2017年9月18日 17:20
>>>>> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>>>>> Subject: Re: [PATCH 12/18] drm/amdgpu:use formal register to trigger
>>>>> hdp invalidate
>>>>>
>>>>> Am 18.09.2017 um 08:11 schrieb Monk Liu:
>>>>>>
>>>>>> Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
>>>>>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>>>>>
>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>>
>>>>> Please scan the code once more, we most likely have used mmHDP_DEBUG0
>>>>> for this at even more places.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> ---
>>>>>>      drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
>>>>>>      drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>>>>>>      2 files changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> index f201510..44960b3 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>>> @@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>>>>>      static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>>>>      {
>>>>>>          gfx_v9_0_write_data_to_reg(ring, 0, true,
>>>>>> -                   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
>>>>>> +                   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
>>>>>>      }
>>>>>>
>>>>>>      static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>>> index fd7c72a..d5f3848 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>>>>> @@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
>>>>>>      {
>>>>>>          amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
>>>>>>                            SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
>>>>>> -    amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
>>>>>> +    amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
>>>>>>          amdgpu_ring_write(ring, 1);
>>>>>>      }
>>>>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset
       [not found]         ` <2cd93ffd-91a6-77c6-b07c-c68188a340a5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-09-20  1:32           ` Quan, Evan
       [not found]             ` <DM5PR1201MB2489EF41F0B4703FE248AEBDE4610-grEf7a3NxMAAZHT/xKzwlGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 61+ messages in thread
From: Quan, Evan @ 2017-09-20  1:32 UTC (permalink / raw)
  To: Koenig, Christian, Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Monk,

I think your change affects the bare-metal case. I confirmed that vega10 cannot boot with the change applied.
If the change is only intended to cover the gpu reset case of sriov, maybe the logic should be
if (!(amdgpu_sriov_vf(adev) && adev->in_sriov_reset))

Regards,
Evan
>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Christian
>König
>Sent: Monday, September 18, 2017 5:06 PM
>To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset
>
>Am 18.09.2017 um 08:11 schrieb Monk Liu:
>> At least for SRIOV we found reload PSP fw during
>> gpu reset cause PSP hang.
>>
>> Change-Id: I5f273187a10bb8571b77651dfba7656ce0429af0
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>
>Acked-by: Christian König <christian.koenig@amd.com>
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 15 +++++++++------
>>   1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> index 8a1ee97..4eee2ef 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> @@ -253,15 +253,18 @@ static int psp_asd_load(struct psp_context *psp)
>>
>>   static int psp_hw_start(struct psp_context *psp)
>>   {
>> +	struct amdgpu_device *adev = psp->adev;
>>   	int ret;
>>
>> -	ret = psp_bootloader_load_sysdrv(psp);
>> -	if (ret)
>> -		return ret;
>> +	if (amdgpu_sriov_vf(adev) && !adev->in_sriov_reset) {
>> +		ret = psp_bootloader_load_sysdrv(psp);
>> +		if (ret)
>> +			return ret;
>>
>> -	ret = psp_bootloader_load_sos(psp);
>> -	if (ret)
>> -		return ret;
>> +		ret = psp_bootloader_load_sos(psp);
>> +		if (ret)
>> +			return ret;
>> +	}
>>
>>   	ret = psp_ring_create(psp, PSP_RING_TYPE__KM);
>>   	if (ret)
>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset
       [not found]             ` <DM5PR1201MB2489EF41F0B4703FE248AEBDE4610-grEf7a3NxMAAZHT/xKzwlGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-09-20  1:54               ` Liu, Monk
  0 siblings, 0 replies; 61+ messages in thread
From: Liu, Monk @ 2017-09-20  1:54 UTC (permalink / raw)
  To: Quan, Evan, Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

This condition is wrong, you are right, and I followed up with a patch to correct it.
It should be:

if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset)
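
To make the difference between the conditions discussed here concrete, the following is a minimal user-space sketch in plain C, not driver code. The parameters `vf` and `in_reset` are hypothetical stand-ins for amdgpu_sriov_vf(adev) and adev->in_sriov_reset, and the function names are invented for illustration. It tabulates when each condition would load the PSP firmware: the posted patch skips the load on bare-metal, while the two proposed fixes agree on every input (De Morgan's law).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins: 'vf' models amdgpu_sriov_vf(adev),
 * 'in_reset' models adev->in_sriov_reset. */

/* Condition from the posted patch: load only on a VF that is not in reset. */
static bool load_fw_posted(bool vf, bool in_reset)
{
	return vf && !in_reset;
}

/* Suggested form: skip the load only when a VF is in reset. */
static bool load_fw_evan(bool vf, bool in_reset)
{
	return !(vf && in_reset);
}

/* Follow-up form: the same predicate with the negation distributed. */
static bool load_fw_monk(bool vf, bool in_reset)
{
	return !vf || !in_reset;
}

/* Prints one row per input combination and returns the number of rows
 * on which the two proposed fixes disagree. */
static int compare_conditions(void)
{
	int mismatches = 0;

	printf("vf in_reset | posted evan monk\n");
	for (int vf = 0; vf <= 1; vf++) {
		for (int rst = 0; rst <= 1; rst++) {
			bool p = load_fw_posted(vf, rst);
			bool e = load_fw_evan(vf, rst);
			bool m = load_fw_monk(vf, rst);

			printf("%2d %8d | %6d %4d %4d\n", vf, rst, p, e, m);
			if (e != m)
				mismatches++;
		}
	}
	return mismatches;
}
```

Calling compare_conditions() from a small main() prints the table. The row with vf=0 and in_reset=0 is the bare-metal boot case, which is where the posted condition evaluates to false and the firmware load is skipped.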


BR Monk

-----Original Message-----
From: Quan, Evan 
Sent: 2017年9月20日 9:32
To: Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset

Hi Monk,

I think your change affects the bare-metal case. I confirmed that vega10 cannot boot with the change applied.
If the change is only intended to cover the gpu reset case of sriov, maybe the logic should be if (!(amdgpu_sriov_vf(adev) && adev->in_sriov_reset))

Regards,
Evan
>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf 
>Of Christian König
>Sent: Monday, September 18, 2017 5:06 PM
>To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during 
>gpu reset
>
>Am 18.09.2017 um 08:11 schrieb Monk Liu:
>> At least for SRIOV we found reload PSP fw during gpu reset cause PSP 
>> hang.
>>
>> Change-Id: I5f273187a10bb8571b77651dfba7656ce0429af0
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>
>Acked-by: Christian König <christian.koenig@amd.com>
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 15 +++++++++------
>>   1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> index 8a1ee97..4eee2ef 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> @@ -253,15 +253,18 @@ static int psp_asd_load(struct psp_context *psp)
>>
>>   static int psp_hw_start(struct psp_context *psp)
>>   {
>> +	struct amdgpu_device *adev = psp->adev;
>>   	int ret;
>>
>> -	ret = psp_bootloader_load_sysdrv(psp);
>> -	if (ret)
>> -		return ret;
>> +	if (amdgpu_sriov_vf(adev) && !adev->in_sriov_reset) {
>> +		ret = psp_bootloader_load_sysdrv(psp);
>> +		if (ret)
>> +			return ret;
>>
>> -	ret = psp_bootloader_load_sos(psp);
>> -	if (ret)
>> -		return ret;
>> +		ret = psp_bootloader_load_sos(psp);
>> +		if (ret)
>> +			return ret;
>> +	}
>>
>>   	ret = psp_ring_create(psp, PSP_RING_TYPE__KM);
>>   	if (ret)
>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
       [not found]                 ` <45fa4145-41a4-6186-4f35-4f3347bad601-5C7GfCeVMHo@public.gmane.org>
@ 2017-09-20  2:27                   ` Liu, Monk
  0 siblings, 0 replies; 61+ messages in thread
From: Liu, Monk @ 2017-09-20  2:27 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Oh, I see your point, but that is really a cleanup patch, while mine adds a condition to fix a memory leak. I think they serve different purposes and should be separate.

I can add one more patch that cleans this up with amdgpu_bo_create_kernel() to make the code tighter and cleaner.

BR Monk

-----Original Message-----
From: Koenig, Christian 
Sent: 2017年9月18日 19:35
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset

Am 18.09.2017 um 12:47 schrieb Liu, Monk:
> I didn't get your point... how could bo_create_kernel solve my issue ?

It doesn't solve the underlying issue, you just need less code for your workaround.

With bo_create_kernel you can do create/pin/kmap in just one function call.

>
> The thing here is that during gpu reset we invoke hw_init for every hw 
> component, and by design hw_init shouldn't do anything software 
> related, thus the BO allocation in hw_init is wrong,

Yeah, but your patch doesn't fix that either as far as I can see.

> Even switch to bo_create_kernel won't address the issue ...

See the implementation of bo_create_kernel():
>         if (!*bo_ptr) {
>                 r = amdgpu_bo_create(adev, size, align, true, domain,
....
>         }
....
>         r = amdgpu_bo_pin(*bo_ptr, domain, gpu_addr);
...
>         if (cpu_addr) {
>                 r = amdgpu_bo_kmap(*bo_ptr, cpu_addr);
...
>         }

Creating is actually optional, but the function always pins the BO once more and figures out its CPU address.

As far as I can see that should solve your problem for now.
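
The create-if-missing behaviour described above can be illustrated with a minimal user-space sketch. Everything in it is hypothetical: `struct mock_bo`, `mock_bo_create_kernel()` and the allocation counter are invented for illustration and are not the amdgpu API. The sketch only models the pattern, namely allocate when the pointer is NULL, then pin and hand back the CPU address on every call, so running the init path twice does not allocate a second buffer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer object; not the amdgpu structure. */
struct mock_bo {
	size_t size;
	void *cpu_addr;
	int pin_count;
};

/* Counts real allocations so a repeated init can be checked for leaks. */
static int mock_alloc_count;

/* Allocates *bo_ptr only when it is NULL, then pins it once more and
 * reports its CPU address on every call. Returns 0 on success. */
static int mock_bo_create_kernel(struct mock_bo **bo_ptr, size_t size,
				 void **cpu_addr)
{
	if (!*bo_ptr) {
		struct mock_bo *bo = calloc(1, sizeof(*bo));

		if (!bo)
			return -1;
		bo->cpu_addr = calloc(1, size);
		if (!bo->cpu_addr) {
			free(bo);
			return -1;
		}
		bo->size = size;
		*bo_ptr = bo;
		mock_alloc_count++;
	}

	(*bo_ptr)->pin_count++;
	if (cpu_addr)
		*cpu_addr = (*bo_ptr)->cpu_addr;
	return 0;
}

/* Models the init path running at driver load and again at gpu reset;
 * returns how many allocations actually happened. */
static int mock_allocs_after_two_inits(void)
{
	struct mock_bo *fw_buf = NULL;
	void *ptr = NULL;
	int n;

	mock_alloc_count = 0;
	if (mock_bo_create_kernel(&fw_buf, 4096, &ptr))
		return -1;
	if (mock_bo_create_kernel(&fw_buf, 4096, &ptr))
		return -1;

	n = mock_alloc_count;
	free(fw_buf->cpu_addr);
	free(fw_buf);
	return n;
}
```

In this model the second call reuses the existing object, so the caller needs no separate "are we in reset" check around the allocation. Whether the extra pin per call is acceptable in the real driver is a separate question that the sketch does not answer.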

Christian.


>
>
> BR Monk
>
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: 2017年9月18日 17:13
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu 
> reset
>
> Am 18.09.2017 um 08:11 schrieb Monk Liu:
>> doing gpu reset will rerun all hw_init and thus ucode_init_bo is 
>> invoked again, so we need to skip the fw_buf allocation during sriov 
>> gpu reset to avoid memory leak.
>>
>> Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
>> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  3 ++
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++++++++++++++----------------
>>    2 files changed, 35 insertions(+), 32 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 6ff2959..3d0c633 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1185,6 +1185,9 @@ struct amdgpu_firmware {
>>    
>>    	/* gpu info firmware data pointer */
>>    	const struct firmware *gpu_info_fw;
>> +
>> +	void *fw_buf_ptr;
>> +	uint64_t fw_buf_mc;
>>    };
>>    
>>    /*
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> index f306374..6564902 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> @@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
>>    int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    {
>>    	struct amdgpu_bo **bo = &adev->firmware.fw_buf;
>> -	uint64_t fw_mc_addr;
>> -	void *fw_buf_ptr = NULL;
>>    	uint64_t fw_offset = 0;
>>    	int i, err;
>>    	struct amdgpu_firmware_info *ucode = NULL;
>> @@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    		return 0;
>>    	}
>>    
>> -	err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
>> -				amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> -				AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
>> -				NULL, NULL, 0, bo);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
>> -		goto failed;
>> -	}
>> +	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {
> Instead of all this better use amdgpu_bo_create_kernel(), this should already include most of the handling necessary here.
>
> Christian.
>
>> +		err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
>> +					amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> +					AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
>> +					NULL, NULL, 0, bo);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
>> +			goto failed;
>> +		}
>>    
>> -	err = amdgpu_bo_reserve(*bo, false);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
>> -		goto failed_reserve;
>> -	}
>> +		err = amdgpu_bo_reserve(*bo, false);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
>> +			goto failed_reserve;
>> +		}
>>    
>> -	err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> -				&fw_mc_addr);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
>> -		goto failed_pin;
>> -	}
>> +		err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
>> +					&adev->firmware.fw_buf_mc);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
>> +			goto failed_pin;
>> +		}
>>    
>> -	err = amdgpu_bo_kmap(*bo, &fw_buf_ptr);
>> -	if (err) {
>> -		dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
>> -		goto failed_kmap;
>> -	}
>> +		err = amdgpu_bo_kmap(*bo, &adev->firmware.fw_buf_ptr);
>> +		if (err) {
>> +			dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
>> +			goto failed_kmap;
>> +		}
>>    
>> -	amdgpu_bo_unreserve(*bo);
>> +		amdgpu_bo_unreserve(*bo);
>> +	}
>>    
>> -	memset(fw_buf_ptr, 0, adev->firmware.fw_size);
>> +	memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size);
>>    
>>    	/*
>>    	 * if SMU loaded firmware, it needn't add SMC, UVD, and VCE
>> @@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
>>    		ucode = &adev->firmware.ucode[i];
>>    		if (ucode->fw) {
>>    			header = (const struct common_firmware_header *)ucode->fw->data;
>> -			amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + fw_offset,
>> -						    (void *)((uint8_t *)fw_buf_ptr + fw_offset));
>> +			amdgpu_ucode_init_single_fw(adev, ucode, adev->firmware.fw_buf_mc + fw_offset,
>> +						    adev->firmware.fw_buf_ptr + fw_offset);
>>    			if (i == AMDGPU_UCODE_ID_CP_MEC1 &&
>>    			    adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) {
>>    				const struct gfx_firmware_header_v1_0 *cp_hdr;
>>    				cp_hdr = (const struct gfx_firmware_header_v1_0 *)ucode->fw->data;
>> -				amdgpu_ucode_patch_jt(ucode, fw_mc_addr + fw_offset,
>> -						    fw_buf_ptr + fw_offset);
>> +				amdgpu_ucode_patch_jt(ucode,  adev->firmware.fw_buf_mc + fw_offset,
>> +						    adev->firmware.fw_buf_ptr + fw_offset);
>>    				fw_offset += ALIGN(le32_to_cpu(cp_hdr->jt_size) << 2, PAGE_SIZE);
>>    			}
>>    			fw_offset += ALIGN(ucode->ucode_size, PAGE_SIZE);
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2017-09-20  2:27 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-18  6:11 [PATCH 00/18] *** misc patches for SRIOV *** Monk Liu
     [not found] ` <1505715122-23904-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  6:11   ` [PATCH 01/18] drm/amdgpu/sriov:fix missing error handling Monk Liu
     [not found]     ` <1505715122-23904-2-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:04       ` Christian König
2017-09-18  6:11   ` [PATCH 02/18] drm/amdgpu:no kiq in IH Monk Liu
     [not found]     ` <1505715122-23904-3-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:05       ` Christian König
2017-09-18  6:11   ` [PATCH 03/18] drm/amdgpu/sriov:move in_reset to adev and rename Monk Liu
     [not found]     ` <1505715122-23904-4-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:05       ` Christian König
2017-09-18  6:11   ` [PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset Monk Liu
     [not found]     ` <1505715122-23904-5-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:06       ` Christian König
     [not found]         ` <2cd93ffd-91a6-77c6-b07c-c68188a340a5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-20  1:32           ` Quan, Evan
     [not found]             ` <DM5PR1201MB2489EF41F0B4703FE248AEBDE4610-grEf7a3NxMAAZHT/xKzwlGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-09-20  1:54               ` Liu, Monk
2017-09-18  6:11   ` [PATCH 05/18] drm/amdgpu:make ctx_add_fence interruptible Monk Liu
     [not found]     ` <1505715122-23904-6-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:10       ` Christian König
2017-09-18  6:11   ` [PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset Monk Liu
     [not found]     ` <1505715122-23904-7-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:12       ` Christian König
     [not found]         ` <f96a1189-2fe3-6466-df1b-557f87319cb9-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-18 10:47           ` Liu, Monk
     [not found]             ` <BLUPR12MB0449D8D7812A4C80EDA2253D84630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-09-18 11:34               ` Christian König
     [not found]                 ` <45fa4145-41a4-6186-4f35-4f3347bad601-5C7GfCeVMHo@public.gmane.org>
2017-09-20  2:27                   ` Liu, Monk
2017-09-18  6:11   ` [PATCH 07/18] drm/amdgpu:add hdp golden setting register name hint Monk Liu
     [not found]     ` <1505715122-23904-8-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:13       ` Christian König
2017-09-18  6:11   ` [PATCH 08/18] drm/amdgpu:halt when vm fault Monk Liu
     [not found]     ` <1505715122-23904-9-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:14       ` Christian König
2017-09-18  6:11   ` [PATCH 09/18] drm/amdgpu:insert TMZ_BEGIN Monk Liu
     [not found]     ` <1505715122-23904-10-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:15       ` Christian König
2017-09-18  6:11   ` [PATCH 10/18] drm/amdgpu:hdp flush should be put it initialized Monk Liu
     [not found]     ` <1505715122-23904-11-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:16       ` Christian König
2017-09-18  6:11   ` [PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9 Monk Liu
     [not found]     ` <1505715122-23904-12-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:18       ` Christian König
     [not found]         ` <34ac878c-5bf7-7735-1787-b5d3c1691fd2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-18 15:48           ` Marek Olšák
2017-09-18  6:11   ` [PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate Monk Liu
     [not found]     ` <1505715122-23904-13-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:19       ` Christian König
     [not found]         ` <2f11f862-6022-7a97-17ab-ae2c634f0061-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-18 11:03           ` Liu, Monk
     [not found]             ` <BLUPR12MB04497CDE395DCE35F830DD4F84630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-09-18 11:39               ` Christian König
     [not found]                 ` <4de1beaf-95c0-ba6e-da79-1070074f82e8-5C7GfCeVMHo@public.gmane.org>
2017-09-19  4:04                   ` Liu, Monk
     [not found]                     ` <BLUPR12MB0449D86C880B4B15A4FD916884600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-09-19  4:25                       ` Zhou, David(ChunMing)
     [not found]                         ` <MWHPR1201MB020621C233AA2C12F6127C61B4600-3iK1xFAIwjrUF/YbdlDdgWrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-09-19  6:46                           ` Liu, Monk
     [not found]                             ` <BLUPR12MB0449F560B6A658DC4C120EC084600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-09-19  6:50                               ` zhoucm1
     [not found]                                 ` <baa9518f-d2b1-cfb8-8f98-c3557e3ef8fe-5C7GfCeVMHo@public.gmane.org>
2017-09-19  7:00                                   ` Liu, Monk
     [not found]                                     ` <BLUPR12MB0449775C4245A708B15E9D0B84600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-09-19  7:02                                       ` zhoucm1
     [not found]                                         ` <5367a2b2-3044-7388-08ff-6f0a620d5aa8-5C7GfCeVMHo@public.gmane.org>
2017-09-19  8:30                                           ` Christian König
     [not found]                                             ` <28fa17b6-ebb0-99c7-042a-19289d858f64-5C7GfCeVMHo@public.gmane.org>
2017-09-19  9:34                                               ` Zhang, Jerry (Junwei)
2017-09-19 13:42                                               ` Alex Deucher
2017-09-18  6:11   ` [PATCH 13/18] drm/amdgpu:fix driver unloading bug Monk Liu
     [not found]     ` <1505715122-23904-14-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:27       ` Christian König
     [not found]         ` <1821bf91-83d8-c933-704d-fcd8db07def1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-18 10:12           ` Liu, Monk
     [not found]             ` <BLUPR12MB0449D3944109EA4A7D151A2684630-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-09-18 11:53               ` Christian König
     [not found]                 ` <fade2e70-6594-9a6e-9d5a-d488d360363e-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-19  4:14                   ` Liu, Monk
     [not found]                     ` <BLUPR12MB04498EEB2BF374C72EF7CF5384600-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-09-19  8:26                       ` Christian König
     [not found]                         ` <69a1e774-6a9e-31c6-8b30-dfbd430062c8-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-19 11:37                           ` Liu, Monk
2017-09-18  6:11   ` [PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV Monk Liu
     [not found]     ` <1505715122-23904-15-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:10       ` Yu, Xiangliang
2017-09-18  9:31       ` Christian König
     [not found]         ` <0951ed06-954a-0f31-6b6e-ba923be008a2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-09-18 21:07           ` Alex Deucher
     [not found]             ` <CADnq5_Nj5Kqp4CXtFLLz-cPynvchBV-RLFFpB6e5D-OCyPXQiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-09-19  1:52               ` Yu, Xiangliang
2017-09-18  6:11   ` [PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload Monk Liu
     [not found]     ` <1505715122-23904-16-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:22       ` Christian König
2017-09-18  6:12   ` [PATCH 16/18] drm/amdgpu: increate mailbox polling timeout to 12s Monk Liu
     [not found]     ` <1505715122-23904-17-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:23       ` Christian König
2017-09-18  6:12   ` [PATCH 17/18] drm/amdgpu:fix uvd ring fini routine Monk Liu
     [not found]     ` <1505715122-23904-18-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>
2017-09-18  9:25       ` Christian König
2017-09-18  6:12   ` [PATCH 18/18] drm/amdgpu/sriov:init csb for gfxv9 Monk Liu
