* [PATCH 1/4] drm/amdgpu: clear UVD VCPU buffer when err_event_athub generated
@ 2019-10-28 11:31 ` Le Ma
0 siblings, 0 replies; 18+ messages in thread
From: Le Ma @ 2019-10-28 11:31 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Le Ma
The err_event_athub error will mess up the buffer and cause UVD resume hang.
Change-Id: If17a2161fb9b1b52eac08de00d2e935191bdbf99
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index b2c364b..b4dd89a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -39,6 +39,8 @@
#include "cikd.h"
#include "uvd/uvd_4_2_d.h"
+#include "amdgpu_ras.h"
+
/* 1 second timeout */
#define UVD_IDLE_TIMEOUT msecs_to_jiffies(1000)
@@ -372,7 +374,13 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev)
if (!adev->uvd.inst[j].saved_bo)
return -ENOMEM;
- memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);
+ /* re-write 0 since err_event_athub will corrupt VCPU buffer */
+ if (amdgpu_ras_intr_triggered()) {
+ DRM_WARN("UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT\n");
+ memset(adev->uvd.inst[j].saved_bo, 0, size);
+ } else {
+ memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);
+ }
}
return 0;
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 1/4] drm/amdgpu: clear UVD VCPU buffer when err_event_athub generated
@ 2019-10-28 11:31 ` Le Ma
0 siblings, 0 replies; 18+ messages in thread
From: Le Ma @ 2019-10-28 11:31 UTC (permalink / raw)
To: amd-gfx; +Cc: Le Ma
The err_event_athub error will mess up the buffer and cause UVD resume hang.
Change-Id: If17a2161fb9b1b52eac08de00d2e935191bdbf99
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index b2c364b..b4dd89a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -39,6 +39,8 @@
#include "cikd.h"
#include "uvd/uvd_4_2_d.h"
+#include "amdgpu_ras.h"
+
/* 1 second timeout */
#define UVD_IDLE_TIMEOUT msecs_to_jiffies(1000)
@@ -372,7 +374,13 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev)
if (!adev->uvd.inst[j].saved_bo)
return -ENOMEM;
- memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);
+ /* re-write 0 since err_event_athub will corrupt VCPU buffer */
+ if (amdgpu_ras_intr_triggered()) {
+ DRM_WARN("UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT\n");
+ memset(adev->uvd.inst[j].saved_bo, 0, size);
+ } else {
+ memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);
+ }
}
return 0;
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded
@ 2019-10-28 11:31 ` Le Ma
0 siblings, 0 replies; 18+ messages in thread
From: Le Ma @ 2019-10-28 11:31 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Le Ma
Otherwise next err_event_athub error cannot call gpu reset.
Change-Id: I5cd293f30f23876bf2a1860681bcb50f47713ecd
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 676cad1..51d74bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4089,6 +4089,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
}
}
+ if (!r && in_ras_intr)
+ atomic_set(&amdgpu_ras_in_intr, 0);
+
skip_sched_resume:
list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
/*unlock kfd: SRIOV would do it separately */
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded
@ 2019-10-28 11:31 ` Le Ma
0 siblings, 0 replies; 18+ messages in thread
From: Le Ma @ 2019-10-28 11:31 UTC (permalink / raw)
To: amd-gfx; +Cc: Le Ma
Otherwise next err_event_athub error cannot call gpu reset.
Change-Id: I5cd293f30f23876bf2a1860681bcb50f47713ecd
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 676cad1..51d74bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4089,6 +4089,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
}
}
+ if (!r && in_ras_intr)
+ atomic_set(&amdgpu_ras_in_intr, 0);
+
skip_sched_resume:
list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
/*unlock kfd: SRIOV would do it separately */
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/4] drm/amdgpu: bypass some cleanup work after err_event_athub
@ 2019-10-28 11:31 ` Le Ma
0 siblings, 0 replies; 18+ messages in thread
From: Le Ma @ 2019-10-28 11:31 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Le Ma
PSP lost connection when err_event_athub occurs. These cleanup work can be
skipped in BACO reset.
Change-Id: If54a3735edd6ccbb58d40a5f8833392981f8ce37
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 7 +++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 20 +++++++++++---------
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 ++++--
4 files changed, 28 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 51d74bb..72d9892 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2274,6 +2274,12 @@ static int amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)
/* displays are handled in phase1 */
if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE)
continue;
+ /* PSP lost connection when err_event_athub occurs */
+ if (amdgpu_ras_intr_triggered() &&
+ adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_PSP) {
+ adev->ip_blocks[i].status.hw = false;
+ continue;
+ }
/* XXX handle errors */
r = adev->ip_blocks[i].version->funcs->suspend(adev);
/* XXX handle errors */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index fd7a73f..fce206f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -167,6 +167,13 @@ psp_cmd_submit_buf(struct psp_context *psp,
while (*((unsigned int *)psp->fence_buf) != index) {
if (--timeout == 0)
break;
+ /*
+ * Shouldn't wait for timeout when err_event_athub occurs,
+ * because gpu reset thread triggered and lock resource should
+ * be released for psp resume sequence.
+ */
+ if (amdgpu_ras_intr_triggered())
+ break;
msleep(1);
amdgpu_asic_invalidate_hdp(psp->adev, NULL);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 796326b..dab90c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -558,15 +558,17 @@ int amdgpu_ras_feature_enable(struct amdgpu_device *adev,
if (!(!!enable ^ !!amdgpu_ras_is_feature_enabled(adev, head)))
return 0;
- ret = psp_ras_enable_features(&adev->psp, &info, enable);
- if (ret) {
- DRM_ERROR("RAS ERROR: %s %s feature failed ret %d\n",
- enable ? "enable":"disable",
- ras_block_str(head->block),
- ret);
- if (ret == TA_RAS_STATUS__RESET_NEEDED)
- return -EAGAIN;
- return -EINVAL;
+ if (!amdgpu_ras_intr_triggered()) {
+ ret = psp_ras_enable_features(&adev->psp, &info, enable);
+ if (ret) {
+ DRM_ERROR("RAS ERROR: %s %s feature failed ret %d\n",
+ enable ? "enable":"disable",
+ ras_block_str(head->block),
+ ret);
+ if (ret == TA_RAS_STATUS__RESET_NEEDED)
+ return -EAGAIN;
+ return -EINVAL;
+ }
}
/* setup the obj */
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 9fe95e7..9c2dba62 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3736,8 +3736,10 @@ static int gfx_v9_0_hw_fini(void *handle)
amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
- /* disable KCQ to avoid CPC touch memory not valid anymore */
- gfx_v9_0_kcq_disable(adev);
+ /* DF freeze and kcq disable will fail */
+ if (!amdgpu_ras_intr_triggered())
+ /* disable KCQ to avoid CPC touch memory not valid anymore */
+ gfx_v9_0_kcq_disable(adev);
if (amdgpu_sriov_vf(adev)) {
gfx_v9_0_cp_gfx_enable(adev, false);
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/4] drm/amdgpu: bypass some cleanup work after err_event_athub
@ 2019-10-28 11:31 ` Le Ma
0 siblings, 0 replies; 18+ messages in thread
From: Le Ma @ 2019-10-28 11:31 UTC (permalink / raw)
To: amd-gfx; +Cc: Le Ma
PSP lost connection when err_event_athub occurs. These cleanup work can be
skipped in BACO reset.
Change-Id: If54a3735edd6ccbb58d40a5f8833392981f8ce37
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 7 +++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 20 +++++++++++---------
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 ++++--
4 files changed, 28 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 51d74bb..72d9892 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2274,6 +2274,12 @@ static int amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)
/* displays are handled in phase1 */
if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE)
continue;
+ /* PSP lost connection when err_event_athub occurs */
+ if (amdgpu_ras_intr_triggered() &&
+ adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_PSP) {
+ adev->ip_blocks[i].status.hw = false;
+ continue;
+ }
/* XXX handle errors */
r = adev->ip_blocks[i].version->funcs->suspend(adev);
/* XXX handle errors */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index fd7a73f..fce206f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -167,6 +167,13 @@ psp_cmd_submit_buf(struct psp_context *psp,
while (*((unsigned int *)psp->fence_buf) != index) {
if (--timeout == 0)
break;
+ /*
+ * Shouldn't wait for timeout when err_event_athub occurs,
+ * because gpu reset thread triggered and lock resource should
+ * be released for psp resume sequence.
+ */
+ if (amdgpu_ras_intr_triggered())
+ break;
msleep(1);
amdgpu_asic_invalidate_hdp(psp->adev, NULL);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 796326b..dab90c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -558,15 +558,17 @@ int amdgpu_ras_feature_enable(struct amdgpu_device *adev,
if (!(!!enable ^ !!amdgpu_ras_is_feature_enabled(adev, head)))
return 0;
- ret = psp_ras_enable_features(&adev->psp, &info, enable);
- if (ret) {
- DRM_ERROR("RAS ERROR: %s %s feature failed ret %d\n",
- enable ? "enable":"disable",
- ras_block_str(head->block),
- ret);
- if (ret == TA_RAS_STATUS__RESET_NEEDED)
- return -EAGAIN;
- return -EINVAL;
+ if (!amdgpu_ras_intr_triggered()) {
+ ret = psp_ras_enable_features(&adev->psp, &info, enable);
+ if (ret) {
+ DRM_ERROR("RAS ERROR: %s %s feature failed ret %d\n",
+ enable ? "enable":"disable",
+ ras_block_str(head->block),
+ ret);
+ if (ret == TA_RAS_STATUS__RESET_NEEDED)
+ return -EAGAIN;
+ return -EINVAL;
+ }
}
/* setup the obj */
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 9fe95e7..9c2dba62 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3736,8 +3736,10 @@ static int gfx_v9_0_hw_fini(void *handle)
amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
- /* disable KCQ to avoid CPC touch memory not valid anymore */
- gfx_v9_0_kcq_disable(adev);
+ /* DF freeze and kcq disable will fail */
+ if (!amdgpu_ras_intr_triggered())
+ /* disable KCQ to avoid CPC touch memory not valid anymore */
+ gfx_v9_0_kcq_disable(adev);
if (amdgpu_sriov_vf(adev)) {
gfx_v9_0_cp_gfx_enable(adev, false);
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
@ 2019-10-28 11:31 ` Le Ma
0 siblings, 0 replies; 18+ messages in thread
From: Le Ma @ 2019-10-28 11:31 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Le Ma
From: Le Ma <Le.Ma@amd.com>
Change-Id: Ia8a61a4b3bd529f0f691e43e69b299d7d151c0c2
Signed-off-by: Le Ma <Le.Ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 0db458f..876690a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -324,7 +324,11 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
RAS_CNTLR_INTERRUPT_CLEAR, 1);
WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl);
- amdgpu_ras_global_ras_isr(adev);
+ /*
+ * ras_controller_int is dedicated for nbif ras error,
+ * not the global interrupt for sync flood
+ */
+ amdgpu_ras_reset_gpu(adev, true);
}
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
@ 2019-10-28 11:31 ` Le Ma
0 siblings, 0 replies; 18+ messages in thread
From: Le Ma @ 2019-10-28 11:31 UTC (permalink / raw)
To: amd-gfx; +Cc: Le Ma
From: Le Ma <Le.Ma@amd.com>
Change-Id: Ia8a61a4b3bd529f0f691e43e69b299d7d151c0c2
Signed-off-by: Le Ma <Le.Ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 0db458f..876690a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -324,7 +324,11 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
RAS_CNTLR_INTERRUPT_CLEAR, 1);
WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl);
- amdgpu_ras_global_ras_isr(adev);
+ /*
+ * ras_controller_int is dedicated for nbif ras error,
+ * not the global interrupt for sync flood
+ */
+ amdgpu_ras_reset_gpu(adev, true);
}
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH 1/4] drm/amdgpu: clear UVD VCPU buffer when err_event_athub generated
@ 2019-10-28 11:53 ` Zhang, Hawking
0 siblings, 0 replies; 18+ messages in thread
From: Zhang, Hawking @ 2019-10-28 11:53 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Ma, Le
We should hold on patch #2 and patch #4 until we have baco based RAS recovery works since current ras recovery policy is changed by these two patches.
Other than that, the Series is
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Regards,
Hawking
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Le Ma
Sent: 2019年10月28日 19:31
To: amd-gfx@lists.freedesktop.org
Cc: Ma, Le <Le.Ma@amd.com>
Subject: [PATCH 1/4] drm/amdgpu: clear UVD VCPU buffer when err_event_athub generated
The err_event_athub error will mess up the buffer and cause UVD resume hang.
Change-Id: If17a2161fb9b1b52eac08de00d2e935191bdbf99
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index b2c364b..b4dd89a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -39,6 +39,8 @@
#include "cikd.h"
#include "uvd/uvd_4_2_d.h"
+#include "amdgpu_ras.h"
+
/* 1 second timeout */
#define UVD_IDLE_TIMEOUT msecs_to_jiffies(1000)
@@ -372,7 +374,13 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev)
if (!adev->uvd.inst[j].saved_bo)
return -ENOMEM;
- memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);
+ /* re-write 0 since err_event_athub will corrupt VCPU buffer */
+ if (amdgpu_ras_intr_triggered()) {
+ DRM_WARN("UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT\n");
+ memset(adev->uvd.inst[j].saved_bo, 0, size);
+ } else {
+ memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);
+ }
}
return 0;
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH 1/4] drm/amdgpu: clear UVD VCPU buffer when err_event_athub generated
@ 2019-10-28 11:53 ` Zhang, Hawking
0 siblings, 0 replies; 18+ messages in thread
From: Zhang, Hawking @ 2019-10-28 11:53 UTC (permalink / raw)
To: Ma, Le, amd-gfx; +Cc: Ma, Le
We should hold on patch #2 and patch #4 until we have baco based RAS recovery works since current ras recovery policy is changed by these two patches.
Other than that, the Series is
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Regards,
Hawking
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Le Ma
Sent: 2019年10月28日 19:31
To: amd-gfx@lists.freedesktop.org
Cc: Ma, Le <Le.Ma@amd.com>
Subject: [PATCH 1/4] drm/amdgpu: clear UVD VCPU buffer when err_event_athub generated
The err_event_athub error will mess up the buffer and cause UVD resume hang.
Change-Id: If17a2161fb9b1b52eac08de00d2e935191bdbf99
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index b2c364b..b4dd89a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -39,6 +39,8 @@
#include "cikd.h"
#include "uvd/uvd_4_2_d.h"
+#include "amdgpu_ras.h"
+
/* 1 second timeout */
#define UVD_IDLE_TIMEOUT msecs_to_jiffies(1000)
@@ -372,7 +374,13 @@ int amdgpu_uvd_suspend(struct amdgpu_device *adev)
if (!adev->uvd.inst[j].saved_bo)
return -ENOMEM;
- memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);
+ /* re-write 0 since err_event_athub will corrupt VCPU buffer */
+ if (amdgpu_ras_intr_triggered()) {
+ DRM_WARN("UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT\n");
+ memset(adev->uvd.inst[j].saved_bo, 0, size);
+ } else {
+ memcpy_fromio(adev->uvd.inst[j].saved_bo, ptr, size);
+ }
}
return 0;
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded
@ 2019-10-29 1:27 ` Chen, Guchun
0 siblings, 0 replies; 18+ messages in thread
From: Chen, Guchun @ 2019-10-29 1:27 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Ma, Le
Regards,
Guchun
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Le Ma
Sent: Monday, October 28, 2019 7:31 PM
To: amd-gfx@lists.freedesktop.org
Cc: Ma, Le <Le.Ma@amd.com>
Subject: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded
Otherwise next err_event_athub error cannot call gpu reset.
Change-Id: I5cd293f30f23876bf2a1860681bcb50f47713ecd
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 676cad1..51d74bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4089,6 +4089,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
}
}
+ if (!r && in_ras_intr)
+ atomic_set(&amdgpu_ras_in_intr, 0);
[Guchun]To access this atomic variable, maybe it's better we create a new function like reset or clear in amdgpu_ras.h or .c first, then we can call that function here, like we we do to amdgpu_ras_intr_triggered in this same function. This will do assist to modularity of ras driver.
skip_sched_resume:
list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
/*unlock kfd: SRIOV would do it separately */
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded
@ 2019-10-29 1:27 ` Chen, Guchun
0 siblings, 0 replies; 18+ messages in thread
From: Chen, Guchun @ 2019-10-29 1:27 UTC (permalink / raw)
To: Ma, Le, amd-gfx; +Cc: Ma, Le
Regards,
Guchun
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Le Ma
Sent: Monday, October 28, 2019 7:31 PM
To: amd-gfx@lists.freedesktop.org
Cc: Ma, Le <Le.Ma@amd.com>
Subject: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded
Otherwise next err_event_athub error cannot call gpu reset.
Change-Id: I5cd293f30f23876bf2a1860681bcb50f47713ecd
Signed-off-by: Le Ma <le.ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 676cad1..51d74bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4089,6 +4089,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
}
}
+ if (!r && in_ras_intr)
+ atomic_set(&amdgpu_ras_in_intr, 0);
[Guchun]To access this atomic variable, maybe it's better we create a new function like reset or clear in amdgpu_ras.h or .c first, then we can call that function here, like we we do to amdgpu_ras_intr_triggered in this same function. This will do assist to modularity of ras driver.
skip_sched_resume:
list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
/*unlock kfd: SRIOV would do it separately */
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
@ 2019-10-29 1:36 ` Chen, Guchun
0 siblings, 0 replies; 18+ messages in thread
From: Chen, Guchun @ 2019-10-29 1:36 UTC (permalink / raw)
To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Ma, Le
Regards,
Guchun
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Le Ma
Sent: Monday, October 28, 2019 7:31 PM
To: amd-gfx@lists.freedesktop.org
Cc: Ma, Le <Le.Ma@amd.com>
Subject: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
From: Le Ma <Le.Ma@amd.com>
Change-Id: Ia8a61a4b3bd529f0f691e43e69b299d7d151c0c2
Signed-off-by: Le Ma <Le.Ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 0db458f..876690a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -324,7 +324,11 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
RAS_CNTLR_INTERRUPT_CLEAR, 1);
WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl);
- amdgpu_ras_global_ras_isr(adev);
+ /*
+ * ras_controller_int is dedicated for nbif ras error,
+ * not the global interrupt for sync flood
+ */
+ amdgpu_ras_reset_gpu(adev, true);
[Guchun]We need to add one printing here to tell audience, who and why resets gpu? And moreover, in the removed global ras isr handler amdgpu_ras_global_ras_isr, we call amdgpu_ras_reset_gpu with is_baco parameter "false", but now we use "true" here?
}
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
@ 2019-10-29 1:36 ` Chen, Guchun
0 siblings, 0 replies; 18+ messages in thread
From: Chen, Guchun @ 2019-10-29 1:36 UTC (permalink / raw)
To: Ma, Le, amd-gfx; +Cc: Ma, Le
Regards,
Guchun
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Le Ma
Sent: Monday, October 28, 2019 7:31 PM
To: amd-gfx@lists.freedesktop.org
Cc: Ma, Le <Le.Ma@amd.com>
Subject: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
From: Le Ma <Le.Ma@amd.com>
Change-Id: Ia8a61a4b3bd529f0f691e43e69b299d7d151c0c2
Signed-off-by: Le Ma <Le.Ma@amd.com>
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 0db458f..876690a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -324,7 +324,11 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
RAS_CNTLR_INTERRUPT_CLEAR, 1);
WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl);
- amdgpu_ras_global_ras_isr(adev);
+ /*
+ * ras_controller_int is dedicated for nbif ras error,
+ * not the global interrupt for sync flood
+ */
+ amdgpu_ras_reset_gpu(adev, true);
[Guchun]We need to add one printing here to tell audience, who and why resets gpu? And moreover, in the removed global ras isr handler amdgpu_ras_global_ras_isr, we call amdgpu_ras_reset_gpu with is_baco parameter "false", but now we use "true" here?
}
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply related [flat|nested] 18+ messages in thread
* RE: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded
@ 2019-10-29 7:27 ` Ma, Le
0 siblings, 0 replies; 18+ messages in thread
From: Ma, Le @ 2019-10-29 7:27 UTC (permalink / raw)
To: Chen, Guchun, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
[-- Attachment #1.1: Type: text/plain, Size: 2395 bytes --]
> -----Original Message-----
> From: Chen, Guchun <Guchun.Chen@amd.com>
> Sent: Tuesday, October 29, 2019 9:28 AM
> To: Ma, Le <Le.Ma@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Ma, Le <Le.Ma@amd.com>
> Subject: RE: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu
> recovery succeeded
>
>
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Le Ma
> Sent: Monday, October 28, 2019 7:31 PM
> To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
> Cc: Ma, Le <Le.Ma@amd.com<mailto:Le.Ma@amd.com>>
> Subject: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery
> succeeded
>
> Otherwise next err_event_athub error cannot call gpu reset.
>
> Change-Id: I5cd293f30f23876bf2a1860681bcb50f47713ecd
> Signed-off-by: Le Ma <le.ma@amd.com<mailto:le.ma@amd.com>>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 676cad1..51d74bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4089,6 +4089,9 @@ int amdgpu_device_gpu_recover(struct
> amdgpu_device *adev,
> }
> }
>
> + if (!r && in_ras_intr)
> + atomic_set(&amdgpu_ras_in_intr, 0);
> [Guchun]To access this atomic variable, maybe it's better we create a new
> function like reset or clear in amdgpu_ras.h or .c first, then we can call that
> function here, like we we do to amdgpu_ras_intr_triggered in this same
> function. This will do assist to modularity of ras driver.
> [Le] Agree with you. We could make it paired with amdgpu_ras_intr_triggered.
> skip_sched_resume:
> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
> /*unlock kfd: SRIOV would do it separately */
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[-- Attachment #1.2: Type: text/html, Size: 7163 bytes --]
[-- Attachment #2: Type: text/plain, Size: 153 bytes --]
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded
@ 2019-10-29 7:27 ` Ma, Le
0 siblings, 0 replies; 18+ messages in thread
From: Ma, Le @ 2019-10-29 7:27 UTC (permalink / raw)
To: Chen, Guchun, amd-gfx
[-- Attachment #1.1: Type: text/plain, Size: 2395 bytes --]
> -----Original Message-----
> From: Chen, Guchun <Guchun.Chen@amd.com>
> Sent: Tuesday, October 29, 2019 9:28 AM
> To: Ma, Le <Le.Ma@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Ma, Le <Le.Ma@amd.com>
> Subject: RE: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu
> recovery succeeded
>
>
>
> Regards,
> Guchun
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Le Ma
> Sent: Monday, October 28, 2019 7:31 PM
> To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
> Cc: Ma, Le <Le.Ma@amd.com<mailto:Le.Ma@amd.com>>
> Subject: [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery
> succeeded
>
> Otherwise next err_event_athub error cannot call gpu reset.
>
> Change-Id: I5cd293f30f23876bf2a1860681bcb50f47713ecd
> Signed-off-by: Le Ma <le.ma@amd.com<mailto:le.ma@amd.com>>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 676cad1..51d74bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4089,6 +4089,9 @@ int amdgpu_device_gpu_recover(struct
> amdgpu_device *adev,
> }
> }
>
> + if (!r && in_ras_intr)
> + atomic_set(&amdgpu_ras_in_intr, 0);
> [Guchun]To access this atomic variable, maybe it's better we create a new
> function like reset or clear in amdgpu_ras.h or .c first, then we can call that
> function here, like we we do to amdgpu_ras_intr_triggered in this same
> function. This will do assist to modularity of ras driver.
> [Le] Agree with you. We could make it paired with amdgpu_ras_intr_triggered.
> skip_sched_resume:
> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
> /*unlock kfd: SRIOV would do it separately */
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[-- Attachment #1.2: Type: text/html, Size: 7163 bytes --]
[-- Attachment #2: Type: text/plain, Size: 153 bytes --]
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
@ 2019-10-29 7:37 ` Ma, Le
0 siblings, 0 replies; 18+ messages in thread
From: Ma, Le @ 2019-10-29 7:37 UTC (permalink / raw)
To: Chen, Guchun, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
[-- Attachment #1.1: Type: text/plain, Size: 2690 bytes --]
-----Original Message-----
From: Chen, Guchun <Guchun.Chen@amd.com>
Sent: Tuesday, October 29, 2019 9:37 AM
To: Ma, Le <Le.Ma@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Ma, Le <Le.Ma@amd.com>
Subject: RE: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
Regards,
Guchun
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Le Ma
Sent: Monday, October 28, 2019 7:31 PM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Ma, Le <Le.Ma@amd.com<mailto:Le.Ma@amd.com>>
Subject: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
From: Le Ma <Le.Ma@amd.com<mailto:Le.Ma@amd.com>>
Change-Id: Ia8a61a4b3bd529f0f691e43e69b299d7d151c0c2
Signed-off-by: Le Ma <Le.Ma@amd.com<mailto:Le.Ma@amd.com>>
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 0db458f..876690a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -324,7 +324,11 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
RAS_CNTLR_INTERRUPT_CLEAR, 1);
WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl);
- amdgpu_ras_global_ras_isr(adev);
+ /*
+ * ras_controller_int is dedicated for nbif ras error,
+ * not the global interrupt for sync flood
+ */
+ amdgpu_ras_reset_gpu(adev, true);
[Guchun]We need to add one printing here to tell audience, who and why resets gpu? And moreover, in the removed global ras isr handler amdgpu_ras_global_ras_isr, we call amdgpu_ras_reset_gpu with is_baco parameter "false", but now we use "true" here?
[Le] We may consider add printing here to indicate it’s ras controller interrupt issue. The is_baco parameter is unused and has no effect. Anyway, I will revise and hold on patch #2 and #4 when baco based RAS recovery totally works as Hawking’s comment.
}
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[-- Attachment #1.2: Type: text/html, Size: 8458 bytes --]
[-- Attachment #2: Type: text/plain, Size: 153 bytes --]
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
@ 2019-10-29 7:37 ` Ma, Le
0 siblings, 0 replies; 18+ messages in thread
From: Ma, Le @ 2019-10-29 7:37 UTC (permalink / raw)
To: Chen, Guchun, amd-gfx
[-- Attachment #1.1: Type: text/plain, Size: 2690 bytes --]
-----Original Message-----
From: Chen, Guchun <Guchun.Chen@amd.com>
Sent: Tuesday, October 29, 2019 9:37 AM
To: Ma, Le <Le.Ma@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Ma, Le <Le.Ma@amd.com>
Subject: RE: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
Regards,
Guchun
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org<mailto:amd-gfx-bounces@lists.freedesktop.org>> On Behalf Of Le Ma
Sent: Monday, October 28, 2019 7:31 PM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Ma, Le <Le.Ma@amd.com<mailto:Le.Ma@amd.com>>
Subject: [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler
From: Le Ma <Le.Ma@amd.com<mailto:Le.Ma@amd.com>>
Change-Id: Ia8a61a4b3bd529f0f691e43e69b299d7d151c0c2
Signed-off-by: Le Ma <Le.Ma@amd.com<mailto:Le.Ma@amd.com>>
---
drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 0db458f..876690a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -324,7 +324,11 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
RAS_CNTLR_INTERRUPT_CLEAR, 1);
WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl);
- amdgpu_ras_global_ras_isr(adev);
+ /*
+ * ras_controller_int is dedicated for nbif ras error,
+ * not the global interrupt for sync flood
+ */
+ amdgpu_ras_reset_gpu(adev, true);
[Guchun]We need to add one printing here to tell audience, who and why resets gpu? And moreover, in the removed global ras isr handler amdgpu_ras_global_ras_isr, we call amdgpu_ras_reset_gpu with is_baco parameter "false", but now we use "true" here?
[Le] We may consider add printing here to indicate it’s ras controller interrupt issue. The is_baco parameter is unused and has no effect. Anyway, I will revise and hold on patch #2 and #4 when baco based RAS recovery totally works as Hawking’s comment.
}
}
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[-- Attachment #1.2: Type: text/html, Size: 8458 bytes --]
[-- Attachment #2: Type: text/plain, Size: 153 bytes --]
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2019-10-29 7:37 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-28 11:31 [PATCH 1/4] drm/amdgpu: clear UVD VCPU buffer when err_event_athub generated Le Ma
2019-10-28 11:31 ` Le Ma
[not found] ` <1572262269-14985-1-git-send-email-le.ma-5C7GfCeVMHo@public.gmane.org>
2019-10-28 11:31 ` [PATCH 2/4] drm/amdgpu: reset err_event_athub flag if gpu recovery succeeded Le Ma
2019-10-28 11:31 ` Le Ma
[not found] ` <1572262269-14985-2-git-send-email-le.ma-5C7GfCeVMHo@public.gmane.org>
2019-10-29 1:27 ` Chen, Guchun
2019-10-29 1:27 ` Chen, Guchun
[not found] ` <BYAPR12MB280615A3803ADC31A4AE9C8EF1610-ZGDeBxoHBPk0CuAkIMgl3QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-10-29 7:27 ` Ma, Le
2019-10-29 7:27 ` Ma, Le
2019-10-28 11:31 ` [PATCH 3/4] drm/amdgpu: bypass some cleanup work after err_event_athub Le Ma
2019-10-28 11:31 ` Le Ma
2019-10-28 11:31 ` [PATCH 4/4] drm/amdgpu: remove ras global recovery handling from ras_controller_int handler Le Ma
2019-10-28 11:31 ` Le Ma
[not found] ` <1572262269-14985-4-git-send-email-le.ma-5C7GfCeVMHo@public.gmane.org>
2019-10-29 1:36 ` Chen, Guchun
2019-10-29 1:36 ` Chen, Guchun
[not found] ` <BYAPR12MB2806A8C355785EFB07D88E2EF1610-ZGDeBxoHBPk0CuAkIMgl3QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-10-29 7:37 ` Ma, Le
2019-10-29 7:37 ` Ma, Le
2019-10-28 11:53 ` [PATCH 1/4] drm/amdgpu: clear UVD VCPU buffer when err_event_athub generated Zhang, Hawking
2019-10-28 11:53 ` Zhang, Hawking
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.