amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: "Zhang, Hawking" <Hawking.Zhang@amd.com>
To: "Chen, Guchun" <Guchun.Chen@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Li, Dennis" <Dennis.Li@amd.com>,
	"Zhou1, Tao" <Tao.Zhou1@amd.com>,
	"Clements, John" <John.Clements@amd.com>
Subject: RE: [PATCH 2/2] drm/amdgpu: record non-zero error counter info in NBIO before resetting GPU
Date: Fri, 14 Feb 2020 14:59:26 +0000	[thread overview]
Message-ID: <MWHPR12MB14246036D67146A286A0128EFC150@MWHPR12MB1424.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20200214140650.2133-2-guchun.chen@amd.com>

[AMD Official Use Only - Internal Distribution Only]

Series is

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>

Regards,
Hawking
-----Original Message-----
From: Chen, Guchun <Guchun.Chen@amd.com> 
Sent: Friday, February 14, 2020 22:07
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang@amd.com>; Li, Dennis <Dennis.Li@amd.com>; Zhou1, Tao <Tao.Zhou1@amd.com>; Clements, John <John.Clements@amd.com>
Cc: Chen, Guchun <Guchun.Chen@amd.com>
Subject: [PATCH 2/2] drm/amdgpu: record non-zero error counter info in NBIO before resetting GPU

When NBIO's RAS error happens, before trigging GPU reset, it's needed to record error counter information, which can correct the error counter value missed issue when reading from debugfs.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index 65eb378fa035..149d386590df 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -318,6 +318,7 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device  {
 	uint32_t bif_doorbell_intr_cntl;
 	struct ras_manager *obj = amdgpu_ras_find_obj(adev, adev->nbio.ras_if);
+	struct ras_err_data err_data = {0, 0, 0, NULL};
 
 	bif_doorbell_intr_cntl = RREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL);
 	if (REG_GET_FIELD(bif_doorbell_intr_cntl,
@@ -332,7 +333,19 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
 		 * clear error status after ras_controller_intr according to
 		 * hw team and count ue number for query
 		 */
-		nbio_v7_4_query_ras_error_count(adev, &obj->err_data);
+		nbio_v7_4_query_ras_error_count(adev, &err_data);
+
+		/* logging on error counter and printing for awareness */
+		obj->err_data.ue_count += err_data.ue_count;
+		obj->err_data.ce_count += err_data.ce_count;
+
+		if (err_data.ce_count)
+			DRM_INFO("%ld correctable errors detected in %s block\n",
+				obj->err_data.ce_count, adev->nbio.ras_if->name);
+
+		if (err_data.ue_count)
+			DRM_INFO("%ld uncorrectable errors detected in %s block\n",
+				obj->err_data.ue_count, adev->nbio.ras_if->name);
 
 		DRM_WARN("RAS controller interrupt triggered by NBIF error\n");
 
--
2.17.1
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

      reply	other threads:[~2020-02-14 14:59 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-14 14:06 [PATCH 1/2] drm/amdgpu: log on non-zero error conter per IP before GPU reset Guchun Chen
2020-02-14 14:06 ` [PATCH 2/2] drm/amdgpu: record non-zero error counter info in NBIO before resetting GPU Guchun Chen
2020-02-14 14:59   ` Zhang, Hawking [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR12MB14246036D67146A286A0128EFC150@MWHPR12MB1424.namprd12.prod.outlook.com \
    --to=hawking.zhang@amd.com \
    --cc=Dennis.Li@amd.com \
    --cc=Guchun.Chen@amd.com \
    --cc=John.Clements@amd.com \
    --cc=Tao.Zhou1@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).