From: "Zhou1, Tao" <Tao.Zhou1@amd.com>
To: "Zhang, Hawking" <Hawking.Zhang@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Clements, John" <John.Clements@amd.com>,
	 "Yang, Stanley" <Stanley.Yang@amd.com>
Subject: Re: [PATCH 2/3] drm/amdgpu: set poison mode for RAS
Date: Sat, 18 Sep 2021 09:32:23 +0000
Message-ID: <DM6PR12MB46502029AC84D32CCE5B7CBFB0DE9@DM6PR12MB4650.namprd12.prod.outlook.com>
In-Reply-To: <BN9PR12MB5257DBA732DD7BED5FB54C1AFCDE9@BN9PR12MB5257.namprd12.prod.outlook.com>

[AMD Official Use Only]

Poison mode is currently a global setting. Will we set it per IP block in the future?
For example, would we set poison mode for GFX but fatal error mode for SDMA?

dgpu_mode is disabled when connected_to_cpu is 1; it is independent of the IP block.

Regards,
Tao
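
The flag behaviour discussed here can be sketched outside the driver. The following is only an illustration, assuming the patched psp_ras_initialize() logic shown later in this thread; the struct and function names (ras_init_flags_sketch, compute_init_flags) are hypothetical and are not part of amdgpu.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two TA init flags the patch touches. */
struct ras_init_flags_sketch {
	unsigned int poison_mode_en;
	unsigned int dgpu_mode;
};

/*
 * Mirrors the patched logic: two independent checks instead of the
 * earlier if/else pair keyed only on connected_to_cpu.
 */
static inline struct ras_init_flags_sketch
compute_init_flags(bool poison_enabled, bool connected_to_cpu)
{
	struct ras_init_flags_sketch flags = { 0, 0 };

	if (poison_enabled)
		flags.poison_mode_en = 1;
	if (!connected_to_cpu)
		flags.dgpu_mode = 1;

	return flags;
}
```

Under this sketch, a device that is not connected to the CPU and has poison enabled gets both flags set, which the earlier if/else form could not express.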
________________________________
From: Zhang, Hawking <Hawking.Zhang@amd.com>
Sent: Saturday, September 18, 2021 4:59 PM
To: Zhou1, Tao <Tao.Zhou1@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Clements, John <John.Clements@amd.com>; Yang, Stanley <Stanley.Yang@amd.com>
Subject: RE: [PATCH 2/3] drm/amdgpu: set poison mode for RAS

[AMD Official Use Only]

+       if (amdgpu_ras_is_poison_enabled(adev))
                 ras_cmd->ras_in_message.init_flags.poison_mode_en = 1;
-       else
+       if (!adev->gmc.xgmi.connected_to_cpu)
                 ras_cmd->ras_in_message.init_flags.dgpu_mode = 1;

I'd expect these flags to be set in the enable_feature command per IP block if needed, instead of as a global setting at the firmware/TA initialization phase. Thoughts?

Regards,
Hawking

-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1@amd.com>
Sent: Saturday, September 18, 2021 16:08
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang@amd.com>; Clements, John <John.Clements@amd.com>; Yang, Stanley <Stanley.Yang@amd.com>
Cc: Zhou1, Tao <Tao.Zhou1@amd.com>
Subject: [PATCH 2/3] drm/amdgpu: set poison mode for RAS

Add a RAS poison mode flag and pass it to the PSP RAS TA.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 28 +++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |  5 +++++
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 7d09b28889af..140b94da2f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1442,9 +1442,9 @@ static int psp_ras_initialize(struct psp_context *psp)
         ras_cmd = (struct ta_ras_shared_memory *)psp->ras_context.context.mem_context.shared_buf;
         memset(ras_cmd, 0, sizeof(struct ta_ras_shared_memory));

-       if (psp->adev->gmc.xgmi.connected_to_cpu)
+       if (amdgpu_ras_is_poison_enabled(adev))
                 ras_cmd->ras_in_message.init_flags.poison_mode_en = 1;
-       else
+       if (!adev->gmc.xgmi.connected_to_cpu)
                 ras_cmd->ras_in_message.init_flags.dgpu_mode = 1;

         ret = psp_ras_load(psp);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index b5332db4d287..7b7e54fdd785 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2180,6 +2180,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
 {
         struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
         int r;
+       bool df_poison, umc_poison;

         if (con)
                 return 0;
@@ -2249,6 +2250,23 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
                         goto release_con;
         }

+       /* Init poison mode, the default value is false */
+       if (adev->df.funcs &&
+           adev->df.funcs->query_ras_poison_mode &&
+           adev->umc.ras_funcs &&
+           adev->umc.ras_funcs->query_ras_poison_mode) {
+               df_poison =
+                       adev->df.funcs->query_ras_poison_mode(adev);
+               umc_poison =
+                       adev->umc.ras_funcs->query_ras_poison_mode(adev);
+               /* Only poison is set in both DF and UMC, we can enable it */
+               if (df_poison && umc_poison)
+                       con->poison_mode_en = true;
+               else if (df_poison != umc_poison)
+                       dev_warn(adev->dev, "Poison setting is inconsistent in DF/UMC(%d:%d)!\n",
+                                       df_poison, umc_poison);
+       }
+
         if (amdgpu_ras_fs_init(adev)) {
                 r = -EINVAL;
                 goto release_con;
@@ -2292,6 +2310,16 @@ static int amdgpu_persistent_edc_harvesting(struct amdgpu_device *adev,
         return 0;
 }

+bool amdgpu_ras_is_poison_enabled(struct amdgpu_device *adev)
+{
+       struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
+
+       if (!con)
+               return false;
+
+       return con->poison_mode_en;
+}
+
 /* helper function to handle common stuff in ip late init phase */
 int amdgpu_ras_late_init(struct amdgpu_device *adev,
                          struct ras_common_if *ras_block,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index 1670467c2054..044bd19b7cce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -345,6 +345,9 @@ struct amdgpu_ras {
         /* disable ras error count harvest in recovery */
         bool disable_ras_err_cnt_harvest;

+       /* is poison mode */
+       bool poison_mode_en;
+
         /* RAS count errors delayed work */
         struct delayed_work ras_counte_delay_work;
         atomic_t ras_ue_count;
@@ -640,4 +643,6 @@ void amdgpu_release_ras_context(struct amdgpu_device *adev);

 int amdgpu_persistent_edc_harvesting_supported(struct amdgpu_device *adev);

+bool amdgpu_ras_is_poison_enabled(struct amdgpu_device *adev);
+
 #endif
--
2.17.1
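
The poison-mode decision added to amdgpu_ras_init() above can be read as a small truth table: poison mode is enabled only when both DF and UMC report it, and a mismatch is flagged. A minimal standalone sketch of that rule follows; the helper name resolve_poison_mode and its out-parameter are assumptions for illustration, not driver API.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true only when both blocks report poison mode.
 * If 'inconsistent' is non-NULL, it is set when the two reports differ,
 * which is the case where the patch emits dev_warn().
 */
static inline bool resolve_poison_mode(bool df_poison, bool umc_poison,
				       bool *inconsistent)
{
	if (inconsistent != NULL)
		*inconsistent = (df_poison != umc_poison);

	return df_poison && umc_poison;
}
```

When neither block reports poison mode, the result is false and no inconsistency is flagged, matching the patch's default of leaving poison_mode_en unset.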

Thread overview: 7+ messages
2021-09-18  8:07 [PATCH 1/3] drm/amdgpu: add poison mode query for UMC Tao Zhou
2021-09-18  8:07 ` [PATCH 2/3] drm/amdgpu: set poison mode for RAS Tao Zhou
2021-09-18  8:59   ` Zhang, Hawking
2021-09-18  9:32     ` Zhou1, Tao [this message]
2021-09-18  8:07 ` [PATCH 3/3] drm/amdgpu: skip umc ras irq handling in poison mode Tao Zhou
2021-09-18  9:10   ` Zhang, Hawking
2021-09-18  8:59 ` [PATCH 1/3] drm/amdgpu: add poison mode query for UMC Zhang, Hawking
