* [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
From: Clements, John @ 2020-05-19 9:14 UTC
To: amd-gfx, Zhang, Hawking
[-- Attachment #1.1: Type: text/plain, Size: 123 bytes --]
[AMD Official Use Only - Internal Distribution Only]
Submitting a patch to block SMU accesses via SMI during RAS recovery.
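The approach described here, an early return from SMI-facing handlers while RAS recovery is in flight, can be sketched in standalone C. This is only an illustration under stated assumptions: `ras_in_recovery`, `ras_recovery_begin()`, `ras_recovery_end()`, `ras_recovery_active()` and `show_busy_percent()` are hypothetical stand-ins and are not the driver's symbols, and the value 42 is a placeholder for a real SMU query.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's "RAS interrupt seen" state. */
static atomic_int ras_in_recovery;

static void ras_recovery_begin(void) { atomic_store(&ras_in_recovery, 1); }
static void ras_recovery_end(void)   { atomic_store(&ras_in_recovery, 0); }
static int  ras_recovery_active(void) { return atomic_load(&ras_in_recovery) != 0; }

/*
 * Hypothetical sysfs-style "show" handler. While recovery is active it
 * reports "unavailable" instead of querying the (simulated) SMU.
 * Returns the number of characters written, as snprintf() does.
 */
static int show_busy_percent(char *buf, size_t len)
{
	assert(buf != NULL);

	if (ras_recovery_active())
		return snprintf(buf, len, "unavailable\n");

	/* Placeholder for a real SMU query; 42 is an arbitrary value. */
	return snprintf(buf, len, "%d\n", 42);
}
```

A flag check of this kind only keeps new requests from being issued once the flag is set. A reader that passed the check just before recovery began is not stopped, so the sketch narrows the race window rather than serializing the two paths.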
[-- Attachment #1.2: Type: text/html, Size: 1850 bytes --]
[-- Attachment #2: 0001-drm-amdgpu-resolve-ras-recovery-vs-smi-race-conditio.patch --]
[-- Type: application/octet-stream, Size: 3724 bytes --]
From 9c5ee349c318e4d1b142f8c1fd4d4003ef2b6a74 Mon Sep 17 00:00:00 2001
From: John Clements <john.clements@amd.com>
Date: Tue, 19 May 2020 17:11:54 +0800
Subject: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
During RAS recovery, block SMU access via SMI.
Signed-off-by: John Clements <john.clements@amd.com>
Change-Id: Ia280055737f3adda50e4a4a15435ae00e3f07173
---
drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 16 ++++++++++++++++
drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 4 ++++
drivers/gpu/drm/amd/powerplay/arcturus_ppt.c | 3 +++
3 files changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index b75362bf0742..f0fb57d73a7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -31,6 +31,7 @@
#include "amdgpu_dpm.h"
#include "amdgpu_display.h"
#include "amdgpu_smu.h"
+#include "amdgpu_ras.h"
#include "atom.h"
#include <linux/power_supply.h>
#include <linux/pci.h>
@@ -104,6 +105,9 @@ int amdgpu_dpm_read_sensor(struct amdgpu_device *adev, enum amd_pp_sensors senso
{
int ret = 0;
+ if (amdgpu_ras_intr_triggered())
+ return 0;
+
if (!data || !size)
return -EINVAL;
@@ -306,6 +310,9 @@ static ssize_t amdgpu_get_power_dpm_force_performance_level(struct device *dev,
if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
return 0;
+ if (amdgpu_ras_intr_triggered())
+ return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
ret = pm_runtime_get_sync(ddev->dev);
if (ret < 0)
return ret;
@@ -343,6 +350,9 @@ static ssize_t amdgpu_set_power_dpm_force_performance_level(struct device *dev,
enum amd_dpm_forced_level current_level = 0xff;
int ret = 0;
+ if (amdgpu_ras_intr_triggered())
+ return 0;
+
if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
return -EINVAL;
@@ -1674,6 +1684,9 @@ static ssize_t amdgpu_get_gpu_busy_percent(struct device *dev,
if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
return 0;
+ if (amdgpu_ras_intr_triggered())
+ return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
r = pm_runtime_get_sync(ddev->dev);
if (r < 0)
return r;
@@ -1967,6 +1980,9 @@ static ssize_t amdgpu_hwmon_show_temp(struct device *dev,
int channel = to_sensor_dev_attr(attr)->index;
int r, temp = 0, size = sizeof(temp);
+ if (amdgpu_ras_intr_triggered())
+ return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
if (channel >= PP_TEMP_MAX)
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 8c684a6e0156..15d295a83d2a 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -25,6 +25,7 @@
#include "amdgpu.h"
#include "amdgpu_smu.h"
+#include "amdgpu_ras.h"
#include "smu_internal.h"
#include "smu_v11_0.h"
#include "smu_v12_0.h"
@@ -2421,6 +2422,9 @@ int smu_read_sensor(struct smu_context *smu,
struct amdgpu_device *adev = smu->adev;
int ret = 0;
+ if (amdgpu_ras_intr_triggered())
+ return 0;
+
if (!adev->pm.dpm_enabled)
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index cbf70122de9b..fa303b1c921a 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -623,6 +623,9 @@ static int arcturus_print_clk_levels(struct smu_context *smu,
struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
struct arcturus_dpm_table *dpm_table = NULL;
+ if (amdgpu_ras_intr_triggered())
+ return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
dpm_table = smu_dpm->dpm_context;
switch (type) {
--
2.17.1
[-- Attachment #3: Type: text/plain, Size: 154 bytes --]
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* RE: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
From: Zhang, Hawking @ 2020-05-19 10:26 UTC
To: Clements, John, amd-gfx
[-- Attachment #1.1: Type: text/plain, Size: 543 bytes --]
Please apply the check only to Arcturus; we don't need to check for the RAS fatal error event on all of the NV series.
Regards,
Hawking
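One way to picture this request is to move the guard out of the common dispatch path and into a single ASIC's callback, so other ASICs never evaluate it. The sketch below is a hedged illustration only: `struct ppt_funcs`, `asic_a_*`, `asic_b_*`, `common_read_sensor()` and `ras_fatal_seen` are hypothetical names and do not reflect the driver's actual tables or signatures, and the sensor values are arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag standing in for "a RAS fatal error was reported". */
static atomic_int ras_fatal_seen;

/* Hypothetical per-ASIC callback table. */
struct ppt_funcs {
	int (*read_sensor)(unsigned int *data);
};

/* ASIC A carries the guard: during recovery it skips the query. */
static int asic_a_read_sensor(unsigned int *data)
{
	assert(data != NULL);
	if (atomic_load(&ras_fatal_seen))
		return 0;	/* success is reported, *data is left untouched */
	*data = 55;		/* arbitrary placeholder reading */
	return 0;
}

/* ASIC B has no guard, so the flag never affects it. */
static int asic_b_read_sensor(unsigned int *data)
{
	assert(data != NULL);
	*data = 60;		/* arbitrary placeholder reading */
	return 0;
}

static const struct ppt_funcs asic_a_funcs = { .read_sensor = asic_a_read_sensor };
static const struct ppt_funcs asic_b_funcs = { .read_sensor = asic_b_read_sensor };

/* Common entry point: validates arguments, then dispatches; no RAS check here. */
static int common_read_sensor(const struct ppt_funcs *funcs, unsigned int *data)
{
	if (!funcs || !funcs->read_sensor || !data)
		return -EINVAL;
	return funcs->read_sensor(data);
}
```

In this sketch the guarded path returns 0 without writing `*data`, so a caller cannot distinguish a skipped read from a successful one; whether that is acceptable depends on how callers treat the output buffer.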
* RE: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
From: Clements, John @ 2020-05-20 2:26 UTC
To: Zhang, Hawking, amd-gfx
[-- Attachment #1.1: Type: text/plain, Size: 1023 bytes --]
Thank you, Hawking.
I have updated and tested the patch based on your recommendation.
[-- Attachment #1.2: Type: text/html, Size: 4412 bytes --]
[-- Attachment #2: 0001-drm-amdgpu-resolve-ras-recovery-vs-smi-race-conditio.patch --]
[-- Type: application/octet-stream, Size: 1302 bytes --]
From a39e2b0deaca5c08869e2073fe6ec00557b1fee5 Mon Sep 17 00:00:00 2001
From: John Clements <john.clements@amd.com>
Date: Wed, 20 May 2020 10:12:54 +0800
Subject: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
During RAS recovery, block SMU access via SMI.
Signed-off-by: John Clements <john.clements@amd.com>
Change-Id: I268dd5ea5ff002b3489b75007f337874af54f780
---
drivers/gpu/drm/amd/powerplay/arcturus_ppt.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index cbf70122de9b..27c5fc9572b2 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -623,6 +623,9 @@ static int arcturus_print_clk_levels(struct smu_context *smu,
struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
struct arcturus_dpm_table *dpm_table = NULL;
+ if (amdgpu_ras_intr_triggered())
+ return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
dpm_table = smu_dpm->dpm_context;
switch (type) {
@@ -998,6 +1001,9 @@ static int arcturus_read_sensor(struct smu_context *smu,
PPTable_t *pptable = table_context->driver_pptable;
int ret = 0;
+ if (amdgpu_ras_intr_triggered())
+ return 0;
+
if (!data || !size)
return -EINVAL;
--
2.17.1
* RE: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
From: Zhang, Hawking @ 2020-05-20 2:27 UTC
To: Clements, John, amd-gfx
[-- Attachment #1.1: Type: text/plain, Size: 1489 bytes --]
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Regards,
Hawking