amd-gfx.lists.freedesktop.org archive mirror
* [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
@ 2020-05-19  9:14 Clements, John
  2020-05-19 10:26 ` Zhang, Hawking
  0 siblings, 1 reply; 4+ messages in thread
From: Clements, John @ 2020-05-19  9:14 UTC (permalink / raw)
  To: amd-gfx, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 123 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Submitting a patch to block SMU accesses via SMI during RAS recovery.
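
For readers unfamiliar with the pattern, the guard this patch adds to each sysfs/hwmon entry point can be sketched in user space roughly as follows. This is an illustrative mock, not driver code: every mock_ name is hypothetical, the temperature value is made up, and a C11 atomic flag is assumed as a stand-in for whatever state amdgpu_ras_intr_triggered() consults in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's "RAS fatal interrupt seen" state. */
static atomic_int mock_ras_in_intr;

static void mock_ras_intr_set(void)   { atomic_store(&mock_ras_in_intr, 1); }
static void mock_ras_intr_clear(void) { atomic_store(&mock_ras_in_intr, 0); }

static bool mock_ras_intr_triggered(void)
{
	return atomic_load(&mock_ras_in_intr) != 0;
}

/* Counts how often the (mock) SMU is actually queried. */
static int mock_smu_access_count;

/* Pretend SMU query; the returned value is invented for the example. */
static int mock_smu_read_temp(void)
{
	mock_smu_access_count++;
	return 45000;
}

/*
 * Shape of a guarded "show" handler: while recovery is pending, report
 * "unavailable" and return before touching the SMU at all.
 */
static int mock_show_temp(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (mock_ras_intr_triggered())
		return snprintf(buf, len, "unavailable\n");

	return snprintf(buf, len, "%d\n", mock_smu_read_temp());
}
```

The point of the early return is that the SMU access counter does not move while the flag is set, so a tool polling through SMI cannot issue SMU messages while recovery is in progress.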

[-- Attachment #1.2: Type: text/html, Size: 1850 bytes --]

[-- Attachment #2: 0001-drm-amdgpu-resolve-ras-recovery-vs-smi-race-conditio.patch --]
[-- Type: application/octet-stream, Size: 3724 bytes --]

From 9c5ee349c318e4d1b142f8c1fd4d4003ef2b6a74 Mon Sep 17 00:00:00 2001
From: John Clements <john.clements@amd.com>
Date: Tue, 19 May 2020 17:11:54 +0800
Subject: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition

During RAS recovery, block SMU access via SMI.

Signed-off-by: John Clements <john.clements@amd.com>
Change-Id: Ia280055737f3adda50e4a4a15435ae00e3f07173
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c       | 16 ++++++++++++++++
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c   |  4 ++++
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index b75362bf0742..f0fb57d73a7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -31,6 +31,7 @@
 #include "amdgpu_dpm.h"
 #include "amdgpu_display.h"
 #include "amdgpu_smu.h"
+#include "amdgpu_ras.h"
 #include "atom.h"
 #include <linux/power_supply.h>
 #include <linux/pci.h>
@@ -104,6 +105,9 @@ int amdgpu_dpm_read_sensor(struct amdgpu_device *adev, enum amd_pp_sensors senso
 {
 	int ret = 0;
 
+	if (amdgpu_ras_intr_triggered())
+		return 0;
+
 	if (!data || !size)
 		return -EINVAL;
 
@@ -306,6 +310,9 @@ static ssize_t amdgpu_get_power_dpm_force_performance_level(struct device *dev,
 	if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
 		return 0;
 
+	if (amdgpu_ras_intr_triggered())
+		return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
 	ret = pm_runtime_get_sync(ddev->dev);
 	if (ret < 0)
 		return ret;
@@ -343,6 +350,9 @@ static ssize_t amdgpu_set_power_dpm_force_performance_level(struct device *dev,
 	enum amd_dpm_forced_level current_level = 0xff;
 	int ret = 0;
 
+	if (amdgpu_ras_intr_triggered())
+		return 0;
+
 	if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
 		return -EINVAL;
 
@@ -1674,6 +1684,9 @@ static ssize_t amdgpu_get_gpu_busy_percent(struct device *dev,
 	if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
 		return 0;
 
+	if (amdgpu_ras_intr_triggered())
+		return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
 	r = pm_runtime_get_sync(ddev->dev);
 	if (r < 0)
 		return r;
@@ -1967,6 +1980,9 @@ static ssize_t amdgpu_hwmon_show_temp(struct device *dev,
 	int channel = to_sensor_dev_attr(attr)->index;
 	int r, temp = 0, size = sizeof(temp);
 
+	if (amdgpu_ras_intr_triggered())
+		return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
 	if (channel >= PP_TEMP_MAX)
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 8c684a6e0156..15d295a83d2a 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -25,6 +25,7 @@
 
 #include "amdgpu.h"
 #include "amdgpu_smu.h"
+#include "amdgpu_ras.h"
 #include "smu_internal.h"
 #include "smu_v11_0.h"
 #include "smu_v12_0.h"
@@ -2421,6 +2422,9 @@ int smu_read_sensor(struct smu_context *smu,
 	struct amdgpu_device *adev = smu->adev;
 	int ret = 0;
 
+	if (amdgpu_ras_intr_triggered())
+		return 0;
+
 	if (!adev->pm.dpm_enabled)
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index cbf70122de9b..fa303b1c921a 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -623,6 +623,9 @@ static int arcturus_print_clk_levels(struct smu_context *smu,
 	struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
 	struct arcturus_dpm_table *dpm_table = NULL;
 
+	if (amdgpu_ras_intr_triggered())
+		return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
 	dpm_table = smu_dpm->dpm_context;
 
 	switch (type) {
-- 
2.17.1


[-- Attachment #3: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* RE: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
  2020-05-19  9:14 [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition Clements, John
@ 2020-05-19 10:26 ` Zhang, Hawking
  2020-05-20  2:26   ` Clements, John
  0 siblings, 1 reply; 4+ messages in thread
From: Zhang, Hawking @ 2020-05-19 10:26 UTC (permalink / raw)
  To: Clements, John, amd-gfx


[-- Attachment #1.1: Type: text/plain, Size: 543 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Please apply the check only to Arcturus; we don't need to check for the RAS fatal error event on all of the NV series.
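
One way to picture this request is a per-ASIC function table in which only the Arcturus callback carries the guard, while the common dispatcher and other ASICs stay unchanged. The sketch below is a user-space mock under that assumption: all mock_ names, the simplified read_sensor signature, and the sensor values are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_smu;

/* Hypothetical per-ASIC callback table. */
struct mock_ppt_funcs {
	int (*read_sensor)(struct mock_smu *smu, int *data);
};

struct mock_smu {
	const struct mock_ppt_funcs *funcs;
	int hw_reads;	/* how many times the mock hardware was queried */
};

/* Hypothetical stand-in for the RAS fatal-error state. */
static bool mock_ras_fatal;

/* Arcturus-style callback: the only place that checks the RAS state. */
static int mock_arcturus_read_sensor(struct mock_smu *smu, int *data)
{
	if (mock_ras_fatal)
		return 0;	/* skip the hardware; *data is left untouched */

	smu->hw_reads++;
	*data = 100;
	return 0;
}

/* Callback for another ASIC family: no RAS check at all. */
static int mock_navi_read_sensor(struct mock_smu *smu, int *data)
{
	smu->hw_reads++;
	*data = 200;
	return 0;
}

static const struct mock_ppt_funcs mock_arcturus_funcs = {
	.read_sensor = mock_arcturus_read_sensor,
};

static const struct mock_ppt_funcs mock_navi_funcs = {
	.read_sensor = mock_navi_read_sensor,
};

/* Common dispatcher: deliberately free of any RAS check. */
static int mock_smu_read_sensor(struct mock_smu *smu, int *data)
{
	if (!smu || !smu->funcs || !smu->funcs->read_sensor || !data)
		return -1;

	return smu->funcs->read_sensor(smu, data);
}
```

Note that a guarded callback returning 0 without writing the output means the caller sees success with whatever value was already in the buffer, so callers in this sketch should initialise it beforehand.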

Regards,
Hawking

[-- Attachment #1.2: Type: text/html, Size: 3179 bytes --]


* RE: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
  2020-05-19 10:26 ` Zhang, Hawking
@ 2020-05-20  2:26   ` Clements, John
  2020-05-20  2:27     ` Zhang, Hawking
  0 siblings, 1 reply; 4+ messages in thread
From: Clements, John @ 2020-05-20  2:26 UTC (permalink / raw)
  To: Zhang, Hawking, amd-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1023 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Thank you Hawking,

I have updated and tested the patch based on your recommendation.


[-- Attachment #1.2: Type: text/html, Size: 4412 bytes --]

[-- Attachment #2: 0001-drm-amdgpu-resolve-ras-recovery-vs-smi-race-conditio.patch --]
[-- Type: application/octet-stream, Size: 1302 bytes --]

From a39e2b0deaca5c08869e2073fe6ec00557b1fee5 Mon Sep 17 00:00:00 2001
From: John Clements <john.clements@amd.com>
Date: Wed, 20 May 2020 10:12:54 +0800
Subject: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition

During RAS recovery, block SMU access via SMI.

Signed-off-by: John Clements <john.clements@amd.com>
Change-Id: I268dd5ea5ff002b3489b75007f337874af54f780
---
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index cbf70122de9b..27c5fc9572b2 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -623,6 +623,9 @@ static int arcturus_print_clk_levels(struct smu_context *smu,
 	struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
 	struct arcturus_dpm_table *dpm_table = NULL;
 
+	if (amdgpu_ras_intr_triggered())
+		return snprintf(buf, PAGE_SIZE, "unavailable\n");
+
 	dpm_table = smu_dpm->dpm_context;
 
 	switch (type) {
@@ -998,6 +1001,9 @@ static int arcturus_read_sensor(struct smu_context *smu,
 	PPTable_t *pptable = table_context->driver_pptable;
 	int ret = 0;
 
+	if (amdgpu_ras_intr_triggered())
+		return 0;
+
 	if (!data || !size)
 		return -EINVAL;
 
-- 
2.17.1



* RE: [PATCH] drm/amdgpu: resolve ras recovery vs smi race condition
  2020-05-20  2:26   ` Clements, John
@ 2020-05-20  2:27     ` Zhang, Hawking
  0 siblings, 0 replies; 4+ messages in thread
From: Zhang, Hawking @ 2020-05-20  2:27 UTC (permalink / raw)
  To: Clements, John, amd-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1489 bytes --]

[AMD Official Use Only - Internal Distribution Only]

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>

Regards,
Hawking

[-- Attachment #1.2: Type: text/html, Size: 5500 bytes --]


end of thread, other threads:[~2020-05-20  2:27 UTC | newest]
