* [PATCH] drm/amdgpu: Restore information reporting in RAS
@ 2021-10-25 16:02 Luben Tuikov
2021-10-25 16:07 ` Russell, Kent
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Luben Tuikov @ 2021-10-25 16:02 UTC
To: amd-gfx; +Cc: Luben Tuikov, Kent Russell, Alex Deucher
A recent patch took away the reporting of number of RAS records and
the threshold due to the way it was edited/spliced on top of the code.
This patch restores this reporting.
Cc: Kent Russell <kent.russell@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index ae64ca02ccc4f8..05117eda105b55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
res = 0;
} else {
*exceed_err_limit = true;
- dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
+ dev_err(adev->dev,
+ "RAS records:%d exceed threshold:%d, "
+ "GPU will not be initialized. Replace this GPU or increase the threshold",
+ control->ras_num_recs, ras->bad_page_cnt_threshold);
}
}
} else {
base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
--
2.33.1.558.g2bd2f258f4
* RE: [PATCH] drm/amdgpu: Restore information reporting in RAS
2021-10-25 16:02 [PATCH] drm/amdgpu: Restore information reporting in RAS Luben Tuikov
@ 2021-10-25 16:07 ` Russell, Kent
2021-10-25 16:15 ` Alex Deucher
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Russell, Kent @ 2021-10-25 16:07 UTC
To: Tuikov, Luben, amd-gfx; +Cc: Deucher, Alexander
[AMD Official Use Only]
Thanks Luben
Reviewed-by: Kent Russell <kent.russell@amd.com>
Kent
> -----Original Message-----
> From: Tuikov, Luben <Luben.Tuikov@amd.com>
> Sent: Monday, October 25, 2021 12:02 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tuikov, Luben <Luben.Tuikov@amd.com>; Russell, Kent <Kent.Russell@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: [PATCH] drm/amdgpu: Restore information reporting in RAS
>
> A recent patch took away the reporting of number of RAS records and
> the threshold due to the way it was edited/spliced on top of the code.
> This patch restores this reporting.
>
> Cc: Kent Russell <kent.russell@amd.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>
> Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index ae64ca02ccc4f8..05117eda105b55 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
> res = 0;
> } else {
> *exceed_err_limit = true;
> - dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> + dev_err(adev->dev,
> + "RAS records:%d exceed threshold:%d, "
> + "GPU will not be initialized. Replace this GPU or increase the threshold",
> + control->ras_num_recs, ras->bad_page_cnt_threshold);
> }
> }
> } else {
>
> base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
> --
> 2.33.1.558.g2bd2f258f4
* Re: [PATCH] drm/amdgpu: Restore information reporting in RAS
2021-10-25 16:02 [PATCH] drm/amdgpu: Restore information reporting in RAS Luben Tuikov
2021-10-25 16:07 ` Russell, Kent
@ 2021-10-25 16:15 ` Alex Deucher
2021-10-25 16:27 ` Lazar, Lijo
2021-10-25 17:30 ` Felix Kuehling
3 siblings, 0 replies; 6+ messages in thread
From: Alex Deucher @ 2021-10-25 16:15 UTC
To: Luben Tuikov; +Cc: amd-gfx list, Kent Russell, Alex Deucher
On Mon, Oct 25, 2021 at 12:02 PM Luben Tuikov <luben.tuikov@amd.com> wrote:
>
> A recent patch took away the reporting of number of RAS records and
> the threshold due to the way it was edited/spliced on top of the code.
> This patch restores this reporting.
>
> Cc: Kent Russell <kent.russell@amd.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>
> Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index ae64ca02ccc4f8..05117eda105b55 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
> res = 0;
> } else {
> *exceed_err_limit = true;
> - dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> + dev_err(adev->dev,
> + "RAS records:%d exceed threshold:%d, "
> + "GPU will not be initialized. Replace this GPU or increase the threshold",
> + control->ras_num_recs, ras->bad_page_cnt_threshold);
> }
> }
> } else {
>
> base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
> --
> 2.33.1.558.g2bd2f258f4
>
* Re: [PATCH] drm/amdgpu: Restore information reporting in RAS
2021-10-25 16:02 [PATCH] drm/amdgpu: Restore information reporting in RAS Luben Tuikov
2021-10-25 16:07 ` Russell, Kent
2021-10-25 16:15 ` Alex Deucher
@ 2021-10-25 16:27 ` Lazar, Lijo
2021-10-25 17:30 ` Felix Kuehling
3 siblings, 0 replies; 6+ messages in thread
From: Lazar, Lijo @ 2021-10-25 16:27 UTC
To: Tuikov, Luben, amd-gfx; +Cc: Tuikov, Luben, Russell, Kent, Deucher, Alexander
[Public]
Does the message need to mention the newly added option to ignore the threshold?
Thanks,
Lijo
* Re: [PATCH] drm/amdgpu: Restore information reporting in RAS
2021-10-25 16:02 [PATCH] drm/amdgpu: Restore information reporting in RAS Luben Tuikov
` (2 preceding siblings ...)
2021-10-25 16:27 ` Lazar, Lijo
@ 2021-10-25 17:30 ` Felix Kuehling
2021-10-25 17:59 ` Russell, Kent
3 siblings, 1 reply; 6+ messages in thread
From: Felix Kuehling @ 2021-10-25 17:30 UTC
To: Luben Tuikov, amd-gfx; +Cc: Kent Russell, Alex Deucher
On 2021-10-25 at 12:02 p.m., Luben Tuikov wrote:
> A recent patch took away the reporting of number of RAS records and
> the threshold due to the way it was edited/spliced on top of the code.
> This patch restores this reporting.
>
> Cc: Kent Russell <kent.russell@amd.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>
> Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index ae64ca02ccc4f8..05117eda105b55 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
> res = 0;
> } else {
> *exceed_err_limit = true;
> - dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> + dev_err(adev->dev,
> + "RAS records:%d exceed threshold:%d, "
> + "GPU will not be initialized. Replace this GPU or increase the threshold",
Splitting messages across multiple lines is usually discouraged
(presumably because it makes them hard to grep). I think checkpatch will
treat this as an error, while a long line is just a warning. Therefore
it seems that long lines are less bad than split messages.
Regards,
Felix
> + control->ras_num_recs, ras->bad_page_cnt_threshold);
> }
> }
> } else {
>
> base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
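For reference, a minimal sketch of the alternative raised in the comment above, where the format string stays in one literal so that a grep for any part of the logged text finds it. This is a userspace stand-in, not the driver code: snprintf() replaces dev_err(), the function name format_ras_threshold_msg is hypothetical, and plain int arguments are assumed in place of the driver's fields.

```c
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for the dev_err() call in the patch.
 * snprintf() accepts the same printf-style format, so the message text
 * can be checked outside the kernel.
 */
int format_ras_threshold_msg(char *buf, size_t len,
			     int num_recs, int threshold)
{
	/*
	 * The user-visible message is a single string literal. The source
	 * line is long, but searching the tree for a phrase that spans the
	 * old split point (for example "threshold:%d, GPU will not") now
	 * matches.
	 */
	return snprintf(buf, len,
			"RAS records:%d exceed threshold:%d, GPU will not be initialized. Replace this GPU or increase the threshold",
			num_recs, threshold);
}
```

Calling it with a buffer of a few hundred bytes and the two counts yields the same text the patched dev_err() would log; only the layout of the source line differs.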
* RE: [PATCH] drm/amdgpu: Restore information reporting in RAS
2021-10-25 17:30 ` Felix Kuehling
@ 2021-10-25 17:59 ` Russell, Kent
0 siblings, 0 replies; 6+ messages in thread
From: Russell, Kent @ 2021-10-25 17:59 UTC
To: Kuehling, Felix, Tuikov, Luben, amd-gfx; +Cc: Deucher, Alexander
[AMD Official Use Only]
> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> Sent: Monday, October 25, 2021 1:30 PM
> To: Tuikov, Luben <Luben.Tuikov@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Russell, Kent <Kent.Russell@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Restore information reporting in RAS
>
> On 2021-10-25 at 12:02 p.m., Luben Tuikov wrote:
> > A recent patch took away the reporting of number of RAS records and
> > the threshold due to the way it was edited/spliced on top of the code.
> > This patch restores this reporting.
> >
> > Cc: Kent Russell <kent.russell@amd.com>
> > Cc: Alex Deucher <Alexander.Deucher@amd.com>
> > Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
> > Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > index ae64ca02ccc4f8..05117eda105b55 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > @@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
> > res = 0;
> > } else {
> > *exceed_err_limit = true;
> > - dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> > + dev_err(adev->dev,
> > + "RAS records:%d exceed threshold:%d, "
> > + "GPU will not be initialized. Replace this GPU or increase the threshold",
>
> Splitting messages across multiple lines is usually discouraged
> (presumably because it makes them hard to grep). I think checkpatch will
> treat this as an error, while a long line is just a warning. Therefore
> it seems that long lines are less bad than split messages.
There are a few spots in the eeprom file where it gets done like this; I don't really like it either. Under https://www.kernel.org/doc/html/v5.13/process/coding-style.html, I see it supporting splitting for ASM (point 20) but not for regular strings (point 2).
In this one he's just restoring something I dropped, verbatim, so I have no issue giving it my RB.
Kent
>
> Regards,
> Felix
>
>
> > + control->ras_num_recs, ras->bad_page_cnt_threshold);
> > }
> > }
> > } else {
> >
> > base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d