* [PATCH] drm/amdgpu: Restore information reporting in RAS
@ 2021-10-25 16:02 Luben Tuikov
  2021-10-25 16:07 ` Russell, Kent
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Luben Tuikov @ 2021-10-25 16:02 UTC (permalink / raw)
  To: amd-gfx; +Cc: Luben Tuikov, Kent Russell, Alex Deucher

A recent patch took away the reporting of number of RAS records and
the threshold due to the way it was edited/spliced on top of the code.
This patch restores this reporting.

Cc: Kent Russell <kent.russell@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index ae64ca02ccc4f8..05117eda105b55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
 				res = 0;
 			} else {
 				*exceed_err_limit = true;
-				dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
+				dev_err(adev->dev,
+					"RAS records:%d exceed threshold:%d, "
+					"GPU will not be initialized. Replace this GPU or increase the threshold",
+					control->ras_num_recs, ras->bad_page_cnt_threshold);
 			}
 		}
 	} else {

base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
-- 
2.33.1.558.g2bd2f258f4
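
For illustration only, the difference between the message before and after this patch can be sketched in user-space C. The names `format_ras_exceed_msg`, `msg_reports_counts` and `OLD_MSG` are hypothetical and not part of the driver; `snprintf` stands in for `dev_err`, and plain `int` arguments are assumed because the patch formats both values with `%d`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Message text as it stood before this patch (the removed '-' line). */
const char OLD_MSG[] =
	"GPU will not be initialized. Replace this GPU or increase the threshold.";

/*
 * Hypothetical user-space stand-in for the restored dev_err() call.
 * The int parameters mirror the %d conversions used in the patch; the
 * real driver field types are not shown in this thread.
 */
int format_ras_exceed_msg(char *buf, size_t len, int num_recs, int threshold)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"RAS records:%d exceed threshold:%d, GPU will not be initialized. Replace this GPU or increase the threshold",
			num_recs, threshold);
}

/* Returns 1 if the message carries the record-count prefix, else 0. */
int msg_reports_counts(const char *msg)
{
	assert(msg != NULL);
	return strstr(msg, "RAS records:") != NULL;
}
```

With this sketch, `format_ras_exceed_msg(buf, sizeof(buf), 12, 10)` produces a string that starts with `RAS records:12 exceed threshold:10, `, whereas `OLD_MSG` tells the reader nothing about how far over the threshold the GPU is.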



* RE: [PATCH] drm/amdgpu: Restore information reporting in RAS
  2021-10-25 16:02 [PATCH] drm/amdgpu: Restore information reporting in RAS Luben Tuikov
@ 2021-10-25 16:07 ` Russell, Kent
  2021-10-25 16:15 ` Alex Deucher
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Russell, Kent @ 2021-10-25 16:07 UTC (permalink / raw)
  To: Tuikov, Luben, amd-gfx; +Cc: Deucher, Alexander

[AMD Official Use Only]

Thanks Luben

Reviewed-by: Kent Russell <kent.russell@amd.com>

 Kent

> -----Original Message-----
> From: Tuikov, Luben <Luben.Tuikov@amd.com>
> Sent: Monday, October 25, 2021 12:02 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Tuikov, Luben <Luben.Tuikov@amd.com>; Russell, Kent <Kent.Russell@amd.com>;
> Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: [PATCH] drm/amdgpu: Restore information reporting in RAS
> 
> A recent patch took away the reporting of number of RAS records and
> the threshold due to the way it was edited/spliced on top of the code.
> This patch restores this reporting.
> 
> Cc: Kent Russell <kent.russell@amd.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>
> Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index ae64ca02ccc4f8..05117eda105b55 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
>  				res = 0;
>  			} else {
>  				*exceed_err_limit = true;
> -				dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> +				dev_err(adev->dev,
> +					"RAS records:%d exceed threshold:%d, "
> +					"GPU will not be initialized. Replace this GPU or increase the threshold",
> +					control->ras_num_recs, ras->bad_page_cnt_threshold);
>  			}
>  		}
>  	} else {
> 
> base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
> --
> 2.33.1.558.g2bd2f258f4


* Re: [PATCH] drm/amdgpu: Restore information reporting in RAS
  2021-10-25 16:02 [PATCH] drm/amdgpu: Restore information reporting in RAS Luben Tuikov
  2021-10-25 16:07 ` Russell, Kent
@ 2021-10-25 16:15 ` Alex Deucher
  2021-10-25 16:27 ` Lazar, Lijo
  2021-10-25 17:30 ` Felix Kuehling
  3 siblings, 0 replies; 6+ messages in thread
From: Alex Deucher @ 2021-10-25 16:15 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: amd-gfx list, Kent Russell, Alex Deucher

On Mon, Oct 25, 2021 at 12:02 PM Luben Tuikov <luben.tuikov@amd.com> wrote:
>
> A recent patch took away the reporting of number of RAS records and
> the threshold due to the way it was edited/spliced on top of the code.
> This patch restores this reporting.
>
> Cc: Kent Russell <kent.russell@amd.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>
> Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index ae64ca02ccc4f8..05117eda105b55 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
>                                 res = 0;
>                         } else {
>                                 *exceed_err_limit = true;
> -                               dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> +                               dev_err(adev->dev,
> +                                       "RAS records:%d exceed threshold:%d, "
> +                                       "GPU will not be initialized. Replace this GPU or increase the threshold",
> +                                       control->ras_num_recs, ras->bad_page_cnt_threshold);
>                         }
>                 }
>         } else {
>
> base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
> --
> 2.33.1.558.g2bd2f258f4
>


* Re: [PATCH] drm/amdgpu: Restore information reporting in RAS
  2021-10-25 16:02 [PATCH] drm/amdgpu: Restore information reporting in RAS Luben Tuikov
  2021-10-25 16:07 ` Russell, Kent
  2021-10-25 16:15 ` Alex Deucher
@ 2021-10-25 16:27 ` Lazar, Lijo
  2021-10-25 17:30 ` Felix Kuehling
  3 siblings, 0 replies; 6+ messages in thread
From: Lazar, Lijo @ 2021-10-25 16:27 UTC (permalink / raw)
  To: Tuikov, Luben, amd-gfx; +Cc: Tuikov, Luben, Russell, Kent, Deucher, Alexander


[Public]

Does the message need a mention about the newly added option to ignore threshold?

Thanks,
Lijo
________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Luben Tuikov <luben.tuikov@amd.com>
Sent: Monday, October 25, 2021 9:32:20 PM
To: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Tuikov, Luben <Luben.Tuikov@amd.com>; Russell, Kent <Kent.Russell@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: [PATCH] drm/amdgpu: Restore information reporting in RAS

A recent patch took away the reporting of number of RAS records and
the threshold due to the way it was edited/spliced on top of the code.
This patch restores this reporting.

Cc: Kent Russell <kent.russell@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index ae64ca02ccc4f8..05117eda105b55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
                                 res = 0;
                         } else {
                                 *exceed_err_limit = true;
-                               dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
+                               dev_err(adev->dev,
+                                       "RAS records:%d exceed threshold:%d, "
+                                       "GPU will not be initialized. Replace this GPU or increase the threshold",
+                                       control->ras_num_recs, ras->bad_page_cnt_threshold);
                         }
                 }
         } else {

base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
--
2.33.1.558.g2bd2f258f4



* Re: [PATCH] drm/amdgpu: Restore information reporting in RAS
  2021-10-25 16:02 [PATCH] drm/amdgpu: Restore information reporting in RAS Luben Tuikov
                   ` (2 preceding siblings ...)
  2021-10-25 16:27 ` Lazar, Lijo
@ 2021-10-25 17:30 ` Felix Kuehling
  2021-10-25 17:59   ` Russell, Kent
  3 siblings, 1 reply; 6+ messages in thread
From: Felix Kuehling @ 2021-10-25 17:30 UTC (permalink / raw)
  To: Luben Tuikov, amd-gfx; +Cc: Kent Russell, Alex Deucher

On 2021-10-25 at 12:02 p.m., Luben Tuikov wrote:
> A recent patch took away the reporting of number of RAS records and
> the threshold due to the way it was edited/spliced on top of the code.
> This patch restores this reporting.
>
> Cc: Kent Russell <kent.russell@amd.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>
> Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index ae64ca02ccc4f8..05117eda105b55 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
>  				res = 0;
>  			} else {
>  				*exceed_err_limit = true;
> -				dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> +				dev_err(adev->dev,
> +					"RAS records:%d exceed threshold:%d, "
> +					"GPU will not be initialized. Replace this GPU or increase the threshold",

Splitting messages across multiple lines is usually discouraged
(presumably because it makes them hard to grep). I think checkpatch will
treat this as an error, while a long line is just a warning. Therefore
it seems that long lines are less bad than split messages.
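
The grep concern above can be sketched in user-space C. This is an illustration under stated assumptions, not driver code: `msg_one_line`, `msg_split`, `same_format_string`, `grep_lines` and the two `*_src` arrays are hypothetical names, and `grep_lines` only imitates a line-based search over source text. Adjacent string literals are concatenated by the compiler, so both spellings give the same format string, but a phrase that straddles the split no longer appears on any single source line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The same format string written as one literal and as two adjacent literals. */
const char msg_one_line[] =
	"RAS records:%d exceed threshold:%d, GPU will not be initialized. Replace this GPU or increase the threshold";
const char msg_split[] =
	"RAS records:%d exceed threshold:%d, "
	"GPU will not be initialized. Replace this GPU or increase the threshold";

/* Source text as a line-based search tool would see it, one element per line. */
const char *const one_line_src[] = {
	"\"RAS records:%d exceed threshold:%d, GPU will not be initialized. Replace this GPU or increase the threshold\",",
};
const char *const split_src[] = {
	"\"RAS records:%d exceed threshold:%d, \"",
	"\"GPU will not be initialized. Replace this GPU or increase the threshold\",",
};

/* Returns 1 if the compiler sees identical format strings, else 0. */
int same_format_string(void)
{
	return strcmp(msg_one_line, msg_split) == 0;
}

/* Returns 1 if any single line contains the phrase, else 0. */
int grep_lines(const char *const lines[], size_t n, const char *phrase)
{
	assert(lines != NULL && phrase != NULL);
	for (size_t i = 0; i < n; i++)
		if (strstr(lines[i], phrase) != NULL)
			return 1;
	return 0;
}
```

Searching for `, GPU will not be initialized`, a fragment someone might copy from a log, matches the one-line source but neither line of the split source, even though `same_format_string()` reports that the logged text is identical.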

Regards,
  Felix


> +					control->ras_num_recs, ras->bad_page_cnt_threshold);
>  			}
>  		}
>  	} else {
>
> base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d


* RE: [PATCH] drm/amdgpu: Restore information reporting in RAS
  2021-10-25 17:30 ` Felix Kuehling
@ 2021-10-25 17:59   ` Russell, Kent
  0 siblings, 0 replies; 6+ messages in thread
From: Russell, Kent @ 2021-10-25 17:59 UTC (permalink / raw)
  To: Kuehling, Felix, Tuikov, Luben, amd-gfx; +Cc: Deucher, Alexander

[AMD Official Use Only]



> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling@amd.com>
> Sent: Monday, October 25, 2021 1:30 PM
> To: Tuikov, Luben <Luben.Tuikov@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Russell, Kent <Kent.Russell@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Restore information reporting in RAS
> 
> On 2021-10-25 at 12:02 p.m., Luben Tuikov wrote:
> > A recent patch took away the reporting of number of RAS records and
> > the threshold due to the way it was edited/spliced on top of the code.
> > This patch restores this reporting.
> >
> > Cc: Kent Russell <kent.russell@amd.com>
> > Cc: Alex Deucher <Alexander.Deucher@amd.com>
> > Fixes: 07df2fb092d09e ("drm/amdgpu: Add kernel parameter support for ignoring bad page threshold")
> > Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > index ae64ca02ccc4f8..05117eda105b55 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> > @@ -1112,7 +1112,10 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
> >  				res = 0;
> >  			} else {
> >  				*exceed_err_limit = true;
> > -				dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> > +				dev_err(adev->dev,
> > +					"RAS records:%d exceed threshold:%d, "
> > +					"GPU will not be initialized. Replace this GPU or increase the threshold",
> 
> Splitting messages across multiple lines is usually discouraged
> (presumably because it makes them hard to grep). I think checkpatch will
> treat this as an error, while a long line is just a warning. Therefore
> it seems that long lines are less bad than split messages.

There are a few spots in the eeprom file where it gets done like this; I don't really like it either. Under https://www.kernel.org/doc/html/v5.13/process/coding-style.html, I see it supporting splitting for ASM (point 20) but not for regular strings (point 2).

In this one he's just restoring something I dropped, verbatim, so I have no issue giving it my RB.

 Kent

> 
> Regards,
>   Felix
> 
> 
> > +					control->ras_num_recs, ras->bad_page_cnt_threshold);
> >  			}
> >  		}
> >  	} else {
> >
> > base-commit: b60bccb408c831c685b2a257eff575bcda2cbe9d
