Linux-EDAC Archive on lore.kernel.org
* [PATCH] x86/mce: Lower throttling MCE messages to warnings
@ 2019-10-09 15:54 Benjamin Berg
  2019-10-09 15:57 ` Hans de Goede
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Benjamin Berg @ 2019-10-09 15:54 UTC
  To: linux-kernel
  Cc: Hans de Goede, Srinivas Pandruvada, Benjamin Berg,
	Christian Kellner, Tony Luck, Borislav Petkov, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, linux-edac

On modern CPUs it is quite normal that the temperature limits are
reached and the CPU is throttled. In fact, often the thermal design is
not sufficient to cool the CPU at full load and limits can quickly be
reached when a burst in load happens. This will even happen with
technologies like RAPL limiting the long-term power consumption of
the package.

So these messages do not usually indicate a hardware issue (e.g.
insufficient cooling). Log them as warnings to avoid confusion about
their severity.

Signed-off-by: Benjamin Berg <bberg@redhat.com>
Tested-by: Christian Kellner <ckellner@redhat.com>
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index 6e2becf547c5..bc441d68d060 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
 	/* if we just entered the thermal event */
 	if (new_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
-			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
 				this_cpu,
 				level == CORE_LEVEL ? "Core" : "Package",
 				state->count);
-- 
2.23.0
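
As a rough illustration of what moving from pr_crit() to pr_warn() means
in practice: pr_crit() logs at KERN_CRIT (2) and pr_warn() at
KERN_WARNING (4), and a message reaches the console only when its level
is numerically lower than the console log level. The following is a
user-space model, not kernel code; the sketch_* names are hypothetical,
and the console levels used in the usage note (7 as a typical default,
4 for a "quiet" boot) are assumptions about a common configuration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Numeric severities matching the kernel's KERN_* prefixes
 * (a lower number means a more severe message).
 */
enum sketch_loglevel {
	SKETCH_EMERG   = 0,	/* KERN_EMERG: level of the original 2002 message */
	SKETCH_ALERT   = 1,	/* KERN_ALERT */
	SKETCH_CRIT    = 2,	/* KERN_CRIT: pr_crit(), level before this patch */
	SKETCH_ERR     = 3,	/* KERN_ERR */
	SKETCH_WARNING = 4,	/* KERN_WARNING: pr_warn(), level after this patch */
	SKETCH_NOTICE  = 5,	/* KERN_NOTICE */
	SKETCH_INFO    = 6,	/* KERN_INFO */
	SKETCH_DEBUG   = 7	/* KERN_DEBUG */
};

/*
 * Model of the console filter: a message is shown on the console only
 * if its level is strictly lower than the console log level.
 */
static bool sketch_reaches_console(enum sketch_loglevel msg, int console_loglevel)
{
	return (int)msg < console_loglevel;
}
```

Under these assumptions, sketch_reaches_console(SKETCH_CRIT, 4) is true
while sketch_reaches_console(SKETCH_WARNING, 4) is false, so on a quiet
console the throttling message would no longer interrupt the user; it
still ends up in the kernel log buffer either way.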



* Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
  2019-10-09 15:54 [PATCH] x86/mce: Lower throttling MCE messages to warnings Benjamin Berg
@ 2019-10-09 15:57 ` Hans de Goede
  2019-10-09 17:56 ` Borislav Petkov
  2019-10-17  7:20 ` [tip: ras/core] x86/mce: Lower throttling MCE messages' priority to warning tip-bot2 for Benjamin Berg
  2 siblings, 0 replies; 10+ messages in thread
From: Hans de Goede @ 2019-10-09 15:57 UTC
  To: Benjamin Berg, linux-kernel
  Cc: Srinivas Pandruvada, Christian Kellner, Tony Luck,
	Borislav Petkov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-edac

Hi,

On 09-10-2019 17:54, Benjamin Berg wrote:
> On modern CPUs it is quite normal that the temperature limits are
> reached and the CPU is throttled. In fact, often the thermal design is
> not sufficient to cool the CPU at full load and limits can quickly be
> reached when a burst in load happens. This will even happen with
> technologies like RAPL limiting the long-term power consumption of
> the package.
> 
> So these messages do not usually indicate a hardware issue (e.g.
> insufficient cooling). Log them as warnings to avoid confusion about
> their severity.
> 
> Signed-off-by: Benjamin Berg <bberg@redhat.com>
> Tested-by: Christian Kellner <ckellner@redhat.com>

Ah, yes, let's please lower the log priority of these messages:

Reviewed-by: Hans de Goede <hdegoede@redhat.com>

Regards,

Hans



> ---
>   arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
> index 6e2becf547c5..bc441d68d060 100644
> --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> @@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
>   	/* if we just entered the thermal event */
>   	if (new_event) {
>   		if (event == THERMAL_THROTTLING_EVENT)
> -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> +			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
>   				this_cpu,
>   				level == CORE_LEVEL ? "Core" : "Package",
>   				state->count);
> 


* Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
  2019-10-09 15:54 [PATCH] x86/mce: Lower throttling MCE messages to warnings Benjamin Berg
  2019-10-09 15:57 ` Hans de Goede
@ 2019-10-09 17:56 ` Borislav Petkov
  2019-10-09 18:05   ` Joe Perches
  2019-10-10 21:08   ` Srinivas Pandruvada
  2019-10-17  7:20 ` [tip: ras/core] x86/mce: Lower throttling MCE messages' priority to warning tip-bot2 for Benjamin Berg
  2 siblings, 2 replies; 10+ messages in thread
From: Borislav Petkov @ 2019-10-09 17:56 UTC
  To: Benjamin Berg
  Cc: linux-kernel, Hans de Goede, Srinivas Pandruvada,
	Christian Kellner, Tony Luck, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, linux-edac

On Wed, Oct 09, 2019 at 05:54:24PM +0200, Benjamin Berg wrote:
> On modern CPUs it is quite normal that the temperature limits are
> reached and the CPU is throttled. In fact, often the thermal design is
> not sufficient to cool the CPU at full load and limits can quickly be
> reached when a burst in load happens. This will even happen with
> technologies like RAPL limiting the long-term power consumption of
> the package.
> 
> So these messages do not usually indicate a hardware issue (e.g.
> insufficient cooling). Log them as warnings to avoid confusion about
> their severity.
> 
> Signed-off-by: Benjamin Berg <bberg@redhat.com>
> Tested-by: Christian Kellner <ckellner@redhat.com>
> ---
>  arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
> index 6e2becf547c5..bc441d68d060 100644
> --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> @@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
>  	/* if we just entered the thermal event */
>  	if (new_event) {
>  		if (event == THERMAL_THROTTLING_EVENT)
> -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> +			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
>  				this_cpu,
>  				level == CORE_LEVEL ? "Core" : "Package",
>  				state->count);
> -- 

This has carried over since its very first addition in

commit 3867eb75b9279c7b0f6840d2ad9f27694ba6c4e4
Author: Dave Jones <davej@suse.de>
Date:   Tue Apr 2 20:02:27 2002 -0800

    [PATCH] x86 bluesmoke update.
    
    o  Make MCE compile time optional       (Paul Gortmaker)
    o  P4 thermal trip monitoring.          (Zwane Mwaikambo)
    o  Non-fatal MCE logging.               (Me)


It used to be KERN_EMERG back then, though.

And yes, this issue has come up in the past already so I think I'll take
it. I'll just give Intel folks a couple of days to object should there
be anything to object to.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
  2019-10-09 17:56 ` Borislav Petkov
@ 2019-10-09 18:05   ` Joe Perches
  2019-10-09 18:22     ` Borislav Petkov
  2019-10-10 21:08   ` Srinivas Pandruvada
  1 sibling, 1 reply; 10+ messages in thread
From: Joe Perches @ 2019-10-09 18:05 UTC
  To: Borislav Petkov, Benjamin Berg
  Cc: linux-kernel, Hans de Goede, Srinivas Pandruvada,
	Christian Kellner, Tony Luck, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, linux-edac

On Wed, 2019-10-09 at 19:56 +0200, Borislav Petkov wrote:
> On Wed, Oct 09, 2019 at 05:54:24PM +0200, Benjamin Berg wrote:
> > On modern CPUs it is quite normal that the temperature limits are
> > reached and the CPU is throttled. In fact, often the thermal design is
> > not sufficient to cool the CPU at full load and limits can quickly be
> > reached when a burst in load happens. This will even happen with
> > technologies like RAPL limiting the long-term power consumption of
> > the package.
> > 
> > So these messages do not usually indicate a hardware issue (e.g.
> > insufficient cooling). Log them as warnings to avoid confusion about
> > their severity.
[]
> > diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
[]
> > @@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
> >  	/* if we just entered the thermal event */
> >  	if (new_event) {
> >  		if (event == THERMAL_THROTTLING_EVENT)
> > -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > +			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> >  				this_cpu,
> >  				level == CORE_LEVEL ? "Core" : "Package",
> >  				state->count);
> > -- 
> 
> This has carried over since its very first addition in
> 
> commit 3867eb75b9279c7b0f6840d2ad9f27694ba6c4e4
> Author: Dave Jones <davej@suse.de>
> Date:   Tue Apr 2 20:02:27 2002 -0800
> 
>     [PATCH] x86 bluesmoke update.
>     
>     o  Make MCE compile time optional       (Paul Gortmaker)
>     o  P4 thermal trip monitoring.          (Zwane Mwaikambo)
>     o  Non-fatal MCE logging.               (Me)
> 
> 
> It used to be KERN_EMERG back then, though.
> 
> And yes, this issue has come up in the past already so I think I'll take
> it. I'll just give Intel folks a couple of days to object should there
> be anything to object to.

Perhaps this should be

	pr_warn_ratelimited(...)

as the temperature changes can be relatively quick.




* Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
  2019-10-09 18:05   ` Joe Perches
@ 2019-10-09 18:22     ` Borislav Petkov
  2019-10-09 18:44       ` Joe Perches
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2019-10-09 18:22 UTC
  To: Joe Perches
  Cc: Benjamin Berg, linux-kernel, Hans de Goede, Srinivas Pandruvada,
	Christian Kellner, Tony Luck, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, linux-edac

On Wed, Oct 09, 2019 at 11:05:37AM -0700, Joe Perches wrote:
> Perhaps this should be
> 
> 	pr_warn_ratelimited(...)
> 
> as the temperature changes can be relatively quick.

There's already ratelimiting machinery a bit above in the same function.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
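
The kind of interval-based rate limiting referred to above can be
sketched roughly as follows. This is a user-space model under stated
assumptions, not the code in therm_throt.c: the sketch_* names are
hypothetical, time is passed in as plain seconds, and the 300-second
interval is taken from the figure Srinivas mentions elsewhere in this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed interval between two logged messages, in seconds. */
#define SKETCH_CHECK_INTERVAL_SEC 300

struct sketch_throttle_state {
	uint64_t next_check;	/* earliest time the next message may be logged */
	unsigned long count;	/* total number of events seen, logged or not */
};

/*
 * Record one throttling event at time now_sec. Returns true if the
 * caller should print a message, false if the event falls inside the
 * current interval and the message should be suppressed.
 */
static bool sketch_should_log(struct sketch_throttle_state *s, uint64_t now_sec)
{
	s->count++;

	if (now_sec < s->next_check)
		return false;

	/* Open a new interval starting now. */
	s->next_check = now_sec + SKETCH_CHECK_INTERVAL_SEC;
	return true;
}
```

With a zero-initialized state, an event at t=1000 would be logged, one
at t=1100 suppressed, and one at t=1300 logged again, while count keeps
tracking all three. Because the event counter keeps running, a later
message can still report the total, which is why an additional
pr_warn_ratelimited() would add little here.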


* Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
  2019-10-09 18:22     ` Borislav Petkov
@ 2019-10-09 18:44       ` Joe Perches
  0 siblings, 0 replies; 10+ messages in thread
From: Joe Perches @ 2019-10-09 18:44 UTC
  To: Borislav Petkov
  Cc: Benjamin Berg, linux-kernel, Hans de Goede, Srinivas Pandruvada,
	Christian Kellner, Tony Luck, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, linux-edac

On Wed, 2019-10-09 at 20:22 +0200, Borislav Petkov wrote:
> On Wed, Oct 09, 2019 at 11:05:37AM -0700, Joe Perches wrote:
> > Perhaps this should be
> > 
> > 	pr_warn_ratelimited(...)
> > 
> > as the temperature changes can be relatively quick.
> 
> There's already ratelimiting machinery a bit above in the same function.

Right, thanks, never mind...



* Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
  2019-10-09 17:56 ` Borislav Petkov
  2019-10-09 18:05   ` Joe Perches
@ 2019-10-10 21:08   ` Srinivas Pandruvada
  2019-10-11  7:31     ` Benjamin Berg
  1 sibling, 1 reply; 10+ messages in thread
From: Srinivas Pandruvada @ 2019-10-10 21:08 UTC
  To: Borislav Petkov, Benjamin Berg
  Cc: linux-kernel, Hans de Goede, Christian Kellner, Tony Luck,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-edac

Hi Benjamin,

On Wed, 2019-10-09 at 19:56 +0200, Borislav Petkov wrote:
> On Wed, Oct 09, 2019 at 05:54:24PM +0200, Benjamin Berg wrote:
> > On modern CPUs it is quite normal that the temperature limits are
> > reached and the CPU is throttled. In fact, often the thermal design
> > is
> > not sufficient to cool the CPU at full load and limits can quickly
> > be
> > reached when a burst in load happens. This will even happen with
> > technologies like RAPL limiting the long-term power consumption of
> > the package.
> > 
> > So these messages do not usually indicate a hardware issue (e.g.
> > insufficient cooling). Log them as warnings to avoid confusion
> > about
> > their severity.
> > 
I have a patch to address this. Instead of avoiding critical warnings
altogether or waiting 300 seconds for the next one, the warning is based
on how long the system has been running in a throttled condition. If,
for example, the fan broke, the throttling would persist for a long
time, and then we had better warn.
I am waiting for internal review and hope to post it by tomorrow.

Thanks
Srinivas

> > Signed-off-by: Benjamin Berg <bberg@redhat.com>
> > Tested-by: Christian Kellner <ckellner@redhat.com>
> > ---
> >  arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c
> > b/arch/x86/kernel/cpu/mce/therm_throt.c
> > index 6e2becf547c5..bc441d68d060 100644
> > --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> > @@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event,
> > int event, int level)
> >  	/* if we just entered the thermal event */
> >  	if (new_event) {
> >  		if (event == THERMAL_THROTTLING_EVENT)
> > -			pr_crit("CPU%d: %s temperature above threshold,
> > cpu clock throttled (total events = %lu)\n",
> > +			pr_warn("CPU%d: %s temperature above threshold,
> > cpu clock throttled (total events = %lu)\n",
> >  				this_cpu,
> >  				level == CORE_LEVEL ? "Core" :
> > "Package",
> >  				state->count);
> > -- 
> 
> This has carried over since its very first addition in
> 
> commit 3867eb75b9279c7b0f6840d2ad9f27694ba6c4e4
> Author: Dave Jones <davej@suse.de>
> Date:   Tue Apr 2 20:02:27 2002 -0800
> 
>     [PATCH] x86 bluesmoke update.
>     
>     o  Make MCE compile time optional       (Paul Gortmaker)
>     o  P4 thermal trip monitoring.          (Zwane Mwaikambo)
>     o  Non-fatal MCE logging.               (Me)
> 
> 
> It used to be KERN_EMERG back then, though.
> 
> And yes, this issue has come up in the past already so I think I'll
> take
> it. I'll just give Intel folks a couple of days to object should
> there
> be anything to object to.
> 
> Thx.
> 
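
The duration-based idea described in this message (warn only when
throttling persists, as it would with a broken fan) might look roughly
like the sketch below. It is not Srinivas's patch, which had not been
posted at this point in the thread; it is a user-space model in which
the sketch_* names and the 5000 ms threshold are purely illustrative
assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary illustrative threshold for "throttled for too long". */
#define SKETCH_MAX_THROTTLED_MS 5000

struct sketch_duration_state {
	bool throttled;		/* currently inside a throttling episode */
	bool warned;		/* warning already issued for this episode */
	uint64_t since_ms;	/* time at which the current episode began */
};

/*
 * Feed one sample of the throttling status taken at time now_ms.
 * Returns true at most once per episode, when the episode has lasted
 * at least SKETCH_MAX_THROTTLED_MS.
 */
static bool sketch_update(struct sketch_duration_state *s,
			  bool throttled_now, uint64_t now_ms)
{
	if (!throttled_now) {
		/* Episode over: short bursts end here without a warning. */
		s->throttled = false;
		s->warned = false;
		return false;
	}

	if (!s->throttled) {
		/* New episode: remember when it started. */
		s->throttled = true;
		s->since_ms = now_ms;
		return false;
	}

	if (!s->warned && now_ms - s->since_ms >= SKETCH_MAX_THROTTLED_MS) {
		s->warned = true;
		return true;
	}

	return false;
}
```

In this model a short burst of load that throttles the CPU for a second
or two never produces a warning, whereas sustained throttling produces
exactly one per episode. Where the threshold should sit, and whether it
should be fixed at all, is left open here.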



* Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
  2019-10-10 21:08   ` Srinivas Pandruvada
@ 2019-10-11  7:31     ` Benjamin Berg
  0 siblings, 0 replies; 10+ messages in thread
From: Benjamin Berg @ 2019-10-11  7:31 UTC
  To: Srinivas Pandruvada, Borislav Petkov
  Cc: linux-kernel, Hans de Goede, Christian Kellner, Tony Luck,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-edac

Hi Srinivas,

On Thu, 2019-10-10 at 14:08 -0700, Srinivas Pandruvada wrote:
> I have a patch to address this. Instead of avoiding critical warnings
> altogether or waiting 300 seconds for the next one, the warning is based
> on how long the system has been running in a throttled condition. If,
> for example, the fan broke, the throttling would persist for a long
> time, and then we had better warn.
> I am waiting for internal review and hope to post it by tomorrow.

Nice! I agree that a heuristic seems better than the very simple
approach taken in this patch.

Thanks,
Benjamin

> Thanks
> Srinivas
> 
> > > Signed-off-by: Benjamin Berg <bberg@redhat.com>
> > > Tested-by: Christian Kellner <ckellner@redhat.com>
> > > ---
> > >  arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c
> > > b/arch/x86/kernel/cpu/mce/therm_throt.c
> > > index 6e2becf547c5..bc441d68d060 100644
> > > --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> > > +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> > > @@ -188,7 +188,7 @@ static void therm_throt_process(bool
> > > new_event,
> > > int event, int level)
> > >  	/* if we just entered the thermal event */
> > >  	if (new_event) {
> > >  		if (event == THERMAL_THROTTLING_EVENT)
> > > -			pr_crit("CPU%d: %s temperature above threshold,
> > > cpu clock throttled (total events = %lu)\n",
> > > +			pr_warn("CPU%d: %s temperature above threshold,
> > > cpu clock throttled (total events = %lu)\n",
> > >  				this_cpu,
> > >  				level == CORE_LEVEL ? "Core" :
> > > "Package",
> > >  				state->count);
> > > -- 
> > 
> > This has carried over since its very first addition in
> > 
> > commit 3867eb75b9279c7b0f6840d2ad9f27694ba6c4e4
> > Author: Dave Jones <davej@suse.de>
> > Date:   Tue Apr 2 20:02:27 2002 -0800
> > 
> >     [PATCH] x86 bluesmoke update.
> >     
> >     o  Make MCE compile time optional       (Paul Gortmaker)
> >     o  P4 thermal trip monitoring.          (Zwane Mwaikambo)
> >     o  Non-fatal MCE logging.               (Me)
> > 
> > 
> > It used to be KERN_EMERG back then, though.
> > 
> > And yes, this issue has come up in the past already so I think I'll
> > take
> > it. I'll just give Intel folks a couple of days to object should
> > there
> > be anything to object to.
> > 
> > Thx.
> > 



* [tip: ras/core] x86/mce: Lower throttling MCE messages' priority to warning
  2019-10-09 15:54 [PATCH] x86/mce: Lower throttling MCE messages to warnings Benjamin Berg
  2019-10-09 15:57 ` Hans de Goede
  2019-10-09 17:56 ` Borislav Petkov
@ 2019-10-17  7:20 ` tip-bot2 for Benjamin Berg
  2 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Benjamin Berg @ 2019-10-17  7:20 UTC
  To: linux-tip-commits
  Cc: Benjamin Berg, Borislav Petkov, Hans de Goede, Christian Kellner,
	H. Peter Anvin, Ingo Molnar, linux-edac, Peter Zijlstra,
	Srinivas Pandruvada, Thomas Gleixner, Tony Luck, x86-ml,
	Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     9c3bafaa1fd88e4dd2dba3735a1f1abb0f2c7bb7
Gitweb:        https://git.kernel.org/tip/9c3bafaa1fd88e4dd2dba3735a1f1abb0f2c7bb7
Author:        Benjamin Berg <bberg@redhat.com>
AuthorDate:    Wed, 09 Oct 2019 17:54:24 +02:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Thu, 17 Oct 2019 09:07:09 +02:00

x86/mce: Lower throttling MCE messages' priority to warning

On modern CPUs it is quite normal that the temperature limits are
reached and the CPU is throttled. In fact, often the thermal design is
not sufficient to cool the CPU at full load and limits can quickly be
reached when a burst in load happens. This will even happen with
technologies like RAPL limiting the long-term power consumption of
the package.

Also, these limits are "softer", as Srinivas explains:

"CPU temperature doesn't have to hit max(TjMax) to get these warnings.
OEMs ha[ve] an ability to program a threshold where a thermal interrupt
can be generated. In some systems the offset is 20C+ (Read only value).

In recent systems, there is another offset on top of it which can be
programmed by OS, once some agent can adjust power limits dynamically.
By default this is set to low by the firmware, which I guess [is] the
prime motivation of Benjamin to submit the patch."

So these messages do not usually indicate a hardware issue (e.g.
insufficient cooling). Log them as warnings to avoid confusion about
their severity.

 [ bp: Massage commit message. ]

Signed-off-by: Benjamin Berg <bberg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Christian Kellner <ckellner@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191009155424.249277-1-bberg@redhat.com
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index 6e2becf..bc441d6 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
 	/* if we just entered the thermal event */
 	if (new_event) {
 		if (event == THERMAL_THROTTLING_EVENT)
-			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
+			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
 				this_cpu,
 				level == CORE_LEVEL ? "Core" : "Package",
 				state->count);


* Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
       [not found] <5da27a3e.1c69fb81.d3083.7f73SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2019-10-13  7:35 ` Hans de Goede
  0 siblings, 0 replies; 10+ messages in thread
From: Hans de Goede @ 2019-10-13  7:35 UTC
  To: Benjamin Berg, linux-kernel
  Cc: Srinivas Pandruvada, Christian Kellner, Tony Luck,
	Borislav Petkov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, linux-edac

Hi Benjamin,

On 09-10-2019 17:54, Benjamin Berg wrote:
> On modern CPUs it is quite normal that the temperature limits are
> reached and the CPU is throttled. In fact, often the thermal design is
> not sufficient to cool the CPU at full load and limits can quickly be
> reached when a burst in load happens. This will even happen with
> technologies like RAPL limiting the long-term power consumption of
> the package.
> 
> So these messages do not usually indicate a hardware issue (e.g.
> insufficient cooling). Log them as warnings to avoid confusion about
> their severity.
> 
> Signed-off-by: Benjamin Berg <bberg@redhat.com>
> Tested-by: Christian Kellner <ckellner@redhat.com>

This seems to be the exact same patch as you sent before; is there
any reason for this resend?

Regards,

Hans


> ---
>   arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
> index 6e2becf547c5..bc441d68d060 100644
> --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> @@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
>   	/* if we just entered the thermal event */
>   	if (new_event) {
>   		if (event == THERMAL_THROTTLING_EVENT)
> -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> +			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
>   				this_cpu,
>   				level == CORE_LEVEL ? "Core" : "Package",
>   				state->count);
> 
