linux-kernel.vger.kernel.org archive mirror
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: Borislav Petkov <bp@alien8.de>, Benjamin Berg <bberg@redhat.com>
Cc: linux-kernel@vger.kernel.org, Hans de Goede <hdegoede@redhat.com>,
	Christian Kellner <ckellner@redhat.com>,
	Tony Luck <tony.luck@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, linux-edac@vger.kernel.org
Subject: Re: [PATCH] x86/mce: Lower throttling MCE messages to warnings
Date: Thu, 10 Oct 2019 14:08:55 -0700
Message-ID: <e41580784d8f5a1806250f4daed528304976cf15.camel@linux.intel.com>
In-Reply-To: <20191009175608.GK10395@zn.tnic>

Hi Benjamin,

On Wed, 2019-10-09 at 19:56 +0200, Borislav Petkov wrote:
> On Wed, Oct 09, 2019 at 05:54:24PM +0200, Benjamin Berg wrote:
> > On modern CPUs it is quite normal that the temperature limits are
> > reached and the CPU is throttled. In fact, often the thermal design is
> > not sufficient to cool the CPU at full load and limits can quickly be
> > reached when a burst in load happens. This will even happen with
> > technologies like RAPL limiting the long term power consumption of
> > the package.
> > 
> > So these messages do not usually indicate a hardware issue (e.g.
> > insufficient cooling). Log them as warnings to avoid confusion about
> > their severity.
> > 
I have a patch to address this. Instead of dropping the critical
warnings altogether, or waiting 300 seconds before printing the next
one, the warning is based on how long the system has been running in a
throttled condition. If, for example, the fan broke, the throttling
would persist for a long time, and in that case we had better warn.
I am waiting for internal review and hope to post it by tomorrow.

Thanks
Srinivas

> > Signed-off-by: Benjamin Berg <bberg@redhat.com>
> > Tested-by: Christian Kellner <ckellner@redhat.com>
> > ---
> >  arch/x86/kernel/cpu/mce/therm_throt.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
> > index 6e2becf547c5..bc441d68d060 100644
> > --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> > +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> > @@ -188,7 +188,7 @@ static void therm_throt_process(bool new_event, int event, int level)
> >  	/* if we just entered the thermal event */
> >  	if (new_event) {
> >  		if (event == THERMAL_THROTTLING_EVENT)
> > -			pr_crit("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> > +			pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n",
> >  				this_cpu,
> >  				level == CORE_LEVEL ? "Core" : "Package",
> >  				state->count);
> > -- 
> 
> This has carried over since its very first addition in
> 
> commit 3867eb75b9279c7b0f6840d2ad9f27694ba6c4e4
> Author: Dave Jones <davej@suse.de>
> Date:   Tue Apr 2 20:02:27 2002 -0800
> 
>     [PATCH] x86 bluesmoke update.
>     
>     o  Make MCE compile time optional       (Paul Gortmaker)
>     o  P4 thermal trip monitoring.          (Zwane Mwaikambo)
>     o  Non-fatal MCE logging.               (Me)
> 
> 
> It used to be KERN_EMERG back then, though.
> 
> And yes, this issue has come up in the past already so I think I'll take
> it. I'll just give Intel folks a couple of days to object should there
> be anything to object to.
> 
> Thx.
> 


Thread overview: 10+ messages
2019-10-09 15:54 [PATCH] x86/mce: Lower throttling MCE messages to warnings Benjamin Berg
2019-10-09 15:57 ` Hans de Goede
2019-10-09 17:56 ` Borislav Petkov
2019-10-09 18:05   ` Joe Perches
2019-10-09 18:22     ` Borislav Petkov
2019-10-09 18:44       ` Joe Perches
2019-10-10 21:08   ` Srinivas Pandruvada [this message]
2019-10-11  7:31     ` Benjamin Berg
2019-10-17  7:20 ` [tip: ras/core] x86/mce: Lower throttling MCE messages' priority to warning tip-bot2 for Benjamin Berg
     [not found] <5da27a3e.1c69fb81.d3083.7f73SMTPIN_ADDED_BROKEN@mx.google.com>
2019-10-13  7:35 ` [PATCH] x86/mce: Lower throttling MCE messages to warnings Hans de Goede
