linux-safety.lists.elisa.tech archive mirror
From: "Paoloni, Gabriele" <gabriele.paoloni@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-safety@lists.elisa.tech" <linux-safety@lists.elisa.tech>
Subject: Re: [linux-safety] [PATCH 2/4] x86/mce: move the mce_panic() call and kill_it assignments at the right places
Date: Mon, 23 Nov 2020 17:06:31 +0000
Message-ID: <MN2PR11MB4158162EBECE1AEA80D5EC0288FC0@MN2PR11MB4158.namprd11.prod.outlook.com>
In-Reply-To: <20201123142746.GC15044@zn.tnic>

Hi Boris

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Monday, November 23, 2020 3:28 PM
> To: Paoloni, Gabriele <gabriele.paoloni@intel.com>
> Cc: Luck, Tony <tony.luck@intel.com>; tglx@linutronix.de;
> mingo@redhat.com; x86@kernel.org; hpa@zytor.com;
> linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-safety@lists.elisa.tech
> Subject: Re: [PATCH 2/4] x86/mce: move the mce_panic() call and kill_it assignments at the right places
> 
> On Wed, Nov 18, 2020 at 03:15:50PM +0000, Gabriele Paoloni wrote:
> > Right now for local MCEs we panic(), if needed, right after lmce is
> > set. For global MCEs mce_reign() takes care of calling mce_panic().
> > Hence this patch:
> > - improves readability by moving the conditional evaluation of
> > tolerant up to when kill_it is set first
> > - moves the mce_panic() call up into the statement where mce_end()
> > fails
> 
> Pls avoid using "this patch does this and that" in the commit message
> but say directly what it does:
> 
> - Improve readability ...
> 
> - Move mce_panic()...
> 
> and so on.

Thanks, I'll fix it in v2

> 
> > Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 21 +++++++++------------
> >  1 file changed, 9 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index b990892c6766..e025ff04438f 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1350,8 +1350,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
> >  	 * severity is MCE_AR_SEVERITY we have other options.
> >  	 */
> >  	if (!(m.mcgstatus & MCG_STATUS_RIPV))
> > -		kill_it = 1;
> > -
> > +		kill_it = (cfg->tolerant == 3) ? 0 : 1;
> 
> So you just set kill_it using cfg->tolerant...

Well, I first check whether RIPV is clear; then I check the tolerance level to see
whether we need to kill the user-space app...
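For illustration, the two-step decision described above (RIPV first, then the tolerance level) can be sketched as a small standalone C function. This is a simplified model, not kernel code: it assumes MCG_STATUS_RIPV is bit 0 of MCG_STATUS and uses a plain int `tolerant` in place of cfg->tolerant; compute_kill_it() is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Illustrative model only, not kernel code: MCG_STATUS_RIPV is assumed to be
 * bit 0 of MCG_STATUS, and 'tolerant' stands in for cfg->tolerant (0..3).
 */
#define MCG_STATUS_RIPV (1ULL << 0)

/*
 * Mirrors the patched hunk: kill_it is only considered when RIPV is clear
 * (the interrupted context cannot be restarted where it left off), and the
 * tolerance level then decides whether it is actually set.
 */
static int compute_kill_it(uint64_t mcgstatus, int tolerant)
{
	int kill_it = 0;

	if (!(mcgstatus & MCG_STATUS_RIPV))
		kill_it = (tolerant == 3) ? 0 : 1;

	return kill_it;
}
```

With RIPV clear and tolerant below 3 the model yields kill_it = 1; with tolerant == 3, or with RIPV set, it yields 0.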

> 
> >  	/*
> >  	 * Check if this MCE is signaled to only this logical processor,
> >  	 * on Intel, Zhaoxin only.
> > @@ -1384,8 +1383,15 @@ noinstr void do_machine_check(struct pt_regs *regs)
> >  	 * When there's any problem use only local no_way_out state.
> >  	 */
> >  	if (!lmce) {
> > -		if (mce_end(order) < 0)
> > +		if (mce_end(order) < 0) {
> >  			no_way_out = no_way_out ? no_way_out : worst >= MCE_PANIC_SEVERITY;
> > +			/*
> > +			 * mce_reign() has probably failed hence evaluate if we need
> > +			 * to panic
> > +			 */
> > +			if (no_way_out && mca_cfg.tolerant < 3)
> 
> ... but here you're testing cfg->tolerant again.

Yes, because the tolerant flag tells me whether I need to take action...

> 
> why not
> 
> 			if (no_way_out && kill_it)
> 
> ?

From my understanding, no_way_out and kill_it are different in principle:
no_way_out says that an error occurred 'somewhere' in some CPU bank
that requires the system to panic (e.g. PCC=1); kill_it says that execution
cannot be restarted where it left off on the local CPU, and hence we need to find
an alternative solution as part of the recovery action. In practice it seems to
me that kill_it is used to replace kill_me_maybe with kill_me_now in case
the exception happened in user mode.

So if I were using the statement "if (no_way_out && kill_it)", I would fail
to panic, for example, in cases where no_way_out captured a fatal error
somewhere in other CPUs but RIPV is set for the local CPU...
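The scenario in the last paragraph can be checked with a tiny truth-table sketch. It is a hypothetical model, not kernel code: plain booleans and an int stand in for no_way_out, kill_it and mca_cfg.tolerant, and the two helper names are illustrative. One function encodes the condition in the patch, the other the condition suggested in review.

```c
#include <stdbool.h>

/* Condition in the patch: panic if a fatal error was recorded anywhere
 * (no_way_out) and the tolerance level permits panicking. */
static bool panic_as_patched(bool no_way_out, int tolerant)
{
	return no_way_out && tolerant < 3;
}

/* Condition suggested in review: additionally requires kill_it, which in
 * this model is only true when the local CPU has RIPV clear. */
static bool panic_as_suggested(bool no_way_out, bool kill_it)
{
	return no_way_out && kill_it;
}
```

With no_way_out = true (a fatal error in another CPU's bank), RIPV set locally (so kill_it = false) and tolerant = 1, panic_as_patched() returns true while panic_as_suggested() returns false, which is the missed panic described above.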

Thanks
Gab  

> 
> Thx.
> 
> --
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette






Thread overview: 16+ messages
2020-11-18 15:15 [linux-safety] [PATCH 0/4] x86/MCE: some minor fixes Paoloni, Gabriele
2020-11-18 15:15 ` [linux-safety] [PATCH 1/4] x86/mce: do not overwrite no_way_out if mce_end() fails Paoloni, Gabriele
2020-11-20 17:07   ` Borislav Petkov
2020-11-20 17:31     ` Paoloni, Gabriele
2020-11-20 17:33       ` Borislav Petkov
2020-11-23 14:35         ` Borislav Petkov
2020-11-20 17:32   ` Borislav Petkov
2020-11-20 17:35     ` Paoloni, Gabriele
2020-11-18 15:15 ` [linux-safety] [PATCH 2/4] x86/mce: move the mce_panic() call and kill_it assignments at the right places Paoloni, Gabriele
2020-11-23 14:27   ` Borislav Petkov
2020-11-23 17:06     ` Paoloni, Gabriele [this message]
2020-11-23 17:19       ` Borislav Petkov
2020-11-23 17:40         ` Paoloni, Gabriele
2020-11-23 18:07           ` Borislav Petkov
2020-11-18 15:15 ` [linux-safety] [PATCH 3/4] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3 Paoloni, Gabriele
2020-11-18 15:15 ` [linux-safety] [PATCH 4/4] x86/mce: remove redundant call to irq_work_queue() Paoloni, Gabriele
