From: Ingo Molnar <mingo@kernel.org>
To: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tony Luck <tony.luck@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	LKML Mailing List <linux-kernel@vger.kernel.org>,
	X86-kernel <x86@kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Jacob Jun Pan <jacob.jun.pan@intel.com>
Subject: Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
Date: Wed, 17 Aug 2022 09:41:31 +0200
Message-ID: <Yvybq+hYT4tG/yAg@gmail.com>
In-Reply-To: <20220817051127.3323755-4-ashok.raj@intel.com>


* Ashok Raj <ashok.raj@intel.com> wrote:

> When a microcode update is in progress, several instructions and MSR's can
> be patched by the update. During the update in progress, touching any of
> the resources being patched could result in unpredictable results. If
> thread0 is doing the update and thread1 happens to get a MCE, the handler
> might read an MSR that's being patched.
> 
> In order to have predictable behavior, to avoid this scenario we set the MCIP in
> all threads. Since MCE's can't be nested, HW will automatically promote to
> shutdown condition.
> 
> After the update is completed, MCIP flag is cleared. The system is going to
> shutdown anyway, since the MCE could be a fatal error, or even recoverable
> errors in kernel space are treated as unrecoverable.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/include/asm/mce.h           |  4 ++++
>  arch/x86/kernel/cpu/mce/core.c       |  9 +++++++++
>  arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index cc73061e7255..2aef6120e23f 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -207,12 +207,16 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c);
>  void mcheck_cpu_clear(struct cpuinfo_x86 *c);
>  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
>  			       u64 lapic_id);
> +extern void mce_set_mcip(void);
> +extern void mce_clear_mcip(void);
>  #else
>  static inline int mcheck_init(void) { return 0; }
>  static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
>  static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
>  static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
>  					     u64 lapic_id) { return -EINVAL; }
> +static inline void mce_set_mcip(void) {}
> +static inline void mce_clear_mcip(void) {}
>  #endif
>  
>  void mce_setup(struct mce *m);
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2c8ec5c71712..72b49d95bb3b 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -402,6 +402,15 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
>  		     : : "c" (msr), "a"(low), "d" (high) : "memory");
>  }
>  
> +void mce_set_mcip(void)
> +{
> +	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x1);
> +}
> +
> +void mce_clear_mcip(void)
> +{
> +	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x0);
> +}

Instead of naming new APIs after *how* they do something, please name them 
after *what* they do at the highest level: they disable/enable MCEs.

I.e. I'd suggest something like:

     mce_disable()
     mce_enable()

I'd also suggest adding, at minimum, a WARN_ON_ONCE() if MSR_IA32_MCG_STATUS 
is already 1 when we disable MCEs - because whoever wanted them disabled will 
now be surprised by us enabling them again.
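
A rough, untested sketch of the shape I have in mind, written as a
user-space mock so it can be compiled on its own. Everything prefixed
with mock_/MOCK_ is hypothetical and only stands in for the real MSR
access and for WARN_ON_ONCE(); the flag is modeled as bit 2, which is
how the kernel headers define MCG_STATUS_MCIP, whereas the quoted patch
writes 0x1, so the exact value should be double-checked:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the MCIP flag (bit 2, as in MCG_STATUS_MCIP). */
#define MOCK_MCG_STATUS_MCIP	(1ULL << 2)

/* Hypothetical stand-in for the per-CPU MSR_IA32_MCG_STATUS register. */
static uint64_t mock_mcg_status;

/* Counts how many warnings were actually emitted (at most one). */
static unsigned int mock_warn_count;

/* Minimal imitation of WARN_ON_ONCE(): complain only the first time. */
static bool mock_warn_on_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		mock_warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Named after *what* it does: no MCE can be handled after this point. */
static void mce_disable(void)
{
	bool already_set = (mock_mcg_status & MOCK_MCG_STATUS_MCIP) != 0;

	/* Somebody else already set the flag: our later enable will undo it. */
	mock_warn_on_once(already_set, "MCEs were already disabled");

	mock_mcg_status |= MOCK_MCG_STATUS_MCIP;
}

/* Counterpart of mce_disable(): clear the flag again. */
static void mce_enable(void)
{
	mock_mcg_status &= ~MOCK_MCG_STATUS_MCIP;
}
```

The call site in __reload_late() would then bracket the update with 
mce_disable() / mce_enable(), and a second mce_disable() without an 
intervening mce_enable() would trigger the warning exactly once.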

> +	/*
> +	 * Its dangerous to let MCE while microcode update is in progress.

s/let MCE while
 /let MCEs execute while

> +	 * Its extremely rare and even if happens they are fatal errors.
> +	 * But reading patched areas before the update is complete can be
> +	 * leading to unpredictable results. Setting MCIP will guarantee

s/can be leading to
 /can lead to

> +	 * the platform is taken to reset predictively.

What does 'the platform is taken to reset predictively' mean?

Did you mean 'predictably'/'reliably'?

> +	 */
> +	mce_set_mcip();
>  	/*
>  	 * On an SMT system, it suffices to load the microcode on one sibling of
>  	 * the core because the microcode engine is shared between the threads.
> @@ -457,6 +466,7 @@ static int __reload_late(void *info)
>  	 * loading attempts happen on multiple threads of an SMT core. See
>  	 * below.
>  	 */
> +
>  	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
>  		apply_microcode_local(&err);
>  	else

Spurious newline added?

Thanks,

	Ingo
