linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/5] Making microcode late-load robust
@ 2022-08-17  5:11 Ashok Raj
  2022-08-17  5:11 ` [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode Ashok Raj
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Ashok Raj @ 2022-08-17  5:11 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Tony Luck, Dave Hansen, LKML Mailing List, X86-kernel,
	Andy Lutomirski, Tom Lendacky, Jacon Jun Pan, Ashok Raj

Hi Boris

I thought v2 worked, but there were some glaring errors in it.
I added more tracing and this version did go through and finish. More
testing would help.

Let me know how you want to handle patches 1-3. Patches 4 and 5 need more
review and some help with testing on AMD as well.

v2: https://lore.kernel.org/lkml/20220816043754.3258815-1-ashok.raj@intel.com/
v1: https://lore.kernel.org/lkml/20220813223825.3164861-1-ashok.raj@intel.com/


Changes since v2

- Dropped Documentation, queued for tip
  https://lore.kernel.org/lkml/166063601497.401.9527776956724011207.tip-bot2@tip-bot2/

Patches: 1-3
   No Changes since v1

Patch4:
   x2APIC doesn't handle the NMI vector via the self-IPI MSR; it can only
   send fixed-mode interrupts. Enhance send_IPI_self() to recognize
   NMI_VECTOR and fall back to a method that can deliver the NMI.

Patch5: 
   - Modified when the NMI was triggered. Moved it closer to when
     __reload_late() is running.
   - Used the new apic->send_IPI_self() support helper.
   - Added an atomic to check if the primary has completed the load.
   - The gap between triggering the NMI and when it is taken determines how
     long the primary CPU needs to wait. For now I have no timeout, but I
     will check a couple more platforms before coming up with a number, so
     the primary can bail out if secondaries don't show up in a reasonable
     amount of time.

Ashok Raj (5):
  x86/microcode/intel: Check against CPU signature before saving
    microcode
  x86/microcode/intel: Allow a late-load only if a min rev is specified
  x86/microcode: Avoid any chance of MCE's during microcode update
  x86/x2apic: Support x2apic self IPI with NMI_VECTOR
  x86/microcode: Place siblings in NMI loop while update in progress

 arch/x86/include/asm/mce.h             |   4 +
 arch/x86/include/asm/microcode_intel.h |   4 +-
 arch/x86/kernel/apic/x2apic_phys.c     |   6 +-
 arch/x86/kernel/cpu/mce/core.c         |   9 +
 arch/x86/kernel/cpu/microcode/core.c   | 229 ++++++++++++++++++++++++-
 arch/x86/kernel/cpu/microcode/intel.c  |  34 +++-
 6 files changed, 274 insertions(+), 12 deletions(-)

-- 
2.32.0



* [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-17  5:11 [PATCH v3 0/5] Making microcode late-load robust Ashok Raj
@ 2022-08-17  5:11 ` Ashok Raj
  2022-08-17  7:43   ` Ingo Molnar
  2022-08-19 10:24   ` Borislav Petkov
  2022-08-17  5:11 ` [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified Ashok Raj
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 32+ messages in thread
From: Ashok Raj @ 2022-08-17  5:11 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Tony Luck, Dave Hansen, LKML Mailing List, X86-kernel,
	Andy Lutomirski, Tom Lendacky, Jacon Jun Pan, Ashok Raj

When save_microcode_patch() is looking to replace an existing microcode
patch in the cache, the current code *only* checks the CPU sig/pf in the
main header. Microcode can carry additional sig/pf combinations in the
extended signature table, which is completely missed today.

For example: the current patch is a multi-stepping patch and the new
incoming patch is a specific patch just for this CPU's stepping.

patch1:
fms3 <--- header FMS
...
ext_sig:
fms1
fms2

patch2: new
fms2 <--- header FMS

The current code takes only fms3 and checks it against patch2's fms2.

saved_patch.header.fms3 != new_patch.header.fms2, so save_microcode_patch()
saves patch2 to the end of the list instead of replacing patch1 with it.

There is no functional, user-observable issue since find_patch() skips
patch versions that are <= current_patch and will land on patch2 properly.

Nevertheless, this ends up storing every patch rather than only the latest
one the kernel needs. The superseded patches are effectively a memory leak
that sits in the kernel and is never used.

Cc: stable@vger.kernel.org
Fixes: fe055896c040 ("x86/microcode: Merge the early microcode loader")
Tested-by: William Xie <william.xie@intel.com>
Reported-by: William Xie <william.xie@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 025c8f0cd948..c4b11e2fbe33 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -114,10 +114,18 @@ static void save_microcode_patch(struct ucode_cpu_info *uci, void *data, unsigne
 
 	list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
 		mc_saved_hdr = (struct microcode_header_intel *)iter->data;
-		sig	     = mc_saved_hdr->sig;
-		pf	     = mc_saved_hdr->pf;
 
-		if (find_matching_signature(data, sig, pf)) {
+		sig = uci->cpu_sig.sig;
+		pf  = uci->cpu_sig.pf;
+
+		/*
+		 * Compare the current CPUs signature with the ones in the
+		 * cache to identify the right candidate to replace. At any
+		 * given time, we should have no more than one valid patch
+		 * file for a given CPU fms+pf in the cache list.
+		 */
+
+		if (find_matching_signature(iter->data, sig, pf)) {
 			prev_found = true;
 
 			if (mc_hdr->rev <= mc_saved_hdr->rev)
-- 
2.32.0



* [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified
  2022-08-17  5:11 [PATCH v3 0/5] Making microcode late-load robust Ashok Raj
  2022-08-17  5:11 ` [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode Ashok Raj
@ 2022-08-17  5:11 ` Ashok Raj
  2022-08-17  7:45   ` Ingo Molnar
  2022-08-19 11:11   ` Borislav Petkov
  2022-08-17  5:11 ` [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update Ashok Raj
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 32+ messages in thread
From: Ashok Raj @ 2022-08-17  5:11 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Tony Luck, Dave Hansen, LKML Mailing List, X86-kernel,
	Andy Lutomirski, Tom Lendacky, Jacon Jun Pan, Ashok Raj

In general users don't have the necessary information to determine
whether a late-load of a new microcode version has removed any feature
(MSR, CPUID etc) between what is currently loaded and this new microcode.
To address this issue, Intel has added a "minimum required version" field to
a previously reserved field in the file header. Microcode updates
should only be applied if the current microcode version is equal
to, or greater than this minimum required version.

https://lore.kernel.org/linux-kernel/alpine.DEB.2.21.1909062237580.1902@nanos.tec.linutronix.de/

Thomas made some suggestions on how meta-data in the microcode file could
provide Linux with information to decide if the new microcode is a suitable
candidate for late-load. But even the "simpler" option #1 requires a lot of
metadata and corresponding kernel code to parse it.

The proposal here is an even simpler option. The criteria for a microcode to
be a viable late-load candidate is that no CPUID or OS visible MSR features
are removed with respect to an earlier version of the microcode.

Pseudocode for late-load is as follows:

if header.min_required_id == 0
	This is old format microcode, block late-load
else if current_ucode_version < header.min_required_id
	Current version is too old, block late-load of this microcode.
else
	OK to proceed with late-load.

Any microcode that removes a feature will set the min_version to itself.
This enforces that such a microcode is not suitable for late-loading.

The enforcement is not in hardware; it is limited to the kernel loader
checking the requirement. Early loading of microcode does not need to
enforce this requirement, since the new features are only evaluated after
early loading in the boot process.


Test cases covered:

1. With new kernel, attempting to load an older format microcode with the
   min_rev=0 should be blocked by kernel.

   [  210.541802] microcode: Header MUST specify min version for late-load

2. New microcode with a non-zero min_rev in the header, but the specified
   min_rev is greater than what is currently loaded in the CPU should be
   blocked by kernel.

   [  245.139828] microcode: Current revision 0x8f685300 is too old to update, must be at 0xaa000050 version or higher

3. New microcode with a min_rev < currently loaded should allow loading the
   microcode

4. Build initrd with microcode that has min_rev=0, or min_rev > currently
   loaded should permit early loading microcode from initrd.


Tested-by: William Xie <william.xie@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/microcode_intel.h |  4 +++-
 arch/x86/kernel/cpu/microcode/intel.c  | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
index 4c92cea7e4b5..16b8715e0984 100644
--- a/arch/x86/include/asm/microcode_intel.h
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -14,7 +14,9 @@ struct microcode_header_intel {
 	unsigned int            pf;
 	unsigned int            datasize;
 	unsigned int            totalsize;
-	unsigned int            reserved[3];
+	unsigned int            reserved1;
+	unsigned int		min_req_id;
+	unsigned int            reserved3;
 };
 
 struct microcode_intel {
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index c4b11e2fbe33..1eb202ec2302 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -178,6 +178,7 @@ static int microcode_sanity_check(void *mc, int print_err)
 	struct extended_sigtable *ext_header = NULL;
 	u32 sum, orig_sum, ext_sigcount = 0, i;
 	struct extended_signature *ext_sig;
+	struct ucode_cpu_info uci;
 
 	total_size = get_totalsize(mc_header);
 	data_size = get_datasize(mc_header);
@@ -248,6 +249,25 @@ static int microcode_sanity_check(void *mc, int print_err)
 		return -EINVAL;
 	}
 
+	/*
+	 * Enforce for late-load that min_req_id is specified in the header.
+	 * Otherwise its an old format microcode, reject it.
+	 */
+	if (print_err) {
+		if (!mc_header->min_req_id) {
+			pr_warn("Header MUST specify min version for late-load\n");
+			return -EINVAL;
+		}
+
+		intel_cpu_collect_info(&uci);
+		if (uci.cpu_sig.rev < mc_header->min_req_id) {
+			pr_warn("Current revision 0x%x is too old to update,"
+				"must  be at 0x%x version or higher\n",
+				uci.cpu_sig.rev, mc_header->min_req_id);
+			return -EINVAL;
+		}
+	}
+
 	if (!ext_table_size)
 		return 0;
 
-- 
2.32.0



* [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17  5:11 [PATCH v3 0/5] Making microcode late-load robust Ashok Raj
  2022-08-17  5:11 ` [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode Ashok Raj
  2022-08-17  5:11 ` [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified Ashok Raj
@ 2022-08-17  5:11 ` Ashok Raj
  2022-08-17  7:41   ` Ingo Molnar
  2022-08-17  5:11 ` [PATCH v3 4/5] x86/x2apic: Support x2apic self IPI with NMI_VECTOR Ashok Raj
  2022-08-17  5:11 ` [PATCH v3 5/5] x86/microcode: Place siblings in NMI loop while update in progress Ashok Raj
  4 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-17  5:11 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Tony Luck, Dave Hansen, LKML Mailing List, X86-kernel,
	Andy Lutomirski, Tom Lendacky, Jacon Jun Pan, Ashok Raj

When a microcode update is in progress, several instructions and MSRs can
be patched by the update. While the update is in progress, touching any of
the resources being patched could produce unpredictable results. If
thread0 is doing the update and thread1 happens to take an MCE, the handler
might read an MSR that's being patched.

In order to have predictable behavior, and to avoid this scenario, we set
MCIP in all threads. Since MCEs can't be nested, HW will automatically
promote to a shutdown condition.

After the update is completed, the MCIP flag is cleared. The system is
going to shut down anyway, since the MCE could be a fatal error, and even
recoverable errors in kernel space are treated as unrecoverable.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/mce.h           |  4 ++++
 arch/x86/kernel/cpu/mce/core.c       |  9 +++++++++
 arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index cc73061e7255..2aef6120e23f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -207,12 +207,16 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c);
 void mcheck_cpu_clear(struct cpuinfo_x86 *c);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 			       u64 lapic_id);
+extern void mce_set_mcip(void);
+extern void mce_clear_mcip(void);
 #else
 static inline int mcheck_init(void) { return 0; }
 static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
 static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
 static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 					     u64 lapic_id) { return -EINVAL; }
+static inline void mce_set_mcip(void) {}
+static inline void mce_clear_mcip(void) {}
 #endif
 
 void mce_setup(struct mce *m);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..72b49d95bb3b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -402,6 +402,15 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
 		     : : "c" (msr), "a"(low), "d" (high) : "memory");
 }
 
+void mce_set_mcip(void)
+{
+	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x1);
+}
+
+void mce_clear_mcip(void)
+{
+	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x0);
+}
 /*
  * Collect all global (w.r.t. this processor) status about this machine
  * check into our "mce" struct so that we can use it later to assess
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index ad57e0e4d674..d24e1c754c27 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -39,6 +39,7 @@
 #include <asm/processor.h>
 #include <asm/cmdline.h>
 #include <asm/setup.h>
+#include <asm/mce.h>
 
 #define DRIVER_VERSION	"2.2"
 
@@ -450,6 +451,14 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
 		return -1;
 
+	/*
+	 * Its dangerous to let MCE while microcode update is in progress.
+	 * Its extremely rare and even if happens they are fatal errors.
+	 * But reading patched areas before the update is complete can be
+	 * leading to unpredictable results. Setting MCIP will guarantee
+	 * the platform is taken to reset predictively.
+	 */
+	mce_set_mcip();
 	/*
 	 * On an SMT system, it suffices to load the microcode on one sibling of
 	 * the core because the microcode engine is shared between the threads.
@@ -457,6 +466,7 @@ static int __reload_late(void *info)
 	 * loading attempts happen on multiple threads of an SMT core. See
 	 * below.
 	 */
+
 	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
 		apply_microcode_local(&err);
 	else
@@ -473,6 +483,7 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
 		panic("Timeout during microcode update!\n");
 
+	mce_clear_mcip();
 	/*
 	 * At least one thread has completed update on each core.
 	 * For others, simply call the update to make sure the
-- 
2.32.0



* [PATCH v3 4/5] x86/x2apic: Support x2apic self IPI with NMI_VECTOR
  2022-08-17  5:11 [PATCH v3 0/5] Making microcode late-load robust Ashok Raj
                   ` (2 preceding siblings ...)
  2022-08-17  5:11 ` [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update Ashok Raj
@ 2022-08-17  5:11 ` Ashok Raj
  2022-08-17  5:11 ` [PATCH v3 5/5] x86/microcode: Place siblings in NMI loop while update in progress Ashok Raj
  4 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-08-17  5:11 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Tony Luck, Dave Hansen, LKML Mailing List, X86-kernel,
	Andy Lutomirski, Tom Lendacky, Jacon Jun Pan, Ashok Raj,
	Jacob Pan

The x2APIC architecture introduced a dedicated register for sending a
self-IPI. Though highly optimized for performance, its semantics limit the
delivery mode to fixed mode. The NMI vector is not supported, which created
inconsistent behavior between x2APIC and the other APIC modes.

This patch adds support for NMI_VECTOR in the x2APIC self-IPI path by
falling back to the slower ICR method.

Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/apic/x2apic_phys.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 6bde05a86b4e..cf187f1906b2 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -149,7 +149,11 @@ int x2apic_phys_pkg_id(int initial_apicid, int index_msb)
 
 void x2apic_send_IPI_self(int vector)
 {
-	apic_write(APIC_SELF_IPI, vector);
+	if (unlikely(vector == NMI_VECTOR))
+		apic->send_IPI_mask(cpumask_of(smp_processor_id()),
+				    NMI_VECTOR);
+	else
+		apic_write(APIC_SELF_IPI, vector);
 }
 
 static struct apic apic_x2apic_phys __ro_after_init = {
-- 
2.32.0



* [PATCH v3 5/5] x86/microcode: Place siblings in NMI loop while update in progress
  2022-08-17  5:11 [PATCH v3 0/5] Making microcode late-load robust Ashok Raj
                   ` (3 preceding siblings ...)
  2022-08-17  5:11 ` [PATCH v3 4/5] x86/x2apic: Support x2apic self IPI with NMI_VECTOR Ashok Raj
@ 2022-08-17  5:11 ` Ashok Raj
  2022-08-30 19:15   ` Andy Lutomirski
  4 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-17  5:11 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: Tony Luck, Dave Hansen, LKML Mailing List, X86-kernel,
	Andy Lutomirski, Tom Lendacky, Jacon Jun Pan, Ashok Raj

Microcode updates need a guarantee that the thread sibling that is waiting
for the update to finish on the primary core will not execute any
instructions until the update is complete. This is required to guarantee
that any MSR or instruction that's being patched is not touched until the
update is complete.

After the stop_machine() rendezvous, an NMI handler is registered. If an
NMI were to happen while the microcode update is not complete, the
secondary thread will spin until the ucode update state is cleared.

A couple of choices discussed are:

1. Rendezvous inside the NMI handler, and also perform the update from
   within the handler. This seemed too risky and might cause instability
   with the races that we would need to solve. This would be a difficult
   choice.
   	1.a Since the primary thread of every core is performing a wrmsr
	for the update, once the wrmsr has started, it can't be
	interrupted. Hence it's not required to NMI the primary thread of
	the core. Only the secondary thread needs to be parked in NMI
	before the update begins.
	Suggested by Andy Cooper.
2. Thomas (tglx) suggested that we could look into masking all the
   LVT-originating NMIs, such as LINT1, perf control LVT entries and such.
   Since we are in the rendezvous loop, we don't need to worry about any
   NMI IPIs generated by the OS.

   The one we don't have any control over is the ACPI mechanism of sending
   notifications to the kernel for Firmware First Processing (FFM).
   Apparently there is a PCH register that the BIOS, in SMI, would write to
   generate such an interrupt (ACPI GHES).
3. This is a simpler option. OS registers an NMI handler and doesn't do any
   NMI rendezvous dance. But if an NMI were to happen, we check if any of
   the CPU's thread siblings have an update in progress. Only those CPUs
   would take an NMI. The thread performing the wrmsr() will only take an
   NMI after the completion of the wrmsr 0x79 flow.

   [ Lutomirski thinks this is weak, and what happens from taking the
   interrupt and the path to the registered callback handler might be
   exposed.]

   Seems like 1.a is the best candidate.

The algorithm is something like this:

After stop_machine() all threads are executing __reload_late()

nmi_callback()
{
	if (!in_ucode_update)
		return NMI_DONE;
	if (cpu not in sibling_mask)
		return NMI_DONE;
	update sibling reached NMI for primary to continue

	while (cpu in sibling_mask)
		wait;
	return NMI_HANDLED;
}

__reload_late()
{

	entry_rendezvous(&late_cpus_in);
	set_mcip()
	if (this_cpu is first_cpu in the core)
		wait for siblings to drop in NMI
		apply_microcode()
	else {
		send self_ipi(NMI_VECTOR);
		goto wait_for_siblings;
	}

wait_for_siblings:
	exit_rendezvous(&late_cpus_out);
	clear_mcip
}

reload_late()
{
	register_nmi_handler()
	prepare_mask of all sibling cpus()
	update state = ucode in progress;
	stop_machine();
	unregister_nmi_handler();
}

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/core.c | 218 ++++++++++++++++++++++++++-
 1 file changed, 211 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index d24e1c754c27..fd3b8ce2c82a 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -39,7 +39,9 @@
 #include <asm/processor.h>
 #include <asm/cmdline.h>
 #include <asm/setup.h>
+#include <asm/apic.h>
 #include <asm/mce.h>
+#include <asm/nmi.h>
 
 #define DRIVER_VERSION	"2.2"
 
@@ -411,6 +413,13 @@ static int check_online_cpus(void)
 
 static atomic_t late_cpus_in;
 static atomic_t late_cpus_out;
+static atomic_t nmi_cpus;	// number of CPUs that enter NMI
+static atomic_t nmi_timeouts;   // number of siblings that timeout
+static atomic_t nmi_siblings;   // Number of siblings that enter NMI
+static atomic_t in_ucode_update;// Are we in microcode update?
+static atomic_t nmi_exit;       // Siblings that exit NMI
+
+static struct cpumask all_sibling_mask;
 
 static int __wait_for_cpus(atomic_t *t, long long timeout)
 {
@@ -433,6 +442,104 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
 	return 0;
 }
 
+struct core_rendez {
+	int num_core_cpus;
+	atomic_t callin;
+	atomic_t core_done;
+};
+
+static DEFINE_PER_CPU(struct core_rendez, core_sync);
+
+static int __wait_for_update(atomic_t *t, long long timeout)
+{
+	while (!atomic_read(t)) {
+		if (timeout < SPINUNIT)
+			return 1;
+
+		cpu_relax();
+		ndelay(SPINUNIT);
+		timeout -= SPINUNIT;
+		touch_nmi_watchdog();
+	}
+	return 0;
+}
+
+static int ucode_nmi_cb(unsigned int val, struct pt_regs *regs)
+{
+	int ret, first_cpu, cpu = smp_processor_id();
+	struct core_rendez *rendez;
+
+	atomic_inc(&nmi_cpus);
+	if (!atomic_read(&in_ucode_update))
+		return NMI_DONE;
+
+	if (!cpumask_test_cpu(cpu, &all_sibling_mask))
+		return NMI_DONE;
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	rendez = &per_cpu(core_sync, first_cpu);
+
+	/*
+	 * If primary has marked update is complete, we don't need to be
+	 * here in the NMI handler.
+	 */
+	if (atomic_read(&rendez->core_done))
+		return NMI_DONE;
+
+	atomic_inc(&nmi_siblings);
+	pr_debug("Sibling CPU %d made into NMI handler\n", cpu);
+	/*
+	 * primary thread waits for all siblings to checkin the NMI handler
+	 * before performing the microcode update
+	 */
+
+	atomic_inc(&rendez->callin);
+	ret = __wait_for_update(&rendez->core_done, NSEC_PER_SEC);
+	if (ret) {
+		atomic_inc(&nmi_timeouts);
+		pr_debug("Sibling CPU %d sibling timedout\n",cpu);
+	}
+	/*
+	 * Once primary signals update is complete, we are free to get out
+	 * of the NMI jail
+	 */
+	if (atomic_read(&rendez->core_done)) {
+		pr_debug("Sibling CPU %d breaking from NMI\n", cpu);
+		atomic_inc(&nmi_exit);
+	}
+
+	return NMI_HANDLED;
+}
+
+/*
+ * Primary thread clears the cpumask to release the siblings from the NMI
+ * jail
+ */
+
+static void clear_nmi_cpus(void)
+{
+	int first_cpu, wait_cpu, cpu = smp_processor_id();
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	for_each_cpu(wait_cpu, topology_sibling_cpumask(cpu)) {
+		if (wait_cpu == first_cpu)
+			continue;
+		cpumask_clear_cpu(wait_cpu, &all_sibling_mask);
+	}
+}
+
+static int __wait_for_siblings(struct core_rendez *rendez, long long timeout)
+{
+	int num_sibs = rendez->num_core_cpus - 1;
+	atomic_t *t = &rendez->callin;
+
+	while (atomic_read(t) < num_sibs) {
+		cpu_relax();
+		touch_nmi_watchdog();
+	}
+	return 0;
+}
+
 /*
  * Returns:
  * < 0 - on error
@@ -440,17 +547,20 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
  */
 static int __reload_late(void *info)
 {
-	int cpu = smp_processor_id();
+	int first_cpu, cpu = smp_processor_id();
 	enum ucode_state err;
 	int ret = 0;
 
 	/*
 	 * Wait for all CPUs to arrive. A load will not be attempted unless all
 	 * CPUs show up.
-	 * */
+	 */
 	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
 		return -1;
 
+	if (cpumask_first(cpu_online_mask) == cpu)
+		pr_debug("__reload_late: Entry Sync Done\n");
+
 	/*
 	 * Its dangerous to let MCE while microcode update is in progress.
 	 * Its extremely rare and even if happens they are fatal errors.
@@ -459,6 +569,7 @@ static int __reload_late(void *info)
 	 * the platform is taken to reset predictively.
 	 */
 	mce_set_mcip();
+
 	/*
 	 * On an SMT system, it suffices to load the microcode on one sibling of
 	 * the core because the microcode engine is shared between the threads.
@@ -466,13 +577,35 @@ static int __reload_late(void *info)
 	 * loading attempts happen on multiple threads of an SMT core. See
 	 * below.
 	 */
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
 
-	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
+	/*
+	 * Set the CPUs that we should hold in NMI until the primary has
+	 * completed the microcode update.
+	 */
+	if (first_cpu == cpu) {
+		struct core_rendez *pcpu_core = &per_cpu(core_sync, cpu);
+
+		/*
+		 * Wait for all siblings to enter
+		 * NMI before performing the update
+		 */
+		ret = __wait_for_siblings(pcpu_core, NSEC_PER_SEC);
+		if (ret) {
+			pr_err("CPU %d core lead timeout waiting for"
+			       " siblings\n", cpu);
+			ret = -1;
+		}
+		pr_debug("Primary CPU %d proceeding with update\n", cpu);
 		apply_microcode_local(&err);
-	else
+		atomic_set(&pcpu_core->core_done, 1);
+		clear_nmi_cpus();
+	} else {
+		apic->send_IPI_self(NMI_VECTOR);
 		goto wait_for_siblings;
+	}
 
-	if (err >= UCODE_NFOUND) {
+	if (ret || err >= UCODE_NFOUND) {
 		if (err == UCODE_ERROR)
 			pr_warn("Error reloading microcode on CPU %d\n", cpu);
 
@@ -483,6 +616,9 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
 		panic("Timeout during microcode update!\n");
 
+	if (cpumask_first(cpu_online_mask) == cpu)
+		pr_debug("__reload_late: Exit Sync Done\n");
+
 	mce_clear_mcip();
 	/*
 	 * At least one thread has completed update on each core.
@@ -496,26 +632,94 @@ static int __reload_late(void *info)
 	return ret;
 }
 
+static void set_nmi_cpus(int cpu)
+{
+	int first_cpu, wait_cpu;
+	struct core_rendez *pcpu_core = &per_cpu(core_sync, cpu);
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	for_each_cpu(wait_cpu, topology_sibling_cpumask(cpu)) {
+		if (wait_cpu == first_cpu) {
+			pcpu_core->num_core_cpus =
+					cpumask_weight(topology_sibling_cpumask(wait_cpu));
+			continue;
+		}
+		cpumask_set_cpu(wait_cpu, &all_sibling_mask);
+	}
+}
+
+static void prepare_siblings(void)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpu_online_mask) {
+		set_nmi_cpus(cpu);
+	}
+}
+
 /*
  * Reload microcode late on all CPUs. Wait for a sec until they
  * all gather together.
  */
 static int microcode_reload_late(void)
 {
-	int ret;
+	int ret = 0;
 
 	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
 	pr_err("You should switch to early loading, if possible.\n");
 
+	/*
+	 * Used for late_load entry and exit rendezvous
+	 */
 	atomic_set(&late_cpus_in,  0);
 	atomic_set(&late_cpus_out, 0);
 
+	/*
+	 * in_ucode_update: Global state while in ucode update
+	 * nmi_cpus: Count of CPUs entering NMI while ucode in progress
+	 * nmi_siblings: Count of siblings that enter NMI
+	 * nmi_timeouts: Count of siblings that fail to see mask clear
+	 */
+	atomic_set(&in_ucode_update,0);
+	atomic_set(&nmi_cpus, 0);
+	atomic_set(&nmi_timeouts, 0);
+	atomic_set(&nmi_siblings, 0);
+
+	cpumask_clear(&all_sibling_mask);
+
+	ret = register_nmi_handler(NMI_LOCAL, ucode_nmi_cb, NMI_FLAG_FIRST,
+				   "ucode_nmi");
+	if (ret) {
+		pr_err("Unable to register NMI handler\n");
+		goto done;
+	}
+
+	/*
+	 * Prepare everything for siblings threads to drop into NMI while
+	 * the update is in progress.
+	 */
+	prepare_siblings();
+	atomic_set(&in_ucode_update, 1);
+#if 0
+	apic->send_IPI_mask(&all_sibling_mask, NMI_VECTOR);
+	pr_debug("Sent NMI broadcast to all sibling cpus\n");
+#endif
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
 	if (ret == 0)
 		microcode_check();
 
-	pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
+	unregister_nmi_handler(NMI_LOCAL, "ucode_nmi");
+
+	pr_debug("Total CPUs that entered NMI     ... %d\n",
+		 atomic_read(&nmi_cpus));
+	pr_debug("Total siblings that entered NMI ... %d\n",
+		 atomic_read(&nmi_siblings));
+	pr_debug("Total siblings timedout         ... %d\n",
+		 atomic_read(&nmi_timeouts));
+	pr_info("Reload completed, microcode revision: 0x%x\n",
+	        boot_cpu_data.microcode);
 
+done:
 	return ret;
 }
 
-- 
2.32.0



* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17  5:11 ` [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update Ashok Raj
@ 2022-08-17  7:41   ` Ingo Molnar
  2022-08-17  7:58     ` Ingo Molnar
  2022-08-17 11:40     ` Ashok Raj
  0 siblings, 2 replies; 32+ messages in thread
From: Ingo Molnar @ 2022-08-17  7:41 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Borislav Petkov, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan


* Ashok Raj <ashok.raj@intel.com> wrote:

> When a microcode update is in progress, several instructions and MSRs can
> be patched by the update. While the update is in progress, touching any of
> the resources being patched could produce unpredictable results. If
> thread0 is doing the update and thread1 happens to take an MCE, the handler
> might read an MSR that's being patched.
> 
> In order to have predictable behavior, and to avoid this scenario, we set
> MCIP in all threads. Since MCEs can't be nested, HW will automatically
> promote to a shutdown condition.
> 
> After the update is completed, the MCIP flag is cleared. The system is
> going to shut down anyway, since the MCE could be a fatal error, and even
> recoverable errors in kernel space are treated as unrecoverable.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/include/asm/mce.h           |  4 ++++
>  arch/x86/kernel/cpu/mce/core.c       |  9 +++++++++
>  arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index cc73061e7255..2aef6120e23f 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -207,12 +207,16 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c);
>  void mcheck_cpu_clear(struct cpuinfo_x86 *c);
>  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
>  			       u64 lapic_id);
> +extern void mce_set_mcip(void);
> +extern void mce_clear_mcip(void);
>  #else
>  static inline int mcheck_init(void) { return 0; }
>  static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
>  static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
>  static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
>  					     u64 lapic_id) { return -EINVAL; }
> +static inline void mce_set_mcip(void) {}
> +static inline void mce_clear_mcip(void) {}
>  #endif
>  
>  void mce_setup(struct mce *m);
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2c8ec5c71712..72b49d95bb3b 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -402,6 +402,15 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
>  		     : : "c" (msr), "a"(low), "d" (high) : "memory");
>  }
>  
> +void mce_set_mcip(void)
> +{
> +	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x1);
> +}
> +
> +void mce_clear_mcip(void)
> +{
> +	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x0);
> +}

Instead of naming new APIs after how they are doing stuff, please name them 
after *what* they are doing at the highest level: they disable/enable MCEs.

Ie. I'd suggest something like:

     mce_disable()
     mce_enable()

I'd also suggest to at minimum add a WARN_ON_ONCE() if MSR_IA32_MCG_STATUS 
is already 1 when we disable it - because whoever wanted it disabled will 
now be surprised by us enabling them again.
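
Roughly, as a sketch only - this keeps the exact MSR writes from the patch
and just renames the helpers and adds the warning (mce_rdmsrl() being the
existing counterpart of mce_wrmsrl() in mce/core.c):

	void mce_disable(void)
	{
		/* Whoever set this before us will be surprised when we clear it */
		WARN_ON_ONCE(mce_rdmsrl(MSR_IA32_MCG_STATUS) & 0x1);
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x1);
	}

	void mce_enable(void)
	{
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x0);
	}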

> +	/*
> +	 * Its dangerous to let MCE while microcode update is in progress.

s/let MCE while
 /let MCEs execute while

> +	 * Its extremely rare and even if happens they are fatal errors.
> +	 * But reading patched areas before the update is complete can be
> +	 * leading to unpredictable results. Setting MCIP will guarantee

s/can be leading to
 /can lead to

> +	 * the platform is taken to reset predictively.

What does 'the platform is taken to reset predictively' mean?

Did you mean 'predictably'/'reliably'?

> +	 */
> +	mce_set_mcip();
>  	/*
>  	 * On an SMT system, it suffices to load the microcode on one sibling of
>  	 * the core because the microcode engine is shared between the threads.
> @@ -457,6 +466,7 @@ static int __reload_late(void *info)
>  	 * loading attempts happen on multiple threads of an SMT core. See
>  	 * below.
>  	 */
> +
>  	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
>  		apply_microcode_local(&err);
>  	else

Spurious newline added?

Thanks,

	Ingo


* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-17  5:11 ` [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode Ashok Raj
@ 2022-08-17  7:43   ` Ingo Molnar
  2022-08-17 10:45     ` Ashok Raj
  2022-08-19 10:24   ` Borislav Petkov
  1 sibling, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2022-08-17  7:43 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Borislav Petkov, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan


* Ashok Raj <ashok.raj@intel.com> wrote:

> Cc: stable@vger.kernel.org

Not sure the series justifies stable - there's like a ton of things that 
could go wrong with a series like this & we want to have some serious 
testing first.

>  	list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
>  		mc_saved_hdr = (struct microcode_header_intel *)iter->data;
> -		sig	     = mc_saved_hdr->sig;
> -		pf	     = mc_saved_hdr->pf;
>  
> -		if (find_matching_signature(data, sig, pf)) {
> +		sig = uci->cpu_sig.sig;
> +		pf  = uci->cpu_sig.pf;
> +
> +		/*
> +		 * Compare the current CPUs signature with the ones in the

s/CPUs
 /CPU's

Thanks,

	Ingo


* Re: [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified
  2022-08-17  5:11 ` [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified Ashok Raj
@ 2022-08-17  7:45   ` Ingo Molnar
  2022-08-19 11:11   ` Borislav Petkov
  1 sibling, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2022-08-17  7:45 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Borislav Petkov, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan


* Ashok Raj <ashok.raj@intel.com> wrote:

> In general users don't have the necessary information to determine
> whether a late-load of a new microcode version has removed any feature
> (MSR, CPUID etc) between what is currently loaded and this new microcode.
> To address this issue, Intel has added a "minimum required version" field to
> a previously reserved field in the file header. Microcode updates
> should only be applied if the current microcode version is equal
> to, or greater than this minimum required version.
> 
> https://lore.kernel.org/linux-kernel/alpine.DEB.2.21.1909062237580.1902@nanos.tec.linutronix.de/
> 
> Thomas made some suggestions on how meta-data in the microcode file could
> provide Linux with information to decide if the new microcode is a suitable
> candidate for late-load. But even the "simpler" option #1 requires a lot of
> metadata and corresponding kernel code to parse it.
> 
> The proposal here is an even simpler option. The criteria for a microcode to
> be a viable late-load candidate is that no CPUID or OS visible MSR features
> are removed with respect to an earlier version of the microcode.
> 
> Pseudocode for late-load is as follows:
> 
> if header.min_required_id == 0
> 	This is old format microcode, block late-load
> else if current_ucode_version < header.min_required_id
> 	Current version is too old, block late-load of this microcode.
> else
> 	OK to proceed with late-load.
> 
> Any microcode that removes a feature will set the min_version to itself.
> This enforces that such a microcode is not suitable for late-loading.
> 
> The enforcement is not in hardware; it is limited to the kernel loader
> checking the requirement. Early loading of microcode does not need to
> enforce this requirement, since the new features are only evaluated after
> early loading in the boot process.
> 
> 
> Test cases covered:
> 
> 1. With new kernel, attempting to load an older format microcode with the
>    min_rev=0 should be blocked by kernel.
> 
>    [  210.541802] microcode: Header MUST specify min version for late-load
> 
> 2. New microcode with a non-zero min_rev in the header, but the specified
>    min_rev is greater than what is currently loaded in the CPU should be
>    blocked by kernel.
> 
>    [  245.139828] microcode: Current revision 0x8f685300 is too old to update, must be at 0xaa000050 version or higher
> 
> 3. New microcode with a min_rev < currently loaded should allow loading the
>    microcode
> 
> 4. Build initrd with microcode that has min_rev=0, or min_rev > currently
>    loaded should permit early loading microcode from initrd.
> 
> 
> Tested-by: William Xie <william.xie@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/include/asm/microcode_intel.h |  4 +++-
>  arch/x86/kernel/cpu/microcode/intel.c  | 20 ++++++++++++++++++++
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
> index 4c92cea7e4b5..16b8715e0984 100644
> --- a/arch/x86/include/asm/microcode_intel.h
> +++ b/arch/x86/include/asm/microcode_intel.h
> @@ -14,7 +14,9 @@ struct microcode_header_intel {
>  	unsigned int            pf;
>  	unsigned int            datasize;
>  	unsigned int            totalsize;
> -	unsigned int            reserved[3];
> +	unsigned int            reserved1;
> +	unsigned int		min_req_id;
> +	unsigned int            reserved3;
>  };
>  
>  struct microcode_intel {
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index c4b11e2fbe33..1eb202ec2302 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -178,6 +178,7 @@ static int microcode_sanity_check(void *mc, int print_err)
>  	struct extended_sigtable *ext_header = NULL;
>  	u32 sum, orig_sum, ext_sigcount = 0, i;
>  	struct extended_signature *ext_sig;
> +	struct ucode_cpu_info uci;
>  
>  	total_size = get_totalsize(mc_header);
>  	data_size = get_datasize(mc_header);
> @@ -248,6 +249,25 @@ static int microcode_sanity_check(void *mc, int print_err)
>  		return -EINVAL;
>  	}
>  
> +	/*
> +	 * Enforce for late-load that min_req_id is specified in the header.
> +	 * Otherwise its an old format microcode, reject it.

s/its
 /it's

...

> +	 */
> +	if (print_err) {
> +		if (!mc_header->min_req_id) {
> +			pr_warn("Header MUST specify min version for late-load\n");
> +			return -EINVAL;
> +		}
> +
> +		intel_cpu_collect_info(&uci);
> +		if (uci.cpu_sig.rev < mc_header->min_req_id) {
> +			pr_warn("Current revision 0x%x is too old to update,"
> +				"must  be at 0x%x version or higher\n",
> +				uci.cpu_sig.rev, mc_header->min_req_id);

Please don't line-break user-visible syslog strings, just because 
checkpatch is stupid.

If the user sees it as a single line, developers should see that same line 
too...
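
I.e. keep the string whole, something like this (illustrative only - same
message text as the patch, just unbroken):

	pr_warn("Current revision 0x%x is too old to update, must be at 0x%x version or higher\n",
		uci.cpu_sig.rev, mc_header->min_req_id);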

Thanks,

	Ingo


* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17  7:41   ` Ingo Molnar
@ 2022-08-17  7:58     ` Ingo Molnar
  2022-08-17  8:09       ` Borislav Petkov
  2022-08-17 11:40     ` Ashok Raj
  1 sibling, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2022-08-17  7:58 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Borislav Petkov, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan


* Ingo Molnar <mingo@kernel.org> wrote:

> > +void mce_set_mcip(void)
> > +{
> > +	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x1);
> > +}
> > +
> > +void mce_clear_mcip(void)
> > +{
> > +	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x0);
> > +}
> 
> Instead of naming new APIs after how they are doing stuff, please name them 
> after *what* they are doing at the highest level: they disable/enable MCEs.
> 
> Ie. I'd suggest something like:
> 
>      mce_disable()
>      mce_enable()
> 
> I'd also suggest to at minimum add a WARN_ON_ONCE() if MSR_IA32_MCG_STATUS 
> is already 1 when we disable it - because whoever wanted it disabled will 
> now be surprised by us enabling them again.

Also, Boris tells me that writing 0x0 to MSR_IA32_MCG_STATUS apparently 
shuts the platform down - which is not ideal...

Thanks,

	Ingo


* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17  7:58     ` Ingo Molnar
@ 2022-08-17  8:09       ` Borislav Petkov
  2022-08-17 11:57         ` Ashok Raj
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-08-17  8:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ashok Raj, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan

On Wed, Aug 17, 2022 at 09:58:03AM +0200, Ingo Molnar wrote:
> Also, Boris tells me that writing 0x0 to MSR_IA32_MCG_STATUS
> apparently shuts the platform down - which is not ideal...

Right, if you get an MCE raised while MCIP=0, the machine shuts down.

And frankly, I can't think of a good solution to this whole issue:

- with current hw, if you get an MCE and MCIP=0 -> shutdown

- in the future, even if you change the hardware to block MCEs from
being detected while the microcode update runs, what happens if a CPU
encounters a hw error during that update?

You raise it immediately after? What if there are multiple MCEs? Not
unheard of on a big machine...

Worse, what happens if there's a bitflip in the memory where the
to-be-updated microcode patch is?

You report the error afterwards?

Just thinking about this makes me real nervous.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-17  7:43   ` Ingo Molnar
@ 2022-08-17 10:45     ` Ashok Raj
  0 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-08-17 10:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan, Ashok Raj

Hi Ingo

On Wed, Aug 17, 2022 at 09:43:07AM +0200, Ingo Molnar wrote:
> 
> * Ashok Raj <ashok.raj@intel.com> wrote:
> 
> > Cc: stable@vger.kernel.org
> 
> Not sure the series justifies stable - there's like a ton of things that 
> could go wrong with a series like this & we want to have some serious 
> testing first.

It's my bad - I got lazy and put this patch in the series. This is a *bug* in
existing code, and completely unrelated. I should have submitted it
just by itself.

Boris, let me know if you want me to resubmit and I'll do this by itself.

> 
> >  	list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
> >  		mc_saved_hdr = (struct microcode_header_intel *)iter->data;
> > -		sig	     = mc_saved_hdr->sig;
> > -		pf	     = mc_saved_hdr->pf;
> >  
> > -		if (find_matching_signature(data, sig, pf)) {
> > +		sig = uci->cpu_sig.sig;
> > +		pf  = uci->cpu_sig.pf;
> > +
> > +		/*
> > +		 * Compare the current CPUs signature with the ones in the
> 
> s/CPUs
>  /CPU's

Thanks, will update.


* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17  7:41   ` Ingo Molnar
  2022-08-17  7:58     ` Ingo Molnar
@ 2022-08-17 11:40     ` Ashok Raj
  1 sibling, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-08-17 11:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan, Ashok Raj

On Wed, Aug 17, 2022 at 09:41:31AM +0200, Ingo Molnar wrote:
> 
> > +void mce_set_mcip(void)
> > +{
> > +	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x1);
> > +}
> > +
> > +void mce_clear_mcip(void)
> > +{
> > +	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x0);
> > +}
> 
> Instead of naming new APIs after how they are doing stuff, please name them 
> after *what* they are doing at the highest level: they disable/enable MCEs.
> 
> Ie. I'd suggest something like:
> 
>      mce_disable()
>      mce_enable()

We actually aren't disabling MCEs; we set things up to promote them to a more
severe shutdown if an MCE were to be signaled while in the ucode update
flow.

I'm struggling to find a suitable name. But I agree with what you are
saying.

promote_mce_to_fatal()? I'll take any names that seem fit.

> 
> I'd also suggest to at minimum add a WARN_ON_ONCE() if MSR_IA32_MCG_STATUS 
> is already 1 when we disable it - because whoever wanted it disabled will 
> now be surprised by us enabling them again.

Ok, will add.

> 
> > +	/*
> > +	 * Its dangerous to let MCE while microcode update is in progress.
> 
> s/let MCE while
>  /let MCEs execute while
> 
> > +	 * Its extremely rare and even if happens they are fatal errors.
> > +	 * But reading patched areas before the update is complete can be
> > +	 * leading to unpredictable results. Setting MCIP will guarantee
> 
> s/can be leading to
>  /can lead to
> 
> > +	 * the platform is taken to reset predictively.
> 
> What does 'the platform is taken to reset predictively' mean?

Since we are setting MCG_STATUS.MCIP=1 and MCEs aren't nestable, if there
is a hardware event trying to signal an MCE, it will turn into a platform
reset. The MCE registers that logged the event are sticky and preserved in
the warm reset case. The BIOS or OS can pick up the values after the reboot
is complete.

> 
> Did you mean 'predictably'/'reliably'?

I don't know the difference, except both are a trustworthy topic :-)

I like predictable, the system is going down.. not reliable :-)
> 
> > +	 */
> > +	mce_set_mcip();
> >  	/*
> >  	 * On an SMT system, it suffices to load the microcode on one sibling of
> >  	 * the core because the microcode engine is shared between the threads.
> > @@ -457,6 +466,7 @@ static int __reload_late(void *info)
> >  	 * loading attempts happen on multiple threads of an SMT core. See
> >  	 * below.
> >  	 */
> > +
> >  	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
> >  		apply_microcode_local(&err);
> >  	else
> 
> Spurious newline added?

It's a goner!

Cheers,
Ashok


* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17  8:09       ` Borislav Petkov
@ 2022-08-17 11:57         ` Ashok Raj
  2022-08-17 12:10           ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-17 11:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan, Ashok Raj

On Wed, Aug 17, 2022 at 10:09:00AM +0200, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 09:58:03AM +0200, Ingo Molnar wrote:
> > Also, Boris tells me that writing 0x0 to MSR_IA32_MCG_STATUS
> > apparently shuts the platform down - which is not ideal...
> 
> Right, if you get an MCE raised while MCIP=0, the machine shuts down.
> 
> And frankly, I can't think of a good solution to this whole issue:
> 
> - with current hw, if you get an MCE and MCIP=0 -> shutdown

You have this reversed. If you get an MCE and MCIP=1 -> shutdown

I'm still very reluctant; this is actually overkill. I added what is
possible based on Boris's recommendation.

When MCE's happen during the update they are always fatal errors. But at
least you can log them, even if some other weird error were to be observed
because they stomped over the patch area that the primary is currently
working on.

What we do here by setting MCIP=1, we promote to a more severe shutdown.

Ideally I would rather let the fallout happen since it's observable, vs the
blind shutdown that we are promoting to.

> 
> - in the future, even if you change the hardware to block MCEs from
> being detected while the microcode update runs, what happens if a CPU
> encounters a hw error during that update?

I don't think there ever will be blocking MCE's :-)

If an error happens, it leads to shutdown.
> 
> You raise it immediately after? What if there are multiple MCEs? Not
> unheard of on a big machine...

Shutdown, shutdown.. There is only 1 MCE no matter how many CPUs you have.

Exception is the Local MCE which is recoverable, but only to user space.

If you get an error in the atomic we are polling, its a fatal error since
SW can't recover and we shutdown.
> 
> Worse, what happens if there's a bitflip in the memory where the
> to-be-updated microcode patch is?
> 
> You report the error afterwards?
> 
> Just thinking about this makes me real nervous.

Overthinking :-).. If there is consensus, and if Boris feels comfortable
enough, I would drop this patch.


* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17 11:57         ` Ashok Raj
@ 2022-08-17 12:10           ` Borislav Petkov
  2022-08-17 12:30             ` Ashok Raj
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-08-17 12:10 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Ingo Molnar, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan

On Wed, Aug 17, 2022 at 11:57:37AM +0000, Ashok Raj wrote:
> You have this reversed. If you get an MCE and MCIP=1 -> shutdown

Yeah yeah.

> When MCE's happen during the update they are always fatal errors.

How did you decide that?

Because all CPUs are executing the loop and thus no user process?

> What we do here by setting MCIP=1, we promote to a more severe shutdown.

It probably should say somewhere that a shutdown is possible. Because if
the shutdown really happens, you get a black screen and no info as to
why...

> Ideally I would rather let the fallout happen since it's observable, vs the
> blind shutdown that we are promoting to.

What fallout do you mean exactly?

> Shutdown, shutdown.. There is only 1 MCE no matter how many CPUs you have.

Because all CPUs are executing the loop? Or how do you decide this?

> Exception is the Local MCE which is recoverable, but only to user space.
> 
> If you get an error in the atomic we are polling, its a fatal error since
> SW can't recover and we shutdown.

Aha, I think you mean that: the MCE is fatal because if it happens on
any CPU, it will be in kernel mode.

> Overthinking :-).. If there is consensus, and if Boris feels comfortable
> enough, I would drop this patch.

This is what we're doing right now - thinking about the consensus. And
Boris will feel comfortable once we've arrived at a sensible decision.

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17 12:10           ` Borislav Petkov
@ 2022-08-17 12:30             ` Ashok Raj
  2022-08-17 14:19               ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-17 12:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan, Ashok Raj

On Wed, Aug 17, 2022 at 02:10:51PM +0200, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 11:57:37AM +0000, Ashok Raj wrote:
> > You have this reversed. If you get an MCE and MCIP=1 -> shutdown
> 
> Yeah yeah.
> 
> > When MCE's happen during the update they are always fatal errors.
> 
> How did you decide that?
> 
> Because all CPUs are executing the loop and thus no user process?

Correct. There can be a fatal IOMCA, which is asynchronous and can fire
anytime. We moved any CPU-initiated async errors, like patrol-scrub and
cache errors observed during eviction (EWB), into CMCIs when we developed
LMCE.

> 
> > What we do here by setting MCIP=1, we promote to a more severe shutdown.
> 
> It probably should say somewhere that a shutdown is possible. Because if
> the shutdown really happens, you get a black screen and no info as to
> why...

You will find out when the system returns after reboot and hopefully wasn't
promoted to a cold boot, which will lose the MCE banks.

I can check with the HW team and get back.. 

> 
> > Ideally I would rather let the fallout happen since it's observable, vs the
> > blind shutdown that we are promoting to.
> 
> What fallout do you mean exactly?

Meaning: deal with the effect of a really rare MCE rather than trying to
avoid it. Taking the MCE is more important than finishing the update and
losing what the error signal was trying to convey.

> 
> > Shutdown, shutdown.. There is only 1 MCE no matter how many CPUs you have.
> 
> Because all CPUs are executing the loop? Or how do you decide this?

Fatal errors signaled with PCC=1 in MCAx.STATUS are *ALWAYS*
broadcast[1] to all CPUs in the system. So MCIP is set at the source CPU
and, if any other CPU is also attempting to signal one, they go down. You
can't even collect data.

[1] Broadcast is true for Intel CPUs; it's legacy behavior.
LMCE is the only one that isn't broadcast and is truly local to the logical
thread.


> 
> > Exception is the Local MCE which is recoverable, but only to user space.
> > 
> > If you get an error in the atomic we are polling, its a fatal error since
> > SW can't recover and we shutdown.
> 
> Aha, I think you mean that: the MCE is fatal because if it happens on
> any CPU, it will be in kernel mode.

Yep!

> 
> > Overthinking :-).. If there is concensus, if Boris feels comfortable
> > enough, i would drop this patch.
> 
> This is what we're doing right now - thinking about the consensus. And
> Boris will feel comfortable once we've arrived at a sensible decision.
> 
> :-)
> 
I'm waiting for the results. :-) And if you feel we can merge the following:
- Patch1 - bug fix
- Patch2 - min-rev id

I do have the comments from Ingo captured, but I'll wait for other comments
before I resend just those two, and we can leave the NMI handling to get more
testing and review before we consider it.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17 12:30             ` Ashok Raj
@ 2022-08-17 14:19               ` Borislav Petkov
  2022-08-17 15:06                 ` Ashok Raj
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-08-17 14:19 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Ingo Molnar, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan

On Wed, Aug 17, 2022 at 12:30:49PM +0000, Ashok Raj wrote:
> You will find out when the system returns after reboot, and hopefully it
> wasn't promoted to a cold boot, which will lose the MCE banks.

Not good enough!

This should issue a warning in dmesg that a potential MCE while the update
is running would cause a lockup. That is, if we don't disable MCE around
it.

If we decide to disable MCE, it should say shutdown.

> Meaning: deal with the effects of a really rare MCE rather than trying to
> avoid it. Taking the MCE is more important than finishing the update and
> losing what the error signal was trying to convey.

Right now I'm inclined to not do anything and warn of a potential rare
situation.

> > > Shutdown, shutdown.. There is only 1 MCE no matter how many CPUs you have.
> > 
> > Because all CPUs are executing the loop? Or how do you decide this?
> 
> Fatal errors signaled with PCC=1 in MCAx.STATUS are *ALWAYS*

What does that have to do with

"There is only 1 MCE no matter how many CPUs you have."

?

That's bullsh*t. Especially if the machine can do LMCE.

> I'm waiting for the results. :-) And if you feel we can merge the following:
> - Patch1 - bug fix
> - Patch2 - min-rev id
> 
> I do have the comments from Ingo captured, but I'll wait for other comments
> before I resend just those two, and we can leave the NMI handling to get more
> testing and review before we consider it.

No, you need to go read Documentation/process/.

I'm tired of having to explain to you how the kernel development process
works. You send your set, wait for a week, collect feedback and then you
send a new revision.

Not hammer people with patchsets every day.

This is not how that works.

If someone's breathing down your neck lemme know - I'd like to talk to
him/her.

Ok?!

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17 14:19               ` Borislav Petkov
@ 2022-08-17 15:06                 ` Ashok Raj
  2022-08-29 14:23                   ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-17 15:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan, Ashok Raj

On Wed, Aug 17, 2022 at 04:19:40PM +0200, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 12:30:49PM +0000, Ashok Raj wrote:
> > You will find out when the system returns after reboot, and hopefully it
> > wasn't promoted to a cold boot, which will lose the MCE banks.
> 
> Not good enough!

I probably misread your question. Are you suggesting we add some WARN when
we initiate the late load? I thought you were asking whether the HW must signal
something and the OS should log when an MCE happens while MCIP=1.


> 
> This should issue a warning in dmesg that a potential MCE while the update
> is running would cause a lockup. That is, if we don't disable MCE around
> it.
> 
> If we decide to disable MCE, it should say shutdown.

Ok, that clarifies it: "If we choose to set MCIP=1, we should tell users
that hell can break loose, get under the table." :-)

> 
> > Meaning: deal with the effects of a really rare MCE rather than trying to
> > avoid it. Taking the MCE is more important than finishing the update and
> > losing what the error signal was trying to convey.
> 
> Right now I'm inclined to not do anything and warn of a potential rare
> situation.

Encouraging. So I'll drop that patch from the list next time around.

> 
> > > > Shutdown, shutdown.. There is only 1 MCE no matter how many CPUs you have.
> > > 
> > > Because all CPUs are executing the loop? Or how do you decide this?
> > 
> > Fatal errors signaled with PCC=1 in the MCAx.STATUS is *ALWAYS*
> 
> What does that have to do with
> 
> "There is only 1 MCE no matter how many CPUs you have."
> 
> ?
> 
> That's bullsh*t. Especially if the machine can do LMCE.

Well, not outlandish :)

LMCE is only for recoverable errors. When we have a fatal error, sometimes
the signaling and the consumption of poison are going in different directions.
In order to minimize the chance of bad data being consumed,
*ALL* Intel processors have always broadcast fatal errors. This is the
history behind why we broadcast.

BTW: This is all legacy behavior. Nothing should come as a surprise.

LMCE is best effort. This is the current state.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-17  5:11 ` [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode Ashok Raj
  2022-08-17  7:43   ` Ingo Molnar
@ 2022-08-19 10:24   ` Borislav Petkov
  2022-08-23 11:13     ` Ashok Raj
  1 sibling, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-08-19 10:24 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan

On Wed, Aug 17, 2022 at 05:11:23AM +0000, Ashok Raj wrote:
> When save_microcode_patch() is looking to replace an existing microcode in
> the cache, the current code *only* checks the CPU sig/pf in the main

Write those "sig/pf" things out once so that it is clear what that is.

> header. Microcode can carry additional sig/pf combinations in the extended
> signature table, which is completely missed today.
> 
> For example: the current patch is a multi-stepping patch and the new incoming
> patch is a specific patch just for this CPU's stepping.
> 
> patch1:
> fms3 <--- header FMS
> ...
> ext_sig:
> fms1
> fms2
> 
> patch2: new
> fms2 <--- header FMS
> 
> Current code takes only fms3 and checks with patch2 fms2.

So, find_matching_signature() does all the signatures matching and
scanning already. If anything, that function should tell its callers
whether the patch it is looking at - the fms2 one - should replace the
current one or not.

I.e., all the logic to say how strong a patch match is, should be
concentrated there. And then the caller will do the according action.

> saved_patch.header.fms3 != new_patch.header.fms2, so save_microcode_patch
> saves it to the end of list instead of replacing patch1 with patch2.
> 
> There is no functional user observable issue since find_patch() skips
> patch versions that are <= current_patch and will land on patch2 properly.
> 
> Nevertheless, this will just end up storing every patch that isn't required.
> The kernel only needs to store the latest patch. Otherwise it's a memory leak
> that sits in the kernel and is never used.

Oh well.

> Cc: stable@vger.kernel.org

Why?

This looks like a small correction to me which doesn't need to go to
stable...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified
  2022-08-17  5:11 ` [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified Ashok Raj
  2022-08-17  7:45   ` Ingo Molnar
@ 2022-08-19 11:11   ` Borislav Petkov
  2022-08-23  0:08     ` Ashok Raj
  1 sibling, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-08-19 11:11 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan

On Wed, Aug 17, 2022 at 05:11:24AM +0000, Ashok Raj wrote:
> In general users don't have the necessary information to determine
> whether a late-load of a new microcode version has removed any feature
> (MSR, CPUID etc) between what is currently loaded and this new microcode.
> To address this issue, Intel has added a "minimum required version" field to
> a previously reserved field in the file header. Microcode updates
> should only be applied if the current microcode version is equal
> to, or greater than this minimum required version.
> 
> https://lore.kernel.org/linux-kernel/alpine.DEB.2.21.1909062237580.1902@nanos.tec.linutronix.de/

That goes into a Link: tag.

> Thomas made some suggestions on how meta-data in the microcode file could
> provide Linux with information to decide if the new microcode is suitable
> candidate for late-load. But even the "simpler" option#1 requires a lot of

In all your text:

s/late-load/late loading/g

It is called CONFIG_MICROCODE_LATE_LOADING - not CONFIG_MICROCODE_LATE_LOAD.

People are confused enough already - no need for more.

> metadata and corresponding kernel code to parse it.
> 
> The proposal here is an even simpler option. The criteria for a microcode to
> be a viable late-load candidate is that no CPUID or OS visible MSR features

Simply "OS visible features" - CPUID and MSRs are only two examples. The
microcode cannot change how the OS is supposed to interact with visible
features because that causes problems.

> are removed with respect to an earlier version of the microcode.
> 
> Pseudocode for late-load is as follows:

Unknown word [Pseudocode] in commit message.
Suggestions: ['Pseudo code',

> if header.min_required_id == 0
> 	This is old format microcode, block late-load
> else if current_ucode_version < header.min_required_id
> 	Current version is too old, block late-load of this microcode.
> else
> 	OK to proceed with late-load.
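For concreteness, the pseudocode above could map to a C sketch like the one
below. It uses the min_req_id field added by the patch hunk further down; the
function name, error strings, and call site are illustrative assumptions, not
the posted implementation.

static int can_late_load(struct microcode_header_intel *mc_header,
			 unsigned int cur_rev)
{
	/* Old-format microcode: no minimum version given, block late loading. */
	if (!mc_header->min_req_id) {
		pr_err("Late loading denied: header does not specify a min version\n");
		return -EINVAL;
	}

	/* The currently loaded revision is too old for this image. */
	if (cur_rev < mc_header->min_req_id) {
		pr_err("Late loading denied: current revision 0x%x too old, need 0x%x or higher\n",
		       cur_rev, mc_header->min_req_id);
		return -EINVAL;
	}

	return 0;	/* OK to proceed with late loading */
}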
> 
> Any microcode that removes a feature will set the min_version to itself.

"... that modifies the interface to a OS-visible feature..."

> This will enforce that this microcode is not suitable for late-loading.
> 
> The enforcement is not in hardware; it is limited to the kernel loader
> enforcing the requirement. Early loading of microcode does not need to
> enforce this requirement, since the new features are only
> evaluated after early loading in the boot process.
> 
> 
> Test cases covered:
> 
> 1. With new kernel, attempting to load an older format microcode with the
>    min_rev=0 should be blocked by kernel.
> 
>    [  210.541802] microcode: Header MUST specify min version for late-load

Make that more user-friendly:

"microcode: Late loading denied: microcode header does not specify a required min version."

> 2. New microcode with a non-zero min_rev in the header, but the specified
>    min_rev is greater than what is currently loaded in the CPU should be
>    blocked by kernel.
> 
>    [  245.139828] microcode: Current revision 0x8f685300 is too old to update,
>    must be at 0xaa000050 version or higher

"microcode: Late loading denied: Current version ... or higher. Use early loading instead."

> 3. New microcode with a min_rev < currently loaded should allow loading the
>    microcode
> 
> 4. Build initrd with microcode that has min_rev=0, or min_rev > currently
>    loaded should permit early loading microcode from initrd.
> 
> 
> Tested-by: William Xie <william.xie@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/include/asm/microcode_intel.h |  4 +++-
>  arch/x86/kernel/cpu/microcode/intel.c  | 20 ++++++++++++++++++++
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
> index 4c92cea7e4b5..16b8715e0984 100644
> --- a/arch/x86/include/asm/microcode_intel.h
> +++ b/arch/x86/include/asm/microcode_intel.h
> @@ -14,7 +14,9 @@ struct microcode_header_intel {
>  	unsigned int            pf;
>  	unsigned int            datasize;
>  	unsigned int            totalsize;
> -	unsigned int            reserved[3];
> +	unsigned int            reserved1;
> +	unsigned int            min_req_id;
> +	unsigned int            reserved3;
>  };
>  
>  struct microcode_intel {
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index c4b11e2fbe33..1eb202ec2302 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -178,6 +178,7 @@ static int microcode_sanity_check(void *mc, int print_err)

You can't do this in this function:

load_ucode_intel_bsp -> __load_ucode_intel -> scan_microcode -> microcode_sanity_check

which is the early path.

So you'd have to pass down the fact that you're doing late loading from
request_microcode_fw().

Now, I'm staring at that ugly refresh_fw bool arg in that function and
I *think* I did it 10 years ago because it shouldn't try to load from
the fs when it is resuming because there might not be a fs yet... or
something to that effect.

tglx might have a better idea how to check we're in the ->starting notifier...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified
  2022-08-19 11:11   ` Borislav Petkov
@ 2022-08-23  0:08     ` Ashok Raj
  2022-08-24 19:52       ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-23  0:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan,
	Ashok Raj

Hi Boris

> >  struct microcode_intel {
> > diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> > index c4b11e2fbe33..1eb202ec2302 100644
> > --- a/arch/x86/kernel/cpu/microcode/intel.c
> > +++ b/arch/x86/kernel/cpu/microcode/intel.c
> > @@ -178,6 +178,7 @@ static int microcode_sanity_check(void *mc, int print_err)
> 
> You can't do this in this function:
> 
> load_ucode_intel_bsp -> __load_ucode_intel -> scan_microcode -> microcode_sanity_check
> 
> which is the early path.

Correct, but print_err parameter is 0 when called from scan_microcode() and 1
when called from generic_load_microcode().

We do min_rev enforcement only when print_err is set.

Should I change the parameter to "late_loading" instead, so the
meaning is clear? Let me know if that's preferred.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-19 10:24   ` Borislav Petkov
@ 2022-08-23 11:13     ` Ashok Raj
  2022-08-24 19:27       ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-23 11:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan,
	Ashok Raj

On Fri, Aug 19, 2022 at 12:24:41PM +0200, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 05:11:23AM +0000, Ashok Raj wrote:
> > When save_microcode_patch() is looking to replace an existing microcode in
> > the cache, the current code *only* checks the CPU sig/pf in the main
> 
> Write those "sig/pf" things out once so that it is clear what that is.

Thanks, will do.
> 
> > header. Microcode can carry additional sig/pf combinations in the extended
> > signature table, which is completely missed today.
> > 
> > For example: the current patch is a multi-stepping patch and the new incoming
> > patch is a specific patch just for this CPU's stepping.
> > 
> > patch1:
> > fms3 <--- header FMS
> > ...
> > ext_sig:
> > fms1
> > fms2
> > 
> > patch2: new
> > fms2 <--- header FMS
> > 
> > Current code takes only fms3 and checks with patch2 fms2.
> 
> So, find_matching_signature() does all the signatures matching and
> scanning already. If anything, that function should tell its callers
> whether the patch it is looking at - the fms2 one - should replace the
> current one or not.
> 
> I.e., all the logic to say how strong a patch match is, should be
> concentrated there. And then the caller will do the according action.

I updated the commit log accordingly. Basically, find_matching_signature()
is only intended to match a CPU's sig/pf against a microcode image, and is not
intended to compare two different images.
> 
> > saved_patch.header.fms3 != new_patch.header.fms2, so save_microcode_patch
> > saves it to the end of list instead of replacing patch1 with patch2.
> > 
> > There is no functional user observable issue since find_patch() skips
> > patch versions that are <= current_patch and will land on patch2 properly.
> > 
> > Nevertheless, this will just end up storing every patch that isn't required.
> > The kernel only needs to store the latest patch. Otherwise it's a memory leak
> > that sits in the kernel and is never used.
> 
> Oh well.
> 
> > Cc: stable@vger.kernel.org
> 
> Why?

We have some internal code to support model-specific microcode rollback.
That codebase triggered the bug.

I'll drop the Cc next time.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-23 11:13     ` Ashok Raj
@ 2022-08-24 19:27       ` Borislav Petkov
  2022-08-25  3:27         ` Ashok Raj
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-08-24 19:27 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan

On Tue, Aug 23, 2022 at 11:13:13AM +0000, Ashok Raj wrote:
> > > patch1:
> > > fms3 <--- header FMS
> > > ...
> > > ext_sig:
> > > fms1
> > > fms2
> > > 
> > > patch2: new
> > > fms2 <--- header FMS
> > > 
> > > Current code takes only fms3 and checks with patch2 fms2.
> > 
> > So, find_matching_signature() does all the signatures matching and
> > scanning already. If anything, that function should tell its callers
> > whether the patch it is looking at - the fms2 one - should replace the
> > current one or not.
> > 
> > I.e., all the logic to say how strong a patch match is, should be
> > concentrated there. And then the caller will do the according action.
> 
> I updated the commit log accordingly. Basically, find_matching_signature()
> is only intended to match a CPU's sig/pf against a microcode image, and is not
> intended to compare two different images.

Err, what?

find_matching_signature() looks at fms3 - your example above - and then
goes and looks at ext_sig. Also your example above.

So you can teach that function to say with a *separate* return value
"replace current patch with this new patch because this new patch is a
better fit."

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified
  2022-08-23  0:08     ` Ashok Raj
@ 2022-08-24 19:52       ` Borislav Petkov
  2022-08-25  4:02         ` Ashok Raj
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-08-24 19:52 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan

On Tue, Aug 23, 2022 at 12:08:27AM +0000, Ashok Raj wrote:
> Correct, but print_err parameter is 0 when called from scan_microcode() and 1
> when called from generic_load_microcode().

Well, scan_microcode() gets called from save_microcode_in_initrd() which
is fs_initcall and if we had to be really precise, print_err being 0
there is wrong.

Because at fs_initcall time we can very well print error messages. But
that print_err thing is an old relic so will have to get fixed some
other day.

> We do min_rev enforcement only when print_err is set.

That's wrong - you need to do min_rev enforcement only when you're
loading microcode late. I.e., to paste from my previous mail:

"So you'd have to pass down the fact that you're doing late loading from
request_microcode_fw().

Now, I'm staring at that ugly refresh_fw bool arg in that function and
I *think* I did it 10 years ago because it shouldn't try to load from
the fs when it is resuming because there might not be a fs yet... or
something to that effect.

tglx might have a better idea how to check we're in the ->starting
notifier..."

IOW, we're going to have to do something like

->request_microcode_fw(, ... late_loading=true)

and I wanted to reuse that refresh_fw arg instead of adding another
one...
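To make that concrete, repurposing the existing bool might look roughly like
this; the prototype follows the then-current three-argument shape of
->request_microcode_fw(), and only the argument's name and meaning change
(a sketch, not a posted patch):

	/* The driver callback, with the bool renamed to state the loading mode: */
	enum ucode_state (*request_microcode_fw)(int cpu, struct device *dev,
						 bool late_loading);

	/* Late path, e.g. triggered from the reload sysfs write: */
	ustate = microcode_ops->request_microcode_fw(cpu, &microcode_pdev->dev,
						     true /* late_loading */);

	/* Early/resume path: */
	ustate = microcode_ops->request_microcode_fw(cpu, &microcode_pdev->dev,
						     false);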

HTH.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-24 19:27       ` Borislav Petkov
@ 2022-08-25  3:27         ` Ashok Raj
  2022-08-26 16:24           ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-25  3:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan,
	Ashok Raj

On Wed, Aug 24, 2022 at 09:27:55PM +0200, Borislav Petkov wrote:
> On Tue, Aug 23, 2022 at 11:13:13AM +0000, Ashok Raj wrote:
> > > > patch1:
> > > > fms3 <--- header FMS
> > > > ...
> > > > ext_sig:
> > > > fms1
> > > > fms2
> > > > 
> > > > patch2: new
> > > > fms2 <--- header FMS
> > > > 
> > > > Current code takes only fms3 and checks with patch2 fms2.
> > > 
> > > So, find_matching_signature() does all the signatures matching and
> > > scanning already. If anything, that function should tell its callers
> > > whether the patch it is looking at - the fms2 one - should replace the
> > > current one or not.
> > > 
> > > I.e., all the logic to say how strong a patch match is, should be
> > > concentrated there. And then the caller will do the according action.
> > 
> > I updated the commit log accordingly. Basically, find_matching_signature()
> > is only intended to match a CPU's sig/pf against a microcode image, and is not
> > intended to compare two different images.
> 
> Err, what?
> 
> find_matching_signature() looks at fms3 - your example above - and then
> goes and looks at ext_sig. Also your example above.

-               sig          = mc_saved_hdr->sig;
-               pf           = mc_saved_hdr->pf;
 
-               if (find_matching_signature(data, sig, pf)) {

In the above example, 

Old patch header rev is fms3
New patch header is fms2
Your CPU sig is fms2. 

The code was taking mc_saved_hdr->sig (which is fms3) and looking for it in
the new patch (which only has fms2). Now you will never find a match and end
up in the situation described, which is adding the new patch to the tail
instead of replacing patch1.

Also, look at all the existing usages of the function: we *always* use
the fms of the CPU.

Bottom line, you never need to have more than 1 patch for a given CPU. The
list only exists since it could technically support multi-stepping in the
system.

> 
> So you can teach that function to say with a *separate* return value
> "replace current patch with this new patch because this new patch is a
> better fit."

I didn't quite understand what needs to be taught. 

FWIW, I rewrote the commit log with *lots* of help from Dave. Hopefully the
full rewrite of the log can help jog memory about the purpose and why this
has survived all these years: simply because the code is still capable of
picking the right microcode, except that in our internal testing it showed
its warts.

The new commit log is below:

== Background ==

Microcode needs to be reapplied in the resume-from-suspend flow. For this
purpose the kernel saves a copy of the microcode after it is loaded. It is
possible to build systems with two or more CPUs that need incompatible
microcode images, e.g. if the CPUs have different steppings and each needs a
separate microcode file.

save_microcode_patch() is called when it comes time to load a new microcode
image into the kernel. Its job is to either ADD the image to the kernel cache
list if no matching entry exists, or REPLACE the one already saved in the
list. save_microcode_patch() is expected to search the saved list for the
entry matching the specific CPU in the system, and replace it with the new
incoming patch.

At any given time, there *must* be exactly one entry in the cache for a
given CPU in the system. If one isn't found, save_microcode_patch() just
adds this image to the list.

Lastly, microcode images have a CPU signature for the compatible processors
that the image applies to. The signature can be found in two places:

- The microcode header
- The extended signature table in the image, located at the end
  of the image itself.

For example, the image would consist of:

microcode header
    { rev, date, checksum, sigx }
image
    { microcode image }

extended_signature_table
    sig1
    sig2
    sig3


When Intel releases microcode images, typically the most recent stepping is
listed in the main microcode header, and the additional (older) ones are
listed in the extended signature table.

== Problem ==
The kernel is supposed to match the CPU's signature (struct ucode_cpu_info)
against the existing patches in the microcode_cache list. But instead it
takes the signature from the old microcode already saved in microcode_cache
and tries to match that against the new incoming image.

This would seem to make perfect sense if both old and new had the same
signature listed in the microcode header. But suppose the current CPU is
listed in the extended signature table (say as sig1, above), while the new
patch has sig2 in its main header, and only sig2. The current code would take
sigx from the current image and compare it with sig2. This is a mismatch that
yields no search result, so save_microcode_patch() keeps both the old and the
new images.

== Impact ==

When find_patch() is called, since the code only checks for cache entries
with a revision newer than what's loaded in the CPU, it will skip the one at
the head (which is the older revision) and find the one at the tail, which is
the newer revision.

-- Problem 1 --
So what exactly is the problem? find_patch() will always find the correct
patch to update on resume. But we end up storing an extra image that would
never be required. 

-- Problem 2 --
There was some internal version with support for microcode rollback, which
permits picking revisions that are older than the currently loaded one.
microcode_cache now has both the old and the new entries, and in some
situations it selected the older patch instead of picking the newer revision
at the end of the list.

[Note, the code that surfaced the bug was purely for internal evaluation
purposes.]

Nevertheless, this will just end up storing every patch that isn't required.
The kernel only needs to store the latest patch. Otherwise it's a memory leak
that sits in the kernel and is never used. This is mostly harmless.

== Solution ==
Always search for an entry in the microcode_cache with the signature of the
CPU of interest, instead of using the signature found in another microcode
entry.

In fact, every other caller of find_matching_signature() passes the CPU
signature. This is the *only* instance that compared the wrong values.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified
  2022-08-24 19:52       ` Borislav Petkov
@ 2022-08-25  4:02         ` Ashok Raj
  2022-08-26 12:09           ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-25  4:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan,
	Ashok Raj

On Wed, Aug 24, 2022 at 09:52:42PM +0200, Borislav Petkov wrote:
> On Tue, Aug 23, 2022 at 12:08:27AM +0000, Ashok Raj wrote:
> > Correct, but print_err parameter is 0 when called from scan_microcode() and 1
> > when called from generic_load_microcode().
> 
> Well, scan_microcode() gets called from save_microcode_in_initrd() which
> is fs_initcall and if we had to be really precise, print_err being 0
> there is wrong.
> 
> Because at fs_initcall time we can very well print error messages. But
> that print_err thing is an old relic so will have to get fixed some
> other day.

Well, the code hasn't changed since 2016, and possibly it was migrated from
another file.
> 
> > We do min_rev enforcement only when print_err is set.
> 
> That's wrong - you need to do min_rev enforcement only when you're
> loading microcode late. I.e., to paste from my previous mail:

True. Since this hasn't been used for so long, I was hoping to simply rename
the variable to late_load and repurpose it.

As you mention, we do have some good opportunity to perform some cleanups
here, and could address that then.

If you feel compelled to turn the print on for early boot, I could flip it
and send it along with my other changes. Let me know if you prefer that.


And I'll pursue what you said below. I still like the
microcode_sanity_check() spot; it sort of falls in that category. I can add
another parameter passed all the way from the request_fw path, coming through
all the other interceptors and landing in the same spot.

microcode_sanity_check() was a nicely isolated Intel-only function, so I
didn't need to perform surgery where it wasn't necessary :-)

Good bang for the buck :-)
> 
> "So you'd have to pass down the fact that you're doing late loading from
> request_microcode_fw().
> 
> Now, I'm staring at that ugly refresh_fw bool arg in that function and
> I *think* I did it 10 years ago because it shouldn't try to load from
> the fs when it is resuming because there might not be a fs yet... or
> something to that effect.
> 
> tglx might have a better idea how to check we're in the ->starting
> notifier..."
> 
> IOW, we're going to have to do something like
> 
> ->request_microcode_fw(, ... late_loading=true)

Sure, I'll check how Thomas prefers it.
> 
> and I wanted to reuse that refresh_fw arg instead of adding another
> one...
> 
> HTH.

YTH!

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified
  2022-08-25  4:02         ` Ashok Raj
@ 2022-08-26 12:09           ` Borislav Petkov
  0 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2022-08-26 12:09 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan

On Thu, Aug 25, 2022 at 04:02:07AM +0000, Ashok Raj wrote:
> If you feel compelled to turn the print on for early boot, I could flip it
> and send it along with my other changes. Let me know if you prefer that.

I'd prefer that you actually read what I'm trying to explain to you.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-25  3:27         ` Ashok Raj
@ 2022-08-26 16:24           ` Borislav Petkov
  2022-08-26 17:18             ` Ashok Raj
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-08-26 16:24 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan

On Thu, Aug 25, 2022 at 03:27:57AM +0000, Ashok Raj wrote:
> Old patch header rev is fms3
> New patch header is fms2
> Your CPU sig is fms2.

Can you pls give me exact reproduction instructions for how you get into this
situation?

dhansen and I are talking about this whole deal on IRC and I'm having
more questions than answers so I'd like to reproduce this locally.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-26 16:24           ` Borislav Petkov
@ 2022-08-26 17:18             ` Ashok Raj
  2022-08-26 17:29               ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-08-26 17:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan,
	Ashok Raj

On Fri, Aug 26, 2022 at 06:24:02PM +0200, Borislav Petkov wrote:
> On Thu, Aug 25, 2022 at 03:27:57AM +0000, Ashok Raj wrote:
> > Old patch header rev is fms3
> > New patch header is fms2
> > Your CPU sig is fms2.
> 
> Can you pls give me exact reproduction instructions for how you get into this
> situation?

Sure. The problem is that the code base is something I *never* wanted to
post, since it's not architectural.

The hard part is to split some of the internal code out and have a plain
version on a public tree that can reproduce this.

You may need additional hackery since you won't have the additional
microcode revisions for testing. I'll try to hack something up.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
  2022-08-26 17:18             ` Ashok Raj
@ 2022-08-26 17:29               ` Borislav Petkov
  0 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2022-08-26 17:29 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, Tony Luck, Dave Hansen, LKML Mailing List,
	X86-kernel, Andy Lutomirski, Tom Lendacky, Jacon Jun Pan

On Fri, Aug 26, 2022 at 05:18:32PM +0000, Ashok Raj wrote:
> Sure. The problem is that the code base is something I *never* wanted to
> post, since it's not architectural.

I don't need the code base - I need the microcode patches you're using
to trigger this.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
  2022-08-17 15:06                 ` Ashok Raj
@ 2022-08-29 14:23                   ` Andy Lutomirski
  0 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2022-08-29 14:23 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Andy Lutomirski, Tom Lendacky,
	Jacon Jun Pan

On 8/17/22 08:06, Ashok Raj wrote:
> On Wed, Aug 17, 2022 at 04:19:40PM +0200, Borislav Petkov wrote:
>> On Wed, Aug 17, 2022 at 12:30:49PM +0000, Ashok Raj wrote:
>>> You will find out when the system returns after reboot, and hopefully it
>>> wasn't promoted to a cold boot, which will lose the MCE banks.
>> Not good enough!
> I probably misread your question. Are you suggesting we add some WARN when
> we initiate the late load? I thought you were asking whether the HW must signal
> something and the OS should log when an MCE happens while MCIP=1.
>
>
>> This should issue a warning in dmesg that a potential MCE while the update
>> is running would cause a lockup. That is, if we don't disable MCE around
>> it.
>>
>> If we decide to disable MCE, it should say shutdown.
> Ok, that clarifies it: "If we choose to set MCIP=1, we should tell users
> that hell can break loose, get under the table." :-)
>
>>> Meaning: deal with the effects of a really rare MCE rather than trying to
>>> avoid it. Taking the MCE is more important than finishing the update and
>>> losing what the error signal was trying to convey.
>> Right now I'm inclined to not do anything and warn of a potential rare
>> situation.
> Encouraging. So I'll drop that patch from the list next time around.


If I followed all this correctly, I agree. If we set MCIP to force a
crash if we get an MCE, then we are guaranteed to crash; if we don't, then
we might crash.


An imperfect alternative would be to set a (percpu?) flag that we're 
doing a ucode update and then detect that flag early in the MCE handler 
and warn very loudly.  This seems like it will give us the best chance 
of getting a useful diagnostic.
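A rough sketch of that flag idea, with hypothetical names and hook points
(neither the flag nor the check exists in the posted series):

static DEFINE_PER_CPU(bool, ucode_update_in_progress);

	/* Around the actual update on each CPU, e.g. in __reload_late(): */
	this_cpu_write(ucode_update_in_progress, true);
	apply_microcode_local(&err);
	this_cpu_write(ucode_update_in_progress, false);

	/* Very early in the MCE handler: */
	if (this_cpu_read(ucode_update_in_progress))
		pr_emerg("MCE during microcode update - diagnostics may be unreliable\n");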

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 5/5] x86/microcode: Place siblings in NMI loop while update in progress
  2022-08-17  5:11 ` [PATCH v3 5/5] x86/microcode: Place siblings in NMI loop while update in progress Ashok Raj
@ 2022-08-30 19:15   ` Andy Lutomirski
  0 siblings, 0 replies; 32+ messages in thread
From: Andy Lutomirski @ 2022-08-30 19:15 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Borislav Petkov, Thomas Gleixner, Tony Luck, Dave Hansen,
	LKML Mailing List, X86-kernel, Tom Lendacky, Jacon Jun Pan

On Tue, Aug 16, 2022 at 10:12 PM Ashok Raj <ashok.raj@intel.com> wrote:
>
> Microcode updates need a guarantee that the thread sibling that is waiting
> for the update to finish on the primary core will not execute any
> instructions until the update is complete. This is required to guarantee
> any MSR or instruction that's being patched will be executed only after
> the update is complete.
>
> After the stop_machine() rendezvous, an NMI handler is registered. If an
> NMI were to happen while the microcode update is not complete, the
> secondary thread will spin until the ucode update state is cleared.
>
> Couple of choices discussed are:
>
> 1. Rendezvous inside the NMI handler, and also perform the update from
>    within the handler. This seemed too risky and might cause instability
>    with the races that we would need to solve. This would be a difficult
>    choice.
>         1.a Since the primary thread of every core is performing a wrmsr
>         for the update, once the wrmsr has started, it can't be
>         interrupted. Hence it's not required to NMI the primary thread of
>         the core. Only the secondary thread needs to be parked in NMI
>         before the update begins.
>         Suggested by Andy Cooper
> 2. Thomas (tglx) suggested that we could look into masking all the LVT
>    originating NMI's. Such as LINT1, Perf control LVT entries and such.
>    Since we are in the rendezvous loop, we don't need to worry about any
>    NMI IPI's generated by the OS.
>
>    The one we didn't have any control over is the ACPI mechanism of sending
>    notifications to the kernel for Firmware First Processing (FFM).
>    Apparently there is a PCH register that the BIOS, in SMI, would write to
>    generate such an interrupt (ACPI GHES).
> 3. This is a simpler option. OS registers an NMI handler and doesn't do any
>    NMI rendezvous dance. But if an NMI were to happen, we check if any of
>    the CPU's thread siblings have an update in progress. Only those CPUs
>    would take an NMI. The thread performing the wrmsr() will only take an
>    NMI after the completion of the wrmsr 0x79 flow.
>
>    [ Lutomirski thinks this is weak: the window between taking the
>    interrupt and the path to the registered callback handler might be
>    exposed. ]
>
>    Seems like 1.a is the best candidate.
>
> The algorithm is something like this:
>
> After stop_machine() all threads are executing __reload_late()
>
> nmi_callback()
> {
>         if (!in_ucode_update)
>                 return NMI_DONE;
>         if (cpu not in sibling_mask)
>                 return NMI_DONE;
>         update sibling reached NMI for primary to continue
>
>         while (cpu in sibling_mask)
>                 wait;
>         return NMI_HANDLED;
> }
>
> __reload_late()
> {
>
>         entry_rendezvous(&late_cpus_in);
>         set_mcip()
>         if (this_cpu is first_cpu in the core)
>                 wait for siblings to drop in NMI
>                 apply_microcode()
>         else {
>                 send self_ipi(NMI_VECTOR);
>                 goto wait_for_siblings;
>         }
>
> wait_for_siblings:
>         exit_rendezvous(&late_cpus_out);
>         clear_mcip
> }
>
> reload_late()
> {
>         register_nmi_handler()
>         prepare_mask of all sibling cpus()
>         update state = ucode in progress;
>         stop_machine();
>         unregister_nmi_handler();
> }
>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  arch/x86/kernel/cpu/microcode/core.c | 218 ++++++++++++++++++++++++++-
>  1 file changed, 211 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
> index d24e1c754c27..fd3b8ce2c82a 100644
> --- a/arch/x86/kernel/cpu/microcode/core.c
> +++ b/arch/x86/kernel/cpu/microcode/core.c
> @@ -39,7 +39,9 @@
>  #include <asm/processor.h>
>  #include <asm/cmdline.h>
>  #include <asm/setup.h>
> +#include <asm/apic.h>
>  #include <asm/mce.h>
> +#include <asm/nmi.h>
>
>  #define DRIVER_VERSION "2.2"
>
> @@ -411,6 +413,13 @@ static int check_online_cpus(void)
>
>  static atomic_t late_cpus_in;
>  static atomic_t late_cpus_out;
> +static atomic_t nmi_cpus;      // number of CPUs that enter NMI
> +static atomic_t nmi_timeouts;   // number of siblings that timeout
> +static atomic_t nmi_siblings;   // Number of siblings that enter NMI
> +static atomic_t in_ucode_update;// Are we in microcode update?
> +static atomic_t nmi_exit;       // Siblings that exit NMI

Some of these variables seem oddly managed and just for debugging.

> +
> +static struct cpumask all_sibling_mask;
>
>  static int __wait_for_cpus(atomic_t *t, long long timeout)
>  {
> @@ -433,6 +442,104 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
>         return 0;
>  }
>
> +struct core_rendez {
> +       int num_core_cpus;
> +       atomic_t callin;
> +       atomic_t core_done;
> +};
> +
> +static DEFINE_PER_CPU(struct core_rendez, core_sync);
> +
> +static int __wait_for_update(atomic_t *t, long long timeout)
> +{
> +       while (!atomic_read(t)) {
> +               if (timeout < SPINUNIT)
> +                       return 1;

Since you're using signed arithmetic, timeout < 0 would be a less
error-prone condition.

Anyway, this patch is full of debugging stuff, so I won't do a
line-for-line review, but I do have a suggestion.  Instead of all this
bookkeeping, maybe just track the number of cores to park in NMI, kind
of like this (hand-wavy pseudocode):

static struct cpumask cpus_to_park_in_nmi;

/* fill out the cpumask */
static atomic_t nmi_parked_cpus;
static bool park_enabled;

Then, after __wait_for_cpus (once everything is stopped), one cpu sets
up the nmi handler, sets park_enabled, and sends the NMI IPI to all
the CPUs parked in there.  The handler does:

if (this cpu is in cpus_to_park_in_nmi) {
  WARN_ON_ONCE(!park_enabled);
  atomic_inc(&nmi_parked_cpus);
  while (READ_ONCE(park_enabled))
    ;  /* because Intel won't promise that cpu_relax() is okay */
  atomic_dec(&nmi_parked_cpus);
}

and the CPUs that aren't supposed to park wait for nmi_parked_cpus to
have the right value.  After the update, park_enabled gets cleared and
everything resumes.

Does this seem reasonable?

I was thinking it would be straightforward to have __wait_for_cpus
handle this, but that would only really be easy in a language with
closures or continuation passing.
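Filling in the hand-waving, a self-contained version of the parking handler
might look like this; all names come from the pseudocode above and are
hypothetical, not from the posted patch:

static struct cpumask cpus_to_park_in_nmi;
static atomic_t nmi_parked_cpus;
static bool park_enabled;

static int ucode_park_nmi(unsigned int cmd, struct pt_regs *regs)
{
	if (!cpumask_test_cpu(smp_processor_id(), &cpus_to_park_in_nmi))
		return NMI_DONE;

	WARN_ON_ONCE(!READ_ONCE(park_enabled));
	atomic_inc(&nmi_parked_cpus);

	/* Spin without cpu_relax(), as noted above. */
	while (READ_ONCE(park_enabled))
		;

	atomic_dec(&nmi_parked_cpus);
	return NMI_HANDLED;
}

	/* Registered once everything is stopped, before sending the NMI IPI: */
	register_nmi_handler(NMI_LOCAL, ucode_park_nmi, 0, "ucode_park");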

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-08-30 19:15 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-17  5:11 [PATCH v3 0/5] Making microcode late-load robust Ashok Raj
2022-08-17  5:11 ` [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode Ashok Raj
2022-08-17  7:43   ` Ingo Molnar
2022-08-17 10:45     ` Ashok Raj
2022-08-19 10:24   ` Borislav Petkov
2022-08-23 11:13     ` Ashok Raj
2022-08-24 19:27       ` Borislav Petkov
2022-08-25  3:27         ` Ashok Raj
2022-08-26 16:24           ` Borislav Petkov
2022-08-26 17:18             ` Ashok Raj
2022-08-26 17:29               ` Borislav Petkov
2022-08-17  5:11 ` [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified Ashok Raj
2022-08-17  7:45   ` Ingo Molnar
2022-08-19 11:11   ` Borislav Petkov
2022-08-23  0:08     ` Ashok Raj
2022-08-24 19:52       ` Borislav Petkov
2022-08-25  4:02         ` Ashok Raj
2022-08-26 12:09           ` Borislav Petkov
2022-08-17  5:11 ` [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update Ashok Raj
2022-08-17  7:41   ` Ingo Molnar
2022-08-17  7:58     ` Ingo Molnar
2022-08-17  8:09       ` Borislav Petkov
2022-08-17 11:57         ` Ashok Raj
2022-08-17 12:10           ` Borislav Petkov
2022-08-17 12:30             ` Ashok Raj
2022-08-17 14:19               ` Borislav Petkov
2022-08-17 15:06                 ` Ashok Raj
2022-08-29 14:23                   ` Andy Lutomirski
2022-08-17 11:40     ` Ashok Raj
2022-08-17  5:11 ` [PATCH v3 4/5] x86/x2apic: Support x2apic self IPI with NMI_VECTOR Ashok Raj
2022-08-17  5:11 ` [PATCH v3 5/5] x86/microcode: Place siblings in NMI loop while update in progress Ashok Raj
2022-08-30 19:15   ` Andy Lutomirski
