linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [v2 00/13] Make microcode late loading more robust
@ 2022-11-03 17:58 Ashok Raj
  2022-11-03 17:58 ` [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times Ashok Raj
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Hi Thomas & Boris

This series is based on the recent cleanup series from Boris, here below:

https://lore.kernel.org/lkml/20221028142638.28498-1-bp@alien8.de/

Previous post here:

https://lore.kernel.org/lkml/20221014200913.14644-1-ashok.raj@intel.com/

Ashok Raj (12):
  x86/microcode/intel: Prevent printing updated microcode rev multiple
    times
  x86/microcode/intel: Print old and new rev after early microcode
    update
  x86/microcode/intel: Fix a hang if early loading microcode fails
  x86/microcode: Fix microcode_check() compare after a new uCode update
  x86/microcode: Move late-load warning to earlier where kernel taint
    happens
  x86/microcode: Place siblings in NMI loop while update in progress
  x86/mce: Warn of a microcode update is in progress when MCE arrives
  x86/microcode/intel: Add minimum required revision to microcode header
  x86/microcode: Add a generic mechanism to declare support for minrev
  x86/microcode/intel: Drop wbinvd() from microcode loading
  x86/microcode: Display revisions only when update is successful
  x86/microcode/intel: Add ability to update microcode even if rev is
    unchanged

Jacob Pan (1):
  x86/ipi: Support sending NMI_VECTOR as self ipi

 arch/x86/include/asm/microcode.h       |  43 ++++++
 arch/x86/include/asm/microcode_intel.h |   4 +-
 arch/x86/include/asm/processor.h       |   1 -
 arch/x86/kernel/apic/ipi.c             |   6 +-
 arch/x86/kernel/apic/x2apic_phys.c     |   6 +-
 arch/x86/kernel/cpu/common.c           |  32 ----
 arch/x86/kernel/cpu/mce/core.c         |   5 +
 arch/x86/kernel/cpu/microcode/amd.c    |   2 +-
 arch/x86/kernel/cpu/microcode/core.c   | 200 +++++++++++++++++++++++--
 arch/x86/kernel/cpu/microcode/intel.c  |  96 ++++++++----
 arch/x86/kernel/cpu/microcode/nmi.c    |  71 +++++++++
 arch/x86/kernel/nmi.c                  |   7 +
 arch/x86/kernel/nmi_selftest.c         |  32 ++++
 arch/x86/Kconfig                       |   7 +-
 arch/x86/kernel/cpu/microcode/Makefile |   1 +
 15 files changed, 427 insertions(+), 86 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/nmi.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-04 11:00   ` Borislav Petkov
                     ` (2 more replies)
  2022-11-03 17:58 ` [v2 02/13] x86/microcode/intel: Print old and new rev after early microcode update Ashok Raj
                   ` (11 subsequent siblings)
  12 siblings, 3 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Commit b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk")
introduced a race where all CPUs follow this call chain:

microcode_init()->schedule_on_each_cpu(setup_online_cpu)->collect_cpu_info

This results in console spam where multiple CPUs print the signature.

[   33.688639] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
[   33.688659] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
[   33.688660] microcode: sig=0x50654, pf=0x80, revision=0x2006e05

Fix by making sure only boot CPU prints the message.

Fixes: b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk")
Reported-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 8c35c70029bf..8f7f8dd6680e 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -680,6 +680,7 @@ void reload_ucode_intel(void)
 
 static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 {
+	bool bsp = cpu_num == boot_cpu_data.cpu_index;
 	static struct cpu_signature prev;
 	struct cpuinfo_x86 *c = &cpu_data(cpu_num);
 	unsigned int val[2];
@@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 
 	csig->rev = c->microcode;
 
-	/* No extra locking on prev, races are harmless. */
-	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
+	if (bsp && csig->rev != prev.rev) {
 		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
 			csig->sig, csig->pf, csig->rev);
 		prev = *csig;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 02/13] x86/microcode/intel: Print old and new rev after early microcode update
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
  2022-11-03 17:58 ` [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:58 ` [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails Ashok Raj
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Print the old and new versions of microcode after an early load is
complete. This is useful to know what version was loaded by BIOS before an
early microcode load.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
v2:
 - (Boris) Fix microcode update message consistent for early and late.
 - apply_microcode_intel(): prev_rev isn't set until the first update.

 arch/x86/kernel/cpu/microcode/intel.c | 32 +++++++++++++++++----------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 8f7f8dd6680e..733b5eac0444 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -435,10 +435,10 @@ static bool load_builtin_intel_microcode(struct cpio_data *cp)
  * Print ucode update info.
  */
 static void
-print_ucode_info(struct ucode_cpu_info *uci, unsigned int date)
+print_ucode_info(u32 old_rev, struct ucode_cpu_info *uci, unsigned int date)
 {
-	pr_info_once("microcode updated early to revision 0x%x, date = %04x-%02x-%02x\n",
-		     uci->cpu_sig.rev,
+	pr_info_once("early update: 0x%x -> 0x%x, date = %04x-%02x-%02x\n",
+		     old_rev, uci->cpu_sig.rev,
 		     date & 0xffff,
 		     date >> 24,
 		     (date >> 16) & 0xff);
@@ -448,6 +448,7 @@ print_ucode_info(struct ucode_cpu_info *uci, unsigned int date)
 
 static int delay_ucode_info;
 static int current_mc_date;
+static u32 early_old_rev;
 
 /*
  * Print early updated ucode info after printk works. This is delayed info dump.
@@ -458,7 +459,7 @@ void show_ucode_info_early(void)
 
 	if (delay_ucode_info) {
 		intel_cpu_collect_info(&uci);
-		print_ucode_info(&uci, current_mc_date);
+		print_ucode_info(early_old_rev, &uci, current_mc_date);
 		delay_ucode_info = 0;
 	}
 }
@@ -467,11 +468,12 @@ void show_ucode_info_early(void)
  * At this point, we can not call printk() yet. Delay printing microcode info in
  * show_ucode_info_early() until printk() works.
  */
-static void print_ucode(struct ucode_cpu_info *uci)
+static void print_ucode(u32 old_rev, struct ucode_cpu_info *uci)
 {
 	struct microcode_intel *mc;
 	int *delay_ucode_info_p;
 	int *current_mc_date_p;
+	u32 *old_rev_p;
 
 	mc = uci->mc;
 	if (!mc)
@@ -479,13 +481,15 @@ static void print_ucode(struct ucode_cpu_info *uci)
 
 	delay_ucode_info_p = (int *)__pa_nodebug(&delay_ucode_info);
 	current_mc_date_p = (int *)__pa_nodebug(&current_mc_date);
+	old_rev_p = (u32 *)__pa_nodebug(&early_old_rev);
 
 	*delay_ucode_info_p = 1;
 	*current_mc_date_p = mc->hdr.date;
+	*old_rev_p = old_rev;
 }
 #else
 
-static inline void print_ucode(struct ucode_cpu_info *uci)
+static inline void print_ucode(u32 old_rev, struct ucode_cpu_info *uci)
 {
 	struct microcode_intel *mc;
 
@@ -493,14 +497,14 @@ static inline void print_ucode(struct ucode_cpu_info *uci)
 	if (!mc)
 		return;
 
-	print_ucode_info(uci, mc->hdr.date);
+	print_ucode_info(old_rev, uci, mc->hdr.date);
 }
 #endif
 
 static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 {
 	struct microcode_intel *mc;
-	u32 rev;
+	u32 old_rev, rev;
 
 	mc = uci->mc;
 	if (!mc)
@@ -517,6 +521,7 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 		return UCODE_OK;
 	}
 
+	old_rev = rev;
 	/*
 	 * Writeback and invalidate caches before updating microcode to avoid
 	 * internal issues depending on what the microcode is updating.
@@ -533,9 +538,9 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 	uci->cpu_sig.rev = rev;
 
 	if (early)
-		print_ucode(uci);
+		print_ucode(old_rev, uci);
 	else
-		print_ucode_info(uci, mc->hdr.date);
+		print_ucode_info(old_rev, uci, mc->hdr.date);
 
 	return 0;
 }
@@ -739,6 +744,9 @@ static enum ucode_state apply_microcode_intel(int cpu)
 		goto out;
 	}
 
+	if (!prev_rev)
+		prev_rev = rev;
+
 	/*
 	 * Writeback and invalidate caches before updating microcode to avoid
 	 * internal issues depending on what the microcode is updating.
@@ -757,8 +765,8 @@ static enum ucode_state apply_microcode_intel(int cpu)
 	}
 
 	if (bsp && rev != prev_rev) {
-		pr_info("updated to revision 0x%x, date = %04x-%02x-%02x\n",
-			rev,
+		pr_info("update 0x%x -> 0x%x, date = %04x-%02x-%02x\n",
+			prev_rev, rev,
 			mc->hdr.date & 0xffff,
 			mc->hdr.date >> 24,
 			(mc->hdr.date >> 16) & 0xff);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
  2022-11-03 17:58 ` [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times Ashok Raj
  2022-11-03 17:58 ` [v2 02/13] x86/microcode/intel: Print old and new rev after early microcode update Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-09 11:25   ` Borislav Petkov
  2022-11-03 17:58 ` [v2 04/13] x86/microcode: Fix microcode_check() compare after a new uCode update Ashok Raj
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

When early loading of microcode fails for any reason other than the wrong
family-model-stepping, Linux can get into an infinite loop retrying the
same failed load.

A single retry is needed to handle any mixed stepping case.

Assume we have a microcode that fails to load for some reason.
load_ucode_ap() seems to retry if the loading fails. But it searches for
a new rev, but ends up finding the same copy. Hence it appears to repeat
the same load, retry loop for ever.

load_ucode_intel_ap()
{
..
reget:
        if (!*iup) {
                patch = __load_ucode_intel(&uci);
		^^^^^ Finds the same patch every time.

                if (!patch)
                        return;

                *iup = patch;
        }

        uci.mc = *iup;

        if (apply_microcode_early(&uci, true)) {
	^^^^^^^^^^^^ apply fails
              /* Mixed-silicon system? Try to refetch the proper patch: */
              *iup = NULL;

              goto reget;
	      ^^^^^ Rince repeat.
        }

}

Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 733b5eac0444..8ef04447fcf0 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -606,6 +606,7 @@ void __init load_ucode_intel_bsp(void)
 {
 	struct microcode_intel *patch;
 	struct ucode_cpu_info uci;
+	int rev, ret;
 
 	patch = __load_ucode_intel(&uci);
 	if (!patch)
@@ -613,13 +614,18 @@ void __init load_ucode_intel_bsp(void)
 
 	uci.mc = patch;
 
-	apply_microcode_early(&uci, true);
+	ret = apply_microcode_early(&uci, true);
+	if (ret) {
+		rev = patch->hdr.rev;
+		pr_err("Revision 0x%x failed during early loading\n", rev);
+	}
 }
 
 void load_ucode_intel_ap(void)
 {
 	struct microcode_intel *patch, **iup;
 	struct ucode_cpu_info uci;
+	bool retried = false;
 
 	if (IS_ENABLED(CONFIG_X86_32))
 		iup = (struct microcode_intel **) __pa_nodebug(&intel_ucode_patch);
@@ -638,9 +644,13 @@ void load_ucode_intel_ap(void)
 	uci.mc = *iup;
 
 	if (apply_microcode_early(&uci, true)) {
+		if (retried)
+			return;
+
 		/* Mixed-silicon system? Try to refetch the proper patch: */
 		*iup = NULL;
 
+		retried = true;
 		goto reget;
 	}
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 04/13] x86/microcode: Fix microcode_check() compare after a new uCode update
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (2 preceding siblings ...)
  2022-11-03 17:58 ` [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:58 ` [v2 05/13] x86/microcode: Move late-load warning to earlier where kernel taint happens Ashok Raj
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

microcode_check() seems to take a snapshot after an update to compare with
previously cached values of x86_capabilities. Some capabilities can be
turned off by OS as a result of certain command line parameters or due to
some configuration.

Even though there was no change in CPUID bits, reloading the same microcode
threw a warning, that some CPUID bits changed.

To eliminate the false warning, take a snapshot before the update and one
after the update. This eliminates the miscompare.

Also move the microcode_check() from cpu/common.c -> cpu/microcode/core.c

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/processor.h     |  1 -
 arch/x86/kernel/cpu/common.c         | 32 -----------------
 arch/x86/kernel/cpu/microcode/core.c | 51 +++++++++++++++++++++++++++-
 3 files changed, 50 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 67c9d73b31fa..2acc8ae0bf47 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -835,7 +835,6 @@ bool xen_set_default_idle(void);
 #endif
 
 void __noreturn stop_this_cpu(void *dummy);
-void microcode_check(void);
 
 enum l1tf_mitigations {
 	L1TF_MITIGATION_OFF,
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..bbd362ead043 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2305,38 +2305,6 @@ void cpu_init_secondary(void)
 }
 #endif
 
-#ifdef CONFIG_MICROCODE_LATE_LOADING
-/*
- * The microcode loader calls this upon late microcode load to recheck features,
- * only when microcode has been updated. Caller holds microcode_mutex and CPU
- * hotplug lock.
- */
-void microcode_check(void)
-{
-	struct cpuinfo_x86 info;
-
-	perf_check_microcode();
-
-	/* Reload CPUID max function as it might've changed. */
-	info.cpuid_level = cpuid_eax(0);
-
-	/*
-	 * Copy all capability leafs to pick up the synthetic ones so that
-	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
-	 * get overwritten in get_cpu_cap().
-	 */
-	memcpy(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability));
-
-	get_cpu_cap(&info);
-
-	if (!memcmp(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability)))
-		return;
-
-	pr_warn("x86/CPU: CPU features have changed after loading microcode, but might not take effect.\n");
-	pr_warn("x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
-}
-#endif
-
 /*
  * Invoked from core CPU hotplug code after hotplug operations
  */
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 712aafff96e0..9ed1f6e138d6 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -40,6 +40,8 @@
 #include <asm/cmdline.h>
 #include <asm/setup.h>
 
+#include "../cpu.h"
+
 #define DRIVER_VERSION	"2.2"
 
 static struct microcode_ops	*microcode_ops;
@@ -431,6 +433,51 @@ static int __reload_late(void *info)
 	return ret;
 }
 
+static void copy_cpu_caps(struct cpuinfo_x86 *info)
+{
+	/* Reload CPUID max function as it might've changed. */
+	info->cpuid_level = cpuid_eax(0);
+
+	/*
+	 * Copy all capability leafs to pick up the synthetic ones so that
+	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
+	 * get overwritten in get_cpu_cap().
+	 */
+	memcpy(info->x86_capability, &boot_cpu_data.x86_capability, sizeof(info->x86_capability));
+
+	get_cpu_cap(info);
+}
+
+/*
+ * The microcode loader calls this upon late microcode load to recheck features,
+ * only when microcode has been updated. Caller holds microcode_mutex and CPU
+ * hotplug lock.
+ */
+static void microcode_check(struct cpuinfo_x86 *orig)
+{
+	struct cpuinfo_x86 info;
+
+	perf_check_microcode();
+
+	/* Reload CPUID max function as it might've changed. */
+	info.cpuid_level = cpuid_eax(0);
+
+	/*
+	 * Copy all capability leafs to pick up the synthetic ones so that
+	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
+	 * get overwritten in get_cpu_cap().
+	 */
+	memcpy(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability));
+
+	get_cpu_cap(&info);
+
+	if (!memcmp(info.x86_capability, &orig->x86_capability, sizeof(info.x86_capability)))
+		return;
+
+	pr_warn("x86/CPU: CPU features have changed after loading microcode, but might not take effect.\n");
+	pr_warn("x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
+}
+
 /*
  * Reload microcode late on all CPUs. Wait for a sec until they
  * all gather together.
@@ -438,6 +485,7 @@ static int __reload_late(void *info)
 static int microcode_reload_late(void)
 {
 	int old = boot_cpu_data.microcode, ret;
+	struct cpuinfo_x86 info;
 
 	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
 	pr_err("You should switch to early loading, if possible.\n");
@@ -445,9 +493,10 @@ static int microcode_reload_late(void)
 	atomic_set(&late_cpus_in,  0);
 	atomic_set(&late_cpus_out, 0);
 
+	copy_cpu_caps(&info);
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
 	if (ret == 0)
-		microcode_check();
+		microcode_check(&info);
 
 	pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
 		old, boot_cpu_data.microcode);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 05/13] x86/microcode: Move late-load warning to earlier where kernel taint happens
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (3 preceding siblings ...)
  2022-11-03 17:58 ` [v2 04/13] x86/microcode: Fix microcode_check() compare after a new uCode update Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:58 ` [v2 06/13] x86/ipi: Support sending NMI_VECTOR as self ipi Ashok Raj
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Move where the late loading warning is being issued to earlier in the call.
This would put the warn and taint in the same function.

Just a tidy thing, no functional changes.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 9ed1f6e138d6..d41207e50ee6 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -487,9 +487,6 @@ static int microcode_reload_late(void)
 	int old = boot_cpu_data.microcode, ret;
 	struct cpuinfo_x86 info;
 
-	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
-	pr_err("You should switch to early loading, if possible.\n");
-
 	atomic_set(&late_cpus_in,  0);
 	atomic_set(&late_cpus_out, 0);
 
@@ -530,6 +527,9 @@ static ssize_t reload_store(struct device *dev,
 	if (tmp_ret != UCODE_NEW)
 		goto put;
 
+	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
+	pr_err("You should switch to early loading, if possible.\n");
+
 	mutex_lock(&microcode_mutex);
 	ret = microcode_reload_late();
 	mutex_unlock(&microcode_mutex);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 06/13] x86/ipi: Support sending NMI_VECTOR as self ipi
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (4 preceding siblings ...)
  2022-11-03 17:58 ` [v2 05/13] x86/microcode: Move late-load warning to earlier where kernel taint happens Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:58 ` [v2 07/13] x86/microcode: Place siblings in NMI loop while update in progress Ashok Raj
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Jacob Pan, Ashok Raj

From: Jacob Pan <jacob.jun.pan@linux.intel.com>

apic->send_IPI_self() can be used to send any general vector as a self IPI.
The function uses a SHORTCUT specifier, but it can't be used to send a
vector with delivery mode as NMI. Chapter 10, Advanved Programmable
Interrupt Controller (APIC) Table 10-3 indicates that the shortcut isn't a
legal combination for NMI delivery. The same is true for x2apic
implementations as well, the self IPI MSR can only specify the vector
number, but no delivery mode.

The helper adds proper handling if the vector is NMI_VECTOR.

Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Co-developed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/apic/ipi.c         |  6 +++++-
 arch/x86/kernel/apic/x2apic_phys.c |  6 +++++-
 arch/x86/kernel/nmi_selftest.c     | 32 ++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 2a6509e8c840..e967c49609ef 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -239,7 +239,11 @@ void default_send_IPI_all(int vector)
 
 void default_send_IPI_self(int vector)
 {
-	__default_send_IPI_shortcut(APIC_DEST_SELF, vector);
+	if (unlikely(vector == NMI_VECTOR))
+		apic->send_IPI_mask(cpumask_of(smp_processor_id()),
+				    NMI_VECTOR);
+	else
+		__default_send_IPI_shortcut(APIC_DEST_SELF, vector);
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 6bde05a86b4e..cf187f1906b2 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -149,7 +149,11 @@ int x2apic_phys_pkg_id(int initial_apicid, int index_msb)
 
 void x2apic_send_IPI_self(int vector)
 {
-	apic_write(APIC_SELF_IPI, vector);
+	if (unlikely(vector == NMI_VECTOR))
+		apic->send_IPI_mask(cpumask_of(smp_processor_id()),
+				    NMI_VECTOR);
+	else
+		apic_write(APIC_SELF_IPI, vector);
 }
 
 static struct apic apic_x2apic_phys __ro_after_init = {
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index a1a96df3dff1..f4b813821208 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -105,6 +105,36 @@ static void __init local_ipi(void)
 	test_nmi_ipi(to_cpumask(nmi_ipi_mask));
 }
 
+static void __init self_nmi_test(void)
+{
+	unsigned long timeout;
+
+	cpumask_clear(to_cpumask(nmi_ipi_mask));
+	cpumask_set_cpu(smp_processor_id(), to_cpumask(nmi_ipi_mask));
+
+	if (register_nmi_handler(NMI_LOCAL, test_nmi_ipi_callback,
+				 NMI_FLAG_FIRST, "nmi_selftest", __initdata)) {
+		nmi_fail = FAILURE;
+		return;
+	}
+
+	/* sync above data before sending NMI */
+	wmb();
+
+	apic->send_IPI_self(NMI_VECTOR);
+
+	/* Don't wait longer than a second */
+	timeout = USEC_PER_SEC;
+	while (!cpumask_empty(to_cpumask(nmi_ipi_mask)) && --timeout)
+		udelay(1);
+
+	/* What happens if we timeout, do we still unregister?? */
+	unregister_nmi_handler(NMI_LOCAL, "nmi_selftest");
+
+	if (!timeout)
+		nmi_fail = TIMEOUT;
+}
+
 static void __init reset_nmi(void)
 {
 	nmi_fail = 0;
@@ -157,6 +187,8 @@ void __init nmi_selftest(void)
 	print_testname("local IPI");
 	dotest(local_ipi, SUCCESS);
 	printk(KERN_CONT "\n");
+	print_testname("Self NMI IPI");
+	dotest(self_nmi_test, SUCCESS);
 
 	cleanup_nmi_testsuite();
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 07/13] x86/microcode: Place siblings in NMI loop while update in progress
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (5 preceding siblings ...)
  2022-11-03 17:58 ` [v2 06/13] x86/ipi: Support sending NMI_VECTOR as self ipi Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:58 ` [v2 08/13] x86/mce: Warn of a microcode update is in progress when MCE arrives Ashok Raj
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Microcode updates affect the state of the running CPU. In the case of
hyper-threads, the thread initiating the update is in a known state
(performing wrmsr 0x79), but its HT sibling can be executing arbitrary
instructions.

If one of these arbitrary instruction is being patched by the update at the
same time the sibling is trying to execute from it, its using microcode in
an unstable state.

Ensuring a rendezvous of all CPUs using stop_machine() ensures that
siblings are not executing any random user space code, and stop_machine()
also masks interrupts that can be masked.

The ones that can still slip in are the exceptions. They are:

NMI entry code and NMI handlers can also execute relatively arbitrary
instructions. This is an effort to ensure NMI doesn't slip until the wrmsr
has completed.

== Solution: NMI prevention during update ==

Before the stop_machine() rendezvous, an NMI handler is registered. The
handler is placed at the beginning of all other handlers. The siblings
then kick themselves into NMI by doing a self NMI IPI.

The handler does two things:

- Informs the primary thread that it has entered the NMI handler. Only
  after all siblings of a core have entered NMI, the primary proceeds
  with wrmsr to update microcode.
- It spins until the primary CPU has completed the wrmsr and informs the
  sibling to quit the NMI loop.

Also an important thing to remember is the microcode requests for exclusive
access to the core before performing an update. This effectively pulls the
sibling into microcode control until the wrmsr has released exclusive
access. Since the sibling is not executing any instructions while the
wrmsr completes, no other exceptions will surface on the sibling CPU.

Breakpoints can be another source that can lead do taking exceptions. But
on NMI entry, the kernel seems to be save/clear/restore the breakpoint
control register (DR7). local_db_save() and local_db_restore(). This
effectively eliminates any breakpoints leading the sibling into
uncontrolled execution.

The algorithm is something like this:

After stop_machine() all threads are executing __reload_late()

hold_sibling_in_nmi()
{
	/* Not a candidate for uCode NMI Sync */
	if (cpu has no nmi_primary_ptr)
		return;

	update sibling reached NMI for primary to continue

	while (primary not done with update)
		wait;

	return;
}

exc_nmi:IDT()
{
	....
	hold_sibling_in_nmi();
	...
}

__reload_late()
{

	entry_rendezvous(&late_cpus_in);

	if (this_cpu is first_cpu in the core)
		wait for core siblings to drop in NMI
		apply_microcode()
		set completion to release sibling from NMI
	else
		set sibling info to drop into NMI
		send self_IPI(NMI_VECTOR);

wait_for_siblings:

	exit_rendezvous(&late_cpus_out);
}

reload_late()
{
	register_nmi_handler()
	stop_machine(__reload_late);
	unregister_nmi_handler();
}

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/microcode.h       | 37 ++++++++++
 arch/x86/kernel/cpu/microcode/core.c   | 98 ++++++++++++++++++++++++--
 arch/x86/kernel/cpu/microcode/nmi.c    | 71 +++++++++++++++++++
 arch/x86/kernel/nmi.c                  |  7 ++
 arch/x86/kernel/cpu/microcode/Makefile |  1 +
 5 files changed, 210 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/nmi.c

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index d5a58bde091c..ffb46f2b0354 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -6,6 +6,37 @@
 #include <linux/earlycpio.h>
 #include <linux/initrd.h>
 
+/*
+ * Although this is a per-cpu structure, both the primary and siblings use
+ * only the primary structure to communicate.
+ * All core siblings set an indication they all reached NMI handler.
+ * Once primary has completed the microcode update, sets core_done to
+ * release all core siblings out of NMI.
+ *
+ * num_core_cpus - Number of CPUs in the core.
+ * callin	 - Siblings set to inform primary once they reach NMI.
+ * core_done	 - Set by primary once microcode update has completed.
+ * failed	 - Set when there is a timeout situation during rendezvous
+ */
+struct core_rendez {
+	int num_core_cpus;
+	atomic_t callin;
+	atomic_t core_done;
+	atomic_t failed;
+};
+
+DECLARE_PER_CPU(struct core_rendez, core_sync);
+
+/*
+ * The following structure is only used by secondary.
+ * Sets the primary per_cpu variable to be found inside the NMI handler to
+ * indicate this CPU  is supposed to drop into NMI. Its consulted in the
+ * NMI handler before entering the loop waiting for primary to finish the
+ * loading process. Once loading is complete the NMI handler clears this
+ * pointer.
+ */
+DECLARE_PER_CPU(struct core_rendez *, nmi_primary_ptr);
+
 struct ucode_patch {
 	struct list_head plist;
 	void *data;		/* Intel uses only this one */
@@ -135,4 +166,10 @@ static inline void reload_early_microcode(void)			{ }
 static inline void microcode_bsp_resume(void)			{ }
 #endif
 
+#ifdef CONFIG_MICROCODE_LATE_LOADING
+extern void hold_sibling_in_nmi(void);
+#else
+static inline void hold_sibling_in_nmi(void) { }
+#endif
+
 #endif /* _ASM_X86_MICROCODE_H */
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index d41207e50ee6..6084a87ea8f3 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -39,6 +39,8 @@
 #include <asm/processor.h>
 #include <asm/cmdline.h>
 #include <asm/setup.h>
+#include <asm/apic.h>
+#include <asm/mce.h>
 
 #include "../cpu.h"
 
@@ -380,6 +382,59 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
 	return 0;
 }
 
+/*
+ * This simply ensures that the self IPI with NMI to siblings is marked as
+ * handled.
+ */
+static int ucode_nmi_cb(unsigned int val, struct pt_regs *regs)
+{
+	return NMI_HANDLED;
+}
+
+/*
+ * Primary thread waits for all siblings to report that they have enterered
+ * the NMI handler
+ */
+static int __wait_for_core_siblings(struct core_rendez *rendez)
+{
+	int num_sibs = rendez->num_core_cpus - 1;
+	unsigned long long timeout = NSEC_PER_MSEC;
+	atomic_t *t = &rendez->callin;
+	int cpu = smp_processor_id();
+
+	while (atomic_read(t) < num_sibs) {
+		cpu_relax();
+		ndelay(SPINUNIT);
+		touch_nmi_watchdog();
+		timeout -= SPINUNIT;
+		if (timeout < SPINUNIT) {
+			pr_err("CPU%d timedout waiting for siblings\n", cpu);
+			atomic_inc(&rendez->failed);
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static void prepare_for_nmi(void)
+{
+	int cpu, first_cpu;
+	struct core_rendez *pcpu_core;
+
+	for_each_online_cpu(cpu) {
+		first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+		if (cpu != first_cpu)
+			continue;
+
+		pcpu_core = &per_cpu(core_sync, first_cpu);
+		pcpu_core->num_core_cpus =
+		     cpumask_weight(topology_sibling_cpumask(cpu));
+		atomic_set(&pcpu_core->callin, 0);
+		atomic_set(&pcpu_core->core_done, 0);
+		atomic_set(&pcpu_core->failed, 0);
+	}
+}
+
 /*
  * Returns:
  * < 0 - on error
@@ -387,14 +442,15 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
  */
 static int __reload_late(void *info)
 {
-	int cpu = smp_processor_id();
+	int first_cpu, cpu = smp_processor_id();
+	struct core_rendez *pcpu_core;
 	enum ucode_state err;
 	int ret = 0;
 
 	/*
 	 * Wait for all CPUs to arrive. A load will not be attempted unless all
 	 * CPUs show up.
-	 * */
+	 */
 	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
 		return -1;
 
@@ -405,10 +461,32 @@ static int __reload_late(void *info)
 	 * loading attempts happen on multiple threads of an SMT core. See
 	 * below.
 	 */
-	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	pcpu_core = &per_cpu(core_sync, first_cpu);
+
+	/*
+	 * Set the CPUs that we should hold in NMI until the primary has
+	 * completed the microcode update.
+	 */
+	if (first_cpu == cpu) {
+		/*
+		 * Wait for all siblings to enter
+		 * NMI before performing the update
+		 */
+		ret = __wait_for_core_siblings(pcpu_core);
+		if (ret || atomic_read(&pcpu_core->failed)) {
+			pr_err("CPU %d core lead timeout waiting for siblings\n", cpu);
+			ret = -1;
+		}
+		pr_debug("Primary CPU %d proceeding with update\n", cpu);
 		err = microcode_ops->apply_microcode(cpu);
-	else
+		atomic_set(&pcpu_core->core_done, 1);
+	} else {
+		/* We set the per-cpu of sibling in this case */
+		this_cpu_write(nmi_primary_ptr, pcpu_core);
+		apic->send_IPI_self(NMI_VECTOR);
 		goto wait_for_siblings;
+	}
 
 	if (err >= UCODE_NFOUND) {
 		if (err == UCODE_ERROR)
@@ -490,6 +568,15 @@ static int microcode_reload_late(void)
 	atomic_set(&late_cpus_in,  0);
 	atomic_set(&late_cpus_out, 0);
 
+	prepare_for_nmi();
+
+	ret = register_nmi_handler(NMI_LOCAL, ucode_nmi_cb, NMI_FLAG_FIRST,
+				   "ucode_nmi");
+	if (ret) {
+		pr_err("Unable to register NMI handler\n");
+		goto done;
+	}
+
 	copy_cpu_caps(&info);
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
 	if (ret == 0)
@@ -498,6 +585,9 @@ static int microcode_reload_late(void)
 	pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
 		old, boot_cpu_data.microcode);
 
+	unregister_nmi_handler(NMI_LOCAL, "ucode_nmi");
+
+done:
 	return ret;
 }
 
diff --git a/arch/x86/kernel/cpu/microcode/nmi.c b/arch/x86/kernel/cpu/microcode/nmi.c
new file mode 100644
index 000000000000..8899659cc5d6
--- /dev/null
+++ b/arch/x86/kernel/cpu/microcode/nmi.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Ashok Raj <ashok.raj@intel.com>
+ *
+ * X86 CPU microcode update NMI handler.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/cpu.h>
+#include <linux/nmi.h>
+
+#include <asm/microcode.h>
+
+#define SPINUNIT	100 /* 100 nsec */
+
+DEFINE_PER_CPU(struct core_rendez, core_sync);
+DEFINE_PER_CPU(struct core_rendez *, nmi_primary_ptr);
+
+#define SPINUNIT 100 /* 100 nsec */
+
+static void delay(int ms)
+{
+	unsigned long timeout = jiffies + ((ms * HZ) / 1000);
+
+	while (time_before(jiffies, timeout))
+		cpu_relax();
+}
+
+/*
+ * Siblings wait until microcode update is completed by the primary thread.
+ */
+static int __wait_for_update(atomic_t *t)
+{
+	unsigned long long timeout = NSEC_PER_MSEC;
+
+	while (!arch_atomic_read(t)) {
+		cpu_relax();
+		delay(1);
+		timeout -= SPINUNIT;
+		if (timeout < SPINUNIT)
+			return 1;
+	}
+	return 0;
+}
+
+noinstr void hold_sibling_in_nmi(void)
+{
+	struct	 core_rendez *pcpu_core;
+	int ret = 0;
+
+	pcpu_core = this_cpu_read(nmi_primary_ptr);
+	if (likely(!pcpu_core))
+		return;
+
+	/*
+	 * Increment the callin to inform primary thread that the sibling
+	 * has arrived and parked in the NMI handler
+	 */
+	arch_atomic_inc(&pcpu_core->callin);
+
+	ret = __wait_for_update(&pcpu_core->core_done);
+	if (ret)
+		atomic_inc(&pcpu_core->failed);
+
+	/*
+	 * Clear the nmi_trap, so future NMI's won't be affected
+	 */
+	this_cpu_write(nmi_primary_ptr, NULL);
+}
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index cec0bfa3bc04..619afeaef07c 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -28,6 +28,7 @@
 #include <asm/cpu_entry_area.h>
 #include <asm/traps.h>
 #include <asm/mach_traps.h>
+#include <asm/microcode.h>
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
@@ -505,6 +506,12 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 
 	this_cpu_write(nmi_dr7, local_db_save());
 
+	/*
+	 * If microcodeupdate is in progress, check and hold the sibling in
+	 * the NMI until primary has completed the update
+	 */
+	hold_sibling_in_nmi();
+
 	irq_state = irqentry_nmi_enter(regs);
 
 	inc_irq_stat(__nmi_count);
diff --git a/arch/x86/kernel/cpu/microcode/Makefile b/arch/x86/kernel/cpu/microcode/Makefile
index 34098d48c48f..e469990bba73 100644
--- a/arch/x86/kernel/cpu/microcode/Makefile
+++ b/arch/x86/kernel/cpu/microcode/Makefile
@@ -3,3 +3,4 @@ microcode-y				:= core.o
 obj-$(CONFIG_MICROCODE)			+= microcode.o
 microcode-$(CONFIG_MICROCODE_INTEL)	+= intel.o
 microcode-$(CONFIG_MICROCODE_AMD)	+= amd.o
+microcode-$(CONFIG_MICROCODE_LATE_LOADING) += nmi.o
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 08/13] x86/mce: Warn of a microcode update is in progress when MCE arrives
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (6 preceding siblings ...)
  2022-11-03 17:58 ` [v2 07/13] x86/microcode: Place siblings in NMI loop while update in progress Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:58 ` [v2 09/13] x86/microcode/intel: Add minimum required revision to microcode header Ashok Raj
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Due to the nature of microcode updates to long flow instructions, its
possible if an MCE is taken when microcode update is in progress could be
dangerous. There is nothing the kernel can do to mitigate safely.

Drop some bread crumbs to note that a MCE happened while a microcode update
is also in progress.

Suggested-by: Boris Petkov <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/microcode.h     | 2 ++
 arch/x86/kernel/cpu/mce/core.c       | 5 +++++
 arch/x86/kernel/cpu/microcode/core.c | 9 +++++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index ffb46f2b0354..f16973fb7330 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -167,8 +167,10 @@ static inline void microcode_bsp_resume(void)			{ }
 #endif
 
 #ifdef CONFIG_MICROCODE_LATE_LOADING
+extern int ucode_update_in_progress(void);
 extern void hold_sibling_in_nmi(void);
 #else
+static inline int ucode_update_in_progress(void) { return 0; }
 static inline void hold_sibling_in_nmi(void) { }
 #endif
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..67669686fab4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -46,6 +46,7 @@
 #include <linux/hardirq.h>
 
 #include <asm/intel-family.h>
+#include <asm/microcode.h>
 #include <asm/processor.h>
 #include <asm/traps.h>
 #include <asm/tlbflush.h>
@@ -1425,6 +1426,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	else if (unlikely(!mca_cfg.initialized))
 		return unexpected_machine_check(regs);
 
+	instrumentation_begin();
+	if (ucode_update_in_progress())
+		pr_warn("MCE triggered while microcode update is in progress\n");
+	instrumentation_end();
 	if (mce_flags.skx_repmov_quirk && quirk_skylake_repmov())
 		goto clear;
 
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 6084a87ea8f3..6f59ffdf2881 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -327,6 +327,8 @@ void reload_early_microcode(void)
 static struct platform_device	*microcode_pdev;
 
 #ifdef CONFIG_MICROCODE_LATE_LOADING
+static int ucode_updating;
+
 /*
  * Late loading dance. Why the heavy-handed stomp_machine effort?
  *
@@ -556,6 +558,11 @@ static void microcode_check(struct cpuinfo_x86 *orig)
 	pr_warn("x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
 }
 
+int ucode_update_in_progress(void)
+{
+	return ucode_updating;
+}
+
 /*
  * Reload microcode late on all CPUs. Wait for a sec until they
  * all gather together.
@@ -578,7 +585,9 @@ static int microcode_reload_late(void)
 	}
 
 	copy_cpu_caps(&info);
+	ucode_updating = 1;
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
+	ucode_updating = 0;
 	if (ret == 0)
 		microcode_check(&info);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 09/13] x86/microcode/intel: Add minimum required revision to microcode header
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (7 preceding siblings ...)
  2022-11-03 17:58 ` [v2 08/13] x86/mce: Warn of a microcode update is in progress when MCE arrives Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:58 ` [v2 10/13] x86/microcode: Add a generic mechanism to declare support for minrev Ashok Raj
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

In general users don't have the necessary information to determine
whether a late loading of a new microcode version has removed any feature
(MSR, CPUID etc) between what is currently loaded and this new microcode.
To address this issue, Intel has added a "minimum required version" field
to a previously reserved field in the file header. Microcode updates
should only be applied if the current microcode version is equal
to, or greater than this minimum required version.

Thomas made some suggestions[1] on how meta-data in the microcode file
could provide Linux with information to decide if the new microcode is
suitable candidate for late loading. But even the "simpler" option#1
requires a lot of metadata and corresponding kernel code to parse it.

The proposal here is an even simpler option. Simply "OS visible features"
such as CPUID and MSRs are the only two examples. The microcode must not
change these OS visible features because they cause problems after late
loading. When microcode changes features, microcode will change the min_rev
to prevent such microcodes from being late loaded.

Pseudo code for late loading is as follows:

if header.min_required_id == 0
        This is old format microcode, block late loading
else if current_ucode_version < header.min_required_id
        Current version is too old, block late loading of this microcode.
else
        OK to proceed with late loading.

Any microcode that modifies the interface to an OS-visible feature
will set the min_version to itself. This will enforce this microcode is
not suitable for late loading unless the currently loaded revision is
greater or equal to the new microcode affecting the change.

The enforcement is not in hardware and limited to kernel loader enforcing
the requirement. It is not required for early loading of microcode to
enforce this requirement, since the new features are only
evaluated after early loading in the boot process.


Test cases covered:

1. With new kernel, attempting to load an older format microcode with the
   min_rev=0 should be blocked by kernel.

   [  210.541802] Late loading denied: Microcode header does not specify a
   required min version.

2. New microcode with a non-zero min_rev in the header, but the specified
   min_rev is greater than what is currently loaded in the CPU should be
   blocked by kernel.

   245.139828] microcode: Late loading denied: Current revision 0x8f685300 is too old to update, must be at 0xaa000050 version or higher. Use early loading instead.

3. New microcode with a min_rev < currently loaded should allow loading the
   microcode

4. Build initrd with microcode that has min_rev=0, or min_rev > currently
   loaded should permit early loading microcode from initrd.

[1] https://lore.kernel.org/linux-kernel/alpine.DEB.2.21.1909062237580.1902@nanos.tec.linutronix.de/


Tested-by: William Xie <william.xie@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/microcode_intel.h |  4 +++-
 arch/x86/kernel/cpu/microcode/intel.c  | 29 +++++++++++++++++++++++---
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
index 4c92cea7e4b5..bc893dd68b82 100644
--- a/arch/x86/include/asm/microcode_intel.h
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -14,7 +14,9 @@ struct microcode_header_intel {
 	unsigned int            pf;
 	unsigned int            datasize;
 	unsigned int            totalsize;
-	unsigned int            reserved[3];
+	unsigned int            reserved1;
+	unsigned int		min_req_ver;
+	unsigned int		reserved3;
 };
 
 struct microcode_intel {
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 8ef04447fcf0..020d0feed3cc 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -163,13 +163,14 @@ static void save_microcode_patch(struct ucode_cpu_info *uci, void *data, unsigne
 		intel_ucode_patch = p->data;
 }
 
-static int microcode_sanity_check(void *mc, int print_err)
+static int microcode_sanity_check(void *mc, int print_err, bool late_loading)
 {
 	unsigned long total_size, data_size, ext_table_size;
 	struct microcode_header_intel *mc_header = mc;
 	struct extended_sigtable *ext_header = NULL;
 	u32 sum, orig_sum, ext_sigcount = 0, i;
 	struct extended_signature *ext_sig;
+	struct ucode_cpu_info uci;
 
 	total_size = get_totalsize(mc_header);
 	data_size = get_datasize(mc_header);
@@ -240,6 +241,24 @@ static int microcode_sanity_check(void *mc, int print_err)
 		return -EINVAL;
 	}
 
+	/*
+	 * When late-loading, enforce that the current revision loaded on
+	 * the CPU is equal or greater than the value specified in the
+	 * new microcode header
+	 */
+	if (late_loading) {
+		if (!mc_header->min_req_ver) {
+			pr_warn("Late loading denied: Microcode header does not specify a required min version\n");
+			return -EINVAL;
+		}
+		intel_cpu_collect_info(&uci);
+		if (uci.cpu_sig.rev < mc_header->min_req_ver) {
+			pr_warn("Late loading denied: Current revision 0x%x too old to update, must be at 0x%x or higher. Use early loading instead\n",
+				uci.cpu_sig.rev, mc_header->min_req_ver);
+			return -EINVAL;
+		}
+	}
+
 	if (!ext_table_size)
 		return 0;
 
@@ -281,7 +300,7 @@ scan_microcode(void *data, size_t size, struct ucode_cpu_info *uci, bool save)
 		mc_size = get_totalsize(mc_header);
 		if (!mc_size ||
 		    mc_size > size ||
-		    microcode_sanity_check(data, 0) < 0)
+		    microcode_sanity_check(data, 0, false) < 0)
 			break;
 
 		size -= mc_size;
@@ -838,7 +857,8 @@ static enum ucode_state generic_load_microcode(int cpu, struct iov_iter *iter)
 		memcpy(mc, &mc_header, sizeof(mc_header));
 		data = mc + sizeof(mc_header);
 		if (!copy_from_iter_full(data, data_size, iter) ||
-		    microcode_sanity_check(mc, 1) < 0) {
+		    microcode_sanity_check(mc, 1, true) < 0) {
+			ret = UCODE_ERROR;
 			break;
 		}
 
@@ -861,6 +881,9 @@ static enum ucode_state generic_load_microcode(int cpu, struct iov_iter *iter)
 		return UCODE_ERROR;
 	}
 
+	if (ret == UCODE_ERROR)
+		return ret;
+
 	if (!new_mc)
 		return UCODE_NFOUND;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 10/13] x86/microcode: Add a generic mechanism to declare support for minrev
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (8 preceding siblings ...)
  2022-11-03 17:58 ` [v2 09/13] x86/microcode/intel: Add minimum required revision to microcode header Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:58 ` [v2 11/13] x86/microcode/intel: Drop wbinvd() from microcode loading Ashok Raj
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Intel microcode adds some meta-data to report a minimum required revision
before this new microcode can be late-loaded. There are no generic mechanism
to declare support for all vendors.

Add generic support to microcode to declare support, so the tainting and
late-loading can be permitted in those architectures that support reporting
a minrev in some form.

Late loading has added support for

- New images declaring a required minimum base version before a late-load
  is performed.
- Improved NMI handling during update to avoid sibling threads taking NMI's
  while primary is still not complete with the microcode update.

With these changes, late-loading can be re-enabled. Tainting only happens
on architectures that don't support minimum required version reporting.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
v2: (Kai) Add missing initialization local variable minrev

 arch/x86/include/asm/microcode.h      |  2 ++
 arch/x86/kernel/cpu/microcode/core.c  | 15 +++++++++++----
 arch/x86/kernel/cpu/microcode/intel.c |  6 ++++++
 arch/x86/Kconfig                      |  7 ++++---
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index f16973fb7330..6286b4056792 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -64,6 +64,8 @@ enum ucode_state {
 };
 
 struct microcode_ops {
+	int (*check_minrev) (void);
+
 	enum ucode_state (*request_microcode_fw) (int cpu, struct device *);
 
 	void (*microcode_fini_cpu) (int cpu);
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 6f59ffdf2881..17dba13d397d 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -607,6 +607,7 @@ static ssize_t reload_store(struct device *dev,
 	enum ucode_state tmp_ret = UCODE_OK;
 	int bsp = boot_cpu_data.cpu_index;
 	unsigned long val;
+	int minrev = 0;
 	ssize_t ret = 0;
 
 	ret = kstrtoul(buf, 0, &val);
@@ -622,13 +623,18 @@ static ssize_t reload_store(struct device *dev,
 	if (ret)
 		goto put;
 
+	if (microcode_ops->check_minrev)
+		minrev = microcode_ops->check_minrev();
+
+	if (!minrev) {
+		pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
+		pr_err("You should switch to early loading, if possible.\n");
+	}
+
 	tmp_ret = microcode_ops->request_microcode_fw(bsp, &microcode_pdev->dev);
 	if (tmp_ret != UCODE_NEW)
 		goto put;
 
-	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
-	pr_err("You should switch to early loading, if possible.\n");
-
 	mutex_lock(&microcode_mutex);
 	ret = microcode_reload_late();
 	mutex_unlock(&microcode_mutex);
@@ -639,7 +645,8 @@ static ssize_t reload_store(struct device *dev,
 	if (ret == 0)
 		ret = size;
 
-	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+	if (!minrev)
+		add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
 
 	return ret;
 }
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 020d0feed3cc..5d2ee76cd36c 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -956,7 +956,13 @@ static enum ucode_state request_microcode_fw(int cpu, struct device *device)
 	return ret;
 }
 
+static int intel_check_minrev(void)
+{
+	return 1;
+}
+
 static struct microcode_ops microcode_intel_ops = {
+	.check_minrev			  = intel_check_minrev,
 	.request_microcode_fw             = request_microcode_fw,
 	.collect_cpu_info                 = collect_cpu_info,
 	.apply_microcode                  = apply_microcode_intel,
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6d1879ef933a..b53626bff5f7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1353,15 +1353,16 @@ config MICROCODE_AMD
 	  processors will be enabled.
 
 config MICROCODE_LATE_LOADING
-	bool "Late microcode loading (DANGEROUS)"
-	default n
+	bool "Late microcode loading"
+	default y
 	depends on MICROCODE
 	help
 	  Loading microcode late, when the system is up and executing instructions
 	  is a tricky business and should be avoided if possible. Just the sequence
 	  of synchronizing all cores and SMT threads is one fragile dance which does
 	  not guarantee that cores might not softlock after the loading. Therefore,
-	  use this at your own risk. Late loading taints the kernel too.
+	  use this at your own risk. Late loading taints the kernel, if it
+	  doesn't support a minimum required base version before an update.
 
 config X86_MSR
 	tristate "/dev/cpu/*/msr - Model-specific register support"
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 11/13] x86/microcode/intel: Drop wbinvd() from microcode loading
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (9 preceding siblings ...)
  2022-11-03 17:58 ` [v2 10/13] x86/microcode: Add a generic mechanism to declare support for minrev Ashok Raj
@ 2022-11-03 17:58 ` Ashok Raj
  2022-11-03 17:59 ` [v2 12/13] x86/microcode: Display revisions only when update is successful Ashok Raj
  2022-11-03 17:59 ` [v2 13/13] x86/microcode/intel: Add ability to update microcode even if rev is unchanged Ashok Raj
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:58 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Some older processors had a bad interaction when updating microcode if the
caches were dirty causing machine checks. The wbinvd() was added to
mitigate that before performing microcode updates. Now that Linux checks
for the minimum version before performing an update, those microcode
revisions can't be loaded. Remove calls to wbinvd().

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 5d2ee76cd36c..7086670da606 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -541,11 +541,6 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 	}
 
 	old_rev = rev;
-	/*
-	 * Writeback and invalidate caches before updating microcode to avoid
-	 * internal issues depending on what the microcode is updating.
-	 */
-	native_wbinvd();
 
 	/* write microcode via MSR 0x79 */
 	native_wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
@@ -776,12 +771,6 @@ static enum ucode_state apply_microcode_intel(int cpu)
 	if (!prev_rev)
 		prev_rev = rev;
 
-	/*
-	 * Writeback and invalidate caches before updating microcode to avoid
-	 * internal issues depending on what the microcode is updating.
-	 */
-	native_wbinvd();
-
 	/* write microcode via MSR 0x79 */
 	wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 12/13] x86/microcode: Display revisions only when update is successful
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (10 preceding siblings ...)
  2022-11-03 17:58 ` [v2 11/13] x86/microcode/intel: Drop wbinvd() from microcode loading Ashok Raj
@ 2022-11-03 17:59 ` Ashok Raj
  2022-11-03 17:59 ` [v2 13/13] x86/microcode/intel: Add ability to update microcode even if rev is unchanged Ashok Raj
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:59 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

Display the update message only when its successful

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 17dba13d397d..a7d0cbbb2505 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -591,11 +591,11 @@ static int microcode_reload_late(void)
 	if (ret == 0)
 		microcode_check(&info);
 
-	pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
-		old, boot_cpu_data.microcode);
-
 	unregister_nmi_handler(NMI_LOCAL, "ucode_nmi");
 
+	if (!ret)
+		pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
+			old, boot_cpu_data.microcode);
 done:
 	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [v2 13/13] x86/microcode/intel: Add ability to update microcode even if rev is unchanged
  2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
                   ` (11 preceding siblings ...)
  2022-11-03 17:59 ` [v2 12/13] x86/microcode: Display revisions only when update is successful Ashok Raj
@ 2022-11-03 17:59 ` Ashok Raj
  12 siblings, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-03 17:59 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner
  Cc: LKML Mailing List, X86-kernel, Tony Luck, Dave Hansen,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper, Ashok Raj

This comes in handy for testing without the need for a new microcode file.
Default is OFF. It can be switched dynamically at run time via debugfs file
/sys/kernel/debug/microcode/load_same.

NOT_FOR_INCLUSION:
Leave it to the discretion of Boris if its suitable for inclusion. It will
at least serve to validate some parts of the series without the need for a
new microcode, since it goes through all the renedezvous parts except not
updating the microcode itself.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/microcode.h      |  2 ++
 arch/x86/kernel/cpu/microcode/amd.c   |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  | 23 ++++++++++++++++++-----
 arch/x86/kernel/cpu/microcode/intel.c |  4 ++--
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index 6286b4056792..a356a6a5207e 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -47,6 +47,8 @@ struct ucode_patch {
 
 extern struct list_head microcode_cache;
 
+extern bool ucode_load_same;
+
 struct cpu_signature {
 	unsigned int sig;
 	unsigned int pf;
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index b103d5e5f447..802212e194b3 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -694,7 +694,7 @@ static enum ucode_state apply_microcode_amd(int cpu)
 	rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
 
 	/* need to apply patch? */
-	if (rev >= mc_amd->hdr.patch_id) {
+	if (rev >= mc_amd->hdr.patch_id && !ucode_load_same) {
 		ret = UCODE_OK;
 		goto out;
 	}
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index a7d0cbbb2505..2d0cd8ca3ea2 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -23,6 +23,7 @@
 #include <linux/miscdevice.h>
 #include <linux/capability.h>
 #include <linux/firmware.h>
+#include <linux/debugfs.h>
 #include <linux/kernel.h>
 #include <linux/delay.h>
 #include <linux/mutex.h>
@@ -48,6 +49,7 @@
 
 static struct microcode_ops	*microcode_ops;
 static bool dis_ucode_ldr = true;
+bool ucode_load_same;
 
 bool initrd_gone;
 
@@ -490,11 +492,12 @@ static int __reload_late(void *info)
 		goto wait_for_siblings;
 	}
 
-	if (err >= UCODE_NFOUND) {
-		if (err == UCODE_ERROR)
+	if (ret || err >= UCODE_NFOUND) {
+		if (err == UCODE_ERROR ||
+		    (err == UCODE_NFOUND && !ucode_load_same)) {
 			pr_warn("Error reloading microcode on CPU %d\n", cpu);
-
-		ret = -1;
+			ret = -1;
+		}
 	}
 
 wait_for_siblings:
@@ -632,9 +635,13 @@ static ssize_t reload_store(struct device *dev,
 	}
 
 	tmp_ret = microcode_ops->request_microcode_fw(bsp, &microcode_pdev->dev);
-	if (tmp_ret != UCODE_NEW)
+	if (tmp_ret == UCODE_ERROR ||
+	    (tmp_ret != UCODE_NEW && !ucode_load_same))
 		goto put;
 
+	if (tmp_ret != UCODE_NEW)
+		pr_info("Force loading ucode\n");
+
 	mutex_lock(&microcode_mutex);
 	ret = microcode_reload_late();
 	mutex_unlock(&microcode_mutex);
@@ -781,6 +788,7 @@ static const struct attribute_group cpu_root_microcode_group = {
 static int __init microcode_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
+	static struct dentry *dentry_ucode;
 	int error;
 
 	if (dis_ucode_ldr)
@@ -815,7 +823,12 @@ static int __init microcode_init(void)
 	cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/microcode:online",
 				  mc_cpu_online, mc_cpu_down_prep);
 
+	dentry_ucode = debugfs_create_dir("microcode", NULL);
+	debugfs_create_bool("load_same", 0644, dentry_ucode, &ucode_load_same);
+
 	pr_info("Microcode Update Driver: v%s.", DRIVER_VERSION);
+	pr_info("ucode_load_same is %s\n",
+		ucode_load_same ? "enabled" : "disabled");
 
 	return 0;
 
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 7086670da606..08ba6e009d54 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -763,7 +763,7 @@ static enum ucode_state apply_microcode_intel(int cpu)
 	 * already.
 	 */
 	rev = intel_get_microcode_revision();
-	if (rev >= mc->hdr.rev) {
+	if (rev >= mc->hdr.rev && !ucode_load_same) {
 		ret = UCODE_OK;
 		goto out;
 	}
@@ -782,7 +782,7 @@ static enum ucode_state apply_microcode_intel(int cpu)
 		return UCODE_ERROR;
 	}
 
-	if (bsp && rev != prev_rev) {
+	if (bsp && (rev != prev_rev || ucode_load_same)) {
 		pr_info("update 0x%x -> 0x%x, date = %04x-%02x-%02x\n",
 			prev_rev, rev,
 			mc->hdr.date & 0xffff,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-03 17:58 ` [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times Ashok Raj
@ 2022-11-04 11:00   ` Borislav Petkov
  2022-11-04 13:53     ` Van De Ven, Arjan
  2022-11-06 13:35   ` Borislav Petkov
  2022-12-03 13:51   ` [tip: x86/microcode] x86/microcode/intel: Do not print microcode revision and processor flags tip-bot2 for Ashok Raj
  2 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-11-04 11:00 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper

On Thu, Nov 03, 2022 at 05:58:49PM +0000, Ashok Raj wrote:
> @@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
>  
>  	csig->rev = c->microcode;
>  
> -	/* No extra locking on prev, races are harmless. */
> -	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
> +	if (bsp && csig->rev != prev.rev) {

This basically means that the loader is not going to support mixed
steppings microcode.

Yes, no?

If yes, can I remove the patch cache too and use a single buffer for the
current patch?

That would simplify things even more.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-04 11:00   ` Borislav Petkov
@ 2022-11-04 13:53     ` Van De Ven, Arjan
  2022-11-04 15:52       ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Van De Ven, Arjan @ 2022-11-04 13:53 UTC (permalink / raw)
  To: Borislav Petkov, Raj, Ashok
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Luck, Tony,
	Hansen, Dave, Lutomirski, Andy, Pan, Jacob jun, Tom Lendacky,
	Huang, Kai, andrew.cooper3

> This basically means that the loader is not going to support mixed
> steppings microcode.
> 
> Yes, no?
> 
> If yes, can I remove the patch cache too and use a single buffer for the
> current patch?
> 
> That would simplify things even more.


multistepping is really not well supported, and for cases where it ends up happening, often a "full set" microcode file is made
(where the kernel doesn't need to know)

so I think by all means, if life is simpler, stop doing complicated things for mixed stepping

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-04 13:53     ` Van De Ven, Arjan
@ 2022-11-04 15:52       ` Borislav Petkov
  2022-11-04 18:28         ` Ashok Raj
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-11-04 15:52 UTC (permalink / raw)
  To: Van De Ven, Arjan
  Cc: Raj, Ashok, Thomas Gleixner, LKML Mailing List, X86-kernel, Luck,
	Tony, Hansen, Dave, Lutomirski, Andy, Pan, Jacob jun,
	Tom Lendacky, Huang, Kai, andrew.cooper3

On Fri, Nov 04, 2022 at 01:53:22PM +0000, Van De Ven, Arjan wrote:
> so I think by all means, if life is simpler, stop doing complicated
> things for mixed stepping

Ok, that's cool. Lemme put it on my TODO to remove the cache.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-04 15:52       ` Borislav Petkov
@ 2022-11-04 18:28         ` Ashok Raj
  2022-11-04 20:21           ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Ashok Raj @ 2022-11-04 18:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Van De Ven, Arjan, Thomas Gleixner, LKML Mailing List,
	X86-kernel, Luck, Tony, Hansen, Dave, Lutomirski, Andy, Pan,
	Jacob jun, Tom Lendacky, Huang, Kai, andrew.cooper3, Ashok Raj

On Fri, Nov 04, 2022 at 04:52:07PM +0100, Borislav Petkov wrote:
> On Fri, Nov 04, 2022 at 01:53:22PM +0000, Van De Ven, Arjan wrote:
> > so I think by all means, if life is simpler, stop doing complicated
> > things for mixed stepping
> 
> Ok, that's cool. Lemme put it on my TODO to remove the cache.
> 
I have a series cooking too, but this series got too long already. 

Didn't want to slow the minrev and the late load enabling patches :-)

I'll submit right after. There is a bit more cleanup, I had planned.

Wanted to add a check every AP that if its different fms+pf warn 
and bork late-load. We don't need a list, all we need is a single entry.

We didn't push the fix below, but removing the cache is a better option. So
that's already in my list.

https://lore.kernel.org/lkml/20220817051127.3323755-2-ashok.raj@intel.com/

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-04 18:28         ` Ashok Raj
@ 2022-11-04 20:21           ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-11-04 20:21 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Van De Ven, Arjan, Thomas Gleixner, LKML Mailing List,
	X86-kernel, Luck, Tony, Hansen, Dave, Lutomirski, Andy, Pan,
	Jacob jun, Tom Lendacky, Huang, Kai, andrew.cooper3

On Fri, Nov 04, 2022 at 11:28:36AM -0700, Ashok Raj wrote:
> Wanted to add a check every AP that if its different fms+pf warn 
> and bork late-load.

We don't need that check as we don't/won't support mixed steppings. It
is as simple as that.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-03 17:58 ` [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times Ashok Raj
  2022-11-04 11:00   ` Borislav Petkov
@ 2022-11-06 13:35   ` Borislav Petkov
  2022-11-07  4:17     ` Ashok Raj
  2022-11-07 16:12     ` Ashok Raj
  2022-12-03 13:51   ` [tip: x86/microcode] x86/microcode/intel: Do not print microcode revision and processor flags tip-bot2 for Ashok Raj
  2 siblings, 2 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-11-06 13:35 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper

On Thu, Nov 03, 2022 at 05:58:49PM +0000, Ashok Raj wrote:
> @@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
>  
>  	csig->rev = c->microcode;
>  
> -	/* No extra locking on prev, races are harmless. */
> -	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
> +	if (bsp && csig->rev != prev.rev) {
>  		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
>  			csig->sig, csig->pf, csig->rev);

And now that we've established that we don't do mixed steppings anymore
and the microcode revision is the same system-wide, you should simply
drop this pr_info(), in your next patch you're adding

+static u32 early_old_rev;

That thing should simply be

/* Currently applied microcode revision */
static u32 microcode_rev;

and you simply update that one each time you update microcode and print
it as the previous and the new one and then write the new one into this
var and that's it. Simple.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-06 13:35   ` Borislav Petkov
@ 2022-11-07  4:17     ` Ashok Raj
  2022-11-07 16:12     ` Ashok Raj
  1 sibling, 0 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-07  4:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper, Ashok Raj

On Sun, Nov 06, 2022 at 02:35:58PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:58:49PM +0000, Ashok Raj wrote:
> > @@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
> >  
> >  	csig->rev = c->microcode;
> >  
> > -	/* No extra locking on prev, races are harmless. */
> > -	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
> > +	if (bsp && csig->rev != prev.rev) {
> >  		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
> >  			csig->sig, csig->pf, csig->rev);
> 
> And now that we've established that we don't do mixed steppings anymore
> and the microcode revision is the same system-wide, you should simply
> drop this pr_info(), in your next patch you're adding
> 
> +static u32 early_old_rev;
> 
> That thing should simply be
> 
> /* Currently applied microcode revision */
> static u32 microcode_rev;
> 
> and you simply update that one each time you update microcode and print
> it as the previous and the new one and then write the new one into this
> var and that's it. Simple.
> 

I thought about it... Since microcode/core.c already provides this
information. Only missing part is the microcode date which the common
function doesn't seem to get this. Thought it might be useful. But I can
certainly drop it, if you think this isn't much value.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-06 13:35   ` Borislav Petkov
  2022-11-07  4:17     ` Ashok Raj
@ 2022-11-07 16:12     ` Ashok Raj
  2022-11-07 18:47       ` Borislav Petkov
  1 sibling, 1 reply; 30+ messages in thread
From: Ashok Raj @ 2022-11-07 16:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper, Ashok Raj

On Sun, Nov 06, 2022 at 02:35:58PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:58:49PM +0000, Ashok Raj wrote:
> > @@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
> >  
> >  	csig->rev = c->microcode;
> >  
> > -	/* No extra locking on prev, races are harmless. */
> > -	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
> > +	if (bsp && csig->rev != prev.rev) {
> >  		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
> >  			csig->sig, csig->pf, csig->rev);
> 
> And now that we've established that we don't do mixed steppings anymore
> and the microcode revision is the same system-wide, you should simply
> drop this pr_info(), in your next patch you're adding
> 
> +static u32 early_old_rev;
> 
> That thing should simply be
> 
> /* Currently applied microcode revision */
> static u32 microcode_rev;
> 
> and you simply update that one each time you update microcode and print
> it as the previous and the new one and then write the new one into this
> var and that's it. Simple.
> 

I removed that preparing for the next round. We already have late-loading
capture the previous rev already. So we don't need any new changes.

[  482.242727] microcode: Reload completed, microcode revision: 0x2b000070 -> 0x2b000081

Only missing is the ucode date, not a big deal missing it. 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-07 16:12     ` Ashok Raj
@ 2022-11-07 18:47       ` Borislav Petkov
  2022-11-08 23:06         ` Ashok Raj
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-11-07 18:47 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper

On Mon, Nov 07, 2022 at 04:12:59PM +0000, Ashok Raj wrote:
> Only missing is the ucode date, not a big deal missing it.

Yes, it isn't. One can find it out another way:

$ iucode-tool -l /lib/firmware/intel-ucode/06-9c-00
microcode bundle 1: /lib/firmware/intel-ucode/06-9c-00
selected microcodes:
  001/001: sig 0x000906c0, pf_mask 0x01, 2022-02-19, rev 0x24000023, size 20480
  					 ^^^^^^^^^^
-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-07 18:47       ` Borislav Petkov
@ 2022-11-08 23:06         ` Ashok Raj
  2022-11-08 23:32           ` Dave Hansen
  2022-11-09  9:18           ` Borislav Petkov
  0 siblings, 2 replies; 30+ messages in thread
From: Ashok Raj @ 2022-11-08 23:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper, Ashok Raj

Hi Boris,

On Mon, Nov 07, 2022 at 07:47:37PM +0100, Borislav Petkov wrote:
> On Mon, Nov 07, 2022 at 04:12:59PM +0000, Ashok Raj wrote:
> > Only missing is the ucode date, not a big deal missing it.
> 
> Yes, it isn't. One can find it out another way:
> 
> $ iucode-tool -l /lib/firmware/intel-ucode/06-9c-00
> microcode bundle 1: /lib/firmware/intel-ucode/06-9c-00
> selected microcodes:
>   001/001: sig 0x000906c0, pf_mask 0x01, 2022-02-19, rev 0x24000023, size 20480
>   					 ^^^^^^^^^^

That's correct, my thought as well. Did you get a chance to review rest of
the patches? 

Thought I'll wait for more comments before I send the next rev.

Patch2 is a simple fix that you suggested.

Patch3 is a bug fix. I suspect some earlier upstream reports of ucode 
failure after update (early loading) might be related. The symptom is 
similar, but those are too old to followup. I got into a similare situation
when i tried to update an incompatible uCode from initrd and system hung.

I'm not positive, but seems highly likely. The following are early loading
failures, not late loading.

https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1911959

https://forums.linuxmint.com/viewtopic.php?p=1827032#1827032

https://askubuntu.com/questions/1291486/boot-crash-after-latest-update-of-intel-microcode-nov-11-2020

Patch 4 is also a bugfix, today when we reload the same ucode even though
nothing changes it seems to think some feature is new. When i added some
more debug it turned out SGX was probably turned off by OS, but enumerated
by microcode. So it always reports 

Cheers,
Ashok


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-08 23:06         ` Ashok Raj
@ 2022-11-08 23:32           ` Dave Hansen
  2022-11-09  9:18           ` Borislav Petkov
  1 sibling, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2022-11-08 23:32 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan, Tom Lendacky,
	Kai Huang, Andrew Cooper

On 11/8/22 15:06, Ashok Raj wrote:
> Patch3 is a bug fix. I suspect some earlier upstream reports of ucode 
> failure after update (early loading) might be related. The symptom is 
> similar, but those are too old to followup. I got into a similare situation
> when i tried to update an incompatible uCode from initrd and system hung.

Hi Ashok,

If this really is a bug fix, could you please break it out and send it
separately along with appropriate tags like Fixes and a cc:stable@?  We
handle bug fixes differently from new features.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times
  2022-11-08 23:06         ` Ashok Raj
  2022-11-08 23:32           ` Dave Hansen
@ 2022-11-09  9:18           ` Borislav Petkov
  1 sibling, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-11-09  9:18 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper

On Tue, Nov 08, 2022 at 03:06:20PM -0800, Ashok Raj wrote:
> That's correct, my thought as well. Did you get a chance to review rest of
> the patches?

Dave had a spot-on response to one of your colleagues pinging
impatiently the same way:

"Things are a bit busy in the review queue at the moment. As always,
we'd love help reviewing stuff. So, while you're waiting for us to
review this, could you perhaps look around and find a series that's also
hurting for review tags?"

So how about you help out with review while waiting?

Ashok, I'll say this only once: if you don't stop this impatient pinging
and the private bullsh*t emails - them especially! - I will ignore you
completely.

We've documented it all:

"You should receive comments within a week or so; if that does not
happen, make sure that you have sent your patches to the right place.
Wait for a minimum of one week before resubmitting or pinging reviewers
- possibly longer during busy times like merge windows."

And yes, those rules apply to you too.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails
  2022-11-03 17:58 ` [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails Ashok Raj
@ 2022-11-09 11:25   ` Borislav Petkov
  2022-11-09 16:07     ` Ashok Raj
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2022-11-09 11:25 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper

On Thu, Nov 03, 2022 at 05:58:51PM +0000, Ashok Raj wrote:
> When early loading of microcode fails for any reason other than the wrong
> family-model-stepping, Linux can get into an infinite loop retrying the
> same failed load.
> 
> A single retry is needed to handle any mixed stepping case.
> 
> Assume we have a microcode that fails to load for some reason.
> load_ucode_ap() seems to retry if the loading fails. But it searches for

Seems to retry because we were supporting mixed revisions. Which we do
not now.

And if you say "seems" then this sounds like the problem hasn't been
analyzed properly. If this can happen with the current code, then this
needs to be fixed in stable. So, how do you trigger exactly?

I'd like to reproduce it myself.

As to this patch: it should simply be removing the retrying instead of
doing silly crap like

	bool retried = false;

...

In light of how a lot has changed since last time, yes, please redo the
patchset ontop of tip:x86/microcode, keeping in mind now that we don't
support mixed revisions anymore.

Just like dhansen said, you can split it in fixes and new features so
that it is not too many patches at once - your call.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails
  2022-11-09 11:25   ` Borislav Petkov
@ 2022-11-09 16:07     ` Ashok Raj
  2022-11-09 23:34       ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Ashok Raj @ 2022-11-09 16:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper, Ashok Raj

On Wed, Nov 09, 2022 at 12:25:02PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:58:51PM +0000, Ashok Raj wrote:
> > When early loading of microcode fails for any reason other than the wrong
> > family-model-stepping, Linux can get into an infinite loop retrying the
> > same failed load.
> > 
> > A single retry is needed to handle any mixed stepping case.
> > 
> > Assume we have a microcode that fails to load for some reason.
> > load_ucode_ap() seems to retry if the loading fails. But it searches for
> 
> Seems to retry because we were supporting mixed revisions. Which we do
> not now.

The retry wasn't the problem, but hitting the same failed microcode over
and over is the problem. It is called out in the commit log.

As part of dropping mixed stepping, we can drop this retry.

Maybe the right way is to remember if the bsp failed, then there is no
point in trying to apply on the AP's. 

reload_early_microcode->reload_ucode_intel()
                               ->apply_microcode_intel() 

we aren't checking if early load failed for bsp, we should save and
skip loading on all AP's.

> 
> And if you say "seems" then this sounds like the problem hasn't been
> analyzed properly. If this can happen with the current code, then this
> needs to be fixed in stable. So, how do you trigger exactly?
> 
> I'd like to reproduce it myself.

Certainly, take the fms+pf of the platform you are testing. 

- Take a microcode file from the distribution for a different fms that didn't
  belong to the one you are testing.
- You will have to fake the external header data and change it to the one
  you want microcode match to work 
- recompute all checksums and use that file instead of the original file.

I accidently ran into it since I had a copy of debug uCode that require
additional steps before loading.

I have a tool that I can change to give you some production microcode that
will fail in your platform. Just provide me with the fms+pf values, and I
an provide one for  your test. 

Let me know if you need one for testing.

> 
> As to this patch: it should simply be removing the retrying instead of
> doing silly crap like
> 
> 	bool retried = false;
> 
> ...
> 
> In light of how a lot has changed since last time, yes, please redo the
> patchset ontop of tip:x86/microcode, keeping in mind now that we don't
> support mixed revisions anymore.
> 
> Just like dhansen said, you can split it in fixes and new features so
> that it is not too many patches at once - your call.


That makes sense, I'll send the bug fix patches separately.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails
  2022-11-09 16:07     ` Ashok Raj
@ 2022-11-09 23:34       ` Borislav Petkov
  0 siblings, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2022-11-09 23:34 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, LKML Mailing List, X86-kernel, Tony Luck,
	Dave Hansen, Arjan van de Ven, Andy Lutomirski, Jacon Jun Pan,
	Tom Lendacky, Kai Huang, Andrew Cooper

On Wed, Nov 09, 2022 at 08:07:32AM -0800, Ashok Raj wrote:
> - Take a microcode file from the distribution for a different fms that didn't
>   belong to the one you are testing.
> - You will have to fake the external header data and change it to the one
>   you want microcode match to work 
> - recompute all checksums and use that file instead of the original file.

This sounds like this cannot happen with officially released microcode -
only with something "hacked". If so, I'm not interested in such "fixes".

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip: x86/microcode] x86/microcode/intel: Do not print microcode revision and processor flags
  2022-11-03 17:58 ` [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times Ashok Raj
  2022-11-04 11:00   ` Borislav Petkov
  2022-11-06 13:35   ` Borislav Petkov
@ 2022-12-03 13:51   ` tip-bot2 for Ashok Raj
  2 siblings, 0 replies; 30+ messages in thread
From: tip-bot2 for Ashok Raj @ 2022-12-03 13:51 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tony Luck, Ashok Raj, Borislav Petkov (AMD), x86, linux-kernel

The following commit has been merged into the x86/microcode branch of tip:

Commit-ID:     5b1586ab064ca24c6a7a6be7a9d0cb9e237ef39a
Gitweb:        https://git.kernel.org/tip/5b1586ab064ca24c6a7a6be7a9d0cb9e237ef39a
Author:        Ashok Raj <ashok.raj@intel.com>
AuthorDate:    Tue, 29 Nov 2022 13:08:26 -08:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Sat, 03 Dec 2022 14:41:06 +01:00

x86/microcode/intel: Do not print microcode revision and processor flags

collect_cpu_info() is used to collect the current microcode revision and
processor flags on every CPU.

It had a weird mechanism to try to mimick a "once" functionality in the
sense that, that information should be issued only when it is differing
from the previous CPU.

However (1):

the new calling sequence started doing that in parallel:

  microcode_init()
  |-> schedule_on_each_cpu(setup_online_cpu)
      |-> collect_cpu_info()

resulting in multiple redundant prints:

  microcode: sig=0x50654, pf=0x80, revision=0x2006e05
  microcode: sig=0x50654, pf=0x80, revision=0x2006e05
  microcode: sig=0x50654, pf=0x80, revision=0x2006e05

However (2):

dumping this here is not that important because the kernel does not
support mixed silicon steppings microcode. Finally!

Besides, there is already a pr_info() in microcode_reload_late() that
shows both the old and new revisions.

What is more, the CPU signature (sig=0x50654) and Processor Flags
(pf=0x80) above aren't that useful to the end user, they are available
via /proc/cpuinfo and they don't change anyway.

Remove the redundant pr_info().

  [ bp: Heavily massage. ]

Fixes: b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk")
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221103175901.164783-2-ashok.raj@intel.com
---
 arch/x86/kernel/cpu/microcode/intel.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index c4a00fb..4f93875 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -554,7 +554,6 @@ void reload_ucode_intel(void)
 
 static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 {
-	static struct cpu_signature prev;
 	struct cpuinfo_x86 *c = &cpu_data(cpu_num);
 	unsigned int val[2];
 
@@ -570,13 +569,6 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 
 	csig->rev = c->microcode;
 
-	/* No extra locking on prev, races are harmless. */
-	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
-		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
-			csig->sig, csig->pf, csig->rev);
-		prev = *csig;
-	}
-
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-12-03 13:51 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-03 17:58 [v2 00/13] Make microcode late loading more robust Ashok Raj
2022-11-03 17:58 ` [v2 01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times Ashok Raj
2022-11-04 11:00   ` Borislav Petkov
2022-11-04 13:53     ` Van De Ven, Arjan
2022-11-04 15:52       ` Borislav Petkov
2022-11-04 18:28         ` Ashok Raj
2022-11-04 20:21           ` Borislav Petkov
2022-11-06 13:35   ` Borislav Petkov
2022-11-07  4:17     ` Ashok Raj
2022-11-07 16:12     ` Ashok Raj
2022-11-07 18:47       ` Borislav Petkov
2022-11-08 23:06         ` Ashok Raj
2022-11-08 23:32           ` Dave Hansen
2022-11-09  9:18           ` Borislav Petkov
2022-12-03 13:51   ` [tip: x86/microcode] x86/microcode/intel: Do not print microcode revision and processor flags tip-bot2 for Ashok Raj
2022-11-03 17:58 ` [v2 02/13] x86/microcode/intel: Print old and new rev after early microcode update Ashok Raj
2022-11-03 17:58 ` [v2 03/13] x86/microcode/intel: Fix a hang if early loading microcode fails Ashok Raj
2022-11-09 11:25   ` Borislav Petkov
2022-11-09 16:07     ` Ashok Raj
2022-11-09 23:34       ` Borislav Petkov
2022-11-03 17:58 ` [v2 04/13] x86/microcode: Fix microcode_check() compare after a new uCode update Ashok Raj
2022-11-03 17:58 ` [v2 05/13] x86/microcode: Move late-load warning to earlier where kernel taint happens Ashok Raj
2022-11-03 17:58 ` [v2 06/13] x86/ipi: Support sending NMI_VECTOR as self ipi Ashok Raj
2022-11-03 17:58 ` [v2 07/13] x86/microcode: Place siblings in NMI loop while update in progress Ashok Raj
2022-11-03 17:58 ` [v2 08/13] x86/mce: Warn of a microcode update is in progress when MCE arrives Ashok Raj
2022-11-03 17:58 ` [v2 09/13] x86/microcode/intel: Add minimum required revision to microcode header Ashok Raj
2022-11-03 17:58 ` [v2 10/13] x86/microcode: Add a generic mechanism to declare support for minrev Ashok Raj
2022-11-03 17:58 ` [v2 11/13] x86/microcode/intel: Drop wbinvd() from microcode loading Ashok Raj
2022-11-03 17:59 ` [v2 12/13] x86/microcode: Display revisions only when update is successful Ashok Raj
2022-11-03 17:59 ` [v2 13/13] x86/microcode/intel: Add ability to update microcode even if rev is unchanged Ashok Raj

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).