linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode
@ 2022-11-29 21:08 Ashok Raj
  2022-11-29 21:08 ` [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s Ashok Raj
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Ashok Raj @ 2022-11-29 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

Hi Boris

These are based on tip/x86/microcode after the recent IFS merge. 

As discussed in IRC, feel free to massage/rework/re-organize them in a way
that you intend to see them. 

Patch1: Fixes a minor race in pr_info due to a recent change in
	microcode_init() sequence.

Patch2: Remove microcode apply retries that cause a hang when loading
	fails.

Patch3: Prep patch for patch4, since its only used in core.c make it local
	there.

Patch4: Take a snapshot of x86 features before and after microcode update,
	instead of using the one from boot_cpu already cached.

Patch5: prepare early print to simplify patch 6 and 7.

Patch6: Print old and new rev after early loading. Its useful to know what
	BIOS version of the patchrev was.

Patch7: Print a message when early loading fails. Currently there is no
	message to user.

Ashok Raj (7):
  x86/microcode/intel: Remove redundant microcode rev pr_info()s
  x86/microcode/intel: Remove retries on early microcode load
  x86/microcode/core: Move microcode_check() to cpu/microcode/core.c
  x86/microcode/core: Take a snapshot before and after applying
    microcode
  x86/microcode/intel: Prepare the print_ucode_rev to simply take a rev
    to print
  x86/microcode/intel: Print old and new rev during early boot
  x86/microcode/intel: Print when early microcode loading fails

 arch/x86/include/asm/processor.h      |  3 +-
 arch/x86/kernel/cpu/cpu.h             |  1 -
 arch/x86/kernel/cpu/common.c          | 32 -------------
 arch/x86/kernel/cpu/microcode/core.c  | 49 +++++++++++++++++++-
 arch/x86/kernel/cpu/microcode/intel.c | 67 +++++++++++----------------
 5 files changed, 75 insertions(+), 77 deletions(-)


base-commit: 1a63b58082869273bfbab1b945007193f7bd3a78
-- 
2.34.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s
  2022-11-29 21:08 [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode Ashok Raj
@ 2022-11-29 21:08 ` Ashok Raj
  2022-12-02 18:58   ` Thomas Gleixner
  2022-11-29 21:08 ` [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load Ashok Raj
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-11-29 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

There is a pr_info() to dump information about newly loaded microcode.
The code intends this pr_info() to be just once, but the check to ensure
is racy. Unfortunately this happens quite often in with this new change
resulting in multiple redundant prints on the console.

microcode_init()->schedule_on_each_cpu(setup_online_cpu)->collect_cpu_info

[   33.688639] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
[   33.688659] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
[   33.688660] microcode: sig=0x50654, pf=0x80, revision=0x2006e05

There is already a pr_info() in microcode/core.c as shown below:

microcode: Reload completed, microcode revision: 0x2b000041 -> 0x2b000070

The sig and pf aren't that useful to end user, they are available via
/proc/cpuinfo and this never changes between microcode loads.

Remove the redundant pr_info() and the racy single print checks. This
removes the race entirely, zap the duplicated pr_info() spam and
simplify the code.

Fixes: b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk")
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index c4a00fb97f61..4f93875f57b4 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -554,7 +554,6 @@ void reload_ucode_intel(void)
 
 static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 {
-	static struct cpu_signature prev;
 	struct cpuinfo_x86 *c = &cpu_data(cpu_num);
 	unsigned int val[2];
 
@@ -570,13 +569,6 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 
 	csig->rev = c->microcode;
 
-	/* No extra locking on prev, races are harmless. */
-	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
-		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
-			csig->sig, csig->pf, csig->rev);
-		prev = *csig;
-	}
-
 	return 0;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load
  2022-11-29 21:08 [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode Ashok Raj
  2022-11-29 21:08 ` [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s Ashok Raj
@ 2022-11-29 21:08 ` Ashok Raj
  2022-12-02 19:01   ` Thomas Gleixner
                     ` (3 more replies)
  2022-11-29 21:08 ` [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c Ashok Raj
                   ` (4 subsequent siblings)
  6 siblings, 4 replies; 32+ messages in thread
From: Ashok Raj @ 2022-11-29 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

Microcode loading can fail. This happens today when handling mixed
steppings. But it can also happen for other reasons such as corrupted
image, Security Version Number (SVN) preventing anti-rollback,
dependencies on BIOS loaded microcode image for some capabilities.

When the microcode loading fails, the kernel will quietly hang at boot.
This has been observed by end users (Links below) who had to revert their
microcode packages in order to boot again.

The hang is due to an infinite retry loop. The retries were in place to
support systems with mixed steppings. Now that mixed steppings are no
longer supported, there is only one microcode image at a time. Any retries
will simply reattempt to apply the same image over and over without making
progress.

Some possible past bugs that could be due to this bug are below.

There is no direct evidence that these end user issues were caused by this
retry loop. However, the early boot hangs along with reverting the
microcode update workaround provide strong circumstantial evidence to
support the theory that they are linked.

Remove the retry loop and only attempt to apply microcode once.

Link: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1911959
Link: https://forums.linuxmint.com/viewtopic.php?p=1827032#1827032
Link: https://askubuntu.com/questions/1291486/boot-crash-after-latest-update-of-intel-microcode-nov-11-2020
Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Cc: stable@vger.kernel.org
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 4f93875f57b4..d68b084a17e7 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -495,7 +495,6 @@ void load_ucode_intel_ap(void)
 	else
 		iup = &intel_ucode_patch;
 
-reget:
 	if (!*iup) {
 		patch = __load_ucode_intel(&uci);
 		if (!patch)
@@ -505,13 +504,7 @@ void load_ucode_intel_ap(void)
 	}
 
 	uci.mc = *iup;
-
-	if (apply_microcode_early(&uci, true)) {
-		/* Mixed-silicon system? Try to refetch the proper patch: */
-		*iup = NULL;
-
-		goto reget;
-	}
+	apply_microcode_early(&uci, true);
 }
 
 static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c
  2022-11-29 21:08 [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode Ashok Raj
  2022-11-29 21:08 ` [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s Ashok Raj
  2022-11-29 21:08 ` [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load Ashok Raj
@ 2022-11-29 21:08 ` Ashok Raj
  2022-12-02 19:02   ` Thomas Gleixner
                     ` (2 more replies)
  2022-11-29 21:08 ` [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode Ashok Raj
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 32+ messages in thread
From: Ashok Raj @ 2022-11-29 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

microcode_check() is only called from microcode/core.c. Move it and make
it static to prepare for upcoming fix of false negative when checking CPU
features after a microcode update. Also move get_cpu_cap() to processor.h
for general use outside of kernel/cpu.h

No functional change.

Suggested-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
Tony
	Add movement of get_cpu_cap() to commit log
Reinette
	Avoid including ../cpu.h and move to more general header.
Alison
	Split patch to just move the function before use inside microcode
	files.
---
 arch/x86/include/asm/processor.h     |  3 +--
 arch/x86/kernel/cpu/cpu.h            |  1 -
 arch/x86/kernel/cpu/common.c         | 32 ----------------------------
 arch/x86/kernel/cpu/microcode/core.c | 31 +++++++++++++++++++++++++++
 4 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 67c9d73b31fa..f5380806f3fa 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -192,8 +192,8 @@ extern const struct seq_operations cpuinfo_op;
 
 #define cache_line_size()	(boot_cpu_data.x86_cache_alignment)
 
+extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void cpu_detect(struct cpuinfo_x86 *c);
-
 static inline unsigned long long l1tf_pfn_limit(void)
 {
 	return BIT_ULL(boot_cpu_data.x86_cache_bits - 1 - PAGE_SHIFT);
@@ -835,7 +835,6 @@ bool xen_set_default_idle(void);
 #endif
 
 void __noreturn stop_this_cpu(void *dummy);
-void microcode_check(void);
 
 enum l1tf_mitigations {
 	L1TF_MITIGATION_OFF,
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 7c9b5893c30a..a142b8d543a3 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -63,7 +63,6 @@ static inline void tsx_ap_init(void) { }
 
 extern void init_spectral_chicken(struct cpuinfo_x86 *c);
 
-extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..bbd362ead043 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2305,38 +2305,6 @@ void cpu_init_secondary(void)
 }
 #endif
 
-#ifdef CONFIG_MICROCODE_LATE_LOADING
-/*
- * The microcode loader calls this upon late microcode load to recheck features,
- * only when microcode has been updated. Caller holds microcode_mutex and CPU
- * hotplug lock.
- */
-void microcode_check(void)
-{
-	struct cpuinfo_x86 info;
-
-	perf_check_microcode();
-
-	/* Reload CPUID max function as it might've changed. */
-	info.cpuid_level = cpuid_eax(0);
-
-	/*
-	 * Copy all capability leafs to pick up the synthetic ones so that
-	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
-	 * get overwritten in get_cpu_cap().
-	 */
-	memcpy(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability));
-
-	get_cpu_cap(&info);
-
-	if (!memcmp(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability)))
-		return;
-
-	pr_warn("x86/CPU: CPU features have changed after loading microcode, but might not take effect.\n");
-	pr_warn("x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
-}
-#endif
-
 /*
  * Invoked from core CPU hotplug code after hotplug operations
  */
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 712aafff96e0..ef24e1d228d0 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -431,6 +431,37 @@ static int __reload_late(void *info)
 	return ret;
 }
 
+/*
+ * The microcode loader calls this upon late microcode load to recheck features,
+ * only when microcode has been updated. Caller holds microcode_mutex and CPU
+ * hotplug lock.
+ */
+static void microcode_check(void)
+{
+	struct cpuinfo_x86 info;
+
+	perf_check_microcode();
+
+	/* Reload CPUID max function as it might've changed. */
+	info.cpuid_level = cpuid_eax(0);
+
+	/*
+	 * Copy all capability leafs to pick up the synthetic ones so that
+	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
+	 * get overwritten in get_cpu_cap().
+	 */
+	memcpy(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability));
+
+	get_cpu_cap(&info);
+
+	if (!memcmp(&info.x86_capability, &boot_cpu_data.x86_capability,
+		    sizeof(info.x86_capability)))
+		return;
+
+	pr_warn("x86/CPU: CPU features have changed after loading microcode, but might not take effect.\n");
+	pr_warn("x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
+}
+
 /*
  * Reload microcode late on all CPUs. Wait for a sec until they
  * all gather together.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode
  2022-11-29 21:08 [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode Ashok Raj
                   ` (2 preceding siblings ...)
  2022-11-29 21:08 ` [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c Ashok Raj
@ 2022-11-29 21:08 ` Ashok Raj
  2022-12-02 19:09   ` Thomas Gleixner
  2022-12-07 20:25   ` Borislav Petkov
  2022-11-29 21:08 ` [Patch V1 5/7] x86/microcode/intel: Prepare the print_ucode_rev to simply take a rev to print Ashok Raj
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 32+ messages in thread
From: Ashok Raj @ 2022-11-29 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

The kernel caches features about each CPU's features at boot in an
x86_capability[] structure. The microcode update takes one snapshot and
compares it with the saved copy at boot.

However, the capabilities in the boot copy can be turned off as a result of
certain command line parameters or configuration restrictions. This can
cause a mismatch when comparing the values before and after the microcode
update.

microcode_check() is called after an update to report any previously
cached CPUID bits might have changed due to the update.

Ignore the capabilities recorded at boot. Take a new snapshot before the
update and compare with a snapshot after the update to eliminate the false
warning.

Fixes: 1008c52c09dc ("x86/CPU: Add a microcode loader callback")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/core.c | 36 ++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index ef24e1d228d0..fab2010ff368 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -431,12 +431,28 @@ static int __reload_late(void *info)
 	return ret;
 }
 
+static void copy_cpu_caps(struct cpuinfo_x86 *info)
+{
+	/* Reload CPUID max function as it might've changed. */
+	info->cpuid_level = cpuid_eax(0);
+
+	/*
+	 * Copy all capability leafs to pick up the synthetic ones so that
+	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
+	 * get overwritten in get_cpu_cap().
+	 */
+	memcpy(info->x86_capability, &boot_cpu_data.x86_capability,
+	       sizeof(info->x86_capability));
+
+	get_cpu_cap(info);
+}
+
 /*
  * The microcode loader calls this upon late microcode load to recheck features,
  * only when microcode has been updated. Caller holds microcode_mutex and CPU
  * hotplug lock.
  */
-static void microcode_check(void)
+static void microcode_check(struct cpuinfo_x86 *orig)
 {
 	struct cpuinfo_x86 info;
 
@@ -446,15 +462,13 @@ static void microcode_check(void)
 	info.cpuid_level = cpuid_eax(0);
 
 	/*
-	 * Copy all capability leafs to pick up the synthetic ones so that
-	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
-	 * get overwritten in get_cpu_cap().
-	 */
-	memcpy(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability));
+	* Copy all capability leafs to pick up the synthetic ones so that
+	* memcmp() below doesn't fail on that. The ones coming from CPUID will
+	* get overwritten in get_cpu_cap().
+	*/
+	copy_cpu_caps(&info);
 
-	get_cpu_cap(&info);
-
-	if (!memcmp(&info.x86_capability, &boot_cpu_data.x86_capability,
+	if (!memcmp(&info.x86_capability, &orig->x86_capability,
 		    sizeof(info.x86_capability)))
 		return;
 
@@ -469,6 +483,7 @@ static void microcode_check(void)
 static int microcode_reload_late(void)
 {
 	int old = boot_cpu_data.microcode, ret;
+	struct cpuinfo_x86 info;
 
 	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
 	pr_err("You should switch to early loading, if possible.\n");
@@ -476,9 +491,10 @@ static int microcode_reload_late(void)
 	atomic_set(&late_cpus_in,  0);
 	atomic_set(&late_cpus_out, 0);
 
+	copy_cpu_caps(&info);
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
 	if (ret == 0)
-		microcode_check();
+		microcode_check(&info);
 
 	pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
 		old, boot_cpu_data.microcode);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Patch V1 5/7] x86/microcode/intel: Prepare the print_ucode_rev to simply take a rev to print
  2022-11-29 21:08 [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode Ashok Raj
                   ` (3 preceding siblings ...)
  2022-11-29 21:08 ` [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode Ashok Raj
@ 2022-11-29 21:08 ` Ashok Raj
  2022-12-02 19:23   ` Thomas Gleixner
  2022-11-29 21:08 ` [Patch V1 6/7] x86/microcode/intel: Print old and new rev during early boot Ashok Raj
  2022-11-29 21:08 ` [Patch V1 7/7] x86/microcode/intel: Print when early microcode loading fails Ashok Raj
  6 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-11-29 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

Instead of passing a struct ucode_cpu_info, just pass the rev to print.

Next patch will permit printing old and new revisions after an early
update. A subsequent patch will print a message when early loading fails.

struct ucode_cpu_info is always the current version in the CPU. When
loading fails this is the old rev, when its successfully applied its the
new rev. That makes the code bit ugly to read.

Remove the struct ucode_cpu_info parameter from print_ucode() and let
the caller to pass in the revision number to print.

No functional change.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 28 +++++++++------------------
 1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index d68b084a17e7..0a4f511e39ea 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -309,10 +309,10 @@ static bool load_builtin_intel_microcode(struct cpio_data *cp)
  * Print ucode update info.
  */
 static void
-print_ucode_info(struct ucode_cpu_info *uci, unsigned int date)
+print_ucode_info(unsigned int new_rev, unsigned int date)
 {
 	pr_info_once("microcode updated early to revision 0x%x, date = %04x-%02x-%02x\n",
-		     uci->cpu_sig.rev,
+		     new_rev,
 		     date & 0xffff,
 		     date >> 24,
 		     (date >> 16) & 0xff);
@@ -332,7 +332,7 @@ void show_ucode_info_early(void)
 
 	if (delay_ucode_info) {
 		intel_cpu_collect_info(&uci);
-		print_ucode_info(&uci, current_mc_date);
+		print_ucode_info(uci.cpu_sig.rev. current_mc_date);
 		delay_ucode_info = 0;
 	}
 }
@@ -341,33 +341,23 @@ void show_ucode_info_early(void)
  * At this point, we can not call printk() yet. Delay printing microcode info in
  * show_ucode_info_early() until printk() works.
  */
-static void print_ucode(struct ucode_cpu_info *uci)
+static void print_ucode(int new_rev, int date)
 {
 	struct microcode_intel *mc;
 	int *delay_ucode_info_p;
 	int *current_mc_date_p;
 
-	mc = uci->mc;
-	if (!mc)
-		return;
-
 	delay_ucode_info_p = (int *)__pa_nodebug(&delay_ucode_info);
 	current_mc_date_p = (int *)__pa_nodebug(&current_mc_date);
 
 	*delay_ucode_info_p = 1;
-	*current_mc_date_p = mc->hdr.date;
+	*current_mc_date_p = date;
 }
 #else
 
-static inline void print_ucode(struct ucode_cpu_info *uci)
+static inline void print_ucode(int new_rev, int date)
 {
-	struct microcode_intel *mc;
-
-	mc = uci->mc;
-	if (!mc)
-		return;
-
-	print_ucode_info(uci, mc->hdr.date);
+	print_ucode_info(new_rev, date);
 }
 #endif
 
@@ -407,9 +397,9 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 	uci->cpu_sig.rev = rev;
 
 	if (early)
-		print_ucode(uci);
+		print_ucode(uci->cpu_sig.rev, mc->hdr.date);
 	else
-		print_ucode_info(uci, mc->hdr.date);
+		print_ucode_info(uci->cpu_sig.rev, mc->hdr.date);
 
 	return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Patch V1 6/7] x86/microcode/intel: Print old and new rev during early boot
  2022-11-29 21:08 [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode Ashok Raj
                   ` (4 preceding siblings ...)
  2022-11-29 21:08 ` [Patch V1 5/7] x86/microcode/intel: Prepare the print_ucode_rev to simply take a rev to print Ashok Raj
@ 2022-11-29 21:08 ` Ashok Raj
  2022-12-02 19:29   ` Thomas Gleixner
  2022-11-29 21:08 ` [Patch V1 7/7] x86/microcode/intel: Print when early microcode loading fails Ashok Raj
  6 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-11-29 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

Make early loading message to match late loading messages. Print
both old and new revisions.

This is helpful to know what the BIOS loaded revision is before an early
update.

microcode: early update: 0x2b000041 -> 0x2b000070 date = 2000-01-01

Store the early BIOS revision and change the print format to match late
loading message from microcode/core.c

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 0a4f511e39ea..3dbcf457f45d 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -309,10 +309,10 @@ static bool load_builtin_intel_microcode(struct cpio_data *cp)
  * Print ucode update info.
  */
 static void
-print_ucode_info(unsigned int new_rev, unsigned int date)
+print_ucode_info(int old_rev, int new_rev, unsigned int date)
 {
-	pr_info_once("microcode updated early to revision 0x%x, date = %04x-%02x-%02x\n",
-		     new_rev,
+	pr_info_once("early update: 0x%x -> 0x%x, date = %04x-%02x-%02x\n",
+		     old_rev, new_rev,
 		     date & 0xffff,
 		     date >> 24,
 		     (date >> 16) & 0xff);
@@ -322,6 +322,7 @@ print_ucode_info(unsigned int new_rev, unsigned int date)
 
 static int delay_ucode_info;
 static int current_mc_date;
+static int early_old_rev;
 
 /*
  * Print early updated ucode info after printk works. This is delayed info dump.
@@ -332,7 +333,7 @@ void show_ucode_info_early(void)
 
 	if (delay_ucode_info) {
 		intel_cpu_collect_info(&uci);
-		print_ucode_info(uci.cpu_sig.rev. current_mc_date);
+		print_ucode_info(early_old_rev, uci.cpu_sig.rev, current_mc_date);
 		delay_ucode_info = 0;
 	}
 }
@@ -341,30 +342,33 @@ void show_ucode_info_early(void)
  * At this point, we can not call printk() yet. Delay printing microcode info in
  * show_ucode_info_early() until printk() works.
  */
-static void print_ucode(int new_rev, int date)
+static void print_ucode(int old_rev, int new_rev, int date)
 {
 	struct microcode_intel *mc;
 	int *delay_ucode_info_p;
 	int *current_mc_date_p;
+	int *early_old_rev_p;
 
 	delay_ucode_info_p = (int *)__pa_nodebug(&delay_ucode_info);
 	current_mc_date_p = (int *)__pa_nodebug(&current_mc_date);
+	early_old_rev_p = (int *)__pa_nodebug(&early_old_rev);
 
 	*delay_ucode_info_p = 1;
 	*current_mc_date_p = date;
+	*early_old_rev_p = old_rev;
 }
 #else
 
-static inline void print_ucode(int new_rev, int date)
+static inline void print_ucode(int old_rev, int new_rev, int date)
 {
-	print_ucode_info(new_rev, date);
+	print_ucode_info(old_rev, new_rev, date);
 }
 #endif
 
 static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 {
 	struct microcode_intel *mc;
-	u32 rev;
+	u32 rev, old_rev;
 
 	mc = uci->mc;
 	if (!mc)
@@ -390,6 +394,7 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 	/* write microcode via MSR 0x79 */
 	native_wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
 
+	old_rev = rev;
 	rev = intel_get_microcode_revision();
 	if (rev != mc->hdr.rev)
 		return -1;
@@ -397,9 +402,9 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 	uci->cpu_sig.rev = rev;
 
 	if (early)
-		print_ucode(uci->cpu_sig.rev, mc->hdr.date);
+		print_ucode(old_rev, uci->cpu_sig.rev, mc->hdr.date);
 	else
-		print_ucode_info(uci->cpu_sig.rev, mc->hdr.date);
+		print_ucode_info(old_rev, uci->cpu_sig.rev, mc->hdr.date);
 
 	return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Patch V1 7/7] x86/microcode/intel: Print when early microcode loading fails
  2022-11-29 21:08 [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode Ashok Raj
                   ` (5 preceding siblings ...)
  2022-11-29 21:08 ` [Patch V1 6/7] x86/microcode/intel: Print old and new rev during early boot Ashok Raj
@ 2022-11-29 21:08 ` Ashok Raj
  2022-12-02 19:30   ` Thomas Gleixner
  6 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-11-29 21:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

Currently when early microcode loading fails there is no way for user to
know that the update failed.

Store the failed status and pass it to print_ucode_info() to let early
loading failures to captured in console log.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 29 ++++++++++++++++-----------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 3dbcf457f45d..3b299f437e35 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -309,13 +309,14 @@ static bool load_builtin_intel_microcode(struct cpio_data *cp)
  * Print ucode update info.
  */
 static void
-print_ucode_info(int old_rev, int new_rev, unsigned int date)
+print_ucode_info(bool failed, int old_rev, int new_rev, unsigned int date)
 {
-	pr_info_once("early update: 0x%x -> 0x%x, date = %04x-%02x-%02x\n",
+	pr_info_once("early update: 0x%x -> 0x%x, date = %04x-%02x-%02x %s\n",
 		     old_rev, new_rev,
 		     date & 0xffff,
 		     date >> 24,
-		     (date >> 16) & 0xff);
+		     (date >> 16) & 0xff,
+		     failed ? "FAILED" : "");
 }
 
 #ifdef CONFIG_X86_32
@@ -323,6 +324,7 @@ print_ucode_info(int old_rev, int new_rev, unsigned int date)
 static int delay_ucode_info;
 static int current_mc_date;
 static int early_old_rev;
+static bool early_failed;
 
 /*
  * Print early updated ucode info after printk works. This is delayed info dump.
@@ -333,7 +335,7 @@ void show_ucode_info_early(void)
 
 	if (delay_ucode_info) {
 		intel_cpu_collect_info(&uci);
-		print_ucode_info(early_old_rev, uci.cpu_sig.rev, current_mc_date);
+		print_ucode_info(early_failed, early_old_rev, uci.cpu_sig.rev, current_mc_date);
 		delay_ucode_info = 0;
 	}
 }
@@ -342,26 +344,28 @@ void show_ucode_info_early(void)
  * At this point, we can not call printk() yet. Delay printing microcode info in
  * show_ucode_info_early() until printk() works.
  */
-static void print_ucode(int old_rev, int new_rev, int date)
+static void print_ucode(bool failed, int old_rev, int new_rev, int date)
 {
-	struct microcode_intel *mc;
 	int *delay_ucode_info_p;
 	int *current_mc_date_p;
 	int *early_old_rev_p;
+	bool *early_failed_p;
 
 	delay_ucode_info_p = (int *)__pa_nodebug(&delay_ucode_info);
 	current_mc_date_p = (int *)__pa_nodebug(&current_mc_date);
 	early_old_rev_p = (int *)__pa_nodebug(&early_old_rev);
+	early_failed_p = (int *)__pa_nodebug(&early_failed);
 
 	*delay_ucode_info_p = 1;
 	*current_mc_date_p = date;
 	*early_old_rev_p = old_rev;
+	*early_failed_p = failed;
 }
 #else
 
-static inline void print_ucode(int old_rev, int new_rev, int date)
+static inline void print_ucode(bool failed, int old_rev, int new_rev, int date)
 {
-	print_ucode_info(old_rev, new_rev, date);
+	print_ucode_info(failed, old_rev, new_rev, date);
 }
 #endif
 
@@ -369,6 +373,7 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 {
 	struct microcode_intel *mc;
 	u32 rev, old_rev;
+	int retval = 0;
 
 	mc = uci->mc;
 	if (!mc)
@@ -397,16 +402,16 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 	old_rev = rev;
 	rev = intel_get_microcode_revision();
 	if (rev != mc->hdr.rev)
-		return -1;
+		retval = -1;
 
 	uci->cpu_sig.rev = rev;
 
 	if (early)
-		print_ucode(old_rev, uci->cpu_sig.rev, mc->hdr.date);
+		print_ucode(retval, old_rev, mc->hdr.rev, mc->hdr.date);
 	else
-		print_ucode_info(old_rev, uci->cpu_sig.rev, mc->hdr.date);
+		print_ucode_info(retval, old_rev, uci->cpu_sig.rev, mc->hdr.date);
 
-	return 0;
+	return retval;
 }
 
 int __init save_microcode_in_initrd_intel(void)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s
  2022-11-29 21:08 ` [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s Ashok Raj
@ 2022-12-02 18:58   ` Thomas Gleixner
  2022-12-03  0:26     ` Ashok Raj
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2022-12-02 18:58 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

Ashok!

On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> There is a pr_info() to dump information about newly loaded microcode.

There... Somewhere, right?

> The code intends this pr_info() to be just once, but the check to ensure
> is racy. Unfortunately this happens quite often in with this new change
> resulting in multiple redundant prints on the console.

-ENOPARSE. Can you try to express that in coherent sentences please?

> microcode_init()->schedule_on_each_cpu(setup_online_cpu)->collect_cpu_info
>
> [   33.688639] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
> [   33.688659] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
> [   33.688660] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
>
> There is already a pr_info() in microcode/core.c as shown below:
>
> microcode: Reload completed, microcode revision: 0x2b000041 -> 0x2b000070

There are quite some pr_info()'s in microcode/core.c...

$function_name() prints the new and the previous microcode revision once
when the load has completed:

  microcode: Reload completed, microcode revision: 0x2b000041 -> 0x2b000070

Hmm?

> The sig and pf aren't that useful to end user, they are available via

The sig and pf ?!? Come on, you really can do better.

> /proc/cpuinfo and this never changes between microcode loads.
>
> Remove the redundant pr_info() and the racy single print checks. This
> removes the race entirely, zap the duplicated pr_info() spam and
> simplify the code.

The last sentence does not qualify as coherent either.

Other than that. Nice cleanup.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load
  2022-11-29 21:08 ` [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load Ashok Raj
@ 2022-12-02 19:01   ` Thomas Gleixner
  2022-12-03  1:48     ` Ashok Raj
  2022-12-02 23:53   ` Sohil Mehta
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2022-12-02 19:01 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> Microcode loading can fail. This happens today when handling mixed
> steppings. But it can also happen for other reasons such as corrupted
> image, Security Version Number (SVN) preventing anti-rollback,
> dependencies on BIOS loaded microcode image for some capabilities.
>
> When the microcode loading fails, the kernel will quietly hang at boot.
> This has been observed by end users (Links below) who had to revert their
> microcode packages in order to boot again.
>
> The hang is due to an infinite retry loop. The retries were in place to
> support systems with mixed steppings. Now that mixed steppings are no
> longer supported, there is only one microcode image at a time. Any retries
> will simply reattempt to apply the same image over and over without making
> progress.
>
> Some possible past bugs that could be due to this bug are below.
>
> There is no direct evidence that these end user issues were caused by this
> retry loop. However, the early boot hangs along with reverting the
> microcode update workaround provide strong circumstantial evidence to
> support the theory that they are linked.
>
> Remove the retry loop and only attempt to apply microcode once.

Very concise and informative changelog. See, you can do it :)

> Link: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1911959
> Link: https://forums.linuxmint.com/viewtopic.php?p=1827032#1827032
> Link: https://askubuntu.com/questions/1291486/boot-crash-after-latest-update-of-intel-microcode-nov-11-2020
> Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>

Nit: Can you order the tags according to the tip documentation next time
     please?

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c
  2022-11-29 21:08 ` [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c Ashok Raj
@ 2022-12-02 19:02   ` Thomas Gleixner
  2022-12-02 20:57   ` Sohil Mehta
  2022-12-05 16:25   ` Borislav Petkov
  2 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2022-12-02 19:02 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:

> microcode_check() is only called from microcode/core.c. Move it and make
> it static to prepare for upcoming fix of false negative when checking CPU
> features after a microcode update. Also move get_cpu_cap() to processor.h
> for general use outside of kernel/cpu.h
>
> No functional change.
>
> Suggested-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode
  2022-11-29 21:08 ` [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode Ashok Raj
@ 2022-12-02 19:09   ` Thomas Gleixner
  2022-12-03  1:57     ` Ashok Raj
  2022-12-07 20:25   ` Borislav Petkov
  1 sibling, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2022-12-02 19:09 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:

> The kernel caches features about each CPU's features at boot in an
> x86_capability[] structure. The microcode update takes one snapshot and
> compares it with the saved copy at boot.
>
> However, the capabilities in the boot copy can be turned off as a result of
> certain command line parameters or configuration restrictions. This can
> cause a mismatch when comparing the values before and after the microcode
> update.
>
> microcode_check() is called after an update to report any previously
> cached CPUID bits might have changed due to the update.
>
> Ignore the capabilities recorded at boot. Take a new snapshot before the
> update and compare with a snapshot after the update to eliminate the false
> warning.

Makes sense.

> +static void copy_cpu_caps(struct cpuinfo_x86 *info)
> +{
> +	/* Reload CPUID max function as it might've changed. */
> +	info->cpuid_level = cpuid_eax(0);
> +
> +	/*
> +	 * Copy all capability leafs to pick up the synthetic ones so that
> +	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
> +	 * get overwritten in get_cpu_cap().
> +	 */
> +	memcpy(info->x86_capability, &boot_cpu_data.x86_capability,
> +	       sizeof(info->x86_capability));
> +
> +	get_cpu_cap(info);
> +}
> +
>  /*
>   * The microcode loader calls this upon late microcode load to recheck features,
>   * only when microcode has been updated. Caller holds microcode_mutex and CPU
>   * hotplug lock.
>   */
> -static void microcode_check(void)
> +static void microcode_check(struct cpuinfo_x86 *orig)
>  {
>  	struct cpuinfo_x86 info;
>  
> @@ -446,15 +462,13 @@ static void microcode_check(void)
>  	info.cpuid_level = cpuid_eax(0);
>  
>  	/*
> -	 * Copy all capability leafs to pick up the synthetic ones so that
> -	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
> -	 * get overwritten in get_cpu_cap().
> -	 */
> -	memcpy(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability));
> +	* Copy all capability leafs to pick up the synthetic ones so that
> +	* memcmp() below doesn't fail on that. The ones coming from CPUID will
> +	* get overwritten in get_cpu_cap().
> +	*/
> +	copy_cpu_caps(&info);
>  
> -	get_cpu_cap(&info);
> -
> -	if (!memcmp(&info.x86_capability, &boot_cpu_data.x86_capability,
> +	if (!memcmp(&info.x86_capability, &orig->x86_capability,
>  		    sizeof(info.x86_capability)))
>  		return;
>  
> @@ -469,6 +483,7 @@ static void microcode_check(void)
>  static int microcode_reload_late(void)
>  {
>  	int old = boot_cpu_data.microcode, ret;
> +	struct cpuinfo_x86 info;
>  
>  	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
>  	pr_err("You should switch to early loading, if possible.\n");
> @@ -476,9 +491,10 @@ static int microcode_reload_late(void)
>  	atomic_set(&late_cpus_in,  0);
>  	atomic_set(&late_cpus_out, 0);
>  
> +	copy_cpu_caps(&info);
>  	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);

You clearly ran out of newlines and comments here.

>  	if (ret == 0)
> -		microcode_check();
> +		microcode_check(&info);
>  
>  	pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
>  		old, boot_cpu_data.microcode);

Unrelated to that patch, but it just caught my attention. Why are we
printing this is case of failure?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 5/7] x86/microcode/intel: Prepare the print_ucode_rev to simply take a rev to print
  2022-11-29 21:08 ` [Patch V1 5/7] x86/microcode/intel: Prepare the print_ucode_rev to simply take a rev to print Ashok Raj
@ 2022-12-02 19:23   ` Thomas Gleixner
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2022-12-02 19:23 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:

> Instead of passing a struct ucode_cpu_info, just pass the rev to print.
>
> Next patch will permit printing old and new revisions after an early
> update. A subsequent patch will print a message when early loading fails.
>
> struct ucode_cpu_info is always the current version in the CPU. When
> loading fails this is the old rev, when its successfully applied its the
> new rev. That makes the code bit ugly to read.
>
> Remove the struct ucode_cpu_info parameter from print_ucode() and let
> the caller to pass in the revision number to print.

Back to word salad mode?

  Subject: x86/microcode/intel: Use a plain revision argument for print_ucode_rev()

  print_ucode_rev() takes a struct ucode_cpu_info argument. The sole
  purpose of it is to print the microcode revision.

  The only available ucode_cpu_info describes always the currently
  loaded microcode version. After a microcode update this is on success
  the new version or on failure the original version.

  Subsequent changes need to print both the original and the new
  version, but the original version will be cached in a plain
  integer, which makes the code inconsistent.

  Replace the struct ucode_cpu_info argument with a plain integer which
  contains the revision number and adjust the call sites accordingly.

  No functional change.

Hmm?

Other than that.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 6/7] x86/microcode/intel: Print old and new rev during early boot
  2022-11-29 21:08 ` [Patch V1 6/7] x86/microcode/intel: Print old and new rev during early boot Ashok Raj
@ 2022-12-02 19:29   ` Thomas Gleixner
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Gleixner @ 2022-12-02 19:29 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> Make early loading message to match late loading messages. Print
> both old and new revisions.
>
> This is helpful to know what the BIOS loaded revision is before an early
> update.
>
> microcode: early update: 0x2b000041 -> 0x2b000070 date = 2000-01-01
>
> Store the early BIOS revision and change the print format to match
> late loading message from microcode/core.c

Cache the early BIOS revision before the microcode update and change
print_ucode_info() so it prints both the old and the new revision in the
same format as $function().

>   */
>  static void
> -print_ucode_info(unsigned int new_rev, unsigned int date)
> +print_ucode_info(int old_rev, int new_rev, unsigned int date)

Oh. Forgot to mention that before. Can you please get rid of this
pointless line break after 'static void' ?

>  {
> -	pr_info_once("microcode updated early to revision 0x%x, date = %04x-%02x-%02x\n",
> -		     new_rev,
> +	pr_info_once("early update: 0x%x -> 0x%x, date = %04x-%02x-%02x\n",
> +		     old_rev, new_rev,
>  		     date & 0xffff,
>  		     date >> 24,
>  		     (date >> 16) & 0xff);

And while at it also consolidate these arguments into a single or at max.
two lines?

Should happen in the first patch which touches this function.

Other than that, the code is fine.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 7/7] x86/microcode/intel: Print when early microcode loading fails
  2022-11-29 21:08 ` [Patch V1 7/7] x86/microcode/intel: Print when early microcode loading fails Ashok Raj
@ 2022-12-02 19:30   ` Thomas Gleixner
  2022-12-05 18:19     ` Ashok Raj
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Gleixner @ 2022-12-02 19:30 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Ashok Raj, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:

> Currently when early microcode loading fails there is no way for user to

the user

> know that the update failed.
>
> Store the failed status and pass it to print_ucode_info() to let early
> loading failures to captured in console log.

so that early loading failures are captured in dmesg.

Hmm?



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c
  2022-11-29 21:08 ` [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c Ashok Raj
  2022-12-02 19:02   ` Thomas Gleixner
@ 2022-12-02 20:57   ` Sohil Mehta
  2022-12-03  0:21     ` Ashok Raj
  2022-12-05 16:25   ` Borislav Petkov
  2 siblings, 1 reply; 32+ messages in thread
From: Sohil Mehta @ 2022-12-02 20:57 UTC (permalink / raw)
  To: Ashok Raj
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre, Borislav Petkov

On 11/29/2022 1:08 PM, Ashok Raj wrote:
> microcode_check() is only called from microcode/core.c. Move it and make
> it static to prepare for upcoming fix of false negative when checking CPU
> features after a microcode update.

Should we use this opportunity to also make the function name a bit more 
descriptive? microcode_check() seems very ambiguous to a first time reader.

> +/*
> + * The microcode loader calls this upon late microcode load to recheck features,
> + * only when microcode has been updated. Caller holds microcode_mutex and CPU
> + * hotplug lock.
> + */
> +static void microcode_check(void)

How about, microcode_recheck_features() or simply recheck_features() 
since it is static now?

Sohil


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load
  2022-11-29 21:08 ` [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load Ashok Raj
  2022-12-02 19:01   ` Thomas Gleixner
@ 2022-12-02 23:53   ` Sohil Mehta
  2022-12-03  1:47     ` Ashok Raj
  2022-12-05 12:18   ` Borislav Petkov
  2022-12-05 20:53   ` [tip: x86/microcode] x86/microcode/intel: Do not retry microcode reloading on the APs tip-bot2 for Ashok Raj
  3 siblings, 1 reply; 32+ messages in thread
From: Sohil Mehta @ 2022-12-02 23:53 UTC (permalink / raw)
  To: Ashok Raj, Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On 11/29/2022 1:08 PM, Ashok Raj wrote:
> -
> -	if (apply_microcode_early(&uci, true)) {
> -		/* Mixed-silicon system? Try to refetch the proper patch: */
> -		*iup = NULL;
> -
> -		goto reget;
> -	}
> +	apply_microcode_early(&uci, true);

After this change, none of the callers of apply_microcode_early() check 
the return code.

In future, do we expect callers to care about the return code? The rest 
patches in this series don't seem to suggest so. Also, the expected 
error printing happens in the function itself.

Should the return type for apply_microcode_early() be changed to void 
(in a follow-up patch)?

Sohil

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c
  2022-12-02 20:57   ` Sohil Mehta
@ 2022-12-03  0:21     ` Ashok Raj
  0 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-12-03  0:21 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre, Borislav Petkov, Ashok Raj

On Fri, Dec 02, 2022 at 12:57:23PM -0800, Mehta, Sohil wrote:
> On 11/29/2022 1:08 PM, Ashok Raj wrote:
> > microcode_check() is only called from microcode/core.c. Move it and make
> > it static to prepare for upcoming fix of false negative when checking CPU
> > features after a microcode update.
> 
> Should we use this opportunity to also make the function name a bit more
> descriptive? microcode_check() seems very ambiguous to a first time reader.
> 
> > +/*
> > + * The microcode loader calls this upon late microcode load to recheck features,
> > + * only when microcode has been updated. Caller holds microcode_mutex and CPU
> > + * hotplug lock.
> > + */
> > +static void microcode_check(void)
> 
> How about, microcode_recheck_features() or simply recheck_features() since
> it is static now?

I suppose we could. But given that the function already has some comments
around it and its not having multiple call sites, it seems to be reasonably
serving its named purpose :-)

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s
  2022-12-02 18:58   ` Thomas Gleixner
@ 2022-12-03  0:26     ` Ashok Raj
  2022-12-03 13:42       ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-12-03  0:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, X86-kernel, LKML Mailing List, Dave Hansen,
	Tony Luck, alison.schofield, reinette.chatre, Ashok Raj

On Fri, Dec 02, 2022 at 07:58:42PM +0100, Thomas Gleixner wrote:
> Ashok!
> 
> On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> > There is a pr_info() to dump information about newly loaded microcode.
> 
> There... Somewhere, right?

I'll make it clear, updated commit log below.

> 
> > The code intends this pr_info() to be just once, but the check to ensure
> > is racy. Unfortunately this happens quite often in with this new change
> > resulting in multiple redundant prints on the console.
> 
> -ENOPARSE. Can you try to express that in coherent sentences please?

:-)

> 
> > microcode_init()->schedule_on_each_cpu(setup_online_cpu)->collect_cpu_info
> >
> > [   33.688639] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
> > [   33.688659] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
> > [   33.688660] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
> >
> > There is already a pr_info() in microcode/core.c as shown below:
> >
> > microcode: Reload completed, microcode revision: 0x2b000041 -> 0x2b000070
> 
> There are quite some pr_info()'s in microcode/core.c...
> 
> $function_name() prints the new and the previous microcode revision once
> when the load has completed:
> 
>   microcode: Reload completed, microcode revision: 0x2b000041 -> 0x2b000070
> 
> Hmm?

Agreed!

> 
> > The sig and pf aren't that useful to end user, they are available via
> 
> The sig and pf ?!? Come on, you really can do better.
> 
> > /proc/cpuinfo and this never changes between microcode loads.
> >
> > Remove the redundant pr_info() and the racy single print checks. This
> > removes the race entirely, zap the duplicated pr_info() spam and
> > simplify the code.
> 
> The last sentence does not qualify as coherent either.
> 
> Other than that. Nice cleanup.
> 
Thanks!. I'll try to get better at the commit log stuff 

Updated commit log looks like below. Hope it doesn't get a -ENOPARSE this
time. :-)

------------------------

This code in collect_cpu_info() simply checks with a static variable "prev",
but when multiple CPUs are running this in parallel it is racy and we notice
the pr_info() couple times. The original intend was to print this just once.

New sequence shown below:

microcode_init()->schedule_on_each_cpu(setup_online_cpu)->collect_cpu_info

Resulting multiple prints below:

[   33.688639] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
[   33.688659] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
[   33.688660] microcode: sig=0x50654, pf=0x80, revision=0x2006e05

There is already a pr_info() in microcode_reload_late() that shows both the
old and new revisions as shown below.

microcode: Reload completed, microcode revision: 0x2b000041 -> 0x2b000070

The CPU signature (sig=0x50654) and Processor Flags (pf=0x80) above aren't
that useful to end user, they are available via /proc/cpuinfo and this never
changes between microcode loads.

Remove the redundant pr_info().

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load
  2022-12-02 23:53   ` Sohil Mehta
@ 2022-12-03  1:47     ` Ashok Raj
  0 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-12-03  1:47 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Borislav Petkov, X86-kernel, LKML Mailing List, Dave Hansen,
	Tony Luck, alison.schofield, reinette.chatre, Ashok Raj

On Fri, Dec 02, 2022 at 03:53:52PM -0800, Mehta, Sohil wrote:
> On 11/29/2022 1:08 PM, Ashok Raj wrote:
> > -
> > -	if (apply_microcode_early(&uci, true)) {
> > -		/* Mixed-silicon system? Try to refetch the proper patch: */
> > -		*iup = NULL;
> > -
> > -		goto reget;
> > -	}
> > +	apply_microcode_early(&uci, true);
> 
> After this change, none of the callers of apply_microcode_early() check the
> return code.
> 
> In future, do we expect callers to care about the return code? The rest
> patches in this series don't seem to suggest so. Also, the expected error
> printing happens in the function itself.
> 
> Should the return type for apply_microcode_early() be changed to void (in a
> follow-up patch)?

Good idea.. But I think its early, the return code could be used for
something useful. I have some additional cleanup patches that I need to
fixup and we could use this for real.

For e.g. early loading failures are now reported by each vendor, if we can
consolidate this, we could do it more at a core level, but I'm worried it
might be too much change right now, and this can wait its turn. 

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load
  2022-12-02 19:01   ` Thomas Gleixner
@ 2022-12-03  1:48     ` Ashok Raj
  0 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-12-03  1:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, X86-kernel, LKML Mailing List, Dave Hansen,
	Tony Luck, alison.schofield, reinette.chatre, Ashok Raj

On Fri, Dec 02, 2022 at 08:01:54PM +0100, Thomas Gleixner wrote:
> On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> > Microcode loading can fail. This happens today when handling mixed
> > steppings. But it can also happen for other reasons such as corrupted
> > image, Security Version Number (SVN) preventing anti-rollback,
> > dependencies on BIOS loaded microcode image for some capabilities.
> >
> > When the microcode loading fails, the kernel will quietly hang at boot.
> > This has been observed by end users (Links below) who had to revert their
> > microcode packages in order to boot again.
> >
> > The hang is due to an infinite retry loop. The retries were in place to
> > support systems with mixed steppings. Now that mixed steppings are no
> > longer supported, there is only one microcode image at a time. Any retries
> > will simply reattempt to apply the same image over and over without making
> > progress.
> >
> > Some possible past bugs that could be due to this bug are below.
> >
> > There is no direct evidence that these end user issues were caused by this
> > retry loop. However, the early boot hangs along with reverting the
> > microcode update workaround provide strong circumstantial evidence to
> > support the theory that they are linked.
> >
> > Remove the retry loop and only attempt to apply microcode once.
> 
> Very concise and informative changelog. See, you can do it :)
> 
> > Link: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1911959
> > Link: https://forums.linuxmint.com/viewtopic.php?p=1827032#1827032
> > Link: https://askubuntu.com/questions/1291486/boot-crash-after-latest-update-of-intel-microcode-nov-11-2020
> > Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> 
> Nit: Can you order the tags according to the tip documentation next time
>      please?
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

Thanks!. Dave Hansen gave me script to order them correctly :-).. I'll fix
when i repost them.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode
  2022-12-02 19:09   ` Thomas Gleixner
@ 2022-12-03  1:57     ` Ashok Raj
  0 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-12-03  1:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, X86-kernel, LKML Mailing List, Dave Hansen,
	Tony Luck, alison.schofield, reinette.chatre, Ashok Raj

On Fri, Dec 02, 2022 at 08:09:30PM +0100, Thomas Gleixner wrote:
> On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> 
> > The kernel caches features about each CPU's features at boot in an
> > x86_capability[] structure. The microcode update takes one snapshot and
> > compares it with the saved copy at boot.
> >
> > However, the capabilities in the boot copy can be turned off as a result of
> > certain command line parameters or configuration restrictions. This can
> > cause a mismatch when comparing the values before and after the microcode
> > update.
> >
> > microcode_check() is called after an update to report any previously
> > cached CPUID bits might have changed due to the update.
> >
> > Ignore the capabilities recorded at boot. Take a new snapshot before the
> > update and compare with a snapshot after the update to eliminate the false
> > warning.
> 
> Makes sense.
> 
> > +static void copy_cpu_caps(struct cpuinfo_x86 *info)
> > +{
> > +	/* Reload CPUID max function as it might've changed. */
> > +	info->cpuid_level = cpuid_eax(0);
> > +
> > +	/*
> > +	 * Copy all capability leafs to pick up the synthetic ones so that
> > +	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
> > +	 * get overwritten in get_cpu_cap().
> > +	 */
> > +	memcpy(info->x86_capability, &boot_cpu_data.x86_capability,
> > +	       sizeof(info->x86_capability));
> > +
> > +	get_cpu_cap(info);
> > +}
> > +
> >  /*
> >   * The microcode loader calls this upon late microcode load to recheck features,
> >   * only when microcode has been updated. Caller holds microcode_mutex and CPU
> >   * hotplug lock.
> >   */
> > -static void microcode_check(void)
> > +static void microcode_check(struct cpuinfo_x86 *orig)
> >  {
> >  	struct cpuinfo_x86 info;
> >  
> > @@ -446,15 +462,13 @@ static void microcode_check(void)
> >  	info.cpuid_level = cpuid_eax(0);
> >  
> >  	/*
> > -	 * Copy all capability leafs to pick up the synthetic ones so that
> > -	 * memcmp() below doesn't fail on that. The ones coming from CPUID will
> > -	 * get overwritten in get_cpu_cap().
> > -	 */
> > -	memcpy(&info.x86_capability, &boot_cpu_data.x86_capability, sizeof(info.x86_capability));
> > +	* Copy all capability leafs to pick up the synthetic ones so that
> > +	* memcmp() below doesn't fail on that. The ones coming from CPUID will
> > +	* get overwritten in get_cpu_cap().
> > +	*/
> > +	copy_cpu_caps(&info);
> >  
> > -	get_cpu_cap(&info);
> > -
> > -	if (!memcmp(&info.x86_capability, &boot_cpu_data.x86_capability,
> > +	if (!memcmp(&info.x86_capability, &orig->x86_capability,
> >  		    sizeof(info.x86_capability)))
> >  		return;
> >  
> > @@ -469,6 +483,7 @@ static void microcode_check(void)
> >  static int microcode_reload_late(void)
> >  {
> >  	int old = boot_cpu_data.microcode, ret;
> > +	struct cpuinfo_x86 info;
> >  
> >  	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
> >  	pr_err("You should switch to early loading, if possible.\n");
> > @@ -476,9 +491,10 @@ static int microcode_reload_late(void)
> >  	atomic_set(&late_cpus_in,  0);
> >  	atomic_set(&late_cpus_out, 0);
> >  
> > +	copy_cpu_caps(&info);
> >  	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
> 
> You clearly ran out of newlines and comments here.

:-).. I'll add comments and some separation in my next post.

> 
> >  	if (ret == 0)
> > -		microcode_check();
> > +		microcode_check(&info);
> >  
> >  	pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
> >  		old, boot_cpu_data.microcode);
> 
> Unrelated to that patch, but it just caught my attention. Why are we
> printing this is case of failure?

You are correct. I do have one, and for some reason was stuck behind couple
other unrelated patches. I moved it up and will include in my next repost.

From: Ashok Raj <ashok.raj@intel.com>
Date: 2022-11-18 19:49:09 -0800

x86/microcode: Display revisions only when update is successful

Right now, both microcode loading failures and successes print the same
message that Reloading is completed. This is misleading to users who would
need to deduce the reload failure by checking that the revision fields
didn't actually change.

Display the updated revision number only if an update is successful.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>

---

 arch/x86/kernel/cpu/microcode/core.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index fab2010ff368..58ee81ea09ed 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -493,11 +493,12 @@ static int microcode_reload_late(void)
 
 	copy_cpu_caps(&info);
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
-	if (ret == 0)
-		microcode_check(&info);
 
-	pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
-		old, boot_cpu_data.microcode);
+	if (ret == 0) {
+		pr_info("Reload completed, microcode revision: 0x%x -> 0x%x\n",
+			old, boot_cpu_data.microcode);
+		microcode_check(&info);
+	}
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s
  2022-12-03  0:26     ` Ashok Raj
@ 2022-12-03 13:42       ` Borislav Petkov
  0 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2022-12-03 13:42 UTC (permalink / raw)
  To: Ashok Raj
  Cc: Thomas Gleixner, X86-kernel, LKML Mailing List, Dave Hansen,
	Tony Luck, alison.schofield, reinette.chatre

On Fri, Dec 02, 2022 at 04:26:58PM -0800, Ashok Raj wrote:
> This code in collect_cpu_info() simply checks with a static variable "prev",
> but when multiple CPUs are running this in parallel it is racy and we notice

For the future:

Please use passive voice in your commit message: no "we" or "I", etc,
and describe your changes in imperative mood.

I've fixed it up and the rest now.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load
  2022-11-29 21:08 ` [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load Ashok Raj
  2022-12-02 19:01   ` Thomas Gleixner
  2022-12-02 23:53   ` Sohil Mehta
@ 2022-12-05 12:18   ` Borislav Petkov
  2022-12-05 16:42     ` Ashok Raj
  2022-12-05 20:53   ` [tip: x86/microcode] x86/microcode/intel: Do not retry microcode reloading on the APs tip-bot2 for Ashok Raj
  3 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-12-05 12:18 UTC (permalink / raw)
  To: Ashok Raj
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29, 2022 at 01:08:27PM -0800, Ashok Raj wrote:
> There is no direct evidence that these end user issues were caused by this
> retry loop. However, the early boot hangs along with reverting the
> microcode update workaround provide strong circumstantial evidence to
> support the theory that they are linked.

A "circumstantial" reason for why something "might" be broken has no
place in a commit message.

If you still wanna chase this and *actually* give me a sane,
comprehensible reason of why this could cause an endless loop with
officially released microcode, then I'm willing to listen.

Otherwise, I'll apply this:

---
From: Ashok Raj <ashok.raj@intel.com>
Date: Tue, 29 Nov 2022 13:08:27 -0800
Subject: x86/microcode/intel: Remove retries on early microcode load

The retries in load_ucode_intel_ap() were in place to support systems
with mixed steppings. Mixed steppings are no longer supported and there is
only one microcode image at a time. Any retries will simply reattempt to
apply the same image over and over without making progress.

Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221129210832.107850-3-ashok.raj@intel.com
---
 arch/x86/kernel/cpu/microcode/intel.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 4f93875f57b4..d68b084a17e7 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -495,7 +495,6 @@ void load_ucode_intel_ap(void)
 	else
 		iup = &intel_ucode_patch;
 
-reget:
 	if (!*iup) {
 		patch = __load_ucode_intel(&uci);
 		if (!patch)
@@ -505,13 +504,7 @@ void load_ucode_intel_ap(void)
 	}
 
 	uci.mc = *iup;
-
-	if (apply_microcode_early(&uci, true)) {
-		/* Mixed-silicon system? Try to refetch the proper patch: */
-		*iup = NULL;
-
-		goto reget;
-	}
+	apply_microcode_early(&uci, true);
 }
 
 static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)

-- 
2.34.1


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c
  2022-11-29 21:08 ` [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c Ashok Raj
  2022-12-02 19:02   ` Thomas Gleixner
  2022-12-02 20:57   ` Sohil Mehta
@ 2022-12-05 16:25   ` Borislav Petkov
  2022-12-05 17:05     ` Ashok Raj
  2 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-12-05 16:25 UTC (permalink / raw)
  To: Ashok Raj
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29, 2022 at 01:08:28PM -0800, Ashok Raj wrote:
> microcode_check() is only called from microcode/core.c. Move it and make
> it static to prepare for upcoming fix of false negative when checking CPU
> features after a microcode update.

So this function is there in cpu/common.c because it uses CPU facilities
like cpuinfo_x86 and get_cpu_cap() so the logical place was there.
So that I don't have to export a bunch of things but rather have the
microcode loader call into it only.

Your next patch is using more of those CPU-specific facilities so
"bleeding" them into the microcode loader looks like the wrong way
around.

get_cpu_cap() deals with all those c->x86_capability arrays and other
functions which do that, should be there too.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load
  2022-12-05 12:18   ` Borislav Petkov
@ 2022-12-05 16:42     ` Ashok Raj
  0 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-12-05 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre, Ashok Raj

On Mon, Dec 05, 2022 at 01:18:58PM +0100, Borislav Petkov wrote:
> On Tue, Nov 29, 2022 at 01:08:27PM -0800, Ashok Raj wrote:
> > There is no direct evidence that these end user issues were caused by this
> > retry loop. However, the early boot hangs along with reverting the
> > microcode update workaround provide strong circumstantial evidence to
> > support the theory that they are linked.
> 
> A "circumstantial" reason for why something "might" be broken has no
> place in a commit message.
> 
> If you still wanna chase this and *actually* give me a sane,
> comprehensible reason of why this could cause an endless loop with
> officially released microcode, then I'm willing to listen.

Your changes look fine, thanks for fixing it up.

> 
> Otherwise, I'll apply this:
> 

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c
  2022-12-05 16:25   ` Borislav Petkov
@ 2022-12-05 17:05     ` Ashok Raj
  2022-12-05 21:27       ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Ashok Raj @ 2022-12-05 17:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre, Ashok Raj

On Mon, Dec 05, 2022 at 05:25:17PM +0100, Borislav Petkov wrote:
> On Tue, Nov 29, 2022 at 01:08:28PM -0800, Ashok Raj wrote:
> > microcode_check() is only called from microcode/core.c. Move it and make
> > it static to prepare for upcoming fix of false negative when checking CPU
> > features after a microcode update.
> 
> So this function is there in cpu/common.c because it uses CPU facilities
> like cpuinfo_x86 and get_cpu_cap() so the logical place was there.
> So that I don't have to export a bunch of things but rather have the
> microcode loader call into it only.
> 
> Your next patch is using more of those CPU-specific facilities so
> "bleeding" them into the microcode loader looks like the wrong way
> around.
> 
> get_cpu_cap() deals with all those c->x86_capability arrays and other
> functions which do that, should be there too.

I was trying to move this similar to how x86_read_arch_cap_msr()
moved from x86/kernel/cpu/cpu.h -> asm/cpu.h.

Keeping the usage local since there is just one caller to microcode_check()
but there are other users of get_cpu_cap() like in
arch/x86/xen/enlighten_pv.c which seems to be reaching out to
../kernel/cpu/cpu.h. 

That said, what you say also makes sense. I'm fine with what you decide how
this should look.

Cheers
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 7/7] x86/microcode/intel: Print when early microcode loading fails
  2022-12-02 19:30   ` Thomas Gleixner
@ 2022-12-05 18:19     ` Ashok Raj
  0 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-12-05 18:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, X86-kernel, LKML Mailing List, Dave Hansen,
	Tony Luck, alison.schofield, reinette.chatre, Ashok Raj

On Fri, Dec 02, 2022 at 08:30:52PM +0100, Thomas Gleixner wrote:
> On Tue, Nov 29 2022 at 13:08, Ashok Raj wrote:
> 
> > Currently when early microcode loading fails there is no way for user to
> 
> the user
> 
> > know that the update failed.
> >
> > Store the failed status and pass it to print_ucode_info() to let early
> > loading failures to captured in console log.
> 
> so that early loading failures are captured in dmesg.
> 
> Hmm?

Thanks for the review Thomas. 

Boris, I have fixups to address all of Thomas's comments. Would you
like me to repost a new series with those fixups?

I'll drop patch1, since you have merged it in tip. You have a candidate for
patch2 as well, maybe I can drop that as well.

This patch has a type cast warning for 32bit compiles that I have fixed as
well.

I added a new patch to address Thomas's comment to print revs only when
loading succeeds[1].

Cheers,
Ashok

[1] https://lore.kernel.org/lkml/874judpqqd.ffs@tglx/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [tip: x86/microcode] x86/microcode/intel: Do not retry microcode reloading on the APs
  2022-11-29 21:08 ` [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load Ashok Raj
                     ` (2 preceding siblings ...)
  2022-12-05 12:18   ` Borislav Petkov
@ 2022-12-05 20:53   ` tip-bot2 for Ashok Raj
  3 siblings, 0 replies; 32+ messages in thread
From: tip-bot2 for Ashok Raj @ 2022-12-05 20:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ashok Raj, Borislav Petkov (AMD),
	Thomas Gleixner, stable, x86, linux-kernel

The following commit has been merged into the x86/microcode branch of tip:

Commit-ID:     be1b670f61443aa5d0d01782e9b8ea0ee825d018
Gitweb:        https://git.kernel.org/tip/be1b670f61443aa5d0d01782e9b8ea0ee825d018
Author:        Ashok Raj <ashok.raj@intel.com>
AuthorDate:    Tue, 29 Nov 2022 13:08:27 -08:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Mon, 05 Dec 2022 21:22:21 +01:00

x86/microcode/intel: Do not retry microcode reloading on the APs

The retries in load_ucode_intel_ap() were in place to support systems
with mixed steppings. Mixed steppings are no longer supported and there is
only one microcode image at a time. Any retries will simply reattempt to
apply the same image over and over without making progress.

  [ bp: Zap the circumstantial reasoning from the commit message. ]

Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221129210832.107850-3-ashok.raj@intel.com
---
 arch/x86/kernel/cpu/microcode/intel.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 4f93875..2dba4b5 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -495,7 +495,6 @@ void load_ucode_intel_ap(void)
 	else
 		iup = &intel_ucode_patch;
 
-reget:
 	if (!*iup) {
 		patch = __load_ucode_intel(&uci);
 		if (!patch)
@@ -506,12 +505,7 @@ reget:
 
 	uci.mc = *iup;
 
-	if (apply_microcode_early(&uci, true)) {
-		/* Mixed-silicon system? Try to refetch the proper patch: */
-		*iup = NULL;
-
-		goto reget;
-	}
+	apply_microcode_early(&uci, true);
 }
 
 static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c
  2022-12-05 17:05     ` Ashok Raj
@ 2022-12-05 21:27       ` Borislav Petkov
  0 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2022-12-05 21:27 UTC (permalink / raw)
  To: Ashok Raj
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Mon, Dec 05, 2022 at 09:05:54AM -0800, Ashok Raj wrote:
> I was trying to move this similar to how x86_read_arch_cap_msr()
> moved from x86/kernel/cpu/cpu.h -> asm/cpu.h.

But that is only a function prototype - not the *actual* function.

> Keeping the usage local since there is just one caller to microcode_check()
> but there are other users of get_cpu_cap() like in
> arch/x86/xen/enlighten_pv.c which seems to be reaching out to
> ../kernel/cpu/cpu.h. 

Yah, that's the single use outside of kernel/cpu/. Looks like a hack to me. :)

But no worries, we will clean all that up sooner or later and
get_cpu_cap() will disappear someday soon hopefully.

> That said, what you say also makes sense. I'm fine with what you decide how
> this should look.

Yes pls.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode
  2022-11-29 21:08 ` [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode Ashok Raj
  2022-12-02 19:09   ` Thomas Gleixner
@ 2022-12-07 20:25   ` Borislav Petkov
  2022-12-08  0:05     ` Ashok Raj
  1 sibling, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2022-12-07 20:25 UTC (permalink / raw)
  To: Ashok Raj
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre

On Tue, Nov 29, 2022 at 01:08:29PM -0800, Ashok Raj wrote:
> The kernel caches features about each CPU's features at boot in an
> x86_capability[] structure. The microcode update takes one snapshot and
> compares it with the saved copy at boot.
> 
> However, the capabilities in the boot copy can be turned off as a result of
> certain command line parameters or configuration restrictions. This can
> cause a mismatch when comparing the values before and after the microcode
> update.

Hmm, but if that has happened, the capabilities will be turned off in
your @orig argument below?

Or are you saying that this copy_cpu_caps() read before the update will
overwrite the cleared bits with the their actual values from CPUID so
that what you really wanna compare here is *hardware* CPUID bits before
and after and not our potentially modified copy?

I.e., some bit in CPUID is 1 and we have cleared it in our copy.

Close?

If so, then this should be specifically called out in the commit
message.

> +static void copy_cpu_caps(struct cpuinfo_x86 *info)

If anything, that function should be called

store_cpu_caps()

and store it into its parameter *info.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode
  2022-12-07 20:25   ` Borislav Petkov
@ 2022-12-08  0:05     ` Ashok Raj
  0 siblings, 0 replies; 32+ messages in thread
From: Ashok Raj @ 2022-12-08  0:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: X86-kernel, LKML Mailing List, Dave Hansen, Tony Luck,
	alison.schofield, reinette.chatre, Ashok Raj

On Wed, Dec 07, 2022 at 09:25:54PM +0100, Borislav Petkov wrote:
> On Tue, Nov 29, 2022 at 01:08:29PM -0800, Ashok Raj wrote:
> > The kernel caches features about each CPU's features at boot in an
> > x86_capability[] structure. The microcode update takes one snapshot and
> > compares it with the saved copy at boot.
> > 
> > However, the capabilities in the boot copy can be turned off as a result of
> > certain command line parameters or configuration restrictions. This can
> > cause a mismatch when comparing the values before and after the microcode
> > update.
> 
> Hmm, but if that has happened, the capabilities will be turned off in
> your @orig argument below?
> 
> Or are you saying that this copy_cpu_caps() read before the update will
> overwrite the cleared bits with the their actual values from CPUID so
> that what you really wanna compare here is *hardware* CPUID bits before
> and after and not our potentially modified copy?
> 
> I.e., some bit in CPUID is 1 and we have cleared it in our copy.
> 
> Close?

Correct.

> 
> If so, then this should be specifically called out in the commit
> message.

Sounds good, I can certainly add it when I repost.

> 
> > +static void copy_cpu_caps(struct cpuinfo_x86 *info)
> 
> If anything, that function should be called
> 
> store_cpu_caps()
> 
The names sounds more fitting, I can change it as suggested.

Cheers,
Ashok

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-12-08  0:05 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-29 21:08 [Patch V1 0/7] x86/microcode: Some cleanups and fixes for microcode Ashok Raj
2022-11-29 21:08 ` [Patch V1 1/7] x86/microcode/intel: Remove redundant microcode rev pr_info()s Ashok Raj
2022-12-02 18:58   ` Thomas Gleixner
2022-12-03  0:26     ` Ashok Raj
2022-12-03 13:42       ` Borislav Petkov
2022-11-29 21:08 ` [Patch V1 2/7] x86/microcode/intel: Remove retries on early microcode load Ashok Raj
2022-12-02 19:01   ` Thomas Gleixner
2022-12-03  1:48     ` Ashok Raj
2022-12-02 23:53   ` Sohil Mehta
2022-12-03  1:47     ` Ashok Raj
2022-12-05 12:18   ` Borislav Petkov
2022-12-05 16:42     ` Ashok Raj
2022-12-05 20:53   ` [tip: x86/microcode] x86/microcode/intel: Do not retry microcode reloading on the APs tip-bot2 for Ashok Raj
2022-11-29 21:08 ` [Patch V1 3/7] x86/microcode/core: Move microcode_check() to cpu/microcode/core.c Ashok Raj
2022-12-02 19:02   ` Thomas Gleixner
2022-12-02 20:57   ` Sohil Mehta
2022-12-03  0:21     ` Ashok Raj
2022-12-05 16:25   ` Borislav Petkov
2022-12-05 17:05     ` Ashok Raj
2022-12-05 21:27       ` Borislav Petkov
2022-11-29 21:08 ` [Patch V1 4/7] x86/microcode/core: Take a snapshot before and after applying microcode Ashok Raj
2022-12-02 19:09   ` Thomas Gleixner
2022-12-03  1:57     ` Ashok Raj
2022-12-07 20:25   ` Borislav Petkov
2022-12-08  0:05     ` Ashok Raj
2022-11-29 21:08 ` [Patch V1 5/7] x86/microcode/intel: Prepare the print_ucode_rev to simply take a rev to print Ashok Raj
2022-12-02 19:23   ` Thomas Gleixner
2022-11-29 21:08 ` [Patch V1 6/7] x86/microcode/intel: Print old and new rev during early boot Ashok Raj
2022-12-02 19:29   ` Thomas Gleixner
2022-11-29 21:08 ` [Patch V1 7/7] x86/microcode/intel: Print when early microcode loading fails Ashok Raj
2022-12-02 19:30   ` Thomas Gleixner
2022-12-05 18:19     ` Ashok Raj

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).