On 5/23/21 11:05 AM, Borislav Petkov wrote: > On Sat, May 22, 2021 at 05:28:42PM -0600, James Feeney wrote: >> Out of curiosity, I added "pr_info("%s : mcheck_intel_therm_init() >> use to run here", __func__);" to __init mcheck_init() in >> arch/x86/kernel/cpu/mce/core.c, just to see *when* it *would have* >> run, compared to when it is running now. > > Yes, this is probably the only timing-sensitive difference this patch of > mine introduces. So who knows what's in that LVT APIC register earlier > and *maybe* reading it later might give us different values. > > So do the same game again pls, but this time apply the patch below. It > is practically a revert of my patch and with it, your box should *not* > freeze anymore *if* my patch really is the culprit. > > Thx. > > patch below. Yes, pretty much what I had in mind. > *if* my patch really is the culprit. Ha! Yes, your patch *is* the culprit. You don't trust git bisect? My first git bisect failure was before realizing that the boot failure was intermittent/probabilistic. That's why I ended-up doing the git bisect *three* times, the last with *10* reboots on each "good" patch. > who knows what's in that LVT APIC register earlier > and *maybe* reading it later might give us different values. "lvtthmr_init: 0x200" != "lvtthmr_init: 0x10200" != "lvtthmr_init: 0x10000" System Management is *hard*, because it must build upon someone else's undocumented buggy software. Thank Intel. Well now, reverting the timing seems to completely resolve the issue. Take the win - 20 reboots, "cold" and "hot", and *no* boot failures. A couple of representative dmesg logs are attached. For a representative quick look: $ grep -C 1 'mcheck\|lvtt' dmesglog.5.12.mcheck.1|sort -- -- dmesglog.5.12.mcheck.1-[ 0.172695] Booting paravirtualized kernel on bare hardware dmesglog.5.12.mcheck.1:[ 0.172698] mcheck_intel_therm_init: lvtthmr_init: 0x200 dmesglog.5.12.mcheck.1:[ 0.172701] mce: mcheck_init : mcheck_intel_therm_init() returned dmesglog.5.12.mcheck.1-[ 0.172706] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns dmesglog.5.12.mcheck.1:[ 0.944549] intel_init_thermal: CPU1, lvtthmr_init: 0x200, read: 0x200, misc_enable (low): 0x64952489 dmesglog.5.12.mcheck.1-[ 1.252230] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) dmesglog.5.12.mcheck.1:[ 1.258912] intel_init_thermal: CPU0, lvtthmr_init: 0x200, read: 0x200, misc_enable (low): 0x64952489 dmesglog.5.12.mcheck.1-[ 1.262202] CPU0: Thermal monitoring handled by SMI dmesglog.5.12.mcheck.1-[ 1.455547] .... node #0, CPUs: #1 dmesglog.5.12.mcheck.1-[ 1.472322] smp: Brought up 1 node, 2 CPUs I'd suggest, leave in your "pr_info" statements. Also, the "#ifdef CONFIG_X86_THERMAL_VECTOR" may make more sense a little further above, closer to the "#ifdef CONFIG_X86_MCE_INTEL", and not under the "Used by APEI to report memory error via /dev/mcelog" heading. And, a couple of big fat verbose comments, about the placement and timing issue, probably in arch/x86/kernel/cpu/mce/core.c, and in drivers/thermal/intel/therm_throt.c, would be in order. Boris, thanks very much for working this issue with me. Let's get the revisions into the mainstream kernel. The 5.11 and 5.12 initial releases both left this laptop un-bootable. I've been gaining lots of character-building experience with git bisect! Hopefully, going forward, new releases will be more boring. James > --- > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > index ddfb3cad8dff..5ac8b827bc12 100644 > --- a/arch/x86/include/asm/mce.h > +++ b/arch/x86/include/asm/mce.h > @@ -296,6 +296,12 @@ struct cper_sec_mem_err; > extern void apei_mce_report_mem_error(int corrected, > struct cper_sec_mem_err *mem_err); > > +#ifdef CONFIG_X86_THERMAL_VECTOR > +extern void mcheck_intel_therm_init(void); > +#else > +static inline void mcheck_intel_therm_init(void) { } > +#endif > + > /* > * Enumerate new IP types and HWID values in AMD processors which support > * Scalable MCA. > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index bf7fe87a7e88..ded20b8612fe 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -2190,6 +2190,7 @@ __setup("mce", mcheck_enable); > > int __init mcheck_init(void) > { > + mcheck_intel_therm_init(); > mce_register_decode_chain(&early_nb); > mce_register_decode_chain(&mce_uc_nb); > mce_register_decode_chain(&mce_default_nb); > diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c > index f8e882592ba5..0ebd2386839f 100644 > --- a/drivers/thermal/intel/therm_throt.c > +++ b/drivers/thermal/intel/therm_throt.c > @@ -621,19 +621,30 @@ bool x86_thermal_enabled(void) > return atomic_read(&therm_throt_en); > } > > +void __init mcheck_intel_therm_init(void) > +{ > + /* > + * This function is only called on boot CPU. Save the init thermal > + * LVT value on BSP and use that value to restore APs' thermal LVT > + * entry BIOS programmed later > + */ > + if (intel_thermal_supported(&boot_cpu_data)) { > + lvtthmr_init = apic_read(APIC_LVTTHMR); > + pr_info("%s: lvtthmr_init: 0x%x\n", __func__, lvtthmr_init); > + } else { > + pr_info("%s: !intel_thermal_supported\n", __func__); > + } > +} > + > void intel_init_thermal(struct cpuinfo_x86 *c) > { > unsigned int cpu = smp_processor_id(); > int tm2 = 0; > - u32 l, h; > + u32 l, h, tmp = -1; > > if (!intel_thermal_supported(c)) > return; > > - /* On the BSP? */ > - if (c == &boot_cpu_data) > - lvtthmr_init = apic_read(APIC_LVTTHMR); > - > /* > * First check if its enabled already, in which case there might > * be some SMM goo which handles it, so we can't even put a handler > @@ -652,13 +663,17 @@ void intel_init_thermal(struct cpuinfo_x86 *c) > * BIOS has programmed on AP based on BSP's info we saved since BIOS > * is always setting the same value for all threads/cores. > */ > - if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED) > + if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED) { > apic_write(APIC_LVTTHMR, lvtthmr_init); > + tmp = apic_read(APIC_LVTTHMR); > + } > > + pr_info("%s: CPU%d, lvtthmr_init: 0x%x, read: 0x%x, misc_enable (low): 0x%x\n", > + __func__, cpu, lvtthmr_init, tmp, l); > > if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) { > if (system_state == SYSTEM_BOOTING) > - pr_debug("CPU%d: Thermal monitoring handled by SMI\n", cpu); > + pr_info("CPU%d: Thermal monitoring handled by SMI\n", cpu); > return; > } > >