linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: James Feeney <james@nurealm.net>
To: Borislav Petkov <bp@suse.de>
Cc: linux-smp@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: linux 5.12 - fails to boot - soft lockup - CPU#0 stuck for 23s! - RIP smp_call_function_single
Date: Mon, 24 May 2021 22:02:24 -0600	[thread overview]
Message-ID: <1f6c70f4-6680-d6ea-465a-548dc7698317@nurealm.net> (raw)
In-Reply-To: <YKtbBXZGpVZS1M4R@zn.tnic>

[-- Attachment #1: Type: text/plain, Size: 9652 bytes --]

On 5/24/21 1:51 AM, Borislav Petkov wrote:
> On Sun, May 23, 2021 at 05:02:01PM -0600, James Feeney wrote:
>> Ha! Yes, your patch *is* the culprit. You don't trust git bisect?
> 
> Well, git-bisect can very easily veer off into the fields if the
> bisector makes a mistake. I'm not saying you have made a mistake but I
> have done that mistake a bunch of times and have seen others do it too
> so it is very easy to get lost.
> 

Quite so, quite so.  But the process, by necessity, is incremental, a step-wise process of increasing knowledge.  It is essentially one of the many levels of the same process described in "The Cathedral and the Bazaar".  Though, it can be tedious at times.

> And with my patch simply moving the LVT THMR read back in the boot
> order, I was very sceptical how can that even be?!
> 
> But...
> 
>> "lvtthmr_init: 0x200" != "lvtthmr_init: 0x10200" != "lvtthmr_init: 0x10000"
> 
> ... *this* is a good catch, I *think* I know what happens and the next
> patch will confirm my theory, see the end of the mail. Remove all diffs
> you have ontop of your 5.12 kernel, apply the one below, do the exact
> same exercise with it and send me one full dmesg pls.
> 

It is not helpful to describe the addition of one pr_info statement in terms of "Remove all diffs you have ontop of your 5.12 kernel".  I prefer to apply each of these by hand, so that I can see what is actually happening, and there are no actual other additional changes, other than this pr_info addition to clear_local_APIC() in arch/x86/kernel/apic/apic.c.

>> System Management is *hard*, because it must build upon someone else's
>> undocumented buggy software. Thank Intel.
> 
> No, SMM is shit. Complain to the OEMs about it.
> 

Hmm - an important ethical question, determining to whom blame should be assigned.  I rather take the position, to put this in melodramatic terms, of not allowing little children, crazy people, fanatics, or politicians, to have guns.  Intel know - or should know - that hardware OEMs are irresponsible software providers.  There is overwhelming evidence, to this.  And yet, Intel continue to choose to provide this SMM "gun" to these OEM "crazy people".  The outcome, then, simply becomes predictable, and inevitable.  I, therefore, prefer to hold Intel, and by extension, AMD, responsible.  A "solid ethical foundation" is the most important element in the machine result.

> However, this time I think it is *we* who are not doing stuff as
> correctly as we should but let's see your results first.
> 
> Thx.
> 

Since I am not familiar with the details, I am interested to know your final analysis.  Still, I take note of the present language in the comment, in intel_init_thermal():

====
         * The initial value of thermal LVT entries on all APs always reads
         * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
         * sequence to them and LVT registers are reset to 0s except for
         * the mask bits which are set to 1s when APs receive INIT IPI.
         * If BIOS takes over the thermal interrupt and sets its interrupt
         * delivery mode to SMI (not fixed), it restores the value that the
         * BIOS has programmed on AP based on BSP's info we saved since BIOS
         * is always setting the same value for all threads/cores.
====

But then, it is entirely unclear to me exactly "who is doing what" here, while we see these values, resulting under various circumstances, at different times, as 0x10000, 0x10200, and 0x200.


Addressing the pr_info patch in clear_local_APIC(), first we note the comment "Also used to cleanout any BIOS leftovers during boot", and then discover that clear_local_APIC() is called during power-down, but is *never called* during boot.  To be sure that this is true, I've added
	pr_info("%s: before : CPU%d, maxlvt: %d\n", __func__, smp_processor_id(), maxlvt);
and
	pr_info("%s: after : CPU%d, maxlvt: %d\n", __func__, smp_processor_id(), maxlvt);
around
	if (maxlvt >= 5) {...}

since maybe we want to know the actual value of maxlvt during boot, on the possibility that this value might have changed.

With these additional pr_info statements, we see with certainty that clear_local_APIC() is *never* called during boot.  Presumably, this implies smp_found_config in init_bsp_APIC(), which otherwise would have called clear_local_APIC().  dmidecode says "SMBIOS 2.4 present".

dmesg log for a power-up to power-down cycle attached.  Here is the quick look:

$ grep -C 1 'mcheck\|lvtt\|clear' dmesglog.5.12.info.context.1

[    0.171645] Booting paravirtualized kernel on bare hardware
[    0.171648] mcheck_intel_therm_init: lvtthmr_init: 0x200
[    0.171651] mce: mcheck_init : mcheck_intel_therm_init() returned
[    0.171657] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370452778343963 ns
--
[    1.251708] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    1.255457] intel_init_thermal: CPU0, lvtthmr_init: 0x200, read: 0x200, misc_enable (low): 0x64952489
[    1.258348] CPU0: Thermal monitoring handled by SMI
--
[    1.451691] .... node  #0, CPUs:      #1
[    0.944189] intel_init_thermal: CPU1, lvtthmr_init: 0x200, read: 0x200, misc_enable (low): 0x64952489
[    1.468467] smp: Brought up 1 node, 2 CPUs
--
[  153.299300] reboot: Power down
[  153.304433] clear_local_APIC: before : CPU0, maxlvt: 5
[  153.304433] clear_local_APIC: before : CPU1, maxlvt: 5
[  153.304437] clear_local_APIC: CPU1, maxlvt: 5, APIC_LVTTHMR: 0x200, masking LVT
[  153.311404] clear_local_APIC: CPU0, maxlvt: 5, APIC_LVTTHMR: 0x200, masking LVT
[  153.316525] clear_local_APIC: after : CPU1, maxlvt: 5
[  153.325673] clear_local_APIC: after : CPU0, maxlvt: 5
[  153.346744] clear_local_APIC: before : CPU0, maxlvt: 5
[  153.353718] clear_local_APIC: CPU0, maxlvt: 5, APIC_LVTTHMR: 0x10200, masking LVT
[  153.363062] clear_local_APIC: after : CPU0, maxlvt: 5
[  153.369985] acpi_power_off called


James


> ---
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index ddfb3cad8dff..5ac8b827bc12 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -296,6 +296,12 @@ struct cper_sec_mem_err;
>  extern void apei_mce_report_mem_error(int corrected,
>  				      struct cper_sec_mem_err *mem_err);
>  
> +#ifdef CONFIG_X86_THERMAL_VECTOR
> +extern void mcheck_intel_therm_init(void);
> +#else
> +static inline void mcheck_intel_therm_init(void) { }
> +#endif
> +
>  /*
>   * Enumerate new IP types and HWID values in AMD processors which support
>   * Scalable MCA.
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 4a39fb429f15..f21009786877 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1162,6 +1162,10 @@ void clear_local_APIC(void)
>  #ifdef CONFIG_X86_THERMAL_VECTOR
>  	if (maxlvt >= 5) {
>  		v = apic_read(APIC_LVTTHMR);
> +
> +		pr_info("%s: CPU%d, maxlvt: %d, APIC_LVTTHMR: 0x%x, masking LVT\n",
> +			__func__, smp_processor_id(), maxlvt, v);
> +
>  		apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
>  	}
>  #endif
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index bf7fe87a7e88..ded20b8612fe 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -2190,6 +2190,7 @@ __setup("mce", mcheck_enable);
>  
>  int __init mcheck_init(void)
>  {
> +	mcheck_intel_therm_init();
>  	mce_register_decode_chain(&early_nb);
>  	mce_register_decode_chain(&mce_uc_nb);
>  	mce_register_decode_chain(&mce_default_nb);
> diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c
> index f8e882592ba5..0ebd2386839f 100644
> --- a/drivers/thermal/intel/therm_throt.c
> +++ b/drivers/thermal/intel/therm_throt.c
> @@ -621,19 +621,30 @@ bool x86_thermal_enabled(void)
>  	return atomic_read(&therm_throt_en);
>  }
>  
> +void __init mcheck_intel_therm_init(void)
> +{
> +	/*
> +	 * This function is only called on boot CPU. Save the init thermal
> +	 * LVT value on BSP and use that value to restore APs' thermal LVT
> +	 * entry BIOS programmed later
> +	 */
> +	if (intel_thermal_supported(&boot_cpu_data)) {
> +		lvtthmr_init = apic_read(APIC_LVTTHMR);
> +	pr_info("%s: lvtthmr_init: 0x%x\n", __func__, lvtthmr_init);
> +	} else {
> +		pr_info("%s: !intel_thermal_supported\n", __func__);
> +	}
> +}
> +
>  void intel_init_thermal(struct cpuinfo_x86 *c)
>  {
>  	unsigned int cpu = smp_processor_id();
>  	int tm2 = 0;
> -	u32 l, h;
> +	u32 l, h, tmp = -1;
>  
>  	if (!intel_thermal_supported(c))
>  		return;
>  
> -	/* On the BSP? */
> -	if (c == &boot_cpu_data)
> -		lvtthmr_init = apic_read(APIC_LVTTHMR);
> -
>  	/*
>  	 * First check if its enabled already, in which case there might
>  	 * be some SMM goo which handles it, so we can't even put a handler
> @@ -652,13 +663,17 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
>  	 * BIOS has programmed on AP based on BSP's info we saved since BIOS
>  	 * is always setting the same value for all threads/cores.
>  	 */
> -	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
> +	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED) {
>  		apic_write(APIC_LVTTHMR, lvtthmr_init);
> +		tmp = apic_read(APIC_LVTTHMR);
> +	}
>  
> +	pr_info("%s: CPU%d, lvtthmr_init: 0x%x, read: 0x%x, misc_enable (low): 0x%x\n",
> +		__func__, cpu, lvtthmr_init, tmp, l);
>  
>  	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
>  		if (system_state == SYSTEM_BOOTING)
> -			pr_debug("CPU%d: Thermal monitoring handled by SMI\n", cpu);
> +			pr_info("CPU%d: Thermal monitoring handled by SMI\n", cpu);
>  		return;
>  	}
>  
> 

[-- Attachment #2: dmesglog.5.12.info.context.1 --]
[-- Type: application/x-troff-man, Size: 70577 bytes --]

  reply	other threads:[~2021-05-25  4:02 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <8a9599b2-f4fe-af9b-90f5-af39c315ec2f@nurealm.net>
2021-05-17  8:32 ` linux 5.12 - fails to boot - soft lockup - CPU#0 stuck for 23s! - RIP smp_call_function_single Borislav Petkov
2021-05-19  3:58   ` James Feeney
2021-05-19 11:12     ` Borislav Petkov
2021-05-19 20:03       ` James Feeney
2021-05-19 21:18         ` Borislav Petkov
2021-05-20  3:12           ` James Feeney
2021-05-20  9:21             ` Borislav Petkov
2021-05-21 22:11               ` James Feeney
2021-05-22  9:06                 ` Borislav Petkov
2021-05-22 23:28                   ` James Feeney
2021-05-23 17:05                     ` Borislav Petkov
2021-05-23 23:02                       ` James Feeney
2021-05-24  7:51                         ` Borislav Petkov
2021-05-25  4:02                           ` James Feeney [this message]
2021-05-27 10:31                             ` [PATCH] x86/thermal: Fix LVT thermal setup for SMI delivery mode Borislav Petkov
2021-05-27 11:49                               ` Thomas Gleixner
2021-05-27 11:56                                 ` Borislav Petkov
2021-05-27 18:54                                 ` Borislav Petkov
2021-05-28  8:23                                   ` Thomas Gleixner
2021-05-28 11:19                                     ` Borislav Petkov
2021-05-31 18:26                                       ` James Feeney
2021-05-27 18:09                               ` Srinivas Pandruvada
2021-05-27 19:01                                 ` Borislav Petkov
2021-05-27 20:28                                   ` Srinivas Pandruvada
2021-05-28  7:05                               ` James Feeney
2021-05-31 21:46   ` [tip: x86/urgent] " tip-bot2 for Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1f6c70f4-6680-d6ea-465a-548dc7698317@nurealm.net \
    --to=james@nurealm.net \
    --cc=axboe@kernel.dk \
    --cc=bp@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-smp@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).