From: Borislav Petkov <bp@suse.de>
To: James Feeney <james@nurealm.net>
Cc: linux-smp@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: linux 5.12 - fails to boot - soft lockup - CPU#0 stuck for 23s! - RIP smp_call_function_single
Date: Thu, 20 May 2021 11:21:04 +0200
Message-ID: <YKYqABhSTTUG8cgV@zn.tnic>
In-Reply-To: <e7701de5-35f3-da9d-7339-df2de6d8b3cf@nurealm.net>

On Wed, May 19, 2021 at 09:12:04PM -0600, James Feeney wrote:

> $ diff .config .config.old
> 4983c4983,4984
> < # CONFIG_X86_THERMAL_VECTOR is not set
> ---
> > CONFIG_X86_THERMAL_VECTOR=y
> > CONFIG_X86_PKG_TEMP_THERMAL=m
>
> No joy. Still have the same soft lockups and full boots - the full
> boots interrupted by some mystery delay.

Which means that it still locks up even with therm_throt disabled, so
my patch can't be the cause.

> I don't know about these patches, modifying and moving the location of
> therm_throt.c, so I'm not in a position to draw any conclusion from
> these results.

They're just moving the thermal interrupt functionality out of the
MCE code, where it doesn't belong, and into the thermal code, where it
does. Otherwise there should be no functional change.

> build 5.11? There are lots of 5.11 kernels from the Arch distribution
> that I have run. Are you looking for a dmesg log from 5.11?

Take the .config you're normally using, make sure it has

CONFIG_X86_THERMAL_VECTOR=y

and build a plain 5.11 kernel with it. No patches on top, nothing else.

Then add

debug ignore_loglevel log_buf_len=16M no_console_suspend systemd.log_target=null console=ttyS0,115200 console=tty0

to its kernel command line and send me the full dmesg again pls.

Since it sometimes boots and sometimes locks up, try that a couple of
times.

> So far, something looks quirky - somewhere. Timing related failures
> can be a pain. Is there no useful information being provided by the
> Call Trace in the dmesg log?

What I'm seeing is that *sometimes* - not always - your CPU0 is not
responding to the TLB flush IPI. Which is really weird. Have you always
had those lockups, or did they start appearing with 5.12?

That's why I'm still scratching my head over how my patch could cause
CPU0 to stop responding to IPIs.

Well, *maybe* there's one little difference which my patch introduced:
it reads APIC_LVTTHMR only on the BSP. And *maybe* there's a problem
there, who knows with those old CPUs.

So here are two more things to try:

1. On plain 5.12, with the same kernel cmdline params as above, also add

"idle=nomwait"

to the kernel command line and boot with it a couple of times to see
whether it still locks up.

2. On plain 5.12, with the same kernel cmdline params, apply this hunk
on top:

---
diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c
index f8e882592ba5..42db48cd4666 100644
--- a/drivers/thermal/intel/therm_throt.c
+++ b/drivers/thermal/intel/therm_throt.c
@@ -630,9 +630,8 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	if (!intel_thermal_supported(c))
 		return;
 
-	/* On the BSP? */
-	if (c == &boot_cpu_data)
-		lvtthmr_init = apic_read(APIC_LVTTHMR);
+	lvtthmr_init = apic_read(APIC_LVTTHMR);
+	pr_info("%s: CPU%d, lvtthmr_init: 0x%x\n", __func__, cpu, lvtthmr_init);
 
 	/*
 	 * First check if its enabled already, in which case there might
---

That'll tell us the thermal sensor LVT on both CPUs.

Also do that a couple of times - it'll be interesting to see what those
values are *when* the box locks up.
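
To make sense of the lvtthmr_init values that the pr_info() in the hunk
above prints, the register can be split into its fields. What follows is
a minimal user-space sketch, not kernel code. It assumes the LVT layout
described in the local APIC chapter of the Intel SDM (vector in bits 7:0,
delivery mode in bits 10:8, delivery status in bit 12, mask in bit 16).
The macro and function names (LVT_VECTOR, lvt_mode_name, lvt_describe
and so on) are made up for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* LVT entry fields; layout assumed from the Intel SDM local APIC chapter. */
#define LVT_VECTOR(v)   ((v) & 0xffu)        /* bits 7:0:  interrupt vector */
#define LVT_DELIVERY(v) (((v) >> 8) & 0x7u)  /* bits 10:8: delivery mode    */
#define LVT_PENDING(v)  (((v) >> 12) & 0x1u) /* bit 12:    delivery status  */
#define LVT_MASKED(v)   (((v) >> 16) & 0x1u) /* bit 16:    interrupt masked */

/* Map the 3-bit delivery mode to a readable name (hypothetical helper). */
static const char *lvt_mode_name(unsigned int mode)
{
	static const char *const names[8] = {
		"Fixed", "reserved", "SMI", "reserved",
		"NMI", "INIT", "reserved", "ExtINT",
	};

	assert(mode < 8);
	return names[mode];
}

/*
 * Format one logged value as a human-readable line (hypothetical helper).
 * Returns what snprintf() returns: the length the full line would need.
 */
static int lvt_describe(char *buf, size_t len, int cpu, uint32_t val)
{
	return snprintf(buf, len,
			"CPU%d: vector=0x%02x mode=%s pending=%u masked=%u",
			cpu,
			(unsigned int)LVT_VECTOR(val),
			lvt_mode_name((unsigned int)LVT_DELIVERY(val)),
			(unsigned int)LVT_PENDING(val),
			(unsigned int)LVT_MASKED(val));
}
```

With this layout, a logged value of 0x200 decodes as delivery mode SMI
with vector 0, and 0x10000 decodes as a masked entry in Fixed mode.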

As always, capture the full dmesg each time pls.

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
