linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>,
	Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
	Carlos Bilbao <carlos.bilbao@amd.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context
Date: Mon, 31 Oct 2022 19:36:08 +0100	[thread overview]
Message-ID: <Y2AVmOdEtTl5e68l@zn.tnic> (raw)
In-Reply-To: <SJ1PR11MB6083593A0D18EEE0074E77ABFC379@SJ1PR11MB6083.namprd11.prod.outlook.com>

On Mon, Oct 31, 2022 at 05:13:10PM +0000, Luck, Tony wrote:
> > 1. If the error has raised a MCE, then we will dump stack anyway.
> 
> I don't see stack dumps for machine check panics. I don't have any non-standard
> settings (I think). Nor do I see them in the panic messages that other folks send
> to me.
> 
> Are you settting some CONFIG or command line option to get a stack dump?

Well, if one were sane, one would assume that one would expect to see a
stack dump when the machine panics, right? I mean, it is only fair...

And there's an attempt:

#ifdef CONFIG_DEBUG_BUGVERBOSE 
        /*
         * Avoid nested stack-dumping if a panic occurs during oops processing
         */
        if (!test_taint(TAINT_DIE) && oops_in_progress <= 1)
                dump_stack();
#endif

but that oops_in_progress thing is stopping us:

[   13.706764] mce: [Hardware Error]: CPU 2: Machine Check Exception: 6 Bank 4: fe000010000b0c0f
[   13.706781] mce: [Hardware Error]: RIP 10:<ffffffff8103bbcb> {trigger_mce+0xb/0x10}
[   13.706791] mce: [Hardware Error]: TSC c83826d14 ADDR e1101add1e550012 MISC cafebeef 
[   13.706795] mce: [Hardware Error]: PROCESSOR 2:a00f11 TIME 1667244167 SOCKET 0 APIC 2 microcode 1000065
[   13.706809] mce: [Hardware Error]: Machine check: Processor Context Corrupt
[   13.706810] panic: on entry: oops_in_progress: 1
[   13.706812] panic: before bust_spinlocks oops_in_progress: 1
[   13.706813] Kernel panic - not syncing: Fatal local machine check
[   13.706814] panic: taint: 0, oops_in_progress: 2
[   13.707133] Kernel Offset: disabled

as panic() is being entered with oops_in_progress already set to 1. That
oops_in_progress thing looks like is being used for console unblanking.

Looking at

  026ee1f66aaa ("panic: fix stack dump print on direct call to panic()")

it hints that panic() might've been called twice for oops_in_progress to
be already 1 on entry.

I guess we need to figure out why that is...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2022-10-31 18:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-22 19:51 [PATCH 0/2] Dump stack after certain machine checks Tony Luck
2022-09-22 19:51 ` [PATCH 1/2] x86/mce: Use severity table to handle uncorrected errors in kernel Tony Luck
2022-10-31 16:15   ` [tip: ras/core] " tip-bot2 for Tony Luck
2022-09-22 19:51 ` [PATCH 2/2] x86/mce: Dump the stack for recoverable machine checks in kernel context Tony Luck
2022-10-31 16:44   ` Borislav Petkov
2022-10-31 17:13     ` Luck, Tony
2022-10-31 18:36       ` Borislav Petkov [this message]
2022-10-31 19:20         ` Luck, Tony
2022-10-31 10:30 ` [PATCH 0/2] Dump stack after certain machine checks Borislav Petkov
2022-11-01 17:36   ` Yazen Ghannam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2AVmOdEtTl5e68l@zn.tnic \
    --to=bp@alien8.de \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=carlos.bilbao@amd.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).