linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Carlos Bilbao <carlos.bilbao@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, x86@kernel.org,
	yazen.ghannam@amd.com, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, bilbao@vt.edu
Subject: Re: [PATCH] x86/mce: Cover grading of AMD machine error checks
Date: Thu, 10 Mar 2022 12:24:08 -0600	[thread overview]
Message-ID: <4a345de2-6a2c-fe26-c55c-34ce6ea431d4@amd.com> (raw)
In-Reply-To: <Yijz7dA1U0AMcYPZ@zn.tnic>

On 3/9/2022 12:37 PM, Borislav Petkov wrote:
> Definitely a step in the right direction.
>
> Now...
>
> On Wed, Mar 09, 2022 at 11:41:07AM -0600, Carlos Bilbao wrote:
>> AMD's severity grading covers very few machine errors. In the graded cases
>> there are no user-readable messages, complicating debugging of critical
>> hardware errors.
>
> That's too generic. What is the actual use case here you're spending all
> this time for?
>

We will cover grading of MCEs like deferred memory scrub errors, attempts 
to access poisonous data, etc. I could list all new covered cases in the 
commit message if you think that'd be positive.

>> Fix the above issues extending the current grading logic for AMD with cases
>> not previously considered and their corresponding messages.
>>
>> Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com>
>> ---
>>  arch/x86/include/asm/mce.h         |   6 +
>>  arch/x86/kernel/cpu/mce/severity.c | 232 +++++++++++++++++++++++++----
>>  2 files changed, 205 insertions(+), 33 deletions(-)
>
> Now, looking at the whole thing, AFAICT all you're interested in is
> getting some strings out from those error types. But but, we already
> have something like that. That's even mentioned in the patch:
>
>> +	 * Default return values. The poll handler catches these and passes
>> +	 * responsibility of decoding them to EDAC
>
> So there's a big fat module mce_amd.c which does convert MCEs to
> strings. So why can't that be used and extended instead of adding more
> strings to more places in the kernel?
>
> Thx.
>

The severity grading logic has a double functionality: (1) Grade the 
machine errors and (2) Assign them a severity message. The first task is
arguably more important because depending on the severity we assign, the
MCE handler do_machine_check() will do things differently. 

The part of this patch that grades new errors is needed for (1). But -as 
you mention- why do we need these new strings if we have the EDAC driver?

If we grade the error as a PANIC, the handler will not notify EDAC, and the
kernel will use our new severity messages when displaying the error with
mce_panic().

If the machine error is not graded as PANIC, the notification chain will
come into play, and EDAC will eventually decode the machine errors without 
using the severity message. Hence, you have a good point, these new strings
will be unnecessary -We should send a new version of the patch that only 
includes msgs for PANIC cases. We probably won't need helper functions by
then. I'll also take some time to think of any other error cases we might
not be covering yet.

Hope that helps clarify,
Carlos


  reply	other threads:[~2022-03-10 18:24 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-09 17:41 [PATCH] x86/mce: Cover grading of AMD machine error checks Carlos Bilbao
2022-03-09 18:37 ` Borislav Petkov
2022-03-10 18:24   ` Carlos Bilbao [this message]
2022-03-10 22:13     ` Borislav Petkov
2022-03-11 14:07       ` Carlos Bilbao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a345de2-6a2c-fe26-c55c-34ce6ea431d4@amd.com \
    --to=carlos.bilbao@amd.com \
    --cc=bilbao@vt.edu \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).