All of lore.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@suse.de>
To: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
Cc: "linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"ard.biesheuvel@linaro.org" <ard.biesheuvel@linaro.org>,
	"x86@kernel.org" <x86@kernel.org>,
	Tony Luck <tony.luck@intel.com>
Subject: Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure
Date: Tue, 27 Feb 2018 23:10:39 +0100	[thread overview]
Message-ID: <20180227221039.GR26382@pd.tnic> (raw)
In-Reply-To: <DM5PR12MB191691E786778FE380388B74F8C00@DM5PR12MB1916.namprd12.prod.outlook.com>

On Tue, Feb 27, 2018 at 09:32:18PM +0000, Ghannam, Yazen wrote:
> Not much more readable. It's still vague and confusing to a user and devoid
> of any real info so an expert can't help. And now the information is printed
> arbitrarily, so someone needs to read the source to figure out what it really
> means.

WTF? You need to read the source to figure out what the error is? So
"Corrected Processor Error" is confusing? I think you've been clearly
staring at the spec for waaay too long.

Also, read my mail again:

"Now, I admit that my vesion of the record is not enough to debug it
but it needs to contain only information which is clear and humanly
readable to debug. You can always dump the raw data underneath from the
tracepoint but make the beginning human readable."

> Maybe. But these records are generated by Platform Firmware. Why would
> FW report the error knowing the system is about to die?

WTF more! Dude, are you kidding me?

So the firmware should not report the error if it knows the system is
about to die?!?!

Now you're just making up insane counterarguments, just because.

> Your example still says "Hardware Error"

Oh my, that's the *prefix*.

> and odds are general users won't understand what the error type means
> or what a corrected error is. So it's not much better.

Yeah, and the next thing you'll say is that users won't understand what
"error" means, right?

Geez.

> Exactly! The more info available (usually) the more quickly it can be
> diagnosed.

You still don't understand what I'm trying to explain to you.

It is *not* about diagnosing it - in order to do that you need to
involve people to diagnose it. It is about making the error record as
human readable as possible so that you don't *have* to involve people to
diagnose it in the first place and the user can say, ah ok, corrected
error, no need to do anything.

Or "System Fatal error" - I better replace that part.

> Hardware errors are generally rare and hard to reproduce.

Except when they're not like DRAM ECC floods which are pretty easy to
reproduce.

> So when one does occur we should capture the data and provide it.

Did I say you should not do that?!

I said: make it as human readable as possible and dump the gory hex crap
after it.

> Here are a couple of scenarios based on similar experiences I've had:

Now play that same scenario with the following record format:

[ human readable error record ]
[ full raw error dump ]

> I'll send a V3 set with the following changes:
> 1) Fix table numbering in commit messages.
> 2) Remove "Validation Bits" lines.
> 3) Only print error type GUID for unknown types.
> 
> I think this set should focus on printing the x86 CPER based on the UEFI
> spec and the convention of the other CPER code. CPERs are generated
> by Platform Firmware. So errors are explicitly intended to be viewed by
> the user and all info should be displayed.

You should look up from the spec and realize that real life is much
different.

> I *have* been thinking that it would be nice to take the CPER and pipe it
> through the MCA decoding in arch/x86 and EDAC. This would be really
> nice for when the CPER includes the MCA registers in the Context info.
> So we'd get our usual MCA decoding instead of a binary blob of registers.

That would definitely be a step in the right direction.

> I was thinking that the MCA decoding would be in addition to this. But
> based on Boris's comments, maybe we can make it a default selection.
> For example, if MCA/EDAC decoding is available, use it. Otherwise, print
> the CPER fields in a generic way like we do for the other CPER types.

Yes.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

  reply	other threads:[~2018-02-27 22:11 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-26 19:38 [PATCH v2 0/8] Decode IA32/X64 CPER Yazen Ghannam
2018-02-26 19:38 ` [PATCH v2 1/8] efi: Fix IA32/X64 Processor Error Record definition Yazen Ghannam
2018-02-27 10:47   ` Borislav Petkov
2018-02-27 15:05     ` Ghannam, Yazen
2018-02-26 19:38 ` [PATCH v2 2/8] efi: Decode IA32/X64 Processor Error Section Yazen Ghannam
2018-02-27 11:22   ` Borislav Petkov
2018-02-27 15:13     ` Ghannam, Yazen
2018-02-27 17:00       ` Borislav Petkov
2018-02-27 17:27         ` Ghannam, Yazen
2018-02-27 17:44           ` Borislav Petkov
2018-02-27 18:06             ` Ghannam, Yazen
2018-02-27 18:34               ` Borislav Petkov
2018-02-26 19:38 ` [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure Yazen Ghannam
2018-02-27 14:25   ` Borislav Petkov
2018-02-27 15:25     ` Ghannam, Yazen
2018-02-27 17:04       ` Borislav Petkov
2018-02-27 17:46         ` Ghannam, Yazen
2018-02-27 18:02           ` Borislav Petkov
2018-02-27 18:40             ` Ghannam, Yazen
2018-02-27 19:09               ` Borislav Petkov
2018-02-27 21:32                 ` Ghannam, Yazen
2018-02-27 22:10                   ` Borislav Petkov [this message]
2018-02-26 19:39 ` [PATCH v2 4/8] efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs Yazen Ghannam
2018-02-27 14:30   ` Borislav Petkov
2018-02-27 15:28     ` Ghannam, Yazen
2018-02-27 15:31       ` Ard Biesheuvel
2018-02-26 19:39 ` [PATCH v2 5/8] efi: Decode IA32/X64 Cache, TLB, and Bus Check structures Yazen Ghannam
2018-02-27 15:03   ` Borislav Petkov
2018-02-27 15:33     ` Ghannam, Yazen
2018-02-27 17:06       ` Borislav Petkov
2018-02-26 19:39 ` [PATCH v2 6/8] efi: Decode additional IA32/X64 Bus Check fields Yazen Ghannam
2018-02-26 19:39 ` [PATCH v2 7/8] efi: Decode IA32/X64 MS Check structure Yazen Ghannam
2018-02-26 19:39 ` [PATCH v2 8/8] efi: Decode IA32/X64 Context Info structure Yazen Ghannam
2018-02-28  8:43 ` [PATCH v2 0/8] Decode IA32/X64 CPER Borislav Petkov
2018-02-28 15:12   ` Ghannam, Yazen
2018-02-28 16:35     ` Borislav Petkov
2018-02-28 20:58       ` Ghannam, Yazen
2018-03-01 11:59         ` Borislav Petkov
2018-03-23  0:19           ` Ghannam, Yazen
2018-03-23 15:29             ` Borislav Petkov
2018-03-01 16:38   ` Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180227221039.GR26382@pd.tnic \
    --to=bp@suse.de \
    --cc=Yazen.Ghannam@amd.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.