From: Borislav Petkov <bp@alien8.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Yazen Ghannam <Yazen.Ghannam@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 3/6] x86/mce: Add support for new MCA_SYND register
Date: Fri, 8 Jul 2016 12:14:53 +0200
Message-ID: <20160708101452.GD3808@pd.tnic>
In-Reply-To: <20160708094653.GC13849@gmail.com>

On Fri, Jul 08, 2016 at 11:46:53AM +0200, Ingo Molnar wrote:
> I'm not sure I can parse that: how can a reported error have bits corrupted?

No, it is about the actual bits in memory that the ECC error is
generated for. For example, if an ECC error reports that memory location
X had some bit flips, the syndrome value reported together with that
ECC error shows which bits have actually flipped.

Here's an example from the AMD BKDG, maybe that'll make it clearer:

http://support.amd.com/TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf

Go to page 246, there it says this:

"For example, assume the ECC syndrome is 03EAh. First search row EAh
for the complete syndrome. Since it is not found, search row 03h for
the complete syndrome. It is found in column 9h, so symbol 9h has the
error. Since the error bitmask indicates value 3h (0011b), bits 0 and 1
within that symbol are corrupted. Symbol 9h maps to bits 72-79, so the
corrupted bits are 72 and 73 of the line."

So you basically search the table of x8 ECC correctable syndromes, first
in row EAh (second syndrome byte), and if you don't find the complete
syndrome there, you search row 03h (first syndrome byte) for it.

It is in column 9, and that means symbol 9. There are 16 symbols, one
for each byte in a 128-bit DRAM word, plus 3 special symbols for the
ECC bits.

The row number 3h is also the error bitmask, so bits 0 and 1 are the
ones which are corrupted.

Which means that when you look at the value in DRAM at the address the
error was reported for, you need to go to symbol 9. That's 9*8 = 72,
which means bits 72-79, and the first 2 bits in that byte are bits 72
and 73.

So if you want to correct them, you simply flip them as the syndrome
tells you that those 2 are corrupted.
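
The arithmetic above can be sketched in C roughly like this. It is only
an illustration, not kernel code: the function names are made up, the
syndrome table from the BKDG is not reproduced, and the lookup is assumed
to have already yielded the symbol (9h) and the error bitmask (3h). It
also assumes that bit 0 of byte N in the buffer is bit N*8 of the line,
and it only handles the 16 data symbols, not the ECC ones.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_SYMBOL	8	/* x8 ECC: one symbol covers one byte */
#define DATA_SYMBOLS	16	/* 128-bit DRAM word, ECC symbols ignored */

/*
 * Split a 16-bit syndrome into the two table rows to search, in the
 * order described above: low byte first, then high byte.
 */
static void syndrome_rows(uint16_t synd, uint8_t rows[2])
{
	rows[0] = synd & 0xff;
	rows[1] = synd >> 8;
}

/*
 * Given the symbol (table column) and the error bitmask (the row in
 * which the syndrome was found), store the corrupted bit positions of
 * the line in bits[] and return how many there are.
 */
static int corrupted_bits(unsigned int symbol, uint8_t err_mask,
			  unsigned int bits[BITS_PER_SYMBOL])
{
	int i, n = 0;

	for (i = 0; i < BITS_PER_SYMBOL; i++) {
		if (err_mask & (1u << i))
			bits[n++] = symbol * BITS_PER_SYMBOL + i;
	}

	return n;
}

/*
 * Correct the line by flipping the corrupted bits back.
 * Assumption: line[symbol] holds bits symbol*8 .. symbol*8+7.
 */
static void flip_bits(uint8_t *line, unsigned int symbol, uint8_t err_mask)
{
	assert(symbol < DATA_SYMBOLS);
	line[symbol] ^= err_mask;
}
```

With syndrome 03EAh, syndrome_rows() gives EAh and then 03h as the rows
to search, and corrupted_bits(9, 0x3, bits) fills in 72 and 73.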

Ok?

See how easy it is :-)))

> I'm fine with an add-on patch that adds a good explanation for all
> this to the code.

How about we point to that section in the BKDG? I think it is written
pretty understandably for a technical document and the example makes it
even more explicit.

:-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
