* dimm mapping
From: Michael Di Domenico @ 2020-12-16 15:06 UTC
  To: linux-edac

Is there some tool that merges the 'dmidecode -t memory' output with
'edac-util -v' to give me the mapping of my motherboard for the
labels.db file?

The only documentation I've been able to find on the net says to
populate one DIMM at a time and write down the mapping, which seems
horribly dated for 2020.

* Re: dimm mapping
From: Borislav Petkov @ 2020-12-16 16:34 UTC
  To: Michael Di Domenico; +Cc: linux-edac

On Wed, Dec 16, 2020 at 10:06:21AM -0500, Michael Di Domenico wrote:
> Is there some tool that merges the 'dmidecode -t memory' output with
> 'edac-util -v' to give me the mapping of my motherboard for the
> labels.db file?
> 
> The only documentation I've been able to find on the net says to
> populate one DIMM at a time and write down the mapping, which seems
> horribly dated for 2020.

There's a whole subsystem - EDAC - trying to do that mapping between
silkscreen labels on the motherboard and actual DIMMs and there are
cases where that is simply impossible. Unfortunately.

What is the problem you're trying to solve?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: dimm mapping
From: Michael Di Domenico @ 2020-12-16 17:05 UTC
  To: Borislav Petkov; +Cc: linux-edac

The problem I'm trying to solve is that when I run edac-util -v, I
don't get the DIMM labels. So when an MCE trips, it's a chore to figure out
which DIMM it actually is when looking at the motherboard. Given that
dmidecode seems to report the DIMM labels, it seems odd that EDAC
doesn't use them. But then again, I don't understand how all that's tied
together (if at all).
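For the EDAC side, the per-DIMM data the kernel exports can be read from sysfs. A minimal sketch, assuming the `/sys/devices/system/edac/mc/mcN/dimmN/` layout of recent kernels; drivers that account per rank expose `rankN` directories instead, and which attribute files exist varies with kernel version and driver, so missing ones come back as empty strings. `read_edac_dimms` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict, List

# Per-DIMM attributes to collect; availability depends on kernel version and EDAC driver.
ATTRS = ("dimm_label", "dimm_location", "dimm_ce_count", "dimm_ue_count")


def _numbered(parent: Path, prefix: str) -> List[Path]:
    """Return subdirectories named <prefix><N>, sorted numerically by N (dimm2 before dimm10)."""
    found = [
        p for p in parent.iterdir()
        if p.is_dir() and p.name.startswith(prefix) and p.name[len(prefix):].isdigit()
    ]
    return sorted(found, key=lambda p: int(p.name[len(prefix):]))


def read_edac_dimms(root: str = "/sys/devices/system/edac/mc") -> List[Dict[str, str]]:
    """Collect label, location and error-count attributes for every EDAC DIMM directory."""
    base = Path(root)
    if not base.is_dir():
        return []  # EDAC not loaded, or no memory controller driver bound
    rows = []
    for mc in _numbered(base, "mc"):
        for dimm in _numbered(mc, "dimm"):
            row = {"mc": mc.name, "dimm": dimm.name}
            for attr in ATTRS:
                path = dimm / attr
                row[attr] = path.read_text().strip() if path.is_file() else ""
            rows.append(row)
    return rows
```

Printing these rows next to the dmidecode locators gives a side-by-side view, but pairing the two lists by position is only a guess that still has to be confirmed on the hardware.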

* Re: dimm mapping
From: Borislav Petkov @ 2020-12-16 17:15 UTC
  To: Michael Di Domenico; +Cc: linux-edac

On Wed, Dec 16, 2020 at 12:05:25PM -0500, Michael Di Domenico wrote:
> The problem I'm trying to solve is that when I run edac-util -v, I
> don't get the DIMM labels. So when an MCE trips, it's a chore to figure out
> which DIMM it actually is when looking at the motherboard. Given that
> dmidecode seems to report the DIMM labels, it seems odd that EDAC
> doesn't use them. But then again, I don't understand how all that's tied
> together (if at all).

Yeah, the short version is: there's no properly defined way for software
to read out DIMM silkscreen labels on every platform. I highly doubt that
is even possible. Perhaps via some SMBus interfaces or whatnot, but firmware
is notoriously buggy, so there's no reliability there.

And, as said before, in some cases one cannot map back the physical
address reported with a DIMM MCE to the actual DIMM.

And, in recent times, OEMs do more and more RAS in the firmware,
so the kernel doesn't even get to see some errors.

I'm always hoping that I'll be corrected some day but until then that's
the current situation, roughly.

* Re: dimm mapping
From: Michael Di Domenico @ 2020-12-16 17:31 UTC
  To: Borislav Petkov; +Cc: linux-edac

That's a bummer. :(

I guess the best way to determine which DIMM to replace is still
the IPMI SEL logs, and whether or not one can actually decode
the error in the SEL log correctly. Intel fortunately puts out a
decode manual that will tell you, but it's a pain to deal with since
it's all bit math. Other motherboard vendors can't be bothered.
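The generic part of that bit math can at least be scripted. A minimal sketch, assuming the 16-byte IPMI v2.0 system event record layout (record ID, record type, timestamp, generator ID, EvM revision, sensor type, sensor number, event direction/type, then Event Data 1 to 3). Which bits of Event Data 2/3 identify the socket, channel, or DIMM is vendor-specific, which is exactly what the Intel decode manual covers, so it is not reproduced here; the field split in the usage below is purely hypothetical. `SelEvent`, `parse_sel_record`, and `bits` are illustrative names, and getting the raw bytes out of the BMC is not shown.

```python
from dataclasses import dataclass
from typing import Tuple

MEMORY_SENSOR_TYPE = 0x0C  # IPMI sensor type code for "Memory"


def bits(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value, shifted down to bit 0."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


@dataclass
class SelEvent:
    record_id: int
    record_type: int
    timestamp: int
    generator_id: int
    evm_rev: int
    sensor_type: int
    sensor_number: int
    event_dir: int                       # 0 = assertion, 1 = deassertion
    event_type: int                      # event/reading type code
    event_data: Tuple[int, int, int]     # Event Data 1, 2, 3

    @property
    def is_memory_event(self) -> bool:
        return self.sensor_type == MEMORY_SENSOR_TYPE

    @property
    def offset(self) -> int:
        # The low nibble of Event Data 1 selects the event offset within the sensor type.
        return bits(self.event_data[0], 3, 0)


def parse_sel_record(raw: bytes) -> SelEvent:
    """Split a raw 16-byte SEL system event record into its standard fields."""
    if len(raw) != 16:
        raise ValueError("SEL records are 16 bytes long")
    return SelEvent(
        record_id=int.from_bytes(raw[0:2], "little"),
        record_type=raw[2],
        timestamp=int.from_bytes(raw[3:7], "little"),
        generator_id=int.from_bytes(raw[7:9], "little"),
        evm_rev=raw[9],
        sensor_type=raw[10],
        sensor_number=raw[11],
        event_dir=bits(raw[12], 7, 7),
        event_type=bits(raw[12], 6, 0),
        event_data=(raw[13], raw[14], raw[15]),
    )
```

For example, with a made-up record whose Event Data 3 is `0x12`, a hypothetical layout that put a channel number in bits 7..4 and a slot number in bits 3..0 would be decoded as `bits(0x12, 7, 4)` and `bits(0x12, 3, 0)`; the real positions have to come from the vendor's documentation.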