From: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
To: Borislav Petkov <bp@alien8.de>, Jeff God <jfgaudreault@gmail.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Re: [GIT PULL] EDAC pile for 5.4 -> AMD family 17h, model 70h support
Date: Wed, 9 Oct 2019 20:31:26 +0000
Message-ID: <724d6f97-61f2-94bd-3f4b-793a55b6ac15@amd.com>
In-Reply-To: <20191009103041.GC10395@zn.tnic>

On 10/9/2019 6:30 AM, Borislav Petkov wrote:
> On Tue, Oct 08, 2019 at 07:08:20PM -0400, Jeff God wrote:
>> I also wanted to apologise for the text emails line wrapping, I
>> haven't found a viable email client alternative...
> 
> https://www.kernel.org/doc/html/latest/process/email-clients.html
> 
>> I did not see anything in dmesg, and all status files remained 0
>> (except flag which was hw)
> 
> Nothing here either but my machine is
> 
> vendor_id       : AuthenticAMD
> cpu family      : 23
> model           : 1
> model name      : AMD EPYC 7251 8-Core Processor
> stepping        : 2
> 
> so I'm guessing it needs something else for injection to work on those
> models...
> 

Ah yes, sorry, I forgot to mention that you will need to disable Platform
First Error Handling (PFEH). This can be done in the BIOS setup. It's usually
under something like:

AMD CBS -> "Core" Common Options -> Platform First Error Handling

When enabled, this feature prevents writes to the MCA registers, which would
explain why the injected values never showed up.

Please let me know whether this works for you. If it doesn't, I'll need to do
some more debugging.
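
With PFEH disabled, a retry through the mce-inject debugfs files could look
roughly like the sketch below. This is only an illustration: the helper names
(injection_plan, apply_plan) are made up for this mail, the debugfs path is
the usual mount point but may differ on your system, and the ordering assumes
that the write to "bank" is what triggers the injection, as the module's
README describes. The status and address values are the ones from the log
quoted further down, and bank 17 is the channel 0 memory controller bank.

```python
from pathlib import Path

# Assumed location of the mce-inject control files (debugfs must be mounted
# and the mce-inject module loaded). Adjust if your setup differs.
MCE_INJECT_DIR = Path("/sys/kernel/debug/mce-inject")


def injection_plan(cpu, bank, status, addr, flags="hw"):
    """Return the ordered (file, value) writes for one injection attempt.

    Hypothetical helper: it only builds the list, it does not touch sysfs.
    """
    return [
        ("flags", flags),          # "hw" writes the real MCA registers
        ("cpu", str(cpu)),         # CPU that manages the target bank
        ("status", hex(status)),   # MCi_STATUS value to inject
        ("addr", hex(addr)),       # MCi_ADDR value to inject
        ("bank", str(bank)),       # written last: assumed to trigger injection
    ]


def apply_plan(plan, root=MCE_INJECT_DIR, dry_run=True):
    """Render the plan as shell-style lines; perform the writes if asked."""
    lines = []
    for name, value in plan:
        target = Path(root) / name
        lines.append(f"echo {value} > {target}")
        if not dry_run:
            # Needs root; only do this on a machine you can afford to crash.
            target.write_text(value + "\n")
    return lines
```

Calling apply_plan(injection_plan(cpu=0, bank=17, status=0x9c2041000000011b,
addr=0x6d3d483b)) with the default dry_run=True only returns the commands it
would run, so the sequence can be reviewed before anything is written.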

>> The memory controller banks are 17 (channel 0) and 18 (channel 1) on Family
>> 17h Model 7Xh, and these are managed by CPU 0.
> 
> Btw, Yazen, we probably need to have an easy way to find out
> how the bank mapping is now on SMCA machine when wanting to do
> injection. I know we talked about having some of that info in
> /sys/devices/system/machinecheck/machinecheckX...
> 

Yep, I agree. I have some ideas, and I'll send them as RFC patches.

>> And even this:
>> [  609.681714] mce: [Hardware Error]: Machine check events logged
>> [  609.681716] [Hardware Error]: Corrected error, no action required.
>> [  609.681720] [Hardware Error]: CPU:0 (17:71:0) MC17_STATUS[-|CE|MiscV|AddrV|-|SyndV|CECC|-|-|Scrub]: 0x9c2041000000011b
>> [  609.681723] [Hardware Error]: Error Addr: 0x000000006d3d483b
>> [  609.681724] [Hardware Error]: IPID: 0x0000000000000000, Syndrome: 0x0000000000000000
>> [  609.681726] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
>> [  609.681743] ------------[ cut here ]------------
>> [  609.681748] WARNING: CPU: 4 PID: 2447 at drivers/edac/edac_mc.c:1238 edac_mc_handle_error+0x5a6/0x6d0
> 
> You can ignore that for now. That's a sanity-check for a driver supplying a 0
> for grain.
> 

I've seen this too, and I'm looking into it. I'm doing some research to find
the correct (or at least sane) value for current and legacy systems.
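
As a cross-check on the quoted log, the MC17_STATUS value can be decoded by
hand. The sketch below is a minimal illustration, not the kernel's decoder:
decode_status and the STATUS_FLAGS table are names made up here, and the bit
positions are the MCA status bits as I understand them from the kernel's
MCI_STATUS_* definitions (extended error code taken from bits 21:16).

```python
# Bit position -> flag name for the MCA status register (assumed layout,
# following the kernel's MCI_STATUS_* definitions).
STATUS_FLAGS = {
    63: "Val",
    62: "Over",
    61: "UC",
    60: "En",
    59: "MiscV",
    58: "AddrV",
    57: "PCC",
    53: "SyndV",
    46: "CECC",
    45: "UECC",
    44: "Deferred",
    43: "Poison",
    40: "Scrub",
}


def decode_status(status):
    """Split an MCA status value into set flags and error code fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "xec": (status >> 16) & 0x3F,    # extended error code, bits 21:16
        "error_code": status & 0xFFFF,   # base error code, bits 15:0
    }


decoded = decode_status(0x9C2041000000011B)
print(decoded["flags"])            # ['Val', 'En', 'MiscV', 'AddrV', 'SyndV', 'CECC', 'Scrub']
print(decoded["xec"])              # 0
print(hex(decoded["error_code"]))  # 0x11b
```

The set flags line up with the MiscV, AddrV, SyndV, CECC and Scrub fields that
the kernel printed, and the extended error code of 0 matches the "Ext. Error
Code: 0" line.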

Thanks,
Yazen

