linux-edac.vger.kernel.org archive mirror
From: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
To: Jean-Frederic <jfgaudreault@gmail.com>, Borislav Petkov <bp@alien8.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: RE: [GIT PULL] EDAC pile for 5.4 -> AMD family 17h, model 70h support
Date: Mon, 21 Oct 2019 14:24:04 +0000
Message-ID: <SN6PR12MB26390DC298E06A5443699D02F8690@SN6PR12MB2639.namprd12.prod.outlook.com>
In-Reply-To: <fddfb084-69a0-a913-f750-ef0a7830dd1e@gmail.com>

> -----Original Message-----
> From: Jean-Frederic <jfgaudreault@gmail.com>
> Sent: Saturday, October 19, 2019 12:13 PM
> To: Borislav Petkov <bp@alien8.de>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-edac@vger.kernel.org
> Subject: Re: [GIT PULL] EDAC pile for 5.4 -> AMD family 17h, model 70h support
> 
> On 2019-10-19 4:25 a.m., Borislav Petkov wrote:
> > Look here on page 6:
> > https://www.amd.com/system/files/2017-06/AMD-EPYC-Brings-New-RAS-Capability.pdf
> >
> > It hints at what PFEH does.
> >  [ I believe if the error cannot be handled by the firmware, it gets
> >    reported to the OS but I'll let Yazen comment on that. ]
> 
> Yes, I found that document too after I sent my email yesterday, and I kind of
> had a similar understanding...
> 

Yes, that's right. And even if the firmware handles the error it may still
report to the OS. That's really a policy decision and it may vary between
vendors.

> > I know, I know, we don't trust the firmware to do it right and so on,
> > but it is what it is. Like other stuff we have to rely on the firmware
> > to do right.
> 
> To be honest, I think we would all like to trust the firmware if it were clear
> what it is doing.
> However, the way these consumer products (the motherboards, I mean) are sold and
> documented, especially for AMD Ryzen and ECC support, leaves almost no information
> (just a vague statement that it "supports ECC"...).
> 
> The more I read about PFEH and RAS, the better the concept seems, but mostly for
> enterprise solutions. I guess it would be good for a consumer product too, if we
> knew we could rely on it.
> 
> As it stands right now, I don't really know if I can trust it. When I did my own tests
> generating real errors, the system was either totally stable, would not boot, or
> would crash suddenly. I could see that ECC really corrects things, because otherwise
> I would get software self-check errors in mprime under those conditions fairly quickly
> (after 1-2 minutes), but with ECC enabled I can run for hours without any sign of issue
> under the same conditions.
> 
> So can I rely on this to tell me one day that I am starting to have hardware issues and
> should replace my memory (or system)? I don't even know how the firmware will report
> anything to me. There is nothing in the BIOS that seems to give any report about ECC,
> 

Generally, the firmware will report the error up to the OS and the OS will
report to the user. So you should find the error reported through EDAC, etc.
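As a rough illustration of where such reports end up on Linux: EDAC memory-controller
drivers (e.g. amd64_edac) expose per-controller error counters in sysfs. The sketch
below is not an official tool; it assumes the standard layout
/sys/devices/system/edac/mc/mcN/ce_count and ue_count and simply reads those counters.

```python
import glob
import os

# Assumed standard EDAC sysfs location for memory-controller counters.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller_name: (ce_count, ue_count)} for each EDAC memory controller."""
    counts = {}
    # Only match controller directories (mc0, mc1, ...), not other sysfs entries.
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        name = os.path.basename(mc_dir)
        try:
            # ce_count: corrected errors; ue_count: uncorrected errors.
            with open(os.path.join(mc_dir, "ce_count")) as f:
                ce = int(f.read().strip())
            with open(os.path.join(mc_dir, "ue_count")) as f:
                ue = int(f.read().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (is an EDAC driver loaded?)")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A steadily rising corrected-error count on one controller is the kind of early signal
that could suggest failing memory; how and whether firmware-handled errors show up here
depends on the platform's reporting policy.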

Thanks,
Yazen


Thread overview:
     [not found] <CAEVokG7TeAbmkhaxiTpsxhv1pQzqRpU=mR8gVjixb5kXo3s2Eg@mail.gmail.com>
     [not found] ` <20190924092644.GC19317@zn.tnic>
2019-10-05 16:52   ` [GIT PULL] EDAC pile for 5.4 -> AMD family 17h, model 70h support Jeff God
2019-10-07  7:16     ` Borislav Petkov
2019-10-07 12:58       ` Jeff God
2019-10-08 11:50         ` Borislav Petkov
2019-10-08 19:42           ` Ghannam, Yazen
2019-10-08 23:08             ` Jeff God
2019-10-09 10:30               ` Borislav Petkov
2019-10-09 20:31                 ` Ghannam, Yazen
2019-10-09 23:54                   ` Jeff God
2019-10-10  9:56                     ` Borislav Petkov
2019-10-10 12:48                       ` Jean-Frederic
2019-10-10 13:41                         ` Borislav Petkov
2019-10-10 19:00                           ` Ghannam, Yazen
2019-10-11  1:04                             ` Jean-Frederic
2019-10-18 23:08                               ` Jean-Frederic
2019-10-19  8:25                                 ` Borislav Petkov
2019-10-19 16:12                                   ` Jean-Frederic
2019-10-21 14:24                                     ` Ghannam, Yazen [this message]
2020-01-04 20:03                                     ` Jean-Frederic
2020-01-04 21:47                                       ` Jean-Frederic
2019-10-10  9:54                   ` Borislav Petkov
