linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	tony.luck@intel.com, x86@kernel.org,
	Smita.KoralahalliChannabasappa@amd.com
Subject: Re: [PATCH v2 8/8] x86/MCE/AMD Support new memory interleaving modes during address translation
Date: Fri, 25 Sep 2020 14:51:27 -0500	[thread overview]
Message-ID: <20200925195127.GA323455@yaz-nikka.amd.com> (raw)
In-Reply-To: <20200925072231.GC16872@zn.tnic>

On Fri, Sep 25, 2020 at 09:22:31AM +0200, Borislav Petkov wrote:
> On Wed, Sep 23, 2020 at 11:25:10AM -0500, Yazen Ghannam wrote:
> > I don't remember the original reason, and I was recently asked about
> > this code living in a module. I did some looking after this ask, and I
> > found that we should be using this translation to get a proper value for
> > the memory error notifiers to use. So I think we still need to use this
> > function some way with the core code even if the EDAC interface isn't
> > used.
> 
> You'd need to be more specific here, you want to bypass amd64_edac to
> decode errors? Judging by the current RAS activity coming from you guys,
> I'm thinking firmware. But then wouldn't the firmware do the decoding
> for us and then this function is not even needed?
>

The UC, NFIT, and CEC notifiers all operate on system physical
addresses. The address in the MCE record is checked by
mce_usable_address() to see if it can be used by the kernel, i.e. the
address is a system physical address. Right now, this check passes on
AMD systems if MCA_STATUS[AddrV] is set. This works for memory errors on
legacy AMD systems, since the NB MCA bank logs a physical address for
DRAM ECC errors. But this won't work on newer systems, because the UMC
MCA bank does not log a system physical address for DRAM ECC errors. So
the address provided by the hardware will need to be translated to a
physical address before the notifiers in the MCE chain can use it.

We can add support to get the physical address from firmware in some
cases. But it looks to me that we'll still need to keep updating the
translation code in the kernel to cover some platform/user
configurations. So it makes sense to me to move the functionality into a
module to make it easier to update.

The address translation needs to be done before the notfiers that need
it, and EDAC comes after all of them. There's also the case where the
EDAC interface isn't wanted, so amd64_edac will be unloaded. But the
functionality in the other notifiers are still expected to be available.
So it's more than just decoding the error like we do now with amd64_edac.
That's why I think the translation code can be in a separate module with
a notfier that runs before the others. This can do the translation once
then pass the result down to the CEC, UC, NFIT, and EDAC notifiers to
use as needed.

Thanks,
Yazen

  reply	other threads:[~2020-09-25 20:15 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-03 20:01 [PATCH v2 0/8] AMD MCA Address Translation Updates Yazen Ghannam
2020-09-03 20:01 ` [PATCH v2 1/8] x86/CPU/AMD: Save NodeId on AMD-based systems Yazen Ghannam
2020-09-09 18:06   ` Borislav Petkov
2020-09-09 20:17     ` Yazen Ghannam
2020-09-10 10:14       ` Borislav Petkov
2020-09-14 19:20         ` Yazen Ghannam
2020-09-15  8:35           ` Borislav Petkov
2020-09-16 19:51             ` Yazen Ghannam
2020-09-17 10:37               ` Borislav Petkov
2020-09-17 16:20                 ` Yazen Ghannam
2020-09-17 16:40                   ` Borislav Petkov
2020-09-17 19:44                     ` Yazen Ghannam
2020-09-17 20:10                       ` Borislav Petkov
2020-09-03 20:01 ` [PATCH v2 2/8] x86/CPU/AMD: Remove amd_get_nb_id() Yazen Ghannam
2020-09-03 20:01 ` [PATCH v2 3/8] EDAC/mce_amd: Use struct cpuinfo_x86.node_id for NodeId Yazen Ghannam
2020-09-03 20:01 ` [PATCH v2 4/8] x86/MCE/AMD: Use defines for register addresses in translation code Yazen Ghannam
2020-09-03 20:01 ` [PATCH v2 5/8] x86/MCE/AMD: Use macros to get bitfields " Yazen Ghannam
2020-09-21 13:58   ` Borislav Petkov
2020-09-03 20:01 ` [PATCH v2 6/8] x86/MCE/AMD: Drop tmp variable " Yazen Ghannam
2020-09-23  8:05   ` Borislav Petkov
2020-09-23 16:05     ` Yazen Ghannam
2020-09-03 20:01 ` [PATCH v2 7/8] x86/MCE/AMD: Group register reads " Yazen Ghannam
2020-09-03 20:01 ` [PATCH v2 8/8] x86/MCE/AMD Support new memory interleaving modes during address translation Yazen Ghannam
2020-09-23  8:20   ` Borislav Petkov
2020-09-23 16:25     ` Yazen Ghannam
2020-09-25  7:22       ` Borislav Petkov
2020-09-25 19:51         ` Yazen Ghannam [this message]
2020-09-28  9:47           ` Borislav Petkov
2020-09-28 15:53             ` Yazen Ghannam
2020-09-28 18:14               ` Borislav Petkov
2020-09-29 13:21                 ` Yazen Ghannam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200925195127.GA323455@yaz-nikka.amd.com \
    --to=yazen.ghannam@amd.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=bp@alien8.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).