From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
tony.luck@intel.com, x86@kernel.org,
Smita.KoralahalliChannabasappa@amd.com, mukul.joshi@amd.com,
alexander.deucher@amd.com, william.roche@oracle.com
Subject: Re: [PATCH 1/3] x86/MCE/AMD: Provide an "Unknown" MCA bank type
Date: Tue, 7 Dec 2021 16:28:42 +0000
Message-ID: <Ya+LukojuewlomeF@yaz-ubuntu>
In-Reply-To: <YaqXiVjNLINxwz8G@zn.tnic>

On Fri, Dec 03, 2021 at 11:17:45PM +0100, Borislav Petkov wrote:
> On Fri, Dec 03, 2021 at 02:00:15AM +0000, Yazen Ghannam wrote:
> > The AMD MCA Thresholding sysfs interface populates directories for each
> > bank and thresholding block. The name used for each directory is looked
> > up in a table of known bank types. However, new bank types won't match
> > in this list and will return NULL for the name. This will cause the
> > machinecheck sysfs interface to fail to be populated.
> >
> > Set new and unknown MCA bank types to the "unknown" type. Also,
> > ensure that the bank's thresholding block directories have unique names.
> > This will ensure that the machinecheck sysfs interface can be
> > initialized.
>
> What is the advantage of having a sysfs directory structure headed with
> an "unknown" entry vs not having that structure at all when the kernel
> runs on a machine for which it has not been enabled yet?
>
> IOW, if those new banks would need additional enablement, what's the
> point of having "unknown" on older kernels which do not have any
> functionality?
>
> IOW, how does this:
>
> /sys/devices/system/machinecheck/machinecheck0/unknown/unknown/
> ├── error_count
> ├── interrupt_enable
> └── threshold_limit
>
> help a user?

Yeah, I see your point.

>
> Btw, looking at the current layout:
>
> ...
> ├── insn_fetch
> │   └── insn_fetch
> │       ├── error_count
> │       ├── interrupt_enable
> │       └── threshold_limit
> ├── l2_cache
> │   └── l2_cache
> │       ├── error_count
> │       ├── interrupt_enable
> │       └── threshold_limit
> ...
>
> we have those names repeated which looks wonky and useless too. I'd
> expect them to be:
>
> ...
> ├── insn_fetch
> │   ├── error_count
> │   ├── interrupt_enable
> │   └── threshold_limit
> ├── l2_cache
> │   ├── error_count
> │   ├── interrupt_enable
> │   └── threshold_limit
> ...
>
> Can we fix that too pls?
>

Sure thing. But I don't think removing the second directory will be okay. The
layout is "bank"/"block". If the "block" has a special use, like DRAM ECC or L3
Cache on older systems, then it'll have a unique name. Otherwise, the block
will take the name of the bank.
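
A minimal userspace C model of the naming rule above may make the
duplication easier to see. block_dir_name(), format_block_path() and
path_repeats_name() are hypothetical helpers written for illustration, not
the kernel's code, and a NULL bank name stands in for a bank type that is
missing from the lookup table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of the current "bank"/"block" naming rule: a block
 * with a special use (e.g. DRAM ECC) has its own name, and any other
 * block reuses the bank's name. Not the kernel's implementation.
 */
static const char *block_dir_name(const char *bank_name,
				  const char *special_block_name)
{
	if (special_block_name != NULL)
		return special_block_name;	/* unique, special-use block */

	return bank_name;			/* inherits the bank's name */
}

/*
 * Build "<bank>/<block>" into buf. Returns 0 on success, -1 if the bank
 * name is unknown (NULL) or the result does not fit: with no name there
 * is nothing to call the directory.
 */
static int format_block_path(char *buf, size_t len, const char *bank_name,
			     const char *special_block_name)
{
	int n;

	assert(buf != NULL && len > 0);

	if (bank_name == NULL)
		return -1;

	n = snprintf(buf, len, "%s/%s", bank_name,
		     block_dir_name(bank_name, special_block_name));
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Nonzero when the block directory merely repeats the bank name. */
static int path_repeats_name(const char *bank_name,
			     const char *special_block_name)
{
	return bank_name != NULL &&
	       strcmp(bank_name,
		      block_dir_name(bank_name, special_block_name)) == 0;
}
```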

I think the more robust solution is to drop the unique names and use generic
names like "bank"/"block". A new file called "type" can be introduced into the
directory structure, and it can return the name of the bank/block. For new
bank types, "type" will read "<null>", but the directory structure should
remain the same and functional.
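
A rough sketch of the generic scheme, assuming a hypothetical table of known
type names: the directory name is derived only from the index, so it can
always be created, while the "type" file falls back to "<null>" for anything
the table doesn't know. These are userspace stand-ins; in the kernel this
would be a sysfs show() callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of known type names, indexed by type. */
static const char *const known_type_names[] = {
	"Instruction Fetch",
	"Northbridge",
};

#define N_KNOWN_TYPES (sizeof(known_type_names) / sizeof(known_type_names[0]))

/*
 * Generic directory name such as "bank0" or "block1". No lookup is
 * involved, so a new, unknown type never prevents the directory from
 * being named. Returns 0 on success, -1 if buf is too small.
 */
static int generic_dir_name(char *buf, size_t len, const char *prefix,
			    unsigned int index)
{
	int n;

	assert(buf != NULL && prefix != NULL);
	n = snprintf(buf, len, "%s%u", prefix, index);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Contents of the proposed "type" file: the known name, or "<null>" for
 * a type missing from the table. Returns the byte count, or -1 if buf
 * is too small.
 */
static int type_show(char *buf, size_t len, unsigned int type)
{
	const char *name = type < N_KNOWN_TYPES ? known_type_names[type] : NULL;
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "%s\n", name != NULL ? name : "<null>");
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```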

I've seen this in other sysfs interfaces like cpuidle, e.g.
/sys/devices/system/cpu/cpu0/cpuidle/stateX

The "blockX/type" file is like the "stateX/desc" file. Or the "type" file can
be called "desc", since it's a description of what the bank or block
represents.
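
On the consumer side, a tool could read a one-line description file the same
way for a cpuidle stateX/desc or for the proposed bank/block description. A
minimal sketch; read_desc() is a hypothetical helper, and the caller chooses
the directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the one-line "desc" file inside dir (for example a cpuidle
 * stateX directory) into buf and strip the trailing newline.
 * Returns 0 on success, -1 on any error.
 */
static int read_desc(const char *dir, char *buf, size_t len)
{
	char path[4096];
	FILE *f;
	int n;

	assert(dir != NULL && buf != NULL);
	if (len < 2)
		return -1;

	n = snprintf(path, sizeof(path), "%s/desc", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	if (fgets(buf, (int)len, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* sysfs values end in '\n' */
	return 0;
}
```

For example, read_desc("/sys/devices/system/cpu/cpu0/cpuidle/state0", buf,
sizeof(buf)) would fetch the description of the first idle state on a system
that exposes cpuidle.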

Here are a couple of examples:

/sys/devices/system/machinecheck/machinecheck0/
├── th_bank0
│   ├── type ("Instruction Fetch")
│   └── th_block0
│       ├── type ("All Errors")
│       ├── error_count
│       ├── interrupt_enable
│       └── threshold_limit
├── th_bank1
│   ├── type ("Northbridge")
│   ├── th_block0
│   │   ├── type ("DRAM Errors")
│   │   ├── error_count
│   │   ├── interrupt_enable
│   │   └── threshold_limit
│   └── th_block1
│       ├── type ("Link Errors")
│       ├── error_count
│       ├── interrupt_enable
│       └── threshold_limit
...

OR

/sys/devices/system/machinecheck/machinecheck0/thresholding
├── bank0
│   ├── desc ("Instruction Fetch")
│   └── block0
│       ├── desc ("All Errors")
│       ├── error_count
│       ├── interrupt_enable
│       └── threshold_limit
├── bank1
│   ├── desc ("Northbridge")
│   ├── block0
│   │   ├── desc ("DRAM Errors")
│   │   ├── error_count
│   │   ├── interrupt_enable
│   │   └── threshold_limit
│   └── block1
│       ├── desc ("Link Errors")
│       ├── error_count
│       ├── interrupt_enable
│       └── threshold_limit
...
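
With either layout, attribute paths depend only on indices, so a tool can
locate a block's counters without knowing what the bank is called. A minimal
sketch that builds paths for both proposals; the enum, th_attr_path() and the
MC_ROOT macro are hypothetical, and both layouts are only proposals in this
thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The two layouts proposed above (hypothetical names). */
enum th_layout {
	TH_LAYOUT_PREFIXED,	/* machinecheckN/th_bankX/th_blockY */
	TH_LAYOUT_SUBDIR,	/* machinecheckN/thresholding/bankX/blockY */
};

#define MC_ROOT "/sys/devices/system/machinecheck/machinecheck"

/*
 * Build the path of attr for (cpu, bank, block). Pass block < 0 for a
 * bank-level attribute such as "type" or "desc". Returns 0 on success,
 * -1 if the path does not fit in buf.
 */
static int th_attr_path(char *buf, size_t len, enum th_layout layout,
			unsigned int cpu, unsigned int bank, int block,
			const char *attr)
{
	const char *bank_pfx = layout == TH_LAYOUT_PREFIXED ?
			       "th_bank" : "thresholding/bank";
	const char *block_pfx = layout == TH_LAYOUT_PREFIXED ?
				"th_block" : "block";
	int n;

	assert(buf != NULL && attr != NULL);

	if (block < 0)
		n = snprintf(buf, len, MC_ROOT "%u/%s%u/%s",
			     cpu, bank_pfx, bank, attr);
	else
		n = snprintf(buf, len, MC_ROOT "%u/%s%u/%s%d/%s",
			     cpu, bank_pfx, bank, block_pfx, block, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Switching between the two only changes the bank and block prefixes.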

I'm inclined toward the second option, since it keeps all the thresholding
functionality under a single directory.

What do you think?

Thanks,
Yazen

Thread overview: 7+ messages
2021-12-03 2:00 [PATCH 0/3] AMD SMCA Updates Yazen Ghannam
2021-12-03 2:00 ` [PATCH 1/3] x86/MCE/AMD: Provide an "Unknown" MCA bank type Yazen Ghannam
2021-12-03 22:17 ` Borislav Petkov
2021-12-07 16:28 ` Yazen Ghannam [this message]
2021-12-11 15:39 ` Borislav Petkov
2021-12-03 2:00 ` [PATCH 2/3] x86/MCE/AMD, EDAC/mce_amd: Add new SMCA Bank Types Yazen Ghannam
2021-12-03 2:00 ` [PATCH 3/3] x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration Yazen Ghannam