From: Tracy Smith <tlsmith3777@gmail.com>
To: york.sun@nxp.com
Cc: steve@theinkpens.com, bp@alien8.de, linux-edac@vger.kernel.org,
backports@vger.kernel.org, linux-newbie@vger.kernel.org,
util-linux@vger.kernel.org, linux-mmc@vger.kernel.org
Subject: Re: edac driver initialization, interrupt, & debug
Date: Wed, 21 Nov 2018 16:02:55 -0600
Message-ID: <CAChUvXMWp1Pou9koFP7QMz4CxJ-OrH2qiGTn6nrYZsDyHdoeyA@mail.gmail.com>
In-Reply-To: <CAChUvXPCfwfHrntJHWpsydYZE=P692Axd0pFE+GjZCXtx1fgog@mail.gmail.com>
Please ignore the first question. I now see the expected EDAC message
in the kernel log:

EDAC MC0: 1 CE fsl_mc_err on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x5df1f offset:0xe40 grain:8 syndrome:0xe0e0)
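
For anyone who wants to pull these fields out of the kernel log
programmatically, here is a minimal Python sketch. The regular
expression and the function name parse_edac_line are my own and only
cover the message format shown above; the kernel does not guarantee this
format. Rebuilding a physical address from page and offset assumes
4 KiB pages (PAGE_SHIFT = 12), which may not hold for every arm64
configuration.

```python
import re

PAGE_SHIFT = 12  # assumption: 4 KiB pages; adjust for 16K/64K page kernels

# Assumed format, taken from the single log line quoted above.
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<type>CE|UE) (?P<msg>.*?) "
    r"on (?P<label>\S+) "
    r"\(csrow:(?P<csrow>-?\d+) channel:(?P<channel>-?\d+) "
    r"page:(?P<page>0x[0-9a-fA-F]+) offset:(?P<offset>0x[0-9a-fA-F]+) "
    r"grain:(?P<grain>\d+) syndrome:(?P<syndrome>0x[0-9a-fA-F]+)"
)


def parse_edac_line(line):
    """Return a dict of fields from one EDAC report, or None if no match."""
    # Collapse whitespace so a report wrapped by a mail client still matches.
    match = EDAC_RE.search(" ".join(line.split()))
    if match is None:
        return None
    rec = match.groupdict()
    for key in ("mc", "count", "csrow", "channel", "grain"):
        rec[key] = int(rec[key])
    for key in ("page", "offset", "syndrome"):
        rec[key] = int(rec[key], 16)
    # Physical address of the error under the page-size assumption above.
    rec["phys_addr"] = (rec["page"] << PAGE_SHIFT) | rec["offset"]
    return rec
```

Feeding it the line above gives type "CE", page 0x5df1f, offset 0xe40,
and a reconstructed address of 0x5df1fe40 under the 4 KiB assumption.
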

1) Is there anything similar to edac-utils for ARM rather than x86?
Does sysfs replace edac-utils, or is there something else for ARM?
2) What is currently used for collecting and reporting ECC errors for
ARM/EDAC beyond the kernel log and messages?
https://github.com/grondo/edac-utils
3) How is RAS/rasdaemon reporting integrated into EDAC for error
collection and reporting?
4) Has there been a patch to prevent the EDAC sysfs API from reporting
bogus values?
See http://lkml.iu.edu/hypermail/linux/kernel/1205.3/02249.html
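
While these questions are open, the per-controller counters can be read
straight from sysfs. A minimal sketch, assuming the usual EDAC layout
under /sys/devices/system/edac/mc with ce_count, ue_count,
ce_noinfo_count and ue_noinfo_count files per controller; the function
name read_mc_counters is hypothetical, and I have not checked which of
these files the layerscape driver populates.

```python
from pathlib import Path

# Assumed location of the EDAC memory-controller sysfs tree.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")

COUNTER_FILES = ("ce_count", "ue_count", "ce_noinfo_count", "ue_noinfo_count")


def read_mc_counters(root=EDAC_MC_ROOT):
    """Return {"mc0": {"ce_count": int or None, ...}, ...} for each controller."""
    counters = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in COUNTER_FILES:
            try:
                entry[name] = int((mc_dir / name).read_text().strip())
            except (OSError, ValueError):
                # File missing or unreadable on this platform: record as unknown.
                entry[name] = None
        counters[mc_dir.name] = entry
    return counters
```

Calling read_mc_counters() with no arguments walks the live sysfs tree;
passing another directory makes it easy to test against a saved copy.
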
On Wed, Nov 21, 2018 at 11:01 AM Tracy Smith <tlsmith3777@gmail.com> wrote:
>
> The EDAC driver failing to probe turned out to be a device tree issue,
> as Steve suspected. Thanks to both Steve and York, this has been
> resolved and the backport is now logging ECC errors after injection. I
> added the ddr qoriq-memory-controller entry since we used a different
> .dtsi file.
>
> arch/arm64/boot/dts/freescale/...ls1043a.dtsi
>
> ddr: memory-controller@1080000 {
>         compatible = "fsl,qoriq-memory-controller";
>         reg = <0x0 0x1080000 0x0 0x1000>;
>         interrupts = <0 144 0x4>;
>         big-endian;
> };
>
> I now need to collect and report CE and UE ECC errors and extend the
> existing logging and reporting function that I currently see. After
> reviewing the following document, the system logging appears different
> from that given in the kernel EDAC document. I need the level of
> granularity described in the edac.txt file.
>
> https://www.mjmwired.net/kernel/Documentation/edac.txt#173 (the same
> text as kernel/Documentation/edac.txt)
>
> 1) Can I gather the system logging described below in the edac.txt
> file for layerscape?
>
> 2) Is there anything similar to the edac-utils but for ARM, or does
> sysfs replace the edac-utils, or something else?
>
> 3) What is currently used for collecting and reporting ECC errors for
> ARM/EDAC beyond the kernel log and messages?
> https://github.com/grondo/edac-utils
>
> 4) How is RAS reporting integrated into EDAC for error collection and reporting?
>
> 5) Has there been a patch to prevent EDAC sysfs API from reporting bogus values?
> See http://lkml.iu.edu/hypermail/linux/kernel/1205.3/02249.html
>
> - The EDAC sysfs API will still report bogus values. So, userspace
> tools like edac-utils will still use the bogus data;
>
> - Add a new tracepoint-based way to get the binary information about
> the errors.
>
> This is the logging I currently see with layerscape EDAC. I need
> something that explains these fields.
>
> [ 407.612311] EDAC FSL_DDR MC0: Err Detect Register: 0x80000004
> [ 407.618182] EDAC FSL_DDR MC0: Faulty Data bit: 0
> [ 407.622793] EDAC FSL_DDR MC0: Expected Data / ECC: 0x40c50901_40c50900 / 0x800000f0
> [ 407.630443] EDAC FSL_DDR MC0: Captured Data / ECC: 0x40c50900_40c50901 / 0xf0
> [ 407.637571] EDAC FSL_DDR MC0: Err addr: 0x3e0bfff50
> [ 407.642440] EDAC FSL_DDR MC0: PFN: 0x003e0bff
>
> This is the level of detail I need:
>
> SYSTEM LOGGING
> --------------
>
> If logging for UEs and CEs is enabled, then system logs will contain
> information indicating that errors have been detected:
>
> EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
> channel 1 "DIMM_B1": amd76x_edac
>
> EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
> channel 1 "DIMM_B1": amd76x_edac
>
> The structure of the message is:
> the memory controller (MC0)
> Error type (CE)
> memory page (0x283)
> offset in the page (0xce0)
> the byte granularity (grain 8)
> or resolution of the error
> the error syndrome (0x6ec3)
> memory row (row 0)
> memory channel (channel 1)
> DIMM label, if set prior (DIMM_B1)
> and then an optional, driver-specific message that may
> have additional information.
>
> Both UEs and CEs with no info will lack all but memory controller, error
> type, a notice of "no info" and then an optional, driver-specific error
> message.
>
> On Mon, Nov 19, 2018 at 10:48 AM York Sun <york.sun@nxp.com> wrote:
> >
> > On 11/19/18 8:38 AM, Tracy Smith wrote:
> > > Steve, you were correct, there wasn't a device tree entry for the
> > > qoriq memory controller in
> > > arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi. I added it making it
> > > identical to the fsl-ls1046a.dtsi, which should have the same memory
> > > controller and entry as the ls1043a. I added this but it didn't make
> > > a difference as far as being able to call the probe function. I'm now
> > > checking the mpc85xx_edac.c dtsi entry for comparison since York used
> > > the mpc85xx as the basis for the layerscape, but there is something
> > > else missing preventing the probe function from being called.
> > >
> > > @York
> > > What is your entry for
> > > /proc/device-tree/soc/ifc@1530000/board-control@1,0/compatible
> >
> > EDAC driver doesn't check IFC. Are you debugging EDAC for memory controller?
> >
> > >
> > > @York
> > > cat /proc/device-tree/compatible entry is this, is this correct?
> > > fsl,ls1043a-rdbfsl,ls1043a
> >
> > Once again, you are using your modified code on your own board. So it is
> > not ls1043ardb. This compatible has nothing to do with EDAC driver.
> >
> > I cannot help you with ls1043ardb because the real ls1043ardb board
> > doesn't support ECC. The closest board I have is ls1046ardb.
> >
> > >
> > > ddr: memory-controller@1080000 {
> > > compatible = "fsl,qoriq-memory-controller";
> > > reg = <0x0 0x1080000 0x0 0x1000>;
> > > interrupts = <0 144 0x4>;
> > > big-endian;
> > > };
> >
> > This is your source code, not your final device tree. Please learn to
> > use "fdt" command under U-Boot to dump your device tree before booting
> > Linux, or check after Linux is up. For your reference, on my ls1046ardb,
> > I have
> >
> > # cat /proc/device-tree/soc/memory-controller@1080000/compatible
> > fsl,qoriq-memory-controller
> >
> > York
>
>
>
> --
> Confidentiality notice: This e-mail message, including any
> attachments, may contain legally privileged and/or confidential
> information. If you are not the intended recipient(s), please
> immediately notify the sender and delete this e-mail message.