util-linux.vger.kernel.org archive mirror
From: Tracy Smith <tlsmith3777@gmail.com>
To: york.sun@nxp.com
Cc: steve@theinkpens.com, bp@alien8.de, linux-edac@vger.kernel.org,
	backports@vger.kernel.org, linux-newbie@vger.kernel.org,
	util-linux@vger.kernel.org, linux-mmc@vger.kernel.org
Subject: Re: edac driver initialization, interrupt, & debug
Date: Wed, 21 Nov 2018 16:02:55 -0600
Message-ID: <CAChUvXMWp1Pou9koFP7QMz4CxJ-OrH2qiGTn6nrYZsDyHdoeyA@mail.gmail.com>
In-Reply-To: <CAChUvXPCfwfHrntJHWpsydYZE=P692Axd0pFE+GjZCXtx1fgog@mail.gmail.com>

Please ignore the first question. I now see the expected EDAC message
in the kernel log:

EDAC MC0: 1 CE fsl_mc_err on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x5df1f offset:0xe40 grain:8 syndrome:0xe0e0)
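
For anyone who wants to pull the fields out of a message like this one, here is a minimal Python sketch. It is only an illustration, not a tool from the EDAC tree: the function name parse_edac_line and the PAGE_SHIFT value are my own assumptions (4 KiB pages; a kernel built with a different page size needs a different shift), and the pattern only covers the message layout shown above. Whitespace is normalized first in case the line was wrapped in transit.

```python
import re

# Assumption: 4 KiB pages. Adjust if the kernel uses a different page size.
PAGE_SHIFT = 12

# Matches only the message layout shown in this mail.
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<type>CE|UE) (?P<msg>.*?) "
    r"on (?P<label>\S+) "
    r"\(csrow:(?P<csrow>\d+) channel:(?P<channel>\d+) "
    r"page:(?P<page>0x[0-9a-fA-F]+) offset:(?P<offset>0x[0-9a-fA-F]+) "
    r"grain:(?P<grain>\d+) syndrome:(?P<syndrome>0x[0-9a-fA-F]+)\)"
)


def parse_edac_line(text):
    """Return the fields of one EDAC message as a dict, or None if no match."""
    # Collapse newlines and repeated spaces so a wrapped line still matches.
    match = EDAC_RE.search(" ".join(text.split()))
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("mc", "count", "csrow", "channel", "grain"):
        fields[key] = int(fields[key])
    for key in ("page", "offset", "syndrome"):
        fields[key] = int(fields[key], 16)
    # Physical address = page frame number shifted up, plus offset in the page.
    fields["phys_addr"] = (fields["page"] << PAGE_SHIFT) | fields["offset"]
    return fields
```

Under the 4 KiB assumption, page 0x5df1f and offset 0xe40 combine to physical address 0x5df1fe40.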

1)  Is there anything similar to edac-utils for ARM instead of x86, or
does sysfs replace edac-utils, or is there something else for ARM?

2)  What is currently used for collecting and reporting ECC errors for
ARM/EDAC beyond the kernel log and messages?

https://github.com/grondo/edac-utils
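
To make question 2 concrete, this is roughly what reading the counters straight from sysfs would look like. It is a sketch under assumptions: the path /sys/devices/system/edac/mc and the ce_count / ue_count attribute names come from the kernel EDAC documentation, the helper name read_mc_counters is made up for illustration, and I have not confirmed which attributes the layerscape driver actually populates.

```python
from pathlib import Path

# Location of the memory-controller nodes per the kernel EDAC documentation.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_mc_counters(root=EDAC_MC_ROOT):
    """Return {"mc0": {"ce_count": int|None, "ue_count": int|None}, ...}."""
    root = Path(root)
    if not root.is_dir():
        # EDAC core not loaded, or no memory controller registered.
        return {}
    counters = {}
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc_dir / name).read_text().strip())
            except (OSError, ValueError):
                # Attribute missing or unreadable on this driver.
                entry[name] = None
        counters[mc_dir.name] = entry
    return counters
```

Calling read_mc_counters() periodically and comparing the totals would give a crude CE/UE monitor, but it carries none of the per-error detail (page, offset, syndrome) that appears in the kernel log.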

3) How is RAS/rasdaemon reporting integrated into EDAC for error
collection and reporting?

4) Has there been a patch to prevent the EDAC sysfs API from reporting
bogus values? See http://lkml.iu.edu/hypermail/linux/kernel/1205.3/02249.html

On Wed, Nov 21, 2018 at 11:01 AM Tracy Smith <tlsmith3777@gmail.com> wrote:
>
> The EDAC driver not being probed turned out to be a device tree issue,
> as Steve suspected. Thanks to both Steve and York, this has been
> resolved and the backport is now logging ECC errors after injection. I
> added the ddr qoriq-memory-controller entry, since we used a different
> .dtsi file.
>
> arch/arm64/boot/dts/freescale/...ls1043a.dtsi
>
> ddr: memory-controller@1080000 {
>         compatible = "fsl,qoriq-memory-controller";
>         reg = <0x0 0x1080000 0x0 0x1000>;
>         interrupts = <0 144 0x4>;
>         big-endian;
> };
>
> I now need to collect and report CE and UE ECC errors and extend the
> existing logging and reporting function that I currently see. After
> reviewing the following document, the system logging appears different
> from that given in the kernel EDAC document. I need the level of
> granularity described in the edac.txt file.
>
> https://www.mjmwired.net/kernel/Documentation/edac.txt#173 (the same
> content as kernel/Documentation/edac.txt)
>
> 1)  Can I gather the system logging described below in the edac.txt
> file for layerscape?
>
> 2)  Is there anything similar to the edac-utils but for ARM, or does
> sysfs replace the edac-utils, or something else?
>
> 3)  What is currently used for collecting and reporting ECC errors for
> ARM/EDAC beyond the kernel log and messages?
> https://github.com/grondo/edac-utils
>
> 4) How is RAS reporting integrated into EDAC for error collection and reporting?
>
> 5) Has there been a patch to prevent EDAC sysfs API from reporting bogus values?
> See http://lkml.iu.edu/hypermail/linux/kernel/1205.3/02249.html
>
> - The EDAC sysfs API will still report bogus values. So, userspace
> tools like edac-utils will still use the bogus data;
>
> - Add a new tracepoint-based way to get the binary information about
> the errors.
>
> This is the logging I currently see with layerscape EDAC. I need
> something that explains these fields.
>
> [ 407.612311] EDAC FSL_DDR MC0: Err Detect Register: 0x80000004
> [ 407.618182] EDAC FSL_DDR MC0: Faulty Data bit: 0
> [ 407.622793] EDAC FSL_DDR MC0: Expected Data / ECC: 0x40c50901_40c50900 / 0x800000f0
> [ 407.630443] EDAC FSL_DDR MC0: Captured Data / ECC: 0x40c50900_40c50901 / 0xf0
> [ 407.637571] EDAC FSL_DDR MC0: Err addr: 0x3e0bfff50
> [ 407.642440] EDAC FSL_DDR MC0: PFN: 0x003e0bff
>
> This is the level of detail I need:
>
> SYSTEM LOGGING
> --------------
>
> If logging for UEs and CEs is enabled, then system logs will contain
> information indicating that errors have been detected:
>
> EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
> channel 1 "DIMM_B1": amd76x_edac
>
> EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
> channel 1 "DIMM_B1": amd76x_edac
>
> The structure of the message is:
>     the memory controller              (MC0)
>     Error type                         (CE)
>     memory page                        (0x283)
>     offset in the page                 (0xce0)
>     the byte granularity               (grain 8)
>         or resolution of the error
>     the error syndrome                 (0x6ec3)
>     memory row                         (row 0)
>     memory channel                     (channel 1)
>     DIMM label, if set prior           (DIMM_B1)
>     and then an optional, driver-specific message that may
>         have additional information.
>
> Both UEs and CEs with no info will lack all but memory controller, error
> type, a notice of "no info" and then an optional, driver-specific error
> message.
>
> On Mon, Nov 19, 2018 at 10:48 AM York Sun <york.sun@nxp.com> wrote:
> >
> > On 11/19/18 8:38 AM, Tracy Smith wrote:
> > > Steve, you were correct, there wasn't a device tree entry for the
> > > qoriq memory controller in
> > > arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi.  I added it making it
> > > identical to the fsl-ls1046a.dtsi, which should have the same memory
> > > controller and entry as the ls1043a.  I added this but it didn't make
> > > a difference as far as being able to call the probe function. I'm now
> > > checking the mpc85xx_edac.c dtsi entry for comparison since York used
> > > the mpc85xx as the basis for the layerscape, but there is something
> > > else missing preventing the probe function from being called.
> > >
> > > @York
> > > What is your entry for
> > > /proc/device-tree/soc/ifc@1530000/board-control@1,0/compatible
> >
> > EDAC driver doesn't check IFC. Are you debugging EDAC for memory controller?
> >
> > >
> > > @York
> > > The output of cat /proc/device-tree/compatible is the following. Is this correct?
> > > fsl,ls1043a-rdbfsl,ls1043a
> >
> > Once again, you are using your modified code on your own board. So it is
> > not ls1043ardb. This compatible has nothing to do with EDAC driver.
> >
> > I cannot help you with ls1043ardb because the real ls1043ardb board
> > doesn't support ECC. The closest board I have is ls1046ardb.
> >
> > >
> > >                 ddr: memory-controller@1080000 {
> > >                          compatible = "fsl,qoriq-memory-controller";
> > >                          reg = <0x0 0x1080000 0x0 0x1000>;
> > >                          interrupts = <0 144 0x4>;
> > >                          big-endian;
> > >                  };
> >
> > This is your source code, not your final device tree. Please learn to
> > use "fdt" command under U-Boot to dump your device tree before booting
> > Linux, or check after Linux is up. For your reference, on my ls1046ardb,
> > I have
> >
> > # cat /proc/device-tree/soc/memory-controller@1080000/compatible
> > fsl,qoriq-memory-controller
> >
> > York
>
>
>
> --
> Confidentiality notice: This e-mail message, including any
> attachments, may contain legally privileged and/or confidential
> information. If you are not the intended recipient(s), please
> immediately notify the sender and delete this e-mail message.




