util-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tracy Smith <tlsmith3777@gmail.com>
To: york.sun@nxp.com
Cc: steve@theinkpens.com, bp@alien8.de, linux-edac@vger.kernel.org,
	backports@vger.kernel.org, linux-newbie@vger.kernel.org,
	util-linux@vger.kernel.org, linux-mmc@vger.kernel.org
Subject: Re: edac driver initialization, interrupt, & debug
Date: Wed, 21 Nov 2018 11:01:01 -0600	[thread overview]
Message-ID: <CAChUvXPCfwfHrntJHWpsydYZE=P692Axd0pFE+GjZCXtx1fgog@mail.gmail.com> (raw)
In-Reply-To: <AM0PR04MB3971FBCF6CB23BE778D39EED9AD80@AM0PR04MB3971.eurprd04.prod.outlook.com>

Not probing the edac driver turned out to be a device tree issue as
Steve suspected. Thanks to both Steve and York, this has been resolved
and the backport is now logging ECC errors after injection. Added the
ddr qoriq-memory-controller entry since we used a different .dtsi
file.

arch/arm64/boot/dts/freescale/...ls1043a.dtsi

ddr: memory-controller@1080000
{ compatible = "fsl,qoriq-memory-controller"; reg = <0x0 0x1080000 0x0
0x1000>; interrupts = <0 144 0x4>; big-endian; };

I now need to collect and report CE and UE ECC errors and extend the
existing logging and reporting function that I currently see. After
reviewing the following document, the system logging appears different
from that given in the kernel EDAC document. I need the level of
granularity described in the edac.txt file.

https://www.mjmwired.net/kernel/Documentation/edac.txt#173 same as
kernel/Documentation/edac.txt

1)  Can I gather the system logging described below in the edac.txt
file for layerscape?

2)  Is there anything similar to the edac-utils but for ARM, or does
sysfs replace the edac-utils, or something else?

3)  What is currently used for collecting and reporting ECC errors for
ARM/EDAC beyond the kernel log and messages?
https://github.com/grondo/edac-utils

4) How is RAS reporting integrated into EDAC for error collection and reporting?

5) Has there been a patch to prevent EDAC sysfs API from reporting bogus values?
See http://lkml.iu.edu/hypermail/linux/kernel/1205.3/02249.html

- The EDAC sysfs API will still report bogus values. So, userspace
tools like edac-utils will still use the bogus data;

- Add a new tracepoint-based way to get the binary information about
the errors.

This is the logging I currently see with layerscape EDAC. Need
something explaining these fields.

[ 407.612311] EDAC FSL_DDR MC0: Err Detect Register: 0x80000004 [
407.618182] EDAC FSL_DDR MC0: Faulty Data bit: 0
[ 407.622793] EDAC FSL_DDR MC0: Expected Data / ECC:
0x40c50901_40c50900 / 0x800000f0
[ 407.630443] EDAC FSL_DDR MC0: Captured Data / ECC: 0x40c50900_40c50901 / 0xf0
[ 407.637571] EDAC FSL_DDR MC0: Err addr: 0x3e0bfff50
[ 407.642440] EDAC FSL_DDR MC0: PFN: 0x003e0bff

This is the level of detail I need:

SYSTEM LOGGING
--------------

If logging for UEs and CEs is enabled, then system logs will contain
information indicating that errors have been detected:

EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
channel 1 "DIMM_B1": amd76x_edac

EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
channel 1 "DIMM_B1": amd76x_edac

The structure of the message is:
    the memory controller            (MC0)
    Error type                               (CE)
    memory page                         (0x283)
    offset in the page                   (0xce0)
    the byte granularity                (grain 8)
        or resolution of the error
    the error syndrome                 (0xb741)
    memory row                            (row 0)
    memory channel                     (channel 1)
    DIMM label, if set prior            (DIMM B1
    and then an optional, driver-specific message that may
            have additional information.

Both UEs and CEs with no info will lack all but memory controller, error
type, a notice of "no info" and then an optional, driver-specific error
message.

On Mon, Nov 19, 2018 at 10:48 AM York Sun <york.sun@nxp.com> wrote:
>
> On 11/19/18 8:38 AM, Tracy Smith wrote:
> > Steve, you were correct, there wasn't a device tree entry for the
> > qoriq memory controller in
> > arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi.  I added it making it
> > identical to the fsl-ls1046s.dtsi, which should have the same memory
> > controller and entry as the ls1043a.  I added this but it didn't make
> > a difference as far as being able to call the probe function. I'm now
> > checking the mpc85xx_edac.c dtsi entry for comparison since York used
> > the mpc85xx as the basis for the layerscape, but there is something
> > else missing preventing the probe function from being called.
> >
> > @York
> > What is your entry for
> > /proc/device-tree/soc/ifc@1530000/board-control@1,0/compatible
>
> EDAC driver doesn't check IFC. Are you debugging EDAC for memory controller?
>
> >
> > @York
> > cat /proc/device-tree/compatible entry is this, is this correct?
> > fsl,ls1043a-rdbfsl,ls1043a
>
> Once again, you are using your modified code on your own board. So it is
> not ls1043ardb. This compatible has nothing to do with EDAC driver.
>
> I cannot help you with ls1043ardb because the real ls1043ardb board
> doesn't support ECC. The closest board I have is ls1046ardb.
>
> >
> >                 ddr: memory-controller@1080000 {
> >                          compatible = "fsl,qoriq-memory-controller";
> >                          reg = <0x0 0x1080000 0x0 0x1000>;
> >                          interrupts = <0 144 0x4>;
> >                          big-endian;
> >                  };
>
> This is your source code, not your final device tree. Please learn to
> use "fdt" command under U-Boot to dump your device tree before booting
> Linux, or check after Linux is up. For your reference, on my ls1046ardb,
> I have
>
> # cat /proc/device-tree/soc/memory-controller@1080000/compatible
> fsl,qoriq-memory-controller
>
> York



-- 
Confidentiality notice: This e-mail message, including any
attachments, may contain legally privileged and/or confidential
information. If you are not the intended recipient(s), please
immediately notify the sender and delete this e-mail message.

  reply	other threads:[~2018-11-21 17:01 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <BYAPR02MB431115EC4735AE5B7E29F2CEF6DC0@BYAPR02MB4311.namprd02.prod.outlook.com>
     [not found] ` <BYAPR02MB43110062F32BFDEA712AB371F6DC0@BYAPR02MB4311.namprd02.prod.outlook.com>
     [not found]   ` <CAChUvXMp6S6MBY_LmrfgdPcctQw70FoyxbiHeFqK+5fQx5omCw@mail.gmail.com>
2018-11-16 17:07     ` FW: edac driver initialization, interrupt, & debug Tracy Smith
2018-11-17 14:05       ` Borislav Petkov
2018-11-17 23:22         ` Tracy Smith
2018-11-18  1:05           ` Steve Inkpen
2018-11-19 16:37             ` Tracy Smith
2018-11-19 16:48               ` York Sun
2018-11-21 17:01                 ` Tracy Smith [this message]
2018-11-21 22:02                   ` Tracy Smith
2018-11-28 18:48                   ` edac driver injection of uncorrected errors & utils Tracy Smith
2018-11-28 19:06                     ` York Sun
2018-11-28 19:11                       ` Tracy Smith
2018-11-28 19:24                         ` York Sun
2018-11-28 22:14                           ` Tracy Smith
2018-11-28 23:44                             ` Borislav Petkov
2018-12-05 16:37                               ` Tracy Smith
2018-12-05 17:12                                 ` Borislav Petkov
2018-12-05 17:59                                 ` York Sun
2018-12-05 21:59                                   ` Patrol scrub questions Tracy Smith
2018-12-05 22:12                                     ` York Sun
2018-12-05 22:53                                       ` Layerscape behavior when a UE is detected Tracy Smith
2018-12-05 22:57                                         ` York Sun
2018-12-05 23:41                                           ` Layerscape UE detected and no EDAC panic Tracy Smith
2018-11-19 16:24           ` FW: edac driver initialization, interrupt, & debug York Sun
2018-11-19 15:55         ` York Sun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAChUvXPCfwfHrntJHWpsydYZE=P692Axd0pFE+GjZCXtx1fgog@mail.gmail.com' \
    --to=tlsmith3777@gmail.com \
    --cc=backports@vger.kernel.org \
    --cc=bp@alien8.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-mmc@vger.kernel.org \
    --cc=linux-newbie@vger.kernel.org \
    --cc=steve@theinkpens.com \
    --cc=util-linux@vger.kernel.org \
    --cc=york.sun@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).