From: Quan Nguyen <quan@os.amperecomputing.com>
To: macro@orcam.me.uk, Lee Jones <lee@kernel.org>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Rob Herring <robh+dt@kernel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Jean Delvare <jdelvare@suse.com>,
	Guenter Roeck <linux@roeck-us.net>,
	Jonathan Corbet <corbet@lwn.net>,
	Derek Kiernan <derek.kiernan@xilinx.com>,
	Dragan Cvetic <dragan.cvetic@xilinx.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Quan Nguyen <quan@os.amperecomputing.com>,
	Thu Nguyen <thu@os.amperecomputing.com>,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	linux-hwmon@vger.kernel.org, linux-doc@vger.kernel.org,
	OpenBMC Maillist <openbmc@lists.ozlabs.org>,
	Open Source Submission <patches@amperecomputing.com>
Cc: Phong Vo <phong@os.amperecomputing.com>, thang@os.amperecomputing.com
Subject: [PATCH v9 4/9] docs: misc-devices: (smpro-errmon) Add documentation
Date: Thu, 29 Sep 2022 16:43:16 +0700
Message-ID: <20220929094321.770125-5-quan@os.amperecomputing.com>
In-Reply-To: <20220929094321.770125-1-quan@os.amperecomputing.com>

Add documentation for Ampere(R)'s Altra(R) SMpro errmon driver.

Signed-off-by: Thu Nguyen <thu@os.amperecomputing.com>
Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
---
Changes in v9:
  + Fix issue when building htmldocs                      [Bagas]
  + Remove unnecessary channel info for VRD and DIMM event [Quan]
  + Update SPDX license info                               [Greg]
  + Update document to align with new changes in sysfs     [Quan]

Changes in v8:
  + Update to reflect single value per sysfs  [Quan]

Changes in v7:
  + None

Changes in v6:
  + First introduced in v6 [Quan]

 Documentation/misc-devices/index.rst        |   1 +
 Documentation/misc-devices/smpro-errmon.rst | 193 ++++++++++++++++++++
 2 files changed, 194 insertions(+)
 create mode 100644 Documentation/misc-devices/smpro-errmon.rst

diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
index 756be15a49a4..b74b3b34a235 100644
--- a/Documentation/misc-devices/index.rst
+++ b/Documentation/misc-devices/index.rst
@@ -27,6 +27,7 @@ fit into other categories.
    max6875
    oxsemi-tornado
    pci-endpoint-test
+   smpro-errmon
    spear-pcie-gadget
    uacce
    xilinx_sdfec
diff --git a/Documentation/misc-devices/smpro-errmon.rst b/Documentation/misc-devices/smpro-errmon.rst
new file mode 100644
index 000000000000..b17f30a6cafd
--- /dev/null
+++ b/Documentation/misc-devices/smpro-errmon.rst
@@ -0,0 +1,193 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+Kernel driver Ampere(R)'s Altra(R) SMpro errmon
+===============================================
+
+Supported chips:
+
+  * Ampere(R) Altra(R)
+
+    Prefix: 'smpro'
+
+    Reference: Altra SoC BMC Interface Specification
+
+Author: Thu Nguyen <thu@os.amperecomputing.com>
+
+Description
+-----------
+
+This driver supports hardware monitoring for Ampere(R) Altra(R) SoCs based on the
+SMpro co-processor (SMpro).
+The following SoC alert/event types are supported by the errmon driver:
+
+* Core CE/UE error
+* Memory CE/UE error
+* PCIe CE/UE error
+* Other CE/UE error
+* Internal SMpro/PMpro error
+* VRD hot
+* VRD warn/fault
+* DIMM Hot
+
+The SMpro interface provides registers for querying the status and data of the SoC
+alerts/events; this driver exports them to userspace.
+
+The SoC alerts/events are referred to as errors below.
+
+Usage Notes
+-----------
+
+The SMpro errmon driver creates a sysfs file for each error type.
+Example: ``error_core_ce`` reports Core CE errors.
+
+* If no error is present, the sysfs file returns an empty string.
+* If errors are present, each read of the sysfs file returns the oldest error and clears it; subsequent reads return the next error until all errors have been read out.
+
+For each host error type, SMpro keeps only a limited number of the latest errors. Once that limit is exceeded, the oldest unread errors are dropped. In that case, reading the corresponding overflow sysfs file returns 1; otherwise it returns 0.
+Example: ``overflow_core_ce`` reports the overflow status of the Core CE error type.
+
+The format of an error depends on the error type.
+
+1) For Core/Memory/PCIe/Other CE/UE error types:
+
+The sysfs file returns 48 bytes in hex format, laid out as in the table below:
+
+    =======   =============   ===========   ==========================================
+    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
+    =======   =============   ===========   ==========================================
+    00        Error Type      1             See Table below for details
+    01        Subtype         1             See Table below for details
+    02        Instance        2             See Table below for details
+    04        Error status    4             See ARM RAS specification for details
+    08        Error Address   8             See ARM RAS specification for details
+    16        Error Misc 0    8             See ARM RAS specification for details
+    24        Error Misc 1    8             See ARM RAS specification for details
+    32        Error Misc 2    8             See ARM RAS specification for details
+    40        Error Misc 3    8             See ARM RAS specification for details
+    =======   =============   ===========   ==========================================
+
+The table below defines the values of Error Type, Sub type, Sub component and Instance:
+
+    =============== ==========    =========   ===============  ==========================================
+    Error Group     Error Type    Sub type    Sub component    Instance
+    =============== ==========    =========   ===============  ==========================================
+    CPM (core)      0             0           Snoop-Logic      CPM #
+    CPM (core)      0             2           Armv8 Core 1     CPM #
+    MCU (mem)       1             1           ERR1             MCU # | (SLOT << 11)
+    MCU (mem)       1             2           ERR2             MCU # | (SLOT << 11)
+    MCU (mem)       1             3           ERR3             MCU #
+    MCU (mem)       1             4           ERR4             MCU #
+    MCU (mem)       1             5           ERR5             MCU #
+    MCU (mem)       1             6           ERR6             MCU #
+    MCU (mem)       1             7           Link Error       MCU #
+    Mesh (other)    2             0           Cross Point      X | (Y << 5) | (NS << 11)
+    Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | (NS << 11)
+    Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | (NS << 11) | (device << 12)
+    Mesh (other)    2             4           CCIX Node        X | (Y << 5) | (NS << 11)
+    2P Link (other) 3             0           N/A              Altra 2P Link #
+    GIC (other)     5             0           ERR0             0
+    GIC (other)     5             1           ERR1             0
+    GIC (other)     5             2           ERR2             0
+    GIC (other)     5             3           ERR3             0
+    GIC (other)     5             4           ERR4             0
+    GIC (other)     5             5           ERR5             0
+    GIC (other)     5             6           ERR6             0
+    GIC (other)     5             7           ERR7             0
+    GIC (other)     5             8           ERR8             0
+    GIC (other)     5             9           ERR9             0
+    GIC (other)     5             10          ERR10            0
+    GIC (other)     5             11          ERR11            0
+    GIC (other)     5             12          ERR12            0
+    GIC (other)     5             13-21       ERR13            RC # + 1
+    SMMU (other)    6             100         TCU              RC #
+    SMMU (other)    6             0           TBU0             RC #
+    SMMU (other)    6             1           TBU1             RC #
+    SMMU (other)    6             2           TBU2             RC #
+    SMMU (other)    6             3           TBU3             RC #
+    SMMU (other)    6             4           TBU4             RC #
+    SMMU (other)    6             5           TBU5             RC #
+    SMMU (other)    6             6           TBU6             RC #
+    SMMU (other)    6             7           TBU7             RC #
+    SMMU (other)    6             8           TBU8             RC #
+    SMMU (other)    6             9           TBU9             RC #
+    PCIe AER (pcie) 7             0           Root             RC #
+    PCIe AER (pcie) 7             1           Device           RC #
+    PCIe RC (pcie)  8             0           RCA HB           RC #
+    PCIe RC (pcie)  8             1           RCB HB           RC #
+    PCIe RC (pcie)  8             8           RASDP            RC #
+    OCM (other)     9             0           ERR0             0
+    OCM (other)     9             1           ERR1             0
+    OCM (other)     9             2           ERR2             0
+    SMpro (other)   10            0           ERR0             0
+    SMpro (other)   10            1           ERR1             0
+    SMpro (other)   10            2           MPA_ERR          0
+    PMpro (other)   11            0           ERR0             0
+    PMpro (other)   11            1           ERR1             0
+    PMpro (other)   11            2           MPA_ERR          0
+    =============== ==========    =========   ===============  ==========================================
+
+    For example:
+    # cat error_other_ue
+    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
+
+2) For the Internal SMpro/PMpro error types:
+
+The error_[smpro|pmpro] sysfs file returns a string holding an 8-byte hex value:
+    <4-byte hex value of Error info><4-byte hex value of Error extensive data>
+
+The warn_[smpro|pmpro] sysfs file returns a string holding a 4-byte hex value:
+    <4-byte hex value of Warning info>
+
+Refer to the Altra SoC BMC Interface Specification for details.
+
+3) For the VRD hot, VRD warn/fault and DIMM Hot events:
+
+The returned string is a 2-byte hex value. Refer to section 5.7 GPI status register in the Altra SoC BMC Interface Specification for details.
+
+    Example:
+    # cat event_vrd_hot
+    0000
+
+Sysfs entries
+-------------
+
+The following sysfs files are supported:
+
+* Ampere(R) Altra(R):
+
+Alert Types:
+
+    ========================  =================  =======================================================
+    Alert Type                Sysfs name         Description
+    ========================  =================  =======================================================
+    Core CE Error             error_core_ce      Triggered when a Core has a CE error
+    Core CE Error overflow    overflow_core_ce   Triggered when Core CE errors overflow
+    Core UE Error             error_core_ue      Triggered when a Core has a UE error
+    Core UE Error overflow    overflow_core_ue   Triggered when Core UE errors overflow
+    Memory CE Error           error_mem_ce       Triggered when Memory has a CE error
+    Memory CE Error overflow  overflow_mem_ce    Triggered when Memory CE errors overflow
+    Memory UE Error           error_mem_ue       Triggered when Memory has a UE error
+    Memory UE Error overflow  overflow_mem_ue    Triggered when Memory UE errors overflow
+    PCIe CE Error             error_pcie_ce      Triggered when any PCIe controller has a CE error
+    PCIe CE Error overflow    overflow_pcie_ce   Triggered when any PCIe controller CE errors overflow
+    PCIe UE Error             error_pcie_ue      Triggered when any PCIe controller has a UE error
+    PCIe UE Error overflow    overflow_pcie_ue   Triggered when any PCIe controller UE errors overflow
+    Other CE Error            error_other_ce     Triggered when any other component has a CE error
+    Other CE Error overflow   overflow_other_ce  Triggered when other CE errors overflow
+    Other UE Error            error_other_ue     Triggered when any other component has a UE error
+    Other UE Error overflow   overflow_other_ue  Triggered when other UE errors overflow
+    SMpro Error               error_smpro        Triggered when the system has an SMpro error
+    SMpro Warning             warn_smpro         Triggered when the system has an SMpro warning
+    PMpro Error               error_pmpro        Triggered when the system has a PMpro error
+    PMpro Warning             warn_pmpro         Triggered when the system has a PMpro warning
+    ========================  =================  =======================================================
+
+Event Type:
+
+    ============================ ==========================
+    Event Type                   Sysfs name
+    ============================ ==========================
+    VRD HOT                      event_vrd_hot
+    VRD Warn/Fault               event_vrd_warn_fault
+    DIMM Hot                     event_dimm_hot
+    ============================ ==========================
-- 
2.35.1
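
The read semantics described under "Usage Notes" in the patch above (each read returns the oldest error and clears it, an empty read means nothing is pending, and ``overflow_<type>`` reports dropped errors) could be consumed from userspace roughly as follows. This is only a sketch: the helper names ``drain_errors`` and ``sysfs_reader``, the ``limit`` safety cap and the sysfs directory passed to ``sysfs_reader`` are illustrative assumptions, not part of the driver or the patch. Reading the overflow flag before draining is also an assumption; the documentation does not say whether the order matters.

```python
import os
from typing import Callable, List, Tuple


def drain_errors(read_attr: Callable[[str], str], name: str,
                 limit: int = 1024) -> Tuple[List[str], bool]:
    """Read error_<name> until it comes back empty.

    read_attr is any callable mapping a sysfs attribute name to its
    content. Returns (records, overflowed), oldest record first.
    """
    # Overflow file returns 1 when unread errors were dropped, else 0.
    overflowed = read_attr(f"overflow_{name}").strip() == "1"
    records: List[str] = []
    for _ in range(limit):  # hypothetical cap so a faulty file cannot loop forever
        record = read_attr(f"error_{name}").strip()
        if not record:  # empty read: no more pending errors
            break
        records.append(record)
    return records, overflowed


def sysfs_reader(base_dir: str) -> Callable[[str], str]:
    """Build a reader for attributes under base_dir (path is an assumption)."""
    def read_attr(attr: str) -> str:
        with open(os.path.join(base_dir, attr)) as handle:
            return handle.read()
    return read_attr
```

A caller would pass the directory where the errmon attributes appear on its platform, for example ``drain_errors(sysfs_reader(path), "core_ce")``; the actual path depends on how the device is instantiated and is not given in this patch.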

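The 48-byte record layout documented above for the Core/Memory/PCIe/Other CE/UE files can be split into its fields by offset. The sketch below assumes that the offsets in the table are byte offsets into the returned hex string (two hex characters per byte), and it deliberately returns each field as a raw hex substring because the patch does not state the byte order of multi-byte fields. The names ``FIELDS`` and ``split_record`` are made up for illustration.

```python
from typing import Dict, List, Tuple

# (field name, byte offset, size in bytes), taken from the layout table.
FIELDS: List[Tuple[str, int, int]] = [
    ("error_type", 0, 1),
    ("subtype", 1, 1),
    ("instance", 2, 2),
    ("status", 4, 4),
    ("address", 8, 8),
    ("misc0", 16, 8),
    ("misc1", 24, 8),
    ("misc2", 32, 8),
    ("misc3", 40, 8),
]

RECORD_BYTES = 48


def split_record(hex_record: str) -> Dict[str, str]:
    """Split one 48-byte hex record into per-field hex substrings."""
    text = hex_record.strip()
    if len(text) != 2 * RECORD_BYTES:
        raise ValueError(f"expected {2 * RECORD_BYTES} hex characters, got {len(text)}")
    bytes.fromhex(text)  # raises ValueError if the text is not valid hex
    # Byte order inside each field is not interpreted here (unspecified in the doc).
    return {name: text[2 * off:2 * (off + size)] for name, off, size in FIELDS}
```

Under these assumptions, the ``error_other_ue`` example in the patch, which begins ``88080700``, would yield ``88`` for the Error Type field, ``08`` for Subtype and ``0700`` for Instance; how those values map onto the error type table is left to the Altra SoC BMC Interface Specification.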

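For the Mesh rows of the error type table, the Instance value packs several coordinates as ``X | (Y << 5) | NS << 11``, with ``device << 12`` added for Home Node(Mem). A decoder for an already-parsed integer instance might look like the sketch below. The field widths are inferred only from the shift positions (X in bits 0-4, Y in bits 5-10, NS in bit 11, device in bits 12-15 of the 2-byte field) and are assumptions to be checked against the Altra SoC BMC Interface Specification; ``MeshInstance`` and ``decode_mesh_instance`` are hypothetical names.

```python
from typing import NamedTuple


class MeshInstance(NamedTuple):
    x: int
    y: int
    ns: int
    device: int


def decode_mesh_instance(instance: int) -> MeshInstance:
    """Unpack X | (Y << 5) | (NS << 11) | (device << 12).

    Bit widths are assumed from the shift amounts in the table.
    """
    return MeshInstance(
        x=instance & 0x1F,             # bits 0-4
        y=(instance >> 5) & 0x3F,      # bits 5-10
        ns=(instance >> 11) & 0x1,     # bit 11
        device=(instance >> 12) & 0xF, # bits 12-15, only meaningful for Home Node(Mem)
    )
```

Converting the 2-byte Instance field of a record into the integer passed here requires knowing its byte order, which this patch does not specify.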