Linux-EDAC Archive on lore.kernel.org
 help / color / Atom feed
From: Shiju Jose <shiju.jose@huawei.com>
To: <linux-edac@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <bp@alien8.de>,
	<tony.luck@intel.com>, <rjw@rjwysocki.net>, <james.morse@arm.com>,
	<lenb@kernel.org>
Cc: <linuxarm@huawei.com>, <shiju.jose@huawei.com>
Subject: [RFC PATCH 7/7] ACPI / APEI: Add reporting ARM64 CPU correctable errors to the CEC
Date: Fri, 2 Oct 2020 13:22:35 +0100
Message-ID: <20201002122235.1280-8-shiju.jose@huawei.com> (raw)
In-Reply-To: <20201002122235.1280-1-shiju.jose@huawei.com>

Add reporting ARM64 CPU correctable errors to the RAS correctable
errors collector(CEC).

ARM processor error types are cache/TLB/bus errors.
Any of the above error types should not be consider for the
error collection and CPU core isolation?

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/apei/ghes.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 81bf71b10d44..3cecb457d352 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -511,6 +511,38 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 #endif
 }
 
+static void ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata)
+{
+	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
+	struct cper_arm_err_info *err_info;
+	int sec_sev;
+	int cpu, i, ret;
+
+	log_arm_hw_error(err);
+
+	sec_sev = ghes_severity(gdata->error_severity);
+	if (sec_sev != GHES_SEV_CORRECTED)
+		return;
+
+#if defined(CONFIG_ARM64)
+	cpu = get_logical_index(err->mpidr);
+	if (cpu == -EINVAL)
+		return;
+
+	/* ARM processor error types are cache/tlb/bus errors.
+	 * Any of the above error types should not be consider for the
+	 * error collection and CPU core isolation?
+	 */
+	err_info = (struct cper_arm_err_info *)(err + 1);
+	for (i = 0; i < err->err_info_num; i++) {
+		ret = cec_cpu_add_elem(cpu, err_info->multiple_error + 1);
+		if (ret)
+			break;
+		err_info += 1;
+	}
+#endif
+}
+
 static bool ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
@@ -543,9 +575,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 			ghes_handle_aer(gdata);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
-			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
-
-			log_arm_hw_error(err);
+			ghes_handle_arm_hw_error(gdata);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
 
-- 
2.17.1



  parent reply index

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-02 12:22 [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 1/7] RAS/CEC: Replace the macro PFN with ELEM_NO Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 2/7] RAS/CEC: Replace pfns_poisoned with elems_poisoned Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 3/7] RAS/CEC: Move X86 MCE specific code under CONFIG_X86_MCE Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 4/7] RAS/CEC: Modify cec_mod_work() for common use Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 5/7] RAS/CEC: Add support for errors count check on short time period Shiju Jose
2020-10-02 12:22 ` [RFC PATCH 6/7] RAS/CEC: Add CPU Correctable Error Collector to isolate an erroneous CPU core Shiju Jose
2020-10-02 12:22 ` Shiju Jose [this message]
2020-10-02 12:43 ` [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period Borislav Petkov
2020-10-02 15:38   ` Shiju Jose
2020-10-02 17:33     ` James Morse
2020-10-02 18:02       ` Borislav Petkov
2020-10-06 16:13       ` Shiju Jose
2020-10-07 16:45         ` James Morse
2020-10-02 16:04   ` Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201002122235.1280-8-shiju.jose@huawei.com \
    --to=shiju.jose@huawei.com \
    --cc=bp@alien8.de \
    --cc=james.morse@arm.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=rjw@rjwysocki.net \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-EDAC Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-edac/0 linux-edac/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-edac linux-edac/ https://lore.kernel.org/linux-edac \
		linux-edac@vger.kernel.org
	public-inbox-index linux-edac

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-edac


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git