linux-acpi.vger.kernel.org archive mirror
From: Xiaofei Tan <tanxiaofei@huawei.com>
To: <linux-acpi@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<rjw@rjwysocki.net>, <lenb@kernel.org>, <james.morse@arm.com>,
	<tony.luck@intel.com>, <bp@alien8.de>
Cc: <linuxarm@huawei.com>, <shiju.jose@huawei.com>,
	<jonathan.cameron@huawei.com>,
	Xiaofei Tan <tanxiaofei@huawei.com>
Subject: [PATCH] ACPI / APEI: do memory failure on the physical address reported by ARM processor error section
Date: Thu, 30 Jul 2020 15:32:28 +0800
Message-ID: <1596094348-10230-1-git-send-email-tanxiaofei@huawei.com>

Since commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea()
synchronise with APEI's irq work"), a user-mode SEA is preferentially
processed by APEI, which relies on memory failure handling to recover.

But there are some problems:
1) The fact that apei_claim_sea() has processed a CPER record does not
mean that memory failure handling has been done, because a firmware-first
RAS error is reported by both the producer and the consumer. An SEA is
mostly reported by the consumer, using the ARM processor error section.
(The producer could be the DDRC or a cache, and reports using the memory
error section or other error sections.) But memory failure handling for
the ARM processor error section is not supported yet. Add it.

2) Some hardware platforms can't record the physical address every time,
yet they may always report a firmware-first RAS error using the ARM
processor error section. Such platforms should update their firmware:
do not report the RAS error when the physical address is not recorded.

Fixes: 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work")
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
---
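Note for reviewers (not part of the commit): the per-entry filtering that
ghes_handle_arm_hw_error() performs can be sketched in userspace as below.
This is only an illustration under stated assumptions, not the kernel code.
All sketch_* names, the 4 KiB page size, the bit position used for the
"physical address valid" flag and the RAM limit are hypothetical stand-ins
for the kernel's PHYS_PFN(), CPER_ARM_INFO_VALID_PHYSICAL_ADDR and
pfn_valid(). The severity check done before the loop is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions; the values are assumptions. */
#define SKETCH_PAGE_SHIFT	12		/* assumes 4 KiB pages */
#define SKETCH_VALID_PHYS_ADDR	(1u << 4)	/* stands in for CPER_ARM_INFO_VALID_PHYSICAL_ADDR */
#define SKETCH_MAX_PFN		0x100000ULL	/* pretend RAM ends at 4 GiB */

/* Reduced model of one error information entry. */
struct sketch_err_info {
	uint16_t validation_bits;
	uint64_t physical_fault_addr;
};

/* Stand-in for pfn_valid(): accept any page frame below the pretend limit. */
bool sketch_pfn_valid(uint64_t pfn)
{
	return pfn < SKETCH_MAX_PFN;
}

/*
 * Count the entries that would be handed to memory_failure_queue():
 * skip entries without a recorded physical address, and skip entries
 * whose address does not map to a valid page frame.
 */
unsigned int sketch_count_queued(const struct sketch_err_info *info, size_t n)
{
	unsigned int queued = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t pfn;

		if (!(info[i].validation_bits & SKETCH_VALID_PHYS_ADDR))
			continue;

		pfn = info[i].physical_fault_addr >> SKETCH_PAGE_SHIFT;
		if (!sketch_pfn_valid(pfn))
			continue;

		queued++;
	}

	return queued;
}
```

In the patch itself the function returns true as soon as at least one entry
has been queued, which corresponds to a non-zero count in this sketch.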
 drivers/acpi/apei/ghes.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 81bf71b..07bfa28 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -466,6 +466,44 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
 	return false;
 }
 
+static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, int sev)
+{
+	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
+	struct cper_arm_err_info *err_info;
+	bool queued = false;
+	int sec_sev, i;
+
+	log_arm_hw_error(err);
+
+	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
+		return false;
+
+	sec_sev = ghes_severity(gdata->error_severity);
+	if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
+		return false;
+
+	err_info = (struct cper_arm_err_info *) (err + 1);
+	for (i = 0; i < err->err_info_num; i++, err_info++) {
+		unsigned long pfn;
+
+		if (!(err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR))
+			continue;
+
+		pfn = PHYS_PFN(err_info->physical_fault_addr);
+		if (!pfn_valid(pfn)) {
+			pr_warn(FW_WARN GHES_PFX
+				"Invalid address in generic error data: %#llx\n",
+				err_info->physical_fault_addr);
+			continue;
+		}
+
+		memory_failure_queue(pfn, 0);
+		queued = true;
+	}
+
+	return queued;
+}
+
 /*
  * PCIe AER errors need to be sent to the AER driver for reporting and
  * recovery. The GHES severities map to the following AER severities and
@@ -543,9 +581,7 @@ static bool ghes_do_proc(struct ghes *ghes,
 			ghes_handle_aer(gdata);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
-			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
-
-			log_arm_hw_error(err);
+			queued = ghes_handle_arm_hw_error(gdata, sev);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
 
-- 
2.8.1

