[PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64

* [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64
@ 2016-02-05 19:13 Tyler Baicar
  2016-02-05 19:13 ` [PATCH V1 1/6] acpi: apei: read ack upon ghes record consumption Tyler Baicar
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Tyler Baicar @ 2016-02-05 19:13 UTC (permalink / raw)
  To: fu.wei-QSEj5FYQhm4dnm+yROfE0A, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	harba-sgV2jX0FEOL9JmXXK+q4OQ, rruigrok-sgV2jX0FEOL9JmXXK+q4OQ,
	ahs3-H+wXaHxf7aLQT0dZR+AlfA, Catalin Marinas, Will Deacon,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, devel-E0kO6a4B6psdnm+yROfE0A
  Cc: Tyler Baicar

Add support for Generic Hardware Error Source (GHES) v2, which introduces the
capability for the OS to acknowledge the consumption of the error record
generated by the Reliability, Availability and Serviceability (RAS) controller.
This eliminates potential race conditions between the OS and the RAS controller.

Add support for the timestamp field added to the Generic Error Data Entry v3,
allowing the OS to log the time that the error is generated by the firmware,
rather than the time the error is consumed. This improves the correctness of
event sequences when analyzing error logs. The timestamp is added in
ACPI 6.1, reference Table 18-343 Generic Error Data Entry.
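
As a rough illustration of what the v3 entry means for parsing, the sketch
below walks a buffer of Generic Error Data Entries and picks the header length
from each entry's revision field. It is a stand-alone user-space model, not
code from this series: the function names, the 64-byte and 72-byte header
lengths, and the field offsets are assumptions derived from the structure
layout, and the buffer is assumed to be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of a Generic Error Data Entry header (hypothetical names). */
#define GED_REVISION_OFF   20u  /* 16-bit revision field */
#define GED_DATA_LEN_OFF   24u  /* 32-bit error_data_length field */
#define GED_V2_HDR_LEN     64u  /* header without the timestamp */
#define GED_V3_HDR_LEN     72u  /* header plus the 8-byte timestamp */

/* Header length depends on the major revision in the upper byte. */
size_t ged_header_len(uint16_t revision)
{
	return (revision >> 8) >= 0x03 ? GED_V3_HDR_LEN : GED_V2_HDR_LEN;
}

/*
 * Count the entries in buf. Returns -1 if an entry would run past
 * data_len, so a truncated record is rejected instead of over-read.
 */
int ged_count_sections(const uint8_t *buf, size_t data_len)
{
	int n = 0;

	assert(buf != NULL);
	while (data_len > 0) {
		uint16_t revision;
		uint32_t payload_len;
		size_t hdr_len;

		if (data_len < GED_V2_HDR_LEN)
			return -1;
		memcpy(&revision, buf + GED_REVISION_OFF, sizeof(revision));
		memcpy(&payload_len, buf + GED_DATA_LEN_OFF, sizeof(payload_len));
		hdr_len = ged_header_len(revision);
		if (data_len < hdr_len || payload_len > data_len - hdr_len)
			return -1;
		buf += hdr_len + payload_len;
		data_len -= hdr_len + payload_len;
		n++;
	}
	return n;
}
```

The section payload starts right after the header, so a parser that always
assumed the shorter header would read the timestamp as payload on v3 entries.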

Add support for ARMv8 Common Platform Error Record (CPER) per the UEFI 2.6
specification. ARMv8 specific processor error information is reported as part of
the CPER records. This provides more detail for processor error logs.

Synchronous External Abort (SEA) represents a specific processor error condition
in ARM systems. A handler is added to recognize SEA errors, and a notifier is
added to parse and report the errors before the process is killed. Refer to
section N.2.1.1 in the Common Platform Error Record appendix of the UEFI 2.6
specification.

Depends on: [PATCH v5] acpi, apei, arm64: APEI initial support for aarch64.
            https://lkml.org/lkml/2015/12/10/131

Tyler Baicar (6):
  acpi: apei: read ack upon ghes record consumption
  ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
  efi: parse ARMv8 processor error
  arm64: exception: handle Synchronous External Abort
  arm64: exception: handle instruction abort at current EL
  acpi: apei: handle SEA notification type for ARMv8

 arch/arm64/Kconfig                   |   1 +
 arch/arm64/include/asm/system_misc.h |  13 ++
 arch/arm64/kernel/entry.S            |  19 +++
 arch/arm64/mm/fault.c                |  58 +++++++--
 drivers/acpi/apei/Kconfig            |  13 ++
 drivers/acpi/apei/ghes.c             | 159 ++++++++++++++++++++++-
 drivers/acpi/apei/hest.c             |   7 +-
 drivers/firmware/efi/cper.c          | 244 ++++++++++++++++++++++++++++++++---
 include/acpi/actbl1.h                |  46 ++++++-
 include/acpi/ghes.h                  |   1 +
 include/linux/cper.h                 |  72 +++++++++++
 11 files changed, 599 insertions(+), 34 deletions(-)

-- 
1.8.2.1


* [PATCH V1 1/6] acpi: apei: read ack upon ghes record consumption
  2016-02-05 19:13 [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Tyler Baicar
@ 2016-02-05 19:13 ` Tyler Baicar
  2016-02-05 19:13 ` [PATCH V1 3/6] efi: parse ARMv8 processor error Tyler Baicar
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Tyler Baicar @ 2016-02-05 19:13 UTC (permalink / raw)
  To: fu.wei, timur, harba, rruigrok, ahs3, Catalin Marinas,
	Will Deacon, Rafael J. Wysocki, Len Brown, Matt Fleming,
	Robert Moore, Lv Zheng, linux-arm-kernel, linux-kernel,
	linux-acpi, linux-efi, devel
  Cc: Tyler Baicar, Jonathan (Zhixiong) Zhang, Naveen Kaje

A RAS (Reliability, Availability, Serviceability) controller
may be a separate processor running in parallel with OS
execution, and may generate error records for consumption by
the OS. If the RAS controller produces multiple error records,
then they may be overwritten before the OS has consumed them.

The Generic Hardware Error Source (GHES) v2 structure
introduces the capability for the OS to acknowledge the
consumption of the error record generated by the RAS
controller. A RAS controller supporting GHESv2 shall wait for
the acknowledgment before writing a new error record, thus
eliminating the race condition.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 drivers/acpi/apei/ghes.c | 41 +++++++++++++++++++++++++++++++++++++++++
 drivers/acpi/apei/hest.c |  7 +++++--
 include/acpi/actbl1.h    | 21 ++++++++++++++++++++-
 include/acpi/ghes.h      |  1 +
 4 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3dd9c46..db67711 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -45,6 +45,7 @@
 #include <linux/aer.h>
 #include <linux/nmi.h>
 
+#include <acpi/actbl1.h>
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
@@ -239,10 +240,22 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 	struct ghes *ghes;
 	unsigned int error_block_length;
 	int rc;
+	struct acpi_hest_header *hest_hdr;
 
 	ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
 	if (!ghes)
 		return ERR_PTR(-ENOMEM);
+
+	hest_hdr = (struct acpi_hest_header *)generic;
+	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR_V2) {
+		ghes->generic_v2 = (struct acpi_hest_generic_v2 *)generic;
+		rc = apei_map_generic_address(
+			&ghes->generic_v2->read_ack_reg_address);
+		if (rc)
+			goto err_unmap;
+	} else
+		ghes->generic_v2 = NULL;
+
 	ghes->generic = generic;
 	rc = apei_map_generic_address(&generic->error_status_address);
 	if (rc)
@@ -265,6 +278,9 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 
 err_unmap:
 	apei_unmap_generic_address(&generic->error_status_address);
+	if (ghes->generic_v2)
+		apei_unmap_generic_address(
+			&ghes->generic_v2->read_ack_reg_address);
 err_free:
 	kfree(ghes);
 	return ERR_PTR(rc);
@@ -274,6 +290,9 @@ static void ghes_fini(struct ghes *ghes)
 {
 	kfree(ghes->estatus);
 	apei_unmap_generic_address(&ghes->generic->error_status_address);
+	if (ghes->generic_v2)
+		apei_unmap_generic_address(
+			&ghes->generic_v2->read_ack_reg_address);
 }
 
 static inline int ghes_severity(int severity)
@@ -643,6 +662,22 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
+static int ghes_do_read_ack(struct acpi_hest_generic_v2 *generic_v2)
+{
+	int rc;
+	u64 val = 0;
+
+	rc = apei_read(&val, &generic_v2->read_ack_reg_address);
+	if (rc)
+		return rc;
+	val &= generic_v2->read_ack_preserve <<
+		generic_v2->read_ack_reg_address.bit_offset;
+	val |= generic_v2->read_ack_write;
+	rc = apei_write(val, &generic_v2->read_ack_reg_address);
+
+	return rc;
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -655,6 +690,12 @@ static int ghes_proc(struct ghes *ghes)
 			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
 	}
 	ghes_do_proc(ghes, ghes->estatus);
+
+	if (ghes->generic_v2) {
+		rc = ghes_do_read_ack(ghes->generic_v2);
+		if (rc)
+			return rc;
+	}
 out:
 	ghes_clear_estatus(ghes);
 	return 0;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index c708c95..ae43468 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -52,6 +52,7 @@ static const int hest_esrc_len_tab[ACPI_HEST_TYPE_RESERVED] = {
 	[ACPI_HEST_TYPE_AER_ENDPOINT] = sizeof(struct acpi_hest_aer),
 	[ACPI_HEST_TYPE_AER_BRIDGE] = sizeof(struct acpi_hest_aer_bridge),
 	[ACPI_HEST_TYPE_GENERIC_ERROR] = sizeof(struct acpi_hest_generic),
+	[ACPI_HEST_TYPE_GENERIC_ERROR_V2] = sizeof(struct acpi_hest_generic_v2),
 };
 
 static int hest_esrc_len(struct acpi_hest_header *hest_hdr)
@@ -147,7 +148,8 @@ static int __init hest_parse_ghes_count(struct acpi_hest_header *hest_hdr, void
 {
 	int *count = data;
 
-	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR)
+	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR ||
+	    hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR_V2)
 		(*count)++;
 	return 0;
 }
@@ -158,7 +160,8 @@ static int __init hest_parse_ghes(struct acpi_hest_header *hest_hdr, void *data)
 	struct ghes_arr *ghes_arr = data;
 	int rc, i;
 
-	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR)
+	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR &&
+	    hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR_V2)
 		return 0;
 
 	if (!((struct acpi_hest_generic *)hest_hdr)->enabled)
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 16e0136..82695c9 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -427,7 +427,8 @@ enum acpi_hest_types {
 	ACPI_HEST_TYPE_AER_ENDPOINT = 7,
 	ACPI_HEST_TYPE_AER_BRIDGE = 8,
 	ACPI_HEST_TYPE_GENERIC_ERROR = 9,
-	ACPI_HEST_TYPE_RESERVED = 10	/* 10 and greater are reserved */
+	ACPI_HEST_TYPE_GENERIC_ERROR_V2 = 10,
+	ACPI_HEST_TYPE_RESERVED = 11	/* 11 and greater are reserved */
 };
 
 /*
@@ -603,6 +604,24 @@ struct acpi_hest_generic {
 	u32 error_block_length;
 };
 
+/* 10: Generic Hardware Error Source V2 */
+
+struct acpi_hest_generic_v2 {
+	struct acpi_hest_header header;
+	u16 related_source_id;
+	u8 reserved;
+	u8 enabled;
+	u32 records_to_preallocate;
+	u32 max_sections_per_record;
+	u32 max_raw_data_length;
+	struct acpi_generic_address error_status_address;
+	struct acpi_hest_notify notify;
+	u32 error_block_length;
+	struct acpi_generic_address read_ack_reg_address;
+	u64 read_ack_preserve;
+	u64 read_ack_write;
+};
+
 /* Generic Error Status block */
 
 struct acpi_hest_generic_status {
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 720446c..d0108b6 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -14,6 +14,7 @@
 
 struct ghes {
 	struct acpi_hest_generic *generic;
+	struct acpi_hest_generic_v2 *generic_v2;
 	struct acpi_hest_generic_status *estatus;
 	u64 buffer_paddr;
 	unsigned long flags;
-- 
1.8.2.1


* [PATCH V1 2/6] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
       [not found] ` <1454699608-22760-1-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-02-05 19:13   ` Tyler Baicar
  2016-02-05 19:13   ` [PATCH V1 4/6] arm64: exception: handle Synchronous External Abort Tyler Baicar
  2016-02-05 19:13   ` [PATCH V1 5/6] arm64: exception: handle instruction abort at current EL Tyler Baicar
  2 siblings, 0 replies; 16+ messages in thread
From: Tyler Baicar @ 2016-02-05 19:13 UTC (permalink / raw)
  To: fu.wei-QSEj5FYQhm4dnm+yROfE0A, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	harba-sgV2jX0FEOL9JmXXK+q4OQ, rruigrok-sgV2jX0FEOL9JmXXK+q4OQ,
	ahs3-H+wXaHxf7aLQT0dZR+AlfA, Catalin Marinas, Will Deacon,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, devel-E0kO6a4B6psdnm+yROfE0A
  Cc: Tyler Baicar, Jonathan (Zhixiong) Zhang, Naveen Kaje

Currently, a RAS error is not timestamped when it is reported.
The ACPI 6.1 specification adds a timestamp field to the Generic
Error Data Entry v3 structure. Report this timestamp, which
records when the firmware generated the error.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
Signed-off-by: Richard Ruigrok <rruigrok-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
Signed-off-by: Tyler Baicar <tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
Signed-off-by: Naveen Kaje <nkaje-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
---
 drivers/acpi/apei/ghes.c    |  35 ++++++++++++--
 drivers/firmware/efi/cper.c | 108 +++++++++++++++++++++++++++++++++++++-------
 include/acpi/actbl1.h       |  19 ++++++++
 3 files changed, 142 insertions(+), 20 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index db67711..6c68100 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -414,7 +414,15 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 	int flags = -1;
 	int sec_sev = ghes_severity(gdata->error_severity);
 	struct cper_sec_mem_err *mem_err;
-	mem_err = (struct cper_sec_mem_err *)(gdata + 1);
+	struct acpi_hest_generic_data_v3 *gdata_v3 = NULL;
+
+	if ((gdata->revision >> 8) >= 0x03)
+		gdata_v3 = (struct acpi_hest_generic_data_v3 *)gdata;
+
+	if (gdata_v3)
+		mem_err = (struct cper_sec_mem_err *)(gdata_v3 + 1);
+	else
+		mem_err = (struct cper_sec_mem_err *)(gdata + 1);
 
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
@@ -444,14 +452,27 @@ static void ghes_do_proc(struct ghes *ghes,
 {
 	int sev, sec_sev;
 	struct acpi_hest_generic_data *gdata;
+	struct acpi_hest_generic_data_v3 *gdata_v3 = NULL;
+	uuid_le sec_type;
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
 		sec_sev = ghes_severity(gdata->error_severity);
-		if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+		sec_type = *(uuid_le *)gdata->section_type;
+
+		if ((gdata->revision >> 8) >= 0x03)
+			gdata_v3 = (struct acpi_hest_generic_data_v3 *)gdata;
+
+		if (!uuid_le_cmp(sec_type,
 				 CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err;
-			mem_err = (struct cper_sec_mem_err *)(gdata+1);
+
+			if (gdata_v3)
+				mem_err = (struct cper_sec_mem_err *)
+					  (gdata_v3 + 1);
+			else
+				mem_err = (struct cper_sec_mem_err *)
+					  (gdata + 1);
 			ghes_edac_report_mem_error(ghes, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
@@ -461,7 +482,13 @@ static void ghes_do_proc(struct ghes *ghes,
 		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
 				      CPER_SEC_PCIE)) {
 			struct cper_sec_pcie *pcie_err;
-			pcie_err = (struct cper_sec_pcie *)(gdata+1);
+
+			if (gdata_v3)
+				pcie_err = (struct cper_sec_pcie *)
+					   (gdata_v3 + 1);
+			else
+				pcie_err = (struct cper_sec_pcie *)
+					   (gdata + 1);
 			if (sev == GHES_SEV_RECOVERABLE &&
 			    sec_sev == GHES_SEV_RECOVERABLE &&
 			    pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index d425374..accd351 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -32,6 +32,8 @@
 #include <linux/acpi.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
+#include <linux/printk.h>
+#include <linux/bcd.h>
 
 #define INDENT_SP	" "
 
@@ -392,6 +394,10 @@ static void cper_estatus_print_section(
 	uuid_le *sec_type = (uuid_le *)gdata->section_type;
 	__u16 severity;
 	char newpfx[64];
+	struct acpi_hest_generic_data_v3 *gdata_v3 = NULL;
+
+	if ((gdata->revision >> 8) >= 0x03)
+		gdata_v3 = (struct acpi_hest_generic_data_v3 *)gdata;
 
 	severity = gdata->error_severity;
 	printk("%s""Error %d, type: %s\n", pfx, sec_no,
@@ -403,14 +409,24 @@ static void cper_estatus_print_section(
 
 	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
 	if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_GENERIC)) {
-		struct cper_sec_proc_generic *proc_err = (void *)(gdata + 1);
+		struct cper_sec_proc_generic *proc_err;
+
+		if (gdata_v3)
+			proc_err = (void *)(gdata_v3 + 1);
+		else
+			proc_err = (void *)(gdata + 1);
 		printk("%s""section_type: general processor error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*proc_err))
 			cper_print_proc_generic(newpfx, proc_err);
 		else
 			goto err_section_too_small;
 	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
-		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+		struct cper_sec_mem_err *mem_err;
+
+		if (gdata_v3)
+			mem_err = (void *)(gdata_v3 + 1);
+		else
+			mem_err = (void *)(gdata + 1);
 		printk("%s""section_type: memory error\n", newpfx);
 		if (gdata->error_data_length >=
 		    sizeof(struct cper_sec_mem_err_old))
@@ -419,7 +435,12 @@ static void cper_estatus_print_section(
 		else
 			goto err_section_too_small;
 	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
-		struct cper_sec_pcie *pcie = (void *)(gdata + 1);
+		struct cper_sec_pcie *pcie;
+
+		if (gdata_v3)
+			pcie = (void *)(gdata_v3 + 1);
+		else
+			pcie = (void *)(gdata + 1);
 		printk("%s""section_type: PCIe error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*pcie))
 			cper_print_pcie(newpfx, pcie, gdata);
@@ -434,10 +455,36 @@ err_section_too_small:
 	pr_err(FW_WARN "error section length is too small\n");
 }
 
+static void cper_estatus_print_section_v3(const char *pfx,
+	const struct acpi_hest_generic_data_v3 *gdata, int sec_no)
+{
+	__u8 hour, min, sec, day, mon, *timestamp;
+	__u16 year;
+
+	if (gdata->gdata_v2.validation_bits & GED_VALID_TIMESTAMP) {
+		timestamp = (__u8 *)&(gdata->timestamp);
+		memcpy(&sec, timestamp, 1);
+		memcpy(&min, timestamp + 1, 1);
+		memcpy(&hour, timestamp + 2, 1);
+		memcpy(&day, timestamp + 4, 1);
+		memcpy(&mon, timestamp + 5, 1);
+		memcpy(&year, timestamp + 6, 2);
+		printk("%stime: ", pfx);
+		printk("%7s", 0x01 & *(timestamp + 3) ? "precise" : "");
+		printk(" %02d:%02d:%02d %04d-%02d-%02d\n",
+			bcd2bin(hour), bcd2bin(min), bcd2bin(sec),
+			year, bcd2bin(mon),
+			bcd2bin(day));
+	}
+
+	cper_estatus_print_section(pfx, &(gdata->gdata_v2), sec_no);
+}
+
 void cper_estatus_print(const char *pfx,
 			const struct acpi_hest_generic_status *estatus)
 {
 	struct acpi_hest_generic_data *gdata;
+	struct acpi_hest_generic_data_v3 *gdata_v3 = NULL;
 	unsigned int data_len, gedata_len;
 	int sec_no = 0;
 	char newpfx[64];
@@ -451,13 +498,27 @@ void cper_estatus_print(const char *pfx,
 	printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
 	data_len = estatus->data_length;
 	gdata = (struct acpi_hest_generic_data *)(estatus + 1);
+	if ((gdata->revision >> 8) >= 0x03)
+		gdata_v3 = (struct acpi_hest_generic_data_v3 *)gdata;
+
 	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
-	while (data_len >= sizeof(*gdata)) {
-		gedata_len = gdata->error_data_length;
-		cper_estatus_print_section(newpfx, gdata, sec_no);
-		data_len -= gedata_len + sizeof(*gdata);
-		gdata = (void *)(gdata + 1) + gedata_len;
-		sec_no++;
+
+	if (gdata_v3) {
+		while (data_len >= sizeof(*gdata_v3)) {
+			gedata_len = gdata_v3->gdata_v2.error_data_length;
+			cper_estatus_print_section_v3(newpfx, gdata_v3, sec_no);
+			data_len -= gedata_len + sizeof(*gdata_v3);
+			gdata_v3 = (void *)(gdata_v3 + 1) + gedata_len;
+			sec_no++;
+		}
+	} else {
+		while (data_len >= sizeof(*gdata)) {
+			gedata_len = gdata->error_data_length;
+			cper_estatus_print_section(newpfx, gdata, sec_no);
+			data_len -= gedata_len + sizeof(*gdata);
+			gdata = (void *)(gdata + 1) + gedata_len;
+			sec_no++;
+		}
 	}
 }
 EXPORT_SYMBOL_GPL(cper_estatus_print);
@@ -478,6 +539,7 @@ EXPORT_SYMBOL_GPL(cper_estatus_check_header);
 int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
 {
 	struct acpi_hest_generic_data *gdata;
+	struct acpi_hest_generic_data_v3 *gdata_v3 = NULL;
 	unsigned int data_len, gedata_len;
 	int rc;
 
@@ -486,15 +548,29 @@ int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
 		return rc;
 	data_len = estatus->data_length;
 	gdata = (struct acpi_hest_generic_data *)(estatus + 1);
-	while (data_len >= sizeof(*gdata)) {
-		gedata_len = gdata->error_data_length;
-		if (gedata_len > data_len - sizeof(*gdata))
+
+	if ((gdata->revision >> 8) >= 0x03) {
+		gdata_v3 = (struct acpi_hest_generic_data_v3 *)gdata;
+		while (data_len >= sizeof(*gdata_v3)) {
+			gedata_len = gdata_v3->gdata_v2.error_data_length;
+			if (gedata_len > data_len - sizeof(*gdata_v3))
+				return -EINVAL;
+			data_len -= gedata_len + sizeof(*gdata_v3);
+			gdata_v3 = (void *)(gdata_v3 + 1) + gedata_len;
+		}
+		if (data_len)
+			return -EINVAL;
+	} else {
+		while (data_len >= sizeof(*gdata)) {
+			gedata_len = gdata->error_data_length;
+			if (gedata_len > data_len - sizeof(*gdata))
+				return -EINVAL;
+			data_len -= gedata_len + sizeof(*gdata);
+			gdata = (void *)(gdata + 1) + gedata_len;
+		}
+		if (data_len)
 			return -EINVAL;
-		data_len -= gedata_len + sizeof(*gdata);
-		gdata = (void *)(gdata + 1) + gedata_len;
 	}
-	if (data_len)
-		return -EINVAL;
 
 	return 0;
 }
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 82695c9..231ae1e 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -653,6 +653,25 @@ struct acpi_hest_generic_data {
 	u8 fru_text[20];
 };
 
+/* Generic Error Data entry version 3 as defined by ACPI 6.1 spec */
+
+struct acpi_hest_generic_data_v3 {
+	struct acpi_hest_generic_data gdata_v2;
+	__u64                         timestamp;
+};
+
+/*
+ * Validation bits definition for validation_bits in struct
+ * acpi_hest_generic_data[_v3]. If set, corresponding fields in
+ * the struct contain valid information.
+ */
+/* corresponds fru_id */
+#define GED_VALID_FRU_ID                  0x0001
+/* corresponds fru_text */
+#define GED_VALID_FRU_TEXT                0x0002
+/* corresponds timestamp */
+#define GED_VALID_TIMESTAMP               0x0004
+
 /*******************************************************************************
  *
  * MADT - Multiple APIC Description Table
-- 
1.8.2.1


* [PATCH V1 3/6] efi: parse ARMv8 processor error
  2016-02-05 19:13 [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Tyler Baicar
  2016-02-05 19:13 ` [PATCH V1 1/6] acpi: apei: read ack upon ghes record consumption Tyler Baicar
@ 2016-02-05 19:13 ` Tyler Baicar
       [not found] ` <1454699608-22760-1-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Tyler Baicar @ 2016-02-05 19:13 UTC (permalink / raw)
  To: fu.wei, timur, harba, rruigrok, ahs3, Catalin Marinas,
	Will Deacon, Rafael J. Wysocki, Len Brown, Matt Fleming,
	Robert Moore, Lv Zheng, linux-arm-kernel, linux-kernel,
	linux-acpi, linux-efi, devel
  Cc: Tyler Baicar, Jonathan (Zhixiong) Zhang, Naveen Kaje

Add support for ARMv8 Common Platform Error Record (CPER).
The UEFI 2.6 specification adds support for ARMv8 specific
processor error information to be reported as part of the
CPER records. This provides more detail for processor error logs.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 drivers/firmware/efi/cper.c | 136 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cper.h        |  72 +++++++++++++++++++++++
 2 files changed, 208 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index accd351..ca803fc 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -109,12 +109,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 static const char * const proc_type_strs[] = {
 	"IA32/X64",
 	"IA64",
+	"ARMv8",
 };
 
 static const char * const proc_isa_strs[] = {
 	"IA32",
 	"IA64",
 	"X64",
+	"ARM A32/T32",
+	"ARM A64",
 };
 
 static const char * const proc_error_type_strs[] = {
@@ -183,6 +186,127 @@ static void cper_print_proc_generic(const char *pfx,
 		printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
 }
 
+static void cper_print_proc_armv8(const char *pfx,
+				  const struct cper_sec_proc_armv8 *proc)
+{
+	int i, len;
+	struct cper_armv8_err_info *err_info;
+	__u64 *qword = NULL;
+	char newpfx[64];
+
+	printk("%ssection length: %d\n", pfx, proc->section_length);
+	printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
+
+	len = proc->section_length - (sizeof(*proc) +
+		proc->err_info_num * (sizeof(*err_info)));
+	if (len < 0) {
+		printk("%ssection length is too small.\n", pfx);
+		printk("%sERR_INFO_NUM is %d.\n", pfx, proc->err_info_num);
+		return;
+	}
+
+	if (proc->validation_bits & CPER_ARMV8_VALID_MPIDR)
+		printk("%sMPIDR: 0x%016llx\n", pfx, proc->mpidr);
+	if (proc->validation_bits & CPER_ARMV8_VALID_AFFINITY_LEVEL)
+		printk("%serror affinity level: %d\n", pfx,
+			proc->affinity_level);
+	if (proc->validation_bits & CPER_ARMV8_VALID_RUNNING_STATE)
+		printk("%srunning state: %d\n", pfx, proc->running_state);
+
+	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+
+	err_info = (struct cper_armv8_err_info *)(proc + 1);
+	for (i = 0; i < proc->err_info_num; i++) {
+		printk("%sError info structure %d:\n", pfx, i);
+		printk("%sversion:%d\n", newpfx, err_info->version);
+		printk("%slength:%d\n", newpfx, err_info->length);
+		if (err_info->validation_bits &
+		    CPER_ARMV8_INFO_VALID_MULTI_ERR) {
+			if (err_info->multiple_error == 0)
+				printk("%ssingle error.\n", newpfx);
+			else if (err_info->multiple_error == 1)
+				printk("%smultiple errors.\n", newpfx);
+			else
+				printk("%smultiple errors count:%d.\n",
+				newpfx, err_info->multiple_error);
+		}
+		if (err_info->validation_bits & CPER_ARMV8_INFO_VALID_FLAGS) {
+			if (err_info->flags & CPER_ARMV8_INFO_FLAGS_FIRST)
+				printk("%sfirst error captured.\n", newpfx);
+			if (err_info->flags & CPER_ARMV8_INFO_FLAGS_LAST)
+				printk("%slast error captured.\n", newpfx);
+			if (err_info->flags & CPER_ARMV8_INFO_FLAGS_PROPAGATED)
+				printk("%spropagated error captured.\n",
+				       newpfx);
+		}
+		printk("%serror_type: %d, %s\n", newpfx, err_info->type,
+			err_info->type < ARRAY_SIZE(proc_error_type_strs) ?
+			proc_error_type_strs[err_info->type] : "unknown");
+		printk("%serror_info: 0x%016llx\n", newpfx,
+		       err_info->error_info);
+		if (err_info->validation_bits & CPER_ARMV8_INFO_VALID_VIRT_ADDR)
+			printk("%svirtual fault address: 0x%016llx\n",
+				newpfx, err_info->virt_fault_addr);
+		if (err_info->validation_bits &
+		    CPER_ARMV8_INFO_VALID_PHYSICAL_ADDR)
+			printk("%sphysical fault address: 0x%016llx\n",
+				newpfx, err_info->physical_fault_addr);
+		err_info += 1;
+	}
+
+	if (len < sizeof(*qword) && proc->context_info_num > 0) {
+		printk("%ssection length is too small.\n", pfx);
+		printk("%sCTX_INFO_NUM is %d.\n", pfx, proc->context_info_num);
+		return;
+	}
+	for (i = 0; i < proc->context_info_num; i++) {
+		qword = (__u64 *)err_info;
+		printk("%sProcessor context info structure %d:\n", pfx, i);
+		printk("%sException level %d.\n", newpfx,
+		       (int)((*qword & CPER_ARMV8_CTX_EL_MASK)
+				>> CPER_ARMV8_CTX_EL_SHIFT));
+		printk("%sSecure bit: %d.\n", newpfx,
+		       (int)((*qword & CPER_ARMV8_CTX_NS_MASK)
+				>> CPER_ARMV8_CTX_NS_SHIFT));
+		if ((*qword & CPER_ARMV8_CTX_TYPE_MASK) == 0) {
+			if (len < CPER_AARCH32_CTX_LEN) {
+				printk("%ssection length is too small.\n", pfx);
+				printk("%sremaining length is %d.\n", pfx, len);
+				return;
+			}
+			printk("%sAArch32 execution context.\n", newpfx);
+			qword++;
+			print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+				qword, CPER_AARCH32_CTX_LEN - sizeof(*qword),
+				0);
+			len -= CPER_AARCH32_CTX_LEN;
+		} else if ((*qword & CPER_ARMV8_CTX_TYPE_MASK) == 1) {
+			if (len < CPER_AARCH64_CTX_LEN) {
+				printk("%ssection length is too small.\n", pfx);
+				printk("%sremaining length is %d.\n", pfx, len);
+				return;
+			}
+			printk("%sAArch64 execution context.\n", newpfx);
+			qword++;
+			print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+				qword, CPER_AARCH64_CTX_LEN - sizeof(*qword),
+				0);
+			len -= CPER_AARCH64_CTX_LEN;
+		} else {
+			printk("%scontext type is incorrect 0x%016llx.\n",
+			pfx, *qword);
+			return;
+		}
+	}
+
+	if (len > 0) {
+		printk("%sVendor specific error info has %d bytes.\n", pfx,
+		       len);
+		print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, qword, len,
+			0);
+	}
+}
+
 static const char * const mem_err_type_strs[] = {
 	"unknown",
 	"no error",
@@ -446,6 +570,18 @@ static void cper_estatus_print_section(
 			cper_print_pcie(newpfx, pcie, gdata);
 		else
 			goto err_section_too_small;
+	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_ARMV8)) {
+		struct cper_sec_proc_armv8 *armv8_err;
+
+		if (gdata_v3)
+			armv8_err = (void *)(gdata_v3 + 1);
+		else
+			armv8_err = (void *)(gdata + 1);
+		printk("%ssection_type: ARMv8 processor error\n", newpfx);
+		if (gdata->error_data_length >= sizeof(*armv8_err))
+			cper_print_proc_armv8(newpfx, armv8_err);
+		else
+			goto err_section_too_small;
 	} else
 		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
 
diff --git a/include/linux/cper.h b/include/linux/cper.h
index dcacb1a..d1efbef 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -162,6 +162,11 @@ enum {
  * corrective action before the data is consumed
  */
 #define CPER_SEC_LATENT_ERROR			0x0020
+/*
+ * If set, the section contains an error that is propagated. The error
+ * did not originate from the hardware associated with this section.
+ */
+#define CPER_SEC_PROPAGATED			0x0040
 
 /*
  * Section type definitions, used in section_type field in struct
@@ -180,6 +185,10 @@ enum {
 #define CPER_SEC_PROC_IPF						\
 	UUID_LE(0xE429FAF1, 0x3CB7, 0x11D4, 0x0B, 0xCA, 0x07, 0x00,	\
 		0x80, 0xC7, 0x3C, 0x88, 0x81)
+/* Processor Specific: ARMv8 */
+#define CPER_SEC_PROC_ARMV8						\
+	UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05,	\
+		0x1D, 0x5D, 0x46, 0xB0)
 /* Platform Memory */
 #define CPER_SEC_PLATFORM_MEM						\
 	UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83,	\
@@ -255,6 +264,34 @@ enum {
 
 #define CPER_PCIE_SLOT_SHIFT			3
 
+#define CPER_ARMV8_ERR_INFO_NUM_MASK		0x00000000000000FF
+#define CPER_ARMV8_CTX_INFO_NUM_MASK		0x0000000000FFFF00
+#define CPER_ARMV8_CTX_INFO_NUM_SHIFT		8
+
+#define CPER_ARMV8_VALID_MPIDR			0x00000001
+#define CPER_ARMV8_VALID_AFFINITY_LEVEL		0x00000002
+#define CPER_ARMV8_VALID_RUNNING_STATE		0x00000004
+#define CPER_ARMV8_VALID_VENDOR_INFO		0x00000008
+
+#define CPER_ARMV8_INFO_VALID_MULTI_ERR		0x0001
+#define CPER_ARMV8_INFO_VALID_FLAGS		0x0002
+#define CPER_ARMV8_INFO_VALID_ERR_INFO		0x0004
+#define CPER_ARMV8_INFO_VALID_VIRT_ADDR		0x0008
+#define CPER_ARMV8_INFO_VALID_PHYSICAL_ADDR	0x0010
+
+#define CPER_ARMV8_INFO_FLAGS_FIRST		0x0001
+#define CPER_ARMV8_INFO_FLAGS_LAST		0x0002
+#define CPER_ARMV8_INFO_FLAGS_PROPAGATED	0x0004
+
+#define CPER_AARCH64_CTX_LEN			368
+#define CPER_AARCH32_CTX_LEN			256
+
+#define CPER_ARMV8_CTX_TYPE_MASK		0x000000000000000F
+#define CPER_ARMV8_CTX_EL_MASK			0x0000000000000070
+#define CPER_ARMV8_CTX_NS_MASK			0x0000000000000080
+#define CPER_ARMV8_CTX_EL_SHIFT			4
+#define CPER_ARMV8_CTX_NS_SHIFT			7
+
 /*
  * All tables and structs must be byte-packed to match CPER
  * specification, since the tables are provided by the system BIOS
@@ -340,6 +377,41 @@ struct cper_ia_proc_ctx {
 	__u64	mm_reg_addr;
 };
 
+/* ARMv8 Processor Error Section */
+struct cper_sec_proc_armv8 {
+	__u32	validation_bits;
+	__u16	err_info_num; /* Number of Processor Error Info */
+	__u16	context_info_num; /* Number of Processor Context Info Records*/
+	__u32	section_length;
+	__u8	affinity_level;
+	__u8	reserved[3];	/* must be zero */
+	__u64	mpidr;
+	__u64	midr;
+	__u32	running_state; /* Bit 0 set - Processor running. PSCI = 0 */
+	__u32	psci_state;
+};
+
+/* ARMv8 Processor Error Information Structure */
+struct cper_armv8_err_info {
+	__u8	version;
+	__u8	length;
+	__u16	validation_bits;
+	__u8	type;
+	__u16	multiple_error;
+	__u8	flags;
+	__u64	error_info;
+	__u64	virt_fault_addr;
+	__u64	physical_fault_addr;
+};
+
+/* ARMv8 AARCH64 Processor Context Information Structure */
+struct cper_armv8_aarch64_ctx {
+	__u8	type_el_ns;
+	__u8	reserved[7];	/* must be zero */
+	__u8	gpr[288];
+	__u8	spr[68];
+};
+
 /* Old Memory Error Section UEFI 2.1, 2.2 */
 struct cper_sec_mem_err_old {
 	__u64	validation_bits;
-- 
1.8.2.1

* [PATCH V1 4/6] arm64: exception: handle Synchronous External Abort
       [not found] ` <1454699608-22760-1-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-02-05 19:13   ` [PATCH V1 2/6] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1 Tyler Baicar
@ 2016-02-05 19:13   ` Tyler Baicar
  2016-02-10 18:03     ` Will Deacon
  2016-02-05 19:13   ` [PATCH V1 5/6] arm64: exception: handle instruction abort at current EL Tyler Baicar
  2 siblings, 1 reply; 16+ messages in thread
From: Tyler Baicar @ 2016-02-05 19:13 UTC (permalink / raw)
  To: fu.wei-QSEj5FYQhm4dnm+yROfE0A, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	harba-sgV2jX0FEOL9JmXXK+q4OQ, rruigrok-sgV2jX0FEOL9JmXXK+q4OQ,
	ahs3-H+wXaHxf7aLQT0dZR+AlfA, Catalin Marinas, Will Deacon,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, devel-E0kO6a4B6psdnm+yROfE0A
  Cc: Tyler Baicar, Jonathan (Zhixiong) Zhang, Naveen Kaje

SEA exceptions are often caused by an uncorrected hardware
error and are handled when data abort and instruction abort
exception classes have specific values for their Fault Status
Code.

When SEA occurs, before killing the process, go through
the handlers registered in the notification list.

Update fault_info[] with specific SEA faults so that the
new SEA handler is used.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/include/asm/system_misc.h | 13 ++++++++
 arch/arm64/mm/fault.c                | 58 +++++++++++++++++++++++++++++-------
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 57f110b..90daf4a 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -64,4 +64,17 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
 
 #endif	/* __ASSEMBLY__ */
 
+/*
+ * The functions below are used to register and unregister callbacks
+ * that are to be invoked when a Synchronous External Abort (SEA)
+ * occurs. An SEA is raised by certain fault status codes that have
+ * either data or instruction abort as the exception class, and
+ * callbacks may be registered to parse or handle such hardware errors.
+ *
+ * Registered callbacks are run in an interrupt/atomic context. They
+ * are not allowed to block or sleep.
+ */
+int sea_register_handler_chain(struct notifier_block *nb);
+void sea_unregister_handler_chain(struct notifier_block *nb);
+
 #endif	/* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 92ddac1..d6fa691 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -39,6 +39,22 @@
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 
+/*
+ * GHES SEA handler code may register a notifier call here to
+ * handle HW error record passed from platform.
+ */
+static ATOMIC_NOTIFIER_HEAD(sea_handler_chain);
+
+int sea_register_handler_chain(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&sea_handler_chain, nb);
+}
+
+void sea_unregister_handler_chain(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&sea_handler_chain, nb);
+}
+
 static const char *fault_name(unsigned int esr);
 
 /*
@@ -379,6 +395,28 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 1;
 }
 
+/*
+ * This abort handler deals with Synchronous External Abort.
+ * It calls notifiers, and then returns "fault".
+ */
+static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+	struct siginfo info;
+
+	atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
+
+	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
+		 fault_name(esr), esr, addr);
+
+	info.si_signo = SIGBUS;
+	info.si_errno = 0;
+	info.si_code  = 0;
+	info.si_addr  = (void __user *)addr;
+	arm64_notify_die("", regs, &info, esr);
+
+	return 0;
+}
+
 static struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
 	int	sig;
@@ -401,22 +439,22 @@ static struct fault_info {
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
+	{ do_sea,		SIGBUS,  0,		"synchronous external abort"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
+	{ do_sea,		SIGBUS,  0,		"level 0 SEA (trans tbl walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 1 SEA (trans tbl walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 2 SEA (trans tbl walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 3 SEA (trans tbl walk)"	},
+	{ do_sea,		SIGBUS,  0,		"synchronous parity or ECC err" },
 	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
+	{ do_sea,		SIGBUS,  0,		"level 0 synch parity error"	},
+	{ do_sea,		SIGBUS,  0,		"level 1 synch parity error"	},
+	{ do_sea,		SIGBUS,  0,		"level 2 synch parity error"	},
+	{ do_sea,		SIGBUS,  0,		"level 3 synch parity error"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
 	{ do_bad,		SIGBUS,  BUS_ADRALN,	"alignment fault"		},
 	{ do_bad,		SIGBUS,  0,		"unknown 34"			},
-- 
1.8.2.1

* [PATCH V1 5/6] arm64: exception: handle instruction abort at current EL
       [not found] ` <1454699608-22760-1-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-02-05 19:13   ` [PATCH V1 2/6] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1 Tyler Baicar
  2016-02-05 19:13   ` [PATCH V1 4/6] arm64: exception: handle Synchronous External Abort Tyler Baicar
@ 2016-02-05 19:13   ` Tyler Baicar
  2016-02-10 18:02     ` Will Deacon
  2 siblings, 1 reply; 16+ messages in thread
From: Tyler Baicar @ 2016-02-05 19:13 UTC (permalink / raw)
  To: fu.wei-QSEj5FYQhm4dnm+yROfE0A, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	harba-sgV2jX0FEOL9JmXXK+q4OQ, rruigrok-sgV2jX0FEOL9JmXXK+q4OQ,
	ahs3-H+wXaHxf7aLQT0dZR+AlfA, Catalin Marinas, Will Deacon,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, devel-E0kO6a4B6psdnm+yROfE0A
  Cc: Tyler Baicar, Naveen Kaje

Add a handler for instruction aborts at the current EL
(ESR_ELx_EC_IABT_CUR) so they are no longer handled in el1_inv.
This allows firmware first handling for possible SEA
(Synchronous External Abort) caused instruction abort at
current EL.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/kernel/entry.S | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 1f7f5a2..6b7fb14 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -336,6 +336,8 @@ el1_sync:
 	lsr	x24, x1, #ESR_ELx_EC_SHIFT	// exception class
 	cmp	x24, #ESR_ELx_EC_DABT_CUR	// data abort in EL1
 	b.eq	el1_da
+	cmp	x24, #ESR_ELx_EC_IABT_CUR	// instruction abort in EL1
+	b.eq	el1_ia
 	cmp	x24, #ESR_ELx_EC_SYS64		// configurable trap
 	b.eq	el1_undef
 	cmp	x24, #ESR_ELx_EC_SP_ALIGN	// stack alignment exception
@@ -363,6 +365,23 @@ el1_da:
 	// disable interrupts before pulling preserved data off the stack
 	disable_irq
 	kernel_exit 1
+el1_ia:
+	/*
+	 * Instruction abort handling
+	 */
+	mrs	x0, far_el1
+	enable_dbg
+	// re-enable interrupts if they were enabled in the aborted context
+	tbnz	x23, #7, 1f			// PSR_I_BIT
+	enable_irq
+1:
+	orr	x1, x1, #1 << 24		// use reserved ISS bit for instruction aborts
+	mov	x2, sp				// struct pt_regs
+	bl	do_mem_abort
+
+	// disable interrupts before pulling preserved data off the stack
+	disable_irq
+	kernel_exit 1
 el1_sp_pc:
 	/*
 	 * Stack or PC alignment exception handling
-- 
1.8.2.1

* [PATCH V1 6/6] acpi: apei: handle SEA notification type for ARMv8
  2016-02-05 19:13 [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Tyler Baicar
                   ` (2 preceding siblings ...)
       [not found] ` <1454699608-22760-1-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-02-05 19:13 ` Tyler Baicar
       [not found]   ` <1454699608-22760-7-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2016-02-10 17:44 ` [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Will Deacon
  4 siblings, 1 reply; 16+ messages in thread
From: Tyler Baicar @ 2016-02-05 19:13 UTC (permalink / raw)
  To: fu.wei, timur, harba, rruigrok, ahs3, Catalin Marinas,
	Will Deacon, Rafael J. Wysocki, Len Brown, Matt Fleming,
	Robert Moore, Lv Zheng, linux-arm-kernel, linux-kernel,
	linux-acpi, linux-efi, devel
  Cc: Tyler Baicar, Jonathan (Zhixiong) Zhang, Naveen Kaje

The ARM APEI extension proposal added the SEA (Synchronous External
Abort) notification type for ARMv8.

Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/Kconfig        |  1 +
 drivers/acpi/apei/Kconfig | 13 ++++++++
 drivers/acpi/apei/ghes.c  | 83 +++++++++++++++++++++++++++++++++++++++++++++++
 include/acpi/actbl1.h     |  6 +++-
 4 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6e4a4f4..236f398 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -6,6 +6,7 @@ config ARM64
 	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select HAVE_ACPI_APEI if ACPI
+	select HAVE_ACPI_APEI_SEA if ACPI
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index a60bb00..bfcbb9e 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -7,6 +7,19 @@ config HAVE_ACPI_APEI_NMI
 config HAVE_ACPI_APEI_HEST_IA32
 	bool
 
+config HAVE_ACPI_APEI_SEA
+	bool "APEI Synchronous External Abort logging/recovering support"
+	help
+	  This option should be enabled if the system supports
+	  firmware first handling of SEA (Synchronous External Abort).
+	  SEA happens with certain faults of data abort or instruction
+	  abort synchronous exceptions on ARMv8 systems. If a system
+	  supports firmware first handling of SEA, the platform analyzes
+	  and handles hardware error notifications with SEA, and it may then
+	  form a HW error record for the OS to parse and handle. This
+	  option allows the OS to look for such HW error record, and
+	  take appropriate action.
+
 config ACPI_APEI
 	bool "ACPI Platform Error Interface (APEI)"
 	select MISC_FILESYSTEMS
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6c68100..ed64b97 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -50,6 +50,10 @@
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+#include <asm/system_misc.h>
+#endif
+
 #include "apei-internal.h"
 
 #define GHES_PFX	"GHES: "
@@ -784,6 +788,62 @@ static struct notifier_block ghes_notifier_sci = {
 	.notifier_call = ghes_notify_sci,
 };
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+static LIST_HEAD(ghes_sea);
+
+static int ghes_notify_sea(struct notifier_block *this,
+				  unsigned long event, void *data)
+{
+	struct ghes *ghes;
+	int ret = NOTIFY_DONE;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
+		if (!ghes_proc(ghes))
+			ret = NOTIFY_OK;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static struct notifier_block ghes_notifier_sea = {
+	.notifier_call = ghes_notify_sea,
+};
+
+static int ghes_sea_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	if (list_empty(&ghes_sea))
+		sea_register_handler_chain(&ghes_notifier_sea);
+	list_add_rcu(&ghes->list, &ghes_sea);
+	mutex_unlock(&ghes_list_mutex);
+	return 0;
+}
+
+static void ghes_sea_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	if (list_empty(&ghes_sea))
+		sea_unregister_handler_chain(&ghes_notifier_sea);
+	mutex_unlock(&ghes_list_mutex);
+}
+#else /* CONFIG_HAVE_ACPI_APEI_SEA */
+static inline int ghes_sea_add(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+	return -ENOTSUPP;
+}
+
+static inline void ghes_sea_remove(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+}
+#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1028,6 +1088,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_EXTERNAL:
 	case ACPI_HEST_NOTIFY_SCI:
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
+				generic->header.source_id);
+			rc = -ENOTSUPP;
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1039,6 +1107,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
 		goto err;
+	case ACPI_HEST_NOTIFY_GPIO:
+	case ACPI_HEST_NOTIFY_SEI:
+	case ACPI_HEST_NOTIFY_GSIV:
+		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
+			generic->header.source_id, generic->notify.type);
+		rc = -ENOTSUPP;
+		goto err;
 	default:
 		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
 			   generic->notify.type, generic->header.source_id);
@@ -1093,6 +1168,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		list_add_rcu(&ghes->list, &ghes_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		rc = ghes_sea_add(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1135,6 +1215,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 			unregister_acpi_hed_notifier(&ghes_notifier_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		ghes_sea_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 231ae1e..e5118fb 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -507,7 +507,11 @@ enum acpi_hest_notify_types {
 	ACPI_HEST_NOTIFY_NMI = 4,
 	ACPI_HEST_NOTIFY_CMCI = 5,	/* ACPI 5.0 */
 	ACPI_HEST_NOTIFY_MCE = 6,	/* ACPI 5.0 */
-	ACPI_HEST_NOTIFY_RESERVED = 7	/* 7 and greater are reserved */
+	ACPI_HEST_NOTIFY_GPIO = 7,	/* ACPI 5.0 */
+	ACPI_HEST_NOTIFY_SEA = 8,	/* ACPI 6.x */
+	ACPI_HEST_NOTIFY_SEI = 9,	/* ACPI 6.x */
+	ACPI_HEST_NOTIFY_GSIV = 10,	/* ACPI 6.x */
+	ACPI_HEST_NOTIFY_RESERVED = 11	/* 11 and greater are reserved */
 };
 
 /* Values for config_write_enable bitfield above */
-- 
1.8.2.1

* Re: [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64
  2016-02-05 19:13 [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Tyler Baicar
                   ` (3 preceding siblings ...)
  2016-02-05 19:13 ` [PATCH V1 6/6] acpi: apei: handle SEA notification type for ARMv8 Tyler Baicar
@ 2016-02-10 17:44 ` Will Deacon
  4 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2016-02-10 17:44 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: fu.wei, timur, harba, rruigrok, ahs3, Catalin Marinas,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel, linux-kernel, linux-acpi, linux-efi,
	devel

On Fri, Feb 05, 2016 at 12:13:22PM -0700, Tyler Baicar wrote:
> Add support for Generic Hardware Error Source (GHES) v2, which introduces the
> capability for the OS to acknowledge the consumption of the error record
> generated by the Reliability, Availability and Serviceability (RAS) controller.
> This eliminates potential race conditions between the OS and the RAS controller.
> 
> Add support for the timestamp field added to the Generic Error Data Entry v3,
> allowing the OS to log the time that the error is generated by the firmware,
> rather than the time the error is consumed. This improves the correctness of
> event sequences when analyzing error logs. The timestamp is added in
> ACPI 6.1, reference Table 18-343 Generic Error Data Entry.
> 
> Add support for ARMv8 Common Platform Error Record (CPER) per UEFI 2.6
> specification. ARMv8 specific processor error information is reported as part of
> the CPER records.  This provides more detail on for processor error logs.
> 
> Synchronous External Abort (SEA) represents a specific processor error condition
> in ARM systems. A handler is added to recognize SEA errors, and a notifier is
> added to parse and report the errors before the process is killed. Refer to
> section N.2.1.1 in the Common Platform Error Record appendix of the UEFI 2.6
> specification.
> 
> Depends on: [PATCH v5] acpi, apei, arm64: APEI initial support for aarch64.
>             https://lkml.org/lkml/2015/12/10/131

Did the TLB flushing issue ever get sorted out with that series?

http://marc.info/?l=linux-arm-kernel&m=145009373307418&w=2

It doesn't affect arm64, but I think it's a real issue for x86 when local
invalidation is in effect to avoid IPIs from interrupt context.

Will

* Re: [PATCH V1 5/6] arm64: exception: handle instruction abort at current EL
  2016-02-05 19:13   ` [PATCH V1 5/6] arm64: exception: handle instruction abort at current EL Tyler Baicar
@ 2016-02-10 18:02     ` Will Deacon
       [not found]       ` <20160210180210.GT1052-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2016-02-10 18:02 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: fu.wei, timur, harba, rruigrok, ahs3, Catalin Marinas,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel, linux-kernel, linux-acpi, linux-efi,
	devel, Naveen Kaje

On Fri, Feb 05, 2016 at 12:13:27PM -0700, Tyler Baicar wrote:
> Add a handler for instruction aborts at the current EL
> (ESR_ELx_EC_IABT_CUR) so they are no longer handled in el1_inv.
> This allows firmware first handling for possible SEA
> (Synchronous External Abort) caused instruction abort at
> current EL.
> 
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
> ---
>  arch/arm64/kernel/entry.S | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 1f7f5a2..6b7fb14 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -336,6 +336,8 @@ el1_sync:
>  	lsr	x24, x1, #ESR_ELx_EC_SHIFT	// exception class
>  	cmp	x24, #ESR_ELx_EC_DABT_CUR	// data abort in EL1
>  	b.eq	el1_da
> +	cmp	x24, #ESR_ELx_EC_IABT_CUR	// instruction abort in EL1
> +	b.eq	el1_ia
>  	cmp	x24, #ESR_ELx_EC_SYS64		// configurable trap
>  	b.eq	el1_undef
>  	cmp	x24, #ESR_ELx_EC_SP_ALIGN	// stack alignment exception
> @@ -363,6 +365,23 @@ el1_da:
>  	// disable interrupts before pulling preserved data off the stack
>  	disable_irq
>  	kernel_exit 1
> +el1_ia:
> +	/*
> +	 * Instruction abort handling
> +	 */
> +	mrs	x0, far_el1
> +	enable_dbg
> +	// re-enable interrupts if they were enabled in the aborted context
> +	tbnz	x23, #7, 1f			// PSR_I_BIT
> +	enable_irq
> +1:
> +	orr	x1, x1, #1 << 24		// use reserved ISS bit for instruction aborts

Do we actually need to set this bit (ESR_LNX_EXEC) for aborts from EL1?
If not, could we just use the same entry code as el1_da?
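
For reference, a rough userspace sketch of what that `orr` does to the
syndrome value. The constants are the values I believe the arm64 headers
use, and ESR_FSC_MASK and the helper names are made up for illustration;
treat all of them as assumptions rather than as the kernel's definitions.

```c
#include <stdint.h>

/* Assumed values; check them against the arm64 headers before relying on them. */
#define ESR_ELx_EC_SHIFT	26
#define ESR_ELx_EC_IABT_CUR	0x21	/* instruction abort, current EL */
#define ESR_ELx_EC_DABT_CUR	0x25	/* data abort, current EL */
#define ESR_FSC_MASK		0x3f	/* hypothetical name: low six bits select a fault_info[] entry */
#define ESR_LNX_EXEC		(UINT32_C(1) << 24)

/* Exception class, as extracted by "lsr x24, x1, #ESR_ELx_EC_SHIFT". */
static inline uint32_t esr_ec(uint32_t esr)
{
	return esr >> ESR_ELx_EC_SHIFT;
}

/* Fault status code used to pick a fault_info[] entry. */
static inline uint32_t esr_fsc(uint32_t esr)
{
	return esr & ESR_FSC_MASK;
}

/* Equivalent of "orr x1, x1, #1 << 24". */
static inline uint32_t esr_mark_exec(uint32_t esr)
{
	return esr | ESR_LNX_EXEC;
}
```

Setting the bit changes neither the exception class nor the fault status
code, so the fault_info[] lookup comes out the same either way; only code
that explicitly tests the flag would see a difference.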

Will

* Re: [PATCH V1 6/6] acpi: apei: handle SEA notification type for ARMv8
       [not found]   ` <1454699608-22760-7-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2016-02-10 18:03     ` Will Deacon
  2016-02-11  3:22       ` Abdulhamid, Harb
       [not found]       ` <20160210180332.GU1052-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 2 replies; 16+ messages in thread
From: Will Deacon @ 2016-02-10 18:03 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: fu.wei-QSEj5FYQhm4dnm+yROfE0A, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	harba-sgV2jX0FEOL9JmXXK+q4OQ, rruigrok-sgV2jX0FEOL9JmXXK+q4OQ,
	ahs3-H+wXaHxf7aLQT0dZR+AlfA, Catalin Marinas, Rafael J. Wysocki,
	Len Brown, Matt Fleming, Robert Moore, Lv Zheng,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, devel-E0kO6a4B6psdnm+yROfE0A,
	Naveen Kaje, Jonathan (Zhixiong) Zhang

On Fri, Feb 05, 2016 at 12:13:28PM -0700, Tyler Baicar wrote:
> The ARM APEI extension proposal added the SEA (Synchronous External
> Abort) notification type for ARMv8.
> 
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.
> 
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
> ---
>  arch/arm64/Kconfig        |  1 +
>  drivers/acpi/apei/Kconfig | 13 ++++++++
>  drivers/acpi/apei/ghes.c  | 83 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/acpi/actbl1.h     |  6 +++-
>  4 files changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6e4a4f4..236f398 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -6,6 +6,7 @@ config ARM64
>  	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
>  	select ARCH_HAS_DEVMEM_IS_ALLOWED
>  	select HAVE_ACPI_APEI if ACPI
> +	select HAVE_ACPI_APEI_SEA if ACPI
>  	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>  	select ARCH_HAS_ELF_RANDOMIZE
>  	select ARCH_HAS_GCOV_PROFILE_ALL
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index a60bb00..bfcbb9e 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -7,6 +7,19 @@ config HAVE_ACPI_APEI_NMI
>  config HAVE_ACPI_APEI_HEST_IA32
>  	bool
>  
> +config HAVE_ACPI_APEI_SEA
> +	bool "APEI Synchronous External Abort logging/recovering support"
> +	help
> +	  This option should be enabled if the system supports
> +	  firmware first handling of SEA (Synchronous External Abort).
> +	  SEA happens with certain faults of data abort or instruction
> +	  abort synchronous exceptions on ARMv8 systems. If a system
> +	  supports firmware first handling of SEA, the platform analyzes
> +	  and handles hardware error notifications with SEA, and it may then
> +	  form a HW error record for the OS to parse and handle. This
> +	  option allows the OS to look for such HW error record, and
> +	  take appropriate action.
> +
>  config ACPI_APEI
>  	bool "ACPI Platform Error Interface (APEI)"
>  	select MISC_FILESYSTEMS
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 6c68100..ed64b97 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -50,6 +50,10 @@
>  #include <acpi/apei.h>
>  #include <asm/tlbflush.h>
>  
> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
> +#include <asm/system_misc.h>
> +#endif
> +
>  #include "apei-internal.h"
>  
>  #define GHES_PFX	"GHES: "
> @@ -784,6 +788,62 @@ static struct notifier_block ghes_notifier_sci = {
>  	.notifier_call = ghes_notify_sci,
>  };
>  
> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
> +static LIST_HEAD(ghes_sea);
> +
> +static int ghes_notify_sea(struct notifier_block *this,
> +				  unsigned long event, void *data)
> +{
> +	struct ghes *ghes;
> +	int ret = NOTIFY_DONE;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
> +		if (!ghes_proc(ghes))
> +			ret = NOTIFY_OK;
> +	}
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +static struct notifier_block ghes_notifier_sea = {
> +	.notifier_call = ghes_notify_sea,
> +};
> +
> +static int ghes_sea_add(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);

Can you just use spin_lock, to be consistent with our other exception
hooks?

> +	if (list_empty(&ghes_sea))
> +		sea_register_handler_chain(&ghes_notifier_sea);
> +	list_add_rcu(&ghes->list, &ghes_sea);
> +	mutex_unlock(&ghes_list_mutex);
> +	return 0;
> +}
> +
> +static void ghes_sea_remove(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_del_rcu(&ghes->list);
> +	if (list_empty(&ghes_sea))
> +		sea_unregister_handler_chain(&ghes_notifier_sea);
> +	mutex_unlock(&ghes_list_mutex);
> +}
> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
> +static inline int ghes_sea_add(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +	return -ENOTSUPP;
> +}
> +
> +static inline void ghes_sea_remove(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +}

Why are these getting called if !CONFIG_HAVE_ACPI_APEI_SEA?

Will

* Re: [PATCH V1 4/6] arm64: exception: handle Synchronous External Abort
  2016-02-05 19:13   ` [PATCH V1 4/6] arm64: exception: handle Synchronous External Abort Tyler Baicar
@ 2016-02-10 18:03     ` Will Deacon
       [not found]       ` <20160210180344.GV1052-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2016-02-10 18:03 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: fu.wei, timur, harba, rruigrok, ahs3, Catalin Marinas,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel, linux-kernel, linux-acpi, linux-efi,
	devel, Naveen Kaje, Jonathan (Zhixiong) Zhang

On Fri, Feb 05, 2016 at 12:13:26PM -0700, Tyler Baicar wrote:
> SEA exceptions are often caused by an uncorrected hardware
> error and are handled when data abort and instruction abort
> exception classes have specific values for their Fault Status
> Code.
> 
> When SEA occurs, before killing the process, go through
> the handlers registered in the notification list.
> 
> Update fault_info[] with specific SEA faults so that the
> new SEA handler is used.
> 
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
> ---
>  arch/arm64/include/asm/system_misc.h | 13 ++++++++
>  arch/arm64/mm/fault.c                | 58 +++++++++++++++++++++++++++++-------
>  2 files changed, 61 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
> index 57f110b..90daf4a 100644
> --- a/arch/arm64/include/asm/system_misc.h
> +++ b/arch/arm64/include/asm/system_misc.h
> @@ -64,4 +64,17 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
>  
>  #endif	/* __ASSEMBLY__ */
>  
> +/*
> + * The functions below are used to register and unregister callbacks
> + * that are to be invoked when a Synchronous External Abort (SEA)
> + * occurs. An SEA is raised by certain fault status codes that have
> + * either data or instruction abort as the exception class, and
> + * callbacks may be registered to parse or handle such hardware errors.
> + *
> + * Registered callbacks are run in an interrupt/atomic context. They
> + * are not allowed to block or sleep.
> + */
> +int sea_register_handler_chain(struct notifier_block *nb);
> +void sea_unregister_handler_chain(struct notifier_block *nb);
> +
>  #endif	/* __ASM_SYSTEM_MISC_H */
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 92ddac1..d6fa691 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -39,6 +39,22 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  
> +/*
> + * GHES SEA handler code may register a notifier call here to
> + * handle HW error record passed from platform.
> + */
> +static ATOMIC_NOTIFIER_HEAD(sea_handler_chain);
> +
> +int sea_register_handler_chain(struct notifier_block *nb)
> +{
> +	return atomic_notifier_chain_register(&sea_handler_chain, nb);
> +}
> +
> +void sea_unregister_handler_chain(struct notifier_block *nb)
> +{
> +	atomic_notifier_chain_unregister(&sea_handler_chain, nb);
> +}
> +
>  static const char *fault_name(unsigned int esr);
>  
>  /*
> @@ -379,6 +395,28 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  	return 1;
>  }
>  
> +/*
> + * This abort handler deals with Synchronous External Abort.
> + * It calls notifiers, and then returns "fault".
> + */
> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
> +{
> +	struct siginfo info;
> +
> +	atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
> +
> +	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
> +		 fault_name(esr), esr, addr);
> +
> +	info.si_signo = SIGBUS;
> +	info.si_errno = 0;
> +	info.si_code  = 0;
> +	info.si_addr  = (void __user *)addr;
> +	arm64_notify_die("", regs, &info, esr);

Surely we don't want to call this if the notifier chain handled the
exception?

Will

* Re: [PATCH V1 4/6] arm64: exception: handle Synchronous External Abort
       [not found]       ` <20160210180344.GV1052-5wv7dgnIgG8@public.gmane.org>
@ 2016-02-11  2:40         ` Abdulhamid, Harb
  0 siblings, 0 replies; 16+ messages in thread
From: Abdulhamid, Harb @ 2016-02-11  2:40 UTC (permalink / raw)
  To: Will Deacon, Tyler Baicar
  Cc: fu.wei-QSEj5FYQhm4dnm+yROfE0A, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	rruigrok-sgV2jX0FEOL9JmXXK+q4OQ, ahs3-H+wXaHxf7aLQT0dZR+AlfA,
	Catalin Marinas, Rafael J. Wysocki, Len Brown, Matt Fleming,
	Robert Moore, Lv Zheng,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, devel-E0kO6a4B6psdnm+yROfE0A,
	Naveen Kaje, Jonathan (Zhixiong) Zhang

On 2/10/2016 1:03 PM, Will Deacon wrote:
> On Fri, Feb 05, 2016 at 12:13:26PM -0700, Tyler Baicar wrote:

<snip>

>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>> +{
>> +	struct siginfo info;
>> +
>> +	atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
>> +
>> +	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>> +		 fault_name(esr), esr, addr);
>> +
>> +	info.si_signo = SIGBUS;
>> +	info.si_errno = 0;
>> +	info.si_code  = 0;
>> +	info.si_addr  = (void __user *)addr;
>> +	arm64_notify_die("", regs, &info, esr);
> 
> Surely we don't want to call this if the notifier chain handled the
> exception?
You are correct. Ideally, you should not die if the notifier chain
handled the exception (e.g. via memory fault handling).  However, this
patch was intended as a first step to provide the user with more useful
information about the hardware error (e.g. details of a cache error, bus
error, or memory error that led to the SEA).

The thought was to do what you're suggesting as a next step (i.e. adding
actual recovery mechanisms in the SEA handler). However, there are a
couple of questions enumerated below that I think need more discussion.

First, you need a way to get information returned from the notifier
chain to understand whether or not it recovered from the error. (If this
is easier than I'm making it out to be, please set me straight here, as it
was not clear to me at first glance how to do that.)
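
For reference, the kernel's notifier API does hand a result back to the
caller: atomic_notifier_call_chain() returns the value returned by the last
notifier callback that ran, and the walk stops early once a callback sets
NOTIFY_STOP_MASK. The sketch below is a user-space model of that behaviour,
not kernel code. The NOTIFY_* values mirror include/linux/notifier.h;
struct notifier, call_chain(), the two callbacks and sea_was_handled() are
hypothetical stand-ins introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000  /* don't care */
#define NOTIFY_OK        0x0001  /* suits me */
#define NOTIFY_STOP_MASK 0x8000  /* don't call further */
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical user-space stand-in for struct notifier_block. */
struct notifier {
	int (*call)(unsigned long event, void *data);
	struct notifier *next;
};

/*
 * Walk the chain the way the kernel's notifier_call_chain() does:
 * remember the most recent return value and stop early once a
 * callback sets NOTIFY_STOP_MASK.
 */
static int call_chain(struct notifier *head, unsigned long event, void *data)
{
	int ret = NOTIFY_DONE;
	struct notifier *nb;

	for (nb = head; nb != NULL; nb = nb->next) {
		assert(nb->call != NULL);
		ret = nb->call(event, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Hypothetical callback that does not recognise the event. */
static int ignore_event(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_DONE;
}

/* Hypothetical callback that claims the event and stops the chain. */
static int recover_event(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_STOP;
}

/* Returns 1 if some handler claimed the event and stopped the chain. */
static int sea_was_handled(struct notifier *head)
{
	return call_chain(head, 0, NULL) == NOTIFY_STOP;
}
```

Because only the last return value survives, a handler that recovers from
the error would have to return NOTIFY_STOP (or be the last entry on the
chain) for the caller to see it. A handler that returns plain NOTIFY_OK can
be masked by a later NOTIFY_DONE.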

Second, you need a way to kill/abort the thread that encountered this
error, which (I assume) would only be a valid/possible thing to do if it
was a user thread that encountered the hardware error.

For example, let's say we encounter an SEA due to a memory error that
was successfully handled by the memory fault handling code (e.g. offline
a page owned by some user application).  Since this is a synchronous
error that may have occurred either on a load, store, or instruction
fetch, the SEA handler must also know to kill the user thread that
encountered that hardware error.  It is not clear to me how we do that
cleanly, and what the repercussions would be. Would it get handled
naturally after the page has become invalid (e.g. it would just result
in a translation fault when attempting to continue the thread, existing
kernel software error handling takes it from there)?

Also, keep in mind that our current assumption is that *all* kernel data
and threads should be considered critical, and any
corruption/termination of kernel data/threads should always be treated
as fatal.  Please let us know if you disagree.

Harb
-- 
Qualcomm Technologies, Inc.
on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [PATCH V1 5/6] arm64: exception: handle instruction abort at current EL
       [not found]       ` <20160210180210.GT1052-5wv7dgnIgG8@public.gmane.org>
@ 2016-02-11  3:03         ` Abdulhamid, Harb
  0 siblings, 0 replies; 16+ messages in thread
From: Abdulhamid, Harb @ 2016-02-11  3:03 UTC (permalink / raw)
  To: Will Deacon, Tyler Baicar
  Cc: fu.wei-QSEj5FYQhm4dnm+yROfE0A, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	rruigrok-sgV2jX0FEOL9JmXXK+q4OQ, ahs3-H+wXaHxf7aLQT0dZR+AlfA,
	Catalin Marinas, Rafael J. Wysocki, Len Brown, Matt Fleming,
	Robert Moore, Lv Zheng,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, devel-E0kO6a4B6psdnm+yROfE0A,
	Naveen Kaje

On 2/10/2016 1:02 PM, Will Deacon wrote:
> On Fri, Feb 05, 2016 at 12:13:27PM -0700, Tyler Baicar wrote:
>> Add a handler for instruction aborts at the current EL
>> (ESR_ELx_EC_IABT_CUR) so they are no longer handled in el1_inv.
>> This allows firmware first handling for possible SEA
>> (Synchronous External Abort) caused instruction abort at
>> current EL.
>>
>> Signed-off-by: Tyler Baicar <tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
>> Signed-off-by: Naveen Kaje <nkaje-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
>> ---
>>  arch/arm64/kernel/entry.S | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index 1f7f5a2..6b7fb14 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -336,6 +336,8 @@ el1_sync:
>>  	lsr	x24, x1, #ESR_ELx_EC_SHIFT	// exception class
>>  	cmp	x24, #ESR_ELx_EC_DABT_CUR	// data abort in EL1
>>  	b.eq	el1_da
>> +	cmp	x24, #ESR_ELx_EC_IABT_CUR	// instruction abort in EL1
>> +	b.eq	el1_ia
>>  	cmp	x24, #ESR_ELx_EC_SYS64		// configurable trap
>>  	b.eq	el1_undef
>>  	cmp	x24, #ESR_ELx_EC_SP_ALIGN	// stack alignment exception
>> @@ -363,6 +365,23 @@ el1_da:
>>  	// disable interrupts before pulling preserved data off the stack
>>  	disable_irq
>>  	kernel_exit 1
>> +el1_ia:
>> +	/*
>> +	 * Instruction abort handling
>> +	 */
>> +	mrs	x0, far_el1
>> +	enable_dbg
>> +	// re-enable interrupts if they were enabled in the aborted context
>> +	tbnz	x23, #7, 1f			// PSR_I_BIT
>> +	enable_irq
>> +1:
>> +	orr	x1, x1, #1 << 24		// use reserved ISS bit for instruction aborts
> 
> Do we actually need to set this bit (ESR_LNX_EXEC) for aborts from EL1?
> If not, could we just use the same entry code as el1_da?
> 
This is based on what you already do in el0_ia, so the assumption was
that it would be necessary for el1_ia.  Here is an example call flow to
help illustrate why I think this would be needed:
--> el1_ia
  --> do_mem_abort(): determines it's a translation fault
    --> do_page_fault(): sets VM_EXEC in vm_flags based on ESR_LNX_EXEC

I admit that I have no idea how the VM_EXEC flag would be used later on
in the guts of the kernel page fault handling code, but we assumed there
is some need to differentiate between instruction and data faults based
on the existence of this flag.

Are you suggesting that this flag does not get used, or is it not really
needed?  If you think this flag adds no value, then we'll do whatever
you suggest.
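
A rough user-space model of how the flag is consumed may help the
discussion. It assumes do_page_fault() chooses which VMA permission to
check from the ESR roughly as sketched here, and that the fault is rejected
when the VMA grants none of the required permissions. The function names
are hypothetical, and the default permission mask is written from memory of
arch/arm64/mm/fault.c, so treat it as an assumption rather than a quote.

```c
#include <assert.h>

/* VMA permission bits, mirroring include/linux/mm.h. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

/*
 * ESR bits: WnR (bit 6) and CM (bit 8) are architectural ISS bits.
 * Bit 24 is the software-defined ESR_LNX_EXEC that el0_ia sets, and
 * that the proposed el1_ia would set as well.
 */
#define ESR_ELx_WNR  (1U << 6)
#define ESR_ELx_CM   (1U << 8)
#define ESR_LNX_EXEC (1U << 24)

/* Hypothetical helper: which VMA permission must be present for this fault. */
static unsigned long required_vm_flags(unsigned int esr)
{
	/* Assumed default: any access permission is acceptable. */
	unsigned long vm_flags = VM_READ | VM_WRITE | VM_EXEC;

	if (esr & ESR_LNX_EXEC)
		vm_flags = VM_EXEC;	/* instruction fetch needs exec */
	else if ((esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM))
		vm_flags = VM_WRITE;	/* store (not cache maintenance) needs write */

	return vm_flags;
}

/* Hypothetical helper: would the fault pass the VMA permission check? */
static int access_allowed(unsigned long vma_flags, unsigned int esr)
{
	return (vma_flags & required_vm_flags(esr)) != 0;
}
```

Under these assumptions the flag only matters when the abort reaches the
VMA permission check: an instruction fetch from a mapping without VM_EXEC
is refused, while the same ESR without bit 24 would be accepted as an
ordinary read.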

Harb
-- 
Qualcomm Technologies, Inc.
on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [PATCH V1 6/6] acpi: apei: handle SEA notification type for ARMv8
  2016-02-10 18:03     ` Will Deacon
@ 2016-02-11  3:22       ` Abdulhamid, Harb
       [not found]       ` <20160210180332.GU1052-5wv7dgnIgG8@public.gmane.org>
  1 sibling, 0 replies; 16+ messages in thread
From: Abdulhamid, Harb @ 2016-02-11  3:22 UTC (permalink / raw)
  To: Will Deacon, Tyler Baicar
  Cc: fu.wei, timur, rruigrok, ahs3, Catalin Marinas,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel, linux-kernel, linux-acpi, linux-efi,
	devel, Naveen Kaje, Jonathan (Zhixiong) Zhang

On 2/10/2016 1:03 PM, Will Deacon wrote:
> On Fri, Feb 05, 2016 at 12:13:28PM -0700, Tyler Baicar wrote:

>> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +static inline int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +	return -ENOTSUPP;
>> +}
>> +
>> +static inline void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +}
> 
> Why are these getting called if !CONFIG_HAVE_ACPI_APEI_SEA?
> 
This was added to catch firmware bugs (i.e. bad ACPI tables).  Since
"SEA" is a valid GHES notify type in ACPI, it's just a number in an ACPI
table.  If someone incorrectly set SEA as their notify type in their
HEST table on an Intel system, this would catch that error here.
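
The stub pattern in the patch can be modelled in user space as follows.
This is only a sketch: sea_add(), probe(), the NOTIFY_TYPE_* names and
MODEL_ENOTSUPP are hypothetical stand-ins for ghes_sea_add(), ghes_probe(),
the ACPI_HEST_NOTIFY_* constants and the kernel's ENOTSUPP. With
CONFIG_HAVE_ACPI_APEI_SEA left undefined, the call site still compiles and
the stub reports the unsupported notify type at run time.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative values only; the real constants live in kernel headers. */
#define MODEL_ENOTSUPP  524
#define NOTIFY_TYPE_SCI 3
#define NOTIFY_TYPE_SEA 8

/* Leave CONFIG_HAVE_ACPI_APEI_SEA undefined to model a build without SEA. */
#ifdef CONFIG_HAVE_ACPI_APEI_SEA
static int sea_add(int source_id)
{
	(void)source_id;
	return 0;	/* real registration would happen here */
}
#else
static int sea_add(int source_id)
{
	/* Firmware advertised SEA, but this build cannot handle it. */
	fprintf(stderr, "GHES: ID: %d, SEA notification is not supported\n",
		source_id);
	return -MODEL_ENOTSUPP;
}
#endif

/* Model of the probe-time switch: no #ifdef is needed at the call site. */
static int probe(int notify_type, int source_id)
{
	switch (notify_type) {
	case NOTIFY_TYPE_SCI:
		return 0;
	case NOTIFY_TYPE_SEA:
		return sea_add(source_id);
	default:
		return -1;	/* unknown notification type */
	}
}
```

The trade-off between the two variants is where the bad table is reported:
with the stub, the SEA case produces its own message; with an #ifdef around
the case label, the same table would be handled by whatever default the
switch already has.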

We may do this with less code by getting rid of the #else (as you
suggest), but we need to add #ifdefs to eliminate the calls to
ghes_sea_add and ghes_sea_remove to avoid compiler errors.  Does the
below change look better?

>>@@ -1093,6 +1168,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>> 		list_add_rcu(&ghes->list, &ghes_sci);
>> 		mutex_unlock(&ghes_list_mutex);
>> 		break;
>>

+#ifdef CONFIG_HAVE_ACPI_APEI_SEA

>>+	case ACPI_HEST_NOTIFY_SEA:
>>+		rc = ghes_sea_add(ghes);
>>+		if (rc)
>>+			goto err_edac_unreg;
>>+		break;

+#endif

...

>>@@ -1135,6 +1215,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>> 			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>> 		mutex_unlock(&ghes_list_mutex);
>> 		break;


+#ifdef CONFIG_HAVE_ACPI_APEI_SEA

>>+	case ACPI_HEST_NOTIFY_SEA:
>>+		ghes_sea_remove(ghes);
>>+		break;

+#endif

Harb
-- 
Qualcomm Technologies, Inc.
on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [PATCH V1 6/6] acpi: apei: handle SEA notification type for ARMv8
       [not found]       ` <20160210180332.GU1052-5wv7dgnIgG8@public.gmane.org>
@ 2016-02-11 22:37         ` Baicar, Tyler
  2016-02-12  9:51           ` Will Deacon
  0 siblings, 1 reply; 16+ messages in thread
From: Baicar, Tyler @ 2016-02-11 22:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: fu.wei-QSEj5FYQhm4dnm+yROfE0A, timur-sgV2jX0FEOL9JmXXK+q4OQ,
	harba-sgV2jX0FEOL9JmXXK+q4OQ, rruigrok-sgV2jX0FEOL9JmXXK+q4OQ,
	ahs3-H+wXaHxf7aLQT0dZR+AlfA, Catalin Marinas, Rafael J. Wysocki,
	Len Brown, Matt Fleming, Robert Moore, Lv Zheng,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, devel-E0kO6a4B6psdnm+yROfE0A,
	Naveen Kaje, Jonathan (Zhixiong) Zhang

Hello Will,

On 2/10/2016 11:03 AM, Will Deacon wrote:
> On Fri, Feb 05, 2016 at 12:13:28PM -0700, Tyler Baicar wrote:
>> ARM APEI extension proposal added SEA (Synchronous External
>> Abort) notification type for ARMv8.
>>
>> Add a new GHES error source handling function for SEA. If an error
>> source's notification type is SEA, then this function can be registered
>> into the SEA exception handler. That way GHES will parse and report
>> SEA exceptions when they occur.
>>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
>> Signed-off-by: Tyler Baicar <tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
>> Signed-off-by: Naveen Kaje <nkaje-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
>> ---
>>   arch/arm64/Kconfig        |  1 +
>>   drivers/acpi/apei/Kconfig | 13 ++++++++
>>   drivers/acpi/apei/ghes.c  | 83 +++++++++++++++++++++++++++++++++++++++++++++++
>>   include/acpi/actbl1.h     |  6 +++-
>>   4 files changed, 102 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 6e4a4f4..236f398 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -6,6 +6,7 @@ config ARM64
>>   	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
>>   	select ARCH_HAS_DEVMEM_IS_ALLOWED
>>   	select HAVE_ACPI_APEI if ACPI
>> +	select HAVE_ACPI_APEI_SEA if ACPI
>>   	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>>   	select ARCH_HAS_ELF_RANDOMIZE
>>   	select ARCH_HAS_GCOV_PROFILE_ALL
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index a60bb00..bfcbb9e 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -7,6 +7,19 @@ config HAVE_ACPI_APEI_NMI
>>   config HAVE_ACPI_APEI_HEST_IA32
>>   	bool
>>   
>> +config HAVE_ACPI_APEI_SEA
>> +	bool "APEI Synchronous External Abort logging/recovering support"
>> +	help
>> +	  This option should be enabled if the system supports
>> +	  firmware first handling of SEA (Synchronous External Abort).
>> +	  SEA happens with certain faults of data abort or instruction
>> +	  abort synchronous exceptions on ARMv8 systems. If a system
>> +	  supports firmware first handling of SEA, the platform analyzes
>> +	  and handles hardware error notifications with SEA, and it may then
>> +	  form a HW error record for the OS to parse and handle. This
>> +	  option allows the OS to look for such HW error record, and
>> +	  take appropriate action.
>> +
>>   config ACPI_APEI
>>   	bool "ACPI Platform Error Interface (APEI)"
>>   	select MISC_FILESYSTEMS
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 6c68100..ed64b97 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -50,6 +50,10 @@
>>   #include <acpi/apei.h>
>>   #include <asm/tlbflush.h>
>>   
>> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
>> +#include <asm/system_misc.h>
>> +#endif
>> +
>>   #include "apei-internal.h"
>>   
>>   #define GHES_PFX	"GHES: "
>> @@ -784,6 +788,62 @@ static struct notifier_block ghes_notifier_sci = {
>>   	.notifier_call = ghes_notify_sci,
>>   };
>>   
>> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
>> +static LIST_HEAD(ghes_sea);
>> +
>> +static int ghes_notify_sea(struct notifier_block *this,
>> +				  unsigned long event, void *data)
>> +{
>> +	struct ghes *ghes;
>> +	int ret = NOTIFY_DONE;
>> +
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
>> +		if (!ghes_proc(ghes))
>> +			ret = NOTIFY_OK;
>> +	}
>> +	rcu_read_unlock();
>> +
>> +	return ret;
>> +}
>> +
>> +static struct notifier_block ghes_notifier_sea = {
>> +	.notifier_call = ghes_notify_sea,
>> +};
>> +
>> +static int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
> Can you just use spin_lock, to be consistent with our other exception
> hooks?
This mutex is being used throughout ghes.c for editing the lists, so I 
think this is the proper (or at least consistent) implementation. This 
mutex was defined specifically for editing the lists according to the 
comment above the mutex definition:

"All error sources notified with SCI shares one notifier function, so 
they need to be linked and checked one by one. This is applied to NMI 
too. RCU is used for these lists, so ghes_list_mutex is only used for 
list changing, not for traversing."

The use of this mutex is identical to the way that SCI and NMI use it 
when adding or deleting from the lists. Should I add to this comment 
that this applies to SEA as well?
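
To make the locking rule concrete, here is a small user-space model of the
add/remove pattern. It is a sketch under stated assumptions: a pthread mutex
stands in for ghes_list_mutex, a counter stands in for the ghes_sea list,
and a flag stands in for registering the SEA notifier. All names are
hypothetical. The RCU read side used by the notifier callback is not
modelled; only the writers, which take the mutex, are shown.

```c
#include <assert.h>
#include <pthread.h>

/* Stands in for ghes_list_mutex: taken only when the list changes. */
static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;

static int nr_sources;		/* stands in for the ghes_sea list */
static int handler_registered;	/* stands in for the SEA notifier hook */

/* Model of ghes_sea_add(): hook the handler when the first source arrives. */
static void source_add(void)
{
	pthread_mutex_lock(&list_mutex);
	if (nr_sources == 0)
		handler_registered = 1;
	nr_sources++;
	pthread_mutex_unlock(&list_mutex);
}

/* Model of ghes_sea_remove(): unhook once the last source is gone. */
static void source_remove(void)
{
	pthread_mutex_lock(&list_mutex);
	assert(nr_sources > 0);
	nr_sources--;
	if (nr_sources == 0)
		handler_registered = 0;
	pthread_mutex_unlock(&list_mutex);
}
```

A sleeping lock is acceptable here because, in this model as in the patch,
additions and removals happen from probe and remove paths rather than from
the exception handler itself.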
>
>> +	if (list_empty(&ghes_sea))
>> +		sea_register_handler_chain(&ghes_notifier_sea);
>> +	list_add_rcu(&ghes->list, &ghes_sea);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	return 0;
>> +}
>> +
>> +static void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_del_rcu(&ghes->list);
>> +	if (list_empty(&ghes_sea))
>> +		sea_unregister_handler_chain(&ghes_notifier_sea);
>> +	mutex_unlock(&ghes_list_mutex);
>> +}
>> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +static inline int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +	return -ENOTSUPP;
>> +}
>> +
>> +static inline void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +}
> Why are these getting called if !CONFIG_HAVE_ACPI_APEI_SEA?
>
> Will
Thanks,
Tyler

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [PATCH V1 6/6] acpi: apei: handle SEA notification type for ARMv8
  2016-02-11 22:37         ` Baicar, Tyler
@ 2016-02-12  9:51           ` Will Deacon
  0 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2016-02-12  9:51 UTC (permalink / raw)
  To: Baicar, Tyler
  Cc: fu.wei, timur, harba, rruigrok, ahs3, Catalin Marinas,
	Rafael J. Wysocki, Len Brown, Matt Fleming, Robert Moore,
	Lv Zheng, linux-arm-kernel, linux-kernel, linux-acpi, linux-efi,
	devel, Naveen Kaje, Jonathan (Zhixiong) Zhang

On Thu, Feb 11, 2016 at 03:37:55PM -0700, Baicar, Tyler wrote:
> On 2/10/2016 11:03 AM, Will Deacon wrote:
> >On Fri, Feb 05, 2016 at 12:13:28PM -0700, Tyler Baicar wrote:
> >>diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> >>index 6c68100..ed64b97 100644
> >>--- a/drivers/acpi/apei/ghes.c
> >>+++ b/drivers/acpi/apei/ghes.c
> >>@@ -50,6 +50,10 @@
> >>  #include <acpi/apei.h>
> >>  #include <asm/tlbflush.h>
> >>+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
> >>+#include <asm/system_misc.h>
> >>+#endif
> >>+
> >>  #include "apei-internal.h"
> >>  #define GHES_PFX	"GHES: "
> >>@@ -784,6 +788,62 @@ static struct notifier_block ghes_notifier_sci = {
> >>  	.notifier_call = ghes_notify_sci,
> >>  };
> >>+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
> >>+static LIST_HEAD(ghes_sea);
> >>+
> >>+static int ghes_notify_sea(struct notifier_block *this,
> >>+				  unsigned long event, void *data)
> >>+{
> >>+	struct ghes *ghes;
> >>+	int ret = NOTIFY_DONE;
> >>+
> >>+	rcu_read_lock();
> >>+	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
> >>+		if (!ghes_proc(ghes))
> >>+			ret = NOTIFY_OK;
> >>+	}
> >>+	rcu_read_unlock();
> >>+
> >>+	return ret;
> >>+}
> >>+
> >>+static struct notifier_block ghes_notifier_sea = {
> >>+	.notifier_call = ghes_notify_sea,
> >>+};
> >>+
> >>+static int ghes_sea_add(struct ghes *ghes)
> >>+{
> >>+	mutex_lock(&ghes_list_mutex);
> >Can you just use spin_lock, to be consistent with our other exception
> >hooks?
> This mutex is being used throughout ghes.c for editing the lists, so I think
> this is the proper (or at least consistent) implementation. This mutex was
> defined specifically for editing the lists according to the comment above
> the mutex definition:
> 
> "All error sources notified with SCI shares one notifier function, so they
> need to be linked and checked one by one. This is applied to NMI too. RCU is
> used for these lists, so ghes_list_mutex is only used for list changing, not
> for traversing."
> 
> The use of this mutex is identical to the way that SCI and NMI use it when
> adding or deleting from the lists. Should I add to this comment that this
> applies to SEA as well?

No, it's fine. I overlooked the fact that this was under drivers/acpi/,
so if you're consistent with other code under there then there's no need
to change anything.

Will

Thread overview: 16+ messages
2016-02-05 19:13 [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Tyler Baicar
2016-02-05 19:13 ` [PATCH V1 1/6] acpi: apei: read ack upon ghes record consumption Tyler Baicar
2016-02-05 19:13 ` [PATCH V1 3/6] efi: parse ARMv8 processor error Tyler Baicar
     [not found] ` <1454699608-22760-1-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-02-05 19:13   ` [PATCH V1 2/6] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1 Tyler Baicar
2016-02-05 19:13   ` [PATCH V1 4/6] arm64: exception: handle Synchronous External Abort Tyler Baicar
2016-02-10 18:03     ` Will Deacon
     [not found]       ` <20160210180344.GV1052-5wv7dgnIgG8@public.gmane.org>
2016-02-11  2:40         ` Abdulhamid, Harb
2016-02-05 19:13   ` [PATCH V1 5/6] arm64: exception: handle instruction abort at current EL Tyler Baicar
2016-02-10 18:02     ` Will Deacon
     [not found]       ` <20160210180210.GT1052-5wv7dgnIgG8@public.gmane.org>
2016-02-11  3:03         ` Abdulhamid, Harb
2016-02-05 19:13 ` [PATCH V1 6/6] acpi: apei: handle SEA notification type for ARMv8 Tyler Baicar
     [not found]   ` <1454699608-22760-7-git-send-email-tbaicar-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2016-02-10 18:03     ` Will Deacon
2016-02-11  3:22       ` Abdulhamid, Harb
     [not found]       ` <20160210180332.GU1052-5wv7dgnIgG8@public.gmane.org>
2016-02-11 22:37         ` Baicar, Tyler
2016-02-12  9:51           ` Will Deacon
2016-02-10 17:44 ` [PATCH V1 0/6] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Will Deacon
