Linux-EDAC Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code
       [not found] <Shiju Jose>
@ 2019-06-17 14:28 ` Shiju Jose
  2019-06-17 14:28   ` [PATCH 1/6] rasdaemon:print non-standard error data if not decoded Shiju Jose
                     ` (6 more replies)
  2019-08-12 10:11 ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
  1 sibling, 7 replies; 19+ messages in thread
From: Shiju Jose @ 2019-06-17 14:28 UTC (permalink / raw)
  To: mchehab, linux-edac, linuxarm; +Cc: Shiju Jose

This patch set add few changes in the non-standard error decoding code and
logging for the HiSilicon HIP08 non-standard H/W errors.

Shiju Jose (6):
  rasdaemon:print non-standard error data if not decoded
  rasdaemon: rearrange HiSilicon HIP07 decoding function table
  rasdaemon: update iteration logic for the non-standard error decoding
    functions
  rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM
    format1
  rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM
    format2
  rasdaemon:add logging HiSilicon HIP08 PCIe local errors

 Makefile.am                |   2 +-
 non-standard-hisi_hip07.c  |  36 +-
 non-standard-hisi_hip08.c  | 855 +++++++++++++++++++++++++++++++++++++++++++++
 ras-non-standard-handler.c |  36 +-
 ras-non-standard-handler.h |   8 +-
 ras-record.c               |  30 +-
 ras-record.h               |  13 +
 7 files changed, 932 insertions(+), 48 deletions(-)
 create mode 100644 non-standard-hisi_hip08.c

-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] rasdaemon:print non-standard error data if not decoded
  2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
@ 2019-06-17 14:28   ` Shiju Jose
  2019-06-17 14:28   ` [PATCH 2/6] rasdaemon: rearrange HiSilicon HIP07 decoding function table Shiju Jose
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-06-17 14:28 UTC (permalink / raw)
  To: mchehab, linux-edac, linuxarm; +Cc: Shiju Jose

This patch change printing non-standard error data
only if not decoded.

Suggested-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 ras-non-standard-handler.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/ras-non-standard-handler.c b/ras-non-standard-handler.c
index 21e6a76..d343a2a 100644
--- a/ras-non-standard-handler.c
+++ b/ras-non-standard-handler.c
@@ -160,20 +160,6 @@ int ras_non_standard_event_handler(struct trace_seq *s,
 	ev.error = pevent_get_field_raw(s, event, "buf", record, &len, 1);
 	if(!ev.error)
 		return -1;
-	len = ev.length;
-	i = 0;
-	line_count = 0;
-	trace_seq_printf(s, " error:\n  %08x: ", i);
-	while(len >= 4) {
-		print_le_hex(s, ev.error, i);
-		i+=4;
-		len-=4;
-		if(++line_count == 4) {
-			trace_seq_printf(s, "\n  %08x: ", i);
-			line_count = 0;
-		} else
-			trace_seq_printf(s, " ");
-	}
 
 	for (count = 0; count < dec_tab_count && !dec_done; count++) {
 		dec_tab = ns_dec_tab[count];
@@ -187,6 +173,23 @@ int ras_non_standard_event_handler(struct trace_seq *s,
 		}
 	}
 
+	if (!dec_done) {
+		len = ev.length;
+		i = 0;
+		line_count = 0;
+		trace_seq_printf(s, " error:\n  %08x: ", i);
+		while (len >= 4) {
+			print_le_hex(s, ev.error, i);
+			i += 4;
+			len -= 4;
+			if (++line_count == 4) {
+				trace_seq_printf(s, "\n  %08x: ", i);
+				line_count = 0;
+			} else
+				trace_seq_printf(s, " ");
+		}
+	}
+
 	/* Insert data into the SGBD */
 #ifdef HAVE_SQLITE3
 	ras_store_non_standard_record(ras, &ev);
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/6] rasdaemon: rearrange HiSilicon HIP07 decoding function table
  2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
  2019-06-17 14:28   ` [PATCH 1/6] rasdaemon:print non-standard error data if not decoded Shiju Jose
@ 2019-06-17 14:28   ` Shiju Jose
  2019-06-17 14:28   ` [PATCH 3/6] rasdaemon: update iteration logic for the non-standard error decoding functions Shiju Jose
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-06-17 14:28 UTC (permalink / raw)
  To: mchehab, linux-edac, linuxarm; +Cc: Shiju Jose

This patch rearranges the decoding function table for the
HiSilicon HIP07 non-standard errors.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 non-standard-hisi_hip07.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/non-standard-hisi_hip07.c b/non-standard-hisi_hip07.c
index 3e9dabd..19a5c47 100644
--- a/non-standard-hisi_hip07.c
+++ b/non-standard-hisi_hip07.c
@@ -24,20 +24,6 @@
 #define HISI_SAS_VALID_ERR_TYPE       BIT(2)
 #define HISI_SAS_VALID_AXI_ERR_INFO   BIT(3)
 
-static int decode_hip07_sas_error(struct trace_seq *s, const void *error);
-static int decode_hip07_hns_error(struct trace_seq *s, const void *error);
-
-struct ras_ns_dec_tab hisi_ns_dec_tab[] = {
-	{
-		.sec_type = "daffd8146eba4d8c8a91bc9bbf4aa301",
-		.decode = decode_hip07_sas_error,
-	},
-	{
-		.sec_type = "fbc2d923ea7a453dab132949f5af9e53",
-		.decode = decode_hip07_hns_error,
-	},
-};
-
 struct hisi_sas_err_sec {
 	uint64_t   val_bits;
 	uint64_t   physical_addr;
@@ -138,6 +124,18 @@ static int decode_hip07_hns_error(struct trace_seq *s, const void *error)
 {
 	return 0;
 }
+
+struct ras_ns_dec_tab hisi_ns_dec_tab[] = {
+	{
+		.sec_type = "daffd8146eba4d8c8a91bc9bbf4aa301",
+		.decode = decode_hip07_sas_error,
+	},
+	{
+		.sec_type = "fbc2d923ea7a453dab132949f5af9e53",
+		.decode = decode_hip07_hns_error,
+	},
+};
+
 __attribute__((constructor))
 static void hip07_init(void)
 {
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/6] rasdaemon: update iteration logic for the non-standard error decoding functions
  2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
  2019-06-17 14:28   ` [PATCH 1/6] rasdaemon:print non-standard error data if not decoded Shiju Jose
  2019-06-17 14:28   ` [PATCH 2/6] rasdaemon: rearrange HiSilicon HIP07 decoding function table Shiju Jose
@ 2019-06-17 14:28   ` Shiju Jose
  2019-06-17 14:28   ` [PATCH 4/6] rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM format1 Shiju Jose
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-06-17 14:28 UTC (permalink / raw)
  To: mchehab, linux-edac, linuxarm; +Cc: Shiju Jose

This patch updates the iteration logic for the non-standard
error decoding functions.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 non-standard-hisi_hip07.c  | 2 +-
 ras-non-standard-handler.c | 2 +-
 ras-non-standard-handler.h | 1 -
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/non-standard-hisi_hip07.c b/non-standard-hisi_hip07.c
index 19a5c47..bb2576e 100644
--- a/non-standard-hisi_hip07.c
+++ b/non-standard-hisi_hip07.c
@@ -134,11 +134,11 @@ struct ras_ns_dec_tab hisi_ns_dec_tab[] = {
 		.sec_type = "fbc2d923ea7a453dab132949f5af9e53",
 		.decode = decode_hip07_hns_error,
 	},
+	{ /* sentinel */ }
 };
 
 __attribute__((constructor))
 static void hip07_init(void)
 {
-	hisi_ns_dec_tab[0].len = ARRAY_SIZE(hisi_ns_dec_tab);
 	register_ns_dec_tab(hisi_ns_dec_tab);
 }
diff --git a/ras-non-standard-handler.c b/ras-non-standard-handler.c
index d343a2a..392bb27 100644
--- a/ras-non-standard-handler.c
+++ b/ras-non-standard-handler.c
@@ -163,7 +163,7 @@ int ras_non_standard_event_handler(struct trace_seq *s,
 
 	for (count = 0; count < dec_tab_count && !dec_done; count++) {
 		dec_tab = ns_dec_tab[count];
-		for (i = 0; i < dec_tab[0].len; i++) {
+		for (i = 0; dec_tab[i].decode; i++) {
 			if (uuid_le_cmp(ev.sec_type,
 					dec_tab[i].sec_type) == 0) {
 				dec_tab[i].decode(s, ev.error);
diff --git a/ras-non-standard-handler.h b/ras-non-standard-handler.h
index b9e9fb1..b2c9743 100644
--- a/ras-non-standard-handler.h
+++ b/ras-non-standard-handler.h
@@ -23,7 +23,6 @@
 typedef struct ras_ns_dec_tab {
 	const char *sec_type;
 	int (*decode)(struct trace_seq *s, const void *err);
-	size_t len;
 } *p_ns_dec_tab;
 
 int ras_non_standard_event_handler(struct trace_seq *s,
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 4/6] rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM format1
  2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
                     ` (2 preceding siblings ...)
  2019-06-17 14:28   ` [PATCH 3/6] rasdaemon: update iteration logic for the non-standard error decoding functions Shiju Jose
@ 2019-06-17 14:28   ` Shiju Jose
  2019-06-17 14:28   ` [PATCH 5/6] rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM format2 Shiju Jose
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-06-17 14:28 UTC (permalink / raw)
  To: mchehab, linux-edac, linuxarm; +Cc: Shiju Jose

This patch adds logging the HiSilicon HIP08 H/W errors reported
in the non-standard OEM format1.
These errors are from the H/W modules MN, PLL, SLLC, AA, SIOE,
POE, DISP, LPC, SAS and SATA.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 Makefile.am                |   2 +-
 non-standard-hisi_hip07.c  |   8 +-
 non-standard-hisi_hip08.c  | 332 +++++++++++++++++++++++++++++++++++++++++++++
 ras-non-standard-handler.c |   3 +-
 ras-non-standard-handler.h |   7 +-
 ras-record.c               |  30 ++--
 ras-record.h               |  13 ++
 7 files changed, 378 insertions(+), 17 deletions(-)
 create mode 100644 non-standard-hisi_hip08.c

diff --git a/Makefile.am b/Makefile.am
index f036ffd..3d89672 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -49,7 +49,7 @@ if WITH_ABRT_REPORT
    rasdaemon_SOURCES += ras-report.c
 endif
 if WITH_HISI_NS_DECODE
-   rasdaemon_SOURCES += non-standard-hisi_hip07.c
+   rasdaemon_SOURCES += non-standard-hisi_hip07.c non-standard-hisi_hip08.c
 endif
 rasdaemon_LDADD = -lpthread $(SQLITE3_LIBS) libtrace/libtrace.a
 
diff --git a/non-standard-hisi_hip07.c b/non-standard-hisi_hip07.c
index bb2576e..7f58fb3 100644
--- a/non-standard-hisi_hip07.c
+++ b/non-standard-hisi_hip07.c
@@ -87,7 +87,9 @@ static char *sas_axi_err_type(int etype)
 	return "unknown error";
 }
 
-static int decode_hip07_sas_error(struct trace_seq *s, const void *error)
+static int decode_hip07_sas_error(struct ras_events *ras,
+				  struct ras_ns_dec_tab *dec_tab,
+				  struct trace_seq *s, const void *error)
 {
 	char buf[1024];
 	char *p = buf;
@@ -120,7 +122,9 @@ static int decode_hip07_sas_error(struct trace_seq *s, const void *error)
 	return 0;
 }
 
-static int decode_hip07_hns_error(struct trace_seq *s, const void *error)
+static int decode_hip07_hns_error(struct ras_events *ras,
+				  struct ras_ns_dec_tab *dec_tab,
+				  struct trace_seq *s, const void *error)
 {
 	return 0;
 }
diff --git a/non-standard-hisi_hip08.c b/non-standard-hisi_hip08.c
new file mode 100644
index 0000000..240e832
--- /dev/null
+++ b/non-standard-hisi_hip08.c
@@ -0,0 +1,332 @@
+/*
+ * Copyright (c) 2019 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include "ras-record.h"
+#include "ras-logger.h"
+#include "ras-report.h"
+#include "ras-non-standard-handler.h"
+
+/* HISI OEM error definitions */
+/* HISI OEM format1 error definitions */
+#define HISI_OEM_MODULE_ID_MN	0
+#define HISI_OEM_MODULE_ID_PLL	1
+#define HISI_OEM_MODULE_ID_SLLC	2
+#define HISI_OEM_MODULE_ID_AA	3
+#define HISI_OEM_MODULE_ID_SIOE	4
+#define HISI_OEM_MODULE_ID_POE	5
+#define HISI_OEM_MODULE_ID_DISP	8
+#define HISI_OEM_MODULE_ID_LPC	9
+#define HISI_OEM_MODULE_ID_SAS	15
+#define HISI_OEM_MODULE_ID_SATA	16
+
+#define HISI_OEM_VALID_SOC_ID		BIT(0)
+#define HISI_OEM_VALID_SOCKET_ID	BIT(1)
+#define HISI_OEM_VALID_NIMBUS_ID	BIT(2)
+#define HISI_OEM_VALID_MODULE_ID	BIT(3)
+#define HISI_OEM_VALID_SUB_MODULE_ID	BIT(4)
+#define HISI_OEM_VALID_ERR_SEVERITY	BIT(5)
+
+#define HISI_OEM_TYPE1_VALID_ERR_MISC_0	BIT(6)
+#define HISI_OEM_TYPE1_VALID_ERR_MISC_1	BIT(7)
+#define HISI_OEM_TYPE1_VALID_ERR_MISC_2	BIT(8)
+#define HISI_OEM_TYPE1_VALID_ERR_MISC_3	BIT(9)
+#define HISI_OEM_TYPE1_VALID_ERR_MISC_4	BIT(10)
+#define HISI_OEM_TYPE1_VALID_ERR_ADDR	BIT(11)
+
+struct hisi_oem_type1_err_sec {
+	uint32_t   val_bits;
+	uint8_t    version;
+	uint8_t    soc_id;
+	uint8_t    socket_id;
+	uint8_t    nimbus_id;
+	uint8_t    module_id;
+	uint8_t    sub_module_id;
+	uint8_t    err_severity;
+	uint8_t    reserv;
+	uint32_t   err_misc_0;
+	uint32_t   err_misc_1;
+	uint32_t   err_misc_2;
+	uint32_t   err_misc_3;
+	uint32_t   err_misc_4;
+	uint64_t   err_addr;
+};
+
+enum hisi_oem_data_type {
+	hisi_oem_data_type_int,
+	hisi_oem_data_type_int64,
+	hisi_oem_data_type_text,
+};
+
+enum {
+	hip08_oem_type1_field_id,
+	hip08_oem_type1_field_version,
+	hip08_oem_type1_field_soc_id,
+	hip08_oem_type1_field_socket_id,
+	hip08_oem_type1_field_nimbus_id,
+	hip08_oem_type1_field_module_id,
+	hip08_oem_type1_field_sub_module_id,
+	hip08_oem_type1_field_err_sev,
+	hip08_oem_type1_field_err_misc_0,
+	hip08_oem_type1_field_err_misc_1,
+	hip08_oem_type1_field_err_misc_2,
+	hip08_oem_type1_field_err_misc_3,
+	hip08_oem_type1_field_err_misc_4,
+	hip08_oem_type1_field_err_addr,
+};
+
+/* helper functions */
+static char *err_severity(uint8_t err_sev)
+{
+	switch (err_sev) {
+	case 0: return "recoverable";
+	case 1: return "fatal";
+	case 2: return "corrected";
+	case 3: return "none";
+	}
+	return "unknown";
+}
+
+static char *oem_type1_module_name(uint8_t module_id)
+{
+	switch (module_id) {
+	case HISI_OEM_MODULE_ID_MN: return "MN";
+	case HISI_OEM_MODULE_ID_PLL: return "PLL";
+	case HISI_OEM_MODULE_ID_SLLC: return "SLLC";
+	case HISI_OEM_MODULE_ID_AA: return "AA";
+	case HISI_OEM_MODULE_ID_SIOE: return "SIOE";
+	case HISI_OEM_MODULE_ID_POE: return "POE";
+	case HISI_OEM_MODULE_ID_DISP: return "DISP";
+	case HISI_OEM_MODULE_ID_LPC: return "LPC";
+	case HISI_OEM_MODULE_ID_SAS: return "SAS";
+	case HISI_OEM_MODULE_ID_SATA: return "SATA";
+	}
+	return "unknown";
+}
+
+#ifdef HAVE_SQLITE3
+static const struct db_fields hip08_oem_type1_event_fields[] = {
+	{ .name = "id",			.type = "INTEGER PRIMARY KEY" },
+	{ .name = "version",		.type = "INTEGER" },
+	{ .name = "soc_id",		.type = "INTEGER" },
+	{ .name = "socket_id",		.type = "INTEGER" },
+	{ .name = "nimbus_id",		.type = "INTEGER" },
+	{ .name = "module_id",		.type = "TEXT" },
+	{ .name = "sub_module_id",	.type = "INTEGER" },
+	{ .name = "err_severity",	.type = "TEXT" },
+	{ .name = "err_misc_0",		.type = "INTEGER" },
+	{ .name = "err_misc_1",		.type = "INTEGER" },
+	{ .name = "err_misc_2",		.type = "INTEGER" },
+	{ .name = "err_misc_3",		.type = "INTEGER" },
+	{ .name = "err_misc_4",		.type = "INTEGER" },
+	{ .name = "err_addr",		.type = "INTEGER" },
+};
+
+static const struct db_table_descriptor hip08_oem_type1_event_tab = {
+	.name = "hip08_oem_type1_event",
+	.fields = hip08_oem_type1_event_fields,
+	.num_fields = ARRAY_SIZE(hip08_oem_type1_event_fields),
+};
+
+static void record_vendor_data(struct ras_ns_dec_tab *dec_tab,
+			       enum hisi_oem_data_type data_type,
+			       int id, int64_t data, const char *text)
+{
+	switch (data_type) {
+	case hisi_oem_data_type_int:
+		sqlite3_bind_int(dec_tab->stmt_dec_record, id, data);
+		break;
+	case hisi_oem_data_type_int64:
+		sqlite3_bind_int64(dec_tab->stmt_dec_record, id, data);
+		break;
+	case hisi_oem_data_type_text:
+		sqlite3_bind_text(dec_tab->stmt_dec_record, id, text, -1, NULL);
+		break;
+	default:
+		break;
+	}
+}
+
+static int step_vendor_data_tab(struct ras_ns_dec_tab *dec_tab, char *name)
+{
+	int rc;
+
+	rc = sqlite3_step(dec_tab->stmt_dec_record);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to do %s step on sqlite: error = %d\n", name, rc);
+
+	rc = sqlite3_reset(dec_tab->stmt_dec_record);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to reset %s on sqlite: error = %d\n", name, rc);
+
+	rc = sqlite3_clear_bindings(dec_tab->stmt_dec_record);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to clear bindings %s on sqlite: error = %d\n",
+		    name, rc);
+
+	return rc;
+}
+#else
+static void record_vendor_data(struct ras_ns_dec_tab *dec_tab,
+			       enum hisi_oem_data_type data_type,
+			       int id, int64_t data, const char *text)
+{ }
+
+static int step_vendor_data_tab(struct ras_ns_dec_tab *dec_tab, char *name)
+{
+	return 0;
+}
+#endif
+
+/* error data decoding functions */
+static int decode_hip08_oem_type1_error(struct ras_events *ras,
+					struct ras_ns_dec_tab *dec_tab,
+					struct trace_seq *s, const void *error)
+{
+	const struct hisi_oem_type1_err_sec *err = error;
+	char buf[1024];
+	char *p = buf;
+
+	if (err->val_bits == 0) {
+		trace_seq_printf(s, "%s: no valid error information\n",
+				 __func__);
+		return -1;
+	}
+
+#ifdef HAVE_SQLITE3
+	if (!dec_tab->stmt_dec_record) {
+		if (ras_mc_add_vendor_table(ras, &dec_tab->stmt_dec_record,
+					    &hip08_oem_type1_event_tab)
+			!= SQLITE_OK) {
+			trace_seq_printf(s,
+					"create sql hip08_oem_type1_event_tab fail\n");
+			return -1;
+		}
+	}
+#endif
+
+	p += sprintf(p, "[ ");
+	p += sprintf(p, "Table version=%d ", err->version);
+	record_vendor_data(dec_tab, hisi_oem_data_type_int,
+			   hip08_oem_type1_field_version, err->version, NULL);
+
+	if (err->val_bits & HISI_OEM_VALID_SOC_ID) {
+		p += sprintf(p, "SOC ID=%d ", err->soc_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type1_field_soc_id,
+				   err->soc_id, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_SOCKET_ID) {
+		p += sprintf(p, "socket ID=%d ", err->socket_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type1_field_socket_id,
+				   err->socket_id, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_NIMBUS_ID) {
+		p += sprintf(p, "nimbus ID=%d ", err->nimbus_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type1_field_nimbus_id,
+				   err->nimbus_id, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_MODULE_ID) {
+		p += sprintf(p, "module=%s-",
+			     oem_type1_module_name(err->module_id));
+		record_vendor_data(dec_tab, hisi_oem_data_type_text,
+				   hip08_oem_type1_field_module_id,
+				   0, oem_type1_module_name(err->module_id));
+		if (err->val_bits & HISI_OEM_VALID_SUB_MODULE_ID) {
+			p += sprintf(p, "%d ", err->sub_module_id);
+			record_vendor_data(dec_tab, hisi_oem_data_type_int,
+					   hip08_oem_type1_field_sub_module_id,
+					   err->sub_module_id, NULL);
+		}
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_ERR_SEVERITY) {
+		p += sprintf(p, "error severity=%s ",
+			     err_severity(err->err_severity));
+		record_vendor_data(dec_tab, hisi_oem_data_type_text,
+				   hip08_oem_type1_field_err_sev,
+				   0, err_severity(err->err_severity));
+	}
+
+	p += sprintf(p, "]");
+	trace_seq_printf(s, "\nHISI HIP08: OEM Type-1 Error\n");
+	trace_seq_printf(s, "%s\n", buf);
+
+	trace_seq_printf(s, "Reg Dump:\n");
+	if (err->val_bits & HISI_OEM_TYPE1_VALID_ERR_MISC_0) {
+		trace_seq_printf(s, "ERR_MISC0=0x%x\n", err->err_misc_0);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type1_field_err_misc_0,
+				   err->err_misc_0, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE1_VALID_ERR_MISC_1) {
+		trace_seq_printf(s, "ERR_MISC1=0x%x\n", err->err_misc_1);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type1_field_err_misc_1,
+				   err->err_misc_1, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE1_VALID_ERR_MISC_2) {
+		trace_seq_printf(s, "ERR_MISC2=0x%x\n", err->err_misc_2);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type1_field_err_misc_2,
+				   err->err_misc_2, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE1_VALID_ERR_MISC_3) {
+		trace_seq_printf(s, "ERR_MISC3=0x%x\n", err->err_misc_3);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type1_field_err_misc_3,
+				   err->err_misc_3, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE1_VALID_ERR_MISC_4) {
+		trace_seq_printf(s, "ERR_MISC4=0x%x\n", err->err_misc_4);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type1_field_err_misc_4,
+				   err->err_misc_4, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE1_VALID_ERR_ADDR) {
+		trace_seq_printf(s, "ERR_ADDR=0x%p\n", (void *)err->err_addr);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int64,
+				   hip08_oem_type1_field_err_addr,
+				   err->err_addr, NULL);
+	}
+
+	step_vendor_data_tab(dec_tab, "hip08_oem_type1_event_tab");
+
+	return 0;
+}
+
+struct ras_ns_dec_tab hip08_ns_oem_tab[] = {
+	{
+		.sec_type = "1f8161e155d641e6bd107afd1dc5f7c5",
+		.decode = decode_hip08_oem_type1_error,
+	},
+	{ /* sentinel */ }
+};
+
+__attribute__((constructor))
+static void hip08_init(void)
+{
+	register_ns_dec_tab(hip08_ns_oem_tab);
+}
diff --git a/ras-non-standard-handler.c b/ras-non-standard-handler.c
index 392bb27..4eda80b 100644
--- a/ras-non-standard-handler.c
+++ b/ras-non-standard-handler.c
@@ -166,7 +166,8 @@ int ras_non_standard_event_handler(struct trace_seq *s,
 		for (i = 0; dec_tab[i].decode; i++) {
 			if (uuid_le_cmp(ev.sec_type,
 					dec_tab[i].sec_type) == 0) {
-				dec_tab[i].decode(s, ev.error);
+				dec_tab[i].decode(ras, &dec_tab[i],
+						  s, ev.error);
 				dec_done = true;
 				break;
 			}
diff --git a/ras-non-standard-handler.h b/ras-non-standard-handler.h
index b2c9743..a7e48a3 100644
--- a/ras-non-standard-handler.h
+++ b/ras-non-standard-handler.h
@@ -22,7 +22,12 @@
 
 typedef struct ras_ns_dec_tab {
 	const char *sec_type;
-	int (*decode)(struct trace_seq *s, const void *err);
+	int (*decode)(struct ras_events *ras, struct ras_ns_dec_tab *dec_tab,
+		      struct trace_seq *s, const void *err);
+#ifdef HAVE_SQLITE3
+#include <sqlite3.h>
+	sqlite3_stmt *stmt_dec_record;
+#endif
 } *p_ns_dec_tab;
 
 int ras_non_standard_event_handler(struct trace_seq *s,
diff --git a/ras-record.c b/ras-record.c
index 4c8b55b..b212607 100644
--- a/ras-record.c
+++ b/ras-record.c
@@ -38,17 +38,6 @@
 
 #define ARRAY_SIZE(x) (sizeof(x)/sizeof(*(x)))
 
-struct db_fields {
-	char *name;
-	char *type;
-};
-
-struct db_table_descriptor {
-	char			*name;
-	const struct db_fields	*fields;
-	size_t			num_fields;
-};
-
 /*
  * Table and functions to handle ras:mc_event
  */
@@ -511,7 +500,7 @@ static int ras_mc_create_table(struct sqlite3_priv *priv,
 {
 	const struct db_fields *field;
 	char sql[1024], *p = sql, *end = sql + sizeof(sql);
-	int i,rc;
+	int i, rc;
 
 	p += snprintf(p, end - p, "CREATE TABLE IF NOT EXISTS %s (",
 		      db_tab->name);
@@ -538,6 +527,23 @@ static int ras_mc_create_table(struct sqlite3_priv *priv,
 	return rc;
 }
 
+int ras_mc_add_vendor_table(struct ras_events *ras,
+			    sqlite3_stmt **stmt,
+			    const struct db_table_descriptor *db_tab)
+{
+	int rc;
+	struct sqlite3_priv *priv = ras->db_priv;
+
+	if (!priv)
+		return -1;
+
+	rc = ras_mc_create_table(priv, db_tab);
+	if (rc == SQLITE_OK)
+		rc = ras_mc_prepare_stmt(priv, stmt, db_tab);
+
+	return rc;
+}
+
 int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras)
 {
 	int rc;
diff --git a/ras-record.h b/ras-record.h
index 2183167..432a571 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -119,7 +119,20 @@ struct sqlite3_priv {
 #endif
 };
 
+struct db_fields {
+	char *name;
+	char *type;
+};
+
+struct db_table_descriptor {
+	char                    *name;
+	const struct db_fields  *fields;
+	size_t                  num_fields;
+};
+
 int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras);
+int ras_mc_add_vendor_table(struct ras_events *ras, sqlite3_stmt **stmt,
+			    const struct db_table_descriptor *db_tab);
 int ras_store_mc_event(struct ras_events *ras, struct ras_mc_event *ev);
 int ras_store_aer_event(struct ras_events *ras, struct ras_aer_event *ev);
 int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev);
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 5/6] rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM format2
  2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
                     ` (3 preceding siblings ...)
  2019-06-17 14:28   ` [PATCH 4/6] rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM format1 Shiju Jose
@ 2019-06-17 14:28   ` Shiju Jose
  2019-06-17 14:28   ` [PATCH 6/6] rasdaemon:add logging HiSilicon HIP08 PCIe local errors Shiju Jose
  2019-06-21 18:42   ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Mauro Carvalho Chehab
  6 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-06-17 14:28 UTC (permalink / raw)
  To: mchehab, linux-edac, linuxarm; +Cc: Shiju Jose

This patch adds logging the HiSilicon HIP08 H/W errors reported
in the non-standard OEM format2.
These errors are from the H/W modules SMMU, HHA, HLLC, PA and DDRC.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 non-standard-hisi_hip08.c | 300 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 300 insertions(+)

diff --git a/non-standard-hisi_hip08.c b/non-standard-hisi_hip08.c
index 240e832..6fe5cbe 100644
--- a/non-standard-hisi_hip08.c
+++ b/non-standard-hisi_hip08.c
@@ -43,6 +43,20 @@
 #define HISI_OEM_TYPE1_VALID_ERR_MISC_4	BIT(10)
 #define HISI_OEM_TYPE1_VALID_ERR_ADDR	BIT(11)
 
+/* HISI OEM format2 error definitions */
+#define HISI_OEM_MODULE_ID_SMMU	0
+#define HISI_OEM_MODULE_ID_HHA	1
+#define HISI_OEM_MODULE_ID_HLLC	2
+#define HISI_OEM_MODULE_ID_PA	3
+#define HISI_OEM_MODULE_ID_DDRC	4
+
+#define HISI_OEM_TYPE2_VALID_ERR_FR	BIT(6)
+#define HISI_OEM_TYPE2_VALID_ERR_CTRL	BIT(7)
+#define HISI_OEM_TYPE2_VALID_ERR_STATUS	BIT(8)
+#define HISI_OEM_TYPE2_VALID_ERR_ADDR	BIT(9)
+#define HISI_OEM_TYPE2_VALID_ERR_MISC_0	BIT(10)
+#define HISI_OEM_TYPE2_VALID_ERR_MISC_1	BIT(11)
+
 struct hisi_oem_type1_err_sec {
 	uint32_t   val_bits;
 	uint8_t    version;
@@ -61,6 +75,30 @@ struct hisi_oem_type1_err_sec {
 	uint64_t   err_addr;
 };
 
+struct hisi_oem_type2_err_sec {
+	uint32_t   val_bits;
+	uint8_t    version;
+	uint8_t    soc_id;
+	uint8_t    socket_id;
+	uint8_t    nimbus_id;
+	uint8_t    module_id;
+	uint8_t    sub_module_id;
+	uint8_t    err_severity;
+	uint8_t    reserv;
+	uint32_t   err_fr_0;
+	uint32_t   err_fr_1;
+	uint32_t   err_ctrl_0;
+	uint32_t   err_ctrl_1;
+	uint32_t   err_status_0;
+	uint32_t   err_status_1;
+	uint32_t   err_addr_0;
+	uint32_t   err_addr_1;
+	uint32_t   err_misc0_0;
+	uint32_t   err_misc0_1;
+	uint32_t   err_misc1_0;
+	uint32_t   err_misc1_1;
+};
+
 enum hisi_oem_data_type {
 	hisi_oem_data_type_int,
 	hisi_oem_data_type_int64,
@@ -84,6 +122,29 @@ enum {
 	hip08_oem_type1_field_err_addr,
 };
 
+enum {
+	hip08_oem_type2_field_id,
+	hip08_oem_type2_field_version,
+	hip08_oem_type2_field_soc_id,
+	hip08_oem_type2_field_socket_id,
+	hip08_oem_type2_field_nimbus_id,
+	hip08_oem_type2_field_module_id,
+	hip08_oem_type2_field_sub_module_id,
+	hip08_oem_type2_field_err_sev,
+	hip08_oem_type2_field_err_fr_0,
+	hip08_oem_type2_field_err_fr_1,
+	hip08_oem_type2_field_err_ctrl_0,
+	hip08_oem_type2_field_err_ctrl_1,
+	hip08_oem_type2_field_err_status_0,
+	hip08_oem_type2_field_err_status_1,
+	hip08_oem_type2_field_err_addr_0,
+	hip08_oem_type2_field_err_addr_1,
+	hip08_oem_type2_field_err_misc0_0,
+	hip08_oem_type2_field_err_misc0_1,
+	hip08_oem_type2_field_err_misc1_0,
+	hip08_oem_type2_field_err_misc1_1,
+};
+
 /* helper functions */
 static char *err_severity(uint8_t err_sev)
 {
@@ -113,6 +174,62 @@ static char *oem_type1_module_name(uint8_t module_id)
 	return "unknown";
 }
 
+static char *oem_type2_module_name(uint8_t module_id)
+{
+	switch (module_id) {
+	case HISI_OEM_MODULE_ID_SMMU: return "SMMU";
+	case HISI_OEM_MODULE_ID_HHA: return "HHA";
+	case HISI_OEM_MODULE_ID_HLLC: return "HLLC";
+	case HISI_OEM_MODULE_ID_PA: return "PA";
+	case HISI_OEM_MODULE_ID_DDRC: return "DDRC";
+	}
+	return "unknown module";
+}
+
+static char *oem_type2_sub_module_id(char *p, uint8_t module_id,
+				     uint8_t sub_module_id)
+{
+	switch (module_id) {
+	case HISI_OEM_MODULE_ID_SMMU:
+	case HISI_OEM_MODULE_ID_HLLC:
+	case HISI_OEM_MODULE_ID_PA:
+		p += sprintf(p, "%d ", sub_module_id);
+		break;
+
+	case HISI_OEM_MODULE_ID_HHA:
+		if (sub_module_id == 0)
+			p += sprintf(p, "TA HHA0 ");
+		else if (sub_module_id == 1)
+			p += sprintf(p, "TA HHA1 ");
+		else if (sub_module_id == 2)
+			p += sprintf(p, "TB HHA0 ");
+		else if (sub_module_id == 3)
+			p += sprintf(p, "TB HHA1 ");
+		break;
+
+	case HISI_OEM_MODULE_ID_DDRC:
+		if (sub_module_id == 0)
+			p += sprintf(p, "TA DDRC0 ");
+		else if (sub_module_id == 1)
+			p += sprintf(p, "TA DDRC1 ");
+		else if (sub_module_id == 2)
+			p += sprintf(p, "TA DDRC2 ");
+		else if (sub_module_id == 3)
+			p += sprintf(p, "TA DDRC3 ");
+		else if (sub_module_id == 4)
+			p += sprintf(p, "TB DDRC0 ");
+		else if (sub_module_id == 5)
+			p += sprintf(p, "TB DDRC1 ");
+		else if (sub_module_id == 6)
+			p += sprintf(p, "TB DDRC2 ");
+		else if (sub_module_id == 7)
+			p += sprintf(p, "TB DDRC3 ");
+		break;
+	}
+
+	return p;
+}
+
 #ifdef HAVE_SQLITE3
 static const struct db_fields hip08_oem_type1_event_fields[] = {
 	{ .name = "id",			.type = "INTEGER PRIMARY KEY" },
@@ -137,6 +254,35 @@ static const struct db_table_descriptor hip08_oem_type1_event_tab = {
 	.num_fields = ARRAY_SIZE(hip08_oem_type1_event_fields),
 };
 
+static const struct db_fields hip08_oem_type2_event_fields[] = {
+	{ .name = "id",                 .type = "INTEGER PRIMARY KEY" },
+	{ .name = "version",            .type = "INTEGER" },
+	{ .name = "soc_id",             .type = "INTEGER" },
+	{ .name = "socket_id",          .type = "INTEGER" },
+	{ .name = "nimbus_id",          .type = "INTEGER" },
+	{ .name = "module_id",          .type = "TEXT" },
+	{ .name = "sub_module_id",      .type = "INTEGER" },
+	{ .name = "err_severity",       .type = "TEXT" },
+	{ .name = "err_fr_0",		.type = "INTEGER" },
+	{ .name = "err_fr_1",		.type = "INTEGER" },
+	{ .name = "err_ctrl_0",		.type = "INTEGER" },
+	{ .name = "err_ctrl_1",		.type = "INTEGER" },
+	{ .name = "err_status_0",	.type = "INTEGER" },
+	{ .name = "err_status_1",	.type = "INTEGER" },
+	{ .name = "err_addr_0",         .type = "INTEGER" },
+	{ .name = "err_addr_1",         .type = "INTEGER" },
+	{ .name = "err_misc0_0",	.type = "INTEGER" },
+	{ .name = "err_misc0_1",	.type = "INTEGER" },
+	{ .name = "err_misc1_0",	.type = "INTEGER" },
+	{ .name = "err_misc1_1",	.type = "INTEGER" },
+};
+
+static const struct db_table_descriptor hip08_oem_type2_event_tab = {
+	.name = "hip08_oem_type2_event",
+	.fields = hip08_oem_type2_event_fields,
+	.num_fields = ARRAY_SIZE(hip08_oem_type2_event_fields),
+};
+
 static void record_vendor_data(struct ras_ns_dec_tab *dec_tab,
 			       enum hisi_oem_data_type data_type,
 			       int id, int64_t data, const char *text)
@@ -317,11 +463,165 @@ static int decode_hip08_oem_type1_error(struct ras_events *ras,
 	return 0;
 }
 
+static int decode_hip08_oem_type2_error(struct ras_events *ras,
+					struct ras_ns_dec_tab *dec_tab,
+					struct trace_seq *s, const void *error)
+{
+	const struct hisi_oem_type2_err_sec *err = error;
+	char buf[1024];
+	char *p = buf;
+
+	if (err->val_bits == 0) {
+		trace_seq_printf(s, "%s: no valid error information\n",
+				 __func__);
+		return -1;
+	}
+
+#ifdef HAVE_SQLITE3
+	if (!dec_tab->stmt_dec_record) {
+		if (ras_mc_add_vendor_table(ras, &dec_tab->stmt_dec_record,
+			&hip08_oem_type2_event_tab) != SQLITE_OK) {
+			trace_seq_printf(s,
+				"create sql hip08_oem_type2_event_tab fail\n");
+			return -1;
+		}
+	}
+#endif
+	p += sprintf(p, "[ ");
+	p += sprintf(p, "Table version=%d ", err->version);
+	record_vendor_data(dec_tab, hisi_oem_data_type_int,
+			   hip08_oem_type2_field_version,
+			   err->version, NULL);
+	if (err->val_bits & HISI_OEM_VALID_SOC_ID) {
+		p += sprintf(p, "SOC ID=%d ", err->soc_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_soc_id,
+				   err->soc_id, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_SOCKET_ID) {
+		p += sprintf(p, "socket ID=%d ", err->socket_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_socket_id,
+				   err->socket_id, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_NIMBUS_ID) {
+		p += sprintf(p, "nimbus ID=%d ", err->nimbus_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_nimbus_id,
+				   err->nimbus_id, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_MODULE_ID) {
+		p += sprintf(p, "module=%s ",
+			     oem_type2_module_name(err->module_id));
+		record_vendor_data(dec_tab, hisi_oem_data_type_text,
+				   hip08_oem_type2_field_module_id,
+				   0, oem_type2_module_name(err->module_id));
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_SUB_MODULE_ID) {
+		p =  oem_type2_sub_module_id(p, err->module_id,
+					     err->sub_module_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_sub_module_id,
+				   err->sub_module_id, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_VALID_ERR_SEVERITY) {
+		p += sprintf(p, "error severity=%s ",
+			     err_severity(err->err_severity));
+		record_vendor_data(dec_tab, hisi_oem_data_type_text,
+				   hip08_oem_type2_field_err_sev,
+				   0, err_severity(err->err_severity));
+	}
+
+	p += sprintf(p, "]");
+	trace_seq_printf(s, "\nHISI HIP08: OEM Type-2 Error\n");
+	trace_seq_printf(s, "%s\n", buf);
+
+	trace_seq_printf(s, "Reg Dump:\n");
+	if (err->val_bits & HISI_OEM_TYPE2_VALID_ERR_FR) {
+		trace_seq_printf(s, "ERR_FR_0=0x%x\n", err->err_fr_0);
+		trace_seq_printf(s, "ERR_FR_1=0x%x\n", err->err_fr_1);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_fr_0,
+				   err->err_fr_0, NULL);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_fr_1,
+				   err->err_fr_1, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE2_VALID_ERR_CTRL) {
+		trace_seq_printf(s, "ERR_CTRL_0=0x%x\n", err->err_ctrl_0);
+		trace_seq_printf(s, "ERR_CTRL_1=0x%x\n", err->err_ctrl_1);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_ctrl_0,
+				   err->err_ctrl_0, NULL);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_ctrl_1,
+				   err->err_ctrl_1, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE2_VALID_ERR_STATUS) {
+		trace_seq_printf(s, "ERR_STATUS_0=0x%x\n", err->err_status_0);
+		trace_seq_printf(s, "ERR_STATUS_1=0x%x\n", err->err_status_1);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_status_0,
+				   err->err_status_0, NULL);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_status_1,
+				   err->err_status_1, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE2_VALID_ERR_ADDR) {
+		trace_seq_printf(s, "ERR_ADDR_0=0x%x\n", err->err_addr_0);
+		trace_seq_printf(s, "ERR_ADDR_1=0x%x\n", err->err_addr_1);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_addr_0,
+				   err->err_addr_0, NULL);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_addr_1,
+				   err->err_addr_1, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE2_VALID_ERR_MISC_0) {
+		trace_seq_printf(s, "ERR_MISC0_0=0x%x\n", err->err_misc0_0);
+		trace_seq_printf(s, "ERR_MISC0_1=0x%x\n", err->err_misc0_1);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_misc0_0,
+				   err->err_misc0_0, NULL);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_misc0_1,
+				   err->err_misc0_1, NULL);
+	}
+
+	if (err->val_bits & HISI_OEM_TYPE2_VALID_ERR_MISC_1) {
+		trace_seq_printf(s, "ERR_MISC1_0=0x%x\n", err->err_misc1_0);
+		trace_seq_printf(s, "ERR_MISC1_1=0x%x\n", err->err_misc1_1);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_misc1_0,
+				   err->err_misc1_0, NULL);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_oem_type2_field_err_misc1_1,
+				   err->err_misc1_1, NULL);
+	}
+
+	step_vendor_data_tab(dec_tab, "hip08_oem_type2_event_tab");
+
+	return 0;
+}
+
 struct ras_ns_dec_tab hip08_ns_oem_tab[] = {
 	{
 		.sec_type = "1f8161e155d641e6bd107afd1dc5f7c5",
 		.decode = decode_hip08_oem_type1_error,
 	},
+	{
+		.sec_type = "45534ea6ce2341158535e07ab3aef91d",
+		.decode = decode_hip08_oem_type2_error,
+	},
 	{ /* sentinel */ }
 };
 
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 6/6] rasdaemon:add logging HiSilicon HIP08 PCIe local errors
  2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
                     ` (4 preceding siblings ...)
  2019-06-17 14:28   ` [PATCH 5/6] rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM format2 Shiju Jose
@ 2019-06-17 14:28   ` Shiju Jose
  2019-06-21 18:42   ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Mauro Carvalho Chehab
  6 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-06-17 14:28 UTC (permalink / raw)
  To: mchehab, linux-edac, linuxarm; +Cc: Shiju Jose

This patch adds logging for the HiSilicon HIP08 PCIe local errors.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 non-standard-hisi_hip08.c | 223 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 223 insertions(+)

diff --git a/non-standard-hisi_hip08.c b/non-standard-hisi_hip08.c
index 6fe5cbe..ae543d6 100644
--- a/non-standard-hisi_hip08.c
+++ b/non-standard-hisi_hip08.c
@@ -57,6 +57,24 @@
 #define HISI_OEM_TYPE2_VALID_ERR_MISC_0	BIT(10)
 #define HISI_OEM_TYPE2_VALID_ERR_MISC_1	BIT(11)
 
+/* HISI PCIe Local error definitions */
+#define HISI_PCIE_SUB_MODULE_ID_AP	0
+#define HISI_PCIE_SUB_MODULE_ID_TL	1
+#define HISI_PCIE_SUB_MODULE_ID_MAC	2
+#define HISI_PCIE_SUB_MODULE_ID_DL	3
+#define HISI_PCIE_SUB_MODULE_ID_SDI	4
+
+#define HISI_PCIE_LOCAL_VALID_VERSION		BIT(0)
+#define HISI_PCIE_LOCAL_VALID_SOC_ID		BIT(1)
+#define HISI_PCIE_LOCAL_VALID_SOCKET_ID		BIT(2)
+#define HISI_PCIE_LOCAL_VALID_NIMBUS_ID		BIT(3)
+#define HISI_PCIE_LOCAL_VALID_SUB_MODULE_ID	BIT(4)
+#define HISI_PCIE_LOCAL_VALID_CORE_ID		BIT(5)
+#define HISI_PCIE_LOCAL_VALID_PORT_ID		BIT(6)
+#define HISI_PCIE_LOCAL_VALID_ERR_TYPE		BIT(7)
+#define HISI_PCIE_LOCAL_VALID_ERR_SEVERITY	BIT(8)
+#define HISI_PCIE_LOCAL_VALID_ERR_MISC		9
+
 struct hisi_oem_type1_err_sec {
 	uint32_t   val_bits;
 	uint8_t    version;
@@ -99,6 +117,21 @@ struct hisi_oem_type2_err_sec {
 	uint32_t   err_misc1_1;
 };
 
+struct hisi_pcie_local_err_sec {
+	uint64_t   val_bits;
+	uint8_t    version;
+	uint8_t    soc_id;
+	uint8_t    socket_id;
+	uint8_t    nimbus_id;
+	uint8_t    sub_module_id;
+	uint8_t    core_id;
+	uint8_t    port_id;
+	uint8_t    err_severity;
+	uint16_t   err_type;
+	uint8_t    reserv[2];
+	uint32_t   err_misc[33];
+};
+
 enum hisi_oem_data_type {
 	hisi_oem_data_type_int,
 	hisi_oem_data_type_int64,
@@ -145,6 +178,20 @@ enum {
 	hip08_oem_type2_field_err_misc1_1,
 };
 
+enum {
+	hip08_pcie_local_field_id,
+	hip08_pcie_local_field_version,
+	hip08_pcie_local_field_soc_id,
+	hip08_pcie_local_field_socket_id,
+	hip08_pcie_local_field_nimbus_id,
+	hip08_pcie_local_field_sub_module_id,
+	hip08_pcie_local_field_core_id,
+	hip08_pcie_local_field_port_id,
+	hip08_pcie_local_field_err_sev,
+	hip08_pcie_local_field_err_type,
+	hip08_pcie_local_field_err_misc,
+};
+
 /* helper functions */
 static char *err_severity(uint8_t err_sev)
 {
@@ -230,6 +277,18 @@ static char *oem_type2_sub_module_id(char *p, uint8_t module_id,
 	return p;
 }
 
+static char *pcie_local_sub_module_name(uint8_t id)
+{
+	switch (id) {
+	case HISI_PCIE_SUB_MODULE_ID_AP: return "AP Layer";
+	case HISI_PCIE_SUB_MODULE_ID_TL: return "TL Layer";
+	case HISI_PCIE_SUB_MODULE_ID_MAC: return "MAC Layer";
+	case HISI_PCIE_SUB_MODULE_ID_DL: return "DL Layer";
+	case HISI_PCIE_SUB_MODULE_ID_SDI: return "SDI Layer";
+	}
+	return "unknown";
+}
+
 #ifdef HAVE_SQLITE3
 static const struct db_fields hip08_oem_type1_event_fields[] = {
 	{ .name = "id",			.type = "INTEGER PRIMARY KEY" },
@@ -283,6 +342,58 @@ static const struct db_table_descriptor hip08_oem_type2_event_tab = {
 	.num_fields = ARRAY_SIZE(hip08_oem_type2_event_fields),
 };
 
+static const struct db_fields hip08_pcie_local_event_fields[] = {
+	{ .name = "id",                 .type = "INTEGER PRIMARY KEY" },
+	{ .name = "version",            .type = "INTEGER" },
+	{ .name = "soc_id",             .type = "INTEGER" },
+	{ .name = "socket_id",          .type = "INTEGER" },
+	{ .name = "nimbus_id",          .type = "INTEGER" },
+	{ .name = "sub_module_id",      .type = "TEXT" },
+	{ .name = "core_id",		.type = "INTEGER" },
+	{ .name = "port_id",		.type = "INTEGER" },
+	{ .name = "err_severity",       .type = "TEXT" },
+	{ .name = "err_type",		.type = "INTEGER" },
+	{ .name = "err_misc0",		.type = "INTEGER" },
+	{ .name = "err_misc1",		.type = "INTEGER" },
+	{ .name = "err_misc2",		.type = "INTEGER" },
+	{ .name = "err_misc3",		.type = "INTEGER" },
+	{ .name = "err_misc4",		.type = "INTEGER" },
+	{ .name = "err_misc5",		.type = "INTEGER" },
+	{ .name = "err_misc6",		.type = "INTEGER" },
+	{ .name = "err_misc7",		.type = "INTEGER" },
+	{ .name = "err_misc8",		.type = "INTEGER" },
+	{ .name = "err_misc9",		.type = "INTEGER" },
+	{ .name = "err_misc10",		.type = "INTEGER" },
+	{ .name = "err_misc11",		.type = "INTEGER" },
+	{ .name = "err_misc12",		.type = "INTEGER" },
+	{ .name = "err_misc13",		.type = "INTEGER" },
+	{ .name = "err_misc14",		.type = "INTEGER" },
+	{ .name = "err_misc15",		.type = "INTEGER" },
+	{ .name = "err_misc16",		.type = "INTEGER" },
+	{ .name = "err_misc17",		.type = "INTEGER" },
+	{ .name = "err_misc18",		.type = "INTEGER" },
+	{ .name = "err_misc19",		.type = "INTEGER" },
+	{ .name = "err_misc20",		.type = "INTEGER" },
+	{ .name = "err_misc21",		.type = "INTEGER" },
+	{ .name = "err_misc22",		.type = "INTEGER" },
+	{ .name = "err_misc23",		.type = "INTEGER" },
+	{ .name = "err_misc24",		.type = "INTEGER" },
+	{ .name = "err_misc25",		.type = "INTEGER" },
+	{ .name = "err_misc26",		.type = "INTEGER" },
+	{ .name = "err_misc27",		.type = "INTEGER" },
+	{ .name = "err_misc28",		.type = "INTEGER" },
+	{ .name = "err_misc29",		.type = "INTEGER" },
+	{ .name = "err_misc30",		.type = "INTEGER" },
+	{ .name = "err_misc31",		.type = "INTEGER" },
+	{ .name = "err_misc32",		.type = "INTEGER" },
+};
+
+static const struct db_table_descriptor hip08_pcie_local_event_tab = {
+	.name = "hip08_pcie_local_event",
+	.fields = hip08_pcie_local_event_fields,
+	.num_fields = ARRAY_SIZE(hip08_pcie_local_event_fields),
+};
+
 static void record_vendor_data(struct ras_ns_dec_tab *dec_tab,
 			       enum hisi_oem_data_type data_type,
 			       int id, int64_t data, const char *text)
@@ -613,6 +724,114 @@ static int decode_hip08_oem_type2_error(struct ras_events *ras,
 	return 0;
 }
 
+static int decode_hip08_pcie_local_error(struct ras_events *ras,
+					 struct ras_ns_dec_tab *dec_tab,
+					 struct trace_seq *s, const void *error)
+{
+	const struct hisi_pcie_local_err_sec *err = error;
+	char buf[1024];
+	char *p = buf;
+	uint32_t i;
+
+	if (err->val_bits == 0) {
+		trace_seq_printf(s, "%s: no valid error information\n",
+				 __func__);
+		return -1;
+	}
+
+#ifdef HAVE_SQLITE3
+	if (!dec_tab->stmt_dec_record) {
+		if (ras_mc_add_vendor_table(ras, &dec_tab->stmt_dec_record,
+				&hip08_pcie_local_event_tab) != SQLITE_OK) {
+			trace_seq_printf(s,
+				"create sql hip08_pcie_local_event_tab fail\n");
+			return -1;
+		}
+	}
+#endif
+	p += sprintf(p, "[ ");
+	p += sprintf(p, "Table version=%d ", err->version);
+	record_vendor_data(dec_tab, hisi_oem_data_type_int,
+			   hip08_pcie_local_field_version,
+			   err->version, NULL);
+	if (err->val_bits & HISI_PCIE_LOCAL_VALID_SOC_ID) {
+		p += sprintf(p, "SOC ID=%d ", err->soc_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_pcie_local_field_soc_id,
+				   err->soc_id, NULL);
+	}
+
+	if (err->val_bits & HISI_PCIE_LOCAL_VALID_SOCKET_ID) {
+		p += sprintf(p, "socket ID=%d ", err->socket_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_pcie_local_field_socket_id,
+				   err->socket_id, NULL);
+	}
+
+	if (err->val_bits & HISI_PCIE_LOCAL_VALID_NIMBUS_ID) {
+		p += sprintf(p, "nimbus ID=%d ", err->nimbus_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_pcie_local_field_nimbus_id,
+				   err->nimbus_id, NULL);
+	}
+
+	if (err->val_bits & HISI_PCIE_LOCAL_VALID_SUB_MODULE_ID) {
+		p += sprintf(p, "sub module=%s ",
+			     pcie_local_sub_module_name(err->sub_module_id));
+		record_vendor_data(dec_tab, hisi_oem_data_type_text,
+				   hip08_pcie_local_field_sub_module_id,
+				   0, pcie_local_sub_module_name(err->sub_module_id));
+	}
+
+	if (err->val_bits & HISI_PCIE_LOCAL_VALID_CORE_ID) {
+		p += sprintf(p, "core ID=core%d ", err->core_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_pcie_local_field_core_id,
+				   err->core_id, NULL);
+	}
+
+	if (err->val_bits & HISI_PCIE_LOCAL_VALID_PORT_ID) {
+		p += sprintf(p, "port ID=port%d ", err->port_id);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_pcie_local_field_port_id,
+				   err->port_id, NULL);
+	}
+
+	if (err->val_bits & HISI_PCIE_LOCAL_VALID_ERR_SEVERITY) {
+		p += sprintf(p, "error severity=%s ",
+			     err_severity(err->err_severity));
+		record_vendor_data(dec_tab, hisi_oem_data_type_text,
+				   hip08_pcie_local_field_err_sev,
+				   0, err_severity(err->err_severity));
+	}
+
+	if (err->val_bits & HISI_PCIE_LOCAL_VALID_ERR_TYPE) {
+		p += sprintf(p, "error type=0x%x ", err->err_type);
+		record_vendor_data(dec_tab, hisi_oem_data_type_int,
+				   hip08_pcie_local_field_err_type,
+				   err->err_type, NULL);
+	}
+	p += sprintf(p, "]");
+
+	trace_seq_printf(s, "\nHISI HIP08: PCIe local error\n");
+	trace_seq_printf(s, "%s\n", buf);
+
+	trace_seq_printf(s, "Reg Dump:\n");
+	for (i = 0; i < 33; i++) {
+		if (err->val_bits & BIT(HISI_PCIE_LOCAL_VALID_ERR_MISC + i)) {
+			trace_seq_printf(s, "ERR_MISC_%d=0x%x\n", i,
+					 err->err_misc[i]);
+			record_vendor_data(dec_tab, hisi_oem_data_type_int,
+					   (hip08_pcie_local_field_err_misc + i),
+					   err->err_misc[i], NULL);
+		}
+	}
+
+	step_vendor_data_tab(dec_tab, "hip08_pcie_local_event_tab");
+
+	return 0;
+}
+
 struct ras_ns_dec_tab hip08_ns_oem_tab[] = {
 	{
 		.sec_type = "1f8161e155d641e6bd107afd1dc5f7c5",
@@ -622,6 +841,10 @@ struct ras_ns_dec_tab hip08_ns_oem_tab[] = {
 		.sec_type = "45534ea6ce2341158535e07ab3aef91d",
 		.decode = decode_hip08_oem_type2_error,
 	},
+	{
+		.sec_type = "b2889fc9e7d74f9da867af42e98be772",
+		.decode = decode_hip08_pcie_local_error,
+	},
 	{ /* sentinel */ }
 };
 
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code
  2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
                     ` (5 preceding siblings ...)
  2019-06-17 14:28   ` [PATCH 6/6] rasdaemon:add logging HiSilicon HIP08 PCIe local errors Shiju Jose
@ 2019-06-21 18:42   ` Mauro Carvalho Chehab
  6 siblings, 0 replies; 19+ messages in thread
From: Mauro Carvalho Chehab @ 2019-06-21 18:42 UTC (permalink / raw)
  To: Shiju Jose; +Cc: linux-edac, linuxarm

Em Mon, 17 Jun 2019 15:28:46 +0100
Shiju Jose <shiju.jose@huawei.com> escreveu:

> This patch set add few changes in the non-standard error decoding code and
> logging for the HiSilicon HIP08 non-standard H/W errors.
> 
> Shiju Jose (6):
>   rasdaemon:print non-standard error data if not decoded
>   rasdaemon: rearrange HiSilicon HIP07 decoding function table
>   rasdaemon: update iteration logic for the non-standard error decoding
>     functions
>   rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM
>     format1
>   rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM
>     format2
>   rasdaemon:add logging HiSilicon HIP08 PCIe local errors
> 
>  Makefile.am                |   2 +-
>  non-standard-hisi_hip07.c  |  36 +-
>  non-standard-hisi_hip08.c  | 855 +++++++++++++++++++++++++++++++++++++++++++++
>  ras-non-standard-handler.c |  36 +-
>  ras-non-standard-handler.h |   8 +-
>  ras-record.c               |  30 +-
>  ras-record.h               |  13 +
>  7 files changed, 932 insertions(+), 48 deletions(-)
>  create mode 100644 non-standard-hisi_hip08.c
> 

Applied, thanks!


Thanks,
Mauro

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors
       [not found] <Shiju Jose>
  2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
@ 2019-08-12 10:11 ` Shiju Jose
  2019-08-12 10:11   ` [PATCH RFC 1/4] " Shiju Jose
                     ` (4 more replies)
  1 sibling, 5 replies; 19+ messages in thread
From: Shiju Jose @ 2019-08-12 10:11 UTC (permalink / raw)
  To: linux-acpi, linux-edac, linux-kernel, rjw, lenb, james.morse,
	tony.luck, bp, baicar
  Cc: linuxarm, jonathan.cameron, tanxiaofei, Shiju Jose

Presently kernel does not support reporting the vendor specific HW errors,
in the non-standard format, to the vendor drivers for the recovery.

This patch set add this support and also move the existing handler
functions for the standard errors to the new callback method.
Also the CCIX RAS patches could be move to the proposed callback method.
https://www.spinics.net/lists/linux-edac/msg10508.html
https://patchwork.kernel.org/patch/10979491/

Shiju Jose (4):
  ACPI: APEI: Add support to notify the vendor specific HW errors
  ACPI: APEI: Add ghes_handle_memory_failure to the new notification
    method
  ACPI: APEI: Add ghes_handle_aer to the new notification method
  ACPI: APEI: Add log_arm_hw_error to the new notification method

 drivers/acpi/apei/ghes.c | 170 +++++++++++++++++++++++++++++++++++++++++------
 drivers/ras/ras.c        |   5 +-
 include/acpi/ghes.h      |  47 +++++++++++++
 include/linux/ras.h      |   7 +-
 4 files changed, 205 insertions(+), 24 deletions(-)

-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH RFC 1/4] ACPI: APEI: Add support to notify the vendor specific HW errors
  2019-08-12 10:11 ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
@ 2019-08-12 10:11   ` " Shiju Jose
  2019-08-21 17:23     ` James Morse
  2019-08-12 10:11   ` [PATCH RFC 2/4] ACPI: APEI: Add ghes_handle_memory_failure to the new notification method Shiju Jose
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Shiju Jose @ 2019-08-12 10:11 UTC (permalink / raw)
  To: linux-acpi, linux-edac, linux-kernel, rjw, lenb, james.morse,
	tony.luck, bp, baicar
  Cc: linuxarm, jonathan.cameron, tanxiaofei, Shiju Jose

Presently the vendor specific HW errors, in the non-standard format,
are not reported to the vendor drivers for the recovery.

This patch adds support to notify the vendor specific HW errors to the
registered kernel drivers.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/apei/ghes.c | 118 +++++++++++++++++++++++++++++++++++++++++++++--
 include/acpi/ghes.h      |  47 +++++++++++++++++++
 2 files changed, 160 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index a66e00f..374d197 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -477,6 +477,77 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 #endif
 }
 
+struct ghes_error_notify {
+	struct list_head list;
+	struct rcu_head	rcu_head;
+	guid_t sec_type; /* guid of the error record */
+	error_handle handle; /* error handler function */
+	void *data; /* handler driver's private data if any */
+};
+
+/* List to store the registered error handling functions */
+static DEFINE_MUTEX(ghes_error_notify_mutex);
+static LIST_HEAD(ghes_error_notify_list);
+static refcount_t ghes_ref_count;
+
+/**
+ * ghes_error_notify_register - register an error handling function
+ * for the hw errors.
+ * @sec_type: sec_type of the corresponding CPER to be notified.
+ * @handle: pointer to the error handling function.
+ * @data: handler driver's private data.
+ *
+ * return 0 : SUCCESS, non-zero : FAIL
+ */
+int ghes_error_notify_register(guid_t sec_type, error_handle handle, void *data)
+{
+	struct ghes_error_notify *err_notify;
+
+	mutex_lock(&ghes_error_notify_mutex);
+	err_notify = kzalloc(sizeof(*err_notify), GFP_KERNEL);
+	if (!err_notify)
+		return -ENOMEM;
+
+	err_notify->handle = handle;
+	guid_copy(&err_notify->sec_type, &sec_type);
+	err_notify->data = data;
+	list_add_rcu(&err_notify->list, &ghes_error_notify_list);
+	mutex_unlock(&ghes_error_notify_mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ghes_error_notify_register);
+
+/**
+ * ghes_error_notify_unregister - unregister an error handling function.
+ * @sec_type: sec_type of the corresponding CPER.
+ * @handle: pointer to the error handling function.
+ *
+ * return none.
+ */
+void ghes_error_notify_unregister(guid_t sec_type, error_handle handle)
+{
+	struct ghes_error_notify *err_notify;
+	bool found = 0;
+
+	mutex_lock(&ghes_error_notify_mutex);
+	rcu_read_lock();
+	list_for_each_entry_rcu(err_notify, &ghes_error_notify_list, list) {
+		if (guid_equal(&err_notify->sec_type, &sec_type) &&
+		    err_notify->handle == handle) {
+			list_del_rcu(&err_notify->list);
+			found = 1;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	synchronize_rcu();
+	mutex_unlock(&ghes_error_notify_mutex);
+	if (found)
+		kfree(err_notify);
+}
+EXPORT_SYMBOL_GPL(ghes_error_notify_unregister);
+
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
@@ -485,6 +556,8 @@ static void ghes_do_proc(struct ghes *ghes,
 	guid_t *sec_type;
 	guid_t *fru_id = &NULL_UUID_LE;
 	char *fru_text = "";
+	bool is_notify = 0;
+	struct ghes_error_notify *err_notify;
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
@@ -512,11 +585,29 @@ static void ghes_do_proc(struct ghes *ghes,
 
 			log_arm_hw_error(err);
 		} else {
-			void *err = acpi_hest_get_payload(gdata);
-
-			log_non_standard_event(sec_type, fru_id, fru_text,
-					       sec_sev, err,
-					       gdata->error_data_length);
+			rcu_read_lock();
+			list_for_each_entry_rcu(err_notify,
+						&ghes_error_notify_list, list) {
+				if (guid_equal(&err_notify->sec_type,
+					       sec_type)) {
+					/* The notification is called in the
+					 * interrupt context, thus the handler
+					 * functions should be take care of it.
+					 */
+					err_notify->handle(gdata, sev,
+							   err_notify->data);
+					is_notify = 1;
+				}
+			}
+			rcu_read_unlock();
+
+			if (!is_notify) {
+				void *err = acpi_hest_get_payload(gdata);
+
+				log_non_standard_event(sec_type, fru_id,
+						       fru_text, sec_sev, err,
+						       gdata->error_data_length);
+			}
 		}
 	}
 }
@@ -1217,6 +1308,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 
 	ghes_edac_register(ghes, &ghes_dev->dev);
 
+	if (!refcount_read(&ghes_ref_count))
+		refcount_set(&ghes_ref_count, 1);
+	else
+		refcount_inc(&ghes_ref_count);
+
 	/* Handle any pending errors right away */
 	spin_lock_irqsave(&ghes_notify_lock_irq, flags);
 	ghes_proc(ghes);
@@ -1237,6 +1333,7 @@ static int ghes_remove(struct platform_device *ghes_dev)
 	int rc;
 	struct ghes *ghes;
 	struct acpi_hest_generic *generic;
+	struct ghes_error_notify *err_notify, *tmp;
 
 	ghes = platform_get_drvdata(ghes_dev);
 	generic = ghes->generic;
@@ -1279,6 +1376,17 @@ static int ghes_remove(struct platform_device *ghes_dev)
 
 	ghes_fini(ghes);
 
+	if (refcount_dec_and_test(&ghes_ref_count) &&
+	    !list_empty(&ghes_error_notify_list)) {
+		mutex_lock(&ghes_error_notify_mutex);
+		list_for_each_entry_safe(err_notify, tmp,
+					 &ghes_error_notify_list, list) {
+			list_del_rcu(&err_notify->list);
+			kfree_rcu(err_notify, rcu_head);
+		}
+		mutex_unlock(&ghes_error_notify_mutex);
+	}
+
 	ghes_edac_unregister(ghes);
 
 	kfree(ghes);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index e3f1cdd..d480537 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -50,6 +50,53 @@ enum {
 	GHES_SEV_PANIC = 0x3,
 };
 
+/**
+ * error_handle - error handling function for the hw errors.
+ * This handle function is called in the interrupt context.
+ * @gdata: acpi_hest_generic_data.
+ * @sev: error severity of the entire error event defined in the
+ * ACPI spec table generic error status block.
+ * @data: handler driver's private data.
+ *
+ * return : none.
+ */
+typedef void (*error_handle)(struct acpi_hest_generic_data *gdata, int sev,
+			     void *data);
+
+#ifdef CONFIG_ACPI_APEI_GHES
+/**
+ * ghes_error_notify_register - register an error handling function
+ * for the hw errors.
+ * @sec_type: sec_type of the corresponding CPER to be notified.
+ * @handle: pointer to the error handling function.
+ * @data: handler driver's private data.
+ *
+ * return : 0 - SUCCESS, non-zero - FAIL.
+ */
+int ghes_error_notify_register(guid_t sec_type, error_handle handle,
+			       void *data);
+
+/**
+ * ghes_error_notify_unregister - unregister an error handling function
+ * for the hw errors.
+ * @sec_type: sec_type of the corresponding CPER.
+ * @handle: pointer to the error handling function.
+ *
+ * return none.
+ */
+void ghes_error_notify_unregister(guid_t sec_type, error_handle handle);
+
+#else
+int ghes_error_notify_register(guid_t sec_type, error_handle handle, void *data)
+{
+	return -ENODEV;
+}
+
+void ghes_error_notify_unregister(guid_t sec_type, error_handle handle)
+{
+}
+#endif
+
 int ghes_estatus_pool_init(int num_ghes);
 
 /* From drivers/edac/ghes_edac.c */
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH RFC 2/4] ACPI: APEI: Add ghes_handle_memory_failure to the new notification method
  2019-08-12 10:11 ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
  2019-08-12 10:11   ` [PATCH RFC 1/4] " Shiju Jose
@ 2019-08-12 10:11   ` Shiju Jose
  2019-08-21 17:22     ` James Morse
  2019-08-12 10:11   ` [PATCH RFC 3/4] ACPI: APEI: Add ghes_handle_aer " Shiju Jose
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Shiju Jose @ 2019-08-12 10:11 UTC (permalink / raw)
  To: linux-acpi, linux-edac, linux-kernel, rjw, lenb, james.morse,
	tony.luck, bp, baicar
  Cc: linuxarm, jonathan.cameron, tanxiaofei, Shiju Jose

This patch adds ghes_handle_memory_failure to the new error
notification method.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/apei/ghes.c | 51 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 374d197..4400d56 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -401,14 +401,18 @@ static void ghes_clear_estatus(struct ghes *ghes,
 		ghes_ack_error(ghes->generic_v2);
 }
 
-static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
+static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
+				       int sev, void *data)
 {
-#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
+	struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
+	int sec_sev = ghes_severity(gdata->error_severity);
 	unsigned long pfn;
 	int flags = -1;
-	int sec_sev = ghes_severity(gdata->error_severity);
-	struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
+	ghes_edac_report_mem_error(sev, mem_err);
+	arch_apei_report_mem_error(sev, mem_err);
+
+#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
 
@@ -569,15 +573,7 @@ static void ghes_do_proc(struct ghes *ghes,
 		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
 			fru_text = gdata->fru_text;
 
-		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
-			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
-
-			ghes_edac_report_mem_error(sev, mem_err);
-
-			arch_apei_report_mem_error(sev, mem_err);
-			ghes_handle_memory_failure(gdata, sev);
-		}
-		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+		if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
 			ghes_handle_aer(gdata);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
@@ -1190,11 +1186,25 @@ static int apei_sdei_unregister_ghes(struct ghes *ghes)
 	return sdei_unregister_ghes(ghes);
 }
 
+struct ghes_err_handler_tab {
+	guid_t sec_type;
+	error_handle handle;
+};
+
+static struct ghes_err_handler_tab handler_tab[] = {
+	{
+		.sec_type = CPER_SEC_PLATFORM_MEM,
+		.handle = ghes_handle_memory_failure,
+	},
+	{ /* sentinel */ }
+};
+
 static int ghes_probe(struct platform_device *ghes_dev)
 {
 	struct acpi_hest_generic *generic;
 	struct ghes *ghes = NULL;
 	unsigned long flags;
+	int i;
 
 	int rc = -EINVAL;
 
@@ -1308,9 +1318,20 @@ static int ghes_probe(struct platform_device *ghes_dev)
 
 	ghes_edac_register(ghes, &ghes_dev->dev);
 
-	if (!refcount_read(&ghes_ref_count))
+	if (!refcount_read(&ghes_ref_count)) {
 		refcount_set(&ghes_ref_count, 1);
-	else
+		/* register handler functions for the standard errors.
+		 * This may be done from the corresponding drivers.
+		 */
+		for (i = 0; handler_tab[i].handle; i++) {
+			if (ghes_error_notify_register(handler_tab[i].sec_type,
+						handler_tab[i].handle, NULL)) {
+				ghes_edac_unregister(ghes);
+				platform_set_drvdata(ghes_dev, NULL);
+				goto err;
+			}
+		}
+	} else
 		refcount_inc(&ghes_ref_count);
 
 	/* Handle any pending errors right away */
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH RFC 3/4] ACPI: APEI: Add ghes_handle_aer to the new notification method
  2019-08-12 10:11 ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
  2019-08-12 10:11   ` [PATCH RFC 1/4] " Shiju Jose
  2019-08-12 10:11   ` [PATCH RFC 2/4] ACPI: APEI: Add ghes_handle_memory_failure to the new notification method Shiju Jose
@ 2019-08-12 10:11   ` " Shiju Jose
  2019-08-12 10:11   ` [PATCH RFC 4/4] ACPI: APEI: Add log_arm_hw_error " Shiju Jose
  2019-08-21 17:22   ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors James Morse
  4 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-08-12 10:11 UTC (permalink / raw)
  To: linux-acpi, linux-edac, linux-kernel, rjw, lenb, james.morse,
	tony.luck, bp, baicar
  Cc: linuxarm, jonathan.cameron, tanxiaofei, Shiju Jose

This patch adds ghes_handle_aer to the new error notification method.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/apei/ghes.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 4400d56..ffc309c 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -450,7 +450,8 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
  * GHES_SEV_PANIC does not make it to this handling since the kernel must
  *     panic.
  */
-static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
+static void ghes_handle_aer(struct acpi_hest_generic_data *gdata,
+			    int sev, void *data)
 {
 #ifdef CONFIG_ACPI_APEI_PCIEAER
 	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
@@ -573,10 +574,7 @@ static void ghes_do_proc(struct ghes *ghes,
 		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
 			fru_text = gdata->fru_text;
 
-		if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
-			ghes_handle_aer(gdata);
-		}
-		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
+		if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
 			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 
 			log_arm_hw_error(err);
@@ -1196,6 +1194,10 @@ struct ghes_err_handler_tab {
 		.sec_type = CPER_SEC_PLATFORM_MEM,
 		.handle = ghes_handle_memory_failure,
 	},
+	{
+		.sec_type = CPER_SEC_PCIE,
+		.handle = ghes_handle_aer,
+	},
 	{ /* sentinel */ }
 };
 
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH RFC 4/4] ACPI: APEI: Add log_arm_hw_error to the new notification method
  2019-08-12 10:11 ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
                     ` (2 preceding siblings ...)
  2019-08-12 10:11   ` [PATCH RFC 3/4] ACPI: APEI: Add ghes_handle_aer " Shiju Jose
@ 2019-08-12 10:11   ` " Shiju Jose
  2019-08-21 17:22   ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors James Morse
  4 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-08-12 10:11 UTC (permalink / raw)
  To: linux-acpi, linux-edac, linux-kernel, rjw, lenb, james.morse,
	tony.luck, bp, baicar
  Cc: linuxarm, jonathan.cameron, tanxiaofei, Shiju Jose

This patch adds log_arm_hw_error to the new error notification
method.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 drivers/acpi/apei/ghes.c | 47 ++++++++++++++++++++++-------------------------
 drivers/ras/ras.c        |  5 ++++-
 include/linux/ras.h      |  7 +++++--
 3 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ffc309c..013fea0 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -574,34 +574,27 @@ static void ghes_do_proc(struct ghes *ghes,
 		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
 			fru_text = gdata->fru_text;
 
-		if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
-			struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
-
-			log_arm_hw_error(err);
-		} else {
-			rcu_read_lock();
-			list_for_each_entry_rcu(err_notify,
-						&ghes_error_notify_list, list) {
-				if (guid_equal(&err_notify->sec_type,
-					       sec_type)) {
-					/* The notification is called in the
-					 * interrupt context, thus the handler
-					 * functions should be take care of it.
-					 */
-					err_notify->handle(gdata, sev,
-							   err_notify->data);
-					is_notify = 1;
-				}
+		rcu_read_lock();
+		list_for_each_entry_rcu(err_notify, &ghes_error_notify_list,
+					list) {
+			if (guid_equal(&err_notify->sec_type, sec_type)) {
+				/* The notification is called in the
+				 * interrupt context, thus the handler
+				 * functions should be take care of it.
+				 */
+				err_notify->handle(gdata, sev,
+						   err_notify->data);
+				is_notify = 1;
 			}
-			rcu_read_unlock();
+		}
+		rcu_read_unlock();
 
-			if (!is_notify) {
-				void *err = acpi_hest_get_payload(gdata);
+		if (!is_notify) {
+			void *err = acpi_hest_get_payload(gdata);
 
-				log_non_standard_event(sec_type, fru_id,
-						       fru_text, sec_sev, err,
-						       gdata->error_data_length);
-			}
+			log_non_standard_event(sec_type, fru_id,
+					       fru_text, sec_sev, err,
+					       gdata->error_data_length);
 		}
 	}
 }
@@ -1198,6 +1191,10 @@ struct ghes_err_handler_tab {
 		.sec_type = CPER_SEC_PCIE,
 		.handle = ghes_handle_aer,
 	},
+	{
+		.sec_type = CPER_SEC_PROC_ARM,
+		.handle = log_arm_hw_error,
+	},
 	{ /* sentinel */ }
 };
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 95540ea..7ec3eeb 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -21,8 +21,11 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
 	trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
 }
 
-void log_arm_hw_error(struct cper_sec_proc_arm *err)
+void log_arm_hw_error(struct acpi_hest_generic_data *gdata,
+		      int sev, void *data)
 {
+	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
+
 	trace_arm_event(err);
 }
 
diff --git a/include/linux/ras.h b/include/linux/ras.h
index 7c3debb..05b662d 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -5,6 +5,7 @@
 #include <asm/errno.h>
 #include <linux/uuid.h>
 #include <linux/cper.h>
+#include <acpi/ghes.h>
 
 #ifdef CONFIG_DEBUG_FS
 int ras_userspace_consumers(void);
@@ -29,7 +30,8 @@ static inline void __init cec_init(void)	{ }
 void log_non_standard_event(const guid_t *sec_type,
 			    const guid_t *fru_id, const char *fru_text,
 			    const u8 sev, const u8 *err, const u32 len);
-void log_arm_hw_error(struct cper_sec_proc_arm *err);
+void log_arm_hw_error(struct acpi_hest_generic_data *gdata,
+		      int sev, void *data);
 #else
 static inline void
 log_non_standard_event(const guid_t *sec_type,
@@ -37,7 +39,8 @@ void log_non_standard_event(const guid_t *sec_type,
 		       const u8 sev, const u8 *err, const u32 len)
 { return; }
 static inline void
-log_arm_hw_error(struct cper_sec_proc_arm *err) { return; }
+log_arm_hw_error(struct acpi_hest_generic_data *gdata,
+		 int sev, void *data) { return; }
 #endif
 
 #endif /* __RAS_H__ */
-- 
1.9.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors
  2019-08-12 10:11 ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
                     ` (3 preceding siblings ...)
  2019-08-12 10:11   ` [PATCH RFC 4/4] ACPI: APEI: Add log_arm_hw_error " Shiju Jose
@ 2019-08-21 17:22   ` James Morse
  2019-08-22 16:56     ` Shiju Jose
  4 siblings, 1 reply; 19+ messages in thread
From: James Morse @ 2019-08-21 17:22 UTC (permalink / raw)
  To: Shiju Jose
  Cc: linux-acpi, linux-edac, linux-kernel, rjw, lenb, tony.luck, bp,
	baicar, linuxarm, jonathan.cameron, tanxiaofei

Hi,

On 12/08/2019 11:11, Shiju Jose wrote:
> Presently kernel does not support reporting the vendor specific HW errors,
> in the non-standard format, to the vendor drivers for the recovery.

'non standard' here is probably a little jarring to the casual reader. You're referring to
the UEFI spec's "N.2.3 Non-standard Section Body", which refers to any section type
published somewhere other than the UEFI spec.

These still have to have a GUID to identify them, so they still have the same section
header format.


> This patch set add this support and also move the existing handler
> functions for the standard errors to the new callback method.

Could you give an example of where this would be useful? You're adding an API with no
caller to justify its existence.


GUIDs should only belong to one driver.

I don't think we should call drivers for something described as a fatal error. (which is
the case with what you have here)


> Also the CCIX RAS patches could be move to the proposed callback method.

Presumably for any vendor-specific stuff?


Thanks,

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 2/4] ACPI: APEI: Add ghes_handle_memory_failure to the new notification method
  2019-08-12 10:11   ` [PATCH RFC 2/4] ACPI: APEI: Add ghes_handle_memory_failure to the new notification method Shiju Jose
@ 2019-08-21 17:22     ` James Morse
  2019-08-22 16:57       ` Shiju Jose
  0 siblings, 1 reply; 19+ messages in thread
From: James Morse @ 2019-08-21 17:22 UTC (permalink / raw)
  To: Shiju Jose
  Cc: linux-acpi, linux-edac, linux-kernel, rjw, lenb, tony.luck, bp,
	baicar, linuxarm, jonathan.cameron, tanxiaofei

Hi,

On 12/08/2019 11:11, Shiju Jose wrote:
> This patch adds ghes_handle_memory_failure to the new error
> notification method.

The commit message doesn't answer the question: why?

The existing code works. This just looks like additional churn.
Given a user, I think the vendor specific example is useful. I don't think making this
thing more pluggable is a good idea.


Thanks,

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH RFC 1/4] ACPI: APEI: Add support to notify the vendor specific HW errors
  2019-08-12 10:11   ` [PATCH RFC 1/4] " Shiju Jose
@ 2019-08-21 17:23     ` James Morse
  2019-08-22 16:57       ` Shiju Jose
  0 siblings, 1 reply; 19+ messages in thread
From: James Morse @ 2019-08-21 17:23 UTC (permalink / raw)
  To: Shiju Jose
  Cc: linux-acpi, linux-edac, linux-kernel, rjw, lenb, tony.luck, bp,
	baicar, linuxarm, jonathan.cameron, tanxiaofei

Hi,

On 12/08/2019 11:11, Shiju Jose wrote:
> Presently the vendor specific HW errors, in the non-standard format,
> are not reported to the vendor drivers for the recovery.
> 
> This patch adds support to notify the vendor specific HW errors to the
> registered kernel drivers.

> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index a66e00f..374d197 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -477,6 +477,77 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
>  #endif
>  }
>  
> +struct ghes_error_notify {
> +	struct list_head list;> +	struct rcu_head	rcu_head;
> +	guid_t sec_type; /* guid of the error record */

> +	error_handle handle; /* error handler function */

ghes_error_handler_t error_handler; ?


> +	void *data; /* handler driver's private data if any */
> +};
> +
> +/* List to store the registered error handling functions */
> +static DEFINE_MUTEX(ghes_error_notify_mutex);
> +static LIST_HEAD(ghes_error_notify_list);

> +static refcount_t ghes_ref_count;

I don't think this refcount is needed.


> +/**
> + * ghes_error_notify_register - register an error handling function
> + * for the hw errors.
> + * @sec_type: sec_type of the corresponding CPER to be notified.
> + * @handle: pointer to the error handling function.
> + * @data: handler driver's private data.
> + *
> + * return 0 : SUCCESS, non-zero : FAIL
> + */
> +int ghes_error_notify_register(guid_t sec_type, error_handle handle, void *data)
> +{
> +	struct ghes_error_notify *err_notify;
> +
> +	mutex_lock(&ghes_error_notify_mutex);
> +	err_notify = kzalloc(sizeof(*err_notify), GFP_KERNEL);
> +	if (!err_notify)
> +		return -ENOMEM;

Leaving the mutex locked.
You may as well allocate the memory before taking the lock.


> +
> +	err_notify->handle = handle;
> +	guid_copy(&err_notify->sec_type, &sec_type);
> +	err_notify->data = data;
> +	list_add_rcu(&err_notify->list, &ghes_error_notify_list);
> +	mutex_unlock(&ghes_error_notify_mutex);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ghes_error_notify_register);

Could we leave exporting this to modules until there is a user?


> +/**
> + * ghes_error_notify_unregister - unregister an error handling function.
> + * @sec_type: sec_type of the corresponding CPER.
> + * @handle: pointer to the error handling function.
> + *
> + * return none.
> + */
> +void ghes_error_notify_unregister(guid_t sec_type, error_handle handle)

Why do we need the handle(r) a second time? Surely there can only be one callback for a
given guid.


> +{
> +	struct ghes_error_notify *err_notify;
> +	bool found = 0;
> +
> +	mutex_lock(&ghes_error_notify_mutex);
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(err_notify, &ghes_error_notify_list, list) {
> +		if (guid_equal(&err_notify->sec_type, &sec_type) &&
> +		    err_notify->handle == handle) {
> +			list_del_rcu(&err_notify->list);
> +			found = 1;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();

> +	synchronize_rcu();

Is this for the kfree()? Please keep them together so its obvious what its for.
Putting it outside the mutex will also save any contended waiter some time.


> +	mutex_unlock(&ghes_error_notify_mutex);
> +	if (found)
> +		kfree(err_notify);
> +}
> +EXPORT_SYMBOL_GPL(ghes_error_notify_unregister);
> +

>  static void ghes_do_proc(struct ghes *ghes,
>  			 const struct acpi_hest_generic_status *estatus)
>  {> @@ -512,11 +585,29 @@ static void ghes_do_proc(struct ghes *ghes,
>  
>  			log_arm_hw_error(err);
>  		} else {
> -			void *err = acpi_hest_get_payload(gdata);
> -
> -			log_non_standard_event(sec_type, fru_id, fru_text,
> -					       sec_sev, err,
> -					       gdata->error_data_length);

> +			rcu_read_lock();
> +			list_for_each_entry_rcu(err_notify,
> +						&ghes_error_notify_list, list) {
> +				if (guid_equal(&err_notify->sec_type,
> +					       sec_type)) {

> +					/* The notification is called in the
> +					 * interrupt context, thus the handler
> +					 * functions should be take care of it.
> +					 */

I read this as "the handler will be called", which doesn't seem to be a useful comment.


> +					err_notify->handle(gdata, sev,
> +							   err_notify->data);
> +					is_notify = 1;

					break;

> +				}
> +			}
> +			rcu_read_unlock();

> +			if (!is_notify) {

if (!found) Seems more natural.


> +				void *err = acpi_hest_get_payload(gdata);
> +
> +				log_non_standard_event(sec_type, fru_id,
> +						       fru_text, sec_sev, err,
> +						       gdata->error_data_length);
> +			}

This is tricky to read as its so bunched up. Please pull it out into a separate function.
ghes_handle_non_standard_event() ?


Because you skip log_non_standard_event(), rasdaemon will no longer see these in
user-space. For any kernel consumer of these, we need to know we aren't breaking the
user-space component.


>  		}
>  	}
>  }
> @@ -1217,6 +1308,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  
>  	ghes_edac_register(ghes, &ghes_dev->dev);
>  
> +	if (!refcount_read(&ghes_ref_count))
> +		refcount_set(&ghes_ref_count, 1);

What stops this from racing with itself if two ghes platform devices are probed at the
same time?

If the refcount needs initialising, please do it in ghes_init()....

> +	else
> +		refcount_inc(&ghes_ref_count);

.. but I don't think this refcount is needed.


>  	/* Handle any pending errors right away */
>  	spin_lock_irqsave(&ghes_notify_lock_irq, flags);
>  	ghes_proc(ghes);

> @@ -1279,6 +1376,17 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  
>  	ghes_fini(ghes);
>  
> +	if (refcount_dec_and_test(&ghes_ref_count) &&
> +	    !list_empty(&ghes_error_notify_list)) {
> +		mutex_lock(&ghes_error_notify_mutex);> +		list_for_each_entry_safe(err_notify, tmp,
> +					 &ghes_error_notify_list, list) {
> +			list_del_rcu(&err_notify->list);
> +			kfree_rcu(err_notify, rcu_head);
> +		}
> +		mutex_unlock(&ghes_error_notify_mutex);
> +	}

... If someone unregisters, and re-registers all the GHES platform devices, the last one
out flushes the vendor-specific error handlers away. Then we re-probe the devices again,
but this time the vendor-specific error handlers don't work.

As you have an add/remove API for drivers, its up to drivers to cleanup when they are
removed. The comings and goings of GHES platform devices isn't relevant.


>  	ghes_edac_unregister(ghes);
>  
>  	kfree(ghes);
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index e3f1cdd..d480537 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -50,6 +50,53 @@ enum {
>  	GHES_SEV_PANIC = 0x3,
>  };
>  
> +/**
> + * error_handle - error handling function for the hw errors.

Fatal errors get dealt with earlier, so drivers will never see them.
| error handling function for non-fatal hardware errors.


> + * This handle function is called in the interrupt context.

As this overrides ghes's logging of the error, we should mention:
| The handler is responsible for any logging of the error.


> + * @gdata: acpi_hest_generic_data.
> + * @sev: error severity of the entire error event defined in the
> + * ACPI spec table generic error status block.
> + * @data: handler driver's private data.
> + *
> + * return : none.
> + */
> +typedef void (*error_handle)(struct acpi_hest_generic_data *gdata, int sev,
> +			     void *data);


Thanks,

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors
  2019-08-21 17:22   ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors James Morse
@ 2019-08-22 16:56     ` Shiju Jose
  0 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-08-22 16:56 UTC (permalink / raw)
  To: James Morse
  Cc: linux-acpi, linux-edac, linux-kernel, rjw, lenb, tony.luck, bp,
	baicar, Linuxarm, Jonathan Cameron, tanxiaofei

Hi James, 

Thanks for the feedback.

>-----Original Message-----
>From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-
>owner@vger.kernel.org] On Behalf Of James Morse
>Sent: 21 August 2019 18:23
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-acpi@vger.kernel.org; linux-edac@vger.kernel.org; linux-
>kernel@vger.kernel.org; rjw@rjwysocki.net; lenb@kernel.org;
>tony.luck@intel.com; bp@alien8.de; baicar@os.amperecomputing.com;
>Linuxarm <linuxarm@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>
>Subject: Re: [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor
>specific HW errors
>
>Hi,
>
>On 12/08/2019 11:11, Shiju Jose wrote:
>> Presently kernel does not support reporting the vendor specific HW
>> errors, in the non-standard format, to the vendor drivers for the recovery.
>
>'non standard' here is probably a little jarring to the casual reader. You're
>referring to the UEFI spec's "N.2.3 Non-standard Section Body", which refers to
>any section type published somewhere other than the UEFI spec.
OK. I will change it.  
>
>These still have to have a GUID to identify them, so they still have the same
>section header format.
Yes. 
 
>
>
>> This patch set add this support and also move the existing handler
>> functions for the standard errors to the new callback method.
>
>Could you give an example of where this would be useful? You're adding an API
>with no caller to justify its existence.
One such example is handling the local errors occurred in a device controller, such as PCIe.

>
>
>GUIDs should only belong to one driver.
UEFI spec's N.2.3 Non-standard Section Body mentioned,  "The type (e.g. format) of a non-standard section is identified by the GUID populated in the Section Descriptor's Section Type field." 
There is a possibility to define common non-standard error section format which will be used for more than one driver if the error data to be reported is in the same format. Then can the same GUID belong to multiple drivers?

>
>I don't think we should call drivers for something described as a fatal error.
>(which is the case with what you have here)
The notification is intended only for the recoverable errors as the ghes_proc() call panic for the fatal errors in the early stage.

>
>
>> Also the CCIX RAS patches could be move to the proposed callback method.
>
>Presumably for any vendor-specific stuff?
This information was related to the proposal to replace the  number of if(guid_equal(...)) else if(guid_equal(...)) checks in the ghes_do_proc() for the existing UEFI spec defined error sections(such as PCIe,  Memory, ARM HW error) by registering the corresponding handler functions to the proposed notification method. The same apply to the CCIX error sections and any other error sections defined by the UEFI spec in the future.  

>
>
>Thanks,
>
>James

Thanks,
Shiju

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH RFC 2/4] ACPI: APEI: Add ghes_handle_memory_failure to the new notification method
  2019-08-21 17:22     ` James Morse
@ 2019-08-22 16:57       ` Shiju Jose
  0 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-08-22 16:57 UTC (permalink / raw)
  To: James Morse
  Cc: linux-acpi, linux-edac, linux-kernel, rjw, lenb, tony.luck, bp,
	baicar, Linuxarm, Jonathan Cameron, tanxiaofei

Hi James,

>-----Original Message-----
>From: James Morse [mailto:james.morse@arm.com]
>Sent: 21 August 2019 18:23
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-acpi@vger.kernel.org; linux-edac@vger.kernel.org; linux-
>kernel@vger.kernel.org; rjw@rjwysocki.net; lenb@kernel.org;
>tony.luck@intel.com; bp@alien8.de; baicar@os.amperecomputing.com;
>Linuxarm <linuxarm@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>
>Subject: Re: [PATCH RFC 2/4] ACPI: APEI: Add ghes_handle_memory_failure to
>the new notification method
>
>Hi,
>
>On 12/08/2019 11:11, Shiju Jose wrote:
>> This patch adds ghes_handle_memory_failure to the new error
>> notification method.
>
>The commit message doesn't answer the question: why?
>
>The existing code works. This just looks like additional churn.
>Given a user, I think the vendor specific example is useful. I don't think making
>this thing more pluggable is a good idea.
This was intended to replace the  number of if(guid_equal(...)) else if(guid_equal(...)) checks in the ghes_do_proc() , which would grow when new UEFI defined error sections would be added in the future.
>
>
>Thanks,
>
>James

Thanks,
Shiju

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH RFC 1/4] ACPI: APEI: Add support to notify the vendor specific HW errors
  2019-08-21 17:23     ` James Morse
@ 2019-08-22 16:57       ` Shiju Jose
  0 siblings, 0 replies; 19+ messages in thread
From: Shiju Jose @ 2019-08-22 16:57 UTC (permalink / raw)
  To: James Morse
  Cc: linux-acpi, linux-edac, linux-kernel, rjw, lenb, tony.luck, bp,
	baicar, Linuxarm, Jonathan Cameron, tanxiaofei

Hi James,

>-----Original Message-----
>From: James Morse [mailto:james.morse@arm.com]
>Sent: 21 August 2019 18:24
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-acpi@vger.kernel.org; linux-edac@vger.kernel.org; linux-
>kernel@vger.kernel.org; rjw@rjwysocki.net; lenb@kernel.org;
>tony.luck@intel.com; bp@alien8.de; baicar@os.amperecomputing.com;
>Linuxarm <linuxarm@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>
>Subject: Re: [PATCH RFC 1/4] ACPI: APEI: Add support to notify the vendor
>specific HW errors
>
>Hi,
>
>On 12/08/2019 11:11, Shiju Jose wrote:
>> Presently the vendor specific HW errors, in the non-standard format,
>> are not reported to the vendor drivers for the recovery.
>>
>> This patch adds support to notify the vendor specific HW errors to the
>> registered kernel drivers.
>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index
>> a66e00f..374d197 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -477,6 +477,77 @@ static void ghes_handle_aer(struct
>> acpi_hest_generic_data *gdata)  #endif  }
>>
>> +struct ghes_error_notify {
>> +	struct list_head list;> +	struct rcu_head	rcu_head;
>> +	guid_t sec_type; /* guid of the error record */
>
>> +	error_handle handle; /* error handler function */
>
>ghes_error_handler_t error_handler; ?
Sure.

>
>
>> +	void *data; /* handler driver's private data if any */ };
>> +
>> +/* List to store the registered error handling functions */ static
>> +DEFINE_MUTEX(ghes_error_notify_mutex);
>> +static LIST_HEAD(ghes_error_notify_list);
>
>> +static refcount_t ghes_ref_count;
>
>I don't think this refcount is needed.
refcount was added to register standard error handlers with this notification method one time when
multiple ghes platform devices are probed.
 
>
>
>> +/**
>> + * ghes_error_notify_register - register an error handling function
>> + * for the hw errors.
>> + * @sec_type: sec_type of the corresponding CPER to be notified.
>> + * @handle: pointer to the error handling function.
>> + * @data: handler driver's private data.
>> + *
>> + * return 0 : SUCCESS, non-zero : FAIL  */ int
>> +ghes_error_notify_register(guid_t sec_type, error_handle handle, void
>> +*data) {
>> +	struct ghes_error_notify *err_notify;
>> +
>> +	mutex_lock(&ghes_error_notify_mutex);
>> +	err_notify = kzalloc(sizeof(*err_notify), GFP_KERNEL);
>> +	if (!err_notify)
>> +		return -ENOMEM;
>
>Leaving the mutex locked.
>You may as well allocate the memory before taking the lock.
Good spot. I will fix.

>
>
>> +
>> +	err_notify->handle = handle;
>> +	guid_copy(&err_notify->sec_type, &sec_type);
>> +	err_notify->data = data;
>> +	list_add_rcu(&err_notify->list, &ghes_error_notify_list);
>> +	mutex_unlock(&ghes_error_notify_mutex);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(ghes_error_notify_register);
>
>Could we leave exporting this to modules until there is a user?
>
>
>> +/**
>> + * ghes_error_notify_unregister - unregister an error handling function.
>> + * @sec_type: sec_type of the corresponding CPER.
>> + * @handle: pointer to the error handling function.
>> + *
>> + * return none.
>> + */
>> +void ghes_error_notify_unregister(guid_t sec_type, error_handle
>> +handle)
>
>Why do we need the handle(r) a second time? Surely there can only be one
>callback for a given guid.
There is a possibility of sharing the guid between drivers if the non-standard error section format is common
for more than one devices if the error data to be reported is in the same format.
 
>
>
>> +{
>> +	struct ghes_error_notify *err_notify;
>> +	bool found = 0;
>> +
>> +	mutex_lock(&ghes_error_notify_mutex);
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(err_notify, &ghes_error_notify_list, list) {
>> +		if (guid_equal(&err_notify->sec_type, &sec_type) &&
>> +		    err_notify->handle == handle) {
>> +			list_del_rcu(&err_notify->list);
>> +			found = 1;
>> +			break;
>> +		}
>> +	}
>> +	rcu_read_unlock();
>
>> +	synchronize_rcu();
>
>Is this for the kfree()? Please keep them together so its obvious what its for.
>Putting it outside the mutex will also save any contended waiter some time.
Yes. I will move synchronize_rcu () just before kfree. 
 
>
>
>> +	mutex_unlock(&ghes_error_notify_mutex);
>> +	if (found)
>> +		kfree(err_notify);
>> +}
>> +EXPORT_SYMBOL_GPL(ghes_error_notify_unregister);
>> +
>
>>  static void ghes_do_proc(struct ghes *ghes,
>>  			 const struct acpi_hest_generic_status *estatus)  {>
>@@ -512,11
>> +585,29 @@ static void ghes_do_proc(struct ghes *ghes,
>>
>>  			log_arm_hw_error(err);
>>  		} else {
>> -			void *err = acpi_hest_get_payload(gdata);
>> -
>> -			log_non_standard_event(sec_type, fru_id, fru_text,
>> -					       sec_sev, err,
>> -					       gdata->error_data_length);
>
>> +			rcu_read_lock();
>> +			list_for_each_entry_rcu(err_notify,
>> +						&ghes_error_notify_list, list) {
>> +				if (guid_equal(&err_notify->sec_type,
>> +					       sec_type)) {
>
>> +					/* The notification is called in the
>> +					 * interrupt context, thus the handler
>> +					 * functions should be take care of it.
>> +					 */
>
>I read this as "the handler will be called", which doesn't seem to be a useful
>comment.
Ok. I will correct the comment.
>
>
>> +					err_notify->handle(gdata, sev,
>> +							   err_notify->data);
>> +					is_notify = 1;
>
>					break;
>
>> +				}
>> +			}
>> +			rcu_read_unlock();
>
>> +			if (!is_notify) {
>
>if (!found) Seems more natural.
Ok. I will change to "is_notify"  to "found".

>
>
>> +				void *err = acpi_hest_get_payload(gdata);
>> +
>> +				log_non_standard_event(sec_type, fru_id,
>> +						       fru_text, sec_sev, err,
>> +						       gdata->error_data_length);
>> +			}
>
>This is tricky to read as its so bunched up. Please pull it out into a separate
>function.
>ghes_handle_non_standard_event() ?
Ok. I will add to new ghes_handle_non_standard_event() function.

>
>
>Because you skip log_non_standard_event(), rasdaemon will no longer see
>these in user-space. For any kernel consumer of these, we need to know we
>aren't breaking the user-space component.
>
>
>>  		}
>>  	}
>>  }
>> @@ -1217,6 +1308,11 @@ static int ghes_probe(struct platform_device
>> *ghes_dev)
>>
>>  	ghes_edac_register(ghes, &ghes_dev->dev);
>>
>> +	if (!refcount_read(&ghes_ref_count))
>> +		refcount_set(&ghes_ref_count, 1);
>
>What stops this from racing with itself if two ghes platform devices are probed
>at the same time?
yes. It is an issue.
>
>If the refcount needs initialising, please do it in ghes_init()....
refcount was added to register the standard error handlers to the notification
method only for the first time when the ghes device probed multiple times.
I will check is it possible to avoid using refcount by moving the above registration
of standard error handlers to the ghes_init().
>
>> +	else
>> +		refcount_inc(&ghes_ref_count);
>
>.. but I don't think this refcount is needed.
>
>
>>  	/* Handle any pending errors right away */
>>  	spin_lock_irqsave(&ghes_notify_lock_irq, flags);
>>  	ghes_proc(ghes);
>
>> @@ -1279,6 +1376,17 @@ static int ghes_remove(struct platform_device
>> *ghes_dev)
>>
>>  	ghes_fini(ghes);
>>
>> +	if (refcount_dec_and_test(&ghes_ref_count) &&
>> +	    !list_empty(&ghes_error_notify_list)) {
>> +		mutex_lock(&ghes_error_notify_mutex);> +
>	list_for_each_entry_safe(err_notify, tmp,
>> +					 &ghes_error_notify_list, list) {
>> +			list_del_rcu(&err_notify->list);
>> +			kfree_rcu(err_notify, rcu_head);
>> +		}
>> +		mutex_unlock(&ghes_error_notify_mutex);
>> +	}
>
>... If someone unregisters, and re-registers all the GHES platform devices, the
>last one out flushes the vendor-specific error handlers away. Then we re-probe
>the devices again, but this time the vendor-specific error handlers don't work.
>
>As you have an add/remove API for drivers, its up to drivers to cleanup when
>they are removed. The comings and goings of GHES platform devices isn't
>relevant.
Ok. Got it. I will either keep the unregister for the standard error handlers only
if the standard error handlers can be part of the notification method or remove completely
if the registration can be done in the ghes_init().    
>
>
>>  	ghes_edac_unregister(ghes);
>>
>>  	kfree(ghes);
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index
>> e3f1cdd..d480537 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -50,6 +50,53 @@ enum {
>>  	GHES_SEV_PANIC = 0x3,
>>  };
>>
>> +/**
>> + * error_handle - error handling function for the hw errors.
>
>Fatal errors get dealt with earlier, so drivers will never see them.
>| error handling function for non-fatal hardware errors.
Ok. I will change the comment as recoverable HW errors.
>
>
>> + * This handle function is called in the interrupt context.
>
>As this overrides ghes's logging of the error, we should mention:
>| The handler is responsible for any logging of the error.
Ok. I will add in the comment.
>
>
>> + * @gdata: acpi_hest_generic_data.
>> + * @sev: error severity of the entire error event defined in the
>> + * ACPI spec table generic error status block.
>> + * @data: handler driver's private data.
>> + *
>> + * return : none.
>> + */
>> +typedef void (*error_handle)(struct acpi_hest_generic_data *gdata, int sev,
>> +			     void *data);
>
>
>Thanks,
>
>James
Thanks,
Shiju

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Shiju Jose>
2019-06-17 14:28 ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Shiju Jose
2019-06-17 14:28   ` [PATCH 1/6] rasdaemon:print non-standard error data if not decoded Shiju Jose
2019-06-17 14:28   ` [PATCH 2/6] rasdaemon: rearrange HiSilicon HIP07 decoding function table Shiju Jose
2019-06-17 14:28   ` [PATCH 3/6] rasdaemon: update iteration logic for the non-standard error decoding functions Shiju Jose
2019-06-17 14:28   ` [PATCH 4/6] rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM format1 Shiju Jose
2019-06-17 14:28   ` [PATCH 5/6] rasdaemon:add logging HiSilicon HIP08 H/W errors reported in the OEM format2 Shiju Jose
2019-06-17 14:28   ` [PATCH 6/6] rasdaemon:add logging HiSilicon HIP08 PCIe local errors Shiju Jose
2019-06-21 18:42   ` [PATCH 0/6] rasdaemon:add logging of HiSilicon HIP08 non-standard H/W errors and changes in the error decoding code Mauro Carvalho Chehab
2019-08-12 10:11 ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
2019-08-12 10:11   ` [PATCH RFC 1/4] " Shiju Jose
2019-08-21 17:23     ` James Morse
2019-08-22 16:57       ` Shiju Jose
2019-08-12 10:11   ` [PATCH RFC 2/4] ACPI: APEI: Add ghes_handle_memory_failure to the new notification method Shiju Jose
2019-08-21 17:22     ` James Morse
2019-08-22 16:57       ` Shiju Jose
2019-08-12 10:11   ` [PATCH RFC 3/4] ACPI: APEI: Add ghes_handle_aer " Shiju Jose
2019-08-12 10:11   ` [PATCH RFC 4/4] ACPI: APEI: Add log_arm_hw_error " Shiju Jose
2019-08-21 17:22   ` [PATCH RFC 0/4] ACPI: APEI: Add support to notify the vendor specific HW errors James Morse
2019-08-22 16:56     ` Shiju Jose

Linux-EDAC Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-edac/0 linux-edac/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-edac linux-edac/ https://lore.kernel.org/linux-edac \
		linux-edac@vger.kernel.org linux-edac@archiver.kernel.org
	public-inbox-index linux-edac


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-edac


AGPL code for this site: git clone https://public-inbox.org/ public-inbox