Linux-EDAC Archive on lore.kernel.org
* [RFC PATCH 0/6] CCIX rasdaemon support
@ 2019-06-14 17:55 Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 1/6] rasdaemon: CCIX: CCIX memory error reporting Jonathan Cameron
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Jonathan Cameron @ 2019-06-14 17:55 UTC (permalink / raw)
  To: linux-edac; +Cc: mchehab, linuxarm, jcm, Jonathan Cameron

This is an RFC because the kernel side is currently under review and
may change, with obvious knock-on effects for this series.

https://lore.kernel.org/linux-edac/20190606123654.78973-1-Jonathan.Cameron@huawei.com/

There are a few additional questions around this:
1. Divide between specificity of DB fields vs blobs.
   Where possible I have tried to fully describe the contents via
   separate fields rather than large blobs.  One common SQL convention
   that doesn't seem to have been previously done in rasdaemon is to
   use explicit NULL entries for elements where data is missing.
2. Should we split ras-record.c and move the CCIX handling into a separate
   ras-record-ccix.c file or similar, as ras-record.c is getting rather large?

Note the following is a trademark grant and doesn't prevent normal
stuff covered under fair use.  Given this doesn't currently quote from
the spec, there are no such copyright notices.

This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to
you and other parties that are participating (the "participants") in
the rasdaemon project with the understanding that the participants will
use CCIX's name and trademark only when this patch is used in
association with rasdaemon.

CCIX is also distributing this patch to these participants with the
understanding that if any portion of the CCIX specification will be
used or referenced in rasdaemon, the participants will not modify
the cited portion of the CCIX specification and will give CCIX proper
copyright attribution by including the following copyright notice with
the cited part of the CCIX specification:
"© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED."

Jonathan Cameron (6):
  rasdaemon: CCIX: CCIX memory error reporting.
  rasdaemon: CCIX: Cache error support
  rasdaemon: CCIX: ATC errors
  rasdaemon: CCIX: Port error handling
  rasdaemon: CCIX: Link error support
  rasdaemon: CCIX: Agent Internal error support

 Makefile.am        |   6 +-
 configure.ac       |  10 +
 ras-ccix-handler.c | 648 +++++++++++++++++++++++++++++++++++++++++++++
 ras-ccix-handler.h | 139 ++++++++++
 ras-events.c       |  61 +++++
 ras-record.c       | 568 +++++++++++++++++++++++++++++++++++++++
 ras-record.h       |  35 +++
 ras-report.h       |   6 +-
 8 files changed, 1471 insertions(+), 2 deletions(-)
 create mode 100644 ras-ccix-handler.c
 create mode 100644 ras-ccix-handler.h

-- 
2.20.1



* [RFC PATCH 1/6] rasdaemon: CCIX: CCIX memory error reporting.
  2019-06-14 17:55 [RFC PATCH 0/6] CCIX rasdaemon support Jonathan Cameron
@ 2019-06-14 17:55 ` Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 2/6] rasdaemon: CCIX: Cache error support Jonathan Cameron
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2019-06-14 17:55 UTC (permalink / raw)
  To: linux-edac; +Cc: mchehab, linuxarm, jcm, Jonathan Cameron

Adds support for basic decoding and logging of CCIX memory errors,
plus storing them to the sqlite3 DB.

Given that the CCIX memory record is very tightly defined by the
specification and that databases with large blobs in them
are not particularly useful, I have separately exposed all of the
standard fields.  Note that this means setting them NULL if the
validation bits indicate that the field is not valid.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 Makefile.am        |   6 +-
 configure.ac       |  10 ++
 ras-ccix-handler.c | 244 +++++++++++++++++++++++++++++++++++++++++++++
 ras-ccix-handler.h |  61 ++++++++++++
 ras-events.c       |  16 +++
 ras-record.c       | 176 ++++++++++++++++++++++++++++++++
 ras-record.h       |  20 ++++
 ras-report.h       |   6 +-
 8 files changed, 537 insertions(+), 2 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 87b5a3a..0b4f1ac 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -24,6 +24,9 @@ endif
 if WITH_AER
    rasdaemon_SOURCES += ras-aer-handler.c
 endif
+if WITH_CCIX
+   rasdaemon_SOURCES += ras-ccix-handler.c
+endif
 if WITH_NON_STANDARD
    rasdaemon_SOURCES += ras-non-standard-handler.c
 endif
@@ -51,7 +54,8 @@ rasdaemon_LDADD = -lpthread $(SQLITE3_LIBS) libtrace/libtrace.a
 
 include_HEADERS = config.h  ras-events.h  ras-logger.h  ras-mc-handler.h \
 		  ras-aer-handler.h ras-mce-handler.h ras-record.h bitfield.h ras-report.h \
-		  ras-extlog-handler.h ras-arm-handler.h ras-non-standard-handler.h
+		  ras-extlog-handler.h ras-arm-handler.h ras-non-standard-handler.h \
+		  ras-ccix-handler.h
 
 # This rule can't be called with more than one Makefile job (like make -j8)
 # I can't figure out a way to fix that
diff --git a/configure.ac b/configure.ac
index 6ad5421..75fea44 100644
--- a/configure.ac
+++ b/configure.ac
@@ -44,6 +44,15 @@ AS_IF([test "x$enable_aer" = "xyes"], [
 ])
 AM_CONDITIONAL([WITH_AER], [test x$enable_aer = xyes])
 
+AC_ARG_ENABLE([ccix],
+    AS_HELP_STRING([--enable-ccix], [enable CCIX PER events (currently experimental)]))
+
+AS_IF([test "x$enable_ccix" = "xyes"], [
+  AC_DEFINE(HAVE_CCIX,1,"have CCIX PER events collect")
+  AC_SUBST([WITH_CCIX])
+])
+AM_CONDITIONAL([WITH_CCIX], [test x$enable_ccix = xyes])
+
 AC_ARG_ENABLE([non_standard],
     AS_HELP_STRING([--enable-non-standard], [enable NON_STANDARD events (currently experimental)]))
 
@@ -127,4 +136,5 @@ compile time options summary
     ABRT report         : $enable_abrt_report
     HIP07 SAS HW errors : $enable_hisi_ns_decode
     ARM events          : $enable_arm
+    CCIX                : $enable_ccix
 EOF
diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c
new file mode 100644
index 0000000..2be413f
--- /dev/null
+++ b/ras-ccix-handler.c
@@ -0,0 +1,244 @@
+/*
+ * Copyright (c) 2019 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include "libtrace/kbuffer.h"
+#include "ras-record.h"
+#include "ras-logger.h"
+#include "bitfield.h"
+#include "ras-report.h"
+
+static char *ccix_mem_pool_type(uint8_t pt)
+{
+	switch (pt) {
+	case 0: return "other/not-specified";
+	case 1: return "ROM";
+	case 2: return "volatile";
+	case 3: return "non-volatile";
+	case 4: return "device/register";
+	}
+	if (pt >= 0x80)
+		return "vendor";
+	return "unknown";
+}
+
+static char *ccix_mem_spec_type(uint8_t st)
+{
+	switch (st) {
+	case 0: return "other/not-specified";
+	case 1: return "SRAM";
+	case 2: return "DDR";
+	case 3: return "NVDIMM-F";
+	case 4: return "NVDIMM-N";
+	case 5: return "HBM";
+	case 6: return "flash";
+	}
+	if (st >= 0x80)
+		return "vendor";
+	return "unknown";
+}
+
+static char *ccix_mem_op(uint8_t op)
+{
+	switch (op) {
+	case 0: return "generic";
+	case 1: return "read";
+	case 2: return "write";
+	case 4: return "scrub";
+	}
+	return "unknown";
+}
+
+static char *ccix_mem_err_type(int etype)
+{
+	switch (etype) {
+	case 0: return "unknown";
+	case 1: return "no error";
+	case 2: return "single-bit ECC";
+	case 3: return "multi-bit ECC";
+	case 4: return "single-symbol chipkill ECC";
+	case 5: return "multi-symbol chipkill ECC";
+	case 6: return "master abort";
+	case 7: return "target abort";
+	case 8: return "parity error";
+	case 9: return "watchdog timeout";
+	case 10: return "invalid address";
+	case 11: return "mirror broken";
+	case 12: return "memory sparing";
+	case 13: return "scrub";
+	case 14: return "physical memory map-out event";
+	}
+	return "unknown-type";
+}
+
+static char *ccix_mem_err_cper_data(const char *c)
+{
+	const struct cper_ccix_mem_err_compact *cpd =
+		(struct cper_ccix_mem_err_compact *)c;
+	static char buf[1024];
+	char *p = buf;
+
+	p += sprintf(p, " (");
+	p += sprintf(p, "fru: %u ", cpd->fru);
+	if (cpd->validation_bits & CCIX_MEM_ERR_MEM_ERR_TYPE_VALID)
+		p += sprintf(p, "error: %s ",
+			     ccix_mem_err_type(cpd->mem_err_type));
+	if (cpd->validation_bits & CCIX_MEM_ERR_GENERIC_MEM_VALID)
+		p += sprintf(p, "type: %s ",
+			     ccix_mem_pool_type(cpd->pool_generic_type));
+	if (cpd->validation_bits & CCIX_MEM_ERR_SPEC_TYPE_VALID)
+		p += sprintf(p, "sub_type: %s ",
+			     ccix_mem_spec_type(cpd->pool_specific_type));
+	if (cpd->validation_bits & CCIX_MEM_ERR_OP_VALID)
+		p += sprintf(p, "op: %s ", ccix_mem_op(cpd->op_type));
+	if (cpd->validation_bits & CCIX_MEM_ERR_CARD_VALID)
+		p += sprintf(p, "card: %u ", cpd->card);
+	if (cpd->validation_bits & CCIX_MEM_ERR_MOD_VALID)
+		p += sprintf(p, "mod: %u ", cpd->module);
+	if (cpd->validation_bits & CCIX_MEM_ERR_BANK_VALID)
+		p += sprintf(p, "bank: %u ", cpd->bank);
+	if (cpd->validation_bits & CCIX_MEM_ERR_DEVICE_VALID)
+		p += sprintf(p, "device: %u ", cpd->device);
+	if (cpd->validation_bits & CCIX_MEM_ERR_ROW_VALID)
+		p += sprintf(p, "row: %u ", cpd->row);
+	if (cpd->validation_bits & CCIX_MEM_ERR_COL_VALID)
+		p += sprintf(p, "col: %u ", cpd->column);
+	if (cpd->validation_bits & CCIX_MEM_ERR_RANK_VALID)
+		p += sprintf(p, "rank: %u ", cpd->rank);
+	if (cpd->validation_bits & CCIX_MEM_ERR_BIT_POS_VALID)
+		p += sprintf(p, "bitpos: %u ", cpd->bit_pos);
+	if (cpd->validation_bits & CCIX_MEM_ERR_CHIP_ID_VALID)
+		p += sprintf(p, "chipid: %u ", cpd->chip_id);
+	p += sprintf(p - 1, ")");
+
+	return buf;
+}
+
+static char *ccix_component_type(int type)
+{
+	switch (type) {
+	case 0: return "RA";
+	case 1: return "HA";
+	case 2: return "SA";
+	case 3: return "Port";
+	case 4: return "CCIX-Link";
+	}
+	return "unknown-component";
+}
+
+static char *err_severity(int severity)
+{
+	switch (severity) {
+	case 0: return "recoverable";
+	case 1: return "fatal";
+	case 2: return "corrected";
+	case 3: return "informational";
+	}
+	return "unknown-severity";
+}
+
+static unsigned long long err_mask(int lsb)
+{
+	if (lsb == 0xff)
+		return ~0ull;
+	return ~((1ull << lsb) - 1);
+}
+
+static int ras_ccix_common_parse(struct trace_seq *s,
+				 struct pevent_record *record,
+				 struct event_format *event, void *context,
+				 struct ras_ccix_event *ev)
+{
+	unsigned long long val;
+	int len;
+
+	if (pevent_get_field_val(s,  event, "err_seq", record, &val, 1) < 0)
+		return -1;
+	ev->error_seq = val;
+	if (pevent_get_field_val(s,  event, "sev", record, &val, 1) < 0)
+		return -1;
+	ev->severity = val;
+	if (pevent_get_field_val(s, event, "sevdetail", record, &val, 1) < 0)
+		return -1;
+	ev->severity_detail = val;
+	if (pevent_get_field_val(s,  event, "pa", record, &val, 1) < 0)
+		return -1;
+	ev->address = val;
+	if (pevent_get_field_val(s,  event, "pa_mask_lsb", record, &val, 1) < 0)
+		return -1;
+	ev->pa_mask_lsb = val;
+	if (pevent_get_field_val(s, event, "source", record, &val, 1) < 0)
+		return -1;
+	ev->source = val;
+	if (pevent_get_field_val(s, event, "component", record, &val, 1) < 0)
+		return -1;
+	ev->component = val;
+
+	ev->cper_data = pevent_get_field_raw(s, event, "data", record, &len, 1);
+	ev->cper_data_length = len;
+
+	if (pevent_get_field_val(s, event, "vendor_data_length", record, &val,
+				 1))
+		return -1;
+	ev->vendor_data_length = val;
+
+	ev->vendor_data = pevent_get_field_raw(s, event, "vendor_data", record,
+					       &len, 1);
+
+	return 0;
+}
+
+int ras_ccix_memory_event_handler(struct trace_seq *s,
+				  struct pevent_record *record,
+				  struct event_format *event, void *context)
+{
+	struct ras_events *ras = context;
+	struct tm *tm;
+	struct ras_ccix_event ev;
+	time_t now;
+	int ret;
+
+	if (ras->use_uptime)
+		now = record->ts/user_hz + ras->uptime_diff;
+	else
+		now = time(NULL);
+
+	tm = localtime(&now);
+
+	if (tm)
+		strftime(ev.timestamp, sizeof(ev.timestamp),
+			 "%Y-%m-%d %H:%M:%S %z", tm);
+	trace_seq_printf(s, "%s ", ev.timestamp);
+
+	ret = ras_ccix_common_parse(s, record, event, context, &ev);
+	if (ret)
+		return ret;
+
+	trace_seq_printf(s, "%d %s id:%d CCIX memory error %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s",
+			 ev.error_seq, err_severity(ev.severity),
+			 ev.source, ccix_component_type(ev.component),
+			 (ev.severity_detail & 0x1) ? 1 : 0,
+			 (ev.severity_detail & 0x2) ? 1 : 0,
+			 (ev.severity_detail & 0x4) ? 1 : 0,
+			 (ev.severity_detail & 0x8) ? 1 : 0,
+			 ev.address,
+			 err_mask(ev.pa_mask_lsb),
+			 ccix_mem_err_cper_data(ev.cper_data));
+
+	ras_store_ccix_memory_event(ras, &ev);
+
+	return 0;
+}
diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h
new file mode 100644
index 0000000..f6d25b1
--- /dev/null
+++ b/ras-ccix-handler.h
@@ -0,0 +1,61 @@
+/*
+ * Copyright (c) 2019 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __RAS_CCIX_HANDLER_H
+#define __RAS_CCIX_HANDLER_H
+
+#include "ras-events.h"
+#include "libtrace/event-parse.h"
+
+int ras_ccix_memory_event_handler(struct trace_seq *s,
+				  struct pevent_record *record,
+				  struct event_format *event, void *context);
+
+/* Perhaps unnecessary paranoia, but the tracepoint structure is packed */
+#pragma pack(1)
+struct cper_ccix_mem_err_compact {
+	uint32_t validation_bits;
+	uint8_t mem_err_type;
+	uint8_t pool_generic_type;
+	uint8_t pool_specific_type;
+	uint8_t op_type;
+	uint8_t card;
+	uint16_t module;
+	uint16_t bank;
+	uint32_t device;
+	uint32_t row;
+	uint32_t column;
+	uint32_t rank;
+	uint8_t bit_pos;
+	uint8_t chip_id;
+	uint8_t fru;
+};
+#pragma pack()
+
+#define CCIX_MEM_ERR_GENERIC_MEM_VALID		0x0001
+#define CCIX_MEM_ERR_OP_VALID			0x0002
+#define CCIX_MEM_ERR_MEM_ERR_TYPE_VALID		0x0004
+#define CCIX_MEM_ERR_CARD_VALID			0x0008
+#define CCIX_MEM_ERR_BANK_VALID			0x0010
+#define CCIX_MEM_ERR_DEVICE_VALID		0x0020
+#define CCIX_MEM_ERR_ROW_VALID			0x0040
+#define CCIX_MEM_ERR_COL_VALID			0x0080
+#define CCIX_MEM_ERR_RANK_VALID			0x0100
+#define CCIX_MEM_ERR_BIT_POS_VALID		0x0200
+#define CCIX_MEM_ERR_CHIP_ID_VALID		0x0400
+#define CCIX_MEM_ERR_VENDOR_DATA_VALID		0x0800
+#define CCIX_MEM_ERR_MOD_VALID			0x1000
+#define CCIX_MEM_ERR_SPEC_TYPE_VALID		0x2000
+
+#endif
diff --git a/ras-events.c b/ras-events.c
index 9395f6f..c7bd1c3 100644
--- a/ras-events.c
+++ b/ras-events.c
@@ -29,6 +29,7 @@
 #include "libtrace/event-parse.h"
 #include "ras-mc-handler.h"
 #include "ras-aer-handler.h"
+#include "ras-ccix-handler.h"
 #include "ras-non-standard-handler.h"
 #include "ras-arm-handler.h"
 #include "ras-mce-handler.h"
@@ -202,6 +203,10 @@ int toggle_ras_mc_event(int enable)
 	rc |= __toggle_ras_mc_event(ras, "ras", "aer_event", enable);
 #endif
 
+#ifdef HAVE_CCIX
+	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_memory_event", enable);
+#endif
+
 #ifdef HAVE_MCE
 	rc |= __toggle_ras_mc_event(ras, "mce", "mce_record", enable);
 #endif
@@ -686,6 +691,17 @@ int handle_ras_events(int record_events)
 		    "ras", "aer_event");
 #endif
 
+#ifdef HAVE_CCIX
+	rc = add_event_handler(ras, pevent, page_size, "ras",
+			       "ccix_memory_error_event",
+			       ras_ccix_memory_event_handler);
+	if (!rc)
+		num_events++;
+	else
+		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
+		    "ras", "ccix_memory_event");
+#endif
+
 #ifdef HAVE_NON_STANDARD
         rc = add_event_handler(ras, pevent, page_size, "ras", "non_standard_event",
                                ras_non_standard_event_handler);
diff --git a/ras-record.c b/ras-record.c
index 2e7525e..b1a241a 100644
--- a/ras-record.c
+++ b/ras-record.c
@@ -28,6 +28,7 @@
 #include "ras-events.h"
 #include "ras-mc-handler.h"
 #include "ras-aer-handler.h"
+#include "ras-ccix-handler.h"
 #include "ras-mce-handler.h"
 #include "ras-logger.h"
 
@@ -158,6 +159,174 @@ int ras_store_aer_event(struct ras_events *ras, struct ras_aer_event *ev)
 }
 #endif
 
+#ifdef HAVE_CCIX
+enum {
+	ccix_field_id,
+	ccix_field_timestamp,
+	ccix_field_error_count,
+	ccix_field_severity,
+	ccix_field_severity_detail,
+	ccix_field_address,
+	ccix_field_address_mask,
+	ccix_field_source,
+	ccix_field_component,
+	ccix_field_common_end
+};
+
+#define CCIX_COMMON_FIELDS \
+	[ccix_field_id] =		{ .name = "id",			.type = "INTEGER PRIMARY KEY" }, \
+	[ccix_field_timestamp] =	{ .name = "timestamp",		.type = "TEXT" },	\
+	[ccix_field_error_count] =	{ .name = "error_count",	.type = "INTEGER" }, \
+	[ccix_field_severity] =		{ .name = "severity",		.type = "INTEGER" }, \
+	[ccix_field_severity_detail] =	{ .name = "severity_detail",	.type = "INTEGER" }, \
+	[ccix_field_address] =		{ .name = "address",		.type = "INTEGER" }, \
+	[ccix_field_address_mask] =	{ .name = "address_mask",	.type = "INTEGER" }, \
+	[ccix_field_source] =		{ .name = "source",		.type = "INTEGER" }, \
+	[ccix_field_component] =	{ .name = "component",		.type = "INTEGER" }
+
+enum {
+	ccix_mem_field_error_type = ccix_field_common_end,
+	ccix_mem_field_fru,
+	ccix_mem_field_type,
+	ccix_mem_field_sub_type,
+	ccix_mem_field_operation,
+	ccix_mem_field_card,
+	ccix_mem_field_mod,
+	ccix_mem_field_bank,
+	ccix_mem_field_device,
+	ccix_mem_field_row,
+	ccix_mem_field_col,
+	ccix_mem_field_rank,
+	ccix_mem_field_bit_pos,
+	ccix_mem_field_chip_id,
+	ccix_mem_field_vendor
+};
+
+static const struct db_fields ccix_memory_event_fields[] = {
+	CCIX_COMMON_FIELDS,
+	[ccix_mem_field_error_type] =	{ .name = "mem_err_type",	.type = "INTEGER" },
+	[ccix_mem_field_fru] =		{ .name = "fru",		.type = "INTEGER" },
+	[ccix_mem_field_type] =		{ .name = "type",		.type = "INTEGER" },
+	[ccix_mem_field_sub_type] =	{ .name = "sub_type",		.type = "INTEGER" },
+	[ccix_mem_field_operation] =	{ .name = "operation",		.type = "INTEGER" },
+	[ccix_mem_field_card] =		{ .name = "card",		.type = "INTEGER" },
+	[ccix_mem_field_mod] =		{ .name = "mod",		.type = "INTEGER" },
+	[ccix_mem_field_bank] =		{ .name = "bank",		.type = "INTEGER" },
+	[ccix_mem_field_device] =	{ .name = "device",		.type = "INTEGER" },
+	[ccix_mem_field_row] =		{ .name = "row",		.type = "INTEGER" },
+	[ccix_mem_field_col] =		{ .name = "col",		.type = "INTEGER" },
+	[ccix_mem_field_rank] =		{ .name = "rank",		.type = "INTEGER" },
+	[ccix_mem_field_bit_pos] =	{ .name = "bit_position",	.type = "INTEGER" },
+	[ccix_mem_field_chip_id] =	{ .name = "chip_id",		.type = "INTEGER" },
+	[ccix_mem_field_vendor] =	{ .name = "vendor_data",	.type = "BLOB" },
+};
+
+static const struct db_table_descriptor ccix_memory_event_tab = {
+	.name = "ccix_memory_event",
+	.fields = ccix_memory_event_fields,
+	.num_fields = ARRAY_SIZE(ccix_memory_event_fields),
+};
+
+static void ras_store_ccix_common(sqlite3_stmt *record,
+				  struct ras_ccix_event *ev)
+{
+	sqlite3_bind_text(record, ccix_field_timestamp, ev->timestamp, -1,
+			  NULL);
+	sqlite3_bind_int(record, ccix_field_error_count, ev->error_seq);
+	sqlite3_bind_int(record, ccix_field_severity, ev->severity);
+	sqlite3_bind_int(record, ccix_field_severity_detail,
+			 ev->severity_detail);
+	sqlite3_bind_int64(record, ccix_field_address, ev->address);
+	sqlite3_bind_int64(record, ccix_field_address_mask, ev->pa_mask_lsb);
+	sqlite3_bind_int(record, ccix_field_source, ev->source);
+	sqlite3_bind_int(record, ccix_field_component, ev->component);
+}
+
+int ras_store_ccix_memory_event(struct ras_events *ras,
+				struct ras_ccix_event *ev)
+{
+	int rc;
+	struct sqlite3_priv *priv = ras->db_priv;
+	struct cper_ccix_mem_err_compact *mem =
+	  (struct cper_ccix_mem_err_compact *)ev->cper_data;
+	sqlite3_stmt *rec = priv->stmt_ccix_mem_record;
+
+	if (!priv || !rec)
+		return 0;
+	log(TERM, LOG_INFO, "ccix_memory_eventstore: %p\n", rec);
+
+	ras_store_ccix_common(rec, ev);
+
+	sqlite3_bind_int(rec, ccix_mem_field_fru, mem->fru);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_MEM_ERR_TYPE_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_error_type,
+				 mem->mem_err_type);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_GENERIC_MEM_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_type,
+				 mem->pool_generic_type);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_SPEC_TYPE_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_sub_type,
+				 mem->pool_specific_type);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_OP_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_operation, mem->op_type);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_CARD_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_card, mem->card);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_MOD_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_mod, mem->module);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_BANK_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_bank, mem->bank);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_DEVICE_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_device, mem->device);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_ROW_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_row, mem->row);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_COL_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_col, mem->column);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_RANK_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_rank, mem->rank);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_BIT_POS_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_bit_pos, mem->bit_pos);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_CHIP_ID_VALID)
+		sqlite3_bind_int(rec, ccix_mem_field_chip_id, mem->chip_id);
+
+	if (mem->validation_bits & CCIX_MEM_ERR_VENDOR_DATA_VALID)
+		sqlite3_bind_blob(rec, ccix_mem_field_vendor,
+				  ev->vendor_data, ev->vendor_data_length,
+				  NULL);
+
+	rc = sqlite3_step(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to do ccix_mem_record step on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_reset(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed reset ccix_mem_record on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_clear_bindings(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to clear ccix_mem_record: error %d\n",
+		    rc);
+	log(TERM, LOG_INFO, "register inserted at db\n");
+	return rc;
+}
+#endif
 /*
  * Table and functions to handle ras:non standard
  */
@@ -547,6 +716,13 @@ int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras)
 					 &extlog_event_tab);
 #endif
 
+#ifdef HAVE_CCIX
+	rc = ras_mc_create_table(priv, &ccix_memory_event_tab);
+	if (rc == SQLITE_OK)
+		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_mem_record,
+					 &ccix_memory_event_tab);
+#endif
+
 #ifdef HAVE_MCE
 	rc = ras_mc_create_table(priv, &mce_record_tab);
 	if (rc == SQLITE_OK)
diff --git a/ras-record.h b/ras-record.h
index a11f290..4a871fa 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -44,6 +44,21 @@ struct ras_aer_event {
 	const char *msg;
 };
 
+struct ras_ccix_event {
+	char timestamp[64];
+	int32_t error_seq;
+	int8_t severity;
+	int8_t severity_detail;
+	unsigned long long address;
+	uint8_t pa_mask_lsb;
+	uint8_t source;
+	uint8_t component;
+	const char *cper_data;
+	unsigned short cper_data_length;
+	uint16_t vendor_data_length;
+	const char *vendor_data;
+};
+
 struct ras_extlog_event {
 	char timestamp[64];
 	int32_t error_seq;
@@ -98,6 +113,9 @@ struct sqlite3_priv {
 #ifdef HAVE_EXTLOG
 	sqlite3_stmt	*stmt_extlog_record;
 #endif
+#ifdef HAVE_CCIX
+	sqlite3_stmt	*stmt_ccix_mem_record;
+#endif
 #ifdef HAVE_NON_STANDARD
 	sqlite3_stmt	*stmt_non_standard_record;
 #endif
@@ -111,6 +129,7 @@ int ras_store_mc_event(struct ras_events *ras, struct ras_mc_event *ev);
 int ras_store_aer_event(struct ras_events *ras, struct ras_aer_event *ev);
 int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev);
 int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev);
+int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev);
 int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev);
 
@@ -120,6 +139,7 @@ static inline int ras_store_mc_event(struct ras_events *ras, struct ras_mc_event
 static inline int ras_store_aer_event(struct ras_events *ras, struct ras_aer_event *ev) { return 0; };
 static inline int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev) { return 0; };
 static inline int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev) { return 0; };
+static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; };
 static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; };
 static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; };
 
diff --git a/ras-report.h b/ras-report.h
index 6c466f5..72e950a 100644
--- a/ras-report.h
+++ b/ras-report.h
@@ -19,6 +19,7 @@
 #include "ras-mc-handler.h"
 #include "ras-mce-handler.h"
 #include "ras-aer-handler.h"
+#include "ras-ccix-handler.h"
 
 /* Maximal length of backtrace. */
 #define MAX_BACKTRACE_SIZE (1024*1024)
@@ -34,7 +35,8 @@ enum {
 	MCE_EVENT,
 	AER_EVENT,
 	NON_STANDARD_EVENT,
-	ARM_EVENT
+	ARM_EVENT,
+	CCIX_EVENT,
 };
 
 #ifdef HAVE_ABRT_REPORT
@@ -44,6 +46,7 @@ int ras_report_aer_event(struct ras_events *ras, struct ras_aer_event *ev);
 int ras_report_mce_event(struct ras_events *ras, struct mce_event *ev);
 int ras_report_non_standard_event(struct ras_events *ras, struct ras_non_standard_event *ev);
 int ras_report_arm_event(struct ras_events *ras, struct ras_arm_event *ev);
+int ras_report_ccix_event(struct ras_events *ras, struct ras_ccix_event *ev);
 
 #else
 
@@ -52,6 +55,7 @@ static inline int ras_report_aer_event(struct ras_events *ras, struct ras_aer_ev
 static inline int ras_report_mce_event(struct ras_events *ras, struct mce_event *ev) { return 0; };
 static inline int ras_report_non_standard_event(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; };
 static inline int ras_report_arm_event(struct ras_events *ras, struct ras_arm_event *ev) { return 0; };
+static inline int ras_report_ccix_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; };
 
 #endif
 
-- 
2.20.1



* [RFC PATCH 2/6] rasdaemon: CCIX: Cache error support
  2019-06-14 17:55 [RFC PATCH 0/6] CCIX rasdaemon support Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 1/6] rasdaemon: CCIX: CCIX memory error reporting Jonathan Cameron
@ 2019-06-14 17:55 ` Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 3/6] rasdaemon: CCIX: ATC errors Jonathan Cameron
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2019-06-14 17:55 UTC (permalink / raw)
  To: linux-edac; +Cc: mchehab, linuxarm, jcm, Jonathan Cameron

Adds support for CCIX cache error reporting and logging
to sqlite3.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 ras-ccix-handler.c | 114 +++++++++++++++++++++++++++++++++++++++++++++
 ras-ccix-handler.h |  24 ++++++++++
 ras-events.c       |   9 ++++
 ras-record.c       | 100 +++++++++++++++++++++++++++++++++++++++
 ras-record.h       |   3 ++
 5 files changed, 250 insertions(+)

diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c
index 2be413f..f68c297 100644
--- a/ras-ccix-handler.c
+++ b/ras-ccix-handler.c
@@ -127,6 +127,79 @@ static char *ccix_mem_err_cper_data(const char *c)
 	return buf;
 }
 
+static char *ccix_cache_type(uint8_t type)
+{
+	switch (type) {
+	case 0: return "instruction";
+	case 1: return "data";
+	case 2: return "generic/unified";
+	case 3: return "snoop filter directory";
+	}
+	return "unknown";
+}
+
+static char *ccix_cache_err_type(int etype)
+{
+	switch (etype) {
+	case 0: return "data";
+	case 1: return "tag";
+	case 2: return "timeout";
+	case 3: return "hang";
+	case 4: return "data loss";
+	case 5: return "invalid address";
+	}
+	return "unknown-type";
+}
+
+static char *ccix_cache_op(uint8_t op)
+{
+	switch (op) {
+	case 0: return "generic";
+	case 1: return "generic read";
+	case 2: return "generic write";
+	case 3: return "data read";
+	case 4: return "data write";
+	case 5: return "instruction fetch";
+	case 6: return "prefetch";
+	case 7: return "eviction";
+	case 8: return "snooping";
+	case 9: return "snooped";
+	case 10: return "management/command";
+	}
+	return "unknown";
+}
+
+static char *ccix_cache_err_cper_data(const char *c)
+{
+	const struct cper_ccix_cache_err_compact *cpd =
+		(struct cper_ccix_cache_err_compact *)c;
+	static char buf[1024];
+	char *p = buf;
+
+	if (!(cpd->validation_bits))
+		return "";
+
+	p += sprintf(p, " (");
+	if (cpd->validation_bits & CCIX_CACHE_ERR_CACHE_ERR_TYPE_VALID)
+		p += sprintf(p, "error: %s ",
+			     ccix_cache_err_type(cpd->cache_error_type));
+	if (cpd->validation_bits & CCIX_CACHE_ERR_TYPE_VALID)
+		p += sprintf(p, "type: %s ", ccix_cache_type(cpd->cache_type));
+	if (cpd->validation_bits & CCIX_CACHE_ERR_OP_VALID)
+		p += sprintf(p, "op: %s ", ccix_cache_op(cpd->op_type));
+	if (cpd->validation_bits & CCIX_CACHE_ERR_LEVEL_VALID)
+		p += sprintf(p, "level: %u ", cpd->cache_level);
+	if (cpd->validation_bits & CCIX_CACHE_ERR_SET_VALID)
+		p += sprintf(p, "set: %u ", cpd->set);
+	if (cpd->validation_bits & CCIX_CACHE_ERR_WAY_VALID)
+		p += sprintf(p, "way: %u ", cpd->way);
+	if (cpd->validation_bits & CCIX_CACHE_ERR_INSTANCE_ID_VALID)
+		p += sprintf(p, "instance: %u ", cpd->instance);
+	p += sprintf(p - 1, ")");
+
+	return buf;
+}
+
 static char *ccix_component_type(int type)
 {
 	switch (type) {
@@ -242,3 +315,44 @@ int ras_ccix_memory_event_handler(struct trace_seq *s,
 
 	return 0;
 }
+
+int ras_ccix_cache_event_handler(struct trace_seq *s,
+				  struct pevent_record *record,
+				  struct event_format *event, void *context)
+{
+	struct ras_events *ras = context;
+	struct tm *tm;
+	struct ras_ccix_event ev;
+	time_t now;
+	int ret;
+
+	if (ras->use_uptime)
+		now = record->ts/user_hz + ras->uptime_diff;
+	else
+		now = time(NULL);
+
+	tm = localtime(&now);
+
+	if (tm)
+		strftime(ev.timestamp, sizeof(ev.timestamp),
+			 "%Y-%m-%d %H:%M:%S %z", tm);
+	trace_seq_printf(s, "%s ", ev.timestamp);
+	ret = ras_ccix_common_parse(s, record, event, context, &ev);
+	if (ret)
+		return ret;
+
+	trace_seq_printf(s, "%d %s id:%d CCIX cache error %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s",
+			 ev.error_seq, err_severity(ev.severity),
+			 ev.source, ccix_component_type(ev.component),
+			 (ev.severity_detail & 0x1) ? 1 : 0,
+			 (ev.severity_detail & 0x2) ? 1 : 0,
+			 (ev.severity_detail & 0x4) ? 1 : 0,
+			 (ev.severity_detail & 0x8) ? 1 : 0,
+			 ev.address,
+			 err_mask(ev.pa_mask_lsb),
+			 ccix_cache_err_cper_data(ev.cper_data));
+
+	ras_store_ccix_cache_event(ras, &ev);
+
+	return 0;
+}
diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h
index f6d25b1..629ccbe 100644
--- a/ras-ccix-handler.h
+++ b/ras-ccix-handler.h
@@ -21,6 +21,9 @@
 int ras_ccix_memory_event_handler(struct trace_seq *s,
 				  struct pevent_record *record,
 				  struct event_format *event, void *context);
+int ras_ccix_cache_event_handler(struct trace_seq *s,
+				 struct pevent_record *record,
+				 struct event_format *event, void *context);
 
 /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */
 #pragma pack(1)
@@ -41,6 +44,18 @@ struct cper_ccix_mem_err_compact {
 	uint8_t chip_id;
 	uint8_t fru;
 };
+
+struct cper_ccix_cache_err_compact {
+	uint32_t validation_bits;
+	uint32_t set;
+	uint32_t way;
+	uint8_t cache_type;
+	uint8_t op_type;
+	uint8_t cache_error_type;
+	uint8_t cache_level;
+	uint8_t instance;
+};
+
 #pragma pack()
 
 #define CCIX_MEM_ERR_GENERIC_MEM_VALID		0x0001
@@ -58,4 +73,13 @@ struct cper_ccix_mem_err_compact {
 #define CCIX_MEM_ERR_MOD_VALID			0x1000
 #define CCIX_MEM_ERR_SPEC_TYPE_VALID		0x2000
 
+#define CCIX_CACHE_ERR_TYPE_VALID		0x0001
+#define CCIX_CACHE_ERR_OP_VALID			0x0002
+#define CCIX_CACHE_ERR_CACHE_ERR_TYPE_VALID	0x0004
+#define CCIX_CACHE_ERR_LEVEL_VALID		0x0008
+#define CCIX_CACHE_ERR_SET_VALID		0x0010
+#define CCIX_CACHE_ERR_WAY_VALID		0x0020
+#define CCIX_CACHE_ERR_INSTANCE_ID_VALID	0x0040
+#define CCIX_CACHE_ERR_VENDOR_DATA_VALID	0x0080
+
 #endif
diff --git a/ras-events.c b/ras-events.c
index c7bd1c3..1e34cd1 100644
--- a/ras-events.c
+++ b/ras-events.c
@@ -205,6 +205,7 @@ int toggle_ras_mc_event(int enable)
 
 #ifdef HAVE_CCIX
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_memory_event", enable);
+	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_cache_event", enable);
 #endif
 
 #ifdef HAVE_MCE
@@ -700,6 +701,14 @@ int handle_ras_events(int record_events)
 	else
 		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
 		    "ras", "ccix_memory_event");
+	rc = add_event_handler(ras, pevent, page_size, "ras",
+			       "ccix_cache_error_event",
+			       ras_ccix_cache_event_handler);
+	if (!rc)
+		num_events++;
+	else
+		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
+		    "ras", "ccix_cache_event");
 #endif
 
 #ifdef HAVE_NON_STANDARD
diff --git a/ras-record.c b/ras-record.c
index b1a241a..51180f7 100644
--- a/ras-record.c
+++ b/ras-record.c
@@ -326,6 +326,101 @@ int ras_store_ccix_memory_event(struct ras_events *ras,
 	log(TERM, LOG_INFO, "register inserted at db\n");
 	return rc;
 }
+
+enum {
+	ccix_cache_field_type = ccix_field_common_end,
+	ccix_cache_field_operation,
+	ccix_cache_field_error_type,
+	ccix_cache_field_level,
+	ccix_cache_field_set,
+	ccix_cache_field_way,
+	ccix_cache_field_instance,
+	ccix_cache_field_vendor,
+};
+
+static const struct db_fields ccix_cache_event_fields[] = {
+	CCIX_COMMON_FIELDS,
+	[ccix_cache_field_type] =	{ .name = "type",		.type = "INTEGER" },
+	[ccix_cache_field_operation] =	{ .name = "operation",		.type = "INTEGER" },
+	[ccix_cache_field_error_type] =	{ .name = "cache_err_type",	.type = "INTEGER" },
+	[ccix_cache_field_level] =	{ .name = "\"level\"",		.type = "INTEGER" },
+	[ccix_cache_field_set] =	{ .name = "\"set\"",		.type = "INTEGER" },
+	[ccix_cache_field_way] =	{ .name = "way",		.type = "INTEGER" },
+	[ccix_cache_field_instance] =	{ .name = "instance",		.type = "INTEGER" },
+	[ccix_cache_field_vendor] =	{ .name = "vendor_data",	.type = "BLOB" },
+};
+
+static const struct db_table_descriptor ccix_cache_event_tab = {
+	.name = "ccix_cache_event",
+	.fields = ccix_cache_event_fields,
+	.num_fields = ARRAY_SIZE(ccix_cache_event_fields),
+};
+
+int ras_store_ccix_cache_event(struct ras_events *ras,
+			       struct ras_ccix_event *ev)
+{
+	int rc;
+	struct sqlite3_priv *priv = ras->db_priv;
+	struct cper_ccix_cache_err_compact *cache =
+		(struct cper_ccix_cache_err_compact *)ev->cper_data;
+	sqlite3_stmt *rec = priv->stmt_ccix_cache_record;
+
+	if (!priv || !rec)
+		return 0;
+	log(TERM, LOG_INFO, "ccix_cache_eventstore: %p\n", rec);
+
+	ras_store_ccix_common(rec, ev);
+
+	if (cache->validation_bits & CCIX_CACHE_ERR_CACHE_ERR_TYPE_VALID)
+		sqlite3_bind_int(rec, ccix_cache_field_error_type,
+				 cache->cache_error_type);
+
+	if (cache->validation_bits & CCIX_CACHE_ERR_TYPE_VALID)
+		sqlite3_bind_int(rec, ccix_cache_field_type, cache->cache_type);
+
+	if (cache->validation_bits & CCIX_CACHE_ERR_OP_VALID)
+		sqlite3_bind_int(rec, ccix_cache_field_operation,
+				 cache->op_type);
+
+	if (cache->validation_bits & CCIX_CACHE_ERR_LEVEL_VALID)
+		sqlite3_bind_int(rec, ccix_cache_field_level,
+				 cache->cache_level);
+
+	if (cache->validation_bits & CCIX_CACHE_ERR_SET_VALID)
+		sqlite3_bind_int(rec, ccix_cache_field_set, cache->set);
+
+	if (cache->validation_bits & CCIX_CACHE_ERR_WAY_VALID)
+		sqlite3_bind_int(rec, ccix_cache_field_way, cache->way);
+
+	if (cache->validation_bits & CCIX_CACHE_ERR_INSTANCE_ID_VALID)
+		sqlite3_bind_int(rec, ccix_cache_field_instance,
+				 cache->instance);
+
+	if (cache->validation_bits & CCIX_CACHE_ERR_VENDOR_DATA_VALID)
+		sqlite3_bind_blob(rec, ccix_cache_field_vendor,
+				  ev->vendor_data, ev->vendor_data_length,
+				  NULL);
+
+	rc = sqlite3_step(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to do ccix_cache_record step on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_reset(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed reset ccix_cache_record on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_clear_bindings(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to clear ccix_cache_record: error %d\n",
+		    rc);
+	log(TERM, LOG_INFO, "register inserted at db\n");
+	return rc;
+}
 #endif
 /*
  * Table and functions to handle ras:non standard
@@ -721,6 +816,11 @@ int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras)
 	if (rc == SQLITE_OK)
 		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_mem_record,
 					 &ccix_memory_event_tab);
+
+	rc = ras_mc_create_table(priv, &ccix_cache_event_tab);
+	if (rc == SQLITE_OK)
+		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_cache_record,
+					 &ccix_cache_event_tab);
 #endif
 
 #ifdef HAVE_MCE
diff --git a/ras-record.h b/ras-record.h
index 4a871fa..8c90bd5 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -115,6 +115,7 @@ struct sqlite3_priv {
 #endif
 #ifdef HAVE_CCIX
 	sqlite3_stmt	*stmt_ccix_mem_record;
+	sqlite3_stmt	*stmt_ccix_cache_record;
 #endif
 #ifdef HAVE_NON_STANDARD
 	sqlite3_stmt	*stmt_non_standard_record;
@@ -130,6 +131,7 @@ int ras_store_aer_event(struct ras_events *ras, struct ras_aer_event *ev);
 int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev);
 int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev);
 int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev);
+int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev);
 int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev);
 
@@ -140,6 +142,7 @@ static inline int ras_store_aer_event(struct ras_events *ras, struct ras_aer_eve
 static inline int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev) { return 0; };
 static inline int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev) { return 0; };
 static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; };
+static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; };
 static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; };
 
-- 
2.20.1
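
A note for readers on the cper_data formatting in this patch: each field is
appended with sprintf() only when its validation bit is set, and the trailing
space is then overwritten with ")". A minimal sketch of the same gating with
bounded writes might look like the following. It covers only the level, set,
way and instance fields. The struct and masks are copied from the patch;
sketch_append() and sketch_cache_fields() are hypothetical names used for
illustration and are not part of rasdaemon.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Layout and masks copied from ras-ccix-handler.h as added by this patch. */
#pragma pack(1)
struct cper_ccix_cache_err_compact {
	uint32_t validation_bits;
	uint32_t set;
	uint32_t way;
	uint8_t cache_type;
	uint8_t op_type;
	uint8_t cache_error_type;
	uint8_t cache_level;
	uint8_t instance;
};
#pragma pack()

#define CCIX_CACHE_ERR_LEVEL_VALID		0x0008
#define CCIX_CACHE_ERR_SET_VALID		0x0010
#define CCIX_CACHE_ERR_WAY_VALID		0x0020
#define CCIX_CACHE_ERR_INSTANCE_ID_VALID	0x0040

/* Hypothetical helper: append at offset 'off', never writing past buf[len - 1]. */
static size_t sketch_append(char *buf, size_t len, size_t off,
			    const char *fmt, ...)
{
	va_list ap;
	int n;

	if (off + 1 >= len)
		return off;		/* no room left, keep what we have */
	va_start(ap, fmt);
	n = vsnprintf(buf + off, len - off, fmt, ap);
	va_end(ap);
	if (n < 0)
		return off;
	if ((size_t)n >= len - off)
		return len - 1;		/* output was truncated */
	return off + (size_t)n;
}

/* Hypothetical helper: format only the fields whose validation bit is set. */
static const char *sketch_cache_fields(const struct cper_ccix_cache_err_compact *cpd,
					char *buf, size_t len)
{
	size_t off = 0;

	if (len == 0)
		return "";
	buf[0] = '\0';
	if (!cpd->validation_bits)
		return buf;

	off = sketch_append(buf, len, off, " (");
	if (cpd->validation_bits & CCIX_CACHE_ERR_LEVEL_VALID)
		off = sketch_append(buf, len, off, "level: %u ",
				    (unsigned int)cpd->cache_level);
	if (cpd->validation_bits & CCIX_CACHE_ERR_SET_VALID)
		off = sketch_append(buf, len, off, "set: %u ",
				    (unsigned int)cpd->set);
	if (cpd->validation_bits & CCIX_CACHE_ERR_WAY_VALID)
		off = sketch_append(buf, len, off, "way: %u ",
				    (unsigned int)cpd->way);
	if (cpd->validation_bits & CCIX_CACHE_ERR_INSTANCE_ID_VALID)
		off = sketch_append(buf, len, off, "instance: %u ",
				    (unsigned int)cpd->instance);

	/* Turn the trailing separator into the closing parenthesis. */
	if (off > 0 && buf[off - 1] == ' ')
		buf[off - 1] = ')';
	else
		sketch_append(buf, len, off, ")");

	return buf;
}
```

Passing the buffer and its length in, rather than returning a static buffer,
is a design choice of this sketch only; the patch itself returns a static
buffer from the helper.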


* [RFC PATCH 3/6] rasdaemon: CCIX: ATC errors
  2019-06-14 17:55 [RFC PATCH 0/6] CCIX rasdaemon support Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 1/6] rasdaemon: CCIX: CCIX memory error reporting Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 2/6] rasdaemon: CCIX: Cache error support Jonathan Cameron
@ 2019-06-14 17:55 ` Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 4/6] rasdaemon: CCIX: Port error handling Jonathan Cameron
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2019-06-14 17:55 UTC (permalink / raw)
  To: linux-edac; +Cc: mchehab, linuxarm, jcm, Jonathan Cameron

Add support for reporting CCIX address translation cache (ATC) errors
and storing them to sqlite3.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 ras-ccix-handler.c | 61 ++++++++++++++++++++++++++++++++++++++++
 ras-ccix-handler.h | 13 +++++++++
 ras-events.c       |  9 ++++++
 ras-record.c       | 69 ++++++++++++++++++++++++++++++++++++++++++++++
 ras-record.h       |  3 ++
 5 files changed, 155 insertions(+)

diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c
index f68c297..f7b9e8e 100644
--- a/ras-ccix-handler.c
+++ b/ras-ccix-handler.c
@@ -200,6 +200,26 @@ static char *ccix_cache_err_cper_data(const char *c)
 	return buf;
 }
 
+static char *ccix_atc_err_cper_data(const char *c)
+{
+	const struct cper_ccix_atc_err_compact *cpd =
+		(struct cper_ccix_atc_err_compact *)c;
+	static char buf[1024];
+	char *p = buf;
+
+	if (!cpd->validation_bits)
+		return "";
+
+	p += sprintf(p, " (");
+	if (cpd->validation_bits & CCIX_ATC_ERR_OP_VALID)
+		p += sprintf(p, "op: %s ", ccix_cache_op(cpd->op_type));
+	if (cpd->validation_bits & CCIX_ATC_ERR_INSTANCE_ID_VALID)
+		p += sprintf(p, "instance: %u ", cpd->instance);
+	p += sprintf(p - 1, ")");
+
+	return buf;
+}
+
 static char *ccix_component_type(int type)
 {
 	switch (type) {
@@ -356,3 +376,44 @@ int ras_ccix_cache_event_handler(struct trace_seq *s,
 
 	return 0;
 }
+
+int ras_ccix_atc_event_handler(struct trace_seq *s,
+			       struct pevent_record *record,
+			       struct event_format *event, void *context)
+{
+	struct ras_events *ras = context;
+	struct tm *tm;
+	struct ras_ccix_event ev;
+	time_t now;
+	int ret;
+
+	if (ras->use_uptime)
+		now = record->ts/user_hz + ras->uptime_diff;
+	else
+		now = time(NULL);
+
+	tm = localtime(&now);
+
+	if (tm)
+		strftime(ev.timestamp, sizeof(ev.timestamp),
+			 "%Y-%m-%d %H:%M:%S %z", tm);
+	trace_seq_printf(s, "%s ", ev.timestamp);
+	ret = ras_ccix_common_parse(s, record, event, context, &ev);
+	if (ret)
+		return ret;
+
+	trace_seq_printf(s, "%d %s id:%d CCIX ATC error: %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s",
+			 ev.error_seq, err_severity(ev.severity),
+			 ev.source, ccix_component_type(ev.component),
+			 (ev.severity_detail & 0x1) ? 1 : 0,
+			 (ev.severity_detail & 0x2) ? 1 : 0,
+			 (ev.severity_detail & 0x4) ? 1 : 0,
+			 (ev.severity_detail & 0x8) ? 1 : 0,
+			 ev.address,
+			 err_mask(ev.pa_mask_lsb),
+			 ccix_atc_err_cper_data(ev.cper_data));
+
+	ras_store_ccix_atc_event(ras, &ev);
+
+	return 0;
+}
diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h
index 629ccbe..4528af7 100644
--- a/ras-ccix-handler.h
+++ b/ras-ccix-handler.h
@@ -24,6 +24,9 @@ int ras_ccix_memory_event_handler(struct trace_seq *s,
 int ras_ccix_cache_event_handler(struct trace_seq *s,
 				 struct pevent_record *record,
 				 struct event_format *event, void *context);
+int ras_ccix_atc_event_handler(struct trace_seq *s,
+			       struct pevent_record *record,
+			       struct event_format *event, void *context);
 
 /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */
 #pragma pack(1)
@@ -56,6 +59,12 @@ struct cper_ccix_cache_err_compact {
 	uint8_t instance;
 };
 
+struct cper_ccix_atc_err_compact {
+	uint32_t validation_bits;
+	uint8_t op_type;
+	uint8_t instance;
+};
+
 #pragma pack()
 
 #define CCIX_MEM_ERR_GENERIC_MEM_VALID		0x0001
@@ -82,4 +91,8 @@ struct cper_ccix_cache_err_compact {
 #define CCIX_CACHE_ERR_INSTANCE_ID_VALID	0x0040
 #define CCIX_CACHE_ERR_VENDOR_DATA_VALID	0x0080
 
+#define CCIX_ATC_ERR_OP_VALID			0x0001
+#define CCIX_ATC_ERR_INSTANCE_ID_VALID		0x0002
+#define CCIX_ATC_ERR_VENDOR_DATA_VALID		0x0004
+
 #endif
diff --git a/ras-events.c b/ras-events.c
index 1e34cd1..20e29dd 100644
--- a/ras-events.c
+++ b/ras-events.c
@@ -206,6 +206,7 @@ int toggle_ras_mc_event(int enable)
 #ifdef HAVE_CCIX
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_memory_event", enable);
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_cache_event", enable);
+	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_atc_event", enable);
 #endif
 
 #ifdef HAVE_MCE
@@ -709,6 +710,14 @@ int handle_ras_events(int record_events)
 	else
 		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
 		    "ras", "ccix_cache_event");
+	rc = add_event_handler(ras, pevent, page_size, "ras",
+			       "ccix_atc_error_event",
+			       ras_ccix_atc_event_handler);
+	if (!rc)
+		num_events++;
+	else
+		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
+		    "ras", "ccix_atc_event");
 #endif
 
 #ifdef HAVE_NON_STANDARD
diff --git a/ras-record.c b/ras-record.c
index 51180f7..3e8d720 100644
--- a/ras-record.c
+++ b/ras-record.c
@@ -421,6 +421,70 @@ int ras_store_ccix_cache_event(struct ras_events *ras,
 	log(TERM, LOG_INFO, "register inserted at db\n");
 	return rc;
 }
+
+enum {
+	ccix_atc_field_operation = ccix_field_common_end,
+	ccix_atc_field_instance,
+	ccix_atc_field_vendor,
+};
+
+static const struct db_fields ccix_atc_event_fields[] = {
+	CCIX_COMMON_FIELDS,
+	[ccix_atc_field_operation] =	{ .name = "operation",		.type = "INTEGER" },
+	[ccix_atc_field_instance] =	{ .name = "instance",		.type = "INTEGER" },
+	[ccix_atc_field_vendor] =	{ .name = "vendor_data",	.type = "BLOB" },
+};
+
+static const struct db_table_descriptor ccix_atc_event_tab = {
+	.name = "ccix_atc_event",
+	.fields = ccix_atc_event_fields,
+	.num_fields = ARRAY_SIZE(ccix_atc_event_fields),
+};
+
+int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev)
+{
+	int rc;
+	struct sqlite3_priv *priv = ras->db_priv;
+	struct cper_ccix_atc_err_compact *atc =
+		(struct cper_ccix_atc_err_compact *)ev->cper_data;
+	sqlite3_stmt *rec = priv->stmt_ccix_atc_record;
+
+	if (!priv || !rec)
+		return 0;
+	log(TERM, LOG_INFO, "ccix_atc_eventstore: %p\n", rec);
+
+	ras_store_ccix_common(priv->stmt_ccix_atc_record, ev);
+	if (atc->validation_bits & CCIX_ATC_ERR_OP_VALID)
+		sqlite3_bind_int(rec, ccix_atc_field_operation, atc->op_type);
+
+	if (atc->validation_bits & CCIX_ATC_ERR_INSTANCE_ID_VALID)
+		sqlite3_bind_int(rec, ccix_atc_field_instance, atc->instance);
+
+	if (atc->validation_bits & CCIX_ATC_ERR_VENDOR_DATA_VALID)
+		sqlite3_bind_blob(rec, ccix_atc_field_vendor,
+				  ev->vendor_data, ev->vendor_data_length,
+				  NULL);
+
+	rc = sqlite3_step(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to do ccix_atc_record step on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_reset(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed reset ccix_atc_record on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_clear_bindings(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to clear ccix_atc_record: error %d\n",
+		    rc);
+	log(TERM, LOG_INFO, "register inserted at db\n");
+	return rc;
+}
 #endif
 /*
  * Table and functions to handle ras:non standard
@@ -821,6 +885,11 @@ int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras)
 	if (rc == SQLITE_OK)
 		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_cache_record,
 					 &ccix_cache_event_tab);
+
+	rc = ras_mc_create_table(priv, &ccix_atc_event_tab);
+	if (rc == SQLITE_OK)
+		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_atc_record,
+					 &ccix_atc_event_tab);
 #endif
 
 #ifdef HAVE_MCE
diff --git a/ras-record.h b/ras-record.h
index 8c90bd5..8141d26 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -116,6 +116,7 @@ struct sqlite3_priv {
 #ifdef HAVE_CCIX
 	sqlite3_stmt	*stmt_ccix_mem_record;
 	sqlite3_stmt	*stmt_ccix_cache_record;
+	sqlite3_stmt	*stmt_ccix_atc_record;
 #endif
 #ifdef HAVE_NON_STANDARD
 	sqlite3_stmt	*stmt_non_standard_record;
@@ -132,6 +133,7 @@ int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev);
 int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev);
 int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev);
+int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev);
 int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev);
 
@@ -143,6 +145,7 @@ static inline int ras_store_mce_record(struct ras_events *ras, struct mce_event
 static inline int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev) { return 0; };
 static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; };
 static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
+static inline int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; };
 static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; };
 
-- 
2.20.1
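
A note on the store function in this patch: it reads
priv->stmt_ccix_atc_record in an initializer and only afterwards tests
"!priv", so the NULL test cannot protect that first dereference. Whether
ras->db_priv can actually be NULL at this point is not visible from the patch,
so treat the following as a sketch under that assumption. The types are
stand-ins for the real sqlite3_stmt, struct sqlite3_priv and struct ras_events,
and sketch_atc_stmt() is a hypothetical name.

```c
#include <stddef.h>

/* Stand-in types; the real ones come from sqlite3.h and rasdaemon's headers. */
struct sketch_stmt {
	int unused;
};

struct sketch_priv {
	struct sketch_stmt *stmt_ccix_atc_record;
};

struct sketch_ras {
	struct sketch_priv *db_priv;
};

/*
 * Hypothetical helper: return the prepared statement to bind against, or
 * NULL when there is no database context or no prepared statement.
 * The pointer is tested before it is dereferenced.
 */
static struct sketch_stmt *sketch_atc_stmt(const struct sketch_ras *ras)
{
	const struct sketch_priv *priv = ras->db_priv;

	if (!priv)
		return NULL;

	return priv->stmt_ccix_atc_record;
}
```

A caller would return early when this yields NULL, which matches the
"return 0" path the patch already takes when nothing can be stored.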


* [RFC PATCH 4/6] rasdaemon: CCIX: Port error handling
  2019-06-14 17:55 [RFC PATCH 0/6] CCIX rasdaemon support Jonathan Cameron
                   ` (2 preceding siblings ...)
  2019-06-14 17:55 ` [RFC PATCH 3/6] rasdaemon: CCIX: ATC errors Jonathan Cameron
@ 2019-06-14 17:55 ` Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 5/6] rasdaemon: CCIX: Link error support Jonathan Cameron
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2019-06-14 17:55 UTC (permalink / raw)
  To: linux-edac; +Cc: mchehab, linuxarm, jcm, Jonathan Cameron

Add support for reporting CCIX Port errors and storing them to
sqlite3.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 ras-ccix-handler.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++
 ras-ccix-handler.h | 14 +++++++
 ras-events.c       |  9 +++++
 ras-record.c       | 75 +++++++++++++++++++++++++++++++++++++
 ras-record.h       |  3 ++
 5 files changed, 194 insertions(+)

diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c
index f7b9e8e..0a79627 100644
--- a/ras-ccix-handler.c
+++ b/ras-ccix-handler.c
@@ -220,6 +220,58 @@ static char *ccix_atc_err_cper_data(const char *c)
 	return buf;
 }
 
+static char *ccix_port_op(uint8_t op)
+{
+	switch (op) {
+	case 0: return "command";
+	case 1: return "read";
+	case 2: return "write";
+	}
+	return "unknown";
+}
+
+static char *ccix_port_err_type(uint8_t type)
+{
+	switch (type) {
+	case 0: return "generic bus / slave error";
+	case 1: return "bus parity / ECC error";
+	case 2: return "BDF not present";
+	case 3: return "invalid address";
+	case 4: return "invalid agent ID";
+	case 5: return "bus timeout";
+	case 6: return "hang";
+	case 7: return "egress blocked";
+	}
+	return "unknown-type";
+};
+
+static char *ccix_port_err_cper_data(const char *c)
+{
+	const struct cper_ccix_port_err_compact *cpd =
+		(struct cper_ccix_port_err_compact *)c;
+	static char buf[1024];
+	char *p = buf;
+	int i;
+
+	if (!cpd->validation_bits)
+		return "";
+
+	p += sprintf(p, " (");
+	if (cpd->validation_bits & CCIX_PORT_ERR_TYPE_VALID)
+		p += sprintf(p, "error: %s ",
+			     ccix_port_err_type(cpd->err_type));
+	if (cpd->validation_bits & CCIX_PORT_ERR_OP_VALID)
+		p += sprintf(p, "op: %s ", ccix_port_op(cpd->op_type));
+	if (cpd->validation_bits & CCIX_PORT_ERR_MESSAGE_VALID) {
+		p += sprintf(p, "message: ");
+		for (i = 0; i < 8; i++)
+			p += sprintf(p, "0x%08x ", cpd->message[i]);
+	}
+	p += sprintf(p - 1, ")");
+
+	return buf;
+}
+
 static char *ccix_component_type(int type)
 {
 	switch (type) {
@@ -417,3 +469,44 @@ int ras_ccix_atc_event_handler(struct trace_seq *s,
 
 	return 0;
 }
+
+int ras_ccix_port_event_handler(struct trace_seq *s,
+				struct pevent_record *record,
+				struct event_format *event, void *context)
+{
+	struct ras_events *ras = context;
+	struct tm *tm;
+	struct ras_ccix_event ev;
+	time_t now;
+	int ret;
+
+	if (ras->use_uptime)
+		now = record->ts/user_hz + ras->uptime_diff;
+	else
+		now = time(NULL);
+
+	tm = localtime(&now);
+
+	if (tm)
+		strftime(ev.timestamp, sizeof(ev.timestamp),
+			 "%Y-%m-%d %H:%M:%S %z", tm);
+	trace_seq_printf(s, "%s ", ev.timestamp);
+	ret = ras_ccix_common_parse(s, record, event, context, &ev);
+	if (ret)
+		return ret;
+
+	trace_seq_printf(s, "%d %s id:%d CCIX Port error: %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s",
+			 ev.error_seq, err_severity(ev.severity),
+			 ev.source, ccix_component_type(ev.component),
+			 (ev.severity_detail & 0x1) ? 1 : 0,
+			 (ev.severity_detail & 0x2) ? 1 : 0,
+			 (ev.severity_detail & 0x4) ? 1 : 0,
+			 (ev.severity_detail & 0x8) ? 1 : 0,
+			 ev.address,
+			 err_mask(ev.pa_mask_lsb),
+			 ccix_port_err_cper_data(ev.cper_data));
+
+	ras_store_ccix_port_event(ras, &ev);
+
+	return 0;
+}
diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h
index 4528af7..e824aed 100644
--- a/ras-ccix-handler.h
+++ b/ras-ccix-handler.h
@@ -27,6 +27,9 @@ int ras_ccix_cache_event_handler(struct trace_seq *s,
 int ras_ccix_atc_event_handler(struct trace_seq *s,
 			       struct pevent_record *record,
 			       struct event_format *event, void *context);
+int ras_ccix_port_event_handler(struct trace_seq *s,
+				struct pevent_record *record,
+				struct event_format *event, void *context);
 
 /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */
 #pragma pack(1)
@@ -65,6 +68,12 @@ struct cper_ccix_atc_err_compact {
 	uint8_t instance;
 };
 
+struct cper_ccix_port_err_compact {
+	uint32_t validation_bits;
+	uint32_t message[8];
+	uint8_t err_type;
+	uint8_t op_type;
+};
 #pragma pack()
 
 #define CCIX_MEM_ERR_GENERIC_MEM_VALID		0x0001
@@ -95,4 +104,9 @@ struct cper_ccix_atc_err_compact {
 #define CCIX_ATC_ERR_INSTANCE_ID_VALID		0x0002
 #define CCIX_ATC_ERR_VENDOR_DATA_VALID		0x0004
 
+#define CCIX_PORT_ERR_OP_VALID			0x0001
+#define CCIX_PORT_ERR_TYPE_VALID		0x0002
+#define CCIX_PORT_ERR_MESSAGE_VALID		0x0004
+#define CCIX_PORT_ERR_VENDOR_DATA_VALID		0x0008
+
 #endif
diff --git a/ras-events.c b/ras-events.c
index 20e29dd..aef5eae 100644
--- a/ras-events.c
+++ b/ras-events.c
@@ -207,6 +207,7 @@ int toggle_ras_mc_event(int enable)
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_memory_event", enable);
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_cache_event", enable);
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_atc_event", enable);
+	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_port_event", enable);
 #endif
 
 #ifdef HAVE_MCE
@@ -718,6 +719,14 @@ int handle_ras_events(int record_events)
 	else
 		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
 		    "ras", "ccix_atc_event");
+	rc = add_event_handler(ras, pevent, page_size, "ras",
+			       "ccix_port_error_event",
+			       ras_ccix_port_event_handler);
+	if (!rc)
+		num_events++;
+	else
+		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
+		    "ras", "ccix_port_event");
 #endif
 
 #ifdef HAVE_NON_STANDARD
diff --git a/ras-record.c b/ras-record.c
index 3e8d720..4824dd3 100644
--- a/ras-record.c
+++ b/ras-record.c
@@ -485,6 +485,76 @@ int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev)
 	log(TERM, LOG_INFO, "register inserted at db\n");
 	return rc;
 }
+
+enum {
+	ccix_port_field_operation = ccix_field_common_end,
+	ccix_port_field_etype,
+	ccix_port_field_message,
+	ccix_port_field_vendor,
+};
+
+static const struct db_fields ccix_port_event_fields[] = {
+	CCIX_COMMON_FIELDS,
+	[ccix_port_field_operation] =	{ .name = "operation",		.type = "INTEGER" },
+	[ccix_port_field_etype] =	{ .name = "etype",		.type = "INTEGER" },
+	[ccix_port_field_message] =	{ .name = "message",		.type = "BLOB" },
+	[ccix_port_field_vendor] =	{ .name = "vendor_data",	.type = "BLOB" },
+};
+
+static const struct db_table_descriptor ccix_port_event_tab = {
+	.name = "ccix_port_event",
+	.fields = ccix_port_event_fields,
+	.num_fields = ARRAY_SIZE(ccix_port_event_fields),
+};
+
+int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev)
+{
+	int rc;
+	struct sqlite3_priv *priv = ras->db_priv;
+	struct cper_ccix_port_err_compact *port =
+		(struct cper_ccix_port_err_compact *)ev->cper_data;
+	sqlite3_stmt *rec = priv->stmt_ccix_port_record;
+
+	if (!priv || !rec)
+		return 0;
+	log(TERM, LOG_INFO, "ccix_port_eventstore: %p\n", rec);
+
+	ras_store_ccix_common(rec, ev);
+	if (port->validation_bits & CCIX_PORT_ERR_OP_VALID)
+		sqlite3_bind_int(rec, ccix_port_field_operation, port->op_type);
+
+	if (port->validation_bits & CCIX_PORT_ERR_TYPE_VALID)
+		sqlite3_bind_int(rec, ccix_port_field_etype, port->err_type);
+
+	if (port->validation_bits & CCIX_PORT_ERR_MESSAGE_VALID)
+		sqlite3_bind_blob(rec, ccix_port_field_message,
+				  port->message, sizeof(port->message), NULL);
+
+	if (port->validation_bits & CCIX_PORT_ERR_VENDOR_DATA_VALID)
+		sqlite3_bind_blob(rec, ccix_port_field_vendor,
+				  ev->vendor_data, ev->vendor_data_length,
+				  NULL);
+
+	rc = sqlite3_step(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to do ccix_port_record step on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_reset(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed reset ccix_port_record on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_clear_bindings(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to clear ccix_port_record: error %d\n",
+		    rc);
+	log(TERM, LOG_INFO, "register inserted at db\n");
+	return rc;
+}
 #endif
 /*
  * Table and functions to handle ras:non standard
@@ -890,6 +960,11 @@ int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras)
 	if (rc == SQLITE_OK)
 		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_atc_record,
 					 &ccix_atc_event_tab);
+
+	rc = ras_mc_create_table(priv, &ccix_port_event_tab);
+	if (rc == SQLITE_OK)
+		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_port_record,
+					 &ccix_port_event_tab);
 #endif
 
 #ifdef HAVE_MCE
diff --git a/ras-record.h b/ras-record.h
index 8141d26..3a473a5 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -117,6 +117,7 @@ struct sqlite3_priv {
 	sqlite3_stmt	*stmt_ccix_mem_record;
 	sqlite3_stmt	*stmt_ccix_cache_record;
 	sqlite3_stmt	*stmt_ccix_atc_record;
+	sqlite3_stmt	*stmt_ccix_port_record;
 #endif
 #ifdef HAVE_NON_STANDARD
 	sqlite3_stmt	*stmt_non_standard_record;
@@ -134,6 +135,7 @@ int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event
 int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev);
+int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev);
 int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev);
 
@@ -146,6 +148,7 @@ static inline int ras_store_extlog_mem_record(struct ras_events *ras, struct ras
 static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; };
 static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
+static inline int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; };
 static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; };
 
-- 
2.20.1
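
A note on the Port error message field: the patch prints the eight 32-bit
message words as "0x%08x " each and stores the raw array as a BLOB. A minimal
sketch of the textual form with bounded writes is shown below. It is an
illustration only; sketch_format_message() and SKETCH_MSG_WORDS are
hypothetical names, and dropping a word that does not fit whole is a choice
made for this sketch rather than something the patch does.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MSG_WORDS 8	/* matches message[8] in the compact struct */

/*
 * Hypothetical helper: write the message words as space-separated hex.
 * Returns the number of characters written, excluding the terminator.
 * A word that would not fit completely is dropped.
 */
static size_t sketch_format_message(const uint32_t msg[SKETCH_MSG_WORDS],
				    char *buf, size_t len)
{
	size_t off = 0;
	int i, n;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < SKETCH_MSG_WORDS; i++) {
		n = snprintf(buf + off, len - off, "%s0x%08x",
			     i ? " " : "", (unsigned int)msg[i]);
		if (n < 0 || (size_t)n >= len - off) {
			buf[off] = '\0';	/* discard the partial word */
			break;
		}
		off += (size_t)n;
	}

	return off;
}
```

With all eight words present the text is 87 characters long (eight 10-character
words plus seven separators), so it fits comfortably in the 1024-byte buffer the
patch uses.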


* [RFC PATCH 5/6] rasdaemon: CCIX: Link error support
  2019-06-14 17:55 [RFC PATCH 0/6] CCIX rasdaemon support Jonathan Cameron
                   ` (3 preceding siblings ...)
  2019-06-14 17:55 ` [RFC PATCH 4/6] rasdaemon: CCIX: Port error handling Jonathan Cameron
@ 2019-06-14 17:55 ` Jonathan Cameron
  2019-06-14 17:55 ` [RFC PATCH 6/6] rasdaemon: CCIX: Agent Internal " Jonathan Cameron
  2019-06-21 18:56 ` [RFC PATCH 0/6] CCIX rasdaemon support Mauro Carvalho Chehab
  6 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2019-06-14 17:55 UTC (permalink / raw)
  To: linux-edac; +Cc: mchehab, linuxarm, jcm, Jonathan Cameron

Add support for reporting CCIX Link errors and storing them to
sqlite3.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 ras-ccix-handler.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++
 ras-ccix-handler.h | 19 +++++++++
 ras-events.c       |  9 +++++
 ras-record.c       | 87 +++++++++++++++++++++++++++++++++++++++++
 ras-record.h       |  3 ++
 5 files changed, 214 insertions(+)

diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c
index 0a79627..69baa48 100644
--- a/ras-ccix-handler.c
+++ b/ras-ccix-handler.c
@@ -272,6 +272,61 @@ static char *ccix_port_err_cper_data(const char *c)
 	return buf;
 }
 
+static char *ccix_link_err_type(uint8_t err)
+{
+	switch (err) {
+	case 0: return "generic";
+	case 1: return "credit underflow";
+	case 2: return "credit overflow";
+	case 3: return "unusable credit";
+	case 4: return "credit timeout";
+	}
+	return "unknown";
+};
+
+static char *ccix_link_credit(uint8_t credit)
+{
+	switch (credit) {
+	case 0: return "memory";
+	case 1: return "snoop";
+	case 2: return "data";
+	case 3: return "misc";
+	}
+	return "unknown";
+};
+
+static char *ccix_link_err_cper_data(const char *c)
+{
+	const struct cper_ccix_link_err_compact *cpd =
+		(struct cper_ccix_link_err_compact *)c;
+	static char buf[1024];
+	char *p = buf;
+	int i;
+
+	if (!cpd->validation_bits)
+		return "";
+
+	p += sprintf(p, " (");
+	if (cpd->validation_bits & CCIX_LINK_ERR_TYPE_VALID)
+		p += sprintf(p, "error: %s ",
+			     ccix_link_err_type(cpd->err_type));
+	if (cpd->validation_bits & CCIX_LINK_ERR_OP_VALID)
+		p += sprintf(p, "op: %s ", ccix_port_op(cpd->op_type));
+	if (cpd->validation_bits & CCIX_LINK_ERR_LINK_ID_VALID)
+		p += sprintf(p, "id: %u ", cpd->link_id);
+	if (cpd->validation_bits & CCIX_LINK_ERR_CREDIT_TYPE_VALID)
+		p += sprintf(p, "credit-type: %s ",
+			     ccix_link_credit(cpd->credit_type));
+	if (cpd->validation_bits & CCIX_LINK_ERR_MESSAGE_VALID) {
+		p += sprintf(p, "message: ");
+		for (i = 0; i < 8; i++)
+			p += sprintf(p, "0x%08x ", cpd->message[i]);
+	}
+	p += sprintf(p - 1, ")");
+
+	return buf;
+}
+
 static char *ccix_component_type(int type)
 {
 	switch (type) {
@@ -510,3 +565,44 @@ int ras_ccix_port_event_handler(struct trace_seq *s,
 
 	return 0;
 }
+
+int ras_ccix_link_event_handler(struct trace_seq *s,
+				struct pevent_record *record,
+				struct event_format *event, void *context)
+{
+	struct ras_events *ras = context;
+	struct tm *tm;
+	struct ras_ccix_event ev;
+	time_t now;
+	int ret;
+
+	if (ras->use_uptime)
+		now = record->ts/user_hz + ras->uptime_diff;
+	else
+		now = time(NULL);
+
+	tm = localtime(&now);
+
+	if (tm)
+		strftime(ev.timestamp, sizeof(ev.timestamp),
+			 "%Y-%m-%d %H:%M:%S %z", tm);
+	trace_seq_printf(s, "%s ", ev.timestamp);
+	ret = ras_ccix_common_parse(s, record, event, context, &ev);
+	if (ret)
+		return ret;
+
+	trace_seq_printf(s, "%d %s id:%d CCIX Link error: %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s",
+			 ev.error_seq, err_severity(ev.severity),
+			 ev.source, ccix_component_type(ev.component),
+			 (ev.severity_detail & 0x1) ? 1 : 0,
+			 (ev.severity_detail & 0x2) ? 1 : 0,
+			 (ev.severity_detail & 0x4) ? 1 : 0,
+			 (ev.severity_detail & 0x8) ? 1 : 0,
+			 ev.address,
+			 err_mask(ev.pa_mask_lsb),
+			 ccix_link_err_cper_data(ev.cper_data));
+
+	ras_store_ccix_link_event(ras, &ev);
+
+	return 0;
+}
diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h
index e824aed..3def534 100644
--- a/ras-ccix-handler.h
+++ b/ras-ccix-handler.h
@@ -30,6 +30,9 @@ int ras_ccix_atc_event_handler(struct trace_seq *s,
 int ras_ccix_port_event_handler(struct trace_seq *s,
 				struct pevent_record *record,
 				struct event_format *event, void *context);
+int ras_ccix_link_event_handler(struct trace_seq *s,
+				struct pevent_record *record,
+				struct event_format *event, void *context);
 
 /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */
 #pragma pack(1)
@@ -74,6 +77,15 @@ struct cper_ccix_port_err_compact {
 	uint8_t err_type;
 	uint8_t op_type;
 };
+
+struct cper_ccix_link_err_compact {
+	uint32_t validation_bits;
+	uint32_t message[8];
+	uint8_t err_type;
+	uint8_t op_type;
+	uint8_t link_id;
+	uint8_t credit_type;
+};
 #pragma pack()
 
 #define CCIX_MEM_ERR_GENERIC_MEM_VALID		0x0001
@@ -109,4 +121,11 @@ struct cper_ccix_port_err_compact {
 #define CCIX_PORT_ERR_MESSAGE_VALID		0x0004
 #define CCIX_PORT_ERR_VENDOR_DATA_VALID		0x0008
 
+#define CCIX_LINK_ERR_OP_VALID			0x0001
+#define CCIX_LINK_ERR_TYPE_VALID		0x0002
+#define CCIX_LINK_ERR_LINK_ID_VALID		0x0004
+#define CCIX_LINK_ERR_CREDIT_TYPE_VALID		0x0008
+#define CCIX_LINK_ERR_MESSAGE_VALID		0x0010
+#define CCIX_LINK_ERR_VENDOR_DATA_VALID		0x0020
+
 #endif
diff --git a/ras-events.c b/ras-events.c
index aef5eae..96c406e 100644
--- a/ras-events.c
+++ b/ras-events.c
@@ -208,6 +208,7 @@ int toggle_ras_mc_event(int enable)
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_cache_event", enable);
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_atc_event", enable);
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_port_event", enable);
+	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_link_event", enable);
 #endif
 
 #ifdef HAVE_MCE
@@ -727,6 +728,14 @@ int handle_ras_events(int record_events)
 	else
 		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
 		    "ras", "ccix_port_event");
+	rc = add_event_handler(ras, pevent, page_size, "ras",
+			       "ccix_link_error_event",
+			       ras_ccix_link_event_handler);
+	if (!rc)
+		num_events++;
+	else
+		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
+		    "ras", "ccix_link_event");
 #endif
 
 #ifdef HAVE_NON_STANDARD
diff --git a/ras-record.c b/ras-record.c
index 4824dd3..51ccc02 100644
--- a/ras-record.c
+++ b/ras-record.c
@@ -555,6 +555,88 @@ int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev)
 	log(TERM, LOG_INFO, "register inserted at db\n");
 	return rc;
 }
+
+enum {
+	ccix_link_field_operation = ccix_field_common_end,
+	ccix_link_field_etype,
+	ccix_link_field_link_id,
+	ccix_link_field_credit_type,
+	ccix_link_field_message,
+	ccix_link_field_vendor,
+};
+
+static const struct db_fields ccix_link_event_fields[] = {
+	CCIX_COMMON_FIELDS,
+	[ccix_link_field_operation] =	{ .name = "operation",		.type = "INTEGER" },
+	[ccix_link_field_etype] =	{ .name = "etype",		.type = "INTEGER" },
+	[ccix_link_field_link_id] =	{ .name = "link_id",		.type = "INTEGER" },
+	[ccix_link_field_credit_type] =	{ .name = "credit_type",	.type = "INTEGER" },
+	[ccix_link_field_message] =	{ .name = "message",		.type = "BLOB" },
+	[ccix_link_field_vendor] =	{ .name = "vendor_data",	.type = "BLOB" },
+};
+
+static const struct db_table_descriptor ccix_link_event_tab = {
+	.name = "ccix_link_event",
+	.fields = ccix_link_event_fields,
+	.num_fields = ARRAY_SIZE(ccix_link_event_fields),
+};
+
+int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev)
+{
+	int rc;
+	struct sqlite3_priv *priv = ras->db_priv;
+	struct cper_ccix_link_err_compact *link =
+		(struct cper_ccix_link_err_compact *)ev->cper_data;
+	sqlite3_stmt *rec = priv ? priv->stmt_ccix_link_record : NULL;
+
+	if (!priv || !rec)
+		return 0;
+	log(TERM, LOG_INFO, "ccix_link_eventstore: %p\n", rec);
+
+	ras_store_ccix_common(rec, ev);
+	if (link->validation_bits & CCIX_LINK_ERR_OP_VALID)
+		sqlite3_bind_int(rec, ccix_link_field_operation, link->op_type);
+
+	if (link->validation_bits & CCIX_LINK_ERR_TYPE_VALID)
+		sqlite3_bind_int(rec, ccix_link_field_etype,
+				 link->err_type);
+
+	if (link->validation_bits & CCIX_LINK_ERR_LINK_ID_VALID)
+		sqlite3_bind_int(rec, ccix_link_field_link_id, link->link_id);
+
+	if (link->validation_bits & CCIX_LINK_ERR_CREDIT_TYPE_VALID)
+		sqlite3_bind_int(rec, ccix_link_field_credit_type,
+				 link->credit_type);
+
+	if (link->validation_bits & CCIX_LINK_ERR_MESSAGE_VALID)
+		sqlite3_bind_blob(rec, ccix_link_field_message,
+				  link->message, sizeof(link->message), NULL);
+
+	if (link->validation_bits & CCIX_LINK_ERR_VENDOR_DATA_VALID)
+		sqlite3_bind_blob(rec, ccix_link_field_vendor,
+				  ev->vendor_data, ev->vendor_data_length,
+				  NULL);
+
+	rc = sqlite3_step(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to do ccix_link_record step on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_reset(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed reset ccix_link_record on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_clear_bindings(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to clear ccix_link_record: error %d\n",
+		    rc);
+	log(TERM, LOG_INFO, "register inserted at db\n");
+	return rc;
+}
 #endif
 /*
  * Table and functions to handle ras:non standard
@@ -965,6 +1047,11 @@ int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras)
 	if (rc == SQLITE_OK)
 		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_port_record,
 					 &ccix_port_event_tab);
+
+	rc = ras_mc_create_table(priv, &ccix_link_event_tab);
+	if (rc == SQLITE_OK)
+		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_link_record,
+					 &ccix_link_event_tab);
 #endif
 
 #ifdef HAVE_MCE
diff --git a/ras-record.h b/ras-record.h
index 3a473a5..47bfb0d 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -118,6 +118,7 @@ struct sqlite3_priv {
 	sqlite3_stmt	*stmt_ccix_cache_record;
 	sqlite3_stmt	*stmt_ccix_atc_record;
 	sqlite3_stmt	*stmt_ccix_port_record;
+	sqlite3_stmt	*stmt_ccix_link_record;
 #endif
 #ifdef HAVE_NON_STANDARD
 	sqlite3_stmt	*stmt_non_standard_record;
@@ -136,6 +137,7 @@ int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *e
 int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev);
+int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev);
 int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev);
 
@@ -149,6 +151,7 @@ static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras
 static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
+static inline int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; };
 static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; };
 
-- 
2.20.1



* [RFC PATCH 6/6] rasdaemon: CCIX: Agent Internal error support
  2019-06-14 17:55 [RFC PATCH 0/6] CCIX rasdaemon support Jonathan Cameron
                   ` (4 preceding siblings ...)
  2019-06-14 17:55 ` [RFC PATCH 5/6] rasdaemon: CCIX: Link error support Jonathan Cameron
@ 2019-06-14 17:55 ` " Jonathan Cameron
  2019-06-21 18:56 ` [RFC PATCH 0/6] CCIX rasdaemon support Mauro Carvalho Chehab
  6 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2019-06-14 17:55 UTC (permalink / raw)
  To: linux-edac; +Cc: mchehab, linuxarm, jcm, Jonathan Cameron

Add support for reporting and storing to sqlite3 of
CCIX Agent Internal errors.

In the current 1.0 CCIX specification these only have vendor_data
defined.  However, they are structured to allow additional fields
in future so we handle them the same way as all the other CCIX
error types.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 ras-ccix-handler.c | 40 ++++++++++++++++++++++++++++++
 ras-ccix-handler.h |  8 ++++++
 ras-events.c       |  9 +++++++
 ras-record.c       | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 ras-record.h       |  3 +++
 5 files changed, 121 insertions(+)

diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c
index 69baa48..2088790 100644
--- a/ras-ccix-handler.c
+++ b/ras-ccix-handler.c
@@ -606,3 +606,43 @@ int ras_ccix_link_event_handler(struct trace_seq *s,
 
 	return 0;
 }
+
+int ras_ccix_agent_event_handler(struct trace_seq *s,
+				 struct pevent_record *record,
+				 struct event_format *event, void *context)
+{
+	struct ras_events *ras = context;
+	struct tm *tm;
+	struct ras_ccix_event ev;
+	time_t now;
+	int ret;
+
+	if (ras->use_uptime)
+		now = record->ts/user_hz + ras->uptime_diff;
+	else
+		now = time(NULL);
+
+	tm = localtime(&now);
+
+	if (tm)
+		strftime(ev.timestamp, sizeof(ev.timestamp),
+			 "%Y-%m-%d %H:%M:%S %z", tm);
+	trace_seq_printf(s, "%s ", ev.timestamp);
+	ret = ras_ccix_common_parse(s, record, event, context, &ev);
+	if (ret)
+		return ret;
+
+	trace_seq_printf(s, "%d %s id:%d CCIX Agent Internal error: %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx",
+			 ev.error_seq, err_severity(ev.severity),
+			 ev.source, ccix_component_type(ev.component),
+			 (ev.severity_detail & 0x1) ? 1 : 0,
+			 (ev.severity_detail & 0x2) ? 1 : 0,
+			 (ev.severity_detail & 0x4) ? 1 : 0,
+			 (ev.severity_detail & 0x8) ? 1 : 0,
+			 ev.address,
+			 err_mask(ev.pa_mask_lsb));
+
+	ras_store_ccix_agent_event(ras, &ev);
+
+	return 0;
+}
diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h
index 3def534..c53e3ee 100644
--- a/ras-ccix-handler.h
+++ b/ras-ccix-handler.h
@@ -33,6 +33,9 @@ int ras_ccix_port_event_handler(struct trace_seq *s,
 int ras_ccix_link_event_handler(struct trace_seq *s,
 				struct pevent_record *record,
 				struct event_format *event, void *context);
+int ras_ccix_agent_event_handler(struct trace_seq *s,
+				 struct pevent_record *record,
+				 struct event_format *event, void *context);
 
 /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */
 #pragma pack(1)
@@ -86,6 +89,10 @@ struct cper_ccix_link_err_compact {
 	uint8_t link_id;
 	uint8_t credit_type;
 };
+
+struct cper_ccix_agent_internal_err_compact {
+	uint32_t validation_bits;
+};
 #pragma pack()
 
 #define CCIX_MEM_ERR_GENERIC_MEM_VALID		0x0001
@@ -128,4 +135,5 @@ struct cper_ccix_link_err_compact {
 #define CCIX_LINK_ERR_MESSAGE_VALID		0x0010
 #define CCIX_LINK_ERR_VENDOR_DATA_VALID		0x0020
 
+#define CCIX_AGENT_ERR_VENDOR_DATA_VALID	0x0001
 #endif
diff --git a/ras-events.c b/ras-events.c
index 96c406e..88fcad1 100644
--- a/ras-events.c
+++ b/ras-events.c
@@ -209,6 +209,7 @@ int toggle_ras_mc_event(int enable)
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_atc_event", enable);
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_port_event", enable);
 	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_link_event", enable);
+	rc |= __toggle_ras_mc_event(ras, "ras", "ccix_agent_event", enable);
 #endif
 
 #ifdef HAVE_MCE
@@ -736,6 +737,14 @@ int handle_ras_events(int record_events)
 	else
 		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
 		    "ras", "ccix_link_event");
+	rc = add_event_handler(ras, pevent, page_size, "ras",
+			       "ccix_agent_error_event",
+			       ras_ccix_agent_event_handler);
+	if (!rc)
+		num_events++;
+	else
+		log(ALL, LOG_ERR, "Can't get traces from %s:%s\n",
+		    "ras", "ccix_agent_error_event");
 #endif
 
 #ifdef HAVE_NON_STANDARD
diff --git a/ras-record.c b/ras-record.c
index 51ccc02..0525ff0 100644
--- a/ras-record.c
+++ b/ras-record.c
@@ -637,6 +637,62 @@ int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev)
 	log(TERM, LOG_INFO, "register inserted at db\n");
 	return rc;
 }
+
+enum {
+	ccix_agent_field_vendor = ccix_field_common_end,
+};
+
+static const struct db_fields ccix_agent_event_fields[] = {
+	CCIX_COMMON_FIELDS,
+	[ccix_agent_field_vendor] =	{ .name = "vendor_data",	.type = "BLOB" },
+};
+
+static const struct db_table_descriptor ccix_agent_event_tab = {
+	.name = "ccix_agent_event",
+	.fields = ccix_agent_event_fields,
+	.num_fields = ARRAY_SIZE(ccix_agent_event_fields),
+};
+
+int ras_store_ccix_agent_event(struct ras_events *ras,
+			       struct ras_ccix_event *ev)
+{
+	int rc;
+	struct sqlite3_priv *priv = ras->db_priv;
+	struct cper_ccix_agent_internal_err_compact *agent =
+		(struct cper_ccix_agent_internal_err_compact *)ev->cper_data;
+	sqlite3_stmt *rec = priv ? priv->stmt_ccix_agent_record : NULL;
+
+	if (!priv || !rec)
+		return 0;
+	log(TERM, LOG_INFO, "ccix_agent_eventstore: %p\n", rec);
+
+	ras_store_ccix_common(rec, ev);
+
+	if (agent->validation_bits & CCIX_AGENT_ERR_VENDOR_DATA_VALID)
+		sqlite3_bind_blob(rec, ccix_agent_field_vendor,
+				  ev->vendor_data, ev->vendor_data_length,
+				  NULL);
+
+	rc = sqlite3_step(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to do ccix_agent_record step on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_reset(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed reset ccix_agent_record on sqlite: error = %d\n",
+		    rc);
+
+	rc = sqlite3_clear_bindings(rec);
+	if (rc != SQLITE_OK && rc != SQLITE_DONE)
+		log(TERM, LOG_ERR,
+		    "Failed to clear ccix_agent_record: error %d\n",
+		    rc);
+	log(TERM, LOG_INFO, "register inserted at db\n");
+	return rc;
+}
 #endif
 /*
  * Table and functions to handle ras:non standard
@@ -1052,6 +1108,11 @@ int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras)
 	if (rc == SQLITE_OK)
 		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_link_record,
 					 &ccix_link_event_tab);
+
+	rc = ras_mc_create_table(priv, &ccix_agent_event_tab);
+	if (rc == SQLITE_OK)
+		rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_agent_record,
+					 &ccix_agent_event_tab);
 #endif
 
 #ifdef HAVE_MCE
diff --git a/ras-record.h b/ras-record.h
index 47bfb0d..07be4bf 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -119,6 +119,7 @@ struct sqlite3_priv {
 	sqlite3_stmt	*stmt_ccix_atc_record;
 	sqlite3_stmt	*stmt_ccix_port_record;
 	sqlite3_stmt	*stmt_ccix_link_record;
+	sqlite3_stmt	*stmt_ccix_agent_record;
 #endif
 #ifdef HAVE_NON_STANDARD
 	sqlite3_stmt	*stmt_non_standard_record;
@@ -138,6 +139,7 @@ int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev
 int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev);
+int ras_store_ccix_agent_event(struct ras_events *ras, struct ras_ccix_event *ev);
 int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev);
 int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev);
 
@@ -152,6 +154,7 @@ static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_
 static inline int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
+static inline int ras_store_ccix_agent_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; };
 static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; };
 static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; };
 
-- 
2.20.1



* Re: [RFC PATCH 0/6] CCIX rasdaemon support
  2019-06-14 17:55 [RFC PATCH 0/6] CCIX rasdaemon support Jonathan Cameron
                   ` (5 preceding siblings ...)
  2019-06-14 17:55 ` [RFC PATCH 6/6] rasdaemon: CCIX: Agent Internal " Jonathan Cameron
@ 2019-06-21 18:56 ` Mauro Carvalho Chehab
  6 siblings, 0 replies; 8+ messages in thread
From: Mauro Carvalho Chehab @ 2019-06-21 18:56 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-edac, linuxarm, jcm

On Sat, 15 Jun 2019 01:55:11 +0800
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> This is an RFC because the kernel side is currently under review and
> may change with obvious follow through effects on this.
> 
> https://lore.kernel.org/linux-edac/20190606123654.78973-1-Jonathan.Cameron@huawei.com/

Yeah, we should wait for it to be merged upstream before adding them to
rasdaemon ;-)

> 
> There are a few additional questions around this:
> 1. Divide between specifity of DB fields vs blobs.
>    Where possible I have tried to fully describe the contents via
>    separate fields rather than large blobs.

OK!

>    One common SQL convention
>    that doesn't seem to have been previously done in rasdaemon is to
>    use explicit NULL entries for elements where data is missing.

We tried to be as simple as possible when we added the DB option.

We even opted to use sqlite instead of supporting MySQL or Postgres,
not only for simplicity but also because, if a machine has problems,
a database on the same machine may crash.

That said, with MySQL/Postgres support the logs could be stored on a
remote machine, which would be safer.
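
On the NULL convention quoted above, here is a minimal sketch of the
idea, not rasdaemon code. The helper names sql_opt_int() and
link_values() are hypothetical, and the sketch builds SQL text only to
make the result visible. The validation bits are the ones defined in
patch 5/6. A column whose bit is clear becomes NULL, so "absent" is not
confused with a genuine 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Validation bits as defined in patch 5/6 of this series. */
#define CCIX_LINK_ERR_OP_VALID		0x0001
#define CCIX_LINK_ERR_TYPE_VALID	0x0002
#define CCIX_LINK_ERR_LINK_ID_VALID	0x0004

/*
 * Hypothetical helper (not part of rasdaemon): render one optional
 * integer column as SQL text, using NULL when its validation bit is
 * clear.
 */
static int sql_opt_int(char *out, size_t len, uint32_t valid_bits,
		       uint32_t bit, int value)
{
	assert(out && len);
	if (valid_bits & bit)
		return snprintf(out, len, "%d", value);
	return snprintf(out, len, "NULL");
}

/* Hypothetical: build the value list for operation, etype, link_id. */
static int link_values(char *out, size_t len, uint32_t valid_bits,
		       int op, int etype, int link_id)
{
	char a[16], b[16], c[16];

	sql_opt_int(a, sizeof(a), valid_bits, CCIX_LINK_ERR_OP_VALID, op);
	sql_opt_int(b, sizeof(b), valid_bits, CCIX_LINK_ERR_TYPE_VALID, etype);
	sql_opt_int(c, sizeof(c), valid_bits, CCIX_LINK_ERR_LINK_ID_VALID,
		    link_id);
	return snprintf(out, len, "(%s, %s, %s)", a, b, c);
}
```

With prepared statements the same effect should come for free: as far
as I know, sqlite3 treats a parameter that was never bound (or that was
reset by sqlite3_clear_bindings()) as NULL, which is why the patches
only bind a field when its validation bit is set.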

-

I don't see much of a problem with adding such things to new tables, or
even with adding optional support for other DB types, but changing the
existing DBs can be a problem.

Perhaps there could be a table somewhere storing the rasdaemon version,
so that it is possible to detect whether a table is missing something.
If the running rasdaemon version is newer than the one recorded in the
database, it could apply some database changes in order to support new
features, including things like replacing empty fields with NULL.

If you think such changes would be useful, feel free to submit patches.
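
A rough sketch of the version comparison such a check would need,
assuming the stored and running versions are "major.minor.patch"
strings. That format and the names parse_version() and
schema_needs_upgrade() are assumptions for illustration, nothing like
this exists in rasdaemon today.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: parse "major.minor.patch"; missing parts default to 0. */
static int parse_version(const char *s, int v[3])
{
	assert(s && v);
	v[0] = v[1] = v[2] = 0;
	return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) >= 1 ? 0 : -1;
}

/*
 * Hypothetical: returns 1 if the running daemon is newer than the
 * version recorded in the database, 0 if it is not, and -1 if either
 * string cannot be parsed.
 */
static int schema_needs_upgrade(const char *stored, const char *running)
{
	int a[3], b[3], i;

	if (parse_version(stored, a) || parse_version(running, b))
		return -1;
	for (i = 0; i < 3; i++) {
		if (b[i] != a[i])
			return b[i] > a[i];
	}
	return 0;
}
```

The components are compared numerically rather than as strings, so
"0.10.0" is correctly treated as newer than "0.9.9". A return of 1
would be the point at which any table changes get applied.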

> 2. Should we split ras-record.c and have the ccix handling in a separate
>    ras-record-ccix.c file or similar as that one is getting rather large.

Makes sense to me. Like all projects, it started small. As things
get bigger, it makes sense to split some features into separate
files.

Thanks,
Mauro
