[RFC PATCH 0/6] CCIX Protocol Error reporting

* [RFC PATCH 0/6] CCIX Protocol Error reporting
@ 2019-06-06 12:36 Jonathan Cameron
  2019-06-06 12:36 ` [RFC PATCH 1/6] efi / ras: CCIX Memory error reporting Jonathan Cameron
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Jonathan Cameron @ 2019-06-06 12:36 UTC (permalink / raw)
  To: linux-edac, linux-acpi, linux-efi
  Cc: linuxarm, rjw, tony.luck, bp, james.morse, ard.beisheuvel,
	nariman.poushin, Jonathan Cameron

UEFI 2.8 defines a new CPER record Appendix N for CCIX Protocol Error Records
(PER). www.uefi.org

These include Protocol Error Record logs which are defined in the
CCIX 1.0 Base Specification www.ccixconsortium.com.

Handling of coherency protocol errors is complex and how Linux does this
will take some time to evolve.  For now, fatal errors are handled via the
usual means and everything else is reported.

There are 6 types of error defined, covering:
* Memory errors
* Cache errors
* Address translation unit errors
* CCIX port errors 
* CCIX link errors
* Agent internal errors.

The set includes tracepoints to report the errors to RAS Daemon and a patch
set for RAS Daemon will follow shortly.

There are several open questions for this RFC.
1. Reporting of vendor data.  We have little choice but to do this via a
   dynamic array as these blocks can take arbitrary size. I had hoped
   no one would actually use these given the odd mismatch between a
   standard error structure and non standard element, but there are
   already designs out there that do use it.
2. The trade off between explicit tracepoint fields, on which we might
   want to filter, and the simplicity of a blob. I have gone for having
   the whole of the block specific to the PER error type in an opaque blob.
   Perhaps this is not the right balance?
3. Whether defining 6 new tracepoints is sensible. I think it is:
   * They are all defined by the CCIX specification as independant error
     classes.
   * Many of them can only be generated by particular types of agent.
   * The handling required will vary widely depending on types.
     In the kernel some map cleanly onto existing handling. Keeping the
     whole flow separate will aide this. They vary by a similar amount
     in scope to the RAS errors found on an existing system which have
     independent tracepoints.
   * Separating them out allows for filtering on the tracepoints by
     elements that are not shared between them.
   * Muxing the lot into one record type can lead to ugly code both in
     kernel and in userspace.

Rasdaemon patches will follow shortly.

This patch is being distributed by the CCIX Consortium, Inc. (CCIX) to
you and other parties that are paticipating (the "participants") in the
Linux kernel with the understanding that the participants will use CCIX's
name and trademark only when this patch is used in association with the
Linux kernel and associated user space.

CCIX is also distributing this patch to these participants with the
understanding that if any portion of the CCIX specification will be
used or referenced in the Linux kernel, the participants will not modify
the cited portion of the CCIX specification and will give CCIX propery
copyright attribution by including the following copyright notice with
the cited part of the CCIX specification:
"© 2019 CCIX CONSORTIUM, INC. ALL RIGHTS RESERVED."

Jonathan Cameron (6):
  efi / ras: CCIX Memory error reporting
  efi / ras: CCIX Cache error reporting
  efi / ras: CCIX Address Translation Cache error reporting
  efi / ras: CCIX Port error reporting
  efi / ras: CCIX Link error reporting
  efi / ras: CCIX Agent internal error reporting

 drivers/acpi/apei/Kconfig        |   8 +
 drivers/acpi/apei/ghes.c         |  59 ++
 drivers/firmware/efi/Kconfig     |   5 +
 drivers/firmware/efi/Makefile    |   1 +
 drivers/firmware/efi/cper-ccix.c | 916 +++++++++++++++++++++++++++++++
 drivers/firmware/efi/cper.c      |   6 +
 include/linux/cper.h             | 333 +++++++++++
 include/ras/ras_event.h          | 405 ++++++++++++++
 8 files changed, 1733 insertions(+)
 create mode 100644 drivers/firmware/efi/cper-ccix.c

-- 
2.20.1

^ permalink raw reply	[flat|nested] 12+ messages in thread