[PATCH 0/2] ARM Error Source Table Support

From: Tyler Baicar @ 2021-11-24 17:07 UTC
  To: patches, abdulhamid, darren, catalin.marinas, will, maz,
	james.morse, alexandru.elisei, suzuki.poulose, lorenzo.pieralisi,
	guohanjun, sudeep.holla, rafael, lenb, tony.luck, bp,
	mark.rutland, anshuman.khandual, vincenzo.frascino, tabba,
	marcan, keescook, jthierry, masahiroy, samitolvanen, john.garry,
	daniel.lezcano, gor, zhangshaokun, tmricht, dchinner, tglx,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, linux-edac,
	ishii.shuuichir, Vineeth.Pillai
  Cc: Tyler Baicar

This series adds support for the ARM Error Source Table (AEST) based on
the latest version of ACPI for the Armv8 RAS Extensions [0].

The AEST driver supports both memory-mapped and system register
interfaces. This series assumes system register interfaces are only
registered with private peripheral interrupts (PPIs); otherwise there is
no guarantee that the core handling the error is the core which took the
error and has the syndrome info in its system registers.
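
The PPI constraint above can be sketched as a small check. This is an
illustration only, not code from the series: the enum, struct, and
function names are hypothetical, and the sketch assumes the basic GIC
interrupt ID layout (PPIs are IDs 16-31), ignoring the extended PPI
range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a node's interface type (not the ACPICA names). */
enum ras_iface { RAS_IFACE_SYSREG, RAS_IFACE_MMIO };

struct ras_node {
	enum ras_iface iface;
	uint32_t intid;		/* GIC interrupt ID of the error interrupt */
};

/* Assumption: basic GIC layout, where interrupt IDs 16-31 are PPIs. */
static bool intid_is_ppi(uint32_t intid)
{
	return intid >= 16 && intid <= 31;
}

/*
 * System registers are per-core state, so the syndrome can only be read
 * on the core that took the error. A PPI is delivered to that core; an
 * SPI may be routed elsewhere. MMIO records can be read from any core.
 */
static bool ras_node_irq_ok(const struct ras_node *node)
{
	if (node->iface == RAS_IFACE_SYSREG)
		return intid_is_ppi(node->intid);
	return true;
}
```

With this model, a system register node wired to an SPI would be
rejected, while an MMIO node is accepted with either interrupt type.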

This is meant to be initial support for AEST to address the current gaps
with systems that support the ARMv8 RAS extensions but don't have
firmware-first support. This series simply logs all the errors it finds
and triggers a kernel panic if an uncorrected error (UE) is present.
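
The log-everything, panic-on-UE policy described above can be modelled
in userspace roughly as follows. This is a sketch under stated
assumptions rather than the driver's code: the function name is
hypothetical, the ERR<n>STATUS bit positions (V at bit 30, UE at bit 29)
are assumed from the RAS extension register layout, and the model
returns a flag where the kernel would panic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed ERR<n>STATUS bit positions; check against the RAS spec. */
#define ERR_STATUS_V	(UINT64_C(1) << 30)	/* record is valid */
#define ERR_STATUS_UE	(UINT64_C(1) << 29)	/* uncorrected error */

/*
 * Log every valid error record and report whether any of them holds an
 * uncorrected error. A kernel caller would panic on a true return; this
 * userspace model only returns the decision.
 */
static bool ras_scan_records(const uint64_t *status, size_t nr)
{
	bool fatal = false;

	for (size_t i = 0; i < nr; i++) {
		if (!(status[i] & ERR_STATUS_V))
			continue;	/* nothing recorded in this slot */

		printf("record %zu: ERR<n>STATUS=0x%016llx\n",
		       i, (unsigned long long)status[i]);

		if (status[i] & ERR_STATUS_UE)
			fatal = true;	/* keep going so every error is logged */
	}

	return fatal;
}
```

The loop deliberately does not stop at the first uncorrected error, so
all valid records are logged before the fatal decision is acted on.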

I have tested this series on Ampere Altra using processor errors to
exercise PPI handling with system register interface and memory errors
to exercise SPI handling with MMIO interface. Both corrected and
uncorrected errors were tested to verify the non-fatal vs fatal
scenarios.

Future work:
- Handling of recoverable uncorrected errors (UER) to avoid panic
- Looping through all external abort capable (ERR<n>FR.UE != 0) error
   nodes in SEA/SEI handling

Changes from RFC patch series [1]:
- Updated for latest AEST spec
- Utilize ACPICA header defines of AEST structures
- Added support for ARMv8.4 RAS extension
- Dropped the SEA/SEI dumping of SR RAS registers
- Removed unused defines
- Unified RAS extension register printing to a single function
- Updated trace event with additional fields
- Addressed other feedback from RFC series
- Added myself to ARM64 ACPI MAINTAINERS as a reviewer

[0] https://developer.arm.com/documentation/den0085/latest
[1] https://lkml.org/lkml/2019/7/2/781

Tyler Baicar (2):
  ACPI/AEST: Initial AEST driver
  trace, ras: add ARM RAS extension trace event

 MAINTAINERS                     |   1 +
 arch/arm64/include/asm/ras.h    |  52 ++++
 arch/arm64/include/asm/sysreg.h |   2 +
 arch/arm64/kernel/Makefile      |   1 +
 arch/arm64/kernel/ras.c         | 129 +++++++++
 arch/arm64/kvm/sys_regs.c       |   2 +
 drivers/acpi/arm64/Kconfig      |   3 +
 drivers/acpi/arm64/Makefile     |   1 +
 drivers/acpi/arm64/aest.c       | 455 ++++++++++++++++++++++++++++++++
 include/linux/acpi_aest.h       |  50 ++++
 include/linux/cpuhotplug.h      |   1 +
 include/ras/ras_event.h         |  55 ++++
 12 files changed, 752 insertions(+)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 arch/arm64/kernel/ras.c
 create mode 100644 drivers/acpi/arm64/aest.c
 create mode 100644 include/linux/acpi_aest.h

-- 
2.33.1
