linux-acpi.vger.kernel.org archive mirror
* [PATCH 0/2] ARM Error Source Table Support
@ 2021-11-24 17:07 Tyler Baicar
  2021-11-24 17:07 ` [PATCH 1/2] ACPI/AEST: Initial AEST driver Tyler Baicar
  2021-11-24 17:07 ` [PATCH 2/2] trace, ras: add ARM RAS extension trace event Tyler Baicar
  0 siblings, 2 replies; 22+ messages in thread
From: Tyler Baicar @ 2021-11-24 17:07 UTC (permalink / raw)
  To: patches, abdulhamid, darren, catalin.marinas, will, maz,
	james.morse, alexandru.elisei, suzuki.poulose, lorenzo.pieralisi,
	guohanjun, sudeep.holla, rafael, lenb, tony.luck, bp,
	mark.rutland, anshuman.khandual, vincenzo.frascino, tabba,
	marcan, keescook, jthierry, masahiroy, samitolvanen, john.garry,
	daniel.lezcano, gor, zhangshaokun, tmricht, dchinner, tglx,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, linux-edac,
	ishii.shuuichir, Vineeth.Pillai
  Cc: Tyler Baicar

This series adds support for the ARM Error Source Table (AEST) based on
the latest version of ACPI for the Armv8 RAS Extensions [0].

The AEST driver supports both memory mapped and system register interfaces.
This series assumes system register interfaces are only registered with
private peripheral interrupts (PPIs); otherwise there is no guarantee the
core handling the error is the core which took the error and has the
syndrome info in its system registers.

This is meant to be initial support for AEST to address the current gaps
with systems that support ARMv8 RAS extensions but don't have
firmware-first support. This series simply logs all the errors it finds
and triggers a kernel panic if an uncorrected error (UE) is present.
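
The "log everything, panic on UE" policy described above can be sketched
in plain userspace C. This is only an illustration and not code from the
series: struct err_record and scan_records() are hypothetical names, and
the V (bit 30) and UE (bit 29) positions are my reading of ERR<n>STATUS,
which should be checked against the Arm RAS documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed ERR<n>STATUS bit positions; verify against the Arm documentation. */
#define ERR_STATUS_V  (UINT64_C(1) << 30)  /* record holds a valid error */
#define ERR_STATUS_UE (UINT64_C(1) << 29)  /* an uncorrected error was recorded */

/* Hypothetical snapshot of one error record (not a structure from the patch). */
struct err_record {
	unsigned int index;  /* record number within the node */
	uint64_t status;     /* snapshot of ERR<n>STATUS */
	uint64_t addr;       /* snapshot of ERR<n>ADDR */
};

/*
 * Log every valid record. Returns the number of valid records and sets
 * *fatal to true if at least one of them has UE set. A real driver would
 * panic in that case; this sketch only reports it to the caller.
 */
size_t scan_records(const struct err_record *recs, size_t n, bool *fatal)
{
	size_t found = 0;

	assert(recs != NULL && fatal != NULL);
	*fatal = false;

	for (size_t i = 0; i < n; i++) {
		if (!(recs[i].status & ERR_STATUS_V))
			continue;  /* nothing recorded here */

		found++;
		printf("record %u: status=0x%llx addr=0x%llx\n",
		       recs[i].index,
		       (unsigned long long)recs[i].status,
		       (unsigned long long)recs[i].addr);

		if (recs[i].status & ERR_STATUS_UE)
			*fatal = true;
	}

	return found;
}
```

A caller would pass in the records read from one error node and treat a
true *fatal as the point where the kernel gives up.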

I have tested this series on Ampere Altra using processor errors to
exercise PPI handling with system register interface and memory errors
to exercise SPI handling with MMIO interface. Both corrected and
uncorrected errors were tested to verify the non-fatal vs fatal
scenarios.

Future work:
- UER handling to avoid panic
- Looping through all external abort capable (ERR<n>FR.UE != 0) error
   nodes in SEA/SEI handling

Changes from RFC patch series [1]:
- Updated for latest AEST spec
- Utilize ACPICA header defines of AEST structures
- Added support for ARMv8.4 RAS extension
- Dropped the SEA/SEI dumping of SR RAS registers
- Removed unused defines
- Unified RAS extension register printing to a single function
- Updated trace event with additional fields
- Addressed other feedback from RFC series
- Added myself to ARM64 ACPI MAINTAINERS as a reviewer

[0] https://developer.arm.com/documentation/den0085/latest
[1] https://lkml.org/lkml/2019/7/2/781

Tyler Baicar (2):
  ACPI/AEST: Initial AEST driver
  trace, ras: add ARM RAS extension trace event

 MAINTAINERS                     |   1 +
 arch/arm64/include/asm/ras.h    |  52 ++++
 arch/arm64/include/asm/sysreg.h |   2 +
 arch/arm64/kernel/Makefile      |   1 +
 arch/arm64/kernel/ras.c         | 129 +++++++++
 arch/arm64/kvm/sys_regs.c       |   2 +
 drivers/acpi/arm64/Kconfig      |   3 +
 drivers/acpi/arm64/Makefile     |   1 +
 drivers/acpi/arm64/aest.c       | 455 ++++++++++++++++++++++++++++++++
 include/linux/acpi_aest.h       |  50 ++++
 include/linux/cpuhotplug.h      |   1 +
 include/ras/ras_event.h         |  55 ++++
 12 files changed, 752 insertions(+)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 arch/arm64/kernel/ras.c
 create mode 100644 drivers/acpi/arm64/aest.c
 create mode 100644 include/linux/acpi_aest.h

-- 
2.33.1


* [PATCH 0/2] ARM Error Source Table V1 Support
@ 2024-03-04 11:15 Ruidong Tian
  2024-03-04 11:15 ` [PATCH 1/2] ACPI/AEST: Initial AEST driver Ruidong Tian
  0 siblings, 1 reply; 22+ messages in thread
From: Ruidong Tian @ 2024-03-04 11:15 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, baolin.wang, linux-kernel, linux-acpi,
	linux-arm-kernel
  Cc: Ruidong Tian

This series adds support for the ARM Error Source Table (AEST) based on
the 1.1 version of ACPI for the Armv8 RAS Extensions [0].

The Arm Error Source Table (AEST) enables kernel-first handling of errors
in a system that supports the Armv8 RAS extensions. A hardware error
triggers a RAS interrupt to the kernel. In IRQ context, the kernel scans
all AEST nodes to find the nodes that have recorded an error, then uses a
workqueue to log those hardware errors.
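
The split between scanning in IRQ context and logging later can be
sketched in plain userspace C. This is an illustration only, not the
driver code: the fixed-size ring stands in for whatever buffer the driver
uses, the "worker" is an ordinary function call rather than a kernel
workqueue, all names (event_queue, irq_scan, worker_drain) are
hypothetical, and the valid bit position (bit 30) is an assumption about
ERR<n>STATUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STATUS_VALID (UINT64_C(1) << 30)  /* assumed ERR<n>STATUS.V position */
#define QUEUE_SLOTS  8                    /* power of two, so indices wrap cleanly */

/* Hypothetical record of one node that reported an error. */
struct node_event {
	unsigned int node_id;
	uint64_t status;
};

/* Fixed-size ring buffer; head and tail only ever increase. */
struct event_queue {
	struct node_event slots[QUEUE_SLOTS];
	size_t head;  /* next slot to read */
	size_t tail;  /* next slot to write */
};

/* Returns false (and drops the event) when the ring is full. */
bool queue_push(struct event_queue *q, struct node_event ev)
{
	if (q->tail - q->head == QUEUE_SLOTS)
		return false;
	q->slots[q->tail % QUEUE_SLOTS] = ev;
	q->tail++;
	return true;
}

/* "IRQ context": find nodes with a valid error and queue them, no logging. */
size_t irq_scan(struct event_queue *q, const uint64_t *node_status, size_t n)
{
	size_t queued = 0;

	assert(q != NULL && node_status != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!(node_status[i] & STATUS_VALID))
			continue;
		struct node_event ev = {
			.node_id = (unsigned int)i,
			.status = node_status[i],
		};
		if (queue_push(q, ev))
			queued++;
	}
	return queued;
}

/* "Worker": drain the queue and do the slow part, the logging. */
size_t worker_drain(struct event_queue *q)
{
	size_t logged = 0;

	while (q->head != q->tail) {
		const struct node_event *ev = &q->slots[q->head % QUEUE_SLOTS];

		printf("AEST node %u: status=0x%llx\n",
		       ev->node_id, (unsigned long long)ev->status);
		q->head++;
		logged++;
	}
	return logged;
}
```

The point of the split is that the interrupt path only copies a few
words per node, while formatting and printing happen later where
sleeping and slow I/O are acceptable.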

I have tested this series on PTG Yitian710 SOC. Both corrected and
uncorrected errors were tested to verify the non-fatal vs fatal
scenarios.

Future work:
1. Have a UE trigger memory_failure() rather than a panic.
2. Add CE storm mitigation.
3. Support AEST V2.

This series is based on Tyler Baicar's patches [1], for which no v2 has
been sent to the mailing list yet. Changes from the original patches:
1. Add a genpool to collect all AEST errors, and log them in a workqueue
rather than in IRQ context.
2. Use a single aest_proc function for both the system register interface
and the MMIO interface.
3. Restructure some structures and functions to make them clearer.
4. Address all review comments from the mailing list thread on Tyler
Baicar's patches.

[0]: https://developer.arm.com/documentation/den0085/0101/
[1]: https://lore.kernel.org/all/20211124170708.3874-1-baicar@os.amperecomputing.com/

Tyler Baicar (2):
  ACPI/AEST: Initial AEST driver
  trace, ras: add ARM RAS extension trace event

 MAINTAINERS                  |  11 +
 arch/arm64/include/asm/ras.h |  38 ++
 drivers/acpi/arm64/Kconfig   |  10 +
 drivers/acpi/arm64/Makefile  |   1 +
 drivers/acpi/arm64/aest.c    | 728 +++++++++++++++++++++++++++++++++++
 include/linux/acpi_aest.h    |  91 +++++
 include/linux/cpuhotplug.h   |   1 +
 include/ras/ras_event.h      |  55 +++
 8 files changed, 935 insertions(+)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 drivers/acpi/arm64/aest.c
 create mode 100644 include/linux/acpi_aest.h

-- 
2.33.1




Thread overview: 22+ messages
2021-11-24 17:07 [PATCH 0/2] ARM Error Source Table Support Tyler Baicar
2021-11-24 17:07 ` [PATCH 1/2] ACPI/AEST: Initial AEST driver Tyler Baicar
2021-11-24 18:09   ` Marc Zyngier
2021-11-29 20:39     ` Darren Hart
2021-11-30  9:45       ` Marc Zyngier
2021-11-30 16:41         ` Darren Hart
2021-12-16 22:05           ` Tyler Baicar
2021-12-16 23:42             ` Sudeep Holla
2021-11-24 18:51   ` Mark Rutland
2021-12-16 23:22     ` Tyler Baicar
2021-12-09  8:10   ` ishii.shuuichir
2021-12-16 23:33     ` Tyler Baicar
2022-04-20  7:54       ` ishii.shuuichir
2022-05-09 13:37         ` Tyler Baicar
2022-05-09 23:23           ` ishii.shuuichir
2022-12-07  5:44           ` Ruidong Tian
2021-11-24 17:07 ` [PATCH 2/2] trace, ras: add ARM RAS extension trace event Tyler Baicar
2024-03-04 11:15 [PATCH 0/2] ARM Error Source Table V1 Support Ruidong Tian
2024-03-04 11:15 ` [PATCH 1/2] ACPI/AEST: Initial AEST driver Ruidong Tian
2024-03-04 12:07   ` Marc Zyngier
2024-03-08  4:49     ` Ruidong Tian
     [not found]     ` <aaad88c3-333d-4714-a9ca-3b66c8a5d9c8@linux.alibaba.com>
2024-03-09 10:33       ` Marc Zyngier
2024-03-12  9:53         ` Ruidong Tian
