linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/20] MCA Updates
@ 2023-11-18 19:32 Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 01/20] x86/mce/inject: Clear test status value Yazen Ghannam
                   ` (19 more replies)
  0 siblings, 20 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

Hi all,

This set is a collection of logically independent updates that touch
common code. I've gathered them into one series to resolve conflicts
and ordering issues. This is also the first half of a larger set: the
second half focuses on refactoring the AMD MCA Thresholding feature
support, so I've left it out for now. That second part will add AMD
CMCI Storm handling support on top of the refactored code.

Patch 1 is a small, standalone fix for an issue I noticed during testing
of this set.

Patches 2-3 are a redo of a previous set dealing with BERT MCA decode
and preemption.
https://lore.kernel.org/r/20230622131841.3153672-1-yazen.ghannam@amd.com

Patches 4-12 are general refactoring in preparation for later patches in
this set and the second planned set. The overall theme is to simplify
the AMD MCA init flow and to remove unnecessary data caching in per-CPU
variables. The init flow refactor will be completed in the second patch
set, since much of the cached data is used to set up MCA Thresholding.

Patches 13-14 unify the AMD THR and DFR interrupt handlers with MCA
polling.

Patch 15 is a small fix for the MCA Thresholding init path.

Patch 16 adds support for a new Corrected Error Interrupt on Scalable
MCA systems.

Patches 17-20 add support for new Scalable MCA registers and the FRU
Text decoding feature. This is a follow-up to a previous set.
https://lore.kernel.org/r/20220418174440.334336-1-yazen.ghannam@amd.com

Thanks,
Yazen

Avadhut Naik (2):
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Yazen Ghannam (18):
  x86/mce/inject: Clear test status value
  x86/mce: Define mce_setup() helpers for global and per-CPU fields
  x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()
  x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  x86/mce/amd: Use helper for UMC bank type check
  x86/mce/amd: Use helper for GPU UMC bank type checks
  x86/mce/amd: Use fixed bank number for quirks
  x86/mce/amd: Look up bank type by IPID
  x86/mce/amd: Clean up SMCA configuration
  x86/mce/amd: Prep DFR handler before enabling banks
  x86/mce/amd: Simplify DFR handler setup
  x86/mce/amd: Clean up enable_deferred_error_interrupt()
  x86/mce: Unify AMD THR handler with MCA Polling
  x86/mce/amd: Unify AMD DFR handler with MCA Polling
  x86/mce: Skip AMD threshold init if no threshold banks found
  x86/mce/amd: Support SMCA Corrected Error Interrupt
  x86/mce/apei: Handle variable register array size
  EDAC/mce_amd: Add support for FRU Text in MCA

 arch/x86/include/asm/mce.h              |  30 +-
 arch/x86/kernel/cpu/mce/amd.c           | 534 +++++++++++++-----------
 arch/x86/kernel/cpu/mce/apei.c          | 125 ++++--
 arch/x86/kernel/cpu/mce/core.c          | 243 +++++++----
 arch/x86/kernel/cpu/mce/genpool.c       |  20 +-
 arch/x86/kernel/cpu/mce/inject.c        |   5 +-
 arch/x86/kernel/cpu/mce/internal.h      |  11 +-
 drivers/edac/amd64_edac.c               |   2 +-
 drivers/edac/mce_amd.c                  |  70 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   9 +-
 include/trace/events/mce.h              |  47 ++-
 11 files changed, 671 insertions(+), 425 deletions(-)


base-commit: 35f30e2dfdccfba60c413248e03782b8793f92e6
-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 01/20] x86/mce/inject: Clear test status value
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-22 18:17   ` [tip: ras/core] " tip-bot2 for Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields Yazen Ghannam
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

AMD systems generally allow MCA "simulation" where MCA registers can be
written with valid data and the full MCA handling flow can be tested by
software.

However, the Platform on Scalable MCA systems may prevent software
from writing data to the MCA registers. There is no architectural way to
determine this configuration. Therefore, the MCE Inject module will
check for this behavior by writing and reading back a test status value.
This is done during module init, and the check can run on any CPU with
any valid MCA bank.

If MCA_STATUS writes are ignored by the Platform, then there are no side
effects on the hardware state.

If the writes are not ignored, then the test status value will remain in
the hardware MCA_STATUS register. It is likely that the value will not
be overwritten by hardware or software, since the tested CPU and bank
are arbitrary. Therefore, the user may see a spurious, synthetic MCA
error reported whenever MCA is polled for this CPU.

Clear the test value immediately after writing it. It is very unlikely
that a valid MCA error is logged by hardware during the test. Errors
that cause an #MC won't be affected.

Fixes: 891e465a1bd8 ("x86/mce: Check whether writes to MCA_STATUS are getting ignored")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/inject.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 4d8d4bcf915d..72f0695c3dc1 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -746,6 +746,7 @@ static void check_hw_inj_possible(void)
 
 		wrmsrl_safe(mca_msr_reg(bank, MCA_STATUS), status);
 		rdmsrl_safe(mca_msr_reg(bank, MCA_STATUS), &status);
+		wrmsrl_safe(mca_msr_reg(bank, MCA_STATUS), 0);
 
 		if (!status) {
 			hw_injection_possible = false;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 01/20] x86/mce/inject: Clear test status value Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-22 18:24   ` Borislav Petkov
  2023-11-18 19:32 ` [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error() Yazen Ghannam
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

Generally, MCA information for an error is gathered on the CPU that
reported the error. In this case, CPU-specific information from the
running CPU will be correct.

However, this will be incorrect if the MCA information is gathered while
running on a CPU that didn't report the error. One example is creating
an MCA record using mce_setup() for errors reported from ACPI.

Split mce_setup() so that there is a helper function to gather global,
i.e. not CPU-specific, information and another helper for CPU-specific
information.

Don't set the CPU number in either helper function. This will be set
appropriately for each call site of the helpers.

Leave mce_setup() defined as-is for the common case when running on the
reporting CPU.

Get MCG_CAP in the global helper even though the register is per-CPU.
This value is not already cached per-CPU like other values. And it does
not assist with any per-CPU decoding or handling.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/core.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1642018dd6c9..7e86086aa19c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -115,20 +115,31 @@ static struct irq_work mce_irq_work;
  */
 BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
 
+void mce_setup_global(struct mce *m)
+{
+	memset(m, 0, sizeof(struct mce));
+
+	m->cpuid	= cpuid_eax(1);
+	m->cpuvendor	= boot_cpu_data.x86_vendor;
+	m->mcgcap	= __rdmsr(MSR_IA32_MCG_CAP);
+	/* need the internal __ version to avoid deadlocks */
+	m->time		= __ktime_get_real_seconds();
+}
+
+void mce_setup_per_cpu(struct mce *m)
+{
+	m->apicid		= cpu_data(m->extcpu).topo.initial_apicid;
+	m->microcode		= cpu_data(m->extcpu).microcode;
+	m->ppin			= cpu_data(m->extcpu).ppin;
+	m->socketid		= cpu_data(m->extcpu).topo.pkg_id;
+}
+
 /* Do initial initialization of a struct mce */
 void mce_setup(struct mce *m)
 {
-	memset(m, 0, sizeof(struct mce));
+	mce_setup_global(m);
 	m->cpu = m->extcpu = smp_processor_id();
-	/* need the internal __ version to avoid deadlocks */
-	m->time = __ktime_get_real_seconds();
-	m->cpuvendor = boot_cpu_data.x86_vendor;
-	m->cpuid = cpuid_eax(1);
-	m->socketid = cpu_data(m->extcpu).topo.pkg_id;
-	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
-	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
-	m->ppin = cpu_data(m->extcpu).ppin;
-	m->microcode = boot_cpu_data.microcode;
+	mce_setup_per_cpu(m);
 }
 
 DEFINE_PER_CPU(struct mce, injectm);
-- 
2.34.1



* [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 01/20] x86/mce/inject: Clear test status value Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-22 18:28   ` Borislav Petkov
  2023-11-18 19:32 ` [PATCH 04/20] x86/mce/amd, EDAC/mce_amd: Move long names to decoder module Yazen Ghannam
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

Current AMD systems may report MCA errors using the ACPI Boot Error
Record Table (BERT). The BERT entries for MCA errors will be an x86
Common Platform Error Record (CPER) with an MSR register context that
matches the MCAX/SMCA register space.

However, the BERT will not necessarily be processed on the CPU that
reported the MCA errors. Therefore, the correct CPU number needs to be
determined and the information saved in struct mce.

The CPU number is determined by searching all possible CPUs for a Local
APIC ID matching the value in the x86 CPER.

Set up the MCA record after searching for a CPU number. If no possible
CPU was found, then return early.

Gather the global MCA information first, save the found CPU number, then
gather the per-CPU information.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/apei.c     | 18 ++++++++----------
 arch/x86/kernel/cpu/mce/internal.h |  2 ++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 7f7309ff67d0..33cefe6157eb 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -97,20 +97,18 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	if (ctx_info->reg_arr_size < 48)
 		return -EINVAL;
 
-	mce_setup(&m);
-
-	m.extcpu = -1;
-	m.socketid = -1;
-
 	for_each_possible_cpu(cpu) {
-		if (cpu_data(cpu).topo.initial_apicid == lapic_id) {
-			m.extcpu = cpu;
-			m.socketid = cpu_data(m.extcpu).topo.pkg_id;
+		if (cpu_data(cpu).topo.initial_apicid == lapic_id)
 			break;
-		}
 	}
 
-	m.apicid = lapic_id;
+	if (!cpu_possible(cpu))
+		return -EINVAL;
+
+	mce_setup_global(&m);
+	m.cpu = m.extcpu = cpu;
+	mce_setup_per_cpu(&m);
+
 	m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
 	m.status = *i_mce;
 	m.addr = *(i_mce + 1);
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index e13a26c9c0ac..424c7461dcf9 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -209,6 +209,8 @@ enum mca_msr {
 
 /* Decide whether to add MCE record to MCE event pool or filter it out. */
 extern bool filter_mce(struct mce *m);
+void mce_setup_global(struct mce *m);
+void mce_setup_per_cpu(struct mce *m);
 
 #ifdef CONFIG_X86_MCE_AMD
 extern bool amd_filter_mce(struct mce *m);
-- 
2.34.1



* [PATCH 04/20] x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (2 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error() Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-27 11:31   ` [tip: ras/core] " tip-bot2 for Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check Yazen Ghannam
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

The "long names" for SMCA banks are only used by the MCE decoder module.

Move them out of the arch code and into the decoder module.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h    |  1 -
 arch/x86/kernel/cpu/mce/amd.c | 74 ++++++++++++++---------------------
 drivers/edac/mce_amd.c        | 41 +++++++++++++++++++
 3 files changed, 71 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6de6e1d95952..4ad49afca2db 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -333,7 +333,6 @@ enum smca_bank_types {
 	N_SMCA_BANK_TYPES
 };
 
-extern const char *smca_get_long_name(enum smca_bank_types t);
 extern bool amd_mce_is_memory_error(struct mce *m);
 
 extern int mce_threshold_create_device(unsigned int cpu);
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index f3517b8a8e91..6cf8ed9c79be 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -87,42 +87,37 @@ struct smca_bank {
 static DEFINE_PER_CPU_READ_MOSTLY(struct smca_bank[MAX_NR_BANKS], smca_banks);
 static DEFINE_PER_CPU_READ_MOSTLY(u8[N_SMCA_BANK_TYPES], smca_bank_counts);
 
-struct smca_bank_name {
-	const char *name;	/* Short name for sysfs */
-	const char *long_name;	/* Long name for pretty-printing */
-};
-
-static struct smca_bank_name smca_names[] = {
-	[SMCA_LS ... SMCA_LS_V2]	= { "load_store",	"Load Store Unit" },
-	[SMCA_IF]			= { "insn_fetch",	"Instruction Fetch Unit" },
-	[SMCA_L2_CACHE]			= { "l2_cache",		"L2 Cache" },
-	[SMCA_DE]			= { "decode_unit",	"Decode Unit" },
-	[SMCA_RESERVED]			= { "reserved",		"Reserved" },
-	[SMCA_EX]			= { "execution_unit",	"Execution Unit" },
-	[SMCA_FP]			= { "floating_point",	"Floating Point Unit" },
-	[SMCA_L3_CACHE]			= { "l3_cache",		"L3 Cache" },
-	[SMCA_CS ... SMCA_CS_V2]	= { "coherent_slave",	"Coherent Slave" },
-	[SMCA_PIE]			= { "pie",		"Power, Interrupts, etc." },
+static char *smca_names[] = {
+	[SMCA_LS ... SMCA_LS_V2]	= "load_store",
+	[SMCA_IF]			= "insn_fetch",
+	[SMCA_L2_CACHE]			= "l2_cache",
+	[SMCA_DE]			= "decode_unit",
+	[SMCA_RESERVED]			= "reserved",
+	[SMCA_EX]			= "execution_unit",
+	[SMCA_FP]			= "floating_point",
+	[SMCA_L3_CACHE]			= "l3_cache",
+	[SMCA_CS ... SMCA_CS_V2]	= "coherent_slave",
+	[SMCA_PIE]			= "pie",
 
 	/* UMC v2 is separate because both of them can exist in a single system. */
-	[SMCA_UMC]			= { "umc",		"Unified Memory Controller" },
-	[SMCA_UMC_V2]			= { "umc_v2",		"Unified Memory Controller v2" },
-	[SMCA_PB]			= { "param_block",	"Parameter Block" },
-	[SMCA_PSP ... SMCA_PSP_V2]	= { "psp",		"Platform Security Processor" },
-	[SMCA_SMU ... SMCA_SMU_V2]	= { "smu",		"System Management Unit" },
-	[SMCA_MP5]			= { "mp5",		"Microprocessor 5 Unit" },
-	[SMCA_MPDMA]			= { "mpdma",		"MPDMA Unit" },
-	[SMCA_NBIO]			= { "nbio",		"Northbridge IO Unit" },
-	[SMCA_PCIE ... SMCA_PCIE_V2]	= { "pcie",		"PCI Express Unit" },
-	[SMCA_XGMI_PCS]			= { "xgmi_pcs",		"Ext Global Memory Interconnect PCS Unit" },
-	[SMCA_NBIF]			= { "nbif",		"NBIF Unit" },
-	[SMCA_SHUB]			= { "shub",		"System Hub Unit" },
-	[SMCA_SATA]			= { "sata",		"SATA Unit" },
-	[SMCA_USB]			= { "usb",		"USB Unit" },
-	[SMCA_GMI_PCS]			= { "gmi_pcs",		"Global Memory Interconnect PCS Unit" },
-	[SMCA_XGMI_PHY]			= { "xgmi_phy",		"Ext Global Memory Interconnect PHY Unit" },
-	[SMCA_WAFL_PHY]			= { "wafl_phy",		"WAFL PHY Unit" },
-	[SMCA_GMI_PHY]			= { "gmi_phy",		"Global Memory Interconnect PHY Unit" },
+	[SMCA_UMC]			= "umc",
+	[SMCA_UMC_V2]			= "umc_v2",
+	[SMCA_PB]			= "param_block",
+	[SMCA_PSP ... SMCA_PSP_V2]	= "psp",
+	[SMCA_SMU ... SMCA_SMU_V2]	= "smu",
+	[SMCA_MP5]			= "mp5",
+	[SMCA_MPDMA]			= "mpdma",
+	[SMCA_NBIO]			= "nbio",
+	[SMCA_PCIE ... SMCA_PCIE_V2]	= "pcie",
+	[SMCA_XGMI_PCS]			= "xgmi_pcs",
+	[SMCA_NBIF]			= "nbif",
+	[SMCA_SHUB]			= "shub",
+	[SMCA_SATA]			= "sata",
+	[SMCA_USB]			= "usb",
+	[SMCA_GMI_PCS]			= "gmi_pcs",
+	[SMCA_XGMI_PHY]			= "xgmi_phy",
+	[SMCA_WAFL_PHY]			= "wafl_phy",
+	[SMCA_GMI_PHY]			= "gmi_phy",
 };
 
 static const char *smca_get_name(enum smca_bank_types t)
@@ -130,17 +125,8 @@ static const char *smca_get_name(enum smca_bank_types t)
 	if (t >= N_SMCA_BANK_TYPES)
 		return NULL;
 
-	return smca_names[t].name;
-}
-
-const char *smca_get_long_name(enum smca_bank_types t)
-{
-	if (t >= N_SMCA_BANK_TYPES)
-		return NULL;
-
-	return smca_names[t].long_name;
+	return smca_names[t];
 }
-EXPORT_SYMBOL_GPL(smca_get_long_name);
 
 enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank)
 {
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 9215c06783df..b8765292d26e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1163,6 +1163,47 @@ static void decode_mc6_mce(struct mce *m)
 	pr_emerg(HW_ERR "Corrupted MC6 MCE info?\n");
 }
 
+static  char *smca_names[] = {
+	[SMCA_LS ... SMCA_LS_V2]	= "Load Store Unit",
+	[SMCA_IF]			= "Instruction Fetch Unit",
+	[SMCA_L2_CACHE]			= "L2 Cache",
+	[SMCA_DE]			= "Decode Unit",
+	[SMCA_RESERVED]			= "Reserved",
+	[SMCA_EX]			= "Execution Unit",
+	[SMCA_FP]			= "Floating Point Unit",
+	[SMCA_L3_CACHE]			= "L3 Cache",
+	[SMCA_CS ... SMCA_CS_V2]	= "Coherent Slave",
+	[SMCA_PIE]			= "Power, Interrupts, etc.",
+
+	/* UMC v2 is separate because both of them can exist in a single system. */
+	[SMCA_UMC]			= "Unified Memory Controller",
+	[SMCA_UMC_V2]			= "Unified Memory Controller v2",
+	[SMCA_PB]			= "Parameter Block",
+	[SMCA_PSP ... SMCA_PSP_V2]	= "Platform Security Processor",
+	[SMCA_SMU ... SMCA_SMU_V2]	= "System Management Unit",
+	[SMCA_MP5]			= "Microprocessor 5 Unit",
+	[SMCA_MPDMA]			= "MPDMA Unit",
+	[SMCA_NBIO]			= "Northbridge IO Unit",
+	[SMCA_PCIE ... SMCA_PCIE_V2]	= "PCI Express Unit",
+	[SMCA_XGMI_PCS]			= "Ext Global Memory Interconnect PCS Unit",
+	[SMCA_NBIF]			= "NBIF Unit",
+	[SMCA_SHUB]			= "System Hub Unit",
+	[SMCA_SATA]			= "SATA Unit",
+	[SMCA_USB]			= "USB Unit",
+	[SMCA_GMI_PCS]			= "Global Memory Interconnect PCS Unit",
+	[SMCA_XGMI_PHY]			= "Ext Global Memory Interconnect PHY Unit",
+	[SMCA_WAFL_PHY]			= "WAFL PHY Unit",
+	[SMCA_GMI_PHY]			= "Global Memory Interconnect PHY Unit",
+};
+
+static const char *smca_get_long_name(enum smca_bank_types t)
+{
+	if (t >= N_SMCA_BANK_TYPES)
+		return NULL;
+
+	return smca_names[t];
+}
+
 /* Decode errors according to Scalable MCA specification */
 static void decode_smca_error(struct mce *m)
 {
-- 
2.34.1



* [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (3 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 04/20] x86/mce/amd, EDAC/mce_amd: Move long names to decoder module Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-27 11:43   ` Borislav Petkov
  2023-11-18 19:32 ` [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks Yazen Ghannam
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

Scalable MCA systems use values in the MCA_IPID register to describe the
type of hardware for an MCA bank. This information is used when
bank-specific actions or decoding are needed. Otherwise,
microarchitectural information, like MCA_STATUS bits, should be used.

Currently, the bank type information is cached at boot time for all CPUs
and all banks. This uses more memory as the number of CPUs and MCA banks
increases. Furthermore, this causes bank-specific actions to rely on the
OS "CPU number" to look up cached values. And this can break if the CPU
number processing an error is not the same at the CPU that reported the
error.

The bank type should be determined solely from the MCA_IPID values. And
the cached information should be removed.

Define a helper function to check for a UMC bank type. This simplifies
the common case where software needs to determine if an MCA error is for
memory, and where the exact bank type is not needed.

Use bitops and rename old mask until removed.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h    |  3 ++-
 arch/x86/kernel/cpu/mce/amd.c | 15 +++++++++------
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4ad49afca2db..c43b41677a3e 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -60,7 +60,8 @@
  */
 #define MCI_CONFIG_MCAX		0x1
 #define MCI_IPID_MCATYPE	0xFFFF0000
-#define MCI_IPID_HWID		0xFFF
+#define MCI_IPID_HWID_OLD	0xFFF
+#define MCI_IPID_HWID		GENMASK_ULL(43, 32)
 
 /*
  * Note that the full MCACOD field of IA32_MCi_STATUS MSR is
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6cf8ed9c79be..c8fb6c24170f 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -7,6 +7,7 @@
  *
  *  All MC4_MISCi registers are shared between cores on a node.
  */
+#include <linux/bitfield.h>
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 #include <linux/kobject.h>
@@ -143,6 +144,12 @@ enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank)
 }
 EXPORT_SYMBOL_GPL(smca_get_bank_type);
 
+/* UMCs have HWID=0x96.*/
+static bool smca_umc_bank_type(u64 ipid)
+{
+	return FIELD_GET(MCI_IPID_HWID, ipid) == 0x96;
+}
+
 static const struct smca_hwid smca_hwid_mcatypes[] = {
 	/* { bank_type, hwid_mcatype } */
 
@@ -304,7 +311,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 		return;
 	}
 
-	hwid_mcatype = HWID_MCATYPE(high & MCI_IPID_HWID,
+	hwid_mcatype = HWID_MCATYPE(high & MCI_IPID_HWID_OLD,
 				    (high & MCI_IPID_MCATYPE) >> 16);
 
 	for (i = 0; i < ARRAY_SIZE(smca_hwid_mcatypes); i++) {
@@ -714,14 +721,10 @@ static bool legacy_mce_is_memory_error(struct mce *m)
  */
 static bool smca_mce_is_memory_error(struct mce *m)
 {
-	enum smca_bank_types bank_type;
-
 	if (XEC(m->status, 0x3f))
 		return false;
 
-	bank_type = smca_get_bank_type(m->extcpu, m->bank);
-
-	return bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2;
+	return smca_umc_bank_type(m->ipid);
 }
 
 bool amd_mce_is_memory_error(struct mce *m)
-- 
2.34.1



* [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (4 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-27 11:46   ` Borislav Petkov
  2023-11-18 19:32 ` [PATCH 07/20] x86/mce/amd: Use fixed bank number for quirks Yazen Ghannam
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

The type of a Scalable MCA bank should be determined solely using the
values in its MCA_IPID register.

Define and use a helper function to determine if a bank represents a GPU
Unified Memory Controller (UMC), for cases where the exact bank type is
not needed.

Use bitops and rename old mask until removed.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h              |  4 +++-
 arch/x86/kernel/cpu/mce/amd.c           | 12 +++++++++++-
 drivers/edac/amd64_edac.c               |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |  9 ++++-----
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index c43b41677a3e..012caf68dcbb 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -59,8 +59,9 @@
  *  - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX		0x1
-#define MCI_IPID_MCATYPE	0xFFFF0000
+#define MCI_IPID_MCATYPE_OLD	0xFFFF0000
 #define MCI_IPID_HWID_OLD	0xFFF
+#define MCI_IPID_MCATYPE	GENMASK_ULL(63, 48)
 #define MCI_IPID_HWID		GENMASK_ULL(43, 32)
 
 /*
@@ -341,6 +342,7 @@ extern int mce_threshold_remove_device(unsigned int cpu);
 
 void mce_amd_feature_init(struct cpuinfo_x86 *c);
 enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
+bool smca_gpu_umc_bank_type(u64 ipid);
 #else
 
 static inline int mce_threshold_create_device(unsigned int cpu)		{ return 0; };
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index c8fb6c24170f..6fc35967b11b 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -150,6 +150,16 @@ static bool smca_umc_bank_type(u64 ipid)
 	return FIELD_GET(MCI_IPID_HWID, ipid) == 0x96;
 }
 
+/* GPU UMCs have MCATYPE=0x1.*/
+bool smca_gpu_umc_bank_type(u64 ipid)
+{
+	if (!smca_umc_bank_type(ipid))
+		return false;
+
+	return FIELD_GET(MCI_IPID_MCATYPE, ipid) == 0x1;
+}
+EXPORT_SYMBOL_GPL(smca_gpu_umc_bank_type);
+
 static const struct smca_hwid smca_hwid_mcatypes[] = {
 	/* { bank_type, hwid_mcatype } */
 
@@ -312,7 +322,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 	}
 
 	hwid_mcatype = HWID_MCATYPE(high & MCI_IPID_HWID_OLD,
-				    (high & MCI_IPID_MCATYPE) >> 16);
+				    (high & MCI_IPID_MCATYPE_OLD) >> 16);
 
 	for (i = 0; i < ARRAY_SIZE(smca_hwid_mcatypes); i++) {
 		s_hwid = &smca_hwid_mcatypes[i];
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 9b6642d00871..b593795e1e6b 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1032,7 +1032,7 @@ static int fixup_node_id(int node_id, struct mce *m)
 	/* MCA_IPID[InstanceIdHi] give the AMD Node ID for the bank. */
 	u8 nid = (m->ipid >> 44) & 0xF;
 
-	if (smca_get_bank_type(m->extcpu, m->bank) != SMCA_UMC_V2)
+	if (!smca_gpu_umc_bank_type(m->ipid))
 		return node_id;
 
 	/* Nodes below the GPU base node are CPU nodes and don't need a fixup. */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 84e5987b14e0..7235668b3cc2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3279,12 +3279,11 @@ static int amdgpu_bad_page_notifier(struct notifier_block *nb,
 	uint32_t umc_inst = 0, ch_inst = 0;
 
 	/*
-	 * If the error was generated in UMC_V2, which belongs to GPU UMCs,
-	 * and error occurred in DramECC (Extended error code = 0) then only
-	 * process the error, else bail out.
+	 * If the error was generated in a GPU UMC and error occurred in
+	 * DramECC (Extended error code = 0) then only process the error,
+	 * else bail out.
 	 */
-	if (!m || !((smca_get_bank_type(m->extcpu, m->bank) == SMCA_UMC_V2) &&
-		    (XEC(m->status, 0x3f) == 0x0)))
+	if (!m || !(smca_gpu_umc_bank_type(m->ipid) && (XEC(m->status, 0x3f) == 0x0)))
 		return NOTIFY_DONE;
 
 	/*
-- 
2.34.1



* [PATCH 07/20] x86/mce/amd: Use fixed bank number for quirks
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (5 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 08/20] x86/mce/amd: Look up bank type by IPID Yazen Ghannam
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

Quirks break micro-architectural definitions. Therefore, quirk
conditions don't need to follow micro-architectural requirements.

Currently, there is a quirk to filter some errors from the
Instruction Fetch (IF) unit on specific models. The IF unit is
represented by MCA bank 1 for these models. Related to this quirk is
code to disable MCA Thresholding for the IF bank.

Check the bank number for the quirks instead of determining the bank
type.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6fc35967b11b..6e100024498a 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -616,13 +616,12 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 
 bool amd_filter_mce(struct mce *m)
 {
-	enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 
 	/* See Family 17h Models 10h-2Fh Erratum #1114. */
 	if (c->x86 == 0x17 &&
 	    c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
-	    bank_type == SMCA_IF && XEC(m->status, 0x3f) == 10)
+	    m->bank == 1 && XEC(m->status, 0x3f) == 10)
 		return true;
 
 	/* NB GART TLB error reporting is disabled by default. */
@@ -654,7 +653,7 @@ static void disable_err_thresholding(struct cpuinfo_x86 *c, unsigned int bank)
 	} else if (c->x86 == 0x17 &&
 		   (c->x86_model >= 0x10 && c->x86_model <= 0x2F)) {
 
-		if (smca_get_bank_type(smp_processor_id(), bank) != SMCA_IF)
+		if (bank != 1)
 			return;
 
 		msrs[0] = MSR_AMD64_SMCA_MCx_MISC(bank);
-- 
2.34.1



* [PATCH 08/20] x86/mce/amd: Look up bank type by IPID
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (6 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 07/20] x86/mce/amd: Use fixed bank number for quirks Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 09/20] x86/mce/amd: Clean up SMCA configuration Yazen Ghannam
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

Scalable MCA systems use values within the MCA_IPID register to describe
a bank's type. Other information is not needed.

Currently, the bank types are detected and cached during boot, and the
cached information is used at both boot time and run time. The cached
values are per-CPU and per-bank. The boot path still depends on the
cached values, but that dependency should be removed. The run time path
does not need the cached values at all.

Determine a Scalable MCA bank's type using only the MCA_IPID values.
The MCE decoder module is the only current user, but the boot path will
be updated to use the same helper function.

Keep old code until init path is cleaned up.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h    |  2 +-
 arch/x86/kernel/cpu/mce/amd.c | 88 ++++++++++++++++++++++++++++++++---
 drivers/edac/mce_amd.c        |  2 +-
 3 files changed, 84 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 012caf68dcbb..9441b89afee3 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -341,7 +341,7 @@ extern int mce_threshold_create_device(unsigned int cpu);
 extern int mce_threshold_remove_device(unsigned int cpu);
 
 void mce_amd_feature_init(struct cpuinfo_x86 *c);
-enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
+enum smca_bank_types smca_get_bank_type(u64 ipid);
 bool smca_gpu_umc_bank_type(u64 ipid);
 #else
 
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6e100024498a..95843ac7979d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -129,7 +129,7 @@ static const char *smca_get_name(enum smca_bank_types t)
 	return smca_names[t];
 }
 
-enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank)
+static enum smca_bank_types smca_get_bank_type_old(unsigned int cpu, unsigned int bank)
 {
 	struct smca_bank *b;
 
@@ -142,7 +142,6 @@ enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank)
 
 	return b->hwid->bank_type;
 }
-EXPORT_SYMBOL_GPL(smca_get_bank_type);
 
 /* UMCs have HWID=0x96.*/
 static bool smca_umc_bank_type(u64 ipid)
@@ -160,7 +159,7 @@ bool smca_gpu_umc_bank_type(u64 ipid)
 }
 EXPORT_SYMBOL_GPL(smca_gpu_umc_bank_type);
 
-static const struct smca_hwid smca_hwid_mcatypes[] = {
+static const struct smca_hwid smca_hwid_mcatypes_old[] = {
 	/* { bank_type, hwid_mcatype } */
 
 	/* Reserved type */
@@ -221,6 +220,83 @@ static const struct smca_hwid smca_hwid_mcatypes[] = {
 	{ SMCA_GMI_PHY,	 HWID_MCATYPE(0x269, 0x0)	},
 };
 
+/* Keep sorted first by HWID then by McaType. */
+static const u32 smca_hwid_mcatypes[] = {
+	/* Reserved type */
+	[SMCA_RESERVED]		= HWID_MCATYPE(0x00, 0x0),
+
+	/* System Management Unit MCA type */
+	[SMCA_SMU]		= HWID_MCATYPE(0x01, 0x0),
+	[SMCA_SMU_V2]		= HWID_MCATYPE(0x01, 0x1),
+
+	/* Microprocessor 5 Unit MCA type */
+	[SMCA_MP5]		= HWID_MCATYPE(0x01, 0x2),
+
+	/* MPDMA MCA type */
+	[SMCA_MPDMA]		= HWID_MCATYPE(0x01, 0x3),
+
+	/* Parameter Block MCA type */
+	[SMCA_PB]		= HWID_MCATYPE(0x05, 0x0),
+
+	/* Northbridge IO Unit MCA type */
+	[SMCA_NBIO]		= HWID_MCATYPE(0x18, 0x0),
+
+	/* Data Fabric MCA types */
+	[SMCA_CS]		= HWID_MCATYPE(0x2E, 0x0),
+	[SMCA_PIE]		= HWID_MCATYPE(0x2E, 0x1),
+	[SMCA_CS_V2]		= HWID_MCATYPE(0x2E, 0x2),
+
+	/* PCI Express Unit MCA type */
+	[SMCA_PCIE]		= HWID_MCATYPE(0x46, 0x0),
+	[SMCA_PCIE_V2]		= HWID_MCATYPE(0x46, 0x1),
+
+	[SMCA_XGMI_PCS]		= HWID_MCATYPE(0x50, 0x0),
+	[SMCA_NBIF]		= HWID_MCATYPE(0x6C, 0x0),
+	[SMCA_SHUB]		= HWID_MCATYPE(0x80, 0x0),
+
+	/* Unified Memory Controller MCA type */
+	[SMCA_UMC]		= HWID_MCATYPE(0x96, 0x0),
+	[SMCA_UMC_V2]		= HWID_MCATYPE(0x96, 0x1),
+
+	[SMCA_SATA]		= HWID_MCATYPE(0xA8, 0x0),
+	[SMCA_USB]		= HWID_MCATYPE(0xAA, 0x0),
+
+	/* ZN Core (HWID=0xB0) MCA types */
+	[SMCA_LS]		= HWID_MCATYPE(0xB0, 0x0),
+	[SMCA_IF]		= HWID_MCATYPE(0xB0, 0x1),
+	[SMCA_L2_CACHE]		= HWID_MCATYPE(0xB0, 0x2),
+	[SMCA_DE]		= HWID_MCATYPE(0xB0, 0x3),
+	/* HWID 0xB0 MCATYPE 0x4 is Reserved */
+	[SMCA_EX]		= HWID_MCATYPE(0xB0, 0x5),
+	[SMCA_FP]		= HWID_MCATYPE(0xB0, 0x6),
+	[SMCA_L3_CACHE]		= HWID_MCATYPE(0xB0, 0x7),
+	[SMCA_LS_V2]		= HWID_MCATYPE(0xB0, 0x10),
+
+	/* Platform Security Processor MCA type */
+	[SMCA_PSP]		= HWID_MCATYPE(0xFF, 0x0),
+	[SMCA_PSP_V2]		= HWID_MCATYPE(0xFF, 0x1),
+
+	[SMCA_GMI_PCS]		= HWID_MCATYPE(0x241, 0x0),
+	[SMCA_XGMI_PHY]		= HWID_MCATYPE(0x259, 0x0),
+	[SMCA_WAFL_PHY]		= HWID_MCATYPE(0x267, 0x0),
+	[SMCA_GMI_PHY]		= HWID_MCATYPE(0x269, 0x0),
+};
+
+enum smca_bank_types smca_get_bank_type(u64 ipid)
+{
+	enum smca_bank_types type;
+	u32 hwid_mcatype = HWID_MCATYPE(FIELD_GET(MCI_IPID_HWID, ipid),
+					FIELD_GET(MCI_IPID_MCATYPE, ipid));
+
+	for (type = 0; type < ARRAY_SIZE(smca_hwid_mcatypes); type++) {
+		if (hwid_mcatype == smca_hwid_mcatypes[type])
+			return type;
+	}
+
+	return N_SMCA_BANK_TYPES;
+}
+EXPORT_SYMBOL_GPL(smca_get_bank_type);
+
 /*
  * In SMCA enabled processors, we can have multiple banks for a given IP type.
  * So to define a unique name for each bank, we use a temp c-string to append
@@ -324,8 +400,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 	hwid_mcatype = HWID_MCATYPE(high & MCI_IPID_HWID_OLD,
 				    (high & MCI_IPID_MCATYPE_OLD) >> 16);
 
-	for (i = 0; i < ARRAY_SIZE(smca_hwid_mcatypes); i++) {
-		s_hwid = &smca_hwid_mcatypes[i];
+	for (i = 0; i < ARRAY_SIZE(smca_hwid_mcatypes_old); i++) {
+		s_hwid = &smca_hwid_mcatypes_old[i];
 
 		if (hwid_mcatype == s_hwid->hwid_mcatype) {
 			this_cpu_ptr(smca_banks)[bank].hwid = s_hwid;
@@ -1104,7 +1180,7 @@ static const char *get_name(unsigned int cpu, unsigned int bank, struct threshol
 		return th_names[bank];
 	}
 
-	bank_type = smca_get_bank_type(cpu, bank);
+	bank_type = smca_get_bank_type_old(cpu, bank);
 	if (bank_type >= N_SMCA_BANK_TYPES)
 		return NULL;
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index b8765292d26e..701bc9556414 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1207,7 +1207,7 @@ static const char *smca_get_long_name(enum smca_bank_types t)
 /* Decode errors according to Scalable MCA specification */
 static void decode_smca_error(struct mce *m)
 {
-	enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
+	enum smca_bank_types bank_type = smca_get_bank_type(m->ipid);
 	const char *ip_name;
 	u8 xec = XEC(m->status, xec_mask);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
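The new lookup in the patch above composes a HWID/McaType key from the raw
MCA_IPID value and scans a table indexed by bank type. A simplified sketch
(field positions — HardwareID in IPID bits [43:32], McaType in bits
[63:48] — are assumed from the SMCA layout; the cut-down enum and table
are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SMCA MCA_IPID field positions for this sketch. */
#define IPID_HWID(ipid)		(((ipid) >> 32) & 0xFFF)
#define IPID_MCATYPE(ipid)	(((ipid) >> 48) & 0xFFFF)
#define HWID_MCATYPE(hwid, mcatype) (((hwid) << 16) | (mcatype))

enum bank_types { BT_LS, BT_IF, BT_L2, N_BANK_TYPES };

/* Table indexed by bank type, as in the new smca_hwid_mcatypes[]. */
static const uint32_t hwid_mcatypes[N_BANK_TYPES] = {
	[BT_LS]	= HWID_MCATYPE(0xB0, 0x0),
	[BT_IF]	= HWID_MCATYPE(0xB0, 0x1),
	[BT_L2]	= HWID_MCATYPE(0xB0, 0x2),
};

/* Linear scan mirroring smca_get_bank_type(): return the matching
 * type, or N_BANK_TYPES if the IPID is unknown. */
static enum bank_types get_bank_type(uint64_t ipid)
{
	uint32_t key = HWID_MCATYPE(IPID_HWID(ipid), IPID_MCATYPE(ipid));
	enum bank_types t;

	for (t = 0; t < N_BANK_TYPES; t++)
		if (hwid_mcatypes[t] == key)
			return t;

	return N_BANK_TYPES;
}
```

Because the key is derived entirely from MCA_IPID, no per-CPU cached
state is needed at run time.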

* [PATCH 09/20] x86/mce/amd: Clean up SMCA configuration
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (7 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 08/20] x86/mce/amd: Look up bank type by IPID Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 10/20] x86/mce/amd: Prep DFR handler before enabling banks Yazen Ghannam
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

The current SMCA configuration function does more than just configure
SMCA features. It also detects and caches the SMCA bank types.

However, the bank type caching flow will be removed during the init path
clean up.

Define a new function that only configures SMCA features. This will
operate on the MCA_CONFIG MSR, so include updated register field
definitions using bitops.

Leave old code until init path is cleaned up.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c | 84 ++++++++++++++++++++---------------
 1 file changed, 49 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 95843ac7979d..c8c92e048f56 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -50,9 +50,16 @@
 #define MASK_DEF_INT_TYPE	0x00000006
 #define DEF_LVT_OFF		0x2
 #define DEF_INT_TYPE_APIC	0x2
+#define INTR_TYPE_APIC			0x1
 
 /* Scalable MCA: */
 
+/* MCA_CONFIG register, one per MCA bank */
+#define CFG_DFR_INT_TYPE		GENMASK_ULL(38, 37)
+#define CFG_MCAX_EN			BIT_ULL(32)
+#define CFG_LSB_IN_STATUS		BIT_ULL(8)
+#define CFG_DFR_INT_SUPP		BIT_ULL(5)
+
 /* Threshold LVT offset is at MSR0xC0000410[15:12] */
 #define SMCA_THR_LVT_OFF	0xF000
 
@@ -350,45 +357,51 @@ static void smca_set_misc_banks_map(unsigned int bank, unsigned int cpu)
 
 }
 
-static void smca_configure(unsigned int bank, unsigned int cpu)
+/* Set appropriate bits in MCA_CONFIG. */
+static void configure_smca(unsigned int bank)
 {
-	u8 *bank_counts = this_cpu_ptr(smca_bank_counts);
-	const struct smca_hwid *s_hwid;
-	unsigned int i, hwid_mcatype;
-	u32 high, low;
-	u32 smca_config = MSR_AMD64_SMCA_MCx_CONFIG(bank);
+	u64 mca_config;
 
-	/* Set appropriate bits in MCA_CONFIG */
-	if (!rdmsr_safe(smca_config, &low, &high)) {
-		/*
-		 * OS is required to set the MCAX bit to acknowledge that it is
-		 * now using the new MSR ranges and new registers under each
-		 * bank. It also means that the OS will configure deferred
-		 * errors in the new MCx_CONFIG register. If the bit is not set,
-		 * uncorrectable errors will cause a system panic.
-		 *
-		 * MCA_CONFIG[MCAX] is bit 32 (0 in the high portion of the MSR.)
-		 */
-		high |= BIT(0);
+	if (!mce_flags.smca)
+		return;
 
-		/*
-		 * SMCA sets the Deferred Error Interrupt type per bank.
-		 *
-		 * MCA_CONFIG[DeferredIntTypeSupported] is bit 5, and tells us
-		 * if the DeferredIntType bit field is available.
-		 *
-		 * MCA_CONFIG[DeferredIntType] is bits [38:37] ([6:5] in the
-		 * high portion of the MSR). OS should set this to 0x1 to enable
-		 * APIC based interrupt. First, check that no interrupt has been
-		 * set.
-		 */
-		if ((low & BIT(5)) && !((high >> 5) & 0x3))
-			high |= BIT(5);
+	if (rdmsrl_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &mca_config))
+		return;
+
+	/*
+	 * OS is required to set the MCAX enable bit to acknowledge that it is
+	 * now using the new MSR ranges and new registers under each
+	 * bank. It also means that the OS will configure deferred
+	 * errors in the new MCA_CONFIG register. If the bit is not set,
+	 * uncorrectable errors will cause a system panic.
+	 */
+	mca_config |= FIELD_PREP(CFG_MCAX_EN, 0x1);
 
-		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
+	/*
+	 * SMCA sets the Deferred Error Interrupt type per bank.
+	 *
+	 * MCA_CONFIG[DeferredIntTypeSupported] is bit 5, and tells us
+	 * if the DeferredIntType bit field is available.
+	 *
+	 * MCA_CONFIG[DeferredIntType] is bits [38:37]. OS should set
+	 * this to 0x1 to enable APIC based interrupt. First, check that
+	 * no interrupt has been set.
+	 */
+	if (FIELD_GET(CFG_DFR_INT_SUPP, mca_config) && !FIELD_GET(CFG_DFR_INT_TYPE, mca_config))
+		mca_config |= FIELD_PREP(CFG_DFR_INT_TYPE, INTR_TYPE_APIC);
 
-		wrmsr(smca_config, low, high);
-	}
+	if (FIELD_GET(CFG_LSB_IN_STATUS, mca_config))
+		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = true;
+
+	wrmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), mca_config);
+}
+
+static void smca_configure_old(unsigned int bank, unsigned int cpu)
+{
+	u8 *bank_counts = this_cpu_ptr(smca_bank_counts);
+	const struct smca_hwid *s_hwid;
+	unsigned int i, hwid_mcatype;
+	u32 high, low;
 
 	smca_set_misc_banks_map(bank, cpu);
 
@@ -764,8 +777,9 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
-			smca_configure(bank, cpu);
+			smca_configure_old(bank, cpu);
 
+		configure_smca(bank);
 		disable_err_thresholding(c, bank);
 
 		for (block = 0; block < NR_BLOCKS; ++block) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
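The MCA_CONFIG update above is a small read-modify-write on a 64-bit
value. A sketch of the same transformation in plain C (bit positions —
MCAX enable at bit 32, DeferredIntTypeSupported at bit 5, DeferredIntType
at bits [38:37] — are taken from the patch's CFG_* defines; the function
name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define CFG_MCAX_EN		(1ULL << 32)
#define CFG_DFR_INT_SUPP	(1ULL << 5)
#define CFG_DFR_INT_TYPE_SHIFT	37
#define CFG_DFR_INT_TYPE	(3ULL << CFG_DFR_INT_TYPE_SHIFT)
#define INTR_TYPE_APIC		0x1ULL

/* Apply the same transformation configure_smca() performs on the raw
 * register value before writing it back. */
static uint64_t configure(uint64_t cfg)
{
	/* Acknowledge use of the new MSR ranges. */
	cfg |= CFG_MCAX_EN;

	/* Enable the APIC deferred-error interrupt only if the bank
	 * supports it and no interrupt type has been set yet. */
	if ((cfg & CFG_DFR_INT_SUPP) && !(cfg & CFG_DFR_INT_TYPE))
		cfg |= INTR_TYPE_APIC << CFG_DFR_INT_TYPE_SHIFT;

	return cfg;
}
```

Working on the full 64-bit value with FIELD_GET()/FIELD_PREP() avoids
the old code's manual split into low/high 32-bit halves.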

* [PATCH 10/20] x86/mce/amd: Prep DFR handler before enabling banks
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (8 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 09/20] x86/mce/amd: Clean up SMCA configuration Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 11/20] x86/mce/amd: Simplify DFR handler setup Yazen Ghannam
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

Scalable MCA systems use the per-bank MCA_CONFIG register to enable
deferred error interrupts. This is done as part of SMCA configuration.

Currently, the deferred error interrupt handler is set up after SMCA
configuration.

Move the deferred error interrupt handler setup before SMCA
configuration. This ensures the kernel is ready to receive the
interrupts before the hardware is configured to send them.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index c8c92e048f56..4fddc5c8ae0e 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -595,6 +595,9 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0;
 	int def_offset = -1, def_new;
 
+	if (!mce_flags.succor)
+		return;
+
 	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
 		return;
 
@@ -774,6 +777,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
+	deferred_error_interrupt_enable(c);
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
@@ -800,9 +804,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 			offset = prepare_threshold_block(bank, block, address, offset, high);
 		}
 	}
-
-	if (mce_flags.succor)
-		deferred_error_interrupt_enable(c);
 }
 
 /*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 11/20] x86/mce/amd: Simplify DFR handler setup
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (9 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 10/20] x86/mce/amd: Prep DFR handler before enabling banks Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 12/20] x86/mce/amd: Clean up enable_deferred_error_interrupt() Yazen Ghannam
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

AMD systems with the SUCCOR feature can send an APIC LVT interrupt for
deferred errors. The LVT offset is 0x2 by convention, i.e. this is the
default as listed in hardware documentation.

However, the MCA registers may list a different LVT offset for this
interrupt. The kernel should honor the value from the hardware.

Simplify the enable flow by using the hardware-provided value. Any
conflicts will be caught by setup_APIC_eilvt(). Conflicts on production
systems can be handled as quirks, if needed.

Also, rename the function using a "verb-first" style.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 4fddc5c8ae0e..9197badd9929 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -48,7 +48,6 @@
 #define MSR_CU_DEF_ERR		0xC0000410
 #define MASK_DEF_LVTOFF		0x000000F0
 #define MASK_DEF_INT_TYPE	0x00000006
-#define DEF_LVT_OFF		0x2
 #define DEF_INT_TYPE_APIC	0x2
 #define INTR_TYPE_APIC			0x1
 
@@ -581,19 +580,9 @@ static int setup_APIC_mce_threshold(int reserved, int new)
 	return reserved;
 }
 
-static int setup_APIC_deferred_error(int reserved, int new)
+static void enable_deferred_error_interrupt(void)
 {
-	if (reserved < 0 && !setup_APIC_eilvt(new, DEFERRED_ERROR_VECTOR,
-					      APIC_EILVT_MSG_FIX, 0))
-		return new;
-
-	return reserved;
-}
-
-static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
-{
-	u32 low = 0, high = 0;
-	int def_offset = -1, def_new;
+	u32 low = 0, high = 0, def_new;
 
 	if (!mce_flags.succor)
 		return;
@@ -601,17 +590,15 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
 	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
 		return;
 
+	/*
+	 * Trust the value from hardware.
+	 * If there's a conflict, then setup_APIC_eilvt() will throw an error.
+	 */
 	def_new = (low & MASK_DEF_LVTOFF) >> 4;
-	if (!(low & MASK_DEF_LVTOFF)) {
-		pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
-		def_new = DEF_LVT_OFF;
-		low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
-	}
+	if (setup_APIC_eilvt(def_new, DEFERRED_ERROR_VECTOR, APIC_EILVT_MSG_FIX, 0))
+		return;
 
-	def_offset = setup_APIC_deferred_error(def_offset, def_new);
-	if ((def_offset == def_new) &&
-	    (deferred_error_int_vector != amd_deferred_error_interrupt))
-		deferred_error_int_vector = amd_deferred_error_interrupt;
+	deferred_error_int_vector = amd_deferred_error_interrupt;
 
 	if (!mce_flags.smca)
 		low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
@@ -777,7 +764,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
-	deferred_error_interrupt_enable(c);
+	enable_deferred_error_interrupt();
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
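The simplified flow just extracts the hardware-provided LVT offset from
MSR 0xC000_0410 bits [7:4] (MASK_DEF_LVTOFF) and lets setup_APIC_eilvt()
reject any conflict. The extraction itself is trivial; a sketch (helper
name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Deferred-error LVT offset field, per MASK_DEF_LVTOFF in the code. */
#define MASK_DEF_LVTOFF 0xF0u

static unsigned int def_lvt_offset(uint32_t low)
{
	return (low & MASK_DEF_LVTOFF) >> 4;
}
```

Whatever value this returns is handed directly to setup_APIC_eilvt(),
which fails if another offset was already reserved for the vector.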

* [PATCH 12/20] x86/mce/amd: Clean up enable_deferred_error_interrupt()
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (10 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 11/20] x86/mce/amd: Simplify DFR handler setup Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 13/20] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

Switch to bitops to help with clarity. Also, avoid an unnecessary
wrmsr() for SMCA systems.

Use the updated name for MSR 0xC000_0410 to match the documentation for
Family 0x17 and later systems.

This MSR is used for setting up both Deferred and MCA Thresholding
interrupts on current systems. So read it once during init and pass it
to the functions that need it. Start with the Deferred error interrupt
case.
The MCA Thresholding interrupt case will be handled during refactoring.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c | 46 +++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9197badd9929..83fdbf42a472 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -44,11 +44,11 @@
 #define MASK_BLKPTR_LO    0xFF000000
 #define MCG_XBLK_ADDR     0xC0000400
 
-/* Deferred error settings */
+/* MCA Interrupt Configuration register, one per CPU */
 #define MSR_CU_DEF_ERR		0xC0000410
-#define MASK_DEF_LVTOFF		0x000000F0
-#define MASK_DEF_INT_TYPE	0x00000006
-#define DEF_INT_TYPE_APIC	0x2
+#define MSR_MCA_INTR_CFG		0xC0000410
+#define INTR_CFG_DFR_LVT_OFFSET		GENMASK_ULL(7, 4)
+#define INTR_CFG_LEGACY_DFR_INTR_TYPE	GENMASK_ULL(2, 1)
 #define INTR_TYPE_APIC			0x1
 
 /* Scalable MCA: */
@@ -580,30 +580,30 @@ static int setup_APIC_mce_threshold(int reserved, int new)
 	return reserved;
 }
 
-static void enable_deferred_error_interrupt(void)
+static void enable_deferred_error_interrupt(u64 mca_intr_cfg)
 {
-	u32 low = 0, high = 0, def_new;
+	u8 dfr_offset;
 
-	if (!mce_flags.succor)
-		return;
-
-	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
+	if (!mca_intr_cfg)
 		return;
 
 	/*
 	 * Trust the value from hardware.
 	 * If there's a conflict, then setup_APIC_eilvt() will throw an error.
 	 */
-	def_new = (low & MASK_DEF_LVTOFF) >> 4;
-	if (setup_APIC_eilvt(def_new, DEFERRED_ERROR_VECTOR, APIC_EILVT_MSG_FIX, 0))
+	dfr_offset = FIELD_GET(INTR_CFG_DFR_LVT_OFFSET, mca_intr_cfg);
+	if (setup_APIC_eilvt(dfr_offset, DEFERRED_ERROR_VECTOR, APIC_EILVT_MSG_FIX, 0))
 		return;
 
 	deferred_error_int_vector = amd_deferred_error_interrupt;
 
-	if (!mce_flags.smca)
-		low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
+	if (mce_flags.smca)
+		return;
+
+	mca_intr_cfg &= ~INTR_CFG_LEGACY_DFR_INTR_TYPE;
+	mca_intr_cfg |= FIELD_PREP(INTR_CFG_LEGACY_DFR_INTR_TYPE, INTR_TYPE_APIC);
 
-	wrmsr(MSR_CU_DEF_ERR, low, high);
+	wrmsrl(MSR_MCA_INTR_CFG, mca_intr_cfg);
 }
 
 static u32 smca_get_block_address(unsigned int bank, unsigned int block,
@@ -757,14 +757,28 @@ static void disable_err_thresholding(struct cpuinfo_x86 *c, unsigned int bank)
 		wrmsrl(MSR_K7_HWCR, hwcr);
 }
 
+static u64 get_mca_intr_cfg(void)
+{
+	u64 mca_intr_cfg;
+
+	if (!mce_flags.succor && !mce_flags.smca)
+		return 0;
+
+	if (rdmsrl_safe(MSR_MCA_INTR_CFG, &mca_intr_cfg))
+		return 0;
+
+	return mca_intr_cfg;
+}
+
 /* cpu init entry point, called from mce.c with preempt off */
 void mce_amd_feature_init(struct cpuinfo_x86 *c)
 {
 	unsigned int bank, block, cpu = smp_processor_id();
+	u64 mca_intr_cfg = get_mca_intr_cfg();
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
-	enable_deferred_error_interrupt();
+	enable_deferred_error_interrupt(mca_intr_cfg);
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
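One detail worth noting in the conversion above: the old code OR'd the
raw value 0x2 into bits [2:1], while the new code writes the field value
0x1 (APIC) with FIELD_PREP() into the same GENMASK(2, 1) field — both
produce identical register contents. A sketch demonstrating the
equivalence (bit positions assumed from the patch's defines):

```c
#include <assert.h>
#include <stdint.h>

#define LEGACY_DFR_INTR_TYPE_MASK	0x6ULL	/* GENMASK(2, 1) */
#define LEGACY_DFR_INTR_TYPE_SHIFT	1
#define INTR_TYPE_APIC			0x1ULL	/* field value */
#define DEF_INT_TYPE_APIC		0x2ULL	/* old raw encoding */

/* Old style: mask the field, OR in the pre-shifted raw value. */
static uint64_t set_apic_type_old(uint64_t cfg)
{
	return (cfg & ~LEGACY_DFR_INTR_TYPE_MASK) | DEF_INT_TYPE_APIC;
}

/* New style: mask the field, place the field value with a shift,
 * as FIELD_PREP() would. */
static uint64_t set_apic_type_new(uint64_t cfg)
{
	cfg &= ~LEGACY_DFR_INTR_TYPE_MASK;
	cfg |= INTR_TYPE_APIC << LEGACY_DFR_INTR_TYPE_SHIFT;
	return cfg;
}
```

The bitops version makes the field boundary explicit, so the encoding no
longer has to be pre-shifted by hand in the #define.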

* [PATCH 13/20] x86/mce: Unify AMD THR handler with MCA Polling
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (11 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 12/20] x86/mce/amd: Clean up enable_deferred_error_interrupt() Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 14/20] x86/mce/amd: Unify AMD DFR " Yazen Ghannam
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

AMD systems optionally support an MCA Thresholding interrupt. The
interrupt should be used as another signal to trigger MCA polling. This
is similar to how the Intel Corrected Machine Check interrupt (CMCI) is
handled.

AMD MCA Thresholding is managed using the MCA_MISC registers within an
MCA bank. The OS will need to modify the hardware error count field in
order to reset the threshold limit and rearm the interrupt. Management
of the MCA_MISC register should be done as a follow up to the basic MCA
polling flow. It should not be the main focus of the interrupt handler.

Furthermore, future systems will have the ability to send an MCA
Thresholding interrupt to the OS even when the OS does not manage the
feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.

Call the common MCA polling function when handling the MCA Thresholding
interrupt. This will allow the OS to find any valid errors whether or
not the MCA Thresholding feature is OS-managed. Also, this allows the
common MCA polling options and kernel parameters to apply to AMD
systems.

Add a callback to the MCA polling function to handle vendor-specific
operations. Start by handling the AMD MCA Thresholding "block reset"
flow.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c      | 57 ++++++++++++++----------------
 arch/x86/kernel/cpu/mce/core.c     |  8 +++++
 arch/x86/kernel/cpu/mce/internal.h |  2 ++
 3 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 83fdbf42a472..8735a8b9b7cc 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -981,12 +981,7 @@ static void amd_deferred_error_interrupt(void)
 		log_error_deferred(bank);
 }
 
-static void log_error_thresholding(unsigned int bank, u64 misc)
-{
-	_log_error_deferred(bank, misc);
-}
-
-static void log_and_reset_block(struct threshold_block *block)
+static void reset_block(struct threshold_block *block)
 {
 	struct thresh_restart tr;
 	u32 low = 0, high = 0;
@@ -1000,49 +995,51 @@ static void log_and_reset_block(struct threshold_block *block)
 	if (!(high & MASK_OVERFLOW_HI))
 		return;
 
-	/* Log the MCE which caused the threshold event. */
-	log_error_thresholding(block->bank, ((u64)high << 32) | low);
-
 	/* Reset threshold block after logging error. */
 	memset(&tr, 0, sizeof(tr));
 	tr.b = block;
 	threshold_restart_bank(&tr);
 }
 
-/*
- * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
- * goes off when error_count reaches threshold_limit.
- */
-static void amd_threshold_interrupt(void)
+static void reset_thr_blocks(unsigned int bank)
 {
 	struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
-	unsigned int bank, cpu = smp_processor_id();
 
 	/*
 	 * Validate that the threshold bank has been initialized already. The
 	 * handler is installed at boot time, but on a hotplug event the
 	 * interrupt might fire before the data has been initialized.
 	 */
-	if (!bp)
+	if (!bp || !bp[bank])
 		return;
 
-	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
-		if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
-			continue;
+	first_block = bp[bank]->blocks;
+	if (!first_block)
+		return;
 
-		first_block = bp[bank]->blocks;
-		if (!first_block)
-			continue;
+	/*
+	 * The first block is also the head of the list. Check it first
+	 * before iterating over the rest.
+	 */
+	reset_block(first_block);
+	list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
+		reset_block(block);
+}
 
-		/*
-		 * The first block is also the head of the list. Check it first
-		 * before iterating over the rest.
-		 */
-		log_and_reset_block(first_block);
-		list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
-			log_and_reset_block(block);
-	}
+/*
+ * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
+ * goes off when error_count reaches threshold_limit.
+ */
+static void amd_threshold_interrupt(void)
+{
+	/* Check all banks for now. This could be optimized in the future. */
+	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
+}
+
+void amd_handle_error(struct mce *m)
+{
+	reset_thr_blocks(m->bank);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7e86086aa19c..040dc226c6a5 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -655,6 +655,12 @@ static noinstr void mce_read_aux(struct mce *m, int i)
 	}
 }
 
+static void vendor_handle_error(struct mce *m)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return amd_handle_error(m);
+}
+
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
 /*
@@ -760,6 +766,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 			mce_log(&m);
 
 clear_it:
+		vendor_handle_error(&m);
+
 		/*
 		 * Clear state for this bank.
 		 */
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 424c7461dcf9..8ed1035f013b 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -215,6 +215,7 @@ void mce_setup_per_cpu(struct mce *m);
 #ifdef CONFIG_X86_MCE_AMD
 extern bool amd_filter_mce(struct mce *m);
 bool amd_mce_usable_address(struct mce *m);
+void amd_handle_error(struct mce *m);
 
 /*
  * If MCA_CONFIG[McaLsbInStatusSupported] is set, extract ErrAddr in bits
@@ -243,6 +244,7 @@ static __always_inline void smca_extract_err_addr(struct mce *m)
 #else
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 static inline bool amd_mce_usable_address(struct mce *m) { return false; }
+static inline void amd_handle_error(struct mce *m) { }
 static inline void smca_extract_err_addr(struct mce *m) { }
 #endif
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
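The resulting control flow is: the polling loop logs every valid bank,
then invokes a vendor hook, which on AMD resets the thresholding blocks
of the bank that just reported — regardless of whether the trigger was
the THR interrupt or the timer. A much-simplified skeleton (all names
and the bank-validity bitmask model are illustrative stand-ins, not the
kernel's data structures):

```c
#include <assert.h>

static int reset_calls;

/* Stand-in for amd_handle_error() -> reset_thr_blocks(bank). */
static void vendor_hook(int bank)
{
	(void)bank;
	reset_calls++;
}

/* Skeleton of machine_check_poll(): every valid bank is logged and
 * then handed to the vendor hook. Returns the number of banks logged. */
static int poll_banks(int nbanks, unsigned int valid_mask)
{
	int logged = 0;

	for (int bank = 0; bank < nbanks; bank++) {
		if (!(valid_mask & (1u << bank)))
			continue;
		/* ...read MCA_STATUS, mce_log()... */
		logged++;
		vendor_hook(bank);	/* vendor_handle_error() */
	}

	return logged;
}
```

Hanging the threshold-block reset off the common poll keeps the
interrupt handler itself down to a single machine_check_poll() call.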

* [PATCH 14/20] x86/mce/amd: Unify AMD DFR handler with MCA Polling
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (12 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 13/20] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 15/20] x86/mce: Skip AMD threshold init if no threshold banks found Yazen Ghannam
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

AMD systems optionally support a Deferred error interrupt. The interrupt
should be used as another signal to trigger MCA polling. This is similar
to how other MCA interrupts are handled.

Deferred errors do not require any special handling related to the
interrupt, e.g. resetting or rearming the interrupt, etc.

However, Scalable MCA systems include a pair of registers, MCA_DESTAT
and MCA_DEADDR, that should be checked for valid errors. This check
should be done whenever MCA registers are polled. Currently, the
Deferred error interrupt does this check, but the MCA polling function
does not.

Call the MCA polling function when handling the Deferred error
interrupt. This keeps all "polling" cases in a common function.

Add a "SMCA DFR handler" for Deferred errors to the AMD vendor-specific
error handler callback. This will do the same status check, register
clearing, and logging that the interrupt handler has done. And it
extends the common polling flow to find AMD Deferred errors.

Give the AMD MCA interrupt handler a common name now that both
interrupt sources are handled in a unified function.

Remove old code whose functionality is already covered in the common MCA
code.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c  | 122 +++++++++------------------------
 arch/x86/kernel/cpu/mce/core.c |  16 ++++-
 2 files changed, 48 insertions(+), 90 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 8735a8b9b7cc..b45ee297cde2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -325,8 +325,7 @@ static DEFINE_PER_CPU(u64, bank_map);
 /* Map of banks that have more than MCA_MISC0 available. */
 static DEFINE_PER_CPU(u64, smca_misc_banks_map);
 
-static void amd_threshold_interrupt(void);
-static void amd_deferred_error_interrupt(void);
+static void amd_mca_interrupt(void);
 
 static void default_deferred_error_interrupt(void)
 {
@@ -595,7 +594,7 @@ static void enable_deferred_error_interrupt(u64 mca_intr_cfg)
 	if (setup_APIC_eilvt(dfr_offset, DEFERRED_ERROR_VECTOR, APIC_EILVT_MSG_FIX, 0))
 		return;
 
-	deferred_error_int_vector = amd_deferred_error_interrupt;
+	deferred_error_int_vector = amd_mca_interrupt;
 
 	if (mce_flags.smca)
 		return;
@@ -874,33 +873,6 @@ bool amd_mce_usable_address(struct mce *m)
 	return false;
 }
 
-static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
-{
-	struct mce m;
-
-	mce_setup(&m);
-
-	m.status = status;
-	m.misc   = misc;
-	m.bank   = bank;
-	m.tsc	 = rdtsc();
-
-	if (m.status & MCI_STATUS_ADDRV) {
-		m.addr = addr;
-
-		smca_extract_err_addr(&m);
-	}
-
-	if (mce_flags.smca) {
-		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
-
-		if (m.status & MCI_STATUS_SYNDV)
-			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
-	}
-
-	mce_log(&m);
-}
-
 DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
 {
 	trace_deferred_error_apic_entry(DEFERRED_ERROR_VECTOR);
@@ -910,75 +882,46 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
 	apic_eoi();
 }
 
-/*
- * Returns true if the logged error is deferred. False, otherwise.
- */
-static inline bool
-_log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
-{
-	u64 status, addr = 0;
-
-	rdmsrl(msr_stat, status);
-	if (!(status & MCI_STATUS_VAL))
-		return false;
-
-	if (status & MCI_STATUS_ADDRV)
-		rdmsrl(msr_addr, addr);
-
-	__log_error(bank, status, addr, misc);
-
-	wrmsrl(msr_stat, 0);
-
-	return status & MCI_STATUS_DEFERRED;
-}
-
-static bool _log_error_deferred(unsigned int bank, u32 misc)
-{
-	if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
-			     mca_msr_reg(bank, MCA_ADDR), misc))
-		return false;
-
-	/*
-	 * Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers.
-	 * Return true here to avoid accessing these registers.
-	 */
-	if (!mce_flags.smca)
-		return true;
-
-	/* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
-	wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
-	return true;
-}
-
 /*
  * We have three scenarios for checking for Deferred errors:
  *
  * 1) Non-SMCA systems check MCA_STATUS and log error if found.
+ *    This is already handled in machine_check_poll().
  * 2) SMCA systems check MCA_STATUS. If error is found then log it and also
  *    clear MCA_DESTAT.
  * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS, and
  *    log it.
  */
-static void log_error_deferred(unsigned int bank)
+static void handle_smca_dfr_error(struct mce *m)
 {
-	if (_log_error_deferred(bank, 0))
+	struct mce m_dfr;
+	u64 mca_destat;
+
+	/* Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers. */
+	if (!mce_flags.smca)
 		return;
 
-	/*
-	 * Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
-	 * for a valid error.
-	 */
-	_log_error_bank(bank, MSR_AMD64_SMCA_MCx_DESTAT(bank),
-			      MSR_AMD64_SMCA_MCx_DEADDR(bank), 0);
-}
+	/* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
+	if (m->status & MCI_STATUS_DEFERRED)
+		goto out;
 
-/* APIC interrupt handler for deferred errors */
-static void amd_deferred_error_interrupt(void)
-{
-	unsigned int bank;
+	/* MCA_STATUS didn't have a deferred error, so check MCA_DESTAT for one. */
+	mca_destat = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(m->bank));
 
-	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank)
-		log_error_deferred(bank);
+	if (!(mca_destat & MCI_STATUS_VAL))
+		return;
+
+	/* Reuse the same data collected from machine_check_poll(). */
+	memcpy(&m_dfr, m, sizeof(m_dfr));
+
+	/* Save the MCA_DE{STAT,ADDR} values. */
+	m_dfr.status = mca_destat;
+	m_dfr.addr = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR(m_dfr.bank));
+
+	mce_log(&m_dfr);
+
+out:
+	wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
 }
 
 static void reset_block(struct threshold_block *block)
@@ -1028,10 +971,10 @@ static void reset_thr_blocks(unsigned int bank)
 }
 
 /*
- * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
- * goes off when error_count reaches threshold_limit.
+ * The same procedure should be used when checking MCA banks in non-urgent
+ * situations, e.g. polling and interrupts.
  */
-static void amd_threshold_interrupt(void)
+static void amd_mca_interrupt(void)
 {
 	/* Check all banks for now. This could be optimized in the future. */
 	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
@@ -1040,6 +983,7 @@ static void amd_threshold_interrupt(void)
 void amd_handle_error(struct mce *m)
 {
 	reset_thr_blocks(m->bank);
+	handle_smca_dfr_error(m);
 }
 
 /*
@@ -1514,6 +1458,6 @@ int mce_threshold_create_device(unsigned int cpu)
 	this_cpu_write(threshold_banks, bp);
 
 	if (thresholding_irq_en)
-		mce_threshold_vector = amd_threshold_interrupt;
+		mce_threshold_vector = amd_mca_interrupt;
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 040dc226c6a5..a81c0df217e2 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -663,6 +663,14 @@ static void vendor_handle_error(struct mce *m)
 
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
+static bool smca_destat_is_valid(unsigned int bank)
+{
+	if (!mce_flags.smca)
+		return false;
+
+	return mce_rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank)) & MCI_STATUS_VAL;
+}
+
 /*
  * Poll for corrected events or events that happened before reset.
  * Those are just logged through /dev/mcelog.
@@ -704,8 +712,14 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
 
 		/* If this entry is not valid, ignore it */
-		if (!(m.status & MCI_STATUS_VAL))
+		if (!(m.status & MCI_STATUS_VAL)) {
+			if (smca_destat_is_valid(i)) {
+				mce_read_aux(&m, i);
+				goto clear_it;
+			}
+
 			continue;
+		}
 
 		/*
 		 * If we are logging everything (at CPU online) or this
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 15/20] x86/mce: Skip AMD threshold init if no threshold banks found
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (13 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 14/20] x86/mce/amd: Unify AMD DFR " Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 16/20] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

AMD systems optionally support MCA Thresholding. This feature is
discovered by checking capability bits in the MCA_MISC* registers.

Currently, MCA Thresholding is set up in two passes. The first is during
CPU init where available banks are detected, and the "bank_map" variable
is updated. The second is during sysfs/device init when the thresholding
data structures are allocated and hardware is fully configured.

During device init, the "threshold_banks" array is allocated even if no
available banks were discovered. Furthermore, the thresholding reset
flow checks if the top-level "threshold_banks" array is non-NULL, but it
doesn't check if individual "threshold_bank" structures are non-NULL.
This has been harmless so far because the hardware interrupt is not
enabled in this case. However, the issue surfaces if the interrupt is
enabled while the thresholding data structures are uninitialized.

Check "bank_map" to determine if the thresholding structures should be
allocated and initialized. Also, remove "mce_flags.amd_threshold" which
is redundant when checking "bank_map".

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c      | 2 +-
 arch/x86/kernel/cpu/mce/core.c     | 1 -
 arch/x86/kernel/cpu/mce/internal.h | 3 ---
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index b45ee297cde2..462ba9ff997b 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1434,7 +1434,7 @@ int mce_threshold_create_device(unsigned int cpu)
 	struct threshold_bank **bp;
 	int err;
 
-	if (!mce_flags.amd_threshold)
+	if (!this_cpu_read(bank_map))
 		return 0;
 
 	bp = this_cpu_read(threshold_banks);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index a81c0df217e2..bdbc32f10a9a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2004,7 +2004,6 @@ static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
 		mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
 		mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
 		mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
-		mce_flags.amd_threshold	 = 1;
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 8ed1035f013b..fca7499e1bf4 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -162,9 +162,6 @@ struct mce_vendor_flags {
 	/* Zen IFU quirk */
 	zen_ifu_quirk		: 1,
 
-	/* AMD-style error thresholding banks present. */
-	amd_threshold		: 1,
-
 	/* Pentium, family 5-style MCA */
 	p5			: 1,
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 16/20] x86/mce/amd: Support SMCA Corrected Error Interrupt
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (14 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 15/20] x86/mce: Skip AMD threshold init if no threshold banks found Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 17/20] x86/mce: Add wrapper for struct mce to export vendor specific info Yazen Ghannam
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

AMD systems optionally support MCA Thresholding which provides the
ability for hardware to send an interrupt when a set error threshold is
reached. This feature counts errors of all severities, but it is
commonly used to report correctable errors with an interrupt rather than
polling.

Scalable MCA systems allow the Platform to take control of this feature.
In this case, the OS will not see the feature configuration and control
bits in the MCA_MISC* registers. The OS will not receive the MCA
Thresholding interrupt, and it will need to poll for correctable errors.

A "corrected error interrupt" will be available on Scalable MCA systems.
This will be used in the same configuration where the Platform controls
MCA Thresholding. However, the Platform will now be able to send the
MCA Thresholding interrupt to the OS.

Check for the feature bit in the MCA_CONFIG register and attempt to set
up the MCA Thresholding interrupt handler. If successful, set the feature
enable bit in the MCA_CONFIG register to indicate to the Platform that
the OS is ready for the interrupt.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 462ba9ff997b..9292096787ad 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -47,6 +47,7 @@
 /* MCA Interrupt Configuration register, one per CPU */
 #define MSR_CU_DEF_ERR		0xC0000410
 #define MSR_MCA_INTR_CFG		0xC0000410
+#define INTR_CFG_THR_LVT_OFFSET		GENMASK_ULL(15, 12)
 #define INTR_CFG_DFR_LVT_OFFSET		GENMASK_ULL(7, 4)
 #define INTR_CFG_LEGACY_DFR_INTR_TYPE	GENMASK_ULL(2, 1)
 #define INTR_TYPE_APIC			0x1
@@ -54,8 +55,10 @@
 /* Scalable MCA: */
 
 /* MCA_CONFIG register, one per MCA bank */
+#define CFG_CE_INT_EN			BIT_ULL(40)
 #define CFG_DFR_INT_TYPE		GENMASK_ULL(38, 37)
 #define CFG_MCAX_EN			BIT_ULL(32)
+#define CFG_CE_INT_PRESENT		BIT_ULL(10)
 #define CFG_LSB_IN_STATUS		BIT_ULL(8)
 #define CFG_DFR_INT_SUPP		BIT_ULL(5)
 
@@ -355,8 +358,19 @@ static void smca_set_misc_banks_map(unsigned int bank, unsigned int cpu)
 
 }
 
+static bool smca_thr_handler_enabled(u64 mca_intr_cfg)
+{
+	u8 offset = FIELD_GET(INTR_CFG_THR_LVT_OFFSET, mca_intr_cfg);
+
+	if (setup_APIC_eilvt(offset, THRESHOLD_APIC_VECTOR, APIC_EILVT_MSG_FIX, 0))
+		return false;
+
+	mce_threshold_vector = amd_mca_interrupt;
+	return true;
+}
+
 /* Set appropriate bits in MCA_CONFIG. */
-static void configure_smca(unsigned int bank)
+static void configure_smca(unsigned int bank, u64 mca_intr_cfg)
 {
 	u64 mca_config;
 
@@ -391,6 +405,9 @@ static void configure_smca(unsigned int bank)
 	if (FIELD_GET(CFG_LSB_IN_STATUS, mca_config))
 		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = true;
 
+	if (FIELD_GET(CFG_CE_INT_PRESENT, mca_config) && smca_thr_handler_enabled(mca_intr_cfg))
+		mca_config |= FIELD_PREP(CFG_CE_INT_EN, 0x1);
+
 	wrmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), mca_config);
 }
 
@@ -783,7 +800,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 		if (mce_flags.smca)
 			smca_configure_old(bank, cpu);
 
-		configure_smca(bank);
+		configure_smca(bank, mca_intr_cfg);
 		disable_err_thresholding(c, bank);
 
 		for (block = 0; block < NR_BLOCKS; ++block) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 17/20] x86/mce: Add wrapper for struct mce to export vendor specific info
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (15 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 16/20] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 18/20] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Yazen Ghannam
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

From: Avadhut Naik <Avadhut.Naik@amd.com>

Currently, exporting additional machine check error information involves
appending new fields to the end of struct mce. This additional
information can then be consumed through mcelog or the mce_record
tracepoint.

However, as CPU vendors add new MSRs carrying additional machine check
error information on their newer CPUs (and will continue to do so), the
size of struct mce would balloon unnecessarily on some CPUs, since those
fields are vendor-specific. Moreover, different CPU vendors may export
the additional information in varying sizes.

The problem is compounded because struct mce is exposed to userspace as
part of the UAPI. Its bloating through vendor-specific data should be
avoided to limit the information sent out to userspace.

Add a new structure, mce_hw_err, to wrap the existing struct mce. This
prevents struct mce from ballooning since vendor-specific data, if any,
can now be exported through a union within the wrapper structure and
through __dynamic_array in the mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.

Note: Some Checkpatch checks have been ignored to maintain coding style.

[Yazen: Add last commit message paragraph. Rebase on other MCA updates.]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h         |   6 +-
 arch/x86/kernel/cpu/mce/amd.c      |  24 ++--
 arch/x86/kernel/cpu/mce/apei.c     |  48 ++++----
 arch/x86/kernel/cpu/mce/core.c     | 174 ++++++++++++++++-------------
 arch/x86/kernel/cpu/mce/genpool.c  |  20 ++--
 arch/x86/kernel/cpu/mce/inject.c   |   4 +-
 arch/x86/kernel/cpu/mce/internal.h |   8 +-
 include/trace/events/mce.h         |  38 +++----
 8 files changed, 178 insertions(+), 144 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 9441b89afee3..99eb72dd7d05 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,6 +187,10 @@ enum mce_notifier_prios {
 	MCE_PRIO_HIGHEST = MCE_PRIO_CEC
 };
 
+struct mce_hw_err {
+	struct mce m;
+};
+
 struct notifier_block;
 extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
@@ -222,7 +226,7 @@ static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 #endif
 
 void mce_setup(struct mce *m);
-void mce_log(struct mce *m);
+void mce_log(struct mce_hw_err *err);
 DECLARE_PER_CPU(struct device *, mce_device);
 
 /* Maximum number of MCA banks per CPU. */
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9292096787ad..cd86da38463b 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -909,9 +909,9 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
  * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS, and
  *    log it.
  */
-static void handle_smca_dfr_error(struct mce *m)
+static void handle_smca_dfr_error(struct mce_hw_err *err)
 {
-	struct mce m_dfr;
+	struct mce_hw_err err_dfr;
 	u64 mca_destat;
 
 	/* Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers. */
@@ -919,26 +919,26 @@ static void handle_smca_dfr_error(struct mce *m)
 		return;
 
 	/* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
-	if (m->status & MCI_STATUS_DEFERRED)
+	if (err->m.status & MCI_STATUS_DEFERRED)
 		goto out;
 
 	/* MCA_STATUS didn't have a deferred error, so check MCA_DESTAT for one. */
-	mca_destat = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(m->bank));
+	mca_destat = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(err->m.bank));
 
 	if (!(mca_destat & MCI_STATUS_VAL))
 		return;
 
 	/* Reuse the same data collected from machine_check_poll(). */
-	memcpy(&m_dfr, m, sizeof(m_dfr));
+	memcpy(&err_dfr, err, sizeof(err_dfr));
 
 	/* Save the MCA_DE{STAT,ADDR} values. */
-	m_dfr.status = mca_destat;
-	m_dfr.addr = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR(m_dfr.bank));
+	err_dfr.m.status = mca_destat;
+	err_dfr.m.addr = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR(err_dfr.m.bank));
 
-	mce_log(&m_dfr);
+	mce_log(&err_dfr);
 
 out:
-	wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
+	wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(err->m.bank), 0);
 }
 
 static void reset_block(struct threshold_block *block)
@@ -997,10 +997,10 @@ static void amd_mca_interrupt(void)
 	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
 }
 
-void amd_handle_error(struct mce *m)
+void amd_handle_error(struct mce_hw_err *err)
 {
-	reset_thr_blocks(m->bank);
-	handle_smca_dfr_error(m);
+	reset_thr_blocks(err->m.bank);
+	handle_smca_dfr_error(err);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 33cefe6157eb..4820f8677460 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -28,9 +28,12 @@
 
 void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 {
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 	int lsb;
 
+	memset(&err, 0, sizeof(struct mce_hw_err));
+
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
 
@@ -44,30 +47,33 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 	else
 		lsb = PAGE_SHIFT;
 
-	mce_setup(&m);
-	m.bank = -1;
+	mce_setup(m);
+	m->bank = -1;
 	/* Fake a memory read error with unknown channel */
-	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
-	m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
+	m->status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
+	m->misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
 
 	if (severity >= GHES_SEV_RECOVERABLE)
-		m.status |= MCI_STATUS_UC;
+		m->status |= MCI_STATUS_UC;
 
 	if (severity >= GHES_SEV_PANIC) {
-		m.status |= MCI_STATUS_PCC;
-		m.tsc = rdtsc();
+		m->status |= MCI_STATUS_PCC;
+		m->tsc = rdtsc();
 	}
 
-	m.addr = mem_err->physical_addr;
-	mce_log(&m);
+	m->addr = mem_err->physical_addr;
+	mce_log(&err);
 }
 EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 	unsigned int cpu;
-	struct mce m;
+
+	memset(&err, 0, sizeof(struct mce_hw_err));
 
 	if (!boot_cpu_has(X86_FEATURE_SMCA))
 		return -EINVAL;
@@ -105,19 +111,19 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	if (!cpu_possible(cpu))
 		return -EINVAL;
 
-	mce_setup_global(&m);
-	m.cpu = m.extcpu = cpu;
-	mce_setup_per_cpu(&m);
+	mce_setup_global(m);
+	m->cpu = m->extcpu = cpu;
+	mce_setup_per_cpu(m);
 
-	m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m.status = *i_mce;
-	m.addr = *(i_mce + 1);
-	m.misc = *(i_mce + 2);
+	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
+	m->status = *i_mce;
+	m->addr = *(i_mce + 1);
+	m->misc = *(i_mce + 2);
 	/* Skipping MCA_CONFIG */
-	m.ipid = *(i_mce + 4);
-	m.synd = *(i_mce + 5);
+	m->ipid = *(i_mce + 4);
+	m->synd = *(i_mce + 5);
 
-	mce_log(&m);
+	mce_log(&err);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bdbc32f10a9a..8db8ed34b200 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -86,7 +86,7 @@ struct mca_config mca_cfg __read_mostly = {
 	.monarch_timeout = -1
 };
 
-static DEFINE_PER_CPU(struct mce, mces_seen);
+static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
 static unsigned long mce_need_notify;
 
 /*
@@ -145,9 +145,9 @@ void mce_setup(struct mce *m)
 DEFINE_PER_CPU(struct mce, injectm);
 EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
-void mce_log(struct mce *m)
+void mce_log(struct mce_hw_err *err)
 {
-	if (!mce_gen_pool_add(m))
+	if (!mce_gen_pool_add(err))
 		irq_work_queue(&mce_irq_work);
 }
 EXPORT_SYMBOL_GPL(mce_log);
@@ -168,8 +168,10 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
 
-static void __print_mce(struct mce *m)
+static void __print_mce(struct mce_hw_err *err)
 {
+	struct mce *m = &err->m;
+
 	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
 		 m->extcpu,
 		 (m->mcgstatus & MCG_STATUS_MCIP ? " Exception" : ""),
@@ -211,9 +213,11 @@ static void __print_mce(struct mce *m)
 		m->microcode);
 }
 
-static void print_mce(struct mce *m)
+static void print_mce(struct mce_hw_err *err)
 {
-	__print_mce(m);
+	struct mce *m = &err->m;
+
+	__print_mce(err);
 
 	if (m->cpuvendor != X86_VENDOR_AMD && m->cpuvendor != X86_VENDOR_HYGON)
 		pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
@@ -240,7 +244,7 @@ static void wait_for_panic(void)
 	panic("Panicing machine check CPU died");
 }
 
-static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
+static noinstr void mce_panic(const char *msg, struct mce_hw_err *final, char *exp)
 {
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
@@ -271,20 +275,22 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	pending = mce_gen_pool_prepare_records();
 	/* First print corrected ones that are still unlogged */
 	llist_for_each_entry(l, pending, llnode) {
-		struct mce *m = &l->mce;
+		struct mce_hw_err *err = &l->err;
+		struct mce *m = &err->m;
 		if (!(m->status & MCI_STATUS_UC)) {
-			print_mce(m);
+			print_mce(err);
 			if (!apei_err)
 				apei_err = apei_write_mce(m);
 		}
 	}
 	/* Now print uncorrected but with the final one last */
 	llist_for_each_entry(l, pending, llnode) {
-		struct mce *m = &l->mce;
+		struct mce_hw_err *err = &l->err;
+		struct mce *m = &err->m;
 		if (!(m->status & MCI_STATUS_UC))
 			continue;
-		if (!final || mce_cmp(m, final)) {
-			print_mce(m);
+		if (!final || mce_cmp(m, &final->m)) {
+			print_mce(err);
 			if (!apei_err)
 				apei_err = apei_write_mce(m);
 		}
@@ -292,7 +298,7 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (final) {
 		print_mce(final);
 		if (!apei_err)
-			apei_err = apei_write_mce(final);
+			apei_err = apei_write_mce(&final->m);
 	}
 	if (exp)
 		pr_emerg(HW_ERR "Machine check: %s\n", exp);
@@ -307,8 +313,8 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 		 * panic.
 		 */
 		if (kexec_crash_loaded()) {
-			if (final && (final->status & MCI_STATUS_ADDRV)) {
-				p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+			if (final && (final->m.status & MCI_STATUS_ADDRV)) {
+				p = pfn_to_online_page(final->m.addr >> PAGE_SHIFT);
 				if (p)
 					SetPageHWPoison(p);
 			}
@@ -557,13 +563,13 @@ EXPORT_SYMBOL_GPL(mce_is_correctable);
 static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
 
-	if (!m)
+	if (!err)
 		return NOTIFY_DONE;
 
 	/* Emit the trace record: */
-	trace_mce_record(m);
+	trace_mce_record(err);
 
 	set_bit(0, &mce_need_notify);
 
@@ -607,13 +613,13 @@ static struct notifier_block mce_uc_nb = {
 static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
 
-	if (!m)
+	if (!err)
 		return NOTIFY_DONE;
 
-	if (mca_cfg.print_all || !m->kflags)
-		__print_mce(m);
+	if (mca_cfg.print_all || !(err->m.kflags))
+		__print_mce(err);
 
 	return NOTIFY_DONE;
 }
@@ -655,10 +661,10 @@ static noinstr void mce_read_aux(struct mce *m, int i)
 	}
 }
 
-static void vendor_handle_error(struct mce *m)
+static void vendor_handle_error(struct mce_hw_err *err)
 {
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
-		return amd_handle_error(m);
+		return amd_handle_error(err);
 }
 
 DEFINE_PER_CPU(unsigned, mce_poll_count);
@@ -690,31 +696,34 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
 	bool error_seen = false;
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 	int i;
 
+	memset(&err, 0, sizeof(struct mce_hw_err));
+
 	this_cpu_inc(mce_poll_count);
 
-	mce_gather_info(&m, NULL);
+	mce_gather_info(m, NULL);
 
 	if (flags & MCP_TIMESTAMP)
-		m.tsc = rdtsc();
+		m->tsc = rdtsc();
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		if (!mce_banks[i].ctl || !test_bit(i, *b))
 			continue;
 
-		m.misc = 0;
-		m.addr = 0;
-		m.bank = i;
+		m->misc = 0;
+		m->addr = 0;
+		m->bank = i;
 
 		barrier();
-		m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
+		m->status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
 
 		/* If this entry is not valid, ignore it */
-		if (!(m.status & MCI_STATUS_VAL)) {
+		if (!(m->status & MCI_STATUS_VAL)) {
 			if (smca_destat_is_valid(i)) {
-				mce_read_aux(&m, i);
+				mce_read_aux(m, i);
 				goto clear_it;
 			}
 
@@ -725,7 +734,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 * If we are logging everything (at CPU online) or this
 		 * is a corrected error, then we must log it.
 		 */
-		if ((flags & MCP_UC) || !(m.status & MCI_STATUS_UC))
+		if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
 			goto log_it;
 
 		/*
@@ -735,20 +744,20 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 * everything else.
 		 */
 		if (!mca_cfg.ser) {
-			if (m.status & MCI_STATUS_UC)
+			if (m->status & MCI_STATUS_UC)
 				continue;
 			goto log_it;
 		}
 
 		/* Log "not enabled" (speculative) errors */
-		if (!(m.status & MCI_STATUS_EN))
+		if (!(m->status & MCI_STATUS_EN))
 			goto log_it;
 
 		/*
 		 * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
 		 * UC == 1 && PCC == 0 && S == 0
 		 */
-		if (!(m.status & MCI_STATUS_PCC) && !(m.status & MCI_STATUS_S))
+		if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
 			goto log_it;
 
 		/*
@@ -764,23 +773,24 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (flags & MCP_DONTLOG)
 			goto clear_it;
 
-		mce_read_aux(&m, i);
-		m.severity = mce_severity(&m, NULL, NULL, false);
+		mce_read_aux(m, i);
+		m->severity = mce_severity(m, NULL, NULL, false);
+
 		/*
 		 * Don't get the IP here because it's unlikely to
 		 * have anything to do with the actual error location.
 		 */
 
-		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
+		if (mca_cfg.dont_log_ce && !mce_usable_address(m))
 			goto clear_it;
 
 		if (flags & MCP_QUEUE_LOG)
-			mce_gen_pool_add(&m);
+			mce_gen_pool_add(&err);
 		else
-			mce_log(&m);
+			mce_log(&err);
 
 clear_it:
-		vendor_handle_error(&m);
+		vendor_handle_error(&err);
 
 		/*
 		 * Clear state for this bank.
@@ -1017,6 +1027,7 @@ static noinstr int mce_timed_out(u64 *t, const char *msg)
 static void mce_reign(void)
 {
 	int cpu;
+	struct mce_hw_err *err = NULL;
 	struct mce *m = NULL;
 	int global_worst = 0;
 	char *msg = NULL;
@@ -1027,11 +1038,13 @@ static void mce_reign(void)
 	 * Grade the severity of the errors of all the CPUs.
 	 */
 	for_each_possible_cpu(cpu) {
-		struct mce *mtmp = &per_cpu(mces_seen, cpu);
+		struct mce_hw_err *etmp = &per_cpu(hw_errs_seen, cpu);
+		struct mce *mtmp = &etmp->m;
 
 		if (mtmp->severity > global_worst) {
 			global_worst = mtmp->severity;
-			m = &per_cpu(mces_seen, cpu);
+			err = &per_cpu(hw_errs_seen, cpu);
+			m = &err->m;
 		}
 	}
 
@@ -1043,7 +1056,7 @@ static void mce_reign(void)
 	if (m && global_worst >= MCE_PANIC_SEVERITY) {
 		/* call mce_severity() to get "msg" for panic */
 		mce_severity(m, NULL, &msg, true);
-		mce_panic("Fatal machine check", m, msg);
+		mce_panic("Fatal machine check", err, msg);
 	}
 
 	/*
@@ -1060,11 +1073,11 @@ static void mce_reign(void)
 		mce_panic("Fatal machine check from unknown source", NULL, NULL);
 
 	/*
-	 * Now clear all the mces_seen so that they don't reappear on
+	 * Now clear all the hw_errs_seen so that they don't reappear on
 	 * the next mce.
 	 */
 	for_each_possible_cpu(cpu)
-		memset(&per_cpu(mces_seen, cpu), 0, sizeof(struct mce));
+		memset(&per_cpu(hw_errs_seen, cpu), 0, sizeof(struct mce_hw_err));
 }
 
 static atomic_t global_nwo;
@@ -1268,12 +1281,13 @@ static noinstr bool mce_check_crashing_cpu(void)
 }
 
 static __always_inline int
-__mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
+__mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
 		unsigned long *toclear, unsigned long *valid_banks, int no_way_out,
 		int *worst)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
 	struct mca_config *cfg = &mca_cfg;
+	struct mce *m = &err->m;
 	int severity, i, taint = 0;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -1329,7 +1343,7 @@ __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
 		 * done in #MC context, where instrumentation is disabled.
 		 */
 		instrumentation_begin();
-		mce_log(m);
+		mce_log(err);
 		instrumentation_end();
 
 		if (severity > *worst) {
@@ -1399,8 +1413,9 @@ static void kill_me_never(struct callback_head *cb)
 		set_mce_nospec(pfn);
 }
 
-static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *))
+static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(struct callback_head *))
 {
+	struct mce *m = &err->m;
 	int count = ++current->mce_count;
 
 	/* First call, save all the details */
@@ -1414,11 +1429,12 @@ static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callba
 
 	/* Ten is likely overkill. Don't expect more than two faults before task_work() */
 	if (count > 10)
-		mce_panic("Too many consecutive machine checks while accessing user data", m, msg);
+		mce_panic("Too many consecutive machine checks while accessing user data",
+			  err, msg);
 
 	/* Second or later call, make sure page address matches the one from first call */
 	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
-		mce_panic("Consecutive machine checks to different user pages", m, msg);
+		mce_panic("Consecutive machine checks to different user pages", err, msg);
 
 	/* Do not call task_work_add() more than once */
 	if (count > 1)
@@ -1467,8 +1483,14 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	int worst = 0, order, no_way_out, kill_current_task, lmce, taint = 0;
 	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS) = { 0 };
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS) = { 0 };
-	struct mce m, *final;
+	struct mce_hw_err *final;
+	struct mce_hw_err err;
 	char *msg = NULL;
+	struct mce *m;
+
+	memset(&err, 0, sizeof(struct mce_hw_err));
+
+	m = &err.m;
 
 	if (unlikely(mce_flags.p5))
 		return pentium_machine_check(regs);
@@ -1506,13 +1528,13 @@ noinstr void do_machine_check(struct pt_regs *regs)
 
 	this_cpu_inc(mce_exception_count);
 
-	mce_gather_info(&m, regs);
-	m.tsc = rdtsc();
+	mce_gather_info(m, regs);
+	m->tsc = rdtsc();
 
-	final = this_cpu_ptr(&mces_seen);
-	*final = m;
+	final = this_cpu_ptr(&hw_errs_seen);
+	final->m = *m;
 
-	no_way_out = mce_no_way_out(&m, &msg, valid_banks, regs);
+	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
 
 	barrier();
 
@@ -1521,15 +1543,15 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * Assume the worst for now, but if we find the
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
-	if (!(m.mcgstatus & MCG_STATUS_RIPV))
+	if (!(m->mcgstatus & MCG_STATUS_RIPV))
 		kill_current_task = 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
 	 */
-	if (m.cpuvendor == X86_VENDOR_INTEL ||
-	    m.cpuvendor == X86_VENDOR_ZHAOXIN)
-		lmce = m.mcgstatus & MCG_STATUS_LMCES;
+	if (m->cpuvendor == X86_VENDOR_INTEL ||
+	    m->cpuvendor == X86_VENDOR_ZHAOXIN)
+		lmce = m->mcgstatus & MCG_STATUS_LMCES;
 
 	/*
 	 * Local machine check may already know that we have to panic.
@@ -1540,12 +1562,12 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 */
 	if (lmce) {
 		if (no_way_out)
-			mce_panic("Fatal local machine check", &m, msg);
+			mce_panic("Fatal local machine check", &err, msg);
 	} else {
 		order = mce_start(&no_way_out);
 	}
 
-	taint = __mc_scan_banks(&m, regs, final, toclear, valid_banks, no_way_out, &worst);
+	taint = __mc_scan_banks(&err, regs, &final->m, toclear, valid_banks, no_way_out, &worst);
 
 	if (!no_way_out)
 		mce_clear_state(toclear);
@@ -1560,7 +1582,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 				no_way_out = worst >= MCE_PANIC_SEVERITY;
 
 			if (no_way_out)
-				mce_panic("Fatal machine check on current CPU", &m, msg);
+				mce_panic("Fatal machine check on current CPU", &err, msg);
 		}
 	} else {
 		/*
@@ -1572,8 +1594,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * make sure we have the right "msg".
 		 */
 		if (worst >= MCE_PANIC_SEVERITY) {
-			mce_severity(&m, regs, &msg, true);
-			mce_panic("Local fatal machine check!", &m, msg);
+			mce_severity(m, regs, &msg, true);
+			mce_panic("Local fatal machine check!", &err, msg);
 		}
 	}
 
@@ -1591,14 +1613,14 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		goto out;
 
 	/* Fault was in user mode and we need to take some action */
-	if ((m.cs & 3) == 3) {
+	if ((m->cs & 3) == 3) {
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		if (!mce_usable_address(&m))
-			queue_task_work(&m, msg, kill_me_now);
+		if (!mce_usable_address(m))
+			queue_task_work(&err, msg, kill_me_now);
 		else
-			queue_task_work(&m, msg, kill_me_maybe);
+			queue_task_work(&err, msg, kill_me_maybe);
 
 	} else {
 		/*
@@ -1610,13 +1632,13 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * corresponding exception handler which would do that is the
 		 * proper one.
 		 */
-		if (m.kflags & MCE_IN_KERNEL_RECOV) {
+		if (m->kflags & MCE_IN_KERNEL_RECOV) {
 			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
-				mce_panic("Failed kernel mode recovery", &m, msg);
+				mce_panic("Failed kernel mode recovery", &err, msg);
 		}
 
-		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, msg, kill_me_never);
+		if (m->kflags & MCE_IN_KERNEL_COPYIN)
+			queue_task_work(&err, msg, kill_me_never);
 	}
 
 out:
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index fbe8b61c3413..9a497234ad22 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -31,15 +31,15 @@ static char gen_pool_buf[MCE_POOLSZ];
  */
 static bool is_duplicate_mce_record(struct mce_evt_llist *t, struct mce_evt_llist *l)
 {
+	struct mce_hw_err *err1, *err2;
 	struct mce_evt_llist *node;
-	struct mce *m1, *m2;
 
-	m1 = &t->mce;
+	err1 = &t->err;
 
 	llist_for_each_entry(node, &l->llnode, llnode) {
-		m2 = &node->mce;
+		err2 = &node->err;
 
-		if (!mce_cmp(m1, m2))
+		if (!mce_cmp(&err1->m, &err2->m))
 			return true;
 	}
 	return false;
@@ -73,9 +73,9 @@ struct llist_node *mce_gen_pool_prepare_records(void)
 
 void mce_gen_pool_process(struct work_struct *__unused)
 {
+	struct mce_hw_err *err;
 	struct llist_node *head;
 	struct mce_evt_llist *node, *tmp;
-	struct mce *mce;
 
 	head = llist_del_all(&mce_event_llist);
 	if (!head)
@@ -83,8 +83,8 @@ void mce_gen_pool_process(struct work_struct *__unused)
 
 	head = llist_reverse_order(head);
 	llist_for_each_entry_safe(node, tmp, head, llnode) {
-		mce = &node->mce;
-		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+		err = &node->err;
+		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, err);
 		gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
 	}
 }
@@ -94,11 +94,11 @@ bool mce_gen_pool_empty(void)
 	return llist_empty(&mce_event_llist);
 }
 
-int mce_gen_pool_add(struct mce *mce)
+int mce_gen_pool_add(struct mce_hw_err *err)
 {
 	struct mce_evt_llist *node;
 
-	if (filter_mce(mce))
+	if (filter_mce(&err->m))
 		return -EINVAL;
 
 	if (!mce_evt_pool)
@@ -110,7 +110,7 @@ int mce_gen_pool_add(struct mce *mce)
 		return -ENOMEM;
 	}
 
-	memcpy(&node->mce, mce, sizeof(*mce));
+	memcpy(&node->err, err, sizeof(*err));
 	llist_add(&node->llnode, &mce_event_llist);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 72f0695c3dc1..3b064a2bb247 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -500,6 +500,7 @@ static void prepare_msrs(void *info)
 
 static void do_inject(void)
 {
+	struct mce_hw_err err;
 	u64 mcg_status = 0;
 	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
@@ -515,7 +516,8 @@ static void do_inject(void)
 		i_mce.status |= MCI_STATUS_SYNDV;
 
 	if (inj_type == SW_INJ) {
-		mce_log(&i_mce);
+		err.m = i_mce;
+		mce_log(&err);
 		return;
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index fca7499e1bf4..e74e142d4703 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -26,12 +26,12 @@ extern struct blocking_notifier_head x86_mce_decoder_chain;
 
 struct mce_evt_llist {
 	struct llist_node llnode;
-	struct mce mce;
+	struct mce_hw_err err;
 };
 
 void mce_gen_pool_process(struct work_struct *__unused);
 bool mce_gen_pool_empty(void);
-int mce_gen_pool_add(struct mce *mce);
+int mce_gen_pool_add(struct mce_hw_err *err);
 int mce_gen_pool_init(void);
 struct llist_node *mce_gen_pool_prepare_records(void);
 
@@ -212,7 +212,7 @@ void mce_setup_per_cpu(struct mce *m);
 #ifdef CONFIG_X86_MCE_AMD
 extern bool amd_filter_mce(struct mce *m);
 bool amd_mce_usable_address(struct mce *m);
-void amd_handle_error(struct mce *m);
+void amd_handle_error(struct mce_hw_err *err);
 
 /*
  * If MCA_CONFIG[McaLsbInStatusSupported] is set, extract ErrAddr in bits
@@ -241,7 +241,7 @@ static __always_inline void smca_extract_err_addr(struct mce *m)
 #else
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 static inline bool amd_mce_usable_address(struct mce *m) { return false; }
-static inline void amd_handle_error(struct mce *m) { }
+static inline void amd_handle_error(struct mce_hw_err *err) { }
 static inline void smca_extract_err_addr(struct mce *m) { }
 #endif
 
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 1391ada0da3b..b093cb28f6dd 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -11,9 +11,9 @@
 
 TRACE_EVENT(mce_record,
 
-	TP_PROTO(struct mce *m),
+	TP_PROTO(struct mce_hw_err *err),
 
-	TP_ARGS(m),
+	TP_ARGS(err),
 
 	TP_STRUCT__entry(
 		__field(	u64,		mcgcap		)
@@ -36,23 +36,23 @@ TRACE_EVENT(mce_record,
 	),
 
 	TP_fast_assign(
-		__entry->mcgcap		= m->mcgcap;
-		__entry->mcgstatus	= m->mcgstatus;
-		__entry->status		= m->status;
-		__entry->addr		= m->addr;
-		__entry->misc		= m->misc;
-		__entry->synd		= m->synd;
-		__entry->ipid		= m->ipid;
-		__entry->ip		= m->ip;
-		__entry->tsc		= m->tsc;
-		__entry->walltime	= m->time;
-		__entry->cpu		= m->extcpu;
-		__entry->cpuid		= m->cpuid;
-		__entry->apicid		= m->apicid;
-		__entry->socketid	= m->socketid;
-		__entry->cs		= m->cs;
-		__entry->bank		= m->bank;
-		__entry->cpuvendor	= m->cpuvendor;
+		__entry->mcgcap		= err->m.mcgcap;
+		__entry->mcgstatus	= err->m.mcgstatus;
+		__entry->status		= err->m.status;
+		__entry->addr		= err->m.addr;
+		__entry->misc		= err->m.misc;
+		__entry->synd		= err->m.synd;
+		__entry->ipid		= err->m.ipid;
+		__entry->ip		= err->m.ip;
+		__entry->tsc		= err->m.tsc;
+		__entry->walltime	= err->m.time;
+		__entry->cpu		= err->m.extcpu;
+		__entry->cpuid		= err->m.cpuid;
+		__entry->apicid		= err->m.apicid;
+		__entry->socketid	= err->m.socketid;
+		__entry->cs		= err->m.cs;
+		__entry->bank		= err->m.bank;
+		__entry->cpuvendor	= err->m.cpuvendor;
 	),
 
 	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 18/20] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (16 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 17/20] x86/mce: Add wrapper for struct mce to export vendor specific info Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 19/20] x86/mce/apei: Handle variable register array size Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 20/20] EDAC/mce_amd: Add support for FRU Text in MCA Yazen Ghannam
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

From: Avadhut Naik <Avadhut.Naik@amd.com>

AMD's newer Scalable MCA systems, such as Genoa, will include two new
registers: MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition
to the existing MCA_SYND register. The data within the registers is
considered valid if MCA_STATUS[SyndV] is set.

Add fields for these registers as vendor-specific error information
in struct mce_hw_err. Save and print these registers wherever
MCA_STATUS[SyndV]/MCA_SYND is currently used.

Also, modify the mce_record tracepoint to export these new registers
through __dynamic_array. While the sizeof() operator is currently used
to determine the size of this __dynamic_array, it can be replaced in
the future, if needed, by caching the size of the vendor-specific error
information as part of struct mce_hw_err.

Note: Checkpatch warnings/errors are ignored to maintain coding style.

[Yazen: Dropped Yazen's Co-developed-by tag and moved SoB tag.]
[Yazen: Changed %Lx to %llx in TP_printk().]

Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h     | 12 ++++++++++++
 arch/x86/kernel/cpu/mce/core.c | 26 ++++++++++++++++++--------
 drivers/edac/mce_amd.c         | 10 +++++++---
 include/trace/events/mce.h     |  9 +++++++--
 4 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 99eb72dd7d05..1bd3f1e41dbb 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -122,6 +122,9 @@
 #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
 #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
 #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
+/* Registers MISC2 to MISC4 are at offsets B to D. */
+#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
+#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
 #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
@@ -132,6 +135,8 @@
 #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
+#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
 
 #define XEC(x, mask)			(((x) >> 16) & mask)
 
@@ -189,6 +194,13 @@ enum mce_notifier_prios {
 
 struct mce_hw_err {
 	struct mce m;
+
+	union vendor_info {
+		struct {
+			u64 synd1;
+			u64 synd2;
+		} amd;
+	} vi;
 };
 
 struct notifier_block;
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8db8ed34b200..e153a21bdb1b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -198,6 +198,10 @@ static void __print_mce(struct mce_hw_err *err)
 	if (mce_flags.smca) {
 		if (m->synd)
 			pr_cont("SYND %llx ", m->synd);
+		if (err->vi.amd.synd1)
+			pr_cont("SYND1 %llx ", err->vi.amd.synd1);
+		if (err->vi.amd.synd2)
+			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
 	}
@@ -633,8 +637,10 @@ static struct notifier_block mce_default_nb = {
 /*
  * Read ADDR and MISC registers.
  */
-static noinstr void mce_read_aux(struct mce *m, int i)
+static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 {
+	struct mce *m = &err->m;
+
 	if (m->status & MCI_STATUS_MISCV)
 		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
 
@@ -656,8 +662,11 @@ static noinstr void mce_read_aux(struct mce *m, int i)
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+			err->vi.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
+			err->vi.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
+		}
 	}
 }
 
@@ -723,7 +732,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		/* If this entry is not valid, ignore it */
 		if (!(m->status & MCI_STATUS_VAL)) {
 			if (smca_destat_is_valid(i)) {
-				mce_read_aux(m, i);
+				mce_read_aux(&err, i);
 				goto clear_it;
 			}
 
@@ -773,7 +782,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (flags & MCP_DONTLOG)
 			goto clear_it;
 
-		mce_read_aux(m, i);
+		mce_read_aux(&err, i);
 		m->severity = mce_severity(m, NULL, NULL, false);
 
 		/*
@@ -915,9 +924,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
  */
-static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
+static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
 					  struct pt_regs *regs)
 {
+	struct mce *m = &err->m;
 	char *tmp = *msg;
 	int i;
 
@@ -935,7 +945,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 
 		m->bank = i;
 		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
-			mce_read_aux(m, i);
+			mce_read_aux(err, i);
 			*msg = tmp;
 			return 1;
 		}
@@ -1333,7 +1343,7 @@ __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
 		if (severity == MCE_NO_SEVERITY)
 			continue;
 
-		mce_read_aux(m, i);
+		mce_read_aux(err, i);
 
 		/* assuming valid severity level != 0 */
 		m->severity = severity;
@@ -1534,7 +1544,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	final = this_cpu_ptr(&hw_errs_seen);
 	final->m = *m;
 
-	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
+	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
 
 	barrier();
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 701bc9556414..4d2929770620 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1275,7 +1275,8 @@ static const char *decode_error_status(struct mce *m)
 static int
 amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
+	struct mce *m = &err->m;
 	unsigned int fam = x86_family(m->cpuid);
 	int ecc;
 
@@ -1333,8 +1334,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
 		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
-			pr_cont(", Syndrome: 0x%016llx", m->synd);
+		if (m->status & MCI_STATUS_SYNDV) {
+			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
+			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+				 err->vi.amd.synd1, err->vi.amd.synd2);
+		}
 
 		pr_cont("\n");
 
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index b093cb28f6dd..29d079961aac 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -33,6 +33,8 @@ TRACE_EVENT(mce_record,
 		__field(	u8,		cs		)
 		__field(	u8,		bank		)
 		__field(	u8,		cpuvendor	)
+		__field(	u8,     len	)
+		__dynamic_array(u8, v_data, sizeof(err->vi))
 	),
 
 	TP_fast_assign(
@@ -53,9 +55,11 @@ TRACE_EVENT(mce_record,
 		__entry->cs		= err->m.cs;
 		__entry->bank		= err->m.bank;
 		__entry->cpuvendor	= err->m.cpuvendor;
+		__entry->len	= sizeof(err->vi);
+		memcpy(__get_dynamic_array(v_data), &err->vi, sizeof(err->vi));
 	),
 
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR/MISC/SYND: %016llx/%016llx/%016llx, RIP: %02x:<%016llx>, TSC: %llx, PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x, Vendor Data: %s",
 		__entry->cpu,
 		__entry->mcgcap, __entry->mcgstatus,
 		__entry->bank, __entry->status,
@@ -66,7 +70,8 @@ TRACE_EVENT(mce_record,
 		__entry->cpuvendor, __entry->cpuid,
 		__entry->walltime,
 		__entry->socketid,
-		__entry->apicid)
+		__entry->apicid,
+		__print_array(__get_dynamic_array(v_data), __entry->len / 8, 8))
 );
 
 #endif /* _TRACE_MCE_H */
-- 
2.34.1



* [PATCH 19/20] x86/mce/apei: Handle variable register array size
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (17 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 18/20] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  2023-11-18 19:32 ` [PATCH 20/20] EDAC/mce_amd: Add support for FRU Text in MCA Yazen Ghannam
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

The ACPI Boot Error Record Table (BERT) is used by the kernel to
report errors that occurred in a previous boot. On some modern AMD
systems, the errors within the BERT are reported in the x86 Common
Platform Error Record (CPER) format, which consists of one or more
Processor Context Information Structures. These context structures
provide a starting address and represent an x86 MSR range in which the
data constitutes a contiguous set of MSRs starting from, and including,
the starting address.

On AMD systems that implement this behavior, the MSR range commonly
represents the MCAX register space used for the Scalable MCA feature.
The apei_smca_report_x86_error() function decodes and passes this
information through the MCE notifier chain. However, this function
assumes a fixed register array size based on the original HW/FW
implementation.

This assumption breaks with the addition of two new MCAX registers viz.
MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
MCAX register space, so they won't be included when decoding the CPER
data.

Rework apei_smca_report_x86_error() to support a variable register array
size. This covers any case where the MSR context information starts at
the MCAX address for MCA_STATUS and ends at any other register within
the MCAX register space.

Add code comments indicating the MCAX register at each offset.

[Yazen: Add Avadhut as co-developer for wrapper changes.]

Co-developed-by: Avadhut Naik <Avadhut.Naik@amd.com>
Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/apei.c | 73 +++++++++++++++++++++++++++-------
 1 file changed, 59 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 4820f8677460..d01c9b272e2f 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -69,9 +69,9 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
+	unsigned int cpu, num_registers;
 	struct mce_hw_err err;
 	struct mce *m = &err.m;
-	unsigned int cpu;
 
 	memset(&err, 0, sizeof(struct mce_hw_err));
 
@@ -91,16 +91,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		return -EINVAL;
 
 	/*
-	 * The register array size must be large enough to include all the
-	 * SMCA registers which need to be extracted.
-	 *
 	 * The number of registers in the register array is determined by
 	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
-	 * The register layout is fixed and currently the raw data in the
-	 * register array includes 6 SMCA registers which the kernel can
-	 * extract.
+	 * Ensure that the array size includes at least 1 register.
 	 */
-	if (ctx_info->reg_arr_size < 48)
+	num_registers = ctx_info->reg_arr_size >> 3;
+	if (!num_registers)
 		return -EINVAL;
 
 	for_each_possible_cpu(cpu) {
@@ -116,12 +112,61 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	mce_setup_per_cpu(m);
 
 	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m->status = *i_mce;
-	m->addr = *(i_mce + 1);
-	m->misc = *(i_mce + 2);
-	/* Skipping MCA_CONFIG */
-	m->ipid = *(i_mce + 4);
-	m->synd = *(i_mce + 5);
+
+	/*
+	 * The SMCA register layout is fixed and includes 16 registers.
+	 * The end of the array may be variable, but the beginning is known.
+	 * Switch on the number of registers. Cap the number of registers to
+	 * expected max (15).
+	 */
+	if (num_registers > 15)
+		num_registers = 15;
+
+	switch (num_registers) {
+	/* MCA_SYND2 */
+	case 15:
+		err.vi.amd.synd2 = *(i_mce + 14);
+		fallthrough;
+	/* MCA_SYND1 */
+	case 14:
+		err.vi.amd.synd1 = *(i_mce + 13);
+		fallthrough;
+	/* MCA_MISC4 */
+	case 13:
+	/* MCA_MISC3 */
+	case 12:
+	/* MCA_MISC2 */
+	case 11:
+	/* MCA_MISC1 */
+	case 10:
+	/* MCA_DEADDR */
+	case 9:
+	/* MCA_DESTAT */
+	case 8:
+	/* reserved */
+	case 7:
+	/* MCA_SYND */
+	case 6:
+		m->synd = *(i_mce + 5);
+		fallthrough;
+	/* MCA_IPID */
+	case 5:
+		m->ipid = *(i_mce + 4);
+		fallthrough;
+	/* MCA_CONFIG */
+	case 4:
+	/* MCA_MISC0 */
+	case 3:
+		m->misc = *(i_mce + 2);
+		fallthrough;
+	/* MCA_ADDR */
+	case 2:
+		m->addr = *(i_mce + 1);
+		fallthrough;
+	/* MCA_STATUS */
+	case 1:
+		m->status = *i_mce;
+	}
 
 	mce_log(&err);
 
-- 
2.34.1



* [PATCH 20/20] EDAC/mce_amd: Add support for FRU Text in MCA
  2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
                   ` (18 preceding siblings ...)
  2023-11-18 19:32 ` [PATCH 19/20] x86/mce/apei: Handle variable register array size Yazen Ghannam
@ 2023-11-18 19:32 ` Yazen Ghannam
  19 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-18 19:32 UTC (permalink / raw)
  To: linux-edac
  Cc: linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel,
	Yazen Ghannam

A new "FRU Text in MCA" feature is defined where the Field Replaceable
Unit (FRU) Text for a device is represented by a string in the new
MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).

The FRU Text is populated dynamically for each individual error state
(MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
covers multiple devices, for example, a Unified Memory Controller (UMC)
bank that manages two DIMMs.

Print the FRU Text string, if available, when decoding an MCA error.

Also, add a field for the MCA_CONFIG MSR to struct mce_hw_err as
vendor-specific error information and save the value of the MSR. The
value can then be exported through the tracepoint for userspace tools
like rasdaemon to print the FRU Text, if available.

Note: Checkpatch checks/warnings are ignored to maintain coding style.

[Yazen: Add Avadhut as co-developer for wrapper changes.]

Co-developed-by: Avadhut Naik <Avadhut.Naik@amd.com>
Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h     |  2 ++
 arch/x86/kernel/cpu/mce/apei.c |  2 ++
 arch/x86/kernel/cpu/mce/core.c |  3 +++
 drivers/edac/mce_amd.c         | 21 ++++++++++++++-------
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1bd3f1e41dbb..7e2a3dba0cf3 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -59,6 +59,7 @@
  *  - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX		0x1
+#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
 #define MCI_IPID_MCATYPE_OLD	0xFFFF0000
 #define MCI_IPID_HWID_OLD	0xFFF
 #define MCI_IPID_MCATYPE	GENMASK_ULL(63, 48)
@@ -199,6 +200,7 @@ struct mce_hw_err {
 		struct {
 			u64 synd1;
 			u64 synd2;
+			u64 config;
 		} amd;
 	} vi;
 };
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index d01c9b272e2f..c8312e160117 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -155,6 +155,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		fallthrough;
 	/* MCA_CONFIG */
 	case 4:
+		err.vi.amd.config = *(i_mce + 3);
+		fallthrough;
 	/* MCA_MISC0 */
 	case 3:
 		m->misc = *(i_mce + 2);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e153a21bdb1b..b9da1cd0fb88 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -204,6 +204,8 @@ static void __print_mce(struct mce_hw_err *err)
 			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
+		if (err->vi.amd.config)
+			pr_cont("CONFIG %llx ", err->vi.amd.config);
 	}
 
 	pr_cont("\n");
@@ -661,6 +663,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
+		err->vi.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 4d2929770620..2b738bd7889b 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1278,6 +1278,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce_hw_err *err = (struct mce_hw_err *)data;
 	struct mce *m = &err->m;
 	unsigned int fam = x86_family(m->cpuid);
+	u64 mca_config = err->vi.amd.config;
 	int ecc;
 
 	if (m->kflags & MCE_HANDLED_CEC)
@@ -1297,11 +1298,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		u32 low, high;
-		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
-
-		if (!rdmsr_safe(addr, &low, &high) &&
-		    (low & MCI_CONFIG_MCAX))
+		if (mca_config & MCI_CONFIG_MCAX)
 			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
 
 		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -1336,8 +1333,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
-			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-				 err->vi.amd.synd1, err->vi.amd.synd2);
+			if (mca_config & MCI_CONFIG_FRUTEXT) {
+				char frutext[17];
+
+				memset(frutext, 0, sizeof(frutext));
+				memcpy(&frutext[0], &err->vi.amd.synd1, 8);
+				memcpy(&frutext[8], &err->vi.amd.synd2, 8);
+
+				pr_emerg(HW_ERR "FRU Text: %s", frutext);
+			} else {
+				pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+					 err->vi.amd.synd1, err->vi.amd.synd2);
+			}
 		}
 
 		pr_cont("\n");
-- 
2.34.1



* [tip: ras/core] x86/mce/inject: Clear test status value
  2023-11-18 19:32 ` [PATCH 01/20] x86/mce/inject: Clear test status value Yazen Ghannam
@ 2023-11-22 18:17   ` tip-bot2 for Yazen Ghannam
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot2 for Yazen Ghannam @ 2023-11-22 18:17 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Yazen Ghannam, Borislav Petkov (AMD), x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     6175b407756b22e7fdc771181b7d832ebdedef5c
Gitweb:        https://git.kernel.org/tip/6175b407756b22e7fdc771181b7d832ebdedef5c
Author:        Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate:    Sat, 18 Nov 2023 13:32:29 -06:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Wed, 22 Nov 2023 19:13:38 +01:00

x86/mce/inject: Clear test status value

AMD systems generally allow MCA "simulation" where MCA registers can be
written with valid data and the full MCA handling flow can be tested by
software.

However, on Scalable MCA systems, the platform can prevent software
from writing data to the MCA registers. There is no architectural way to
determine this configuration. Therefore, the MCE injection module will
check for this behavior by writing and reading back a test status value.
This is done during module init, and the check can run on any CPU with
any valid MCA bank.

If MCA_STATUS writes are ignored by the platform, then there are no side
effects on the hardware state.

If the writes are not ignored, then the test status value will remain in
the hardware MCA_STATUS register. It is likely that the value will not
be overwritten by hardware or software, since the tested CPU and bank
are arbitrary. Therefore, the user may see a spurious, synthetic MCA
error reported whenever MCA is polled for this CPU.

Clear the test value immediately after writing it. It is very unlikely
that a valid MCA error is logged by hardware during the test. Errors
that cause an #MC won't be affected.

Fixes: 891e465a1bd8 ("x86/mce: Check whether writes to MCA_STATUS are getting ignored")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231118193248.1296798-2-yazen.ghannam@amd.com
---
 arch/x86/kernel/cpu/mce/inject.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 4d8d4bc..72f0695 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -746,6 +746,7 @@ static void check_hw_inj_possible(void)
 
 		wrmsrl_safe(mca_msr_reg(bank, MCA_STATUS), status);
 		rdmsrl_safe(mca_msr_reg(bank, MCA_STATUS), &status);
+		wrmsrl_safe(mca_msr_reg(bank, MCA_STATUS), 0);
 
 		if (!status) {
 			hw_injection_possible = false;

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields
  2023-11-18 19:32 ` [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields Yazen Ghannam
@ 2023-11-22 18:24   ` Borislav Petkov
  2023-11-27 14:52     ` Yazen Ghannam
  0 siblings, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2023-11-22 18:24 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel

On Sat, Nov 18, 2023 at 01:32:30PM -0600, Yazen Ghannam wrote:
> +void mce_setup_global(struct mce *m)

We usually call those things "common":

mce_setup_common().

> +{
> +	memset(m, 0, sizeof(struct mce));
> +
> +	m->cpuid	= cpuid_eax(1);
> +	m->cpuvendor	= boot_cpu_data.x86_vendor;
> +	m->mcgcap	= __rdmsr(MSR_IA32_MCG_CAP);
> +	/* need the internal __ version to avoid deadlocks */
> +	m->time		= __ktime_get_real_seconds();
> +}
> +
> +void mce_setup_per_cpu(struct mce *m)

And call this

	mce_setup_for_cpu(unsigned int cpu, struct mce *m);

so that it doesn't look like some per_cpu helper.

And yes, you should supply the CPU number as an argument. Because
otherwise, when you look at your next change:


+       mce_setup_global(&m);
+       m.cpu = m.extcpu = cpu;
+       mce_setup_per_cpu(&m);

This contains the "hidden" requirement that m.extcpu happens *always*
*before* the mce_setup_per_cpu() call and that is flaky and error prone.

So make that:

	mce_setup_common(&m);
	mce_setup_for_cpu(m.extcpu, &m);

and do m.cpu = m.extcpu = cpu inside the second function.

And then it JustWorks(tm) and you can't "forget" assigning m.extcpu and
there's no subtlety.

Ok?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()
  2023-11-18 19:32 ` [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error() Yazen Ghannam
@ 2023-11-22 18:28   ` Borislav Petkov
  2023-11-27 14:53     ` Yazen Ghannam
  0 siblings, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2023-11-22 18:28 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel

On Sat, Nov 18, 2023 at 01:32:31PM -0600, Yazen Ghannam wrote:
> Current AMD systems may report MCA errors using the ACPI Boot Error
> Record Table (BERT). The BERT entries for MCA errors will be an x86
> Common Platform Error Record (CPER) with an MSR register context that
> matches the MCAX/SMCA register space.
> 
> However, the BERT will not necessarily be processed on the CPU that
> reported the MCA errors. Therefore, the correct CPU number needs to be
> determined and the information saved in struct mce.
> 
> The CPU number is determined by searching all possible CPUs for a Local
> APIC ID matching the value in the x86 CPER.

Those below are explaining what the patch does. Not needed here.

> Set up the MCA record after searching for a CPU number. If no possible
> CPU was found, then return early.
> 
> Gather the global MCA information first, save the found CPU number, then
> gather the per-CPU information.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* [tip: ras/core] x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
  2023-11-18 19:32 ` [PATCH 04/20] x86/mce/amd, EDAC/mce_amd: Move long names to decoder module Yazen Ghannam
@ 2023-11-27 11:31   ` tip-bot2 for Yazen Ghannam
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot2 for Yazen Ghannam @ 2023-11-27 11:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Yazen Ghannam, Borislav Petkov (AMD), x86, linux-kernel

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     ff03ff328fbd0a2b3a43e8b9bbc2a1d84265e77e
Gitweb:        https://git.kernel.org/tip/ff03ff328fbd0a2b3a43e8b9bbc2a1d84265e77e
Author:        Yazen Ghannam <yazen.ghannam@amd.com>
AuthorDate:    Sat, 18 Nov 2023 13:32:32 -06:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Mon, 27 Nov 2023 12:16:51 +01:00

x86/mce/amd, EDAC/mce_amd: Move long names to decoder module

The long names of the SMCA banks are only used by the MCE decoder
module.

Move them out of the arch code and into the decoder module.

  [ bp: Name the long names array "smca_long_names", drop local ptr in
    decode_smca_error(), constify arrays. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231118193248.1296798-5-yazen.ghannam@amd.com
---
 arch/x86/include/asm/mce.h    |  1 +-
 arch/x86/kernel/cpu/mce/amd.c | 74 +++++++++++++---------------------
 drivers/edac/mce_amd.c        | 46 +++++++++++++++++++--
 3 files changed, 72 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6de6e1d..4ad49af 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -333,7 +333,6 @@ enum smca_bank_types {
 	N_SMCA_BANK_TYPES
 };
 
-extern const char *smca_get_long_name(enum smca_bank_types t);
 extern bool amd_mce_is_memory_error(struct mce *m);
 
 extern int mce_threshold_create_device(unsigned int cpu);
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index f3517b8..f6c6c1e 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -87,42 +87,37 @@ struct smca_bank {
 static DEFINE_PER_CPU_READ_MOSTLY(struct smca_bank[MAX_NR_BANKS], smca_banks);
 static DEFINE_PER_CPU_READ_MOSTLY(u8[N_SMCA_BANK_TYPES], smca_bank_counts);
 
-struct smca_bank_name {
-	const char *name;	/* Short name for sysfs */
-	const char *long_name;	/* Long name for pretty-printing */
-};
-
-static struct smca_bank_name smca_names[] = {
-	[SMCA_LS ... SMCA_LS_V2]	= { "load_store",	"Load Store Unit" },
-	[SMCA_IF]			= { "insn_fetch",	"Instruction Fetch Unit" },
-	[SMCA_L2_CACHE]			= { "l2_cache",		"L2 Cache" },
-	[SMCA_DE]			= { "decode_unit",	"Decode Unit" },
-	[SMCA_RESERVED]			= { "reserved",		"Reserved" },
-	[SMCA_EX]			= { "execution_unit",	"Execution Unit" },
-	[SMCA_FP]			= { "floating_point",	"Floating Point Unit" },
-	[SMCA_L3_CACHE]			= { "l3_cache",		"L3 Cache" },
-	[SMCA_CS ... SMCA_CS_V2]	= { "coherent_slave",	"Coherent Slave" },
-	[SMCA_PIE]			= { "pie",		"Power, Interrupts, etc." },
+static const char * const smca_names[] = {
+	[SMCA_LS ... SMCA_LS_V2]	= "load_store",
+	[SMCA_IF]			= "insn_fetch",
+	[SMCA_L2_CACHE]			= "l2_cache",
+	[SMCA_DE]			= "decode_unit",
+	[SMCA_RESERVED]			= "reserved",
+	[SMCA_EX]			= "execution_unit",
+	[SMCA_FP]			= "floating_point",
+	[SMCA_L3_CACHE]			= "l3_cache",
+	[SMCA_CS ... SMCA_CS_V2]	= "coherent_slave",
+	[SMCA_PIE]			= "pie",
 
 	/* UMC v2 is separate because both of them can exist in a single system. */
-	[SMCA_UMC]			= { "umc",		"Unified Memory Controller" },
-	[SMCA_UMC_V2]			= { "umc_v2",		"Unified Memory Controller v2" },
-	[SMCA_PB]			= { "param_block",	"Parameter Block" },
-	[SMCA_PSP ... SMCA_PSP_V2]	= { "psp",		"Platform Security Processor" },
-	[SMCA_SMU ... SMCA_SMU_V2]	= { "smu",		"System Management Unit" },
-	[SMCA_MP5]			= { "mp5",		"Microprocessor 5 Unit" },
-	[SMCA_MPDMA]			= { "mpdma",		"MPDMA Unit" },
-	[SMCA_NBIO]			= { "nbio",		"Northbridge IO Unit" },
-	[SMCA_PCIE ... SMCA_PCIE_V2]	= { "pcie",		"PCI Express Unit" },
-	[SMCA_XGMI_PCS]			= { "xgmi_pcs",		"Ext Global Memory Interconnect PCS Unit" },
-	[SMCA_NBIF]			= { "nbif",		"NBIF Unit" },
-	[SMCA_SHUB]			= { "shub",		"System Hub Unit" },
-	[SMCA_SATA]			= { "sata",		"SATA Unit" },
-	[SMCA_USB]			= { "usb",		"USB Unit" },
-	[SMCA_GMI_PCS]			= { "gmi_pcs",		"Global Memory Interconnect PCS Unit" },
-	[SMCA_XGMI_PHY]			= { "xgmi_phy",		"Ext Global Memory Interconnect PHY Unit" },
-	[SMCA_WAFL_PHY]			= { "wafl_phy",		"WAFL PHY Unit" },
-	[SMCA_GMI_PHY]			= { "gmi_phy",		"Global Memory Interconnect PHY Unit" },
+	[SMCA_UMC]			= "umc",
+	[SMCA_UMC_V2]			= "umc_v2",
+	[SMCA_PB]			= "param_block",
+	[SMCA_PSP ... SMCA_PSP_V2]	= "psp",
+	[SMCA_SMU ... SMCA_SMU_V2]	= "smu",
+	[SMCA_MP5]			= "mp5",
+	[SMCA_MPDMA]			= "mpdma",
+	[SMCA_NBIO]			= "nbio",
+	[SMCA_PCIE ... SMCA_PCIE_V2]	= "pcie",
+	[SMCA_XGMI_PCS]			= "xgmi_pcs",
+	[SMCA_NBIF]			= "nbif",
+	[SMCA_SHUB]			= "shub",
+	[SMCA_SATA]			= "sata",
+	[SMCA_USB]			= "usb",
+	[SMCA_GMI_PCS]			= "gmi_pcs",
+	[SMCA_XGMI_PHY]			= "xgmi_phy",
+	[SMCA_WAFL_PHY]			= "wafl_phy",
+	[SMCA_GMI_PHY]			= "gmi_phy",
 };
 
 static const char *smca_get_name(enum smca_bank_types t)
@@ -130,17 +125,8 @@ static const char *smca_get_name(enum smca_bank_types t)
 	if (t >= N_SMCA_BANK_TYPES)
 		return NULL;
 
-	return smca_names[t].name;
-}
-
-const char *smca_get_long_name(enum smca_bank_types t)
-{
-	if (t >= N_SMCA_BANK_TYPES)
-		return NULL;
-
-	return smca_names[t].long_name;
+	return smca_names[t];
 }
-EXPORT_SYMBOL_GPL(smca_get_long_name);
 
 enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank)
 {
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 9215c06..28363eb 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1163,11 +1163,51 @@ static void decode_mc6_mce(struct mce *m)
 	pr_emerg(HW_ERR "Corrupted MC6 MCE info?\n");
 }
 
+static const char * const smca_long_names[] = {
+	[SMCA_LS ... SMCA_LS_V2]	= "Load Store Unit",
+	[SMCA_IF]			= "Instruction Fetch Unit",
+	[SMCA_L2_CACHE]			= "L2 Cache",
+	[SMCA_DE]			= "Decode Unit",
+	[SMCA_RESERVED]			= "Reserved",
+	[SMCA_EX]			= "Execution Unit",
+	[SMCA_FP]			= "Floating Point Unit",
+	[SMCA_L3_CACHE]			= "L3 Cache",
+	[SMCA_CS ... SMCA_CS_V2]	= "Coherent Slave",
+	[SMCA_PIE]			= "Power, Interrupts, etc.",
+
+	/* UMC v2 is separate because both of them can exist in a single system. */
+	[SMCA_UMC]			= "Unified Memory Controller",
+	[SMCA_UMC_V2]			= "Unified Memory Controller v2",
+	[SMCA_PB]			= "Parameter Block",
+	[SMCA_PSP ... SMCA_PSP_V2]	= "Platform Security Processor",
+	[SMCA_SMU ... SMCA_SMU_V2]	= "System Management Unit",
+	[SMCA_MP5]			= "Microprocessor 5 Unit",
+	[SMCA_MPDMA]			= "MPDMA Unit",
+	[SMCA_NBIO]			= "Northbridge IO Unit",
+	[SMCA_PCIE ... SMCA_PCIE_V2]	= "PCI Express Unit",
+	[SMCA_XGMI_PCS]			= "Ext Global Memory Interconnect PCS Unit",
+	[SMCA_NBIF]			= "NBIF Unit",
+	[SMCA_SHUB]			= "System Hub Unit",
+	[SMCA_SATA]			= "SATA Unit",
+	[SMCA_USB]			= "USB Unit",
+	[SMCA_GMI_PCS]			= "Global Memory Interconnect PCS Unit",
+	[SMCA_XGMI_PHY]			= "Ext Global Memory Interconnect PHY Unit",
+	[SMCA_WAFL_PHY]			= "WAFL PHY Unit",
+	[SMCA_GMI_PHY]			= "Global Memory Interconnect PHY Unit",
+};
+
+static const char *smca_get_long_name(enum smca_bank_types t)
+{
+	if (t >= N_SMCA_BANK_TYPES)
+		return NULL;
+
+	return smca_long_names[t];
+}
+
 /* Decode errors according to Scalable MCA specification */
 static void decode_smca_error(struct mce *m)
 {
 	enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
-	const char *ip_name;
 	u8 xec = XEC(m->status, xec_mask);
 
 	if (bank_type >= N_SMCA_BANK_TYPES)
@@ -1178,9 +1218,7 @@ static void decode_smca_error(struct mce *m)
 		return;
 	}
 
-	ip_name = smca_get_long_name(bank_type);
-
-	pr_emerg(HW_ERR "%s Ext. Error Code: %d", ip_name, xec);
+	pr_emerg(HW_ERR "%s Ext. Error Code: %d", smca_get_long_name(bank_type), xec);
 
 	/* Only print the decode of valid error codes */
 	if (xec < smca_mce_descs[bank_type].num_descs)


* Re: [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check
  2023-11-18 19:32 ` [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check Yazen Ghannam
@ 2023-11-27 11:43   ` Borislav Petkov
  2023-11-27 15:00     ` Yazen Ghannam
  0 siblings, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2023-11-27 11:43 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel

On Sat, Nov 18, 2023 at 01:32:33PM -0600, Yazen Ghannam wrote:
> @@ -714,14 +721,10 @@ static bool legacy_mce_is_memory_error(struct mce *m)
>   */
>  static bool smca_mce_is_memory_error(struct mce *m)
>  {
> -	enum smca_bank_types bank_type;
> -
>  	if (XEC(m->status, 0x3f))
>  		return false;
>  
> -	bank_type = smca_get_bank_type(m->extcpu, m->bank);
> -
> -	return bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2;
> +	return smca_umc_bank_type(m->ipid);

	return FIELD_GET(MCI_IPID_HWID, ipid) == IPID_TYPE_UMC;

after having done:

#define IPID_TYPE_UMC	0x96

and you don't need that silly helper.

And then you can do more cleanups ontop by doing

        /* Unified Memory Controller MCA type */
        { SMCA_UMC,      HWID_MCATYPE(IPID_TYPE_UMC, 0x0)        },
        { SMCA_UMC_V2,   HWID_MCATYPE(IPID_TYPE_UMC, 0x1)        },

and have all the numbering properly defined and abstracted away.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks
  2023-11-18 19:32 ` [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks Yazen Ghannam
@ 2023-11-27 11:46   ` Borislav Petkov
  2023-11-27 15:12     ` Yazen Ghannam
  0 siblings, 1 reply; 31+ messages in thread
From: Borislav Petkov @ 2023-11-27 11:46 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: linux-edac, linux-kernel, tony.luck, x86, Avadhut.Naik,
	Smita.KoralahalliChannabasappa, amd-gfx, linux-trace-kernel

On Sat, Nov 18, 2023 at 01:32:34PM -0600, Yazen Ghannam wrote:
> +/* GPU UMCs have MCATYPE=0x1.*/
> +bool smca_gpu_umc_bank_type(u64 ipid)
> +{
> +	if (!smca_umc_bank_type(ipid))
> +		return false;
> +
> +	return FIELD_GET(MCI_IPID_MCATYPE, ipid) == 0x1;
> +}

And now this tells you that you want to use

	u32 hwid_mcatype;       /* (hwid,mcatype) tuple */

everywhere and do your macros around that thing.

No need for yet another helper as this all is static information which
doesn't change.

> +EXPORT_SYMBOL_GPL(smca_gpu_umc_bank_type);

Definitely not another export.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields
  2023-11-22 18:24   ` Borislav Petkov
@ 2023-11-27 14:52     ` Yazen Ghannam
  0 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-27 14:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, linux-edac, linux-kernel, tony.luck, x86,
	Avadhut.Naik, Smita.KoralahalliChannabasappa, amd-gfx,
	linux-trace-kernel

On 11/22/2023 1:24 PM, Borislav Petkov wrote:
> On Sat, Nov 18, 2023 at 01:32:30PM -0600, Yazen Ghannam wrote:
>> +void mce_setup_global(struct mce *m)
> 
> We usually call those things "common":
> 
> mce_setup_common().
> 
>> +{
>> +	memset(m, 0, sizeof(struct mce));
>> +
>> +	m->cpuid	= cpuid_eax(1);
>> +	m->cpuvendor	= boot_cpu_data.x86_vendor;
>> +	m->mcgcap	= __rdmsr(MSR_IA32_MCG_CAP);
>> +	/* need the internal __ version to avoid deadlocks */
>> +	m->time		= __ktime_get_real_seconds();
>> +}
>> +
>> +void mce_setup_per_cpu(struct mce *m)
> 
> And call this
> 
> 	mce_setup_for_cpu(unsigned int cpu, struct mce *m);
> 
> so that it doesn't look like some per_cpu helper.
> 
> And yes, you should supply the CPU number as an argument. Because
> otherwise, when you look at your next change:
> 
> 
> +       mce_setup_global(&m);
> +       m.cpu = m.extcpu = cpu;
> +       mce_setup_per_cpu(&m);
> 
> This contains the "hidden" requirement that m.extcpu happens *always*
> *before* the mce_setup_per_cpu() call and that is flaky and error prone.
> 
> So make that:
> 
> 	mce_setup_common(&m);
> 	mce_setup_for_cpu(m.extcpu, &m);
> 
> and do m.cpu = m.extcpu = cpu inside the second function.
> 
> And then it JustWorks(tm) and you can't "forget" assigning m.extcpu and
> there's no subtlety.
> 
> Ok?
> 

Yep, understood. Thanks!

-Yazen


* Re: [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()
  2023-11-22 18:28   ` Borislav Petkov
@ 2023-11-27 14:53     ` Yazen Ghannam
  0 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-27 14:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, linux-edac, linux-kernel, tony.luck, x86,
	Avadhut.Naik, Smita.KoralahalliChannabasappa, amd-gfx,
	linux-trace-kernel

On 11/22/2023 1:28 PM, Borislav Petkov wrote:
> On Sat, Nov 18, 2023 at 01:32:31PM -0600, Yazen Ghannam wrote:
>> Current AMD systems may report MCA errors using the ACPI Boot Error
>> Record Table (BERT). The BERT entries for MCA errors will be an x86
>> Common Platform Error Record (CPER) with an MSR register context that
>> matches the MCAX/SMCA register space.
>>
>> However, the BERT will not necessarily be processed on the CPU that
>> reported the MCA errors. Therefore, the correct CPU number needs to be
>> determined and the information saved in struct mce.
>>
>> The CPU number is determined by searching all possible CPUs for a Local
>> APIC ID matching the value in the x86 CPER.
> 
> Those below are explaining what the patch does. Not needed here.
> 

Okay, will remove them.

>> Set up the MCA record after searching for a CPU number. If no possible
>> CPU was found, then return early.
>>
>> Gather the global MCA information first, save the found CPU number, then
>> gather the per-CPU information.
> 

Thanks,
Yazen


* Re: [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check
  2023-11-27 11:43   ` Borislav Petkov
@ 2023-11-27 15:00     ` Yazen Ghannam
  0 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-27 15:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, linux-edac, linux-kernel, tony.luck, x86,
	Avadhut.Naik, Smita.KoralahalliChannabasappa, amd-gfx,
	linux-trace-kernel

On 11/27/2023 6:43 AM, Borislav Petkov wrote:
> On Sat, Nov 18, 2023 at 01:32:33PM -0600, Yazen Ghannam wrote:
>> @@ -714,14 +721,10 @@ static bool legacy_mce_is_memory_error(struct mce *m)
>>    */
>>   static bool smca_mce_is_memory_error(struct mce *m)
>>   {
>> -	enum smca_bank_types bank_type;
>> -
>>   	if (XEC(m->status, 0x3f))
>>   		return false;
>>   
>> -	bank_type = smca_get_bank_type(m->extcpu, m->bank);
>> -
>> -	return bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2;
>> +	return smca_umc_bank_type(m->ipid);
> 
> 	return FIELD_GET(MCI_IPID_HWID, ipid) == IPID_TYPE_UMC;
> 
> after having done:
> 
> #define IPID_TYPE_UMC	0x96
> 
> and you don't need that silly helper.

The helper is also used in the following patch. But in any case, it may 
be overkill. So I'll drop it.

> 
> And then you can do more cleanups ontop by doing
> 
>          /* Unified Memory Controller MCA type */
>          { SMCA_UMC,      HWID_MCATYPE(IPID_TYPE_UMC, 0x0)        },
>          { SMCA_UMC_V2,   HWID_MCATYPE(IPID_TYPE_UMC, 0x1)        },
> 
> and have all the numbering properly defined and abstracted away.
>

Yep, agreed. Thanks for the suggestion.

Thanks,
Yazen


* Re: [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks
  2023-11-27 11:46   ` Borislav Petkov
@ 2023-11-27 15:12     ` Yazen Ghannam
  0 siblings, 0 replies; 31+ messages in thread
From: Yazen Ghannam @ 2023-11-27 15:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, linux-edac, linux-kernel, tony.luck, x86,
	Avadhut.Naik, Smita.KoralahalliChannabasappa, amd-gfx,
	linux-trace-kernel

On 11/27/2023 6:46 AM, Borislav Petkov wrote:
> On Sat, Nov 18, 2023 at 01:32:34PM -0600, Yazen Ghannam wrote:
>> +/* GPU UMCs have MCATYPE=0x1.*/
>> +bool smca_gpu_umc_bank_type(u64 ipid)
>> +{
>> +	if (!smca_umc_bank_type(ipid))
>> +		return false;
>> +
>> +	return FIELD_GET(MCI_IPID_MCATYPE, ipid) == 0x1;
>> +}
> 
> And now this tells you that you want to use
> 
> 	u32 hwid_mcatype;       /* (hwid,mcatype) tuple */
> 
> everywhere and do your macros around that thing.
> 
> No need for yet another helper as this all is static information which
> doesn't change.
> 
>> +EXPORT_SYMBOL_GPL(smca_gpu_umc_bank_type);
> 
> Definitely not another export.
> 

Right, we already use the tuple thing. Patch 8 in this set uses the 
tuple to look up (search for) the bank type at runtime. This is in order 
to get rid of all the per-CPU stuff.

Now I figured it'd be good to have special helpers to do a quick check 
for the UMC bank types. Because memory errors will be the most commonly 
reported errors.

But it's likely not important to save a few cycles here. Especially 
since the helpers will be used in the notifier chain and not in the #MC 
handler, etc.

I'll think on it a bit more, but this patch and the previous one can 
likely be dropped.

Thanks,
Yazen



end of thread, other threads:[~2023-11-27 15:12 UTC | newest]

Thread overview: 31+ messages
2023-11-18 19:32 [PATCH 00/20] MCA Updates Yazen Ghannam
2023-11-18 19:32 ` [PATCH 01/20] x86/mce/inject: Clear test status value Yazen Ghannam
2023-11-22 18:17   ` [tip: ras/core] " tip-bot2 for Yazen Ghannam
2023-11-18 19:32 ` [PATCH 02/20] x86/mce: Define mce_setup() helpers for global and per-CPU fields Yazen Ghannam
2023-11-22 18:24   ` Borislav Petkov
2023-11-27 14:52     ` Yazen Ghannam
2023-11-18 19:32 ` [PATCH 03/20] x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error() Yazen Ghannam
2023-11-22 18:28   ` Borislav Petkov
2023-11-27 14:53     ` Yazen Ghannam
2023-11-18 19:32 ` [PATCH 04/20] x86/mce/amd, EDAC/mce_amd: Move long names to decoder module Yazen Ghannam
2023-11-27 11:31   ` [tip: ras/core] " tip-bot2 for Yazen Ghannam
2023-11-18 19:32 ` [PATCH 05/20] x86/mce/amd: Use helper for UMC bank type check Yazen Ghannam
2023-11-27 11:43   ` Borislav Petkov
2023-11-27 15:00     ` Yazen Ghannam
2023-11-18 19:32 ` [PATCH 06/20] x86/mce/amd: Use helper for GPU UMC bank type checks Yazen Ghannam
2023-11-27 11:46   ` Borislav Petkov
2023-11-27 15:12     ` Yazen Ghannam
2023-11-18 19:32 ` [PATCH 07/20] x86/mce/amd: Use fixed bank number for quirks Yazen Ghannam
2023-11-18 19:32 ` [PATCH 08/20] x86/mce/amd: Look up bank type by IPID Yazen Ghannam
2023-11-18 19:32 ` [PATCH 09/20] x86/mce/amd: Clean up SMCA configuration Yazen Ghannam
2023-11-18 19:32 ` [PATCH 10/20] x86/mce/amd: Prep DFR handler before enabling banks Yazen Ghannam
2023-11-18 19:32 ` [PATCH 11/20] x86/mce/amd: Simplify DFR handler setup Yazen Ghannam
2023-11-18 19:32 ` [PATCH 12/20] x86/mce/amd: Clean up enable_deferred_error_interrupt() Yazen Ghannam
2023-11-18 19:32 ` [PATCH 13/20] x86/mce: Unify AMD THR handler with MCA Polling Yazen Ghannam
2023-11-18 19:32 ` [PATCH 14/20] x86/mce/amd: Unify AMD DFR " Yazen Ghannam
2023-11-18 19:32 ` [PATCH 15/20] x86/mce: Skip AMD threshold init if no threshold banks found Yazen Ghannam
2023-11-18 19:32 ` [PATCH 16/20] x86/mce/amd: Support SMCA Corrected Error Interrupt Yazen Ghannam
2023-11-18 19:32 ` [PATCH 17/20] x86/mce: Add wrapper for struct mce to export vendor specific info Yazen Ghannam
2023-11-18 19:32 ` [PATCH 18/20] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Yazen Ghannam
2023-11-18 19:32 ` [PATCH 19/20] x86/mce/apei: Handle variable register array size Yazen Ghannam
2023-11-18 19:32 ` [PATCH 20/20] EDAC/mce_amd: Add support for FRU Text in MCA Yazen Ghannam
