[PATCH 0/6] x86/RAS: Correctable Errors Collector

From: Borislav Petkov @ 2017-03-27  9:32 UTC (permalink / raw)
  To: X86 ML; +Cc: linux-edac, LKML

From: Borislav Petkov <bp@suse.de>

Hi guys,

here's v1. All feedback I know of has been addressed, so I guess it is
time. :)

We don't have it default y yet but will make it so after it has seen
wider testing. The end goal is to have it running by default so that
transient correctable ECC errors don't generate error logs and upset
people unnecessarily.

The other good thing resulting from this patchset is that we have *all*
MCE consumers lined up in a notifier with priorities. This way we have a
single chain which gets to see error records and not some wild variety
of hooks here and there.
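
To illustrate the single-chain idea, here is a minimal userspace sketch
of a priority-ordered consumer chain. It is not the code from these
patches: every name in it (struct consumer, chain_register(),
chain_call(), CHAIN_STOP and so on) is hypothetical and only mirrors how
kernel notifier chains behave, i.e. callbacks run from highest to lowest
priority and a callback may stop the walk.

```c
/* Userspace sketch only; all names are hypothetical, not kernel API. */
#include <assert.h>
#include <stddef.h>

#define CHAIN_CONTINUE	0
#define CHAIN_STOP	1
#define MAX_SEEN	8

struct err_record {
	int bank;			/* illustrative payload field */
	int seen_by[MAX_SEEN];		/* consumer priorities, in call order */
	int nseen;
};

struct consumer {
	int prio;			/* higher priority runs earlier */
	int (*call)(struct consumer *c, struct err_record *r);
	struct consumer *next;
};

/* Insert the consumer so the list stays sorted by descending priority. */
static void chain_register(struct consumer **head, struct consumer *c)
{
	assert(c && c->call);

	while (*head && (*head)->prio >= c->prio)
		head = &(*head)->next;

	c->next = *head;
	*head = c;
}

/* Hand the record to each consumer in order; return how many were called. */
static int chain_call(struct consumer *head, struct err_record *r)
{
	int called = 0;

	for (; head; head = head->next) {
		called++;
		if (head->call(head, r) == CHAIN_STOP)
			break;
	}

	return called;
}

/* Record that this consumer saw the error, let the rest see it too. */
static int note_and_continue(struct consumer *c, struct err_record *r)
{
	if (r->nseen < MAX_SEEN)
		r->seen_by[r->nseen++] = c->prio;

	return CHAIN_CONTINUE;
}

/* Record that this consumer saw the error and swallow it. */
static int note_and_stop(struct consumer *c, struct err_record *r)
{
	if (r->nseen < MAX_SEEN)
		r->seen_by[r->nseen++] = c->prio;

	return CHAIN_STOP;
}
```

In this sketch a consumer which fully handles a record returns
CHAIN_STOP, so lower-priority consumers never see it. Whether and where
the real chain stops is decided by the actual patches, not by this
example.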

Last but not least, /dev/mcelog has been deprecated and all the code is
behind a CONFIG_X86_MCELOG config option.

Btw, patch 1 is for urgent.

Please apply,
thanks.

Changelog:
==========

v0:

here's the latest incarnation of the CEC. I think I've taken care of
all review comments, but feel free to correct me here. The introductory
comment in cec.c should explain the whole deal. I'm pointing you there
so that the text lives in the actual source and is not spread around
commit messages. So please have a look there for more info.

The thing now has knobs in debugfs which control its operation. I hope
I've chosen sane default values.

Andi Kleen (1):
  x86/mce: Don't print MCEs when mcelog is active

Borislav Petkov (4):
  x86/MCE: Rename mce_log()'s argument
  x86/MCE: Rename mce_log to mce_log_buffer
  RAS: Add a Corrected Errors Collector
  x86/mce: Do not register notifiers with invalid prio

Tony Luck (1):
  x86/mce: Deprecate /dev/mcelog

 Documentation/admin-guide/kernel-parameters.txt |   6 +
 arch/x86/Kconfig                                |  10 +-
 arch/x86/include/asm/mce.h                      |  12 +-
 arch/x86/kernel/cpu/mcheck/Makefile             |   2 +
 arch/x86/kernel/cpu/mcheck/dev-mcelog.c         | 397 ++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h       |   8 +
 arch/x86/kernel/cpu/mcheck/mce.c                | 501 +++++-----------------
 arch/x86/ras/Kconfig                            |  14 +
 drivers/ras/Makefile                            |   3 +-
 drivers/ras/cec.c                               | 532 ++++++++++++++++++++++++
 drivers/ras/debugfs.c                           |   2 +-
 drivers/ras/debugfs.h                           |   8 +
 drivers/ras/ras.c                               |  11 +
 include/linux/ras.h                             |  13 +-
 14 files changed, 1105 insertions(+), 414 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mcheck/dev-mcelog.c
 create mode 100644 drivers/ras/cec.c
 create mode 100644 drivers/ras/debugfs.h

-- 
2.11.0
