All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] x86/RAS: Correctable Errors Collector
@ 2017-03-27  9:32 Borislav Petkov
  2017-03-27  9:32 ` [PATCH 1/6] x86/mce: Don't print MCEs when mcelog is active Borislav Petkov
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Borislav Petkov @ 2017-03-27  9:32 UTC (permalink / raw)
  To: X86 ML; +Cc: linux-edac, LKML

From: Borislav Petkov <bp@suse.de>

Hi guys,

here's v1, all feedback I know of has been addressed. So I guess it is
time. :)

We don't have it default y yet but will make it so after it has seen
wider testing. The end goal is to have it running by default so that
transient correctable ECC errors don't generate error logs and upset
people unnecessarily.

The other good thing resulting from this patchset is that we have *all*
MCE consumers lined up in a notifier with priorities. This way we have a
single chain which gets to see error records and not some wild variety
of hooks here and there.

Last but not least, /dev/mcelog has been deprecated and all the code is
behind a CONFIG_X86_MCELOG config option.

Btw, patch 1 is for urgent.

Please apply,
thanks.

Changelog:
=========

v0:

here's the latest incarnation of the CEC collector. I think I've taken
care of all review comments but feel free to correct me here. The
introductory comment in cec.c should explain the whole deal - I'm
referring to there so that we have that text in the actual source and
not spread it around commit messages. So pls have a look there for more
info.

The thing has knobs in debugfs now which can control its operation, I
hope I've chosen sane default values.

Andi Kleen (1):
  x86/mce: Don't print MCEs when mcelog is active

Borislav Petkov (4):
  x86/MCE: Rename mce_log()'s argument
  x86/MCE: Rename mce_log to mce_log_buffer
  RAS: Add a Corrected Errors Collector
  x86/mce: Do not register notifiers with invalid prio

Tony Luck (1):
  x86/mce: Deprecate /dev/mcelog

 Documentation/admin-guide/kernel-parameters.txt |   6 +
 arch/x86/Kconfig                                |  10 +-
 arch/x86/include/asm/mce.h                      |  12 +-
 arch/x86/kernel/cpu/mcheck/Makefile             |   2 +
 arch/x86/kernel/cpu/mcheck/dev-mcelog.c         | 397 ++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h       |   8 +
 arch/x86/kernel/cpu/mcheck/mce.c                | 501 +++++-----------------
 arch/x86/ras/Kconfig                            |  14 +
 drivers/ras/Makefile                            |   3 +-
 drivers/ras/cec.c                               | 532 ++++++++++++++++++++++++
 drivers/ras/debugfs.c                           |   2 +-
 drivers/ras/debugfs.h                           |   8 +
 drivers/ras/ras.c                               |  11 +
 include/linux/ras.h                             |  13 +-
 14 files changed, 1105 insertions(+), 414 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mcheck/dev-mcelog.c
 create mode 100644 drivers/ras/cec.c
 create mode 100644 drivers/ras/debugfs.h

-- 
2.11.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] x86/mce: Don't print MCEs when mcelog is active
  2017-03-27  9:32 [PATCH 0/6] x86/RAS: Correctable Errors Collector Borislav Petkov
@ 2017-03-27  9:32 ` Borislav Petkov
  2017-03-28  7:01   ` [tip:ras/core] " tip-bot for Andi Kleen
  2017-03-27  9:33 ` [PATCH 2/6] x86/MCE: Rename mce_log()'s argument Borislav Petkov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2017-03-27  9:32 UTC (permalink / raw)
  To: X86 ML; +Cc: linux-edac, LKML

From: Andi Kleen <ak@linux.intel.com>

Since

  cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

all MCEs are printed even when mcelog is running. Fix the regression to
not print to dmesg when mcelog is running as it is a consumer too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org # 4.10..
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Fixes: cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")
Link: http://lkml.kernel.org/r/20170324003140.26400-1-andi@firstfloor.org
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5e365a2fabe5..e95e02734f25 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -55,6 +55,8 @@
 
 static DEFINE_MUTEX(mce_chrdev_read_mutex);
 
+static int mce_chrdev_open_count;	/* #times opened */
+
 #define mce_log_get_idx_check(p) \
 ({ \
 	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
@@ -599,6 +601,10 @@ static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 	if (atomic_read(&num_notifiers) > 2)
 		return NOTIFY_DONE;
 
+	/* Don't print when mcelog is running */
+	if (mce_chrdev_open_count > 0)
+		return NOTIFY_DONE;
+
 	__print_mce(m);
 
 	return NOTIFY_DONE;
@@ -1848,7 +1854,6 @@ void mcheck_cpu_clear(struct cpuinfo_x86 *c)
  */
 
 static DEFINE_SPINLOCK(mce_chrdev_state_lock);
-static int mce_chrdev_open_count;	/* #times opened */
 static int mce_chrdev_open_exclu;	/* already open exclusive? */
 
 static int mce_chrdev_open(struct inode *inode, struct file *file)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] x86/MCE: Rename mce_log()'s argument
  2017-03-27  9:32 [PATCH 0/6] x86/RAS: Correctable Errors Collector Borislav Petkov
  2017-03-27  9:32 ` [PATCH 1/6] x86/mce: Don't print MCEs when mcelog is active Borislav Petkov
@ 2017-03-27  9:33 ` Borislav Petkov
  2017-03-28  7:01   ` [tip:ras/core] x86/mce: " tip-bot for Borislav Petkov
  2017-03-27  9:33 ` [PATCH 3/6] x86/MCE: Rename mce_log to mce_log_buffer Borislav Petkov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2017-03-27  9:33 UTC (permalink / raw)
  To: X86 ML; +Cc: linux-edac, LKML

From: Borislav Petkov <bp@suse.de>

We call it everywhere "struct mce *m". Adjust that here too to avoid
confusion.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e95e02734f25..49673b20f17f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -158,14 +158,14 @@ static struct mce_log mcelog = {
 	.recordlen	= sizeof(struct mce),
 };
 
-void mce_log(struct mce *mce)
+void mce_log(struct mce *m)
 {
 	unsigned next, entry;
 
 	/* Emit the trace record: */
-	trace_mce_record(mce);
+	trace_mce_record(m);
 
-	if (!mce_gen_pool_add(mce))
+	if (!mce_gen_pool_add(m))
 		irq_work_queue(&mce_irq_work);
 
 	wmb();
@@ -195,7 +195,7 @@ void mce_log(struct mce *mce)
 		if (cmpxchg(&mcelog.next, entry, next) == entry)
 			break;
 	}
-	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
+	memcpy(mcelog.entry + entry, m, sizeof(struct mce));
 	wmb();
 	mcelog.entry[entry].finished = 1;
 	wmb();
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] x86/MCE: Rename mce_log to mce_log_buffer
  2017-03-27  9:32 [PATCH 0/6] x86/RAS: Correctable Errors Collector Borislav Petkov
  2017-03-27  9:32 ` [PATCH 1/6] x86/mce: Don't print MCEs when mcelog is active Borislav Petkov
  2017-03-27  9:33 ` [PATCH 2/6] x86/MCE: Rename mce_log()'s argument Borislav Petkov
@ 2017-03-27  9:33 ` Borislav Petkov
  2017-03-28  7:02   ` [tip:ras/core] x86/mce: " tip-bot for Borislav Petkov
  2017-03-27  9:33 ` [PATCH 4/6] RAS: Add a Corrected Errors Collector Borislav Petkov
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2017-03-27  9:33 UTC (permalink / raw)
  To: X86 ML; +Cc: linux-edac, LKML

From: Borislav Petkov <bp@suse.de>

It is confusing when staring at "struct mce_log mcelog" and then there's
also a function called mce_log(). So call the buffer what it is.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mce.h       |  2 +-
 arch/x86/kernel/cpu/mcheck/mce.c | 30 +++++++++++++++---------------
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e63873683d4a..0512dcc11750 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -128,7 +128,7 @@
  * debugging tools.  Each entry is only valid when its finished flag
  * is set.
  */
-struct mce_log {
+struct mce_log_buffer {
 	char signature[12]; /* "MACHINECHECK" */
 	unsigned len;	    /* = MCE_LOG_LEN */
 	unsigned next;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 49673b20f17f..8ada093d4e40 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -152,7 +152,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
  * separate MCEs from kernel messages to avoid bogus bug reports.
  */
 
-static struct mce_log mcelog = {
+static struct mce_log_buffer mcelog_buf = {
 	.signature	= MCE_LOG_SIGNATURE,
 	.len		= MCE_LOG_LEN,
 	.recordlen	= sizeof(struct mce),
@@ -170,7 +170,7 @@ void mce_log(struct mce *m)
 
 	wmb();
 	for (;;) {
-		entry = mce_log_get_idx_check(mcelog.next);
+		entry = mce_log_get_idx_check(mcelog_buf.next);
 		for (;;) {
 
 			/*
@@ -180,11 +180,11 @@ void mce_log(struct mce *m)
 			 */
 			if (entry >= MCE_LOG_LEN) {
 				set_bit(MCE_OVERFLOW,
-					(unsigned long *)&mcelog.flags);
+					(unsigned long *)&mcelog_buf.flags);
 				return;
 			}
 			/* Old left over entry. Skip: */
-			if (mcelog.entry[entry].finished) {
+			if (mcelog_buf.entry[entry].finished) {
 				entry++;
 				continue;
 			}
@@ -192,12 +192,12 @@ void mce_log(struct mce *m)
 		}
 		smp_rmb();
 		next = entry + 1;
-		if (cmpxchg(&mcelog.next, entry, next) == entry)
+		if (cmpxchg(&mcelog_buf.next, entry, next) == entry)
 			break;
 	}
-	memcpy(mcelog.entry + entry, m, sizeof(struct mce));
+	memcpy(mcelog_buf.entry + entry, m, sizeof(struct mce));
 	wmb();
-	mcelog.entry[entry].finished = 1;
+	mcelog_buf.entry[entry].finished = 1;
 	wmb();
 
 	set_bit(0, &mce_need_notify);
@@ -1958,7 +1958,7 @@ static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
 			goto out;
 	}
 
-	next = mce_log_get_idx_check(mcelog.next);
+	next = mce_log_get_idx_check(mcelog_buf.next);
 
 	/* Only supports full reads right now */
 	err = -EINVAL;
@@ -1970,7 +1970,7 @@ static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
 	do {
 		for (i = prev; i < next; i++) {
 			unsigned long start = jiffies;
-			struct mce *m = &mcelog.entry[i];
+			struct mce *m = &mcelog_buf.entry[i];
 
 			while (!m->finished) {
 				if (time_after_eq(jiffies, start + 2)) {
@@ -1986,10 +1986,10 @@ static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
 			;
 		}
 
-		memset(mcelog.entry + prev, 0,
+		memset(mcelog_buf.entry + prev, 0,
 		       (next - prev) * sizeof(struct mce));
 		prev = next;
-		next = cmpxchg(&mcelog.next, prev, 0);
+		next = cmpxchg(&mcelog_buf.next, prev, 0);
 	} while (next != prev);
 
 	synchronize_sched();
@@ -2001,7 +2001,7 @@ static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
 	on_each_cpu(collect_tscs, cpu_tsc, 1);
 
 	for (i = next; i < MCE_LOG_LEN; i++) {
-		struct mce *m = &mcelog.entry[i];
+		struct mce *m = &mcelog_buf.entry[i];
 
 		if (m->finished && m->tsc < cpu_tsc[m->cpu]) {
 			err |= copy_to_user(buf, m, sizeof(*m));
@@ -2024,7 +2024,7 @@ static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
 static unsigned int mce_chrdev_poll(struct file *file, poll_table *wait)
 {
 	poll_wait(file, &mce_chrdev_wait, wait);
-	if (READ_ONCE(mcelog.next))
+	if (READ_ONCE(mcelog_buf.next))
 		return POLLIN | POLLRDNORM;
 	if (!mce_apei_read_done && apei_check_mce())
 		return POLLIN | POLLRDNORM;
@@ -2048,8 +2048,8 @@ static long mce_chrdev_ioctl(struct file *f, unsigned int cmd,
 		unsigned flags;
 
 		do {
-			flags = mcelog.flags;
-		} while (cmpxchg(&mcelog.flags, flags, 0) != flags);
+			flags = mcelog_buf.flags;
+		} while (cmpxchg(&mcelog_buf.flags, flags, 0) != flags);
 
 		return put_user(flags, p);
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] RAS: Add a Corrected Errors Collector
  2017-03-27  9:32 [PATCH 0/6] x86/RAS: Correctable Errors Collector Borislav Petkov
                   ` (2 preceding siblings ...)
  2017-03-27  9:33 ` [PATCH 3/6] x86/MCE: Rename mce_log to mce_log_buffer Borislav Petkov
@ 2017-03-27  9:33 ` Borislav Petkov
  2017-03-28  7:02   ` [tip:ras/core] " tip-bot for Borislav Petkov
  2017-03-27  9:33 ` [PATCH 5/6] x86/mce: Deprecate /dev/mcelog Borislav Petkov
  2017-03-27  9:33 ` [PATCH 6/6] x86/mce: Do not register notifiers with invalid prio Borislav Petkov
  5 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2017-03-27  9:33 UTC (permalink / raw)
  To: X86 ML; +Cc: linux-edac, LKML

From: Borislav Petkov <bp@suse.de>

A simple data structure for collecting correctable errors along with
accessors. More detailed description in the code itself.

The error decoding is done with the decoding chain now and
mce_first_notifier() gets to see the error first and the CEC decides
whether to log it and then the rest of the chain doesn't hear about it -
basically the main reason for the CE collector - or to continue running
the notifiers.

When the CEC hits the action threshold, it will try to soft-offine the
page containing the ECC and then the whole decoding chain gets to see
the error.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 Documentation/admin-guide/kernel-parameters.txt |   6 +
 arch/x86/include/asm/mce.h                      |   9 +-
 arch/x86/kernel/cpu/mcheck/mce.c                | 191 +++++----
 arch/x86/ras/Kconfig                            |  14 +
 drivers/ras/Makefile                            |   3 +-
 drivers/ras/cec.c                               | 532 ++++++++++++++++++++++++
 drivers/ras/debugfs.c                           |   2 +-
 drivers/ras/debugfs.h                           |   8 +
 drivers/ras/ras.c                               |  11 +
 include/linux/ras.h                             |  13 +-
 10 files changed, 706 insertions(+), 83 deletions(-)
 create mode 100644 drivers/ras/cec.c
 create mode 100644 drivers/ras/debugfs.h

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ea1293d60f17..77f285c802e4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3173,6 +3173,12 @@
 	ramdisk_size=	[RAM] Sizes of RAM disks in kilobytes
 			See Documentation/blockdev/ramdisk.txt.
 
+	ras=option[,option,...]	[KNL] RAS-specific options
+
+		cec_disable	[X86]
+				Disable the Correctable Errors Collector,
+				see CONFIG_RAS_CEC help text.
+
 	rcu_nocbs=	[KNL]
 			The argument is a cpu list, as described above.
 
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0512dcc11750..c5ae545d27d8 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -191,10 +191,11 @@ extern struct mca_config mca_cfg;
 extern struct mca_msr_regs msr_ops;
 
 enum mce_notifier_prios {
-	MCE_PRIO_SRAO		= INT_MAX,
-	MCE_PRIO_EXTLOG		= INT_MAX - 1,
-	MCE_PRIO_NFIT		= INT_MAX - 2,
-	MCE_PRIO_EDAC		= INT_MAX - 3,
+	MCE_PRIO_FIRST		= INT_MAX,
+	MCE_PRIO_SRAO		= INT_MAX - 1,
+	MCE_PRIO_EXTLOG		= INT_MAX - 2,
+	MCE_PRIO_NFIT		= INT_MAX - 3,
+	MCE_PRIO_EDAC		= INT_MAX - 4,
 	MCE_PRIO_LOWEST		= 0,
 };
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8ada093d4e40..4a907758a516 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -35,6 +35,7 @@
 #include <linux/poll.h>
 #include <linux/nmi.h>
 #include <linux/cpu.h>
+#include <linux/ras.h>
 #include <linux/smp.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
@@ -160,47 +161,8 @@ static struct mce_log_buffer mcelog_buf = {
 
 void mce_log(struct mce *m)
 {
-	unsigned next, entry;
-
-	/* Emit the trace record: */
-	trace_mce_record(m);
-
 	if (!mce_gen_pool_add(m))
 		irq_work_queue(&mce_irq_work);
-
-	wmb();
-	for (;;) {
-		entry = mce_log_get_idx_check(mcelog_buf.next);
-		for (;;) {
-
-			/*
-			 * When the buffer fills up discard new entries.
-			 * Assume that the earlier errors are the more
-			 * interesting ones:
-			 */
-			if (entry >= MCE_LOG_LEN) {
-				set_bit(MCE_OVERFLOW,
-					(unsigned long *)&mcelog_buf.flags);
-				return;
-			}
-			/* Old left over entry. Skip: */
-			if (mcelog_buf.entry[entry].finished) {
-				entry++;
-				continue;
-			}
-			break;
-		}
-		smp_rmb();
-		next = entry + 1;
-		if (cmpxchg(&mcelog_buf.next, entry, next) == entry)
-			break;
-	}
-	memcpy(mcelog_buf.entry + entry, m, sizeof(struct mce));
-	wmb();
-	mcelog_buf.entry[entry].finished = 1;
-	wmb();
-
-	set_bit(0, &mce_need_notify);
 }
 
 void mce_inject_log(struct mce *m)
@@ -213,6 +175,12 @@ EXPORT_SYMBOL_GPL(mce_inject_log);
 
 static struct notifier_block mce_srao_nb;
 
+/*
+ * We run the default notifier if we have only the SRAO, the first and the
+ * default notifier registered. I.e., the mandatory NUM_DEFAULT_NOTIFIERS
+ * notifiers registered on the chain.
+ */
+#define NUM_DEFAULT_NOTIFIERS	3
 static atomic_t num_notifiers;
 
 void mce_register_decode_chain(struct notifier_block *nb)
@@ -522,7 +490,6 @@ static void mce_schedule_work(void)
 
 static void mce_irq_work_cb(struct irq_work *entry)
 {
-	mce_notify_irq();
 	mce_schedule_work();
 }
 
@@ -565,6 +532,111 @@ static int mce_usable_address(struct mce *m)
 	return 1;
 }
 
+static bool memory_error(struct mce *m)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	if (c->x86_vendor == X86_VENDOR_AMD) {
+		/* ErrCodeExt[20:16] */
+		u8 xec = (m->status >> 16) & 0x1f;
+
+		return (xec == 0x0 || xec == 0x8);
+	} else if (c->x86_vendor == X86_VENDOR_INTEL) {
+		/*
+		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
+		 *
+		 * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
+		 * indicating a memory error. Bit 8 is used for indicating a
+		 * cache hierarchy error. The combination of bit 2 and bit 3
+		 * is used for indicating a `generic' cache hierarchy error
+		 * But we can't just blindly check the above bits, because if
+		 * bit 11 is set, then it is a bus/interconnect error - and
+		 * either way the above bits just gives more detail on what
+		 * bus/interconnect error happened. Note that bit 12 can be
+		 * ignored, as it's the "filter" bit.
+		 */
+		return (m->status & 0xef80) == BIT(7) ||
+		       (m->status & 0xef00) == BIT(8) ||
+		       (m->status & 0xeffc) == 0xc;
+	}
+
+	return false;
+}
+
+static bool cec_add_mce(struct mce *m)
+{
+	if (!m)
+		return false;
+
+	/* We eat only correctable DRAM errors with usable addresses. */
+	if (memory_error(m) &&
+	    !(m->status & MCI_STATUS_UC) &&
+	    mce_usable_address(m))
+		if (!cec_add_elem(m->addr >> PAGE_SHIFT))
+			return true;
+
+	return false;
+}
+
+static int mce_first_notifier(struct notifier_block *nb, unsigned long val,
+			      void *data)
+{
+	struct mce *m = (struct mce *)data;
+	unsigned int next, entry;
+
+	if (!m)
+		return NOTIFY_DONE;
+
+	if (cec_add_mce(m))
+		return NOTIFY_STOP;
+
+	/* Emit the trace record: */
+	trace_mce_record(m);
+
+	wmb();
+	for (;;) {
+		entry = mce_log_get_idx_check(mcelog_buf.next);
+		for (;;) {
+
+			/*
+			 * When the buffer fills up discard new entries.
+			 * Assume that the earlier errors are the more
+			 * interesting ones:
+			 */
+			if (entry >= MCE_LOG_LEN) {
+				set_bit(MCE_OVERFLOW,
+					(unsigned long *)&mcelog_buf.flags);
+				return NOTIFY_DONE;
+			}
+			/* Old left over entry. Skip: */
+			if (mcelog_buf.entry[entry].finished) {
+				entry++;
+				continue;
+			}
+			break;
+		}
+		smp_rmb();
+		next = entry + 1;
+		if (cmpxchg(&mcelog_buf.next, entry, next) == entry)
+			break;
+	}
+	memcpy(mcelog_buf.entry + entry, m, sizeof(struct mce));
+	wmb();
+	mcelog_buf.entry[entry].finished = 1;
+	wmb();
+
+	set_bit(0, &mce_need_notify);
+
+	mce_notify_irq();
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block first_nb = {
+	.notifier_call	= mce_first_notifier,
+	.priority	= MCE_PRIO_FIRST,
+};
+
 static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
@@ -594,11 +666,7 @@ static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 	if (!m)
 		return NOTIFY_DONE;
 
-	/*
-	 * Run the default notifier if we have only the SRAO
-	 * notifier and us registered.
-	 */
-	if (atomic_read(&num_notifiers) > 2)
+	if (atomic_read(&num_notifiers) > NUM_DEFAULT_NOTIFIERS)
 		return NOTIFY_DONE;
 
 	/* Don't print when mcelog is running */
@@ -655,37 +723,6 @@ static void mce_read_aux(struct mce *m, int i)
 	}
 }
 
-static bool memory_error(struct mce *m)
-{
-	struct cpuinfo_x86 *c = &boot_cpu_data;
-
-	if (c->x86_vendor == X86_VENDOR_AMD) {
-		/* ErrCodeExt[20:16] */
-		u8 xec = (m->status >> 16) & 0x1f;
-
-		return (xec == 0x0 || xec == 0x8);
-	} else if (c->x86_vendor == X86_VENDOR_INTEL) {
-		/*
-		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
-		 *
-		 * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
-		 * indicating a memory error. Bit 8 is used for indicating a
-		 * cache hierarchy error. The combination of bit 2 and bit 3
-		 * is used for indicating a `generic' cache hierarchy error
-		 * But we can't just blindly check the above bits, because if
-		 * bit 11 is set, then it is a bus/interconnect error - and
-		 * either way the above bits just gives more detail on what
-		 * bus/interconnect error happened. Note that bit 12 can be
-		 * ignored, as it's the "filter" bit.
-		 */
-		return (m->status & 0xef80) == BIT(7) ||
-		       (m->status & 0xef00) == BIT(8) ||
-		       (m->status & 0xeffc) == 0xc;
-	}
-
-	return false;
-}
-
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
 /*
@@ -2167,6 +2204,7 @@ __setup("mce", mcheck_enable);
 int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
+	mce_register_decode_chain(&first_nb);
 	mce_register_decode_chain(&mce_srao_nb);
 	mce_register_decode_chain(&mce_default_nb);
 	mcheck_vendor_init_severity();
@@ -2716,6 +2754,7 @@ static int __init mcheck_late_init(void)
 		static_branch_inc(&mcsafe_key);
 
 	mcheck_debugfs_init();
+	cec_init();
 
 	/*
 	 * Flush out everything that has been logged during early boot, now that
diff --git a/arch/x86/ras/Kconfig b/arch/x86/ras/Kconfig
index 0bc60a308730..2a2d89d39af6 100644
--- a/arch/x86/ras/Kconfig
+++ b/arch/x86/ras/Kconfig
@@ -7,3 +7,17 @@ config MCE_AMD_INJ
 	  aspects of the MCE handling code.
 
 	  WARNING: Do not even assume this interface is staying stable!
+
+config RAS_CEC
+	bool "Correctable Errors Collector"
+	depends on X86_MCE && MEMORY_FAILURE && DEBUG_FS
+	---help---
+	  This is a small cache which collects correctable memory errors per 4K
+	  page PFN and counts their repeated occurrence. Once the counter for a
+	  PFN overflows, we try to soft-offline that page as we take it to mean
+	  that it has reached a relatively high error count and would probably
+	  be best if we don't use it anymore.
+
+	  Bear in mind that this is absolutely useless if your platform doesn't
+	  have ECC DIMMs and doesn't have DRAM ECC checking enabled in the BIOS.
+
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index d7f73341ced3..7b26dd3aa5d0 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -1 +1,2 @@
-obj-$(CONFIG_RAS) += ras.o debugfs.o
+obj-$(CONFIG_RAS)	+= ras.o debugfs.o
+obj-$(CONFIG_RAS_CEC)	+= cec.o
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
new file mode 100644
index 000000000000..6aab46d91d33
--- /dev/null
+++ b/drivers/ras/cec.c
@@ -0,0 +1,532 @@
+#include <linux/mm.h>
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+
+#include <asm/mce.h>
+
+#include "debugfs.h"
+
+/*
+ * RAS Correctable Errors Collector
+ *
+ * This is a simple gadget which collects correctable errors and counts their
+ * occurrence per physical page address.
+ *
+ * We've opted for possibly the simplest data structure to collect those - an
+ * array of the size of a memory page. It stores 512 u64's with the following
+ * structure:
+ *
+ * [63 ... PFN ... 12 | 11 ... generation ... 10 | 9 ... count ... 0]
+ *
+ * The generation in the two highest order bits is two bits which are set to 11b
+ * on every insertion. During the course of each entry's existence, the
+ * generation field gets decremented during spring cleaning to 10b, then 01b and
+ * then 00b.
+ *
+ * This way we're employing the natural numeric ordering to make sure that newly
+ * inserted/touched elements have higher 12-bit counts (which we've manufactured)
+ * and thus iterating over the array initially won't kick out those elements
+ * which were inserted last.
+ *
+ * Spring cleaning is what we do when we reach a certain number CLEAN_ELEMS of
+ * elements entered into the array, during which, we're decaying all elements.
+ * If, after decay, an element gets inserted again, its generation is set to 11b
+ * to make sure it has higher numerical count than other, older elements and
+ * thus emulate an an LRU-like behavior when deleting elements to free up space
+ * in the page.
+ *
+ * When an element reaches it's max count of count_threshold, we try to poison
+ * it by assuming that errors triggered count_threshold times in a single page
+ * are excessive and that page shouldn't be used anymore. count_threshold is
+ * initialized to COUNT_MASK which is the maximum.
+ *
+ * That error event entry causes cec_add_elem() to return !0 value and thus
+ * signal to its callers to log the error.
+ *
+ * To the question why we've chosen a page and moving elements around with
+ * memmove(), it is because it is a very simple structure to handle and max data
+ * movement is 4K which on highly optimized modern CPUs is almost unnoticeable.
+ * We wanted to avoid the pointer traversal of more complex structures like a
+ * linked list or some sort of a balancing search tree.
+ *
+ * Deleting an element takes O(n) but since it is only a single page, it should
+ * be fast enough and it shouldn't happen all too often depending on error
+ * patterns.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) "RAS: " fmt
+
+/*
+ * We use DECAY_BITS bits of PAGE_SHIFT bits for counting decay, i.e., how long
+ * elements have stayed in the array without having been accessed again.
+ */
+#define DECAY_BITS		2
+#define DECAY_MASK		((1ULL << DECAY_BITS) - 1)
+#define MAX_ELEMS		(PAGE_SIZE / sizeof(u64))
+
+/*
+ * Threshold amount of inserted elements after which we start spring
+ * cleaning.
+ */
+#define CLEAN_ELEMS		(MAX_ELEMS >> DECAY_BITS)
+
+/* Bits which count the number of errors happened in this 4K page. */
+#define COUNT_BITS		(PAGE_SHIFT - DECAY_BITS)
+#define COUNT_MASK		((1ULL << COUNT_BITS) - 1)
+#define FULL_COUNT_MASK		(PAGE_SIZE - 1)
+
+/*
+ * u64: [ 63 ... 12 | DECAY_BITS | COUNT_BITS ]
+ */
+
+#define PFN(e)			((e) >> PAGE_SHIFT)
+#define DECAY(e)		(((e) >> COUNT_BITS) & DECAY_MASK)
+#define COUNT(e)		((unsigned int)(e) & COUNT_MASK)
+#define FULL_COUNT(e)		((e) & (PAGE_SIZE - 1))
+
+static struct ce_array {
+	u64 *array;			/* container page */
+	unsigned int n;			/* number of elements in the array */
+
+	unsigned int decay_count;	/*
+					 * number of element insertions/increments
+					 * since the last spring cleaning.
+					 */
+
+	u64 pfns_poisoned;		/*
+					 * number of PFNs which got poisoned.
+					 */
+
+	u64 ces_entered;		/*
+					 * The number of correctable errors
+					 * entered into the collector.
+					 */
+
+	u64 decays_done;		/*
+					 * Times we did spring cleaning.
+					 */
+
+	union {
+		struct {
+			__u32	disabled : 1,	/* cmdline disabled */
+			__resv   : 31;
+		};
+		__u32 flags;
+	};
+} ce_arr;
+
+static DEFINE_MUTEX(ce_mutex);
+static u64 dfs_pfn;
+
+/* Amount of errors after which we offline */
+static unsigned int count_threshold = COUNT_MASK;
+
+/*
+ * The timer "decays" element count each timer_interval which is 24hrs by
+ * default.
+ */
+
+#define CEC_TIMER_DEFAULT_INTERVAL	24 * 60 * 60	/* 24 hrs */
+#define CEC_TIMER_MIN_INTERVAL		 1 * 60 * 60	/* 1h */
+#define CEC_TIMER_MAX_INTERVAL	   30 *	24 * 60 * 60	/* one month */
+static struct timer_list cec_timer;
+static u64 timer_interval = CEC_TIMER_DEFAULT_INTERVAL;
+
+/*
+ * Decrement decay value. We're using DECAY_BITS bits to denote decay of an
+ * element in the array. On insertion and any access, it gets reset to max.
+ */
+static void do_spring_cleaning(struct ce_array *ca)
+{
+	int i;
+
+	for (i = 0; i < ca->n; i++) {
+		u8 decay = DECAY(ca->array[i]);
+
+		if (!decay)
+			continue;
+
+		decay--;
+
+		ca->array[i] &= ~(DECAY_MASK << COUNT_BITS);
+		ca->array[i] |= (decay << COUNT_BITS);
+	}
+	ca->decay_count = 0;
+	ca->decays_done++;
+}
+
+/*
+ * @interval in seconds
+ */
+static void cec_mod_timer(struct timer_list *t, unsigned long interval)
+{
+	unsigned long iv;
+
+	iv = interval * HZ + jiffies;
+
+	mod_timer(t, round_jiffies(iv));
+}
+
+static void cec_timer_fn(unsigned long data)
+{
+	struct ce_array *ca = (struct ce_array *)data;
+
+	do_spring_cleaning(ca);
+
+	cec_mod_timer(&cec_timer, timer_interval);
+}
+
+/*
+ * @to: index of the smallest element which is >= then @pfn.
+ *
+ * Return the index of the pfn if found, otherwise negative value.
+ */
+static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
+{
+	u64 this_pfn;
+	int min = 0, max = ca->n;
+
+	while (min < max) {
+		int tmp = (max + min) >> 1;
+
+		this_pfn = PFN(ca->array[tmp]);
+
+		if (this_pfn < pfn)
+			min = tmp + 1;
+		else if (this_pfn > pfn)
+			max = tmp;
+		else {
+			min = tmp;
+			break;
+		}
+	}
+
+	if (to)
+		*to = min;
+
+	this_pfn = PFN(ca->array[min]);
+
+	if (this_pfn == pfn)
+		return min;
+
+	return -ENOKEY;
+}
+
+static int find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
+{
+	WARN_ON(!to);
+
+	if (!ca->n) {
+		*to = 0;
+		return -ENOKEY;
+	}
+	return __find_elem(ca, pfn, to);
+}
+
+static void del_elem(struct ce_array *ca, int idx)
+{
+	/* Save us a function call when deleting the last element. */
+	if (ca->n - (idx + 1))
+		memmove((void *)&ca->array[idx],
+			(void *)&ca->array[idx + 1],
+			(ca->n - (idx + 1)) * sizeof(u64));
+
+	ca->n--;
+}
+
+static u64 del_lru_elem_unlocked(struct ce_array *ca)
+{
+	unsigned int min = FULL_COUNT_MASK;
+	int i, min_idx = 0;
+
+	for (i = 0; i < ca->n; i++) {
+		unsigned int this = FULL_COUNT(ca->array[i]);
+
+		if (min > this) {
+			min = this;
+			min_idx = i;
+		}
+	}
+
+	del_elem(ca, min_idx);
+
+	return PFN(ca->array[min_idx]);
+}
+
+/*
+ * We return the 0th pfn in the error case under the assumption that it cannot
+ * be poisoned and excessive CEs in there are a serious deal anyway.
+ */
+static u64 __maybe_unused del_lru_elem(void)
+{
+	struct ce_array *ca = &ce_arr;
+	u64 pfn;
+
+	if (!ca->n)
+		return 0;
+
+	mutex_lock(&ce_mutex);
+	pfn = del_lru_elem_unlocked(ca);
+	mutex_unlock(&ce_mutex);
+
+	return pfn;
+}
+
+
+int cec_add_elem(u64 pfn)
+{
+	struct ce_array *ca = &ce_arr;
+	unsigned int to;
+	int count, ret = 0;
+
+	/*
+	 * We can be called very early on the identify_cpu() path where we are
+	 * not initialized yet. We ignore the error for simplicity.
+	 */
+	if (!ce_arr.array || ce_arr.disabled)
+		return -ENODEV;
+
+	ca->ces_entered++;
+
+	mutex_lock(&ce_mutex);
+
+	if (ca->n == MAX_ELEMS)
+		WARN_ON(!del_lru_elem_unlocked(ca));
+
+	ret = find_elem(ca, pfn, &to);
+	if (ret < 0) {
+		/*
+		 * Shift range [to-end] to make room for one more element.
+		 */
+		memmove((void *)&ca->array[to + 1],
+			(void *)&ca->array[to],
+			(ca->n - to) * sizeof(u64));
+
+		ca->array[to] = (pfn << PAGE_SHIFT) |
+				(DECAY_MASK << COUNT_BITS) | 1;
+
+		ca->n++;
+
+		ret = 0;
+
+		goto decay;
+	}
+
+	count = COUNT(ca->array[to]);
+
+	if (count < count_threshold) {
+		ca->array[to] |= (DECAY_MASK << COUNT_BITS);
+		ca->array[to]++;
+
+		ret = 0;
+	} else {
+		u64 pfn = ca->array[to] >> PAGE_SHIFT;
+
+		if (!pfn_valid(pfn)) {
+			pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
+		} else {
+			/* We have reached max count for this page, soft-offline it. */
+			pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
+			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
+			ca->pfns_poisoned++;
+		}
+
+		del_elem(ca, to);
+
+		/*
+		 * Return a >0 value to denote that we've reached the offlining
+		 * threshold.
+		 */
+		ret = 1;
+
+		goto unlock;
+	}
+
+decay:
+	ca->decay_count++;
+
+	if (ca->decay_count >= CLEAN_ELEMS)
+		do_spring_cleaning(ca);
+
+unlock:
+	mutex_unlock(&ce_mutex);
+
+	return ret;
+}
+
+static int u64_get(void *data, u64 *val)
+{
+	*val = *(u64 *)data;
+
+	return 0;
+}
+
+static int pfn_set(void *data, u64 val)
+{
+	*(u64 *)data = val;
+
+	return cec_add_elem(val);
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(pfn_ops, u64_get, pfn_set, "0x%llx\n");
+
+static int decay_interval_set(void *data, u64 val)
+{
+	*(u64 *)data = val;
+
+	if (val < CEC_TIMER_MIN_INTERVAL)
+		return -EINVAL;
+
+	if (val > CEC_TIMER_MAX_INTERVAL)
+		return -EINVAL;
+
+	timer_interval = val;
+
+	cec_mod_timer(&cec_timer, timer_interval);
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(decay_interval_ops, u64_get, decay_interval_set, "%lld\n");
+
+static int count_threshold_set(void *data, u64 val)
+{
+	*(u64 *)data = val;
+
+	if (val > COUNT_MASK)
+		val = COUNT_MASK;
+
+	count_threshold = val;
+
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(count_threshold_ops, u64_get, count_threshold_set, "%lld\n");
+
+static int array_dump(struct seq_file *m, void *v)
+{
+	struct ce_array *ca = &ce_arr;
+	u64 prev = 0;
+	int i;
+
+	mutex_lock(&ce_mutex);
+
+	seq_printf(m, "{ n: %d\n", ca->n);
+	for (i = 0; i < ca->n; i++) {
+		u64 this = PFN(ca->array[i]);
+
+		seq_printf(m, " %03d: [%016llx|%03llx]\n", i, this, FULL_COUNT(ca->array[i]));
+
+		WARN_ON(prev > this);
+
+		prev = this;
+	}
+
+	seq_printf(m, "}\n");
+
+	seq_printf(m, "Stats:\nCEs: %llu\nofflined pages: %llu\n",
+		   ca->ces_entered, ca->pfns_poisoned);
+
+	seq_printf(m, "Flags: 0x%x\n", ca->flags);
+
+	seq_printf(m, "Timer interval: %lld seconds\n", timer_interval);
+	seq_printf(m, "Decays: %lld\n", ca->decays_done);
+
+	seq_printf(m, "Action threshold: %d\n", count_threshold);
+
+	mutex_unlock(&ce_mutex);
+
+	return 0;
+}
+
+static int array_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, array_dump, NULL);
+}
+
+static const struct file_operations array_ops = {
+	.owner	 = THIS_MODULE,
+	.open	 = array_open,
+	.read	 = seq_read,
+	.llseek	 = seq_lseek,
+	.release = single_release,
+};
+
+static int __init create_debugfs_nodes(void)
+{
+	struct dentry *d, *pfn, *decay, *count, *array;
+
+	d = debugfs_create_dir("cec", ras_debugfs_dir);
+	if (!d) {
+		pr_warn("Error creating cec debugfs node!\n");
+		return -1;
+	}
+
+	pfn = debugfs_create_file("pfn", S_IRUSR | S_IWUSR, d, &dfs_pfn, &pfn_ops);
+	if (!pfn) {
+		pr_warn("Error creating pfn debugfs node!\n");
+		goto err;
+	}
+
+	array = debugfs_create_file("array", S_IRUSR, d, NULL, &array_ops);
+	if (!array) {
+		pr_warn("Error creating array debugfs node!\n");
+		goto err;
+	}
+
+	decay = debugfs_create_file("decay_interval", S_IRUSR | S_IWUSR, d,
+				    &timer_interval, &decay_interval_ops);
+	if (!decay) {
+		pr_warn("Error creating decay_interval debugfs node!\n");
+		goto err;
+	}
+
+	count = debugfs_create_file("count_threshold", S_IRUSR | S_IWUSR, d,
+				    &count_threshold, &count_threshold_ops);
+	if (!decay) {
+		pr_warn("Error creating count_threshold debugfs node!\n");
+		goto err;
+	}
+
+
+	return 0;
+
+err:
+	debugfs_remove_recursive(d);
+
+	return 1;
+}
+
+void __init cec_init(void)
+{
+	if (ce_arr.disabled)
+		return;
+
+	ce_arr.array = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!ce_arr.array) {
+		pr_err("Error allocating CE array page!\n");
+		return;
+	}
+
+	if (create_debugfs_nodes())
+		return;
+
+	setup_timer(&cec_timer, cec_timer_fn, (unsigned long)&ce_arr);
+	cec_mod_timer(&cec_timer, CEC_TIMER_DEFAULT_INTERVAL);
+
+	pr_info("Correctable Errors collector initialized.\n");
+}
+
+int __init parse_cec_param(char *str)
+{
+	if (!str)
+		return 0;
+
+	if (*str == '=')
+		str++;
+
+	if (!strncmp(str, "cec_disable", 7))
+		ce_arr.disabled = 1;
+	else
+		return 0;
+
+	return 1;
+}
diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
index 0322acf67ea5..501603057dff 100644
--- a/drivers/ras/debugfs.c
+++ b/drivers/ras/debugfs.c
@@ -1,6 +1,6 @@
 #include <linux/debugfs.h>
 
-static struct dentry *ras_debugfs_dir;
+struct dentry *ras_debugfs_dir;
 
 static atomic_t trace_count = ATOMIC_INIT(0);
 
diff --git a/drivers/ras/debugfs.h b/drivers/ras/debugfs.h
new file mode 100644
index 000000000000..db72e4513191
--- /dev/null
+++ b/drivers/ras/debugfs.h
@@ -0,0 +1,8 @@
+#ifndef __RAS_DEBUGFS_H__
+#define __RAS_DEBUGFS_H__
+
+#include <linux/debugfs.h>
+
+extern struct dentry *ras_debugfs_dir;
+
+#endif /* __RAS_DEBUGFS_H__ */
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index b67dd362b7b6..94f8038864b4 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -27,3 +27,14 @@ subsys_initcall(ras_init);
 EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
+
+
+int __init parse_ras_param(char *str)
+{
+#ifdef CONFIG_RAS_CEC
+	parse_cec_param(str);
+#endif
+
+	return 1;
+}
+__setup("ras", parse_ras_param);
diff --git a/include/linux/ras.h b/include/linux/ras.h
index 2aceeafd6fe5..ffb147185e8d 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -1,14 +1,25 @@
 #ifndef __RAS_H__
 #define __RAS_H__
 
+#include <asm/errno.h>
+
 #ifdef CONFIG_DEBUG_FS
 int ras_userspace_consumers(void);
 void ras_debugfs_init(void);
 int ras_add_daemon_trace(void);
 #else
 static inline int ras_userspace_consumers(void) { return 0; }
-static inline void ras_debugfs_init(void) { return; }
+static inline void ras_debugfs_init(void) { }
 static inline int ras_add_daemon_trace(void) { return 0; }
 #endif
 
+#ifdef CONFIG_RAS_CEC
+void __init cec_init(void);
+int __init parse_cec_param(char *str);
+int cec_add_elem(u64 pfn);
+#else
+static inline void __init cec_init(void)	{ }
+static inline int cec_add_elem(u64 pfn)		{ return -ENODEV; }
 #endif
+
+#endif /* __RAS_H__ */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] x86/mce: Deprecate /dev/mcelog
  2017-03-27  9:32 [PATCH 0/6] x86/RAS: Correctable Errors Collector Borislav Petkov
                   ` (3 preceding siblings ...)
  2017-03-27  9:33 ` [PATCH 4/6] RAS: Add a Corrected Errors Collector Borislav Petkov
@ 2017-03-27  9:33 ` Borislav Petkov
  2017-03-28  7:03   ` [tip:ras/core] x86/mce: Factor out and deprecate the /dev/mcelog driver tip-bot for Tony Luck
  2017-03-27  9:33 ` [PATCH 6/6] x86/mce: Do not register notifiers with invalid prio Borislav Petkov
  5 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2017-03-27  9:33 UTC (permalink / raw)
  To: X86 ML; +Cc: linux-edac, LKML

From: Tony Luck <tony.luck@intel.com>

Move all code relating to /dev/mcelog to a separate source file.
/dev/mcelog driver can now operate from the machine check notifier with
lowest prio.

Boris:
* Move the mce_helper and trigger functionality behind
CONFIG_X86_MCELOG.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/Kconfig                          |  10 +-
 arch/x86/include/asm/mce.h                |   1 +
 arch/x86/kernel/cpu/mcheck/Makefile       |   2 +
 arch/x86/kernel/cpu/mcheck/dev-mcelog.c   | 397 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |   8 +
 arch/x86/kernel/cpu/mcheck/mce.c          | 372 +---------------------------
 6 files changed, 426 insertions(+), 364 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mcheck/dev-mcelog.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8977d9c77373..b3de6fab7503 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1044,6 +1044,14 @@ config X86_MCE
 	  The action the kernel takes depends on the severity of the problem,
 	  ranging from warning messages to halting the machine.
 
+config X86_MCELOG
+	bool "Support for deprecated /dev/mcelog character device"
+	depends on X86_MCE
+	---help---
+	  Enable support for /dev/mcelog which is needed by the old mcelog
+	  userspace logging daemon. Consider switching to the new generation
+	  rasdaemon solution.
+
 config X86_MCE_INTEL
 	def_bool y
 	prompt "Intel MCE features"
@@ -1073,7 +1081,7 @@ config X86_MCE_THRESHOLD
 	def_bool y
 
 config X86_MCE_INJECT
-	depends on X86_MCE && X86_LOCAL_APIC
+	depends on X86_MCE && X86_LOCAL_APIC && X86_MCELOG
 	tristate "Machine check injector support"
 	---help---
 	  Provide support for injecting machine checks for testing purposes.
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index c5ae545d27d8..4fd5195deed0 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -196,6 +196,7 @@ enum mce_notifier_prios {
 	MCE_PRIO_EXTLOG		= INT_MAX - 2,
 	MCE_PRIO_NFIT		= INT_MAX - 3,
 	MCE_PRIO_EDAC		= INT_MAX - 4,
+	MCE_PRIO_MCELOG		= 1,
 	MCE_PRIO_LOWEST		= 0,
 };
 
diff --git a/arch/x86/kernel/cpu/mcheck/Makefile b/arch/x86/kernel/cpu/mcheck/Makefile
index a3311c886194..950e9ff602ea 100644
--- a/arch/x86/kernel/cpu/mcheck/Makefile
+++ b/arch/x86/kernel/cpu/mcheck/Makefile
@@ -9,3 +9,5 @@ obj-$(CONFIG_X86_MCE_INJECT)	+= mce-inject.o
 obj-$(CONFIG_X86_THERMAL_VECTOR) += therm_throt.o
 
 obj-$(CONFIG_ACPI_APEI)		+= mce-apei.o
+
+obj-$(CONFIG_X86_MCELOG)	+= dev-mcelog.o
diff --git a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
new file mode 100644
index 000000000000..9c632cb88546
--- /dev/null
+++ b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
@@ -0,0 +1,397 @@
+/*
+ * /dev/mcelog driver
+ *
+ * K8 parts Copyright 2002,2003 Andi Kleen, SuSE Labs.
+ * Rest from unknown author(s).
+ * 2004 Andi Kleen. Rewrote most of it.
+ * Copyright 2008 Intel Corporation
+ * Author: Andi Kleen
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/miscdevice.h>
+#include <linux/slab.h>
+#include <linux/kmod.h>
+#include <linux/poll.h>
+
+#include "mce-internal.h"
+
+static DEFINE_MUTEX(mce_chrdev_read_mutex);
+
+static char mce_helper[128];
+static char *mce_helper_argv[2] = { mce_helper, NULL };
+
+#define mce_log_get_idx_check(p) \
+({ \
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
+			 !lockdep_is_held(&mce_chrdev_read_mutex), \
+			 "suspicious mce_log_get_idx_check() usage"); \
+	smp_load_acquire(&(p)); \
+})
+
+/*
+ * Lockless MCE logging infrastructure.
+ * This avoids deadlocks on printk locks without having to break locks. Also
+ * separate MCEs from kernel messages to avoid bogus bug reports.
+ */
+
+static struct mce_log_buffer mcelog = {
+	.signature	= MCE_LOG_SIGNATURE,
+	.len		= MCE_LOG_LEN,
+	.recordlen	= sizeof(struct mce),
+};
+
+static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
+
+/* User mode helper program triggered by machine check event */
+extern char			mce_helper[128];
+
+static int dev_mce_log(struct notifier_block *nb, unsigned long val,
+				void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	unsigned int next, entry;
+
+	wmb();
+	for (;;) {
+		entry = mce_log_get_idx_check(mcelog.next);
+		for (;;) {
+
+			/*
+			 * When the buffer fills up discard new entries.
+			 * Assume that the earlier errors are the more
+			 * interesting ones:
+			 */
+			if (entry >= MCE_LOG_LEN) {
+				set_bit(MCE_OVERFLOW,
+					(unsigned long *)&mcelog.flags);
+				return NOTIFY_OK;
+			}
+			/* Old left over entry. Skip: */
+			if (mcelog.entry[entry].finished) {
+				entry++;
+				continue;
+			}
+			break;
+		}
+		smp_rmb();
+		next = entry + 1;
+		if (cmpxchg(&mcelog.next, entry, next) == entry)
+			break;
+	}
+	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
+	wmb();
+	mcelog.entry[entry].finished = 1;
+	wmb();
+
+	/* wake processes polling /dev/mcelog */
+	wake_up_interruptible(&mce_chrdev_wait);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block dev_mcelog_nb = {
+	.notifier_call	= dev_mce_log,
+	.priority	= MCE_PRIO_MCELOG,
+};
+
+static void mce_do_trigger(struct work_struct *work)
+{
+	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
+}
+
+static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
+
+
+void mce_work_trigger(void)
+{
+	if (mce_helper[0])
+		schedule_work(&mce_trigger_work);
+}
+
+static ssize_t
+show_trigger(struct device *s, struct device_attribute *attr, char *buf)
+{
+	strcpy(buf, mce_helper);
+	strcat(buf, "\n");
+	return strlen(mce_helper) + 1;
+}
+
+static ssize_t set_trigger(struct device *s, struct device_attribute *attr,
+				const char *buf, size_t siz)
+{
+	char *p;
+
+	strncpy(mce_helper, buf, sizeof(mce_helper));
+	mce_helper[sizeof(mce_helper)-1] = 0;
+	p = strchr(mce_helper, '\n');
+
+	if (p)
+		*p = 0;
+
+	return strlen(mce_helper) + !!p;
+}
+
+DEVICE_ATTR(trigger, 0644, show_trigger, set_trigger);
+
+/*
+ * mce_chrdev: Character device /dev/mcelog to read and clear the MCE log.
+ */
+
+static DEFINE_SPINLOCK(mce_chrdev_state_lock);
+static int mce_chrdev_open_count;	/* #times opened */
+static int mce_chrdev_open_exclu;	/* already open exclusive? */
+
+static int mce_chrdev_open(struct inode *inode, struct file *file)
+{
+	spin_lock(&mce_chrdev_state_lock);
+
+	if (mce_chrdev_open_exclu ||
+	    (mce_chrdev_open_count && (file->f_flags & O_EXCL))) {
+		spin_unlock(&mce_chrdev_state_lock);
+
+		return -EBUSY;
+	}
+
+	if (file->f_flags & O_EXCL)
+		mce_chrdev_open_exclu = 1;
+	mce_chrdev_open_count++;
+
+	spin_unlock(&mce_chrdev_state_lock);
+
+	return nonseekable_open(inode, file);
+}
+
+static int mce_chrdev_release(struct inode *inode, struct file *file)
+{
+	spin_lock(&mce_chrdev_state_lock);
+
+	mce_chrdev_open_count--;
+	mce_chrdev_open_exclu = 0;
+
+	spin_unlock(&mce_chrdev_state_lock);
+
+	return 0;
+}
+
+static void collect_tscs(void *data)
+{
+	unsigned long *cpu_tsc = (unsigned long *)data;
+
+	cpu_tsc[smp_processor_id()] = rdtsc();
+}
+
+static int mce_apei_read_done;
+
+/* Collect MCE record of previous boot in persistent storage via APEI ERST. */
+static int __mce_read_apei(char __user **ubuf, size_t usize)
+{
+	int rc;
+	u64 record_id;
+	struct mce m;
+
+	if (usize < sizeof(struct mce))
+		return -EINVAL;
+
+	rc = apei_read_mce(&m, &record_id);
+	/* Error or no more MCE record */
+	if (rc <= 0) {
+		mce_apei_read_done = 1;
+		/*
+		 * When ERST is disabled, mce_chrdev_read() should return
+		 * "no record" instead of "no device."
+		 */
+		if (rc == -ENODEV)
+			return 0;
+		return rc;
+	}
+	rc = -EFAULT;
+	if (copy_to_user(*ubuf, &m, sizeof(struct mce)))
+		return rc;
+	/*
+	 * In fact, we should have cleared the record after that has
+	 * been flushed to the disk or sent to network in
+	 * /sbin/mcelog, but we have no interface to support that now,
+	 * so just clear it to avoid duplication.
+	 */
+	rc = apei_clear_mce(record_id);
+	if (rc) {
+		mce_apei_read_done = 1;
+		return rc;
+	}
+	*ubuf += sizeof(struct mce);
+
+	return 0;
+}
+
+static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
+				size_t usize, loff_t *off)
+{
+	char __user *buf = ubuf;
+	unsigned long *cpu_tsc;
+	unsigned prev, next;
+	int i, err;
+
+	cpu_tsc = kmalloc(nr_cpu_ids * sizeof(long), GFP_KERNEL);
+	if (!cpu_tsc)
+		return -ENOMEM;
+
+	mutex_lock(&mce_chrdev_read_mutex);
+
+	if (!mce_apei_read_done) {
+		err = __mce_read_apei(&buf, usize);
+		if (err || buf != ubuf)
+			goto out;
+	}
+
+	next = mce_log_get_idx_check(mcelog.next);
+
+	/* Only supports full reads right now */
+	err = -EINVAL;
+	if (*off != 0 || usize < MCE_LOG_LEN*sizeof(struct mce))
+		goto out;
+
+	err = 0;
+	prev = 0;
+	do {
+		for (i = prev; i < next; i++) {
+			unsigned long start = jiffies;
+			struct mce *m = &mcelog.entry[i];
+
+			while (!m->finished) {
+				if (time_after_eq(jiffies, start + 2)) {
+					memset(m, 0, sizeof(*m));
+					goto timeout;
+				}
+				cpu_relax();
+			}
+			smp_rmb();
+			err |= copy_to_user(buf, m, sizeof(*m));
+			buf += sizeof(*m);
+timeout:
+			;
+		}
+
+		memset(mcelog.entry + prev, 0,
+		       (next - prev) * sizeof(struct mce));
+		prev = next;
+		next = cmpxchg(&mcelog.next, prev, 0);
+	} while (next != prev);
+
+	synchronize_sched();
+
+	/*
+	 * Collect entries that were still getting written before the
+	 * synchronize.
+	 */
+	on_each_cpu(collect_tscs, cpu_tsc, 1);
+
+	for (i = next; i < MCE_LOG_LEN; i++) {
+		struct mce *m = &mcelog.entry[i];
+
+		if (m->finished && m->tsc < cpu_tsc[m->cpu]) {
+			err |= copy_to_user(buf, m, sizeof(*m));
+			smp_rmb();
+			buf += sizeof(*m);
+			memset(m, 0, sizeof(*m));
+		}
+	}
+
+	if (err)
+		err = -EFAULT;
+
+out:
+	mutex_unlock(&mce_chrdev_read_mutex);
+	kfree(cpu_tsc);
+
+	return err ? err : buf - ubuf;
+}
+
+static unsigned int mce_chrdev_poll(struct file *file, poll_table *wait)
+{
+	poll_wait(file, &mce_chrdev_wait, wait);
+	if (READ_ONCE(mcelog.next))
+		return POLLIN | POLLRDNORM;
+	if (!mce_apei_read_done && apei_check_mce())
+		return POLLIN | POLLRDNORM;
+	return 0;
+}
+
+static long mce_chrdev_ioctl(struct file *f, unsigned int cmd,
+				unsigned long arg)
+{
+	int __user *p = (int __user *)arg;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	switch (cmd) {
+	case MCE_GET_RECORD_LEN:
+		return put_user(sizeof(struct mce), p);
+	case MCE_GET_LOG_LEN:
+		return put_user(MCE_LOG_LEN, p);
+	case MCE_GETCLEAR_FLAGS: {
+		unsigned flags;
+
+		do {
+			flags = mcelog.flags;
+		} while (cmpxchg(&mcelog.flags, flags, 0) != flags);
+
+		return put_user(flags, p);
+	}
+	default:
+		return -ENOTTY;
+	}
+}
+
+static ssize_t (*mce_write)(struct file *filp, const char __user *ubuf,
+			    size_t usize, loff_t *off);
+
+void register_mce_write_callback(ssize_t (*fn)(struct file *filp,
+			     const char __user *ubuf,
+			     size_t usize, loff_t *off))
+{
+	mce_write = fn;
+}
+EXPORT_SYMBOL_GPL(register_mce_write_callback);
+
+static ssize_t mce_chrdev_write(struct file *filp, const char __user *ubuf,
+				size_t usize, loff_t *off)
+{
+	if (mce_write)
+		return mce_write(filp, ubuf, usize, off);
+	else
+		return -EINVAL;
+}
+
+static const struct file_operations mce_chrdev_ops = {
+	.open			= mce_chrdev_open,
+	.release		= mce_chrdev_release,
+	.read			= mce_chrdev_read,
+	.write			= mce_chrdev_write,
+	.poll			= mce_chrdev_poll,
+	.unlocked_ioctl		= mce_chrdev_ioctl,
+	.llseek			= no_llseek,
+};
+
+static struct miscdevice mce_chrdev_device = {
+	MISC_MCELOG_MINOR,
+	"mcelog",
+	&mce_chrdev_ops,
+};
+
+static __init int dev_mcelog_init_device(void)
+{
+	int err;
+
+	/* register character device /dev/mcelog */
+	err = misc_register(&mce_chrdev_device);
+	if (err) {
+		pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
+		return err;
+	}
+	mce_register_decode_chain(&dev_mcelog_nb);
+	return 0;
+}
+device_initcall_sync(dev_mcelog_init_device);
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 903043e6a62b..f096d64485d0 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -96,3 +96,11 @@ static inline bool mce_cmp(struct mce *m1, struct mce *m2)
 		m1->addr != m2->addr ||
 		m1->misc != m2->misc;
 }
+
+extern struct device_attribute dev_attr_trigger;
+
+#ifdef CONFIG_X86_MCELOG
+extern void mce_work_trigger(void);
+#else
+static inline void mce_work_trigger(void)	{ }
+#endif
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4a907758a516..8e119156429b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -54,17 +54,7 @@
 
 #include "mce-internal.h"
 
-static DEFINE_MUTEX(mce_chrdev_read_mutex);
-
-static int mce_chrdev_open_count;	/* #times opened */
-
-#define mce_log_get_idx_check(p) \
-({ \
-	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
-			 !lockdep_is_held(&mce_chrdev_read_mutex), \
-			 "suspicious mce_log_get_idx_check() usage"); \
-	smp_load_acquire(&(p)); \
-})
+static DEFINE_MUTEX(mce_log_mutex);
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/mce.h>
@@ -89,15 +79,9 @@ struct mca_config mca_cfg __read_mostly = {
 	.monarch_timeout = -1
 };
 
-/* User mode helper program triggered by machine check event */
-static unsigned long		mce_need_notify;
-static char			mce_helper[128];
-static char			*mce_helper_argv[2] = { mce_helper, NULL };
-
-static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
-
 static DEFINE_PER_CPU(struct mce, mces_seen);
-static int			cpu_missing;
+static unsigned long mce_need_notify;
+static int cpu_missing;
 
 /*
  * MCA banks polled by the period polling timer for corrected events.
@@ -147,18 +131,6 @@ void mce_setup(struct mce *m)
 DEFINE_PER_CPU(struct mce, injectm);
 EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
-/*
- * Lockless MCE logging infrastructure.
- * This avoids deadlocks on printk locks without having to break locks. Also
- * separate MCEs from kernel messages to avoid bogus bug reports.
- */
-
-static struct mce_log_buffer mcelog_buf = {
-	.signature	= MCE_LOG_SIGNATURE,
-	.len		= MCE_LOG_LEN,
-	.recordlen	= sizeof(struct mce),
-};
-
 void mce_log(struct mce *m)
 {
 	if (!mce_gen_pool_add(m))
@@ -167,9 +139,9 @@ void mce_log(struct mce *m)
 
 void mce_inject_log(struct mce *m)
 {
-	mutex_lock(&mce_chrdev_read_mutex);
+	mutex_lock(&mce_log_mutex);
 	mce_log(m);
-	mutex_unlock(&mce_chrdev_read_mutex);
+	mutex_unlock(&mce_log_mutex);
 }
 EXPORT_SYMBOL_GPL(mce_inject_log);
 
@@ -582,7 +554,6 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
 	struct mce *m = (struct mce *)data;
-	unsigned int next, entry;
 
 	if (!m)
 		return NOTIFY_DONE;
@@ -593,38 +564,6 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val,
 	/* Emit the trace record: */
 	trace_mce_record(m);
 
-	wmb();
-	for (;;) {
-		entry = mce_log_get_idx_check(mcelog_buf.next);
-		for (;;) {
-
-			/*
-			 * When the buffer fills up discard new entries.
-			 * Assume that the earlier errors are the more
-			 * interesting ones:
-			 */
-			if (entry >= MCE_LOG_LEN) {
-				set_bit(MCE_OVERFLOW,
-					(unsigned long *)&mcelog_buf.flags);
-				return NOTIFY_DONE;
-			}
-			/* Old left over entry. Skip: */
-			if (mcelog_buf.entry[entry].finished) {
-				entry++;
-				continue;
-			}
-			break;
-		}
-		smp_rmb();
-		next = entry + 1;
-		if (cmpxchg(&mcelog_buf.next, entry, next) == entry)
-			break;
-	}
-	memcpy(mcelog_buf.entry + entry, m, sizeof(struct mce));
-	wmb();
-	mcelog_buf.entry[entry].finished = 1;
-	wmb();
-
 	set_bit(0, &mce_need_notify);
 
 	mce_notify_irq();
@@ -669,10 +608,6 @@ static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 	if (atomic_read(&num_notifiers) > NUM_DEFAULT_NOTIFIERS)
 		return NOTIFY_DONE;
 
-	/* Don't print when mcelog is running */
-	if (mce_chrdev_open_count > 0)
-		return NOTIFY_DONE;
-
 	__print_mce(m);
 
 	return NOTIFY_DONE;
@@ -1456,13 +1391,6 @@ static void mce_timer_delete_all(void)
 		del_timer_sync(&per_cpu(mce_timer, cpu));
 }
 
-static void mce_do_trigger(struct work_struct *work)
-{
-	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
-}
-
-static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
-
 /*
  * Notify the user(s) about new machine check events.
  * Can be called from interrupt context, but not from machine check/NMI
@@ -1474,11 +1402,7 @@ int mce_notify_irq(void)
 	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
 
 	if (test_and_clear_bit(0, &mce_need_notify)) {
-		/* wake processes polling /dev/mcelog */
-		wake_up_interruptible(&mce_chrdev_wait);
-
-		if (mce_helper[0])
-			schedule_work(&mce_trigger_work);
+		mce_work_trigger();
 
 		if (__ratelimit(&ratelimit))
 			pr_info(HW_ERR "Machine check events logged\n");
@@ -1886,251 +1810,6 @@ void mcheck_cpu_clear(struct cpuinfo_x86 *c)
 
 }
 
-/*
- * mce_chrdev: Character device /dev/mcelog to read and clear the MCE log.
- */
-
-static DEFINE_SPINLOCK(mce_chrdev_state_lock);
-static int mce_chrdev_open_exclu;	/* already open exclusive? */
-
-static int mce_chrdev_open(struct inode *inode, struct file *file)
-{
-	spin_lock(&mce_chrdev_state_lock);
-
-	if (mce_chrdev_open_exclu ||
-	    (mce_chrdev_open_count && (file->f_flags & O_EXCL))) {
-		spin_unlock(&mce_chrdev_state_lock);
-
-		return -EBUSY;
-	}
-
-	if (file->f_flags & O_EXCL)
-		mce_chrdev_open_exclu = 1;
-	mce_chrdev_open_count++;
-
-	spin_unlock(&mce_chrdev_state_lock);
-
-	return nonseekable_open(inode, file);
-}
-
-static int mce_chrdev_release(struct inode *inode, struct file *file)
-{
-	spin_lock(&mce_chrdev_state_lock);
-
-	mce_chrdev_open_count--;
-	mce_chrdev_open_exclu = 0;
-
-	spin_unlock(&mce_chrdev_state_lock);
-
-	return 0;
-}
-
-static void collect_tscs(void *data)
-{
-	unsigned long *cpu_tsc = (unsigned long *)data;
-
-	cpu_tsc[smp_processor_id()] = rdtsc();
-}
-
-static int mce_apei_read_done;
-
-/* Collect MCE record of previous boot in persistent storage via APEI ERST. */
-static int __mce_read_apei(char __user **ubuf, size_t usize)
-{
-	int rc;
-	u64 record_id;
-	struct mce m;
-
-	if (usize < sizeof(struct mce))
-		return -EINVAL;
-
-	rc = apei_read_mce(&m, &record_id);
-	/* Error or no more MCE record */
-	if (rc <= 0) {
-		mce_apei_read_done = 1;
-		/*
-		 * When ERST is disabled, mce_chrdev_read() should return
-		 * "no record" instead of "no device."
-		 */
-		if (rc == -ENODEV)
-			return 0;
-		return rc;
-	}
-	rc = -EFAULT;
-	if (copy_to_user(*ubuf, &m, sizeof(struct mce)))
-		return rc;
-	/*
-	 * In fact, we should have cleared the record after that has
-	 * been flushed to the disk or sent to network in
-	 * /sbin/mcelog, but we have no interface to support that now,
-	 * so just clear it to avoid duplication.
-	 */
-	rc = apei_clear_mce(record_id);
-	if (rc) {
-		mce_apei_read_done = 1;
-		return rc;
-	}
-	*ubuf += sizeof(struct mce);
-
-	return 0;
-}
-
-static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
-				size_t usize, loff_t *off)
-{
-	char __user *buf = ubuf;
-	unsigned long *cpu_tsc;
-	unsigned prev, next;
-	int i, err;
-
-	cpu_tsc = kmalloc(nr_cpu_ids * sizeof(long), GFP_KERNEL);
-	if (!cpu_tsc)
-		return -ENOMEM;
-
-	mutex_lock(&mce_chrdev_read_mutex);
-
-	if (!mce_apei_read_done) {
-		err = __mce_read_apei(&buf, usize);
-		if (err || buf != ubuf)
-			goto out;
-	}
-
-	next = mce_log_get_idx_check(mcelog_buf.next);
-
-	/* Only supports full reads right now */
-	err = -EINVAL;
-	if (*off != 0 || usize < MCE_LOG_LEN*sizeof(struct mce))
-		goto out;
-
-	err = 0;
-	prev = 0;
-	do {
-		for (i = prev; i < next; i++) {
-			unsigned long start = jiffies;
-			struct mce *m = &mcelog_buf.entry[i];
-
-			while (!m->finished) {
-				if (time_after_eq(jiffies, start + 2)) {
-					memset(m, 0, sizeof(*m));
-					goto timeout;
-				}
-				cpu_relax();
-			}
-			smp_rmb();
-			err |= copy_to_user(buf, m, sizeof(*m));
-			buf += sizeof(*m);
-timeout:
-			;
-		}
-
-		memset(mcelog_buf.entry + prev, 0,
-		       (next - prev) * sizeof(struct mce));
-		prev = next;
-		next = cmpxchg(&mcelog_buf.next, prev, 0);
-	} while (next != prev);
-
-	synchronize_sched();
-
-	/*
-	 * Collect entries that were still getting written before the
-	 * synchronize.
-	 */
-	on_each_cpu(collect_tscs, cpu_tsc, 1);
-
-	for (i = next; i < MCE_LOG_LEN; i++) {
-		struct mce *m = &mcelog_buf.entry[i];
-
-		if (m->finished && m->tsc < cpu_tsc[m->cpu]) {
-			err |= copy_to_user(buf, m, sizeof(*m));
-			smp_rmb();
-			buf += sizeof(*m);
-			memset(m, 0, sizeof(*m));
-		}
-	}
-
-	if (err)
-		err = -EFAULT;
-
-out:
-	mutex_unlock(&mce_chrdev_read_mutex);
-	kfree(cpu_tsc);
-
-	return err ? err : buf - ubuf;
-}
-
-static unsigned int mce_chrdev_poll(struct file *file, poll_table *wait)
-{
-	poll_wait(file, &mce_chrdev_wait, wait);
-	if (READ_ONCE(mcelog_buf.next))
-		return POLLIN | POLLRDNORM;
-	if (!mce_apei_read_done && apei_check_mce())
-		return POLLIN | POLLRDNORM;
-	return 0;
-}
-
-static long mce_chrdev_ioctl(struct file *f, unsigned int cmd,
-				unsigned long arg)
-{
-	int __user *p = (int __user *)arg;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	switch (cmd) {
-	case MCE_GET_RECORD_LEN:
-		return put_user(sizeof(struct mce), p);
-	case MCE_GET_LOG_LEN:
-		return put_user(MCE_LOG_LEN, p);
-	case MCE_GETCLEAR_FLAGS: {
-		unsigned flags;
-
-		do {
-			flags = mcelog_buf.flags;
-		} while (cmpxchg(&mcelog_buf.flags, flags, 0) != flags);
-
-		return put_user(flags, p);
-	}
-	default:
-		return -ENOTTY;
-	}
-}
-
-static ssize_t (*mce_write)(struct file *filp, const char __user *ubuf,
-			    size_t usize, loff_t *off);
-
-void register_mce_write_callback(ssize_t (*fn)(struct file *filp,
-			     const char __user *ubuf,
-			     size_t usize, loff_t *off))
-{
-	mce_write = fn;
-}
-EXPORT_SYMBOL_GPL(register_mce_write_callback);
-
-static ssize_t mce_chrdev_write(struct file *filp, const char __user *ubuf,
-				size_t usize, loff_t *off)
-{
-	if (mce_write)
-		return mce_write(filp, ubuf, usize, off);
-	else
-		return -EINVAL;
-}
-
-static const struct file_operations mce_chrdev_ops = {
-	.open			= mce_chrdev_open,
-	.release		= mce_chrdev_release,
-	.read			= mce_chrdev_read,
-	.write			= mce_chrdev_write,
-	.poll			= mce_chrdev_poll,
-	.unlocked_ioctl		= mce_chrdev_ioctl,
-	.llseek			= no_llseek,
-};
-
-static struct miscdevice mce_chrdev_device = {
-	MISC_MCELOG_MINOR,
-	"mcelog",
-	&mce_chrdev_ops,
-};
-
 static void __mce_disable_bank(void *arg)
 {
 	int bank = *((int *)arg);
@@ -2349,29 +2028,6 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 	return size;
 }
 
-static ssize_t
-show_trigger(struct device *s, struct device_attribute *attr, char *buf)
-{
-	strcpy(buf, mce_helper);
-	strcat(buf, "\n");
-	return strlen(mce_helper) + 1;
-}
-
-static ssize_t set_trigger(struct device *s, struct device_attribute *attr,
-				const char *buf, size_t siz)
-{
-	char *p;
-
-	strncpy(mce_helper, buf, sizeof(mce_helper));
-	mce_helper[sizeof(mce_helper)-1] = 0;
-	p = strchr(mce_helper, '\n');
-
-	if (p)
-		*p = 0;
-
-	return strlen(mce_helper) + !!p;
-}
-
 static ssize_t set_ignore_ce(struct device *s,
 			     struct device_attribute *attr,
 			     const char *buf, size_t size)
@@ -2428,7 +2084,6 @@ static ssize_t store_int_with_restart(struct device *s,
 	return ret;
 }
 
-static DEVICE_ATTR(trigger, 0644, show_trigger, set_trigger);
 static DEVICE_INT_ATTR(tolerant, 0644, mca_cfg.tolerant);
 static DEVICE_INT_ATTR(monarch_timeout, 0644, mca_cfg.monarch_timeout);
 static DEVICE_BOOL_ATTR(dont_log_ce, 0644, mca_cfg.dont_log_ce);
@@ -2451,7 +2106,9 @@ static struct dev_ext_attribute dev_attr_cmci_disabled = {
 static struct device_attribute *mce_device_attrs[] = {
 	&dev_attr_tolerant.attr,
 	&dev_attr_check_interval.attr,
+#ifdef CONFIG_X86_MCELOG
 	&dev_attr_trigger,
+#endif
 	&dev_attr_monarch_timeout.attr,
 	&dev_attr_dont_log_ce.attr,
 	&dev_attr_ignore_ce.attr,
@@ -2625,7 +2282,6 @@ static __init void mce_init_banks(void)
 
 static __init int mcheck_init_device(void)
 {
-	enum cpuhp_state hp_online;
 	int err;
 
 	if (!mce_available(&boot_cpu_data)) {
@@ -2653,21 +2309,11 @@ static __init int mcheck_init_device(void)
 				mce_cpu_online, mce_cpu_pre_down);
 	if (err < 0)
 		goto err_out_online;
-	hp_online = err;
 
 	register_syscore_ops(&mce_syscore_ops);
 
-	/* register character device /dev/mcelog */
-	err = misc_register(&mce_chrdev_device);
-	if (err)
-		goto err_register;
-
 	return 0;
 
-err_register:
-	unregister_syscore_ops(&mce_syscore_ops);
-	cpuhp_remove_state(hp_online);
-
 err_out_online:
 	cpuhp_remove_state(CPUHP_X86_MCE_DEAD);
 
@@ -2675,7 +2321,7 @@ static __init int mcheck_init_device(void)
 	free_cpumask_var(mce_device_initialized);
 
 err_out:
-	pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
+	pr_err("Unable to init MCE device (rc: %d)\n", err);
 
 	return err;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] x86/mce: Do not register notifiers with invalid prio
  2017-03-27  9:32 [PATCH 0/6] x86/RAS: Correctable Errors Collector Borislav Petkov
                   ` (4 preceding siblings ...)
  2017-03-27  9:33 ` [PATCH 5/6] x86/mce: Deprecate /dev/mcelog Borislav Petkov
@ 2017-03-27  9:33 ` Borislav Petkov
  2017-03-28  7:04   ` [tip:ras/core] " tip-bot for Borislav Petkov
  5 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2017-03-27  9:33 UTC (permalink / raw)
  To: X86 ML; +Cc: linux-edac, LKML

From: Borislav Petkov <bp@suse.de>

This is just a defensive precaution: do not register notifiers with a
priority which would disrupt the error handling in the notifiers with
prio higher than MCE_PRIO_EDAC.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8e119156429b..4b57b8336ecd 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -157,9 +157,10 @@ static atomic_t num_notifiers;
 
 void mce_register_decode_chain(struct notifier_block *nb)
 {
-	atomic_inc(&num_notifiers);
+	if (WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC))
+		return;
 
-	WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC);
+	atomic_inc(&num_notifiers);
 
 	atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Don't print MCEs when mcelog is active
  2017-03-27  9:32 ` [PATCH 1/6] x86/mce: Don't print MCEs when mcelog is active Borislav Petkov
@ 2017-03-28  7:01   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Andi Kleen @ 2017-03-28  7:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tglx, ak, hpa, linux-edac, peterz, mingo, bp,
	tony.luck, torvalds

Commit-ID:  cc66afea58f858ff6da7f79b8a595a67bbb4f9a9
Gitweb:     http://git.kernel.org/tip/cc66afea58f858ff6da7f79b8a595a67bbb4f9a9
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Mon, 27 Mar 2017 11:32:59 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Mar 2017 08:53:52 +0200

x86/mce: Don't print MCEs when mcelog is active

Since:

  cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")

all MCEs are printed even when mcelog is running. Fix the regression to
not print to dmesg when mcelog is running as it is a consumer too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.10..
Fixes: cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")
Link: http://lkml.kernel.org/r/20170327093304.10683-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8e9725c..5accfbd 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -54,6 +54,8 @@
 
 static DEFINE_MUTEX(mce_chrdev_read_mutex);
 
+static int mce_chrdev_open_count;	/* #times opened */
+
 #define mce_log_get_idx_check(p) \
 ({ \
 	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
@@ -598,6 +600,10 @@ static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 	if (atomic_read(&num_notifiers) > 2)
 		return NOTIFY_DONE;
 
+	/* Don't print when mcelog is running */
+	if (mce_chrdev_open_count > 0)
+		return NOTIFY_DONE;
+
 	__print_mce(m);
 
 	return NOTIFY_DONE;
@@ -1828,7 +1834,6 @@ void mcheck_cpu_clear(struct cpuinfo_x86 *c)
  */
 
 static DEFINE_SPINLOCK(mce_chrdev_state_lock);
-static int mce_chrdev_open_count;	/* #times opened */
 static int mce_chrdev_open_exclu;	/* already open exclusive? */
 
 static int mce_chrdev_open(struct inode *inode, struct file *file)

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Rename mce_log()'s argument
  2017-03-27  9:33 ` [PATCH 2/6] x86/MCE: Rename mce_log()'s argument Borislav Petkov
@ 2017-03-28  7:01   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2017-03-28  7:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, peterz, hpa, bp, linux-edac, linux-kernel, torvalds, mingo

Commit-ID:  fe3ed20fddfd8c6a4dcfb8e43de80f269d1f3f2a
Gitweb:     http://git.kernel.org/tip/fe3ed20fddfd8c6a4dcfb8e43de80f269d1f3f2a
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Mon, 27 Mar 2017 11:33:00 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Mar 2017 08:54:38 +0200

x86/mce: Rename mce_log()'s argument

We call it everywhere "struct mce *m". Adjust that here too to avoid
confusion.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e95e027..49673b2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -158,14 +158,14 @@ static struct mce_log mcelog = {
 	.recordlen	= sizeof(struct mce),
 };
 
-void mce_log(struct mce *mce)
+void mce_log(struct mce *m)
 {
 	unsigned next, entry;
 
 	/* Emit the trace record: */
-	trace_mce_record(mce);
+	trace_mce_record(m);
 
-	if (!mce_gen_pool_add(mce))
+	if (!mce_gen_pool_add(m))
 		irq_work_queue(&mce_irq_work);
 
 	wmb();
@@ -195,7 +195,7 @@ void mce_log(struct mce *mce)
 		if (cmpxchg(&mcelog.next, entry, next) == entry)
 			break;
 	}
-	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
+	memcpy(mcelog.entry + entry, m, sizeof(struct mce));
 	wmb();
 	mcelog.entry[entry].finished = 1;
 	wmb();

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Rename mce_log to mce_log_buffer
  2017-03-27  9:33 ` [PATCH 3/6] x86/MCE: Rename mce_log to mce_log_buffer Borislav Petkov
@ 2017-03-28  7:02   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2017-03-28  7:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, peterz, mingo, linux-edac, linux-kernel, hpa, bp, tglx

Commit-ID:  e64edfcce9c738300b4102d0739577d6ecc96d4a
Gitweb:     http://git.kernel.org/tip/e64edfcce9c738300b4102d0739577d6ecc96d4a
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Mon, 27 Mar 2017 11:33:01 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Mar 2017 08:54:42 +0200

x86/mce: Rename mce_log to mce_log_buffer

It is confusing when staring at "struct mce_log mcelog" and then there's
also a function called mce_log(). So call the buffer what it is.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mce.h       |  2 +-
 arch/x86/kernel/cpu/mcheck/mce.c | 30 +++++++++++++++---------------
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e638736..0512dcc 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -128,7 +128,7 @@
  * debugging tools.  Each entry is only valid when its finished flag
  * is set.
  */
-struct mce_log {
+struct mce_log_buffer {
 	char signature[12]; /* "MACHINECHECK" */
 	unsigned len;	    /* = MCE_LOG_LEN */
 	unsigned next;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 49673b2..8ada093 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -152,7 +152,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
  * separate MCEs from kernel messages to avoid bogus bug reports.
  */
 
-static struct mce_log mcelog = {
+static struct mce_log_buffer mcelog_buf = {
 	.signature	= MCE_LOG_SIGNATURE,
 	.len		= MCE_LOG_LEN,
 	.recordlen	= sizeof(struct mce),
@@ -170,7 +170,7 @@ void mce_log(struct mce *m)
 
 	wmb();
 	for (;;) {
-		entry = mce_log_get_idx_check(mcelog.next);
+		entry = mce_log_get_idx_check(mcelog_buf.next);
 		for (;;) {
 
 			/*
@@ -180,11 +180,11 @@ void mce_log(struct mce *m)
 			 */
 			if (entry >= MCE_LOG_LEN) {
 				set_bit(MCE_OVERFLOW,
-					(unsigned long *)&mcelog.flags);
+					(unsigned long *)&mcelog_buf.flags);
 				return;
 			}
 			/* Old left over entry. Skip: */
-			if (mcelog.entry[entry].finished) {
+			if (mcelog_buf.entry[entry].finished) {
 				entry++;
 				continue;
 			}
@@ -192,12 +192,12 @@ void mce_log(struct mce *m)
 		}
 		smp_rmb();
 		next = entry + 1;
-		if (cmpxchg(&mcelog.next, entry, next) == entry)
+		if (cmpxchg(&mcelog_buf.next, entry, next) == entry)
 			break;
 	}
-	memcpy(mcelog.entry + entry, m, sizeof(struct mce));
+	memcpy(mcelog_buf.entry + entry, m, sizeof(struct mce));
 	wmb();
-	mcelog.entry[entry].finished = 1;
+	mcelog_buf.entry[entry].finished = 1;
 	wmb();
 
 	set_bit(0, &mce_need_notify);
@@ -1958,7 +1958,7 @@ static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
 			goto out;
 	}
 
-	next = mce_log_get_idx_check(mcelog.next);
+	next = mce_log_get_idx_check(mcelog_buf.next);
 
 	/* Only supports full reads right now */
 	err = -EINVAL;
@@ -1970,7 +1970,7 @@ static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
 	do {
 		for (i = prev; i < next; i++) {
 			unsigned long start = jiffies;
-			struct mce *m = &mcelog.entry[i];
+			struct mce *m = &mcelog_buf.entry[i];
 
 			while (!m->finished) {
 				if (time_after_eq(jiffies, start + 2)) {
@@ -1986,10 +1986,10 @@ timeout:
 			;
 		}
 
-		memset(mcelog.entry + prev, 0,
+		memset(mcelog_buf.entry + prev, 0,
 		       (next - prev) * sizeof(struct mce));
 		prev = next;
-		next = cmpxchg(&mcelog.next, prev, 0);
+		next = cmpxchg(&mcelog_buf.next, prev, 0);
 	} while (next != prev);
 
 	synchronize_sched();
@@ -2001,7 +2001,7 @@ timeout:
 	on_each_cpu(collect_tscs, cpu_tsc, 1);
 
 	for (i = next; i < MCE_LOG_LEN; i++) {
-		struct mce *m = &mcelog.entry[i];
+		struct mce *m = &mcelog_buf.entry[i];
 
 		if (m->finished && m->tsc < cpu_tsc[m->cpu]) {
 			err |= copy_to_user(buf, m, sizeof(*m));
@@ -2024,7 +2024,7 @@ out:
 static unsigned int mce_chrdev_poll(struct file *file, poll_table *wait)
 {
 	poll_wait(file, &mce_chrdev_wait, wait);
-	if (READ_ONCE(mcelog.next))
+	if (READ_ONCE(mcelog_buf.next))
 		return POLLIN | POLLRDNORM;
 	if (!mce_apei_read_done && apei_check_mce())
 		return POLLIN | POLLRDNORM;
@@ -2048,8 +2048,8 @@ static long mce_chrdev_ioctl(struct file *f, unsigned int cmd,
 		unsigned flags;
 
 		do {
-			flags = mcelog.flags;
-		} while (cmpxchg(&mcelog.flags, flags, 0) != flags);
+			flags = mcelog_buf.flags;
+		} while (cmpxchg(&mcelog_buf.flags, flags, 0) != flags);
 
 		return put_user(flags, p);
 	}

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] RAS: Add a Corrected Errors Collector
  2017-03-27  9:33 ` [PATCH 4/6] RAS: Add a Corrected Errors Collector Borislav Petkov
@ 2017-03-28  7:02   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2017-03-28  7:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, bp, peterz, hpa, tglx, linux-edac, torvalds

Commit-ID:  011d8261117249eab97bc86a8e1ac7731e03e319
Gitweb:     http://git.kernel.org/tip/011d8261117249eab97bc86a8e1ac7731e03e319
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Mon, 27 Mar 2017 11:33:02 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Mar 2017 08:54:48 +0200

RAS: Add a Corrected Errors Collector

Introduce a simple data structure for collecting correctable errors
along with accessors. More detailed description in the code itself.

The error decoding is done with the decoding chain now and
mce_first_notifier() gets to see the error first and the CEC decides
whether to log it and then the rest of the chain doesn't hear about it -
basically the main reason for the CE collector - or to continue running
the notifiers.

When the CEC hits the action threshold, it will try to soft-offine the
page containing the ECC and then the whole decoding chain gets to see
the error.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |   6 +
 arch/x86/include/asm/mce.h                      |   9 +-
 arch/x86/kernel/cpu/mcheck/mce.c                | 191 +++++----
 arch/x86/ras/Kconfig                            |  14 +
 drivers/ras/Makefile                            |   3 +-
 drivers/ras/cec.c                               | 532 ++++++++++++++++++++++++
 drivers/ras/debugfs.c                           |   2 +-
 drivers/ras/debugfs.h                           |   8 +
 drivers/ras/ras.c                               |  11 +
 include/linux/ras.h                             |  13 +-
 10 files changed, 706 insertions(+), 83 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2ba45ca..927b8b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3172,6 +3172,12 @@
 	ramdisk_size=	[RAM] Sizes of RAM disks in kilobytes
 			See Documentation/blockdev/ramdisk.txt.
 
+	ras=option[,option,...]	[KNL] RAS-specific options
+
+		cec_disable	[X86]
+				Disable the Correctable Errors Collector,
+				see CONFIG_RAS_CEC help text.
+
 	rcu_nocbs=	[KNL]
 			The argument is a cpu list, as described above.
 
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0512dcc..c5ae545 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -191,10 +191,11 @@ extern struct mca_config mca_cfg;
 extern struct mca_msr_regs msr_ops;
 
 enum mce_notifier_prios {
-	MCE_PRIO_SRAO		= INT_MAX,
-	MCE_PRIO_EXTLOG		= INT_MAX - 1,
-	MCE_PRIO_NFIT		= INT_MAX - 2,
-	MCE_PRIO_EDAC		= INT_MAX - 3,
+	MCE_PRIO_FIRST		= INT_MAX,
+	MCE_PRIO_SRAO		= INT_MAX - 1,
+	MCE_PRIO_EXTLOG		= INT_MAX - 2,
+	MCE_PRIO_NFIT		= INT_MAX - 3,
+	MCE_PRIO_EDAC		= INT_MAX - 4,
 	MCE_PRIO_LOWEST		= 0,
 };
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8ada093..4a90775 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -35,6 +35,7 @@
 #include <linux/poll.h>
 #include <linux/nmi.h>
 #include <linux/cpu.h>
+#include <linux/ras.h>
 #include <linux/smp.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
@@ -160,47 +161,8 @@ static struct mce_log_buffer mcelog_buf = {
 
 void mce_log(struct mce *m)
 {
-	unsigned next, entry;
-
-	/* Emit the trace record: */
-	trace_mce_record(m);
-
 	if (!mce_gen_pool_add(m))
 		irq_work_queue(&mce_irq_work);
-
-	wmb();
-	for (;;) {
-		entry = mce_log_get_idx_check(mcelog_buf.next);
-		for (;;) {
-
-			/*
-			 * When the buffer fills up discard new entries.
-			 * Assume that the earlier errors are the more
-			 * interesting ones:
-			 */
-			if (entry >= MCE_LOG_LEN) {
-				set_bit(MCE_OVERFLOW,
-					(unsigned long *)&mcelog_buf.flags);
-				return;
-			}
-			/* Old left over entry. Skip: */
-			if (mcelog_buf.entry[entry].finished) {
-				entry++;
-				continue;
-			}
-			break;
-		}
-		smp_rmb();
-		next = entry + 1;
-		if (cmpxchg(&mcelog_buf.next, entry, next) == entry)
-			break;
-	}
-	memcpy(mcelog_buf.entry + entry, m, sizeof(struct mce));
-	wmb();
-	mcelog_buf.entry[entry].finished = 1;
-	wmb();
-
-	set_bit(0, &mce_need_notify);
 }
 
 void mce_inject_log(struct mce *m)
@@ -213,6 +175,12 @@ EXPORT_SYMBOL_GPL(mce_inject_log);
 
 static struct notifier_block mce_srao_nb;
 
+/*
+ * We run the default notifier if we have only the SRAO, the first and the
+ * default notifier registered. I.e., the mandatory NUM_DEFAULT_NOTIFIERS
+ * notifiers registered on the chain.
+ */
+#define NUM_DEFAULT_NOTIFIERS	3
 static atomic_t num_notifiers;
 
 void mce_register_decode_chain(struct notifier_block *nb)
@@ -522,7 +490,6 @@ static void mce_schedule_work(void)
 
 static void mce_irq_work_cb(struct irq_work *entry)
 {
-	mce_notify_irq();
 	mce_schedule_work();
 }
 
@@ -565,6 +532,111 @@ static int mce_usable_address(struct mce *m)
 	return 1;
 }
 
+static bool memory_error(struct mce *m)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	if (c->x86_vendor == X86_VENDOR_AMD) {
+		/* ErrCodeExt[20:16] */
+		u8 xec = (m->status >> 16) & 0x1f;
+
+		return (xec == 0x0 || xec == 0x8);
+	} else if (c->x86_vendor == X86_VENDOR_INTEL) {
+		/*
+		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
+		 *
+		 * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
+		 * indicating a memory error. Bit 8 is used for indicating a
+		 * cache hierarchy error. The combination of bit 2 and bit 3
+		 * is used for indicating a `generic' cache hierarchy error
+		 * But we can't just blindly check the above bits, because if
+		 * bit 11 is set, then it is a bus/interconnect error - and
+		 * either way the above bits just gives more detail on what
+		 * bus/interconnect error happened. Note that bit 12 can be
+		 * ignored, as it's the "filter" bit.
+		 */
+		return (m->status & 0xef80) == BIT(7) ||
+		       (m->status & 0xef00) == BIT(8) ||
+		       (m->status & 0xeffc) == 0xc;
+	}
+
+	return false;
+}
+
+static bool cec_add_mce(struct mce *m)
+{
+	if (!m)
+		return false;
+
+	/* We eat only correctable DRAM errors with usable addresses. */
+	if (memory_error(m) &&
+	    !(m->status & MCI_STATUS_UC) &&
+	    mce_usable_address(m))
+		if (!cec_add_elem(m->addr >> PAGE_SHIFT))
+			return true;
+
+	return false;
+}
+
+static int mce_first_notifier(struct notifier_block *nb, unsigned long val,
+			      void *data)
+{
+	struct mce *m = (struct mce *)data;
+	unsigned int next, entry;
+
+	if (!m)
+		return NOTIFY_DONE;
+
+	if (cec_add_mce(m))
+		return NOTIFY_STOP;
+
+	/* Emit the trace record: */
+	trace_mce_record(m);
+
+	wmb();
+	for (;;) {
+		entry = mce_log_get_idx_check(mcelog_buf.next);
+		for (;;) {
+
+			/*
+			 * When the buffer fills up discard new entries.
+			 * Assume that the earlier errors are the more
+			 * interesting ones:
+			 */
+			if (entry >= MCE_LOG_LEN) {
+				set_bit(MCE_OVERFLOW,
+					(unsigned long *)&mcelog_buf.flags);
+				return NOTIFY_DONE;
+			}
+			/* Old left over entry. Skip: */
+			if (mcelog_buf.entry[entry].finished) {
+				entry++;
+				continue;
+			}
+			break;
+		}
+		smp_rmb();
+		next = entry + 1;
+		if (cmpxchg(&mcelog_buf.next, entry, next) == entry)
+			break;
+	}
+	memcpy(mcelog_buf.entry + entry, m, sizeof(struct mce));
+	wmb();
+	mcelog_buf.entry[entry].finished = 1;
+	wmb();
+
+	set_bit(0, &mce_need_notify);
+
+	mce_notify_irq();
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block first_nb = {
+	.notifier_call	= mce_first_notifier,
+	.priority	= MCE_PRIO_FIRST,
+};
+
 static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
@@ -594,11 +666,7 @@ static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 	if (!m)
 		return NOTIFY_DONE;
 
-	/*
-	 * Run the default notifier if we have only the SRAO
-	 * notifier and us registered.
-	 */
-	if (atomic_read(&num_notifiers) > 2)
+	if (atomic_read(&num_notifiers) > NUM_DEFAULT_NOTIFIERS)
 		return NOTIFY_DONE;
 
 	/* Don't print when mcelog is running */
@@ -655,37 +723,6 @@ static void mce_read_aux(struct mce *m, int i)
 	}
 }
 
-static bool memory_error(struct mce *m)
-{
-	struct cpuinfo_x86 *c = &boot_cpu_data;
-
-	if (c->x86_vendor == X86_VENDOR_AMD) {
-		/* ErrCodeExt[20:16] */
-		u8 xec = (m->status >> 16) & 0x1f;
-
-		return (xec == 0x0 || xec == 0x8);
-	} else if (c->x86_vendor == X86_VENDOR_INTEL) {
-		/*
-		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
-		 *
-		 * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
-		 * indicating a memory error. Bit 8 is used for indicating a
-		 * cache hierarchy error. The combination of bit 2 and bit 3
-		 * is used for indicating a `generic' cache hierarchy error
-		 * But we can't just blindly check the above bits, because if
-		 * bit 11 is set, then it is a bus/interconnect error - and
-		 * either way the above bits just gives more detail on what
-		 * bus/interconnect error happened. Note that bit 12 can be
-		 * ignored, as it's the "filter" bit.
-		 */
-		return (m->status & 0xef80) == BIT(7) ||
-		       (m->status & 0xef00) == BIT(8) ||
-		       (m->status & 0xeffc) == 0xc;
-	}
-
-	return false;
-}
-
 DEFINE_PER_CPU(unsigned, mce_poll_count);
 
 /*
@@ -2167,6 +2204,7 @@ __setup("mce", mcheck_enable);
 int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
+	mce_register_decode_chain(&first_nb);
 	mce_register_decode_chain(&mce_srao_nb);
 	mce_register_decode_chain(&mce_default_nb);
 	mcheck_vendor_init_severity();
@@ -2716,6 +2754,7 @@ static int __init mcheck_late_init(void)
 		static_branch_inc(&mcsafe_key);
 
 	mcheck_debugfs_init();
+	cec_init();
 
 	/*
 	 * Flush out everything that has been logged during early boot, now that
diff --git a/arch/x86/ras/Kconfig b/arch/x86/ras/Kconfig
index 0bc60a3..2a2d89d 100644
--- a/arch/x86/ras/Kconfig
+++ b/arch/x86/ras/Kconfig
@@ -7,3 +7,17 @@ config MCE_AMD_INJ
 	  aspects of the MCE handling code.
 
 	  WARNING: Do not even assume this interface is staying stable!
+
+config RAS_CEC
+	bool "Correctable Errors Collector"
+	depends on X86_MCE && MEMORY_FAILURE && DEBUG_FS
+	---help---
+	  This is a small cache which collects correctable memory errors per 4K
+	  page PFN and counts their repeated occurrence. Once the counter for a
+	  PFN overflows, we try to soft-offline that page as we take it to mean
+	  that it has reached a relatively high error count and would probably
+	  be best if we don't use it anymore.
+
+	  Bear in mind that this is absolutely useless if your platform doesn't
+	  have ECC DIMMs and doesn't have DRAM ECC checking enabled in the BIOS.
+
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index d7f7334..7b26dd3 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -1 +1,2 @@
-obj-$(CONFIG_RAS) += ras.o debugfs.o
+obj-$(CONFIG_RAS)	+= ras.o debugfs.o
+obj-$(CONFIG_RAS_CEC)	+= cec.o
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
new file mode 100644
index 0000000..6aab46d
--- /dev/null
+++ b/drivers/ras/cec.c
@@ -0,0 +1,532 @@
+#include <linux/mm.h>
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+
+#include <asm/mce.h>
+
+#include "debugfs.h"
+
+/*
+ * RAS Correctable Errors Collector
+ *
+ * This is a simple gadget which collects correctable errors and counts their
+ * occurrence per physical page address.
+ *
+ * We've opted for possibly the simplest data structure to collect those - an
+ * array of the size of a memory page. It stores 512 u64's with the following
+ * structure:
+ *
+ * [63 ... PFN ... 12 | 11 ... generation ... 10 | 9 ... count ... 0]
+ *
+ * The generation in the two highest order bits is two bits which are set to 11b
+ * on every insertion. During the course of each entry's existence, the
+ * generation field gets decremented during spring cleaning to 10b, then 01b and
+ * then 00b.
+ *
+ * This way we're employing the natural numeric ordering to make sure that newly
+ * inserted/touched elements have higher 12-bit counts (which we've manufactured)
+ * and thus iterating over the array initially won't kick out those elements
+ * which were inserted last.
+ *
+ * Spring cleaning is what we do when we reach a certain number CLEAN_ELEMS of
+ * elements entered into the array, during which, we're decaying all elements.
+ * If, after decay, an element gets inserted again, its generation is set to 11b
+ * to make sure it has higher numerical count than other, older elements and
+ * thus emulate an an LRU-like behavior when deleting elements to free up space
+ * in the page.
+ *
+ * When an element reaches it's max count of count_threshold, we try to poison
+ * it by assuming that errors triggered count_threshold times in a single page
+ * are excessive and that page shouldn't be used anymore. count_threshold is
+ * initialized to COUNT_MASK which is the maximum.
+ *
+ * That error event entry causes cec_add_elem() to return !0 value and thus
+ * signal to its callers to log the error.
+ *
+ * To the question why we've chosen a page and moving elements around with
+ * memmove(), it is because it is a very simple structure to handle and max data
+ * movement is 4K which on highly optimized modern CPUs is almost unnoticeable.
+ * We wanted to avoid the pointer traversal of more complex structures like a
+ * linked list or some sort of a balancing search tree.
+ *
+ * Deleting an element takes O(n) but since it is only a single page, it should
+ * be fast enough and it shouldn't happen all too often depending on error
+ * patterns.
+ */
+
+#undef pr_fmt
+#define pr_fmt(fmt) "RAS: " fmt
+
+/*
+ * We use DECAY_BITS bits of PAGE_SHIFT bits for counting decay, i.e., how long
+ * elements have stayed in the array without having been accessed again.
+ */
+#define DECAY_BITS		2
+#define DECAY_MASK		((1ULL << DECAY_BITS) - 1)
+#define MAX_ELEMS		(PAGE_SIZE / sizeof(u64))
+
+/*
+ * Threshold amount of inserted elements after which we start spring
+ * cleaning.
+ */
+#define CLEAN_ELEMS		(MAX_ELEMS >> DECAY_BITS)
+
+/* Bits which count the number of errors happened in this 4K page. */
+#define COUNT_BITS		(PAGE_SHIFT - DECAY_BITS)
+#define COUNT_MASK		((1ULL << COUNT_BITS) - 1)
+#define FULL_COUNT_MASK		(PAGE_SIZE - 1)
+
+/*
+ * u64: [ 63 ... 12 | DECAY_BITS | COUNT_BITS ]
+ */
+
+#define PFN(e)			((e) >> PAGE_SHIFT)
+#define DECAY(e)		(((e) >> COUNT_BITS) & DECAY_MASK)
+#define COUNT(e)		((unsigned int)(e) & COUNT_MASK)
+#define FULL_COUNT(e)		((e) & (PAGE_SIZE - 1))
+
+static struct ce_array {
+	u64 *array;			/* container page */
+	unsigned int n;			/* number of elements in the array */
+
+	unsigned int decay_count;	/*
+					 * number of element insertions/increments
+					 * since the last spring cleaning.
+					 */
+
+	u64 pfns_poisoned;		/*
+					 * number of PFNs which got poisoned.
+					 */
+
+	u64 ces_entered;		/*
+					 * The number of correctable errors
+					 * entered into the collector.
+					 */
+
+	u64 decays_done;		/*
+					 * Times we did spring cleaning.
+					 */
+
+	union {
+		struct {
+			__u32	disabled : 1,	/* cmdline disabled */
+			__resv   : 31;
+		};
+		__u32 flags;
+	};
+} ce_arr;
+
+static DEFINE_MUTEX(ce_mutex);
+static u64 dfs_pfn;
+
+/* Amount of errors after which we offline */
+static unsigned int count_threshold = COUNT_MASK;
+
+/*
+ * The timer "decays" element count each timer_interval which is 24hrs by
+ * default.
+ */
+
+#define CEC_TIMER_DEFAULT_INTERVAL	24 * 60 * 60	/* 24 hrs */
+#define CEC_TIMER_MIN_INTERVAL		 1 * 60 * 60	/* 1h */
+#define CEC_TIMER_MAX_INTERVAL	   30 *	24 * 60 * 60	/* one month */
+static struct timer_list cec_timer;
+static u64 timer_interval = CEC_TIMER_DEFAULT_INTERVAL;
+
+/*
+ * Decrement decay value. We're using DECAY_BITS bits to denote decay of an
+ * element in the array. On insertion and any access, it gets reset to max.
+ */
+static void do_spring_cleaning(struct ce_array *ca)
+{
+	int i;
+
+	for (i = 0; i < ca->n; i++) {
+		u8 decay = DECAY(ca->array[i]);
+
+		if (!decay)
+			continue;
+
+		decay--;
+
+		ca->array[i] &= ~(DECAY_MASK << COUNT_BITS);
+		ca->array[i] |= (decay << COUNT_BITS);
+	}
+	ca->decay_count = 0;
+	ca->decays_done++;
+}
+
+/*
+ * @interval in seconds
+ */
+static void cec_mod_timer(struct timer_list *t, unsigned long interval)
+{
+	unsigned long iv;
+
+	iv = interval * HZ + jiffies;
+
+	mod_timer(t, round_jiffies(iv));
+}
+
+static void cec_timer_fn(unsigned long data)
+{
+	struct ce_array *ca = (struct ce_array *)data;
+
+	do_spring_cleaning(ca);
+
+	cec_mod_timer(&cec_timer, timer_interval);
+}
+
+/*
+ * @to: index of the smallest element which is >= then @pfn.
+ *
+ * Return the index of the pfn if found, otherwise negative value.
+ */
+static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
+{
+	u64 this_pfn;
+	int min = 0, max = ca->n;
+
+	while (min < max) {
+		int tmp = (max + min) >> 1;
+
+		this_pfn = PFN(ca->array[tmp]);
+
+		if (this_pfn < pfn)
+			min = tmp + 1;
+		else if (this_pfn > pfn)
+			max = tmp;
+		else {
+			min = tmp;
+			break;
+		}
+	}
+
+	if (to)
+		*to = min;
+
+	this_pfn = PFN(ca->array[min]);
+
+	if (this_pfn == pfn)
+		return min;
+
+	return -ENOKEY;
+}
+
+static int find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
+{
+	WARN_ON(!to);
+
+	if (!ca->n) {
+		*to = 0;
+		return -ENOKEY;
+	}
+	return __find_elem(ca, pfn, to);
+}
+
+static void del_elem(struct ce_array *ca, int idx)
+{
+	/* Save us a function call when deleting the last element. */
+	if (ca->n - (idx + 1))
+		memmove((void *)&ca->array[idx],
+			(void *)&ca->array[idx + 1],
+			(ca->n - (idx + 1)) * sizeof(u64));
+
+	ca->n--;
+}
+
+static u64 del_lru_elem_unlocked(struct ce_array *ca)
+{
+	unsigned int min = FULL_COUNT_MASK;
+	int i, min_idx = 0;
+
+	for (i = 0; i < ca->n; i++) {
+		unsigned int this = FULL_COUNT(ca->array[i]);
+
+		if (min > this) {
+			min = this;
+			min_idx = i;
+		}
+	}
+
+	del_elem(ca, min_idx);
+
+	return PFN(ca->array[min_idx]);
+}
+
+/*
+ * We return the 0th pfn in the error case under the assumption that it cannot
+ * be poisoned and excessive CEs in there are a serious deal anyway.
+ */
+static u64 __maybe_unused del_lru_elem(void)
+{
+	struct ce_array *ca = &ce_arr;
+	u64 pfn;
+
+	if (!ca->n)
+		return 0;
+
+	mutex_lock(&ce_mutex);
+	pfn = del_lru_elem_unlocked(ca);
+	mutex_unlock(&ce_mutex);
+
+	return pfn;
+}
+
+
+int cec_add_elem(u64 pfn)
+{
+	struct ce_array *ca = &ce_arr;
+	unsigned int to;
+	int count, ret = 0;
+
+	/*
+	 * We can be called very early on the identify_cpu() path where we are
+	 * not initialized yet. We ignore the error for simplicity.
+	 */
+	if (!ce_arr.array || ce_arr.disabled)
+		return -ENODEV;
+
+	ca->ces_entered++;
+
+	mutex_lock(&ce_mutex);
+
+	if (ca->n == MAX_ELEMS)
+		WARN_ON(!del_lru_elem_unlocked(ca));
+
+	ret = find_elem(ca, pfn, &to);
+	if (ret < 0) {
+		/*
+		 * Shift range [to-end] to make room for one more element.
+		 */
+		memmove((void *)&ca->array[to + 1],
+			(void *)&ca->array[to],
+			(ca->n - to) * sizeof(u64));
+
+		ca->array[to] = (pfn << PAGE_SHIFT) |
+				(DECAY_MASK << COUNT_BITS) | 1;
+
+		ca->n++;
+
+		ret = 0;
+
+		goto decay;
+	}
+
+	count = COUNT(ca->array[to]);
+
+	if (count < count_threshold) {
+		ca->array[to] |= (DECAY_MASK << COUNT_BITS);
+		ca->array[to]++;
+
+		ret = 0;
+	} else {
+		u64 pfn = ca->array[to] >> PAGE_SHIFT;
+
+		if (!pfn_valid(pfn)) {
+			pr_warn("CEC: Invalid pfn: 0x%llx\n", pfn);
+		} else {
+			/* We have reached max count for this page, soft-offline it. */
+			pr_err("Soft-offlining pfn: 0x%llx\n", pfn);
+			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
+			ca->pfns_poisoned++;
+		}
+
+		del_elem(ca, to);
+
+		/*
+		 * Return a >0 value to denote that we've reached the offlining
+		 * threshold.
+		 */
+		ret = 1;
+
+		goto unlock;
+	}
+
+decay:
+	ca->decay_count++;
+
+	if (ca->decay_count >= CLEAN_ELEMS)
+		do_spring_cleaning(ca);
+
+unlock:
+	mutex_unlock(&ce_mutex);
+
+	return ret;
+}
+
+static int u64_get(void *data, u64 *val)
+{
+	*val = *(u64 *)data;
+
+	return 0;
+}
+
+static int pfn_set(void *data, u64 val)
+{
+	*(u64 *)data = val;
+
+	return cec_add_elem(val);
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(pfn_ops, u64_get, pfn_set, "0x%llx\n");
+
+static int decay_interval_set(void *data, u64 val)
+{
+	*(u64 *)data = val;
+
+	if (val < CEC_TIMER_MIN_INTERVAL)
+		return -EINVAL;
+
+	if (val > CEC_TIMER_MAX_INTERVAL)
+		return -EINVAL;
+
+	timer_interval = val;
+
+	cec_mod_timer(&cec_timer, timer_interval);
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(decay_interval_ops, u64_get, decay_interval_set, "%lld\n");
+
+static int count_threshold_set(void *data, u64 val)
+{
+	*(u64 *)data = val;
+
+	if (val > COUNT_MASK)
+		val = COUNT_MASK;
+
+	count_threshold = val;
+
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(count_threshold_ops, u64_get, count_threshold_set, "%lld\n");
+
+static int array_dump(struct seq_file *m, void *v)
+{
+	struct ce_array *ca = &ce_arr;
+	u64 prev = 0;
+	int i;
+
+	mutex_lock(&ce_mutex);
+
+	seq_printf(m, "{ n: %d\n", ca->n);
+	for (i = 0; i < ca->n; i++) {
+		u64 this = PFN(ca->array[i]);
+
+		seq_printf(m, " %03d: [%016llx|%03llx]\n", i, this, FULL_COUNT(ca->array[i]));
+
+		WARN_ON(prev > this);
+
+		prev = this;
+	}
+
+	seq_printf(m, "}\n");
+
+	seq_printf(m, "Stats:\nCEs: %llu\nofflined pages: %llu\n",
+		   ca->ces_entered, ca->pfns_poisoned);
+
+	seq_printf(m, "Flags: 0x%x\n", ca->flags);
+
+	seq_printf(m, "Timer interval: %lld seconds\n", timer_interval);
+	seq_printf(m, "Decays: %lld\n", ca->decays_done);
+
+	seq_printf(m, "Action threshold: %d\n", count_threshold);
+
+	mutex_unlock(&ce_mutex);
+
+	return 0;
+}
+
+static int array_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, array_dump, NULL);
+}
+
+static const struct file_operations array_ops = {
+	.owner	 = THIS_MODULE,
+	.open	 = array_open,
+	.read	 = seq_read,
+	.llseek	 = seq_lseek,
+	.release = single_release,
+};
+
+static int __init create_debugfs_nodes(void)
+{
+	struct dentry *d, *pfn, *decay, *count, *array;
+
+	d = debugfs_create_dir("cec", ras_debugfs_dir);
+	if (!d) {
+		pr_warn("Error creating cec debugfs node!\n");
+		return -1;
+	}
+
+	pfn = debugfs_create_file("pfn", S_IRUSR | S_IWUSR, d, &dfs_pfn, &pfn_ops);
+	if (!pfn) {
+		pr_warn("Error creating pfn debugfs node!\n");
+		goto err;
+	}
+
+	array = debugfs_create_file("array", S_IRUSR, d, NULL, &array_ops);
+	if (!array) {
+		pr_warn("Error creating array debugfs node!\n");
+		goto err;
+	}
+
+	decay = debugfs_create_file("decay_interval", S_IRUSR | S_IWUSR, d,
+				    &timer_interval, &decay_interval_ops);
+	if (!decay) {
+		pr_warn("Error creating decay_interval debugfs node!\n");
+		goto err;
+	}
+
+	count = debugfs_create_file("count_threshold", S_IRUSR | S_IWUSR, d,
+				    &count_threshold, &count_threshold_ops);
+	if (!decay) {
+		pr_warn("Error creating count_threshold debugfs node!\n");
+		goto err;
+	}
+
+
+	return 0;
+
+err:
+	debugfs_remove_recursive(d);
+
+	return 1;
+}
+
+void __init cec_init(void)
+{
+	if (ce_arr.disabled)
+		return;
+
+	ce_arr.array = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!ce_arr.array) {
+		pr_err("Error allocating CE array page!\n");
+		return;
+	}
+
+	if (create_debugfs_nodes())
+		return;
+
+	setup_timer(&cec_timer, cec_timer_fn, (unsigned long)&ce_arr);
+	cec_mod_timer(&cec_timer, CEC_TIMER_DEFAULT_INTERVAL);
+
+	pr_info("Correctable Errors collector initialized.\n");
+}
+
+int __init parse_cec_param(char *str)
+{
+	if (!str)
+		return 0;
+
+	if (*str == '=')
+		str++;
+
+	if (!strncmp(str, "cec_disable", 7))
+		ce_arr.disabled = 1;
+	else
+		return 0;
+
+	return 1;
+}
diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
index 0322acf..5016030 100644
--- a/drivers/ras/debugfs.c
+++ b/drivers/ras/debugfs.c
@@ -1,6 +1,6 @@
 #include <linux/debugfs.h>
 
-static struct dentry *ras_debugfs_dir;
+struct dentry *ras_debugfs_dir;
 
 static atomic_t trace_count = ATOMIC_INIT(0);
 
diff --git a/drivers/ras/debugfs.h b/drivers/ras/debugfs.h
new file mode 100644
index 0000000..db72e45
--- /dev/null
+++ b/drivers/ras/debugfs.h
@@ -0,0 +1,8 @@
+#ifndef __RAS_DEBUGFS_H__
+#define __RAS_DEBUGFS_H__
+
+#include <linux/debugfs.h>
+
+extern struct dentry *ras_debugfs_dir;
+
+#endif /* __RAS_DEBUGFS_H__ */
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index b67dd36..94f8038 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -27,3 +27,14 @@ subsys_initcall(ras_init);
 EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
+
+
+int __init parse_ras_param(char *str)
+{
+#ifdef CONFIG_RAS_CEC
+	parse_cec_param(str);
+#endif
+
+	return 1;
+}
+__setup("ras", parse_ras_param);
diff --git a/include/linux/ras.h b/include/linux/ras.h
index 2aceeaf..ffb1471 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -1,14 +1,25 @@
 #ifndef __RAS_H__
 #define __RAS_H__
 
+#include <asm/errno.h>
+
 #ifdef CONFIG_DEBUG_FS
 int ras_userspace_consumers(void);
 void ras_debugfs_init(void);
 int ras_add_daemon_trace(void);
 #else
 static inline int ras_userspace_consumers(void) { return 0; }
-static inline void ras_debugfs_init(void) { return; }
+static inline void ras_debugfs_init(void) { }
 static inline int ras_add_daemon_trace(void) { return 0; }
 #endif
 
+#ifdef CONFIG_RAS_CEC
+void __init cec_init(void);
+int __init parse_cec_param(char *str);
+int cec_add_elem(u64 pfn);
+#else
+static inline void __init cec_init(void)	{ }
+static inline int cec_add_elem(u64 pfn)		{ return -ENODEV; }
 #endif
+
+#endif /* __RAS_H__ */

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Factor out and deprecate the /dev/mcelog driver
  2017-03-27  9:33 ` [PATCH 5/6] x86/mce: Deprecate /dev/mcelog Borislav Petkov
@ 2017-03-28  7:03   ` tip-bot for Tony Luck
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Tony Luck @ 2017-03-28  7:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, linux-edac, linux-kernel, hpa, peterz, tony.luck, mingo,
	bp, torvalds

Commit-ID:  5de97c9f6d85fd83af76e09e338b18e7adb1ae60
Gitweb:     http://git.kernel.org/tip/5de97c9f6d85fd83af76e09e338b18e7adb1ae60
Author:     Tony Luck <tony.luck@intel.com>
AuthorDate: Mon, 27 Mar 2017 11:33:03 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Mar 2017 08:55:01 +0200

x86/mce: Factor out and deprecate the /dev/mcelog driver

Move all code relating to /dev/mcelog to a separate source file.
/dev/mcelog driver can now operate from the machine check notifier with
lowest prio.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Move the mce_helper and trigger functionality behind CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-6-bp@alien8.de
[ Renamed CONFIG_X86_MCELOG to CONFIG_X86_MCELOG_LEGACY. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/Kconfig                          |  10 +-
 arch/x86/include/asm/mce.h                |   1 +
 arch/x86/kernel/cpu/mcheck/Makefile       |   2 +
 arch/x86/kernel/cpu/mcheck/dev-mcelog.c   | 397 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |   8 +
 arch/x86/kernel/cpu/mcheck/mce.c          | 372 +---------------------------
 6 files changed, 426 insertions(+), 364 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cc98d5a..43e6bac 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1043,6 +1043,14 @@ config X86_MCE
 	  The action the kernel takes depends on the severity of the problem,
 	  ranging from warning messages to halting the machine.
 
+config X86_MCELOG_LEGACY
+	bool "Support for deprecated /dev/mcelog character device"
+	depends on X86_MCE
+	---help---
+	  Enable support for /dev/mcelog which is needed by the old mcelog
+	  userspace logging daemon. Consider switching to the new generation
+	  rasdaemon solution.
+
 config X86_MCE_INTEL
 	def_bool y
 	prompt "Intel MCE features"
@@ -1072,7 +1080,7 @@ config X86_MCE_THRESHOLD
 	def_bool y
 
 config X86_MCE_INJECT
-	depends on X86_MCE && X86_LOCAL_APIC
+	depends on X86_MCE && X86_LOCAL_APIC && X86_MCELOG_LEGACY
 	tristate "Machine check injector support"
 	---help---
 	  Provide support for injecting machine checks for testing purposes.
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index c5ae545..4fd5195 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -196,6 +196,7 @@ enum mce_notifier_prios {
 	MCE_PRIO_EXTLOG		= INT_MAX - 2,
 	MCE_PRIO_NFIT		= INT_MAX - 3,
 	MCE_PRIO_EDAC		= INT_MAX - 4,
+	MCE_PRIO_MCELOG		= 1,
 	MCE_PRIO_LOWEST		= 0,
 };
 
diff --git a/arch/x86/kernel/cpu/mcheck/Makefile b/arch/x86/kernel/cpu/mcheck/Makefile
index a3311c8..43051f0 100644
--- a/arch/x86/kernel/cpu/mcheck/Makefile
+++ b/arch/x86/kernel/cpu/mcheck/Makefile
@@ -9,3 +9,5 @@ obj-$(CONFIG_X86_MCE_INJECT)	+= mce-inject.o
 obj-$(CONFIG_X86_THERMAL_VECTOR) += therm_throt.o
 
 obj-$(CONFIG_ACPI_APEI)		+= mce-apei.o
+
+obj-$(CONFIG_X86_MCELOG_LEGACY)	+= dev-mcelog.o
diff --git a/arch/x86/kernel/cpu/mcheck/dev-mcelog.c b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
new file mode 100644
index 0000000..9c632cb8
--- /dev/null
+++ b/arch/x86/kernel/cpu/mcheck/dev-mcelog.c
@@ -0,0 +1,397 @@
+/*
+ * /dev/mcelog driver
+ *
+ * K8 parts Copyright 2002,2003 Andi Kleen, SuSE Labs.
+ * Rest from unknown author(s).
+ * 2004 Andi Kleen. Rewrote most of it.
+ * Copyright 2008 Intel Corporation
+ * Author: Andi Kleen
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/miscdevice.h>
+#include <linux/slab.h>
+#include <linux/kmod.h>
+#include <linux/poll.h>
+
+#include "mce-internal.h"
+
+static DEFINE_MUTEX(mce_chrdev_read_mutex);
+
+static char mce_helper[128];
+static char *mce_helper_argv[2] = { mce_helper, NULL };
+
+#define mce_log_get_idx_check(p) \
+({ \
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
+			 !lockdep_is_held(&mce_chrdev_read_mutex), \
+			 "suspicious mce_log_get_idx_check() usage"); \
+	smp_load_acquire(&(p)); \
+})
+
+/*
+ * Lockless MCE logging infrastructure.
+ * This avoids deadlocks on printk locks without having to break locks. Also
+ * separate MCEs from kernel messages to avoid bogus bug reports.
+ */
+
+static struct mce_log_buffer mcelog = {
+	.signature	= MCE_LOG_SIGNATURE,
+	.len		= MCE_LOG_LEN,
+	.recordlen	= sizeof(struct mce),
+};
+
+static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
+
+/* User mode helper program triggered by machine check event */
+extern char			mce_helper[128];
+
+static int dev_mce_log(struct notifier_block *nb, unsigned long val,
+				void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	unsigned int next, entry;
+
+	wmb();
+	for (;;) {
+		entry = mce_log_get_idx_check(mcelog.next);
+		for (;;) {
+
+			/*
+			 * When the buffer fills up discard new entries.
+			 * Assume that the earlier errors are the more
+			 * interesting ones:
+			 */
+			if (entry >= MCE_LOG_LEN) {
+				set_bit(MCE_OVERFLOW,
+					(unsigned long *)&mcelog.flags);
+				return NOTIFY_OK;
+			}
+			/* Old left over entry. Skip: */
+			if (mcelog.entry[entry].finished) {
+				entry++;
+				continue;
+			}
+			break;
+		}
+		smp_rmb();
+		next = entry + 1;
+		if (cmpxchg(&mcelog.next, entry, next) == entry)
+			break;
+	}
+	memcpy(mcelog.entry + entry, mce, sizeof(struct mce));
+	wmb();
+	mcelog.entry[entry].finished = 1;
+	wmb();
+
+	/* wake processes polling /dev/mcelog */
+	wake_up_interruptible(&mce_chrdev_wait);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block dev_mcelog_nb = {
+	.notifier_call	= dev_mce_log,
+	.priority	= MCE_PRIO_MCELOG,
+};
+
+static void mce_do_trigger(struct work_struct *work)
+{
+	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
+}
+
+static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
+
+
+void mce_work_trigger(void)
+{
+	if (mce_helper[0])
+		schedule_work(&mce_trigger_work);
+}
+
+static ssize_t
+show_trigger(struct device *s, struct device_attribute *attr, char *buf)
+{
+	strcpy(buf, mce_helper);
+	strcat(buf, "\n");
+	return strlen(mce_helper) + 1;
+}
+
+static ssize_t set_trigger(struct device *s, struct device_attribute *attr,
+				const char *buf, size_t siz)
+{
+	char *p;
+
+	strncpy(mce_helper, buf, sizeof(mce_helper));
+	mce_helper[sizeof(mce_helper)-1] = 0;
+	p = strchr(mce_helper, '\n');
+
+	if (p)
+		*p = 0;
+
+	return strlen(mce_helper) + !!p;
+}
+
+DEVICE_ATTR(trigger, 0644, show_trigger, set_trigger);
+
+/*
+ * mce_chrdev: Character device /dev/mcelog to read and clear the MCE log.
+ */
+
+static DEFINE_SPINLOCK(mce_chrdev_state_lock);
+static int mce_chrdev_open_count;	/* #times opened */
+static int mce_chrdev_open_exclu;	/* already open exclusive? */
+
+static int mce_chrdev_open(struct inode *inode, struct file *file)
+{
+	spin_lock(&mce_chrdev_state_lock);
+
+	if (mce_chrdev_open_exclu ||
+	    (mce_chrdev_open_count && (file->f_flags & O_EXCL))) {
+		spin_unlock(&mce_chrdev_state_lock);
+
+		return -EBUSY;
+	}
+
+	if (file->f_flags & O_EXCL)
+		mce_chrdev_open_exclu = 1;
+	mce_chrdev_open_count++;
+
+	spin_unlock(&mce_chrdev_state_lock);
+
+	return nonseekable_open(inode, file);
+}
+
+static int mce_chrdev_release(struct inode *inode, struct file *file)
+{
+	spin_lock(&mce_chrdev_state_lock);
+
+	mce_chrdev_open_count--;
+	mce_chrdev_open_exclu = 0;
+
+	spin_unlock(&mce_chrdev_state_lock);
+
+	return 0;
+}
+
+static void collect_tscs(void *data)
+{
+	unsigned long *cpu_tsc = (unsigned long *)data;
+
+	cpu_tsc[smp_processor_id()] = rdtsc();
+}
+
+static int mce_apei_read_done;
+
+/* Collect MCE record of previous boot in persistent storage via APEI ERST. */
+static int __mce_read_apei(char __user **ubuf, size_t usize)
+{
+	int rc;
+	u64 record_id;
+	struct mce m;
+
+	if (usize < sizeof(struct mce))
+		return -EINVAL;
+
+	rc = apei_read_mce(&m, &record_id);
+	/* Error or no more MCE record */
+	if (rc <= 0) {
+		mce_apei_read_done = 1;
+		/*
+		 * When ERST is disabled, mce_chrdev_read() should return
+		 * "no record" instead of "no device."
+		 */
+		if (rc == -ENODEV)
+			return 0;
+		return rc;
+	}
+	rc = -EFAULT;
+	if (copy_to_user(*ubuf, &m, sizeof(struct mce)))
+		return rc;
+	/*
+	 * In fact, we should have cleared the record after that has
+	 * been flushed to the disk or sent to network in
+	 * /sbin/mcelog, but we have no interface to support that now,
+	 * so just clear it to avoid duplication.
+	 */
+	rc = apei_clear_mce(record_id);
+	if (rc) {
+		mce_apei_read_done = 1;
+		return rc;
+	}
+	*ubuf += sizeof(struct mce);
+
+	return 0;
+}
+
+static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
+				size_t usize, loff_t *off)
+{
+	char __user *buf = ubuf;
+	unsigned long *cpu_tsc;
+	unsigned prev, next;
+	int i, err;
+
+	cpu_tsc = kmalloc(nr_cpu_ids * sizeof(long), GFP_KERNEL);
+	if (!cpu_tsc)
+		return -ENOMEM;
+
+	mutex_lock(&mce_chrdev_read_mutex);
+
+	if (!mce_apei_read_done) {
+		err = __mce_read_apei(&buf, usize);
+		if (err || buf != ubuf)
+			goto out;
+	}
+
+	next = mce_log_get_idx_check(mcelog.next);
+
+	/* Only supports full reads right now */
+	err = -EINVAL;
+	if (*off != 0 || usize < MCE_LOG_LEN*sizeof(struct mce))
+		goto out;
+
+	err = 0;
+	prev = 0;
+	do {
+		for (i = prev; i < next; i++) {
+			unsigned long start = jiffies;
+			struct mce *m = &mcelog.entry[i];
+
+			while (!m->finished) {
+				if (time_after_eq(jiffies, start + 2)) {
+					memset(m, 0, sizeof(*m));
+					goto timeout;
+				}
+				cpu_relax();
+			}
+			smp_rmb();
+			err |= copy_to_user(buf, m, sizeof(*m));
+			buf += sizeof(*m);
+timeout:
+			;
+		}
+
+		memset(mcelog.entry + prev, 0,
+		       (next - prev) * sizeof(struct mce));
+		prev = next;
+		next = cmpxchg(&mcelog.next, prev, 0);
+	} while (next != prev);
+
+	synchronize_sched();
+
+	/*
+	 * Collect entries that were still getting written before the
+	 * synchronize.
+	 */
+	on_each_cpu(collect_tscs, cpu_tsc, 1);
+
+	for (i = next; i < MCE_LOG_LEN; i++) {
+		struct mce *m = &mcelog.entry[i];
+
+		if (m->finished && m->tsc < cpu_tsc[m->cpu]) {
+			err |= copy_to_user(buf, m, sizeof(*m));
+			smp_rmb();
+			buf += sizeof(*m);
+			memset(m, 0, sizeof(*m));
+		}
+	}
+
+	if (err)
+		err = -EFAULT;
+
+out:
+	mutex_unlock(&mce_chrdev_read_mutex);
+	kfree(cpu_tsc);
+
+	return err ? err : buf - ubuf;
+}
+
+static unsigned int mce_chrdev_poll(struct file *file, poll_table *wait)
+{
+	poll_wait(file, &mce_chrdev_wait, wait);
+	if (READ_ONCE(mcelog.next))
+		return POLLIN | POLLRDNORM;
+	if (!mce_apei_read_done && apei_check_mce())
+		return POLLIN | POLLRDNORM;
+	return 0;
+}
+
+static long mce_chrdev_ioctl(struct file *f, unsigned int cmd,
+				unsigned long arg)
+{
+	int __user *p = (int __user *)arg;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	switch (cmd) {
+	case MCE_GET_RECORD_LEN:
+		return put_user(sizeof(struct mce), p);
+	case MCE_GET_LOG_LEN:
+		return put_user(MCE_LOG_LEN, p);
+	case MCE_GETCLEAR_FLAGS: {
+		unsigned flags;
+
+		do {
+			flags = mcelog.flags;
+		} while (cmpxchg(&mcelog.flags, flags, 0) != flags);
+
+		return put_user(flags, p);
+	}
+	default:
+		return -ENOTTY;
+	}
+}
+
+static ssize_t (*mce_write)(struct file *filp, const char __user *ubuf,
+			    size_t usize, loff_t *off);
+
+void register_mce_write_callback(ssize_t (*fn)(struct file *filp,
+			     const char __user *ubuf,
+			     size_t usize, loff_t *off))
+{
+	mce_write = fn;
+}
+EXPORT_SYMBOL_GPL(register_mce_write_callback);
+
+static ssize_t mce_chrdev_write(struct file *filp, const char __user *ubuf,
+				size_t usize, loff_t *off)
+{
+	if (mce_write)
+		return mce_write(filp, ubuf, usize, off);
+	else
+		return -EINVAL;
+}
+
+static const struct file_operations mce_chrdev_ops = {
+	.open			= mce_chrdev_open,
+	.release		= mce_chrdev_release,
+	.read			= mce_chrdev_read,
+	.write			= mce_chrdev_write,
+	.poll			= mce_chrdev_poll,
+	.unlocked_ioctl		= mce_chrdev_ioctl,
+	.llseek			= no_llseek,
+};
+
+static struct miscdevice mce_chrdev_device = {
+	MISC_MCELOG_MINOR,
+	"mcelog",
+	&mce_chrdev_ops,
+};
+
+static __init int dev_mcelog_init_device(void)
+{
+	int err;
+
+	/* register character device /dev/mcelog */
+	err = misc_register(&mce_chrdev_device);
+	if (err) {
+		pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
+		return err;
+	}
+	mce_register_decode_chain(&dev_mcelog_nb);
+	return 0;
+}
+device_initcall_sync(dev_mcelog_init_device);
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 903043e..7f2a7c5 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -96,3 +96,11 @@ static inline bool mce_cmp(struct mce *m1, struct mce *m2)
 		m1->addr != m2->addr ||
 		m1->misc != m2->misc;
 }
+
+extern struct device_attribute dev_attr_trigger;
+
+#ifdef CONFIG_X86_MCELOG_LEGACY
+extern void mce_work_trigger(void);
+#else
+static inline void mce_work_trigger(void)	{ }
+#endif
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4a90775..36082c7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -54,17 +54,7 @@
 
 #include "mce-internal.h"
 
-static DEFINE_MUTEX(mce_chrdev_read_mutex);
-
-static int mce_chrdev_open_count;	/* #times opened */
-
-#define mce_log_get_idx_check(p) \
-({ \
-	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
-			 !lockdep_is_held(&mce_chrdev_read_mutex), \
-			 "suspicious mce_log_get_idx_check() usage"); \
-	smp_load_acquire(&(p)); \
-})
+static DEFINE_MUTEX(mce_log_mutex);
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/mce.h>
@@ -89,15 +79,9 @@ struct mca_config mca_cfg __read_mostly = {
 	.monarch_timeout = -1
 };
 
-/* User mode helper program triggered by machine check event */
-static unsigned long		mce_need_notify;
-static char			mce_helper[128];
-static char			*mce_helper_argv[2] = { mce_helper, NULL };
-
-static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
-
 static DEFINE_PER_CPU(struct mce, mces_seen);
-static int			cpu_missing;
+static unsigned long mce_need_notify;
+static int cpu_missing;
 
 /*
  * MCA banks polled by the period polling timer for corrected events.
@@ -147,18 +131,6 @@ void mce_setup(struct mce *m)
 DEFINE_PER_CPU(struct mce, injectm);
 EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
-/*
- * Lockless MCE logging infrastructure.
- * This avoids deadlocks on printk locks without having to break locks. Also
- * separate MCEs from kernel messages to avoid bogus bug reports.
- */
-
-static struct mce_log_buffer mcelog_buf = {
-	.signature	= MCE_LOG_SIGNATURE,
-	.len		= MCE_LOG_LEN,
-	.recordlen	= sizeof(struct mce),
-};
-
 void mce_log(struct mce *m)
 {
 	if (!mce_gen_pool_add(m))
@@ -167,9 +139,9 @@ void mce_log(struct mce *m)
 
 void mce_inject_log(struct mce *m)
 {
-	mutex_lock(&mce_chrdev_read_mutex);
+	mutex_lock(&mce_log_mutex);
 	mce_log(m);
-	mutex_unlock(&mce_chrdev_read_mutex);
+	mutex_unlock(&mce_log_mutex);
 }
 EXPORT_SYMBOL_GPL(mce_inject_log);
 
@@ -582,7 +554,6 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
 	struct mce *m = (struct mce *)data;
-	unsigned int next, entry;
 
 	if (!m)
 		return NOTIFY_DONE;
@@ -593,38 +564,6 @@ static int mce_first_notifier(struct notifier_block *nb, unsigned long val,
 	/* Emit the trace record: */
 	trace_mce_record(m);
 
-	wmb();
-	for (;;) {
-		entry = mce_log_get_idx_check(mcelog_buf.next);
-		for (;;) {
-
-			/*
-			 * When the buffer fills up discard new entries.
-			 * Assume that the earlier errors are the more
-			 * interesting ones:
-			 */
-			if (entry >= MCE_LOG_LEN) {
-				set_bit(MCE_OVERFLOW,
-					(unsigned long *)&mcelog_buf.flags);
-				return NOTIFY_DONE;
-			}
-			/* Old left over entry. Skip: */
-			if (mcelog_buf.entry[entry].finished) {
-				entry++;
-				continue;
-			}
-			break;
-		}
-		smp_rmb();
-		next = entry + 1;
-		if (cmpxchg(&mcelog_buf.next, entry, next) == entry)
-			break;
-	}
-	memcpy(mcelog_buf.entry + entry, m, sizeof(struct mce));
-	wmb();
-	mcelog_buf.entry[entry].finished = 1;
-	wmb();
-
 	set_bit(0, &mce_need_notify);
 
 	mce_notify_irq();
@@ -669,10 +608,6 @@ static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 	if (atomic_read(&num_notifiers) > NUM_DEFAULT_NOTIFIERS)
 		return NOTIFY_DONE;
 
-	/* Don't print when mcelog is running */
-	if (mce_chrdev_open_count > 0)
-		return NOTIFY_DONE;
-
 	__print_mce(m);
 
 	return NOTIFY_DONE;
@@ -1456,13 +1391,6 @@ static void mce_timer_delete_all(void)
 		del_timer_sync(&per_cpu(mce_timer, cpu));
 }
 
-static void mce_do_trigger(struct work_struct *work)
-{
-	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
-}
-
-static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
-
 /*
  * Notify the user(s) about new machine check events.
  * Can be called from interrupt context, but not from machine check/NMI
@@ -1474,11 +1402,7 @@ int mce_notify_irq(void)
 	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
 
 	if (test_and_clear_bit(0, &mce_need_notify)) {
-		/* wake processes polling /dev/mcelog */
-		wake_up_interruptible(&mce_chrdev_wait);
-
-		if (mce_helper[0])
-			schedule_work(&mce_trigger_work);
+		mce_work_trigger();
 
 		if (__ratelimit(&ratelimit))
 			pr_info(HW_ERR "Machine check events logged\n");
@@ -1886,251 +1810,6 @@ void mcheck_cpu_clear(struct cpuinfo_x86 *c)
 
 }
 
-/*
- * mce_chrdev: Character device /dev/mcelog to read and clear the MCE log.
- */
-
-static DEFINE_SPINLOCK(mce_chrdev_state_lock);
-static int mce_chrdev_open_exclu;	/* already open exclusive? */
-
-static int mce_chrdev_open(struct inode *inode, struct file *file)
-{
-	spin_lock(&mce_chrdev_state_lock);
-
-	if (mce_chrdev_open_exclu ||
-	    (mce_chrdev_open_count && (file->f_flags & O_EXCL))) {
-		spin_unlock(&mce_chrdev_state_lock);
-
-		return -EBUSY;
-	}
-
-	if (file->f_flags & O_EXCL)
-		mce_chrdev_open_exclu = 1;
-	mce_chrdev_open_count++;
-
-	spin_unlock(&mce_chrdev_state_lock);
-
-	return nonseekable_open(inode, file);
-}
-
-static int mce_chrdev_release(struct inode *inode, struct file *file)
-{
-	spin_lock(&mce_chrdev_state_lock);
-
-	mce_chrdev_open_count--;
-	mce_chrdev_open_exclu = 0;
-
-	spin_unlock(&mce_chrdev_state_lock);
-
-	return 0;
-}
-
-static void collect_tscs(void *data)
-{
-	unsigned long *cpu_tsc = (unsigned long *)data;
-
-	cpu_tsc[smp_processor_id()] = rdtsc();
-}
-
-static int mce_apei_read_done;
-
-/* Collect MCE record of previous boot in persistent storage via APEI ERST. */
-static int __mce_read_apei(char __user **ubuf, size_t usize)
-{
-	int rc;
-	u64 record_id;
-	struct mce m;
-
-	if (usize < sizeof(struct mce))
-		return -EINVAL;
-
-	rc = apei_read_mce(&m, &record_id);
-	/* Error or no more MCE record */
-	if (rc <= 0) {
-		mce_apei_read_done = 1;
-		/*
-		 * When ERST is disabled, mce_chrdev_read() should return
-		 * "no record" instead of "no device."
-		 */
-		if (rc == -ENODEV)
-			return 0;
-		return rc;
-	}
-	rc = -EFAULT;
-	if (copy_to_user(*ubuf, &m, sizeof(struct mce)))
-		return rc;
-	/*
-	 * In fact, we should have cleared the record after that has
-	 * been flushed to the disk or sent to network in
-	 * /sbin/mcelog, but we have no interface to support that now,
-	 * so just clear it to avoid duplication.
-	 */
-	rc = apei_clear_mce(record_id);
-	if (rc) {
-		mce_apei_read_done = 1;
-		return rc;
-	}
-	*ubuf += sizeof(struct mce);
-
-	return 0;
-}
-
-static ssize_t mce_chrdev_read(struct file *filp, char __user *ubuf,
-				size_t usize, loff_t *off)
-{
-	char __user *buf = ubuf;
-	unsigned long *cpu_tsc;
-	unsigned prev, next;
-	int i, err;
-
-	cpu_tsc = kmalloc(nr_cpu_ids * sizeof(long), GFP_KERNEL);
-	if (!cpu_tsc)
-		return -ENOMEM;
-
-	mutex_lock(&mce_chrdev_read_mutex);
-
-	if (!mce_apei_read_done) {
-		err = __mce_read_apei(&buf, usize);
-		if (err || buf != ubuf)
-			goto out;
-	}
-
-	next = mce_log_get_idx_check(mcelog_buf.next);
-
-	/* Only supports full reads right now */
-	err = -EINVAL;
-	if (*off != 0 || usize < MCE_LOG_LEN*sizeof(struct mce))
-		goto out;
-
-	err = 0;
-	prev = 0;
-	do {
-		for (i = prev; i < next; i++) {
-			unsigned long start = jiffies;
-			struct mce *m = &mcelog_buf.entry[i];
-
-			while (!m->finished) {
-				if (time_after_eq(jiffies, start + 2)) {
-					memset(m, 0, sizeof(*m));
-					goto timeout;
-				}
-				cpu_relax();
-			}
-			smp_rmb();
-			err |= copy_to_user(buf, m, sizeof(*m));
-			buf += sizeof(*m);
-timeout:
-			;
-		}
-
-		memset(mcelog_buf.entry + prev, 0,
-		       (next - prev) * sizeof(struct mce));
-		prev = next;
-		next = cmpxchg(&mcelog_buf.next, prev, 0);
-	} while (next != prev);
-
-	synchronize_sched();
-
-	/*
-	 * Collect entries that were still getting written before the
-	 * synchronize.
-	 */
-	on_each_cpu(collect_tscs, cpu_tsc, 1);
-
-	for (i = next; i < MCE_LOG_LEN; i++) {
-		struct mce *m = &mcelog_buf.entry[i];
-
-		if (m->finished && m->tsc < cpu_tsc[m->cpu]) {
-			err |= copy_to_user(buf, m, sizeof(*m));
-			smp_rmb();
-			buf += sizeof(*m);
-			memset(m, 0, sizeof(*m));
-		}
-	}
-
-	if (err)
-		err = -EFAULT;
-
-out:
-	mutex_unlock(&mce_chrdev_read_mutex);
-	kfree(cpu_tsc);
-
-	return err ? err : buf - ubuf;
-}
-
-static unsigned int mce_chrdev_poll(struct file *file, poll_table *wait)
-{
-	poll_wait(file, &mce_chrdev_wait, wait);
-	if (READ_ONCE(mcelog_buf.next))
-		return POLLIN | POLLRDNORM;
-	if (!mce_apei_read_done && apei_check_mce())
-		return POLLIN | POLLRDNORM;
-	return 0;
-}
-
-static long mce_chrdev_ioctl(struct file *f, unsigned int cmd,
-				unsigned long arg)
-{
-	int __user *p = (int __user *)arg;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	switch (cmd) {
-	case MCE_GET_RECORD_LEN:
-		return put_user(sizeof(struct mce), p);
-	case MCE_GET_LOG_LEN:
-		return put_user(MCE_LOG_LEN, p);
-	case MCE_GETCLEAR_FLAGS: {
-		unsigned flags;
-
-		do {
-			flags = mcelog_buf.flags;
-		} while (cmpxchg(&mcelog_buf.flags, flags, 0) != flags);
-
-		return put_user(flags, p);
-	}
-	default:
-		return -ENOTTY;
-	}
-}
-
-static ssize_t (*mce_write)(struct file *filp, const char __user *ubuf,
-			    size_t usize, loff_t *off);
-
-void register_mce_write_callback(ssize_t (*fn)(struct file *filp,
-			     const char __user *ubuf,
-			     size_t usize, loff_t *off))
-{
-	mce_write = fn;
-}
-EXPORT_SYMBOL_GPL(register_mce_write_callback);
-
-static ssize_t mce_chrdev_write(struct file *filp, const char __user *ubuf,
-				size_t usize, loff_t *off)
-{
-	if (mce_write)
-		return mce_write(filp, ubuf, usize, off);
-	else
-		return -EINVAL;
-}
-
-static const struct file_operations mce_chrdev_ops = {
-	.open			= mce_chrdev_open,
-	.release		= mce_chrdev_release,
-	.read			= mce_chrdev_read,
-	.write			= mce_chrdev_write,
-	.poll			= mce_chrdev_poll,
-	.unlocked_ioctl		= mce_chrdev_ioctl,
-	.llseek			= no_llseek,
-};
-
-static struct miscdevice mce_chrdev_device = {
-	MISC_MCELOG_MINOR,
-	"mcelog",
-	&mce_chrdev_ops,
-};
-
 static void __mce_disable_bank(void *arg)
 {
 	int bank = *((int *)arg);
@@ -2349,29 +2028,6 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 	return size;
 }
 
-static ssize_t
-show_trigger(struct device *s, struct device_attribute *attr, char *buf)
-{
-	strcpy(buf, mce_helper);
-	strcat(buf, "\n");
-	return strlen(mce_helper) + 1;
-}
-
-static ssize_t set_trigger(struct device *s, struct device_attribute *attr,
-				const char *buf, size_t siz)
-{
-	char *p;
-
-	strncpy(mce_helper, buf, sizeof(mce_helper));
-	mce_helper[sizeof(mce_helper)-1] = 0;
-	p = strchr(mce_helper, '\n');
-
-	if (p)
-		*p = 0;
-
-	return strlen(mce_helper) + !!p;
-}
-
 static ssize_t set_ignore_ce(struct device *s,
 			     struct device_attribute *attr,
 			     const char *buf, size_t size)
@@ -2428,7 +2084,6 @@ static ssize_t store_int_with_restart(struct device *s,
 	return ret;
 }
 
-static DEVICE_ATTR(trigger, 0644, show_trigger, set_trigger);
 static DEVICE_INT_ATTR(tolerant, 0644, mca_cfg.tolerant);
 static DEVICE_INT_ATTR(monarch_timeout, 0644, mca_cfg.monarch_timeout);
 static DEVICE_BOOL_ATTR(dont_log_ce, 0644, mca_cfg.dont_log_ce);
@@ -2451,7 +2106,9 @@ static struct dev_ext_attribute dev_attr_cmci_disabled = {
 static struct device_attribute *mce_device_attrs[] = {
 	&dev_attr_tolerant.attr,
 	&dev_attr_check_interval.attr,
+#ifdef CONFIG_X86_MCELOG_LEGACY
 	&dev_attr_trigger,
+#endif
 	&dev_attr_monarch_timeout.attr,
 	&dev_attr_dont_log_ce.attr,
 	&dev_attr_ignore_ce.attr,
@@ -2625,7 +2282,6 @@ static __init void mce_init_banks(void)
 
 static __init int mcheck_init_device(void)
 {
-	enum cpuhp_state hp_online;
 	int err;
 
 	if (!mce_available(&boot_cpu_data)) {
@@ -2653,21 +2309,11 @@ static __init int mcheck_init_device(void)
 				mce_cpu_online, mce_cpu_pre_down);
 	if (err < 0)
 		goto err_out_online;
-	hp_online = err;
 
 	register_syscore_ops(&mce_syscore_ops);
 
-	/* register character device /dev/mcelog */
-	err = misc_register(&mce_chrdev_device);
-	if (err)
-		goto err_register;
-
 	return 0;
 
-err_register:
-	unregister_syscore_ops(&mce_syscore_ops);
-	cpuhp_remove_state(hp_online);
-
 err_out_online:
 	cpuhp_remove_state(CPUHP_X86_MCE_DEAD);
 
@@ -2675,7 +2321,7 @@ err_out_mem:
 	free_cpumask_var(mce_device_initialized);
 
 err_out:
-	pr_err("Unable to init device /dev/mcelog (rc: %d)\n", err);
+	pr_err("Unable to init MCE device (rc: %d)\n", err);
 
 	return err;
 }

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:ras/core] x86/mce: Do not register notifiers with invalid prio
  2017-03-27  9:33 ` [PATCH 6/6] x86/mce: Do not register notifiers with invalid prio Borislav Petkov
@ 2017-03-28  7:04   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: tip-bot for Borislav Petkov @ 2017-03-28  7:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, hpa, linux-kernel, mingo, tglx, peterz, linux-edac, bp

Commit-ID:  32b40a82e84414499bb4d635cafb9de9e45e6c38
Gitweb:     http://git.kernel.org/tip/32b40a82e84414499bb4d635cafb9de9e45e6c38
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Mon, 27 Mar 2017 11:33:04 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Mar 2017 08:55:15 +0200

x86/mce: Do not register notifiers with invalid prio

This is just a defensive precaution: do not register notifiers with a
priority which would disrupt the error handling in the notifiers with
prio higher than MCE_PRIO_EDAC.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170327093304.10683-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 36082c7..a09bb67 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -157,9 +157,10 @@ static atomic_t num_notifiers;
 
 void mce_register_decode_chain(struct notifier_block *nb)
 {
-	atomic_inc(&num_notifiers);
+	if (WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC))
+		return;
 
-	WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC);
+	atomic_inc(&num_notifiers);
 
 	atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
 }

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-03-28  7:07 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-27  9:32 [PATCH 0/6] x86/RAS: Correctable Errors Collector Borislav Petkov
2017-03-27  9:32 ` [PATCH 1/6] x86/mce: Don't print MCEs when mcelog is active Borislav Petkov
2017-03-28  7:01   ` [tip:ras/core] " tip-bot for Andi Kleen
2017-03-27  9:33 ` [PATCH 2/6] x86/MCE: Rename mce_log()'s argument Borislav Petkov
2017-03-28  7:01   ` [tip:ras/core] x86/mce: " tip-bot for Borislav Petkov
2017-03-27  9:33 ` [PATCH 3/6] x86/MCE: Rename mce_log to mce_log_buffer Borislav Petkov
2017-03-28  7:02   ` [tip:ras/core] x86/mce: " tip-bot for Borislav Petkov
2017-03-27  9:33 ` [PATCH 4/6] RAS: Add a Corrected Errors Collector Borislav Petkov
2017-03-28  7:02   ` [tip:ras/core] " tip-bot for Borislav Petkov
2017-03-27  9:33 ` [PATCH 5/6] x86/mce: Deprecate /dev/mcelog Borislav Petkov
2017-03-28  7:03   ` [tip:ras/core] x86/mce: Factor out and deprecate the /dev/mcelog driver tip-bot for Tony Luck
2017-03-27  9:33 ` [PATCH 6/6] x86/mce: Do not register notifiers with invalid prio Borislav Petkov
2017-03-28  7:04   ` [tip:ras/core] " tip-bot for Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.