* [GIT PULL] x86/ras material for 4.3 queue
@ 2015-07-10 21:57 Luck, Tony
  2015-07-15 11:30 ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Luck, Tony @ 2015-07-10 21:57 UTC
  To: Ingo Molnar; +Cc: linux-kernel, x86, bp, ashok.raj, gong.chen

Some of these almost made it into 4.2, then we found a bug and
delayed to fix it.  Bug fixes have now been merged back into
the original patch series.

The following changes since commit d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754:

  Linux 4.2-rc1 (2015-07-05 11:01:52 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-ras-for-4.3

for you to fetch changes up to 60e23e3342d0ff1201e8ce160a3624bd2ce5ff79:

  x86/mce: Clear Local MCE opt-in before kexec (2015-07-06 14:21:12 -0700)

----------------------------------------------------------------
1) Chen Gong series to make mce logging safer in #MC context
2) Boris deleted drain_mcelog_buffer() - don't want/need it now
3) Ashok fixed a local machine check corner case with kexec

----------------------------------------------------------------
Ashok Raj (2):
      x86/mce: Remove unused function declarations
      x86/mce: Clear Local MCE opt-in before kexec

Borislav Petkov (1):
      x86/mce: Kill drain_mcelog_buffer()

Chen, Gong (4):
      x86/mce: Provide a lockless memory pool to save error records
      x86/mce: Don't use percpu workqueues
      x86/mce: Remove the MCE ring for Action Optional errors
      x86/mce: Avoid potential deadlock due to printk() in MCE context

 arch/x86/Kconfig                          |   1 +
 arch/x86/include/asm/mce.h                |   8 +-
 arch/x86/include/uapi/asm/mce.h           |   3 +-
 arch/x86/kernel/cpu/mcheck/Makefile       |   2 +-
 arch/x86/kernel/cpu/mcheck/mce-apei.c     |   1 -
 arch/x86/kernel/cpu/mcheck/mce-genpool.c  |  99 +++++++++++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |  12 ++
 arch/x86/kernel/cpu/mcheck/mce.c          | 221 ++++++++++++++----------------
 arch/x86/kernel/cpu/mcheck/mce_intel.c    |  20 ++-
 arch/x86/kernel/process.c                 |   2 +
 arch/x86/kernel/smp.c                     |   2 +
 11 files changed, 242 insertions(+), 129 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mcheck/mce-genpool.c


* Re: [GIT PULL] x86/ras material for 4.3 queue
  2015-07-10 21:57 [GIT PULL] x86/ras material for 4.3 queue Luck, Tony
@ 2015-07-15 11:30 ` Ingo Molnar
  2015-07-16  7:39   ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2015-07-15 11:30 UTC
  To: Luck, Tony; +Cc: linux-kernel, x86, bp, ashok.raj, gong.chen


* Luck, Tony <tony.luck@intel.com> wrote:

> Some of these almost made it into 4.2, then we found a bug and
> delayed to fix it.  Bug fixes have now been merged back into
> the original patch series.
> 
> The following changes since commit d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754:
> 
>   Linux 4.2-rc1 (2015-07-05 11:01:52 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-ras-for-4.3
> 
> for you to fetch changes up to 60e23e3342d0ff1201e8ce160a3624bd2ce5ff79:
> 
>   x86/mce: Clear Local MCE opt-in before kexec (2015-07-06 14:21:12 -0700)
> 
> ----------------------------------------------------------------
> 1) Chen Gong series to make mce logging safer in #MC context
> 2) Boris deleted drain_mcelog_buffer() - don't want/need it now
> 3) Ashok fixed a local machine check corner case with kexec
> 
> ----------------------------------------------------------------
> Ashok Raj (2):
>       x86/mce: Remove unused function declarations
>       x86/mce: Clear Local MCE opt-in before kexec
> 
> Borislav Petkov (1):
>       x86/mce: Kill drain_mcelog_buffer()
> 
> Chen, Gong (4):
>       x86/mce: Provide a lockless memory pool to save error records
>       x86/mce: Don't use percpu workqueues
>       x86/mce: Remove the MCE ring for Action Optional errors
>       x86/mce: Avoid potential deadlock due to printk() in MCE context

So the SOB chains are messed up in a number of commits:

  commit ff7d8f3c477ea2340675179cb8393be5566e4617
  Author: Chen, Gong <gong.chen@linux.intel.com>
  Date:   Wed May 20 15:35:35 2015 -0400

    Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Link: http://lkml.kernel.org/r/1432150538-3120-2-git-send-email-gong.chen@linux.intel.com
    [ Rewrite. ]
    Signed-off-by: Borislav Petkov <bp@suse.de>

did you rebase a tree from Boris?

Also, please send all patches as part of the submission so that I can comment on 
individual patches as well.

Thanks,

	Ingo


* Re: [GIT PULL] x86/ras material for 4.3 queue
  2015-07-15 11:30 ` Ingo Molnar
@ 2015-07-16  7:39   ` Borislav Petkov
  2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2015-07-16  7:39 UTC
  To: Ingo Molnar; +Cc: Luck, Tony, linux-kernel, x86, ashok.raj, gong.chen

On Wed, Jul 15, 2015 at 01:30:14PM +0200, Ingo Molnar wrote:
> did you rebase a tree from Boris?

Yep, he did. We decided to do so because the fixes missed 4.2 and for
4.3 a clean rebase is just fine.

> Also, please send all patches as part of the submission so that I can
> comment on individual patches as well.

Sure, lemme do that now as a reply to this message, because now Tony's
on vacation. The advantage of being two maintainers.

:-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records
  2015-07-16  7:39   ` Borislav Petkov
@ 2015-07-16  7:44     ` Borislav Petkov
  2015-07-16  7:44       ` [PATCH 2/7] x86/mce: Don't use percpu workqueues Borislav Petkov
                         ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Borislav Petkov @ 2015-07-16  7:44 UTC
  To: Ingo Molnar
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Borislav Petkov

From: "Chen, Gong" <gong.chen@linux.intel.com>

printk() is not safe to use in MCE context. Add a lockless memory
allocator pool to save error records in MCE context. Those records will
be issued later, in a printk-safe context. The idea is inspired by
the APEI/GHES driver.

We're very conservative and allocate only two pages for it but since
we're going to use those pages throughout the system's lifetime, we
allocate them statically to avoid early boot time allocation woes.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1432150538-3120-2-git-send-email-gong.chen@linux.intel.com
[ Rewrite. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/Kconfig                          |  1 +
 arch/x86/include/uapi/asm/mce.h           |  3 +-
 arch/x86/kernel/cpu/mcheck/Makefile       |  2 +-
 arch/x86/kernel/cpu/mcheck/mce-genpool.c  | 99 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce-internal.h | 12 ++++
 arch/x86/kernel/cpu/mcheck/mce.c          |  8 ++-
 6 files changed, 122 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mcheck/mce-genpool.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 55bced17dc95..94cc6ed861be 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -949,6 +949,7 @@ config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
 
 config X86_MCE
 	bool "Machine Check / overheating reporting"
+	select GENERIC_ALLOCATOR
 	default y
 	---help---
 	  Machine Check support allows the processor to notify the
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index a0eab85ce7b8..76880ede9a35 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -15,7 +15,8 @@ struct mce {
 	__u64 time;	/* wall time_t when error was detected */
 	__u8  cpuvendor;	/* cpu vendor as encoded in system.h */
 	__u8  inject_flags;	/* software inject flags */
-	__u16  pad;
+	__u8  severity;
+	__u8  usable_addr;
 	__u32 cpuid;	/* CPUID 1 EAX */
 	__u8  cs;		/* code segment */
 	__u8  bank;	/* machine check bank */
diff --git a/arch/x86/kernel/cpu/mcheck/Makefile b/arch/x86/kernel/cpu/mcheck/Makefile
index bb34b03af252..a3311c886194 100644
--- a/arch/x86/kernel/cpu/mcheck/Makefile
+++ b/arch/x86/kernel/cpu/mcheck/Makefile
@@ -1,4 +1,4 @@
-obj-y				=  mce.o mce-severity.o
+obj-y				=  mce.o mce-severity.o mce-genpool.o
 
 obj-$(CONFIG_X86_ANCIENT_MCE)	+= winchip.o p5.o
 obj-$(CONFIG_X86_MCE_INTEL)	+= mce_intel.o
diff --git a/arch/x86/kernel/cpu/mcheck/mce-genpool.c b/arch/x86/kernel/cpu/mcheck/mce-genpool.c
new file mode 100644
index 000000000000..d32e262a6d0b
--- /dev/null
+++ b/arch/x86/kernel/cpu/mcheck/mce-genpool.c
@@ -0,0 +1,99 @@
+/*
+ * MCE event pool management in MCE context
+ *
+ * Copyright (C) 2015 Intel Corp.
+ * Author: Chen, Gong <gong.chen@linux.intel.com>
+ *
+ * This file is licensed under GPLv2.
+ */
+#include <linux/smp.h>
+#include <linux/mm.h>
+#include <linux/genalloc.h>
+#include <linux/llist.h>
+#include "mce-internal.h"
+
+/*
+ * printk() is not safe in MCE context. This is a lock-less memory allocator
+ * used to save error information organized in a lock-less list.
+ *
+ * This memory pool is only to be used to save MCE records in MCE context.
+ * MCE events are rare so a fixed size memory pool should be enough. Use
+ * 2 pages to save MCE events for now (~80 MCE records at most).
+ */
+#define MCE_POOLSZ	(2 * PAGE_SIZE)
+
+static struct gen_pool *mce_evt_pool;
+static LLIST_HEAD(mce_event_llist);
+static char genpool_buf[MCE_POOLSZ];
+
+void mce_genpool_process(void)
+{
+	struct llist_node *head;
+	struct mce_evt_llist *node;
+	struct mce *mce;
+
+	head = llist_del_all(&mce_event_llist);
+	if (!head)
+		return;
+
+	head = llist_reverse_order(head);
+	llist_for_each_entry(node, head, llnode) {
+		mce = &node->mce;
+		atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+		gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
+	}
+}
+
+bool mce_genpool_empty(void)
+{
+	return llist_empty(&mce_event_llist);
+}
+
+bool mce_genpool_add(struct mce *mce)
+{
+	struct mce_evt_llist *node;
+
+	if (!mce_evt_pool)
+		return false;
+
+	node = (void *)gen_pool_alloc(mce_evt_pool, sizeof(*node));
+	if (!node) {
+		pr_warn_ratelimited("MCE records pool full!\n");
+		return false;
+	}
+
+	memcpy(&node->mce, mce, sizeof(*mce));
+	llist_add(&node->llnode, &mce_event_llist);
+
+	return true;
+}
+
+static int mce_genpool_create(void)
+{
+	struct gen_pool *tmpp;
+	int ret = -ENOMEM;
+
+	tmpp = gen_pool_create(ilog2(sizeof(struct mce_evt_llist)), -1);
+	if (!tmpp)
+		goto out;
+
+	ret = gen_pool_add(tmpp, (unsigned long)genpool_buf, MCE_POOLSZ, -1);
+	if (ret) {
+		gen_pool_destroy(tmpp);
+		goto out;
+	}
+
+	mce_evt_pool = tmpp;
+
+out:
+	return ret;
+}
+
+int mce_genpool_init(void)
+{
+	/* Just init mce_genpool once. */
+	if (mce_evt_pool)
+		return 0;
+
+	return mce_genpool_create();
+}
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index fe32074b865b..70d43f541bed 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -13,6 +13,8 @@ enum severity_level {
 	MCE_PANIC_SEVERITY,
 };
 
+extern struct atomic_notifier_head x86_mce_decoder_chain;
+
 #define ATTR_LEN		16
 #define INITIAL_CHECK_INTERVAL	5 * 60 /* 5 minutes */
 
@@ -24,6 +26,16 @@ struct mce_bank {
 	char			attrname[ATTR_LEN];	/* attribute name */
 };
 
+struct mce_evt_llist {
+	struct llist_node llnode;
+	struct mce mce;
+};
+
+void mce_genpool_process(void);
+bool mce_genpool_empty(void);
+bool mce_genpool_add(struct mce *mce);
+int mce_genpool_init(void);
+
 extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
 struct dentry *mce_get_debugfs_dir(void);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index df919ff103c3..766c4c30b5b7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -118,7 +118,7 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
  * CPU/chipset specific EDAC code can register a notifier call here to print
  * MCE errors in a human-readable form.
  */
-static ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
+ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);
 
 /* Do initial initialization of a struct mce */
 void mce_setup(struct mce *m)
@@ -1731,6 +1731,12 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 		return;
 	}
 
+	if (mce_genpool_init()) {
+		mca_cfg.disabled = true;
+		pr_emerg("Couldn't allocate MCE records pool!\n");
+		return;
+	}
+
 	machine_check_vector = do_machine_check;
 
 	__mcheck_cpu_init_generic();
-- 
1.9.0.258.g00eda23
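
Illustration (not part of the patch): the commit message above describes
saving records into a statically allocated pool and chaining them on a
lock-less list that is drained later. A minimal userspace sketch of that
idea follows, assuming C11 atomics in place of the kernel's genalloc and
llist APIs. All names here (record, pool_alloc, record_add,
record_take_all, POOL_SLOTS) are hypothetical, and the bump allocator
never reuses slots, unlike gen_pool_free() in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SLOTS 8	/* hypothetical size; the patch reserves two pages */

struct record {
	struct record *next;	/* list linkage, similar in role to llist_node */
	int payload;		/* stands in for struct mce */
};

static struct record pool[POOL_SLOTS];		/* static backing store */
static atomic_size_t pool_used;			/* bump index, slots are never reused here */
static _Atomic(struct record *) list_head;	/* head of the lock-less LIFO list */

/* Claim a slot without taking a lock; returns NULL once the pool is exhausted. */
static struct record *pool_alloc(void)
{
	size_t idx = atomic_fetch_add(&pool_used, 1);

	return idx < POOL_SLOTS ? &pool[idx] : NULL;
}

/* Producer side: copy the payload into a slot and push it with a CAS loop. */
static int record_add(int payload)
{
	struct record *node = pool_alloc();

	if (!node)
		return 0;

	node->payload = payload;
	node->next = atomic_load(&list_head);
	while (!atomic_compare_exchange_weak(&list_head, &node->next, node))
		;	/* on failure node->next now holds the current head; retry */

	return 1;
}

/* Consumer side: detach the whole list at once, then restore arrival order. */
static struct record *record_take_all(void)
{
	struct record *node = atomic_exchange(&list_head, (struct record *)NULL);
	struct record *fifo = NULL;

	while (node) {
		struct record *next = node->next;

		node->next = fifo;
		fifo = node;
		node = next;
	}

	return fifo;
}
```

Calling record_add() three times and then record_take_all() yields the
records oldest first, which mirrors why mce_genpool_process() calls
llist_reverse_order() after llist_del_all().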



* [PATCH 2/7] x86/mce: Don't use percpu workqueues
  2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
@ 2015-07-16  7:44       ` Borislav Petkov
  2015-07-16  7:44       ` [PATCH 3/7] x86/mce: Remove the MCE ring for Action Optional errors Borislav Petkov
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2015-07-16  7:44 UTC
  To: Ingo Molnar
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Borislav Petkov

From: "Chen, Gong" <gong.chen@linux.intel.com>

An MCE is a rare event. Therefore, there's no need to have per-CPU
instances of both normal and IRQ workqueues. Make them both global.

[Tony: Folded in subsequent patch from Rui/Boris/Tony for early boot logging]

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1432150538-3120-3-git-send-email-gong.chen@linux.intel.com
[ massage commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 766c4c30b5b7..51f707e8fd62 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -110,7 +110,8 @@ DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
  */
 mce_banks_t mce_banks_ce_disabled;
 
-static DEFINE_PER_CPU(struct work_struct, mce_work);
+static struct work_struct mce_work;
+static struct irq_work mce_irq_work;
 
 static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
 
@@ -526,11 +527,9 @@ int mce_available(struct cpuinfo_x86 *c)
 static void mce_schedule_work(void)
 {
 	if (!mce_ring_empty())
-		schedule_work(this_cpu_ptr(&mce_work));
+		schedule_work(&mce_work);
 }
 
-static DEFINE_PER_CPU(struct irq_work, mce_irq_work);
-
 static void mce_irq_work_cb(struct irq_work *entry)
 {
 	mce_notify_irq();
@@ -551,7 +550,7 @@ static void mce_report_event(struct pt_regs *regs)
 		return;
 	}
 
-	irq_work_queue(this_cpu_ptr(&mce_irq_work));
+	irq_work_queue(&mce_irq_work);
 }
 
 /*
@@ -1742,8 +1741,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
 	__mcheck_cpu_init_timer();
-	INIT_WORK(this_cpu_ptr(&mce_work), mce_process_work);
-	init_irq_work(this_cpu_ptr(&mce_irq_work), &mce_irq_work_cb);
 }
 
 /*
@@ -2064,6 +2061,9 @@ int __init mcheck_init(void)
 	mcheck_intel_therm_init();
 	mcheck_vendor_init_severity();
 
+	INIT_WORK(&mce_work, mce_process_work);
+	init_irq_work(&mce_irq_work, mce_irq_work_cb);
+
 	return 0;
 }
 
-- 
1.9.0.258.g00eda23



* [PATCH 3/7] x86/mce: Remove the MCE ring for Action Optional errors
  2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
  2015-07-16  7:44       ` [PATCH 2/7] x86/mce: Don't use percpu workqueues Borislav Petkov
@ 2015-07-16  7:44       ` Borislav Petkov
  2015-07-16  7:44       ` [PATCH 4/7] x86/mce: Avoid potential deadlock due to printk() in MCE context Borislav Petkov
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2015-07-16  7:44 UTC
  To: Ingo Molnar
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Borislav Petkov

From: "Chen, Gong" <gong.chen@linux.intel.com>

Use the unified genpool to save Action Optional error events and put
Action Optional error handling in the same notification chain as MCE
error decoding.

[Tony: Folded in subsequent patch from Boris for early boot logging]

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1432150538-3120-4-git-send-email-gong.chen@linux.intel.com
[ Correct a lot. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/mce.h       |   2 +-
 arch/x86/kernel/cpu/mcheck/mce.c | 135 +++++++++++++++++----------------------
 drivers/acpi/acpi_extlog.c       |   2 +-
 drivers/edac/i7core_edac.c       |   2 +-
 drivers/edac/mce_amd.c           |   2 +-
 drivers/edac/sb_edac.c           |   2 +-
 6 files changed, 65 insertions(+), 80 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 982dfc3679ad..dfaa4de1dbb4 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -140,7 +140,7 @@ struct mce_vendor_flags {
 extern struct mce_vendor_flags mce_flags;
 
 extern struct mca_config mca_cfg;
-extern void mce_register_decode_chain(struct notifier_block *nb);
+extern void mce_register_decode_chain(struct notifier_block *nb, bool drain);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
 
 #include <linux/percpu.h>
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 51f707e8fd62..a76d3251cadf 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -114,6 +114,7 @@ static struct work_struct mce_work;
 static struct irq_work mce_irq_work;
 
 static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
+static int mce_usable_address(struct mce *m);
 
 /*
  * CPU/chipset specific EDAC code can register a notifier call here to print
@@ -234,11 +235,18 @@ static void drain_mcelog_buffer(void)
 	} while (next != prev);
 }
 
+static struct notifier_block mce_srao_nb;
 
-void mce_register_decode_chain(struct notifier_block *nb)
+void mce_register_decode_chain(struct notifier_block *nb, bool drain)
 {
+	/* Ensure SRAO notifier has the highest priority in the decode chain. */
+	if (nb != &mce_srao_nb && nb->priority == INT_MAX)
+		nb->priority -= 1;
+
 	atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
-	drain_mcelog_buffer();
+
+	if (drain)
+		drain_mcelog_buffer();
 }
 EXPORT_SYMBOL_GPL(mce_register_decode_chain);
 
@@ -462,61 +470,6 @@ static inline void mce_gather_info(struct mce *m, struct pt_regs *regs)
 	}
 }
 
-/*
- * Simple lockless ring to communicate PFNs from the exception handler with the
- * process context work function. This is vastly simplified because there's
- * only a single reader and a single writer.
- */
-#define MCE_RING_SIZE 16	/* we use one entry less */
-
-struct mce_ring {
-	unsigned short start;
-	unsigned short end;
-	unsigned long ring[MCE_RING_SIZE];
-};
-static DEFINE_PER_CPU(struct mce_ring, mce_ring);
-
-/* Runs with CPU affinity in workqueue */
-static int mce_ring_empty(void)
-{
-	struct mce_ring *r = this_cpu_ptr(&mce_ring);
-
-	return r->start == r->end;
-}
-
-static int mce_ring_get(unsigned long *pfn)
-{
-	struct mce_ring *r;
-	int ret = 0;
-
-	*pfn = 0;
-	get_cpu();
-	r = this_cpu_ptr(&mce_ring);
-	if (r->start == r->end)
-		goto out;
-	*pfn = r->ring[r->start];
-	r->start = (r->start + 1) % MCE_RING_SIZE;
-	ret = 1;
-out:
-	put_cpu();
-	return ret;
-}
-
-/* Always runs in MCE context with preempt off */
-static int mce_ring_add(unsigned long pfn)
-{
-	struct mce_ring *r = this_cpu_ptr(&mce_ring);
-	unsigned next;
-
-	next = (r->end + 1) % MCE_RING_SIZE;
-	if (next == r->start)
-		return -1;
-	r->ring[r->end] = pfn;
-	wmb();
-	r->end = next;
-	return 0;
-}
-
 int mce_available(struct cpuinfo_x86 *c)
 {
 	if (mca_cfg.disabled)
@@ -526,7 +479,7 @@ int mce_available(struct cpuinfo_x86 *c)
 
 static void mce_schedule_work(void)
 {
-	if (!mce_ring_empty())
+	if (!mce_genpool_empty() && keventd_up())
 		schedule_work(&mce_work);
 }
 
@@ -553,6 +506,27 @@ static void mce_report_event(struct pt_regs *regs)
 	irq_work_queue(&mce_irq_work);
 }
 
+static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
+				void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	unsigned long pfn;
+
+	if (!mce)
+		return NOTIFY_DONE;
+
+	if (mce->usable_addr && (mce->severity == MCE_AO_SEVERITY)) {
+		pfn = mce->addr >> PAGE_SHIFT;
+		memory_failure(pfn, MCE_VECTOR, 0);
+	}
+
+	return NOTIFY_OK;
+}
+static struct notifier_block mce_srao_nb = {
+	.notifier_call	= srao_decode_notifier,
+	.priority = INT_MAX,
+};
+
 /*
  * Read ADDR and MISC registers.
  */
@@ -671,8 +645,11 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 */
 		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m)) {
 			if (m.status & MCI_STATUS_ADDRV) {
-				mce_ring_add(m.addr >> PAGE_SHIFT);
-				mce_schedule_work();
+				m.severity = severity;
+				m.usable_addr = mce_usable_address(&m);
+
+				if (mce_genpool_add(&m))
+					mce_schedule_work();
 			}
 		}
 
@@ -1142,15 +1119,10 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
 		mce_read_aux(&m, i);
 
-		/*
-		 * Action optional error. Queue address for later processing.
-		 * When the ring overflows we just ignore the AO error.
-		 * RED-PEN add some logging mechanism when
-		 * usable_address or mce_add_ring fails.
-		 * RED-PEN don't ignore overflow for mca_cfg.tolerant == 0
-		 */
-		if (severity == MCE_AO_SEVERITY && mce_usable_address(&m))
-			mce_ring_add(m.addr >> PAGE_SHIFT);
+		/* assuming valid severity level != 0 */
+		m.severity = severity;
+		m.usable_addr = mce_usable_address(&m);
+		mce_genpool_add(&m);
 
 		mce_log(&m);
 
@@ -1246,14 +1218,11 @@ int memory_failure(unsigned long pfn, int vector, int flags)
 /*
  * Action optional processing happens here (picking up
  * from the list of faulting pages that do_machine_check()
- * placed into the "ring").
+ * placed into the genpool).
  */
 static void mce_process_work(struct work_struct *dummy)
 {
-	unsigned long pfn;
-
-	while (mce_ring_get(&pfn))
-		memory_failure(pfn, MCE_VECTOR, 0);
+	mce_genpool_process();
 }
 
 #ifdef CONFIG_X86_MCE_INTEL
@@ -2059,6 +2028,7 @@ __setup("mce", mcheck_enable);
 int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
+	mce_register_decode_chain(&mce_srao_nb, false);
 	mcheck_vendor_init_severity();
 
 	INIT_WORK(&mce_work, mce_process_work);
@@ -2597,5 +2567,20 @@ static int __init mcheck_debugfs_init(void)
 
 	return 0;
 }
-late_initcall(mcheck_debugfs_init);
+#else
+static int __init mcheck_debugfs_init(void) { return -EINVAL; }
 #endif
+
+static int __init mcheck_late_init(void)
+{
+	mcheck_debugfs_init();
+
+	/*
+	 * Flush out everything that has been logged during early boot, now that
+	 * everything has been initialized (workqueues, decoders, ...).
+	 */
+	mce_schedule_work();
+
+	return 0;
+}
+late_initcall(mcheck_late_init);
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index b3842ffc19ba..07e012e74c1b 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -286,7 +286,7 @@ static int __init extlog_init(void)
 	 */
 	old_edac_report_status = get_edac_report_status();
 	set_edac_report_status(EDAC_REPORTING_DISABLED);
-	mce_register_decode_chain(&extlog_mce_dec);
+	mce_register_decode_chain(&extlog_mce_dec, true);
 	/* enable OS to be involved to take over management from BIOS */
 	((struct extlog_l1_head *)extlog_l1_addr)->flags |= FLAG_OS_OPTIN;
 
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 01087a38da22..13d77f4a892c 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -2424,7 +2424,7 @@ static int __init i7core_init(void)
 	pci_rc = pci_register_driver(&i7core_driver);
 
 	if (pci_rc >= 0) {
-		mce_register_decode_chain(&i7_mce_dec);
+		mce_register_decode_chain(&i7_mce_dec, true);
 		return 0;
 	}
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 58586d59bf8e..aca31a237073 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -895,7 +895,7 @@ static int __init mce_amd_init(void)
 
 	pr_info("MCE: In-kernel MCE decoding enabled.\n");
 
-	mce_register_decode_chain(&amd_mce_dec_nb);
+	mce_register_decode_chain(&amd_mce_dec_nb, true);
 
 	return 0;
 }
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index ca7831168298..5780e26c3e58 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2591,7 +2591,7 @@ static int __init sbridge_init(void)
 
 	pci_rc = pci_register_driver(&sbridge_driver);
 	if (pci_rc >= 0) {
-		mce_register_decode_chain(&sbridge_mce_dec);
+		mce_register_decode_chain(&sbridge_mce_dec, true);
 		if (get_edac_report_status() == EDAC_REPORTING_DISABLED)
 			sbridge_printk(KERN_WARNING, "Loading driver, error reporting disabled.\n");
 		return 0;
-- 
1.9.0.258.g00eda23
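
Illustration (not part of the patch): mce_register_decode_chain() above
lowers the priority of any notifier that asks for INT_MAX, so that the
SRAO notifier always runs first. A minimal userspace sketch of that
ordering rule follows, assuming a plain singly linked chain sorted by
priority instead of the kernel's atomic notifier chain. The names
(notifier, reserved_nb, chain_register, chain_call, the trace buffer and
both callbacks) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct notifier {
	int (*call)(void *data);
	int priority;
	struct notifier *next;
};

static char trace[8];		/* records the order in which callbacks ran */
static size_t trace_len;

static int note_reserved(void *data)
{
	(void)data;
	if (trace_len < sizeof(trace))
		trace[trace_len++] = 'R';
	return 0;
}

static int note_decoder(void *data)
{
	(void)data;
	if (trace_len < sizeof(trace))
		trace[trace_len++] = 'D';
	return 0;
}

static struct notifier *chain;	/* kept sorted, highest priority first */

/* Plays the role of mce_srao_nb: the only notifier allowed to keep INT_MAX. */
static struct notifier reserved_nb = {
	.call = note_reserved,
	.priority = INT_MAX,
};

static void chain_register(struct notifier *nb)
{
	struct notifier **pos = &chain;

	/* Demote anyone else who asks for the top priority. */
	if (nb != &reserved_nb && nb->priority == INT_MAX)
		nb->priority -= 1;

	/* Insert before the first entry with a strictly lower priority. */
	while (*pos && (*pos)->priority >= nb->priority)
		pos = &(*pos)->next;

	nb->next = *pos;
	*pos = nb;
}

/* Invoke every notifier in chain order; returns how many were called. */
static int chain_call(void *data)
{
	int calls = 0;

	for (struct notifier *nb = chain; nb; nb = nb->next) {
		nb->call(data);
		calls++;
	}

	return calls;
}
```

Registering a decoder with priority INT_MAX before reserved_nb still
leaves reserved_nb at the head of the chain, because the decoder is
demoted to INT_MAX - 1 at registration time.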



* [PATCH 4/7] x86/mce: Avoid potential deadlock due to printk() in MCE context
  2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
  2015-07-16  7:44       ` [PATCH 2/7] x86/mce: Don't use percpu workqueues Borislav Petkov
  2015-07-16  7:44       ` [PATCH 3/7] x86/mce: Remove the MCE ring for Action Optional errors Borislav Petkov
@ 2015-07-16  7:44       ` Borislav Petkov
  2015-07-16  7:44       ` [PATCH 5/7] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2015-07-16  7:44 UTC
  To: Ingo Molnar
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Borislav Petkov

From: "Chen, Gong" <gong.chen@linux.intel.com>

Printing in MCE context is a no-no, currently, as printk() is not
NMI-safe. If some of the notifiers on the MCE chain do so, we may
deadlock. In order to avoid that, delay printk() to process context
where it is safe to do so.

[Tony: Folded in subsequent patch from Boris for early boot logging]

Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1432150538-3120-5-git-send-email-gong.chen@linux.intel.com
[ Boris: kick irq_work in mce_log() directly. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce-apei.c  | 1 -
 arch/x86/kernel/cpu/mcheck/mce.c       | 4 ++--
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 1 -
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index a1aef9533154..34c89a3e8260 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -57,7 +57,6 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 
 	m.addr = mem_err->physical_addr;
 	mce_log(&m);
-	mce_notify_irq();
 }
 EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a76d3251cadf..b5214fc58920 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -159,7 +159,8 @@ void mce_log(struct mce *mce)
 	/* Emit the trace record: */
 	trace_mce_record(mce);
 
-	atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+	if (mce_genpool_add(mce))
+		irq_work_queue(&mce_irq_work);
 
 	mce->finished = 0;
 	wmb();
@@ -1122,7 +1123,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		/* assuming valid severity level != 0 */
 		m.severity = severity;
 		m.usable_addr = mce_usable_address(&m);
-		mce_genpool_add(&m);
 
 		mce_log(&m);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 844f56c5616d..70f567f774ed 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -246,7 +246,6 @@ static void intel_threshold_interrupt(void)
 		return;
 
 	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
-	mce_notify_irq();
 }
 
 /*
-- 
1.9.0.258.g00eda23
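
Illustration (not part of the patch): the change to mce_log() above only
records the event and requests deferred work, leaving all printing to a
later, safe context. A minimal userspace sketch of that "record now,
report later" pattern follows, assuming C11 atomics and a simple pending
flag so that repeated requests coalesce into one deferred run. The names
(queued_records, work_pending, log_event, run_deferred_work) are
hypothetical and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int queued_records;	/* records waiting to be reported */
static atomic_bool work_pending;	/* true while a deferred run is outstanding */

/* "Exception context": never prints, only records and requests a later run. */
static bool log_event(void)
{
	atomic_fetch_add(&queued_records, 1);

	/* True only for the caller that newly armed the deferred work. */
	return !atomic_exchange(&work_pending, true);
}

/* "Safe context": clear the request first, then report everything queued. */
static int run_deferred_work(void)
{
	int n;

	atomic_store(&work_pending, false);
	n = atomic_exchange(&queued_records, 0);
	if (n)
		printf("reporting %d deferred record(s)\n", n);

	return n;
}
```

Clearing the pending flag before collecting the records means an event
logged during the report re-arms the work instead of being lost.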



* [PATCH 5/7] x86/mce: Kill drain_mcelog_buffer()
  2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
                         ` (2 preceding siblings ...)
  2015-07-16  7:44       ` [PATCH 4/7] x86/mce: Avoid potential deadlock due to printk() in MCE context Borislav Petkov
@ 2015-07-16  7:44       ` Borislav Petkov
  2015-07-16  7:44       ` [PATCH 6/7] x86/mce: Remove unused function declarations Borislav Petkov
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2015-07-16  7:44 UTC
  To: Ingo Molnar
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Borislav Petkov

This used to flush out MCEs that were logged during early boot and had
been left in the MCA registers by a previous system run. No need for
that now, since we're moving to a genpool.

Signed-off-by: Borislav Petkov <bp@suse.de>
Suggested-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/mce.h       |  2 +-
 arch/x86/kernel/cpu/mcheck/mce.c | 44 ++--------------------------------------
 drivers/acpi/acpi_extlog.c       |  2 +-
 drivers/edac/i7core_edac.c       |  2 +-
 drivers/edac/mce_amd.c           |  2 +-
 drivers/edac/sb_edac.c           |  2 +-
 6 files changed, 7 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index dfaa4de1dbb4..982dfc3679ad 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -140,7 +140,7 @@ struct mce_vendor_flags {
 extern struct mce_vendor_flags mce_flags;
 
 extern struct mca_config mca_cfg;
-extern void mce_register_decode_chain(struct notifier_block *nb, bool drain);
+extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
 
 #include <linux/percpu.h>
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b5214fc58920..158d9e7db974 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -199,55 +199,15 @@ void mce_log(struct mce *mce)
 	set_bit(0, &mce_need_notify);
 }
 
-static void drain_mcelog_buffer(void)
-{
-	unsigned int next, i, prev = 0;
-
-	next = ACCESS_ONCE(mcelog.next);
-
-	do {
-		struct mce *m;
-
-		/* drain what was logged during boot */
-		for (i = prev; i < next; i++) {
-			unsigned long start = jiffies;
-			unsigned retries = 1;
-
-			m = &mcelog.entry[i];
-
-			while (!m->finished) {
-				if (time_after_eq(jiffies, start + 2*retries))
-					retries++;
-
-				cpu_relax();
-
-				if (!m->finished && retries >= 4) {
-					pr_err("skipping error being logged currently!\n");
-					break;
-				}
-			}
-			smp_rmb();
-			atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
-		}
-
-		memset(mcelog.entry + prev, 0, (next - prev) * sizeof(*m));
-		prev = next;
-		next = cmpxchg(&mcelog.next, prev, 0);
-	} while (next != prev);
-}
-
 static struct notifier_block mce_srao_nb;
 
-void mce_register_decode_chain(struct notifier_block *nb, bool drain)
+void mce_register_decode_chain(struct notifier_block *nb)
 {
 	/* Ensure SRAO notifier has the highest priority in the decode chain. */
 	if (nb != &mce_srao_nb && nb->priority == INT_MAX)
 		nb->priority -= 1;
 
 	atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
-
-	if (drain)
-		drain_mcelog_buffer();
 }
 EXPORT_SYMBOL_GPL(mce_register_decode_chain);
 
@@ -2028,7 +1988,7 @@ __setup("mce", mcheck_enable);
 int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
-	mce_register_decode_chain(&mce_srao_nb, false);
+	mce_register_decode_chain(&mce_srao_nb);
 	mcheck_vendor_init_severity();
 
 	INIT_WORK(&mce_work, mce_process_work);
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 07e012e74c1b..b3842ffc19ba 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -286,7 +286,7 @@ static int __init extlog_init(void)
 	 */
 	old_edac_report_status = get_edac_report_status();
 	set_edac_report_status(EDAC_REPORTING_DISABLED);
-	mce_register_decode_chain(&extlog_mce_dec, true);
+	mce_register_decode_chain(&extlog_mce_dec);
 	/* enable OS to be involved to take over management from BIOS */
 	((struct extlog_l1_head *)extlog_l1_addr)->flags |= FLAG_OS_OPTIN;
 
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 13d77f4a892c..01087a38da22 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -2424,7 +2424,7 @@ static int __init i7core_init(void)
 	pci_rc = pci_register_driver(&i7core_driver);
 
 	if (pci_rc >= 0) {
-		mce_register_decode_chain(&i7_mce_dec, true);
+		mce_register_decode_chain(&i7_mce_dec);
 		return 0;
 	}
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index aca31a237073..58586d59bf8e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -895,7 +895,7 @@ static int __init mce_amd_init(void)
 
 	pr_info("MCE: In-kernel MCE decoding enabled.\n");
 
-	mce_register_decode_chain(&amd_mce_dec_nb, true);
+	mce_register_decode_chain(&amd_mce_dec_nb);
 
 	return 0;
 }
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 5780e26c3e58..ca7831168298 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2591,7 +2591,7 @@ static int __init sbridge_init(void)
 
 	pci_rc = pci_register_driver(&sbridge_driver);
 	if (pci_rc >= 0) {
-		mce_register_decode_chain(&sbridge_mce_dec, true);
+		mce_register_decode_chain(&sbridge_mce_dec);
 		if (get_edac_report_status() == EDAC_REPORTING_DISABLED)
 			sbridge_printk(KERN_WARNING, "Loading driver, error reporting disabled.\n");
 		return 0;
-- 
1.9.0.258.g00eda23



* [PATCH 6/7] x86/mce: Remove unused function declarations
  2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
                         ` (3 preceding siblings ...)
  2015-07-16  7:44       ` [PATCH 5/7] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
@ 2015-07-16  7:44       ` Borislav Petkov
  2015-07-16  7:44       ` [PATCH 7/7] x86/mce: Clear Local MCE opt-in before kexec Borislav Petkov
  2015-07-21  8:29       ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Ingo Molnar
  6 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2015-07-16  7:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, linux-edac,
	Borislav Petkov

From: Ashok Raj <ashok.raj@intel.com>

Remove unused function declarations.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1435621095-4802-1-git-send-email-ashok.raj@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mce.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 982dfc3679ad..38d3a1a8830f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -185,16 +185,12 @@ void cmci_clear(void);
 void cmci_reenable(void);
 void cmci_rediscover(void);
 void cmci_recheck(void);
-void lmce_clear(void);
-void lmce_enable(void);
 #else
 static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
 static inline void cmci_clear(void) {}
 static inline void cmci_reenable(void) {}
 static inline void cmci_rediscover(void) {}
 static inline void cmci_recheck(void) {}
-static inline void lmce_clear(void) {}
-static inline void lmce_enable(void) {}
 #endif
 
 #ifdef CONFIG_X86_MCE_AMD
-- 
1.9.0.258.g00eda23



* [PATCH 7/7] x86/mce: Clear Local MCE opt-in before kexec
  2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
                         ` (4 preceding siblings ...)
  2015-07-16  7:44       ` [PATCH 6/7] x86/mce: Remove unused function declarations Borislav Petkov
@ 2015-07-16  7:44       ` Borislav Petkov
  2015-07-17  1:16         ` Andy Lutomirski
  2015-07-21  8:29       ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Ingo Molnar
  6 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2015-07-16  7:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Andy Lutomirski,
	Aravind Gopalakrishnan, Oleg Nesterov, linux-edac,
	Borislav Petkov

From: Ashok Raj <ashok.raj@intel.com>

kexec could boot a kernel that could be legacy with no knowledge of
LMCE. Hence we should make sure we clear LMCE optin before kexec reboot.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1435621095-4802-2-git-send-email-ashok.raj@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mce.h             |  4 ++++
 arch/x86/kernel/cpu/mcheck/mce.c       | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 19 ++++++++++++++++++-
 arch/x86/kernel/process.c              |  2 ++
 arch/x86/kernel/smp.c                  |  2 ++
 5 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 38d3a1a8830f..2dbc0bf2b9f3 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -151,10 +151,12 @@ extern int mce_p5_enabled;
 #ifdef CONFIG_X86_MCE
 int mcheck_init(void);
 void mcheck_cpu_init(struct cpuinfo_x86 *c);
+void mcheck_cpu_clear(struct cpuinfo_x86 *c);
 void mcheck_vendor_init_severity(void);
 #else
 static inline int mcheck_init(void) { return 0; }
 static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
+static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
 static inline void mcheck_vendor_init_severity(void) {}
 #endif
 
@@ -181,12 +183,14 @@ DECLARE_PER_CPU(struct device *, mce_device);
 
 #ifdef CONFIG_X86_MCE_INTEL
 void mce_intel_feature_init(struct cpuinfo_x86 *c);
+void mce_intel_feature_clear(struct cpuinfo_x86 *c);
 void cmci_clear(void);
 void cmci_reenable(void);
 void cmci_rediscover(void);
 void cmci_recheck(void);
 #else
 static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
+static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
 static inline void cmci_clear(void) {}
 static inline void cmci_reenable(void) {}
 static inline void cmci_rediscover(void) {}
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 158d9e7db974..5a19adb86b8f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1606,6 +1606,17 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 	}
 }
 
+static void __mcheck_cpu_clear_vendor(struct cpuinfo_x86 *c)
+{
+	switch (c->x86_vendor) {
+	case X86_VENDOR_INTEL:
+		mce_intel_feature_clear(c);
+		break;
+	default:
+		break;
+	}
+}
+
 static void mce_start_timer(unsigned int cpu, struct timer_list *t)
 {
 	unsigned long iv = check_interval * HZ;
@@ -1673,6 +1684,25 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 }
 
 /*
+ * Called for each booted CPU to clear some machine checks opt-ins
+ */
+void mcheck_cpu_clear(struct cpuinfo_x86 *c)
+{
+	if (mca_cfg.disabled)
+		return;
+
+	if (!mce_available(c))
+		return;
+
+	/*
+	 * Possibly to clear general settings generic to x86
+	 * __mcheck_cpu_clear_generic(c);
+	 */
+	__mcheck_cpu_clear_vendor(c);
+
+}
+
+/*
  * mce_chrdev: Character device /dev/mcelog to read and clear the MCE log.
  */
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 70f567f774ed..c5c003291861 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -434,7 +434,7 @@ static void intel_init_cmci(void)
 	cmci_recheck();
 }
 
-void intel_init_lmce(void)
+static void intel_init_lmce(void)
 {
 	u64 val;
 
@@ -447,9 +447,26 @@ void intel_init_lmce(void)
 		wrmsrl(MSR_IA32_MCG_EXT_CTL, val | MCG_EXT_CTL_LMCE_EN);
 }
 
+static void intel_clear_lmce(void)
+{
+	u64 val;
+
+	if (!lmce_supported())
+		return;
+
+	rdmsrl(MSR_IA32_MCG_EXT_CTL, val);
+	val &= ~MCG_EXT_CTL_LMCE_EN;
+	wrmsrl(MSR_IA32_MCG_EXT_CTL, val);
+}
+
 void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
 	intel_init_thermal(c);
 	intel_init_cmci();
 	intel_init_lmce();
 }
+
+void mce_intel_feature_clear(struct cpuinfo_x86 *c)
+{
+	intel_clear_lmce();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9cad694ed7c4..ce7ec649f8a6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -29,6 +29,7 @@
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
 #include <asm/tlbflush.h>
+#include <asm/mce.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -319,6 +320,7 @@ void stop_this_cpu(void *dummy)
 	 */
 	set_cpu_online(smp_processor_id(), false);
 	disable_local_APIC();
+	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
 
 	for (;;)
 		halt();
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 15aaa69bbb5e..12c8286206ce 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -30,6 +30,7 @@
 #include <asm/proto.h>
 #include <asm/apic.h>
 #include <asm/nmi.h>
+#include <asm/mce.h>
 #include <asm/trace/irq_vectors.h>
 /*
  *	Some notes on x86 processor bugs affecting SMP operation:
@@ -243,6 +244,7 @@ static void native_stop_other_cpus(int wait)
 finish:
 	local_irq_save(flags);
 	disable_local_APIC();
+	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
 	local_irq_restore(flags);
 }
 
-- 
1.9.0.258.g00eda23



* Re: [PATCH 7/7] x86/mce: Clear Local MCE opt-in before kexec
  2015-07-16  7:44       ` [PATCH 7/7] x86/mce: Clear Local MCE opt-in before kexec Borislav Petkov
@ 2015-07-17  1:16         ` Andy Lutomirski
  2015-07-17  4:52           ` Raj, Ashok
  0 siblings, 1 reply; 15+ messages in thread
From: Andy Lutomirski @ 2015-07-17  1:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Tony Luck, X86-ML, LKML, ashok.raj, Chen Gong,
	Aravind Gopalakrishnan, Oleg Nesterov, linux-edac

On Thu, Jul 16, 2015 at 12:44 AM, Borislav Petkov <bp@suse.de> wrote:
> From: Ashok Raj <ashok.raj@intel.com>
>
> kexec could boot a kernel that could be legacy with no knowledge of
> LMCE. Hence we should make sure we clear LMCE optin before kexec reboot.
>

What happens if an offline-but-not-unplugged CPU gets an MCE?  Or does
this code also clear CR4.MCE?

--Andy


* Re: [PATCH 7/7] x86/mce: Clear Local MCE opt-in before kexec
  2015-07-17  1:16         ` Andy Lutomirski
@ 2015-07-17  4:52           ` Raj, Ashok
  0 siblings, 0 replies; 15+ messages in thread
From: Raj, Ashok @ 2015-07-17  4:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, Ingo Molnar, Tony Luck, X86-ML, LKML, Chen Gong,
	Aravind Gopalakrishnan, Oleg Nesterov, linux-edac, Ashok Raj

On Thu, Jul 16, 2015 at 06:16:50PM -0700, Andy Lutomirski wrote:
> > From: Ashok Raj <ashok.raj@intel.com>
> >
> > kexec could boot a kernel that could be legacy with no knowledge of
> > LMCE. Hence we should make sure we clear LMCE optin before kexec reboot.
> >
> 
> What happens if an offline-but-not-unplugged CPU gets an MCE?  Or does
> this code also clear CR4.MCE?

kexec doesn't use the cpu_offline() path; it sends an IPI to all threads
before letting the BSP jump to the new kernel.

This patch only turns off the LMCE opt-in. CR4.MCE isn't touched.

If an offline-but-not-unplugged CPU gets an MCE, it's usually fatal and will
be broadcast to all CPUs in the system.

Turning off CR4.MCE would not be good, since an MCE delivered to any thread
that has CR4.MCE=0 resets the whole system.

There are other bugs in the MCE offline path that I'm working on and will
send patches for.

For example, during CPU_DOWN_PREPARE, mce_disable_cpu() turns off MCx_CTL().

Machine check banks in the uncore are visible to all logical CPUs, so we
should not clear them. Today, offlining a single CPU disables MCE generation
for any of the uncore banks. I have fixes brewing in test and should release
them in a couple of weeks or so.

During cpu_offline() we can only clear banks that are thread-local.
We don't have such banks today (but they are coming). Most banks are either
core-scoped or socket-scoped.
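
To make the scoping rule above concrete, here is a minimal userspace
sketch. It is not code from this series: the enum, both function names and
the idea of tagging each bank with a scope are assumptions made purely for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scope labels; nothing like this enum exists in the series. */
enum bank_scope {
	BANK_SCOPE_THREAD,	/* visible to one logical CPU only */
	BANK_SCOPE_CORE,	/* shared by the threads of a core */
	BANK_SCOPE_SOCKET,	/* uncore: shared by the whole socket */
};

/*
 * A bank may be turned off when one logical CPU goes offline only if
 * no other logical CPU depends on it, i.e. if it is thread-local.
 */
static bool bank_clearable_on_offline(enum bank_scope scope)
{
	return scope == BANK_SCOPE_THREAD;
}

/* Count how many of the given banks could be cleared for one offlined CPU. */
static size_t count_clearable_banks(const enum bank_scope *scopes, size_t nbanks)
{
	size_t i, n = 0;

	for (i = 0; i < nbanks; i++)
		if (bank_clearable_on_offline(scopes[i]))
			n++;

	return n;
}
```

With only core-scoped and socket-scoped banks, as is the case today,
count_clearable_banks() returns 0, which matches the point that the offline
path should leave those MCx_CTL settings alone.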

Cheers,
Ashok


* Re: [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records
  2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
                         ` (5 preceding siblings ...)
  2015-07-16  7:44       ` [PATCH 7/7] x86/mce: Clear Local MCE opt-in before kexec Borislav Petkov
@ 2015-07-21  8:29       ` Ingo Molnar
  2015-07-21 10:03         ` Borislav Petkov
  6 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2015-07-21  8:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Peter Zijlstra,
	Thomas Gleixner


* Borislav Petkov <bp@suse.de> wrote:

> From: "Chen, Gong" <gong.chen@linux.intel.com>

> diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
> index a0eab85ce7b8..76880ede9a35 100644
> --- a/arch/x86/include/uapi/asm/mce.h
> +++ b/arch/x86/include/uapi/asm/mce.h
> @@ -15,7 +15,8 @@ struct mce {
>  	__u64 time;	/* wall time_t when error was detected */
>  	__u8  cpuvendor;	/* cpu vendor as encoded in system.h */
>  	__u8  inject_flags;	/* software inject flags */
> -	__u16  pad;
> +	__u8  severity;
> +	__u8  usable_addr;
>  	__u32 cpuid;	/* CPUID 1 EAX */
>  	__u8  cs;		/* code segment */
>  	__u8  bank;	/* machine check bank */

So this change appears to be completely unrelated to the stated purpose of this 
patch?

> +/*
> + * printk() is not safe in MCE context. This is a lock-less memory allocator
> + * used to save error information organized in a lock-less list.
> + *
> + * This memory pool is only to be used to save MCE records in MCE context.
> + * MCE events are rare so a fixed size memory pool should be enough. Use

Missing comma.

> + * 2 pages to save MCE events for now (~80 MCE records at most).
> + */
> +#define MCE_POOLSZ	(2 * PAGE_SIZE)

> +bool mce_genpool_add(struct mce *mce)
> +{
> +	struct mce_evt_llist *node;
> +
> +	if (!mce_evt_pool)
> +		return false;
> +
> +	node = (void *)gen_pool_alloc(mce_evt_pool, sizeof(*node));
> +	if (!node) {
> +		pr_warn_ratelimited("MCE records pool full!\n");
> +		return false;
> +	}
> +
> +	memcpy(&node->mce, mce, sizeof(*mce));
> +	llist_add(&node->llnode, &mce_event_llist);
> +
> +	return true;
> +}

So I think the standard pattern for allocation failures with integer types is to 
return -ENOMEM, not bool. This really matters, because:

> +
> +static int mce_genpool_create(void)
> +{
> +	struct gen_pool *tmpp;
> +	int ret = -ENOMEM;
> +
> +	tmpp = gen_pool_create(ilog2(sizeof(struct mce_evt_llist)), -1);
> +	if (!tmpp)
> +		goto out;
> +
> +	ret = gen_pool_add(tmpp, (unsigned long)genpool_buf, MCE_POOLSZ, -1);
> +	if (ret) {
> +		gen_pool_destroy(tmpp);
> +		goto out;

here gen_pool_add() has inverted logic, and the two look confusing together.

Furthermore, why do we spell it 'mce_genpool' if the generic facility is spelling 
it gen_pool?

Also, I'm questioning the whole premise of the patches:

> +/*
> + * printk() is not safe in MCE context. This is a lock-less memory allocator
> + * used to save error information organized in a lock-less list.
> + *
> + * This memory pool is only to be used to save MCE records in MCE context.
> + * MCE events are rare so a fixed size memory pool should be enough. Use

So how are we going to report uncorrectable errors that forcibly crash/panic the 
system if we cannot use printk? How will the admin learn what was amiss?

Thanks,

	Ingo


* Re: [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records
  2015-07-21  8:29       ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Ingo Molnar
@ 2015-07-21 10:03         ` Borislav Petkov
  2015-07-21 10:08           ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2015-07-21 10:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Peter Zijlstra,
	Thomas Gleixner

On Tue, Jul 21, 2015 at 10:29:49AM +0200, Ingo Molnar wrote:
> So this change appears to be completely unrelated to the stated purpose of this 
> patch?

I'll carve it out into a separate patch.

> Missing comma.

Good point.

> So I think the standard pattern for allocation failures with integer types is to 
> return -ENOMEM, not bool. This really matters, because:

...

> here gen_pool_add() has inverted logic, and the two look confusing together.

Lemme fix that.
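
As a rough illustration of the convention being asked for (0 on success,
negative errno on failure, the same way gen_pool_add() reports errors), a
userspace sketch could look like the following. struct rec, POOL_SLOTS and
rec_pool_add() are made-up names, and the array-backed pool is only a
stand-in for the real gen_pool; it is not lockless.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct mce; only here so the sketch compiles. */
struct rec {
	unsigned long long status;
	unsigned char bank;
};

#define POOL_SLOTS 4	/* tiny on purpose, so exhaustion is easy to hit */

static struct rec pool[POOL_SLOTS];
static size_t pool_used;

/*
 * Returns 0 on success, -EINVAL for a NULL record and -ENOMEM when the
 * pool is full. Not thread-safe; this only shows the return convention.
 */
static int rec_pool_add(const struct rec *r)
{
	if (!r)
		return -EINVAL;

	if (pool_used == POOL_SLOTS)
		return -ENOMEM;

	memcpy(&pool[pool_used++], r, sizeof(*r));
	return 0;
}
```

Callers can then check this helper and gen_pool_add() with the same
"if (ret)" test, which removes the mix of bool and errno conventions.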

> Furthermore, why do we spell it 'mce_genpool' if the generic facility is spelling 
> it gen_pool?

mce_gen_pool() it is.

> So how are we going to report uncorrectable errors that forcibly
> crash/panic the system if we cannot use printk? How will the admin
> learn what was amiss?

There's no change to that policy - we still panic for MCEs of
MCE_PANIC_SEVERITY and higher. And mce_panic() does use printk() to dump
that critical information.

The gen_pool stuff is for MCEs for which the hw still raises an #MC
exception but the severity code determines that we don't need to panic
but do recovery action.

However, we don't want to call printk() from the #MC exception handler
since it is NMI-like atomic context and printk is not NMI-safe (yet).
Those printks are issued later, in process context when we're done with
the exception handler and recovery action.
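
The split described above can be sketched in plain C11, with atomics
standing in for the kernel's gen_pool and llist helpers. This is an
assumption-laden userspace model, not the patch itself: struct evt,
evt_log() and evt_drain() are hypothetical names, and slots are never
recycled here.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record type standing in for struct mce. */
struct evt {
	struct evt *next;
	unsigned int bank;
	unsigned long long status;
};

#define EVT_SLOTS 8

static struct evt evt_pool[EVT_SLOTS];
static atomic_size_t evt_next_slot;
static _Atomic(struct evt *) evt_head;

/*
 * "Exception context" side: no locks and no printing. Claims a slot
 * with an atomic counter, then pushes it onto a singly linked list
 * with a compare-and-swap loop. Returns 0, or -1 if the pool is full.
 */
static int evt_log(unsigned int bank, unsigned long long status)
{
	size_t slot = atomic_fetch_add(&evt_next_slot, 1);
	struct evt *e, *old;

	if (slot >= EVT_SLOTS)
		return -1;

	e = &evt_pool[slot];
	e->bank = bank;
	e->status = status;

	old = atomic_load(&evt_head);
	do {
		e->next = old;
	} while (!atomic_compare_exchange_weak(&evt_head, &old, e));

	return 0;
}

/*
 * "Process context" side: detaches the whole list in one atomic
 * exchange, reverses it to restore arrival order, and only then
 * prints. Returns the number of records reported.
 */
static int evt_drain(void)
{
	struct evt *e = atomic_exchange(&evt_head, (struct evt *)NULL);
	struct evt *in_order = NULL;
	int n = 0;

	while (e) {
		struct evt *next = e->next;

		e->next = in_order;
		in_order = e;
		e = next;
	}

	for (e = in_order; e; e = e->next, n++)
		printf("deferred report: bank %u status 0x%llx\n",
		       e->bank, e->status);

	return n;
}
```

The push in evt_log() plays roughly the role of llist_add(), and the
exchange in evt_drain() that of llist_del_all(); the point is that all
printing happens on the drain side, outside the exception handler.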

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records
  2015-07-21 10:03         ` Borislav Petkov
@ 2015-07-21 10:08           ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2015-07-21 10:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tony Luck, X86-ML, LKML, ashok.raj, gong.chen, Peter Zijlstra,
	Thomas Gleixner


* Borislav Petkov <bp@suse.de> wrote:

> > So how are we going to report uncorrectable errors that forcibly crash/panic 
> > the system if we cannot use printk? How will the admin learn what was amiss?
> 
> There's no change to that policy - we still panic for MCEs of MCE_PANIC_SEVERITY 
> and higher. And mce_panic() does use printk() to dump that critical information.

Ok, I see: through print_mce().

> The gen_pool stuff is for MCEs for which the hw still raises an #MC exception 
> but the severity code determines that we don't need to panic but do recovery 
> action.
> 
> However, we don't want to call printk() from the #MC exception handler since it 
> is NMI-like atomic context and printk is not NMI-safe (yet). Those printks are 
> issued later, in process context when we're done with the exception handler and 
> recovery action.

Ok - no objections then.

Thanks,

	Ingo


Thread overview: 15+ messages
2015-07-10 21:57 [GIT PULL] x86/ras material for 4.3 queue Luck, Tony
2015-07-15 11:30 ` Ingo Molnar
2015-07-16  7:39   ` Borislav Petkov
2015-07-16  7:44     ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Borislav Petkov
2015-07-16  7:44       ` [PATCH 2/7] x86/mce: Don't use percpu workqueues Borislav Petkov
2015-07-16  7:44       ` [PATCH 3/7] x86/mce: Remove the MCE ring for Action Optional errors Borislav Petkov
2015-07-16  7:44       ` [PATCH 4/7] x86/mce: Avoid potential deadlock due to printk() in MCE context Borislav Petkov
2015-07-16  7:44       ` [PATCH 5/7] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
2015-07-16  7:44       ` [PATCH 6/7] x86/mce: Remove unused function declarations Borislav Petkov
2015-07-16  7:44       ` [PATCH 7/7] x86/mce: Clear Local MCE opt-in before kexec Borislav Petkov
2015-07-17  1:16         ` Andy Lutomirski
2015-07-17  4:52           ` Raj, Ashok
2015-07-21  8:29       ` [PATCH 1/7] x86/mce: Provide a lockless memory pool to save error records Ingo Molnar
2015-07-21 10:03         ` Borislav Petkov
2015-07-21 10:08           ` Ingo Molnar
