[RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers
@ 2016-11-01 12:09 Borislav Petkov
  2016-11-01 12:09 ` [RFC PATCH 1/3] notifiers: Document notifier priority Borislav Petkov
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Borislav Petkov @ 2016-11-01 12:09 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-edac, X86 ML, LKML

From: Borislav Petkov <bp@suse.de>

Right,

so this is not a good thing: systems may not have any error record
consumers registered and in such cases, any logged MCEs disappear into
the void. And this shouldn't happen.

So let's dump them to dmesg as a last resort.

Borislav Petkov (3):
  notifiers: Document notifier priority
  x86/RAS: Add TSC to the injected MCE
  x86/MCE: Dump MCE to dmesg if no consumers

 arch/x86/kernel/cpu/mcheck/mce.c | 52 +++++++++++++++++++++++++++++++++++-----
 arch/x86/ras/mce_amd_inj.c       |  2 ++
 include/linux/notifier.h         |  1 +
 3 files changed, 49 insertions(+), 6 deletions(-)

-- 
2.10.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH 1/3] notifiers: Document notifier priority
  2016-11-01 12:09 [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Borislav Petkov
@ 2016-11-01 12:09 ` Borislav Petkov
  2016-11-01 12:09 ` [RFC PATCH 2/3] x86/RAS: Add TSC to the injected MCE Borislav Petkov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2016-11-01 12:09 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-edac, X86 ML, LKML

From: Borislav Petkov <bp@suse.de>

I always forget which is highest priority so document it.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 include/linux/notifier.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 4149868de4e6..6bd3c691e562 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -55,6 +55,7 @@ typedef	int (*notifier_fn_t)(struct notifier_block *nb,
 struct notifier_block {
 	notifier_fn_t notifier_call;
 	struct notifier_block __rcu *next;
+	/* Notifier priority: numerically higher gets executed earlier */
 	int priority;
 };
 
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 2/3] x86/RAS: Add TSC to the injected MCE
  2016-11-01 12:09 [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Borislav Petkov
  2016-11-01 12:09 ` [RFC PATCH 1/3] notifiers: Document notifier priority Borislav Petkov
@ 2016-11-01 12:09 ` Borislav Petkov
  2016-11-08 16:19   ` [tip:ras/core] x86/RAS: Add TSC timestamp " tip-bot for Borislav Petkov
  2016-11-01 12:09 ` [RFC PATCH 3/3] x86/MCE: Dump MCE to dmesg if no consumers Borislav Petkov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2016-11-01 12:09 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-edac, X86 ML, LKML

From: Borislav Petkov <bp@suse.de>

Add the TSC at the time of injection.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/ras/mce_amd_inj.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/ras/mce_amd_inj.c b/arch/x86/ras/mce_amd_inj.c
index 1ac76479c266..8730c2882fff 100644
--- a/arch/x86/ras/mce_amd_inj.c
+++ b/arch/x86/ras/mce_amd_inj.c
@@ -275,6 +275,8 @@ static void do_inject(void)
 	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
 
+	rdtscll(i_mce.tsc);
+
 	if (i_mce.misc)
 		i_mce.status |= MCI_STATUS_MISCV;
 
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 3/3] x86/MCE: Dump MCE to dmesg if no consumers
  2016-11-01 12:09 [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Borislav Petkov
  2016-11-01 12:09 ` [RFC PATCH 1/3] notifiers: Document notifier priority Borislav Petkov
  2016-11-01 12:09 ` [RFC PATCH 2/3] x86/RAS: Add TSC to the injected MCE Borislav Petkov
@ 2016-11-01 12:09 ` Borislav Petkov
  2016-11-08 16:19   ` [tip:ras/core] " tip-bot for Borislav Petkov
  2016-11-05 13:11 ` [PATCH] x86/MCE: Remove MCP_TIMESTAMP Borislav Petkov
  2016-11-07  7:37 ` [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Ingo Molnar
  4 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2016-11-01 12:09 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-edac, X86 ML, LKML

From: Borislav Petkov <bp@suse.de>

When there are no error record consumers registered with the kernel, the
only thing that appears in dmesg is something like:

  [  300.000326] mce: [Hardware Error]: Machine check events logged

and the error records are gone. Which is seriously counterproductive.

So let's dump them to dmesg instead, in such a case.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 52 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a7fdf453d895..4ca00474804b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -207,8 +207,12 @@ EXPORT_SYMBOL_GPL(mce_inject_log);
 
 static struct notifier_block mce_srao_nb;
 
+static atomic_t num_notifiers;
+
 void mce_register_decode_chain(struct notifier_block *nb)
 {
+	atomic_inc(&num_notifiers);
+
 	/* Ensure SRAO notifier has the highest priority in the decode chain. */
 	if (nb != &mce_srao_nb && nb->priority == INT_MAX)
 		nb->priority -= 1;
@@ -219,6 +223,8 @@ EXPORT_SYMBOL_GPL(mce_register_decode_chain);
 
 void mce_unregister_decode_chain(struct notifier_block *nb)
 {
+	atomic_dec(&num_notifiers);
+
 	atomic_notifier_chain_unregister(&x86_mce_decoder_chain, nb);
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
@@ -270,17 +276,17 @@ struct mca_msr_regs msr_ops = {
 	.misc	= misc_reg
 };
 
-static void print_mce(struct mce *m)
+static void __print_mce(struct mce *m)
 {
-	int ret = 0;
-
-	pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
-	       m->extcpu, m->mcgstatus, m->bank, m->status);
+	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
+		 m->extcpu,
+		 (m->mcgstatus & MCG_STATUS_MCIP ? " Exception" : ""),
+		 m->mcgstatus, m->bank, m->status);
 
 	if (m->ip) {
 		pr_emerg(HW_ERR "RIP%s %02x:<%016Lx> ",
 			!(m->mcgstatus & MCG_STATUS_EIPV) ? " !INEXACT!" : "",
-				m->cs, m->ip);
+			m->cs, m->ip);
 
 		if (m->cs == __KERNEL_CS)
 			print_symbol("{%s}", m->ip);
@@ -308,6 +314,13 @@ static void print_mce(struct mce *m)
 	pr_emerg(HW_ERR "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x microcode %x\n",
 		m->cpuvendor, m->cpuid, m->time, m->socketid, m->apicid,
 		cpu_data(m->extcpu).microcode);
+}
+
+static void print_mce(struct mce *m)
+{
+	int ret = 0;
+
+	__print_mce(m);
 
 	/*
 	 * Print out human-readable details about the MCE error,
@@ -569,6 +582,32 @@ static struct notifier_block mce_srao_nb = {
 	.priority = INT_MAX,
 };
 
+static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
+				void *data)
+{
+	struct mce *m = (struct mce *)data;
+
+	if (!m)
+		return NOTIFY_DONE;
+
+	/*
+	 * Run the default notifier if we have only the SRAO
+	 * notifier and us registered.
+	 */
+	if (atomic_read(&num_notifiers) > 2)
+		return NOTIFY_DONE;
+
+	__print_mce(m);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block mce_default_nb = {
+	.notifier_call	= mce_default_notifier,
+	/* lowest prio, we want it to run last. */
+	.priority	= 0,
+};
+
 /*
  * Read ADDR and MISC registers.
  */
@@ -2138,6 +2177,7 @@ int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
 	mce_register_decode_chain(&mce_srao_nb);
+	mce_register_decode_chain(&mce_default_nb);
 	mcheck_vendor_init_severity();
 
 	INIT_WORK(&mce_work, mce_process_work);
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-01 12:09 [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Borislav Petkov
                   ` (2 preceding siblings ...)
  2016-11-01 12:09 ` [RFC PATCH 3/3] x86/MCE: Dump MCE to dmesg if no consumers Borislav Petkov
@ 2016-11-05 13:11 ` Borislav Petkov
  2016-11-07 17:48   ` Luck, Tony
  2016-11-07  7:37 ` [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Ingo Molnar
  4 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2016-11-05 13:11 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-edac, X86 ML, LKML

Whoops,

one more:

---
From: Borislav Petkov <bp@suse.de>
Date: Sat, 5 Nov 2016 12:47:03 +0100
Subject: [PATCH] x86/MCE: Remove MCP_TIMESTAMP

MCP_TIMESTAMP controls whether current TSC value should be added to
the MCE record. Most of machine_check_poll() callers supply it, except
__mcheck_cpu_init_generic() but this is wrong because we could be
logging an MCE right at the same time and thus log one without the TSC
value.

What is more, machine_check_poll() did unconditionally clear mce.tsc
which is another bug.

So, get rid of all that and simply log an MCE with a TSC value always.
Simplifies the code a bit too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/mce.h             | 5 ++---
 arch/x86/kernel/cpu/mcheck/mce.c       | 6 +-----
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 6 +++---
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 748b8da8e627..3b53c260e0be 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -265,9 +265,8 @@ typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
 DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
 
 enum mcp_flags {
-	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
-	MCP_UC		= BIT(1),	/* log uncorrected errors */
-	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
+	MCP_UC		= BIT(0),	/* log uncorrected errors */
+	MCP_DONTLOG	= BIT(1),	/* only clear, don't log */
 };
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4ca00474804b..82564156d6ab 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -713,7 +713,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		m.misc = 0;
 		m.addr = 0;
 		m.bank = i;
-		m.tsc = 0;
 
 		barrier();
 		m.status = mce_rdmsrl(msr_ops.status(i));
@@ -735,9 +734,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 		mce_read_aux(&m, i);
 
-		if (!(flags & MCP_TIMESTAMP))
-			m.tsc = 0;
-
 		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
 		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m))
@@ -1394,7 +1390,7 @@ static void mce_timer_fn(unsigned long data)
 	iv = __this_cpu_read(mce_next_interval);
 
 	if (mce_available(this_cpu_ptr(&cpu_info))) {
-		machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
+		machine_check_poll(0, this_cpu_ptr(&mce_poll_banks));
 
 		if (mce_intel_cmci_poll()) {
 			iv = mce_adjust_timer(iv);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 1defb8ea882c..3262f0d726bb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -130,7 +130,7 @@ bool mce_intel_cmci_poll(void)
 	 * Reset the counter if we've logged an error in the last poll
 	 * during the storm.
 	 */
-	if (machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned)))
+	if (machine_check_poll(0, this_cpu_ptr(&mce_banks_owned)))
 		this_cpu_write(cmci_backoff_cnt, INITIAL_CHECK_INTERVAL);
 	else
 		this_cpu_dec(cmci_backoff_cnt);
@@ -250,7 +250,7 @@ static void intel_threshold_interrupt(void)
 	if (cmci_storm_detect())
 		return;
 
-	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
+	machine_check_poll(0, this_cpu_ptr(&mce_banks_owned));
 }
 
 /*
@@ -342,7 +342,7 @@ void cmci_recheck(void)
 		return;
 
 	local_irq_save(flags);
-	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
+	machine_check_poll(0, this_cpu_ptr(&mce_banks_owned));
 	local_irq_restore(flags);
 }
 
-- 
2.10.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers
  2016-11-01 12:09 [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Borislav Petkov
                   ` (3 preceding siblings ...)
  2016-11-05 13:11 ` [PATCH] x86/MCE: Remove MCP_TIMESTAMP Borislav Petkov
@ 2016-11-07  7:37 ` Ingo Molnar
  4 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2016-11-07  7:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tony Luck, linux-edac, X86 ML, LKML, Thomas Gleixner, Peter Zijlstra


* Borislav Petkov <bp@alien8.de> wrote:

> From: Borislav Petkov <bp@suse.de>
> 
> Right,
> 
> so this is not a good thing: systems may not have any error record
> consumers registered and in such cases, any logged MCEs disappear into
> the void. And this shouldn't happen.
> 
> So let's dump them to dmesg as a last resort.
> 
> Borislav Petkov (3):
>   notifiers: Document notifier priority
>   x86/RAS: Add TSC to the injected MCE
>   x86/MCE: Dump MCE to dmesg if no consumers
> 
>  arch/x86/kernel/cpu/mcheck/mce.c | 52 +++++++++++++++++++++++++++++++++++-----
>  arch/x86/ras/mce_amd_inj.c       |  2 ++
>  include/linux/notifier.h         |  1 +
>  3 files changed, 49 insertions(+), 6 deletions(-)

Sounds good to me!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-05 13:11 ` [PATCH] x86/MCE: Remove MCP_TIMESTAMP Borislav Petkov
@ 2016-11-07 17:48   ` Luck, Tony
  2016-11-07 18:08     ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2016-11-07 17:48 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, X86 ML, LKML

> So, get rid of all that and simply log an MCE with a TSC value always.
> Simplifies the code a bit too.

I'm not necessarily opposed to this ... but there was once some logic behind when
logged TSC, and when we didn't.  Essentially we wanted the TSC when we were
logging from #CMCI or #MC .... because the detection of the error was fresh, and
wanted as much precision on the logged time as possible to compare with logged
errors from other banks/cpus. This might allow us to distinguish multiple errors logged
in the same #CMCI, from errors logged in separate #CMCI a tenth of a second apart.

If we found the error while polling, we didn’t want to provide a false sense of precision.
The error could have been logged up to five minutes previously (or when logging
errors during the initial poll of the banks an arbitrary time in the past).

-Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-07 17:48   ` Luck, Tony
@ 2016-11-07 18:08     ` Borislav Petkov
  2016-11-07 18:37       ` Luck, Tony
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2016-11-07 18:08 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-edac, X86 ML, LKML

On Mon, Nov 07, 2016 at 05:48:46PM +0000, Luck, Tony wrote:
> > So, get rid of all that and simply log an MCE with a TSC value always.
> > Simplifies the code a bit too.
> 
> I'm not necessarily opposed to this ... but there was once some logic behind when
> logged TSC, and when we didn't.  Essentially we wanted the TSC when we were
> logging from #CMCI or #MC .... because the detection of the error was fresh, and
> wanted as much precision on the logged time as possible to compare with logged
> errors from other banks/cpus. This might allow us to distinguish multiple errors logged
> in the same #CMCI, from errors logged in separate #CMCI a tenth of a second apart.
> 
> If we found the error while polling, we didn’t want to provide a false sense of precision.
> The error could have been logged up to five minutes previously (or when logging
> errors during the initial poll of the banks an arbitrary time in the past).

Right, looks like we've lost that logic:

Functions calling this function: machine_check_poll

  File         Function                  Line
0 mce-inject.c raise_poll                  57 machine_check_poll(0, &b);
1 mce.c        mce_timer_fn              1358 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
2 mce.c        __mcheck_cpu_init_generic 1508 machine_check_poll(MCP_UC | m_fl, &all_banks);
3 mce_intel.c  mce_intel_cmci_poll        133 if (machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned)))
4 mce_intel.c  intel_threshold_interrupt  253 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
5 mce_intel.c  cmci_recheck               345 machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));

So the TSC timestamp will be possibly inexact now in mce_timer_fn(),
__mcheck_cpu_init_generic(), mce_intel_cmci_poll() and cmci_recheck().

Should we bother and add a flag to struct mce - maybe somewhere in the
padding __u8 pad; - to denote that the logged TSC may not be exact?

Mind you, there's also

	m->time = get_seconds();

which also collects time and which could also be possibly inexact.

One other possibility would be to use ->time and write ->tsc *only*
when exact - i.e., in the handler - and this is then enough info about
timing.

->time will give you somewhere around where it happened and ->tsc - only
if set - will give you exact, well, *timestamp* :)

This sounds like a pretty straightforward logic to me...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-07 18:08     ` Borislav Petkov
@ 2016-11-07 18:37       ` Luck, Tony
  2016-11-08 18:09         ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2016-11-07 18:37 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, X86 ML, LKML

> One other possibility would be to use ->time and write ->tsc *only*
> when exact - i.e., in the handler - and this is then enough info about
> timing.
>
> ->time will give you somewhere around where it happened and ->tsc - only
> if set - will give you exact, well, *timestamp* :)
>
> This sounds like a pretty straightforward logic to me...

Also to me ... and I think that's what used to happen (or at least was the
intent).

-Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:ras/core] x86/RAS: Add TSC timestamp to the injected MCE
  2016-11-01 12:09 ` [RFC PATCH 2/3] x86/RAS: Add TSC to the injected MCE Borislav Petkov
@ 2016-11-08 16:19   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Borislav Petkov @ 2016-11-08 16:19 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: bp, linux-kernel, hpa, tglx, mingo

Commit-ID:  8c203dbb78ca7a9aed4e2570c866b0f43c752e41
Gitweb:     http://git.kernel.org/tip/8c203dbb78ca7a9aed4e2570c866b0f43c752e41
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 1 Nov 2016 13:04:41 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 8 Nov 2016 17:10:13 +0100

x86/RAS: Add TSC timestamp to the injected MCE

The MCE injection code does not provide the time stamp information for the
injected MCE. Add it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161101120911.13163-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/ras/mce_amd_inj.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/ras/mce_amd_inj.c b/arch/x86/ras/mce_amd_inj.c
index 1ac7647..8730c28 100644
--- a/arch/x86/ras/mce_amd_inj.c
+++ b/arch/x86/ras/mce_amd_inj.c
@@ -275,6 +275,8 @@ static void do_inject(void)
 	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
 
+	rdtscll(i_mce.tsc);
+
 	if (i_mce.misc)
 		i_mce.status |= MCI_STATUS_MISCV;
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:ras/core] x86/MCE: Dump MCE to dmesg if no consumers
  2016-11-01 12:09 ` [RFC PATCH 3/3] x86/MCE: Dump MCE to dmesg if no consumers Borislav Petkov
@ 2016-11-08 16:19   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Borislav Petkov @ 2016-11-08 16:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, tony.luck, mingo, Eric.Morton, hpa, linux-kernel, tglx

Commit-ID:  cd9c57cad3fe89ea949b9266cddc947c0838f7af
Gitweb:     http://git.kernel.org/tip/cd9c57cad3fe89ea949b9266cddc947c0838f7af
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 1 Nov 2016 12:52:27 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 8 Nov 2016 17:10:13 +0100

x86/MCE: Dump MCE to dmesg if no consumers

When there are no error record consumers registered with the kernel, the
only thing that appears in dmesg is something like:

  [  300.000326] mce: [Hardware Error]: Machine check events logged

and the error records are gone. Which is seriously counterproductive.

So let's dump them to dmesg instead, in such a case.

Requested-by: Eric Morton <Eric.Morton@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20161101120911.13163-4-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 52 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a7fdf45..4ca0047 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -207,8 +207,12 @@ EXPORT_SYMBOL_GPL(mce_inject_log);
 
 static struct notifier_block mce_srao_nb;
 
+static atomic_t num_notifiers;
+
 void mce_register_decode_chain(struct notifier_block *nb)
 {
+	atomic_inc(&num_notifiers);
+
 	/* Ensure SRAO notifier has the highest priority in the decode chain. */
 	if (nb != &mce_srao_nb && nb->priority == INT_MAX)
 		nb->priority -= 1;
@@ -219,6 +223,8 @@ EXPORT_SYMBOL_GPL(mce_register_decode_chain);
 
 void mce_unregister_decode_chain(struct notifier_block *nb)
 {
+	atomic_dec(&num_notifiers);
+
 	atomic_notifier_chain_unregister(&x86_mce_decoder_chain, nb);
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
@@ -270,17 +276,17 @@ struct mca_msr_regs msr_ops = {
 	.misc	= misc_reg
 };
 
-static void print_mce(struct mce *m)
+static void __print_mce(struct mce *m)
 {
-	int ret = 0;
-
-	pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
-	       m->extcpu, m->mcgstatus, m->bank, m->status);
+	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
+		 m->extcpu,
+		 (m->mcgstatus & MCG_STATUS_MCIP ? " Exception" : ""),
+		 m->mcgstatus, m->bank, m->status);
 
 	if (m->ip) {
 		pr_emerg(HW_ERR "RIP%s %02x:<%016Lx> ",
 			!(m->mcgstatus & MCG_STATUS_EIPV) ? " !INEXACT!" : "",
-				m->cs, m->ip);
+			m->cs, m->ip);
 
 		if (m->cs == __KERNEL_CS)
 			print_symbol("{%s}", m->ip);
@@ -308,6 +314,13 @@ static void print_mce(struct mce *m)
 	pr_emerg(HW_ERR "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x microcode %x\n",
 		m->cpuvendor, m->cpuid, m->time, m->socketid, m->apicid,
 		cpu_data(m->extcpu).microcode);
+}
+
+static void print_mce(struct mce *m)
+{
+	int ret = 0;
+
+	__print_mce(m);
 
 	/*
 	 * Print out human-readable details about the MCE error,
@@ -569,6 +582,32 @@ static struct notifier_block mce_srao_nb = {
 	.priority = INT_MAX,
 };
 
+static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
+				void *data)
+{
+	struct mce *m = (struct mce *)data;
+
+	if (!m)
+		return NOTIFY_DONE;
+
+	/*
+	 * Run the default notifier if we have only the SRAO
+	 * notifier and us registered.
+	 */
+	if (atomic_read(&num_notifiers) > 2)
+		return NOTIFY_DONE;
+
+	__print_mce(m);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block mce_default_nb = {
+	.notifier_call	= mce_default_notifier,
+	/* lowest prio, we want it to run last. */
+	.priority	= 0,
+};
+
 /*
  * Read ADDR and MISC registers.
  */
@@ -2138,6 +2177,7 @@ int __init mcheck_init(void)
 {
 	mcheck_intel_therm_init();
 	mce_register_decode_chain(&mce_srao_nb);
+	mce_register_decode_chain(&mce_default_nb);
 	mcheck_vendor_init_severity();
 
 	INIT_WORK(&mce_work, mce_process_work);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-07 18:37       ` Luck, Tony
@ 2016-11-08 18:09         ` Borislav Petkov
  2016-11-08 18:22           ` Luck, Tony
  2016-11-08 20:39           ` Thomas Gleixner
  0 siblings, 2 replies; 21+ messages in thread
From: Borislav Petkov @ 2016-11-08 18:09 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-edac, X86 ML, LKML

On Mon, Nov 07, 2016 at 06:37:50PM +0000, Luck, Tony wrote:
> Also to me ... and I think that's what used to happen (or at least was the
> intent).

How's that?

This still preserves the precise TSC timestamp in intel_threshold_interrupt().

---
From: Borislav Petkov <bp@suse.de>
Date: Tue, 8 Nov 2016 16:20:05 +0100
Subject: [PATCH] x86/MCE: Correct TSC timestamping of error records

We did have logic in the MCE code which would TSC-timestamp an error
record only when it is exact - i.e., it wasn't detected by polling. This
isn't the case anymore. So let's fix that:

We have a TSC timestamp in the error record only when it has been a
precise detection, i.e., either in the #MC handler or in one of the
interrupt handlers (thresholding, deferred, ...).

All other error records still have mce.time which contains the wall time
in order to be able to place the error record in time approximately.

Also, this fixes another bug where machine_check_poll() would clear
mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP
logging.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c       | 3 +--
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4ca00474804b..b7a976d657f3 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -713,7 +713,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		m.misc = 0;
 		m.addr = 0;
 		m.bank = i;
-		m.tsc = 0;
 
 		barrier();
 		m.status = mce_rdmsrl(msr_ops.status(i));
@@ -1394,7 +1393,7 @@ static void mce_timer_fn(unsigned long data)
 	iv = __this_cpu_read(mce_next_interval);
 
 	if (mce_available(this_cpu_ptr(&cpu_info))) {
-		machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
+		machine_check_poll(0, this_cpu_ptr(&mce_poll_banks));
 
 		if (mce_intel_cmci_poll()) {
 			iv = mce_adjust_timer(iv);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 1defb8ea882c..be0b2fad47c5 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -130,7 +130,7 @@ bool mce_intel_cmci_poll(void)
 	 * Reset the counter if we've logged an error in the last poll
 	 * during the storm.
 	 */
-	if (machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned)))
+	if (machine_check_poll(0, this_cpu_ptr(&mce_banks_owned)))
 		this_cpu_write(cmci_backoff_cnt, INITIAL_CHECK_INTERVAL);
 	else
 		this_cpu_dec(cmci_backoff_cnt);
@@ -342,7 +342,7 @@ void cmci_recheck(void)
 		return;
 
 	local_irq_save(flags);
-	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
+	machine_check_poll(0, this_cpu_ptr(&mce_banks_owned));
 	local_irq_restore(flags);
 }
 
-- 
2.10.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-08 18:09         ` Borislav Petkov
@ 2016-11-08 18:22           ` Luck, Tony
  2016-11-08 20:39           ` Thomas Gleixner
  1 sibling, 0 replies; 21+ messages in thread
From: Luck, Tony @ 2016-11-08 18:22 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, X86 ML, LKML

> This still preserves the precise TSC timestamp in intel_threshold_interrupt().

Yup - this looks right.

Acked-by: Tony Luck <tony.luck@intel.com>

-Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-08 18:09         ` Borislav Petkov
  2016-11-08 18:22           ` Luck, Tony
@ 2016-11-08 20:39           ` Thomas Gleixner
  2016-11-08 21:08             ` Borislav Petkov
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2016-11-08 20:39 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, linux-edac, X86 ML, LKML

On Tue, 8 Nov 2016, Borislav Petkov wrote:
> 
> Also, this fixes another bug where machine_check_poll() would clear
> mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP
> logging.

> @@ -713,7 +713,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  		m.misc = 0;
>  		m.addr = 0;
>  		m.bank = i;
> -		m.tsc = 0;

That does not make any sense. Where is m.tsc initialized? I couldn't find
any place which does, except this and the conditional clear farther down in
that function.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-08 20:39           ` Thomas Gleixner
@ 2016-11-08 21:08             ` Borislav Petkov
  2016-11-08 21:14               ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2016-11-08 21:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Luck, Tony, linux-edac, X86 ML, LKML

On Tue, Nov 08, 2016 at 09:39:02PM +0100, Thomas Gleixner wrote:
> That does not make any sense. Where is m.tsc initialized? I couldn't find
> any place which does, except this and the conditional clear farther down in
> that function.

mce_gather_info->mce_setup does

m->tsc = rdtsc();

And we do that *everytime* but then we go and clear the damn thing. I
know, I know, I wanted to flip that logic too and read the TSC *only*
when we want a precise timestamp but that would require more changes as
mce_setup() is used at a bunch of places.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-08 21:08             ` Borislav Petkov
@ 2016-11-08 21:14               ` Thomas Gleixner
  2016-11-08 21:24                 ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2016-11-08 21:14 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, linux-edac, X86 ML, LKML

On Tue, 8 Nov 2016, Borislav Petkov wrote:
> On Tue, Nov 08, 2016 at 09:39:02PM +0100, Thomas Gleixner wrote:
> > That does not make any sense. Where is m.tsc initialized? I couldn't find
> > any place which does, except this and the conditional clear farther down in
> > that function.
> 
> mce_gather_info->mce_setup does
> 
> m->tsc = rdtsc();
> 
> And we do that *everytime* but then we go and clear the damn thing. I
> know, I know, I wanted to flip that logic too and read the TSC *only*
> when we want a precise timestamp but that would require more changes as
> mce_setup() is used at a bunch of places.

And yes, you should spend the extra cycles. Adding a flags argument to
mce_setup() and propagate it through the various callsites shouldn't be
that hard and would make the stuff obvious instead of obfuscated.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-08 21:14               ` Thomas Gleixner
@ 2016-11-08 21:24                 ` Borislav Petkov
  2016-11-08 21:54                   ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2016-11-08 21:24 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Luck, Tony, linux-edac, X86 ML, LKML

On Tue, Nov 08, 2016 at 10:14:04PM +0100, Thomas Gleixner wrote:
> And yes, you should spend the extra cycles. Adding a flags argument to
> mce_setup() and propagate it through the various callsites shouldn't be
> that hard and would make the stuff obvious instead of obfuscated.

Sure, that's already on my TODO. I want to take a look at it when I have
a quiet moment.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-08 21:24                 ` Borislav Petkov
@ 2016-11-08 21:54                   ` Thomas Gleixner
  2016-11-09 18:06                     ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2016-11-08 21:54 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, linux-edac, X86 ML, LKML

On Tue, 8 Nov 2016, Borislav Petkov wrote:

> On Tue, Nov 08, 2016 at 10:14:04PM +0100, Thomas Gleixner wrote:
> > And yes, you should spend the extra cycles. Adding a flags argument to
> > mce_setup() and propagate it through the various callsites shouldn't be
> > that hard and would make the stuff obvious instead of obfuscated.
> 
> Sure, that's already on my TODO. I want to take a look at it when I have
> a quiet moment.

So for now we should fold something like the below into this patch.

Thanks,

	tglx

8<--------------------

--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -706,6 +706,15 @@ bool machine_check_poll(enum mcp_flags f
 
 	mce_gather_info(&m, NULL);
 
+	/*
+	 * m.tsc was set in mce_setup(). Clear it if not requested.
+	 *
+	 * FIXME: Propagate @flags to mce_gather_info/mce_setup() to avoid
+	 *	  that dance
+	 */
+	if (!(flags & MCP_TIMESTAMP))
+		m.tsc = 0;
+
 	for (i = 0; i < mca_cfg.banks; i++) {
 		if (!mce_banks[i].ctl || !test_bit(i, *b))
 			continue;
@@ -734,9 +743,6 @@ bool machine_check_poll(enum mcp_flags f
 
 		mce_read_aux(&m, i);
 
-		if (!(flags & MCP_TIMESTAMP))
-			m.tsc = 0;
-
 		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
 		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m))

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-08 21:54                   ` Thomas Gleixner
@ 2016-11-09 18:06                     ` Borislav Petkov
  2017-01-18 20:34                       ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2016-11-09 18:06 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Luck, Tony, linux-edac, X86 ML, LKML

On Tue, Nov 08, 2016 at 10:54:52PM +0100, Thomas Gleixner wrote:
> So for now we should fold something like the below into this patch.

Ok, how's that?

---
From: Borislav Petkov <bp@suse.de>
Date: Tue, 8 Nov 2016 16:20:05 +0100
Subject: [PATCH] x86/MCE: Correct TSC timestamping of error records

We did have logic in the MCE code which would TSC-timestamp an error
record only when it is exact - i.e., when it wasn't detected by polling.
This isn't the case anymore. So let's fix that:

We have a valid TSC timestamp in the error record only when it has been
a precise detection, i.e., either in the #MC handler or in one of the
interrupt handlers (thresholding, deferred, ...).

All other error records still have mce.time which contains the wall
time in order to be able to place the error record in time at least
approximately.

Also, this fixes another bug where machine_check_poll() would clear
mce.tsc unconditionally even if we requested precise MCP_TIMESTAMP
logging.

The proper fix would be to generate timestamp only when it has been
requested and not always. But that would require a more thorough code
audit of all mce_gather_info/mce_setup() users. Add a FIXME for now,
courtesy of tglx.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c       | 18 ++++++++++++------
 arch/x86/kernel/cpu/mcheck/mce_intel.c |  4 ++--
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4ca00474804b..3fbe80066b4e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -706,6 +706,17 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 	mce_gather_info(&m, NULL);
 
+	/*
+	 * m.tsc was set in mce_setup(). Clear it if not requested.
+	 *
+	 * FIXME: Propagate @flags to mce_gather_info/mce_setup() to avoid
+	 *	  that dance.
+	 */
+	if (!(flags & MCP_TIMESTAMP)) {
+		WARN_ON_ONCE(m.tsc);
+		m.tsc = 0;
+	}
+
 	for (i = 0; i < mca_cfg.banks; i++) {
 		if (!mce_banks[i].ctl || !test_bit(i, *b))
 			continue;
@@ -713,14 +724,12 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		m.misc = 0;
 		m.addr = 0;
 		m.bank = i;
-		m.tsc = 0;
 
 		barrier();
 		m.status = mce_rdmsrl(msr_ops.status(i));
 		if (!(m.status & MCI_STATUS_VAL))
 			continue;
 
-
 		/*
 		 * Uncorrected or signalled events are handled by the exception
 		 * handler when it is enabled, so don't process those here.
@@ -735,9 +744,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 		mce_read_aux(&m, i);
 
-		if (!(flags & MCP_TIMESTAMP))
-			m.tsc = 0;
-
 		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
 		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m))
@@ -1394,7 +1400,7 @@ static void mce_timer_fn(unsigned long data)
 	iv = __this_cpu_read(mce_next_interval);
 
 	if (mce_available(this_cpu_ptr(&cpu_info))) {
-		machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));
+		machine_check_poll(0, this_cpu_ptr(&mce_poll_banks));
 
 		if (mce_intel_cmci_poll()) {
 			iv = mce_adjust_timer(iv);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 1defb8ea882c..be0b2fad47c5 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -130,7 +130,7 @@ bool mce_intel_cmci_poll(void)
 	 * Reset the counter if we've logged an error in the last poll
 	 * during the storm.
 	 */
-	if (machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned)))
+	if (machine_check_poll(0, this_cpu_ptr(&mce_banks_owned)))
 		this_cpu_write(cmci_backoff_cnt, INITIAL_CHECK_INTERVAL);
 	else
 		this_cpu_dec(cmci_backoff_cnt);
@@ -342,7 +342,7 @@ void cmci_recheck(void)
 		return;
 
 	local_irq_save(flags);
-	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned));
+	machine_check_poll(0, this_cpu_ptr(&mce_banks_owned));
 	local_irq_restore(flags);
 }
 
-- 
2.10.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2016-11-09 18:06                     ` Borislav Petkov
@ 2017-01-18 20:34                       ` Borislav Petkov
  2017-01-19 13:34                         ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2017-01-18 20:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Luck, Tony, linux-edac, X86 ML, LKML

On Wed, Nov 09, 2016 at 07:06:04PM +0100, Borislav Petkov wrote:
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 4ca00474804b..3fbe80066b4e 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -706,6 +706,17 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  
>  	mce_gather_info(&m, NULL);
>  
> +	/*
> +	 * m.tsc was set in mce_setup(). Clear it if not requested.
> +	 *
> +	 * FIXME: Propagate @flags to mce_gather_info/mce_setup() to avoid
> +	 *	  that dance.
> +	 */

Lemme try to address that FIXME. So it doesn't look particularly easy to
do without touching/changing a bunch of places. That's why I'm trying
tricks first.

For example, the mce-apei.c case I'm addressing by setting ->tsc only
for errors of panic severity. The idea there is, that, panic errors will
have raised an #MC and not polled.

And then instead of propagating a flag to mce_setup(), it seems
easier/less code to set ->tsc depending on the call sites, i.e.,
are we polling or are we preparing an MCE record in an exception
handler/thresholding interrupt.

I'll run it tomorrow.

---
diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index 83f1a98d37db..2eee85379689 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -52,8 +52,11 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 
 	if (severity >= GHES_SEV_RECOVERABLE)
 		m.status |= MCI_STATUS_UC;
-	if (severity >= GHES_SEV_PANIC)
+
+	if (severity >= GHES_SEV_PANIC) {
 		m.status |= MCI_STATUS_PCC;
+		m.tsc = rdtsc();
+	}
 
 	m.addr = mem_err->physical_addr;
 	mce_log(&m);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 6eef6fde0f02..ca15a7e1f97d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -128,7 +128,6 @@ void mce_setup(struct mce *m)
 {
 	memset(m, 0, sizeof(struct mce));
 	m->cpu = m->extcpu = smp_processor_id();
-	m->tsc = rdtsc();
 	/* We hope get_seconds stays lockless */
 	m->time = get_seconds();
 	m->cpuvendor = boot_cpu_data.x86_vendor;
@@ -710,14 +709,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 	mce_gather_info(&m, NULL);
 
-	/*
-	 * m.tsc was set in mce_setup(). Clear it if not requested.
-	 *
-	 * FIXME: Propagate @flags to mce_gather_info/mce_setup() to avoid
-	 *	  that dance.
-	 */
-	if (!(flags & MCP_TIMESTAMP))
-		m.tsc = 0;
+	if (flags & MCP_TIMESTAMP)
+		m.tsc = rdtsc();
 
 	for (i = 0; i < mca_cfg.banks; i++) {
 		if (!mce_banks[i].ctl || !test_bit(i, *b))
@@ -1156,6 +1149,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		goto out;
 
 	mce_gather_info(&m, regs);
+	m.tsc = rdtsc();
 
 	final = this_cpu_ptr(&mces_seen);
 	*final = m;
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 776379e4a39c..9e5427df3243 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -778,7 +778,8 @@ __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc)
 	mce_setup(&m);
 
 	m.status = status;
-	m.bank = bank;
+	m.bank   = bank;
+	m.tsc	 = rdtsc();
 
 	if (threshold_err)
 		m.misc = misc;

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/MCE: Remove MCP_TIMESTAMP
  2017-01-18 20:34                       ` Borislav Petkov
@ 2017-01-19 13:34                         ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2017-01-19 13:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Luck, Tony, linux-edac, X86 ML, LKML, Ashok Raj

On Wed, Jan 18, 2017 at 09:34:13PM +0100, Borislav Petkov wrote:
> I'll run it tomorrow.

Ok, here it is, it looks ok while testing in a guest and injecting MCEs.

---
From: Borislav Petkov <bp@suse.de>
Date: Wed, 18 Jan 2017 21:34:41 +0100
Subject: [PATCH] x86/MCE: Flip the TSC-adding logic

Add the TSC value to the MCE record only when the MCE being logged is
precise, i.e., it is logged as an exception or an MCE-related interrupt.

So it doesn't look particularly easy to do without touching/changing a
bunch of places. That's why I'm trying tricks first.

For example, the mce-apei.c case I'm addressing by setting ->tsc only
for errors of panic severity. The idea there is, that, panic errors will
have raised an #MC and not polled.

And then instead of propagating a flag to mce_setup(), it seems
easier/less code to set ->tsc depending on the call sites, i.e.,
are we polling or are we preparing an MCE record in an exception
handler/thresholding interrupt.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce-apei.c |  5 ++++-
 arch/x86/kernel/cpu/mcheck/mce.c      | 12 +++---------
 arch/x86/kernel/cpu/mcheck/mce_amd.c  |  3 ++-
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index 83f1a98d37db..2eee85379689 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -52,8 +52,11 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 
 	if (severity >= GHES_SEV_RECOVERABLE)
 		m.status |= MCI_STATUS_UC;
-	if (severity >= GHES_SEV_PANIC)
+
+	if (severity >= GHES_SEV_PANIC) {
 		m.status |= MCI_STATUS_PCC;
+		m.tsc = rdtsc();
+	}
 
 	m.addr = mem_err->physical_addr;
 	mce_log(&m);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 6eef6fde0f02..ca15a7e1f97d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -128,7 +128,6 @@ void mce_setup(struct mce *m)
 {
 	memset(m, 0, sizeof(struct mce));
 	m->cpu = m->extcpu = smp_processor_id();
-	m->tsc = rdtsc();
 	/* We hope get_seconds stays lockless */
 	m->time = get_seconds();
 	m->cpuvendor = boot_cpu_data.x86_vendor;
@@ -710,14 +709,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 	mce_gather_info(&m, NULL);
 
-	/*
-	 * m.tsc was set in mce_setup(). Clear it if not requested.
-	 *
-	 * FIXME: Propagate @flags to mce_gather_info/mce_setup() to avoid
-	 *	  that dance.
-	 */
-	if (!(flags & MCP_TIMESTAMP))
-		m.tsc = 0;
+	if (flags & MCP_TIMESTAMP)
+		m.tsc = rdtsc();
 
 	for (i = 0; i < mca_cfg.banks; i++) {
 		if (!mce_banks[i].ctl || !test_bit(i, *b))
@@ -1156,6 +1149,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 		goto out;
 
 	mce_gather_info(&m, regs);
+	m.tsc = rdtsc();
 
 	final = this_cpu_ptr(&mces_seen);
 	*final = m;
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 776379e4a39c..9e5427df3243 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -778,7 +778,8 @@ __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc)
 	mce_setup(&m);
 
 	m.status = status;
-	m.bank = bank;
+	m.bank   = bank;
+	m.tsc	 = rdtsc();
 
 	if (threshold_err)
 		m.misc = misc;
-- 
2.11.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-01-19 13:40 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-01 12:09 [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Borislav Petkov
2016-11-01 12:09 ` [RFC PATCH 1/3] notifiers: Document notifier priority Borislav Petkov
2016-11-01 12:09 ` [RFC PATCH 2/3] x86/RAS: Add TSC to the injected MCE Borislav Petkov
2016-11-08 16:19   ` [tip:ras/core] x86/RAS: Add TSC timestamp " tip-bot for Borislav Petkov
2016-11-01 12:09 ` [RFC PATCH 3/3] x86/MCE: Dump MCE to dmesg if no consumers Borislav Petkov
2016-11-08 16:19   ` [tip:ras/core] " tip-bot for Borislav Petkov
2016-11-05 13:11 ` [PATCH] x86/MCE: Remove MCP_TIMESTAMP Borislav Petkov
2016-11-07 17:48   ` Luck, Tony
2016-11-07 18:08     ` Borislav Petkov
2016-11-07 18:37       ` Luck, Tony
2016-11-08 18:09         ` Borislav Petkov
2016-11-08 18:22           ` Luck, Tony
2016-11-08 20:39           ` Thomas Gleixner
2016-11-08 21:08             ` Borislav Petkov
2016-11-08 21:14               ` Thomas Gleixner
2016-11-08 21:24                 ` Borislav Petkov
2016-11-08 21:54                   ` Thomas Gleixner
2016-11-09 18:06                     ` Borislav Petkov
2017-01-18 20:34                       ` Borislav Petkov
2017-01-19 13:34                         ` Borislav Petkov
2016-11-07  7:37 ` [RFC PATCH 0/3] x86/RAS: Dump error record to dmesg if no consumers Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).