All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Enable deferred error interrupts
@ 2015-04-30 14:49 Aravind Gopalakrishnan
  2015-04-30 14:49 ` [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit Aravind Gopalakrishnan
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-04-30 14:49 UTC (permalink / raw)
  To: tglx, mingo, hpa, tony.luck, bp, jiang.liu, yinghai
  Cc: x86, dvlasenk, JBeulich, slaoub, luto, dave.hansen, oleg,
	rostedt, rusty, prarit, linux, jroedel, andriy.shevchenko, macro,
	wangnan0, linux-kernel, linux-edac

Newer AMD processors can generate deferred errors and can be configured
to generate APIC interrupts on such events.

This patchset introduces a new interrupt handler for deferred errors and
configures the HW if the feature is present.

Patch1: Defines SUCCOR cpuid bit. This indicates prescence of features
	such as data poisoning and deferred error interrupts in hardware.
Patch2: Implement the interrupt handler.
	- setup vector number, build the interrupt and implement handler
	  function in this patch.
Patch3, Patch 4: Cleanups in the code. No functional changes are introduced.

Aravind Gopalakrishnan (4):
  x86/mce: Define 'SUCCOR' cpuid bit
  x86/mce/amd: Introduce deferred error interrupt handler
  x86, irq: Cleanup ordering of vector numbers
  x86/mce/amd: Rename setup_APIC_mce

 arch/x86/include/asm/entry_arch.h        |   3 +
 arch/x86/include/asm/hardirq.h           |   3 +
 arch/x86/include/asm/hw_irq.h            |   2 +
 arch/x86/include/asm/irq_vectors.h       |  11 ++--
 arch/x86/include/asm/mce.h               |   6 +-
 arch/x86/include/asm/trace/irq_vectors.h |   6 ++
 arch/x86/include/asm/traps.h             |   3 +-
 arch/x86/kernel/cpu/mcheck/mce.c         |   1 +
 arch/x86/kernel/cpu/mcheck/mce_amd.c     | 105 ++++++++++++++++++++++++++++++-
 arch/x86/kernel/entry_64.S               |   5 ++
 arch/x86/kernel/irq.c                    |   6 ++
 arch/x86/kernel/irqinit.c                |   4 ++
 arch/x86/kernel/traps.c                  |   4 ++
 13 files changed, 150 insertions(+), 9 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit
  2015-04-30 14:49 [PATCH 0/4] Enable deferred error interrupts Aravind Gopalakrishnan
@ 2015-04-30 14:49 ` Aravind Gopalakrishnan
  2015-05-01 10:25   ` Borislav Petkov
  2015-05-01 15:09   ` Dave Hansen
  2015-04-30 14:49 ` [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler Aravind Gopalakrishnan
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-04-30 14:49 UTC (permalink / raw)
  To: tglx, mingo, hpa, tony.luck, bp, jiang.liu, yinghai
  Cc: x86, dvlasenk, JBeulich, slaoub, luto, dave.hansen, oleg,
	rostedt, rusty, prarit, linux, jroedel, andriy.shevchenko, macro,
	wangnan0, linux-kernel, linux-edac

SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
It indicates support for data poisoning in HW and deferred error
interrupts.

Add new bitfield in mce_vendor_flags for this.
We use this to verify prescence of deferred error interrupts
before we enable them in mce_amd.c

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/mce.h       | 3 ++-
 arch/x86/kernel/cpu/mcheck/mce.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1f5a86d..dfcb664 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -118,7 +118,8 @@ struct mca_config {
 
 struct mce_vendor_flags {
 	__u64		overflow_recov	: 1, /* cpuid_ebx(80000007) */
-			__reserved_0	: 63;
+			succor		: 1,
+			__reserved_0	: 62;
 };
 extern struct mce_vendor_flags mce_flags;
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e535533..de61f62e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1640,6 +1640,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 	case X86_VENDOR_AMD:
 		mce_amd_feature_init(c);
 		mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
+		mce_flags.succor = (cpuid_ebx(0x80000007) & 0x2) ? 1 : 0;
 		break;
 	default:
 		break;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-04-30 14:49 [PATCH 0/4] Enable deferred error interrupts Aravind Gopalakrishnan
  2015-04-30 14:49 ` [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit Aravind Gopalakrishnan
@ 2015-04-30 14:49 ` Aravind Gopalakrishnan
  2015-04-30 20:41   ` Andy Lutomirski
  2015-05-03  9:22   ` Borislav Petkov
  2015-04-30 14:49 ` [PATCH 3/4] x86, irq: Cleanup ordering of vector numbers Aravind Gopalakrishnan
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-04-30 14:49 UTC (permalink / raw)
  To: tglx, mingo, hpa, tony.luck, bp, jiang.liu, yinghai
  Cc: x86, dvlasenk, JBeulich, slaoub, luto, dave.hansen, oleg,
	rostedt, rusty, prarit, linux, jroedel, andriy.shevchenko, macro,
	wangnan0, linux-kernel, linux-edac

Changes introduced in the patch-
  - Assign vector number 0xf4 for Deferred errors
  - Declare deferred_interrupt, allocate gate and bind it
    to DEFERRED_APIC_VECTOR.
  - Declare smp_deferred_interrupt to be used as the
    entry point for the interrupt in mce_amd.c
  - Define trace_deferred_interrupt for tracing
  - Enable deferred error interrupt selectively upon detection
    of 'succor' bitfield
  - Setup amd_deferred_error_interrupt() to handle the interrupt
    and assign it to def_int_vector if feature is present in HW.
    Else, let default handler deal with it.
  - Provide Deferred error interrupt stats on
    /proc/interrupts by incrementing irq_deferred_count

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/entry_arch.h        |   3 +
 arch/x86/include/asm/hardirq.h           |   3 +
 arch/x86/include/asm/hw_irq.h            |   2 +
 arch/x86/include/asm/irq_vectors.h       |   1 +
 arch/x86/include/asm/mce.h               |   3 +
 arch/x86/include/asm/trace/irq_vectors.h |   6 ++
 arch/x86/include/asm/traps.h             |   3 +-
 arch/x86/kernel/cpu/mcheck/mce_amd.c     | 101 +++++++++++++++++++++++++++++++
 arch/x86/kernel/entry_64.S               |   5 ++
 arch/x86/kernel/irq.c                    |   6 ++
 arch/x86/kernel/irqinit.c                |   4 ++
 arch/x86/kernel/traps.c                  |   4 ++
 12 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index dc5fa66..f7b957b 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -50,4 +50,7 @@ BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
 BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
 #endif
 
+#ifdef CONFIG_X86_MCE_AMD
+BUILD_INTERRUPT(deferred_interrupt, DEFERRED_APIC_VECTOR)
+#endif
 #endif
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 0f5fb6b..448451c 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -33,6 +33,9 @@ typedef struct {
 #ifdef CONFIG_X86_MCE_THRESHOLD
 	unsigned int irq_threshold_count;
 #endif
+#ifdef CONFIG_X86_MCE_AMD
+	unsigned int irq_deferred_count;
+#endif
 #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
 	unsigned int irq_hv_callback_count;
 #endif
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e9571dd..7cf2670 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -73,6 +73,7 @@ extern asmlinkage void invalidate_interrupt31(void);
 extern asmlinkage void irq_move_cleanup_interrupt(void);
 extern asmlinkage void reboot_interrupt(void);
 extern asmlinkage void threshold_interrupt(void);
+extern asmlinkage void deferred_interrupt(void);
 
 extern asmlinkage void call_function_interrupt(void);
 extern asmlinkage void call_function_single_interrupt(void);
@@ -87,6 +88,7 @@ extern void trace_spurious_interrupt(void);
 extern void trace_thermal_interrupt(void);
 extern void trace_reschedule_interrupt(void);
 extern void trace_threshold_interrupt(void);
+extern void trace_deferred_interrupt(void);
 extern void trace_call_function_interrupt(void);
 extern void trace_call_function_single_interrupt(void);
 #define trace_irq_move_cleanup_interrupt  irq_move_cleanup_interrupt
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 666c89e..cee723f 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -113,6 +113,7 @@
 #define IRQ_WORK_VECTOR			0xf6
 
 #define UV_BAU_MESSAGE			0xf5
+#define DEFERRED_APIC_VECTOR		0xf4
 
 /* Vector on which hypervisor callbacks will be delivered */
 #define HYPERVISOR_CALLBACK_VECTOR	0xf3
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index dfcb664..b21b887 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -224,6 +224,9 @@ void do_machine_check(struct pt_regs *, long);
 extern void (*mce_threshold_vector)(void);
 extern void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu);
 
+/* Deferred error interrupt handler */
+extern void (*deferred_int_vector)(void);
+
 /*
  * Thermal handler
  */
diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h
index 4cab890..3c1f0a7 100644
--- a/arch/x86/include/asm/trace/irq_vectors.h
+++ b/arch/x86/include/asm/trace/irq_vectors.h
@@ -101,6 +101,12 @@ DEFINE_IRQ_VECTOR_EVENT(call_function_single);
 DEFINE_IRQ_VECTOR_EVENT(threshold_apic);
 
 /*
+ * deferred_apic - called when entering/exiting a deferred apic interrupt
+ * vector handler
+ */
+DEFINE_IRQ_VECTOR_EVENT(deferred_apic);
+
+/*
  * thermal_apic - called when entering/exiting a thermal apic interrupt
  * vector handler
  */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 4e49d7d..ef937b7 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -108,7 +108,8 @@ extern int panic_on_unrecovered_nmi;
 void math_emulate(struct math_emu_info *);
 #ifndef CONFIG_X86_32
 asmlinkage void smp_thermal_interrupt(void);
-asmlinkage void mce_threshold_interrupt(void);
+asmlinkage void smp_threshold_interrupt(void);
+asmlinkage void smp_deferred_interrupt(void);
 #endif
 
 extern enum ctx_state ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 55ad9b3..ce82f0b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -12,6 +12,8 @@
  *     - added support for AMD Family 0x10 processors
  *  May 2012
  *     - major scrubbing
+ * April 2015
+ *     - add support for deferred error interrupts
  *
  *  All MC4_MISCi registers are shared between multi-cores
  */
@@ -32,6 +34,7 @@
 #include <asm/idle.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/trace/irq_vectors.h>
 
 #define NR_BLOCKS         9
 #define THRESHOLD_MAX     0xFFF
@@ -47,6 +50,13 @@
 #define MASK_BLKPTR_LO    0xFF000000
 #define MCG_XBLK_ADDR     0xC0000400
 
+/* Deferred error settings */
+#define MSR_CU_DEF_ERR		0xC0000410
+#define MASK_DEF_LVTOFF		0x000000F0
+#define MASK_DEF_INT_TYPE	0x00000006
+#define DEF_LVT_OFF		0x2
+#define DEF_INT_TYPE_APIC	0x2
+
 static const char * const th_names[] = {
 	"load_store",
 	"insn_fetch",
@@ -60,6 +70,15 @@ static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned char, bank_map);	/* see which banks are on */
 
 static void amd_threshold_interrupt(void);
+static void amd_deferred_error_interrupt(void);
+
+/* Setup default deferred error interrupt handler */
+static void default_deferred_interrupt(void)
+{
+	pr_err("Unexpected deferred interrupt at vector %x\n",
+			 DEFERRED_APIC_VECTOR);
+}
+void (*def_int_vector)(void) = default_deferred_interrupt;
 
 /*
  * CPU Initialization
@@ -205,6 +224,62 @@ static int setup_APIC_mce(int reserved, int new)
 	return reserved;
 }
 
+static int setup_APIC_deferred_error(int reserved, int new)
+{
+	if (reserved < 0 && !setup_APIC_eilvt(new, DEFERRED_APIC_VECTOR,
+					      APIC_EILVT_MSG_FIX, 0))
+		return new;
+
+	return reserved;
+}
+
+static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
+{
+	u32 low = 0, high = 0;
+	int def_offset = -1, def_new;
+
+	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
+		return;
+
+	def_new = (low & MASK_DEF_LVTOFF) >> 4;
+	if (c->x86 == 0x15 && c->x86_model == 0x60 &&
+	    !(low & MASK_DEF_LVTOFF)) {
+		def_new = DEF_LVT_OFF;
+		low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
+	}
+
+	def_offset = setup_APIC_deferred_error(def_offset, def_new);
+
+	if ((def_offset == def_new) &&
+	    (def_int_vector != amd_deferred_error_interrupt))
+		def_int_vector = amd_deferred_error_interrupt;
+
+	low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
+	wrmsr(MSR_CU_DEF_ERR, low, high);
+}
+
+static inline void __smp_deferred_interrupt(void)
+{
+	inc_irq_stat(irq_deferred_count);
+	def_int_vector();
+}
+
+asmlinkage __visible void smp_deferred_interrupt(void)
+{
+	entering_irq();
+	__smp_deferred_interrupt();
+	exiting_ack_irq();
+}
+
+asmlinkage __visible void smp_trace_deferred_interrupt(void)
+{
+	entering_irq();
+	trace_deferred_apic_entry(DEFERRED_APIC_VECTOR);
+	__smp_deferred_interrupt();
+	trace_deferred_apic_exit(DEFERRED_APIC_VECTOR);
+	exiting_ack_irq();
+}
+
 /* cpu init entry point, called from mce.c with preempt off */
 void mce_amd_feature_init(struct cpuinfo_x86 *c)
 {
@@ -262,6 +337,32 @@ init:
 			mce_threshold_block_init(&b, offset);
 		}
 	}
+
+	if (mce_flags.succor)
+		deferred_error_interrupt_enable(c);
+}
+
+/* Apic interrupt handler for deferred errors */
+static void amd_deferred_error_interrupt(void)
+{
+	u64 status;
+	unsigned int bank;
+	struct mce m;
+
+	for (bank = 0; bank < mca_cfg.banks; ++bank) {
+		rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
+
+		if (!(status & MCI_STATUS_VAL) ||
+		    !(status & MCI_STATUS_DEFERRED))
+			continue;
+
+		mce_setup(&m);
+		m.bank = bank;
+		m.status = status;
+		mce_log(&m);
+		wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
+		break;
+	}
 }
 
 /*
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e952f6b..820965c 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -914,6 +914,11 @@ apicinterrupt THRESHOLD_APIC_VECTOR \
 	threshold_interrupt smp_threshold_interrupt
 #endif
 
+#ifdef CONFIG_X86_MCE_AMD
+apicinterrupt DEFERRED_APIC_VECTOR \
+	deferred_interrupt smp_deferred_interrupt
+#endif
+
 #ifdef CONFIG_X86_THERMAL_VECTOR
 apicinterrupt THERMAL_APIC_VECTOR \
 	thermal_interrupt smp_thermal_interrupt
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index e5952c2..406f204 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -116,6 +116,12 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 		seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
 	seq_puts(p, "  Threshold APIC interrupts\n");
 #endif
+#ifdef CONFIG_X86_MCE_AMD
+	seq_printf(p, "%*s: ", prec, "DEF");
+	for_each_online_cpu(j)
+		seq_printf(p, "%10u ", irq_stats(j)->irq_deferred_count);
+	seq_puts(p, "  Deferred APIC interrupts\n");
+#endif
 #ifdef CONFIG_X86_MCE
 	seq_printf(p, "%*s: ", prec, "MCE");
 	for_each_online_cpu(j)
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index cd10a64..e0ffd29 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -135,6 +135,10 @@ static void __init apic_intr_init(void)
 	alloc_intr_gate(THRESHOLD_APIC_VECTOR, threshold_interrupt);
 #endif
 
+#ifdef CONFIG_X86_MCE_AMD
+	alloc_intr_gate(DEFERRED_APIC_VECTOR, deferred_interrupt);
+#endif
+
 #ifdef CONFIG_X86_LOCAL_APIC
 	/* self generated IPI for local APIC timer */
 	alloc_intr_gate(LOCAL_TIMER_VECTOR, apic_timer_interrupt);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 324ab52..dbfe07c2 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -827,6 +827,10 @@ asmlinkage __visible void __attribute__((weak)) smp_threshold_interrupt(void)
 {
 }
 
+asmlinkage __visible void __attribute__((weak)) smp_deferred_interrupt(void)
+{
+}
+
 /*
  * 'math_state_restore()' saves the current math information in the
  * old math state array, and gets the new ones from the current task
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/4] x86, irq: Cleanup ordering of vector numbers
  2015-04-30 14:49 [PATCH 0/4] Enable deferred error interrupts Aravind Gopalakrishnan
  2015-04-30 14:49 ` [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit Aravind Gopalakrishnan
  2015-04-30 14:49 ` [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler Aravind Gopalakrishnan
@ 2015-04-30 14:49 ` Aravind Gopalakrishnan
  2015-04-30 14:49 ` [PATCH 4/4] x86/mce/amd: Rename setup_APIC_mce Aravind Gopalakrishnan
  2015-05-01  7:18 ` [PATCH 0/4] Enable deferred error interrupts Ingo Molnar
  4 siblings, 0 replies; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-04-30 14:49 UTC (permalink / raw)
  To: tglx, mingo, hpa, tony.luck, bp, jiang.liu, yinghai
  Cc: x86, dvlasenk, JBeulich, slaoub, luto, dave.hansen, oleg,
	rostedt, rusty, prarit, linux, jroedel, andriy.shevchenko, macro,
	wangnan0, linux-kernel, linux-edac

Enforcing proper descending order of vector number assignments here.
No functional change.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/include/asm/irq_vectors.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index cee723f..244d486 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -102,11 +102,6 @@
  */
 #define X86_PLATFORM_IPI_VECTOR		0xf7
 
-/* Vector for KVM to deliver posted interrupt IPI */
-#ifdef CONFIG_HAVE_KVM
-#define POSTED_INTR_VECTOR		0xf2
-#endif
-
 /*
  * IRQ work vector:
  */
@@ -118,6 +113,11 @@
 /* Vector on which hypervisor callbacks will be delivered */
 #define HYPERVISOR_CALLBACK_VECTOR	0xf3
 
+/* Vector for KVM to deliver posted interrupt IPI */
+#ifdef CONFIG_HAVE_KVM
+#define POSTED_INTR_VECTOR		0xf2
+#endif
+
 /*
  * Local APIC timer IRQ vector is on a different priority level,
  * to work around the 'lost local interrupt if more than 2 IRQ
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] x86/mce/amd: Rename setup_APIC_mce
  2015-04-30 14:49 [PATCH 0/4] Enable deferred error interrupts Aravind Gopalakrishnan
                   ` (2 preceding siblings ...)
  2015-04-30 14:49 ` [PATCH 3/4] x86, irq: Cleanup ordering of vector numbers Aravind Gopalakrishnan
@ 2015-04-30 14:49 ` Aravind Gopalakrishnan
  2015-05-01  7:18 ` [PATCH 0/4] Enable deferred error interrupts Ingo Molnar
  4 siblings, 0 replies; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-04-30 14:49 UTC (permalink / raw)
  To: tglx, mingo, hpa, tony.luck, bp, jiang.liu, yinghai
  Cc: x86, dvlasenk, JBeulich, slaoub, luto, dave.hansen, oleg,
	rostedt, rusty, prarit, linux, jroedel, andriy.shevchenko, macro,
	wangnan0, linux-kernel, linux-edac

'setup_APIC_mce' doesn't give us an indication of why we are
going to program LVT. Make that explicit by renaming it to
setup_APIC_mce_threshold so we know.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index ce82f0b..2952864 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -215,7 +215,7 @@ static void mce_threshold_block_init(struct threshold_block *b, int offset)
 	threshold_restart_bank(&tr);
 };
 
-static int setup_APIC_mce(int reserved, int new)
+static int setup_APIC_mce_threshold(int reserved, int new)
 {
 	if (reserved < 0 && !setup_APIC_eilvt(new, THRESHOLD_APIC_VECTOR,
 					      APIC_EILVT_MSG_FIX, 0))
@@ -327,7 +327,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
 			b.interrupt_enable = 1;
 			new	= (high & MASK_LVTOFF_HI) >> 20;
-			offset  = setup_APIC_mce(offset, new);
+			offset  = setup_APIC_mce_threshold(offset, new);
 
 			if ((offset == new) &&
 			    (mce_threshold_vector != amd_threshold_interrupt))
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-04-30 14:49 ` [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler Aravind Gopalakrishnan
@ 2015-04-30 20:41   ` Andy Lutomirski
  2015-05-01  4:16     ` Aravind Gopalakrishnan
  2015-05-03  9:22   ` Borislav Petkov
  1 sibling, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2015-04-30 20:41 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Borislav Petkov, Jiang Liu, Yinghai Lu, X86 ML, Denys Vlasenko,
	Jan Beulich, slaoub, Dave Hansen, Oleg Nesterov, Steven Rostedt,
	Rusty Russell, Prarit Bhargava, Rasmus Villemoes, jroedel,
	Andy Shevchenko, Maciej W. Rozycki, Wang Nan, linux-kernel,
	linux-edac

On Thu, Apr 30, 2015 at 7:49 AM, Aravind Gopalakrishnan
<Aravind.Gopalakrishnan@amd.com> wrote:
> Changes introduced in the patch-
>   - Assign vector number 0xf4 for Deferred errors
>   - Declare deferred_interrupt, allocate gate and bind it
>     to DEFERRED_APIC_VECTOR.
>   - Declare smp_deferred_interrupt to be used as the
>     entry point for the interrupt in mce_amd.c
>   - Define trace_deferred_interrupt for tracing
>   - Enable deferred error interrupt selectively upon detection
>     of 'succor' bitfield
>   - Setup amd_deferred_error_interrupt() to handle the interrupt
>     and assign it to def_int_vector if feature is present in HW.
>     Else, let default handler deal with it.
>   - Provide Deferred error interrupt stats on
>     /proc/interrupts by incrementing irq_deferred_count

You're calling these "deferred interrupts" all over (e.g.
irq_deferred_count, deferred_int_handler, etc).  That seems like it'll
be confusing.  They're deferred errors, not deferred interrupts.

--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-04-30 20:41   ` Andy Lutomirski
@ 2015-05-01  4:16     ` Aravind Gopalakrishnan
  2015-05-01  9:36       ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-01  4:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Borislav Petkov, Jiang Liu, Yinghai Lu, X86 ML, Denys Vlasenko,
	Jan Beulich, slaoub, Dave Hansen, Oleg Nesterov, Steven Rostedt,
	Rusty Russell, Prarit Bhargava, Rasmus Villemoes, jroedel,
	Andy Shevchenko, Maciej W. Rozycki, Wang Nan, linux-kernel,
	linux-edac


On 4/30/15 3:41 PM, Andy Lutomirski wrote:
> On Thu, Apr 30, 2015 at 7:49 AM, Aravind Gopalakrishnan
> <Aravind.Gopalakrishnan@amd.com> wrote:
>> Changes introduced in the patch-
>>    - Assign vector number 0xf4 for Deferred errors
>>    - Declare deferred_interrupt, allocate gate and bind it
>>      to DEFERRED_APIC_VECTOR.
>>    - Declare smp_deferred_interrupt to be used as the
>>      entry point for the interrupt in mce_amd.c
>>    - Define trace_deferred_interrupt for tracing
>>    - Enable deferred error interrupt selectively upon detection
>>      of 'succor' bitfield
>>    - Setup amd_deferred_error_interrupt() to handle the interrupt
>>      and assign it to def_int_vector if feature is present in HW.
>>      Else, let default handler deal with it.
>>    - Provide Deferred error interrupt stats on
>>      /proc/interrupts by incrementing irq_deferred_count
> You're calling these "deferred interrupts" all over (e.g.
> irq_deferred_count, deferred_int_handler, etc).  That seems like it'll
> be confusing.  They're deferred errors, not deferred interrupts.
>

I used  the term as it is an interrupt due to the deferred error.
Would 'deferred_err_interrupt' be more apt? Maybe 
'irq_deferred_error_count' for the counter?

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/4] Enable deferred error interrupts
  2015-04-30 14:49 [PATCH 0/4] Enable deferred error interrupts Aravind Gopalakrishnan
                   ` (3 preceding siblings ...)
  2015-04-30 14:49 ` [PATCH 4/4] x86/mce/amd: Rename setup_APIC_mce Aravind Gopalakrishnan
@ 2015-05-01  7:18 ` Ingo Molnar
  2015-05-01 14:50   ` Aravind Gopalakrishnan
  4 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2015-05-01  7:18 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tglx, mingo, hpa, tony.luck, bp, jiang.liu, yinghai, x86,
	dvlasenk, JBeulich, slaoub, luto, dave.hansen, oleg, rostedt,
	rusty, prarit, linux, jroedel, andriy.shevchenko, macro,
	wangnan0, linux-kernel, linux-edac


* Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> wrote:

> Newer AMD processors can generate deferred errors and can be configured
> to generate APIC interrupts on such events.

What's the wider context of this? What is it good for?

I suspect it's MCE related, but only from the diffstat:

>  arch/x86/kernel/cpu/mcheck/mce.c         |   1 +
>  arch/x86/kernel/cpu/mcheck/mce_amd.c     | 105 ++++++++++++++++++++++++++++++-

Please provide proper high level description for the changes.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-01  4:16     ` Aravind Gopalakrishnan
@ 2015-05-01  9:36       ` Borislav Petkov
  2015-05-01 14:50         ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2015-05-01  9:36 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Tony Luck, Jiang Liu, Yinghai Lu, X86 ML, Denys Vlasenko,
	Jan Beulich, slaoub, Dave Hansen, Oleg Nesterov, Steven Rostedt,
	Rusty Russell, Prarit Bhargava, Rasmus Villemoes, jroedel,
	Andy Shevchenko, Maciej W. Rozycki, Wang Nan, linux-kernel,
	linux-edac

On Thu, Apr 30, 2015 at 11:16:30PM -0500, Aravind Gopalakrishnan wrote:
> I used  the term as it is an interrupt due to the deferred error.
> Would 'deferred_err_interrupt' be more apt? Maybe 'irq_deferred_error_count'
> for the counter?

Yeah, I think it is important to stick to the "deferred error" naming
as those are interrupts announcing deferred errors and not deferred
interrupts.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit
  2015-04-30 14:49 ` [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit Aravind Gopalakrishnan
@ 2015-05-01 10:25   ` Borislav Petkov
  2015-05-01 14:54     ` Aravind Gopalakrishnan
  2015-05-01 15:09   ` Dave Hansen
  1 sibling, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2015-05-01 10:25 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac

On Thu, Apr 30, 2015 at 09:49:22AM -0500, Aravind Gopalakrishnan wrote:
> SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
> It indicates support for data poisoning in HW and deferred error
> interrupts.
> 
> Add new bitfield in mce_vendor_flags for this.
> We use this to verify prescence of deferred error interrupts
> before we enable them in mce_amd.c
> 
> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
>  arch/x86/include/asm/mce.h       | 3 ++-
>  arch/x86/kernel/cpu/mcheck/mce.c | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 1f5a86d..dfcb664 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -118,7 +118,8 @@ struct mca_config {
>  
>  struct mce_vendor_flags {
>  	__u64		overflow_recov	: 1, /* cpuid_ebx(80000007) */
> -			__reserved_0	: 63;
> +			succor		: 1,

Please add that CPUID bit definition from the commit message here too so
that we know what it means.

> +			__reserved_0	: 62;
>  };
>  extern struct mce_vendor_flags mce_flags;
>  
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index e535533..de61f62e 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1640,6 +1640,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
>  	case X86_VENDOR_AMD:
>  		mce_amd_feature_init(c);
>  		mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
> +		mce_flags.succor = (cpuid_ebx(0x80000007) & 0x2) ? 1 : 0;

		mce_flags.succor = !!(cpuid_ebx(0x80000007) & BIT(1));

is a common way of assigning truth values from bits in the kernel.

You can change the above one to use BIT(0) too, while at it, and
vertically align the assignments.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/4] Enable deferred error interrupts
  2015-05-01  7:18 ` [PATCH 0/4] Enable deferred error interrupts Ingo Molnar
@ 2015-05-01 14:50   ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-01 14:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: tglx, mingo, hpa, tony.luck, bp, jiang.liu, yinghai, x86,
	dvlasenk, JBeulich, slaoub, luto, dave.hansen, oleg, rostedt,
	rusty, prarit, linux, jroedel, andriy.shevchenko, macro,
	wangnan0, linux-kernel, linux-edac

On 5/1/2015 2:18 AM, Ingo Molnar wrote:
> * Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> wrote:
>
>> Newer AMD processors can generate deferred errors and can be configured
>> to generate APIC interrupts on such events.
> What's the wider context of this? What is it good for?
>
> I suspect it's MCE related, but only from the diffstat:

Deferred errors indicate error conditions that were not corrected, but 
require no action from S/W (or action is optional).
These errors provide info about a latent UC MCE that can occur when a 
poisoned data is consumed by the processor.

HTH,

I shall include the short description in the cover letter of V2.

Thanks,
-Aravind.


>>   arch/x86/kernel/cpu/mcheck/mce.c         |   1 +
>>   arch/x86/kernel/cpu/mcheck/mce_amd.c     | 105 ++++++++++++++++++++++++++++++-
> Please provide proper high level description for the changes.
>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-01  9:36       ` Borislav Petkov
@ 2015-05-01 14:50         ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-01 14:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Tony Luck, Jiang Liu, Yinghai Lu, X86 ML, Denys Vlasenko,
	Jan Beulich, slaoub, Dave Hansen, Oleg Nesterov, Steven Rostedt,
	Rusty Russell, Prarit Bhargava, Rasmus Villemoes, jroedel,
	Andy Shevchenko, Maciej W. Rozycki, Wang Nan, linux-kernel,
	linux-edac

On 5/1/2015 4:36 AM, Borislav Petkov wrote:
> On Thu, Apr 30, 2015 at 11:16:30PM -0500, Aravind Gopalakrishnan wrote:
>> I used  the term as it is an interrupt due to the deferred error.
>> Would 'deferred_err_interrupt' be more apt? Maybe 'irq_deferred_error_count'
>> for the counter?
> Yeah, I think it is important to stick to the "deferred error" naming
> as those are interrupts announcing deferred errors and not deferred
> interrupts.
>

Ok. I'll shall do this substitution in V2:
s/deferred_interrupt/deferred_error

and in mce_amd.c, s/def_int_vector/def_err_int_vector

unless you have any objections.

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit
  2015-05-01 10:25   ` Borislav Petkov
@ 2015-05-01 14:54     ` Aravind Gopalakrishnan
  2015-05-03  9:01       ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-01 14:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac

On 5/1/2015 5:25 AM, Borislav Petkov wrote:
> On Thu, Apr 30, 2015 at 09:49:22AM -0500, Aravind Gopalakrishnan wrote:
>> SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
>> It indicates support for data poisoning in HW and deferred error
>> interrupts.
>>
>>
>>   
>>   struct mce_vendor_flags {
>>   	__u64		overflow_recov	: 1, /* cpuid_ebx(80000007) */
>> -			__reserved_0	: 63;
>> +			succor		: 1,
> Please add that CPUID bit definition from the commit message here too so
> that we know what it means.

Will do.

Shall I beef up comment regarding 'overflow_recov' too?
Something like 'overflow recovery cpuid bit indicates that overflow 
conditions are not fatal'
would provide a better indication of the usage of the bit IMHO.

>>   	case X86_VENDOR_AMD:
>>   		mce_amd_feature_init(c);
>>   		mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
>> +		mce_flags.succor = (cpuid_ebx(0x80000007) & 0x2) ? 1 : 0;
> 		mce_flags.succor = !!(cpuid_ebx(0x80000007) & BIT(1));
>
> is a common way of assigning truth values from bits in the kernel.
>
> You can change the above one to use BIT(0) too, while at it, and
> vertically align the assignments.
>
>


Ok, Will do.

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit
  2015-04-30 14:49 ` [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit Aravind Gopalakrishnan
  2015-05-01 10:25   ` Borislav Petkov
@ 2015-05-01 15:09   ` Dave Hansen
  2015-05-01 16:20     ` Borislav Petkov
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2015-05-01 15:09 UTC (permalink / raw)
  To: Aravind Gopalakrishnan, tglx, mingo, hpa, tony.luck, bp,
	jiang.liu, yinghai
  Cc: x86, dvlasenk, JBeulich, slaoub, luto, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac

On 04/30/2015 07:49 AM, Aravind Gopalakrishnan wrote:
> @@ -1640,6 +1640,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
>  	case X86_VENDOR_AMD:
>  		mce_amd_feature_init(c);
>  		mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
> +		mce_flags.succor = (cpuid_ebx(0x80000007) & 0x2) ? 1 : 0;
>  		break;
>  	default:
>  		break;

Is there a reason to add the cpuid detection like this instead of adding
an X86_FEATURE_ for it and using cpu_has() and friends?  Doing that
would also let you see the bit in /proc/cpuinfo.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit
  2015-05-01 15:09   ` Dave Hansen
@ 2015-05-01 16:20     ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2015-05-01 16:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Aravind Gopalakrishnan, tglx, mingo, hpa, tony.luck, jiang.liu,
	yinghai, x86, dvlasenk, JBeulich, slaoub, luto, oleg, rostedt,
	rusty, prarit, linux, jroedel, andriy.shevchenko, macro,
	wangnan0, linux-kernel, linux-edac

On Fri, May 01, 2015 at 08:09:40AM -0700, Dave Hansen wrote:
> Is there a reason to add the cpuid detection like this instead of adding
> an X86_FEATURE_ for it and using cpu_has() and friends?  Doing that
> would also let you see the bit in /proc/cpuinfo.

Well, as those are RAS-specific, it didn't seem to make a whole lotta
sense to advertise them globally (they're used solely in the MCE code).

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit
  2015-05-01 14:54     ` Aravind Gopalakrishnan
@ 2015-05-03  9:01       ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2015-05-03  9:01 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac

On Fri, May 01, 2015 at 09:54:19AM -0500, Aravind Gopalakrishnan wrote:
> Shall I beef up comment regarding 'overflow_recov' too?
> Something like 'overflow recovery cpuid bit indicates that overflow
> conditions are not fatal'
> would provide a better indication of the usage of the bit IMHO.

Ok.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-04-30 14:49 ` [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler Aravind Gopalakrishnan
  2015-04-30 20:41   ` Andy Lutomirski
@ 2015-05-03  9:22   ` Borislav Petkov
  2015-05-04 15:29     ` Aravind Gopalakrishnan
  1 sibling, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2015-05-03  9:22 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac

On Thu, Apr 30, 2015 at 09:49:23AM -0500, Aravind Gopalakrishnan wrote:
> Changes introduced in the patch-
>   - Assign vector number 0xf4 for Deferred errors
>   - Declare deferred_interrupt, allocate gate and bind it
>     to DEFERRED_APIC_VECTOR.
>   - Declare smp_deferred_interrupt to be used as the
>     entry point for the interrupt in mce_amd.c
>   - Define trace_deferred_interrupt for tracing
>   - Enable deferred error interrupt selectively upon detection
>     of 'succor' bitfield
>   - Setup amd_deferred_error_interrupt() to handle the interrupt
>     and assign it to def_int_vector if feature is present in HW.
>     Else, let default handler deal with it.
>   - Provide Deferred error interrupt stats on
>     /proc/interrupts by incrementing irq_deferred_count

This commit message should explain the feature in more high-level way,
what is it good for and so on, not what you're adding.

That I can see. :-)

> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
>  arch/x86/include/asm/entry_arch.h        |   3 +
>  arch/x86/include/asm/hardirq.h           |   3 +
>  arch/x86/include/asm/hw_irq.h            |   2 +
>  arch/x86/include/asm/irq_vectors.h       |   1 +
>  arch/x86/include/asm/mce.h               |   3 +
>  arch/x86/include/asm/trace/irq_vectors.h |   6 ++
>  arch/x86/include/asm/traps.h             |   3 +-
>  arch/x86/kernel/cpu/mcheck/mce_amd.c     | 101 +++++++++++++++++++++++++++++++
>  arch/x86/kernel/entry_64.S               |   5 ++
>  arch/x86/kernel/irq.c                    |   6 ++
>  arch/x86/kernel/irqinit.c                |   4 ++
>  arch/x86/kernel/traps.c                  |   4 ++
>  12 files changed, 140 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> index dc5fa66..f7b957b 100644
> --- a/arch/x86/include/asm/entry_arch.h
> +++ b/arch/x86/include/asm/entry_arch.h
> @@ -50,4 +50,7 @@ BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
>  BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
>  #endif
>  
> +#ifdef CONFIG_X86_MCE_AMD
> +BUILD_INTERRUPT(deferred_interrupt, DEFERRED_APIC_VECTOR)

All the other names are written out so you can simply do

BUILD_INTERRUPT(deferred_error_interrupt, DEFERRED_ERROR_VECTOR)

> +#endif
>  #endif
> diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
> index 0f5fb6b..448451c 100644
> --- a/arch/x86/include/asm/hardirq.h
> +++ b/arch/x86/include/asm/hardirq.h
> @@ -33,6 +33,9 @@ typedef struct {
>  #ifdef CONFIG_X86_MCE_THRESHOLD
>  	unsigned int irq_threshold_count;
>  #endif
> +#ifdef CONFIG_X86_MCE_AMD
> +	unsigned int irq_deferred_count;

Right
	unsigned int irq_deferred_error_count;

> +#endif
>  #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
>  	unsigned int irq_hv_callback_count;
>  #endif
> diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> index e9571dd..7cf2670 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h

...

> +static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
> +{
> +	u32 low = 0, high = 0;
> +	int def_offset = -1, def_new;
> +
> +	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
> +		return;
> +
> +	def_new = (low & MASK_DEF_LVTOFF) >> 4;
> +	if (c->x86 == 0x15 && c->x86_model == 0x60 &&
> +	    !(low & MASK_DEF_LVTOFF)) {

What's the family check for? for BIOSes which don't set the LVT offset
to 2, as they should?

If so, we probably should say

	pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");

or similar...

> +		def_new = DEF_LVT_OFF;
> +		low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
> +	}
> +
> +	def_offset = setup_APIC_deferred_error(def_offset, def_new);
> +
> +	if ((def_offset == def_new) &&
> +	    (def_int_vector != amd_deferred_error_interrupt))
> +		def_int_vector = amd_deferred_error_interrupt;
> +
> +	low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
> +	wrmsr(MSR_CU_DEF_ERR, low, high);
> +}

...

> +/* Apic interrupt handler for deferred errors */
> +static void amd_deferred_error_interrupt(void)
> +{
> +	u64 status;
> +	unsigned int bank;
> +	struct mce m;
> +
> +	for (bank = 0; bank < mca_cfg.banks; ++bank) {
> +		rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
> +
> +		if (!(status & MCI_STATUS_VAL) ||
> +		    !(status & MCI_STATUS_DEFERRED))
> +			continue;
> +
> +		mce_setup(&m);
> +		m.bank = bank;
> +		m.status = status;
> +		mce_log(&m);
> +		wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
> +		break;
> +	}

That's very similar to what we do in the end of
amd_threshold_interrupt(). You could add a generic __log_error() static
helper in a pre-patch and then call it here.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-03  9:22   ` Borislav Petkov
@ 2015-05-04 15:29     ` Aravind Gopalakrishnan
  2015-05-04 15:46       ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-04 15:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac

On 5/3/2015 4:22 AM, Borislav Petkov wrote:
> On Thu, Apr 30, 2015 at 09:49:23AM -0500, Aravind Gopalakrishnan wrote:
>> Changes introduced in the patch-
>>    - Assign vector number 0xf4 for Deferred errors
>>    - Declare deferred_interrupt, allocate gate and bind it
>>      to DEFERRED_APIC_VECTOR.
>>    - Declare smp_deferred_interrupt to be used as the
>>      entry point for the interrupt in mce_amd.c
>>    - Define trace_deferred_interrupt for tracing
>>    - Enable deferred error interrupt selectively upon detection
>>      of 'succor' bitfield
>>    - Setup amd_deferred_error_interrupt() to handle the interrupt
>>      and assign it to def_int_vector if feature is present in HW.
>>      Else, let default handler deal with it.
>>    - Provide Deferred error interrupt stats on
>>      /proc/interrupts by incrementing irq_deferred_count
> This commit message should explain the feature in more high-level way,
> what is it good for and so on, not what you're adding.
>
> That I can see. :-)

Okay, I'll include a short description of deferred errors here for V2.

>> +#endif
>>   #endif
>> diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
>> index 0f5fb6b..448451c 100644
>> --- a/arch/x86/include/asm/hardirq.h
>> +++ b/arch/x86/include/asm/hardirq.h
>> @@ -33,6 +33,9 @@ typedef struct {
>>   #ifdef CONFIG_X86_MCE_THRESHOLD
>>   	unsigned int irq_threshold_count;
>>   #endif
>> +#ifdef CONFIG_X86_MCE_AMD
>> +	unsigned int irq_deferred_count;
> Right
> 	unsigned int irq_deferred_error_count;

Ack.

>> +static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
>> +{
>> +	u32 low = 0, high = 0;
>> +	int def_offset = -1, def_new;
>> +
>> +	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
>> +		return;
>> +
>> +	def_new = (low & MASK_DEF_LVTOFF) >> 4;
>> +	if (c->x86 == 0x15 && c->x86_model == 0x60 &&
>> +	    !(low & MASK_DEF_LVTOFF)) {
> What's the family check for? for BIOSes which don't set the LVT offset
> to 2, as they should?
>
> If so, we probably should say
>
> 	pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
>
> or similar...

Yeah. I meant to provide a comment at least for this.
Forgot to do that.

I'll print out a error message as you suggested (considering we do this 
in other places like threshold setup or IBS setup..)

>> +/* Apic interrupt handler for deferred errors */
>> +static void amd_deferred_error_interrupt(void)
>> +{
>> +	u64 status;
>> +	unsigned int bank;
>> +	struct mce m;
>> +
>> +	for (bank = 0; bank < mca_cfg.banks; ++bank) {
>> +		rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
>> +
>> +		if (!(status & MCI_STATUS_VAL) ||
>> +		    !(status & MCI_STATUS_DEFERRED))
>> +			continue;
>> +
>> +		mce_setup(&m);
>> +		m.bank = bank;
>> +		m.status = status;
>> +		mce_log(&m);
>> +		wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
>> +		break;
>> +	}
> That's very similar to what we do in the end of
> amd_threshold_interrupt(). You could add a generic __log_error() static
> helper in a pre-patch and then call it here.
>

Right. I think a __log_error() is a good idea.
Except, in amd_threshold_interrupt(), we have-
m.misc = ((u64)high << 32) | low;

which, is actually useless as we don't use m.misc anywhere in 
amd_decode_mce() or anywhere else in the decoding pipeline AFAICT.
We only print out if 'misc' is valid and we only need status bits for that-
((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),

But, more importantly, we don't setup 'm.addr' here (in 
amd_threshold_interrupt() or in amd_deferred_error_interrupt())
Which means anytime we pass an error to be decoded from the interrupt 
handlers, we don't get any info about the error address.

So, we can do one of these-
1. Remove m.misc setup in amd_threshold_interrupt() and 
rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr) before we call mce_log()
2. Since we have mce_read_aux() that reads misc and addr registers, we 
can move the mce_[rd|wr]msrl wrappers and mce_read_aux() into mce.h and 
use it here in mce_amd.c

Thoughts?

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-04 15:29     ` Aravind Gopalakrishnan
@ 2015-05-04 15:46       ` Borislav Petkov
  2015-05-04 17:08         ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2015-05-04 15:46 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac, Robert Richter

On Mon, May 04, 2015 at 10:29:50AM -0500, Aravind Gopalakrishnan wrote:
> >What's the family check for? for BIOSes which don't set the LVT offset
> >to 2, as they should?
> >
> >If so, we probably should say
> >
> >	pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
> >
> >or similar...
> 
> Yeah. I meant to provide a comment at least for this.
> Forgot to do that.
> 
> I'll print out a error message as you suggested (considering we do this in
> other places like threshold setup or IBS setup..)

lvt_off_valid() does that already. Adding Robert.

> Right. I think a __log_error() is a good idea.
> Except, in amd_threshold_interrupt(), we have-
> m.misc = ((u64)high << 32) | low;
> 
> which, is actually useless as we don't use m.misc anywhere in
> amd_decode_mce() or anywhere else in the decoding pipeline AFAICT.
> We only print out if 'misc' is valid and we only need status bits for that-
> ((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),
> 
> But, more importantly, we don't setup 'm.addr' here (in
> amd_threshold_interrupt() or in amd_deferred_error_interrupt())
> Which means anytime we pass an error to be decoded from the interrupt
> handlers, we don't get any info about the error address.

So what are we reporting with a deferred error if it is not a
full-fledged MCE? We better fix that otherwise we probably shouldn't
even report those. I mean, userspace is supposed to do some error
handling based on error info but if that info's missing, we might just
as well panic right then and there, right?

> So, we can do one of these-
> 1. Remove m.misc setup in amd_threshold_interrupt() and
> rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr) before we call mce_log()
> 2. Since we have mce_read_aux() that reads misc and addr registers, we can
> move the mce_[rd|wr]msrl wrappers and mce_read_aux() into mce.h and use it
> here in mce_amd.c
> 
> Thoughts?

Makes sense but you need to first check though, which registers are
valid in the hw when a threshold/deferred error happens and collect
them. Only then we can do proper recovery.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-04 15:46       ` Borislav Petkov
@ 2015-05-04 17:08         ` Aravind Gopalakrishnan
  2015-05-04 18:46           ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-04 17:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac, Robert Richter

On 5/4/2015 10:46 AM, Borislav Petkov wrote:
> On Mon, May 04, 2015 at 10:29:50AM -0500, Aravind Gopalakrishnan wrote:
>>> What's the family check for? for BIOSes which don't set the LVT offset
>>> to 2, as they should?
>>>
>>> If so, we probably should say
>>>
>>> 	pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
>>>
>>> or similar...
>> Yeah. I meant to provide a comment at least for this.
>> Forgot to do that.
>>
>> I'll print out a error message as you suggested (considering we do this in
>> other places like threshold setup or IBS setup..)
> lvt_off_valid() does that already. Adding Robert.

Not sure if lvt_off_valid() can be reused for deferred error interrupt 
setup.
It expects some some of  info to be in struct threshold_block which is 
fine for threshold errors and the shifts for offset are different too.

For deferred errors, the workaround is a little different as it applies 
to only the given family/model right now.
If the workaround needs to be applied for future processors, we can 
extend the family check for those right?


>> Right. I think a __log_error() is a good idea.
>> Except, in amd_threshold_interrupt(), we have-
>> m.misc = ((u64)high << 32) | low;
>>
>> which, is actually useless as we don't use m.misc anywhere in
>> amd_decode_mce() or anywhere else in the decoding pipeline AFAICT.
>> We only print out if 'misc' is valid and we only need status bits for that-
>> ((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),
>>
>> But, more importantly, we don't setup 'm.addr' here (in
>> amd_threshold_interrupt() or in amd_deferred_error_interrupt())
>> Which means anytime we pass an error to be decoded from the interrupt
>> handlers, we don't get any info about the error address.
> So what are we reporting with a deferred error if it is not a
> full-fledged MCE? We better fix that otherwise we probably shouldn't
> even report those. I mean, userspace is supposed to do some error
> handling based on error info but if that info's missing, we might just
> as well panic right then and there, right?

Oh no.. It is a proper MCE.
I was simply saying that when we look at dmesg logs after an error 
happens, we would not see any useful info regarding the error address.
Here's an example-
[ 1314.651485] [Hardware Error]: Deferred error.
[ 1314.651611] [Hardware Error]: CPU:0 (15:60:0) 
MC4_STATUS[Over|CE|MiscV|-|AddrV|Deferred|-|UECC]: 0xdc04b00005080813
[ 1314.651898] [Hardware Error]: MC4 Error Address: 0x0000000000000000
                             ^^^^^^^^^^^^^^^^^^^^

The Error Address will always be logged as 0x0 as m->addr in 
amd_decode_mce() is 0x0.
If we setup 'm.addr' in amd_threshold_interrupt() and 
amd_deferred_error_interrupt() properly, then amd_decode_mce() would 
actually have
some value in m->addr to report.

I didn't mean to say HW doesn't provide us the information in the addr 
and/or the misc registers.

>> So, we can do one of these-
>> 1. Remove m.misc setup in amd_threshold_interrupt() and
>> rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr) before we call mce_log()
>> 2. Since we have mce_read_aux() that reads misc and addr registers, we can
>> move the mce_[rd|wr]msrl wrappers and mce_read_aux() into mce.h and use it
>> here in mce_amd.c
>>
>> Thoughts?
> Makes sense but you need to first check though, which registers are
> valid in the hw when a threshold/deferred error happens and collect
> them. Only then we can do proper recovery.
>
>

The addr, misc registers are still valid for threshold, deferred errors.
(Of course, misc is valid only if m->status & MCI_STATUS_MISCV)

My point was, in __log_error(), we can read relevant status and addr 
MSRs to be passed to mce_log() as those are the only pieces of 
information we use in the decoding chain; and discard the m.misc 
assignment we do for threshold errors.

If userspace tools absolutely need 'misc' info too, we can go with 
option (2) as mentioned above..

Thanks,
-Aravind.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-04 17:08         ` Aravind Gopalakrishnan
@ 2015-05-04 18:46           ` Borislav Petkov
  2015-05-04 19:06             ` Aravind Gopalakrishnan
  2015-05-05 18:39             ` Aravind Gopalakrishnan
  0 siblings, 2 replies; 26+ messages in thread
From: Borislav Petkov @ 2015-05-04 18:46 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac, Robert Richter

On Mon, May 04, 2015 at 12:08:16PM -0500, Aravind Gopalakrishnan wrote:
> Not sure if lvt_off_valid() can be reused for deferred error interrupt
> setup. It expects some some of info to be in struct threshold_block
> which is fine for threshold errors and the shifts for offset are
> different too.

I meant that thresholding and IBS is being taken care of by that
function.

> For deferred errors, the workaround is a little different as it
> applies to only the given family/model right now. If the workaround
> needs to be applied for future processors, we can extend the family
> check for those right?

Or, you can do the check for all families as we're behind a CPUID bit
anyway. This is why CPUID bits are a good thing :-)

> If we setup 'm.addr' in amd_threshold_interrupt() and
> amd_deferred_error_interrupt() properly, then amd_decode_mce() would
> actually have some value in m->addr to report.
>
> I didn't mean to say HW doesn't provide us the information in the addr
> and/or the misc registers.

So you can use mce_read_aux(), yeah, you can move it to mce-internal.h

> The addr, misc registers are still valid for threshold, deferred errors.
> (Of course, misc is valid only if m->status & MCI_STATUS_MISCV)
> 
> My point was, in __log_error(), we can read relevant status and addr MSRs to
> be passed to mce_log() as those are the only pieces of information we use in
> the decoding chain; and discard the m.misc assignment we do for threshold
> errors.

But MCx_MISC is important for thresholding errors, it carries the ErrCnt
and stuff.

So you can pass a parameter to __log_error(..., threshold=true, misc)
and do

	if (threshold)
		m.misc = misc;

Right?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-04 18:46           ` Borislav Petkov
@ 2015-05-04 19:06             ` Aravind Gopalakrishnan
  2015-05-04 19:14               ` Borislav Petkov
  2015-05-05 18:39             ` Aravind Gopalakrishnan
  1 sibling, 1 reply; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-04 19:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac, Robert Richter

On 5/4/2015 1:46 PM, Borislav Petkov wrote:
>> For deferred errors, the workaround is a little different as it
>> applies to only the given family/model right now. If the workaround
>> needs to be applied for future processors, we can extend the family
>> check for those right?
> Or, you can do the check for all families as we're behind a CPUID bit
> anyway. This is why CPUID bits are a good thing :-)

Yep. Ok, Will do that.

>> If we setup 'm.addr' in amd_threshold_interrupt() and
>> amd_deferred_error_interrupt() properly, then amd_decode_mce() would
>> actually have some value in m->addr to report.
>>
>> I didn't mean to say HW doesn't provide us the information in the addr
>> and/or the misc registers.
> So you can use mce_read_aux(), yeah, you can move it to mce-internal.h


Ok, will do.
Is it ok to grow another patch in a V2 for this instead of fixing it in 
this patch since it's a real bug?
That should be helpful when someone wants to look up git logs of why 
this was done..

>> The addr, misc registers are still valid for threshold, deferred errors.
>> (Of course, misc is valid only if m->status & MCI_STATUS_MISCV)
>>
>> My point was, in __log_error(), we can read relevant status and addr MSRs to
>> be passed to mce_log() as those are the only pieces of information we use in
>> the decoding chain; and discard the m.misc assignment we do for threshold
>> errors.
> But MCx_MISC is important for thresholding errors, it carries the ErrCnt
> and stuff.
>
> So you can pass a parameter to __log_error(..., threshold=true, misc)
> and do
>
> 	if (threshold)
> 		m.misc = misc;
>
> Right?
>

Yeah, just wanted to keep __log_error() as generic as possible and not 
special case for threshold.
But ok, since MCx_MISC is needed, I'll work it up as you suggested.

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-04 19:06             ` Aravind Gopalakrishnan
@ 2015-05-04 19:14               ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2015-05-04 19:14 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac, Robert Richter

On Mon, May 04, 2015 at 02:06:43PM -0500, Aravind Gopalakrishnan wrote:
> Is it ok to grow another patch in a V2 for this instead of fixing
> it in this patch since it's a real bug? That should be helpful when
> someone wants to look up git logs of why this was done..

Yes, a prepatch please.

> Yeah, just wanted to keep __log_error() as generic as possible and not
> special case for threshold.

Not important as it is going to be used in mce_amd.c only anyway. It's
main goal is to avoid code duplication - nothing else.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-04 18:46           ` Borislav Petkov
  2015-05-04 19:06             ` Aravind Gopalakrishnan
@ 2015-05-05 18:39             ` Aravind Gopalakrishnan
  2015-05-05 20:28               ` Luck, Tony
  1 sibling, 1 reply; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-05 18:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, mingo, hpa, tony.luck, jiang.liu, yinghai, x86, dvlasenk,
	JBeulich, slaoub, luto, dave.hansen, oleg, rostedt, rusty,
	prarit, linux, jroedel, andriy.shevchenko, macro, wangnan0,
	linux-kernel, linux-edac, Robert Richter

On 5/4/2015 1:46 PM, Borislav Petkov wrote:
> So you can use mce_read_aux(), yeah, you can move it to mce-internal.h 

Re-using mce_read_aux() was not as trivial as I initially thought.
The MISC address value we read in amd_threshold_interrupt() could also 
be the value
in MSR0xc0000408 or MSR0xc0000409 (for a bank == 4 case). But in 
mce_read_aux(), we will only
look at MSR_IA32_MCx_MISC(i) (which is 0x413 for bank = 4)

So, instead of mucking around with mce_read_aux(), I am reusing the 
'misc' value from amd_threshold_interrupt()
and just adding rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr)

> So you can pass a parameter to __log_error(..., threshold=true, misc)
> and do
>
> 	if (threshold)
> 		m.misc = misc;
>

Here's how I have it currently-
static void __log_error(unsigned int bank, bool is_thr, u64 misc)
{
         struct mce m;

         mce_setup(&m);
         rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
         if (!(m.status & MCI_STATUS_VAL))
                 return;

         if (is_thr)
                 m.misc = misc;

         m.bank = bank;
         rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
         mce_log(&m);

         wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
}

and works fine..

Before patch:

     [76916.275587] [Hardware Error]: Corrected error, no action required.
     [76916.279576] [Hardware Error]: CPU:0 (15:60:0) 
MC0_STATUS[-|CE|-|-|AddrV|-|-|CECC]: 0x840041000028017b
     [76916.279576] [Hardware Error]: MC0 Error Address: 0x0000000000000000

Corrected error output:
     [  102.623490] [Hardware Error]: Corrected error, no action required.
     [  102.623668] [Hardware Error]: CPU:0 (15:60:0) 
MC0_STATUS[-|CE|-|-|AddrV|-|-|CECC]: 0x840041000028017b
     [  102.623930] [Hardware Error]: MC0 Error Address: 0x00001f808f0ff040

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-05 18:39             ` Aravind Gopalakrishnan
@ 2015-05-05 20:28               ` Luck, Tony
  2015-05-05 20:33                 ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 26+ messages in thread
From: Luck, Tony @ 2015-05-05 20:28 UTC (permalink / raw)
  To: Aravind Gopalakrishnan, Borislav Petkov
  Cc: tglx, mingo, hpa, jiang.liu, yinghai, x86, dvlasenk, JBeulich,
	slaoub, luto, dave.hansen, oleg, rostedt, rusty, prarit, linux,
	jroedel, andriy.shevchenko, macro, wangnan0, linux-kernel,
	linux-edac, Robert Richter

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 393 bytes --]

Should you check whether the address is valid before blindly reading the register?

>             m.bank = bank;
	if (m.status & MCI_STATUS_ADDRV)
         		rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
>             mce_log(&m);

-Tony
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler
  2015-05-05 20:28               ` Luck, Tony
@ 2015-05-05 20:33                 ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 26+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-05 20:33 UTC (permalink / raw)
  To: Luck, Tony, Borislav Petkov
  Cc: tglx, mingo, hpa, jiang.liu, yinghai, x86, dvlasenk, JBeulich,
	slaoub, luto, dave.hansen, oleg, rostedt, rusty, prarit, linux,
	jroedel, andriy.shevchenko, macro, wangnan0, linux-kernel,
	linux-edac, Robert Richter

On 5/5/2015 3:28 PM, Luck, Tony wrote:
> Should you check whether the address is valid before blindly reading the register?
>
>>              m.bank = bank;
> 	if (m.status & MCI_STATUS_ADDRV)
>           		rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
>>              mce_log(&m);

Yes, missed that when I sent the snippet.
Fixed it:)

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-05-05 20:34 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-30 14:49 [PATCH 0/4] Enable deferred error interrupts Aravind Gopalakrishnan
2015-04-30 14:49 ` [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit Aravind Gopalakrishnan
2015-05-01 10:25   ` Borislav Petkov
2015-05-01 14:54     ` Aravind Gopalakrishnan
2015-05-03  9:01       ` Borislav Petkov
2015-05-01 15:09   ` Dave Hansen
2015-05-01 16:20     ` Borislav Petkov
2015-04-30 14:49 ` [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler Aravind Gopalakrishnan
2015-04-30 20:41   ` Andy Lutomirski
2015-05-01  4:16     ` Aravind Gopalakrishnan
2015-05-01  9:36       ` Borislav Petkov
2015-05-01 14:50         ` Aravind Gopalakrishnan
2015-05-03  9:22   ` Borislav Petkov
2015-05-04 15:29     ` Aravind Gopalakrishnan
2015-05-04 15:46       ` Borislav Petkov
2015-05-04 17:08         ` Aravind Gopalakrishnan
2015-05-04 18:46           ` Borislav Petkov
2015-05-04 19:06             ` Aravind Gopalakrishnan
2015-05-04 19:14               ` Borislav Petkov
2015-05-05 18:39             ` Aravind Gopalakrishnan
2015-05-05 20:28               ` Luck, Tony
2015-05-05 20:33                 ` Aravind Gopalakrishnan
2015-04-30 14:49 ` [PATCH 3/4] x86, irq: Cleanup ordering of vector numbers Aravind Gopalakrishnan
2015-04-30 14:49 ` [PATCH 4/4] x86/mce/amd: Rename setup_APIC_mce Aravind Gopalakrishnan
2015-05-01  7:18 ` [PATCH 0/4] Enable deferred error interrupts Ingo Molnar
2015-05-01 14:50   ` Aravind Gopalakrishnan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.