All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 1/2] apic: Fix error interrupt report at all APs
@ 2011-04-14  6:36 Youquan Song
  2011-04-14  6:36 ` [PATCH v4 2/2] apic: Add print error interrupt reason Youquan Song
  2011-04-19 17:01 ` [PATCH v4 1/2] apic: Fix error interrupt report at all APs Ingo Molnar
  0 siblings, 2 replies; 11+ messages in thread
From: Youquan Song @ 2011-04-14  6:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, mingo, tglx, hpa, hpa, suresh.b.siddha, yong.y.wang, joe,
	jbaron, trenn, kent.liu, chaohong.guo, Youquan Song,
	Youquan Song

Recently, customer report that once machine boot, there are many error interrupt
reported with exact number of all APs. 

The root cause is Local APIC will generate error interrupt when it detect
the illegal vector (one in 0 ~ 15) in an interrupt message received or
interrupt generate from local vector table or self IPI. SDM3A.chapter 10.

AP LAPIC thermal sensor register will be reset to 0x10000, if thermal throttling
interrupt take over by BIOS, it need restore AP with the thermal sensor register
value of geting from BSP, otherwise cause system issue. If BIOS does not take
over the thermal interrupt, The restore value will be CPU rest value of 0x10000,
which means the interrupt vector is zero. After writing 0x10000 to thermal
sensor LVT, the processor will recieve the error interrupt report if the APIC
error interrupt is also set.

This patch add check the BIOS whether take over the thermal interrupt by look
at interrupt delivery mode not fixed mode(BIOS handle will be SMI mode) before
restore AP's thermal LVT. So the agony noise of error interrupt will dismiss
when boot on machine that BIOS does not handle thermal interrupt..  


Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
---
 arch/x86/include/asm/apicdef.h           |    1 +
 arch/x86/kernel/cpu/mcheck/therm_throt.c |   12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index d87988b..34595d5 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -78,6 +78,7 @@
 #define		APIC_DEST_LOGICAL	0x00800
 #define		APIC_DEST_PHYSICAL	0x00000
 #define		APIC_DM_FIXED		0x00000
+#define		APIC_DM_FIXED_MASK	0x00700
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 6f8c5e9..22c212a 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -446,18 +446,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 
+	h = lvtthmr_init;
 	/*
 	 * The initial value of thermal LVT entries on all APs always reads
 	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
 	 * sequence to them and LVT registers are reset to 0s except for
 	 * the mask bits which are set to 1s when APs receive INIT IPI.
-	 * Always restore the value that BIOS has programmed on AP based on
-	 * BSP's info we saved since BIOS is always setting the same value
-	 * for all threads/cores
+	 * If BIOS take over the thermal interrupt and set its interrupt
+	 * delivery mode to SMI not fixed, it restore the value that BIOS has
+	 * programmed on AP based on BSP's info we saved since BIOS is always
+	 * setting the same value for all threads/cores.
 	 */
-	apic_write(APIC_LVTTHMR, lvtthmr_init);
+	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+		apic_write(APIC_LVTTHMR, lvtthmr_init);
 
-	h = lvtthmr_init;
 
 	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread
* [PATCH v4 1/2] apic: Fix error interrupt report at all APs
@ 2011-04-06 12:20 Youquan Song
  2011-04-06 12:20 ` [PATCH v4 2/2] apic: Add print error interrupt reason Youquan Song
  0 siblings, 1 reply; 11+ messages in thread
From: Youquan Song @ 2011-04-06 12:20 UTC (permalink / raw)
  To: linux-kernel, hpa
  Cc: suresh.b.siddha, yong.y.wang, joe, jbaron, trenn, kent.liu,
	chaohong.guo, Youquan Song, Youquan Song

Recently, customer report that once machine boot, there are many error interrupt
reported with exact number of all APs. 

The root cause is Local APIC will generate error interrupt when it detect
the illegal vector (one in 0 ~ 15) in an interrupt message received or
interrupt generate from local vector table or self IPI. SDM3A.chapter 10.

AP LAPIC thermal sensor register will be reset to 0x10000, if thermal throttling
interrupt take over by BIOS, it need restore AP with the thermal sensor register
value of geting from BSP, otherwise cause system issue. If BIOS does not take
over the thermal interrupt, The restore value will be CPU rest value of 0x10000,
which means the interrupt vector is zero. After writing 0x10000 to thermal
sensor LVT, the processor will recieve the error interrupt report if the APIC
error interrupt is also set.

This patch add check the BIOS whether take over the thermal interrupt by look
at interrupt delivery mode not fixed mode(BIOS handle will be SMI mode) before
restore AP's thermal LVT. So the agony noise of error interrupt will dismiss
when boot on machine that BIOS does not handle thermal interrupt..  


Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
---
 arch/x86/include/asm/apicdef.h           |    1 +
 arch/x86/kernel/cpu/mcheck/therm_throt.c |   12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index d87988b..34595d5 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -78,6 +78,7 @@
 #define		APIC_DEST_LOGICAL	0x00800
 #define		APIC_DEST_PHYSICAL	0x00000
 #define		APIC_DM_FIXED		0x00000
+#define		APIC_DM_FIXED_MASK	0x00700
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 6f8c5e9..22c212a 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -446,18 +446,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 
+	h = lvtthmr_init;
 	/*
 	 * The initial value of thermal LVT entries on all APs always reads
 	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
 	 * sequence to them and LVT registers are reset to 0s except for
 	 * the mask bits which are set to 1s when APs receive INIT IPI.
-	 * Always restore the value that BIOS has programmed on AP based on
-	 * BSP's info we saved since BIOS is always setting the same value
-	 * for all threads/cores
+	 * If BIOS take over the thermal interrupt and set its interrupt
+	 * delivery mode to SMI not fixed, it restore the value that BIOS has
+	 * programmed on AP based on BSP's info we saved since BIOS is always
+	 * setting the same value for all threads/cores.
 	 */
-	apic_write(APIC_LVTTHMR, lvtthmr_init);
+	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+		apic_write(APIC_LVTTHMR, lvtthmr_init);
 
-	h = lvtthmr_init;
 
 	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread
* [PATCH v4 2/2] apic: Add print error interrupt reason
@ 2011-01-08 16:01 Youquan Song
  0 siblings, 0 replies; 11+ messages in thread
From: Youquan Song @ 2011-01-08 16:01 UTC (permalink / raw)
  To: linux-kernel, hpa, suresh.b.siddha
  Cc: yong.y.wang, joe, jbaron, arjan, trenn, kent.liu, chaohong.guo,
	Youquan Song, Youquan Song

End user worry about the error interrupt information and intend to know what
kind of error interrupts are generated, so this patch add printing out the
detail debug information of error interrupt. 
dynamic debug is not initiated when LAPIC initiation, so the pr_debug will not
output the error interrupt debug information when boot.
In this patch, we use apic_printk(APIC_DEBUG,), so if add kernel option
apic=debug will output the error interrupt during boot.

Thanks for help from Joe Perches <joe@perches.com> about printk format.

Signed-off-by: Youquan Song <youquan.song@intel.com>
---
 arch/x86/kernel/apic/apic.c |   38 +++++++++++++++++++++++++-------------
 1 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 3f838d5..5ed54cd 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1800,6 +1800,17 @@ void smp_spurious_interrupt(struct pt_regs *regs)
 void smp_error_interrupt(struct pt_regs *regs)
 {
 	u32 v, v1;
+	u32 i = 0;
+	static const char * const error_interrupt_reason[] = {
+		"Send CS error",		/* APIC Error Bit 0 */
+		"Receive CS error",		/* APIC Error Bit 1 */
+		"Send accept error",		/* APIC Error Bit 2 */
+		"Receive accept error",		/* APIC Error Bit 3 */
+		"Redirectable IPI",		/* APIC Error Bit 4 */
+		"Send illegal vector",		/* APIC Error Bit 5 */
+		"Received illegal vector",	/* APIC Error Bit 6 */
+		"Illegal register address",	/* APIC Error Bit 7 */
+	};
 
 	exit_idle();
 	irq_enter();
@@ -1810,19 +1821,20 @@ void smp_error_interrupt(struct pt_regs *regs)
 	ack_APIC_irq();
 	atomic_inc(&irq_err_count);
 
-	/*
-	 * Here is what the APIC error bits mean:
-	 * 0: Send CS error
-	 * 1: Receive CS error
-	 * 2: Send accept error
-	 * 3: Receive accept error
-	 * 4: Reserved
-	 * 5: Send illegal vector
-	 * 6: Received illegal vector
-	 * 7: Illegal register address
-	 */
-	pr_debug("APIC error on CPU%d: %02x(%02x)\n",
-		smp_processor_id(), v , v1);
+	apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)",
+		    smp_processor_id(), v , v1);
+
+	v1 = v1 & 0xff;
+	while (v1) {
+		if (v1 & 0x1)
+			apic_printk(APIC_DEBUG, KERN_CONT " : %s",
+				    error_interrupt_reason[i]);
+		i++;
+		v1 >>= 1;
+	};
+
+	apic_printk(APIC_DEBUG, KERN_CONT "\n");
+
 	irq_exit();
 }
 
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-04-21 16:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-14  6:36 [PATCH v4 1/2] apic: Fix error interrupt report at all APs Youquan Song
2011-04-14  6:36 ` [PATCH v4 2/2] apic: Add print error interrupt reason Youquan Song
2011-04-14  7:54   ` Cyrill Gorcunov
2011-04-14  7:57     ` Cyrill Gorcunov
2011-04-19 17:48   ` [tip:x86/apic] x86, apic: Print verbose error interrupt reason on apic=debug tip-bot for Youquan Song
2011-04-19 17:01 ` [PATCH v4 1/2] apic: Fix error interrupt report at all APs Ingo Molnar
2011-04-22  3:12   ` Youquan Song
2011-04-21 15:27     ` Ingo Molnar
2011-04-22  4:34       ` Youquan Song
  -- strict thread matches above, loose matches on Subject: below --
2011-04-06 12:20 Youquan Song
2011-04-06 12:20 ` [PATCH v4 2/2] apic: Add print error interrupt reason Youquan Song
2011-01-08 16:01 Youquan Song

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.