* [PATCH v4 1/2] apic: Fix error interrupt report at all APs
@ 2011-04-21 16:22 Youquan Song
  2011-05-12  2:19 ` Youquan Song
  2011-05-16 16:49 ` [tip:x86/urgent] x86, apic: Fix spurious error interrupts triggering on all non-boot APs tip-bot for Youquan Song
  0 siblings, 2 replies; 9+ messages in thread
From: Youquan Song @ 2011-04-21 16:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, mingo, tglx, hpa, hpa, suresh.b.siddha, yong.y.wang, joe,
	jbaron, trenn, kent.liu, chaohong.guo, Youquan Song,
	Youquan Song

This patch fixes a bug reported by a customer, who saw many spurious error
interrupts reported on all APs during system boot.

According to Chapter 10 of the Intel Software Developer's Manual, Volume 3A,
the local APIC may signal an illegal vector error when an LVT entry is set to
an illegal vector value (0~15) in FIXED delivery mode (bits 8-10 are 0),
regardless of whether the mask bit is set or an interrupt actually happens.
These errors are seen as error interrupts.

The initial value of the thermal LVT entry on all APs always reads 0x10000
because the BSP wakes the APs with an INIT-SIPI-SIPI sequence, and when an AP
receives the INIT IPI its LVT registers are reset to 0s except for the mask
bits, which are set to 1s.  When the BIOS takes over the thermal throttling
interrupt, the LVT thermal delivery mode should be SMI, and the kernel must
restore the AP's LVT thermal monitor register to the value the BIOS programmed.

The issue occurs when the BIOS does not take over the thermal throttling
interrupt: the AP's LVT thermal monitor register is then restored to 0x10000,
which means vector 0 and fixed delivery mode, so every AP signals an illegal
vector error interrupt.  This patch restores the AP's LVT thermal monitor
register only if the saved delivery mode is not fixed mode.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
---
 arch/x86/include/asm/apicdef.h           |    1 +
 arch/x86/kernel/cpu/mcheck/therm_throt.c |   12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index d87988b..34595d5 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -78,6 +78,7 @@
 #define		APIC_DEST_LOGICAL	0x00800
 #define		APIC_DEST_PHYSICAL	0x00000
 #define		APIC_DM_FIXED		0x00000
+#define		APIC_DM_FIXED_MASK	0x00700
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 6f8c5e9..22c212a 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -446,18 +446,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 
+	h = lvtthmr_init;
 	/*
 	 * The initial value of thermal LVT entries on all APs always reads
 	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
 	 * sequence to them and LVT registers are reset to 0s except for
 	 * the mask bits which are set to 1s when APs receive INIT IPI.
-	 * Always restore the value that BIOS has programmed on AP based on
-	 * BSP's info we saved since BIOS is always setting the same value
-	 * for all threads/cores
+	 * If BIOS take over the thermal interrupt and set its interrupt
+	 * delivery mode to SMI not fixed, it restore the value that BIOS has
+	 * programmed on AP based on BSP's info we saved since BIOS is always
+	 * setting the same value for all threads/cores.
 	 */
-	apic_write(APIC_LVTTHMR, lvtthmr_init);
+	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+		apic_write(APIC_LVTTHMR, lvtthmr_init);
 
-	h = lvtthmr_init;
 
 	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG
-- 
1.6.4.2



* Re: [PATCH v4 1/2] apic: Fix error interrupt report at all APs
  2011-04-21 16:22 [PATCH v4 1/2] apic: Fix error interrupt report at all APs Youquan Song
@ 2011-05-12  2:19 ` Youquan Song
  2011-05-16 16:49 ` [tip:x86/urgent] x86, apic: Fix spurious error interrupts triggering on all non-boot APs tip-bot for Youquan Song
  1 sibling, 0 replies; 9+ messages in thread
From: Youquan Song @ 2011-05-12  2:19 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, akpm, mingo, tglx, hpa, hpa, suresh.b.siddha,
	yong.y.wang, joe, jbaron, trenn, kent.liu, chaohong.guo,
	Youquan Song

Hi Ingo,

Below is the updated description. If it looks good, please take it,
since the first patch in the series, "x86, apic: Print verbose error
interrupt reason on apic=debug", has already been accepted into -tip.

Thanks
-Youquan

On Fri, Apr 22, 2011 at 12:22:43AM +0800, Youquan Song wrote:
> This patch fixes a bug reported by a customer, who saw many spurious error
> interrupts reported on all APs during system boot.
> 
> According to Chapter 10 of the Intel Software Developer's Manual, Volume 3A,
> the local APIC may signal an illegal vector error when an LVT entry is set to
> an illegal vector value (0~15) in FIXED delivery mode (bits 8-10 are 0),
> regardless of whether the mask bit is set or an interrupt actually happens.
> These errors are seen as error interrupts.
> 
> The initial value of the thermal LVT entry on all APs always reads 0x10000
> because the BSP wakes the APs with an INIT-SIPI-SIPI sequence, and when an AP
> receives the INIT IPI its LVT registers are reset to 0s except for the mask
> bits, which are set to 1s.  When the BIOS takes over the thermal throttling
> interrupt, the LVT thermal delivery mode should be SMI, and the kernel must
> restore the AP's LVT thermal monitor register to the value the BIOS programmed.
> 
> The issue occurs when the BIOS does not take over the thermal throttling
> interrupt: the AP's LVT thermal monitor register is then restored to 0x10000,
> which means vector 0 and fixed delivery mode, so every AP signals an illegal
> vector error interrupt.  This patch restores the AP's LVT thermal monitor
> register only if the saved delivery mode is not fixed mode.
> 
> Signed-off-by: Youquan Song <youquan.song@intel.com>
> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
> Acked-by: Yong Wang <yong.y.wang@intel.com>
> ---
>  arch/x86/include/asm/apicdef.h           |    1 +
>  arch/x86/kernel/cpu/mcheck/therm_throt.c |   12 +++++++-----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
> index d87988b..34595d5 100644
> --- a/arch/x86/include/asm/apicdef.h
> +++ b/arch/x86/include/asm/apicdef.h
> @@ -78,6 +78,7 @@
>  #define		APIC_DEST_LOGICAL	0x00800
>  #define		APIC_DEST_PHYSICAL	0x00000
>  #define		APIC_DM_FIXED		0x00000
> +#define		APIC_DM_FIXED_MASK	0x00700
>  #define		APIC_DM_LOWEST		0x00100
>  #define		APIC_DM_SMI		0x00200
>  #define		APIC_DM_REMRD		0x00300
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 6f8c5e9..22c212a 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -446,18 +446,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
>  	 */
>  	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
>  
> +	h = lvtthmr_init;
>  	/*
>  	 * The initial value of thermal LVT entries on all APs always reads
>  	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
>  	 * sequence to them and LVT registers are reset to 0s except for
>  	 * the mask bits which are set to 1s when APs receive INIT IPI.
> -	 * Always restore the value that BIOS has programmed on AP based on
> -	 * BSP's info we saved since BIOS is always setting the same value
> -	 * for all threads/cores
> +	 * If BIOS take over the thermal interrupt and set its interrupt
> +	 * delivery mode to SMI not fixed, it restore the value that BIOS has
> +	 * programmed on AP based on BSP's info we saved since BIOS is always
> +	 * setting the same value for all threads/cores.
>  	 */
> -	apic_write(APIC_LVTTHMR, lvtthmr_init);
> +	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
> +		apic_write(APIC_LVTTHMR, lvtthmr_init);
>  
> -	h = lvtthmr_init;
>  
>  	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
>  		printk(KERN_DEBUG
> -- 
> 1.6.4.2
> 


* [tip:x86/urgent] x86, apic: Fix spurious error interrupts triggering on all non-boot APs
  2011-04-21 16:22 [PATCH v4 1/2] apic: Fix error interrupt report at all APs Youquan Song
  2011-05-12  2:19 ` Youquan Song
@ 2011-05-16 16:49 ` tip-bot for Youquan Song
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Youquan Song @ 2011-05-16 16:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, yong.y.wang, suresh.b.siddha, tglx,
	youquan.song, mingo

Commit-ID:  e503f9e4b092e2349a9477a333543de8f3c7f5d9
Gitweb:     http://git.kernel.org/tip/e503f9e4b092e2349a9477a333543de8f3c7f5d9
Author:     Youquan Song <youquan.song@intel.com>
AuthorDate: Fri, 22 Apr 2011 00:22:43 +0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 16 May 2011 13:48:25 +0200

x86, apic: Fix spurious error interrupts triggering on all non-boot APs

This patch fixes a bug reported by a customer, who found
that many spurious error interrupts were reported on all
non-boot CPUs (APs) during the system boot stage.

According to Chapter 10 of the Intel Software Developer's Manual,
Volume 3A, the local APIC may signal an illegal vector error when
an LVT entry is set to an illegal vector value (0~15) in
FIXED delivery mode (bits 8-10 are 0), regardless of whether
the mask bit is set or an interrupt actually happens. These
errors are seen as error interrupts.

The initial value of the thermal LVT entry on all APs always reads
0x10000 because APs are woken up by the BSP issuing an INIT-SIPI-SIPI
sequence to them, and LVT registers are reset to 0s except for
the mask bits, which are set to 1s when APs receive the INIT IPI.

When the BIOS takes over the thermal throttling interrupt,
the LVT thermal delivery mode should be SMI and the kernel is
required to keep the AP's LVT thermal monitoring register
programmed as such as well.

This issue happens when the BIOS does not take over the thermal
throttling interrupt: the AP's LVT thermal monitor register is then
restored to 0x10000, which means vector 0 and fixed delivery mode, so
all APs signal illegal vector error interrupts.

This patch checks that the interrupt delivery mode is not fixed mode
before restoring the AP's LVT thermal monitor register.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
Cc: hpa@linux.intel.com
Cc: joe@perches.com
Cc: jbaron@redhat.com
Cc: trenn@suse.de
Cc: kent.liu@intel.com
Cc: chaohong.guo@intel.com
Cc: <stable@kernel.org> # As far back as possible
Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/apicdef.h           |    1 +
 arch/x86/kernel/cpu/mcheck/therm_throt.c |   12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index d87988b..34595d5 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -78,6 +78,7 @@
 #define		APIC_DEST_LOGICAL	0x00800
 #define		APIC_DEST_PHYSICAL	0x00000
 #define		APIC_DM_FIXED		0x00000
+#define		APIC_DM_FIXED_MASK	0x00700
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 6f8c5e9..0f03446 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -446,18 +446,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 
+	h = lvtthmr_init;
 	/*
 	 * The initial value of thermal LVT entries on all APs always reads
 	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
 	 * sequence to them and LVT registers are reset to 0s except for
 	 * the mask bits which are set to 1s when APs receive INIT IPI.
-	 * Always restore the value that BIOS has programmed on AP based on
-	 * BSP's info we saved since BIOS is always setting the same value
-	 * for all threads/cores
+	 * If BIOS takes over the thermal interrupt and sets its interrupt
+	 * delivery mode to SMI (not fixed), it restores the value that the
+	 * BIOS has programmed on AP based on BSP's info we saved since BIOS
+	 * is always setting the same value for all threads/cores.
 	 */
-	apic_write(APIC_LVTTHMR, lvtthmr_init);
+	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+		apic_write(APIC_LVTTHMR, lvtthmr_init);
 
-	h = lvtthmr_init;
 
 	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG


* Re: [PATCH v4 1/2] apic: Fix error interrupt report at all APs
  2011-04-21 15:27     ` Ingo Molnar
@ 2011-04-22  4:34       ` Youquan Song
  0 siblings, 0 replies; 9+ messages in thread
From: Youquan Song @ 2011-04-22  4:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Youquan Song, Youquan Song, linux-kernel, akpm, tglx, hpa, hpa,
	suresh.b.siddha, yong.y.wang, joe, jbaron, trenn, kent.liu,
	chaohong.guo

> Mind resending the updated patch?

I have resent it.

Thanks
-Youquan
 


* Re: [PATCH v4 1/2] apic: Fix error interrupt report at all APs
  2011-04-19 17:01 ` Ingo Molnar
@ 2011-04-22  3:12   ` Youquan Song
  2011-04-21 15:27     ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Youquan Song @ 2011-04-22  3:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Youquan Song, linux-kernel, akpm, tglx, hpa, hpa,
	suresh.b.siddha, yong.y.wang, joe, jbaron, trenn, kent.liu,
	chaohong.guo, Youquan Song

> I don't disagree with this change, but unfortunately the changelog is in 
> absolutely unreadable English. Please fix it or find someone who can fix it for 
> you.
> 
> I decoded and fixed the changelog of the 2/2 patch of your series so no need to 
> do it for that patch.

Thanks a lot Ingo!

Here is the fixed changelog for patch 1/2, "Fix error interrupt report at all APs":

This patch fixes a bug reported by a customer, who saw many spurious error
interrupts reported on all APs during system boot.

According to Chapter 10 of the Intel Software Developer's Manual, Volume 3A,
the local APIC may signal an illegal vector error when an LVT entry is set to
an illegal vector value (0~15) in FIXED delivery mode (bits 8-10 are 0),
regardless of whether the mask bit is set or an interrupt actually happens.
These errors are seen as error interrupts.

The initial value of the thermal LVT entry on all APs always reads 0x10000
because the BSP wakes the APs with an INIT-SIPI-SIPI sequence, and when an AP
receives the INIT IPI its LVT registers are reset to 0s except for the mask
bits, which are set to 1s.  When the BIOS takes over the thermal throttling
interrupt, the LVT thermal delivery mode should be SMI, and the kernel must
restore the AP's LVT thermal monitor register to the value the BIOS programmed.

The issue occurs when the BIOS does not take over the thermal throttling
interrupt: the AP's LVT thermal monitor register is then restored to 0x10000,
which means vector 0 and fixed delivery mode, so every AP signals an illegal
vector error interrupt.  This patch restores the AP's LVT thermal monitor
register only if the saved delivery mode is not fixed mode.

-Youquan
 


* Re: [PATCH v4 1/2] apic: Fix error interrupt report at all APs
  2011-04-22  3:12   ` Youquan Song
@ 2011-04-21 15:27     ` Ingo Molnar
  2011-04-22  4:34       ` Youquan Song
  0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2011-04-21 15:27 UTC (permalink / raw)
  To: Youquan Song
  Cc: Youquan Song, linux-kernel, akpm, tglx, hpa, hpa,
	suresh.b.siddha, yong.y.wang, joe, jbaron, trenn, kent.liu,
	chaohong.guo


* Youquan Song <youquan.song@linux.intel.com> wrote:

> > I don't disagree with this change, but unfortunately the changelog is in 
> > absolutely unreadable English. Please fix it or find someone who can fix it for 
> > you.
> > 
> > I decoded and fixed the changelog of the 2/2 patch of your series so no need to 
> > do it for that patch.
> 
> Thanks a lot Ingo!
> 
> > Here is the fixed changelog for patch 1/2, "Fix error interrupt report at all APs":

Mind resending the updated patch?

Thanks,

	Ingo


* Re: [PATCH v4 1/2] apic: Fix error interrupt report at all APs
  2011-04-14  6:36 [PATCH v4 1/2] apic: Fix error interrupt report at all APs Youquan Song
@ 2011-04-19 17:01 ` Ingo Molnar
  2011-04-22  3:12   ` Youquan Song
  0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2011-04-19 17:01 UTC (permalink / raw)
  To: Youquan Song
  Cc: linux-kernel, akpm, tglx, hpa, hpa, suresh.b.siddha, yong.y.wang,
	joe, jbaron, trenn, kent.liu, chaohong.guo, Youquan Song


* Youquan Song <youquan.song@intel.com> wrote:

> Recently, customer report that once machine boot, there are many error interrupt
> reported with exact number of all APs. 
> 
> The root cause is Local APIC will generate error interrupt when it detect
> the illegal vector (one in 0 ~ 15) in an interrupt message received or
> interrupt generate from local vector table or self IPI. SDM3A.chapter 10.
> 
> AP LAPIC thermal sensor register will be reset to 0x10000, if thermal throttling
> interrupt take over by BIOS, it need restore AP with the thermal sensor register
> value of getting from BSP, otherwise cause system issue. If BIOS does not take
> over the thermal interrupt, the restore value will be CPU reset value of 0x10000,
> which means the interrupt vector is zero. After writing 0x10000 to thermal
> sensor LVT, the processor will receive the error interrupt report if the APIC
> error interrupt is also set.
> 
> This patch add check the BIOS whether take over the thermal interrupt by look
> at interrupt delivery mode not fixed mode(BIOS handle will be SMI mode) before
> restore AP's thermal LVT. So the agony noise of error interrupt will dismiss
> when boot on machine that BIOS does not handle thermal interrupt.
> 
> 
> Signed-off-by: Youquan Song <youquan.song@intel.com>
> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
> Acked-by: Yong Wang <yong.y.wang@intel.com>
> ---
>  arch/x86/include/asm/apicdef.h           |    1 +
>  arch/x86/kernel/cpu/mcheck/therm_throt.c |   12 +++++++-----
>  2 files changed, 8 insertions(+), 5 deletions(-)

I don't disagree with this change, but unfortunately the changelog is in 
absolutely unreadable English. Please fix it or find someone who can fix it for 
you.

I decoded and fixed the changelog of the 2/2 patch of your series so no need to 
do it for that patch.

Thanks,

	Ingo


* [PATCH v4 1/2] apic: Fix error interrupt report at all APs
@ 2011-04-14  6:36 Youquan Song
  2011-04-19 17:01 ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Youquan Song @ 2011-04-14  6:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, mingo, tglx, hpa, hpa, suresh.b.siddha, yong.y.wang, joe,
	jbaron, trenn, kent.liu, chaohong.guo, Youquan Song,
	Youquan Song

Recently, customer report that once machine boot, there are many error interrupt
reported with exact number of all APs. 

The root cause is Local APIC will generate error interrupt when it detect
the illegal vector (one in 0 ~ 15) in an interrupt message received or
interrupt generate from local vector table or self IPI. SDM3A.chapter 10.

AP LAPIC thermal sensor register will be reset to 0x10000, if thermal throttling
interrupt take over by BIOS, it need restore AP with the thermal sensor register
value of getting from BSP, otherwise cause system issue. If BIOS does not take
over the thermal interrupt, the restore value will be CPU reset value of 0x10000,
which means the interrupt vector is zero. After writing 0x10000 to thermal
sensor LVT, the processor will receive the error interrupt report if the APIC
error interrupt is also set.

This patch add check the BIOS whether take over the thermal interrupt by look
at interrupt delivery mode not fixed mode(BIOS handle will be SMI mode) before
restore AP's thermal LVT. So the agony noise of error interrupt will dismiss
when boot on machine that BIOS does not handle thermal interrupt.


Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
---
 arch/x86/include/asm/apicdef.h           |    1 +
 arch/x86/kernel/cpu/mcheck/therm_throt.c |   12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index d87988b..34595d5 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -78,6 +78,7 @@
 #define		APIC_DEST_LOGICAL	0x00800
 #define		APIC_DEST_PHYSICAL	0x00000
 #define		APIC_DM_FIXED		0x00000
+#define		APIC_DM_FIXED_MASK	0x00700
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 6f8c5e9..22c212a 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -446,18 +446,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 
+	h = lvtthmr_init;
 	/*
 	 * The initial value of thermal LVT entries on all APs always reads
 	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
 	 * sequence to them and LVT registers are reset to 0s except for
 	 * the mask bits which are set to 1s when APs receive INIT IPI.
-	 * Always restore the value that BIOS has programmed on AP based on
-	 * BSP's info we saved since BIOS is always setting the same value
-	 * for all threads/cores
+	 * If BIOS take over the thermal interrupt and set its interrupt
+	 * delivery mode to SMI not fixed, it restore the value that BIOS has
+	 * programmed on AP based on BSP's info we saved since BIOS is always
+	 * setting the same value for all threads/cores.
 	 */
-	apic_write(APIC_LVTTHMR, lvtthmr_init);
+	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+		apic_write(APIC_LVTTHMR, lvtthmr_init);
 
-	h = lvtthmr_init;
 
 	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG
-- 
1.6.4.2



* [PATCH v4 1/2] apic: Fix error interrupt report at all APs
@ 2011-04-06 12:20 Youquan Song
  0 siblings, 0 replies; 9+ messages in thread
From: Youquan Song @ 2011-04-06 12:20 UTC (permalink / raw)
  To: linux-kernel, hpa
  Cc: suresh.b.siddha, yong.y.wang, joe, jbaron, trenn, kent.liu,
	chaohong.guo, Youquan Song, Youquan Song

Recently, customer report that once machine boot, there are many error interrupt
reported with exact number of all APs. 

The root cause is Local APIC will generate error interrupt when it detect
the illegal vector (one in 0 ~ 15) in an interrupt message received or
interrupt generate from local vector table or self IPI. SDM3A.chapter 10.

AP LAPIC thermal sensor register will be reset to 0x10000, if thermal throttling
interrupt take over by BIOS, it need restore AP with the thermal sensor register
value of getting from BSP, otherwise cause system issue. If BIOS does not take
over the thermal interrupt, the restore value will be CPU reset value of 0x10000,
which means the interrupt vector is zero. After writing 0x10000 to thermal
sensor LVT, the processor will receive the error interrupt report if the APIC
error interrupt is also set.

This patch add check the BIOS whether take over the thermal interrupt by look
at interrupt delivery mode not fixed mode(BIOS handle will be SMI mode) before
restore AP's thermal LVT. So the agony noise of error interrupt will dismiss
when boot on machine that BIOS does not handle thermal interrupt.


Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
---
 arch/x86/include/asm/apicdef.h           |    1 +
 arch/x86/kernel/cpu/mcheck/therm_throt.c |   12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index d87988b..34595d5 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -78,6 +78,7 @@
 #define		APIC_DEST_LOGICAL	0x00800
 #define		APIC_DEST_PHYSICAL	0x00000
 #define		APIC_DM_FIXED		0x00000
+#define		APIC_DM_FIXED_MASK	0x00700
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 6f8c5e9..22c212a 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -446,18 +446,20 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 
+	h = lvtthmr_init;
 	/*
 	 * The initial value of thermal LVT entries on all APs always reads
 	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
 	 * sequence to them and LVT registers are reset to 0s except for
 	 * the mask bits which are set to 1s when APs receive INIT IPI.
-	 * Always restore the value that BIOS has programmed on AP based on
-	 * BSP's info we saved since BIOS is always setting the same value
-	 * for all threads/cores
+	 * If BIOS take over the thermal interrupt and set its interrupt
+	 * delivery mode to SMI not fixed, it restore the value that BIOS has
+	 * programmed on AP based on BSP's info we saved since BIOS is always
+	 * setting the same value for all threads/cores.
 	 */
-	apic_write(APIC_LVTTHMR, lvtthmr_init);
+	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+		apic_write(APIC_LVTTHMR, lvtthmr_init);
 
-	h = lvtthmr_init;
 
 	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG
-- 
1.6.4.2


