All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec
@ 2010-03-08 11:17 Thomas Renninger
  2010-03-08 11:26 ` Avi Kivity
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-03-08 11:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo, Avi Kivity,
	Thomas Renninger

From: Kerstin Jonsson <kerstin.jonsson@ericsson.com>

When the SMP kernel decides to crash_kexec() the local APICs may have
pending interrupts in their vector tables.
The setup routine for the local APIC has a deficient mechanism for
clearing these interrupts, it only handles interrupts that has already
been dispatched to the local core for servicing (the ISR register)
safely, it doesn't consider lower prioritized queued interrupts stored
in the IRR register.

If you have more than one pending interrupt within the same 32 bit word
in the LAPIC vector table registers you may find yourself entering the
IO APIC setup with pending interrupts left in the LAPIC. This is a
situation for wich the IO APIC setup is not prepared. Depending of
what/which interrupt vector/vectors are stuck in the APIC tables your
system may show various degrees of malfunctioning.
That was the reason why the check_timer() failed in our system, the
timer interrupts was blocked by pending interrupts from the old kernel
when routed trough the IO APIC.

Additional comment from Jiri Bohac:
==============
If this should go into stable release,
I'd add some kind of limit on the number of iterations, just to be safe from
hard to debug lock-ups:

+if (loops++  > MAX_LOOPS) {
+        printk("LAPIC pending clean-up")
+        break;
+}
 while (queued);

with MAX_LOOPS something like 1E9 this would leave plenty of time for the
pending IRQs to be cleared and would and still cause at most a second of delay
if the loop were to lock-up for whatever reason.
==============

>From trenn@suse.de:
V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
    calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>)
    If no tsc is available bail out quickly after cpu_khz, if we broke out too
    early and still have irqs pending (which should never happen?) we still
    get a WARN_ON...


CC: jbohac@novell.com
CC: "Yinghai Lu" <yinghai@kernel.org>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
CC: "Avi Kivity" <avi@redhat.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
---
 arch/x86/kernel/apic/apic.c |   42 +++++++++++++++++++++++++++++++++---------
 1 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 3987e44..93cdb2a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -51,6 +51,7 @@
 #include <asm/smp.h>
 #include <asm/mce.h>
 #include <asm/kvm_para.h>
+#include <asm/tsc.h>
 
 unsigned int num_processors;
 
@@ -1151,8 +1152,12 @@ static void __cpuinit lapic_setup_esr(void)
  */
 void __cpuinit setup_local_APIC(void)
 {
-	unsigned int value;
-	int i, j;
+	unsigned int value, queued;
+	int i, j, acked = 0;
+	unsigned long long tsc = 0, ntsc, max_loops = cpu_khz;
+
+	if (cpu_has_tsc)
+		rdtscll(ntsc);
 
 	if (disable_apic) {
 		arch_disable_smp_support();
@@ -1204,13 +1209,32 @@ void __cpuinit setup_local_APIC(void)
 	 * the interrupt. Hence a vector might get locked. It was noticed
 	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
 	 */
-	for (i = APIC_ISR_NR - 1; i >= 0; i--) {
-		value = apic_read(APIC_ISR + i*0x10);
-		for (j = 31; j >= 0; j--) {
-			if (value & (1<<j))
-				ack_APIC_irq();
-		}
-	}
+        do {
+            queued = 0;
+            for (i = APIC_ISR_NR - 1; i >= 0; i--)
+                queued |= apic_read(APIC_IRR + i*0x10);
+
+            for (i = APIC_ISR_NR - 1; i >= 0; i--) {
+                value = apic_read(APIC_ISR + i*0x10);
+                for (j = 31; j >= 0; j--) {
+                    if (value & (1<<j)) {
+                        ack_APIC_irq();
+                        acked++;
+                    }
+                }
+            }
+            if (acked > 256) {
+                printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
+		       acked);
+                break;
+            }
+	    if (cpu_has_tsc) {
+		    rdtscll(ntsc);
+		    max_loops = (cpu_khz << 10) - (ntsc - tsc);
+	    } else
+		    max_loops--;
+        } while (queued && max_loops > 0);
+	WARN_ON(!max_loops);
 
 	/*
 	 * Now that we are all set up, enable the APIC
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec
  2010-03-08 11:17 [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec Thomas Renninger
@ 2010-03-08 11:26 ` Avi Kivity
  2010-03-08 11:34   ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V3 Thomas Renninger
  2010-03-08 11:26 ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec Thomas Renninger
  2010-03-08 11:34 ` Cyrill Gorcunov
  2 siblings, 1 reply; 17+ messages in thread
From: Avi Kivity @ 2010-03-08 11:26 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-kernel, Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo

On 03/08/2010 01:17 PM, Thomas Renninger wrote:
> From: Kerstin Jonsson<kerstin.jonsson@ericsson.com>
>
> When the SMP kernel decides to crash_kexec() the local APICs may have
> pending interrupts in their vector tables.
> The setup routine for the local APIC has a deficient mechanism for
> clearing these interrupts, it only handles interrupts that has already
> been dispatched to the local core for servicing (the ISR register)
> safely, it doesn't consider lower prioritized queued interrupts stored
> in the IRR register.
>
> If you have more than one pending interrupt within the same 32 bit word
> in the LAPIC vector table registers you may find yourself entering the
> IO APIC setup with pending interrupts left in the LAPIC. This is a
> situation for wich the IO APIC setup is not prepared. Depending of
> what/which interrupt vector/vectors are stuck in the APIC tables your
> system may show various degrees of malfunctioning.
> That was the reason why the check_timer() failed in our system, the
> timer interrupts was blocked by pending interrupts from the old kernel
> when routed trough the IO APIC.
>
> Additional comment from Jiri Bohac:
> ==============
> If this should go into stable release,
> I'd add some kind of limit on the number of iterations, just to be safe from
> hard to debug lock-ups:
>
> +if (loops++>  MAX_LOOPS) {
> +        printk("LAPIC pending clean-up")
> +        break;
> +}
>   while (queued);
>
> with MAX_LOOPS something like 1E9 this would leave plenty of time for the
> pending IRQs to be cleared and would and still cause at most a second of delay
> if the loop were to lock-up for whatever reason.
> ==============
>
>  From trenn@suse.de:
> V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
>      calls which may take rather long (suggested by: Avi Kivity<avi@redhat.com>)
>      If no tsc is available bail out quickly after cpu_khz, if we broke out too
>      early and still have irqs pending (which should never happen?) we still
>      get a WARN_ON...
>
>
>
> @@ -1151,8 +1152,12 @@ static void __cpuinit lapic_setup_esr(void)
>    */
>   void __cpuinit setup_local_APIC(void)
>   {
> -	unsigned int value;
> -	int i, j;
> +	unsigned int value, queued;
> +	int i, j, acked = 0;
> +	unsigned long long tsc = 0, ntsc, max_loops = cpu_khz;
> +
> +	if (cpu_has_tsc)
> +		rdtscll(ntsc);
>
>
>    

...

> +	    if (cpu_has_tsc) {
> +		    rdtscll(ntsc);
> +		    max_loops = (cpu_khz<<  10) - (ntsc - tsc);
>    

Since max_loops is unsigned, this will always be positive.

> +	    } else
> +		    max_loops--;
> +        } while (queued&&  max_loops>  0);
> +	WARN_ON(!max_loops);
>    

So the loop never terminates unless queued becomes true.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec
  2010-03-08 11:17 [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec Thomas Renninger
  2010-03-08 11:26 ` Avi Kivity
@ 2010-03-08 11:26 ` Thomas Renninger
  2010-03-08 11:34 ` Cyrill Gorcunov
  2 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-03-08 11:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo, Avi Kivity

On Monday 08 March 2010 12:17:10 Thomas Renninger wrote:
> From: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
> 
> When the SMP kernel decides to crash_kexec() the local APICs may have
> pending interrupts in their vector tables.
...
Sorry, indentation is totally messed up, I resend another version
which is "checkpatch'ed"
If this is an acceptable approach, it would be great if Kerstin
could try out and test the "break out after 1 sec" condition and
whether all still works as expected...

Thanks,

  Thomas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V3
  2010-03-08 11:26 ` Avi Kivity
@ 2010-03-08 11:34   ` Thomas Renninger
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-03-08 11:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo, Avi Kivity,
	Thomas Renninger

From: Kerstin Jonsson <kerstin.jonsson@ericsson.com>

When the SMP kernel decides to crash_kexec() the local APICs may have
pending interrupts in their vector tables.
The setup routine for the local APIC has a deficient mechanism for
clearing these interrupts, it only handles interrupts that has already
been dispatched to the local core for servicing (the ISR register)
safely, it doesn't consider lower prioritized queued interrupts stored
in the IRR register.

If you have more than one pending interrupt within the same 32 bit word
in the LAPIC vector table registers you may find yourself entering the
IO APIC setup with pending interrupts left in the LAPIC. This is a
situation for wich the IO APIC setup is not prepared. Depending of
what/which interrupt vector/vectors are stuck in the APIC tables your
system may show various degrees of malfunctioning.
That was the reason why the check_timer() failed in our system, the
timer interrupts was blocked by pending interrupts from the old kernel
when routed trough the IO APIC.

Additional comment from Jiri Bohac:
==============
If this should go into stable release,
I'd add some kind of limit on the number of iterations, just to be safe from
hard to debug lock-ups:

+if (loops++  > MAX_LOOPS) {
+        printk("LAPIC pending clean-up")
+        break;
+}
 while (queued);

with MAX_LOOPS something like 1E9 this would leave plenty of time for the
pending IRQs to be cleared and would and still cause at most a second of delay
if the loop were to lock-up for whatever reason.
==============

>From trenn@suse.de:
V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
    calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>)
    If no tsc is available bail out quickly after cpu_khz, if we broke out too
    early and still have irqs pending (which should never happen?) we still
    get a WARN_ON...

V3: - Fixed indentation -> checkpatch clean
    - max_loops must be signed


CC: jbohac@novell.com
CC: "Yinghai Lu" <yinghai@kernel.org>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
CC: "Avi Kivity" <avi@redhat.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
---
 arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
 1 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 3987e44..badd661 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -51,6 +51,7 @@
 #include <asm/smp.h>
 #include <asm/mce.h>
 #include <asm/kvm_para.h>
+#include <asm/tsc.h>
 
 unsigned int num_processors;
 
@@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
  */
 void __cpuinit setup_local_APIC(void)
 {
-	unsigned int value;
-	int i, j;
+	unsigned int value, queued;
+	int i, j, acked = 0;
+	unsigned long long tsc = 0, ntsc;
+	long long max_loops = cpu_khz;
+
+	if (cpu_has_tsc)
+		rdtscll(ntsc);
 
 	if (disable_apic) {
 		arch_disable_smp_support();
@@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
 	 * the interrupt. Hence a vector might get locked. It was noticed
 	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
 	 */
-	for (i = APIC_ISR_NR - 1; i >= 0; i--) {
-		value = apic_read(APIC_ISR + i*0x10);
-		for (j = 31; j >= 0; j--) {
-			if (value & (1<<j))
-				ack_APIC_irq();
+	do {
+		queued = 0;
+		for (i = APIC_ISR_NR - 1; i >= 0; i--)
+			queued |= apic_read(APIC_IRR + i*0x10);
+
+		for (i = APIC_ISR_NR - 1; i >= 0; i--) {
+			value = apic_read(APIC_ISR + i*0x10);
+			for (j = 31; j >= 0; j--) {
+				if (value & (1<<j)) {
+					ack_APIC_irq();
+					acked++;
+				}
+			}
 		}
-	}
+		if (acked > 256) {
+			printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
+			       acked);
+			break;
+		}
+		if (cpu_has_tsc) {
+			rdtscll(ntsc);
+			max_loops = (cpu_khz << 10) - (ntsc - tsc);
+		} else
+			max_loops--;
+	} while (queued && max_loops > 0);
+	WARN_ON(!max_loops);
 
 	/*
 	 * Now that we are all set up, enable the APIC
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec
  2010-03-08 11:17 [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec Thomas Renninger
  2010-03-08 11:26 ` Avi Kivity
  2010-03-08 11:26 ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec Thomas Renninger
@ 2010-03-08 11:34 ` Cyrill Gorcunov
  2010-03-08 11:40   ` Thomas Renninger
  2010-03-08 11:43   ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4 Thomas Renninger
  2 siblings, 2 replies; 17+ messages in thread
From: Cyrill Gorcunov @ 2010-03-08 11:34 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-kernel, Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo,
	Avi Kivity

On Mon, Mar 08, 2010 at 12:17:10PM +0100, Thomas Renninger wrote:
...
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 3987e44..93cdb2a 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -51,6 +51,7 @@
>  #include <asm/smp.h>
>  #include <asm/mce.h>
>  #include <asm/kvm_para.h>
> +#include <asm/tsc.h>
>  
>  unsigned int num_processors;
>  
> @@ -1151,8 +1152,12 @@ static void __cpuinit lapic_setup_esr(void)
>   */
>  void __cpuinit setup_local_APIC(void)
>  {
> -	unsigned int value;
> -	int i, j;
> +	unsigned int value, queued;
> +	int i, j, acked = 0;
> +	unsigned long long tsc = 0, ntsc, max_loops = cpu_khz;
> +
> +	if (cpu_has_tsc)
> +		rdtscll(ntsc);

Perhaps rdtscll(tsc)?

>  
>  	if (disable_apic) {
>  		arch_disable_smp_support();
> @@ -1204,13 +1209,32 @@ void __cpuinit setup_local_APIC(void)
>  	 * the interrupt. Hence a vector might get locked. It was noticed
>  	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
>  	 */
> -	for (i = APIC_ISR_NR - 1; i >= 0; i--) {
> -		value = apic_read(APIC_ISR + i*0x10);
> -		for (j = 31; j >= 0; j--) {
> -			if (value & (1<<j))
> -				ack_APIC_irq();
> -		}
> -	}
> +        do {
> +            queued = 0;
> +            for (i = APIC_ISR_NR - 1; i >= 0; i--)
> +                queued |= apic_read(APIC_IRR + i*0x10);
> +
> +            for (i = APIC_ISR_NR - 1; i >= 0; i--) {
> +                value = apic_read(APIC_ISR + i*0x10);
> +                for (j = 31; j >= 0; j--) {
> +                    if (value & (1<<j)) {
> +                        ack_APIC_irq();
> +                        acked++;
> +                    }
> +                }
> +            }
> +            if (acked > 256) {
> +                printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
> +		       acked);
> +                break;
> +            }
> +	    if (cpu_has_tsc) {
> +		    rdtscll(ntsc);
> +		    max_loops = (cpu_khz << 10) - (ntsc - tsc);

Where is tsc modified? It remains tsc = 0 all the time?
Or I miss the snippet where it is set?

> +	    } else
> +		    max_loops--;
> +        } while (queued && max_loops > 0);
> +	WARN_ON(!max_loops);
>  
>  	/*
>  	 * Now that we are all set up, enable the APIC
> -- 
> 1.6.3
> 
	-- Cyrill

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec
  2010-03-08 11:34 ` Cyrill Gorcunov
@ 2010-03-08 11:40   ` Thomas Renninger
  2010-03-08 11:43   ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4 Thomas Renninger
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-03-08 11:40 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo,
	Avi Kivity

On Monday 08 March 2010 12:34:52 Cyrill Gorcunov wrote:
> On Mon, Mar 08, 2010 at 12:17:10PM +0100, Thomas Renninger wrote:
> ...
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index 3987e44..93cdb2a 100644
> > +	if (cpu_has_tsc)
> > +		rdtscll(ntsc);
> 
> Perhaps rdtscll(tsc)?
Oh dear..., I played with this to wipe out:
warning: ‘tsc’ may be used uninitialized in this function

> > +	    if (cpu_has_tsc) {
> > +		    rdtscll(ntsc);
> > +		    max_loops = (cpu_khz << 10) - (ntsc - tsc);
> 
> Where is tsc modified? It remains tsc = 0 all the time?
> Or I miss the snippet where it is set?
Yes, you are right..., next version, thanks a lot for looking at this,

    Thomas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4
  2010-03-08 11:34 ` Cyrill Gorcunov
  2010-03-08 11:40   ` Thomas Renninger
@ 2010-03-08 11:43   ` Thomas Renninger
  2010-03-08 16:25     ` Kerstin Jonsson
  2010-03-09  9:14     ` kerstin.jonsson
  1 sibling, 2 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-03-08 11:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo, Avi Kivity,
	Thomas Renninger

From: Kerstin Jonsson <kerstin.jonsson@ericsson.com>

When the SMP kernel decides to crash_kexec() the local APICs may have
pending interrupts in their vector tables.
The setup routine for the local APIC has a deficient mechanism for
clearing these interrupts, it only handles interrupts that has already
been dispatched to the local core for servicing (the ISR register)
safely, it doesn't consider lower prioritized queued interrupts stored
in the IRR register.

If you have more than one pending interrupt within the same 32 bit word
in the LAPIC vector table registers you may find yourself entering the
IO APIC setup with pending interrupts left in the LAPIC. This is a
situation for wich the IO APIC setup is not prepared. Depending of
what/which interrupt vector/vectors are stuck in the APIC tables your
system may show various degrees of malfunctioning.
That was the reason why the check_timer() failed in our system, the
timer interrupts was blocked by pending interrupts from the old kernel
when routed trough the IO APIC.

Additional comment from Jiri Bohac:
==============
If this should go into stable release,
I'd add some kind of limit on the number of iterations, just to be safe from
hard to debug lock-ups:

+if (loops++  > MAX_LOOPS) {
+        printk("LAPIC pending clean-up")
+        break;
+}
 while (queued);

with MAX_LOOPS something like 1E9 this would leave plenty of time for the
pending IRQs to be cleared and would and still cause at most a second of delay
if the loop were to lock-up for whatever reason.
==============

>From trenn@suse.de:
V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
    calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>)
    If no tsc is available bail out quickly after cpu_khz, if we broke out too
    early and still have irqs pending (which should never happen?) we still
    get a WARN_ON...

V3: - Fixed indentation -> checkpatch clean
    - max_loops must be signed

V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call

CC: jbohac@novell.com
CC: "Yinghai Lu" <yinghai@kernel.org>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
CC: "Avi Kivity" <avi@redhat.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
---
 arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
 1 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 3987e44..414a5df 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -51,6 +51,7 @@
 #include <asm/smp.h>
 #include <asm/mce.h>
 #include <asm/kvm_para.h>
+#include <asm/tsc.h>
 
 unsigned int num_processors;
 
@@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
  */
 void __cpuinit setup_local_APIC(void)
 {
-	unsigned int value;
-	int i, j;
+	unsigned int value, queued;
+	int i, j, acked = 0;
+	unsigned long long tsc = 0, ntsc;
+	long long max_loops = cpu_khz;
+
+	if (cpu_has_tsc)
+		rdtscll(tsc);
 
 	if (disable_apic) {
 		arch_disable_smp_support();
@@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
 	 * the interrupt. Hence a vector might get locked. It was noticed
 	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
 	 */
-	for (i = APIC_ISR_NR - 1; i >= 0; i--) {
-		value = apic_read(APIC_ISR + i*0x10);
-		for (j = 31; j >= 0; j--) {
-			if (value & (1<<j))
-				ack_APIC_irq();
+	do {
+		queued = 0;
+		for (i = APIC_ISR_NR - 1; i >= 0; i--)
+			queued |= apic_read(APIC_IRR + i*0x10);
+
+		for (i = APIC_ISR_NR - 1; i >= 0; i--) {
+			value = apic_read(APIC_ISR + i*0x10);
+			for (j = 31; j >= 0; j--) {
+				if (value & (1<<j)) {
+					ack_APIC_irq();
+					acked++;
+				}
+			}
 		}
-	}
+		if (acked > 256) {
+			printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
+			       acked);
+			break;
+		}
+		if (cpu_has_tsc) {
+			rdtscll(ntsc);
+			max_loops = (cpu_khz << 10) - (ntsc - tsc);
+		} else
+			max_loops--;
+	} while (queued && max_loops > 0);
+	WARN_ON(!max_loops);
 
 	/*
 	 * Now that we are all set up, enable the APIC
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* RE: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4
  2010-03-08 11:43   ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4 Thomas Renninger
@ 2010-03-08 16:25     ` Kerstin Jonsson
  2010-03-09  9:14     ` kerstin.jonsson
  1 sibling, 0 replies; 17+ messages in thread
From: Kerstin Jonsson @ 2010-03-08 16:25 UTC (permalink / raw)
  To: Thomas Renninger, linux-kernel
  Cc: jbohac, Yinghai Lu, akpm, mingo, Avi Kivity

> From: Thomas Renninger [trenn@suse.de]
> Sent: Monday, March 08, 2010 12:43 PM
> To: linux-kernel@vger.kernel.org
> Cc: Kerstin Jonsson; jbohac@novell.com; Yinghai Lu; akpm@linux-foundation.org; mingo@elte.hu; Avi Kivity; Thomas Renninger
> Subject: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4
>
> From: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
>
> When the SMP kernel decides to crash_kexec() the local APICs may have
> pending interrupts in their vector tables.
> The setup routine for the local APIC has a deficient mechanism for
> clearing these interrupts, it only handles interrupts that has already
> been dispatched to the local core for servicing (the ISR register)
> safely, it doesn't consider lower prioritized queued interrupts stored
> in the IRR register.
>
> If you have more than one pending interrupt within the same 32 bit word
> in the LAPIC vector table registers you may find yourself entering the
> IO APIC setup with pending interrupts left in the LAPIC. This is a
> situation for wich the IO APIC setup is not prepared. Depending of
> what/which interrupt vector/vectors are stuck in the APIC tables your
> system may show various degrees of malfunctioning.
> That was the reason why the check_timer() failed in our system, the
> timer interrupts was blocked by pending interrupts from the old kernel
> when routed trough the IO APIC.
>
> Additional comment from Jiri Bohac:
> ==============
> If this should go into stable release,
> I'd add some kind of limit on the number of iterations, just to be safe from
> hard to debug lock-ups:
>
> +if (loops++  > MAX_LOOPS) {
> +        printk("LAPIC pending clean-up")
> +        break;
> +}
>  while (queued);
>
> with MAX_LOOPS something like 1E9 this would leave plenty of time for the
> pending IRQs to be cleared and would and still cause at most a second of delay
> if the loop were to lock-up for whatever reason.
> ==============
>
> >From trenn@suse.de:
> V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
>     calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>)
>     If no tsc is available bail out quickly after cpu_khz, if we broke out too
>     early and still have irqs pending (which should never happen?) we still
>     get a WARN_ON...
>
> V3: - Fixed indentation -> checkpatch clean
>     - max_loops must be signed
>
> V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
>
> CC: jbohac@novell.com
> CC: "Yinghai Lu" <yinghai@kernel.org>
> CC: akpm@linux-foundation.org
> CC: mingo@elte.hu
> CC: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
> CC: "Avi Kivity" <avi@redhat.com>
> Signed-off-by: Thomas Renninger <trenn@suse.de>
> ---
>  arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
>  1 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 3987e44..414a5df 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -51,6 +51,7 @@
>  #include <asm/smp.h>
>  #include <asm/mce.h>
>  #include <asm/kvm_para.h>
> +#include <asm/tsc.h>
>
>  unsigned int num_processors;
>
> @@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
>   */
>  void __cpuinit setup_local_APIC(void)
>  {
> -       unsigned int value;
> -       int i, j;
> +       unsigned int value, queued;
> +       int i, j, acked = 0;
> +       unsigned long long tsc = 0, ntsc;
> +       long long max_loops = cpu_khz;
> +
> +       if (cpu_has_tsc)
> +               rdtscll(tsc);
>
>         if (disable_apic) {
>                 arch_disable_smp_support();
> @@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
>          * the interrupt. Hence a vector might get locked. It was noticed
>          * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
>          */
> -       for (i = APIC_ISR_NR - 1; i >= 0; i--) {
> -               value = apic_read(APIC_ISR + i*0x10);
> -               for (j = 31; j >= 0; j--) {
> -                       if (value & (1<<j))
> -                               ack_APIC_irq();
> +       do {
> +               queued = 0;
> +               for (i = APIC_ISR_NR - 1; i >= 0; i--)
> +                       queued |= apic_read(APIC_IRR + i*0x10);
> +
> +               for (i = APIC_ISR_NR - 1; i >= 0; i--) {
> +                       value = apic_read(APIC_ISR + i*0x10);
> +                       for (j = 31; j >= 0; j--) {
> +                               if (value & (1<<j)) {
> +                                       ack_APIC_irq();
> +                                       acked++;
> +                               }
> +                       }
>                 }
> -       }
> +               if (acked > 256) {
> +                       printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
> +                              acked);
> +                       break;
> +               }
> +               if (cpu_has_tsc) {
> +                       rdtscll(ntsc);
> +                       max_loops = (cpu_khz << 10) - (ntsc - tsc);
> +               } else
> +                       max_loops--;
> +       } while (queued && max_loops > 0);
> +       WARN_ON(!max_loops);
>
>         /*
>          * Now that we are all set up, enable the APIC
> --
> 1.6.3
>
>
>  
Are you quite done now? Anyhow, I was doing documentation, which I hate
intensively! any excuse to defer is appreciated.

I have verified the patch on target HW:

model name      : Dual Core AMD Opteron(tm) Processor 165
cpu MHz         : 1800.056


model name      : Intel(R) Xeon(R) CPU           L5408  @ 2.13GHz
cpu MHz         : 2127.988

and in kvm:

(QEMU PC emulator version 0.10.6 (qemu-kvm-78.0.10.6-0.3.1))

hosted by a:

model name      : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
cpu MHz         : 1994.988

It still flushes multiple pending interrupts in the APIC tables -
i.e. my crash kernel boots up OK even when subjected to "ISR mayhem"
prior to crash.
If I force it to stay in the flush loop, it times out in approx. 1.02s in
all different target environments, close enough I'd say.

I do, however, have tsc support in all of them, had I not I'd probably
found it a bit tedious to wait for the kvm loop (if against all odds it
would get stuck) due to longer loop-time in kvm it would take ~100s to
perform (max_loops=cpu_khz) rounds. But then again, my host machine is
old, with better virtualization support in more modern machines and it
is an unlikely case, etc. I guess it won't really be a problem.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4
  2010-03-08 11:43   ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4 Thomas Renninger
  2010-03-08 16:25     ` Kerstin Jonsson
@ 2010-03-09  9:14     ` kerstin.jonsson
  2010-03-09 10:52       ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5 Thomas Renninger
  1 sibling, 1 reply; 17+ messages in thread
From: kerstin.jonsson @ 2010-03-09  9:14 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-kernel, jbohac, Yinghai Lu, akpm, mingo, Avi Kivity

On 03/08/2010 12:43 PM, Thomas Renninger wrote:
> From: Kerstin Jonsson<kerstin.jonsson@ericsson.com>
>
> When the SMP kernel decides to crash_kexec() the local APICs may have
> pending interrupts in their vector tables.
> The setup routine for the local APIC has a deficient mechanism for
> clearing these interrupts, it only handles interrupts that has already
> been dispatched to the local core for servicing (the ISR register)
> safely, it doesn't consider lower prioritized queued interrupts stored
> in the IRR register.
>
> If you have more than one pending interrupt within the same 32 bit word
> in the LAPIC vector table registers you may find yourself entering the
> IO APIC setup with pending interrupts left in the LAPIC. This is a
> situation for wich the IO APIC setup is not prepared. Depending of
> what/which interrupt vector/vectors are stuck in the APIC tables your
> system may show various degrees of malfunctioning.
> That was the reason why the check_timer() failed in our system, the
> timer interrupts was blocked by pending interrupts from the old kernel
> when routed trough the IO APIC.
>
> Additional comment from Jiri Bohac:
> ==============
> If this should go into stable release,
> I'd add some kind of limit on the number of iterations, just to be safe from
> hard to debug lock-ups:
>
> +if (loops++>  MAX_LOOPS) {
> +        printk("LAPIC pending clean-up")
> +        break;
> +}
>   while (queued);
>
> with MAX_LOOPS something like 1E9 this would leave plenty of time for the
> pending IRQs to be cleared and would and still cause at most a second of delay
> if the loop were to lock-up for whatever reason.
> ==============
>
>  From trenn@suse.de:
> V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
>      calls which may take rather long (suggested by: Avi Kivity<avi@redhat.com>)
>      If no tsc is available bail out quickly after cpu_khz, if we broke out too
>      early and still have irqs pending (which should never happen?) we still
>      get a WARN_ON...
>
> V3: - Fixed indentation ->  checkpatch clean
>      - max_loops must be signed
>
> V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
>
> CC: jbohac@novell.com
> CC: "Yinghai Lu"<yinghai@kernel.org>
> CC: akpm@linux-foundation.org
> CC: mingo@elte.hu
> CC: "Kerstin Jonsson"<kerstin.jonsson@ericsson.com>
> CC: "Avi Kivity"<avi@redhat.com>
> Signed-off-by: Thomas Renninger<trenn@suse.de>
> ---
>   arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
>   1 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 3987e44..414a5df 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -51,6 +51,7 @@
>   #include<asm/smp.h>
>   #include<asm/mce.h>
>   #include<asm/kvm_para.h>
> +#include<asm/tsc.h>
>
>   unsigned int num_processors;
>
> @@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
>    */
>   void __cpuinit setup_local_APIC(void)
>   {
> -	unsigned int value;
> -	int i, j;
> +	unsigned int value, queued;
> +	int i, j, acked = 0;
> +	unsigned long long tsc = 0, ntsc;
> +	long long max_loops = cpu_khz;
> +
> +	if (cpu_has_tsc)
> +		rdtscll(tsc);
>
>   	if (disable_apic) {
>   		arch_disable_smp_support();
> @@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
>   	 * the interrupt. Hence a vector might get locked. It was noticed
>   	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
>   	 */
> -	for (i = APIC_ISR_NR - 1; i>= 0; i--) {
> -		value = apic_read(APIC_ISR + i*0x10);
> -		for (j = 31; j>= 0; j--) {
> -			if (value&  (1<<j))
> -				ack_APIC_irq();
> +	do {
> +		queued = 0;
> +		for (i = APIC_ISR_NR - 1; i>= 0; i--)
> +			queued |= apic_read(APIC_IRR + i*0x10);
> +
> +		for (i = APIC_ISR_NR - 1; i>= 0; i--) {
> +			value = apic_read(APIC_ISR + i*0x10);
> +			for (j = 31; j>= 0; j--) {
> +				if (value&  (1<<j)) {
> +					ack_APIC_irq();
> +					acked++;
> +				}
> +			}
>   		}
> -	}
> +		if (acked>  256) {
> +			printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
> +			       acked);
> +			break;
> +		}
> +		if (cpu_has_tsc) {
> +			rdtscll(ntsc);
> +			max_loops = (cpu_khz<<  10) - (ntsc - tsc);
> +		} else
> +			max_loops--;
> +	} while (queued&&  max_loops>  0);
> +	WARN_ON(!max_loops);
>
>   	/*
>   	 * Now that we are all set up, enable the APIC
>    
On the verge of being overzealous:

WARN_ON(!max_loops); max_loops<  0 will probably be the most common error exit in a system that has tsc...


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5
  2010-03-09  9:14     ` kerstin.jonsson
@ 2010-03-09 10:52       ` Thomas Renninger
  2010-03-19  1:18         ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Renninger @ 2010-03-09 10:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo, Avi Kivity,
	Thomas Renninger

From: Kerstin Jonsson <kerstin.jonsson@ericsson.com>

When the SMP kernel decides to crash_kexec() the local APICs may have
pending interrupts in their vector tables.
The setup routine for the local APIC has a deficient mechanism for
clearing these interrupts, it only handles interrupts that has already
been dispatched to the local core for servicing (the ISR register)
safely, it doesn't consider lower prioritized queued interrupts stored
in the IRR register.

If you have more than one pending interrupt within the same 32 bit word
in the LAPIC vector table registers you may find yourself entering the
IO APIC setup with pending interrupts left in the LAPIC. This is a
situation for wich the IO APIC setup is not prepared. Depending of
what/which interrupt vector/vectors are stuck in the APIC tables your
system may show various degrees of malfunctioning.
That was the reason why the check_timer() failed in our system, the
timer interrupts was blocked by pending interrupts from the old kernel
when routed trough the IO APIC.

Additional comment from Jiri Bohac:
==============
If this should go into stable release,
I'd add some kind of limit on the number of iterations, just to be safe from
hard to debug lock-ups:

+if (loops++  > MAX_LOOPS) {
+        printk("LAPIC pending clean-up")
+        break;
+}
 while (queued);

with MAX_LOOPS something like 1E9 this would leave plenty of time for the
pending IRQs to be cleared and would and still cause at most a second of delay
if the loop were to lock-up for whatever reason.
==============

>From trenn@suse.de:
V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
    calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>)
    If no tsc is available bail out quickly after cpu_khz, if we broke out too
    early and still have irqs pending (which should never happen?) we still
    get a WARN_ON...

V3: - Fixed indentation -> checkpatch clean
    - max_loops must be signed

V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call

V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case

CC: jbohac@novell.com
CC: "Yinghai Lu" <yinghai@kernel.org>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
CC: "Avi Kivity" <avi@redhat.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
---
 arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
 1 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 00187f1..cfcc87f 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -51,6 +51,7 @@
 #include <asm/smp.h>
 #include <asm/mce.h>
 #include <asm/kvm_para.h>
+#include <asm/tsc.h>
 
 unsigned int num_processors;
 
@@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
  */
 void __cpuinit setup_local_APIC(void)
 {
-	unsigned int value;
-	int i, j;
+	unsigned int value, queued;
+	int i, j, acked = 0;
+	unsigned long long tsc = 0, ntsc;
+	long long max_loops = cpu_khz;
+
+	if (cpu_has_tsc)
+		rdtscll(tsc);
 
 	if (disable_apic) {
 		arch_disable_smp_support();
@@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
 	 * the interrupt. Hence a vector might get locked. It was noticed
 	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
 	 */
-	for (i = APIC_ISR_NR - 1; i >= 0; i--) {
-		value = apic_read(APIC_ISR + i*0x10);
-		for (j = 31; j >= 0; j--) {
-			if (value & (1<<j))
-				ack_APIC_irq();
+	do {
+		queued = 0;
+		for (i = APIC_ISR_NR - 1; i >= 0; i--)
+			queued |= apic_read(APIC_IRR + i*0x10);
+
+		for (i = APIC_ISR_NR - 1; i >= 0; i--) {
+			value = apic_read(APIC_ISR + i*0x10);
+			for (j = 31; j >= 0; j--) {
+				if (value & (1<<j)) {
+					ack_APIC_irq();
+					acked++;
+				}
+			}
 		}
-	}
+		if (acked > 256) {
+			printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
+			       acked);
+			break;
+		}
+		if (cpu_has_tsc) {
+			rdtscll(ntsc);
+			max_loops = (cpu_khz << 10) - (ntsc - tsc);
+		} else
+			max_loops--;
+	} while (queued && max_loops > 0);
+	WARN_ON(max_loops <= 0);
 
 	/*
 	 * Now that we are all set up, enable the APIC
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5
  2010-03-09 10:52       ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5 Thomas Renninger
@ 2010-03-19  1:18         ` Eric W. Biederman
  2010-03-20  6:42           ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2010-03-19  1:18 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-kernel, Kerstin Jonsson, jbohac, Yinghai Lu, akpm, mingo,
	Avi Kivity


Andrew thanks for finding this.  I have a test case for this that
reproduces about every other time, and I will plug this patch in and
see it helps.  I'm not wild about how the max_loops variable is
reused both as a timer and as a countdown timer, but the basic
principle feels solid.

I have been seeing this and for some reason I thought I was dying
in calibrate_delay_loop().  But this is much later and much easier
to deal with.  Since we make it to smp_init() there isn't any
good excuse for us to fail to come up.

I'm curious how much testing have you been able to do on this piece
of code?

Eric


Thomas Renninger <trenn@suse.de> writes:

> From: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
>
> When the SMP kernel decides to crash_kexec() the local APICs may have
> pending interrupts in their vector tables.
> The setup routine for the local APIC has a deficient mechanism for
> clearing these interrupts, it only handles interrupts that has already
> been dispatched to the local core for servicing (the ISR register)
> safely, it doesn't consider lower prioritized queued interrupts stored
> in the IRR register.
>
> If you have more than one pending interrupt within the same 32 bit word
> in the LAPIC vector table registers you may find yourself entering the
> IO APIC setup with pending interrupts left in the LAPIC. This is a
> situation for wich the IO APIC setup is not prepared. Depending of
> what/which interrupt vector/vectors are stuck in the APIC tables your
> system may show various degrees of malfunctioning.
> That was the reason why the check_timer() failed in our system, the
> timer interrupts was blocked by pending interrupts from the old kernel
> when routed trough the IO APIC.
>
> Additional comment from Jiri Bohac:
> ==============
> If this should go into stable release,
> I'd add some kind of limit on the number of iterations, just to be safe from
> hard to debug lock-ups:
>
> +if (loops++  > MAX_LOOPS) {
> +        printk("LAPIC pending clean-up")
> +        break;
> +}
>  while (queued);
>
> with MAX_LOOPS something like 1E9 this would leave plenty of time for the
> pending IRQs to be cleared and would and still cause at most a second of delay
> if the loop were to lock-up for whatever reason.
> ==============
>
>>From trenn@suse.de:
> V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
>     calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>)
>     If no tsc is available bail out quickly after cpu_khz, if we broke out too
>     early and still have irqs pending (which should never happen?) we still
>     get a WARN_ON...
>
> V3: - Fixed indentation -> checkpatch clean
>     - max_loops must be signed
>
> V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
>
> V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case
>
> CC: jbohac@novell.com
> CC: "Yinghai Lu" <yinghai@kernel.org>
> CC: akpm@linux-foundation.org
> CC: mingo@elte.hu
> CC: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
> CC: "Avi Kivity" <avi@redhat.com>
> Signed-off-by: Thomas Renninger <trenn@suse.de>
> ---
>  arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
>  1 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 00187f1..cfcc87f 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -51,6 +51,7 @@
>  #include <asm/smp.h>
>  #include <asm/mce.h>
>  #include <asm/kvm_para.h>
> +#include <asm/tsc.h>
>  
>  unsigned int num_processors;
>  
> @@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
>   */
>  void __cpuinit setup_local_APIC(void)
>  {
> -	unsigned int value;
> -	int i, j;
> +	unsigned int value, queued;
> +	int i, j, acked = 0;
> +	unsigned long long tsc = 0, ntsc;
> +	long long max_loops = cpu_khz;
> +
> +	if (cpu_has_tsc)
> +		rdtscll(tsc);
>  
>  	if (disable_apic) {
>  		arch_disable_smp_support();
> @@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
>  	 * the interrupt. Hence a vector might get locked. It was noticed
>  	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
>  	 */
> -	for (i = APIC_ISR_NR - 1; i >= 0; i--) {
> -		value = apic_read(APIC_ISR + i*0x10);
> -		for (j = 31; j >= 0; j--) {
> -			if (value & (1<<j))
> -				ack_APIC_irq();
> +	do {
> +		queued = 0;
> +		for (i = APIC_ISR_NR - 1; i >= 0; i--)
> +			queued |= apic_read(APIC_IRR + i*0x10);
> +
> +		for (i = APIC_ISR_NR - 1; i >= 0; i--) {
> +			value = apic_read(APIC_ISR + i*0x10);
> +			for (j = 31; j >= 0; j--) {
> +				if (value & (1<<j)) {
> +					ack_APIC_irq();
> +					acked++;
> +				}
> +			}
>  		}
> -	}
> +		if (acked > 256) {
> +			printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
> +			       acked);
> +			break;
> +		}
> +		if (cpu_has_tsc) {
> +			rdtscll(ntsc);
> +			max_loops = (cpu_khz << 10) - (ntsc - tsc);
> +		} else
> +			max_loops--;
> +	} while (queued && max_loops > 0);
> +	WARN_ON(max_loops <= 0);
>  
>  	/*
>  	 * Now that we are all set up, enable the APIC
> -- 
> 1.6.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5
  2010-03-19  1:18         ` Eric W. Biederman
@ 2010-03-20  6:42           ` Eric W. Biederman
  2010-03-22 11:28             ` kerstin.jonsson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2010-03-20  6:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Kerstin Jonsson, jbohac, Yinghai Lu, mingo,
	Avi Kivity, Thomas Renninger

ebiederm@xmission.com (Eric W. Biederman) writes:

> Andrew thanks for finding this.  I have a test case for this that
> reproduces about every other time, and I will plug this patch in and
> see it helps.  I'm not wild about how the max_loops variable is
> reused both as a timer and as a countdown timer, but the basic
> principle feels solid.
>
> I have been seeing this and for some reason I thought I was dying
> in calibrate_delay_loop().  But this is much later and much easier
> to deal with.  Since we make it to smp_init() there isn't any
> good excuse for us to fail to come up.
>
> I'm curious how much testing have you been able to do on this piece
> of code?

This code definitely makes things better in my test case.
I had the patience to wait for 12 iterations and I was
expecting 6 failures and I saw none.

I have reservations about the timeout, but the rest of the patch
is definitely doing the right thing, and something is a lot better
than nothing.

Tested-by: "Eric W. Biederman" <ebiederm@xmission.com>


> Thomas Renninger <trenn@suse.de> writes:
>
>> From: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
>>
>> When the SMP kernel decides to crash_kexec() the local APICs may have
>> pending interrupts in their vector tables.
>> The setup routine for the local APIC has a deficient mechanism for
>> clearing these interrupts, it only handles interrupts that has already
>> been dispatched to the local core for servicing (the ISR register)
>> safely, it doesn't consider lower prioritized queued interrupts stored
>> in the IRR register.
>>
>> If you have more than one pending interrupt within the same 32 bit word
>> in the LAPIC vector table registers you may find yourself entering the
>> IO APIC setup with pending interrupts left in the LAPIC. This is a
>> situation for wich the IO APIC setup is not prepared. Depending of
>> what/which interrupt vector/vectors are stuck in the APIC tables your
>> system may show various degrees of malfunctioning.
>> That was the reason why the check_timer() failed in our system, the
>> timer interrupts was blocked by pending interrupts from the old kernel
>> when routed trough the IO APIC.
>>
>> Additional comment from Jiri Bohac:
>> ==============
>> If this should go into stable release,
>> I'd add some kind of limit on the number of iterations, just to be safe from
>> hard to debug lock-ups:
>>
>> +if (loops++  > MAX_LOOPS) {
>> +        printk("LAPIC pending clean-up")
>> +        break;
>> +}
>>  while (queued);
>>
>> with MAX_LOOPS something like 1E9 this would leave plenty of time for the
>> pending IRQs to be cleared and would and still cause at most a second of delay
>> if the loop were to lock-up for whatever reason.
>> ==============
>>
>>>From trenn@suse.de:
>> V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
>>     calls which may take rather long (suggested by: Avi Kivity <avi@redhat.com>)
>>     If no tsc is available bail out quickly after cpu_khz, if we broke out too
>>     early and still have irqs pending (which should never happen?) we still
>>     get a WARN_ON...
>>
>> V3: - Fixed indentation -> checkpatch clean
>>     - max_loops must be signed
>>
>> V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
>>
>> V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case
>>
>> CC: jbohac@novell.com
>> CC: "Yinghai Lu" <yinghai@kernel.org>
>> CC: akpm@linux-foundation.org
>> CC: mingo@elte.hu
>> CC: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
>> CC: "Avi Kivity" <avi@redhat.com>
>> Signed-off-by: Thomas Renninger <trenn@suse.de>
>> ---
>>  arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
>>  1 files changed, 33 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>> index 00187f1..cfcc87f 100644
>> --- a/arch/x86/kernel/apic/apic.c
>> +++ b/arch/x86/kernel/apic/apic.c
>> @@ -51,6 +51,7 @@
>>  #include <asm/smp.h>
>>  #include <asm/mce.h>
>>  #include <asm/kvm_para.h>
>> +#include <asm/tsc.h>
>>  
>>  unsigned int num_processors;
>>  
>> @@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
>>   */
>>  void __cpuinit setup_local_APIC(void)
>>  {
>> -	unsigned int value;
>> -	int i, j;
>> +	unsigned int value, queued;
>> +	int i, j, acked = 0;
>> +	unsigned long long tsc = 0, ntsc;
>> +	long long max_loops = cpu_khz;
>> +
>> +	if (cpu_has_tsc)
>> +		rdtscll(tsc);
>>  
>>  	if (disable_apic) {
>>  		arch_disable_smp_support();
>> @@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
>>  	 * the interrupt. Hence a vector might get locked. It was noticed
>>  	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
>>  	 */
>> -	for (i = APIC_ISR_NR - 1; i >= 0; i--) {
>> -		value = apic_read(APIC_ISR + i*0x10);
>> -		for (j = 31; j >= 0; j--) {
>> -			if (value & (1<<j))
>> -				ack_APIC_irq();
>> +	do {
>> +		queued = 0;
>> +		for (i = APIC_ISR_NR - 1; i >= 0; i--)
>> +			queued |= apic_read(APIC_IRR + i*0x10);
>> +
>> +		for (i = APIC_ISR_NR - 1; i >= 0; i--) {
>> +			value = apic_read(APIC_ISR + i*0x10);
>> +			for (j = 31; j >= 0; j--) {
>> +				if (value & (1<<j)) {
>> +					ack_APIC_irq();
>> +					acked++;
>> +				}
>> +			}
>>  		}
>> -	}
>> +		if (acked > 256) {
>> +			printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
>> +			       acked);
>> +			break;
>> +		}
>> +		if (cpu_has_tsc) {
>> +			rdtscll(ntsc);
>> +			max_loops = (cpu_khz << 10) - (ntsc - tsc);
>> +		} else
>> +			max_loops--;
>> +	} while (queued && max_loops > 0);
>> +	WARN_ON(max_loops <= 0);
>>  
>>  	/*
>>  	 * Now that we are all set up, enable the APIC
>> -- 
>> 1.6.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5
  2010-03-20  6:42           ` Eric W. Biederman
@ 2010-03-22 11:28             ` kerstin.jonsson
  2010-03-22 12:23               ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: kerstin.jonsson @ 2010-03-22 11:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, linux-kernel, jbohac, Yinghai Lu, mingo,
	Avi Kivity, Thomas Renninger

On 03/20/2010 07:42 AM, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>    
>> Andrew thanks for finding this.  I have a test case for this that
>> reproduces about every other time, and I will plug this patch in and
>> see it helps.  I'm not wild about how the max_loops variable is
>> reused both as a timer and as a countdown timer, but the basic
>> principle feels solid.
>>
>> I have been seeing this and for some reason I thought I was dying
>> in calibrate_delay_loop().  But this is much later and much easier
>> to deal with.  Since we make it to smp_init() there isn't any
>> good excuse for us to fail to come up.
>>
>> I'm curious how much testing have you been able to do on this piece
>> of code?
>>      
> This code definitely makes things better in my test case.
> I had the patience to wait for 12 iterations and I was
> expecting 6 failures and I saw none.
>
> I have reservations about the timeout, but the rest of the patch
> is definitely doing the right thing, and something is a lot better
> than nothing.
>
> Tested-by: "Eric W. Biederman"<ebiederm@xmission.com>
>
>
>    
We have an application that relies heavily on kexec and have seen this 
problem for quite some time without being able to pin-point the root 
cause. Through a case of serendipity we found a reliable way to trigger 
the fault, which is how I was able to get a better understanding of the 
source. My original patch is included in our kernel, which is used in an 
embedded telecom system, and quite well tested. N.B. the original 
version did not include the max_loop constraint. I have added the 
current version of the patch to our kernel, it is not yet included in 
thousands of customer nodes as is the first version, but preliminary 
tests indicates that it still works as intended.

Tested-by: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
>> Thomas Renninger<trenn@suse.de>  writes:
>>
>>      
>>> From: Kerstin Jonsson<kerstin.jonsson@ericsson.com>
>>>
>>> When the SMP kernel decides to crash_kexec() the local APICs may have
>>> pending interrupts in their vector tables.
>>> The setup routine for the local APIC has a deficient mechanism for
>>> clearing these interrupts, it only handles interrupts that has already
>>> been dispatched to the local core for servicing (the ISR register)
>>> safely, it doesn't consider lower prioritized queued interrupts stored
>>> in the IRR register.
>>>
>>> If you have more than one pending interrupt within the same 32 bit word
>>> in the LAPIC vector table registers you may find yourself entering the
>>> IO APIC setup with pending interrupts left in the LAPIC. This is a
>>> situation for wich the IO APIC setup is not prepared. Depending of
>>> what/which interrupt vector/vectors are stuck in the APIC tables your
>>> system may show various degrees of malfunctioning.
>>> That was the reason why the check_timer() failed in our system, the
>>> timer interrupts was blocked by pending interrupts from the old kernel
>>> when routed trough the IO APIC.
>>>
>>> Additional comment from Jiri Bohac:
>>> ==============
>>> If this should go into stable release,
>>> I'd add some kind of limit on the number of iterations, just to be safe from
>>> hard to debug lock-ups:
>>>
>>> +if (loops++>  MAX_LOOPS) {
>>> +        printk("LAPIC pending clean-up")
>>> +        break;
>>> +}
>>>   while (queued);
>>>
>>> with MAX_LOOPS something like 1E9 this would leave plenty of time for the
>>> pending IRQs to be cleared and would and still cause at most a second of delay
>>> if the loop were to lock-up for whatever reason.
>>> ==============
>>>
>>> > From trenn@suse.de:
>>> V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
>>>      calls which may take rather long (suggested by: Avi Kivity<avi@redhat.com>)
>>>      If no tsc is available bail out quickly after cpu_khz, if we broke out too
>>>      early and still have irqs pending (which should never happen?) we still
>>>      get a WARN_ON...
>>>
>>> V3: - Fixed indentation ->  checkpatch clean
>>>      - max_loops must be signed
>>>
>>> V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
>>>
>>> V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case
>>>
>>> CC: jbohac@novell.com
>>> CC: "Yinghai Lu"<yinghai@kernel.org>
>>> CC: akpm@linux-foundation.org
>>> CC: mingo@elte.hu
>>> CC: "Kerstin Jonsson"<kerstin.jonsson@ericsson.com>
>>> CC: "Avi Kivity"<avi@redhat.com>
>>> Signed-off-by: Thomas Renninger<trenn@suse.de>
>>> ---
>>>   arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
>>>   1 files changed, 33 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>>> index 00187f1..cfcc87f 100644
>>> --- a/arch/x86/kernel/apic/apic.c
>>> +++ b/arch/x86/kernel/apic/apic.c
>>> @@ -51,6 +51,7 @@
>>>   #include<asm/smp.h>
>>>   #include<asm/mce.h>
>>>   #include<asm/kvm_para.h>
>>> +#include<asm/tsc.h>
>>>
>>>   unsigned int num_processors;
>>>
>>> @@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
>>>    */
>>>   void __cpuinit setup_local_APIC(void)
>>>   {
>>> -	unsigned int value;
>>> -	int i, j;
>>> +	unsigned int value, queued;
>>> +	int i, j, acked = 0;
>>> +	unsigned long long tsc = 0, ntsc;
>>> +	long long max_loops = cpu_khz;
>>> +
>>> +	if (cpu_has_tsc)
>>> +		rdtscll(tsc);
>>>
>>>   	if (disable_apic) {
>>>   		arch_disable_smp_support();
>>> @@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
>>>   	 * the interrupt. Hence a vector might get locked. It was noticed
>>>   	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
>>>   	 */
>>> -	for (i = APIC_ISR_NR - 1; i>= 0; i--) {
>>> -		value = apic_read(APIC_ISR + i*0x10);
>>> -		for (j = 31; j>= 0; j--) {
>>> -			if (value&  (1<<j))
>>> -				ack_APIC_irq();
>>> +	do {
>>> +		queued = 0;
>>> +		for (i = APIC_ISR_NR - 1; i>= 0; i--)
>>> +			queued |= apic_read(APIC_IRR + i*0x10);
>>> +
>>> +		for (i = APIC_ISR_NR - 1; i>= 0; i--) {
>>> +			value = apic_read(APIC_ISR + i*0x10);
>>> +			for (j = 31; j>= 0; j--) {
>>> +				if (value&  (1<<j)) {
>>> +					ack_APIC_irq();
>>> +					acked++;
>>> +				}
>>> +			}
>>>   		}
>>> -	}
>>> +		if (acked>  256) {
>>> +			printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
>>> +			       acked);
>>> +			break;
>>> +		}
>>> +		if (cpu_has_tsc) {
>>> +			rdtscll(ntsc);
>>> +			max_loops = (cpu_khz<<  10) - (ntsc - tsc);
>>> +		} else
>>> +			max_loops--;
>>> +	} while (queued&&  max_loops>  0);
>>> +	WARN_ON(max_loops<= 0);
>>>
>>>   	/*
>>>   	 * Now that we are all set up, enable the APIC
>>> -- 
>>> 1.6.3
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>        


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5
  2010-03-22 11:28             ` kerstin.jonsson
@ 2010-03-22 12:23               ` Eric W. Biederman
       [not found]                 ` <43F901BD926A4E43B106BF17856F0755E7C393B9@orsmsx508.amr.corp.intel.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2010-03-22 12:23 UTC (permalink / raw)
  To: kerstin.jonsson
  Cc: Andrew Morton, linux-kernel, jbohac, Yinghai Lu, mingo,
	Avi Kivity, Thomas Renninger

"kerstin.jonsson" <kerstin.jonsson@ericsson.com> writes:

> On 03/20/2010 07:42 AM, Eric W. Biederman wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>
>>    
>>> Andrew thanks for finding this.  I have a test case for this that
>>> reproduces about every other time, and I will plug this patch in and
>>> see it helps.  I'm not wild about how the max_loops variable is
>>> reused both as a timer and as a countdown timer, but the basic
>>> principle feels solid.
>>>
>>> I have been seeing this and for some reason I thought I was dying
>>> in calibrate_delay_loop().  But this is much later and much easier
>>> to deal with.  Since we make it to smp_init() there isn't any
>>> good excuse for us to fail to come up.
>>>
>>> I'm curious how much testing have you been able to do on this piece
>>> of code?
>>>      
>> This code definitely makes things better in my test case.
>> I had the patience to wait for 12 iterations and I was
>> expecting 6 failures and I saw none.
>>
>> I have reservations about the timeout, but the rest of the patch
>> is definitely doing the right thing, and something is a lot better
>> than nothing.
>>
>> Tested-by: "Eric W. Biederman"<ebiederm@xmission.com>
>>
>>
>>    
> We have an application that relies heavily on kexec and have seen this problem
> for quite some time without being able to pin-point the root cause. Through a
> case of serendipity we found a reliable way to trigger the fault, which is how I
> was able to get a better understanding of the source. My original patch is
> included in our kernel, which is used in an embedded telecom system, and quite
> well tested. N.B. the original version did not include the max_loop
> constraint. I have added the current version of the patch to our kernel, it is
> not yet included in thousands of customer nodes as is the first version, but
> preliminary tests indicates that it still works as intended.

Alright.  In general I am in favor of loop limits when dealing with hardware
peculiarities.  We should have all of the ioapics disabled from sending anything
at this point so we should be guaranteed not to loop, but occasionally things
don't happen as they should, so it seems like a healthy precaution.

> Tested-by: "Kerstin Jonsson" <kerstin.jonsson@ericsson.com>
>>> Thomas Renninger<trenn@suse.de>  writes:
>>>
>>>      
>>>> From: Kerstin Jonsson<kerstin.jonsson@ericsson.com>
>>>>
>>>> When the SMP kernel decides to crash_kexec() the local APICs may have
>>>> pending interrupts in their vector tables.
>>>> The setup routine for the local APIC has a deficient mechanism for
>>>> clearing these interrupts, it only handles interrupts that has already
>>>> been dispatched to the local core for servicing (the ISR register)
>>>> safely, it doesn't consider lower prioritized queued interrupts stored
>>>> in the IRR register.
>>>>
>>>> If you have more than one pending interrupt within the same 32 bit word
>>>> in the LAPIC vector table registers you may find yourself entering the
>>>> IO APIC setup with pending interrupts left in the LAPIC. This is a
>>>> situation for wich the IO APIC setup is not prepared. Depending of
>>>> what/which interrupt vector/vectors are stuck in the APIC tables your
>>>> system may show various degrees of malfunctioning.
>>>> That was the reason why the check_timer() failed in our system, the
>>>> timer interrupts was blocked by pending interrupts from the old kernel
>>>> when routed trough the IO APIC.
>>>>
>>>> Additional comment from Jiri Bohac:
>>>> ==============
>>>> If this should go into stable release,
>>>> I'd add some kind of limit on the number of iterations, just to be safe from
>>>> hard to debug lock-ups:
>>>>
>>>> +if (loops++>  MAX_LOOPS) {
>>>> +        printk("LAPIC pending clean-up")
>>>> +        break;
>>>> +}
>>>>   while (queued);
>>>>
>>>> with MAX_LOOPS something like 1E9 this would leave plenty of time for the
>>>> pending IRQs to be cleared and would and still cause at most a second of delay
>>>> if the loop were to lock-up for whatever reason.
>>>> ==============
>>>>
>>>> > From trenn@suse.de:
>>>> V2: Use tsc if avail to bail out after 1 sec due to possible virtual apic_read
>>>>      calls which may take rather long (suggested by: Avi Kivity<avi@redhat.com>)
>>>>      If no tsc is available bail out quickly after cpu_khz, if we broke out too
>>>>      early and still have irqs pending (which should never happen?) we still
>>>>      get a WARN_ON...
>>>>
>>>> V3: - Fixed indentation ->  checkpatch clean
>>>>      - max_loops must be signed
>>>>
>>>> V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
>>>>
>>>> V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case
>>>>
>>>> CC: jbohac@novell.com
>>>> CC: "Yinghai Lu"<yinghai@kernel.org>
>>>> CC: akpm@linux-foundation.org
>>>> CC: mingo@elte.hu
>>>> CC: "Kerstin Jonsson"<kerstin.jonsson@ericsson.com>
>>>> CC: "Avi Kivity"<avi@redhat.com>
>>>> Signed-off-by: Thomas Renninger<trenn@suse.de>
>>>> ---
>>>>   arch/x86/kernel/apic/apic.c |   41 +++++++++++++++++++++++++++++++++--------
>>>>   1 files changed, 33 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
>>>> index 00187f1..cfcc87f 100644
>>>> --- a/arch/x86/kernel/apic/apic.c
>>>> +++ b/arch/x86/kernel/apic/apic.c
>>>> @@ -51,6 +51,7 @@
>>>>   #include<asm/smp.h>
>>>>   #include<asm/mce.h>
>>>>   #include<asm/kvm_para.h>
>>>> +#include<asm/tsc.h>
>>>>
>>>>   unsigned int num_processors;
>>>>
>>>> @@ -1151,8 +1152,13 @@ static void __cpuinit lapic_setup_esr(void)
>>>>    */
>>>>   void __cpuinit setup_local_APIC(void)
>>>>   {
>>>> -	unsigned int value;
>>>> -	int i, j;
>>>> +	unsigned int value, queued;
>>>> +	int i, j, acked = 0;
>>>> +	unsigned long long tsc = 0, ntsc;
>>>> +	long long max_loops = cpu_khz;
>>>> +
>>>> +	if (cpu_has_tsc)
>>>> +		rdtscll(tsc);
>>>>
>>>>   	if (disable_apic) {
>>>>   		arch_disable_smp_support();
>>>> @@ -1204,13 +1210,32 @@ void __cpuinit setup_local_APIC(void)
>>>>   	 * the interrupt. Hence a vector might get locked. It was noticed
>>>>   	 * for timer irq (vector 0x31). Issue an extra EOI to clear ISR.
>>>>   	 */
>>>> -	for (i = APIC_ISR_NR - 1; i>= 0; i--) {
>>>> -		value = apic_read(APIC_ISR + i*0x10);
>>>> -		for (j = 31; j>= 0; j--) {
>>>> -			if (value&  (1<<j))
>>>> -				ack_APIC_irq();
>>>> +	do {
>>>> +		queued = 0;
>>>> +		for (i = APIC_ISR_NR - 1; i>= 0; i--)
>>>> +			queued |= apic_read(APIC_IRR + i*0x10);
>>>> +
>>>> +		for (i = APIC_ISR_NR - 1; i>= 0; i--) {
>>>> +			value = apic_read(APIC_ISR + i*0x10);
>>>> +			for (j = 31; j>= 0; j--) {
>>>> +				if (value&  (1<<j)) {
>>>> +					ack_APIC_irq();
>>>> +					acked++;
>>>> +				}
>>>> +			}
>>>>   		}
>>>> -	}
>>>> +		if (acked>  256) {
>>>> +			printk(KERN_ERR "LAPIC pending interrupts after %d EOI\n",
>>>> +			       acked);
>>>> +			break;
>>>> +		}
>>>> +		if (cpu_has_tsc) {
>>>> +			rdtscll(ntsc);
>>>> +			max_loops = (cpu_khz<<  10) - (ntsc - tsc);
>>>> +		} else
>>>> +			max_loops--;
>>>> +	} while (queued&&  max_loops>  0);
>>>> +	WARN_ON(max_loops<= 0);
>>>>
>>>>   	/*
>>>>   	 * Now that we are all set up, enable the APIC
>>>> -- 
>>>> 1.6.3
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at  http://www.tux.org/lkml/
>>>>        

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5
       [not found]                 ` <43F901BD926A4E43B106BF17856F0755E7C393B9@orsmsx508.amr.corp.intel.com>
@ 2010-06-17  0:19                   ` H. Peter Anvin
  2010-06-17  1:51                     ` Eric W. Biederman
       [not found]                     ` <43F901BD926A4E43B106BF17856F0755E7C3985D@orsmsx508.amr.corp.intel.com>
  0 siblings, 2 replies; 17+ messages in thread
From: H. Peter Anvin @ 2010-06-17  0:19 UTC (permalink / raw)
  To: Pan, Jacob jun
  Cc: Eric W. Biederman, kerstin.jonsson, Andrew Morton, linux-kernel,
	jbohac, Yinghai Lu, mingo, Avi Kivity, trenn

On 06/16/2010 02:11 PM, Pan, Jacob jun wrote:
> 
> W.R.T. the loop limits, is it possible to use a default max_loops value in
> case when cpu_khz is not set? The reason is that on Moorestown platform
> we need to do an early APIC setup before tsc_init(), so cpu_khz is 0 at the
> time we setup local APIC. The result is that we hit WARN_ON(max_loops<= 0)
> on Moorestown for early APIC setup.
> 
> The early APIC setup is needed because Moorestown does not have a PIT and the
> system timer interrupts are routed via IOAPIC.
> 

Can't MRST install a quick ballpark value into cpu_khz?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5
  2010-06-17  0:19                   ` H. Peter Anvin
@ 2010-06-17  1:51                     ` Eric W. Biederman
       [not found]                     ` <43F901BD926A4E43B106BF17856F0755E7C3985D@orsmsx508.amr.corp.intel.com>
  1 sibling, 0 replies; 17+ messages in thread
From: Eric W. Biederman @ 2010-06-17  1:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Pan, Jacob jun, kerstin.jonsson, Andrew Morton, linux-kernel,
	jbohac, Yinghai Lu, mingo, Avi Kivity, trenn, Thomas Renninger

"H. Peter Anvin" <hpa@zytor.com> writes:

> On 06/16/2010 02:11 PM, Pan, Jacob jun wrote:
>> 
>> W.R.T. the loop limits, is it possible to use a default max_loops value in
>> case when cpu_khz is not set? The reason is that on Moorestown platform
>> we need to do an early APIC setup before tsc_init(), so cpu_khz is 0 at the
>> time we setup local APIC. The result is that we hit WARN_ON(max_loops<= 0)
>> on Moorestown for early APIC setup.

So you get a nasty warning but the system still boots?

>> The early APIC setup is needed because Moorestown does not have a PIT and the
>> system timer interrupts are routed via IOAPIC.
>> 
>
> Can't MRST install a quick ballpark value into cpu_khz?

Looking at the code the initialization order in init/main.c is:

early_irq_init()
init_IRQ()
init_timers()
time_init()
    tsc_init()

local_irq_enable()

So arguably if we simply switched those lines around we could make
this work, as you must be initializing the tsc with interrupts
disabled.

That said given the current ordering it appears that using the tsc
in setup_local_APIC is just broken because if it is properly called
from init_IRQ() the code doesn't work properly.

The question is what do we consider more important the current code
initialization ordering, or virtual processors having such an expensive
apic_read that we need a 1 second timeout.

I think the virtual processor concern is silly.  Most of the time
this loop should execute exactly once after having confirmed there
is nothing to do.  On a bad day this loop should execute at most twice.
If the vitalization is too expensive that this loop cannot execute
twice, bailing out early is a correctness concern.

I think we should set max_loops to 5.  Leave the WARN_ON, and call it
good.

Does the following patch work cleanly on Moorestown?

Eric


>From fc298618e87233de748c7a289f41bcc68ed8fc64 Mon Sep 17 00:00:00 2001
From: Eric W. Biederman <ebiederm@xmission.com>
Date: Wed, 16 Jun 2010 18:42:39 -0700
Subject: [PATCH] x86/apic:  Don't use the tsc in setup_local_APIC

If everything is initialized in the order of init/main.c we should have:
init_IRQ()
    setup_local_APIC()
time_init()
    tsc_init()
local_irq_enable()

Which means the only reason the current use of the tsc in setup_local_APIC
works is because we are calling setup_local_APIC late on most x86
platforms.

The justification given for using a 1 second timeout on this loop
is because virtualized platforms have a very expensive apic_read().

Typically this loop should execute exactly once after confirming there
is nothing to do.  On a bad day this loop should execute twice
after clearing the ISR and the IRR unacked irqs.  If it is too
expensive to execute this loop twice on a virtualized platform
that is not our problem.

Let's set max_loops to 5.  Which is more than enough and that
removes the need for messing with the tsc.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/x86/kernel/apic/apic.c |   13 ++-----------
 1 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index c02cc69..d1a2b19 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -51,7 +51,6 @@
 #include <asm/smp.h>
 #include <asm/mce.h>
 #include <asm/kvm_para.h>
-#include <asm/tsc.h>
 
 unsigned int num_processors;
 
@@ -1154,11 +1153,7 @@ void __cpuinit setup_local_APIC(void)
 {
 	unsigned int value, queued;
 	int i, j, acked = 0;
-	unsigned long long tsc = 0, ntsc;
-	long long max_loops = cpu_khz;
-
-	if (cpu_has_tsc)
-		rdtscll(tsc);
+	int max_loops = 5;
 
 	if (disable_apic) {
 		arch_disable_smp_support();
@@ -1229,11 +1224,7 @@ void __cpuinit setup_local_APIC(void)
 			       acked);
 			break;
 		}
-		if (cpu_has_tsc) {
-			rdtscll(ntsc);
-			max_loops = (cpu_khz << 10) - (ntsc - tsc);
-		} else
-			max_loops--;
+		max_loops--;
 	} while (queued && max_loops > 0);
 	WARN_ON(max_loops <= 0);
 
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5
       [not found]                     ` <43F901BD926A4E43B106BF17856F0755E7C3985D@orsmsx508.amr.corp.intel.com>
@ 2010-06-17 17:00                       ` H. Peter Anvin
  0 siblings, 0 replies; 17+ messages in thread
From: H. Peter Anvin @ 2010-06-17 17:00 UTC (permalink / raw)
  To: Pan, Jacob jun
  Cc: Eric W. Biederman, kerstin.jonsson, Andrew Morton, linux-kernel,
	jbohac, Yinghai Lu, mingo, Avi Kivity, trenn

On 06/17/2010 09:52 AM, Pan, Jacob jun wrote:
>> On 06/16/2010 02:11 PM, Pan, Jacob jun wrote:
>>>
>>> W.R.T. the loop limits, is it possible to use a default max_loops
>> value in
>>> case when cpu_khz is not set? The reason is that on Moorestown
>> platform
>>> we need to do an early APIC setup before tsc_init(), so cpu_khz is 0
>> at the
>>> time we setup local APIC. The result is that we hit
>> WARN_ON(max_loops<= 0)
>>> on Moorestown for early APIC setup.
>>>
>>> The early APIC setup is needed because Moorestown does not have a PIT
>> and the
>>> system timer interrupts are routed via IOAPIC.
>>>
>>
>> Can't MRST install a quick ballpark value into cpu_khz?
>>
> yes, we can do that to avoid the warning. the true cpu_khz can then be set
> in tsc_init by platform specific calibration code. That is one option.
> 

Seems like a reasonable thing to do to me.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-06-17 17:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-08 11:17 [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec Thomas Renninger
2010-03-08 11:26 ` Avi Kivity
2010-03-08 11:34   ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V3 Thomas Renninger
2010-03-08 11:26 ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec Thomas Renninger
2010-03-08 11:34 ` Cyrill Gorcunov
2010-03-08 11:40   ` Thomas Renninger
2010-03-08 11:43   ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V4 Thomas Renninger
2010-03-08 16:25     ` Kerstin Jonsson
2010-03-09  9:14     ` kerstin.jonsson
2010-03-09 10:52       ` [PATCH] x86 apic: Ack all pending irqs when crashed/on kexec - V5 Thomas Renninger
2010-03-19  1:18         ` Eric W. Biederman
2010-03-20  6:42           ` Eric W. Biederman
2010-03-22 11:28             ` kerstin.jonsson
2010-03-22 12:23               ` Eric W. Biederman
     [not found]                 ` <43F901BD926A4E43B106BF17856F0755E7C393B9@orsmsx508.amr.corp.intel.com>
2010-06-17  0:19                   ` H. Peter Anvin
2010-06-17  1:51                     ` Eric W. Biederman
     [not found]                     ` <43F901BD926A4E43B106BF17856F0755E7C3985D@orsmsx508.amr.corp.intel.com>
2010-06-17 17:00                       ` H. Peter Anvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.