linux-kernel.vger.kernel.org archive mirror
* [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
@ 2001-01-29 18:46 Maciej W. Rozycki
  2001-01-29 20:40 ` Manfred Spraul
  0 siblings, 1 reply; 12+ messages in thread
From: Maciej W. Rozycki @ 2001-01-29 18:46 UTC (permalink / raw)
  To: Ingo Molnar, Manfred Spraul, Andrew Morton; +Cc: linux-kernel

Hi,

 After extensive testing I concluded the infamous APIC lock-up happens
when a level-triggered interrupt gets masked in an I/O APIC when it's in
the send pending state (bit 12 of the respective interrupt redirection
entry is set).  Under this condition, the interrupt is still posted to
CPUs as set up by the redirection entry but with the trigger mode
incorrectly set to edge.  It's possible that setting the mask bit somehow
corrupts the state of the message in progress.  The I/O APIC still
remembers the interrupt to be level-triggered in the IRR bit so it expects
an EOI message from a local unit.  But no local unit is going to send it. 
The only way to recover from the lock-up of the entry is to program it for
the edge-triggered mode which resets the IRR bit. 
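
The bits named above can be made concrete with a small decoder for the low
32 bits of a redirection entry.  This is only a sketch: the bit positions
follow the 82093AA register layout, but the macro and helper names are
hypothetical and are not taken from io_apic.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 32 bits of an I/O APIC redirection entry (82093AA layout).
 * The names below are illustrative, not kernel identifiers. */
#define RTE_DELIVERY_STATUS (1u << 12) /* 1 = send pending */
#define RTE_POLARITY        (1u << 13) /* 1 = active low */
#define RTE_REMOTE_IRR      (1u << 14) /* 1 = I/O unit is waiting for an EOI */
#define RTE_TRIGGER_LEVEL   (1u << 15) /* 1 = level, 0 = edge */
#define RTE_MASKED          (1u << 16) /* 1 = interrupt masked */

static inline bool rte_send_pending(uint32_t lo) { return (lo & RTE_DELIVERY_STATUS) != 0; }
static inline bool rte_irr_set(uint32_t lo)      { return (lo & RTE_REMOTE_IRR) != 0; }
static inline bool rte_is_level(uint32_t lo)     { return (lo & RTE_TRIGGER_LEVEL) != 0; }
static inline bool rte_is_masked(uint32_t lo)    { return (lo & RTE_MASKED) != 0; }

/* Hypothetical helper: true when setting the mask bit right now would fall
 * into the window described above (level-triggered, send pending, and not
 * yet masked). */
static inline bool rte_mask_is_risky(uint32_t lo)
{
    return rte_is_level(lo) && rte_send_pending(lo) && !rte_is_masked(lo);
}
```

For instance, the value f9a1h that shows up in the dmesg attachment later in
this thread decodes as level-triggered, send pending, with the IRR bit set.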

 It's possible to detect an incorrect trigger mode received at a local
APIC and reset the respective I/O APIC entry in case of a mismatch, but it
proved to be overkill.  Manfred's idea of switching the mode upon
mask/unmask_IO_APIC_irq() costs us only two ALU instructions which is
really cheap, so we can afford having it unconditionally (a mismatch
detection would certainly cost us more, including a costly uncached local
APIC access).
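
The "two ALU instructions" are an AND and an OR applied to the value already
read from the redirection entry.  A minimal sketch, using the same constants
as the patch below; the function names are hypothetical and the register is
modelled as a plain integer rather than a real I/O APIC access:

```c
#include <stdint.h>

/* Mask a level-triggered entry: clear bit 15 (trigger = edge) and set
 * bit 16 (mask = 1).  One AND plus one OR on the value already read. */
static inline uint32_t rte_mask_level(uint32_t reg)
{
    return (reg & 0xffff7fffu) | 0x00010000u;
}

/* Unmask a level-triggered entry: clear bit 16 (mask = 0) and set
 * bit 15 (trigger = level) again. */
static inline uint32_t rte_unmask_level(uint32_t reg)
{
    return (reg & 0xfffeffffu) | 0x00008000u;
}
```

Applied to an unmasked level-triggered entry such as a9a1h, masking yields
129a1h and unmasking that value gives a9a1h back.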

 Following is a modification of Manfred's patch -- the original one was
making two r/w I/O APIC accesses unnecessarily.  It proved to work for me
under heavy interrupt load (I've chosen to set up my 8254 interrupt source
as level-triggered, which is a reliable way to provide lots of interrupts) 
with Andrew's IRQ whacker.  Without the patch I've repeatedly got an APIC
lockup after about 800k interrupts, which, I assume, was the point the
first disable_irq(0) was executed.  I've tested it on a DP i430HX board
equipped with an external 82093AA I/O APIC.

 The patch leaves the focus CPU feature enabled -- it does not seem to
make my system any less stable.  The fact that lock-ups were more likely
with the feature enabled is the result of a higher probability of
delivering consecutive interrupts from the same source to the same CPU,
hence it was also more probable for the interrupt to be in the send
pending state.  Empirical tests confirm this fact -- with the 8254
interrupt set up as level-triggered, both CPUs receive a similar number of
timer interrupts when the focus CPU feature is disabled and one of the
CPUs receives about twice as many interrupts as the other one otherwise. 

 The patch should be safe for any 0x1x or 0x2x version of I/O APIC.  For
the 82489DX APIC the patch is probably harmful -- 82489DX APICs do not
define EOI messages but I/O units send level-deassert messages instead.  A
local unit considers an IRQ asserted as long as it does not receive a
deassert message.  By resetting an IRR bit in an I/O unit we would make it
asserted (almost) infinitely.  Since we are calling
mask/unmask_IO_APIC_irq() mostly indirectly, that should not be a problem
-- we may implement 82489DX-specific versions of the functions and install
them at run-time. 
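
The run-time installation mentioned above could look roughly like the sketch
below.  It assumes that an 82489DX I/O unit reports a version below 0x10;
the structure, the function names and the fake_rte array are hypothetical
stand-ins for the kernel's handler table and for real redirection entries.

```c
#include <stdint.h>

typedef void (*irq_op)(unsigned int irq);

/* Hypothetical table of the three handlers that differ between chips. */
struct level_irq_ops {
    irq_op shutdown;
    irq_op enable;
    irq_op disable;
};

/* Stand-in for the redirection entries; a real driver would go through
 * the I/O APIC index/data registers instead. */
static uint32_t fake_rte[24];

/* Generic versions: touch only the mask bit (safe for the 82489DX). */
static void mask_plain(unsigned int irq)   { fake_rte[irq] |= 0x00010000u; }
static void unmask_plain(unsigned int irq) { fake_rte[irq] &= 0xfffeffffu; }

/* Workaround versions: also flip the trigger mode bit. */
static void mask_level(unsigned int irq)
{
    fake_rte[irq] = (fake_rte[irq] & 0xffff7fffu) | 0x00010000u;
}

static void unmask_level(unsigned int irq)
{
    fake_rte[irq] = (fake_rte[irq] & 0xfffeffffu) | 0x00008000u;
}

/* Pick the handler set once, from the I/O unit's version register. */
static struct level_irq_ops select_level_ops(unsigned int version)
{
    struct level_irq_ops ops = { mask_level, unmask_level, mask_level };

    if (version < 0x10) {   /* assumed to identify an 82489DX */
        ops.shutdown = mask_plain;
        ops.enable   = unmask_plain;
        ops.disable  = mask_plain;
    }
    return ops;
}
```

Since callers reach the functions through the table, the choice is made once
at setup time and costs nothing per interrupt.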

 I'll implement an 82489DX update in a few days, but for now I'd like
everyone interested to test the following patch as much as possible.  It
applies to 2.4.0, 2.4.0-ac12 and 2.4.1-pre11 cleanly.

 Happy testing,

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

patch-2.4.0-io_apic-2
diff -up --recursive --new-file linux-2.4.0.macro/arch/i386/kernel/apic.c linux-2.4.0/arch/i386/kernel/apic.c
--- linux-2.4.0.macro/arch/i386/kernel/apic.c	Wed Dec 13 23:54:27 2000
+++ linux-2.4.0/arch/i386/kernel/apic.c	Sun Jan 28 08:58:02 2001
@@ -270,7 +270,7 @@ void __init setup_local_APIC (void)
 	 *   PCI Ne2000 networking cards and PII/PIII processors, dual
 	 *   BX chipset. ]
 	 */
-#if 0
+#if 1
 	/* Enable focus processor (bit==0) */
 	value &= ~(1<<9);
 #else
diff -up --recursive --new-file linux-2.4.0.macro/arch/i386/kernel/io_apic.c linux-2.4.0/arch/i386/kernel/io_apic.c
--- linux-2.4.0.macro/arch/i386/kernel/io_apic.c	Thu Oct  5 21:08:17 2000
+++ linux-2.4.0/arch/i386/kernel/io_apic.c	Sun Jan 28 08:58:02 2001
@@ -122,8 +122,25 @@ static void add_pin_to_irq(unsigned int 
 	static void name##_IO_APIC_irq (unsigned int irq)		\
 	__DO_ACTION(R, ACTION, FINAL)
 
-DO_ACTION( __mask,    0, |= 0x00010000, io_apic_sync(entry->apic))/* mask = 1 */
-DO_ACTION( __unmask,  0, &= 0xfffeffff, )				/* mask = 0 */
+/*
+ * It appears there is an erratum which affects at least the 82093AA
+ * I/O APIC.  If a level-triggered interrupt input is being masked in
+ * the redirection entry while the interrupt is send pending (its
+ * delivery status bit is set), the interrupt is erroneously
+ * delivered as edge-triggered but the IRR bit gets set nevertheless.
+ * As a result the I/O unit expects an EOI message but it will never
+ * arrive and further interrupts are blocked for the source.
+ *
+ * A workaround is to set the trigger mode to edge when masking
+ * a level-triggered interrupt and to revert the mode when unmasking.
+ * The idea is from Manfred Spraul.
+ */
+DO_ACTION( __mask,         0, = (reg & 0xffff7fff) | 0x00010000,
+	io_apic_sync(entry->apic))		/* mask = 1, trigger = edge */
+DO_ACTION( __unmask_edge,  0, &= 0xfffeffff,
+	)					/* mask = 0 */
+DO_ACTION( __unmask_level, 0, = (reg & 0xfffeffff) | 0x00008000,
+	)					/* mask = 0, trigger = level */
 
 static void mask_IO_APIC_irq (unsigned int irq)
 {
@@ -134,15 +151,26 @@ static void mask_IO_APIC_irq (unsigned i
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
-static void unmask_IO_APIC_irq (unsigned int irq)
+static void unmask_edge_IO_APIC_irq (unsigned int irq)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&ioapic_lock, flags);
-	__unmask_IO_APIC_irq(irq);
+	__unmask_edge_IO_APIC_irq(irq);
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+static void unmask_level_IO_APIC_irq (unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	__unmask_level_IO_APIC_irq(irq);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+#define unmask_IO_APIC_irq unmask_edge_IO_APIC_irq
+
 void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 {
 	struct IO_APIC_route_entry entry;
@@ -1116,7 +1144,7 @@ static int __init nmi_irq_works(void)
  * that was delayed but this is now handled in the device
  * independent code.
  */
-#define enable_edge_ioapic_irq unmask_IO_APIC_irq
+#define enable_edge_ioapic_irq unmask_edge_IO_APIC_irq
 
 static void disable_edge_ioapic_irq (unsigned int irq) { /* nothing */ }
 
@@ -1141,7 +1169,7 @@ static unsigned int startup_edge_ioapic_
 		if (i8259A_irq_pending(irq))
 			was_pending = 1;
 	}
-	__unmask_IO_APIC_irq(irq);
+	__unmask_edge_IO_APIC_irq(irq);
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 
 	return was_pending;
@@ -1181,13 +1209,13 @@ static void end_edge_ioapic_irq (unsigne
  */
 static unsigned int startup_level_ioapic_irq (unsigned int irq)
 {
-	unmask_IO_APIC_irq(irq);
+	unmask_level_IO_APIC_irq(irq);
 
 	return 0; /* don't check for pending */
 }
 
 #define shutdown_level_ioapic_irq	mask_IO_APIC_irq
-#define enable_level_ioapic_irq		unmask_IO_APIC_irq
+#define enable_level_ioapic_irq		unmask_level_IO_APIC_irq
 #define disable_level_ioapic_irq	mask_IO_APIC_irq
 
 static void end_level_ioapic_irq (unsigned int i)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-01-29 18:46 [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups Maciej W. Rozycki
@ 2001-01-29 20:40 ` Manfred Spraul
  2001-01-30 11:20   ` Maciej W. Rozycki
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2001-01-29 20:40 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

"Maciej W. Rozycki" wrote:
> 
>  I'll implement an 82489DX update in a few days, but for now I'd like
> everyone interested to test the following patch as much as possible.  It
> applies to 2.4.0, 2.4.0-ac12 and 2.4.1-pre11 cleanly.
>
I'm not totally convinced that this fixes all problems:

No lockup, but a slightly increased packet loss: every few minutes a
block of 5-10 packets is lost. Cpu load is low (~30%), I'm running 3
concurrent bw_tcp, the io apic computer is the 'server'.

IIRC my original patch caused a far higher packet loss, perhaps because
it's slower? (you wrote something about 2 r/w accesses).

Are you sure that this really fixed the bug?
Remember that switching the 'trigger mode' bit will revive the io apic.

It's possible that 
* the io apic still locks up.
* but now {en,dis}able_irq() switches the 'trigger mode' bit, and thus
resets the io apic after a few msec --> a few lost packets.

It's far better than before, but I assume the bug is hidden, not fixed.

I'll make additional tests.

Send the patch to Linus - it makes ne2k cards usable with 2.4+io apic.

--
	Manfred

* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-01-29 20:40 ` Manfred Spraul
@ 2001-01-30 11:20   ` Maciej W. Rozycki
  2001-02-01  0:58     ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Maciej W. Rozycki @ 2001-01-30 11:20 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Mon, 29 Jan 2001, Manfred Spraul wrote:

> I'm not totally convinced that this fixes all problems:
> 
> No lockup, but a slightly increased packet loss: every few minutes a
> block of 5-10 packets is lost. Cpu load is low (~30%), I'm running 3
> concurrent bw_tcp, the io apic computer is the 'server'.

 I assume you mean "compared to the noapic mode", right?

> IIRC my original patch caused a far higher packet loss, perhaps because
> it's slower? (you wrote something about 2 r/w accesses).

  Hmm, it's possible that the IRR bit is not always cleared, but I cannot
imagine any reason at the moment.  For your patch it might be the window
between I/O APIC writes and not the prolonged execution time which is the
reason.

> Are you sure that this really fixed the bug?
> Remember that switching the 'trigger mode' bit will revive the io apic.

 That has been observed, but does it happen every time?  I hope so, but no
proof.

> It's possible that 
> * the io apic still locks up.
> * but now {en,dis}able_irq() switches the 'trigger mode' bit, and thus
> resets the io apic after a few msec --> a few lost packets.

 That should be detectable, I'll prepare some code and see if this
happens.

> It's far better than before, but I assume the bug is hidden, not fixed.

 To fix the bug we'd have to modify the silicon.  It's not feasible at
this time, so we can only write worse or better workarounds, i.e. hide the
bug. 

> Send the patch to Linus - it makes ne2k cards usable with 2.4+io apic.

 Following is the 82489DX-ized version of the patch.  I believe it's fine,
but I would feel safer if others test it before I send it to Linus.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

patch-2.4.0-io_apic-4
diff -up --recursive --new-file linux-2.4.0.macro/arch/i386/kernel/apic.c linux-2.4.0/arch/i386/kernel/apic.c
--- linux-2.4.0.macro/arch/i386/kernel/apic.c	Wed Dec 13 23:54:27 2000
+++ linux-2.4.0/arch/i386/kernel/apic.c	Sun Jan 28 08:58:02 2001
@@ -270,7 +270,7 @@ void __init setup_local_APIC (void)
 	 *   PCI Ne2000 networking cards and PII/PIII processors, dual
 	 *   BX chipset. ]
 	 */
-#if 0
+#if 1
 	/* Enable focus processor (bit==0) */
 	value &= ~(1<<9);
 #else
diff -up --recursive --new-file linux-2.4.0.macro/arch/i386/kernel/io_apic.c linux-2.4.0/arch/i386/kernel/io_apic.c
--- linux-2.4.0.macro/arch/i386/kernel/io_apic.c	Thu Oct  5 21:08:17 2000
+++ linux-2.4.0/arch/i386/kernel/io_apic.c	Tue Jan 30 07:49:01 2001
@@ -122,8 +122,27 @@ static void add_pin_to_irq(unsigned int 
 	static void name##_IO_APIC_irq (unsigned int irq)		\
 	__DO_ACTION(R, ACTION, FINAL)
 
-DO_ACTION( __mask,    0, |= 0x00010000, io_apic_sync(entry->apic))/* mask = 1 */
-DO_ACTION( __unmask,  0, &= 0xfffeffff, )				/* mask = 0 */
+/*
+ * It appears there is an erratum which affects at least the 82093AA
+ * I/O APIC.  If a level-triggered interrupt input is being masked in
+ * the redirection entry while the interrupt is send pending (its
+ * delivery status bit is set), the interrupt is erroneously
+ * delivered as edge-triggered but the IRR bit gets set nevertheless.
+ * As a result the I/O unit expects an EOI message but it will never
+ * arrive and further interrupts are blocked for the source.
+ *
+ * A workaround is to set the trigger mode to edge when masking
+ * a level-triggered interrupt and to revert the mode when unmasking.
+ * The idea is from Manfred Spraul.  --macro
+ */
+DO_ACTION( __mask,         0, |= 0x00010000,
+	)					/* mask = 1 */
+DO_ACTION( __unmask,       0, &= 0xfffeffff,
+	io_apic_sync(entry->apic))		/* mask = 0 */
+DO_ACTION( __mask_level,   0, = (reg & 0xffff7fff) | 0x00010000,
+	io_apic_sync(entry->apic))		/* mask = 1, trigger = edge */
+DO_ACTION( __unmask_level, 0, = (reg & 0xfffeffff) | 0x00008000,
+	)					/* mask = 0, trigger = level */
 
 static void mask_IO_APIC_irq (unsigned int irq)
 {
@@ -143,6 +162,24 @@ static void unmask_IO_APIC_irq (unsigned
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+static void mask_level_IO_APIC_irq (unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	__mask_level_IO_APIC_irq(irq);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+static void unmask_level_IO_APIC_irq (unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	__unmask_level_IO_APIC_irq(irq);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
 void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 {
 	struct IO_APIC_route_entry entry;
@@ -1181,14 +1218,18 @@ static void end_edge_ioapic_irq (unsigne
  */
 static unsigned int startup_level_ioapic_irq (unsigned int irq)
 {
-	unmask_IO_APIC_irq(irq);
+	unmask_level_IO_APIC_irq(irq);
 
 	return 0; /* don't check for pending */
 }
 
-#define shutdown_level_ioapic_irq	mask_IO_APIC_irq
-#define enable_level_ioapic_irq		unmask_IO_APIC_irq
-#define disable_level_ioapic_irq	mask_IO_APIC_irq
+#define shutdown_level_ioapic_irq	mask_level_IO_APIC_irq
+#define enable_level_ioapic_irq		unmask_level_IO_APIC_irq
+#define disable_level_ioapic_irq	mask_level_IO_APIC_irq
+
+#define shutdown_level_82489dx_irq	mask_IO_APIC_irq
+#define enable_level_82489dx_irq	unmask_IO_APIC_irq
+#define disable_level_82489dx_irq	mask_IO_APIC_irq
 
 static void end_level_ioapic_irq (unsigned int i)
 {
@@ -1503,6 +1544,27 @@ static inline void check_timer(void)
 }
 
 /*
+ * We can't set the trigger mode to edge when masking a
+ * level-triggered interrupt in the 82489DX I/O APIC as
+ * no deassert message will be sent in this case and a
+ * local APIC may keep delivering the interrupt to a CPU.
+ * Hence we substitute generic versions for affected
+ * handlers.
+ */
+
+static inline void setup_IO_APIC_irq_handlers(void)
+{
+	struct IO_APIC_reg_01 reg_01;
+
+	*(int *)&reg_01 = io_apic_read(0, 1);
+	if (reg_01.version < 0x10) {
+		ioapic_level_irq_type.shutdown = shutdown_level_82489dx_irq;
+		ioapic_level_irq_type.enable = enable_level_82489dx_irq;
+		ioapic_level_irq_type.disable = disable_level_82489dx_irq;
+	}
+}
+
+/*
  *
  * IRQ's that are handled by the old PIC in all cases:
  * - IRQ2 is the cascade IRQ, and cannot be a io-apic IRQ.
@@ -1520,6 +1582,8 @@ static inline void check_timer(void)
 
 void __init setup_IO_APIC(void)
 {
+	setup_IO_APIC_irq_handlers();
+
 	enable_IO_APIC();
 
 	io_apic_irqs = ~PIC_IRQS;


* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-01-30 11:20   ` Maciej W. Rozycki
@ 2001-02-01  0:58     ` Andrew Morton
  2001-02-02 12:08       ` Maciej W. Rozycki
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2001-02-01  0:58 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Manfred Spraul, Ingo Molnar, linux-kernel

"Maciej W. Rozycki" wrote:
> 
>  Following is the 82489DX-ized version of the patch.  I believe it's fine,
> but I would feel safer if others test it before I send it to Linus.

Your latest patch passes all my testing.

2.4.1+irq-whacker+netperf:        APIC dies instantly
2.4.1+irq-whacker+netperf+patch:  8 million interrupts, then I got bored.


* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-02-01  0:58     ` Andrew Morton
@ 2001-02-02 12:08       ` Maciej W. Rozycki
  2001-02-02 18:12         ` Gérard Roudier
  0 siblings, 1 reply; 12+ messages in thread
From: Maciej W. Rozycki @ 2001-02-02 12:08 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton; +Cc: Manfred Spraul, Ingo Molnar, linux-kernel

On Thu, 1 Feb 2001, Andrew Morton wrote:

> Your latest patch passes all my testing.
> 
> 2.4.1+irq-whacker+netperf:        APIC dies instantly
> 2.4.1+irq-whacker+netperf+patch:  8 million interrupts, then I got bored.

 Linus, would you please apply the following patch for 2.4.2?  The idea of
operation is described in the comment below.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

patch-2.4.0-io_apic-4
diff -up --recursive --new-file linux-2.4.0.macro/arch/i386/kernel/apic.c linux-2.4.0/arch/i386/kernel/apic.c
--- linux-2.4.0.macro/arch/i386/kernel/apic.c	Wed Dec 13 23:54:27 2000
+++ linux-2.4.0/arch/i386/kernel/apic.c	Sun Jan 28 08:58:02 2001
@@ -270,7 +270,7 @@ void __init setup_local_APIC (void)
 	 *   PCI Ne2000 networking cards and PII/PIII processors, dual
 	 *   BX chipset. ]
 	 */
-#if 0
+#if 1
 	/* Enable focus processor (bit==0) */
 	value &= ~(1<<9);
 #else
diff -up --recursive --new-file linux-2.4.0.macro/arch/i386/kernel/io_apic.c linux-2.4.0/arch/i386/kernel/io_apic.c
--- linux-2.4.0.macro/arch/i386/kernel/io_apic.c	Thu Oct  5 21:08:17 2000
+++ linux-2.4.0/arch/i386/kernel/io_apic.c	Tue Jan 30 07:49:01 2001
@@ -122,8 +122,27 @@ static void add_pin_to_irq(unsigned int 
 	static void name##_IO_APIC_irq (unsigned int irq)		\
 	__DO_ACTION(R, ACTION, FINAL)
 
-DO_ACTION( __mask,    0, |= 0x00010000, io_apic_sync(entry->apic))/* mask = 1 */
-DO_ACTION( __unmask,  0, &= 0xfffeffff, )				/* mask = 0 */
+/*
+ * It appears there is an erratum which affects at least the 82093AA
+ * I/O APIC.  If a level-triggered interrupt input is being masked in
+ * the redirection entry while the interrupt is send pending (its
+ * delivery status bit is set), the interrupt is erroneously
+ * delivered as edge-triggered but the IRR bit gets set nevertheless.
+ * As a result the I/O unit expects an EOI message but it will never
+ * arrive and further interrupts are blocked for the source.
+ *
+ * A workaround is to set the trigger mode to edge when masking
+ * a level-triggered interrupt and to revert the mode when unmasking.
+ * The idea is from Manfred Spraul.  --macro
+ */
+DO_ACTION( __mask,         0, |= 0x00010000,
+	)					/* mask = 1 */
+DO_ACTION( __unmask,       0, &= 0xfffeffff,
+	io_apic_sync(entry->apic))		/* mask = 0 */
+DO_ACTION( __mask_level,   0, = (reg & 0xffff7fff) | 0x00010000,
+	io_apic_sync(entry->apic))		/* mask = 1, trigger = edge */
+DO_ACTION( __unmask_level, 0, = (reg & 0xfffeffff) | 0x00008000,
+	)					/* mask = 0, trigger = level */
 
 static void mask_IO_APIC_irq (unsigned int irq)
 {
@@ -143,6 +162,24 @@ static void unmask_IO_APIC_irq (unsigned
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
+static void mask_level_IO_APIC_irq (unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	__mask_level_IO_APIC_irq(irq);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+static void unmask_level_IO_APIC_irq (unsigned int irq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	__unmask_level_IO_APIC_irq(irq);
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
 void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 {
 	struct IO_APIC_route_entry entry;
@@ -1181,14 +1218,18 @@ static void end_edge_ioapic_irq (unsigne
  */
 static unsigned int startup_level_ioapic_irq (unsigned int irq)
 {
-	unmask_IO_APIC_irq(irq);
+	unmask_level_IO_APIC_irq(irq);
 
 	return 0; /* don't check for pending */
 }
 
-#define shutdown_level_ioapic_irq	mask_IO_APIC_irq
-#define enable_level_ioapic_irq		unmask_IO_APIC_irq
-#define disable_level_ioapic_irq	mask_IO_APIC_irq
+#define shutdown_level_ioapic_irq	mask_level_IO_APIC_irq
+#define enable_level_ioapic_irq		unmask_level_IO_APIC_irq
+#define disable_level_ioapic_irq	mask_level_IO_APIC_irq
+
+#define shutdown_level_82489dx_irq	mask_IO_APIC_irq
+#define enable_level_82489dx_irq	unmask_IO_APIC_irq
+#define disable_level_82489dx_irq	mask_IO_APIC_irq
 
 static void end_level_ioapic_irq (unsigned int i)
 {
@@ -1503,6 +1544,27 @@ static inline void check_timer(void)
 }
 
 /*
+ * We can't set the trigger mode to edge when masking a
+ * level-triggered interrupt in the 82489DX I/O APIC as
+ * no deassert message will be sent in this case and a
+ * local APIC may keep delivering the interrupt to a CPU.
+ * Hence we substitute generic versions for affected
+ * handlers.
+ */
+
+static inline void setup_IO_APIC_irq_handlers(void)
+{
+	struct IO_APIC_reg_01 reg_01;
+
+	*(int *)&reg_01 = io_apic_read(0, 1);
+	if (reg_01.version < 0x10) {
+		ioapic_level_irq_type.shutdown = shutdown_level_82489dx_irq;
+		ioapic_level_irq_type.enable = enable_level_82489dx_irq;
+		ioapic_level_irq_type.disable = disable_level_82489dx_irq;
+	}
+}
+
+/*
  *
  * IRQ's that are handled by the old PIC in all cases:
  * - IRQ2 is the cascade IRQ, and cannot be a io-apic IRQ.
@@ -1520,6 +1582,8 @@ static inline void check_timer(void)
 
 void __init setup_IO_APIC(void)
 {
+	setup_IO_APIC_irq_handlers();
+
 	enable_IO_APIC();
 
 	io_apic_irqs = ~PIC_IRQS;


* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-02-02 12:08       ` Maciej W. Rozycki
@ 2001-02-02 18:12         ` Gérard Roudier
  2001-02-02 22:17           ` Manfred Spraul
  0 siblings, 1 reply; 12+ messages in thread
From: Gérard Roudier @ 2001-02-02 18:12 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Linus Torvalds, Andrew Morton, Manfred Spraul, Ingo Molnar, linux-kernel



On Fri, 2 Feb 2001, Maciej W. Rozycki wrote:

> On Thu, 1 Feb 2001, Andrew Morton wrote:

> +/*
> + * It appears there is an erratum which affects at least the 82093AA
> + * I/O APIC.  If a level-triggered interrupt input is being masked in
> + * the redirection entry while the interrupt is send pending (its
> + * delivery status bit is set), the interrupt is erroneously
> + * delivered as edge-triggered but the IRR bit gets set nevertheless.
> + * As a result the I/O unit expects an EOI message but it will never
> + * arrive and further interrupts are blocked for the source.
> + *
> + * A workaround is to set the trigger mode to edge when masking
> + * a level-triggered interrupt and to revert the mode when unmasking.
> + * The idea is from Manfred Spraul.  --macro
> + */

Is the below idea feasible or just stupid:

Once a level-sensitive interrupt has been accepted by a local APIC, the IO
APIC will wait for the EOI message prior to delivering this interrupt
again. Therefore masking a level-sensitive interrupt once it has been
delivered and prior to EOIing it should not race with the APIC hardware.

So, why not use a pure software flag in memory and only tamper with
things if the offending interrupt is actually delivered ? If the given
interrupt is delivered and the software mask is set we could simply do:

- MASK the given interrupt
- EOI it.
- return
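
The three steps above can be sketched as a small software model.  Everything
in it is hypothetical: the structure, the field names and the functions only
illustrate the control flow of a deferred mask, and no real APIC register is
touched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interrupt state for a software-only disable. */
struct soft_irq {
    bool soft_disabled;   /* pure software flag, set by soft_disable() */
    bool hw_masked;       /* models the mask bit in the redirection entry */
    unsigned int handled; /* times the real handler would have run */
    unsigned int eoi_count;
};

/* Disabling only sets the flag; the hardware is left alone. */
static void soft_disable(struct soft_irq *s)
{
    s->soft_disabled = true;
}

/* Enabling clears the flag and, if the mask was applied meanwhile,
 * removes it again. */
static void soft_enable(struct soft_irq *s)
{
    s->soft_disabled = false;
    s->hw_masked = false;
}

/* Called when the interrupt is actually delivered to a CPU.
 * Returns true if the device handler should run. */
static bool on_delivery(struct soft_irq *s)
{
    if (s->soft_disabled) {
        s->hw_masked = true;  /* MASK the given interrupt */
        s->eoi_count++;       /* EOI it */
        return false;         /* return without calling the handler */
    }
    s->handled++;
    s->eoi_count++;
    return true;
}
```

The model only shows the intended ordering (mask after delivery, before the
EOI); it says nothing about how the I/O APIC behaves, and the follow-up in
this thread reports that a real implementation of the idea still locked up.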

  Gérard.


* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-02-02 18:12         ` Gérard Roudier
@ 2001-02-02 22:17           ` Manfred Spraul
  2001-02-03 10:28             ` Gérard Roudier
  2001-02-03 12:14             ` Manfred Spraul
  0 siblings, 2 replies; 12+ messages in thread
From: Manfred Spraul @ 2001-02-02 22:17 UTC (permalink / raw)
  To: Gérard Roudier
  Cc: Maciej W. Rozycki, Linus Torvalds, Andrew Morton, Ingo Molnar,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1070 bytes --]

Gérard Roudier wrote:
> 
> So, why not use a pure software flag in memory and only tamper with
> things if the offending interrupt is actually delivered ? If the given
> interrupt is delivered and the software mask is set we could simply do:
> 
> - MASK the given interrupt
> - EOI it.
> - return
>
Good idea.
I implemented it, and it was a full success: now it always locks up :-(

If I apply the attached patch, then I get a lockup after ~100 packets
of flood ping.
I've also attached the dmesg print.
I know that startup is currently wrong (must set trigger to level), but
that doesn't matter since I only ifup once.

But I think we can change the bug description:

If an io apic io redirection entry is unmasked while the irq pin is
active, then the io apic sends out the interrupt as edge triggered, but
nevertheless sets the IRR bit.

In a second test run I checked the TMR bit in the local apics: the bit
on the cpu that received the last interrupt is really 0.

I'll now try a 2 step enable:
* unmask.
* io_apic_sync()
* set trigger mode to level.
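
A minimal sketch of such a 2 step enable, assuming the entry was masked with
its trigger mode set to edge.  The fake_ioapic structure and its helpers are
hypothetical; fake_sync() merely stands in for io_apic_sync(), and whether
the sequence avoids the lock-up on real hardware is exactly what remains to
be tested.

```c
#include <stdint.h>

#define MAX_WRITES 4

/* Hypothetical model of one redirection entry plus a log of writes. */
struct fake_ioapic {
    uint32_t rte;               /* low 32 bits of the entry */
    uint32_t log[MAX_WRITES];   /* values written, in order */
    unsigned int nwrites;
    unsigned int nsyncs;
};

static void fake_write(struct fake_ioapic *a, uint32_t value)
{
    if (a->nwrites < MAX_WRITES)
        a->log[a->nwrites] = value;
    a->nwrites++;
    a->rte = value;
}

/* Stand-in for io_apic_sync(): here it only counts the calls. */
static void fake_sync(struct fake_ioapic *a)
{
    a->nsyncs++;
}

static void two_step_enable(struct fake_ioapic *a)
{
    /* Step 1: unmask, leaving the trigger mode untouched (still edge). */
    fake_write(a, a->rte & 0xfffeffffu);
    /* Step 2: make sure the first write has reached the I/O APIC. */
    fake_sync(a);
    /* Step 3: only now switch the trigger mode back to level. */
    fake_write(a, a->rte | 0x00008000u);
}
```

Starting from 129a1h, the model logs the writes 29a1h and then a9a1h, with
one sync in between.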

--
	Manfred

[-- Attachment #2: patch-ger2 --]
[-- Type: text/plain, Size: 4878 bytes --]

--- 2.4/arch/i386/kernel/io_apic.c	Fri Feb  2 15:51:57 2001
+++ build-2.4/arch/i386/kernel/io_apic.c	Fri Feb  2 23:04:44 2001
@@ -115,10 +115,10 @@
 
 /*
  * It appears there is an erratum which affects at least the 82093AA
- * I/O APIC.  If a level-triggered interrupt input is being masked in
- * the redirection entry while the interrupt is send pending (its
- * delivery status bit is set), the interrupt is erroneously
- * delivered as edge-triggered but the IRR bit gets set nevertheless.
+ * I/O APIC.  If a level-triggered interrupt input is being unmasked in
+ * the redirection entry while the interrupt line is active, then
+ * the interrupt is erroneously delivered as edge-triggered but the
+ * IRR bit gets set nevertheless.
  * As a result the I/O unit expects an EOI message but it will never
  * arrive and further interrupts are blocked for the source.
  *
@@ -126,12 +126,8 @@
  * a level-triggered interrupt and to revert the mode when unmasking.
  * The idea is from Manfred Spraul.
  */
-DO_ACTION( __mask,         0, = (reg & 0xffff7fff) | 0x00010000,
-	io_apic_sync(entry->apic))		/* mask = 1, trigger = edge */
-DO_ACTION( __unmask_edge,  0, &= 0xfffeffff,
-	)					/* mask = 0 */
-DO_ACTION( __unmask_level, 0, = (reg & 0xfffeffff) | 0x00008000,
-	)					/* mask = 0, trigger = level */
+DO_ACTION( __mask,         0, = (reg & 0xffff7fff) | 0x00010000, io_apic_sync(entry->apic))	/* mask = 1 */
+DO_ACTION( __unmask,  0, &= 0xfffeffff, )				/* mask = 0 */
 
 static void mask_IO_APIC_irq (unsigned int irq)
 {
@@ -142,26 +138,15 @@
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
-static void unmask_edge_IO_APIC_irq (unsigned int irq)
+static void unmask_IO_APIC_irq (unsigned int irq)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&ioapic_lock, flags);
-	__unmask_edge_IO_APIC_irq(irq);
+	__unmask_IO_APIC_irq(irq);
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 }
 
-static void unmask_level_IO_APIC_irq (unsigned int irq)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&ioapic_lock, flags);
-	__unmask_level_IO_APIC_irq(irq);
-	spin_unlock_irqrestore(&ioapic_lock, flags);
-}
-
-#define unmask_IO_APIC_irq unmask_edge_IO_APIC_irq
-
 void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
 {
 	struct IO_APIC_route_entry entry;
@@ -712,7 +697,7 @@
 	printk(KERN_WARNING "          to linux-smp@vger.kernel.org\n");
 }
 
-void __init print_IO_APIC(void)
+void print_IO_APIC(void)
 {
 	int apic, i;
 	struct IO_APIC_reg_00 reg_00;
@@ -1117,7 +1102,7 @@
  * that was delayed but this is now handled in the device
  * independent code.
  */
-#define enable_edge_ioapic_irq unmask_edge_IO_APIC_irq
+#define enable_edge_ioapic_irq unmask_IO_APIC_irq
 
 static void disable_edge_ioapic_irq (unsigned int irq) { /* nothing */ }
 
@@ -1142,7 +1127,7 @@
 		if (i8259A_irq_pending(irq))
 			was_pending = 1;
 	}
-	__unmask_edge_IO_APIC_irq(irq);
+	__unmask_IO_APIC_irq(irq);
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 
 	return was_pending;
@@ -1182,21 +1167,66 @@
  */
 static unsigned int startup_level_ioapic_irq (unsigned int irq)
 {
-	unmask_level_IO_APIC_irq(irq);
+	unmask_IO_APIC_irq(irq);
 
 	return 0; /* don't check for pending */
 }
 
 #define shutdown_level_ioapic_irq	mask_IO_APIC_irq
-#define enable_level_ioapic_irq		unmask_level_IO_APIC_irq
-#define disable_level_ioapic_irq	mask_IO_APIC_irq
+
+static void enable_level_ioapic_irq (unsigned int i)
+{
+	unsigned long flags;
+	int pin;
+	struct irq_pin_list *entry;
+
+	spin_lock_irqsave(&ioapic_lock, flags);
+	entry = irq_2_pin + i;
+
+	for (;;) {
+		unsigned int reg, reg2;
+		pin = entry->pin;
+		if (pin == -1)
+			break;
+		reg = io_apic_read(entry->apic, 0x10 + pin*2);
+		if(!(reg & 0x10000)) {
+			printk(KERN_DEBUG "quick enable_level_ioapic_irq for %d.\n",i);
+			break;
+		}
+		reg &= ~(0x10000);
+		reg |= 0x8000;
+		printk(KERN_DEBUG "physically reenabling int %d, writing %lxh.\n",i, reg);
+reg2 = io_apic_read(entry->apic, 0x10 + pin*2);
+		io_apic_modify(entry->apic, reg);
+printk(KERN_DEBUG "overwriting %xh.\n",reg2);
+		io_apic_sync(entry->apic);
+reg2 = io_apic_read(entry->apic, 0x10 + pin*2);
+printk(KERN_DEBUG "new val %xh.\n",reg2);
+		if (!entry->next)
+			break;
+		entry = irq_2_pin + entry->next;
+	}
+
+	spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+static void disable_level_ioapic_irq (unsigned int i) { /* delayed */ }
 
 static void end_level_ioapic_irq (unsigned int i)
 {
 	ack_APIC_irq();
 }
 
-static void mask_and_ack_level_ioapic_irq (unsigned int i) { /* nothing */ }
+static void mask_and_ack_level_ioapic_irq (unsigned int i)
+{
+	if(i == 16)
+		printk(KERN_DEBUG "irq 0x16 seen on cpu %d.\n",
+			smp_processor_id());
+	if (irq_desc[i].status & IRQ_DISABLED) {
+		mask_IO_APIC_irq(i);
+		printk(KERN_DEBUG "physically disabling int %d.\n",i);
+	}
+}
 
 static void set_ioapic_affinity (unsigned int irq, unsigned long mask)
 {

[-- Attachment #3: dmesg-ger2 --]
[-- Type: text/plain, Size: 847 bytes --]

quick enable_level_ioapic_irq for 16.
irq 0x16 seen on cpu 1.
quick enable_level_ioapic_irq for 16.
irq 0x16 seen on cpu 0.
irq 0x16 seen on cpu 1.
quick enable_level_ioapic_irq for 16.
irq 0x16 seen on cpu 0.
irq 0x16 seen on cpu 1.
irq 0x16 seen on cpu 0.
quick enable_level_ioapic_irq for 16.
irq 0x16 seen on cpu 1.
physically disabling int 16.
physically reenabling int 16, writing a9a1h.
overwriting 129a1h.
new val f9a1h.
irq 0x16 seen on cpu 1.
quick enable_level_ioapic_irq for 16.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=37.
quick enable_level_ioapic_irq for 16.
quick enable_level_ioapic_irq for 16.
quick enable_level_ioapic_irq for 16.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=40.
quick enable_level_ioapic_irq for 16.

* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-02-02 22:17           ` Manfred Spraul
@ 2001-02-03 10:28             ` Gérard Roudier
  2001-02-03 14:44               ` [test patch] reliable apic lockup with one enable/disable_irq() Manfred
  2001-02-05 10:32               ` [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups Maciej W. Rozycki
  2001-02-03 12:14             ` Manfred Spraul
  1 sibling, 2 replies; 12+ messages in thread
From: Gérard Roudier @ 2001-02-03 10:28 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Maciej W. Rozycki, Linus Torvalds, Andrew Morton, Ingo Molnar,
	linux-kernel



On Fri, 2 Feb 2001, Manfred Spraul wrote:

> Gérard Roudier wrote:
> > 
> > So why not use a pure software flag in memory and only tamper with
> > things if the offending interrupt is actually delivered? If the given
> > interrupt is delivered and the software mask is set, we could simply do:
> > 
> > - MASK the given interrupt
> > - EOI it.
> > - return
> >
> Good idea.
> I implemented it, and it was a full success: now it always locks up :-(
> 
> If I apply the attached patch, then I get a lockup after ~ 100 packets
> flood ping.
> I've also attached the dmesg print.
> I know that startup is currently wrong (must set trigger to level), but
> that doesn't matter since I only ifup once.
> 
> But I think we can change the bug description:
> 
> If an I/O APIC redirection entry is unmasked while the irq pin is
> active, then the I/O APIC sends out the interrupt as edge-triggered, but
> nevertheless sets the IRR bit.

Interesting.

My little finger tells me that O/Ses that thread interrupts might well
want to rely on MASK + EOI in order to quiesce incoming level-sensitive
interrupts.
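
The software-flag scheme quoted above can be sketched as a small user-space
model. This is only an illustration under stated assumptions: struct sim_irq
and the sim_* functions are hypothetical names, not kernel code, and the model
knows nothing about the real I/O APIC. It only shows the ordering: disable sets
a flag in memory, the mask bit is touched only when an interrupt actually
arrives while the flag is set, and an EOI is sent either way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct sim_irq {
	bool sw_disabled;	/* software flag, in the role of IRQ_DISABLED */
	bool hw_masked;		/* mask bit of the redirection entry */
	unsigned handled;	/* how often the handler actually ran */
	unsigned eoi_count;	/* how many EOIs were sent */
};

/* Disabling is delayed: only the flag in memory changes. */
void sim_disable_irq(struct sim_irq *irq)
{
	irq->sw_disabled = true;
}

/* Enabling touches the hardware only if the line really got masked. */
void sim_enable_irq(struct sim_irq *irq)
{
	irq->sw_disabled = false;
	if (irq->hw_masked)
		irq->hw_masked = false;
}

/*
 * One assertion of the line.  Returns false if the mask kept the interrupt
 * from being delivered.  A delivered interrupt is either handled or, if the
 * software flag is set, masked; it gets its EOI in both cases.
 */
bool sim_deliver(struct sim_irq *irq)
{
	if (irq->hw_masked)
		return false;
	if (irq->sw_disabled)
		irq->hw_masked = true;	/* MASK */
	else
		irq->handled++;
	irq->eoi_count++;		/* EOI */
	return true;
}
```

With a zero-initialized struct sim_irq, calling sim_disable_irq() and then
sim_deliver() leaves the line masked with one EOI sent and the handler never
run; a later sim_enable_irq() clears the mask again.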

Note that tampering with the I/O APIC after initialization looks extremely
ugly to me. In my opinion, only the local APIC was intended by Intel's
designers to be accessed by the CPU after initialization (I may be wrong
here).

> In a second test run I checked the TMR bit in the local apics: the bit
> on the cpu that received the last interrupt is really 0.
> 
> I'll now try a 2-step enable:
> * unmask.
> * io_apic_sync()
> * set trigger mode to level.

Thanks for having tried my suggestion.

  Gérard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-02-02 22:17           ` Manfred Spraul
  2001-02-03 10:28             ` Gérard Roudier
@ 2001-02-03 12:14             ` Manfred Spraul
  2001-02-05 11:38               ` Maciej W. Rozycki
  1 sibling, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2001-02-03 12:14 UTC (permalink / raw)
  To: Gérard Roudier, Maciej W. Rozycki, Linus Torvalds,
	Andrew Morton, Ingo Molnar, linux-kernel

Manfred Spraul wrote:
> 
> But I think we can change the bug description:
> 
> If an I/O APIC redirection entry is unmasked while the irq pin is
> active, then the I/O APIC sends out the interrupt as edge-triggered, but
> nevertheless sets the IRR bit.
>

I found another workaround:
8390.c currently calls

	outb_p(ENISR_ALL, e8390_base + EN0_IMR);
	enable_irq(dev->irq);

and locks up after ~ 100 packets flood ping.

If I reorder these calls to

	enable_irq(dev->irq);
	outb_p(ENISR_ALL, e8390_base + EN0_IMR);
	(and the correct spin_lock()'s)

the lockup disappears.

But I have no idea how io_apic.c could prevent lockups.

Playing with the trigger mode is not 100% reliable - I assume it kicks
the io apic only after several changes of the trigger mode bit. Maciej's
patch switches that bit twice during every start_tx operation and thus
doesn't lock up, my patch touches the redirection entry exactly once and
reliably locks up - even if I change trigger mode, polarity, delivery
mode and vector during enable_level_irq().

Any ideas?

--
	Manfred

* [test patch] reliable apic lockup with one enable/disable_irq()
  2001-02-03 10:28             ` Gérard Roudier
@ 2001-02-03 14:44               ` Manfred
  2001-02-05 10:32               ` [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups Maciej W. Rozycki
  1 sibling, 0 replies; 12+ messages in thread
From: Manfred @ 2001-02-03 14:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gérard Roudier, Maciej W. Rozycki, Linus Torvalds,
	Andrew Morton, Ingo Molnar, Randy

[-- Attachment #1: Type: text/plain, Size: 886 bytes --]

I found a sequence that reliably locks up my 82093AA ioapic with just
one disable/enable_irq.
I've attached a patch for the tulip driver. (tested with a pnic card)

* focus cpu must be enabled.
* Gerard's mask sequence: disable_irq() only sets IRQ_DISABLED, and
mask_and_ack masks the ioapic entry if an interrupt is sent to a
disabled handler.
* tulip_open:

disable_irq()
<sleep>
force the irq line active (enable both TPLnkPass and TPLnkFail, one of
them is always active)
<sleep>
enable_irq()

#insmod tulip
#ifup eth1

and now the interrupt is delivered as edge triggered, but the IRR bit is
set --> irq line stuck.
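
For reading such register dumps, here is a minimal sketch that decodes the low
32 bits of a redirection entry. The bit positions follow the 82093AA layout
(bit 12 delivery status, bit 13 polarity, bit 14 remote IRR, bit 15 trigger
mode, bit 16 mask); the struct and function names are made up for this
illustration. The three sample values are the ones printed in the dmesg
attachment earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Low 32 bits of an I/O APIC redirection entry. */
#define RTE_DEST_LOGICAL	(1u << 11)
#define RTE_SEND_PENDING	(1u << 12)	/* delivery status */
#define RTE_ACTIVE_LOW		(1u << 13)	/* pin polarity */
#define RTE_REMOTE_IRR		(1u << 14)
#define RTE_LEVEL		(1u << 15)	/* trigger mode */
#define RTE_MASKED		(1u << 16)

/* Hypothetical helper type, for illustration only. */
struct rte_fields {
	unsigned vector;
	unsigned delivery_mode;
	int logical_dest, send_pending, active_low, remote_irr, level, masked;
};

struct rte_fields rte_decode(uint32_t w)
{
	struct rte_fields f;

	f.vector = w & 0xffu;
	f.delivery_mode = (w >> 8) & 0x7u;
	f.logical_dest = !!(w & RTE_DEST_LOGICAL);
	f.send_pending = !!(w & RTE_SEND_PENDING);
	f.active_low = !!(w & RTE_ACTIVE_LOW);
	f.remote_irr = !!(w & RTE_REMOTE_IRR);
	f.level = !!(w & RTE_LEVEL);
	f.masked = !!(w & RTE_MASKED);
	return f;
}

void rte_print(const char *label, uint32_t w)
{
	struct rte_fields f = rte_decode(w);

	printf("%-12s %05xh: vector %02xh, %s, %s, IRR=%d, pending=%d\n",
	       label, (unsigned)w, f.vector,
	       f.masked ? "masked" : "unmasked",
	       f.level ? "level" : "edge",
	       f.remote_irr, f.send_pending);
}

/* Decode the three values from the dmesg attachment. */
void rte_demo(void)
{
	rte_print("overwriting", 0x129a1);
	rte_print("writing", 0xa9a1);
	rte_print("new val", 0xf9a1);
	assert(rte_decode(0xf9a1).remote_irr);
}
```

Read this way, 129a1h is a masked entry in edge mode, a9a1h is the same entry
unmasked and switched to level, and f9a1h, read back after the write, has both
the send-pending and the remote IRR bits set.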

Could someone test the patch with newer boards (i840, VIA Apollo Pro,
perhaps i815 if the BIOS builds the MP table)?

I also tried changing the trigger mode bit in
{,un}mask_level_IO_APIC_irq(), but that doesn't prevent the hang.

The patch is against 2.4.1
--
	Manfred

[-- Attachment #2: patch-hang --]
[-- Type: text/plain, Size: 2364 bytes --]

// $Header$
// Kernel Version:
//  VERSION = 2
//  PATCHLEVEL = 4
//  SUBLEVEL = 1
//  EXTRAVERSION =
--- 2.4/drivers/net/tulip/tulip_core.c	Sat Feb  3 14:02:54 2001
+++ build-2.4/drivers/net/tulip/tulip_core.c	Sat Feb  3 14:04:58 2001
@@ -421,6 +421,24 @@
 
 	tulip_up (dev);
 
+	disable_irq(dev->irq);
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	schedule_timeout(10);
+
+{
+	long ioaddr = dev->base_addr;
+	long csr7 = inl(ioaddr + CSR7);
+	outl(NormalIntr|AbnormalIntr|TPLnkPass|TPLnkFail, ioaddr + CSR7);
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	schedule_timeout(10);
+	enable_irq(dev->irq);
+
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	schedule_timeout(10);
+	outl(csr7, ioaddr + CSR7);
+
+}
+
 	netif_start_queue (dev);
 
 	return 0;
--- 2.4/arch/i386/kernel/io_apic.c	Sat Feb  3 14:02:24 2001
+++ build-2.4/arch/i386/kernel/io_apic.c	Sat Feb  3 14:54:14 2001
@@ -693,7 +693,7 @@
 	printk(KERN_WARNING "          to linux-smp@vger.kernel.org\n");
 }
 
-void __init print_IO_APIC(void)
+void print_IO_APIC(void)
 {
 	int apic, i;
 	struct IO_APIC_reg_00 reg_00;
@@ -1189,14 +1189,22 @@
 
 #define shutdown_level_ioapic_irq	mask_IO_APIC_irq
 #define enable_level_ioapic_irq		unmask_IO_APIC_irq
-#define disable_level_ioapic_irq	mask_IO_APIC_irq
+static void disable_level_ioapic_irq(unsigned int i)
+{
+	/* delayed. */
+}
 
 static void end_level_ioapic_irq (unsigned int i)
 {
 	ack_APIC_irq();
 }
 
-static void mask_and_ack_level_ioapic_irq (unsigned int i) { /* nothing */ }
+static void mask_and_ack_level_ioapic_irq (unsigned int i)
+{
+	if (irq_desc[i].status & IRQ_DISABLED) {
+		mask_IO_APIC_irq(i);
+	}
+}
 
 static void set_ioapic_affinity (unsigned int irq, unsigned long mask)
 {
--- 2.4/arch/i386/kernel/apic.c	Sat Feb  3 14:02:24 2001
+++ build-2.4/arch/i386/kernel/apic.c	Sat Feb  3 14:42:47 2001
@@ -270,7 +270,7 @@
 	 *   PCI Ne2000 networking cards and PII/PIII processors, dual
 	 *   BX chipset. ]
 	 */
-#if 0
+#if 1
 	/* Enable focus processor (bit==0) */
 	value &= ~(1<<9);
 #else
--- 2.4/drivers/char/sysrq.c	Mon Dec  4 02:48:19 2000
+++ build-2.4/drivers/char/sysrq.c	Sat Feb  3 14:33:51 2001
@@ -137,6 +137,10 @@
 		send_sig_all(SIGKILL, 1);
 		orig_log_level = 8;
 		break;
+	case 'q':
+		print_all_local_APICs();
+		print_IO_APIC();
+		break;
 	default:					    /* Unknown: help */
 		if (kbd)
 			printk("unRaw ");

* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-02-03 10:28             ` Gérard Roudier
  2001-02-03 14:44               ` [test patch] reliable apic lockup with one enable/disable_irq() Manfred
@ 2001-02-05 10:32               ` Maciej W. Rozycki
  1 sibling, 0 replies; 12+ messages in thread
From: Maciej W. Rozycki @ 2001-02-05 10:32 UTC (permalink / raw)
  To: Gérard Roudier
  Cc: Manfred Spraul, Linus Torvalds, Andrew Morton, Ingo Molnar, linux-kernel

On Sat, 3 Feb 2001, Gérard Roudier wrote:

> Note that tampering with the I/O APIC after initialization looks extremely
> ugly to me. In my opinion, only the local APIC was intended by Intel's
> designers to be accessed by the CPU after initialization (I may be wrong
> here).

 In "82489DX Datasheet" Intel explicitly points to masking and unmasking
an interrupt pin in an I/O APIC as one of three ways of controlling
incoming interrupts (the other two being the Task Priority Register in a local
APIC and the IF flag in a CPU) at run time.  So far this is about the only
exhaustive APIC architecture description (a few further hints are also
present in "AP-388 82489DX User's Manual" but the datasheet is mostly a
superset).  I haven't seen any other APIC architecture description -- all
others are mostly register programming guidelines only.

 Neither of these documents is available online, AFAIK.  Last year I
asked Intel if providing electronic copies is possible, but they replied
it's not. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups
  2001-02-03 12:14             ` Manfred Spraul
@ 2001-02-05 11:38               ` Maciej W. Rozycki
  0 siblings, 0 replies; 12+ messages in thread
From: Maciej W. Rozycki @ 2001-02-05 11:38 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Gérard Roudier, Linus Torvalds, Andrew Morton, Ingo Molnar,
	linux-kernel

On Sat, 3 Feb 2001, Manfred Spraul wrote:

> I found another workaround:
> 8390.c currently calls
> 
> 	outb_p(ENISR_ALL, e8390_base + EN0_IMR);
> 	enable_irq(dev->irq);
> 
> and locks up after ~ 100 packets flood ping.
> 
> If I reorder these calls to
> 
> 	enable_irq(dev->irq);
> 	outb_p(ENISR_ALL, e8390_base + EN0_IMR);
> 	(and the correct spin_lock()'s)
> 
> the lockup disappears.

 Is it possible that asserting the IRQ while the mask is active causes it to
be mishandled?

> Playing with the trigger mode is not 100% reliable - I assume it kicks
> the io apic only after several changes of the trigger mode bit. Maciej's
> patch switches that bit twice during every start_tx operation and thus
> doesn't lock up, my patch touches the redirection entry exactly once and
> reliably locks up - even if I change trigger mode, polarity, delivery
> mode and vector during enable_level_irq().

 I believe I recover from the lockup -- it's the mask function that
recovers.  But another one happens at the unmask time, I suppose.

> Any ideas?

 I'll implement the IRQ unlocker I was thinking of first.  The idea is not to
try to prevent the lockup from happening, as that might even be impossible,
but to make the unlocker trigger after a lockup happens instead.  I should
have an implementation ready soon.
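
 As a rough illustration of the recovery step only, here is a user-space
model. It is not the unlocker announced above, whose code does not appear in
this part of the thread. It assumes what the first message of the thread
reports: programming an entry for edge-triggered mode resets its IRR bit. The
names sim_rte_write() and sim_unlock_entry() are hypothetical, detecting the
lock-up is left out, and the delivery status bit is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_REMOTE_IRR	(1u << 14)
#define RTE_LEVEL	(1u << 15)	/* trigger mode */

/*
 * Model of a write to the low word of a redirection entry.  Software cannot
 * set remote IRR directly; in this model the bit survives a write unless
 * the entry is programmed for edge mode, which clears it.
 */
uint32_t sim_rte_write(uint32_t current, uint32_t value)
{
	uint32_t irr = current & RTE_REMOTE_IRR;

	value &= ~RTE_REMOTE_IRR;
	if (!(value & RTE_LEVEL))
		irr = 0;
	return value | irr;
}

/*
 * Hypothetical unlocker.  The caller has already decided that no EOI will
 * arrive for this entry.  Switch to edge mode to drop the stale IRR bit,
 * then restore level mode.
 */
uint32_t sim_unlock_entry(uint32_t rte)
{
	if (!(rte & RTE_REMOTE_IRR))
		return rte;	/* nothing stuck */
	rte = sim_rte_write(rte, rte & ~RTE_LEVEL);
	return sim_rte_write(rte, rte | RTE_LEVEL);
}
```

 In this model the stuck value f9a1h from the earlier dmesg attachment comes
back as b9a1h, that is, still level-triggered but with the IRR bit cleared,
while an entry without IRR set is returned unchanged.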

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


end of thread, other threads:[~2001-02-05 11:51 UTC | newest]

Thread overview: 12+ messages
2001-01-29 18:46 [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups Maciej W. Rozycki
2001-01-29 20:40 ` Manfred Spraul
2001-01-30 11:20   ` Maciej W. Rozycki
2001-02-01  0:58     ` Andrew Morton
2001-02-02 12:08       ` Maciej W. Rozycki
2001-02-02 18:12         ` Gérard Roudier
2001-02-02 22:17           ` Manfred Spraul
2001-02-03 10:28             ` Gérard Roudier
2001-02-03 14:44               ` [test patch] reliable apic lockup with one enable/disable_irq() Manfred
2001-02-05 10:32               ` [patch] 2.4.0, 2.4.0-ac12: APIC lock-ups Maciej W. Rozycki
2001-02-03 12:14             ` Manfred Spraul
2001-02-05 11:38               ` Maciej W. Rozycki
