* [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
@ 2007-10-04  9:14 Jan Kiszka
  2007-10-04  9:34 ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2007-10-04  9:14 UTC (permalink / raw)
  To: adeos-main; +Cc: Xenomai-core

Hi all,

after a really long search I'm now fairly sure I have found the reason
for the lockups I'm seeing on 2.6.22-i386. I'm still struggling to
understand why this issue is not visible on 2.6.19 and .20 for me, but
maybe it is just far less likely there.

Here is a short write-up of the I-pipe trace I was able to catch with
some hacking from a locked up box:

Scenario: I-pipe active, Xenomai not loaded or compiled out (but loading
Xenomai just increases the probability)

1. IRQ 20 arrives, Linux starts serving it, but no one talks to the
   IO-APIC so far because this is a fasteoi type IRQ.

2. Linux re-enables IRQs because IRQF_DISABLED is not set for IRQ 20.

3. IRQ 23 arrives and gets delivered as it is of higher priority in the
   APIC. From this point on, things start to fall apart.

4. I-pipe stops the delivery in __ipipe_synch_stage because the
   IPIPE_SYNC_FLAG is still set for the root domain. Linux switches back
   to the IRQ 20 handler so that the usual handling order gets inverted
   -- the first I-pipe bug.

5. IRQ 20 completes and sends an EOI to the APIC. Linux intends this
   EOI for IRQ 20, but the APIC applies it to IRQ 23!

6. IRQ 23 is re-enabled and arrives again before its previous event was
   handled. Thus two IRQ 23 events get merged into one, and the EOI is
   only executed once instead of twice. This causes all IRQs < 23 to be
   blocked from now on. :(
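
The sequence above can be replayed in a small user-space C model. This
is only a sketch, not kernel code: struct lapic_model and the lapic_*
helpers are hypothetical names, and the IRQ number stands in for the
vector priority, which is a simplification. The one hardware behaviour
it relies on is that an EOI retires the highest-priority in-service
interrupt, no matter which handler sends it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one in-service bit per priority level (0..31). */
struct lapic_model {
	uint32_t isr;
};

/* Highest priority level currently in service, or -1 if none. */
static int highest_in_service(const struct lapic_model *l)
{
	for (int p = 31; p >= 0; p--)
		if (l->isr & (UINT32_C(1) << p))
			return p;
	return -1;
}

/* An interrupt is delivered only if it outranks everything in service. */
static bool lapic_deliver(struct lapic_model *l, int prio)
{
	if (prio <= highest_in_service(l))
		return false;
	l->isr |= UINT32_C(1) << prio;
	return true;
}

/* EOI carries no IRQ number: it retires the highest in-service level. */
static int lapic_eoi(struct lapic_model *l)
{
	int p = highest_in_service(l);

	if (p >= 0)
		l->isr &= ~(UINT32_C(1) << p);
	return p;
}

/* Replays steps 1-6 and returns the in-service bits left behind. */
static uint32_t replay_stall_scenario(void)
{
	struct lapic_model l = { 0 };

	lapic_deliver(&l, 20);	/* step 1: IRQ 20 in service, no EOI yet */
	lapic_deliver(&l, 23);	/* step 3: IRQ 23 preempts, handler deferred */
	lapic_eoi(&l);		/* step 5: meant for 20, retires 23 instead */
	lapic_deliver(&l, 23);	/* step 6: second IRQ 23 event, merged */
	lapic_eoi(&l);		/* the single EOI for both IRQ 23 events */
	return l.isr;
}
```

In this model three deliveries are matched by only two EOIs, so one
in-service bit is left behind and any later lapic_deliver() call at or
below that level is refused.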

Well, this trace also reveals a second bug that can cause nasty priority
inversion: a high-prio domain is executing when a fasteoi IRQ arrives
for a low-prio domain. This now blocks all IRQs until the low-prio
domain has been able to run its IRQ handler to completion. Thus we must
_mask_ fasteoi IRQs for low-prio domains while high-prio ones are
running!

These bugs should affect at least x86_64 as well; I'm not sure what the
situation on powerpc looks like.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04  9:14 [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling Jan Kiszka
@ 2007-10-04  9:34 ` Philippe Gerum
  2007-10-04 12:22   ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2007-10-04  9:34 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Xenomai-core

On Thu, 2007-10-04 at 11:14 +0200, Jan Kiszka wrote:
> Hi all,
> 
> after a really long search I'm now quite sure to have found the reason
> for the lockups I'm seeing over 2.6.22-i386. I'm yet struggling to
> understand why this issue is not visible over 2.6.19 and .20 for me, but
> maybe it is just far less likely there.
> 
> Here is a short write-up of the I-pipe trace I was able to catch with
> some hacking from a locked up box:
> 
> Scenario: I-pipe active, Xenomai not loaded or compiled out (but loading
> Xenomai just increases the probability)
> 
> 1. IRQ 20 arrives, Linux starts serving it, but no one talks to the
>    IO-APIC so far because this is a fasteoi type IRQ.
> 
> 2. Linux reenables IRQs due to IRQF_DISABLED not set for IRQ 20.
> 
> 3. IRQ 23 arrives and gets delivered as it is of higher priority in the
>    APIC. From this point on, things start to fall apart.
> 
> 4. I-pipe stops the delivery in __ipipe_synch_stage because the
>    IPIPE_SYNC_FLAG is still set for the root domain. Linux switches back
>    to the IRQ 20 handler so that the usual handling order gets inverted
>    -- the first I-pipe bug.
> 

This means that the synchronization flag must become a per-IRQ thing; it
was introduced to prevent timer IRQs from piling up on behalf of the
syncer on overloaded low-end hardware.
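
What a per-IRQ synchronization flag could look like is sketched below
as a user-space C model. It is an illustration under stated
assumptions, not the I-pipe implementation: struct stage, sync_stage()
and handle_irq() are hypothetical names, and the arrival of IRQ 23
during IRQ 20's handler is hard-coded to mirror the trace. With one
stage-wide flag the nested sync_stage() call would return at once and
IRQ 23 would wait behind IRQ 20; with a per-IRQ bit only re-entry for
the same IRQ is refused.

```c
#include <stdint.h>

#define MAX_LOG 8

/* Hypothetical model of one pipeline stage. */
struct stage {
	uint32_t pending;	/* IRQs logged but not yet handled */
	uint32_t syncing;	/* IRQs whose handler is running right now */
	int log[MAX_LOG];	/* IRQ numbers in handler completion order */
	int nlog;
};

static void sync_stage(struct stage *s);

static void handle_irq(struct stage *s, int irq)
{
	if (irq == 20) {
		/* IRQ 23 arrives while IRQ 20's handler runs with
		 * interrupts enabled, and the stage is synced again. */
		s->pending |= UINT32_C(1) << 23;
		sync_stage(s);
	}
	if (s->nlog < MAX_LOG)
		s->log[s->nlog++] = irq;
}

/* Run pending handlers, highest IRQ first. Only an IRQ that is
 * already being synced is skipped, not the whole stage. */
static void sync_stage(struct stage *s)
{
	for (int irq = 31; irq >= 0; irq--) {
		uint32_t bit = UINT32_C(1) << irq;

		if (!(s->pending & bit) || (s->syncing & bit))
			continue;
		s->pending &= ~bit;
		s->syncing |= bit;
		handle_irq(s, irq);
		s->syncing &= ~bit;
	}
}
```

Starting with only IRQ 20 pending, the completion log ends up as 23
followed by 20: the later, higher-priority interrupt finishes nested
inside the earlier one instead of being deferred behind it.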

> 5. IRQ 20 completes and sends an EOI to the APIC. Linux means that this
>    is for IRQ 20, but the APIC considers it for IRQ 23!
> 
> 6. IRQ 23 is re-enabled and arrives before its last event was handled.
>    Thus two IRQ-23-events get merged into one, and eoi is only executed
>    once instead of twice. This causes all IRQs < 23 being blocked from
>    now on. :(
> 
> Well, this trace also reveals a second bug that can cause nasty priority
> inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
> low-prio domain. This will now block all IRQs until the low-prio domain
> was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
> IRQs for low-prio domains while high-prio ones are running!
> 

This code was actually there up to 2.6.17-1.5-02, and was removed at
some point in the 2.6.19 series, due to some severe conflicts with the
vanilla IO-APIC support which used to be a hell of a moving target at
that time. I guess it's time to bring this code back.

> These bugs should impact at least x86_64 as well, not sure about how
> powerpc looks like.

Powerpc has the same problem, even though it already masks and acks
fasteoi IRQs to prevent interrupt flooding on MPIC hardware.

> 
> Jan
> 
-- 
Philippe.





* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04  9:34 ` Philippe Gerum
@ 2007-10-04 12:22   ` Philippe Gerum
  2007-10-04 12:42     ` Jan Kiszka
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Philippe Gerum @ 2007-10-04 12:22 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Xenomai-core

On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
> > Well, this trace also reveals a second bug that can cause nasty priority
> > inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
> > low-prio domain. This will now block all IRQs until the low-prio domain
> > was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
> > IRQs for low-prio domains while high-prio ones are running!
> > 
> 
> This code was actually there up to 2.6.17-1.5-02, and was removed at
> some point in the 2.6.19 series, due to some severe conflicts with the
> vanilla IO-APIC support which used to be a hell of a moving target at
> that time. I guess it's time to bring this code back.
> 

Does the following work for you?

diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 2ae79e9..517937b 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
 		__unmask_and_level_IO_APIC_irq(irq);
 		spin_unlock(&ioapic_lock);
 	}
+
+	__mask_IO_APIC_irq(irq);
 }
 
 static int ioapic_retrigger_irq(unsigned int irq)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index ba497a7..1560b4a 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -422,8 +422,13 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
 
 	spin_lock(&desc->lock);
 	desc->status &= ~IRQ_INPROGRESS;
+#ifdef CONFIG_IPIPE
+	desc->chip->unmask(irq);
+out:
+#else
 out:
 	desc->chip->eoi(irq);
+#endif
 
 	spin_unlock(&desc->lock);
 }
@@ -533,11 +538,12 @@ void fastcall __ipipe_end_level_irq(unsigned irq, struct irq_desc *desc)
 
 void fastcall __ipipe_ack_fasteoi_irq(unsigned irq, struct irq_desc *desc)
 {
+	desc->chip->eoi(irq);
 }
 
 void fastcall __ipipe_end_fasteoi_irq(unsigned irq, struct irq_desc *desc)
 {
-	desc->chip->eoi(irq);
+	desc->chip->unmask(irq);
 }
 
 void fastcall __ipipe_ack_edge_irq(unsigned irq, struct irq_desc *desc)

-- 
Philippe.





* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04 12:22   ` Philippe Gerum
@ 2007-10-04 12:42     ` Jan Kiszka
  2007-10-04 12:55       ` Philippe Gerum
  2007-10-04 15:52     ` Jan Kiszka
  2007-10-04 20:05     ` Gilles Chanteperdrix
  2 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2007-10-04 12:42 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main, Xenomai-core

Philippe Gerum wrote:
> On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
>>> Well, this trace also reveals a second bug that can cause nasty priority
>>> inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
>>> low-prio domain. This will now block all IRQs until the low-prio domain
>>> was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
>>> IRQs for low-prio domains while high-prio ones are running!
>>>
>> This code was actually there up to 2.6.17-1.5-02, and was removed at
>> some point in the 2.6.19 series, due to some severe conflicts with the
>> vanilla IO-APIC support which used to be a hell of a moving target at
>> that time. I guess it's time to bring this code back.
>>
> 
> Does the following work for you?

Will give it a try later. Meanwhile...

> 
> diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
> index 2ae79e9..517937b 100644
> --- a/arch/i386/kernel/io_apic.c
> +++ b/arch/i386/kernel/io_apic.c
> @@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
>  		__unmask_and_level_IO_APIC_irq(irq);
>  		spin_unlock(&ioapic_lock);
>  	}
> +
> +	__mask_IO_APIC_irq(irq);
>  }

...I have problems understanding this hunk. Typo? Should this read
__unmask_IO_APIC_irq?

>  
>  static int ioapic_retrigger_irq(unsigned int irq)
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index ba497a7..1560b4a 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -422,8 +422,13 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
>  
>  	spin_lock(&desc->lock);
>  	desc->status &= ~IRQ_INPROGRESS;
> +#ifdef CONFIG_IPIPE
> +	desc->chip->unmask(irq);
> +out:
> +#else
>  out:
>  	desc->chip->eoi(irq);
> +#endif
>  
>  	spin_unlock(&desc->lock);
>  }
> @@ -533,11 +538,12 @@ void fastcall __ipipe_end_level_irq(unsigned irq, struct irq_desc *desc)
>  
>  void fastcall __ipipe_ack_fasteoi_irq(unsigned irq, struct irq_desc *desc)
>  {
> +	desc->chip->eoi(irq);
>  }
>  
>  void fastcall __ipipe_end_fasteoi_irq(unsigned irq, struct irq_desc *desc)
>  {
> -	desc->chip->eoi(irq);
> +	desc->chip->unmask(irq);
>  }
>  
>  void fastcall __ipipe_ack_edge_irq(unsigned irq, struct irq_desc *desc)
> 

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04 12:42     ` Jan Kiszka
@ 2007-10-04 12:55       ` Philippe Gerum
  2007-10-04 14:06         ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2007-10-04 12:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Xenomai-core

On Thu, 2007-10-04 at 14:42 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
> >>> Well, this trace also reveals a second bug that can cause nasty priority
> >>> inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
> >>> low-prio domain. This will now block all IRQs until the low-prio domain
> >>> was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
> >>> IRQs for low-prio domains while high-prio ones are running!
> >>>
> >> This code was actually there up to 2.6.17-1.5-02, and was removed at
> >> some point in the 2.6.19 series, due to some severe conflicts with the
> >> vanilla IO-APIC support which used to be a hell of a moving target at
> >> that time. I guess it's time to bring this code back.
> >>
> > 
> > Does the following work for you?
> 
> Will give it a try later. Meanwhile...
> 
> > 
> > diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
> > index 2ae79e9..517937b 100644
> > --- a/arch/i386/kernel/io_apic.c
> > +++ b/arch/i386/kernel/io_apic.c
> > @@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
> >  		__unmask_and_level_IO_APIC_irq(irq);
> >  		spin_unlock(&ioapic_lock);
> >  	}
> > +
> > +	__mask_IO_APIC_irq(irq);
> >  }
> 
> ...I have problems understanding this hunk. Typo? Should this read
> __unmask_IO_APIC_irq?
> 

No, you want to mask it here. EOI in the IO-APIC case goes through some
quirks which you want to apply immediately on behalf of the primary
I-pipe ack handler, basically to work around some IO-APIC errata. Then,
either the high priority domain (__ipipe_end_fasteoi_irq) or the root
one (handle_fasteoi_irq) will unmask the IRQ as needed, whichever comes
first (and only).

> >  
> >  static int ioapic_retrigger_irq(unsigned int irq)
> > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > index ba497a7..1560b4a 100644
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -422,8 +422,13 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
> >  
> >  	spin_lock(&desc->lock);
> >  	desc->status &= ~IRQ_INPROGRESS;
> > +#ifdef CONFIG_IPIPE
> > +	desc->chip->unmask(irq);
> > +out:
> > +#else
> >  out:
> >  	desc->chip->eoi(irq);
> > +#endif
> >  
> >  	spin_unlock(&desc->lock);
> >  }
> > @@ -533,11 +538,12 @@ void fastcall __ipipe_end_level_irq(unsigned irq, struct irq_desc *desc)
> >  
> >  void fastcall __ipipe_ack_fasteoi_irq(unsigned irq, struct irq_desc *desc)
> >  {
> > +	desc->chip->eoi(irq);
> >  }
> >  
> >  void fastcall __ipipe_end_fasteoi_irq(unsigned irq, struct irq_desc *desc)
> >  {
> > -	desc->chip->eoi(irq);
> > +	desc->chip->unmask(irq);
> >  }
> >  
> >  void fastcall __ipipe_ack_edge_irq(unsigned irq, struct irq_desc *desc)
> > 
> 
> Jan
> 
-- 
Philippe.





* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04 12:55       ` Philippe Gerum
@ 2007-10-04 14:06         ` Jan Kiszka
  2007-10-04 14:26           ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2007-10-04 14:06 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main, Xenomai-core

Philippe Gerum wrote:
> On Thu, 2007-10-04 at 14:42 +0200, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
>>>>> Well, this trace also reveals a second bug that can cause nasty priority
>>>>> inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
>>>>> low-prio domain. This will now block all IRQs until the low-prio domain
>>>>> was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
>>>>> IRQs for low-prio domains while high-prio ones are running!
>>>>>
>>>> This code was actually there up to 2.6.17-1.5-02, and was removed at
>>>> some point in the 2.6.19 series, due to some severe conflicts with the
>>>> vanilla IO-APIC support which used to be a hell of a moving target at
>>>> that time. I guess it's time to bring this code back.
>>>>
>>> Does the following work for you?
>> Will give it a try later. Meanwhile...
>>
>>> diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
>>> index 2ae79e9..517937b 100644
>>> --- a/arch/i386/kernel/io_apic.c
>>> +++ b/arch/i386/kernel/io_apic.c
>>> @@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
>>>  		__unmask_and_level_IO_APIC_irq(irq);
>>>  		spin_unlock(&ioapic_lock);
>>>  	}
>>> +
>>> +	__mask_IO_APIC_irq(irq);
>>>  }
>> ...I have problems understanding this hunk. Typo? Should this read
>> __unmask_IO_APIC_irq?
>>
> 
> No, you want to mask it here. EOI in the IO-APIC case goes through some
> quirks which you want to apply immediately on behalf of the primary
> I-pipe ack handler, basically to work around some IO-APIC errata. Then,
> either the high priority domain (__ipipe_end_fasteoi_irq) or the root
> one (handle_fasteoi_irq) will unmask the IRQ as needed, whichever comes
> first (and only).

ack_ioapic_quirk_irq == eoi for fasteoi, so it is specifically executed
on exit of handle_fasteoi_irq. I still don't see why you want to leave
the IRQ masked here.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04 14:06         ` Jan Kiszka
@ 2007-10-04 14:26           ` Philippe Gerum
  2007-10-04 14:44             ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2007-10-04 14:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Xenomai-core

On Thu, 2007-10-04 at 16:06 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2007-10-04 at 14:42 +0200, Jan Kiszka wrote:
> >> Philippe Gerum wrote:
> >>> On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
> >>>>> Well, this trace also reveals a second bug that can cause nasty priority
> >>>>> inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
> >>>>> low-prio domain. This will now block all IRQs until the low-prio domain
> >>>>> was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
> >>>>> IRQs for low-prio domains while high-prio ones are running!
> >>>>>
> >>>> This code was actually there up to 2.6.17-1.5-02, and was removed at
> >>>> some point in the 2.6.19 series, due to some severe conflicts with the
> >>>> vanilla IO-APIC support which used to be a hell of a moving target at
> >>>> that time. I guess it's time to bring this code back.
> >>>>
> >>> Does the following work for you?
> >> Will give it a try later. Meanwhile...
> >>
> >>> diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
> >>> index 2ae79e9..517937b 100644
> >>> --- a/arch/i386/kernel/io_apic.c
> >>> +++ b/arch/i386/kernel/io_apic.c
> >>> @@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
> >>>  		__unmask_and_level_IO_APIC_irq(irq);
> >>>  		spin_unlock(&ioapic_lock);
> >>>  	}
> >>> +
> >>> +	__mask_IO_APIC_irq(irq);
> >>>  }
> >> ...I have problems understanding this hunk. Typo? Should this read
> >> __unmask_IO_APIC_irq?
> >>
> > 
> > No, you want to mask it here. EOI in the IO-APIC case goes through some
> > quirks which you want to apply immediately on behalf of the primary
> > I-pipe ack handler, basically to work around some IO-APIC errata. Then,
> > either the high priority domain (__ipipe_end_fasteoi_irq) or the root
> > one (handle_fasteoi_irq) will unmask the IRQ as needed, whichever comes
> > first (and only).
> 
> ack_ioapic_quirk_irq == eio for fasteoi, so it is specifically executed
> on exit of handle_fasteoi_irq. I still don't see why you want to leave
> the IRQ masked here.
> 

It is no longer executed on IRQ exit when the I-pipe is enabled. In that
case the EOI handler is called earlier, to ack the LAPIC and then mask
the interrupt source at the IO-APIC while waiting for the Linux handler
to process the device which triggered the interrupt. The source is
eventually unmasked when either the high priority domain or Linux is
done with the interrupt.
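
The ack-early, unmask-late ordering described above can be illustrated
with a small user-space C model. It is a sketch under assumptions, not
the patched kernel code: struct irq_hw, hw_raise(), model_ack_fasteoi()
and model_end_fasteoi() are hypothetical names, IRQ numbers 0..31 stand
in for vector priorities, and the IO-APIC quirk handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two pieces of interrupt state involved. */
struct irq_hw {
	uint32_t lapic_isr;	/* LAPIC in-service bits, one per IRQ */
	uint32_t ioapic_masked;	/* IO-APIC lines that are masked */
};

static int hw_highest_in_service(const struct irq_hw *hw)
{
	for (int p = 31; p >= 0; p--)
		if (hw->lapic_isr & (UINT32_C(1) << p))
			return p;
	return -1;
}

/* A line fires only if it is unmasked and outranks what is in service. */
static bool hw_raise(struct irq_hw *hw, int irq)
{
	uint32_t bit = UINT32_C(1) << irq;

	if (hw->ioapic_masked & bit)
		return false;
	if (irq <= hw_highest_in_service(hw))
		return false;
	hw->lapic_isr |= bit;
	return true;
}

/* Ack step: EOI the LAPIC right away, then mask the source. */
static void model_ack_fasteoi(struct irq_hw *hw, int irq)
{
	int top = hw_highest_in_service(hw);

	if (top >= 0)
		hw->lapic_isr &= ~(UINT32_C(1) << top);
	hw->ioapic_masked |= UINT32_C(1) << irq;
}

/* End step: the domain that handled the IRQ unmasks the source. */
static void model_end_fasteoi(struct irq_hw *hw, int irq)
{
	hw->ioapic_masked &= ~(UINT32_C(1) << irq);
}
```

Because the EOI directly follows delivery, the in-service bit being
retired always belongs to the IRQ that was just taken. After the ack
step nothing is in service, so other lines of any priority can still
be delivered, while the acked line stays quiet until the end step
unmasks it.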

> Jan
> 
-- 
Philippe.





* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04 14:26           ` Philippe Gerum
@ 2007-10-04 14:44             ` Jan Kiszka
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2007-10-04 14:44 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main, Xenomai-core

Philippe Gerum wrote:
> On Thu, 2007-10-04 at 16:06 +0200, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Thu, 2007-10-04 at 14:42 +0200, Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
>>>>>>> Well, this trace also reveals a second bug that can cause nasty priority
>>>>>>> inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
>>>>>>> low-prio domain. This will now block all IRQs until the low-prio domain
>>>>>>> was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
>>>>>>> IRQs for low-prio domains while high-prio ones are running!
>>>>>>>
>>>>>> This code was actually there up to 2.6.17-1.5-02, and was removed at
>>>>>> some point in the 2.6.19 series, due to some severe conflicts with the
>>>>>> vanilla IO-APIC support which used to be a hell of a moving target at
>>>>>> that time. I guess it's time to bring this code back.
>>>>>>
>>>>> Does the following work for you?
>>>> Will give it a try later. Meanwhile...
>>>>
>>>>> diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
>>>>> index 2ae79e9..517937b 100644
>>>>> --- a/arch/i386/kernel/io_apic.c
>>>>> +++ b/arch/i386/kernel/io_apic.c
>>>>> @@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
>>>>>  		__unmask_and_level_IO_APIC_irq(irq);
>>>>>  		spin_unlock(&ioapic_lock);
>>>>>  	}
>>>>> +
>>>>> +	__mask_IO_APIC_irq(irq);
>>>>>  }
>>>> ...I have problems understanding this hunk. Typo? Should this read
>>>> __unmask_IO_APIC_irq?
>>>>
>>> No, you want to mask it here. EOI in the IO-APIC case goes through some
>>> quirks which you want to apply immediately on behalf of the primary
>>> I-pipe ack handler, basically to work around some IO-APIC errata. Then,
>>> either the high priority domain (__ipipe_end_fasteoi_irq) or the root
>>> one (handle_fasteoi_irq) will unmask the IRQ as needed, whichever comes
>>> first (and only).
>> ack_ioapic_quirk_irq == eio for fasteoi, so it is specifically executed
>> on exit of handle_fasteoi_irq. I still don't see why you want to leave
>> the IRQ masked here.
>>
> 
> It is not executed on IRQ exit anymore when the I-pipe is enabled. The
> EOI handler is called earlier in the latter case to ack the LAPIC, then
> mask the interrupt source from the IO-APIC, waiting for the Linux
> handler to process the device which triggered the interrupt. The source
> is eventually unmasked when either the high priority domain or Linux is
> done with the interrupt.

Ah, ok, too blind to see the full picture: handle_fasteoi_irq was
changed in that direction.

Jan (who just kicked off a patched kernel rebuild)

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04 12:22   ` Philippe Gerum
  2007-10-04 12:42     ` Jan Kiszka
@ 2007-10-04 15:52     ` Jan Kiszka
  2007-10-04 17:03       ` Jan Kiszka
  2007-10-04 20:05     ` Gilles Chanteperdrix
  2 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2007-10-04 15:52 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main, Xenomai-core

Philippe Gerum wrote:
> On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
>>> Well, this trace also reveals a second bug that can cause nasty priority
>>> inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
>>> low-prio domain. This will now block all IRQs until the low-prio domain
>>> was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
>>> IRQs for low-prio domains while high-prio ones are running!
>>>
>> This code was actually there up to 2.6.17-1.5-02, and was removed at
>> some point in the 2.6.19 series, due to some severe conflicts with the
>> vanilla IO-APIC support which used to be a hell of a moving target at
>> that time. I guess it's time to bring this code back.
>>
> 
> Does the following work for you?
> 
> diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
> index 2ae79e9..517937b 100644
> --- a/arch/i386/kernel/io_apic.c
> +++ b/arch/i386/kernel/io_apic.c
> @@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
>  		__unmask_and_level_IO_APIC_irq(irq);
>  		spin_unlock(&ioapic_lock);
>  	}
> +
> +	__mask_IO_APIC_irq(irq);
>  }
>  
>  static int ioapic_retrigger_irq(unsigned int irq)
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index ba497a7..1560b4a 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -422,8 +422,13 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
>  
>  	spin_lock(&desc->lock);
>  	desc->status &= ~IRQ_INPROGRESS;
> +#ifdef CONFIG_IPIPE
> +	desc->chip->unmask(irq);
> +out:
> +#else
>  out:
>  	desc->chip->eoi(irq);
> +#endif
>  
>  	spin_unlock(&desc->lock);
>  }
> @@ -533,11 +538,12 @@ void fastcall __ipipe_end_level_irq(unsigned irq, struct irq_desc *desc)
>  
>  void fastcall __ipipe_ack_fasteoi_irq(unsigned irq, struct irq_desc *desc)
>  {
> +	desc->chip->eoi(irq);
>  }
>  
>  void fastcall __ipipe_end_fasteoi_irq(unsigned irq, struct irq_desc *desc)
>  {
> -	desc->chip->eoi(irq);
> +	desc->chip->unmask(irq);
>  }
>  
>  void fastcall __ipipe_ack_edge_irq(unsigned irq, struct irq_desc *desc)
> 

Good news: The patch seems to stabilise the Xenomai-free use of
I-pipe. I'm writing this mail while compiling a kernel on an
I-pipe-enabled 2.6.22 box.

Bad news: Loading Xenomai still causes a hard reboot after a while. Maybe
I was too quick with re-importing the no-COW bits. Now trying to revert
them again to check if they are still involved in nasty page faults...

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04 15:52     ` Jan Kiszka
@ 2007-10-04 17:03       ` Jan Kiszka
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2007-10-04 17:03 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main, Xenomai-core


Jan Kiszka wrote:
> Philippe Gerum wrote:
>> On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
>>>> Well, this trace also reveals a second bug that can cause nasty priority
>>>> inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
>>>> low-prio domain. This will now block all IRQs until the low-prio domain
>>>> was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
>>>> IRQs for low-prio domains while high-prio ones are running!
>>>>
>>> This code was actually there up to 2.6.17-1.5-02, and was removed at
>>> some point in the 2.6.19 series, due to some severe conflicts with the
>>> vanilla IO-APIC support which used to be a hell of a moving target at
>>> that time. I guess it's time to bring this code back.
>>>
>> Does the following work for you?
>>
>> diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
>> index 2ae79e9..517937b 100644
>> --- a/arch/i386/kernel/io_apic.c
>> +++ b/arch/i386/kernel/io_apic.c
>> @@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
>>  		__unmask_and_level_IO_APIC_irq(irq);
>>  		spin_unlock(&ioapic_lock);
>>  	}
>> +
>> +	__mask_IO_APIC_irq(irq);
>>  }
>>  
>>  static int ioapic_retrigger_irq(unsigned int irq)
>> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
>> index ba497a7..1560b4a 100644
>> --- a/kernel/irq/chip.c
>> +++ b/kernel/irq/chip.c
>> @@ -422,8 +422,13 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
>>  
>>  	spin_lock(&desc->lock);
>>  	desc->status &= ~IRQ_INPROGRESS;
>> +#ifdef CONFIG_IPIPE
>> +	desc->chip->unmask(irq);
>> +out:
>> +#else
>>  out:
>>  	desc->chip->eoi(irq);
>> +#endif
>>  
>>  	spin_unlock(&desc->lock);
>>  }
>> @@ -533,11 +538,12 @@ void fastcall __ipipe_end_level_irq(unsigned irq, struct irq_desc *desc)
>>  
>>  void fastcall __ipipe_ack_fasteoi_irq(unsigned irq, struct irq_desc *desc)
>>  {
>> +	desc->chip->eoi(irq);
>>  }
>>  
>>  void fastcall __ipipe_end_fasteoi_irq(unsigned irq, struct irq_desc *desc)
>>  {
>> -	desc->chip->eoi(irq);
>> +	desc->chip->unmask(irq);
>>  }
>>  
>>  void fastcall __ipipe_ack_edge_irq(unsigned irq, struct irq_desc *desc)
>>
> 
> Good news: The patches seems to stabilise the Xenomai-free use of
> I-pipe. I'm writing this mail while compiling a kernel on an
> i-pipe-enabled 2.6.22-box.
> 
> Bad news: Loading Xenomai still causes a hard reboot after while. Maybe
> I was too quick with re-importing the no-COW bits. Now trying to revert
> them again to check if they are still involved in nasty page faults...

No-COW: not guilty.

The system still reboots or locks up when I modprobe e.g. xeno_rtdm
before logging into X. I threw my laptop in some corner (other work is
calling), but I will try to look into this over the weekend or so. I'm
hoping we still get oopses/backtraces/whatever.

Jan




* Re: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
  2007-10-04 12:22   ` Philippe Gerum
  2007-10-04 12:42     ` Jan Kiszka
  2007-10-04 15:52     ` Jan Kiszka
@ 2007-10-04 20:05     ` Gilles Chanteperdrix
  2 siblings, 0 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2007-10-04 20:05 UTC (permalink / raw)
  To: rpm; +Cc: adeos-main, Xenomai-core

Philippe Gerum wrote:
 > On Thu, 2007-10-04 at 11:34 +0200, Philippe Gerum wrote: 
 > > > Well, this trace also reveals a second bug that can cause nasty priority
 > > > inversion: a high-prio domains executes when a fasteoi-IRQ arrives for a
 > > > low-prio domain. This will now block all IRQs until the low-prio domain
 > > > was able to run its IRQ handler completely. Thus we must _mask_ fasteoi
 > > > IRQs for low-prio domains while high-prio ones are running!
 > > > 
 > > 
 > > This code was actually there up to 2.6.17-1.5-02, and was removed at
 > > some point in the 2.6.19 series, due to some severe conflicts with the
 > > vanilla IO-APIC support which used to be a hell of a moving target at
 > > that time. I guess it's time to bring this code back.
 > > 
 > 
 > Does the following work for you?
 > 
 > diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
 > index 2ae79e9..517937b 100644
 > --- a/arch/i386/kernel/io_apic.c
 > +++ b/arch/i386/kernel/io_apic.c
 > @@ -2022,6 +2022,8 @@ static void ack_ioapic_quirk_irq(unsigned int irq)
 >  		__unmask_and_level_IO_APIC_irq(irq);
 >  		spin_unlock(&ioapic_lock);
 >  	}
 > +
 > +	__mask_IO_APIC_irq(irq);
 >  }
 >  
 >  static int ioapic_retrigger_irq(unsigned int irq)
 > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
 > index ba497a7..1560b4a 100644
 > --- a/kernel/irq/chip.c
 > +++ b/kernel/irq/chip.c
 > @@ -422,8 +422,13 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
 >  
 >  	spin_lock(&desc->lock);
 >  	desc->status &= ~IRQ_INPROGRESS;
 > +#ifdef CONFIG_IPIPE
 > +	desc->chip->unmask(irq);
 > +out:
 > +#else
 >  out:
 >  	desc->chip->eoi(irq);
 > +#endif
 >  
 >  	spin_unlock(&desc->lock);
 >  }
 > @@ -533,11 +538,12 @@ void fastcall __ipipe_end_level_irq(unsigned irq, struct irq_desc *desc)
 >  
 >  void fastcall __ipipe_ack_fasteoi_irq(unsigned irq, struct irq_desc *desc)
 >  {
 > +	desc->chip->eoi(irq);
 >  }
 >  
 >  void fastcall __ipipe_end_fasteoi_irq(unsigned irq, struct irq_desc *desc)
 >  {
 > -	desc->chip->eoi(irq);
 > +	desc->chip->unmask(irq);
 >  }
 >  
 >  void fastcall __ipipe_ack_edge_irq(unsigned irq, struct irq_desc *desc)

FWIW, this patch fixes my known-to-crash I-pipe 2.6.20 configuration.

-- 


					    Gilles Chanteperdrix.


