linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
@ 2008-09-25 23:40 Milton Miller
  2008-09-26  9:16 ` Sebastien Dugue
  0 siblings, 1 reply; 27+ messages in thread
From: Milton Miller @ 2008-09-25 23:40 UTC (permalink / raw)
  To: Sebastien Dugue, Milton Miller; +Cc: linuxppc-dev, linux-kernel

(I trimmed the cc list for the implementation discussion).
 
> On Wed, 24 Sep 2008 11:42:15 -0500 Milton Miller
> <miltonm@bga.com> wrote:
> 
> > On Sep 24, 2008, at 7:30 AM, Sebastien Dugue wrote:
> > >   Hi Milton,
> > > On Wed, 24 Sep 2008 04:58:22 -0500 (CDT) Milton Miller
> > > <miltonm@bga.com> wrote:
> > >> On Mon Sep 15 at 18:04:06 EST in 2008, Sebastien
> > Dugue wrote: >>> When entering the low level handler,
> > level sensitive interrupts are >>> masked, then eio'd in
> > interrupt context and then unmasked at the >>> end of
> > hardirq processing.  That's fine as any interrupt
> > comming >>> in-between will still be processed since the
> > kernel replays those >>> pending interrupts.
> > >>
> > >> Is this to generate some kind of software managed
> > nesting and priority >> of the hardware level
> > interrupts? >
> > >   No, not really. This is only to be sure to not miss
> > > interrupts coming from the same source that were
> > > received during threaded hardirq  processing.
> > > Some instrumentation showed that it never seems to
> > > happen in the eHEA interrupt case, so I think we can
> > > forget this aspect. 
> > I don't trust "the interrupt can never happen during hea
> > hardirq",  because I think there will be a race between
> > their rearming the next  interrupt and the unmask being
> > called.
> 
>   So do I, it was just to make sure I was not hit by
> another interrupt while handling the previous one and thus
> reduce the number of hypothesis.
> 
>   I sure do not say that it cannot happen, just that that
> path is not taken when I have the eHEA hang.
> 
> > I was trying to understand why the mask and early eoi,
> > but I guess its  to handle other more limited interrupt
> > controllers where the interrupts  stack in hardware
> > instead of software. 
> > >   Also, the problem only manifests with the eHEA RX
> > > interrupt. For  example,
> > > the IBM Power Raid (ipr) SCSI exhibits absolutely no
> > > problem under an  RT
> > > kernel. From this I conclude that:
> > >
> > >   IPR - PCI - XICS is OK
> > >   eHEA - IBMEBUS - XICS is broken with hardirq
> > preemption. >
> > >   I also checked that forcing the eHEA interrupt to
> > > take the non  threaded
> > > path does work.
> > 
> > For a long period of time, XICS dealt only with level
> > interrupts.    First Micro Channel, and later PCI buses.
> >  The IPI is made level by  software conventions. 
> > Recently, EHCA, EHEA, and MSI interrupts were  added
> > which by their nature are edge based.  The logic that
> > converts  those interrupts to the XICS layer is
> > responsible for the resend when  no cpu can accept them,
> > but not to retrigger after an EOI.
> 
>  OK
> 
> > 
> > >   Here is a side by side comparison of the fasteoi
> > > flow with and  without hardirq
> > > threading (sorry it's a bit wide)
> > (removed)
> > >   the non-threaded flow does (in interrupt context):
> > >
> > >     mask
> 
>   Whoops, my bad, in the non threaded case, there's no
> mask at all, only an unmask+eoi at the end, maybe that's
> an oversight!
 
No, not an oversight.  The point is, don't mask/unmask
between ack/eoi while handling the interrupt.  For many
irq controllers, the eoi must be done from the same cpu,
hence the mask and eoi before actually handling the
interrupt in the general case.   Its a feature of xics
that we don't have to play that game, but can do the
cpu and device eoi separately.
 
> > >     handle interrupt
> > >     unmask
> > >     eoi
> > >
> > >   the threaded flow does:
> > >
> > >     mask
> > >     eoi
> > >         handle interrupt
> > >         unmask
> > >
> > >   If I remove the mask() call, then the eHEA is no
> > > longer hanging. 
> > Hmm, I guess I'm confused.  You are saying the irq does
> > not appear if  it occurs while it is masked?
> 
>   Looks like it is, but I cannot say for sure, the only
> observable effect is that I do not get any more interrupts
> coming from the eHEA.
 
(removed features of xics)
 
 
>   That may be, but I'm only looking at the code (read no
> specifications at hand) and it looks like a black box to
> me.
 
"PowerPC External Interrupt Architecture" is defined in
appendix A of "Power.org=99 Standard for 
Power Architecture=99 Platform Requirements
(Workstation, Server)", available to Power.org members.
"The developer-level membership in Power.org is free."
(see www.power.org).
 
That said, it likely won't mention the eHEA in enough
detail to note that the interrupt gets cleared on
unmask.
 
On the other hand, I have actually seen the source
to implementations of the xics logic, so I have a
very good understanding of it (and know of a few
implementation "features", shall we say).
 
 
> > The path lengh for mask and unmask is always VERY slow
> > and single  threaded global lock and single context in
> > xics.  It is designed and  tuned to run at driver
> > startup and shutdown (and adapter reset and  reinitalize
> > during pci error processing), not during normal irq 
> > processing.
> 
>   Now, that is quite interesting then. Those mask() and
> unmask() should then be called shutdown() and startup()
> and not at each interrupt or am I misunderstanding you.
 
Basically, yes.  but linux likes to let drivers mask at
other times, and that is the facility we have.
 
> > The XICS hardware implicitly masks the specific source
> > as part of  interrupt ack (get_irq), and implicitly
> > undoes this mask at eoi.   In  addition, it helps to
> > manage the cpu priority by supplying the previous 
> priority as part of the get_irq process and providing for
> > the priority  to be restored (lowered only) as part of
> > the eoi.  The hardware does  support setting the cpu
> priority independently.
> 
>   This confirms, then, that the mask and unmask methods
> should be empty for the xics.
> 
> > 
> > We should only be using this implicit masking for xics,
> > and not the  explicit masking for any normal interrupt
> > processing.
> 
>   OK
> 
> >  I don't know if 
> > this means making the mask/unmask setting a bit in
> > software,
> 
>   Used by whom? 
 
The thought here was if we can't change the caller, then
maybe we could try to figure out what the caller was
trying to accomplish and defer what was requested based
on context.   Obviously, we are better off changing the
caller.
 
> 
> > and the 
> > enable/disable to actually call what we do now on
> > mask/unmask, or if it  means we need a new flow type on
> real time.
> 
>   Maybe a new flow type is not necessary considering what
> you said.
 
Maybe not, but I think it would be preferred ... we do have
the source to both sides.
 
> > While call to mask and unmask might work on level
> > interrupts, its  really slow and will limit performance
> > if done on every interrupt. 
> > >   the non-threaded flow does (in interrupt context):
> > >
> > >     mask
> 
>   Same Whoops, no mask is done in the non threaded case
I was just copying for context :=3d)
 
> > >     handle interrupt
> > >     unmask
> > >     eoi
> > >
> > >   the threaded flow does:
> > >
> > >     mask
> > >     eoi
> > >         handle interrupt
> > >         unmask
> > 
> > I think the flows we want on xics are:
> > 
> > (non-threaded)
> >     getirq (implicit source specific mask until eoi)
> >     handle interrupt
> >     eoi (implicit cpu priority restore)
> 
>   Yep
> 
> > 
> > (threaded)
> >     getirq (implicit source specific mask until eoi)
> >     explicit cpu priority restore
>         ^
>   How do you go about doing that? Still not clear to me.
 
xics_set_cpu_priority(0xff)
 
of course, there needs to be some kind of 
struct irq_chip method to call it.
 
> >     handle interrupt
> >     eoi (implicit cpu priority restore to same as
> > explicit level) 
> > Where the cpu priority restore allows receiving other
> > interrupts of the  same priority from the hardware.
> > 
> > So I guess the question is can the rt kernel interrupt
> > processing take  advantage of xics auto mask,
> 
>   It should, but even mainline could benefit from it I
> guess.
> 
 
I haven't looked, but are you implying mainline calls
mask and unmask for each interrupt, and not just when
the driver does unmask/mask/request/free?
 
Otherwise, I am not sure mainline is doing anything wrong.
 
> > or does someone need to write state 
> > tracking in the xics code to work around this, changing
> > mask under  interrupt to "defer eoi to unmask" (which I
> > can not see as clean, and  having shutdown problems).
> > 
>   Thanks a lot Milton for those explanations,
> 
>   Sebastien.
 
You're welcome.  I don't have time to work on the rt kernel,
but I'll be glad to help suggest an implementation that
works efficiently.
 
milton

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-25 23:40 [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Milton Miller
@ 2008-09-26  9:16 ` Sebastien Dugue
  0 siblings, 0 replies; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-26  9:16 UTC (permalink / raw)
  To: Milton Miller; +Cc: linux-kernel, linuxppc-dev

On Thu, 25 Sep 2008 18:40:28 -0500 "Milton Miller" <miltonm@bga.com> wrote:

> (I trimmed the cc list for the implementation discussion).

  Yep, good thing.

<snip>

> >=20
> >   Whoops, my bad, in the non threaded case, there's no
> > mask at all, only an unmask+eoi at the end, maybe that's
> > an oversight!
> =20
> No, not an oversight.  The point is, don't mask/unmask
> between ack/eoi while handling the interrupt.  For many
> irq controllers, the eoi must be done from the same cpu,
> hence the mask and eoi before actually handling the
> interrupt in the general case.   Its a feature of xics
> that we don't have to play that game, but can do the
> cpu and device eoi separately.

  Ok, will try to play with this a bit more.

<snip>

> =20
> >   That may be, but I'm only looking at the code (read no
> > specifications at hand) and it looks like a black box to
> > me.
> =20
> "PowerPC External Interrupt Architecture" is defined in
> appendix A of "Power.org=C2=99 Standard for=20
> Power Architecture=C2=99 Platform Requirements
> (Workstation, Server)", available to Power.org members.
> "The developer-level membership in Power.org is free."
> (see www.power.org).

  I already have that one and started to dig into the interrupt stuff, but =
as
with all normative stuff it's boring to the extreme and not always without
flaws. Looks like I know what I'll be reading this WE.

> =20
> That said, it likely won't mention the eHEA in enough
> detail to note that the interrupt gets cleared on
> unmask.
> =20
> On the other hand, I have actually seen the source
> to implementations of the xics logic, so I have a
> very good understanding of it (and know of a few
> implementation "features", shall we say).

  Good to know.

> =20
> =20
> > > The path lengh for mask and unmask is always VERY slow
> > > and single  threaded global lock and single context in
> > > xics.  It is designed and  tuned to run at driver
> > > startup and shutdown (and adapter reset and  reinitalize
> > > during pci error processing), not during normal irq=20
> > > processing.
> >=20
> >   Now, that is quite interesting then. Those mask() and
> > unmask() should then be called shutdown() and startup()
> > and not at each interrupt or am I misunderstanding you.
> =20
> Basically, yes.  but linux likes to let drivers mask at
> other times, and that is the facility we have.

  Darn, do those drivers really need that heavywheight masking
at the source or something simpler could be accomplished by
fiddling with the processor priority in mask and unmask?

> =20
> > > The XICS hardware implicitly masks the specific source
> > > as part of  interrupt ack (get_irq), and implicitly
> > > undoes this mask at eoi.   In  addition, it helps to
> > > manage the cpu priority by supplying the previous=20
> > priority as part of the get_irq process and providing for
> > > the priority  to be restored (lowered only) as part of
> > > the eoi.  The hardware does  support setting the cpu
> > priority independently.
> >=20
> >   This confirms, then, that the mask and unmask methods
> > should be empty for the xics.
> >=20
> > >=20
> > > We should only be using this implicit masking for xics,
> > > and not the  explicit masking for any normal interrupt
> > > processing.
> >=20
> >   OK
> >=20
> > >  I don't know if=20
> > > this means making the mask/unmask setting a bit in
> > > software,
> >=20
> >   Used by whom?=20
> =20
> The thought here was if we can't change the caller, then
> maybe we could try to figure out what the caller was
> trying to accomplish and defer what was requested based
> on context.   Obviously, we are better off changing the
> caller.

  That will not be so easy as we'll change behaviour for every user of
the genirq layer.

> =20
> >=20
> > > and the=20
> > > enable/disable to actually call what we do now on
> > > mask/unmask, or if it  means we need a new flow type on
> > real time.
> >=20
> >   Maybe a new flow type is not necessary considering what
> > you said.
> =20
> Maybe not, but I think it would be preferred ... we do have
> the source to both sides.

  That's straightforward to do in the non threaded case, however, in the
threaded case the xics code would have to also manage the hardirq thread
as we would not be able to reuse the generic one (because of the unmask it
does at the very end).

<snip>

> > > I think the flows we want on xics are:
> > >=20
> > > (non-threaded)
> > >     getirq (implicit source specific mask until eoi)
> > >     handle interrupt
> > >     eoi (implicit cpu priority restore)
> >=20
> >   Yep
> >=20
> > >=20
> > > (threaded)
> > >     getirq (implicit source specific mask until eoi)
> > >     explicit cpu priority restore
> >         ^
> >   How do you go about doing that? Still not clear to me.
> =20
> xics_set_cpu_priority(0xff)

  OK

> =20
> of course, there needs to be some kind of=20
> struct irq_chip method to call it.

  Yep, excepted that it's currently not provided by irq_chip.

> =20
> > >     handle interrupt
> > >     eoi (implicit cpu priority restore to same as
> > > explicit level)=20
> > > Where the cpu priority restore allows receiving other
> > > interrupts of the  same priority from the hardware.
> > >=20
> > > So I guess the question is can the rt kernel interrupt
> > > processing take  advantage of xics auto mask,
> >=20
> >   It should, but even mainline could benefit from it I
> > guess.
> >=20
> =20
> I haven't looked, but are you implying mainline calls
> mask and unmask for each interrupt, and not just when
> the driver does unmask/mask/request/free?

  Nope you're right, I got confused. I only looked at the non threaded case
under preempt-rt which does unmask + eoi (without mask) whereas mainline
only does eoi

> =20
> Otherwise, I am not sure mainline is doing anything wrong.
> =20

  Right

> > > or does someone need to write state=20
> > > tracking in the xics code to work around this, changing
> > > mask under  interrupt to "defer eoi to unmask" (which I
> > > can not see as clean, and  having shutdown problems).
> > >=20
> >   Thanks a lot Milton for those explanations,
> >=20
> >   Sebastien.
> =20
> You're welcome.  I don't have time to work on the rt kernel,
> but I'll be glad to help suggest an implementation that
> works efficiently.
> =20
  Thanks a lot,

  Sebastien.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 16:42       ` Milton Miller
  2008-09-24 21:16         ` Benjamin Herrenschmidt
@ 2008-09-25  8:45         ` Sebastien Dugue
  1 sibling, 0 replies; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-25  8:45 UTC (permalink / raw)
  To: Milton Miller
  Cc: linux-kernel, linuxppc-dev, Jan-Bernd Themann, Paul Mackerras,
	Christoph Raisch

On Wed, 24 Sep 2008 11:42:15 -0500 Milton Miller <miltonm@bga.com> wrote:

> On Sep 24, 2008, at 7:30 AM, Sebastien Dugue wrote:
> >   Hi Milton,
> > On Wed, 24 Sep 2008 04:58:22 -0500 (CDT) Milton Miller 
> > <miltonm@bga.com> wrote:
> >> On Mon Sep 15 at 18:04:06 EST in 2008, Sebastien Dugue wrote:
> >>> When entering the low level handler, level sensitive interrupts are
> >>> masked, then eio'd in interrupt context and then unmasked at the
> >>> end of hardirq processing.  That's fine as any interrupt comming
> >>> in-between will still be processed since the kernel replays those
> >>> pending interrupts.
> >>
> >> Is this to generate some kind of software managed nesting and priority
> >> of the hardware level interrupts?
> >
> >   No, not really. This is only to be sure to not miss interrupts coming
> > from the same source that were received during threaded hardirq 
> > processing.
> > Some instrumentation showed that it never seems to happen in the eHEA
> > interrupt case, so I think we can forget this aspect.
> 
> I don't trust "the interrupt can never happen during hea hardirq", 
> because I think there will be a race between their rearming the next 
> interrupt and the unmask being called.

  So do I, it was just to make sure I was not hit by another interrupt while
handling the previous one and thus reduce the number of hypothesis.

  I sure do not say that it cannot happen, just that that path is not taken
when I have the eHEA hang.

> 
> I was trying to understand why the mask and early eoi, but I guess its 
> to handle other more limited interrupt controllers where the interrupts 
> stack in hardware instead of software.
> 
> >   Also, the problem only manifests with the eHEA RX interrupt. For 
> > example,
> > the IBM Power Raid (ipr) SCSI exhibits absolutely no problem under an 
> > RT
> > kernel. From this I conclude that:
> >
> >   IPR - PCI - XICS is OK
> >   eHEA - IBMEBUS - XICS is broken with hardirq preemption.
> >
> >   I also checked that forcing the eHEA interrupt to take the non 
> > threaded
> > path does work.
> 
> For a long period of time, XICS dealt only with level interrupts.   
> First Micro Channel, and later PCI buses.  The IPI is made level by 
> software conventions.  Recently, EHCA, EHEA, and MSI interrupts were 
> added which by their nature are edge based.  The logic that converts 
> those interrupts to the XICS layer is responsible for the resend when 
> no cpu can accept them, but not to retrigger after an EOI.

 OK

> 
> >   Here is a side by side comparison of the fasteoi flow with and 
> > without hardirq
> > threading (sorry it's a bit wide)
> (removed)
> >   the non-threaded flow does (in interrupt context):
> >
> >     mask

  Whoops, my bad, in the non threaded case, there's no mask at all, only an
unmask+eoi at the end, maybe that's an oversight!


> >     handle interrupt
> >     unmask
> >     eoi
> >
> >   the threaded flow does:
> >
> >     mask
> >     eoi
> > 		handle interrupt
> > 		unmask
> >
> >   If I remove the mask() call, then the eHEA is no longer hanging.
> 
> Hmm, I guess I'm confused.  You are saying the irq does not appear if 
> it occurs while it is masked?

  Looks like it is, but I cannot say for sure, the only observable effect
is that I do not get any more interrupts coming from the eHEA.

>  Well, in that case, I would guess that 
> the hypervisor is checking if the irq is previously pending while it 
> was masked and resetting it as part of the unmask.   It can't do it on 
> level, but can on the true edge sources.  I would further say the 
> justification for this might be the hardware might make it pending from 
> some previous stale event that might result in the false interrupt on 
> startup were it not to do this clear.
> 
> >> The reason I ask is the xics controller can do unlimited nesting
> >> of hardware interrupts.  In fact, the hardware has 255 levels of
> >> priority, of which 16 or so are reserved by the hypervisor, leaving
> >> over 200 for the os to manage.  Higher numbers are lower in priority,
> >> and the hardware will only dispatch an interrupt to a given cpu if
> >> it is currenty at a lower priority.  If it is at a higher priority
> >> and the interrupt is not bound to a specific cpu it will look for
> >> another cpu to dispatch it.  The hardware will not re-present an
> >> irq until the it is EOId (managed by a small state machine per
> >> interrupt at the source, which also handles no cpu available try
> >> again later), but software can return its cpu priority to the
> >> previous level to recieve other interrupt sources at the same level.
> >> The hardware also supports lazy update of the cpu priority register
> >> when an interrupt is presented; as long as the cpu is hard-irq
> >> enabled it can take the irq then write is real priority and let the
> >> hw decide if the irq is still pending or it must defer or try another
> >> cpu in the rejection scenerio.  The only restriction is that the
> >> EOI can not cause an interrupt reject by raising the priority while
> >> sending the EOI command.
> >>
> >> The per-interrupt mask and unmask calls have to go through RTAS, a
> >> single-threaded global context, which in addition to increasing
> >> path length will really limit scalability.  The interrupt controller
> >> poll and reject facilities are accessed through hypervisor calls
> >> which are comparable to a fast syscall, and parallel to all cpus.
> >>
> >> We used to lower the priority to allow other interrupts in, but we
> >> realized that in addition to the questionable latency in doing so,
> >> it only caused unlimited stack nesting and overflow without per-irq
> >> stacks.  We currently set IPIs above other irqs so we typically
> >> only process them during a hard irq (but we return to base level
> >> after IPI and could take another base irq, a bug).
> >>
> >>
> >> So, Sebastien, with this information, is does the RT kernel have
> >> a strategy that better matches this hardware?
> >
> >   Don't think so. I think that the problem may be elsewhere as
> > everything is fine with PCI devices (well at least SCSI).
> 
> Those are true level sources, and not edge.

  Right.

> 
> >   As I said earlier in another mail, it seems that the eHEA
> > is behaving as if it was generating edge interrupts which do not
> > support masking. Don't know.
> 
> (I wrote this next paragraph before parsing the "remove mask and it 
> works" / I'm confused paragraph above, so it may not be a problem).
> 
> These sources are truly edge.  Once you do an EOI you are taking 
> responsibility to do the replay yourself.  In your threaded case, you 
> EOI and therefore the hardware will arm for the next event.  When you 
> add the mask, the delivery is deferred until it is unmasked at the end 
> of your EOI loop.  When you do not, the new interrupt may come in but 
> you just EOI it but do not tell the running thread that it happened, 
> then you are dropping the irq event.   Since the source is truly edge, 
> there is no hardware replay and the interrupt is lost.
> 
> (I think the pci express gigabit is one of the few msi interrupt 
> adapters that both IBM and Linux support).
> 
> >   Thanks a lot for the explanation, looks like the xics + hypervisor
> > combo is way more complex than I thought.
> 
> While the hypervisor adds a bit of path length (an hcall vs a single 
> mmio access for get_irq/eoi with multiple priority irq nesting), the 
> model is no more or less complicated than native xics.

  That may be, but I'm only looking at the code (read no specifications at hand)
and it looks like a black box to me.

> 
> The path lengh for mask and unmask is always VERY slow and single 
> threaded global lock and single context in xics.  It is designed and 
> tuned to run at driver startup and shutdown (and adapter reset and 
> reinitalize during pci error processing), not during normal irq 
> processing.

  Now, that is quite interesting then. Those mask() and unmask() should then
be called shutdown() and startup() and not at each interrupt or am I
misunderstanding you.

> 
> The XICS hardware implicitly masks the specific source as part of 
> interrupt ack (get_irq), and implicitly undoes this mask at eoi.   In 
> addition, it helps to manage the cpu priority by supplying the previous 
> priority as part of the get_irq process and providing for the priority 
> to be restored (lowered only) as part of the eoi.  The hardware does 
> support setting the cpu priority independently.

  This confirms, then, that the mask and unmask methods should be empty
for the xics.

> 
> We should only be using this implicit masking for xics, and not the 
> explicit masking for any normal interrupt processing.

  OK

>  I don't know if 
> this means making the mask/unmask setting a bit in software,

  Used by whom? 

> and the 
> enable/disable to actually call what we do now on mask/unmask, or if it 
> means we need a new flow type on real time.

  Maybe a new flow type is not necessary considering what you said.

> 
> While call to mask and unmask might work on level interrupts, its 
> really slow and will limit performance if done on every interrupt.
> 
> >   the non-threaded flow does (in interrupt context):
> >
> >     mask

  Same Whoops, no mask is done in the non threaded case

> >     handle interrupt
> >     unmask
> >     eoi
> >
> >   the threaded flow does:
> >
> >     mask
> >     eoi
> > 		handle interrupt
> > 		unmask
> 
> I think the flows we want on xics are:
> 
> (non-threaded)
> 	getirq (implicit source specific mask until eoi)
> 	handle interrupt
> 	eoi (implicit cpu priority restore)

  Yep

> 
> (threaded)
> 	getirq (implicit source specific mask until eoi)
> 	explicit cpu priority restore
        ^
  How do you go about doing that? Still not clear to me.

> 	handle interrupt
> 	eoi (implicit cpu priority restore to same as explicit level)
> 
> Where the cpu priority restore allows receiving other interrupts of the 
> same priority from the hardware.
> 
> So I guess the question is can the rt kernel interrupt processing take 
> advantage of xics auto mask,

  It should, but even mainline could benefit from it I guess.

> or does someone need to write state 
> tracking in the xics code to work around this, changing mask under 
> interrupt to "defer eoi to unmask" (which I can not see as clean, and 
> having shutdown problems).


  Thanks a lot Milton for those explanations,


  Sebastien.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-25  8:36                 ` Benjamin Herrenschmidt
@ 2008-09-25  8:39                   ` Sebastien Dugue
  0 siblings, 0 replies; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-25  8:39 UTC (permalink / raw)
  To: benh
  Cc: linux-kernel, Milton Miller, linuxppc-dev, Christoph Raisch,
	Paul Mackerras, Jan-Bernd Themann

On Thu, 25 Sep 2008 18:36:19 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> 
> >   Do you mean creating a custom fasteoi handler into xics.c? Also, it's
> > not clear to me from looking at the code how you go about changing the
> > cpu priority.
> 
> Yup. I think the priority is the CPPR.. Milton can give you more
> details, if not, I'll pick it up tomorrow when at the office.
> 

  Thanks Ben, will look into this.

  Nite

  Sebastien.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-25  7:42               ` Sebastien Dugue
@ 2008-09-25  8:36                 ` Benjamin Herrenschmidt
  2008-09-25  8:39                   ` Sebastien Dugue
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2008-09-25  8:36 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: linux-kernel, Milton Miller, linuxppc-dev, Christoph Raisch,
	Paul Mackerras, Jan-Bernd Themann


>   Do you mean creating a custom fasteoi handler into xics.c? Also, it's
> not clear to me from looking at the code how you go about changing the
> cpu priority.

Yup. I think the priority is the CPPR.. Milton can give you more
details, if not, I'll pick it up tomorrow when at the office.

Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-25  7:22             ` Benjamin Herrenschmidt
@ 2008-09-25  7:42               ` Sebastien Dugue
  2008-09-25  8:36                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-25  7:42 UTC (permalink / raw)
  To: benh
  Cc: linux-kernel, Milton Miller, linuxppc-dev, Christoph Raisch,
	Paul Mackerras, Jan-Bernd Themann

On Thu, 25 Sep 2008 17:22:41 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2008-09-25 at 09:18 +0200, Sebastien Dugue wrote:
> > 
> > > Ok, that's the right approach then. It should work. I don't know
> > what
> > > the specific problems with HEA are at this stage.
> > 
> >   Yep, except as it behaves in way that the current -rt fasteoi flow
> > cannot handle.
> 
> We probably need to make a special xics flow handler for -rt that does
> what Milton suggested, ie, bring down the CPU priority right away

  Do you mean creating a custom fasteoi handler into xics.c? Also, it's
not clear to me from looking at the code how you go about changing the
cpu priority.

> and
> only EOI later or something like that, instead of masking/unmasking.
> 
> I don't know what are the other potential issues with the HEA though.

  Don't know either, but that I can test.

  Thanks,

  Sebastien.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 21:14         ` Benjamin Herrenschmidt
@ 2008-09-25  7:31           ` Sebastien Dugue
  0 siblings, 0 replies; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-25  7:31 UTC (permalink / raw)
  To: benh
  Cc: linux-kernel, Milton Miller, linuxppc-dev, Jan-Bernd Themann,
	Paul Mackerras, Christoph Raisch

On Thu, 25 Sep 2008 07:14:07 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> 
> > There may be some implicit assumption in that we expect the cpu 
> > priority to be returned to normal by the EOI, but there is nothing in 
> > the hardware that requires the EOI to come from the same cpu as 
> > accepted the interrupt for processing, with the exception of the IPI 
> > which is per-cpu (and the only interrupt that is per-cpu).
> 
> Well, there is one fundamental one: The XIRR register we access is
> per-CPU, so if we are to return the right processor priority, we must
> make sure we write the right XIRR.

  That's already the case as the irq fetch (xx_xirr_info_get()) and
eoi (xx_xirr_info_set()) are both done in interrupt context, therefore on
the same cpu.

> 
> Same with Cell, MPIC, actually and a few others. In general I'd say most
> fast_eoi type PICs have this requirement.
> 
> > It would probably mean adding the concept of the current cpu priority 
> > vs interrupts and making sure we write it to hardware at irq_exit() 
> > time when deferring the actual irq handlers.
> 
> I think we need something like a special -rt variant of the fast_eoi
> handler that masks & eoi's in ack() before the thread is spun off, and
> unmasks instead of eoi() when the irq processing is complete.

  This is what is already done in the threaded case:

    - fetch + mask + eoi  in interrupt context

    - unmask in the thread when processing is complete.


  Sebastien.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-25  7:18           ` Sebastien Dugue
@ 2008-09-25  7:22             ` Benjamin Herrenschmidt
  2008-09-25  7:42               ` Sebastien Dugue
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2008-09-25  7:22 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: linux-kernel, Milton Miller, linuxppc-dev, Christoph Raisch,
	Paul Mackerras, Jan-Bernd Themann

On Thu, 2008-09-25 at 09:18 +0200, Sebastien Dugue wrote:
> 
> > Ok, that's the right approach then. It should work. I don't know
> what
> > the specific problems with HEA are at this stage.
> 
>   Yep, except as it behaves in way that the current -rt fasteoi flow
> cannot handle.

We probably need to make a special xics flow handler for -rt that does
what Milton suggested, ie, bring down the CPU priority right away and
only EOI later or something like that, instead of masking/unmasking.

I don't know what are the other potential issues with the HEA though.

Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 21:15         ` Benjamin Herrenschmidt
@ 2008-09-25  7:18           ` Sebastien Dugue
  2008-09-25  7:22             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-25  7:18 UTC (permalink / raw)
  To: benh
  Cc: linux-kernel, Milton Miller, linuxppc-dev, Christoph Raisch,
	Paul Mackerras, Jan-Bernd Themann

On Thu, 25 Sep 2008 07:15:17 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Wed, 2008-09-24 at 14:35 +0200, Sebastien Dugue wrote:
> > Hi Ben,
> > 
> > On Wed, 24 Sep 2008 20:17:47 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > 
> > > On Wed, 2008-09-24 at 04:58 -0500, Milton Miller wrote:
> > > > The per-interrupt mask and unmask calls have to go through RTAS, a
> > > > single-threaded global context, which in addition to increasing
> > > > path length will really limit scalability.  The interrupt controller
> > > > poll and reject facilities are accessed through hypervisor calls
> > > > which are comparable to a fast syscall, and parallel to all cpus.
> > > 
> > > Note also that the XICS code thus assumes, iirc, as does the cell IIC
> > > code, that eoi is called on the -same- cpu that fetched the interrupt
> > > initially. That assumption can be broken with IRQ threads no ?
> > 
> >   No, the fetch and the eoi are both done in interrupt context before
> > the hardirq thread is woken up.
> > 
> >   On the other hand, the mask+eoi and the unmask may well happen
> > on different cpus as there's only one hardirq thread per irq on
> > the system. Don't know if this is a problem with the XICS though.
> 
> Ok, that's the right approach then. It should work. I don't know what
> the specific problems with HEA are at this stage.

  Yep, except as it behaves in way that the current -rt fasteoi flow
cannot handle.

> It doesn't seem to
> make sense to implement a set_irq_type(), what would it do ? The
> XICS doesn't expose any concept of interrupt type...

  That's what I gathered from looking at the sources.

  Thanks,

  Sebastien.

  

  

   
  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 21:16         ` Benjamin Herrenschmidt
@ 2008-09-25  3:56           ` Milton Miller
  0 siblings, 0 replies; 27+ messages in thread
From: Milton Miller @ 2008-09-25  3:56 UTC (permalink / raw)
  To: benh
  Cc: linux-kernel, linuxppc-dev, Sebastien Dugue, Jan-Bernd Themann,
	Paul Mackerras, Christoph Raisch

On Sep 24, 2008, at 4:16 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2008-09-24 at 11:42 -0500, Milton Miller wrote:
>>
>> I was trying to understand why the mask and early eoi, but I guess its
>> to handle other more limited interrupt controllers where the 
>> interrupts
>> stack in hardware instead of software.
>
> No Milton, we must do it that way, because the EOI must be done on the
> right CPU even on XICS, or we won't get the CPU priority back properly.

Ben and I had a online chat, and he pointed out I needed to be more 
specific in saying what I was thinking.

>> I think the flows we want on xics are:
>>
>> (non-threaded)
>> 	getirq (implicit source specific mask until eoi)
>> 	handle interrupt
>> 	eoi (implicit cpu priority restore)
>>
>> (threaded)
>> 	getirq (implicit source specific mask until eoi)
>> 	explicit cpu priority restore
>> 	handle interrupt
>> 	eoi (implicit cpu priority restore to same as explicit level)


cpu takes interrupt, checks soft disabled
if so,
	set hard disabled
else
	call get_irq
	if threaded
		write cppr to restore this cpu irq dispatch state to non-interrupt
		mark irq thread as irq pending
	else
		handle interrupt
		eoi (cppr = base)

irq thread will
	handle interrupt
	eoi
	wait for marked pending again

The part Ben did not follow was that the cppr write to base priority is 
done by the interrupted cpu (like the mask and eoi in the current flow) 
and only the final eoi (where the mask is in the existing flow) is done 
on which ever cpu happens to run the irq thread.


(optional) As I was discussing with Paul, when taking an irq when 
soft-disabled but still hard enabled, it is possible to write the cppr 
such that it would reject the pending irq and have it be considered for 
dispatch to another cpu.   But it would increase pathlength on both the 
go-to-hard-disabled and return-from-hard-disabled and the hardware will 
have some latency as it will likely send it back to the io source until 
it retrys, so we would only want to do this if the hard-disable period 
is sufficiently long.

milton

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 16:42       ` Milton Miller
@ 2008-09-24 21:16         ` Benjamin Herrenschmidt
  2008-09-25  3:56           ` Milton Miller
  2008-09-25  8:45         ` Sebastien Dugue
  1 sibling, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2008-09-24 21:16 UTC (permalink / raw)
  To: Milton Miller
  Cc: linux-kernel, linuxppc-dev, Sebastien Dugue, Jan-Bernd Themann,
	Paul Mackerras, Christoph Raisch

On Wed, 2008-09-24 at 11:42 -0500, Milton Miller wrote:
> 
> I was trying to understand why the mask and early eoi, but I guess its 
> to handle other more limited interrupt controllers where the interrupts 
> stack in hardware instead of software.

No Milton, we must do it that way, because the EOI must be done on the
right CPU even on XICS, or we won't get the CPU priority back properly.

Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 12:35       ` Sebastien Dugue
@ 2008-09-24 21:15         ` Benjamin Herrenschmidt
  2008-09-25  7:18           ` Sebastien Dugue
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2008-09-24 21:15 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: linux-kernel, Milton Miller, linuxppc-dev, Christoph Raisch,
	Paul Mackerras, Jan-Bernd Themann

On Wed, 2008-09-24 at 14:35 +0200, Sebastien Dugue wrote:
> Hi Ben,
> 
> On Wed, 24 Sep 2008 20:17:47 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > On Wed, 2008-09-24 at 04:58 -0500, Milton Miller wrote:
> > > The per-interrupt mask and unmask calls have to go through RTAS, a
> > > single-threaded global context, which in addition to increasing
> > > path length will really limit scalability.  The interrupt controller
> > > poll and reject facilities are accessed through hypervisor calls
> > > which are comparable to a fast syscall, and parallel to all cpus.
> > 
> > Note also that the XICS code thus assumes, iirc, as does the cell IIC
> > code, that eoi is called on the -same- cpu that fetched the interrupt
> > initially. That assumption can be broken with IRQ threads no ?
> 
>   No, the fetch and the eoi are both done in interrupt context before
> the hardirq thread is woken up.
> 
>   On the other hand, the mask+eoi and the unmask may well happen
> on different cpus as there's only one hardirq thread per irq on
> the system. Don't know if this is a problem with the XICS though.

Ok, that's the right approach then. It should work. I don't know what
the specific problems with HEA are at this stage. It doesn't seem to
make sense to implement a set_irq_type(), what would it do ? The
XICS doesn't expose any concept of interrupt type...

Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 11:02       ` Milton Miller
@ 2008-09-24 21:14         ` Benjamin Herrenschmidt
  2008-09-25  7:31           ` Sebastien Dugue
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2008-09-24 21:14 UTC (permalink / raw)
  To: Milton Miller
  Cc: linux-kernel, linuxppc-dev, Sebastien Dugue, Jan-Bernd Themann,
	Paul Mackerras, Christoph Raisch


> There may be some implicit assumption in that we expect the cpu 
> priority to be returned to normal by the EOI, but there is nothing in 
> the hardware that requires the EOI to come from the same cpu as 
> accepted the interrupt for processing, with the exception of the IPI 
> which is per-cpu (and the only interrupt that is per-cpu).

Well, there is one fundamental one: The XIRR register we access is
per-CPU, so if we are to return the right processor priority, we must
make sure we write the right XIRR.

Same with Cell, MPIC, actually and a few others. In general I'd say most
fast_eoi type PICs have this requirement.

> It would probably mean adding the concept of the current cpu priority 
> vs interrupts and making sure we write it to hardware at irq_exit() 
> time when deferring the actual irq handlers.

I think we need something like a special -rt variant of the fast_eoi
handler that masks & eoi's in ack() before the thread is spun off, and
unmasks instead of eoi() when the irq processing is complete.

Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 12:30     ` Sebastien Dugue
@ 2008-09-24 16:42       ` Milton Miller
  2008-09-24 21:16         ` Benjamin Herrenschmidt
  2008-09-25  8:45         ` Sebastien Dugue
  0 siblings, 2 replies; 27+ messages in thread
From: Milton Miller @ 2008-09-24 16:42 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: linux-kernel, linuxppc-dev, Jan-Bernd Themann, Paul Mackerras,
	Christoph Raisch

On Sep 24, 2008, at 7:30 AM, Sebastien Dugue wrote:
>   Hi Milton,
> On Wed, 24 Sep 2008 04:58:22 -0500 (CDT) Milton Miller 
> <miltonm@bga.com> wrote:
>> On Mon Sep 15 at 18:04:06 EST in 2008, Sebastien Dugue wrote:
>>> When entering the low level handler, level sensitive interrupts are
>>> masked, then eio'd in interrupt context and then unmasked at the
>>> end of hardirq processing.  That's fine as any interrupt comming
>>> in-between will still be processed since the kernel replays those
>>> pending interrupts.
>>
>> Is this to generate some kind of software managed nesting and priority
>> of the hardware level interrupts?
>
>   No, not really. This is only to be sure to not miss interrupts coming
> from the same source that were received during threaded hardirq 
> processing.
> Some instrumentation showed that it never seems to happen in the eHEA
> interrupt case, so I think we can forget this aspect.

I don't trust "the interrupt can never happen during hea hardirq", 
because I think there will be a race between their rearming the next 
interrupt and the unmask being called.

I was trying to understand why the mask and early eoi, but I guess its 
to handle other more limited interrupt controllers where the interrupts 
stack in hardware instead of software.

>   Also, the problem only manifests with the eHEA RX interrupt. For 
> example,
> the IBM Power Raid (ipr) SCSI exhibits absolutely no problem under an 
> RT
> kernel. From this I conclude that:
>
>   IPR - PCI - XICS is OK
>   eHEA - IBMEBUS - XICS is broken with hardirq preemption.
>
>   I also checked that forcing the eHEA interrupt to take the non 
> threaded
> path does work.

For a long period of time, XICS dealt only with level interrupts.   
First Micro Channel, and later PCI buses.  The IPI is made level by 
software conventions.  Recently, EHCA, EHEA, and MSI interrupts were 
added which by their nature are edge based.  The logic that converts 
those interrupts to the XICS layer is responsible for the resend when 
no cpu can accept them, but not to retrigger after an EOI.

>   Here is a side by side comparison of the fasteoi flow with and 
> without hardirq
> threading (sorry it's a bit wide)
(removed)
>   the non-threaded flow does (in interrupt context):
>
>     mask
>     handle interrupt
>     unmask
>     eoi
>
>   the threaded flow does:
>
>     mask
>     eoi
> 		handle interrupt
> 		unmask
>
>   If I remove the mask() call, then the eHEA is no longer hanging.

Hmm, I guess I'm confused.  You are saying the irq does not appear if 
it occurs while it is masked?  Well, in that case, I would guess that 
the hypervisor is checking if the irq is previously pending while it 
was masked and resetting it as part of the unmask.   It can't do it on 
level, but can on the true edge sources.  I would further say the 
justification for this might be the hardware might make it pending from 
some previous stale event that might result in the false interrupt on 
startup were it not to do this clear.

>> The reason I ask is the xics controller can do unlimited nesting
>> of hardware interrupts.  In fact, the hardware has 255 levels of
>> priority, of which 16 or so are reserved by the hypervisor, leaving
>> over 200 for the os to manage.  Higher numbers are lower in priority,
>> and the hardware will only dispatch an interrupt to a given cpu if
>> it is currenty at a lower priority.  If it is at a higher priority
>> and the interrupt is not bound to a specific cpu it will look for
>> another cpu to dispatch it.  The hardware will not re-present an
>> irq until the it is EOId (managed by a small state machine per
>> interrupt at the source, which also handles no cpu available try
>> again later), but software can return its cpu priority to the
>> previous level to recieve other interrupt sources at the same level.
>> The hardware also supports lazy update of the cpu priority register
>> when an interrupt is presented; as long as the cpu is hard-irq
>> enabled it can take the irq then write is real priority and let the
>> hw decide if the irq is still pending or it must defer or try another
>> cpu in the rejection scenerio.  The only restriction is that the
>> EOI can not cause an interrupt reject by raising the priority while
>> sending the EOI command.
>>
>> The per-interrupt mask and unmask calls have to go through RTAS, a
>> single-threaded global context, which in addition to increasing
>> path length will really limit scalability.  The interrupt controller
>> poll and reject facilities are accessed through hypervisor calls
>> which are comparable to a fast syscall, and parallel to all cpus.
>>
>> We used to lower the priority to allow other interrupts in, but we
>> realized that in addition to the questionable latency in doing so,
>> it only caused unlimited stack nesting and overflow without per-irq
>> stacks.  We currently set IPIs above other irqs so we typically
>> only process them during a hard irq (but we return to base level
>> after IPI and could take another base irq, a bug).
>>
>>
>> So, Sebastien, with this information, is does the RT kernel have
>> a strategy that better matches this hardware?
>
>   Don't think so. I think that the problem may be elsewhere as
> everything is fine with PCI devices (well at least SCSI).

Those are true level sources, and not edge.

>   As I said earlier in another mail, it seems that the eHEA
> is behaving as if it was generating edge interrupts which do not
> support masking. Don't know.

(I wrote this next paragraph before parsing the "remove mask and it 
works" / I'm confused paragraph above, so it may not be a problem).

These sources are truly edge.  Once you do an EOI you are taking 
responsibility to do the replay yourself.  In your threaded case, you 
EOI and therefore the hardware will arm for the next event.  When you 
add the mask, the delivery is deferred until it is unmasked at the end 
of your EOI loop.  When you do not, the new interrupt may come in but 
you just EOI it but do not tell the running thread that it happened, 
then you are dropping the irq event.   Since the source is truly edge, 
there is no hardware replay and the interrupt is lost.

(I think the pci express gigabit is one of the few msi interrupt 
adapters that both IBM and Linux support).

>   Thanks a lot for the explanation, looks like the xics + hypervisor
> combo is way more complex than I thought.

While the hypervisor adds a bit of path length (an hcall vs a single 
mmio access for get_irq/eoi with multiple priority irq nesting), the 
model is no more or less complicated than native xics.

The path lengh for mask and unmask is always VERY slow and single 
threaded global lock and single context in xics.  It is designed and 
tuned to run at driver startup and shutdown (and adapter reset and 
reinitalize during pci error processing), not during normal irq 
processing.

The XICS hardware implicitly masks the specific source as part of 
interrupt ack (get_irq), and implicitly undoes this mask at eoi.   In 
addition, it helps to manage the cpu priority by supplying the previous 
priority as part of the get_irq process and providing for the priority 
to be restored (lowered only) as part of the eoi.  The hardware does 
support setting the cpu priority independently.

We should only be using this implicit masking for xics, and not the 
explicit masking for any normal interrupt processing.  I don't know if 
this means making the mask/unmask setting a bit in software, and the 
enable/disable to actually call what we do now on mask/unmask, or if it 
means we need a new flow type on real time.

While call to mask and unmask might work on level interrupts, its 
really slow and will limit performance if done on every interrupt.

>   the non-threaded flow does (in interrupt context):
>
>     mask
>     handle interrupt
>     unmask
>     eoi
>
>   the threaded flow does:
>
>     mask
>     eoi
> 		handle interrupt
> 		unmask

I think the flows we want on xics are:

(non-threaded)
	getirq (implicit source specific mask until eoi)
	handle interrupt
	eoi (implicit cpu priority restore)

(threaded)
	getirq (implicit source specific mask until eoi)
	explicit cpu priority restore
	handle interrupt
	eoi (implicit cpu priority restore to same as explicit level)

Where the cpu priority restore allows receiving other interrupts of the 
same priority from the hardware.

So I guess the question is can the rt kernel interrupt processing take 
advantage of xics auto mask, or does someone need to write state 
tracking in the xics code to work around this, changing mask under 
interrupt to "defer eoi to unmask" (which I can not see as clean, and 
having shutdown problems).

milton

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 10:17     ` Benjamin Herrenschmidt
  2008-09-24 11:02       ` Milton Miller
@ 2008-09-24 12:35       ` Sebastien Dugue
  2008-09-24 21:15         ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-24 12:35 UTC (permalink / raw)
  To: benh
  Cc: linux-kernel, Milton Miller, linuxppc-dev, Christoph Raisch,
	Paul Mackerras, Jan-Bernd Themann


  Hi Ben,

On Wed, 24 Sep 2008 20:17:47 +1000 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Wed, 2008-09-24 at 04:58 -0500, Milton Miller wrote:
> > The per-interrupt mask and unmask calls have to go through RTAS, a
> > single-threaded global context, which in addition to increasing
> > path length will really limit scalability.  The interrupt controller
> > poll and reject facilities are accessed through hypervisor calls
> > which are comparable to a fast syscall, and parallel to all cpus.
> 
> Note also that the XICS code thus assumes, iirc, as does the cell IIC
> code, that eoi is called on the -same- cpu that fetched the interrupt
> initially. That assumption can be broken with IRQ threads no ?

  No, the fetch and the eoi are both done in interrupt context before
the hardirq thread is woken up.

  On the other hand, the mask+eoi and the unmask may well happen
on different cpus as there's only one hardirq thread per irq on
the system. Don't know if this is a problem with the XICS though.

  Thanks,

  Sebastien.

> 
> Ben.
> 
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24  9:58   ` Milton Miller
  2008-09-24 10:17     ` Benjamin Herrenschmidt
@ 2008-09-24 12:30     ` Sebastien Dugue
  2008-09-24 16:42       ` Milton Miller
  1 sibling, 1 reply; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-24 12:30 UTC (permalink / raw)
  To: Milton Miller
  Cc: linux-kernel, linuxppc-dev, Christoph Raisch, Paul Mackerras,
	Jan-Bernd Themann


  Hi Milton,

On Wed, 24 Sep 2008 04:58:22 -0500 (CDT) Milton Miller <miltonm@bga.com> wrote:

> Jan-Bernd wrote:
> > Ben, can you / your team look into the implementation
> > of the set_irq_type functionality needed for XICS?
> 
> I'm not volunteering to look at or implement any changes for how xics
> works with generic irq, but I'm trying to understand what the rt kernel
> is trying to accomplish with this statement:
> 
> On Mon Sep 15 at 18:04:06 EST in 2008, Sebastien Dugue wrote:
> > When entering the low level handler, level sensitive interrupts are
> > masked, then eio'd in interrupt context and then unmasked at the
> > end of hardirq processing.  That's fine as any interrupt comming
> > in-between will still be processed since the kernel replays those
> > pending interrupts.
> 
> Is this to generate some kind of software managed nesting and priority
> of the hardware level interrupts?

  No, not really. This is only to be sure to not miss interrupts coming
from the same source that were received during threaded hardirq processing.
Some instrumentation showed that it never seems to happen in the eHEA
interrupt case, so I think we can forget this aspect.

  Also, the problem only manifests with the eHEA RX interrupt. For example,
the IBM Power Raid (ipr) SCSI exhibits absolutely no problem under an RT
kernel. From this I conclude that:

  IPR - PCI - XICS is OK
  eHEA - IBMEBUS - XICS is broken with hardirq preemption.

  I also checked that forcing the eHEA interrupt to take the non threaded
path does work.


  Here is a side by side comparison of the fasteoi flow with and without hardirq
threading (sorry it's a bit wide)


					fasteoi flow:
					------------

	Non threaded hardirq			|			threaded hardirq
						|
   interrupt context				|	   interrupt context		hardirq thread
   -----------------				|	   -----------------		--------------
						|
						|
clear IRQ_REPLAY and IRQ_WAITING		|	clear IRQ_REPLAY and IRQ_WAITING
						|
increment percpu interrupt count		|	increment percpu interrupt count
						|
if no action or IRQ_INPROGRESS or IRQ_DISABLED	|	if no action or IRQ_INPROGRESS or IRQ_DISABLED
						|
  set IRQ_PENDING				|	  set IRQ_PENDING
						|
  mask						|	  mask
						|
  eoi						|	  eoi
						|
  done						|	  done
						|
set IRQ_INPROGRESS				|	set IRQ_INPROGRESS
						|
						|
						|	wakeup IRQ thread
						|
						|	mask
						|
						|	eoi
						|
						|	done		     --
						|			       \
						|				\
						|				 \
						|				  --> loop
						|
clear IRQ_PENDING				|				        clear IRQ_PENDING
						|
call handle_IRQ_event				|				        call handle_IRQ_event
						|
						|					check for prempt
						|
						|				      until IRQ_PENDING cleared
						|
						|
clear IRQ_INPROGRESS				|				      clear IRQ_INPROGRESS
						|
if not IRQ_DISABLED				|				      if not IRQ_DISABLED
						|
  unmask					|				        unmask
						|
eoi						|
						|
done						|




  the non-threaded flow does (in interrupt context):

    mask
    handle interrupt
    unmask
    eoi

  the threaded flow does:

    mask
    eoi
		handle interrupt
		unmask

  If I remove the mask() call, then the eHEA is no longer hanging.

> 
> The reason I ask is the xics controller can do unlimited nesting
> of hardware interrupts.  In fact, the hardware has 255 levels of
> priority, of which 16 or so are reserved by the hypervisor, leaving
> over 200 for the os to manage.  Higher numbers are lower in priority,
> and the hardware will only dispatch an interrupt to a given cpu if
> it is currenty at a lower priority.  If it is at a higher priority
> and the interrupt is not bound to a specific cpu it will look for
> another cpu to dispatch it.  The hardware will not re-present an
> irq until the it is EOId (managed by a small state machine per
> interrupt at the source, which also handles no cpu available try
> again later), but software can return its cpu priority to the
> previous level to recieve other interrupt sources at the same level.
> The hardware also supports lazy update of the cpu priority register
> when an interrupt is presented; as long as the cpu is hard-irq
> enabled it can take the irq then write is real priority and let the
> hw decide if the irq is still pending or it must defer or try another
> cpu in the rejection scenerio.  The only restriction is that the
> EOI can not cause an interrupt reject by raising the priority while
> sending the EOI command.
> 
> The per-interrupt mask and unmask calls have to go through RTAS, a
> single-threaded global context, which in addition to increasing
> path length will really limit scalability.  The interrupt controller
> poll and reject facilities are accessed through hypervisor calls
> which are comparable to a fast syscall, and parallel to all cpus.
> 
> We used to lower the priority to allow other interrupts in, but we
> realized that in addition to the questionable latency in doing so,
> it only caused unlimited stack nesting and overflow without per-irq
> stacks.  We currently set IPIs above other irqs so we typically
> only process them during a hard irq (but we return to base level
> after IPI and could take another base irq, a bug).
> 
> 
> So, Sebastien, with this information, is does the RT kernel have
> a strategy that better matches this hardware?

  Don't think so. I think that the problem may be elsewhere as
everything is fine with PCI devices (well at least SCSI).

  As I said earlier in another mail, it seems that the eHEA
is behaving as if it was generating edge interrupts which do not
support masking. Don't know.

  Thanks a lot for the explanation, looks like the xics + hypervisor
combo is way more complex than I thought.

  Sebastien.

> 
> milton
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24 10:17     ` Benjamin Herrenschmidt
@ 2008-09-24 11:02       ` Milton Miller
  2008-09-24 21:14         ` Benjamin Herrenschmidt
  2008-09-24 12:35       ` Sebastien Dugue
  1 sibling, 1 reply; 27+ messages in thread
From: Milton Miller @ 2008-09-24 11:02 UTC (permalink / raw)
  To: benh
  Cc: linux-kernel, linuxppc-dev, Sebastien Dugue, Jan-Bernd Themann,
	Paul Mackerras, Christoph Raisch

On Sep 24, 2008, at 5:17 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2008-09-24 at 04:58 -0500, Milton Miller wrote:
>> The per-interrupt mask and unmask calls have to go through RTAS, a
>> single-threaded global context, which in addition to increasing
>> path length will really limit scalability.  The interrupt controller
>> poll and reject facilities are accessed through hypervisor calls
>> which are comparable to a fast syscall, and parallel to all cpus.
>
> Note also that the XICS code thus assumes, iirc, as does the cell IIC
> code, that eoi is called on the -same- cpu that fetched the interrupt
> initially. That assumption can be broken with IRQ threads no ?

There may be some implicit assumption in that we expect the cpu 
priority to be returned to normal by the EOI, but there is nothing in 
the hardware that requires the EOI to come from the same cpu as 
accepted the interrupt for processing, with the exception of the IPI 
which is per-cpu (and the only interrupt that is per-cpu).

It would probably mean adding the concept of the current cpu priority 
vs interrupts and making sure we write it to hardware at irq_exit() 
time when deferring the actual irq handlers.

The MPIC hardware, on the other hand, maintains a queue of pending 
interrupts (It has been about a decade but the number 4-5 comes to 
mind), and the hardware must receive the EOI on the cpu that took it.  
I am guessing that the handling described (take level irq, mask it, eoi 
it, dispatch the thread, then unmask it after processing) is a result 
to work within those limitations.  Do you know the cell IIC to know if 
its mpic or xics in this regard?

The other unknown is the (very few) platforms that present as xics but 
are really firmware on mpic.  If they do a full virtual layer and don't 
take shortcuts but do eoi/mask like described here they should work, 
but I would not be surprised that does not hold true :-(.

milton

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-24  9:58   ` Milton Miller
@ 2008-09-24 10:17     ` Benjamin Herrenschmidt
  2008-09-24 11:02       ` Milton Miller
  2008-09-24 12:35       ` Sebastien Dugue
  2008-09-24 12:30     ` Sebastien Dugue
  1 sibling, 2 replies; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2008-09-24 10:17 UTC (permalink / raw)
  To: Milton Miller
  Cc: linux-kernel, linuxppc-dev, Sebastien Dugue, Christoph Raisch,
	Paul Mackerras, Jan-Bernd Themann

On Wed, 2008-09-24 at 04:58 -0500, Milton Miller wrote:
> The per-interrupt mask and unmask calls have to go through RTAS, a
> single-threaded global context, which in addition to increasing
> path length will really limit scalability.  The interrupt controller
> poll and reject facilities are accessed through hypervisor calls
> which are comparable to a fast syscall, and parallel to all cpus.

Note also that the XICS code thus assumes, iirc, as does the cell IIC
code, that eoi is called on the -same- cpu that fetched the interrupt
initially. That assumption can be broken with IRQ threads no ?

Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-15  8:04 ` [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Sebastien Dugue
                     ` (2 preceding siblings ...)
  2008-09-18  7:53   ` Christoph Raisch
@ 2008-09-24  9:58   ` Milton Miller
  2008-09-24 10:17     ` Benjamin Herrenschmidt
  2008-09-24 12:30     ` Sebastien Dugue
  3 siblings, 2 replies; 27+ messages in thread
From: Milton Miller @ 2008-09-24  9:58 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: linux-kernel, linuxppc-dev, Christoph Raisch, Paul Mackerras,
	Jan-Bernd Themann

Jan-Bernd wrote:
> Ben, can you / your team look into the implementation
> of the set_irq_type functionality needed for XICS?

I'm not volunteering to look at or implement any changes for how xics
works with generic irq, but I'm trying to understand what the rt kernel
is trying to accomplish with this statement:

On Mon Sep 15 at 18:04:06 EST in 2008, Sebastien Dugue wrote:
> When entering the low level handler, level sensitive interrupts are
> masked, then eio'd in interrupt context and then unmasked at the
> end of hardirq processing.  That's fine as any interrupt comming
> in-between will still be processed since the kernel replays those
> pending interrupts.

Is this to generate some kind of software managed nesting and priority
of the hardware level interrupts?

The reason I ask is the xics controller can do unlimited nesting
of hardware interrupts.  In fact, the hardware has 255 levels of
priority, of which 16 or so are reserved by the hypervisor, leaving
over 200 for the os to manage.  Higher numbers are lower in priority,
and the hardware will only dispatch an interrupt to a given cpu if
it is currenty at a lower priority.  If it is at a higher priority
and the interrupt is not bound to a specific cpu it will look for
another cpu to dispatch it.  The hardware will not re-present an
irq until the it is EOId (managed by a small state machine per
interrupt at the source, which also handles no cpu available try
again later), but software can return its cpu priority to the
previous level to recieve other interrupt sources at the same level.
The hardware also supports lazy update of the cpu priority register
when an interrupt is presented; as long as the cpu is hard-irq
enabled it can take the irq then write is real priority and let the
hw decide if the irq is still pending or it must defer or try another
cpu in the rejection scenerio.  The only restriction is that the
EOI can not cause an interrupt reject by raising the priority while
sending the EOI command.

The per-interrupt mask and unmask calls have to go through RTAS, a
single-threaded global context, which in addition to increasing
path length will really limit scalability.  The interrupt controller
poll and reject facilities are accessed through hypervisor calls
which are comparable to a fast syscall, and parallel to all cpus.

We used to lower the priority to allow other interrupts in, but we
realized that in addition to the questionable latency in doing so,
it only caused unlimited stack nesting and overflow without per-irq
stacks.  We currently set IPIs above other irqs so we typically
only process them during a hard irq (but we return to base level
after IPI and could take another base irq, a bug).


So, Sebastien, with this information, is does the RT kernel have
a strategy that better matches this hardware?

milton

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-18  7:53   ` Christoph Raisch
@ 2008-09-18  9:27     ` Sebastien Dugue
  0 siblings, 0 replies; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-18  9:27 UTC (permalink / raw)
  To: Christoph Raisch
  Cc: Thomas Q Klein, tinytim, Linux-rt, Jan-Bernd Themann, netdev,
	jean-pierre.dion, linux-kernel, linux-ppc,
	Benjamin Herrenschmidt, gilles.carry

On Thu, 18 Sep 2008 09:53:54 +0200 Christoph Raisch <RAISCH@de.ibm.com> wrote:

> 
> Sebastien Dugue <sebastien.dugue@bull.net> wrote on 15.09.2008 10:04:06:
> > [PATCH HACK] powerpc: quick hack to get a functional eHEA with
> > hardirq preemption
> >
> > Sebastien Dugue
> >
> > to:
> >
> > 15.09.2008 10:07
> >
> > Cc:
> >
> > linux-ppc, linux-kernel, Linux-rt, netdev, Jan-Bernd Themann, Thomas
> > Q Klein, Christoph Raisch, jean-pierre.dion, gilles.carry, tinytim
> >
> >
> > WARNING: HACK - HACK - HACK
> > Not-Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
> > ---
> > --- a/arch/powerpc/kernel/ibmebus.c
> > +++ b/arch/powerpc/kernel/ibmebus.c
> > @@ -41,6 +41,7 @@
> > -   return request_irq(irq, handler, irq_flags, devname, dev_id);
> > +   ret = request_irq(irq, handler, irq_flags, devname, dev_id);
> > +
> > +   desc = irq_desc + irq;
> > +   desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
> > +   desc->status |= IRQ_TYPE_EDGE_RISING;
> > +
> > +   return ret;
> This looks a bit like a set_irq_type call.

  Yep.

> Don't know if this is fully implemented for xics though...

  No, the xics does not implement a set_type() method.

> >  }
> >  EXPORT_SYMBOL(ibmebus_request_irq);
> >
> > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > index b7b397a..6d366ca 100644
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -430,7 +430,7 @@ handle_fasteoi_irq(unsigned int irq, struct
> > irq_desc *desc)
> >     action = desc->action;
> >     if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
> >                     IRQ_DISABLED)))) {
> > -      desc->status |= IRQ_PENDING;
> > +      desc->status |= IRQ_PENDING | IRQ_MASKED;
> >        if (desc->chip->mask)
> >           desc->chip->mask(irq);
> >        goto out;
> > @@ -439,9 +439,10 @@ handle_fasteoi_irq(unsigned int irq, struct
> > irq_desc *desc)
> >     desc->status |= IRQ_INPROGRESS;
> >     /*
> >      * In the threaded case we fall back to a mask+eoi sequence:
> > +    * excepted for edge interrupts which are not masked.
> >      */
> >     if (redirect_hardirq(desc)) {
> > -      if (desc->chip->mask)
> > +      if (desc->chip->mask && !(desc->status & IRQ_TYPE_EDGE_BOTH))
> >           desc->chip->mask(irq);
> >        goto out;
> >     }
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 3bffa20..3e39c71 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -788,9 +788,12 @@ static void do_hardirq(struct irq_desc *desc)
> >        thread_simple_irq(desc);
> >     else if (desc->handle_irq == handle_level_irq)
> >        thread_level_irq(desc);
> > -   else if (desc->handle_irq == handle_fasteoi_irq)
> > -      thread_fasteoi_irq(desc);
> > -   else if (desc->handle_irq == handle_edge_irq)
> > +   else if (desc->handle_irq == handle_fasteoi_irq) {
> > +      if (desc->status & IRQ_TYPE_EDGE_BOTH)
> > +         thread_edge_irq(desc);
> > +      else
> > +         thread_fasteoi_irq(desc);
> > +   } else if (desc->handle_irq == handle_edge_irq)
> >        thread_edge_irq(desc);
> >     else
> >        thread_do_irq(desc);
> > --
> > 1.6.0.1.308.gede4c
> >
> According to the specs at some point in the system the HEA IRQs have a edge
> characteristic.
> But since PCI-E edge and level can both be forwarded through a message
> interface
> (HEA is not PCI-E, it's only connected to the same internal bus, where the
> PHB resides)

  Good to know, the problem here seems to be that the xics is using the
fasteoi flow handler, which under unconditionally masks the irq. Will have to
dig in the genirq stuff a bit more to understand what the differences are
between -rt and mainline.

> 
> Anybody from the xics experts want to comment on this?

  It would be really interresting to know if the eHCA exhibits the same
problem under -rt as it's the only other user of the ibmebus.
Unfortunately I don't have the hardware to test.

  Thanks,

  Sebastien.


> 
> 
> 
> Gruss / Regards
> Christoph R.  & Jan-Bernd
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-15  8:04 ` [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Sebastien Dugue
  2008-09-15 12:17   ` Jan-Bernd Themann
  2008-09-15 12:35   ` Thomas Klein
@ 2008-09-18  7:53   ` Christoph Raisch
  2008-09-18  9:27     ` Sebastien Dugue
  2008-09-24  9:58   ` Milton Miller
  3 siblings, 1 reply; 27+ messages in thread
From: Christoph Raisch @ 2008-09-18  7:53 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: Thomas Q Klein, tinytim, Linux-rt, Jan-Bernd Themann, netdev,
	jean-pierre.dion, linux-kernel, linux-ppc,
	Benjamin Herrenschmidt, gilles.carry


Sebastien Dugue <sebastien.dugue@bull.net> wrote on 15.09.2008 10:04:06:
> [PATCH HACK] powerpc: quick hack to get a functional eHEA with
> hardirq preemption
>
> Sebastien Dugue
>
> to:
>
> 15.09.2008 10:07
>
> Cc:
>
> linux-ppc, linux-kernel, Linux-rt, netdev, Jan-Bernd Themann, Thomas
> Q Klein, Christoph Raisch, jean-pierre.dion, gilles.carry, tinytim
>
>
> WARNING: HACK - HACK - HACK
> Not-Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
> ---
> --- a/arch/powerpc/kernel/ibmebus.c
> +++ b/arch/powerpc/kernel/ibmebus.c
> @@ -41,6 +41,7 @@
> -   return request_irq(irq, handler, irq_flags, devname, dev_id);
> +   ret = request_irq(irq, handler, irq_flags, devname, dev_id);
> +
> +   desc = irq_desc + irq;
> +   desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
> +   desc->status |= IRQ_TYPE_EDGE_RISING;
> +
> +   return ret;
This looks a bit like a set_irq_type call.
Don't know if this is fully implemented for xics though...
>  }
>  EXPORT_SYMBOL(ibmebus_request_irq);
>
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index b7b397a..6d366ca 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -430,7 +430,7 @@ handle_fasteoi_irq(unsigned int irq, struct
> irq_desc *desc)
>     action = desc->action;
>     if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
>                     IRQ_DISABLED)))) {
> -      desc->status |= IRQ_PENDING;
> +      desc->status |= IRQ_PENDING | IRQ_MASKED;
>        if (desc->chip->mask)
>           desc->chip->mask(irq);
>        goto out;
> @@ -439,9 +439,10 @@ handle_fasteoi_irq(unsigned int irq, struct
> irq_desc *desc)
>     desc->status |= IRQ_INPROGRESS;
>     /*
>      * In the threaded case we fall back to a mask+eoi sequence:
> +    * excepted for edge interrupts which are not masked.
>      */
>     if (redirect_hardirq(desc)) {
> -      if (desc->chip->mask)
> +      if (desc->chip->mask && !(desc->status & IRQ_TYPE_EDGE_BOTH))
>           desc->chip->mask(irq);
>        goto out;
>     }
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 3bffa20..3e39c71 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -788,9 +788,12 @@ static void do_hardirq(struct irq_desc *desc)
>        thread_simple_irq(desc);
>     else if (desc->handle_irq == handle_level_irq)
>        thread_level_irq(desc);
> -   else if (desc->handle_irq == handle_fasteoi_irq)
> -      thread_fasteoi_irq(desc);
> -   else if (desc->handle_irq == handle_edge_irq)
> +   else if (desc->handle_irq == handle_fasteoi_irq) {
> +      if (desc->status & IRQ_TYPE_EDGE_BOTH)
> +         thread_edge_irq(desc);
> +      else
> +         thread_fasteoi_irq(desc);
> +   } else if (desc->handle_irq == handle_edge_irq)
>        thread_edge_irq(desc);
>     else
>        thread_do_irq(desc);
> --
> 1.6.0.1.308.gede4c
>
According to the specs at some point in the system the HEA IRQs have a edge
characteristic.
But since PCI-E edge and level can both be forwarded through a message
interface
(HEA is not PCI-E, it's only connected to the same internal bus, where the
PHB resides)

Anybody from the xics experts want to comment on this?



Gruss / Regards
Christoph R.  & Jan-Bernd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-16 11:59       ` Anton Vorontsov
@ 2008-09-16 12:22         ` Sebastien Dugue
  0 siblings, 0 replies; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-16 12:22 UTC (permalink / raw)
  To: avorontsov
  Cc: tklein, tinytim, Linux-rt, jean-pierre.dion, themann, netdev,
	linux-kernel, Thomas Klein, linux-ppc, raisch, gilles.carry


  Hi Anton,

On Tue, 16 Sep 2008 15:59:47 +0400 Anton Vorontsov <avorontsov@ru.mvista.com> wrote:

> On Mon, Sep 15, 2008 at 03:13:32PM +0200, Sebastien Dugue wrote:
> [...]
> > > we are a bit worried about putting this into the mainstream part of non real
> > > time linux.
> > 
> >   Heck, I sure do not want this to be applied mainstream nor into any tree.
> > The sole purpose of this patch was to trigger some reaction from the people who
> > know the hardware and try to understand where the problem lies.
> > 
> > > There interrupts work perfectly fine, and it was a bit of a
> > > challenge to get there for all cases / configurations / machines.
> > 
> >   Agreed, but the fact that it fails with hardirq preemption leads me to
> > believe (without any more knowledge about the harware) that there might be
> > something amiss with this driver (or the code concerning the XICS)
> > nevertheless.
> > 
> > > 
> > > Could you try to enable these changes only for RT-Linux via a real-time
> > > kconfig switch?
> > 
> >   Nope, this is just a quick hack that allows me to have a functional eHEA under
> > the rt kernel. I want to understand what the problem is:
> > 
> >   - Is the eHEA really delivering level interrupts to the XICS?
> > 
> >   - Is the XICS loosing interrupts when they are masked?
> 
> There is a known bug in the -rt kernels, the bug causes handlers
> to lose edge interrupts.
> 
> See this patch:
> 
> http://lkml.org/lkml/2008/6/30/372

  Yes, I've been following that thread back then and my hack is based on your
patch. So yes, it seems to be the same problem and it lies in the way -rt handles
the fasteoi flow.

  But looking at the comments from the XICS code, it seems that its wired for
level only interrupts. Therefore without any more specs, it's still not clear to
me that there's not a bug with the way the xics handles eHEA interrupts.

  Are the eHEA interrupts really level interrupts? If so why do they get lost
when masked?

  I just found that not masking that particular interrupt in the fasteoi path
makes the thing work!

  Thanks,

  Sebastien.

> 
> >   - ...?
> 
> -- 
> Anton Vorontsov
> email: cbouatmailru@gmail.com
> irc://irc.freenode.net/bd2
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-15 13:13     ` Sebastien Dugue
@ 2008-09-16 11:59       ` Anton Vorontsov
  2008-09-16 12:22         ` Sebastien Dugue
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Vorontsov @ 2008-09-16 11:59 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: tklein, tinytim, Linux-rt, jean-pierre.dion, themann, netdev,
	linux-kernel, Thomas Klein, linux-ppc, raisch, gilles.carry

On Mon, Sep 15, 2008 at 03:13:32PM +0200, Sebastien Dugue wrote:
[...]
> > we are a bit worried about putting this into the mainstream part of non real
> > time linux.
> 
>   Heck, I sure do not want this to be applied mainstream nor into any tree.
> The sole purpose of this patch was to trigger some reaction from the people who
> know the hardware and try to understand where the problem lies.
> 
> > There interrupts work perfectly fine, and it was a bit of a
> > challenge to get there for all cases / configurations / machines.
> 
>   Agreed, but the fact that it fails with hardirq preemption leads me to
> believe (without any more knowledge about the harware) that there might be
> something amiss with this driver (or the code concerning the XICS)
> nevertheless.
> 
> > 
> > Could you try to enable these changes only for RT-Linux via a real-time
> > kconfig switch?
> 
>   Nope, this is just a quick hack that allows me to have a functional eHEA under
> the rt kernel. I want to understand what the problem is:
> 
>   - Is the eHEA really delivering level interrupts to the XICS?
> 
>   - Is the XICS loosing interrupts when they are masked?

There is a known bug in the -rt kernels, the bug causes handlers
to lose edge interrupts.

See this patch:

http://lkml.org/lkml/2008/6/30/372

>   - ...?

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-15 12:35   ` Thomas Klein
@ 2008-09-15 13:13     ` Sebastien Dugue
  2008-09-16 11:59       ` Anton Vorontsov
  0 siblings, 1 reply; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-15 13:13 UTC (permalink / raw)
  To: Thomas Klein
  Cc: tklein, tinytim, Linux-rt, themann, netdev, linux-kernel,
	jean-pierre.dion, linux-ppc, raisch, gilles.carry


  Hi Thomas, Jan-Bernd, Christoph,

On Mon, 15 Sep 2008 14:35:16 +0200 Thomas Klein <osstklei@de.ibm.com> wrote:

> Hi,
> 
> we are a bit worried about putting this into the mainstream part of non real
> time linux.

  Heck, I sure do not want this to be applied mainstream nor into any tree.
The sole purpose of this patch was to trigger some reaction from the people who
know the hardware and try to understand where the problem lies.

> There interrupts work perfectly fine, and it was a bit of a
> challenge to get there for all cases / configurations / machines.

  Agreed, but the fact that it fails with hardirq preemption leads me to
believe (without any more knowledge about the harware) that there might be
something amiss with this driver (or the code concerning the XICS)
nevertheless.

> 
> Could you try to enable these changes only for RT-Linux via a real-time
> kconfig switch?

  Nope, this is just a quick hack that allows me to have a functional eHEA under
the rt kernel. I want to understand what the problem is:

  - Is the eHEA really delivering level interrupts to the XICS?

  - Is the XICS loosing interrupts when they are masked?

  - ...?

> This way we make sure we don't break the scheme for
> eHEA / eHCA.

  Sure, I do not want to break anything, quite the opposite in fact ;-)


  Thanks,

  Sebastien.

> 
> Regards,
> Jan-Bernd, Christoph
> 
> 
> Sebastien Dugue wrote:
> > WARNING: HACK - HACK - HACK
> > 
> >   Under the RT kernel (with hardirq preemption) the eHEA driver hangs right
> > after booting. Fiddling with the hardirqs and softirqs priorities allows to
> > run a bit longer but as soon as the network gets under load, the hang
> > returns. After investigating, it appears that the driver is loosing interrupts.
> > 
> >   To make a long story short, looking at the code, it appears that the XICS
> > maps all its interrupts to level sensitive interrupts (I don't know if it's the
> > reality or if it's due to an incomplete implementation - no datasheets
> > available to check) and use the fasteoi processing flow.
> > 
> >   When entering the low level handler, level sensitive interrupts are masked,
> > then eio'd in interrupt context and then unmasked at the end of hardirq
> > processing.
> > That's fine as any interrupt comming in-between will still be processed since
> > the kernel replays those pending interrupts.
> > 
> >   However, it appears that the eHEA interrupts are behaving as edge sensitive
> > interrupts and are routed through the XICS which process those as level
> > sensitive using the fasteoi handler __OR__ the XICS loses interrupts when they
> > are masked.
> > 
> >   Therefore the masking done in the handler causes any interrupt happening while
> > in the handler to be lost.
> > 
> >   So this patch maps the interrupts being requested through
> > ibmebus_request_irq() as edge sensitive interrupts (this concerns both the eHEA
> > and the eHCA - only users of ibmebus_request_irq()) and changes the way edge
> > interrupts are processed by the fasteoi handler.
> > 
> >   It works for the eHEA, dunno for the eHCA.
> > 
> >   So, unless all the designers of the XICS & eHEA have been shot to keep it
> > a secret, could someone knowledgeable shed some light on this issue.
> > 
> >   Thanks,
> > 
> >   Sebastien.
> > 
> > Not-Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
> > ---
> >  arch/powerpc/kernel/ibmebus.c |   11 ++++++++++-
> >  kernel/irq/chip.c             |    5 +++--
> >  kernel/irq/manage.c           |    9 ++++++---
> >  3 files changed, 19 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/ibmebus.c b/arch/powerpc/kernel/ibmebus.c
> > index 9971159..5200323 100644
> > --- a/arch/powerpc/kernel/ibmebus.c
> > +++ b/arch/powerpc/kernel/ibmebus.c
> > @@ -41,6 +41,7 @@
> >  #include <linux/kobject.h>
> >  #include <linux/dma-mapping.h>
> >  #include <linux/interrupt.h>
> > +#include <linux/irq.h>
> >  #include <linux/of.h>
> >  #include <linux/of_platform.h>
> >  #include <asm/ibmebus.h>
> > @@ -213,11 +214,19 @@ int ibmebus_request_irq(u32 ist, irq_handler_t handler,
> >  			void *dev_id)
> >  {
> >  	unsigned int irq = irq_create_mapping(NULL, ist);
> > +	struct irq_desc *desc;
> > +	int ret;
> >  
> >  	if (irq == NO_IRQ)
> >  		return -EINVAL;
> >  
> > -	return request_irq(irq, handler, irq_flags, devname, dev_id);
> > +	ret = request_irq(irq, handler, irq_flags, devname, dev_id);
> > +
> > +	desc = irq_desc + irq;
> > +	desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
> > +	desc->status |= IRQ_TYPE_EDGE_RISING;
> > +
> > +	return ret;
> >  }
> >  EXPORT_SYMBOL(ibmebus_request_irq);
> >  
> > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > index b7b397a..6d366ca 100644
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -430,7 +430,7 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
> >  	action = desc->action;
> >  	if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
> >  						 IRQ_DISABLED)))) {
> > -		desc->status |= IRQ_PENDING;
> > +		desc->status |= IRQ_PENDING | IRQ_MASKED;
> >  		if (desc->chip->mask)
> >  			desc->chip->mask(irq);
> >  		goto out;
> > @@ -439,9 +439,10 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
> >  	desc->status |= IRQ_INPROGRESS;
> >  	/*
> >  	 * In the threaded case we fall back to a mask+eoi sequence:
> > +	 * excepted for edge interrupts which are not masked.
> >  	 */
> >  	if (redirect_hardirq(desc)) {
> > -		if (desc->chip->mask)
> > +		if (desc->chip->mask && !(desc->status & IRQ_TYPE_EDGE_BOTH))
> >  			desc->chip->mask(irq);
> >  		goto out;
> >  	}
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 3bffa20..3e39c71 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -788,9 +788,12 @@ static void do_hardirq(struct irq_desc *desc)
> >  		thread_simple_irq(desc);
> >  	else if (desc->handle_irq == handle_level_irq)
> >  		thread_level_irq(desc);
> > -	else if (desc->handle_irq == handle_fasteoi_irq)
> > -		thread_fasteoi_irq(desc);
> > -	else if (desc->handle_irq == handle_edge_irq)
> > +	else if (desc->handle_irq == handle_fasteoi_irq) {
> > +		if (desc->status & IRQ_TYPE_EDGE_BOTH)
> > +			thread_edge_irq(desc);
> > +		else
> > +			thread_fasteoi_irq(desc);
> > +	} else if (desc->handle_irq == handle_edge_irq)
> >  		thread_edge_irq(desc);
> >  	else
> >  		thread_do_irq(desc);
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-15  8:04 ` [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Sebastien Dugue
  2008-09-15 12:17   ` Jan-Bernd Themann
@ 2008-09-15 12:35   ` Thomas Klein
  2008-09-15 13:13     ` Sebastien Dugue
  2008-09-18  7:53   ` Christoph Raisch
  2008-09-24  9:58   ` Milton Miller
  3 siblings, 1 reply; 27+ messages in thread
From: Thomas Klein @ 2008-09-15 12:35 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: tklein, tinytim, Linux-rt, themann, netdev, linux-kernel,
	jean-pierre.dion, linux-ppc, raisch, gilles.carry

Hi,

we are a bit worried about putting this into the mainstream part of non real
time linux. There interrupts work perfectly fine, and it was a bit of a
challenge to get there for all cases / configurations / machines.

Could you try to enable these changes only for RT-Linux via a real-time
kconfig switch? This way we make sure we don't break the scheme for
eHEA / eHCA.

Regards,
Jan-Bernd, Christoph


Sebastien Dugue wrote:
> WARNING: HACK - HACK - HACK
> 
>   Under the RT kernel (with hardirq preemption) the eHEA driver hangs right
> after booting. Fiddling with the hardirqs and softirqs priorities allows to
> run a bit longer but as soon as the network gets under load, the hang
> returns. After investigating, it appears that the driver is loosing interrupts.
> 
>   To make a long story short, looking at the code, it appears that the XICS
> maps all its interrupts to level sensitive interrupts (I don't know if it's the
> reality or if it's due to an incomplete implementation - no datasheets
> available to check) and use the fasteoi processing flow.
> 
>   When entering the low level handler, level sensitive interrupts are masked,
> then eio'd in interrupt context and then unmasked at the end of hardirq
> processing.
> That's fine as any interrupt comming in-between will still be processed since
> the kernel replays those pending interrupts.
> 
>   However, it appears that the eHEA interrupts are behaving as edge sensitive
> interrupts and are routed through the XICS which process those as level
> sensitive using the fasteoi handler __OR__ the XICS loses interrupts when they
> are masked.
> 
>   Therefore the masking done in the handler causes any interrupt happening while
> in the handler to be lost.
> 
>   So this patch maps the interrupts being requested through
> ibmebus_request_irq() as edge sensitive interrupts (this concerns both the eHEA
> and the eHCA - only users of ibmebus_request_irq()) and changes the way edge
> interrupts are processed by the fasteoi handler.
> 
>   It works for the eHEA, dunno for the eHCA.
> 
>   So, unless all the designers of the XICS & eHEA have been shot to keep it
> a secret, could someone knowledgeable shed some light on this issue.
> 
>   Thanks,
> 
>   Sebastien.
> 
> Not-Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
> ---
>  arch/powerpc/kernel/ibmebus.c |   11 ++++++++++-
>  kernel/irq/chip.c             |    5 +++--
>  kernel/irq/manage.c           |    9 ++++++---
>  3 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/ibmebus.c b/arch/powerpc/kernel/ibmebus.c
> index 9971159..5200323 100644
> --- a/arch/powerpc/kernel/ibmebus.c
> +++ b/arch/powerpc/kernel/ibmebus.c
> @@ -41,6 +41,7 @@
>  #include <linux/kobject.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/interrupt.h>
> +#include <linux/irq.h>
>  #include <linux/of.h>
>  #include <linux/of_platform.h>
>  #include <asm/ibmebus.h>
> @@ -213,11 +214,19 @@ int ibmebus_request_irq(u32 ist, irq_handler_t handler,
>  			void *dev_id)
>  {
>  	unsigned int irq = irq_create_mapping(NULL, ist);
> +	struct irq_desc *desc;
> +	int ret;
>  
>  	if (irq == NO_IRQ)
>  		return -EINVAL;
>  
> -	return request_irq(irq, handler, irq_flags, devname, dev_id);
> +	ret = request_irq(irq, handler, irq_flags, devname, dev_id);
> +
> +	desc = irq_desc + irq;
> +	desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
> +	desc->status |= IRQ_TYPE_EDGE_RISING;
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL(ibmebus_request_irq);
>  
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index b7b397a..6d366ca 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -430,7 +430,7 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
>  	action = desc->action;
>  	if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
>  						 IRQ_DISABLED)))) {
> -		desc->status |= IRQ_PENDING;
> +		desc->status |= IRQ_PENDING | IRQ_MASKED;
>  		if (desc->chip->mask)
>  			desc->chip->mask(irq);
>  		goto out;
> @@ -439,9 +439,10 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
>  	desc->status |= IRQ_INPROGRESS;
>  	/*
>  	 * In the threaded case we fall back to a mask+eoi sequence:
> +	 * excepted for edge interrupts which are not masked.
>  	 */
>  	if (redirect_hardirq(desc)) {
> -		if (desc->chip->mask)
> +		if (desc->chip->mask && !(desc->status & IRQ_TYPE_EDGE_BOTH))
>  			desc->chip->mask(irq);
>  		goto out;
>  	}
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 3bffa20..3e39c71 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -788,9 +788,12 @@ static void do_hardirq(struct irq_desc *desc)
>  		thread_simple_irq(desc);
>  	else if (desc->handle_irq == handle_level_irq)
>  		thread_level_irq(desc);
> -	else if (desc->handle_irq == handle_fasteoi_irq)
> -		thread_fasteoi_irq(desc);
> -	else if (desc->handle_irq == handle_edge_irq)
> +	else if (desc->handle_irq == handle_fasteoi_irq) {
> +		if (desc->status & IRQ_TYPE_EDGE_BOTH)
> +			thread_edge_irq(desc);
> +		else
> +			thread_fasteoi_irq(desc);
> +	} else if (desc->handle_irq == handle_edge_irq)
>  		thread_edge_irq(desc);
>  	else
>  		thread_do_irq(desc);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
  2008-09-15  8:04 ` [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Sebastien Dugue
@ 2008-09-15 12:17   ` Jan-Bernd Themann
  2008-09-15 12:35   ` Thomas Klein
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Jan-Bernd Themann @ 2008-09-15 12:17 UTC (permalink / raw)
  To: Sebastien Dugue
  Cc: Thomas Q Klein, tinytim, Linux-rt, netdev, jean-pierre.dion,
	linux-kernel, linux-ppc, Christoph Raisch, gilles.carry

[-- Attachment #1: Type: text/plain, Size: 5447 bytes --]

Hi,

we are a bit worried about putting this into the mainstream part of non 
real time linux.
There interrupts work perfectly fine, and it was a bit of a challenge to 
get there for all
cases / configurations / machines.

Could you try to enable these changes only for RT-Linux via a real-time 
kconfig switch?
This way we make sure we don't break the scheme for eHEA / eHCA. 

Regards,
Jan-Bernd & Christoph

Sebastien Dugue <sebastien.dugue@bull.net> wrote on 15.09.2008 10:04:06:

> 
> WARNING: HACK - HACK - HACK
> 
>   Under the RT kernel (with hardirq preemption) the eHEA driver hangs 
right
> after booting. Fiddling with the hardirqs and softirqs priorities allows 
to
> run a bit longer but as soon as the network gets under load, the hang
> returns. After investigating, it appears that the driver is loosing 
> interrupts.
> 
>   To make a long story short, looking at the code, it appears that the 
XICS
> maps all its interrupts to level sensitive interrupts (I don't know 
> if it's the
> reality or if it's due to an incomplete implementation - no datasheets
> available to check) and use the fasteoi processing flow.
> 
>   When entering the low level handler, level sensitive interrupts are 
masked,
> then eio'd in interrupt context and then unmasked at the end of hardirq
> processing.
> That's fine as any interrupt comming in-between will still be processed 
since
> the kernel replays those pending interrupts.
> 
>   However, it appears that the eHEA interrupts are behaving as edge 
sensitive
> interrupts and are routed through the XICS which process those as level
> sensitive using the fasteoi handler __OR__ the XICS loses interruptswhen 
they
> are masked.
> 
>   Therefore the masking done in the handler causes any interrupt 
> happening while
> in the handler to be lost.
> 
>   So this patch maps the interrupts being requested through
> ibmebus_request_irq() as edge sensitive interrupts (this concerns 
> both the eHEA
> and the eHCA - only users of ibmebus_request_irq()) and changes the way 
edge
> interrupts are processed by the fasteoi handler.
> 
>   It works for the eHEA, dunno for the eHCA.
> 
>   So, unless all the designers of the XICS & eHEA have been shot to keep 
it
> a secret, could someone knowledgeable shed some light on this issue.
> 
>   Thanks,
> 
>   Sebastien.
> 
> Not-Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
> ---
>  arch/powerpc/kernel/ibmebus.c |   11 ++++++++++-
>  kernel/irq/chip.c             |    5 +++--
>  kernel/irq/manage.c           |    9 ++++++---
>  3 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/ibmebus.c 
b/arch/powerpc/kernel/ibmebus.c
> index 9971159..5200323 100644
> --- a/arch/powerpc/kernel/ibmebus.c
> +++ b/arch/powerpc/kernel/ibmebus.c
> @@ -41,6 +41,7 @@
>  #include <linux/kobject.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/interrupt.h>
> +#include <linux/irq.h>
>  #include <linux/of.h>
>  #include <linux/of_platform.h>
>  #include <asm/ibmebus.h>
> @@ -213,11 +214,19 @@ int ibmebus_request_irq(u32 ist, irq_handler_t 
handler,
>           void *dev_id)
>  {
>     unsigned int irq = irq_create_mapping(NULL, ist);
> +   struct irq_desc *desc;
> +   int ret;
> 
>     if (irq == NO_IRQ)
>        return -EINVAL;
> 
> -   return request_irq(irq, handler, irq_flags, devname, dev_id);
> +   ret = request_irq(irq, handler, irq_flags, devname, dev_id);
> +
> +   desc = irq_desc + irq;
> +   desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
> +   desc->status |= IRQ_TYPE_EDGE_RISING;
> +
> +   return ret;
>  }
>  EXPORT_SYMBOL(ibmebus_request_irq);
> 
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index b7b397a..6d366ca 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -430,7 +430,7 @@ handle_fasteoi_irq(unsigned int irq, struct 
> irq_desc *desc)
>     action = desc->action;
>     if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
>                     IRQ_DISABLED)))) {
> -      desc->status |= IRQ_PENDING;
> +      desc->status |= IRQ_PENDING | IRQ_MASKED;
>        if (desc->chip->mask)
>           desc->chip->mask(irq);
>        goto out;
> @@ -439,9 +439,10 @@ handle_fasteoi_irq(unsigned int irq, struct 
> irq_desc *desc)
>     desc->status |= IRQ_INPROGRESS;
>     /*
>      * In the threaded case we fall back to a mask+eoi sequence:
> +    * excepted for edge interrupts which are not masked.
>      */
>     if (redirect_hardirq(desc)) {
> -      if (desc->chip->mask)
> +      if (desc->chip->mask && !(desc->status & IRQ_TYPE_EDGE_BOTH))
>           desc->chip->mask(irq);
>        goto out;
>     }
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 3bffa20..3e39c71 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -788,9 +788,12 @@ static void do_hardirq(struct irq_desc *desc)
>        thread_simple_irq(desc);
>     else if (desc->handle_irq == handle_level_irq)
>        thread_level_irq(desc);
> -   else if (desc->handle_irq == handle_fasteoi_irq)
> -      thread_fasteoi_irq(desc);
> -   else if (desc->handle_irq == handle_edge_irq)
> +   else if (desc->handle_irq == handle_fasteoi_irq) {
> +      if (desc->status & IRQ_TYPE_EDGE_BOTH)
> +         thread_edge_irq(desc);
> +      else
> +         thread_fasteoi_irq(desc);
> +   } else if (desc->handle_irq == handle_edge_irq)
>        thread_edge_irq(desc);
>     else
>        thread_do_irq(desc);
> -- 
> 1.6.0.1.308.gede4c
> 

[-- Attachment #2: Type: text/html, Size: 7710 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
@ 2008-09-15  8:04 ` Sebastien Dugue
  2008-09-15 12:17   ` Jan-Bernd Themann
                     ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Sebastien Dugue @ 2008-09-15  8:04 UTC (permalink / raw)
  Cc: tklein, tinytim, Linux-rt, themann, netdev, linux-kernel,
	jean-pierre.dion, linux-ppc, raisch, gilles.carry


WARNING: HACK - HACK - HACK

  Under the RT kernel (with hardirq preemption) the eHEA driver hangs right
after booting. Fiddling with the hardirqs and softirqs priorities allows to
run a bit longer but as soon as the network gets under load, the hang
returns. After investigating, it appears that the driver is loosing interrupts.

  To make a long story short, looking at the code, it appears that the XICS
maps all its interrupts to level sensitive interrupts (I don't know if it's the
reality or if it's due to an incomplete implementation - no datasheets
available to check) and use the fasteoi processing flow.

  When entering the low level handler, level sensitive interrupts are masked,
then eio'd in interrupt context and then unmasked at the end of hardirq
processing.
That's fine as any interrupt comming in-between will still be processed since
the kernel replays those pending interrupts.

  However, it appears that the eHEA interrupts are behaving as edge sensitive
interrupts and are routed through the XICS which process those as level
sensitive using the fasteoi handler __OR__ the XICS loses interrupts when they
are masked.

  Therefore the masking done in the handler causes any interrupt happening while
in the handler to be lost.

  So this patch maps the interrupts being requested through
ibmebus_request_irq() as edge sensitive interrupts (this concerns both the eHEA
and the eHCA - only users of ibmebus_request_irq()) and changes the way edge
interrupts are processed by the fasteoi handler.

  It works for the eHEA, dunno for the eHCA.

  So, unless all the designers of the XICS & eHEA have been shot to keep it
a secret, could someone knowledgeable shed some light on this issue.

  Thanks,

  Sebastien.

Not-Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
---
 arch/powerpc/kernel/ibmebus.c |   11 ++++++++++-
 kernel/irq/chip.c             |    5 +++--
 kernel/irq/manage.c           |    9 ++++++---
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/ibmebus.c b/arch/powerpc/kernel/ibmebus.c
index 9971159..5200323 100644
--- a/arch/powerpc/kernel/ibmebus.c
+++ b/arch/powerpc/kernel/ibmebus.c
@@ -41,6 +41,7 @@
 #include <linux/kobject.h>
 #include <linux/dma-mapping.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/of.h>
 #include <linux/of_platform.h>
 #include <asm/ibmebus.h>
@@ -213,11 +214,19 @@ int ibmebus_request_irq(u32 ist, irq_handler_t handler,
 			void *dev_id)
 {
 	unsigned int irq = irq_create_mapping(NULL, ist);
+	struct irq_desc *desc;
+	int ret;
 
 	if (irq == NO_IRQ)
 		return -EINVAL;
 
-	return request_irq(irq, handler, irq_flags, devname, dev_id);
+	ret = request_irq(irq, handler, irq_flags, devname, dev_id);
+
+	desc = irq_desc + irq;
+	desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
+	desc->status |= IRQ_TYPE_EDGE_RISING;
+
+	return ret;
 }
 EXPORT_SYMBOL(ibmebus_request_irq);
 
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b7b397a..6d366ca 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -430,7 +430,7 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
 	action = desc->action;
 	if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
 						 IRQ_DISABLED)))) {
-		desc->status |= IRQ_PENDING;
+		desc->status |= IRQ_PENDING | IRQ_MASKED;
 		if (desc->chip->mask)
 			desc->chip->mask(irq);
 		goto out;
@@ -439,9 +439,10 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
 	desc->status |= IRQ_INPROGRESS;
 	/*
 	 * In the threaded case we fall back to a mask+eoi sequence:
+	 * excepted for edge interrupts which are not masked.
 	 */
 	if (redirect_hardirq(desc)) {
-		if (desc->chip->mask)
+		if (desc->chip->mask && !(desc->status & IRQ_TYPE_EDGE_BOTH))
 			desc->chip->mask(irq);
 		goto out;
 	}
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 3bffa20..3e39c71 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -788,9 +788,12 @@ static void do_hardirq(struct irq_desc *desc)
 		thread_simple_irq(desc);
 	else if (desc->handle_irq == handle_level_irq)
 		thread_level_irq(desc);
-	else if (desc->handle_irq == handle_fasteoi_irq)
-		thread_fasteoi_irq(desc);
-	else if (desc->handle_irq == handle_edge_irq)
+	else if (desc->handle_irq == handle_fasteoi_irq) {
+		if (desc->status & IRQ_TYPE_EDGE_BOTH)
+			thread_edge_irq(desc);
+		else
+			thread_fasteoi_irq(desc);
+	} else if (desc->handle_irq == handle_edge_irq)
 		thread_edge_irq(desc);
 	else
 		thread_do_irq(desc);
-- 
1.6.0.1.308.gede4c

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-09-26 12:01 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-25 23:40 [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Milton Miller
2008-09-26  9:16 ` Sebastien Dugue
  -- strict thread matches above, loose matches on Subject: below --
2008-09-23 15:43 [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption, eHCA is close Jan-Bernd Themann
2008-09-15  8:04 ` [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Sebastien Dugue
2008-09-15 12:17   ` Jan-Bernd Themann
2008-09-15 12:35   ` Thomas Klein
2008-09-15 13:13     ` Sebastien Dugue
2008-09-16 11:59       ` Anton Vorontsov
2008-09-16 12:22         ` Sebastien Dugue
2008-09-18  7:53   ` Christoph Raisch
2008-09-18  9:27     ` Sebastien Dugue
2008-09-24  9:58   ` Milton Miller
2008-09-24 10:17     ` Benjamin Herrenschmidt
2008-09-24 11:02       ` Milton Miller
2008-09-24 21:14         ` Benjamin Herrenschmidt
2008-09-25  7:31           ` Sebastien Dugue
2008-09-24 12:35       ` Sebastien Dugue
2008-09-24 21:15         ` Benjamin Herrenschmidt
2008-09-25  7:18           ` Sebastien Dugue
2008-09-25  7:22             ` Benjamin Herrenschmidt
2008-09-25  7:42               ` Sebastien Dugue
2008-09-25  8:36                 ` Benjamin Herrenschmidt
2008-09-25  8:39                   ` Sebastien Dugue
2008-09-24 12:30     ` Sebastien Dugue
2008-09-24 16:42       ` Milton Miller
2008-09-24 21:16         ` Benjamin Herrenschmidt
2008-09-25  3:56           ` Milton Miller
2008-09-25  8:45         ` Sebastien Dugue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).