* [RFC] disable_irq()/enable_irq() semantics and ide-probe.c @ 2003-10-09 2:00 viro 2003-10-09 2:29 ` Linus Torvalds 0 siblings, 1 reply; 25+ messages in thread From: viro @ 2003-10-09 2:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz Current code (at least on x86 and alpha) appears to assume that you can't call disable_irq()/enable_irq() unless you have registered that irq. However, ide-probe.c::probe_hwif() contains the following: /* * We must always disable IRQ, as probe_for_drive will assert IRQ, but * we'll install our IRQ driver much later... */ irqd = hwif->irq; if (irqd) disable_irq(hwif->irq); and later /* * Use cached IRQ number. It might be (and is...) changed by probe * code above */ if (irqd) enable_irq(irqd); That happens *way* before we call register_irq(). Current tree barfs on that in all sorts of interesting ways. Most notably, we get irq enabled and with NULL ->action for a while. If an interrupt comes during that time, we'll get IRQ_INPROGRESS set and not reset until later register_irq() (see handle_irq() for details). Note that calling disable_irq() after that will kill us on SMP - it will spin waiting for IRQ_INPROGRESS to go away. Moreover, if somebody calls register_irq() while we are at it, we'll get ->depth reset to 0. enable_irq() will try to decrement depth and will get very unhappy about the situation. What do we really want to do here? I see only two variants: a) allow enable_irq()/disable_irq() regardless of having the thing registered. IRQ_DISABLED would be set iff ->depth is positive or ->action is NULL. register_irq() wouldn't touch the ->depth and would enable IRQ only if ->depth is 0. enable_irq() would not enable the thing unless ->action was non-NULL. That would work, but I wouldn't bet a dime on correctness - e.g. currently disable_irq() followed by free_irq() works fine and drivers might very well leave ->depth positive when they are removed. With new scheme that would be deadly. b) have ide-probe.c register a dummy handler for that period. Then it would be allowed to do what it does. Said handler would simply return IRQ_NONE and be done with that. Add BUG() to disable_irq()/enable_irq() for cases when they are called with NULL ->action. Note that scenario above is absolutely real - 2.4.21 and later hang on DS10 since their IDE chipset (alim15x3.c) does generate an interrupt after the probe code had called enable_irq(). With obvious results - ide_intr() is never called afterwards. On 2.6 it doesn't happen only because register_irq() forcibly drops IRQ_INPROGRESS, which hides that problem, but doesn't help with other scenarios (e.g. somebody sharing the same IRQ and doing register_irq() before we call enable_irq()). Comments? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 2:00 [RFC] disable_irq()/enable_irq() semantics and ide-probe.c viro @ 2003-10-09 2:29 ` Linus Torvalds 2003-10-09 2:43 ` viro 2003-10-09 12:55 ` Roman Zippel 0 siblings, 2 replies; 25+ messages in thread From: Linus Torvalds @ 2003-10-09 2:29 UTC (permalink / raw) To: viro; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz [ Executive summary: I think 2.6.x does the right thing as-is, although there is definitely some ugly corners there. ] On Thu, 9 Oct 2003 viro@parcelfarce.linux.theplanet.co.uk wrote: > > Current code (at least on x86 and alpha) appears to assume that > you can't call disable_irq()/enable_irq() unless you have registered > that irq. Why? It was definitely not designed to do that - the irq disable is an irq _descriptor_ feature, not a irq handler feature. So irq registration should have absolutely zero to do with anything. > That happens *way* before we call register_irq(). Current tree barfs on > that in all sorts of interesting ways. Most notably, we get irq enabled > and with NULL ->action for a while. Which should be perfectly fine. In fact, it's _designed_ to be perfectly fine, since this is how irq probing is also implemented. This is not "barfing". > If an interrupt comes during that > time, we'll get IRQ_INPROGRESS set and not reset until later register_irq() > (see handle_irq() for details). Note that calling disable_irq() after that > will kill us on SMP - it will spin waiting for IRQ_INPROGRESS to go away. Now _this_ is a bug waiting to happen. I don't think it actually happens now (since anybody who does disable_irq() _will_ either have registered the irq already or will do so soon, but I agree that it's just trouble waiting to happen. I think the fix to that is to just add a trivial test for "if the handler list is empty, don't bother synchronizing" in disable_irq(), since clearly if the list is empty there is nothing to synchronize _with_. After all, the synchronization is there just to make sure no handler runs concurrently on another CPU. (We can't add that test to "synchronize_irq()" itself without more surgery, since the irq _freeing_ path actually removes the entry from the queue first, so in that case synchronize_irq() will normally see an empty irq handler list) > Moreover, if somebody calls register_irq() while we are at it, we'll get > ->depth reset to 0. enable_irq() will try to decrement depth and will get > very unhappy about the situation. Yeah, that depth reset I actually worried about at some time. It's never been a problem though, since concurrent device probing just doesn't happen and basically isn't really supported. We discussed it for bus probing, and the rule is to just not do it. The reason I worried about the depth reset is actually different: we've historically not had a good way to atomically enable a PCI device _and_ request its interrupt. Nobody has ever really complained, but if anybody ever wants to do this, then the way to do it would be to - find out the irq - disable it - request the irq - enable the PCI routing for it - set up the device - enable the irq and the thing is, the disable should actually happen _before_ we request the irq, because we potentially could not afford to take an interrupt with the device incompletely set up. NOTE! This is not something we support, and it's not something I've heard people complain about, but it is something I think makes sense. And it definitely implies that we should be able to - disable irq's regardless of whether we have registered them - not reset the disable depth on irq request. > What do we really want to do here? I see only two variants: > a) allow enable_irq()/disable_irq() regardless of having the thing > registered. Absolutely. > b) have ide-probe.c register a dummy handler for that period. No. That's wrong. That just causes lockups with level-triggered PCI devices, so the "dummy handler" really needs to know a lot about how to turn the interrupt off. Which is nasty before the driver has even set up the device completely, and may be in the middle of futzing with it. This is why "disable_irq()" really exists. Which is why (a) is what the current code is _supposed_ to do. > Note that scenario above is absolutely real - 2.4.21 and later > hang on DS10 since their IDE chipset (alim15x3.c) does generate an interrupt > after the probe code had called enable_irq(). With obvious results - > ide_intr() is never called afterwards. On 2.6 it doesn't happen only > because register_irq() forcibly drops IRQ_INPROGRESS, which hides that > problem, but doesn't help with other scenarios (e.g. somebody sharing the > same IRQ and doing register_irq() before we call enable_irq()). As far as I can tell, 2.6.x is doing all the right things. Modulo the (not really supported) concurrent device probing, and the (not implemented) atomic irq requesting. Note that the IRQ_INPROGRESS thing was literally the bit that autodetect used to test, it got changed it to IRQ_WAITING to clarify the code and avoid bad interactions with the other uses of IRQ_INPROGRESS. And note that we do _not_ clear IRQ_INPROGRESS on "action == NULL" very much on purpose: that "action == NULL" thing also happens if the IRQ is disabled, and we need to get the edge replay right. This is why request_irq() literally _needs_ to clear that bit in 2.6.x. So the fix is to make 2.4.x do what 2.6.x does, methinks. Possibly with a more robust implementation (ie the conditional synchronize_irq() thing in disable_irq()). And while I agree that the depth clearing is bogus, but I'd be worried about removing it in case some driver actually depends on it (ie historically it has actually been ok to do: disable_irq(irq); .. set up device .. request_irq(irq, ..); // This will also enable the irq even though it's ugly, and I hope nobody does it). Linus ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 2:29 ` Linus Torvalds @ 2003-10-09 2:43 ` viro 2003-10-09 2:53 ` Linus Torvalds 2003-10-09 12:55 ` Roman Zippel 1 sibling, 1 reply; 25+ messages in thread From: viro @ 2003-10-09 2:43 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Wed, Oct 08, 2003 at 07:29:10PM -0700, Linus Torvalds wrote: > > If an interrupt comes during that > > time, we'll get IRQ_INPROGRESS set and not reset until later register_irq() > > (see handle_irq() for details). Note that calling disable_irq() after that > > will kill us on SMP - it will spin waiting for IRQ_INPROGRESS to go away. > > Now _this_ is a bug waiting to happen. I don't think it actually happens > now (since anybody who does disable_irq() _will_ either have registered > the irq already or will do so soon, but I agree that it's just trouble > waiting to happen. Ummm... probe_hwif() is a good example of the opposite - it can fail past the point where it disables irq and that means no register_irq() after enable_irq() call on cleanup. > I think the fix to that is to just add a trivial test for "if the handler > list is empty, don't bother synchronizing" in disable_irq(), since clearly > if the list is empty there is nothing to synchronize _with_. After all, > the synchronization is there just to make sure no handler runs > concurrently on another CPU. How about action = NULL; if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) { action = desc->action; status &= ~IRQ_PENDING; /* we commit to handling */ if (likely(action)) status |= IRQ_INPROGRESS; /* we are handling it */ } desc->status = status; in handle_irq()? > As far as I can tell, 2.6.x is doing all the right things. Modulo the (not > really supported) concurrent device probing, and the (not implemented) > atomic irq requesting. > > Note that the IRQ_INPROGRESS thing was literally the bit that autodetect > used to test, it got changed it to IRQ_WAITING to clarify the code and > avoid bad interactions with the other uses of IRQ_INPROGRESS. > > And note that we do _not_ clear IRQ_INPROGRESS on "action == NULL" very > much on purpose: that "action == NULL" thing also happens if the IRQ is > disabled, and we need to get the edge replay right. This is why > request_irq() literally _needs_ to clear that bit in 2.6.x. See above - we shouldn't clear it on action == NULL, but we don't need to set it, AFAICS. > So the fix is to make 2.4.x do what 2.6.x does, methinks. ObOtherFun: There's another bogosity in quoted ide-probe.c code, according to dwmw2 - he says that there are PCI IDE cards that get IRQ 0, so the test for hwif->irq is b0rken. We probably should stop overloading ->irq == 0 for "none given", but I'm not sure that we *have* a value that would never be used as an IRQ number on all platforms... ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 2:43 ` viro @ 2003-10-09 2:53 ` Linus Torvalds 2003-10-09 8:03 ` Russell King ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Linus Torvalds @ 2003-10-09 2:53 UTC (permalink / raw) To: viro; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Thu, 9 Oct 2003 viro@parcelfarce.linux.theplanet.co.uk wrote: > > How about > > action = NULL; > if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) { > action = desc->action; > status &= ~IRQ_PENDING; /* we commit to handling */ > if (likely(action)) > status |= IRQ_INPROGRESS; /* we are handling it */ > } > desc->status = status; > > in handle_irq()? I don't mind it per se, but I don't much see the point either. handle_irq() is pretty timing-critical, so we should keep it as fast as humanly possible. In contrast, all the other paths that care about IRQ_INPROGRESS are _not_ generally timing-critical, which is why I'd rather have them do the extra work. In particular, in this case the only other path that seems to care would be "disable_irq()", which does indeed care (well "request_irq()" also cares, but request_irq() already clears the bit). > See above - we shouldn't clear it on action == NULL, but we don't > need to set it, AFAICS. I agree that we don't need to set it. It's more of a streamlining question. For 2.4.x it might also be a question of "which patch is smaller" (conceptually and in practice). I think they end up being exactly the same in this case. > > So the fix is to make 2.4.x do what 2.6.x does, methinks. > > ObOtherFun: There's another bogosity in quoted ide-probe.c code, according > to dwmw2 - he says that there are PCI IDE cards that get IRQ 0, so the > test for hwif->irq is b0rken. We probably should stop overloading > ->irq == 0 for "none given", but I'm not sure that we *have* a value > that would never be used as an IRQ number on all platforms... The BIOS defines irq 0 in the PCI config space to be "no irq" as far as I know, and on all PC platforms I've ever heard of it's not a usable irq for generic PCI devices (it's wired to the timer thing). All PCI routing chipsets I know about also make "irq0" mean "disabled". Which is not to say that a badly configured setup might not do it, but it really sounds fundamentally broken. Linus ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 2:53 ` Linus Torvalds @ 2003-10-09 8:03 ` Russell King 2003-10-09 22:46 ` Zwane Mwaikambo 2003-10-09 8:07 ` Benjamin Herrenschmidt 2003-10-09 15:46 ` viro 2 siblings, 1 reply; 25+ messages in thread From: Russell King @ 2003-10-09 8:03 UTC (permalink / raw) To: Linus Torvalds Cc: viro, linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Wed, Oct 08, 2003 at 07:53:36PM -0700, Linus Torvalds wrote: > On Thu, 9 Oct 2003 viro@parcelfarce.linux.theplanet.co.uk wrote: > > ObOtherFun: There's another bogosity in quoted ide-probe.c code, according > > to dwmw2 - he says that there are PCI IDE cards that get IRQ 0, so the > > test for hwif->irq is b0rken. We probably should stop overloading > > ->irq == 0 for "none given", but I'm not sure that we *have* a value > > that would never be used as an IRQ number on all platforms... > > The BIOS defines irq 0 in the PCI config space to be "no irq" as far as I > know, and on all PC platforms I've ever heard of it's not a usable irq for > generic PCI devices (it's wired to the timer thing). > > All PCI routing chipsets I know about also make "irq0" mean "disabled". Correct for x86. For other architectures, it many not be so. On ARM for example, it is quite normal for IRQ0 to be used. Hopefully it'll be something which generic code won't see, but that isn't always true. Someone else might actually follow the PCI specs and use "255" to mean "no irq" on their PCI bus. Irregardless of all that, ARM has always had the following in asm/irq.h: /* * Use this value to indicate lack of interrupt * capability */ #ifndef NO_IRQ #define NO_IRQ ((unsigned int)(-1)) #endif and each time this topic comes up, I always suggest that this idea needs to be propagated to the rest of the kernel - a method by which an interrupt number can be tested to check whether it is real. I don't particularly care if its a constant, or a function-like thing, eg for x86: #define no_irq(irq) ((irq) == 0) -- Russell King (rmk@arm.linux.org.uk) http://www.arm.linux.org.uk/personal/ Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 8:03 ` Russell King @ 2003-10-09 22:46 ` Zwane Mwaikambo 0 siblings, 0 replies; 25+ messages in thread From: Zwane Mwaikambo @ 2003-10-09 22:46 UTC (permalink / raw) To: Russell King Cc: Linus Torvalds, viro, linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Thu, 9 Oct 2003, Russell King wrote: > Correct for x86. For other architectures, it many not be so. On ARM for > example, it is quite normal for IRQ0 to be used. Hopefully it'll be > something which generic code won't see, but that isn't always true. > Someone else might actually follow the PCI specs and use "255" to mean > "no irq" on their PCI bus. Unfortunately we wouldn't be able to use that for a test on i386; IRQ251 -> 10:11 IRQ253 -> 10:13 IRQ255 -> 10:15 IRQ256 -> 10:16 IRQ257 -> 10:17 IRQ258 -> 10:18 IRQ259 -> 10:19 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 2:53 ` Linus Torvalds 2003-10-09 8:03 ` Russell King @ 2003-10-09 8:07 ` Benjamin Herrenschmidt 2003-10-09 15:46 ` viro 2 siblings, 0 replies; 25+ messages in thread From: Benjamin Herrenschmidt @ 2003-10-09 8:07 UTC (permalink / raw) To: Linus Torvalds Cc: viro, Linux Kernel list, Marcelo Tosatti, Bartlomiej Zolnierkiewicz > > > > ObOtherFun: There's another bogosity in quoted ide-probe.c code, according > > to dwmw2 - he says that there are PCI IDE cards that get IRQ 0, so the > > test for hwif->irq is b0rken. We probably should stop overloading > > ->irq == 0 for "none given", but I'm not sure that we *have* a value > > that would never be used as an IRQ number on all platforms... > > The BIOS defines irq 0 in the PCI config space to be "no irq" as far as I > know, and on all PC platforms I've ever heard of it's not a usable irq for > generic PCI devices (it's wired to the timer thing). > > All PCI routing chipsets I know about also make "irq0" mean "disabled". > > Which is not to say that a badly configured setup might not do it, but it > really sounds fundamentally broken. Well, irq 0 is a perfectly valid IRQ on a number of non-x86 platforms, some embedded platforms for example, and iirc, the Apple G5 even has the serverworks IDE on IRQ 0 ;) That's why we have defined IRQ_NONE a while ago (this was in 2.4, I don'tk now if that made the trip to 2.6, I'm away from my sources at the moment). Ben. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 2:53 ` Linus Torvalds 2003-10-09 8:03 ` Russell King 2003-10-09 8:07 ` Benjamin Herrenschmidt @ 2003-10-09 15:46 ` viro 2003-10-09 16:01 ` Linus Torvalds 2 siblings, 1 reply; 25+ messages in thread From: viro @ 2003-10-09 15:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Wed, Oct 08, 2003 at 07:53:36PM -0700, Linus Torvalds wrote: > For 2.4.x it might also be a question of "which patch is smaller" > (conceptually and in practice). I think they end up being exactly the same > in this case. Unfortunately, they don't (AFAICS) ;-/ BTW, there is another thing that feels odd - we start with IRQ_DISABLED set for everything and ->depth set to 0. disable_irq(irq); enable_irq(irq); gets us into the state where a) IRQ_DISABLED is reset b) ->depth is 0. However, any subsequent register_irq();free_irq() gets us back to the IRQ_DISABLED being set and ->depth set to 0. IOW, we have very odd rules of IRQ_DISABLED - when ->action is non-NULL, it's set iff ->depth is positive. That's nice - if you call disable_irq(), you know that enable_irq() will undo the effects. *However*, if you have ->action == NULL, the state depends on history. Morover, once you've done disable_irq(), you have no way to undo all effects - enable_irq() will land you in a different state. It gets particulary ugly when you consider modules - if you do disable_irq(), poke into the hardware and decide to bail out, there is no way to restore the original state on cleanup path. Which leaves us with permanent effects of failed insmod. I'm not saying that it's necessary a bug (aside of the issues with IRQ_INPROGRESS), but it feels like a bug waiting to happen. If we really don't care about interrupts arriving after e.g. such failed insmod, why don't we simply have enable_irq() check that ->action is non-NULL and reset IRQ_DISABLED only in that case? Then it would really be an opposite of disable_irq() in all cases we care about. I do realize that some code might rely on the current behaviour and call irq_disable();irq_enable() as a way to reset IRQ_DISABLED when ->action is NULL. However, I'd argue that it's a kludge - note that simply calling enable_irq() will *not* work, you need to call disable_irq() first. Which doesn't look like a sane interface... IOW, the question is: do we want enable_irq() to undo all effects of disable_irq()? Whether the current behaviour is intentional or not, it's worth documenting, IMO... ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 15:46 ` viro @ 2003-10-09 16:01 ` Linus Torvalds 2003-10-09 17:46 ` viro 0 siblings, 1 reply; 25+ messages in thread From: Linus Torvalds @ 2003-10-09 16:01 UTC (permalink / raw) To: viro; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Thu, 9 Oct 2003 viro@parcelfarce.linux.theplanet.co.uk wrote: > > IOW, the question is: do we want enable_irq() to undo all effects of > disable_irq()? Whether the current behaviour is intentional or not, > it's worth documenting, IMO... I think yes, it would be clean to make enable_irq() clean up properly. That includes clearing the IRQ_INPROGRESS bit along with the IRQ_DISABLED but. It won't even make the codepath longer, it just changes a constant. And yes, I think we should just remove the clearing of depth in setup_irq(), _and_ make the enabling of the irq be dependent on depth being non-zero. In 2.6.x terms, we'd like to go towards something like the appended, but this actually has a _different_ problem: we have separate "->startup()" vs "->enable()" functions for the irq controller, and this means that if the interrupt was disabled when the first irq handler was requested, "->startup()" wouldn't be called at all. So my suggestion would be: - do the IRQ_INPROGRESS clearing (safe, since disable_irq() will have waited for it if it was valid) - leave the depth reset as-is for now, and think about how we'd like to solve it. - make the synchronize_irq() conditional in irq_disable() (2.6.x now already does that part, so it's not in the patch) Hmm? Linus ---- --- 1.43/arch/i386/kernel/irq.c Wed Oct 8 20:47:36 2003 +++ edited/arch/i386/kernel/irq.c Thu Oct 9 08:55:54 2003 @@ -380,7 +380,7 @@ spin_lock_irqsave(&desc->lock, flags); switch (desc->depth) { case 1: { - unsigned int status = desc->status & ~IRQ_DISABLED; + unsigned int status = desc->status & ~(IRQ_DISABLED | IRQ_INPROGRESS); desc->status = status; if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) { desc->status = status | IRQ_REPLAY; @@ -884,8 +884,7 @@ *p = new; - if (!shared) { - desc->depth = 0; + if (!shared && !desc->depth) { desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING | IRQ_INPROGRESS); desc->handler->startup(irq); } ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 16:01 ` Linus Torvalds @ 2003-10-09 17:46 ` viro 2003-10-09 18:03 ` Linus Torvalds 0 siblings, 1 reply; 25+ messages in thread From: viro @ 2003-10-09 17:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Thu, Oct 09, 2003 at 09:01:36AM -0700, Linus Torvalds wrote: > > On Thu, 9 Oct 2003 viro@parcelfarce.linux.theplanet.co.uk wrote: > > > > IOW, the question is: do we want enable_irq() to undo all effects of > > disable_irq()? Whether the current behaviour is intentional or not, > > it's worth documenting, IMO... > > I think yes, it would be clean to make enable_irq() clean up properly. > That includes clearing the IRQ_INPROGRESS bit along with the IRQ_DISABLED > but. It won't even make the codepath longer, it just changes a constant. > > And yes, I think we should just remove the clearing of depth in > setup_irq(), _and_ make the enabling of the irq be dependent on depth > being non-zero. > > In 2.6.x terms, we'd like to go towards something like the appended, but > this actually has a _different_ problem: we have separate "->startup()" vs > "->enable()" functions for the irq controller, and this means that if the > interrupt was disabled when the first irq handler was requested, > "->startup()" wouldn't be called at all. > > So my suggestion would be: > - do the IRQ_INPROGRESS clearing (safe, since disable_irq() will have > waited for it if it was valid) > - leave the depth reset as-is for now, and think about how we'd like to > solve it. > - make the synchronize_irq() conditional in irq_disable() (2.6.x now > already does that part, so it's not in the patch) > > Hmm? I've looked through the uses of IRQ_DISABLED in 2.6. Fun things found: a) on alpha: void i8259a_end_irq(unsigned int irq) { if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS))) i8259a_enable_irq(irq); } and similar in other ->end() on x86: static void end_8259A_irq (unsigned int irq) { if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)) && irq_desc[irq].action) enable_8259A_irq(irq); } b) if we get *two* interrupts while irq is enabled and not registered, we'll be stuck with IRQ_PENDING in addition to IRQ_INPROGRESS. Which can, AFAICS, confuse other code. c) mind-boggling amount of code duplication - are there any plans to take that stuff to kernel/*? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 17:46 ` viro @ 2003-10-09 18:03 ` Linus Torvalds 2003-10-09 18:27 ` viro 2003-10-15 17:14 ` Anton Blanchard 0 siblings, 2 replies; 25+ messages in thread From: Linus Torvalds @ 2003-10-09 18:03 UTC (permalink / raw) To: viro; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Thu, 9 Oct 2003 viro@parcelfarce.linux.theplanet.co.uk wrote: > > a) on x86: > static void end_8259A_irq (unsigned int irq) > { > if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)) && > irq_desc[irq].action) > enable_8259A_irq(irq); > } This matches the "if IRQ is disabled for whatever reason" test in irq.c, and as such it makes some amount of sense. However, from a logical standpoint it is indeed not very sensible. It's hard to see why the code does what it does. > b) if we get *two* interrupts while irq is enabled and not registered, we'll > be stuck with IRQ_PENDING in addition to IRQ_INPROGRESS. Which can, AFAICS, > confuse other code. This should not happen: the first one will shut up the interrupt. That's in fact what (a) does - it refuses to enable the interrupt again (and it will have been shut up by the "ack". (Other interrupt controllers _can_ get multiple interrupts while disabled, like the APIC edge-triggered case. But they have different logic to take care of this - see io_apic.c and the logic there in the [ack|end]_edge_ioapic_irq). In short, this is all behaviour that depends on the low-level irq controller. It may not be very obvious what is going on, but I think it is correct (and again, it might be worth cleaning up some day. Not now) > c) mind-boggling amount of code duplication - are there any plans to take > that stuff to kernel/*? Yes. It was actually tried a few months ago, but some of the other architectures have very different interrupt setups, so it got dropped. But it will almost certainly happen eventually: we've had bugs fixed on x86 that ended up living a lot longer on other architectures. Linus ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 18:03 ` Linus Torvalds @ 2003-10-09 18:27 ` viro 2003-10-09 19:05 ` Linus Torvalds 2003-10-15 17:14 ` Anton Blanchard 1 sibling, 1 reply; 25+ messages in thread From: viro @ 2003-10-09 18:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Thu, Oct 09, 2003 at 11:03:14AM -0700, Linus Torvalds wrote: > > On Thu, 9 Oct 2003 viro@parcelfarce.linux.theplanet.co.uk wrote: > > > > a) on x86: > > static void end_8259A_irq (unsigned int irq) > > { > > if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)) && > > irq_desc[irq].action) ^^^^^^^^^^^^^^^^^^^^^^^ > > enable_8259A_irq(irq); > > } > > This matches the "if IRQ is disabled for whatever reason" test in irq.c, > and as such it makes some amount of sense. However, from a logical > standpoint it is indeed not very sensible. It's hard to see why the code > does what it does. The underlined bit is absent on alpha version of the same function. Note that this piece is bogus - if .action is NULL, we are already caught by IRQ_INPROGRESS check. So it's not exactly a bug, but considering your arguments about exact same check slightly earlier in handle_irq()... It's from cset1.437.22.19 by mingo; the same changeset had done unconditional removal of IRQ_INPROGRESS, so there it made sense. After the irq.c part had been reverted (1.497.61.30 from you), i8259.c one should be killed too, AFAICS... ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 18:27 ` viro @ 2003-10-09 19:05 ` Linus Torvalds 0 siblings, 0 replies; 25+ messages in thread From: Linus Torvalds @ 2003-10-09 19:05 UTC (permalink / raw) To: viro; +Cc: linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz On Thu, 9 Oct 2003 viro@parcelfarce.linux.theplanet.co.uk wrote: > > The underlined bit is absent on alpha version of the same function. > > Note that this piece is bogus - if .action is NULL, we are already caught > by IRQ_INPROGRESS check. So it's not exactly a bug, but considering > your arguments about exact same check slightly earlier in handle_irq()... Yes. I'm definitely not claiming the code is beautiful. I think it happens to be working ;) > It's from cset1.437.22.19 by mingo; the same changeset had done unconditional > removal of IRQ_INPROGRESS, so there it made sense. After the irq.c part > had been reverted (1.497.61.30 from you), i8259.c one should be killed > too, AFAICS... Yeah, the IRQ_INPROGRESS removal in handle_irq() was buggy: it caused the bit to be spuriously cleared if an interrupt happened while the previous interrupt was active (which will _not_ happen in the i8259 case, but does happen in the edge-triggered case). The problem, I think, is that all this code grew fairly organically, and nobody ever sat down and wrote down the rules. Which is why I think it _works_, but it's clearly nonoptimal and sometimes confusing. I suspect the 2.4.x situation is even worse. Linus ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 18:03 ` Linus Torvalds 2003-10-09 18:27 ` viro @ 2003-10-15 17:14 ` Anton Blanchard 2003-10-17 9:19 ` Russell King 1 sibling, 1 reply; 25+ messages in thread From: Anton Blanchard @ 2003-10-15 17:14 UTC (permalink / raw) To: Linus Torvalds Cc: viro, linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz > > c) mind-boggling amount of code duplication - are there any plans to take > > that stuff to kernel/*? > > Yes. It was actually tried a few months ago, but some of the other > architectures have very different interrupt setups, so it got dropped. But > it will almost certainly happen eventually: we've had bugs fixed on x86 > that ended up living a lot longer on other architectures. >From memory at least x86, alpha, ppc32 and ppc64 worked with Andrey's irq consolidation patches. I'll be pushing for it in 2.7 together with some macros to abstract away irq_desc[NR_IRQS] completely. (it will make it easier to support 24 bit irqs on ppc64 and should allow sparc64 to use the consolidated irq code). Anton ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-15 17:14 ` Anton Blanchard @ 2003-10-17 9:19 ` Russell King 2003-10-17 10:32 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 25+ messages in thread From: Russell King @ 2003-10-17 9:19 UTC (permalink / raw) To: Anton Blanchard, Benjamin Herrenschmidt; +Cc: linux-kernel On Thu, Oct 16, 2003 at 03:14:11AM +1000, Anton Blanchard wrote: > >From memory at least x86, alpha, ppc32 and ppc64 worked with Andrey's > irq consolidation patches. I'll be pushing for it in 2.7 together with > some macros to abstract away irq_desc[NR_IRQS] completely. (it will > make it easier to support 24 bit irqs on ppc64 and should allow sparc64 > to use the consolidated irq code). (CC list snipped) >From what I heard, benh was seriously considering changing ppc64 IRQ stuff in 2.7 to use something like the system we have in ARM. benh? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-17 9:19 ` Russell King @ 2003-10-17 10:32 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 25+ messages in thread From: Benjamin Herrenschmidt @ 2003-10-17 10:32 UTC (permalink / raw) To: Russell King; +Cc: Anton Blanchard, Linux Kernel list On Fri, 2003-10-17 at 11:19, Russell King wrote: > On Thu, Oct 16, 2003 at 03:14:11AM +1000, Anton Blanchard wrote: > > >From memory at least x86, alpha, ppc32 and ppc64 worked with Andrey's > > irq consolidation patches. I'll be pushing for it in 2.7 together with > > some macros to abstract away irq_desc[NR_IRQS] completely. (it will > > make it easier to support 24 bit irqs on ppc64 and should allow sparc64 > > to use the consolidated irq code). > > (CC list snipped) > > >From what I heard, benh was seriously considering changing ppc64 IRQ > stuff in 2.7 to use something like the system we have in ARM. It's on my list of things to investigate once I've reached OzLabs, indeed. Right now, I have horrible hacks for the "second" OpenPIC in G5s, and we are indeed reaching the limits of our current irq model. I don't know yet what direction I will take though. I'm also considering the possibility of merging ppc32 and ppc64 archs :) Ben. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 2:29 ` Linus Torvalds 2003-10-09 2:43 ` viro @ 2003-10-09 12:55 ` Roman Zippel 1 sibling, 0 replies; 25+ messages in thread From: Roman Zippel @ 2003-10-09 12:55 UTC (permalink / raw) To: Linus Torvalds Cc: viro, linux-kernel, Marcelo Tosatti, Bartlomiej Zolnierkiewicz Hi, On Wed, 8 Oct 2003, Linus Torvalds wrote: > And while I agree that the depth clearing is bogus, but I'd be worried > about removing it in case some driver actually depends on it (ie > historically it has actually been ok to do: > > disable_irq(irq); > .. set up device .. > request_irq(irq, ..); // This will also enable the irq > > even though it's ugly, and I hope nobody does it). If there are such cases left, I'd really prefer we fix them, as currently nothing protects this against another driver requesting the same irq. To make this even more fun the behaviour is also different if the irq is shared, as the irq is not enabled in this case. In the ide driver I'd really like to see that at the time the probe function reenables the interrupt there is either an irq handler installed or it failed. On the Amiga we also have always problems with this, as the interrupt must be acknowledged by the driver, so we have to be careful not to leave anything pending. The irq handler would automatically take care of this and would make this simpler. bye, Roman ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c
@ 2003-10-09 16:10 Manfred Spraul
2003-10-09 16:38 ` Jeff Garzik
0 siblings, 1 reply; 25+ messages in thread
From: Manfred Spraul @ 2003-10-09 16:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: viro, linux-kernel
Linus wrote:
>Nobody has ever really complained, but if anybody
>ever wants to do this, then the way to do it would be to
>
> - find out the irq
> - disable it
> - request the irq
> - enable the PCI routing for it
> - set up the device
> - enable the irq
>
I'd like to use that for nic shutdown for natsemi:
disable_irq();
shutdown_nic();
free_irq();
enable_irq();
The irq handler touches registers that restart the nic. Right now I use
a np->hands_off variable to avoid that.
But I don't know if all systems can support atomic request_irq/free_irq
calls. request_irq creates /proc/irq/x/cpu_affinity, and I could imagine
that on some archs it might have perform IPIs to reconfigure the irq
controller of a remote node.
--
Manfred
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 16:10 Manfred Spraul @ 2003-10-09 16:38 ` Jeff Garzik 2003-10-09 16:57 ` Benjamin Herrenschmidt 2003-10-09 17:29 ` Linus Torvalds 0 siblings, 2 replies; 25+ messages in thread From: Jeff Garzik @ 2003-10-09 16:38 UTC (permalink / raw) To: Manfred Spraul; +Cc: Linus Torvalds, viro, linux-kernel Manfred Spraul wrote: > I'd like to use that for nic shutdown for natsemi: > > disable_irq(); > shutdown_nic(); > free_irq(); > enable_irq(); Why not just shutdown the NIC inside spin_lock_irqsave or disable_irq, and then free_irq separately? If you can't stop the NIC hardware from generating interrupts, that's a driver bug. And if the driver cannot handle its interrupt handler between the spin_unlock_irqrestore() and free_irq() (shared irq case), it's also buggy. Jeff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 16:38 ` Jeff Garzik @ 2003-10-09 16:57 ` Benjamin Herrenschmidt 2003-10-09 17:03 ` Jeff Garzik 2003-10-09 17:29 ` Linus Torvalds 1 sibling, 1 reply; 25+ messages in thread From: Benjamin Herrenschmidt @ 2003-10-09 16:57 UTC (permalink / raw) To: Jeff Garzik; +Cc: Manfred Spraul, Linus Torvalds, viro, Linux Kernel list On Thu, 2003-10-09 at 18:38, Jeff Garzik wrote: > Manfred Spraul wrote: > > I'd like to use that for nic shutdown for natsemi: > > > > disable_irq(); > > shutdown_nic(); > > free_irq(); > > enable_irq(); > > > Why not just shutdown the NIC inside spin_lock_irqsave or disable_irq, > and then free_irq separately? > > If you can't stop the NIC hardware from generating interrupts, that's a > driver bug. And if the driver cannot handle its interrupt handler > between the spin_unlock_irqrestore() and free_irq() (shared irq case), > it's also buggy. Actually you may still get a stale irq ;) The problem is that IRQs are typically an asynchronous event, and an irq can be sort of "queued" up (especially if it's a level one) in the PIC... though at least this won't be a stale level irq so you won't deadlock in an irq handler that can do nothing... Anyway, I quite like the idea. I've been trying to avoid taking a lock in some similar shutdown routine for sungem, because some bits in there need a few ms delay to workaround a chip bug (or machine sleep will break) and I want to schedule. Breaking the lock makes things ugly, beeing able to just disable_irq before/after is nice. The problem of course is when that irq is shared... you are suddently shutting off for a potentially long time a neighbour irq, bad bad... (at least I know that on pmac, sungem irq is never shared). Ben. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 16:57 ` Benjamin Herrenschmidt @ 2003-10-09 17:03 ` Jeff Garzik 2003-10-09 17:07 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 25+ messages in thread From: Jeff Garzik @ 2003-10-09 17:03 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Manfred Spraul, Linus Torvalds, viro, Linux Kernel list Benjamin Herrenschmidt wrote: > On Thu, 2003-10-09 at 18:38, Jeff Garzik wrote: > >>Manfred Spraul wrote: >> >>>I'd like to use that for nic shutdown for natsemi: >>> >>> disable_irq(); >>> shutdown_nic(); >>> free_irq(); >>> enable_irq(); >> >> >>Why not just shutdown the NIC inside spin_lock_irqsave or disable_irq, >>and then free_irq separately? >> >>If you can't stop the NIC hardware from generating interrupts, that's a >>driver bug. And if the driver cannot handle its interrupt handler >>between the spin_unlock_irqrestore() and free_irq() (shared irq case), >>it's also buggy. > > > Actually you may still get a stale irq ;) The problem is that IRQs are > typically an asynchronous event, and an irq can be sort of "queued" up > (especially if it's a level one) in the PIC... though at least this > won't be a stale level irq so you won't deadlock in an irq handler that > can do nothing... Easily solved with a synchronize_irq() ;-) As a slight tangent, with MSI coming up, as well as the presence of hardware that queues IRQ events asynchronously, we'll want synchronize_irq() to handle those cases, even for uniprocess (often synchronize_irq is only defined for SMP). > Anyway, I quite like the idea. I've been trying to avoid taking a lock > in some similar shutdown routine for sungem, because some bits in there > need a few ms delay to workaround a chip bug (or machine sleep will > break) and I want to schedule. Breaking the lock makes things ugly, > beeing able to just disable_irq before/after is nice. > > The problem of course is when that irq is shared... you are suddently > shutting off for a potentially long time a neighbour irq, bad bad... > (at least I know that on pmac, sungem irq is never shared). You want to disable_irq() then sleep? ewww... :) I think we could figure out a better way... Jeff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 17:03 ` Jeff Garzik @ 2003-10-09 17:07 ` Benjamin Herrenschmidt 2003-10-09 17:16 ` Jeff Garzik 0 siblings, 1 reply; 25+ messages in thread From: Benjamin Herrenschmidt @ 2003-10-09 17:07 UTC (permalink / raw) To: Jeff Garzik; +Cc: Manfred Spraul, Linus Torvalds, viro, Linux Kernel list On Thu, 2003-10-09 at 19:03, Jeff Garzik wrote: > Easily solved with a synchronize_irq() ;-) No. synchronize_irq will do nothing to an irq that is still somewhere in the HW path from the device to the core, and even in the core it may be queued for some cycles before actually delivered. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 17:07 ` Benjamin Herrenschmidt @ 2003-10-09 17:16 ` Jeff Garzik 0 siblings, 0 replies; 25+ messages in thread From: Jeff Garzik @ 2003-10-09 17:16 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Manfred Spraul, Linus Torvalds, viro, Linux Kernel list Benjamin Herrenschmidt wrote: > On Thu, 2003-10-09 at 19:03, Jeff Garzik wrote: > > >>Easily solved with a synchronize_irq() ;-) > > > No. synchronize_irq will do nothing to an irq that is > still somewhere in the HW path from the device to the core, > and even in the core it may be queued for some cycles before > actually delivered. hmmm, ok :) Well my main objection is disable_irq+free_irq+enable_irq. Seems to me we'll wind up coding around that. Maybe a free_and_enable_irq is appropriate to avoid such a situation? Oh well, just thinking out loud... Jeff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 16:38 ` Jeff Garzik 2003-10-09 16:57 ` Benjamin Herrenschmidt @ 2003-10-09 17:29 ` Linus Torvalds 2003-10-09 17:52 ` Jeff Garzik 1 sibling, 1 reply; 25+ messages in thread From: Linus Torvalds @ 2003-10-09 17:29 UTC (permalink / raw) To: Jeff Garzik; +Cc: Manfred Spraul, viro, linux-kernel On Thu, 9 Oct 2003, Jeff Garzik wrote: > > If you can't stop the NIC hardware from generating interrupts, that's a > driver bug. No it's not. Think shared interrupts here. Linus ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] disable_irq()/enable_irq() semantics and ide-probe.c 2003-10-09 17:29 ` Linus Torvalds @ 2003-10-09 17:52 ` Jeff Garzik 0 siblings, 0 replies; 25+ messages in thread From: Jeff Garzik @ 2003-10-09 17:52 UTC (permalink / raw) To: Linus Torvalds; +Cc: Manfred Spraul, viro, linux-kernel Linus Torvalds wrote: > On Thu, 9 Oct 2003, Jeff Garzik wrote: > >>If you can't stop the NIC hardware from generating interrupts, that's a >>driver bug. > > > No it's not. > > Think shared interrupts here. I was being specific in what I said :) Sure, shared interrupts can still call a driver's handler. But if the driver doesn't stop _its own_ hardware from generating interrupts, you've got screaming interrupts the minute it issues free_irq and signals it doesn't care anymore. The driver damn well better be able to stop the _NIC hardware_ from generating interrupts :) Jeff ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2003-10-17 10:33 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-10-09 2:00 [RFC] disable_irq()/enable_irq() semantics and ide-probe.c viro 2003-10-09 2:29 ` Linus Torvalds 2003-10-09 2:43 ` viro 2003-10-09 2:53 ` Linus Torvalds 2003-10-09 8:03 ` Russell King 2003-10-09 22:46 ` Zwane Mwaikambo 2003-10-09 8:07 ` Benjamin Herrenschmidt 2003-10-09 15:46 ` viro 2003-10-09 16:01 ` Linus Torvalds 2003-10-09 17:46 ` viro 2003-10-09 18:03 ` Linus Torvalds 2003-10-09 18:27 ` viro 2003-10-09 19:05 ` Linus Torvalds 2003-10-15 17:14 ` Anton Blanchard 2003-10-17 9:19 ` Russell King 2003-10-17 10:32 ` Benjamin Herrenschmidt 2003-10-09 12:55 ` Roman Zippel 2003-10-09 16:10 Manfred Spraul 2003-10-09 16:38 ` Jeff Garzik 2003-10-09 16:57 ` Benjamin Herrenschmidt 2003-10-09 17:03 ` Jeff Garzik 2003-10-09 17:07 ` Benjamin Herrenschmidt 2003-10-09 17:16 ` Jeff Garzik 2003-10-09 17:29 ` Linus Torvalds 2003-10-09 17:52 ` Jeff Garzik
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.