* 2.5.69 Interrupt Latency @ 2003-05-07 16:12 Paul Fulghum 2003-05-07 19:41 ` Paul Fulghum 0 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-07 16:12 UTC (permalink / raw) To: linux-kernel Starting with kernel version 2.5.69, I am encountering what appears to be increased interrupt latency or spikes in interrupt latency. I noticed this on two serial drivers that use programmed I/O with FIFOs. On 2.5.68, no problems. On 2.5.69 plenty of underruns. Inspecting the driver tracing, it does not look like lost interrupts. I see this on 2 different machines (one SMP server and one laptop). There were changes involving the return type of interrupt handlers (from void to irqreturn_t) in 2.5.69. Could this be related? Has anyone else seen similar results? If I can get time, I'll try and hook up a scope to measure the latencies precisely. I want to check to see if this is a known problems before doing so. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-07 16:12 2.5.69 Interrupt Latency Paul Fulghum @ 2003-05-07 19:41 ` Paul Fulghum 2003-05-07 22:28 ` Andrew Morton 0 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-07 19:41 UTC (permalink / raw) To: linux-kernel On Wed, 2003-05-07 at 11:12, Paul Fulghum wrote: > Starting with kernel version 2.5.69, I am > encountering what appears to be increased > interrupt latency or spikes in interrupt latency. > ... > I see this on 2 different machines > (one SMP server and one laptop). > ... > If I can get time, I'll try and hook up a scope > to measure the latencies precisely. I want to > check to see if this is a known problems before doing so. Here are some results with a scope hooked to the hardware while running tests with a regular interrupt pattern: 2.4.20-8 (redhat) Latency 20-30usec Spikes to 80usec 2.5.68 Latency 20-30usec Spikes to 100usec 2.5.69 Latency 100-110usec (5x increase) Spikes from 5-10 milliseconds This is all on a PCI adapter not sharing interrupts on a dual Pentium II-400 Netserver LC3. Any ideas what happened? -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-07 19:41 ` Paul Fulghum @ 2003-05-07 22:28 ` Andrew Morton 2003-05-08 0:25 ` Paul Fulghum ` (2 more replies) 0 siblings, 3 replies; 65+ messages in thread From: Andrew Morton @ 2003-05-07 22:28 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel Paul Fulghum <paulkf@microgate.com> wrote: > > 2.5.69 > Latency 100-110usec (5x increase) > Spikes from 5-10 milliseconds > > This is all on a PCI adapter not sharing interrupts > on a dual Pentium II-400 Netserver LC3. > > Any ideas what happened? Could be that some random piece of code forgot to reenable interrupts, and things stay that way until they get reenabled again by schedule() or syscall return. One way of finding the culprit would be: my_isr() { if (this interrupt is more than 5 milliseconds delayed) dump_stack(); } the stack dump will point up at the place where interrupts finally got enabled. If you can describe what drivers are in use, and what workload triggers the problem then it may be locatable by inspection. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-07 22:28 ` Andrew Morton @ 2003-05-08 0:25 ` Paul Fulghum 2003-05-08 13:56 ` Paul Fulghum 2003-05-08 14:47 ` Paul Fulghum 2 siblings, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-08 0:25 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Wed, 2003-05-07 at 17:28, Andrew Morton wrote: > Paul Fulghum <paulkf@microgate.com> wrote: > > > > 2.5.69 > > Latency 100-110usec (5x increase) > > Spikes from 5-10 milliseconds > > > > This is all on a PCI adapter not sharing interrupts > > on a dual Pentium II-400 Netserver LC3. > > > > Any ideas what happened? > > Could be that some random piece of code forgot to reenable interrupts, and > things stay that way until they get reenabled again by schedule() or > syscall return. > > One way of finding the culprit would be: > > my_isr() > { > if (this interrupt is more than 5 milliseconds delayed) > dump_stack(); > } > > the stack dump will point up at the place where interrupts finally got > enabled. I'll give that a try tomorrow. > If you can describe what drivers are in use, and what workload triggers the > problem then it may be locatable by inspection. It happens on both of the machines I tried (server and laptop). I think the only common hardware between the two is the net controller which is intel etherpro 100 based. I'll check tomorrow to be sure. There was essentially no work load (no net traffic, no CPU intensive program, no disk activity). I was just doing simple loopback tests on our serial devices (PCI based on server and PC Card on laptop). Paul Fulghum paulkf@microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-07 22:28 ` Andrew Morton 2003-05-08 0:25 ` Paul Fulghum @ 2003-05-08 13:56 ` Paul Fulghum 2003-05-08 19:22 ` Andrew Morton 2003-05-08 14:47 ` Paul Fulghum 2 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-08 13:56 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Wed, 2003-05-07 at 17:28, Andrew Morton wrote: > Paul Fulghum <paulkf@microgate.com> wrote: > > > > 2.5.69 > > Latency 100-110usec (5x increase) > > Spikes from 5-10 milliseconds > > > If you can describe what drivers are in use, and what workload triggers the > problem then it may be locatable by inspection. After inspecting both machines, there is no common hardware other than the net device. Both machines use the e100 driver. After removing the e100 driver altogether, the increased latency and latency spikes persist. So it looks like this problem is not specific to a particular hardware driver, but is a result of a more fundemental, system wide change. I'm going to try your suggestion of doing a stack dump when the driver encounters the large spikes in IRQ latency, to determine if something is leaving interrupts disabled. That will not address the fact that the minimum latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec (2.5.69). This may actually be two separate problems introduced with 2.5.69 -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-08 13:56 ` Paul Fulghum @ 2003-05-08 19:22 ` Andrew Morton 2003-05-08 19:35 ` Paul Fulghum 2003-05-09 18:12 ` Paul Fulghum 0 siblings, 2 replies; 65+ messages in thread From: Andrew Morton @ 2003-05-08 19:22 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel Paul Fulghum <paulkf@microgate.com> wrote: > > On Wed, 2003-05-07 at 17:28, Andrew Morton wrote: > > Paul Fulghum <paulkf@microgate.com> wrote: > > > > > > 2.5.69 > > > Latency 100-110usec (5x increase) > > > Spikes from 5-10 milliseconds > > > > > > If you can describe what drivers are in use, and what workload triggers the > > problem then it may be locatable by inspection. > > After inspecting both machines, there is no common > hardware other than the net device. Both machines > use the e100 driver. > > After removing the e100 driver altogether, > the increased latency and latency spikes persist. > > So it looks like this problem is not specific to a > particular hardware driver, but is a result of a > more fundemental, system wide change. > > I'm going to try your suggestion of doing a stack dump > when the driver encounters the large spikes in IRQ latency, > to determine if something is leaving interrupts disabled. I wasn't very informative, alas. > That will not address the fact that the minimum > latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec > (2.5.69). This may actually be two separate problems > introduced with 2.5.69 Can you pinpoint a kernel version at which it started to happen? ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-08 19:22 ` Andrew Morton @ 2003-05-08 19:35 ` Paul Fulghum 2003-05-08 23:20 ` Brian Gerst 2003-05-09 18:12 ` Paul Fulghum 1 sibling, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-08 19:35 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Thu, 2003-05-08 at 14:22, Andrew Morton wrote: > Paul Fulghum <paulkf@microgate.com> wrote: > > > > On Wed, 2003-05-07 at 17:28, Andrew Morton wrote: > > > Paul Fulghum <paulkf@microgate.com> wrote: > > > > > > > > 2.5.69 > > > > Latency 100-110usec (5x increase) > > > > Spikes from 5-10 milliseconds > > > > > > I'm going to try your suggestion of doing a stack dump > > when the driver encounters the large spikes in IRQ latency, > > to determine if something is leaving interrupts disabled. > > I wasn't very informative, alas. Yeah, I've been reading through the 2.5.69 patch again and could not really see anything that related to the stack dump. > > That will not address the fact that the minimum > > latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec > > (2.5.69). This may actually be two separate problems > > introduced with 2.5.69 > > Can you pinpoint a kernel version at which it started to happen? Exactly with 2.5.69 2.5.68 works fine as do earlier versions back to 2.4.20-8 (earliest tested for this problem). All these versions have very consistant latencies as described above. The problem definately started with the 2.5.69 -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-08 19:35 ` Paul Fulghum @ 2003-05-08 23:20 ` Brian Gerst 0 siblings, 0 replies; 65+ messages in thread From: Brian Gerst @ 2003-05-08 23:20 UTC (permalink / raw) To: Paul Fulghum; +Cc: Andrew Morton, linux-kernel Paul Fulghum wrote: > On Thu, 2003-05-08 at 14:22, Andrew Morton wrote: > >>Paul Fulghum <paulkf@microgate.com> wrote: >> >>>On Wed, 2003-05-07 at 17:28, Andrew Morton wrote: >>> >>>>Paul Fulghum <paulkf@microgate.com> wrote: >>>> >>>>>2.5.69 >>>>>Latency 100-110usec (5x increase) >>>>>Spikes from 5-10 milliseconds >>>>> > > >>>I'm going to try your suggestion of doing a stack dump >>>when the driver encounters the large spikes in IRQ latency, >>>to determine if something is leaving interrupts disabled. >> >>I wasn't very informative, alas. > > > Yeah, I've been reading through the 2.5.69 patch again and > could not really see anything that related to the > stack dump. > > >>>That will not address the fact that the minimum >>>latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec >>>(2.5.69). This may actually be two separate problems >>>introduced with 2.5.69 >> >>Can you pinpoint a kernel version at which it started to happen? > > > Exactly with 2.5.69 > > 2.5.68 works fine as do earlier versions back to 2.4.20-8 > (earliest tested for this problem). All these versions have > very consistant latencies as described above. > > The problem definately started with the 2.5.69 > Try to narrow it down with the 2.5.68-bk snapshots. -- Brian Gerst ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-08 19:22 ` Andrew Morton 2003-05-08 19:35 ` Paul Fulghum @ 2003-05-09 18:12 ` Paul Fulghum 2003-05-09 20:30 ` Paul Fulghum 2003-05-09 21:07 ` Andrew Morton 1 sibling, 2 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-09 18:12 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Randy.Dunlap On Thu, 2003-05-08 at 14:22, Andrew Morton wrote: > Can you pinpoint a kernel version at which it started to happen? I have now isolated the latency problems further to 2.5.68-bk11 2.5.68-bk10 an earlier works fine. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 18:12 ` Paul Fulghum @ 2003-05-09 20:30 ` Paul Fulghum 2003-05-09 21:28 ` Andrew Morton 2003-05-09 21:07 ` Andrew Morton 1 sibling, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-09 20:30 UTC (permalink / raw) To: linux-kernel On Fri, 2003-05-09 at 13:12, Paul Fulghum wrote: > On Thu, 2003-05-08 at 14:22, Andrew Morton wrote: > > Can you pinpoint a kernel version at which it started to happen? > > I have now isolated the latency problems further to 2.5.68-bk11 > > 2.5.68-bk10 an earlier works fine. In the process of eliminating kernel options to isolate the problem, eliminating USB completely appears to fix it. One machine (server) was using usb-uhci and the other (laptop) was using usb-ohci. So it looks like something with USB in 2.5.68-bk11 -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 20:30 ` Paul Fulghum @ 2003-05-09 21:28 ` Andrew Morton 2003-05-12 13:57 ` Paul Fulghum 2003-05-14 17:50 ` Paul Fulghum 0 siblings, 2 replies; 65+ messages in thread From: Andrew Morton @ 2003-05-09 21:28 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel Paul Fulghum <paulkf@microgate.com> wrote: > > In the process of eliminating kernel options to isolate > the problem, eliminating USB completely appears to fix it. > > One machine (server) was using usb-uhci and > the other (laptop) was using usb-ohci. > > So it looks like something with USB in 2.5.68-bk11 ah, that helps. This code was added to wakeup_hc(). It is called from uhci_irq(): + /* Global resume for 20ms */ + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); + wait_ms(20); The changelog just says "Minor patch for uhci-hcd.c" Can you delete the wait_ms() and see if that is our culprit? ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 21:28 ` Andrew Morton @ 2003-05-12 13:57 ` Paul Fulghum 2003-05-12 14:06 ` Paul Fulghum 2003-05-12 16:24 ` Greg KH 2003-05-14 17:50 ` Paul Fulghum 1 sibling, 2 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-12 13:57 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Arnd Bergmann, johannes On Fri, 2003-05-09 at 16:28, Andrew Morton wrote: > This code was added to wakeup_hc(). It is called from uhci_irq(): > > + /* Global resume for 20ms */ > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); > + wait_ms(20); > > The changelog just says "Minor patch for uhci-hcd.c" > > Can you delete the wait_ms() and see if that is our culprit? This is the culprit. Removing this line corrects the latency problems on the server. A 20ms delay seems pretty excessive for an interrupt handler. I'm not sure what it is supposed to accomplish, but this seems like something that should be scheduled to run outside of the ISR. I must have messed up a test on the laptop that is also showing latency problems. On the laptop the problem *is* in both 2.5.68/2.5.69 and *is not* eliminated by turning off USB. The laptop uses the ohci driver anyways which is not effected by this patch. The laptop does not show latency problems on 2.4.20. So the patch above is definately a problem, but the problem I am seeing on the laptop is something unrelated, but part of 2.5.x (which I will investigate further). Thanks, Paul -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 13:57 ` Paul Fulghum @ 2003-05-12 14:06 ` Paul Fulghum 2003-05-12 16:24 ` Greg KH 1 sibling, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-12 14:06 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Arnd Bergmann, johannes On Mon, 2003-05-12 at 08:57, Paul Fulghum wrote: > On Fri, 2003-05-09 at 16:28, Andrew Morton wrote: > > > This code was added to wakeup_hc(). It is called from uhci_irq(): > > > > + /* Global resume for 20ms */ > > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); > > + wait_ms(20); > > > > The changelog just says "Minor patch for uhci-hcd.c" > > > > Can you delete the wait_ms() and see if that is our culprit? > > This is the culprit. > > Removing this line corrects the latency problems on > the server. A 20ms delay seems pretty excessive for an > interrupt handler. I'm not sure what it is supposed to > accomplish, but this seems like something that should > be scheduled to run outside of the ISR. > > I must have messed up a test on the laptop that is > also showing latency problems. On the laptop the > problem *is* in both 2.5.68/2.5.69 and *is not* > eliminated by turning off USB. The laptop uses the > ohci driver anyways which is not effected by this patch. > The laptop does not show latency problems on 2.4.20. > > So the patch above is definately a problem, > but the problem I am seeing on the laptop > is something unrelated, but part of 2.5.x > (which I will investigate further). > > Thanks, > Paul I forgot to add this snippet from the /var/log/messages file of the server in case it helps the USB maintainer in evaluating what to do about the above patch. kernel: drivers/usb/host/uhci-hcd.c: USB Universal Host Controller Interface driver v2.0 kernel: uhci-hcd 00:04.2: Intel Corp. 82371AB/EB/MB PIIX4 kernel: uhci-hcd 00:04.2: irq 19, io base 0000fce0 kernel: Please use the 'usbfs' filetype instead, the 'usbdevfs' name is deprecated. kernel: uhci-hcd 00:04.2: new USB bus registered, assigned bus number 1 kernel: hub 1-0:0: USB hub found kernel: hub 1-0:0: 2 ports detected -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 13:57 ` Paul Fulghum 2003-05-12 14:06 ` Paul Fulghum @ 2003-05-12 16:24 ` Greg KH 2003-05-12 17:08 ` Paul Fulghum 1 sibling, 1 reply; 65+ messages in thread From: Greg KH @ 2003-05-12 16:24 UTC (permalink / raw) To: Paul Fulghum; +Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote: > On Fri, 2003-05-09 at 16:28, Andrew Morton wrote: > > > This code was added to wakeup_hc(). It is called from uhci_irq(): > > > > + /* Global resume for 20ms */ > > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); > > + wait_ms(20); > > > > The changelog just says "Minor patch for uhci-hcd.c" > > > > Can you delete the wait_ms() and see if that is our culprit? > > This is the culprit. > > Removing this line corrects the latency problems on > the server. A 20ms delay seems pretty excessive for an > interrupt handler. I'm not sure what it is supposed to > accomplish, but this seems like something that should > be scheduled to run outside of the ISR. This should only happen when your hardware receives a "RESUME" signal from a USB device AND the host controller is in a global suspend state at that time. Now I think the wait_ms() call is valid for when this is really happening, but it sounds like you are having this happen all the time during normal operation. Are you using any USB devices with this server? Is USB enabled in the BIOS or not? Also, Johannes / Alan, should we be verifying the global suspend state when we read this value so that we don't accidentally call wakeup_hc() for hardware that sets this bit in an illegal way? I think that might be the proper fix for this problem. thanks, greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 16:24 ` Greg KH @ 2003-05-12 17:08 ` Paul Fulghum 2003-05-12 17:30 ` Greg KH 0 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-12 17:08 UTC (permalink / raw) To: Greg KH; +Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes On Mon, 2003-05-12 at 11:24, Greg KH wrote: > On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote: > > On Fri, 2003-05-09 at 16:28, Andrew Morton wrote: > > > > > This code was added to wakeup_hc(). It is called from uhci_irq(): > > > > > > + /* Global resume for 20ms */ > > > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); > > > + wait_ms(20); > > > > > > The changelog just says "Minor patch for uhci-hcd.c" > > > > > > Can you delete the wait_ms() and see if that is our culprit? > > > > This is the culprit. > > > > Removing this line corrects the latency problems on > > the server. A 20ms delay seems pretty excessive for an > > interrupt handler. I'm not sure what it is supposed to > > accomplish, but this seems like something that should > > be scheduled to run outside of the ISR. > > This should only happen when your hardware receives a "RESUME" signal > from a USB device AND the host controller is in a global suspend state > at that time. > > Now I think the wait_ms() call is valid for when this is really > happening, but it sounds like you are having this happen all the time > during normal operation. It does appear to happen on a regular basis. Should the 20ms delay be in the ISR though? I thought it was standard practice to move such lengthy operations outside of the ISR so as not to impact interrupt latency for the system. > Are you using any USB devices with this > server? Is USB enabled in the BIOS or not? There are no USB devices attached to the server. There are no actual USB connectors, and the server's specs do not list USB. There is no option to enable/disable USB in the BIOS. So my guess is this is an unused portion of the chipset being detected and enabled. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 17:08 ` Paul Fulghum @ 2003-05-12 17:30 ` Greg KH 2003-05-12 17:49 ` Paul Fulghum 2003-05-13 15:26 ` Alan Stern 0 siblings, 2 replies; 65+ messages in thread From: Greg KH @ 2003-05-12 17:30 UTC (permalink / raw) To: Paul Fulghum; +Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes On Mon, May 12, 2003 at 12:08:21PM -0500, Paul Fulghum wrote: > On Mon, 2003-05-12 at 11:24, Greg KH wrote: > > On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote: > > > On Fri, 2003-05-09 at 16:28, Andrew Morton wrote: > > > > > > > This code was added to wakeup_hc(). It is called from uhci_irq(): > > > > > > > > + /* Global resume for 20ms */ > > > > + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); > > > > + wait_ms(20); > > > > > > > > The changelog just says "Minor patch for uhci-hcd.c" > > > > > > > > Can you delete the wait_ms() and see if that is our culprit? > > > > > > This is the culprit. > > > > > > Removing this line corrects the latency problems on > > > the server. A 20ms delay seems pretty excessive for an > > > interrupt handler. I'm not sure what it is supposed to > > > accomplish, but this seems like something that should > > > be scheduled to run outside of the ISR. > > > > This should only happen when your hardware receives a "RESUME" signal > > from a USB device AND the host controller is in a global suspend state > > at that time. > > > > Now I think the wait_ms() call is valid for when this is really > > happening, but it sounds like you are having this happen all the time > > during normal operation. > > It does appear to happen on a regular basis. > > Should the 20ms delay be in the ISR though? > I thought it was standard practice to move such > lengthy operations outside of the ISR so as not to > impact interrupt latency for the system. This should only happen when the hardware is suspended, and we are being woken up by a device. So this should be a _very_ rare occurance, and when it does happen, the latency isn't that big of a deal (we need it to wake up the hardware properly.) > > Are you using any USB devices with this > > server? Is USB enabled in the BIOS or not? > > There are no USB devices attached to the server. > There are no actual USB connectors, and the > server's specs do not list USB. There is no > option to enable/disable USB in the BIOS. Heh, then I would suggest not loading this driver at all. It sounds like you have an internal USB controller that probably does not have properly terminated connectors. thanks, greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 17:30 ` Greg KH @ 2003-05-12 17:49 ` Paul Fulghum 2003-05-12 18:01 ` Greg KH 2003-05-13 15:26 ` Alan Stern 1 sibling, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-12 17:49 UTC (permalink / raw) To: Greg KH; +Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes On Mon, 2003-05-12 at 12:30, Greg KH wrote: > > Should the 20ms delay be in the ISR though? > > I thought it was standard practice to move such > > lengthy operations outside of the ISR so as not to > > impact interrupt latency for the system. > > This should only happen when the hardware is suspended, and we are being > woken up by a device. So this should be a _very_ rare occurance, and > when it does happen, the latency isn't that big of a deal (we need it to > wake up the hardware properly.) So you feel interrupt latency does not matter when a machine is waking up? I'm not particularly worried about that situation, so I won't argue that. How about some sort of sanity check (as you mentioned earlier), so this is not shooting off all of the time during normal operation. > > There are no USB devices attached to the server. > > There are no actual USB connectors, and the > > server's specs do not list USB. There is no > > option to enable/disable USB in the BIOS. > > Heh, then I would suggest not loading this driver at all. It sounds > like you have an internal USB controller that probably does not have > properly terminated connectors. Maybe. But most distributions have the USB driver loaded by default, so if this new change stays as is, it will silently cause erratic problems for such machines (with unused USB controllers on the mainboard). Then this investigation will be repeated over and over by end users and anyone trying to support latency sensitive devices (such as standard serial ports) on Linux. So either a sanity check to prevent unnecessary calls to this delay, or recoding the delay so it does not run in the ISR and kill interrupt latency would be the correct thing to do. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 17:49 ` Paul Fulghum @ 2003-05-12 18:01 ` Greg KH 2003-05-12 18:15 ` Paul Fulghum 0 siblings, 1 reply; 65+ messages in thread From: Greg KH @ 2003-05-12 18:01 UTC (permalink / raw) To: Paul Fulghum; +Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes On Mon, May 12, 2003 at 12:49:29PM -0500, Paul Fulghum wrote: > > How about some sort of sanity check (as you mentioned > earlier), so this is not shooting off all of the time > during normal operation. That's the proper thing to do. Also possibly blacklisting your motherboard's USB controller. What does lspci -v show? thanks, greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 18:01 ` Greg KH @ 2003-05-12 18:15 ` Paul Fulghum 0 siblings, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-12 18:15 UTC (permalink / raw) To: Greg KH; +Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes On Mon, 2003-05-12 at 13:01, Greg KH wrote: > On Mon, May 12, 2003 at 12:49:29PM -0500, Paul Fulghum wrote: > > > > How about some sort of sanity check (as you mentioned > > earlier), so this is not shooting off all of the time > > during normal operation. > > That's the proper thing to do. Also possibly blacklisting your > motherboard's USB controller. What does lspci -v show? If both can be accomplished (state check to qualify delay and blacklisting), that would be optimal. The machine is the HP Netserver LC3 which does not officially have USB (but apparently has a vestigial controller), so blacklisting should be a no brainer. Output of lspci: 00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 02) Flags: bus master, medium devsel, latency 64 Memory at <unassigned> (32-bit, prefetchable) [size=256M] 00:04.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02) Flags: bus master, medium devsel, latency 0 00:04.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01) (prog-if 80 [Master]) Flags: bus master, medium devsel, latency 32 I/O ports at fcb0 [size=16] 00:04.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01) (prog-if 00 [UHCI]) Flags: bus master, medium devsel, latency 32, IRQ 19 I/O ports at fce0 [size=32] 00:04.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 02) Flags: medium devsel, IRQ 9 00:07.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, medium devsel, latency 57 Bus: primary=00, secondary=01, subordinate=01, sec-latency=36 I/O behind bridge: 0000e000-0000efff Memory behind bridge: feb00000-febfffff Prefetchable memory behind bridge: 00000000fbf00000-00000000fbf00000 00:08.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 05) Subsystem: Hewlett-Packard Company NetServer 10/100TX Flags: bus master, medium devsel, latency 66, IRQ 16 Memory at fecfc000 (32-bit, prefetchable) [size=4K] I/O ports at fcc0 [size=32] Memory at fed00000 (32-bit, non-prefetchable) [size=1M] Expansion ROM at <unassigned> [disabled] [size=1M] Capabilities: [dc] Power Management version 1 00:09.0 Communication controller: Microgate Corporation: Unknown device 0030 (rev 01) Subsystem: Microgate Corporation: Unknown device 0030 Flags: medium devsel, IRQ 17 Memory at fecfd400 (32-bit, non-prefetchable) [size=128] I/O ports at fc00 [size=128] Memory at fecfd800 (32-bit, non-prefetchable) [size=512] Memory at fec80000 (32-bit, prefetchable) [size=256K] Memory at fecfdc00 (32-bit, non-prefetchable) [size=16] 00:0a.0 SCSI storage controller: Adaptec AIC-7880U (rev 01) Subsystem: Adaptec AIC-7880P Ultra/Ultra Wide SCSI Chipset Flags: bus master, medium devsel, latency 64, IRQ 19 I/O ports at f800 [disabled] [size=256] Memory at fecff000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at <unassigned> [disabled] [size=64K] Capabilities: [dc] Power Management version 1 00:0d.0 VGA compatible controller: Cirrus Logic GD 5446 (rev 45) (prog-if 00 [VGA]) Subsystem: Hewlett-Packard Company: Unknown device 0001 Flags: medium devsel Memory at fc000000 (32-bit, prefetchable) [size=32M] Memory at fecfe000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at <unassigned> [disabled] [size=32K] 01:02.0 Communication controller: Microgate Corporation: Unknown device 0020 (rev 01) Subsystem: Microgate Corporation: Unknown device 0020 Flags: medium devsel, IRQ 18 Memory at febff800 (32-bit, non-prefetchable) [size=128] I/O ports at ec00 [size=128] I/O ports at ecfc [size=4] 01:03.0 Communication controller: Microgate Corporation: Unknown device 0210 (rev 02) Subsystem: Microgate Corporation: Unknown device 0210 Flags: medium devsel, IRQ 19 Memory at febff400 (32-bit, non-prefetchable) [size=128] I/O ports at e880 [size=128] I/O ports at ece8 [size=8] Memory at fbf80000 (32-bit, prefetchable) [size=256K] 01:04.0 Communication controller: Microgate Corporation SyncLink WAN Adapter (rev 01) Subsystem: Microgate Corporation SyncLink WAN Adapter Flags: medium devsel, IRQ 16 Memory at febfe800 (32-bit, non-prefetchable) [size=128] I/O ports at e800 [size=128] I/O ports at ecf0 [size=8] Memory at fbf40000 (32-bit, prefetchable) [size=256K] -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 17:30 ` Greg KH 2003-05-12 17:49 ` Paul Fulghum @ 2003-05-13 15:26 ` Alan Stern 2003-05-13 15:35 ` Paul Fulghum 1 sibling, 1 reply; 65+ messages in thread From: Alan Stern @ 2003-05-13 15:26 UTC (permalink / raw) To: Greg KH Cc: Paul Fulghum, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Mon, 12 May 2003, Greg KH wrote: > > > This should only happen when your hardware receives a "RESUME" signal > > > from a USB device AND the host controller is in a global suspend state > > > at that time. > > > > > > Now I think the wait_ms() call is valid for when this is really > > > happening, but it sounds like you are having this happen all the time > > > during normal operation. > > > > It does appear to happen on a regular basis. > > > > Should the 20ms delay be in the ISR though? > > I thought it was standard practice to move such > > lengthy operations outside of the ISR so as not to > > impact interrupt latency for the system. > > This should only happen when the hardware is suspended, and we are being > woken up by a device. So this should be a _very_ rare occurance, and > when it does happen, the latency isn't that big of a deal (we need it to > wake up the hardware properly.) Putting in a sanity check for the global suspend state will be very easy. But I would like to point out that this "global suspend" does not refer to the entire system, only the USB bus. I'm not sure under what circumstances the bus is placed in global suspend; I think it's just when there are no devices attached (or the last remaining device is detached). However, there have been cases on my own system where turning off the only USB peripheral caused the driver to bounce between suspend_hc() and wakeup_hc() several times without any apparent explanation -- possibly as a result of transient electrical signals on the bus (?). So perhaps moving that delay out of the ISR isn't such a bad idea. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 15:26 ` Alan Stern @ 2003-05-13 15:35 ` Paul Fulghum 2003-05-13 17:30 ` Greg KH ` (2 more replies) 0 siblings, 3 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-13 15:35 UTC (permalink / raw) To: Alan Stern; +Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > Putting in a sanity check for the global suspend state will be very easy. > But I would like to point out that this "global suspend" does not refer to > the entire system, only the USB bus. That is a problem then, because the delay can still occur during normal system operation. > I'm not sure under what > circumstances the bus is placed in global suspend; I think it's just when > there are no devices attached (or the last remaining device is detached). > > However, there have been cases on my own system where turning off the only > USB peripheral caused the driver to bounce between suspend_hc() and > wakeup_hc() several times without any apparent explanation -- possibly as > a result of transient electrical signals on the bus (?). So perhaps > moving that delay out of the ISR isn't such a bad idea. Agreed. If this can happen on functional USB controllers when no devices are attached, then it is a serious problem. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 15:35 ` Paul Fulghum @ 2003-05-13 17:30 ` Greg KH 2003-05-13 13:01 ` Paul Fulghum 2003-05-13 20:17 ` Bill Davidsen 2003-05-14 20:52 ` Test Patch: " Alan Stern 2 siblings, 1 reply; 65+ messages in thread From: Greg KH @ 2003-05-13 17:30 UTC (permalink / raw) To: Paul Fulghum Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote: > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > Putting in a sanity check for the global suspend state will be very easy. > > But I would like to point out that this "global suspend" does not refer to > > the entire system, only the USB bus. > > That is a problem then, because the delay can still > occur during normal system operation. Ok, can you try the attached patch and see if it causes your latency problem to go away? thanks, greg k-h --- a/drivers/usb/host/uhci-hcd.c Sun May 4 23:49:54 2003 +++ b/drivers/usb/host/uhci-hcd.c Tue May 13 10:26:02 2003 @@ -1947,6 +1947,11 @@ dbg("%x: wakeup_hc", io_addr); + /* Verify that we really should wake up the hc */ + status = inw(io_addr + USBCMD); + if (!(status & USBCMD_EGSM)) + return; + /* Global resume for 20ms */ outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); wait_ms(20); ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 17:30 ` Greg KH @ 2003-05-13 13:01 ` Paul Fulghum 2003-05-13 18:09 ` Greg KH 0 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-13 13:01 UTC (permalink / raw) To: Greg KH; +Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 2003-05-13 at 12:30, Greg KH wrote: > On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote: > > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > > > Putting in a sanity check for the global suspend state will be very easy. > > > But I would like to point out that this "global suspend" does not refer to > > > the entire system, only the USB bus. > > > > That is a problem then, because the delay can still > > occur during normal system operation. > > Ok, can you try the attached patch and see if it causes your latency > problem to go away? I applied the patch plus a couple of printk statements, and the wakeup_hc() is being continuously called as well as actually executing the delay. So the check is not preventing anything. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 13:01 ` Paul Fulghum @ 2003-05-13 18:09 ` Greg KH 2003-05-13 18:11 ` Greg KH 0 siblings, 1 reply; 65+ messages in thread From: Greg KH @ 2003-05-13 18:09 UTC (permalink / raw) To: Paul Fulghum Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote: > On Tue, 2003-05-13 at 12:30, Greg KH wrote: > > On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote: > > > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > > > > > Putting in a sanity check for the global suspend state will be very easy. > > > > But I would like to point out that this "global suspend" does not refer to > > > > the entire system, only the USB bus. > > > > > > That is a problem then, because the delay can still > > > occur during normal system operation. > > > > Ok, can you try the attached patch and see if it causes your latency > > problem to go away? > > I applied the patch plus a couple of printk statements, > and the wakeup_hc() is being continuously called > as well as actually executing the delay. Is the suspend_hc() function ever getting called by anyone in that driver? thanks, greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 18:09 ` Greg KH @ 2003-05-13 18:11 ` Greg KH 2003-05-13 21:35 ` Alan Stern 2003-05-14 21:06 ` Paul Fulghum 0 siblings, 2 replies; 65+ messages in thread From: Greg KH @ 2003-05-13 18:11 UTC (permalink / raw) To: Paul Fulghum Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, May 13, 2003 at 11:09:13AM -0700, Greg KH wrote: > On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote: > > On Tue, 2003-05-13 at 12:30, Greg KH wrote: > > > On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote: > > > > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > > > > > > > Putting in a sanity check for the global suspend state will be very easy. > > > > > But I would like to point out that this "global suspend" does not refer to > > > > > the entire system, only the USB bus. > > > > > > > > That is a problem then, because the delay can still > > > > occur during normal system operation. > > > > > > Ok, can you try the attached patch and see if it causes your latency > > > problem to go away? > > > > I applied the patch plus a couple of printk statements, > > and the wakeup_hc() is being continuously called > > as well as actually executing the delay. > > Is the suspend_hc() function ever getting called by anyone in that > driver? Ok, nevermind, I see where it would be getting called under normal operation... Hm, I don't really know. Johannes, any thoughts? thanks, greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 18:11 ` Greg KH @ 2003-05-13 21:35 ` Alan Stern 2003-05-13 21:48 ` Helge Hafting 2003-05-14 21:06 ` Paul Fulghum 1 sibling, 1 reply; 65+ messages in thread From: Alan Stern @ 2003-05-13 21:35 UTC (permalink / raw) To: Greg KH Cc: Paul Fulghum, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 13 May 2003, Greg KH wrote: > On Tue, May 13, 2003 at 11:09:13AM -0700, Greg KH wrote: > > On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote: > > > > > > I applied the patch plus a couple of printk statements, > > > and the wakeup_hc() is being continuously called > > > as well as actually executing the delay. > > > > Is the suspend_hc() function ever getting called by anyone in that > > driver? > > Ok, nevermind, I see where it would be getting called under normal > operation... > > Hm, I don't really know. Johannes, any thoughts? My take is that wakeup_hc() is getting called whenever some stray signal causes the device to generate an interrupt, and then a little while later the stall timer routine calls suspend_hc() since nothing is active. The interrupts are probably indistinguishable from what you would get if a new device really had just been attached to the bus. Assuming this analysis is correct, only malfunctioning hardware would ever cause the problem to arise. Still, it's something that needs to be handled. (That's a tricky point -- to what extent should the kernel try to compensate for broken hardware?) Unfortunately, there isn't any obvious way to tell that under these circumstances the wakeup_hc() routine doesn't need to run. Using a timer routine to implement that 20 ms delay would at least remove the large interrupt latency. However, this presents some problems as well. In particular, is there anything that would prevent suspend_hc() from being called before the timer had expired? We don't want to find ourselves simultaneously trying to turn the USB controller on and off. Getting that done properly will require some thought. Maybe a kind of grace period would help: each time the controller changes state, don't allow another change until at least 1 second later. That would also help the "bouncing" effect I see when I turn on or off my USB CD-RW drive. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 21:35 ` Alan Stern @ 2003-05-13 21:48 ` Helge Hafting 2003-05-13 22:09 ` Alan Stern 0 siblings, 1 reply; 65+ messages in thread From: Helge Hafting @ 2003-05-13 21:48 UTC (permalink / raw) To: Alan Stern Cc: Greg KH, Paul Fulghum, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, May 13, 2003 at 05:35:47PM -0400, Alan Stern wrote: > My take is that wakeup_hc() is getting called whenever some stray signal > causes the device to generate an interrupt, and then a little while later > the stall timer routine calls suspend_hc() since nothing is active. The > interrupts are probably indistinguishable from what you would get if a new > device really had just been attached to the bus. > Could this also happen if the USB interrupt is shared? The other device interrupts, and the kernel calls into usb interrupt routine just in case USB has some data too? Helge Hafting ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 21:48 ` Helge Hafting @ 2003-05-13 22:09 ` Alan Stern 0 siblings, 0 replies; 65+ messages in thread From: Alan Stern @ 2003-05-13 22:09 UTC (permalink / raw) To: Helge Hafting Cc: Greg KH, Paul Fulghum, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 13 May 2003, Helge Hafting wrote: > On Tue, May 13, 2003 at 05:35:47PM -0400, Alan Stern wrote: > > My take is that wakeup_hc() is getting called whenever some stray signal > > causes the device to generate an interrupt, and then a little while later > > the stall timer routine calls suspend_hc() since nothing is active. The > > interrupts are probably indistinguishable from what you would get if a new > > device really had just been attached to the bus. > > > Could this also happen if the USB interrupt is shared? > The other device interrupts, and the kernel calls into > usb interrupt routine just in case USB has some data too? Yes, it certainly could. The other part of the problem, which I failed to mention, is that the Resume-Detect bit in the USB controller's status register is set. wakeup_hc() gets called only if that bit is set, and the bit is supposed to be set only if some device attached to the USB bus has requested a wakeup (also known as "resume"). If there's nothing on the bus, the controller shouldn't indicate that a resume was detected. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 18:11 ` Greg KH 2003-05-13 21:35 ` Alan Stern @ 2003-05-14 21:06 ` Paul Fulghum 2003-05-14 21:15 ` Johannes Erdfelt 2003-05-14 21:30 ` Greg KH 1 sibling, 2 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-14 21:06 UTC (permalink / raw) To: Greg KH; +Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 2003-05-13 at 13:11, Greg KH wrote: I was looking over the PIIX3 datasheet and noticed that the USBSTS_RD bit is only valid when the device is in the suspended state. This bit is being acted on regardless of the suspend state of the controller in the ISR. Could this be why the driver is detecting false 'resume' signals and calling wakeup_hc() when it shouldn't? Maybe the code should be something like: if (uhci->is_suspended && (status & USBSTS_RD)) wakeup_hc(uhci); in the ISR to qualify acting on that status bit. Alternatively, USBCMD_EGSM (BIT3) of the USBCMD register could be tested to qualify action on the state of USBSTS_RD I'm going to test this now, but I wanted to know what you think. Thanks, Paul -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-14 21:06 ` Paul Fulghum @ 2003-05-14 21:15 ` Johannes Erdfelt 2003-05-14 21:30 ` Greg KH 1 sibling, 0 replies; 65+ messages in thread From: Johannes Erdfelt @ 2003-05-14 21:15 UTC (permalink / raw) To: Paul Fulghum Cc: Greg KH, Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann On Wed, May 14, 2003, Paul Fulghum <paulkf@microgate.com> wrote: > I was looking over the PIIX3 datasheet and noticed > that the USBSTS_RD bit is only valid when the > device is in the suspended state. > > This bit is being acted on regardless of the > suspend state of the controller in the ISR. > Could this be why the driver is detecting > false 'resume' signals and calling wakeup_hc() > when it shouldn't? > > Maybe the code should be something like: > > if (uhci->is_suspended && (status & USBSTS_RD)) > wakeup_hc(uhci); > > in the ISR to qualify acting on that status bit. > Alternatively, USBCMD_EGSM (BIT3) of the USBCMD > register could be tested to qualify action on > the state of USBSTS_RD > > I'm going to test this now, but I wanted to > know what you think. Good eye. That may very well be the problem. Looking at the UHCI specs, it says the same thing, but I never really noticed it before. JE ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-14 21:06 ` Paul Fulghum 2003-05-14 21:15 ` Johannes Erdfelt @ 2003-05-14 21:30 ` Greg KH 2003-05-14 21:45 ` Paul Fulghum 1 sibling, 1 reply; 65+ messages in thread From: Greg KH @ 2003-05-14 21:30 UTC (permalink / raw) To: Paul Fulghum Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Wed, May 14, 2003 at 04:06:33PM -0500, Paul Fulghum wrote: > On Tue, 2003-05-13 at 13:11, Greg KH wrote: > > I was looking over the PIIX3 datasheet and noticed > that the USBSTS_RD bit is only valid when the > device is in the suspended state. > > This bit is being acted on regardless of the > suspend state of the controller in the ISR. > Could this be why the driver is detecting > false 'resume' signals and calling wakeup_hc() > when it shouldn't? > > Maybe the code should be something like: > > if (uhci->is_suspended && (status & USBSTS_RD)) > wakeup_hc(uhci); That's basically what the code I sent you did :) > in the ISR to qualify acting on that status bit. > Alternatively, USBCMD_EGSM (BIT3) of the USBCMD > register could be tested to qualify action on > the state of USBSTS_RD > > I'm going to test this now, but I wanted to > know what you think. I think it's correct, but I don't think it will solve your problem. I would be very happy to be wrong though. thanks, greg k-h ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-14 21:30 ` Greg KH @ 2003-05-14 21:45 ` Paul Fulghum 0 siblings, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-14 21:45 UTC (permalink / raw) To: Greg KH; +Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Wed, 2003-05-14 at 16:30, Greg KH wrote: > On Wed, May 14, 2003 at 04:06:33PM -0500, Paul Fulghum wrote: > > On Tue, 2003-05-13 at 13:11, Greg KH wrote: > > > > I was looking over the PIIX3 datasheet and noticed > > that the USBSTS_RD bit is only valid when the > > device is in the suspended state. > > > > This bit is being acted on regardless of the > > suspend state of the controller in the ISR. > > Could this be why the driver is detecting > > false 'resume' signals and calling wakeup_hc() > > when it shouldn't? > > > > Maybe the code should be something like: > > > > if (uhci->is_suspended && (status & USBSTS_RD)) > > wakeup_hc(uhci); > > That's basically what the code I sent you did :) Yes, that's right. In this case suspend_hc() is being called anyways, so the controller *is* in the suspended state. suspend_hc() and wakeup_hc() are spinning back and forth forever. For some reason I thought this was firing without a call to suspend_hc(), but I verified it with printks. I tried it both with is_suspended, and again testing USBCMD_EGSM in the command register (Greg's patch) with same results. So it is a good check to add to qualify USBSTS_RD, but in this case it looks like the mainboard implementation is FUBAR and bogus resume messages are being recognized by the controller. Is there some transient period after setting USBCMD_EGSM before which the controller is not officially in the suspended state that might cause a spurious USBSTS_RD indication? (seems unlikely) > > in the ISR to qualify acting on that status bit. > > Alternatively, USBCMD_EGSM (BIT3) of the USBCMD > > register could be tested to qualify action on > > the state of USBSTS_RD > > > > I'm going to test this now, but I wanted to > > know what you think. > > I think it's correct, but I don't think it will solve your problem. I > would be very happy to be wrong though. You are right (IMO) that it is correct and should be added, and you are also right in that it does not solve this problem. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 15:35 ` Paul Fulghum 2003-05-13 17:30 ` Greg KH @ 2003-05-13 20:17 ` Bill Davidsen 2003-05-13 22:39 ` Paul Fulghum 2003-05-14 20:52 ` Test Patch: " Alan Stern 2 siblings, 1 reply; 65+ messages in thread From: Bill Davidsen @ 2003-05-13 20:17 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel On 13 May 2003, Paul Fulghum wrote: > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > Putting in a sanity check for the global suspend state will be very easy. > > But I would like to point out that this "global suspend" does not refer to > > the entire system, only the USB bus. > > That is a problem then, because the delay can still > occur during normal system operation. > > > I'm not sure under what > > circumstances the bus is placed in global suspend; I think it's just when > > there are no devices attached (or the last remaining device is detached). > > > > However, there have been cases on my own system where turning off the only > > USB peripheral caused the driver to bounce between suspend_hc() and > > wakeup_hc() several times without any apparent explanation -- possibly as > > a result of transient electrical signals on the bus (?). So perhaps > > moving that delay out of the ISR isn't such a bad idea. > > Agreed. If this can happen on functional USB controllers > when no devices are attached, then it is a serious problem. Instead of trying to guess when to do it, could the sleep be replaced by setting a flag bit to indicate that a sleep was needed before using the hardware? Then the sleep could be done when needed but no noise on the USB bus wouldn't hurt. 1 - there may be many places, I thought of that but didn't look since someone will tell me if it's a problem. 2 - if you don't use USB why not just take the driver out? It would be nice to prevent the problem, of course. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 20:17 ` Bill Davidsen @ 2003-05-13 22:39 ` Paul Fulghum 0 siblings, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-13 22:39 UTC (permalink / raw) To: Bill Davidsen; +Cc: linux-kernel On Tue, 2003-05-13 at 15:17, Bill Davidsen wrote: > 2 - if you don't use USB why not just take the driver out? Because a driver that runs amok, silently causing interrupt latency problems, becomes a real support nightmare for others. > It would be nice to prevent the problem, of course. Agreed -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Test Patch: 2.5.69 Interrupt Latency 2003-05-13 15:35 ` Paul Fulghum 2003-05-13 17:30 ` Greg KH 2003-05-13 20:17 ` Bill Davidsen @ 2003-05-14 20:52 ` Alan Stern 2003-05-15 13:45 ` Paul Fulghum ` (2 more replies) 2 siblings, 3 replies; 65+ messages in thread From: Alan Stern @ 2003-05-14 20:52 UTC (permalink / raw) To: Paul Fulghum Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes, USB development list On 13 May 2003, Paul Fulghum wrote: > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > Putting in a sanity check for the global suspend state will be very easy. > > But I would like to point out that this "global suspend" does not refer to > > the entire system, only the USB bus. > > That is a problem then, because the delay can still > occur during normal system operation. > > > I'm not sure under what > > circumstances the bus is placed in global suspend; I think it's just when > > there are no devices attached (or the last remaining device is detached). > > > > However, there have been cases on my own system where turning off the only > > USB peripheral caused the driver to bounce between suspend_hc() and > > wakeup_hc() several times without any apparent explanation -- possibly as > > a result of transient electrical signals on the bus (?). So perhaps > > moving that delay out of the ISR isn't such a bad idea. > > Agreed. If this can happen on functional USB controllers > when no devices are attached, then it is a serious problem. Below is a patch that addresses both of the issues raised in this thread. The delay time is moved out of the interrupt handler, and the wakeup/suspend transitions are de-bounced. To do this, I needed to add a mildly elaborate state mechanism. State transitions are polled during the stall-timer callback. There is no protection against simultaneous access from multiple threads, such as a PCI suspend occurring at the same time as a normal suspend or resume. The original driver didn't have any either; it's probably not worth worrying too much about. The patch works okay on my system. Try it and see how it works on yours. Johannes, please look over this code and verify that I haven't screwed anything up. Alan Stern ===== uhci-hcd.c 1.48 vs edited ===== --- 1.48/drivers/usb/host/uhci-hcd.c Mon May 5 02:49:54 2003 +++ edited/drivers/usb/host/uhci-hcd.c Wed May 14 16:45:42 2003 @@ -61,7 +61,7 @@ /* * Version Information */ -#define DRIVER_VERSION "v2.0" +#define DRIVER_VERSION "v2.1" #define DRIVER_AUTHOR "Linus 'Frodo Rabbit' Torvalds, Johannes Erdfelt, Randy Dunlap, Georg Acher, Deti Fliegl, Thomas Sailer, Roman Weissgaerber" #define DRIVER_DESC "USB Universal Host Controller Interface driver" @@ -91,9 +91,7 @@ static int uhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb); static void uhci_unlink_generic(struct uhci_hcd *uhci, struct urb *urb); -static int ports_active(struct uhci_hcd *uhci); -static void suspend_hc(struct uhci_hcd *uhci); -static void wakeup_hc(struct uhci_hcd *uhci); +static void hc_state_transitions(struct uhci_hcd *uhci); /* If a transfer is still active after this much time, turn off FSBR */ #define IDLE_TIMEOUT (HZ / 20) /* 50 ms */ @@ -1757,9 +1755,8 @@ uhci->skel_term_qh->link = UHCI_PTR_TERM; } - /* enter global suspend if nothing connected */ - if (!uhci->is_suspended && !ports_active(uhci)) - suspend_hc(uhci); + /* Poll for and perform state transitions */ + hc_state_transitions(uhci); init_stall_timer(hcd); } @@ -1884,14 +1881,14 @@ err("%x: host system error, PCI problems?", io_addr); if (status & USBSTS_HCPE) err("%x: host controller process error. something bad happened", io_addr); - if ((status & USBSTS_HCH) && !uhci->is_suspended) { + if ((status & USBSTS_HCH) && uhci->state > 0) { err("%x: host controller halted. very bad", io_addr); /* FIXME: Reset the controller, fix the offending TD */ } } if (status & USBSTS_RD) - wakeup_hc(uhci); + uhci->resume_detect = 1; uhci_free_pending_qhs(uhci); @@ -1922,45 +1919,75 @@ unsigned int io_addr = uhci->io_addr; /* Global reset for 50ms */ + uhci->state = UHCI_RESET; outw(USBCMD_GRESET, io_addr + USBCMD); wait_ms(50); outw(0, io_addr + USBCMD); wait_ms(10); + uhci->resume_detect = 0; } static void suspend_hc(struct uhci_hcd *uhci) { unsigned int io_addr = uhci->io_addr; - dbg("%x: suspend_hc", io_addr); + switch (uhci->state) { + case UHCI_RUNNING: /* Suspend after 1 second */ + uhci->state = UHCI_SUSPENDING_GRACE; + uhci->state_end = jiffies + HZ; + break; - uhci->is_suspended = 1; - smp_wmb(); + case UHCI_SUSPENDING_GRACE: /* Actually suspend */ + dbg("%x: suspend_hc", io_addr); + uhci->state = UHCI_SUSPENDED; + uhci->resume_detect = 0; + outw(USBCMD_EGSM, io_addr + USBCMD); + break; - outw(USBCMD_EGSM, io_addr + USBCMD); + default: + break; + } } static void wakeup_hc(struct uhci_hcd *uhci) { unsigned int io_addr = uhci->io_addr; - unsigned int status; - dbg("%x: wakeup_hc", io_addr); + switch (uhci->state) { + case UHCI_SUSPENDED: /* Start the resume */ + dbg("%x: wakeup_hc", io_addr); + + /* Global resume for >= 20ms */ + uhci->state = UHCI_RESUMING_1; + uhci->state_end = jiffies + (20*HZ+999) / 1000; + outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); + break; - /* Global resume for 20ms */ - outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); - wait_ms(20); - outw(0, io_addr + USBCMD); - - /* wait for EOP to be sent */ - status = inw(io_addr + USBCMD); - while (status & USBCMD_FGR) - status = inw(io_addr + USBCMD); + case UHCI_RESUMING_1: /* Continue the resume */ + uhci->state = UHCI_RESUMING_2; + outw(0, io_addr + USBCMD); + /* Falls through */ - uhci->is_suspended = 0; + case UHCI_RESUMING_2: /* Run for at least 1 second */ - /* Run and mark it configured with a 64-byte max packet */ - outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, io_addr + USBCMD); + /* wait for EOP to be sent */ + if (inw(io_addr + USBCMD) & USBCMD_FGR) + break; + + /* Run and mark it configured with a 64-byte max packet */ + uhci->state = UHCI_RUNNING_GRACE; + uhci->state_end = jiffies + HZ; + outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, + io_addr + USBCMD); + break; + + case UHCI_RUNNING_GRACE: /* Now allowed to suspend */ + uhci->state = UHCI_RUNNING; + break; + + default: + break; + } } static int ports_active(struct uhci_hcd *uhci) @@ -1975,6 +2002,40 @@ return connection; } +static void hc_state_transitions(struct uhci_hcd *uhci) +{ + switch (uhci->state) { + case UHCI_RUNNING: + + /* enter global suspend if nothing connected */ + if (!ports_active(uhci)) + suspend_hc(uhci); + break; + + case UHCI_SUSPENDING_GRACE: + if (time_after_eq(jiffies, uhci->state_end)) + suspend_hc(uhci); + break; + + case UHCI_SUSPENDED: + + /* wakeup if requested by a device */ + if (uhci->resume_detect) + wakeup_hc(uhci); + break; + + case UHCI_RESUMING_1: + case UHCI_RESUMING_2: + case UHCI_RUNNING_GRACE: + if (time_after_eq(jiffies, uhci->state_end)) + wakeup_hc(uhci); + break; + + default: + break; + } +} + static void start_hc(struct uhci_hcd *uhci) { unsigned int io_addr = uhci->io_addr; @@ -2003,6 +2064,8 @@ outl(uhci->fl->dma_handle, io_addr + USBFLBASEADD); /* Run and mark it configured with a 64-byte max packet */ + uhci->state = UHCI_RUNNING_GRACE; + uhci->state_end = jiffies + HZ; outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, io_addr + USBCMD); uhci->hcd.state = USB_STATE_READY; @@ -2101,8 +2164,6 @@ uhci->fsbr = 0; uhci->fsbrtimeout = 0; - uhci->is_suspended = 0; - spin_lock_init(&uhci->qh_remove_list_lock); INIT_LIST_HEAD(&uhci->qh_remove_list); @@ -2335,6 +2396,8 @@ { struct uhci_hcd *uhci = hcd_to_uhci(hcd); + /* Force an immediate suspend */ + uhci->state = UHCI_SUSPENDING_GRACE; suspend_hc(uhci); return 0; } ===== uhci-hcd.h 1.11 vs edited ===== --- 1.11/drivers/usb/host/uhci-hcd.h Tue Dec 10 14:03:10 2002 +++ edited/drivers/usb/host/uhci-hcd.h Wed May 14 15:48:18 2003 @@ -282,6 +282,29 @@ return 0; /* int128 for 128-255 ms (Max.) */ } +/* + * Device states for the host controller. + * + * To prevent "bouncing" in the presence of electrical noise, + * we insist on a 1-second "grace" period, before switching to + * the RUNNING or SUSPENDED states, during which the state is + * not allowed to change. + * + * The resume process is divided into substates in order to avoid + * potentially length delays during the timer handler. + * + * States in which the host controller is halted must have values <= 0. + */ +enum uhci_state { + UHCI_RESET, + UHCI_RUNNING_GRACE, /* Before RUNNING */ + UHCI_RUNNING, /* The normal state */ + UHCI_SUSPENDING_GRACE, /* Before SUSPENDED */ + UHCI_SUSPENDED = -10, /* When no devices are attached */ + UHCI_RESUMING_1, + UHCI_RESUMING_2 +}; + #define hcd_to_uhci(hcd_ptr) container_of(hcd_ptr, struct uhci_hcd, hcd) /* @@ -313,7 +336,10 @@ struct uhci_frame_list *fl; /* P: uhci->frame_list_lock */ int fsbr; /* Full speed bandwidth reclamation */ unsigned long fsbrtimeout; /* FSBR delay */ - int is_suspended; + + enum uhci_state state; /* FIXME: needs a spinlock */ + unsigned long state_end; /* Time of next transition */ + int resume_detect; /* Need a Global Resume */ /* Main list of URB's currently controlled by this HC */ spinlock_t urb_list_lock; ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-14 20:52 ` Test Patch: " Alan Stern @ 2003-05-15 13:45 ` Paul Fulghum 2003-05-15 14:12 ` Paul Fulghum 2003-05-15 15:26 ` Paul Fulghum 2 siblings, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-15 13:45 UTC (permalink / raw) To: Alan Stern Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes, USB development list On Wed, 2003-05-14 at 15:52, Alan Stern wrote: > Below is a patch that addresses both of the issues raised in this thread. > The delay time is moved out of the interrupt handler, and the > wakeup/suspend transitions are de-bounced. To do this, I needed to add a > mildly elaborate state mechanism. State transitions are polled during the > stall-timer callback. > > There is no protection against simultaneous access from multiple threads, > such as a PCI suspend occurring at the same time as a normal suspend or > resume. The original driver didn't have any either; it's probably not > worth worrying too much about. The patch works okay on my system. Try it > and see how it works on yours. > > Johannes, please look over this code and verify that I haven't screwed > anything up. > > Alan Stern I tested the patch, and it solves the IRQ latency problems by moving the delay outside of the ISR. The debouncing period reduces the rate of thrashing back and forth between wake and suspend, but the cycle does continue forever: May 15 08:27:27 diemos kernel: suspend_hc():UHCI_RUNNING_GRACE May 15 08:27:27 diemos kernel: suspend_hc():UHCI_RUNNING May 15 08:27:28 diemos kernel: suspend_hc():UHCI_SUSPENDING_GRACE May 15 08:27:28 diemos kernel: wakeup_hc():UHCI_SUSPENDED May 15 08:27:28 diemos kernel: wakeup_hc():UHCI_RESUMING_1 May 15 08:27:28 diemos kernel: suspend_hc():UHCI_RESUMING_2 May 15 08:27:29 diemos kernel: suspend_hc():UHCI_RUNNING_GRACE May 15 08:27:29 diemos kernel: suspend_hc():UHCI_RUNNING May 15 08:27:30 diemos kernel: suspend_hc():UHCI_SUSPENDING_GRACE May 15 08:27:30 diemos kernel: wakeup_hc():UHCI_SUSPENDED May 15 08:27:30 diemos kernel: wakeup_hc():UHCI_RESUMING_1 May 15 08:27:30 diemos kernel: suspend_hc():UHCI_RESUMING_2 May 15 08:27:31 diemos kernel: suspend_hc():UHCI_RUNNING_GRACE This patch removes my complaint, but I do wonder why this unused controller continually generates the USBSTS_RD indications. I would hope HP used pull-ups/downs on unused input signals of the PIIX3 chipset, but maybe not. I also can't vouch for the correct operation of this patch for fully functional USB implementations. If you have other tests you want me to do to try and figure out a sane way of dealing with such unused controllers, just ask. Thanks, Paul -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-14 20:52 ` Test Patch: " Alan Stern 2003-05-15 13:45 ` Paul Fulghum @ 2003-05-15 14:12 ` Paul Fulghum 2003-05-15 21:07 ` Alan Stern 2003-05-15 15:26 ` Paul Fulghum 2 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-15 14:12 UTC (permalink / raw) To: Alan Stern Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes, USB development list I have a question about the wakeup_hc() code in general: when waking in response to a resume event, the current code sets the USBCMD_FGR (force global resume) and USBCMD_EGSM (enter global suspend mode) bits, waits 20ms and clears both bits to start sending EOP signal. According to the datasheet, the controller itself (not software) sets the FGR bit on detection of a resume event. 20ms after the USBSTS_RD indication, software should clear both the FGR and EGSM bits. My reading of this is that the line: outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); before the 20ms wait should not be necessary. Am I reading this correctly? Thanks, Paul -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 14:12 ` Paul Fulghum @ 2003-05-15 21:07 ` Alan Stern 0 siblings, 0 replies; 65+ messages in thread From: Alan Stern @ 2003-05-15 21:07 UTC (permalink / raw) To: Paul Fulghum Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes, USB development list On 15 May 2003, Paul Fulghum wrote: > I have a question about the wakeup_hc() code in general: > > when waking in response to a resume event, > the current code sets the USBCMD_FGR (force global resume) > and USBCMD_EGSM (enter global suspend mode) bits, > waits 20ms and clears both bits to start sending EOP signal. > > According to the datasheet, the controller itself > (not software) sets the FGR bit on detection of > a resume event. 20ms after the USBSTS_RD indication, > software should clear both the FGR and EGSM bits. > > My reading of this is that the line: > > outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); > > before the 20ms wait should not be necessary. > > Am I reading this correctly? I interpret it the same way as you. I tried removing that line from the driver, and it continued to work just fine. So it looks like you are right. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-14 20:52 ` Test Patch: " Alan Stern 2003-05-15 13:45 ` Paul Fulghum 2003-05-15 14:12 ` Paul Fulghum @ 2003-05-15 15:26 ` Paul Fulghum 2003-05-15 18:11 ` Alan Stern 2 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-15 15:26 UTC (permalink / raw) To: Alan Stern Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes, USB development list I think I found the key to this whole problem: Note:I mistakenly referred to the chipset as PIIX3 in previous messages, in fact it is the PIIX4 (BX) The PIIX4 errata states that false resume indications can be generated if CLK48 is active during an overcondition indication (OC[1..0]). Sure enough, the USBPORTSC[12] registers constantly report a status of 0C80 which shows that both ports are showing overcurrent condition (which disables the associated port). My guess is that HP deliberately tied the OC[1..0] inputs active to force the ports to a disabled state. So checking for the case of both ports constantly in OC condition and disabled may be a reasonable way to either disable the controller altogether or at least not do the wakeup/suspend shuffle. Any comments? -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 15:26 ` Paul Fulghum @ 2003-05-15 18:11 ` Alan Stern 2003-05-15 18:40 ` Paul Fulghum 2003-05-15 19:42 ` Paul Fulghum 0 siblings, 2 replies; 65+ messages in thread From: Alan Stern @ 2003-05-15 18:11 UTC (permalink / raw) To: Paul Fulghum Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes, USB development list On 15 May 2003, Paul Fulghum wrote: > I think I found the key to this whole problem: > > Note:I mistakenly referred to the chipset as PIIX3 > in previous messages, in fact it is the PIIX4 (BX) > > The PIIX4 errata states that false resume indications > can be generated if CLK48 is active during an > overcondition indication (OC[1..0]). > > Sure enough, the USBPORTSC[12] registers constantly > report a status of 0C80 which shows that both > ports are showing overcurrent condition (which > disables the associated port). > > My guess is that HP deliberately tied the OC[1..0] > inputs active to force the ports to a disabled state. > > So checking for the case of both ports constantly > in OC condition and disabled may be a reasonable > way to either disable the controller altogether or > at least not do the wakeup/suspend shuffle. > > Any comments? That sounds like a believable explanation. My copy of the generic UHCI specification does not include the OC port status bits. I'm guessing from your mail they are either bit 10 or bit 11 of the PORTSC registers, can't tell which. Maybe they are an Intel-specific addition? Or perhaps a more recent version of the spec has more information -- the one I've got is 1.1 (March 1996). Can you suggest a good way of detecting whether or not a controller is part of a PIIX4 chipset, to indicate whether or not the OC bits are valid? Maybe the PCI vendor and product codes will have that information? I'm not sure it's safe to assume that any old host controller will have meaningful values there; the spec just says "reserved" and doesn't stipulate that they will always read as 0's. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 18:11 ` Alan Stern @ 2003-05-15 18:40 ` Paul Fulghum 2003-05-15 19:42 ` Paul Fulghum 1 sibling, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-15 18:40 UTC (permalink / raw) To: Alan Stern Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes, USB development list On Thu, 2003-05-15 at 13:11, Alan Stern wrote: > That sounds like a believable explanation. My copy of the generic UHCI > specification does not include the OC port status bits. I'm guessing from > your mail they are either bit 10 or bit 11 of the PORTSC registers, can't > tell which. Maybe they are an Intel-specific addition? Or perhaps a more > recent version of the spec has more information -- the one I've got is 1.1 > (March 1996). > > Can you suggest a good way of detecting whether or not a controller is > part of a PIIX4 chipset, to indicate whether or not the OC bits are valid? > Maybe the PCI vendor and product codes will have that information? I'm > not sure it's safe to assume that any old host controller will have > meaningful values there; the spec just says "reserved" and doesn't > stipulate that they will always read as 0's. I was originally looking at the 82731FB (PIIX) / 82731SB (PIIX3) datasheet which does not have over current inputs and has bits 11..10 labelled as reserved (but read value is not specified). The lspci show the device on the Netserver to be the 82731AB/EB/MB (PIIX4). This datasheet shows 2 over current inputs OC[1..0] and defines PORTSC bits: 11 - over current indicator change (1=changed, 0=not changed) 10 - over current indicator state (1=over current, 0=normal) If bit 10 is set then the documentation says the port is disabled. Which triggers the erratum and false resume signals. As you say, the PIIX3 does not specify that the reserved bits will necessarily read 0, then I guess some other method is needed to indicate these bits are significant. Or maybe some other document does specify that the reserved bits must be zero if not used? The PCI ID should differentiate between the controllers. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 18:11 ` Alan Stern 2003-05-15 18:40 ` Paul Fulghum @ 2003-05-15 19:42 ` Paul Fulghum 2003-05-15 19:59 ` Paul Fulghum 2003-05-15 21:13 ` Alan Stern 1 sibling, 2 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-15 19:42 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Thu, 2003-05-15 at 13:11, Alan Stern wrote: > Maybe they are an Intel-specific addition? Or perhaps a more > recent version of the spec has more information -- the one I've got is 1.1 > (March 1996). I can't find any later documents. > Can you suggest a good way of detecting whether or not a controller is > part of a PIIX4 chipset, to indicate whether or not the OC bits are valid? I don't see a generic way to determine the validity of these bits. I think the PCI ID is the only way: Vendor ID 8086 Device ID 7112 The erratum is only for the PIIX4, and it is triggered only when the OC inputs are active, so limiting the check to that device should be OK. Probably the least intrusive thing to do is to disable suspending the uhci controller if it is a PIIX4 *and* either port has an over current condition. This will catch the case of a functional USB controller that has one or more real over current conditions and the case of a deliberately disabled (by hardwiring the OC inputs) controller. The erratum will pop up in both cases causing suspend<->wake thrashing. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 19:42 ` Paul Fulghum @ 2003-05-15 19:59 ` Paul Fulghum 2003-05-15 21:13 ` Alan Stern 1 sibling, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-15 19:59 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Thu, 2003-05-15 at 14:42, Paul Fulghum wrote: > On Thu, 2003-05-15 at 13:11, Alan Stern wrote: > > Maybe they are an Intel-specific addition? Or perhaps a more > > recent version of the spec has more information -- the one I've got is 1.1 > > (March 1996). > > I can't find any later documents. > > > Can you suggest a good way of detecting whether or not a controller is > > part of a PIIX4 chipset, to indicate whether or not the OC bits are valid? > > I don't see a generic way to determine the validity of these bits. > > I think the PCI ID is the only way: > Vendor ID 8086 > Device ID 7112 > > The erratum is only for the PIIX4, and it is > triggered only when the OC inputs are active, > so limiting the check to that device should > be OK. More clarification (at a suggestion from Charles Lepple): The errata covers all steppings of the 82371AB/EB/MB with a note that this bug will never be fixed in these devices. So checking for 8086:7112 should be sufficient without a need to check the version number. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 19:42 ` Paul Fulghum 2003-05-15 19:59 ` Paul Fulghum @ 2003-05-15 21:13 ` Alan Stern 2003-05-15 21:30 ` Paul Fulghum 1 sibling, 1 reply; 65+ messages in thread From: Alan Stern @ 2003-05-15 21:13 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, johannes, USB development list On 15 May 2003, Paul Fulghum wrote: > The erratum is only for the PIIX4, and it is > triggered only when the OC inputs are active, > so limiting the check to that device should > be OK. > > Probably the least intrusive thing to do > is to disable suspending the uhci controller > if it is a PIIX4 *and* either port has an > over current condition. This will catch the case > of a functional USB controller that has one > or more real over current conditions and the > case of a deliberately disabled (by hardwiring > the OC inputs) controller. The erratum will > pop up in both cases causing suspend<->wake > thrashing. My intention was to avoid resuming if the resume-detect bit is set only on ports in an over-current condition, since that is the case mentioned in the erratum. Of course, this isn't as failsafe as your suggestion. Which do you think would work better? Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 21:13 ` Alan Stern @ 2003-05-15 21:30 ` Paul Fulghum 2003-05-15 19:17 ` Paul Fulghum 0 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-15 21:30 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Thu, 2003-05-15 at 16:13, Alan Stern wrote: > My intention was to avoid resuming if the resume-detect bit is set only > on ports in an over-current condition, since that is the case mentioned in > the erratum. Of course, this isn't as failsafe as your suggestion. Which > do you think would work better? This should be caught on the suspend side so that you can still service the ports that do not have the over current condition. A single port in OC makes resume unreliable, so the only thing to do is not suspend. The following worked for me: static int suspend_allowed(struct uhci_hcd *uhci) { unsigned int io_addr = uhci->io_addr; int i; if (!uhci->hcd.pdev || (uhci->hcd.pdev->vendor != PCI_VENDOR_ID_INTEL) || (uhci->hcd.pdev->device != PCI_DEVICE_ID_INTEL_82371AB_2)) return 1; /* This is a 82371AB/EB/MB USB controller which has a bug that * causes false resume indications if any port has an * over current condition. If we do a global suspend then * the controller thrashes back and forth between suspend and wakeup. * * Some motherboards using the 82371AB/EB/MB (but not the USB portion) * appear to hardwire the over current inputs active to disable * the USB ports.. */ /* check for over current condition on all ports */ for (i = 0; i < uhci->rh_numports; i++) { if (inw(io_addr + USBPORTSC1 + i * 2) & 0x0400) return 0; } return 1; } static void suspend_hc(struct uhci_hcd *uhci) { unsigned int io_addr = uhci->io_addr; if (!suspend_allowed(uhci)) return; -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 21:30 ` Paul Fulghum @ 2003-05-15 19:17 ` Paul Fulghum 2003-05-16 15:33 ` Alan Stern 0 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-15 19:17 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Thu, 2003-05-15 at 16:30, Paul Fulghum wrote: > On Thu, 2003-05-15 at 16:13, Alan Stern wrote: > > My intention was to avoid resuming if the resume-detect bit is set only > > on ports in an over-current condition, since that is the case mentioned in > > the erratum. Of course, this isn't as failsafe as your suggestion. Which > > do you think would work better? > > This should be caught on the suspend side so > that you can still service the ports that do not > have the over current condition. > > A single port in OC makes resume unreliable, > so the only thing to do is not suspend. Alan: I think I misread your message. Is there a per port resume indication? (I'm at home and don't have the specs in front of me) I was thinking of the global USBSTS_RD bit. If you can qualify the global USBSTS_RD bit with a per port resume indication on a non OC port, then it might make sense to do this on the wakeup side. Pro: you could suspend the controller when appropriate without interference from the OC ports Con: you would be generating a lot of spurious interrupts as the global USBSTS_RD is set (incorrectly) by the OC ports. Even though you would not actually do the wake, you still burn cycles servicing the false interrupts. So my inclination is still to nab this on the suspend side. Paul Fulghum paulkf@microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-15 19:17 ` Paul Fulghum @ 2003-05-16 15:33 ` Alan Stern 2003-05-16 15:58 ` Paul Fulghum 2003-05-16 18:10 ` Paul Fulghum 0 siblings, 2 replies; 65+ messages in thread From: Alan Stern @ 2003-05-16 15:33 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, johannes, USB development list Paul: On 15 May 2003, Paul Fulghum wrote: > I think I misread your message. Is there a per port resume > indication? (I'm at home and don't have the specs in front > of me) I was thinking of the global USBSTS_RD bit. You can probably find this information in your copy of the specs. Bit 6 of the PORTSC register is Resume Detect. When that bit gets set, the host controller signals a global resume and presumably sets bit 2 of the USBSTS register. > If you can qualify the global USBSTS_RD bit with a per > port resume indication on a non OC port, then it might > make sense to do this on the wakeup side. > > Pro: you could suspend the controller when appropriate > without interference from the OC ports > > Con: you would be generating a lot of spurious interrupts > as the global USBSTS_RD is set (incorrectly) by the OC ports. > Even though you would not actually do the wake, you still > burn cycles servicing the false interrupts. I'm not sure about that. For ports in a permanent OC state, the RD bit would get set just once, so a single interrupt would be generated. When the host clears the Resume Detect bit in the USBSTS register, it shouldn't get set again (not until a different port signals a resume). Otherwise a properly working system would generate continuous interrupts during the global resume sequence. That's just guesswork on my part; the spec isn't sufficiently precise to be certain one way or the other. You can find out pretty easily by testing the driver on your machine. Just don't do anything in the ISR when USBSTS_RD is set. Come to think of it, I can try the same thing here. > So my inclination is still to nab this on the suspend side. By a nice coincidence, my system has a 8086:7112 controller -- wired up correctly and perfectly useable. So I can easily test to make sure that the final proposed change works okay on a good system. BTW, I'm not entirely pleased with the size of my test patch. It's a bit lengthy for something intended mainly to move a delay loop outside an ISR. But I couldn't think of any simpler way to do it, and once the state transition code is there it's really no harder to add the 1-second "grace" periods. So of the three ingredients we've got here (20ms delay outside of ISR, "grace" periods, checking for OC ports), I don't think we could make the patch much simpler by eliminating any. What do you think? Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 15:33 ` Alan Stern @ 2003-05-16 15:58 ` Paul Fulghum 2003-05-16 16:18 ` Paul Fulghum ` (2 more replies) 2003-05-16 18:10 ` Paul Fulghum 1 sibling, 3 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-16 15:58 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Fri, 2003-05-16 at 10:33, Alan Stern wrote: > You can probably find this information in your copy of the specs. Bit 6 > of the PORTSC register is Resume Detect. When that bit gets set, the host > controller signals a global resume and presumably sets bit 2 of the USBSTS > register. Yes, I see that now. > > Con: you would be generating a lot of spurious interrupts > > as the global USBSTS_RD is set (incorrectly) by the OC ports. > > Even though you would not actually do the wake, you still > > burn cycles servicing the false interrupts. > > I'm not sure about that. For ports in a permanent OC state, the RD bit > would get set just once, so a single interrupt would be generated. When > the host clears the Resume Detect bit in the USBSTS register, it shouldn't > get set again (not until a different port signals a resume). Otherwise a > properly working system would generate continuous interrupts during the > global resume sequence. Good point. The continuous interrupts I was seeing is because the wakeup was actually being carried out, followed by the thrashing between suspend and wakeup. If your interpretation is true (which I suspect it is) then global suspend could still be supported with only a couple of extra interrupts after each suspend. > That's just guesswork on my part; the spec isn't sufficiently precise to > be certain one way or the other. You can find out pretty easily by > testing the driver on your machine. Just don't do anything in the ISR > when USBSTS_RD is set. Come to think of it, I can try the same thing > here. I'll test this. > By a nice coincidence, my system has a 8086:7112 controller -- wired up > correctly and perfectly useable. So I can easily test to make sure that > the final proposed change works okay on a good system. Good > BTW, I'm not entirely pleased with the size of my test patch. It's a bit > lengthy for something intended mainly to move a delay loop outside an ISR. > But I couldn't think of any simpler way to do it, and once the state > transition code is there it's really no harder to add the 1-second "grace" > periods. So of the three ingredients we've got here (20ms delay outside > of ISR, "grace" periods, checking for OC ports), I don't think we could > make the patch much simpler by eliminating any. What do you think? Moving the wait out of the ISR and doing the wakeup only for RD on non-OC ports are winners. I can't comment on the 1 second grace period. Was that in response to this investigation, or have you actually seen false RD indications due to noise? There is also the more trivial matter of removing the unnecessary setting of the FGR bit on wakeup. I'll check that the global RD interrupt does not keep repeating after a false RD by an OC port. So I suggest you build a patch that does all of the above (with the grace period at your discretion). Then we can both test it, and you can submit it for actual inclusion. Thanks, Paul ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 15:58 ` Paul Fulghum @ 2003-05-16 16:18 ` Paul Fulghum 2003-05-16 17:16 ` Alan Stern 2003-05-16 17:20 ` Alan Stern 2003-05-19 16:41 ` Alan Stern 2 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-16 16:18 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Fri, 2003-05-16 at 10:58, Paul Fulghum wrote: > There is also the more trivial matter of removing the > unnecessary setting of the FGR bit on wakeup. Gack! I just thought of something else: According to the 82371AB datasheet the controller itself sets the USBCMD_FGR bit when a global RD is detected. So qualifying the wakeup in software won't prevent the controller itself from starting the wakeup process on a false RD from an OC port. :-( Hmmmm.... crap Is there a way around this or are we back to preventing the suspend? -- Paul Fulghum paulkf@microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 16:18 ` Paul Fulghum @ 2003-05-16 17:16 ` Alan Stern 2003-05-16 17:48 ` Paul Fulghum 0 siblings, 1 reply; 65+ messages in thread From: Alan Stern @ 2003-05-16 17:16 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, johannes, USB development list On 16 May 2003, Paul Fulghum wrote: > Gack! I just thought of something else: > > According to the 82371AB datasheet the controller > itself sets the USBCMD_FGR bit when a global RD is > detected. So qualifying the wakeup in software won't > prevent the controller itself from starting the > wakeup process on a false RD from an OC port. :-( > > Hmmmm.... crap > Is there a way around this or are we back to > preventing the suspend? It might not be that bad. What will happen is that the controller will assert FGR and interrupt the host. The driver will see the global RD, but will also see that it's present only on OC ports, so the driver will never leave the SUSPENDED state and will never write a 0 to FGR and EGSM. Hence the controller will never become active -- the wakeup won't finish. Of course, it's necessary to worry about what happens if one port is OC, so the controller is in this permanently-waking-up state, when a device is plugged into the other port. I think this will re-assert global RD and generate another interrupt. This time the driver will see that the RD is for real, so it will complete the wakeup sequence. Neither of us can easily test that, because it requires one port to be broken and the other to be working. But if everything else is okay, I think this will work too. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 17:16 ` Alan Stern @ 2003-05-16 17:48 ` Paul Fulghum 2003-05-16 18:31 ` Paul Fulghum 2003-05-16 18:40 ` Alan Stern 0 siblings, 2 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-16 17:48 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Fri, 2003-05-16 at 12:16, Alan Stern wrote: > On 16 May 2003, Paul Fulghum wrote: > > According to the 82371AB datasheet the controller > > itself sets the USBCMD_FGR bit when a global RD is > > detected. So qualifying the wakeup in software won't > > prevent the controller itself from starting the > > wakeup process on a false RD from an OC port. :-( > > > > Hmmmm.... crap > > Is there a way around this or are we back to > > preventing the suspend? > > It might not be that bad. What will happen is that the controller will > assert FGR and interrupt the host. The driver will see the global RD, but > will also see that it's present only on OC ports, so the driver will never > leave the SUSPENDED state and will never write a 0 to FGR and EGSM. Hence > the controller will never become active -- the wakeup won't finish. > > Of course, it's necessary to worry about what happens if one port is OC, > so the controller is in this permanently-waking-up state, when a device is > plugged into the other port. I think this will re-assert global RD and > generate another interrupt. This time the driver will see that the RD is > for real, so it will complete the wakeup sequence. Agreed, when the controller sets the FGR bit, it starts sending the resume signal to all ports, waking attached devices, which will send back valid resume signals to the host controller, which will complete the wakeup. Which takes us back to the thrashing problem. For the case where ports are not hardwired to OC, we should account for the possibility that the OC condition may clear (bad cable replaced, etc). So if we allow suspend, and then ignore resumes on an OC (temporary condition) port, then that port will not be able to cause a valid resume when the OC condition is cleared. (per port RD is already set) So always allowing suspend, and selectively doing the wakeup will cause: 1. thrashing for case of one port OC, other port OK with attached device. 2. prevent port with OC from doing resume after clearing OC condition. For the case of all ports hardwired OC, this will work because you suspend the whole controller and never get a valid resume. -- Paul Fulghum paulkf@microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 17:48 ` Paul Fulghum @ 2003-05-16 18:31 ` Paul Fulghum 2003-05-16 18:40 ` Alan Stern 1 sibling, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-16 18:31 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Fri, 2003-05-16 at 12:48, Paul Fulghum wrote: > So always allowing suspend, and selectively doing the > wakeup will cause: > 1. thrashing for case of one port OC, > other port OK with attached device. > 2. prevent port with OC from doing resume > after clearing OC condition. > > For the case of all ports hardwired OC, this > will work because you suspend the whole controller > and never get a valid resume. Just to add another argument to disallowing suspend instead of qualifying wakeup: In 99% of cases, with no OC, this won't come into play. In .9% of cases, with transient OC, this won't delay suspend long. In .01% of cases, with all hardwired OC ports, suspend does not matter. Plus it cures the above problems #1 and #2. If problem #1 occurs, I don't see that thrashing is any better than not suspending at all. -- Paul Fulghum paulkf@microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 17:48 ` Paul Fulghum 2003-05-16 18:31 ` Paul Fulghum @ 2003-05-16 18:40 ` Alan Stern 2003-05-16 19:05 ` Paul Fulghum 2003-05-18 0:57 ` Andrew McGregor 1 sibling, 2 replies; 65+ messages in thread From: Alan Stern @ 2003-05-16 18:40 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, johannes, USB development list On 16 May 2003, Paul Fulghum wrote: > Agreed, when the controller sets the FGR bit, it > starts sending the resume signal to all ports, > waking attached devices, which will send back > valid resume signals to the host controller, which > will complete the wakeup. Which takes us back > to the thrashing problem. > > For the case where ports are not hardwired to OC, > we should account for the possibility that the > OC condition may clear (bad cable replaced, etc). > > So if we allow suspend, and then ignore resumes > on an OC (temporary condition) port, then that port will not > be able to cause a valid resume when the OC > condition is cleared. (per port RD is already set) > > So always allowing suspend, and selectively doing the > wakeup will cause: > 1. thrashing for case of one port OC, > other port OK with attached device. > 2. prevent port with OC from doing resume > after clearing OC condition. I disagree with 1. When one port has an attached device there won't be any problem because suspends will never occur (since one port is active). But I agree with 2. So yes, it's probably safer to do a conditional suspend rather than a conditional resume. > For the case of all ports hardwired OC, this > will work because you suspend the whole controller > and never get a valid resume. Not suspending isn't really a big deal. After all, we would suspend only when no devices are plugged in anyway. Is the PIIX4 chipset used in laptops, where the power saving might be important? That's the only reason I can think of for wanting to suspend whenever possible. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 18:40 ` Alan Stern @ 2003-05-16 19:05 ` Paul Fulghum 2003-05-18 0:57 ` Andrew McGregor 1 sibling, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-16 19:05 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Fri, 2003-05-16 at 13:40, Alan Stern wrote: > I disagree with 1. When one port has an attached device there won't be > any problem because suspends will never occur (since one port is active). > ... > > For the case of all ports hardwired OC, this > > will work because you suspend the whole controller > > and never get a valid resume. > > Not suspending isn't really a big deal. After all, we would suspend only > when no devices are plugged in anyway. Is the PIIX4 chipset used in > laptops, where the power saving might be important? That's the only > reason I can think of for wanting to suspend whenever possible. OK, my bad. I thought global suspend could occur with devices plugged in, if not that makes #1 a non issue. The PIIX4E (same ID, same bug) is used in some laptops. I would assume the 99% of laptop users without the OC condition would like to save power. I don't want to get in the way of the vast majority of users for these rare special cases. Since the thrashing is not a problem (no global suspend when a device is plugged in), the only downside of your original qualify wakeup plan is the possibility of missing a valid resume once a transient OC condition is cleared. Maybe just polling individual ports for OC cleared and RD set to do a global wakeup? You convinced me that the qualified wakeup is the way to go. -- Paul Fulghum paulkf@microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 18:40 ` Alan Stern 2003-05-16 19:05 ` Paul Fulghum @ 2003-05-18 0:57 ` Andrew McGregor 1 sibling, 0 replies; 65+ messages in thread From: Andrew McGregor @ 2003-05-18 0:57 UTC (permalink / raw) To: Alan Stern, Paul Fulghum; +Cc: linux-kernel, johannes, USB development list --On Friday, 16 May 2003 2:40 p.m. -0400 Alan Stern <stern@rowland.harvard.edu> wrote: > Not suspending isn't really a big deal. After all, we would suspend only > when no devices are plugged in anyway. Is the PIIX4 chipset used in > laptops, where the power saving might be important? That's the only > reason I can think of for wanting to suspend whenever possible. Yes, it has been used in laptops (some Gateway models, for instance), but I suspect strangely broken configurations of this chip are more common than laptops using it, and I don't expect the power saving to be that dramatic. Those machines are pretty good for battery life anyway, considering the high-end (for the time) configurations. (btw: I no longer have one) Andrew ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 15:58 ` Paul Fulghum 2003-05-16 16:18 ` Paul Fulghum @ 2003-05-16 17:20 ` Alan Stern 2003-05-16 17:51 ` Paul Fulghum 2003-05-19 16:41 ` Alan Stern 2 siblings, 1 reply; 65+ messages in thread From: Alan Stern @ 2003-05-16 17:20 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, johannes, USB development list On 16 May 2003, Paul Fulghum wrote: > Moving the wait out of the ISR and doing the wakeup > only for RD on non-OC ports are winners. > > I can't comment on the 1 second grace period. Was that > in response to this investigation, or have you actually > seen false RD indications due to noise? Well, I have actually seen false indications. Whether they are due to noise is open to debate. Since they occur just at the time when I turn the power to my USB peripheral on or off, that's my best guess. It might even turn out that power on/off generates a temporary OC condition, so fixing that might render the grace period unnecessary. I haven't had a chance try it yet. > There is also the more trivial matter of removing the > unnecessary setting of the FGR bit on wakeup. Yes. That can be done in any case. > I'll check that the global RD interrupt does not > keep repeating after a false RD by an OC port. Good. > So I suggest you build a patch that does all of > the above (with the grace period at your discretion). > Then we can both test it, and you can submit it > for actual inclusion. I will. Probably won't be ready until some time next week. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 17:20 ` Alan Stern @ 2003-05-16 17:51 ` Paul Fulghum 0 siblings, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-16 17:51 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Fri, 2003-05-16 at 12:20, Alan Stern wrote: > Well, I have actually seen false indications. Whether they are due to > noise is open to debate. Since they occur just at the time when I turn > the power to my USB peripheral on or off, that's my best guess. It might > even turn out that power on/off generates a temporary OC condition, so > fixing that might render the grace period unnecessary. I haven't had a > chance try it yet. That sounds like a reasonable explanation. -- Paul Fulghum paulkf@microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 15:58 ` Paul Fulghum 2003-05-16 16:18 ` Paul Fulghum 2003-05-16 17:20 ` Alan Stern @ 2003-05-19 16:41 ` Alan Stern 2003-05-19 18:20 ` Paul Fulghum 2 siblings, 1 reply; 65+ messages in thread From: Alan Stern @ 2003-05-19 16:41 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, johannes, USB development list Paul: The patch below incorporates your suggested subroutine. That alone wasn't enough to prevent the state from bouncing a few times when I powered my USB device on or off, so the debounce code is in there too. This patch behaves fine on my workstation, which has both ports connected. I'll try it later on my laptop, which only has one port. In the end, I decided it was easiest and safest to follow your "don't suspend if any ports are OC" scheme. We can try it the other way too if you want. What do you think would happen if you were to try to put your machine in an APM/ACPI "suspend" state? This is a cumulative patch, i.e., it applies to a virgin 2.5.69 source. Let me know how it works for you. Alan Stern ===== uhci-hcd.c 1.48 vs edited ===== --- 1.48/drivers/usb/host/uhci-hcd.c Mon May 5 02:49:54 2003 +++ edited/drivers/usb/host/uhci-hcd.c Mon May 19 11:30:22 2003 @@ -61,7 +61,7 @@ /* * Version Information */ -#define DRIVER_VERSION "v2.0" +#define DRIVER_VERSION "v2.1" #define DRIVER_AUTHOR "Linus 'Frodo Rabbit' Torvalds, Johannes Erdfelt, Randy Dunlap, Georg Acher, Deti Fliegl, Thomas Sailer, Roman Weissgaerber" #define DRIVER_DESC "USB Universal Host Controller Interface driver" @@ -91,9 +91,7 @@ static int uhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb); static void uhci_unlink_generic(struct uhci_hcd *uhci, struct urb *urb); -static int ports_active(struct uhci_hcd *uhci); -static void suspend_hc(struct uhci_hcd *uhci); -static void wakeup_hc(struct uhci_hcd *uhci); +static void hc_state_transitions(struct uhci_hcd *uhci); /* If a transfer is still active after this much time, turn off FSBR */ #define IDLE_TIMEOUT (HZ / 20) /* 50 ms */ @@ -1757,9 +1755,8 @@ uhci->skel_term_qh->link = UHCI_PTR_TERM; } - /* enter global suspend if nothing connected */ - if (!uhci->is_suspended && !ports_active(uhci)) - suspend_hc(uhci); + /* Poll for and perform state transitions */ + hc_state_transitions(uhci); init_stall_timer(hcd); } @@ -1884,14 +1881,14 @@ err("%x: host system error, PCI problems?", io_addr); if (status & USBSTS_HCPE) err("%x: host controller process error. something bad happened", io_addr); - if ((status & USBSTS_HCH) && !uhci->is_suspended) { + if ((status & USBSTS_HCH) && uhci->state > 0) { err("%x: host controller halted. very bad", io_addr); /* FIXME: Reset the controller, fix the offending TD */ } } if (status & USBSTS_RD) - wakeup_hc(uhci); + uhci->resume_detect = 1; uhci_free_pending_qhs(uhci); @@ -1922,10 +1919,12 @@ unsigned int io_addr = uhci->io_addr; /* Global reset for 50ms */ + uhci->state = UHCI_RESET; outw(USBCMD_GRESET, io_addr + USBCMD); wait_ms(50); outw(0, io_addr + USBCMD); wait_ms(10); + uhci->resume_detect = 0; } static void suspend_hc(struct uhci_hcd *uhci) @@ -1933,34 +1932,48 @@ unsigned int io_addr = uhci->io_addr; dbg("%x: suspend_hc", io_addr); - - uhci->is_suspended = 1; - smp_wmb(); - + uhci->state = UHCI_SUSPENDED; + uhci->resume_detect = 0; outw(USBCMD_EGSM, io_addr + USBCMD); } static void wakeup_hc(struct uhci_hcd *uhci) { unsigned int io_addr = uhci->io_addr; - unsigned int status; - dbg("%x: wakeup_hc", io_addr); + switch (uhci->state) { + case UHCI_SUSPENDED: /* Start the resume */ + dbg("%x: wakeup_hc", io_addr); + + /* Global resume for >= 20ms */ + uhci->state = UHCI_RESUMING_1; + uhci->state_end = jiffies + (20*HZ+999) / 1000; + break; - /* Global resume for 20ms */ - outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); - wait_ms(20); - outw(0, io_addr + USBCMD); - - /* wait for EOP to be sent */ - status = inw(io_addr + USBCMD); - while (status & USBCMD_FGR) - status = inw(io_addr + USBCMD); + case UHCI_RESUMING_1: /* End global resume */ + uhci->state = UHCI_RESUMING_2; + outw(0, io_addr + USBCMD); + /* Falls through */ - uhci->is_suspended = 0; + case UHCI_RESUMING_2: /* Wait for EOP to be sent */ + if (inw(io_addr + USBCMD) & USBCMD_FGR) + break; - /* Run and mark it configured with a 64-byte max packet */ - outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, io_addr + USBCMD); + /* Run for at least 1 second, and + * mark it configured with a 64-byte max packet */ + uhci->state = UHCI_RUNNING_GRACE; + uhci->state_end = jiffies + HZ; + outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, + io_addr + USBCMD); + break; + + case UHCI_RUNNING_GRACE: /* Now allowed to suspend */ + uhci->state = UHCI_RUNNING; + break; + + default: + break; + } } static int ports_active(struct uhci_hcd *uhci) @@ -1975,6 +1988,73 @@ return connection; } +static int suspend_allowed(struct uhci_hcd *uhci) +{ + unsigned int io_addr = uhci->io_addr; + int i; + + if (!uhci->hcd.pdev || + uhci->hcd.pdev->vendor != PCI_VENDOR_ID_INTEL || + uhci->hcd.pdev->device != PCI_DEVICE_ID_INTEL_82371AB_2) + return 1; + + /* This is a 82371AB/EB/MB USB controller which has a bug that + * causes false resume indications if any port has an + * over current condition. To prevent problems, we will not + * allow a global suspend if any ports are OC. + * + * Some motherboards using the 82371AB/EB/MB (but not the USB portion) + * appear to hardwire the over current inputs active to disable + * the USB ports. + */ + + /* check for over current condition on any port */ + for (i = 0; i < uhci->rh_numports; i++) { + if (inw(io_addr + USBPORTSC1 + i * 2) & USBPORTSC_OC) + return 0; + } + + return 1; +} + +static void hc_state_transitions(struct uhci_hcd *uhci) +{ + switch (uhci->state) { + case UHCI_RUNNING: + + /* global suspend if nothing connected for 1 second */ + if (!ports_active(uhci) && suspend_allowed(uhci)) { + uhci->state = UHCI_SUSPENDING_GRACE; + uhci->state_end = jiffies + HZ; + } + break; + + case UHCI_SUSPENDING_GRACE: + if (ports_active(uhci)) + uhci->state = UHCI_RUNNING; + else if (time_after_eq(jiffies, uhci->state_end)) + suspend_hc(uhci); + break; + + case UHCI_SUSPENDED: + + /* wakeup if requested by a device */ + if (uhci->resume_detect) + wakeup_hc(uhci); + break; + + case UHCI_RESUMING_1: + case UHCI_RESUMING_2: + case UHCI_RUNNING_GRACE: + if (time_after_eq(jiffies, uhci->state_end)) + wakeup_hc(uhci); + break; + + default: + break; + } +} + static void start_hc(struct uhci_hcd *uhci) { unsigned int io_addr = uhci->io_addr; @@ -2003,6 +2083,8 @@ outl(uhci->fl->dma_handle, io_addr + USBFLBASEADD); /* Run and mark it configured with a 64-byte max packet */ + uhci->state = UHCI_RUNNING_GRACE; + uhci->state_end = jiffies + HZ; outw(USBCMD_RS | USBCMD_CF | USBCMD_MAXP, io_addr + USBCMD); uhci->hcd.state = USB_STATE_READY; @@ -2100,8 +2182,6 @@ uhci->fsbr = 0; uhci->fsbrtimeout = 0; - - uhci->is_suspended = 0; spin_lock_init(&uhci->qh_remove_list_lock); INIT_LIST_HEAD(&uhci->qh_remove_list); ===== uhci-hcd.h 1.11 vs edited ===== --- 1.11/drivers/usb/host/uhci-hcd.h Tue Dec 10 14:03:10 2002 +++ edited/drivers/usb/host/uhci-hcd.h Mon May 19 10:36:36 2003 @@ -53,6 +53,7 @@ #define USBPORTSC_RD 0x0040 /* Resume Detect */ #define USBPORTSC_LSDA 0x0100 /* Low Speed Device Attached */ #define USBPORTSC_PR 0x0200 /* Port Reset */ +#define USBPORTSC_OC 0x0400 /* Over Current condition */ #define USBPORTSC_SUSP 0x1000 /* Suspend */ /* Legacy support register */ @@ -282,6 +283,29 @@ return 0; /* int128 for 128-255 ms (Max.) */ } +/* + * Device states for the host controller. + * + * To prevent "bouncing" in the presence of electrical noise, + * we insist on a 1-second "grace" period, before switching to + * the RUNNING or SUSPENDED states, during which the state is + * not allowed to change. + * + * The resume process is divided into substates in order to avoid + * potentially length delays during the timer handler. + * + * States in which the host controller is halted must have values <= 0. + */ +enum uhci_state { + UHCI_RESET, + UHCI_RUNNING_GRACE, /* Before RUNNING */ + UHCI_RUNNING, /* The normal state */ + UHCI_SUSPENDING_GRACE, /* Before SUSPENDED */ + UHCI_SUSPENDED = -10, /* When no devices are attached */ + UHCI_RESUMING_1, + UHCI_RESUMING_2 +}; + #define hcd_to_uhci(hcd_ptr) container_of(hcd_ptr, struct uhci_hcd, hcd) /* @@ -313,7 +337,10 @@ struct uhci_frame_list *fl; /* P: uhci->frame_list_lock */ int fsbr; /* Full speed bandwidth reclamation */ unsigned long fsbrtimeout; /* FSBR delay */ - int is_suspended; + + enum uhci_state state; /* FIXME: needs a spinlock */ + unsigned long state_end; /* Time of next transition */ + int resume_detect; /* Need a Global Resume */ /* Main list of URB's currently controlled by this HC */ spinlock_t urb_list_lock; ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-19 16:41 ` Alan Stern @ 2003-05-19 18:20 ` Paul Fulghum 2003-05-19 18:49 ` Alan Stern 0 siblings, 1 reply; 65+ messages in thread From: Paul Fulghum @ 2003-05-19 18:20 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Mon, 2003-05-19 at 11:41, Alan Stern wrote: > The patch below incorporates your suggested subroutine. That alone wasn't > enough to prevent the state from bouncing a few times when I powered my > USB device on or off, so the debounce code is in there too. This patch > behaves fine on my workstation, which has both ports connected. I'll try > it later on my laptop, which only has one port. > > In the end, I decided it was easiest and safest to follow your "don't > suspend if any ports are OC" scheme. We can try it the other way too if > you want. What do you think would happen if you were to try to put your > machine in an APM/ACPI "suspend" state? > > This is a cumulative patch, i.e., it applies to a virgin 2.5.69 source. > Let me know how it works for you. Alan, the patch applied cleanly and worked for me (prevented global suspension). Having the lengthy waits outside of the ISR is a definate plus, and the debounce makes sense. My machine does not have APM/ACPI facilities so I can't test the suspend. It is getting pretty dated, but the economy dictates I live with it for a while longer :-) Does you laptop use the PIIX4? If it does and uses only one port, I would be very interested to see if one port is continuously reporting OC (hardwired). Thanks for the patch, Paul -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-19 18:20 ` Paul Fulghum @ 2003-05-19 18:49 ` Alan Stern 0 siblings, 0 replies; 65+ messages in thread From: Alan Stern @ 2003-05-19 18:49 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, johannes, USB development list On 19 May 2003, Paul Fulghum wrote: > the patch applied cleanly and worked for me > (prevented global suspension). Having the lengthy > waits outside of the ISR is a definate plus, and > the debounce makes sense. Good. I'll submit this, if no other problems arise. > My machine does not have APM/ACPI facilities so > I can't test the suspend. It is getting pretty > dated, but the economy dictates I live with it for > a while longer :-) > > Does you laptop use the PIIX4? If it does and uses only > one port, I would be very interested to see if > one port is continuously reporting OC (hardwired). I don't remember what chipset it has. But it definitely includes a UHCI controller with only one port. I'll check it out tonight. Alan Stern ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: Test Patch: 2.5.69 Interrupt Latency 2003-05-16 15:33 ` Alan Stern 2003-05-16 15:58 ` Paul Fulghum @ 2003-05-16 18:10 ` Paul Fulghum 1 sibling, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-16 18:10 UTC (permalink / raw) To: Alan Stern; +Cc: linux-kernel, johannes, USB development list On Fri, 2003-05-16 at 10:33, Alan Stern wrote: > Paul: > > On 15 May 2003, Paul Fulghum wrote: > > Con: you would be generating a lot of spurious interrupts > > as the global USBSTS_RD is set (incorrectly) by the OC ports. > > Even though you would not actually do the wake, you still > > burn cycles servicing the false interrupts. > > I'm not sure about that. For ports in a permanent OC state, the RD bit > would get set just once, so a single interrupt would be generated. When > the host clears the Resume Detect bit in the USBSTS register, it shouldn't > get set again (not until a different port signals a resume). Otherwise a > properly working system would generate continuous interrupts during the > global resume sequence. Your interpretation checks out. The global RD interrupt does not reoccur once the individual RD bit is set. So we get a max of once extra interrupt per OC port. -- Paul Fulghum paulkf@microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 21:28 ` Andrew Morton 2003-05-12 13:57 ` Paul Fulghum @ 2003-05-14 17:50 ` Paul Fulghum 1 sibling, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-14 17:50 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Fri, 2003-05-09 at 16:28, Andrew Morton wrote: > Paul Fulghum <paulkf@microgate.com> wrote: > > > > In the process of eliminating kernel options to isolate > > the problem, eliminating USB completely appears to fix it. > > > > One machine (server) was using usb-uhci and > > the other (laptop) was using usb-ohci. > > > > So it looks like something with USB in 2.5.68-bk11 The latency problem seen on the laptop turned out to be a stupid mistake on my part: I enabled the ALI15XX IDE controller option as a module instead of in kernel and so it was not available for using DMA mode. Once corrected the latency is running at a smooth 20us without the >5ms spikes associated with PIO IDE. Final Diagnosis: server latency problem = USB wakeup_hc() delay added in 2.5.68-bk11 laptop latency problem = user with dain bramage Thanks, Paul -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 18:12 ` Paul Fulghum 2003-05-09 20:30 ` Paul Fulghum @ 2003-05-09 21:07 ` Andrew Morton 2003-05-09 21:28 ` Paul Fulghum 1 sibling, 1 reply; 65+ messages in thread From: Andrew Morton @ 2003-05-09 21:07 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, randy.dunlap Paul Fulghum <paulkf@microgate.com> wrote: > > On Thu, 2003-05-08 at 14:22, Andrew Morton wrote: > > Can you pinpoint a kernel version at which it started to happen? > > I have now isolated the latency problems further to 2.5.68-bk11 > > 2.5.68-bk10 an earlier works fine. Well I'm darned if I can see a thing wrong there. Are you using ieee1394, or USB, or any fancy networking features? ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 21:07 ` Andrew Morton @ 2003-05-09 21:28 ` Paul Fulghum 0 siblings, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-09 21:28 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Randy.Dunlap On Fri, 2003-05-09 at 16:07, Andrew Morton wrote: > Well I'm darned if I can see a thing wrong there. Are you using > ieee1394, or USB, or any fancy networking features? ieee1394 is disabled, pretty basic network options (started from make defconfig) See my reponse to Arnd Bergmann for more details. I'm not thoroughly convinced it's USB either. I'm still collecting info and testing different versions to try and piece this together. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-07 22:28 ` Andrew Morton 2003-05-08 0:25 ` Paul Fulghum 2003-05-08 13:56 ` Paul Fulghum @ 2003-05-08 14:47 ` Paul Fulghum 2 siblings, 0 replies; 65+ messages in thread From: Paul Fulghum @ 2003-05-08 14:47 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Wed, 2003-05-07 at 17:28, Andrew Morton wrote: > Paul Fulghum <paulkf@microgate.com> wrote: > > > > 2.5.69 > > Latency 100-110usec (5x increase) > > Spikes from 5-10 milliseconds > > > Could be that some random piece of code forgot to reenable interrupts, and > things stay that way until they get reenabled again by schedule() or > syscall return. > > One way of finding the culprit would be: > > my_isr() > { > if (this interrupt is more than 5 milliseconds delayed) > dump_stack(); > } > > the stack dump will point up at the place where interrupts finally got > enabled. Here is what I got (latency spike in milliseconds): synclinkscc is a driver I maintain, and that is where I placed the stack_dump() Call Trace: [<cc8bdf1a>] IsrStatusB+0x1fa/0x2c0 [synclinkscc] [<cc8c5b29>] +0xf4/0x5ab [synclinkscc] [<cc8c5fe0>] +0x0/0x14c8 [synclinkscc] [<cc8be575>] mgscc_interrupt+0xa5/0x1b0 [synclinkscc] [<cc8c5fe0>] +0x0/0x14c8 [synclinkscc] [<c010d5eb>] handle_IRQ_event+0x4b/0x120 [<c010d89b>] do_IRQ+0x9b/0x110 [<c010bc00>] common_interrupt+0x18/0x20 [<c01255db>] do_softirq+0x6b/0xe0 [<c010d8fb>] do_IRQ+0xfb/0x110 [<c0108f90>] default_idle+0x0/0x40 [<c010bc00>] common_interrupt+0x18/0x20 [<c0108f90>] default_idle+0x0/0x40 [<c0108fc0>] default_idle+0x30/0x40 [<c010905a>] cpu_idle+0x4a/0x60 [<c0105000>] rest_init+0x0/0x60 [<c0456951>] start_kernel+0x181/0x1b0 [<c0456500>] unknown_bootoption+0x0/0x110 -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 65+ messages in thread
end of thread, other threads:[~2003-05-19 18:36 UTC | newest] Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2003-05-07 16:12 2.5.69 Interrupt Latency Paul Fulghum 2003-05-07 19:41 ` Paul Fulghum 2003-05-07 22:28 ` Andrew Morton 2003-05-08 0:25 ` Paul Fulghum 2003-05-08 13:56 ` Paul Fulghum 2003-05-08 19:22 ` Andrew Morton 2003-05-08 19:35 ` Paul Fulghum 2003-05-08 23:20 ` Brian Gerst 2003-05-09 18:12 ` Paul Fulghum 2003-05-09 20:30 ` Paul Fulghum 2003-05-09 21:28 ` Andrew Morton 2003-05-12 13:57 ` Paul Fulghum 2003-05-12 14:06 ` Paul Fulghum 2003-05-12 16:24 ` Greg KH 2003-05-12 17:08 ` Paul Fulghum 2003-05-12 17:30 ` Greg KH 2003-05-12 17:49 ` Paul Fulghum 2003-05-12 18:01 ` Greg KH 2003-05-12 18:15 ` Paul Fulghum 2003-05-13 15:26 ` Alan Stern 2003-05-13 15:35 ` Paul Fulghum 2003-05-13 17:30 ` Greg KH 2003-05-13 13:01 ` Paul Fulghum 2003-05-13 18:09 ` Greg KH 2003-05-13 18:11 ` Greg KH 2003-05-13 21:35 ` Alan Stern 2003-05-13 21:48 ` Helge Hafting 2003-05-13 22:09 ` Alan Stern 2003-05-14 21:06 ` Paul Fulghum 2003-05-14 21:15 ` Johannes Erdfelt 2003-05-14 21:30 ` Greg KH 2003-05-14 21:45 ` Paul Fulghum 2003-05-13 20:17 ` Bill Davidsen 2003-05-13 22:39 ` Paul Fulghum 2003-05-14 20:52 ` Test Patch: " Alan Stern 2003-05-15 13:45 ` Paul Fulghum 2003-05-15 14:12 ` Paul Fulghum 2003-05-15 21:07 ` Alan Stern 2003-05-15 15:26 ` Paul Fulghum 2003-05-15 18:11 ` Alan Stern 2003-05-15 18:40 ` Paul Fulghum 2003-05-15 19:42 ` Paul Fulghum 2003-05-15 19:59 ` Paul Fulghum 2003-05-15 21:13 ` Alan Stern 2003-05-15 21:30 ` Paul Fulghum 2003-05-15 19:17 ` Paul Fulghum 2003-05-16 15:33 ` Alan Stern 2003-05-16 15:58 ` Paul Fulghum 2003-05-16 16:18 ` Paul Fulghum 2003-05-16 17:16 ` Alan Stern 2003-05-16 17:48 ` Paul Fulghum 2003-05-16 18:31 ` Paul Fulghum 2003-05-16 18:40 ` Alan Stern 2003-05-16 19:05 ` Paul Fulghum 2003-05-18 0:57 ` Andrew McGregor 2003-05-16 17:20 ` Alan Stern 2003-05-16 17:51 ` Paul Fulghum 2003-05-19 16:41 ` Alan Stern 2003-05-19 18:20 ` Paul Fulghum 2003-05-19 18:49 ` Alan Stern 2003-05-16 18:10 ` Paul Fulghum 2003-05-14 17:50 ` Paul Fulghum 2003-05-09 21:07 ` Andrew Morton 2003-05-09 21:28 ` Paul Fulghum 2003-05-08 14:47 ` Paul Fulghum
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).