* Re: 2.5.69 Interrupt Latency
From: Arnd Bergmann @ 2003-05-09 21:06 UTC
To: Paul Fulghum, linux-kernel

Paul Fulghum wrote:

> One machine (server) was using usb-uhci and
> the other (laptop) was using usb-ohci.
>
> So it looks like something with USB in 2.5.68-bk11

The change below was part of 2.5.68-bk11, and adds a 20ms
delay to the uhci interrupt handler. Could that be
the culprit?

	Arnd <><

ChangeSet 1.1042.1.129 2003/04/29 15:30:31 stern@rowland.harvard.edu
  [PATCH] USB: Minor patch for uhci-hcd.c

--- 1.32/drivers/usb/host/uhci-hcd.c	Mon Apr 14 11:51:40 2003
+++ 1.33/drivers/usb/host/uhci-hcd.c	Fri Apr 18 13:37:24 2003
@@ -1283,7 +1283,8 @@
 	}
 
 	if (last_urb) {
-		*end = (last_urb->start_frame + last_urb->number_of_packets) & 1023;
+		*end = (last_urb->start_frame + last_urb->number_of_packets *
+				last_urb->interval) & (UHCI_NUMFRAMES-1);
 		ret = 0;
 	} else
 		ret = -1;	/* no previous urb found */
@@ -1933,9 +1934,10 @@
 
 	dbg("%x: suspend_hc", io_addr);
 
-	outw(USBCMD_EGSM, io_addr + USBCMD);
-
 	uhci->is_suspended = 1;
+	smp_wmb();
+
+	outw(USBCMD_EGSM, io_addr + USBCMD);
 }
 
 static void wakeup_hc(struct uhci_hcd *uhci)
@@ -1945,6 +1947,9 @@
 
 	dbg("%x: wakeup_hc", io_addr);
 
+	/* Global resume for 20ms */
+	outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
+	wait_ms(20);
 	outw(0, io_addr + USBCMD);
 
 	/* wait for EOP to be sent */
@@ -1965,7 +1970,7 @@
 	int i;
 
 	for (i = 0; i < uhci->rh_numports; i++)
-		connection |= (inw(io_addr + USBPORTSC1 + i * 2) & 0x1);
+		connection |= (inw(io_addr + USBPORTSC1 + i * 2) & USBPORTSC_CCS);
 
 	return connection;
 }
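The first hunk changes how the driver finds where the previously queued isochronous URB ends in the 1024-entry frame list: the old code assumed one packet per frame, while the new code advances `interval` frames per packet before wrapping. A minimal userspace sketch of both formulas follows; `struct fake_urb` and the helper names are hypothetical, and UHCI_NUMFRAMES is assumed to be 1024 to match the `& 1023` mask it replaces.

```c
/* Assumed to equal 1024, matching the "& 1023" mask the patch replaces. */
#define UHCI_NUMFRAMES 1024

/* Hypothetical stand-in for the few struct urb fields used here. */
struct fake_urb {
	unsigned int start_frame;
	unsigned int number_of_packets;
	unsigned int interval;	/* frames between successive packets */
};

/* Pre-patch formula: treats every packet as occupying one frame. */
static unsigned int end_frame_old(const struct fake_urb *u)
{
	return (u->start_frame + u->number_of_packets) & 1023;
}

/* Post-patch formula: each packet advances 'interval' frames, and the
 * result wraps around the frame list. */
static unsigned int end_frame_new(const struct fake_urb *u)
{
	return (u->start_frame + u->number_of_packets * u->interval) &
	       (UHCI_NUMFRAMES - 1);
}
```

For example, a URB starting at frame 1000 with 10 packets at an interval of 4 ends at frame 1010 under the old formula but wraps to frame 16 under the new one; with an interval of 1 the two agree.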
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-09 21:25 UTC
To: Arnd Bergmann
Cc: linux-kernel

On Fri, 2003-05-09 at 16:06, Arnd Bergmann wrote:
> Paul Fulghum wrote:
>
> > One machine (server) was using usb-uhci and
> > the other (laptop) was using usb-ohci.
> >
> > So it looks like something with USB in 2.5.68-bk11
>
> The change below was part of 2.5.68-bk11, and adds a 20ms
> delay to the uhci interrupt handler. Could that be
> the culprit?

Possibly. I can try backing out just that part.

To complicate matters, this is happening on 2 machines:
a server and a laptop. I disabled USB on the laptop (2.5.69)
and the problem is still there :-(

I am confident about these results:

1. On the server, bk10 and earlier work, bk11 and later break.
2. On the server, bk11 with USB breaks, bk11 without USB works.
3. On the laptop, 2.5.68 and earlier work, 2.5.69 breaks.

I need to test the laptop with the bk10/bk11 sets
to see if it follows the results on the server.
Maybe disabling/enabling USB is just triggering
something else in the configuration file.

I'm leaving for the weekend now, and will try
to get back to this on Monday.

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
* 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-07 16:12 UTC
To: linux-kernel

Starting with kernel version 2.5.69, I am
encountering what appears to be increased
interrupt latency or spikes in interrupt latency.

I noticed this on two serial drivers that use
programmed I/O with FIFOs. On 2.5.68, no problems.
On 2.5.69, plenty of underruns. Judging from the
driver tracing, it does not look like lost interrupts.

I see this on 2 different machines
(one SMP server and one laptop).

There were changes involving the return type
of interrupt handlers (from void to irqreturn_t)
in 2.5.69. Could this be related?

Has anyone else seen similar results?

If I can get time, I'll try and hook up a scope
to measure the latencies precisely. I want to
check to see if this is a known problem before doing so.

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
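For reference, the 2.5.69 interface change mentioned above makes a handler report whether its device actually raised the interrupt instead of returning nothing. The userspace mock below illustrates the convention. The typedef and the IRQ_NONE/IRQ_HANDLED values are assumed to mirror the kernel headers of the time, the kernel-only `pt_regs` argument is dropped, and `struct fake_dev` is hypothetical. By itself the new return value only adds a status check to the handler path.

```c
/* Assumed to mirror the kernel definitions of this era. */
typedef int irqreturn_t;
#define IRQ_NONE	(0)
#define IRQ_HANDLED	(1)

/* Hypothetical device: 'status' stands in for an interrupt status
 * register; nonzero means this device raised the interrupt. */
struct fake_dev {
	unsigned int status;
	unsigned int handled_count;
};

/* A handler in the 2.5.69 style (pt_regs argument omitted). */
static irqreturn_t fake_isr(int irq, void *dev_id)
{
	struct fake_dev *dev = dev_id;

	(void)irq;
	if (dev->status == 0)
		return IRQ_NONE;	/* not ours, e.g. another device on a shared line */

	dev->status = 0;		/* "acknowledge" the interrupt */
	dev->handled_count++;
	return IRQ_HANDLED;
}
```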
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-07 19:41 UTC
To: linux-kernel

On Wed, 2003-05-07 at 11:12, Paul Fulghum wrote:
> Starting with kernel version 2.5.69, I am
> encountering what appears to be increased
> interrupt latency or spikes in interrupt latency.
> ...
> I see this on 2 different machines
> (one SMP server and one laptop).
> ...
> If I can get time, I'll try and hook up a scope
> to measure the latencies precisely. I want to
> check to see if this is a known problem before doing so.

Here are some results with a scope hooked to the
hardware while running tests with a regular interrupt
pattern:

2.4.20-8 (redhat)
Latency 20-30usec
Spikes to 80usec

2.5.68
Latency 20-30usec
Spikes to 100usec

2.5.69
Latency 100-110usec (5x increase)
Spikes from 5-10 milliseconds

This is all on a PCI adapter not sharing interrupts
on a dual Pentium II-400 Netserver LC3.

Any ideas what happened?

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
* Re: 2.5.69 Interrupt Latency
From: Andrew Morton @ 2003-05-07 22:28 UTC
To: Paul Fulghum
Cc: linux-kernel

Paul Fulghum <paulkf@microgate.com> wrote:
>
> 2.5.69
> Latency 100-110usec (5x increase)
> Spikes from 5-10 milliseconds
>
> This is all on a PCI adapter not sharing interrupts
> on a dual Pentium II-400 Netserver LC3.
>
> Any ideas what happened?

Could be that some random piece of code forgot to reenable interrupts, and
things stay that way until they get reenabled again by schedule() or
syscall return.

One way of finding the culprit would be:

my_isr()
{
	if (this interrupt is more than 5 milliseconds delayed)
		dump_stack();
}

the stack dump will point up at the place where interrupts finally got
enabled.

If you can describe what drivers are in use, and what workload triggers the
problem then it may be locatable by inspection.
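Andrew's pseudocode needs a notion of when an interrupt was due. With a regular interrupt pattern like the one used in the scope tests, one option is to compare each ISR entry time with the previous entry plus the known period. A minimal sketch, assuming microsecond timestamps from some clock source (left open here; do_gettimeofday() or the TSC are possibilities) and a hypothetical `struct lat_tracker`. In a driver, a "late" result is where dump_stack() would be called.

```c
#include <stdint.h>

#define LATE_THRESHOLD_US 5000	/* "more than 5 milliseconds delayed" */

/* Hypothetical tracker for a device that interrupts at a fixed,
 * known period. */
struct lat_tracker {
	uint64_t period_us;	/* expected time between interrupts */
	uint64_t last_us;	/* timestamp of the previous ISR entry */
	int have_last;		/* 0 until the first interrupt is seen */
};

/* Call on ISR entry with the current time in microseconds.
 * Returns 1 if this interrupt arrived more than LATE_THRESHOLD_US
 * after it was due, 0 otherwise. */
static int irq_is_late(struct lat_tracker *t, uint64_t now_us)
{
	int late = 0;

	if (t->have_last) {
		uint64_t due = t->last_us + t->period_us;

		if (now_us > due && now_us - due > LATE_THRESHOLD_US)
			late = 1;
	}
	t->last_us = now_us;
	t->have_last = 1;
	return late;
}
```

Measuring from the previous ISR entry rather than from the hardware event means a late interrupt becomes the new baseline, so a single stall is reported once rather than on every following interrupt.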
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-08 0:25 UTC
To: Andrew Morton
Cc: linux-kernel

On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> Paul Fulghum <paulkf@microgate.com> wrote:
> >
> > 2.5.69
> > Latency 100-110usec (5x increase)
> > Spikes from 5-10 milliseconds
> >
> > This is all on a PCI adapter not sharing interrupts
> > on a dual Pentium II-400 Netserver LC3.
> >
> > Any ideas what happened?
>
> Could be that some random piece of code forgot to reenable interrupts, and
> things stay that way until they get reenabled again by schedule() or
> syscall return.
>
> One way of finding the culprit would be:
>
> my_isr()
> {
> 	if (this interrupt is more than 5 milliseconds delayed)
> 		dump_stack();
> }
>
> the stack dump will point up at the place where interrupts finally got
> enabled.

I'll give that a try tomorrow.

> If you can describe what drivers are in use, and what workload triggers the
> problem then it may be locatable by inspection.

It happens on both of the machines I tried (server and laptop).
I think the only common hardware between the two is the
net controller, which is Intel Ethernet Pro 100 based.
I'll check tomorrow to be sure.

There was essentially no workload (no net traffic,
no CPU-intensive program, no disk activity).
I was just doing simple loopback tests on our serial
devices (PCI based on the server and PC Card on the laptop).

Paul Fulghum
paulkf@microgate.com
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-08 13:56 UTC
To: Andrew Morton
Cc: linux-kernel

On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> Paul Fulghum <paulkf@microgate.com> wrote:
> >
> > 2.5.69
> > Latency 100-110usec (5x increase)
> > Spikes from 5-10 milliseconds
> >
>
> If you can describe what drivers are in use, and what workload triggers the
> problem then it may be locatable by inspection.

After inspecting both machines, there is no common
hardware other than the net device. Both machines
use the e100 driver.

After removing the e100 driver altogether,
the increased latency and latency spikes persist.

So it looks like this problem is not specific to a
particular hardware driver, but is a result of a
more fundamental, system-wide change.

I'm going to try your suggestion of doing a stack dump
when the driver encounters the large spikes in IRQ latency,
to determine if something is leaving interrupts disabled.

That will not address the fact that the minimum
latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec
(2.5.69). This may actually be two separate problems
introduced with 2.5.69.

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
* Re: 2.5.69 Interrupt Latency
From: Andrew Morton @ 2003-05-08 19:22 UTC
To: Paul Fulghum
Cc: linux-kernel

Paul Fulghum <paulkf@microgate.com> wrote:
>
> On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> > Paul Fulghum <paulkf@microgate.com> wrote:
> > >
> > > 2.5.69
> > > Latency 100-110usec (5x increase)
> > > Spikes from 5-10 milliseconds
> > >
> >
> > If you can describe what drivers are in use, and what workload triggers the
> > problem then it may be locatable by inspection.
>
> After inspecting both machines, there is no common
> hardware other than the net device. Both machines
> use the e100 driver.
>
> After removing the e100 driver altogether,
> the increased latency and latency spikes persist.
>
> So it looks like this problem is not specific to a
> particular hardware driver, but is a result of a
> more fundamental, system-wide change.
>
> I'm going to try your suggestion of doing a stack dump
> when the driver encounters the large spikes in IRQ latency,
> to determine if something is leaving interrupts disabled.

I wasn't very informative, alas.

> That will not address the fact that the minimum
> latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec
> (2.5.69). This may actually be two separate problems
> introduced with 2.5.69

Can you pinpoint a kernel version at which it started to happen?
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-08 19:35 UTC
To: Andrew Morton
Cc: linux-kernel

On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
> Paul Fulghum <paulkf@microgate.com> wrote:
> >
> > On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
> > > Paul Fulghum <paulkf@microgate.com> wrote:
> > > >
> > > > 2.5.69
> > > > Latency 100-110usec (5x increase)
> > > > Spikes from 5-10 milliseconds
> > > >
>
> > I'm going to try your suggestion of doing a stack dump
> > when the driver encounters the large spikes in IRQ latency,
> > to determine if something is leaving interrupts disabled.
>
> I wasn't very informative, alas.

Yeah, I've been reading through the 2.5.69 patch again and
could not really see anything that related to the
stack dump.

> > That will not address the fact that the minimum
> > latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec
> > (2.5.69). This may actually be two separate problems
> > introduced with 2.5.69
>
> Can you pinpoint a kernel version at which it started to happen?

Exactly with 2.5.69.

2.5.68 works fine, as do earlier versions back to 2.4.20-8
(the earliest tested for this problem). All these versions have
very consistent latencies as described above.

The problem definitely started with 2.5.69.

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
* Re: 2.5.69 Interrupt Latency
From: Brian Gerst @ 2003-05-08 23:20 UTC
To: Paul Fulghum
Cc: Andrew Morton, linux-kernel

Paul Fulghum wrote:
> On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
>
>>Paul Fulghum <paulkf@microgate.com> wrote:
>>
>>>On Wed, 2003-05-07 at 17:28, Andrew Morton wrote:
>>>
>>>>Paul Fulghum <paulkf@microgate.com> wrote:
>>>>
>>>>>2.5.69
>>>>>Latency 100-110usec (5x increase)
>>>>>Spikes from 5-10 milliseconds
>>>>>
>
>>>I'm going to try your suggestion of doing a stack dump
>>>when the driver encounters the large spikes in IRQ latency,
>>>to determine if something is leaving interrupts disabled.
>>
>>I wasn't very informative, alas.
>
> Yeah, I've been reading through the 2.5.69 patch again and
> could not really see anything that related to the
> stack dump.
>
>>>That will not address the fact that the minimum
>>>latency has jumped from 20usec (2.4.20 - 2.5.68) to 100usec
>>>(2.5.69). This may actually be two separate problems
>>>introduced with 2.5.69
>>
>>Can you pinpoint a kernel version at which it started to happen?
>
> Exactly with 2.5.69
>
> 2.5.68 works fine as do earlier versions back to 2.4.20-8
> (earliest tested for this problem). All these versions have
> very consistent latencies as described above.
>
> The problem definitely started with the 2.5.69
>

Try to narrow it down with the 2.5.68-bk snapshots.

--
Brian Gerst
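Narrowing a regression across the 2.5.68-bk snapshots is a binary search: if 2.5.68 is good, 2.5.69 is bad, and every snapshot after the first bad one is also bad, each build-and-measure cycle halves the remaining range. A minimal sketch, in which snapshot indices stand in for kernel versions (index 0 for 2.5.68, index n-1 for 2.5.69) and `is_bad` is a hypothetical callback for "build, boot, and run the latency test".

```c
/* Returns the index of the first bad snapshot in [0, n-1], assuming
 * index 0 is known good, index n-1 is known bad, and every snapshot
 * after the first bad one is also bad. *tests_run counts how many
 * builds would have had to be tested. */
static int first_bad_snapshot(int n, int (*is_bad)(int idx), int *tests_run)
{
	int good = 0, bad = n - 1;

	*tests_run = 0;
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		(*tests_run)++;
		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Test stand-in only: pretends snapshots from demo_first_bad on are bad. */
static int demo_first_bad;

static int demo_is_bad(int idx)
{
	return idx >= demo_first_bad;
}
```

With 16 snapshots the search needs at most 4 test boots instead of up to 14 when stepping through them one by one.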
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-09 18:12 UTC
To: Andrew Morton
Cc: linux-kernel, Randy.Dunlap

On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
> Can you pinpoint a kernel version at which it started to happen?

I have now isolated the latency problems further to 2.5.68-bk11.

2.5.68-bk10 and earlier work fine.

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-09 20:30 UTC
To: linux-kernel

On Fri, 2003-05-09 at 13:12, Paul Fulghum wrote:
> On Thu, 2003-05-08 at 14:22, Andrew Morton wrote:
> > Can you pinpoint a kernel version at which it started to happen?
>
> I have now isolated the latency problems further to 2.5.68-bk11
>
> 2.5.68-bk10 and earlier work fine.

In the process of eliminating kernel options to isolate
the problem, eliminating USB completely appears to fix it.

One machine (server) was using usb-uhci and
the other (laptop) was using usb-ohci.

So it looks like something with USB in 2.5.68-bk11

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
* Re: 2.5.69 Interrupt Latency
From: Andrew Morton @ 2003-05-09 21:28 UTC
To: Paul Fulghum
Cc: linux-kernel

Paul Fulghum <paulkf@microgate.com> wrote:
>
> In the process of eliminating kernel options to isolate
> the problem, eliminating USB completely appears to fix it.
>
> One machine (server) was using usb-uhci and
> the other (laptop) was using usb-ohci.
>
> So it looks like something with USB in 2.5.68-bk11

ah, that helps.

This code was added to wakeup_hc(). It is called from uhci_irq():

+	/* Global resume for 20ms */
+	outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
+	wait_ms(20);

The changelog just says "Minor patch for uhci-hcd.c"

Can you delete the wait_ms() and see if that is our culprit?
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-12 13:57 UTC
To: Andrew Morton
Cc: linux-kernel, Arnd Bergmann, johannes

On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:

> This code was added to wakeup_hc(). It is called from uhci_irq():
>
> +	/* Global resume for 20ms */
> +	outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> +	wait_ms(20);
>
> The changelog just says "Minor patch for uhci-hcd.c"
>
> Can you delete the wait_ms() and see if that is our culprit?

This is the culprit.

Removing this line corrects the latency problems on
the server. A 20ms delay seems pretty excessive for an
interrupt handler. I'm not sure what it is supposed to
accomplish, but this seems like something that should
be scheduled to run outside of the ISR.

I must have messed up a test on the laptop that is
also showing latency problems. On the laptop the
problem *is* in both 2.5.68/2.5.69 and *is not*
eliminated by turning off USB. The laptop uses the
ohci driver anyway, which is not affected by this patch.
The laptop does not show latency problems on 2.4.20.

So the patch above is definitely a problem,
but the problem I am seeing on the laptop
is something unrelated, but part of 2.5.x
(which I will investigate further).

Thanks,
Paul

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
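To put the 20 ms figure in perspective for FIFO-based serial hardware: if wait_ms() busy-waits when called from interrupt context (as far as we know, the usb.h helper of this era fell back to mdelay() there), the handler spins for the full 20 ms. The sketch below does the arithmetic. The 115200 bps rate, 10 bits per character, and 16-character FIFO used in the usage note are assumptions, not figures for the SyncLink hardware discussed in this thread.

```c
#include <stdint.h>

/* Characters that cross the wire during a stall of stall_us
 * microseconds at bit_rate bits/s, with bits_per_char bits per
 * character (start, data, and stop bits). */
static uint64_t chars_during_stall(uint64_t bit_rate, unsigned bits_per_char,
				   uint64_t stall_us)
{
	return bit_rate * stall_us / (1000000ULL * bits_per_char);
}

/* Longest stall, in microseconds, that a FIFO of fifo_depth characters
 * can absorb before it overflows (receive) or runs dry (transmit
 * underrun). */
static uint64_t fifo_budget_us(uint64_t bit_rate, unsigned bits_per_char,
			       unsigned fifo_depth)
{
	return (uint64_t)fifo_depth * bits_per_char * 1000000ULL / bit_rate;
}
```

With those assumed numbers, chars_during_stall(115200, 10, 20000) is 230 characters, while fifo_budget_us(115200, 10, 16) leaves about 1388 usec of slack, so a 20 ms stall exceeds such a FIFO's budget many times over. The 20-30 usec latencies measured on 2.5.68 fit comfortably within it.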
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-12 14:06 UTC
To: Andrew Morton
Cc: linux-kernel, Arnd Bergmann, johannes

On Mon, 2003-05-12 at 08:57, Paul Fulghum wrote:
> On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
>
> > This code was added to wakeup_hc(). It is called from uhci_irq():
> >
> > +	/* Global resume for 20ms */
> > +	outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> > +	wait_ms(20);
> >
> > The changelog just says "Minor patch for uhci-hcd.c"
> >
> > Can you delete the wait_ms() and see if that is our culprit?
>
> This is the culprit.
>
> Removing this line corrects the latency problems on
> the server. A 20ms delay seems pretty excessive for an
> interrupt handler. I'm not sure what it is supposed to
> accomplish, but this seems like something that should
> be scheduled to run outside of the ISR.
>
> I must have messed up a test on the laptop that is
> also showing latency problems. On the laptop the
> problem *is* in both 2.5.68/2.5.69 and *is not*
> eliminated by turning off USB. The laptop uses the
> ohci driver anyway, which is not affected by this patch.
> The laptop does not show latency problems on 2.4.20.
>
> So the patch above is definitely a problem,
> but the problem I am seeing on the laptop
> is something unrelated, but part of 2.5.x
> (which I will investigate further).
>
> Thanks,
> Paul

I forgot to add this snippet from the /var/log/messages
file of the server, in case it helps the USB maintainer
evaluate what to do about the above patch.

kernel: drivers/usb/host/uhci-hcd.c: USB Universal Host Controller Interface driver v2.0
kernel: uhci-hcd 00:04.2: Intel Corp. 82371AB/EB/MB PIIX4
kernel: uhci-hcd 00:04.2: irq 19, io base 0000fce0
kernel: Please use the 'usbfs' filetype instead, the 'usbdevfs' name is deprecated.
kernel: uhci-hcd 00:04.2: new USB bus registered, assigned bus number 1
kernel: hub 1-0:0: USB hub found
kernel: hub 1-0:0: 2 ports detected

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
* Re: 2.5.69 Interrupt Latency
From: Greg KH @ 2003-05-12 16:24 UTC
To: Paul Fulghum
Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes

On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote:
> On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
>
> > This code was added to wakeup_hc(). It is called from uhci_irq():
> >
> > +	/* Global resume for 20ms */
> > +	outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> > +	wait_ms(20);
> >
> > The changelog just says "Minor patch for uhci-hcd.c"
> >
> > Can you delete the wait_ms() and see if that is our culprit?
>
> This is the culprit.
>
> Removing this line corrects the latency problems on
> the server. A 20ms delay seems pretty excessive for an
> interrupt handler. I'm not sure what it is supposed to
> accomplish, but this seems like something that should
> be scheduled to run outside of the ISR.

This should only happen when your hardware receives a "RESUME" signal
from a USB device AND the host controller is in a global suspend state
at that time.

Now I think the wait_ms() call is valid for when this is really
happening, but it sounds like you are having this happen all the time
during normal operation. Are you using any USB devices with this
server? Is USB enabled in the BIOS or not?

Also, Johannes / Alan, should we be verifying the global suspend state
when we read this value so that we don't accidentally call wakeup_hc()
for hardware that sets this bit in an illegal way? I think that might
be the proper fix for this problem.

thanks,

greg k-h
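Greg's proposed check can be sketched as a guard in front of the wakeup path: act on Resume Detect only if the driver itself put the controller into global suspend. The patch quoted earlier in the thread moves the `uhci->is_suspended = 1` store, followed by smp_wmb(), ahead of the EGSM write, which is presumably what lets an interrupt handler trust the flag. A minimal userspace sketch; USBSTS_RD is assumed to be bit 2 (0x0004) as in the UHCI specification, and `struct fake_uhci` with its spurious-event counter is hypothetical.

```c
/* Resume Detect bit of USBSTS; assumed to be bit 2 per the UHCI spec. */
#define USBSTS_RD 0x0004

/* Hypothetical slice of driver state. */
struct fake_uhci {
	int is_suspended;	/* set by suspend_hc() before entering EGSM */
	int spurious_rd;	/* Resume Detect seen while not suspended */
};

/* Called from the interrupt handler with the USBSTS value just read.
 * Returns 1 only when the slow wakeup path is warranted: Resume Detect
 * is set AND the controller was really put into global suspend. */
static int should_wakeup_hc(struct fake_uhci *uhci, unsigned int status)
{
	if (!(status & USBSTS_RD))
		return 0;
	if (!uhci->is_suspended) {
		uhci->spurious_rd++;	/* ignore: skip the 20 ms resume */
		return 0;
	}
	return 1;
}
```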
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-12 17:08 UTC
To: Greg KH
Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes

On Mon, 2003-05-12 at 11:24, Greg KH wrote:
> On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote:
> > On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
> >
> > > This code was added to wakeup_hc(). It is called from uhci_irq():
> > >
> > > +	/* Global resume for 20ms */
> > > +	outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> > > +	wait_ms(20);
> > >
> > > The changelog just says "Minor patch for uhci-hcd.c"
> > >
> > > Can you delete the wait_ms() and see if that is our culprit?
> >
> > This is the culprit.
> >
> > Removing this line corrects the latency problems on
> > the server. A 20ms delay seems pretty excessive for an
> > interrupt handler. I'm not sure what it is supposed to
> > accomplish, but this seems like something that should
> > be scheduled to run outside of the ISR.
>
> This should only happen when your hardware receives a "RESUME" signal
> from a USB device AND the host controller is in a global suspend state
> at that time.
>
> Now I think the wait_ms() call is valid for when this is really
> happening, but it sounds like you are having this happen all the time
> during normal operation.

It does appear to happen on a regular basis.

Should the 20ms delay be in the ISR though?
I thought it was standard practice to move such
lengthy operations outside of the ISR so as not to
impact interrupt latency for the system.

> Are you using any USB devices with this
> server? Is USB enabled in the BIOS or not?

There are no USB devices attached to the server.
There are no actual USB connectors, and the
server's specs do not list USB. There is no
option to enable/disable USB in the BIOS.
So my guess is this is an unused portion of the
chipset being detected and enabled.

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
* Re: 2.5.69 Interrupt Latency
From: Greg KH @ 2003-05-12 17:30 UTC
To: Paul Fulghum
Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes

On Mon, May 12, 2003 at 12:08:21PM -0500, Paul Fulghum wrote:
> On Mon, 2003-05-12 at 11:24, Greg KH wrote:
> > On Mon, May 12, 2003 at 08:57:42AM -0500, Paul Fulghum wrote:
> > > On Fri, 2003-05-09 at 16:28, Andrew Morton wrote:
> > >
> > > > This code was added to wakeup_hc(). It is called from uhci_irq():
> > > >
> > > > +	/* Global resume for 20ms */
> > > > +	outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD);
> > > > +	wait_ms(20);
> > > >
> > > > The changelog just says "Minor patch for uhci-hcd.c"
> > > >
> > > > Can you delete the wait_ms() and see if that is our culprit?
> > >
> > > This is the culprit.
> > >
> > > Removing this line corrects the latency problems on
> > > the server. A 20ms delay seems pretty excessive for an
> > > interrupt handler. I'm not sure what it is supposed to
> > > accomplish, but this seems like something that should
> > > be scheduled to run outside of the ISR.
> >
> > This should only happen when your hardware receives a "RESUME" signal
> > from a USB device AND the host controller is in a global suspend state
> > at that time.
> >
> > Now I think the wait_ms() call is valid for when this is really
> > happening, but it sounds like you are having this happen all the time
> > during normal operation.
>
> It does appear to happen on a regular basis.
>
> Should the 20ms delay be in the ISR though?
> I thought it was standard practice to move such
> lengthy operations outside of the ISR so as not to
> impact interrupt latency for the system.

This should only happen when the hardware is suspended, and we are being
woken up by a device. So this should be a _very_ rare occurrence, and
when it does happen, the latency isn't that big of a deal (we need it to
wake up the hardware properly.)

> > Are you using any USB devices with this
> > server? Is USB enabled in the BIOS or not?
>
> There are no USB devices attached to the server.
> There are no actual USB connectors, and the
> server's specs do not list USB. There is no
> option to enable/disable USB in the BIOS.

Heh, then I would suggest not loading this driver at all. It sounds
like you have an internal USB controller that probably does not have
properly terminated connectors.

thanks,

greg k-h
* Re: 2.5.69 Interrupt Latency
From: Paul Fulghum @ 2003-05-12 17:49 UTC
To: Greg KH
Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes

On Mon, 2003-05-12 at 12:30, Greg KH wrote:
> > Should the 20ms delay be in the ISR though?
> > I thought it was standard practice to move such
> > lengthy operations outside of the ISR so as not to
> > impact interrupt latency for the system.
>
> This should only happen when the hardware is suspended, and we are being
> woken up by a device. So this should be a _very_ rare occurrence, and
> when it does happen, the latency isn't that big of a deal (we need it to
> wake up the hardware properly.)

So you feel interrupt latency does not matter
when a machine is waking up? I'm not particularly
worried about that situation, so I won't argue that.

How about some sort of sanity check (as you mentioned
earlier), so this is not shooting off all of the time
during normal operation?

> > There are no USB devices attached to the server.
> > There are no actual USB connectors, and the
> > server's specs do not list USB. There is no
> > option to enable/disable USB in the BIOS.
>
> Heh, then I would suggest not loading this driver at all. It sounds
> like you have an internal USB controller that probably does not have
> properly terminated connectors.

Maybe. But most distributions have the USB driver
loaded by default, so if this new change stays as is,
it will silently cause erratic problems for such machines
(with unused USB controllers on the mainboard).

Then this investigation will be repeated over and over
by end users and anyone trying to support latency-sensitive
devices (such as standard serial ports) on Linux.

So either a sanity check to prevent unnecessary calls
to this delay, or recoding the delay so it does not
run in the ISR and kill interrupt latency, would be
the correct thing to do.

--
Paul Fulghum, paulkf@microgate.com
Microgate Corporation, http://www.microgate.com
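The second option above, moving the delay out of the ISR, amounts to splitting wakeup_hc() into a fast start, which begins Force Global Resume and records a deadline, and a later completion that ends it once 20 ms have passed. The sketch below models that as a small userspace state machine. The USBCMD bit values are assumed from the UHCI specification (EGSM bit 3, FGR bit 4), `struct fake_hc` is hypothetical, and the remaining steps of wakeup_hc() (waiting for EOP and so on) are omitted. In the kernel, resume_poll() would run from a timer or work item rather than from the interrupt handler.

```c
#include <stdint.h>

#define RESUME_MS 20

/* Assumed UHCI USBCMD bits (EGSM = bit 3, FGR = bit 4). */
#define CMD_EGSM 0x0008
#define CMD_FGR  0x0010

enum resume_state { RS_SUSPENDED, RS_RESUMING, RS_RUNNING };

/* Hypothetical controller model with a simulated USBCMD register. */
struct fake_hc {
	enum resume_state state;
	unsigned int usbcmd;
	uint64_t resume_deadline_ms;
};

/* Called from the interrupt handler on a genuine resume: start Force
 * Global Resume and return immediately instead of spinning 20 ms. */
static void resume_start(struct fake_hc *hc, uint64_t now_ms)
{
	if (hc->state != RS_SUSPENDED)
		return;
	hc->usbcmd = CMD_FGR | CMD_EGSM;
	hc->resume_deadline_ms = now_ms + RESUME_MS;
	hc->state = RS_RESUMING;
}

/* Called later from non-interrupt context: once the deadline has
 * passed, end the global resume. Returns 1 when the controller is
 * running, 0 if it is still resuming or suspended. */
static int resume_poll(struct fake_hc *hc, uint64_t now_ms)
{
	if (hc->state != RS_RESUMING)
		return hc->state == RS_RUNNING;
	if (now_ms < hc->resume_deadline_ms)
		return 0;
	hc->usbcmd = 0;		/* mirrors outw(0, io_addr + USBCMD) */
	hc->state = RS_RUNNING;
	return 1;
}
```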
* Re: 2.5.69 Interrupt Latency
From: Greg KH @ 2003-05-12 18:01 UTC
To: Paul Fulghum
Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes

On Mon, May 12, 2003 at 12:49:29PM -0500, Paul Fulghum wrote:
>
> How about some sort of sanity check (as you mentioned
> earlier), so this is not shooting off all of the time
> during normal operation?

That's the proper thing to do. Also possibly blacklisting your
motherboard's USB controller. What does lspci -v show?

thanks,

greg k-h
* Re: 2.5.69 Interrupt Latency 2003-05-12 18:01 ` Greg KH @ 2003-05-12 18:15 ` Paul Fulghum 0 siblings, 0 replies; 40+ messages in thread From: Paul Fulghum @ 2003-05-12 18:15 UTC (permalink / raw) To: Greg KH; +Cc: Andrew Morton, stern, linux-kernel, Arnd Bergmann, johannes On Mon, 2003-05-12 at 13:01, Greg KH wrote: > On Mon, May 12, 2003 at 12:49:29PM -0500, Paul Fulghum wrote: > > > > How about some sort of sanity check (as you mentioned > > earlier), so this is not shooting off all of the time > > during normal operation. > > That's the proper thing to do. Also possibly blacklisting your > motherboard's USB controller. What does lspci -v show? If both can be accomplished (state check to qualify delay and blacklisting), that would be optimal. The machine is the HP Netserver LC3, which does not officially have USB (but apparently has a vestigial controller), so blacklisting should be a no-brainer. Output of lspci:

00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 02)
	Flags: bus master, medium devsel, latency 64
	Memory at <unassigned> (32-bit, prefetchable) [size=256M]

00:04.0 ISA bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
	Flags: bus master, medium devsel, latency 0

00:04.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01) (prog-if 80 [Master])
	Flags: bus master, medium devsel, latency 32
	I/O ports at fcb0 [size=16]

00:04.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
	Flags: bus master, medium devsel, latency 32, IRQ 19
	I/O ports at fce0 [size=32]

00:04.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 02)
	Flags: medium devsel, IRQ 9

00:07.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, medium devsel, latency 57
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=36
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: feb00000-febfffff
	Prefetchable memory behind bridge: 00000000fbf00000-00000000fbf00000

00:08.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 05)
	Subsystem: Hewlett-Packard Company NetServer 10/100TX
	Flags: bus master, medium devsel, latency 66, IRQ 16
	Memory at fecfc000 (32-bit, prefetchable) [size=4K]
	I/O ports at fcc0 [size=32]
	Memory at fed00000 (32-bit, non-prefetchable) [size=1M]
	Expansion ROM at <unassigned> [disabled] [size=1M]
	Capabilities: [dc] Power Management version 1

00:09.0 Communication controller: Microgate Corporation: Unknown device 0030 (rev 01)
	Subsystem: Microgate Corporation: Unknown device 0030
	Flags: medium devsel, IRQ 17
	Memory at fecfd400 (32-bit, non-prefetchable) [size=128]
	I/O ports at fc00 [size=128]
	Memory at fecfd800 (32-bit, non-prefetchable) [size=512]
	Memory at fec80000 (32-bit, prefetchable) [size=256K]
	Memory at fecfdc00 (32-bit, non-prefetchable) [size=16]

00:0a.0 SCSI storage controller: Adaptec AIC-7880U (rev 01)
	Subsystem: Adaptec AIC-7880P Ultra/Ultra Wide SCSI Chipset
	Flags: bus master, medium devsel, latency 64, IRQ 19
	I/O ports at f800 [disabled] [size=256]
	Memory at fecff000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at <unassigned> [disabled] [size=64K]
	Capabilities: [dc] Power Management version 1

00:0d.0 VGA compatible controller: Cirrus Logic GD 5446 (rev 45) (prog-if 00 [VGA])
	Subsystem: Hewlett-Packard Company: Unknown device 0001
	Flags: medium devsel
	Memory at fc000000 (32-bit, prefetchable) [size=32M]
	Memory at fecfe000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at <unassigned> [disabled] [size=32K]

01:02.0 Communication controller: Microgate Corporation: Unknown device 0020 (rev 01)
	Subsystem: Microgate Corporation: Unknown device 0020
	Flags: medium devsel, IRQ 18
	Memory at febff800 (32-bit, non-prefetchable) [size=128]
	I/O ports at ec00 [size=128]
	I/O ports at ecfc [size=4]

01:03.0 Communication controller: Microgate Corporation: Unknown device 0210 (rev 02)
	Subsystem: Microgate Corporation: Unknown device 0210
	Flags: medium devsel, IRQ 19
	Memory at febff400 (32-bit, non-prefetchable) [size=128]
	I/O ports at e880 [size=128]
	I/O ports at ece8 [size=8]
	Memory at fbf80000 (32-bit, prefetchable) [size=256K]

01:04.0 Communication controller: Microgate Corporation SyncLink WAN Adapter (rev 01)
	Subsystem: Microgate Corporation SyncLink WAN Adapter
	Flags: medium devsel, IRQ 16
	Memory at febfe800 (32-bit, non-prefetchable) [size=128]
	I/O ports at e800 [size=128]
	I/O ports at ecf0 [size=8]
	Memory at fbf40000 (32-bit, prefetchable) [size=256K]

-- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-12 17:30 ` Greg KH 2003-05-12 17:49 ` Paul Fulghum @ 2003-05-13 15:26 ` Alan Stern 2003-05-13 15:35 ` Paul Fulghum 1 sibling, 1 reply; 40+ messages in thread From: Alan Stern @ 2003-05-13 15:26 UTC (permalink / raw) To: Greg KH Cc: Paul Fulghum, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Mon, 12 May 2003, Greg KH wrote: > > > This should only happen when your hardware receives a "RESUME" signal > > > from a USB device AND the host controller is in a global suspend state > > > at that time. > > > > > > Now I think the wait_ms() call is valid for when this is really > > > happening, but it sounds like you are having this happen all the time > > > during normal operation. > > > > It does appear to happen on a regular basis. > > > > Should the 20ms delay be in the ISR though? > > I thought it was standard practice to move such > > lengthy operations outside of the ISR so as not to > > impact interrupt latency for the system. > > This should only happen when the hardware is suspended, and we are being > woken up by a device. So this should be a _very_ rare occurrence, and > when it does happen, the latency isn't that big of a deal (we need it to > wake up the hardware properly.) Putting in a sanity check for the global suspend state will be very easy. But I would like to point out that this "global suspend" does not refer to the entire system, only the USB bus. I'm not sure under what circumstances the bus is placed in global suspend; I think it's just when there are no devices attached (or the last remaining device is detached). However, there have been cases on my own system where turning off the only USB peripheral caused the driver to bounce between suspend_hc() and wakeup_hc() several times without any apparent explanation -- possibly as a result of transient electrical signals on the bus (?). So perhaps moving that delay out of the ISR isn't such a bad idea.
Alan Stern ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 15:26 ` Alan Stern @ 2003-05-13 15:35 ` Paul Fulghum 2003-05-13 17:30 ` Greg KH 2003-05-13 20:17 ` Bill Davidsen 0 siblings, 2 replies; 40+ messages in thread From: Paul Fulghum @ 2003-05-13 15:35 UTC (permalink / raw) To: Alan Stern; +Cc: Greg KH, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > Putting in a sanity check for the global suspend state will be very easy. > But I would like to point out that this "global suspend" does not refer to > the entire system, only the USB bus. That is a problem then, because the delay can still occur during normal system operation. > I'm not sure under what > circumstances the bus is placed in global suspend; I think it's just when > there are no devices attached (or the last remaining device is detached). > > However, there have been cases on my own system where turning off the only > USB peripheral caused the driver to bounce between suspend_hc() and > wakeup_hc() several times without any apparent explanation -- possibly as > a result of transient electrical signals on the bus (?). So perhaps > moving that delay out of the ISR isn't such a bad idea. Agreed. If this can happen on functional USB controllers when no devices are attached, then it is a serious problem. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 15:35 ` Paul Fulghum @ 2003-05-13 17:30 ` Greg KH 2003-05-13 13:01 ` Paul Fulghum 2003-05-13 20:17 ` Bill Davidsen 1 sibling, 1 reply; 40+ messages in thread From: Greg KH @ 2003-05-13 17:30 UTC (permalink / raw) To: Paul Fulghum Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote: > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > Putting in a sanity check for the global suspend state will be very easy. > > But I would like to point out that this "global suspend" does not refer to > > the entire system, only the USB bus. > > That is a problem then, because the delay can still > occur during normal system operation. Ok, can you try the attached patch and see if it causes your latency problem to go away? thanks, greg k-h --- a/drivers/usb/host/uhci-hcd.c Sun May 4 23:49:54 2003 +++ b/drivers/usb/host/uhci-hcd.c Tue May 13 10:26:02 2003 @@ -1947,6 +1947,11 @@ dbg("%x: wakeup_hc", io_addr); + /* Verify that we really should wake up the hc */ + status = inw(io_addr + USBCMD); + if (!(status & USBCMD_EGSM)) + return; + /* Global resume for 20ms */ outw(USBCMD_FGR | USBCMD_EGSM, io_addr + USBCMD); wait_ms(20); ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 17:30 ` Greg KH @ 2003-05-13 13:01 ` Paul Fulghum 2003-05-13 18:09 ` Greg KH 0 siblings, 1 reply; 40+ messages in thread From: Paul Fulghum @ 2003-05-13 13:01 UTC (permalink / raw) To: Greg KH; +Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 2003-05-13 at 12:30, Greg KH wrote: > On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote: > > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > > > Putting in a sanity check for the global suspend state will be very easy. > > > But I would like to point out that this "global suspend" does not refer to > > > the entire system, only the USB bus. > > > > That is a problem then, because the delay can still > > occur during normal system operation. > > Ok, can you try the attached patch and see if it causes your latency > problem to go away? I applied the patch plus a couple of printk statements, and the wakeup_hc() is being continuously called as well as actually executing the delay. So the check is not preventing anything. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 13:01 ` Paul Fulghum @ 2003-05-13 18:09 ` Greg KH 2003-05-13 18:11 ` Greg KH 0 siblings, 1 reply; 40+ messages in thread From: Greg KH @ 2003-05-13 18:09 UTC (permalink / raw) To: Paul Fulghum Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote: > On Tue, 2003-05-13 at 12:30, Greg KH wrote: > > On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote: > > > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > > > > > Putting in a sanity check for the global suspend state will be very easy. > > > > But I would like to point out that this "global suspend" does not refer to > > > > the entire system, only the USB bus. > > > > > > That is a problem then, because the delay can still > > > occur during normal system operation. > > > > Ok, can you try the attached patch and see if it causes your latency > > problem to go away? > > I applied the patch plus a couple of printk statements, > and the wakeup_hc() is being continuously called > as well as actually executing the delay. Is the suspend_hc() function ever getting called by anyone in that driver? thanks, greg k-h ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 18:09 ` Greg KH @ 2003-05-13 18:11 ` Greg KH 2003-05-13 21:35 ` Alan Stern 2003-05-14 21:06 ` Paul Fulghum 0 siblings, 2 replies; 40+ messages in thread From: Greg KH @ 2003-05-13 18:11 UTC (permalink / raw) To: Paul Fulghum Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, May 13, 2003 at 11:09:13AM -0700, Greg KH wrote: > On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote: > > On Tue, 2003-05-13 at 12:30, Greg KH wrote: > > > On Tue, May 13, 2003 at 10:35:07AM -0500, Paul Fulghum wrote: > > > > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > > > > > > > Putting in a sanity check for the global suspend state will be very easy. > > > > > But I would like to point out that this "global suspend" does not refer to > > > > > the entire system, only the USB bus. > > > > > > > > That is a problem then, because the delay can still > > > > occur during normal system operation. > > > > > > Ok, can you try the attached patch and see if it causes your latency > > > problem to go away? > > > > I applied the patch plus a couple of printk statements, > > and the wakeup_hc() is being continuously called > > as well as actually executing the delay. > > Is the suspend_hc() function ever getting called by anyone in that > driver? Ok, nevermind, I see where it would be getting called under normal operation... Hm, I don't really know. Johannes, any thoughts? thanks, greg k-h ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 18:11 ` Greg KH @ 2003-05-13 21:35 ` Alan Stern 2003-05-13 21:48 ` Helge Hafting 2003-05-14 21:06 ` Paul Fulghum 1 sibling, 1 reply; 40+ messages in thread From: Alan Stern @ 2003-05-13 21:35 UTC (permalink / raw) To: Greg KH Cc: Paul Fulghum, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 13 May 2003, Greg KH wrote: > On Tue, May 13, 2003 at 11:09:13AM -0700, Greg KH wrote: > > On Tue, May 13, 2003 at 08:01:01AM -0500, Paul Fulghum wrote: > > > > > > I applied the patch plus a couple of printk statements, > > > and the wakeup_hc() is being continuously called > > > as well as actually executing the delay. > > > > Is the suspend_hc() function ever getting called by anyone in that > > driver? > > Ok, nevermind, I see where it would be getting called under normal > operation... > > Hm, I don't really know. Johannes, any thoughts? My take is that wakeup_hc() is getting called whenever some stray signal causes the device to generate an interrupt, and then a little while later the stall timer routine calls suspend_hc() since nothing is active. The interrupts are probably indistinguishable from what you would get if a new device really had just been attached to the bus. Assuming this analysis is correct, only malfunctioning hardware would ever cause the problem to arise. Still, it's something that needs to be handled. (That's a tricky point -- to what extent should the kernel try to compensate for broken hardware?) Unfortunately, there isn't any obvious way to tell that under these circumstances the wakeup_hc() routine doesn't need to run. Using a timer routine to implement that 20 ms delay would at least remove the large interrupt latency. However, this presents some problems as well. In particular, is there anything that would prevent suspend_hc() from being called before the timer had expired? We don't want to find ourselves simultaneously trying to turn the USB controller on and off. 
Getting that done properly will require some thought. Maybe a kind of grace period would help: each time the controller changes state, don't allow another change until at least 1 second later. That would also help the "bouncing" effect I see when I turn on or off my USB CD-RW drive. Alan Stern ^ permalink raw reply [flat|nested] 40+ messages in thread
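Alan's proposal above, a timer-driven 20 ms resume plus a grace period of about one second between controller state changes, can be modelled in a few lines. This is a hypothetical userspace sketch, not uhci-hcd code: `struct hc_model`, the function names and the explicit millisecond timestamps are assumptions made so the logic can be exercised without hardware. A real driver would use a kernel timer and a lock shared by the ISR and the timer callback, both omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the controller power state (not driver code). */
enum hc_state { HC_RUNNING, HC_SUSPENDED, HC_RESUMING };

#define RESUME_SIGNAL_MS 20u    /* global-resume duration discussed in the thread */
#define GRACE_PERIOD_MS  1000u  /* suggested minimum time between state changes */

struct hc_model {
    enum hc_state state;
    uint64_t last_change_ms;   /* time of the last accepted state change */
    uint64_t resume_done_ms;   /* when the modelled resume "timer" expires */
};

static bool grace_elapsed(const struct hc_model *hc, uint64_t now_ms)
{
    assert(now_ms >= hc->last_change_ms);   /* clock must not run backwards */
    return now_ms - hc->last_change_ms >= GRACE_PERIOD_MS;
}

/* Interrupt path: start the resume but do not wait. The 20 ms becomes a
 * deadline checked by the timer instead of a wait_ms() inside the ISR. */
static bool hc_request_wakeup(struct hc_model *hc, uint64_t now_ms)
{
    if (hc->state != HC_SUSPENDED || !grace_elapsed(hc, now_ms))
        return false;              /* bounce suppressed */
    hc->state = HC_RESUMING;
    hc->resume_done_ms = now_ms + RESUME_SIGNAL_MS;
    hc->last_change_ms = now_ms;
    return true;
}

/* Timer path: finish a resume once its 20 ms have elapsed. */
static void hc_timer_tick(struct hc_model *hc, uint64_t now_ms)
{
    if (hc->state == HC_RESUMING && now_ms >= hc->resume_done_ms)
        hc->state = HC_RUNNING;
}

/* Idle path: refuse to suspend while a resume is in flight or while the
 * grace period since the last change has not yet elapsed. */
static bool hc_request_suspend(struct hc_model *hc, uint64_t now_ms)
{
    if (hc->state != HC_RUNNING || !grace_elapsed(hc, now_ms))
        return false;
    hc->state = HC_SUSPENDED;
    hc->last_change_ms = now_ms;
    return true;
}
```

With this model, a spurious wakeup arriving 500 ms after a suspend is simply refused rather than costing 20 ms inside the interrupt handler, and a suspend request cannot interrupt a resume that is still in progress.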
* Re: 2.5.69 Interrupt Latency 2003-05-13 21:35 ` Alan Stern @ 2003-05-13 21:48 ` Helge Hafting 2003-05-13 22:09 ` Alan Stern 0 siblings, 1 reply; 40+ messages in thread From: Helge Hafting @ 2003-05-13 21:48 UTC (permalink / raw) To: Alan Stern Cc: Greg KH, Paul Fulghum, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, May 13, 2003 at 05:35:47PM -0400, Alan Stern wrote: > My take is that wakeup_hc() is getting called whenever some stray signal > causes the device to generate an interrupt, and then a little while later > the stall timer routine calls suspend_hc() since nothing is active. The > interrupts are probably indistinguishable from what you would get if a new > device really had just been attached to the bus. > Could this also happen if the USB interrupt is shared? The other device interrupts, and the kernel calls into usb interrupt routine just in case USB has some data too? Helge Hafting ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 21:48 ` Helge Hafting @ 2003-05-13 22:09 ` Alan Stern 0 siblings, 0 replies; 40+ messages in thread From: Alan Stern @ 2003-05-13 22:09 UTC (permalink / raw) To: Helge Hafting Cc: Greg KH, Paul Fulghum, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 13 May 2003, Helge Hafting wrote: > On Tue, May 13, 2003 at 05:35:47PM -0400, Alan Stern wrote: > > My take is that wakeup_hc() is getting called whenever some stray signal > > causes the device to generate an interrupt, and then a little while later > > the stall timer routine calls suspend_hc() since nothing is active. The > > interrupts are probably indistinguishable from what you would get if a new > > device really had just been attached to the bus. > > > Could this also happen if the USB interrupt is shared? > The other device interrupts, and the kernel calls into > usb interrupt routine just in case USB has some data too? Yes, it certainly could. The other part of the problem, which I failed to mention, is that the Resume-Detect bit in the USB controller's status register is set. wakeup_hc() gets called only if that bit is set, and the bit is supposed to be set only if some device attached to the USB bus has requested a wakeup (also known as "resume"). If there's nothing on the bus, the controller shouldn't indicate that a resume was detected. Alan Stern ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 18:11 ` Greg KH 2003-05-13 21:35 ` Alan Stern @ 2003-05-14 21:06 ` Paul Fulghum 2003-05-14 21:15 ` Johannes Erdfelt 2003-05-14 21:30 ` Greg KH 1 sibling, 2 replies; 40+ messages in thread From: Paul Fulghum @ 2003-05-14 21:06 UTC (permalink / raw) To: Greg KH; +Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Tue, 2003-05-13 at 13:11, Greg KH wrote: I was looking over the PIIX3 datasheet and noticed that the USBSTS_RD bit is only valid when the device is in the suspended state. This bit is being acted on regardless of the suspend state of the controller in the ISR. Could this be why the driver is detecting false 'resume' signals and calling wakeup_hc() when it shouldn't? Maybe the code should be something like: if (uhci->is_suspended && (status & USBSTS_RD)) wakeup_hc(uhci); in the ISR to qualify acting on that status bit. Alternatively, USBCMD_EGSM (BIT3) of the USBCMD register could be tested to qualify action on the state of USBSTS_RD I'm going to test this now, but I wanted to know what you think. Thanks, Paul -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
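Paul's suggested qualification reduces to a small predicate: act on Resume Detect only while the controller is in global suspend. The sketch below tests EGSM in the command register, which is the check Greg's patch adds to wakeup_hc(). The bit values are my reading of the UHCI register layout (USBCMD bit 3 for EGSM, as Paul notes, and USBSTS bit 2 for Resume Detect), so verify them against the driver's header before relying on them. The function name `should_wakeup` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register bits as understood from the UHCI layout; verify before use. */
#define USBCMD_EGSM   0x0008u  /* USBCMD bit 3: Enter Global Suspend Mode */
#define USBSTS_USBINT 0x0001u  /* USBSTS bit 0: transfer-completion interrupt */
#define USBSTS_RD     0x0004u  /* USBSTS bit 2: Resume Detect */

static_assert((USBSTS_USBINT & USBSTS_RD) == 0, "status bits must be distinct");

/* Resume Detect is only meaningful while the controller is globally
 * suspended, so ignore it unless EGSM is set in the command register. */
static bool should_wakeup(uint16_t usbsts, uint16_t usbcmd)
{
    return (usbsts & USBSTS_RD) != 0 && (usbcmd & USBCMD_EGSM) != 0;
}
```

Paul's alternative form tests the driver's `uhci->is_suspended` flag instead of reading USBCMD; both express the same rule.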
* Re: 2.5.69 Interrupt Latency 2003-05-14 21:06 ` Paul Fulghum @ 2003-05-14 21:15 ` Johannes Erdfelt 2003-05-14 21:30 ` Greg KH 1 sibling, 0 replies; 40+ messages in thread From: Johannes Erdfelt @ 2003-05-14 21:15 UTC (permalink / raw) To: Paul Fulghum Cc: Greg KH, Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann On Wed, May 14, 2003, Paul Fulghum <paulkf@microgate.com> wrote: > I was looking over the PIIX3 datasheet and noticed > that the USBSTS_RD bit is only valid when the > device is in the suspended state. > > This bit is being acted on regardless of the > suspend state of the controller in the ISR. > Could this be why the driver is detecting > false 'resume' signals and calling wakeup_hc() > when it shouldn't? > > Maybe the code should be something like: > > if (uhci->is_suspended && (status & USBSTS_RD)) > wakeup_hc(uhci); > > in the ISR to qualify acting on that status bit. > Alternatively, USBCMD_EGSM (BIT3) of the USBCMD > register could be tested to qualify action on > the state of USBSTS_RD > > I'm going to test this now, but I wanted to > know what you think. Good eye. That may very well be the problem. Looking at the UHCI specs, it says the same thing, but I never really noticed it before. JE ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-14 21:06 ` Paul Fulghum 2003-05-14 21:15 ` Johannes Erdfelt @ 2003-05-14 21:30 ` Greg KH 2003-05-14 21:45 ` Paul Fulghum 1 sibling, 1 reply; 40+ messages in thread From: Greg KH @ 2003-05-14 21:30 UTC (permalink / raw) To: Paul Fulghum Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Wed, May 14, 2003 at 04:06:33PM -0500, Paul Fulghum wrote: > On Tue, 2003-05-13 at 13:11, Greg KH wrote: > > I was looking over the PIIX3 datasheet and noticed > that the USBSTS_RD bit is only valid when the > device is in the suspended state. > > This bit is being acted on regardless of the > suspend state of the controller in the ISR. > Could this be why the driver is detecting > false 'resume' signals and calling wakeup_hc() > when it shouldn't? > > Maybe the code should be something like: > > if (uhci->is_suspended && (status & USBSTS_RD)) > wakeup_hc(uhci); That's basically what the code I sent you did :) > in the ISR to qualify acting on that status bit. > Alternatively, USBCMD_EGSM (BIT3) of the USBCMD > register could be tested to qualify action on > the state of USBSTS_RD > > I'm going to test this now, but I wanted to > know what you think. I think it's correct, but I don't think it will solve your problem. I would be very happy to be wrong though. thanks, greg k-h ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-14 21:30 ` Greg KH @ 2003-05-14 21:45 ` Paul Fulghum 0 siblings, 0 replies; 40+ messages in thread From: Paul Fulghum @ 2003-05-14 21:45 UTC (permalink / raw) To: Greg KH; +Cc: Alan Stern, Andrew Morton, linux-kernel, Arnd Bergmann, johannes On Wed, 2003-05-14 at 16:30, Greg KH wrote: > On Wed, May 14, 2003 at 04:06:33PM -0500, Paul Fulghum wrote: > > On Tue, 2003-05-13 at 13:11, Greg KH wrote: > > > > I was looking over the PIIX3 datasheet and noticed > > that the USBSTS_RD bit is only valid when the > > device is in the suspended state. > > > > This bit is being acted on regardless of the > > suspend state of the controller in the ISR. > > Could this be why the driver is detecting > > false 'resume' signals and calling wakeup_hc() > > when it shouldn't? > > > > Maybe the code should be something like: > > > > if (uhci->is_suspended && (status & USBSTS_RD)) > > wakeup_hc(uhci); > > That's basically what the code I sent you did :) Yes, that's right. In this case suspend_hc() is being called anyway, so the controller *is* in the suspended state. suspend_hc() and wakeup_hc() are spinning back and forth forever. For some reason I thought this was firing without a call to suspend_hc(), but I verified with printks that suspend_hc() is being called. I tried it both with is_suspended and with a test of USBCMD_EGSM in the command register (Greg's patch), with the same results. So it is a good check to add to qualify USBSTS_RD, but in this case it looks like the mainboard implementation is FUBAR and bogus resume messages are being recognized by the controller. Is there some transient period after setting USBCMD_EGSM before which the controller is not officially in the suspended state that might cause a spurious USBSTS_RD indication? (seems unlikely) > > in the ISR to qualify acting on that status bit.
> > Alternatively, USBCMD_EGSM (BIT3) of the USBCMD > > register could be tested to qualify action on > > the state of USBSTS_RD > > > > I'm going to test this now, but I wanted to > > know what you think. > > I think it's correct, but I don't think it will solve your problem. I > would be very happy to be wrong though. You are right (IMO) that it is correct and should be added, and you are also right in that it does not solve this problem. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-13 15:35 ` Paul Fulghum 2003-05-13 17:30 ` Greg KH @ 2003-05-13 20:17 ` Bill Davidsen 2003-05-13 22:39 ` Paul Fulghum 1 sibling, 1 reply; 40+ messages in thread From: Bill Davidsen @ 2003-05-13 20:17 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel On 13 May 2003, Paul Fulghum wrote: > On Tue, 2003-05-13 at 10:26, Alan Stern wrote: > > > Putting in a sanity check for the global suspend state will be very easy. > > But I would like to point out that this "global suspend" does not refer to > > the entire system, only the USB bus. > > That is a problem then, because the delay can still > occur during normal system operation. > > > I'm not sure under what > > circumstances the bus is placed in global suspend; I think it's just when > > there are no devices attached (or the last remaining device is detached). > > > > However, there have been cases on my own system where turning off the only > > USB peripheral caused the driver to bounce between suspend_hc() and > > wakeup_hc() several times without any apparent explanation -- possibly as > > a result of transient electrical signals on the bus (?). So perhaps > > moving that delay out of the ISR isn't such a bad idea. > > Agreed. If this can happen on functional USB controllers > when no devices are attached, then it is a serious problem. Instead of trying to guess when to do it, could the sleep be replaced by setting a flag bit to indicate that a sleep was needed before using the hardware? Then the sleep could be done only when needed, and noise on the USB bus wouldn't hurt. 1 - there may be many places that use the hardware; I thought of that but didn't look, since someone will tell me if it's a problem. 2 - if you don't use USB, why not just take the driver out? It would be nice to prevent the problem, of course. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 40+ messages in thread
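Bill's flag-bit idea can be sketched as follows: the interrupt path only records that a resume is owed, and the 20 ms global resume runs once, in process context, when something next needs the controller. This is a hypothetical single-threaded model; the struct, its field names and `hc_prepare_for_use` are illustrative, and in a driver the flag would need locking against the ISR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the flag-bit idea; field names are illustrative. */
struct hc_flags {
    bool suspended;          /* controller is in global suspend */
    bool resume_pending;     /* ISR saw Resume Detect; a resume is owed */
    unsigned resumes_done;   /* how often the 20 ms resume actually ran */
};

/* Interrupt path: record the request, never sleep. */
static void isr_resume_detect(struct hc_flags *hc)
{
    if (hc->suspended)
        hc->resume_pending = true;
}

/* Stand-in for the real sequence (FGR for 20 ms, clear, wait for EOP). */
static void do_global_resume(struct hc_flags *hc)
{
    assert(hc->suspended);   /* resuming a running controller is a bug */
    hc->suspended = false;
    hc->resumes_done++;
}

/* Process-context path, run before the controller is next used. Resuming
 * an idle-suspended controller for brand-new work is omitted here. */
static void hc_prepare_for_use(struct hc_flags *hc)
{
    if (hc->resume_pending) {
        hc->resume_pending = false;
        do_global_resume(hc);   /* the only place the 20 ms would be spent */
    }
}
```

Any number of spurious Resume Detect interrupts then cost nothing in the ISR and at most one resume later. The trade-off is that a genuine remote-wakeup request waits until the controller is next used, which may argue for a timer-driven resume instead.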
* Re: 2.5.69 Interrupt Latency 2003-05-13 20:17 ` Bill Davidsen @ 2003-05-13 22:39 ` Paul Fulghum 0 siblings, 0 replies; 40+ messages in thread From: Paul Fulghum @ 2003-05-13 22:39 UTC (permalink / raw) To: Bill Davidsen; +Cc: linux-kernel On Tue, 2003-05-13 at 15:17, Bill Davidsen wrote: > 2 - if you don't use USB why not just take the driver out? Because a driver that runs amok, silently causing interrupt latency problems, becomes a real support nightmare for others. > It would be nice to prevent the problem, of course. Agreed -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 21:28 ` Andrew Morton 2003-05-12 13:57 ` Paul Fulghum @ 2003-05-14 17:50 ` Paul Fulghum 1 sibling, 0 replies; 40+ messages in thread From: Paul Fulghum @ 2003-05-14 17:50 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Fri, 2003-05-09 at 16:28, Andrew Morton wrote: > Paul Fulghum <paulkf@microgate.com> wrote: > > > > In the process of eliminating kernel options to isolate > > the problem, eliminating USB completely appears to fix it. > > > > One machine (server) was using usb-uhci and > > the other (laptop) was using usb-ohci. > > > > So it looks like something with USB in 2.5.68-bk11 The latency problem seen on the laptop turned out to be a stupid mistake on my part: I enabled the ALI15XX IDE controller option as a module instead of building it into the kernel, so it was not available to enable DMA mode. With that corrected, latency runs at a smooth 20us without the >5ms spikes associated with PIO IDE. Final Diagnosis: server latency problem = USB wakeup_hc() delay added in 2.5.68-bk11; laptop latency problem = user with dain bramage. Thanks, Paul -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 18:12 ` Paul Fulghum 2003-05-09 20:30 ` Paul Fulghum @ 2003-05-09 21:07 ` Andrew Morton 2003-05-09 21:28 ` Paul Fulghum 1 sibling, 1 reply; 40+ messages in thread From: Andrew Morton @ 2003-05-09 21:07 UTC (permalink / raw) To: Paul Fulghum; +Cc: linux-kernel, randy.dunlap Paul Fulghum <paulkf@microgate.com> wrote: > > On Thu, 2003-05-08 at 14:22, Andrew Morton wrote: > > Can you pinpoint a kernel version at which it started to happen? > > I have now isolated the latency problems further to 2.5.68-bk11 > > 2.5.68-bk10 and earlier works fine. Well I'm darned if I can see a thing wrong there. Are you using ieee1394, or USB, or any fancy networking features? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-09 21:07 ` Andrew Morton @ 2003-05-09 21:28 ` Paul Fulghum 0 siblings, 0 replies; 40+ messages in thread From: Paul Fulghum @ 2003-05-09 21:28 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Randy.Dunlap On Fri, 2003-05-09 at 16:07, Andrew Morton wrote: > Well I'm darned if I can see a thing wrong there. Are you using > ieee1394, or USB, or any fancy networking features? ieee1394 is disabled, and the network options are pretty basic (started from make defconfig). See my response to Arnd Bergmann for more details. I'm not thoroughly convinced it's USB either. I'm still collecting info and testing different versions to try and piece this together. -- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: 2.5.69 Interrupt Latency 2003-05-07 22:28 ` Andrew Morton 2003-05-08 0:25 ` Paul Fulghum 2003-05-08 13:56 ` Paul Fulghum @ 2003-05-08 14:47 ` Paul Fulghum 2 siblings, 0 replies; 40+ messages in thread From: Paul Fulghum @ 2003-05-08 14:47 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On Wed, 2003-05-07 at 17:28, Andrew Morton wrote: > Paul Fulghum <paulkf@microgate.com> wrote: > > > > 2.5.69 > > Latency 100-110usec (5x increase) > > Spikes from 5-10 milliseconds > > > Could be that some random piece of code forgot to reenable interrupts, and > things stay that way until they get reenabled again by schedule() or > syscall return. > > One way of finding the culprit would be: > > my_isr() > { > if (this interrupt is more than 5 milliseconds delayed) > dump_stack(); > } > > the stack dump will point up at the place where interrupts finally got > enabled. Here is what I got (latency spike in milliseconds): synclinkscc is a driver I maintain, and that is where I placed the dump_stack() call.
Call Trace:
 [<cc8bdf1a>] IsrStatusB+0x1fa/0x2c0 [synclinkscc]
 [<cc8c5b29>] +0xf4/0x5ab [synclinkscc]
 [<cc8c5fe0>] +0x0/0x14c8 [synclinkscc]
 [<cc8be575>] mgscc_interrupt+0xa5/0x1b0 [synclinkscc]
 [<cc8c5fe0>] +0x0/0x14c8 [synclinkscc]
 [<c010d5eb>] handle_IRQ_event+0x4b/0x120
 [<c010d89b>] do_IRQ+0x9b/0x110
 [<c010bc00>] common_interrupt+0x18/0x20
 [<c01255db>] do_softirq+0x6b/0xe0
 [<c010d8fb>] do_IRQ+0xfb/0x110
 [<c0108f90>] default_idle+0x0/0x40
 [<c010bc00>] common_interrupt+0x18/0x20
 [<c0108f90>] default_idle+0x0/0x40
 [<c0108fc0>] default_idle+0x30/0x40
 [<c010905a>] cpu_idle+0x4a/0x60
 [<c0105000>] rest_init+0x0/0x60
 [<c0456951>] start_kernel+0x181/0x1b0
 [<c0456500>] unknown_bootoption+0x0/0x110
-- Paul Fulghum, paulkf@microgate.com Microgate Corporation, http://www.microgate.com ^ permalink raw reply [flat|nested] 40+ messages in thread
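Andrew's latency check can be written down concretely. This is a userspace sketch under an assumption his pseudocode leaves open: that the handler knows, or can estimate, when the hardware event that raised the interrupt occurred. `irq_too_late` and `now_ns` are hypothetical helpers; in the driver, the `true` case is where `dump_stack()` would be called.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std modes */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Andrew's threshold: report interrupts serviced more than 5 ms late. */
#define LATENCY_LIMIT_NS (5ull * 1000ull * 1000ull)

/* Monotonic timestamp in nanoseconds (userspace stand-in for a kernel clock). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;   /* keep NDEBUG builds warning-free */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* True if the handler ran more than limit_ns after the hardware event.
 * expected_ns is an ASSUMPTION: the driver must know or estimate when the
 * event that raised the interrupt actually happened. */
static bool irq_too_late(uint64_t expected_ns, uint64_t handled_ns,
                         uint64_t limit_ns)
{
    if (handled_ns <= expected_ns)   /* handler ran no later than the estimate */
        return false;
    return handled_ns - expected_ns > limit_ns;
}

/* In a driver ISR (not compiled here):
 *     if (irq_too_late(expected, now_ns(), LATENCY_LIMIT_NS))
 *         dump_stack();
 */
```

Because the stack dump is taken in the first handler to run after interrupts are re-enabled, the frames just below the interrupt entry (here `do_softirq` and `default_idle`) show where that happened, which is what Andrew's suggestion relies on.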