* rt20 patch question (55+ messages in thread)
From: Mark Hounschell @ 2006-05-09 12:23 UTC
To: linux-kernel

Can I assume configuring 'Complete preemption' is the same as
configuring ('Voluntary preemption' + 'Hardirq' + 'Softirq' + default
proc settings)?

Thanks
Mark

* Re: rt20 patch question
From: Daniel Walker @ 2006-05-09 14:38 UTC
To: markh; +Cc: linux-kernel

On Tue, 2006-05-09 at 08:23 -0400, Mark Hounschell wrote:
> Can I assume configuring 'Complete preemption' is the same as
> configuring ('Voluntary preemption' + 'Hardirq' + 'Softirq' + default
> proc settings)?

Not Voluntary preemption, and I'm not sure what "default proc settings"
refers to. Complete preemption is like CONFIG_PREEMPT plus softirq and
hardirq threading. The preemption isn't voluntary; it's forced.

Daniel

* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-09 14:58 UTC
To: linux-kernel

Daniel Walker wrote:
> On Tue, 2006-05-09 at 08:23 -0400, Mark Hounschell wrote:
>> Can I assume configuring 'Complete preemption' is the same as
>> configuring ('Voluntary preemption' + 'Hardirq' + 'Softirq' + default
>> proc settings)?
>
> Not Voluntary preemption, and I'm not sure what "default proc settings"
> refers to.

I mean the proc settings or boot options to enable or disable hardirq or
softirq threading that you have available in Voluntary preemption.

> Complete preemption is like CONFIG_PREEMPT plus softirq and
> hardirq threading. The preemption isn't voluntary; it's forced.
>

With Complete preemption you have no choice about threading hard or soft
irqs. They are threaded.

So if I configure Voluntary preemption + Hardirq and Softirq threading and
do not disable hardirq or softirq threading via proc or the boot command
line, is that the same as configuring Complete preemption?

Mark

* Re: rt20 patch question
From: Daniel Walker @ 2006-05-09 15:53 UTC
To: markh; +Cc: linux-kernel

On Tue, 2006-05-09 at 10:58 -0400, Mark Hounschell wrote:
>
> So if I configure Voluntary preemption + Hardirq and Softirq threading and
> do not disable hardirq or softirq threading via proc or the boot command
> line, is that the same as configuring Complete preemption?

No.

Daniel

* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-10 12:39 UTC
To: Mark Hounschell; +Cc: linux-kernel, Daniel Walker

(It is expected on LKML not to touch the CC list, and especially to keep
the one you are replying to.)

On Tue, 9 May 2006, Mark Hounschell wrote:

> Daniel Walker wrote:
> [snip]
> So if I configure Voluntary preemption + Hardirq and Softirq threading and
> do not disable hardirq or softirq threading via proc or the boot command
> line, is that the same as configuring Complete preemption?
>

No, not at all.

First, Voluntary preemption means that when you are executing in the
kernel and a higher-priority process needs to run, the kernel will _not_
be preempted! Instead, there are places in the kernel that are marked as
preemption points. If you are in the kernel and you hit a preemption
point, a check is made then to see if the scheduler should be called. So
this is really not a true preemptive kernel.

Next you have "Preemptible Kernel (Low Latency Desktop)". This _is_ a
preemptive kernel. This means that, unless preemption is disabled, the
default is to preempt a process, whether or not it's in the kernel, when
either it has used up its time slice or a higher-priority process wants
to run. There are protected regions in the kernel that disallow
preemption (basically, while spinlocks are held and between
preempt_disable() and preempt_enable()).

But even with Preemptible Kernel + Hardirq and Softirq threading, you
still don't have the equivalent of complete preemption. This is because
full preemption turns the spinlocks into mutexes that are not only
preemptible, but also schedule on contention. To do this, hard and soft
irqs must be threaded: their handlers take spinlocks, and an interrupt
handler can only schedule if it is running as a thread. So you can't
have full complete preemption without threading the hard and soft irqs,
and that's why there is no option to not have them threaded.

Without full preemption, you also lose priority inheritance (PI) for the
spinlock mutexes.

-- Steve
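
Steve's point about priority inheritance (PI) has a user-space counterpart that an RT application such as Mark's emulator can use for its own locks. The following is a minimal sketch, not code from the -rt patch, assuming a Linux/glibc system where POSIX `PTHREAD_PRIO_INHERIT` mutexes are supported; the helper name `init_pi_mutex` is hypothetical. While a higher-priority thread is blocked on such a mutex, the owner runs at the waiter's priority, which bounds priority inversion between SCHED_FIFO threads.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical helper: initialise *m as a priority-inheritance mutex.
 * While a higher-priority thread waits on the mutex, the current owner
 * is boosted to that waiter's priority until it unlocks.
 * Returns 0 on success, or the error number from the failing pthread call.
 */
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err != 0)
        return err;

    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (err == 0)
        err = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return err;
}
```

A thread would call `init_pi_mutex(&m)` once and then use `pthread_mutex_lock()`/`pthread_mutex_unlock()` as usual; the protocol only changes how a blocked waiter's priority is lent to the owner.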

* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-10 13:06 UTC
To: Steven Rostedt; +Cc: linux-kernel, Daniel Walker

Steven Rostedt wrote:
> (It is expected on LKML not to touch the CC list, and especially to keep
> the one you are replying to.)
>
Ok. I'm on so many lists it's hard to remember what each one wants.

> On Tue, 9 May 2006, Mark Hounschell wrote:
> [snip]
> No, not at all.
>
> [snip]
>
> Without full preemption, you also lose priority inheritance (PI) for the
> spinlock mutexes.
>
> -- Steve

Thank you. That is exactly what I wanted to know. I ask because when I
run my app in complete preemption mode I have random periods where the
machine stops for many seconds at a time. Only in complete preemption
mode does this happen. In Voluntary and Preempt modes this does not
occur. I'm having a hard time trying to determine if the problem is in
my application.

Mark

* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-10 14:10 UTC
To: Mark Hounschell; +Cc: linux-kernel, Daniel Walker

On Wed, 10 May 2006, Mark Hounschell wrote:

> Steven Rostedt wrote:
> > (It is expected on LKML not to touch the CC list, and especially to keep
> > the one you are replying to.)
> >
> Ok. I'm on so many lists it's hard to remember what each one wants.

:) I've read that on other lists it's impolite to CC others. I still do
it :} I find that, especially if I'm on lots of lists, if I'm on a thread,
I prefer to be emailed directly; that way I know about a topic that I
might need to respond to quickly. I never pay attention to policies
about stripping CC lists, because I don't ever want to be stripped from a
thread I'm interested in. The LKML has 300 to 700 emails a day, so you
really do need to CC them, otherwise you'll be lost in the noise.

> [snip]
> Thank you. That is exactly what I wanted to know. I ask because when I
> run my app in complete preemption mode I have random periods where the
> machine stops for many seconds at a time. Only in complete preemption
> mode does this happen. In Voluntary and Preempt modes this does not
> occur. I'm having a hard time trying to determine if the problem is in
> my application.
>

OK, now you've got my attention. What do you mean by "your machine stops"?

Are you playing with priorities? You might want to turn on latency
tracing, although it could be a PI leak. But I really need to know more,
since I suspect that your app isn't written properly to work with a
true RT environment.

RT means that you can easily freeze the machine if you have a high-prio
task that runs longer than you expect it to. With this power comes great
responsibility, as well as understanding.

Is this SMP or UP?

Could you explain your app a little, and which tasks are RT?

Thanks,

-- Steve

* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-10 15:33 UTC
To: Steven Rostedt; +Cc: linux-kernel, Daniel Walker

Steven Rostedt wrote:
> [snip]
> Is this SMP or UP?
>
> Could you explain your app a little, and which tasks are RT?
>
> Thanks,
>
> -- Steve

Ok, I'll try to explain the application. It is an emulation of some old
legacy hardware (SEL-32) that ran a proprietary RTOS (MPX-32). We
emulate the hardware, not the software. We have some specialized PCI
cards that emulate some of that hardware, e.g. a card that has some
timers and external interrupt capabilities (RTOM). All our drivers are
GPL, BTW. This app can only run in an SMP environment, BTW, and the more
power the better.

Anyway, the legacy CPU is the main thread/process, and each legacy I/O
device, whether virtual or real, is emulated or driven by one or more
other threads. The most important part of this as far as real-time is
concerned is the determinism/latency in the delivery of interrupts
(external or timer) to the main CPU thread from the RTOM card.
Determinism of I/O operations is also important, however.

We achieve the best results by using both process and IRQ affinity. The
CPU thread/process and the IRQ of the RTOM PCI card are bound to a single
processor, and all other 'user' processes and IRQs are forced off that
processor onto the other processor. The app's I/O threads are bound to
that other processor also. The CPU process does not relinquish his
processor. He is in a loop fetching and executing legacy machine language
instructions and only comes out of that loop upon receiving an interrupt
from the RTOM card or some I/O completion event from one of the I/O
threads. You know, kind of like a real CPU would do. The CPU
process/thread in the past ran at FIFO prio 99 and the I/O threads at
lower FIFO or RR priorities. With rt20 all our priorities are now set
below the range used for hardirqs.

All this has worked well in the past as long as we control what else is
run on the system. We want to be able to use the machine for other things
and still have some reasonable determinism in the application, so we are
looking at the rt patch. We have some other tools that give us a fairly
good indication of the determinism of any given box, and we can see that
the rt patch in complete preempt mode does in fact make a difference.

So, to my problem. What I mean by "the machine stops" is just that all
indications of the mouse, keyboard, and video stop. Then, after a few
seconds, they usually continue. At first I only saw problems when using
Ethernet in the emulation. I would telnet into the emulation from the
Linux box and do the equivalent of cat'ing a very large file. The machine
will always "stop" somewhere random along the display, and then maybe
continue on or maybe not. So I thought I might have a problem with my
Ethernet module. Then I noticed similar things with the SCSI module when
accessing legacy SCSI devices from within the emulation. Sometimes the
whole machine doesn't stop; it appears that only some things have
stopped, like one or more of my I/O threads??

I can only say for sure that I do not have these "stops" when running
any other kernel or when running the rt20 kernel in any of the
non-complete preemption modes.

The only change that had to be made to this app for it to run at all on
the rt20 kernel was ensuring that the RTOM IRQ thread was at a higher
priority than the CPU process/thread. Otherwise no signals were received
from the RTOM.

Mark
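
The affinity and priority scheme Mark describes (the emulated CPU thread bound to one processor and running SCHED_FIFO below the IRQ threads) could be set up from inside the emulator roughly as follows. This is a sketch under assumptions: Linux with glibc, the hypothetical helper names `single_cpu_mask` and `pin_fifo`, and example values taken from the task list later in the thread (EMU-CPU at priority 33 with CPU mask 2, i.e. CPU 1, below the rtom IRQ thread at 37). SCHED_FIFO requires root or CAP_SYS_NICE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: fill *set with exactly one CPU.
 * Returns 0, or -EINVAL if cpu does not fit in a cpu_set_t. */
static int single_cpu_mask(int cpu, cpu_set_t *set)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -EINVAL;
    CPU_ZERO(set);
    CPU_SET(cpu, set);
    return 0;
}

/*
 * Hypothetical helper: bind a task to one CPU, then switch it to
 * SCHED_FIFO at 'prio'. pid 0 means the calling thread.
 * Returns 0 or a negative errno value.
 */
static int pin_fifo(pid_t pid, int cpu, int prio)
{
    cpu_set_t set;
    struct sched_param sp = { .sched_priority = prio };

    /* Reject priorities outside the SCHED_FIFO range (1..99 on Linux). */
    if (prio < sched_get_priority_min(SCHED_FIFO) ||
        prio > sched_get_priority_max(SCHED_FIFO))
        return -EINVAL;
    if (single_cpu_mask(cpu, &set) != 0)
        return -EINVAL;
    if (sched_setaffinity(pid, sizeof(set), &set) != 0)
        return -errno;
    /* If this fails (e.g. -EPERM), the task stays pinned but is not RT. */
    if (sched_setscheduler(pid, SCHED_FIFO, &sp) != 0)
        return -errno;
    return 0;
}
```

The CPU thread would call `pin_fifo(0, 1, 33)` on itself. Affinity is applied before the scheduling class, so a failed `sched_setscheduler()` leaves the thread pinned but not RT. The RTOM interrupt's own affinity is a separate setting, normally written as a hex CPU mask to `/proc/irq/<N>/smp_affinity`.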

* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-10 16:17 UTC
To: Mark Hounschell; +Cc: linux-kernel, Daniel Walker

Wow! I asked for some info on your system, and boy, did I get info! :)

On Wed, 10 May 2006, Mark Hounschell wrote:

>
> Ok, I'll try to explain the application. It is an emulation of some old
> legacy hardware (SEL-32) that ran a proprietary RTOS (MPX-32). We
> emulate the hardware, not the software. We have some specialized PCI
> cards that emulate some of that hardware, e.g. a card that has some
> timers and external interrupt capabilities (RTOM). All our drivers are
> GPL, BTW.
>
[snip long explanation of system]

Well, I need to leave the office tonight. I'll look at it in my hotel, and
tomorrow I'll give some feedback.

Thanks,

-- Steve

* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-10 18:30 UTC
To: Steven Rostedt; +Cc: linux-kernel, Daniel Walker

Steven Rostedt wrote:
> Wow! I asked for some info on your system, and boy, did I get info! :)
>

Sorry. I talk too much.

>> I can only say for sure that I do not have these "stops" when running
>> any other kernel or when running the rt20 kernel in any of the
>> non-complete preemption modes.
>>

Configured for "Preemptible Kernel", I got the following, but no "stops"
came with it.

BUG: scheduling while atomic: softirq-timer/1/0x00000100/15
caller is schedule+0x33/0xf0
[<b0309acc>] __schedule+0x517/0x95b (8)
[<f09d7627>] mdio_ctrl+0xaa/0x135 [e100] (48)
[<f09d7627>] mdio_ctrl+0xaa/0x135 [e100] (12)
[<b030a06c>] schedule+0x33/0xf0 (36)
[<b012eee5>] prepare_to_wait+0x12/0x4f (8)
[<b0142318>] synchronize_irq+0x96/0xba (20)
[<b012eda0>] autoremove_wake_function+0x0/0x37 (12)
[<f0a13677>] vortex_timer+0xa0/0x563 [3c59x] (24)
[<b0125b76>] __mod_timer+0x8c/0xc3 (12)
[<f09d8998>] e100_watchdog+0x0/0x39c [e100] (24)
[<b030a4cf>] cond_resched_softirq+0x64/0xaa (8)
[<b02a2dcd>] dev_watchdog+0x77/0xac (4)
[<f0a135d7>] vortex_timer+0x0/0x563 [3c59x] (12)
[<b0125902>] run_timer_softirq+0x1bf/0x3a7 (8)
[<b0121960>] ksoftirqd+0x112/0x1cc (52)
[<b012184e>] ksoftirqd+0x0/0x1cc (52)
[<b012eb9c>] kthread+0xc2/0xc6 (4)
[<b012eada>] kthread+0x0/0xc6 (12)
[<b0100e35>] kernel_thread_helper+0x5/0xb (16)

Mark

* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-10 18:49 UTC
To: Mark Hounschell; +Cc: linux-kernel, Daniel Walker

On Wed, 10 May 2006, Mark Hounschell wrote:

> Steven Rostedt wrote:
> > Wow! I asked for some info on your system, and boy, did I get info! :)
> >
>
> Sorry. I talk too much.

No, by all means, I liked it.

>
> >> I can only say for sure that I do not have these "stops" when running
> >> any other kernel or when running the rt20 kernel in any of the
> >> non-complete preemption modes.
> >>
>
> Configured for "Preemptible Kernel", I got the following, but no "stops"
> came with it.

Hmm, do you have "Compile kernel with frame pointers" turned on? It's in
Kernel hacking. It usually gives a better stack trace.

>
> BUG: scheduling while atomic: softirq-timer/1/0x00000100/15
> [snip stack trace]

I'll look into this.

-- Steve

* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-10 19:28 UTC
To: Steven Rostedt; +Cc: linux-kernel, Daniel Walker

Steven Rostedt wrote:
> On Wed, 10 May 2006, Mark Hounschell wrote:
> [snip]
>> Configured for "Preemptible Kernel", I got the following, but no "stops"
>> came with it.
>
> Hmm, do you have "Compile kernel with frame pointers" turned on? It's in
> Kernel hacking. It usually gives a better stack trace.
>

I'll turn on frame pointers on this machine and do it again, and then
also send you (off list) the logdev stuff you asked for from another
machine that is configured for complete preemption.

Mark

* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-11 11:25 UTC
To: markh, Ingo Molnar; +Cc: Steven Rostedt, linux-kernel, Daniel Walker

Mark Hounschell wrote:
> Steven Rostedt wrote:
>>> Configured for "Preemptible Kernel", I got the following, but no "stops"
>>> came with it.
>>
>> Hmm, do you have "Compile kernel with frame pointers" turned on? It's in
>> Kernel hacking. It usually gives a better stack trace.
>>
>
> I'll turn on frame pointers on this machine and do it again, and then
> also send you (off list) the logdev stuff you asked for from another
> machine that is configured for complete preemption.
>

This is with frame pointers on, but it doesn't look any more revealing to
me. After this one, my network connection into the emulation was broken,
BTW. And yes: hard and soft irqs are threaded, Preemptible Kernel, and
classic RCU.

BUG: scheduling while atomic: softirq-timer/1/0x00000100/15
caller is schedule+0x33/0xf0
[<b01041c9>] show_trace+0xd/0xf (8)
[<b01041e2>] dump_stack+0x17/0x19 (12)
[<b03112ec>] __schedule+0x517/0x95b (96)
[<b031188f>] schedule+0x33/0xf0 (28)
[<b014381f>] synchronize_irq+0x94/0xb9 (40)
[<b0143943>] disable_irq+0x31/0x35 (16)
[<f0a12715>] vortex_timer+0xa1/0x55b [3c59x] (72)
[<b01261d5>] run_timer_softirq+0x1ce/0x3de (56)
[<b012212c>] ksoftirqd+0x110/0x1cb (60)
[<b012f851>] kthread+0xc8/0xcc (32)
[<b0100e15>] kernel_thread_helper+0x5/0xb (268935196)

I hope it was OK to add Ingo to the CC list?

Mark

* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-11 12:01 UTC
To: Mark Hounschell; +Cc: Ingo Molnar, linux-kernel, Daniel Walker

On Thu, 11 May 2006, Mark Hounschell wrote:

> Mark Hounschell wrote:
>
> This is with frame pointers on, but it doesn't look any more revealing to
> me. After this one, my network connection into the emulation was broken,
> BTW. And yes: hard and soft irqs are threaded, Preemptible Kernel, and
> classic RCU.
>
> BUG: scheduling while atomic: softirq-timer/1/0x00000100/15
> [snip]

Nope, this trace is _a_lot_ better; the previous trace had a lot of
garbage in it. Anyway, I already figured out the problem from the last
dump. Could you try the patch below to see if it fixes it?

>
> I hope it was OK to add Ingo to the CC list?
>

Yep, that's fine; in fact, he should have been added.

Try this patch to see if it fixes that bug.

-- Steve

Index: linux-2.6.16-rt20/kernel/sched.c
===================================================================
--- linux-2.6.16-rt20.orig/kernel/sched.c	2006-05-10 16:23:15.000000000 -0400
+++ linux-2.6.16-rt20/kernel/sched.c	2006-05-10 16:28:31.000000000 -0400
@@ -3316,7 +3316,8 @@ void __sched __schedule(void)
 	/*
 	 * Test if we are atomic.
 	 */
-	if (unlikely(in_atomic())) {
+	if (unlikely(in_atomic()) &&
+	    (!hardirq_preemption || (preempt_count() & PREEMPT_MASK))) {
 		stop_trace();
 		printk(KERN_ERR "BUG: scheduling while atomic: "
 			"%s/0x%08x/%d\n",
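
The BUG line prints the task name, `preempt_count()` and the PID (the `%s/0x%08x/%d` format in the patch), so `softirq-timer/1/0x00000100/15` reports a count of `0x00000100`. The following user-space model, not kernel code, illustrates why the patched condition stays quiet for that value. It assumes the 2.6.16 `preempt_count()` layout (bits 0-7 hold the preempt_disable() depth, so `PREEMPT_MASK == 0xff`; bits 8-15 hold the softirq count) and simplifies `in_atomic()` to "the count is nonzero"; the `model_*`, `warns_before` and `warns_after` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model of the check in the patch; NOT kernel code.
 * Assumed 2.6.16 preempt_count() layout: bits 0-7 = preempt_disable()
 * depth (PREEMPT_MASK), bits 8-15 = softirq count.
 */
#define MODEL_PREEMPT_MASK 0x000000ffu

/* Simplification: the real in_atomic() also masks PREEMPT_ACTIVE and
 * allows for the big kernel lock; here any nonzero count is "atomic". */
static bool model_in_atomic(unsigned int count)
{
    return count != 0;
}

/* Unpatched test: complain whenever schedule() runs while atomic. */
static bool warns_before(unsigned int count)
{
    return model_in_atomic(count);
}

/* Patched test: with hardirq_preemption enabled, only a nonzero
 * preempt_disable() depth triggers the message; a bare softirq
 * count such as 0x00000100 does not. */
static bool warns_after(unsigned int count, bool hardirq_preemption)
{
    return model_in_atomic(count) &&
           (!hardirq_preemption || (count & MODEL_PREEMPT_MASK) != 0);
}
```

In this model, `warns_before(0x100)` is true while `warns_after(0x100, true)` is false, and any nonzero preemption depth such as `0x101` still warns. As Steve points out in his next message, silencing the warning does not by itself explain the lost network connection.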

* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-11 12:22 UTC
To: Mark Hounschell; +Cc: Ingo Molnar, linux-kernel, Daniel Walker

On Thu, 11 May 2006, Steven Rostedt wrote:

> On Thu, 11 May 2006, Mark Hounschell wrote:
>
> > Mark Hounschell wrote:
> >
> > After this one, my network connection into the emulation was broken,
> > BTW.

Crap! I read your email too quickly and didn't notice this.

The patch I sent you would probably only fix the warning, and not the bug.

Don't do the patch yet. Could you run it again, and after it hits the bug
and you lose the network connection, do a sysrq-t to get the state of the
tasks? You need sysrq turned on in the config.

Ingo, if soft and hard irqs are threaded on a non-PREEMPT_RT kernel, is it
OK to call schedule in a softirq thread? Specifically, it's the
disable_irq that's causing the problem.

Thanks,

-- Steve

* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-11 13:02 UTC
To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, Daniel Walker

Steven Rostedt wrote:
> [snip]
> Don't do the patch yet. Could you run it again, and after it hits the bug
> and you lose the network connection, do a sysrq-t to get the state of the
> tasks? You need sysrq turned on in the config.
>

I hate to sound stupid, but when I press Alt-SysRq-T or SysRq-T, nothing
happens??? I do have sysrq configured: CONFIG_MAGIC_SYSRQ=y

Mark

* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-11 13:14 UTC
To: Mark Hounschell; +Cc: Ingo Molnar, linux-kernel, Daniel Walker

On Thu, 11 May 2006, Mark Hounschell wrote:

>
> I hate to sound stupid, but when I press Alt-SysRq-T or SysRq-T, nothing
> happens??? I do have sysrq configured: CONFIG_MAGIC_SYSRQ=y
>

dmesg doesn't show anything? Are you also capturing output from the
serial port?

-- Steve

* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-11 13:26 UTC
To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, Daniel Walker

Steven Rostedt wrote:
> On Thu, 11 May 2006, Mark Hounschell wrote:
>
>> I hate to sound stupid, but when I press Alt-SysRq-T or SysRq-T, nothing
>> happens??? I do have sysrq configured: CONFIG_MAGIC_SYSRQ=y
>>
>
> dmesg doesn't show anything? Are you also capturing output from the
> serial port?
>
> -- Steve

dmesg only shows the BUGs. I have nothing connected to my serial port. I
certainly can if I need to.

When the network connection finally closes, all my threads must be in
fairly good shape, because if I simply restart the network software
inside the emulation, I'm good to go again.

Mark

* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-11 13:53 UTC
To: Mark Hounschell; +Cc: Ingo Molnar, linux-kernel, Daniel Walker

On Thu, 11 May 2006, Mark Hounschell wrote:

You can also try just

# echo t > /proc/sysrq-trigger

> dmesg only shows the BUGs. I have nothing connected to my serial port. I
> certainly can if I need to.

Sometimes a serial capture is easier to log, but you don't really need to
do it. That's up to you.

>
> When the network connection finally closes, all my threads must be in
> fairly good shape, because if I simply restart the network software
> inside the emulation, I'm good to go again.

Hmm, I'm starting to think that this is not really a problem with the -rt
implementation, and my earlier patch to turn off the BUG dump is OK.

What RT prio is the network interrupt at?

What seems to be happening is that vortex_timer is going off while the
interrupt handler is running. Hence disable_irq can't complete right away
and schedules. Perhaps the interrupt thread has been preempted by some
high-priority task, which causes the connection to be lost.

Yeah, that task output would be helpful, if you can get it to work. Also,
can you show us the output of /proc/interrupts so we know which threads
are associated with the network card interrupt, and see where they are?

-- Steve
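
Steve asks for /proc/interrupts to see which IRQ, and therefore which `[IRQ N]` thread, serves the network card. When lines are shared (in Mark's task list the IRQ 169 thread appears alongside eth1, eth2, ide2 and aic7xxx), a small filter helps. This is a sketch assuming the usual `N: counts... chip  dev1, dev2` line layout; `irq_for_device` and `has_token` are hypothetical helper names, and device names are matched as whole tokens so that `eth1` does not match `eth10`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* True if 'name' occurs in 'line' as a whole token, i.e. bounded by the
 * start/end of the string, spaces, tabs, commas or a newline. */
static int has_token(const char *line, const char *name)
{
    size_t n = strlen(name);
    if (n == 0)
        return 0;
    for (const char *p = strstr(line, name); p; p = strstr(p + 1, name)) {
        char before = (p == line) ? ' ' : p[-1];
        char after = p[n];
        int start_ok = before == ' ' || before == '\t' || before == ',';
        int end_ok = after == '\0' || after == ' ' || after == '\t' ||
                     after == ',' || after == '\n';
        if (start_ok && end_ok)
            return 1;
    }
    return 0;
}

/* Return the IRQ number of one /proc/interrupts line that lists 'dev',
 * or -1 for header/NMI/LOC lines or lines that don't mention 'dev'. */
static int irq_for_device(const char *line, const char *dev)
{
    char *end;
    while (isspace((unsigned char)*line))
        line++;
    if (!isdigit((unsigned char)*line))
        return -1;
    long irq = strtol(line, &end, 10);
    if (*end != ':' || !has_token(end + 1, dev))
        return -1;
    return (int)irq;
}
```

Each line read with `fgets()` from `/proc/interrupts` would be passed in; the returned number can then be matched against the `[IRQ N]` threads in a task listing like the one Mark posts next.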
* Re: rt20 patch question
  2006-05-11 13:53 ` Steven Rostedt
@ 2006-05-11 14:57 ` Mark Hounschell
  2006-05-12 6:47 ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Mark Hounschell @ 2006-05-11 14:57 UTC
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel, Daniel Walker

Steven Rostedt wrote:
> On Thu, 11 May 2006, Mark Hounschell wrote:
>
> You can also try just
>
>   # echo t > /proc/sysrq-trigger
>
>> dmesg only shows the BUGs. I have nothing connected to my serial port. I
>> certainly can if I need to.
>
> Sometimes a serial capture is easier to log, but you don't really need to
> do it. That's up to you.
>
>> When finally the network connection closes, all my threads must be in
>> fairly good shape, because if I simply restart the network software
>> inside the emulation I'm good to go again.
>
> Hmm, I'm starting to think that this is not really a problem with the -rt
> implementation, and that my earlier patch to turn off the BUG dump is OK.
>

You could be right. The only thing I am certain is rt20-related is those
"stops" we are also talking about in complete-preempt mode. I can only say
for sure that these BUGs are not seen using a 2.4.13.4 kernel. That kernel
and this app are considered stable to me. All else is fair game.

> What RT prio is the network interrupt at?
>

Here is a detailed list of the RT tasks running, with prios, CPU masks,
etc. There are 3 NICs. eth1 is the NIC being used by the emulation. eth2
is currently unused.
  pid SCHED PRIO                  CPUM TASK
  --- ----- ----                  ---- ----
    2 FIFO  99                       1 (unknown)
    3 FIFO  99                       1 (unknown)
    4 FIFO  1                        1 (unknown)
    5 FIFO  1                        1 (unknown)
    6 FIFO  1                        1 (unknown)
    7 FIFO  1                        1 (unknown)
    8 FIFO  1                        1 (unknown)
    9 FIFO  1                        1 (unknown)
   10 FIFO  1                        1 (unknown)
   12 FIFO  99                       2 (unknown)
   13 FIFO  99                       2 (unknown)
   14 FIFO  1                        2 (unknown)
   15 FIFO  1                        2 (unknown)
   16 FIFO  1                        2 (unknown)
   17 FIFO  1                        2 (unknown)
   18 FIFO  1                        2 (unknown)
   19 FIFO  1                        2 (unknown)
   20 FIFO  1                        2 (unknown)
   22 FIFO  1                        1 (unknown)
   23 FIFO  1                        2 (unknown)
   39 FIFO  acpi 49 [IRQ 9]          1 (unknown)
 1129 FIFO  rtc 48 [IRQ 8]           1 (unknown)
 1135 FIFO  i8042 47 [IRQ 12]        1 (unknown)
 1145 FIFO  floppy 46 [IRQ 6]        1 (unknown)
 1178 FIFO  i8042 45 [IRQ 1]         1 (unknown)
 1268 FIFO  ide0 44 [IRQ 14]         1 (unknown)
 1313 FIFO  ide1 43 [IRQ 15]         1 (unknown)
 1362 FIFO  42 [IRQ 169]             1 (unknown)
            ide2, aic7xxx, aic7xxx, eth1, eth2,
            gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
 2663 FIFO  ??? 41 [IRQ 4]           1 (unknown)
 2667 FIFO  ??? 40 [IRQ 3]           1 (unknown)
 3420 FIFO  82801BA 39 [IRQ 177]     1 (unknown)
 5788 FIFO  eth0 38 [IRQ 185]        1 (unknown)
 8036 FIFO  rtom 37 [IRQ 193]        2 (unknown)
10338 FIFO  EMU-CPU 33               2 ./vrsx
10339 FIFO  9                        1 ./vrsx
10340 FIFO  9                        1 ./vrsx
10341 FIFO  9                        1 ./vrsx
10342 FIFO  9                        1 ./vrsx
10343 FIFO  23                       1 ./vrsx
10344 FIFO  23                       1 ./vrsx
10345 FIFO  9                        1 ./vrsx
10346 FIFO  9                        1 ./vrsx
10347 FIFO  9                        1 ./vrsx
10348 FIFO  9                        1 ./vrsx
10349 FIFO  9                        1 ./vrsx
10350 FIFO  9                        1 ./vrsx
10351 FIFO  9                        1 ./vrsx
10356 FIFO  10                       1 ./vrsx
10357 FIFO  9                        1 ./vrsx
10358 FIFO  11                       1 ./vrsx
10363 FIFO  10                       1 ./vrsx
10364 FIFO  9                        1 ./vrsx
10365 FIFO  11                       1 ./vrsx
10366 FIFO  9                        1 ./vrsx
10367 FIFO  9                        1 ./vrsx
10368 FIFO  9                        1 ./vrsx
10369 FIFO  9                        1 ./vrsx
10370 FIFO  9                        1 ./vrsx
10371 FIFO  16                       1 ./vrsx
10372 FIFO  16                       1 ./vrsx
10373 FIFO  16                       1 ./vrsx
10374 FIFO  16                       1 ./vrsx
10375 FIFO  15                       1 ./vrsx
10376 FIFO  15                       1 ./vrsx
10377 FIFO  15                       1 ./vrsx
10378 FIFO  15                       1 ./vrsx
10379 FIFO  15                       1 ./vrsx
10380 FIFO  15                       1 ./vrsx
10381 FIFO  15                       1 ./vrsx
10382 FIFO  15                       1 ./vrsx
10383 FIFO  15                       1 ./vrsx
10384 FIFO  15                       1 ./vrsx
10385 FIFO  15                       1 ./vrsx
10386 FIFO  15                       1 ./vrsx
10387 FIFO  15                       1 ./vrsx
10388 FIFO  15                       1 ./vrsx
10389 FIFO  15                       1 ./vrsx
10390 FIFO  15                       1 ./vrsx
10391 FIFO  15                       1 ./vrsx
10392 FIFO  9                        1 ./vrsx
10393 FIFO  9                        1 ./vrsx

> What seems to be happening is that the vortex_timer is going off while the
> interrupt is running. Hence the disable_irq fails and schedules.
>
> Perhaps the interrupt thread has been preempted by some high-priority task
> and that causes it to lose a connection.
>
> Yeah, that task output would be helpful to see, if you can get it to work.

OK, I have this but it is 2000+ lines. I probably don't want to put it on
the list. Should I send it to you directly?

> Also, can you show us the output of /proc/interrupts so we know which
> threads are associated with the network card interrupt, and see where they
> are.
>

harley:/home/markh/work/lcrs-linux # cat /proc/interrupts
           CPU0       CPU1
  0:     450333          0   IO-APIC-edge   [........N/ 0]  pit
  1:       4288          0   IO-APIC-edge   [........./ 1]  i8042
  8:          2          0   IO-APIC-edge   [........./ 0]  rtc
  9:          0          0   IO-APIC-level  [........./ 0]  acpi
 12:      66129          0   IO-APIC-edge   [........./ 1]  i8042
 14:       3523          0   IO-APIC-edge   [........./ 0]  ide0
 15:      65675          0   IO-APIC-edge   [........./ 0]  ide1
169:     219209          0   IO-APIC-level  [........./ 0]  ide2, aic7xxx, aic7xxx, eth1, eth2, gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
177:       1821          0   IO-APIC-level  [........./ 0]  Intel 82801BA-ICH2
185:     185550          0   IO-APIC-level  [........./ 0]  eth0
193:          0      76740   IO-APIC-level  [........./ 0]  rtom
NMI:          0          0
LOC:    2657906     587751
ERR:          0
MIS:          0

The aic7xxx controllers are both connected to external legacy SCSI
racks. eth1, eth2, and the aic7xxx cards are in an SBS PCI expansion
chassis. The 3 gpiohsd and the 1 eprm cards are also in the expansion
rack but are not being used at all in this.

I'll send the sysrq data when I get it.

Mark
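The IRQ 169 line above shows ten handlers chained on one interrupt. The task list shows that all ten are served by a single IRQ thread (pid 1362). A small helper can flag such heavily shared lines automatically. This is an illustrative sketch only. count_irq_handlers() is a hypothetical name, and the helper assumes the -rt layout shown here, where the device list follows the bracketed "[........./ N]" status field and shared devices are separated by commas.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count the handlers registered on one line of
 * /proc/interrupts. Assumes the -rt layout above, where the device list
 * follows the "[........./ N]" status field and shared devices are
 * separated by commas. Returns -1 for lines without that field (NMI:,
 * LOC:, ...) and 0 if no device names follow it. */
static int count_irq_handlers(const char *line)
{
    assert(line);

    const char *p = strrchr(line, ']');
    if (p == NULL)
        return -1;

    p++;
    p += strspn(p, " \t");          /* skip padding before the names */
    if (*p == '\0' || *p == '\n')
        return 0;

    int handlers = 1;               /* n comma-separated names => n - 1 commas */
    for (; *p != '\0'; p++)
        if (*p == ',')
            handlers++;
    return handlers;
}
```

Run over the output above, the helper returns 10 for IRQ 169 and 1 for every other device line.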
* Re: rt20 patch question
  2006-05-11 14:57 ` Mark Hounschell
@ 2006-05-12 6:47 ` Steven Rostedt
  2006-05-12 7:33 ` Sébastien Dugué
  2006-05-12 9:08 ` Mark Hounschell
  0 siblings, 2 replies; 55+ messages in thread
From: Steven Rostedt @ 2006-05-12 6:47 UTC
  To: Mark Hounschell; +Cc: Ingo Molnar, linux-kernel, Daniel Walker

On Thu, 11 May 2006, Mark Hounschell wrote:

>
> Here is a detailed list of the RT tasks running, with prios, CPU masks,
> etc. There are 3 NICs. eth1 is the NIC being used by the emulation. eth2
> is currently unused.
>

>   pid SCHED PRIO                  CPUM TASK
>   --- ----- ----                  ---- ----

This being an SMP machine, pids 2 and 3 must be the migration threads.

>     2 FIFO  99                       1 (unknown)
>     3 FIFO  99                       1 (unknown)

>     4 FIFO  1                        1 (unknown)
>     5 FIFO  1                        1 (unknown)
>     6 FIFO  1                        1 (unknown)
>     7 FIFO  1                        1 (unknown)
>     8 FIFO  1                        1 (unknown)
>     9 FIFO  1                        1 (unknown)
>    10 FIFO  1                        1 (unknown)

Do you know what these processes are (12 and 13)?

>    12 FIFO  99                       2 (unknown)
>    13 FIFO  99                       2 (unknown)

[...]

>    39 FIFO  acpi 49 [IRQ 9]          1 (unknown)
>  1129 FIFO  rtc 48 [IRQ 8]           1 (unknown)
>  1135 FIFO  i8042 47 [IRQ 12]        1 (unknown)
>  1145 FIFO  floppy 46 [IRQ 6]        1 (unknown)
>  1178 FIFO  i8042 45 [IRQ 1]         1 (unknown)
>  1268 FIFO  ide0 44 [IRQ 14]         1 (unknown)
>  1313 FIFO  ide1 43 [IRQ 15]         1 (unknown)
>

FYI, the above are all of higher priority than the ones below.

>  1362 FIFO  42 [IRQ 169]             1 (unknown)
>             ide2, aic7xxx, aic7xxx, eth1, eth2,
>             gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm

Wow! That's a lot on a shared IRQ. Do you have ide2 being used? If
one of these were to spin for a while, then all the below would freeze.
Them being preempted would also be a problem. Perhaps you want to
raise the priority of this interrupt thread.

>
>  2663 FIFO  ??? 41 [IRQ 4]           1 (unknown)
>  2667 FIFO  ??? 40 [IRQ 3]           1 (unknown)
>  3420 FIFO  82801BA 39 [IRQ 177]     1 (unknown)
>  5788 FIFO  eth0 38 [IRQ 185]        1 (unknown)
>  8036 FIFO  rtom 37 [IRQ 193]        2 (unknown)
> 10338 FIFO  EMU-CPU 33               2 ./vrsx

[...]
> > What seems to be happening is that the vortex_timer is going off while the
> > interrupt is running. Hence the disable_irq fails and schedules.
> >
> > Perhaps the interrupt thread has been preempted by some high-priority task
> > and that causes it to lose a connection.
> >
> > Yeah, that task output would be helpful to see, if you can get it to work.
>
> OK, I have this but it is 2000+ lines. I probably don't want to put it on
> the list. Should I send it to you directly?

Yes please (compress it as well). With so much shared on an IRQ and you
are disabling it, it might cause some large timeouts. With hardirqs as
threads, disable_irq is a sleep (that's where you hit the BUG), whereas
otherwise it just spins and waits. So it can be a timing issue.

Could you also try running the RT kernel without hardirqs as threads, to
see if it works fine then?

>
> > Also, can you show us the output of /proc/interrupts so we know which
> > threads are associated with the network card interrupt, and see where they
> > are.
> >
>
> harley:/home/markh/work/lcrs-linux # cat /proc/interrupts
>            CPU0       CPU1
>   0:     450333          0   IO-APIC-edge   [........N/ 0]  pit
>   1:       4288          0   IO-APIC-edge   [........./ 1]  i8042
>   8:          2          0   IO-APIC-edge   [........./ 0]  rtc
>   9:          0          0   IO-APIC-level  [........./ 0]  acpi
>  12:      66129          0   IO-APIC-edge   [........./ 1]  i8042
>  14:       3523          0   IO-APIC-edge   [........./ 0]  ide0
>  15:      65675          0   IO-APIC-edge   [........./ 0]  ide1
> 169:     219209          0   IO-APIC-level  [........./ 0]  ide2, aic7xxx, aic7xxx, eth1, eth2, gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
> 177:       1821          0   IO-APIC-level  [........./ 0]  Intel 82801BA-ICH2
> 185:     185550          0   IO-APIC-level  [........./ 0]  eth0
> 193:          0      76740   IO-APIC-level  [........./ 0]  rtom
> NMI:          0          0
> LOC:    2657906     587751
> ERR:          0
> MIS:          0

I see you are pinning all the irqs to CPU0.

>
> The aic7xxx controllers are both connected to external legacy SCSI
> racks. eth1, eth2, and the aic7xxx cards are in an SBS PCI expansion
> chassis. The 3 gpiohsd and the 1 eprm cards are also in the expansion
> rack but are not being used at all in this.

So all but the 3 gpiohsd and eprm are being used? Still, that seems to be
a lot. But anyway, send me the compressed task dump, and I'll take a
look. Maybe it will shed some light.

-- Steve
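Steve's reading of the task list ("the above are all of higher priority than the below") follows from the SCHED_FIFO rule. Under that rule, a runnable task is preempted only by a strictly higher-priority task that can run on the same CPU. The sketch below encodes that rule for a few rows of Mark's table. It is a simplified model, not scheduler code. struct rt_task and the helper names are hypothetical. The model reads the CPUM column as an affinity bitmask (1 = CPU0, 2 = CPU1) and ignores SMP load balancing.

```c
#include <assert.h>
#include <stdbool.h>

/* One row of the task list: SCHED_FIFO priority (1..99, higher wins)
 * and the CPUM column read as an affinity bitmask (1 = CPU0, 2 = CPU1). */
struct rt_task {
    int pid;
    int prio;
    unsigned long cpumask;
};

/* True if 'hi' can preempt 'lo': they share a CPU and 'hi' has a strictly
 * higher FIFO priority. Equal-priority FIFO tasks do not preempt each
 * other; the running one keeps the CPU until it blocks or yields. */
static bool can_preempt(const struct rt_task *hi, const struct rt_task *lo)
{
    return (hi->cpumask & lo->cpumask) != 0 && hi->prio > lo->prio;
}

/* Count the tasks in 'tab' that can preempt 'victim'. */
static int preemptors_of(const struct rt_task *tab, int n,
                         const struct rt_task *victim)
{
    assert(n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++)
        if (tab[i].pid != victim->pid && can_preempt(&tab[i], victim))
            count++;
    return count;
}
```

With the sample rows in the test, the IRQ 169 thread (prio 42, CPU0) can be preempted by pid 2 and by the acpi and ide1 IRQ threads. It cannot be preempted by the rtom thread, which is confined to CPU1. It in turn can preempt every ./vrsx thread on CPU0.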
* Re: rt20 patch question
  2006-05-12 6:47 ` Steven Rostedt
@ 2006-05-12 7:33 ` Sébastien Dugué
  2006-05-12 8:18 ` Mark Hounschell
  2006-05-12 9:08 ` Mark Hounschell
  1 sibling, 1 reply; 55+ messages in thread
From: Sébastien Dugué @ 2006-05-12 7:33 UTC
  To: Steven Rostedt; +Cc: Mark Hounschell, Ingo Molnar, linux-kernel, Daniel Walker

On Fri, 2006-05-12 at 02:47 -0400, Steven Rostedt wrote:
> On Thu, 11 May 2006, Mark Hounschell wrote:
>
> >
> > Here is a detailed list of the RT tasks running, with prios, CPU masks,
> > etc. There are 3 NICs. eth1 is the NIC being used by the emulation. eth2
> > is currently unused.
> >
>
> >   pid SCHED PRIO                  CPUM TASK
> >   --- ----- ----                  ---- ----
>
> This being an SMP machine, pids 2 and 3 must be the migration threads.
>
> >     2 FIFO  99                       1 (unknown)
> >     3 FIFO  99                       1 (unknown)
>
> >     4 FIFO  1                        1 (unknown)
> >     5 FIFO  1                        1 (unknown)
> >     6 FIFO  1                        1 (unknown)
> >     7 FIFO  1                        1 (unknown)
> >     8 FIFO  1                        1 (unknown)
> >     9 FIFO  1                        1 (unknown)
> >    10 FIFO  1                        1 (unknown)
>
> Do you know what these processes are (12 and 13)?

  On my machine, the only other processes running at prio 99 are
the posix_cpu_timer tasks.

>
> >    12 FIFO  99                       2 (unknown)
> >    13 FIFO  99                       2 (unknown)
>
> [...]
>

  Hope this helps.

  Sébastien.
* Re: rt20 patch question
  2006-05-12 7:33 ` Sébastien Dugué
@ 2006-05-12 8:18 ` Mark Hounschell
  0 siblings, 0 replies; 55+ messages in thread
From: Mark Hounschell @ 2006-05-12 8:18 UTC
  To: Sébastien Dugué
  Cc: Steven Rostedt, Mark Hounschell, Ingo Molnar, linux-kernel, Daniel Walker

Sébastien Dugué wrote:
> On Fri, 2006-05-12 at 02:47 -0400, Steven Rostedt wrote:
>> On Thu, 11 May 2006, Mark Hounschell wrote:
>>
>>> Here is a detailed list of the RT tasks running, with prios, CPU masks,
>>> etc. There are 3 NICs. eth1 is the NIC being used by the emulation. eth2
>>> is currently unused.
>>>   pid SCHED PRIO                  CPUM TASK
>>>   --- ----- ----                  ---- ----
>> This being an SMP machine, pids 2 and 3 must be the migration threads.
>>
>>>     2 FIFO  99                       1 (unknown)
>>>     3 FIFO  99                       1 (unknown)
>>>     4 FIFO  1                        1 (unknown)
>>>     5 FIFO  1                        1 (unknown)
>>>     6 FIFO  1                        1 (unknown)
>>>     7 FIFO  1                        1 (unknown)
>>>     8 FIFO  1                        1 (unknown)
>>>     9 FIFO  1                        1 (unknown)
>>>    10 FIFO  1                        1 (unknown)
>> Do you know what these processes are (12 and 13)?
>
>   On my machine, the only other processes running at prio 99 are
> the posix_cpu_timer tasks.
>
>>>    12 FIFO  99                       2 (unknown)
>>>    13 FIFO  99                       2 (unknown)
>> [...]
>>

Yes, you are correct.

    2 ?        S      0:00 [migration/0]
    3 ?        S      0:00 [posix_cpu_timer]
    .
    .
   14 ?        S      0:00 [migration/1]
   15 ?        S      0:00 [posix_cpu_timer]

Mark
* Re: rt20 patch question
  2006-05-12 6:47 ` Steven Rostedt
  2006-05-12 7:33 ` Sébastien Dugué
@ 2006-05-12 9:08 ` Mark Hounschell
  2006-05-12 9:20 ` Steven Rostedt
  1 sibling, 1 reply; 55+ messages in thread
From: Mark Hounschell @ 2006-05-12 9:08 UTC
  To: Steven Rostedt; +Cc: Mark Hounschell, Ingo Molnar, linux-kernel, Daniel Walker

Steven Rostedt wrote:
> On Thu, 11 May 2006, Mark Hounschell wrote:
>
>> Here is a detailed list of the RT tasks running, with prios, CPU masks,
>> etc. There are 3 NICs. eth1 is the NIC being used by the emulation. eth2
>> is currently unused.
>
>>   pid SCHED PRIO                  CPUM TASK
>>   --- ----- ----                  ---- ----
>
> This being an SMP machine, pids 2 and 3 must be the migration threads.
>
>>     2 FIFO  99                       1 (unknown)
>>     3 FIFO  99                       1 (unknown)
>
>>     4 FIFO  1                        1 (unknown)
>>     5 FIFO  1                        1 (unknown)
>>     6 FIFO  1                        1 (unknown)
>>     7 FIFO  1                        1 (unknown)
>>     8 FIFO  1                        1 (unknown)
>>     9 FIFO  1                        1 (unknown)
>>    10 FIFO  1                        1 (unknown)
>
> Do you know what these processes are (12 and 13)?
>

    2 ?        S      0:00 [migration/0]
    3 ?        S      0:00 [posix_cpu_timer]
    .
    .
   14 ?        S      0:00 [migration/1]
   15 ?        S      0:00 [posix_cpu_timer]

>>    12 FIFO  99                       2 (unknown)
>>    13 FIFO  99                       2 (unknown)
>
> [...]
>
>>    39 FIFO  acpi 49 [IRQ 9]          1 (unknown)
>>  1129 FIFO  rtc 48 [IRQ 8]           1 (unknown)
>>  1135 FIFO  i8042 47 [IRQ 12]        1 (unknown)
>>  1145 FIFO  floppy 46 [IRQ 6]        1 (unknown)
>>  1178 FIFO  i8042 45 [IRQ 1]         1 (unknown)
>>  1268 FIFO  ide0 44 [IRQ 14]         1 (unknown)
>>  1313 FIFO  ide1 43 [IRQ 15]         1 (unknown)
>>
>
> FYI, the above are all of higher priority than the ones below.
>
>>  1362 FIFO  42 [IRQ 169]             1 (unknown)
>>             ide2, aic7xxx, aic7xxx, eth1, eth2,
>>             gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
>
> Wow! That's a lot on a shared IRQ. Do you have ide2 being used? If
> one of these were to spin for a while, then all the below would freeze.
> Them being preempted would also be a problem. Perhaps you want to
> raise the priority of this interrupt thread.
>

I'll try that when I get to work this morning.

>>  2663 FIFO  ??? 41 [IRQ 4]           1 (unknown)
>>  2667 FIFO  ??? 40 [IRQ 3]           1 (unknown)
>>  3420 FIFO  82801BA 39 [IRQ 177]     1 (unknown)
>>  5788 FIFO  eth0 38 [IRQ 185]        1 (unknown)
>>  8036 FIFO  rtom 37 [IRQ 193]        2 (unknown)
>> 10338 FIFO  EMU-CPU 33               2 ./vrsx
>
> [...]
>
>>> What seems to be happening is that the vortex_timer is going off while the
>>> interrupt is running. Hence the disable_irq fails and schedules.
>>>
>>> Perhaps the interrupt thread has been preempted by some high-priority task
>>> and that causes it to lose a connection.
>>>
>>> Yeah, that task output would be helpful to see, if you can get it to work.
>> OK, I have this but it is 2000+ lines. I probably don't want to put it on
>> the list. Should I send it to you directly?
>

Done. Keep in mind it was taken only after one of those BUGs that seemed
to cause a network connection loss into the emulation. It was not taken
after one of those "stops" in 'complete preempt' mode. Did the logdev
output show anything of interest concerning the "stops"?

> Yes please (compress it as well). With so much shared on an IRQ and you
> are disabling it, it might cause some large timeouts. With hardirqs as
> threads, disable_irq is a sleep (that's where you hit the BUG), whereas
> otherwise it just spins and waits. So it can be a timing issue.
>
> Could you also try running the RT kernel without hardirqs as threads, to
> see if it works fine then?
>

I assume you mean in preemptible kernel mode. Will do ASAP. I have 4
machines I'm attempting to use the rt20 kernel on. All 4 have the
"stop/pause" problem in complete preempt mode. This is the only one of
the 4 that I have seen the BUG messages on, so I suspect you are correct
that it may be a result of hardware/configuration. I guess if raising
that IRQ prio or eliminating the IRQ threads fixes it, then.....

>>> Also, can you show us the output of /proc/interrupts so we know which
>>> threads are associated with the network card interrupt, and see where they
>>> are.
>>>
>> harley:/home/markh/work/lcrs-linux # cat /proc/interrupts
>>            CPU0       CPU1
>>   0:     450333          0   IO-APIC-edge   [........N/ 0]  pit
>>   1:       4288          0   IO-APIC-edge   [........./ 1]  i8042
>>   8:          2          0   IO-APIC-edge   [........./ 0]  rtc
>>   9:          0          0   IO-APIC-level  [........./ 0]  acpi
>>  12:      66129          0   IO-APIC-edge   [........./ 1]  i8042
>>  14:       3523          0   IO-APIC-edge   [........./ 0]  ide0
>>  15:      65675          0   IO-APIC-edge   [........./ 0]  ide1
>> 169:     219209          0   IO-APIC-level  [........./ 0]  ide2, aic7xxx, aic7xxx, eth1, eth2, gpiohsd, gpiohsd, gpiohsd, gpiohsd, eprm
>> 177:       1821          0   IO-APIC-level  [........./ 0]  Intel 82801BA-ICH2
>> 185:     185550          0   IO-APIC-level  [........./ 0]  eth0
>> 193:          0      76740   IO-APIC-level  [........./ 0]  rtom
>> NMI:          0          0
>> LOC:    2657906     587751
>> ERR:          0
>> MIS:          0
>
> I see you are pinning all the irqs to CPU0.
>

All except the rtom IRQ. It is on the same processor as the emulation's
CPU thread. Even with the rt20 patch, this is still the only way to ensure
deterministic delivery of signals and such from the rtom driver to the
emulation's CPU thread. What I find with the rt20 patch (so far) is that
there now seems to be an acceptable (sort of) "max" latency (-200usec)
that allows me to use the machine for things other than just the
emulation.

>> The aic7xxx controllers are both connected to external legacy SCSI
>> racks. eth1, eth2, and the aic7xxx cards are in an SBS PCI expansion
>> chassis. The 3 gpiohsd and the 1 eprm cards are also in the expansion
>> rack but are not being used at all in this.
>
> So all but the 3 gpiohsd and eprm are being used? Still, that seems to be
> a lot. But anyway, send me the compressed task dump, and I'll take a
> look. Maybe it will shed some light.

Actually eth2 is not being used either. Only one of the SCSI controllers
is being used, and only when I boot the emulation from one of the legacy
SCSI drives. One SCSI card has a bus of tapes, the other disks. When I
boot the emulation from a virtual disk file it's not used at all. IDE2 is
my 'linux' boot drive, however.

So do you think this BUG reported in 'preempt kernel' mode is related to
the "stops" I am having in 'complete preempt' mode?

Mark
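Raising the priority of the IRQ 169 thread, as Steve suggested, comes down to one sched_setscheduler() call on that thread's pid (1362 in the task list). The sketch below illustrates this and was not posted in the thread. set_fifo_prio() and fifo_prio_valid() are hypothetical helper names, and the call needs root (CAP_SYS_NICE). From a shell, the util-linux chrt tool does the same job, e.g. chrt -f -p 50 1362.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/types.h>

/* True if 'prio' is a legal SCHED_FIFO priority here (1..99 on Linux). */
static bool fifo_prio_valid(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    return lo != -1 && hi != -1 && prio >= lo && prio <= hi;
}

/* Hypothetical helper: switch thread 'pid' (e.g. an IRQ thread) to
 * SCHED_FIFO at 'prio'. Requires root / CAP_SYS_NICE.
 * Returns 0 on success or a negative errno value. */
static int set_fifo_prio(pid_t pid, int prio)
{
    if (!fifo_prio_valid(prio))
        return -EINVAL;

    struct sched_param sp = { .sched_priority = prio };
    if (sched_setscheduler(pid, SCHED_FIFO, &sp) != 0)
        return -errno;
    return 0;
}
```

For example, set_fifo_prio(1362, 50) would place the IRQ 169 thread above every other IRQ thread in Mark's list (the highest, acpi, is at 49). It would still stay below the prio-99 migration and posix_cpu_timer threads.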
* Re: rt20 patch question
  2006-05-12 9:08 ` Mark Hounschell
@ 2006-05-12 9:20 ` Steven Rostedt
  0 siblings, 0 replies; 55+ messages in thread
From: Steven Rostedt @ 2006-05-12 9:20 UTC
  To: Mark Hounschell; +Cc: Mark Hounschell, Ingo Molnar, linux-kernel, Daniel Walker

On Fri, 12 May 2006, Mark Hounschell wrote:

> Steven Rostedt wrote:
>
> Done. Keep in mind it was taken only after one of those BUGs that seemed
> to cause a network connection loss into the emulation. It was not taken
> after one of those "stops" in 'complete preempt' mode. Did the logdev
> output show anything of interest concerning the "stops"?

Damn, your logdev email got lost in the noise. I'm glad you mentioned it,
otherwise I would never have known you sent it. I'll look at it now.

>
> So do you think this BUG reported in 'preempt kernel' mode is related to
> the "stops" I am having in 'complete preempt' mode?
>

Yes. That BUG thread that I included you on affects you if hardirqs are
threaded in any preempt mode. So yes, it is a bug in 'complete preempt'
mode too. So that driver really does need to be fixed for you.
Unfortunately, I don't have the time now to fix that. Perhaps someone
else can?

-- Steve
* Re: rt20 patch question
  2006-05-10 18:30 ` Mark Hounschell
  2006-05-10 18:49 ` Steven Rostedt
@ 2006-05-10 20:33 ` Steven Rostedt
  2006-05-12 8:16 ` Ingo Molnar
  1 sibling, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2006-05-10 20:33 UTC
  To: Mark Hounschell; +Cc: linux-kernel, Daniel Walker, Ingo Molnar, Thomas Gleixner

On Wed, 10 May 2006, Mark Hounschell wrote:

>
> Configured for "Preemptible Kernel" I got the following, but no "stops"
> came with it.
>
> BUG: scheduling while atomic: softirq-timer/1/0x00000100/15
> caller is schedule+0x33/0xf0
>  [<b0309acc>] __schedule+0x517/0x95b (8)
>  [<f09d7627>] mdio_ctrl+0xaa/0x135 [e100] (48)
>  [<f09d7627>] mdio_ctrl+0xaa/0x135 [e100] (12)
>  [<b030a06c>] schedule+0x33/0xf0 (36)
>  [<b012eee5>] prepare_to_wait+0x12/0x4f (8)
>  [<b0142318>] synchronize_irq+0x96/0xba (20)
>  [<b012eda0>] autoremove_wake_function+0x0/0x37 (12)
>  [<f0a13677>] vortex_timer+0xa0/0x563 [3c59x] (24)
>  [<b0125b76>] __mod_timer+0x8c/0xc3 (12)
>  [<f09d8998>] e100_watchdog+0x0/0x39c [e100] (24)
>  [<b030a4cf>] cond_resched_softirq+0x64/0xaa (8)
>  [<b02a2dcd>] dev_watchdog+0x77/0xac (4)
>  [<f0a135d7>] vortex_timer+0x0/0x563 [3c59x] (12)
>  [<b0125902>] run_timer_softirq+0x1bf/0x3a7 (8)
>  [<b0121960>] ksoftirqd+0x112/0x1cc (52)
>  [<b012184e>] ksoftirqd+0x0/0x1cc (52)
>  [<b012eb9c>] kthread+0xc2/0xc6 (4)
>  [<b012eada>] kthread+0x0/0xc6 (12)
>  [<b0100e35>] kernel_thread_helper+0x5/0xb (16)
>

Ingo,

I traced this down. It is caused by the disable_irq in vortex_timer that
is called via run_timer_softirq.

disable_irq can call synchronize_irq, which can schedule.

And thus you get this bug, since we are in a softirq.

This is the case where we are not in PREEMPT_RT, but I'm guessing that
Mark has interrupts as threads, which would allow synchronize_irq to
schedule.

So I guess we have a case where we schedule while atomic and BUG, when
it's really not bad. Should we add something like this:

Index: linux-2.6.16-rt20/kernel/sched.c
===================================================================
--- linux-2.6.16-rt20.orig/kernel/sched.c	2006-05-10 16:23:15.000000000 -0400
+++ linux-2.6.16-rt20/kernel/sched.c	2006-05-10 16:28:31.000000000 -0400
@@ -3316,7 +3316,8 @@ void __sched __schedule(void)
 	/*
 	 * Test if we are atomic.
 	 */
-	if (unlikely(in_atomic())) {
+	if (unlikely(in_atomic()) &&
+	    (!hardirq_preemption || preempt_count() & PREEMPT_MASK)) {
 		stop_trace();
 		printk(KERN_ERR "BUG: scheduling while atomic: "
 			"%s/0x%08x/%d\n",

-- Steve
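The effect of Steve's proposed check can be worked through with the preempt count from Mark's BUG line, 0x00000100. The sketch below is a userspace model, not kernel code. It assumes the mainline 2.6.16 layout, in which bits 0-7 count preempt_disable() nesting and bits 8-15 count softirq nesting. Its in_atomic() stand-in ignores PREEMPT_ACTIVE, hardirq nesting and the big kernel lock. Under those assumptions, 0x100 decodes to one level of softirq nesting and no explicit preempt_disable(). That is the case the patch would stop reporting when hardirq_preemption is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count() layout (mainline 2.6.16): bits 0-7 count
 * preempt_disable() nesting, bits 8-15 count softirq nesting. */
#define MODEL_PREEMPT_MASK 0x000000ffU
#define MODEL_SOFTIRQ_MASK 0x0000ff00U

static unsigned int preempt_depth(unsigned int count)
{
    return count & MODEL_PREEMPT_MASK;
}

static unsigned int softirq_depth(unsigned int count)
{
    return (count & MODEL_SOFTIRQ_MASK) >> 8;
}

/* Simplified in_atomic(): any preempt or softirq nesting at all. The real
 * macro also accounts for PREEMPT_ACTIVE, hardirq bits and the BKL. */
static bool model_in_atomic(unsigned int count)
{
    return (count & (MODEL_PREEMPT_MASK | MODEL_SOFTIRQ_MASK)) != 0;
}

/* Existing check in __schedule(): report any schedule while atomic. */
static bool bug_today(unsigned int count)
{
    return model_in_atomic(count);
}

/* Steve's proposed check: with hardirq_preemption set, tolerate a
 * schedule from softirq context, but still report one made with
 * preemption explicitly disabled. */
static bool bug_proposed(unsigned int count, bool hardirq_preemption)
{
    return model_in_atomic(count) &&
           (!hardirq_preemption || preempt_depth(count) != 0);
}
```

The model only shows which cases the patch would let through. As Ingo points out in his reply, silencing the report does not make scheduling with a non-zero preempt count safe.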
* Re: rt20 patch question
  2006-05-10 20:33 ` Steven Rostedt
@ 2006-05-12 8:16 ` Ingo Molnar
  2006-05-12 8:45 ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Ingo Molnar @ 2006-05-12 8:16 UTC
  To: Steven Rostedt
  Cc: Mark Hounschell, linux-kernel, Daniel Walker, Thomas Gleixner

* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
>
> I traced this down. It is caused by the disable_irq in vortex_timer
> that is called via run_timer_softirq.
>
> disable_irq can call synchronize_irq, which can schedule.
>
> And thus you get this bug, since we are in a softirq.

hm. When there are threaded interrupts, we quite naturally have to
synchronize via scheduling in synchronize_irq() - the interrupt we are
waiting on might be scheduled away!

> So I guess we have a case where we schedule while atomic and BUG, when
> it's really not bad. Should we add something like this:

that's not good enough, we must not schedule with the preempt_count()
set.

one solution would be to forbid disable_irq() from softirq contexts, and
to convert the vortex timeout function to a workqueue and use the
*_delayed_work() APIs to drive it - and cross fingers there aren't many
places to fix.

another solution would be to make softirqs preemptible if they are
threaded. I'm a bit uneasy about that though. In that case we'd also
have to make HARDIRQ threading dependent on softirq threading, in the
Kconfig.

	Ingo
* Re: rt20 patch question
  2006-05-12 8:16 ` Ingo Molnar
@ 2006-05-12 8:45 ` Steven Rostedt
  2006-05-12 9:16 ` Ingo Molnar
  2006-05-12 9:21 ` Ingo Molnar
  0 siblings, 2 replies; 55+ messages in thread
From: Steven Rostedt @ 2006-05-12 8:45 UTC
  To: Ingo Molnar; +Cc: Mark Hounschell, linux-kernel, Daniel Walker, Thomas Gleixner

On Fri, 12 May 2006, Ingo Molnar wrote:

>
> > So I guess we have a case where we schedule while atomic and BUG, when
> > it's really not bad. Should we add something like this:
>
> that's not good enough, we must not schedule with the preempt_count()
> set.

It gets even worse: with your new fix, the softirq will schedule with
interrupts disabled, which would definitely BUG.

>
> one solution would be to forbid disable_irq() from softirq contexts, and
> to convert the vortex timeout function to a workqueue and use the
> *_delayed_work() APIs to drive it - and cross fingers there aren't many
> places to fix.

I prefer the above. Maybe even add a WARN_ON(in_softirq()) in
disable_irq.

But I must admit, I wouldn't know how to make that change without
spending more time on it than I have for this.

>
> another solution would be to make softirqs preemptible if they are
> threaded. I'm a bit uneasy about that though. In that case we'd also
> have to make HARDIRQ threading dependent on softirq threading, in the
> Kconfig.

scary.

-- Steve
* Re: rt20 patch question
  2006-05-12 8:45 ` Steven Rostedt
@ 2006-05-12 9:16 ` Ingo Molnar
  2006-05-12 9:21 ` Ingo Molnar
  1 sibling, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2006-05-12 9:16 UTC
  To: Steven Rostedt
  Cc: Mark Hounschell, linux-kernel, Daniel Walker, Thomas Gleixner

* Steven Rostedt <rostedt@goodmis.org> wrote:

>
> On Fri, 12 May 2006, Ingo Molnar wrote:
>
> >
> > > So I guess we have a case where we schedule while atomic and BUG, when
> > > it's really not bad. Should we add something like this:
> >
> > that's not good enough, we must not schedule with the preempt_count()
> > set.
>
> It gets even worse: with your new fix, the softirq will schedule with
> interrupts disabled, which would definitely BUG.

i don't think so. Calling __do_softirq() with hardirqs disabled is not a
problem, it does an explicit local_irq_enable().

	Ingo
* Re: rt20 patch question
  2006-05-12 8:45 ` Steven Rostedt
  2006-05-12 9:16 ` Ingo Molnar
@ 2006-05-12 9:21 ` Ingo Molnar
  2006-05-12 12:38 ` Mark Hounschell
  2006-05-12 13:16 ` 3c59x vortex_timer rt hack (was: rt20 patch question) Steven Rostedt
  1 sibling, 2 replies; 55+ messages in thread
From: Ingo Molnar @ 2006-05-12 9:21 UTC
  To: Steven Rostedt
  Cc: Mark Hounschell, linux-kernel, Daniel Walker, Thomas Gleixner

* Steven Rostedt <rostedt@goodmis.org> wrote:

> > one solution would be to forbid disable_irq() from softirq contexts, and
> > to convert the vortex timeout function to a workqueue and use the
> > *_delayed_work() APIs to drive it - and cross fingers there aren't many
> > places to fix.
>
> I prefer the above. Maybe even add a WARN_ON(in_softirq()) in
> disable_irq.
>
> But I must admit, I wouldn't know how to make that change without
> spending more time on it than I have for this.

the simplest fix for now would be to use the _nosync variant in the
vortex timeout function.

Mark, does this fix the problem?

	Ingo

Index: linux-rt.q/drivers/net/3c59x.c
===================================================================
--- linux-rt.q.orig/drivers/net/3c59x.c
+++ linux-rt.q/drivers/net/3c59x.c
@@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
 
 	if (vp->medialock)
 		goto leave_media_alone;
-	disable_irq(dev->irq);
+	/* hack! */
+	disable_irq_nosync(dev->irq);
 	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
 	EL3WINDOW(4);
 	media_status = ioread16(ioaddr + Wn4_Media);
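The patch works because of the difference between the two calls. disable_irq() masks the line and then waits in synchronize_irq() for any handler that is already running. As Steve explained earlier, with hardirqs as threads that wait is a sleep, and otherwise it is a spin. disable_irq_nosync() only masks the line and returns. The toy model below captures just that distinction. struct irq_line and the model_* functions are hypothetical and are not the kernel's implementation. The model also suggests why Ingo labels the change a hack: after the nosync call, a handler that was already running may still be touching the card while vortex_timer() goes on to read its registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one IRQ line; not the kernel's data structure. */
struct irq_line {
    int depth;            /* nesting count of disable_irq*() calls      */
    bool handler_running; /* a handler is executing right now           */
    bool threaded;        /* hard IRQ handlers run as kernel threads    */
};

enum sync_result { SYNC_NONE, SYNC_SPIN, SYNC_SLEEP };

/* disable_irq_nosync(): mask the line and return at once, without
 * waiting for a handler that is already running. */
static void model_disable_irq_nosync(struct irq_line *l)
{
    l->depth++;
}

/* disable_irq(): mask the line, then wait for a running handler
 * (synchronize_irq()). Reports how that wait would happen: a busy-wait
 * when handlers run in hard interrupt context, a sleep when they run in
 * a thread. Sleeping from softirq context is the BUG seen in this thread. */
static enum sync_result model_disable_irq(struct irq_line *l)
{
    l->depth++;
    if (!l->handler_running)
        return SYNC_NONE;
    return l->threaded ? SYNC_SLEEP : SYNC_SPIN;
}
```

In the threaded configuration Mark is running, model_disable_irq() on a busy line returns SYNC_SLEEP. That is the path that hit "scheduling while atomic" from softirq-timer. The nosync variant never waits at all.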
* Re: rt20 patch question
  2006-05-12 9:21 ` Ingo Molnar
@ 2006-05-12 12:38 ` Mark Hounschell
  2006-05-12 13:18 ` Steven Rostedt
  2006-05-12 13:16 ` 3c59x vortex_timer rt hack (was: rt20 patch question) Steven Rostedt
  1 sibling, 1 reply; 55+ messages in thread
From: Mark Hounschell @ 2006-05-12 12:38 UTC
  To: Ingo Molnar
  Cc: Steven Rostedt, linux-kernel, Daniel Walker, Thomas Gleixner, johnstul

Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
>>> one solution would be to forbid disable_irq() from softirq contexts, and
>>> to convert the vortex timeout function to a workqueue and use the
>>> *_delayed_work() APIs to drive it - and cross fingers there aren't many
>>> places to fix.
>> I prefer the above. Maybe even add a WARN_ON(in_softirq()) in
>> disable_irq.
>>
>> But I must admit, I wouldn't know how to make that change without
>> spending more time on it than I have for this.
>
> the simplest fix for now would be to use the _nosync variant in the
> vortex timeout function.
>
> Mark, does this fix the problem?
>
> 	Ingo
>
> Index: linux-rt.q/drivers/net/3c59x.c
> ===================================================================
> --- linux-rt.q.orig/drivers/net/3c59x.c
> +++ linux-rt.q/drivers/net/3c59x.c
> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>
>  	if (vp->medialock)
>  		goto leave_media_alone;
> -	disable_irq(dev->irq);
> +	/* hack! */
> +	disable_irq_nosync(dev->irq);
>  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
>  	EL3WINDOW(4);
>  	media_status = ioread16(ioaddr + Wn4_Media);
>

It looks like it does fix at least the BUG and network disconnection
problem I am/was seeing. It's been 45 minutes or so without a glitch.

I'm still not running this in complete preempt mode. Should I see if it
helps that situation also? It only took a few minutes for that one to
show up.

Mark
* Re: rt20 patch question
  2006-05-12 12:38 ` Mark Hounschell
@ 2006-05-12 13:18 ` Steven Rostedt
  2006-05-12 13:38 ` Mark Hounschell
  2006-05-12 13:43 ` Mark Hounschell
  0 siblings, 2 replies; 55+ messages in thread
From: Steven Rostedt @ 2006-05-12 13:18 UTC
  To: Mark Hounschell
  Cc: Ingo Molnar, linux-kernel, Daniel Walker, Thomas Gleixner, johnstul

On Fri, 12 May 2006, Mark Hounschell wrote:

> Ingo Molnar wrote:
> >
> > Mark, does this fix the problem?
> >
> > 	Ingo
> >

[...]

>
> It looks like it does fix at least the BUG and network disconnection
> problem I am/was seeing. It's been 45 minutes or so without a glitch.
>
> I'm still not running this in complete preempt mode. Should I see if it
> helps that situation also? It only took a few minutes for that one to
> show up.
>

I was looking at the logdump, but I don't see anything spinning. CPU 1
seems to be constantly running your v67 program (alternating with
posix_cpu_timer), and CPU 0 is still switching with the swapper, along
with other tasks, which means nothing is just spinning and hogging
the CPU (on CPU 0; I assume the v67 task is supposed to keep running).

But this could mean that something is blocked on a lock, or missed a
wakeup somewhere, and we block X from responding. Although X does show
up, some signal to do an event may be prevented.

I wonder if the fact that softirqs are running with preemption enabled is
the problem here.

Could you try the patch that Ingo sent here:

  http://marc.theaimsgroup.com/?l=linux-kernel&m=114741312301909&q=raw

-- Steve
* Re: rt20 patch question
  2006-05-12 13:18 ` Steven Rostedt
@ 2006-05-12 13:38 ` Mark Hounschell
  2006-05-12 13:43 ` Mark Hounschell
  1 sibling, 0 replies; 55+ messages in thread
From: Mark Hounschell @ 2006-05-12 13:38 UTC
  To: Steven Rostedt
  Cc: Ingo Molnar, linux-kernel, Daniel Walker, Thomas Gleixner, johnstul

Steven Rostedt wrote:
> On Fri, 12 May 2006, Mark Hounschell wrote:
>
>> Ingo Molnar wrote:
>>> Mark, does this fix the problem?
>>>
>>> 	Ingo
>>>
> [...]
>> It looks like it does fix at least the BUG and network disconnection
>> problem I am/was seeing. It's been 45 minutes or so without a glitch.
>>
>> I'm still not running this in complete preempt mode. Should I see if it
>> helps that situation also? It only took a few minutes for that one to
>> show up.
>>
>
>
> I was looking at the logdump, but I don't see anything spinning. CPU 1
> seems to be constantly running your v67 program (alternating with
> posix_cpu_timer), and CPU 0 is still switching with the swapper, along
> with other tasks, which means nothing is just spinning and hogging
> the CPU (on CPU 0; I assume the v67 task is supposed to keep running).
>
> But this could mean that something is blocked on a lock, or missed a
> wakeup somewhere, and we block X from responding. Although X does show
> up, some signal to do an event may be prevented.
>
> I wonder if the fact that softirqs are running with preemption enabled is
> the problem here.
>
> Could you try the patch that Ingo sent here:
>
>   http://marc.theaimsgroup.com/?l=linux-kernel&m=114741312301909&q=raw
>
> -- Steve
>
>

If anything this made it worse. I actually got the freezes while just
booting up the emulation. Once up, the same thing though.

> Mark,
>
> as Ingo commented, this is a Hack! not a solution.

Understood.

Mark
* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-12 13:43 UTC
To: Steven Rostedt
Cc: Ingo Molnar, linux-kernel, Daniel Walker, Thomas Gleixner, johnstul

Steven Rostedt wrote:
>
> I was looking at the logdump, but I don't see anything spinning. CPU 1
> seems to be constantly running your v67 program (alternating with
> posix_cpu_timer), and CPU 0 is still switching with the swapper, along
> with other tasks, so this means nothing is just spinning and hogging
> the CPU (on CPU 0, at least; I assume the v67 task is supposed to keep
> running).
>

Yes, the v67 task is the CPU process. Could it also mean I just didn't
get the logdump at the right time?

Mark
* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-12 14:05 UTC
To: Mark Hounschell
Cc: Ingo Molnar, linux-kernel, Daniel Walker, Thomas Gleixner, johnstul

On Fri, 12 May 2006, Mark Hounschell wrote:

> Steven Rostedt wrote:
> >
> > I was looking at the logdump, but I don't see anything spinning. CPU 1
> > seems to be constantly running your v67 program (alternating with
> > posix_cpu_timer), and CPU 0 is still switching with the swapper, along
> > with other tasks, so this means nothing is just spinning and hogging
> > the CPU (on CPU 0, at least; I assume the v67 task is supposed to keep
> > running).
> >
>
> Yes, the v67 task is the CPU process. Could it also mean I just didn't
> get the logdump at the right time?
>

[ 619.220396] CPU:0 (bash:7783) -->> (konsole:7763)
[ 619.220558] CPU:0 (konsole:7763) -->> (swapper:0)
[ 619.220706] CPU:1 (v67:11149) -->> (IRQ 161:11082)
[ 619.220717] CPU:1 (IRQ 161:11082) -->> (v67:11149)
[ 619.223111] CPU:0 (swapper:0) -->> (posix_cpu_timer:3)
[ 619.223116] CPU:0 (posix_cpu_timer:3) -->> (softirq-timer/0:5)
[ 619.223127] CPU:0 (softirq-timer/0:5) -->> (swapper:0)
[ 619.223570] CPU:1 (v67:11149) -->> (posix_cpu_timer:14)
[ 619.223573] CPU:1 (posix_cpu_timer:14) -->> (v67:11149)
[ 619.227097] CPU:0 (swapper:0) -->> (posix_cpu_timer:3)
[ 619.227099] CPU:0 (posix_cpu_timer:3) -->> (softirq-timer/0:5)
[ 619.227102] CPU:0 (softirq-timer/0:5) -->> (swapper:0)
[ 619.227566] CPU:1 (v67:11149) -->> (posix_cpu_timer:14)
[ 619.227568] CPU:1 (posix_cpu_timer:14) -->> (v67:11149)

...

[ 633.861475] CPU:1 (v67:11149) -->> (posix_cpu_timer:14)
[ 633.861477] CPU:1 (posix_cpu_timer:14) -->> (v67:11149)
[ 633.865001] CPU:0 (swapper:0) -->> (posix_cpu_timer:3)
[ 633.865003] CPU:0 (posix_cpu_timer:3) -->> (softirq-timer/0:5)
[ 633.865006] CPU:0 (softirq-timer/0:5) -->> (swapper:0)
[ 633.865470] CPU:1 (v67:11149) -->> (posix_cpu_timer:14)
[ 633.865473] CPU:1 (posix_cpu_timer:14) -->> (v67:11149)
[ 633.866421] CPU:1 (v67:11149) -->> (IRQ 161:11082)
[ 633.866430] CPU:1 (IRQ 161:11082) -->> (v67:11149)
[ 633.868998] CPU:0 (swapper:0) -->> (posix_cpu_timer:3)
[ 633.869000] CPU:0 (posix_cpu_timer:3) -->> (softirq-timer/0:5)
[ 633.869002] CPU:0 (softirq-timer/0:5) -->> (swapper:0)
[ 633.869467] CPU:1 (v67:11149) -->> (posix_cpu_timer:14)
[ 633.869470] CPU:1 (posix_cpu_timer:14) -->> (v67:11149)
[ 633.872993] CPU:0 (swapper:0) -->> (posix_cpu_timer:3)
[ 633.872995] CPU:0 (posix_cpu_timer:3) -->> (softirq-timer/0:5)
[ 633.872998] CPU:0 (softirq-timer/0:5) -->> (swapper:0)
[ 633.873463] CPU:1 (v67:11149) -->> (posix_cpu_timer:14)
[ 633.873465] CPU:1 (posix_cpu_timer:14) -->> (v67:11149)
[ 633.874747] CPU:1 (v67:11149) -->> (IRQ 161:11082)
[ 633.874756] CPU:1 (IRQ 161:11082) -->> (v67:11149)
[ 633.876990] CPU:0 (swapper:0) -->> (posix_cpu_timer:3)
[ 633.876992] CPU:0 (posix_cpu_timer:3) -->> (softirq-timer/0:5)
[ 633.876996] CPU:0 (softirq-timer/0:5) -->> (kded:6119)
[ 633.877030] CPU:0 (kded:6119) -->> (swapper:0)
[ 633.877460] CPU:1 (v67:11149) -->> (posix_cpu_timer:14)
[ 633.877462] CPU:1 (posix_cpu_timer:14) -->> (v67:11149)
[ 633.878447] CPU:0 (swapper:0) -->> (IRQ 1:823)
[ 633.878474] CPU:0 (IRQ 1:823) -->> (softirq-tasklet:9)
[ 633.878478] CPU:0 (softirq-tasklet:9) -->> (events/0:24)
[ 633.878488] CPU:0 (events/0:24) -->> (X:5513)
[ 633.878627] CPU:0 (X:5513) -->> (konsole:7763)
[ 633.878669] CPU:0 (konsole:7763) -->> (X:5513)
[ 633.878683] CPU:0 (X:5513) -->> (konsole:7763)
[ 633.879309] CPU:0 (konsole:7763) -->> (X:5513)
[ 633.879415] CPU:0 (X:5513) -->> (konsole:7763)
[ 633.879457] CPU:0 (konsole:7763) -->> (X:5513)
[ 633.879463] CPU:0 (X:5513) -->> (konsole:7763)
[ 633.879467] CPU:0 (konsole:7763) -->> (X:5513)
[ 633.879553] CPU:0 (X:5513) -->> (kded:6119)
[ 633.879651] CPU:0 (kded:6119) -->> (kwin:6135)
[ 633.879711] CPU:0 (kwin:6135) -->> (kdesktop:6140)
[ 633.879782] CPU:0 (kdesktop:6140) -->> (kicker:6142)
[ 633.879858] CPU:0 (kicker:6142) -->> (X:5513)
[ 633.879927] CPU:0 (X:5513) -->> (kwin:6135)
[ 633.879963] CPU:0 (kwin:6135) -->> (X:5513)
[ 633.879977] CPU:0 (X:5513) -->> (bash:7783)
[ 633.880042] CPU:0 (bash:7783) -->> (konsole:7763)
[ 633.880103] CPU:0 (konsole:7763) -->> (X:5513)
[ 633.880119] CPU:0 (X:5513) -->> (konsole:7763)
[ 633.880211] CPU:0 (konsole:7763) -->> (bash:7783)


Well, the bash is what turned off the logging, and the logging started at
619.xxx and ended at 633.xxx, so that's ~14 seconds of logging. So I would
assume you did it in the right place.

How long does the stop last, and what exactly freezes? Can you ping the
machine? Also, have you tried switching to a console before it freezes,
to see whether that freezes too? I'm curious whether X is waiting on
something.

-- Steve
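Steve's ~14 second figure comes straight from the first and last timestamps of the dump. A small parser makes that check, and per-CPU switch counts, mechanical. This is a sketch that assumes only the line format quoted above ("[ TIME] CPU:N (prev:pid) -->> (next:pid)"); the names trace_stats, parse_switch and trace_accumulate are hypothetical and not part of the -rt tracing code.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CPUS 8

/* Summary of a context-switch trace in the format quoted above. */
struct trace_stats {
    int    lines;                /* trace lines successfully parsed */
    double first_ts, last_ts;    /* first and last timestamps, in seconds */
    long   switches[MAX_CPUS];   /* context switches per CPU */
    long   into_watch[MAX_CPUS]; /* switches *to* the task named `watch`, per CPU */
};

/* Parse "[ 619.220396] CPU:0 (bash:7783) -->> (konsole:7763)".
 * %63[^:] accepts task names containing spaces, such as "IRQ 161". */
static int parse_switch(const char *line, double *ts, int *cpu,
                        char *prev, char *next)
{
    int prev_pid, next_pid;
    return sscanf(line, " [ %lf] CPU:%d (%63[^:]:%d) -->> (%63[^:]:%d)",
                  ts, cpu, prev, &prev_pid, next, &next_pid) == 6;
}

/* Fold one line into the summary; lines that do not parse are ignored. */
static void trace_accumulate(struct trace_stats *st, const char *line,
                             const char *watch)
{
    double ts;
    int cpu;
    char prev[64], next[64];

    if (!parse_switch(line, &ts, &cpu, prev, next) || cpu < 0 || cpu >= MAX_CPUS)
        return;
    if (st->lines == 0)
        st->first_ts = ts;
    st->last_ts = ts;
    st->lines++;
    st->switches[cpu]++;
    if (strcmp(next, watch) == 0)
        st->into_watch[cpu]++;
}
```

Fed the whole dump with watch set to "X", into_watch[0] shows how often X was scheduled on CPU 0, and last_ts - first_ts gives the length of the logging window.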
* Re: rt20 patch question
From: Mark Hounschell @ 2006-05-12 14:36 UTC
To: Steven Rostedt
Cc: Ingo Molnar, linux-kernel, Daniel Walker, Thomas Gleixner, johnstul

Steven Rostedt wrote:
> On Fri, 12 May 2006, Mark Hounschell wrote:
>
>> Steven Rostedt wrote:
>>>
>>> I was looking at the logdump, but I don't see anything spinning. CPU 1
>>> seems to be constantly running your v67 program (alternating with
>>> posix_cpu_timer), and CPU 0 is still switching with the swapper, along
>>> with other tasks, so this means nothing is just spinning and hogging
>>> the CPU (on CPU 0, at least; I assume the v67 task is supposed to keep
>>> running).
>>>
>> Yes, the v67 task is the CPU process. Could it also mean I just didn't
>> get the logdump at the right time?
>>
>
>
> [ 619.220396] CPU:0 (bash:7783) -->> (konsole:7763)
> [...]
> [ 633.880211] CPU:0 (konsole:7763) -->> (bash:7783)
>
>
> Well, the bash is what turned off the logging, and the logging started at
> 619.xxx and ended at 633.xxx, so that's ~14 seconds of logging. So I would
> assume you did it in the right place.
>
> How long does the stop last, and what exactly freezes? Can you ping the
> machine? Also, have you tried switching to a console before it freezes,
> to see whether that freezes too? I'm curious whether X is waiting on
> something.
>

The stops can last anywhere up to a few minutes, depending on how
patient I want to be. I was just playing with it to possibly get another
log. The machine froze. I did the log thing while frozen. Then I attempted
to ssh into it from another machine. It let me in, and the machine
unfroze at that same time, but only to stop again in a few seconds. The
new shell was also frozen. I ssh'd into it again: same thing.

While the machine was unfrozen I was able to halt the CPU process,
basically taking it out of its execution loop and putting it into a
1 ms delay loop via

while(clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &tim, NULL) &&
errno == EINTR);

As long as the CPU process is halted and in this loop the machine acts
normally. As soon as the CPU process goes back into its execution loop we
are back to the "stops".

Mark
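One detail in the delay loop quoted above: POSIX specifies that clock_nanosleep() returns the error number directly and, unlike nanosleep(), does not set errno. Testing errno == EINTR therefore looks at a possibly stale value, so the loop may exit early, or keep spinning on some other error if errno happens to hold EINTR from an earlier call. Because the sleep uses TIMER_ABSTIME, retrying with the same deadline after an interruption is correct. A minimal sketch follows; the 1 ms period mirrors Mark's description, and the helper names (timespec_add_ns, sleep_until, idle_ticks) are hypothetical, not taken from his emulator.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_nanosleep() and TIMER_ABSTIME */
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute deadline by ns nanoseconds (0 <= ns < 1 s). */
static void timespec_add_ns(struct timespec *t, long ns)
{
    assert(ns >= 0 && ns < NSEC_PER_SEC);
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Sleep until *deadline on CLOCK_REALTIME. clock_nanosleep() returns the
 * error number itself and does not set errno, so test the return value.
 * With TIMER_ABSTIME, retrying after EINTR with the same deadline is safe. */
static int sleep_until(const struct timespec *deadline)
{
    int rc;
    do {
        rc = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, deadline, NULL);
    } while (rc == EINTR);
    return rc;   /* 0 on success, otherwise an error number such as EINVAL */
}

/* Idle for `ticks` periods of 1 ms each, on an absolute schedule so that
 * wake-up latency does not accumulate. Returns 0 or an error number. */
static int idle_ticks(int ticks)
{
    struct timespec next;
    if (clock_gettime(CLOCK_REALTIME, &next) != 0)
        return errno;
    for (int i = 0; i < ticks; i++) {
        timespec_add_ns(&next, 1000000L);   /* 1 ms period */
        int rc = sleep_until(&next);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

Keeping the deadline absolute means a late wake-up shortens the next sleep instead of pushing every later tick back.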
* Re: rt20 patch question
From: Steven Rostedt @ 2006-05-12 14:51 UTC
To: Mark Hounschell
Cc: Ingo Molnar, linux-kernel, Daniel Walker, Thomas Gleixner, johnstul

On Fri, 12 May 2006, Mark Hounschell wrote:

>
> The stops can last anywhere up to a few minutes, depending on how
> patient I want to be. I was just playing with it to possibly get another
> log. The machine froze. I did the log thing while frozen. Then I attempted
> to ssh into it from another machine. It let me in, and the machine
> unfroze at that same time, but only to stop again in a few seconds. The
> new shell was also frozen. I ssh'd into it again: same thing.

This is a good indication of a missed wakeup. Now the question is: what
is sleeping, and why didn't it wake up?

>
> While the machine was unfrozen I was able to halt the CPU process,
> basically taking it out of its execution loop and putting it into a
> 1 ms delay loop via
>
> while(clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &tim, NULL) &&
> errno == EINTR);

Hmm, do you have high-res timers turned on?

>
> As long as the CPU process is halted and in this loop the machine acts
> normally. As soon as the CPU process goes back into its execution loop we
> are back to the "stops".
>

Could you hook up a serial cable, and on the machine do a

  # cat /dev/ttyS0 &

just to open the serial port for reading. Then, on the machine at the
other end of the serial cable, bring up minicom and do a

  ctrl-a f t

ctrl-a f sends a break, and the t will do a task dump. Do this when the
machine is stopped and see what is running. Hopefully sysrq works from
serial (I've had boxes where the keyboard sysrq didn't work but serial
did).

Oh, and send me the output too.

Thanks,

-- Steve
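Steve reads "the machine unfreezes when new activity arrives, such as an ssh login" as a sign of a missed wakeup: something went to sleep waiting for an event whose wake-up was lost. The kernel uses wait queues rather than POSIX threads, but the rule that prevents the bug is the same, and a userspace analogue with pthreads (an illustration only, not the -rt code path involved here) shows it: the predicate is changed and tested under one lock, and the waiter re-checks it in a loop. The struct event type and its helpers are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* A one-shot event: `ready` is the predicate, protected by `lock`. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
};

#define EVENT_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Waker: set the predicate before signalling, under the same lock. If the
 * waiter has not gone to sleep yet, it will see ready == true and never
 * sleep, so the wake-up cannot be lost. */
static void event_signal(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->ready = true;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Waiter: the predicate test and the sleep happen under the lock, and
 * pthread_cond_wait() releases it atomically with going to sleep. The
 * while loop re-tests after every return to cope with spurious wake-ups. */
static void event_wait(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->ready)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

static void *waker_thread(void *arg)
{
    event_signal(arg);
    return NULL;
}
```

The second case in the test signals before anyone waits, which is exactly the ordering a lost-wakeup bug mishandles.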
* 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Steven Rostedt @ 2006-05-12 13:16 UTC
To: Ingo Molnar
Cc: Mark Hounschell, linux-kernel, Daniel Walker, Thomas Gleixner, akpm

On Fri, 12 May 2006, Ingo Molnar wrote:

> --- linux-rt.q.orig/drivers/net/3c59x.c
> +++ linux-rt.q/drivers/net/3c59x.c
> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>  
>  	if (vp->medialock)
>  		goto leave_media_alone;
> -	disable_irq(dev->irq);
> +	/* hack! */
> +	disable_irq_nosync(dev->irq);
>  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
>  	EL3WINDOW(4);
>  	media_status = ioread16(ioaddr + Wn4_Media);

BTW, I originally thought about having Mark do this, but I'm nervous
about the side effects that this might have. Basically, it's doing
ioreads from the device while the interrupt could be doing iowrites.

I don't know the device well enough to know if this is a problem.
I've added Andrew Morton to the CC list, since his name is all over the
code.

Andrew,

Do you know offhand what the side effects on the vortex card might be
if we use disable_irq_nosync instead of disable_irq?

Mark,

as Ingo commented, this is a Hack! not a solution.

-- Steve
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Ingo Molnar @ 2006-05-12 13:36 UTC
To: Steven Rostedt
Cc: Mark Hounschell, linux-kernel, Daniel Walker, Thomas Gleixner, akpm

* Steven Rostedt <rostedt@goodmis.org> wrote:

>
>
> On Fri, 12 May 2006, Ingo Molnar wrote:
>
> > --- linux-rt.q.orig/drivers/net/3c59x.c
> > +++ linux-rt.q/drivers/net/3c59x.c
> > @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
> >  
> >  	if (vp->medialock)
> >  		goto leave_media_alone;
> > -	disable_irq(dev->irq);
> > +	/* hack! */
> > +	disable_irq_nosync(dev->irq);
> >  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
> >  	EL3WINDOW(4);
> >  	media_status = ioread16(ioaddr + Wn4_Media);
>
> BTW, I originally thought about having Mark do this, but I'm nervous
> about the side effects that this might have. Basically, it's doing
> ioreads from the device while the interrupt could be doing iowrites.

yes, that can happen - but since this is a timeout, this is rather
unlikely in practice. Nevertheless it's possible, so i marked the code a
hack.

	Ingo
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Steven Rostedt @ 2006-05-12 13:46 UTC
To: Ingo Molnar
Cc: Mark Hounschell, linux-kernel, Daniel Walker, Thomas Gleixner, akpm

On Fri, 12 May 2006, Ingo Molnar wrote:

> >
> > BTW, I originally thought about having Mark do this, but I'm nervous
> > about the side effects that this might have. Basically, it's doing
> > ioreads from the device while the interrupt could be doing iowrites.
>
> yes, that can happen - but since this is a timeout, this is rather
> unlikely in practice. Nevertheless it's possible, so i marked the code a
> hack.
>

Yes, but this is the source of Mark's bug, so he is definitely hitting it.

-- Steve
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Andrew Morton @ 2006-05-12 14:16 UTC
To: Steven Rostedt; +Cc: mingo, markh, linux-kernel, dwalker, tglx

Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
>
> On Fri, 12 May 2006, Ingo Molnar wrote:
>
> > --- linux-rt.q.orig/drivers/net/3c59x.c
> > +++ linux-rt.q/drivers/net/3c59x.c
> > @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
> >  
> >  	if (vp->medialock)
> >  		goto leave_media_alone;
> > -	disable_irq(dev->irq);
> > +	/* hack! */
> > +	disable_irq_nosync(dev->irq);
> >  	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
> >  	EL3WINDOW(4);
> >  	media_status = ioread16(ioaddr + Wn4_Media);
>
> BTW, I originally thought about having Mark do this, but I'm nervous
> about the side effects that this might have. Basically, it's doing
> ioreads from the device while the interrupt could be doing iowrites.
>
> I don't know the device well enough to know if this is a problem.
> I've added Andrew Morton to the CC list, since his name is all over the
> code.
>
> Andrew,
>
> Do you know offhand what the side effects on the vortex card might be
> if we use disable_irq_nosync instead of disable_irq?
>

ooh, ow, sorry, that's lost in the mists of time. I don't know why we're
doing disable_irq() in there.

Whatever it does, I think you could take vp->lock instead - that'll stop
the interrupt handler from doing anything if it does get entered while this
CPU is running vortex_timer().
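Andrew's suggestion addresses the concern Steve raised: the 3c59x registers are banked, so EL3WINDOW(4) selects window 4, and a read such as Wn4_Media only means something while that window stays selected. disable_irq() also waits for a handler already running on another CPU to finish, while disable_irq_nosync() returns without waiting, so with the hack the interrupt handler and vortex_timer() may overlap and disturb each other's window selection or register accesses. Holding one lock across the whole save, select, read, restore sequence rules that out. Below is a userspace sketch of the idea, with pthreads standing in for CPUs and a fake device (all names hypothetical; this is not the driver code).

```c
#include <pthread.h>

/* Toy model of a windowed register file: which bank a read hits depends
 * on one shared "current window" register, as with EL3WINDOW(). */
struct fake_dev {
    pthread_mutex_t lock;   /* plays the role of vp->lock */
    int window;             /* currently selected window */
    long mismatches;        /* reads that hit a window other than the one selected */
};

/* Save the window, select `want`, read, restore: all under the lock. */
static int read_in_window(struct fake_dev *d, int want)
{
    int old, got;
    pthread_mutex_lock(&d->lock);
    old = d->window;    /* like: old_window = ioread16(ioaddr + EL3_CMD) >> 13 */
    d->window = want;   /* like: EL3WINDOW(4) */
    got = d->window;    /* like: ioread16(ioaddr + Wn4_Media) */
    if (got != want)
        d->mismatches++;
    d->window = old;    /* like: EL3WINDOW(old_window) */
    pthread_mutex_unlock(&d->lock);
    return got;
}

/* One "CPU" repeatedly using its own window. */
struct worker { struct fake_dev *dev; int window; int iterations; };

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (int i = 0; i < w->iterations; i++)
        read_in_window(w->dev, w->window);
    return NULL;
}
```

Dropping the lock calls would let the two threads interleave between select and read, but that variant is a data race in C, so it is deliberately left out of the sketch.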
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Steven Rostedt @ 2006-05-12 14:32 UTC
To: Andrew Morton; +Cc: mingo, markh, linux-kernel, dwalker, tglx

On Fri, 12 May 2006, Andrew Morton wrote:

> >
> > Andrew,
> >
> > Do you know offhand what the side effects on the vortex card might be
> > if we use disable_irq_nosync instead of disable_irq?
> >
>
> ooh, ow, sorry, that's lost in the mists of time. I don't know why we're
> doing disable_irq() in there.
>
> Whatever it does, I think you could take vp->lock instead - that'll stop
> the interrupt handler from doing anything if it does get entered while this
> CPU is running vortex_timer().
>

Thanks Andrew,

I was thinking about using that lock too.

Mark, could you try this instead of the hack, and see if it works.

Thanks,

-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-2.6.16-rt20/drivers/net/3c59x.c
===================================================================
--- linux-2.6.16-rt20.orig/drivers/net/3c59x.c	2006-05-12 10:27:36.000000000 -0400
+++ linux-2.6.16-rt20/drivers/net/3c59x.c	2006-05-12 10:28:22.000000000 -0400
@@ -1897,7 +1897,7 @@ vortex_timer(unsigned long data)
 
 	if (vp->medialock)
 		goto leave_media_alone;
-	disable_irq(dev->irq);
+	spin_lock_bh(&vp->lock);
 	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
 	EL3WINDOW(4);
 	media_status = ioread16(ioaddr + Wn4_Media);
@@ -1919,7 +1919,6 @@ vortex_timer(unsigned long data)
 		break;
 	case XCVR_MII: case XCVR_NWAY:
 		{
-			spin_lock_bh(&vp->lock);
 			mii_status = mdio_read(dev, vp->phys[0], MII_BMSR);
 			if (!(mii_status & BMSR_LSTATUS)) {
 				/* Re-read to get actual link status */
@@ -1957,7 +1956,6 @@ vortex_timer(unsigned long data)
 			} else {
 				netif_carrier_off(dev);
 			}
-			spin_unlock_bh(&vp->lock);
 		}
 		break;
 	default:	/* Other media types handled by Tx timeouts. */
@@ -2000,7 +1998,7 @@ vortex_timer(unsigned long data)
 		/* AKPM: FIXME: Should reset Rx & Tx here. P60 of 3c90xc.pdf */
 	}
 	EL3WINDOW(old_window);
-	enable_irq(dev->irq);
+	spin_unlock_bh(&vp->lock);
 
 leave_media_alone:
 	if (vortex_debug > 2)
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Steven Rostedt @ 2006-05-12 14:39 UTC
To: Andrew Morton; +Cc: Ingo Molnar, markh, LKML, dwalker, Thomas Gleixner

Argh, cut and paste wasn't enough...

Use this patch instead. It needs an irq disable. But, believe it or not,
on SMP this is actually better. If the irq is shared (as it is in Mark's
case), we don't stop the irq of other devices from being handled on
another CPU (unfortunately for Mark, he pinned all interrupts to one CPU).

Andrew,

should this be changed in mainline too?

-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-2.6.16-rt20/drivers/net/3c59x.c
===================================================================
--- linux-2.6.16-rt20.orig/drivers/net/3c59x.c	2006-05-12 10:27:36.000000000 -0400
+++ linux-2.6.16-rt20/drivers/net/3c59x.c	2006-05-12 10:34:51.000000000 -0400
@@ -1888,6 +1888,7 @@ vortex_timer(unsigned long data)
 	int next_tick = 60*HZ;
 	int ok = 0;
 	int media_status, mii_status, old_window;
+	unsigned long flags;
 
 	if (vortex_debug > 2) {
 		printk(KERN_DEBUG "%s: Media selection timer tick happened, %s.\n",
@@ -1897,7 +1898,7 @@ vortex_timer(unsigned long data)
 
 	if (vp->medialock)
 		goto leave_media_alone;
-	disable_irq(dev->irq);
+	spin_lock_irqsave(&vp->lock, flags);
 	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
 	EL3WINDOW(4);
 	media_status = ioread16(ioaddr + Wn4_Media);
@@ -1919,7 +1920,6 @@ vortex_timer(unsigned long data)
 		break;
 	case XCVR_MII: case XCVR_NWAY:
 		{
-			spin_lock_bh(&vp->lock);
 			mii_status = mdio_read(dev, vp->phys[0], MII_BMSR);
 			if (!(mii_status & BMSR_LSTATUS)) {
 				/* Re-read to get actual link status */
@@ -1957,7 +1957,6 @@ vortex_timer(unsigned long data)
 			} else {
 				netif_carrier_off(dev);
 			}
-			spin_unlock_bh(&vp->lock);
 		}
 		break;
 	default:	/* Other media types handled by Tx timeouts. */
@@ -2000,7 +1999,7 @@ vortex_timer(unsigned long data)
 		/* AKPM: FIXME: Should reset Rx & Tx here. P60 of 3c90xc.pdf */
 	}
 	EL3WINDOW(old_window);
-	enable_irq(dev->irq);
+	spin_unlock_irqrestore(&vp->lock, flags);
 
 leave_media_alone:
 	if (vortex_debug > 2)
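Why the first, spin_lock_bh() version is not enough on its own (mainline semantics; in -rt, where interrupt handlers run as threads, the details differ): the _bh variant only holds off softirqs, while the interrupt handler also takes vp->lock from hard-interrupt context. If the NIC interrupt arrives on the same CPU that is inside vortex_timer() holding the lock, the handler spins on a lock its own CPU holds and can never return to release it. The toy single-CPU model below uses hypothetical names and plain C flags, not kernel code.

```c
#include <stdbool.h>

/* Toy single-CPU model (hypothetical, not kernel code). */
struct cpu_model {
    bool irqs_enabled;       /* hardware interrupts enabled on this CPU */
    bool softirqs_enabled;   /* bottom halves enabled on this CPU */
    bool lock_held;          /* vp->lock held by the code that was interrupted */
    bool irq_pending;        /* interrupt raised while interrupts were off */
    bool deadlocked;         /* handler would spin forever on lock_held */
    int  irqs_handled;
};

/* The device interrupt handler takes the same lock as vortex_timer(). */
static void device_irq(struct cpu_model *c)
{
    if (c->lock_held) {      /* the holder cannot run again until we return */
        c->deadlocked = true;
        return;
    }
    c->irqs_handled++;
}

/* An interrupt runs immediately only if interrupts are enabled. */
static void raise_irq(struct cpu_model *c)
{
    if (c->irqs_enabled)
        device_irq(c);
    else
        c->irq_pending = true;
}

static void lock_bh(struct cpu_model *c)   { c->softirqs_enabled = false; c->lock_held = true; }
static void unlock_bh(struct cpu_model *c) { c->lock_held = false; c->softirqs_enabled = true; }
static void lock_irq(struct cpu_model *c)  { c->irqs_enabled = false; c->lock_held = true; }

static void unlock_irq(struct cpu_model *c)
{
    c->lock_held = false;
    c->irqs_enabled = true;
    if (c->irq_pending) {    /* the deferred interrupt runs after the unlock */
        c->irq_pending = false;
        device_irq(c);
    }
}

/* vortex_timer()-like critical section with the NIC interrupting midway. */
static void timer_section(struct cpu_model *c, bool irq_variant)
{
    if (irq_variant) lock_irq(c); else lock_bh(c);
    raise_irq(c);
    if (irq_variant) unlock_irq(c); else unlock_bh(c);
}
```

In the _bh case the interrupt is delivered while the lock is held and the model records a deadlock; in the _irq case it is held off until the unlock and then handled normally.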
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Ingo Molnar @ 2006-05-12 14:43 UTC
To: Steven Rostedt; +Cc: Andrew Morton, markh, LKML, dwalker, Thomas Gleixner

* Steven Rostedt <rostedt@goodmis.org> wrote:

> Use this patch instead. It needs an irq disable. But, believe it or
> not, on SMP this is actually better. If the irq is shared (as it is
> in Mark's case), we don't stop the irq of other devices from being
> handled on another CPU (unfortunately for Mark, he pinned all
> interrupts to one CPU).
>
> Andrew,
>
> should this be changed in mainline too?
>
> -- Steve
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

yeah, would be nice to have this upstream too. It's not urgent so can go
post-2.6.17. I've added it to the -rt tree.

	Ingo
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Andrew Morton @ 2006-05-12 14:49 UTC
To: Steven Rostedt; +Cc: mingo, markh, linux-kernel, dwalker, tglx

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Use this patch instead. It needs an irq disable. But, believe it or not,
> on SMP this is actually better. If the irq is shared (as it is in Mark's
> case), we don't stop the irq of other devices from being handled on
> another CPU (unfortunately for Mark, he pinned all interrupts to one CPU).
>
> Andrew,
>
> should this be changed in mainline too?

I suppose so - we're taking the lock with spin_lock_bh(), but it can also
be taken by this CPU from the interrupt, so it'll deadlock. But lo! We've
done disable_irq(), so the interrupt won't be happening.

So yes, doing spin_lock_irq() (irqrestore isn't needed in a timer handler)
instead of disable_irq() in vortex_timer() looks OK.

One does wonder how long we'll hold off interrupts though.
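Andrew's remark that irqrestore isn't needed in a timer handler rests on timer callbacks being invoked with hardware interrupts enabled. spin_unlock_irq() re-enables interrupts unconditionally, which is only correct when the caller is known to have had them enabled; spin_unlock_irqrestore() puts back whatever state spin_lock_irqsave() saved. A toy model of one CPU's interrupt-enable flag makes the difference visible (hypothetical names; only the flag is modelled, not the lock or the kernel's implementation).

```c
#include <stdbool.h>

/* Toy model of one CPU's hardware-interrupt enable flag (not kernel code). */
static bool cpu_irqs_enabled = true;

/* Like spin_lock_irq()/spin_unlock_irq(): the unlock re-enables unconditionally. */
static void model_lock_irq(void)   { cpu_irqs_enabled = false; }
static void model_unlock_irq(void) { cpu_irqs_enabled = true; }

/* Like spin_lock_irqsave()/spin_unlock_irqrestore(): the caller's state is
 * saved in *flags and put back on unlock. */
static void model_lock_irqsave(bool *flags)
{
    *flags = cpu_irqs_enabled;
    cpu_irqs_enabled = false;
}

static void model_unlock_irqrestore(bool flags) { cpu_irqs_enabled = flags; }
```

From a context that always runs with interrupts on, such as a timer callback, both pairs leave the CPU in the same state, so the cheaper _irq pair is enough; only a caller that might already have interrupts off needs irqsave.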
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Steven Rostedt @ 2006-05-12 15:04 UTC
To: Andrew Morton; +Cc: mingo, markh, linux-kernel, dwalker, tglx

On Fri, 12 May 2006, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Use this patch instead. It needs an irq disable. But, believe it or not,
> > on SMP this is actually better. If the irq is shared (as it is in Mark's
> > case), we don't stop the irq of other devices from being handled on
> > another CPU (unfortunately for Mark, he pinned all interrupts to one CPU).
> >
> > Andrew,
> >
> > should this be changed in mainline too?
>
> I suppose so - we're taking the lock with spin_lock_bh(), but it can also
> be taken by this CPU from the interrupt, so it'll deadlock. But lo! We've
> done disable_irq(), so the interrupt won't be happening.
>
> So yes, doing spin_lock_irq() (irqrestore isn't needed in a timer handler)
> instead of disable_irq() in vortex_timer() looks OK.

Ah, you're right, it's overkill.

Ingo, here's the patch without irqsave

-- Steve

Index: linux-2.6.16-rt20/drivers/net/3c59x.c
===================================================================
--- linux-2.6.16-rt20.orig/drivers/net/3c59x.c	2006-05-12 10:27:36.000000000 -0400
+++ linux-2.6.16-rt20/drivers/net/3c59x.c	2006-05-12 11:03:39.000000000 -0400
@@ -1897,7 +1897,7 @@ vortex_timer(unsigned long data)
 
 	if (vp->medialock)
 		goto leave_media_alone;
-	disable_irq(dev->irq);
+	spin_lock_irq(&vp->lock);
 	old_window = ioread16(ioaddr + EL3_CMD) >> 13;
 	EL3WINDOW(4);
 	media_status = ioread16(ioaddr + Wn4_Media);
@@ -1919,7 +1919,6 @@ vortex_timer(unsigned long data)
 		break;
 	case XCVR_MII: case XCVR_NWAY:
 		{
-			spin_lock_bh(&vp->lock);
 			mii_status = mdio_read(dev, vp->phys[0], MII_BMSR);
 			if (!(mii_status & BMSR_LSTATUS)) {
 				/* Re-read to get actual link status */
@@ -1957,7 +1956,6 @@ vortex_timer(unsigned long data)
 			} else {
 				netif_carrier_off(dev);
 			}
-			spin_unlock_bh(&vp->lock);
 		}
 		break;
 	default:	/* Other media types handled by Tx timeouts. */
@@ -2000,7 +1998,7 @@ vortex_timer(unsigned long data)
 		/* AKPM: FIXME: Should reset Rx & Tx here. P60 of 3c90xc.pdf */
 	}
 	EL3WINDOW(old_window);
-	enable_irq(dev->irq);
+	spin_unlock_irq(&vp->lock);
 
 leave_media_alone:
 	if (vortex_debug > 2)
* Re: 3c59x vortex_timer rt hack
From: Mark Hounschell @ 2006-05-12 16:53 UTC
To: Steven Rostedt; +Cc: Andrew Morton, mingo, linux-kernel, dwalker, tglx

Steven Rostedt wrote:
> On Fri, 12 May 2006, Andrew Morton wrote:
>
>> Steven Rostedt <rostedt@goodmis.org> wrote:
>>> Use this patch instead. It needs an irq disable. But, believe it or not,
>>> on SMP this is actually better. If the irq is shared (as it is in Mark's
>>> case), we don't stop the irq of other devices from being handled on
>>> another CPU (unfortunately for Mark, he pinned all interrupts to one CPU).
>>>
>>> Andrew,
>>>
>>> should this be changed in mainline too?
>> I suppose so - we're taking the lock with spin_lock_bh(), but it can also
>> be taken by this CPU from the interrupt, so it'll deadlock. But lo! We've
>> done disable_irq(), so the interrupt won't be happening.
>>
>> So yes, doing spin_lock_irq() (irqrestore isn't needed in a timer handler)
>> instead of disable_irq() in vortex_timer() looks OK.
>
> Ah, you're right, it's overkill.
>
> Ingo, here's the patch without irqsave
>
> -- Steve
>
> Index: linux-2.6.16-rt20/drivers/net/3c59x.c
> [...]
>

I have tried this one and it seems OK. No BUGs or disconnections for
over an hour now, and I'm beating it good.

Mark
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
From: Steven Rostedt @ 2006-05-12 15:22 UTC
To: Andrew Morton; +Cc: mingo, markh, linux-kernel, dwalker, tglx

On Fri, 12 May 2006, Andrew Morton wrote:

>
> So yes, doing spin_lock_irq() (irqrestore isn't needed in a timer handler)
> instead of disable_irq() in vortex_timer() looks OK.
>
> One does wonder how long we'll hold off interrupts though.

Any longer than this!

in boomerang_start_xmit()

	spin_lock_irqsave(&vp->lock, flags);

	/* Wait for the stall to complete. */
	issue_and_wait(dev, DownStall);

Pretty big wait!

	[...]

	spin_unlock_irqrestore(&vp->lock, flags);


Where we have in issue_and_wait

static void
issue_and_wait(struct net_device *dev, int cmd)
{

	[...]

	/* OK, that didn't work. Do it the slow way. One second */
	for (i = 0; i < 100000; i++) {

		[...]
	}

So this can have interrupts off for over a second!

-- Steve
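Steve's "over a second" is simple arithmetic on the slow path, assuming each iteration waits about 10 microseconds, which is what the "One second" comment implies for 100000 iterations (the elided loop body is not shown in the thread). The bound is only reached when the hardware never finishes the command. A sketch of the same bounded-poll shape, with hypothetical names and a fake device standing in for ioread16() on the status register:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status probe: returns true while the command is still in progress. */
typedef bool (*busy_fn)(void *ctx);

/* Shape of issue_and_wait()'s slow path: poll up to max_iters times,
 * notionally waiting delay_us between polls (the wait is only accounted
 * for here, not performed). Returns the microseconds that would have been
 * spent, or -1 if the command never completed. */
static long bounded_poll(busy_fn busy, void *ctx, long max_iters, long delay_us)
{
    assert(max_iters >= 0 && delay_us >= 0);
    for (long i = 0; i < max_iters; i++) {
        if (!busy(ctx))
            return i * delay_us;
    }
    return -1;
}

/* Worst-case time spent spinning, e.g. with interrupts disabled. */
static long worst_case_us(long max_iters, long delay_us)
{
    return max_iters * delay_us;
}

/* Fake device that reports "busy" for the first `remaining` polls. */
struct fake_cmd { int remaining; };

static bool fake_busy(void *ctx)
{
    struct fake_cmd *c = ctx;
    return c->remaining-- > 0;
}
```

With 100000 iterations of about 10 microseconds each, worst_case_us() gives 1000000 microseconds, one full second with interrupts off; a command that completes after a few polls costs only tens of microseconds.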
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
  2006-05-12 15:22 ` 3c59x vortex_timer rt hack (was: rt20 patch question) Steven Rostedt
@ 2006-05-12 15:23 ` Andrew Morton
  2006-05-12 15:36 ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2006-05-12 15:23 UTC
To: Steven Rostedt; +Cc: mingo, markh, linux-kernel, dwalker, tglx

Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> On Fri, 12 May 2006, Andrew Morton wrote:
>
> >
> > So yes, doing spin_lock_irq() (irqrestore isn't needed in a timer handler)
> > instead of disable_irq() in vortex_timer() looks OK.
> >
> > One does wonder how long we'll hold off interrupts though.
>
> Any longer than this!
>
> in boomerang_start_xmit()
>
> 	spin_lock_irqsave(&vp->lock, flags);
>
> 	/* Wait for the stall to complete. */
> 	issue_and_wait(dev, DownStall);
>
> Pretty big wait!
>
> 	[...]
>
> 	spin_unlock_irqrestore(&vp->lock, flags);
>
>
> Where we have in issue_and_wait
>
> static void
> issue_and_wait(struct net_device *dev, int cmd)
> {
>
> 	[...]
>
> 	/* OK, that didn't work.  Do it the slow way.  One second */
> 	for (i = 0; i < 100000; i++) {
>
> 		[...]
> 	}
>
> So this can have interrupts off for over a second!
>

Well, only if the hardware's fratzed.  Normally this is quick.

otoh vortex_timer() will play with the MII interface, which is slooooow.
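A small userspace model can make the latency arithmetic in this exchange concrete. The sketch below imitates the "slow way" polling loop quoted from issue_and_wait(): it spins for at most 100000 iterations waiting for a busy bit to clear. The loop body is elided in the quote, so the 10 us per-iteration delay is an assumption inferred from the "One second" comment (1 s / 100000 = 10 us); struct fake_chip, cmd_in_progress() and poll_until_idle() are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Iteration count from the quoted loop; the 10 us delay is an assumption
 * inferred from its "One second" comment (1 s / 100000 = 10 us). */
#define POLL_ITERATIONS 100000L
#define POLL_DELAY_US 10L

/* Hypothetical fake device: stays busy for busy_polls polls.
 * A negative busy_polls models hardware that never finishes. */
struct fake_chip {
	long busy_polls;
	long polled;
};

/* Stand-in for reading a "command in progress" status bit. */
static bool cmd_in_progress(struct fake_chip *chip)
{
	chip->polled++;
	return chip->busy_polls < 0 || chip->polled <= chip->busy_polls;
}

/* Spin until the command completes or the iteration budget runs out.
 * Returns the simulated time spent waiting, in microseconds. */
static long poll_until_idle(struct fake_chip *chip)
{
	long waited_us = 0;

	assert(chip != NULL);
	for (long i = 0; i < POLL_ITERATIONS; i++) {
		if (!cmd_in_progress(chip))
			return waited_us;
		waited_us += POLL_DELAY_US; /* models the assumed per-iteration delay */
	}
	fprintf(stderr, "command still busy after %ld us\n", waited_us);
	return waited_us;
}
```

With a healthy chip that clears its busy bit after three polls, poll_until_idle() reports 30 us; with a wedged chip it runs all 100000 iterations and reports 1000000 us. Because boomerang_start_xmit() holds vp->lock with spin_lock_irqsave() around this call, that figure is a lower bound on how long local interrupts stay off, which fits both Steve's "over a second" and Andrew's point that it only happens when the hardware is broken.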
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
  2006-05-12 15:23 ` Andrew Morton
@ 2006-05-12 15:36 ` Steven Rostedt
  2006-05-12 16:03 ` Andrew Morton
  0 siblings, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2006-05-12 15:36 UTC
To: Andrew Morton; +Cc: mingo, markh, linux-kernel, dwalker, tglx


On Fri, 12 May 2006, Andrew Morton wrote:

>
> Well, only if the hardware's fratzed.  Normally this is quick.
>
> otoh vortex_timer() will play with the MII interface, which is slooooow.
>

The vortex_timer is a timeout,
will it go off often?

-- Steve
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
  2006-05-12 15:36 ` Steven Rostedt
@ 2006-05-12 16:03 ` Andrew Morton
  2006-05-12 16:11 ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2006-05-12 16:03 UTC
To: Steven Rostedt; +Cc: mingo, markh, linux-kernel, dwalker, tglx

Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> On Fri, 12 May 2006, Andrew Morton wrote:
>
> >
> > Well, only if the hardware's fratzed.  Normally this is quick.
> >
> > otoh vortex_timer() will play with the MII interface, which is slooooow.
> >
>
> The vortex_timer is a timeout,

err, it's actually a function.

> will it go off often?

Every five seconds if the cable's unplugged.  Every 60 seconds otherwise.
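Andrew's answer pins down how often vortex_timer() runs, which bounds how often its interrupts-off MII work can add latency. The sketch below encodes the two intervals he gives; next_media_poll(), polls_per() and the HZ value of 250 are illustrative assumptions, not code or configuration taken from 3c59x.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tick rate for illustration; the real HZ is a kernel config choice. */
#define HZ 250UL

/* Hypothetical: delay until the next media check, in jiffies.
 * 5 s while the cable is unplugged, 60 s once the link is up. */
static unsigned long next_media_poll(bool link_up)
{
	return (link_up ? 60UL : 5UL) * HZ;
}

/* How many times the timer handler would run in `seconds` seconds
 * if the link state stayed the same the whole time. */
static unsigned long polls_per(unsigned long seconds, bool link_up)
{
	unsigned long interval_s = next_media_poll(link_up) / HZ;

	assert(interval_s > 0);
	return seconds / interval_s;
}
```

polls_per(3600, false) gives 720 runs per hour with the cable unplugged versus 60 with a link, so the handler is infrequent, but with the patched code each run still pays the full cost of the MII access while vp->lock is held with interrupts off.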
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
  2006-05-12 16:03 ` Andrew Morton
@ 2006-05-12 16:11 ` Steven Rostedt
  2006-05-12 16:27 ` Andrew Morton
  0 siblings, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2006-05-12 16:11 UTC
To: Andrew Morton; +Cc: mingo, markh, linux-kernel, dwalker, tglx


On Fri, 12 May 2006, Andrew Morton wrote:

> >
> > The vortex_timer is a timeout,
>
> err, it's actually a function.

OK, I meant vp->timer

>
> > will it go off often?
>
> Every five seconds if the cable's unplugged.  Every 60 seconds otherwise.
>

OK, so the function is a service and not a fixup (or both).  Hmm, so
latency is an issue.

Thanks,

-- Steve
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
  2006-05-12 16:11 ` Steven Rostedt
@ 2006-05-12 16:27 ` Andrew Morton
  2006-05-12 16:38 ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2006-05-12 16:27 UTC
To: Steven Rostedt; +Cc: mingo, markh, linux-kernel, dwalker, tglx

Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
>
> On Fri, 12 May 2006, Andrew Morton wrote:
>
> > >
> > > The vortex_timer is a timeout,
> >
> > err, it's actually a function.
>
> OK, I meant vp->timer

That's a kernel timer.

>
> >
> > > will it go off often?
> >
> > Every five seconds if the cable's unplugged.  Every 60 seconds otherwise.
> >
>
> OK, so the function is a service and not a fixup (or both).  Hmm, so
> latency is an issue.

yup.  It's been five years, sorry - I'm struggling to remember why
vortex_timer() needs to block the interrupt handler.

The chip is fairly stateful - that EL3WINDOW() thing selects a particular
register bank and needs protection against other register readers.  But we
should avoid running EL3WINDOW() in the rx and tx interrupt handlers anyway -
iirc the chip is designed to permit that.

Is tricky.

How come -rt cannot permit disable_irq() in there?

(I think the _reason_ it's disable_irq() is, yes, because it's infrequent
and because it can hold off interrupts for a long time if we use
spin_lock_irq())
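Andrew's point about EL3WINDOW() is the core of the locking question: one set of I/O offsets is shared by several register banks, so the "save window, select window 4, read, restore window" sequence in vortex_timer() must not interleave with another register reader. The userspace model below shows that sequence under a lock. A pthread mutex stands in for vp->lock; struct fake_nic, select_window(), read_reg(), read_media_status() and the register layout are hypothetical. Unlike spin_lock_irq(), a mutex cannot keep an interrupt handler on the same CPU from running, so this only models mutual exclusion between readers that all take the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_WINDOWS 8     /* hypothetical number of register banks */
#define REGS_PER_WINDOW 8 /* hypothetical registers visible per bank */
#define MEDIA_REG 2       /* hypothetical offset of "media status" in window 4 */

/* Hypothetical model of a windowed NIC register file. */
struct fake_nic {
	pthread_mutex_t lock;  /* stands in for vp->lock */
	unsigned int window;   /* currently selected bank, like EL3WINDOW() state */
	uint16_t regs[NUM_WINDOWS][REGS_PER_WINDOW];
};

/* Like EL3WINDOW(w): changes what every later register read returns. */
static void select_window(struct fake_nic *nic, unsigned int w)
{
	assert(w < NUM_WINDOWS);
	nic->window = w;
}

/* The same offset reads a different register depending on the window. */
static uint16_t read_reg(const struct fake_nic *nic, unsigned int off)
{
	assert(off < REGS_PER_WINDOW);
	return nic->regs[nic->window][off];
}

/* Shape of the patched vortex_timer() section: lock, save the old window,
 * switch to window 4, read, switch back, unlock. Every other path that
 * switches windows must take the same lock for this to be safe. */
static uint16_t read_media_status(struct fake_nic *nic)
{
	pthread_mutex_lock(&nic->lock);
	unsigned int old_window = nic->window;
	select_window(nic, 4);
	uint16_t status = read_reg(nic, MEDIA_REG);
	select_window(nic, old_window);
	pthread_mutex_unlock(&nic->lock);
	return status;
}
```

After read_media_status() returns the window-4 value, the selected window is back where it was, so a reader that relies on the previous window still sees its own register at the same offset.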
* Re: 3c59x vortex_timer rt hack (was: rt20 patch question)
  2006-05-12 16:27 ` Andrew Morton
@ 2006-05-12 16:38 ` Steven Rostedt
  0 siblings, 0 replies; 55+ messages in thread
From: Steven Rostedt @ 2006-05-12 16:38 UTC
To: Andrew Morton; +Cc: mingo, markh, linux-kernel, dwalker, tglx


On Fri, 12 May 2006, Andrew Morton wrote:

> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 12 May 2006, Andrew Morton wrote:
> >
> > > >
> > > > The vortex_timer is a timeout,
> > >
> > > err, it's actually a function.
> >
> > OK, I meant vp->timer
>
> That's a kernel timer.

:P

>
> >
> > >
> > > > will it go off often?
> > >
> > > Every five seconds if the cable's unplugged.  Every 60 seconds otherwise.
> > >
> >
> > OK, so the function is a service and not a fixup (or both).  Hmm, so
> > latency is an issue.
>
> yup.  It's been five years, sorry - I'm struggling to remember why
> vortex_timer() needs to block the interrupt handler.
>
> The chip is fairly stateful - that EL3WINDOW() thing selects a particular
> register bank and needs protection against other register readers.  But we
> should avoid running EL3WINDOW() in the rx and tx interrupt handlers anyway -
> iirc the chip is designed to permit that.
>
> Is tricky.
>
> How come -rt cannot permit disable_irq() in there?

It's about having threaded interrupts soft and hard, and their
combinations.  disable_irq with threaded hardirqs can schedule, but we
still don't want a softirq to do so (when not threaded).  So it too is
tricky.

>
> (I think the _reason_ it's disable_irq() is, yes, because it's infrequent
> and because it can hold off interrupts for a long time if we use
> spin_lock_irq())
>

Well, it seems that the spin_lock_irq for -rt is the answer for now, until
we sort out the dependencies of threaded interrupts.

Ingo,

Maybe it will be necessary to make hardirq threading dependent on softirq
threading.  Is it really that bad?  When would someone want Hardirq
threading without having softirq threading??

-- Steve
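Steve's explanation can be written as a small truth table. As described in this thread, under -rt disable_irq() on a threaded hardirq can schedule, and vortex_timer() runs from a kernel timer, i.e. in softirq context, which may only schedule if softirqs are threaded too. The sketch below encodes that rule and his proposed dependency; struct rt_irq_config and the helper names are hypothetical, and the rule is a simplification of the -rt behaviour discussed here rather than kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two -rt threading options discussed here. */
struct rt_irq_config {
	bool hardirq_threaded;
	bool softirq_threaded;
};

/* With threaded hardirqs, disable_irq() may wait for the IRQ thread,
 * i.e. it may schedule (as described in this thread). */
static bool disable_irq_may_sleep(struct rt_irq_config c)
{
	return c.hardirq_threaded;
}

/* A timer callback runs in softirq context; it may only sleep if
 * softirqs are themselves run in threads. */
static bool timer_context_may_sleep(struct rt_irq_config c)
{
	return c.softirq_threaded;
}

/* Safe to call disable_irq() from vortex_timer() in this configuration? */
static bool disable_irq_ok_in_timer(struct rt_irq_config c)
{
	return !disable_irq_may_sleep(c) || timer_context_may_sleep(c);
}

/* Steve's proposal: hardirq threading requires softirq threading. */
static bool config_allowed(struct rt_irq_config c)
{
	return !c.hardirq_threaded || c.softirq_threaded;
}

/* Check all four combinations: every allowed config must be safe. */
static void check_dependency(void)
{
	for (int h = 0; h <= 1; h++) {
		for (int s = 0; s <= 1; s++) {
			struct rt_irq_config c = { .hardirq_threaded = h, .softirq_threaded = s };
			if (config_allowed(c))
				assert(disable_irq_ok_in_timer(c));
		}
	}
}
```

The only unsafe combination is threaded hardirqs with non-threaded softirqs, and that is exactly the combination the proposed dependency would forbid, which is why config_allowed() and disable_irq_ok_in_timer() end up with the same formula.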
* Re: rt20 patch question
  2006-05-10 15:33 ` Mark Hounschell
  2006-05-10 16:17 ` Steven Rostedt
@ 2006-05-10 18:45 ` Steven Rostedt
  1 sibling, 0 replies; 55+ messages in thread
From: Steven Rostedt @ 2006-05-10 18:45 UTC
To: Mark Hounschell; +Cc: linux-kernel, Daniel Walker

>
> So to my problem. What I mean by "the machine stops" is just that all
> indications of the mouse, keyboard, and video stop. Then in a few
> seconds it will usually continue. At first I only saw problems when using
> ethernet in the emulation. I would telnet into the emulation from the
> linux box and do the equivalent of cat'ing a very large file. The
> machine will always "stop" somewhere randomly along the display. Then
> maybe continue on or maybe not. So I thought I might have a problem with
> my ethernet module. Then I noticed similar things with the SCSI module
> when accessing legacy scsi devices from within the emulation. Sometimes
> the whole machine doesn't stop. It would appear that only some things
> have stopped. Like one or more of my I/O threads??
>
> I can only say for sure that I do not have these "stops" when running
> any other kernel or when running the rt20 kernel in any of the
> non-complete preemption modes.
>
> The only change that had to be made to this app for it to run at all on
> the rt20 kernel was ensuring that the RTOM irq thread was at a higher
> priority than the CPU process/thread. Otherwise no signals were received
> from the RTOM.
>

(Working way too late! Need to go home.)

Mark, could you do the following for me.  I would really like to see what
is being scheduled, and when.

Could you apply the following patch:

  http://www.kihontech.com/logdev/logdev-2.6.16-rt20.patch

Then do a "make oldconfig" and answer the following questions:

Enable hooks for logdev device (LOGDEV_HOOKS) [N/y/?] (NEW) y
Enable logdev device (LOGDEV) [M/n/y/?] (NEW) y
Number of pages to allocate for logdev device (LOGDEV_PAGES) [256] (NEW)
Logdev Multiple CPU buffers (LOGDEV_MULTI_CPUS) [N/y/?] (NEW) n
Default Logdev prints should be enabled on startup (LOGDEV_PRINT_ENABLED) [Y/n/?] (NEW) n
Default Logdev printing of context switches on startup (LOGDEV_SWITCH_ENABLED) [N/y/?] (NEW) n

So basically hit yes for the first two, and use whatever you like for the
number of pages.  And no to the rest.

This is a kernel ring buffer.  But one of the things it records nicely is
scheduling switches.  The size of the ring buffer is defined by
LOGDEV_PAGES, and the bigger that is, the more memory it will use.

Then make and install this kernel.

Then download:

  http://www.kihontech.com/logdev/logdev_tools-0.3.1.tar.bz2

Untar it and compile it with:

  make KERNDIR=<path_to_kernel_dir_with_logdev_patch>/include

And then copy logread onto the machine that's running the kernel.

Boot into that kernel, and on this machine do:

  # echo 1 > /proc/logdev/switch

Run your tests, and just after you see the machine freeze, do a

  # echo 0 > /proc/logdev/switch

right away; otherwise the buffer might overflow.  Obviously you can't do it
while the machine is frozen, but it should be ok to do it as soon as the
machine is back up and running.  You could just have a terminal up with the
echo ready, and when the machine freezes, hit enter, and hopefully the
buffered interrupt will go off after the machine is unfrozen.

Then do the

  # ./logread > log.out

and email me privately a compressed version of that log.out.  It will
nicely show the scheduling that has taken place, e.g.:

[ 614.937438] CPU:0 (posix_cpu_timer:2) -->> (softirq-timer/0:4)
[ 614.937443] CPU:0 (softirq-timer/0:4) -->> (swapper:0)
[ 614.938434] CPU:0 (swapper:0) -->> (posix_cpu_timer:2)
[ 614.938436] CPU:0 (posix_cpu_timer:2) -->> (softirq-timer/0:4)
[ 614.938438] CPU:0 (softirq-timer/0:4) -->> (swapper:0)
[ 614.938777] CPU:0 (swapper:0) -->> (IRQ 11:710)

So at least I can see what is running on which CPU.

Thanks,

-- Steve

PS. Someday I plan to use relayfs as a backend and clean it up to perhaps
get this into the kernel.
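Once log.out is available, one rough way to look for a freeze is to find the longest gap between consecutive recorded context switches. The sketch below parses lines in the format of Steve's sample output, assuming six-digit microsecond fields; parse_switch(), longest_gap() and struct switch_event are hypothetical helpers, not part of logdev_tools. It ignores the CPU field, which matches a single shared buffer (LOGDEV_MULTI_CPUS=n); on SMP a per-CPU version would be more telling, and a long gap is only a hint, since a stall can also look like a burst of switches among a few threads.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed logread line (format taken from the sample output above). */
struct switch_event {
	long long t_us;  /* timestamp converted to microseconds */
	int cpu;
	char from[64];   /* e.g. "swapper:0" */
	char to[64];     /* e.g. "IRQ 11:710" */
};

/* Parse "[ sec.usec] CPU:n (from) -->> (to)"; assumes 6-digit usec.
 * Returns 1 on success, 0 for lines in any other format. */
static int parse_switch(const char *line, struct switch_event *ev)
{
	long sec, usec;

	if (sscanf(line, " [%ld.%ld] CPU:%d (%63[^)]) -->> (%63[^)])",
		   &sec, &usec, &ev->cpu, ev->from, ev->to) != 5)
		return 0;
	ev->t_us = (long long)sec * 1000000 + usec;
	return 1;
}

/* Find the largest time gap between consecutive parsed switches
 * (all CPUs mixed). Returns the index of the line that ends the gap,
 * or -1 if no positive gap was found; *gap_us receives the gap. */
static int longest_gap(const char *const lines[], int n, long long *gap_us)
{
	struct switch_event prev = {0}, cur;
	int have_prev = 0, best = -1;

	assert(gap_us != NULL);
	*gap_us = 0;
	for (int i = 0; i < n; i++) {
		if (!parse_switch(lines[i], &cur))
			continue;
		if (have_prev && cur.t_us - prev.t_us > *gap_us) {
			*gap_us = cur.t_us - prev.t_us;
			best = i;
		}
		prev = cur;
		have_prev = 1;
	}
	return best;
}
```

On the six sample lines, longest_gap() reports 991 us ending at index 2, the switch from swapper to posix_cpu_timer.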
Thread overview: 55+ messages
2006-05-09 12:23 rt20 patch question Mark Hounschell
2006-05-09 14:38 ` Daniel Walker
2006-05-09 14:58 ` Mark Hounschell
2006-05-09 15:53 ` Daniel Walker
2006-05-10 12:39 ` Steven Rostedt
2006-05-10 13:06 ` Mark Hounschell
2006-05-10 14:10 ` Steven Rostedt
2006-05-10 15:33 ` Mark Hounschell
2006-05-10 16:17 ` Steven Rostedt
2006-05-10 18:30 ` Mark Hounschell
2006-05-10 18:49 ` Steven Rostedt
2006-05-10 19:28 ` Mark Hounschell
2006-05-11 11:25 ` Mark Hounschell
2006-05-11 12:01 ` Steven Rostedt
2006-05-11 12:22 ` Steven Rostedt
2006-05-11 13:02 ` Mark Hounschell
2006-05-11 13:14 ` Steven Rostedt
2006-05-11 13:26 ` Mark Hounschell
2006-05-11 13:53 ` Steven Rostedt
2006-05-11 14:57 ` Mark Hounschell
2006-05-12 6:47 ` Steven Rostedt
2006-05-12 7:33 ` Sébastien Dugué
2006-05-12 8:18 ` Mark Hounschell
2006-05-12 9:08 ` Mark Hounschell
2006-05-12 9:20 ` Steven Rostedt
2006-05-10 20:33 ` Steven Rostedt
2006-05-12 8:16 ` Ingo Molnar
2006-05-12 8:45 ` Steven Rostedt
2006-05-12 9:16 ` Ingo Molnar
2006-05-12 9:21 ` Ingo Molnar
2006-05-12 12:38 ` Mark Hounschell
2006-05-12 13:18 ` Steven Rostedt
2006-05-12 13:38 ` Mark Hounschell
2006-05-12 13:43 ` Mark Hounschell
2006-05-12 14:05 ` Steven Rostedt
2006-05-12 14:36 ` Mark Hounschell
2006-05-12 14:51 ` Steven Rostedt
2006-05-12 13:16 ` 3c59x vortex_timer rt hack (was: rt20 patch question) Steven Rostedt
2006-05-12 13:36 ` Ingo Molnar
2006-05-12 13:46 ` Steven Rostedt
2006-05-12 14:16 ` Andrew Morton
2006-05-12 14:32 ` Steven Rostedt
2006-05-12 14:39 ` Steven Rostedt
2006-05-12 14:43 ` Ingo Molnar
2006-05-12 14:49 ` Andrew Morton
2006-05-12 15:04 ` Steven Rostedt
2006-05-12 16:53 ` 3c59x vortex_timer rt hack Mark Hounschell
2006-05-12 15:22 ` 3c59x vortex_timer rt hack (was: rt20 patch question) Steven Rostedt
2006-05-12 15:23 ` Andrew Morton
2006-05-12 15:36 ` Steven Rostedt
2006-05-12 16:03 ` Andrew Morton
2006-05-12 16:11 ` Steven Rostedt
2006-05-12 16:27 ` Andrew Morton
2006-05-12 16:38 ` Steven Rostedt
2006-05-10 18:45 ` rt20 patch question Steven Rostedt