* dynamic-hz
From: Andrea Arcangeli @ 2004-12-11 14:23 UTC
To: linux-kernel

The patch below allows HZ to be set dynamically at boot time with a command line parameter. HZ=1000, HZ=100, HZ=333 or any other value just works (though certain values may cause more or less drift in the system time).

Is there any interest from the mainline developers in merging this into 2.6? I'm getting requests for this feature to be forward ported to 2.6 (both for batch jobs and for powersaved, which can throttle the CPU down to 80 MHz). It should be up to the user to choose the HZ, like it was in 2.4-aa.

This patch is quite intrusive, since many HZ values visible to userspace have to be converted to USER_HZ, and most importantly because HZ isn't available at compile time anymore: every variable that is a function of HZ must either be changed to be a function of USER_HZ or be initialized at runtime. The code has debugging code (optional at compile time) so that I can guarantee that there cannot be any regression.

Technically this makes a lot of sense to me (well, you can guess why I implemented it in the first place), at least on archs where one cannot reprogram the timer chip in a performant way (to stop timer ticks completely until the next posted timer). This has been in production for years in SLES8, btw.

http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa3/9999_zzz-dynamic-hz-5.gz

Comments welcome, thanks.
* Re: dynamic-hz
From: Zwane Mwaikambo @ 2004-12-11 14:50 UTC
To: Andrea Arcangeli
Cc: linux-kernel

On Sat, 11 Dec 2004, Andrea Arcangeli wrote:
> This patch is quite intrusive since many HZ visible to userspace have to
> be converted to USER_HZ, and most important because HZ isn't available

Shouldn't that be a bug anyway regardless of dynamic-hz?
* Re: dynamic-hz
From: Andrea Arcangeli @ 2004-12-12 6:57 UTC
To: Zwane Mwaikambo
Cc: linux-kernel

On Sat, Dec 11, 2004 at 07:50:31AM -0700, Zwane Mwaikambo wrote:
> Shouldn't that be a bug anyway regardless of dynamic-hz?

Yes, of course. And in theory it'll be easier to implement in 2.6 than it was in 2.4, since 2.6 has a chance of already using USER_HZ at compile time instead of HZ.
* Re: dynamic-hz
From: Jan Engelhardt @ 2004-12-11 21:41 UTC
To: Andrea Arcangeli
Cc: linux-kernel

Hi,

>The below patch allows to set the HZ dynamically at boot time with

so the only thing left is to alter HZ at runtime :)

>Is there any interest from the mainline developers to merge this into 2.6?

From my side, there is interest from the average user.

Jan Engelhardt
-- 
ENOSPC
* Re: dynamic-hz
From: Pavel Machek @ 2004-12-12 16:35 UTC
To: Andrea Arcangeli
Cc: linux-kernel

Hi!

> The below patch allows to set the HZ dynamically at boot time with
> command line parameter. HZ=1000 HZ=100 HZ=333 any other value just works
> (though certain value may cause more or less drift to the system time
> advance/decrease).
>
> Is there any interest from the mainline developers to merge this into
> 2.6? I'm getting requests for this feature being forward ported to
> 2.6 (both for batch jobs and for the powersaved that can trim the hz
> down to 80mhz). It should be up to the user to choose the HZ like it was
> in 2.4-aa.
>
> This patch is quite intrusive since many HZ visible to userspace have to
> be converted to USER_HZ, and most important because HZ isn't available
> at compile time anymore and every variable in function of HZ must be
> either changed to be in function of USER_HZ or it must be initialized at
> runtime. The code has debugging code (optional at compile time) so that
> I can guarantee that there cannot be any regression.
>
> Technically this makes a lot of sense to me (well, you can guess why I
> implemented it in the first place), at least in archs where one cannot
> reprogram the timer chip in a performant way (to stop timer ticks
> completely until the next posted timer). This is in production for years
> in SLES8 btw.
>
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa3/9999_zzz-dynamic-hz-5.gz

It certainly helps with singing capacitors... What is overhead of this?

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
* Re: dynamic-hz
From: Andrea Arcangeli @ 2004-12-12 22:23 UTC
To: Pavel Machek
Cc: linux-kernel

On Sun, Dec 12, 2004 at 05:35:47PM +0100, Pavel Machek wrote:
> It certainly helps with singing capacitors... What is overhead of

;)

> this?

The overhead is a single L1 cacheline in the paths manipulating HZ (rather than having an immediate value hardcoded in the asm, it reads it from a memory location not in the icache). Plus there are some conversion routines in the USER_HZ usages. It's not a measurable difference.
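The overhead described above can be pictured with a minimal sketch, assuming the tick rate lives in an ordinary variable that is set once at boot. The names `dyn_hz`, `set_dyn_hz`, `ticks_to_user` and `msecs_to_ticks` are hypothetical and are not taken from the patch; `USER_HZ` is assumed to be 100 here.

```c
#include <assert.h>
#include <stdint.h>

#define USER_HZ 100 /* assumed fixed rate exposed to userspace */

/* Hypothetical runtime tick rate. With a compile-time HZ this would be an
 * immediate in the instruction stream; here every user costs a memory load. */
static unsigned int dyn_hz = 1000;

/* Hypothetical helper: record the HZ value chosen on the command line. */
static void set_dyn_hz(unsigned int hz)
{
    assert(hz > 0);
    dyn_hz = hz;
}

/* Convert internal ticks to USER_HZ units before showing them to userspace.
 * The 64-bit intermediate keeps the multiplication safe for realistic uptimes. */
static uint64_t ticks_to_user(uint64_t ticks)
{
    return ticks * USER_HZ / dyn_hz;
}

/* Convert milliseconds to ticks, rounding up so a timeout never fires early.
 * This is the kind of value that can no longer be a compile-time constant. */
static uint64_t msecs_to_ticks(uint64_t ms)
{
    return (ms * dyn_hz + 999) / 1000;
}
```

With `set_dyn_hz(100)`, a 3 ms timeout rounds up to a single tick, while with `set_dyn_hz(1000)` it is three ticks; userspace sees the same USER_HZ units in both cases.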
* Re: dynamic-hz
From: Con Kolivas @ 2004-12-12 23:36 UTC
To: Andrea Arcangeli
Cc: Pavel Machek, linux-kernel

Andrea Arcangeli wrote:
> On Sun, Dec 12, 2004 at 05:35:47PM +0100, Pavel Machek wrote:
>
>>It certainly helps with singing capacitors... What is overhead of
>
> ;)
>
>>this?
>
> The overhead is a single l1 cacheline in the paths manipulating HZ
> (rather than having an immediate value hardcoded in the asm, it reads it
> from a memory location not in the icache). Plus there are some
> conversion routines in the USER_HZ usages. It's not a measurable
> difference.

Just being devil's advocate here...

I had variable Hz in my tree for a while and found there was one solitary purpose to setting Hz to 100: to silence cheap capacitors.

The rest of my users that were setting Hz to 100 for so-called performance gains were doing so under the false impression that cpu usage was lower, simply because of the woefully inaccurate cpu usage calculation at 100Hz.

The performance benefit, if any, is often lost in noise during benchmarks and, when there, is less than 1%. So I was wondering if you had some specific advantage in mind for this patch? Is there some arch-specific advantage? I can certainly envision disadvantages to lower Hz.

Cheers,
Con
* Re: dynamic-hz
From: Pavel Machek @ 2004-12-12 23:42 UTC
To: Con Kolivas
Cc: Andrea Arcangeli, linux-kernel

Hi!

> >The overhead is a single l1 cacheline in the paths manipulating HZ
> >(rather than having an immediate value hardcoded in the asm, it reads it
> >from a memory location not in the icache). Plus there are some
> >conversion routines in the USER_HZ usages. It's not a measurable
> >difference.
>
> Just being devils advocate here...
>
> I had variable Hz in my tree for a while and found there was one
> solitary purpose to setting Hz to 100; to silence cheap capacitors.
>
> The rest of my users that were setting Hz to 100 for so-called
> performance gains were doing so under the false impression that cpu
> usage was lower simply because of the woefully inaccurate cpu usage
> calcuation at 100Hz.
>
> The performance benefit, if any, is often lost in noise during
> benchmarks and when there, is less than 1%. So I was wondering if you
> had some specific advantage in mind for this patch? Is there some
> arch-specific advantage? I can certainly envision disadvantages to lower Hz.

Actually, I measured about 1W power savings with HZ=100. That's about as much as spindown of disk saves...

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
* Re: dynamic-hz
From: Con Kolivas @ 2004-12-13 0:09 UTC
To: Pavel Machek
Cc: Con Kolivas, Andrea Arcangeli, linux-kernel

Pavel Machek writes:
> Hi!
>
>> >The overhead is a single l1 cacheline in the paths manipulating HZ
>> >(rather than having an immediate value hardcoded in the asm, it reads it
>> >from a memory location not in the icache). Plus there are some
>> >conversion routines in the USER_HZ usages. It's not a measurable
>> >difference.
>>
>> Just being devils advocate here...
>>
>> I had variable Hz in my tree for a while and found there was one
>> solitary purpose to setting Hz to 100; to silence cheap capacitors.
>>
>> The rest of my users that were setting Hz to 100 for so-called
>> performance gains were doing so under the false impression that cpu
>> usage was lower simply because of the woefully inaccurate cpu usage
>> calcuation at 100Hz.
>>
>> The performance benefit, if any, is often lost in noise during
>> benchmarks and when there, is less than 1%. So I was wondering if you
>> had some specific advantage in mind for this patch? Is there some
>> arch-specific advantage? I can certainly envision disadvantages to lower Hz.
>
> Actually, I measured about 1W power savings with HZ=100. That's about
> as much as spindown of disk saves...

How does the popular proprietary operating system cope with this? My understanding is they run 1000Hz yet they have good power saving and quiet capacitors. Presumably they do a lot less per timer tick?

Cheers,
Con
* Re: dynamic-hz
From: Jan Engelhardt @ 2004-12-13 8:37 UTC
Cc: linux-kernel

> How does the popular proprietary operating system cope with this? My
> understanding is they run 1000Hz yet they have good power saving and quiet
> capacitors. Presumably they do a lot less per timer tick?

Either that or they maybe use a dynamic ticker, something that adjusts itself between 100 and 1000 Hz.

Jan Engelhardt
-- 
ENOSPC
* Re: dynamic-hz
From: Pavel Machek @ 2004-12-13 10:43 UTC
To: Con Kolivas
Cc: Andrea Arcangeli, linux-kernel

Hi!

> >Actually, I measured about 1W power savings with HZ=100. That's about
> >as much as spindown of disk saves...
>
> How does the popular proprietary operating system cope with this? My
> understanding is they run 1000Hz yet they have good power saving and quiet
> capacitors. Presumably they do a lot less per timer tick?

Doing a lot less per timer tick is not going to help much... Your cpu needs to wake up anyway, and waking the CPU takes a lot of time and a lot of power, probably way more power than executing the timer interrupt.

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
* Re: dynamic-hz
From: Andrea Arcangeli @ 2004-12-13 11:08 UTC
To: Pavel Machek
Cc: Con Kolivas, linux-kernel

On Mon, Dec 13, 2004 at 11:43:21AM +0100, Pavel Machek wrote:
> Doing lot less per timer tick is not going to help much... You cpu

I also doubt we can do significantly less per timer tick. There's some new code and locks like the monotonic_lock, but that's going to be lost in the noise; the high-level irq interface has some overhead too, but that's going to be lost in the noise as well. The rest pretty much cannot be avoided. I didn't measure it, but I suspect the slowest part might actually be the outb_p/inb_p and the kernel enter/exit.
* Re: dynamic-hz
From: john stultz @ 2004-12-13 19:36 UTC
To: Andrea Arcangeli
Cc: Pavel Machek, Con Kolivas, lkml

On Mon, 2004-12-13 at 03:08, Andrea Arcangeli wrote:
> On Mon, Dec 13, 2004 at 11:43:21AM +0100, Pavel Machek wrote:
> > Doing lot less per timer tick is not going to help much... You cpu
>
> I also doubt we can do significantly less per timer tick.

Well, I'd like to see the timeofday timekeeping work reduced so we don't do it every tick. Instead it would become a scheduled event that goes off every second or so.

thanks
-john
* Re: dynamic-hz
From: Andrea Arcangeli @ 2004-12-12 23:43 UTC
To: Con Kolivas
Cc: Pavel Machek, linux-kernel

On Mon, Dec 13, 2004 at 10:36:19AM +1100, Con Kolivas wrote:
> The performance benefit, if any, is often lost in noise during
> benchmarks and when there, is less than 1%. So I was wondering if you
> had some specific advantage in mind for this patch? Is there some
> arch-specific advantage? I can certainly envision disadvantages to lower Hz.

The last number I have here is 1% for a kernel compile. We're not talking fancy desktop stuff here, we're talking about raw computing servers that run in userspace 99.9% of the time, where the 1% loss is going to be multiplied dozens or hundreds of times. For those, HZ=1000 is a pure, tangible disadvantage.

For desktops, 1% of cpu being lost is not an issue of course.
* Re: dynamic-hz
From: Con Kolivas @ 2004-12-13 0:18 UTC
To: Andrea Arcangeli
Cc: Con Kolivas, Pavel Machek, linux-kernel

Andrea Arcangeli writes:
> On Mon, Dec 13, 2004 at 10:36:19AM +1100, Con Kolivas wrote:
>> The performance benefit, if any, is often lost in noise during
>> benchmarks and when there, is less than 1%. So I was wondering if you
>> had some specific advantage in mind for this patch? Is there some
>> arch-specific advantage? I can certainly envision disadvantages to lower Hz.
>
> My last number I've here is 1% for kernel compile. We're not talking
> fancy desktop stuff here, we're talking about raw computing servers that
> runs in userspace 99.9% of the time where the 1% loss is going to be
> multiplied dozen or hundred of times. For those HZ=1000 is a pure
> tangible disavantage.
>
> For desktops 1% of cpu being lost is not an issue of course.

Thanks. I have to admit that the real reason I wrote this email was for this discussion to go on record so that desktop users would not get inappropriately excited by this change.

Cheers,
Con
* Re: dynamic-hz
From: Andrea Arcangeli @ 2004-12-13 0:27 UTC
To: Con Kolivas
Cc: Pavel Machek, linux-kernel

On Mon, Dec 13, 2004 at 11:18:15AM +1100, Con Kolivas wrote:
> Thanks. I have to admit that the real reason I wrote this email was for
> this discussion to go on record so that desktop users would not get
> inappropriately excited by this change.

Sure, the desktop doesn't need this. The reason somebody is asking for it is that the desktop stuff hurt some other non-desktop usages. In fact my 2.4 tree set HZ=1000 by default if the 'desktop' parameter was passed to the kernel (so that I could lower the timeslice accordingly too, without losing the effect of the nice levels between nice 0 and +19).

The other new case where I'm asked for this feature is again not the desktop but the high-end laptop with cpu throttling down to 80 MHz, and what Pavel mentioned about the lower consumption. Perhaps we could do variable HZ there, though I doubt it has a PIT that can be reprogrammed with sane performance.

Very few people are going to get a real benefit from HZ=1000, but I certainly agree it's worth keeping HZ=1000 on desktops, since on an idle machine the downside of the more frequent irq surely isn't measurable, while having shorter timeslices may be visible with many tasks, and shorter timeslices require a faster HZ to preserve the nice levels.
* Re: dynamic-hz
From: Zwane Mwaikambo @ 2004-12-13 1:50 UTC
To: Andrea Arcangeli
Cc: Con Kolivas, Pavel Machek, linux-kernel

On Mon, 13 Dec 2004, Andrea Arcangeli wrote:
> Sure, desktop doesn't need this, the reason somebody is asking for it,
> is that the desktop stuff hurted some other non-desktop usages. Infact
> my 2.4 tree was setting by default HZ=1000 if 'desktop' paramter was
> passed to the kernel (so that I could lower the timeslice accordingly
> too, without losing the effect of the nicelevels between nice 0 and
> +19).
>
> The other new case where I'm asked for this feature is again not the
> desktop but the high end laptop with cpu throttling down to 80mhz, and
> what Pavel mentioned about the lower consumption. Perhaps we could do
> variable HZ there, though I doubt it has a pit that can be reprogrammed
> with sane performance.

Well, most x86(64) these days have local APICs, and that provides a relatively inexpensive one-shot timer mode.
* Re: dynamic-hz
From: Andrea Arcangeli @ 2004-12-13 11:28 UTC
To: Zwane Mwaikambo
Cc: Con Kolivas, Pavel Machek, linux-kernel

On Sun, Dec 12, 2004 at 06:50:30PM -0700, Zwane Mwaikambo wrote:
> Well most x86(64) these days have local APICs and that provides a
> relatively inexpensive one shot timer mode.

I doubt a one-shot is appropriate. The irq latency is variable and we won't be able to atomically read the tsc and rearm the one-shot timer. The intermediate error will propagate over time. You were the one making the case of the NMI: the NMI will completely screw any attempt to rearm the one-shot timer accurately (though I don't mind too much, like for the sti; hlt, since an NMI is practically impossible to trigger in production; if an NMI is fired we have more trouble than the 1/HZ latency on a pending wakeup or the system time going off on a tangent ;)

Note that what we would have to implement to use a one-shot timer for timekeeping is very similar to the algorithm we already have for the case where the timer irq gets lost because we lost one tick. My USB modem generates a flood of irq latencies >1msec (I tried to track down where they come from but I failed; it seems not to be a cli but just the usb_uhci interrupt taking 3msec to execute, and the timer irq failing to execute nested; perhaps I could fix it by forcing irq priorities by hand), so the tick-loss adjustment always triggers on my firewall, and it constantly goes into the future by a minute per hour or so. I had to hack the code myself to reduce the tsc value a bit, and now it's almost on time, randomly deviating into the future and the past (note that the deviation with the mainline code is so huge that ntpd has no way to fix it, and it's like having ntp turned off).

It's too bad I couldn't find any bug in the tick-loss adjustment algorithm yet. In the current tick-loss adjustment case it's the delay_at_last_interrupt and rdtscl that can't be atomic, and that will force an error on us. In the one-shot case it's the read of the tsc and the rearming that cannot be atomic, and that will force an error on the system time. Now perhaps the error is small enough with a chip that is fast to program like the apic, but the awful results I've got out of the lost-tick adjustment scare me a bit about depending on a variable error to keep the system time accurate. Even with the PIT, HZ=100/1000 are two numbers where we can get decent accuracy; there are probably other frequencies where the accuracy is worse.

(btw, my firewall system time will get fixed too by dynamic-hz HZ=100. It's a pure waste to keep my firewall at HZ=1000 even if I didn't have a constant irq latency of 3-4 msec [measured with rdtsc], though I didn't mention this yet because dynamic-hz in my firewall case would be a pure band-aid. Even fixing the tick-loss adjustment would be a band-aid; the only thing to fix is the usb irq that runs for 3-4 msec without returning.)
* Re: dynamic-hz
From: Pavel Machek @ 2004-12-13 12:43 UTC
To: Andrea Arcangeli
Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > Well most x86(64) these days have local APICs and that provides a
> > relatively inexpensive one shot timer mode.
>
> I doubt a one shot is appropriate. The irq latency is variable and we
> won't be able to atomically read tsc and rearm the one-shot timer. The
> intemediate error will propagate over time.

But that does not matter, right? Yes, one-shot timer will not fire exactly at right place, but as long as you are reading TSC and basing next shot on current time, error should not accumulate.

Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.
* Re: dynamic-hz
From: Andrea Arcangeli @ 2004-12-13 12:58 UTC
To: Pavel Machek
Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi Pavel,

On Mon, Dec 13, 2004 at 01:43:13PM +0100, Pavel Machek wrote:
> But that does not matter, right? Yes, one-shot timer will not fire
> exactly at right place, but as long as you are reading TSC and basing
> next shot on current time, error should not accumulate.

As said in the rest of the message, the error (or some other error) accumulates heavily today in the tick-loss compensation/adjustment algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical about using one-shots that have the very same problem as the tick-loss adjustment algorithm. Admittedly, reprogramming the apic is faster than reading the pit to get delay_at_last_interrupt, but I'm still not sure it will work fine.

At the very least I'd first invest in finding out whether the tick adjustment is totally malfunctioning because of a real, tangible bug, and not simply because it's unfixable (I have tried to find the real bug and failed so far, so I'm starting to think it's unfixable when it's invoked as frequently as it is with the broken usb irq of my adsl modem).

> [..] for their patent abuse against Java.

java isn't open source regardless of patents, use python instead ;).
* Re: dynamic-hz
From: Pavel Machek @ 2004-12-13 19:12 UTC
To: Andrea Arcangeli
Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > But that does not matter, right? Yes, one-shot timer will not fire
> > exactly at right place, but as long as you are reading TSC and basing
> > next shot on current time, error should not accumulate.
>
> As said in the rest of the message, the error (or some other error)
> accumulates heavily today in the tick-loss compensation/adjustment
> algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> about

I do not see how it should accumulate. Let's have a working TSC. You want to emulate a fixed-period timer with a single-shot timer.

	int should_fire_at;

	void handle_single_shot()
	{
		int delay;
	retry:
		/* advance the ideal deadline by one tick period */
		should_fire_at += loops_per_second/HZ;
		delay = should_fire_at - get_tsc();
		/* deadline already passed: skip ahead to the next one */
		if (delay < 0)
			goto retry;
		do_single_shot_in(delay);
	}

I'm not sure what's broken with the compensation code, but using a single-shot timer is not broken in theory.

> > [..] for their patent abuse against Java.
>
> java isn't open source regardless of patents, use python instead ;).

Yes, java is bad, but using patents against it is evil, too. Plus python does not yet run on my cellphone ;-).

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
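The single-shot emulation argued for in this message can be sketched as a small, hardware-free model. It assumes a free-running cycle counter whose value is passed in as `now`; the struct and function names are hypothetical, and the tick counter is an addition for illustration rather than part of the pseudocode in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for emulating a periodic tick with a one-shot timer. */
struct oneshot_state {
    uint64_t next_deadline; /* absolute counter value of the next ideal tick */
    uint64_t period;        /* counter cycles per emulated tick */
    uint64_t ticks;         /* emulated ticks accounted so far */
};

/* Called from the one-shot handler with the counter value read there.
 * Deadlines advance in exact multiples of the period, so handler latency
 * shortens the next delay instead of shifting every later deadline.
 * Returns the delay to program into the one-shot timer. */
static uint64_t oneshot_rearm(struct oneshot_state *s, uint64_t now)
{
    assert(s->period > 0);
    do {
        s->next_deadline += s->period; /* move to the following ideal tick */
        s->ticks++;                    /* account the tick that has elapsed */
    } while (s->next_deadline < now);  /* already missed: skip ahead */
    return s->next_deadline - now;
}
```

For example, with a period of 1000 cycles and a handler that runs 5 cycles late at `now = 1005`, the returned delay is 995, so the next shot still targets cycle 2000. If the handler then runs very late at `now = 5300`, four deadlines are accounted at once and the next shot targets cycle 6000.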
* Re: dynamic-hz
From: john stultz @ 2004-12-13 20:34 UTC
To: Pavel Machek
Cc: Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

On Mon, 2004-12-13 at 11:12, Pavel Machek wrote:
> Hi!
>
> > > But that does not matter, right? Yes, one-shot timer will not fire
> > > exactly at right place, but as long as you are reading TSC and basing
> > > next shot on current time, error should not accumulate.
> >
> > As said in the rest of the message, the error (or some other error)
> > accumulates heavily today in the tick-loss compensation/adjustment
> > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> > about
>
> I do not see how it should accumulate. Lets have working TSC. You want
> to emulate fixed-period timer with single-shot timer.

It's caused by the fact that we don't use the TSC to accumulate time. We are instead interpolating between timer ticks and the TSC, where the timer tick is what really accumulates time, and the TSC is used for inter-tick time keeping (with the exception of the lost tick compensation code). Unfortunately interrupt delay and queueing can cause situations where a tick appears to be lost, but then immediately after a second one appears. In this case we add two, compensating for the loss, and then add one more.

One could try to catch these early-seeming ticks w/ similar compensation code, but due to TSC calibration error there are sure to be holes where more time inconsistencies could poke through.

My feeling is that we need to stop interpolating and just trust one time source (ie: the TSC or ACPIPM or HPET or whatever). Check out my timeofday patches for more details.

thanks
-john
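The double counting described in this message can be reproduced with a toy model. This is a simplified illustration and not the code in timer_tsc.c: it assumes a compensation scheme that only looks at the time elapsed since the previous handler run, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accounting state for the interval-based scheme. */
struct tick_acct {
    uint64_t last_irq; /* counter value at the previous handler run */
    uint64_t ticks;    /* ticks accumulated so far */
};

/* Toy lost-tick compensation: if two or more periods passed since the
 * previous handler run, credit them all; otherwise credit one tick. */
static void account_by_interval(struct tick_acct *a, uint64_t now,
                                uint64_t period)
{
    assert(period > 0);
    uint64_t elapsed = (now - a->last_irq) / period;
    a->ticks += (elapsed >= 2) ? elapsed : 1;
    a->last_irq = now;
}

/* Alternative: trust a single time source and derive the tick count
 * directly from the free-running counter. */
static uint64_t ticks_from_counter(uint64_t start, uint64_t now,
                                   uint64_t period)
{
    assert(period > 0);
    return (now - start) / period;
}
```

With a period of 1000, take handler runs at 1000, then at 3050 (the tick due at 2000 was delayed), then at 3060 (the queued tick due at 3000 arrives right behind it). The interval scheme credits 1 + 2 + 1 = 4 ticks, while the counter says only 3 periods have elapsed, so the clock runs one tick ahead each time this pattern repeats.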
* Re: dynamic-hz
From: Pavel Machek @ 2004-12-13 20:49 UTC
To: john stultz
Cc: Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

Hi!

> > > > But that does not matter, right? Yes, one-shot timer will not fire
> > > > exactly at right place, but as long as you are reading TSC and basing
> > > > next shot on current time, error should not accumulate.
> > >
> > > As said in the rest of the message, the error (or some other error)
> > > accumulates heavily today in the tick-loss compensation/adjustment
> > > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> > > about
> >
> > I do not see how it should accumulate. Lets have working TSC. You want
> > to emulate fixed-period timer with single-shot timer.
>
> Its caused by the fact that we don't use the the TSC to accumulate time.
> We are instead interpolating between timer ticks and the TSC, where

Yes, it was supposed to be simple, so that Andrea understands that there's nothing inherently broken with single-shot timers.

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
* Re: dynamic-hz
From: Andrea Arcangeli @ 2004-12-14 2:04 UTC
To: Pavel Machek
Cc: john stultz, Zwane Mwaikambo, Con Kolivas, lkml

On Mon, Dec 13, 2004 at 09:49:33PM +0100, Pavel Machek wrote:
> Yes, it was supposed to be simple, so that Andrea understands that
> there's nothing inherently broken with single-shot timers.

A single-shot timer is unusable for system time accounting, at least as long as you want to allow nmi. This is a tangible fact, no matter how simple the example is. Even the lost tick compensation is not working at all, and it has the same issues that the one-shot timer has in keeping the system time accurate.

Pavel, write a program that calls iopl(2) and then does cli(); wait 3msec; sti(); wait 3msec; cli(); wait 3msec; and so on in a loop. Then watch your system time go into the future at a rate of a few minutes per hour, then fix it. After you've fixed it, you'll get my attention about one-shot timers again ;). I already tried to fix it and failed so far, since I can't see bugs in the current code. (Actually I fixed it by breaking the code, and dropping some ticks somewhere.)
[parent not found: <20041214013924.GB14617@atomide.com>]
* Re: dynamic-hz [not found] ` <20041214013924.GB14617@atomide.com> @ 2004-12-14 9:37 ` Pavel Machek 2004-12-14 21:18 ` dynamic-hz Tony Lindgren 0 siblings, 1 reply; 126+ messages in thread From: Pavel Machek @ 2004-12-14 9:37 UTC (permalink / raw) To: Tony Lindgren Cc: john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml Hi! > > > > > > But that does not matter, right? Yes, one-shot timer will not fire > > > > > > exactly at right place, but as long as you are reading TSC and basing > > > > > > next shot on current time, error should not accumulate. > > > > > > > > > > As said in the rest of the message, the error (or some other error) > > > > > accumulates heavily today in the tick-loss compensation/adjustment > > > > > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical > > > > > about > > > > > > > > I do not see how it should accumulate. Lets have working TSC. You want > > > > to emulate fixed-period timer with single-shot timer. > > > > > > Its caused by the fact that we don't use the the TSC to accumulate time. > > > We are instead interpolating between timer ticks and the TSC, where > > > > Yes, it was supposed to be simple, so that Andrea understands that > > there's nothing inherently broken with single-shot timers. > > Just a quick comment; The timer does not need to be single-shot > all the time, it can be a combination of continuous and variable > length timer, and it can change depending on the system load. > > We recently added VST support for OMAP in linux-omap bk tree, and > made some changes to the previous VST implementations that might be > of interest: ... > The patch in question is at: > > http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18 Wow, that's basically 8 lines of code plus driver for new hardware... Is it really that simple? Pavel -- People were complaining that M$ turns users into beta-testers... 
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 9:37 ` dynamic-hz Pavel Machek @ 2004-12-14 21:18 ` Tony Lindgren 2004-12-14 22:06 ` dynamic-hz Pavel Machek 0 siblings, 1 reply; 126+ messages in thread From: Tony Lindgren @ 2004-12-14 21:18 UTC (permalink / raw) To: Pavel Machek Cc: john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml * Pavel Machek <pavel@suse.cz> [041214 01:38]: > Hi! > > > > > > > > But that does not matter, right? Yes, one-shot timer will not fire > > > > > > > exactly at right place, but as long as you are reading TSC and basing > > > > > > > next shot on current time, error should not accumulate. > > > > > > > > > > > > As said in the rest of the message, the error (or some other error) > > > > > > accumulates heavily today in the tick-loss compensation/adjustment > > > > > > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical > > > > > > about > > > > > > > > > > I do not see how it should accumulate. Lets have working TSC. You want > > > > > to emulate fixed-period timer with single-shot timer. > > > > > > > > Its caused by the fact that we don't use the the TSC to accumulate time. > > > > We are instead interpolating between timer ticks and the TSC, where > > > > > > Yes, it was supposed to be simple, so that Andrea understands that > > > there's nothing inherently broken with single-shot timers. > > > > Just a quick comment; The timer does not need to be single-shot > > all the time, it can be a combination of continuous and variable > > length timer, and it can change depending on the system load. > > > > We recently added VST support for OMAP in linux-omap bk tree, and > > made some changes to the previous VST implementations that might be > > of interest: > ... > > The patch in question is at: > > > > http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18 > > Wow, that's basically 8 lines of code plus driver for new > hardware... 
Is it really that simple? Yeah, the key things are reprogramming the timer in the idle loop based on next_timer_interrupt(), and calling timer_interrupt from other interrupts as well :) Should we try a similar patch for x86/amd64? I'm not sure which timers to use though? One should be programmable length for the interrupt, and the other continuous for the timekeeping. BTW, looks like my upgraded mail server is still a bit messed up, and my original post did not make it to the list. But most of the message is quoted above anyways. Here's the link to the patch again as tinyurl: http://tinyurl.com/69n4k Tony ^ permalink raw reply [flat|nested] 126+ messages in thread
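A rough sketch of the two pieces described above may help readers following along: choosing how far ahead to program the interrupt timer when going idle, and catching the tick count up from a free-running counter whenever any interrupt arrives. This is a userspace model under stated assumptions, not the OMAP patch. `struct tick_state`, `idle_ticks_to_program()`, `catch_up_ticks()` and the `max_skip` hardware limit are hypothetical names; only next_timer_interrupt() comes from the message, and here its result is simply passed in as `next_expiry`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timekeeping state driven by a continuous (free-running) counter. */
struct tick_state {
	uint64_t jiffies;         /* ticks accounted so far */
	uint64_t last_count;      /* counter value at the last accounted tick boundary */
	uint64_t counts_per_tick; /* counter increments per tick (counter rate / HZ) */
};

/*
 * Idle path: how many ticks ahead the programmable timer should fire.
 * next_expiry plays the role of the value next_timer_interrupt() reports;
 * max_skip is an assumed hardware limit on the programmable interval.
 */
uint64_t idle_ticks_to_program(uint64_t now, uint64_t next_expiry, uint64_t max_skip)
{
	uint64_t delta;

	assert(max_skip >= 1);
	/* An overdue or immediate timer still needs the very next tick. */
	delta = (next_expiry > now) ? next_expiry - now : 1;
	return (delta > max_skip) ? max_skip : delta;
}

/*
 * Interrupt path (timer or any other interrupt): account for every whole
 * tick that elapsed on the continuous counter. The fractional remainder
 * stays behind in last_count, so nothing is lost between calls.
 * Returns the number of ticks added.
 */
uint64_t catch_up_ticks(struct tick_state *ts, uint64_t count_now)
{
	uint64_t elapsed;

	assert(ts->counts_per_tick > 0);
	elapsed = (count_now - ts->last_count) / ts->counts_per_tick;
	ts->jiffies += elapsed;
	ts->last_count += elapsed * ts->counts_per_tick;
	return elapsed;
}
```

The split mirrors the two-timer arrangement in the message: the programmable timer only decides when to wake up, while the continuous counter alone decides how much time has passed.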
* Re: dynamic-hz 2004-12-14 21:18 ` dynamic-hz Tony Lindgren @ 2004-12-14 22:06 ` Pavel Machek 2004-12-14 23:00 ` dynamic-hz linux-os 2004-12-14 23:04 ` dynamic-hz Tony Lindgren 0 siblings, 2 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-14 22:06 UTC (permalink / raw) To: Tony Lindgren Cc: john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml Hi! > > > The patch in question is at: > > > > > > http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18 > > > > Wow, that's basically 8 lines of code plus driver for new > > hardware... Is it really that simple? > > Yeah, the key things are reprogramming the timer in the idle loop > based on next_timer_interrupt(), and calling timer_interrupt from > other interrupts as well :) > > Should we try a similar patch for x86/amd64? I'm not sure which timers > to use though? One should be programmable length for the interrupt, > and the other continuous for the timekeeping. Yes, it would certainly be interesting. 5% power savings, and no singing capacitors, while keeping HZ=1000. Sounds good to me. There are about 1000 timers available in PC, each having its own quirks. CMOS clock should be able to generate 1024Hz periodic timer (we currently do not use) and TSC we currently use for periodic timer should be usable in single-shot mode. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 22:06 ` dynamic-hz Pavel Machek @ 2004-12-14 23:00 ` linux-os 2004-12-14 23:13 ` dynamic-hz Tony Lindgren 2004-12-14 23:04 ` dynamic-hz Tony Lindgren 1 sibling, 1 reply; 126+ messages in thread From: linux-os @ 2004-12-14 23:00 UTC (permalink / raw) To: Pavel Machek Cc: Tony Lindgren, john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml On Tue, 14 Dec 2004, Pavel Machek wrote: > Hi! > >>>> The patch in question is at: >>>> >>>> http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18 >>> >>> Wow, that's basically 8 lines of code plus driver for new >>> hardware... Is it really that simple? >> >> Yeah, the key things are reprogramming the timer in the idle loop >> based on next_timer_interrupt(), and calling timer_interrupt from >> other interrupts as well :) >> >> Should we try a similar patch for x86/amd64? I'm not sure which timers >> to use though? One should be programmable length for the interrupt, >> and the other continuous for the timekeeping. > > Yes, it would certainly be interesting. 5% power savings, and no > singing capacitors, while keeping HZ=1000. Sounds good to me. > > There are about 1000 timers available in PC, each having its own > quirks. CMOS clock should be able to generate 1024Hz periodic timer > (we currently do not use) and TSC we currently use for periodic timer > should be usable in single-shot mode. > Pavel > -- If you use that RTC timer, it needs to be something that can be turned OFF. Many embedded applications use that because it's the only timer that the OS doesn't muck with. It also has very low noise which makes it a good timing source for IIR filters for high precision, low data-rate data acquisition (like 24 bits). Since it generates an edge, its interrupt can't be shared. I certainly hope that you don't use it. One can read the time without disturbing the interrupt rate. 
One just needs to use the existing rtc_lock and not spin with the lock being held. Currently the kernel RTC software allocates the RTC interrupt even though it doesn't use it. This makes it necessary to compile the RTC as a module and then remove it when another driver needs to use the RTC interrupt source. Cheers, Dick Johnson Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips). Notice : All mail here is now cached for review by John Ashcroft. 98.36% of all statistics are fiction. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 23:00 ` dynamic-hz linux-os @ 2004-12-14 23:13 ` Tony Lindgren 2004-12-22 20:02 ` dynamic-hz Tony Lindgren 0 siblings, 1 reply; 126+ messages in thread From: Tony Lindgren @ 2004-12-14 23:13 UTC (permalink / raw) To: linux-os Cc: Pavel Machek, john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml * linux-os <linux-os@chaos.analogic.com> [041214 15:04]: > On Tue, 14 Dec 2004, Pavel Machek wrote: > > >Hi! > > > >>>>The patch in question is at: > >>>> > >>>>http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18 > >>> > >>>Wow, that's basically 8 lines of code plus driver for new > >>>hardware... Is it really that simple? > >> > >>Yeah, the key things are reprogramming the timer in the idle loop > >>based on next_timer_interrupt(), and calling timer_interrupt from > >>other interrupts as well :) > >> > >>Should we try a similar patch for x86/amd64? I'm not sure which timers > >>to use though? One should be programmable length for the interrupt, > >>and the other continuous for the timekeeping. > > > >Yes, it would certainly be interesting. 5% power savings, and no > >singing capacitors, while keeping HZ=1000. Sounds good to me. > > > >There are about 1000 timers available in PC, each having its own > >quirks. CMOS clock should be able to generate 1024Hz periodic timer > >(we currently do not use) and TSC we currently use for periodic timer > >should be usable in single-shot mode. > > Pavel > >-- > > If you use that RTC timer, it needs to be something that can be > turned OFF. Many embedded applications use that because its the > only timer that the OS doesn't muck with. It also has very low > noise which makes in a good timing source for IIR filters for > high precision, low data-rate data acquisition (like 24 bits). OK, thanks for the information. That could be the continuous timer then, and TSC the periodic timer. 
> Since it generates an edge, its interrupt can't be shared. > I certainly hope that you don't use it. One can read the > time without disturbing the interrupt rate. One just > needs to use the existing rtc_lock and not spin with > the lock being held. Yeah, the timer update would be just a read from the RTC timer. > Currently the kernel RTC software allocates the RTC interrupt > even though it doesn't use it. This makes it necessary to > compile the RTC as a module and then remove it when another > driver needs to use the RTC interrupt source. The interrupt could be used for timer wrap only. Tony ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 23:13 ` dynamic-hz Tony Lindgren @ 2004-12-22 20:02 ` Tony Lindgren 0 siblings, 0 replies; 126+ messages in thread From: Tony Lindgren @ 2004-12-22 20:02 UTC (permalink / raw) To: linux-os Cc: Pavel Machek, john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml * Tony Lindgren <tony@atomide.com> [041214 16:22]: > * linux-os <linux-os@chaos.analogic.com> [041214 15:04]: > > On Tue, 14 Dec 2004, Pavel Machek wrote: > > > > >Hi! > > > > > >>>>The patch in question is at: > > >>>> > > >>>>http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18 > > >>> > > >>>Wow, that's basically 8 lines of code plus driver for new > > >>>hardware... Is it really that simple? > > >> > > >>Yeah, the key things are reprogramming the timer in the idle loop > > >>based on next_timer_interrupt(), and calling timer_interrupt from > > >>other interrupts as well :) > > >> > > >>Should we try a similar patch for x86/amd64? I'm not sure which timers > > >>to use though? One should be programmable length for the interrupt, > > >>and the other continuous for the timekeeping. > > > > > >Yes, it would certainly be interesting. 5% power savings, and no > > >singing capacitors, while keeping HZ=1000. Sounds good to me. > > > > > >There are about 1000 timers available in PC, each having its own > > >quirks. CMOS clock should be able to generate 1024Hz periodic timer > > >(we currently do not use) and TSC we currently use for periodic timer > > >should be usable in single-shot mode. > > > Pavel > > >-- > > > > If you use that RTC timer, it needs to be something that can be > > turned OFF. Many embedded applications use that because its the > > only timer that the OS doesn't muck with. It also has very low > > noise which makes in a good timing source for IIR filters for > > high precision, low data-rate data acquisition (like 24 bits). > > OK, thanks for the information. 
That could be the continuous timer > then, and TSC the periodic timer. > > > Since it generates an edge, its interrupt can't be shared. > > I certainly hope that you don't use it. One can read the > > time without disturbing the interrupt rate. One just > > needs to use the existing rtc_lock and not spin with > > the lock being held. > > Yeah, the timer update would be just a read from the RTC timer. > > > Currently the kernel RTC software allocates the RTC interrupt > > even though it doesn't use it. This makes it necessary to > > compile the RTC as a module and then remove it when another > > driver needs to use the RTC interrupt source. > > The interrupt could be used for timer wrap only. Well, just to follow up: I did some experiments over the weekend on my old Athlon box, and it looks like it's doable. I'll set up something common where various timers can register their no-tick functions. So far I have the APIC timer doing the no-tick interrupts, and nothing yet for the timer to update time from. The code will use whatever timers are available, as long as they implement the right functions. I'll post some patches when I have something working... Probably after the holidays. Tony ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 22:06 ` dynamic-hz Pavel Machek 2004-12-14 23:00 ` dynamic-hz linux-os @ 2004-12-14 23:04 ` Tony Lindgren 1 sibling, 0 replies; 126+ messages in thread From: Tony Lindgren @ 2004-12-14 23:04 UTC (permalink / raw) To: Pavel Machek Cc: john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml * Pavel Machek <pavel@suse.cz> [041214 14:07]: > Hi! > > > > > The patch in question is at: > > > > > > > > http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18 > > > > > > Wow, that's basically 8 lines of code plus driver for new > > > hardware... Is it really that simple? > > > > Yeah, the key things are reprogramming the timer in the idle loop > > based on next_timer_interrupt(), and calling timer_interrupt from > > other interrupts as well :) > > > > Should we try a similar patch for x86/amd64? I'm not sure which timers > > to use though? One should be programmable length for the interrupt, > > and the other continuous for the timekeeping. > > Yes, it would certainly be interesting. 5% power savings, and no > singing capacitors, while keeping HZ=1000. Sounds good to me. > > There are about 1000 timers available in PC, each having its own > quirks. CMOS clock should be able to generate 1024Hz periodic timer > (we currently do not use) and TSC we currently use for periodic timer > should be usable in single-shot mode. I guess you mean to use the CMOS clock for continuous timer, and TSC for periodic timer? OK, I'll take a look at it later this week or over the weekend. Haven't looked at the x86 timer code for a while, but I think I'll set up a new clock where we can just register a timer update function and a periodic tick function. That way we can easily use whatever hardware timers are available. Tony ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 20:34 ` dynamic-hz john stultz 2004-12-13 20:49 ` dynamic-hz Pavel Machek @ 2004-12-14 2:46 ` Andrea Arcangeli 2004-12-14 19:24 ` dynamic-hz john stultz 1 sibling, 1 reply; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-14 2:46 UTC (permalink / raw) To: john stultz; +Cc: Pavel Machek, Zwane Mwaikambo, Con Kolivas, lkml On Mon, Dec 13, 2004 at 12:34:00PM -0800, john stultz wrote: > source (ie: the TSC or ACPIPM or HPET or whatever). Check out my How long is the TSC calibration going to last before introducing visible errors? Is there any error introduced while we transfer the accuracy of the PIT to the accuracy of the TSC during calibration? It would be much simpler to use only the TSC to provide system time, but I assume we would already be doing it if it weren't for the lost accuracy. Plus, are you already handling the cpufreq changes that powersaved makes every second? Doesn't that introduce further inaccuracy in the system time? As for the lost-tick compensation, it's not working at all: with it, my system runs into the future as fast as it would run into the past with it disabled. So the only effect I get from the lost-tick compensation is that the clock moves into the future instead of into the past, but the magnitude of the error is the same, so it's not working at all. The real bug is the USB irq handler, which takes 3-4 msec to execute, and I get a constant load of those irqs from the ADSL modem. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 2:46 ` dynamic-hz Andrea Arcangeli @ 2004-12-14 19:24 ` john stultz 0 siblings, 0 replies; 126+ messages in thread From: john stultz @ 2004-12-14 19:24 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Pavel Machek, Zwane Mwaikambo, Con Kolivas, lkml On Mon, 2004-12-13 at 18:46, Andrea Arcangeli wrote: > On Mon, Dec 13, 2004 at 12:34:00PM -0800, john stultz wrote: > > source (ie: the TSC or ACPIPM or HPET or whatever). Check out my > > How long is the TSC calibration going to last before introducing visible > errors? Is there any error introduced while we transfer the accuracy of > the pit to the acuracy of the TSC during calibration? It would be much > simpler to only use the TSC to provide system time, but I assume we > would be already doing it, if it wasn't for the lost accuracy. Well, the TSC is a terrible time source. Currently when interpolating, the error between the TSC and the PIT allows for time inconsistencies. When using it as the sole timesource, accurate calibration does become much more important, because we do accumulate the error. However, NTP or other methods of correcting for poor calibration or drift could be used. I realize not everything can use NTP, but George Anzinger has some code that would use the PIT to measure and adjust the TSC frequency values. Unfortunately I haven't gotten around to looking at it yet. > Plus are you already handling cpufreq changed every second by > powersaved? Doesn't that introduce further inaccuracy in the system > time? Yea, my code currently doesn't have cpufreq hooks, but the cpufreq notifier would act as an interrupt which would save off the accumulated time at the old frequency and update the time source with the new frequency. > As for the lost-tick compensation, it's not working at all, my system > goes as fast in the future as it would go in the past by disabling it. 
> So the only effect I get by the lost tick compensation is that it's > moving in the future instead of in the past, but the magnitude of the > error is the same and in turn it's not working at all. The real bug is > the USB irq handler that takes 3/4msec to execute and I get a constant > load of those irqs from the adsl modem. I agree. Fixing the irq handler is the right solution. thanks -john ^ permalink raw reply [flat|nested] 126+ messages in thread
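The cpufreq handling described in the message above (save the time accumulated at the old frequency, then continue at the new one) can be sketched roughly as follows. This is only an illustration under assumptions, not the code from John's patches: `struct tsc_timesource`, `timesource_read_ns()` and `timesource_set_freq()` are hypothetical names, and the cycle counts are passed in rather than read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical time source whose counter frequency can change at runtime. */
struct tsc_timesource {
	uint64_t base_ns;     /* time accumulated up to last_cycles */
	uint64_t last_cycles; /* cycle count at the last accumulation */
	uint32_t khz;         /* counter frequency currently in effect */
};

/* Current time in ns: accumulated base plus cycles elapsed at the current rate. */
uint64_t timesource_read_ns(const struct tsc_timesource *ts, uint64_t cycles)
{
	assert(ts->khz > 0 && cycles >= ts->last_cycles);
	/*
	 * cycles / (khz * 1000 Hz) seconds == cycles * 10^6 / khz nanoseconds.
	 * The multiplication can overflow for very long intervals; a real
	 * implementation would accumulate periodically to keep the delta small.
	 */
	return ts->base_ns + (cycles - ts->last_cycles) * 1000000ull / ts->khz;
}

/*
 * Frequency-change notification: fold the time that elapsed at the old
 * frequency into base_ns, then switch to the new frequency.
 */
void timesource_set_freq(struct tsc_timesource *ts, uint64_t cycles, uint32_t new_khz)
{
	assert(new_khz > 0);
	ts->base_ns = timesource_read_ns(ts, cycles);
	ts->last_cycles = cycles;
	ts->khz = new_khz;
}
```

For example, 2,000,000 cycles at 2 GHz followed by 1,000,000 cycles at 1 GHz add up to 2 ms. The model assumes the notification arrives exactly at the frequency switch; any delay between the two is charged at the wrong rate.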
* Re: dynamic-hz 2004-12-13 19:12 ` dynamic-hz Pavel Machek 2004-12-13 20:34 ` dynamic-hz john stultz @ 2004-12-14 2:36 ` Andrea Arcangeli 2004-12-14 9:39 ` dynamic-hz Pavel Machek 2004-12-14 9:59 ` dynamic-hz Pavel Machek 1 sibling, 2 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-14 2:36 UTC (permalink / raw) To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel On Mon, Dec 13, 2004 at 08:12:49PM +0100, Pavel Machek wrote: > Hi! > > > > But that does not matter, right? Yes, one-shot timer will not fire > > > exactly at right place, but as long as you are reading TSC and basing > > > next shot on current time, error should not accumulate. > > > > As said in the rest of the message, the error (or some other error) > > accumulates heavily today in the tick-loss compensation/adjustment > > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical > > about > > I do not see how it should accumulate. Lets have working TSC. You want > to emulate fixed-period timer with single-shot timer. > > int should_fire_at; > > void handle_single_shot() > { > int delay; > retry: > should_fire_at += loops_per_second/HZ > delay = should_fire_at - get_tsc(); > if (delay < 0) > goto retry; Here you get a 10-minute-long NMI and you're automatically 10 minutes in the past (or your event gets a 10 sec introduced delay) without a way to track it down. Now in theory we might run this critical section in some special section and restart it by updating regs->eip before returning from the NMI. But that still leaves the unfixable window introduced by the cpu not executing the TSC read and the do_single_shot_in atomically. Given my recent experience with the lost-tick compensation code, which has exactly the same window, I'm not optimistic it's going to keep the system time accurately up to date. Perhaps the APIC is a lot faster and the error won't propagate visibly. 
I'm not against trying, but keeping the system time accurate with a one-shot timer is a lot more complicated than this pseudocode, and it's not obvious at all that your pseudocode will work. > do_single_shot_in(delay); The only other way would be to use the 64bit TSC as the only source for the system time (perhaps that's what you mean with the above pseudocode?). But the calibration code would need changes to allow that. Today we use the calibration divisor only in a small range, so the calibration can be quick, and this way changing the CPU frequency is also easier. Even before thinking about using the one-shot timer, I would like to fix the lost-tick compensation of current production 2.6.9; only then can we talk about tickless by using a one-shot timer. If we can't do the lost-tick compensation without screwing the system time, I don't see how we can do the one-shot timer without screwing the system time. The lost-tick compensation could also be avoided if we used the TSC as the source for gettimeofday, ignored the PIT completely, and used the PIT only to wake up the cpu once in a while. *Then* we could trivially convert the PIT to a one-shot timer too, but as said above the accuracy of the divisor isn't enough, and I've no idea if we can get a real calibration that lasts more than a few hours. My fast_gettimeoffset_quotient is set to 0x2f0271, which means the least significant bit of fast_gettimeoffset_quotient will show up in the least significant bit of gettimeofday after the TSC has counted 2**32 ticks, i.e. after less than 4 seconds on my computers at >1 GHz. That's why gettimeoffset will never return anything longer than 1/HZ, so the error cannot propagate to userspace. ^ permalink raw reply [flat|nested] 126+ messages in thread
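The quotient arithmetic at the end of the message above can be checked with a small worked example. It assumes the definition given in the timer_tsc.c comment quoted in the patch later in this thread, namely that the quotient equals 2^32 * (1 / (clocks per usec)), and that a TSC delta is converted by a 32x32 multiply keeping the high 32 bits. The function names `tsc_delta_to_usec()` and `quotient_to_mhz()` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 32-bit TSC delta to microseconds with a quotient of the form
 * 2^32 / (clocks per usec): multiply and keep the high 32 bits.
 */
uint32_t tsc_delta_to_usec(uint32_t tsc_delta, uint32_t quotient)
{
	return (uint32_t)(((uint64_t)tsc_delta * quotient) >> 32);
}

/* CPU clock in MHz (clocks per usec) implied by a given quotient. */
double quotient_to_mhz(uint32_t quotient)
{
	assert(quotient != 0);
	return 4294967296.0 / quotient;
}
```

Under that assumption, 0x2f0271 (3080817 decimal) corresponds to a clock of roughly 1394 MHz, and 2^32 TSC ticks take about 3080817 usec, or about 3.08 seconds, which agrees with the "less than 4 seconds" figure. Changing the quotient by one (0x2f0272 instead of 0x2f0271) changes the result for a full 2^32-tick delta by exactly 1 usec, which is the sense in which the lowest bit of the quotient only becomes visible after 2**32 ticks.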
* Re: dynamic-hz 2004-12-14 2:36 ` dynamic-hz Andrea Arcangeli @ 2004-12-14 9:39 ` Pavel Machek 2004-12-14 9:59 ` dynamic-hz Pavel Machek 1 sibling, 0 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-14 9:39 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel Hi! > > do_single_shot_in(delay); > > The only other way would be to use the 64bit tsc as the only source for > the system time (perhaps that's what you mean with the above > pseudocode?). But the calibration code would need changes to allow > that. Yes, that's what I meant. > Even before thinking at using the one-shot timer, I would like to > fix the lost tick compensation of current production 2.6.9, only then we > can talk about tickless by using a one-shot timer. If we can't do the > lost-tick compensation without screwing the system time, I don't see how > we can do the one-shot timer without screwing the system time. Okay, I'll take a look. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
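The pseudocode debated in this exchange can be turned into a small userspace simulation to see the claim about non-accumulating error in isolation. This is a toy model under assumptions, not kernel code: the names are hypothetical, `now` stands in for a TSC read, and the handler latency is a deterministic made-up sequence. The model also treats reading the counter and arming the next shot as one atomic step, so it says nothing about the NMI window raised earlier in the thread. The relative re-arm variant is included only for contrast.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for a periodic tick emulated with a one-shot timer. */
struct oneshot_tick {
	uint64_t should_fire_at; /* absolute deadline (cycles) of the pending shot */
	uint64_t period;         /* cycles per emulated tick */
	uint64_t ticks;          /* ticks accounted so far */
};

/*
 * Absolute-deadline handler, after the quoted pseudocode: account for every
 * deadline that has already passed, then return the delay to the next one.
 */
static uint64_t handle_absolute(struct oneshot_tick *t, uint64_t now)
{
	assert(now >= t->should_fire_at);
	do {
		t->ticks++;
		t->should_fire_at += t->period;
	} while (t->should_fire_at <= now);
	return t->should_fire_at - now;
}

/* Naive handler for contrast: one tick per shot, re-armed relative to "now". */
static uint64_t handle_relative(struct oneshot_tick *t, uint64_t now)
{
	t->ticks++;
	t->should_fire_at = now + t->period;
	return t->period;
}

/*
 * Run the emulated timer for at least run_cycles, with each handler
 * invocation delayed by a latency in [0, max_latency]. Returns the number
 * of ticks the emulation is behind an ideal periodic timer at the end.
 */
int64_t simulate_drift(uint64_t period, uint64_t run_cycles,
		       uint64_t max_latency, int absolute)
{
	struct oneshot_tick t = { period, period, 0 };
	uint64_t next_fire = period;
	uint64_t now = 0;
	uint64_t i = 0;

	while (now < run_cycles) {
		/* deterministic pseudo-latency, so runs are repeatable */
		uint64_t latency = (++i * 7919u) % (max_latency + 1);

		now = next_fire + latency; /* the handler runs late */
		next_fire = now + (absolute ? handle_absolute(&t, now)
					    : handle_relative(&t, now));
	}
	return (int64_t)(now / period) - (int64_t)t.ticks;
}
```

In this model the absolute-deadline variant always ends with zero drift, even when the latency exceeds two periods, while the relative variant falls behind by the sum of all latencies. Whether real hardware behaves this way is exactly the point under discussion in the thread.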
* Re: dynamic-hz 2004-12-14 2:36 ` dynamic-hz Andrea Arcangeli 2004-12-14 9:39 ` dynamic-hz Pavel Machek @ 2004-12-14 9:59 ` Pavel Machek 2004-12-14 15:25 ` dynamic-hz Andrea Arcangeli 1 sibling, 1 reply; 126+ messages in thread From: Pavel Machek @ 2004-12-14 9:59 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel Hi! > Even before thinking at using the one-shot timer, I would like to > fix the lost tick compensation of current production 2.6.9, only then we > can talk about tickless by using a one-shot timer. If we can't do the > lost-tick compensation without screwing the system time, I don't see how > we can do the one-shot timer without screwing the system time. Are you using CONFIG_HPET_TIMER by chance? It seems to be missing some strategic -1, TSC (etc) get it right. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 9:59 ` dynamic-hz Pavel Machek @ 2004-12-14 15:25 ` Andrea Arcangeli 2004-12-14 22:02 ` USB making time drift [was Re: dynamic-hz] Pavel Machek 0 siblings, 1 reply; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-14 15:25 UTC (permalink / raw) To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel On Tue, Dec 14, 2004 at 10:59:39AM +0100, Pavel Machek wrote: > Are you using CONFIG_HPET_TIMER by chance? It seems to be missing some > strategic -1, TSC (etc) get it right. I'm not using HPET because this is old hardware; this is with timer_tsc. It must be reproducible on any machine out there. On machines with USB in particular, it should be reproducible even without any userspace testcase doing iopl/cli/sti. Time will silently run into the future at every USB irq (they often last 3-4 msec). ^ permalink raw reply [flat|nested] 126+ messages in thread
* USB making time drift [was Re: dynamic-hz] 2004-12-14 15:25 ` dynamic-hz Andrea Arcangeli @ 2004-12-14 22:02 ` Pavel Machek 2004-12-14 23:16 ` Andrea Arcangeli 2004-12-16 1:15 ` Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] Pavel Machek 0 siblings, 2 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-14 22:02 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel Hi! > On Tue, Dec 14, 2004 at 10:59:39AM +0100, Pavel Machek wrote: > > Are you using CONFIG_HPET_TIMER by chance? It seems to be missing some > > strategic -1, TSC (etc) get it right. > > I'm not using hpet because it's an old hardware, this is with timer_tsc. > It must be reproducible in any machine out there, especially with > machines with usb it should be reproducible even without any userspace > testcase doing iopl/cli/sti. Time will go silenty in the future at every > usb irq (they often last 3/4msec). How much drift do you see? I have machine with UHCI here, and am using usb most of the time (bluetooth for gprs connection), and did not notice too bad drift. ntpdate does some adjustment each time I connect to the network, but it -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-14 22:02 ` USB making time drift [was Re: dynamic-hz] Pavel Machek @ 2004-12-14 23:16 ` Andrea Arcangeli 2004-12-15 2:59 ` Gene Heskett 2004-12-16 0:58 ` Pavel Machek 2004-12-16 1:15 ` Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] Pavel Machek 1 sibling, 2 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-14 23:16 UTC (permalink / raw) To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel On Tue, Dec 14, 2004 at 11:02:39PM +0100, Pavel Machek wrote: > How much drift do you see? Huge drift, minutes per hour or similar. > I have machine with UHCI here, and am using usb most of the time > (bluetooth for gprs connection), and did not notice too bad > drift. ntpdate does some adjustment each time I connect to the > network, but it It could be that it happens only with my USB chipset, or only with the ADSL modem with the usermode driver. You can just write the program doing iopl and cli/sti in a loop (keeping irqs off for 3-4 msec a few times per second, like my USB modem does); you should be able to see the drift on any machine without requiring an ADSL modem. This was the status of my last attempt to fix it a few weeks ago. The patch also fixes a few unrelated bits. But the core of the patch below is actually wrong: the previous code did the right thing, even if this works better in practice. So I did not have much motivation to extract the good bits until I find the source of the big screwup in system time. I should probably do any further debugging with a userspace simulation (i.e. the iopl + cli/sti in a loop) within qemu. 
--- sp1/arch/i386/kernel/timers/timer_tsc.c.~1~	2004-04-04 08:08:48.000000000 +0200
+++ sp1/arch/i386/kernel/timers/timer_tsc.c	2004-11-22 06:01:21.725371368 +0100
@@ -39,6 +39,7 @@ static unsigned long last_tsc_low; /* ls
 static unsigned long last_tsc_high; /* msb 32 bits of Time Stamp Counter */
 static unsigned long long monotonic_base;
 static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
+static int report_lost_ticks; /* command line option */
 
 /* convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
@@ -69,8 +70,6 @@ static inline unsigned long long cycles_
 }
 
 
-static int count2; /* counter for mark_offset_tsc() */
-
 /* Cached *multiplier* to convert TSC counts to microseconds.
  * (see the equation below).
  * Equal to 2^32 * (1 / (clocks per usec) ).
@@ -153,11 +152,12 @@ unsigned long long sched_clock(void)
 
 static void mark_offset_tsc(void)
 {
-	unsigned long lost,delay;
+	unsigned long ticks;
 	unsigned long delta = last_tsc_low;
-	int count;
-	int countmp;
-	static int count1 = 0;
+	unsigned int count;
+	unsigned int countmp;
+	static unsigned int count1 = 0, count2 = LATCH;
+
 	unsigned long long this_offset, last_offset;
 	static int lost_count = 0;
 
@@ -175,12 +175,11 @@ static void mark_offset_tsc(void)
 	 * has the SA_INTERRUPT flag set. -arca
 	 */
 
-	/* read Pentium cycle counter */
-
-	rdtsc(last_tsc_low, last_tsc_high);
 
 	spin_lock(&i8253_lock);
-	outb_p(0x00, PIT_MODE);     /* latch the count ASAP */
+
+	/* read Pentium cycle counter and latch the count ASAP */
+	rdtsc(last_tsc_low, last_tsc_high); outb_p(0x00, PIT_MODE);
 
 	count = inb_p(PIT_CH0);    /* read the latched count */
 	count |= inb(PIT_CH0) << 8;
@@ -198,7 +197,7 @@ static void mark_offset_tsc(void)
 
 	spin_unlock(&i8253_lock);
 
-	if (pit_latch_buggy) {
+	if (unlikely(pit_latch_buggy)) {
 		/* get center value of last 3 time lutch */
 		if ((count2 >= count && count >= count1)
 		    || (count1 >= count && count >= count2)) {
@@ -223,11 +222,10 @@ static void mark_offset_tsc(void)
 			 "0" (eax));
 		delta = edx;
 	}
-	delta += delay_at_last_interrupt;
-	lost = delta/(1000000/HZ);
-	delay = delta%(1000000/HZ);
-	if (lost >= 2) {
-		jiffies_64 += lost-1;
+	//delta += delay_at_last_interrupt;
+	ticks = delta/(1000000/HZ);
+	if (unlikely(ticks >= 2)) {
+		jiffies_64 += ticks-1;
 
 		/* sanity check to ensure we're not always losing ticks */
 		if (lost_count++ > 100) {
@@ -241,6 +239,20 @@ static void mark_offset_tsc(void)
 
 			clock_fallback();
 		}
+
+		{
+			static u64 last_lost_tick;
+			if (last_lost_tick <= jiffies_64) {
+				printk(KERN_WARNING "Compensate %ld timer tick(s)\n", ticks-1);
+				dump_stack();
+				if (report_lost_ticks)
+					/* max 1 per sec */
+					last_lost_tick = jiffies_64 + HZ;
+				else
+					/* force dump of lost ticks information not more than 1 per day */
+					last_lost_tick = jiffies_64 + 60*60*24*HZ;
+			}
+		}
 	} else
 		lost_count = 0;
 	/* update the monotonic base value */
@@ -248,16 +260,14 @@ static void mark_offset_tsc(void)
 	monotonic_base += cycles_2_ns(this_offset - last_offset);
 	write_sequnlock(&monotonic_lock);
 
+	/* Some i8253 clones hold the LATCH value visible
+	   momentarily as they flip back to zero */
+	if (unlikely(count == LATCH))
+		count--;
+
 	/* calculate delay_at_last_interrupt */
 	count = ((LATCH-1) - count) * TICK_SIZE;
 	delay_at_last_interrupt = (count + LATCH/2) / LATCH;
-
-	/* catch corner case where tick rollover occured
-	 * between tsc and pit reads (as noted when
-	 * usec delta is > 90% # of usecs/tick)
-	 */
-	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
-		jiffies_64++;
 }
 
 static void delay_tsc(unsigned long loops)
@@ -433,8 +443,6 @@ static int __init init_tsc(char* overrid
 	 *	moaned if you have the only one in the world - you fix it!
 	 */
 
-	count2 = LATCH; /* initialize counter for mark_offset_tsc() */
-
 	if (cpu_has_tsc) {
 		unsigned long tsc_quotient;
 #ifdef CONFIG_HPET_TIMER
@@ -502,7 +510,12 @@ static int __init tsc_setup(char *str)
 #endif
 
 __setup("notsc", tsc_setup);
-
+static int __init report_lost_ticks_setup(char *str)
+{
+	report_lost_ticks = 1;
+	return 1;
+}
+__setup("report_lost_ticks", report_lost_ticks_setup);
 
 
 /************************************************************/
--- sp1/arch/i386/kernel/irq.c.~1~	2004-11-21 02:37:25.000000000 +0100
+++ sp1/arch/i386/kernel/irq.c	2004-11-22 07:03:15.140846408 +0100
@@ -217,14 +217,16 @@ inline void synchronize_irq(unsigned int
 int handle_IRQ_event(unsigned int irq,
 		struct pt_regs *regs, struct irqaction *action)
 {
-	int status = 1;	/* Force the "do bottom halves" bit */
+	int status = 0;
 	int retval = 0;
 
 	TRIG_EVENT(irq_entry_hook, irq, regs, !(user_mode(regs)));
-	if (!(action->flags & SA_INTERRUPT))
-		local_irq_enable();
-
 	do {
+		if (action->flags & SA_INTERRUPT)
+			local_irq_disable();
+		else
+			local_irq_enable();
+
 		status |= action->flags;
 		retval |= action->handler(irq, action->dev_id, regs);
 		action = action->next;

^ permalink raw reply	[flat|nested] 126+ messages in thread
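For readers skimming the patch, the lost-tick compensation in the timer_tsc.c hunk above boils down to a small piece of integer arithmetic: divide the microseconds elapsed since the previous timer interrupt by the tick length, and if two or more ticks have passed, add all but one of them to jiffies. The sketch below restates only that arithmetic in standalone C. The helper names `usecs_per_tick` and `ticks_to_compensate` are made up for illustration and do not exist in the kernel; HZ is passed as a parameter here because the sketch is not kernel code.

```c
#include <stdint.h>

/* Length of one timer tick in microseconds (integer division, as in the patch). */
static inline uint32_t usecs_per_tick(uint32_t hz)
{
	return 1000000u / hz;
}

/*
 * Hypothetical helper: how many extra jiffies the patch would add when
 * delta_usec microseconds have elapsed since the previous timer interrupt.
 * Mirrors "ticks = delta/(1000000/HZ); if (ticks >= 2) jiffies_64 += ticks-1;".
 */
static inline uint32_t ticks_to_compensate(uint32_t delta_usec, uint32_t hz)
{
	uint32_t ticks = delta_usec / usecs_per_tick(hz);

	return ticks >= 2 ? ticks - 1 : 0;
}
```

With these definitions, a 4 ms gap at HZ=1000 gives `ticks_to_compensate(4000, 1000) == 3`, while the same gap at HZ=100 gives 0, since it is shorter than a single 10 ms tick.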
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-14 23:16 ` Andrea Arcangeli @ 2004-12-15 2:59 ` Gene Heskett 2004-12-15 9:17 ` Andrea Arcangeli 2004-12-16 0:58 ` Pavel Machek 1 sibling, 1 reply; 126+ messages in thread From: Gene Heskett @ 2004-12-15 2:59 UTC (permalink / raw) To: linux-kernel; +Cc: Andrea Arcangeli, Pavel Machek, Zwane Mwaikambo, Con Kolivas On Tuesday 14 December 2004 18:16, Andrea Arcangeli wrote: >On Tue, Dec 14, 2004 at 11:02:39PM +0100, Pavel Machek wrote: >> How much drift do you see? > >huge drift, minutes per hour or similar. Which way? I was running quite fast here, several minutes an hour, then I discovered the tickadj command, found its default was 10000, and started reducing it. At 9926, I'm staying within a sec an hour now. I have no idea when this started, I didn't discover it till I had already been running Ingo's realtime patches for a while, then checked with a stock 2.6.9 and found it was doing it then. [...] -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.30% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 126+ messages in thread
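As a rough aid for the tick numbers in the message above, the following sketch estimates how much clock correction a changed tick value yields per hour. It assumes the value Gene adjusted is the kernel `tick` (microseconds added to the clock per 100 Hz user tick, nominally 10000); that reading is an assumption, not something the thread states. The function name `tick_correction_sec_per_hour` is made up for illustration.

```c
/*
 * Hypothetical helper: seconds of correction per hour of real time when
 * the per-tick increment is tick_usec instead of nominal_usec.
 * A negative result means the clock is being slowed down.
 */
static double tick_correction_sec_per_hour(long tick_usec, long nominal_usec)
{
	return 3600.0 * (double)(tick_usec - nominal_usec) / (double)nominal_usec;
}
```

Under that assumption, `tick_correction_sec_per_hour(9926, 10000)` comes out at roughly -26.6, i.e. the clock is slowed by a little under half a minute per hour.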
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 2:59 ` Gene Heskett @ 2004-12-15 9:17 ` Andrea Arcangeli 2004-12-15 16:44 ` Gene Heskett 2004-12-15 17:03 ` Gene Heskett 0 siblings, 2 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-15 9:17 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel, Pavel Machek, Zwane Mwaikambo, Con Kolivas On Tue, Dec 14, 2004 at 09:59:23PM -0500, Gene Heskett wrote: > Which way? I was running quite fast here, several minutes an In the future, if I disable the logic it goes in the past at the same speed it was previously going in the future. > hour, then I discovered the tickadj command, found its default > was 10000, and started reducing it. At 9926, I'm staying within > a sec an hour now. I have no idea when this started, I didn't That seems quite an hack, note I did an hack too and it make the drift much smaller (it gets manageable). But our modifications are wrong. The point is that this didn't happen with HZ=100, so it's not that tickadj is wrong, it's the tick adjustment code that doesn't work. You may want to recompile your kernel with HZ=100 and verify it goes away (I didn't verify myself, but I verified the max irq latency I get is 4msec, and in turn I'm sure HZ=100 would fix it) ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 9:17 ` Andrea Arcangeli @ 2004-12-15 16:44 ` Gene Heskett 2004-12-15 18:20 ` Andrea Arcangeli 2004-12-15 20:16 ` Pavel Machek 2004-12-15 17:03 ` Gene Heskett 1 sibling, 2 replies; 126+ messages in thread From: Gene Heskett @ 2004-12-15 16:44 UTC (permalink / raw) To: linux-kernel; +Cc: Andrea Arcangeli, Pavel Machek, Zwane Mwaikambo, Con Kolivas On Wednesday 15 December 2004 04:17, Andrea Arcangeli wrote: >On Tue, Dec 14, 2004 at 09:59:23PM -0500, Gene Heskett wrote: >> Which way? I was running quite fast here, several minutes an > >In the future, if I disable the logic it goes in the past at the > same speed it was previously going in the future. > >> hour, then I discovered the tickadj command, found its default >> was 10000, and started reducing it. At 9926, I'm staying within >> a sec an hour now. I have no idea when this started, I didn't > >That seems quite an hack, note I did an hack too and it make the > drift much smaller (it gets manageable). But our modifications are > wrong. > >The point is that this didn't happen with HZ=100, so it's not that >tickadj is wrong, it's the tick adjustment code that doesn't work. > The HZ=1000 is the culprit? >You may want to recompile your kernel with HZ=100 and verify it goes >away (I didn't verify myself, but I verified the max irq latency I > get is 4msec, and in turn I'm sure HZ=100 would fix it Humm, that might also reduce the obviousness of the irq activity in the audio, there are times when I can hear it very plainly while a low level audio src is in use, like the sub-millivolt levels that come out of my Hauppauge WinTV-GO+FM card. I keep having to turn the master down to almost zip in order to keep it from sounding like I have mice chewing in the walls, but its coming from the speakers. Onboard AC-97 audio of course. Crappy stuff... Humm, 100HZ would translate to 10 millisecond intervals. 
If you had a 4 millisecond latency, that would be spread over 4 of the 1000 hz interrupts. That sounds rather confusing to the service routine I imagine. I'll do that just for grins & report back. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.30% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 16:44 ` Gene Heskett @ 2004-12-15 18:20 ` Andrea Arcangeli 2004-12-16 1:59 ` Gene Heskett 2004-12-15 20:16 ` Pavel Machek 1 sibling, 1 reply; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-15 18:20 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel, Pavel Machek, Zwane Mwaikambo, Con Kolivas On Wed, Dec 15, 2004 at 11:44:38AM -0500, Gene Heskett wrote: > The HZ=1000 is the culprit? HZ=1000 isn't the culprit. The culprit is the >1msec latency of the usb irq, but that wouldn't be visible with HZ 100 (for this specific case HZ=100 would only be a band-aid). > Onboard AC-97 audio of course. Crappy stuff... [..] I doubt it's the chip, but only the motherboard to blame. My laptop has the ac97 but no HZ sound out of it. > translate to 10 millisecond intervals. If you had a 4 millisecond > latency, > that would be spread over 4 of the 1000 hz interrupts. That sounds > rather confusing to the service routine I imagine. The ones that get confused are the system time and the jiffies, the rest of the system can deal with long irq delays. The tick adjustment was exactly implemented so that the jiffies and system time wouldn't get confused anymore, but it just confuses it the other way around in my current experience. > I'll do that just for grins & report back. Ok. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 18:20 ` Andrea Arcangeli @ 2004-12-16 1:59 ` Gene Heskett 2004-12-16 11:30 ` Andrea Arcangeli 2004-12-16 12:50 ` Alan Cox 0 siblings, 2 replies; 126+ messages in thread From: Gene Heskett @ 2004-12-16 1:59 UTC (permalink / raw) To: linux-kernel; +Cc: Andrea Arcangeli, Pavel Machek, Zwane Mwaikambo, Con Kolivas On Wednesday 15 December 2004 13:20, Andrea Arcangeli wrote: >On Wed, Dec 15, 2004 at 11:44:38AM -0500, Gene Heskett wrote: >> The HZ=1000 is the culprit? > >HZ=1000 isn't the culprit. The culprit is the >1msec latency of the > usb irq, but that wouldn't be visible with HZ 100 (for this :> specific case HZ=100 would only be a band-aid). > >> Onboard AC-97 audio of course. Crappy stuff... [..] > >I doubt it's the chip, but only the motherboard to blame. My laptop > has the ac97 but no HZ sound out of it. > >> translate to 10 millisecond intervals. If you had a 4 millisecond >> latency, >> that would be spread over 4 of the 1000 hz interrupts. That >> sounds rather confusing to the service routine I imagine. > >The ones that get confused are the system time and the jiffies, the > rest of the system can deal with long irq delays. The tick > adjustment was exactly implemented so that the jiffies and system > time wouldn't get confused anymore, but it just confuses it the > other way around in my current experience. > >> I'll do that just for grins & report back. > >Ok. Unforch, I was not able to find that in the .config file, so where is that particular option set? -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.30% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-16 1:59 ` Gene Heskett @ 2004-12-16 11:30 ` Andrea Arcangeli 2004-12-16 12:50 ` Alan Cox 1 sibling, 0 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-16 11:30 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel, Pavel Machek, Zwane Mwaikambo, Con Kolivas On Wed, Dec 15, 2004 at 08:59:52PM -0500, Gene Heskett wrote: > Unforch, I was not able to find that in the .config file, so where is > that particular option set? There is no config option indeed, you need to edit include/asm-i386/param.h to change HZ. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-16 1:59 ` Gene Heskett 2004-12-16 11:30 ` Andrea Arcangeli @ 2004-12-16 12:50 ` Alan Cox 1 sibling, 0 replies; 126+ messages in thread From: Alan Cox @ 2004-12-16 12:50 UTC (permalink / raw) To: gene.heskett Cc: Linux Kernel Mailing List, Andrea Arcangeli, Pavel Machek, Zwane Mwaikambo, Con Kolivas On Iau, 2004-12-16 at 01:59, Gene Heskett wrote: > Unforch, I was not able to find that in the .config file, so where is > that particular option set? Base 2.6.9 hardcodes it, 2.6.9-ac has it in the configuration for x86 ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 16:44 ` Gene Heskett 2004-12-15 18:20 ` Andrea Arcangeli @ 2004-12-15 20:16 ` Pavel Machek 2004-12-16 2:02 ` Gene Heskett 1 sibling, 1 reply; 126+ messages in thread From: Pavel Machek @ 2004-12-15 20:16 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas Hi! > >> Which way? I was running quite fast here, several minutes an > > > >In the future, if I disable the logic it goes in the past at the > > same speed it was previously going in the future. > > > >> hour, then I discovered the tickadj command, found its default > >> was 10000, and started reducing it. At 9926, I'm staying within > >> a sec an hour now. I have no idea when this started, I didn't > > > >That seems quite an hack, note I did an hack too and it make the > > drift much smaller (it gets manageable). But our modifications are > > wrong. > > > >The point is that this didn't happen with HZ=100, so it's not that > >tickadj is wrong, it's the tick adjustment code that doesn't work. > > > The HZ=1000 is the culprit? > > >You may want to recompile your kernel with HZ=100 and verify it goes > >away (I didn't verify myself, but I verified the max irq latency I > > get is 4msec, and in turn I'm sure HZ=100 would fix it > > Humm, that might also reduce the obviousness of the irq activity in > the audio, there are times when I can hear it very plainly while a > low level audio src is in use, like the sub-millivolt levels that come > out of my Hauppauge WinTV-GO+FM card. I keep having to turn the Try idle=poll. That noise may be commig from cpu switching between powersave and full speed. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 20:16 ` Pavel Machek @ 2004-12-16 2:02 ` Gene Heskett 0 siblings, 0 replies; 126+ messages in thread From: Gene Heskett @ 2004-12-16 2:02 UTC (permalink / raw) To: linux-kernel; +Cc: Pavel Machek, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas On Wednesday 15 December 2004 15:16, Pavel Machek wrote: >Hi! Hi Pavel; >> >> Which way? I was running quite fast here, several minutes an > >Try idle=poll. That noise may be commig from cpu switching between >powersave and full speed. > Pavel I don't think I have that option set/enabled at all, and these machines are running seti so the cpu stays at 100% anyway. Where would I set that if I wanted to try it? -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.30% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 9:17 ` Andrea Arcangeli 2004-12-15 16:44 ` Gene Heskett @ 2004-12-15 17:03 ` Gene Heskett 2004-12-15 17:48 ` Tim Schmielau 1 sibling, 1 reply; 126+ messages in thread From: Gene Heskett @ 2004-12-15 17:03 UTC (permalink / raw) To: linux-kernel On Wednesday 15 December 2004 04:17, Andrea Arcangeli wrote: >On Tue, Dec 14, 2004 at 09:59:23PM -0500, Gene Heskett wrote: [...] >The point is that this didn't happen with HZ=100, so it's not that >tickadj is wrong, it's the tick adjustment code that doesn't work. > >You may want to recompile your kernel with HZ=100 and verify it goes >away (I didn't verify myself, but I verified the max irq latency I > get is 4msec, and in turn I'm sure HZ=100 would fix it) Ok, I was going to do that, but forgive me, its not in the .config file as a setting. So where do edit what to revert to 100hz's. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.30% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 17:03 ` Gene Heskett @ 2004-12-15 17:48 ` Tim Schmielau 2004-12-16 2:03 ` Gene Heskett 0 siblings, 1 reply; 126+ messages in thread From: Tim Schmielau @ 2004-12-15 17:48 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel > Ok, I was going to do that, but forgive me, its not in the .config > file as a setting. So where do edit what to revert to 100hz's. It's in line 5 of include/asm-i386/param.h: # define HZ 1000 /* Internal kernel timer frequency */ (if you are on an i386 system). Just change that back to 100. Tim ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-15 17:48 ` Tim Schmielau @ 2004-12-16 2:03 ` Gene Heskett 0 siblings, 0 replies; 126+ messages in thread From: Gene Heskett @ 2004-12-16 2:03 UTC (permalink / raw) To: linux-kernel; +Cc: Tim Schmielau On Wednesday 15 December 2004 12:48, Tim Schmielau wrote: >> Ok, I was going to do that, but forgive me, its not in the .config >> file as a setting. So where do edit what to revert to 100hz's. > >It's in line 5 of include/asm-i386/param.h: ># define HZ 1000 /* Internal kernel timer > frequency */ > >(if you are on an i386 system). Just change that back to 100. > >Tim Thanks Tim, I might do that for a boot or 2 just for the exersize. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.30% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-14 23:16 ` Andrea Arcangeli 2004-12-15 2:59 ` Gene Heskett @ 2004-12-16 0:58 ` Pavel Machek 2004-12-16 2:33 ` john stultz 1 sibling, 1 reply; 126+ messages in thread From: Pavel Machek @ 2004-12-16 0:58 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > How much drift do you see?
>
> huge drift, minutes per hour or similar.

Okay, for your amusement, here's the evil
"do-few-msec-interrupt-latency" program. Andrea, could you verify that
it causes the clock to drift for you? I'll leave it running here overnight,
and will see what happens.
								Pavel

#include <unistd.h>	/* sleep() */
#include <sys/io.h>	/* iopl(), x86 Linux only */

int main(void)
{
	int i;

	iopl(3);			/* needs root; permits cli/sti in userspace */
	while (1) {
		asm volatile("cli");	/* interrupts off */
		for (i=0; i<20000000; i++)
			asm volatile("");
		asm volatile("sti");	/* interrupts back on */
		sleep(1);
	}
}

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: USB making time drift [was Re: dynamic-hz] 2004-12-16 0:58 ` Pavel Machek @ 2004-12-16 2:33 ` john stultz 0 siblings, 0 replies; 126+ messages in thread From: john stultz @ 2004-12-16 2:33 UTC (permalink / raw) To: Pavel Machek; +Cc: Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml On Wed, 2004-12-15 at 16:58, Pavel Machek wrote: > Hi! > > > > How much drift do you see? > > > > huge drift, minutes per hour or similar. > > Okay, for your amusement, here's the evil > "do-few-msec-interrupt-latency" program. Ohhh! Awesome. I love it! I'm playing with it and I'm seeing occasional jumps forward in time (about an hour and 10mins, and then back). I'll start seeing if there isn't anything we can do a quick fix for. Also I'll use this to test the timeofday rework code I'm doing. Very nice! thanks! -john ^ permalink raw reply [flat|nested] 126+ messages in thread
* Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] 2004-12-14 22:02 ` USB making time drift [was Re: dynamic-hz] Pavel Machek 2004-12-14 23:16 ` Andrea Arcangeli @ 2004-12-16 1:15 ` Pavel Machek 2004-12-16 11:13 ` Andrea Arcangeli 1 sibling, 1 reply; 126+ messages in thread From: Pavel Machek @ 2004-12-16 1:15 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > > Are you using CONFIG_HPET_TIMER by chance? It seems to be missing some
> > > strategic -1, TSC (etc) get it right.
> >
> > I'm not using hpet because it's an old hardware, this is with timer_tsc.
> > It must be reproducible in any machine out there, especially with
> > machines with usb it should be reproducible even without any userspace
> > testcase doing iopl/cli/sti. Time will go silently in the future at every
> > usb irq (they often last 3/4msec).
>
> How much drift do you see?
>
> I have machine with UHCI here, and am using usb most of the time
> (bluetooth for gprs connection), and did not notice too bad
> drift. ntpdate does some adjustment each time I connect to the
> network, but it

Okay, I have good news and bad news. Bad news is that it is broken on
my machine, too. Good news is that breakage is not at all subtle.

root@amd:/home/pavel/misc# time ./latency
0.00user 0.00system 1.69 (0m1.694s) elapsed 0.17%CPU
root@amd:/home/pavel/misc# time ./latency
0.00user 0.00system 4293.47 (71m33.478s) elapsed 0.00%CPU
root@amd:/home/pavel/misc#

71 minutes when it ran for 2.5 seconds?!

root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:04:24 ntpdate[6385]: adjust time server 195.113.144.238 offset 0.010865 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:08:07 ntpdate[6405]: step time server 195.113.144.238 offset 85.903997 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:09:02 ntpdate[6410]: step time server 195.113.144.238 offset 4.306853 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:09:11 ntpdate[6411]: adjust time server 195.113.144.238 offset -0.028829 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:09:27 ntpdate[6413]: step time server 195.113.144.238 offset 4.283117 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:09:47 ntpdate[6415]: step time server 195.113.144.238 offset 4.286300 sec
root@amd:~#

It seems that each cycle of the attached program (needs root) breaks system
time by 4 seconds... I do not know why it printed 71 minutes
there. That seems like some underflow somewhere.. Strange, now it
happened again.

#include <unistd.h>	/* sleep() */
#include <sys/io.h>	/* iopl(), x86 Linux only */

int main(void)
{
	int i;

	iopl(3);
	while (1) {
		asm volatile("cli");
		//		for (i=0; i<20000000; i++)
		for (i=0; i<1000000000; i++)
			asm volatile("");
		asm volatile("sti");
		sleep(1);
	}
}

Actually it seems to create some sort of havoc in the timer
subsystem... Actually it is reproducible:

root@amd:/home/pavel/misc# date; time ./latency ; date; sleep 1; date; sleep 1; date; sleep 1; date
Thu Dec 16 02:14:18 CET 2004
0.00user 0.00system 4293.51 (71m33.516s) elapsed 0.00%CPU
Thu Dec 16 03:25:51 CET 2004
Thu Dec 16 02:14:18 CET 2004
Thu Dec 16 02:14:19 CET 2004
Thu Dec 16 02:14:20 CET 2004
root@amd:/home/pavel/misc# date; time ./latency ; date; sleep 1; date; sleep 1; date; sleep 1; date
Thu Dec 16 02:14:23 CET 2004
0.00user 0.00system 4293.52 (71m33.521s) elapsed 0.00%CPU
Thu Dec 16 03:25:56 CET 2004
Thu Dec 16 03:25:57 CET 2004
Thu Dec 16 02:14:23 CET 2004
Thu Dec 16 02:14:24 CET 2004
root@amd:/home/pavel/misc#

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
^ permalink raw reply [flat|nested] 126+ messages in thread
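A side note on the "71 minutes" figure reported above: 2^32 microseconds is 4294.967296 seconds, and the elapsed times shown (4293.47 to 4293.52 s) are about 1.5 s short of that. That is what one would see if a microsecond delta of roughly -1.5 s were stored in an unsigned 32-bit quantity. The thread does not confirm where such a wrap would happen, so the sketch below only demonstrates the arithmetic; `wrapped_usec` is a made-up name and not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: what a (possibly negative) microsecond
 * delta looks like after being truncated to an unsigned 32-bit value.
 * Conversion to uint32_t is defined as reduction modulo 2^32.
 */
static uint32_t wrapped_usec(int64_t true_delta_usec)
{
	return (uint32_t)true_delta_usec;
}
```

For example, `wrapped_usec(-1500000)` is 4293467296 microseconds, i.e. about 71 minutes 33.47 seconds, which is close to the values `time` printed.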
* Re: Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] 2004-12-16 1:15 ` Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] Pavel Machek @ 2004-12-16 11:13 ` Andrea Arcangeli 2004-12-16 12:49 ` Alan Cox 0 siblings, 1 reply; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-16 11:13 UTC (permalink / raw) To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel On Thu, Dec 16, 2004 at 02:15:49AM +0100, Pavel Machek wrote: > Okay, I have good news and bad news. Bad news is that it is broken on > my machine, too. Good news is that breakage is not at all subtle. Well, I was pretty sure it was reproducible since the PIT and TSC are standard hw in all machines, it's just the excessive usb irq latency that triggers only in a few machines like my firewall (only with HZ=1000). My suggestion is that first we fix the accuracy of this, and *then* we consider switching to a one-short timer. Fixing this is possible as well by using only the TSC accuracy to account for system time, and not to use anymore the PIT accuracy as source of accuracy for system time. But then any error calibration while we transfer the accuracy of the PIT to the TSC, will propagate in a cumulative way over time. So it's not clear to me we can do that safely. The PIT is designed everywhere to be accurate for system time. ^ permalink raw reply [flat|nested] 126+ messages in thread
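To put a number on the concern above that a calibration error carried over from the PIT to the TSC would accumulate over time, here is a minimal sketch of the arithmetic. The error figure used in the example (100 parts per million) is purely illustrative and does not come from the thread; `drift_sec_per_day` is a made-up helper name.

```c
/*
 * Hypothetical helper: seconds of accumulated clock error per day for a
 * constant frequency calibration error given in parts per million.
 */
static double drift_sec_per_day(double error_ppm)
{
	return error_ppm * 1e-6 * 86400.0;
}
```

An assumed 100 ppm calibration error would add up to about 8.64 seconds per day, and nothing pulls it back unless some other reference, such as the PIT or NTP, corrects it.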
* Re: Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] 2004-12-16 11:13 ` Andrea Arcangeli @ 2004-12-16 12:49 ` Alan Cox 0 siblings, 0 replies; 126+ messages in thread From: Alan Cox @ 2004-12-16 12:49 UTC (permalink / raw) To: Andrea Arcangeli Cc: Pavel Machek, Zwane Mwaikambo, Con Kolivas, Linux Kernel Mailing List On Iau, 2004-12-16 at 11:13, Andrea Arcangeli wrote: > Well, I was pretty sure it was reproducible since the PIT and TSC are > standard hw in all machines, it's just the excessive usb irq latency TSC is not by any means standard hw in all machines and it has a whole pile of issues on some of them with the way it varies rate and/or stops. > My suggestion is that first we fix the accuracy of this, and *then* we > consider switching to a one-short timer. Agreed - one shot timers are going to be nearly impossible to use for system time accounting because we keep losing time resetting it. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 11:28 ` dynamic-hz Andrea Arcangeli 2004-12-13 12:43 ` dynamic-hz Pavel Machek @ 2004-12-13 14:50 ` Zwane Mwaikambo 1 sibling, 0 replies; 126+ messages in thread From: Zwane Mwaikambo @ 2004-12-13 14:50 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Con Kolivas, Pavel Machek, linux-kernel On Mon, 13 Dec 2004, Andrea Arcangeli wrote: > You were the one making the case of the NMI, the NMI will screw > completely any attempt of rearming the TSC accurately (though I don't > mind too much, like for the sti; hlt, since NMI is pratically impossible > to trigger in production, if a NMI is fired we've more troubles than the > 1/HZ latency on a pending wakeup or on the system time taking the > tangent ;) I wouldn't say that NMI isn't used in production, if we didn't cater for NMI it'd be hard to do high sample rate profiling with Oprofile and dynamic-hz. I consider (non)kernel developers profiling code on systems as production use. > (btw, my firewall systemtime will get fixed too by dyanmic-hz HZ=100, > it's pure waste to keep my firewall at HZ=1000 even if I didn't have > constant irq-latency of 3/4msec [measured with rdtsc], though I didn't > mention this yet because dynamic-hz in my firewall case would be a pure > band-aid, even fixing the tick-lost adjustment would be a band-aid, the > only thing to fix is the usb irq that runs for 3/4msec without returning). I have a few personal systems which really would benefit too ;) Thanks, Zwane ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-12 23:36 ` dynamic-hz Con Kolivas 2004-12-12 23:42 ` dynamic-hz Pavel Machek 2004-12-12 23:43 ` dynamic-hz Andrea Arcangeli @ 2004-12-13 7:43 ` Stefan Seyfried 2004-12-13 13:58 ` dynamic-hz Russell King 2004-12-13 16:19 ` dynamic-hz Jan Engelhardt 2004-12-13 8:29 ` dynamic-hz Jan Engelhardt ` (3 subsequent siblings) 6 siblings, 2 replies; 126+ messages in thread From: Stefan Seyfried @ 2004-12-13 7:43 UTC (permalink / raw) To: Con Kolivas; +Cc: Pavel Machek, linux-kernel, Andrea Arcangeli Con Kolivas wrote: > Just being devils advocate here... > > I had variable Hz in my tree for a while and found there was one > solitary purpose to setting Hz to 100; to silence cheap capacitors. power savings? Having the cpu wake up 1000 times per second if the machine is idle cannot be better than only waking it up 100 times. Yes, i am always on the quest for the 5 extra minutes on battery :-) Stefan ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 7:43 ` dynamic-hz Stefan Seyfried @ 2004-12-13 13:58 ` Russell King 2004-12-13 14:14 ` dynamic-hz Russell King ` (3 more replies) 2004-12-13 16:19 ` dynamic-hz Jan Engelhardt 1 sibling, 4 replies; 126+ messages in thread From: Russell King @ 2004-12-13 13:58 UTC (permalink / raw) To: Stefan Seyfried; +Cc: Con Kolivas, Pavel Machek, linux-kernel, Andrea Arcangeli On Mon, Dec 13, 2004 at 08:43:55AM +0100, Stefan Seyfried wrote: > Con Kolivas wrote: > > Just being devils advocate here... > > > > I had variable Hz in my tree for a while and found there was one > > solitary purpose to setting Hz to 100; to silence cheap capacitors. > > power savings? Having the cpu wake up 1000 times per second if the > machine is idle cannot be better than only waking it up 100 times. > > Yes, i am always on the quest for the 5 extra minutes on battery :-) This is an easy thing to grab hold of, but rather pointless in the overall scheme of things. Those of us who have done power usage measurements know this already. The only case where this really makes sense is where the CPU power usage outweighs the power consumption of all other peripherals by at least an order of magnitude such that the rest of the system is insignificant compared to the CPU power. Lets take an example. Lets say that: * a CPU runs at about 245mA when active * 90mA when inactive * the timer interrupt takes 2us to execute 1000 times a second * no other processing is occuring This means that the average current consumption is about: 245mA * 2 * 10^-6 + 90mA * (1 - 2 * 10^-6) = 90.00031mA This means that the timer interrupt has increased CPU power by 0.00034%. Now, lets factor in the rest of a system. Lets the rest of the system takes 84mA. Recalculating (by increasing each figure by 84mA) gives us 174.00031mA, or an increase in overall system power by about 0.00018%. 
Assuming your battery normally lasts exactly 24 hours on a current drain of 174.00031mA, completely eliminating the tick gives you an extra 0.15 seconds battery life. Note: the above CPU power consumption figures were taken from the Intel PXA255 processor electrical specifications, and the "rest of the system" current consumption taken from a real life device. The timer interrupt taking 2us is probably an over- estimation. Only the battery lifetime of 24 hours is ficticious. And yes, from time to time I keep thinking that it would be nice to eliminate the timer tick to save some power. However, I've never been able to justify the extra code complexity against the power savings. It really only makes sense if you can essentially _power off_ your system until the next timer interrupt (thereby, in the above example, reducing the power consumption by some 174mA) -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 13:58 ` dynamic-hz Russell King @ 2004-12-13 14:14 ` Russell King 2004-12-13 14:52 ` dynamic-hz Alan Cox ` (2 subsequent siblings) 3 siblings, 0 replies; 126+ messages in thread From: Russell King @ 2004-12-13 14:14 UTC (permalink / raw) To: Stefan Seyfried, Con Kolivas, Pavel Machek, linux-kernel, Andrea Arcangeli On Mon, Dec 13, 2004 at 01:58:20PM +0000, Russell King wrote: > On Mon, Dec 13, 2004 at 08:43:55AM +0100, Stefan Seyfried wrote: > > Con Kolivas wrote: > > > Just being devils advocate here... > > > > > > I had variable Hz in my tree for a while and found there was one > > > solitary purpose to setting Hz to 100; to silence cheap capacitors. > > > > power savings? Having the cpu wake up 1000 times per second if the > > machine is idle cannot be better than only waking it up 100 times. > > > > Yes, i am always on the quest for the 5 extra minutes on battery :-) > > This is an easy thing to grab hold of, but rather pointless in the > overall scheme of things. Those of us who have done power usage > measurements know this already. > > The only case where this really makes sense is where the CPU power > usage outweighs the power consumption of all other peripherals by > at least an order of magnitude such that the rest of the system is > insignificant compared to the CPU power. > > Lets take an example. Lets say that: > * a CPU runs at about 245mA when active > * 90mA when inactive > * the timer interrupt takes 2us to execute 1000 times a second > * no other processing is occuring > > This means that the average current consumption is about: > 245mA * 2 * 10^-6 + 90mA * (1 - 2 * 10^-6) = 90.00031mA Sorry, missed out the 1000 times a second. Grumble. 245mA * 1000 * 2 * 10^-6 + 90mA * (1 - 2 * 10^-6 * 1000) = 90.31mA > This means that the timer interrupt has increased CPU power by > 0.00034%. 0.34% > Now, lets factor in the rest of a system. Lets the rest of the > system takes 84mA. 
Recalculating (by increasing each figure by > 84mA) gives us 174.00031mA, or an increase in overall system 174.31mA > power by about 0.00018%. 0.18% > Assuming your battery normally lasts exactly 24 hours on a current > drain of 174.00031mA, completely eliminating the tick gives you 174.31mA > an extra 0.15 seconds battery life. 2mins 30secs > Note: the above CPU power consumption figures were taken from > the Intel PXA255 processor electrical specifications, and the > "rest of the system" current consumption taken from a real life > device. The timer interrupt taking 2us is probably an over- > estimation. Only the battery lifetime of 24 hours is ficticious. > > And yes, from time to time I keep thinking that it would be nice > to eliminate the timer tick to save some power. However, I've > never been able to justify the extra code complexity against the > power savings. It really only makes sense if you can essentially > _power off_ your system until the next timer interrupt (thereby, > in the above example, reducing the power consumption by some 174mA) -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 126+ messages in thread
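The corrected arithmetic in the message above can be re-derived with a few lines of Python. This is only a sketch of the calculation as stated in the mail (245mA active, 90mA idle, a 2us tick at 1000 ticks per second, 84mA for the rest of the system, a 24 hour battery); the function names are made up for illustration and are not from any patch in this thread. It gives about 90.31mA for the CPU, increases of about 0.34% and 0.18%, and roughly two and a half minutes of extra battery life.

```python
def average_current_ma(active_ma, idle_ma, tick_us, hz):
    """Time-weighted average current of a CPU that is active only during the tick."""
    duty = tick_us * 1e-6 * hz  # fraction of every second spent in the tick handler
    return active_ma * duty + idle_ma * (1.0 - duty)


def extra_runtime_s(rest_ma, active_ma, idle_ma, tick_us, hz, hours=24.0):
    """Extra battery life, in seconds, gained by removing the tick entirely."""
    with_tick = rest_ma + average_current_ma(active_ma, idle_ma, tick_us, hz)
    without_tick = rest_ma + idle_ma
    capacity_mah = with_tick * hours  # battery sized to last `hours` with the tick
    return (capacity_mah / without_tick - hours) * 3600.0


# Figures taken from the mail above.
cpu_ma = average_current_ma(245.0, 90.0, 2.0, 1000)
system_ma = 84.0 + cpu_ma
cpu_increase_pct = 100.0 * (cpu_ma - 90.0) / 90.0
system_increase_pct = 100.0 * (system_ma - 174.0) / 174.0
gain_s = extra_runtime_s(84.0, 245.0, 90.0, 2.0, 1000)

print(f"CPU average current:    {cpu_ma:.2f} mA")
print(f"CPU increase:           {cpu_increase_pct:.2f} %")
print(f"System average current: {system_ma:.2f} mA")
print(f"System increase:        {system_increase_pct:.2f} %")
print(f"Extra battery life:     {gain_s:.0f} s")
```

Setting hz to 1 in the same functions reproduces the original, uncorrected figure of 90.00031mA, which is exactly the slip the follow-up mail fixes.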
* Re: dynamic-hz 2004-12-13 13:58 ` dynamic-hz Russell King 2004-12-13 14:14 ` dynamic-hz Russell King @ 2004-12-13 14:52 ` Alan Cox 2004-12-13 16:23 ` dynamic-hz Russell King 2004-12-14 0:16 ` dynamic-hz Eric St-Laurent 2004-12-13 15:30 ` dynamic-hz Zwane Mwaikambo 2004-12-13 16:06 ` dynamic-hz Pavel Machek 3 siblings, 2 replies; 126+ messages in thread From: Alan Cox @ 2004-12-13 14:52 UTC (permalink / raw) To: Russell King Cc: Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Llu, 2004-12-13 at 13:58, Russell King wrote: > Lets take an example. Lets say that: > * a CPU runs at about 245mA when active > * 90mA when inactive > * the timer interrupt takes 2us to execute 1000 times a second > * no other processing is occuring Now take a real laptop and the numbers are in the 20W (15A) range. > to eliminate the timer tick to save some power. However, I've > never been able to justify the extra code complexity against the > power savings. It really only makes sense if you can essentially > _power off_ your system until the next timer interrupt (thereby, > in the above example, reducing the power consumption by some 174mA) On a PC it makes huge sense, the deeply embedded folks who do turn the thing off for 30secs at a time (Eg cellphone) also want it as do virtualisation people where it trashes your scaling. API wise it isn't too hard, its just a matter of time to convert the jiffies users away and to do relative versions of add_timer with accuracy info included. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 14:52 ` dynamic-hz Alan Cox @ 2004-12-13 16:23 ` Russell King 2004-12-13 17:53 ` dynamic-hz Michael Buesch ` (2 more replies) 2004-12-14 0:16 ` dynamic-hz Eric St-Laurent 1 sibling, 3 replies; 126+ messages in thread From: Russell King @ 2004-12-13 16:23 UTC (permalink / raw) To: Alan Cox Cc: Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Mon, Dec 13, 2004 at 02:52:46PM +0000, Alan Cox wrote: > On Llu, 2004-12-13 at 13:58, Russell King wrote: > > Lets take an example. Lets say that: > > * a CPU runs at about 245mA when active > > * 90mA when inactive > > * the timer interrupt takes 2us to execute 1000 times a second > > * no other processing is occuring > > Now take a real laptop and the numbers are in the 20W (15A) range. Roughly 650mA for my laptop while idle or just under 7W - by calculation from battery capacity and measured lifetime. The question is how much of that is due to the CPU itself and how much is due to the peripherals. > > to eliminate the timer tick to save some power. However, I've > > never been able to justify the extra code complexity against the > > power savings. It really only makes sense if you can essentially > > _power off_ your system until the next timer interrupt (thereby, > > in the above example, reducing the power consumption by some 174mA) > > On a PC it makes huge sense, the deeply embedded folks who do turn the > thing off for 30secs at a time (Eg cellphone) also want it as do > virtualisation people where it trashes your scaling. API wise it isn't > too hard, its just a matter of time to convert the jiffies users away > and to do relative versions of add_timer with accuracy info included. I don't disagree with your cellphone example - it makes a whole lot of sense there, where the device is going to end up in someones pocket not doing very much at all. 
There is another twist here though - the Linux kernel kicks itself out of idle mode and into some other thread multiple times a second while the system is idle. So far, in all my Linux kernel experience, I've yet to see a kernel where it's possible to stay in the idle thread for more than half a second. (The ARM kernels I run are always configured with IDLE LED support, so I can _see_ when it gets kicked out of the idle thread.) So, not only do the VST people need to solve the HZ interrupt problem, but also need to track down which kernel threads keep waking up on an otherwise idle system "just in case" they need to do something. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 16:23 ` dynamic-hz Russell King @ 2004-12-13 17:53 ` Michael Buesch 2004-12-13 18:04 ` dynamic-hz Russell King 2004-12-13 19:04 ` dynamic-hz Pavel Machek 2004-12-13 20:11 ` dynamic-hz Russell King 2 siblings, 1 reply; 126+ messages in thread From: Michael Buesch @ 2004-12-13 17:53 UTC (permalink / raw) To: Russell King Cc: Alan Cox, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli [-- Attachment #1: Type: text/plain, Size: 508 bytes --] Quoting Russell King <rmk+lkml@arm.linux.org.uk>: > the system is idle. So far, in all my Linux kernel experience, I've > yet to see a kernel where it's possible to stay in the idle thread > for more than half a second. (The ARM kernels I run are always > configured with IDLE LED support, so I can _see_ when it gets kicked > out of the idle thread.) I guess IDLE LED support is not in mainline kernel, is it? Where can I get it? -- Regards Michael Buesch [ http://www.tuxsoft.de.vu ] [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 17:53 ` dynamic-hz Michael Buesch @ 2004-12-13 18:04 ` Russell King 0 siblings, 0 replies; 126+ messages in thread From: Russell King @ 2004-12-13 18:04 UTC (permalink / raw) To: Michael Buesch Cc: Alan Cox, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Mon, Dec 13, 2004 at 06:53:40PM +0100, Michael Buesch wrote: > Quoting Russell King <rmk+lkml@arm.linux.org.uk>: > > the system is idle. So far, in all my Linux kernel experience, I've > > yet to see a kernel where it's possible to stay in the idle thread > > for more than half a second. (The ARM kernels I run are always > > configured with IDLE LED support, so I can _see_ when it gets kicked > > out of the idle thread.) > > I guess IDLE LED support is not in mainline kernel, is it? > Where can I get it? It's an ARM only thing, and it's in mainline kernels for ARM platforms which have general purpose LEDs available. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 16:23 ` dynamic-hz Russell King 2004-12-13 17:53 ` dynamic-hz Michael Buesch @ 2004-12-13 19:04 ` Pavel Machek 2004-12-13 20:11 ` dynamic-hz Russell King 2 siblings, 0 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-13 19:04 UTC (permalink / raw) To: Alan Cox, Stefan Seyfried, Con Kolivas, Linux Kernel Mailing List, Andrea Arcangeli Hi! > > > Lets take an example. Lets say that: > > > * a CPU runs at about 245mA when active > > > * 90mA when inactive > > > * the timer interrupt takes 2us to execute 1000 times a second > > > * no other processing is occuring > > > > Now take a real laptop and the numbers are in the 20W (15A) range. > > Roughly 650mA for my laptop while idle or just under 7W - by calculation > from battery capacity and measured lifetime. The question is how much > of that is due to the CPU itself and how much is due to the > peripherals. On Arima notebook here, whole machine takes 28W on 800MHz, idle (measured using external power meter), 33W with max backlight, and 68W at 2GHz, computing. Going to HZ=100 makes it one Watt less. I'd say that's quite significant. [17W number was based on internal power meter when running on battery]. So yes, CPU *is* taking more than all other peripherals. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 16:23 ` dynamic-hz Russell King 2004-12-13 17:53 ` dynamic-hz Michael Buesch 2004-12-13 19:04 ` dynamic-hz Pavel Machek @ 2004-12-13 20:11 ` Russell King 2 siblings, 0 replies; 126+ messages in thread From: Russell King @ 2004-12-13 20:11 UTC (permalink / raw) To: Alan Cox, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Mon, Dec 13, 2004 at 04:23:55PM +0000, Russell King wrote: > There is another twist here though - the Linux kernel kicks itself out > of idle mode and into some other thread multiple times a second while > the system is idle. So far, in all my Linux kernel experience, I've > yet to see a kernel where it's possible to stay in the idle thread > for more than half a second. (The ARM kernels I run are always > configured with IDLE LED support, so I can _see_ when it gets kicked > out of the idle thread.) For further information only, analysing this further, we keep switching to the events/0 thread, and it seems to be mainly for: - cursor handling every 200ms - slab cache reaping about every 2s The cursor timer is firing all the time that you have a fbcon console registered, whether or not the cursor should be displayed. Someone looking to save power should probably tackle this such that the cursor timer doesn't needlessly fire. But I guess the cellphone people would be more interested in this problem than the big iron desktop-breaking in-need-of-three-phase-supply boxen. 8) -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 14:52 ` dynamic-hz Alan Cox 2004-12-13 16:23 ` dynamic-hz Russell King @ 2004-12-14 0:16 ` Eric St-Laurent 2004-12-15 18:04 ` dynamic-hz Alan Cox 1 sibling, 1 reply; 126+ messages in thread From: Eric St-Laurent @ 2004-12-14 0:16 UTC (permalink / raw) To: Alan Cox Cc: Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Mon, 2004-12-13 at 14:52 +0000, Alan Cox wrote: > On a PC it makes huge sense, the deeply embedded folks who do turn the > thing off for 30secs at a time (Eg cellphone) also want it as do > virtualisation people where it trashes your scaling. API wise it isn't > too hard, its just a matter of time to convert the jiffies users away > and to do relative versions of add_timer with accuracy info included. Alan, On a related subject, a few months ago you posted a patch which added a nice add_timeout()/timeout_pending() API and converted many (if not most) drivers to use it. If I remember correctly it did not generate much comments and the work was not pushed into mainline. I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...) construct is horrible. Any chance to resurrect this work ? PS: the original subject was "Initial bits to help pull jiffies out of drivers" Best regards, Eric St-Laurent ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 0:16 ` dynamic-hz Eric St-Laurent @ 2004-12-15 18:04 ` Alan Cox 2004-12-15 19:54 ` dynamic-hz linux-os 2004-12-16 9:10 ` dynamic-hz Gabriel Paubert 0 siblings, 2 replies; 126+ messages in thread From: Alan Cox @ 2004-12-15 18:04 UTC (permalink / raw) To: Eric St-Laurent Cc: Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote: > Alan, > > On a related subject, a few months ago you posted a patch which added a > nice add_timeout()/timeout_pending() API and converted many (if not > most) drivers to use it. > > If I remember correctly it did not generate much comments and the work > was not pushed into mainline. > > I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...) > construct is horrible. > > Any chance to resurrect this work ? I plan to resurrect it when I have a little time but with some small additions from the original work. Several people said "it should be mS not HZ" and someone at OLS proposed that the API also includes an accuracy guide so that systems using programmed wakeups can aggregate timers when accuracy doesn't matter. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-15 18:04 ` dynamic-hz Alan Cox @ 2004-12-15 19:54 ` linux-os 2004-12-16 2:17 ` dynamic-hz Gene Heskett 2004-12-16 9:10 ` dynamic-hz Gabriel Paubert 1 sibling, 1 reply; 126+ messages in thread From: linux-os @ 2004-12-15 19:54 UTC (permalink / raw) To: Alan Cox Cc: Eric St-Laurent, Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Wed, 15 Dec 2004, Alan Cox wrote: > On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote: >> Alan, >> >> On a related subject, a few months ago you posted a patch which added a >> nice add_timeout()/timeout_pending() API and converted many (if not >> most) drivers to use it. >> >> If I remember correctly it did not generate much comments and the work >> was not pushed into mainline. >> >> I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...) >> construct is horrible. >> >> Any chance to resurrect this work ? > > I plan to ressurect it when I have a little time but with some small > additions from the original work. Several people said "it should be mS > not HZ" and someone at OLS proposed that the API also includes an > accuracy guide so that systems using programmed wakeups can aggregate > timers when accuracy doesn't matter. I sure hope it isn't mS. Transconductance or its reciprocal doesn't work very well for timing unless you supply the capacitor ;^) FYI, mS means milli-Siemens. Seconds is lower-case --always. Cheers, Dick Johnson Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips). Notice : All mail here is now cached for review by John Ashcroft. 98.36% of all statistics are fiction. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-15 19:54 ` dynamic-hz linux-os @ 2004-12-16 2:17 ` Gene Heskett 2004-12-16 12:42 ` dynamic-hz linux-os 2004-12-17 20:12 ` dynamic-hz H. Peter Anvin 0 siblings, 2 replies; 126+ messages in thread From: Gene Heskett @ 2004-12-16 2:17 UTC (permalink / raw) To: linux-kernel, linux-os Cc: Alan Cox, Eric St-Laurent, Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek, Andrea Arcangeli On Wednesday 15 December 2004 14:54, linux-os wrote: >On Wed, 15 Dec 2004, Alan Cox wrote: >> On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote: >>> Alan, >>> >>> On a related subject, a few months ago you posted a patch which >>> added a nice add_timeout()/timeout_pending() API and converted >>> many (if not most) drivers to use it. >>> >>> If I remember correctly it did not generate much comments and the >>> work was not pushed into mainline. >>> >>> I think it's a nice cleanup, IMHO the >>> time_(before|after)(jiffies, ...) construct is horrible. >>> >>> Any chance to resurrect this work ? >> >> I plan to ressurect it when I have a little time but with some >> small additions from the original work. Several people said "it >> should be mS not HZ" and someone at OLS proposed that the API also >> includes an accuracy guide so that systems using programmed >> wakeups can aggregate timers when accuracy doesn't matter. > >I sure hope it isn't mS. Transconductance or its reciprocal doesn't >work very well for timing unless you supply the capacitor ;^) Me sticks hand up and waves at teacher. And what does 'Transconductance' have to do with this? That may be the wrong terminology to apply here. In vacuum tube (remember those?) specifications, this is the gain of the tube, which AIR is stated as the change in plate current for a one volt change in grid bias, and is normally stated in micromho's as they are high voltage, low current devices, with the highest gain tube that I'm aware of being the 7788. 
Using the same measurement technique applied to modern relatively high-power field effect transistors where the currents can be many amperes, readings best stated in mho's are fairly common today. A 'mho' of course, is the reciprocal of an ohm. >FYI, mS means milli-Siemens. Seconds is lower-case --always. Yup. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.30% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-16 2:17 ` dynamic-hz Gene Heskett @ 2004-12-16 12:42 ` linux-os 2004-12-17 20:12 ` dynamic-hz H. Peter Anvin 1 sibling, 0 replies; 126+ messages in thread From: linux-os @ 2004-12-16 12:42 UTC (permalink / raw) To: Gene Heskett Cc: linux-kernel, Alan Cox, Eric St-Laurent, Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek, Andrea Arcangeli On Wed, 15 Dec 2004, Gene Heskett wrote: > On Wednesday 15 December 2004 14:54, linux-os wrote: >> On Wed, 15 Dec 2004, Alan Cox wrote: >>> On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote: >>>> Alan, >>>> >>>> On a related subject, a few months ago you posted a patch which >>>> added a nice add_timeout()/timeout_pending() API and converted >>>> many (if not most) drivers to use it. >>>> >>>> If I remember correctly it did not generate much comments and the >>>> work was not pushed into mainline. >>>> >>>> I think it's a nice cleanup, IMHO the >>>> time_(before|after)(jiffies, ...) construct is horrible. >>>> >>>> Any chance to resurrect this work ? >>> >>> I plan to ressurect it when I have a little time but with some >>> small additions from the original work. Several people said "it >>> should be mS not HZ" and someone at OLS proposed that the API also >>> includes an accuracy guide so that systems using programmed >>> wakeups can aggregate timers when accuracy doesn't matter. >> >> I sure hope it isn't mS. Transconductance or its reciprocal doesn't >> work very well for timing unless you supply the capacitor ;^) > > Me sticks hand up and waves at teacher. > > And what does 'Transconductance' have to do with this? > The international notation for transconductance is Siemens, no longer MHO (Ohm spelled backwards). This happened at the same time that c.p.s. was changed to Hz. But, because the Siemens company is one of the world's largest, the "S" didn't catch on as readily as Hz and others. Siemens is so common you need to look up MHO in some really complete dictionary to find its usage as MHO. 
> That may be the wrong terminology to apply here. > > In vacuum tube (remember those?) specifications, this is the gain of > the tube, which AIR is stated as the change in plate current for a > one volt change in grid bias, and is normally stated in micromho's as > they are high voltage, low current devices, with the highest gain > tube that I'm aware of being the 7788. Using the same measurement The older MHO was usually stated in micro-mho for vacuum tubes. For instance low mu triodes like 12AT7 had a mu of 10 (10 micro-mho). The "gainier" cousin, the 12AX7 had a mu of 100 (100 micro-mho). Some modern FETs have transconductance up to 10 MHO (10 Siemens). > technique applied to modern relatively highed power field effect > transistors where the currents can be many amperes, readings best > stated in mho's are fairly common today. A 'mho' of course, is the > reciprocal of an ohm. > >> FYI, mS means milli-Siemens. Seconds is lower-case --always. > > Yup. > > -- > Cheers, Gene > "There are four boxes to be used in defense of liberty: > soap, ballot, jury, and ammo. Please use in that order." > -Ed Howdershelt (Author) > 99.30% setiathome rank, not too shabby for a WV hillbilly > Yahoo.com attorneys please note, additions to this message > by Gene Heskett are: > Copyright 2004 by Maurice Eugene Heskett, all rights reserved. > Cheers, Dick Johnson Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips). Notice : All mail here is now cached for review by John Ashcroft. 98.36% of all statistics are fiction. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-16 2:17 ` dynamic-hz Gene Heskett 2004-12-16 12:42 ` dynamic-hz linux-os @ 2004-12-17 20:12 ` H. Peter Anvin 1 sibling, 0 replies; 126+ messages in thread From: H. Peter Anvin @ 2004-12-17 20:12 UTC (permalink / raw) To: linux-kernel Followup to: <200412152117.20568.gene.heskett@verizon.net> By author: Gene Heskett <gene.heskett@verizon.net> In newsgroup: linux.dev.kernel > > Me sticks hand up and waves at teacher. > > And what does 'Transconductance' have to do with this? > > That may be the wrong terminology to apply here. > > In vacuum tube (remember those?) specifications, this is the gain of > the tube, which AIR is stated as the change in plate current for a > one volt change in grid bias, and is normally stated in micromho's as > they are high voltage, low current devices, with the highest gain > tube that I'm aware of being the 7788. Using the same measurement > technique applied to modern relatively highed power field effect > transistors where the currents can be many amperes, readings best > stated in mho's are fairly common today. A 'mho' of course, is the > reciprocal of an ohm. > > >FYI, mS means milli-Siemens. Seconds is lower-case --always. > To be excruciatingly picky: mS means millisiemens. Siemens is the SI unit for (trans)conductance. Like all units named after people: - its symbol (S) is capitalized; - its name (siemens) is not (unless starting a sentence). Note that since it's named after a person named Ernst Werner von Siemens, it's called "siemens" even in singular (1 S = 1 siemens). According to normal English nomenclature then the plural would be siemenses, but that doesn't seem to have caught on. ms means milliseconds. Seconds is the SI unit for time. Like all units not named after people: - neither its symbol (s) nor its name (second) is capitalized. -hpa ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-15 18:04 ` dynamic-hz Alan Cox 2004-12-15 19:54 ` dynamic-hz linux-os @ 2004-12-16 9:10 ` Gabriel Paubert 2004-12-16 12:17 ` dynamic-hz Geert Uytterhoeven 2004-12-16 14:00 ` dynamic-hz Mitchell Blank Jr 1 sibling, 2 replies; 126+ messages in thread From: Gabriel Paubert @ 2004-12-16 9:10 UTC (permalink / raw) To: Alan Cox Cc: Eric St-Laurent, Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Wed, Dec 15, 2004 at 06:04:03PM +0000, Alan Cox wrote: > On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote: > > Alan, > > > > On a related subject, a few months ago you posted a patch which added a > > nice add_timeout()/timeout_pending() API and converted many (if not > > most) drivers to use it. > > > > If I remember correctly it did not generate much comments and the work > > was not pushed into mainline. > > > > I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...) > > construct is horrible. > > > > Any chance to resurrect this work ? > > I plan to ressurect it when I have a little time but with some small > additions from the original work. Several people said "it should be mS > not HZ" and someone at OLS proposed that the API also includes an > accuracy guide so that systems using programmed wakeups can aggregate > timers when accuracy doesn't matter. I suspect people who want to push HZ to 10000 won't be happy with milliseconds since it would not give them a resolution of one jiffy. So the options are: 1) microseconds, allows up to roughly half an hour (signed) or an hour (unsigned). 2) nanoseconds, needs 64 bits, nice for 64 bit machines but at the risk of bloat on 32 bit ones. 3) timespecs, somewhat wasteful on 64 bit machines (two longs). I believe 1) is the best compromise. Regards, Gabriel ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-16 9:10 ` dynamic-hz Gabriel Paubert @ 2004-12-16 12:17 ` Geert Uytterhoeven 2004-12-16 14:00 ` dynamic-hz Mitchell Blank Jr 1 sibling, 0 replies; 126+ messages in thread From: Geert Uytterhoeven @ 2004-12-16 12:17 UTC (permalink / raw) To: Gabriel Paubert Cc: Alan Cox, Eric St-Laurent, Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli On Thu, 16 Dec 2004, Gabriel Paubert wrote: > On Wed, Dec 15, 2004 at 06:04:03PM +0000, Alan Cox wrote: > > On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote: > > > On a related subject, a few months ago you posted a patch which added a > > > nice add_timeout()/timeout_pending() API and converted many (if not > > > most) drivers to use it. > > > > > > If I remember correctly it did not generate much comments and the work > > > was not pushed into mainline. > > > > > > I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...) > > > construct is horrible. > > > > > > Any chance to resurrect this work ? > > > > I plan to ressurect it when I have a little time but with some small > > additions from the original work. Several people said "it should be mS > > not HZ" and someone at OLS proposed that the API also includes an > > accuracy guide so that systems using programmed wakeups can aggregate > > timers when accuracy doesn't matter. > > I suspect people who want to push HZ to 10000 won't be happy with > milliseconds since it would not give them a resolution of one jiffy. > > So the options are: > 1) microseconds, allows up to roughly half an hour (signed) > or an hour (unsigned). > 2) nanoseconds, needs 64 bits, nice for 64 bit machines but > at the risk of bloat on 32 bit ones. > 3) timespecs, somewhat wasteful on 64 bit machines (two longs). > > I believe 1) is the best compromise. Yep. And if the need for ns arises, add a _different_ function (e.g. *_ns()) to wait with ns-resolution. 
64 bit is probably not needed, who wants to wait for more than a few seconds with ns-resolution? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 126+ messages in thread
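The ranges quoted in the two mails above ("roughly half an hour (signed) or an hour (unsigned)" for microseconds, "a few seconds" for nanoseconds) follow directly from the width of a 32-bit counter. A short sketch, assuming a plain 32-bit integer holds the timeout; max_span_seconds is a hypothetical helper named here for illustration, not a kernel function:

```python
def max_span_seconds(bits, unit_seconds, signed):
    """Longest interval representable in a `bits`-wide counter of the given unit."""
    usable_bits = bits - 1 if signed else bits  # a signed value loses one bit to the sign
    return (2 ** usable_bits) * unit_seconds


us_signed_min = max_span_seconds(32, 1e-6, signed=True) / 60.0
us_unsigned_min = max_span_seconds(32, 1e-6, signed=False) / 60.0
ns_signed_s = max_span_seconds(32, 1e-9, signed=True)
ns_unsigned_s = max_span_seconds(32, 1e-9, signed=False)

print(f"32-bit microseconds, signed:   {us_signed_min:.1f} minutes")
print(f"32-bit microseconds, unsigned: {us_unsigned_min:.1f} minutes")
print(f"32-bit nanoseconds, signed:    {ns_signed_s:.2f} seconds")
print(f"32-bit nanoseconds, unsigned:  {ns_unsigned_s:.2f} seconds")
```

So a 32-bit microsecond value covers about 36 or 72 minutes, and a 32-bit nanosecond value covers only about 2 or 4 seconds, which is why a separate *_ns() entry point is suggested for the short, high-resolution waits rather than widening the main type to 64 bits.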
* Re: dynamic-hz 2004-12-16 9:10 ` dynamic-hz Gabriel Paubert 2004-12-16 12:17 ` dynamic-hz Geert Uytterhoeven @ 2004-12-16 14:00 ` Mitchell Blank Jr 1 sibling, 0 replies; 126+ messages in thread From: Mitchell Blank Jr @ 2004-12-16 14:00 UTC (permalink / raw) To: Gabriel Paubert Cc: Alan Cox, Eric St-Laurent, Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli Gabriel Paubert wrote: > So the options are: > 1) microseconds, allows up to roughly half an hour (signed) > or an hour (unsigned). > 2) nanoseconds, needs 64 bits, nice for 64 bit machines but > at the risk of bloat on 32 bit ones. > 3) timespecs, somewhat wasteful on 64 bit machines (two longs). Also forgive me if this has already been discussed (I might have missed some of the messages on these threads) but there's also Paul Henning Kamp's "bintime" format used in FreeBSD: http://phk.freebsd.dk/pubs/timecounter.pdf I'm not convinced it's the right solution for this problem but the paper does make a lot of good points. I also agree that any new timer API needs to have entry points for users that can handle an imprecise wakeup -- running multiple wakeups at once is important at high load. Another idea I've been toying around in my head related to this (but would need some instrumentation to prove) I bet on heavy server loads there's a lot of timeouts where: 1. Requested timeout is always >N seconds 2. Timeouts almost always get canceled well before (N/2) seconds have passed If this is the case you could make a pretty simple hack -- on each CPU keep two list_head's of timers -- lets call them "add_list" and "prev_list". Now every N/2 seconds do: tmp = prev_list prev_list = add_list add_list = EMPTY ...and then add all the timers on "tmp" to the normal timer queue. The advantage here is that to add a timer you just have to insert it onto add_list -- you don't have to keep these lists in order. 
By the time we get around to adding the timers on "tmp" to the main timer queue we know: 1. They are still waiting to expire (at most "N" seconds have elapsed since they were inserted) 2. Most of them have been canceled (since at least "N/2" seconds have passed) and were thus removed from the list. "tmp" should not have many elements remaining I think for some class of timeouts (device timeouts, network, etc) this should be pretty efficient. You'd have to do a bunch of instrumenting to see if there are enough timers with these characteristics to make this useful (and what would make a good value for 'N') I've got a zillion other projects so I'm not going to have a chance to do this, but maybe it'll give someone else some ideas. -Mitch ^ permalink raw reply [flat|nested] 126+ messages in thread
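The two-list idea above can be sketched in a few lines of Python. This is only an illustration of the rotation described in the mail, not kernel code: the class name LazyTimeouts, the (expiry, name) timer tuples and the use of Python sets (standing in for a list_head with O(1) unlink) and of heapq (standing in for the normal ordered timer queue) are all assumptions made for the example.

```python
import heapq


class LazyTimeouts:
    """Two unsorted buckets in front of an ordered main timer queue (one per CPU in the mail)."""

    def __init__(self):
        self.add_list = set()   # timers added during the current N/2 period
        self.prev_list = set()  # timers added during the previous N/2 period
        self.main_queue = []    # stand-in for the normal, ordered timer queue

    def add(self, timer):
        # Cheap insert: no ordering is maintained inside the buckets.
        self.add_list.add(timer)

    def cancel(self, timer):
        # Most timers are expected to be cancelled while still in a bucket.
        if timer in self.add_list or timer in self.prev_list:
            self.add_list.discard(timer)
            self.prev_list.discard(timer)
            return True
        if timer in self.main_queue:  # rare, slower path
            self.main_queue.remove(timer)
            heapq.heapify(self.main_queue)
            return True
        return False

    def rotate(self):
        """Run every N/2 seconds: tmp = prev_list; prev_list = add_list; add_list = EMPTY."""
        tmp = self.prev_list
        self.prev_list = self.add_list
        self.add_list = set()
        for timer in tmp:  # survivors have waited between N/2 and N seconds
            heapq.heappush(self.main_queue, timer)
        return len(tmp)


# Three timers are added, two are cancelled early, and only the survivor
# ever pays the cost of an ordered insert.
timeouts = LazyTimeouts()
a, b, c = (30.0, "a"), (45.0, "b"), (60.0, "c")
for t in (a, b, c):
    timeouts.add(t)
timeouts.cancel(a)
timeouts.cancel(b)
first_rotation = timeouts.rotate()   # c moves from add_list to prev_list
second_rotation = timeouts.rotate()  # c moves from prev_list to the main queue
print(first_rotation, second_rotation, timeouts.main_queue)
```

A timer spends at least one full period in prev_list, so it reaches the main queue no earlier than N/2 and no later than N seconds after it was added. That is why the scheme is only safe for timeouts that are always longer than N, as the mail requires.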
* Re: dynamic-hz 2004-12-13 13:58 ` dynamic-hz Russell King 2004-12-13 14:14 ` dynamic-hz Russell King 2004-12-13 14:52 ` dynamic-hz Alan Cox @ 2004-12-13 15:30 ` Zwane Mwaikambo 2004-12-13 15:59 ` dynamic-hz Russell King 2004-12-13 16:06 ` dynamic-hz Pavel Machek 3 siblings, 1 reply; 126+ messages in thread From: Zwane Mwaikambo @ 2004-12-13 15:30 UTC (permalink / raw) To: Russell King Cc: Stefan Seyfried, Con Kolivas, Pavel Machek, linux-kernel, Andrea Arcangeli Hi Russell, On Mon, 13 Dec 2004, Russell King wrote: > This is an easy thing to grab hold of, but rather pointless in the > overall scheme of things. Those of us who have done power usage > measurements know this already. > > The only case where this really makes sense is where the CPU power > usage outweighs the power consumption of all other peripherals by > at least an order of magnitude such that the rest of the system is > insignificant compared to the CPU power. > > Note: the above CPU power consumption figures were taken from > the Intel PXA255 processor electrical specifications, and the > "rest of the system" current consumption taken from a real life > device. The timer interrupt taking 2us is probably an over- > estimation. Only the battery lifetime of 24 hours is ficticious. While i do not disagree with your research and resultant conclusions for the PXA255 processor i think it may not be as representative of some of the target systems we're interested in, that is, x86 (cringe, cringe). A number of i386 systems enter model defined partial suspend states when execution of the hlt instruction takes place, resuming from these suspend states draws more power for a short period of time thus doing this every millisecond is going to be detrimental to total power consumption over time. But this isn't only an i386 trait as other desktop/workstation processors are similar. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 15:30 ` dynamic-hz Zwane Mwaikambo @ 2004-12-13 15:59 ` Russell King 2004-12-13 16:14 ` dynamic-hz Pavel Machek 0 siblings, 1 reply; 126+ messages in thread From: Russell King @ 2004-12-13 15:59 UTC (permalink / raw) To: Zwane Mwaikambo Cc: Stefan Seyfried, Con Kolivas, Pavel Machek, linux-kernel, Andrea Arcangeli On Mon, Dec 13, 2004 at 08:30:51AM -0700, Zwane Mwaikambo wrote: > Hi Russell, > > On Mon, 13 Dec 2004, Russell King wrote: > > > This is an easy thing to grab hold of, but rather pointless in the > > overall scheme of things. Those of us who have done power usage > > measurements know this already. > > > > The only case where this really makes sense is where the CPU power > > usage outweighs the power consumption of all other peripherals by > > at least an order of magnitude such that the rest of the system is > > insignificant compared to the CPU power. > > > > Note: the above CPU power consumption figures were taken from > > the Intel PXA255 processor electrical specifications, and the > > "rest of the system" current consumption taken from a real life > > device. The timer interrupt taking 2us is probably an over- > > estimation. Only the battery lifetime of 24 hours is ficticious. > > While i do not disagree with your research and resultant conclusions for > the PXA255 processor i think it may not be as representative of some of > the target systems we're interested in, that is, x86 (cringe, cringe). A > number of i386 systems enter model defined partial suspend states when > execution of the hlt instruction takes place, resuming from these suspend > states draws more power for a short period of time thus doing this every > millisecond is going to be detrimental to total power consumption over > time. But this isn't only an i386 trait as other desktop/workstation > processors are similar. I think you missed the emphasis of my mail - one on measurement to validate if this technology actually buys you anything. 
The second thing you missed is that drawing a lot of power for a short time may result in a rather negligible reduction in the overall scheme of things. Until you measure it and do the calculation, you'll never know. I can make the same comments as you above about the PXA255 processor. "The PXA255 has a special idle mode which reduces power consumption via the use of a special instruction. Resuming from this state to service the timer interrupt results in more power being drawn for a short period of time, thus doing this every millisecond is going to be detrimental to the total power consumption over time." See? Measurements in reality give the true story. Words are just that - words - which may not reflect reality. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 15:59 ` dynamic-hz Russell King @ 2004-12-13 16:14 ` Pavel Machek 0 siblings, 0 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-13 16:14 UTC (permalink / raw) To: Zwane Mwaikambo, Stefan Seyfried, Con Kolivas, linux-kernel, Andrea Arcangeli Hi! > See? Measurements in reality give the true story. Words are just > that - words - which may not reflect reality. On athlon64 notebook, HZ=1000 takes 1W of power. That's 5% of overall power consumption, and as much as spinning disk takes. I'd say that's rather significant. Pavel -- Boycott Kodak -- for their patent abuse against Java. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 13:58 ` dynamic-hz Russell King ` (2 preceding siblings ...) 2004-12-13 15:30 ` dynamic-hz Zwane Mwaikambo @ 2004-12-13 16:06 ` Pavel Machek 3 siblings, 0 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-13 16:06 UTC (permalink / raw) To: Stefan Seyfried, Con Kolivas, linux-kernel, Andrea Arcangeli Hi! > > > Just being devil's advocate here... > > > > > > I had variable Hz in my tree for a while and found there was one > > > solitary purpose to setting Hz to 100; to silence cheap capacitors. > > > > power savings? Having the cpu wake up 1000 times per second if the > > machine is idle cannot be better than only waking it up 100 times. > > > > Yes, i am always on the quest for the 5 extra minutes on battery :-) > > This is an easy thing to grab hold of, but rather pointless in the > overall scheme of things. Those of us who have done power usage > measurements know this already. > > The only case where this really makes sense is where the CPU power > usage outweighs the power consumption of all other peripherals by > at least an order of magnitude such that the rest of the system is > insignificant compared to the CPU power. Why by order of magnitude? Anyway on PC machines, cpu in low-power mode takes about as much as rest of system, and in high-power mode it takes more than rest of system combined. I measured 1W savings from HZ=100, and that was on system that takes 17W total (arima athlon64 notebook). That is > 5%. > Let's take an example. Let's say that: > * a CPU runs at about 245mA when active > * 90mA when inactive > * the timer interrupt takes 2us to execute 1000 times a second > * no other processing is occurring You assume that cpu goes to sleep immediately. That is *very* far away from reality on at least pentium 4. It takes half a millisecond to sleep/wakeup the cpu, that basically means that low power mode is not ever entered with HZ=1000... Pavel -- Boycott Kodak -- for their patent abuse against Java.
^ permalink raw reply [flat|nested] 126+ messages in thread
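To put numbers on the disagreement above, here is a rough sketch of the duty-cycle arithmetic. It uses the figures quoted from Russell (245mA active, 90mA inactive, 2us per timer interrupt) and, purely as an assumption for illustration, charges Pavel's half-millisecond sleep/wakeup cost to every tick as time spent at the active current. The function name and the model are hypothetical; mixing PXA255 currents with a Pentium 4 latency is not a measurement of either machine.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average current in microamps for a CPU that is woken `hz` times a second.
 * Each wakeup is assumed to cost `tick_us` of interrupt handling plus
 * `transition_us` of sleep/wakeup overhead, all at the active current.
 * This is an illustrative model, not something taken from the thread's patch.
 */
uint64_t avg_current_ua(uint64_t hz, uint64_t tick_us, uint64_t transition_us,
                        uint64_t active_ua, uint64_t idle_ua)
{
    const uint64_t second_us = 1000000;
    uint64_t busy_us;

    assert(hz > 0);

    /* Microseconds per second spent out of the low-power state. */
    busy_us = hz * (tick_us + transition_us);
    if (busy_us > second_us)
        busy_us = second_us;    /* the CPU never reaches the idle state */

    /* Time-weighted average of the two current levels. */
    return (busy_us * active_ua + (second_us - busy_us) * idle_ua) / second_us;
}
```

With no transition cost, avg_current_ua(1000, 2, 0, 245000, 90000) gives 90310 uA against 90031 uA at 100 ticks per second, which is Russell's point. With a 500us transition per tick the same call gives 167810 uA against 97781 uA, which is Pavel's. Which model is closer to a given machine is exactly what has to be measured.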
* Re: dynamic-hz 2004-12-13 7:43 ` dynamic-hz Stefan Seyfried 2004-12-13 13:58 ` dynamic-hz Russell King @ 2004-12-13 16:19 ` Jan Engelhardt 1 sibling, 0 replies; 126+ messages in thread From: Jan Engelhardt @ 2004-12-13 16:19 UTC (permalink / raw) To: Stefan Seyfried; +Cc: Con Kolivas, Pavel Machek, linux-kernel, Andrea Arcangeli >> Just being devils advocate here... >> >> I had variable Hz in my tree for a while and found there was one >> solitary purpose to setting Hz to 100; to silence cheap capacitors. > >power savings? Having the cpu wake up 1000 times per second if the >machine is idle cannot be better than only waking it up 100 times. That's like saying waking up the CPU just to perform a HLT operation would be useless. Jan Engelhardt -- ENOSPC ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-12 23:36 ` dynamic-hz Con Kolivas ` (2 preceding siblings ...) 2004-12-13 7:43 ` dynamic-hz Stefan Seyfried @ 2004-12-13 8:29 ` Jan Engelhardt 2004-12-14 22:54 ` dynamic-hz Lee Revell 2004-12-13 11:02 ` dynamic-hz Andrew Morton ` (2 subsequent siblings) 6 siblings, 1 reply; 126+ messages in thread From: Jan Engelhardt @ 2004-12-13 8:29 UTC (permalink / raw) Cc: linux-kernel > Just being devils advocate here... > > I had variable Hz in my tree for a while and found there was one solitary > purpose to setting Hz to 100; to silence cheap capacitors. > > The rest of my users that were setting Hz to 100 for so-called performance > gains were doing so under the false impression that cpu usage was lower simply > because of the woefully inaccurate cpu usage calcuation at 100Hz. I have found that mplayer drops audio less often when the harddisk is under load. Jan Engelhardt -- ENOSPC ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 8:29 ` dynamic-hz Jan Engelhardt @ 2004-12-14 22:54 ` Lee Revell 2004-12-14 23:38 ` dynamic-hz Chris Friesen 0 siblings, 1 reply; 126+ messages in thread From: Lee Revell @ 2004-12-14 22:54 UTC (permalink / raw) To: Jan Engelhardt; +Cc: linux-kernel On Mon, 2004-12-13 at 09:29 +0100, Jan Engelhardt wrote: > > Just being devils advocate here... > > > > I had variable Hz in my tree for a while and found there was one solitary > > purpose to setting Hz to 100; to silence cheap capacitors. > > > > The rest of my users that were setting Hz to 100 for so-called performance > > gains were doing so under the false impression that cpu usage was lower simply > > because of the woefully inaccurate cpu usage calcuation at 100Hz. > > I have found that mplayer drops audio less often when the harddisk is under > load. > Ugh, because mplayer stupidly does disk i/o and AV playback and GUI in the same thread. Insert Xine plug. Lee ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 22:54 ` dynamic-hz Lee Revell @ 2004-12-14 23:38 ` Chris Friesen 2004-12-15 8:32 ` dynamic-hz Jan Engelhardt 0 siblings, 1 reply; 126+ messages in thread From: Chris Friesen @ 2004-12-14 23:38 UTC (permalink / raw) To: Lee Revell; +Cc: Jan Engelhardt, linux-kernel Lee Revell wrote: > Ugh, because mplayer stupidly does disk i/o and AV playback and GUI in > the same thread. Insert Xine plug. This is not a problem as long as all of them can be done totally async. As soon as anything blocks, then there's an issue. Chris ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 23:38 ` dynamic-hz Chris Friesen @ 2004-12-15 8:32 ` Jan Engelhardt 0 siblings, 0 replies; 126+ messages in thread From: Jan Engelhardt @ 2004-12-15 8:32 UTC (permalink / raw) To: Chris Friesen; +Cc: Lee Revell, linux-kernel >> Ugh, because mplayer stupidly does disk i/o and AV playback and GUI in >> the same thread. Insert Xine plug. There has been a real flame war on using threads in mplayer -- it ended in a fork into mplayer-xp. Surprisingly, the problem is not mplayer. Using the OSS kernel modules instead of ALSA, audio may drop, but it does not _skip_ it. > This is not a problem as long as all of them can be done totally async. As > soon as anything blocks, then there's an issue. Is there a way i can prioritize mplayer to get disk i/o done first? Jan Engelhardt -- ENOSPC ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-12 23:36 ` dynamic-hz Con Kolivas ` (3 preceding siblings ...) 2004-12-13 8:29 ` dynamic-hz Jan Engelhardt @ 2004-12-13 11:02 ` Andrew Morton 2004-12-13 11:17 ` dynamic-hz Andrea Arcangeli 2004-12-13 11:19 ` dynamic-hz Hans Kristian Rosbach 2004-12-13 12:00 ` dynamic-hz Alan Cox 2004-12-14 22:28 ` dynamic-hz Lee Revell 6 siblings, 2 replies; 126+ messages in thread From: Andrew Morton @ 2004-12-13 11:02 UTC (permalink / raw) To: Con Kolivas; +Cc: andrea, pavel, linux-kernel Con Kolivas <kernel@kolivas.org> wrote: > > The performance benefit, if any, is often lost in noise during > benchmarks and when there, is less than 1%. So I was wondering if you > had some specific advantage in mind for this patch? Is there some > arch-specific advantage? I can certainly envision disadvantages to lower Hz. There are apparently some laptops which exhibit appreciable latency between the start of ACPI sleep and actually consuming less power. The 1ms wakeup frequency will shorten battery life on these machines significantly. (I forget the exact numbers - Len will know). So I guess we're going to have to do this sometime - I don't think there's any other solution apart from going fully tickless, which would be considerably more intrusive. We should retain the option of compile-time constant HZ - it's easy enough. Probably the patch already does that. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 11:02 ` dynamic-hz Andrew Morton @ 2004-12-13 11:17 ` Andrea Arcangeli 2004-12-13 11:25 ` dynamic-hz Andrew Morton 2004-12-13 11:19 ` dynamic-hz Hans Kristian Rosbach 1 sibling, 1 reply; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 11:17 UTC (permalink / raw) To: Andrew Morton; +Cc: Con Kolivas, pavel, linux-kernel On Mon, Dec 13, 2004 at 03:02:37AM -0800, Andrew Morton wrote: > We should retain the option of compile-time constant HZ - it's > easy enough. Probably the patch already does that. The patch only does HZ at dynamic time. But of course it's absolutely trivial to define it at compile time, it's probably a 3 liner on top of my current patch ;). However personally I don't think the three liner will be worth the few seconds more spent configuring the kernel ;). The HZ cacheline is pure readonly (actually I'm not defining it as cacheline_aligned, I probably should, __HZ can go together with __SHIFT_HZ). The only debug option I introduced (because it could have a performance penalty) is a check that nobody ever attempts to read HZ before we initialized it by parsing the boot command line. If that happens I printk and then I fall back to the fixed-HZ, so machine works fine even in case of bugs and I get the debugging printk. That code actually never triggered once. I did it primarily during development to be sure I could debug fast troubles with other archs (this is already running in all archs with SLES8). This is pretty much the core of the patch:

+extern unsigned long __HZ;
+
+static inline unsigned long get_hz(void)
+{
+#ifdef CONFIG_DEBUG_HZ
+        if (unlikely(!__HZ)) {
+                __label__ here;
+                printk("early HZ: %p\n", &&here);
+        here:
+                init_HZ(USER_HZ);
+        }
+#endif /* CONFIG_DEBUG_HZ */
+        return __HZ;
+}
+
+#define HZ get_hz()
+
+#define CLOCKS_PER_SEC (USER_HZ) /* like times() */
+
+#define jiffies_to_clock_t(x) (likely((HZ) >= (USER_HZ)) ? \
+        (x + ((HZ) / (USER_HZ)) - 1) / ((HZ) / (USER_HZ)) : \
+        (x) * ((USER_HZ) / (HZ)))
+#define user_to_kernel_hz(x) (likely((HZ) >= (USER_HZ)) ? \
+        (x) * ((HZ) / (USER_HZ)) : \
+        (x + ((USER_HZ) / (HZ)) - 1) / ((USER_HZ) / (HZ)))
+#define user_to_kernel_hz_overflow(x) ((x * (HZ) + (USER_HZ) - 1) / (USER_HZ))
[..]
+++ x/kernel/sched.c 2004-05-31 15:51:42.722918448 +0200
@@ -45,6 +45,8 @@
 #define TASK_USER_PRIO(p) USER_PRIO((p)->static_prio)
 #define MAX_USER_PRIO (USER_PRIO(MAX_PRIO))
+unsigned long __HZ, __SHIFT_HZ;
+
 /*
^ permalink raw reply [flat|nested] 126+ messages in thread
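The conversion macros in the patch excerpt above are easier to check when written as plain functions with HZ and USER_HZ passed in. The following is a user-space sketch only: the function names are hypothetical, and the arithmetic simply mirrors jiffies_to_clock_t() and user_to_kernel_hz() as quoted, including their use of the integer ratio between the two rates.

```c
#include <assert.h>

/*
 * Kernel ticks -> user-visible ticks, mirroring the quoted
 * jiffies_to_clock_t(): round up when HZ >= USER_HZ, multiply otherwise.
 */
unsigned long jiffies_to_user_ticks(unsigned long jiffies,
                                    unsigned long hz, unsigned long user_hz)
{
    assert(hz > 0 && user_hz > 0);
    if (hz >= user_hz) {
        unsigned long ratio = hz / user_hz;     /* integer ratio, truncated */
        return (jiffies + ratio - 1) / ratio;
    }
    return jiffies * (user_hz / hz);
}

/*
 * User-visible ticks -> kernel ticks, mirroring the quoted
 * user_to_kernel_hz(): multiply when HZ >= USER_HZ, round up otherwise.
 */
unsigned long user_ticks_to_jiffies(unsigned long ticks,
                                    unsigned long hz, unsigned long user_hz)
{
    assert(hz > 0 && user_hz > 0);
    if (hz >= user_hz)
        return ticks * (hz / user_hz);
    {
        unsigned long ratio = user_hz / hz;     /* integer ratio, truncated */
        return (ticks + ratio - 1) / ratio;
    }
}
```

With HZ=1000 and USER_HZ=100, 25 jiffies round up to 3 user ticks. For a rate that is not a multiple of USER_HZ, such as HZ=333, the integer ratio is 3, so 333 jiffies convert to 111 user ticks rather than 100 if the macro is applied exactly as quoted.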
* Re: dynamic-hz 2004-12-13 11:17 ` dynamic-hz Andrea Arcangeli @ 2004-12-13 11:25 ` Andrew Morton 2004-12-13 11:47 ` dynamic-hz Andrea Arcangeli 2004-12-14 3:54 ` dynamic-hz Nish Aravamudan 0 siblings, 2 replies; 126+ messages in thread From: Andrew Morton @ 2004-12-13 11:25 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: kernel, pavel, linux-kernel Andrea Arcangeli <andrea@suse.de> wrote: > > The patch only does HZ at dynamic time. But of course it's absolutely > trivial to define it at compile time, it's probably a 3 liner on top of > my current patch ;). However personally I don't think the three liner > will worth the few seconds more spent configuring the kernel ;). We still have 1000-odd places which do things like schedule_timeout(HZ/10); which will now involve a runtime divide. The propagation of msleep() and ssleep() will reduce that a bit, but not much. It's so simple to turn all those into compile-time divides that we may as well do it. ^ permalink raw reply [flat|nested] 126+ messages in thread
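The trade-off Andrew describes can be sketched in a few lines. This is an illustration under stated assumptions, not the patch: FIXED_HZ is a hypothetical build-time switch, runtime_hz stands in for the boot-time variable (the patch calls its own variable __HZ), and the helper name is made up. The point is only that the same source expression, HZ / 10, is a constant in one configuration and a load plus a divide in the other.

```c
#include <assert.h>

/* Stand-in for the value parsed from the boot command line. */
unsigned long runtime_hz = 1000;

#ifdef FIXED_HZ                 /* hypothetical build-time switch */
#define HZ FIXED_HZ             /* HZ / 10 is folded by the compiler */
#else
static inline unsigned long get_hz(void)
{
    assert(runtime_hz > 0);     /* HZ must be initialised before first use */
    return runtime_hz;
}
#define HZ get_hz()             /* HZ / 10 becomes a load and a divide */
#endif

/* Same source line either way, as in schedule_timeout(HZ/10). */
unsigned long tenth_of_a_second_in_jiffies(void)
{
    return HZ / 10;
}
```

Built without FIXED_HZ, the helper follows whatever runtime_hz holds: 100 jiffies at 1000 Hz, 25 at 250 Hz. Built with -DFIXED_HZ=1000, the division disappears at compile time, which is the option Andrew asks to keep.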
* Re: dynamic-hz 2004-12-13 11:25 ` dynamic-hz Andrew Morton @ 2004-12-13 11:47 ` Andrea Arcangeli 2004-12-14 3:56 ` dynamic-hz Nish Aravamudan 2004-12-14 3:54 ` dynamic-hz Nish Aravamudan 1 sibling, 1 reply; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 11:47 UTC (permalink / raw) To: Andrew Morton; +Cc: kernel, pavel, linux-kernel On Mon, Dec 13, 2004 at 03:25:21AM -0800, Andrew Morton wrote: > We still have 1000-odd places which do things like > > schedule_timeout(HZ/10); > > which will now involve a runtime divide. The propagation of msleep() and > ssleep() will reduce that a bit, but not much. The above is by far the least cpu-hungry piece, it's going to sleep for 100msec, so any order-of-nanoseconds computation in such path will be by definition not measurable. msleep and ssleep as well will obviously be non measurable for the same reason (their only point is to wait and "waste" cpu). I mean, msleep/ssleep are the only places in the kernel that we don't really need to optimize ;). Most other fast paths can't execute the division or multiplication at compile time anyway, so they'd only save 1 cacheline (at the expense of a bit larger icache). > It's so simple to turn all those into compile-time divides that we may as > well do it. I'm not against leaving a compile time option, it's absolutely trivial to add it, but I just don't think it'll provide any measurable benefit in practice, while the ability to switch HZ provides tangible benefits (even to be able to set HZ to higher frequencies than 1khz, so that people can post a nanosleep call that will return in 0.1msec instead of 1msec).
Perhaps __HZ could hurt a bit on a NUMA box where the icache may be spread on the local nodes and the __HZ not, but then the __HZ could be made a __per_cpu variable conditionally to NUMA and they would get dynamic settable hz too, which I believe is significant for a numa box since if they're doing just userspace computing they don't need a fast HZ and they can get back 1% of their cpu power from every cpu in the system (on a 512-way system that's quite a lot more than what you will ever get back from HZ set at compile time ;). ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 11:47 ` dynamic-hz Andrea Arcangeli @ 2004-12-14 3:56 ` Nish Aravamudan 0 siblings, 0 replies; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 3:56 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Andrew Morton, kernel, pavel, linux-kernel On Mon, 13 Dec 2004 12:47:37 +0100, Andrea Arcangeli <andrea@suse.de> wrote: > On Mon, Dec 13, 2004 at 03:25:21AM -0800, Andrew Morton wrote: > > We still have 1000-odd places which do things like > > > > schedule_timeout(HZ/10); > > > > which will now involve a runtime divide. The propagation of msleep() and > > ssleep() will reduce that a bit, but not much. > > The above is by far the least cpu-hungry piece, it's going to sleep for > 100msec, so any order-of-nanoseconds computation in such path will be by > defininition not measurable. > > msleep and ssleep as well will obviously be non measurable for the same > reason (their only point is to wait and "waste" cpu). I mean, > msleep/ssleep are the only places in the kernel that we don't really > need to optimize ;). I don't exactly understand what you mean by ""waste" cpu"? They both give up the CPU by calling schedule_timeout() which calls schedule(). So any "waste" of the CPU is due to no tasks being available to run, not to msleep()/ssleep(). I think :) -Nish ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 11:25 ` dynamic-hz Andrew Morton 2004-12-13 11:47 ` dynamic-hz Andrea Arcangeli @ 2004-12-14 3:54 ` Nish Aravamudan 2004-12-14 4:29 ` dynamic-hz Andrew Morton ` (2 more replies) 1 sibling, 3 replies; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 3:54 UTC (permalink / raw) To: Andrew Morton; +Cc: Andrea Arcangeli, kernel, pavel, linux-kernel On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote: > Andrea Arcangeli <andrea@suse.de> wrote: > > > > The patch only does HZ at dynamic time. But of course it's absolutely > > trivial to define it at compile time, it's probably a 3 liner on top of > > my current patch ;). However personally I don't think the three liner > > will worth the few seconds more spent configuring the kernel ;). > > We still have 1000-odd places which do things like > > schedule_timeout(HZ/10); Yes, yes, we do :) I replaced far more than I ever thought I could... There are a few issues I have with the remaining schedule_timeout() calls which I think fit ok with this thread... I'd especially like your input, Andrew, as you end up getting most of my patches from KJ. Many drivers use set_current_state(TASK_{UN,}INTERRUPTIBLE); schedule_timeout(1); // or some other small value < 10 This may or may not hide a dependency on a particular HZ value. If the code is somewhat old, perhaps the author intended the task to sleep for 1 jiffy when HZ was equal to 100. That meant that they ended up sleeping for 10 ms. If the code is new, the author intends that the task sleeps for 1 ms (HZ==1000). The question is, what should the replacement be? If they really meant to use schedule_timeout(1) in the sense of highest resolution delay possible (the latter above), then they probably should just call schedule() directly. schedule_timeout(1) simply sets up a timer to fire off after 1 jiffy & then calls schedule() itself. 
The overhead of setting up a timer and the execution of schedule() itself probably means that the timer will go off in the middle of the schedule() call or very shortly thereafter (I think). In which case, it makes more sense to use schedule() directly... If they meant to schedule a delay of 10ms, then msleep() should be used in those cases. msleep() will also resolve the issues with 0-time timeouts because of rounding, as it adds 1 to the converted parameter. Obviously, changing more and more sleeps to msecs & secs will really help make the changing of HZ more transparent. And specifying the time in real time units just seems so much clearer to me. What do people think? -Nish ^ permalink raw reply [flat|nested] 126+ messages in thread
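Nish's remark about 0-time timeouts can be made concrete with a small user-space sketch. The helpers below are hypothetical stand-ins, not kernel API: the round-up conversion is an assumption about how milliseconds map onto jiffies, and the extra jiffy follows Nish's description of msleep() adding 1 to the converted parameter.

```c
#include <assert.h>

/* Naive conversion: truncates, so short delays can collapse to 0 jiffies. */
unsigned long ms_to_jiffies_truncate(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return ms * hz / 1000;
}

/* Conversion that rounds up to a whole number of jiffies. */
unsigned long ms_to_jiffies_round_up(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999) / 1000;
}

/*
 * Timeout an msleep()-style helper would pass to the scheduler:
 * the converted value plus one jiffy, so the sleep cannot end early.
 */
unsigned long msleep_timeout_jiffies(unsigned long ms, unsigned long hz)
{
    return ms_to_jiffies_round_up(ms, hz) + 1;
}
```

At HZ=100 a 1 ms request truncates to 0 jiffies, which is the 0-time timeout Nish mentions, while the rounded-up form gives 1. A 10 ms msleep()-style timeout comes out as 2 jiffies at HZ=100 and 11 jiffies at HZ=1000, so the caller sleeps for at least the requested 10 ms whichever tick rate the kernel was booted with.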
* Re: dynamic-hz 2004-12-14 3:54 ` dynamic-hz Nish Aravamudan @ 2004-12-14 4:29 ` Andrew Morton 2004-12-14 5:25 ` dynamic-hz Nish Aravamudan 2004-12-17 20:10 ` dynamic-hz Nish Aravamudan 2004-12-14 10:01 ` dynamic-hz Domen Puncer 2004-12-14 14:23 ` dynamic-hz linux-os 2 siblings, 2 replies; 126+ messages in thread From: Andrew Morton @ 2004-12-14 4:29 UTC (permalink / raw) To: Nish Aravamudan; +Cc: andrea, kernel, pavel, linux-kernel Nish Aravamudan <nish.aravamudan@gmail.com> wrote: > > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote: > > Andrea Arcangeli <andrea@suse.de> wrote: > > > > > > The patch only does HZ at dynamic time. But of course it's absolutely > > > trivial to define it at compile time, it's probably a 3 liner on top of > > > my current patch ;). However personally I don't think the three liner > > > will worth the few seconds more spent configuring the kernel ;). > > > > We still have 1000-odd places which do things like > > > > schedule_timeout(HZ/10); > > Yes, yes, we do :) I replaced far more than I ever thought I could... > There are a few issues I have with the remaining schedule_timeout() > calls which I think fit ok with this thread... I'd especially like > your input, Andrew, as you end up getting most of my patches from KJ. > > Many drivers use > > set_current_state(TASK_{UN,}INTERRUPTIBLE); > schedule_timeout(1); // or some other small value < 10 > > This may or may not hide a dependency on a particular HZ value. If the > code is somewhat old, perhaps the author intended the task to sleep > for 1 jiffy when HZ was equal to 100. That meants that they ended up > sleeping for 10 ms. If the code is new, the author intends that the > task sleeps for 1 ms (HZ==1000). The question is, what should the > replacement be? Presumably they meant 10 milliseconds. Or at least, that is the delay which the developer did his testing with. 
> If they really meant to use schedule_timeout(1) in the sense of > highest resolution delay possible (the latter above), then they > probably should just call schedule() directly. argh. Never do that. It's basically a busywait and can cause lockups if the calling task has realtime scheduling policy. > schedule_timeout(1) > simply sets up a timer to fire off after 1 jiffy & then calls > schedule() itself. The overhead of setting up a timer and the > execution of schedule() itself probably means that the timer will go > off in the middle of the schedule() call or very shortly thereafter (I > think). In which case, it makes more sense to use schedule() > directly... > > If they meant to schedule a delay of 10ms, then msleep() should be > used in those cases. msleep() will also resolve the issues with 0-time > timeouts because of rounding, as it adds 1 to the converted parameter. > > Obviously, changing more and more sleeps to msecs & secs will really > help make the changing of HZ more transparent. And specifying the time > in real time units just seems so much clearer to me. > > What do people think? I'd say that replacing them with msleep(10) is the safest approach. Depending on what the surrounding code is actually doing, of course. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 4:29 ` dynamic-hz Andrew Morton @ 2004-12-14 5:25 ` Nish Aravamudan 2004-12-17 20:10 ` dynamic-hz Nish Aravamudan 1 sibling, 0 replies; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 5:25 UTC (permalink / raw) To: Andrew Morton; +Cc: andrea, kernel, pavel, linux-kernel On Mon, 13 Dec 2004 20:29:39 -0800, Andrew Morton <akpm@osdl.org> wrote: > Nish Aravamudan <nish.aravamudan@gmail.com> wrote: > > > > > > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote: > > > Andrea Arcangeli <andrea@suse.de> wrote: > > > > > > > > The patch only does HZ at dynamic time. But of course it's absolutely > > > > trivial to define it at compile time, it's probably a 3 liner on top of > > > > my current patch ;). However personally I don't think the three liner > > > > will worth the few seconds more spent configuring the kernel ;). > > > > > > We still have 1000-odd places which do things like > > > > > > schedule_timeout(HZ/10); > > > > Yes, yes, we do :) I replaced far more than I ever thought I could... > > There are a few issues I have with the remaining schedule_timeout() > > calls which I think fit ok with this thread... I'd especially like > > your input, Andrew, as you end up getting most of my patches from KJ. > > > > Many drivers use > > > > set_current_state(TASK_{UN,}INTERRUPTIBLE); > > schedule_timeout(1); // or some other small value < 10 > > > > This may or may not hide a dependency on a particular HZ value. If the > > code is somewhat old, perhaps the author intended the task to sleep > > for 1 jiffy when HZ was equal to 100. That meants that they ended up > > sleeping for 10 ms. If the code is new, the author intends that the > > task sleeps for 1 ms (HZ==1000). The question is, what should the > > replacement be? > > Presumably they meant 10 milliseconds. Or at least, that is the delay > which the developer did his testing with. OK, I will make a set of these changes soon, hopefully. 
> > If they really meant to use schedule_timeout(1) in the sense of > > highest resolution delay possible (the latter above), then they > > probably should just call schedule() directly. > > argh. Never do that. It's basically a busywait and can cause lockups if > the calling task has realtime scheduling policy. OK, I won't make any such changes in my next next set of patches. > > schedule_timeout(1) > > simply sets up a timer to fire off after 1 jiffy & then calls > > schedule() itself. The overhead of setting up a timer and the > > execution of schedule() itself probably means that the timer will go > > off in the middle of the schedule() call or very shortly thereafter (I > > think). In which case, it makes more sense to use schedule() > > directly... > > > > If they meant to schedule a delay of 10ms, then msleep() should be > > used in those cases. msleep() will also resolve the issues with 0-time > > timeouts because of rounding, as it adds 1 to the converted parameter. > > > > Obviously, changing more and more sleeps to msecs & secs will really > > help make the changing of HZ more transparent. And specifying the time > > in real time units just seems so much clearer to me. > > > > What do people think? > > I'd say that replacing them with msleep(10) is the safest approach. > Depending on what the surronding code is actually doing, of course. Thanks for the info! -Nish ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 4:29 ` dynamic-hz Andrew Morton 2004-12-14 5:25 ` dynamic-hz Nish Aravamudan @ 2004-12-17 20:10 ` Nish Aravamudan 1 sibling, 0 replies; 126+ messages in thread From: Nish Aravamudan @ 2004-12-17 20:10 UTC (permalink / raw) To: Andrew Morton; +Cc: andrea, kernel, pavel, linux-kernel On Mon, 13 Dec 2004 20:29:39 -0800, Andrew Morton <akpm@osdl.org> wrote: > Nish Aravamudan <nish.aravamudan@gmail.com> wrote: > > > > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote: > > > Andrea Arcangeli <andrea@suse.de> wrote: > > > > > > > > The patch only does HZ at dynamic time. But of course it's absolutely > > > > trivial to define it at compile time, it's probably a 3 liner on top of > > > > my current patch ;). However personally I don't think the three liner > > > > will worth the few seconds more spent configuring the kernel ;). > > > > > > We still have 1000-odd places which do things like > > > > > > schedule_timeout(HZ/10); > > > > Yes, yes, we do :) I replaced far more than I ever thought I could... > > There are a few issues I have with the remaining schedule_timeout() > > calls which I think fit ok with this thread... I'd especially like > > your input, Andrew, as you end up getting most of my patches from KJ. > > > > Many drivers use > > > > set_current_state(TASK_{UN,}INTERRUPTIBLE); > > schedule_timeout(1); // or some other small value < 10 > > > > This may or may not hide a dependency on a particular HZ value. If the > > code is somewhat old, perhaps the author intended the task to sleep > > for 1 jiffy when HZ was equal to 100. That meants that they ended up > > sleeping for 10 ms. If the code is new, the author intends that the > > task sleeps for 1 ms (HZ==1000). The question is, what should the > > replacement be? > > Presumably they meant 10 milliseconds. Or at least, that is the delay > which the developer did his testing with. 
> > > If they really meant to use schedule_timeout(1) in the sense of > > highest resolution delay possible (the latter above), then they > > probably should just call schedule() directly. > > argh. Never do that. It's basically a busywait and can cause lockups if > the calling task has realtime scheduling policy. For those drivers that use schedule() calls currently to delay, what would you recommend? drivers/atm/ambassador.c contains a few examples. I can get rid of most of the schedule_timeout() calls, but the schedule() ones are a little more difficult. Would schedule_timeout(1) be preferred to schedule()? Thanks, Nish ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 3:54 ` dynamic-hz Nish Aravamudan 2004-12-14 4:29 ` dynamic-hz Andrew Morton @ 2004-12-14 10:01 ` Domen Puncer 2004-12-14 16:56 ` dynamic-hz Nish Aravamudan 2004-12-14 14:23 ` dynamic-hz linux-os 2 siblings, 1 reply; 126+ messages in thread From: Domen Puncer @ 2004-12-14 10:01 UTC (permalink / raw) To: Nish Aravamudan Cc: Andrew Morton, Andrea Arcangeli, kernel, pavel, linux-kernel On 13/12/04 19:54 -0800, Nish Aravamudan wrote: > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote: > > Andrea Arcangeli <andrea@suse.de> wrote: > > > > > > The patch only does HZ at dynamic time. But of course it's absolutely > > > trivial to define it at compile time, it's probably a 3 liner on top of > > > my current patch ;). However personally I don't think the three liner > > > will worth the few seconds more spent configuring the kernel ;). > > > > We still have 1000-odd places which do things like > > > > schedule_timeout(HZ/10); > ... > Many drivers use > > set_current_state(TASK_{UN,}INTERRUPTIBLE); > schedule_timeout(1); // or some other small value < 10 > ... > If they really meant to use schedule_timeout(1) in the sense of > highest resolution delay possible (the latter above), then they > probably should just call schedule() directly. Um... no (and you should remember this from our discussions), schedule() gives up cpu until waitqueue wakeup or signal is received, and that can be a really long delay :-) Domen ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 10:01 ` dynamic-hz Domen Puncer @ 2004-12-14 16:56 ` Nish Aravamudan 0 siblings, 0 replies; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 16:56 UTC (permalink / raw) To: Domen Puncer; +Cc: Andrew Morton, Andrea Arcangeli, kernel, pavel, linux-kernel On Tue, 14 Dec 2004 11:01:23 +0100, Domen Puncer <domen@coderock.org> wrote: > On 13/12/04 19:54 -0800, Nish Aravamudan wrote: > > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote: > > > Andrea Arcangeli <andrea@suse.de> wrote: > > > > > > > > The patch only does HZ at dynamic time. But of course it's absolutely > > > > trivial to define it at compile time, it's probably a 3 liner on top of > > > > my current patch ;). However personally I don't think the three liner > > > > will worth the few seconds more spent configuring the kernel ;). > > > > > > We still have 1000-odd places which do things like > > > > > > schedule_timeout(HZ/10); > > > ... > > Many drivers use > > > > set_current_state(TASK_{UN,}INTERRUPTIBLE); > > schedule_timeout(1); // or some other small value < 10 > > > ... > > If they really meant to use schedule_timeout(1) in the sense of > > highest resolution delay possible (the latter above), then they > > probably should just call schedule() directly. > > Um... no (and you should remember this from our discussions), schedule() > gives up cpu until waitqueue wakeup or signal is received, and that can > be a really long delay :-) True; sorry about that, Domen, completely forgot about that. Will think on it further. -Nish ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 3:54 ` dynamic-hz Nish Aravamudan 2004-12-14 4:29 ` dynamic-hz Andrew Morton 2004-12-14 10:01 ` dynamic-hz Domen Puncer @ 2004-12-14 14:23 ` linux-os 2004-12-14 16:54 ` dynamic-hz Nish Aravamudan 2 siblings, 1 reply; 126+ messages in thread From: linux-os @ 2004-12-14 14:23 UTC (permalink / raw) To: Nish Aravamudan Cc: Andrew Morton, Andrea Arcangeli, kernel, pavel, linux-kernel On Mon, 13 Dec 2004, Nish Aravamudan wrote: > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote: >> Andrea Arcangeli <andrea@suse.de> wrote: >>> >>> The patch only does HZ at dynamic time. But of course it's absolutely >>> trivial to define it at compile time, it's probably a 3 liner on top of >>> my current patch ;). However personally I don't think the three liner >>> will worth the few seconds more spent configuring the kernel ;). >> >> We still have 1000-odd places which do things like >> >> schedule_timeout(HZ/10); > > Yes, yes, we do :) I replaced far more than I ever thought I could... > There are a few issues I have with the remaining schedule_timeout() > calls which I think fit ok with this thread... I'd especially like > your input, Andrew, as you end up getting most of my patches from KJ. > > Many drivers use > > set_current_state(TASK_{UN,}INTERRUPTIBLE); > schedule_timeout(1); // or some other small value < 10 > > This may or may not hide a dependency on a particular HZ value. If the > code is somewhat old, perhaps the author intended the task to sleep > for 1 jiffy when HZ was equal to 100. That meants that they ended up > sleeping for 10 ms. If the code is new, the author intends that the > task sleeps for 1 ms (HZ==1000). The question is, what should the > replacement be? > > If they really meant to use schedule_timeout(1) in the sense of > highest resolution delay possible (the latter above), then they > probably should just call schedule() directly. 
schedule_timeout(1) > simply sets up a timer to fire off after 1 jiffy & then calls > schedule() itself. The overhead of setting up a timer and the > execution of schedule() itself probably means that the timer will go > off in the middle of the schedule() call or very shortly thereafter (I > think). In which case, it makes more sense to use schedule() > directly... > > If they meant to schedule a delay of 10ms, then msleep() should be > used in those cases. msleep() will also resolve the issues with 0-time > timeouts because of rounding, as it adds 1 to the converted parameter. > > Obviously, changing more and more sleeps to msecs & secs will really > help make the changing of HZ more transparent. And specifying the time > in real time units just seems so much clearer to me. > > What do people think? > > -Nish I found that if you use schedule() directly then the sleeping task appears to be spinning in "system" in `top`. If you use schedule_timeout(0), it works the same, but doesn't appear to be eating CPU cycles as shown by `top`. Many common drivers need to have the timeout interruptible, but wait <forever if necessary> for a particular event. They need to get the CPU back fairly often to check again for the event. They need the equivalent of user-mode sched_yield(). sys_sched_yield() didn't seem to work correctly, last time I tried. Maybe somebody could make a sched_yield() for the kernel. That would improve a lot of drivers. Cheers, Dick Johnson Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips). Notice : All mail here is now cached for review by John Ashcroft. 98.36% of all statistics are fiction. ^ permalink raw reply [flat|nested] 126+ messages in thread
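[Editor's sketch] The quoted discussion turns on two pieces of arithmetic: how long schedule_timeout(1) lasts for a given HZ, and why msleep() avoids a zero-length timeout by adding one tick to the converted value. The following is a minimal userspace model of that arithmetic, assuming a round-up conversion from milliseconds to ticks. All function names here are hypothetical and this is not the kernel's implementation.

```c
#include <assert.h>

/* Userspace model only. The *_model names are hypothetical, not kernel APIs. */

/* Nominal length of `ticks` timer ticks, in microseconds, at tick rate `hz`. */
unsigned long long ticks_to_usecs_model(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return (unsigned long long)ticks * 1000000ULL / hz;
}

/* Convert milliseconds to ticks, rounding up (an assumption of this sketch)
 * so that a non-zero delay never collapses to zero ticks. */
unsigned long msecs_to_ticks_model(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (unsigned long)(((unsigned long long)ms * hz + 999ULL) / 1000ULL);
}

/* msleep()-style timeout as described in the thread: the converted value
 * plus one extra tick. */
unsigned long msleep_ticks_model(unsigned long ms, unsigned long hz)
{
	return msecs_to_ticks_model(ms, hz) + 1;
}
```

Under this model one tick is 10000 us at HZ=100 and 1000 us at HZ=1000, which is the hidden HZ dependency Nish describes. A 10 ms request becomes 1 tick at HZ=100 and 10 ticks at HZ=1000 before the extra tick is added.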
* Re: dynamic-hz 2004-12-14 14:23 ` dynamic-hz linux-os @ 2004-12-14 16:54 ` Nish Aravamudan 2004-12-14 17:15 ` dynamic-hz Andrea Arcangeli 0 siblings, 1 reply; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 16:54 UTC (permalink / raw) To: linux-os; +Cc: Andrew Morton, Andrea Arcangeli, kernel, pavel, linux-kernel On Tue, 14 Dec 2004 09:23:54 -0500 (EST), linux-os <linux-os@chaos.analogic.com> wrote: > On Mon, 13 Dec 2004, Nish Aravamudan wrote: > > > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote: > >> Andrea Arcangeli <andrea@suse.de> wrote: > >>> > >>> The patch only does HZ at dynamic time. But of course it's absolutely > >>> trivial to define it at compile time, it's probably a 3 liner on top of > >>> my current patch ;). However personally I don't think the three liner > >>> will worth the few seconds more spent configuring the kernel ;). > >> > >> We still have 1000-odd places which do things like > >> > >> schedule_timeout(HZ/10); > > > > Yes, yes, we do :) I replaced far more than I ever thought I could... > > There are a few issues I have with the remaining schedule_timeout() > > calls which I think fit ok with this thread... I'd especially like > > your input, Andrew, as you end up getting most of my patches from KJ. > > > > Many drivers use > > > > set_current_state(TASK_{UN,}INTERRUPTIBLE); > > schedule_timeout(1); // or some other small value < 10 > > > > This may or may not hide a dependency on a particular HZ value. If the > > code is somewhat old, perhaps the author intended the task to sleep > > for 1 jiffy when HZ was equal to 100. That meants that they ended up > > sleeping for 10 ms. If the code is new, the author intends that the > > task sleeps for 1 ms (HZ==1000). The question is, what should the > > replacement be? > > > > If they really meant to use schedule_timeout(1) in the sense of > > highest resolution delay possible (the latter above), then they > > probably should just call schedule() directly. 
schedule_timeout(1) > > simply sets up a timer to fire off after 1 jiffy & then calls > > schedule() itself. The overhead of setting up a timer and the > > execution of schedule() itself probably means that the timer will go > > off in the middle of the schedule() call or very shortly thereafter (I > > think). In which case, it makes more sense to use schedule() > > directly... > > > > If they meant to schedule a delay of 10ms, then msleep() should be > > used in those cases. msleep() will also resolve the issues with 0-time > > timeouts because of rounding, as it adds 1 to the converted parameter. > > > > Obviously, changing more and more sleeps to msecs & secs will really > > help make the changing of HZ more transparent. And specifying the time > > in real time units just seems so much clearer to me. > > > > What do people think? > > > > -Nish > > I found that if you use schedule() directly then the sleeping > task appears to be spinning in "system" in `top`. If you use > schedule_timeout(0), it works the same, but doesn't appear > to be eating CPU cycles as shown by `top`. Many common > drivers need to have the timeout interruptible, but wait > <forever if necessary> for a particular event. They need > to get the CPU back fairly often to check again for the > event. They need the equavalent of user-mode sched_yield(). > sys_sched_yield() did't seem to work correctly, last time > I tried. > > Maybe somebody could make a sched_yield() for the kernel. > That would improve a lot of drivers. Hmm, schedule_timeout(0) working that way is interesting. There is also the option to use schedule_timeout(MAX_SCHEDULE_TIMEOUT) which should sleep indefinitely (depending of course on the conditions of the state). Oh but I think I understand what you're saying... the driver needs to sleep indefinitely in total (potentially), but needs to be able to return quite often (like yield() used to) so they could check a condition... Thanks for the input! 
-Nish ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 16:54 ` dynamic-hz Nish Aravamudan @ 2004-12-14 17:15 ` Andrea Arcangeli 2004-12-14 17:42 ` dynamic-hz Nish Aravamudan 2004-12-14 18:22 ` dynamic-hz linux-os 0 siblings, 2 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-14 17:15 UTC (permalink / raw) To: Nish Aravamudan; +Cc: linux-os, Andrew Morton, kernel, pavel, linux-kernel On Tue, Dec 14, 2004 at 08:54:29AM -0800, Nish Aravamudan wrote: > Hmm, schedule_timeout(0) working that way is interesting. There is > also the option to use schedule_timeout(MAX_SCHEDULE_TIMEOUT) which > should sleep indefinitely (depending of course on the conditions of > the state). Oh but I think I understand what you're saying... the > driver needs to sleep indefinitely in total (potentially), but needs > to be able to return quite often (like yield() used to) so they could > check a condition... > > Thanks for the input! What do you mean "like yield() used to"? yield() is still there in the latest 2.6; just call yield() and you'll get the same effect as sched_yield in userspace. Yields in the kernel are a bad thing though (they usually mean the code is not well written; code should be event driven, not poll driven). Note that __set_current_state(..); schedule_timeout(0) is not like yield. yield will return immediately if it's the only task running. A yielding loop will consume all available cpu, while schedule_timeout(0) will wait less than 1/HZ sec. But really schedule_timeout(0) makes little sense: either use schedule_timeout(1) and explicitly wait 1msec, or use yield. schedule_timeout(0) just happens to work because the timer code has to round up, so it will wait for the next timer irq for timeouts <= 0 and it will wait for two ticks for timeouts == 1 etc... I guess we could change schedule_timeout() to WARN_ON if 0 is being passed to it. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 17:15 ` dynamic-hz Andrea Arcangeli @ 2004-12-14 17:42 ` Nish Aravamudan 2004-12-14 18:29 ` dynamic-hz Andrea Arcangeli 2004-12-14 18:22 ` dynamic-hz linux-os 1 sibling, 1 reply; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 17:42 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-os, Andrew Morton, kernel, pavel, linux-kernel On Tue, 14 Dec 2004 18:15:03 +0100, Andrea Arcangeli <andrea@suse.de> wrote: > On Tue, Dec 14, 2004 at 08:54:29AM -0800, Nish Aravamudan wrote: > > Hmm, schedule_timeout(0) working that way is interesting. There is > > also the option to use schedule_timeout(MAX_SCHEDULE_TIMEOUT) which > > should sleep indefinitely (depending of course on the conditions of > > the state). Oh but I think I understand what you're saying... the > > driver needs to sleep indefinitely in total (potentially), but needs > > to be able to return quite often (like yield() used to) so they could > > check a condition... > > > > Thanks for the input! > > what do you mean like yield() used to? yield() is still there in latest > 2.6, just call yield() and you'll get the same effect of sched_yield in > userspace. yields in the kernel are a bad thing though (they usually > mean code is not well written, code should be event driven not polled > driven). Sorry for my lack of clarity :) I was referring more to the second part of what you said, that the "meaning" of yield() changed for 2.6 and thus shouldn't be used to wait for short times (see kerneljanitors TODO reference from Matthew Wilcox (search for yield in page): http://www.kerneljanitors.org/TODO). > Note that __set_current_state(..); schedule_timeout(0) is not like > yield. yield will return immediatly if it's the only task running. A > yielding loop will consume all available cpu, while the > schedule_timeout(0) will wait less than 1/HZ sec. But really > schedule_timeout(0) makes little sense, either use schedule_timeout(1) > and explicitly wait 1msec, or use yield. 
schedule_timeout(0) just > happens to work because the timer code has to approximate for excess and > it will wait for the next timer irq for timeouts <= 0 and it will wait > for two ticks for timeouts == 1 etc... From the context of the TODO, it seems yield() and schedule_timeout() should not be considered alternatives for each other. Maybe someone can clarify? > I guess we could change schedule_timeout() to WARN_ON if 0 is being > passed to it. I will see if anyone is actually calling with 0 -- I don't remember seeing this for my previous sets of patches, but it may happen if HZ changes in value. -Nish ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 17:42 ` dynamic-hz Nish Aravamudan @ 2004-12-14 18:29 ` Andrea Arcangeli 2004-12-14 19:00 ` dynamic-hz Nish Aravamudan 0 siblings, 1 reply; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-14 18:29 UTC (permalink / raw) To: Nish Aravamudan; +Cc: linux-os, Andrew Morton, kernel, pavel, linux-kernel On Tue, Dec 14, 2004 at 09:42:02AM -0800, Nish Aravamudan wrote: > Sorry for my lack of clarity :) I was referring more to the second > part of what you said, that the "meaning" of yield() changed for 2.6 The meaning of yield didn't really change. The behaviour changed a bit to allow scalability even if more than one task is polling for a resource (potentially even the _same_ resource) using yield(). But if you were using yield() in 2.4 you shouldn't change to anything different than yield() in 2.6. If you get bad latencies under load in 2.6, it's simply a gentle reminder that using yield() is always a bad idea ;). NPTL converted the yield() loops in the slow path of the pthread_mutex to event-driven futexes, otherwise the 2.6 behaviour would break a lot more than OOo. In my 2.4-aa I have a sysctl to switch yield between the 2.4 and 2.6 behaviours. The new behaviour broke OOo and all linuxthreads apps for example, so it was necessary to use a sysctl to control it, even if the new yield() behaviour is more correct because it has a chance to scale under load. Ingo may want to correct me if I remember wrong, I discussed this stuff with him at the time. > and thus shouldn't be used to wait for short times (see kerneljanitors > TODO reference from Matthew Wilcox (search for yield in page): > http://www.kerneljanitors.org/TODO). The 2.4 yield() could introduce significant latencies too if more than one task was looping in yield at the same time for different resources. > From the context of the TODO, it seems yield() and schedule_timeout() > should not be considered alternatives for each other. Maybe someone > can clarify? It depends what you're doing. 
yield() and __set_current_state(..); schedule_timeout(1) are similar. I don't think schedule_timeout(0) makes much sense (but in practice it works very similarly to schedule_timeout(1)). yield() will poll ASAP by guaranteeing the CPU won't go idle. schedule_timeout(1) will make the CPU go idle and it'll wait between 1/HZ sec and 2/HZ sec. The point is that polling is wrong and you should register into a waitqueue and then __set_current_state(..); schedule(). This is exactly what NPTL did too, and as far as I can tell it's practically the most noticeable feature for optimally written threaded apps. The yield/schedule_timeout(1)-without-registering-in-callbacks are just tricks for some special code. For example I myself used schedule_timeout(1) in the oom killer patch a few days ago, but that code runs only when the machine is out of memory and several tasks will try to kill something at the same time. At that time the cpu load really doesn't matter. So tricks like that are ok in corner cases where performance cannot matter at all. For fast paths or regular code, yield should not be used (and schedule_timeout(1) used as a yield won't be much better). Conceptually if you want to poll as soon as possible you should use yield(). If you want to wait and give some idle time to the cpu you should use schedule_timeout(). You should ignore the claim that yield isn't appropriate in 2.6 for waiting short periods of time; yield is still the API to use for polling while keeping the cpu busy. If the machine is overloaded then it will take a while to get back to the polling loop with 2.6, but then 2.4 had other corner cases with the machine overloaded by userspace tasks calling sched_yield too. So it's not really that much different in terms of the guarantees that yield can provide between 2.4/2.6. The only guarantee that yield can provide is that the cpu will remain busy, and that you'll be rescheduled if some other task is pending in the runqueue. 
It can't provide any guarantee on when you'll become running again. > > I guess we could change schedule_timeout() to WARN_ON if 0 is being > > passed to it. > > I will see if anyone is actually calling with 0 -- I don't remember It's not that bad, I mean schedule_timeout(0) works fine, but once in a while it may not wait at all and just return after invoking a timer callback. So if somebody uses schedule_timeout, it's because they always want to make the cpu go idle for a little bit, and in turn it would be better to use 1 (0 doesn't guarantee to go idle). > seeing this for my previous sets of patches, but it may happen if HZ > changes in value. The HZ errors are just due to the lack of rounding up, and schedule_timeout can't do anything about it, only the caller can (it's a problem even for other HZ values that generate rounding errors, and that's why HZ=100 and HZ=1000 are the only two really supported frequencies to freely switch at boot time ;). ^ permalink raw reply [flat|nested] 126+ messages in thread
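[Editor's sketch] The message above states that schedule_timeout(1) waits between 1/HZ and 2/HZ seconds, and that schedule_timeout(0) waits less than 1/HZ. A small userspace model of those bounds follows, assuming the general rule is "between n and n+1 ticks" for a timeout of n. That generalisation and the names used are assumptions of this sketch, not something the thread or the kernel source confirms.

```c
#include <assert.h>

/* Userspace model only. Names are hypothetical. */
struct wait_bounds_us {
	unsigned long long min_us;	/* shortest expected wait, microseconds */
	unsigned long long max_us;	/* longest expected wait, microseconds */
};

/* Assumed rule: a timeout of `ticks` expires somewhere between `ticks`
 * and `ticks + 1` tick periods, because the timer only fires on a tick
 * boundary and the current tick is already partly elapsed.
 * Intended for small tick counts only. */
struct wait_bounds_us timeout_bounds_model(unsigned long ticks, unsigned long hz)
{
	struct wait_bounds_us b;

	assert(hz > 0);
	b.min_us = (unsigned long long)ticks * 1000000ULL / hz;
	b.max_us = ((unsigned long long)ticks + 1ULL) * 1000000ULL / hz;
	return b;
}
```

With HZ=100 the model gives 10000 to 20000 us for a timeout of 1, and 0 to 10000 us for a timeout of 0, which matches the point that 0 does not guarantee any idle time.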
* Re: dynamic-hz 2004-12-14 18:29 ` dynamic-hz Andrea Arcangeli @ 2004-12-14 19:00 ` Nish Aravamudan 0 siblings, 0 replies; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 19:00 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-os, Andrew Morton, kernel, pavel, linux-kernel On Tue, 14 Dec 2004 19:29:00 +0100, Andrea Arcangeli <andrea@suse.de> wrote: > On Tue, Dec 14, 2004 at 09:42:02AM -0800, Nish Aravamudan wrote: > > Sorry for my lack of clarity :) I was referring more to the second > > part of what you said, that the "meaning" of yield() changed for 2.6 > > The meaning of yield didn't really change. The behaviour changed a bit > to allow scalability even if more than one task is polling for a > resource (potentially even the _same_ resource) using yield(). > > But if you were using yield() in 2.4 you shouldn't change to anything > different than yield() in 2.6. If you get bad latencies under load in > 2.6, it's simply a gentle reminder that using yield() is always a bad > idea ;). > > NPTL converted the yield() loops in the slow path of the pthread_mutex to > even driven futex, otherwise 2.6 behaviour would break a lot more than > OOo. > > In my 2.4-aa I've a sysctl to switch yield between two 2.4/2.6 > behaviours. The new behaviour broke OOo and all linthread apps for > example, so it was necessary to use a sysctl to control it, even if the > new yield() behaviour is more correct because it has a chance to scala > under load. > > Ingo may want to correct me if I remember wrong, I discussed this stuff > with him at the time. > > > and thus shouldn't be used to wait for short times (see kerneljanitors > > TODO reference from Matthew Wilcox (search for yield in page): > > http://www.kerneljanitors.org/TODO). > > The 2.4 yield() could introduce significant latencies too if more than > one task was looping in yield at the same time for different resources. 
> > > From the context of the TODO, it seems yield() and schedule_timeout() > > should not be considered alternatives for each other. Maybe someone > > can clarify? > > It depends what you're doing. yield() and __set_current_state(..); > schedule_timeout(1) are similar. I don't think schedule_timeout(0) makes > much sense (but in practice it works very similarly to > schedule_timeout(1)). The former will pool ASAP by guaranteeing the CPU > won't go idle. The latter will make the CPU go idle and it'll wait > between 1/HZ sec and 2/HZ sec. > > The point is that polling is wrong and you should register into a > waitqueue and then __set_current_state(..); schedule(). This is exactly > what NPTL did too, and as far as I can tell it's pratically the most > noticeable feature for optimally written threaded apps. The > yield/schedule_timeout(1)-without-registering-in-callbacks are just > tricks for some special code. For example I used myself > schedule_timeout(1) in the oom killer patch a few days ago, but that > code runs only when the machine is out of memory and several tasks will > try to kill something at the same time. At that time the cpu load really > doesn't matter. So tricks like that are ok in corner cases where > performance cannot matter at all. For fast paths or regular code, yield > should not be used (and schedule_timeout(1) used as as yield won't be > much better). > > Conceptually if you want to poll as soon as possible you should use > yield(). If you want to wait and give some idle time to the cpu you > should use schedule_timeout(). > > You should ignore the claim that yield isn't appropriate in 2.6 for > waiting short periods of time, yield is still the API to use for polling > while keeping the cpu busy. If the machine is overloaded then it will > take a while to get back to the polling loop with 2.6, but then 2.4 had > other corner cases with the machine overloaded by userspace tasks > calling sched_yield too. 
So it's not really that much different in terms > of the guarantees that yield can provide between 2.4/2.6. The only > guarantee that yield can provide is that the cpu will remain busy, and > that you'll be rescheduled if some other task is pending in the > runqueue. It can't provide any guarantee on when you'll become running > again. > > > > I guess we could change schedule_timeout() to WARN_ON if 0 is being > > > passed to it. > > > > I will see if anyone is actually calling with 0 -- I don't remember > > It's not that bad, I mean schedule_timeout(0) works fine, but once in a > while it may not wait anything and just return after invoking a timer > callback. So if somebody uses schedule_timeout, it's because he wants > always to make the cpu go idle for a little bit, and in turn it would be > better to use 1 (0 doesn't guarantee to go idle). > > > seeing this for my previous sets of patches, but it may happen if HZ > > changes in value. > > The HZ errors are just due the lack of roundup, and schedule_timeout > can't do anything about it, only the caller can (it's a problem even for > other HZ values that generate rounding errors, and that's why HZ=100 and > HZ=1000 are the only two really supported frequencies to freely switch > at boot time ;). Great! Thanks a lot for all of the clarifications! -Nish ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 17:15 ` dynamic-hz Andrea Arcangeli 2004-12-14 17:42 ` dynamic-hz Nish Aravamudan @ 2004-12-14 18:22 ` linux-os 2004-12-14 18:38 ` dynamic-hz Andrea Arcangeli 2004-12-14 18:50 ` dynamic-hz Pavel Machek 1 sibling, 2 replies; 126+ messages in thread From: linux-os @ 2004-12-14 18:22 UTC (permalink / raw) To: Andrea Arcangeli Cc: Nish Aravamudan, Andrew Morton, kernel, pavel, linux-kernel On Tue, 14 Dec 2004, Andrea Arcangeli wrote: > On Tue, Dec 14, 2004 at 08:54:29AM -0800, Nish Aravamudan wrote: >> Hmm, schedule_timeout(0) working that way is interesting. There is >> also the option to use schedule_timeout(MAX_SCHEDULE_TIMEOUT) which >> should sleep indefinitely (depending of course on the conditions of >> the state). Oh but I think I understand what you're saying... the >> driver needs to sleep indefinitely in total (potentially), but needs >> to be able to return quite often (like yield() used to) so they could >> check a condition... >> >> Thanks for the input! > > what do you mean like yield() used to? yield() is still there in latest > 2.6, just call yield() and you'll get the same effect of sched_yield in > userspace. yields in the kernel are a bad thing though (they usually > mean code is not well written, code should be event driven not polled > driven). > Yield used to not show a spin in `top`. Also, contrary to "popular" opinion, not all events are accompanied by interrupts. If they were, I'd gladly use one of the sleep_on* functions. For instance, I need to erase NVRAM (Flash). Then I need to program each byte. Waiting for the completion events requires polling the hardware. Proper software will give up the CPU while waiting and only sample the event, not continually spin. You can get away with software murder if you only need to program something that saves some state between shutdowns. However, if you have a writable flash file-system you need to do it right. 
> Note that __set_current_state(..); schedule_timeout(0) is not like > yield. yield will return immediatly if it's the only task running. A > yielding loop will consume all available cpu, while the > schedule_timeout(0) will wait less than 1/HZ sec. But really The timeout of (0) was really to make the code more obvious, the facts being that we really need to get the CPU back as soon as there are no higher-priority tasks computable. If yield() would work like schedule(0), of course I'd use it. The major problem with yield() probably has to do with accounting. The machine "feels" as though the CPU is properly available when you need it, however it appears to be spinning, using 100% system time. This makes customers nervous. > schedule_timeout(0) makes little sense, either use schedule_timeout(1) > and explicitly wait 1msec, or use yield. schedule_timeout(0) just > happens to work because the timer code has to approximate for excess and > it will wait for the next timer irq for timeouts <= 0 and it will wait > for two ticks for timeouts == 1 etc... > > I guess we could change schedule_timeout() to WARN_ON if 0 is being > passed to it. > Cheers, Dick Johnson Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips). Notice : All mail here is now cached for review by John Ashcroft. 98.36% of all statistics are fiction. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 18:22 ` dynamic-hz linux-os @ 2004-12-14 18:38 ` Andrea Arcangeli 2004-12-14 18:50 ` dynamic-hz Pavel Machek 1 sibling, 0 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-14 18:38 UTC (permalink / raw) To: linux-os; +Cc: Nish Aravamudan, Andrew Morton, kernel, pavel, linux-kernel On Tue, Dec 14, 2004 at 01:22:03PM -0500, linux-os wrote: > Yield used to not show a spin in `top`. Also, contrary to > "popular" opinion, not all events are accompanied by interrupts. Yes, ppa zip drive has the same issue. yield shows a spin in top if it's the only running task. Otherwise it will wait other task to run first. The behaviour has changed a bit between 2.4 and 2.6, and we changed the corner cases. But the semantics of yield are still the same. > If they where, I'd gladly use one of the sleep_on* functions. Minor detail: sleep_on is obsolete and should be deleted since it requires the big kernel lock or the global cli to be safe. But I got the point ;) > For instance, I need to erase NVRAM (Flash). Then I need to > program each byte. Waiting for the completion events requires > polling the hardware. Proper software will give up the CPU > while waiting and only sample the event, not continually spin. This is a case where you know when you can expect the hardware to be done (just like it was the case for the ppa zip). While dealing with long hardware delays schedule_timeout makes plenty of sense. It would be pointless to yield and spin, if you know nothing good can happen in the next millisecond. > The timeout of (0) was really to make the code more obvious, the > facts being that we really need to get the CPU back as soon as > there are no higher-priority tasks computable. If yield() would With schedule_timeout(1) you're probably going to become interactive, and you'll be scheduled before other tasks. That's good. I mean the scheduler sorts things out automatically. > work like schedule(0), of course I'd use it. 
The major problem > with yield() probably has to do with accounting. The machine > "feels" as though the CPU is properly available when you need > it, however it appears to be spinning, using 100% system time. > This makes customers nervous. It's also a waste of power to spin when you can schedule_timeout(1). So, as far as I can tell, using schedule_timeout(1) is optimal in this case, while waiting for the hardware to complete. ^ permalink raw reply [flat|nested] 126+ messages in thread
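[Editor's sketch] The flash-programming case discussed above is a poll loop that gives up the CPU between samples instead of spinning. Here is a minimal userspace model of that pattern. The callbacks, the fake device and every name below are hypothetical; in a driver the nap step would be something like setting the task state and calling schedule_timeout(1), which is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. All names are hypothetical.
 * ready() stands in for reading a hardware status register;
 * nap() stands in for sleeping roughly one tick instead of spinning. */
typedef bool (*ready_fn)(void *ctx);
typedef void (*nap_fn)(void *ctx);

/* Sample the device, napping between samples.
 * Returns the number of naps taken, or -1 if the device is still
 * not ready after max_polls naps. */
long poll_with_naps(ready_fn ready, nap_fn nap, void *ctx, long max_polls)
{
	long naps = 0;

	assert(ready != NULL && nap != NULL && max_polls > 0);
	for (long i = 0; i < max_polls; i++) {
		if (ready(ctx))
			return naps;
		nap(ctx);	/* give up the CPU rather than spin */
		naps++;
	}
	return ready(ctx) ? naps : -1;
}

/* Fake device for demonstration: becomes ready after a fixed number of naps. */
struct fake_flash {
	int naps_until_ready;
};

bool fake_flash_ready(void *ctx)
{
	return ((struct fake_flash *)ctx)->naps_until_ready <= 0;
}

void fake_flash_nap(void *ctx)
{
	((struct fake_flash *)ctx)->naps_until_ready--;
}
```

The bounded loop is a design choice of the sketch: a real driver would also need a policy for what to do when the hardware never reports completion.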
* Re: dynamic-hz 2004-12-14 18:22 ` dynamic-hz linux-os 2004-12-14 18:38 ` dynamic-hz Andrea Arcangeli @ 2004-12-14 18:50 ` Pavel Machek 1 sibling, 0 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-14 18:50 UTC (permalink / raw) To: linux-os Cc: Andrea Arcangeli, Nish Aravamudan, Andrew Morton, kernel, linux-kernel HI! > The timeout of (0) was really to make the code more obvious, the > facts being that we really need to get the CPU back as soon as > there are no higher-priority tasks computable. If yield() would > work like schedule(0), of course I'd use it. The major problem > with yield() probably has to do with accounting. The machine > "feels" as though the CPU is properly available when you need > it, however it appears to be spinning, using 100% system time. > This makes customers nervous. Well, machine that showed as "idle" yet had cpu running at full speed would make *me* nervous... Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 11:02 ` dynamic-hz Andrew Morton 2004-12-13 11:17 ` dynamic-hz Andrea Arcangeli @ 2004-12-13 11:19 ` Hans Kristian Rosbach 2004-12-13 11:22 ` dynamic-hz Pavel Machek ` (2 more replies) 1 sibling, 3 replies; 126+ messages in thread From: Hans Kristian Rosbach @ 2004-12-13 11:19 UTC (permalink / raw) To: Andrew Morton; +Cc: Con Kolivas, andrea, pavel, Linux Kernel Mailing List On Mon, 2004-12-13 at 12:02, Andrew Morton wrote: > Con Kolivas <kernel@kolivas.org> wrote: > > The performance benefit, if any, is often lost in noise during > > benchmarks and when there, is less than 1%. So I was wondering if you > > had some specific advantage in mind for this patch? Is there some > > arch-specific advantage? I can certainly envision disadvantages to lower Hz. > > There are apparently some laptops which exhibit appreciable latency between > the start of ACPI sleep and actually consuming less power. The 1ms wakeup > frequency will shorten battery life on these machines significantly. (I > forget the exact numbers - Len will know). Is there any recommended lower bound setting? Would there be a point in recommending lower settings for desktops running only text consoles opposed to X desktops? -HK ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 11:19 ` dynamic-hz Hans Kristian Rosbach @ 2004-12-13 11:22 ` Pavel Machek 2004-12-13 11:39 ` dynamic-hz Andrea Arcangeli 2004-12-13 12:51 ` dynamic-hz Hans Kristian Rosbach 2004-12-13 11:33 ` dynamic-hz Andrea Arcangeli 2004-12-13 14:38 ` dynamic-hz Zwane Mwaikambo 2 siblings, 2 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-13 11:22 UTC (permalink / raw) To: Hans Kristian Rosbach Cc: Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List Hi! > > > The performance benefit, if any, is often lost in noise during > > > benchmarks and when there, is less than 1%. So I was wondering if you > > > had some specific advantage in mind for this patch? Is there some > > > arch-specific advantage? I can certainly envision disadvantages to lower Hz. > > > > There are apparently some laptops which exhibit appreciable latency between > > the start of ACPI sleep and actually consuming less power. The 1ms wakeup > > frequency will shorten battery life on these machines significantly. (I > > forget the exact numbers - Len will know). > > Is there any recommended lower bound setting? > Would there be a point in recommending lower settings for desktops > running only text consoles opposed to X desktops? I tried defining HZ to 10 once, and there are some #if arrays in the kernel that prevented me from doing that. Some drivers do timeouts based on jiffies; having HZ=1 may turn 20msec timeout into 1sec, that could hurt a lot in the error case... Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 11:22 ` dynamic-hz Pavel Machek @ 2004-12-13 11:39 ` Andrea Arcangeli 2004-12-13 12:51 ` dynamic-hz Hans Kristian Rosbach 1 sibling, 0 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 11:39 UTC (permalink / raw) To: Pavel Machek Cc: Hans Kristian Rosbach, Andrew Morton, Con Kolivas, Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 12:22:29PM +0100, Pavel Machek wrote:
> I tried defining HZ to 10 once, and there are some #if arrays in the
> kernel that prevented me from doing that.

I guess you're right and the minimum is HZ=12. I'm pretty sure I could go down to 25, perhaps the absolute minimum was 12 and not 10.

There are also side effects like this one from setting a strange HZ:

--- x-ref/net/sched/estimator.c	2003-03-15 03:25:19.000000000 +0100
+++ x/net/sched/estimator.c	2004-05-31 15:51:42.778909936 +0200
@@ -71,10 +71,6 @@
    at user level painlessly.
  */
 
-#if (HZ%4) != 0
-#error Bad HZ value.
-#endif
-
 #define EST_MAX_INTERVAL	5
 
 struct qdisc_estimator
@@ -136,6 +132,9 @@ int qdisc_new_estimator(struct tc_stats
 	struct qdisc_estimator *est;
 	struct tc_estimator *parm = RTA_DATA(opt);
 
+	if (unlikely(HZ % 4))
+		return -EINVAL;
+
 	if (RTA_PAYLOAD(opt) < sizeof(*parm))
 		return -EINVAL;
 

If you boot with an HZ not divisible by 4 you get -EINVAL at runtime (instead of a compile failure, since we can't check it at compile time anymore ;).

Anyway the major point of the patch is to get HZ switchable from 100 to 1000, those two values are really the only supported ones. The rest is a bonus, and I'm sure at least 50 and 2000 will work flawlessly too.

^ permalink raw reply [flat|nested] 126+ messages in thread
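The move from a build-time `#error` to a run-time `-EINVAL` described in the message above can be modelled outside the kernel. The following is only a minimal userspace sketch: the function name `estimator_check_hz` and its `hz` parameter are hypothetical stand-ins for the patched `qdisc_new_estimator()` check and the boot-time HZ value, not code from the patch.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper, not taken from the patch: models the run-time
 * replacement for the old "#if (HZ%4) != 0 / #error" build-time test.
 */
static int estimator_check_hz(unsigned int hz)
{
	assert(hz > 0);		/* a zero tick rate is a caller bug */

	if (hz % 4)
		return -EINVAL;	/* e.g. HZ=333: refused at run time */

	return 0;		/* e.g. HZ=100 or HZ=1000 */
}
```

With a boot-time HZ the failure moves from the person building the kernel to whoever configures an estimator on a running system, which is the trade-off the message points out.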
* Re: dynamic-hz 2004-12-13 11:22 ` dynamic-hz Pavel Machek 2004-12-13 11:39 ` dynamic-hz Andrea Arcangeli @ 2004-12-13 12:51 ` Hans Kristian Rosbach 2004-12-13 13:01 ` dynamic-hz Andrea Arcangeli ` (2 more replies) 1 sibling, 3 replies; 126+ messages in thread From: Hans Kristian Rosbach @ 2004-12-13 12:51 UTC (permalink / raw) To: Pavel Machek Cc: Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 12:22:29PM +0100, Pavel Machek wrote:
> I tried defining HZ to 10 once, and there are some #if arrays in the
> kernel that prevented me from doing that.
>
> Some drivers do timeouts based on jiffies; having HZ=1 may turn 20msec
> timeout into 1sec, that could hurt a lot in the error case...

On Mon, Dec 13, 2004 at 03:25:21AM -0800, Andrew Morton wrote:
> We still have 1000-odd places which do things like
> schedule_timeout(HZ/10);
> which will now involve a runtime divide. The propagation of msleep()
> and ssleep() will reduce that a bit, but not much.

Shouldn't that be regarded as a bug/deprecated?

I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to do, but the parameter it gets in 1000hz is "100" so I assume this is because we want to wait for 100ms, and in 1000hz that equals 100 cycles. Correct?

If so, I guess this calculation would fix that problem, but I guess this is also what Andrew referred to as the extra runtime division?

  wait-ms/(1000/hz) = hz-to-wait
  100/(1000/1000) = 100 == 100ms
  100/(1000/100)  = 10  == 100ms
  100/(1000/50)   = 5   == 100ms

It would of course be optimized to something like this:

  wait-ms/ms-per-hz

What about this: at startup time we set a global variable based on hz:

  varX = HZ/1000;

then in the rest of the code we can use ex:

  schedule_timeout(varX*100) for 100ms no matter what hz is.

With hz=50 the lowest ms is 20 for one tick though. And that might trigger problems with approximation at some point. varX would have to be decimal, and that might also be a problem?

I think that extremists will push the limits on these settings, and that failures due to wrong timeouts or other similar things would generate unwanted noise on LKML.

I think I'm just stating the obvious now, am I not?

-HK

^ permalink raw reply [flat|nested] 126+ messages in thread
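The arithmetic in the message above can be checked with integer-only code, which is what kernel code would have to use. This is a rough userspace sketch under stated assumptions: all three function names are hypothetical, and `hz` simply models the boot-time HZ value. It compares the proposed `varX = HZ/1000` scale factor with the table's `wait-ms/(1000/hz)` form and with a multiply-first variant.

```c
#include <assert.h>

/* Hypothetical names; hz stands in for the boot-time HZ value. */

/* Scale factor first: hz / 1000 truncates to 0 whenever hz < 1000. */
static unsigned long ticks_scale_first(unsigned long ms, unsigned long hz)
{
	unsigned long var_x = hz / 1000;	/* "varX" from the mail */

	return var_x * ms;
}

/* wait-ms / (1000 / hz): the table from the mail, valid for hz <= 1000. */
static unsigned long ticks_divide_by_period(unsigned long ms, unsigned long hz)
{
	assert(hz > 0 && hz <= 1000);	/* 1000 / hz would be 0 above 1000 */

	return ms / (1000 / hz);
}

/* Multiply first, divide last: integer-only and valid for any hz. */
static unsigned long ticks_multiply_first(unsigned long ms, unsigned long hz)
{
	return ms * hz / 1000;
}
```

In integers the scale factor collapses to zero for any HZ below 1000, so `varX*100` would request no delay at all at HZ=100. Multiplying before dividing reproduces the 100, 10 and 5 tick results from the table without a fractional variable, although it still rounds down (10 ms at HZ=50 yields 0 ticks).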
* Re: dynamic-hz 2004-12-13 12:51 ` dynamic-hz Hans Kristian Rosbach @ 2004-12-13 13:01 ` Andrea Arcangeli 2004-12-13 13:02 ` dynamic-hz Andrea Arcangeli 2004-12-13 15:06 ` dynamic-hz Geert Uytterhoeven 2004-12-14 4:05 ` dynamic-hz Nish Aravamudan 2 siblings, 1 reply; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 13:01 UTC (permalink / raw) To: Hans Kristian Rosbach Cc: Pavel Machek, Andrew Morton, Con Kolivas, Linux Kernel Mailing List On Mon, Dec 13, 2004 at 01:51:11PM +0100, Hans Kristian Rosbach wrote: > then in the rest of the code we can use ex: > schedule_timeout(varX*100) for 100ms no matter what hz is. There's not real difference between a multiplication or a division, and for either cases it doesn't worth to optimize such usage IMHO. I believe the only real cost is the cacheline anyway. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 13:01 ` dynamic-hz Andrea Arcangeli @ 2004-12-13 13:02 ` Andrea Arcangeli 0 siblings, 0 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 13:02 UTC (permalink / raw) To: Hans Kristian Rosbach Cc: Pavel Machek, Andrew Morton, Con Kolivas, Linux Kernel Mailing List On Mon, Dec 13, 2004 at 02:01:42PM +0100, Andrea Arcangeli wrote: > believe the only real cost is the cacheline anyway. [..] and in turn I guess by adding a second dynamic variable you just doubled the only real cost ;) ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 12:51 ` dynamic-hz Hans Kristian Rosbach 2004-12-13 13:01 ` dynamic-hz Andrea Arcangeli @ 2004-12-13 15:06 ` Geert Uytterhoeven 2004-12-13 16:12 ` dynamic-hz Pavel Machek 2004-12-14 4:05 ` dynamic-hz Nish Aravamudan 2 siblings, 1 reply; 126+ messages in thread From: Geert Uytterhoeven @ 2004-12-13 15:06 UTC (permalink / raw) To: Hans Kristian Rosbach Cc: Pavel Machek, Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List

On Mon, 13 Dec 2004, Hans Kristian Rosbach wrote:
> I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> do, but the parameter it gets in 1000hz is "100" so I assume this
> is because we want to wait for 100ms, and in 1000hz that equals
> 100 cycles. Correct?

`schedule_timeout(HZ/x)' lets it wait for 1/x'th second.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 15:06 ` dynamic-hz Geert Uytterhoeven @ 2004-12-13 16:12 ` Pavel Machek 2004-12-13 16:14 ` dynamic-hz Geert Uytterhoeven 2004-12-14 4:06 ` dynamic-hz Nish Aravamudan 0 siblings, 2 replies; 126+ messages in thread From: Pavel Machek @ 2004-12-13 16:12 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Hans Kristian Rosbach, Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List

Hi!

> > I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> > do, but the parameter it gets in 1000hz is "100" so I assume this
> > is because we want to wait for 100ms, and in 1000hz that equals
> > 100 cycles. Correct?
>
> `schedule_timeout(HZ/x)' lets it wait for 1/x'th second.

...small problem is that for HZ lower than x it does not wait at all :-(.

Pavel

--
Boycott Kodak -- for their patent abuse against Java.

^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 16:12 ` dynamic-hz Pavel Machek @ 2004-12-13 16:14 ` Geert Uytterhoeven 2004-12-14 4:06 ` dynamic-hz Nish Aravamudan 1 sibling, 0 replies; 126+ messages in thread From: Geert Uytterhoeven @ 2004-12-13 16:14 UTC (permalink / raw) To: Pavel Machek Cc: Hans Kristian Rosbach, Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List

On Mon, 13 Dec 2004, Pavel Machek wrote:
> > > I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> > > do, but the parameter it gets in 1000hz is "100" so I assume this
> > > is because we want to wait for 100ms, and in 1000hz that equals
> > > 100 cycles. Correct?
> >
> > `schedule_timeout(HZ/x)' lets it wait for 1/x'th second.
>
> ...small problem is that for HZ lower than x it does not wait at all
> :-(.

I know. You'd better use `(HZ+x-1)/x' for delays. Integer division rounds down, so it can be tricky :-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

^ permalink raw reply [flat|nested] 126+ messages in thread
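The rounding issue raised in the last two messages is easy to reproduce in plain C. The sketch below is illustrative only: `delay_ticks_floor` and `delay_ticks_ceil` are hypothetical names, and `hz` models the boot-time HZ value rather than the kernel macro.

```c
#include <assert.h>

/* Hypothetical helpers; hz models the boot-time HZ value. */

/* Plain HZ/x: rounds down, so it yields 0 ticks once hz < x. */
static unsigned long delay_ticks_floor(unsigned long hz, unsigned long x)
{
	assert(x > 0);

	return hz / x;
}

/* (HZ+x-1)/x: rounds up, so any nonzero hz gives at least one tick. */
static unsigned long delay_ticks_ceil(unsigned long hz, unsigned long x)
{
	assert(x > 0);

	return (hz + x - 1) / x;
}
```

For a 1/100 second delay at HZ=50 the plain form gives 0 ticks while the rounded-up form gives 1, and for exact divisions such as HZ=1000 with x=10 both agree on 100.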
* Re: dynamic-hz 2004-12-13 16:12 ` dynamic-hz Pavel Machek 2004-12-13 16:14 ` dynamic-hz Geert Uytterhoeven @ 2004-12-14 4:06 ` Nish Aravamudan 1 sibling, 0 replies; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 4:06 UTC (permalink / raw) To: Pavel Machek Cc: Geert Uytterhoeven, Hans Kristian Rosbach, Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List

On Mon, 13 Dec 2004 17:12:07 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
> > > I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> > > do, but the parameter it gets in 1000hz is "100" so I assume this
> > > is because we want to wait for 100ms, and in 1000hz that equals
> > > 100 cycles. Correct?
> >
> > `schedule_timeout(HZ/x)' lets it wait for 1/x'th second.
>
> ...small problem is that for HZ lower than x it does not wait at all
> :-(.

Ah ha! Another reason to use msleep() or msleep_interruptible() :). Or, if you just want to give up the CPU, use schedule(); or, if giving up the CPU for a long time, use yield() [the current semantic interpretation of yield()].

-Nish

^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 12:51 ` dynamic-hz Hans Kristian Rosbach 2004-12-13 13:01 ` dynamic-hz Andrea Arcangeli 2004-12-13 15:06 ` dynamic-hz Geert Uytterhoeven @ 2004-12-14 4:05 ` Nish Aravamudan 2 siblings, 0 replies; 126+ messages in thread From: Nish Aravamudan @ 2004-12-14 4:05 UTC (permalink / raw) To: Hans Kristian Rosbach Cc: Pavel Machek, Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List

On Mon, 13 Dec 2004 13:51:11 +0100, Hans Kristian Rosbach <hk@isphuset.no> wrote:
> On Mon, Dec 13, 2004 at 12:22:29PM +0100, Pavel Machek wrote:
> > I tried defining HZ to 10 once, and there are some #if arrays in the
> > kernel that prevented me from doing that.
> >
> > Some drivers do timeouts based on jiffies; having HZ=1 may turn 20msec
> > timeout into 1sec, that could hurt a lot in the error case...
>
> On Mon, Dec 13, 2004 at 03:25:21AM -0800, Andrew Morton wrote:
> > We still have 1000-odd places which do things like
> > schedule_timeout(HZ/10);
> > which will now involve a runtime divide. The propagation of msleep()
> > and ssleep() will reduce that a bit, but not much.
>
> Shouldn't that be regarded as a bug/deprecated?
>
> I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> do, but the parameter it gets in 1000hz is "100" so I assume this
> is because we want to wait for 100ms, and in 1000hz that equals
> 100 cycles. Correct?

schedule_timeout() specifies a sleep in jiffies -- it's actually a rather annoying interface for the very reason that how *long* you will actually sleep (in human time units) depends on the value of HZ. So your assumption is incorrect, presuming the code author knows what they are doing. They wish to sleep for 1/10 the number of timer ticks in a second. What this translates to, though, clearly depends on HZ. Thus

  msleep{,_interruptible}(100);

would be far better to use (it calls schedule_timeout() correctly [another thing not done often]).

Also, if you look carefully at the timer code, you'll notice that the x86 timer frequency is not actually 1000 Hz, it's actually less. Thus you run into issues with timer intervals... But, in any case, specifying a timeout of 100 msecs is different than specifying a timeout of 100 cycles on x86. I'm not sure what it exactly translates to, but it will be more. Hence, you should use msleep(), not schedule_timeout() directly. Jiffies should not be what you base your timing on; msecs & secs are easier and less likely to be misused.

> then in the rest of the code we can use ex:
> schedule_timeout(varX*100) for 100ms no matter what hz is.

No, please don't. Use msleep() or msleep_interruptible(). Let the conversion functions take care of the conversions.

> With hz=50 then the lowest ms is 20 for one tick though. And that
> might trigger problems with approximation at some point.
> varX would have to be decimal, and that might also be a problem?

No floating point in the kernel...

-Nish

^ permalink raw reply [flat|nested] 126+ messages in thread
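To make concrete why a raw jiffies count means different things at different HZ values, here is a small userspace model. It is only a sketch: `ticks_to_msecs` is a hypothetical name and `hz` an assumed nominal tick rate. It does not model the real x86 timer frequency mentioned above, nor the kernel's own conversion helpers.

```c
#include <assert.h>

/* Hypothetical helper: wall-clock milliseconds covered by `ticks` jiffies,
 * assuming the timer really fires exactly hz times per second. */
static unsigned long ticks_to_msecs(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);

	return ticks * 1000 / hz;
}
```

Under that assumption a hard-coded `schedule_timeout(100)` covers 100 ms at HZ=1000 but a full second at HZ=100, whereas `HZ/10` ticks stays at 100 ms in both cases. The one-tick minimum at HZ=1 is 1000 ms, which matches the "20msec timeout into 1sec" concern quoted in the message.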
* Re: dynamic-hz 2004-12-13 11:19 ` dynamic-hz Hans Kristian Rosbach 2004-12-13 11:22 ` dynamic-hz Pavel Machek @ 2004-12-13 11:33 ` Andrea Arcangeli 2004-12-13 14:38 ` dynamic-hz Zwane Mwaikambo 2 siblings, 0 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 11:33 UTC (permalink / raw) To: Hans Kristian Rosbach Cc: Andrew Morton, Con Kolivas, pavel, Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 12:19:50PM +0100, Hans Kristian Rosbach wrote:
> Is there any recommended lower bound setting?
> Would there be a point in recommending lower settings for desktops
> running only text consoles opposed to X desktops?

I don't know the ACPI details, but as far as dynamic-hz is concerned I seem to recall I tested it with HZ=10/25/50/... too (as well as HZ=2000/5000...). Everything will work flawlessly, but any number below 50 will pretty much guarantee that not even an animated flash or gif is shown fluently ;). That said, you can use X just fine, not only the console (my X usage on the laptop sure doesn't need a fast HZ, for example).

^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 11:19 ` dynamic-hz Hans Kristian Rosbach 2004-12-13 11:22 ` dynamic-hz Pavel Machek 2004-12-13 11:33 ` dynamic-hz Andrea Arcangeli @ 2004-12-13 14:38 ` Zwane Mwaikambo 2 siblings, 0 replies; 126+ messages in thread From: Zwane Mwaikambo @ 2004-12-13 14:38 UTC (permalink / raw) To: Hans Kristian Rosbach Cc: Andrew Morton, Con Kolivas, andrea, pavel, Linux Kernel Mailing List On Mon, 13 Dec 2004, Hans Kristian Rosbach wrote: > Is there any recommended lower bound setting? > Would there be a point in recommending lower settings for desktops > running only text consoles opposed to X desktops? You could probably go as low as 50 without noticing anything on text only consoles. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-12 23:36 ` dynamic-hz Con Kolivas ` (4 preceding siblings ...) 2004-12-13 11:02 ` dynamic-hz Andrew Morton @ 2004-12-13 12:00 ` Alan Cox 2004-12-13 15:52 ` dynamic-hz Andrea Arcangeli 2004-12-14 22:28 ` dynamic-hz Lee Revell 6 siblings, 1 reply; 126+ messages in thread From: Alan Cox @ 2004-12-13 12:00 UTC (permalink / raw) To: Con Kolivas; +Cc: Andrea Arcangeli, Pavel Machek, Linux Kernel Mailing List

On Sun, 2004-12-12 at 23:36, Con Kolivas wrote:
> The rest of my users that were setting Hz to 100 for so-called
> performance gains were doing so under the false impression that cpu
> usage was lower simply because of the woefully inaccurate cpu usage
> calculation at 100Hz.

It makes a difference for some HPC workloads. I run 100Hz because

- It improves battery life
- Laptops tend to lose ticks on battery status queries at 1Khz

^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 12:00 ` dynamic-hz Alan Cox @ 2004-12-13 15:52 ` Andrea Arcangeli 0 siblings, 0 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 15:52 UTC (permalink / raw) To: Alan Cox; +Cc: Con Kolivas, Pavel Machek, Linux Kernel Mailing List On Mon, Dec 13, 2004 at 12:00:44PM +0000, Alan Cox wrote: > - Laptops tend to lose ticks on battery status queries at 1Khz The lost-tick adjustment code should in theory cope with it, however in my firewall with USB adsl modem taking 3msec-long-irqs, it makes the system time go in the future pretty quick (instead of losing time without tick compensation code). I guess the same would happen with the battery status checks. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-12 23:36 ` dynamic-hz Con Kolivas ` (5 preceding siblings ...) 2004-12-13 12:00 ` dynamic-hz Alan Cox @ 2004-12-14 22:28 ` Lee Revell 2004-12-14 22:40 ` dynamic-hz Con Kolivas 6 siblings, 1 reply; 126+ messages in thread From: Lee Revell @ 2004-12-14 22:28 UTC (permalink / raw) To: Con Kolivas; +Cc: Andrea Arcangeli, Pavel Machek, linux-kernel On Mon, 2004-12-13 at 10:36 +1100, Con Kolivas wrote: > The performance benefit, if any, is often lost in noise during > benchmarks and when there, is less than 1%. I have measured 2.1-2.3% residency for the timer ISR on my 600Mhz VIA C3. And this is a desktop - you have many many embedded systems that are slower. For these systems the difference is very real. I would certainly expect it to be lost in the noise on a 2Ghz machine. Lee ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 22:28 ` dynamic-hz Lee Revell @ 2004-12-14 22:40 ` Con Kolivas 2004-12-14 22:50 ` dynamic-hz Lee Revell 0 siblings, 1 reply; 126+ messages in thread From: Con Kolivas @ 2004-12-14 22:40 UTC (permalink / raw) To: Lee Revell; +Cc: Con Kolivas, Andrea Arcangeli, Pavel Machek, linux-kernel

Lee Revell writes:
> On Mon, 2004-12-13 at 10:36 +1100, Con Kolivas wrote:
>> The performance benefit, if any, is often lost in noise during
>> benchmarks and when there, is less than 1%.
>
> I have measured 2.1-2.3% residency for the timer ISR on my 600Mhz VIA
> C3. And this is a desktop - you have many many embedded systems that
> are slower. For these systems the difference is very real.

Could you explain residency and its relevance to throughput please? I've not heard this term before.

Cheers,
Con

^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-14 22:40 ` dynamic-hz Con Kolivas @ 2004-12-14 22:50 ` Lee Revell 0 siblings, 0 replies; 126+ messages in thread From: Lee Revell @ 2004-12-14 22:50 UTC (permalink / raw) To: Con Kolivas; +Cc: Andrea Arcangeli, Pavel Machek, linux-kernel

On Wed, 2004-12-15 at 09:40 +1100, Con Kolivas wrote:
> Lee Revell writes:
>
> > On Mon, 2004-12-13 at 10:36 +1100, Con Kolivas wrote:
> >> The performance benefit, if any, is often lost in noise during
> >> benchmarks and when there, is less than 1%.
> >
> > I have measured 2.1-2.3% residency for the timer ISR on my 600Mhz VIA
> > C3. And this is a desktop - you have many many embedded systems that
> > are slower. For these systems the difference is very real.
>
> Could you explain residency and its relevance to throughput please? I've
> not heard this term before.
>

It means 2.1-2.3% of wallclock time is spent running the timer interrupt handler. IOW, it runs for 21-23 usecs, 1000x per second.

Lee

^ permalink raw reply [flat|nested] 126+ messages in thread
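The residency figure in the message above follows directly from handler time multiplied by tick rate. The sketch below restates that arithmetic in integers. `timer_isr_ppm` is a hypothetical helper name, and the 100 Hz case is an extrapolation that assumes the per-tick handler cost stays the same, which the thread does not measure.

```c
#include <assert.h>

/*
 * Hypothetical helper: share of wall-clock time spent in the timer
 * interrupt handler, in parts per million (10000 ppm == 1%).
 * isr_usecs per tick times hz ticks per second gives busy microseconds
 * per second, which is already a ppm figure.
 */
static unsigned long timer_isr_ppm(unsigned long isr_usecs, unsigned long hz)
{
	assert(hz > 0);

	return isr_usecs * hz;
}
```

At 21 to 23 usecs per tick and 1000 ticks per second this gives 21000 to 23000 ppm, i.e. the quoted 2.1-2.3%. Under the same-cost assumption, 100 Hz would come to roughly 0.21-0.23%.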
* Re: dynamic-hz 2004-12-11 14:23 dynamic-hz Andrea Arcangeli ` (2 preceding siblings ...) 2004-12-12 16:35 ` dynamic-hz Pavel Machek @ 2004-12-13 20:26 ` Olaf Hering 2004-12-13 22:41 ` dynamic-hz Andrea Arcangeli 2004-12-13 20:56 ` dynamic-hz john stultz 4 siblings, 1 reply; 126+ messages in thread From: Olaf Hering @ 2004-12-13 20:26 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel On Sat, Dec 11, Andrea Arcangeli wrote: > Comments welcome thanks. Not a comment, more a question: Will there be a real benefit by running an old PII 200MMX at 100HZ instead of 1000HZ? I guess less interrupts should improve the desktop performance a little bit. ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 20:26 ` dynamic-hz Olaf Hering @ 2004-12-13 22:41 ` Andrea Arcangeli 0 siblings, 0 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 22:41 UTC (permalink / raw) To: Olaf Hering; +Cc: linux-kernel On Mon, Dec 13, 2004 at 09:26:42PM +0100, Olaf Hering wrote: > Not a comment, more a question: > > Will there be a real benefit by running an old PII 200MMX at 100HZ > instead of 1000HZ? > I guess less interrupts should improve the desktop performance a little bit. On a pii the slowdown is probably more than 1%, the slower the cpu, the more 100hz is appropriate. This is not going to be very noticeable on a desktop since a desktop is often idle, but only on servers it should help (for example kernel compiles will be more than 1% faster). ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-11 14:23 dynamic-hz Andrea Arcangeli ` (3 preceding siblings ...) 2004-12-13 20:26 ` dynamic-hz Olaf Hering @ 2004-12-13 20:56 ` john stultz 2004-12-13 22:21 ` dynamic-hz Andrea Arcangeli 4 siblings, 1 reply; 126+ messages in thread From: john stultz @ 2004-12-13 20:56 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: lkml On Sat, 2004-12-11 at 06:23, Andrea Arcangeli wrote: > This patch is quite intrusive since many HZ visible to userspace have to > be converted to USER_HZ, and most important because HZ isn't available > at compile time anymore and every variable in function of HZ must be > either changed to be in function of USER_HZ or it must be initialized at > runtime. The code has debugging code (optional at compile time) so that > I can guarantee that there cannot be any regression. Interesting patch, I know some folks have been asking about HZ=10k recently, so this could help. The only bit that worries me a bit is the change from HZ->USER_HZ for internal calculations. In my mind, USER_HZ should only be used for converting internal system ticks to userspace-visible ticks. Changing drivers to think about things in user-ticks confuses things a bit since suddenly some kernel code is thinking in user-ticks and others in system-ticks. It just muddles things a bit. thanks -john ^ permalink raw reply [flat|nested] 126+ messages in thread
* Re: dynamic-hz 2004-12-13 20:56 ` dynamic-hz john stultz @ 2004-12-13 22:21 ` Andrea Arcangeli 0 siblings, 0 replies; 126+ messages in thread From: Andrea Arcangeli @ 2004-12-13 22:21 UTC (permalink / raw) To: john stultz; +Cc: lkml On Mon, Dec 13, 2004 at 12:56:29PM -0800, john stultz wrote: > Interesting patch, I know some folks have been asking about HZ=10k > recently, so this could help. Yes, they only need to pass HZ=10000 to the boot command line to make it work with 2.4. > The only bit that worries me a bit is the change from HZ->USER_HZ for > internal calculations. In my mind, USER_HZ should only be used for > converting internal system ticks to userspace-visible ticks. Changing > drivers to think about things in user-ticks confuses things a bit since > suddenly some kernel code is thinking in user-ticks and others in > system-ticks. It just muddles things a bit. I tried to make the smallest possible change to make the thing work, even if that sometime meant to think in user hz. The user_to_kernel_hz helper function converts back into kernel hz. ^ permalink raw reply [flat|nested] 126+ messages in thread