linux-kernel.vger.kernel.org archive mirror
* dynamic-hz
@ 2004-12-11 14:23 Andrea Arcangeli
  2004-12-11 14:50 ` dynamic-hz Zwane Mwaikambo
                   ` (4 more replies)
  0 siblings, 5 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-11 14:23 UTC (permalink / raw)
  To: linux-kernel

The patch below allows HZ to be set dynamically at boot time with a
command line parameter. HZ=1000, HZ=100, HZ=333 or any other value just works
(though certain values may cause more or less drift of the system time,
forward or backward).

Is there any interest from the mainline developers to merge this into
2.6? I'm getting requests for this feature being forward ported to
2.6 (both for batch jobs and for the powersaved that can trim the hz
down to 80MHz). It should be up to the user to choose the HZ like it was
in 2.4-aa.

This patch is quite intrusive since many HZ values visible to userspace have
to be converted to USER_HZ, and most importantly because HZ isn't available
at compile time anymore, so every variable that is a function of HZ must
either be changed to be a function of USER_HZ or be initialized at
runtime. The code has debugging code (optional at compile time) so that
I can guarantee that there cannot be any regression.
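
As a rough illustration of the boot-time selection described above, here is a
minimal user-space sketch. It is not taken from the patch: `parse_hz_param`,
`init_hz_from_cmdline`, `sketch_hz` and the fallback of 100 are hypothetical
names and values chosen for the example, and a real kernel would hook into its
own command line parsing instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime tick rate; 0 means "not initialized yet". */
unsigned long sketch_hz;

/*
 * Scan a boot command line for a token "HZ=<n>" and return <n>, or
 * fallback when the token is missing or malformed.
 */
unsigned long parse_hz_param(const char *cmdline, unsigned long fallback)
{
	const char *p = cmdline;

	assert(cmdline != NULL);
	while ((p = strstr(p, "HZ=")) != NULL) {
		/* Accept the token only at the start or right after a space,
		 * and only if a digit follows the '='. */
		if ((p == cmdline || p[-1] == ' ') &&
		    isdigit((unsigned char)p[3])) {
			char *end;
			unsigned long val = strtoul(p + 3, &end, 10);

			if (val > 0 && (*end == '\0' || *end == ' '))
				return val;
		}
		p += 3;
	}
	return fallback;
}

/* Everything derived from HZ has to be set up after this call. */
void init_hz_from_cmdline(const char *cmdline)
{
	sketch_hz = parse_hz_param(cmdline, 100);
}
```

The point of the sketch is only that the tick rate becomes a variable filled
in once at boot, so anything computed from it can no longer be a compile-time
constant.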

Technically this makes a lot of sense to me (well, you can guess why I
implemented it in the first place), at least in archs where one cannot
reprogram the timer chip in a performant way (to stop timer ticks
completely until the next posted timer). This has been in production for
years in SLES8 btw.

http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa3/9999_zzz-dynamic-hz-5.gz

Comments welcome thanks.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-11 14:23 dynamic-hz Andrea Arcangeli
@ 2004-12-11 14:50 ` Zwane Mwaikambo
  2004-12-12  6:57   ` dynamic-hz Andrea Arcangeli
  2004-12-11 21:41 ` dynamic-hz Jan Engelhardt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 126+ messages in thread
From: Zwane Mwaikambo @ 2004-12-11 14:50 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

On Sat, 11 Dec 2004, Andrea Arcangeli wrote:

> This patch is quite intrusive since many HZ values visible to userspace have
> to be converted to USER_HZ, and most importantly because HZ isn't available

Shouldn't that be a bug anyway regardless of dynamic-hz?


* Re: dynamic-hz
  2004-12-11 14:23 dynamic-hz Andrea Arcangeli
  2004-12-11 14:50 ` dynamic-hz Zwane Mwaikambo
@ 2004-12-11 21:41 ` Jan Engelhardt
  2004-12-12 16:35 ` dynamic-hz Pavel Machek
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 126+ messages in thread
From: Jan Engelhardt @ 2004-12-11 21:41 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hi,

>The patch below allows HZ to be set dynamically at boot time with a

so the only thing left is to alter HZ at runtime :)

>Is there any interest from the mainline developers to merge this into 2.6?

For my side, there is interest from the average user.



Jan Engelhardt
-- 
ENOSPC


* Re: dynamic-hz
  2004-12-11 14:50 ` dynamic-hz Zwane Mwaikambo
@ 2004-12-12  6:57   ` Andrea Arcangeli
  0 siblings, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-12  6:57 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: linux-kernel

On Sat, Dec 11, 2004 at 07:50:31AM -0700, Zwane Mwaikambo wrote:
> Shouldn't that be a bug anyway regardless of dynamic-hz?

Yes of course. And in theory in 2.6 it'll be easier to implement than it
was in 2.4, since it has a chance to be already using USER_HZ at compile
time instead of HZ.


* Re: dynamic-hz
  2004-12-11 14:23 dynamic-hz Andrea Arcangeli
  2004-12-11 14:50 ` dynamic-hz Zwane Mwaikambo
  2004-12-11 21:41 ` dynamic-hz Jan Engelhardt
@ 2004-12-12 16:35 ` Pavel Machek
  2004-12-12 22:23   ` dynamic-hz Andrea Arcangeli
  2004-12-13 20:26 ` dynamic-hz Olaf Hering
  2004-12-13 20:56 ` dynamic-hz john stultz
  4 siblings, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-12 16:35 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hi!

> The patch below allows HZ to be set dynamically at boot time with a
> command line parameter. HZ=1000, HZ=100, HZ=333 or any other value just works
> (though certain values may cause more or less drift of the system time,
> forward or backward).
> 
> Is there any interest from the mainline developers to merge this into
> 2.6? I'm getting requests for this feature being forward ported to
> 2.6 (both for batch jobs and for the powersaved that can trim the hz
> down to 80MHz). It should be up to the user to choose the HZ like it was
> in 2.4-aa.
> 
> This patch is quite intrusive since many HZ values visible to userspace have
> to be converted to USER_HZ, and most importantly because HZ isn't available
> at compile time anymore, so every variable that is a function of HZ must
> either be changed to be a function of USER_HZ or be initialized at
> runtime. The code has debugging code (optional at compile time) so that
> I can guarantee that there cannot be any regression.
> 
> Technically this makes a lot of sense to me (well, you can guess why I
> implemented it in the first place), at least in archs where one cannot
> reprogram the timer chip in a performant way (to stop timer ticks
> completely until the next posted timer). This has been in production for
> years in SLES8 btw.
> 
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa3/9999_zzz-dynamic-hz-5.gz

It certainly helps with singing capacitors... What is overhead of
this?

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: dynamic-hz
  2004-12-12 16:35 ` dynamic-hz Pavel Machek
@ 2004-12-12 22:23   ` Andrea Arcangeli
  2004-12-12 23:36     ` dynamic-hz Con Kolivas
  0 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-12 22:23 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

On Sun, Dec 12, 2004 at 05:35:47PM +0100, Pavel Machek wrote:
> It certainly helps with singing capacitors... What is overhead of

;)

> this?

The overhead is a single l1 cacheline in the paths manipulating HZ
(rather than having an immediate value hardcoded in the asm, it reads it
from a memory location not in the icache). Plus there are some
conversion routines in the USER_HZ usages. It's not a measurable
difference.


* Re: dynamic-hz
  2004-12-12 22:23   ` dynamic-hz Andrea Arcangeli
@ 2004-12-12 23:36     ` Con Kolivas
  2004-12-12 23:42       ` dynamic-hz Pavel Machek
                         ` (6 more replies)
  0 siblings, 7 replies; 126+ messages in thread
From: Con Kolivas @ 2004-12-12 23:36 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Pavel Machek, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1168 bytes --]

Andrea Arcangeli wrote:
> On Sun, Dec 12, 2004 at 05:35:47PM +0100, Pavel Machek wrote:
> 
>>It certainly helps with singing capacitors... What is overhead of
> 
> 
> ;)
> 
> 
>>this?
> 
> 
> The overhead is a single l1 cacheline in the paths manipulating HZ
> (rather than having an immediate value hardcoded in the asm, it reads it
> from a memory location not in the icache). Plus there are some
> conversion routines in the USER_HZ usages. It's not a measurable
> difference.

Just being devil's advocate here...

I had variable Hz in my tree for a while and found there was one 
solitary purpose to setting Hz to 100; to silence cheap capacitors.

The rest of my users that were setting Hz to 100 for so-called 
performance gains were doing so under the false impression that cpu 
usage was lower simply because of the woefully inaccurate cpu usage 
calculation at 100Hz.

The performance benefit, if any, is often lost in noise during 
benchmarks and when there, is less than 1%. So I was wondering if you 
had some specific advantage in mind for this patch? Is there some 
arch-specific advantage? I can certainly envision disadvantages to lower Hz.

Cheers,
Con

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]


* Re: dynamic-hz
  2004-12-12 23:36     ` dynamic-hz Con Kolivas
@ 2004-12-12 23:42       ` Pavel Machek
  2004-12-13  0:09         ` dynamic-hz Con Kolivas
  2004-12-12 23:43       ` dynamic-hz Andrea Arcangeli
                         ` (5 subsequent siblings)
  6 siblings, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-12 23:42 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrea Arcangeli, linux-kernel

Hi!

> >The overhead is a single l1 cacheline in the paths manipulating HZ
> >(rather than having an immediate value hardcoded in the asm, it reads it
> >from a memory location not in the icache). Plus there are some
> >conversion routines in the USER_HZ usages. It's not a measurable
> >difference.
> 
> Just being devil's advocate here...
> 
> I had variable Hz in my tree for a while and found there was one 
> solitary purpose to setting Hz to 100; to silence cheap capacitors.
> 
> The rest of my users that were setting Hz to 100 for so-called 
> performance gains were doing so under the false impression that cpu 
> usage was lower simply because of the woefully inaccurate cpu usage 
> calculation at 100Hz.
> 
> The performance benefit, if any, is often lost in noise during 
> benchmarks and when there, is less than 1%. So I was wondering if you 
> had some specific advantage in mind for this patch? Is there some 
> arch-specific advantage? I can certainly envision disadvantages to lower Hz.

Actually, I measured about 1W power savings with HZ=100. That's about
as much as spindown of disk saves...
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: dynamic-hz
  2004-12-12 23:36     ` dynamic-hz Con Kolivas
  2004-12-12 23:42       ` dynamic-hz Pavel Machek
@ 2004-12-12 23:43       ` Andrea Arcangeli
  2004-12-13  0:18         ` dynamic-hz Con Kolivas
  2004-12-13  7:43       ` dynamic-hz Stefan Seyfried
                         ` (4 subsequent siblings)
  6 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-12 23:43 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Pavel Machek, linux-kernel

On Mon, Dec 13, 2004 at 10:36:19AM +1100, Con Kolivas wrote:
> The performance benefit, if any, is often lost in noise during 
> benchmarks and when there, is less than 1%. So I was wondering if you 
> had some specific advantage in mind for this patch? Is there some 
> arch-specific advantage? I can certainly envision disadvantages to lower Hz.

The last number I have here is 1% for a kernel compile. We're not talking
fancy desktop stuff here, we're talking about raw computing servers that
run in userspace 99.9% of the time, where the 1% loss is going to be
multiplied dozens or hundreds of times. For those HZ=1000 is a pure
tangible disadvantage.

For desktops 1% of cpu being lost is not an issue of course.
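
To put the 1% figure in perspective, a back-of-the-envelope model can help.
The sketch below assumes a fixed cost per tick and nothing else (no cache or
TLB effects); the 10 usec cost used in the note after it is a hypothetical
value picked so that HZ=1000 comes out at 1%, not a measurement, and
`sketch_tick_overhead_ppm` is an illustrative name.

```c
#include <assert.h>

/*
 * CPU time spent in the periodic tick, in parts per million, for a
 * hypothetical fixed cost per tick. One second is 1,000,000,000 ns, so
 * ppm = hz * cost_ns / 1000.
 */
unsigned long sketch_tick_overhead_ppm(unsigned long hz, unsigned long cost_ns)
{
	assert(hz > 0);
	return hz * cost_ns / 1000;
}
```

Under that assumption, 10 usec per tick gives 10000 ppm (1%) at HZ=1000 and
1000 ppm (0.1%) at HZ=100, which is the kind of gap that only matters when it
is multiplied across many compute nodes.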


* Re: dynamic-hz
  2004-12-12 23:42       ` dynamic-hz Pavel Machek
@ 2004-12-13  0:09         ` Con Kolivas
  2004-12-13  8:37           ` dynamic-hz Jan Engelhardt
  2004-12-13 10:43           ` dynamic-hz Pavel Machek
  0 siblings, 2 replies; 126+ messages in thread
From: Con Kolivas @ 2004-12-13  0:09 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Con Kolivas, Andrea Arcangeli, linux-kernel

Pavel Machek writes:

> Hi!
> 
>> >The overhead is a single l1 cacheline in the paths manipulating HZ
>> >(rather than having an immediate value hardcoded in the asm, it reads it
>> >from a memory location not in the icache). Plus there are some
>> >conversion routines in the USER_HZ usages. It's not a measurable
>> >difference.
>> 
>> Just being devil's advocate here...
>> 
>> I had variable Hz in my tree for a while and found there was one 
>> solitary purpose to setting Hz to 100; to silence cheap capacitors.
>> 
>> The rest of my users that were setting Hz to 100 for so-called 
>> performance gains were doing so under the false impression that cpu 
>> usage was lower simply because of the woefully inaccurate cpu usage 
>> calculation at 100Hz.
>> 
>> The performance benefit, if any, is often lost in noise during 
>> benchmarks and when there, is less than 1%. So I was wondering if you 
>> had some specific advantage in mind for this patch? Is there some 
>> arch-specific advantage? I can certainly envision disadvantages to lower Hz.
> 
> Actually, I measured about 1W power savings with HZ=100. That's about
> as much as spindown of disk saves...

How does the popular proprietary operating system cope with this? My 
understanding is they run 1000Hz yet they have good power saving and quiet 
capacitors. Presumably they do a lot less per timer tick?

Cheers,
Con



* Re: dynamic-hz
  2004-12-12 23:43       ` dynamic-hz Andrea Arcangeli
@ 2004-12-13  0:18         ` Con Kolivas
  2004-12-13  0:27           ` dynamic-hz Andrea Arcangeli
  0 siblings, 1 reply; 126+ messages in thread
From: Con Kolivas @ 2004-12-13  0:18 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Con Kolivas, Pavel Machek, linux-kernel

Andrea Arcangeli writes:

> On Mon, Dec 13, 2004 at 10:36:19AM +1100, Con Kolivas wrote:
>> The performance benefit, if any, is often lost in noise during 
>> benchmarks and when there, is less than 1%. So I was wondering if you 
>> had some specific advantage in mind for this patch? Is there some 
>> arch-specific advantage? I can certainly envision disadvantages to lower Hz.
> 
> The last number I have here is 1% for a kernel compile. We're not talking
> fancy desktop stuff here, we're talking about raw computing servers that
> run in userspace 99.9% of the time, where the 1% loss is going to be
> multiplied dozens or hundreds of times. For those HZ=1000 is a pure
> tangible disadvantage.
> 
> For desktops 1% of cpu being lost is not an issue of course.

Thanks. I have to admit that the real reason I wrote this email was for this 
discussion to go on record so that desktop users would not get 
inappropriately excited by this change.

Cheers,
Con



* Re: dynamic-hz
  2004-12-13  0:18         ` dynamic-hz Con Kolivas
@ 2004-12-13  0:27           ` Andrea Arcangeli
  2004-12-13  1:50             ` dynamic-hz Zwane Mwaikambo
  0 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13  0:27 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Pavel Machek, linux-kernel

On Mon, Dec 13, 2004 at 11:18:15AM +1100, Con Kolivas wrote:
> Thanks. I have to admit that the real reason I wrote this email was for 
> this discussion to go on record so that desktop users would not get 
> inappropriately excited by this change.

Sure, desktop doesn't need this, the reason somebody is asking for it
is that the desktop stuff hurt some other non-desktop usages. In fact
my 2.4 tree was setting HZ=1000 by default if the 'desktop' parameter was
passed to the kernel (so that I could lower the timeslice accordingly
too, without losing the effect of the nicelevels between nice 0 and
+19).

The other new case where I'm asked for this feature is again not the
desktop but the high end laptop with cpu throttling down to 80MHz, and
what Pavel mentioned about the lower consumption. Perhaps we could do
variable HZ there, though I doubt it has a PIT that can be reprogrammed
with sane performance.

Very few people are going to get real benefit from HZ=1000, but
I certainly agree it is worth keeping HZ=1000 on desktops since on an idle
machine the downside of the more frequent irq sure isn't measurable,
while having shorter timeslices may be visible with many tasks, and
shorter timeslices require a faster HZ to preserve the nicelevels.


* Re: dynamic-hz
  2004-12-13  0:27           ` dynamic-hz Andrea Arcangeli
@ 2004-12-13  1:50             ` Zwane Mwaikambo
  2004-12-13 11:28               ` dynamic-hz Andrea Arcangeli
  0 siblings, 1 reply; 126+ messages in thread
From: Zwane Mwaikambo @ 2004-12-13  1:50 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Con Kolivas, Pavel Machek, linux-kernel

On Mon, 13 Dec 2004, Andrea Arcangeli wrote:

> Sure, desktop doesn't need this, the reason somebody is asking for it
> is that the desktop stuff hurt some other non-desktop usages. In fact
> my 2.4 tree was setting HZ=1000 by default if the 'desktop' parameter was
> passed to the kernel (so that I could lower the timeslice accordingly
> too, without losing the effect of the nicelevels between nice 0 and
> +19).
> 
> The other new case where I'm asked for this feature is again not the
> desktop but the high end laptop with cpu throttling down to 80MHz, and
> what Pavel mentioned about the lower consumption. Perhaps we could do
> variable HZ there, though I doubt it has a PIT that can be reprogrammed
> with sane performance.

Well most x86(64) these days have local APICs and that provides a 
relatively inexpensive one shot timer mode.



* Re: dynamic-hz
  2004-12-12 23:36     ` dynamic-hz Con Kolivas
  2004-12-12 23:42       ` dynamic-hz Pavel Machek
  2004-12-12 23:43       ` dynamic-hz Andrea Arcangeli
@ 2004-12-13  7:43       ` Stefan Seyfried
  2004-12-13 13:58         ` dynamic-hz Russell King
  2004-12-13 16:19         ` dynamic-hz Jan Engelhardt
  2004-12-13  8:29       ` dynamic-hz Jan Engelhardt
                         ` (3 subsequent siblings)
  6 siblings, 2 replies; 126+ messages in thread
From: Stefan Seyfried @ 2004-12-13  7:43 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Pavel Machek, linux-kernel, Andrea Arcangeli

Con Kolivas wrote:

> Just being devil's advocate here...
> 
> I had variable Hz in my tree for a while and found there was one 
> solitary purpose to setting Hz to 100; to silence cheap capacitors.

power savings? Having the cpu wake up 1000 times per second if the
machine is idle cannot be better than only waking it up 100 times.

Yes, I am always on the quest for the 5 extra minutes on battery :-)

Stefan



* Re: dynamic-hz
  2004-12-12 23:36     ` dynamic-hz Con Kolivas
                         ` (2 preceding siblings ...)
  2004-12-13  7:43       ` dynamic-hz Stefan Seyfried
@ 2004-12-13  8:29       ` Jan Engelhardt
  2004-12-14 22:54         ` dynamic-hz Lee Revell
  2004-12-13 11:02       ` dynamic-hz Andrew Morton
                         ` (2 subsequent siblings)
  6 siblings, 1 reply; 126+ messages in thread
From: Jan Engelhardt @ 2004-12-13  8:29 UTC (permalink / raw)
  Cc: linux-kernel

> Just being devil's advocate here...
>
> I had variable Hz in my tree for a while and found there was one solitary
> purpose to setting Hz to 100; to silence cheap capacitors.
>
> The rest of my users that were setting Hz to 100 for so-called performance
> gains were doing so under the false impression that cpu usage was lower simply
> because of the woefully inaccurate cpu usage calculation at 100Hz.

I have found that mplayer drops audio less often when the harddisk is under 
load.




Jan Engelhardt
-- 
ENOSPC


* Re: dynamic-hz
  2004-12-13  0:09         ` dynamic-hz Con Kolivas
@ 2004-12-13  8:37           ` Jan Engelhardt
  2004-12-13 10:43           ` dynamic-hz Pavel Machek
  1 sibling, 0 replies; 126+ messages in thread
From: Jan Engelhardt @ 2004-12-13  8:37 UTC (permalink / raw)
  Cc: linux-kernel

> How does the popular proprietary operating system cope with this? My
> understanding is they run 1000Hz yet they have good power saving and quiet
> capacitors. Presumably they do a lot less per timer tick?

Either that or they maybe use a dynamic ticker, something that adjusts itself
between 100 and 1000 Hz.



Jan Engelhardt
-- 
ENOSPC


* Re: dynamic-hz
  2004-12-13  0:09         ` dynamic-hz Con Kolivas
  2004-12-13  8:37           ` dynamic-hz Jan Engelhardt
@ 2004-12-13 10:43           ` Pavel Machek
  2004-12-13 11:08             ` dynamic-hz Andrea Arcangeli
  1 sibling, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 10:43 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrea Arcangeli, linux-kernel

Hi!

> >Actually, I measured about 1W power savings with HZ=100. That's about
> >as much as spindown of disk saves...
> 
> How does the popular proprietary operating system cope with this? My 
> understanding is they run 1000Hz yet they have good power saving and quiet 
> capacitors. Presumably they do a lot less per timer tick?

Doing a lot less per timer tick is not going to help much... Your cpu
needs to awaken anyway, and awakening the CPU takes a lot of time and a lot
of power, and is probably going to take way more power than execution
of the timer interrupt.
								Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: dynamic-hz
  2004-12-12 23:36     ` dynamic-hz Con Kolivas
                         ` (3 preceding siblings ...)
  2004-12-13  8:29       ` dynamic-hz Jan Engelhardt
@ 2004-12-13 11:02       ` Andrew Morton
  2004-12-13 11:17         ` dynamic-hz Andrea Arcangeli
  2004-12-13 11:19         ` dynamic-hz Hans Kristian Rosbach
  2004-12-13 12:00       ` dynamic-hz Alan Cox
  2004-12-14 22:28       ` dynamic-hz Lee Revell
  6 siblings, 2 replies; 126+ messages in thread
From: Andrew Morton @ 2004-12-13 11:02 UTC (permalink / raw)
  To: Con Kolivas; +Cc: andrea, pavel, linux-kernel

Con Kolivas <kernel@kolivas.org> wrote:
>
> The performance benefit, if any, is often lost in noise during 
>  benchmarks and when there, is less than 1%. So I was wondering if you 
>  had some specific advantage in mind for this patch? Is there some 
>  arch-specific advantage? I can certainly envision disadvantages to lower Hz.

There are apparently some laptops which exhibit appreciable latency between
the start of ACPI sleep and actually consuming less power.  The 1ms wakeup
frequency will shorten battery life on these machines significantly.  (I
forget the exact numbers - Len will know).

So I guess we're going to have to do this sometime - I don't think there's
any other solution apart from going fully tickless, which would be
considerably more intrusive.

We should retain the option of compile-time constant HZ - it's
easy enough.  Probably the patch already does that.


* Re: dynamic-hz
  2004-12-13 10:43           ` dynamic-hz Pavel Machek
@ 2004-12-13 11:08             ` Andrea Arcangeli
  2004-12-13 19:36               ` dynamic-hz john stultz
  0 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 11:08 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Con Kolivas, linux-kernel

On Mon, Dec 13, 2004 at 11:43:21AM +0100, Pavel Machek wrote:
> Doing a lot less per timer tick is not going to help much... Your cpu

I also doubt we can do significantly less per timer tick. There's some
new code and lock like the monotonic_lock but that's going to be lost in
the noise, the irq highlevel interface has some overhead too, but that's
going to be lost in the noise too. The rest pretty much cannot be
avoided. I didn't measure it but I suspect the slowest part might
actually be the outb_p/inb_p and the enter/exit kernel.


* Re: dynamic-hz
  2004-12-13 11:02       ` dynamic-hz Andrew Morton
@ 2004-12-13 11:17         ` Andrea Arcangeli
  2004-12-13 11:25           ` dynamic-hz Andrew Morton
  2004-12-13 11:19         ` dynamic-hz Hans Kristian Rosbach
  1 sibling, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 11:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, pavel, linux-kernel

On Mon, Dec 13, 2004 at 03:02:37AM -0800, Andrew Morton wrote:
> We should retain the option of compile-time constant HZ - it's
> easy enough.  Probably the patch already does that.

The patch only sets HZ at runtime. But of course it's absolutely
trivial to define it at compile time, it's probably a 3 liner on top of
my current patch ;). However personally I don't think the three liner
will be worth the few seconds more spent configuring the kernel ;).

The HZ cacheline is pure readonly (actually I'm not defining it as
cacheline_aligned, I probably should, __HZ can go together with
__SHIFT_HZ). The only debug option I introduced (because it could have a
performance penalty) is a check that nobody ever attempts to read HZ
before we initialized it by parsing the boot command line. If that
happens I printk and then I fallback to the fixed-HZ, so machine works
fine even in case of bugs and I get the debugging printk. That code
actually never triggered once. I did it primarily during development to
be sure I could quickly debug trouble on other archs (this is already
running on all archs with SLES8).

This is pretty much the core of the patch:

+extern unsigned long __HZ;
+
+static inline unsigned long get_hz(void)
+{
+#ifdef CONFIG_DEBUG_HZ
+	if (unlikely(!__HZ)) {
+		__label__ here;
+		printk("early HZ: %p\n", &&here);
+	here:
+		init_HZ(USER_HZ);
+	}
+#endif /* CONFIG_DEBUG_HZ */
+	return __HZ;
+}
+
+#define HZ get_hz()
+
+#define CLOCKS_PER_SEC	(USER_HZ)	/* like times() */
+
+#define jiffies_to_clock_t(x)	(likely((HZ) >= (USER_HZ)) ?			\
+				 (x + ((HZ) / (USER_HZ)) - 1) / ((HZ) / (USER_HZ)) :	\
+				 (x) * ((USER_HZ) / (HZ)))
+#define user_to_kernel_hz(x)	(likely((HZ) >= (USER_HZ)) ?			\
+				 (x) * ((HZ) / (USER_HZ)) :				\
+				 (x + ((USER_HZ) / (HZ)) - 1) / ((USER_HZ) / (HZ)))
+#define user_to_kernel_hz_overflow(x)	((x * (HZ) + (USER_HZ) - 1) / (USER_HZ))

[..]

+++ x/kernel/sched.c	2004-05-31 15:51:42.722918448 +0200
@@ -45,6 +45,8 @@
 #define TASK_USER_PRIO(p)	USER_PRIO((p)->static_prio)
 #define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
 
+unsigned long __HZ, __SHIFT_HZ;
+
 /*
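
For readers who want to try the conversion arithmetic outside the kernel, the
three macros above can be restated as plain functions. This is only a sketch:
the tick rate is passed in as a parameter instead of being read through
get_hz(), USER_HZ is assumed to be 100, and the `sketch_` names are
hypothetical.

```c
#include <assert.h>

#define SKETCH_USER_HZ 100UL	/* assumed value, for the example only */

/* Kernel ticks to user-visible ticks, rounding up when hz >= USER_HZ. */
unsigned long sketch_jiffies_to_clock_t(unsigned long x, unsigned long hz)
{
	unsigned long ratio;

	assert(hz > 0);
	if (hz >= SKETCH_USER_HZ) {
		ratio = hz / SKETCH_USER_HZ;	/* integer ratio, truncated */
		return (x + ratio - 1) / ratio;
	}
	return x * (SKETCH_USER_HZ / hz);
}

/* User-visible ticks to kernel ticks, the mirror image of the above. */
unsigned long sketch_user_to_kernel_hz(unsigned long x, unsigned long hz)
{
	unsigned long ratio;

	assert(hz > 0);
	if (hz >= SKETCH_USER_HZ)
		return x * (hz / SKETCH_USER_HZ);
	ratio = SKETCH_USER_HZ / hz;
	return (x + ratio - 1) / ratio;
}

/* Variant that multiplies by hz first and then divides, rounding up. */
unsigned long sketch_user_to_kernel_hz_overflow(unsigned long x,
						unsigned long hz)
{
	return (x * hz + SKETCH_USER_HZ - 1) / SKETCH_USER_HZ;
}
```

With hz=1000, 1000 kernel ticks map to 100 user ticks. With hz=333 the
integer ratio truncates to 3, so 333 kernel ticks map to 111 user ticks
rather than 100, while the multiply-first variant maps 100 user ticks to
exactly 333 kernel ticks.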


* Re: dynamic-hz
  2004-12-13 11:02       ` dynamic-hz Andrew Morton
  2004-12-13 11:17         ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 11:19         ` Hans Kristian Rosbach
  2004-12-13 11:22           ` dynamic-hz Pavel Machek
                             ` (2 more replies)
  1 sibling, 3 replies; 126+ messages in thread
From: Hans Kristian Rosbach @ 2004-12-13 11:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, andrea, pavel, Linux Kernel Mailing List

On Mon, 2004-12-13 at 12:02, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > The performance benefit, if any, is often lost in noise during 
> >  benchmarks and when there, is less than 1%. So I was wondering if you 
> >  had some specific advantage in mind for this patch? Is there some 
> >  arch-specific advantage? I can certainly envision disadvantages to lower Hz.
> 
> There are apparently some laptops which exhibit appreciable latency between
> the start of ACPI sleep and actually consuming less power.  The 1ms wakeup
> frequency will shorten battery life on these machines significantly.  (I
> forget the exact numbers - Len will know).

Is there any recommended lower bound setting?
Would there be a point in recommending lower settings for desktops
running only text consoles opposed to X desktops?

-HK



* Re: dynamic-hz
  2004-12-13 11:19         ` dynamic-hz Hans Kristian Rosbach
@ 2004-12-13 11:22           ` Pavel Machek
  2004-12-13 11:39             ` dynamic-hz Andrea Arcangeli
  2004-12-13 12:51             ` dynamic-hz Hans Kristian Rosbach
  2004-12-13 11:33           ` dynamic-hz Andrea Arcangeli
  2004-12-13 14:38           ` dynamic-hz Zwane Mwaikambo
  2 siblings, 2 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 11:22 UTC (permalink / raw)
  To: Hans Kristian Rosbach
  Cc: Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List

Hi!

> > > The performance benefit, if any, is often lost in noise during 
> > >  benchmarks and when there, is less than 1%. So I was wondering if you 
> > >  had some specific advantage in mind for this patch? Is there some 
> > >  arch-specific advantage? I can certainly envision disadvantages to lower Hz.
> > 
> > There are apparently some laptops which exhibit appreciable latency between
> > the start of ACPI sleep and actually consuming less power.  The 1ms wakeup
> > frequency will shorten battery life on these machines significantly.  (I
> > forget the exact numbers - Len will know).
> 
> Is there any recommended lower bound setting?
> Would there be a point in recommending lower settings for desktops
> running only text consoles opposed to X desktops?

I tried defining HZ to 10 once, and there are some #if arrays in the
kernel that prevented me from doing that.

Some drivers do timeouts based on jiffies; having HZ=1 may turn 20msec
timeout into 1sec, that could hurt a lot in the error case...
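
The timeout stretching is easy to reproduce with a small sketch. It assumes
a driver that rounds a millisecond timeout up to whole ticks, which is a
common pattern but not taken from any specific driver; both function names
are hypothetical.

```c
#include <assert.h>

/* Round a millisecond timeout up to whole ticks at a given tick rate. */
unsigned long sketch_msecs_to_ticks(unsigned long msecs, unsigned long hz)
{
	assert(hz > 0);
	return (msecs * hz + 999) / 1000;
}

/* Wall-clock duration, in milliseconds, of that many ticks. */
unsigned long sketch_ticks_to_msecs(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return ticks * 1000 / hz;
}
```

At hz=1 a 20 msec request becomes 1 tick, which lasts 1000 msec; at hz=100
the same request becomes 2 ticks, or exactly 20 msec.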

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: dynamic-hz
  2004-12-13 11:17         ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 11:25           ` Andrew Morton
  2004-12-13 11:47             ` dynamic-hz Andrea Arcangeli
  2004-12-14  3:54             ` dynamic-hz Nish Aravamudan
  0 siblings, 2 replies; 126+ messages in thread
From: Andrew Morton @ 2004-12-13 11:25 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: kernel, pavel, linux-kernel

Andrea Arcangeli <andrea@suse.de> wrote:
>
> The patch only sets HZ at runtime. But of course it's absolutely
>  trivial to define it at compile time, it's probably a 3 liner on top of
>  my current patch ;). However personally I don't think the three liner
>  will be worth the few seconds more spent configuring the kernel ;).

We still have 1000-odd places which do things like

	schedule_timeout(HZ/10);

which will now involve a runtime divide.  The propagation of msleep() and
ssleep() will reduce that a bit, but not much.

It's so simple to turn all those into compile-time divides that we may as
well do it.
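
A minimal sketch of how both modes could coexist, assuming a hypothetical
build-time switch named SKETCH_FIXED_HZ (this is not the option name used by
any real patch, and `sketch_get_hz` merely stands in for get_hz()):

```c
#include <assert.h>

/* Hypothetical runtime tick rate, set once during early boot. */
unsigned long sketch_runtime_hz = 1000;

unsigned long sketch_get_hz(void)
{
	assert(sketch_runtime_hz > 0);
	return sketch_runtime_hz;
}

/*
 * Define SKETCH_FIXED_HZ (for example with -DSKETCH_FIXED_HZ=1000) to get
 * a compile-time constant; leave it undefined for the boot-time value.
 */
#ifdef SKETCH_FIXED_HZ
#define SKETCH_HZ ((unsigned long)(SKETCH_FIXED_HZ))
#else
#define SKETCH_HZ sketch_get_hz()
#endif

/* A tenth of a second in ticks, as in schedule_timeout(HZ/10). */
unsigned long sketch_tenth_of_second(void)
{
	return SKETCH_HZ / 10;
}
```

With the constant defined, a compiler can fold SKETCH_HZ / 10 into an
immediate; without it, each call site performs a load and a divide at run
time, which is the cost being discussed here.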


* Re: dynamic-hz
  2004-12-13  1:50             ` dynamic-hz Zwane Mwaikambo
@ 2004-12-13 11:28               ` Andrea Arcangeli
  2004-12-13 12:43                 ` dynamic-hz Pavel Machek
  2004-12-13 14:50                 ` dynamic-hz Zwane Mwaikambo
  0 siblings, 2 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 11:28 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Con Kolivas, Pavel Machek, linux-kernel

On Sun, Dec 12, 2004 at 06:50:30PM -0700, Zwane Mwaikambo wrote:
> Well most x86(64) these days have local APICs and that provides a 
> relatively inexpensive one shot timer mode.

I doubt a one shot is appropriate. The irq latency is variable and we
won't be able to atomically read tsc and rearm the one-shot timer. The
intermediate error will propagate over time.

You were the one making the case of the NMI, the NMI will screw
completely any attempt of rearming the TSC accurately (though I don't
mind too much, like for the sti; hlt, since NMI is practically impossible
to trigger in production, if a NMI is fired we've more troubles than the
1/HZ latency on a pending wakeup or on the system time taking the
tangent ;)

Note that what we would have to implement to use a one-shot timer for
timekeeping is very similar to the algorithm we already use when a
timer irq gets lost, i.e. when we lose a tick.

My USB modem generates a flood of irq latencies >1msec (I tried to track
down where they come from but failed; it seems not to be a cli but just
the usb_uhci interrupt taking 3msec to execute, and the timer irq failing
to execute nested; perhaps I could fix it by forcing irq priorities by
hand), so the tick-loss adjustment always triggers on my firewall, and
it constantly runs into the future by a minute per hour or so. I had to
hack the code myself to reduce the tsc value a bit and now it's almost
on time, randomly deviating into the future and the past (note the
deviation with the mainline code is so huge that ntpd has no way to fix
it, and it's like having ntp turned off). It's too bad I haven't yet
found any bug in the tick-loss adjustment algorithm.

In the current tick-loss adjustment case it's the
delay_at_last_interrupt and rdtscl that can't be atomic and that will
force an error on us. In the one shot case it's the read of the tsc and
the rearming that cannot be atomic and it will force an error on the
system time.

Now perhaps the error is small enough with a chip that is fast to
program like the apic, but the awful results I've got out of the
lost-tick adjustment make me a bit scared of depending on a variable
error to keep the system time accurate.

Even with the PIT, HZ=100/1000 are two numbers where we can get decent
accuracy; there are probably other frequencies where the accuracy is
worse.

(btw, my firewall systemtime will get fixed too by dynamic-hz HZ=100,
it's pure waste to keep my firewall at HZ=1000 even if I didn't have
constant irq-latency of 3/4msec [measured with rdtsc], though I didn't
mention this yet because dynamic-hz in my firewall case would be a pure
band-aid, even fixing the tick-lost adjustment would be a band-aid, the
only thing to fix is the usb irq that runs for 3/4msec without returning).

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:19         ` dynamic-hz Hans Kristian Rosbach
  2004-12-13 11:22           ` dynamic-hz Pavel Machek
@ 2004-12-13 11:33           ` Andrea Arcangeli
  2004-12-13 14:38           ` dynamic-hz Zwane Mwaikambo
  2 siblings, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 11:33 UTC (permalink / raw)
  To: Hans Kristian Rosbach
  Cc: Andrew Morton, Con Kolivas, pavel, Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 12:19:50PM +0100, Hans Kristian Rosbach wrote:
> Is there any recommended lower bound setting?
> Would there be a point in recommending lower settings for desktops
> running only text consoles opposed to X desktops?

I don't know the ACPI details, but as far as dynamic-hz is concerned I
seem to recall I tested it with HZ=10/25/50/... too (as well as
HZ=2000/5000...); everything will work flawlessly, but any number below
50 will pretty much guarantee that not even an animated flash or gif is
shown smoothly ;). That said, you can use X just fine, not only the
console (my X usage on the laptop sure doesn't need a fast HZ for
example).

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:22           ` dynamic-hz Pavel Machek
@ 2004-12-13 11:39             ` Andrea Arcangeli
  2004-12-13 12:51             ` dynamic-hz Hans Kristian Rosbach
  1 sibling, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 11:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Hans Kristian Rosbach, Andrew Morton, Con Kolivas,
	Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 12:22:29PM +0100, Pavel Machek wrote:
> I tried defining HZ to 10 once, and there are some #if arrays in the
> kernel that prevented me from doing that.

I guess you're right and the minimum is HZ=12. I'm pretty sure I could
go down to 25, perhaps the absolute minimum was 12 and not 10.

There are also side effects like this one from setting a strange HZ:

--- x-ref/net/sched/estimator.c	2003-03-15 03:25:19.000000000 +0100
+++ x/net/sched/estimator.c	2004-05-31 15:51:42.778909936 +0200
@@ -71,10 +71,6 @@
      at user level painlessly.
  */
 
-#if (HZ%4) != 0
-#error Bad HZ value.
-#endif
-
 #define EST_MAX_INTERVAL	5
 
 struct qdisc_estimator
@@ -136,6 +132,9 @@ int qdisc_new_estimator(struct tc_stats 
 	struct qdisc_estimator *est;
 	struct tc_estimator *parm = RTA_DATA(opt);
 
+	if (unlikely(HZ % 4))
+		return -EINVAL;
+
 	if (RTA_PAYLOAD(opt) < sizeof(*parm))
 		return -EINVAL;
 


If you boot with an HZ not divisible by 4 you get -EINVAL at runtime
(instead of a compile failure since we can't check it at compile time
anymore ;).

Anyway the major point of the patch is to get HZ switchable from 100 to
1000, those two values are really the only supported ones. The rest is a
bonus, and I'm sure at least 50 and 2000 will work flawlessly too.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:25           ` dynamic-hz Andrew Morton
@ 2004-12-13 11:47             ` Andrea Arcangeli
  2004-12-14  3:56               ` dynamic-hz Nish Aravamudan
  2004-12-14  3:54             ` dynamic-hz Nish Aravamudan
  1 sibling, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 11:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: kernel, pavel, linux-kernel

On Mon, Dec 13, 2004 at 03:25:21AM -0800, Andrew Morton wrote:
> We still have 1000-odd places which do things like
> 
> 	schedule_timeout(HZ/10);
> 
> which will now involve a runtime divide.  The propagation of msleep() and
> ssleep() will reduce that a bit, but not much.

The above is by far the least cpu-hungry piece: it's going to sleep for
100msec, so any order-of-nanoseconds computation in such a path will be
by definition not measurable.

msleep and ssleep as well will obviously be non measurable for the same
reason (their only point is to wait and "waste" cpu). I mean,
msleep/ssleep are the only places in the kernel that we don't really
need to optimize ;). 

Most other fast paths can't execute the division or multiplication at
compile time anyway, so they'd only save 1 cacheline (at the expense of
a bit larger icache).

> It's so simple to turn all those into compile-time divides that we may as
> well do it.

I'm not against leaving a compile time option, it's absolutely trivial
to add it, but I just don't think it'll provide any measurable benefit
in practice, while the ability to switch HZ provides tangible benefits
(including the ability to set HZ to frequencies higher than 1khz, so
that people can post a nanosleep call that will return in 0.1msec
instead of 1msec).

Perhaps __HZ could hurt a bit on a NUMA box where the icache may be
spread on the local nodes and the __HZ not, but then the __HZ could be
made a __per_cpu variable conditionally to NUMA and they would get
dynamic settable hz too, which I believe is significant for a numa box
since if they're doing just userspace computing they don't need a fast
HZ and they can get back 1% of their cpu power from every cpu in the
system (on a 512-way system that's quite a lot more than what you will
ever get back from HZ set at compile time ;). 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-12 23:36     ` dynamic-hz Con Kolivas
                         ` (4 preceding siblings ...)
  2004-12-13 11:02       ` dynamic-hz Andrew Morton
@ 2004-12-13 12:00       ` Alan Cox
  2004-12-13 15:52         ` dynamic-hz Andrea Arcangeli
  2004-12-14 22:28       ` dynamic-hz Lee Revell
  6 siblings, 1 reply; 126+ messages in thread
From: Alan Cox @ 2004-12-13 12:00 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrea Arcangeli, Pavel Machek, Linux Kernel Mailing List

On Sul, 2004-12-12 at 23:36, Con Kolivas wrote:
> The rest of my users that were setting Hz to 100 for so-called 
> performance gains were doing so under the false impression that cpu 
> usage was lower simply because of the woefully inaccurate cpu usage 
> calculation at 100Hz.

It makes a difference for some HPC workloads. I run 100Hz because
- It improves battery life
- Laptops tend to lose ticks on battery status queries at 1Khz


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:28               ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 12:43                 ` Pavel Machek
  2004-12-13 12:58                   ` dynamic-hz Andrea Arcangeli
  2004-12-13 14:50                 ` dynamic-hz Zwane Mwaikambo
  1 sibling, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 12:43 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > Well most x86(64) these days have local APICs and that provides a 
> > relatively inexpensive one shot timer mode.
> 
> I doubt a one shot is appropriate. The irq latency is variable and we
> won't be able to atomically read tsc and rearm the one-shot timer. The
> intermediate error will propagate over time.

But that does not matter, right? Yes, one-shot timer will not fire
exactly at right place, but as long as you are reading TSC and basing
next shot on current time, error should not accumulate.
								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.
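
[Illustration of the argument above: a toy model in C, not kernel code.
It assumes every one-shot is rearmed late by some interrupt latency, and
compares a clock that counts one fixed period per interrupt with a clock
that adds the delta of a free-running counter (standing in for the TSC).
The names simulate, struct drift and the microsecond units are made up
for this sketch. It deliberately ignores the non-atomic read-and-rearm
window raised earlier in the thread, so it shows only the best case.]

```c
#include <assert.h>
#include <stddef.h>

/* Error of each clock against true elapsed time, in microseconds. */
struct drift {
	long counting_error_us;	/* clock that adds one period per irq */
	long counter_error_us;	/* clock that adds the counter delta */
};

/*
 * Hypothetical model: the one-shot fires period_us after it is armed,
 * and the handler runs (and rearms) latency_us[i] later than that.
 */
static struct drift simulate(long period_us, const long *latency_us,
			     size_t n)
{
	long now = 0;		/* true time, as a free-running counter */
	long last_read = 0;	/* counter value seen by the previous irq */
	long tick_clock = 0;
	long counter_clock = 0;
	struct drift d;

	assert(period_us > 0);
	for (size_t i = 0; i < n; i++) {
		now += period_us + latency_us[i];  /* handler runs late */
		tick_clock += period_us;	   /* assumes no latency */
		counter_clock += now - last_read;  /* measured interval */
		last_read = now;
	}
	d.counting_error_us = now - tick_clock;
	d.counter_error_us = now - counter_clock;
	return d;
}
```

With latencies of 5, 12, 3 and 40us the counting clock ends up 60us
behind, the sum of all latencies, while the counter-based clock has no
error at the moment of the last read. Whether a real counter read is
accurate enough for that to hold is exactly what the thread disputes.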

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:22           ` dynamic-hz Pavel Machek
  2004-12-13 11:39             ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 12:51             ` Hans Kristian Rosbach
  2004-12-13 13:01               ` dynamic-hz Andrea Arcangeli
                                 ` (2 more replies)
  1 sibling, 3 replies; 126+ messages in thread
From: Hans Kristian Rosbach @ 2004-12-13 12:51 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, Con Kolivas, andrea, Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 12:22:29PM +0100, Pavel Machek wrote:
> I tried defining HZ to 10 once, and there are some #if arrays in the
> kernel that prevented me from doing that.
> 
> Some drivers do timeouts based on jiffies; having HZ=1 may turn 20msec
> timeout into 1sec, that could hurt a lot in the error case...

On Mon, Dec 13, 2004 at 03:25:21AM -0800, Andrew Morton wrote:
> We still have 1000-odd places which do things like
>        schedule_timeout(HZ/10);
> which will now involve a runtime divide.  The propagation of msleep()
> and ssleep() will reduce that a bit, but not much.

Shouldn't that be regarded as a bug/deprecated?

I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
do, but the parameter it gets in 1000hz is "100" so I assume this
is because we want to wait for 100ms, and in 1000hz that equals
100 cycles. Correct?

If so, I guess this calculation would fix that problem, but I guess
this is also what Andrew referred to as the extra runtime division?

wait-ms/(1000/hz) = hz-to-wait
100/(1000/1000) = 100 == 100ms
100/(1000/100)  = 10  == 100ms
100/(1000/50)   = 5   == 100ms

It would of course be optimized to something like this:
wait-ms/ms-per-hz

What about this:
At startup time we set a global variable based on hz:
varX = HZ/1000;

then in the rest of the code we can use ex:
schedule_timeout(varX*100) for 100ms no matter what hz is.

With hz=50 then the lowest ms is 20 for one tick though. And that
might trigger problems with approximation at some point.
varX would have to be decimal, and that might also be a problem?
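
[A minimal sketch of the conversion above in C, assuming the tick rate
is only known at runtime. ms_to_ticks is a hypothetical helper name for
this example, not an existing kernel function. Multiplying before
dividing avoids the fractional varX (HZ/1000 is 0 in integer math for
any HZ below 1000), and rounding up keeps a nonzero delay from
collapsing to zero ticks.]

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a delay in milliseconds into timer ticks
 * for a tick rate hz (ticks per second) chosen at runtime.
 */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	unsigned long long scaled;

	assert(hz > 0);
	/* 64-bit intermediate so ms * hz cannot overflow a 32-bit long */
	scaled = (unsigned long long)ms * hz;
	/* round up: a partial tick still has to be waited for */
	return (unsigned long)((scaled + 999) / 1000);
}
```

For the three cases listed above, ms_to_ticks(100, 1000), (100, 100)
and (100, 50) give 100, 10 and 5 ticks. A 10ms request at hz=50 rounds
up to one 20ms tick, which is the approximation problem mentioned above.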

I think that extremists will push the limits on these settings,
and that failures due to wrong timeouts or other similar things
would generate unwanted noise on LKML.

I think I'm just stating the obvious now, am I not?

-HK


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 12:43                 ` dynamic-hz Pavel Machek
@ 2004-12-13 12:58                   ` Andrea Arcangeli
  2004-12-13 19:12                     ` dynamic-hz Pavel Machek
  0 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 12:58 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi Pavel,

On Mon, Dec 13, 2004 at 01:43:13PM +0100, Pavel Machek wrote:
> But that does not matter, right? Yes, one-shot timer will not fire
> exactly at right place, but as long as you are reading TSC and basing
> next shot on current time, error should not accumulate.

As said in the rest of the message, the error (or some other error)
accumulates heavily today in the tick-loss compensation/adjustment
algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical about
using one-shots that have the very same problem as the tick-loss
adjustment algorithm. Admittedly the apic is faster to reprogram than
the pit is to read for delay_at_last_interrupt, but I'm still not at all
sure it will work fine. At least I'd first invest in finding out whether
the tick adjustment is totally malfunctioning because of a tangible real
bug, and not simply because it's unfixable (I've tried and failed to
find the real bug so far, so I'm starting to think it's unfixable if it
really is invoked as frequently as it is with the broken usb irq of my
adsl modem).

> [..] for their patent abuse against Java.

java isn't open source regardless of patents, use python instead ;).

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 12:51             ` dynamic-hz Hans Kristian Rosbach
@ 2004-12-13 13:01               ` Andrea Arcangeli
  2004-12-13 13:02                 ` dynamic-hz Andrea Arcangeli
  2004-12-13 15:06               ` dynamic-hz Geert Uytterhoeven
  2004-12-14  4:05               ` dynamic-hz Nish Aravamudan
  2 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 13:01 UTC (permalink / raw)
  To: Hans Kristian Rosbach
  Cc: Pavel Machek, Andrew Morton, Con Kolivas, Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 01:51:11PM +0100, Hans Kristian Rosbach wrote:
> then in the rest of the code we can use ex:
> schedule_timeout(varX*100) for 100ms no matter what hz is.

There's no real difference between a multiplication and a division,
and in either case it isn't worth optimizing such usage IMHO. I
believe the only real cost is the cacheline anyway.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 13:01               ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 13:02                 ` Andrea Arcangeli
  0 siblings, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 13:02 UTC (permalink / raw)
  To: Hans Kristian Rosbach
  Cc: Pavel Machek, Andrew Morton, Con Kolivas, Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 02:01:42PM +0100, Andrea Arcangeli wrote:
> believe the only real cost is the cacheline anyway.

[..] and in turn I guess by adding a second dynamic variable you just
doubled the only real cost ;)

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13  7:43       ` dynamic-hz Stefan Seyfried
@ 2004-12-13 13:58         ` Russell King
  2004-12-13 14:14           ` dynamic-hz Russell King
                             ` (3 more replies)
  2004-12-13 16:19         ` dynamic-hz Jan Engelhardt
  1 sibling, 4 replies; 126+ messages in thread
From: Russell King @ 2004-12-13 13:58 UTC (permalink / raw)
  To: Stefan Seyfried; +Cc: Con Kolivas, Pavel Machek, linux-kernel, Andrea Arcangeli

On Mon, Dec 13, 2004 at 08:43:55AM +0100, Stefan Seyfried wrote:
> Con Kolivas wrote:
> > Just being devils advocate here...
> > 
> > I had variable Hz in my tree for a while and found there was one 
> > solitary purpose to setting Hz to 100; to silence cheap capacitors.
> 
> power savings? Having the cpu wake up 1000 times per second if the
> machine is idle cannot be better than only waking it up 100 times.
> 
> Yes, i am always on the quest for the 5 extra minutes on battery :-)

This is an easy thing to grab hold of, but rather pointless in the
overall scheme of things.  Those of us who have done power usage
measurements know this already.

The only case where this really makes sense is where the CPU power
usage outweighs the power consumption of all other peripherals by
at least an order of magnitude such that the rest of the system is
insignificant compared to the CPU power.

Let's take an example.  Let's say that:
* a CPU runs at about 245mA when active
* 90mA when inactive
* the timer interrupt takes 2us to execute 1000 times a second
* no other processing is occurring

This means that the average current consumption is about:
	245mA * 2 * 10^-6 + 90mA * (1 - 2 * 10^-6) = 90.00031mA

This means that the timer interrupt has increased CPU power by
0.00034%.

Now, let's factor in the rest of a system.  Let's say the rest of the
system takes 84mA.  Recalculating (by increasing each figure by
84mA) gives us 174.00031mA, or an increase in overall system
power by about 0.00018%.

Assuming your battery normally lasts exactly 24 hours on a current
drain of 174.00031mA, completely eliminating the tick gives you
an extra 0.15 seconds battery life.

Note: the above CPU power consumption figures were taken from
the Intel PXA255 processor electrical specifications, and the
"rest of the system" current consumption taken from a real life
device.  The timer interrupt taking 2us is probably an over-
estimation.  Only the battery lifetime of 24 hours is fictitious.

And yes, from time to time I keep thinking that it would be nice
to eliminate the timer tick to save some power.  However, I've
never been able to justify the extra code complexity against the
power savings.  It really only makes sense if you can essentially
_power off_ your system until the next timer interrupt (thereby,
in the above example, reducing the power consumption by some 174mA)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 13:58         ` dynamic-hz Russell King
@ 2004-12-13 14:14           ` Russell King
  2004-12-13 14:52           ` dynamic-hz Alan Cox
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 126+ messages in thread
From: Russell King @ 2004-12-13 14:14 UTC (permalink / raw)
  To: Stefan Seyfried, Con Kolivas, Pavel Machek, linux-kernel,
	Andrea Arcangeli

On Mon, Dec 13, 2004 at 01:58:20PM +0000, Russell King wrote:
> On Mon, Dec 13, 2004 at 08:43:55AM +0100, Stefan Seyfried wrote:
> > Con Kolivas wrote:
> > > Just being devils advocate here...
> > > 
> > > I had variable Hz in my tree for a while and found there was one 
> > > solitary purpose to setting Hz to 100; to silence cheap capacitors.
> > 
> > power savings? Having the cpu wake up 1000 times per second if the
> > machine is idle cannot be better than only waking it up 100 times.
> > 
> > Yes, i am always on the quest for the 5 extra minutes on battery :-)
> 
> This is an easy thing to grab hold of, but rather pointless in the
> overall scheme of things.  Those of us who have done power usage
> measurements know this already.
> 
> The only case where this really makes sense is where the CPU power
> usage outweighs the power consumption of all other peripherals by
> at least an order of magnitude such that the rest of the system is
> insignificant compared to the CPU power.
> 
> Let's take an example.  Let's say that:
> * a CPU runs at about 245mA when active
> * 90mA when inactive
> * the timer interrupt takes 2us to execute 1000 times a second
> * no other processing is occurring
> 
> This means that the average current consumption is about:
> 	245mA * 2 * 10^-6 + 90mA * (1 - 2 * 10^-6) = 90.00031mA

Sorry, missed out the 1000 times a second.  Grumble.

	245mA * 1000 * 2 * 10^-6 + 90mA * (1 - 2 * 10^-6 * 1000) = 90.31mA

> This means that the timer interrupt has increased CPU power by
> 0.00034%.

0.34%

> Now, let's factor in the rest of a system.  Let's say the rest of the
> system takes 84mA.  Recalculating (by increasing each figure by
> 84mA) gives us 174.00031mA, or an increase in overall system

174.31mA

> power by about 0.00018%.

0.18%

> Assuming your battery normally lasts exactly 24 hours on a current
> drain of 174.00031mA, completely eliminating the tick gives you

174.31mA

> an extra 0.15 seconds battery life.

2mins 30secs
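
[The corrected arithmetic can be reproduced with a short C sketch. The
helper names avg_current_ma and extra_runtime_s are invented for this
illustration, and the model assumes a fixed battery capacity and a CPU
that is either fully active or fully idle.]

```c
#include <assert.h>

/*
 * Average current in mA for a CPU that is active for tick_us
 * microseconds on each of hz ticks per second and idle otherwise.
 */
static double avg_current_ma(double active_ma, double idle_ma,
			     double tick_us, double hz)
{
	double duty = tick_us * 1e-6 * hz;	/* fraction of time active */

	assert(duty >= 0.0 && duty <= 1.0);
	return active_ma * duty + idle_ma * (1.0 - duty);
}

/*
 * Extra runtime in seconds gained by dropping from with_tick_ma to
 * without_tick_ma, for a battery that lasts runtime_s at with_tick_ma.
 */
static double extra_runtime_s(double with_tick_ma, double without_tick_ma,
			      double runtime_s)
{
	assert(without_tick_ma > 0.0);
	return runtime_s * (with_tick_ma / without_tick_ma - 1.0);
}
```

avg_current_ma(245, 90, 2, 1000) gives 90.31mA. Adding the 84mA for the
rest of the system and calling extra_runtime_s(174.31, 174.0, 86400)
gives a bit over 150 seconds, in line with the roughly two and a half
minutes quoted above.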

> Note: the above CPU power consumption figures were taken from
> the Intel PXA255 processor electrical specifications, and the
> "rest of the system" current consumption taken from a real life
> device.  The timer interrupt taking 2us is probably an over-
> estimation.  Only the battery lifetime of 24 hours is fictitious.
> 
> And yes, from time to time I keep thinking that it would be nice
> to eliminate the timer tick to save some power.  However, I've
> never been able to justify the extra code complexity against the
> power savings.  It really only makes sense if you can essentially
> _power off_ your system until the next timer interrupt (thereby,
> in the above example, reducing the power consumption by some 174mA)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:19         ` dynamic-hz Hans Kristian Rosbach
  2004-12-13 11:22           ` dynamic-hz Pavel Machek
  2004-12-13 11:33           ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 14:38           ` Zwane Mwaikambo
  2 siblings, 0 replies; 126+ messages in thread
From: Zwane Mwaikambo @ 2004-12-13 14:38 UTC (permalink / raw)
  To: Hans Kristian Rosbach
  Cc: Andrew Morton, Con Kolivas, andrea, pavel, Linux Kernel Mailing List

On Mon, 13 Dec 2004, Hans Kristian Rosbach wrote:

> Is there any recommended lower bound setting?
> Would there be a point in recommending lower settings for desktops
> running only text consoles opposed to X desktops?

You could probably go as low as 50 without noticing anything on text only 
consoles.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:28               ` dynamic-hz Andrea Arcangeli
  2004-12-13 12:43                 ` dynamic-hz Pavel Machek
@ 2004-12-13 14:50                 ` Zwane Mwaikambo
  1 sibling, 0 replies; 126+ messages in thread
From: Zwane Mwaikambo @ 2004-12-13 14:50 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Con Kolivas, Pavel Machek, linux-kernel

On Mon, 13 Dec 2004, Andrea Arcangeli wrote:

> You were the one making the case of the NMI, the NMI will screw
> completely any attempt of rearming the TSC accurately (though I don't
> mind too much, like for the sti; hlt, since NMI is practically impossible
> to trigger in production, if an NMI is fired we've more troubles than the
> 1/HZ latency on a pending wakeup or on the system time taking the
> tangent ;)

I wouldn't say that NMI isn't used in production, if we didn't cater for 
NMI it'd be hard to do high sample rate profiling with Oprofile and 
dynamic-hz. I consider (non)kernel developers profiling code on systems as 
production use.

> (btw, my firewall systemtime will get fixed too by dynamic-hz HZ=100,
> it's pure waste to keep my firewall at HZ=1000 even if I didn't have
> constant irq-latency of 3/4msec [measured with rdtsc], though I didn't
> mention this yet because dynamic-hz in my firewall case would be a pure
> band-aid, even fixing the tick-lost adjustment would be a band-aid, the
> only thing to fix is the usb irq that runs for 3/4msec without returning).

I have a few personal systems which really would benefit too ;)

Thanks,
	Zwane


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 13:58         ` dynamic-hz Russell King
  2004-12-13 14:14           ` dynamic-hz Russell King
@ 2004-12-13 14:52           ` Alan Cox
  2004-12-13 16:23             ` dynamic-hz Russell King
  2004-12-14  0:16             ` dynamic-hz Eric St-Laurent
  2004-12-13 15:30           ` dynamic-hz Zwane Mwaikambo
  2004-12-13 16:06           ` dynamic-hz Pavel Machek
  3 siblings, 2 replies; 126+ messages in thread
From: Alan Cox @ 2004-12-13 14:52 UTC (permalink / raw)
  To: Russell King
  Cc: Stefan Seyfried, Con Kolivas, Pavel Machek,
	Linux Kernel Mailing List, Andrea Arcangeli

On Llu, 2004-12-13 at 13:58, Russell King wrote:
> Let's take an example.  Let's say that:
> * a CPU runs at about 245mA when active
> * 90mA when inactive
> * the timer interrupt takes 2us to execute 1000 times a second
> * no other processing is occurring

Now take a real laptop and the numbers are in the 20W (15A) range.

> to eliminate the timer tick to save some power.  However, I've
> never been able to justify the extra code complexity against the
> power savings.  It really only makes sense if you can essentially
> _power off_ your system until the next timer interrupt (thereby,
> in the above example, reducing the power consumption by some 174mA)

On a PC it makes huge sense, the deeply embedded folks who do turn the
thing off for 30secs at a time (Eg cellphone) also want it as do
virtualisation people where it trashes your scaling. API wise it isn't
too hard, it's just a matter of time to convert the jiffies users away
and to do relative versions of add_timer with accuracy info included.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 12:51             ` dynamic-hz Hans Kristian Rosbach
  2004-12-13 13:01               ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 15:06               ` Geert Uytterhoeven
  2004-12-13 16:12                 ` dynamic-hz Pavel Machek
  2004-12-14  4:05               ` dynamic-hz Nish Aravamudan
  2 siblings, 1 reply; 126+ messages in thread
From: Geert Uytterhoeven @ 2004-12-13 15:06 UTC (permalink / raw)
  To: Hans Kristian Rosbach
  Cc: Pavel Machek, Andrew Morton, Con Kolivas, andrea,
	Linux Kernel Mailing List

On Mon, 13 Dec 2004, Hans Kristian Rosbach wrote:
> I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> do, but the parameter it gets in 1000hz is "100" so I assume this
> is because we want to wait for 100ms, and in 1000hz that equals
> 100 cycles. Correct?

`schedule_timeout(HZ/x)' lets it wait for 1/x'th second.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 13:58         ` dynamic-hz Russell King
  2004-12-13 14:14           ` dynamic-hz Russell King
  2004-12-13 14:52           ` dynamic-hz Alan Cox
@ 2004-12-13 15:30           ` Zwane Mwaikambo
  2004-12-13 15:59             ` dynamic-hz Russell King
  2004-12-13 16:06           ` dynamic-hz Pavel Machek
  3 siblings, 1 reply; 126+ messages in thread
From: Zwane Mwaikambo @ 2004-12-13 15:30 UTC (permalink / raw)
  To: Russell King
  Cc: Stefan Seyfried, Con Kolivas, Pavel Machek, linux-kernel,
	Andrea Arcangeli

Hi Russell,

On Mon, 13 Dec 2004, Russell King wrote:

> This is an easy thing to grab hold of, but rather pointless in the
> overall scheme of things.  Those of us who have done power usage
> measurements know this already.
> 
> The only case where this really makes sense is where the CPU power
> usage outweighs the power consumption of all other peripherals by
> at least an order of magnitude such that the rest of the system is
> insignificant compared to the CPU power.
> 
> Note: the above CPU power consumption figures were taken from
> the Intel PXA255 processor electrical specifications, and the
> "rest of the system" current consumption taken from a real life
> device.  The timer interrupt taking 2us is probably an over-
> estimation.  Only the battery lifetime of 24 hours is fictitious.

While i do not disagree with your research and resultant conclusions for 
the PXA255 processor i think it may not be as representative of some of 
the target systems we're interested in, that is, x86 (cringe, cringe). A 
number of i386 systems enter model defined partial suspend states when 
execution of the hlt instruction takes place, resuming from these suspend 
states draws more power for a short period of time thus doing this every 
millisecond is going to be detrimental to total power consumption over 
time. But this isn't only an i386 trait as other desktop/workstation 
processors are similar.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 12:00       ` dynamic-hz Alan Cox
@ 2004-12-13 15:52         ` Andrea Arcangeli
  0 siblings, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 15:52 UTC (permalink / raw)
  To: Alan Cox; +Cc: Con Kolivas, Pavel Machek, Linux Kernel Mailing List

On Mon, Dec 13, 2004 at 12:00:44PM +0000, Alan Cox wrote:
> - Laptops tend to lose ticks on battery status queries at 1Khz

The lost-tick adjustment code should in theory cope with it, however in
my firewall with USB adsl modem taking 3msec-long-irqs, it makes the
system time go in the future pretty quick (instead of losing time
without tick compensation code). I guess the same would happen with the
battery status checks.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 15:30           ` dynamic-hz Zwane Mwaikambo
@ 2004-12-13 15:59             ` Russell King
  2004-12-13 16:14               ` dynamic-hz Pavel Machek
  0 siblings, 1 reply; 126+ messages in thread
From: Russell King @ 2004-12-13 15:59 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Stefan Seyfried, Con Kolivas, Pavel Machek, linux-kernel,
	Andrea Arcangeli

On Mon, Dec 13, 2004 at 08:30:51AM -0700, Zwane Mwaikambo wrote:
> Hi Russell,
> 
> On Mon, 13 Dec 2004, Russell King wrote:
> 
> > This is an easy thing to grab hold of, but rather pointless in the
> > overall scheme of things.  Those of us who have done power usage
> > measurements know this already.
> > 
> > The only case where this really makes sense is where the CPU power
> > usage outweighs the power consumption of all other peripherals by
> > at least an order of magnitude such that the rest of the system is
> > insignificant compared to the CPU power.
> > 
> > Note: the above CPU power consumption figures were taken from
> > the Intel PXA255 processor electrical specifications, and the
> > "rest of the system" current consumption taken from a real life
> > device.  The timer interrupt taking 2us is probably an over-
> > estimation.  Only the battery lifetime of 24 hours is fictitious.
> 
> While i do not disagree with your research and resultant conclusions for 
> the PXA255 processor i think it may not be as representative of some of 
> the target systems we're interested in, that is, x86 (cringe, cringe). A 
> number of i386 systems enter model defined partial suspend states when 
> execution of the hlt instruction takes place, resuming from these suspend 
> states draws more power for a short period of time thus doing this every 
> millisecond is going to be detrimental to total power consumption over 
> time. But this isn't only an i386 trait as other desktop/workstation 
> processors are similar.

I think you missed the emphasis of my mail - one on measurement to
validate if this technology actually buys you anything.

The second thing you missed is that drawing a lot of power for a short
time may result in a rather negligible reduction in the overall scheme
of things.  Until you measure it and do the calculation, you'll never
know.

I can make the same comments as you above about the PXA255 processor.
"The PXA255 has a special idle mode which reduces power consumption
via the use of a special instruction.  Resuming from this state to
service the timer interrupt results in more power being drawn for
a short period of time, thus doing this every millisecond is going
to be detrimental to the total power consumption over time."

See?  Measurements in reality give the true story.  Words are just
that - words - which may not reflect reality.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: dynamic-hz
  2004-12-13 13:58         ` dynamic-hz Russell King
                             ` (2 preceding siblings ...)
  2004-12-13 15:30           ` dynamic-hz Zwane Mwaikambo
@ 2004-12-13 16:06           ` Pavel Machek
  3 siblings, 0 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 16:06 UTC (permalink / raw)
  To: Stefan Seyfried, Con Kolivas, linux-kernel, Andrea Arcangeli

Hi!

> > > Just being devils advocate here...
> > > 
> > > I had variable Hz in my tree for a while and found there was one 
> > > solitary purpose to setting Hz to 100; to silence cheap capacitors.
> > 
> > power savings? Having the cpu wake up 1000 times per second if the
> > machine is idle cannot be better than only waking it up 100 times.
> > 
> > Yes, i am always on the quest for the 5 extra minutes on battery :-)
> 
> This is an easy thing to grab hold of, but rather pointless in the
> overall scheme of things.  Those of us who have done power usage
> measurements know this already.
> 
> The only case where this really makes sense is where the CPU power
> usage outweighs the power consumption of all other peripherals by
> at least an order of magnitude such that the rest of the system is
> insignificant compared to the CPU power.

Why by order of magnitude? Anyway on PC machines, cpu in low-power
mode takes about as much as rest of system, and in high-power mode it
takes more than rest of system combined.

I measured 1W savings from HZ=100, and that was on system that takes
17W total (arima athlon64 notebook). That is > 5%.

> Let's take an example.  Let's say that:
> * a CPU runs at about 245mA when active
> * 90mA when inactive
> * the timer interrupt takes 2us to execute 1000 times a second
> * no other processing is occurring

You assume that the cpu goes to sleep immediately. That is *very* far
from reality on at least pentium 4. It takes half a millisecond to
sleep/wakeup the cpu, which basically means that low power mode is
never entered with HZ=1000...
								Pavel
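
A back-of-the-envelope sketch of the duty-cycle arithmetic being argued
here, in C. It uses the 245mA/90mA/2us figures from the quoted example.
The per-wakeup transition cost is an assumed parameter (transition_us and
the function name avg_current_ua are made up for illustration), and
charging the whole transition at the active current is a simplification,
not a measured model.

```c
#include <stdint.h>

/*
 * Average CPU current in microamps for an otherwise idle system.
 * Hypothetical model: each timer tick keeps the CPU at active current
 * for handler_us plus transition_us (assumed sleep/wakeup overhead).
 */
uint64_t avg_current_ua(uint64_t active_ua, uint64_t idle_ua,
			uint64_t hz, uint64_t handler_us,
			uint64_t transition_us)
{
	uint64_t busy_us = hz * (handler_us + transition_us);

	if (busy_us > 1000000)
		busy_us = 1000000;	/* the CPU never gets to sleep */
	return idle_ua + (active_ua - idle_ua) * busy_us / 1000000;
}
```

With no transition cost, avg_current_ua(245000, 90000, 1000, 2, 0) gives
90310, so the tick adds about 0.3mA. With an assumed 500us per wakeup,
HZ=1000 gives 167810 and HZ=100 gives 97781. In this model the result
depends almost entirely on the transition cost, which is the quantity
that has to be measured on real hardware.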

-- 
Boycott Kodak -- for their patent abuse against Java.

* Re: dynamic-hz
  2004-12-13 15:06               ` dynamic-hz Geert Uytterhoeven
@ 2004-12-13 16:12                 ` Pavel Machek
  2004-12-13 16:14                   ` dynamic-hz Geert Uytterhoeven
  2004-12-14  4:06                   ` dynamic-hz Nish Aravamudan
  0 siblings, 2 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 16:12 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Hans Kristian Rosbach, Andrew Morton, Con Kolivas, andrea,
	Linux Kernel Mailing List

Hi!

> > I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> > do, but the parameter it gets in 1000hz is "100" so I assume this
> > is because we want to wait for 100ms, and in 1000hz that equals
> > 100 cycles. Correct?
> 
> `schedule_timeout(HZ/x)' lets it wait for 1/x'th second.

...small problem is that for HZ lower than x it does not wait at all
:-(.
								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.

* Re: dynamic-hz
  2004-12-13 15:59             ` dynamic-hz Russell King
@ 2004-12-13 16:14               ` Pavel Machek
  0 siblings, 0 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 16:14 UTC (permalink / raw)
  To: Zwane Mwaikambo, Stefan Seyfried, Con Kolivas, linux-kernel,
	Andrea Arcangeli

Hi!

> See?  Measurements in reality give the true story.  Words are just
> that - words - which may not reflect reality.

On athlon64 notebook, HZ=1000 takes 1W of power. That's 5% of overall
power consumption, and as much as spinning disk takes. I'd say that's
rather significant.
								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.

* Re: dynamic-hz
  2004-12-13 16:12                 ` dynamic-hz Pavel Machek
@ 2004-12-13 16:14                   ` Geert Uytterhoeven
  2004-12-14  4:06                   ` dynamic-hz Nish Aravamudan
  1 sibling, 0 replies; 126+ messages in thread
From: Geert Uytterhoeven @ 2004-12-13 16:14 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Hans Kristian Rosbach, Andrew Morton, Con Kolivas, andrea,
	Linux Kernel Mailing List

On Mon, 13 Dec 2004, Pavel Machek wrote:
> > > I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> > > do, but the parameter it gets in 1000hz is "100" so I assume this
> > > is because we want to wait for 100ms, and in 1000hz that equals
> > > 100 cycles. Correct?
> > 
> > `schedule_timeout(HZ/x)' lets it wait for 1/x'th second.
> 
> ...small problem is that for HZ lower than x it does not wait at all
> :-(.

I know. You better use `(HZ+x-1)/x' for delays.
Integer division can be tricky :-)
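
A minimal sketch of the two roundings, in C. The helper names ticks_trunc
and ticks_ceil are invented for illustration; only the expressions HZ/x
and (HZ+x-1)/x come from the discussion above.

```c
/* Ticks for 1/x of a second, truncating: yields 0 when hz < x. */
unsigned long ticks_trunc(unsigned long hz, unsigned long x)
{
	return hz / x;
}

/* Same delay rounded up, so it is at least one tick whenever hz >= 1. */
unsigned long ticks_ceil(unsigned long hz, unsigned long x)
{
	return (hz + x - 1) / x;
}
```

For example, with hz=80 and x=100 the truncating form gives 0 ticks (no
wait at all) while the rounded-up form gives 1. When hz is an exact
multiple of x, such as hz=1000 and x=10, both give the same result.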

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

* Re: dynamic-hz
  2004-12-13  7:43       ` dynamic-hz Stefan Seyfried
  2004-12-13 13:58         ` dynamic-hz Russell King
@ 2004-12-13 16:19         ` Jan Engelhardt
  1 sibling, 0 replies; 126+ messages in thread
From: Jan Engelhardt @ 2004-12-13 16:19 UTC (permalink / raw)
  To: Stefan Seyfried; +Cc: Con Kolivas, Pavel Machek, linux-kernel, Andrea Arcangeli

>> Just being devils advocate here...
>> 
>> I had variable Hz in my tree for a while and found there was one 
>> solitary purpose to setting Hz to 100; to silence cheap capacitors.
>
>power savings? Having the cpu wake up 1000 times per second if the
>machine is idle cannot be better than only waking it up 100 times.

That's like saying waking up the CPU just to perform a HLT operation would be 
useless.


Jan Engelhardt
-- 
ENOSPC

* Re: dynamic-hz
  2004-12-13 14:52           ` dynamic-hz Alan Cox
@ 2004-12-13 16:23             ` Russell King
  2004-12-13 17:53               ` dynamic-hz Michael Buesch
                                 ` (2 more replies)
  2004-12-14  0:16             ` dynamic-hz Eric St-Laurent
  1 sibling, 3 replies; 126+ messages in thread
From: Russell King @ 2004-12-13 16:23 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stefan Seyfried, Con Kolivas, Pavel Machek,
	Linux Kernel Mailing List, Andrea Arcangeli

On Mon, Dec 13, 2004 at 02:52:46PM +0000, Alan Cox wrote:
> On Llu, 2004-12-13 at 13:58, Russell King wrote:
> > Let's take an example.  Let's say that:
> > * a CPU runs at about 245mA when active
> > * 90mA when inactive
> > * the timer interrupt takes 2us to execute 1000 times a second
> > * no other processing is occurring
> 
> Now take a real laptop and the numbers are in the 20W (15A) range.

Roughly 650mA for my laptop while idle or just under 7W - by calculation
from battery capacity and measured lifetime.  The question is how much
of that is due to the CPU itself and how much is due to the peripherals.

> > to eliminate the timer tick to save some power.  However, I've
> > never been able to justify the extra code complexity against the
> > power savings.  It really only makes sense if you can essentially
> > _power off_ your system until the next timer interrupt (thereby,
> > in the above example, reducing the power consumption by some 174mA)
> 
> On a PC it makes huge sense, the deeply embedded folks who do turn the
> thing off for 30secs at a time (Eg cellphone) also want it as do
> virtualisation people where it trashes your scaling. API wise it isn't
> too hard, its just a matter of time to convert the jiffies users away
> and to do relative versions of add_timer with accuracy info included.

I don't disagree with your cellphone example - it makes a whole lot of
sense there, where the device is going to end up in someones pocket
not doing very much at all.


There is another twist here though - the Linux kernel kicks itself out
of idle mode and into some other thread multiple times a second while
the system is idle.  So far, in all my Linux kernel experience, I've
yet to see a kernel where it's possible to stay in the idle thread
for more than half a second.  (The ARM kernels I run are always
configured with IDLE LED support, so I can _see_ when it gets kicked
out of the idle thread.)

So, not only do the VST people need to solve the HZ interrupt problem,
but also need to track down which kernel threads keep waking up on an
otherwise idle system "just in case" they need to do something.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: dynamic-hz
  2004-12-13 16:23             ` dynamic-hz Russell King
@ 2004-12-13 17:53               ` Michael Buesch
  2004-12-13 18:04                 ` dynamic-hz Russell King
  2004-12-13 19:04               ` dynamic-hz Pavel Machek
  2004-12-13 20:11               ` dynamic-hz Russell King
  2 siblings, 1 reply; 126+ messages in thread
From: Michael Buesch @ 2004-12-13 17:53 UTC (permalink / raw)
  To: Russell King
  Cc: Alan Cox, Stefan Seyfried, Con Kolivas, Pavel Machek,
	Linux Kernel Mailing List, Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 508 bytes --]

Quoting Russell King <rmk+lkml@arm.linux.org.uk>:
> the system is idle.  So far, in all my Linux kernel experience, I've
> yet to see a kernel where it's possible to stay in the idle thread
> for more than half a second.  (The ARM kernels I run are always
> configured with IDLE LED support, so I can _see_ when it gets kicked
> out of the idle thread.)

I guess IDLE LED support is not in mainline kernel, is it?
Where can I get it?

-- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]



[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: dynamic-hz
  2004-12-13 17:53               ` dynamic-hz Michael Buesch
@ 2004-12-13 18:04                 ` Russell King
  0 siblings, 0 replies; 126+ messages in thread
From: Russell King @ 2004-12-13 18:04 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Alan Cox, Stefan Seyfried, Con Kolivas, Pavel Machek,
	Linux Kernel Mailing List, Andrea Arcangeli

On Mon, Dec 13, 2004 at 06:53:40PM +0100, Michael Buesch wrote:
> Quoting Russell King <rmk+lkml@arm.linux.org.uk>:
> > the system is idle.  So far, in all my Linux kernel experience, I've
> > yet to see a kernel where it's possible to stay in the idle thread
> > for more than half a second.  (The ARM kernels I run are always
> > configured with IDLE LED support, so I can _see_ when it gets kicked
> > out of the idle thread.)
> 
> I guess IDLE LED support is not in mainline kernel, is it?
> Where can I get it?

It's an ARM only thing, and it's in mainline kernels for ARM platforms
which have general purpose LEDs available.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: dynamic-hz
  2004-12-13 16:23             ` dynamic-hz Russell King
  2004-12-13 17:53               ` dynamic-hz Michael Buesch
@ 2004-12-13 19:04               ` Pavel Machek
  2004-12-13 20:11               ` dynamic-hz Russell King
  2 siblings, 0 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 19:04 UTC (permalink / raw)
  To: Alan Cox, Stefan Seyfried, Con Kolivas,
	Linux Kernel Mailing List, Andrea Arcangeli

Hi!

> > > Let's take an example.  Let's say that:
> > > * a CPU runs at about 245mA when active
> > > * 90mA when inactive
> > > * the timer interrupt takes 2us to execute 1000 times a second
> > > * no other processing is occurring
> > 
> > Now take a real laptop and the numbers are in the 20W (15A) range.
> 
> Roughly 650mA for my laptop while idle or just under 7W - by calculation
> from battery capacity and measured lifetime.  The question is how much
> of that is due to the CPU itself and how much is due to the
> peripherals.

On Arima notebook here, whole machine takes 28W on 800MHz, idle
(measured using external power meter), 33W with max backlight, and 68W
at 2GHz, computing. Going to HZ=100 makes it one Watt less. I'd say
that's quite significant. [17W number was based on internal power
meter when running on battery].

So yes, CPU *is* taking more than all other peripherals.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

* Re: dynamic-hz
  2004-12-13 12:58                   ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 19:12                     ` Pavel Machek
  2004-12-13 20:34                       ` dynamic-hz john stultz
  2004-12-14  2:36                       ` dynamic-hz Andrea Arcangeli
  0 siblings, 2 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 19:12 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > But that does not matter, right? Yes, one-shot timer will not fire
> > exactly at right place, but as long as you are reading TSC and basing
> > next shot on current time, error should not accumulate.
> 
> As said in the rest of the message, the error (or some other error)
> accumulates heavily today in the tick-loss compensation/adjustment
> algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> about

I do not see how it should accumulate. Lets have working TSC. You want
to emulate fixed-period timer with single-shot timer.

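long long should_fire_at;	/* absolute TSC value of the next tick */

void handle_single_shot(void)
{
	long long delay;
retry:
	/* advance the deadline by one tick period, measured in TSC cycles */
	should_fire_at += loops_per_second / HZ;
	delay = should_fire_at - get_tsc();
	if (delay < 0)
		goto retry;	/* deadline already passed, skip ahead */
	do_single_shot_in(delay);
}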

I'm not sure what's broken with compensation code, but using
single-shot timer is not broken in theory.
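
To illustrate the claim that the error does not accumulate, here is a
small user-space model of the pseudocode above, in C. It is only a sketch:
the struct and function names are hypothetical, `now` stands in for a TSC
read, and it additionally counts one tick per period skipped, which the
pseudocode does not spell out.

```c
#include <stdint.h>

struct oneshot {
	uint64_t should_fire_at;	/* absolute deadline of the next tick */
	uint64_t period;		/* cycles per tick */
	uint64_t ticks;			/* ticks accounted so far */
};

/*
 * Called when the one-shot fires at time now (at or after the deadline).
 * Accounts for every period that has elapsed and returns the delay to
 * program for the next shot, always relative to the absolute deadline.
 */
uint64_t oneshot_handle(struct oneshot *t, uint64_t now)
{
	do {
		t->should_fire_at += t->period;
		t->ticks++;
	} while (t->should_fire_at <= now);
	return t->should_fire_at - now;
}
```

Starting with a deadline of 1000 and a period of 1000, a shot handled 3
cycles late at now=1003 returns a delay of 997, and a shot handled very
late at now=5500 returns 500 with ticks equal to 5. The lateness of one
shot shortens the next delay instead of pushing every later deadline back.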

> > [..] for their patent abuse against Java.
> 
> java isn't open source regardless of patents, use python instead ;).

Yes, java is bad, but using patents against it is evil, too. Plus
python does not yet run on my cellphone ;-).
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

* Re: dynamic-hz
  2004-12-13 11:08             ` dynamic-hz Andrea Arcangeli
@ 2004-12-13 19:36               ` john stultz
  0 siblings, 0 replies; 126+ messages in thread
From: john stultz @ 2004-12-13 19:36 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Pavel Machek, Con Kolivas, lkml

On Mon, 2004-12-13 at 03:08, Andrea Arcangeli wrote:
> On Mon, Dec 13, 2004 at 11:43:21AM +0100, Pavel Machek wrote:
> > Doing lot less per timer tick is not going to help much... You cpu
> 
> I also doubt we can do significantly less per timer tick. 

Well, I'd like to see the timeofday timekeeping work reduced so we don't do
it every tick. Instead it would become a scheduled event that goes off
every second or so.

thanks
-john


* Re: dynamic-hz
  2004-12-13 16:23             ` dynamic-hz Russell King
  2004-12-13 17:53               ` dynamic-hz Michael Buesch
  2004-12-13 19:04               ` dynamic-hz Pavel Machek
@ 2004-12-13 20:11               ` Russell King
  2 siblings, 0 replies; 126+ messages in thread
From: Russell King @ 2004-12-13 20:11 UTC (permalink / raw)
  To: Alan Cox, Stefan Seyfried, Con Kolivas, Pavel Machek,
	Linux Kernel Mailing List, Andrea Arcangeli

On Mon, Dec 13, 2004 at 04:23:55PM +0000, Russell King wrote:
> There is another twist here though - the Linux kernel kicks itself out
> of idle mode and into some other thread multiple times a second while
> the system is idle.  So far, in all my Linux kernel experience, I've
> yet to see a kernel where it's possible to stay in the idle thread
> for more than half a second.  (The ARM kernels I run are always
> configured with IDLE LED support, so I can _see_ when it gets kicked
> out of the idle thread.)

For further information only, analysing this further, we keep switching
to the events/0 thread, and it seems to be mainly for:

  - cursor handling every 200ms
  - slab cache reaping about every 2s

The cursor timer is firing all the time that you have a fbcon console
registered, whether or not the cursor should be displayed.  Someone
looking to save power should probably tackle this such that the cursor
timer doesn't needlessly fire.

But I guess the cellphone people would be more interested in this
problem than the big iron desktop-breaking in-need-of-three-phase-supply
boxen. 8)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

* Re: dynamic-hz
  2004-12-11 14:23 dynamic-hz Andrea Arcangeli
                   ` (2 preceding siblings ...)
  2004-12-12 16:35 ` dynamic-hz Pavel Machek
@ 2004-12-13 20:26 ` Olaf Hering
  2004-12-13 22:41   ` dynamic-hz Andrea Arcangeli
  2004-12-13 20:56 ` dynamic-hz john stultz
  4 siblings, 1 reply; 126+ messages in thread
From: Olaf Hering @ 2004-12-13 20:26 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

 On Sat, Dec 11, Andrea Arcangeli wrote:

> Comments welcome thanks.

Not a comment, more a question:

Will there be a real benefit by running an old PII 200MMX at 100HZ
instead of 1000HZ?
I guess fewer interrupts should improve the desktop performance a little bit.

* Re: dynamic-hz
  2004-12-13 19:12                     ` dynamic-hz Pavel Machek
@ 2004-12-13 20:34                       ` john stultz
  2004-12-13 20:49                         ` dynamic-hz Pavel Machek
  2004-12-14  2:46                         ` dynamic-hz Andrea Arcangeli
  2004-12-14  2:36                       ` dynamic-hz Andrea Arcangeli
  1 sibling, 2 replies; 126+ messages in thread
From: john stultz @ 2004-12-13 20:34 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

On Mon, 2004-12-13 at 11:12, Pavel Machek wrote:
> Hi!
> 
> > > But that does not matter, right? Yes, one-shot timer will not fire
> > > exactly at right place, but as long as you are reading TSC and basing
> > > next shot on current time, error should not accumulate.
> > 
> > As said in the rest of the message, the error (or some other error)
> > accumulates heavily today in the tick-loss compensation/adjustment
> > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> > about
> 
> I do not see how it should accumulate. Lets have working TSC. You want
> to emulate fixed-period timer with single-shot timer.

It's caused by the fact that we don't use the TSC to accumulate time.
We are instead interpolating between timer ticks and the TSC, where the
timer tick is what really accumulates time, and the TSC is used for
inter-tick time keeping (with the exception of the lost tick
compensation code).

Unfortunately interrupt delay and queueing can cause situations where a
tick appears to be lost, but then immediately after a second one
appears. In this case we add two, compensating for the loss, and then
add one more.
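
A simplified model of that over-count, in C. This is a sketch under
stated assumptions, not the code in timer_tsc.c: the struct, the function
name and the exact compensation rule (add lost-1 when two or more periods
have passed since the previous interrupt) are illustrative.

```c
#include <stdint.h>

struct tickclock {
	uint64_t last_tsc;	/* TSC value at the previous interrupt */
	uint64_t period;	/* TSC cycles per tick */
	uint64_t jiffies;	/* ticks accounted so far */
};

void tick_interrupt(struct tickclock *c, uint64_t now)
{
	uint64_t lost = (now - c->last_tsc) / c->period;

	if (lost >= 2)
		c->jiffies += lost - 1;	/* compensate for ticks believed lost */
	c->jiffies += 1;		/* the tick this interrupt stands for */
	c->last_tsc = now;
}
```

With a period of 1000, suppose the first tick is delayed until TSC 2100
while the second one is queued behind it and delivered at 2150. The first
call sees two periods elapsed and counts 2; the second call counts 1 more.
That makes 3 jiffies for 2150 cycles of real time, one more than the 2
that have actually elapsed.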

One could try to catch these early-seeming ticks w/ similar compensation
code, but due to TSC calibration error there are sure to be holes where
more time inconsistencies could poke through. 

My feeling is that we need to stop interpolating and just trust one time
source (ie: the TSC or ACPIPM or HPET or whatever). Check out my
timeofday patches for more details.

thanks
-john




* Re: dynamic-hz
  2004-12-13 20:34                       ` dynamic-hz john stultz
@ 2004-12-13 20:49                         ` Pavel Machek
  2004-12-14  2:04                           ` dynamic-hz Andrea Arcangeli
       [not found]                           ` <20041214013924.GB14617@atomide.com>
  2004-12-14  2:46                         ` dynamic-hz Andrea Arcangeli
  1 sibling, 2 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-13 20:49 UTC (permalink / raw)
  To: john stultz; +Cc: Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

Hi!

> > > > But that does not matter, right? Yes, one-shot timer will not fire
> > > > exactly at right place, but as long as you are reading TSC and basing
> > > > next shot on current time, error should not accumulate.
> > > 
> > > As said in the rest of the message, the error (or some other error)
> > > accumulates heavily today in the tick-loss compensation/adjustment
> > > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> > > about
> > 
> > I do not see how it should accumulate. Lets have working TSC. You want
> > to emulate fixed-period timer with single-shot timer.
> 
> It's caused by the fact that we don't use the TSC to accumulate time.
> We are instead interpolating between timer ticks and the TSC, where

Yes, it was supposed to be simple, so that Andrea understands that
there's nothing inherently broken with single-shot timers.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

* Re: dynamic-hz
  2004-12-11 14:23 dynamic-hz Andrea Arcangeli
                   ` (3 preceding siblings ...)
  2004-12-13 20:26 ` dynamic-hz Olaf Hering
@ 2004-12-13 20:56 ` john stultz
  2004-12-13 22:21   ` dynamic-hz Andrea Arcangeli
  4 siblings, 1 reply; 126+ messages in thread
From: john stultz @ 2004-12-13 20:56 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: lkml

On Sat, 2004-12-11 at 06:23, Andrea Arcangeli wrote:
> This patch is quite intrusive since many HZ visible to userspace have to
> be converted to USER_HZ, and most important because HZ isn't available
> at compile time anymore and every variable in function of HZ must be
> either changed to be in function of USER_HZ or it must be initialized at
> runtime. The code has debugging code (optional at compile time) so that
> I can guarantee that there cannot be any regression.

Interesting patch, I know some folks have been asking about HZ=10k
recently, so this could help. 

The only bit that worries me a bit is the change from HZ->USER_HZ for
internal calculations. In my mind, USER_HZ should only be used for
converting internal system ticks to userspace-visible ticks. Changing
drivers to think about things in user-ticks confuses things a bit since
suddenly some kernel code is thinking in user-ticks and others in
system-ticks. It just muddles things a bit.

thanks
-john




* Re: dynamic-hz
  2004-12-13 20:56 ` dynamic-hz john stultz
@ 2004-12-13 22:21   ` Andrea Arcangeli
  0 siblings, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 22:21 UTC (permalink / raw)
  To: john stultz; +Cc: lkml

On Mon, Dec 13, 2004 at 12:56:29PM -0800, john stultz wrote:
> Interesting patch, I know some folks have been asking about HZ=10k
> recently, so this could help. 

Yes, they only need to pass HZ=10000 to the boot command line to make it
work with 2.4.

> The only bit that worries me a bit is the change from HZ->USER_HZ for
> internal calculations. In my mind, USER_HZ should only be used for
> converting internal system ticks to userspace-visible ticks. Changing
> drivers to think about things in user-ticks confuses things a bit since
> suddenly some kernel code is thinking in user-ticks and others in
> system-ticks. It just muddles things a bit.

I tried to make the smallest possible change to make the thing work,
even if that sometimes meant thinking in user hz. The user_to_kernel_hz
helper function converts back into kernel hz.

* Re: dynamic-hz
  2004-12-13 20:26 ` dynamic-hz Olaf Hering
@ 2004-12-13 22:41   ` Andrea Arcangeli
  0 siblings, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-13 22:41 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel

On Mon, Dec 13, 2004 at 09:26:42PM +0100, Olaf Hering wrote:
> Not a comment, more a question:
> 
> Will there be a real benefit by running an old PII 200MMX at 100HZ
> instead of 1000HZ?
> I guess fewer interrupts should improve the desktop performance a little bit.

On a pii the slowdown is probably more than 1%; the slower the cpu, the
more appropriate 100hz is. This is not going to be very noticeable on a
desktop since a desktop is often idle; it should mainly help on servers
(for example kernel compiles will be more than 1% faster).

* Re: dynamic-hz
  2004-12-13 14:52           ` dynamic-hz Alan Cox
  2004-12-13 16:23             ` dynamic-hz Russell King
@ 2004-12-14  0:16             ` Eric St-Laurent
  2004-12-15 18:04               ` dynamic-hz Alan Cox
  1 sibling, 1 reply; 126+ messages in thread
From: Eric St-Laurent @ 2004-12-14  0:16 UTC (permalink / raw)
  To: Alan Cox
  Cc: Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek,
	Linux Kernel Mailing List, Andrea Arcangeli

On Mon, 2004-12-13 at 14:52 +0000, Alan Cox wrote:

> On a PC it makes huge sense, the deeply embedded folks who do turn the
> thing off for 30secs at a time (Eg cellphone) also want it as do
> virtualisation people where it trashes your scaling. API wise it isn't
> too hard, its just a matter of time to convert the jiffies users away
> and to do relative versions of add_timer with accuracy info included.

Alan,

On a related subject, a few months ago you posted a patch which added a
nice add_timeout()/timeout_pending() API and converted many (if not
most) drivers to use it.

If I remember correctly it did not generate many comments and the work
was not pushed into mainline.

I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...)
construct is horrible.

Any chance to resurrect this work ?

PS: the original subject was "Initial bits to help pull jiffies out of
drivers"

Best regards,

Eric St-Laurent



* Re: dynamic-hz
  2004-12-13 20:49                         ` dynamic-hz Pavel Machek
@ 2004-12-14  2:04                           ` Andrea Arcangeli
       [not found]                           ` <20041214013924.GB14617@atomide.com>
  1 sibling, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-14  2:04 UTC (permalink / raw)
  To: Pavel Machek; +Cc: john stultz, Zwane Mwaikambo, Con Kolivas, lkml

On Mon, Dec 13, 2004 at 09:49:33PM +0100, Pavel Machek wrote:
> Yes, it was supposed to be simple, so that Andrea understands that
> there's nothing inherently broken with single-shot timers.

Single shot timer is unusable for system time accounting, at least as
long as you want to allow nmi. This is a tangible fact, no matter how
simple the example is.

Even the lost tick compensation is not working at all, and it has the
same issues that the one-shot timer has in keeping the system time
accurate.

Pavel, write a program to do iopl(2) cli() wait 3msec; sti() wait 3msec
cli() wait 3msec in a loop. Then watch your system time go in the future
at a rate of a few minutes per hour, then fix it. After you've fixed it
you'll get my attention about one-shot timer again ;). I already tried to
fix it and failed so far since I can't see bugs in the current code.
(actually I fixed it by breaking the code, and dropping some ticks
somewhere)

* Re: dynamic-hz
  2004-12-13 19:12                     ` dynamic-hz Pavel Machek
  2004-12-13 20:34                       ` dynamic-hz john stultz
@ 2004-12-14  2:36                       ` Andrea Arcangeli
  2004-12-14  9:39                         ` dynamic-hz Pavel Machek
  2004-12-14  9:59                         ` dynamic-hz Pavel Machek
  1 sibling, 2 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-14  2:36 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

On Mon, Dec 13, 2004 at 08:12:49PM +0100, Pavel Machek wrote:
> Hi!
> 
> > > But that does not matter, right? Yes, one-shot timer will not fire
> > > exactly at right place, but as long as you are reading TSC and basing
> > > next shot on current time, error should not accumulate.
> > 
> > As said in the rest of the message, the error (or some other error)
> > accumulates heavily today in the tick-loss compensation/adjustment
> > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> > about
> 
> I do not see how it should accumulate. Lets have working TSC. You want
> to emulate fixed-period timer with single-shot timer.
> 
> int should_fire_at;
> 
> void handle_single_shot()
> {
> 	int delay;
> retry:
> 	should_fire_at += loops_per_second/HZ;
> 	delay = should_fire_at - get_tsc();
> 	if (delay < 0)
> 		goto retry;

Here you get a 10-minute-long NMI and you're automatically 10 minutes in
the past (or your event gets a 10 sec introduced delay) without a way to
track it down.

Now in theory we might run this critical section into some special
section and we could restart it by updating regs->eip before returning
from the nmi. But that still leaves the unfixable window introduced by
the cpu not executing the tsc read and the do_single_shot_in atomically.

Given my recent experience with the lost tick compensation code that has
exactly the same window, I'm not optimistic it's going to keep the system
time uptodate accurately. Perhaps the apic is a lot faster and the error
won't propagate visibly. I'm not against trying but the thing about the
one-shot timer system time accuracy is a lot more complicated than this
pseudocode, and it's not obvious at all that your pseudocode will work.

> 	do_single_shot_in(delay);

The only other way would be to use the 64bit tsc as the only source for
the system time (perhaps that's what you mean with the above
pseudocode?). But the calibration code would need changes to allow that.
Today we use the calibration divisor only in a small range so the
calibration can be quick and this way changing CPU frequency to the cpu
is also easier.

Even before thinking at using the one-shot timer, I would like to
fix the lost tick compensation of current production 2.6.9, only then we
can talk about tickless by using a one-shot timer. If we can't do the
lost-tick compensation without screwing the system time, I don't see how
we can do the one-shot timer without screwing the system time.

The lost tick compensation as well could be avoided if we use the TSC as
the source for gettimeofday and we ignore the PIT completely and we use
the PIT only to wakeup the cpu once in a while. *Then* we could convert
the PIT to a one-shot timer trivially too, but as said above the
accuracy of the divisor isn't enough and I've no idea if we can get a
real calibration that lasts more than a few hours.

My fast_gettimeoffset_quotient is set to 0x2f0271, which means the least
significant bit of the fast_gettimeoffset_quotient will show up in the
least significant bit of gettimeofday after the tsc has counted 2**32 ticks,
that is, after less than 4 seconds on my computers at >1ghz. That's
why the gettimeoffset will never return anything longer than 1/HZ, so
the error cannot propagate to userspace.
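
The figures above can be checked with a small sketch. It assumes the
quotient is 2^32 divided by the number of TSC cycles per microsecond, and
that the offset in microseconds is the high 32 bits of the 32x32 product.
That assumption is consistent with the 2**32 figure above but is not taken
from the patch, and the function names are made up for this example.

```c
#include <stdint.h>

/* Assumed encoding: quotient = 2^32 / (TSC cycles per microsecond). */
static uint32_t quotient_to_mhz(uint32_t quotient)
{
	return (uint32_t)((UINT64_C(1) << 32) / quotient);
}

/* Multiply the 32-bit TSC delta by the quotient and keep the high word. */
static uint32_t tsc_delta_to_usec(uint32_t tsc_delta, uint32_t quotient)
{
	return (uint32_t)(((uint64_t)tsc_delta * quotient) >> 32);
}
```

Under that assumption 0x2f0271 corresponds to roughly 1394 MHz, so 2**32
cycles take about 3.1 seconds. Changing the quotient by one moves the result
by one microsecond only when the delta is close to 2**32 cycles.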

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 20:34                       ` dynamic-hz john stultz
  2004-12-13 20:49                         ` dynamic-hz Pavel Machek
@ 2004-12-14  2:46                         ` Andrea Arcangeli
  2004-12-14 19:24                           ` dynamic-hz john stultz
  1 sibling, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-14  2:46 UTC (permalink / raw)
  To: john stultz; +Cc: Pavel Machek, Zwane Mwaikambo, Con Kolivas, lkml

On Mon, Dec 13, 2004 at 12:34:00PM -0800, john stultz wrote:
> source (ie: the TSC or ACPIPM or HPET or whatever). Check out my

How long is the TSC calibration going to last before introducing visible
errors? Is there any error introduced while we transfer the accuracy of
the PIT to the accuracy of the TSC during calibration? It would be much
simpler to only use the TSC to provide system time, but I assume we
would be already doing it, if it wasn't for the lost accuracy.

Plus, are you already handling the cpufreq being changed every second by
powersaved? Doesn't that introduce further inaccuracy in the system
time?

As for the lost-tick compensation, it's not working at all, my system
goes as fast in the future as it would go in the past by disabling it.
So the only effect I get by the lost tick compensation is that it's
moving in the future instead of in the past, but the magnitude of the
error is the same and in turn it's not working at all. The real bug is
the USB irq handler that takes 3/4msec to execute and I get a constant
load of those irqs from the adsl modem.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:25           ` dynamic-hz Andrew Morton
  2004-12-13 11:47             ` dynamic-hz Andrea Arcangeli
@ 2004-12-14  3:54             ` Nish Aravamudan
  2004-12-14  4:29               ` dynamic-hz Andrew Morton
                                 ` (2 more replies)
  1 sibling, 3 replies; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14  3:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrea Arcangeli, kernel, pavel, linux-kernel

On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote:
> Andrea Arcangeli <andrea@suse.de> wrote:
> >
> > The patch only does HZ at dynamic time. But of course it's absolutely
> >  trivial to define it at compile time, it's probably a 3 liner on top of
> >  my current patch ;). However personally I don't think the three liner
> >  will worth the few seconds more spent configuring the kernel ;).
> 
> We still have 1000-odd places which do things like
> 
>         schedule_timeout(HZ/10);

Yes, yes, we do :) I replaced far more than I ever thought I could...
There are a few issues I have with the remaining schedule_timeout()
calls which I think fit ok with this thread... I'd especially like
your input, Andrew, as you end up getting most of my patches from KJ.

Many drivers use

set_current_state(TASK_{UN,}INTERRUPTIBLE);
schedule_timeout(1); // or some other small value < 10

This may or may not hide a dependency on a particular HZ value. If the
code is somewhat old, perhaps the author intended the task to sleep
for 1 jiffy when HZ was equal to 100. That meant that they ended up
sleeping for 10 ms. If the code is new, the author intends that the
task sleeps for 1 ms (HZ==1000). The question is, what should the
replacement be?

If they really meant to use schedule_timeout(1) in the sense of
highest resolution delay possible (the latter above), then they
probably should just call schedule() directly. schedule_timeout(1)
simply sets up a timer to fire off after 1 jiffy & then calls
schedule() itself. The overhead of setting up a timer and the
execution of schedule() itself probably means that the timer will go
off in the middle of the schedule() call or very shortly thereafter (I
think). In which case, it makes more sense to use schedule()
directly...

If they meant to schedule a delay of 10ms, then msleep() should be
used in those cases. msleep() will also resolve the issues with 0-time
timeouts because of rounding, as it adds 1 to the converted parameter.
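
A user-space sketch of the conversion described here, assuming the
millisecond count is rounded up to whole jiffies before the extra jiffy is
added. The helpers are hypothetical and take HZ as a parameter; they are not
the kernel's msecs_to_jiffies() or msleep().

```c
/* Hypothetical helper: round a millisecond count up to whole jiffies. */
static unsigned long sketch_msecs_to_jiffies(unsigned int msecs, unsigned int hz)
{
	/* Rounding up keeps a nonzero request from converting to 0 jiffies. */
	return ((unsigned long)msecs * hz + 999) / 1000;
}

/* Hypothetical helper: the timeout a msleep()-like call would wait for. */
static unsigned long sketch_msleep_timeout(unsigned int msecs, unsigned int hz)
{
	/* The "+ 1" is the extra jiffy mentioned in the message above. */
	return sketch_msecs_to_jiffies(msecs, hz) + 1;
}
```

For a 10 ms request this gives 2 jiffies at HZ=100, 5 at HZ=333 and 11 at
HZ=1000, so the sleep is never shorter than requested at any of those
settings. The multiplication can overflow for very large arguments, which a
real implementation would have to handle.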

Obviously, changing more and more sleeps to msecs & secs will really
help make the changing of HZ more transparent. And specifying the time
in real time units just seems so much clearer to me.

What do people think?

-Nish

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 11:47             ` dynamic-hz Andrea Arcangeli
@ 2004-12-14  3:56               ` Nish Aravamudan
  0 siblings, 0 replies; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14  3:56 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, kernel, pavel, linux-kernel

On Mon, 13 Dec 2004 12:47:37 +0100, Andrea Arcangeli <andrea@suse.de> wrote:
> On Mon, Dec 13, 2004 at 03:25:21AM -0800, Andrew Morton wrote:
> > We still have 1000-odd places which do things like
> >
> >       schedule_timeout(HZ/10);
> >
> > which will now involve a runtime divide.  The propagation of msleep() and
> > ssleep() will reduce that a bit, but not much.
> 
> The above is by far the least cpu-hungry piece, it's going to sleep for
> 100msec, so any order-of-nanoseconds computation in such path will be by
> definition not measurable.
> 
> msleep and ssleep as well will obviously be non measurable for the same
> reason (their only point is to wait and "waste" cpu). I mean,
> msleep/ssleep are the only places in the kernel that we don't really
> need to optimize ;).

I don't exactly understand what you mean by ""waste" cpu"? They both
give up the CPU by calling schedule_timeout() which calls schedule().
So any "waste" of the CPU is due to no tasks being available to run,
not to msleep()/ssleep(). I think :)

-Nish

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 12:51             ` dynamic-hz Hans Kristian Rosbach
  2004-12-13 13:01               ` dynamic-hz Andrea Arcangeli
  2004-12-13 15:06               ` dynamic-hz Geert Uytterhoeven
@ 2004-12-14  4:05               ` Nish Aravamudan
  2 siblings, 0 replies; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14  4:05 UTC (permalink / raw)
  To: Hans Kristian Rosbach
  Cc: Pavel Machek, Andrew Morton, Con Kolivas, andrea,
	Linux Kernel Mailing List

On Mon, 13 Dec 2004 13:51:11 +0100, Hans Kristian Rosbach
<hk@isphuset.no> wrote:
> 
> 
> On Mon, Dec 13, 2004 at 12:22:29PM +0100, Pavel Machek wrote:
> > I tried defining HZ to 10 once, and there are some #if arrays in the
> > kernel that prevented me from doing that.
> >
> > Some drivers do timeouts based on jiffies; having HZ=1 may turn 20msec
> > timeout into 1sec, that could hurt a lot in the error case...
> 
> On Mon, Dec 13, 2004 at 03:25:21AM -0800, Andrew Morton wrote:
> > We still have 1000-odd places which do things like
> >        schedule_timeout(HZ/10);
> > which will now involve a runtime divide.  The propagation of msleep()
> > and ssleep() will reduce that a bit, but not much.
> 
> Shouldn't that be regarded as a bug/deprecated?
> 
> I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> do, but the parameter it gets in 1000hz is "100" so I assume this
> is because we want to wait for 100ms, and in 1000hz that equals
> 100 cycles. Correct?

schedule_timeout() specifies a sleep in jiffies -- it's actually a
rather annoying interface for the very reason that it depends on the
value of HZ how *long* you actually will sleep for (in human time
units). So your assumption is incorrect, presuming the code author
knows what they are doing. They wish to sleep for 1/10 the number of
timer ticks in a second. What this translates to, though, clearly
depends on HZ. Thus

msleep{,_interruptible}(100);

would be far better to use (it calls schedule_timeout() correctly
[another thing not done often]). Also, if you look carefully at the
timer code, you'll notice that the x86 timer frequency is not actually
1000 Hz, it's actually less. Thus you run into issues with timer
intervals... But, in any case, specifying a timeout of 100 msecs is
different from specifying a timeout of 100 cycles on x86. I'm not sure
what it exactly translates to, but it will be more. Hence, you should
use msleep() not schedule_timeout() directly. Jiffies should not be
what you base your timing on; msecs & secs are easier and less likely
to be misused.
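
To make the dependence on HZ concrete, here is a small user-space sketch
that converts a tick count to nominal microseconds. The helper name is made
up for this example, and it ignores the hardware rounding of the real timer
frequency mentioned above.

```c
/* Nominal duration in microseconds of 'ticks' timer ticks at rate 'hz'. */
static unsigned long ticks_to_usecs(unsigned long ticks, unsigned int hz)
{
	return ticks * 1000000UL / hz;
}
```

An argument of HZ/10 comes out near 100 ms: exactly 100000 us at HZ=1000 and
99099 us at HZ=333, where the integer division drops a third of a tick. A
hard-coded argument of 100, by contrast, means 100 ms at HZ=1000 but a full
second at HZ=100.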

> then in the rest of the code we can use ex:
> schedule_timeout(varX*100) for 100ms no matter what hz is.

No, please don't. Use msleep() or msleep_interruptible(). Let the
conversion functions take care of the conversions.

> With hz=50 then the lowest ms is 20 for one tick though. And that
> might trigger problems with approximation at some point.
> varX would have to be decimal, and that might also be a problem?

No floating point in the kernel...

-Nish

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13 16:12                 ` dynamic-hz Pavel Machek
  2004-12-13 16:14                   ` dynamic-hz Geert Uytterhoeven
@ 2004-12-14  4:06                   ` Nish Aravamudan
  1 sibling, 0 replies; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14  4:06 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Geert Uytterhoeven, Hans Kristian Rosbach, Andrew Morton,
	Con Kolivas, andrea, Linux Kernel Mailing List

On Mon, 13 Dec 2004 17:12:07 +0100, Pavel Machek <pavel@suse.cz> wrote:
> HI!
> 
> > > I'm not sure what the above "schedule_timeout(HZ/10)" is supposed to
> > > do, but the parameter it gets in 1000hz is "100" so I assume this
> > > is because we want to wait for 100ms, and in 1000hz that equals
> > > 100 cycles. Correct?
> >
> > `schedule_timeout(HZ/x)' lets it wait for 1/x'th second.
> 
> ...small problem is that for HZ lower than x it does not wait at all
> :-(.

Ah ha! Another reason to use msleep() or msleep_interruptible() :).
Or, if you just want to give up the CPU, use schedule(); or if, giving
up the CPU for a long time, use yield() [the current semantic
interpretation of yield()].
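
The truncation Pavel describes is plain integer division. A minimal
user-space sketch, with hypothetical helper names and HZ passed as a
parameter, compares it with a division that rounds up:

```c
/* What an expression like HZ/x evaluates to: integer division truncates. */
static unsigned long hz_div_truncating(unsigned int hz, unsigned int x)
{
	return hz / x;
}

/* Rounding up instead guarantees at least one tick when hz is nonzero. */
static unsigned long hz_div_round_up(unsigned int hz, unsigned int x)
{
	return ((unsigned long)hz + x - 1) / x;
}
```

With HZ=50 and x=100 the truncating form yields 0 ticks, so the caller does
not wait at all, while the rounded form yields 1. With HZ=1000 both yield
10.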

-Nish

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14  3:54             ` dynamic-hz Nish Aravamudan
@ 2004-12-14  4:29               ` Andrew Morton
  2004-12-14  5:25                 ` dynamic-hz Nish Aravamudan
  2004-12-17 20:10                 ` dynamic-hz Nish Aravamudan
  2004-12-14 10:01               ` dynamic-hz Domen Puncer
  2004-12-14 14:23               ` dynamic-hz linux-os
  2 siblings, 2 replies; 126+ messages in thread
From: Andrew Morton @ 2004-12-14  4:29 UTC (permalink / raw)
  To: Nish Aravamudan; +Cc: andrea, kernel, pavel, linux-kernel

Nish Aravamudan <nish.aravamudan@gmail.com> wrote:
>
> On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote:
> > Andrea Arcangeli <andrea@suse.de> wrote:
> > >
> > > The patch only does HZ at dynamic time. But of course it's absolutely
> > >  trivial to define it at compile time, it's probably a 3 liner on top of
> > >  my current patch ;). However personally I don't think the three liner
> > >  will worth the few seconds more spent configuring the kernel ;).
> > 
> > We still have 1000-odd places which do things like
> > 
> >         schedule_timeout(HZ/10);
> 
> Yes, yes, we do :) I replaced far more than I ever thought I could...
> There are a few issues I have with the remaining schedule_timeout()
> calls which I think fit ok with this thread... I'd especially like
> your input, Andrew, as you end up getting most of my patches from KJ.
> 
> Many drivers use
> 
> set_current_state(TASK_{UN,}INTERRUPTIBLE);
> schedule_timeout(1); // or some other small value < 10
> 
> This may or may not hide a dependency on a particular HZ value. If the
> code is somewhat old, perhaps the author intended the task to sleep
> > for 1 jiffy when HZ was equal to 100. That meant that they ended up
> sleeping for 10 ms. If the code is new, the author intends that the
> task sleeps for 1 ms (HZ==1000). The question is, what should the
> replacement be?

Presumably they meant 10 milliseconds.  Or at least, that is the delay
which the developer did his testing with.

> If they really meant to use schedule_timeout(1) in the sense of
> highest resolution delay possible (the latter above), then they
> probably should just call schedule() directly.

argh.  Never do that.  It's basically a busywait and can cause lockups if
the calling task has realtime scheduling policy.

> schedule_timeout(1)
> simply sets up a timer to fire off after 1 jiffy & then calls
> schedule() itself. The overhead of setting up a timer and the
> execution of schedule() itself probably means that the timer will go
> off in the middle of the schedule() call or very shortly thereafter (I
> think). In which case, it makes more sense to use schedule()
> directly...
> 
> If they meant to schedule a delay of 10ms, then msleep() should be
> used in those cases. msleep() will also resolve the issues with 0-time
> timeouts because of rounding, as it adds 1 to the converted parameter.
> 
> Obviously, changing more and more sleeps to msecs & secs will really
> help make the changing of HZ more transparent. And specifying the time
> in real time units just seems so much clearer to me.
> 
> What do people think?

I'd say that replacing them with msleep(10) is the safest approach.
Depending on what the surrounding code is actually doing, of course.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14  4:29               ` dynamic-hz Andrew Morton
@ 2004-12-14  5:25                 ` Nish Aravamudan
  2004-12-17 20:10                 ` dynamic-hz Nish Aravamudan
  1 sibling, 0 replies; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14  5:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: andrea, kernel, pavel, linux-kernel

On Mon, 13 Dec 2004 20:29:39 -0800, Andrew Morton <akpm@osdl.org> wrote:
> Nish Aravamudan <nish.aravamudan@gmail.com> wrote:
> 
> 
> >
> > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote:
> > > Andrea Arcangeli <andrea@suse.de> wrote:
> > > >
> > > > The patch only does HZ at dynamic time. But of course it's absolutely
> > > >  trivial to define it at compile time, it's probably a 3 liner on top of
> > > >  my current patch ;). However personally I don't think the three liner
> > > >  will worth the few seconds more spent configuring the kernel ;).
> > >
> > > We still have 1000-odd places which do things like
> > >
> > >         schedule_timeout(HZ/10);
> >
> > Yes, yes, we do :) I replaced far more than I ever thought I could...
> > There are a few issues I have with the remaining schedule_timeout()
> > calls which I think fit ok with this thread... I'd especially like
> > your input, Andrew, as you end up getting most of my patches from KJ.
> >
> > Many drivers use
> >
> > set_current_state(TASK_{UN,}INTERRUPTIBLE);
> > schedule_timeout(1); // or some other small value < 10
> >
> > This may or may not hide a dependency on a particular HZ value. If the
> > code is somewhat old, perhaps the author intended the task to sleep
> > > for 1 jiffy when HZ was equal to 100. That meant that they ended up
> > sleeping for 10 ms. If the code is new, the author intends that the
> > task sleeps for 1 ms (HZ==1000). The question is, what should the
> > replacement be?
> 
> Presumably they meant 10 milliseconds.  Or at least, that is the delay
> which the developer did his testing with.

OK, I will make a set of these changes soon, hopefully.

> > If they really meant to use schedule_timeout(1) in the sense of
> > highest resolution delay possible (the latter above), then they
> > probably should just call schedule() directly.
> 
> argh.  Never do that.  It's basically a busywait and can cause lockups if
> the calling task has realtime scheduling policy.
 
OK, I won't make any such changes in my next set of patches.
 
> > schedule_timeout(1)
> > simply sets up a timer to fire off after 1 jiffy & then calls
> > schedule() itself. The overhead of setting up a timer and the
> > execution of schedule() itself probably means that the timer will go
> > off in the middle of the schedule() call or very shortly thereafter (I
> > think). In which case, it makes more sense to use schedule()
> > directly...
> >
> > If they meant to schedule a delay of 10ms, then msleep() should be
> > used in those cases. msleep() will also resolve the issues with 0-time
> > timeouts because of rounding, as it adds 1 to the converted parameter.
> >
> > Obviously, changing more and more sleeps to msecs & secs will really
> > help make the changing of HZ more transparent. And specifying the time
> > in real time units just seems so much clearer to me.
> >
> > What do people think?
> 
> I'd say that replacing them with msleep(10) is the safest approach.
> Depending on what the surrounding code is actually doing, of course.

Thanks for the info!

-Nish

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
       [not found]                           ` <20041214013924.GB14617@atomide.com>
@ 2004-12-14  9:37                             ` Pavel Machek
  2004-12-14 21:18                               ` dynamic-hz Tony Lindgren
  0 siblings, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-14  9:37 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

Hi!

> > > > > > But that does not matter, right? Yes, one-shot timer will not fire
> > > > > > exactly at right place, but as long as you are reading TSC and basing
> > > > > > next shot on current time, error should not accumulate.
> > > > > 
> > > > > As said in the rest of the message, the error (or some other error)
> > > > > accumulates heavily today in the tick-loss compensation/adjustment
> > > > > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> > > > > about
> > > > 
> > > > I do not see how it should accumulate. Lets have working TSC. You want
> > > > to emulate fixed-period timer with single-shot timer.
> > > 
> > > It's caused by the fact that we don't use the TSC to accumulate time.
> > > We are instead interpolating between timer ticks and the TSC, where
> > 
> > Yes, it was supposed to be simple, so that Andrea understands that
> > there's nothing inherently broken with single-shot timers.
> 
> Just a quick comment; The timer does not need to be single-shot 
> all the time, it can be a combination of continuous and variable
> length timer, and it can change depending on the system load.
> 
> We recently added VST support for OMAP in linux-omap bk tree, and 
> made some changes to the previous VST implementations that might be
> of interest:
...
> The patch in question is at:
> 
> http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18

Wow, that's basically 8 lines of code plus driver for new
hardware... Is it really that simple?
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14  2:36                       ` dynamic-hz Andrea Arcangeli
@ 2004-12-14  9:39                         ` Pavel Machek
  2004-12-14  9:59                         ` dynamic-hz Pavel Machek
  1 sibling, 0 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-14  9:39 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > 	do_single_shot_in(delay);
> 
> The only other way would be to use the 64bit tsc as the only source for
> the system time (perhaps that's what you mean with the above
> pseudocode?). But the calibration code would need changes to allow
> that.

Yes, that's what I meant.

> Even before thinking at using the one-shot timer, I would like to
> fix the lost tick compensation of current production 2.6.9, only then we
> can talk about tickless by using a one-shot timer. If we can't do the
> lost-tick compensation without screwing the system time, I don't see how
> we can do the one-shot timer without screwing the system time.

Okay, I'll take a look.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14  2:36                       ` dynamic-hz Andrea Arcangeli
  2004-12-14  9:39                         ` dynamic-hz Pavel Machek
@ 2004-12-14  9:59                         ` Pavel Machek
  2004-12-14 15:25                           ` dynamic-hz Andrea Arcangeli
  1 sibling, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-14  9:59 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> Even before thinking at using the one-shot timer, I would like to
> fix the lost tick compensation of current production 2.6.9, only then we
> can talk about tickless by using a one-shot timer. If we can't do the
> lost-tick compensation without screwing the system time, I don't see how
> we can do the one-shot timer without screwing the system time.

Are you using CONFIG_HPET_TIMER by chance? It seems to be missing some
strategic -1, TSC (etc) get it right.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14  3:54             ` dynamic-hz Nish Aravamudan
  2004-12-14  4:29               ` dynamic-hz Andrew Morton
@ 2004-12-14 10:01               ` Domen Puncer
  2004-12-14 16:56                 ` dynamic-hz Nish Aravamudan
  2004-12-14 14:23               ` dynamic-hz linux-os
  2 siblings, 1 reply; 126+ messages in thread
From: Domen Puncer @ 2004-12-14 10:01 UTC (permalink / raw)
  To: Nish Aravamudan
  Cc: Andrew Morton, Andrea Arcangeli, kernel, pavel, linux-kernel

On 13/12/04 19:54 -0800, Nish Aravamudan wrote:
> On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote:
> > Andrea Arcangeli <andrea@suse.de> wrote:
> > >
> > > The patch only does HZ at dynamic time. But of course it's absolutely
> > >  trivial to define it at compile time, it's probably a 3 liner on top of
> > >  my current patch ;). However personally I don't think the three liner
> > >  will worth the few seconds more spent configuring the kernel ;).
> > 
> > We still have 1000-odd places which do things like
> > 
> >         schedule_timeout(HZ/10);
> 
...
> Many drivers use
> 
> set_current_state(TASK_{UN,}INTERRUPTIBLE);
> schedule_timeout(1); // or some other small value < 10
> 
...
> If they really meant to use schedule_timeout(1) in the sense of
> highest resolution delay possible (the latter above), then they
> probably should just call schedule() directly.

Um... no (and you should remember this from our discussions), schedule()
gives up cpu until waitqueue wakeup or signal is received, and that can
be a really long delay :-)


	Domen

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14  3:54             ` dynamic-hz Nish Aravamudan
  2004-12-14  4:29               ` dynamic-hz Andrew Morton
  2004-12-14 10:01               ` dynamic-hz Domen Puncer
@ 2004-12-14 14:23               ` linux-os
  2004-12-14 16:54                 ` dynamic-hz Nish Aravamudan
  2 siblings, 1 reply; 126+ messages in thread
From: linux-os @ 2004-12-14 14:23 UTC (permalink / raw)
  To: Nish Aravamudan
  Cc: Andrew Morton, Andrea Arcangeli, kernel, pavel, linux-kernel

On Mon, 13 Dec 2004, Nish Aravamudan wrote:

> On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote:
>> Andrea Arcangeli <andrea@suse.de> wrote:
>>>
>>> The patch only does HZ at dynamic time. But of course it's absolutely
>>>  trivial to define it at compile time, it's probably a 3 liner on top of
>>>  my current patch ;). However personally I don't think the three liner
>>>  will worth the few seconds more spent configuring the kernel ;).
>>
>> We still have 1000-odd places which do things like
>>
>>         schedule_timeout(HZ/10);
>
> Yes, yes, we do :) I replaced far more than I ever thought I could...
> There are a few issues I have with the remaining schedule_timeout()
> calls which I think fit ok with this thread... I'd especially like
> your input, Andrew, as you end up getting most of my patches from KJ.
>
> Many drivers use
>
> set_current_state(TASK_{UN,}INTERRUPTIBLE);
> schedule_timeout(1); // or some other small value < 10
>
> This may or may not hide a dependency on a particular HZ value. If the
> code is somewhat old, perhaps the author intended the task to sleep
> for 1 jiffy when HZ was equal to 100. That meant that they ended up
> sleeping for 10 ms. If the code is new, the author intends that the
> task sleeps for 1 ms (HZ==1000). The question is, what should the
> replacement be?
>
> If they really meant to use schedule_timeout(1) in the sense of
> highest resolution delay possible (the latter above), then they
> probably should just call schedule() directly. schedule_timeout(1)
> simply sets up a timer to fire off after 1 jiffy & then calls
> schedule() itself. The overhead of setting up a timer and the
> execution of schedule() itself probably means that the timer will go
> off in the middle of the schedule() call or very shortly thereafter (I
> think). In which case, it makes more sense to use schedule()
> directly...
>
> If they meant to schedule a delay of 10ms, then msleep() should be
> used in those cases. msleep() will also resolve the issues with 0-time
> timeouts because of rounding, as it adds 1 to the converted parameter.
>
> Obviously, changing more and more sleeps to msecs & secs will really
> help make the changing of HZ more transparent. And specifying the time
> in real time units just seems so much clearer to me.
>
> What do people think?
>
> -Nish

I found that if you use schedule() directly then the sleeping
task appears to be spinning in "system" in `top`. If you use
schedule_timeout(0), it works the same, but doesn't appear
to be eating CPU cycles as shown by `top`. Many common
drivers need to have the timeout interruptible, but wait
<forever if necessary> for a particular event. They need
to get the CPU back fairly often to check again for the
event. They need the equavalent of user-mode sched_yield().
sys_sched_yield() did't seem to work correctly, last time
I tried.

Maybe somebody could make a sched_yield() for the kernel.
That would improve a lot of drivers.



Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by John Ashcroft.
                  98.36% of all statistics are fiction.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14  9:59                         ` dynamic-hz Pavel Machek
@ 2004-12-14 15:25                           ` Andrea Arcangeli
  2004-12-14 22:02                             ` USB making time drift [was Re: dynamic-hz] Pavel Machek
  0 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-14 15:25 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

On Tue, Dec 14, 2004 at 10:59:39AM +0100, Pavel Machek wrote:
> Are you using CONFIG_HPET_TIMER by chance? It seems to be missing some
> strategic -1, TSC (etc) get it right.

I'm not using hpet because this is old hardware; this is with timer_tsc.
It must be reproducible on any machine out there, especially on
machines with usb, where it should be reproducible even without any userspace
testcase doing iopl/cli/sti. Time will silently go into the future at every
usb irq (they often last 3/4msec).

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14 14:23               ` dynamic-hz linux-os
@ 2004-12-14 16:54                 ` Nish Aravamudan
  2004-12-14 17:15                   ` dynamic-hz Andrea Arcangeli
  0 siblings, 1 reply; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14 16:54 UTC (permalink / raw)
  To: linux-os; +Cc: Andrew Morton, Andrea Arcangeli, kernel, pavel, linux-kernel

On Tue, 14 Dec 2004 09:23:54 -0500 (EST), linux-os
<linux-os@chaos.analogic.com> wrote:
> On Mon, 13 Dec 2004, Nish Aravamudan wrote:
> 
> > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote:
> >> Andrea Arcangeli <andrea@suse.de> wrote:
> >>>
> >>> The patch only does HZ at dynamic time. But of course it's absolutely
> >>>  trivial to define it at compile time, it's probably a 3 liner on top of
> >>>  my current patch ;). However personally I don't think the three liner
> >>>  will worth the few seconds more spent configuring the kernel ;).
> >>
> >> We still have 1000-odd places which do things like
> >>
> >>         schedule_timeout(HZ/10);
> >
> > Yes, yes, we do :) I replaced far more than I ever thought I could...
> > There are a few issues I have with the remaining schedule_timeout()
> > calls which I think fit ok with this thread... I'd especially like
> > your input, Andrew, as you end up getting most of my patches from KJ.
> >
> > Many drivers use
> >
> > set_current_state(TASK_{UN,}INTERRUPTIBLE);
> > schedule_timeout(1); // or some other small value < 10
> >
> > This may or may not hide a dependency on a particular HZ value. If the
> > code is somewhat old, perhaps the author intended the task to sleep
> > for 1 jiffy when HZ was equal to 100. That meant that they ended up
> > sleeping for 10 ms. If the code is new, the author intends that the
> > task sleeps for 1 ms (HZ==1000). The question is, what should the
> > replacement be?
> >
> > If they really meant to use schedule_timeout(1) in the sense of
> > highest resolution delay possible (the latter above), then they
> > probably should just call schedule() directly. schedule_timeout(1)
> > simply sets up a timer to fire off after 1 jiffy & then calls
> > schedule() itself. The overhead of setting up a timer and the
> > execution of schedule() itself probably means that the timer will go
> > off in the middle of the schedule() call or very shortly thereafter (I
> > think). In which case, it makes more sense to use schedule()
> > directly...
> >
> > If they meant to schedule a delay of 10ms, then msleep() should be
> > used in those cases. msleep() will also resolve the issues with 0-time
> > timeouts because of rounding, as it adds 1 to the converted parameter.
> >
> > Obviously, changing more and more sleeps to msecs & secs will really
> > help make the changing of HZ more transparent. And specifying the time
> > in real time units just seems so much clearer to me.
> >
> > What do people think?
> >
> > -Nish
> 
> I found that if you use schedule() directly then the sleeping
> task appears to be spinning in "system" in `top`. If you use
> schedule_timeout(0), it works the same, but doesn't appear
> to be eating CPU cycles as shown by `top`. Many common
> drivers need to have the timeout interruptible, but wait
> <forever if necessary> for a particular event. They need
> to get the CPU back fairly often to check again for the
> event. They need the equivalent of user-mode sched_yield().
> sys_sched_yield() didn't seem to work correctly, last time
> I tried.
> 
> Maybe somebody could make a sched_yield() for the kernel.
> That would improve a lot of drivers.

Hmm, schedule_timeout(0) working that way is interesting. There is
also the option to use schedule_timeout(MAX_SCHEDULE_TIMEOUT) which
should sleep indefinitely (depending of course on the conditions of
the state). Oh but I think I understand what you're saying... the
driver needs to sleep indefinitely in total (potentially), but needs
to be able to return quite often (like yield() used to) so they could
check a condition...

Thanks for the input!

-Nish
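
The HZ dependency discussed in the quoted text can be sketched in plain userspace C (this is not kernel code). The helper names jiffies_to_usecs_at(), msleep_jiffies_at() and show_hz_dependency() are hypothetical, and the round-up conversion is a simplification assumed for illustration; only the "+1" comes from the description of msleep() above.

```c
#include <stdio.h>

/* Length of `j` jiffies in microseconds at tick rate `hz`. */
unsigned long jiffies_to_usecs_at(unsigned long j, unsigned long hz)
{
	return j * 1000000UL / hz;
}

/*
 * Jiffies an msleep()-style call would wait: round the conversion up,
 * then add 1 as described in the mail. Simplified, hypothetical helper.
 */
unsigned long msleep_jiffies_at(unsigned long msecs, unsigned long hz)
{
	return (msecs * hz + 999UL) / 1000UL + 1UL;
}

/* Print how a bare 1-jiffy timeout and a 10 ms sleep differ per HZ. */
void show_hz_dependency(void)
{
	static const unsigned long rates[] = { 100UL, 1000UL };

	for (unsigned int i = 0; i < sizeof(rates) / sizeof(rates[0]); i++) {
		unsigned long hz = rates[i];

		printf("HZ=%lu: schedule_timeout(1) asks for %lu us, "
		       "msleep(10) asks for %lu jiffies (%lu us)\n",
		       hz, jiffies_to_usecs_at(1, hz),
		       msleep_jiffies_at(10, hz),
		       jiffies_to_usecs_at(msleep_jiffies_at(10, hz), hz));
	}
}
```

Calling show_hz_dependency() from a small main() shows why a timeout written in jiffies silently changes meaning with HZ, while one written in milliseconds stays close to the requested real time.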


* Re: dynamic-hz
  2004-12-14 10:01               ` dynamic-hz Domen Puncer
@ 2004-12-14 16:56                 ` Nish Aravamudan
  0 siblings, 0 replies; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14 16:56 UTC (permalink / raw)
  To: Domen Puncer; +Cc: Andrew Morton, Andrea Arcangeli, kernel, pavel, linux-kernel

On Tue, 14 Dec 2004 11:01:23 +0100, Domen Puncer <domen@coderock.org> wrote:
> On 13/12/04 19:54 -0800, Nish Aravamudan wrote:
> > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote:
> > > Andrea Arcangeli <andrea@suse.de> wrote:
> > > >
> > > > The patch only does HZ at dynamic time. But of course it's absolutely
> > > >  trivial to define it at compile time, it's probably a 3 liner on top of
> > > >  my current patch ;). However personally I don't think the three liner
> > > >  will worth the few seconds more spent configuring the kernel ;).
> > >
> > > We still have 1000-odd places which do things like
> > >
> > >         schedule_timeout(HZ/10);
> > 
> ...
> > Many drivers use
> >
> > set_current_state(TASK_{UN,}INTERRUPTIBLE);
> > schedule_timeout(1); // or some other small value < 10
> > 
> ...
> > If they really meant to use schedule_timeout(1) in the sense of
> > highest resolution delay possible (the latter above), then they
> > probably should just call schedule() directly.
> 
> Um... no (and you should remember this from our discussions), schedule()
> gives up cpu until waitqueue wakeup or signal is received, and that can
> be a really long delay :-)

True; sorry about that, Domen, completely forgot about that. Will
think on it further.

-Nish


* Re: dynamic-hz
  2004-12-14 16:54                 ` dynamic-hz Nish Aravamudan
@ 2004-12-14 17:15                   ` Andrea Arcangeli
  2004-12-14 17:42                     ` dynamic-hz Nish Aravamudan
  2004-12-14 18:22                     ` dynamic-hz linux-os
  0 siblings, 2 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-14 17:15 UTC (permalink / raw)
  To: Nish Aravamudan; +Cc: linux-os, Andrew Morton, kernel, pavel, linux-kernel

On Tue, Dec 14, 2004 at 08:54:29AM -0800, Nish Aravamudan wrote:
> Hmm, schedule_timeout(0) working that way is interesting. There is
> also the option to use schedule_timeout(MAX_SCHEDULE_TIMEOUT) which
> should sleep indefinitely (depending of course on the conditions of
> the state). Oh but I think I understand what you're saying... the
> driver needs to sleep indefinitely in total (potentially), but needs
> to be able to return quite often (like yield() used to) so they could
> check a condition...
> 
> Thanks for the input!

what do you mean like yield() used to? yield() is still there in latest
2.6, just call yield() and you'll get the same effect of sched_yield in
userspace. yields in the kernel are a bad thing though (they usually
mean code is not well written, code should be event driven not polled
driven).

Note that __set_current_state(..); schedule_timeout(0) is not like
yield. yield will return immediately if it's the only task running. A
yielding loop will consume all available cpu, while the
schedule_timeout(0) will wait less than 1/HZ sec. But really
schedule_timeout(0) makes little sense, either use schedule_timeout(1)
and explicitly wait 1msec, or use yield. schedule_timeout(0) just
happens to work because the timer code has to approximate for excess and
it will wait for the next timer irq for timeouts <= 0 and it will wait
for two ticks for timeouts == 1 etc...

I guess we could change schedule_timeout() to WARN_ON if 0 is being
passed to it.


* Re: dynamic-hz
  2004-12-14 17:15                   ` dynamic-hz Andrea Arcangeli
@ 2004-12-14 17:42                     ` Nish Aravamudan
  2004-12-14 18:29                       ` dynamic-hz Andrea Arcangeli
  2004-12-14 18:22                     ` dynamic-hz linux-os
  1 sibling, 1 reply; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14 17:42 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-os, Andrew Morton, kernel, pavel, linux-kernel

On Tue, 14 Dec 2004 18:15:03 +0100, Andrea Arcangeli <andrea@suse.de> wrote:
> On Tue, Dec 14, 2004 at 08:54:29AM -0800, Nish Aravamudan wrote:
> > Hmm, schedule_timeout(0) working that way is interesting. There is
> > also the option to use schedule_timeout(MAX_SCHEDULE_TIMEOUT) which
> > should sleep indefinitely (depending of course on the conditions of
> > the state). Oh but I think I understand what you're saying... the
> > driver needs to sleep indefinitely in total (potentially), but needs
> > to be able to return quite often (like yield() used to) so they could
> > check a condition...
> >
> > Thanks for the input!
> 
> what do you mean like yield() used to? yield() is still there in latest
> 2.6, just call yield() and you'll get the same effect of sched_yield in
> userspace. yields in the kernel are a bad thing though (they usually
> mean code is not well written, code should be event driven not polled
> driven).

Sorry for my lack of clarity :) I was referring more to the second
part of what you said, that the "meaning" of yield() changed for 2.6
and thus shouldn't be used to wait for short times (see kerneljanitors
TODO reference from Matthew Wilcox (search for yield in page):
http://www.kerneljanitors.org/TODO).
 
> Note that __set_current_state(..); schedule_timeout(0) is not like
> yield. yield will return immediately if it's the only task running. A
> yielding loop will consume all available cpu, while the
> schedule_timeout(0) will wait less than 1/HZ sec. But really
> schedule_timeout(0) makes little sense, either use schedule_timeout(1)
> and explicitly wait 1msec, or use yield. schedule_timeout(0) just
> happens to work because the timer code has to approximate for excess and
> it will wait for the next timer irq for timeouts <= 0 and it will wait
> for two ticks for timeouts == 1 etc...

From the context of the TODO, it seems yield() and schedule_timeout()
should not be considered alternatives for each other. Maybe someone
can clarify?

> I guess we could change schedule_timeout() to WARN_ON if 0 is being
> passed to it.

I will see if anyone is actually calling with 0 -- I don't remember
seeing this for my previous sets of patches, but it may happen if HZ
changes in value.

-Nish


* Re: dynamic-hz
  2004-12-14 17:15                   ` dynamic-hz Andrea Arcangeli
  2004-12-14 17:42                     ` dynamic-hz Nish Aravamudan
@ 2004-12-14 18:22                     ` linux-os
  2004-12-14 18:38                       ` dynamic-hz Andrea Arcangeli
  2004-12-14 18:50                       ` dynamic-hz Pavel Machek
  1 sibling, 2 replies; 126+ messages in thread
From: linux-os @ 2004-12-14 18:22 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Nish Aravamudan, Andrew Morton, kernel, pavel, linux-kernel

On Tue, 14 Dec 2004, Andrea Arcangeli wrote:

> On Tue, Dec 14, 2004 at 08:54:29AM -0800, Nish Aravamudan wrote:
>> Hmm, schedule_timeout(0) working that way is interesting. There is
>> also the option to use schedule_timeout(MAX_SCHEDULE_TIMEOUT) which
>> should sleep indefinitely (depending of course on the conditions of
>> the state). Oh but I think I understand what you're saying... the
>> driver needs to sleep indefinitely in total (potentially), but needs
>> to be able to return quite often (like yield() used to) so they could
>> check a condition...
>>
>> Thanks for the input!
>
> what do you mean like yield() used to? yield() is still there in latest
> 2.6, just call yield() and you'll get the same effect of sched_yield in
> userspace. yields in the kernel are a bad thing though (they usually
> mean code is not well written, code should be event driven not polled
> driven).
>

Yield used to not show a spin in `top`.  Also, contrary to
"popular" opinion, not all events are accompanied by interrupts.
If they were, I'd gladly use one of the sleep_on* functions.

For instance, I need to erase NVRAM (Flash). Then I need to
program each byte. Waiting for the completion events requires
polling the hardware. Proper software will give up the CPU
while waiting and only sample the event, not continually spin.

You can get away with software murder if you only need to program
something that saves some state between shutdowns. However, if
you have a writable flash file-system you need to do it right.

> Note that __set_current_state(..); schedule_timeout(0) is not like
> yield. yield will return immediately if it's the only task running. A
> yielding loop will consume all available cpu, while the
> schedule_timeout(0) will wait less than 1/HZ sec. But really

The timeout of (0) was really to make the code more obvious, the
facts being that we really need to get the CPU back as soon as
there are no higher-priority tasks computable. If yield() would
work like schedule(0), of course I'd use it. The major problem
with yield() probably has to do with accounting. The machine
"feels" as though the CPU is properly available when you need
it, however it appears to be spinning, using 100% system time.
This makes customers nervous.

> schedule_timeout(0) makes little sense, either use schedule_timeout(1)
> and explicitly wait 1msec, or use yield. schedule_timeout(0) just
> happens to work because the timer code has to approximate for excess and
> it will wait for the next timer irq for timeouts <= 0 and it will wait
> for two ticks for timeouts == 1 etc...
>
> I guess we could change schedule_timeout() to WARN_ON if 0 is being
> passed to it.
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by John Ashcroft.
                  98.36% of all statistics are fiction.


* Re: dynamic-hz
  2004-12-14 17:42                     ` dynamic-hz Nish Aravamudan
@ 2004-12-14 18:29                       ` Andrea Arcangeli
  2004-12-14 19:00                         ` dynamic-hz Nish Aravamudan
  0 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-14 18:29 UTC (permalink / raw)
  To: Nish Aravamudan; +Cc: linux-os, Andrew Morton, kernel, pavel, linux-kernel

On Tue, Dec 14, 2004 at 09:42:02AM -0800, Nish Aravamudan wrote:
> Sorry for my lack of clarity :) I was referring more to the second
> part of what you said, that the "meaning" of yield() changed for 2.6

The meaning of yield didn't really change. The behaviour changed a bit
to allow scalability even if more than one task is polling for a
resource (potentially even the _same_ resource) using yield().

But if you were using yield() in 2.4 you shouldn't change to anything
different than yield() in 2.6. If you get bad latencies under load in
2.6, it's simply a gentle reminder that using yield() is always a bad
idea ;).

NPTL converted the yield() loops in the slow path of the pthread_mutex to
event-driven futexes, otherwise 2.6 behaviour would break a lot more than
OOo.

In my 2.4-aa I have a sysctl to switch yield between the 2.4 and 2.6
behaviours. The new behaviour broke OOo and all LinuxThreads apps for
example, so it was necessary to use a sysctl to control it, even if the
new yield() behaviour is more correct because it has a chance to scale
under load.

Ingo may want to correct me if I remember wrong, I discussed this stuff
with him at the time.

> and thus shouldn't be used to wait for short times (see kerneljanitors
> TODO reference from Matthew Wilcox (search for yield in page):
> http://www.kerneljanitors.org/TODO).

The 2.4 yield() could introduce significant latencies too if more than
one task was looping in yield at the same time for different resources.

> From the context of the TODO, it seems yield() and schedule_timeout()
> should not be considered alternatives for each other. Maybe someone
> can clarify?

It depends what you're doing. yield() and __set_current_state(..);
schedule_timeout(1) are similar. I don't think schedule_timeout(0) makes
much sense (but in practice it works very similarly to
schedule_timeout(1)). The former will poll ASAP by guaranteeing the CPU
won't go idle. The latter will make the CPU go idle and it'll wait
between 1/HZ sec and 2/HZ sec.

The point is that polling is wrong and you should register into a
waitqueue and then __set_current_state(..); schedule(). This is exactly
what NPTL did too, and as far as I can tell it's practically the most
noticeable feature for optimally written threaded apps. The
yield/schedule_timeout(1)-without-registering-in-callbacks are just
tricks for some special code.  For example I used myself
schedule_timeout(1) in the oom killer patch a few days ago, but that
code runs only when the machine is out of memory and several tasks will
try to kill something at the same time. At that time the cpu load really
doesn't matter. So tricks like that are ok in corner cases where
performance cannot matter at all. For fast paths or regular code, yield
should not be used (and schedule_timeout(1) used as a yield won't be
much better).

Conceptually if you want to poll as soon as possible you should use
yield(). If you want to wait and give some idle time to the cpu you
should use schedule_timeout().

You should ignore the claim that yield isn't appropriate in 2.6 for
waiting short periods of time, yield is still the API to use for polling
while keeping the cpu busy. If the machine is overloaded then it will
take a while to get back to the polling loop with 2.6, but then 2.4 had
other corner cases with the machine overloaded by userspace tasks
calling sched_yield too. So it's not really that much different in terms
of the guarantees that yield can provide between 2.4/2.6. The only
guarantee that yield can provide is that the cpu will remain busy, and
that you'll be rescheduled if some other task is pending in the
runqueue. It can't provide any guarantee on when you'll become running
again.

> > I guess we could change schedule_timeout() to WARN_ON if 0 is being
> > passed to it.
> 
> I will see if anyone is actually calling with 0 -- I don't remember

It's not that bad, I mean schedule_timeout(0) works fine, but once in a
while it may not wait at all and just return after invoking a timer
callback. So if somebody uses schedule_timeout, it's because he always
wants to make the cpu go idle for a little bit, and in turn it would be
better to use 1 (0 isn't guaranteed to go idle).

> seeing this for my previous sets of patches, but it may happen if HZ
> changes in value.

The HZ errors are just due to the lack of roundup, and schedule_timeout
can't do anything about it, only the caller can (it's a problem even for
other HZ values that generate rounding errors, and that's why HZ=100 and
HZ=1000 are the only two really supported frequencies to freely switch
at boot time ;).
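
A minimal userspace sketch of the roundup point, assuming callers write timeouts as HZ/N: the truncating division is what the plain expression gives, and the round-up variant is what a caller would have to do by hand. The helper names hz_div_trunc(), hz_div_roundup() and jiffies_to_us() are hypothetical, and the tick rates 333 and 80 are picked only as examples of values that do not divide evenly.

```c
/* Truncating HZ/N, as in schedule_timeout(HZ/10). */
unsigned long hz_div_trunc(unsigned long hz, unsigned long n)
{
	return hz / n;
}

/* Caller-side round-up: never asks for less than 1/N second. */
unsigned long hz_div_roundup(unsigned long hz, unsigned long n)
{
	return (hz + n - 1UL) / n;
}

/* Real duration of `j` jiffies at tick rate `hz`, in microseconds. */
unsigned long jiffies_to_us(unsigned long j, unsigned long hz)
{
	return j * 1000000UL / hz;
}
```

At HZ=100 and HZ=1000 the two variants agree for HZ/10, so nothing is lost. At HZ=333 the truncated HZ/10 is 33 jiffies, slightly under 100 ms, and at HZ=80 the truncated HZ/100 is 0 jiffies, which is exactly the zero timeout case discussed earlier in this mail.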


* Re: dynamic-hz
  2004-12-14 18:22                     ` dynamic-hz linux-os
@ 2004-12-14 18:38                       ` Andrea Arcangeli
  2004-12-14 18:50                       ` dynamic-hz Pavel Machek
  1 sibling, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-14 18:38 UTC (permalink / raw)
  To: linux-os; +Cc: Nish Aravamudan, Andrew Morton, kernel, pavel, linux-kernel

On Tue, Dec 14, 2004 at 01:22:03PM -0500, linux-os wrote:
> Yield used to not show a spin in `top`.  Also, contrary to
> "popular" opinion, not all events are accompanied by interrupts.

Yes, ppa zip drive has the same issue.

yield shows a spin in top if it's the only running task. Otherwise it
will wait for other tasks to run first. The behaviour has changed a bit
between 2.4 and 2.6, and we changed the corner cases. But the semantics
of yield are still the same.

> If they where, I'd gladly use one of the sleep_on* functions.

Minor detail: sleep_on is obsolete and should be deleted since it
requires the big kernel lock or the global cli to be safe. But I got the
point ;)

> For instance, I need to erase NVRAM (Flash). Then I need to
> program each byte. Waiting for the completion events requires
> polling the hardware. Proper software will give up the CPU
> while waiting and only sample the event, not continually spin.

This is a case where you know when you can expect the hardware to be
done (just like it was the case for the ppa zip). While dealing with
long hardware delays schedule_timeout makes plenty of sense. It would be
pointless to yield and spin, if you know nothing good can happen in the
next millisecond.

> The timeout of (0) was really to make the code more obvious, the
> facts being that we really need to get the CPU back as soon as
> there are no higher-priority tasks computable. If yield() would

With schedule_timeout(1) you're probably going to become interactive,
and you'll be scheduled before other tasks. That's good. I mean the
scheduler sorts things out automatically.

> work like schedule(0), of course I'd use it. The major problem
> with yield() probably has to do with accounting. The machine
> "feels" as though the CPU is properly available when you need
> it, however it appears to be spinning, using 100% system time.
> This makes customers nervous.

It's also a waste of power to spin when you can
schedule_timeout(1).

So as far as I can tell, schedule_timeout(1) is the optimal choice in
this case, while waiting for the hardware to complete.
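
The "sample the hardware, then sleep a tick" pattern can be sketched in userspace as follows. Everything here is hypothetical scaffolding: poll_with_sleep(), the callback types, and the fake flash device exist only so the loop can run outside the kernel. In a driver the sleep callback would correspond to set_current_state() followed by schedule_timeout(1).

```c
#include <stdbool.h>

/* Hypothetical callbacks so the sketch runs in userspace. */
typedef bool (*ready_fn)(void *dev);
typedef void (*sleep_fn)(unsigned int ticks);

/*
 * Sample `is_ready` up to `max_polls` times, sleeping one tick between
 * samples instead of spinning. Returns the number of sleeps taken before
 * the device reported ready, or -1 if it never did.
 */
int poll_with_sleep(ready_fn is_ready, void *dev,
		    sleep_fn sleep_ticks, int max_polls)
{
	int sleeps = 0;

	for (int i = 0; i < max_polls; i++) {
		if (is_ready(dev))
			return sleeps;
		sleep_ticks(1);	/* give up the CPU for about one tick */
		sleeps++;
	}
	return -1;
}

/* Fake flash part: reports busy for `busy_polls` status reads. */
struct fake_flash {
	int busy_polls;
};

bool fake_flash_ready(void *dev)
{
	struct fake_flash *f = dev;

	if (f->busy_polls > 0) {
		f->busy_polls--;
		return false;
	}
	return true;
}

/* Stand-in for the real sleep; does nothing in this sketch. */
void fake_sleep(unsigned int ticks)
{
	(void)ticks;
}
```

The bound on the number of polls is an addition for the sketch, so that a device that never completes cannot keep the caller waiting forever.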


* Re: dynamic-hz
  2004-12-14 18:22                     ` dynamic-hz linux-os
  2004-12-14 18:38                       ` dynamic-hz Andrea Arcangeli
@ 2004-12-14 18:50                       ` Pavel Machek
  1 sibling, 0 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-14 18:50 UTC (permalink / raw)
  To: linux-os
  Cc: Andrea Arcangeli, Nish Aravamudan, Andrew Morton, kernel, linux-kernel

Hi!

> The timeout of (0) was really to make the code more obvious, the
> facts being that we really need to get the CPU back as soon as
> there are no higher-priority tasks computable. If yield() would
> work like schedule(0), of course I'd use it. The major problem
> with yield() probably has to do with accounting. The machine
> "feels" as though the CPU is properly available when you need
> it, however it appears to be spinning, using 100% system time.
> This makes customers nervous.

Well, machine that showed as "idle" yet had cpu running at full speed
would make *me* nervous...
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: dynamic-hz
  2004-12-14 18:29                       ` dynamic-hz Andrea Arcangeli
@ 2004-12-14 19:00                         ` Nish Aravamudan
  0 siblings, 0 replies; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-14 19:00 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-os, Andrew Morton, kernel, pavel, linux-kernel

On Tue, 14 Dec 2004 19:29:00 +0100, Andrea Arcangeli <andrea@suse.de> wrote:
> On Tue, Dec 14, 2004 at 09:42:02AM -0800, Nish Aravamudan wrote:
> > Sorry for my lack of clarity :) I was referring more to the second
> > part of what you said, that the "meaning" of yield() changed for 2.6
> 
> The meaning of yield didn't really change. The behaviour changed a bit
> to allow scalability even if more than one task is polling for a
> resource (potentially even the _same_ resource) using yield().
> 
> But if you were using yield() in 2.4 you shouldn't change to anything
> different than yield() in 2.6. If you get bad latencies under load in
> 2.6, it's simply a gentle reminder that using yield() is always a bad
> idea ;).
> 
> NPTL converted the yield() loops in the slow path of the pthread_mutex to
> event-driven futexes, otherwise 2.6 behaviour would break a lot more than
> OOo.
> 
> In my 2.4-aa I have a sysctl to switch yield between the 2.4 and 2.6
> behaviours. The new behaviour broke OOo and all LinuxThreads apps for
> example, so it was necessary to use a sysctl to control it, even if the
> new yield() behaviour is more correct because it has a chance to scale
> under load.
> 
> Ingo may want to correct me if I remember wrong, I discussed this stuff
> with him at the time.
> 
> > and thus shouldn't be used to wait for short times (see kerneljanitors
> > TODO reference from Matthew Wilcox (search for yield in page):
> > http://www.kerneljanitors.org/TODO).
> 
> The 2.4 yield() could introduce significant latencies too if more than
> one task was looping in yield at the same time for different resources.
> 
> > From the context of the TODO, it seems yield() and schedule_timeout()
> > should not be considered alternatives for each other. Maybe someone
> > can clarify?
> 
> It depends what you're doing. yield() and __set_current_state(..);
> schedule_timeout(1) are similar. I don't think schedule_timeout(0) makes
> much sense (but in practice it works very similarly to
> schedule_timeout(1)). The former will poll ASAP by guaranteeing the CPU
> won't go idle. The latter will make the CPU go idle and it'll wait
> between 1/HZ sec and 2/HZ sec.
> 
> The point is that polling is wrong and you should register into a
> waitqueue and then __set_current_state(..); schedule(). This is exactly
> what NPTL did too, and as far as I can tell it's practically the most
> noticeable feature for optimally written threaded apps. The
> yield/schedule_timeout(1)-without-registering-in-callbacks are just
> tricks for some special code.  For example I used myself
> schedule_timeout(1) in the oom killer patch a few days ago, but that
> code runs only when the machine is out of memory and several tasks will
> try to kill something at the same time. At that time the cpu load really
> doesn't matter. So tricks like that are ok in corner cases where
> performance cannot matter at all. For fast paths or regular code, yield
> should not be used (and schedule_timeout(1) used as a yield won't be
> much better).
> 
> Conceptually if you want to poll as soon as possible you should use
> yield(). If you want to wait and give some idle time to the cpu you
> should use schedule_timeout().
> 
> You should ignore the claim that yield isn't appropriate in 2.6 for
> waiting short periods of time, yield is still the API to use for polling
> while keeping the cpu busy. If the machine is overloaded then it will
> take a while to get back to the polling loop with 2.6, but then 2.4 had
> other corner cases with the machine overloaded by userspace tasks
> calling sched_yield too. So it's not really that much different in terms
> of the guarantees that yield can provide between 2.4/2.6. The only
> guarantee that yield can provide is that the cpu will remain busy, and
> that you'll be rescheduled if some other task is pending in the
> runqueue. It can't provide any guarantee on when you'll become running
> again.
> 
> > > I guess we could change schedule_timeout() to WARN_ON if 0 is being
> > > passed to it.
> >
> > I will see if anyone is actually calling with 0 -- I don't remember
> 
> It's not that bad, I mean schedule_timeout(0) works fine, but once in a
> while it may not wait anything and just return after invoking a timer
> callback. So if somebody uses schedule_timeout, it's because he wants
> always to make the cpu go idle for a little bit, and in turn it would be
> better to use 1 (0 doesn't guarantee to go idle).
> 
> > seeing this for my previous sets of patches, but it may happen if HZ
> > changes in value.
> 
> The HZ errors are just due to the lack of roundup, and schedule_timeout
> can't do anything about it, only the caller can (it's a problem even for
> other HZ values that generate rounding errors, and that's why HZ=100 and
> HZ=1000 are the only two really supported frequencies to freely switch
> at boot time ;).

Great! Thanks a lot for all of the clarifications!

-Nish


* Re: dynamic-hz
  2004-12-14  2:46                         ` dynamic-hz Andrea Arcangeli
@ 2004-12-14 19:24                           ` john stultz
  0 siblings, 0 replies; 126+ messages in thread
From: john stultz @ 2004-12-14 19:24 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Pavel Machek, Zwane Mwaikambo, Con Kolivas, lkml

On Mon, 2004-12-13 at 18:46, Andrea Arcangeli wrote:
> On Mon, Dec 13, 2004 at 12:34:00PM -0800, john stultz wrote:
> > source (ie: the TSC or ACPIPM or HPET or whatever). Check out my
> 
> How long is the TSC calibration going to last before introducing visible
> errors? Is there any error introduced while we transfer the accuracy of
> the pit to the accuracy of the TSC during calibration? It would be much
> simpler to only use the TSC to provide system time, but I assume we
> would be already doing it, if it wasn't for the lost accuracy.

Well, the TSC is a terrible time source. Currently when interpolating,
the error between the TSC and the PIT allows for time inconsistencies. 
When using it as the sole timesource, accurate calibration does become
much more important, because we do accumulate the error.  However, NTP
or other methods of correcting for poor calibration or drift could be
used. 

I realize not everything can use NTP, but George Anzinger has some code
that would use the PIT to measure and adjust the TSC frequency values.
Unfortunately I haven't gotten around to looking at it yet.


> Plus are you already handling cpufreq changed every second by
> powersaved? Doesn't that introduce further inaccuracy in the system
> time?

Yea, my code currently doesn't have cpufreq hooks, but the cpufreq
notifier would act as an interrupt which would save off the accumulated
time at the old frequency and update the time source with the new
frequency.


> As for the lost-tick compensation, it's not working at all, my system
> goes as fast in the future as it would go in the past by disabling it.
> So the only effect I get by the lost tick compensation is that it's
> moving in the future instead of in the past, but the magnitude of the
> error is the same and in turn it's not working at all. The real bug is
> the USB irq handler that takes 3/4msec to execute and I get a constant
> load of those irqs from the adsl modem.

I agree. Fixing the irq handler is the right solution.

thanks
-john
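
The cpufreq idea above (save off the time accumulated at the old frequency, then continue at the new one) can be sketched in userspace like this. The struct and function names are hypothetical and are not taken from the time-source patches under discussion. The plain multiply is an assumption that only holds for short intervals; real code would have to avoid the 64-bit overflow.

```c
#include <stdint.h>

/* Hypothetical cycle-counter based clock. */
struct cycle_clock {
	uint64_t base_ns;	/* time banked up to the last frequency change */
	uint64_t base_cycles;	/* counter value at the last frequency change */
	uint64_t freq_hz;	/* current counter frequency */
};

/* Current time in ns for counter reading `cycles`. */
uint64_t cycle_clock_read(const struct cycle_clock *c, uint64_t cycles)
{
	/* Sketch only: overflows if the interval since base is too long. */
	return c->base_ns +
	       (cycles - c->base_cycles) * 1000000000ULL / c->freq_hz;
}

/*
 * What a frequency-change hook would do: bank the time that elapsed at
 * the old rate, then restart the interval at the new rate.
 */
void cycle_clock_set_freq(struct cycle_clock *c, uint64_t cycles,
			  uint64_t new_freq_hz)
{
	c->base_ns = cycle_clock_read(c, cycles);
	c->base_cycles = cycles;
	c->freq_hz = new_freq_hz;
}
```

Because the elapsed time is converted before the frequency is replaced, cycles counted at the old rate are never reinterpreted at the new one, which is the inaccuracy the question was about.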



* Re: dynamic-hz
  2004-12-14  9:37                             ` dynamic-hz Pavel Machek
@ 2004-12-14 21:18                               ` Tony Lindgren
  2004-12-14 22:06                                 ` dynamic-hz Pavel Machek
  0 siblings, 1 reply; 126+ messages in thread
From: Tony Lindgren @ 2004-12-14 21:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

* Pavel Machek <pavel@suse.cz> [041214 01:38]:
> Hi!
> 
> > > > > > > But that does not matter, right? Yes, one-shot timer will not fire
> > > > > > > exactly at right place, but as long as you are reading TSC and basing
> > > > > > > next shot on current time, error should not accumulate.
> > > > > > 
> > > > > > As said in the rest of the message, the error (or some other error)
> > > > > > accumulates heavily today in the tick-loss compensation/adjustment
> > > > > > algorithm in arch/i386/kernel/timers/timer_tsc.c, so I'm sceptical
> > > > > > about
> > > > > 
> > > > > I do not see how it should accumulate. Lets have working TSC. You want
> > > > > to emulate fixed-period timer with single-shot timer.
> > > > 
> > > > > It's caused by the fact that we don't use the TSC to accumulate time.
> > > > We are instead interpolating between timer ticks and the TSC, where
> > > 
> > > Yes, it was supposed to be simple, so that Andrea understands that
> > > there's nothing inherently broken with single-shot timers.
> > 
> > Just a quick comment; The timer does not need to be single-shot 
> > all the time, it can be a combination of continuous and variable
> > length timer, and it can change depending on the system load.
> > 
> > We recently added VST support for OMAP in linux-omap bk tree, and 
> > made some changes to the previous VST implementations that might be
> > of interest:
> ...
> > The patch in question is at:
> > 
> > http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18
> 
> Wow, that's basically 8 lines of code plus driver for new
> hardware... Is it really that simple?

Yeah, the key things are reprogramming the timer in the idle loop
based on next_timer_interrupt(), and calling timer_interrupt from
other interrupts as well :)

Should we try a similar patch for x86/amd64? I'm not sure which timers
to use though? One should be programmable length for the interrupt, 
and the other continuous for the timekeeping.

BTW, looks like my upgraded mail server is still a bit messed up, and
my original post did not make it to the list. But most of the message
is quoted above anyways. Here's the link to the patch again as
tinyurl:

http://tinyurl.com/69n4k

Tony
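
A rough userspace sketch of the first half of that idea, deciding how far ahead the idle loop may program the timer. idle_sleep_ticks() and hw_max_ticks are hypothetical names. The next_expiry argument stands for whatever next_timer_interrupt() reports, and the clamp to a hardware maximum is an assumption about the timer, not something taken from the OMAP patch.

```c
/*
 * Ticks the idle loop may sleep before the next pending timer is due.
 * `now` and `next_expiry` are jiffies-style counters that may wrap.
 */
unsigned long idle_sleep_ticks(unsigned long now, unsigned long next_expiry,
			       unsigned long hw_max_ticks)
{
	/* Signed difference keeps the comparison correct across wraparound. */
	long delta = (long)(next_expiry - now);

	if (delta < 1)
		return 1;		/* already due: wake on the next tick */
	if ((unsigned long)delta > hw_max_ticks)
		return hw_max_ticks;	/* hardware cannot count that far */
	return (unsigned long)delta;
}
```

The second half, catching up the tick count when some other interrupt wakes the CPU early, would read the continuous timer and is not shown here.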


* USB making time drift [was Re: dynamic-hz]
  2004-12-14 15:25                           ` dynamic-hz Andrea Arcangeli
@ 2004-12-14 22:02                             ` Pavel Machek
  2004-12-14 23:16                               ` Andrea Arcangeli
  2004-12-16  1:15                               ` Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] Pavel Machek
  0 siblings, 2 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-14 22:02 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> On Tue, Dec 14, 2004 at 10:59:39AM +0100, Pavel Machek wrote:
> > Are you using CONFIG_HPET_TIMER by chance? It seems to be missing some
> > strategic -1, TSC (etc) get it right.
> 
> I'm not using hpet because it's an old hardware, this is with timer_tsc.
> It must be reproducible in any machine out there, especially with
> machines with usb it should be reproducible even without any userspace
> testcase doing iopl/cli/sti. Time will go silently in the future at every
> usb irq (they often last 3/4msec).

How much drift do you see?

I have machine with UHCI here, and am using usb most of the time
(bluetooth for gprs connection), and did not notice too bad
drift. ntpdate does some adjustment each time I connect to the
network, but it 

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: dynamic-hz
  2004-12-14 21:18                               ` dynamic-hz Tony Lindgren
@ 2004-12-14 22:06                                 ` Pavel Machek
  2004-12-14 23:00                                   ` dynamic-hz linux-os
  2004-12-14 23:04                                   ` dynamic-hz Tony Lindgren
  0 siblings, 2 replies; 126+ messages in thread
From: Pavel Machek @ 2004-12-14 22:06 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

Hi!

> > > The patch in question is at:
> > > 
> > > http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18
> > 
> > Wow, that's basically 8 lines of code plus driver for new
> > hardware... Is it really that simple?
> 
> Yeah, the key things are reprogramming the timer in the idle loop
> based on next_timer_interrupt(), and calling timer_interrupt from
> other interrupts as well :)
> 
> Should we try a similar patch for x86/amd64? I'm not sure which timers
> to use though? One should be programmable length for the interrupt, 
> and the other continuous for the timekeeping.

Yes, it would certainly be interesting. 5% power savings, and no
singing capacitors, while keeping HZ=1000. Sounds good to me.

There are about 1000 timers available in PC, each having its own
quirks. CMOS clock should be able to generate 1024Hz periodic timer
(we currently do not use) and TSC we currently use for periodic timer
should be usable in single-shot mode.
								Pavel
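
The 1024Hz figure matches the usual default periodic rate of the
MC146818-style RTC found in PCs. A minimal sketch of the rate-select
arithmetic, assuming the standard 32.768 kHz time base; the function
name rtc_periodic_hz() is made up for illustration and is not a kernel
interface:

```c
/*
 * Periodic interrupt rate, in Hz, for an MC146818-style RTC given the
 * 4-bit rate-select value (RS3..RS0 in register A).  Assumes the
 * 32.768 kHz time base.  Returns 0 when the periodic interrupt is
 * disabled (rs == 0) or the value is out of range.
 */
unsigned int rtc_periodic_hz(unsigned int rs)
{
	if (rs == 0 || rs > 15)
		return 0;
	if (rs <= 2)
		return 32768u >> (rs + 6);	/* rs=1 -> 256 Hz, rs=2 -> 128 Hz */
	return 32768u >> (rs - 1);		/* rs=3 -> 8192 Hz ... rs=15 -> 2 Hz */
}
```

With rs = 6 this gives 1024 Hz, i.e. a period of roughly 976.6 usec,
which is close to but not exactly a HZ=1000 tick.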
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-12 23:36     ` dynamic-hz Con Kolivas
                         ` (5 preceding siblings ...)
  2004-12-13 12:00       ` dynamic-hz Alan Cox
@ 2004-12-14 22:28       ` Lee Revell
  2004-12-14 22:40         ` dynamic-hz Con Kolivas
  6 siblings, 1 reply; 126+ messages in thread
From: Lee Revell @ 2004-12-14 22:28 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrea Arcangeli, Pavel Machek, linux-kernel

On Mon, 2004-12-13 at 10:36 +1100, Con Kolivas wrote:
> The performance benefit, if any, is often lost in noise during 
> benchmarks and when there, is less than 1%.

I have measured 2.1-2.3% residency for the timer ISR on my 600MHz VIA
C3.  And this is a desktop - you have many many embedded systems that
are slower.  For these systems the difference is very real.

I would certainly expect it to be lost in the noise on a 2GHz machine.

Lee


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14 22:28       ` dynamic-hz Lee Revell
@ 2004-12-14 22:40         ` Con Kolivas
  2004-12-14 22:50           ` dynamic-hz Lee Revell
  0 siblings, 1 reply; 126+ messages in thread
From: Con Kolivas @ 2004-12-14 22:40 UTC (permalink / raw)
  To: Lee Revell; +Cc: Con Kolivas, Andrea Arcangeli, Pavel Machek, linux-kernel

Lee Revell writes:

> On Mon, 2004-12-13 at 10:36 +1100, Con Kolivas wrote:
>> The performance benefit, if any, is often lost in noise during 
>> benchmarks and when there, is less than 1%.
> 
> I have measured 2.1-2.3% residency for the timer ISR on my 600MHz VIA
> C3.  And this is a desktop - you have many many embedded systems that
> are slower.  For these systems the difference is very real.

Could you explain residency and its relevance to throughput please? I've 
not heard this term before.

Cheers,
Con


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14 22:40         ` dynamic-hz Con Kolivas
@ 2004-12-14 22:50           ` Lee Revell
  0 siblings, 0 replies; 126+ messages in thread
From: Lee Revell @ 2004-12-14 22:50 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrea Arcangeli, Pavel Machek, linux-kernel

On Wed, 2004-12-15 at 09:40 +1100, Con Kolivas wrote:
> Lee Revell writes:
> 
> > On Mon, 2004-12-13 at 10:36 +1100, Con Kolivas wrote:
> >> The performance benefit, if any, is often lost in noise during 
> >> benchmarks and when there, is less than 1%.
> > 
> > I have measured 2.1-2.3% residency for the timer ISR on my 600MHz VIA
> > C3.  And this is a desktop - you have many many embedded systems that
> > are slower.  For these systems the difference is very real.
> 
> Could you explain residency and its relevance to throughput please? I've 
> not heard this term before.
> 

It means 2.1-2.3% of wallclock time is spent running the timer interrupt
handler.  IOW, it runs for 21-23 usecs, 1000x per second.

Lee 
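
The arithmetic behind that figure can be written down as a small
sketch. The helper names below are illustrative only; the numbers are
the ones quoted in this message (21-23 usecs per tick, 1000 ticks per
second):

```c
#include <stdio.h>

/*
 * Fraction of wall-clock time spent in a handler that runs `hz` times
 * per second and takes `usecs_per_run` microseconds on each run.
 */
double isr_residency(unsigned int hz, double usecs_per_run)
{
	return (hz * usecs_per_run) / 1e6;
}

/* Print the residency as a percentage for one tick rate. */
void print_residency(unsigned int hz, double usecs_per_run)
{
	printf("HZ=%u, %.0f usec per tick: %.2f%% of wall-clock time\n",
	       hz, usecs_per_run, 100.0 * isr_residency(hz, usecs_per_run));
}
```

Assuming the handler cost per tick stays the same, the same 21 usec
handler at HZ=100 would account for about 0.21% instead of 2.1%.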


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-13  8:29       ` dynamic-hz Jan Engelhardt
@ 2004-12-14 22:54         ` Lee Revell
  2004-12-14 23:38           ` dynamic-hz Chris Friesen
  0 siblings, 1 reply; 126+ messages in thread
From: Lee Revell @ 2004-12-14 22:54 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-kernel

On Mon, 2004-12-13 at 09:29 +0100, Jan Engelhardt wrote:
> > Just being devils advocate here...
> >
> > I had variable Hz in my tree for a while and found there was one solitary
> > purpose to setting Hz to 100; to silence cheap capacitors.
> >
> > The rest of my users that were setting Hz to 100 for so-called performance
> > gains were doing so under the false impression that cpu usage was lower simply
> > because of the woefully inaccurate cpu usage calculation at 100Hz.
> 
> I have found that mplayer drops audio less often when the harddisk is under 
> load.
> 

Ugh, because mplayer stupidly does disk i/o and AV playback and GUI in
the same thread.  Insert Xine plug.

Lee


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14 22:06                                 ` dynamic-hz Pavel Machek
@ 2004-12-14 23:00                                   ` linux-os
  2004-12-14 23:13                                     ` dynamic-hz Tony Lindgren
  2004-12-14 23:04                                   ` dynamic-hz Tony Lindgren
  1 sibling, 1 reply; 126+ messages in thread
From: linux-os @ 2004-12-14 23:00 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Tony Lindgren, john stultz, Andrea Arcangeli, Zwane Mwaikambo,
	Con Kolivas, lkml

On Tue, 14 Dec 2004, Pavel Machek wrote:

> Hi!
>
>>>> The patch in question is at:
>>>>
>>>> http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18
>>>
>>> Wow, that's basically 8 lines of code plus driver for new
>>> hardware... Is it really that simple?
>>
>> Yeah, the key things are reprogramming the timer in the idle loop
>> based on next_timer_interrupt(), and calling timer_interrupt from
>> other interrupts as well :)
>>
>> Should we try a similar patch for x86/amd64? I'm not sure which timers
>> to use though? One should be programmable length for the interrupt,
>> and the other continuous for the timekeeping.
>
> Yes, it would certainly be interesting. 5% power savings, and no
> singing capacitors, while keeping HZ=1000. Sounds good to me.
>
> There are about 1000 timers available in PC, each having its own
> quirks. CMOS clock should be able to generate 1024Hz periodic timer
> (we currently do not use) and TSC we currently use for periodic timer
> should be usable in single-shot mode.
> 								Pavel
> --

If you use that RTC timer, it needs to be something that can be
turned OFF. Many embedded applications use that because it's the
only timer that the OS doesn't muck with. It also has very low
noise which makes it a good timing source for IIR filters for
high precision, low data-rate data acquisition (like 24 bits).

Since it generates an edge, its interrupt can't be shared.
I certainly hope that you don't use it. One can read the
time without disturbing the interrupt rate. One just
needs to use the existing rtc_lock and not spin with
the lock being held.

Currently the kernel RTC software allocates the RTC interrupt
even though it doesn't use it. This makes it necessary to
compile the RTC as a module and then remove it when another
driver needs to use the RTC interrupt source.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by John Ashcroft.
                  98.36% of all statistics are fiction.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14 22:06                                 ` dynamic-hz Pavel Machek
  2004-12-14 23:00                                   ` dynamic-hz linux-os
@ 2004-12-14 23:04                                   ` Tony Lindgren
  1 sibling, 0 replies; 126+ messages in thread
From: Tony Lindgren @ 2004-12-14 23:04 UTC (permalink / raw)
  To: Pavel Machek
  Cc: john stultz, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

* Pavel Machek <pavel@suse.cz> [041214 14:07]:
> Hi!
> 
> > > > The patch in question is at:
> > > > 
> > > > http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18
> > > 
> > > Wow, that's basically 8 lines of code plus driver for new
> > > hardware... Is it really that simple?
> > 
> > Yeah, the key things are reprogramming the timer in the idle loop
> > based on next_timer_interrupt(), and calling timer_interrupt from
> > other interrupts as well :)
> > 
> > Should we try a similar patch for x86/amd64? I'm not sure which timers
> > to use though? One should be programmable length for the interrupt, 
> > and the other continuous for the timekeeping.
> 
> Yes, it would certainly be interesting. 5% power savings, and no
> singing capacitors, while keeping HZ=1000. Sounds good to me.
> 
> There are about 1000 timers available in PC, each having its own
> quirks. CMOS clock should be able to generate 1024Hz periodic timer
> (we currently do not use) and TSC we currently use for periodic timer
> should be usable in single-shot mode.

I guess you mean to use the CMOS clock for continuous timer, and TSC
for periodic timer?

OK, I'll take a look at it later this week or over the weekend.

Haven't looked at the x86 timer code for a while, but I think
I'll set up a new clock where we can just register a timer update
function and a periodic tick function. That way we can easily use 
whatever hardware timers are available.

Tony
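
One way to picture the split described above (a continuous source for
timekeeping, a programmable source for the tick) is a pair of
registration hooks. This is only a sketch under that assumption: the
struct, the function names and the fake backends are all hypothetical
and do not correspond to existing kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one hardware timer backend. */
struct hw_clock {
	const char *name;
	uint64_t (*read_usecs)(void);		/* continuous counter, for timekeeping */
	void (*program_tick)(uint64_t usecs);	/* arm the next tick interrupt */
};

static const struct hw_clock *timekeeping_clock;	/* continuous source */
static const struct hw_clock *tick_clock;		/* interrupt source */

/* Register the source used to update the time; needs read_usecs. */
int register_timekeeping_clock(const struct hw_clock *clk)
{
	if (clk == NULL || clk->read_usecs == NULL)
		return -1;
	timekeeping_clock = clk;
	return 0;
}

/* Register the source used to generate ticks; needs program_tick. */
int register_tick_clock(const struct hw_clock *clk)
{
	if (clk == NULL || clk->program_tick == NULL)
		return -1;
	tick_clock = clk;
	return 0;
}

/* Idle-loop hook: arm the tick source for the next pending timer. */
int idle_program_next_tick(uint64_t usecs_until_next_timer)
{
	if (tick_clock == NULL)
		return -1;
	tick_clock->program_tick(usecs_until_next_timer);
	return 0;
}

/* Timer update: just a read from the continuous source (0 if none). */
uint64_t clock_now_usecs(void)
{
	return timekeeping_clock ? timekeeping_clock->read_usecs() : 0;
}

/* Stand-in backends so the sketch can be exercised without hardware. */
static uint64_t fake_counter_usecs;
uint64_t fake_last_programmed_usecs;

static uint64_t fake_read(void)
{
	return fake_counter_usecs;
}

static void fake_program(uint64_t usecs)
{
	fake_last_programmed_usecs = usecs;
}

const struct hw_clock fake_continuous = { "fake-continuous", fake_read, NULL };
const struct hw_clock fake_oneshot = { "fake-oneshot", NULL, fake_program };
```

Keeping the two roles behind separate hooks would let a platform pair
whatever hardware it has, for example one timer that can only count
with another that can only interrupt.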

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14 23:00                                   ` dynamic-hz linux-os
@ 2004-12-14 23:13                                     ` Tony Lindgren
  2004-12-22 20:02                                       ` dynamic-hz Tony Lindgren
  0 siblings, 1 reply; 126+ messages in thread
From: Tony Lindgren @ 2004-12-14 23:13 UTC (permalink / raw)
  To: linux-os
  Cc: Pavel Machek, john stultz, Andrea Arcangeli, Zwane Mwaikambo,
	Con Kolivas, lkml

* linux-os <linux-os@chaos.analogic.com> [041214 15:04]:
> On Tue, 14 Dec 2004, Pavel Machek wrote:
> 
> >Hi!
> >
> >>>>The patch in question is at:
> >>>>
> >>>>http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18
> >>>
> >>>Wow, that's basically 8 lines of code plus driver for new
> >>>hardware... Is it really that simple?
> >>
> >>Yeah, the key things are reprogramming the timer in the idle loop
> >>based on next_timer_interrupt(), and calling timer_interrupt from
> >>other interrupts as well :)
> >>
> >>Should we try a similar patch for x86/amd64? I'm not sure which timers
> >>to use though? One should be programmable length for the interrupt,
> >>and the other continuous for the timekeeping.
> >
> >Yes, it would certainly be interesting. 5% power savings, and no
> >singing capacitors, while keeping HZ=1000. Sounds good to me.
> >
> >There are about 1000 timers available in PC, each having its own
> >quirks. CMOS clock should be able to generate 1024Hz periodic timer
> >(we currently do not use) and TSC we currently use for periodic timer
> >should be usable in single-shot mode.
> >								Pavel
> >--
> 
> If you use that RTC timer, it needs to be something that can be
> turned OFF. Many embedded applications use that because it's the
> only timer that the OS doesn't muck with. It also has very low
> noise which makes it a good timing source for IIR filters for
> high precision, low data-rate data acquisition (like 24 bits).

OK, thanks for the information. That could be the continuous timer
then, and TSC the periodic timer.

> Since it generates an edge, its interrupt can't be shared.
> I certainly hope that you don't use it. One can read the
> time without disturbing the interrupt rate. One just
> needs to use the existing rtc_lock and not spin with
> the lock being held.

Yeah, the timer update would be just a read from the RTC timer.

> Currently the kernel RTC software allocates the RTC interrupt
> even though it doesn't use it. This makes it necessary to
> compile the RTC as a module and then remove it when another
> driver needs to use the RTC interrupt source.

The interrupt could be used for timer wrap only.

Tony

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-14 22:02                             ` USB making time drift [was Re: dynamic-hz] Pavel Machek
@ 2004-12-14 23:16                               ` Andrea Arcangeli
  2004-12-15  2:59                                 ` Gene Heskett
  2004-12-16  0:58                                 ` Pavel Machek
  2004-12-16  1:15                               ` Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] Pavel Machek
  1 sibling, 2 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-14 23:16 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

On Tue, Dec 14, 2004 at 11:02:39PM +0100, Pavel Machek wrote:
> How much drift do you see?

huge drift, minutes per hour or similar.

> I have machine with UHCI here, and am using usb most of the time
> (bluetooth for gprs connection), and did not notice too bad
> drift. ntpdate does some adjustment each time I connect to the
> network, but it 

Could be it happens only with my usb chipset or only with the adsl modem
with the usermode driver.

You can just write the proggy doing iopl cli/sti in a loop (keeping irqs
off for 3/4msec a few times per second like my usb modem does), you
should be able to see the drift in any machine without requiring an adsl
modem.

This was the status of my last attempt to fix it a few weeks ago. The
patch fixes a few unrelated bits, but its core is actually wrong: the
previous code did the right thing, even if this works better in
practice. So I had not much motivation to extract the good bits until I
find the source of the big screwup in system time.

I probably should do any further debugging with a userspace simulation
(i.e. the iopl + cli/sti in a loop) within qemu.

--- sp1/arch/i386/kernel/timers/timer_tsc.c.~1~	2004-04-04 08:08:48.000000000 +0200
+++ sp1/arch/i386/kernel/timers/timer_tsc.c	2004-11-22 06:01:21.725371368 +0100
@@ -39,6 +39,7 @@ static unsigned long last_tsc_low; /* ls
 static unsigned long last_tsc_high; /* msb 32 bits of Time Stamp Counter */
 static unsigned long long monotonic_base;
 static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
+static int report_lost_ticks; /* command line option */
 
 /* convert from cycles(64bits) => nanoseconds (64bits)
  *  basic equation:
@@ -69,8 +70,6 @@ static inline unsigned long long cycles_
 }
 
 
-static int count2; /* counter for mark_offset_tsc() */
-
 /* Cached *multiplier* to convert TSC counts to microseconds.
  * (see the equation below).
  * Equal to 2^32 * (1 / (clocks per usec) ).
@@ -153,11 +152,12 @@ unsigned long long sched_clock(void)
 
 static void mark_offset_tsc(void)
 {
-	unsigned long lost,delay;
+	unsigned long ticks;
 	unsigned long delta = last_tsc_low;
-	int count;
-	int countmp;
-	static int count1 = 0;
+	unsigned int count;
+	unsigned int countmp;
+	static unsigned int count1 = 0, count2 = LATCH;
+
 	unsigned long long this_offset, last_offset;
 	static int lost_count = 0;
 	
@@ -175,12 +175,11 @@ static void mark_offset_tsc(void)
 	 * has the SA_INTERRUPT flag set. -arca
 	 */
 	
-	/* read Pentium cycle counter */
-
-	rdtsc(last_tsc_low, last_tsc_high);
 
 	spin_lock(&i8253_lock);
-	outb_p(0x00, PIT_MODE);     /* latch the count ASAP */
+
+	/* read Pentium cycle counter and latch the count ASAP */
+	rdtsc(last_tsc_low, last_tsc_high); outb_p(0x00, PIT_MODE);
 
 	count = inb_p(PIT_CH0);    /* read the latched count */
 	count |= inb(PIT_CH0) << 8;
@@ -198,7 +197,7 @@ static void mark_offset_tsc(void)
 
 	spin_unlock(&i8253_lock);
 
-	if (pit_latch_buggy) {
+	if (unlikely(pit_latch_buggy)) {
 		/* get center value of last 3 time lutch */
 		if ((count2 >= count && count >= count1)
 		    || (count1 >= count && count >= count2)) {
@@ -223,11 +222,10 @@ static void mark_offset_tsc(void)
 		 "0" (eax));
 		delta = edx;
 	}
-	delta += delay_at_last_interrupt;
-	lost = delta/(1000000/HZ);
-	delay = delta%(1000000/HZ);
-	if (lost >= 2) {
-		jiffies_64 += lost-1;
+	//delta += delay_at_last_interrupt;
+	ticks = delta/(1000000/HZ);
+	if (unlikely(ticks >= 2)) {
+		jiffies_64 += ticks-1;
 
 		/* sanity check to ensure we're not always losing ticks */
 		if (lost_count++ > 100) {
@@ -241,6 +239,20 @@ static void mark_offset_tsc(void)
 
 			clock_fallback();
 		}
+
+		{
+			static u64 last_lost_tick;
+			if (last_lost_tick <= jiffies_64) {
+				printk(KERN_WARNING "Compensate %ld timer tick(s)\n", ticks-1);
+				dump_stack();
+				if  (report_lost_ticks)
+					/* max 1 per sec */
+					last_lost_tick = jiffies_64 + HZ;
+				else
+					/* force dump of lost ticks information not more than 1 per day */
+					last_lost_tick = jiffies_64 + 60*60*24*HZ;
+			}
+		}
 	} else
 		lost_count = 0;
 	/* update the monotonic base value */
@@ -248,16 +260,14 @@ static void mark_offset_tsc(void)
 	monotonic_base += cycles_2_ns(this_offset - last_offset);
 	write_sequnlock(&monotonic_lock);
 
+	/* Some i8253 clones hold the LATCH value visible
+	   momentarily as they flip back to zero */
+	if (unlikely(count == LATCH))
+		count--;
+
 	/* calculate delay_at_last_interrupt */
 	count = ((LATCH-1) - count) * TICK_SIZE;
 	delay_at_last_interrupt = (count + LATCH/2) / LATCH;
-
-	/* catch corner case where tick rollover occured 
-	 * between tsc and pit reads (as noted when 
-	 * usec delta is > 90% # of usecs/tick)
-	 */
-	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
-		jiffies_64++;
 }
 
 static void delay_tsc(unsigned long loops)
@@ -433,8 +443,6 @@ static int __init init_tsc(char* overrid
  	 *	moaned if you have the only one in the world - you fix it!
  	 */
 
-	count2 = LATCH; /* initialize counter for mark_offset_tsc() */
-
 	if (cpu_has_tsc) {
 		unsigned long tsc_quotient;
 #ifdef CONFIG_HPET_TIMER
@@ -502,7 +510,12 @@ static int __init tsc_setup(char *str)
 #endif
 __setup("notsc", tsc_setup);
 
-
+static int __init report_lost_ticks_setup(char *str)
+{
+	report_lost_ticks = 1;
+	return 1;
+}
+__setup("report_lost_ticks", report_lost_ticks_setup);
 
 /************************************************************/
 
--- sp1/arch/i386/kernel/irq.c.~1~	2004-11-21 02:37:25.000000000 +0100
+++ sp1/arch/i386/kernel/irq.c	2004-11-22 07:03:15.140846408 +0100
@@ -217,14 +217,16 @@ inline void synchronize_irq(unsigned int
 int handle_IRQ_event(unsigned int irq,
 		struct pt_regs *regs, struct irqaction *action)
 {
-	int status = 1;	/* Force the "do bottom halves" bit */
+	int status = 0;
 	int retval = 0;
 
 	TRIG_EVENT(irq_entry_hook, irq, regs, !(user_mode(regs)));
-	if (!(action->flags & SA_INTERRUPT))
-		local_irq_enable();
-
 	do {
+		if (action->flags & SA_INTERRUPT)
+			local_irq_disable();
+		else
+			local_irq_enable();
+
 		status |= action->flags;
 		retval |= action->handler(irq, action->dev_id, regs);
 		action = action->next;

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14 22:54         ` dynamic-hz Lee Revell
@ 2004-12-14 23:38           ` Chris Friesen
  2004-12-15  8:32             ` dynamic-hz Jan Engelhardt
  0 siblings, 1 reply; 126+ messages in thread
From: Chris Friesen @ 2004-12-14 23:38 UTC (permalink / raw)
  To: Lee Revell; +Cc: Jan Engelhardt, linux-kernel

Lee Revell wrote:

> Ugh, because mplayer stupidly does disk i/o and AV playback and GUI in
> the same thread.  Insert Xine plug.

This is not a problem as long as all of them can be done totally async.  As soon 
as anything blocks, then there's an issue.

Chris

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-14 23:16                               ` Andrea Arcangeli
@ 2004-12-15  2:59                                 ` Gene Heskett
  2004-12-15  9:17                                   ` Andrea Arcangeli
  2004-12-16  0:58                                 ` Pavel Machek
  1 sibling, 1 reply; 126+ messages in thread
From: Gene Heskett @ 2004-12-15  2:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrea Arcangeli, Pavel Machek, Zwane Mwaikambo, Con Kolivas

On Tuesday 14 December 2004 18:16, Andrea Arcangeli wrote:
>On Tue, Dec 14, 2004 at 11:02:39PM +0100, Pavel Machek wrote:
>> How much drift do you see?
>
>huge drift, minutes per hour or similar.

Which way?  I was running quite fast here, several minutes an
hour, then I discovered the tickadj command, found its default
was 10000, and started reducing it.  At 9926, I'm staying within
a sec an hour now.  I have no idea when this started, I didn't
discover it till I had already been running Ingo's realtime
patches for a while, then checked with a stock 2.6.9 and found it
was doing it then.

[...]

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14 23:38           ` dynamic-hz Chris Friesen
@ 2004-12-15  8:32             ` Jan Engelhardt
  0 siblings, 0 replies; 126+ messages in thread
From: Jan Engelhardt @ 2004-12-15  8:32 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Lee Revell, linux-kernel

>> Ugh, because mplayer stupidly does disk i/o and AV playback and GUI in
>> the same thread.  Insert Xine plug.

There has been a real flame war on using threads in mplayer -- it ended in 
a fork into mplayer-xp. Surprisingly, the problem is not mplayer. Using the 
OSS kernel modules instead of ALSA, audio may drop, but it does not _skip_ it.

> This is not a problem as long as all of them can be done totally async.  As
> soon as anything blocks, then there's an issue.

Is there a way I can prioritize mplayer to get disk i/o done first?



Jan Engelhardt
-- 
ENOSPC

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15  2:59                                 ` Gene Heskett
@ 2004-12-15  9:17                                   ` Andrea Arcangeli
  2004-12-15 16:44                                     ` Gene Heskett
  2004-12-15 17:03                                     ` Gene Heskett
  0 siblings, 2 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-15  9:17 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Pavel Machek, Zwane Mwaikambo, Con Kolivas

On Tue, Dec 14, 2004 at 09:59:23PM -0500, Gene Heskett wrote:
> Which way?  I was running quite fast here, several minutes an

Into the future; if I disable the logic it goes into the past at the
same speed it was previously going into the future.

> hour, then I discovered the tickadj command, found its default
> was 10000, and started reducing it.  At 9926, I'm staying within
> a sec an hour now.  I have no idea when this started, I didn't

That seems quite a hack; note I did a hack too and it makes the drift
much smaller (it gets manageable). But our modifications are wrong.

The point is that this didn't happen with HZ=100, so it's not that
tickadj is wrong, it's the tick adjustment code that doesn't work.

You may want to recompile your kernel with HZ=100 and verify it goes
away (I didn't verify myself, but I verified the max irq latency I get
is 4msec, and in turn I'm sure HZ=100 would fix it)
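
A rough model of why a 4msec irq latency is invisible at HZ=100 but
triggers tick compensation at HZ=1000. It assumes the compensation
rule from the patch earlier in this thread (ticks = delta/(1000000/HZ),
add ticks-1 when ticks >= 2) and ignores delay_at_last_interrupt; the
function names are invented for the illustration:

```c
/*
 * Extra jiffies the tick handler would add when `delta_usec`
 * microseconds have passed since the previous timer interrupt.
 */
unsigned long compensated_ticks(unsigned long delta_usec, unsigned long hz)
{
	unsigned long ticks = delta_usec / (1000000UL / hz);

	return ticks >= 2 ? ticks - 1 : 0;
}

/*
 * Simplified worst case: the previous tick was handled on time and the
 * next one is held off for `irq_off_usec`, so the handler sees one
 * tick period plus the time interrupts were blocked.
 */
unsigned long worst_case_delta(unsigned long irq_off_usec, unsigned long hz)
{
	return 1000000UL / hz + irq_off_usec;
}
```

In this model a 4000 usec delay gives a 5000 usec delta at HZ=1000, so
the handler compensates 4 ticks, while at HZ=100 the delta is 14000
usec, still less than two tick periods, and nothing is compensated.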

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15  9:17                                   ` Andrea Arcangeli
@ 2004-12-15 16:44                                     ` Gene Heskett
  2004-12-15 18:20                                       ` Andrea Arcangeli
  2004-12-15 20:16                                       ` Pavel Machek
  2004-12-15 17:03                                     ` Gene Heskett
  1 sibling, 2 replies; 126+ messages in thread
From: Gene Heskett @ 2004-12-15 16:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrea Arcangeli, Pavel Machek, Zwane Mwaikambo, Con Kolivas

On Wednesday 15 December 2004 04:17, Andrea Arcangeli wrote:
>On Tue, Dec 14, 2004 at 09:59:23PM -0500, Gene Heskett wrote:
>> Which way?  I was running quite fast here, several minutes an
>
>Into the future; if I disable the logic it goes into the past at the
> same speed it was previously going into the future.
>
>> hour, then I discovered the tickadj command, found its default
>> was 10000, and started reducing it.  At 9926, I'm staying within
>> a sec an hour now.  I have no idea when this started, I didn't
>
>That seems quite a hack; note I did a hack too and it makes the
> drift much smaller (it gets manageable). But our modifications are
> wrong.
>
>The point is that this didn't happen with HZ=100, so it's not that
>tickadj is wrong, it's the tick adjustment code that doesn't work.
>
The HZ=1000 is the culprit?

>You may want to recompile your kernel with HZ=100 and verify it goes
>away (I didn't verify myself, but I verified the max irq latency I
> get is 4msec, and in turn I'm sure HZ=100 would fix it)

Humm, that might also reduce the obviousness of the irq activity in
the audio, there are times when I can hear it very plainly while a
low level audio src is in use, like the sub-millivolt levels that come
out of my Hauppauge WinTV-GO+FM card.   I keep having to turn the
master down to almost zip in order to keep it from sounding like I
have mice chewing in the walls, but its coming from the speakers. 
Onboard AC-97 audio of course.  Crappy stuff...   Humm, 100HZ would
translate to 10 millisecond intervals.  If you had a 4 millisecond
latency, that would be spread over 4 of the 1000 Hz interrupts.  That
sounds rather confusing to the service routine I imagine.

I'll do that just for grins & report back.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15  9:17                                   ` Andrea Arcangeli
  2004-12-15 16:44                                     ` Gene Heskett
@ 2004-12-15 17:03                                     ` Gene Heskett
  2004-12-15 17:48                                       ` Tim Schmielau
  1 sibling, 1 reply; 126+ messages in thread
From: Gene Heskett @ 2004-12-15 17:03 UTC (permalink / raw)
  To: linux-kernel

On Wednesday 15 December 2004 04:17, Andrea Arcangeli wrote:
>On Tue, Dec 14, 2004 at 09:59:23PM -0500, Gene Heskett wrote:
[...]
>The point is that this didn't happen with HZ=100, so it's not that
>tickadj is wrong, it's the tick adjustment code that doesn't work.
>
>You may want to recompile your kernel with HZ=100 and verify it goes
>away (I didn't verify myself, but I verified the max irq latency I
> get is 4msec, and in turn I'm sure HZ=100 would fix it)

Ok, I was going to do that, but forgive me, it's not in the .config
file as a setting.  So where do I edit, and what, to revert to 100 Hz?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15 17:03                                     ` Gene Heskett
@ 2004-12-15 17:48                                       ` Tim Schmielau
  2004-12-16  2:03                                         ` Gene Heskett
  0 siblings, 1 reply; 126+ messages in thread
From: Tim Schmielau @ 2004-12-15 17:48 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

> Ok, I was going to do that, but forgive me, it's not in the .config
> file as a setting.  So where do I edit, and what, to revert to 100 Hz?

It's in line 5 of include/asm-i386/param.h:
# define HZ             1000            /* Internal kernel timer frequency */

(if you are on an i386 system). Just change that back to 100.

Tim

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-14  0:16             ` dynamic-hz Eric St-Laurent
@ 2004-12-15 18:04               ` Alan Cox
  2004-12-15 19:54                 ` dynamic-hz linux-os
  2004-12-16  9:10                 ` dynamic-hz Gabriel Paubert
  0 siblings, 2 replies; 126+ messages in thread
From: Alan Cox @ 2004-12-15 18:04 UTC (permalink / raw)
  To: Eric St-Laurent
  Cc: Russell King, Stefan Seyfried, Con Kolivas, Pavel Machek,
	Linux Kernel Mailing List, Andrea Arcangeli

On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote:
> Alan,
> 
> On a related subject, a few months ago you posted a patch which added a
> nice add_timeout()/timeout_pending() API and converted many (if not
> most) drivers to use it.
> 
> If I remember correctly it did not generate much comments and the work
> was not pushed into mainline.
> 
> I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...)
> construct is horrible.
> 
> Any chance to resurrect this work ?

I plan to resurrect it when I have a little time but with some small
additions from the original work. Several people said "it should be mS
not HZ" and someone at OLS proposed that the API also includes an
accuracy guide so that systems using programmed wakeups can aggregate
timers when accuracy doesn't matter.
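
To illustrate how an accuracy hint could let programmed wakeups be
aggregated, here is a minimal sketch. It assumes each request carries
an expiry in milliseconds plus an allowed slack; the struct and
function names are hypothetical and are not the add_timeout() API
mentioned above.

```c
#include <stdlib.h>

/*
 * Hypothetical timeout request: may fire no earlier than expires_ms
 * and no later than expires_ms + slack_ms.
 */
struct timeout_req {
	unsigned long expires_ms;
	unsigned long slack_ms;
};

/* Order requests by the latest time at which they may still fire. */
static int cmp_latest(const void *a, const void *b)
{
	const struct timeout_req *x = a, *y = b;
	unsigned long lx = x->expires_ms + x->slack_ms;
	unsigned long ly = y->expires_ms + y->slack_ms;

	return (lx > ly) - (lx < ly);
}

/*
 * Pick wakeup times so that every request fires inside its window.
 * Sorts reqs[] in place, stores the chosen times in wakeups[] (which
 * must have room for n entries) and returns how many were needed.
 */
size_t plan_wakeups(struct timeout_req *reqs, size_t n,
		    unsigned long *wakeups)
{
	size_t count = 0;

	qsort(reqs, n, sizeof(*reqs), cmp_latest);
	for (size_t i = 0; i < n; i++) {
		/* Already covered by the most recent wakeup? */
		if (count > 0 && reqs[i].expires_ms <= wakeups[count - 1])
			continue;
		/* Otherwise wake as late as this request allows. */
		wakeups[count++] = reqs[i].expires_ms + reqs[i].slack_ms;
	}
	return count;
}
```

For requests (100, slack 50), (120, slack 10), (140, slack 20) and
(300, slack 0) this picks wakeups at 130, 160 and 300 ms: three
wakeups instead of four. With zero slack everywhere nothing can be
merged unless expiries coincide.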


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15 16:44                                     ` Gene Heskett
@ 2004-12-15 18:20                                       ` Andrea Arcangeli
  2004-12-16  1:59                                         ` Gene Heskett
  2004-12-15 20:16                                       ` Pavel Machek
  1 sibling, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-15 18:20 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Pavel Machek, Zwane Mwaikambo, Con Kolivas

On Wed, Dec 15, 2004 at 11:44:38AM -0500, Gene Heskett wrote:
> The HZ=1000 is the culprit?

HZ=1000 isn't the culprit. The culprit is the >1msec latency of the usb
irq, but that wouldn't be visible with HZ 100 (for this specific case
HZ=100 would only be a band-aid). 

> Onboard AC-97 audio of course.  Crappy stuff... [..]

I doubt it's the chip, but only the motherboard to blame. My laptop has
the ac97 but no HZ sound out of it.

> translate to 10 millisecond intervals.  If you had a 4 millisecond
> latency, that would be spread over 4 of the 1000 Hz interrupts.  That
> sounds rather confusing to the service routine I imagine.

The ones that get confused are the system time and the jiffies, the rest
of the system can deal with long irq delays. The tick adjustment was
exactly implemented so that the jiffies and system time wouldn't get
confused anymore, but it just confuses it the other way around in my
current experience.

> I'll do that just for grins & report back.

Ok.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-15 18:04               ` dynamic-hz Alan Cox
@ 2004-12-15 19:54                 ` linux-os
  2004-12-16  2:17                   ` dynamic-hz Gene Heskett
  2004-12-16  9:10                 ` dynamic-hz Gabriel Paubert
  1 sibling, 1 reply; 126+ messages in thread
From: linux-os @ 2004-12-15 19:54 UTC (permalink / raw)
  To: Alan Cox
  Cc: Eric St-Laurent, Russell King, Stefan Seyfried, Con Kolivas,
	Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli

On Wed, 15 Dec 2004, Alan Cox wrote:

> On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote:
>> Alan,
>>
>> On a related subject, a few months ago you posted a patch which added a
>> nice add_timeout()/timeout_pending() API and converted many (if not
>> most) drivers to use it.
>>
>> If I remember correctly it did not generate much comments and the work
>> was not pushed into mainline.
>>
>> I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...)
>> construct is horrible.
>>
>> Any chance to resurrect this work ?
>
> I plan to ressurect it when I have a little time but with some small
> additions from the original work. Several people said "it should be mS
> not HZ" and someone at OLS proposed that the API also includes an
> accuracy guide so that systems using programmed wakeups can aggregate
> timers when accuracy doesn't matter.

I sure hope it isn't mS. Transconductance or its reciprocal doesn't
work very well for timing unless you supply the capacitor ;^)

FYI, mS means milli-Siemens. Seconds is lower-case --always.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by John Ashcroft.
                  98.36% of all statistics are fiction.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15 16:44                                     ` Gene Heskett
  2004-12-15 18:20                                       ` Andrea Arcangeli
@ 2004-12-15 20:16                                       ` Pavel Machek
  2004-12-16  2:02                                         ` Gene Heskett
  1 sibling, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-15 20:16 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas

Hi!

> >> Which way?  I was running quite fast here, several minutes an
> >
> >In the future, if I disable the logic it goes in the past at the
> > same speed it was previously going in the future.
> >
> >> hour, then I discovered the tickadj command, found its default
> >> was 10000, and started reducing it.  At 9926, I'm staying within
> >> a sec an hour now.  I have no idea when this started, I didn't
> >
> >That seems quite an hack, note I did an hack too and it make the
> > drift much smaller (it gets manageable). But our modifications are
> > wrong.
> >
> >The point is that this didn't happen with HZ=100, so it's not that
> >tickadj is wrong, it's the tick adjustment code that doesn't work.
> >
> The HZ=1000 is the culprit?
> 
> >You may want to recompile your kernel with HZ=100 and verify it goes
> >away (I didn't verify myself, but I verified the max irq latency I
> > get is 4msec, and in turn I'm sure HZ=100 would fix it
> 
> Humm, that might also reduce the obviousness of the irq activity in
> the audio, there are times when I can hear it very plainly while a
> low level audio src is in use, like the sub-millivolt levels that come
> out of my Hauppauge WinTV-GO+FM card.   I keep having to turn the

Try idle=poll. That noise may be coming from cpu switching between
powersave and full speed.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-14 23:16                               ` Andrea Arcangeli
  2004-12-15  2:59                                 ` Gene Heskett
@ 2004-12-16  0:58                                 ` Pavel Machek
  2004-12-16  2:33                                   ` john stultz
  1 sibling, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-16  0:58 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > How much drift do you see?
> 
> huge drift, minutes per hour or similar.

Okay, for your amusement, here's the evil
"do-few-msec-interrupt-latency" program.

Andrea, could you verify that it causes clock to drift for you? I'll
leave it running here overnight, and will see what happens.

								Pavel
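#include <stdio.h>
#include <unistd.h>
#include <sys/io.h>     /* iopl(), x86 Linux only */

int
main(void)
{
        int i;

        /* Needs root: without IOPL 3 the cli below would fault. */
        if (iopl(3) < 0) {
                perror("iopl");
                return 1;
        }
        while (1) {
                asm volatile("cli");    /* interrupts off */
                for (i=0; i<20000000; i++)
                        asm volatile("");       /* keep the loop from being optimized away */
                asm volatile("sti");    /* interrupts back on */
                sleep(1);
        }
}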


-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift]
  2004-12-14 22:02                             ` USB making time drift [was Re: dynamic-hz] Pavel Machek
  2004-12-14 23:16                               ` Andrea Arcangeli
@ 2004-12-16  1:15                               ` Pavel Machek
  2004-12-16 11:13                                 ` Andrea Arcangeli
  1 sibling, 1 reply; 126+ messages in thread
From: Pavel Machek @ 2004-12-16  1:15 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

Hi!

> > > Are you using CONFIG_HPET_TIMER by chance? It seems to be missing some
> > > strategic -1, TSC (etc) get it right.
> > 
> > I'm not using hpet because it's an old hardware, this is with timer_tsc.
> > It must be reproducible in any machine out there, especially with
> > machines with usb it should be reproducible even without any userspace
> > testcase doing iopl/cli/sti. Time will go silently in the future at every
> > usb irq (they often last 3/4msec).
> 
> How much drift do you see?
> 
> I have machine with UHCI here, and am using usb most of the time
> (bluetooth for gprs connection), and did not notice too bad
> drift. ntpdate does some adjustment each time I connect to the
> network, but it 

Okay, I have good news and bad news. Bad news is that it is broken on
my machine, too. Good news is that breakage is not at all subtle.

root@amd:/home/pavel/misc# time ./latency

0.00user 0.00system 1.69 (0m1.694s) elapsed 0.17%CPU
root@amd:/home/pavel/misc# time ./latency

0.00user 0.00system 4293.47 (71m33.478s) elapsed 0.00%CPU
root@amd:/home/pavel/misc#

71 minutes when it ran for 2.5 seconds?!

root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:04:24 ntpdate[6385]: adjust time server 195.113.144.238
offset 0.010865 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:08:07 ntpdate[6405]: step time server 195.113.144.238 offset
85.903997 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:09:02 ntpdate[6410]: step time server 195.113.144.238 offset
4.306853 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:09:11 ntpdate[6411]: adjust time server 195.113.144.238
offset -0.028829 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:09:27 ntpdate[6413]: step time server 195.113.144.238 offset
4.283117 sec
root@amd:~# ntpdate tak.cesnet.cz
16 Dec 02:09:47 ntpdate[6415]: step time server 195.113.144.238 offset
4.286300 sec
root@amd:~#

It seems that each cycle of the attached program (needs root) breaks
system time by 4 seconds... I do not know why it printed 71 minutes
there. That looks like an underflow somewhere. Strange, now it
happened again.

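#include <stdio.h>
#include <unistd.h>
#include <sys/io.h>     /* iopl(), x86 Linux only */

int
main(void)
{
        int i;

        /* Needs root: without IOPL 3 the cli below would fault. */
        if (iopl(3) < 0) {
                perror("iopl");
                return 1;
        }
        while (1) {
                asm volatile("cli");    /* interrupts off */
                //              for (i=0; i<20000000; i++)
                for (i=0; i<1000000000; i++)
                        asm volatile("");       /* keep the loop from being optimized away */
                asm volatile("sti");    /* interrupts back on */
                sleep(1);
        }
}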

Actually it seems to create some sort of havoc in the timer
subsystem... And it is reproducible:

root@amd:/home/pavel/misc# date; time ./latency ; date; sleep 1; date; sleep 1; date; sleep 1; date
Thu Dec 16 02:14:18 CET 2004

0.00user 0.00system 4293.51 (71m33.516s) elapsed 0.00%CPU
Thu Dec 16 03:25:51 CET 2004
Thu Dec 16 02:14:18 CET 2004
Thu Dec 16 02:14:19 CET 2004
Thu Dec 16 02:14:20 CET 2004
root@amd:/home/pavel/misc# date; time ./latency ; date; sleep 1; date; sleep 1; date; sleep 1; date
Thu Dec 16 02:14:23 CET 2004

0.00user 0.00system 4293.52 (71m33.521s) elapsed 0.00%CPU
Thu Dec 16 03:25:56 CET 2004
Thu Dec 16 03:25:57 CET 2004
Thu Dec 16 02:14:23 CET 2004
Thu Dec 16 02:14:24 CET 2004
root@amd:/home/pavel/misc#

							Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15 18:20                                       ` Andrea Arcangeli
@ 2004-12-16  1:59                                         ` Gene Heskett
  2004-12-16 11:30                                           ` Andrea Arcangeli
  2004-12-16 12:50                                           ` Alan Cox
  0 siblings, 2 replies; 126+ messages in thread
From: Gene Heskett @ 2004-12-16  1:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrea Arcangeli, Pavel Machek, Zwane Mwaikambo, Con Kolivas

On Wednesday 15 December 2004 13:20, Andrea Arcangeli wrote:
>On Wed, Dec 15, 2004 at 11:44:38AM -0500, Gene Heskett wrote:
>> The HZ=1000 is the culprit?
>
>HZ=1000 isn't the culprit. The culprit is the >1msec latency of the
> usb irq, but that wouldn't be visible with HZ 100 (for this
> specific case HZ=100 would only be a band-aid).
>
>> Onboard AC-97 audio of course.  Crappy stuff... [..]
>
>I doubt it's the chip, but only the motherboard to blame. My laptop
> has the ac97 but no HZ sound out of it.
>
>> translate to 10 millisecond intervals.  If you had a 4 millisecond
>> latency,
>> that would be spread over 4 of the 1000 hz interrupts.  That
>> sounds rather confusing to the service routine I imagine.
>
>The ones that get confused are the system time and the jiffies, the
> rest of the system can deal with long irq delays. The tick
> adjustment was exactly implemented so that the jiffies and system
> time wouldn't get confused anymore, but it just confuses it the
> other way around in my current experience.
>
>> I'll do that just for grins & report back.
>
>Ok.

Unforch, I was not able to find that in the .config file, so where is
that particular option set?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15 20:16                                       ` Pavel Machek
@ 2004-12-16  2:02                                         ` Gene Heskett
  0 siblings, 0 replies; 126+ messages in thread
From: Gene Heskett @ 2004-12-16  2:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pavel Machek, Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas

On Wednesday 15 December 2004 15:16, Pavel Machek wrote:
>Hi!

Hi Pavel;

>> >> Which way?  I was running quite fast here, several minutes an
>
>Try idle=poll. That noise may be coming from cpu switching between
>powersave and full speed.
>        Pavel

I don't think I have that option set/enabled at all, and these
machines are running seti so the cpu stays at 100% anyway.

Where would I set that if I wanted to try it?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-15 17:48                                       ` Tim Schmielau
@ 2004-12-16  2:03                                         ` Gene Heskett
  0 siblings, 0 replies; 126+ messages in thread
From: Gene Heskett @ 2004-12-16  2:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tim Schmielau

On Wednesday 15 December 2004 12:48, Tim Schmielau wrote:
>> Ok, I was going to do that, but forgive me, its not in the .config
>> file as a setting.  So where do I edit what to revert to 100 Hz?
>
>It's in line 5 of include/asm-i386/param.h:
># define HZ             1000            /* Internal kernel timer
> frequency */
>
>(if you are on an i386 system). Just change that back to 100.
>
>Tim

Thanks Tim, I might do that for a boot or 2 just for the exercise.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-15 19:54                 ` dynamic-hz linux-os
@ 2004-12-16  2:17                   ` Gene Heskett
  2004-12-16 12:42                     ` dynamic-hz linux-os
  2004-12-17 20:12                     ` dynamic-hz H. Peter Anvin
  0 siblings, 2 replies; 126+ messages in thread
From: Gene Heskett @ 2004-12-16  2:17 UTC (permalink / raw)
  To: linux-kernel, linux-os
  Cc: Alan Cox, Eric St-Laurent, Russell King, Stefan Seyfried,
	Con Kolivas, Pavel Machek, Andrea Arcangeli

On Wednesday 15 December 2004 14:54, linux-os wrote:
>On Wed, 15 Dec 2004, Alan Cox wrote:
>> On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote:
>>> Alan,
>>>
>>> On a related subject, a few months ago you posted a patch which
>>> added a nice add_timeout()/timeout_pending() API and converted
>>> many (if not most) drivers to use it.
>>>
>>> If I remember correctly it did not generate much comments and the
>>> work was not pushed into mainline.
>>>
>>> I think it's a nice cleanup, IMHO the
>>> time_(before|after)(jiffies, ...) construct is horrible.
>>>
>>> Any chance to resurrect this work ?
>>
>> I plan to resurrect it when I have a little time but with some
>> small additions from the original work. Several people said "it
>> should be mS not HZ" and someone at OLS proposed that the API also
>> includes an accuracy guide so that systems using programmed
>> wakeups can aggregate timers when accuracy doesn't matter.
>
>I sure hope it isn't mS. Transconductance or its reciprocal doesn't
>work very well for timing unless you supply the capacitor ;^)

Me sticks hand up and waves at teacher.

And what does 'Transconductance' have to do with this?

That may be the wrong terminology to apply here.

In vacuum tube (remember those?) specifications, this is the gain of
the tube, which AIR is stated as the change in plate current for a
one volt change in grid bias, and is normally stated in micromho's as
they are high voltage, low current devices, with the highest gain
tube that I'm aware of being the 7788.   Using the same measurement
technique applied to modern relatively high-power field-effect
transistors where the currents can be many amperes, readings best
stated in mho's are fairly common today. A 'mho' of course, is the
reciprocal of an ohm.

>FYI, mS means milli-Siemens. Seconds is lower-case --always.

Yup.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-16  0:58                                 ` Pavel Machek
@ 2004-12-16  2:33                                   ` john stultz
  0 siblings, 0 replies; 126+ messages in thread
From: john stultz @ 2004-12-16  2:33 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andrea Arcangeli, Zwane Mwaikambo, Con Kolivas, lkml

On Wed, 2004-12-15 at 16:58, Pavel Machek wrote:
> Hi!
> 
> > > How much drift do you see?
> > 
> > huge drift, minutes per hour or similar.
> 
> Okay, for your amusement, here's the evil
> "do-few-msec-interrupt-latency" program.

Ohhh! Awesome. I love it!

I'm playing with it and I'm seeing occasional jumps forward in time
(about an hour and 10mins, and then back). I'll start seeing if there
isn't anything we can do a quick fix for. Also I'll use this to test the
timeofday rework code I'm doing.

Very nice!

thanks!
-john


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-15 18:04               ` dynamic-hz Alan Cox
  2004-12-15 19:54                 ` dynamic-hz linux-os
@ 2004-12-16  9:10                 ` Gabriel Paubert
  2004-12-16 12:17                   ` dynamic-hz Geert Uytterhoeven
  2004-12-16 14:00                   ` dynamic-hz Mitchell Blank Jr
  1 sibling, 2 replies; 126+ messages in thread
From: Gabriel Paubert @ 2004-12-16  9:10 UTC (permalink / raw)
  To: Alan Cox
  Cc: Eric St-Laurent, Russell King, Stefan Seyfried, Con Kolivas,
	Pavel Machek, Linux Kernel Mailing List, Andrea Arcangeli

On Wed, Dec 15, 2004 at 06:04:03PM +0000, Alan Cox wrote:
> On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote:
> > Alan,
> > 
> > On a related subject, a few months ago you posted a patch which added a
> > nice add_timeout()/timeout_pending() API and converted many (if not
> > most) drivers to use it.
> > 
> > If I remember correctly it did not generate much comments and the work
> > was not pushed into mainline.
> > 
> > I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...)
> > construct is horrible.
> > 
> > Any chance to resurrect this work ?
> 
> I plan to resurrect it when I have a little time but with some small
> additions from the original work. Several people said "it should be mS
> not HZ" and someone at OLS proposed that the API also includes an
> accuracy guide so that systems using programmed wakeups can aggregate
> timers when accuracy doesn't matter.

I suspect people who want to push HZ to 10000 won't be happy with
milliseconds since it would not give them a resolution of one jiffy.

So the options are:
1) microseconds, allows up to roughly half an hour (signed) 
   or an hour (unsigned).
2) nanoseconds, needs 64 bits, nice for 64 bit machines but 
   at the risk of bloat on 32 bit ones.
3) timespecs, somewhat wasteful on 64 bit machines (two longs).

I believe 1) is the best compromise.

	Regards,
	Gabriel

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift]
  2004-12-16  1:15                               ` Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] Pavel Machek
@ 2004-12-16 11:13                                 ` Andrea Arcangeli
  2004-12-16 12:49                                   ` Alan Cox
  0 siblings, 1 reply; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-16 11:13 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Zwane Mwaikambo, Con Kolivas, linux-kernel

On Thu, Dec 16, 2004 at 02:15:49AM +0100, Pavel Machek wrote:
> Okay, I have good news and bad news. Bad news is that it is broken on
> my machine, too. Good news is that breakage is not at all subtle.

Well, I was pretty sure it was reproducible since the PIT and TSC are
standard hw in all machines, it's just the excessive usb irq latency
that triggers only in a few machines like my firewall (only with
HZ=1000).

My suggestion is that first we fix the accuracy of this, and *then* we
consider switching to a one-shot timer.

Fixing this is also possible by using only the TSC to account for system
time, and no longer relying on the PIT as the source of accuracy. But
then any calibration error made while we transfer the accuracy of the
PIT to the TSC will propagate cumulatively over time. So it's not clear
to me we can do that safely. The PIT is designed everywhere to be
accurate for system time.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-16  1:59                                         ` Gene Heskett
@ 2004-12-16 11:30                                           ` Andrea Arcangeli
  2004-12-16 12:50                                           ` Alan Cox
  1 sibling, 0 replies; 126+ messages in thread
From: Andrea Arcangeli @ 2004-12-16 11:30 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Pavel Machek, Zwane Mwaikambo, Con Kolivas

On Wed, Dec 15, 2004 at 08:59:52PM -0500, Gene Heskett wrote:
> Unforch, I was not able to find that in the .config file, so where is
> that particular option set?

There is no config option indeed, you need to edit
include/asm-i386/param.h to change HZ.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-16  9:10                 ` dynamic-hz Gabriel Paubert
@ 2004-12-16 12:17                   ` Geert Uytterhoeven
  2004-12-16 14:00                   ` dynamic-hz Mitchell Blank Jr
  1 sibling, 0 replies; 126+ messages in thread
From: Geert Uytterhoeven @ 2004-12-16 12:17 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Alan Cox, Eric St-Laurent, Russell King, Stefan Seyfried,
	Con Kolivas, Pavel Machek, Linux Kernel Mailing List,
	Andrea Arcangeli

On Thu, 16 Dec 2004, Gabriel Paubert wrote:
> On Wed, Dec 15, 2004 at 06:04:03PM +0000, Alan Cox wrote:
> > On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote:
> > > On a related subject, a few months ago you posted a patch which added a
> > > nice add_timeout()/timeout_pending() API and converted many (if not
> > > most) drivers to use it.
> > > 
> > > If I remember correctly it did not generate much comments and the work
> > > was not pushed into mainline.
> > > 
> > > I think it's a nice cleanup, IMHO the time_(before|after)(jiffies, ...)
> > > construct is horrible.
> > > 
> > > Any chance to resurrect this work ?
> > 
> > I plan to resurrect it when I have a little time but with some small
> > additions from the original work. Several people said "it should be mS
> > not HZ" and someone at OLS proposed that the API also includes an
> > accuracy guide so that systems using programmed wakeups can aggregate
> > timers when accuracy doesn't matter.
> 
> I suspect people who want to push HZ to 10000 won't be happy with
> milliseconds since it would not give them a resolution of one jiffy.
> 
> So the options are:
> 1) microseconds, allows up to roughly half an hour (signed) 
>    or an hour (unsigned).
> 2) nanoseconds, needs 64 bits, nice for 64 bit machines but 
>    at the risk of bloat on 32 bit ones.
> 3) timespecs, somewhat wasteful on 64 bit machines (two longs).
> 
> I believe 1) is the best compromise.

Yep. And if the need for ns arises, add a _different_ function (e.g. *_ns()) to
wait with ns-resolution. 64 bit is probably not needed, who wants to wait for
more than a few seconds with ns-resolution?

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-16  2:17                   ` dynamic-hz Gene Heskett
@ 2004-12-16 12:42                     ` linux-os
  2004-12-17 20:12                     ` dynamic-hz H. Peter Anvin
  1 sibling, 0 replies; 126+ messages in thread
From: linux-os @ 2004-12-16 12:42 UTC (permalink / raw)
  To: Gene Heskett
  Cc: linux-kernel, Alan Cox, Eric St-Laurent, Russell King,
	Stefan Seyfried, Con Kolivas, Pavel Machek, Andrea Arcangeli

On Wed, 15 Dec 2004, Gene Heskett wrote:

> On Wednesday 15 December 2004 14:54, linux-os wrote:
>> On Wed, 15 Dec 2004, Alan Cox wrote:
>>> On Maw, 2004-12-14 at 00:16, Eric St-Laurent wrote:
>>>> Alan,
>>>>
>>>> On a related subject, a few months ago you posted a patch which
>>>> added a nice add_timeout()/timeout_pending() API and converted
>>>> many (if not most) drivers to use it.
>>>>
>>>> If I remember correctly it did not generate much comments and the
>>>> work was not pushed into mainline.
>>>>
>>>> I think it's a nice cleanup, IMHO the
>>>> time_(before|after)(jiffies, ...) construct is horrible.
>>>>
>>>> Any chance to resurrect this work ?
>>>
>>> I plan to resurrect it when I have a little time but with some
>>> small additions from the original work. Several people said "it
>>> should be mS not HZ" and someone at OLS proposed that the API also
>>> includes an accuracy guide so that systems using programmed
>>> wakeups can aggregate timers when accuracy doesn't matter.
>>
>> I sure hope it isn't mS. Transconductance or its reciprocal doesn't
>> work very well for timing unless you supply the capacitor ;^)
>
> Me sticks hand up and waves at teacher.
>
> And what does 'Transconductance' have to do with this?
>

The international notation for transconductance is Siemens, no longer
MHO (Ohm spelled backwards). This happened at the same time that c.p.s. 
was changed to Hz. But, because the Siemens company is one of the
world's largest, the "S" didn't catch on as readily as Hz and others.
Siemens is so common you need to look up MHO in some really complete
dictionary to find its usage as MHO.

> That may be the wrong terminology to apply here.
>
> In vacuum tube (remember those?) specifications, this is the gain of
> the tube, which AIR is stated as the change in plate current for a
> one volt change in grid bias, and is normally stated in micromho's as
> they are high voltage, low current devices, with the highest gain
> tube that I'm aware of being the 7788.   Using the same measurement

The older MHO was usually stated in micro-mho for vacuum
tubes. For instance low mu triodes like 12AT7 had a mu
of 10 (10 micro-mho). The "gainier" cousin, the 12AX7
had a mu of 100 (100 micro-mho).

Some modern FETs have transconductance up to 10
MHO (10 Siemens).

> technique applied to modern relatively high-power field-effect
> transistors where the currents can be many amperes, readings best
> stated in mho's are fairly common today. A 'mho' of course, is the
> reciprocal of an ohm.
>
>> FYI, mS means milli-Siemens. Seconds is lower-case --always.
>
> Yup.
>
> -- 
> Cheers, Gene
> "There are four boxes to be used in defense of liberty:
> soap, ballot, jury, and ammo. Please use in that order."
> -Ed Howdershelt (Author)
> 99.30% setiathome rank, not too shabby for a WV hillbilly
> Yahoo.com attorneys please note, additions to this message
> by Gene Heskett are:
> Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by John Ashcroft.
                  98.36% of all statistics are fiction.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift]
  2004-12-16 11:13                                 ` Andrea Arcangeli
@ 2004-12-16 12:49                                   ` Alan Cox
  0 siblings, 0 replies; 126+ messages in thread
From: Alan Cox @ 2004-12-16 12:49 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Pavel Machek, Zwane Mwaikambo, Con Kolivas, Linux Kernel Mailing List

On Iau, 2004-12-16 at 11:13, Andrea Arcangeli wrote:
> Well, I was pretty sure it was reproducible since the PIT and TSC are
> standard hw in all machines, it's just the excessive usb irq latency

TSC is not by any means standard hw in all machines and it has a whole
pile of issues on some of them with the way it varies rate and/or stops.

> My suggestion is that first we fix the accuracy of this, and *then* we
> consider switching to a one-shot timer.

Agreed - one shot timers are going to be nearly impossible to use for
system time accounting because we keep losing time resetting it.




^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: USB making time drift [was Re: dynamic-hz]
  2004-12-16  1:59                                         ` Gene Heskett
  2004-12-16 11:30                                           ` Andrea Arcangeli
@ 2004-12-16 12:50                                           ` Alan Cox
  1 sibling, 0 replies; 126+ messages in thread
From: Alan Cox @ 2004-12-16 12:50 UTC (permalink / raw)
  To: gene.heskett
  Cc: Linux Kernel Mailing List, Andrea Arcangeli, Pavel Machek,
	Zwane Mwaikambo, Con Kolivas

On Iau, 2004-12-16 at 01:59, Gene Heskett wrote:
> Unforch, I was not able to find that in the .config file, so where is
> that particular option set?

Base 2.6.9 hardcodes it, 2.6.9-ac has it in the configuration for x86


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: dynamic-hz
  2004-12-16  9:10                 ` dynamic-hz Gabriel Paubert
  2004-12-16 12:17                   ` dynamic-hz Geert Uytterhoeven
@ 2004-12-16 14:00                   ` Mitchell Blank Jr
  1 sibling, 0 replies; 126+ messages in thread
From: Mitchell Blank Jr @ 2004-12-16 14:00 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Alan Cox, Eric St-Laurent, Russell King, Stefan Seyfried,
	Con Kolivas, Pavel Machek, Linux Kernel Mailing List,
	Andrea Arcangeli

Gabriel Paubert wrote:
> So the options are:
> 1) microseconds, allows up to roughly half an hour (signed) 
>    or an hour (unsigned).
> 2) nanoseconds, needs 64 bits, nice for 64 bit machines but 
>    at the risk of bloat on 32 bit ones.
> 3) timespecs, somewhat wasteful on 64 bit machines (two longs).

Also forgive me if this has already been discussed (I might have missed
some of the messages on these threads) but there's also Poul-Henning
Kamp's "bintime" format used in FreeBSD:
  http://phk.freebsd.dk/pubs/timecounter.pdf

I'm not convinced it's the right solution for this problem but the paper
does make a lot of good points.

I also agree that any new timer API needs to have entry points for users
that can handle an imprecise wakeup -- running multiple wakeups at once
is important at high load.

Another idea I've been toying with in my head related to this (it
would need some instrumentation to prove): I bet that under heavy server
loads there are a lot of timeouts where:
  1. Requested timeout is always >N seconds
  2. Timeouts almost always get canceled well before (N/2) seconds have
     passed

If this is the case you could make a pretty simple hack -- on each CPU keep
two list_head's of timers -- lets call them "add_list" and "prev_list".
Now every N/2 seconds do:
  tmp = prev_list
  prev_list = add_list
  add_list = EMPTY
...and then add all the timers on "tmp" to the normal timer queue.

The advantage here is that to add a timer you just have to insert it onto
add_list -- you don't have to keep these lists in order.  By the time
we get around to adding the timers on "tmp" to the main timer queue we know:
  1. They are still waiting to expire (at most "N" seconds have elapsed
     since they were inserted)
  2. Most of them have been canceled (since at least "N/2" seconds have
     passed) and were thus removed from the list.  "tmp" should not
     have many elements remaining

I think for some class of timeouts (device timeouts, network, etc) this
should be pretty efficient.  You'd have to do a bunch of instrumenting
to see if there are enough timers with these characteristics to make this
useful (and what would make a good value for 'N').  I've got a zillion other
projects so I'm not going to have a chance to do this, but maybe it'll
give someone else some ideas.
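
A rough userspace sketch of the add_list/prev_list rotation described
above, in case it helps someone experiment.  Everything named lz_* is
made up for illustration (the kernel would use struct list_head and
struct timer_list), and the "main timer queue" is modelled as just
another unordered list:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list with a sentinel head. */
struct lz_node { struct lz_node *prev, *next; };

static inline void lz_init(struct lz_node *h) { h->prev = h->next = h; }
static inline int lz_empty(const struct lz_node *h) { return h->next == h; }

/* Insert n right after the head h: O(1), no ordering maintained. */
static inline void lz_add(struct lz_node *n, struct lz_node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Unlink n and leave it self-linked so a second unlink is harmless. */
static inline void lz_del(struct lz_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical timer record. */
struct lz_timer {
	struct lz_node link;
	unsigned long expires;
};

/* Per-CPU state: lists[add_idx] is add_list, the other is prev_list. */
struct lz_cpu {
	struct lz_node lists[2];
	int add_idx;
};

void lz_cpu_init(struct lz_cpu *c)
{
	lz_init(&c->lists[0]);
	lz_init(&c->lists[1]);
	c->add_idx = 0;
}

/* Adding a timer is an unordered O(1) insert onto add_list. */
void lz_timer_add(struct lz_cpu *c, struct lz_timer *t)
{
	assert(t != NULL);
	lz_add(&t->link, &c->lists[c->add_idx]);
}

/* Cancelling is an O(1) unlink from whichever list holds the timer. */
void lz_timer_cancel(struct lz_timer *t)
{
	lz_del(&t->link);
}

/*
 * Run every N/2 seconds.  Whatever is still on prev_list has waited
 * between N/2 and N seconds, so hand it to the main queue.  Then the old
 * add_list becomes prev_list and the drained list becomes the new,
 * empty add_list.  Returns the number of timers handed over.
 */
int lz_rotate(struct lz_cpu *c, struct lz_node *main_queue)
{
	struct lz_node *prev = &c->lists[!c->add_idx];
	int moved = 0;

	while (!lz_empty(prev)) {
		struct lz_node *n = prev->next;
		lz_del(n);
		lz_add(n, main_queue);
		moved++;
	}
	c->add_idx = !c->add_idx;
	return moved;
}
```

The return value of lz_rotate() is the number worth instrumenting: if
it stays small compared to the number of lz_timer_add() calls, most
timers were cancelled before they ever needed sorting.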

-Mitch


* Re: dynamic-hz
  2004-12-14  4:29               ` dynamic-hz Andrew Morton
  2004-12-14  5:25                 ` dynamic-hz Nish Aravamudan
@ 2004-12-17 20:10                 ` Nish Aravamudan
  1 sibling, 0 replies; 126+ messages in thread
From: Nish Aravamudan @ 2004-12-17 20:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: andrea, kernel, pavel, linux-kernel

On Mon, 13 Dec 2004 20:29:39 -0800, Andrew Morton <akpm@osdl.org> wrote:
> Nish Aravamudan <nish.aravamudan@gmail.com> wrote:
> >
> > On Mon, 13 Dec 2004 03:25:21 -0800, Andrew Morton <akpm@osdl.org> wrote:
> > > Andrea Arcangeli <andrea@suse.de> wrote:
> > > >
> > > > The patch only does HZ at dynamic time. But of course it's absolutely
> > > >  trivial to define it at compile time, it's probably a 3 liner on top of
> > > >  my current patch ;). However personally I don't think the three liner
> > > >  will be worth the few seconds more spent configuring the kernel ;).
> > >
> > > We still have 1000-odd places which do things like
> > >
> > >         schedule_timeout(HZ/10);
> >
> > Yes, yes, we do :) I replaced far more than I ever thought I could...
> > There are a few issues I have with the remaining schedule_timeout()
> > calls which I think fit ok with this thread... I'd especially like
> > your input, Andrew, as you end up getting most of my patches from KJ.
> >
> > Many drivers use
> >
> > set_current_state(TASK_{UN,}INTERRUPTIBLE);
> > schedule_timeout(1); // or some other small value < 10
> >
> > This may or may not hide a dependency on a particular HZ value. If the
> > code is somewhat old, perhaps the author intended the task to sleep
> > for 1 jiffy when HZ was equal to 100. That meant that they ended up
> > sleeping for 10 ms. If the code is new, the author intends that the
> > task sleeps for 1 ms (HZ==1000). The question is, what should the
> > replacement be?
> 
> Presumably they meant 10 milliseconds.  Or at least, that is the delay
> which the developer did his testing with.
> 
> > If they really meant to use schedule_timeout(1) in the sense of
> > highest resolution delay possible (the latter above), then they
> > probably should just call schedule() directly.
> 
> argh.  Never do that.  It's basically a busywait and can cause lockups if
> the calling task has realtime scheduling policy.

For those drivers that use schedule() calls currently to delay, what
would you recommend? drivers/atm/ambassador.c contains a few examples.
I can get rid of most of the schedule_timeout() calls, but the
schedule() ones are a little more difficult. Would schedule_timeout(1)
be preferred to schedule()?
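
To make the HZ dependency concrete, here is a small userspace model of
converting a delay in milliseconds to ticks.  It is only a sketch:
ms_to_ticks() and its hz parameter are hypothetical names used for
illustration, not the kernel's own conversion helper.  It rounds up so
that a nonzero delay never turns into zero ticks:

```c
#include <assert.h>

/*
 * Convert a delay in milliseconds to timer ticks for a given tick rate.
 * 'hz' is passed in explicitly to model a tick rate that is only known
 * at runtime.  The result is rounded up.
 */
unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	/* Widen before multiplying so ms * hz cannot overflow 32 bits. */
	return (unsigned long)(((unsigned long long)ms * hz + 999) / 1000);
}
```

With this model a 10 ms delay is 1 tick at HZ=100, 10 ticks at HZ=1000
and 4 ticks at HZ=333, which is why a bare schedule_timeout(1) means a
different real delay on each of those kernels.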

Thanks,
Nish


* Re: dynamic-hz
  2004-12-16  2:17                   ` dynamic-hz Gene Heskett
  2004-12-16 12:42                     ` dynamic-hz linux-os
@ 2004-12-17 20:12                     ` H. Peter Anvin
  1 sibling, 0 replies; 126+ messages in thread
From: H. Peter Anvin @ 2004-12-17 20:12 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <200412152117.20568.gene.heskett@verizon.net>
By author:    Gene Heskett <gene.heskett@verizon.net>
In newsgroup: linux.dev.kernel
> 
> Me sticks hand up and waves at teacher.
> 
> And what does 'Transconductance' have to do with this?
> 
> That may be the wrong terminology to apply here.
> 
> In vacuum tube (remember those?) specifications, this is the gain of
> the tube, which AIR is stated as the change in plate current for a
> one volt change in grid bias, and is normally stated in micromho's as
> they are high voltage, low current devices, with the highest gain
> tube that I'm aware of being the 7788.   Using the same measurement
> technique applied to modern relatively high-power field effect
> transistors where the currents can be many amperes, readings best
> stated in mho's are fairly common today. A 'mho' of course, is the
> reciprocal of an ohm.
> 
> >FYI, mS means milli-Siemens. Seconds is lower-case --always.
> 

To be excruciatingly picky:

mS means millisiemens.  Siemens is the SI unit for
(trans)conductance.  Like all units named after people:

- its symbol (S) is capitalized;
- its name (siemens) is not (unless starting a sentence).

Note that since it's named after a person named Ernst Werner von
Siemens, it's called "siemens" even in singular (1 S = 1 siemens).
According to normal English nomenclature then the plural would be
siemenses, but that doesn't seem to have caught on.

ms means milliseconds.  Seconds is the SI unit for time.  Like all
units not named after people:

- neither its symbol (s) nor its name (second) is capitalized.

	-hpa


* Re: dynamic-hz
  2004-12-14 23:13                                     ` dynamic-hz Tony Lindgren
@ 2004-12-22 20:02                                       ` Tony Lindgren
  0 siblings, 0 replies; 126+ messages in thread
From: Tony Lindgren @ 2004-12-22 20:02 UTC (permalink / raw)
  To: linux-os
  Cc: Pavel Machek, john stultz, Andrea Arcangeli, Zwane Mwaikambo,
	Con Kolivas, lkml

* Tony Lindgren <tony@atomide.com> [041214 16:22]:
> * linux-os <linux-os@chaos.analogic.com> [041214 15:04]:
> > On Tue, 14 Dec 2004, Pavel Machek wrote:
> > 
> > >Hi!
> > >
> > >>>>The patch in question is at:
> > >>>>
> > >>>>http://linux-omap.bkbits.net:8080/main/user=tmlind/patch@1.2016.4.18?nav=!-|index.html|stats|!+|index.html|ChangeSet@-12w|cset@1.2016.4.18
> > >>>
> > >>>Wow, that's basically 8 lines of code plus driver for new
> > >>>hardware... Is it really that simple?
> > >>
> > >>Yeah, the key things are reprogramming the timer in the idle loop
> > >>based on next_timer_interrupt(), and calling timer_interrupt from
> > >>other interrupts as well :)
> > >>
> > >>Should we try a similar patch for x86/amd64? I'm not sure which timers
> > >>to use though? One should be programmable length for the interrupt,
> > >>and the other continuous for the timekeeping.
> > >
> > >Yes, it would certainly be interesting. 5% power savings, and no
> > >singing capacitors, while keeping HZ=1000. Sounds good to me.
> > >
> > >There are about 1000 timers available in PC, each having its own
> > >quirks. CMOS clock should be able to generate 1024Hz periodic timer
> > >(we currently do not use) and TSC we currently use for periodic timer
> > >should be usable in single-shot mode.
> > >								Pavel
> > >--
> > 
> > If you use that RTC timer, it needs to be something that can be
> > turned OFF. Many embedded applications use that because it's the
> > only timer that the OS doesn't muck with. It also has very low
> > noise which makes it a good timing source for IIR filters for
> > high precision, low data-rate data acquisition (like 24 bits).
> 
> OK, thanks for the information. That could be the continuous timer
> then, and TSC the periodic timer.
> 
> > Since it generates an edge, its interrupt can't be shared.
> > I certainly hope that you don't use it. One can read the
> > time without disturbing the interrupt rate. One just
> > needs to use the existing rtc_lock and not spin with
> > the lock being held.
> 
> Yeah, the timer update would be just a read from the RTC timer.
> 
> > Currently the kernel RTC software allocates the RTC interrupt
> > even though it doesn't use it. This makes it necessary to
> > compile the RTC as a module and then remove it when another
> > driver needs to use the RTC interrupt source.
> 
> The interrupt could be used for timer wrap only.

Well, just to follow up, I did some experiments over the weekend on
my old athlon box, and it looks like it's doable. I'll set up something
common where various timers can register their no-tick functions.

So far I have the APIC timer doing the no-tick interrupts, and nothing
yet for the timer to update time from. The code will use whatever
timers are available, as long as they implement the right functions.

I'll post some patches when I have something working... Probably
after the holidays.

Tony



Thread overview: 126+ messages
2004-12-11 14:23 dynamic-hz Andrea Arcangeli
2004-12-11 14:50 ` dynamic-hz Zwane Mwaikambo
2004-12-12  6:57   ` dynamic-hz Andrea Arcangeli
2004-12-11 21:41 ` dynamic-hz Jan Engelhardt
2004-12-12 16:35 ` dynamic-hz Pavel Machek
2004-12-12 22:23   ` dynamic-hz Andrea Arcangeli
2004-12-12 23:36     ` dynamic-hz Con Kolivas
2004-12-12 23:42       ` dynamic-hz Pavel Machek
2004-12-13  0:09         ` dynamic-hz Con Kolivas
2004-12-13  8:37           ` dynamic-hz Jan Engelhardt
2004-12-13 10:43           ` dynamic-hz Pavel Machek
2004-12-13 11:08             ` dynamic-hz Andrea Arcangeli
2004-12-13 19:36               ` dynamic-hz john stultz
2004-12-12 23:43       ` dynamic-hz Andrea Arcangeli
2004-12-13  0:18         ` dynamic-hz Con Kolivas
2004-12-13  0:27           ` dynamic-hz Andrea Arcangeli
2004-12-13  1:50             ` dynamic-hz Zwane Mwaikambo
2004-12-13 11:28               ` dynamic-hz Andrea Arcangeli
2004-12-13 12:43                 ` dynamic-hz Pavel Machek
2004-12-13 12:58                   ` dynamic-hz Andrea Arcangeli
2004-12-13 19:12                     ` dynamic-hz Pavel Machek
2004-12-13 20:34                       ` dynamic-hz john stultz
2004-12-13 20:49                         ` dynamic-hz Pavel Machek
2004-12-14  2:04                           ` dynamic-hz Andrea Arcangeli
     [not found]                           ` <20041214013924.GB14617@atomide.com>
2004-12-14  9:37                             ` dynamic-hz Pavel Machek
2004-12-14 21:18                               ` dynamic-hz Tony Lindgren
2004-12-14 22:06                                 ` dynamic-hz Pavel Machek
2004-12-14 23:00                                   ` dynamic-hz linux-os
2004-12-14 23:13                                     ` dynamic-hz Tony Lindgren
2004-12-22 20:02                                       ` dynamic-hz Tony Lindgren
2004-12-14 23:04                                   ` dynamic-hz Tony Lindgren
2004-12-14  2:46                         ` dynamic-hz Andrea Arcangeli
2004-12-14 19:24                           ` dynamic-hz john stultz
2004-12-14  2:36                       ` dynamic-hz Andrea Arcangeli
2004-12-14  9:39                         ` dynamic-hz Pavel Machek
2004-12-14  9:59                         ` dynamic-hz Pavel Machek
2004-12-14 15:25                           ` dynamic-hz Andrea Arcangeli
2004-12-14 22:02                             ` USB making time drift [was Re: dynamic-hz] Pavel Machek
2004-12-14 23:16                               ` Andrea Arcangeli
2004-12-15  2:59                                 ` Gene Heskett
2004-12-15  9:17                                   ` Andrea Arcangeli
2004-12-15 16:44                                     ` Gene Heskett
2004-12-15 18:20                                       ` Andrea Arcangeli
2004-12-16  1:59                                         ` Gene Heskett
2004-12-16 11:30                                           ` Andrea Arcangeli
2004-12-16 12:50                                           ` Alan Cox
2004-12-15 20:16                                       ` Pavel Machek
2004-12-16  2:02                                         ` Gene Heskett
2004-12-15 17:03                                     ` Gene Heskett
2004-12-15 17:48                                       ` Tim Schmielau
2004-12-16  2:03                                         ` Gene Heskett
2004-12-16  0:58                                 ` Pavel Machek
2004-12-16  2:33                                   ` john stultz
2004-12-16  1:15                               ` Time goes crazy in 2.6.9 after long cli [was Re: USB making time drift] Pavel Machek
2004-12-16 11:13                                 ` Andrea Arcangeli
2004-12-16 12:49                                   ` Alan Cox
2004-12-13 14:50                 ` dynamic-hz Zwane Mwaikambo
2004-12-13  7:43       ` dynamic-hz Stefan Seyfried
2004-12-13 13:58         ` dynamic-hz Russell King
2004-12-13 14:14           ` dynamic-hz Russell King
2004-12-13 14:52           ` dynamic-hz Alan Cox
2004-12-13 16:23             ` dynamic-hz Russell King
2004-12-13 17:53               ` dynamic-hz Michael Buesch
2004-12-13 18:04                 ` dynamic-hz Russell King
2004-12-13 19:04               ` dynamic-hz Pavel Machek
2004-12-13 20:11               ` dynamic-hz Russell King
2004-12-14  0:16             ` dynamic-hz Eric St-Laurent
2004-12-15 18:04               ` dynamic-hz Alan Cox
2004-12-15 19:54                 ` dynamic-hz linux-os
2004-12-16  2:17                   ` dynamic-hz Gene Heskett
2004-12-16 12:42                     ` dynamic-hz linux-os
2004-12-17 20:12                     ` dynamic-hz H. Peter Anvin
2004-12-16  9:10                 ` dynamic-hz Gabriel Paubert
2004-12-16 12:17                   ` dynamic-hz Geert Uytterhoeven
2004-12-16 14:00                   ` dynamic-hz Mitchell Blank Jr
2004-12-13 15:30           ` dynamic-hz Zwane Mwaikambo
2004-12-13 15:59             ` dynamic-hz Russell King
2004-12-13 16:14               ` dynamic-hz Pavel Machek
2004-12-13 16:06           ` dynamic-hz Pavel Machek
2004-12-13 16:19         ` dynamic-hz Jan Engelhardt
2004-12-13  8:29       ` dynamic-hz Jan Engelhardt
2004-12-14 22:54         ` dynamic-hz Lee Revell
2004-12-14 23:38           ` dynamic-hz Chris Friesen
2004-12-15  8:32             ` dynamic-hz Jan Engelhardt
2004-12-13 11:02       ` dynamic-hz Andrew Morton
2004-12-13 11:17         ` dynamic-hz Andrea Arcangeli
2004-12-13 11:25           ` dynamic-hz Andrew Morton
2004-12-13 11:47             ` dynamic-hz Andrea Arcangeli
2004-12-14  3:56               ` dynamic-hz Nish Aravamudan
2004-12-14  3:54             ` dynamic-hz Nish Aravamudan
2004-12-14  4:29               ` dynamic-hz Andrew Morton
2004-12-14  5:25                 ` dynamic-hz Nish Aravamudan
2004-12-17 20:10                 ` dynamic-hz Nish Aravamudan
2004-12-14 10:01               ` dynamic-hz Domen Puncer
2004-12-14 16:56                 ` dynamic-hz Nish Aravamudan
2004-12-14 14:23               ` dynamic-hz linux-os
2004-12-14 16:54                 ` dynamic-hz Nish Aravamudan
2004-12-14 17:15                   ` dynamic-hz Andrea Arcangeli
2004-12-14 17:42                     ` dynamic-hz Nish Aravamudan
2004-12-14 18:29                       ` dynamic-hz Andrea Arcangeli
2004-12-14 19:00                         ` dynamic-hz Nish Aravamudan
2004-12-14 18:22                     ` dynamic-hz linux-os
2004-12-14 18:38                       ` dynamic-hz Andrea Arcangeli
2004-12-14 18:50                       ` dynamic-hz Pavel Machek
2004-12-13 11:19         ` dynamic-hz Hans Kristian Rosbach
2004-12-13 11:22           ` dynamic-hz Pavel Machek
2004-12-13 11:39             ` dynamic-hz Andrea Arcangeli
2004-12-13 12:51             ` dynamic-hz Hans Kristian Rosbach
2004-12-13 13:01               ` dynamic-hz Andrea Arcangeli
2004-12-13 13:02                 ` dynamic-hz Andrea Arcangeli
2004-12-13 15:06               ` dynamic-hz Geert Uytterhoeven
2004-12-13 16:12                 ` dynamic-hz Pavel Machek
2004-12-13 16:14                   ` dynamic-hz Geert Uytterhoeven
2004-12-14  4:06                   ` dynamic-hz Nish Aravamudan
2004-12-14  4:05               ` dynamic-hz Nish Aravamudan
2004-12-13 11:33           ` dynamic-hz Andrea Arcangeli
2004-12-13 14:38           ` dynamic-hz Zwane Mwaikambo
2004-12-13 12:00       ` dynamic-hz Alan Cox
2004-12-13 15:52         ` dynamic-hz Andrea Arcangeli
2004-12-14 22:28       ` dynamic-hz Lee Revell
2004-12-14 22:40         ` dynamic-hz Con Kolivas
2004-12-14 22:50           ` dynamic-hz Lee Revell
2004-12-13 20:26 ` dynamic-hz Olaf Hering
2004-12-13 22:41   ` dynamic-hz Andrea Arcangeli
2004-12-13 20:56 ` dynamic-hz john stultz
2004-12-13 22:21   ` dynamic-hz Andrea Arcangeli
