* RE: HZ, preferably as small as possible
@ 2002-07-11  2:46 Grover, Andrew
  2002-07-11  3:01 ` Jeff Garzik
  2002-07-11  7:09 ` george anzinger
  0 siblings, 2 replies; 76+ messages in thread
From: Grover, Andrew @ 2002-07-11  2:46 UTC (permalink / raw)
  To: 'CaT', Benjamin LaHaise; +Cc: Andrew Morton, Grover, Andrew, Linux

> From: CaT [mailto:cat@zip.com.au] 
> On Wed, Jul 10, 2002 at 05:42:51PM -0400, Benjamin LaHaise wrote:
> > On Wed, Jul 10, 2002 at 02:38:32PM -0700, Andrew Morton wrote:
> > > OK, I'll grant that.  Why is this useful?
> > 
> > Think video playback, where you want to queue the frame to 
> be played as 
> > close to the correct 1/60s time as possible.  With HZ=100, 
> the code will 
> 
> Or 1/50 (think PAL), no? (Of course HZ=100 would be sweet for that. ;)

I don't know if I should mention this, but...

Win2k's default timer tick is 10ms (i.e. 100HZ) but it will go as low as 1ms
(1000HZ) if people request timers with that level of granularity. On the
fly.

So, a changing tick *can* be done. If Linux does the same thing, seems like
everyone is happy. What are the obstacles to this for Linux? If code is
based on the assumption of a constant timer tick, I humbly assert that the
code is broken.

Regards -- Andy

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11  2:46 HZ, preferably as small as possible Grover, Andrew
@ 2002-07-11  3:01 ` Jeff Garzik
  2002-07-11 11:45   ` Alan Cox
  2002-07-11 17:08   ` Martin Dalecki
  2002-07-11  7:09 ` george anzinger
  1 sibling, 2 replies; 76+ messages in thread
From: Jeff Garzik @ 2002-07-11  3:01 UTC (permalink / raw)
  To: Grover, Andrew; +Cc: 'CaT', Benjamin LaHaise, Andrew Morton, Linux

Grover, Andrew wrote:
> So, a changing tick *can* be done. If Linux does the same thing, seems like
> everyone is happy. What are the obstacles to this for Linux? If code is
> based on the assumption of a constant timer tick, I humbly assert that the
> code is broken.

Unfortunately code in Linux has traditionally compiled in a constant HZ 
all over the place, and jiffies instead of real time units are at the 
heart of all Linux timer-related activities.

I don't see that making 'HZ' a variable is really an option, because 
many drivers and scheduler-related code will be wildly inaccurate as 
soon as HZ actually changes values.

So that leaves us with the option of changing all the code related to 
waiting to be based on msecs and usecs.  Which I would love to do, but 
that's a lot of work, both code- and audit-wise.
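
A minimal sketch of what a wait interface based on milliseconds could look like, assuming a tick rate that is only known at run time. The names `ms_to_ticks` and `tick_hz` are hypothetical and do not come from the kernel source; the point is only that callers state real time units and the conversion to ticks happens in one place.

```c
#include <stdint.h>

/* Hypothetical helper: convert a timeout in milliseconds to timer ticks
 * for a tick rate (tick_hz) that is only known at run time.
 * Rounds up so a caller never waits for less than it asked for. */
static uint64_t ms_to_ticks(uint64_t ms, uint32_t tick_hz)
{
    return (ms * tick_hz + 999) / 1000;
}
```

With `tick_hz` at 100, a 15 ms request becomes 2 ticks; with `tick_hz` at 1000, the same request becomes 15 ticks, and the calling code does not change.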

	Jeff





* Re: HZ, preferably as small as possible
  2002-07-11  2:46 HZ, preferably as small as possible Grover, Andrew
  2002-07-11  3:01 ` Jeff Garzik
@ 2002-07-11  7:09 ` george anzinger
  1 sibling, 0 replies; 76+ messages in thread
From: george anzinger @ 2002-07-11  7:09 UTC (permalink / raw)
  To: Grover, Andrew; +Cc: 'CaT', Benjamin LaHaise, Andrew Morton, Linux

"Grover, Andrew" wrote:
> 
> > From: CaT [mailto:cat@zip.com.au]
> > On Wed, Jul 10, 2002 at 05:42:51PM -0400, Benjamin LaHaise wrote:
> > > On Wed, Jul 10, 2002 at 02:38:32PM -0700, Andrew Morton wrote:
> > > > OK, I'll grant that.  Why is this useful?
> > >
> > > Think video playback, where you want to queue the frame to
> > be played as
> > > close to the correct 1/60s time as possible.  With HZ=100,
> > the code will
> >
> > Or 1/50 (think PAL), no? (Of course HZ=100 would be sweet for that. ;)
> 
> I don't know if I should mention this, but...
> 
> Win2k's default timer tick is 10ms (i.e. 100HZ) but it will go as low as 1ms
> (1000HZ) if people request timers with that level of granularity. On the
> fly.

This is what the high-res-timers patch does.  It always does
the 1/HZ tick, but if a timer is requested with finer
granularity (resolution) an interrupt is scheduled to take
care of it.  Check it out.  You will find it here:
http://sourceforge.net/projects/high-res-timers/
> 
> So, a changing tick *can* be done. If Linux does the same thing, seems like
> everyone is happy. What are the obstacles to this for Linux? If code is
> based on the assumption of a constant timer tick, I humbly assert that the
> code is broken.
> 
> Regards -- Andy
> -

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: HZ, preferably as small as possible
  2002-07-11  3:01 ` Jeff Garzik
@ 2002-07-11 11:45   ` Alan Cox
  2002-07-11 17:08   ` Martin Dalecki
  1 sibling, 0 replies; 76+ messages in thread
From: Alan Cox @ 2002-07-11 11:45 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Grover Andrew, 'CaT', Benjamin LaHaise, Andrew Morton, Linux

> Grover, Andrew wrote:
> > So, a changing tick *can* be done. If Linux does the same thing, seems like
> > everyone is happy. What are the obstacles to this for Linux? If code is
> > based on the assumption of a constant timer tick, I humbly assert that the
> > code is broken.
> 
> I don't see that making 'HZ' a variable is really an option, because 
> many drivers and scheduler-related code will be wildly inaccurate as 
> soon as HZ actually changes values.

HZ never changes value. HZ is the top granularity we choose to operate at.


* Re: HZ, preferably as small as possible
  2002-07-11  3:01 ` Jeff Garzik
  2002-07-11 11:45   ` Alan Cox
@ 2002-07-11 17:08   ` Martin Dalecki
  2002-07-11 19:21     ` Albert D. Cahalan
  2002-07-11 20:34     ` Bill Davidsen
  1 sibling, 2 replies; 76+ messages in thread
From: Martin Dalecki @ 2002-07-11 17:08 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Grover, Andrew, 'CaT', Benjamin LaHaise, Andrew Morton, Linux

Jeff Garzik wrote:
> Grover, Andrew wrote:
> 
>> So, a changing tick *can* be done. If Linux does the same thing, seems 
>> like
>> everyone is happy. What are the obstacles to this for Linux? If code is
>> based on the assumption of a constant timer tick, I humbly assert that 
>> the
>> code is broken.
> 
> 
> Unfortunately code in Linux has traditionally compiled in a constant HZ 
> all over the place, and jiffies instead of real time units are at the 
> heart of all Linux timer-related activities.
> 
> I don't see that making 'HZ' a variable is really an option, because 
> many drivers and scheduler-related code will be wildly inaccurate as 
> soon as HZ actually changes values.
> 
> So that leaves us with the option of changing all the code related to 
> waiting to be based on msecs and usecs.  Which I would love to do, but 
> that's a lot of work, both code- and audit-wise.

vmstat.c:

hz = sysconf(_SC_CLK_TCK);	/* get ticks/s from system */

And yes I know the libproc is *evil* in this area.
The rest should be an implementation detail of sysconf().
Changing this value during the runtime of vmstat is an interesting story
anyway, but it should be up to the sysadmin to do this kind
of stuff only at runlevel 1.
sysconf can indeed be implemented as a single global
file containing configuration data. But sysctl is another story
of course.
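
A small sketch of the user-space side discussed here, assuming a POSIX system: `sysconf(_SC_CLK_TCK)` supplies the number of `clock_t` ticks per second, and `times()` returns CPU time in those ticks. The helper names `user_ticks_per_second` and `user_cpu_seconds` are made up for illustration and are not taken from vmstat or libproc.

```c
#include <sys/times.h>
#include <unistd.h>

/* Ticks per second for clock_t values, as reported by the C library. */
static long user_ticks_per_second(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* User-mode CPU time consumed by this process so far, in seconds.
 * Returns a negative value if either call fails. */
static double user_cpu_seconds(void)
{
    struct tms t;
    long hz = user_ticks_per_second();

    if (hz <= 0 || times(&t) == (clock_t)-1)
        return -1.0;
    return (double)t.tms_utime / (double)hz;
}
```

Because the divisor is looked up at run time, the program does not carry a compiled-in HZ value.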





* Re: HZ, preferably as small as possible
  2002-07-11 17:08   ` Martin Dalecki
@ 2002-07-11 19:21     ` Albert D. Cahalan
  2002-07-16  9:17       ` Kai Henningsen
  2002-07-11 20:34     ` Bill Davidsen
  1 sibling, 1 reply; 76+ messages in thread
From: Albert D. Cahalan @ 2002-07-11 19:21 UTC (permalink / raw)
  To: Martin Dalecki
  Cc: Jeff Garzik, Grover Andrew, 'CaT',
	Benjamin LaHaise, Andrew Morton, Linux

Martin Dalecki writes:
> Jeff Garzik wrote:

>> I don't see that making 'HZ' a variable is really an option, because 
>> many drivers and scheduler-related code will be wildly inaccurate as 
>> soon as HZ actually changes values.

Definitely:
my_timeout = foo*HZ;

>> So that leaves us with the option of changing all the code related to 
>> waiting to be based on msecs and usecs.  Which I would love to do, but 
>> that's a lot of work, both code- and audit-wise.
>
> vmstat.c:
>
> hz = sysconf(_SC_CLK_TCK);	/* get ticks/s from system */

Oops! Sorry I missed that one. Not that it matters for
the 2.5.25 kernel and above, but that code really should
be using the Hertz value supplied by libproc.

> And yes I know the libproc is *evil* in this area.

Hell yes. It's going to remain evil until the 2.4 kernel
is a distant memory. Debian uses a 2.2 kernel in the
upcoming release, so it will be a good long time until
everyone is using a 2.6 kernel. When 2.8 comes out,
Debian will finally stop using 2.4 and I can get rid of
my evil hack.

Hey, I asked for a clean way to get HZ. I didn't even
get "send a patch"; I got BS about the 2.5.25 behavior
being standard, as if it had already been implemented.

> The rest should be an implementation detail of sysconf().

That's broken. It can't even correctly report the
number of processors you have.


* Re: HZ, preferably as small as possible
  2002-07-11 17:08   ` Martin Dalecki
  2002-07-11 19:21     ` Albert D. Cahalan
@ 2002-07-11 20:34     ` Bill Davidsen
  2002-07-12 12:01       ` Martin Dalecki
  2002-07-15  5:15       ` Linus Torvalds
  1 sibling, 2 replies; 76+ messages in thread
From: Bill Davidsen @ 2002-07-11 20:34 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Jeff Garzik, Andrew Morton, Linux

On Thu, 11 Jul 2002, Martin Dalecki wrote:

> vmstat.c:
> 
> hz = sysconf(_SC_CLK_TCK);	/* get ticks/s from system */
> 
> And yes I know the libproc is *evil* in this area.
> The rest should be an implementation detail of sysconf().

Yes, any of the changes need to make the dynamic value available to
programs. Alas, too many programs grab the HZ value and compile it in, and
don't work right on a kernel with a modified rate. I don't know if the
CLK_TCK macro is dynamic or not, I sure hope so.

I'd like to see it set at boot time, and available in /proc/sys for easy
use by scripts. As noted by others, there are a lot of uses in the kernel
source which assume that arithmetic will happen at compile time, and even
if you ignore the overhead it would take a lot of rewriting to make it
dynamic. Setting it at boot time gets most of the gain and none of the
pain (boot time = pick a kernel, not a parameter).

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: HZ, preferably as small as possible
  2002-07-11 20:34     ` Bill Davidsen
@ 2002-07-12 12:01       ` Martin Dalecki
  2002-07-15  5:15       ` Linus Torvalds
  1 sibling, 0 replies; 76+ messages in thread
From: Martin Dalecki @ 2002-07-12 12:01 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Jeff Garzik, Andrew Morton, Linux

Bill Davidsen wrote:
> On Thu, 11 Jul 2002, Martin Dalecki wrote:
> 
> 
>>vmstat.c:
>>
>>hz = sysconf(_SC_CLK_TCK);	/* get ticks/s from system */
>>
>>And yes I know the libproc is *evil* in this area.
>>The rest should be an implementation detail of sysconf().
> 
> 
> Yes, any of the changes need to make the dynamic value available to
> programs. Alas, too many programs grab the HZ value and compile it in, and
> don't work right on a kernel with a modified rate. I don't know if the
> CLK_TCK macro is dynamic or not, I sure hope so.
> 
> I'd like to see it set at boot time, and available in /proc/sys for easy
> use by scripts. As noted by others, there are a lot of uses in the kernel
> source which assume that arithmetic will happen at compile time, and even
> if you ignore the overhead it would take a lot of rewriting to make it
> dynamic. Setting it at boot time gets most of the gain and none of the
> pain (boot time = pick a kernel, not a parameter).
> 

IMHO there were reasons why the standards define a function
to access this information from applications.



* Re: HZ, preferably as small as possible
  2002-07-11 20:34     ` Bill Davidsen
  2002-07-12 12:01       ` Martin Dalecki
@ 2002-07-15  5:15       ` Linus Torvalds
  2002-07-15  6:56         ` Albert D. Cahalan
  2002-07-15  8:58         ` Dave Mielke
  1 sibling, 2 replies; 76+ messages in thread
From: Linus Torvalds @ 2002-07-15  5:15 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.3.96.1020711162333.5732C-100000@gatekeeper.tmr.com>,
Bill Davidsen  <davidsen@tmr.com> wrote:
>On Thu, 11 Jul 2002, Martin Dalecki wrote:
>
>> vmstat.c:
>> 
>> hz = sysconf(_SC_CLK_TCK);	/* get ticks/s from system */
>> 
>> And yes I know the libproc is *evil* in this area.
>> The rest should be an implementation detail of sysconf().
>
>Yes, any of the changes need to make the dynamic value available to
>programs.

No they don't.

Have people looked at the 2.5.x patches?

CLK_TCK is 100 on x86. As it has always been. User land should never
care about whatever random value the kernel happens to use for the
actual timer tick at that particular moment. Especially since the kernel
internal timer tick may well be variable some day.

The fact that libproc believes that HZ can change is _their_ problem.
I've told people over and over that user-level HZ is a constant (and, on
x86, that constant is 100), and that won't change.

So in current 2.5.x times() still counts at 100Hz, and /proc files that
export clock_t still show the same 100Hz rate.

The fact that the kernel internally counts at some different rate should
be _totally_ invisible to user programs (except they get better latency
for stuff like select() and other timeouts).

		Linus


* Re: HZ, preferably as small as possible
  2002-07-15  5:15       ` Linus Torvalds
@ 2002-07-15  6:56         ` Albert D. Cahalan
  2002-07-15  8:24           ` Russell King
  2002-07-15  8:58         ` Dave Mielke
  1 sibling, 1 reply; 76+ messages in thread
From: Albert D. Cahalan @ 2002-07-15  6:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds writes:

> The fact that libproc believes that HZ can change is _their_ problem.
> I've told people over and over that user-level HZ is a constant (and, on
> x86, that constant is 100), and that won't change.

Was HZ supposed to be 1024 or 1200 on alpha?
How about arm... 64, 128, or 1000?

Not even counting user-mode-linux at 20 HZ, there were
about _five_ archs in your official kernel source that
indirectly made HZ a config option.

> So in current 2.5.x times() still counts at 100Hz, and /proc files that
> export clock_t still show the same 100Hz rate.

Good. That works for the 2.5 kernel and above, assuming you
did something about alpha, arm, ia64, s390, and mips.

Unfortunately, the hack must remain for another 4 years or so.
Maybe that's not so bad though. I prefer it over this:

#ifdef __386__
#define HZ 100
#endif
#ifdef __IA64__
#define HZ 1024
#endif
#ifdef __ARM__
#define HZ 128  // if they settle on this
#endif
#ifdef __S390__
#define HZ 10
#endif
...


* Re: HZ, preferably as small as possible
  2002-07-15  6:56         ` Albert D. Cahalan
@ 2002-07-15  8:24           ` Russell King
  2002-07-15 15:48             ` David Mosberger
  2002-07-15 16:07             ` Albert D. Cahalan
  0 siblings, 2 replies; 76+ messages in thread
From: Russell King @ 2002-07-15  8:24 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: Linus Torvalds, linux-kernel

On Mon, Jul 15, 2002 at 02:56:14AM -0400, Albert D. Cahalan wrote:
> Unfortunately, the hack must remain for another 4 years or so.
> Maybe that's not so bad though. I prefer it over this:
> 
> #ifdef __386__
> #define HZ 100
> #endif
> #ifdef __IA64__
> #define HZ 1024
> #endif
> #ifdef __ARM__
> #define HZ 128  // if they settle on this

Ehh?  It's been 100 on the majority of ARM.  If it's different in libproc,
the libproc is broken.  One (broken) machine type decided it would be a
good idea to change it to 1000.  Since no one has paid any attention
to this machine for some time, its support code will get dropped if
they don't fix it before 2.6.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: HZ, preferably as small as possible
  2002-07-15  5:15       ` Linus Torvalds
  2002-07-15  6:56         ` Albert D. Cahalan
@ 2002-07-15  8:58         ` Dave Mielke
  1 sibling, 0 replies; 76+ messages in thread
From: Dave Mielke @ 2002-07-15  8:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel (mailing list)

[quoted lines by Linus Torvalds on July 15, 2002, at 05:15]

>The fact that the kernel internally counts at some different rate should
>be _totally_ invisible to user programs (except they get better latency
>for stuff like select() and other timeouts).

I believe your position to be right on. May I ask, however, about a quandary
which we have in BRLTTY? We generate short "tunes" via the PC speaker in order
to give a blind user audible clues regarding certain events. To do this, we
need rather precise control over how long each note is on. Due to the current
lack of granularity, we need to do some rather long busy loops. This has worked
out okay, but it'd of course be much better if we could rely on the kernel to
do it, especially on a busy system, if its granularity is good enough. My
quandary is that while I don't believe that user land should know what
granularity the kernel is using, I'd still like to know if we should busy loop
or let the kernel do it depending on whether or not the kernel's granularity is
good enough for our needs. It'd be nice to have a way, therefore, to query two
values at run time, i.e. the granularity that services like select can offer
and the maximum amount of time that nanosleep will do a very accurate short
wait, although I suppose that these abilities could be abused by some.
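
One way to probe this from user space without knowing the kernel's tick rate is to time a short sleep and see how far it overshoots. The following is only a sketch under assumptions: it relies on POSIX `nanosleep()` and `clock_gettime(CLOCK_MONOTONIC)` being available, the helper name `measured_sleep_ns` is hypothetical, and nothing here is taken from BRLTTY.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Sleep for request_ns nanoseconds (must be below one second) with
 * nanosleep() and return how long the sleep actually took, in
 * nanoseconds, measured with CLOCK_MONOTONIC.
 * Returns -1 if the clock cannot be read or the sleep fails. */
static int64_t measured_sleep_ns(long request_ns)
{
    struct timespec req = { 0, request_ns }, rem, t0, t1;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;              /* interrupted: sleep the remainder */
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

A program could call this once at startup with, say, a 1 ms request and fall back to busy looping when the measured duration is far above what was asked for.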

-- 
Dave Mielke           | 2213 Fox Crescent | I believe that the Bible is the
Phone: 1-613-726-0014 | Ottawa, Ontario   | Word of God. Please contact me
EMail: dave@mielke.cc | Canada  K2A 1H7   | if you're concerned about Hell.
http://familyradio.com



* Re: HZ, preferably as small as possible
  2002-07-15  8:24           ` Russell King
@ 2002-07-15 15:48             ` David Mosberger
  2002-07-15 18:20               ` Albert D. Cahalan
  2002-07-15 16:07             ` Albert D. Cahalan
  1 sibling, 1 reply; 76+ messages in thread
From: David Mosberger @ 2002-07-15 15:48 UTC (permalink / raw)
  To: Russell King; +Cc: Albert D. Cahalan, Linus Torvalds, linux-kernel

>>>>> On Mon, 15 Jul 2002 09:24:11 +0100, Russell King <rmk@arm.linux.org.uk> said:

  Russell> On Mon, Jul 15, 2002 at 02:56:14AM -0400, Albert D. Cahalan
  Russell> wrote:
  >> Unfortunately, the hack must remain for another 4 years or so.
  >> Maybe that's not so bad though. I prefer it over this:
  >> 
  >> #ifdef __386__
  >> #define HZ 100
  >> #endif
  >> #ifdef __IA64__
  >> #define HZ 1024
  >> #endif
  >> #ifdef __ARM__
  >> #define HZ 128  // if they settle on this

  Russell> Ehh?  It's been 100 on the majority of ARM.  If it's
  Russell> different in libproc, the libproc is broken.

libproc should be using AT_CLKTCK (as provided via sysconf(_SC_CLK_TCK))
at any rate.

	--david


* Re: HZ, preferably as small as possible
  2002-07-15  8:24           ` Russell King
  2002-07-15 15:48             ` David Mosberger
@ 2002-07-15 16:07             ` Albert D. Cahalan
  2002-07-15 17:06               ` Russell King
  2002-07-15 18:50               ` Linus Torvalds
  1 sibling, 2 replies; 76+ messages in thread
From: Albert D. Cahalan @ 2002-07-15 16:07 UTC (permalink / raw)
  To: Russell King; +Cc: Albert D. Cahalan, Linus Torvalds, linux-kernel

Russell King writes:
> On Mon, Jul 15, 2002 at 02:56:14AM -0400, Albert D. Cahalan wrote:

>> Unfortunately, the hack must remain for another 4 years or so.
>> Maybe that's not so bad though. I prefer it over this:
>> 
>> #ifdef __386__
>> #define HZ 100
>> #endif
>> #ifdef __IA64__
>> #define HZ 1024
>> #endif
>> #ifdef __ARM__
>> #define HZ 128  // if they settle on this
>
> Ehh?  It's been 100 on the majority of ARM.  If it's different in libproc,
> the libproc is broken.

It's not a different value in libproc. There's autodetection.
I can't just support "the majority of ARM", and people keep
giving me shit about HZ supposedly being a per-arch constant.
(not that there's a sane way to get a per-arch constant from
user code anyway)

> One (broken) machine type decided it would be a
> good idea to change it to 1000.  Since no one has paid any attention
> to this machine for some time, its support code will get dropped if
> they don't fix it before 2.6.

You have 64, 128, and 1000. See for yourself.

arch-cl7500/param.h     #define HZ 100
arch-epxa10db/param.h   #define HZ 100
arch-integrator/param.h #define HZ 100
arch-l7200/param.h      #define HZ 128
arch-shark/param.h      #define HZ 64
arch-tbox/param.h       #define HZ 1000

I need to support all of that with one binary.
So I'm stuck with:

  case    9 ...   11 :  Hertz =   10; break; /* S/390 (sometimes) */
  case   18 ...   22 :  Hertz =   20; break; /* user-mode Linux */
  case   30 ...   34 :  Hertz =   32; break; /* ia64 emulator */
  case   48 ...   52 :  Hertz =   50; break;
  case   58 ...   62 :  Hertz =   60; break;
  case   63 ...   65 :  Hertz =   64; break; /* StrongARM /Shark */
  case   95 ...  105 :  Hertz =  100; break; /* normal Linux */
  case  124 ...  132 :  Hertz =  128; break; /* MIPS, ARM */
  case  195 ...  204 :  Hertz =  200; break; /* normal << 1 */
  case  253 ...  260 :  Hertz =  256; break;
  case  393 ...  408 :  Hertz =  400; break; /* normal << 2 */
  case  790 ...  808 :  Hertz =  800; break; /* normal << 3 */
  case  990 ... 1010 :  Hertz = 1000; break; /* ARM */
  case 1015 ... 1035 :  Hertz = 1024; break; /* Alpha, ia64 */
  case 1180 ... 1220 :  Hertz = 1200; break; /* Alpha */
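
The `case lo ... hi` ranges above are a GNU C extension. The same snapping step can be sketched in portable C as a table lookup. The window values are copied from the ranges above; the names `hz_window` and `snap_hertz` are hypothetical, and how the raw estimate is measured is not shown in this thread, so it is simply taken as an input here.

```c
#include <stddef.h>

/* Acceptance windows copied from the case ranges quoted above. */
struct hz_window { unsigned lo, hi, hz; };

static const struct hz_window hz_windows[] = {
    {    9,   11,   10 }, {   18,   22,   20 }, {   30,   34,   32 },
    {   48,   52,   50 }, {   58,   62,   60 }, {   63,   65,   64 },
    {   95,  105,  100 }, {  124,  132,  128 }, {  195,  204,  200 },
    {  253,  260,  256 }, {  393,  408,  400 }, {  790,  808,  800 },
    {  990, 1010, 1000 }, { 1015, 1035, 1024 }, { 1180, 1220, 1200 },
};

/* Snap a noisy tick-rate estimate to a known HZ value.
 * Returns 0 when the estimate falls outside every window. */
static unsigned snap_hertz(unsigned estimate)
{
    for (size_t i = 0; i < sizeof hz_windows / sizeof hz_windows[0]; i++)
        if (estimate >= hz_windows[i].lo && estimate <= hz_windows[i].hi)
            return hz_windows[i].hz;
    return 0;
}
```

An estimate of 101 snaps to 100 and 1020 snaps to 1024, while a value such as 70 matches no window and is reported as 0.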


* Re: HZ, preferably as small as possible
  2002-07-15 16:07             ` Albert D. Cahalan
@ 2002-07-15 17:06               ` Russell King
  2002-07-15 18:43                 ` Albert D. Cahalan
  2002-07-15 18:50               ` Linus Torvalds
  1 sibling, 1 reply; 76+ messages in thread
From: Russell King @ 2002-07-15 17:06 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: Linus Torvalds, linux-kernel

On Mon, Jul 15, 2002 at 12:07:18PM -0400, Albert D. Cahalan wrote:
> You have 64, 128, and 1000. See for yourself.
> 
> arch-cl7500/param.h     #define HZ 100
> arch-epxa10db/param.h   #define HZ 100
> arch-integrator/param.h #define HZ 100
> arch-l7200/param.h      #define HZ 128
> arch-shark/param.h      #define HZ 64
> arch-tbox/param.h       #define HZ 1000
> 
> I need to support all of that with one binary.
> So I'm stuck with:

Let's look more closely:

#ifndef HZ
#define HZ 100
#endif
#if defined(__KERNEL__) && (HZ == 100)
#define hz_to_std(a) (a)
#endif

And:

$ grep hz_to_std arch-*/param.h
arch-l7200/param.h:#define hz_to_std(a) ((a * HZ)/100)
arch-shark/param.h:#define hz_to_std(a) ((a * HZ)/100)

As I said, tbox is broken, so ignore that.

And hz_to_std gets used (fs/proc/array.c):

                hz_to_std(task->times.tms_utime),
                hz_to_std(task->times.tms_stime),
                hz_to_std(task->times.tms_cutime),
                hz_to_std(task->times.tms_cstime),

So merely grepping for HZ doesn't actually tell you anything.

All /proc values are in 100Hz units on ARM.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: HZ, preferably as small as possible
  2002-07-15 15:48             ` David Mosberger
@ 2002-07-15 18:20               ` Albert D. Cahalan
  2002-07-15 18:30                 ` David Mosberger
  0 siblings, 1 reply; 76+ messages in thread
From: Albert D. Cahalan @ 2002-07-15 18:20 UTC (permalink / raw)
  To: davidm; +Cc: Russell King, Albert D. Cahalan, Linus Torvalds, linux-kernel

David Mosberger writes:

> libproc should be using AT_CLKTCK (as provided via sysconf(_SC_CLK_TCK))
> at any rate.

If that would work reliably, sure. The glibc hackers have had
some trouble with doing a correct implementation. I've heard
that recently the kernel has been supplying glibc with HZ via
the ELF note mechanism, but I've no way to tell a broken glibc
from a working one. Thus libproc does things the painful way.

Perhaps you could explain how to access ELF notes from
regular app code. That covers 2.4 kernels AFAIK, and so
the hacks could go away as soon as Debian retires the
2.2 kernel.



* Re: HZ, preferably as small as possible
  2002-07-15 18:20               ` Albert D. Cahalan
@ 2002-07-15 18:30                 ` David Mosberger
  0 siblings, 0 replies; 76+ messages in thread
From: David Mosberger @ 2002-07-15 18:30 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: davidm, Russell King, Linus Torvalds, linux-kernel

>>>>> On Mon, 15 Jul 2002 14:20:31 -0400 (EDT), "Albert D. Cahalan" <acahalan@cs.uml.edu> said:

  Albert> Perhaps you could explain how to access ELF notes from
  Albert> regular app code. That covers 2.4 kernels AFAIK, and so the
  Albert> hacks could go away as soon as Debian retires the 2.2
  Albert> kernel.

The ELF auxiliary info table is stored at the top of the user level
stack (above argv and envp).  &envp[num_envs] should get you there
(check on the alignment, though).
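
A rough sketch of that walk, assuming a Linux ELF process and the `ElfW(auxv_t)` type from `<link.h>`. The function name `auxv_clock_tick` is hypothetical, and the code only works on the environment pointer the process was started with.

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>

extern char **environ;

/* Walk past the NULL that terminates the environment array and scan the
 * ELF auxiliary vector that follows it for AT_CLKTCK.
 * envp must be the pointer the process was started with: once the
 * environment has been reallocated (for example by setenv()), the
 * auxiliary vector no longer follows it.
 * Returns -1 if no AT_CLKTCK entry exists. */
static long auxv_clock_tick(char **envp)
{
    while (*envp != NULL)
        envp++;

    for (const ElfW(auxv_t) *aux = (const ElfW(auxv_t) *)(envp + 1);
         aux->a_type != AT_NULL; aux++) {
        if (aux->a_type == AT_CLKTCK)
            return (long)aux->a_un.a_val;
    }
    return -1;
}
```

Calling `auxv_clock_tick(environ)` early in `main()`, before anything modifies the environment, should return the kernel-supplied value. On glibc 2.16 and later, `getauxval(AT_CLKTCK)` from `<sys/auxv.h>` reads the same entry without the pointer walk.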

	--david


* Re: HZ, preferably as small as possible
  2002-07-15 17:06               ` Russell King
@ 2002-07-15 18:43                 ` Albert D. Cahalan
  2002-07-15 18:53                   ` Russell King
  0 siblings, 1 reply; 76+ messages in thread
From: Albert D. Cahalan @ 2002-07-15 18:43 UTC (permalink / raw)
  To: Russell King; +Cc: Albert D. Cahalan, Linus Torvalds, linux-kernel

Russell King writes:
> On Mon, Jul 15, 2002 at 12:07:18PM -0400, Albert D. Cahalan wrote:

>> You have 64, 128, and 1000. See for yourself.
>>
>> arch-cl7500/param.h     #define HZ 100
>> arch-epxa10db/param.h   #define HZ 100
>> arch-integrator/param.h #define HZ 100
>> arch-l7200/param.h      #define HZ 128
>> arch-shark/param.h      #define HZ 64
>> arch-tbox/param.h       #define HZ 1000
>>
>> I need to support all of that with one binary.
>> So I'm stuck with:
>
> Let's look more closely:
>
> #ifndef HZ
> #define HZ 100
> #endif
> #if defined(__KERNEL__) && (HZ == 100)
> #define hz_to_std(a) (a)
> #endif
>
> And:
>
> $ grep hz_to_std arch-*/param.h
> arch-l7200/param.h:#define hz_to_std(a) ((a * HZ)/100)
> arch-shark/param.h:#define hz_to_std(a) ((a * HZ)/100)

Won't that overflow in 3 or 4 days?

> As I said, tbox is broken, so ignore that.

OK.

> And hz_to_std gets used (fs/proc/array.c):
>
>                 hz_to_std(task->times.tms_utime),
>                 hz_to_std(task->times.tms_stime),
>                 hz_to_std(task->times.tms_cutime),
>                 hz_to_std(task->times.tms_cstime),

Now look in the 2.4.xx kernel source.

> So merely grepping for HZ doesn't actually tell you anything.
> 
> All /proc values are in 100Hz units on ARM.

Since kernel 2.5.25 it looks like. I must support
the 2.4.xx kernels at least, and 2.2.xx is still
pretty popular.




* Re: HZ, preferably as small as possible
  2002-07-15 16:07             ` Albert D. Cahalan
  2002-07-15 17:06               ` Russell King
@ 2002-07-15 18:50               ` Linus Torvalds
  2002-07-15 20:15                 ` Albert D. Cahalan
  1 sibling, 1 reply; 76+ messages in thread
From: Linus Torvalds @ 2002-07-15 18:50 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: Russell King, linux-kernel


On Mon, 15 Jul 2002, Albert D. Cahalan wrote:
>
> It's not a different value in libproc. There's autodetection.
> I can't just support "the majority of ARM", and people keep
> giving me shit about HZ supposedly being a per-arch constant.
> (not that there's a sane way to get a per-arch constant from
> user code anyway)

But that's just _wrong_.

There _is_ a sane way to get the per-arch constant, and there has been for 
a long long time.

The kernel exports it with the AT_CLKTCK ELF auxiliary note to every ELF
binary ever loaded, and I think glibc in turn exports that value through
the regular sysconf(_SC_CLK_TCK) thing. (Yeah, I disagree with some of the
glibc sysconf implementation, but it sure should be there, and it's
documented).

If that doesn't work, then it's a glibc bug (well, in theory there could
be a kernel bug too, but since it's a one-liner in the kernel I really
doubt it).

		Linus



* Re: HZ, preferably as small as possible
  2002-07-15 18:43                 ` Albert D. Cahalan
@ 2002-07-15 18:53                   ` Russell King
  0 siblings, 0 replies; 76+ messages in thread
From: Russell King @ 2002-07-15 18:53 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: Linus Torvalds, linux-kernel

On Mon, Jul 15, 2002 at 02:43:00PM -0400, Albert D. Cahalan wrote:
> Russell King writes:
> > $ grep hz_to_std arch-*/param.h
> > arch-l7200/param.h:#define hz_to_std(a) ((a * HZ)/100)
> > arch-shark/param.h:#define hz_to_std(a) ((a * HZ)/100)
> 
> Won't that overflow in 3 or 4 days?

Probably.  Someone else's problem though (who wrote those)
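
The "3 or 4 days" figure can be checked with a short sketch, assuming `a` and the product `a * HZ` are 32-bit unsigned values, as `unsigned long` is on 32-bit ARM. The function names are hypothetical, and the expression is kept exactly as quoted; only the width of the intermediate product changes.

```c
#include <stdint.h>

/* The conversion as quoted, with 32-bit arithmetic made explicit:
 * the product a * hz wraps modulo 2^32 before the division. */
static uint32_t hz_to_std_narrow(uint32_t a, uint32_t hz)
{
    return (uint32_t)(a * hz) / 100;
}

/* Same expression with the product widened to 64 bits before dividing. */
static uint32_t hz_to_std_wide(uint32_t a, uint32_t hz)
{
    return (uint32_t)(((uint64_t)a * hz) / 100);
}

/* Days of time counted at hz ticks per second before a * hz no longer
 * fits in 32 bits. */
static double days_until_wrap(uint32_t hz)
{
    double max_ticks = (double)UINT32_MAX / hz;  /* largest safe a */
    return max_ticks / hz / 86400.0;             /* ticks -> s -> days */
}
```

For HZ = 128 the product wraps after roughly 3.03 days of counted time, which matches the estimate above; for HZ = 64 it takes about 12.1 days.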

> > And hz_to_std gets used (fs/proc/array.c):
> >
> >                 hz_to_std(task->times.tms_utime),
> >                 hz_to_std(task->times.tms_stime),
> >                 hz_to_std(task->times.tms_cutime),
> >                 hz_to_std(task->times.tms_cstime),
> 
> Now look in the 2.4.xx kernel source.

Firstly, you can't base any assumptions about ARM from what's in the
main kernels.

It's not in the Marcelo source, but in the -rmk patch, which you need
to have a working kernel on ARM for _any_ kernel whatsoever (because
I haven't yet managed to get Linus to take some trivial bits needed,
neither have I had any response why he won't take them.)

Yes, ARM has always been broken in every kernel there ever has been
from Linus/Marcelo/Alan.

The situation is improving with BK, but it's less than optimal; the
generic changes can't go through BK, therefore I can't really have a
BK tree that builds for ARM (because then the merging of csets gets
horrible.)

This all said, it looks like libproc automatically detects whatever
the kernel uses, so this is all irrelevant in the end.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: HZ, preferably as small as possible
  2002-07-15 18:50               ` Linus Torvalds
@ 2002-07-15 20:15                 ` Albert D. Cahalan
  0 siblings, 0 replies; 76+ messages in thread
From: Albert D. Cahalan @ 2002-07-15 20:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Albert D. Cahalan, Russell King, linux-kernel

Linus Torvalds writes:
> On Mon, 15 Jul 2002, Albert D. Cahalan wrote:

>> It's not a different value in libproc. There's autodetection.
>> I can't just support "the majority of ARM", and people keep
>> giving me shit about HZ supposedly being a per-arch constant.
>> (not that there's a sane way to get a per-arch constant from
>> user code anyway)
>
> But that's just _wrong_.

If you only support recent kernels and glibc, true.
Debian is about to release a distribution with the 2.2 kernel.

> There _is_ a sane way to get the per-arch constant, and there has been for 
> a long long time.

Your "long long time" is very different, because you
always (?) run the very latest kernel.

> The kernel exports it with the AT_CLKTCK ELF auxiliary note to every ELF
> binary ever loaded, and I think glibc in turn exports that value through
> the regular sysconf(_SC_CLK_TCK) thing. (Yeah, I disagree with some of the
> glibc sysconf implementation, but it sure should be there, and it's
> documented).
>
> If that doesn't work, then it's a glibc bug (well, in theory there could
> be a kernel bug too, but since it's a one-liner in the kernel I really
> doubt it).

Yeah, NOW it should work fine. App code sees:

old glibc and old kernel  -->  guess
old glibc and new kernel  -->  guess
new glibc and old kernel  -->  guess
new glibc and new kernel  -->  useful data

(the guess is correct for unmodified x86)

Two problems with that:

1. must handle the "guess" case
2. can't tell a guess from useful data!

So I can't use the useful data for a few more years.
I can cut that time down to maybe 2 years if I write
code to dig up the ELF notes myself, assuming those
were introduced with the 2.4 kernel.
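
For reference, a minimal sketch of the two lookups discussed above,
assuming a Linux system with a current glibc. sysconf(_SC_CLK_TCK) is
the documented route; getauxval(AT_CLKTCK) reads the ELF auxiliary
vector directly, but it was added to glibc well after this thread, so
code for 2.2-era systems would still have to walk the vector by hand.
The function names are made up for illustration.

```c
#include <elf.h>        /* AT_CLKTCK */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(); assumes a glibc new enough to have it */
#include <unistd.h>     /* sysconf(), _SC_CLK_TCK */

/* Ticks per second as libc reports them; -1 if libc cannot say. */
long clk_tck_from_libc(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* Ticks per second as the kernel passed them in the aux vector;
 * 0 if the entry is missing (getauxval() returns 0 for absent types). */
unsigned long clk_tck_from_kernel(void)
{
    return getauxval(AT_CLKTCK);
}

/* Print both values so a mismatch (libc guessing) is visible. */
void report_clk_tck(void)
{
    printf("sysconf: %ld, auxv: %lu\n",
           clk_tck_from_libc(), clk_tck_from_kernel());
}
```

A zero from the aux vector lookup is the one case where a caller can
tell that the kernel supplied nothing. The sysconf() value alone does
not distinguish a guess from real data, which is the complaint above.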

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11 19:21     ` Albert D. Cahalan
@ 2002-07-16  9:17       ` Kai Henningsen
  0 siblings, 0 replies; 76+ messages in thread
From: Kai Henningsen @ 2002-07-16  9:17 UTC (permalink / raw)
  To: linux-kernel

acahalan@cs.uml.edu (Albert D. Cahalan)  wrote on 11.07.02 in <200207111921.g6BJLtI459123@saturn.cs.uml.edu>:

> Hell yes. It's going to remain evil until the 2.4 kernel
> is a distant memory. Debian uses a 2.2 kernel in the
> upcoming release, so it will be a good long time until
> everyone is using a 2.6 kernel. When 2.8 comes out,
> Debian will finally stop using 2.4 and I can get rid of
> my evil hack.

Currently, the upcoming version has 4 2.2.20 kernels, 7 2.4.16 kernels,  
and 7 2.4.18 kernels.


MfG Kai

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-18 12:57         ` Richard B. Johnson
@ 2002-07-18 13:25           ` Daniel Phillips
  0 siblings, 0 replies; 76+ messages in thread
From: Daniel Phillips @ 2002-07-18 13:25 UTC (permalink / raw)
  To: root, Linus Torvalds; +Cc: linux-kernel

On Thursday 18 July 2002 14:57, Richard B. Johnson wrote:
> On Wed, 17 Jul 2002, Linus Torvalds wrote:
> > On Wed, 17 Jul 2002, Richard B. Johnson wrote:
> > >
> > > It is hardly novel and I can't imagine how Bresenham or whomever
> > > could make such a claim to the obvious. Even the DOS writer(s) used
> > > this technique to get one-second time intervals from the 18.206
> > > ticks/per second.
> > 
> > Ehh.. Look at _existing_ linux code to do exactly the same.
> > 
> > See update_wall_time_one_tick() and second_overflow() (which does a lot
> > more besides, but it does largely boil down to this "average fractions
> > using basic integer math" thing.
>
> Maybe you see something in the code I don't. In fact, the hardware
> appears to have been programmed to interrupt at the HZ rate
> using the constant, CLOCK_TICK_RATE, defined in ../asm/timex.h.
> Maybe the hardware can't be programmed to interrupt at HZ so the
> real ticks are adjusted by 'average fractions' code, but it is
> very unclear if this is being done.
> 
> Here is a 20 year-old source snippet of some synthetic division
> code used to correct the DOS time by substituting part of INT 08.

Yes, that's the same algorithm all right, and 'synthetic division'
is a much better name for it than the one I used.  IMHO, we should be
doing this even when there happens to be an integral relationship 
between timer interrupt rate and HZ.  It eliminates a bunch of
posturing we'd otherwise be stuck with to explain/work around
restrictions in the choice of intervals.  With a little bit of head
scratching it's also possible to add the bookkeeping necessary to
handle varying physical interrupt rates, while still maintaining
the *exact* correct HZ tick count.

Stripping some cruft from your historical example:

 	SUB	WORD PTR [ACCUMULATOR],NUMERATOR
 	JNC	NO_TICK
 	MOV	AX,DIVISOR
 	ADD	WORD PTR [ACCUMULATOR],AX	; Synth div
 	CALL	TICK
NO_TICK:

Pretty hard to beat that for efficiency.
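
The same accumulator can be sketched in C, under the assumption that
both periods are expressed in one common unit and that the raw period
is no longer than the derived one. The struct and function names are
hypothetical, not taken from any kernel tree.

```c
#include <stdint.h>

/* State for deriving a slower tick from a faster raw interrupt.
 * Both periods must be in the same unit and satisfy
 * raw_period <= out_period. */
struct tick_divider {
    int32_t acc;         /* running remainder; in [0, out_period) after each step */
    int32_t raw_period;  /* subtracted on every raw interrupt */
    int32_t out_period;  /* added back each time the derived tick fires */
};

/* Call once per raw interrupt. Returns 1 when the derived tick is due. */
int tick_divider_step(struct tick_divider *d)
{
    d->acc -= d->raw_period;
    if (d->acc >= 0)
        return 0;
    d->acc += d->out_period;   /* carry the remainder forward */
    return 1;
}

/* Feed n raw interrupts through the divider and count derived ticks. */
long tick_divider_run(struct tick_divider *d, long n)
{
    long fired = 0;
    while (n-- > 0)
        fired += tick_divider_step(d);
    return fired;
}
```

With raw_period = 1000 and out_period = 18206 (the DOS figures), 18206
raw interrupts produce exactly 1000 derived ticks and the accumulator
returns to zero, so no error builds up over time.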

-- 
Daniel

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-17 21:02       ` Linus Torvalds
  2002-07-17 21:16         ` Daniel Phillips
@ 2002-07-18 12:57         ` Richard B. Johnson
  2002-07-18 13:25           ` Daniel Phillips
  1 sibling, 1 reply; 76+ messages in thread
From: Richard B. Johnson @ 2002-07-18 12:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Phillips, linux-kernel

On Wed, 17 Jul 2002, Linus Torvalds wrote:

> 
> 
> On Wed, 17 Jul 2002, Richard B. Johnson wrote:
> >
> > It is hardly novel and I can't imagine how Bresenham or whomever
> > could make such a claim to the obvious. Even the DOS writer(s) used
> > this technique to get one-second time intervals from the 18.206
> > ticks/per second.
> 
> Ehh.. Look at _existing_ linux code to do exactly the same.
> 
> See update_wall_time_one_tick() and second_overflow() (which does a lot
> more besides, but it does largely boil down to this "average fractions
> using basic integer math" thing.
> 
> 		Linus
> 
Maybe you see something in the code I don't. In fact, the hardware
appears to have been programmed to interrupt at the HZ rate
using the constant, CLOCK_TICK_RATE, defined in ../asm/timex.h.
Maybe the hardware can't be programmed to interrupt at HZ so the
real ticks are adjusted by 'average fractions' code, but it is
very unclear if this is being done.

Here is a 20 year-old source snippet of some synthetic division
code used to correct the DOS time by substituting part of INT 08.

The dividend, DIVIDEND, was cached in a variable called _pll,
(accessible from 'C' code as pll). This could be tuned to slowly
adjust the time to make it as exact as you wanted. I replaced
the actual time-keeping code with a CALL ONE_SEC to shorten
this example. It executes at 1 second intervals. The history
in the comments is interesting.


Old-fashioned Intel operand order: DEST <--- SOURCE

DIVIDEND	EQU	18206			; 18.206
DIVISOR		EQU	1000			; 18206/1000 = 18.206

_pll	DW	DIVIDEND
_false	DW	0
;
;	This is the local timer tick.
;
;	The timer tick used to interrupt at 18.206 ticks/second. When
;	the AT got redone, this was changed to 18.158 because the
;	clock to timer channel 0 got changed to 1.190 MHz and the
;	requirement of using a 3.579545 MHz (color subcarrier) crystal
;	was eliminated.
;
;	Was:	3.579545 / 3 = 1.193181667 / 65536 = 18.206
;	Now:	1.190 / 65536 = 18.158
;	Also:	1.1934 / 65536 = 18.210
;
;	As usual, things are not well in AT-Land. The 'C' runtime library
;	and MS-DOS still think that the proper divisor is 18.206. So we
;	have to use that value or the time-stamps on the hour will be
;	about 1/2 second in error. This means that the AT-Time is wrong
;	by about 1/2 seconds per hour!
;
;
EVEN
TIMER	PROC	FAR
	SAV_REG	AX, DS				; Save registers used
	MOV	AX,DGROUP			; Local data area
	MOV	DS,AX				; Set segment
	MOV	AL,01001011B
;                  ||||||||_____ Select in-service register
;                  ||||||_______ No poll
;                  |||||________ Always 1
;                  ||||_________ Always 0
;                  |||__________ Standard mask
;                  |____________ Always 0
	OUT	INT_CTL,AL			; Select OCW3
	IN	AL,INT_CTL			; Get results
	AND	AL,00000001B			; Is interrupt pending?
	JNZ	OKAY				; Yes
	INC	WORD PTR _false			; Not good, record
OKAY:	MOV	AL,SPC_EOI			; Specific EOI
	OUT	INT_CTL,AL			; Reset controller
	SUB	WORD PTR [LCL_SECONDS],DIVISOR	; Subtract 1000 ticks
	JNC	NOSEC				; 1 second is not up yet
	MOV	AX,WORD PTR [_pll]		; Get loop variable
	ADD	WORD PTR [LCL_SECONDS],AX	; Synth div = 1000/18206
	CALL	ONE_SEC				; Dummy
;
NOSEC:	RES_REG	AX, DS				; Restore registers used
	IRET					; Done
TIMER	ENDP
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-17 20:31     ` Richard B. Johnson
  2002-07-17 20:40       ` Daniel Phillips
  2002-07-17 21:02       ` Linus Torvalds
@ 2002-07-18 10:10       ` Kai Henningsen
  2 siblings, 0 replies; 76+ messages in thread
From: Kai Henningsen @ 2002-07-18 10:10 UTC (permalink / raw)
  To: linux-kernel

root@chaos.analogic.com (Richard B. Johnson)  wrote on 17.07.02 in <Pine.LNX.3.95.1020717162206.12592A-100000@chaos.analogic.com>:

> On Wed, 17 Jul 2002, Daniel Phillips wrote:
>
> > On Monday 15 July 2002 07:06, Linus Torvalds wrote:

[Are those attributions really right?]

> > > This Bresenham trick works for arbitrary collections of interrupt
> > > rates, all with different periods.  It has the property that,
> > > over time, the total number of invocations at each rate remains
> > > *exactly* correct, and so long as the raw interrupt runs at a
> > > reasonably high rate, displacement isn't that bad either.
> >
> > This technique is scarcely less efficient than the cruder method.
>
> It is hardly novel and I can't imagine how Bresenham or whomever
> could make such a claim to the obvious. Even the DOS writer(s) used

Well, I might point out the original (AFAIAA) paper is "J. E. Bresenham,  
IBM Systems Journal 4, 25-30 (1965)".

It's a long time from 1965 to the creation of DOS.

MfG Kai

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-17 21:02       ` Linus Torvalds
@ 2002-07-17 21:16         ` Daniel Phillips
  2002-07-18 12:57         ` Richard B. Johnson
  1 sibling, 0 replies; 76+ messages in thread
From: Daniel Phillips @ 2002-07-17 21:16 UTC (permalink / raw)
  To: Linus Torvalds, Richard B. Johnson; +Cc: linux-kernel

On Wednesday 17 July 2002 23:02, Linus Torvalds wrote:
> On Wed, 17 Jul 2002, Richard B. Johnson wrote:
> >
> > It is hardly novel and I can't imagine how Bresenham or whomever
> > could make such a claim to the obvious. Even the DOS writer(s) used
> > this technique to get one-second time intervals from the 18.206
> > ticks/per second.
> 
> Ehh.. Look at _existing_ linux code to do exactly the same.
> 
> See update_wall_time_one_tick() and second_overflow() (which does a lot
> more besides, but it does largely boil down to this "average fractions
> using basic integer math" thing.

I see lots of stuff in there all right, but I don't see anything that
implements the numerator/denominator error analysis technique I
described above.  Maybe I just didn't look hard enough.

-- 
Daniel

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-17 20:31     ` Richard B. Johnson
  2002-07-17 20:40       ` Daniel Phillips
@ 2002-07-17 21:02       ` Linus Torvalds
  2002-07-17 21:16         ` Daniel Phillips
  2002-07-18 12:57         ` Richard B. Johnson
  2002-07-18 10:10       ` Kai Henningsen
  2 siblings, 2 replies; 76+ messages in thread
From: Linus Torvalds @ 2002-07-17 21:02 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Daniel Phillips, linux-kernel



On Wed, 17 Jul 2002, Richard B. Johnson wrote:
>
> It is hardly novel and I can't imagine how Bresenham or whomever
> could make such a claim to the obvious. Even the DOS writer(s) used
> this technique to get one-second time intervals from the 18.206
> ticks/per second.

Ehh.. Look at _existing_ linux code to do exactly the same.

See update_wall_time_one_tick() and second_overflow() (which does a lot
more besides, but it does largely boil down to this "average fractions
using basic integer math" thing.

		Linus


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-17 20:40       ` Daniel Phillips
@ 2002-07-17 21:02         ` Richard B. Johnson
  0 siblings, 0 replies; 76+ messages in thread
From: Richard B. Johnson @ 2002-07-17 21:02 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

On Wed, 17 Jul 2002, Daniel Phillips wrote:

> On Wednesday 17 July 2002 22:31, Richard B. Johnson wrote:
> > On Wed, 17 Jul 2002, Daniel Phillips wrote:
> > 
> > > On Monday 15 July 2002 07:06, Linus Torvalds wrote:
> > > > There is, of course, the option to do variable frequency (and make it
> > > > integer multiples of the exposed "constant HZ" so that kernel code
> > > > doesn't actually need to _care_ about the variability). There are
> > > > patches to play with things like that.
> > > 
> > > We don't have to feel restricted to integer multiples.  I'll paste in my 
> > > earlier post, for your convenience:
> > > 
> > > > ...If somebody wants a cruder scheduling interval than the raw timer
> > > > interrupt, that's child's play, just step the interval down.  The
> > > > only slightly challenging thing is do that without restricting
> > > > choice of rate for the raw timer and scheduler, respectively.  Here,
> > > > a novel application of Bresenham's algorithm (the line drawing
> > > > algorithm) works nicely: at each raw interrupt, subtract the period
> > > > of the raw interrupt from an accumulator; if the result is less
> > > > than zero, add the period of the scheduler to the accumulator and
> > > > drop into the scheduler's part of the timer interrupt.
> > > 
> > > [which just increments the timer variable I believe]
> > > 
> > > > This Bresenham trick works for arbitrary collections of interrupt
> > > > rates, all with different periods.  It has the property that,
> > > > over time, the total number of invocations at each rate remains
> > > > *exactly* correct, and so long as the raw interrupt runs at a
> > > > reasonably high rate, displacement isn't that bad either.
> > > 
> > > This technique is scarcely less efficient than the cruder method.
> > 
> > It is hardly novel and I can't imagine how Bresenham or whomever
> > could make such a claim to the obvious. Even the DOS writer(s) used
> > this technique to get one-second time intervals from the 18.206
> > ticks/per second. This is simply division by subtraction, but you
> > don't throw away the remainder. Therefore, in the limit, there is
> > no remainder. However, at any instant, the time can be off by as
> > much as the divisor -1. FYI, you make digital filters using this
> > same method, it's hardly novel.
> 
> It's novel for Linux then, because it seems not to have occurred to
> anyone here.  I'll take your aggressive response as a vote in favor.
> 

It's basically no overhead greater than the minimum if written in
assembly because the carry to less-than-zero is in the flags. In
'C' it requires a subtraction and then a test, but it's trivial code
and it provides for non-integral divisions with integers. I'm all
for it.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-17 19:33   ` Daniel Phillips
  2002-07-17 20:31     ` Richard B. Johnson
@ 2002-07-17 20:55     ` Linus Torvalds
  1 sibling, 0 replies; 76+ messages in thread
From: Linus Torvalds @ 2002-07-17 20:55 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel



On Wed, 17 Jul 2002, Daniel Phillips wrote:
>
> We don't have to feel restricted to integer multiples.  I'll paste in my
> earlier post, for your convenience:

Oh, I agree. I think the integer multiples simplify the problem space,
and should be trivial for most timer hardware (ie most timer hardware is
likely just a counter, so making the countdown value be N times as big
just automatically gives you integer multiples).
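
As a rough illustration of that point, assuming a 16-bit down-counter
fed by the usual PC/AT timer clock (the constants and the helper name
below are assumptions, not taken from a specific kernel tree):

```c
#include <stdint.h>

/* Assumed input clock of the PC/AT timer in Hz; believed to match
 * CLOCK_TICK_RATE on x86, but treat it as an assumption. */
#define TIMER_INPUT_HZ 1193180u

/* Largest count a 16-bit down-counter can hold (0 is taken as 65536). */
#define TIMER_MAX_COUNT 65536u

/* Countdown value for a tick that is `slowdown` times longer than the
 * base tick at `hz` (hz must be nonzero). Returns 0 if the result
 * does not fit the counter. */
uint32_t timer_reload(uint32_t hz, uint32_t slowdown)
{
    uint32_t base = (TIMER_INPUT_HZ + hz / 2) / hz;  /* rounded divide */
    uint64_t count = (uint64_t)base * slowdown;
    return count <= TIMER_MAX_COUNT ? (uint32_t)count : 0;
}
```

With hz = 1000 the base count is 1193; a slowdown of 10 gives 11930,
and each interrupt then stands for 10 base ticks. That is close to,
but not the same as, the 11932 that a native 100 Hz tick would use.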

The _important_ part I would prefer people to take away is that it is
easier to "slow down" the clock than it is to speed it up. Mainly because
the places that are likely to care about speeding it up are also very very
timing-critical. For example, there is no way in _hell_ that we're going
to reprogram the old-style PC/AT timer inside the "add_timer()" function.
It just is not viable.

In contrast, the places who are interested in slowing the timer down are
also the places likely to not be as timing-critical. The idle loop being
the perfect example (and also being right now the _only_ example where
somebody actually asked for a slower timer tick).

Also note that once you're willing to do this in the slow path, you can
also do real "fixups" to the results since you can afford to take a small
hit when you get back to "fast mode". For example, if we only do this on
PCs while we go into C3 anyway (where the latencies of saving power are quite
noticeable anyway, so that the idle loop already has to do some latency
estimation before it decides to go into C3), then we can easily afford to
completely reset not just the timer counter, but also do fairly complex
things like re-adjusting the whole time-of-day clock.

See how it becomes a much simpler game (and you have more options) if you
take the "slow the timer down when idle" approach instead of taking the
"speed the timer up when you need to" approach?

			Linus


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-17 20:31     ` Richard B. Johnson
@ 2002-07-17 20:40       ` Daniel Phillips
  2002-07-17 21:02         ` Richard B. Johnson
  2002-07-17 21:02       ` Linus Torvalds
  2002-07-18 10:10       ` Kai Henningsen
  2 siblings, 1 reply; 76+ messages in thread
From: Daniel Phillips @ 2002-07-17 20:40 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

On Wednesday 17 July 2002 22:31, Richard B. Johnson wrote:
> On Wed, 17 Jul 2002, Daniel Phillips wrote:
> 
> > On Monday 15 July 2002 07:06, Linus Torvalds wrote:
> > > There is, of course, the option to do variable frequency (and make it
> > > integer multiples of the exposed "constant HZ" so that kernel code
> > > doesn't actually need to _care_ about the variability). There are
> > > patches to play with things like that.
> > 
> > We don't have to feel restricted to integer multiples.  I'll paste in my 
> > earlier post, for your convenience:
> > 
> > > ...If somebody wants a cruder scheduling interval than the raw timer
> > > interrupt, that's child's play, just step the interval down.  The
> > > only slightly challenging thing is do that without restricting
> > > choice of rate for the raw timer and scheduler, respectively.  Here,
> > > a novel application of Bresenham's algorithm (the line drawing
> > > algorithm) works nicely: at each raw interrupt, subtract the period
> > > of the raw interrupt from an accumulator; if the result is less
> > > than zero, add the period of the scheduler to the accumulator and
> > > drop into the scheduler's part of the timer interrupt.
> > 
> > [which just increments the timer variable I believe]
> > 
> > > This Bresenham trick works for arbitrary collections of interrupt
> > > rates, all with different periods.  It has the property that,
> > > over time, the total number of invocations at each rate remains
> > > *exactly* correct, and so long as the raw interrupt runs at a
> > > reasonably high rate, displacement isn't that bad either.
> > 
> > This technique is scarcely less efficient than the cruder method.
> 
> It is hardly novel and I can't imagine how Bresenham or whomever
> could make such a claim to the obvious. Even the DOS writer(s) used
> this technique to get one-second time intervals from the 18.206
> ticks/per second. This is simply division by subtraction, but you
> don't throw away the remainder. Therefore, in the limit, there is
> no remainder. However, at any instant, the time can be off by as
> much as the divisor -1. FYI, you make digital filters using this
> same method, it's hardly novel.

It's novel for Linux then, because it seems not to have occurred to
anyone here.  I'll take your aggressive response as a vote in favor.

-- 
Daniel

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-17 19:33   ` Daniel Phillips
@ 2002-07-17 20:31     ` Richard B. Johnson
  2002-07-17 20:40       ` Daniel Phillips
                         ` (2 more replies)
  2002-07-17 20:55     ` Linus Torvalds
  1 sibling, 3 replies; 76+ messages in thread
From: Richard B. Johnson @ 2002-07-17 20:31 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Linus Torvalds, linux-kernel

On Wed, 17 Jul 2002, Daniel Phillips wrote:

> On Monday 15 July 2002 07:06, Linus Torvalds wrote:
> > There is, of course, the option to do variable frequency (and make it
> > integer multiples of the exposed "constant HZ" so that kernel code
> > doesn't actually need to _care_ about the variability). There are
> > patches to play with things like that.
> 
> We don't have to feel restricted to integer multiples.  I'll paste in my 
> earlier post, for your convenience:
> 
> > ...If somebody wants a cruder scheduling interval than the raw timer
> > interrupt, that's child's play, just step the interval down.  The
> > only slightly challenging thing is do that without restricting
> > choice of rate for the raw timer and scheduler, respectively.  Here,
> > a novel application of Bresenham's algorithm (the line drawing
> > algorithm) works nicely: at each raw interrupt, subtract the period
> > of the raw interrupt from an accumulator; if the result is less
> > than zero, add the period of the scheduler to the accumulator and
> > drop into the scheduler's part of the timer interrupt.
> 
> [which just increments the timer variable I believe]
> 
> > This Bresenham trick works for arbitrary collections of interrupt
> > rates, all with different periods.  It has the property that,
> > over time, the total number of invocations at each rate remains
> > *exactly* correct, and so long as the raw interrupt runs at a
> > reasonably high rate, displacement isn't that bad either.
> 
> This technique is scarcely less efficient than the cruder method.

It is hardly novel and I can't imagine how Bresenham or whomever
could make such a claim to the obvious. Even the DOS writer(s) used
this technique to get one-second time intervals from the 18.206
ticks/per second. This is simply division by subtraction, but you
don't throw away the remainder. Therefore, in the limit, there is
no remainder. However, at any instant, the time can be off by as
much as the divisor -1. FYI, you make digital filters using this
same method, it's hardly novel.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-15  5:06 ` Linus Torvalds
  2002-07-15 16:26   ` Robert Love
  2002-07-16 11:41   ` Vojtech Pavlik
@ 2002-07-17 19:33   ` Daniel Phillips
  2002-07-17 20:31     ` Richard B. Johnson
  2002-07-17 20:55     ` Linus Torvalds
  2 siblings, 2 replies; 76+ messages in thread
From: Daniel Phillips @ 2002-07-17 19:33 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel

On Monday 15 July 2002 07:06, Linus Torvalds wrote:
> There is, of course, the option to do variable frequency (and make it
> integer multiples of the exposed "constant HZ" so that kernel code
> doesn't actually need to _care_ about the variability). There are
> patches to play with things like that.

We don't have to feel restricted to integer multiples.  I'll paste in my 
earlier post, for your convenience:

> ...If somebody wants a cruder scheduling interval than the raw timer
> interrupt, that's child's play, just step the interval down.  The
> only slightly challenging thing is do that without restricting
> choice of rate for the raw timer and scheduler, respectively.  Here,
> a novel application of Bresenham's algorithm (the line drawing
> algorithm) works nicely: at each raw interrupt, subtract the period
> of the raw interrupt from an accumulator; if the result is less
> than zero, add the period of the scheduler to the accumulator and
> drop into the scheduler's part of the timer interrupt.

[which just increments the timer variable I believe]

> This Bresenham trick works for arbitrary collections of interrupt
> rates, all with different periods.  It has the property that,
> over time, the total number of invocations at each rate remains
> *exactly* correct, and so long as the raw interrupt runs at a
> reasonably high rate, displacement isn't that bad either.

This technique is scarcely less efficient than the cruder method.

-- 
Daniel

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-15  5:06 ` Linus Torvalds
  2002-07-15 16:26   ` Robert Love
@ 2002-07-16 11:41   ` Vojtech Pavlik
  2002-07-17 19:33   ` Daniel Phillips
  2 siblings, 0 replies; 76+ messages in thread
From: Vojtech Pavlik @ 2002-07-16 11:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Mon, Jul 15, 2002 at 05:06:45AM +0000, Linus Torvalds wrote:
> In article <59885C5E3098D511AD690002A5072D3C02AB7F88@orsmsx111.jf.intel.com>,
> Grover, Andrew <andrew.grover@intel.com> wrote:
> >
> >But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> >these been quantified?
> 
> I've never had good reason to believe the latency/perf benefits myself,
> but I was approached at OLS about problems with something as simple as
> DVD playing, where a 100Hz timer means that the DVD player ends up
> having to busy-loop on gettimeofday() because it cannot sanely sleep due
> to the lack in sufficient sleeping granularity.
> 
> You apparently end up visibly missing frames - a frame is just 3 timer
> ticks at 100 Hz, and considering that the kernel has to round up by one
> due to POSIX requirements _and_ considering that you lose roughly one
> for actually processing the frame itself, that doesn't sound _that_
> outlandish. 

Actually, this example is pretty much false I believe.

Since there is always the screen refresh rate going at say 85 Hz, you'll
be missing frames anyway.

The really correct solution would be to use the vertical blank
interrupt, which all recent cards provide, to wake the X process to tell
it that it should flip its xvideo double-buffer (*), and to tell the
DVD player to supply another frame to it, which it would then preferably
DMA over AGP straight into the video card memory.

Now, if you wanted a real smooth video, you'd set your screen refresh
rate to 100 Hz in Europe and 120 Hz in US. (Without that it never can be
100% smooth anyway). And it also has the nice side effect of eliminating
the screen flicker caused by fluorescent lamp interference.
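
A quick way to check the arithmetic, assuming nominal rates of 25
frames/s for PAL and 30 frames/s for NTSC material (these numbers are
not stated above, and the helper name is made up):

```c
/* Number of screen refreshes each video frame occupies, or 0 when the
 * refresh rate is not an integer multiple of the frame rate (in which
 * case some frames are shown longer than others). */
int refreshes_per_frame(int refresh_hz, int frame_hz)
{
    if (frame_hz <= 0 || refresh_hz % frame_hz != 0)
        return 0;
    return refresh_hz / frame_hz;
}
```

Under those assumptions 100 Hz and 120 Hz each give 4 refreshes per
frame, while 85 Hz divides evenly into neither rate.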

(*) If our interrupt-to-wake latency is too large for X to do the buffer
flip in the vblank, then we'll probably need some more kernel support
for that.

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-15 19:52       ` mbs
@ 2002-07-15 20:01         ` yodaiken
  0 siblings, 0 replies; 76+ messages in thread
From: yodaiken @ 2002-07-15 20:01 UTC (permalink / raw)
  To: mbs; +Cc: Linus Torvalds, Robert Love, linux-kernel

On Mon, Jul 15, 2002 at 03:52:37PM -0400, mbs wrote:
> On Monday 15 July 2002 14:56, Linus Torvalds wrote:
> > (*) Which is a lot less than the hw can generate, since you mustn't allow
> > users to bog down the system in timer interrupts by just using
> > "itimer(ITIMER_REAL, .. fine-resolution..)".
> 
> actually, that is an interesting philosophical argument.
> 
> in an embedded system, it is sometimes more useful to not put artificial 

That's why we have RTLinux.

> in an embedded system a "tickless" system is sometimes preferable to a ticked 
> system.  there is often only one or a very small number of processes/threads 
> running and the extra overhead of 10 surplus clock ticks per process quantum 
> is a waste of cycles. (also when using a ppc or similar modern chip(flame 
> on;-), there is no need to keep a software wall clock, as the cpu has a 64bit 
> free running counter)  

Right: but "one or a very small number of processes/threads" does not apply to 
Linux.



-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-15 18:56     ` Linus Torvalds
@ 2002-07-15 19:52       ` mbs
  2002-07-15 20:01         ` yodaiken
  0 siblings, 1 reply; 76+ messages in thread
From: mbs @ 2002-07-15 19:52 UTC (permalink / raw)
  To: Linus Torvalds, Robert Love; +Cc: linux-kernel



On Monday 15 July 2002 14:56, Linus Torvalds wrote:
> On 15 Jul 2002, Robert Love wrote:
> > A cleaner solution to this issue is a higher resolution timer, e.g. the
> > high-res-timers project which has high resolution POSIX timers.
>
> But that really doesn't solve the problem either.
>
> You still need to have some limit on the timer resolution. Whether you
> call that limit "HZ" or something else is irrelevant in the end. Just
> calling them "high-resolution" doesn't make the problem go away, you still
> have some resolution (*).
>
> So once you set some magic limit on the fine-grained resolution (let's
> call that "MAX_FINE_HZ"), you might as well realize that that really is
> 100% equivalent to just making HZ _be_ that value. Together with possibly
> making the actual timer tick happen at a slower rate according to some
> other heuristics (ie "the system doesn't need timers right now, let's just
> not do them").
>
> 		Linus
>
> (*) Which is a lot less than the hw can generate, since you mustn't allow
> users to bog down the system in timer interrupts by just using
> "itimer(ITIMER_REAL, .. fine-resolution..)".

actually, that is an interesting philosophical argument.

in an embedded system, it is sometimes more useful to not put artificial 
constraints on the system and allow the clock and timer system to work in hw 
increments, but document the hell out of it.

this is the "give 'em enough rope to hang themselves, but tell them the 
precise length of the rope" model.

in an embedded system a "tickless" system is sometimes preferable to a ticked 
system.  there is often only one or a very small number of processes/threads 
running and the extra overhead of 10 surplus clock ticks per process quantum 
is a waste of cycles. (also when using a ppc or similar modern chip(flame 
on;-), there is no need to keep a software wall clock, as the cpu has a 64bit 
free running counter)  

I had this discussion with george A. early in the posix timers project and I 
argued/begged for a compile time config option giving the option of ticked 
and tickless versions.  George chose to go with a ticked system, because it 
benchmarked better in a general purpose system, particularly under high 
loads, and he didn't have time to implement two systems.   he made the right 
choice for the general purpose kernel and for probably 80% of the embedded 
market. (I'm in the other 20%) 

-- 
/**************************************************
**   Mark Salisbury       ||      mbs@mc.com     **
**************************************************/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-15 16:26   ` Robert Love
@ 2002-07-15 18:56     ` Linus Torvalds
  2002-07-15 19:52       ` mbs
  0 siblings, 1 reply; 76+ messages in thread
From: Linus Torvalds @ 2002-07-15 18:56 UTC (permalink / raw)
  To: Robert Love; +Cc: linux-kernel


On 15 Jul 2002, Robert Love wrote:
> 
> A cleaner solution to this issue is a higher resolution timer, e.g. the
> high-res-timers project which has high resolution POSIX timers.

But that really doesn't solve the problem either.

You still need to have some limit on the timer resolution. Whether you
call that limit "HZ" or something else is irrelevant in the end. Just
calling them "high-resolution" doesn't make the problem go away, you still
have some resolution (*).

So once you set some magic limit on the fine-grained resolution (let's
call that "MAX_FINE_HZ"), you might as well realize that that really is
100% equivalent to just making HZ _be_ that value. Together with possibly
making the actual timer tick happen at a slower rate according to some
other heuristics (ie "the system doesn't need timers right now, let's just
not do them").
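
A rough sketch of what such a limit amounts to: whatever interval is
requested gets rounded up to a whole number of 1/MAX_FINE_HZ steps.
MAX_FINE_HZ is the name used above; the value 10000 and the helper
clamp_interval_ns() are made up for illustration and are not taken from
any kernel tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap on timer resolution; the value is an assumption. */
#define MAX_FINE_HZ  10000u
#define NSEC_PER_SEC 1000000000ull

/* Round a requested interval up to a whole number of 1/MAX_FINE_HZ steps,
 * so nobody can ask for more than MAX_FINE_HZ timer events per second. */
static uint64_t clamp_interval_ns(uint64_t requested_ns)
{
    const uint64_t step = NSEC_PER_SEC / MAX_FINE_HZ;   /* 100000 ns */
    uint64_t steps;

    assert(step > 0);
    steps = (requested_ns + step - 1) / step;   /* round up */
    if (steps == 0)
        steps = 1;      /* a zero request still costs one step */
    return steps * step;
}
```

With these numbers a request for 1 ns and a request for 100000 ns both end
up as 100000 ns, which is exactly what a kernel built with HZ=10000 would
give.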

		Linus

(*) Which is a lot less than the hw can generate, since you mustn't allow
users to bog down the system in timer interrupts by just using
"itimer(ITIMER_REAL, .. fine-resolution..)".


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-15  5:06 ` Linus Torvalds
@ 2002-07-15 16:26   ` Robert Love
  2002-07-15 18:56     ` Linus Torvalds
  2002-07-16 11:41   ` Vojtech Pavlik
  2002-07-17 19:33   ` Daniel Phillips
  2 siblings, 1 reply; 76+ messages in thread
From: Robert Love @ 2002-07-15 16:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Sun, 2002-07-14 at 22:06, Linus Torvalds wrote:

> I've never had good reason to believe the latency/perf benefits myself,
> but I was approached at OLS about problems with something as simple as
> DVD playing, where a 100Hz timer means that the DVD player ends up
> having to busy-loop on gettimeofday() because it cannot sanely sleep due
> to the lack of sufficient sleeping granularity.

A cleaner solution to this issue is a higher resolution timer, e.g. the
high-res-timers project which has high resolution POSIX timers.

We could still bump HZ, of course...

	Robert Love


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-10 19:59 Grover, Andrew
  2002-07-10 21:09 ` george anzinger
  2002-07-10 21:28 ` Andrew Morton
@ 2002-07-15  5:06 ` Linus Torvalds
  2002-07-15 16:26   ` Robert Love
                     ` (2 more replies)
  2 siblings, 3 replies; 76+ messages in thread
From: Linus Torvalds @ 2002-07-15  5:06 UTC (permalink / raw)
  To: linux-kernel

In article <59885C5E3098D511AD690002A5072D3C02AB7F88@orsmsx111.jf.intel.com>,
Grover, Andrew <andrew.grover@intel.com> wrote:
>
>But on the other hand, increasing HZ has perf/latency benefits, yes? Have
>these been quantified?

I've never had good reason to believe the latency/perf benefits myself,
but I was approached at OLS about problems with something as simple as
DVD playing, where a 100Hz timer means that the DVD player ends up
having to busy-loop on gettimeofday() because it cannot sanely sleep due
to the lack of sufficient sleeping granularity.

You apparently end up visibly missing frames - a frame is just 3 timer
ticks at 100 Hz, and considering that the kernel has to round up by one
due to POSIX requirements _and_ considering that you lose roughly one
for actually processing the frame itself, that doesn't sound _that_
outlandish. 
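
To put numbers on that, assuming a stream of about 30 frames per second
(the frame rate is an assumption, it is not stated above): a frame lasts
33.3 ms, which is 3 whole ticks at HZ=100. The sketch below counts whole
ticks per frame and the worst-case length of a sleep once the request is
rounded up to a tick and one more tick is added. The function names are
illustrative only.

```c
#include <assert.h>

/* Whole timer ticks that fit into one video frame. */
static unsigned ticks_per_frame(unsigned hz, unsigned fps)
{
    assert(fps > 0);
    return hz / fps;
}

/* Ticks actually slept for a request of `usec` microseconds: round the
 * request up to a whole tick, then add one more tick. */
static unsigned long sleep_ticks(unsigned long usec, unsigned hz)
{
    unsigned long usec_per_tick;

    assert(hz > 0 && hz <= 1000000u);
    usec_per_tick = 1000000ul / hz;
    return (usec + usec_per_tick - 1) / usec_per_tick + 1;
}
```

So a 5 ms sleep can cost 2 ticks (20 ms) at HZ=100, more than half a
frame, while at HZ=1000 the same request costs at most 6 ticks (6 ms).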

>		 I'd either like to see a HZ that has balanced
>power/performance, or could we perhaps detect we are on a system that cares
>about power (aka a laptop) and tweak its value at runtime?

Runtime tweaking is not really an option with the current setup. There
are also divisions etc that really want it to be a compile-time constant
for efficiency.

As noted, even power/performance-wise a higher Hz can actually _help_. 
Especially on laptops.  Exactly because you actually sanely _can_ afford
to sleep, which you cannot with a 100Hz timer. 

So you lose some, you win some, depending on your needs.

There is, of course, the option to do variable frequency (and make it
integer multiples of the exposed "constant HZ" so that kernel code
doesn't actually need to _care_ about the variability). There are
patches to play with things like that.
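
One way to picture the integer-multiple idea, as a user-space simulation
and not as what any of those patches actually do: the hardware ticks at
HZ * mult, and jiffies only advances on every mult-th hardware tick, so
code that counts jiffies never sees the change. The struct and function
names are invented for this sketch.

```c
#include <assert.h>

#define HZ 100u     /* the exposed constant; the value is assumed */

struct tick_state {
    unsigned mult;          /* hardware ticks per jiffy, >= 1 */
    unsigned sub;           /* hardware ticks seen in the current jiffy */
    unsigned long jiffies;
};

/* Called on every hardware timer interrupt, which fires at HZ * mult. */
static void hw_tick(struct tick_state *ts)
{
    assert(ts->mult >= 1);
    if (++ts->sub >= ts->mult) {
        ts->sub = 0;
        ts->jiffies++;
    }
}

/* Run the simulated hardware for one second at the current multiplier. */
static void run_one_second(struct tick_state *ts)
{
    unsigned i, n = HZ * ts->mult;

    for (i = 0; i < n; i++)
        hw_tick(ts);
}
```

Whether mult is 1 or 10, one simulated second adds exactly HZ to jiffies.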

		Linus

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  1:26           ` Roland Dreier
@ 2002-07-12 17:30             ` george anzinger
  0 siblings, 0 replies; 76+ messages in thread
From: george anzinger @ 2002-07-12 17:30 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Stevie O, lkml

Roland Dreier wrote:
> 
> >>>>> "george" == george anzinger <george@mvista.com> writes:
> 
>     george> Well, in truth it has nothing to do with interrupts.  It
>     george> is just that that is the way most systems keep time.  The
>     george> REAL definition of HZ is in its relationship to jiffies
>     george> and seconds.
> 
>     george> I.e. jiffies * HZ = seconds, by definition.
> 
> I'm sure you know the truth, but this isn't quite right.  Just to be
> pedantic and make sure the correct definition is out there:
> 
>   jiffies / HZ = seconds
> 
> For example if HZ is 100 then the jiffy counter is incremented 100
> times each second.
> 
Of course you are right.  Must have been a brain fart :)

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  0:36       ` Stevie O
  2002-07-12  0:50         ` Thunder from the hill
  2002-07-12  1:09         ` george anzinger
@ 2002-07-12  3:01         ` Bernd Eckenfels
  2 siblings, 0 replies; 76+ messages in thread
From: Bernd Eckenfels @ 2002-07-12  3:01 UTC (permalink / raw)
  To: linux-kernel

In article <5.1.0.14.2.20020711201602.022387b0@whisper.qrpff.net> you wrote:
> Why must HZ be the same as 'interrupts per second'?

Well, it need not be. But currently the tick timestamp is increased by one
on each timer interrupt. So to find out how many seconds of uptime you have
(and other things which are measured in timer ticks and passed to
userspace) you need to know how many ticks have passed.

Actually there are a few things here. On the one hand, the kernel should not
pass values in ticks to userspace.

On the other hand, a changing HZ does not work for timespans measured
in those ticks, as long as those are not adjusted. One could think about
having a doze mode where only every 100th interrupt is generated but it
increases the tick count by 100. Most likely this will break a lot of
averaged measuring and stats counting, though.

Greetings
Bernd

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  0:55           ` Robert Love
  2002-07-12  0:58             ` Thunder from the hill
  2002-07-12  1:24             ` Alan Cox
@ 2002-07-12  1:37             ` Mark Hahn
  2 siblings, 0 replies; 76+ messages in thread
From: Mark Hahn @ 2002-07-12  1:37 UTC (permalink / raw)
  To: linux-kernel

> > > Why must HZ be the same as 'interrupts per second'?
> > 
> > s/interrupts/scheduler calls/
> 
> Uh, HZ is not scheduler calls per second.
> 
> Neither exactly is it interrupts per second, but _timer_ interrupts per
> second.  It is the frequency of the timer interrupt.

is there really code which uses HZ which is not merely fiddling with jiffies?
that is, HZ is merely "jiffies per second".  there's no reason the timer
(if any!) couldn't run faster than HZ, even at different ratios depending on
power level.

afaict, jiffies has survived because there's a need for a
moderately fast, strictly monotonically increasing clock.
that doesn't imply that the periodic timer needs to run at HZ
or even that such a clock exists (tickless).  
just that the kernel promises to update jiffies at HZ,
even if that means HZ is 1M, and goes by jumps of 10K.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  1:09         ` george anzinger
  2002-07-12  1:26           ` Roland Dreier
@ 2002-07-12  1:35           ` Stevie O
  1 sibling, 0 replies; 76+ messages in thread
From: Stevie O @ 2002-07-12  1:35 UTC (permalink / raw)
  To: george anzinger; +Cc: lkml

At 06:09 PM 7/11/2002 -0700, george anzinger wrote:
>> Why must HZ be the same as 'interrupts per second'?
>
>Well, in truth it has nothing to do with interrupts.  It is
>just that that is the way most systems keep time.  The REAL
>definition of HZ is in its relationship to jiffies and
>seconds.  
>
>I.e. jiffies * HZ = seconds, by definition.  
>
>Then we define interfaces that promise to return so many
>jiffies from now and we keep execution time and time slice
>times in jiffies.  In order to keep these things true, it is
>usual to set up some sort of timer to interrupt once each
>jiffie.  Now we can actually do this two ways.  We can say
>that the interrupt is a reminder to look at a "reliable
>clock" and update the system time with what we find OR we
>can use the interrupt to actually drive the system time. 
>The former is the more accurate way of doing things as it
>eliminates interrupt latency.  It also allows us to use a
>more sloppy source of interrupts since they are just
>reminders to check a clock and not actually driving the
>clock.  This, by the way, is what the high-res-timers patch
>does.  Doing things this way also allows one to reprogram
>the timer interrupt hardware without worrying too much
>about losing track of time.  The HRT patch does this to
>generate interrupts at sub jiffie intervals, but only when
>required.

So why not do it this way:

1. Let HZ = 1000.

2. Program PIT (having programmed the PC speaker in DOS, I personally believe Intel forgot the 'A' at the end of the name) to fire every 10ms.

3. void pit_isr(void) { jiffies += 10; do_other_stuff(); }
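
Spelled out as a small user-space simulation (no real PIT is touched;
simulate_ms() is a made-up helper and do_other_stuff() is left out), the
three steps look roughly like this:

```c
#include <assert.h>

#define HZ            1000u  /* jiffies per second, as in step 1 */
#define PIT_PERIOD_MS 10u    /* interrupt period, as in step 2 */

static unsigned long jiffies;

/* Step 3: each interrupt accounts for a whole period's worth of jiffies. */
static void pit_isr(void)
{
    jiffies += HZ * PIT_PERIOD_MS / 1000u;      /* 10 jiffies */
}

/* Deliver the interrupts that would arrive during `ms` milliseconds. */
static unsigned long simulate_ms(unsigned ms)
{
    unsigned i;

    assert(ms % PIT_PERIOD_MS == 0);
    for (i = 0; i < ms / PIT_PERIOD_MS; i++)
        pit_isr();
    return jiffies;
}
```

jiffies still advances by HZ per second, it just does so in jumps of 10.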


--
Stevie-O

Real programmers use COPY CON PROGRAM.EXE


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  1:09         ` george anzinger
@ 2002-07-12  1:26           ` Roland Dreier
  2002-07-12 17:30             ` george anzinger
  2002-07-12  1:35           ` Stevie O
  1 sibling, 1 reply; 76+ messages in thread
From: Roland Dreier @ 2002-07-12  1:26 UTC (permalink / raw)
  To: george anzinger; +Cc: Stevie O, lkml

>>>>> "george" == george anzinger <george@mvista.com> writes:

    george> Well, in truth it has nothing to do with interrupts.  It
    george> is just that that is the way most systems keep time.  The
    george> REAL definition of HZ is in its relationship to jiffies
    george> and seconds.

    george> I.e. jiffies * HZ = seconds, by definition.

I'm sure you know the truth, but this isn't quite right.  Just to be
pedantic and make sure the correct definition is out there:

  jiffies / HZ = seconds

For example if HZ is 100 then the jiffy counter is incremented 100
times each second.
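
In code, with helper names that are only illustrative:

```c
#include <assert.h>

/* Seconds represented by a jiffy count: jiffies / HZ. */
static unsigned long jiffies_to_secs(unsigned long jiffies, unsigned hz)
{
    assert(hz > 0);
    return jiffies / hz;
}

/* The inverse: seconds * HZ = jiffies. */
static unsigned long secs_to_jiffies(unsigned long secs, unsigned hz)
{
    return secs * hz;
}
```

With HZ = 100, 500 jiffies are 5 seconds; with HZ = 1000, 5 seconds are
5000 jiffies.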

Best,
  Roland

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  0:55           ` Robert Love
  2002-07-12  0:58             ` Thunder from the hill
@ 2002-07-12  1:24             ` Alan Cox
  2002-07-12  1:37             ` Mark Hahn
  2 siblings, 0 replies; 76+ messages in thread
From: Alan Cox @ 2002-07-12  1:24 UTC (permalink / raw)
  To: Robert Love; +Cc: Thunder from the hill, Stevie O, lkml

> Uh, HZ is not scheduler calls per second.
> 
> Neither exactly is it interrupts per second, but _timer_ interrupts per
> second.  It is the frequency of the timer interrupt.

It's not exactly that either. It's the 'rate at which jiffies is incremented'.
The distinction is not pedantic; it's rather critical when you go to a 
variable timer tick...

Alan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  0:36       ` Stevie O
  2002-07-12  0:50         ` Thunder from the hill
@ 2002-07-12  1:09         ` george anzinger
  2002-07-12  1:26           ` Roland Dreier
  2002-07-12  1:35           ` Stevie O
  2002-07-12  3:01         ` Bernd Eckenfels
  2 siblings, 2 replies; 76+ messages in thread
From: george anzinger @ 2002-07-12  1:09 UTC (permalink / raw)
  To: Stevie O; +Cc: lkml

Stevie O wrote:
> 
> At <time> <date>, <user> [<email>] wrote:
> > <stuff>
> 
> A lot of people are talking about how HZ needs to be a constant, etc.
> 
> I don't do much kernel hacking, so allow me to post a query that would (probably) better belong on #kernelnewbies if I wasn't so damn lazy ;) --
> 
> Why must HZ be the same as 'interrupts per second'?

Well, in truth it has nothing to do with interrupts.  It is
just that that is the way most systems keep time.  The REAL
definition of HZ is in its relationship to jiffies and
seconds.  

I.e. jiffies * HZ = seconds, by definition.  

Then we define interfaces that promise to return so many
jiffies from now and we keep execution time and time slice
times in jiffies.  In order to keep these things true, it is
usual to set up some sort of timer to interrupt once each
jiffie.  Now we can actually do this two ways.  We can say
that the interrupt is a reminder to look at a "reliable
clock" and update the system time with what we find OR we
can use the interrupt to actually drive the system time. 
The former is the more accurate way of doing things as it
eliminates interrupt latency.  It also allows us to use a
more sloppy source of interrupts since they are just
reminders to check a clock and not actually driving the
clock.  This, by the way, is what the high-res-timers patch
does.  Doing things this way also allows one to reprogram
the timer interrupt hardware without worrying too much
about losing track of time.  The HRT patch does this to
generate interrupts at sub jiffie intervals, but only when
required.
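
A minimal sketch of the "reminder" approach, not the actual HRT patch
code: the handler reads a free-running counter and advances jiffies by
however many whole jiffies have really elapsed, carrying the remainder
forward. The struct, its fields and the way the counter value is passed
in are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

struct timekeeper {
    uint64_t last_count;        /* counter value already accounted for */
    uint64_t counts_per_jiffy;  /* counter increments in one jiffy */
    uint64_t jiffies;
};

/* Timer interrupt handler. `now` is the current reading of a free-running
 * hardware counter. The interrupt only prompts the update; elapsed time
 * comes from the counter, so a late or missed interrupt loses nothing. */
static void timer_interrupt(struct timekeeper *tk, uint64_t now)
{
    uint64_t elapsed;

    assert(tk->counts_per_jiffy > 0);
    elapsed = (now - tk->last_count) / tk->counts_per_jiffy;
    tk->jiffies += elapsed;
    tk->last_count += elapsed * tk->counts_per_jiffy;  /* keep remainder */
}
```

If three interrupts in a row are missed, the next one simply adds four
jiffies at once.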

-g
> 
> --
> Stevie-O
> 
> Real programmers link their executables by hand.
> 

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  0:55           ` Robert Love
@ 2002-07-12  0:58             ` Thunder from the hill
  2002-07-12  1:24             ` Alan Cox
  2002-07-12  1:37             ` Mark Hahn
  2 siblings, 0 replies; 76+ messages in thread
From: Thunder from the hill @ 2002-07-12  0:58 UTC (permalink / raw)
  To: Robert Love; +Cc: Thunder from the hill, Stevie O, lkml

Hi,

On 11 Jul 2002, Robert Love wrote:
> Uh, HZ is not scheduler calls per second.

Sorry, I must be sleeping...

It's unnerving to fix the stuff I'm fixing. I can't even concentrate on 
reading any more. Sure, you're right.

							Regards,
							Thunder
-- 
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o?  K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y- 
------END GEEK CODE BLOCK------


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  0:50         ` Thunder from the hill
@ 2002-07-12  0:55           ` Robert Love
  2002-07-12  0:58             ` Thunder from the hill
                               ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Robert Love @ 2002-07-12  0:55 UTC (permalink / raw)
  To: Thunder from the hill; +Cc: Stevie O, lkml

On Thu, 2002-07-11 at 17:50, Thunder from the hill wrote:

> On Thu, 11 Jul 2002, Stevie O wrote:
> > Why must HZ be the same as 'interrupts per second'?
> 
> s/interrupts/scheduler calls/

Uh, HZ is not scheduler calls per second.

Neither exactly is it interrupts per second, but _timer_ interrupts per
second.  It is the frequency of the timer interrupt.

> But what exactly does this question mean to be? I don't fully understand. 
> We define HZ to have an interval for the calls of the scheduler. That's 
> why it is the number of scheduler calls per second, because that's what it 
> was invented to be.

No no no...

	Robert Love


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-12  0:36       ` Stevie O
@ 2002-07-12  0:50         ` Thunder from the hill
  2002-07-12  0:55           ` Robert Love
  2002-07-12  1:09         ` george anzinger
  2002-07-12  3:01         ` Bernd Eckenfels
  2 siblings, 1 reply; 76+ messages in thread
From: Thunder from the hill @ 2002-07-12  0:50 UTC (permalink / raw)
  To: Stevie O; +Cc: lkml

Hi,

On Thu, 11 Jul 2002, Stevie O wrote:
> Why must HZ be the same as 'interrupts per second'?

s/interrupts/scheduler calls/

But what exactly does this question mean to be? I don't fully understand. 
We define HZ to have an interval for the calls of the scheduler. That's 
why it is the number of scheduler calls per second, because that's what it 
was invented to be.

							Regards,
							Thunder
-- 
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o?  K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y- 
------END GEEK CODE BLOCK------


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11  7:15     ` george anzinger
@ 2002-07-12  0:36       ` Stevie O
  2002-07-12  0:50         ` Thunder from the hill
                           ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Stevie O @ 2002-07-12  0:36 UTC (permalink / raw)
  To: lkml

At <time> <date>, <user> [<email>] wrote:
> <stuff>

A lot of people are talking about how HZ needs to be a constant, etc.

I don't do much kernel hacking, so allow me to post a query that would (probably) better belong on #kernelnewbies if I wasn't so damn lazy ;) --

Why must HZ be the same as 'interrupts per second'?

--
Stevie-O

Real programmers link their executables by hand.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11 11:35     ` Kasper Dupont
  2002-07-11 12:30       ` Alan Cox
@ 2002-07-11 18:51       ` george anzinger
  1 sibling, 0 replies; 76+ messages in thread
From: george anzinger @ 2002-07-11 18:51 UTC (permalink / raw)
  To: Kasper Dupont; +Cc: Lincoln Dale, Linux

Kasper Dupont wrote:
> 
> Lincoln Dale wrote:
> >
> > (or a highly-accurate single-fire timer)?
> 
> That would be my preference, at least on hardware where it can
> be done efficiently and accurately.
> 
> The x86 PIT can be programmed in one-shot mode, but the delay
> cannot be programmed to be more than approximately 55msec. For
> longer delays we'd have to get interrupted prematurely just
> to reprogram the PIT for another delay. This is of course no
> worse than an interrupt every 1 or 10 msec we actually don't
> need.
> 
> Another problem is that a PIT in one shot mode cannot measure
> time accurately. Each interrupt will arrive slightly off the
> wanted time. For the interrupt itself this is no big deal, but
> for measuring time they will accumulate, so you'd see a clock
> drifting beyond anything acceptable.
> 
> The answer here is that we need something else for measuring
> time, I guess the TSC would be appropriate. If doing all clock
> measurements using the TSC the clock would no longer drift in
> case of lost timer interrupts. The TSC frequency can be
> measured at boot time, and if done smart enough that variable
> can be made into a knob that ntpd can control to adjust the
> clock speed instead of a jumping clock once in a while. If we
> are smart enough we can get walltime more accurate than it has
> ever been seen before. :-)

The high-res-timers patch does most of this (all but the ntp
knob).  It allows you to use either the TSC or the ACPI pm
timer to keep clock time.  The former is fast, but some
systems are known to "mess" with the TSC as part of power
management.  The pm timer, being I/O, takes more time to
read, but is not "messed" with.
> 
> The problems remaining now are:
> 1) Reprogramming the PIT is slow and inaccurate, we'd like
>    better hardware for producing timer interrupts. (I think I
>    read somewhere that an APIC could help us here.)

Actually the "best" option would be something like the
decrementer in the PPC.  It can be set to generate an
interrupt at just about any time.  Another HW register
(64-bits) keeps track of (effectively) decrementer clocks
since boot and can be used as the clock source.  The best
solution  in the x86 platform, would be an additional
register that either counts down at TSC speed to an
interrupt OR compares to the TSC and interrupts on compare. 
It should be a cpu register to avoid the latencies of
accessing an I/O register.

> 2) We will be measuring time in a lot of different units,
>    which needs to be converted. The PIT using 1/1193180 sec,
>    the TSC using a varying unit, and finally the user/kernel
>    interface using secs, msecs, usecs, nsecs.

Not really a big problem.  The conversion constants are
computed once (or at ntp correction) and from then on all
it takes is multiply and shift instructions to do the
conversion.  (Again, see the HRT patch.)
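
The general shape of such a conversion, as a sketch only (this is not
lifted from the HRT patch; the shift of 22 and the function names are
assumptions): a scaled multiplier is computed once from the counter
frequency, and every later conversion is one multiply and one shift.

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT 22    /* scaling shift; the value is an assumption */

/* Computed once, or again whenever the rate estimate is corrected. */
static uint64_t calc_mult(uint32_t counter_hz)
{
    assert(counter_hz > 0);
    return (1000000000ull << SHIFT) / counter_hz;
}

/* Counter cycles to nanoseconds: one multiply, one shift, no division.
 * The product overflows 64 bits for very large cycle counts, so this is
 * only suitable for short intervals. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t mult)
{
    return (cycles * mult) >> SHIFT;
}
```

For a 1 MHz counter the multiplier comes out as 1000 << 22, so 5 cycles
convert to 5000 ns.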

> 3) On SMP hardware we will be using different TSCs on
>    different CPUs. Having TSCs in sync might get more important
>    than on current kernels.
> 4) We are introducing new hardware requirements.
> 
> I'd like to see oneshot timer interrupts as a compile time
> option on any architecture that is capable of doing it. But of
> course it is not easy.

As I imply above, the one shot, if done as an I/O device, is
less than optimal.  Better is the PPC decrementer.
> 
> Have I missed something somewhere?
> 

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-10 21:35   ` Benjamin LaHaise
  2002-07-10 21:38     ` Andrew Morton
@ 2002-07-11 17:01     ` Martin Dalecki
  1 sibling, 0 replies; 76+ messages in thread
From: Martin Dalecki @ 2002-07-11 17:01 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andrew Morton, Grover, Andrew, Linux

Benjamin LaHaise wrote:
> On Wed, Jul 10, 2002 at 02:28:03PM -0700, Andrew Morton wrote:
> 
>>>But on the other hand, increasing HZ has perf/latency benefits, yes? Have
>>>these been quantified?
>>
>>Not that I'm aware of.  And I'd regard any such claims with some
>>scepticism.
> 
> 
> The most obvious one is the reduced latency of select/poll timeouts.

Which you can actually see if running x11perf or simple ico.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11 12:54     ` Thunder from the hill
@ 2002-07-11 15:59       ` Martin Dalecki
  0 siblings, 0 replies; 76+ messages in thread
From: Martin Dalecki @ 2002-07-11 15:59 UTC (permalink / raw)
  To: Thunder from the hill
  Cc: Hannu Savolainen, george anzinger, Grover, Andrew, Linux

Thunder from the hill wrote:
> Hi,
> 
> On Thu, 11 Jul 2002, Hannu Savolainen wrote:
> 
>>This is not a problem at all. Just define HZ as:
>>
>>extern int system_hz;
>>#define HZ system_hz
>>
>>After that all code will use variable HZ. Changing HZ on fly will be
>>dangerous. However HZ can be made a boot time (LILO) parameter.
> 
> 
> OK, that's probably a start. As the next step, I'd recommend that the 
> maintainers and their supporters try to replace the static HZ with 
> possibly-dynamic system_hz. The third step would be to have guys like Ingo 
> to tune system_hz to be really dynamic.
> 
> Cool idea, anyway.

Just remember please to map it to /proc/sys/kernel/xxx
So we could implement the following properly:

_SC_CLK_TCK            CLK_TCK       Ticks per second          (clock_t)

(Taken from Solaris specs.)

Unless of course we stick to the fact that HZ exposed
to user land remains an arch specific constant as in 2.5.25, which
I think is the preferable solution.

Pity is, the RedHat beta does mess with this! The 2.5.25 solution from
Linus is far better.
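
For reference, this is how user space reads that value through the
standard POSIX call sysconf(_SC_CLK_TCK). The two wrapper functions are
only illustrative names.

```c
#include <assert.h>
#include <unistd.h>

/* Ticks per second as exposed to user space, i.e. the unit of the
 * clock_t values returned by times(). Returns -1 if unavailable. */
static long user_clk_tck(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* Convert a clock_t-style tick count to seconds. */
static double ticks_to_seconds(long ticks)
{
    long tck = user_clk_tck();

    assert(tck > 0);
    return (double)ticks / (double)tck;
}
```

A program written this way keeps working whatever the user-visible tick
rate is, because it never hard-codes it.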


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11 13:37         ` Kasper Dupont
@ 2002-07-11 15:46           ` Alan Cox
  0 siblings, 0 replies; 76+ messages in thread
From: Alan Cox @ 2002-07-11 15:46 UTC (permalink / raw)
  To: Kasper Dupont; +Cc: Alan Cox, Lincoln Dale, Linux

> > The APIC on modern systems has decent timers. There may also be ACPI timers
> > we can use on ACPI capable systems.
> 
> > In what units do they measure time? It would be nice if
> > they were guaranteed to match the TSC frequency or some
> other of the units already being used.

It really doesn't matter providing the resolution is decent. Conversion 
between formats of time is a maths operation, and we can handle those 8)

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11 12:30       ` Alan Cox
@ 2002-07-11 13:37         ` Kasper Dupont
  2002-07-11 15:46           ` Alan Cox
  0 siblings, 1 reply; 76+ messages in thread
From: Kasper Dupont @ 2002-07-11 13:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: Lincoln Dale, Linux

Alan Cox wrote:
> 
> > I'd like to see oneshot timer interrupts as a compile time
> > option on any architecture that is capable of doing it. But of
> > course it is not easy.
> >
> > Have I missed something somewhere?
> 
> The APIC on modern systems has decent timers. There may also be ACPI timers
> we can use on ACPI capable systems.

In what units do they measure time? It would be nice if
they were guaranteed to match the TSC frequency or some
other of the units already being used.

-- 
Kasper Dupont -- who spends too much time on usenet.
For sending spam use mailto:razor-report@daimi.au.dk

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11  6:03   ` Hannu Savolainen
  2002-07-11  7:15     ` george anzinger
@ 2002-07-11 12:54     ` Thunder from the hill
  2002-07-11 15:59       ` Martin Dalecki
  1 sibling, 1 reply; 76+ messages in thread
From: Thunder from the hill @ 2002-07-11 12:54 UTC (permalink / raw)
  To: Hannu Savolainen; +Cc: george anzinger, Grover, Andrew, Linux

Hi,

On Thu, 11 Jul 2002, Hannu Savolainen wrote:
> This is not a problem at all. Just define HZ as:
> 
> extern int system_hz;
> #define HZ system_hz
> 
> After that all code will use variable HZ. Changing HZ on fly will be
> dangerous. However HZ can be made a boot time (LILO) parameter.

OK, that's probably a start. As the next step, I'd recommend that the 
maintainers and their supporters try to replace the static HZ with 
possibly-dynamic system_hz. The third step would be to have guys like Ingo 
to tune system_hz to be really dynamic.

Cool idea, anyway.
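
A small user-space sketch of what that define does to existing code,
assuming system_hz is filled in at boot: run-time expressions keep
working, but anything that needs a compile-time constant does not.

```c
#include <assert.h>

int system_hz = 100;    /* would be set from a boot parameter */
#define HZ system_hz

/* Fine: evaluated at run time, follows whatever system_hz holds. */
static long five_seconds(void)
{
    assert(HZ > 0);
    return 5L * HZ;
}

/* Not fine once HZ is a variable: a file-scope initializer must be a
 * constant expression, so a line like
 *     static long timeout = 5 * HZ;
 * would no longer compile. */
```

Every static initializer and every division by HZ that the compiler used
to fold into a constant would need looking at.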

							Regards,
							Thunder
-- 
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o?  K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y- 
------END GEEK CODE BLOCK------


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11 11:35     ` Kasper Dupont
@ 2002-07-11 12:30       ` Alan Cox
  2002-07-11 13:37         ` Kasper Dupont
  2002-07-11 18:51       ` george anzinger
  1 sibling, 1 reply; 76+ messages in thread
From: Alan Cox @ 2002-07-11 12:30 UTC (permalink / raw)
  To: Kasper Dupont; +Cc: Lincoln Dale, Linux

> I'd like to see oneshot timer interrupts as a compile time
> option on any architecture that is capable of doing it. But of
> course it is not easy.
> 
> Have I missed something somewhere?

The APIC on modern systems has decent timers. There may also be ACPI timers
we can use on ACPI capable systems.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11  0:28   ` Lincoln Dale
@ 2002-07-11 11:35     ` Kasper Dupont
  2002-07-11 12:30       ` Alan Cox
  2002-07-11 18:51       ` george anzinger
  0 siblings, 2 replies; 76+ messages in thread
From: Kasper Dupont @ 2002-07-11 11:35 UTC (permalink / raw)
  To: Lincoln Dale; +Cc: Linux

Lincoln Dale wrote:
> 
> (or a highly-accurate single-fire timer)?

That would be my preference, at least on hardware where it can
be done efficiently and accurately.

The x86 PIT can be programmed in one-shot mode, but the delay
cannot be programmed to be more than approximately 55msec. For
longer delays we'd have to get interrupted prematurely just
to reprogram the PIT for another delay. This is of course no
worse than an interrupt every 1 or 10 msec we actually don't
need.
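
The 55 msec figure follows from the PIT's 16-bit counter running at
1193180 Hz: 65536 / 1193180 s is about 54.9 msec. A sketch of the
arithmetic, including how many one-shot programmings a longer delay would
take (function names are made up for the example):

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ        1193180u  /* PIT input clock in Hz */
#define PIT_MAX_COUNT 65536u    /* 16-bit counter; a count of 0 means 65536 */

/* Longest single one-shot delay, in microseconds (rounded down). */
static uint32_t pit_max_delay_us(void)
{
    return (uint32_t)((uint64_t)PIT_MAX_COUNT * 1000000u / PIT_HZ);
}

/* Number of one-shot programmings needed to cover delay_us. */
static uint32_t pit_shots_needed(uint64_t delay_us)
{
    uint64_t counts;

    assert(delay_us > 0);
    counts = (delay_us * PIT_HZ + 999999u) / 1000000u;   /* round up */
    return (uint32_t)((counts + PIT_MAX_COUNT - 1) / PIT_MAX_COUNT);
}
```

A one second delay therefore needs 19 programmings, against 100 or 1000
periodic interrupts in the same second.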

Another problem is that a PIT in one shot mode cannot measure
time accurately. Each interrupt will arrive slightly off the
wanted time. For the interrupt itself this is no big deal, but
for measuring time they will accumulate, so you'd see a clock
drifting beyond anything acceptable.

The answer here is that we need something else for measuring
time, I guess the TSC would be appropriate. If doing all clock
measurements using the TSC the clock would no longer drift in
case of lost timer interrupts. The TSC frequency can be
measured at boot time, and if done smart enough that variable
can be made into a knob that ntpd can control to adjust the
clock speed instead of a jumping clock once in a while. If we
are smart enough we can get walltime more accurate than it has
ever been seen before. :-)

The problems remaining now are:
1) Reprogramming the PIT is slow and inaccurate, we'd like
   better hardware for producing timer interrupts. (I think I
   read somewhere that an APIC could help us here.)
2) We will be measuring time in a lot of different units,
   which need to be converted. The PIT using 1/1193180 sec,
   the TSC using a varying unit, and finally the user/kernel
   interface using secs, msecs, usecs, nsecs.
3) On SMP hardware we will be using different TSCs on
   different CPUs. Having TSCs in sync might get more important
   than on current kernels.
4) We are introducing new hardware requirements.

I'd like to see oneshot timer interrupts as a compile time
option on any architecture that is capable of doing it. But of
course it is not easy.

Have I missed something somewhere?

-- 
Kasper Dupont -- who spends too much time on usenet.
For sending spam use mailto:razor-report@daimi.au.dk

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: HZ, preferably as small as possible
  2002-07-11  6:03   ` Hannu Savolainen
@ 2002-07-11  7:15     ` george anzinger
  2002-07-12  0:36       ` Stevie O
  2002-07-11 12:54     ` Thunder from the hill
  1 sibling, 1 reply; 76+ messages in thread
From: george anzinger @ 2002-07-11  7:15 UTC (permalink / raw)
  To: Hannu Savolainen; +Cc: Grover, Andrew, Linux

Hannu Savolainen wrote:
> 
> Hi,
> 
> IMHO the easiest solution is just making HZ selectable (100 or 1000 or
> maybe 1024) when configuring the kernel. Also there has to be a variable
> that exports the configured HZ value to modules. In that way users can
> select HZ depending on their needs.
> 
> There are users who don't use power management. Instead they need higher
> HZ for various reasons. Kernels compiled with HZ=1000 have been used
> successfully since year 0 without any major problems. Making HZ
> configurable just makes life easier for such users.
> 
> OTOH the higher wakeup rate during low power states can be cured by
> temporarily lowering the hw clock rate from 1000 to 100. The timer
> interrupt handler just increases jiffies by 10 (instead of 1). All code
> compiled with HZ=1000 still works but there may be latency problems during
> low power states.
> 
> On Wed, 10 Jul 2002, george anzinger wrote:
> 
> > "Grover, Andrew" wrote:
> > >
> > > I'd like to see HZ closer to 100 than 1000, for CPU power reasons. Processor
> > > power states like C3 may take 100 microseconds+ to enter/leave - time when
> > > both the CPU isn't doing any work, but still drawing power as if it was. We
> > > pop out of C3 whenever there is an interrupt, so reducing timer interrupts
> > > is good from a power standpoint by amortizing the transition penalty over a
> > > longer period of power savings.
> > >
> > > But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> > > these been quantified? I'd either like to see a HZ that has balanced
> > > power/performance, or could we perhaps detect we are on a system that cares
> > > about power (aka a laptop) and tweak its value at runtime?
> >
> > HZ is used in a LOT of places.  I suspect "tweaking" at run
> > time would be a bit difficult.
> This is not a problem at all. Just define HZ as:
> 
> extern int system_hz;
> #define HZ system_hz
> 
> After that all code will use variable HZ. Changing HZ on fly will be
> dangerous. However HZ can be made a boot time (LILO) parameter.

This is not really advisable.  A good deal of the timer
code depends on HZ being a constant so that calculations are
done at compile time.  A lot of this code would be
measurably slower if these calculations were required at run
time.  For example, often a divide is used with the
understanding that it will be done at compile time, not run
time.
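
A minimal sketch of the difference, not taken from the kernel source:
the function names msecs_to_ticks_const() and msecs_to_ticks_var() are
hypothetical, and system_hz stands in for the run-time variable proposed
in the quoted text. With a constant HZ the compiler can fold 1000 / HZ
into a literal; with a variable the multiply and divide happen on every
call.

```c
#define HZ 1000                         /* compile-time constant */

/* Hypothetical run-time tick rate, as in "#define HZ system_hz". */
static unsigned long system_hz = 1000;

/*
 * Constant HZ: (1000 / HZ) is evaluated by the compiler, so for
 * HZ == 1000 the whole expression reduces to "ms". Assumes HZ
 * divides 1000 evenly.
 */
static inline unsigned long msecs_to_ticks_const(unsigned long ms)
{
    return (ms + (1000 / HZ) - 1) / (1000 / HZ);
}

/*
 * Variable HZ: the same conversion (rounded up) needs a multiply
 * and a divide at run time on every call.
 */
static inline unsigned long msecs_to_ticks_var(unsigned long ms)
{
    return (ms * system_hz + 999) / 1000;
}
```

Both helpers round up, so a 15 msec delay is 15 ticks at 1000 ticks per
second and 2 ticks at 100 ticks per second.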

-g

> 
> > The high-res-timers patch give high resolution timers with
> > out changing HZ.  Interrupts are scheculed as needed,
> > between the 1/HZ ticks, so a quite system will have few (if
> > any) interrupts between the ticks.
> >
> > --
> > George Anzinger   george@mvista.com
> > High-res-timers:
> > http://sourceforge.net/projects/high-res-timers/
> > Real time sched:  http://sourceforge.net/projects/rtsched/
> > Preemption patch:
> > http://www.kernel.org/pub/linux/kernel/people/rml
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> 
> Best regards,
> 
> Hannu
> -----
> Hannu Savolainen (hannu@opensound.com)
> http://www.opensound.com (Open Sound System (OSS))
> http://www.compusonic.fi (Finnish OSS pages)

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: HZ, preferably as small as possible
  2002-07-10 21:09 ` george anzinger
@ 2002-07-11  6:03   ` Hannu Savolainen
  2002-07-11  7:15     ` george anzinger
  2002-07-11 12:54     ` Thunder from the hill
  0 siblings, 2 replies; 76+ messages in thread
From: Hannu Savolainen @ 2002-07-11  6:03 UTC (permalink / raw)
  To: george anzinger; +Cc: Grover, Andrew, Linux

Hi,

IMHO the easiest solution is just making HZ selectable (100 or 1000 or
maybe 1024) when configuring the kernel. Also there has to be a variable
that exports the configured HZ value to modules. In that way users can
select HZ depending on their needs.

There are users who don't use power management. Instead they need higher
HZ for various reasons. Kernels compiled with HZ=1000 have been used
successfully since year 0 without any major problems. Making HZ
configurable just makes life easier for such users.

OTOH the higher wakeup rate during low power states can be cured by
temporarily lowering the hw clock rate from 1000 to 100. The timer
interrupt handler just increases jiffies by 10 (instead of 1). All code
compiled with HZ=1000 still works but there may be latency problems during
low power states.
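
A rough sketch of that idea, with loudly stated assumptions: the names
tick_multiplier, set_low_power() and timer_interrupt() are invented for
this example, and the step of actually reprogramming the hardware timer
to the lower rate is left out.

```c
#define HZ 1000                          /* rate all code is compiled for */

static unsigned long jiffies;            /* tick counter, in 1/HZ units */
static unsigned int tick_multiplier = 1; /* jiffies added per interrupt */

/*
 * Hypothetical switch for low power states. Real code would also have
 * to reprogram the hardware timer from 1000 to 100 interrupts per
 * second here; this sketch only changes the bookkeeping.
 */
static inline void set_low_power(int on)
{
    tick_multiplier = on ? 10 : 1;
}

/* Timer interrupt handler: advance jiffies by 1, or by 10 when slowed. */
static inline void timer_interrupt(void)
{
    jiffies += tick_multiplier;
}
```

One second of wall time advances jiffies by 1000 in either mode (1000
interrupts of 1, or 100 interrupts of 10), but in the slow mode timers
can only expire on 10 ms boundaries, which is the latency cost noted
above.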

On Wed, 10 Jul 2002, george anzinger wrote:

> "Grover, Andrew" wrote:
> > 
> > I'd like to see HZ closer to 100 than 1000, for CPU power reasons. Processor
> > power states like C3 may take 100 microseconds+ to enter/leave - time when
> > both the CPU isn't doing any work, but still drawing power as if it was. We
> > pop out of C3 whenever there is an interrupt, so reducing timer interrupts
> > is good from a power standpoint by amortizing the transition penalty over a
> > longer period of power savings.
> > 
> > But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> > these been quantified? I'd either like to see a HZ that has balanced
> > power/performance, or could we perhaps detect we are on a system that cares
> > about power (aka a laptop) and tweak its value at runtime?
> 
> HZ is used in a LOT of places.  I suspect "tweaking" at run
> time would be a bit difficult.  
This is not a problem at all. Just define HZ as:

extern int system_hz;
#define HZ system_hz

After that all code will use variable HZ. Changing HZ on the fly will be
dangerous. However HZ can be made a boot time (LILO) parameter.

> The high-res-timers patch give high resolution timers with
> out changing HZ.  Interrupts are scheculed as needed,
> between the 1/HZ ticks, so a quite system will have few (if
> any) interrupts between the ticks.
> 
> -- 
> George Anzinger   george@mvista.com
> High-res-timers: 
> http://sourceforge.net/projects/high-res-timers/
> Real time sched:  http://sourceforge.net/projects/rtsched/
> Preemption patch:
> http://www.kernel.org/pub/linux/kernel/people/rml
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


Best regards,

Hannu
-----
Hannu Savolainen (hannu@opensound.com)
http://www.opensound.com (Open Sound System (OSS))
http://www.compusonic.fi (Finnish OSS pages)



* Re: HZ, preferably as small as possible
  2002-07-10 21:42       ` Benjamin LaHaise
@ 2002-07-11  2:14         ` CaT
  0 siblings, 0 replies; 76+ messages in thread
From: CaT @ 2002-07-11  2:14 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andrew Morton, Grover, Andrew, Linux

On Wed, Jul 10, 2002 at 05:42:51PM -0400, Benjamin LaHaise wrote:
> On Wed, Jul 10, 2002 at 02:38:32PM -0700, Andrew Morton wrote:
> > OK, I'll grant that.  Why is this useful?
> 
> Think video playback, where you want to queue the frame to be played as 
> close to the correct 1/60s time as possible.  With HZ=100, the code will 

Or 1/50 (think PAL), no? (Of course HZ=100 would be sweet for that. ;)

-- 
GOVERNMENT ANNOUNCEMENT - The  government announced  today that  it is
changing its mascot  to a condom because  it more clearly reflects the
government's political stance.  A condom stands up to inflation, halts
production, destroys  the next generation,  protects a bunch of pricks
and finally, gives you a sense of security while you're being screwed!


* Re: HZ, preferably as small as possible
  2002-07-10 21:28 ` Andrew Morton
  2002-07-10 21:35   ` Benjamin LaHaise
  2002-07-10 22:01   ` Thunder from the hill
@ 2002-07-11  0:28   ` Lincoln Dale
  2002-07-11 11:35     ` Kasper Dupont
  2 siblings, 1 reply; 76+ messages in thread
From: Lincoln Dale @ 2002-07-11  0:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grover, Andrew, Linux

At 02:28 PM 10/07/2002 -0700, Andrew Morton wrote:
> > But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> > these been quantified?
>
>Not that I'm aware of.  And I'd regard any such claims with some
>scepticism.

for one, i'm using a modified version of the network FIFO queue discipline 
to inject "delay" and "drop", similar to what ippipe can do on FreeBSD.
given i'm using a kernel timer for this, HZ >= 1000 is essential for <1.5 
millisecond accuracy.

perhaps we really need a high-speed timer mechanism for parts of the kernel 
that require it (or a highly-accurate single-fire timer)?


cheers,

lincoln.



* Re: HZ, preferably as small as possible
  2002-07-10 22:01   ` Thunder from the hill
  2002-07-10 22:09     ` Cort Dougan
  2002-07-10 22:41     ` Thunder from the hill
@ 2002-07-10 23:50     ` J.A. Magallon
  2 siblings, 0 replies; 76+ messages in thread
From: J.A. Magallon @ 2002-07-10 23:50 UTC (permalink / raw)
  To: Thunder from the hill; +Cc: Andrew Morton, Grover, Andrew, Linux


On 2002.07.11 Thunder from the hill wrote:
>Hi,
>
>On Wed, 10 Jul 2002, Andrew Morton wrote:
>> That makes a ton of sense.
>> 
>> > But on the other hand, increasing HZ has perf/latency benefits, yes? Have
>> > these been quantified?
>> 
>> Not that I'm aware of.  And I'd regard any such claims with some
>> scepticism.
>> 
>> > I'd either like to see a HZ that has balanced
>> > power/performance, or could we perhaps detect we are on a system that cares
>> > about power (aka a laptop) and tweak its value at runtime?
>
>Want a config option? Either int or bool (CONFIG_LOW_HZ). It's not too 
>much effort.
>

How about a <boot> option ? linux hz=[low,high]

It is runtime, but just one time.

-- 
J.A. Magallon             \   Software is like sex: It's better when it's free
mailto:jamagallon@able.es  \                    -- Linus Torvalds, FSF T-shirt
Linux werewolf 2.4.19-rc1-jam2, Mandrake Linux 8.3 (Cooker) for i586
gcc (GCC) 3.1.1 (Mandrake Linux 8.3 3.1.1-0.7mdk)


* Re: HZ, preferably as small as possible
  2002-07-10 23:08       ` Dave Mielke
@ 2002-07-10 23:13         ` Thunder from the hill
  0 siblings, 0 replies; 76+ messages in thread
From: Thunder from the hill @ 2002-07-10 23:13 UTC (permalink / raw)
  To: Dave Mielke; +Cc: Thunder from the hill, Andrew Morton, Grover, Andrew, Linux

Hi,

On Wed, 10 Jul 2002, Dave Mielke wrote:
> [quoted lines by Thunder from the hill on July 10, 2002, at 16:41]
> 
> >+  Enable this  if you care about  your CPU sleeping  time. The current
> >+  interval for  scheduling processes in  the kernel has  recently been
> >+  increased.
> 
> The word "recently" will very quickly become out-of-date. Why not just state
> the way it is and why one might want to select the option?

I don't think this is a patch for long.

							Regards,
							Thunder
-- 
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o?  K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y- 
------END GEEK CODE BLOCK------



* Re: HZ, preferably as small as possible
  2002-07-10 22:41     ` Thunder from the hill
  2002-07-10 22:47       ` Thunder from the hill
  2002-07-10 22:49       ` Eli Carter
@ 2002-07-10 23:08       ` Dave Mielke
  2002-07-10 23:13         ` Thunder from the hill
  2 siblings, 1 reply; 76+ messages in thread
From: Dave Mielke @ 2002-07-10 23:08 UTC (permalink / raw)
  To: Thunder from the hill; +Cc: Andrew Morton, Grover, Andrew, Linux

[quoted lines by Thunder from the hill on July 10, 2002, at 16:41]

>+  Enable this  if you care about  your CPU sleeping  time. The current
>+  interval for  scheduling processes in  the kernel has  recently been
>+  increased.

The word "recently" will very quickly become out-of-date. Why not just state
the way it is and why one might want to select the option?

-- 
Dave Mielke           | 2213 Fox Crescent | I believe that the Bible is the
Phone: 1-613-726-0014 | Ottawa, Ontario   | Word of God. Please contact me
EMail: dave@mielke.cc | Canada  K2A 1H7   | if you're concerned about Hell.
http://familyradio.com



* Re: HZ, preferably as small as possible
  2002-07-10 22:49       ` Eli Carter
@ 2002-07-10 23:05         ` Thunder from the hill
  0 siblings, 0 replies; 76+ messages in thread
From: Thunder from the hill @ 2002-07-10 23:05 UTC (permalink / raw)
  To: Eli Carter; +Cc: Thunder from the hill, Andrew Morton, Grover, Andrew, Linux

Hi,

On Wed, 10 Jul 2002, Eli Carter wrote:
> Perhaps s/increased/shortened/ ?

'Course. Sorry, I'm quite out of bounds since I heard that tonight some of 
my friends possibly got lost in the storm in Germany. I wished them lots 
of fun on their canoe tour at the Mueritz...

I think I'll get back there tonight, so don't expect many responses, TAs 
are ugly. Of course I'll try to get on working meanwhile.

The patch w/ the shortened-update is now at

http://luckynet.dynu.com/~thunder/patches/CONFIG_SCHED_LOW_HZ.patch

							Regards,
							Thunder
-- 
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o?  K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y- 
------END GEEK CODE BLOCK------



* Re: HZ, preferably as small as possible
  2002-07-10 22:41     ` Thunder from the hill
  2002-07-10 22:47       ` Thunder from the hill
@ 2002-07-10 22:49       ` Eli Carter
  2002-07-10 23:05         ` Thunder from the hill
  2002-07-10 23:08       ` Dave Mielke
  2 siblings, 1 reply; 76+ messages in thread
From: Eli Carter @ 2002-07-10 22:49 UTC (permalink / raw)
  To: Thunder from the hill; +Cc: Andrew Morton, Grover, Andrew, Linux

Thunder from the hill wrote:
> Hi,
> 
> On Wed, 10 Jul 2002, Thunder from the hill wrote:
> 
>>Want a config option? Either int or bool (CONFIG_LOW_HZ). It's not too 
>>much effort.
> 
> 
> I guess I forgot the half of it...
> 
> What arches do we want?
> 
> Index: arch/i386/Config.help
> ===================================================================
> RCS file: /var/cvs/thunder-2.5/arch/i386/Config.help,v
> retrieving revision 1.4
> diff -p -u -r1.4 Config.help
> --- arch/i386/Config.help	7 Jul 2002 09:59:46 -0000	1.4
> +++ arch/i386/Config.help	10 Jul 2002 22:40:17 -0000
> @@ -991,3 +991,13 @@ CONFIG_X86_EARLY_PRINTK
>    to the console  much earlier in the boot  process than printk.  This
>    is useful when  debugging fatal problems early in  the boot sequence
>    (e.g. within setup_arch).  If unsure, say N.
> +
> +Low kernel scheduler rate
> +CONFIG_SCHED_LOW_HZ
> +  Enable this  if you care about  your CPU sleeping  time. The current
> +  interval for  scheduling processes in  the kernel has  recently been
> +  increased. The advantage is less latency for many things that depend

Perhaps s/increased/shortened/ ?

> +  on the  timer, the disadvantage is  that your cpu  will probably not
> +  go to sleep in time (so  CPU power management will possibly not work
> +  at all)
> +
> Index: include/asm-i386/param.h
[snip]

Eli
--------------------. "If it ain't broke now,
Eli Carter           \                  it will be soon." -- crypto-gram
eli.carter(a)inet.com `-------------------------------------------------



* Re: HZ, preferably as small as possible
  2002-07-10 22:41     ` Thunder from the hill
@ 2002-07-10 22:47       ` Thunder from the hill
  2002-07-10 22:49       ` Eli Carter
  2002-07-10 23:08       ` Dave Mielke
  2 siblings, 0 replies; 76+ messages in thread
From: Thunder from the hill @ 2002-07-10 22:47 UTC (permalink / raw)
  To: Thunder from the hill; +Cc: Andrew Morton, Grover, Andrew, Linux

Hi,

On Wed, 10 Jul 2002, Thunder from the hill wrote:
> I guess I forgot the half of it...

I did. Here is the whole version:

Index: arch/i386/Config.help
===================================================================
RCS file: /var/cvs/thunder-2.5/arch/i386/Config.help,v
retrieving revision 1.4
diff -p -u -r1.4 Config.help
--- arch/i386/Config.help	7 Jul 2002 09:59:46 -0000	1.4
+++ arch/i386/Config.help	10 Jul 2002 22:40:17 -0000
@@ -991,3 +991,13 @@ CONFIG_X86_EARLY_PRINTK
   to the console  much earlier in the boot  process than printk.  This
   is useful when  debugging fatal problems early in  the boot sequence
   (e.g. within setup_arch).  If unsure, say N.
+
+Low kernel scheduler rate
+CONFIG_SCHED_LOW_HZ
+  Enable this  if you care about  your CPU sleeping  time. The current
+  interval for  scheduling processes in  the kernel has  recently been
+  increased. The advantage is less latency for many things that depend
+  on the  timer, the disadvantage is  that your cpu  will probably not
+  go to sleep in time (so  CPU power management will possibly not work
+  at all)
+
Index: include/asm-i386/param.h
===================================================================
RCS file: /var/cvs/thunder-2.5/include/asm-i386/param.h,v
retrieving revision 1.2
diff -p -u -r1.2 param.h
--- include/asm-i386/param.h	6 Jul 2002 18:17:30 -0000	1.2
+++ include/asm-i386/param.h	10 Jul 2002 22:40:17 -0000
@@ -2,7 +2,11 @@
 #define _ASMi386_PARAM_H
 
 #ifdef __KERNEL__
-# define HZ		1000		/* Internal kernel timer frequency */
+# ifdef CONFIG_SCHED_LOW_HZ
+#  define HZ		100		/* Internal kernel timer frequency */
+# else
+#  define HZ		1000		/* Internal kernel timer frequency */
+# endif
 # define USER_HZ	100		/* .. some user interfaces are in "ticks" */
 # define CLOCKS_PER_SEC	(USER_HZ)	/* like times() */
 #endif
Index: arch/i386/config.in
===================================================================
RCS file: /var/cvs/thunder-2.5/arch/i386/config.in,v
retrieving revision 1.8
diff -p -u -r1.8 config.in
--- arch/i386/config.in	7 Jul 2002 09:59:47 -0000	1.8
+++ arch/i386/config.in	10 Jul 2002 22:45:28 -0000
@@ -181,6 +181,7 @@ else
    bool 'Multiquad NUMA system' CONFIG_MULTIQUAD
 fi
 
+bool 'Low scheduler rates' CONFIG_SCHED_LOW_HZ
 bool 'Machine Check Exception' CONFIG_X86_MCE
 dep_bool 'Check for non-fatal errors on Athlon/Duron' CONFIG_X86_MCE_NONFATAL $CONFIG_X86_MCE
 dep_bool 'check for P4 thermal throttling interrupt.' CONFIG_X86_MCE_P4THERMAL $CONFIG_X86_MCE $CONFIG_X86_UP_APIC

							Regards,
							Thunder
-- 
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o?  K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y- 
------END GEEK CODE BLOCK------



* Re: HZ, preferably as small as possible
  2002-07-10 22:01   ` Thunder from the hill
  2002-07-10 22:09     ` Cort Dougan
@ 2002-07-10 22:41     ` Thunder from the hill
  2002-07-10 22:47       ` Thunder from the hill
                         ` (2 more replies)
  2002-07-10 23:50     ` J.A. Magallon
  2 siblings, 3 replies; 76+ messages in thread
From: Thunder from the hill @ 2002-07-10 22:41 UTC (permalink / raw)
  To: Thunder from the hill; +Cc: Andrew Morton, Grover, Andrew, Linux

Hi,

On Wed, 10 Jul 2002, Thunder from the hill wrote:
> Want a config option? Either int or bool (CONFIG_LOW_HZ). It's not too 
> much effort.

I guess I forgot the half of it...

What arches do we want?

Index: arch/i386/Config.help
===================================================================
RCS file: /var/cvs/thunder-2.5/arch/i386/Config.help,v
retrieving revision 1.4
diff -p -u -r1.4 Config.help
--- arch/i386/Config.help	7 Jul 2002 09:59:46 -0000	1.4
+++ arch/i386/Config.help	10 Jul 2002 22:40:17 -0000
@@ -991,3 +991,13 @@ CONFIG_X86_EARLY_PRINTK
   to the console  much earlier in the boot  process than printk.  This
   is useful when  debugging fatal problems early in  the boot sequence
   (e.g. within setup_arch).  If unsure, say N.
+
+Low kernel scheduler rate
+CONFIG_SCHED_LOW_HZ
+  Enable this  if you care about  your CPU sleeping  time. The current
+  interval for  scheduling processes in  the kernel has  recently been
+  increased. The advantage is less latency for many things that depend
+  on the  timer, the disadvantage is  that your cpu  will probably not
+  go to sleep in time (so  CPU power management will possibly not work
+  at all)
+
Index: include/asm-i386/param.h
===================================================================
RCS file: /var/cvs/thunder-2.5/include/asm-i386/param.h,v
retrieving revision 1.2
diff -p -u -r1.2 param.h
--- include/asm-i386/param.h	6 Jul 2002 18:17:30 -0000	1.2
+++ include/asm-i386/param.h	10 Jul 2002 22:40:17 -0000
@@ -2,7 +2,11 @@
 #define _ASMi386_PARAM_H
 
 #ifdef __KERNEL__
-# define HZ		1000		/* Internal kernel timer frequency */
+# ifdef CONFIG_SCHED_LOW_HZ
+#  define HZ		100		/* Internal kernel timer frequency */
+# else
+#  define HZ		1000		/* Internal kernel timer frequency */
+# endif
 # define USER_HZ	100		/* .. some user interfaces are in "ticks" */
 # define CLOCKS_PER_SEC	(USER_HZ)	/* like times() */
 #endif

							Regards,
							Thunder
-- 
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o?  K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y- 
------END GEEK CODE BLOCK------



* Re: HZ, preferably as small as possible
  2002-07-10 22:01   ` Thunder from the hill
@ 2002-07-10 22:09     ` Cort Dougan
  2002-07-10 22:41     ` Thunder from the hill
  2002-07-10 23:50     ` J.A. Magallon
  2 siblings, 0 replies; 76+ messages in thread
From: Cort Dougan @ 2002-07-10 22:09 UTC (permalink / raw)
  To: Thunder from the hill; +Cc: Andrew Morton, Grover, Andrew, Linux

Yes, please do make it a config option.  10x interrupt overhead makes me
worry.  It lets users tailor the kernel to their expected load.


* Re: HZ, preferably as small as possible
  2002-07-10 21:28 ` Andrew Morton
  2002-07-10 21:35   ` Benjamin LaHaise
@ 2002-07-10 22:01   ` Thunder from the hill
  2002-07-10 22:09     ` Cort Dougan
                       ` (2 more replies)
  2002-07-11  0:28   ` Lincoln Dale
  2 siblings, 3 replies; 76+ messages in thread
From: Thunder from the hill @ 2002-07-10 22:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grover, Andrew, Linux

Hi,

On Wed, 10 Jul 2002, Andrew Morton wrote:
> That makes a ton of sense.
> 
> > But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> > these been quantified?
> 
> Not that I'm aware of.  And I'd regard any such claims with some
> scepticism.
> 
> > I'd either like to see a HZ that has balanced
> > power/performance, or could we perhaps detect we are on a system that cares
> > about power (aka a laptop) and tweak its value at runtime?

Want a config option? Either int or bool (CONFIG_LOW_HZ). It's not too 
much effort.

							Regards,
							Thunder
-- 
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o?  K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y- 
------END GEEK CODE BLOCK------



* Re: HZ, preferably as small as possible
  2002-07-10 21:38     ` Andrew Morton
@ 2002-07-10 21:42       ` Benjamin LaHaise
  2002-07-11  2:14         ` CaT
  0 siblings, 1 reply; 76+ messages in thread
From: Benjamin LaHaise @ 2002-07-10 21:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grover, Andrew, Linux

On Wed, Jul 10, 2002 at 02:38:32PM -0700, Andrew Morton wrote:
> OK, I'll grant that.  Why is this useful?

Think video playback, where you want to queue the frame to be played as 
close to the correct 1/60s time as possible.  With HZ=100, the code will 
frequently wake up much too late.
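
The arithmetic behind this can be sketched as follows. It is a
simplified model, not kernel code: it assumes a timeout can only expire
on a tick boundary and is rounded up to the next one, and the function
name rounded_delay_us() is made up for this example.

```c
/*
 * Round a requested delay up to the next tick boundary and return the
 * delay actually obtained, in microseconds. Assumes hz divides
 * 1000000 evenly.
 */
static inline unsigned long rounded_delay_us(unsigned long delay_us,
                                             unsigned long hz)
{
    unsigned long tick_us = 1000000UL / hz;               /* tick length */
    unsigned long ticks = (delay_us + tick_us - 1) / tick_us;

    return ticks * tick_us;
}
```

Under this model a 1/60s frame period (about 16667 usec) becomes
20000 usec with HZ=100, roughly 3.3 msec late, but 17000 usec with
HZ=1000, roughly 0.3 msec late.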

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."


* Re: HZ, preferably as small as possible
  2002-07-10 21:35   ` Benjamin LaHaise
@ 2002-07-10 21:38     ` Andrew Morton
  2002-07-10 21:42       ` Benjamin LaHaise
  2002-07-11 17:01     ` Martin Dalecki
  1 sibling, 1 reply; 76+ messages in thread
From: Andrew Morton @ 2002-07-10 21:38 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Grover, Andrew, Linux

Benjamin LaHaise wrote:
> 
> On Wed, Jul 10, 2002 at 02:28:03PM -0700, Andrew Morton wrote:
> > > But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> > > these been quantified?
> >
> > Not that I'm aware of.  And I'd regard any such claims with some
> > scepticism.
> 
> The most obvious one is the reduced latency of select/poll timeouts.

OK, I'll grant that.  Why is this useful?

-


* Re: HZ, preferably as small as possible
  2002-07-10 21:28 ` Andrew Morton
@ 2002-07-10 21:35   ` Benjamin LaHaise
  2002-07-10 21:38     ` Andrew Morton
  2002-07-11 17:01     ` Martin Dalecki
  2002-07-10 22:01   ` Thunder from the hill
  2002-07-11  0:28   ` Lincoln Dale
  2 siblings, 2 replies; 76+ messages in thread
From: Benjamin LaHaise @ 2002-07-10 21:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Grover, Andrew, Linux

On Wed, Jul 10, 2002 at 02:28:03PM -0700, Andrew Morton wrote:
> > But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> > these been quantified?
> 
> Not that I'm aware of.  And I'd regard any such claims with some
> scepticism.

The most obvious one is the reduced latency of select/poll timeouts.

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."


* Re: HZ, preferably as small as possible
  2002-07-10 19:59 Grover, Andrew
  2002-07-10 21:09 ` george anzinger
@ 2002-07-10 21:28 ` Andrew Morton
  2002-07-10 21:35   ` Benjamin LaHaise
                     ` (2 more replies)
  2002-07-15  5:06 ` Linus Torvalds
  2 siblings, 3 replies; 76+ messages in thread
From: Andrew Morton @ 2002-07-10 21:28 UTC (permalink / raw)
  To: Grover, Andrew; +Cc: Linux

"Grover, Andrew" wrote:
> 
> I'd like to see HZ closer to 100 than 1000, for CPU power reasons. Processor
> power states like C3 may take 100 microseconds+ to enter/leave - time when
> both the CPU isn't doing any work, but still drawing power as if it was. We
> pop out of C3 whenever there is an interrupt, so reducing timer interrupts
> is good from a power standpoint by amortizing the transition penalty over a
> longer period of power savings.

That makes a ton of sense.

> But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> these been quantified?

Not that I'm aware of.  And I'd regard any such claims with some
scepticism.

> I'd either like to see a HZ that has balanced
> power/performance, or could we perhaps detect we are on a system that cares
> about power (aka a laptop) and tweak its value at runtime?

It's all rather fishy.

-


* Re: HZ, preferably as small as possible
  2002-07-10 19:59 Grover, Andrew
@ 2002-07-10 21:09 ` george anzinger
  2002-07-11  6:03   ` Hannu Savolainen
  2002-07-10 21:28 ` Andrew Morton
  2002-07-15  5:06 ` Linus Torvalds
  2 siblings, 1 reply; 76+ messages in thread
From: george anzinger @ 2002-07-10 21:09 UTC (permalink / raw)
  To: Grover, Andrew; +Cc: Linux

"Grover, Andrew" wrote:
> 
> I'd like to see HZ closer to 100 than 1000, for CPU power reasons. Processor
> power states like C3 may take 100 microseconds+ to enter/leave - time when
> both the CPU isn't doing any work, but still drawing power as if it was. We
> pop out of C3 whenever there is an interrupt, so reducing timer interrupts
> is good from a power standpoint by amortizing the transition penalty over a
> longer period of power savings.
> 
> But on the other hand, increasing HZ has perf/latency benefits, yes? Have
> these been quantified? I'd either like to see a HZ that has balanced
> power/performance, or could we perhaps detect we are on a system that cares
> about power (aka a laptop) and tweak its value at runtime?

HZ is used in a LOT of places.  I suspect "tweaking" at run
time would be a bit difficult.  

The high-res-timers patch gives high resolution timers
without changing HZ.  Interrupts are scheduled as needed,
between the 1/HZ ticks, so a quiet system will have few (if
any) interrupts between the ticks.

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* HZ, preferably as small as possible
@ 2002-07-10 19:59 Grover, Andrew
  2002-07-10 21:09 ` george anzinger
                   ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Grover, Andrew @ 2002-07-10 19:59 UTC (permalink / raw)
  To: Linux

I'd like to see HZ closer to 100 than 1000, for CPU power reasons. Processor
power states like C3 may take 100 microseconds+ to enter/leave - time when
the CPU isn't doing any work but is still drawing power as if it were. We
pop out of C3 whenever there is an interrupt, so reducing timer interrupts
is good from a power standpoint by amortizing the transition penalty over a
longer period of power savings.
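
As a back-of-the-envelope sketch of that amortization: assume an
otherwise idle system where every timer tick causes exactly one wakeup,
and each wakeup costs a fixed transition time at full power. The
function name transition_overhead_permille() is made up for this
example, and the 100 microsecond figure is the lower bound given above.

```c
/*
 * Share of each idle second spent in C-state transitions, in parts per
 * thousand, assuming one wakeup per timer tick and transition_us
 * microseconds of full-power transition time per wakeup.
 */
static inline unsigned long transition_overhead_permille(unsigned long hz,
                                                         unsigned long transition_us)
{
    /* hz * transition_us usec out of 1000000 usec, scaled by 1000 */
    return hz * transition_us / 1000UL;
}
```

With 100 microseconds per wakeup this gives 100 per mille (10%) at
HZ=1000 and 10 per mille (1%) at HZ=100.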

But on the other hand, increasing HZ has perf/latency benefits, yes? Have
these been quantified? I'd either like to see a HZ that has balanced
power/performance, or could we perhaps detect we are on a system that cares
about power (aka a laptop) and tweak its value at runtime?

Regards -- Andy

-----------------------------
Andrew Grover
Intel Labs / Mobile Architecture
andrew.grover@intel.com



end of thread, other threads:[~2002-07-18 13:21 UTC | newest]

Thread overview: 76+ messages
2002-07-11  2:46 HZ, preferably as small as possible Grover, Andrew
2002-07-11  3:01 ` Jeff Garzik
2002-07-11 11:45   ` Alan Cox
2002-07-11 17:08   ` Martin Dalecki
2002-07-11 19:21     ` Albert D. Cahalan
2002-07-16  9:17       ` Kai Henningsen
2002-07-11 20:34     ` Bill Davidsen
2002-07-12 12:01       ` Martin Dalecki
2002-07-15  5:15       ` Linus Torvalds
2002-07-15  6:56         ` Albert D. Cahalan
2002-07-15  8:24           ` Russell King
2002-07-15 15:48             ` David Mosberger
2002-07-15 18:20               ` Albert D. Cahalan
2002-07-15 18:30                 ` David Mosberger
2002-07-15 16:07             ` Albert D. Cahalan
2002-07-15 17:06               ` Russell King
2002-07-15 18:43                 ` Albert D. Cahalan
2002-07-15 18:53                   ` Russell King
2002-07-15 18:50               ` Linus Torvalds
2002-07-15 20:15                 ` Albert D. Cahalan
2002-07-15  8:58         ` Dave Mielke
2002-07-11  7:09 ` george anzinger
  -- strict thread matches above, loose matches on Subject: below --
2002-07-10 19:59 Grover, Andrew
2002-07-10 21:09 ` george anzinger
2002-07-11  6:03   ` Hannu Savolainen
2002-07-11  7:15     ` george anzinger
2002-07-12  0:36       ` Stevie O
2002-07-12  0:50         ` Thunder from the hill
2002-07-12  0:55           ` Robert Love
2002-07-12  0:58             ` Thunder from the hill
2002-07-12  1:24             ` Alan Cox
2002-07-12  1:37             ` Mark Hahn
2002-07-12  1:09         ` george anzinger
2002-07-12  1:26           ` Roland Dreier
2002-07-12 17:30             ` george anzinger
2002-07-12  1:35           ` Stevie O
2002-07-12  3:01         ` Bernd Eckenfels
2002-07-11 12:54     ` Thunder from the hill
2002-07-11 15:59       ` Martin Dalecki
2002-07-10 21:28 ` Andrew Morton
2002-07-10 21:35   ` Benjamin LaHaise
2002-07-10 21:38     ` Andrew Morton
2002-07-10 21:42       ` Benjamin LaHaise
2002-07-11  2:14         ` CaT
2002-07-11 17:01     ` Martin Dalecki
2002-07-10 22:01   ` Thunder from the hill
2002-07-10 22:09     ` Cort Dougan
2002-07-10 22:41     ` Thunder from the hill
2002-07-10 22:47       ` Thunder from the hill
2002-07-10 22:49       ` Eli Carter
2002-07-10 23:05         ` Thunder from the hill
2002-07-10 23:08       ` Dave Mielke
2002-07-10 23:13         ` Thunder from the hill
2002-07-10 23:50     ` J.A. Magallon
2002-07-11  0:28   ` Lincoln Dale
2002-07-11 11:35     ` Kasper Dupont
2002-07-11 12:30       ` Alan Cox
2002-07-11 13:37         ` Kasper Dupont
2002-07-11 15:46           ` Alan Cox
2002-07-11 18:51       ` george anzinger
2002-07-15  5:06 ` Linus Torvalds
2002-07-15 16:26   ` Robert Love
2002-07-15 18:56     ` Linus Torvalds
2002-07-15 19:52       ` mbs
2002-07-15 20:01         ` yodaiken
2002-07-16 11:41   ` Vojtech Pavlik
2002-07-17 19:33   ` Daniel Phillips
2002-07-17 20:31     ` Richard B. Johnson
2002-07-17 20:40       ` Daniel Phillips
2002-07-17 21:02         ` Richard B. Johnson
2002-07-17 21:02       ` Linus Torvalds
2002-07-17 21:16         ` Daniel Phillips
2002-07-18 12:57         ` Richard B. Johnson
2002-07-18 13:25           ` Daniel Phillips
2002-07-18 10:10       ` Kai Henningsen
2002-07-17 20:55     ` Linus Torvalds
