* finding out the value of HZ from userspace
@ 2004-03-11 14:17 ` Micha Feigin
  2004-03-13 17:24   ` Arjan van de Ven
                     ` (2 more replies)
  0 siblings, 3 replies; 75+ messages in thread
From: Micha Feigin @ 2004-03-11 14:17 UTC (permalink / raw)
  To: lkml

Is it possible to find out what the kernel's notion of HZ is from user
space?
It seems to change from system to system and between 2.4 (100 on i386)
to 2.6 (1000 on i386).

* Re: finding out the value of HZ from userspace
  2004-03-11 14:17 ` finding out the value of HZ from userspace Micha Feigin
@ 2004-03-13 17:24   ` Arjan van de Ven
  2004-03-13 19:34     ` John Reiser
                       ` (2 more replies)
  2004-03-14  2:45   ` Horst von Brand
       [not found]   ` <200403161757.48786.mgross@linux.intel.com>
  2 siblings, 3 replies; 75+ messages in thread
From: Arjan van de Ven @ 2004-03-13 17:24 UTC (permalink / raw)
  To: Micha Feigin; +Cc: lkml

On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> Is it possible to find out what the kernel's notion of HZ is from user
> space?
> It seems to change from system to system and between 2.4 (100 on i386)
> to 2.6 (1000 on i386).

if you can see 1000 from userspace that is a bad kernel bug; can you say
where you find something in units of 1000 ?

* Re: finding out the value of HZ from userspace
  2004-03-13 17:24   ` Arjan van de Ven
@ 2004-03-13 19:34     ` John Reiser
  2004-03-13 19:38       ` Arjan van de Ven
  2004-03-13 21:19     ` tabris
  2004-03-13 22:10     ` Micha Feigin
  2 siblings, 1 reply; 75+ messages in thread
From: John Reiser @ 2004-03-13 19:34 UTC (permalink / raw)
  To: arjanv; +Cc: Micha Feigin, lkml

Arjan van de Ven wrote:
> On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> 
>>Is it possible to find out what the kernel's notion of HZ is from user
>>space?
>>It seems to change from system to system and between 2.4 (100 on i386)
>>to 2.6 (1000 on i386).
> 
> 
> if you can see 1000 from userspace that is a bad kernel bug; can you say
> where you find something in units of 1000 ?

create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
         NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
which can be found by crawling through the stack above the pointer
to the last environment variable.
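A minimal C sketch of that stack crawl, assuming the Linux initial-stack layout (envp[] terminated by a NULL pointer, immediately followed by the ELF auxiliary vector as native-word (type, value) pairs ending in AT_NULL). The function names are hypothetical; getauxval() is the glibc (2.16 and later) call that returns the same entry without walking memory.

```c
#include <elf.h>        /* AT_NULL, AT_CLKTCK */
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval(), glibc 2.16+ */

extern char **environ;

/* Walk past the NULL that terminates the initial envp[]; the auxiliary
 * vector starts right after it as (type, value) pairs of native-word
 * size, terminated by an AT_NULL entry.
 * Assumes environ still points at the original envp[], i.e. nothing has
 * called setenv(), putenv() or clearenv() yet. */
long clk_tck_from_auxv(void)
{
    char **p = environ;
    if (p == NULL)
        return -1;
    while (*p != NULL)
        p++;
    const unsigned long *auxv = (const unsigned long *)(p + 1);
    for (; auxv[0] != AT_NULL; auxv += 2) {
        if (auxv[0] == AT_CLKTCK)
            return (long)auxv[1];
    }
    return -1;  /* kernel did not supply AT_CLKTCK */
}

/* The same value, obtained without relying on the stack layout. */
long clk_tck_from_getauxval(void)
{
    return (long)getauxval(AT_CLKTCK);
}
```

On a kernel configured as Arjan describes below, both functions should report USER_HZ (100 on x86), not the internal HZ.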

-- 


* Re: finding out the value of HZ from userspace
  2004-03-13 19:34     ` John Reiser
@ 2004-03-13 19:38       ` Arjan van de Ven
  2004-03-13 22:14         ` Micha Feigin
  2004-03-16  0:28         ` Peter Williams
  0 siblings, 2 replies; 75+ messages in thread
From: Arjan van de Ven @ 2004-03-13 19:38 UTC (permalink / raw)
  To: John Reiser; +Cc: Micha Feigin, lkml

On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> Arjan van de Ven wrote:
> >On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> >
> >>Is it possible to find out what the kernel's notion of HZ is from user
> >>space?
> >>It seems to change from system to system and between 2.4 (100 on i386)
> >>to 2.6 (1000 on i386).
> >
> >
> >if you can see 1000 from userspace that is a bad kernel bug; can you say
> >where you find something in units of 1000 ?
> 
> create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
>         NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
> which can be found by crawling through the stack above the pointer
> to the last environment variable.

Ugh that should say 100 on x86....
but..
param.h:# define USER_HZ        100             /* .. some user interfaces are in "ticks" */
param.h:# define CLOCKS_PER_SEC (USER_HZ)       /* like times() */
.....
that looks like 100 to me.
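Because tick-based interfaces such as times() report in USER_HZ rather than the internal HZ, userspace should scale them by sysconf(_SC_CLK_TCK). A small sketch with made-up helper names. Note that the CLOCKS_PER_SEC a program sees in <time.h> is a different constant, used by clock(), and is 1000000 on XSI-conformant systems; it is not the kernel's param.h value quoted above.

```c
#include <sys/times.h>
#include <unistd.h>

/* Convert a clock_t tick count from times() (or /proc) to seconds.
 * The tick rate is USER_HZ as reported by sysconf(_SC_CLK_TCK);
 * returns -1.0 if the rate cannot be determined. */
double ticks_to_seconds(clock_t ticks)
{
    long hz = sysconf(_SC_CLK_TCK);
    if (hz <= 0)
        return -1.0;
    return (double)ticks / (double)hz;
}

/* User plus system CPU time used so far by this process, in seconds. */
double process_cpu_seconds(void)
{
    struct tms t;
    if (times(&t) == (clock_t)-1)
        return -1.0;
    return ticks_to_seconds(t.tms_utime + t.tms_stime);
}
```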


* Re: finding out the value of HZ from userspace
  2004-03-13 17:24   ` Arjan van de Ven
  2004-03-13 19:34     ` John Reiser
@ 2004-03-13 21:19     ` tabris
  2004-03-13 22:10     ` Micha Feigin
  2 siblings, 0 replies; 75+ messages in thread
From: tabris @ 2004-03-13 21:19 UTC (permalink / raw)
  To: arjanv, Micha Feigin; +Cc: lkml

On Saturday 13 March 2004 12:24 pm, Arjan van de Ven wrote:
> On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > Is it possible to find out what the kernel's notion of HZ is from user
> > space?
> > It seems to change from system to system and between 2.4 (100 on i386)
> > to 2.6 (1000 on i386).
>
> if you can see 1000 from userspace that is a bad kernel bug; can you say
> where you find something in units of 1000 ?
2.6.3-rc1-mm1

procinfo shows the timer interrupt counting 1000 ints/sec,
though procinfo is broken for other things, such as the pages swapped
and pages read in and out that it showed under 2.4.

--
tabris
-
"We never make assertions, Miss Taggart," said Hugh Akston.  "That is
the moral crime peculiar to our enemies.  We do not tell -- we *show*.
We do not claim -- we *prove*."  
-- Ayn Rand, _Atlas Shrugged_


* Re: finding out the value of HZ from userspace
  2004-03-13 17:24   ` Arjan van de Ven
  2004-03-13 19:34     ` John Reiser
  2004-03-13 21:19     ` tabris
@ 2004-03-13 22:10     ` Micha Feigin
  2004-03-13 22:41       ` Arjan van de Ven
  2 siblings, 1 reply; 75+ messages in thread
From: Micha Feigin @ 2004-03-13 22:10 UTC (permalink / raw)
  To: lkml

On Sat, Mar 13, 2004 at 06:24:31PM +0100, Arjan van de Ven wrote:
> On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > Is it possible to find out what the kernel's notion of HZ is from user
> > space?
> > It seems to change from system to system and between 2.4 (100 on i386)
> > to 2.6 (1000 on i386).
> 
> if you can see 1000 from userspace that is a bad kernel bug; can you say
> where you find something in units of 1000 ?

I can't see it from user space; it's in the kernel headers. The thing is,
I am working on fixes to laptop mode, which requires changing the bdflush
and journaled file system journal flush times. The problem is that some
of these (bdflush, xfs) expect the value in jiffies rather than seconds
or milliseconds, so making the init script portable requires knowing the
value of HZ.


* Re: finding out the value of HZ from userspace
  2004-03-13 19:38       ` Arjan van de Ven
@ 2004-03-13 22:14         ` Micha Feigin
  2004-03-13 22:32           ` Arjan van de Ven
  2004-03-16  0:28         ` Peter Williams
  1 sibling, 1 reply; 75+ messages in thread
From: Micha Feigin @ 2004-03-13 22:14 UTC (permalink / raw)
  To: lkml

On Sat, Mar 13, 2004 at 08:38:52PM +0100, Arjan van de Ven wrote:
> On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> > Arjan van de Ven wrote:
> > >On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > >
> > >>Is it possible to find out what the kernel's notion of HZ is from user
> > >>space?
> > >>It seems to change from system to system and between 2.4 (100 on i386)
> > >>to 2.6 (1000 on i386).
> > >
> > >
> > >if you can see 1000 from userspace that is a bad kernel bug; can you say
> > >where you find something in units of 1000 ?
> > 
> > create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
> >         NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
> > which can be found by crawling through the stack above the pointer
> > to the last environment variable.
> 
> Ugh that should say 100 on x86....
> but..
> param.h:# define USER_HZ        100             /* .. some user interfaces are in "ticks" */
> param.h:# define CLOCKS_PER_SEC (USER_HZ)       /* like times() */
> .....
> that looks like 100 to me.
> 

When dealing with bdflush and a few other interfaces, the values need to
be in jiffies, which requires knowledge of the kernel's notion of HZ, not
userspace's.

The other option is to push a change that makes the interface take
centisecs instead of jiffies; the question is whether it will catch on.


* Re: finding out the value of HZ from userspace
  2004-03-13 22:14         ` Micha Feigin
@ 2004-03-13 22:32           ` Arjan van de Ven
  2004-03-14  1:05             ` Micha Feigin
  0 siblings, 1 reply; 75+ messages in thread
From: Arjan van de Ven @ 2004-03-13 22:32 UTC (permalink / raw)
  To: Micha Feigin; +Cc: lkml

On Sat, 2004-03-13 at 23:14, Micha Feigin wrote:
> On Sat, Mar 13, 2004 at 08:38:52PM +0100, Arjan van de Ven wrote:
> > On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> > > Arjan van de Ven wrote:
> > > >On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > > >
> > > >>Is it possible to find out what the kernel's notion of HZ is from user
> > > >>space?
> > > >>It seems to change from system to system and between 2.4 (100 on i386)
> > > >>to 2.6 (1000 on i386).
> > > >
> > > >
> > > >if you can see 1000 from userspace that is a bad kernel bug; can you say
> > > >where you find something in units of 1000 ?
> > > 
> > > create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
> > >         NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
> > > which can be found by crawling through the stack above the pointer
> > > to the last environment variable.
> > 
> > Ugh that should say 100 on x86....
> > but..
> > param.h:# define USER_HZ        100             /* .. some user interfaces are in "ticks" */
> > param.h:# define CLOCKS_PER_SEC (USER_HZ)       /* like times() */
> > .....
> > that looks like 100 to me.
> > 
> 
> When dealing with bdflush and a few other interfaces, the values need to
> be in jiffies, which requires knowledge of the kernel's notion of HZ, not
> userspace's.

Wrong. Any such interface is supposed to convert automatically. Any
interface you can find that doesn't should be reported as a serious bug!


* Re: finding out the value of HZ from userspace
  2004-03-13 22:10     ` Micha Feigin
@ 2004-03-13 22:41       ` Arjan van de Ven
  2004-03-14  1:07         ` Micha Feigin
  2004-03-14 18:26         ` John Reiser
  0 siblings, 2 replies; 75+ messages in thread
From: Arjan van de Ven @ 2004-03-13 22:41 UTC (permalink / raw)
  To: Micha Feigin; +Cc: lkml

On Sat, 2004-03-13 at 23:10, Micha Feigin wrote:
> On Sat, Mar 13, 2004 at 06:24:31PM +0100, Arjan van de Ven wrote:
> > On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > > Is it possible to find out what the kernel's notion of HZ is from user
> > > space?
> > > It seems to change from system to system and between 2.4 (100 on i386)
> > > to 2.6 (1000 on i386).
> > 
> > if you can see 1000 from userspace that is a bad kernel bug; can you say
> > where you find something in units of 1000 ?
> 
> I can't see it from user space; it's in the kernel headers. The thing is,
> I am working on fixes to laptop mode, which requires changing the bdflush
> and journaled file system journal flush times. The problem is that some
> of these (bdflush, xfs) expect the value in jiffies rather than seconds
> or milliseconds, so making the init script portable requires knowing the
> value of HZ.

the kernel side is supposed to use clock_t_to_jiffies() and co. for this
to present a unified HZ to userspace. The internal kernel HZ should
*NOT* leak out to userspace. Heck, it's quite conceivable that in the future
there will be no such HZ.
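As an illustration of the kind of conversion meant here, a userspace model of rescaling a value given in USER_HZ ticks to internal jiffies. It is not the kernel's clock_t_to_jiffies() source: hz and user_hz are passed in as parameters (in the kernel they are compile-time constants), the function name is hypothetical, and saturating on overflow is an assumption of this sketch.

```c
#include <limits.h>
#include <stdint.h>

/* Userspace model (not the kernel source) of rescaling a value given in
 * USER_HZ ticks to internal jiffies, so userspace never has to know HZ.
 * Requires user_hz > 0.  Saturates at ULONG_MAX instead of wrapping. */
unsigned long clock_t_to_jiffies_model(unsigned long x,
                                       unsigned long hz,
                                       unsigned long user_hz)
{
    if (hz != 0 && (uint64_t)x > UINT64_MAX / hz)
        return ULONG_MAX;
    uint64_t j = (uint64_t)x * hz / user_hz;
    return j > ULONG_MAX ? ULONG_MAX : (unsigned long)j;
}
```

With HZ=1000 and USER_HZ=100, a tool that writes 100 (one second in USER_HZ ticks) ends up with 1000 jiffies, which is still one second.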
 


* Re: finding out the value of HZ from userspace
  2004-03-13 22:32           ` Arjan van de Ven
@ 2004-03-14  1:05             ` Micha Feigin
  2004-03-14  1:49               ` Andrew Morton
  0 siblings, 1 reply; 75+ messages in thread
From: Micha Feigin @ 2004-03-14  1:05 UTC (permalink / raw)
  To: lkml

On Sat, Mar 13, 2004 at 11:32:39PM +0100, Arjan van de Ven wrote:
> On Sat, 2004-03-13 at 23:14, Micha Feigin wrote:
> > On Sat, Mar 13, 2004 at 08:38:52PM +0100, Arjan van de Ven wrote:
> > > On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> > > > Arjan van de Ven wrote:
> > > > >On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > > > >
> > > > >>Is it possible to find out what the kernel's notion of HZ is from user
> > > > >>space?
> > > > >>It seems to change from system to system and between 2.4 (100 on i386)
> > > > >>to 2.6 (1000 on i386).
> > > > >
> > > > >
> > > > >if you can see 1000 from userspace that is a bad kernel bug; can you say
> > > > >where you find something in units of 1000 ?
> > > > 
> > > > create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
> > > >         NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
> > > > which can be found by crawling through the stack above the pointer
> > > > to the last environment variable.
> > > 
> > > Ugh that should say 100 on x86....
> > > but..
> > > param.h:# define USER_HZ        100             /* .. some user interfaces are in "ticks" */
> > > param.h:# define CLOCKS_PER_SEC (USER_HZ)       /* like times() */
> > > .....
> > > that looks like 100 to me.
> > > 
> > 
> > When dealing with bdflush and a few other interfaces, the values need to
> > be in jiffies, which requires knowledge of the kernel's notion of HZ, not
> > userspace's.
> 
> Wrong. Any such interface is supposed to convert automatically. Any
> interface you can find that doesn't should be reported as a serious bug!
> 

Like I said, look at bdflush in 2.4 (this was fixed by the changed 2.6
interface) and at the xfs proc interface in both 2.4 and 2.6.
In light of your post, then, there is a serious bug.

For example, the bdflush age_buffer field (the same is true for the
other fields used) has no conversion:
	bh->b_flushtime = jiffies + bdf_prm.b_un.age_buffer;

For the xfs flush interval:
	if (pbd_active == 1) {
		mod_timer(&pb_daemon_timer,
			  jiffies + pb_params.flush_interval.val);
		interruptible_sleep_on(&pbd_waitq);
	}

xfs should be converted to centisecs, and bdflush should also be
converted to centisecs, or the 2.6 interface should somehow be ported to
exist in parallel with the 2.4 one.

I don't mind making a patch; which approach should be used?

* Re: finding out the value of HZ from userspace
  2004-03-13 22:41       ` Arjan van de Ven
@ 2004-03-14  1:07         ` Micha Feigin
  2004-03-14 18:26         ` John Reiser
  1 sibling, 0 replies; 75+ messages in thread
From: Micha Feigin @ 2004-03-14  1:07 UTC (permalink / raw)
  To: lkml

On Sat, Mar 13, 2004 at 11:41:25PM +0100, Arjan van de Ven wrote:
> On Sat, 2004-03-13 at 23:10, Micha Feigin wrote:
> > On Sat, Mar 13, 2004 at 06:24:31PM +0100, Arjan van de Ven wrote:
> > > On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
> > > > Is it possible to find out what the kernel's notion of HZ is from user
> > > > space?
> > > > It seems to change from system to system and between 2.4 (100 on i386)
> > > > to 2.6 (1000 on i386).
> > > 
> > > if you can see 1000 from userspace that is a bad kernel bug; can you say
> > > where you find something in units of 1000 ?
> > 
> > I can't see it from user space; it's in the kernel headers. The thing is,
> > I am working on fixes to laptop mode, which requires changing the bdflush
> > and journaled file system journal flush times. The problem is that some
> > of these (bdflush, xfs) expect the value in jiffies rather than seconds
> > or milliseconds, so making the init script portable requires knowing the
> > value of HZ.
> 
> the kernel side is supposed to use clock_t_to_jiffies() and co for this
> to present a unified HZ to userspace. The internal kernel HZ should
> *NOT* leak out to userspace. Heck, it's quite conceivable that in the future
> there will be no such HZ.
>  
> 

The kernel side doesn't do that at the moment. Even the fixed bdflush
interface in 2.6, with dirty_writeback_centisecs as an example, converts
it as
(dirty_writeback_centisecs * HZ) / 100
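The formula above written out as a small C function with HZ passed in explicitly, plus the inverse that a read-back would need (both function names are hypothetical). The inverse shows that integer division truncates, so a value written and read back is not always identical when HZ is not a multiple of 100.

```c
/* (centisecs * HZ) / 100 from the 2.6 code quoted above, with HZ as a
 * parameter, plus the inverse conversion back to centiseconds. */
unsigned long centisecs_to_jiffies(unsigned long cs, unsigned long hz)
{
    return (cs * hz) / 100;
}

unsigned long jiffies_to_centisecs(unsigned long j, unsigned long hz)
{
    return (j * 100) / hz;
}
```

With HZ=1024 (alpha), 1 centisecond becomes 10 jiffies, which reads back as 0; at that HZ only multiples of 25 centiseconds round-trip exactly.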

* Re: finding out the value of HZ from userspace
  2004-03-14  1:05             ` Micha Feigin
@ 2004-03-14  1:49               ` Andrew Morton
  2004-03-14 14:37                 ` Micha Feigin
  0 siblings, 1 reply; 75+ messages in thread
From: Andrew Morton @ 2004-03-14  1:49 UTC (permalink / raw)
  To: Micha Feigin; +Cc: linux-kernel

Micha Feigin <michf@post.tau.ac.il> wrote:
>
>  > Wrong. Any such interface is supposed to convert automatically. Any
>  > interface you can find that doesn't should be reported as a serious bug!
>  > 
> 
>  Like I said, look at bdflush in 2.4 (this was fixed by the changed 2.6
>  interface) and at the xfs proc interface in both 2.4 and 2.6.
>  In light of your post, then, there is a serious bug.
>
>  For example, the bdflush age_buffer field (the same is true for the
>  other fields used) has no conversion:
>  	bh->b_flushtime = jiffies + bdf_prm.b_un.age_buffer;

I doubt if there's any motivation to fix these things in 2.4.  If you change
HZ in 2.4 you own both pieces.  (alpha has HZ=1024 in 2.4, so presumably
bdflush tuning doesn't work right).

In 2.6, the bdflush parameters do not exist.  They were replaced by
/proc/sys/vm/*_centisecs, which are HZ-independent.
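With these tunables a script or tool can simply write seconds × 100 without knowing HZ. A minimal sketch, assuming the caller may write the tunable (root, for the real /proc/sys/vm/dirty_writeback_centisecs); write_centisecs is a hypothetical helper, and the path is a parameter so it can be tried on an ordinary file.

```c
#include <stdio.h>

/* Write a duration in whole seconds to an HZ-independent *_centisecs
 * tunable, e.g. /proc/sys/vm/dirty_writeback_centisecs.
 * Returns 0 on success, -1 on failure. */
int write_centisecs(const char *path, unsigned int seconds)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%lu\n", (unsigned long)seconds * 100UL) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, write_centisecs("/proc/sys/vm/dirty_writeback_centisecs", 5) requests a 5-second writeback interval regardless of HZ.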

There are, I think, still some /proc tunables in 2.6 which do depend upon
HZ and they should be found and fixed.  If the same tunables are present in
2.4 kernels then they should be converted to take centiseconds in 2.6, so
2.4-based tools continue to work correctly.

We have similar problems where /proc tunables are expressed in terms of
"number of pages".  As PAGE_SIZE varies from 4096 to 65536 this is
sometimes wrong.  Fixing this is more subtle.
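On the userspace side, the page-count case has the same shape: a tool should ask for the page size rather than assume 4096. A sketch with hypothetical helper names, rounding up so that a nonzero byte count never becomes zero pages.

```c
#include <unistd.h>

/* Bytes to pages, rounding up, for a given page size (bytes >= 0). */
long bytes_to_pages(long bytes, long page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* The same, using the running system's page size instead of a
 * hard-coded 4096. */
long bytes_to_pages_here(long bytes)
{
    return bytes_to_pages(bytes, sysconf(_SC_PAGESIZE));
}
```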

* Re: finding out the value of HZ from userspace
  2004-03-11 14:17 ` finding out the value of HZ from userspace Micha Feigin
  2004-03-13 17:24   ` Arjan van de Ven
@ 2004-03-14  2:45   ` Horst von Brand
  2004-03-14 14:39     ` Micha Feigin
                       ` (2 more replies)
       [not found]   ` <200403161757.48786.mgross@linux.intel.com>
  2 siblings, 3 replies; 75+ messages in thread
From: Horst von Brand @ 2004-03-14  2:45 UTC (permalink / raw)
  To: lkml

Micha Feigin <michf@post.tau.ac.il> said:
> Is it possible to find out what the kernel's notion of HZ is from user
> space?

What for? It should be invisible to userspace...

> It seems to change from system to system and between 2.4 (100 on i386)
> to 2.6 (1000 on i386).

And can also be tweaked when compiling, and depends on architecture, and...
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

* Re: finding out the value of HZ from userspace
  2004-03-14  1:49               ` Andrew Morton
@ 2004-03-14 14:37                 ` Micha Feigin
  0 siblings, 0 replies; 75+ messages in thread
From: Micha Feigin @ 2004-03-14 14:37 UTC (permalink / raw)
  To: linux-kernel

On Sat, Mar 13, 2004 at 05:49:29PM -0800, Andrew Morton wrote:
> Micha Feigin <michf@post.tau.ac.il> wrote:
> >
> >  > Wrong. Any such interface is supposed to convert automatically. Any
> >  > interface you can find that doesn't should be reported as a serious bug!
> >  > 
> > 
> >  Like I said, look at bdflush in 2.4 (this was fixed by the changed 2.6
> >  interface) and at the xfs proc interface in both 2.4 and 2.6.
> >  In light of your post, then, there is a serious bug.
> > 
> >  For example, the bdflush age_buffer field (the same is true for the
> >  other fields used) has no conversion:
> >  	bh->b_flushtime = jiffies + bdf_prm.b_un.age_buffer;
> 
> I doubt if there's any motivation to fix these things in 2.4.  If you change
> HZ in 2.4 you own both pieces.  (alpha has HZ=1024 in 2.4, so presumably
> bdflush tuning doesn't work right).
> 

There is, for laptop mode, which is now in the kernel, so that a generic
startup script can be written.

I will write a patch, post it in a new thread, and see how it is received.

> In 2.6, the bdflush parameters do not exist.  They were replaced by
> /proc/sys/vm/*_centisecs, which are HZ-independent.
> 
> There are, I think, still some /proc tunables in 2.6 which do depend upon
> HZ and they should be found and fixed.  If the same tunables are present in
> 2.4 kernels then they should be converted to take centiseconds in 2.6, so
> 2.4-based tools continue to work correctly.
> 
> We have similar problems where /proc tunables are expressed in terms of
> "number of pages".  As PAGE_SIZE varies from 4096 to 65536 this is
> sometimes wrong.  Fixing this is more subtle.

* Re: finding out the value of HZ from userspace
  2004-03-14  2:45   ` Horst von Brand
@ 2004-03-14 14:39     ` Micha Feigin
  2004-03-15  8:17     ` Jamie Lokier
  2004-03-15 10:13     ` Richard Curnow
  2 siblings, 0 replies; 75+ messages in thread
From: Micha Feigin @ 2004-03-14 14:39 UTC (permalink / raw)
  To: lkml

On Sat, Mar 13, 2004 at 11:45:17PM -0300, Horst von Brand wrote:
> Micha Feigin <michf@post.tau.ac.il> said:
> > Is it possible to find out what the kernel's notion of HZ is from user
> > space?
> 
> What for? It should be invisible to userspace...
> 

It's not. Some proc interfaces expect time in jiffies, which means
knowing HZ (bdflush in 2.4, or xfs, for example).

> > It seems to change from system to system and between 2.4 (100 on i386)
> > to 2.6 (1000 on i386).
> 
> And can also be tweaked when compiling, and depends on architecture, and...

* Re: finding out the value of HZ from userspace
  2004-03-13 22:41       ` Arjan van de Ven
  2004-03-14  1:07         ` Micha Feigin
@ 2004-03-14 18:26         ` John Reiser
  1 sibling, 0 replies; 75+ messages in thread
From: John Reiser @ 2004-03-14 18:26 UTC (permalink / raw)
  Cc: lkml

> The internal kernel HZ should *NOT* leak out to userspace.

/proc/interrupts "leaks" the value of HZ.  On x86, for instance:
    ( cat /proc/interrupts; sleep 5; cat /proc/interrupts )  |  grep timer
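The same measurement in C, for illustration: it sums the per-CPU counts on every /proc/interrupts line containing "timer" (the same filter as grep timer), samples twice, and divides by the interval. The function names are hypothetical. On SMP or tickless kernels the result need not equal HZ (per-CPU local timers add up, and idle CPUs may take no ticks), so treat it as an estimate.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sum the per-CPU counters on one /proc/interrupts line, e.g.
 * "  0:    1234    5678   IO-APIC-edge  timer".  Counting stops at the
 * first field that is not a plain number (the controller/device names). */
unsigned long long sum_irq_counts(const char *line)
{
    const char *p = strchr(line, ':');
    unsigned long long total = 0;
    if (p == NULL)
        return 0;
    for (p++;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        char *end;
        unsigned long long v = strtoull(p, &end, 10);
        if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0')
            break;  /* e.g. "2-edge": part of a name, not a counter */
        total += v;
        p = end;
    }
    return total;
}

/* Total over all lines mentioning "timer", like `grep timer`. */
unsigned long long read_timer_irqs(const char *path)
{
    char buf[16384];
    unsigned long long total = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    while (fgets(buf, sizeof buf, f) != NULL)
        if (strstr(buf, "timer") != NULL)
            total += sum_irq_counts(buf);
    fclose(f);
    return total;
}

/* Timer interrupts per second over an interval of `seconds` (> 0). */
double timer_irq_rate(unsigned int seconds)
{
    unsigned long long before = read_timer_irqs("/proc/interrupts");
    sleep(seconds);
    unsigned long long after = read_timer_irqs("/proc/interrupts");
    return (double)(after - before) / seconds;
}
```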

-- 



* Re: finding out the value of HZ from userspace
  2004-03-14  2:45   ` Horst von Brand
  2004-03-14 14:39     ` Micha Feigin
@ 2004-03-15  8:17     ` Jamie Lokier
  2004-03-16 18:16       ` Mark Gross
  2004-03-15 10:13     ` Richard Curnow
  2 siblings, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2004-03-15  8:17 UTC (permalink / raw)
  To: Horst von Brand; +Cc: lkml

Horst von Brand wrote:
> What for? It should be invisible to userspace...

It's not invisible.  select/poll/epoll/setitimer round their time
argument according to HZ, and programs which do smooth (i.e. low
_jitter_) animation of the kind where the eye is sensitive to the
jitter need to track it and correct for it.
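One way a program can observe this rounding is to request a short timeout repeatedly and time the actual sleep with CLOCK_MONOTONIC. This is a sketch (select_overshoot_us is a made-up name); on a tick-based kernel the excess tends to reflect the tick length, while on kernels with high-resolution timers it is usually much smaller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in microseconds. */
static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Ask select() for a req_us-microsecond timeout `iters` times and return
 * the average amount (in microseconds) by which the real sleep exceeded
 * the request. */
double select_overshoot_us(long req_us, int iters)
{
    double total = 0.0;
    for (int i = 0; i < iters; i++) {
        struct timeval tv;
        tv.tv_sec = req_us / 1000000;
        tv.tv_usec = req_us % 1000000;
        double t0 = now_us();
        select(0, NULL, NULL, NULL, &tv);
        total += now_us() - t0 - (double)req_us;
    }
    return iters > 0 ? total / iters : 0.0;
}
```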

-- Jamie

* Re: finding out the value of HZ from userspace
  2004-03-14  2:45   ` Horst von Brand
  2004-03-14 14:39     ` Micha Feigin
  2004-03-15  8:17     ` Jamie Lokier
@ 2004-03-15 10:13     ` Richard Curnow
  2 siblings, 0 replies; 75+ messages in thread
From: Richard Curnow @ 2004-03-15 10:13 UTC (permalink / raw)
  To: Horst von Brand; +Cc: lkml

* Horst von Brand <vonbrand@inf.utfsm.cl> [2004-03-14]:
> Micha Feigin <michf@post.tau.ac.il> said:
> > Is it possible to find out what the kernel's notion of HZ is from user
> > space?
> 
> What for? It should be invisible to userspace...
> 

A related issue that's bugged me for a long time is the lack of userspace
access to the quantity that's called 'freq_scale' in 2.4, where it's
(1<<SHIFT_HZ)/HZ for HZ!=100 and 128/128.125 for HZ==100.  (I haven't
started to reverse-engineer the equivalent value in 2.6, I took a quick
look once and concluded things had got a little more hairy.)

My interest is that I maintain (in my spare time) an NTP application called
chrony (http://chrony.sunsite.dk/), originally written to be good for
dial-up, i.e. NTP servers accessible for a short window once or twice a
day.  This app wants to tune the parameters it passes to adjtimex() to
take a best shot at keeping the system clock correct over the
potentially 'long' offline period.  To do this well, it has to
reverse-compensate for the freq_scale multiplier that the kernel will
apply to the frequency value passed to adjtimex().  Getting the right
value for this across different kernels has always been a fragile
exercise.
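To make the arithmetic concrete, a sketch that computes freq_scale as described above and the value to pass to adjtimex() so that the kernel's multiplication lands on the wanted correction. The SHIFT_HZ table used here (HZ in [96,192) gives 7, [768,1536) gives 10, roughly log2(HZ) rounded to nearest) is an assumption modelled on 2.4's <linux/timex.h>; check it against the headers of the kernel you target. All function names are hypothetical.

```c
/* Assumed SHIFT_HZ table: HZ in [12,24) -> 4, [24,48) -> 5, ...,
 * [768,1536) -> 10.  Returns -1 outside that range. */
int shift_hz(int hz)
{
    int shift = 4;
    int upper = 24;   /* exclusive upper bound of the current band */
    if (hz < 12 || hz >= 1536)
        return -1;
    while (hz >= upper) {
        shift++;
        upper *= 2;
    }
    return shift;
}

/* freq_scale as described: (1 << SHIFT_HZ) / HZ, but 128 / 128.125
 * when HZ == 100.  Returns 0.0 if HZ is outside the assumed table. */
double freq_scale(int hz)
{
    int s = shift_hz(hz);
    if (s < 0)
        return 0.0;
    if (hz == 100)
        return 128.0 / 128.125;
    return (double)(1 << s) / (double)hz;
}

/* Frequency value to pass to adjtimex() so that, after the kernel
 * multiplies it by freq_scale, the effective correction is `wanted`.
 * Rounds to the nearest integer. */
long precompensate_freq(long wanted, int hz)
{
    double fs = freq_scale(hz);
    if (fs == 0.0)
        return wanted;   /* unknown HZ: leave the value alone */
    double v = (double)wanted / fs;
    return (long)(v >= 0.0 ? v + 0.5 : v - 0.5);
}
```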

-- 
Richard \\\ SH-4/SH-5 Core & Debug Architect
Curnow  \\\         SuperH (UK) Ltd, Bristol

* Re: finding out the value of HZ from userspace
  2004-03-13 19:38       ` Arjan van de Ven
  2004-03-13 22:14         ` Micha Feigin
@ 2004-03-16  0:28         ` Peter Williams
  2004-03-16  6:33           ` Arjan van de Ven
  1 sibling, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-03-16  0:28 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: John Reiser, Micha Feigin, lkml

Arjan van de Ven wrote:
> On Sat, Mar 13, 2004 at 11:34:37AM -0800, John Reiser wrote:
> 
>>Arjan van de Ven wrote:
>>
>>>On Thu, 2004-03-11 at 15:17, Micha Feigin wrote:
>>>
>>>
>>>>Is it possible to find out what the kernel's notion of HZ is from user
>>>>space?
>>>>It seem to change from system to system and between 2.4 (100 on i386)
>>>>to 2.6 (1000 on i386).
>>>
>>>
>>>if you can see 1000 from userspace that is a bad kernel bug; can you say
>>>where you find something in units of 1000 ?
>>
>>create_elf_tables() in fs/binfmt_elf.c tells every ELF execve():
>>        NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
>>which can be found by crawling through the stack above the pointer
>>to the last environment variable.
> 
> 
> Ugh that should say 100 on x86....
> but..
> param.h:# define USER_HZ        100             /* .. some user interfaces are in "ticks" */
> param.h:# define CLOCKS_PER_SEC (USER_HZ)       /* like times() */
> .....
> that looks like 100 to me.
> 

This horrible hack of converting all tick values to 100 (from 1000) for 
export to user space because a large number of user space programs 
assume that HZ is 100 would NOT be necessary if there was a mechanism 
whereby user space programs could find out how many ticks there are in a 
second instead of having to make assumptions.

I think that providing such a mechanism should be a priority, and when
it has been available for a reasonable amount of time (so that user space
programs can be converted to use it), USER_HZ should become equal to HZ.

Another alternative would be to stop exporting time as ticks and use 
some standard unit for all systems.  The chosen unit should be small 
enough (e.g. microseconds or maybe even nanoseconds) so that no
information is lost (as it is in the current implementation) on
conversion from ticks to these units.  Of course, 64-bit integers would
be needed.
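The information loss, and the 64-bit alternative, in numbers: with HZ=1000 and USER_HZ=100, 1234 jiffies (1.234 s) are exported as 123 ticks (1.23 s), while a nanosecond count keeps the full value. A sketch with hypothetical helper names; 2^64 nanoseconds is roughly 584 years, so 64 bits is ample.

```c
#include <stdint.h>

/* Exporting in USER_HZ ticks: with HZ=1000 and USER_HZ=100 this divides
 * by 10 and discards the remainder. */
uint64_t jiffies_to_user_ticks(uint64_t j, uint64_t hz, uint64_t user_hz)
{
    return j * user_hz / hz;
}

/* Exporting in nanoseconds: exact whenever HZ divides 10^9, otherwise
 * truncated to the nanosecond.  Split to avoid overflowing j * 10^9. */
uint64_t jiffies_to_ns(uint64_t j, uint64_t hz)
{
    const uint64_t ns_per_s = 1000000000ULL;
    return (j / hz) * ns_per_s + (j % hz) * ns_per_s / hz;
}
```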

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com



* Re: finding out the value of HZ from userspace
  2004-03-16  0:28         ` Peter Williams
@ 2004-03-16  6:33           ` Arjan van de Ven
  2004-03-16 23:38             ` Peter Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Arjan van de Ven @ 2004-03-16  6:33 UTC (permalink / raw)
  To: Peter Williams; +Cc: John Reiser, Micha Feigin, lkml

On Tue, Mar 16, 2004 at 11:28:18AM +1100, Peter Williams wrote:
> >Ugh that should say 100 on x86....
> >but..
> >param.h:# define USER_HZ        100             /* .. some user interfaces 
> >are in "ticks" */
> >param.h:# define CLOCKS_PER_SEC (USER_HZ)       /* like times() */
> >.....
> >that looks like 100 to me.
> >
> 
> This horrible hack of converting all tick values to 100 (from 1000) for 
> export to user space because a large number of user space programs 
> assume that HZ is 100 would NOT be necessary if there was a mechanism 
> whereby user space programs could find out how many ticks there are in a 
> second instead of having to make assumptions.

there is one. Nothing uses it
(sysconf() provides this info)
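For example, a program that reads CPU times from /proc/self/stat (fields 14 and 15, utime and stime, in clock ticks) can scale them with sysconf(_SC_CLK_TCK) instead of assuming 100. A minimal sketch; self_cpu_seconds is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read this process's user and system CPU time from /proc/self/stat
 * (fields 14 and 15, in clock ticks) and convert them to seconds with
 * sysconf(_SC_CLK_TCK).  Returns 0 on success, -1 on failure. */
int self_cpu_seconds(double *user_s, double *sys_s)
{
    char buf[1024];
    unsigned long utime, stime;
    long hz = sysconf(_SC_CLK_TCK);
    if (hz <= 0)
        return -1;
    FILE *f = fopen("/proc/self/stat", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Field 2, the command name, is in parentheses and may contain
     * spaces, so resume parsing after the last ')'. */
    const char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    /* Skip fields 3..13: state ppid pgrp session tty_nr tpgid flags
     * minflt cminflt majflt cmajflt. */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &utime, &stime) != 2)
        return -1;
    *user_s = (double)utime / (double)hz;
    *sys_s = (double)stime / (double)hz;
    return 0;
}
```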


* Re: finding out the value of HZ from userspace
  2004-03-15  8:17     ` Jamie Lokier
@ 2004-03-16 18:16       ` Mark Gross
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Gross @ 2004-03-16 18:16 UTC (permalink / raw)
  To: Jamie Lokier, Horst von Brand; +Cc: lkml

On Monday 15 March 2004 00:17, Jamie Lokier wrote:
> Horst von Brand wrote:
> > What for? It should be invisible to userspace...
>
> It's not invisible.  select/poll/epoll/setitimer round their time
> argument according to HZ, and programs which do smooth (i.e. low
> _jitter_) animation of the kind where the eye is sensitive to the
> jitter need to track it and correct for it.
>

Wouldn't it be better to just use high-res timers and the associated POSIX interfaces for low-jitter applications?
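A sketch of that approach with the POSIX clock_nanosleep() interface: sleeping until absolute deadlines on CLOCK_MONOTONIC keeps a periodic loop from accumulating drift. How precisely the deadlines are met depends on whether the kernel has high-resolution timers; run_periodic is a hypothetical name and the per-frame work is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Advance an absolute deadline by ns nanoseconds (0 <= ns < 1 s). */
static void add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    if (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

/* Run `iterations` periods of `period_ns` (< 1 s) by sleeping until
 * absolute CLOCK_MONOTONIC deadlines, so a late wakeup in one period is
 * not carried into the next.  Returns elapsed nanoseconds, or -1. */
long long run_periodic(long period_ns, int iterations)
{
    struct timespec start, next, end;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    next = start;
    for (int i = 0; i < iterations; i++) {
        add_ns(&next, period_ns);
        int rc;
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;
        /* per-frame work (e.g. drawing) would go here */
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
           + (end.tv_nsec - start.tv_nsec);
}
```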

--mgross


* Re: finding out the value of HZ from userspace
  2004-03-16  6:33           ` Arjan van de Ven
@ 2004-03-16 23:38             ` Peter Williams
  2004-03-20 10:22               ` Arjan van de Ven
  0 siblings, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-03-16 23:38 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Micha Feigin, John Reiser, lkml

Arjan van de Ven wrote:
> On Tue, Mar 16, 2004 at 11:28:18AM +1100, Peter Williams wrote:
> 
>>>Ugh that should say 100 on x86....
>>>but..
>>>param.h:# define USER_HZ        100             /* .. some user interfaces 
>>>are in "ticks" */
>>>param.h:# define CLOCKS_PER_SEC (USER_HZ)       /* like times() */
>>>.....
>>>that looks like 100 to me.
>>>
>>
>>This horrible hack of converting all tick values to 100 (from 1000) for 
>>export to user space because a large number of user space programs 
>>assume that HZ is 100 would NOT be necessary if there was a mechanism 
>>whereby user space programs could find out how many ticks there are in a 
>>second instead of having to make assumptions.
> 
> 
> there is one. Nothing uses it
> (sysconf() provides this info)

Seems to me that it would be fairly trivial to modify those programs 
(that should use this mechanism but don't) to use it?  So why should 
they be allowed to dictate kernel behaviour?
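As a rough illustration of how small that change is, here is a sketch that converts times() tick counts to seconds using sysconf(_SC_CLK_TCK) rather than a hard-coded 100; the helper names are hypothetical.

```c
#include <stdio.h>
#include <sys/times.h>
#include <unistd.h>

/* Convert a clock_t tick count, as returned by times(), to seconds,
 * using the tick rate reported by sysconf() instead of assuming 100. */
double ticks_to_seconds(clock_t ticks)
{
    long hz = sysconf(_SC_CLK_TCK);
    if (hz <= 0)
        return -1.0;
    return (double)ticks / (double)hz;
}

/* Print this process's user and system CPU time in seconds. */
int print_own_cpu_time(void)
{
    struct tms t;
    if (times(&t) == (clock_t)-1) {
        perror("times");
        return -1;
    }
    printf("user %.2fs, system %.2fs\n",
           ticks_to_seconds(t.tms_utime), ticks_to_seconds(t.tms_stime));
    return 0;
}
```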

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Call for HRT in 2.6 kernel was Re: finding out the value of HZ from userspace
       [not found]     ` <20040317023059.GD19564@mail.shareable.org>
@ 2004-03-17 16:48       ` Mark Gross
  2004-03-17 20:07         ` Jamie Lokier
  0 siblings, 1 reply; 75+ messages in thread
From: Mark Gross @ 2004-03-17 16:48 UTC (permalink / raw)
  To: Jamie Lokier, Mark Gross; +Cc: Horst von Brand, lkml

[-- Attachment #1: Type: text/plain, Size: 5351 bytes --]

On Tuesday 16 March 2004 18:30, Jamie Lokier wrote:
> Mark Gross wrote:
> > What can I do to help get high res timers available in the base kernel?
>
> Patches were written long ago, by George Anzinger I think.
>
> They were not accepted.  You might want to look into why not.  I think
> the reason was nothing to do with the quality of code, but rather that
> the clock programming overhead and code bloat wasn't desirable, and
> the gains not worth it.
>
> That is less true now that the kernel is pre-emptive.  Before,
> high-res timers couldn't provide any useful scheduling guarantee: you
> might receive a timer at exactly 1.56ms, but your code might still not
> run for another 150ms anyway.

That's not what we see using the HRT patch on 2.4.20, or with the HRT patch on 2.6.

Yes, preemptive kernels are cool.

>
> Now there is still no guarantee, but the statistical response is much
> better.

It's A LOT better in practice with HRT, and more or less good enough for interactive and multi-media uses.

>
> > I have some internal folks I've been assigned to help out that need
> > a low jitter time base to do some VoIP and need jitter < 250usec on
> > a ~2ms timer for the DSP computations to be do-able in user space.
> >
> > I would love to help get the existing HRT patches updated so as to
> > be acceptable, or failing that perhaps write a simple low jitter
> > time base driver variant of the rtc driver.
>
> If it's appropriate for your application, just use one of the existing
> real-time linuxes (RTAI etc.).  They all offer high-res timers and
> will give much better VoIP guarantees than any of the high-res timer
> patches for non-real-time linux.
>

Nope, we need something that will be in future typical distributions, or this product is not viable.

This isn't an embedded gadget these guys are working on, so shipping an HRT-patched kernel
is a showstopper for the product.

> If, however, you can't do that, consider: on i386 HZ is currently
> 1000.  So your request for 2ms timer is perfectly satisfiable using
> the standard timers.  Remember to turn on CONFIG_PREEMPT, and use a
> SCHED_FIFO scheduling policy.
>

They tested that a long time ago, and I re-tested it recently.  It doesn't work.
The jitter is just way too high doing this without HR timers.

Check out the non-HRT timer code: it always rounds up to the next jiffy.

Running with the HRT patch, we get a lot closer to what is being asked for.  

> If you measure the jitter and find it is unacceptable, be aware that
> none of the high-res timer patches for non-real-time linux will
> improve that.

Not true.  The HRT patch does indeed improve things a lot.  In fact it more or less does the job and it
enables the application.  It's not perfect, but it's good.

The high res timer patch re-programs the PIT to produce an interrupt as close to the timeout as it
can, whereas with only the jiffies clock the process wakes up on the following jiffies tick: on average 1 jiffy late!  That's
a LOT of jitter.  If you look at the code and follow through the logic, if you ask for a 2ms sleep, you are basically
going to get a 3ms sleep.  If you ask for a 1.1ms sleep you get a 2ms sleep (with random larger jitters).
Changing this without doing some of the things in the HRT patch opens up the timer code to waking up
the process too early.  Also a bad thing.
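The rounding described above can be modelled, in simplified form, as a ceiling division of the requested interval by the tick length. This is an illustrative model, not the kernel's timer code; the tick length is a parameter, and the 999,847 ns figure used below is an assumption based on the i386 PIT divisor (the jiffy being slightly shorter than 1ms).

```c
#include <stdint.h>

/* Simplified model (not the kernel's code) of how a requested interval
 * becomes a whole number of timer ticks: the timer can only fire on a
 * tick boundary and must never fire early, so the request is rounded
 * UP to a whole number of ticks. */
uint64_t ticks_for_interval(uint64_t request_ns, uint64_t tick_ns)
{
    return (request_ns + tick_ns - 1) / tick_ns; /* ceiling division */
}

/* The interval actually obtained under this model, in ns. */
uint64_t effective_interval_ns(uint64_t request_ns, uint64_t tick_ns)
{
    return ticks_for_interval(request_ns, tick_ns) * tick_ns;
}
```

With a nominal 1,000,000 ns tick, a 1.1ms request becomes 2 ticks and a 2ms request stays at 2 ticks. With a 999,847 ns tick, 2,000,000 ns is just over 2 ticks, so ticks_for_interval(2000000, 999847) returns 3, which is one way to end up with a 3ms period. Where a one-shot sleep starts within the current tick also matters, and this model ignores it.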

This just isn't good enough for an entire class of applications that could exist on linux if it weren't for this issue.

>
> > Yup, and the human ear is even more sensitive.
> > Busted lip synch on video playback is embarrassing under linux.
>
> Lip sync isn't a problem for buffered audio+video, if the playback
> code is able to adapt to the sound card's slight deviation from the
> nominal sample rate.  The sound card itself provides the regular
> clock.  A small difference between audio and video times is ok, as
> long as it stays consistent.
>
> The difficulty occurs with two-way communication, i.e. VoIP and video,
> when you can't buffer much.
>

Communications is definitely a harder problem, calling for good low-jitter time base services.
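Jamie's point about buffered playback can be sketched as slaving video to the audio clock. This assumes the player can ask the audio API how many frames the card has actually played; frames_played below is that hypothetical input, and the function names are illustrative. Because media time is derived from the card's own consumption of samples, any deviation of the card from its nominal rate shifts video timing along with the audio instead of drifting apart from it.

```c
/* Media time according to the sound card: frames the card has actually
 * played, divided by the stream's nominal sample rate. */
double audio_clock_seconds(long long frames_played, double nominal_rate)
{
    return (double)frames_played / nominal_rate;
}

/* Index of the video frame that should be on screen now, using the
 * audio clock rather than the system clock as the time base. */
long video_frame_due(long long frames_played, double nominal_rate, double fps)
{
    return (long)(audio_clock_seconds(frames_played, nominal_rate) * fps);
}
```

For example, after 96000 frames at a nominal 48000 Hz, video_frame_due(96000, 48000.0, 30.0) is frame 60.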

> > My favorite flash animation web sites have a hard time with this as well.
>
> Btw, I noticed that flash animations seem visually smoother on
> Netscape 4 than Mozilla 1.2, on the same fast box running 2.6.
> Strange -- do they have different flash implementations?
>

I don't know, but my home box doesn't play back flash too well when booted into Mandrake or Fedora, where
you-know-who's OS works just fine.  It's a BP-6 box, not too speedy.  Regardless, my point is that without
a standard low-jitter time base in the OS, all the ISVs will be cobbling together their own hacks to get
low-jitter time bases for their applications.  Some will work well, others will be flaky, and more will have coexistence
issues sharing time base sources like /dev/rtc.

If the OS provides such support they would all tend to do the same thing and programs 
needing this type of time base features would all just work better.

> > Linux needs a low jitter time base standard for desktop multi-media
> > applications of many types.
>
> That's one of the reasons why HZ was changed to 1000 on x86 for 2.6
> kernels, and the major motivation for adding CONFIG_PREEMPT.
>

I know, but the current solution still isn't good enough, on a number of levels.


--mgross

[-- Attachment #2: jitter_test_noHRT.c --]
[-- Type: text/x-csrc, Size: 3240 bytes --]

#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <time.h>
#include <sys/io.h>
#include <sched.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <errno.h>
#include <signal.h>
#include <assert.h>
#include <stdlib.h> /* exit() */

#define NOF_JITTER (500 * 60) /* 1 minute test 2ms*500*60 */

int jitter[NOF_JITTER];
volatile sig_atomic_t ijit; /* written by the signal handler, polled in main() */
struct timeval start;

void alrm_handler(int signo, siginfo_t *info, void *context)
{
  struct timeval end;
  int delta;
  static int bla=0;
  static int first = 1;

  printf(".\n");

  ioperm(0x378,3,1);

  gettimeofday(&end, NULL);

  if(bla) {
     outb(0x1,0x378);
     bla = 0;
   } else { 
     outb(0x0,0x378);
     bla = 1;

  }
	

  delta = (end.tv_sec - start.tv_sec) * 1000000 +
    (end.tv_usec - start.tv_usec);
  delta -= 2000;
  start = end;
  if (first) { first = 0; return; }
  if (ijit >= NOF_JITTER) {
	printf("out of number already... \n");
	return; /* don't write past the end of jitter[] */
  }
  jitter[ijit++] = delta;
}

void print_hist(void)
{
#define HISTSIZE (1000*5 * 10)

  int hist[HISTSIZE + 1];
  int i;

  printf("jitter[0] = %d\n", jitter[0]);
  memset(hist, 0, sizeof(hist));

  for (i = 1; i < NOF_JITTER; i++) {
    if (jitter[i] >= HISTSIZE/2) {
      printf("sample %d over max hist: %d\n", i, jitter[i]);
      hist[HISTSIZE]++;
    }
    else if (jitter[i] <= -HISTSIZE/2) {
      printf("sample %d over min hist: %d\n", i, jitter[i]);
      hist[0]++;
    }
    else {
      hist[jitter[i] + HISTSIZE/2]++;
    }
  }
  for (i = 1; i < HISTSIZE; i++) {
    if (hist[i]) {
      printf("%d: %d\n", i-HISTSIZE/2, hist[i]);
    }
  }
  printf("HC-: %d\n", hist[0]);
  printf("HC+: %d\n", hist[HISTSIZE]);
}

void print_avg(void)
{
  double sum;
  int i;

  sum = 0;
  for (i = 1; i < NOF_JITTER; i++) {
    sum += jitter[i];
  }

  printf("avg. jitter: %f\n", sum/(NOF_JITTER-1));
}

int main(void)
{
	int retval;
	timer_t t = 0;
	struct itimerspec ispec;
	struct itimerspec ospec;
	struct sigaction sa;
	struct sched_param sched;
#if 1 
	retval = mlockall(MCL_CURRENT|MCL_FUTURE);
	if (retval) {
	  perror("mlockall(MCL_CURRENT|MCL_FUTURE) failed");
	}
	assert(retval == 0);

	sched.sched_priority = 2;
	retval = sched_setscheduler(0, SCHED_FIFO, &sched); 
	if (retval) {
	  perror("sched_setscheduler(SCHED_FIFO)");
	}
	assert(retval == 0);
#endif

	sa.sa_sigaction = alrm_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGALRM, &sa, NULL)) {
		perror("sigaction failed");
		exit(1);
	}

	if (sigaction(SIGRTMIN, &sa, NULL)) {
		perror("sigaction failed");
		exit(1);
	}

	retval = timer_create(CLOCK_REALTIME, NULL, &t);
	if (retval) {
		perror("timer_create(CLOCK_REALTIME) failed");
	}
	assert(retval == 0);

	retval = clock_gettime(CLOCK_REALTIME, &ispec.it_value);
	if (retval) {
		perror("clock_gettime(CLOCK_REALTIME) failed");
	}
	ispec.it_value.tv_sec += 1;
	ispec.it_value.tv_nsec = 0;
	ispec.it_interval.tv_sec = 0;
	ispec.it_interval.tv_nsec = 2*1000*1000; /* 2 ms period = 500 Hz */

	retval = timer_settime(t, TIMER_ABSTIME, &ispec, &ospec);
	if (retval) {
		perror("timer_settime(TIMER_ABSTIME) failed");
	}

	do { pause(); } while (ijit < NOF_JITTER);

	retval = timer_delete(t);
	if (retval) {
		perror("timer_delete(existing timer) failed");
	}
	assert(retval == 0);

	print_hist();

	print_avg();

	return 0;
}

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Call for HRT in 2.6 kernel was Re: finding out the value of HZ from userspace
  2004-03-17 16:48       ` Call for HRT in 2.6 kernel was " Mark Gross
@ 2004-03-17 20:07         ` Jamie Lokier
  2004-03-17 21:25           ` Mark Gross
  2004-03-18  1:19           ` Karim Yaghmour
  0 siblings, 2 replies; 75+ messages in thread
From: Jamie Lokier @ 2004-03-17 20:07 UTC (permalink / raw)
  To: Mark Gross; +Cc: Horst von Brand, lkml

Mark Gross wrote:
> > If, however, you can't do that, consider: on i386 HZ is currently
> > 1000.  So your request for 2ms timer is perfectly satisfiable using
> > the standard timers.  Remember to turn on CONFIG_PREEMPT, and use a
> > SCHED_FIFO scheduling policy.
> 
> Check out the non-HRT timer code it rounds up to the next jiffies, always.  

You said you wanted a _2ms_ timer.  Rounded up to the next jiffie,
that's... 2ms!

> Running with the HRT patch, we get a lot closer to what is being asked for.  

Ok.

> > If you measure the jitter and find it is unacceptable, be aware that
> > none of the high-res timer patches for non-real-time linux will
> > improve that.
> 
> Not true.  The HRT patch does indeed improve things a lot.  In fact
> it more or less does the job and it enables the application.  Its
> not perfect, but its good.

Ok.  My point was theoretical and I got it wrong, you're right and you
tried it. :)

> The high res timer patch re-programs the PIT to produce an interrupt
> as close to the timeout as it can, where just the jiffies clock will
> wake up on the following jiffies tick.  On average 1 jiffies late!
> That's a LOT of jitter.  If you look at the code and follow through
> the logic, if you ask for a 2ms sleep, you are basically going to
> get a 3ms sleep.  If you ask for a 1.1ms sleep you get a 2ms (with
> random larger jitters) sleep.

Yes.  The point is that the added delay with the standard timers is
predictable, so it is possible to structure your program around that,
synchronising to the jiffies clock: have your program tick every
1.99ms, or whatever the _actual_ period of two jiffies (a rate of HZ/2)
is on 2.6 x86 kernels.  (I gather the jiffie is slightly shorter than
1ms due to a timer chip limitation.)
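The slightly short jiffy can be checked with a little arithmetic. Assuming the i386 PIT input clock of 1193182 Hz and the round-to-nearest divisor (LATCH) used by 2.6 kernels, HZ=1000 gives a divisor of 1193 and a tick of about 999,847 ns, so two jiffies come to roughly 1.9997 ms rather than exactly 2 ms. The constant and function names below are illustrative.

```c
#define PIT_HZ 1193182L /* assumed i386 PIT input clock (CLOCK_TICK_RATE) */

/* Divisor programmed into the PIT for a requested HZ, rounded to the
 * nearest integer the way 2.6 kernels compute LATCH. */
long pit_latch(long hz)
{
    return (PIT_HZ + hz / 2) / hz;
}

/* Actual tick length in nanoseconds once the divisor has been rounded. */
double actual_tick_ns(long hz)
{
    return (double)pit_latch(hz) * 1e9 / (double)PIT_HZ;
}
```

Because the divisor must be an integer, the real tick can only approximate 1/HZ; at HZ=100 the same rounding makes the tick very slightly longer than 10ms instead.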

I don't see that the unpredictable part of the jitter would be
improved with high res timers: the unpredictable part being due to
disabled preemption, other interrupts etc.

That's what I meant by jitter, sorry for the lack of precision (no pun
intended).

> To change this without doing some of the things in the HRT patch opens up the timer code to waking up 
> the process too early.  Also a bad thing.

That's what I did with my old "Snake" program: determine when select()
will round up, and then wake up early and busy-wait in a loop calling
gettimeofday() until the precise time arrives.

It's not good, although the busy wait is limited to the length of 1
jiffie, or less if you can structure your program to compute
synchronised with the jiffie clock.

> This just isn't good enough for an entire class of applications that
> could exist on linux if it weren't for this issue.

Hmm.

For VoIP, I'm wondering why you need a timebase other than the sound
card.  Won't it provide an interrupt for every new sound fragment?

> > > Linux needs a low jitter time base standard for desktop multi-media
> > > applications of many types.
> >
> > That's one of the reasons why HZ was changed to 1000 on x86 for 2.6
> > kernels, and the major motivation for adding CONFIG_PREEMPT.
> 
> I know, but the current solution still isn't good enough, on a
> number of levels.

To demonstrate that 1000Hz ticks aren't good enough, because you need
much smaller jitter than 1ms on "ordinary machines" i.e. standard
distros, you'll have to demonstrate that you really are seeing much
smaller jitter than 1ms in your HRT-patched kernels and that it makes
a useful difference.

The pre-emptive patches were initially rejected, but Linus changed his
mind after a lot of good experimental data showing significant and
consistent improvements in latency statistics, the fact that the
patches were remarkably non-invasive (because most of the work had
been done to support fine-grained SMP by then), and perhaps most
importantly, and surprisingly, I/O performance improved.

So there is hope with HRT, but it needs more than an implementation to
get into the standard tree (IMHO): it has to be fairly small,
non-invasive, not harm existing performance, and backed by convincing
experimental data showing worthwhile improvements.

On the bright side, HRT makes it possible to eliminate the jiffie tick
entirely, which is quite likely to be good for performance and power
consumption.  The objection to that has been that changing code which
depends on the timer tick to not use it any more would complicate that
code without much gain, and it's just not worth complicating anything
for it.  But maybe, as for kernel pre-emption, it will turn out
simpler than expected.

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Call for HRT in 2.6 kernel was Re: finding out the value of HZ from userspace
  2004-03-17 20:07         ` Jamie Lokier
@ 2004-03-17 21:25           ` Mark Gross
  2004-03-18  1:19           ` Karim Yaghmour
  1 sibling, 0 replies; 75+ messages in thread
From: Mark Gross @ 2004-03-17 21:25 UTC (permalink / raw)
  To: Jamie Lokier, Mark Gross; +Cc: Horst von Brand, lkml

[-- Attachment #1: Type: text/plain, Size: 4213 bytes --]

On Wednesday 17 March 2004 12:07, Jamie Lokier wrote:
> That's what I did with my old "Snake" program: determine when select()
> will round up, and then wake up early and busy-wait in a loop calling
> gettimeofday() until the precise time arrives.
>
> It's not good, although the busy wait is limited to the length of 1
> jiffie, or less if you can structure your program to compute
> synchronised with the jiffie clock.
>

Is this not a very strong argument for some type of HRT support in the kernel?

> > This just isn't good enough for an entire class of applications that
> > could exist on linux if it weren't for this issue.
>
> Hmm.
>
> For VoIP, I'm wondering why you need a timebase other than the sound
> card.  Won't it provide an interrupt for every new sound fragment?
>

I don't know the application of the team I'm trying to help well enough to say if that's workable.

We are exploring different "plan B"s in case we can't get George's HRT patch updated and
into the base.  I would rather put my effort into helping George with the HRT
implementation than into writing another RTC-like driver with a crappy non-POSIX interface.

> > > > Linux needs a low jitter time base standard for desktop multi-media
> > > > applications of many types.
> > >
> > > That's one of the reasons why HZ was changed to 1000 on x86 for 2.6
> > > kernels, and the major motivation for adding CONFIG_PREEMPT.
> >
> > I know, but the current solution still isn't good enough, on a
> > number of levels.
>
> To demonstrate that 1000Hz ticks aren't good enough, because you need
> much smaller jitter than 1ms on "ordinary machines" i.e. standard
> distros, you'll have to demonstrate that you really are seeing much
> smaller jitter than 1ms in your HRT-patched kernels and that it makes
> a useful difference.

I'm trying!  

I think I have demonstrated by code inspection that you cannot get
a 1ms timer.

I've also demonstrated that asking for a 2ms periodic wake up will 
result in a 3ms periodic wake up.

The application I'm trying to enable specifies a jitter bound of 0.25ms on a 2ms
periodic event clock to support doing some audio DSP.  However, I
cannot argue the validity of the 0.25ms requirement.  I think it's a valid requirement.

Attached is the jitter test using HRT.  Running on 2.6.3 + a rebase of the SourceForge
patch gets me a 2ms waveform with jitter under 0.25ms.  On my oscilloscope
it "looks" like about +/- 0.2 ms.

It makes a useful difference today.

I should state that the HRT patch for 2.6 on the SourceForge site is a bit out
of date WRT 2.6 and needs some updating.  It works for the test application, but has
some problems that don't happen with the 2.4 version of the patch.  George
tells me he's rolling in some fixes he has binned up "soon".  Running the same test
using the 2.4.20 + HRT + preemption patch gives less jitter than the 2.6 version.
I'm here to help get that fixed up for 2.6.

The point here is that the rebased version of the current 2.6 HRT patch works well
enough for the application I'm worried about.


>
> The pre-emptive patches were initially rejected, but Linus changed his
> mind after a lot of good experimental data showing significant and
> consistent improvements in latency statistics, the fact that the
> patches were remarkably non-invasive (because most of the work had
> been done to support fine-grained SMP by then), and perhaps most
> importantly, and surprisingly, I/O performance improved.
>
> So there is hope with HRT, but it needs more than an implementation to
> get into the standard tree (IMHO): it has to be fairly small,
> non-invasive, not harm existing performance, and backed by convincing
> experimental data showing worthwhile improvements.
>

Hope is good.

> On the bright side, HRT makes it possible to eliminate the jiffie tick
> entirely, which is quite likely to be good for performance and power
> consumption.  The objection to that has been that changing code which
> depends on the timer tick to not use it any more would complicate that
> code without much gain, and it's just not worth complicating anything
> for it.  But maybe, as for kernel pre-emption, it will turn out
> simpler than expected.

Perhaps.

--mgross

[-- Attachment #2: jitter_test.c --]
[-- Type: text/x-csrc, Size: 3766 bytes --]

#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <time.h>
#include <sys/io.h>
#include <sched.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <errno.h>
#include <signal.h>
#include <assert.h>
#include <stdlib.h> /* exit() */

#define __NR_timer_create	259
#define __NR_timer_settime	(__NR_timer_create+1)
#define __NR_timer_gettime	(__NR_timer_create+2)
#define __NR_timer_getoverrun	(__NR_timer_create+3)
#define __NR_timer_delete	(__NR_timer_create+4)
#define __NR_clock_settime	(__NR_timer_create+5)
#define __NR_clock_gettime	(__NR_timer_create+6)
#define __NR_clock_getres	(__NR_timer_create+7)
#define __NR_clock_nanosleep	(__NR_timer_create+8)
#include "high-res-timers/lib/posix_time.h"
#include "high-res-timers/lib/syscall_timer.c"


#define NOF_JITTER (500*60) /* 1 minute test 2ms * 500 * 60 */

int jitter[NOF_JITTER];
volatile sig_atomic_t ijit; /* written by the signal handler, polled in main() */
struct timeval start;

void alrm_handler(int signo, siginfo_t *info, void *context)
{
  struct timeval end;
  int delta;
  static int bla=0;
  static int first = 1;

  printf(".\n");

  ioperm(0x378,3,1);

  gettimeofday(&end, NULL);

  if(bla) {
     outb(0x1,0x378);
     bla = 0;
   } else { 
     outb(0x0,0x378);
     bla = 1;

  }
	

  delta = (end.tv_sec - start.tv_sec) * 1000000 +
    (end.tv_usec - start.tv_usec);
  delta -= 2000;
  start = end;
  if (first) { first = 0; return; }
  if (ijit >= NOF_JITTER) {
	printf("out of number already... \n");
	return; /* don't write past the end of jitter[] */
  }
  jitter[ijit++] = delta;
}

void print_hist(void)
{
#define HISTSIZE (1000*5 * 10)

  int hist[HISTSIZE + 1];
  int i;

  printf("jitter[0] = %d\n", jitter[0]);
  memset(hist, 0, sizeof(hist));

  for (i = 1; i < NOF_JITTER; i++) {
    if (jitter[i] >= HISTSIZE/2) {
      printf("sample %d over max hist: %d\n", i, jitter[i]);
      hist[HISTSIZE]++;
    }
    else if (jitter[i] <= -HISTSIZE/2) {
      printf("sample %d over min hist: %d\n", i, jitter[i]);
      hist[0]++;
    }
    else {
      hist[jitter[i] + HISTSIZE/2]++;
    }
  }
  for (i = 1; i < HISTSIZE; i++) {
    if (hist[i]) {
      printf("%d: %d\n", i-HISTSIZE/2, hist[i]);
    }
  }
  printf("HC-: %d\n", hist[0]);
  printf("HC+: %d\n", hist[HISTSIZE]);
}

void print_avg(void)
{
  double sum;
  int i;

  sum = 0;
  for (i = 1; i < NOF_JITTER; i++) {
    sum += jitter[i];
  }

  printf("avg. jitter: %f\n", sum/(NOF_JITTER-1));
}

int main(void)
{
	int retval;
	timer_t t = 0;
	struct itimerspec ispec;
	struct itimerspec ospec;
	struct sigaction sa;
	struct sched_param sched;
#if 1 
	retval = mlockall(MCL_CURRENT|MCL_FUTURE);
	if (retval) {
	  perror("mlockall(MCL_CURRENT|MCL_FUTURE) failed");
	}
	assert(retval == 0);

	sched.sched_priority = 2;
	retval = sched_setscheduler(0, SCHED_FIFO, &sched); 
	if (retval) {
	  perror("sched_setscheduler(SCHED_FIFO)");
	}
	assert(retval == 0);
#endif

	sa.sa_sigaction = alrm_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGALRM, &sa, NULL)) {
		perror("sigaction failed");
		exit(1);
	}

	if (sigaction(SIGRTMIN, &sa, NULL)) {
		perror("sigaction failed");
		exit(1);
	}

	retval = timer_create(CLOCK_REALTIME_HR, NULL, &t);
	if (retval) {
		perror("timer_create(CLOCK_REALTIME_HR) failed");
	}
	assert(retval == 0);

	retval = clock_gettime(CLOCK_REALTIME_HR, &ispec.it_value);
	if (retval) {
		perror("clock_gettime(CLOCK_REALTIME_HR) failed");
	}
	ispec.it_value.tv_sec += 1;
	ispec.it_value.tv_nsec = 0;
	ispec.it_interval.tv_sec = 0;
	ispec.it_interval.tv_nsec = 2*1000*1000; /* 2 ms period = 500 Hz */

	retval = timer_settime(t, TIMER_ABSTIME, &ispec, &ospec);
	if (retval) {
		perror("timer_settime(TIMER_ABSTIME) failed");
	}

	do { pause(); } while (ijit < NOF_JITTER);

	retval = timer_delete(t);
	if (retval) {
		perror("timer_delete(existing timer) failed");
	}
	assert(retval == 0);

	print_hist();

	print_avg();

	return 0;
}

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Call for HRT in 2.6 kernel was Re: finding out the value of HZ from userspace
  2004-03-17 20:07         ` Jamie Lokier
  2004-03-17 21:25           ` Mark Gross
@ 2004-03-18  1:19           ` Karim Yaghmour
  2004-03-18 11:56             ` Jamie Lokier
  1 sibling, 1 reply; 75+ messages in thread
From: Karim Yaghmour @ 2004-03-18  1:19 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Mark Gross, Horst von Brand, lkml, Philippe Gerum


Jamie Lokier wrote:
> So there is hope with HRT, but it needs more than an implementation to
> get into the standard tree (IMHO): it has to be fairly small,
> non-invasive, not harm existing performance, and backed by convincing
> experimental data showing worthwhile improvements.

Looking at the high-res timer stuff, it's somewhat similar to what
RTAI does (reprogram timer chip, use nanoseconds for computations,
provide API for ease of use, etc.) except that it can't guarantee
hard-rt, and it's integrated directly with Linux's scheduling
whereas RTAI has its own scheduler and lives as a module.

It's not my intention to debate which is better; I'll leave that
for another day. What I'm interested in, however, is having a common
functionality that all such facilities can use to achieve and
deliver hard-rt.

I'm thinking here of Adeos. It's the smallest subset of services
required for obtaining hard-rt in the kernel, and it's fairly
non-invasive (not to mention that configuring it out results in no
changes to the kernel.) So while Adeos doesn't provide abstract
services such as "tasks" or "timers", it does provide the basic
mechanism for all add-ons that want to provide these to obtain
the hard-rt from Adeos using an architecture-independent API.

Here are a few examples of software that can obtain hard-rt with
Adeos:
- RT Executives: Currently RTAI uses Adeos on x86 and a port of
RTLinux/GPL to Adeos is planned. Adeos, however, isn't limited to
either of these executives, and could easily be used by any other
RT executive to sit side-by-side with Linux and other kernels.
- hard-rt drivers: It's fairly trivial for a driver to hook into the
Adeos pipeline and obtain hard-rt without using either RTAI or
RTLinux. In that case, there's just the standard Linux kernel with
a hard-rt driver side-by-side atop Adeos.
- HRT: Any mechanism that modifies the PIT can then export timer
services to drivers for providing them with deterministic timer
response times. Granted there would be no integration with the
current Linux scheduler, but the added advantage is that HRT can
live as a loadable module. Scheduling/notifying of user-space
processes using such HRT is possible; RTAI's hard-rt scheduling
of normal Linux processes being an example.

Actually, most software that needs hard-rt can live as loadable
modules once Adeos is integrated in the kernel.

The Adeos patches are available here for those who are interested:
http://download.gna.org/adeos/patches/

P.S.: Before anyone comes back shouting after looking at the #ifdefs
used to modify _existing_ kernel files in the current Adeos patches,
please keep in mind that they are mainly there because they make it
fairly trivial to create patches for new kernels. When cleaning up
the patches for inclusion, most of these would disappear.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Call for HRT in 2.6 kernel was Re: finding out the value of HZ from userspace
  2004-03-18  1:19           ` Karim Yaghmour
@ 2004-03-18 11:56             ` Jamie Lokier
  2004-03-18 15:23               ` Karim Yaghmour
  0 siblings, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2004-03-18 11:56 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: Mark Gross, Horst von Brand, lkml, Philippe Gerum

I see we have gone from a desire for soft-rt high-res timers to
pushing hard-rt :)

Karim Yaghmour wrote:
> I'm thinking here of Adeos. It's the smallest subset of services
> required for obtaining hard-rt in the kernel,

In this case, it's not clear that hard-rt is desirable.  VoIP doesn't
like occasional glitches, but it can tolerate them and must do so when
a machine is overloaded, e.g. trying to handle too many scheduling
objectives at once.  I don't know much about the original poster's
problem, that's just my take on VoIP.

> and it's fairly non-invasive (not to mention that configuring it out
> results in no changes to the kernel.) So while Adeos doesn't provide
> abstract services such as "tasks" or "timers", it does provide the
> basic mechanism for all add-ons that want to provide these to obtain
> the hard-rt from Adeos using an architecture-independent API.

There is also Bernhard Kuhn's recent "real-time interrupts" patch for
2.6 which could be utilised:

http://home.t-online.de/home/Bernhard_Kuhn/rtirq/20040304/rtirq.html
http://home.t-online.de/home/Bernhard_Kuhn/rtirq/20040304/rtirq-2.6.2-20040304.tar.bz2

> Actually, most software that needs hard-rt can live as loadable
> modules once Adeos is integrated in the kernel.

A couple of questions.

Can Adeos-registered timer callbacks call the same functions as normal
timer callbacks, schedule userspace, and kick network I/O with near-RT
guarantees?  Or do they run in a non-kernel context?

(Mark can say whether a normal context, i.e. with system calls, memory
allocation and network I/O, is required for Intel's VoIP application.)

Can Adeos itself be loaded as a module which overrides normal non-RT
kernel interrupt and timer functions?  If it can be kept out of the
standard kernel, but loaded when needed, that would be nice.

One more thing would help, IMHO, in getting any fancy interrupt system
in: if it balanced the different execution contexts, i.e. limit total
CPU taken in high priority, low priority interrupts, task queues
etc. in an efficient yet fair way, such that overall throughput was
improved over standard kernels in cases such as network overload.
NAPI does this at the network card level, but there is no reason why
balancing CPU among contexts cannot be done at the generic interrupt
scheduling level, making it work for all I/O devices without special
driver support.
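A toy model of the balancing idea, in the spirit of NAPI's per-poll budget but applied per execution context. struct context and its budget values are hypothetical and this is not kernel code; it only shows how a per-round cap keeps one flooded context from starving the rest.

```c
#include <stddef.h>

/* Hypothetical deferred-work context (e.g. one source of pending events). */
struct context {
    int pending; /* work items waiting */
    int budget;  /* max items this context may process per round */
    int done;    /* items processed so far (for statistics) */
};

/* One balancing round: each context processes at most its budget, so a
 * flooded context cannot monopolise the CPU. Returns items processed. */
int run_round(struct context *ctx, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        int work = ctx[i].pending < ctx[i].budget ? ctx[i].pending : ctx[i].budget;
        ctx[i].pending -= work;
        ctx[i].done += work;
        total += work;
    }
    return total;
}
```

With one context holding 1000 pending items and another holding 3, each at a budget of 64, one call to run_round() processes 67 items and leaves 936 pending in the busy context.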

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Call for HRT in 2.6 kernel was Re: finding out the value of HZ from userspace
  2004-03-18 11:56             ` Jamie Lokier
@ 2004-03-18 15:23               ` Karim Yaghmour
  2004-03-21  1:55                 ` Erik Andersen
  0 siblings, 1 reply; 75+ messages in thread
From: Karim Yaghmour @ 2004-03-18 15:23 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Mark Gross, Horst von Brand, lkml, Philippe Gerum


Jamie Lokier wrote:
> I see we have gone from a desire for soft-rt high-res timers to
> pushing hard-rt :)

:D

> In this case, it's not clear that hard-rt is desirable.  VoIP doesn't
> like occasional glitches, but it can tolerate them and must do so when
> a machine is overloaded, e.g. trying to handle too many scheduling
> objectives at once.  I don't know much about the original poster's
> problem, that's just my take on VoIP.

I agree that VoIP doesn't "require" hard-rt, I have no issue with that.
I was expanding on the specific discussion of the HRT mechanism, not
the intended usage itself. Let me know if you'd rather see this discussed
separately.

> There is also Bernhard Kuhn's recent "real-time interrupts" patch for
> 2.6 which could be utilised:

The first implementation of such a mechanism for Linux was done by David
Schleef a few years back and it was called the "no-warm air" patch (long
story.) Basically, though, I think this is the wrong approach because the
resulting behavior is very much CPU-dependent (interrupt priority
mechanisms being CPU-specific.) With Adeos, you get the same API and
the same behavior regardless of underlying architecture. Adeos' pipeline
actually provides the same net effect as playing with the interrupt controllers,
except that it's totally hardware independent. Not to mention that a
nanokernel's behavior can be modified/extended while a CPU's behavior is
pretty much ... hmmm, well, fixed in silicon ...

> A couple of questions.

Sure.

> Can Adeos-registered timer callbacks call the same functions as normal
> timer callbacks, schedule userspace, and kick network I/O with near-RT
> guarantees?  Or do they run in a non-kernel context?

The Adeos-registered callbacks cannot _directly_ call kernel functions.
They can, however, trigger virtual IRQs that, once propagated to Linux,
can then take care of such calls. Here are the relevant Adeos functions
(see http://home.gna.org/adeos/doc/api/interface_8h.html for the full
API and its usage):

unsigned adeos_alloc_irq (void)
> Allocates a system-wide pipelined virtual interrupt. Virtual interrupts
> are pseudo-interrupt channels which are handled in exactly the same way
> as their hardware-generated counterparts. This is a basic, one-way
> only, inter-domain communication system allowing a domain to trigger the
> execution of a handler inside cooperating domains using the interrupt
> semantics (see adeos_trigger_irq()). A domain can trap the virtual
> interrupt using the adeos_virtualize_irq() service. Any domain can use
> adeos_virtualize_irq() on a virtual interrupt even if such interrupt was
> allocated by another domain.

int adeos_trigger_irq (unsigned irq)
> Simulates the occurrence of an interrupt. Adeos acts as if the specified
> interrupt had been received from the underlying hardware, and starts
> propagating it down the pipeline. The calling domain might be immediately
> preempted as a result of this call if the interrupt is delivered to a
> higher-priority domain.

Using these basic functions and the relevant callbacks it's fairly simple
to have the hard-rt component do the critical stuff and immediately signal
a Linux-aware handler to "call the same functions as normal timer callbacks,
schedule userspace, and kick network I/O with near-RT guarantees." It
should also be possible to build higher-level facilities, such as HRT,
on top of these primitives.

Side note: virtual IRQs are something CPU interrupt-controller silicone
is incapable of.

> Can Adeos itself be loaded as a module which overrides normal non-RT
> kernel interrupt and timer functions?  If it can be kept out of the
> standard kernel, but loaded when needed, that would be nice.

Sure, Adeos can be compiled as a module and loaded at runtime, but it does
require a few basic things to be modified within the kernel in order to
allow runtime loading.

> One more thing would help, IMHO, in getting any fancy interrupt system
> in: if it balanced the different execution contexts, i.e. limit total
> CPU taken in high priority, low priority interrupts, task queues
> etc. in an efficient yet fair way, such that overall throughput was
> improved over standard kernels in cases such as network overload.
> NAPI does this at the network card level, but there is no reason why
> balancing CPU among contexts cannot be done at the generic interrupt
> scheduling level, making it work for all I/O devices without special
> driver support.

Hmm... IOW, Adeos wouldn't implement a pipeline, but an interrupt
scheduler? I guess that's possible, but I can see things getting very
confusing very fast for driver developers because of the hard-rt issues,
not to mention code complexity in the interrupt scheduler for avoiding
starvation, missed deadlines, overruns, etc. Given that Adeos itself is
a nanokernel it really would make little sense for it to act as the
scheduler of any interrupts within any of its clients. It could certainly
have an OS scheduler, but that's different from what you're asking for.
There is no reason, however, not to have an additional Linux-specific
mechanism for prioritizing incoming Linux interrupts. For example, if I
understand it correctly TimeSys has implemented some form of interrupt
threading in their kernel. Maybe that can be used at the Linux level in
addition to having a basic bare-bones hard-RT mechanism for delivering
interrupts. While I can't expand on this, since I haven't seen their
code, I think something similar may be better suited to preserving
existing driver behavior.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-16 23:38             ` Peter Williams
@ 2004-03-20 10:22               ` Arjan van de Ven
  2004-03-20 11:28                 ` Stefan Smietanowski
  2004-03-20 23:26                 ` Peter Williams
  0 siblings, 2 replies; 75+ messages in thread
From: Arjan van de Ven @ 2004-03-20 10:22 UTC (permalink / raw)
  To: Peter Williams; +Cc: Micha Feigin, John Reiser, lkml

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]


On Wed, Mar 17, 2004 at 10:38:03AM +1100, Peter Williams wrote:
> >there is one. Nothing uses it
> >(sysconf() provides this info)
> 
> Seems to me that it would be fairly trivial to modify those programs 
> (that should use this mechanism but don't) to use it?  So why should 
> they be allowed to dictate kernel behaviour?

quality of implementation; for example shell scripts that want to do
echo 500 > /proc/sys/foo/bar/something_in_HZ
...
or /etc/sysctl.conf or ...


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-20 10:22               ` Arjan van de Ven
@ 2004-03-20 11:28                 ` Stefan Smietanowski
  2004-03-20 11:41                   ` Arjan van de Ven
                                     ` (2 more replies)
  2004-03-20 23:26                 ` Peter Williams
  1 sibling, 3 replies; 75+ messages in thread
From: Stefan Smietanowski @ 2004-03-20 11:28 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Peter Williams, Micha Feigin, John Reiser, lkml

>>>there is one. Nothing uses it
>>>(sysconf() provides this info)
>>
>>Seems to me that it would be fairly trivial to modify those programs 
>>(that should use this mechanism but don't) to use it?  So why should 
>>they be allowed to dictate kernel behaviour?
> 
> 
> quality of implementation; for example shell scripts that want to do
> echo 500 > /proc/sys/foo/bar/something_in_HZ
> ...
> or /etc/sysctl.conf or ...
> 

Then write a simple program already. How hard is it to write a program
that does a sysconf() and returns (as ascii of course) just the
value of HZ? Then do some trivial calculation off of that.

HZ=$(gethz)

If your 500 was 5 seconds, do

TIME=$((HZ*5))
echo $TIME > /proc/sys/foo/bar/something_in_HZ

I mean, come on.

Then you include it in the default distro of choice so that
everybody can use it and there you are.

If someone doesn't have "gethz" then they can download it.

// Stefan

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-20 11:28                 ` Stefan Smietanowski
@ 2004-03-20 11:41                   ` Arjan van de Ven
  2004-03-20 23:58                     ` Peter Williams
  2004-03-21  8:00                   ` Kai Henningsen
  2004-03-22 22:34                   ` Micha Feigin
  2 siblings, 1 reply; 75+ messages in thread
From: Arjan van de Ven @ 2004-03-20 11:41 UTC (permalink / raw)
  To: Stefan Smietanowski; +Cc: Peter Williams, Micha Feigin, John Reiser, lkml

[-- Attachment #1: Type: text/plain, Size: 267 bytes --]

On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
> 
> Then you include it in the default distro of choice so that
> everybody can use it and there you are.

but what is the POINT of all this changing/breaking ?
Can someone at least tell me that ?

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-20 10:22               ` Arjan van de Ven
  2004-03-20 11:28                 ` Stefan Smietanowski
@ 2004-03-20 23:26                 ` Peter Williams
  1 sibling, 0 replies; 75+ messages in thread
From: Peter Williams @ 2004-03-20 23:26 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Micha Feigin, John Reiser, lkml

Arjan van de Ven wrote:
> On Wed, Mar 17, 2004 at 10:38:03AM +1100, Peter Williams wrote:
> 
>>>there is one. Nothing uses it
>>>(sysconf() provides this info)
>>
>>Seems to me that it would be fairly trivial to modify those programs 
>>(that should use this mechanism but don't) to use it?  So why should 
>>they be allowed to dictate kernel behaviour?
> 
> 
> quality of implementation; for example shell scripts that want to do
> echo 500 > /proc/sys/foo/bar/something_in_HZ
> ...
> or /etc/sysctl.conf or ...
> 

A small utility program secs_to_ticks would solve this problem e.g.:

secs_to_ticks 0.5 > /proc/sys/foo/bar/something_in_HZ

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-20 11:41                   ` Arjan van de Ven
@ 2004-03-20 23:58                     ` Peter Williams
  2004-03-21  1:09                       ` Tim Schmielau
  0 siblings, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-03-20 23:58 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Stefan Smietanowski, Micha Feigin, John Reiser, lkml

Arjan van de Ven wrote:
> On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
> 
>>Then you include it in the default distro of choice so that
>>everybody can use it and there you are.
> 
> 
> but what is the POINT of all this changing/breaking ?
> Can someone at least tell me that ?

In the 2.6 kernels internal timing and task statistics (for i386 
systems) are now kept in milliseconds where they were previously in 
1/100ths of a second.  By converting these statistics to 1/100ths of a 
second for export to user space an order of magnitude (i.e. a factor of 
10) loss of precision occurs.

Peter
PS I'd like to point out that 2.6 kernels contain changes with more
serious consequences than this, which have to be coped with when using
2.6 kernels on distributions such as Red Hat 9 that were built around
older kernels.
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-20 23:58                     ` Peter Williams
@ 2004-03-21  1:09                       ` Tim Schmielau
  2004-03-21  1:30                         ` Peter Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Tim Schmielau @ 2004-03-21  1:09 UTC (permalink / raw)
  To: Peter Williams
  Cc: Arjan van de Ven, Stefan Smietanowski, Micha Feigin, John Reiser, lkml

On Sun, 21 Mar 2004, Peter Williams wrote:

> In the 2.6 kernels internal timing and task statistics (for i386 
> systems) are now kept in milliseconds where they were previously in 
> 1/100ths of a second.  By converting these statistics to 1/100ths of a 
> second for export to user space an order of magnitude (i.e. a factor of 
> 10) loss of precision occurs.

No. The statistics are not a result of full bookkeeping, but simply
gained by periodically sampling the processor state. So they don't
have a precision of 1/1000th of a second anyways.


Tim

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-21  1:09                       ` Tim Schmielau
@ 2004-03-21  1:30                         ` Peter Williams
  0 siblings, 0 replies; 75+ messages in thread
From: Peter Williams @ 2004-03-21  1:30 UTC (permalink / raw)
  To: Tim Schmielau
  Cc: Arjan van de Ven, Stefan Smietanowski, Micha Feigin, John Reiser, lkml

Tim Schmielau wrote:
> On Sun, 21 Mar 2004, Peter Williams wrote:
> 
> 
>>In the 2.6 kernels internal timing and task statistics (for i386 
>>systems) are now kept in milliseconds where they were previously in 
>>1/100ths of a second.  By converting these statistics to 1/100ths of a 
>>second for export to user space an order of magnitude (i.e. a factor of 
>>10) loss of precision occurs.
> 
> 
> No. The statistics are not a result of full bookkeeping, but simply
> gained by periodically sampling the processor state. So they don't
> have a precision of 1/1000th of a second anyways.

1/1000th of a second IS the internal timing precision.  The issue of how 
tasks' CPU usage is allocated for reporting is a different matter, but
from a statistical viewpoint this will just affect the variance (or
standard deviation) of the estimates and NOT their precision. As the
number of samples increases, the variance (or standard deviation)
decreases rapidly, so to all intents and purposes the statistics are
accurate to the nearest 1/1000th of a second.
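A simple model makes the sampling argument concrete. Assume, as an idealization, that at each of n ticks of period T = 1/HZ the sampler independently finds the task running with probability p (real workloads can be correlated with the tick, which this ignores). The reported CPU time is then a scaled binomial count:

```latex
\begin{aligned}
\hat{C} &= T \sum_{i=1}^{n} X_i, \qquad X_i \sim \mathrm{Bernoulli}(p) \\
\mathbb{E}[\hat{C}] &= nTp \\
\operatorname{sd}(\hat{C}) &= T\sqrt{np(1-p)} \\
\frac{\operatorname{sd}(\hat{C})}{\mathbb{E}[\hat{C}]} &= \sqrt{\frac{1-p}{np}}
\end{aligned}
```

Under this model the relative error shrinks like 1/sqrt(n) as samples accumulate, while the absolute error of the total grows like T*sqrt(n); which of the two matters depends on how the statistic is used.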

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Call for HRT in 2.6 kernel was Re: finding out the value of HZ from userspace
  2004-03-18 15:23               ` Karim Yaghmour
@ 2004-03-21  1:55                 ` Erik Andersen
  2004-03-23 22:35                   ` Karim Yaghmour
  0 siblings, 1 reply; 75+ messages in thread
From: Erik Andersen @ 2004-03-21  1:55 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: lkml

On Thu Mar 18, 2004 at 10:23:55AM -0500, Karim Yaghmour wrote:
> except that it's totally hardware independent. Not to mention that a
> nanokernel's behavior can be modified/extended while a CPU's behavior is
> pretty much ... hmmm, well, fixed in silicone ...

Silicone?  You expect CPU behavior to jiggle around
a lot I suppose.  ;-)

 -Erik

--
Erik B. Andersen             http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-20 11:28                 ` Stefan Smietanowski
  2004-03-20 11:41                   ` Arjan van de Ven
@ 2004-03-21  8:00                   ` Kai Henningsen
  2004-03-21 10:32                     ` Stefan Smietanowski
  2004-03-22 22:34                   ` Micha Feigin
  2 siblings, 1 reply; 75+ messages in thread
From: Kai Henningsen @ 2004-03-21  8:00 UTC (permalink / raw)
  To: linux-kernel

stesmi@stesmi.com (Stefan Smietanowski)  wrote on 20.03.04 in <405C2AC0.70605@stesmi.com>:

> >>>there is one. Nothing uses it
> >>>(sysconf() provides this info)
> >>
> >>Seems to me that it would be fairly trivial to modify those programs
> >>(that should use this mechanism but don't) to use it?  So why should
> >>they be allowed to dictate kernel behaviour?
> >
> >
> > quality of implementation; for example shell scripts that want to do
> > echo 500 > /proc/sys/foo/bar/something_in_HZ
> > ...
> > or /etc/sysctl.conf or ...
> >
>
> Then write a simple program already. How hard is it to write a program
> that does a sysconf() and returns (as ascii of course) just the
> value of HZ? Then do some trivial calculation off of that.

How about a slightly more useful utility, like this:

$ getconf CLK_TCK
100
$ getconf OPEN_MAX
1024
$ getconf PATH_MAX /proc/
4096
$

MfG Kai

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-21  8:00                   ` Kai Henningsen
@ 2004-03-21 10:32                     ` Stefan Smietanowski
  0 siblings, 0 replies; 75+ messages in thread
From: Stefan Smietanowski @ 2004-03-21 10:32 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: linux-kernel

Hi.

>>>>>there is one. Nothing uses it
>>>>>(sysconf() provides this info)
>>>>
>>>>Seems to me that it would be fairly trivial to modify those programs
>>>>(that should use this mechanism but don't) to use it?  So why should
>>>>they be allowed to dictate kernel behaviour?
>>>
>>>
>>>quality of implementation; for example shell scripts that want to do
>>>echo 500 > /proc/sys/foo/bar/something_in_HZ
>>>...
>>>or /etc/sysctl.conf or ...
>>>
>>
>>Then write a simple program already. How hard is it to write a program
>>that does a sysconf() and returns (as ascii of course) just the
>>value of HZ? Then do some trivial calculation off of that.
> 
> 
> How about a slightly more useful utility, like this:
> 
> $ getconf CLK_TCK
> 100
> $ getconf OPEN_MAX
> 1024
> $ getconf PATH_MAX /proc/
> 4096
> $

Yes, yes, yes, I like that one actually.

It does solve the shell script issues and we've never said that things
don't need to adapt to changes before so I don't see why not now.

And that one would be good to have regardless of the HZ issue.

// Stefan

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-20 11:28                 ` Stefan Smietanowski
  2004-03-20 11:41                   ` Arjan van de Ven
  2004-03-21  8:00                   ` Kai Henningsen
@ 2004-03-22 22:34                   ` Micha Feigin
  2004-03-22 23:04                     ` Peter Williams
  2 siblings, 1 reply; 75+ messages in thread
From: Micha Feigin @ 2004-03-22 22:34 UTC (permalink / raw)
  To: lkml

On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
> >>>there is one. Nothing uses it
> >>>(sysconf() provides this info)
> >>
> >>Seems to me that it would be fairly trivial to modify those programs 
> >>(that should use this mechanism but don't) to use it?  So why should 
> >>they be allowed to dictate kernel behaviour?
> >
> >
> >quality of implementation; for example shell scripts that want to do
> >echo 500 > /proc/sys/foo/bar/something_in_HZ
> >...
> >or /etc/sysctl.conf or ...
> >
> 
> Then write a simple program already. How hard is it to write a program
> that does a sysconf() and returns (as ascii of course) just the
> value of HZ? Then do some trivial calculation off of that.
> 
> HZ=$(gethz)
> 
> If your 500 was 5 seconds, do
> 
> TIME=$((HZ*5))
> echo $TIME > /proc/sys/foo/bar/something_in_HZ
> 

Will this be USER_HZ or kernel HZ?
Someone earlier suggested it would be USER_HZ which would make it
pointless.

> I mean, come on.
> 
> Then you include it in the default distro of choice so that
> everybody can use it and there you are.
> 
> If someone doesn't have "gethz" then they can download it.
> 
> // Stefan
> 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-22 22:34                   ` Micha Feigin
@ 2004-03-22 23:04                     ` Peter Williams
  2004-03-25 17:40                       ` Jamie Lokier
  2004-03-27 21:11                       ` Micha Feigin
  0 siblings, 2 replies; 75+ messages in thread
From: Peter Williams @ 2004-03-22 23:04 UTC (permalink / raw)
  To: Micha Feigin; +Cc: lkml

Micha Feigin wrote:
> On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
> 
>>>>>there is one. Nothing uses it
>>>>>(sysconf() provides this info)
>>>>
>>>>Seems to me that it would be fairly trivial to modify those programs 
>>>>(that should use this mechanism but don't) to use it?  So why should 
>>>>they be allowed to dictate kernel behaviour?
>>>
>>>
>>>quality of implementation; for example shell scripts that want to do
>>>echo 500 > /proc/sys/foo/bar/something_in_HZ
>>>...
>>>or /etc/sysctl.conf or ...
>>>
>>
>>Then write a simple program already. How hard is it to write a program
>>that does a sysconf() and returns (as ascii of course) just the
>>value of HZ? Then do some trivial calculation off of that.
>>
>>HZ=$(gethz)
>>
>>If your 500 was 5 seconds, do
>>
>>TIME=$((HZ*5))
>>echo $TIME > /proc/sys/foo/bar/something_in_HZ
>>
> 
> 
> Will this be USER_HZ or kernel HZ?
> Someone earlier suggested it would be USER_HZ which would make it
> pointless.

It has to be whatever enables user space to correctly interpret values 
sent to user space as "ticks".  That means USER_HZ and it's not useless 
as it enables USER_HZ to be different and/or change without breaking 
programs that use values expressed in "ticks".

> 
> 
>>I mean, come on.
>>
>>Then you include it in the default distro of choice so that
>>everybody can use it and there you are.
>>
>>If someone doesn't have "gethz" then they can download it.
>>
>>// Stefan
>>


-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Call for HRT in 2.6 kernel was Re: finding out the value of HZ from userspace
  2004-03-21  1:55                 ` Erik Andersen
@ 2004-03-23 22:35                   ` Karim Yaghmour
  0 siblings, 0 replies; 75+ messages in thread
From: Karim Yaghmour @ 2004-03-23 22:35 UTC (permalink / raw)
  To: andersen; +Cc: lkml


Erik Andersen wrote:
> Silicone?  You expect CPU behavior to jiggle around
> a lot I suppose.  ;-)

... I'm sure that it would sometimes help explain odd system behavior :)

Speaking of sexy, though, you gotta love these virtual interrupts.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-22 23:04                     ` Peter Williams
@ 2004-03-25 17:40                       ` Jamie Lokier
  2004-03-25 23:22                         ` Peter Williams
  2004-03-27 21:11                       ` Micha Feigin
  1 sibling, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2004-03-25 17:40 UTC (permalink / raw)
  To: Peter Williams; +Cc: Micha Feigin, lkml

Peter Williams wrote:
> >Will this be USER_HZ or kernel HZ?
> >Someone earlier suggested it would be USER_HZ which would make it
> >pointless.
> 
> It has to be whatever enables user space to correctly interpret values 
> sent to user space as "ticks".  That means USER_HZ and it's not useless 
> as it enables USER_HZ to be different and/or change without breaking 
> programs that use values expressed in "ticks".

It is, however, useless for the _other_ reasons userspace needs to
know kernel HZ, including as I mentioned userspace timer granularity.

(Btw, that usage would be better as a period rather than a frequency,
so that a "tickless" kernel can report zero).

The fundamental problem is that there are two values, and both values
have programs which can usefully use them.

How hard can it be to export both?

-- Jamie







^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-25 17:40                       ` Jamie Lokier
@ 2004-03-25 23:22                         ` Peter Williams
  2004-03-27 13:31                           ` Jamie Lokier
  0 siblings, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-03-25 23:22 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Micha Feigin, lkml

Jamie Lokier wrote:
> Peter Williams wrote:
> 
>>>Will this be USER_HZ or kernel HZ?
>>>Someone earlier suggested it would be USER_HZ which would make it
>>>pointless.
>>
>>It has to be whatever enables user space to correctly interpret values 
>>sent to user space as "ticks".  That means USER_HZ and it's not useless 
>>as it enables USER_HZ to be different and/or change without breaking 
>>programs that use values expressed in "ticks".
> 
> 
> It is, however, useless for the _other_ reasons userspace needs to
> know kernel HZ, including as I mentioned userspace timer granularity.

Theoretically, which I know can be a pain, user space timer granularity 
should be in USER_HZ as, theoretically, this is the only one user space 
is supposed to know about.  Because of this, in my view, HZ and USER_HZ 
should be the same or USER_HZ should be greater than HZ.

> 
> (Btw, that usage would be better as a period rather than a frequency,
> so that a "tickless" kernel can report zero).

_SC_CLK_TCK is a POSIX.1 definition and can't be changed.  But I don't 
think that there's any impediment to adding new parameters that can be 
reported by sysconf().

> 
> The fundamental problem is that there are two values, and both values
> have programs which can usefully use them.
> 
> How hard can it be to export both?
> 

Making HZ == USER_HZ would also solve the problem.

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-25 23:22                         ` Peter Williams
@ 2004-03-27 13:31                           ` Jamie Lokier
  2004-03-27 23:52                             ` Peter Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2004-03-27 13:31 UTC (permalink / raw)
  To: Peter Williams; +Cc: Micha Feigin, lkml

Peter Williams wrote:
> Making HZ == USER_HZ would also solve the problem.

They were equal once.

Making them equal now would reintroduce the problem that USER_HZ was
created to resolve: some userspace programs hard-code the value, so it
cannot be changed in interfaces used by those programs.

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-22 23:04                     ` Peter Williams
  2004-03-25 17:40                       ` Jamie Lokier
@ 2004-03-27 21:11                       ` Micha Feigin
  1 sibling, 0 replies; 75+ messages in thread
From: Micha Feigin @ 2004-03-27 21:11 UTC (permalink / raw)
  To: lkml

On Tue, Mar 23, 2004 at 10:04:22AM +1100, Peter Williams wrote:
> Micha Feigin wrote:
> >On Sat, Mar 20, 2004 at 12:28:00PM +0100, Stefan Smietanowski wrote:
> >
> >>>>>there is one. Nothing uses it
> >>>>>(sysconf() provides this info)
> >>>>
> >>>>Seems to me that it would be fairly trivial to modify those programs 
> >>>>(that should use this mechanism but don't) to use it?  So why should 
> >>>>they be allowed to dictate kernel behaviour?
> >>>
> >>>
> >>>quality of implementation; for example shell scripts that want to do
> >>>echo 500 > /proc/sys/foo/bar/something_in_HZ
> >>>...
> >>>or /etc/sysctl.conf or ...
> >>>
> >>
> >>Then write a simple program already. How hard is it to write a program
> >>that does a sysconf() and returns (as ascii of course) just the
> >>value of HZ? Then do some trivial calculation off of that.
> >>
> >>HZ=$(gethz)
> >>
> >>If your 500 was 5 seconds, do
> >>
> >>TIME=$((HZ*5))
> >>echo $TIME > /proc/sys/foo/bar/something_in_HZ
> >>
> >
> >
> >Will this be USER_HZ or kernel HZ?
> >Someone earlier suggested it would be USER_HZ which would make it
> >pointless.
> 
> It has to be whatever enables user space to correctly interpret values 
> sent to user space as "ticks".  That means USER_HZ and it's not useless 
> as it enables USER_HZ to be different and/or change without breaking 
> programs that use values expressed in "ticks".
> 

Unless the kernel is changed to do that conversion, it is useless at
the moment: userspace gets USER_HZ, but the kernel's /proc interface
speaks (kernel) HZ, so with 2.6 userspace really has no idea what units
to use when talking to the kernel.

> >
> >
> >>I mean, come on.
> >>
> >>Then you include it in the default distro of choice so that
> >>everybody can use it and there you are.
> >>
> >>If someone doesn't have "gethz" then they can download it.
> >>
> >>// Stefan
> >>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-27 13:31                           ` Jamie Lokier
@ 2004-03-27 23:52                             ` Peter Williams
  2004-03-28 12:16                               ` Jamie Lokier
  0 siblings, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-03-27 23:52 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Micha Feigin, lkml

Jamie Lokier wrote:
> Peter Williams wrote:
> 
>>Making HZ == USER_HZ would also solve the problem.
> 
> 
> They were equal once.
> 
> Making them equal now would reintroduce the problem that USER_HZ was
> created to resolve: some userspace programs hard-code the value, so it
> cannot be changed in interfaces used by those programs.

That was the wrong solution to that particular problem.  The programs 
should have been fixed rather than the kernel being maimed to 
accommodate their shortcomings.

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-03-27 23:52                             ` Peter Williams
@ 2004-03-28 12:16                               ` Jamie Lokier
  0 siblings, 0 replies; 75+ messages in thread
From: Jamie Lokier @ 2004-03-28 12:16 UTC (permalink / raw)
  To: Peter Williams; +Cc: Micha Feigin, lkml

Peter Williams wrote:
> >>Making HZ == USER_HZ would also solve the problem.
> >
> >Making them equal now would reintroduce the problem that USER_HZ was
> >created to resolve: some userspace programs hard-code the value, so it
> >cannot be changed in interfaces used by those programs.
> 
> That was the wrong solution to that particular problem.  The programs 
> should have been fixed rather than the kernel being maimed to 
> accommodate their shortcomings.

I agree, and perhaps that should still be done so we can eliminate USER_HZ.

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: finding out the value of HZ from userspace
  2004-04-02 18:28                       ` Tim Bird
@ 2004-04-02 22:05                         ` Peter Williams
  0 siblings, 0 replies; 75+ messages in thread
From: Peter Williams @ 2004-04-02 22:05 UTC (permalink / raw)
  To: Tim Bird
  Cc: Randy.Dunlap, Richard.Curnow, ak, Arjan van de Ven, aeb,
	Jamie Lokier, linux-kernel mailing list, Albert Cahalan

Tim Bird wrote:
> Peter Williams wrote:
> 
>>> It's not possible to change USER_HZ.  There are too many programs with
>>> the number hard-coded into the binary.
>>
>>
>> This is an argument that the tail should be allowed to wag the dog and 
>> is not really valid :-)
> 
> 
> It is an interesting, but untenable, position that the applications
> are the tail and the OS is the dog.  The OS exists to serve the
> applications.  The applications are, after all, what a user actually
> DOES with their computer.

I guess wagging was a bad analogy.  I was thinking in terms of the 
kernel being the main entity and the programs being peripheral in the 
sense that the kernel can exist without the programs but the programs 
can't exist without the kernel.

> 
> It is possible that the current applications which use hardcoded USER_HZ
> are not important enough, or are easy enough to fix, that the cost in
> incompatibility is offset by the benefit of providing different
> behaviour for future applications.

Yes, the real point is that the facilities provided by the
kernel shouldn't be tailored/compromised to cope with the problems of a
couple of buggy programs, especially when fixing the programs would be
trivial.  I don't think the importance of the program is an issue, as I
doubt that there is any program so important that its
requirements dictate kernel design.

> 
> But breaking them for no good reason, and particularly while there is a
> migration path possible over time which does not break compatibility,
> seems like a bad idea.

Far more important things than these programs have been "broken" by 
changes in the kernel (I know, I've had to cope with them getting 2.6.X 
kernels to work with Red Hat 9) but no one complains or suggests that 
the kernel should revert to its original behaviour.  Change is part of
progress and has to be coped with, not resisted for no good reason.

Peter


* Re: finding out the value of HZ from userspace
  2004-04-02  1:44                     ` Peter Williams
@ 2004-04-02 18:28                       ` Tim Bird
  2004-04-02 22:05                         ` Peter Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Tim Bird @ 2004-04-02 18:28 UTC (permalink / raw)
  To: Peter Williams
  Cc: Jamie Lokier, Arjan van de Ven, Albert Cahalan, Randy.Dunlap, ak,
	Richard.Curnow, aeb, linux-kernel mailing list

Peter Williams wrote:
>> It's not possible to change USER_HZ.  There are too many programs with
>> the number hard-coded into the binary.
> 
> This is an argument that the tail should be allowed to wag the dog and 
> is not really valid :-)

It is an interesting, but untenable, position that the applications
are the tail and the OS is the dog.  The OS exists to serve the applications.
The applications are, after all, what a user actually DOES with their computer.

It is possible that the current applications which use hardcoded USER_HZ are
not important enough, or are easy enough to fix, that the cost in incompatibility
is offset by the benefit of providing different behaviour for future applications.

But breaking them for no good reason, and particularly while there is a
migration path possible over time which does not break compatibility, seems like
a bad idea.

=============================
Tim Bird
Architecture Group Co-Chair
CE Linux Forum
Senior Staff Engineer
Sony Electronics
E-mail: Tim.Bird@am.sony.com
=============================


* Re: finding out the value of HZ from userspace
  2004-04-02  0:39                   ` Jamie Lokier
@ 2004-04-02  1:44                     ` Peter Williams
  2004-04-02 18:28                       ` Tim Bird
  0 siblings, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-04-02  1:44 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Arjan van de Ven, Albert Cahalan, Randy.Dunlap, ak,
	Richard.Curnow, aeb, linux-kernel mailing list

Jamie Lokier wrote:
> Peter Williams wrote:
> 
>>>When we go to a tickless kernel and offer high-resolution timers to
>>>userspace, then it will be irrelevant.  Until then, or if the kernel
>>>goes tickless but limits the resolution of timers for efficiency, the
>>>value of HZ is still relevant.
>>
>>The resolution will always be limited.  That's the nature of digital 
>>systems.  Unlimited resolution would require real "real" numbers and 
>>that's not possible.  The nearest you get on a digital system is the 
>>floating point APPROXIMATION to real numbers.
> 
> 
> Sure, but HZ will still be irrelevant.  There won't be a HZ to report.
> 
> 
>>IMHO, as I've said several times, USER_HZ should be changed to be equal 
>>to or greater than HZ.  In fact, if having USER_HZ greater than HZ would 
>>still make it unusable for your purposes, I'd change that opinion to say 
>>USER_HZ should be equal to HZ (or, in other words, cease to exist).
> 
> 
> It's not possible to change USER_HZ.  There are too many programs with
> the number hard-coded into the binary.

This is an argument that the tail should be allowed to wag the dog and 
is not really valid :-)

>  The best we could do is make
> the HZ userspace macro non-constant, so it calls sysconf(_SC_CLK_TCK),
> and wait a few years until practically all programs being used no
> longer contain a hard-coded constant.  Then we could get rid of USER_HZ again.

If USER_HZ is dispensed with, the programs will get fixed pretty quickly,
but as long as this concession to buggy programs is made, they won't get
fixed (because they don't have to be).

Peter


* Re: finding out the value of HZ from userspace
  2004-04-02  0:07                 ` Peter Williams
@ 2004-04-02  0:39                   ` Jamie Lokier
  2004-04-02  1:44                     ` Peter Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2004-04-02  0:39 UTC (permalink / raw)
  To: Peter Williams
  Cc: Arjan van de Ven, Albert Cahalan, Randy.Dunlap, ak,
	Richard.Curnow, aeb, linux-kernel mailing list

Peter Williams wrote:
> >When we go to a tickless kernel and offer high-resolution timers to
> >userspace, then it will be irrelevant.  Until then, or if the kernel
> >goes tickless but limits the resolution of timers for efficiency, the
> >value of HZ is still relevant.
> 
> The resolution will always be limited.  That's the nature of digital 
> systems.  Unlimited resolution would require real "real" numbers and 
> that's not possible.  The nearest you get on a digital system is the 
> floating point APPROXIMATION to real numbers.

Sure, but HZ will still be irrelevant.  There won't be a HZ to report.

> IMHO, as I've said several times, USER_HZ should be changed to be equal 
> to or greater than HZ.  In fact, if having USER_HZ greater than HZ would 
> still make it unusable for your purposes, I'd change that opinion to say 
> USER_HZ should be equal to HZ (or, in other words, cease to exist).

It's not possible to change USER_HZ.  There are too many programs with
the number hard-coded into the binary.  The best we could do is make
the HZ userspace macro non-constant, so it calls sysconf(_SC_CLK_TCK),
and wait a few years until practically all programs being used no
longer contain a hard-coded constant.  Then we could get rid of USER_HZ again.

-- Jamie

* Re: finding out the value of HZ from userspace
  2004-04-01 16:50                 ` Richard B. Johnson
  2004-04-01 17:01                   ` Jamie Lokier
  2004-04-01 21:27                   ` Michael Buesch
@ 2004-04-02  0:16                   ` Peter Williams
  2 siblings, 0 replies; 75+ messages in thread
From: Peter Williams @ 2004-04-02  0:16 UTC (permalink / raw)
  To: root
  Cc: Jamie Lokier, Arjan van de Ven, Albert Cahalan, Randy.Dunlap, ak,
	Richard.Curnow, aeb, linux-kernel mailing list

Richard B. Johnson wrote:
> On Thu, 1 Apr 2004, Jamie Lokier wrote:
> 
> 
>>Arjan van de Ven wrote:
>>
>>>HZ doesn't mean nothing, esp when we go to a tickless kernel...
>>
>>As explained several times in this thread, HZ is meaningful because it
>>affects the rounding in select/poll/epoll/setitimer.  A few userspace
>>programs with low jitter soft-RT timing requirements need to
>>compensate for that rounding and/or deliberately synchronise
>>themselves with the tick.
>>
>>Such programs can determine HZ experimentally and lock onto the tick
>>in the manner of a PLL, but it would be nice to simply be able to
>>have the value, to reduce the number of control variables.
>>
>>When we go to a tickless kernel and offer high-resolution timers to
>>userspace, then it will be irrelevant.  Until then, or if the kernel
>>goes tickless but limits the resolution of timers for efficiency, the
>>value of HZ is still relevant.
>>
>>Not to get irritatingly back to the subject of this thread or
>>anything, but...  is the value of HZ reported to userspace anywhere?
>>
>>Thanks :)
>>-- Jamie
> 
> 
> I may be naive, but what's the matter with:
> 
> #include <stdio.h>
> #include <sys/param.h>   // Required to be here!
> int main()
> {
>     printf("HZ=%d\n", HZ);
>     return 0;
> }
> It works for me.

There's no guarantee that the running kernel was compiled using
that header file, which (on my system, i.e. Red Hat 9) comes as part of
the glibc package.  It also gets the value indirectly, via linux/param.h,
which in turn gets it via asm/param.h; that makes any such program
highly non-portable.  Not to mention that the HZ obtained this way is
100, which is not the same as the HZ of the 2.6.5-rc3 kernel that
I'm running.

Peter


* Re: finding out the value of HZ from userspace
  2004-04-01 16:30               ` Jamie Lokier
  2004-04-01 16:50                 ` Richard B. Johnson
@ 2004-04-02  0:07                 ` Peter Williams
  2004-04-02  0:39                   ` Jamie Lokier
  1 sibling, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-04-02  0:07 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Arjan van de Ven, Albert Cahalan, Randy.Dunlap, ak,
	Richard.Curnow, aeb, linux-kernel mailing list

Jamie Lokier wrote:
> Arjan van de Ven wrote:
> 
>>HZ doesn't mean nothing, esp when we go to a tickless kernel...
> 
> 
> As explained several times in this thread, HZ is meaningful because it
> affects the rounding in select/poll/epoll/setitimer.  A few userspace
> programs with low jitter soft-RT timing requirements need to
> compensate for that rounding and/or deliberately synchronise
> themselves with the tick.
> 
> Such programs can determine HZ experimentally and lock onto the tick
> in the manner of a PLL, but it would be nice to simply be able to
> have the value, to reduce the number of control variables.
> 
> When we go to a tickless kernel and offer high-resolution timers to
> userspace, then it will be irrelevant.  Until then, or if the kernel
> goes tickless but limits the resolution of timers for efficiency, the
> value of HZ is still relevant.

The resolution will always be limited.  That's the nature of digital 
systems.  Unlimited resolution would require real "real" numbers and 
that's not possible.  The nearest you get on a digital system is the 
floating point APPROXIMATION to real numbers.

> 
> Not to get irritatingly back to the subject of this thread or
> anything, but...  is the value of HZ reported to userspace anywhere?

I don't think so.  There are those (I'm not one) who insist that to do 
so would be a bug.

IMHO, as I've said several times, USER_HZ should be changed to be equal 
to or greater than HZ.  In fact, if having USER_HZ greater than HZ would 
still make it unusable for your purposes, I'd change that opinion to say 
USER_HZ should be equal to HZ (or, in other words, cease to exist).

Peter


* Re: finding out the value of HZ from userspace
  2004-04-01 16:50                 ` Richard B. Johnson
  2004-04-01 17:01                   ` Jamie Lokier
@ 2004-04-01 21:27                   ` Michael Buesch
  2004-04-02  0:16                   ` Peter Williams
  2 siblings, 0 replies; 75+ messages in thread
From: Michael Buesch @ 2004-04-01 21:27 UTC (permalink / raw)
  To: root; +Cc: linux kernel mailing list

On Thursday 01 April 2004 18:50, Richard B. Johnson wrote:
> I may be naive, but what's the matter with:
> 
> #include <stdio.h>
> #include <sys/param.h>   // Required to be here!
> int main()
> {
>     printf("HZ=%d\n", HZ);
>     return 0;
> }
> It works for me.

What when you compile this tool under a system with,
for example 2.4 kern-headers, and switch to a system
with a 2.6 kernel and kern-headers? It still reports
HZ=100 and that's not true anymore.

-- 
Regards Michael Buesch  [ http://www.tuxsoft.de.vu ]

* Re: finding out the value of HZ from userspace
  2004-04-01 16:50                 ` Richard B. Johnson
@ 2004-04-01 17:01                   ` Jamie Lokier
  2004-04-01 21:27                   ` Michael Buesch
  2004-04-02  0:16                   ` Peter Williams
  2 siblings, 0 replies; 75+ messages in thread
From: Jamie Lokier @ 2004-04-01 17:01 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Arjan van de Ven, Albert Cahalan, Randy.Dunlap, Peter Williams,
	ak, Richard.Curnow, aeb, linux-kernel mailing list

Richard B. Johnson wrote:
> > Not to get irritatingly back to the subject of this thread or
> > anything, but...  is the value of HZ reported to userspace anywhere?
> 
> I may be naive, but what's the matter with:
> 
> #include <stdio.h>
> #include <sys/param.h>   // Required to be here!
> int main()
> {
>     printf("HZ=%d\n", HZ);
>     return 0;
> }
> It works for me.

It gives the wrong answer for HZ on 2.6 kernels.  Try it.

The value called "HZ" we are talking about in this thread is the timer
interrupt frequency.  On 2.6 kernels, on x86, that is 1000.  Your
program prints 100.

The reason that you are able to use "HZ" from userspace and get the
wrong answer is that the macros have different names when used from
userspace than from kernelspace.

The value your program reports is what we mean by USER_HZ in this
thread.  That macro is renamed to HZ when the kernel header
<linux/param.h> is included from userspace, for backward
source compatibility with some programs.

Your method also perpetuates the problem that USER_HZ is hard-coded as
a constant into programs, so cannot ever be changed.  Perhaps the
header files should redefine "HZ" to call sysconf(_SC_CLK_TCK)
nowadays, but presently they don't.

-- Jamie

* Re: finding out the value of HZ from userspace
  2004-04-01 16:30               ` Jamie Lokier
@ 2004-04-01 16:50                 ` Richard B. Johnson
  2004-04-01 17:01                   ` Jamie Lokier
                                     ` (2 more replies)
  2004-04-02  0:07                 ` Peter Williams
  1 sibling, 3 replies; 75+ messages in thread
From: Richard B. Johnson @ 2004-04-01 16:50 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Arjan van de Ven, Albert Cahalan, Randy.Dunlap, Peter Williams,
	ak, Richard.Curnow, aeb, linux-kernel mailing list

On Thu, 1 Apr 2004, Jamie Lokier wrote:

> Arjan van de Ven wrote:
> > HZ doesn't mean nothing, esp when we go to a tickless kernel...
>
> As explained several times in this thread, HZ is meaningful because it
> affects the rounding in select/poll/epoll/setitimer.  A few userspace
> programs with low jitter soft-RT timing requirements need to
> compensate for that rounding and/or deliberately synchronise
> themselves with the tick.
>
> Such programs can determine HZ experimentally and lock onto the tick
> in the manner of a PLL, but it would be nice to simply be able to
> have the value, to reduce the number of control variables.
>
> When we go to a tickless kernel and offer high-resolution timers to
> userspace, then it will be irrelevant.  Until then, or if the kernel
> goes tickless but limits the resolution of timers for efficiency, the
> value of HZ is still relevant.
>
> Not to get irritatingly back to the subject of this thread or
> anything, but...  is the value of HZ reported to userspace anywhere?
>
> Thanks :)
> -- Jamie

I may be naive, but what's the matter with:

#include <stdio.h>
#include <sys/param.h>   // Required to be here!
int main()
{
    printf("HZ=%d\n", HZ);
    return 0;
}
It works for me.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.24 on an i686 machine (797.90 BogoMips).
            Note 96.31% of all statistics are fiction.



* Re: finding out the value of HZ from userspace
  2004-04-01 16:01             ` Arjan van de Ven
@ 2004-04-01 16:30               ` Jamie Lokier
  2004-04-01 16:50                 ` Richard B. Johnson
  2004-04-02  0:07                 ` Peter Williams
  0 siblings, 2 replies; 75+ messages in thread
From: Jamie Lokier @ 2004-04-01 16:30 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Albert Cahalan, Randy.Dunlap, Peter Williams, ak, Richard.Curnow,
	aeb, linux-kernel mailing list

Arjan van de Ven wrote:
> HZ doesn't mean nothing, esp when we go to a tickless kernel...

As explained several times in this thread, HZ is meaningful because it
affects the rounding in select/poll/epoll/setitimer.  A few userspace
programs with low jitter soft-RT timing requirements need to
compensate for that rounding and/or deliberately synchronise
themselves with the tick.

Such programs can determine HZ experimentally and lock onto the tick
in the manner of a PLL, but it would be nice to simply be able to
have the value, to reduce the number of control variables.

When we go to a tickless kernel and offer high-resolution timers to
userspace, then it will be irrelevant.  Until then, or if the kernel
goes tickless but limits the resolution of timers for efficiency, the
value of HZ is still relevant.

Not to get irritatingly back to the subject of this thread or
anything, but...  is the value of HZ reported to userspace anywhere?

Thanks :)
-- Jamie

* Re: finding out the value of HZ from userspace
  2004-04-01 15:54           ` Jamie Lokier
  2004-04-01 16:01             ` Arjan van de Ven
@ 2004-04-01 16:12             ` Albert Cahalan
  1 sibling, 0 replies; 75+ messages in thread
From: Albert Cahalan @ 2004-04-01 16:12 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Albert Cahalan, Randy.Dunlap, Peter Williams, arjanv, ak,
	Richard.Curnow, aeb, linux-kernel mailing list

On Thu, 2004-04-01 at 10:54, Jamie Lokier wrote:
> Albert Cahalan wrote:
> > If you rely on sysconf(_SC_CLK_TCK) to work, then
> > your software will support:
> > 
> > * all systems with a 2.6.xx kernel
> > * all systems with a 2.4.xx kernel and recent glibc
> > * all i386 systems running with the default HZ
> > 
> > That's quite a bit I suppose. Maybe you have no
> > interest in supporting a 1200 HZ Alpha with an old
> > kernel or glibc. Maybe you don't care about somebody
> > running a 2.2.xx kernel with modified HZ.
> 
> I'm still unclear.  Does sysconf(_SC_CLK_TCK), when it is reliable,
> return HZ or USER_HZ?

I consider "reliable" to mean it returns whatever is
used by /proc and other kernel interfaces. Prior to the
2.6.xx (and late 2.5.xx) kernels USER_HZ did not exist.

On a 2.6.xx kernel, you get back USER_HZ.

On a 2.4.xx kernel with recent glibc, you get
back HZ, which works OK since there isn't any
HZ to USER_HZ conversion.

On any i386 system with the default HZ, you
will get back 100. On older systems, glibc is
just giving you a constant value -- so it is
correct if your system is an i386 without any
non-Linus modifications. An old glibc can only
do sysconf(_SC_CLK_TCK) this way.



* Re: finding out the value of HZ from userspace
  2004-04-01 15:54           ` Jamie Lokier
@ 2004-04-01 16:01             ` Arjan van de Ven
  2004-04-01 16:30               ` Jamie Lokier
  2004-04-01 16:12             ` Albert Cahalan
  1 sibling, 1 reply; 75+ messages in thread
From: Arjan van de Ven @ 2004-04-01 16:01 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Albert Cahalan, Randy.Dunlap, Peter Williams, ak, Richard.Curnow,
	aeb, linux-kernel mailing list

On Thu, Apr 01, 2004 at 04:54:20PM +0100, Jamie Lokier wrote:
> Albert Cahalan wrote:
> > If you rely on sysconf(_SC_CLK_TCK) to work, then
> > your software will support:
> > 
> > * all systems with a 2.6.xx kernel
> > * all systems with a 2.4.xx kernel and recent glibc
> > * all i386 systems running with the default HZ
> > 
> > That's quite a bit I suppose. Maybe you have no
> > interest in supporting a 1200 HZ Alpha with an old
> > kernel or glibc. Maybe you don't care about somebody
> > running a 2.2.xx kernel with modified HZ.
> 
> I'm still unclear.  Does sysconf(_SC_CLK_TCK), when it is reliable,
> return HZ or USER_HZ?

USER_HZ; the value all the userspace interfaces are in.
HZ doesn't mean nothing, esp when we go to a tickless kernel...

* Re: finding out the value of HZ from userspace
  2004-03-31 23:46         ` Albert Cahalan
@ 2004-04-01 15:54           ` Jamie Lokier
  2004-04-01 16:01             ` Arjan van de Ven
  2004-04-01 16:12             ` Albert Cahalan
  0 siblings, 2 replies; 75+ messages in thread
From: Jamie Lokier @ 2004-04-01 15:54 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Randy.Dunlap, Peter Williams, arjanv, ak, Richard.Curnow, aeb,
	linux-kernel mailing list

Albert Cahalan wrote:
> If you rely on sysconf(_SC_CLK_TCK) to work, then
> your software will support:
> 
> * all systems with a 2.6.xx kernel
> * all systems with a 2.4.xx kernel and recent glibc
> * all i386 systems running with the default HZ
> 
> That's quite a bit I suppose. Maybe you have no
> interest in supporting a 1200 HZ Alpha with an old
> kernel or glibc. Maybe you don't care about somebody
> running a 2.2.xx kernel with modified HZ.

I'm still unclear.  Does sysconf(_SC_CLK_TCK), when it is reliable,
return HZ or USER_HZ?

-- Jamie

* Re: finding out the value of HZ from userspace
  2004-03-31 21:40       ` Randy.Dunlap
@ 2004-03-31 23:46         ` Albert Cahalan
  2004-04-01 15:54           ` Jamie Lokier
  0 siblings, 1 reply; 75+ messages in thread
From: Albert Cahalan @ 2004-03-31 23:46 UTC (permalink / raw)
  To: Randy.Dunlap
  Cc: Peter Williams, albert, arjanv, ak, Richard.Curnow, aeb,
	linux-kernel mailing list

> | >>>>there is one. Nothing uses it
> | >>>>(sysconf() provides this info)
> | >>>
> | >>>If you have a recent glibc on a recent kernel, it might.
> | >>>You could also get a -1 or a supposed ABI value that
> | >>>has nothing to do with the kernel currently running.
> | >>>The most reliable way is to first look around on the
> | >>>stack in search of ELF notes, and then fall back to
> | >>>some horribly gross hacks as needed.
> | >>
> | >>eh sysconf() is the nice way to get to the ELF notes
> | >>instead of having to grovel yourself.
> | > 
> | > 
> | > Unless there is some hidden feature that lets
> | > me specify the ELF note number directly, no way.
> | > 
> | > The sysconf(_SC_CLK_TCK) call does not return an
> | > error code when used on a 2.2.xx i386 kernel.
> | > You get an arbitrary value that fails for ARM,
> | > Alpha, and any system with modified HZ.
> | 
> | As Linux is supposed to be POSIX compliant this is a bug and should be 
> | fixed.
> 
> 
> My understanding (from a few years back) is that Linux is POSIX
> if/when/where it makes sense, but not necessarily POSIX-just-to-be-POSIX.

The fixing has been done.

This is not yet helpful for app developers, because
old kernels and old libraries are still in use.

If you rely on sysconf(_SC_CLK_TCK) to work, then
your software will support:

* all systems with a 2.6.xx kernel
* all systems with a 2.4.xx kernel and recent glibc
* all i386 systems running with the default HZ

That's quite a bit I suppose. Maybe you have no
interest in supporting a 1200 HZ Alpha with an old
kernel or glibc. Maybe you don't care about somebody
running a 2.2.xx kernel with modified HZ.

For the moment, I still care. I won't for long.



* Re: finding out the value of HZ from userspace
  2004-03-20 23:58     ` Peter Williams
@ 2004-03-31 21:40       ` Randy.Dunlap
  2004-03-31 23:46         ` Albert Cahalan
  0 siblings, 1 reply; 75+ messages in thread
From: Randy.Dunlap @ 2004-03-31 21:40 UTC (permalink / raw)
  To: Peter Williams; +Cc: albert, arjanv, ak, Richard.Curnow, aeb, linux-kernel

On Sun, 21 Mar 2004 10:58:20 +1100 Peter Williams wrote:

| Albert Cahalan wrote:
| > On Sat, 2004-03-20 at 04:56, Arjan van de Ven wrote:
| > 
| >>On Tue, Mar 16, 2004 at 11:14:59AM -0500, Albert Cahalan wrote:
| >>
| >>>>there is one. Nothing uses it
| >>>>(sysconf() provides this info)
| >>>
| >>>If you have a recent glibc on a recent kernel, it might.
| >>>You could also get a -1 or a supposed ABI value that
| >>>has nothing to do with the kernel currently running.
| >>>The most reliable way is to first look around on the
| >>>stack in search of ELF notes, and then fall back to
| >>>some horribly gross hacks as needed.
| >>
| >>eh sysconf() is the nice way to get to the ELF notes
| >>instead of having to grovel yourself.
| > 
| > 
| > Unless there is some hidden feature that lets
| > me specify the ELF note number directly, no way.
| > 
| > The sysconf(_SC_CLK_TCK) call does not return an
| > error code when used on a 2.2.xx i386 kernel.
| > You get an arbitrary value that fails for ARM,
| > Alpha, and any system with modified HZ.
| 
| As Linux is supposed to be POSIX compliant this is a bug and should be 
| fixed.


My understanding (from a few years back) is that Linux is POSIX
if/when/where it makes sense, but not necessarily POSIX-just-to-be-POSIX.

--
~Randy
"You can't do anything without having to do something else first."
-- Belefant's Law

* Re: finding out the value of HZ from userspace
  2004-03-20 14:54   ` Albert Cahalan
@ 2004-03-20 23:58     ` Peter Williams
  2004-03-31 21:40       ` Randy.Dunlap
  0 siblings, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-03-20 23:58 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: Arjan van de Ven, ak, Richard.Curnow, aeb,
	linux-kernel mailing list, Albert Cahalan

Albert Cahalan wrote:
> On Sat, 2004-03-20 at 04:56, Arjan van de Ven wrote:
> 
>>On Tue, Mar 16, 2004 at 11:14:59AM -0500, Albert Cahalan wrote:
>>
>>>>there is one. Nothing uses it
>>>>(sysconf() provides this info)
>>>
>>>If you have a recent glibc on a recent kernel, it might.
>>>You could also get a -1 or a supposed ABI value that
>>>has nothing to do with the kernel currently running.
>>>The most reliable way is to first look around on the
>>>stack in search of ELF notes, and then fall back to
>>>some horribly gross hacks as needed.
>>
>>eh sysconf() is the nice way to get to the ELF notes
>>instead of having to grovel yourself.
> 
> 
> Unless there is some hidden feature that lets
> me specify the ELF note number directly, no way.
> 
> The sysconf(_SC_CLK_TCK) call does not return an
> error code when used on a 2.2.xx i386 kernel.
> You get an arbitrary value that fails for ARM,
> Alpha, and any system with modified HZ.

As Linux is supposed to be POSIX compliant this is a bug and should be 
fixed.

Peter


* Re: finding out the value of HZ from userspace
  2004-03-20  9:56 ` Arjan van de Ven
@ 2004-03-20 14:54   ` Albert Cahalan
  2004-03-20 23:58     ` Peter Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Albert Cahalan @ 2004-03-20 14:54 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Albert Cahalan, linux-kernel mailing list, peterw, aeb, ak,
	Richard.Curnow

On Sat, 2004-03-20 at 04:56, Arjan van de Ven wrote:
> On Tue, Mar 16, 2004 at 11:14:59AM -0500, Albert Cahalan wrote:
> > > there is one. Nothing uses it
> > > (sysconf() provides this info)
> > 
> > If you have a recent glibc on a recent kernel, it might.
> > You could also get a -1 or a supposed ABI value that
> > has nothing to do with the kernel currently running.
> > The most reliable way is to first look around on the
> > stack in search of ELF notes, and then fall back to
> > some horribly gross hacks as needed.
> 
> eh sysconf() is the nice way to get to the ELF notes
> instead of having to grovel yourself.

Unless there is some hidden feature that lets
me specify the ELF note number directly, no way.

The sysconf(_SC_CLK_TCK) call does not return an
error code when used on a 2.2.xx i386 kernel.
You get an arbitrary value that fails for ARM,
Alpha, and any system with modified HZ.

You can't rely on sysconf(_SC_NPROCESSORS_CONF)
or sysconf(_SC_NPROCESSORS_ONLN) either. You'll
get back a 0 from the SPARC glibc, which really
means 0 processors since -1 is the error code.

Whatever the question, "use sysconf" is most
likely not the answer.

The man page ought to mention this.



* Re: finding out the value of HZ from userspace
  2004-03-16 16:14 Albert Cahalan
  2004-03-16 17:22 ` Richard Curnow
@ 2004-03-20  9:56 ` Arjan van de Ven
  2004-03-20 14:54   ` Albert Cahalan
  1 sibling, 1 reply; 75+ messages in thread
From: Arjan van de Ven @ 2004-03-20  9:56 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, peterw, ak, Richard.Curnow

On Tue, Mar 16, 2004 at 11:14:59AM -0500, Albert Cahalan wrote:
> > there is one. Nothing uses it
> > (sysconf() provides this info)
> 
> If you have a recent glibc on a recent kernel, it might.
> You could also get a -1 or a supposed ABI value that
> has nothing to do with the kernel currently running.
> The most reliable way is to first look around on the
> stack in search of ELF notes, and then fall back to
> some horribly gross hacks as needed.

eh sysconf() is the nice way to get to the ELF notes instead of having to
grovel yourself.
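For comparison, the sysconf() route Arjan recommends looks like this when converting times() results. This is a minimal sketch that assumes glibc, and cpu_seconds_used() is a hypothetical name. Whether the returned tick rate can be trusted on old kernels is exactly what the surrounding messages dispute.

```c
#include <sys/times.h>
#include <unistd.h>

/* Return the CPU time (user + system) this process has consumed, in
 * seconds, or -1.0 on error.  times() reports clock_t ticks; the tick
 * rate comes from sysconf(_SC_CLK_TCK) rather than a hard-coded 100. */
double cpu_seconds_used(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);

    if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1)
        return -1.0;
    return (double)(t.tms_utime + t.tms_stime) / (double)ticks_per_sec;
}
```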



* Re: finding out the value of HZ from userspace
  2004-03-16 23:56                 ` Andi Kleen
@ 2004-03-17  0:15                   ` Peter Williams
  0 siblings, 0 replies; 75+ messages in thread
From: Peter Williams @ 2004-03-17  0:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
>>These programs could (and should) use sysconf(_SC_CLK_TCK) to find out 
>>how many ticks there are in a second so this does not constitute a good 
>>reason for USER_HZ not being equal to HZ.
> 
> 
> These programs are usually shell scripts that initialise some sysctls.

Which ones?  Top and ps don't appear to be scripts on my system (Red Hat 
9.0).

> It's not easy to call sysconf from there.

A small utility program would suffice.
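A minimal sketch of such a utility's core, assuming glibc. print_clk_tck() and the "clktck" program name are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Core of a hypothetical "clktck" helper for shell scripts: print the
 * tick rate that /proc jiffies values are reported in, as a single
 * number on its own line, and return it (or -1 if sysconf fails). */
long print_clk_tck(FILE *out)
{
    long hz = sysconf(_SC_CLK_TCK);

    if (hz <= 0)
        return -1;
    fprintf(out, "%ld\n", hz);
    return hz;
}
```

Wrapped in a one-line main such as "int main(void) { return print_clk_tck(stdout) > 0 ? 0 : 1; }" and built as clktck, a script could then use HZ=$(clktck). On glibc systems, "getconf CLK_TCK" should already print the same value without any custom program.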

> Also we tend to avoid breaking
> things that would fail silently instead of failing with an obvious error 
> message.  This would be such a case. Silent breakage is an extremely bad
> thing.

This is the responsibility of the authors of the programs in question, 
not the kernel.

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com



* Re: finding out the value of HZ from userspace
  2004-03-16 23:15               ` Peter Williams
@ 2004-03-16 23:56                 ` Andi Kleen
  2004-03-17  0:15                   ` Peter Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2004-03-16 23:56 UTC (permalink / raw)
  To: Peter Williams; +Cc: Andi Kleen, linux-kernel

> 
> These programs could (and should) use sysconf(_SC_CLK_TCK) to find out 
> how many ticks there are in a second so this does not constitute a good 
> reason for USER_HZ not being equal to HZ.

These programs are usually shell scripts that initialise some sysctls.
It's not easy to call sysconf from there. Also we tend to avoid breaking
things that would fail silently instead of failing with an obvious error 
message.  This would be such a case. Silent breakage is an extremely bad
thing.

-Andi


* Re: finding out the value of HZ from userspace
  2004-03-16  9:16             ` Bernd Petrovitsch
@ 2004-03-16 23:45               ` Peter Williams
  0 siblings, 0 replies; 75+ messages in thread
From: Peter Williams @ 2004-03-16 23:45 UTC (permalink / raw)
  To: Bernd Petrovitsch; +Cc: Andi Kleen, linux-kernel

Bernd Petrovitsch wrote:
> On Tue, 2004-03-16 at 06:53, Peter Williams wrote:
> 
>>Andi Kleen wrote:
>>
>>>Peter Williams <peterw@aurema.com> writes:
> 
> [...]
> 
>>>Already exists for a long time - AT_CLKTCK. glibc has a nice wrapper
>>>for it too (sysconf)
>>
>>So it does and POSIX.1 (_SC_CLK_TCK) compliant as well.  Unfortunately, 
>>the presence of this functionality makes it VERY difficult to understand 
>>why ticks are being converted from HZ==1000 values to HZ=100 values when 
>>they are being exported to user space especially as this conversion 
>>throws away precision.  Can anyone enlighten me?
> 
> 
> 1) Because Linux had HZ=100 hardcoded for a long time (except on Alphas) and
>    lots of applications probably use that value today (as HZ in their
>    source and not sysconf(...)), especially since 2.4 (at least most
>    of them) has HZ=100 except on 64-bit CPUs.

That is not a valid reason.  The programs should be fixed.
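Here is one way "fixing the programs" could look for the ps/top case. This is a minimal sketch that reads utime and stime (fields 14 and 15 of /proc/self/stat, per proc(5)) and scales them by sysconf(_SC_CLK_TCK) rather than a hard-coded 100. proc_cpu_seconds() is a hypothetical name, and it assumes the /proc/[pid]/stat field layout documented for current kernels.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read this process's utime and stime (fields 14
 * and 15 of /proc/self/stat, in clock ticks) and convert them to
 * seconds using sysconf(_SC_CLK_TCK) instead of assuming HZ == 100.
 * Returns 0 on success, -1 on failure. */
int proc_cpu_seconds(double *user, double *sys)
{
    char buf[1024];
    unsigned long utime, stime;
    long hz = sysconf(_SC_CLK_TCK);
    FILE *f = fopen("/proc/self/stat", "r");
    char *p;

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* The command name (field 2) may contain spaces or ')', so start
     * parsing after the last ')'.  Fields 3..13 are skipped. */
    p = strrchr(buf, ')');
    if (p == NULL || hz <= 0)
        return -1;
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &utime, &stime) != 2)
        return -1;

    *user = (double)utime / (double)hz;
    *sys = (double)stime / (double)hz;
    return 0;
}
```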

> 2) There are patches which dynamically change the CPU speed, and it
>    probably (IMHO) makes sense to change HZ dynamically too in those
>    situations. An HZ value that changes over time is useless in
>    user-space.

I can't see why.  Ticks are used internally for process accounting (e.g. 
utime, stime, cutime and cstime) and if HZ was changing dynamically 
you'd have to visit every task and modify these values to be consistent 
with the changed value of HZ.  Even if HZ was allowed to change 
dynamically the values reported to user space should be in units 
appropriate to the MAXIMUM possible value of HZ so that precision is not 
lost.
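The precision argument in numbers: this sketch assumes HZ=1000 internally, USER_HZ=100 for export, and the simple case where the conversion is plain integer division by HZ/USER_HZ. The MODEL_* constants and function names are hypothetical. 1999 jiffies (1.999 s) comes out as 199 user ticks (1.99 s), so 9 ms are dropped.

```c
/* Hypothetical model of exporting a tick count kept at an internal
 * HZ of 1000 in USER_HZ = 100 units, assuming HZ is an exact multiple
 * of USER_HZ so the conversion is a plain integer division. */
#define MODEL_HZ      1000UL
#define MODEL_USER_HZ  100UL

unsigned long model_to_user_ticks(unsigned long jiffies)
{
    return jiffies / (MODEL_HZ / MODEL_USER_HZ);   /* truncates */
}

/* Milliseconds of CPU time lost by the conversion for a jiffies count. */
unsigned long model_lost_ms(unsigned long jiffies)
{
    unsigned long kept = model_to_user_ticks(jiffies) * (MODEL_HZ / MODEL_USER_HZ);

    return (jiffies - kept) * 1000UL / MODEL_HZ;
}
```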

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com



* Re: finding out the value of HZ from userspace
  2004-03-16  6:16             ` Andi Kleen
@ 2004-03-16 23:15               ` Peter Williams
  2004-03-16 23:56                 ` Andi Kleen
  0 siblings, 1 reply; 75+ messages in thread
From: Peter Williams @ 2004-03-16 23:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
>>So it does and POSIX.1 (_SC_CLK_TCK) compliant as well.  Unfortunately, 
>>the presence of this functionality makes it VERY difficult to understand 
>>why ticks are being converted from HZ==1000 values to HZ=100 values when 
>>they are being exported to user space especially as this conversion 
>>throws away precision.  Can anyone enlighten me?
> 
> 
> There are two different cases here: 
> 
> Timer tick as visible to user space in the minimum delay of select()
> and other kernel functions with timeout. That is what AT_CLKTCK aims at.

Which is a good reason for USER_HZ to be the same as HZ.

> 
> And exports of values with jiffy units in sysctls in /proc. This was in fact
> always a bug because they should have used ms or s as unit
> (there are readily usable utility functions to do this for sysctl). Otherwise
> writing documentation becomes quite difficult. But there are already
> configurations that set or read these values and it was not a good idea to
> subtly and silently break them, especially since they predate any exporting
> of HZ to user space. So the conversion factor was added.
> 
> This is not only obscure sysctls, ps and top are also consumers of such
> jiffies values in /proc
> 

These programs could (and should) use sysconf(_SC_CLK_TCK) to find out 
how many ticks there are in a second so this does not constitute a good 
reason for USER_HZ not being equal to HZ.

BTW, in ignorance of sysconf(_SC_CLK_TCK) and because of statements to 
the same effect in Robert Love's book, I had been assuming that this was 
the reason for USER_HZ and HZ not being equal.  But now that I've been 
told about sysconf(_SC_CLK_TCK) I can see no valid reason.  That 
doesn't mean that there aren't any, but the reasons you've advanced 
certainly aren't them.

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com



* Re: finding out the value of HZ from userspace
  2004-03-16 16:14 Albert Cahalan
@ 2004-03-16 17:22 ` Richard Curnow
  2004-03-20  9:56 ` Arjan van de Ven
  1 sibling, 0 replies; 75+ messages in thread
From: Richard Curnow @ 2004-03-16 17:22 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, arjanv, peterw, ak

* Albert Cahalan <albert@users.sourceforge.net> [2004-03-16]:
> 
> Fortunately this is a fresh new reason to beg Linus for
> some data. (all previous arguments have been rejected)
> What would be useful for you?
> 
> HZ   (-1 for tickless?)
> USER_HZ
> freq_scale
> some boolean to indicate ppc-like (pure cycle counter) time
> ???

freq_scale would be a good starting point, I think.

However, there is worse.  There is bounds checking on the txc.freq
argument to adjtimex().  IIRC the bounds have changed at various points
in the kernel history, but at one time the limit was +/- 100ppm.  At the
time, I had a mobo with a -300ppm clock error.  To cope with this,
chrony modifies txc.tick to take out the gross error as well as txc.freq
to adjust the fine error.  Therefore, it needs some idea of how tick and
freq inter-relate, and what the valid range of values for tick is.  This
is another mess.  I need to go away and think some more to know what info
from the kernel side would make the problem easier to code for, though.
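The tick/freq interrelation can at least be sketched. This assumes the units documented in adjtimex(2): tick is in microseconds per clock tick, so a change of 1 in tick shifts the clock rate by HZ microseconds per second, i.e. HZ ppm. split_correction() is hypothetical. Its residual is in plain ppm, before the 2^-16 ppm scaling of the freq field and before any freq_scale factor.

```c
/* Hypothetical sketch: split a wanted frequency correction (ppm)
 * between the adjtimex() tick field (whole microseconds per tick, each
 * worth hz ppm) and a residual in ppm left over for the freq field. */
void split_correction(double ppm, int hz, long *tick_delta, double *residual_ppm)
{
    double steps = ppm / (double)hz;

    /* round half away from zero without needing libm */
    *tick_delta = (long)(steps + (steps >= 0.0 ? 0.5 : -0.5));
    *residual_ppm = ppm - (double)(*tick_delta) * (double)hz;
}
```

With HZ=100, a +300 ppm correction becomes tick+3 with a residual of 0, and rounding never leaves more than 50 ppm for freq, which fits inside a +/-100 ppm limit. With HZ=1000, one microsecond of tick is already 1000 ppm, so the residual can reach 500 ppm, which such a limit would reject.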

-- 
Richard \\\ SH-4/SH-5 Core & Debug Architect
Curnow  \\\         SuperH (UK) Ltd, Bristol


* Re: finding out the value of HZ from userspace
@ 2004-03-16 16:14 Albert Cahalan
  2004-03-16 17:22 ` Richard Curnow
  2004-03-20  9:56 ` Arjan van de Ven
  0 siblings, 2 replies; 75+ messages in thread
From: Albert Cahalan @ 2004-03-16 16:14 UTC (permalink / raw)
  To: linux-kernel mailing list; +Cc: arjanv, peterw, ak, Richard.Curnow

[various people]

>> This horrible hack of converting all tick values to 100
>> (from 1000) for export to user space because a large number
>> of user space programs assume that HZ is 100 would NOT be
>> necessary if there was a mechanism whereby user space
>> programs could find out how many ticks there are in a
>> second instead of having to make assumptions.
>
> there is one. Nothing uses it
> (sysconf() provides this info)

If you have a recent glibc on a recent kernel, it might.
You could also get a -1 or a supposed ABI value that
has nothing to do with the kernel currently running.
The most reliable way is to first look around on the
stack in search of ELF notes, and then fall back to
some horribly gross hacks as needed.

> /proc/interrupts "leaks" the value of HZ.  On x86, for instance:
> ( cat /proc/interrupts; sleep 5; cat /proc/interrupts )  |  grep timer

That doesn't really count. The code could be set to do a
dozen interrupts per jiffy. Jiffies are what matter.

>> It seems to change from system to system and between 2.4
>> (100 on i386) to 2.6 (1000 on i386).
>
> And can also be tweaked when compiling, and depends on architecture, and...

Yep. For Linux 2.4.xx and up, ELF notes provide the data.
For older systems, you need to compute the ratio of uptime
to total jiffies.
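A rough sketch of the uptime-to-jiffies ratio for systems without AT_CLKTCK. It assumes the /proc/stat and /proc/uptime formats of 2.4 and later. estimate_hz() is a hypothetical name, and the caller must supply the CPU count, since the aggregate "cpu" line sums all processors. The result is an estimate to be rounded or matched against known HZ values, not an exact figure; CPU hotplug or time spent suspended may skew it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical estimate of the tick rate /proc/stat counts in: sum the
 * per-state tick counters on the aggregate "cpu" line, divide by the
 * uptime in seconds, then by the number of CPUs.  Returns the rounded
 * estimate, or -1 on failure. */
long estimate_hz(long ncpus)
{
    double uptime;
    unsigned long long field, total = 0;
    char label[16];
    int i;
    FILE *f;

    if (ncpus <= 0)
        return -1;

    f = fopen("/proc/uptime", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%lf", &uptime) != 1 || uptime <= 0.0) {
        fclose(f);
        return -1;
    }
    fclose(f);

    f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%15s", label) != 1 || strcmp(label, "cpu") != 0) {
        fclose(f);
        return -1;
    }
    /* 2.4 has four counters (user nice system idle); later kernels
     * append more.  Reading stops at the next "cpuN" token, and at most
     * eight are summed so the guest fields (already counted in user
     * time) are not added twice. */
    for (i = 0; i < 8 && fscanf(f, "%llu", &field) == 1; i++)
        total += field;
    fclose(f);

    if (i < 4)
        return -1;
    return (long)((double)total / uptime / (double)ncpus + 0.5);
}
```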

>> This horrible hack of converting all tick values to 100
>> (from 1000) for export to user space because a large number
>> of user space programs assume that HZ is 100 would NOT be
>> necessary if there was a mechanism whereby user space
>> programs could find out how many ticks there are in a
>> second instead of having to make assumptions.
>
> Already exists for a long time - AT_CLKTCK. glibc has a
> nice wrapper for it too (sysconf)

AT_CLKTCK is new with the 2.4 kernel. When it is missing or
unsupported by an old glibc, the sysconf() call returns
a guess instead of an error code. So sysconf() is worthless
if you want to support old kernels (Debian!) or old glibc.

> This is not only obscure sysctls, ps and top are
> also consumers of such jiffies values in /proc

They follow AT_CLKTCK when it is available, not a HZ
value from some header file. So you can change HZ quite
a bit and these tools won't mind.

> 1) Because Linux had HZ=100 hardcoded for a long
>    time (except on Alphas) and lots of applications
>    probably use that value today (as HZ in their
>    source and not sysconf(...)), especially
>    since 2.4 (at least most of them) has HZ=100
>    except on 64-bit CPUs.

That is severely broken anyway.

At least with Linux 2.4 kernels, many ports have used
a hardware-specific HZ value. All did, really, if you
consider user-mode Linux. My table:

  10  S/390 (sometimes)  
  20  user-mode Linux  
  32  ia64 emulator  
  64  StrongARM /Shark
 100  normal Linux
 128  MIPS, ARM  
1000  ARM
1024  Alpha, ia64
1200  Alpha

Any app supporting Linux 2.4 with an old glibc or
supporting Linux 2.2 will need to do something evil.

> A related issue that's bugged me for a long time is lack
> of userspace access to the quantity that's called
> 'freq_scale' in 2.4, where it's (1<<SHIFT_HZ)/HZ for
> HZ!=100 and 128/128.125 for HZ==100.  (I haven't started
> to reverse-engineer the equivalent value in 2.6, I took
> a quick look once and concluded things had got a little
> more hairy.)
>
> My interest is that I maintain (in spare-time) an NTP
> application called chrony (http://chrony.sunsite.dk/),
> originally written to be good for dial-up, i.e. NTP
> servers accessible for a short window once or twice a day.
> This app wants to tune the parameters it passes to
> adjtimex() to take a best shot at keeping the system
> clock correct over the potentially 'long' offline period.
> To do this well, it has to reverse-compensate for the
> freq_scale multiplier that the kernel will apply to the
> frequency value passed to adjtimex().  Getting the right
> value for this across different kernels has always been
> a fragile exercise.

Arrrrgh!!!!  I thought I had it bad.

Fortunately this is a fresh new reason to beg Linus for
some data. (all previous arguments have been rejected)
What would be useful for you?

HZ   (-1 for tickless?)
USER_HZ
freq_scale
some boolean to indicate ppc-like (pure cycle counter) time
???
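Below is a small model of freq_scale as the quoted text describes it: (1<<SHIFT_HZ)/HZ, with the HZ==100 special case. It assumes the SHIFT_HZ ranges from 2.4's include/linux/timex.h ([12,24) gives 4, doubling up to [768,1536) giving 10). The function names are hypothetical, and this is not kernel code. A program in chrony's position would divide the frequency it wants by this factor before handing it to adjtimex().

```c
/* Hypothetical model: SHIFT_HZ over the ranges [12,24) -> 4, [24,48)
 * -> 5, ..., [768,1536) -> 10, i.e. roughly log2(HZ).  Returns -1 for
 * HZ outside those ranges. */
int model_shift_hz(int hz)
{
    int shift = 4;
    int lo = 12;

    if (hz < 12 || hz >= 1536)
        return -1;
    while (hz >= lo * 2) {   /* each range is [lo, 2*lo) */
        lo *= 2;
        shift++;
    }
    return shift;
}

/* freq_scale as described in the quoted mail: (1 << SHIFT_HZ) / HZ,
 * except 128/128.125 when HZ == 100.  Returns -1.0 for unknown HZ. */
double model_freq_scale(int hz)
{
    int shift = model_shift_hz(hz);

    if (shift < 0)
        return -1.0;
    if (hz == 100)
        return 128.0 / 128.125;
    return (double)(1 << shift) / (double)hz;
}
```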





* Re: finding out the value of HZ from userspace
  2004-03-16  5:53           ` Peter Williams
  2004-03-16  6:16             ` Andi Kleen
@ 2004-03-16  9:16             ` Bernd Petrovitsch
  2004-03-16 23:45               ` Peter Williams
  1 sibling, 1 reply; 75+ messages in thread
From: Bernd Petrovitsch @ 2004-03-16  9:16 UTC (permalink / raw)
  To: Peter Williams; +Cc: Andi Kleen, linux-kernel

On Tue, 2004-03-16 at 06:53, Peter Williams wrote:
> Andi Kleen wrote:
> > Peter Williams <peterw@aurema.com> writes:
[...]
> > Already exists for a long time - AT_CLKTCK. glibc has a nice wrapper
> > for it too (sysconf)
> 
> So it does and POSIX.1 (_SC_CLK_TCK) compliant as well.  Unfortunately, 
> the presence of this functionality makes it VERY difficult to understand 
> why ticks are being converted from HZ==1000 values to HZ=100 values when 
> they are being exported to user space especially as this conversion 
> throws away precision.  Can anyone enlighten me?

1) Because Linux had HZ=100 hardcoded for a long time (except on Alphas) and
   lots of applications probably use that value today (as HZ in their
   source and not sysconf(...)), especially since 2.4 (at least most
   of them) has HZ=100 except on 64-bit CPUs.
2) There are patches which dynamically change the CPU speed, and it
   probably (IMHO) makes sense to change HZ dynamically too in those
   situations. An HZ value that changes over time is useless in
   user-space.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services



* Re: finding out the value of HZ from userspace
  2004-03-16  5:53           ` Peter Williams
@ 2004-03-16  6:16             ` Andi Kleen
  2004-03-16 23:15               ` Peter Williams
  2004-03-16  9:16             ` Bernd Petrovitsch
  1 sibling, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2004-03-16  6:16 UTC (permalink / raw)
  To: Peter Williams; +Cc: Andi Kleen, linux-kernel

> So it does and POSIX.1 (_SC_CLK_TCK) compliant as well.  Unfortunately, 
> the presence of this functionality makes it VERY difficult to understand 
> why ticks are being converted from HZ==1000 values to HZ=100 values when 
> they are being exported to user space especially as this conversion 
> throws away precision.  Can anyone enlighten me?

There are two different cases here: 

Timer tick as visible to user space in the minimum delay of select()
and other kernel functions with timeout. That is what AT_CLKTCK aims at.

And exports of values with jiffy units in sysctls in /proc. This was in fact
always a bug because they should have used ms or s as unit
(there are readily usable utility functions to do this for sysctl). Otherwise
writing documentation becomes quite difficult. But there are already
configurations that set or read these values and it was not a good idea to
subtly and silently break them, especially since they predate any exporting
of HZ to user space. So the conversion factor was added.

This is not only obscure sysctls, ps and top are also consumers of such
jiffies values in /proc

-Andi



* Re: finding out the value of HZ from userspace
  2004-03-16  2:27         ` Andi Kleen
@ 2004-03-16  5:53           ` Peter Williams
  2004-03-16  6:16             ` Andi Kleen
  2004-03-16  9:16             ` Bernd Petrovitsch
  0 siblings, 2 replies; 75+ messages in thread
From: Peter Williams @ 2004-03-16  5:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
> Peter Williams <peterw@aurema.com> writes:
> 
> 
>>This horrible hack of converting all tick values to 100 (from 1000)
>>for export to user space because a large number of user space programs
>>assume that HZ is 100 would NOT be necessary if there was a mechanism
>>whereby user space programs could find out how many ticks there are in
>>a second instead of having to make assumptions.
> 
> 
> Already exists for a long time - AT_CLKTCK. glibc has a nice wrapper
> for it too (sysconf)

So it does and POSIX.1 (_SC_CLK_TCK) compliant as well.  Unfortunately, 
the presence of this functionality makes it VERY difficult to understand 
why ticks are being converted from HZ==1000 values to HZ=100 values when 
they are being exported to user space especially as this conversion 
throws away precision.  Can anyone enlighten me?

Peter
-- 
Dr Peter Williams, Chief Scientist                peterw@aurema.com
Aurema Pty Limited                                Tel:+61 2 9698 2322
PO Box 305, Strawberry Hills NSW 2012, Australia  Fax:+61 2 9699 9174
79 Myrtle Street, Chippendale NSW 2008, Australia http://www.aurema.com




* Re: finding out the value of HZ from userspace
       [not found]       ` <1AaWr-655-7@gated-at.bofh.it>
@ 2004-03-16  2:27         ` Andi Kleen
  2004-03-16  5:53           ` Peter Williams
  0 siblings, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2004-03-16  2:27 UTC (permalink / raw)
  To: Peter Williams; +Cc: linux-kernel

Peter Williams <peterw@aurema.com> writes:

> This horrible hack of converting all tick values to 100 (from 1000)
> for export to user space because a large number of user space programs
> assume that HZ is 100 would NOT be necessary if there was a mechanism
> whereby user space programs could find out how many ticks there are in
> a second instead of having to make assumptions.

Already exists for a long time - AT_CLKTCK. glibc has a nice wrapper
for it too (sysconf)

-Andi



end of thread, other threads:[~2004-04-02 22:08 UTC | newest]

Thread overview: 75+ messages
     [not found] <michf@post.tau.ac.il>
2004-03-11 14:17 ` finding out the value of HZ from userspace Micha Feigin
2004-03-13 17:24   ` Arjan van de Ven
2004-03-13 19:34     ` John Reiser
2004-03-13 19:38       ` Arjan van de Ven
2004-03-13 22:14         ` Micha Feigin
2004-03-13 22:32           ` Arjan van de Ven
2004-03-14  1:05             ` Micha Feigin
2004-03-14  1:49               ` Andrew Morton
2004-03-14 14:37                 ` Micha Feigin
2004-03-16  0:28         ` Peter Williams
2004-03-16  6:33           ` Arjan van de Ven
2004-03-16 23:38             ` Peter Williams
2004-03-20 10:22               ` Arjan van de Ven
2004-03-20 11:28                 ` Stefan Smietanowski
2004-03-20 11:41                   ` Arjan van de Ven
2004-03-20 23:58                     ` Peter Williams
2004-03-21  1:09                       ` Tim Schmielau
2004-03-21  1:30                         ` Peter Williams
2004-03-21  8:00                   ` Kai Henningsen
2004-03-21 10:32                     ` Stefan Smietanowski
2004-03-22 22:34                   ` Micha Feigin
2004-03-22 23:04                     ` Peter Williams
2004-03-25 17:40                       ` Jamie Lokier
2004-03-25 23:22                         ` Peter Williams
2004-03-27 13:31                           ` Jamie Lokier
2004-03-27 23:52                             ` Peter Williams
2004-03-28 12:16                               ` Jamie Lokier
2004-03-27 21:11                       ` Micha Feigin
2004-03-20 23:26                 ` Peter Williams
2004-03-13 21:19     ` tabris
2004-03-13 22:10     ` Micha Feigin
2004-03-13 22:41       ` Arjan van de Ven
2004-03-14  1:07         ` Micha Feigin
2004-03-14 18:26         ` John Reiser
2004-03-14  2:45   ` Horst von Brand
2004-03-14 14:39     ` Micha Feigin
2004-03-15  8:17     ` Jamie Lokier
2004-03-16 18:16       ` Mark Gross
2004-03-15 10:13     ` Richard Curnow
     [not found]   ` <200403161757.48786.mgross@linux.intel.com>
     [not found]     ` <20040317023059.GD19564@mail.shareable.org>
2004-03-17 16:48       ` Call for HRT in 2.6 kernel was " Mark Gross
2004-03-17 20:07         ` Jamie Lokier
2004-03-17 21:25           ` Mark Gross
2004-03-18  1:19           ` Karim Yaghmour
2004-03-18 11:56             ` Jamie Lokier
2004-03-18 15:23               ` Karim Yaghmour
2004-03-21  1:55                 ` Erik Andersen
2004-03-23 22:35                   ` Karim Yaghmour
     [not found] <1zkOe-Uc-17@gated-at.bofh.it>
     [not found] ` <1zl7M-1eJ-43@gated-at.bofh.it>
     [not found]   ` <1zn9p-3mW-5@gated-at.bofh.it>
     [not found]     ` <1znj5-3wM-15@gated-at.bofh.it>
     [not found]       ` <1AaWr-655-7@gated-at.bofh.it>
2004-03-16  2:27         ` Andi Kleen
2004-03-16  5:53           ` Peter Williams
2004-03-16  6:16             ` Andi Kleen
2004-03-16 23:15               ` Peter Williams
2004-03-16 23:56                 ` Andi Kleen
2004-03-17  0:15                   ` Peter Williams
2004-03-16  9:16             ` Bernd Petrovitsch
2004-03-16 23:45               ` Peter Williams
2004-03-16 16:14 Albert Cahalan
2004-03-16 17:22 ` Richard Curnow
2004-03-20  9:56 ` Arjan van de Ven
2004-03-20 14:54   ` Albert Cahalan
2004-03-20 23:58     ` Peter Williams
2004-03-31 21:40       ` Randy.Dunlap
2004-03-31 23:46         ` Albert Cahalan
2004-04-01 15:54           ` Jamie Lokier
2004-04-01 16:01             ` Arjan van de Ven
2004-04-01 16:30               ` Jamie Lokier
2004-04-01 16:50                 ` Richard B. Johnson
2004-04-01 17:01                   ` Jamie Lokier
2004-04-01 21:27                   ` Michael Buesch
2004-04-02  0:16                   ` Peter Williams
2004-04-02  0:07                 ` Peter Williams
2004-04-02  0:39                   ` Jamie Lokier
2004-04-02  1:44                     ` Peter Williams
2004-04-02 18:28                       ` Tim Bird
2004-04-02 22:05                         ` Peter Williams
2004-04-01 16:12             ` Albert Cahalan
