linux-kernel.vger.kernel.org archive mirror
* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
@ 2002-07-31 11:01 David Luyer
  2002-07-31 13:28 ` Alan Cox
  0 siblings, 1 reply; 23+ messages in thread
From: David Luyer @ 2002-07-31 11:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: 'Alan Cox'

I wrote:

> In Linux 2.4.19ac3rc3 on IBM x330/x340 SMP systems we're seeing this:
> 
> luyer@praxis8:~$ ps auxwww | tail -1
> luyer     1025  0.0  0.0  1276  352 pts/2    S    Aug06   0:00 tail -1
> luyer@praxis8:~$ date
> Wed Jul 31 12:35:16 EST 2002

(UP systems are fine, SMP have this problem)

Reason:

luyer@praxis8:~$ ps --info 2>&1 | grep Hertz
EUID=111 TTY=136,3 Hertz=50

procps is getting the hertz value wrong, it's computing it as:

  h = (unsigned long)( (double)jiffies/seconds/smp_num_cpus );

but we're only getting timer interrupts on CPU 0, and hence
jiffies is only incrementing once per 100th of a second.

luyer@praxis8:~/procps/procps-2.0.7.orig/proc$ cat /proc/interrupts
           CPU0       CPU1
  0:   52459351          0  local-APIC-edge  timer
  1:          0          2    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
 24:     883655     863043   IO-APIC-level  ips
 26:          7          9   IO-APIC-level  aic7xxx
 27:          8          8   IO-APIC-level  aic7xxx
 28:   97880608   96542591   IO-APIC-level  eth0
NMI:          0          0
LOC:   52456889   52456887
ERR:          0
MIS:          0

procps version is 2.0.7 (Debian 3.0).

Where's the mistake -- should timer interrupts be on both
CPUs (I think this is the problem), or is procps miscalculating
Hz (seems less likely, someone would have noticed by now...)?

David.



* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 13:28 ` Alan Cox
@ 2002-07-31 12:59   ` David Luyer
  2002-07-31 14:26     ` Alan Cox
  2002-07-31 23:42     ` Lincoln Dale
  0 siblings, 2 replies; 23+ messages in thread
From: David Luyer @ 2002-07-31 12:59 UTC (permalink / raw)
  To: 'Alan Cox'; +Cc: linux-kernel

Alan Cox wrote:
> > procps version is 2.0.7 (Debian 3.0).
> > 
> > Where's the mistake -- should timer interrupts be on both
> > CPUs (I think this is the problem), or is procps miscalculating
> > Hz (seems less likely, someone would have noticed by now...)?
> 
> HZ on x86 for user space is defined as 100. It's a procps problem

Slight error in my initial diagnosis of why procps is getting Hertz
wrong tho.  It's not because timer interrupts are only happening
on one CPU.  It's because it thinks I have 4 CPUs per system, when
really I only have 2 CPUs per system.

It's taking jiffies from the sum of the figures on the first line
of /proc/stat and dividing by the uptime in seconds from /proc/uptime
multiplied by the number of CPUs.  The system has two CPUs, #0 and #1,
and is reporting _SC_NPROCESSORS_CONF as 4 (the count used by procps
as the number of CPUs).

Looks like even if it is procps's fault for not just using HZ==100,
the kernel is leading it astray by claiming I have twice as many
CPUs as I really do.
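
The arithmetic behind Hertz=50 can be checked with a small sketch of the
estimate quoted earlier. This is only an illustration: the function and
parameter names are hypothetical, and it assumes the summed first line of
/proc/stat grows by 100 ticks per second per real CPU.

```c
/*
 * Illustrative re-creation of the estimate quoted in the thread:
 *   h = (unsigned long)( (double)jiffies/seconds/smp_num_cpus );
 * Names are hypothetical; this is not the procps source.
 */
unsigned long estimate_hertz(unsigned long long jiffies_sum,
                             double uptime_seconds,
                             long cpu_count)
{
    /* Ticks per second across all CPUs, divided by the assumed CPU count. */
    return (unsigned long)((double)jiffies_sum / uptime_seconds
                           / (double)cpu_count);
}
```

With two real CPUs and 1000 seconds of uptime the summed count is 200000.
Dividing by the true count of 2 gives 100, while dividing by a reported
count of 4 gives 50, which matches the value shown by "ps --info".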

luyer@praxis8:~$ make cpus
cc     cpus.c   -o cpus
luyer@praxis8:~$ cat cpus.c
#include <unistd.h>

main () {
  printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
}
luyer@praxis8:~$ ./cpus
4
luyer@praxis8:~$ grep 'processor        ' /proc/cpuinfo
processor       : 0
processor       : 1
luyer@praxis8:~$ dmesg | grep -E 'Initializing CPU|CPU #. not responding'
Initializing CPU#0
Initializing CPU#1
CPU #3 not responding - cannot use it.

David.



* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 14:26     ` Alan Cox
@ 2002-07-31 13:18       ` David Luyer
  2002-07-31 13:31       ` Dana Lacoste
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: David Luyer @ 2002-07-31 13:18 UTC (permalink / raw)
  To: 'Alan Cox'; +Cc: linux-kernel

> On Wed, 2002-07-31 at 13:59, David Luyer wrote:
> >   printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> > }
> > luyer@praxis8:~$ ./cpus
> > 4
> > luyer@praxis8:~$ grep 'processor        ' /proc/cpuinfo
> > processor       : 0
> > processor       : 1
> 
> In which case I suggest you file a glibc bug. sysconf looks 
> at the /proc stuff as I understand it

Great, got it, thanks: sysconf(_SC_NPROCESSORS_CONF) parses /proc/cpuinfo
using a simple parser:

#ifndef GET_NPROCS_PARSER
# define GET_NPROCS_PARSER(FP, BUFFER, RESULT)                            \
  do                                                                      \
    {                                                                     \
      (RESULT) = 0;                                                       \
      /* Read all lines and count the lines starting with the string      \
         "processor".  We don't have to fear extremely long lines since   \
         the kernel will not generate them.  8192 bytes are really        \
         enough.  */                                                      \
      while (fgets_unlocked (BUFFER, sizeof (BUFFER), FP) != NULL)        \
        if (strncmp (BUFFER, "processor", 9) == 0)                        \
          ++(RESULT);                                                     \
    }                                                                     \
  while (0)
#endif

It's being tricked by this:

luyer@praxis8:~$ cat /proc/cpuinfo | grep '^processor'
processor       : 0
processor id    : 0
processor       : 1
processor id    : 0

The "processor id" line, only present with SMP enabled, is being counted
as a processor.
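
To make the failure concrete, here is a minimal sketch that contrasts the
prefix match above with a stricter check. The stricter check is purely an
assumption for illustration (it requires only whitespace between
"processor" and the colon); it is not what glibc does or later did, and
both function names are hypothetical.

```c
#include <string.h>

/* Same test as the macro above: any line beginning with "processor". */
int naive_is_processor_line(const char *line)
{
    return strncmp(line, "processor", 9) == 0;
}

/*
 * Hypothetical stricter test: after "processor", allow only spaces or
 * tabs before the ':' separator, so "processor id" is rejected.
 */
int strict_is_processor_line(const char *line)
{
    const char *p;

    if (strncmp(line, "processor", 9) != 0)
        return 0;
    p = line + 9;
    while (*p == ' ' || *p == '\t')
        p++;
    return *p == ':';
}
```

On the four lines shown above, the naive test accepts all four (hence a
count of 4), while the stricter test accepts only the two "processor"
lines.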

David.



* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 11:01 Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew David Luyer
@ 2002-07-31 13:28 ` Alan Cox
  2002-07-31 12:59   ` David Luyer
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Cox @ 2002-07-31 13:28 UTC (permalink / raw)
  To: David Luyer; +Cc: linux-kernel

> procps version is 2.0.7 (Debian 3.0).
> 
> Where's the mistake -- should timer interrupts be on both
> CPUs (I think this is the problem), or is procps miscalculating
> Hz (seems less likely, someone would have noticed by now...)?

HZ on x86 for user space is defined as 100. It's a procps problem



* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 14:26     ` Alan Cox
  2002-07-31 13:18       ` David Luyer
@ 2002-07-31 13:31       ` Dana Lacoste
  2002-07-31 13:38         ` David Luyer
  2002-07-31 16:15       ` NMI watchdog, die(), & console_loglevel Jonathan Lundell
  2002-07-31 19:14       ` Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew Albert D. Cahalan
  3 siblings, 1 reply; 23+ messages in thread
From: Dana Lacoste @ 2002-07-31 13:31 UTC (permalink / raw)
  To: David Luyer; +Cc: linux-kernel

On Wed, 2002-07-31 at 13:59, David Luyer wrote:
>   printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> }
> luyer@praxis8:~$ ./cpus
> 4

I ran your test program on a Compaq DL360 and an IBM x330
and both showed '2' for the CPU count (2.4.18 stock, glibc 2.2.3)

Just a point of reference to help narrow the problem area down :)

Dana Lacoste
Ottawa, Canada



* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 13:31       ` Dana Lacoste
@ 2002-07-31 13:38         ` David Luyer
  2002-07-31 15:04           ` Alan Cox
  0 siblings, 1 reply; 23+ messages in thread
From: David Luyer @ 2002-07-31 13:38 UTC (permalink / raw)
  To: 'Dana Lacoste'; +Cc: linux-kernel

Dana Lacoste wrote:
> On Wed, 2002-07-31 at 13:59, David Luyer wrote:
> >   printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> > }
> > luyer@praxis8:~$ ./cpus
> > 4
> 
> I ran your test program on a Compaq DL360 and an IBM x330
> and both showed '2' for the CPU count (2.4.18 stock, glibc 2.2.3)
> 
> Just a point of reference to help narrow the problem area down :)

Yes, the problem is in the -ac train only.  It's the "processor id"
field that has been added to /proc/cpuinfo which is confusing libc's
way of counting CPUs.

That's a libc bug.  But there's also a kernel bug with that field
it appears.

The kernel bug: the "processor id" fields are both printing zero.

Possibly because show_cpuinfo() in arch/i386/kernel/setup.c prints
directly out of phys_proc_id as at the time it's called, but
smpboot.c declares phys_proc_id as __initdata (either that, or
phys_proc_id is actually zero for both CPUs?).

David.
--
David Luyer                                     Phone:   +61 3 9674 7525
Network Development Manager    P A C I F I C    Fax:     +61 3 9699 8693
Pacific Internet (Australia)  I N T E R N E T   Mobile:  +61 4 1111 BYTE
http://www.pacific.net.au/                      NASDAQ:  PCNTF



* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 15:04           ` Alan Cox
@ 2002-07-31 13:57             ` David Luyer
  0 siblings, 0 replies; 23+ messages in thread
From: David Luyer @ 2002-07-31 13:57 UTC (permalink / raw)
  To: 'Alan Cox'; +Cc: linux-kernel

> > Possibly because show_cpuinfo() in arch/i386/kernel/setup.c prints
> > directly out of phys_proc_id as at the time it's called, but
> > smpboot.c declares phys_proc_id as __initdata (either that, or
> > phys_proc_id is actually zero for both CPUs?).
> 
> The former is the problem. Thanks for spotting it. As to the text
> string, I'll have a chat with Ulrich about it and see what he thinks

The former and the latter possibly: the only assignment I see for
phys_proc_id is when hyperthreading is happening (in fact, requires
all of X86_FEATURE_HT, !disable_x86_ht and smp_num_siblings > 1);
down the end of kernel/setup.c init_intel().

David.



* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 12:59   ` David Luyer
@ 2002-07-31 14:26     ` Alan Cox
  2002-07-31 13:18       ` David Luyer
                         ` (3 more replies)
  2002-07-31 23:42     ` Lincoln Dale
  1 sibling, 4 replies; 23+ messages in thread
From: Alan Cox @ 2002-07-31 14:26 UTC (permalink / raw)
  To: David Luyer; +Cc: linux-kernel

On Wed, 2002-07-31 at 13:59, David Luyer wrote:
>   printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> }
> luyer@praxis8:~$ ./cpus
> 4
> luyer@praxis8:~$ grep 'processor        ' /proc/cpuinfo
> processor       : 0
> processor       : 1

In which case I suggest you file a glibc bug. sysconf looks at the /proc
stuff as I understand it



* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 13:38         ` David Luyer
@ 2002-07-31 15:04           ` Alan Cox
  2002-07-31 13:57             ` David Luyer
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Cox @ 2002-07-31 15:04 UTC (permalink / raw)
  To: David Luyer; +Cc: 'Dana Lacoste', linux-kernel

On Wed, 2002-07-31 at 14:38, David Luyer wrote:
> Yes, the problem is in the -ac train only.  It's the "processor id"
> field that has been added to /proc/cpuinfo which is confusing libc's
> way of counting CPUs.
> 
> That's a libc bug.  But there's also a kernel bug with that field
> it appears.

Currently yes - it got broken during the Summit rearrangements

> The kernel bug: the "processor id" fields are both printing zero.
> 
> Possibly because show_cpuinfo() in arch/i386/kernel/setup.c prints
> directly out of phys_proc_id as at the time it's called, but
> smpboot.c declares phys_proc_id as __initdata (either that, or
> phys_proc_id is actually zero for both CPUs?).

The former is the problem. Thanks for spotting it. As to the text
string, I'll have a chat with Ulrich about it and see what he thinks



* NMI watchdog, die(), & console_loglevel
  2002-07-31 14:26     ` Alan Cox
  2002-07-31 13:18       ` David Luyer
  2002-07-31 13:31       ` Dana Lacoste
@ 2002-07-31 16:15       ` Jonathan Lundell
  2002-07-31 19:14       ` Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew Albert D. Cahalan
  3 siblings, 0 replies; 23+ messages in thread
From: Jonathan Lundell @ 2002-07-31 16:15 UTC (permalink / raw)
  To: linux-kernel

The i386 NMI watchdog handler prints a message, sets console_loglevel 
to 0 (no output to console), and then kills the current task 
(arch/i386/kernel/nmi.c:nmi_watchdog_tick()); it then leaves the 
console turned off.

die(), on the other hand, starts out by setting console_loglevel to 
15 (print everything), and leaves it there.

Neither behavior seems particularly appropriate, and taken together 
they seem at least inconsistent. What's the justification, if any, 
and wouldn't it be better to leave console_loglevel alone and set an 
appropriate message loglevel? (Not that I'd claim for an instant that 
message loglevels are used consistently; have a look at the various 
applications of KERN_EMERG, for example.)
-- 
/Jonathan Lundell.


* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 14:26     ` Alan Cox
                         ` (2 preceding siblings ...)
  2002-07-31 16:15       ` NMI watchdog, die(), & console_loglevel Jonathan Lundell
@ 2002-07-31 19:14       ` Albert D. Cahalan
  2002-08-01  0:37         ` Alan Cox
  3 siblings, 1 reply; 23+ messages in thread
From: Albert D. Cahalan @ 2002-07-31 19:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Luyer, linux-kernel

Alan Cox writes:
> On Wed, 2002-07-31 at 13:59, David Luyer wrote:

>>   printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
>> }
>> luyer@praxis8:~$ ./cpus
>> 4
>> luyer@praxis8:~$ grep 'processor        ' /proc/cpuinfo
>> processor       : 0
>> processor       : 1
>
> In which case I suggest you file a glibc bug. sysconf looks at the /proc
> stuff as I understand it

First you blame ps. Then you blame libc. How about you
place the fault right where it belongs?

Counting processors in /proc/cpuinfo is a joke of an ABI.

Add a proper ABI now, and userspace can transition to it
over the next 4 years.


* RE: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 12:59   ` David Luyer
  2002-07-31 14:26     ` Alan Cox
@ 2002-07-31 23:42     ` Lincoln Dale
  2002-08-01  1:33       ` Albert D. Cahalan
  1 sibling, 1 reply; 23+ messages in thread
From: Lincoln Dale @ 2002-07-31 23:42 UTC (permalink / raw)
  To: David Luyer; +Cc: 'Alan Cox', linux-kernel

At 10:59 PM 31/07/2002 +1000, David Luyer wrote:
>Alan Cox wrote:
> > > procps version is 2.0.7 (Debian 3.0).
> > >
> > > Where's the mistake -- should timer interrupts be on both
> > > CPUs (I think this is the problem), or is procps miscalculating
> > > Hz (seems less likely, someone would have noticed by now...)?
> >
> > HZ on x86 for user space is defined as 100. It's a procps problem
>
>Slight error in my initial diagnosis of why procps is getting Hertz
>wrong tho.  It's not because timer interrupts are only happening
>on one CPU.  It's because it thinks I have 4 CPUs per system, when
>really I only have 2 CPUs per system.

procps is still wrong.

HZ on x86 is 100 by default.
that isn't 100 per CPU, but 100 per second, regardless of whether the timer 
interrupt is distributed between CPUs or serviced on a single CPU.


cheers,

lincoln.



* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-08-01  0:37         ` Alan Cox
@ 2002-07-31 23:49           ` Dave Jones
  2002-08-01  1:30             ` Alan Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2002-07-31 23:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: Albert D. Cahalan, David Luyer, linux-kernel

On Thu, Aug 01, 2002 at 01:37:17AM +0100, Alan Cox wrote:
 > > Add a proper ABI now, and userspace can transition to it
 > > over the next 4 years.
 > 
 > Which is what I've been talking to Ulrich about.

I thought this was the idea behind sysconf(_SC_NPROCESSORS_CONF) ?

        Dave

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-08-01  1:30             ` Alan Cox
@ 2002-08-01  0:19               ` Dave Jones
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Jones @ 2002-08-01  0:19 UTC (permalink / raw)
  To: Alan Cox; +Cc: Albert D. Cahalan, David Luyer, linux-kernel

On Thu, Aug 01, 2002 at 02:30:57AM +0100, Alan Cox wrote:
 > sysconf is implemented in glibc. Right now this is done by poking around
 > in /proc/cpuinfo.

Gotcha, that's what I feared.

 > The kernel doesn't export the data very nicely. With
 > 2.5 and Rusty's hot swappable processors we need to export the data even
 > more explicitly.

driverfs objects perhaps ? Or something more lightweight ?

        Dave.

-- 
| Dave Jones.        http://www.codemonkey.org.uk
| SuSE Labs


* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 19:14       ` Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew Albert D. Cahalan
@ 2002-08-01  0:37         ` Alan Cox
  2002-07-31 23:49           ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Cox @ 2002-08-01  0:37 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: David Luyer, linux-kernel

On Wed, 2002-07-31 at 20:14, Albert D. Cahalan wrote:
> Alan Cox writes:
> > On Wed, 2002-07-31 at 13:59, David Luyer wrote:
> 
> >>   printf("%d\n", sysconf(_SC_NPROCESSORS_CONF));
> >> }
> >> luyer@praxis8:~$ ./cpus
> >> 4
> >> luyer@praxis8:~$ grep 'processor        ' /proc/cpuinfo
> >> processor       : 0
> >> processor       : 1
> >
> > In which case I suggest you file a glibc bug. sysconf looks at the /proc
> > stuff as I understand it
> 
> First you blame ps. Then you blame libc. How about you
> place the fault right where it belongs?

ps is certainly buggy. HZ is 100. ps grovelling around in /proc is bogus
to say the least. That code wasn't exactly well written.

> Counting processors in /proc/cpuinfo is a joke of an ABI.
> 
> Add a proper ABI now, and userspace can transition to it
> over the next 4 years.

Which is what I've been talking to Ulrich about.



* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 23:49           ` Dave Jones
@ 2002-08-01  1:30             ` Alan Cox
  2002-08-01  0:19               ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Alan Cox @ 2002-08-01  1:30 UTC (permalink / raw)
  To: Dave Jones; +Cc: Albert D. Cahalan, David Luyer, linux-kernel

On Thu, 2002-08-01 at 00:49, Dave Jones wrote:
> On Thu, Aug 01, 2002 at 01:37:17AM +0100, Alan Cox wrote:
>  > > Add a proper ABI now, and userspace can transition to it
>  > > over the next 4 years.
>  > 
>  > Which is what I've been talking to Ulrich about.
> 
> I thought this was the idea behind sysconf(_SC_NPROCESSORS_CONF) ?

sysconf is implemented in glibc. Right now this is done by poking around
in /proc/cpuinfo. The kernel doesn't export the data very nicely. With
2.5 and Rusty's hot swappable processors we need to export the data even
more explicitly.
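
For reference, the userspace side of this interface can be sketched as
two thin wrappers. The wrapper names are hypothetical; the two sysconf
names are glibc extensions rather than POSIX, and how the library obtains
the numbers is an implementation detail that callers cannot see.

```c
#include <unistd.h>

/* Number of processors the system is configured with. */
long configured_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* Number of processors currently online; may be lower than the above. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

A caller sees only the returned counts, so any miscount inside the
library's parsing of /proc/cpuinfo is passed straight through to tools
such as ps.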



* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-07-31 23:42     ` Lincoln Dale
@ 2002-08-01  1:33       ` Albert D. Cahalan
  2002-08-01  3:34         ` Martin J. Bligh
                           ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Albert D. Cahalan @ 2002-08-01  1:33 UTC (permalink / raw)
  To: Lincoln Dale; +Cc: David Luyer, 'Alan Cox', linux-kernel

Lincoln Dale writes:
> At 10:59 PM 31/07/2002 +1000, David Luyer wrote:
> >Alan Cox wrote:

>>> HZ on x86 for user space is defined as 100. Its a procps problem
>>
>> Slight error in my initial diagnosis of why procps is getting Hertz
>> wrong tho.  It's not because timer interrupts are only happening
>> on one CPU.  It's because it thinks I have 4 CPUs per system, when
>> really I only have 2 CPUs per system.
>
> procps is still wrong.
>
> HZ on x86 is 100 by default.
> that isn't 100 per CPU, but 100 per second, regardless of whether the timer 
> interrupt is distributed between CPUs or serviced on a single CPU.

No shit. Now, how do you create a ps executable that handles
a 2.4.xx kernel with a modified HZ value? People did this all
the time. I got many bug reports from these people, so don't
go saying they don't exist. Remember: one executable, running
on both of these:

2.2.xx i386 as shipped by Linus
2.4.xx i386 with HZ modified

Come on, write the code if you think it's so easy.
You get bonus points for supporting 2.0.xx kernels
and the IA-64 kernel with that same executable.

Maybe you think I should tell these people to go to Hell?
In that case, what about the Alpha systems that ran HZ at
1200 instead of 1024?

I really wonder why people love to torment me for having the
decency to support systems that aren't 100% Linus-compliant.
Do you people burn idols for Linus, or only kiss his butt?


* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-08-01  1:33       ` Albert D. Cahalan
@ 2002-08-01  3:34         ` Martin J. Bligh
  2002-08-01 14:16           ` Alan Cox
  2002-08-01  8:49         ` Benjamin Herrenschmidt
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Martin J. Bligh @ 2002-08-01  3:34 UTC (permalink / raw)
  To: Albert D. Cahalan, Lincoln Dale
  Cc: David Luyer, 'Alan Cox', linux-kernel

> No shit. Now, how do you create a ps executable that handles
> a 2.4.xx kernel with a modified HZ value? People did this all
> the time. I got many bug reports from these people, so don't
> go saying they don't exist. Remember: one executable, running
> on both of these:
>
> <rant deleted>

Is it somehow impossible to just export HZ in /proc, and read it?
Doesn't seem too hard to me.

M.



* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-08-01  1:33       ` Albert D. Cahalan
  2002-08-01  3:34         ` Martin J. Bligh
@ 2002-08-01  8:49         ` Benjamin Herrenschmidt
  2002-08-01 11:40         ` Lincoln Dale
  2002-08-01 14:15         ` Alan Cox
  3 siblings, 0 replies; 23+ messages in thread
From: Benjamin Herrenschmidt @ 2002-08-01  8:49 UTC (permalink / raw)
  To: Albert D. Cahalan, Lincoln Dale
  Cc: David Luyer, 'Alan Cox', linux-kernel

>2.2.xx i386 as shipped by Linus
>2.4.xx i386 with HZ modified
>
>Come on, write the code if you think it's so easy.
>You get bonus points for supporting 2.0.xx kernels
>and the IA-64 kernel with that same executable.
>
>Maybe you think I should tell these people to go to Hell?
>In that case, what about the Alpha systems that ran HZ at
>1200 instead of 1024?

Isn't HZ value passed down to userland via the ELF aux table ?

(At least the "userland visible" one, which isn't the kernel
internal one in recent 2.5's, oh well...)

That's a reason I don't understand why Linus did this separation
between "userland visible" HZ and kernel internal HZ. I would have
just changed the kernel HZ and let userland be fixed to use the
value passed via the aux table instead of hard coding it.

Ben.




* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-08-01  1:33       ` Albert D. Cahalan
  2002-08-01  3:34         ` Martin J. Bligh
  2002-08-01  8:49         ` Benjamin Herrenschmidt
@ 2002-08-01 11:40         ` Lincoln Dale
  2002-08-01 18:26           ` Albert D. Cahalan
  2002-08-01 14:15         ` Alan Cox
  3 siblings, 1 reply; 23+ messages in thread
From: Lincoln Dale @ 2002-08-01 11:40 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: David Luyer, 'Alan Cox', linux-kernel

At 09:33 PM 31/07/2002 -0400, Albert D. Cahalan wrote:
> > HZ on x86 is 100 by default.
> > that isn't 100 per CPU, but 100 per second, regardless of whether the timer
> > interrupt is distributed between CPUs or serviced on a single CPU.
>
>No shit. Now, how do you create a ps executable that handles
>a 2.4.xx kernel with a modified HZ value? People did this all
>the time. I got many bug reports from these people, so don't
>go saying they don't exist. Remember: one executable, running
>on both of these:

thanks for the rant.  most entertaining.  for what its worth, i wasn't 
trolling.

>2.2.xx i386 as shipped by Linus
>2.4.xx i386 with HZ modified

(i assume you mean 2.4.xx i386 as shipped by Linus)

>Come on, write the code if you think it's so easy.
>You get bonus points for supporting 2.0.xx kernels
>and the IA-64 kernel with that same executable.

i suspect you're confusing me with someone else.

in either case, for ELF executables, the kernel puts the CLOCKS_PER_TICK on 
the stack when loading an elf binary.
this is defined to be HZ on all platforms except ia32 where its set to 
100.  one would hope that if you redefine HZ to something else, you also 
remember to redefine CLOCKS_PER_TICK to that same value too.

my tree uses CLOCKS_PER_TICK set to HZ for x86 too.  i also use a tree with 
HZ set to 1000 for a packet-latency-inducer packet-scheduler i use.

the following code determines the value of CLOCKS_PER_TICK in a reliable 
manner on the hosts i have here (2.4.xx, 2.5.xx, ia32):
i don't have any alpha or ia64 boxes here, but i'm confident it'll still 
give you the correct result.


--
         #include <stdio.h>
         #include <unistd.h>

         #define AT_CLKTCK       17              /* Frequency of times() */

         int main(int argc, char *argv[])
         {
                 int i = 0;

                 fprintf(stderr,"sysconf says %u ticks per second\n",sysconf(_SC_CLK_TCK));

                 /* loop through command-line and args */
                 while (argv[i] != NULL)
                         i++;

                 /* loop through environment variables */
                 i++;
                 while (argv[i] != NULL)
                         i++;

                 /* now at elf variables */
                 i++;
                 while (argv[i] != NULL) {
                         if ((int)argv[i] != AT_CLKTCK) {
                                 fprintf(stderr,"(elf header entry %d has value %d)\n",
                                         (int)argv[i], (int)(argv[(i+1)]));
                         } else {
                                 /* got it */
                                 fprintf(stderr,"eureka, elf header says we have %d ticks per second\n",(int)argv[(i+1)]);
                                 break;
                         }
                         i += 2;
                 }
         }
--

the code doesn't work on a 2.2.16 box here, given 2.2.16 doesn't have 
AT_CLKTCK, but i believe that is incidental to this discussion.
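
The last loop in the program above treats the words after the
environment as (type, value) pairs ending with a zero type. A minimal
sketch of that lookup, run on an ordinary array rather than the live
stack, might look like this. The function name and the AUX_* macros are
hypothetical; 17 is the AT_CLKTCK value given above.

```c
#include <stddef.h>

#define AUX_NULL   0UL   /* terminator type, ends the vector */
#define AUX_CLKTCK 17UL  /* same value as AT_CLKTCK above */

/*
 * Scan a vector of (type, value) pairs terminated by AUX_NULL.
 * Returns 1 and stores the value if `type` is present, otherwise 0.
 */
int find_aux_entry(const unsigned long *auxv, unsigned long type,
                   unsigned long *value_out)
{
    size_t i;

    for (i = 0; auxv[i] != AUX_NULL; i += 2) {
        if (auxv[i] == type) {
            *value_out = auxv[i + 1];
            return 1;
        }
    }
    return 0;
}
```

Using unsigned long for both halves of each pair avoids the
pointer-to-int casts of the original, which would truncate on a 64-bit
machine.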


cheers,

lincoln.



* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-08-01  1:33       ` Albert D. Cahalan
                           ` (2 preceding siblings ...)
  2002-08-01 11:40         ` Lincoln Dale
@ 2002-08-01 14:15         ` Alan Cox
  3 siblings, 0 replies; 23+ messages in thread
From: Alan Cox @ 2002-08-01 14:15 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: Lincoln Dale, David Luyer, linux-kernel

On Thu, 2002-08-01 at 02:33, Albert D. Cahalan wrote:
> > HZ on x86 is 100 by default.
> > that isn't 100 per CPU, but 100 per second, regardless of whether the timer 
> > interrupt is distributed between CPUs or serviced on a single CPU.
> 
> No shit. Now, how do you create a ps executable that handles
> a 2.4.xx kernel with a modified HZ value? People did this all

HZ in /proc is still 100 on a correctly modified 2.4 kernel. If people
can't get the modifications right it isn't your fault.




* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-08-01  3:34         ` Martin J. Bligh
@ 2002-08-01 14:16           ` Alan Cox
  0 siblings, 0 replies; 23+ messages in thread
From: Alan Cox @ 2002-08-01 14:16 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Albert D. Cahalan, Lincoln Dale, David Luyer, linux-kernel

On Thu, 2002-08-01 at 04:34, Martin J. Bligh wrote:

> Is it somehow impossible to just export HZ in /proc, and read it?
> Doesn't seem too hard to me.

It's "100" for x86. HZ is a constant. That's why the kernel has to keep
the values in terms of HZ published in the same format
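
In userspace terms, this means tick counts read from /proc are divided
by the user-visible rate, not the kernel's internal one. A minimal
sketch, with hypothetical helper names, assuming the count comes from a
tick-based field such as the CPU times in /proc/<pid>/stat:

```c
#include <unistd.h>

/* User-visible clock ticks per second, as reported by the C library. */
long user_clock_ticks(void)
{
    return sysconf(_SC_CLK_TCK);
}

/* Convert a tick count read from /proc into seconds. */
double ticks_to_seconds(unsigned long long ticks, long ticks_per_second)
{
    return (double)ticks / (double)ticks_per_second;
}
```

With the x86 value of 100, a process that has accumulated 250 ticks has
used 2.5 seconds of CPU time.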



* Re: Linux 2.4.19ac3rc3 on IBM x330/x340 SMP - "ps" time skew
  2002-08-01 11:40         ` Lincoln Dale
@ 2002-08-01 18:26           ` Albert D. Cahalan
  0 siblings, 0 replies; 23+ messages in thread
From: Albert D. Cahalan @ 2002-08-01 18:26 UTC (permalink / raw)
  To: Lincoln Dale
  Cc: Albert D. Cahalan, David Luyer, 'Alan Cox', linux-kernel

Lincoln Dale writes:
> At 09:33 PM 31/07/2002 -0400, Albert D. Cahalan wrote:

>> No shit. Now, how do you create a ps executable that handles
>> a 2.4.xx kernel with a modified HZ value? People did this all
>> the time. I got many bug reports from these people, so don't
>> go saying they don't exist. Remember: one executable, running
>> on both of these:
>
> thanks for the rant.  most entertaining.  for what its worth, i wasn't 
> trolling.
>
>> 2.2.xx i386 as shipped by Linus
>> 2.4.xx i386 with HZ modified
>
> (i assume you mean 2.4.xx i386 as shipped by Linus)

No.

"Debian GNU/Linux 3.0 released July 19th, 2002
...
This version of Debian supports the 2.2 and 2.4
releases of the Linux kernel."

>> Come on, write the code if you think it's so easy.
>> You get bonus points for supporting 2.0.xx kernels
>> and the IA-64 kernel with that same executable.
>
> i suspect you're confusing me with someone else.

Yes and no. You seem to express a common opinion.
Unlike the others, you may have provided a more
reliable hack than the one currently used.

> in either case, for ELF executables, the kernel puts the CLOCKS_PER_TICK on 
> the stack when loading an elf binary.
> this is defined to be HZ on all platforms except ia32 where its set to 
> 100.  one would hope that if you redefine HZ to something else, you also 
> remember to redefine CLOCKS_PER_TICK to that same value too.

Uh... that's not good. It makes AT_CLKTCK unreliable on i386, cris,
mips, and mips64. I'll have to think about your "one would hope".

> the following code determines the value of CLOCKS_PER_TICK in a reliable 
> manner on the hosts i have here (2.4.xx, 2.5.xx, ia32):
> i don't have any alpha or ia64 boxes here, but i'm confident it'll still 
> give you the correct result.

Thank you very much. I'll have to try this on a 64-bit box.
It works on 32-bit ppc with the 2.4.16 kernel.

> the code doesn't work on a 2.2.16 box here, given 2.2.16 doesn't have 
> AT_CLKTCK, but i believe that is incidental to this discussion.

Not really, but I might rely on sysconf() when AT_CLKTCK is missing.
Then I can tolerate:

a. any unmodified kernel, except alpha arch @ 1200HZ and user-mode @ 20HZ
b. any 2.4.xx kernel with HZ==CLOCKS_PER_SEC, even with an old libc
c. any 2.6.xx kernel, even with an old libc

That might be good enough. Asking users to run 2.4.xx if they want
to play with HZ is pretty reasonable. Asking them to run 2.5.xx,
or hack up the proc filesystem, is not.
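
The fallback order described here (AT_CLKTCK first, sysconf() when it is
missing) can be sketched as follows. This assumes getauxval(), which
glibc only gained in version 2.16 and so was not an option for procps at
the time; the function name is hypothetical.

```c
#include <elf.h>       /* AT_CLKTCK */
#include <sys/auxv.h>  /* getauxval(), assumed available (glibc >= 2.16) */
#include <unistd.h>    /* sysconf() */

/*
 * Prefer the tick rate the kernel placed in the ELF auxiliary vector;
 * fall back to sysconf() when the entry is absent (getauxval returns 0).
 */
long clock_ticks_per_second(void)
{
    unsigned long from_auxv = getauxval(AT_CLKTCK);

    if (from_auxv != 0)
        return (long)from_auxv;
    return sysconf(_SC_CLK_TCK);
}
```

Neither source helps on a kernel whose HZ was changed without updating
the value it reports to userspace, which is the case discussed above.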


