linux-kernel.vger.kernel.org archive mirror
* process start time set wrongly at boot for kernel 2.6.9
@ 2004-10-19 18:21 Jerome Borsboom
  2004-10-19 20:11 ` john stultz
  0 siblings, 1 reply; 19+ messages in thread
From: Jerome Borsboom @ 2004-10-19 18:21 UTC
  To: linux-kernel

Starting with kernel 2.6.9 the process start time is set wrongly for 
processes that get started early in the boot process. Below is a dump from 
my 'ps' command. Note the start time for processes 1-12. After process 12 
the start time is set right.

Jerome


USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  1372  500 ?        S    20:59   0:00 init [3] 
root         2  0.0  0.0     0    0 ?        SN   20:59   0:00 [ksoftirqd/0]
root         3  0.0  0.0     0    0 ?        S<   20:59   0:00 [events/0]
root         4  0.0  0.0     0    0 ?        S<   20:59   0:00 [khelper]
root         5  0.0  0.0     0    0 ?        S<   20:59   0:00 [kblockd/0]
root         6  0.0  0.0     0    0 ?        S    20:59   0:00 [pdflush]
root         7  0.0  0.0     0    0 ?        S    20:59   0:00 [pdflush]
root         9  0.0  0.0     0    0 ?        S<   20:59   0:00 [aio/0]
root         8  0.0  0.0     0    0 ?        S    20:59   0:00 [kswapd0]
root        10  0.0  0.0     0    0 ?        S    20:59   0:00 [kseriod]
root        11  0.0  0.0     0    0 ?        S    20:59   0:00 [scsi_eh_0]
root        12  0.0  0.0     0    0 ?        S    20:59   0:00 [ahc_dv_0]
root        13  0.0  0.0     0    0 ?        S    19:48   0:00 [scsi_eh_1]
root        14  0.0  0.0     0    0 ?        S    19:48   0:00 [ahc_dv_1]
root        15  0.0  0.0     0    0 ?        S    19:48   0:00 [scsi_eh_2]
root        16  0.0  0.0     0    0 ?        S    19:48   0:00 [ahc_dv_2]
root        17  0.0  0.0     0    0 ?        S    19:49   0:00 [kjournald]
root        43  0.0  0.0     0    0 ?        S    19:49   0:00 [kjournald]
root        44  0.0  0.0     0    0 ?        S    19:49   0:00 [kjournald]
root        45  0.0  0.0     0    0 ?        S    19:49   0:00 [kjournald]
root        46  0.0  0.0     0    0 ?        S    19:49   0:00 [kjournald]
root        47  0.0  0.0     0    0 ?        S    19:49   0:00 [kjournald]
root        48  0.0  0.0     0    0 ?        S    19:49   0:00 [kjournald]
root       122  0.0  0.2  1420  552 ?        Ss   19:49   0:00 /sbin/syslogd -m 0
root       124  0.0  0.1  1376  452 ?        Ss   19:49   0:00 /sbin/klogd
root       131  0.0  0.3  1640  776 ?        Ss   19:49   0:00 /sbin/apcupsd
root       139  0.0  0.9  2444 2444 ?        SLs  19:49   0:00 /usr/bin/ntpd
ldap       148  0.0  1.8 50084 4696 ?        Ssl  19:49   0:00 /usr/sbin/slapd -4 -u ldap -h ldap:/// ldapi:///
nscd       153  0.0  0.6 10208 1640 ?        Ssl  19:49   0:00 /usr/sbin/nscd
root       162  0.0  0.5  3156 1392 ?        Ss   19:49   0:00 /usr/sbin/sshd
root       168  0.0  0.9  5396 2436 ?        Ss   19:49   0:00 sendmail: accepting connections 
smmsp      173  0.0  0.8  5176 2144 ?        Ss   19:49   0:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root       177  0.0  0.2  1372  564 ?        Ss   19:49   0:00 /usr/sbin/cron

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-19 18:21 process start time set wrongly at boot for kernel 2.6.9 Jerome Borsboom
@ 2004-10-19 20:11 ` john stultz
  2004-10-20  0:42   ` Tim Schmielau
  2004-10-27  7:55   ` Tim Schmielau
  0 siblings, 2 replies; 19+ messages in thread
From: john stultz @ 2004-10-19 20:11 UTC
  To: Jerome Borsboom; +Cc: lkml, tim, george

On Tue, 2004-10-19 at 11:21, Jerome Borsboom wrote:
> Starting with kernel 2.6.9 the process start time is set wrongly for 
> processes that get started early in the boot process. Below is a dump from 
> my 'ps' command. Note the start time for processes 1-12. After process 12 
> the start time is set right.

How reproducible is this? Are the correct and incorrect time values
always off by the same amount? 

Are you running NTP? I'm curious if you are changing your system time
during boot. 

thanks
-john



* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-19 20:11 ` john stultz
@ 2004-10-20  0:42   ` Tim Schmielau
  2004-10-20  0:59     ` john stultz
  2004-10-27  7:55   ` Tim Schmielau
  1 sibling, 1 reply; 19+ messages in thread
From: Tim Schmielau @ 2004-10-20  0:42 UTC
  To: john stultz; +Cc: Jerome Borsboom, lkml, george

On Tue, 19 Oct 2004, john stultz wrote:

> On Tue, 2004-10-19 at 11:21, Jerome Borsboom wrote:
> > Starting with kernel 2.6.9 the process start time is set wrongly for 
> > processes that get started early in the boot process. Below is a dump from 
> > my 'ps' command. Note the start time for processes 1-12. After process 12 
> > the start time is set right.
> 
> How reproducible is this? Are the correct and incorrect time values
> always off by the same amount? 
> 
> Are you running NTP? I'm curious if you are changing your system time
> during boot. 

I'd bet that some process early in the boot adjusts your system time.
Then this is expected behavior. This is why I would have preferred the 
simple back-out patch for the boot times problem.

I'm sorry I fell off the net for so long and didn't stand up for the
simpler change in this case. Oh well.

I'll probably supply a back-out patch for -mm then, after wading through
my multi-megabyte email backlog (sorry John, still need to read your time
keeping proposal and all its discussion).

Tim


* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-20  0:42   ` Tim Schmielau
@ 2004-10-20  0:59     ` john stultz
  2004-10-20  3:05       ` gradual timeofday overhaul Tim Schmielau
  2004-10-20 14:51       ` process start time set wrongly at boot for kernel 2.6.9 George Anzinger
  0 siblings, 2 replies; 19+ messages in thread
From: john stultz @ 2004-10-20  0:59 UTC
  To: Tim Schmielau; +Cc: Jerome Borsboom, lkml, george anzinger

On Tue, 2004-10-19 at 17:42, Tim Schmielau wrote:
> On Tue, 19 Oct 2004, john stultz wrote:
> 
> > On Tue, 2004-10-19 at 11:21, Jerome Borsboom wrote:
> > > Starting with kernel 2.6.9 the process start time is set wrongly for 
> > > processes that get started early in the boot process. Below is a dump from 
> > > my 'ps' command. Note the start time for processes 1-12. After process 12 
> > > the start time is set right.
> > 
> > How reproducible is this? Are the correct and incorrect time values
> > always off by the same amount? 
> > 
> > Are you running NTP? I'm curious if you are changing your system time
> > during boot. 
> 
> I'd bet that some process early in the boot adjusts your system time.

He claims that's not the case (you weren't CC'ed on his reply, but it's
on lkml). He believes the time changes before NTP starts up. Might be
something else, but I'm not sure.

> Then this is expected behavior. This is why I would have preferred the 
> simple back-out patch for the boot times problem.
> 
> I'm sorry I fell off the net for so long and didn't stand up for the
> simpler change in this case. Oh well.
> 
> I'll probably supply a back-out patch for -mm then, after wading through
> my multi-megabyte email backlog (sorry John, still need to read your time
> keeping proposal and all its discussion).

I've begun to agree with you about this issue. It seems that until we
can catch every use of jiffies for time, doing one by one is going to
cause consistency problems.  So I'd support the full backout of the
do_posix_clock_monotonic_gettime changes to the proc interface. 

George, would you protest this?

As for the timeofday overhaul, I've had zero time to work on it
recently. I hate that I dropped code and then went missing for weeks.
I'll have to see if I can get a few cycles at home to sync up my current
tree and send it out. 

thanks
-john




* gradual timeofday overhaul
  2004-10-20  0:59     ` john stultz
@ 2004-10-20  3:05       ` Tim Schmielau
  2004-10-20  7:47         ` Len Brown
  2004-10-20 18:13         ` john stultz
  2004-10-20 14:51       ` process start time set wrongly at boot for kernel 2.6.9 George Anzinger
  1 sibling, 2 replies; 19+ messages in thread
From: Tim Schmielau @ 2004-10-20  3:05 UTC
  To: john stultz; +Cc: lkml, george anzinger

On Tue, 19 Oct 2004, john stultz wrote:

> As for the timeofday overhaul, I've had zero time to work on it
> recently. I hate that I dropped code and then went missing for weeks.
> I'll have to see if I can get a few cycles at home to sync up my current
> tree and send it out. 

I still haven't looked at your code and its discussion. From what I
remember, I liked your proposal very much. It's surely where we want to
end up someday. But from the above mail it strikes me that we just don't
have enough manpower to get there all at once, so we should have a plan 
for the time code to gradually evolve into what we finally want. I think 
we could do it in the following steps:

  1. Sync up jiffies with the monotonic clock, very much like we
     already handle lost ticks. This would immediately remove the
     hassles with incompatible time sources.
     Judging from the jiffies wrap experience, there probably are
     some drivers which need fixing (mostly because they wait until 
     jiffies==something), but these are bugs already right now
     in the case of lost ticks.

  2. Decouple jiffies from the actual interrupt counter. We could
     then e.g. set HZ to 10000, also increasing the resolution of
     timers, without increasing the interrupt frequency.
     We'd then need to identify the places where this might lead to
     overflows and promote them to use jiffies_64 instead of jiffies
     (where this hasn't been done already).

  3. Increase HZ all the way up to 1e9. jiffies_64 would then be the
     same as your plain 64 bit nanoseconds value.
     This would require an optimization to the timer code to be able
     to increment jiffies in steps larger than 1.
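The caveat in step 1 about drivers that wait until jiffies==something can be shown with a small sketch. Once jiffies can advance by more than one per update (lost-tick compensation, or a resync to the monotonic clock), an equality test may never fire, while a signed-difference comparison still works, even across a 32-bit wrap. The names tick_after_eq() and poll_until() are hypothetical; the comparison is modeled on the kernel's time_after_eq() macro, and the 3-tick step is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modeled on the kernel's time_after_eq(a, b): true if a is at or after b.
 * The signed difference stays correct across a 32-bit wrap as long as the
 * two values are less than 2^31 ticks apart. */
static bool tick_after_eq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Hypothetical driver wait loop against a simulated jiffies counter that
 * advances by 'step' ticks per poll. Returns the number of polls until the
 * wait ends, or -1 if it has not ended after max_polls polls. */
static int poll_until(uint32_t start, uint32_t step, uint32_t target,
                      bool use_equality, int max_polls)
{
    uint32_t now = start;

    for (int polls = 1; polls <= max_polls; polls++) {
        now += step;  /* jiffies may jump by more than one tick */
        bool done = use_equality ? (now == target)
                                 : tick_after_eq(now, target);
        if (done)
            return polls;
    }
    return -1;
}
```

With start 0, step 3 and target 10, the equality test never matches (jiffies goes 3, 6, 9, 12, ...), while tick_after_eq() ends the wait on the fourth poll.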

Thoughts?



* Re: gradual timeofday overhaul
  2004-10-20  3:05       ` gradual timeofday overhaul Tim Schmielau
@ 2004-10-20  7:47         ` Len Brown
  2004-10-20 15:09           ` George Anzinger
                             ` (2 more replies)
  2004-10-20 18:13         ` john stultz
  1 sibling, 3 replies; 19+ messages in thread
From: Len Brown @ 2004-10-20  7:47 UTC
  To: Tim Schmielau; +Cc: john stultz, lkml, george anzinger

On Tue, 2004-10-19 at 23:05, Tim Schmielau wrote:
> I think we could do it in the following steps:
> 
>   1. Sync up jiffies with the monotonic clock,...
>   2. Decouple jiffies from the actual interrupt counter...
>   3. Increase HZ all the way up to 1e9....

> Thoughts?

Yes, for long periods of idle, I'd like to see the periodic clock tick
disabled entirely.  Clock ticks cause the hardware to exit power-saving
idle states.

The current design with HZ=1000 gives us 1ms = 1000usec between clock
ticks.  But some platforms take nearly that long just to enter/exit low
power states, which means that on Linux the hardware pays a long idle
state exit latency (performance hit) but gets little or no power savings
from the time it resides in that idle state.

thanks,
-Len




* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-20  0:59     ` john stultz
  2004-10-20  3:05       ` gradual timeofday overhaul Tim Schmielau
@ 2004-10-20 14:51       ` George Anzinger
  2004-10-20 17:42         ` john stultz
  1 sibling, 1 reply; 19+ messages in thread
From: George Anzinger @ 2004-10-20 14:51 UTC
  To: john stultz; +Cc: Tim Schmielau, Jerome Borsboom, lkml

john stultz wrote:
> On Tue, 2004-10-19 at 17:42, Tim Schmielau wrote:
> 
>>On Tue, 19 Oct 2004, john stultz wrote:
>>
>>
>>>On Tue, 2004-10-19 at 11:21, Jerome Borsboom wrote:
>>>
>>>>Starting with kernel 2.6.9 the process start time is set wrongly for 
>>>>processes that get started early in the boot process. Below is a dump from 
>>>>my 'ps' command. Note the start time for processes 1-12. After process 12 
>>>>the start time is set right.
>>>
>>>How reproducible is this? Are the correct and incorrect time values
>>>always off by the same amount? 
>>>
>>>Are you running NTP? I'm curious if you are changing your system time
>>>during boot. 
>>
>>I'd bet that some process early in the boot adjusts your system time.
> 
> 
> He claims that's not the case (you weren't CC'ed on his reply, but it's
> on lkml). He believes the time changes before NTP starts up. Might be
> something else, but I'm not sure.
> 
> 
>>Then this is expected behavior. This is why I would have preferred the 
>>simple back-out patch for the boot times problem.
>>
>>I'm sorry I fell off the net for so long and didn't stand up for the
>>simpler change in this case. Oh well.
>>
>>I'll probably supply a back-out patch for -mm then, after wading through
>>my multi-megabyte email backlog (sorry John, still need to read your time
>>keeping proposal and all its discussion).
> 
> 
> I've begun to agree with you about this issue. It seems that until we
> can catch every use of jiffies for time, doing one by one is going to
> cause consistency problems.  So I'd support the full backout of the
> do_posix_clock_monotonic_gettime changes to the proc interface. 
> 
> George, would you protest this?

It seems to me that if we do that we will stop making any changes at all.  I.e. 
we will not see the rest of the "jiffies for time" code, as it will not "hurt" 
any more.

Also, the original change was made for a reason...

-g
> 
> As for the timeofday overhaul, I've had zero time to work on it
> recently. I hate that I dropped code and then went missing for weeks.
> I'll have to see if I can get a few cycles at home to sync up my current
> tree and send it out. 
> 
> thanks
> -john
> 
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/



* Re: gradual timeofday overhaul
  2004-10-20  7:47         ` Len Brown
@ 2004-10-20 15:09           ` George Anzinger
  2004-10-20 15:59             ` Richard B. Johnson
  2004-10-20 15:17           ` George Anzinger
  2004-10-20 17:09           ` Lee Revell
  2 siblings, 1 reply; 19+ messages in thread
From: George Anzinger @ 2004-10-20 15:09 UTC
  To: Len Brown; +Cc: Tim Schmielau, john stultz, lkml

Len Brown wrote:
> On Tue, 2004-10-19 at 23:05, Tim Schmielau wrote:
> 
>>I think we could do it in the following steps:
>>
>>  1. Sync up jiffies with the monotonic clock,...
>>  2. Decouple jiffies from the actual interrupt counter...
>>  3. Increase HZ all the way up to 1e9....

Before we do any of the above, I think we need to stop and ponder just what a 
"jiffie" is.  Currently it is, by default (or historically) the "basic tick" of 
the system clock.  On top of this a lot of interpolation code has been "grafted" 
to allow the system to resolve time to finer levels, i.e. to the nanosecond. 
But none of this interpolation code actually changes the tick, i.e. the 
interrupt still happens at the same periodic rate.

As the "basic tick", it is used to do a lot of accounting and scheduling
housekeeping AND as a driver of the system timers.

So, by this definition, it REQUIRES a system interrupt.

I have built a "tickless" system and have evidence from that that such systems
are overload-prone.  The faster the context switch rate, the more accounting
needs to be done.  On the other hand, the ticked system has flat accounting
overhead WRT load.

Regardless of what definitions we settle on, the system needs an interrupt 
source to drive the system timers, and, as I indicate above, the accounting and 
scheduling stuff.  It is a MUST that these interrupts occur at the required
times or the system timers will be off.  This is why we have a jiffies value 
that is "rather odd" in the x86 today.
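To make the "rather odd" value concrete: the i386 PIT divides a fixed input clock by an integer, so the requested 1/HZ period can only be approximated. A minimal sketch, assuming the 1193182 Hz input clock the kernel uses for CLOCK_TICK_RATE and the same round-to-nearest divisor as its LATCH macro; pit_latch() and pit_tick_ns() are hypothetical helper names.

```c
#include <stdint.h>

/* Assumption: i386 PIT input clock, as in the kernel's CLOCK_TICK_RATE. */
#define PIT_TICK_RATE 1193182u

/* Divisor programmed into the PIT for a requested hz (> 0), rounded to
 * nearest, the same rounding the kernel's LATCH macro uses. */
static uint32_t pit_latch(uint32_t hz)
{
    return (PIT_TICK_RATE + hz / 2) / hz;
}

/* Actual interval between timer interrupts in nanoseconds (truncated). */
static uint64_t pit_tick_ns(uint32_t hz)
{
    return (uint64_t)pit_latch(hz) * 1000000000u / PIT_TICK_RATE;
}
```

For HZ=1000 the divisor is 1193 and interrupts arrive about 999847 ns apart rather than 1000000 ns; for HZ=100 the divisor is 11932 and the interval is about 10000150 ns.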

George


> 
> 
>>Thoughts?
> 
> 
> Yes, for long periods of idle, I'd like to see the periodic clock tick
> disabled entirely.  Clock ticks cause the hardware to exit power-saving
> idle states.
> 
> The current design with HZ=1000 gives us 1ms = 1000usec between clock
> ticks.  But some platforms take nearly that long just to enter/exit low
> power states, which means that on Linux the hardware pays a long idle
> state exit latency (performance hit) but gets little or no power savings
> from the time it resides in that idle state.
> 
> thanks,
> -Len
> 
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/



* Re: gradual timeofday overhaul
  2004-10-20  7:47         ` Len Brown
  2004-10-20 15:09           ` George Anzinger
@ 2004-10-20 15:17           ` George Anzinger
  2004-10-20 17:09           ` Lee Revell
  2 siblings, 0 replies; 19+ messages in thread
From: George Anzinger @ 2004-10-20 15:17 UTC
  To: Len Brown; +Cc: Tim Schmielau, john stultz, lkml

Len Brown wrote:
> On Tue, 2004-10-19 at 23:05, Tim Schmielau wrote:
> 
>>I think we could do it in the following steps:
>>
>>  1. Sync up jiffies with the monotonic clock,...
>>  2. Decouple jiffies from the actual interrupt counter...
>>  3. Increase HZ all the way up to 1e9....
> 
> 
>>Thoughts?
> 
> 
> Yes, for long periods of idle, I'd like to see the periodic clock tick
> disabled entirely.  Clock ticks cause the hardware to exit power-saving
> idle states.
> 
> The current design with HZ=1000 gives us 1ms = 1000usec between clock
> ticks.  But some platforms take nearly that long just to enter/exit low
> power states, which means that on Linux the hardware pays a long idle
> state exit latency (performance hit) but gets little or no power savings
> from the time it resides in that idle state.


I (and MontaVista) will be expanding on the VST patches.  There are currently
two levels of VST.  VST-I, when entering the idle state (task), looks ahead in the
timer list, finds the next event, and shuts down the "tick" until that time.  An
interrupt resets things, be it from the time counter running out or from another source.
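The VST-I look-ahead described above can be sketched as follows. This is not the MontaVista code: the flat array of expiry times, the names sketch_timer and idle_ticks_until_next_event(), and the max_ticks cap (standing in for the longest interval the hardware timer can be programmed for) are all assumptions for illustration; the kernel keeps its timers in a timer wheel, not a flat array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified timer: only its expiry time in jiffies. */
struct sketch_timer {
    uint64_t expires;
};

/* On idle entry, find the earliest pending expiry and return how many ticks
 * from 'now' the next interrupt is needed. Returns 0 if a timer is already
 * due, and max_ticks if nothing expires sooner (the cap models the longest
 * one-shot interval the hardware timer supports). */
static uint64_t idle_ticks_until_next_event(const struct sketch_timer *timers,
                                            size_t n, uint64_t now,
                                            uint64_t max_ticks)
{
    uint64_t next = now + max_ticks;

    for (size_t i = 0; i < n; i++) {
        if (timers[i].expires < next)
            next = timers[i].expires;
    }
    return next > now ? next - now : 0;
}
```

The idle code would then program a one-shot interrupt that many ticks ahead instead of taking every periodic tick, and account for the skipped ticks when any interrupt wakes it.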

VST-II adds a call back list to idle entry and exit.  This allows one to add 
code to change (or even remove) timers on idle entry and restore them on exit.

We are doing this work to support deeply embedded applications that often
run on small batteries (think cell phone if you like).
-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/



* Re: gradual timeofday overhaul
  2004-10-20 15:09           ` George Anzinger
@ 2004-10-20 15:59             ` Richard B. Johnson
  0 siblings, 0 replies; 19+ messages in thread
From: Richard B. Johnson @ 2004-10-20 15:59 UTC
  To: George Anzinger; +Cc: Len Brown, Tim Schmielau, john stultz, lkml

On Wed, 20 Oct 2004, George Anzinger wrote:

> Len Brown wrote:
>> On Tue, 2004-10-19 at 23:05, Tim Schmielau wrote:
>> 
>>> I think we could do it in the following steps:
>>> 
>>>  1. Sync up jiffies with the monotonic clock,...
>>>  2. Decouple jiffies from the actual interrupt counter...
>>>  3. Increase HZ all the way up to 1e9....
>
> Before we do any of the above, I think we need to stop and ponder just what a 
> "jiffie" is.  Currently it is, by default (or historically) the "basic tick" 
> of the system clock.  On top of this a lot of interpolation code has been 
> "grafted" to allow the system to resolve time to finer levels, i.e. to the 
> nanosecond. But none of this interpolation code actually changes the tick, 
> i.e. the interrupt still happens at the same periodic rate.
>
> As the "basic tick", it is used to do a lot of accounting and scheduling
> housekeeping AND as a driver of the system timers.
>
> So, by this definition, it REQUIRES a system interrupt.
>
> I have built a "tickless" system and have evidence from that that such
> systems are overload-prone.  The faster the context switch rate, the more
> accounting needs to be done.  On the other hand, the ticked system has flat
> accounting overhead WRT load.
>
> Regardless of what definitions we settle on, the system needs an interrupt 
> source to drive the system timers, and, as I indicate above, the accounting 
> and scheduling stuff.  It is a MUST that these interrupts occur at the
> required times or the system timers will be off.  This is why we have a 
> jiffies value that is "rather odd" in the x86 today.
>
> George
>
>

You need that hardware interrupt for more than time-keeping.
Without a hardware-interrupt, to force a new time-slice,

	for (;;)
		;

... would allow a user to grab the CPU forever ...

So, getting rid of the hardware interrupt can't be done.
Also, much effort has gone into obtaining high resolution
timing without any high resolution hardware to back it
up. This means that users can get numbers like 987,654
microseconds and the last 654 are as valuable as teats
on a bull. With a HZ timer tick, you get 1/HZ resolution
pure and simple. The rest of the "interpolation" is just
guess-work which leads to lots of problems, especially
when one attempts to read a spinning down-count value
from a hardware device accessed off some ports!
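For reference, interpolating from the PIT's down-counter roughly means: read back the count, subtract it from the reload value, and scale by the input clock. A minimal sketch, assuming the 1193182 Hz PIT input clock and a counter that reloads to 'latch' on each interrupt; pit_offset_ns() is a hypothetical helper, not the kernel's actual i386 code, and it leaves out the port I/O and read-back races mentioned above. It also shows the granularity limit: one count is about 838 ns.

```c
#include <stdint.h>

#define PIT_TICK_RATE 1193182u  /* assumed i386 PIT input clock */

/* 'count' is the value read back from the PIT, which reloads to 'latch' at
 * each timer interrupt and counts down toward zero. Returns the nanoseconds
 * elapsed since the last interrupt. */
static uint32_t pit_offset_ns(uint32_t latch, uint32_t count)
{
    if (count > latch)  /* bogus read-back: clamp to "no time elapsed" */
        count = latch;
    return (uint32_t)((uint64_t)(latch - count) * 1000000000u / PIT_TICK_RATE);
}
```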

If the ix86 CMOS timer were used, you could get better
accuracy than present, but accuracy is something one
can accommodate with automatic adjustment of time,
traceable to some appropriate standard.

The top-level scheduler code could contain some flag that
says "are we in a power-down mode?". If so, it could
execute minimal in-cache code, e.g.:

		for (;;) {
			hlt();		// Sleep until next tick
			if (mode != power_down)
				schedule();
		}

The timer-tick ISR or any other ISR wakes us up from halt.
This keeps the system sleeping, not wasting power grabbing
code/data from RAM and crunching some numbers that are
not going to be used.


>> 
>> 
>>> Thoughts?
>> 
>> 
>> Yes, for long periods of idle, I'd like to see the periodic clock tick
>> disabled entirely.  Clock ticks cause the hardware to exit power-saving
>> idle states.
>> 
>> The current design with HZ=1000 gives us 1ms = 1000usec between clock
>> ticks.  But some platforms take nearly that long just to enter/exit low
>> power states, which means that on Linux the hardware pays a long idle
>> state exit latency (performance hit) but gets little or no power savings
>> from the time it resides in that idle state.
>> 
>> thanks,
>> -Len
>> 
>> 

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 GrumpyMips).
                  98.36% of all statistics are fiction.


* Re: gradual timeofday overhaul
  2004-10-20  7:47         ` Len Brown
  2004-10-20 15:09           ` George Anzinger
  2004-10-20 15:17           ` George Anzinger
@ 2004-10-20 17:09           ` Lee Revell
  2004-10-20 21:42             ` Len Brown
  2 siblings, 1 reply; 19+ messages in thread
From: Lee Revell @ 2004-10-20 17:09 UTC
  To: Len Brown; +Cc: Tim Schmielau, john stultz, lkml, george anzinger

On Wed, 2004-10-20 at 03:47, Len Brown wrote:
> The current design with HZ=1000 gives us 1ms = 1000usec between clock
> ticks.  But some platforms take nearly that long just to enter/exit low
> power states, which means that on Linux the hardware pays a long idle
> state exit latency (performance hit) but gets little or no power savings
> from the time it resides in that idle state.

My testing shows that the timer interrupt runs for about 21 usec.
At HZ=1000 that's 2.1% of CPU time just running the timer ISR!  No wonder this
causes PM issues; 2.1% CPU load is not exactly an idle machine.  This is
a 600 MHz C3, so on a slower embedded system this might be 5%.
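The 2.1% figure follows from multiplying the handler's run time by the number of ticks per second. A small sketch of that arithmetic, kept in parts per million to stay in integer math; tick_overhead_ppm() is a hypothetical name, and the 21 usec handler time is the measurement quoted above.

```c
#include <stdint.h>

/* CPU share spent in the periodic tick handler, in parts per million:
 * handler_ns * hz nanoseconds per second, over 1e9 ns, times 1e6 ppm. */
static uint64_t tick_overhead_ppm(uint64_t handler_ns, uint64_t hz)
{
    return handler_ns * hz / 1000u;
}
```

A 21000 ns handler gives 21000 ppm (2.1%) at HZ=1000 and 2100 ppm (0.21%) at HZ=100; the 5% guess corresponds to a handler of roughly 50 usec at HZ=1000.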

So, any solution that would allow high res timers with HZ = 100 would be
welcome.

Lee



* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-20 14:51       ` process start time set wrongly at boot for kernel 2.6.9 George Anzinger
@ 2004-10-20 17:42         ` john stultz
  2004-10-20 23:52           ` George Anzinger
  0 siblings, 1 reply; 19+ messages in thread
From: john stultz @ 2004-10-20 17:42 UTC
  To: george anzinger; +Cc: Tim Schmielau, Jerome Borsboom, lkml

On Wed, 2004-10-20 at 07:51, George Anzinger wrote:
> john stultz wrote:
> > I've begun to agree with you about this issue. It seems that until we
> > can catch every use of jiffies for time, doing one by one is going to
> > cause consistency problems.  So I'd support the full backout of the
> > do_posix_clock_monotonic_gettime changes to the proc interface. 
> > 
> > George, would you protest this?
> 
> It seems to me that if we do that we will stop making any changes at all.  I.e. 
> we will not see the rest of the "jiffies for time" code, as it will not "hurt" 
> any more.

Sorry, not sure I followed that. Could you explain further?

> Also, the original change was made for a reason...

Right, but I thought it was you who made the original change, and I
don't recall you answering what that reason was? I wouldn't want the
code ripped out if it was fixing an actual problem, so that's why I'm
asking.

At the moment, I like the idea I think Tim is suggesting, where we fix
time so we have a stable base, then decouple xtime and jiffies from
the timer interrupt and instead emulate them from the time code.

So rather than every tick incrementing jiffies, instead jiffies is set
equal to (monotonic_clock()*HZ)/NSEC_PER_SEC. 
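One detail worth noting with that formula: if monotonic_clock() returns a 64-bit nanosecond count (an assumption here), the intermediate product monotonic_clock()*HZ overflows 64 bits after about 213 days of uptime at HZ=1000. A sketch comparing the literal formula with an overflow-free variant that divides first, which is only exact when HZ divides NSEC_PER_SEC evenly (as HZ=100 and HZ=1000 do); jiffies_naive() and jiffies_from_ns() are hypothetical names.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* The formula as written: the product wraps once ns * hz exceeds 2^64. */
static uint64_t jiffies_naive(uint64_t ns, uint64_t hz)
{
    return ns * hz / NSEC_PER_SEC;
}

/* Overflow-free variant, exact only when hz divides NSEC_PER_SEC. */
static uint64_t jiffies_from_ns(uint64_t ns, uint64_t hz)
{
    return ns / (NSEC_PER_SEC / hz);
}
```

After one day both agree (86400000 jiffies at HZ=1000); after 250 days the naive version has wrapped and returns a much smaller value.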

Thoughts?
-john








* Re: gradual timeofday overhaul
  2004-10-20  3:05       ` gradual timeofday overhaul Tim Schmielau
  2004-10-20  7:47         ` Len Brown
@ 2004-10-20 18:13         ` john stultz
  1 sibling, 0 replies; 19+ messages in thread
From: john stultz @ 2004-10-20 18:13 UTC
  To: Tim Schmielau; +Cc: lkml, george anzinger

On Tue, 2004-10-19 at 20:05, Tim Schmielau wrote:
> On Tue, 19 Oct 2004, john stultz wrote:
> 
> > As for the timeofday overhaul, I've had zero time to work on it
> > recently. I hate that I dropped code and then went missing for weeks.
> > I'll have to see if I can get a few cycles at home to sync up my current
> > tree and send it out. 
> 
> I still haven't looked at your code and its discussion. From what I
> remember, I liked your proposal very much. It's surely where we want to
> end up someday. But from the above mail it strikes me that we just don't
> have enough manpower to get there all at once, so we should have a plan 
> for the time code to gradually evolve into what we finally want. I think 
> we could do it in the following steps:
> 
>   1. Sync up jiffies with the monotonic clock...
> 
>   2. Decouple jiffies from the actual interrupt counter...
> 
>   3. Increase HZ all the way up to 1e9....
> Thoughts?

They all sound good. I like the notion of basing jiffies off of system
time, rather than interrupt counts. However, I'm a little cautious of
changing the meaning of jiffies too drastically. 

Right now jiffies has two core meanings:
1. Count of the number of timer ticks that have passed.
2. Accurate system uptime, measured in units of 1/HZ
(Let me know if I forgot any others)

The problem being, neither of those meanings is 100% true.
#1 isn't true because when we lose timer ticks, we try to compensate
for them (i386 specifically). But at the same time #2 isn't true because
the timer interrupts don't necessarily run at exactly HZ (again, i386
specifically).
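A sketch of why #1 stops holding once lost ticks are compensated: if the interrupt handler advances jiffies by however many whole tick periods a free-running time source (such as the TSC) says have elapsed, jiffies no longer counts interrupts. This shows the general idea only, not the actual i386 code; struct tick_state and account_ticks() are hypothetical, and the time source is assumed to be already converted to nanoseconds.

```c
#include <stdint.h>

/* Hypothetical state for a lost-tick compensation sketch. */
struct tick_state {
    uint64_t jiffies;
    uint64_t last_tick_ns;  /* time-source reading at the last accounted tick */
};

/* Called from the timer interrupt with the current time-source reading.
 * Advances jiffies by the number of whole tick periods that have elapsed,
 * so a delayed interrupt also accounts for the ticks it "lost". */
static uint64_t account_ticks(struct tick_state *s, uint64_t now_ns,
                              uint64_t tick_ns)
{
    uint64_t ticks = (now_ns - s->last_tick_ns) / tick_ns;

    s->jiffies += ticks;
    s->last_tick_ns += ticks * tick_ns;  /* carry the sub-tick remainder */
    return ticks;
}
```

An interrupt delivered a little over three tick periods after the last one advances jiffies by 3, and the leftover fraction of a tick carries into the next call.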

Basically due to our hardware constraints, we need to break one of these
two assumptions. The problem is which do we choose? 

Do we base jiffies off of monotonic_clock(), guaranteeing #2 and
possibly breaking anyone who is assuming #1? Or do we change all users
of jiffies for time to use monotonic_clock, guaranteeing #1, which will
require quite a bit of work.

And which choice makes it harder for folks to create tickless systems?
It's a tough call.

On top of that, we still have the issue that the current interpolation
used in the time of day subsystem is broken (in my opinion), and we need
to fix that before we can have a reliable monotonic_clock. 

The joke being of course that I'll need to set my /etc/ntp/ntp.drift
file to 500 to find the time to work on any of this. And really, anyone
who really found that funny needs to go home.

thanks
-john



* Re: gradual timeofday overhaul
  2004-10-20 17:09           ` Lee Revell
@ 2004-10-20 21:42             ` Len Brown
  0 siblings, 0 replies; 19+ messages in thread
From: Len Brown @ 2004-10-20 21:42 UTC
  To: Lee Revell; +Cc: Tim Schmielau, john stultz, lkml, george anzinger

On Wed, 2004-10-20 at 13:09, Lee Revell wrote:
> On Wed, 2004-10-20 at 03:47, Len Brown wrote:
> > The current design with HZ=1000 gives us 1ms = 1000usec between
> > clock ticks.  But some platforms take nearly that long just
> > to enter/exit low power states, which means that on Linux
> > the hardware pays a long idle state exit latency
> > (performance hit) but gets little or no power savings
> > from the time it resides in that idle state.
> 
> My testing shows that the timer interrupt runs for about 21 usec.
> At HZ=1000 that's 2.1% of CPU time just running the timer ISR!  No wonder
> this causes PM issues; 2.1% CPU load is not exactly an idle machine.  This
> is a 600 MHz C3, so on a slower embedded system this might be 5%.
> 
> So, any solution that would allow high res timers with HZ = 100 would
> be welcome.

5% residency in the clock tick handler is likely more of a problem when
we're _not_ idle -- a 5% performance hit.  When we're idle we've got
nothing better to do with the processor than run these instructions for
5% of the time and run no instructions 95% of the time -- so tick
handler residency isn't the problem in idle, tick frequency is the
problem.

When an interrupt occurs, the hardware needs to ramp up its voltages,
resume its clocks and all the stuff it needs to do to get out of the
power saving state to run the code that services the interrupt.  This
"exit latency" can take a long time.  On a volume Centrino system today
it is up to 185usec.  On other hardware it is as high as 1000 usec. 
Time spent in this exit latency is a double penalty -- we're not saving
power and we're delaying before the processor starts executing
instructions -- so we want to pay this price only when necessary.
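A rough model of the tick-frequency point: if every tick forces one exit from the idle state, only the part of each tick period left over after the exit latency can be spent saving power. A sketch with a hypothetical helper, idle_residency_pct(), that deliberately ignores entry latency and the handler's own run time.

```c
#include <stdint.h>

/* Percentage of each tick period (hz > 0) left for low-power residency after
 * paying the idle-state exit latency once per tick. Returns 0 if the latency
 * is as long as the tick period or longer. */
static uint32_t idle_residency_pct(uint32_t exit_latency_us, uint32_t hz)
{
    uint32_t period_us = 1000000u / hz;

    if (exit_latency_us >= period_us)
        return 0;
    return (period_us - exit_latency_us) * 100u / period_us;
}
```

With the 185 usec Centrino figure, about 81% of each 1 ms period remains at HZ=1000 (98% at HZ=100); with a 1000 usec exit latency nothing remains at HZ=1000, versus 90% at HZ=100.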

-Len



* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-20 17:42         ` john stultz
@ 2004-10-20 23:52           ` George Anzinger
  2004-10-21  0:25             ` john stultz
  0 siblings, 1 reply; 19+ messages in thread
From: George Anzinger @ 2004-10-20 23:52 UTC
  To: john stultz; +Cc: Tim Schmielau, Jerome Borsboom, lkml

john stultz wrote:
> On Wed, 2004-10-20 at 07:51, George Anzinger wrote:
> 
>>john stultz wrote:
>>
>>>I've begun to agree with you about this issue. It seems that until we
>>>can catch every use of jiffies for time, doing one by one is going to
>>>cause consistency problems.  So I'd support the full backout of the
>>>do_posix_clock_monotonic_gettime changes to the proc interface. 
>>>
>>>George, would you protest this?
>>
>>It seems to me that if we do that we will stop making any changes at all.  I.e. 
>>we will not see the rest of the "jiffies for time" code, as it will not "hurt" 
>>any more.
> 
> 
> Sorry, not sure I followed that. Could you explain further?

If we rip out the code, folks will stop sending in bug reports on it.  Simple as
that.
> 
> 
>>Also, the original change was made for a reason...
> 
> 
> Right, but I thought it was you who made the original change, and I
> don't recall you answering what that reason was? I wouldn't want the
> code ripped out if it was fixing an actual problem, so that's why I'm
> asking.

As I recall, the problem was that uptime was not matching the elapsed wall
clock.  This was because it was jiffies based, and the 1/HZ assumption was
made about what a jiffy is.  When jiffies became ~1/HZ instead of =1/HZ we
started all the "good times".  And this was done because 1/HZ could not be
obtained from the PIT interrupt source with enough accuracy to satisfy the
ntp code.
> 
> At the moment, I'd like the idea I think Tim is suggesting. Where we fix
> time so we have a stable base, then we decouple xtime and jiffies from
> the timer interrupt and instead emulate them from the time code. 

The can of worms here is decoupling jiffies from the timer interrupt.  Jiffies
is (like it or not) the unit of measure used for timers, and these _require_ an
interrupt AND it should be consistently within a few 10s of usec of when the
jiffy changes.
> 
> So rather than every tick incrementing jiffies, instead jiffies is set
> equal to (monotonic_clock()*HZ)/NSEC_PER_SEC. 

As I mentioned (a long time ago), this assumes you have a better source for
the clock than the interrupt.  I would argue that on the x86 (which I admit is
really deficient) the best long-term clock is, in fact, the PIT interrupt.  The
_best_ clock on the x86, IMHO, is one that uses the PIT interrupt as the gold
standard.  Then one smooths this to eliminate interrupt latency issues and lost
ticks using the TSC.  The pm_timer is as good as the PIT but suffers from
access time issues.


-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/



* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-20 23:52           ` George Anzinger
@ 2004-10-21  0:25             ` john stultz
  2004-10-21  1:04               ` George Anzinger
  0 siblings, 1 reply; 19+ messages in thread
From: john stultz @ 2004-10-21  0:25 UTC
  To: george anzinger; +Cc: Tim Schmielau, Jerome Borsboom, lkml

On Wed, 2004-10-20 at 16:52, George Anzinger wrote:
> john stultz wrote:
> > On Wed, 2004-10-20 at 07:51, George Anzinger wrote:
> > 
> >>john stultz wrote:
> >>
> >>>I've begun to agree with you about this issue. It seems that until we
> >>>can catch every use of jiffies for time, doing one by one is going to
> >>>cause consistency problems.  So I'd support the full backout of the
> >>>do_posix_clock_monotonic_gettime changes to the proc interface. 
> >>>
> >>>George, would you protest this?
> >>
> >>It seems to me that if we do that we will stop making any changes at all.  I.e. 
> >>we will not see the rest of the "jiffies for time" code, as it will not "hurt" 
> >>any more.
> > 
> > 
> > Sorry, not sure I followed that. Could you explain further?
> 
> If we rip out the code folks will stop sending in bug reports on it.  Simple as 
> that.

So you feel that we're moving in the right direction, it's just that it's
going to take a few passes before everything smooths out? Thus it's just
a continuation of the effort?

Tim? Is this OK with you, or do you feel the immediate inconsistencies and
bug reports aren't worth the effort?


> > So rather than every tick incrementing jiffies, instead jiffies is set
> > equal to (monotonic_clock()*HZ)/NSEC_PER_SEC. 
> 
> As I mentioned (a long time ago), this assumes you have a better source for
> the clock than the interrupt.  I would argue that on the x86 (which I admit is
> really deficient) the best long-term clock is, in fact, the PIT interrupt.  The
> _best_ clock on the x86, IMHO, is one that uses the PIT interrupt as the gold
> standard.  Then one smooths this to eliminate interrupt latency issues and lost
> ticks using the TSC.  The pm_timer is as good as the PIT but suffers from
> access time issues.

Well, assuming the PIT is programmed to a value it can actually run at
accurately, you might be right. 

The only problem is that I've started to arrive at the notion that
interpolation between multiple problematic timesources is just a rat's
nest. When you can't trust timer interrupts to arrive and you can't
trust the TSC to run at the right frequency, there's no way to figure
out who's right.  We already have the lost-tick compensation code, but
we still get time inconsistencies. Now maybe I'm just too dim-witted to
make it work, but the more I look at it, the more corner cases appear
and the uglier the code gets.

I say pick a timesource you can trust on your machine and stick to it.
NTP is there to correct for drift, so just use it.

-john





* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-21  0:25             ` john stultz
@ 2004-10-21  1:04               ` George Anzinger
  0 siblings, 0 replies; 19+ messages in thread
From: George Anzinger @ 2004-10-21  1:04 UTC
  To: john stultz; +Cc: Tim Schmielau, Jerome Borsboom, lkml

john stultz wrote:
~

> 
>>>So rather than every tick incrementing jiffies, instead jiffies is set
>>>equal to (monotonic_clock()*HZ)/NSEC_PER_SEC. 
>>
>>As I mentioned (a long time ago), this assumes you have a better source for
>>the clock than the interrupt.  I would argue that on the x86 (which I admit is
>>really deficient) the best long-term clock is, in fact, the PIT interrupt.  The
>>_best_ clock on the x86, IMHO, is one that uses the PIT interrupt as the gold
>>standard.  Then one smooths this to eliminate interrupt latency issues and lost
>>ticks using the TSC.  The pm_timer is as good as the PIT but suffers from
>>access time issues.
> 
> 
> Well, assuming the PIT is programmed to a value it can actually run at
> accurately, you might be right. 
> 
> The only problem is that I've started to arrive at the notion that
> interpolation between multiple problematic timesources is just a rat's
> nest. When you can't trust timer interrupts to arrive and you can't
> trust the TSC to run at the right frequency, there's no way to figure
> out who's right.  We already have the lost-tick compensation code, but
> we still get time inconsistencies. Now maybe I'm just too dim-witted to
> make it work, but the more I look at it, the more corner cases appear
> and the uglier the code gets.
> 
> I say pick a timesource you can trust on your machine and stick to it.
> NTP is there to correct for drift, so just use it.
> 
Let's try to remember that the x86 WRT time is a real pile of used hay.  Even the
"fixes" the hardware folks are spinning out reflect a real lack of
understanding.  A pm_timer that you cannot trust is doubly bad, but then they
thought it was part of the powerdown code so...  The new timer, which we may see
on real machines some day, is still in I/O space (read REALLY SLOW TO ACCESS)
for starters.

Back in my days at HP we (HP) talked with Intel and, to some extent, caused a
change in the IA64.  That machine, and a lot of other platforms, have decent 
time keeping hardware.  All we have to do is wait for the x86 to die :).
-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/



* Re: process start time set wrongly at boot for kernel 2.6.9
  2004-10-19 20:11 ` john stultz
  2004-10-20  0:42   ` Tim Schmielau
@ 2004-10-27  7:55   ` Tim Schmielau
  1 sibling, 0 replies; 19+ messages in thread
From: Tim Schmielau @ 2004-10-27  7:55 UTC
  To: john stultz; +Cc: Jerome Borsboom, lkml, george

On Tue, 19 Oct 2004, john stultz wrote:

> On Tue, 2004-10-19 at 11:21, Jerome Borsboom wrote:
> > Starting with kernel 2.6.9 the process start time is set wrongly for 
> > processes that get started early in the boot process. Below is a dump from 
> > my 'ps' command. Note the start time for processes 1-12. After process 12 
> > the start time is set right.
> 
> How reproducible is this? Are the correct and incorrect time values
> always off by the same amount? 

If the problem is reproducible, does it go away with the following patch 
against 2.6.9?

An untested patch against 2.6.9-mm1 is at
  http://www.physik3.uni-rostock.de/tim/kernel/2.6/uptime-fix-09.patch

Tim


--- linux-2.6.9/fs/proc/array.c	2004-10-27 00:04:58.000000000 +0200
+++ linux-2.6.9-uf/fs/proc/array.c	2004-10-27 01:44:13.000000000 +0200
@@ -360,11 +360,7 @@ int proc_pid_stat(struct task_struct *ta
 	read_unlock(&tasklist_lock);
 
 	/* Temporary variable needed for gcc-2.96 */
-	/* convert timespec -> nsec*/
-	start_time = (unsigned long long)task->start_time.tv_sec * NSEC_PER_SEC
-				+ task->start_time.tv_nsec;
-	/* convert nsec -> ticks */
-	start_time = nsec_to_clock_t(start_time);
+	start_time = jiffies_64_to_clock_t(task->start_time - INITIAL_JIFFIES);
 
 	res = sprintf(buffer,"%d (%s) %c %d %d %d %d %d %lu %lu \
 %lu %lu %lu %lu %lu %ld %ld %ld %ld %d %ld %llu %lu %ld %lu %lu %lu %lu %lu \

--- linux-2.6.9/fs/proc/proc_misc.c	2004-10-27 00:04:58.000000000 +0200
+++ linux-2.6.9-uf/fs/proc/proc_misc.c	2004-10-27 01:44:23.000000000 +0200
@@ -133,19 +133,36 @@ static struct vmalloc_info get_vmalloc_i
 static int uptime_read_proc(char *page, char **start, off_t off,
 				 int count, int *eof, void *data)
 {
-	struct timespec uptime;
-	struct timespec idle;
+	u64 uptime;
+	unsigned long uptime_remainder;
 	int len;
-	u64 idle_jiffies = init_task.utime + init_task.stime;
 
-	do_posix_clock_monotonic_gettime(&uptime);
-	jiffies_to_timespec(idle_jiffies, &idle);
-	len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
-			(unsigned long) uptime.tv_sec,
-			(uptime.tv_nsec / (NSEC_PER_SEC / 100)),
-			(unsigned long) idle.tv_sec,
-			(idle.tv_nsec / (NSEC_PER_SEC / 100)));
+	uptime = get_jiffies_64() - INITIAL_JIFFIES;
+	uptime_remainder = (unsigned long) do_div(uptime, HZ);
 
+#if HZ!=100
+	{
+		u64 idle = init_task.utime + init_task.stime;
+		unsigned long idle_remainder;
+
+		idle_remainder = (unsigned long) do_div(idle, HZ);
+		len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
+			(unsigned long) uptime,
+			(uptime_remainder * 100) / HZ,
+			(unsigned long) idle,
+			(idle_remainder * 100) / HZ);
+	}
+#else
+	{
+		unsigned long idle = init_task.utime + init_task.stime;
+
+		len = sprintf(page,"%lu.%02lu %lu.%02lu\n",
+			(unsigned long) uptime,
+			uptime_remainder,
+			idle / HZ,
+			idle % HZ);
+	}
+#endif
 	return proc_calc_metrics(page, start, off, count, eof, len);
 }
 

--- linux-2.6.9/include/linux/acct.h	2004-10-27 00:04:58.000000000 +0200
+++ linux-2.6.9-uf/include/linux/acct.h	2004-10-27 01:44:13.000000000 +0200
@@ -172,22 +172,15 @@ static inline u32 jiffies_to_AHZ(unsigne
 #endif
 }
 
-static inline u64 nsec_to_AHZ(u64 x)
+static inline u64 jiffies_64_to_AHZ(u64 x)
 {
-#if (NSEC_PER_SEC % AHZ) == 0
-	do_div(x, (NSEC_PER_SEC / AHZ));
-#elif (AHZ % 512) == 0
-	x *= AHZ/512;
-	do_div(x, (NSEC_PER_SEC / 512));
+#if (TICK_NSEC % (NSEC_PER_SEC / AHZ)) == 0
+#if HZ != AHZ
+	do_div(x, HZ / AHZ);
+#endif
 #else
-	/*
-         * max relative error 5.7e-8 (1.8s per year) for AHZ <= 1024,
-         * overflow after 64.99 years.
-         * exact for AHZ=60, 72, 90, 120, 144, 180, 300, 600, 900, ...
-         */
-	x *= 9;
-	do_div(x, (unsigned long)((9ull * NSEC_PER_SEC + (AHZ/2))
-	                          / AHZ));
+	x *= TICK_NSEC;
+	do_div(x, (NSEC_PER_SEC / AHZ));
 #endif
 	return x;
 }

--- linux-2.6.9/include/linux/sched.h	2004-10-27 00:04:58.000000000 +0200
+++ linux-2.6.9-uf/include/linux/sched.h	2004-10-27 01:44:13.000000000 +0200
@@ -508,7 +508,7 @@ struct task_struct {
 	struct timer_list real_timer;
 	unsigned long utime, stime;
 	unsigned long nvcsw, nivcsw; /* context switch counts */
-	struct timespec start_time;
+	u64 start_time;
 /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
 	unsigned long min_flt, maj_flt;
 /* process credentials */

--- linux-2.6.9/include/linux/times.h	2004-10-27 00:04:58.000000000 +0200
+++ linux-2.6.9-uf/include/linux/times.h	2004-10-27 01:44:23.000000000 +0200
@@ -7,16 +7,11 @@
 #include <asm/types.h>
 #include <asm/param.h>
 
-static inline clock_t jiffies_to_clock_t(long x)
-{
-#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
-	return x / (HZ / USER_HZ);
+#if (HZ % USER_HZ)==0
+# define jiffies_to_clock_t(x) ((x) / (HZ / USER_HZ))
 #else
-	u64 tmp = (u64)x * TICK_NSEC;
-	do_div(tmp, (NSEC_PER_SEC / USER_HZ));
-	return (long)tmp;
+# define jiffies_to_clock_t(x) ((clock_t) jiffies_64_to_clock_t((u64) x))
 #endif
-}
 
 static inline unsigned long clock_t_to_jiffies(unsigned long x)
 {
@@ -40,7 +35,7 @@ static inline unsigned long clock_t_to_j
 
 static inline u64 jiffies_64_to_clock_t(u64 x)
 {
-#if (TICK_NSEC % (NSEC_PER_SEC / USER_HZ)) == 0
+#if (HZ % USER_HZ)==0
 	do_div(x, HZ / USER_HZ);
 #else
 	/*
@@ -48,33 +43,13 @@ static inline u64 jiffies_64_to_clock_t(
 	 * but even this doesn't overflow in hundreds of years
 	 * in 64 bits, so..
 	 */
-	x *= TICK_NSEC;
-	do_div(x, (NSEC_PER_SEC / USER_HZ));
+	x *= USER_HZ;
+	do_div(x, HZ);
 #endif
 	return x;
 }
 #endif
 
-static inline u64 nsec_to_clock_t(u64 x)
-{
-#if (NSEC_PER_SEC % USER_HZ) == 0
-	do_div(x, (NSEC_PER_SEC / USER_HZ));
-#elif (USER_HZ % 512) == 0
-	x *= USER_HZ/512;
-	do_div(x, (NSEC_PER_SEC / 512));
-#else
-	/*
-         * max relative error 5.7e-8 (1.8s per year) for USER_HZ <= 1024,
-         * overflow after 64.99 years.
-         * exact for HZ=60, 72, 90, 120, 144, 180, 300, 600, 900, ...
-         */
-	x *= 9;
-	do_div(x, (unsigned long)((9ull * NSEC_PER_SEC + (USER_HZ/2))
-	                          / USER_HZ));
-#endif
-	return x;
-}
-
 struct tms {
 	clock_t tms_utime;
 	clock_t tms_stime;

--- linux-2.6.9/kernel/acct.c	2004-10-27 00:04:58.000000000 +0200
+++ linux-2.6.9-uf/kernel/acct.c	2004-10-27 01:44:13.000000000 +0200
@@ -384,8 +384,6 @@ static void do_acct_process(long exitcod
 	unsigned long vsize;
 	unsigned long flim;
 	u64 elapsed;
-	u64 run_time;
-	struct timespec uptime;
 
 	/*
 	 * First check to see if there is enough free_space to continue
@@ -403,13 +401,7 @@ static void do_acct_process(long exitcod
 	ac.ac_version = ACCT_VERSION | ACCT_BYTEORDER;
 	strlcpy(ac.ac_comm, current->comm, sizeof(ac.ac_comm));
 
-	/* calculate run_time in nsec*/
-	do_posix_clock_monotonic_gettime(&uptime);
-	run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
-	run_time -= (u64)current->start_time.tv_sec*NSEC_PER_SEC
-					+ current->start_time.tv_nsec;
-	/* convert nsec -> AHZ */
-	elapsed = nsec_to_AHZ(run_time);
+	elapsed = jiffies_64_to_AHZ(get_jiffies_64() - current->start_time);
 #if ACCT_VERSION==3
 	ac.ac_etime = encode_float(elapsed);
 #else

--- linux-2.6.9/kernel/fork.c	2004-10-27 00:04:58.000000000 +0200
+++ linux-2.6.9-uf/kernel/fork.c	2004-10-27 01:44:13.000000000 +0200
@@ -992,7 +992,7 @@ static task_t *copy_process(unsigned lon
 
 	p->utime = p->stime = 0;
 	p->lock_depth = -1;		/* -1 = no lock */
-	do_posix_clock_monotonic_gettime(&p->start_time);
+	p->start_time = get_jiffies_64();
 	p->security = NULL;
 	p->io_context = NULL;
 	p->io_wait = NULL;

--- linux-2.6.9/mm/oom_kill.c	2004-10-27 00:04:59.000000000 +0200
+++ linux-2.6.9-uf/mm/oom_kill.c	2004-10-27 01:44:13.000000000 +0200
@@ -26,7 +26,6 @@
 /**
  * oom_badness - calculate a numeric value for how bad this task has been
  * @p: task struct of which task we should calculate
- * @p: current uptime in seconds
  *
  * The formula used is relatively simple and documented inline in the
  * function. The main rationale is that we want to select a good task
@@ -42,7 +41,7 @@
  *    of least surprise ... (be careful when you change it)
  */
 
-static unsigned long badness(struct task_struct *p, unsigned long uptime)
+static unsigned long badness(struct task_struct *p)
 {
 	unsigned long points, cpu_time, run_time, s;
 
@@ -57,16 +56,12 @@ static unsigned long badness(struct task
 	points = p->mm->total_vm;
 
 	/*
-	 * CPU time is in tens of seconds and run time is in thousands
-         * of seconds. There is no particular reason for this other than
-         * that it turned out to work very well in practice.
+	 * CPU time is in seconds and run time is in minutes. There is no
+	 * particular reason for this other than that it turned out to work
+	 * very well in practice.
 	 */
 	cpu_time = (p->utime + p->stime) >> (SHIFT_HZ + 3);
-
-	if (uptime >= p->start_time.tv_sec)
-		run_time = (uptime - p->start_time.tv_sec) >> 10;
-	else
-		run_time = 0;
+	run_time = (get_jiffies_64() - p->start_time) >> (SHIFT_HZ + 10);
 
 	s = int_sqrt(cpu_time);
 	if (s)
@@ -116,12 +111,10 @@ static struct task_struct * select_bad_p
 	unsigned long maxpoints = 0;
 	struct task_struct *g, *p;
 	struct task_struct *chosen = NULL;
-	struct timespec uptime;
 
-	do_posix_clock_monotonic_gettime(&uptime);
 	do_each_thread(g, p)
 		if (p->pid) {
-			unsigned long points = badness(p, uptime.tv_sec);
+			unsigned long points = badness(p);
 			if (points > maxpoints) {
 				chosen = p;
 				maxpoints = points;


* Re: process start time set wrongly at boot for kernel 2.6.9
@ 2004-10-19 21:03 Jerome Borsboom
  0 siblings, 0 replies; 19+ messages in thread
From: Jerome Borsboom @ 2004-10-19 21:03 UTC
  To: johnstul; +Cc: linux-kernel

>How reproducible is this? Are the correct and incorrect time values 
>always off by the same amount?
>
>Are you running NTP? I'm curious if you are changing your system time 
>during boot.
>
>thanks
>-john

At each boot, the start time of the first processes seems to be off by 1 hour
and 11 minutes. Another system shows the same symptoms but with different
values.

I am setting the time during boot with ntp, but the start time seems to 
change from incorrect to correct before I even run ntp.

Jerome


end of thread

Thread overview: 19+ messages
2004-10-19 18:21 process start time set wrongly at boot for kernel 2.6.9 Jerome Borsboom
2004-10-19 20:11 ` john stultz
2004-10-20  0:42   ` Tim Schmielau
2004-10-20  0:59     ` john stultz
2004-10-20  3:05       ` gradual timeofday overhaul Tim Schmielau
2004-10-20  7:47         ` Len Brown
2004-10-20 15:09           ` George Anzinger
2004-10-20 15:59             ` Richard B. Johnson
2004-10-20 15:17           ` George Anzinger
2004-10-20 17:09           ` Lee Revell
2004-10-20 21:42             ` Len Brown
2004-10-20 18:13         ` john stultz
2004-10-20 14:51       ` process start time set wrongly at boot for kernel 2.6.9 George Anzinger
2004-10-20 17:42         ` john stultz
2004-10-20 23:52           ` George Anzinger
2004-10-21  0:25             ` john stultz
2004-10-21  1:04               ` George Anzinger
2004-10-27  7:55   ` Tim Schmielau
2004-10-19 21:03 Jerome Borsboom
