linux-kernel.vger.kernel.org archive mirror
* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 21:00     ` do_gettimeofday vs. rdtsc in the scheduler Andi Kleen
@ 2002-09-17 20:54       ` David S. Miller
  2002-09-17 21:28         ` Alan Cox
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-09-17 20:54 UTC (permalink / raw)
  To: ak; +Cc: linux-kernel, johnstul, anton.wilson

   From: Andi Kleen <ak@suse.de>
   Date: 17 Sep 2002 23:00:38 +0200
   
   Also reading HPET is somewhat more costly than reading TSCs because it
   goes to the southbridge, so there are cases where using TSC is
   probably better (e.g. I think for networking packet time stamping the
   TSC is just fine with all its limitations)

The cpu gets a bus clock input, so the system tick should be processor
local as much as TSC is.

It's boggling that this is being messed up so much.  I can't believe
Sun got something incredibly right (Ultra-III has a system tick) :-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
       [not found]   ` <20020917.133933.69057655.davem@redhat.com.suse.lists.linux.kernel>
@ 2002-09-17 21:00     ` Andi Kleen
  2002-09-17 20:54       ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2002-09-17 21:00 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, johnstul, anton.wilson

"David S. Miller" <davem@redhat.com> writes:

>    From: john stultz <johnstul@us.ibm.com>
>    Date: 17 Sep 2002 13:29:18 -0700
>    
>    Some NUMA boxes do not have synced TSC, so on those systems your
>    code won't work.
> 
> It would have been really nice if x86 had specified a "system tick"
> register that incremented based upon the system bus cycles and thus
> was immune to the processor rates.

It has - the local APIC timer. It has a tick register too that you can
read. Unfortunately it's buggy/unreliable on many systems. Linux uses
it for task scheduling and the local timer interrupt when it works,
but it's not really good enough for gettimeofday.

Microsoft/Intel have specified the HPET timer as replacement, but 
it is still missing in many chipsets and buggy in others.

Also reading HPET is somewhat more costly than reading TSCs because it
goes to the southbridge, so there are cases where using TSC is
probably better (e.g. I think for networking packet time stamping the
TSC is just fine with all its limitations)

> I foresee lots of patches coming which basically are "how does this
> x86 system provide a stable synchronized tick source".

From those who didn't implement HPET but some spec of their own, like IBM.

-Andi

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 21:28         ` Alan Cox
@ 2002-09-17 21:18           ` David S. Miller
  2002-09-17 22:02             ` James Cleverdon
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-09-17 21:18 UTC (permalink / raw)
  To: alan; +Cc: ak, linux-kernel, johnstul, anton.wilson

   From: Alan Cox <alan@lxorguk.ukuu.org.uk>
   Date: 17 Sep 2002 22:28:12 +0100

   A bus clock - but things like the x440 have more than one bus clock. Its
   NUMA. Also the bus clock and rdtsc clock are different - rdtsc is
   dependant on the multiplier. Shove a celeron 300 and a celeron 450 in a
   BP6 board with tsc on and enjoy
   
That's mostly my point.

If the bus clocks differ, then great, create some system-wide crystal
oscillator.  That's a detail; the important bit is that you don't need
to go out to the system bus to read the tick value. It must be cpu
local to be effective and without serious performance impact.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 20:54       ` David S. Miller
@ 2002-09-17 21:28         ` Alan Cox
  2002-09-17 21:18           ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Cox @ 2002-09-17 21:28 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, linux-kernel, johnstul, anton.wilson

On Tue, 2002-09-17 at 21:54, David S. Miller wrote:
> The cpu gets a bus clock input, so the system tick should be processor
> local as much as TSC is.
> 
> It's boggling that this is being messed up so much.  I can't believe
> Sun got something incredibly right (Ultra-III has a system tick) :-)

A bus clock - but things like the x440 have more than one bus clock. It's
NUMA. Also the bus clock and rdtsc clock are different - rdtsc is
dependent on the multiplier. Shove a celeron 300 and a celeron 450 in a
BP6 board with tsc on and enjoy.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 21:18           ` David S. Miller
@ 2002-09-17 22:02             ` James Cleverdon
  2002-09-17 22:44               ` Andi Kleen
  2002-09-18  6:40               ` Vojtech Pavlik
  0 siblings, 2 replies; 29+ messages in thread
From: James Cleverdon @ 2002-09-17 22:02 UTC (permalink / raw)
  To: David S. Miller, alan; +Cc: ak, linux-kernel, johnstul, anton.wilson

On Tuesday 17 September 2002 02:18 pm, David S. Miller wrote:
>    From: Alan Cox <alan@lxorguk.ukuu.org.uk>
>    Date: 17 Sep 2002 22:28:12 +0100
>
>    A bus clock - but things like the x440 have more than one bus clock. Its
>    NUMA. Also the bus clock and rdtsc clock are different - rdtsc is
>    dependant on the multiplier. Shove a celeron 300 and a celeron 450 in a
>    BP6 board with tsc on and enjoy
>
> That's mostly my point.
>
> If the bus clocks differ, then great create some system wide crystal
> oscillator.  That's a detail, the important bit is that you don't need
> to go out to the system bus to read the tick value, it must be cpu
> local to be effective and without serious performance impact.
> -

It's more than just a detail.  Sequent's last NUMA system (_not_ the NUMA-Q;  
never released) did exactly what you suggest.  The midplane card generated 
the bus clock for all quad modules.  We had requested this feature because it 
was such a pain dealing with clock drift between nodes in the OS.

The HW guys were able to give us synchronized bus clocks on a 16-way box, but 
warned us that it would not be practical on the 256-way.  Too much clock skew 
at those speeds, or something like that.  I suppose you could trade off 
interconnect rate for clock sync, but then performance would suffer.

I don't know how Sun and SGI manage with their larger systems.  Either they 
don't do clock sync, or they may have to make expensive tradeoffs.

Interestingly, Intel's IA64 manual does not guarantee that the CPU clock (and 
thus its TSC register) has anything to do with the bus clock rate.  Maybe 
they want to dabble with asynchronous logic or multiple clock domains in 
future CPUs.

Trivia:  NUMA-Q systems running Dynix/PTX can contain quads running at very 
different CPU speeds.  This made locating some race conditions quite easy.

-- 
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 22:44               ` Andi Kleen
@ 2002-09-17 22:38                 ` David S. Miller
  2002-09-17 22:55                   ` James Cleverdon
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-09-17 22:38 UTC (permalink / raw)
  To: ak; +Cc: jamesclv, alan, linux-kernel, johnstul, anton.wilson

   From: Andi Kleen <ak@suse.de>
   Date: Wed, 18 Sep 2002 00:44:42 +0200

   > I don't know how Sun and SGI manage with their larger systems.  Either they 
   > don't do clock sync, or they may have to make expensive tradeoffs.
   
   I guess you could always run NTP between the different CPUs ;) ;) 
   
:-)

More seriously, you don't need to have the cpu tick registers sync'd,
it is the rate that matters.

Once booted, you can sync these system tick registers with a pretty
straightforward algorithm in the kernel.  Bonus points if you can
figure out how to cancel out the cost of moving the system tick sample
cachelines between master and slave in your algorithm :-)
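
For illustration only, here is a minimal sketch of the kind of
master/slave sync described above. It is not code from this thread or
from the kernel. It assumes an NTP-style exchange in which the slave
reads its counter (t0), the master stamps its own counter on receipt
(t1) and on reply (t2), and the slave reads again (t3). The names
tick_sample, sample_delay, sample_offset and estimate_offset are
hypothetical. Picking the exchange with the smallest round trip is one
simple way to reduce the effect of variable cacheline transfer cost; it
only cancels the cost fully if both legs take the same time.

```c
#include <stdint.h>
#include <stddef.h>

/* One exchange between slave and master (all values are counter ticks). */
struct tick_sample {
    int64_t t0; /* slave counter when the request is sent     */
    int64_t t1; /* master counter when the request is seen    */
    int64_t t2; /* master counter when the reply is sent      */
    int64_t t3; /* slave counter when the reply is seen       */
};

/* Round trip seen by the slave, minus the time the master held the sample. */
static int64_t sample_delay(const struct tick_sample *s)
{
    return (s->t3 - s->t0) - (s->t2 - s->t1);
}

/* Estimated (master - slave) offset, assuming both legs cost the same. */
static int64_t sample_offset(const struct tick_sample *s)
{
    return ((s->t1 - s->t0) + (s->t2 - s->t3)) / 2;
}

/* Use the sample with the smallest round trip: it has the least room
 * for asymmetric transfer latency to distort the offset. Requires n >= 1. */
static int64_t estimate_offset(const struct tick_sample *s, size_t n)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++)
        if (sample_delay(&s[i]) < sample_delay(&s[best]))
            best = i;
    return sample_offset(&s[best]);
}
```

The slave would then add the returned offset to its own counter reads.
Because only the rate has to match, this needs doing once after boot,
provided the clocks really do tick at the same rate.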

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 22:02             ` James Cleverdon
@ 2002-09-17 22:44               ` Andi Kleen
  2002-09-17 22:38                 ` David S. Miller
  2002-09-18  6:40               ` Vojtech Pavlik
  1 sibling, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2002-09-17 22:44 UTC (permalink / raw)
  To: James Cleverdon
  Cc: David S. Miller, alan, ak, linux-kernel, johnstul, anton.wilson

> I don't know how Sun and SGI manage with their larger systems.  Either they 
> don't do clock sync, or they may have to make expensive tradeoffs.

I guess you could always run NTP between the different CPUs ;) ;) 

-Andi

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 22:38                 ` David S. Miller
@ 2002-09-17 22:55                   ` James Cleverdon
  2002-09-17 23:12                     ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: James Cleverdon @ 2002-09-17 22:55 UTC (permalink / raw)
  To: David S. Miller, ak; +Cc: alan, linux-kernel, johnstul, anton.wilson

On Tuesday 17 September 2002 03:38 pm, David S. Miller wrote:
>    From: Andi Kleen <ak@suse.de>
>    Date: Wed, 18 Sep 2002 00:44:42 +0200
>
>    > I don't know how Sun and SGI manage with their larger systems.  Either
>    > they don't do clock sync, or they may have to make expensive
>    > tradeoffs.
>
>    I guess you could always run NTP between the different CPUs ;) ;)
>
> :-)
>
> More seriously, you don't need to have the cpu tick registers sync'd,
> it is the rate that matters.
>
> Once booted, you can sync these system tick registers with a pretty
> straight forward algorithm in the kernel.  Bonus points if you can
> figure out how to cancel out the cost of moving the system tick sample
> cachelines between master and slave in your algorithm :-)

Been there.  Done that.  Had the product canceled.    ;^)

The initial sync was easy, even with variable latencies on cache lines.  A 
much simplified NTP-ish algorithm works fine.  The painful thing was bus 
clock drift and programs that foolishly relied on the TSC being the same 
between CPUs and between nodes.

-- 
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 22:55                   ` James Cleverdon
@ 2002-09-17 23:12                     ` David S. Miller
  2002-09-17 23:32                       ` john stultz
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-09-17 23:12 UTC (permalink / raw)
  To: jamesclv; +Cc: ak, alan, linux-kernel, johnstul, anton.wilson

   From: James Cleverdon <jamesclv@us.ibm.com>
   Date: Tue, 17 Sep 2002 15:55:52 -0700
   
   The initial sync was easy, even with variable latencies on cache lines.  A 
   much simplified NTP-ish algorithm works fine.  The painful thing was bus 
   clock drift and programs that foolishly relied on the TSC being the same 
   between CPUs and between nodes.

This is why the gettimeofday implementation should use the system tick
thing and also any profiling support in the C library should avoid
TSC as well.

For small stretches of code TSC can be used for very precise profiling
but otherwise it is pretty useless by and large.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 23:12                     ` David S. Miller
@ 2002-09-17 23:32                       ` john stultz
  2002-09-17 23:32                         ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: john stultz @ 2002-09-17 23:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: James, ak, Alan Cox, lkml, anton.wilson

On Tue, 2002-09-17 at 16:12, David S. Miller wrote:
>    From: James Cleverdon <jamesclv@us.ibm.com>
>    Date: Tue, 17 Sep 2002 15:55:52 -0700
>    
>    The initial sync was easy, even with variable latencies on cache lines.  A 
>    much simplified NTP-ish algorithm works fine.  The painful thing was bus 
>    clock drift and programs that foolishly relied on the TSC being the same 
>    between CPUs and between nodes.
> 
> This is why the gettimeofday implementation should use the system tick
> thing and also any profiling support in the C library should avoid
> TSC as well.

I think the point James is making is that on very large systems, you
will get system tick skew as well. On one system I know of, the bus
frequency is intentionally skewed slightly between nodes. This is what
causes the TSCs to skew, and I believe would also cause this "system
tick" to skew as well.

Additionally, where is this system tick thing? You make it sound like
it's a register in the cpu, and while the Ultra-III may have one, I'm
unaware of a system/bus tick register on intel chips. Is it in some
semi-documented MSR?

I apologize for being confused, I'm just not sure if you're criticizing
the code or the hardware.

thanks
-john 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 23:32                       ` john stultz
@ 2002-09-17 23:32                         ` David S. Miller
  2002-09-17 23:52                           ` Andi Kleen
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-09-17 23:32 UTC (permalink / raw)
  To: johnstul; +Cc: jamesclv, ak, alan, linux-kernel, anton.wilson

   From: john stultz <johnstul@us.ibm.com>
   Date: 17 Sep 2002 16:32:15 -0700
   
   Additionally, where is this system tick thing? You make it sound like
   its a register in the cpu, and while the Ultra-III may have one, I'm
   unaware of a system/bus tick register on intel chips. Is it in some
   semi-documented MSR?

It's in a register on Ultra-III.  The whole point of this
conversation, if you read my initial postings, is that
"this should have been specified in the x86 architecture"

I know full well it isn't currently :-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 23:52                           ` Andi Kleen
@ 2002-09-17 23:46                             ` David S. Miller
  2002-09-17 23:58                               ` Andi Kleen
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-09-17 23:46 UTC (permalink / raw)
  To: ak; +Cc: johnstul, jamesclv, alan, linux-kernel, anton.wilson

   From: Andi Kleen <ak@suse.de>
   Date: Wed, 18 Sep 2002 01:52:09 +0200

   On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote:
   > I know full well it isn't currently :-)
   
   Sorry, it's wrong. The x86 architecture has several such registers

Not in the processor, and not architecturally specified.

All of the things you list are in the scope of things outside
the cpu.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 23:58                               ` Andi Kleen
@ 2002-09-17 23:51                                 ` David S. Miller
  2002-09-18  0:05                                   ` Andi Kleen
  2002-09-19 11:20                                 ` Mikael Pettersson
  1 sibling, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-09-17 23:51 UTC (permalink / raw)
  To: ak; +Cc: johnstul, jamesclv, alan, linux-kernel, anton.wilson

   From: Andi Kleen <ak@suse.de>
   Date: Wed, 18 Sep 2002 01:58:38 +0200
   
   The local APIC timer is specified in the Intel Manual volume 3 for example.
   It's an optional feature (CPUID), but pretty much everyone has it.

Is it internal or external to the processor?  I.e. can it be in the
southbridge or something?  If yes, then I still hold my point.

You shouldn't have to PIO to get a reliable timer value.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 23:32                         ` David S. Miller
@ 2002-09-17 23:52                           ` Andi Kleen
  2002-09-17 23:46                             ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2002-09-17 23:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: johnstul, jamesclv, ak, alan, linux-kernel, anton.wilson

On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote:
>    From: john stultz <johnstul@us.ibm.com>
>    Date: 17 Sep 2002 16:32:15 -0700
>    
>    Additionally, where is this system tick thing? You make it sound like
>    its a register in the cpu, and while the Ultra-III may have one, I'm
>    unaware of a system/bus tick register on intel chips. Is it in some
>    semi-documented MSR?
> 
> It's in a register on Ultra-III.  The whole point of this
> conversation, if you read my initial postings, is that
> "this should have been specified in the x86 architecture"
> 
> I know full well it isn't currently :-)

Sorry, it's wrong. The x86 architecture has several such registers
(apic timers, 8253 timer, HPET [Microsoft requires this for new 
hardware that will be w*s certified]) 
They just all suck on various systems or in general. HPET is ok, 
but still not widespread enough.

-Andi


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 23:46                             ` David S. Miller
@ 2002-09-17 23:58                               ` Andi Kleen
  2002-09-17 23:51                                 ` David S. Miller
  2002-09-19 11:20                                 ` Mikael Pettersson
  0 siblings, 2 replies; 29+ messages in thread
From: Andi Kleen @ 2002-09-17 23:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, johnstul, jamesclv, alan, linux-kernel, anton.wilson

On Tue, Sep 17, 2002 at 04:46:49PM -0700, David S. Miller wrote:
>    From: Andi Kleen <ak@suse.de>
>    Date: Wed, 18 Sep 2002 01:52:09 +0200
> 
>    On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote:
>    > I know full well it isn't currently :-)
>    
>    Sorry, it's wrong. The x86 architecture has several such registers
> 
> Not in the processor, and not architectually specified.
> 
> All of the things you list are in the scope of things outside
> the cpu.

The local APIC timer is specified in the Intel Manual volume 3 for example.
It's an optional feature (CPUID), but pretty much everyone has it.

-Andi




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 23:51                                 ` David S. Miller
@ 2002-09-18  0:05                                   ` Andi Kleen
  2002-09-18  1:04                                     ` James Cleverdon
  2002-09-20 11:04                                     ` Maciej W. Rozycki
  0 siblings, 2 replies; 29+ messages in thread
From: Andi Kleen @ 2002-09-18  0:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, johnstul, jamesclv, alan, linux-kernel, anton.wilson

On Tue, Sep 17, 2002 at 04:51:31PM -0700, David S. Miller wrote:
>    From: Andi Kleen <ak@suse.de>
>    Date: Wed, 18 Sep 2002 01:58:38 +0200
>    
>    The local APIC timer is specified in the Intel Manual volume 3 for example.
>    It's an optional feature (CPUID), but pretty much everyone has it.
> 
> It is internal or external to the processor?  Ie. can it be in the
> southbridge or something?  If yes, then I still hold my point.

The local APIC is in the cpu.

-Andi


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-18  0:05                                   ` Andi Kleen
@ 2002-09-18  1:04                                     ` James Cleverdon
  2002-09-19 18:02                                       ` Andrea Arcangeli
  2002-09-20 11:04                                     ` Maciej W. Rozycki
  1 sibling, 1 reply; 29+ messages in thread
From: James Cleverdon @ 2002-09-18  1:04 UTC (permalink / raw)
  To: Andi Kleen, David S. Miller
  Cc: ak, johnstul, alan, linux-kernel, anton.wilson

On Tuesday 17 September 2002 05:05 pm, Andi Kleen wrote:
> On Tue, Sep 17, 2002 at 04:51:31PM -0700, David S. Miller wrote:
> >    From: Andi Kleen <ak@suse.de>
> >    Date: Wed, 18 Sep 2002 01:58:38 +0200
> >
> >    The local APIC timer is specified in the Intel Manual volume 3 for
> > example. It's an optional feature (CPUID), but pretty much everyone has
> > it.
> >
> > It is internal or external to the processor?  Ie. can it be in the
> > southbridge or something?  If yes, then I still hold my point.
>
> Local Apic is in the cpu.
>
> -Andi

I believe you gents are going off at a tangent.  Intel's current P4 manual 
says the local APIC timer is driven by the "bus clock".  For serial APICs 
that was doubtless the APIC serial bus clock, which almost always was derived 
from the system clock.  For P4 systems with the xAPIC in parallel mode, the 
only one available is the system bus.

If a multi-node system doesn't have synchronized bus clocks, it doesn't matter 
which one you use.  The time bases will drift relative to each other.

It's even worse when the "Frequency Spreading" BIOS option is turned on.  
Then, the bus clocks are deliberately offset by as much as half a megahertz 
(doubtless to pass FCC or equivalent emission certifications).
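
To put a number on that, a minimal sketch of the arithmetic follows. The
half-megahertz offset is the figure from the paragraph above; the
100 MHz nominal bus clock is only an assumed example value, and
drift_ns_per_sec is a hypothetical helper name.

```c
#include <stdint.h>

/* Drift, in nanoseconds per second of real time, between two counters
 * when one is clocked at nominal_hz and the other at
 * nominal_hz + offset_hz, and software treats both as nominal_hz. */
static int64_t drift_ns_per_sec(int64_t nominal_hz, int64_t offset_hz)
{
    return offset_hz * 1000000000LL / nominal_hz;
}
```

With an assumed 100 MHz bus and a 0.5 MHz offset,
drift_ns_per_sec(100000000, 500000) gives 5000000, i.e. the two time
bases move apart by about 5 ms for every second of run time.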

I don't know what Sun does with the Ultra SPARC 3's time counter.  Maybe they 
have a separate clock input for it that runs at 1 MHz so skew and 
distribution is no problem.  That's fine for Sun; they build their own CPUs 
and can put in whatever they want.  The rest of us have to work with what we 
get from the different manufacturers.  And, just about all of them use a 
value derived from the bus clock -- which might have drift in a multi-node 
system.

That's where a better abstraction of the timer hardware would come in handy.  
It would use the PIT or TSC for 99% of boxes, and switch to special code for 
the weird ones.
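
As a rough sketch of what such an abstraction could look like (purely
illustrative; the struct, its fields and pick_time_source are
hypothetical names, not an existing kernel interface), each piece of
timer hardware describes itself and the kernel picks the best one that
is usable on the box at hand:

```c
#include <stdint.h>
#include <stddef.h>

/* One candidate time base (PIT, TSC, or platform-specific code). */
struct time_source {
    const char *name;
    int rating;              /* higher is preferred                   */
    int (*usable)(void);     /* nonzero if trustworthy on this system */
    uint64_t (*read)(void);  /* current counter value                 */
};

/* Placeholder callbacks for illustration; real ones would probe and
 * read the hardware. */
static int stub_usable(void)    { return 1; }
static int stub_unusable(void)  { return 0; }
static uint64_t stub_read(void) { return 0; }

/* Return the highest-rated usable source, or NULL if none qualifies. */
static const struct time_source *
pick_time_source(const struct time_source *srcs, size_t n)
{
    const struct time_source *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!srcs[i].usable())
            continue;
        if (best == NULL || srcs[i].rating > best->rating)
            best = &srcs[i];
    }
    return best;
}
```

On a multi-node box with drifting TSCs, the TSC entry's usable() hook
would return 0 and the selection would fall back to the PIT or to the
special-case code.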

-- 
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 22:02             ` James Cleverdon
  2002-09-17 22:44               ` Andi Kleen
@ 2002-09-18  6:40               ` Vojtech Pavlik
  2002-09-19 18:04                 ` Andrea Arcangeli
  1 sibling, 1 reply; 29+ messages in thread
From: Vojtech Pavlik @ 2002-09-18  6:40 UTC (permalink / raw)
  To: James Cleverdon
  Cc: David S. Miller, alan, ak, linux-kernel, johnstul, anton.wilson

On Tue, Sep 17, 2002 at 03:02:04PM -0700, James Cleverdon wrote:
> On Tuesday 17 September 2002 02:18 pm, David S. Miller wrote:
> >    From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> >    Date: 17 Sep 2002 22:28:12 +0100
> >
> >    A bus clock - but things like the x440 have more than one bus clock. Its
> >    NUMA. Also the bus clock and rdtsc clock are different - rdtsc is
> >    dependant on the multiplier. Shove a celeron 300 and a celeron 450 in a
> >    BP6 board with tsc on and enjoy
> >
> > That's mostly my point.
> >
> > If the bus clocks differ, then great create some system wide crystal
> > oscillator.  That's a detail, the important bit is that you don't need
> > to go out to the system bus to read the tick value, it must be cpu
> > local to be effective and without serious performance impact.
> > -
> 
> It's more than just a detail.  Sequent's last NUMA system (_not_ the NUMA-Q;  
> never released) did exactly what you suggest.  The midplane card generated 
> the bus clock for all quad modules.  We had requested this feature because it 
> was such a pain dealing with clock drift between nodes in the OS.
> 
> The HW guys were able to give us synchronized bus clocks on a 16-way box, but 
> warned us that it would not be practical on the 256-way.  Too much clock skew 
> at those speeds, or something like that.  I suppose you could trade off 
> interconnect rate for clock sync, but then performance would suffer.
> 
> I don't know how Sun and SGI manage with their larger systems.  Either they 
> don't do clock sync, or they may have to make expensive tradeoffs.
> 
> Interestingly, Intel's IA64 manual does not guarantee that the CPU clock (and 
> thus its TSC register) has anything to do with the bus clock rate.  Maybe 
> they want to dabble with asynchronous logic or multiple clock domains in 
> future CPUs.

The point here is: You don't need a synchronized bus clock. You don't
need synchronized CPU clocks. You need a synchronized system-wide clock
that doesn't drive any bus or CPU, just a simple counter in every CPU
that you can read from inside the CPU. You can pull that pretty far and
to many CPUs. That's what I understand Sun does.

-- 
Vojtech Pavlik
SuSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 23:58                               ` Andi Kleen
  2002-09-17 23:51                                 ` David S. Miller
@ 2002-09-19 11:20                                 ` Mikael Pettersson
  2002-09-19 13:27                                   ` Alan Cox
  1 sibling, 1 reply; 29+ messages in thread
From: Mikael Pettersson @ 2002-09-19 11:20 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David S. Miller, johnstul, jamesclv, alan, linux-kernel, anton.wilson

Andi Kleen writes:
 > On Tue, Sep 17, 2002 at 04:46:49PM -0700, David S. Miller wrote:
 > >    From: Andi Kleen <ak@suse.de>
 > >    Date: Wed, 18 Sep 2002 01:52:09 +0200
 > > 
 > >    On Tue, Sep 17, 2002 at 04:32:46PM -0700, David S. Miller wrote:
 > >    > I know full well it isn't currently :-)
 > >    
 > >    Sorry, it's wrong. The x86 architecture has several such registers
 > > 
 > > Not in the processor, and not architectually specified.
 > > 
 > > All of the things you list are in the scope of things outside
 > > the cpu.
 > 
 > The local APIC timer is specified in the Intel Manual volume 3 for example.
 > It's an optional feature (CPUID), but pretty much everyone has it.

Except that like everything else related to the local APIC, you're at
the mercy of the competence (or lack thereof) of the BIOS implementors.
- There are plenty of laptops whose CPUs have local APICs but whose
  BIOSen go berserk if you enable it. There are also plenty of laptops
  that don't have one, since Intel removed it from many Mobile P6 CPUs.
- There are even some desktop boards with BIOS problems, including Intel's
  AL440LX on which Linux must stay away from the local APIC timer.

To assume the local APIC works on 686-class UP boxes is not realistic, alas.

/Mikael

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-19 11:20                                 ` Mikael Pettersson
@ 2002-09-19 13:27                                   ` Alan Cox
  2002-09-19 13:39                                     ` Mikael Pettersson
  2002-09-20 15:26                                     ` John Levon
  0 siblings, 2 replies; 29+ messages in thread
From: Alan Cox @ 2002-09-19 13:27 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: Andi Kleen, David S. Miller, johnstul, James Cleverdon,
	linux-kernel, anton.wilson

On Thu, 2002-09-19 at 12:20, Mikael Pettersson wrote:
>  > The local APIC timer is specified in the Intel Manual volume 3 for example.
>  > It's an optional feature (CPUID), but pretty much everyone has it.
> 
> Except that like everything else related to the local APIC, you're at
> the mercy of the competence (or lack thereof) of the BIOS implementors.
> - There are plenty of laptops whose CPUs have local APICs but whose
>   BIOSen go berserk if you enable it. There are also plenty of laptops

Frequently because we don't disable it again before any APM calls I
suspect. When a CPU goes into sleep mode you must disable PMC and local
apic timer interrupts.

>   that don't have one, since Intel removed it from many Mobile P6 CPUs.
> - There are even some desktop boards with BIOS problems, including Intel's
>   AL440LX on which Linux must stay away from the local APIC timer.
> 
> To assume the local APIC works on 686-class UP boxes is not realistic, alas.

Yep


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-19 13:27                                   ` Alan Cox
@ 2002-09-19 13:39                                     ` Mikael Pettersson
  2002-09-20 15:26                                     ` John Levon
  1 sibling, 0 replies; 29+ messages in thread
From: Mikael Pettersson @ 2002-09-19 13:39 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mikael Pettersson, Andi Kleen, David S. Miller, johnstul,
	James Cleverdon, linux-kernel, anton.wilson

Alan Cox writes:
 > On Thu, 2002-09-19 at 12:20, Mikael Pettersson wrote:
 > >  > The local APIC timer is specified in the Intel Manual volume 3 for example.
 > >  > It's an optional feature (CPUID), but pretty much everyone has it.
 > > 
 > > Except that like everything else related to the local APIC, you're at
 > > the mercy of the competence (or lack thereof) of the BIOS implementors.
 > > - There are plenty of laptops whose CPUs have local APICs but whose
 > >   BIOSen go berserk if you enable it. There are also plenty of laptops
 > 
 > Frequently because we don't disable it again before any APM calls I
 > suspect. When a CPU goes into sleep mode you must disable PMC and local
 > apic timer interrupts.

We do on sane boxes where the APM BIOS informs us before suspending.
E.g., on my ASUS P3B-F & P4T-E suspend works with local APIC enabled
because I hooked both the NMI watchdog and local APIC to the
PM system, so we disable before suspending and restore afterwards.

The problem is that some BIOSen don't post the suspend event to
our APM driver, so we fail to disable before suspend, and some BIOSen
(like the utter crap Dell put in the Inspiron) die on all entries to
the BIOS: pull the power cord -> #SMM event -> box crashes.

/Mikael

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-18  1:04                                     ` James Cleverdon
@ 2002-09-19 18:02                                       ` Andrea Arcangeli
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2002-09-19 18:02 UTC (permalink / raw)
  To: James Cleverdon
  Cc: Andi Kleen, David S. Miller, johnstul, alan, linux-kernel, anton.wilson

On Tue, Sep 17, 2002 at 06:04:33PM -0700, James Cleverdon wrote:
> have a separate clock input for it that runs at 1 MHz so skew and 

The clock input should be the same, or the counters can always drift out of
sync if you leave them running forever. The timer generation is an
analog thing, the reception is digital, so having a single timer
guarantees no counter skew.

If the precision we need from the timer driving gettimeofday were
1 Hz, so 1 tick per second, you could make it scale perfectly without
oscillations on a 256G box.

you simply can't do that with a < 1nanosecond tick period on more than a
few cpus, because of physics, or it happens what's been mentioned a
number of times on this thread (oscillations generated by the latency of
the signal delivery or further slowdown in accessing the information
with overhead in the interconnects).

The best hardware solution to this problem is to have two cpu registers
incremented by two timers. One is the regular cpu tick (TSC) that we
have today, which could even go away with asynchronous cpus. The other
would be the new "real time timer", a 10/100 kHz clock delivered to all
the cpus that increments an in-cpu-core counter (so that it can be read
from userspace too inside vgettimeofday, with extremely low latency
exactly like the current TSC, but driven by a secondary low-frequency
timer that tells us about the time changes). 10/100 usec should be much
more than enough margin to deliver this timer to all the hundred cpus
with a very small oscillation. And no software that I'm aware of needs a
time-of-day precision finer than 10/100 usec. An interrupt itself is
going to take some usec. A context switch is also going to take more
than 10 usec, and that's the important bit that guarantees gettimeofday
is monotonic: different threads can have a minor difference in their
perception of the time, dominated by the speed-of-light delivery of the
timer signal, but that's not a problem as long as it's monotonic.
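
The monotonicity argument above can be sketched in C. This is only an
illustration under stated assumptions: the names rt_ticks_to_usec and
rt_skew_is_monotonic_safe and the RT_CLOCK_HZ constant are hypothetical,
and 100 kHz is just one of the two rates suggested above. No such
counter exists in current x86 hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rate of the shared low-frequency "real time" clock. */
#define RT_CLOCK_HZ 100000u   /* 100 kHz, i.e. one tick every 10 usec */

/* Convert a tick count of the hypothetical counter to microseconds. */
uint64_t rt_ticks_to_usec(uint64_t ticks)
{
    /* Only exact when the rate divides one million evenly. */
    assert(1000000u % RT_CLOCK_HZ == 0);
    return ticks * (1000000u / RT_CLOCK_HZ);
}

/*
 * A thread can only compare readings taken on two different cpus after
 * it has migrated, which costs at least one context switch. Time looks
 * monotonic to it as long as the worst-case skew between any two cpus
 * is shorter than the shortest context switch.
 */
int rt_skew_is_monotonic_safe(uint64_t max_skew_usec,
                              uint64_t min_ctx_switch_usec)
{
    return max_skew_usec < min_ctx_switch_usec;
}
```

With a skew well below one tick period and a context switch above
10 usec, the check holds; it fails as soon as the skew reaches the
context switch time.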

The TSC and also the system clock mentioned by Dave are way too fast to
be kept synchronized on a NUMA machine without introducing significant
drifts and oscillations.

If somebody really needs 1 usec resolution, he will first need vsyscalls
to avoid kernel enter/exit latencies, and likely he will need to run
with iopl and irqs disabled, so it should be ok to use the TSC in that
case with a specialized hacked kernel config option (with all the
disclaimers that it would break if the cpu clock changes under you,
etc.). All mere mortals will be perfectly fine with a 100 kHz clock for
gettimeofday. If Sun did a 1 MHz clock to achieve the design suggested
above, then they did the optimal thing IMHO.

Another approach would be to use separate per-cpu timer sources and to
resynchronize them every once in a while, at regular intervals that
guarantee the drift never grows above half the time of the shortest
context switch. But that would need tedious software support with
knowledge of very low-level hardware information, so I'd definitely
prefer the previously mentioned solution, which requires all hardware
vendors to get it right or it won't work. That's like what is happening
now with the TSC, with the difference that the 100 kHz timer would be
doable, while the TSC at 2 GHz isn't.
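
For the resynchronization alternative, the required interval follows
from simple arithmetic. The sketch below assumes the relative drift
between two per-cpu sources is known and expressed in parts per million;
that parameter and the function name max_resync_interval_usec are
assumptions for illustration, not figures from this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Longest interval (in usec) between resynchronizations such that the
 * accumulated drift stays at or below half the shortest context switch.
 *
 * A drift of drift_ppm means the two sources diverge by drift_ppm usec
 * for every 1000000 usec of elapsed time, so:
 *
 *   interval = (min_ctx_switch_usec / 2) * 1000000 / drift_ppm
 */
uint64_t max_resync_interval_usec(uint64_t min_ctx_switch_usec,
                                  uint64_t drift_ppm)
{
    assert(drift_ppm > 0);
    return (min_ctx_switch_usec * 1000000u) / (2u * drift_ppm);
}
```

For example, with an assumed 10 usec context switch and 50 ppm of drift
this gives 100000 usec, i.e. a resync at least every 100 msec.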

Of course the cyclone timer and the HPET are the very next best thing
the hardware vendors could provide us on x86, and of course you cannot
do better than the cyclone and HPET without upgrading the cpu too,
because the cpu is simply missing a register to avoid hitting the
southbridge at every vgettimeofday. At least the good thing is that HPET
is mapped in an mmio region, so we don't need to enter the kernel, only
to access the southbridge from userspace, and that saves a number of
usec at every gettimeofday.

All of this assumes gettimeofday is an important operation and that an
additional cpu sequence counter and an additional numa-shared timer
would pay off by making gettimeofday as efficient and accurate as
possible on all classes of machines. It would also be an option to
replace the TSC with such a new "real time counter" if adding a new
counter is too expensive. The TSC is almost unusable in its current
too-high-frequency form; it is useful only for microbenchmarking, so
it's more a debugging facility than a production feature, while the
other would be a really useful feature, and not only for
debugging/benchmarking purposes.

Andrea

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-18  6:40               ` Vojtech Pavlik
@ 2002-09-19 18:04                 ` Andrea Arcangeli
  0 siblings, 0 replies; 29+ messages in thread
From: Andrea Arcangeli @ 2002-09-19 18:04 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: James Cleverdon, David S. Miller, alan, ak, linux-kernel,
	johnstul, anton.wilson

On Wed, Sep 18, 2002 at 08:40:22AM +0200, Vojtech Pavlik wrote:
> The point here is: You don't need a synchronized bus clock. You don't
> need synchronized CPU clocks. You need a synchronized system-wide clock
> that doesn't drive any bus or CPU, just a simple counter in every CPU
> that you can read from inside the CPU. You can pull that pretty far and

Exactly.

Andrea

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-18  0:05                                   ` Andi Kleen
  2002-09-18  1:04                                     ` James Cleverdon
@ 2002-09-20 11:04                                     ` Maciej W. Rozycki
  1 sibling, 0 replies; 29+ messages in thread
From: Maciej W. Rozycki @ 2002-09-20 11:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David S. Miller, johnstul, jamesclv, alan, linux-kernel, anton.wilson

On Wed, 18 Sep 2002, Andi Kleen wrote:

> > It is internal or external to the processor?  Ie. can it be in the
> > southbridge or something?  If yes, then I still hold my point.
> 
> Local Apic is in the cpu.

 Except when it's an i82489DX...  Rare but still.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-19 13:27                                   ` Alan Cox
  2002-09-19 13:39                                     ` Mikael Pettersson
@ 2002-09-20 15:26                                     ` John Levon
  1 sibling, 0 replies; 29+ messages in thread
From: John Levon @ 2002-09-20 15:26 UTC (permalink / raw)
  To: linux-kernel

On Thu, Sep 19, 2002 at 02:27:19PM +0100, Alan Cox wrote:

> > - There are plenty of laptops whose CPUs have local APICs but whose
> >   BIOSen go berserk if you enable it. There are also plenty of laptops
> 
> Frequently because we don't disable it again before any APM calls I
> suspect. When a CPU goes into sleep mode you must disable PMC and local
> apic timer interrupts.

Isn't this exactly what apic_pm_suspend() does? Or is that in 2.5 only?

regards
john
-- 
Support the project - http://www.gtonline.net/private/mapp/project/

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 20:39   ` David S. Miller
@ 2002-09-17 20:57     ` john stultz
  2002-09-17 20:56       ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: john stultz @ 2002-09-17 20:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: anton.wilson, lkml, george anzinger

On Tue, 2002-09-17 at 13:39, David S. Miller wrote:
>    From: john stultz <johnstul@us.ibm.com>
>    Date: 17 Sep 2002 13:29:18 -0700
>    
>    Some NUMA boxes do not have synced TSC, so on those systems your
>    code won't work.
> 
> It would have been really nice if x86 had specified a "system tick"
> register that incremented based upon the system bus cycles and thus
> was immune to the processor rates.

Some systems do, if I'm understanding you properly. Summit-based boxes
have an on-chipset performance counter that runs at 100 MHz. My
cyclone-timer patch uses this as a gettimeofday/__delay time source in
the 2.4 kernel. Additionally George Anzinger has patches that allow the
ACPI PM timer to be used as well. Intel's HPET should also provide
another time source.
 
> I foresee lots of patches coming which basically are "how does this
> x86 system provide a stable synchronized tick source".

True, but hopefully my timer-changes patch will allow for better
abstraction around these varied time sources, so one won't really need
to know how all of these different sources work. 
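
To give a rough idea of what such an abstraction could look like, here
is a minimal sketch. It is not the layout used by the timer-changes
patch or any other patch mentioned here; struct time_source, its fields,
time_source_elapsed_usec and the fake counter are all hypothetical names
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface: each hardware counter supplies these fields. */
struct time_source {
    const char *name;
    uint64_t hz;                /* counter frequency in Hz */
    uint64_t (*read)(void);     /* returns the raw counter value */
};

/*
 * Microseconds elapsed since the raw reading 'last', regardless of which
 * hardware sits behind 'ts'. Assumes a full 64-bit counter, so that the
 * unsigned subtraction stays correct across a wrap, and a delta small
 * enough that the multiplication does not overflow.
 */
uint64_t time_source_elapsed_usec(const struct time_source *ts,
                                  uint64_t last)
{
    assert(ts != NULL && ts->read != NULL && ts->hz > 0);
    uint64_t delta = ts->read() - last;
    return delta * 1000000u / ts->hz;
}

/* Stand-in for a 100 MHz chipset counter, for illustration only. */
uint64_t fake_counter_value;

uint64_t fake_counter_read(void)
{
    return fake_counter_value;
}
```

A caller would then only deal with struct time_source, e.g. one
initialized as { "fake", 100000000u, fake_counter_read }, and never with
the details of the underlying counter.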

thanks
-john





* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 20:57     ` john stultz
@ 2002-09-17 20:56       ` David S. Miller
  0 siblings, 0 replies; 29+ messages in thread
From: David S. Miller @ 2002-09-17 20:56 UTC (permalink / raw)
  To: johnstul; +Cc: anton.wilson, linux-kernel, george

   From: john stultz <johnstul@us.ibm.com>
   Date: 17 Sep 2002 13:57:13 -0700

   On Tue, 2002-09-17 at 13:39, David S. Miller wrote:
   > It would have been really nice if x86 had specified a "system tick"
   > register that incremented based upon the system bus cycles and thus
   > was immune to the processor rates.
   
   Some systems do, if I'm understanding you properly. Summit-based boxes
   have an on-chipset performance counter that runs at 100 MHz. My
   cyclone-timer patch uses this as a gettimeofday/__delay time source in
   the 2.4 kernel. Additionally George Anzinger has patches that allow the
   ACPI PM timer to be used as well. Intel's HPET should also provide
   another time source.

If any of these need to go beyond the cpu to get the tick value,
they are misimplemented.

The cpu gets the system bus tick input at its bus pins, therefore
it can implement the system tick register locally, obviating the need
to go to a south bridge or memory controller or whatever else external
to the cpu to get at the value.

* Re: do_gettimeofday vs. rdtsc in the scheduler
  2002-09-17 20:29 ` Fwd: " john stultz
@ 2002-09-17 20:39   ` David S. Miller
  2002-09-17 20:57     ` john stultz
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2002-09-17 20:39 UTC (permalink / raw)
  To: johnstul; +Cc: anton.wilson, linux-kernel

   From: john stultz <johnstul@us.ibm.com>
   Date: 17 Sep 2002 13:29:18 -0700
   
   Some NUMA boxes do not have synced TSC, so on those systems your
   code won't work.

It would have been really nice if x86 had specified a "system tick"
register that incremented based upon the system bus cycles and thus
was immune to the processor rates.

I foresee lots of patches coming which basically are "how does this
x86 system provide a stable synchronized tick source".

* do_gettimeofday vs. rdtsc in the scheduler
@ 2002-09-09 22:21 anton wilson
  0 siblings, 0 replies; 29+ messages in thread
From: anton wilson @ 2002-09-09 22:21 UTC (permalink / raw)
  To: linux-kernel



I'm writing a patch for the scheduler that allows normal processes to run 
occasionally even though real-time processes completely dominate the CPU. In 
order to do this the way I want to for a specific real-time application, I 
need to keep track of the times that the schedule(void) function gets called. 
This time is then used to calculate the time difference between when a normal 
process was run last and the current time. I was trying to avoid 
do_gettimeofday because of the overhead, but now I'm wondering if rdtsc on an 
SMP machine may mess up my readings because the TSC from two different 
processors may be read. Am I right in assuming this? Secondly, any good 
suggestions on how to proceed with my patch? 
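
One way to picture the concern about mixing TSC readings from two
processors is to record the cpu alongside each reading and refuse to
subtract across cpus. The following is a userspace-style sketch only:
struct sched_stamp and sched_stamp_delta are hypothetical names, the
code does not execute rdtsc itself, and what to do when the cpus differ
(for instance falling back to do_gettimeofday) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one timestamp taken in the scheduler. */
struct sched_stamp {
    int cpu;        /* cpu the counter was read on */
    uint64_t tsc;   /* raw cycle counter value from that cpu */
};

/*
 * Store now - prev in *delta and return 1 if both stamps come from the
 * same cpu. Return 0 and leave *delta untouched otherwise, since
 * unsynchronized TSCs from different cpus cannot be compared.
 */
int sched_stamp_delta(const struct sched_stamp *prev,
                      const struct sched_stamp *now, uint64_t *delta)
{
    assert(prev != NULL && now != NULL && delta != NULL);
    if (prev->cpu != now->cpu)
        return 0;
    *delta = now->tsc - prev->tsc;
    return 1;
}
```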


Thanks,

Anton

end of thread, other threads:[~2002-09-20 15:21 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <200209172020.g8HKKPF13227@eng2.beaverton.ibm.com.suse.lists.linux.kernel>
     [not found] ` <1032294559.22815.180.camel@cog.suse.lists.linux.kernel>
     [not found]   ` <20020917.133933.69057655.davem@redhat.com.suse.lists.linux.kernel>
2002-09-17 21:00     ` do_gettimeofday vs. rdtsc in the scheduler Andi Kleen
2002-09-17 20:54       ` David S. Miller
2002-09-17 21:28         ` Alan Cox
2002-09-17 21:18           ` David S. Miller
2002-09-17 22:02             ` James Cleverdon
2002-09-17 22:44               ` Andi Kleen
2002-09-17 22:38                 ` David S. Miller
2002-09-17 22:55                   ` James Cleverdon
2002-09-17 23:12                     ` David S. Miller
2002-09-17 23:32                       ` john stultz
2002-09-17 23:32                         ` David S. Miller
2002-09-17 23:52                           ` Andi Kleen
2002-09-17 23:46                             ` David S. Miller
2002-09-17 23:58                               ` Andi Kleen
2002-09-17 23:51                                 ` David S. Miller
2002-09-18  0:05                                   ` Andi Kleen
2002-09-18  1:04                                     ` James Cleverdon
2002-09-19 18:02                                       ` Andrea Arcangeli
2002-09-20 11:04                                     ` Maciej W. Rozycki
2002-09-19 11:20                                 ` Mikael Pettersson
2002-09-19 13:27                                   ` Alan Cox
2002-09-19 13:39                                     ` Mikael Pettersson
2002-09-20 15:26                                     ` John Levon
2002-09-18  6:40               ` Vojtech Pavlik
2002-09-19 18:04                 ` Andrea Arcangeli
     [not found] <200209172020.g8HKKPF13227@eng2.beaverton.ibm.com>
2002-09-17 20:29 ` Fwd: " john stultz
2002-09-17 20:39   ` David S. Miller
2002-09-17 20:57     ` john stultz
2002-09-17 20:56       ` David S. Miller
2002-09-09 22:21 anton wilson
