* [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence.
@ 2009-05-04 21:03 john stultz
  2009-05-05  0:27 ` Rik van Riel
  2009-05-06 14:46 ` George Spelvin
  0 siblings, 2 replies; 8+ messages in thread
From: john stultz @ 2009-05-04 21:03 UTC
  To: lkml; +Cc: Roman Zippel, Thomas Gleixner, Clark Williams, linux, Ulrich Windl

[-- Attachment #1: Type: text/plain, Size: 7853 bytes --]

With the v2.6.19 Linux kernel, Roman converted the in-kernel NTP code to
the NTPv4 reference model.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f19923937321244e7dc334767eb4b67e0e3d5c74

This was great, because along with his other patches in 2.6.19, the
in-kernel behavior better matched what the userland v4 ntp daemon (used
in most distributions) expects.

However, with this change, we also saw NTP convergence times increase.
This was noticed at the time:
http://lkml.indiana.edu/hypermail/linux/kernel/0609.2/1348.html

But as NTP still functioned (just slower), the explanation of improved
clock stability seemed fair:
http://lkml.indiana.edu/hypermail/linux/kernel/0609.3/0433.html


Since then, I've continued to hear some occasional grumbling on the
topic, but most folks seemed to be getting by ok.

However, more recently, as folks have been upgrading kernels, I've been
involved in a handful of customer issues relating to slow NTP
convergence. It turns out that the current stiffness of the NTP
convergence rate is actually causing problems with NTP clients staying
in close sync when thermal environments change (e.g., the AC kicking on).

Further, on a fresh system deploy, the current kernel can cause NTPd to
take over 12 hours to find the proper freq value to keep the box in
close sync. Additionally, if one is using the TSC, the TSC calibration
can often vary by ~30-60ppm from one boot to the next, which can cause
large time offsets until NTP converges on the new drift freq.

I started comparing NTP behavior between 2.6.18 and the current kernel,
and the difference in convergence time is very apparent.

In my test, I let the systems sync up, turned NTPd off, set the drift
value to -500ppm, and started NTPd up again. I then ran until NTP
converged and charted the results.

In the first attached image (2.6.30-unmodified.png) you can see 2.6.18's
behavior vs 2.6.30-rc1.

o 2.6.18 adjusts the drift value very quickly, and the offset converges
quickly as well.

o 2.6.30-rc1 is very slow in adjusting the drift freq, so the offset
grows and grows until it hits NTPd's step threshold and the clock is
stepped to the time server's time. This happens a few times until the
drift value is close enough that NTPd can slew the time into sync
without having to step the clock.
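For a sense of scale, here is a back-of-the-envelope sketch of how fast an uncorrected frequency error turns into a step. It assumes ntpd's default step threshold of 128 ms (the `tinker step` default) and ignores whatever correction NTPd applies along the way, so it only shows the worst case for a free-running clock; it is not a model of NTPd or of the kernel PLL.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ntpd's default step threshold ("tinker step") of 128 ms. */
#define STEP_THRESHOLD_NS 128000000LL

/*
 * Phase error (ns) built up by a free-running clock whose frequency is off
 * by freq_error_ppm, after `seconds` seconds, with no correction applied.
 * 1 ppm of frequency error is 1000 ns of time error per second.
 */
static int64_t accumulated_offset_ns(int64_t freq_error_ppm, int64_t seconds)
{
	return freq_error_ppm * 1000 * seconds;
}

/* Seconds until that uncorrected error reaches the step threshold (rounded up). */
static int64_t seconds_until_step(int64_t freq_error_ppm)
{
	int64_t ns_per_sec =
		(freq_error_ppm < 0 ? -freq_error_ppm : freq_error_ppm) * 1000;

	assert(ns_per_sec > 0);
	return (STEP_THRESHOLD_NS + ns_per_sec - 1) / ns_per_sec;
}
```

With the -500ppm drift used in the test above, seconds_until_step(500) is 256 s, a little over four minutes; with the ~30-60ppm TSC recalibration error mentioned earlier, it is roughly 36 to 71 minutes.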

I then modified the SHIFT_PLL value (which defines how stiff NTP's freq
correction is), setting it to 2, and created the second attached image
(2.6.30-pll2.png), comparing against the 2.6.18 numbers in the first
chart.

o Here we see 2.6.30-rc1 matches 2.6.18's behavior very closely.

Note: The freq corrections between kernel versions differ, so the ppm
lines in the chart are expected to be different. However, the rate of
change is very similar in both kernels.
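A toy illustration of what "stiffness" means here. The phase part loosely follows the patch's own comment (a fraction of time_offset is corrected each second); the exact 2^-(SHIFT_PLL + time_constant) fraction, and a PLL frequency gain proportional to 2^-2(SHIFT_PLL + time_constant), are assumptions based on the nanokernel model, not taken from this thread. FLL mode, fixed-point scaling and clamping are all left out, so only the relative numbers mean anything.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of phase amortization: each second, remove
 * remaining >> (shift_pll + time_constant) from the outstanding offset.
 * Returns the number of seconds until at most half of offset_ns remains.
 */
static int seconds_to_halve(int64_t offset_ns, int shift_pll, int time_constant)
{
	int shift = shift_pll + time_constant;
	int64_t remaining = offset_ns;
	int secs = 0;

	assert(offset_ns > 0 && shift >= 0 && shift < 63);
	while (remaining > offset_ns / 2) {
		int64_t step = remaining >> shift;

		if (step == 0)
			step = 1;	/* keep making progress on tiny remainders */
		remaining -= step;
		secs++;
	}
	return secs;
}

/*
 * Assumed PLL frequency gain ratio between two SHIFT_PLL values, if the
 * gain is proportional to 2^-(2 * (SHIFT_PLL + time_constant)).
 * time_constant cancels out of the ratio.
 */
static int64_t freq_gain_ratio(int old_shift_pll, int new_shift_pll)
{
	assert(old_shift_pll >= new_shift_pll &&
	       old_shift_pll - new_shift_pll < 31);
	return (int64_t)1 << (2 * (old_shift_pll - new_shift_pll));
}
```

With time_constant = 0, seconds_to_halve(1000000, 4, 0) returns 11 while seconds_to_halve(1000000, 2, 0) returns 3, and freq_gain_ratio(4, 2) is 16. Under these assumptions, dropping SHIFT_PLL from 4 to 2 amortizes phase errors roughly four times faster and answers a given offset with a sixteen-fold larger frequency correction.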

In reading up about the SHIFT_PLL value, I found the following
discussion:
https://lists.ntp.org/pipermail/hackers/2008-January/003487.html

There, David Mills clarified that the SHIFT_PLL value of 4 in the
nanokernel reference model was appropriate for other Unix systems with
HZ values of 100, and that it should be reduced as HZ increases.

Linux's in-kernel clock steering is for the most part HZ independent
(calculations are done at the second interval, and then spread out over
the "tick length" which may or may not be connected to HZ). So the rule
of thumb in the link above for setting SHIFT_PLL doesn't really apply.
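A simplified illustration of why that is: if the per-second adjustment is computed once and then spread evenly over however many ticks fall in that second, the total applied per second is the same for any HZ. This sketch is not the kernel's tick_length code (which works in NTP_SCALE_SHIFT fixed point); the function names are illustrative, and any integer-division remainder is simply folded into the last tick.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Length in ns of tick number i (0-based) when one second is split into
 * hz ticks and an adjustment of adj_ns ns is spread across them. The
 * remainder of the integer division goes into the final tick so that
 * nothing is lost.
 */
static int64_t tick_length_ns(int64_t hz, int64_t adj_ns, int64_t i)
{
	int64_t total = NSEC_PER_SEC + adj_ns;
	int64_t base = total / hz;

	assert(hz > 0 && i >= 0 && i < hz);
	return (i == hz - 1) ? total - base * (hz - 1) : base;
}

/* Sum of all tick lengths in one second. */
static int64_t second_length_ns(int64_t hz, int64_t adj_ns)
{
	int64_t sum = 0;

	for (int64_t i = 0; i < hz; i++)
		sum += tick_length_ns(hz, adj_ns, i);
	return sum;
}
```

For a 50000 ns adjustment, second_length_ns(100, 50000) and second_length_ns(1000, 50000) both come to 1000050000 ns; only the size of each individual tick changes with HZ.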

Thus I ran some experiments and established that SHIFT_PLL=2 gives
convergence behavior similar to kernels prior to 2.6.19. In testing
with a few customers, all of them reported much improved NTP
convergence times, as well as improved sync through environmental
temperature variations.

I also ran a fair number of tests with HZ=100, HZ=1000, and NO_HZ, as
well as with changes to various ntp.conf settings, and saw no regressions
in long-term clock stability.

As always, feedback or testing (especially on non-x86 arches) would be
greatly appreciated.

thanks
-john


The conversion to the NTPv4 reference model
(f19923937321244e7dc334767eb4b67e0e3d5c74) in 2.6.19 added nanosecond
resolution to the adjtimex interface, but also changed the "stiffness"
of the frequency adjustments, causing NTP convergence time to greatly
increase.

SHIFT_PLL, which sets the stiffness of the freq adjustments, was
designed to be inversely linked to HZ, and the reference value of 4 was
chosen for Unix systems using HZ=100. However, Linux's clock steering
code is mostly independent of HZ.

So this patch reduces the SHIFT_PLL value from 4 to 2, which causes NTPd
behavior to match kernels prior to 2.6.19, greatly reducing convergence
times, and improving close synchronization through environmental thermal
changes.

The patch also changes some l's to L's in nearby code to avoid
misreading 50l as 501.

Signed-off-by: John Stultz <johnstul@us.ibm.com>

diff --git a/include/linux/timex.h b/include/linux/timex.h
index aa3475f..0daf961 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -170,17 +170,37 @@ struct timex {
 #include <asm/timex.h>
 
 /*
- * SHIFT_KG and SHIFT_KF establish the damping of the PLL and are chosen
- * for a slightly underdamped convergence characteristic. SHIFT_KH
- * establishes the damping of the FLL and is chosen by wisdom and black
- * art.
+ * SHIFT_PLL is used as a dampening factor to define how much we
+ * adjust the frequency correction for a given offset in PLL mode.
+ * It is also used in dampening the offset correction, to define how
+ * much of the current value in time_offset we correct for each
+ * second. Changing this value changes the stiffness of the ntp
+ * adjustment code. A lower value makes it more flexible, reducing
+ * NTP convergence time. A higher value makes it stiffer, increasing
+ * convergence time, but making the clock more stable.
  *
- * MAXTC establishes the maximum time constant of the PLL. With the
- * SHIFT_KG and SHIFT_KF values given and a time constant range from
- * zero to MAXTC, the PLL will converge in 15 minutes to 16 hours,
- * respectively.
+ * In David Mills' nanokernel reference implementation SHIFT_PLL is 4.
+ * However, this seems to make convergence time much too long.
+ *
+ * https://lists.ntp.org/pipermail/hackers/2008-January/003487.html
+ *
+ * In the above mailing list discussion, it seems the value of 4
+ * was appropriate for other Unix systems with HZ=100, and that
+ * SHIFT_PLL should be decreased as HZ increases. However, Linux's
+ * clock steering implementation is HZ independent.
+ *
+ * Through experimentation, a SHIFT_PLL value of 2 was found to allow
+ * for fast convergence (very similar to the NTPv3 code used prior to
+ * v2.6.19), with good clock stability.
+ *
+ *
+ * SHIFT_FLL is used as a dampening factor to define how much we
+ * adjust the frequency correction for a given offset in FLL mode.
+ * In David Mills' nanokernel reference implementation SHIFT_FLL is 2.
+ *
+ * MAXTC establishes the maximum time constant of the PLL.
  */
-#define SHIFT_PLL	4	/* PLL frequency factor (shift) */
+#define SHIFT_PLL	2	/* PLL frequency factor (shift) */
 #define SHIFT_FLL	2	/* FLL frequency factor (shift) */
 #define MAXTC		10	/* maximum time constant (shift) */
 
@@ -192,10 +212,10 @@ struct timex {
 #define SHIFT_USEC 16		/* frequency offset scale (shift) */
 #define PPM_SCALE ((s64)NSEC_PER_USEC << (NTP_SCALE_SHIFT - SHIFT_USEC))
 #define PPM_SCALE_INV_SHIFT 19
-#define PPM_SCALE_INV ((1ll << (PPM_SCALE_INV_SHIFT + NTP_SCALE_SHIFT)) / \
+#define PPM_SCALE_INV ((1LL << (PPM_SCALE_INV_SHIFT + NTP_SCALE_SHIFT)) / \
 		       PPM_SCALE + 1)
 
-#define MAXPHASE 500000000l	/* max phase error (ns) */
+#define MAXPHASE 500000000L	/* max phase error (ns) */
 #define MAXFREQ 500000		/* max frequency error (ns/s) */
 #define MAXFREQ_SCALED ((s64)MAXFREQ << NTP_SCALE_SHIFT)
 #define MINSEC 256		/* min interval between updates (s) */
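For anyone checking the fixed-point macros touched by the l-to-L cleanup, a quick userspace round trip may help. This is a sketch: NTP_SCALE_SHIFT = 32 and NSEC_PER_USEC = 1000 are assumed to match the kernel headers, and scaled_to_freq() is an assumed reading of how PPM_SCALE_INV is meant to be applied (shift right by PPM_SCALE_INV_SHIFT, multiply, shift right by NTP_SCALE_SHIFT), not code copied from ntp.c.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s64;

/* Assumed to match the kernel headers. */
#define NSEC_PER_USEC		1000L
#define NTP_SCALE_SHIFT		32

/* As in include/linux/timex.h above. */
#define SHIFT_USEC 16		/* frequency offset scale (shift) */
#define PPM_SCALE ((s64)NSEC_PER_USEC << (NTP_SCALE_SHIFT - SHIFT_USEC))
#define PPM_SCALE_INV_SHIFT 19
#define PPM_SCALE_INV ((1LL << (PPM_SCALE_INV_SHIFT + NTP_SCALE_SHIFT)) / \
		       PPM_SCALE + 1)

/* adjtimex-style frequency (ppm << SHIFT_USEC) -> scaled ns/s. */
static s64 freq_to_scaled(s64 freq)
{
	return freq * PPM_SCALE;
}

/* Scaled ns/s -> adjtimex-style frequency, via the rounded-up inverse. */
static s64 scaled_to_freq(s64 scaled)
{
	assert(scaled >= 0);	/* keep the right shifts well-defined here */
	return ((scaled >> PPM_SCALE_INV_SHIFT) * PPM_SCALE_INV) >>
	       NTP_SCALE_SHIFT;
}
```

In the two cases tested (1 ppm and 500 ppm, i.e. MAXFREQ), the "+ 1" that rounds PPM_SCALE_INV up lets the value survive the round trip exactly; without it, both would come out one unit low.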


[-- Attachment #2: 2.6.30-unmodified.png --]
[-- Type: image/png, Size: 42868 bytes --]

[-- Attachment #3: 2.6.30-pll2.png --]
[-- Type: image/png, Size: 37120 bytes --]


* Re: [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence.
  2009-05-04 21:03 [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence john stultz
@ 2009-05-05  0:27 ` Rik van Riel
  2009-05-06 14:46 ` George Spelvin
  1 sibling, 0 replies; 8+ messages in thread
From: Rik van Riel @ 2009-05-05  0:27 UTC
  To: john stultz
  Cc: lkml, Roman Zippel, Thomas Gleixner, Clark Williams, linux, Ulrich Windl

john stultz wrote:
> With the v2.6.19 Linux kenrel, Roman converted the in-kernel NTP code to
> the NTPv4 reference model. 

> I then modified the SHIFT_PLL value (which defines how stiff NTP's freq
> correction is), setting it to 2, and created the second attached image
> (2.6.30-pll2.png), comparing against the 2.6.18 numbers in the first
> chart.

> Signed-off-by: John Stultz <johnstul@us.ibm.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.


* Re: [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence.
  2009-05-04 21:03 [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence john stultz
  2009-05-05  0:27 ` Rik van Riel
@ 2009-05-06 14:46 ` George Spelvin
  2009-05-06 15:44   ` John Stultz
  2009-05-07  9:14   ` Ulrich Windl
  1 sibling, 2 replies; 8+ messages in thread
From: George Spelvin @ 2009-05-06 14:46 UTC
  To: johnstul; +Cc: linux-kernel, linux, tglx, ulrich.windl, williams, zippel

> As always, feedback or testing (especially on non-x86 arches) would be
> greatly appreciated.

I've applied it to a couple of PPS-synchronized machines (where I had
already dropped the poll interval into the basement for better
convergence; I'll have to try reverting that), and it seems to be working.

One thing I just noticed, although I'm sure it has happened in the past,
is that there's a frequency jump each boot due to TSC recalibration.

		Machine 1	Machine 2
Old CPU MHz	2500.170	2500.138
Old NTP ppm	-24.63 +/-0.01	-30.27 +/-0.02

New CPU MHz	2500.176	2500.193
New NTP ppm	-22.26 +/-0.01	-8.20 +/-0.015 

"True" CPU MHz	2500.2316	2500.2136

I should look and see if there's an easy way to tighten that tolerance.
The current algorithm is fine for jiffies purposes, but a few seconds'
worth of background calibration would produce a more stable estimate.
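The two halves of the table are consistent with each other. Taking the "True" row as the oscillator's actual frequency, a calibration that comes out low makes the kernel's clock run fast, so NTP has to apply a correction of roughly (calibrated - true) / true. A small sketch of that arithmetic (the function name is made up for illustration):

```c
#include <assert.h>

/*
 * Frequency correction (ppm) NTP would need when the kernel's calibrated
 * TSC frequency differs from the true one. A low calibration makes the
 * clock run fast, so the result is negative.
 */
static double correction_ppm(double calibrated_mhz, double true_mhz)
{
	assert(true_mhz > 0.0);
	return (calibrated_mhz - true_mhz) / true_mhz * 1e6;
}
```

For example, correction_ppm(2500.138, 2500.2136) comes to about -30.24 ppm against the -30.27 ppm in the table, and Machine 2's calibration moving by 0.055 MHz between boots is about 22 ppm, matching its jump from -30.27 to -8.20 ppm.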


* Re: [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence.
  2009-05-06 14:46 ` George Spelvin
@ 2009-05-06 15:44   ` John Stultz
  2009-05-07  6:07     ` George Spelvin
  2009-05-07  9:14   ` Ulrich Windl
  1 sibling, 1 reply; 8+ messages in thread
From: John Stultz @ 2009-05-06 15:44 UTC
  To: George Spelvin; +Cc: linux-kernel, tglx, ulrich.windl, williams, zippel

On Wed, 2009-05-06 at 10:46 -0400, George Spelvin wrote:
> > As always, feedback or testing (especially on non-x86 arches) would be
> > greatly appreciated.
> 
> Applying it to a couple of PPS-synchronized machines (where I had already
> dropped the poll interval into the basement for better convergence, have
> to try reverting that) seems to be working.


Thanks for the testing!

> One thing I just noticed, although I'm sure it has happened in the past,
> is that there's a frequency jump each boot due to TSC recalibration.
> 
> 		Machine 1	Machine 2
> Old CPU MHz	2500.170	2500.138
> Old NTP ppm	-24.63 +/-0.01	-30.27 +/-0.02
> 
> New CPU MHz	2500.176	2500.193
> New NTP ppm	-22.26 +/-0.01	-8.20 +/-0.015 
> 
> "True" CPU MHz	2500.2316	2500.2136
> 
> I should look and see if there's an easy way to tighten that tolerance.
> The current algorithm is fine for jiffies purposes, but a few seconds'
> worth of background calibration would produce a more stable estimate.

Yeah. The TSC calibration issue has been around for a while. Although the
code has been tweaked recently, I'm not seeing much improvement in
consistency from reboot to reboot.

It's a hard trade-off between folks who need very quick boot times and
folks who want very stable and accurate clocks across reboots.

I'll be looking into the calibration code to see how much is needed to
bring the error rate down.

thanks
-john




* Re: [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence.
  2009-05-06 15:44   ` John Stultz
@ 2009-05-07  6:07     ` George Spelvin
  0 siblings, 0 replies; 8+ messages in thread
From: George Spelvin @ 2009-05-07  6:07 UTC
  To: johnstul, linux; +Cc: linux-kernel, tglx, ulrich.windl, williams, zippel

> Yea. The TSC calibration issue has been around for awhile. Although the
> code has been tweaked recently, I'm not seeing that much improved
> consistency from reboot to reboot.
> 
> Its a hard trade off for folks who need very quick boot time, vs folks
> that want very stable and accurate clocks across reboots.
> 
> I'll be looking into the calibration code to see how much is needed to
> bring the error rate down.

The fast estimate is very good.  Would it be possible to do a fine
calibration in parallel with the rest of the boot sequence?  Maybe until
the first adjtimex call?  Or is it essential that the numbers not
change?
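To put rough numbers on the longer-calibration idea: if the reference timer can only be resolved to one tick, the calibration's relative error is on the order of one reference tick divided by the calibration window. The sketch below assumes the PIT's 1193182 Hz input clock as the reference and counts only that quantization error; it ignores interrupt/SMI noise and is not the kernel's actual TSC calibration algorithm.

```c
#include <assert.h>

#define PIT_HZ 1193182.0	/* PIT input clock, assumed as the reference */

/*
 * Relative error, in ppm, caused by a one-tick uncertainty in the
 * reference timer over a calibration window of window_s seconds.
 */
static double calib_error_ppm(double window_s)
{
	assert(window_s > 0.0);
	return (1.0 / PIT_HZ) / window_s * 1e6;
}
```

A 50 ms window gives calib_error_ppm(0.05) of about 16.8 ppm, while a 5 s background calibration brings it down to about 0.17 ppm.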


* Re: [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence.
  2009-05-06 14:46 ` George Spelvin
  2009-05-06 15:44   ` John Stultz
@ 2009-05-07  9:14   ` Ulrich Windl
  2009-05-07 18:07     ` George Spelvin
  1 sibling, 1 reply; 8+ messages in thread
From: Ulrich Windl @ 2009-05-07  9:14 UTC
  To: George Spelvin; +Cc: linux-kernel, linux, tglx, ulrich.windl, williams, zippel

On 6 May 2009 at 10:46, George Spelvin wrote:

[...]
> One thing I just noticed, although I'm sure it has happened in the past,
> is that there's a frequency jump each boot due to TSC recalibration.
> 
> 		Machine 1	Machine 2
> Old CPU MHz	2500.170	2500.138
> Old NTP ppm	-24.63 +/-0.01	-30.27 +/-0.02
> 
> New CPU MHz	2500.176	2500.193
> New NTP ppm	-22.26 +/-0.01	-8.20 +/-0.015 
> 
> "True" CPU MHz	2500.2316	2500.2136
> 
> I should look and see if there's an easy way to tighten that tolerance.
> The current algorithm is fine for jiffies purposes, but a few seconds'
> worth of background calibration would produce a more stable estimate.

Hi,

IMHO, the "value" of calibration during boot is dubious (unless you just
reboot a "hot" machine): usually the machine is cold on reboot, and the
oscillators will drift a lot initially. Depending on your environment,
the temperature inside your box may rise by 10°C or more within a few
minutes.

Maybe an experimental feature "recalibrate anytime (and as long as you'd like)" 
would be interesting...

Regards,
Ulrich



* Re: [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence.
  2009-05-07  9:14   ` Ulrich Windl
@ 2009-05-07 18:07     ` George Spelvin
  2009-05-07 19:37       ` john stultz
  0 siblings, 1 reply; 8+ messages in thread
From: George Spelvin @ 2009-05-07 18:07 UTC
  To: linux, ulrich.windl; +Cc: linux-kernel, tglx, williams, zippel


> IMHO, the "value" of calibration during boot is dubious (unless you just
> reboot a "hot" machine): Usually the machine is cold on reboot, and the
> oscillators will drift a lot initially. Depending on your environment,
> the temperatures inside your box may rise about 10°C or more within a
> few minutes.

> Maybe an experimental feature "recalibrate anytime (and as long as you'd
> like)" would be interesting...

Well, a couple of points come to mind:
- Rebooting hot is not all that unusual.  Less so than with Windows, but
  some of us install test kernels from time to time. :-)
- If you're calibrating the CPU against the PIT, they're often derived
  via PLL from the same clock source, so the *ratio* is fixed, and thus
  the calibration won't drift at all.
- If there were some mechanism to make the divisor consistent across
  boots, that would be helpful.  Imagine a file in /sys that I could
  write with an absolute frequency at boot time as long as it was within
  1000 ppm (0.1%) of the kernel's measurement.

Thinking about it, the third option seems the most useful.  It's
basically a cleaned-up tickadj.
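The acceptance rule proposed above is simple to pin down. This sketch is purely hypothetical: no such /sys file or accept_freq_override() function exists in the kernel being discussed. It accepts a user-supplied absolute frequency only if it lies within 1000 ppm (0.1%) of the kernel's own measurement, using integer Hz to avoid floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: accept requested_hz only if it lies within
 * 1000 ppm (0.1%) of measured_hz. The condition
 *     |requested - measured| / measured <= 1000 / 1e6
 * is rewritten as |requested - measured| * 1000 <= measured so that it
 * stays in integer arithmetic.
 */
static bool accept_freq_override(uint64_t measured_hz, uint64_t requested_hz)
{
	uint64_t diff;

	assert(measured_hz > 0);
	diff = requested_hz > measured_hz ? requested_hz - measured_hz
					  : measured_hz - requested_hz;
	/* Far out of range; also keeps diff * 1000 from overflowing. */
	if (diff > measured_hz)
		return false;
	return diff * 1000 <= measured_hz;
}
```

For example, requesting 2500213600 Hz against a measured 2500193000 Hz (Machine 2's numbers) passes easily, since the two differ by only about 8 ppm.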


* Re: [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence.
  2009-05-07 18:07     ` George Spelvin
@ 2009-05-07 19:37       ` john stultz
  0 siblings, 0 replies; 8+ messages in thread
From: john stultz @ 2009-05-07 19:37 UTC
  To: George Spelvin; +Cc: ulrich.windl, linux-kernel, tglx, williams, zippel

On Thu, May 7, 2009 at 11:07 AM, George Spelvin <linux@horizon.com> wrote:
> - If there were some mechanism to make the divisor consistent across
>  boots, that would be helpful.  Imagine a file in /sys that I could
>  write with an absolute frequency at boot time as long as it was within
>  1000 ppm (0.1%) of the kernel's measurement.
>
> Thinking about it, the third option seems the most useful.  It's
> basically a cleaned-up tickadj.

Indeed. Something similar to the lpj= boot option might be nice. I'll
take a look at that.

thanks
-john


end of thread

Thread overview: 8+ messages
2009-05-04 21:03 [RFC][PATCH] Adjust SHIFT_PLL to improve NTP convergence john stultz
2009-05-05  0:27 ` Rik van Riel
2009-05-06 14:46 ` George Spelvin
2009-05-06 15:44   ` John Stultz
2009-05-07  6:07     ` George Spelvin
2009-05-07  9:14   ` Ulrich Windl
2009-05-07 18:07     ` George Spelvin
2009-05-07 19:37       ` john stultz
