linux-kernel.vger.kernel.org archive mirror
From: john stultz <johnstul@us.ibm.com>
To: Jesper Krogh <jesper@krogh.cc>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Len Brown <len.brown@intel.com>
Subject: Re: Linux 2.6.29-rc6
Date: Tue, 03 Mar 2009 14:22:49 -0800
Message-ID: <1236118969.6068.87.camel@localhost>
In-Reply-To: <49AD90E2.7050209@krogh.cc>

[-- Attachment #1: Type: text/plain, Size: 4908 bytes --]

On Tue, 2009-03-03 at 21:19 +0100, Jesper Krogh wrote:
> john stultz wrote:
> > On Tue, 2009-03-03 at 07:04 +0100, Jesper Krogh wrote:
> >> john stultz wrote:
> >>> On Mon, 2009-03-02 at 10:53 +0100, Jesper Krogh wrote:
> >>>> john stultz wrote:
> > >>>>> Ok, so it seems ntp hasn't really had a chance to settle down; it's only
> >>>>> made a 10ppm adjustment so far. NTPd will stop corrections at ~
> >>>>> +/-500ppm, so you're not at that bound yet, where things would be really
> >>>>> broken.
> >>>>>
> >>>>> If the affected kernel isn't resetting in the logs anymore, I'd be
> >>>>> interested in what the new ppm value is.
> > >>>> After 20 hours... it's still resetting.
> >>>> Mar  2 10:43:24 quad12 ntpd[4416]: synchronized to 10.194.133.12, stratum 4
> >>>> Mar  2 10:50:37 quad12 ntpd[4416]: time reset -1.103654 s
> >>> So what's the "ntpdc -c kerninfo" output now?
> >> Mar  3 06:41:10 quad12 ntpd[4416]: time reset -0.813957 s
> >> Mar  3 06:45:20 quad12 ntpd[4416]: synchronized to LOCAL(0), stratum 13
> >> Mar  3 06:45:36 quad12 ntpd[4416]: synchronized to 10.194.133.12, stratum 4
> >> Mar  3 06:51:57 quad12 ntpd[4416]: synchronized to 10.194.133.13, stratum 4
> >> Mar  3 07:00:29 quad12 ntpd[4416]: time reset -0.783390 s
> >> jk@quad12:~$ ntpdc -c kerninfo
> >> pll offset:           0 s
> >> pll frequency:        -28.691 ppm
> > 
> > 
> > This is baffling. You've only gone from -34.754ppm to -28.691ppm in over
> > a day? And you're still not syncing? If the calibration was so bad that
> > NTP couldn't sync, I'd expect the freq value to hit +/-500ppm before it
> > gave up. This just doesn't follow my expectations.
> 
> It's resetting... without deep knowledge about ntp, doesn't that mean 
> "start over again"? I believe it hits +/-500ppm

No, the "time reset" message means that the offset was larger than
0.125 s (the slew boundary), so NTPd corrected it by calling
settimeofday instead of slewing the clock.
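As a rough illustration of that rule, here is a minimal Python sketch that classifies an offset as a step or a slew. The 0.125 s threshold is the slew boundary quoted above; the helper name is hypothetical and this is not ntpd's actual code.

```python
# Hypothetical helper mimicking the step-vs-slew decision described above;
# not taken from ntpd's source.
SLEW_BOUNDARY_S = 0.125  # slew boundary quoted in the text


def correction_mode(offset_s, boundary_s=SLEW_BOUNDARY_S):
    """Return 'step' if the offset would be fixed with settimeofday, else 'slew'."""
    if abs(offset_s) > boundary_s:
        return "step"  # large offset: jump the clock (logged as "time reset")
    return "slew"      # small offset: hand it to the kernel via adjtimex()


# The offsets from the "time reset" log lines quoted above.
for off in (-1.103654, -0.813957, -0.783390):
    print(off, correction_mode(off))
```

All three logged offsets are well past the boundary, which is why each one shows up as a reset rather than a slew.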

Here's some background on how NTP and the kernel interact:
Every time NTPd calls adjtimex(), it provides the current offset from
the tracked NTP server. The kernel applies a temporary correction
factor to the clocksource frequency to converge that offset. It also
dampens the provided offset and uses the result to adjust the
frequency value. Once the freq value hits the maximum adjustment
(+/-500ppm), NTP will start throwing error messages and give up.
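To make that feedback loop concrete, here is a toy Python model. It is not the kernel's NTP code: the gains (phase_gain, freq_gain) and the 16 s poll interval are made-up assumptions, and only the +/-500ppm clamp comes from the description above. It shows the frequency correction settling to cancel a modest natural drift, and pinning at the limit when the drift is larger than 500ppm.

```python
# Toy model of the adjtimex() feedback loop; the gains and poll interval
# are hypothetical, only the +/-500 ppm limit comes from the text.
MAX_FREQ_PPM = 500.0


def simulate(natural_drift_ppm, polls=100, poll_interval_s=16.0,
             phase_gain=0.5, freq_gain=0.1):
    """Return (freq_ppm, offset_s) after `polls` rounds of correction.

    natural_drift_ppm: rate of the uncorrected clock relative to the server.
    """
    freq_ppm = 0.0  # accumulated frequency correction
    offset_s = 0.0  # local clock minus server, in seconds
    for _ in range(polls):
        # The dampened offset nudges the frequency, clamped at +/-500 ppm.
        freq_ppm -= freq_gain * offset_s / poll_interval_s * 1e6
        freq_ppm = max(-MAX_FREQ_PPM, min(MAX_FREQ_PPM, freq_ppm))
        # Temporary correction: slew away part of the current offset.
        offset_s *= 1.0 - phase_gain
        # The clock then runs for one poll at its corrected rate.
        offset_s += (natural_drift_ppm + freq_ppm) * 1e-6 * poll_interval_s
    return freq_ppm, offset_s


for drift in (30.0, 600.0):
    freq, offset = simulate(drift)
    print(f"drift {drift:6.1f} ppm -> freq {freq:8.3f} ppm, offset {offset:.6f} s")
```

With these assumed gains, simulate(30.0) ends with the freq value at about -30ppm and the offset near zero, while simulate(600.0) leaves the freq value stuck at -500ppm with a residual offset that never goes away.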

What is so odd about your data is that the freq value isn't
changing very much. After a time reset, I'd expect to see adjustments
on the order of 100us, then multiple ms, and only once we got above
100ms would I expect another time reset. All the while, these
adjustments should be tweaking the freq value, causing the clocks to
converge.

The one case I can think of that could cause this is if the drift is
somehow jumping past the slew boundary before NTPd actually makes any
adjtimex calls, so we end up with minimal correction to the freq value,
but even that doesn't completely jibe with the data.
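Some back-of-the-envelope arithmetic helps here. A minimal sketch, assuming the clock drifts at a constant rate and nothing slews it in between: the time for a given drift to carry the offset past the 0.125 s boundary, and the drift implied by an offset that accumulated over a known interval. Both helper names are hypothetical.

```python
# Hypothetical helpers; both assume a constant drift with no slewing.
SLEW_BOUNDARY_S = 0.125  # slew boundary quoted in the text


def seconds_to_boundary(drift_ppm, boundary_s=SLEW_BOUNDARY_S):
    """Seconds for a constant drift (ppm) to build up boundary_s of offset."""
    return boundary_s / (abs(drift_ppm) * 1e-6)


def implied_drift_ppm(offset_s, elapsed_s):
    """Constant drift (ppm) that would accumulate offset_s over elapsed_s."""
    return offset_s / elapsed_s * 1e6


# 06:41:10 -> 07:00:29 in the log above is 19 min 19 s = 1159 s.
print(f"{seconds_to_boundary(30.0):.0f} s to drift 0.125 s at 30 ppm")
print(f"{implied_drift_ppm(-0.783390, 1159):.1f} ppm implied by the second reset")
```

For example, a 30ppm drift takes about 4167 s (over an hour) to reach 0.125 s. By contrast, the resets at 06:41:10 and 07:00:29 are only 1159 s apart; assuming the clock was stepped to zero at the first and not slewed afterwards, -0.783390 s over that span works out to roughly 676ppm in magnitude, which would be more than the +/-500ppm NTPd can correct.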


> > Could you provide:
> > /usr/sbin/ntpdc -c version
> 
> $ ntpdc -c version
> ntpdc 4.2.4p4@1.1520-o Tue Jan  6 15:51:00 UTC 2009 (1)
> 
> > Do you see the same behavior if you drop all but one server (including
> > the local clock: 127.127.1.0)? 
> > 
> > You might also add "minpoll 4 maxpoll 4" to the server line to speed up
> > testing.
> 
> Will try those options while debugging.
> 
> > Actually, if you could, I'd be interested if you could send your
> > ntp.conf 
> 
> http://krogh.cc/~jesper/ntp.conf

Cool, I see you're collecting stats already. Depending on the results of
the tests above I may want to check those out as well.

> But this seems to be a "regression", since 2.6.27.19 doesn't misbehave. 
> Same NTP, same configuration, same hardware; the only change is the kernel 
> version. Or am I missing some parameter here?
> 
> Would it make sense to try to bisect it?

Well, I suspect you'll just bisect it to the fast-PIT TSC calibration
change, which causes a different correction freq to be needed for synchronization.
The odd part is that the userland NTPd isn't behaving as I'd expect if
the TSC calibration was really so bad that NTP couldn't handle it.

Bisection may be something worth trying just to verify or disprove that
theory, so if you have the time, it would be interesting to see. But if
the theory is true then we're back to the same spot.

I guess one way to test my idea above (that the drift is bad enough
that NTPd isn't making slew adjustments via the adjtimex offset) is to
remove NTPd from the init.d startup.

Then after rebooting (into 2.6.29), run the attached python script for
10 minutes or so to get an idea of the ppm drift. Then repeat with
2.6.26.

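To run: 
./drift-test.py [-s] <ntp server> [sleep seconds]

(-s first steps the clock to the server with ntpdate; the sleep
interval defaults to 60 seconds.)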

It will give some wild ppm numbers, but after a few minutes it should
settle down to the "natural drift" of the system.
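One likely reason the early numbers are wild is that drift-test.py divides the offset change by the time elapsed since the first sample, so measurement jitter dominates at first (a 1 ms error over 60 s is already about 17ppm). If you log the offsets, one alternative, swapped in here rather than taken from the script, is to fit a least-squares line to all the samples and use its slope as the drift estimate. A minimal sketch with a hypothetical helper name:

```python
# Hypothetical helper: least-squares slope of offset vs. elapsed time.
def fit_drift_ppm(samples):
    """Return the drift in ppm from (elapsed_seconds, offset_seconds) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    # Covariance of (time, offset) over variance of time gives the slope.
    num = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct times")
    return num / den * 1e6  # seconds per second -> parts per million


# Synthetic example: a clock gaining 30 us every second, sampled each minute.
samples = [(t, 0.01 + 30e-6 * t) for t in range(0, 601, 60)]
print(f"{fit_drift_ppm(samples):.3f} ppm")
```

Because every sample contributes to the fit, a single noisy offset moves the estimate much less than it moves the script's endpoint ratio.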

thanks
-john


[-- Attachment #2: drift-test.py --]
[-- Type: text/x-python, Size: 1523 bytes --]

#!/usr/bin/env python3

# Time Drift Script
#		Periodically checks and displays time drift
#		by john stultz (jstultz@us.ibm.com)
#
# Usage: ./drift-test.py [-s] <ntp server> [sleep seconds]
#	-s	step the clock to the server with ntpdate before measuring

import subprocess
import sys
import time

server_default = "yourserverhere"
sleep_time_default = 60

server = ""
sleep_time = 0
set_time = False

# parse args
for arg in sys.argv[1:]:
	if arg == "-s":
		set_time = True
	elif server == "":
		server = arg
	elif sleep_time == 0:
		sleep_time = int(arg)

if server == "":
	server = server_default
if sleep_time == 0:
	sleep_time = sleep_time_default


def query_offset(server):
	# Ask the server for our offset without touching the clock (-q).
	# ntpdate's last line ends in "offset <seconds> sec", so the
	# second-to-last token is the offset in seconds.
	out = subprocess.getoutput('/usr/sbin/ntpdate -uq ' + server)
	return float(out.split()[-2])


# set time
if set_time:
	subprocess.getoutput('/usr/sbin/ntpdate -ub ' + server)

# record the original offset and time
start_offset = query_offset(server)
start_time = time.time()

time.sleep(1)
while True:
	now_offset = query_offset(server)
	now_time = time.time()

	# calculate drift: offset change per elapsed second, in ppm
	delta_time = now_time - start_time
	delta_offset = now_offset - start_offset
	drift = delta_offset / delta_time * 1000000

	# print output
	print(time.strftime("%d %b %H:%M:%S", time.localtime(now_time)),
		"\toffset:", now_offset,
		"\tdrift:", drift, "ppm", flush=True)

	# sleep
	time.sleep(sleep_time)
