linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: john stultz <johnstul@us.ibm.com>
To: Valdis.Kletnieks@vt.edu
Cc: Roman Zippel <zippel@linux-m68k.org>,
	Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.17-mm2 hrtimer code wedges at boot?
Date: Wed, 05 Jul 2006 17:51:54 -0700	[thread overview]
Message-ID: <1152147114.24656.117.camel@cog.beaverton.ibm.com> (raw)
In-Reply-To: <200607050429.k654TXUr012316@turing-police.cc.vt.edu>

On Wed, 2006-07-05 at 00:29 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 03 Jul 2006 03:13:39 +0200, Roman Zippel said:
> > Hi,
> > 
> > On Fri, 30 Jun 2006, Valdis.Kletnieks@vt.edu wrote:
> > 
> > > *AHA* I *found* the bugger, I think.
> > > 
> > > In kernel/timer.c, we have:
> > > 
> > > static void clocksource_adjust(struct clocksource *clock, s64 offset)
> > > (s64 used for offset in multiple places).
> > > 
> > > However, in other places, offset is a 'cycle_t', which is:
> > > 
> > > include/linux/clocksource.h:typedef u64 cycle_t;
> > > 
> > > So it looks like a signed/unsigned screwage.
> > 
> > It's a possibility, but the signed/unsigned usage is pretty much 
> > intentional. The assumptation is that time only goes forward so nothing 
> > there should become negative, only adjustments happen in both directions.
> > It's really weird why it's getting completely so out of control early 
> > during boot. It would be great if you could also test the patch below, it 
> > should trigger closer to when it goes wrong and help to analyze the 
> > problem (or at least rule out a number of possibilities).
> 
> Here you go.. For what it's worth, your debugging in clocksource_adjust seems
> to only pop before we start userspace, and get_realtime_clock_ts only once
> userspace starts.

Once again, thanks for the testing! My observations below...

> The dmesg, with all the suggested patches so far applied.  Looks like something
> starts off uninitialized - we get the first 'big adj' squawk right after we
> allocate the console - we don't allocate the tsc timesource for another 4
> seconds or so.
> 
> I'll bite - what *am* I using as a timesource for those first 4 seconds? :)

The jiffies clocksource.

> [    0.000000] Detected 1595.408 MHz processor.
...
>[   24.322196] CPU: Intel(R) Pentium(R) 4 Mobile CPU 1.60GHz stepping 04
...
> [   29.528533] Time: tsc clocksource has been installed.
> [   29.552855] clock changed at -296333 (4294314460971008)
> [   29.577109] clock tsc: m:2628985,s:22,cl: ,ci:1595166,xn:0,xi:4193667486510,e:0

Ok, so here's our initial TSC state:

Verify the mult/shift pair:
2^s/m = 2^22/2628985 = 1.595408113777750729 cyc/ns => 1.595 GHz

Verify the cycle_interval/xtime_interval pair:
xi = ci*m = 1595166 * 2628985 = 4193667486510

Convert xi to ns:
xi>>s = 4193667486510>>22 = 999848.2433581352234 ns/interval

Convert ntp_tick to ns:
ntp_tick>>32 = 4294314460971008>>32 = 999848 ns/tick

Ok, that all looks pretty good...


> [   29.601869] big adj at -296332 (4294314460971008,-16,-25522656,-11031712)
> [   29.626688] clock tsc: m:2628985,s:22,cl:47288392250,ci:1595166,xn:148610636380190,xi:4193667486510,e:-76300711936

Now here's where things turn odd. Note that only one jiffy has passed
(-296332 - -296333 = 1).

However, looking at the difference between cycle_last:
47288392250 - 47171945132 = 116,447,118

That's *way* larger then the 1,595,166 value expected in ci!

Same thing is seen in the later data points:

47452694348 - 47368150550 = 84,543,798
47538833312 - 47452694348 = 86,138,964
etc.

So it seems either something is causing you to take interrupts at a
lower frequency then what is expected, or your cpu is ~50x faster then
advertised :)

This is probably not an issue w/ the timekeeping code, however as a
side-effect it appears to make the clocksource_adjust function oscillate
pretty severely. I've reproduced a similar hang (not completely sure, as
it occurred while X was loading) by adding the following to the top of
update_wall_time:

	static int droptick;
	if(droptick++%60)
		return;

Roman: While I'm not 100% confident about my assessment above, I worry
this is mimicking the problems I had been seeing in my simulator w/ your
clocksource_adjustment algorithm when I simulated dropping many ticks.
While currently this behavior points to some other problem, with the
dynticks patch, its much more likely that we will see 100s of ticks
skipped.

I quickly revived my P-D adjustment patch and it does not appear to
suffer from the same problem with the above droptick change (although
its only been lightly tested). 

I realize you may have a more trivial change to this issue, but would
you consider my method again?

Vladis: Mind trying the following patch to see if it affects the
behavior.

thanks
-john


Implement P-D control for clocksource_adjust()

diff --git a/kernel/timer.c b/kernel/timer.c
index 396a3c0..f4e7681 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1007,81 +1007,108 @@ static int __init timekeeping_init_devic
 
 device_initcall(timekeeping_init_device);
 
-/*
- * If the error is already larger, we look ahead another tick,
- * to compensate for late or lost adjustments.
- */
-static __always_inline int clocksource_bigadjust(int sign, s64 error, s64 *interval, s64 *offset)
+static int error_aproximation(u64 error, u64 unit, int max)
 {
-	int adj;
-
-	/*
-	 * As soon as the machine is synchronized to the external time
-	 * source this should be the common case.
-	 */
-	error >>= 2;
-	if (likely(sign > 0 ? error <= *interval : error >= *interval))
-		return sign;
-
-	/*
-	 * An extra look ahead dampens the effect of the current error,
-	 * which can grow quite large with continously late updates, as
-	 * it would dominate the adjustment value and can lead to
-	 * oscillation.
-	 */
-	error += current_tick_length() >> (TICK_LENGTH_SHIFT - clock->shift + 1);
-	error -= clock->xtime_interval >> 1;
-
-	adj = 0;
+	int adj = 0;
 	while (1) {
 		error >>= 1;
-		if (sign > 0 ? error <= *interval : error >= *interval)
-			break;
-		adj++;
+		if (error <= unit)
+			return adj;
+		if (!max || adj < max)
+			adj++;
 	}
-
-	/*
-	 * Add the current adjustments to the error and take the offset
-	 * into account, the latter can cause the error to be hardly
-	 * reduced at the next tick. Check the error again if there's
-	 * room for another adjustment, thus further reducing the error
-	 * which otherwise had to be corrected at the next update.
-	 */
-	error = (error << 1) - *interval + *offset;
-	if (sign > 0 ? error > *interval : error < *interval)
-		adj++;
-
-	*interval <<= adj;
-	*offset <<= adj;
-	return sign << adj;
 }
+#define MAXOFFADJ 4 /* vary max oscillation vs convergance speed */
 
 /*
  * Adjust the multiplier to reduce the error value,
  * this is optimized for the most common adjustments of -1,0,1,
  * for other values we can do a bit more work.
  */
-static void clocksource_adjust(struct clocksource *clock, s64 offset)
+static void clocksource_adjust(struct clocksource *clock, s64 offset,
+				s64 interval_cycs, s64 interval_error)
 {
 	s64 error, interval = clock->cycle_interval;
-	int adj;
-
-	error = clock->error >> (TICK_LENGTH_SHIFT - clock->shift - 1);
-	if (error > interval) {
-		adj = clocksource_bigadjust(1, error, &interval, &offset);
-	} else if (error < -interval) {
-		interval = -interval;
-		offset = -offset;
-		adj = clocksource_bigadjust(-1, error, &interval, &offset);
-	} else
-		return;
+	
+	error = shift_right(clock->error, (TICK_LENGTH_SHIFT - clock->shift));
+	interval_error = shift_right(interval_error, 
+					(TICK_LENGTH_SHIFT - clock->shift));
+
+	if ((error > interval)
+		||(error < -(interval)) ) {
+
+		int adj, multadj = 0;
+		s64 offset_update = 0, snsec_update = 0;
+		
+		/* First do the frequency adjustment:
+		 *   The idea here is to look at the error 
+		 *   accumulated since the last call to 
+		 *   update_wall_time to determine the 
+		 *   frequency adjustment needed so no new
+		 *   error will be incurred in the next
+		 *   interval.
+		 *
+		 *   This is basically derivative control
+		 *   using the PID terminology (we're calculating
+		 *   the derivative of the slope and correcting it).
+		 *
+		 *   The math is basically:
+		 *      multadj = interval_error/interval_cycles
+		 *   Which we fudge using binary approximation.
+		 */
+		if(interval_error >= 0) {
+			adj = error_aproximation(interval_error,
+						interval_cycs, 0);
+			multadj += 1 << adj;
+			snsec_update += interval << adj;
+			offset_update += offset << adj;
+		} else {
+			adj = error_aproximation(-interval_error, 
+						interval_cycs, 0);
+			multadj -= 1 << adj;
+			snsec_update -= interval << adj;
+			offset_update -= offset << adj;
+	        }
+		/* Now do the offset adjustment:
+		 *   Now that the frequncy is fixed, we
+		 *   want to look at the total error accumulated
+		 *   to move us back in sync using the same method.
+		 *   However, we must be careful as if we make too 
+		 *   sudden an adjustment we might overshoot. So we 
+		 *   limit the amount of change to spread the 
+		 *   adjustment (using MAXOFFADJ) over a longer 
+		 *   period of time.
+		 *
+		 *   This is basically proportional control
+		 *   using the PID terminology.
+		 *
+		 *   We use interval_cycs here as the divisor, which
+		 *   hopes that the next sample will be similar in 
+		 *   distance from the last.
+		 */
+		if(error >= 0) {
+			adj = error_aproximation(error, 
+					interval_cycs, MAXOFFADJ);
+			multadj += 1<<adj;
+			snsec_update += interval <<adj;
+			offset_update += offset << adj;
+		} else {
+			adj = error_aproximation(-error,
+						interval_cycs, MAXOFFADJ);
+			multadj -= 1<<adj;
+			snsec_update -= interval <<adj;
+			offset_update -= offset << adj;
+		}
 
-	clock->mult += adj;
-	clock->xtime_interval += interval;
-	clock->xtime_nsec -= offset;
-	clock->error -= (interval - offset) << (TICK_LENGTH_SHIFT - clock->shift);
+		clock->mult += multadj;
+		clock->xtime_interval += snsec_update;
+		clock->xtime_nsec -= offset_update;
+		clock->error += (offset_update) 
+					<< (TICK_LENGTH_SHIFT - clock->shift);
+	}
 }
 
+
 /*
  * update_wall_time - Uses the current clocksource to increment the wall time
  *
@@ -1089,7 +1116,8 @@ static void clocksource_adjust(struct cl
  */
 static void update_wall_time(void)
 {
-	cycle_t offset;
+	cycle_t offset, interval_cycs = 0;
+	s64 interval_error = 0; 
 
 	clock->xtime_nsec += (s64)xtime.tv_nsec << clock->shift;
 
@@ -1106,8 +1134,13 @@ static void update_wall_time(void)
 		/* accumulate one interval */
 		clock->xtime_nsec += clock->xtime_interval;
 		clock->cycle_last += clock->cycle_interval;
+		interval_cycs += clock->cycle_interval;
 		offset -= clock->cycle_interval;
 
+		/* accumulate error between NTP and clock interval */
+		interval_error += current_tick_length();
+		interval_error -= clock->xtime_interval << (TICK_LENGTH_SHIFT - clock->shift);
+
 		if (clock->xtime_nsec >= (u64)NSEC_PER_SEC << clock->shift) {
 			clock->xtime_nsec -= (u64)NSEC_PER_SEC << clock->shift;
 			xtime.tv_sec++;
@@ -1119,14 +1152,10 @@ static void update_wall_time(void)
 						>> clock->shift);
 		/* increment the NTP state machine */
 		update_ntp_one_tick();
-
-		/* accumulate error between NTP and clock interval */
-		clock->error += current_tick_length();
-		clock->error -= clock->xtime_interval << (TICK_LENGTH_SHIFT - clock->shift);
 	}
-
+	clock->error += interval_error;
 	/* correct the clock when NTP error is too big */
-	clocksource_adjust(clock, offset);
+	clocksource_adjust(clock, offset, interval_cycs, interval_error);
 
 	/* store full nanoseconds into xtime */
 	xtime.tv_nsec = clock->xtime_nsec >> clock->shift;



  parent reply	other threads:[~2006-07-06  0:51 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-24 13:19 2.6.17-mm2 Andrew Morton
2006-06-24 15:53 ` 2.6.17-mm2 Rafael J. Wysocki
2006-06-24 17:20   ` 2.6.17-mm2 Dave Jones
2006-06-24 21:34     ` 2.6.17-mm2 Andrew Morton
2006-06-25  8:51       ` 2.6.17-mm2 Rafael J. Wysocki
2006-06-25 10:22         ` 2.6.17-mm2 Andrew Morton
2006-06-25 15:16           ` 2.6.17-mm2 Andrew Morton
2006-06-25 18:23             ` 2.6.17-mm2 Sam Ravnborg
2006-06-25 18:40               ` 2.6.17-mm2 Andrew Morton
2006-06-25 21:21                 ` 2.6.17-mm2 Sam Ravnborg
2006-06-30  7:38             ` 2.6.17-mm2 Randy.Dunlap
2006-07-02 10:11               ` 2.6.17-mm2 Russell King
2006-07-02 18:42                 ` 2.6.17-mm2 Randy.Dunlap
2006-07-02 18:47                   ` 2.6.17-mm2 Arjan van de Ven
2006-07-02 18:47                   ` 2.6.17-mm2 Sam Ravnborg
2006-07-03  5:50                 ` 2.6.17-mm2 Randy.Dunlap
2006-07-03 13:49                   ` 2.6.17-mm2 Russell King
2006-06-25 19:19           ` 2.6.17-mm2 Rafael J. Wysocki
2006-06-26 20:13           ` 2.6.17-mm2 Chandra Seetharaman
2006-06-24 19:41 ` 2.6.17-mm2 Dominik Karall
2006-06-24 21:43   ` 2.6.17-mm2 Andrew Morton
2006-06-25  6:06 ` 2.6.17-mm2 Reuben Farrelly
2006-06-25  9:37   ` 2.6.17-mm2 Barry K. Nathan
2006-06-25 10:29     ` 2.6.17-mm2 Reuben Farrelly
2006-06-25 11:19 ` 2.6.17-mm2 Michal Piotrowski
2006-06-25 11:40   ` 2.6.17-mm2 Andrew Morton
2006-06-25 12:18     ` 2.6.17-mm2 Michal Piotrowski
2006-06-25 16:25 ` 2.6.17-mm2 (NULL pointer dereference) Dominik Karall
2006-06-25 17:18   ` Andrew Morton
2006-06-25 18:11     ` Dominik Karall
2006-06-25 16:47 ` 2.6.17-mm2: no QLA3YYY_NAPI help text Adrian Bunk
2006-06-25 19:32 ` 2.6.17-mm2: BLK_CPQ_CISS_DA=m error Adrian Bunk
2006-06-26  0:41   ` Vivek Goyal
2006-06-25 23:13 ` [-mm patch] make drivers/scsi/pata_it821x.c:it821x_passthru_dev_select() static Adrian Bunk
2006-06-25 23:27   ` Alan Cox
2006-06-27  1:03   ` Jeff Garzik
2006-06-25 23:13 ` [-mm patch] fs/cifs/cifsproto.h: remove #ifdef around small_smb_init_no_tc() prototype Adrian Bunk
2006-06-26  4:05   ` Steven French
2006-06-26 15:17 ` [-mm patch] drivers/scsi/arcmsr/: cleanups Adrian Bunk
2006-06-26 20:27 ` [-mm patch] drivers/md/raid5.c: remove an unused variable Adrian Bunk
2006-06-26 21:41 ` 2.6.17-mm2 hrtimer code wedges at boot? Valdis.Kletnieks
2006-06-26 22:50   ` Valdis.Kletnieks
2006-06-26 23:02   ` john stultz
2006-06-26 23:27   ` Thomas Gleixner
2006-06-27  2:12     ` Valdis.Kletnieks
2006-06-27  5:54       ` Thomas Gleixner
2006-06-27 10:16   ` Roman Zippel
2006-06-27 16:43     ` Valdis.Kletnieks
2006-06-27 17:10       ` Roman Zippel
2006-06-27 17:23         ` Roman Zippel
2006-06-27 19:07           ` Valdis.Kletnieks
2006-06-28  0:07             ` john stultz
2006-06-28 10:35               ` Roman Zippel
2006-06-28 11:44                 ` Roman Zippel
2006-06-29 23:07                   ` Valdis.Kletnieks
2006-06-30 19:26                     ` john stultz
2006-06-30 21:04                       ` Valdis.Kletnieks
2006-07-03  1:13                         ` Roman Zippel
2006-07-03  1:56                           ` Daniel Walker
2006-07-03  2:20                             ` Valdis.Kletnieks
2006-07-03 20:08                               ` john stultz
2006-07-03 19:59                             ` john stultz
2006-07-04 22:21                               ` Valdis.Kletnieks
2006-07-05  4:29                           ` Valdis.Kletnieks
2006-07-06  0:37                             ` Roman Zippel
2006-07-06  0:56                               ` john stultz
2006-07-06  6:38                               ` Valdis.Kletnieks
2006-07-06  0:51                             ` john stultz [this message]
2006-07-06  1:12                               ` john stultz
2006-07-06  5:43                                 ` john stultz
2006-07-06 20:33                               ` Roman Zippel
2006-07-06 22:05                                 ` john stultz
2006-07-07 23:16                                   ` Roman Zippel
2006-07-08 20:02                                   ` [PATCH] adjust clock for lost ticks Roman Zippel
2006-07-09 21:25                                     ` john stultz
2006-06-28 23:41                 ` 2.6.17-mm2 hrtimer code wedges at boot? john stultz
2006-06-29 11:24                   ` Roman Zippel
2006-06-28 16:54 ` [-mm patch] include/asm-i386/acpi.h should #include <asm/processor.h> Adrian Bunk
2006-06-28 16:54 ` [-mm patch] fix sgivwfb compile Adrian Bunk
2006-06-28 16:54 ` [-mm patch] arch/i386/mach-visws/setup.c: remove dummy function calls Adrian Bunk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1152147114.24656.117.camel@cog.beaverton.ibm.com \
    --to=johnstul@us.ibm.com \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=zippel@linux-m68k.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).