From: john stultz <johnstul@us.ibm.com>
To: Valdis.Kletnieks@vt.edu
Cc: Roman Zippel <zippel@linux-m68k.org>,
Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org
Subject: Re: 2.6.17-mm2 hrtimer code wedges at boot?
Date: Wed, 05 Jul 2006 17:51:54 -0700 [thread overview]
Message-ID: <1152147114.24656.117.camel@cog.beaverton.ibm.com> (raw)
In-Reply-To: <200607050429.k654TXUr012316@turing-police.cc.vt.edu>
On Wed, 2006-07-05 at 00:29 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 03 Jul 2006 03:13:39 +0200, Roman Zippel said:
> > Hi,
> >
> > On Fri, 30 Jun 2006, Valdis.Kletnieks@vt.edu wrote:
> >
> > > *AHA* I *found* the bugger, I think.
> > >
> > > In kernel/timer.c, we have:
> > >
> > > static void clocksource_adjust(struct clocksource *clock, s64 offset)
> > > (s64 used for offset in multiple places).
> > >
> > > However, in other places, offset is a 'cycle_t', which is:
> > >
> > > include/linux/clocksource.h:typedef u64 cycle_t;
> > >
> > > So it looks like a signed/unsigned screwage.
> >
> > It's a possibility, but the signed/unsigned usage is pretty much
> > intentional. The assumptation is that time only goes forward so nothing
> > there should become negative, only adjustments happen in both directions.
> > It's really weird why it's getting completely so out of control early
> > during boot. It would be great if you could also test the patch below, it
> > should trigger closer to when it goes wrong and help to analyze the
> > problem (or at least rule out a number of possibilities).
>
> Here you go.. For what it's worth, your debugging in clocksource_adjust seems
> to only pop before we start userspace, and get_realtime_clock_ts only once
> userspace starts.
Once again, thanks for the testing! My observations below...
> The dmesg, with all the suggested patches so far applied. Looks like something
> starts off uninitialized - we get the first 'big adj' squawk right after we
> allocate the console - we don't allocate the tsc timesource for another 4
> seconds or so.
>
> I'll bite - what *am* I using as a timesource for those first 4 seconds? :)
The jiffies clocksource.
> [ 0.000000] Detected 1595.408 MHz processor.
...
>[ 24.322196] CPU: Intel(R) Pentium(R) 4 Mobile CPU 1.60GHz stepping 04
...
> [ 29.528533] Time: tsc clocksource has been installed.
> [ 29.552855] clock changed at -296333 (4294314460971008)
> [ 29.577109] clock tsc: m:2628985,s:22,cl: ,ci:1595166,xn:0,xi:4193667486510,e:0
Ok, so here's our initial TSC state:
Verify the mult/shift pair:
2^s/m = 2^22/2628985 = 1.595408113777750729 cyc/ns => 1.595 GHz
Verify the cycle_interval/xtime_interval pair:
xi = ci*m = 1595166 * 2628985 = 4193667486510
Convert xi to ns:
xi>>s = 4193667486510>>22 = 999848.2433581352234 ns/interval
Convert ntp_tick to ns:
ntp_tick>>32 = 4294314460971008>>32 = 999848 ns/tick
Ok, that all looks pretty good...
> [ 29.601869] big adj at -296332 (4294314460971008,-16,-25522656,-11031712)
> [ 29.626688] clock tsc: m:2628985,s:22,cl:47288392250,ci:1595166,xn:148610636380190,xi:4193667486510,e:-76300711936
Now here's where things turn odd. Note that only one jiffy has passed
(-296332 - -296333 = 1).
However, looking at the difference between cycle_last:
47288392250 - 47171945132 = 116,447,118
That's *way* larger then the 1,595,166 value expected in ci!
Same thing is seen in the later data points:
47452694348 - 47368150550 = 84,543,798
47538833312 - 47452694348 = 86,138,964
etc.
So it seems either something is causing you to take interrupts at a
lower frequency then what is expected, or your cpu is ~50x faster then
advertised :)
This is probably not an issue w/ the timekeeping code, however as a
side-effect it appears to make the clocksource_adjust function oscillate
pretty severely. I've reproduced a similar hang (not completely sure, as
it occurred while X was loading) by adding the following to the top of
update_wall_time:
static int droptick;
if(droptick++%60)
return;
Roman: While I'm not 100% confident about my assessment above, I worry
this is mimicking the problems I had been seeing in my simulator w/ your
clocksource_adjustment algorithm when I simulated dropping many ticks.
While currently this behavior points to some other problem, with the
dynticks patch, its much more likely that we will see 100s of ticks
skipped.
I quickly revived my P-D adjustment patch and it does not appear to
suffer from the same problem with the above droptick change (although
its only been lightly tested).
I realize you may have a more trivial change to this issue, but would
you consider my method again?
Vladis: Mind trying the following patch to see if it affects the
behavior.
thanks
-john
Implement P-D control for clocksource_adjust()
diff --git a/kernel/timer.c b/kernel/timer.c
index 396a3c0..f4e7681 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1007,81 +1007,108 @@ static int __init timekeeping_init_devic
device_initcall(timekeeping_init_device);
-/*
- * If the error is already larger, we look ahead another tick,
- * to compensate for late or lost adjustments.
- */
-static __always_inline int clocksource_bigadjust(int sign, s64 error, s64 *interval, s64 *offset)
+static int error_aproximation(u64 error, u64 unit, int max)
{
- int adj;
-
- /*
- * As soon as the machine is synchronized to the external time
- * source this should be the common case.
- */
- error >>= 2;
- if (likely(sign > 0 ? error <= *interval : error >= *interval))
- return sign;
-
- /*
- * An extra look ahead dampens the effect of the current error,
- * which can grow quite large with continously late updates, as
- * it would dominate the adjustment value and can lead to
- * oscillation.
- */
- error += current_tick_length() >> (TICK_LENGTH_SHIFT - clock->shift + 1);
- error -= clock->xtime_interval >> 1;
-
- adj = 0;
+ int adj = 0;
while (1) {
error >>= 1;
- if (sign > 0 ? error <= *interval : error >= *interval)
- break;
- adj++;
+ if (error <= unit)
+ return adj;
+ if (!max || adj < max)
+ adj++;
}
-
- /*
- * Add the current adjustments to the error and take the offset
- * into account, the latter can cause the error to be hardly
- * reduced at the next tick. Check the error again if there's
- * room for another adjustment, thus further reducing the error
- * which otherwise had to be corrected at the next update.
- */
- error = (error << 1) - *interval + *offset;
- if (sign > 0 ? error > *interval : error < *interval)
- adj++;
-
- *interval <<= adj;
- *offset <<= adj;
- return sign << adj;
}
+#define MAXOFFADJ 4 /* vary max oscillation vs convergance speed */
/*
* Adjust the multiplier to reduce the error value,
* this is optimized for the most common adjustments of -1,0,1,
* for other values we can do a bit more work.
*/
-static void clocksource_adjust(struct clocksource *clock, s64 offset)
+static void clocksource_adjust(struct clocksource *clock, s64 offset,
+ s64 interval_cycs, s64 interval_error)
{
s64 error, interval = clock->cycle_interval;
- int adj;
-
- error = clock->error >> (TICK_LENGTH_SHIFT - clock->shift - 1);
- if (error > interval) {
- adj = clocksource_bigadjust(1, error, &interval, &offset);
- } else if (error < -interval) {
- interval = -interval;
- offset = -offset;
- adj = clocksource_bigadjust(-1, error, &interval, &offset);
- } else
- return;
+
+ error = shift_right(clock->error, (TICK_LENGTH_SHIFT - clock->shift));
+ interval_error = shift_right(interval_error,
+ (TICK_LENGTH_SHIFT - clock->shift));
+
+ if ((error > interval)
+ ||(error < -(interval)) ) {
+
+ int adj, multadj = 0;
+ s64 offset_update = 0, snsec_update = 0;
+
+ /* First do the frequency adjustment:
+ * The idea here is to look at the error
+ * accumulated since the last call to
+ * update_wall_time to determine the
+ * frequency adjustment needed so no new
+ * error will be incurred in the next
+ * interval.
+ *
+ * This is basically derivative control
+ * using the PID terminology (we're calculating
+ * the derivative of the slope and correcting it).
+ *
+ * The math is basically:
+ * multadj = interval_error/interval_cycles
+ * Which we fudge using binary approximation.
+ */
+ if(interval_error >= 0) {
+ adj = error_aproximation(interval_error,
+ interval_cycs, 0);
+ multadj += 1 << adj;
+ snsec_update += interval << adj;
+ offset_update += offset << adj;
+ } else {
+ adj = error_aproximation(-interval_error,
+ interval_cycs, 0);
+ multadj -= 1 << adj;
+ snsec_update -= interval << adj;
+ offset_update -= offset << adj;
+ }
+ /* Now do the offset adjustment:
+ * Now that the frequncy is fixed, we
+ * want to look at the total error accumulated
+ * to move us back in sync using the same method.
+ * However, we must be careful as if we make too
+ * sudden an adjustment we might overshoot. So we
+ * limit the amount of change to spread the
+ * adjustment (using MAXOFFADJ) over a longer
+ * period of time.
+ *
+ * This is basically proportional control
+ * using the PID terminology.
+ *
+ * We use interval_cycs here as the divisor, which
+ * hopes that the next sample will be similar in
+ * distance from the last.
+ */
+ if(error >= 0) {
+ adj = error_aproximation(error,
+ interval_cycs, MAXOFFADJ);
+ multadj += 1<<adj;
+ snsec_update += interval <<adj;
+ offset_update += offset << adj;
+ } else {
+ adj = error_aproximation(-error,
+ interval_cycs, MAXOFFADJ);
+ multadj -= 1<<adj;
+ snsec_update -= interval <<adj;
+ offset_update -= offset << adj;
+ }
- clock->mult += adj;
- clock->xtime_interval += interval;
- clock->xtime_nsec -= offset;
- clock->error -= (interval - offset) << (TICK_LENGTH_SHIFT - clock->shift);
+ clock->mult += multadj;
+ clock->xtime_interval += snsec_update;
+ clock->xtime_nsec -= offset_update;
+ clock->error += (offset_update)
+ << (TICK_LENGTH_SHIFT - clock->shift);
+ }
}
+
/*
* update_wall_time - Uses the current clocksource to increment the wall time
*
@@ -1089,7 +1116,8 @@ static void clocksource_adjust(struct cl
*/
static void update_wall_time(void)
{
- cycle_t offset;
+ cycle_t offset, interval_cycs = 0;
+ s64 interval_error = 0;
clock->xtime_nsec += (s64)xtime.tv_nsec << clock->shift;
@@ -1106,8 +1134,13 @@ static void update_wall_time(void)
/* accumulate one interval */
clock->xtime_nsec += clock->xtime_interval;
clock->cycle_last += clock->cycle_interval;
+ interval_cycs += clock->cycle_interval;
offset -= clock->cycle_interval;
+ /* accumulate error between NTP and clock interval */
+ interval_error += current_tick_length();
+ interval_error -= clock->xtime_interval << (TICK_LENGTH_SHIFT - clock->shift);
+
if (clock->xtime_nsec >= (u64)NSEC_PER_SEC << clock->shift) {
clock->xtime_nsec -= (u64)NSEC_PER_SEC << clock->shift;
xtime.tv_sec++;
@@ -1119,14 +1152,10 @@ static void update_wall_time(void)
>> clock->shift);
/* increment the NTP state machine */
update_ntp_one_tick();
-
- /* accumulate error between NTP and clock interval */
- clock->error += current_tick_length();
- clock->error -= clock->xtime_interval << (TICK_LENGTH_SHIFT - clock->shift);
}
-
+ clock->error += interval_error;
/* correct the clock when NTP error is too big */
- clocksource_adjust(clock, offset);
+ clocksource_adjust(clock, offset, interval_cycs, interval_error);
/* store full nanoseconds into xtime */
xtime.tv_nsec = clock->xtime_nsec >> clock->shift;
next prev parent reply other threads:[~2006-07-06 0:51 UTC|newest]
Thread overview: 80+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-06-24 13:19 2.6.17-mm2 Andrew Morton
2006-06-24 15:53 ` 2.6.17-mm2 Rafael J. Wysocki
2006-06-24 17:20 ` 2.6.17-mm2 Dave Jones
2006-06-24 21:34 ` 2.6.17-mm2 Andrew Morton
2006-06-25 8:51 ` 2.6.17-mm2 Rafael J. Wysocki
2006-06-25 10:22 ` 2.6.17-mm2 Andrew Morton
2006-06-25 15:16 ` 2.6.17-mm2 Andrew Morton
2006-06-25 18:23 ` 2.6.17-mm2 Sam Ravnborg
2006-06-25 18:40 ` 2.6.17-mm2 Andrew Morton
2006-06-25 21:21 ` 2.6.17-mm2 Sam Ravnborg
2006-06-30 7:38 ` 2.6.17-mm2 Randy.Dunlap
2006-07-02 10:11 ` 2.6.17-mm2 Russell King
2006-07-02 18:42 ` 2.6.17-mm2 Randy.Dunlap
2006-07-02 18:47 ` 2.6.17-mm2 Arjan van de Ven
2006-07-02 18:47 ` 2.6.17-mm2 Sam Ravnborg
2006-07-03 5:50 ` 2.6.17-mm2 Randy.Dunlap
2006-07-03 13:49 ` 2.6.17-mm2 Russell King
2006-06-25 19:19 ` 2.6.17-mm2 Rafael J. Wysocki
2006-06-26 20:13 ` 2.6.17-mm2 Chandra Seetharaman
2006-06-24 19:41 ` 2.6.17-mm2 Dominik Karall
2006-06-24 21:43 ` 2.6.17-mm2 Andrew Morton
2006-06-25 6:06 ` 2.6.17-mm2 Reuben Farrelly
2006-06-25 9:37 ` 2.6.17-mm2 Barry K. Nathan
2006-06-25 10:29 ` 2.6.17-mm2 Reuben Farrelly
2006-06-25 11:19 ` 2.6.17-mm2 Michal Piotrowski
2006-06-25 11:40 ` 2.6.17-mm2 Andrew Morton
2006-06-25 12:18 ` 2.6.17-mm2 Michal Piotrowski
2006-06-25 16:25 ` 2.6.17-mm2 (NULL pointer dereference) Dominik Karall
2006-06-25 17:18 ` Andrew Morton
2006-06-25 18:11 ` Dominik Karall
2006-06-25 16:47 ` 2.6.17-mm2: no QLA3YYY_NAPI help text Adrian Bunk
2006-06-25 19:32 ` 2.6.17-mm2: BLK_CPQ_CISS_DA=m error Adrian Bunk
2006-06-26 0:41 ` Vivek Goyal
2006-06-25 23:13 ` [-mm patch] make drivers/scsi/pata_it821x.c:it821x_passthru_dev_select() static Adrian Bunk
2006-06-25 23:27 ` Alan Cox
2006-06-27 1:03 ` Jeff Garzik
2006-06-25 23:13 ` [-mm patch] fs/cifs/cifsproto.h: remove #ifdef around small_smb_init_no_tc() prototype Adrian Bunk
2006-06-26 4:05 ` Steven French
2006-06-26 15:17 ` [-mm patch] drivers/scsi/arcmsr/: cleanups Adrian Bunk
2006-06-26 20:27 ` [-mm patch] drivers/md/raid5.c: remove an unused variable Adrian Bunk
2006-06-26 21:41 ` 2.6.17-mm2 hrtimer code wedges at boot? Valdis.Kletnieks
2006-06-26 22:50 ` Valdis.Kletnieks
2006-06-26 23:02 ` john stultz
2006-06-26 23:27 ` Thomas Gleixner
2006-06-27 2:12 ` Valdis.Kletnieks
2006-06-27 5:54 ` Thomas Gleixner
2006-06-27 10:16 ` Roman Zippel
2006-06-27 16:43 ` Valdis.Kletnieks
2006-06-27 17:10 ` Roman Zippel
2006-06-27 17:23 ` Roman Zippel
2006-06-27 19:07 ` Valdis.Kletnieks
2006-06-28 0:07 ` john stultz
2006-06-28 10:35 ` Roman Zippel
2006-06-28 11:44 ` Roman Zippel
2006-06-29 23:07 ` Valdis.Kletnieks
2006-06-30 19:26 ` john stultz
2006-06-30 21:04 ` Valdis.Kletnieks
2006-07-03 1:13 ` Roman Zippel
2006-07-03 1:56 ` Daniel Walker
2006-07-03 2:20 ` Valdis.Kletnieks
2006-07-03 20:08 ` john stultz
2006-07-03 19:59 ` john stultz
2006-07-04 22:21 ` Valdis.Kletnieks
2006-07-05 4:29 ` Valdis.Kletnieks
2006-07-06 0:37 ` Roman Zippel
2006-07-06 0:56 ` john stultz
2006-07-06 6:38 ` Valdis.Kletnieks
2006-07-06 0:51 ` john stultz [this message]
2006-07-06 1:12 ` john stultz
2006-07-06 5:43 ` john stultz
2006-07-06 20:33 ` Roman Zippel
2006-07-06 22:05 ` john stultz
2006-07-07 23:16 ` Roman Zippel
2006-07-08 20:02 ` [PATCH] adjust clock for lost ticks Roman Zippel
2006-07-09 21:25 ` john stultz
2006-06-28 23:41 ` 2.6.17-mm2 hrtimer code wedges at boot? john stultz
2006-06-29 11:24 ` Roman Zippel
2006-06-28 16:54 ` [-mm patch] include/asm-i386/acpi.h should #include <asm/processor.h> Adrian Bunk
2006-06-28 16:54 ` [-mm patch] fix sgivwfb compile Adrian Bunk
2006-06-28 16:54 ` [-mm patch] arch/i386/mach-visws/setup.c: remove dummy function calls Adrian Bunk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1152147114.24656.117.camel@cog.beaverton.ibm.com \
--to=johnstul@us.ibm.com \
--cc=Valdis.Kletnieks@vt.edu \
--cc=akpm@osdl.org \
--cc=linux-kernel@vger.kernel.org \
--cc=zippel@linux-m68k.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).