Bizarre pv kernel ultra-high frequency rdtsc?!?

From: Dan Magenheimer @ 2009-11-20 23:45 UTC (permalink / raw)
  To: Keir Fraser, Jeremy Fitzhardinge; +Cc: Xen-Devel (E-mail)

Hi Jeremy/Keir (and any other PV time experts out there) --

While working on the tsc_mode changes I've run into a
roadblock: there is some time-related interaction between
Xen and a PV kernel that I don't understand.  I'm hoping
you might provide a clue.  There's also a reasonable chance
that this is uncovering a significant bug that has been
around for a while but was never noticed as anything more
than a vague slowdown, because rdtsc was unemulated
(and "fast").

The problem:

In order to preserve TSC across save/restore/migrate, I
have implemented a "tsc offset" (and also a "tsc scale"
but that isn't used yet).

With the offset applied, the PV kernel starts doing rdtsc's
at a VERY high frequency (1 MILLION / sec).  I suspect this
may be a variation of what Jeremy reported at one point
when emulated rdtsc first went in-tree, which later seemed
to go away.

By adding some debug code (and confirming with xenctx)
I can see that the millions of rdtsc's are half in
get_nsec_offset() and half in do_gettimeofday() (presumably
inlined from get_usec_offset()).  This is a 32-bit 2.6.18-based
PV kernel, not upstream.  Poring through the 2.6.18 PV time
code, I can find several places where an essentially infinite
loop might happen if the version fields are wacko, but
none where the timestamp contents make any difference
in control flow.  So I don't see how modifying these
values (by adding the offset) could cause a behavioral
change in Linux, but obviously a big change is happening!
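
For anyone not familiar with the version fields, here is a
minimal sketch of the protocol as I understand it.  The two
macros are the ones in xen/arch/x86/time.c; struct time_rec,
publish() and try_snapshot() are hypothetical names made up
for illustration, and the memory barriers that real code needs
are only marked by comments.  It is not the 2.6.18 guest code.

```c
#include <stdint.h>

/* Same macros as in xen/arch/x86/time.c. */
#define version_update_begin(v) (((v) + 1) | 1)
#define version_update_end(v)   ((v) + 1)

/* Hypothetical, trimmed-down time record: only what the sketch needs. */
struct time_rec {
    uint32_t version;
    uint64_t tsc_timestamp;
    uint64_t system_time;
};

/* Writer side: version is odd while fields change, even once stable. */
static void publish(struct time_rec *r, uint64_t tsc, uint64_t stime)
{
    r->version = version_update_begin(r->version);
    /* a write barrier belongs here in real code */
    r->tsc_timestamp = tsc;
    r->system_time   = stime;
    /* ... and another one here */
    r->version = version_update_end(r->version);
}

/*
 * Reader side: copy the fields, then report whether the copy is usable.
 * Returns 1 for a consistent snapshot, 0 if the caller has to retry.
 * Only the version decides that; the timestamp values play no part.
 */
static int try_snapshot(const struct time_rec *r, struct time_rec *out)
{
    uint32_t ver = r->version;
    /* a read barrier belongs here in real code */
    out->tsc_timestamp = r->tsc_timestamp;
    out->system_time   = r->system_time;
    out->version       = ver;
    return !(ver & 1) && ver == r->version;
}
```

A reader would call try_snapshot() in a loop until it returns 1,
so a version that stays odd, or changes on every pass, is the
kind of thing that would make such a loop spin.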

I can reproduce the problem with a very simple patch
on xen-unstable that adds a fake fixed offset in the
three places where I add the "tsc offset"; see attached.
Changing BIG_OFFSET to 0 in this patch makes the
frequency of rdtsc's normal again.
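
To show why I expected the offset to be harmless, here is a
small worked sketch.  It assumes the guest computes time with
the formula documented for vcpu_time_info in Xen's public
header (system_time plus the scaled TSC delta); the 2.6.18
code may differ in detail.  scale_delta() and guest_ns() are
hypothetical helper names, and the multiplier and shift are
the vtsc values from the patch context.

```c
#include <stdint.h>

#define BIG_OFFSET 10000000000ULL   /* same value as in the patch */

/*
 * Shift the delta, then multiply by the 32-bit fraction mul / 2^32.
 * With shift = 1 and mul = 0x80000000 the result equals the input,
 * i.e. one emulated TSC tick is one nanosecond.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    /* (delta * mul) >> 32, done in two halves to stay within 64 bits */
    return (delta >> 32) * mul + (((delta & 0xffffffffULL) * mul) >> 32);
}

/* Guest-side system time in ns from one snapshot and one rdtsc value. */
static uint64_t guest_ns(uint64_t tsc, uint64_t tsc_timestamp,
                         uint64_t system_time)
{
    return system_time + scale_delta(tsc - tsc_timestamp, 0x80000000u, 1);
}
```

If tsc, tsc_timestamp and system_time all have BIG_OFFSET
subtracted, the delta is unchanged and the result is simply
the unshifted result minus BIG_OFFSET, so intervals measured
by the guest come out the same.  This only covers the
arithmetic; it says nothing about where the extra rdtsc's
come from.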

Suspecting some interaction with wallclock time, I
tried shutting off ntpd, and tried both with and without
independent wallclock in the PV guest.  No difference.

I also added debug code to see if the Xen-side code
was churning through version numbers... it is not.

Any ideas?  (And, sorry, I know you're on a trans-
hemisphere trip right now.)

Thanks,
Dan

diff -r bec27eb6f72c xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Sat Nov 14 10:32:59 2009 +0000
+++ b/xen/arch/x86/time.c	Fri Nov 20 16:58:18 2009 -0500
@@ -813,6 +813,8 @@ s_time_t get_s_time(void)
 #define version_update_begin(v) (((v)+1)|1)
 #define version_update_end(v)   ((v)+1)
 
+#define BIG_OFFSET 10000000000ULL
+
 static void __update_vcpu_system_time(struct vcpu *v, int force)
 {
     struct cpu_time       *t;
@@ -827,7 +829,7 @@ static void __update_vcpu_system_time(st
 
     /* Don't bother unless timestamps have changed or we are forced. */
     if ( !force && (u->tsc_timestamp == (v->domain->arch.vtsc
-                                         ? t->stime_local_stamp
+                                         ? t->stime_local_stamp - BIG_OFFSET
                                          : t->local_tsc_stamp)) )
         return;
 
@@ -835,8 +837,8 @@ static void __update_vcpu_system_time(st
 
     if ( v->domain->arch.vtsc )
     {
-        _u.tsc_timestamp     = t->stime_local_stamp;
-        _u.system_time       = t->stime_local_stamp;
+        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
+        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
         _u.tsc_to_system_mul = 0x80000000u;
         _u.tsc_shift         = 1;
     }
@@ -1598,6 +1600,8 @@ void pv_soft_rdtsc(struct vcpu *v, struc
 
     spin_unlock(&v->domain->arch.vtsc_lock);
 
+    now -= BIG_OFFSET;
+
     regs->eax = (uint32_t)now;
     regs->edx = (uint32_t)(now >> 32);
 }
