All of lore.kernel.org
 help / color / mirror / Atom feed
* Bizarre pv kernel ultra-high frequency rdtsc?!?
@ 2009-11-20 23:45 Dan Magenheimer
  2009-11-21 17:31 ` Dan Magenheimer
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dan Magenheimer @ 2009-11-20 23:45 UTC (permalink / raw)
  To: Keir Fraser, Jeremy Fitzhardinge; +Cc: Xen-Devel (E-mail)

[-- Attachment #1: Type: text/plain, Size: 3667 bytes --]

Hi Jeremy/Keir (and any other PV time experts out there) --

Working on tsc_mode stuff I've run into a roadblock where
there is some time-related interaction between xen and a
PV kernel that I don't understand.  I'm hoping you
might provide a clue.  There's also a reasonable chance
that this might be uncovering a significant bug that's
been around awhile, but never noticed as other than
a barely noticeable vague slowdown because rdtsc was
unemulated (and "fast").

The problem:

In order to preserve TSC across save/restore/migrate, I
have implemented a "tsc offset" (and also a "tsc scale"
but that isn't used yet).

The result is that the PV kernel starts doing rdtsc's at
a VERY high frequency (1 MILLION / sec).  I suspect this
may be a variation of what Jeremy reported at one point
when emulated rdtsc was first in-tree, but seemed to go away.

By adding some debug code (and confirmed with xenctx)
I can see that the millions of rdtsc's are half in
get_nsec_offset() and half in do_gettimeofday() (presumably
inlined from get_usec_offset()).  This is a 32-bit 2.6.18-based
PV kernel, not upstream.  Poring through the 2.6.18 PV time
code, I can find several places where an essentially infinite
loop might happen if the version fields are wacko, but
none where the timestamp contents make any difference
in control flow, so don't see how modifying these
values (by adding the offset) could cause a behavioral
change in Linux, but obviously a big change is happening!

I can reproduce the problem with a very simple patch
on xen-unstable that adds a fake fixed offset in the
three places I add the "tsc offset", see attached.
By changing BIG_OFFSET to 0, in this patch, the
frequency of rdtsc's becomes normal again.

Suspecting some interaction with wallclock time, I
tried shutting off ntpd and with/without independent
wallclock in the PV guest.  No difference.

I also added debug code to see if the Xen-side code
was churning through version numbers... it is not.

Any ideas?  (And, sorry, I know you're on a trans-
hemisphere trip right now.)

Thanks,
Dan

diff -r bec27eb6f72c xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Sat Nov 14 10:32:59 2009 +0000
+++ b/xen/arch/x86/time.c	Fri Nov 20 16:58:18 2009 -0500
@@ -813,6 +813,8 @@ s_time_t get_s_time(void)
 #define version_update_begin(v) (((v)+1)|1)
 #define version_update_end(v)   ((v)+1)
 
+#define BIG_OFFSET 10000000000ULL
+
 static void __update_vcpu_system_time(struct vcpu *v, int force)
 {
     struct cpu_time       *t;
@@ -827,7 +829,7 @@ static void __update_vcpu_system_time(st
 
     /* Don't bother unless timestamps have changed or we are forced. */
     if ( !force && (u->tsc_timestamp == (v->domain->arch.vtsc
-                                         ? t->stime_local_stamp
+                                         ? t->stime_local_stamp - BIG_OFFSET
                                          : t->local_tsc_stamp)) )
         return;
 
@@ -835,8 +837,8 @@ static void __update_vcpu_system_time(st
 
     if ( v->domain->arch.vtsc )
     {
-        _u.tsc_timestamp     = t->stime_local_stamp;
-        _u.system_time       = t->stime_local_stamp;
+        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
+        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
         _u.tsc_to_system_mul = 0x80000000u;
         _u.tsc_shift         = 1;
     }
@@ -1598,6 +1600,8 @@ void pv_soft_rdtsc(struct vcpu *v, struc
 
     spin_unlock(&v->domain->arch.vtsc_lock);
 
+    now -= BIG_OFFSET;
+
     regs->eax = (uint32_t)now;
     regs->edx = (uint32_t)(now >> 32);
 }

[-- Attachment #2: tsc-bigoffset.patch --]
[-- Type: application/octet-stream, Size: 1500 bytes --]

diff -r bec27eb6f72c xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Sat Nov 14 10:32:59 2009 +0000
+++ b/xen/arch/x86/time.c	Fri Nov 20 16:58:18 2009 -0500
@@ -813,6 +813,8 @@ s_time_t get_s_time(void)
 #define version_update_begin(v) (((v)+1)|1)
 #define version_update_end(v)   ((v)+1)
 
+#define BIG_OFFSET 10000000000ULL
+
 static void __update_vcpu_system_time(struct vcpu *v, int force)
 {
     struct cpu_time       *t;
@@ -827,7 +829,7 @@ static void __update_vcpu_system_time(st
 
     /* Don't bother unless timestamps have changed or we are forced. */
     if ( !force && (u->tsc_timestamp == (v->domain->arch.vtsc
-                                         ? t->stime_local_stamp
+                                         ? t->stime_local_stamp - BIG_OFFSET
                                          : t->local_tsc_stamp)) )
         return;
 
@@ -835,8 +837,8 @@ static void __update_vcpu_system_time(st
 
     if ( v->domain->arch.vtsc )
     {
-        _u.tsc_timestamp     = t->stime_local_stamp;
-        _u.system_time       = t->stime_local_stamp;
+        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
+        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
         _u.tsc_to_system_mul = 0x80000000u;
         _u.tsc_shift         = 1;
     }
@@ -1598,6 +1600,8 @@ void pv_soft_rdtsc(struct vcpu *v, struc
 
     spin_unlock(&v->domain->arch.vtsc_lock);
 
+    now -= BIG_OFFSET;
+
     regs->eax = (uint32_t)now;
     regs->edx = (uint32_t)(now >> 32);
 }

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Bizarre pv kernel ultra-high frequency rdtsc?!?
  2009-11-20 23:45 Bizarre pv kernel ultra-high frequency rdtsc?!? Dan Magenheimer
@ 2009-11-21 17:31 ` Dan Magenheimer
  2009-11-21 20:02   ` Keir Fraser
  2009-11-21 19:50 ` Keir Fraser
  2009-11-21 21:45 ` Keir Fraser
  2 siblings, 1 reply; 7+ messages in thread
From: Dan Magenheimer @ 2009-11-21 17:31 UTC (permalink / raw)
  To: dan.magenheimer, Keir Fraser, Jeremy Fitzhardinge; +Cc: Xen-Devel (E-mail)

One big clue: Looking at /proc/interrupts inside
the PV guest, the number of timer0 interrupts is
about 300K/second.

Not remembering well how timer interrupts are handled
in a PV guest... could this high frequency be happening
because the Linux-side PV code is setting a timer
or because the Xen-side interrupt delivery code is
getting confused?

> -----Original Message-----
> From: Dan Magenheimer 
> Sent: Friday, November 20, 2009 4:45 PM
> To: Keir Fraser; Jeremy Fitzhardinge
> Cc: Xen-Devel (E-mail)
> Subject: Bizarre pv kernel ultra-high frequency rdtsc?!?
> 
> 
> Hi Jeremy/Keir (and any other PV time experts out there) --
> 
> Working on tsc_mode stuff I've run into a roadblock where
> there is some time-related interaction between xen and a
> PV kernel that I don't understand.  I'm hoping you
> might provide a clue.  There's also a reasonable chance
> that this might be uncovering a significant bug that's
> been around awhile, but never noticed as other than
> a barely noticeable vague slowdown because rdtsc was
> unemulated (and "fast").
> 
> The problem:
> 
> In order to preserve TSC across save/restore/migrate, I
> have implemented a "tsc offset" (and also a "tsc scale"
> but that isn't used yet).
> 
> The result is that the PV kernel starts doing rdtsc's at
> a VERY high frequency (1 MILLION / sec).  I suspect this
> may be a variation of what Jeremy reported at one point
> when emulated rdtsc was first in-tree, but seemed to go away.
> 
> By adding some debug code (and confirmed with xenctx)
> I can see that the millions of rdtsc's are half in
> get_nsec_offset() and half in do_gettimeofday() (presumably
> inlined from get_usec_offset()).  This is a 32-bit 2.6.18-based
> PV kernel, not upstream.  Poring through the 2.6.18 PV time
> code, I can find several places where an essentially infinite
> loop might happen if the version fields are wacko, but
> none where the timestamp contents make any difference
> in control flow, so don't see how modifying these
> values (by adding the offset) could cause a behavioral
> change in Linux, but obviously a big change is happening!
> 
> I can reproduce the problem with a very simple patch
> on xen-unstable that adds a fake fixed offset in the
> three places I add the "tsc offset", see attached.
> By changing BIG_OFFSET to 0, in this patch, the
> frequency of rdtsc's becomes normal again.
> 
> Suspecting some interaction with wallclock time, I
> tried shutting off ntpd and with/without independent
> wallclock in the PV guest.  No difference.
> 
> I also added debug code to see if the Xen-side code
> was churning through version numbers... it is not.
> 
> Any ideas?  (And, sorry, I know you're on a trans-
> hemisphere trip right now.)
> 
> Thanks,
> Dan
> 
> diff -r bec27eb6f72c xen/arch/x86/time.c
> --- a/xen/arch/x86/time.c	Sat Nov 14 10:32:59 2009 +0000
> +++ b/xen/arch/x86/time.c	Fri Nov 20 16:58:18 2009 -0500
> @@ -813,6 +813,8 @@ s_time_t get_s_time(void)
>  #define version_update_begin(v) (((v)+1)|1)
>  #define version_update_end(v)   ((v)+1)
>  
> +#define BIG_OFFSET 10000000000ULL
> +
>  static void __update_vcpu_system_time(struct vcpu *v, int force)
>  {
>      struct cpu_time       *t;
> @@ -827,7 +829,7 @@ static void __update_vcpu_system_time(st
>  
>      /* Don't bother unless timestamps have changed or we are 
> forced. */
>      if ( !force && (u->tsc_timestamp == (v->domain->arch.vtsc
> -                                         ? t->stime_local_stamp
> +                                         ? 
> t->stime_local_stamp - BIG_OFFSET
>                                           : t->local_tsc_stamp)) )
>          return;
>  
> @@ -835,8 +837,8 @@ static void __update_vcpu_system_time(st
>  
>      if ( v->domain->arch.vtsc )
>      {
> -        _u.tsc_timestamp     = t->stime_local_stamp;
> -        _u.system_time       = t->stime_local_stamp;
> +        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
> +        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
>          _u.tsc_to_system_mul = 0x80000000u;
>          _u.tsc_shift         = 1;
>      }
> @@ -1598,6 +1600,8 @@ void pv_soft_rdtsc(struct vcpu *v, struc
>  
>      spin_unlock(&v->domain->arch.vtsc_lock);
>  
> +    now -= BIG_OFFSET;
> +
>      regs->eax = (uint32_t)now;
>      regs->edx = (uint32_t)(now >> 32);
>  }
> 
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bizarre pv kernel ultra-high frequency rdtsc?!?
  2009-11-20 23:45 Bizarre pv kernel ultra-high frequency rdtsc?!? Dan Magenheimer
  2009-11-21 17:31 ` Dan Magenheimer
@ 2009-11-21 19:50 ` Keir Fraser
  2009-11-21 21:45 ` Keir Fraser
  2 siblings, 0 replies; 7+ messages in thread
From: Keir Fraser @ 2009-11-21 19:50 UTC (permalink / raw)
  To: Dan Magenheimer, Jeremy Fitzhardinge; +Cc: Xen-Devel (E-mail)

What happens if you add BIG_OFFSET rather than subtract it? You'll be
creating some big 64-bit TSC stamps otherwise, which we'd never normally
expect to reach with a 64-bit-wide counter. Also you will be wrapping the
vTSC some time fairly soon after boot.

Just an easy thing to try. Other than that the patch does look plausible.

You see the high rate while the guest is idle? And 'normal' RDTSC rate is
hundreds per second?

 -- Keir

On 20/11/2009 23:45, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Hi Jeremy/Keir (and any other PV time experts out there) --
> 
> Working on tsc_mode stuff I've run into a roadblock where
> there is some time-related interaction between xen and a
> PV kernel that I don't understand.  I'm hoping you
> might provide a clue.  There's also a reasonable chance
> that this might be uncovering a significant bug that's
> been around awhile, but never noticed as other than
> a barely noticeable vague slowdown because rdtsc was
> unemulated (and "fast").
> 
> The problem:
> 
> In order to preserve TSC across save/restore/migrate, I
> have implemented a "tsc offset" (and also a "tsc scale"
> but that isn't used yet).
> 
> The result is that the PV kernel starts doing rdtsc's at
> a VERY high frequency (1 MILLION / sec).  I suspect this
> may be a variation of what Jeremy reported at one point
> when emulated rdtsc was first in-tree, but seemed to go away.
> 
> By adding some debug code (and confirmed with xenctx)
> I can see that the millions of rdtsc's are half in
> get_nsec_offset() and half in do_gettimeofday() (presumably
> inlined from get_usec_offset()).  This is a 32-bit 2.6.18-based
> PV kernel, not upstream.  Poring through the 2.6.18 PV time
> code, I can find several places where an essentially infinite
> loop might happen if the version fields are wacko, but
> none where the timestamp contents make any difference
> in control flow, so don't see how modifying these
> values (by adding the offset) could cause a behavioral
> change in Linux, but obviously a big change is happening!
> 
> I can reproduce the problem with a very simple patch
> on xen-unstable that adds a fake fixed offset in the
> three places I add the "tsc offset", see attached.
> By changing BIG_OFFSET to 0, in this patch, the
> frequency of rdtsc's becomes normal again.
> 
> Suspecting some interaction with wallclock time, I
> tried shutting off ntpd and with/without independent
> wallclock in the PV guest.  No difference.
> 
> I also added debug code to see if the Xen-side code
> was churning through version numbers... it is not.
> 
> Any ideas?  (And, sorry, I know you're on a trans-
> hemisphere trip right now.)
> 
> Thanks,
> Dan
> 
> diff -r bec27eb6f72c xen/arch/x86/time.c
> --- a/xen/arch/x86/time.c Sat Nov 14 10:32:59 2009 +0000
> +++ b/xen/arch/x86/time.c Fri Nov 20 16:58:18 2009 -0500
> @@ -813,6 +813,8 @@ s_time_t get_s_time(void)
>  #define version_update_begin(v) (((v)+1)|1)
>  #define version_update_end(v)   ((v)+1)
>  
> +#define BIG_OFFSET 10000000000ULL
> +
>  static void __update_vcpu_system_time(struct vcpu *v, int force)
>  {
>      struct cpu_time       *t;
> @@ -827,7 +829,7 @@ static void __update_vcpu_system_time(st
>  
>      /* Don't bother unless timestamps have changed or we are forced. */
>      if ( !force && (u->tsc_timestamp == (v->domain->arch.vtsc
> -                                         ? t->stime_local_stamp
> +                                         ? t->stime_local_stamp - BIG_OFFSET
>                                           : t->local_tsc_stamp)) )
>          return;
>  
> @@ -835,8 +837,8 @@ static void __update_vcpu_system_time(st
>  
>      if ( v->domain->arch.vtsc )
>      {
> -        _u.tsc_timestamp     = t->stime_local_stamp;
> -        _u.system_time       = t->stime_local_stamp;
> +        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
> +        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
>          _u.tsc_to_system_mul = 0x80000000u;
>          _u.tsc_shift         = 1;
>      }
> @@ -1598,6 +1600,8 @@ void pv_soft_rdtsc(struct vcpu *v, struc
>  
>      spin_unlock(&v->domain->arch.vtsc_lock);
>  
> +    now -= BIG_OFFSET;
> +
>      regs->eax = (uint32_t)now;
>      regs->edx = (uint32_t)(now >> 32);
>  }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bizarre pv kernel ultra-high frequency rdtsc?!?
  2009-11-21 17:31 ` Dan Magenheimer
@ 2009-11-21 20:02   ` Keir Fraser
  2009-11-21 21:24     ` Dan Magenheimer
  0 siblings, 1 reply; 7+ messages in thread
From: Keir Fraser @ 2009-11-21 20:02 UTC (permalink / raw)
  To: Dan Magenheimer, Jeremy Fitzhardinge; +Cc: Xen-Devel (E-mail)

Maybe trace what the guest thinks system time is, compared with Xen itself.
If they get out of whack, you can get nastiness with guest trying to set
timers it thinks are in the future but Xen thinks are in the past, and end
up in a loop.

I still think this is due to the big +ve 64-bit vTSC values looking like -ve
values in pv_soft_rdtsc() and accidentally triggering the
TSC-hasn't-gone-forwards check. As soon as system time goes awry between
guest and Xen, I think you are in trouble. There could be more 'false -ves'
like that elsewhere -- so just add BIG_OFFSET rather than subtract it. In
practice your TSC values are never going to have the 64th bit set (would
take 30 years for TSC to tick that high assuming a 10GHz tick rate), so
adding BIG_OFFSET is a more reasonable synthetic test.

 -- Keir

On 21/11/2009 17:31, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> One big clue: Looking at /proc/interrupts inside
> the PV guest, the number of timer0 interrupts is
> about 300K/second.
> 
> Not remembering well how timer interrupts are handled
> in a PV guest... could this high frequency be happening
> because the Linux-side PV code is setting a timer
> or because the Xen-side interrupt delivery code is
> getting confused?
> 
>> -----Original Message-----
>> From: Dan Magenheimer
>> Sent: Friday, November 20, 2009 4:45 PM
>> To: Keir Fraser; Jeremy Fitzhardinge
>> Cc: Xen-Devel (E-mail)
>> Subject: Bizarre pv kernel ultra-high frequency rdtsc?!?
>> 
>> 
>> Hi Jeremy/Keir (and any other PV time experts out there) --
>> 
>> Working on tsc_mode stuff I've run into a roadblock where
>> there is some time-related interaction between xen and a
>> PV kernel that I don't understand.  I'm hoping you
>> might provide a clue.  There's also a reasonable chance
>> that this might be uncovering a significant bug that's
>> been around awhile, but never noticed as other than
>> a barely noticeable vague slowdown because rdtsc was
>> unemulated (and "fast").
>> 
>> The problem:
>> 
>> In order to preserve TSC across save/restore/migrate, I
>> have implemented a "tsc offset" (and also a "tsc scale"
>> but that isn't used yet).
>> 
>> The result is that the PV kernel starts doing rdtsc's at
>> a VERY high frequency (1 MILLION / sec).  I suspect this
>> may be a variation of what Jeremy reported at one point
>> when emulated rdtsc was first in-tree, but seemed to go away.
>> 
>> By adding some debug code (and confirmed with xenctx)
>> I can see that the millions of rdtsc's are half in
>> get_nsec_offset() and half in do_gettimeofday() (presumably
>> inlined from get_usec_offset()).  This is a 32-bit 2.6.18-based
>> PV kernel, not upstream.  Poring through the 2.6.18 PV time
>> code, I can find several places where an essentially infinite
>> loop might happen if the version fields are wacko, but
>> none where the timestamp contents make any difference
>> in control flow, so don't see how modifying these
>> values (by adding the offset) could cause a behavioral
>> change in Linux, but obviously a big change is happening!
>> 
>> I can reproduce the problem with a very simple patch
>> on xen-unstable that adds a fake fixed offset in the
>> three places I add the "tsc offset", see attached.
>> By changing BIG_OFFSET to 0, in this patch, the
>> frequency of rdtsc's becomes normal again.
>> 
>> Suspecting some interaction with wallclock time, I
>> tried shutting off ntpd and with/without independent
>> wallclock in the PV guest.  No difference.
>> 
>> I also added debug code to see if the Xen-side code
>> was churning through version numbers... it is not.
>> 
>> Any ideas?  (And, sorry, I know you're on a trans-
>> hemisphere trip right now.)
>> 
>> Thanks,
>> Dan
>> 
>> diff -r bec27eb6f72c xen/arch/x86/time.c
>> --- a/xen/arch/x86/time.c Sat Nov 14 10:32:59 2009 +0000
>> +++ b/xen/arch/x86/time.c Fri Nov 20 16:58:18 2009 -0500
>> @@ -813,6 +813,8 @@ s_time_t get_s_time(void)
>>  #define version_update_begin(v) (((v)+1)|1)
>>  #define version_update_end(v)   ((v)+1)
>>  
>> +#define BIG_OFFSET 10000000000ULL
>> +
>>  static void __update_vcpu_system_time(struct vcpu *v, int force)
>>  {
>>      struct cpu_time       *t;
>> @@ -827,7 +829,7 @@ static void __update_vcpu_system_time(st
>>  
>>      /* Don't bother unless timestamps have changed or we are
>> forced. */
>>      if ( !force && (u->tsc_timestamp == (v->domain->arch.vtsc
>> -                                         ? t->stime_local_stamp
>> +                                         ?
>> t->stime_local_stamp - BIG_OFFSET
>>                                           : t->local_tsc_stamp)) )
>>          return;
>>  
>> @@ -835,8 +837,8 @@ static void __update_vcpu_system_time(st
>>  
>>      if ( v->domain->arch.vtsc )
>>      {
>> -        _u.tsc_timestamp     = t->stime_local_stamp;
>> -        _u.system_time       = t->stime_local_stamp;
>> +        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
>> +        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
>>          _u.tsc_to_system_mul = 0x80000000u;
>>          _u.tsc_shift         = 1;
>>      }
>> @@ -1598,6 +1600,8 @@ void pv_soft_rdtsc(struct vcpu *v, struc
>>  
>>      spin_unlock(&v->domain->arch.vtsc_lock);
>>  
>> +    now -= BIG_OFFSET;
>> +
>>      regs->eax = (uint32_t)now;
>>      regs->edx = (uint32_t)(now >> 32);
>>  }
>> 
>> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Bizarre pv kernel ultra-high frequency rdtsc?!?
  2009-11-21 20:02   ` Keir Fraser
@ 2009-11-21 21:24     ` Dan Magenheimer
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Magenheimer @ 2009-11-21 21:24 UTC (permalink / raw)
  To: Keir Fraser, Jeremy Fitzhardinge; +Cc: Xen-Devel (E-mail)

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>
> What happens if you add BIG_OFFSET rather than subtract it? You'll be
> creating some big 64-bit TSC stamps otherwise, which we'd 
> never normally
> expect to reach with a 64-bit-wide counter. Also you will be 
> wrapping the
> vTSC some time fairly soon after boot.

Thanks for the reply!

Well BIG_OFFSET is only 10,000,000,000 so its only a few
seconds (at ~3GHz) and it takes longer than that for Xen
to boot, so the subtraction isn't going "below" zero.
(In the real code, I'm trying to make tsc within the guest
appear to start at zero as if it were real hardware,
so in the real code it is definitely never negative.)

Though skeptical, I tried switching to ADDing the offset
instead as you suggested and... the OPPOSITE seems to happen.
The PV kernel seems to boot VERY slowly, with periodic
"BUG soft lockup" messages.  Xenctx shows that the PV
kernel is spending nearly all of its time idling, and
looking at the count of emulated rdtsc's, there is a
burst of only about a dozen every 10 seconds or so. It
does eventually get to a login prompt, and I can
see from /proc/interrupts that there is now less
than one timer0 interrupt/second.

Curiouser and curiouser...

> Maybe trace what the guest thinks system time is, compared 
> with Xen itself. If they get out of whack, you can get
> nastiness with guest trying to set
> timers it thinks are in the future but Xen thinks are in the 
> past, and end up in a loop.

That sounds promising... any pointers on where to
place some debug code or what variables to probe?

Thanks,
Dan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Saturday, November 21, 2009 1:02 PM
> To: Dan Magenheimer; Jeremy Fitzhardinge
> Cc: Xen-Devel (E-mail)
> Subject: Re: Bizarre pv kernel ultra-high frequency rdtsc?!?
> 
> 
> Maybe trace what the guest thinks system time is, compared 
> with Xen itself.
> If they get out of whack, you can get nastiness with guest 
> trying to set
> timers it thinks are in the future but Xen thinks are in the 
> past, and end
> up in a loop.
> 
> I still think this is due to the big +ve 64-bit vTSC values 
> looking like -ve
> values in pv_soft_rdtsc() and accidentally triggering the
> TSC-hasn't-gone-forwards check. As soon as system time goes 
> awry between
> guest and Xen, I think you are in trouble. There could be 
> more 'false -ves'
> like that elsewhere -- so just add BIG_OFFSET rather than 
> subtract it. In
> practice your TSC values are never going to have the 64th bit 
> set (would
> take 30 years for TSC to tick that high assuming a 10GHz tick 
> rate), so
> adding BIG_OFFSET is a more reasonable synthetic test.
> 
>  -- Keir
> 
> On 21/11/2009 17:31, "Dan Magenheimer" 
> <dan.magenheimer@oracle.com> wrote:
> 
> > One big clue: Looking at /proc/interrupts inside
> > the PV guest, the number of timer0 interrupts is
> > about 300K/second.
> > 
> > Not remembering well how timer interrupts are handled
> > in a PV guest... could this high frequency be happening
> > because the Linux-side PV code is setting a timer
> > or because the Xen-side interrupt delivery code is
> > getting confused?
> > 
> >> -----Original Message-----
> >> From: Dan Magenheimer
> >> Sent: Friday, November 20, 2009 4:45 PM
> >> To: Keir Fraser; Jeremy Fitzhardinge
> >> Cc: Xen-Devel (E-mail)
> >> Subject: Bizarre pv kernel ultra-high frequency rdtsc?!?
> >> 
> >> 
> >> Hi Jeremy/Keir (and any other PV time experts out there) --
> >> 
> >> Working on tsc_mode stuff I've run into a roadblock where
> >> there is some time-related interaction between xen and a
> >> PV kernel that I don't understand.  I'm hoping you
> >> might provide a clue.  There's also a reasonable chance
> >> that this might be uncovering a significant bug that's
> >> been around awhile, but never noticed as other than
> >> a barely noticeable vague slowdown because rdtsc was
> >> unemulated (and "fast").
> >> 
> >> The problem:
> >> 
> >> In order to preserve TSC across save/restore/migrate, I
> >> have implemented a "tsc offset" (and also a "tsc scale"
> >> but that isn't used yet).
> >> 
> >> The result is that the PV kernel starts doing rdtsc's at
> >> a VERY high frequency (1 MILLION / sec).  I suspect this
> >> may be a variation of what Jeremy reported at one point
> >> when emulated rdtsc was first in-tree, but seemed to go away.
> >> 
> >> By adding some debug code (and confirmed with xenctx)
> >> I can see that the millions of rdtsc's are half in
> >> get_nsec_offset() and half in do_gettimeofday() (presumably
> >> inlined from get_usec_offset()).  This is a 32-bit 2.6.18-based
> >> PV kernel, not upstream.  Poring through the 2.6.18 PV time
> >> code, I can find several places where an essentially infinite
> >> loop might happen if the version fields are wacko, but
> >> none where the timestamp contents make any difference
> >> in control flow, so don't see how modifying these
> >> values (by adding the offset) could cause a behavioral
> >> change in Linux, but obviously a big change is happening!
> >> 
> >> I can reproduce the problem with a very simple patch
> >> on xen-unstable that adds a fake fixed offset in the
> >> three places I add the "tsc offset", see attached.
> >> By changing BIG_OFFSET to 0, in this patch, the
> >> frequency of rdtsc's becomes normal again.
> >> 
> >> Suspecting some interaction with wallclock time, I
> >> tried shutting off ntpd and with/without independent
> >> wallclock in the PV guest.  No difference.
> >> 
> >> I also added debug code to see if the Xen-side code
> >> was churning through version numbers... it is not.
> >> 
> >> Any ideas?  (And, sorry, I know you're on a trans-
> >> hemisphere trip right now.)
> >> 
> >> Thanks,
> >> Dan
> >> 
> >> diff -r bec27eb6f72c xen/arch/x86/time.c
> >> --- a/xen/arch/x86/time.c Sat Nov 14 10:32:59 2009 +0000
> >> +++ b/xen/arch/x86/time.c Fri Nov 20 16:58:18 2009 -0500
> >> @@ -813,6 +813,8 @@ s_time_t get_s_time(void)
> >>  #define version_update_begin(v) (((v)+1)|1)
> >>  #define version_update_end(v)   ((v)+1)
> >>  
> >> +#define BIG_OFFSET 10000000000ULL
> >> +
> >>  static void __update_vcpu_system_time(struct vcpu *v, int force)
> >>  {
> >>      struct cpu_time       *t;
> >> @@ -827,7 +829,7 @@ static void __update_vcpu_system_time(st
> >>  
> >>      /* Don't bother unless timestamps have changed or we are
> >> forced. */
> >>      if ( !force && (u->tsc_timestamp == (v->domain->arch.vtsc
> >> -                                         ? t->stime_local_stamp
> >> +                                         ?
> >> t->stime_local_stamp - BIG_OFFSET
> >>                                           : t->local_tsc_stamp)) )
> >>          return;
> >>  
> >> @@ -835,8 +837,8 @@ static void __update_vcpu_system_time(st
> >>  
> >>      if ( v->domain->arch.vtsc )
> >>      {
> >> -        _u.tsc_timestamp     = t->stime_local_stamp;
> >> -        _u.system_time       = t->stime_local_stamp;
> >> +        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
> >> +        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
> >>          _u.tsc_to_system_mul = 0x80000000u;
> >>          _u.tsc_shift         = 1;
> >>      }
> >> @@ -1598,6 +1600,8 @@ void pv_soft_rdtsc(struct vcpu *v, struc
> >>  
> >>      spin_unlock(&v->domain->arch.vtsc_lock);
> >>  
> >> +    now -= BIG_OFFSET;
> >> +
> >>      regs->eax = (uint32_t)now;
> >>      regs->edx = (uint32_t)(now >> 32);
> >>  }
> >> 
> >> 
> 
> 
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bizarre pv kernel ultra-high frequency rdtsc?!?
  2009-11-20 23:45 Bizarre pv kernel ultra-high frequency rdtsc?!? Dan Magenheimer
  2009-11-21 17:31 ` Dan Magenheimer
  2009-11-21 19:50 ` Keir Fraser
@ 2009-11-21 21:45 ` Keir Fraser
  2009-11-21 22:05   ` Dan Magenheimer
  2 siblings, 1 reply; 7+ messages in thread
From: Keir Fraser @ 2009-11-21 21:45 UTC (permalink / raw)
  To: Dan Magenheimer, Jeremy Fitzhardinge; +Cc: Xen-Devel (E-mail)

On 20/11/2009 23:45, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> -        _u.tsc_timestamp     = t->stime_local_stamp;
> -        _u.system_time       = t->stime_local_stamp;
> +        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
> +        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;

Ah, the line above is the bug. You should be subtracting BIG_OFFSET from
_u.tsc_timestamp only. Leave _u.system_time as it is. That'll fix all your
observed problems.

 -- Keir

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Bizarre pv kernel ultra-high frequency rdtsc?!?
  2009-11-21 21:45 ` Keir Fraser
@ 2009-11-21 22:05   ` Dan Magenheimer
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Magenheimer @ 2009-11-21 22:05 UTC (permalink / raw)
  To: Keir Fraser, Jeremy Fitzhardinge; +Cc: Xen-Devel (E-mail)

Yes, that does indeed fix it.  Thanks very much!

Now, onward to scaling!  (Necessary when the PV kernel
boots on "tsc safe" hardware but then migrates.)

Thanks again!
Dan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Saturday, November 21, 2009 2:45 PM
> To: Dan Magenheimer; Jeremy Fitzhardinge
> Cc: Xen-Devel (E-mail)
> Subject: Re: Bizarre pv kernel ultra-high frequency rdtsc?!?
> 
> 
> On 20/11/2009 23:45, "Dan Magenheimer" 
> <dan.magenheimer@oracle.com> wrote:
> 
> > -        _u.tsc_timestamp     = t->stime_local_stamp;
> > -        _u.system_time       = t->stime_local_stamp;
> > +        _u.tsc_timestamp     = t->stime_local_stamp - BIG_OFFSET;
> > +        _u.system_time       = t->stime_local_stamp - BIG_OFFSET;
> 
> Ah, the line above is the bug. You should be subtracting 
> BIG_OFFSET from
> _u.tsc_timestamp only. Leave _u.system_time as it is. That'll 
> fix all your
> observed problems.
> 
>  -- Keir
> 
> 
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-11-21 22:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-20 23:45 Bizarre pv kernel ultra-high frequency rdtsc?!? Dan Magenheimer
2009-11-21 17:31 ` Dan Magenheimer
2009-11-21 20:02   ` Keir Fraser
2009-11-21 21:24     ` Dan Magenheimer
2009-11-21 19:50 ` Keir Fraser
2009-11-21 21:45 ` Keir Fraser
2009-11-21 22:05   ` Dan Magenheimer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.