* xen tsc problems?
@ 2010-07-13 14:37 Stefano Stabellini
  2010-07-13 14:43 ` Keir Fraser
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2010-07-13 14:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Dan Magenheimer

Hi all,
I get this warning from the HVM DomU kernel (both PV on HVM and normal
HVM):

checking TSC synchronization [CPU#0 -> CPU#1]:
Measured 116836520 cycles TSC warp between CPUs, turning off TSC clock.
Marking TSC unstable due to check_tsc_sync_source failed
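
For context, a minimal sketch of what a warp check of this kind measures:
CPUs take turns reading a counter under a shared lock, and each reading is
compared with the last value seen by any CPU; the largest backwards step is
the reported warp. This is a simplified Python model, not the kernel's code.
The function name max_warp and the sample readings are made up for
illustration.

```python
def max_warp(readings):
    """Return the largest backwards step in a sequence of counter readings.

    `readings` is a list of (cpu, tsc) pairs in the order they were taken
    under a shared lock. With synchronized counters each value is >= the
    previous one; any decrease counts as a warp.
    """
    last = None
    worst = 0
    for _cpu, tsc in readings:
        if last is not None and tsc < last:
            worst = max(worst, last - tsc)
        last = tsc
    return worst


# Hypothetical readings: CPU 1's counter runs behind CPU 0's.
samples = [(0, 5000), (1, 4100), (0, 5200), (1, 4300)]
print(max_warp(samples))
```

A warp of zero means no reading ever went backwards relative to the
previous one, which is what a guest expects from synchronized TSCs.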

the host cpu is the following:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 6
model name      : Genuine Intel(R) CPU 3.00GHz
stepping        : 2
cpu MHz         : 3000.050
cache size      : 2048 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc pni est cid hypervisor arat
bogomips        : 6000.10
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

it happens on both xen 4.0 (pre 21236) and 4.1.

if I specify tsc_mode=2, first I get:

checking TSC synchronization [CPU#0 -> CPU#1]: passed.

but a little bit afterwards:

Clocksource tsc unstable (delta = 116372610 ns)

If I use a PV on HVM kernel (pv timer enabled) and tsc_mode=1,
besides these messages I get about 20-30 messages like the following:

CE: xen increased min_delta_ns to 506250 nsec

tracing them back to xen I found out that they happen when the guest
kernel tries to set the next timer event in the past.
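
As a rough cross-check of the 506250 figure: assuming the xen clockevent
starts with a min_delta_ns of 100000 ns and the kernel grows it by half
each time it gives up on programming an event (both are assumptions about
this kernel version, not something shown in the logs above), four increases
give exactly that value. The helper name grow_min_delta is made up for this
sketch.

```python
def grow_min_delta(start_ns, steps):
    """Apply `steps` increases of the form x += x >> 1 (i.e. x * 1.5)."""
    value = start_ns
    history = [value]
    for _ in range(steps):
        value += value >> 1
        history.append(value)
    return history


# Assumed starting value of 100000 ns; not taken from the logs above.
history = grow_min_delta(100_000, 4)
print(history)  # [100000, 150000, 225000, 337500, 506250]
```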

Does this mean that the host has some serious tsc issues?
Can this be a symptom of a bug in xen?
Suggestions are welcome.

Cheers,

Stefano


* Re: xen tsc problems?
  2010-07-13 14:37 xen tsc problems? Stefano Stabellini
@ 2010-07-13 14:43 ` Keir Fraser
  2010-07-13 15:37   ` Dan Magenheimer
  0 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2010-07-13 14:43 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: Dan Magenheimer

On 13/07/2010 15:37, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
wrote:

> Does this mean that the host has some serious tsc issues?
> Can this be a symptom of a bug in xen?
> Suggestion are welcome.

The 's' and 't' debug key handlers will be useful to get an idea of how
stable host TSCs are.

 -- Keir


* RE: xen tsc problems?
  2010-07-13 14:43 ` Keir Fraser
@ 2010-07-13 15:37   ` Dan Magenheimer
  2010-07-13 17:39     ` Stefano Stabellini
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Magenheimer @ 2010-07-13 15:37 UTC (permalink / raw)
  To: Keir Fraser, Stefano Stabellini, xen-devel

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> 
> On 13/07/2010 15:37, "Stefano Stabellini"
> <Stefano.Stabellini@eu.citrix.com>
> wrote:
> 
> > Does this mean that the host has some serious tsc issues?
> > Can this be a symptom of a bug in xen?
> > Suggestion are welcome.
> 
> The 's' and 't' debug key handlers will be useful to get an idea of how
> stable host TSCs are.
> 
>  -- Keir

Also you can try max_cstate=0 as a Xen boot parameter to rule
out power management screwing up the tsc.

> > Does this mean that the host has some serious tsc issues?

Probably.  But the default tsc_mode (0) is intended to hide all
such issues.  Could you check the 's' debug-key output to
ensure your guest is actually running with tsc_mode=0?

> > Can this be a symptom of a bug in xen?

Well, if the guest has problems with the default tsc_mode (0),
which does complete tsc emulation, I suppose it could be
a bug in Xen.  In particular, I wonder if the code that
recovers from deep C-states (and writes to the TSC) is broken.
IIRC, there were some changesets in that area recently.
If the problem goes away with max_cstate=0, that would be
a good place to start.

Dan


* RE: xen tsc problems?
  2010-07-13 15:37   ` Dan Magenheimer
@ 2010-07-13 17:39     ` Stefano Stabellini
  2010-07-13 17:48       ` Keir Fraser
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2010-07-13 17:39 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: xen-devel, Keir Fraser, Stefano Stabellini

On Tue, 13 Jul 2010, Dan Magenheimer wrote:
> > From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> > 
> > On 13/07/2010 15:37, "Stefano Stabellini"
> > <Stefano.Stabellini@eu.citrix.com>
> > wrote:
> > 
> > > Does this mean that the host has some serious tsc issues?
> > > Can this be a symptom of a bug in xen?
> > > Suggestion are welcome.
> > 
> > The 's' and 't' debug key handlers will be useful to get an idea of how
> > stable host TSCs are.
> > 
> >  -- Keir
> 
> Also you can try max_cstate=0 as a Xen boot parameter to rule
> out power management screwing up the tsc.
> 
> > > Does this mean that the host has some serious tsc issues?
> 
> Probably.  But the default tsc_mode (0) is intended to hide all
> such issues.  Could you check the 's' debug-key output to
> ensure your guest is actually running with tsc_mode=0?
> 

this is the output of 's' and 't' without max_cstate=0:

(XEN) Synced stime skew: max=245ns avg=202ns samples=2 current=160ns
(XEN) Synced cycles skew: max=615 avg=577 samples=2 current=540
(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=0 (count=2)
(XEN) dom3(hvm): mode=0,ofs=0x2b2e19a77ea,khz=3000048,inc=1,vtsc count: 1211682 total

this is the output of 's' and 't' with max_cstate=0:

(XEN) Synced stime skew: max=110ns avg=105ns samples=2 current=110ns
(XEN) Synced cycles skew: max=1020 avg=652 samples=2 current=285
(XEN) TSC has constant rate, no deep Cstates, passed warp test, deemed reliable, warp=0 (count=2)
(XEN) dom2(hvm): mode=0,ofs=0xb748091f5,khz=3000032,inc=1,vtsc count: 758954 total

I still get the same warning from the guest.
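
For anyone comparing several of these per-domain lines, a small sketch
that pulls the fields out of the dom line of the debug-key output. The
regular expression is written against the two lines shown above only; the
field names in the returned dict (tsc_mode and so on) are labels chosen for
this example, and other Xen versions may print a different format.

```python
import re

# Pattern matched against the per-domain line as it appears above.
DOM_LINE = re.compile(
    r"dom(?P<domid>\d+)\((?P<kind>\w+)\): "
    r"mode=(?P<mode>\d+),ofs=(?P<ofs>0x[0-9a-fA-F]+),"
    r"khz=(?P<khz>\d+),inc=(?P<inc>\d+)"
)


def parse_dom_line(line):
    """Return the fields of a per-domain TSC line, or None if no match."""
    m = DOM_LINE.search(line)
    if m is None:
        return None
    return {
        "domid": int(m["domid"]),
        "kind": m["kind"],
        "tsc_mode": int(m["mode"]),
        "ofs": int(m["ofs"], 16),
        "khz": int(m["khz"]),
        "inc": int(m["inc"]),
    }


line = ("(XEN) dom3(hvm): mode=0,ofs=0x2b2e19a77ea,khz=3000048,"
        "inc=1,vtsc count: 1211682 total")
info = parse_dom_line(line)
print(info["tsc_mode"], info["khz"])
```

Both runs above report mode=0, so the guest was using the default tsc_mode
in each case.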


I started to wonder why the guest is seeing such a big tsc warp when xen
is seeing 0, so I added more tracing and eventually I found out that the
value of v->arch.hvm_vcpu.stime_offset is significantly different
between the two vcpus and the difference increases after the scaling.
Then I added timer_mode=1 to my vm config file and the problem went
away.
I think that delay_for_missed_ticks shouldn't cause tsc skew in
the guest.
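
To make the effect concrete, here is a toy model, not Xen's code: assume
each vcpu's guest TSC is derived as (system time + per-vcpu stime_offset),
scaled from nanoseconds to cycles at the guest's kHz rate. The function
name guest_tsc and the offsets are hypothetical; only khz=3000048 is taken
from the debug-key output above.

```python
def guest_tsc(system_time_ns, stime_offset_ns, khz):
    """Toy model: scale (system time + per-vcpu offset) from ns to cycles."""
    return (system_time_ns + stime_offset_ns) * khz // 1_000_000


KHZ = 3_000_048            # from the 's' debug-key output
now_ns = 10_000_000_000    # arbitrary system time, 10 s

# Hypothetical offsets that differ by 38.9 ms between the two vcpus.
tsc_vcpu0 = guest_tsc(now_ns, 0, KHZ)
tsc_vcpu1 = guest_tsc(now_ns, -38_900_000, KHZ)

warp_cycles = tsc_vcpu0 - tsc_vcpu1
print(warp_cycles)
```

At roughly 3 cycles per nanosecond, an offset difference of about 39 ms
becomes a warp on the order of 117 million cycles, the same order as the
116836520 cycles reported by the guest. In this model xen's own warp test
still sees 0, because the host TSCs are not involved in the difference.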


* Re: xen tsc problems?
  2010-07-13 17:39     ` Stefano Stabellini
@ 2010-07-13 17:48       ` Keir Fraser
  2010-07-13 18:06         ` Dan Magenheimer
  2010-07-13 18:12         ` Keir Fraser
  0 siblings, 2 replies; 10+ messages in thread
From: Keir Fraser @ 2010-07-13 17:48 UTC (permalink / raw)
  To: Stefano Stabellini, Dan Magenheimer; +Cc: xen-devel

On 13/07/2010 18:39, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
wrote:

> I started to wonder why the guest is seeing such a big tsc warp when xen
> is seeing 0, so I added more tracing and eventually I found out that the
> value of v->arch.hvm_vcpu.stime_offset is significantly different
> between the two vcpus and the difference increases after the scaling.
> Then I added timer_mode=1 to my vm config file and the problem went
> away.
> I think that delay_for_missed_ticks shouldn't cause tsc scew in
> the guest.

Well, timer_mode=1 is the default and I doubt in all seriousness that the
other modes get any use or testing.

 -- Keir


* RE: xen tsc problems?
  2010-07-13 17:48       ` Keir Fraser
@ 2010-07-13 18:06         ` Dan Magenheimer
  2010-07-13 18:14           ` Stefano Stabellini
  2010-07-13 18:12         ` Keir Fraser
  1 sibling, 1 reply; 10+ messages in thread
From: Dan Magenheimer @ 2010-07-13 18:06 UTC (permalink / raw)
  To: Keir Fraser, Stefano Stabellini; +Cc: xen-devel

/me wonders if timer_mode=1 is the default for xl?
Or only for xm?

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, July 13, 2010 11:48 AM
> To: Stefano Stabellini; Dan Magenheimer
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] xen tsc problems?
> 
> On 13/07/2010 18:39, "Stefano Stabellini"
> <Stefano.Stabellini@eu.citrix.com>
> wrote:
> 
> > I started to wonder why the guest is seeing such a big tsc warp when
> xen
> > is seeing 0, so I added more tracing and eventually I found out that
> the
> > value of v->arch.hvm_vcpu.stime_offset is significantly different
> > between the two vcpus and the difference increases after the scaling.
> > Then I added timer_mode=1 to my vm config file and the problem went
> > away.
> > I think that delay_for_missed_ticks shouldn't cause tsc scew in
> > the guest.
> 
> Well, timer_mode=1 is the default and I doubt in all seriousness that
> the
> other modes get any use or testing.
> 
>  -- Keir


* Re: xen tsc problems?
  2010-07-13 17:48       ` Keir Fraser
  2010-07-13 18:06         ` Dan Magenheimer
@ 2010-07-13 18:12         ` Keir Fraser
  1 sibling, 0 replies; 10+ messages in thread
From: Keir Fraser @ 2010-07-13 18:12 UTC (permalink / raw)
  To: Stefano Stabellini, Dan Magenheimer; +Cc: xen-devel

On 13/07/2010 18:48, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> I started to wonder why the guest is seeing such a big tsc warp when xen
>> is seeing 0, so I added more tracing and eventually I found out that the
>> value of v->arch.hvm_vcpu.stime_offset is significantly different
>> between the two vcpus and the difference increases after the scaling.
>> Then I added timer_mode=1 to my vm config file and the problem went
>> away.
>> I think that delay_for_missed_ticks shouldn't cause tsc scew in
>> the guest.
> 
> Well, timer_mode=1 is the default and I doubt in all seriousness that the
> other modes get any use or testing.

To give you an idea how long it's probably been broken, my suspicion is that
the culprit is xen-unstable:17716, which is over two years old. That patch
changed HVM time handling to base it more on Xen system time. The fact that
hvm_set_guest_time() no longer directly affects guest TSC is probably the
problem here. I think delay_for_missed_ticks might depend on that. Anyway,
I'm not certain but I'd put money on it.

 -- Keir


* RE: xen tsc problems?
  2010-07-13 18:06         ` Dan Magenheimer
@ 2010-07-13 18:14           ` Stefano Stabellini
  2010-07-13 18:59             ` Keir Fraser
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2010-07-13 18:14 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: xen-devel, Keir Fraser, Stefano Stabellini

On Tue, 13 Jul 2010, Dan Magenheimer wrote:
> /me wonders if timer_mode=1 is the default for xl?
> Or only for xm?

no, it is not.
Xl defaults to 0, I am going to change it right now.
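
Until the default changes, setting the option explicitly in the guest
config avoids depending on which toolstack is used. A minimal fragment,
written as plain name = value assignments; only the two option names and
the values discussed in this thread are used, and the rest of the guest
config is omitted.

```python
# Guest config fragment (plain name = value assignments).
timer_mode = 1   # the setting that made the warp warning go away above
tsc_mode = 0     # the default discussed earlier in the thread
```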


* Re: xen tsc problems?
  2010-07-13 18:14           ` Stefano Stabellini
@ 2010-07-13 18:59             ` Keir Fraser
  2010-07-13 19:32               ` Dan Magenheimer
  0 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2010-07-13 18:59 UTC (permalink / raw)
  To: Stefano Stabellini, Dan Magenheimer; +Cc: xen-devel

On 13/07/2010 19:14, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com>
wrote:

> On Tue, 13 Jul 2010, Dan Magenheimer wrote:
>> /me wonders if timer_mode=1 is the default for xl?
>> Or only for xm?
> 
> no, it is not.
> Xl defaults to 0, I am going to change it right now.

Possibly we should make timer_mode=1 the default in Xen as well, and
actually disallow setting it to 0. Clearly no good comes of it.

 -- Keir


* RE: xen tsc problems?
  2010-07-13 18:59             ` Keir Fraser
@ 2010-07-13 19:32               ` Dan Magenheimer
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Magenheimer @ 2010-07-13 19:32 UTC (permalink / raw)
  To: Keir Fraser, Stefano Stabellini; +Cc: xen-devel

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, July 13, 2010 12:59 PM
> To: Stefano Stabellini; Dan Magenheimer
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] xen tsc problems?
> 
> On 13/07/2010 19:14, "Stefano Stabellini"
> <Stefano.Stabellini@eu.citrix.com>
> wrote:
> 
> > On Tue, 13 Jul 2010, Dan Magenheimer wrote:
> >> /me wonders if timer_mode=1 is the default for xl?
> >> Or only for xm?
> >
> > no, it is not.
> > Xl defaults to 0, I am going to change it right now.
> 
> Possibly we should make timer_mode=1 the default in Xen as well, and
> actually disallow setting it to 0. Clearly no good comes of it.

IIRC from >2 years ago, timer_mode=0 was best for older HVM 32-bit
Linux guests.  Obviously if the code (interacting with the tsc code)
has bit-rotted, "best" is a relative term. :-)

Dan
