* Runaway real/sys time in newer paravirt domUs?
@ 2010-07-06 16:32 Jed Smith
  2010-07-06 19:05 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 9+ messages in thread
From: Jed Smith @ 2010-07-06 16:32 UTC (permalink / raw)
  To: xen-devel

Good morning,

We've had a few reports from domU customers[1] - which I've confirmed myself -
that CPU time accounting is very inaccurate in certain circumstances.  The issue
seems to be limited to x86_64 domUs, starting around the 2.6.32 family (though
I can't be sure of that).

The symptoms of the flaw include top reporting hours or even days of CPU time
consumed by a task that has been running for mere seconds of wall time, and
time(1) reporting hundreds of years in some cases.  Counterintuitively, the
/proc/stat counters on all four VCPUs increment at roughly the expected rate.
Needless to say, this is puzzling.

Ævar Arnfjörð Bjarmason brought to our attention a test case that highlights
the failure: a simple Perl script[2] that forks and executes numerous dig(1)
processes.  At the end of his script, time(1) reports 268659840m0.951s of user
and 38524003m13.072s of system time consumed.  I am able to confirm this
demonstration using:

 - Xen 3.4.1 on dom0 2.6.18.8-931-2
 - Debian Lenny on domU 2.6.32.12-x86_64-linode12 [3]

Running Ævar's test case looks like this, in that domU:

> real 0m30.741s
> user 307399002m50.773s
> sys 46724m44.192s
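
For anyone who would rather not pull the gist, the workload that produces those
numbers boils down to forking a pile of children that each exec dig(1) and then
reaping them.  A rough C equivalent (my own sketch, not Ævar's actual script[2];
the hostname and iteration count are arbitrary) would be:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Fork a batch of children that each exec dig(1), then reap them all so
     * that time(1) can total up the children's CPU usage. */
    int main(void)
    {
        int i;

        for (i = 0; i < 100; i++) {
            pid_t pid = fork();

            if (pid == 0) {
                execlp("dig", "dig", "+short", "example.com", (char *)NULL);
                _exit(127);             /* exec failed */
            } else if (pid < 0) {
                perror("fork");
                break;
            }
        }

        while (wait(NULL) > 0)
            ;                           /* wait for every child to exit */

        return 0;
    }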

However, a quick busyloop in Python seems to report the correct time:

> li21-66:~# cat doit.py 
> for i in xrange(10000000):
>  a = i ** 5
>
> li21-66:~# time python doit.py
>
> real	0m16.600s
> user	0m16.593s
> sys	0m0.006s

I rebooted the domU, and the problem no longer exists.  It seems to be transient
in nature, and difficult to isolate.  /proc/stat seems to increment normally:

> li21-66:/proc# cat stat | grep "cpu " && sleep 1 && cat stat | grep "cpu "
> cpu  3742 0 1560 700180 1326 0 27 1282 0
> cpu  3742 0 1562 700983 1326 0 27 1282 0

I'm not sure where to begin with this one - any thoughts?

 [1]: http://www.linode.com/forums/viewtopic.php?p=30715
 [2]: git://gist.github.com/449825.git
 [3]: Source: http://www.linode.com/src/2.6.32.12-x86_64-linode12.tar.bz2
      Config: http://jedsmith.org/tmp/2.6.32.12-x86_64-linode12.txt

Thanks for the assistance,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com


* Re: Runaway real/sys time in newer paravirt domUs?
  2010-07-06 16:32 Runaway real/sys time in newer paravirt domUs? Jed Smith
@ 2010-07-06 19:05 ` Jeremy Fitzhardinge
  2010-07-06 20:20   ` Jed Smith
  2010-07-08 16:38   ` Jed Smith
  0 siblings, 2 replies; 9+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-06 19:05 UTC (permalink / raw)
  To: Jed Smith; +Cc: xen-devel

On 07/06/2010 09:32 AM, Jed Smith wrote:
> Good morning,
>
> We've had a few reports from domU customers[1] - which I've confirmed myself -
> that CPU time accounting is very inaccurate in certain circumstances.  The issue
> seems to be limited to x86_64 domUs, starting around the 2.6.32 family (though
> I can't be sure of that).
>
> The symptoms of the flaw include top reporting hours or even days of CPU time
> consumed by a task that has been running for mere seconds of wall time, and
> time(1) reporting hundreds of years in some cases.  Counterintuitively, the
> /proc/stat counters on all four VCPUs increment at roughly the expected rate.
> Needless to say, this is puzzling.
>
> Ævar Arnfjörð Bjarmason brought to our attention a test case that highlights
> the failure: a simple Perl script[2] that forks and executes numerous dig(1)
> processes.  At the end of his script, time(1) reports 268659840m0.951s of user
> and 38524003m13.072s of system time consumed.  I am able to confirm this
> demonstration using:
>
>  - Xen 3.4.1 on dom0 2.6.18.8-931-2
>  - Debian Lenny on domU 2.6.32.12-x86_64-linode12 [3]
>
> Running Ævar's test case looks like this, in that domU:
>
>   
>> real 0m30.741s
>> user 307399002m50.773s
>> sys 46724m44.192s
>>     
> However, a quick busyloop in Python seems to report the correct time:
>
>   
>> li21-66:~# cat doit.py 
>> for i in xrange(10000000):
>>  a = i ** 5
>>
>> li21-66:~# time python doit.py
>>
>> real	0m16.600s
>> user	0m16.593s
>> sys	0m0.006s
>>     
> I rebooted the domU, and the problem no longer exists.  It seems to be transient
> in nature, and difficult to isolate.  /proc/stat seems to increment normally:
>
>   
>> li21-66:/proc# cat stat | grep "cpu " && sleep 1 && cat stat | grep "cpu "
>> cpu  3742 0 1560 700180 1326 0 27 1282 0
>> cpu  3742 0 1562 700983 1326 0 27 1282 0
>>     
> I'm not sure where to begin with this one - any thoughts?
>   

It would be helpful to identify what kernel version the change of
behaviour started in (ideally a git bisect down to a particular change,
but a pair of versions would be close enough).

I think this is the same problem as
https://bugzilla.kernel.org/show_bug.cgi?id=16314

Thanks,
    J


* Re: Runaway real/sys time in newer paravirt domUs?
  2010-07-06 19:05 ` Jeremy Fitzhardinge
@ 2010-07-06 20:20   ` Jed Smith
  2010-07-08 16:38   ` Jed Smith
  1 sibling, 0 replies; 9+ messages in thread
From: Jed Smith @ 2010-07-06 20:20 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel

On Jul 6, 2010, at 3:05 PM, Jeremy Fitzhardinge wrote:

> I think this is the same problem as
> https://bugzilla.kernel.org/show_bug.cgi?id=16314

That fits - "OverlordQ" from that bug entry has also posted in the forum thread
we have about it.  I'll see if I can bisect kernels later this evening or
tomorrow - do you think 2.6.31 is a safe guess for the left side of the bisect?

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com


* Re: Runaway real/sys time in newer paravirt domUs?
  2010-07-06 19:05 ` Jeremy Fitzhardinge
  2010-07-06 20:20   ` Jed Smith
@ 2010-07-08 16:38   ` Jed Smith
  2010-07-09  8:46     ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: Jed Smith @ 2010-07-08 16:38 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel

On Jul 6, 2010, at 3:05 PM, Jeremy Fitzhardinge wrote:

> I think this is the same problem as
> https://bugzilla.kernel.org/show_bug.cgi?id=16314

If it is, it goes back further than the 2.6.31 that bug report suggests.  I was
able to reproduce it once in v2.6.30 as tagged, and am still laboring to find a
good/bad pair to bisect between.  The failure rate is significantly lower in
2.6.30, but it did happen once.

So far, Ubuntu linux-image-ec2 (2.6.32-30{5,7}-ec2), our 2.6.32.12, vanilla
2.6.31, and vanilla 2.6.30 have demonstrated this issue in my testing
environment.  All x86_64.  I have been unable to reproduce it on i386 so far.

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com


* Re: Runaway real/sys time in newer paravirt domUs?
  2010-07-08 16:38   ` Jed Smith
@ 2010-07-09  8:46     ` Jan Beulich
  2010-07-09 14:57       ` Jed Smith
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2010-07-09  8:46 UTC (permalink / raw)
  To: Jed Smith; +Cc: Jeremy Fitzhardinge, xen-devel

>>> On 08.07.10 at 18:38, Jed Smith <jed@linode.com> wrote:
> On Jul 6, 2010, at 3:05 PM, Jeremy Fitzhardinge wrote:
> 
>> I think this is the same problem as
>> https://bugzilla.kernel.org/show_bug.cgi?id=16314 
> 
> If it is, it goes back further than the 2.6.31 that bug report suggests.  I
> was able to reproduce it once in v2.6.30 as tagged, and am still laboring to
> find a good/bad pair to bisect between.  The failure rate is significantly
> lower in 2.6.30, but it did happen once.
> 
> So far, Ubuntu linux-image-ec2 (2.6.32-30{5,7}-ec2), our 2.6.32.12, vanilla
> 2.6.31, and vanilla 2.6.30 have demonstrated this issue in my testing
> environment.  All x86_64.  I have been unable to reproduce it on i386 so far.

Is this perhaps also dependent on the CPU make/model?

Jan


* Re: Runaway real/sys time in newer paravirt domUs?
  2010-07-09  8:46     ` Jan Beulich
@ 2010-07-09 14:57       ` Jed Smith
  2010-07-09 16:00         ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Jed Smith @ 2010-07-09 14:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Jul 9, 2010, at 4:46 AM, Jan Beulich wrote:

> Is this perhaps also dependent on the CPU make/model?

I have a mostly homogeneous environment, but I'll see what I can do.  I have one
box in mind, actually...

Since I spoke with Jeremy yesterday, I've become suspicious that the issue I am
reporting here is the same one I've been seeing for some time.  A quick
background: in our domU environment, Munin reports idle time incorrectly.  All
Linode domUs have 4 VCPUs, so user/sys can legitimately range up to 400%.  Idle,
however, has always been reported as 800% under paravirtualized kernels.  This
led me to investigate a bit, and this is what I deduced:

 - In /proc/uptime, idle time outruns system uptime significantly.

 - This ratio is seemingly affected by the number of VCPUs the domU is
   configured for.  With only one VCPU, the ratio is roughly 2.0, leading me to
   think that idle ticks are counted twice per VCPU (a small checker for this is
   sketched after the list).

 - The ratio of idle time to uptime is inconclusive after a lengthy uptime.

 - In a 2.6.29 domU, two things happen:

  - a) The original bug does not manifest after 50 attempts.

  - b) In /proc/uptime, idle time is always precisely 0.00.  It never counts.

 - Between 2.6.29 and 2.6.30, /proc/uptime behaves much differently, and the bug
   then exposes itself.  Something changed there.
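
To make comparing guests easier, here is a rough checker - a sketch of my own,
nothing official - that prints the idle-to-uptime ratio from /proc/uptime along
with that ratio divided by the number of online VCPUs.  If idle really is being
counted twice per VCPU, the second figure should sit near 2.0 on affected
kernels shortly after boot:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        double uptime, idle;
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
        FILE *f = fopen("/proc/uptime", "r");

        /* /proc/uptime: first field is wall-clock uptime, second is idle time */
        if (!f || fscanf(f, "%lf %lf", &uptime, &idle) != 2) {
            fprintf(stderr, "could not read /proc/uptime\n");
            return 1;
        }
        fclose(f);

        printf("uptime=%.2fs idle=%.2fs ratio=%.3f ratio/VCPU=%.3f\n",
               uptime, idle, idle / uptime, idle / uptime / ncpus);
        return 0;
    }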

I have seen the /proc/uptime behavior on i386.  In fact, a personal domU:

   10:51 jsmith@undertow% cat /proc/uptime
   1984022.43 2954870.51

   10:51 jsmith@undertow% awk '{printf("%f\n", $2 / $1)}' /proc/uptime
   1.489342

   10:51 jsmith@undertow% uname -a
   Linux undertow.jedsmith.org 2.6.32.12-linode25 #1 SMP Wed Apr 28 19:25:11 UTC 2010 i686 GNU/Linux

I am not sure whether the fact that I can only make the original bug appear on
x86_64 is a red herring.  Maybe a timer {ov,und}erflow displays differently
depending on word size, and this is all the same issue?
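
To illustrate the sort of wraparound I have in mind - a toy model of my own,
not actual kernel code - if whatever clock feeds the per-task CPU accounting
ever steps backwards and the delta is accumulated as an unsigned value, a
single bad sample is suddenly worth centuries, which is about the magnitude
time(1) has been showing us:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t prev  = 1000000000ULL;  /* previous clock reading, in ns */
        uint64_t now   =  999900000ULL;  /* clock stepped backwards by 100us */
        uint64_t delta = now - prev;     /* wraps to just under 2^64 ns */

        /* ~1.8e19 ns, i.e. roughly 585 years of "CPU time" from one sample */
        printf("delta = %llu ns (~%.0f years)\n",
               (unsigned long long)delta,
               delta / 1e9 / 3600 / 24 / 365);
        return 0;
    }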

I will see if I can try this on a vastly different CPU later today.

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com


* Re: Runaway real/sys time in newer paravirt domUs?
  2010-07-09 14:57       ` Jed Smith
@ 2010-07-09 16:00         ` Jan Beulich
  2010-07-13 21:10           ` Jed Smith
  2010-07-14 16:31           ` Jed Smith
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Beulich @ 2010-07-09 16:00 UTC (permalink / raw)
  To: Jed Smith; +Cc: xen-devel

>>> On 09.07.10 at 16:57, Jed Smith <jed@linode.com> wrote:
> On Jul 9, 2010, at 4:46 AM, Jan Beulich wrote:
>> Is this perhaps also dependent on the CPU make/model?
>  - Between 2.6.29 and 2.6.30, /proc/uptime behaves much differently, and the
>    bug then exposes itself.  Something changed there.

Others besides me have seen similar misbehavior (outside of pv-ops), but only
on reasonably new Intel systems.  sched_clock_stable getting set to one in
arch/x86/kernel/cpu/intel.c turned out to be the problem.

Jan


* Re: Runaway real/sys time in newer paravirt domUs?
  2010-07-09 16:00         ` Jan Beulich
@ 2010-07-13 21:10           ` Jed Smith
  2010-07-14 16:31           ` Jed Smith
  1 sibling, 0 replies; 9+ messages in thread
From: Jed Smith @ 2010-07-13 21:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Jul 9, 2010, at 12:00 PM, Jan Beulich wrote:

>>>> On 09.07.10 at 16:57, Jed Smith <jed@linode.com> wrote:
>> On Jul 9, 2010, at 4:46 AM, Jan Beulich wrote:
>>> Is this perhaps also dependent on the CPU make/model?
>> - Between 2.6.29 and 2.6.30, /proc/uptime behaves much differently, and the
>>   bug then exposes itself.  Something changed there.
> 
> Others besides me have seen similar misbehavior (outside of pv-ops), but only
> on reasonably new Intel systems.  sched_clock_stable getting set to one in
> arch/x86/kernel/cpu/intel.c turned out to be the problem.

Is there a test I could perform for you?  We're getting more reports of the
same issue with our newer kernels.  Do you think CONFIG_NO_HZ makes a
difference either way?

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com


* Re: Runaway real/sys time in newer paravirt domUs?
  2010-07-09 16:00         ` Jan Beulich
  2010-07-13 21:10           ` Jed Smith
@ 2010-07-14 16:31           ` Jed Smith
  1 sibling, 0 replies; 9+ messages in thread
From: Jed Smith @ 2010-07-14 16:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Jul 9, 2010, at 12:00 PM, Jan Beulich wrote:

>>>> On 09.07.10 at 16:57, Jed Smith <jed@linode.com> wrote:
>> On Jul 9, 2010, at 4:46 AM, Jan Beulich wrote:
>>> Is this perhaps also dependent on the CPU make/model?
>> - Between 2.6.29 and 2.6.30, /proc/uptime behaves much differently, and the
>>   bug then exposes itself.  Something changed there.
> 
> Others besides me have seen similar misbehavior (outside of pv-ops), but only
> on reasonably new Intel systems.  sched_clock_stable getting set to one in
> arch/x86/kernel/cpu/intel.c turned out to be the problem.

Like so?  This change seems to fix the issue, but I'm going to continue to
hammer on it for the remainder of the day.


diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 40e1835..e362517 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -71,7 +71,14 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
                set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
                set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
                set_cpu_cap(c, X86_FEATURE_TSC_RELIABLE);
+#ifndef CONFIG_XEN
+               /*
+                * Considering the clock stable in paravirt domUs under Xen
+                * causes timing instability on certain Intel CPUs.
+                */
                sched_clock_stable = 1;
+#endif /* CONFIG_XEN */
        }
 
        /*

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com


end of thread, other threads:[~2010-07-14 16:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-06 16:32 Runaway real/sys time in newer paravirt domUs? Jed Smith
2010-07-06 19:05 ` Jeremy Fitzhardinge
2010-07-06 20:20   ` Jed Smith
2010-07-08 16:38   ` Jed Smith
2010-07-09  8:46     ` Jan Beulich
2010-07-09 14:57       ` Jed Smith
2010-07-09 16:00         ` Jan Beulich
2010-07-13 21:10           ` Jed Smith
2010-07-14 16:31           ` Jed Smith
