* Re: DomU clock jumps forward then freezes after Dom0 reboot
@ 2010-10-19 10:07 Olivier Hanesse
  2010-10-24 12:31 ` Cédric Schieli
  0 siblings, 1 reply; 7+ messages in thread
From: Olivier Hanesse @ 2010-10-19 10:07 UTC (permalink / raw)
  To: xen-devel


Hello,

I think I got exactly the same bug.

After a reboot, all my DomUs are stuck for X minutes, where X was the
"uptime" of the dom0 before the reboot.

Save/Restore without reboot works perfectly.

I am running Debian Lenny with backports:

ii  xen-hypervisor-4.0-amd64                    4.0.1-1
The Xen Hypervisor on AMD64
ii  linux-image-2.6.32-bpo.5-xen-amd64          2.6.32-23~bpo50+1
Linux 2.6.32 for 64-bit PCs, Xen dom0 suppor

Any ideas?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* Re: DomU clock jumps forward then freezes after Dom0 reboot
  2010-10-19 10:07 DomU clock jumps forward then freezes after Dom0 reboot Olivier Hanesse
@ 2010-10-24 12:31 ` Cédric Schieli
  2010-10-25 23:25   ` Jeremy Fitzhardinge
  2010-10-26  0:21   ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 7+ messages in thread
From: Cédric Schieli @ 2010-10-24 12:31 UTC (permalink / raw)
  To: xen-devel

Hello,

I can confirm that the problem I reported here
http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00057.html
is the same one.
DomU kernels affected by the migration hang are also affected by the
save/restore hang. Reverting "x86, paravirt: Add a global
synchronization point for pvclock" also fixes the save/restore hang.
After a save/reboot/restore (which leads to a hang), migrating the
domain to a host with a longer uptime unblocks it, but the wallclock
ends up several hours ahead. Migrating back blocks it again.

Regards,
Cédric Schieli

* Re: DomU clock jumps forward then freezes after Dom0 reboot
  2010-10-24 12:31 ` Cédric Schieli
@ 2010-10-25 23:25   ` Jeremy Fitzhardinge
  2010-10-26  0:21   ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-25 23:25 UTC (permalink / raw)
  To: Cédric Schieli; +Cc: xen-devel

 On 10/24/2010 05:31 AM, Cédric Schieli wrote:
> Hello,
>
> I can confirm my problem reported here
> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00057.html
> is the same.
> DomU kernels affected by the migration hang are also affected by the
> save/restore hang. Reverting "x86, paravirt: Add a global
> synchronization point for pvclock" also fix the save/restore hang.
> After doing save/reboot/restore (which led to a hang), migrating it to
> a host with a longer uptime will unblock the domain, but the wallclock
> will be several hours forward. Migrating back will block again.

Ah, thanks for that.  I was trying to think of what changes could have
broken that, since it certainly used to work.  I'll sort out a fix.

Thanks,
    J

* Re: DomU clock jumps forward then freezes after Dom0 reboot
  2010-10-24 12:31 ` Cédric Schieli
  2010-10-25 23:25   ` Jeremy Fitzhardinge
@ 2010-10-26  0:21   ` Jeremy Fitzhardinge
  2010-10-26 13:08     ` Cédric Schieli
  1 sibling, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-26  0:21 UTC (permalink / raw)
  To: Cédric Schieli; +Cc: xen-devel

 On 10/24/2010 05:31 AM, Cédric Schieli wrote:
> Hello,
>
> I can confirm my problem reported here
> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00057.html
> is the same.
> DomU kernels affected by the migration hang are also affected by the
> save/restore hang. Reverting "x86, paravirt: Add a global
> synchronization point for pvclock" also fix the save/restore hang.
> After doing save/reboot/restore (which led to a hang), migrating it to
> a host with a longer uptime will unblock the domain, but the wallclock
> will be several hours forward. Migrating back will block again.

Does this help?

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Mon, 25 Oct 2010 16:53:46 -0700
Subject: [PATCH] x86/pvclock: zero last_value on resume

If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index cd02f32..6226870 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
 			    struct pvclock_vcpu_time_info *vcpu,
 			    struct timespec *ts);
+void pvclock_resume(void);
 
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..a4f07c1 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+	atomic64_set(&last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
 	int cpu;
 
+	pvclock_resume();
+
 	if (xen_clockevent != &xen_vcpuop_clockevent)
 		return;
 

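A minimal user-space sketch of why zeroing last_value helps, assuming
(as the patch description implies) that the clocksource read never
returns less than the largest value it has already handed out. The
names model_clock_read, model_clock_resume and
frozen_reads_after_restore are hypothetical, and C11 atomics stand in
for the kernel's atomic64_t; this is an illustration, not the kernel
code:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's static last_value. */
static _Atomic uint64_t last_value;

/*
 * Model of the monotonic clamp: never return less than the largest
 * value any reader has been given so far.
 */
static uint64_t model_clock_read(uint64_t host_ns)
{
	uint64_t last = atomic_load(&last_value);

	do {
		if (host_ns < last)
			return last;	/* host time went backwards: clamp */
		/* On failure, 'last' is reloaded and the check is repeated. */
	} while (!atomic_compare_exchange_weak(&last_value, &last, host_ns));

	return host_ns;
}

/* Counterpart of pvclock_resume() in the patch: forget the old maximum. */
static void model_clock_resume(void)
{
	atomic_store(&last_value, 0);
}

/*
 * Count how many reads return a stale value after the host clock has
 * restarted near zero (assumed here to model a Dom0 reboot).
 */
unsigned frozen_reads_after_restore(bool reset_on_resume)
{
	unsigned frozen = 0;

	atomic_store(&last_value, 0);
	model_clock_read(6000);		/* guest ran while host time was large */

	if (reset_on_resume)
		model_clock_resume();

	/* After restore the host reports small values again. */
	for (uint64_t host_ns = 100; host_ns <= 500; host_ns += 100) {
		if (model_clock_read(host_ns) != host_ns)
			frozen++;
	}
	return frozen;
}
```

Under these assumptions, frozen_reads_after_restore(false) reports every
post-restore read as stuck at the old maximum, which mirrors the frozen
clock described in this thread, while frozen_reads_after_restore(true)
reports none.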
* Re: DomU clock jumps forward then freezes after Dom0 reboot
  2010-10-26  0:21   ` Jeremy Fitzhardinge
@ 2010-10-26 13:08     ` Cédric Schieli
  2010-10-26 16:52       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Cédric Schieli @ 2010-10-26 13:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel

2010/10/26 Jeremy Fitzhardinge <jeremy@goop.org>:
>  On 10/24/2010 05:31 AM, Cédric Schieli wrote:
>> Hello,
>>
>> I can confirm my problem reported here
>> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00057.html
>> is the same.
>> DomU kernels affected by the migration hang are also affected by the
>> save/restore hang. Reverting "x86, paravirt: Add a global
>> synchronization point for pvclock" also fix the save/restore hang.
>> After doing save/reboot/restore (which led to a hang), migrating it to
>> a host with a longer uptime will unblock the domain, but the wallclock
>> will be several hours forward. Migrating back will block again.
>
> Does this help?

Yes. With this patch applied I can migrate and migrate back without
problem. Save/restore with a reboot in between also works.

Thanks !

* Re: DomU clock jumps forward then freezes after Dom0 reboot
  2010-10-26 13:08     ` Cédric Schieli
@ 2010-10-26 16:52       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-26 16:52 UTC (permalink / raw)
  To: Cédric Schieli; +Cc: xen-devel

 On 10/26/2010 06:08 AM, Cédric Schieli wrote:
> 2010/10/26 Jeremy Fitzhardinge <jeremy@goop.org>:
>>  On 10/24/2010 05:31 AM, Cédric Schieli wrote:
>>> Hello,
>>>
>>> I can confirm my problem reported here
>>> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00057.html
>>> is the same.
>>> DomU kernels affected by the migration hang are also affected by the
>>> save/restore hang. Reverting "x86, paravirt: Add a global
>>> synchronization point for pvclock" also fix the save/restore hang.
>>> After doing save/reboot/restore (which led to a hang), migrating it to
>>> a host with a longer uptime will unblock the domain, but the wallclock
>>> will be several hours forward. Migrating back will block again.
>> Does this help?
> Yes. With this patch applied I can migrate and migrate back without
> problem. Save/restore with a reboot in between also works.

OK, thanks very much.

    J

* DomU clock jumps forward then freezes after Dom0 reboot
@ 2010-10-12 14:11 Eelco Dolstra
  0 siblings, 0 replies; 7+ messages in thread
From: Eelco Dolstra @ 2010-10-12 14:11 UTC (permalink / raw)
  To: Xen-devel

Hi,

I'm running into a strange problem with DomU clocks after saving/restoring the
domain across a reboot of Dom0.  After saving DomU, rebooting Dom0, and
restoring DomU, DomU's clock jumps into the future by an amount equal to the
previous uptime of Dom0, then freezes until the same amount of time has passed,
after which it starts running normally again.  This is on Xen 4.0.1, with Dom0
running Linux 2.6.32.24-xen-179eca5 (the pvops stable-2.6.32.x tree from a few
days ago), and a DomU running a vanilla paravirtualised 2.6.32.24 kernel.

Here is an example:

  [root@mrhankey:~]# xm create drdoctor
  Using config file "/etc/xen/drdoctor".
  Started domain drdoctor (id=4)

  [root@mrhankey:~]# uptime
  18:47pm  up   1:41,  1 user,  load average: 1.04, 1.01, 1.00

  [root@mrhankey:~]# ssh drdoctor date
  Mon Oct 11 18:47:59 CEST 2010

Now we reboot Dom0 (which saves and restores "drdoctor").  After this the clock
in "drdoctor" is stuck in the future:

  [root@mrhankey:~]# uptime
  18:53pm  up   0:01,  1 user,  load average: 0.40, 0.15, 0.05

  [root@mrhankey:~]# date
  Mon Oct 11 18:53:49 CEST 2010

  [root@mrhankey:~]# ssh drdoctor date
  Mon Oct 11 20:33:21 CEST 2010

  (wait a while...)

  [root@mrhankey:~]# ssh drdoctor date
  Mon Oct 11 20:33:21 CEST 2010

Note that the DomU clock has jumped roughly 1:40 into the future, which was
Dom0's uptime prior to its reboot.  The clock in DomU stays stuck at 20:33:21
until Dom0's clock reaches 20:33:21, after which it starts ticking again.
During this time, the machine is basically unusable because any time-dependent
function (such as sleep()) remains stuck.

The problem does not occur when DomU is saved and restored without a Dom0 reboot
in between.  Whether NTP is running on Dom0 or DomU doesn't matter.  I tried
"tsc_mode=1" (force RDTSC emulation) but it didn't have an effect.  Neither did
changing the clocksource in DomU from "xen" to "tsc", or changing the date with
"date -s" on Dom0 or DomU.

The following messages in /var/log/xen/xend.log might be relevant:

(during save...)
[2010-10-11 16:48:10 2000] INFO (XendCheckpoint:423) xc_save: failed to get the suspend evtchn port
...
(during restore...)
[2010-10-11 16:53:29 2066] INFO (XendCheckpoint:423) Reloading memory pages:   0%
[2010-10-11 16:53:34 2066] INFO (XendCheckpoint:423) ERROR Internal error: Error when reading batch size
[2010-10-11 16:53:34 2066] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing
...
[2010-10-11 16:53:35 2066] INFO (XendCheckpoint:423) Restore exit with rc=0

And another time:

[2010-10-11 14:20:03 2044] INFO (XendCheckpoint:423) ERROR Internal error: Max batch size exceeded (1970103633). Giving up.
[2010-10-11 14:20:03 2044] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing

These seem to suggest that the save is incomplete or corrupt.  However, in all
cases the restore completes successfully, apart from the clock issue.

Anybody have an idea what might be the cause?  BTW, I'm packaging Xen for NixOS
(http://nixos.org/nixos/), which stores packages under non-standard prefixes
(i.e. not /usr), but I don't think this is an issue here.

-- 
Eelco Dolstra | http://www.st.ewi.tudelft.nl/~dolstra/
