All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
@ 2013-02-05  6:12 Kouya Shimura
  2013-02-12 12:26 ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2013-02-05  6:12 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 466 bytes --]

The value of ACPI PM-Timer may be broken on save unless the timer mode
is delay_for_missed_ticks.
With other timer modes, vcpu->arch.hvm_vcpu.guest_time is always zero
and the adjustment from its value is wrong.

This patch fixes the saved value of ACPI PM-Timer:
- don't adjust the PM-Timer if vcpu->arch.hvm_vcpu.guest_time is zero.
- consolidate calculations of PM-Timer to one function. more precise on save.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>

[-- Attachment #2: fix_corrupt_saved_pmtimer.patch --]
[-- Type: text/x-patch, Size: 3549 bytes --]

diff -r 5af4f2ab06f3 xen/arch/x86/hvm/pmtimer.c
--- a/xen/arch/x86/hvm/pmtimer.c	Tue Jan 22 09:33:10 2013 +0100
+++ b/xen/arch/x86/hvm/pmtimer.c	Tue Feb 05 10:26:13 2013 +0900
@@ -84,28 +84,31 @@ void hvm_acpi_sleep_button(struct domain
 }
 
 /* Set the correct value in the timer, accounting for time elapsed
- * since the last time we did that. */
-static void pmt_update_time(PMTState *s)
+ * since the last time we did that.
+ * return true if the counter's MSB has changed. */
+static bool_t pmt_count_up_and_test_msb(PMTState *s, uint64_t gtime)
 {
-    uint64_t curr_gtime, tmp;
     uint32_t tmr_val = s->pm.tmr_val, msb = tmr_val & TMR_VAL_MSB;
-    
-    ASSERT(spin_is_locked(&s->lock));
+    uint64_t tmp = ((gtime - s->last_gtime) * s->scale) + s->not_accounted;
 
-    /* Update the timer */
-    curr_gtime = hvm_get_guest_time(s->vcpu);
-    tmp = ((curr_gtime - s->last_gtime) * s->scale) + s->not_accounted;
     s->not_accounted = (uint32_t)tmp;
     tmr_val += tmp >> 32;
     tmr_val &= TMR_VAL_MASK;
-    s->last_gtime = curr_gtime;
+    s->last_gtime = gtime;
 
     /* Update timer value atomically wrt lock-free reads in handle_pmt_io(). */
     *(volatile uint32_t *)&s->pm.tmr_val = tmr_val;
 
-    /* If the counter's MSB has changed, set the status bit */
-    if ( (tmr_val & TMR_VAL_MSB) != msb )
+    return  ((tmr_val & TMR_VAL_MSB) != msb);
+}
+
+static void pmt_update_time(PMTState *s)
+{
+    ASSERT(spin_is_locked(&s->lock));
+
+    if ( pmt_count_up_and_test_msb(s, hvm_get_guest_time(s->vcpu)) )
     {
+        /* MSB has changed, set the status bit */
         s->pm.pm1a_sts |= TMR_STS;
         pmt_update_sci(s);
     }
@@ -244,21 +247,34 @@ static int handle_pmt_io(
 static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
 {
     PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
-    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
+    uint64_t guest_time;
     int rc;
 
     spin_lock(&s->lock);
 
-    /* Update the counter to the guest's current time.  We always save
-     * with the domain paused, so the saved time should be after the
-     * last_gtime, but just in case, make sure we only go forwards */
-    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
-    if ( x < 1UL<<31 )
-        s->pm.tmr_val += x;
-    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
-        s->pm.pm1a_sts |= TMR_STS;
-    /* No point in setting the SCI here because we'll already have saved the 
-     * IRQ and *PIC state; we'll fix it up when we restore the domain */
+    guest_time = s->vcpu->arch.hvm_vcpu.guest_time;
+    /* Update the counter to the guest's current time only if the
+     * timer mode is delay_for_missed_ticks */
+    if ( guest_time != 0 )
+    {
+        /* We always save with the domain paused, so the saved time
+         * should be after the last_gtime, but just in case, make sure
+         * we only go forwards */
+        if ( (s64)guest_time - (s64)s->last_gtime < 0)
+        {
+            dprintk(XENLOG_WARNING,
+                    "Last update of PMT is ahead of guest's time by %ld\n",
+                    s->last_gtime - guest_time);
+        }
+        else
+        {
+            if ( pmt_count_up_and_test_msb(s, guest_time) )
+                s->pm.pm1a_sts |= TMR_STS;
+            /* No point in setting the SCI here because we'll already
+             * have saved the IRQ and *PIC state;
+             * we'll fix it up when we restore the domain */
+        }
+    }
 
     rc = hvm_save_entry(PMTIMER, 0, h, &s->pm);
 

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-02-05  6:12 [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration Kouya Shimura
@ 2013-02-12 12:26 ` Jan Beulich
  2013-02-14  6:09   ` Kouya Shimura
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2013-02-12 12:26 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel

>>> On 05.02.13 at 07:12, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>--- a/xen/arch/x86/hvm/pmtimer.c	Tue Jan 22 09:33:10 2013 +0100
>+++ b/xen/arch/x86/hvm/pmtimer.c	Tue Feb 05 10:26:13 2013 +0900
>@@ -84,28 +84,31 @@ void hvm_acpi_sleep_button(struct domain
> }
> 
> /* Set the correct value in the timer, accounting for time elapsed
>- * since the last time we did that. */
>-static void pmt_update_time(PMTState *s)
>+ * since the last time we did that.
>+ * return true if the counter's MSB has changed. */
>+static bool_t pmt_count_up_and_test_msb(PMTState *s, uint64_t gtime)
> {
>-    uint64_t curr_gtime, tmp;
>     uint32_t tmr_val = s->pm.tmr_val, msb = tmr_val & TMR_VAL_MSB;
>-    
>-    ASSERT(spin_is_locked(&s->lock));
>+    uint64_t tmp = ((gtime - s->last_gtime) * s->scale) + s->not_accounted;
> 
>-    /* Update the timer */
>-    curr_gtime = hvm_get_guest_time(s->vcpu);
>-    tmp = ((curr_gtime - s->last_gtime) * s->scale) + s->not_accounted;
>     s->not_accounted = (uint32_t)tmp;
>     tmr_val += tmp >> 32;
>     tmr_val &= TMR_VAL_MASK;
>-    s->last_gtime = curr_gtime;
>+    s->last_gtime = gtime;
> 
>     /* Update timer value atomically wrt lock-free reads in handle_pmt_io(). */
>     *(volatile uint32_t *)&s->pm.tmr_val = tmr_val;
> 
>-    /* If the counter's MSB has changed, set the status bit */
>-    if ( (tmr_val & TMR_VAL_MSB) != msb )
>+    return  ((tmr_val & TMR_VAL_MSB) != msb);

Single space only please.

>+}
>+
>+static void pmt_update_time(PMTState *s)
>+{
>+    ASSERT(spin_is_locked(&s->lock));
>+
>+    if ( pmt_count_up_and_test_msb(s, hvm_get_guest_time(s->vcpu)) )
>     {
>+        /* MSB has changed, set the status bit */
>         s->pm.pm1a_sts |= TMR_STS;
>         pmt_update_sci(s);
>     }
>@@ -244,21 +247,34 @@ static int handle_pmt_io(
> static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
> {
>     PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>-    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>+    uint64_t guest_time;
>     int rc;
> 
>     spin_lock(&s->lock);
> 
>-    /* Update the counter to the guest's current time.  We always save
>-     * with the domain paused, so the saved time should be after the
>-     * last_gtime, but just in case, make sure we only go forwards */
>-    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>-    if ( x < 1UL<<31 )
>-        s->pm.tmr_val += x;
>-    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>-        s->pm.pm1a_sts |= TMR_STS;
>-    /* No point in setting the SCI here because we'll already have saved the 
>-     * IRQ and *PIC state; we'll fix it up when we restore the domain */
>+    guest_time = s->vcpu->arch.hvm_vcpu.guest_time;
>+    /* Update the counter to the guest's current time only if the
>+     * timer mode is delay_for_missed_ticks */
>+    if ( guest_time != 0 )

How is guest_time being (non-)zero an indication of mode being
delay_for_missed_ticks? I think you should check for the mode
explicitly, as the value here being zero can just be an effect of
a large enough negative v->arch.hvm_vcpu.stime_offset.

And besides that you don't explain why the update being done
here is unnecessary in other modes - all you explain is that the
way it's being done in those modes is wrong.

>+    {
>+        /* We always save with the domain paused, so the saved time
>+         * should be after the last_gtime, but just in case, make sure
>+         * we only go forwards */
>+        if ( (s64)guest_time - (s64)s->last_gtime < 0)
>+        {
>+            dprintk(XENLOG_WARNING,
>+                    "Last update of PMT is ahead of guest's time by %ld\n",

While probably fine this way for -unstable, please nevertheless
use the correct PRId64 here (both to be formally correct and to
ease eventual backporting).

Jan

>+                    s->last_gtime - guest_time);
>+        }
>+        else
>+        {
>+            if ( pmt_count_up_and_test_msb(s, guest_time) )
>+                s->pm.pm1a_sts |= TMR_STS;
>+            /* No point in setting the SCI here because we'll already
>+             * have saved the IRQ and *PIC state;
>+             * we'll fix it up when we restore the domain */
>+        }
>+    }
> 
>     rc = hvm_save_entry(PMTIMER, 0, h, &s->pm);
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-02-12 12:26 ` Jan Beulich
@ 2013-02-14  6:09   ` Kouya Shimura
  2013-02-15 16:45     ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2013-02-14  6:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2301 bytes --]

Hi Jan,

Thanks for reviewing the code.

On 02/12/2013 09:26 PM, Jan Beulich wrote:
>>>> On 05.02.13 at 07:12, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>> --- a/xen/arch/x86/hvm/pmtimer.c	Tue Jan 22 09:33:10 2013 +0100
>> +++ b/xen/arch/x86/hvm/pmtimer.c	Tue Feb 05 10:26:13 2013 +0900
>> @@ -244,21 +247,34 @@ static int handle_pmt_io(
>> static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
>> {
>>      PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>> -    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>> +    uint64_t guest_time;
>>      int rc;
>>
>>      spin_lock(&s->lock);
>>
>> -    /* Update the counter to the guest's current time.  We always save
>> -     * with the domain paused, so the saved time should be after the
>> -     * last_gtime, but just in case, make sure we only go forwards */
>> -    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>> -    if ( x < 1UL<<31 )
>> -        s->pm.tmr_val += x;
>> -    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>> -        s->pm.pm1a_sts |= TMR_STS;
>> -    /* No point in setting the SCI here because we'll already have saved the
>> -     * IRQ and *PIC state; we'll fix it up when we restore the domain */
>> +    guest_time = s->vcpu->arch.hvm_vcpu.guest_time;
>> +    /* Update the counter to the guest's current time only if the
>> +     * timer mode is delay_for_missed_ticks */
>> +    if ( guest_time != 0 )
>
> How is guest_time being (non-)zero an indication of mode being
> delay_for_missed_ticks? I think you should check for the mode
> explicitly, as the value here being zero can just be an effect of
> a large enough negative v->arch.hvm_vcpu.stime_offset.
>
> And besides that you don't explain why the update being done
> here is unnecessary in other modes - all you explain is that the
> way it's being done in those modes is wrong.

After due consideration, the update of PM-timer is necessary
for other modes. I misunderstood it's just an adjustment for
the delay_for_missed_ticks mode.
The cause of this bug is to refer the vcpu->arch.hvm_vcpu.guest_time.

Attached is the revised patch.

Aside from this patch, I've found another problem about
delay_for_missed_ticks.  PM-timer might be broken after
pmt_timer_callback is invoked.
I'll think it over.

Thanks,
Kouya


[-- Attachment #2: fix_corrupt_saved_pmtimer_v2.patch --]
[-- Type: text/x-patch, Size: 2951 bytes --]

# HG changeset patch
# User Kouya Shimura <kouya@jp.fujitsu.com>
# Date 1360820290 -32400
# Node ID 9744ac3f23dd198e997bf032385a573408420762
# Parent  5af4f2ab06f33ce441fa550333a9049c09a9ef28
x86/hvm: fix corrupt ACPI PM-Timer during live migration

ACPI PM-Timer value is broken on saving a VM.
Since c/s 16274:44dde35cb2a6, vcpu->arch.hvm_vcpu.guest_time is not
the correct guest's time any more.
Instead, hvm_get_guest_time() should be used in pmtimer_save to calculate
the timer.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>

diff -r 5af4f2ab06f3 -r 9744ac3f23dd xen/arch/x86/hvm/pmtimer.c
--- a/xen/arch/x86/hvm/pmtimer.c	Tue Jan 22 09:33:10 2013 +0100
+++ b/xen/arch/x86/hvm/pmtimer.c	Thu Feb 14 14:38:10 2013 +0900
@@ -85,7 +85,7 @@ void hvm_acpi_sleep_button(struct domain
 
 /* Set the correct value in the timer, accounting for time elapsed
  * since the last time we did that. */
-static void pmt_update_time(PMTState *s)
+static void pmt_update_time(PMTState *s, bool_t update_sci)
 {
     uint64_t curr_gtime, tmp;
     uint32_t tmr_val = s->pm.tmr_val, msb = tmr_val & TMR_VAL_MSB;
@@ -107,7 +107,8 @@ static void pmt_update_time(PMTState *s)
     if ( (tmr_val & TMR_VAL_MSB) != msb )
     {
         s->pm.pm1a_sts |= TMR_STS;
-        pmt_update_sci(s);
+        if ( update_sci )
+            pmt_update_sci(s);
     }
 }
 
@@ -123,7 +124,7 @@ static void pmt_timer_callback(void *opa
     spin_lock(&s->lock);
 
     /* Recalculate the timer and make sure we get an SCI if we need one */
-    pmt_update_time(s);
+    pmt_update_time(s, 1);
 
     /* How close are we to the next MSB flip? */
     pmt_cycles_until_flip = TMR_VAL_MSB - (s->pm.tmr_val & (TMR_VAL_MSB - 1));
@@ -221,7 +222,7 @@ static int handle_pmt_io(
         if ( spin_trylock(&s->lock) )
         {
             /* We hold the lock: update timer value and return it. */
-            pmt_update_time(s);
+            pmt_update_time(s, 1);
             *val = s->pm.tmr_val;
             spin_unlock(&s->lock);
         }
@@ -244,21 +245,13 @@ static int handle_pmt_io(
 static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
 {
     PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
-    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
     int rc;
 
     spin_lock(&s->lock);
 
-    /* Update the counter to the guest's current time.  We always save
-     * with the domain paused, so the saved time should be after the
-     * last_gtime, but just in case, make sure we only go forwards */
-    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
-    if ( x < 1UL<<31 )
-        s->pm.tmr_val += x;
-    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
-        s->pm.pm1a_sts |= TMR_STS;
     /* No point in setting the SCI here because we'll already have saved the 
      * IRQ and *PIC state; we'll fix it up when we restore the domain */
+    pmt_update_time(s, 0);
 
     rc = hvm_save_entry(PMTIMER, 0, h, &s->pm);
 

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-02-14  6:09   ` Kouya Shimura
@ 2013-02-15 16:45     ` Jan Beulich
  2013-02-20  7:42       ` Kouya Shimura
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2013-02-15 16:45 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel

>>> On 14.02.13 at 07:09, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>@@ -244,21 +245,13 @@ static int handle_pmt_io(
> static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
> {
>     PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>-    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>     int rc;
> 
>     spin_lock(&s->lock);
> 
>-    /* Update the counter to the guest's current time.  We always save
>-     * with the domain paused, so the saved time should be after the
>-     * last_gtime, but just in case, make sure we only go forwards */

So you lose this property of guaranteeing no backward move.

>-    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>-    if ( x < 1UL<<31 )
>-        s->pm.tmr_val += x;
>-    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>-        s->pm.pm1a_sts |= TMR_STS;
>     /* No point in setting the SCI here because we'll already have saved the 
>      * IRQ and *PIC state; we'll fix it up when we restore the domain */
>+    pmt_update_time(s, 0);

And using this function you also have the new side effect of
s->last_gtime being updated.

Perhaps the new parameter should be renamed (to, say,
"saving"), and then allow suppressing all these behavioral
changes.

Also, in delay_for_missed_ticks mode you now use a slightly
different time for updating s->pm - did you double check that
this is not going to be a problem? Or else, the flag above could
similarly be used to circumvent this, or hvm_get_guest_time()
could be made return the frozen time (I suppose, but didn't
verify - as it appears to be an assumption already before your
patch -, that pt_freeze_time() runs before pmtimer_save()).

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-02-15 16:45     ` Jan Beulich
@ 2013-02-20  7:42       ` Kouya Shimura
  2013-03-07 15:58         ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2013-02-20  7:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 02/16/2013 01:45 AM, Jan Beulich wrote:
>>>> On 14.02.13 at 07:09, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>> @@ -244,21 +245,13 @@ static int handle_pmt_io(
>> static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
>> {
>>      PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>> -    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>>      int rc;
>>
>>      spin_lock(&s->lock);
>>
>> -    /* Update the counter to the guest's current time.  We always save
>> -     * with the domain paused, so the saved time should be after the
>> -     * last_gtime, but just in case, make sure we only go forwards */
>
> So you lose this property of guaranteeing no backward move.

Exactly.
But the backward move is impossible except delay_for_missed_ticks mode.
Since the monotonicity of hvm_get_guest_time() is guaranteed for other modes.
About the problem of delay_for_missed_ticks mode, I slightly mentioned
in the previous mail.

The backward move can be happend as the following:
1) pt_freeze_time     ... real-time:100 guest-time:100   last_gtime:90
2) pmt_timer_callback ... real-time:110 guest-time:110   last_gtime:110
3) pt_thaw_time       ... real-time:120 guest-time:100   last_gtime:110
4) pmt_update_time    ... real-time:125 guest-time:105 < last_gtime:110

So, it can be happend not only in pmtimer_save() but also
in handle_pmt_io().

Maybe pmt_udpate_time() should check the backward move.


>
>> -    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>> -    if ( x < 1UL<<31 )
>> -        s->pm.tmr_val += x;
>> -    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>> -        s->pm.pm1a_sts |= TMR_STS;
>>      /* No point in setting the SCI here because we'll already have saved the
>>       * IRQ and *PIC state; we'll fix it up when we restore the domain */
>> +    pmt_update_time(s, 0);
>
> And using this function you also have the new side effect of
> s->last_gtime being updated.
>
> Perhaps the new parameter should be renamed (to, say,
> "saving"), and then allow suppressing all these behavioral
> changes.


The new side effect is also a bug fix, I think.
Since updating s->pm.tmr_val and s->last_gtime should be
done at the same time. Updating only s->pm.tmr_val is wrong.
Or else, when the vm is resumed on the same machine (migration failure,
check-pointing, etc), the PM-timer value is double counted.


>
> Also, in delay_for_missed_ticks mode you now use a slightly
> different time for updating s->pm - did you double check that
> this is not going to be a problem? Or else, the flag above could
> similarly be used to circumvent this, or hvm_get_guest_time()
> could be made return the frozen time (I suppose, but didn't
> verify - as it appears to be an assumption already before your
> patch -, that pt_freeze_time() runs before pmtimer_save()).

Modifying hvm_get_guest_time() is my thought, too.
But I don't like the idea because the delay_for_missed_ticks mode
is unusual. I wonder who uses it.

Let me think it over.

Thanks,
Kouya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-02-20  7:42       ` Kouya Shimura
@ 2013-03-07 15:58         ` Jan Beulich
  2013-03-08  7:59           ` Kouya Shimura
                             ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Jan Beulich @ 2013-03-07 15:58 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel

>>> On 20.02.13 at 08:42, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
> On 02/16/2013 01:45 AM, Jan Beulich wrote:
>>>>> On 14.02.13 at 07:09, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>>> @@ -244,21 +245,13 @@ static int handle_pmt_io(
>>> static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
>>> {
>>>      PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>>> -    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>>>      int rc;
>>>
>>>      spin_lock(&s->lock);
>>>
>>> -    /* Update the counter to the guest's current time.  We always save
>>> -     * with the domain paused, so the saved time should be after the
>>> -     * last_gtime, but just in case, make sure we only go forwards */
>>
>> So you lose this property of guaranteeing no backward move.
> 
> Exactly.
> But the backward move is impossible except delay_for_missed_ticks mode.
> Since the monotonicity of hvm_get_guest_time() is guaranteed for other 
> modes.
> About the problem of delay_for_missed_ticks mode, I slightly mentioned
> in the previous mail.
> 
> The backward move can be happend as the following:
> 1) pt_freeze_time     ... real-time:100 guest-time:100   last_gtime:90
> 2) pmt_timer_callback ... real-time:110 guest-time:110   last_gtime:110
> 3) pt_thaw_time       ... real-time:120 guest-time:100   last_gtime:110
> 4) pmt_update_time    ... real-time:125 guest-time:105 < last_gtime:110
> 
> So, it can be happend not only in pmtimer_save() but also
> in handle_pmt_io().
> 
> Maybe pmt_udpate_time() should check the backward move.
> 
> 
>>
>>> -    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>>> -    if ( x < 1UL<<31 )
>>> -        s->pm.tmr_val += x;
>>> -    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>>> -        s->pm.pm1a_sts |= TMR_STS;
>>>      /* No point in setting the SCI here because we'll already have saved 
> the
>>>       * IRQ and *PIC state; we'll fix it up when we restore the domain */
>>> +    pmt_update_time(s, 0);
>>
>> And using this function you also have the new side effect of
>> s->last_gtime being updated.
>>
>> Perhaps the new parameter should be renamed (to, say,
>> "saving"), and then allow suppressing all these behavioral
>> changes.
> 
> 
> The new side effect is also a bug fix, I think.
> Since updating s->pm.tmr_val and s->last_gtime should be
> done at the same time. Updating only s->pm.tmr_val is wrong.
> Or else, when the vm is resumed on the same machine (migration failure,
> check-pointing, etc), the PM-timer value is double counted.
> 
> 
>>
>> Also, in delay_for_missed_ticks mode you now use a slightly
>> different time for updating s->pm - did you double check that
>> this is not going to be a problem? Or else, the flag above could
>> similarly be used to circumvent this, or hvm_get_guest_time()
>> could be made return the frozen time (I suppose, but didn't
>> verify - as it appears to be an assumption already before your
>> patch -, that pt_freeze_time() runs before pmtimer_save()).
> 
> Modifying hvm_get_guest_time() is my thought, too.
> But I don't like the idea because the delay_for_missed_ticks mode
> is unusual. I wonder who uses it.
> 
> Let me think it over.

Ping?

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-03-07 15:58         ` Jan Beulich
@ 2013-03-08  7:59           ` Kouya Shimura
  2013-03-21  7:31           ` Kouya Shimura
       [not found]           ` <514A9BC4.40508@jp.fujitsu.com>
  2 siblings, 0 replies; 22+ messages in thread
From: Kouya Shimura @ 2013-03-08  7:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

> Ping?

Sorry for late.
Almost done, but the test is not enough yet.
Maybe I can post it next week.

Thanks,
Kouya


On 03/08/2013 12:58 AM, Jan Beulich wrote:
>>>> On 20.02.13 at 08:42, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>> On 02/16/2013 01:45 AM, Jan Beulich wrote:
>>>>>> On 14.02.13 at 07:09, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>>>> @@ -244,21 +245,13 @@ static int handle_pmt_io(
>>>> static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
>>>> {
>>>>       PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>>>> -    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>>>>       int rc;
>>>>
>>>>       spin_lock(&s->lock);
>>>>
>>>> -    /* Update the counter to the guest's current time.  We always save
>>>> -     * with the domain paused, so the saved time should be after the
>>>> -     * last_gtime, but just in case, make sure we only go forwards */
>>>
>>> So you lose this property of guaranteeing no backward move.
>>
>> Exactly.
>> But the backward move is impossible except delay_for_missed_ticks mode.
>> Since the monotonicity of hvm_get_guest_time() is guaranteed for other
>> modes.
>> About the problem of delay_for_missed_ticks mode, I slightly mentioned
>> in the previous mail.
>>
>> The backward move can be happend as the following:
>> 1) pt_freeze_time     ... real-time:100 guest-time:100   last_gtime:90
>> 2) pmt_timer_callback ... real-time:110 guest-time:110   last_gtime:110
>> 3) pt_thaw_time       ... real-time:120 guest-time:100   last_gtime:110
>> 4) pmt_update_time    ... real-time:125 guest-time:105 < last_gtime:110
>>
>> So, it can be happend not only in pmtimer_save() but also
>> in handle_pmt_io().
>>
>> Maybe pmt_udpate_time() should check the backward move.
>>
>>
>>>
>>>> -    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>>>> -    if ( x < 1UL<<31 )
>>>> -        s->pm.tmr_val += x;
>>>> -    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>>>> -        s->pm.pm1a_sts |= TMR_STS;
>>>>       /* No point in setting the SCI here because we'll already have saved
>> the
>>>>        * IRQ and *PIC state; we'll fix it up when we restore the domain */
>>>> +    pmt_update_time(s, 0);
>>>
>>> And using this function you also have the new side effect of
>>> s->last_gtime being updated.
>>>
>>> Perhaps the new parameter should be renamed (to, say,
>>> "saving"), and then allow suppressing all these behavioral
>>> changes.
>>
>>
>> The new side effect is also a bug fix, I think.
>> Since updating s->pm.tmr_val and s->last_gtime should be
>> done at the same time. Updating only s->pm.tmr_val is wrong.
>> Or else, when the vm is resumed on the same machine (migration failure,
>> check-pointing, etc), the PM-timer value is double counted.
>>
>>
>>>
>>> Also, in delay_for_missed_ticks mode you now use a slightly
>>> different time for updating s->pm - did you double check that
>>> this is not going to be a problem? Or else, the flag above could
>>> similarly be used to circumvent this, or hvm_get_guest_time()
>>> could be made return the frozen time (I suppose, but didn't
>>> verify - as it appears to be an assumption already before your
>>> patch -, that pt_freeze_time() runs before pmtimer_save()).
>>
>> Modifying hvm_get_guest_time() is my thought, too.
>> But I don't like the idea because the delay_for_missed_ticks mode
>> is unusual. I wonder who uses it.
>>
>> Let me think it over.
>
> Ping?
>
> Jan
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-03-07 15:58         ` Jan Beulich
  2013-03-08  7:59           ` Kouya Shimura
@ 2013-03-21  7:31           ` Kouya Shimura
  2013-03-21  8:05             ` Jan Beulich
       [not found]           ` <514A9BC4.40508@jp.fujitsu.com>
  2 siblings, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2013-03-21  7:31 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 03/08/2013 12:58 AM, Jan Beulich wrote:
>>>> On 20.02.13 at 08:42, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>> On 02/16/2013 01:45 AM, Jan Beulich wrote:
>>>>>> On 14.02.13 at 07:09, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>>>> @@ -244,21 +245,13 @@ static int handle_pmt_io(
>>>> static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
>>>> {
>>>>       PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>>>> -    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>>>>       int rc;
>>>>
>>>>       spin_lock(&s->lock);
>>>>
>>>> -    /* Update the counter to the guest's current time.  We always save
>>>> -     * with the domain paused, so the saved time should be after the
>>>> -     * last_gtime, but just in case, make sure we only go forwards */
>>>
>>> So you lose this property of guaranteeing no backward move.
>>
>> Exactly.
>> But the backward move is impossible except delay_for_missed_ticks mode.
>> Since the monotonicity of hvm_get_guest_time() is guaranteed for other
>> modes.
>> About the problem of delay_for_missed_ticks mode, I slightly mentioned
>> in the previous mail.
>>
>> The backward move can be happend as the following:
>> 1) pt_freeze_time     ... real-time:100 guest-time:100   last_gtime:90
>> 2) pmt_timer_callback ... real-time:110 guest-time:110   last_gtime:110
>> 3) pt_thaw_time       ... real-time:120 guest-time:100   last_gtime:110
>> 4) pmt_update_time    ... real-time:125 guest-time:105 < last_gtime:110
>>
>> So, it can be happend not only in pmtimer_save() but also
>> in handle_pmt_io().
>>
>> Maybe pmt_udpate_time() should check the backward move.
>>
>>
>>>
>>>> -    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>>>> -    if ( x < 1UL<<31 )
>>>> -        s->pm.tmr_val += x;
>>>> -    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>>>> -        s->pm.pm1a_sts |= TMR_STS;
>>>>       /* No point in setting the SCI here because we'll already have saved
>> the
>>>>        * IRQ and *PIC state; we'll fix it up when we restore the domain */
>>>> +    pmt_update_time(s, 0);
>>>
>>> And using this function you also have the new side effect of
>>> s->last_gtime being updated.
>>>
>>> Perhaps the new parameter should be renamed (to, say,
>>> "saving"), and then allow suppressing all these behavioral
>>> changes.
>>
>>
>> The new side effect is also a bug fix, I think.
>> Since updating s->pm.tmr_val and s->last_gtime should be
>> done at the same time. Updating only s->pm.tmr_val is wrong.
>> Or else, when the vm is resumed on the same machine (migration failure,
>> check-pointing, etc), the PM-timer value is double counted.
>>
>>
>>>
>>> Also, in delay_for_missed_ticks mode you now use a slightly
>>> different time for updating s->pm - did you double check that
>>> this is not going to be a problem? Or else, the flag above could
>>> similarly be used to circumvent this, or hvm_get_guest_time()
>>> could be made return the frozen time (I suppose, but didn't
>>> verify - as it appears to be an assumption already before your
>>> patch -, that pt_freeze_time() runs before pmtimer_save()).
>>
>> Modifying hvm_get_guest_time() is my thought, too.
>> But I don't like the idea because the delay_for_missed_ticks mode
>> is unusual. I wonder who uses it.
>>
>> Let me think it over.
>
> Ping?
>
> Jan
>

Sorry for the delay.

I'll send two patches:

This is a new added patch.
  - 1/2 prevent guest's timers from going backwards when timer_mode=0

Nothing but a comment is changed from the previous patch.
  - 2/2 fix corrupt ACPI PM-Timer during live migration


Actually, I don't think I'd like to support timer_mode=0.
timer_mode=0 is meaningless for a recent tickless linux kernel.

Thanks,
Kouya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2] x86/hvm: prevent guest's timers from going backwards, when timer_mode=0
       [not found]           ` <514A9BC4.40508@jp.fujitsu.com>
@ 2013-03-21  7:32             ` Kouya Shimura
  2013-03-21  7:32             ` [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration Kouya Shimura
  1 sibling, 0 replies; 22+ messages in thread
From: Kouya Shimura @ 2013-03-21  7:32 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

In the case of delay_for_missed_ticks mode (timer_mode=0), the guest
virtual time goes backwards when the vcpu is rescheduled. Therefore
guest's HW timer might go backwards, too.

Case 1. SMP guest:
  1) vcpu#1 is de-scheduled and the guest time freezes. (TIME 0:0.010)
  2) vcpu#2 access a timer (0:0.020)
  3) vcpu#1 is re-scheduled and the guest time thaws.   (0:0.030->0:0.010)
  4) vcpu#2 access a timer (0:0.015) // Backwards!!!

Case 2. asynchronous callback:
  1) vcpu#1 is de-scheduled and the guest time freezes. (0:0.010)
  2) pmt_timer_callback() is invoked (0:0.025)
  3) vcpu#1 is re-scheduled and the guest time thaws.   (0:0.030->0:0.010)
  4) vcpu#1 access the PM-TIMER (0:0.015) // Backwards!!!

This patch affects only delay_for_missed_ticks mode (timer_mode=0) and
ensures the monotonicity of the following timers:
  - PIT
  - HPET
  - ACPI PM-TIMER

The following timers are OK since a vcpu never access the other vcpu's timer.
  - Local APIC ( has some callbacks but it's called from pt_intr_post )
  - TSC
Just in case, these timers can use the new function hvm_get_base_time() too,
but doesn't. It's a little bit less efficient than hvm_get_guest_time().

Also, tidy up virtual platform timer code.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
---
  xen/arch/x86/hvm/hpet.c       |    2 +-
  xen/arch/x86/hvm/i8254.c      |    2 +-
  xen/arch/x86/hvm/pmtimer.c    |    2 +-
  xen/arch/x86/hvm/vpt.c        |   93 ++++++++++++++++++++++++++++++++++-------
  xen/include/asm-x86/hvm/hvm.h |    2 +-
  5 files changed, 82 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 4b4b905..fa44d37 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -39,7 +39,7 @@
  /* Frequency_of_Xen_systeme_time / frequency_of_HPET = 16 */
  #define STIME_PER_HPET_TICK 16
  #define guest_time_hpet(hpet) \
-    (hvm_get_guest_time(vhpet_vcpu(hpet)) / STIME_PER_HPET_TICK)
+    (hvm_get_base_time(vhpet_vcpu(hpet)) / STIME_PER_HPET_TICK)
  
  #define HPET_TN_INT_ROUTE_CAP_SHIFT 32
  #define HPET_TN_CFG_BITS_READONLY_OR_RESERVED (HPET_TN_RESERVED | \
diff --git a/xen/arch/x86/hvm/i8254.c b/xen/arch/x86/hvm/i8254.c
index c0d6bc2..c45ed88 100644
--- a/xen/arch/x86/hvm/i8254.c
+++ b/xen/arch/x86/hvm/i8254.c
@@ -54,7 +54,7 @@ static int handle_speaker_io(
      int dir, uint32_t port, uint32_t bytes, uint32_t *val);
  
  #define get_guest_time(v) \
-   (is_hvm_vcpu(v) ? hvm_get_guest_time(v) : (u64)get_s_time())
+   (is_hvm_vcpu(v) ? hvm_get_base_time(v) : (u64)get_s_time())
  
  static int pit_get_count(PITState *pit, int channel)
  {
diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c
index 01ae31d..5c25cfb 100644
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -93,7 +93,7 @@ static void pmt_update_time(PMTState *s)
      ASSERT(spin_is_locked(&s->lock));
  
      /* Update the timer */
-    curr_gtime = hvm_get_guest_time(s->vcpu);
+    curr_gtime = hvm_get_base_time(s->vcpu);
      tmp = ((curr_gtime - s->last_gtime) * s->scale) + s->not_accounted;
      s->not_accounted = (uint32_t)tmp;
      tmr_val += tmp >> 32;
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 46d3ec6..7a3edf3 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -36,6 +36,19 @@ void hvm_init_guest_time(struct domain *d)
      pl->last_guest_time = 0;
  }
  
+static inline u64 pt_now(struct pl_time *pl, struct vcpu *v)
+{
+    u64 now = get_s_time() + pl->stime_offset;
+
+    ASSERT(spin_is_locked(&pl->pl_time_lock));
+
+    if ( (int64_t)(now - pl->last_guest_time) > 0 )
+        pl->last_guest_time = now;
+    else
+        now = ++pl->last_guest_time;
+    return now + v->arch.hvm_vcpu.stime_offset;
+}
+
  u64 hvm_get_guest_time(struct vcpu *v)
  {
      struct pl_time *pl = &v->domain->arch.hvm_domain.pl_time;
@@ -45,19 +58,33 @@ u64 hvm_get_guest_time(struct vcpu *v)
      ASSERT(is_hvm_vcpu(v));
  
      spin_lock(&pl->pl_time_lock);
-    now = get_s_time() + pl->stime_offset;
-    if ( (int64_t)(now - pl->last_guest_time) > 0 )
-        pl->last_guest_time = now;
-    else
-        now = ++pl->last_guest_time;
+    now = pt_now(pl, v);
      spin_unlock(&pl->pl_time_lock);
-
-    return now + v->arch.hvm_vcpu.stime_offset;
+    return now;
  }
  
-void hvm_set_guest_time(struct vcpu *v, u64 guest_time)
+/*
+ * This function is used to emulate HW timer counters. In the case of
+ * delay_for_missed_ticks mode, the guest time once goes backwards to
+ * the frozen time when the vcpu is rescheduled. To avoid decrement
+ * of a timer counter, return the frozen time while the vcpu is not
+ * being scheduled.
+ */
+u64 hvm_get_base_time(struct vcpu *v)
  {
-    v->arch.hvm_vcpu.stime_offset += guest_time - hvm_get_guest_time(v);
+    struct pl_time *pl = &v->domain->arch.hvm_domain.pl_time;
+    u64 now;
+
+    /* Called from device models shared with PV guests. Be careful. */
+    ASSERT(is_hvm_vcpu(v));
+
+    spin_lock(&pl->pl_time_lock);
+    if ( v->arch.hvm_vcpu.guest_time ) /* the guest time is frozen */
+        now = v->arch.hvm_vcpu.guest_time;
+    else
+        now = pt_now(pl, v);
+    spin_unlock(&pl->pl_time_lock);
+    return now;
  }
  
  static int pt_irq_vector(struct periodic_time *pt, enum hvm_intsrc src)
@@ -138,24 +165,62 @@ static void pt_process_missed_ticks(struct periodic_time *pt)
      pt->scheduled += missed_ticks * pt->period;
  }
  
+/*
+ * N.B. The following three functions, pt_freeze_time(),
+ * pt_thaw_time() and pt_step_time() never race with each others,
+ * but race with either hvm_get_guest_time() or hvm_get_base_time().
+ */
+
  static void pt_freeze_time(struct vcpu *v)
  {
+    struct pl_time *pl;
+
      if ( !mode_is(v->domain, delay_for_missed_ticks) )
          return;
  
-    v->arch.hvm_vcpu.guest_time = hvm_get_guest_time(v);
+    pl = &v->domain->arch.hvm_domain.pl_time;
+    spin_lock(&pl->pl_time_lock);
+    v->arch.hvm_vcpu.guest_time = pt_now(pl, v);
+    spin_unlock(&pl->pl_time_lock);
  }
  
  static void pt_thaw_time(struct vcpu *v)
  {
+    struct pl_time *pl;
+    u64 now, frozen_time = v->arch.hvm_vcpu.guest_time;
+
+#if 0 /* redundant */
      if ( !mode_is(v->domain, delay_for_missed_ticks) )
          return;
+#endif
  
-    if ( v->arch.hvm_vcpu.guest_time == 0 )
+    if ( frozen_time == 0 )
          return;
  
-    hvm_set_guest_time(v, v->arch.hvm_vcpu.guest_time);
+    ASSERT(mode_is(v->domain, delay_for_missed_ticks));
+
+    pl = &v->domain->arch.hvm_domain.pl_time;
+    spin_lock(&pl->pl_time_lock);
+    now = pt_now(pl, v);
+    v->arch.hvm_vcpu.stime_offset += frozen_time - now;
      v->arch.hvm_vcpu.guest_time = 0;
+    spin_unlock(&pl->pl_time_lock);
+}
+
+static void pt_step_time(struct vcpu *v, u64 guest_time)
+{
+    struct pl_time *pl;
+    u64 now;
+
+    if ( !mode_is(v->domain, delay_for_missed_ticks) )
+        return;
+
+    pl = &v->domain->arch.hvm_domain.pl_time;
+    spin_lock(&pl->pl_time_lock);
+    now = pt_now(pl, v);
+    if ( now < guest_time )
+        v->arch.hvm_vcpu.stime_offset += guest_time - now;
+    spin_unlock(&pl->pl_time_lock);
  }
  
  void pt_save_timer(struct vcpu *v)
@@ -341,9 +406,7 @@ void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
          }
      }
  
-    if ( mode_is(v->domain, delay_for_missed_ticks) &&
-         (hvm_get_guest_time(v) < pt->last_plt_gtime) )
-        hvm_set_guest_time(v, pt->last_plt_gtime);
+    pt_step_time(v, pt->last_plt_gtime);
  
      cb = pt->cb;
      cb_priv = pt->priv;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 2fa2ea5..f4cd200 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -226,8 +226,8 @@ void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc);
  u64 hvm_get_guest_tsc(struct vcpu *v);
  
  void hvm_init_guest_time(struct domain *d);
-void hvm_set_guest_time(struct vcpu *v, u64 guest_time);
  u64 hvm_get_guest_time(struct vcpu *v);
+u64 hvm_get_base_time(struct vcpu *v);
  
  int vmsi_deliver(
      struct domain *d, int vector,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
       [not found]           ` <514A9BC4.40508@jp.fujitsu.com>
  2013-03-21  7:32             ` [PATCH 1/2] x86/hvm: prevent guest's timers from going backwards, when timer_mode=0 Kouya Shimura
@ 2013-03-21  7:32             ` Kouya Shimura
  2013-03-21 10:01               ` Jan Beulich
  1 sibling, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2013-03-21  7:32 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

The ACPI PM-Timer value is broken when a vm is saved.
The function pmtimer_save() calculates the value on its own,
but vcpu->arch.hvm_vcpu.guest_time is not valid (usually zero).

Another problem is a double counting of the PM-Timer value
in failure to save a vm.

This patch corrects the value, and also makes the calculation
more precise.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
---
  xen/arch/x86/hvm/pmtimer.c |   19 ++++++-------------
  1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c
index 5c25cfb..203a939 100644
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -85,7 +85,7 @@ void hvm_acpi_sleep_button(struct domain *d)
  
  /* Set the correct value in the timer, accounting for time elapsed
   * since the last time we did that. */
-static void pmt_update_time(PMTState *s)
+static void pmt_update_time(PMTState *s, bool_t update_sci)
  {
      uint64_t curr_gtime, tmp;
      uint32_t tmr_val = s->pm.tmr_val, msb = tmr_val & TMR_VAL_MSB;
@@ -107,7 +107,8 @@ static void pmt_update_time(PMTState *s)
      if ( (tmr_val & TMR_VAL_MSB) != msb )
      {
          s->pm.pm1a_sts |= TMR_STS;
-        pmt_update_sci(s);
+        if ( update_sci )
+            pmt_update_sci(s);
      }
  }
  
@@ -123,7 +124,7 @@ static void pmt_timer_callback(void *opaque)
      spin_lock(&s->lock);
  
      /* Recalculate the timer and make sure we get an SCI if we need one */
-    pmt_update_time(s);
+    pmt_update_time(s, 1);
  
      /* How close are we to the next MSB flip? */
      pmt_cycles_until_flip = TMR_VAL_MSB - (s->pm.tmr_val & (TMR_VAL_MSB - 1));
@@ -221,7 +222,7 @@ static int handle_pmt_io(
          if ( spin_trylock(&s->lock) )
          {
              /* We hold the lock: update timer value and return it. */
-            pmt_update_time(s);
+            pmt_update_time(s, 1);
              *val = s->pm.tmr_val;
              spin_unlock(&s->lock);
          }
@@ -244,19 +245,11 @@ static int handle_pmt_io(
  static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
  {
      PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
-    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
      int rc;
  
      spin_lock(&s->lock);
  
-    /* Update the counter to the guest's current time.  We always save
-     * with the domain paused, so the saved time should be after the
-     * last_gtime, but just in case, make sure we only go forwards */
-    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
-    if ( x < 1UL<<31 )
-        s->pm.tmr_val += x;
-    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
-        s->pm.pm1a_sts |= TMR_STS;
+    pmt_update_time(s, 0);
      /* No point in setting the SCI here because we'll already have saved the
       * IRQ and *PIC state; we'll fix it up when we restore the domain */
  
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-03-21  7:31           ` Kouya Shimura
@ 2013-03-21  8:05             ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2013-03-21  8:05 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: xen-devel

>>> On 21.03.13 at 08:31, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
> Actually, I don't think I'd like to support timer_mode=0.
> timer_mode=0 is meaningless for a recent tickless linux kernel.

But this is not a question of your liking, nor is it related to what
recent Linux wants/needs.

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-03-21  7:32             ` [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration Kouya Shimura
@ 2013-03-21 10:01               ` Jan Beulich
  2013-03-22  1:12                 ` Kouya Shimura
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2013-03-21 10:01 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: Tim Deegan, Keir Fraser, xen-devel

>>> On 21.03.13 at 08:32, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
> @@ -244,19 +245,11 @@ static int handle_pmt_io(
>   static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
>   {
>       PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
> -    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>       int rc;
>   
>       spin_lock(&s->lock);
>   
> -    /* Update the counter to the guest's current time.  We always save
> -     * with the domain paused, so the saved time should be after the
> -     * last_gtime, but just in case, make sure we only go forwards */

So on the previous patch version (which you said this one is
identical to) I stated that you lose this property of guaranteeing
no backward move. Am I right in assuming that patch 1 is now
supposed to take care of this?

In any case I'll have to defer to Keir or Tim for that first one.

> -    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
> -    if ( x < 1UL<<31 )
> -        s->pm.tmr_val += x;
> -    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
> -        s->pm.pm1a_sts |= TMR_STS;
> +    pmt_update_time(s, 0);

Here I can only quote part of my previous reply, which I don't
think you responded to:

"Also, in delay_for_missed_ticks mode you now use a slightly
 different time for updating s->pm - did you double check that
 this is not going to be a problem? Or else, the flag above could
 similarly be used to circumvent this, or hvm_get_guest_time()
 could be made return the frozen time (I suppose, but didn't
 verify - as it appears to be an assumption already before your
 patch -, that pt_freeze_time() runs before pmtimer_save())."

You said you'd think about it, but I don't recall seeing any other
outcome from that than the two patches, and I can't relate the
first patch to this aspect.

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-03-21 10:01               ` Jan Beulich
@ 2013-03-22  1:12                 ` Kouya Shimura
  2013-03-22  8:02                   ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2013-03-22  1:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Keir Fraser, xen-devel

On 03/21/2013 07:01 PM, Jan Beulich wrote:
>>>> On 21.03.13 at 08:32, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>> @@ -244,19 +245,11 @@ static int handle_pmt_io(
>>    static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
>>    {
>>        PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>> -    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>>        int rc;
>>
>>        spin_lock(&s->lock);
>>
>> -    /* Update the counter to the guest's current time.  We always save
>> -     * with the domain paused, so the saved time should be after the
>> -     * last_gtime, but just in case, make sure we only go forwards */
>
> So on the previous patch version (which you said this one is
> identical to) I stated that you lose this property of guaranteeing
> no backward move. Am I right in assuming that patch 1 is now
> supposed to take care of this?

Yes.  Patch 1 guarantees  that the timer counter only goes forwards.

> In any case I'll have to defer to Keir or Tim for that first one.
>
>> -    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>> -    if ( x < 1UL<<31 )
>> -        s->pm.tmr_val += x;
>> -    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>> -        s->pm.pm1a_sts |= TMR_STS;
>> +    pmt_update_time(s, 0);
>
> Here I can only quote part of my previous reply, which I don't
> think you responded to:
>
> "Also, in delay_for_missed_ticks mode you now use a slightly
>   different time for updating s->pm - did you double check that
>   this is not going to be a problem? Or else, the flag above could
>   similarly be used to circumvent this, or hvm_get_guest_time()
>   could be made return the frozen time (I suppose, but didn't
>   verify - as it appears to be an assumption already before your
>   patch -, that pt_freeze_time() runs before pmtimer_save())."
>
> You said you'd think about it, but I don't recall seeing any other
> outcome from that than the two patches, and I can't relate the
> first patch to this aspect.

In patch 1, pmt_update_time() calls the new function hvm_get_base_time()
which returns the frozen time when the vcpu is de-scheduled.
And I confirmed pmtimer_save() is surely called after pt_freeze_time()
from my test and review.
So, if both patches are applied, there is no difference in
delay_for_missed_ticks mode.

Thanks,
Kouya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-03-22  1:12                 ` Kouya Shimura
@ 2013-03-22  8:02                   ` Jan Beulich
  2013-05-15 14:23                     ` Alex Bligh
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2013-03-22  8:02 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: Tim Deegan, Keir Fraser, xen-devel

>>> On 22.03.13 at 02:12, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
> On 03/21/2013 07:01 PM, Jan Beulich wrote:
>>>>> On 21.03.13 at 08:32, Kouya Shimura <kouya@jp.fujitsu.com> wrote:
>>> @@ -244,19 +245,11 @@ static int handle_pmt_io(
>>>    static int pmtimer_save(struct domain *d, hvm_domain_context_t *h)
>>>    {
>>>        PMTState *s = &d->arch.hvm_domain.pl_time.vpmt;
>>> -    uint32_t x, msb = s->pm.tmr_val & TMR_VAL_MSB;
>>>        int rc;
>>>
>>>        spin_lock(&s->lock);
>>>
>>> -    /* Update the counter to the guest's current time.  We always save
>>> -     * with the domain paused, so the saved time should be after the
>>> -     * last_gtime, but just in case, make sure we only go forwards */
>>
>> So on the previous patch version (which you said this one is
>> identical to) I stated that you lose this property of guaranteeing
>> no backward move. Am I right in assuming that patch 1 is now
>> supposed to take care of this?
> 
> Yes.  Patch 1 guarantees  that the timer counter only goes forwards.
> 
>> In any case I'll have to defer to Keir or Tim for that first one.
>>
>>> -    x = ((s->vcpu->arch.hvm_vcpu.guest_time - s->last_gtime) * s->scale) >> 32;
>>> -    if ( x < 1UL<<31 )
>>> -        s->pm.tmr_val += x;
>>> -    if ( (s->pm.tmr_val & TMR_VAL_MSB) != msb )
>>> -        s->pm.pm1a_sts |= TMR_STS;
>>> +    pmt_update_time(s, 0);
>>
>> Here I can only quote part of my previous reply, which I don't
>> think you responded to:
>>
>> "Also, in delay_for_missed_ticks mode you now use a slightly
>>   different time for updating s->pm - did you double check that
>>   this is not going to be a problem? Or else, the flag above could
>>   similarly be used to circumvent this, or hvm_get_guest_time()
>>   could be made return the frozen time (I suppose, but didn't
>>   verify - as it appears to be an assumption already before your
>>   patch -, that pt_freeze_time() runs before pmtimer_save())."
>>
>> You said you'd think about it, but I don't recall seeing any other
>> outcome from that than the two patches, and I can't relate the
>> first patch to this aspect.
> 
> In patch 1, pmt_update_time() calls the new function hvm_get_base_time()
> which returns the frozen time when the vcpu is de-scheduled.
> And I confirmed pmtimer_save() is surely called after pt_freeze_time()
> from my test and review.
> So, if both patches are applied, there is no difference in
> delay_for_missed_ticks mode.

Okay, thanks for confirming!

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-03-22  8:02                   ` Jan Beulich
@ 2013-05-15 14:23                     ` Alex Bligh
  2013-05-15 14:31                       ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Bligh @ 2013-05-15 14:23 UTC (permalink / raw)
  To: Jan Beulich, Kouya Shimura; +Cc: Keir Fraser, Tim Deegan, Alex Bligh, xen-devel

Kouya, Jan,

>> In patch 1, pmt_update_time() calls the new function hvm_get_base_time()
>> which returns the frozen time when the vcpu is de-scheduled.
>> And I confirmed pmtimer_save() is surely called after pt_freeze_time()
>> from my test and review.
>> So, if both patches are applied, there is no difference in
>> delay_for_missed_ticks mode.
>
> Okay, thanks for confirming!

Did this ever get committed? I can't immediately find a commit.

I am wondering if this is related to the ACPI timer issues we are seeing
on migration in qemu-upstream DM.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-05-15 14:23                     ` Alex Bligh
@ 2013-05-15 14:31                       ` Jan Beulich
  2013-05-15 14:49                         ` Alex Bligh
  2013-05-16  5:27                         ` Kouya Shimura
  0 siblings, 2 replies; 22+ messages in thread
From: Jan Beulich @ 2013-05-15 14:31 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Tim Deegan, Keir Fraser, Kouya Shimura, xen-devel

>>> On 15.05.13 at 16:23, Alex Bligh <alex@alex.org.uk> wrote:
> Kouya, Jan,
> 
>>> In patch 1, pmt_update_time() calls the new function hvm_get_base_time()
>>> which returns the frozen time when the vcpu is de-scheduled.
>>> And I confirmed pmtimer_save() is surely called after pt_freeze_time()
>>> from my test and review.
>>> So, if both patches are applied, there is no difference in
>>> delay_for_missed_ticks mode.
>>
>> Okay, thanks for confirming!
> 
> Did this ever get committed? I can't immediately find a commit.

No, it didn't - no-one knowing that code well enough ever acked
patch 1, and without that we can't apply the patch here.

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-05-15 14:31                       ` Jan Beulich
@ 2013-05-15 14:49                         ` Alex Bligh
  2013-05-15 14:54                           ` Jan Beulich
  2013-05-16  5:27                         ` Kouya Shimura
  1 sibling, 1 reply; 22+ messages in thread
From: Alex Bligh @ 2013-05-15 14:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Kouya Shimura, Tim Deegan, xen-devel, Alex Bligh,
	Diana Crisan

Jan,

--On 15 May 2013 15:31:19 +0100 Jan Beulich <JBeulich@suse.com> wrote:

>> Did this ever get committed? I can't immediately find a commit.
>
> No, it didn't - no-one knowing that code well enough ever acked
> patch 1, and without that we can't apply the patch here.

I certainly am not someone who knows that code well, so can't help
with that. But I (or more accurately Diana) can reliably replicate
live migrate on HVM and qemu-upstream DM causing (a) ACPI entries
to disappear from xenstore and (b) walltime to fail to advance in
the migrated domain until the walltime is manually set (stuck
clock).

Is this likely to be related?

Diana's threads are here (second and third being most relevant):
 http://lists.xen.org/archives/html/xen-devel/2013-05/msg01475.html
 http://lists.xen.org/archives/html/xen-devel/2013-05/msg01474.html
 http://lists.xen.org/archives/html/xen-devel/2013-05/msg01472.html

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-05-15 14:49                         ` Alex Bligh
@ 2013-05-15 14:54                           ` Jan Beulich
  2013-05-16  5:29                             ` Kouya Shimura
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2013-05-15 14:54 UTC (permalink / raw)
  To: Alex Bligh, Kouya Shimura
  Cc: Tim Deegan, Diana Crisan, Keir Fraser, xen-devel

>>> On 15.05.13 at 16:49, Alex Bligh <alex@alex.org.uk> wrote:
> --On 15 May 2013 15:31:19 +0100 Jan Beulich <JBeulich@suse.com> wrote:
> 
>>> Did this ever get committed? I can't immediately find a commit.
>>
>> No, it didn't - no-one knowing that code well enough ever acked
>> patch 1, and without that we can't apply the patch here.
> 
> I certainly am not someone who knows that code well, so can't help
> with that. But I (or more accurately Diana) can reliably replicate
> live migrate on HVM and qemu-upstream DM causing (a) ACPI entries
> to disappear from xenstore and (b) walltime to fail to advance in
> the migrated domain until the walltime is manually set (stuck
> clock).
> 
> Is this likely to be related?

I'd like to defer to Kouya to tell whether that matches the
symptoms he saw.

Jan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-05-15 14:31                       ` Jan Beulich
  2013-05-15 14:49                         ` Alex Bligh
@ 2013-05-16  5:27                         ` Kouya Shimura
  2013-05-16  9:54                           ` Tim Deegan
  1 sibling, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2013-05-16  5:27 UTC (permalink / raw)
  To: Keir Fraser, Tim Deegan; +Cc: Alex Bligh, Jan Beulich, xen-devel

[-- Attachment #1: Type: text/plain, Size: 805 bytes --]

Hi Keir, Tim,

I attached the patch again.
Can I get an ACK or NACK?

Thanks,
kouya

On 05/15/2013 11:31 PM, Jan Beulich wrote:
>>>> On 15.05.13 at 16:23, Alex Bligh <alex@alex.org.uk> wrote:
>> Kouya, Jan,
>>
>>>> In patch 1, pmt_update_time() calls the new function hvm_get_base_time()
>>>> which returns the frozen time when the vcpu is de-scheduled.
>>>> And I confirmed pmtimer_save() is surely called after pt_freeze_time()
>>>> from my test and review.
>>>> So, if both patches are applied, there is no difference in
>>>> delay_for_missed_ticks mode.
>>>
>>> Okay, thanks for confirming!
>>
>> Did this ever get committed? I can't immediately find a commit.
>
> No, it didn't - no-one knowing that code well enough ever acked
> patch 1, and without that we can't apply the patch here.
>
> Jan
>




[-- Attachment #2: timer_mode_0.patch --]
[-- Type: text/x-patch, Size: 8017 bytes --]

In the case of delay_for_missed_ticks mode (timer_mode=0), the guest
virtual time goes backwards when the vcpu is rescheduled. Therefore
guest's HW timer might go backwards, too.

Case 1. SMP guest:
 1) vcpu#1 is de-scheduled and the guest time freezes. (TIME 0:0.010)
 2) vcpu#2 access a timer (0:0.020)
 3) vcpu#1 is re-scheduled and the guest time thaws.   (0:0.030->0:0.010)
 4) vcpu#2 access a timer (0:0.015) // Backwards!!!

Case 2. asynchronous callback:
 1) vcpu#1 is de-scheduled and the guest time freezes. (0:0.010)
 2) pmt_timer_callback() is invoked (0:0.025)
 3) vcpu#1 is re-scheduled and the guest time thaws.   (0:0.030->0:0.010)
 4) vcpu#1 access the PM-TIMER (0:0.015) // Backwards!!!

This patch affects only delay_for_missed_ticks mode (timer_mode=0) and
ensures the monotonicity of the following timers:
 - PIT
 - HPET
 - ACPI PM-TIMER

The following timers are OK since a vcpu never access the other vcpu's timer.
 - Local APIC ( has some callbacks but it's called from pt_intr_post )
 - TSC
Just in case, these timers can use the new function hvm_get_base_time() too,
but doesn't. It's a little bit less efficient than hvm_get_guest_time().

Also, tidy up virtual platform timer code.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
---
 xen/arch/x86/hvm/hpet.c       |    2 +-
 xen/arch/x86/hvm/i8254.c      |    2 +-
 xen/arch/x86/hvm/pmtimer.c    |    2 +-
 xen/arch/x86/hvm/vpt.c        |   93 ++++++++++++++++++++++++++++++++++-------
 xen/include/asm-x86/hvm/hvm.h |    2 +-
 5 files changed, 82 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 4b4b905..fa44d37 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -39,7 +39,7 @@
 /* Frequency_of_Xen_systeme_time / frequency_of_HPET = 16 */
 #define STIME_PER_HPET_TICK 16
 #define guest_time_hpet(hpet) \
-    (hvm_get_guest_time(vhpet_vcpu(hpet)) / STIME_PER_HPET_TICK)
+    (hvm_get_base_time(vhpet_vcpu(hpet)) / STIME_PER_HPET_TICK)
 
 #define HPET_TN_INT_ROUTE_CAP_SHIFT 32
 #define HPET_TN_CFG_BITS_READONLY_OR_RESERVED (HPET_TN_RESERVED | \
diff --git a/xen/arch/x86/hvm/i8254.c b/xen/arch/x86/hvm/i8254.c
index c0d6bc2..c45ed88 100644
--- a/xen/arch/x86/hvm/i8254.c
+++ b/xen/arch/x86/hvm/i8254.c
@@ -54,7 +54,7 @@ static int handle_speaker_io(
     int dir, uint32_t port, uint32_t bytes, uint32_t *val);
 
 #define get_guest_time(v) \
-   (is_hvm_vcpu(v) ? hvm_get_guest_time(v) : (u64)get_s_time())
+   (is_hvm_vcpu(v) ? hvm_get_base_time(v) : (u64)get_s_time())
 
 static int pit_get_count(PITState *pit, int channel)
 {
diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c
index 01ae31d..5c25cfb 100644
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -93,7 +93,7 @@ static void pmt_update_time(PMTState *s)
     ASSERT(spin_is_locked(&s->lock));
 
     /* Update the timer */
-    curr_gtime = hvm_get_guest_time(s->vcpu);
+    curr_gtime = hvm_get_base_time(s->vcpu);
     tmp = ((curr_gtime - s->last_gtime) * s->scale) + s->not_accounted;
     s->not_accounted = (uint32_t)tmp;
     tmr_val += tmp >> 32;
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 46d3ec6..7a3edf3 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -36,6 +36,19 @@ void hvm_init_guest_time(struct domain *d)
     pl->last_guest_time = 0;
 }
 
+static inline u64 pt_now(struct pl_time *pl, struct vcpu *v)
+{
+    u64 now = get_s_time() + pl->stime_offset;
+
+    ASSERT(spin_is_locked(&pl->pl_time_lock));
+
+    if ( (int64_t)(now - pl->last_guest_time) > 0 )
+        pl->last_guest_time = now;
+    else
+        now = ++pl->last_guest_time;
+    return now + v->arch.hvm_vcpu.stime_offset;
+}
+
 u64 hvm_get_guest_time(struct vcpu *v)
 {
     struct pl_time *pl = &v->domain->arch.hvm_domain.pl_time;
@@ -45,19 +58,33 @@ u64 hvm_get_guest_time(struct vcpu *v)
     ASSERT(is_hvm_vcpu(v));
 
     spin_lock(&pl->pl_time_lock);
-    now = get_s_time() + pl->stime_offset;
-    if ( (int64_t)(now - pl->last_guest_time) > 0 )
-        pl->last_guest_time = now;
-    else
-        now = ++pl->last_guest_time;
+    now = pt_now(pl, v);
     spin_unlock(&pl->pl_time_lock);
-
-    return now + v->arch.hvm_vcpu.stime_offset;
+    return now;
 }
 
-void hvm_set_guest_time(struct vcpu *v, u64 guest_time)
+/*
+ * This function is used to emulate HW timer counters. In the case of
+ * delay_for_missed_ticks mode, the guest time once goes backwards to
+ * the frozen time when the vcpu is rescheduled. To avoid decrement
+ * of a timer counter, return the frozen time while the vcpu is not
+ * being scheduled.
+ */
+u64 hvm_get_base_time(struct vcpu *v)
 {
-    v->arch.hvm_vcpu.stime_offset += guest_time - hvm_get_guest_time(v);
+    struct pl_time *pl = &v->domain->arch.hvm_domain.pl_time;
+    u64 now;
+
+    /* Called from device models shared with PV guests. Be careful. */
+    ASSERT(is_hvm_vcpu(v));
+
+    spin_lock(&pl->pl_time_lock);
+    if ( v->arch.hvm_vcpu.guest_time ) /* the guest time is frozen */
+        now = v->arch.hvm_vcpu.guest_time;
+    else
+        now = pt_now(pl, v);
+    spin_unlock(&pl->pl_time_lock);
+    return now;
 }
 
 static int pt_irq_vector(struct periodic_time *pt, enum hvm_intsrc src)
@@ -138,24 +165,62 @@ static void pt_process_missed_ticks(struct periodic_time *pt)
     pt->scheduled += missed_ticks * pt->period;
 }
 
+/*
+ * N.B. The following three functions, pt_freeze_time(),
+ * pt_thaw_time() and pt_step_time() never race with each others,
+ * but race with either hvm_get_guest_time() or hvm_get_base_time().
+ */
+
 static void pt_freeze_time(struct vcpu *v)
 {
+    struct pl_time *pl;
+
     if ( !mode_is(v->domain, delay_for_missed_ticks) )
         return;
 
-    v->arch.hvm_vcpu.guest_time = hvm_get_guest_time(v);
+    pl = &v->domain->arch.hvm_domain.pl_time;
+    spin_lock(&pl->pl_time_lock);
+    v->arch.hvm_vcpu.guest_time = pt_now(pl, v);
+    spin_unlock(&pl->pl_time_lock);
 }
 
 static void pt_thaw_time(struct vcpu *v)
 {
+    struct pl_time *pl;
+    u64 now, frozen_time = v->arch.hvm_vcpu.guest_time;
+
+#if 0 /* redundant */
     if ( !mode_is(v->domain, delay_for_missed_ticks) )
         return;
+#endif
 
-    if ( v->arch.hvm_vcpu.guest_time == 0 )
+    if ( frozen_time == 0 )
         return;
 
-    hvm_set_guest_time(v, v->arch.hvm_vcpu.guest_time);
+    ASSERT(mode_is(v->domain, delay_for_missed_ticks));
+
+    pl = &v->domain->arch.hvm_domain.pl_time;
+    spin_lock(&pl->pl_time_lock);
+    now = pt_now(pl, v);
+    v->arch.hvm_vcpu.stime_offset += frozen_time - now;
     v->arch.hvm_vcpu.guest_time = 0;
+    spin_unlock(&pl->pl_time_lock);
+}
+
+static void pt_step_time(struct vcpu *v, u64 guest_time)
+{
+    struct pl_time *pl;
+    u64 now;
+
+    if ( !mode_is(v->domain, delay_for_missed_ticks) )
+        return;
+
+    pl = &v->domain->arch.hvm_domain.pl_time;
+    spin_lock(&pl->pl_time_lock);
+    now = pt_now(pl, v);
+    if ( now < guest_time )
+        v->arch.hvm_vcpu.stime_offset += guest_time - now;
+    spin_unlock(&pl->pl_time_lock);
 }
 
 void pt_save_timer(struct vcpu *v)
@@ -341,9 +406,7 @@ void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
         }
     }
 
-    if ( mode_is(v->domain, delay_for_missed_ticks) &&
-         (hvm_get_guest_time(v) < pt->last_plt_gtime) )
-        hvm_set_guest_time(v, pt->last_plt_gtime);
+    pt_step_time(v, pt->last_plt_gtime);
 
     cb = pt->cb;
     cb_priv = pt->priv;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 2fa2ea5..f4cd200 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -226,8 +226,8 @@ void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc);
 u64 hvm_get_guest_tsc(struct vcpu *v);
 
 void hvm_init_guest_time(struct domain *d);
-void hvm_set_guest_time(struct vcpu *v, u64 guest_time);
 u64 hvm_get_guest_time(struct vcpu *v);
+u64 hvm_get_base_time(struct vcpu *v);
 
 int vmsi_deliver(
     struct domain *d, int vector,
-- 
1.7.9.5



[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-05-15 14:54                           ` Jan Beulich
@ 2013-05-16  5:29                             ` Kouya Shimura
  2013-05-16 10:38                               ` Alex Bligh
  0 siblings, 1 reply; 22+ messages in thread
From: Kouya Shimura @ 2013-05-16  5:29 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Tim Deegan, Diana Crisan, Keir Fraser, Jan Beulich, xen-devel

On 05/15/2013 11:54 PM, Jan Beulich wrote:
>>>> On 15.05.13 at 16:49, Alex Bligh <alex@alex.org.uk> wrote:
>> --On 15 May 2013 15:31:19 +0100 Jan Beulich <JBeulich@suse.com> wrote:
>>
>>>> Did this ever get committed? I can't immediately find a commit.
>>>
>>> No, it didn't - no-one knowing that code well enough ever acked
>>> patch 1, and without that we can't apply the patch here.
>>
>> I certainly am not someone who knows that code well, so can't help
>> with that. But I (or more accurately Diana) can reliably replicate
>> live migrate on HVM and qemu-upstream DM causing (a) ACPI entries
>> to disappear from xenstore and (b) walltime to fail to advance in
>> the migrated domain until the walltime is manually set (stuck
>> clock).
>>
>> Is this likely to be related?
>
> I'd like to defer to Kouya to tell whether that matches the
> symptoms he saw.

I don't think my fix is related to Alex's problem.
I observed that gettimeofday() sometimes goes backward *a few seconds*
on migration.

In linux OSs, ACPI Timer value is masked by 24bit.
The valid range:
      (1 / 3.579545MHz) * 0xffffff = 4.7sec
So, the effect of the corrupt ACPI timer is at most 4.7 sec.
The clock shouldn't be frozen for a long time.

Apparently (a) sounds like a problem of xl toolstack.

--
Kouya

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-05-16  5:27                         ` Kouya Shimura
@ 2013-05-16  9:54                           ` Tim Deegan
  0 siblings, 0 replies; 22+ messages in thread
From: Tim Deegan @ 2013-05-16  9:54 UTC (permalink / raw)
  To: Kouya Shimura; +Cc: Keir Fraser, Alex Bligh, Jan Beulich, xen-devel

At 14:27 +0900 on 16 May (1368714430), Kouya Shimura wrote:
> Hi Keir, Tim,
> 
> I attached the patch again.
> Can I get an ACK or NACK?

Well, I don't think this is suitable for committing:

> +#if 0 /* redundant */
>      if ( !mode_is(v->domain, delay_for_missed_ticks) )
>          return;
> +#endif

but apart from that it seems OK to me.  It's a bit odd to have the
platform timers (almost) stop ticking when their controlling vcpu is
descheduled but I guess no worse than having them go backwards!

So if you fix the '#if 0' (either cut out the code properly or leave it
alone, I don't mind which), you can add

Reviewed-by: Tim Deegan <tim@xen.org>

You'll need an ack from George as well for the release; I suspect that
will depend on how much testing you've done.

Tim.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration
  2013-05-16  5:29                             ` Kouya Shimura
@ 2013-05-16 10:38                               ` Alex Bligh
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Bligh @ 2013-05-16 10:38 UTC (permalink / raw)
  To: Kouya Shimura
  Cc: Keir Fraser, Jan Beulich, Tim Deegan, xen-devel, Alex Bligh,
	Diana Crisan



--On 16 May 2013 14:29:00 +0900 Kouya Shimura <kouya@jp.fujitsu.com> wrote:

> Apparently (a) sounds like a problem of xl toolstack.

It isn't just the xl toolstack. We are seeing this with libxl (indeed
did originally and had to go replicate with xl).

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2013-05-16 10:38 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-05  6:12 [PATCH] x86/hvm: fix corrupt ACPI PM-Timer during live migration Kouya Shimura
2013-02-12 12:26 ` Jan Beulich
2013-02-14  6:09   ` Kouya Shimura
2013-02-15 16:45     ` Jan Beulich
2013-02-20  7:42       ` Kouya Shimura
2013-03-07 15:58         ` Jan Beulich
2013-03-08  7:59           ` Kouya Shimura
2013-03-21  7:31           ` Kouya Shimura
2013-03-21  8:05             ` Jan Beulich
     [not found]           ` <514A9BC4.40508@jp.fujitsu.com>
2013-03-21  7:32             ` [PATCH 1/2] x86/hvm: prevent guest's timers from going backwards, when timer_mode=0 Kouya Shimura
2013-03-21  7:32             ` [PATCH 2/2] x86/hvm: fix corrupt ACPI PM-Timer during live migration Kouya Shimura
2013-03-21 10:01               ` Jan Beulich
2013-03-22  1:12                 ` Kouya Shimura
2013-03-22  8:02                   ` Jan Beulich
2013-05-15 14:23                     ` Alex Bligh
2013-05-15 14:31                       ` Jan Beulich
2013-05-15 14:49                         ` Alex Bligh
2013-05-15 14:54                           ` Jan Beulich
2013-05-16  5:29                             ` Kouya Shimura
2013-05-16 10:38                               ` Alex Bligh
2013-05-16  5:27                         ` Kouya Shimura
2013-05-16  9:54                           ` Tim Deegan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.