All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kurz <groug@kaod.org>
To: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
	qemu-ppc@nongnu.org, qemu-stable@nongnu.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH] target/ppc: only save guest timebase once after stopping
Date: Fri, 4 May 2018 15:50:28 +0200	[thread overview]
Message-ID: <20180504155028.75f1ad45@bahia.lan> (raw)
In-Reply-To: <152543629398.15508.17426624020859105239@sif>

On Fri, 04 May 2018 07:18:13 -0500
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> Quoting Greg Kurz (2018-05-04 04:37:24)
> > On Thu,  3 May 2018 23:20:44 -0500
> > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> >   
> > > In some cases (e.g. spapr) we record guest timebase after qmp_stop()
> > > via a runstate hook so we can restore it on qmp_cont(). If a migration
> > > occurs in between those events we end up saving it again, this time
> > > based on the current timebase the guest would be seeing had it been
> > > running. This has the effect of advancing the guest timebase while
> > > it is stopped, which is not what the code intends.
> > >   
> > 
> > Hi Mike,
> > 
> > The current behavior was introduced by:
> > 
> > commit 42043e4f1241eeb77f87f5816b5cf0b6e9583ed7
> > Author: Laurent Vivier <lvivier@redhat.com>
> > Date:   Fri Jan 27 13:24:58 2017 +0100
> > 
> >     spapr: clock should count only if vm is running
> > 
> > and we have this in the changelog:
> > 
> >     We keep timebase_pre_save to reduce the clock difference on
> >     migration like in:
> >         6053a86 kvmclock: reduce kvmclock difference on migration
> > 
> > 
> > So your patch totally negates ^^ ? Also, I can't see a case where  
> 
> Yah... this is a bit confusing. On one hand, the patch/summary is clearly
> trying to avoid the guest time from advancing while it is stopped, which
> is in the spirit of this patch. But at the same time it is trying to
> compensate for loss of time (relative to host) due to downtime window.
> 

Yeah... not sure why Laurent decided to address both in the same patch...
maybe just because we already had the pre_save hook ?

> I think the subtlety is in the amount of time... saving at pre_save
> rather than vm_stop() compensates for the normal downtime window, which
> is *usually* small (5s is the figure they quote in the notes there and
> in the motivating 6053a86 "kvmclock: reduce kvmclock difference on
> migration"). The delays between vm_stop and vm_cont via something like
> virsh suspend/resume is unbounded, unhowever, hence the rationale for
> the runstate hook (?).
> 

That's my understanding as well.

> So maybe small jumps are considered okay, and large ones not? If that's
> the reasoning, then this patch is addressing the later, so it's not
> necessarily in conflict with that motivation, but the implementation
> does negate the small jumps we try to avoid via pre_save hook since
> we'll end up keep the version we saved just after vm_stop instead.
> 
> I would note that the downtime window itself, while usually small, can
> also be quite large. With 1GB hugepages we've seen some guests requiring
> downtime windows to be set to 25s until QEMU would start cut-over. Also
> rcu_cpu_stall_timeout is configurable...it's possible if we set it to
> 5s it could trigger on the jump the guest experiences from pre_save (I
> haven't tested that though).
> 
> Maybe trying to compensate for downtime is a generally bad idea and we
> should just leave it up to NTP/etc? 

My understanding of NTP is that it isn't designed to cope with sudden
time differences, which is exactly what happens in our case.

> Or maybe we should choose a specific
> upper bound on how much migration downtime we're willing to compensate
> for and enforce that directly? E.g. tb->saved becomes tb->saved_time and
> we check the difference in pre_save before calling timebase_save()
> again.
> 

This would maybe allow to reach a compromise between the current code
and your patch... but it would still be difficult to come up with
a sensible value for this upper bound, wouldn't it ?

> > So your patch totally negates ^^ ? Also, I can't see a case where
> > timebase_save() could be called from vmstate_save_state() while the
> > VM is running, ie, you could drop timebase_pre_save()... or am I
> > *probably* missing something ?  
> 
> Yah, I didn't notice that my patch completely negated the pre_save
> hook... for some reason I was thinking that would continue to function
> normally if we didn't call qmp_stop() explicitly but that's clearly not
> the case. So yah, dropping timebase_pre_save() is essentially what my
> patch is doing...
> 

<thinking aloud>
How does Linux cope with standard software suspend or hibernate ? It also
causes a downtime and it doesn't generate RCU stalls AFAIK... would it
be possible/make sense for migration to look like an hibernate ?
</thinking aloud>

> >   
> > > Other than simple jumps in time, this has been seen to trigger what
> > > appear to be RCU-related crashes in recent kernels when the advance
> > > exceeds rcu_cpu_stall_timeout, and it can be triggered by fairly
> > > common operations such as `virsh migrate ... --timeout 60`.
> > > 
> > > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > Cc: David Gibson <david@gibson.dropbear.id.au>
> > > Cc: Laurent Vivier <lvivier@redhat.com>
> > > Cc: qemu-ppc@nongnu.org
> > > Cc: qemu-stable@nongnu.org
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/ppc.c         | 12 ++++++++++++
> > >  target/ppc/cpu-qom.h |  1 +
> > >  2 files changed, 13 insertions(+)
> > > 
> > > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> > > index ec4be25f49..ff0a107864 100644
> > > --- a/hw/ppc/ppc.c
> > > +++ b/hw/ppc/ppc.c
> > > @@ -865,6 +865,15 @@ static void timebase_save(PPCTimebase *tb)
> > >      uint64_t ticks = cpu_get_host_ticks();
> > >      PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
> > >  
> > > +    /* since we generally save timebase just after the guest
> > > +     * has stopped, avoid trying to save it again since we will
> > > +     * end up advancing it by the amount of ticks that have
> > > +     * elapsed in the host since the initial save
> > > +     */
> > > +    if (tb->saved) {
> > > +        return;
> > > +    }
> > > +
> > >      if (!first_ppc_cpu->env.tb_env) {
> > >          error_report("No timebase object");
> > >          return;
> > > @@ -877,6 +886,7 @@ static void timebase_save(PPCTimebase *tb)
> > >       * there is no need to update it from KVM here
> > >       */
> > >      tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> > > +    tb->saved = true;
> > >  }
> > >  
> > >  static void timebase_load(PPCTimebase *tb)
> > > @@ -908,6 +918,8 @@ static void timebase_load(PPCTimebase *tb)
> > >                          &pcpu->env.tb_env->tb_offset);
> > >  #endif
> > >      }
> > > +
> > > +    tb->saved = false;
> > >  }
> > >  
> > >  void cpu_ppc_clock_vm_state_change(void *opaque, int running,
> > > diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> > > index deaa46a14b..ec2dbcdcae 100644
> > > --- a/target/ppc/cpu-qom.h
> > > +++ b/target/ppc/cpu-qom.h
> > > @@ -210,6 +210,7 @@ typedef struct PowerPCCPUClass {
> > >  typedef struct PPCTimebase {
> > >      uint64_t guest_timebase;
> > >      int64_t time_of_the_day_ns;
> > > +    bool saved;
> > >  } PPCTimebase;
> > >  
> > >  extern const struct VMStateDescription vmstate_ppc_timebase;  
> >   
> 

  reply	other threads:[~2018-05-04 13:50 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-04  4:20 [Qemu-devel] [PATCH] target/ppc: only save guest timebase once after stopping Michael Roth
2018-05-04  9:37 ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2018-05-04 12:18   ` Michael Roth
2018-05-04 13:50     ` Greg Kurz [this message]
2018-05-04 15:59       ` Laurent Vivier
2018-05-05  4:23       ` David Gibson
2018-05-05  4:20     ` David Gibson
2018-07-26  5:07 ` [Qemu-devel] " David Gibson
2018-07-26  7:44   ` Laurent Vivier
2018-07-27  1:09     ` David Gibson
2018-07-26 12:30   ` Michael Roth
2018-07-27  1:10     ` David Gibson
2018-07-27 12:35       ` Mark Cave-Ayland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180504155028.75f1ad45@bahia.lan \
    --to=groug@kaod.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=lvivier@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.