All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexey <a.perevalov@samsung.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	i.maximets@samsung.com, f4bug@amsat.org, qemu-devel@nongnu.org,
	Peter Xu <peterx@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side
Date: Wed, 10 May 2017 18:46:50 +0300	[thread overview]
Message-ID: <20170510154650.GC4201@aperevalov-ubuntu> (raw)
In-Reply-To: <20170509094434.GF1669@redhat.com>

On Tue, May 09, 2017 at 10:44:34AM +0100, Daniel P. Berrange wrote:
> On Tue, May 09, 2017 at 10:40:34AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Mon, May 08, 2017 at 12:08:07PM +0300, Alexey wrote:
> > > > On Mon, May 08, 2017 at 02:29:06PM +0800, Peter Xu wrote:
> > > > > On Fri, Apr 28, 2017 at 02:11:19PM +0300, Alexey Perevalov wrote:
> > > > > > On 04/28/2017 01:00 PM, Peter Xu wrote:
> > > > > > >On Fri, Apr 28, 2017 at 09:57:37AM +0300, Alexey Perevalov wrote:
> > > > > > >>This patch provides downtime calculation per vCPU,
> > > > > > >>as a summary and as a overlapped value for all vCPUs.
> > > > > > >>
> > > > > > >>This approach was suggested by Peter Xu, as an improvements of
> > > > > > >>previous approch where QEMU kept tree with faulted page address and cpus bitmask
> > > > > > >>in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > > > > > >>as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > > > > > >>list for downtime per vCPU (could be traced with page_fault_addr)
> > > > > > >>
> > > > > > >>For more details see comments for get_postcopy_total_downtime
> > > > > > >>implementation.
> > > > > > >>
> > > > > > >>Downtime will not calculated if postcopy_downtime field of
> > > > > > >>MigrationIncomingState wasn't initialized.
> > > > > > >>
> > > > > > >>Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > >>---
> > > > > > >>  include/migration/migration.h |   3 ++
> > > > > > >>  migration/migration.c         | 103 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > >>  migration/postcopy-ram.c      |  20 +++++++-
> > > > > > >>  migration/trace-events        |   6 ++-
> > > > > > >>  4 files changed, 130 insertions(+), 2 deletions(-)
> > > > > > >>
> > > > > > >>diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > > > > >>index e8fb68f..a22f9ce 100644
> > > > > > >>--- a/include/migration/migration.h
> > > > > > >>+++ b/include/migration/migration.h
> > > > > > >>@@ -139,6 +139,9 @@ void migration_incoming_state_destroy(void);
> > > > > > >>   * Functions to work with downtime context
> > > > > > >>   */
> > > > > > >>  struct DowntimeContext *downtime_context_new(void);
> > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr);
> > > > > > >>+uint64_t get_postcopy_total_downtime(void);
> > > > > > >>  struct MigrationState
> > > > > > >>  {
> > > > > > >>diff --git a/migration/migration.c b/migration/migration.c
> > > > > > >>index ec76e5c..2c6f150 100644
> > > > > > >>--- a/migration/migration.c
> > > > > > >>+++ b/migration/migration.c
> > > > > > >>@@ -2150,3 +2150,106 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > > > > > >>      return atomic_xchg(&incoming_postcopy_state, new_state);
> > > > > > >>  }
> > > > > > >>+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > > > > >>+{
> > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > >>+    DowntimeContext *dc;
> > > > > > >>+    if (!mis->downtime_ctx || cpu < 0) {
> > > > > > >>+        return;
> > > > > > >>+    }
> > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > >>+    dc->vcpu_addr[cpu] = addr;
> > > > > > >>+    dc->last_begin = dc->page_fault_vcpu_time[cpu] =
> > > > > > >>+        qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > >>+
> > > > > > >>+    trace_mark_postcopy_downtime_begin(addr, dc, dc->page_fault_vcpu_time[cpu],
> > > > > > >>+            cpu);
> > > > > > >>+}
> > > > > > >>+
> > > > > > >>+void mark_postcopy_downtime_end(uint64_t addr)
> > > > > > >>+{
> > > > > > >>+    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > > > >>+    DowntimeContext *dc;
> > > > > > >>+    int i;
> > > > > > >>+    bool all_vcpu_down = true;
> > > > > > >>+    int64_t now;
> > > > > > >>+
> > > > > > >>+    if (!mis->downtime_ctx) {
> > > > > > >>+        return;
> > > > > > >>+    }
> > > > > > >>+    dc = mis->downtime_ctx;
> > > > > > >>+    now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > > > > >>+
> > > > > > >>+    /* check all vCPU down,
> > > > > > >>+     * QEMU has bitmap.h, but even with bitmap_and
> > > > > > >>+     * will be a cycle */
> > > > > > >>+    for (i = 0; i < smp_cpus; i++) {
> > > > > > >>+        if (dc->vcpu_addr[i]) {
> > > > > > >>+            continue;
> > > > > > >>+        }
> > > > > > >>+        all_vcpu_down = false;
> > > > > > >>+        break;
> > > > > > >>+    }
> > > > > > >>+
> > > > > > >>+    if (all_vcpu_down) {
> > > > > > >>+        dc->total_downtime += now - dc->last_begin;
> > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > >is one of the page faulted addresses? Can it be some other page? I
> > > > > > >don't know. But since we have the loop below to make sure of it, why
> > > > > > >not?
> > > > > > no, the downtime implies since page fault till the
> > > > > > page will be copied.
> > > > > > Yes another pages could be copied as well as pagefaulted,
> > > > > > and they are copied due to prefetching, but it's not a downtime.
> > > > > 
> > > > > Not sure I got the point... Do you mean that when reach here, then
> > > > > this page address is definitely one of the faulted addresses? I am not
> > > > > 100% sure of this, but if you are sure, I am okay with it.
> > > > Let me clarify.
> > > > 
> > > > > > >Shall we do this accouting only if we are sure the copied page address
> > > > > > >is one of the page faulted addresses?
> > > > Yes it's primary condition, due to there are could be another pages,
> > > > which weren't faulted, they just was sent from source to destination,
> > > > I called it prefetching.
> > > > 
> > > > I think I got why did you ask that question, because in this version
> > > > all_vcpu_down and as a result total_downtime calculated incorrectly,
> > > > it calculates every time when any page is copied, but it should
> > > > be calculated only when faulted page copied, so only dc->vcpu_downtime
> > > > was correctly calculated.
> > > 
> > > Exactly. I am afraid if we have such "prefetching" stuff then
> > > total_downtime will be more than its real value.
> > 
> > It should be OK as long as we measure the time between
> >   userfault reporting a page miss for an address 
> >   and
> >   place_page for *that same address*
> > 
> > any places for other pages are irrelevant.
> > 
> > (I still worry that this definition of 'downtime' is possibly
> > arbitrary - since if all but one of the vCPUs are down we
> > don't count it but it's obviously still a big impact).
> 
> Can we also *not* call it "downtime", as it is measuring a very different
> thing than the "downtime" we have measured today on the source during
> pre-copy migration.

As I know downtime in pre-copy migration it's a time since all vCPU
were disabled on source till enabling these vCPU on destination,
but it's continuous time. So the meaning of both downtime is the same.
It's time when all vCPU are down.

But as David mentioned - downtime per vCPU is also important in case of
postcopy live migration, and I'll include it in final result of
query-migrate too.

> 
> Call it "pagewait" or "delaytime" or something like that to indicate it
> is counting delays to CPUs for page fetching.
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 

-- 

BR
Alexey

  reply	other threads:[~2017-05-10 15:47 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20170428065752eucas1p1b702ff53ba0bd96674e8cc35466f8046@eucas1p1.samsung.com>
2017-04-28  6:57 ` [Qemu-devel] [PATCH RESEND V3 0/6] calculate downtime for postcopy live migration Alexey Perevalov
     [not found]   ` <CGME20170428065752eucas1p190511b1932f61b6321c489f0eb4e816f@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 1/6] userfault: add pid into uffd_msg & update UFFD_FEATURE_* Alexey Perevalov
     [not found]   ` <CGME20170428065753eucas1p1639528c4df0b459db96579fd5bee281c@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 2/6] migration: pass ptr to MigrationIncomingState into migration ufd_version_check & postcopy_ram_supported_by_host Alexey Perevalov
2017-04-28  9:04       ` Peter Xu
     [not found]   ` <CGME20170428065753eucas1p1524aa2bd8e469e6c94a88ee80eb54a6e@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 3/6] migration: split ufd_version_check onto receive/request features part Alexey Perevalov
2017-04-28  9:01       ` Peter Xu
2017-04-28 10:58         ` Alexey Perevalov
2017-04-28 12:57           ` Alexey Perevalov
2017-04-28 15:55       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20170428065754eucas1p1f51713373ce8c2d19945a4f91c52bd5c@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 4/6] migration: add postcopy downtime into MigrationIncommingState Alexey Perevalov
2017-04-28  9:38       ` Peter Xu
2017-04-28 10:03         ` Alexey Perevalov
2017-04-28 10:07           ` Peter Xu
2017-04-28 16:22             ` Dr. David Alan Gilbert
2017-04-29  9:16               ` Alexey
2017-04-29 15:02                 ` Eric Blake
2017-05-02  8:51                 ` Dr. David Alan Gilbert
2017-05-04 13:09                   ` Alexey
2017-05-05 14:11                     ` Dr. David Alan Gilbert
2017-05-05 16:25                       ` Alexey
     [not found]   ` <CGME20170428065755eucas1p2ff9aa17eaa294e741d8c65f8d58a71fb@eucas1p2.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 5/6] migration: calculate downtime on dst side Alexey Perevalov
2017-04-28 10:00       ` Peter Xu
2017-04-28 11:11         ` Alexey Perevalov
2017-05-08  6:29           ` Peter Xu
2017-05-08  9:08             ` Alexey
2017-05-09  8:26               ` Peter Xu
2017-05-09  9:40                 ` Dr. David Alan Gilbert
2017-05-09  9:44                   ` Daniel P. Berrange
2017-05-10 15:46                     ` Alexey [this message]
2017-05-10 15:58                       ` Daniel P. Berrange
2017-05-11  4:56                         ` Peter Xu
     [not found]                           ` <CGME20170511070940eucas1p2ca3e44c15c84eef00e33d755a11c0ea1@eucas1p2.samsung.com>
2017-05-11  7:09                             ` Alexey
     [not found]                         ` <CGME20170511064629eucas1p114c72db6d922a6a05a4ec4a4d3003b55@eucas1p1.samsung.com>
2017-05-11  6:46                           ` Alexey
2017-05-09 15:19                   ` Alexey
2017-05-09 19:01                     ` Dr. David Alan Gilbert
2017-05-11  6:32                       ` Alexey
2017-05-11  8:25                         ` Dr. David Alan Gilbert
2017-04-28 16:34       ` Dr. David Alan Gilbert
     [not found]   ` <CGME20170428065755eucas1p1cdd0f278a235f176e9f63c40bc64a7a9@eucas1p1.samsung.com>
2017-04-28  6:57     ` [Qemu-devel] [PATCH RESEND V3 6/6] migration: trace postcopy total downtime Alexey Perevalov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170510154650.GC4201@aperevalov-ubuntu \
    --to=a.perevalov@samsung.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=f4bug@amsat.org \
    --cc=i.maximets@samsung.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.