From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>,
	qemu-devel@nongnu.org, Claudio Fontana <cfontana@suse.de>,
	jfehlig@suse.com, dfaggioli@suse.com, dgilbert@redhat.com,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram
Date: Tue, 18 Apr 2023 15:26:45 -0400
Message-ID: <ZD7u9YHTor4edGWw@x1n>
In-Reply-To: <ZD7MRGQ+4QsDBtKR@redhat.com>

On Tue, Apr 18, 2023 at 05:58:44PM +0100, Daniel P. Berrangé wrote:
> On Fri, Mar 31, 2023 at 12:27:48PM -0400, Peter Xu wrote:
> > On Fri, Mar 31, 2023 at 05:10:16PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, Mar 31, 2023 at 11:55:03AM -0400, Peter Xu wrote:
> > > > On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
> > > > > Peter Xu <peterx@redhat.com> writes:
> > > > > 
> > > > > > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> > > > > >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> > > > > >> >> 
> > > > > >> >> - Single migration runs of stopped 32G guest with ~90% RAM usage. Guest
> > > > > >> >>   running `stress-ng --vm 4 --vm-bytes 90% --vm-method all --verify -t
> > > > > >> >>   10m -v`:
> > > > > >> >> 
> > > > > >> >> migration type  | MB/s | pages/s |  ms
> > > > > >> >> ----------------+------+---------+------
> > > > > >> >> savevm io_uring |  434 |  102294 | 71473
> > > > > >> >
> > > > > >> > So I assume this is the non-live migration scenario.  Could you explain
> > > > > >> > what does io_uring mean here?
> > > > > >> >
> > > > > >> 
> > > > > >> This table is all non-live migration. This particular line is a snapshot
> > > > > >> (hmp_savevm->save_snapshot). I thought it could be relevant because it
> > > > > >> is another way by which we write RAM into disk.
> > > > > >
> > > > > > I see, so if all non-live that explains, because I was curious what's the
> > > > > > relationship between this feature and the live snapshot that QEMU also
> > > > > > supports.
> > > > > >
> > > > > > I also don't immediately see why savevm will be much slower, do you have an
> > > > > > answer?  Maybe it's somewhere but I just overlooked..
> > > > > >
> > > > > 
> > > > > I don't have a concrete answer. I could take a jab and maybe blame the
> > > > > extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
> > > > > of bandwidth limits?
> > > > 
> > > > IMHO it would be great if this can be investigated and reasons provided in
> > > > the next cover letter.
> > > > 
> > > > > 
> > > > > > IIUC this is "vm suspend" case, so there's an extra benefit knowledge of
> > > > > > "we can stop the VM".  It smells slightly weird to build this on top of
> > > > > > "migrate" from that pov, rather than "savevm", though.  Any thoughts on
> > > > > > this aspect (on why not building this on top of "savevm")?
> > > > > >
> > > > > 
> > > > > I share the same perception. I have done initial experiments with
> > > > > savevm, but I decided to carry on the work that was already started by
> > > > > others because my understanding of the problem was yet incomplete.
> > > > > 
> > > > > One point that has been raised is that the fixed-ram format alone does
> > > > > not bring that many performance improvements. So we'll need
> > > > > multi-threading and direct-io on top of it. Re-using multifd
> > > > > infrastructure seems like it could be a good idea.
> > > > 
> > > > The thing is IMHO concurrency is not as hard if VM stopped, and when we're
> > > > 100% sure locally on where the page will go.
> > > 
> > > We shouldn't assume the VM is stopped though. When saving to the file
> > > the VM may still be active. The fixed-ram format lets us re-write the
> > > same memory location on disk multiple times in this case, thus avoiding
> > > growth of the file size.
> > 
> > Before discussing reusing multifd below, I now have a major confusion
> > about the use case of the feature..
> > 
> > The question is whether we would like to stop the VM after fixed-ram
> > migration completes.  I'm asking because:
> > 
> >   1. If it will stop, then it looks like a "VM suspend" to me. If so, could
> >      anyone help explain why we don't stop the VM first then migrate?
> >      Because it avoids copying single pages multiple times, no fiddling
> >      with dirty tracking at all - we just don't ever track anything.  In
> >      short, we'll stop the VM anyway, then why not stop it slightly
> >      earlier?
> > 
> >   2. If it will not stop, then it's "VM live snapshot" to me.  We have
> >      that, don't we?  That's more efficient because it'll wr-protect all
> >      guest pages, any write triggers a CoW and we only copy the guest pages
> >      once and for all.
> > 
> > Either way to go, there's no need to copy any page more than once.  Did I
> > miss anything perhaps very important?
> > 
> > I would guess it's option (1) above, because it seems we don't snapshot the
> > disk alongside.  But I am really not sure now..
> 
> It is both options above.
> 
> Libvirt has multiple APIs where it currently uses its migrate-to-file
> approach
> 
>   * virDomainManagedSave()
> 
>     This saves VM state to a libvirt managed file, stops the VM, and the
>     file state is auto-restored on next request to start the VM, and the
>     file deleted. The VM CPUs are stopped during both save + restore
>     phase
> 
>   * virDomainSave/virDomainRestore
> 
>     The former saves VM state to a file specified by the mgmt app/user.
>     A later call to virDomainRestore starts the VM using that saved
>     state. The mgmt app / user can delete the file state, or re-use
>     it many times as they desire. The VM CPUs are stopped during both
>     save + restore phase
> 
>   * virDomainSnapshotXXX
> 
>     This family of APIs takes snapshots of the VM disks, optionally
>     also including the full VM state to a separate file. The snapshots
>     can later be restored. The VM CPUs remain running during the
>     save phase, but are stopped during restore phase

For this one IMHO it'll be good if Libvirt can consider leveraging the new
background-snapshot capability (QEMU 6.0+, so not very new..).  Or is there
perhaps any reason why a generic migrate:fd approach is better?
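
For reference, a minimal sketch of the QMP sequence this would involve,
assuming the usual migrate-set-capabilities and migrate commands.  The
destination URI is only an illustrative placeholder, and Python is used
here just to build and print the JSON messages in order:

```python
import json


def background_snapshot_commands(uri):
    """Return the QMP commands, in order, for a background snapshot to `uri`.

    `uri` is whatever migration URI the management layer would pass;
    the value used below is a made-up example.
    """
    return [
        # Enable the background-snapshot capability before migrating.
        {
            "execute": "migrate-set-capabilities",
            "arguments": {
                "capabilities": [
                    {"capability": "background-snapshot", "state": True}
                ]
            },
        },
        # Then start the migration itself; the guest keeps running.
        {"execute": "migrate", "arguments": {"uri": uri}},
    ]


if __name__ == "__main__":
    # Hypothetical destination, for illustration only.
    for cmd in background_snapshot_commands("exec:cat > /tmp/vm.state"):
        print(json.dumps(cmd))
```

Whether libvirt would drive it exactly like this is an assumption on my
side; the point is only that the capability is set once before "migrate".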

> 
> All these APIs end up calling the same code inside libvirt that uses
> the libvirt-iohelper, together with QEMU migrate:fd driver.
> 
> IIUC, SUSE's original motivation for the performance improvements was
> wrt the first case of virDomainManagedSave. From the POV of actually
> supporting this in libvirt though, we need to cover all the scenarios
> there. Thus we need this to work both when CPUs are running and stopped,
> and if we didn't use migrate in this case, then we basically just end
> up re-inventing migrate again which IMHO is undesirable both from
> libvirt's POV and QEMU's POV.

Just to make sure we're on the same page - I've always thought it's fine to
use the QMP "migrate" command to do this.

Meanwhile, we can also reuse the migration framework if we think that's
still a good way to go (even if I am not 100% sure on this... I still
think _lots_ of the live migration framework is logic trying to take care
of a "live" VM, IOW, that logic will become pure overhead if we reuse the
live migration framework for vm suspend).

However could you help elaborate more on why it must support live mode for
a virDomainManagedSave() request?  As I assume this is the core of the goal.

IMHO virDomainManagedSave() is a good interface design, because it states
the goal of what it wants to do (according to the above).  To ask in
another way, I'm curious whether virDomainManagedSave() will stop the VM
before triggering the QMP "migrate" to fd: If it doesn't, why not?  If it
does, then why can't we have that assumption for QEMU as well?

That assumption is IMHO important for QEMU because non-live VM migration
can avoid tons of overhead that a live migration needs.  As I mentioned
in the other reply, even if we keep using the migration framework, we
can still optimize other things like dirty tracking.  We probably don't
even need any bitmap at all because we simply scan over all ramblocks.
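
To illustrate the difference, here is a toy model (not QEMU code; all
names, sizes and the dirty-page sets are made up) comparing one scan over
all ramblocks with the VM stopped against a live save that re-sends
dirtied pages.  It assumes each page has a fixed file offset, as in the
fixed-ram format, so re-sent pages land on the same offset:

```python
PAGE_SIZE = 4096


def page_offset(block_base, page_index):
    """Fixed file offset of a page inside its ramblock's region."""
    return block_base + page_index * PAGE_SIZE


def save_stopped(ramblocks):
    """One pass over every ramblock; no dirty bitmap is consulted.

    ramblocks maps a block name to (file base offset, number of pages).
    Returns the list of file offsets written, in order.
    """
    writes = []
    for base, npages in ramblocks.values():
        for i in range(npages):
            writes.append(page_offset(base, i))
    return writes


def save_live(ramblocks, dirtied_per_pass):
    """Full first pass, then one extra pass per set of dirtied pages.

    dirtied_per_pass is a list of sets of (block name, page index).
    """
    writes = save_stopped(ramblocks)
    for dirty in dirtied_per_pass:
        for name, i in sorted(dirty):
            base, _ = ramblocks[name]
            writes.append(page_offset(base, i))
    return writes


if __name__ == "__main__":
    ramblocks = {"pc.ram": (0, 8), "vga.vram": (8 * PAGE_SIZE, 2)}
    stopped = save_stopped(ramblocks)
    live = save_live(ramblocks, [{("pc.ram", 1), ("pc.ram", 3)}, {("pc.ram", 1)}])
    # Page writes issued vs. distinct file offsets touched.
    print(len(stopped), len(live), len(set(live)))
```

In this model the file ends up the same size either way, but the stopped
case issues exactly one write per page and needs no tracking state.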

OTOH, if QEMU supports live mode for a "vm suspend" in the initial design,
not only does it not sound right at the interface level, it also means QEMU
will need to keep doing so forever, because we need to stay compatible with
the old interfaces even on new binaries.  That's why I keep suggesting we
should make "VM turned off" part of the cmd if that's what we're looking
for.

Thanks,

-- 
Peter Xu



