From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>,
	qemu-devel@nongnu.org, Claudio Fontana <cfontana@suse.de>,
	jfehlig@suse.com, dfaggioli@suse.com, dgilbert@redhat.com,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram
Date: Wed, 19 Apr 2023 15:07:19 -0400
Message-ID: <ZEA759BSs75ldW6Y@x1n>
In-Reply-To: <ZEAg5QJS44jzAV/v@redhat.com>

On Wed, Apr 19, 2023 at 06:12:05PM +0100, Daniel P. Berrangé wrote:
> On Tue, Apr 18, 2023 at 03:26:45PM -0400, Peter Xu wrote:
> > On Tue, Apr 18, 2023 at 05:58:44PM +0100, Daniel P. Berrangé wrote:
> > > Libvirt has multiple APIs where it currently uses its migrate-to-file
> > > approach
> > > 
> > >   * virDomainManagedSave()
> > > 
> > >     This saves VM state to a libvirt managed file, stops the VM, and the
> > >     file state is auto-restored on next request to start the VM, and the
> > >     file deleted. The VM CPUs are stopped during both save + restore
> > >     phase
> > > 
> > >   * virDomainSave/virDomainRestore
> > > 
> > >     The former saves VM state to a file specified by the mgmt app/user.
> > >     A later call to virDomainRestore starts the VM using that saved
> > >     state. The mgmt app / user can delete the file state, or re-use
> > >     it many times as they desire. The VM CPUs are stopped during both
> > >     save + restore phase
> > > 
> > >   * virDomainSnapshotXXX
> > > 
> > >     This family of APIs takes snapshots of the VM disks, optionally
> > >     also including the full VM state to a separate file. The snapshots
> > >     can later be restored. The VM CPUs remain running during the
> > >     save phase, but are stopped during restore phase
> > 
> > For this one IMHO it'll be good if Libvirt can consider leveraging the new
> > background-snapshot capability (QEMU 6.0+, so not very new..).  Or is there
> > perhaps any reason why a generic migrate:fd approach is better?
> 
> I'm not sure I fully understand the implications of 'background-snapshot' ?
> 
> Based on what the QAPI comment says, it sounds potentially interesting,
> as conceptually it would be nicer to have the memory / state snapshot
> represent the VM at the point where we started the snapshot operation,
> rather than where we finished the snapshot operation.
> 
> It would not solve the performance problems that the work in this thread
> was intended to address though.  With large VMs (100's of GB of RAM),
> saving all the RAM state to disk takes a very long time, regardless of
> whether the VM vCPUs are paused or running.

I think it solves the performance problem by copying each guest page only
once, even while the guest is running.

Unlike almost all other "migrate" use cases, background snapshot does not
use the generic dirty tracking at all (for KVM that's get-dirty-log).
Instead it uses userfaultfd write-protection, so that when the snapshot is
taken every guest page is protected once.

Then when a page is written, the guest cannot proceed until the original
page has been copied to the snapshot.  Once a guest page is unprotected,
any further write to it runs at full speed, because follow-up writes don't
matter for the snapshot.
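
To make the above concrete, here is a toy model in Python.  It is only a
sketch of the copy-before-first-write idea, not QEMU code: the class and
method names are made up for illustration, and a set of page indexes stands
in for the userfaultfd write-protection.

```python
class WriteProtectedSnapshot:
    """Toy model of a write-protect based snapshot (hypothetical, not QEMU)."""

    def __init__(self, memory):
        self.memory = memory                      # live guest pages
        self.protected = set(range(len(memory)))  # every page protected once
        self.snapshot = {}                        # page index -> saved contents
        self.copies = 0                           # number of page copies made

    def _save_page(self, idx):
        # Copy the page as it was at snapshot time, then drop the protection.
        self.snapshot[idx] = self.memory[idx]
        self.protected.discard(idx)
        self.copies += 1

    def guest_write(self, idx, data):
        # The first write to a protected page "faults": the original contents
        # are saved before the write goes ahead.  Later writes skip this.
        if idx in self.protected:
            self._save_page(idx)
        self.memory[idx] = data

    def background_copy(self):
        # The snapshot thread saves whatever the guest has not touched yet.
        for idx in sorted(self.protected):
            self._save_page(idx)

    def image(self):
        # Snapshot contents in page order; valid once no page is protected.
        return [self.snapshot[i] for i in range(len(self.memory))]
```

In this model a page that the guest writes many times is still copied only
once, and the resulting image reflects memory at the point the snapshot was
started.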

It guarantees the best efficiency of creating a snapshot with the VM
running, afaict.  I sincerely think Libvirt should have someone investigate
whether virDomainSnapshotXXX() can be implemented with this cap rather than
the default migration.
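
For reference, a minimal sketch of what a management app would send over QMP
to use this cap.  The helper name is made up, and the URI is a placeholder
the caller has to supply; only the "migrate-set-capabilities" and "migrate"
commands and the "background-snapshot" capability name are taken from QEMU
itself.

```python
import json


def background_snapshot_commands(uri):
    """Build the QMP commands (as JSON strings) for a background snapshot.

    Hypothetical helper: 'uri' is whatever destination the caller chooses.
    """
    enable_cap = {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [
                {"capability": "background-snapshot", "state": True}
            ]
        },
    }
    start = {"execute": "migrate", "arguments": {"uri": uri}}
    # The capability has to be set before the migration is started.
    return [json.dumps(enable_cap), json.dumps(start)]
```

The guest keeps running while the stream is written; the capability only
changes how QEMU tracks the pages, not how the caller starts the migration.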

I actually thought the Libvirt support was there. I think it must be that
someone posted support for Libvirt but it didn't really land for some
reason.

> 
> Currently when doing this libvirt has a "libvirt_iohelper" process
> that we use so that we can do writes with O_DIRECT set. This avoids
> thrashing the host OS's I/O buffers/cache, and thus negatively
> impacting performance of anything else on the host doing I/O. This
> can't take advantage of multifd though, and even if extended to do
> so, it still imposes extra data copies during the save/restore paths.
> 
> So to speed up the above 3 libvirt APIs, we want QEMU to be able to
> directly save/restore mem/vmstate to files, with parallelization and
> O_DIRECT.

Here IIUC the question above is really important: whether the existing
virDomainSnapshotXXX() can (and should) be implemented with
"background-snapshot", because that's the only one of the 3 use cases that
needs to support live migration.

If virDomainSnapshotXXX() can be implemented differently, I think it'll be
much easier to have both virDomainManagedSave() and virDomainSave() trigger
a migration command that stops the VM first, in whatever way.

It's probably fine if we still want to have CAP_FIXED_RAM as a new
capability describing the file property (so that libvirt will know iohelper
is not needed anymore); it can support live migration even if it shouldn't
really be used that way.  But then we could probably have another
CAP_SUSPEND which gives QEMU a hint so QEMU can be smart about this
non-live migration.

It's just that AFAIU CAP_FIXED_RAM should just always be set with
CAP_SUSPEND, because it must either be a SUSPEND to fixed ram, or one
should just use virDomainSnapshotXXX() (or say, live snapshot).
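
Spelled out as a small sketch of the proposal: the mapping and the helper
below are illustrative only.  "suspend" is the hypothetical CAP_SUSPEND
suggested here and does not exist in QEMU, and "fixed-ram" is the capability
proposed by this series.

```python
# Which capabilities each libvirt API would use under this proposal.
PROPOSED_CAPS = {
    "virDomainManagedSave": {"fixed-ram", "suspend"},
    "virDomainSave": {"fixed-ram", "suspend"},
    "virDomainSnapshotXXX": {"background-snapshot"},
}


def caps_are_consistent(caps):
    """Under this proposal, fixed-ram is only expected together with suspend."""
    return "fixed-ram" not in caps or "suspend" in caps
```

With this split, only the snapshot family needs a live guest, and the two
save APIs always stop the VM before writing the file.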

Thanks,

-- 
Peter Xu



