From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Fabiano Rosas" <farosas@suse.de>,
	qemu-devel@nongnu.org, "Claudio Fontana" <cfontana@suse.de>,
	jfehlig@suse.com, dfaggioli@suse.com, dgilbert@redhat.com,
	"Juan Quintela" <quintela@redhat.com>,
	"Nikolay Borisov" <nborisov@suse.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>
Subject: Re: [RFC PATCH v1 10/26] migration/ram: Introduce 'fixed-ram' migration stream capability
Date: Fri, 31 Mar 2023 12:13:32 -0400
Message-ID: <ZCcGrEkSA64z6MpV@x1n>
In-Reply-To: <ZCb9oVI6WUaGizwm@redhat.com>

On Fri, Mar 31, 2023 at 04:34:57PM +0100, Daniel P. Berrangé wrote:
> On Fri, Mar 31, 2023 at 10:39:23AM -0400, Peter Xu wrote:
> > On Fri, Mar 31, 2023 at 08:56:01AM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Mar 30, 2023 at 06:01:51PM -0400, Peter Xu wrote:
> > > > On Thu, Mar 30, 2023 at 03:03:20PM -0300, Fabiano Rosas wrote:
> > > > > From: Nikolay Borisov <nborisov@suse.com>
> > > > > 
> > > > > Implement 'fixed-ram' feature. The core of the feature is to ensure that
> > > > > each ram page of the migration stream has a specific offset in the
> > > > > resulting migration stream. There are two reasons we want such
> > > > > behavior:
> > > > > 
> > > > >  - When doing a 'fixed-ram' migration the resulting file will have a
> > > > >    bounded size, since pages which are dirtied multiple times will
> > > > >    always go to a fixed location in the file, rather than constantly
> > > > >    being appended to a sequential stream. This eliminates cases where a VM
> > > > >    with, say, 1G of RAM can result in a migration file that's tens of
> > > > >    GBs, provided that the workload constantly redirties memory.
> > > > > 
> > > > >  - It paves the way to implement DIO-enabled save/restore of the
> > > > >    migration stream, as the pages are guaranteed to be written at
> > > > >    aligned offsets.
> > > > > 
> > > > > The feature requires changing the stream format. First, a bitmap is
> > > > > introduced which tracks which pages have been written (i.e. are
> > > > > dirtied) during migration; the bitmap is then written to the
> > > > > resulting file, again at a fixed location for every ramblock. Zero
> > > > > pages are skipped, as they will be zero on the destination as
> > > > > well. With the changed format, the data looks like the following:
> > > > > 
> > > > > |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|
> > > > 
> > > > What happens with huge pages?  Would page size matter here?
> > > > 
> > > > I would assume it's fine if it uses a constant (small) page size,
> > > > assuming that matches the granularity at which QEMU tracks dirty
> > > > pages (which IIUC is the host page size, not the guest's).
> > > > 
> > > > But I haven't given it further thought yet. Maybe it would be
> > > > worthwhile in all cases to record the page size here to be explicit,
> > > > otherwise the meaning of the bitmap may not be clear (and then
> > > > bitmap_size would just be a field for sanity checking).
> > > 
> > > I think recording the page sizes is an anti-feature in this case.
> > > 
> > > The migration format / state needs to reflect the guest ABI, but
> > > we need to be free to have a different backend config behind that
> > > on either side of the save/restore.
> > > 
> > > IOW, if I start a QEMU with 2 GB of RAM, I should be free to use
> > > small pages initially and after restore use 2 x 1 GB hugepages,
> > > or vice versa.
> > > 
> > > The important thing with the pages that are saved into the file
> > > is that they form a 1:1 mapping of guest RAM regions to file offsets.
> > > IOW, the 2 GB of guest RAM is always a contiguous 2 GB region
> > > in the file.
> > > 
> > > If the src VM used 1 GB pages, we would be writing a full 2 GB
> > > of data assuming both pages were dirty.
> > > 
> > > If the src VM used 4k pages, we would be writing some subset of
> > > the 2 GB of data, and the rest would be unwritten.
> > > 
> > > Either way, when reading back the data we can restore it into either
> > > 1 GB pages or 4k pages, because any places that were originally
> > > unwritten will read back as zeros.
> > 
> > I think the page size information is already there, because there's a
> > bitmap embedded in the format, at least in the current proposal, and the
> > bitmap can only be defined with a page size provided in some form.
> > 
> > I agree the backend can change before/after a migration (live or
> > not). The question, though, is whether the page size matters in the
> > snapshot layout, rather than what backend the loading QEMU instance uses.
> 
> IIUC, the page size information merely sets a constraint on the granularity
> of unwritten (sparse) regions in the file. If we didn't want to express
> page size directly in the file format, we would need explicit start/end
> offsets for each written block. This is less convenient than just having
> a bitmap, so I think it's ok to use the page size bitmap.

I'm perfectly fine with having the bitmap. The original question was about
whether we should also store page_size in the same header, along with the
bitmap.
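
To illustrate why the bitmap cannot be interpreted without a page size, here is a minimal C sketch of restoring one ramblock from the |bitmap|pages| part of the layout. It assumes a byte-wise bitmap (one bit per page, LSB first) and an in-memory copy of the image instead of real file I/O; page_is_written(), page_file_offset() and restore_block() are hypothetical helpers, not code from the patch. Bit nr maps to pages_offset + nr * page_size, so the same bits point at different offsets for different page sizes, and pages whose bit is clear are zero-filled, matching the "unwritten reads back as zeros" behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: test bit 'nr' of a byte-wise bitmap (LSB first).
 * QEMU's own bitmaps use unsigned long words; bytes keep this short. */
static bool page_is_written(const uint8_t *bitmap, uint64_t nr)
{
    return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

/* File offset of page 'nr' within one ramblock's pages area. */
static uint64_t page_file_offset(uint64_t pages_offset, uint64_t nr,
                                 uint64_t page_size)
{
    return pages_offset + nr * page_size;
}

/*
 * Restore one ramblock of used_len bytes from an in-memory image.
 * Pages whose bit is set are copied from their fixed offset; pages that
 * were never written (bit clear) are zero-filled.
 * Returns the number of pages copied from the image.
 */
static uint64_t restore_block(uint8_t *guest_ram, uint64_t used_len,
                              const uint8_t *image, uint64_t pages_offset,
                              const uint8_t *bitmap, uint64_t page_size)
{
    assert(page_size > 0 && used_len % page_size == 0);

    uint64_t npages = used_len / page_size;
    uint64_t copied = 0;

    for (uint64_t nr = 0; nr < npages; nr++) {
        uint8_t *dst = guest_ram + nr * page_size;

        if (page_is_written(bitmap, nr)) {
            memcpy(dst, image + page_file_offset(pages_offset, nr, page_size),
                   page_size);
            copied++;
        } else {
            memset(dst, 0, page_size);
        }
    }
    return copied;
}
```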

Currently I think the page size can be implied either by the system
configuration (e.g. arch, CPU setup) or by the size of the bitmap. So I'm
wondering whether it would be cleaner to replace the bitmap size with the
page size (from which the bitmap size can be calculated), or just keep both
of them for sanity checking.
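
If page_size is stored, bitmap_size becomes derivable and can be kept purely as a sanity check. A minimal sketch, assuming one bit per page rounded up to whole 64-bit words and a power-of-two page size; bitmap_bytes_for() and ramblock_header_is_sane() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes needed for one bit per page, rounded up to whole 64-bit words.
 * The 64-bit word granularity is an assumption of this sketch. */
static uint64_t bitmap_bytes_for(uint64_t used_len, uint64_t page_size)
{
    assert(page_size > 0);
    uint64_t npages = (used_len + page_size - 1) / page_size; /* round up */
    uint64_t nwords = (npages + 63) / 64;
    return nwords * sizeof(uint64_t);
}

/* With page_size in the header, bitmap_size is redundant and only serves
 * to cross-check the header when loading. */
static bool ramblock_header_is_sane(uint64_t used_len, uint64_t page_size,
                                    uint64_t bitmap_size)
{
    if (page_size == 0 || (page_size & (page_size - 1)) != 0) {
        return false;                   /* not a power of two */
    }
    return bitmap_bytes_for(used_len, page_size) == bitmap_size;
}
```

One caveat, at least under this rounding assumption: bitmap_size alone does not always determine the page size. A 64 KiB block yields an 8-byte bitmap for both 4 KiB and 64 KiB pages, so storing page_size explicitly avoids that ambiguity.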

Besides, since we seem to be defining a new header format to be stored on
disk, maybe it would be worthwhile to leave some space for future
extensions of the image?

So the image format could start with a version number (perhaps also with a
field describing what it contains). Then, if someday we want to extend the
image, a new QEMU binary will still be able to load an old image even though
the format has changed. Or vice versa: an old QEMU binary would be able to
identify that it's loading a newer image it doesn't really understand, and
properly notify the user rather than fail with obscure loading errors.
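
Something along these lines, as a minimal sketch only: the struct, the 8-byte magic, the 64-byte size and the feature bits are all hypothetical, and byte-order conversion (an on-disk format needs a fixed endianness) is omitted. A version number plus a mask of known feature bits lets an older binary reject a newer image with a specific error instead of misparsing it, while zeroed reserved space leaves room for later fields:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header; every name and value here is illustrative.
 * Fields would be stored in a fixed byte order; conversion is omitted. */
#define FIXED_RAM_MAGIC        "QFIXRAM"   /* 7 chars + NUL = 8 bytes */
#define FIXED_RAM_VERSION      1u          /* newest version we can read */
#define FIXED_RAM_KNOWN_FLAGS  0x1ull      /* feature bits we understand */

struct fixed_ram_header {
    char     magic[8];
    uint32_t version;       /* bumped on incompatible layout changes */
    uint32_t header_len;    /* lets later versions grow the header */
    uint64_t flags;         /* optional features used by this image */
    uint8_t  reserved[40];  /* written as zero, room for extensions */
};
static_assert(sizeof(struct fixed_ram_header) == 64, "header is 64 bytes");

enum fixed_ram_check {
    FIXED_RAM_OK,
    FIXED_RAM_NOT_AN_IMAGE,    /* wrong magic */
    FIXED_RAM_CORRUPT,         /* impossible field values */
    FIXED_RAM_TOO_NEW,         /* newer version than this binary knows */
    FIXED_RAM_UNKNOWN_FEATURE, /* known version, unknown feature bit */
};

static void fixed_ram_header_init(struct fixed_ram_header *h, uint64_t flags)
{
    memset(h, 0, sizeof(*h));              /* also zeroes reserved space */
    memcpy(h->magic, FIXED_RAM_MAGIC, sizeof(h->magic));
    h->version = FIXED_RAM_VERSION;
    h->header_len = sizeof(*h);
    h->flags = flags;
}

static enum fixed_ram_check
fixed_ram_header_check(const struct fixed_ram_header *h)
{
    if (memcmp(h->magic, FIXED_RAM_MAGIC, sizeof(h->magic)) != 0) {
        return FIXED_RAM_NOT_AN_IMAGE;
    }
    if (h->version == 0 || h->header_len < sizeof(*h)) {
        return FIXED_RAM_CORRUPT;
    }
    if (h->version > FIXED_RAM_VERSION) {
        return FIXED_RAM_TOO_NEW;          /* tell the user, don't guess */
    }
    if (h->flags & ~FIXED_RAM_KNOWN_FLAGS) {
        return FIXED_RAM_UNKNOWN_FEATURE;
    }
    return FIXED_RAM_OK;
}
```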

-- 
Peter Xu




Thread overview: 65+ messages
2023-03-30 18:03 [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 01/26] migration: Add support for 'file:' uri for source migration Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 02/26] migration: Add support for 'file:' uri for incoming migration Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 03/26] tests/qtest: migration: Add migrate_incoming_qmp helper Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 04/26] tests/qtest: migration-test: Add tests for file-based migration Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 05/26] migration: Initial support of fixed-ram feature for analyze-migration.py Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 06/26] io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 07/26] io: Add generic pwritev/preadv interface Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 08/26] io: implement io_pwritev/preadv for QIOChannelFile Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 09/26] migration/qemu-file: add utility methods for working with seekable channels Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 10/26] migration/ram: Introduce 'fixed-ram' migration stream capability Fabiano Rosas
2023-03-30 22:01   ` Peter Xu
2023-03-31  7:56     ` Daniel P. Berrangé
2023-03-31 14:39       ` Peter Xu
2023-03-31 15:34         ` Daniel P. Berrangé
2023-03-31 16:13           ` Peter Xu [this message]
2023-03-31 15:05     ` Fabiano Rosas
2023-03-31  5:50   ` Markus Armbruster
2023-03-30 18:03 ` [RFC PATCH v1 11/26] migration: Refactor precopy ram loading code Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 12/26] migration: Add support for 'fixed-ram' migration restore Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 13/26] tests/qtest: migration-test: Add tests for fixed-ram file-based migration Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 14/26] migration: Add completion tracepoint Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 15/26] migration/multifd: Remove direct "socket" references Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 16/26] migration/multifd: Allow multifd without packets Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 17/26] migration/multifd: Add outgoing QIOChannelFile support Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 18/26] migration/multifd: Add incoming " Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 19/26] migration/multifd: Add pages to the receiving side Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 20/26] io: Add a pwritev/preadv version that takes a discontiguous iovec Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 21/26] migration/ram: Add a wrapper for fixed-ram shadow bitmap Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 22/26] migration/multifd: Support outgoing fixed-ram stream format Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 23/26] migration/multifd: Support incoming " Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 24/26] tests/qtest: Add a multifd + fixed-ram migration test Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 25/26] migration: Add direct-io parameter Fabiano Rosas
2023-03-30 18:03 ` [RFC PATCH v1 26/26] tests/migration/guestperf: Add file, fixed-ram and direct-io support Fabiano Rosas
2023-03-30 21:41 ` [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram Peter Xu
2023-03-31 14:37   ` Fabiano Rosas
2023-03-31 14:52     ` Peter Xu
2023-03-31 15:30       ` Fabiano Rosas
2023-03-31 15:55         ` Peter Xu
2023-03-31 16:10           ` Daniel P. Berrangé
2023-03-31 16:27             ` Peter Xu
2023-03-31 18:18               ` Fabiano Rosas
2023-03-31 21:52                 ` Peter Xu
2023-04-03  7:47                   ` Claudio Fontana
2023-04-03 19:26                     ` Peter Xu
2023-04-04  8:00                       ` Claudio Fontana
2023-04-04 14:53                         ` Peter Xu
2023-04-04 15:10                           ` Claudio Fontana
2023-04-04 15:56                             ` Peter Xu
2023-04-06 16:46                               ` Fabiano Rosas
2023-04-07 10:36                                 ` Claudio Fontana
2023-04-11 15:48                                   ` Peter Xu
2023-04-18 16:58               ` Daniel P. Berrangé
2023-04-18 19:26                 ` Peter Xu
2023-04-19 17:12                   ` Daniel P. Berrangé
2023-04-19 19:07                     ` Peter Xu
2023-04-20  9:02                       ` Daniel P. Berrangé
2023-04-20 19:19                         ` Peter Xu
2023-04-21  7:48                           ` Daniel P. Berrangé
2023-04-21 13:56                             ` Peter Xu
2023-03-31 15:46       ` Daniel P. Berrangé
2023-04-03  7:38 ` David Hildenbrand
2023-04-03 14:41   ` Fabiano Rosas
2023-04-03 16:24     ` David Hildenbrand
2023-04-03 16:36       ` Fabiano Rosas
