From: Tim Deegan <tim@xen.org>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
Ian Campbell <ian.campbell@citrix.com>,
"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: Domain Save Image Format proposal (draft B)
Date: Wed, 12 Feb 2014 17:36:25 +0100 [thread overview]
Message-ID: <20140212163625.GE91459@deinos.phlegethon.org> (raw)
In-Reply-To: <52F90A71.40802@citrix.com>
Hi,
This draft has my wholehearted support. Even without addressing any
of the points under discussion something along these lines would be a
vast improvement on the current format.
I have two general questions:
- The existing save-format definition is spread across a number of
places: libxc for hypervisor state, qemu for DM state, and the main
toolstack (libxl/xend/xapi/&c) for other config runes and a general
wrapper. This is clearly a reworking of the libxc parts -- do
you think there's anything currently defined elsewhere that belongs
in this spec?
- Have you given any thought to making this into a wire protocol
rather than just a file format? Would there be any benefit to
having records individually acked by the receiver in a live
migration, or having the receiver send instructions about
compatibility? Or is that again left to the toolstack to manage?
and a few nits:
At 17:20 +0000 on 10 Feb (1392049249), David Vrabel wrote:
> Records
> =======
>
> A record has a record header, type specific data and a trailing
> footer. If body_length is not a multiple of 8, the body is padded
> with zeroes to align the checksum field on an 8 octet boundary.
>
> 0 1 2 3 4 5 6 7 octet
> +-----------------------+-------------------------+
> | type | body_length |
> +-----------+-----------+-------------------------+
> | options | (reserved) |
> +-----------+-------------------------------------+
> ...
> Record body of length body_length octets followed by
> 0 to 7 octets of padding.
> ...
> +-----------------------+-------------------------+
> | checksum | (reserved) |
> +-----------------------+-------------------------+
>
> --------------------------------------------------------------------
> Field Description
> ----------- -------------------------------------------------------
> type 0x00000000: END
>
> 0x00000001: PAGE_DATA
>
> 0x00000002: VCPU_INFO
>
> 0x00000003: VCPU_CONTEXT
>
> 0x00000004: X86_PV_INFO
>
> 0x00000005: P2M
>
> 0x00000006 - 0xFFFFFFFF: Reserved
>
> body_length Length in octets of the record body.
>
> options Bit 0: 0 - checksum invalid, 1 = checksum valid.
>
> Bit 1-15: Reserved.
>
> checksum CRC-32 checksum of the record body (including any trailing
> padding), or 0x00000000 if the checksum field is invalid.
Apart from any discussion of the merits of per-record vs whole-file
checksums, it would be useful for this checksum to cover the header
too. E.g., by declaring it to be the checksum of header+data where
the checksum field is 0, or by declaring that it shall be that pattern
which causes the finished header+data to checksum to 0.
> VCPU_INFO
> ---------
>
> > [ This is a combination of parts of the extended-info and
> > XC_SAVE_ID_VCPU_INFO chunks. ]
>
> The VCPU_INFO record includes the maximum possible VCPU ID. This will
> be followed a VCPU_CONTEXT record for each online VCPU.
>
> 0 1 2 3 4 5 6 7 octet
> +-----------------------+------------------------+
> | max_vcpu_id | (reserved) |
> +-----------------------+------------------------+
>
> --------------------------------------------------------------------
> Field Description
> ----------- ---------------------------------------------------
> max_vcpu_id Maximum possible VCPU ID.
> --------------------------------------------------------------------
If this is all that's in this record, maybe it should be called
VCPU_COUNT?
> P2M
> ---
>
> [ This is a more flexible replacement for the old p2m_size field and
> p2m array. ]
>
> The P2M record contains a portion of the source domain's P2M.
> Multiple P2M records may be sent if the source P2M changes during the
> stream.
>
> 0 1 2 3 4 5 6 7 octet
> +-------------------------------------------------+
> | pfn_begin |
> +-------------------------------------------------+
> | pfn_end |
> +-------------------------------------------------+
> | mfn[0] |
> +-------------------------------------------------+
> ...
> +-------------------------------------------------+
> | mfn[N-1] |
> +-------------------------------------------------+
>
> --------------------------------------------------------------------
> Field Description
> ----------- --------------------------------------------------------
> pfn_begin The first PFN in this portion of the P2M
>
> pfn_end One past the last PFN in this portion of the P2M.
>
> mfn Array of (pfn_end - pfn-begin) MFNs corresponding to
> the set of PFNs in the range [pfn_begin, pfn_end).
> --------------------------------------------------------------------
The current save record doesn't contain the p2m itself, but rather the
p2m_frame_list, an array of the MFNs (in the save record, PFNs) that
hold the actual p2m. Frames in that list are used to populate the p2m
as memory is allocated on the receiving side.
I'm not sure what it would mean to allow the guest to change the
location of its p2m table (as distinct from the contents) on the fly
during a migration. We would at least have to re-send the contents of
any frames that are no longer in the p2m table, in case the receiver
has already overwritten them. And I think it should be fine to just
send the whole list every time (or else we need to manage deltas
carefully too).
Also, while I'm thinking about record names, this probably ought to be
called X86_PV_P2M or something like that.
Cheers,
Tim.
next prev parent reply other threads:[~2014-02-12 16:36 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-10 17:20 Domain Save Image Format proposal (draft B) David Vrabel
2014-02-10 17:32 ` Andrew Cooper
2014-02-10 17:48 ` Frediano Ziglio
2014-02-10 20:00 ` Shriram Rajagopalan
2014-02-11 1:35 ` Andrew Cooper
2014-02-11 4:12 ` Shriram Rajagopalan
2014-02-11 10:58 ` Zoltan Kiss
2014-02-12 15:34 ` Tim Deegan
2014-02-11 11:58 ` David Vrabel
2014-02-11 16:13 ` Shriram Rajagopalan
2014-02-11 9:30 ` Ian Campbell
2014-02-11 11:40 ` David Vrabel
2014-02-11 12:09 ` Ian Campbell
2014-02-11 13:10 ` Jan Beulich
2014-02-11 13:12 ` Ian Campbell
2014-02-12 15:41 ` Tim Deegan
2014-02-11 10:06 ` Jan Beulich
2014-02-11 13:04 ` David Vrabel
2014-02-11 13:20 ` Jan Beulich
2014-02-11 16:45 ` Ian Jackson
2014-02-11 17:08 ` Shriram Rajagopalan
2014-02-11 17:15 ` Ian Campbell
2014-02-11 17:30 ` Ian Jackson
2014-02-11 17:31 ` Frediano Ziglio
2014-02-11 17:53 ` Ian Jackson
2014-02-11 17:59 ` Ian Campbell
2014-02-12 9:07 ` Frediano Ziglio
2014-02-12 11:27 ` Frediano Ziglio
2014-02-12 11:34 ` Ian Campbell
2014-02-11 16:38 ` Ian Jackson
2014-02-11 17:04 ` Andrew Cooper
2014-02-11 17:07 ` Ian Jackson
2014-02-11 16:49 ` Ian Jackson
2014-02-11 17:10 ` David Vrabel
2014-02-11 17:28 ` Ian Jackson
2014-02-12 16:36 ` Tim Deegan [this message]
2014-02-12 17:09 ` David Vrabel
2014-02-12 18:16 ` Frediano Ziglio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140212163625.GE91459@deinos.phlegethon.org \
--to=tim@xen.org \
--cc=Ian.Jackson@eu.citrix.com \
--cc=Xen-devel@lists.xen.org \
--cc=david.vrabel@citrix.com \
--cc=ian.campbell@citrix.com \
--cc=rshriram@cs.ubc.ca \
--cc=stefano.stabellini@eu.citrix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.