Re: Domain Save Image Format proposal (draft B)

From: Shriram Rajagopalan <rshriram@cs.ubc.ca>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: Domain Save Image Format proposal (draft B)
Date: Tue, 11 Feb 2014 10:13:10 -0600
Message-ID: <CAP8mzPMgH7rjm0OPwAmEiCKVJMeWXsCvYybHEb2FdnVD9VXXvA@mail.gmail.com>
In-Reply-To: <52FA1069.2040709@citrix.com>

On Tue, Feb 11, 2014 at 5:58 AM, David Vrabel <david.vrabel@citrix.com> wrote:

> On 10/02/14 20:00, Shriram Rajagopalan wrote:
> > On Mon, Feb 10, 2014 at 9:20 AM, David Vrabel
> > <david.vrabel@citrix.com> wrote:
> >
> >
> > It's tempting to adopt all the TCP-style madness for transferring a set of
> > structured data.  Why this endian-ness mess?  Am I missing something here?
> > I am assuming that a lion's share of Xen's deployment is on x86
> > (not including Amazon). So that leaves ARM.  Why not let these
> > processors take the hit of endian-ness conversion?
>
> I'm not sure I would characterize a spec being precise about byte
> ordering as "endianness mess".
>
> I think it would be a pretty poor specification if it didn't specify
> byte ordering -- we can't have the tools having to make assumptions
> about the ordering.
>
>
Totally agree. But as someone else put it (and you did as well), my point
was that it's sufficient to specify the byte order once, somewhere in the
image header, and to make sure (as you put it below) that the current use
cases don't have to go through needless endian conversion.
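
To make "specify it once" concrete, here is a rough Python sketch. The
layout is purely an assumption for illustration (a 4-byte ASCII marker
followed by 32-bit type/length pairs); it is not the header from the draft,
and the names MARKERS, write_image and read_image are made up.

```python
import struct

# Hypothetical layout, NOT the draft's actual header:
#   a 4-byte ASCII marker ("LE__" or "BE__"), then records that are each
#   a 32-bit type and a 32-bit length in the byte order the marker declares.
MARKERS = {b"LE__": "<", b"BE__": ">"}


def write_image(order, records):
    """Serialise (type, length) pairs in the writer's chosen byte order."""
    marker = {v: k for k, v in MARKERS.items()}[order]
    body = b"".join(struct.pack(order + "II", t, n) for t, n in records)
    return marker + body


def read_image(blob):
    """Read the marker once, then decode every record with that order."""
    order = MARKERS[blob[:4]]
    rec = struct.Struct(order + "II")
    return [rec.unpack_from(blob, off) for off in range(4, len(blob), rec.size)]
```

A writer on a little-endian host would pass "<" and never swap a byte; only
a reader whose native order differs from the marker pays for conversion.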

> However, I do think it can be specified in such a way that all the
> current use cases don't have to do any byte swapping (except for the
> minimal header).
>
> >         +-----------------------+-------------------------+
> >         | checksum              | (reserved)              |
> >         +-----------------------+-------------------------+
> >
> >
> > I am assuming that the checksum field is present only
> > for debugging purposes? Otherwise, I see no reason for the
> > computational overhead, given that we are already sending data
> > over a reliable channel + IIRC we already have an image-wide checksum
> > when saving the image to disk.
>
> I'm not aware of any image wide checksum.
>

Yep. I was mistaken.

> The checksum seems like a potentially useful feature but I don't have a
> requirement for it so if no one else thinks it is useful it can be removed.
>
>
My suggestion: when saving the image to disk, why not have a single
image-wide checksum to ensure that the image being restored from disk is
still valid?
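
A minimal sketch of what I mean, in Python. CRC-32 and the 4-byte trailer
are assumptions picked only to keep the example short; the draft does not
specify either, and the function names are hypothetical.

```python
import struct
import zlib


def save_with_checksum(image_bytes):
    """Append a CRC-32 of the whole image as a 4-byte big-endian trailer."""
    crc = zlib.crc32(image_bytes) & 0xFFFFFFFF
    return image_bytes + struct.pack(">I", crc)


def load_with_checksum(blob):
    """Return the image bytes, or raise ValueError if the trailer disagrees."""
    if len(blob) < 4:
        raise ValueError("image too short to hold a checksum")
    image_bytes, trailer = blob[:-4], blob[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(image_bytes) & 0xFFFFFFFF != expected:
        raise ValueError("image checksum mismatch")
    return image_bytes
```

The checksum is computed once per image at save time, so the per-record
overhead I was worried about for the migration path does not arise.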

> >     PAGE_DATA
> >     ---------
> [...]
> >     --------------------------------------------------------------------
> >     Field       Description
> >     ----------- --------------------------------------------------------
> >     count       Number of pages described in this record.
> >
> >     pfn         An array of count PFNs. Bits 63-60 contain
> >                 the XEN\_DOMCTL\_PFINFO_* value for that PFN.
> >
> >     page_data   page_size octets of uncompressed page contents for each
> >                 page set as present in the pfn array.
> >     --------------------------------------------------------------------
> >
> >
> > s/uncompressed/(compressed/uncompressed)/
> > (Remus sends compressed data)
>
> No.  I think compressed page data should have its own record type. The
> current scheme of mode flipping records seems crazy to me.
>
>
What record flipping? For page compression, Remus basically has a simple
XOR+RLE encoded sequence of bytes, preceded by a 4-byte length field.
This compressed chunk is sent instead of the usual 4K of per-page
page_data. The only extra code on the remote side is an additional "if"
block that uses xc_uncompress instead of memcpy to get the uncompressed
page.

It would not change the way the PAGE_DATA record would be transmitted.
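
For illustration only, the idea can be sketched in Python as below. This is
a toy XOR+RLE scheme with a 4-byte length prefix; the (run, byte) pair
encoding, the little-endian length and all function names are assumptions
and do not reproduce the actual Remus wire format or libxc code.

```python
import struct

PAGE_SIZE = 4096


def xor_rle_encode(prev, cur):
    """XOR cur against prev, then RLE the delta as (run, byte) pairs."""
    delta = bytes(a ^ b for a, b in zip(prev, cur))
    out = bytearray()
    i = 0
    while i < len(delta):
        run = 1
        # Extend the run while the byte repeats; cap it so it fits in one byte.
        while i + run < len(delta) and run < 255 and delta[i + run] == delta[i]:
            run += 1
        out += bytes((run, delta[i]))
        i += run
    # 4-byte length field, then the encoded chunk.
    return struct.pack("<I", len(out)) + bytes(out)


def xor_rle_decode(prev, chunk):
    """Expand the (run, byte) pairs and XOR the delta back onto prev."""
    (length,) = struct.unpack_from("<I", chunk)
    body = chunk[4:4 + length]
    delta = bytearray()
    for k in range(0, length, 2):
        delta += bytes([body[k + 1]]) * body[k]
    return bytes(a ^ b for a, b in zip(prev, delta))


def receive_page(prev, payload, compressed):
    """The extra 'if' on the receiving side: decode, or do a plain copy."""
    if compressed:
        return xor_rle_decode(prev, payload)
    return bytes(payload)
```

Note that both ends need the previous version of the page, which is why a
scheme like this only pays off when pages are resent repeatedly.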

Though, one potentially cooler addition could be to use the option field
of the record header to indicate whether the data is compressed or not.
Given that we have 64 bits, we could even go as far as specifying the type
of compression module used (e.g., none, remus, gzip, etc.). This might be
really helpful when one wants to save/restore large images (an 8GB VM, for
example) to/from disk. Is this better or worse than simply gzipping the
entire saved image? I don't know yet.
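
Roughly, in Python, and with every detail an assumption: the 32-bit type,
32-bit length and 64-bit options layout, the compression IDs, and the use of
the low byte of the options word are all invented for this sketch. It uses
zlib (DEFLATE, the algorithm behind gzip) as a stand-in for "gzip".

```python
import struct
import zlib

# Hypothetical compression IDs carried in the low byte of a 64-bit options word.
COMP_NONE, COMP_ZLIB = 0, 1

# Assumed record header: type, body length, options (little-endian, no padding).
HEADER = struct.Struct("<IIQ")


def pack_record(rec_type, body, comp=COMP_NONE):
    """Build header + body, compressing the body if the ID asks for it."""
    if comp == COMP_ZLIB:
        body = zlib.compress(body)
    elif comp != COMP_NONE:
        raise ValueError("unknown compression id")
    return HEADER.pack(rec_type, len(body), comp) + body


def unpack_record(blob):
    """Parse one record and undo whatever compression the options declare."""
    rec_type, length, options = HEADER.unpack_from(blob)
    body = blob[HEADER.size:HEADER.size + length]
    comp = options & 0xFF
    if comp == COMP_ZLIB:
        body = zlib.decompress(body)
    elif comp != COMP_NONE:
        raise ValueError("unknown compression id")
    return rec_type, body
```

The record framing stays identical either way; a receiver that does not
know a given compression ID can reject the record instead of misparsing it.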

However, for live migration, this would be pretty helpful (especially when
migrating over high-latency networks). Remus' compression technique cannot
be used for live migration, as it requires a previous version of each page
for XOR+RLE compression. However, gzip and other such compression
algorithms would be pretty handy in the live migration case, over a WAN or
even a clogged LAN, where there are tons of VMs being moved back and forth.

Feel free to shoot down this idea if it seems unfeasible.

 Thanks
shriram
