All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ian Campbell <Ian.Campbell@citrix.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: Domain Save Image Format proposal (draft B)
Date: Tue, 11 Feb 2014 09:30:40 +0000	[thread overview]
Message-ID: <1392111040.26657.50.camel@kazak.uk.xensource.com> (raw)
In-Reply-To: <52F90A71.40802@citrix.com>

On Mon, 2014-02-10 at 17:20 +0000, David Vrabel wrote:
> It
> does not currently cover all use cases (e.g., images for HVM guest are
> not considered).

Other currently missing cases:
      * live migration
      * remus
      * page compression
      * ...?

I think it is useful to enumerate them, and if necessary to note which
ones you aren't planning to consider in the immediate future.

> Headers
> =======
> 
> Image Header
> ------------
> 
> The image header identifies an image as a Xen domain save image.  It
> includes the version of this specification that the image complies
> with.
> 
> Tools supporting version _V_ of the specification shall always save
> images using version _V_.  Tools shall support restoring from version
> _V_ and version _V_ - 1.

This isn't quite right since it is in terms of image format version and
not Xen version (unless the two are to be linked somehow). The Xen
project supports migration from version X-1 to version X (where X is the
Xen version). It's not inconceivable that the image format version
wouldn't change over Xen releases.

This might become a more obvious issue once you add HVM to the spec and
introduce the necessary Xen HVM save state block.

> Tools may additionally support restoring from earlier versions.

> The marker field can be used to distinguish between legacy images and
> those corresponding to this specification.  Legacy images will have at
> one or more zero bits within the first 8 octets of the image.
> 
> Fields within the image header are always in _big-endian_ byte order,
> regardless of the setting of the endianness bit.
> 
>      0     1     2     3     4     5     6     7 octet
>     +-------------------------------------------------+
>     | marker                                          |
>     +-----------------------+-------------------------+
>     | id                    | version                 |
>     +-----------+-----------+-------------------------+
>     | options   |                                     |
>     +-----------+-------------------------------------+
> 
> 
> --------------------------------------------------------------------
> Field       Description
> ----------- --------------------------------------------------------
> marker      0xFFFFFFFFFFFFFFFF.
> 
> id          0x58454E46 ("XENF" in ASCII).
> 
> version     0x00000001.  The version of this specification.
> 
> options     bit 0: Endianness.  0 = little-endian, 1 = big-endian.

Couldn't we just specify that things are in a specific endianness
related to the header's arch field?

I appreciate the desire to make the format endian neutral and explicit
about which is in use but (apart from the header) why would you ever
want to go non-native endian for a given arch?

>             bit 1-15: Reserved.
> --------------------------------------------------------------------
> 
> Domain Header
> -------------
> 
> The domain header includes general properties of the domain.
> 
>      0      1     2     3     4     5     6     7 octet
>     +-----------+-----------+-----------+-------------+
>     | arch      | type      | page_shift| (reserved)  |
>     +-----------+-----------+-----------+-------------+
> 
> --------------------------------------------------------------------
> Field       Description
> ----------- --------------------------------------------------------
> arch        0x0000: Reserved.
> 
>             0x0001: x86.
> 
>             0x0002: ARM.
> 
> type        0x0000: Reserved.
> 
>             0x0001: x86 PV.
> 
>             0x0002 - 0xFFFF: Reserved.

Is the type field per-arch? i.e. if arch=0x0002 can we use type = 0x0001
for ARM domains?

> page_shift  Size of a guest page as a power of two.
> 
>             i.e., page size = 2^page_shift^.

This doesn't really need 16 bits, no harm though I suppose.

> --------------------------------------------------------------------
> 
> 
> Records
> =======
> 
> A record has a record header, type specific data and a trailing
> footer.  If body_length is not a multiple of 8, the body is padded
> with zeroes to align the checksum field on an 8 octet boundary.
> 
>      0     1     2     3     4     5     6     7 octet
>     +-----------------------+-------------------------+
>     | type                  | body_length             |
>     +-----------+-----------+-------------------------+
>     | options   | (reserved)                          |
>     +-----------+-------------------------------------+
>     ...
>     Record body of length body_length octets followed by
>     0 to 7 octets of padding.
>     ...
>     +-----------------------+-------------------------+
>     | checksum              | (reserved)              |
>     +-----------------------+-------------------------+
> 
> --------------------------------------------------------------------
> Field        Description
> -----------  -------------------------------------------------------
> type         0x00000000: END
> 
>              0x00000001: PAGE_DATA
> 
>              0x00000002: VCPU_INFO
> 
>              0x00000003: VCPU_CONTEXT
> 
>              0x00000004: X86_PV_INFO
> 
>              0x00000005: P2M
> 
>              0x00000006 - 0xFFFFFFFF: Reserved
> 
> body_length  Length in octets of the record body.
> 
> options      Bit 0: 0 - checksum invalid, 1 = checksum valid.
> 
>              Bit 1-15: Reserved.
> 
> checksum     CRC-32 checksum of the record body (including any trailing
>              padding), or 0x00000000 if the checksum field is invalid.

Which CRC-32 :-P

(wikipedia claims there are 6 variants using various polynomials...)

> P2M
> ---
> 
> [ This is a more flexible replacement for the old p2m_size field and
> p2m array. ]


What is the latter for again?

I don't know if it really fits into this spec, but it might be useful to
include the rationale for the inclusion of each record type.

> PAGE_DATA
> ---------
> 
> The bulk of an image consists of many PAGE_DATA records containing the
> memory contents.
> 
>      0     1     2     3     4     5     6     7 octet
>     +-----------------------+-------------------------+
>     | count (C)             | (reserved)              |
>     +-----------------------+-------------------------+
>     | pfn[0]                                          |
>     +-------------------------------------------------+
>     ...
>     +-------------------------------------------------+
>     | pfn[C-1]                                        |
>     +-------------------------------------------------+
>     | page_data[0]...                                 |
>     ...
>     +-------------------------------------------------+
>     | page_data[N-1]...                               |
>     ...
>     +-------------------------------------------------+
> 
> --------------------------------------------------------------------
> Field       Description
> ----------- --------------------------------------------------------
> count       Number of pages described in this record.
> 
> pfn         An array of count PFNs. Bits 63-60 contain
>             the XEN\_DOMCTL\_PFINFO_* value for that PFN.

Now might be a good time to remove this intertwining? I suppose 60-bits
is a lot of pfn's, but if the VMs address space is sparse it isn't out
of the question.

It might be useful to be explicit (perhaps in a glossary) about what you
mean by "PFN" for various guest types -- some of the terminology can be
a bit confused in actual usage between the various guest types,
including in some of the existing interface definitions.

> Layout
> ======
> 
> The set of valid records depends on the guest architecture and type.
> 
> x86 PV Guest
> ------------
> 
> An x86 PV guest image will have in this order:
> 
> 1. Image header
> 2. Domain header
> 3. X86_PV_INFO record
> 4. At least one P2M record
> 5. At least one PAGE_DATA record
> 6. VCPU_INFO record
> 6. At least one VCPU_CONTEXT record
> 7. END record

It might be good to be more flexible in the ordering (or rather, less
prescriptive here), unless there is a reason for a strict ordering i.e.
dependencies between records.

For example it seems like the p2m cannot change after you start sending
page_data, which I don't think works for live migration (although I'm
not sure you've yet incorporated that as a feature).

Ian.

  parent reply	other threads:[~2014-02-11  9:30 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-10 17:20 Domain Save Image Format proposal (draft B) David Vrabel
2014-02-10 17:32 ` Andrew Cooper
2014-02-10 17:48 ` Frediano Ziglio
2014-02-10 20:00 ` Shriram Rajagopalan
2014-02-11  1:35   ` Andrew Cooper
2014-02-11  4:12     ` Shriram Rajagopalan
2014-02-11 10:58       ` Zoltan Kiss
2014-02-12 15:34       ` Tim Deegan
2014-02-11 11:58   ` David Vrabel
2014-02-11 16:13     ` Shriram Rajagopalan
2014-02-11  9:30 ` Ian Campbell [this message]
2014-02-11 11:40   ` David Vrabel
2014-02-11 12:09     ` Ian Campbell
2014-02-11 13:10       ` Jan Beulich
2014-02-11 13:12       ` Ian Campbell
2014-02-12 15:41   ` Tim Deegan
2014-02-11 10:06 ` Jan Beulich
2014-02-11 13:04   ` David Vrabel
2014-02-11 13:20     ` Jan Beulich
2014-02-11 16:45   ` Ian Jackson
2014-02-11 17:08     ` Shriram Rajagopalan
2014-02-11 17:15       ` Ian Campbell
2014-02-11 17:30         ` Ian Jackson
2014-02-11 17:31         ` Frediano Ziglio
2014-02-11 17:53           ` Ian Jackson
2014-02-11 17:59             ` Ian Campbell
2014-02-12  9:07               ` Frediano Ziglio
2014-02-12 11:27                 ` Frediano Ziglio
2014-02-12 11:34                   ` Ian Campbell
2014-02-11 16:38 ` Ian Jackson
2014-02-11 17:04   ` Andrew Cooper
2014-02-11 17:07     ` Ian Jackson
2014-02-11 16:49 ` Ian Jackson
2014-02-11 17:10   ` David Vrabel
2014-02-11 17:28     ` Ian Jackson
2014-02-12 16:36 ` Tim Deegan
2014-02-12 17:09   ` David Vrabel
2014-02-12 18:16     ` Frediano Ziglio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1392111040.26657.50.camel@kazak.uk.xensource.com \
    --to=ian.campbell@citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=Xen-devel@lists.xen.org \
    --cc=david.vrabel@citrix.com \
    --cc=rshriram@cs.ubc.ca \
    --cc=stefano.stabellini@eu.citrix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.