All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tim Deegan <tim@xen.org>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: Domain Save Image Format proposal (draft B)
Date: Wed, 12 Feb 2014 17:36:25 +0100	[thread overview]
Message-ID: <20140212163625.GE91459@deinos.phlegethon.org> (raw)
In-Reply-To: <52F90A71.40802@citrix.com>

Hi,

This draft has my wholehearted support.  Even without addressing any
of the points under discussion something along these lines would be a
vast improvement on the current format.

I have two general questions:

 - The existing save-format definition is spread across a number of
   places: libxc for hypervisor state, qemu for DM state, and the main
   toolstack (libxl/xend/xapi/&c) for other config runes and a general
   wrapper.  This is clearly a reworking of the libxc parts -- do
   you think there's anything currently defined elsewhere that belongs
   in this spec?

 - Have you given any thought to making this into a wire protocol
   rather than just a file format?  Would there be any benefit to
   having records individually acked by the receiver in a live
   migration, or having the receiver send instructions about
   compatibility?  Or is that again left to the toolstack to manage?

and a few nits:

At 17:20 +0000 on 10 Feb (1392049249), David Vrabel wrote:
> Records
> =======
> 
> A record has a record header, type specific data and a trailing
> footer.  If body_length is not a multiple of 8, the body is padded
> with zeroes to align the checksum field on an 8 octet boundary.
> 
>      0     1     2     3     4     5     6     7 octet
>     +-----------------------+-------------------------+
>     | type                  | body_length             |
>     +-----------+-----------+-------------------------+
>     | options   | (reserved)                          |
>     +-----------+-------------------------------------+
>     ...
>     Record body of length body_length octets followed by
>     0 to 7 octets of padding.
>     ...
>     +-----------------------+-------------------------+
>     | checksum              | (reserved)              |
>     +-----------------------+-------------------------+
> 
> --------------------------------------------------------------------
> Field        Description
> -----------  -------------------------------------------------------
> type         0x00000000: END
> 
>              0x00000001: PAGE_DATA
> 
>              0x00000002: VCPU_INFO
> 
>              0x00000003: VCPU_CONTEXT
> 
>              0x00000004: X86_PV_INFO
> 
>              0x00000005: P2M
> 
>              0x00000006 - 0xFFFFFFFF: Reserved
> 
> body_length  Length in octets of the record body.
> 
> options      Bit 0: 0 - checksum invalid, 1 = checksum valid.
> 
>              Bit 1-15: Reserved.
> 
> checksum     CRC-32 checksum of the record body (including any trailing
>              padding), or 0x00000000 if the checksum field is invalid.

Apart from any discussion of the merits of per-record vs whole-file
checksums, it would be useful for this checksum to cover the header
too.  E.g., by declaring it to be the checksum of header+data where
the checksum field is 0, or by declaring that it shall be that pattern
which causes the finished header+data to checksum to 0.

> VCPU_INFO
> ---------
> 
> > [ This is a combination of parts of the extended-info and
> > XC_SAVE_ID_VCPU_INFO chunks. ]
> 
> The VCPU_INFO record includes the maximum possible VCPU ID.  This will
> be followed a VCPU_CONTEXT record for each online VCPU.
> 
>      0     1     2     3     4     5     6     7 octet
>     +-----------------------+------------------------+
>     | max_vcpu_id           | (reserved)             |
>     +-----------------------+------------------------+
> 
> --------------------------------------------------------------------
> Field        Description
> -----------  ---------------------------------------------------
> max_vcpu_id  Maximum possible VCPU ID.
> --------------------------------------------------------------------

If this is all that's in this record, maybe it should be called
VCPU_COUNT?

> P2M
> ---
> 
> [ This is a more flexible replacement for the old p2m_size field and
> p2m array. ]
> 
> The P2M record contains a portion of the source domain's P2M.
> Multiple P2M records may be sent if the source P2M changes during the
> stream.
> 
>      0     1     2     3     4     5     6     7 octet
>     +-------------------------------------------------+
>     | pfn_begin                                       |
>     +-------------------------------------------------+
>     | pfn_end                                         |
>     +-------------------------------------------------+
>     | mfn[0]                                          |
>     +-------------------------------------------------+
>     ...
>     +-------------------------------------------------+
>     | mfn[N-1]                                        |
>     +-------------------------------------------------+
> 
> --------------------------------------------------------------------
> Field       Description
> ----------- --------------------------------------------------------
> pfn_begin   The first PFN in this portion of the P2M
> 
> pfn_end     One past the last PFN in this portion of the P2M.
> 
> mfn         Array of (pfn_end - pfn-begin) MFNs corresponding to
>             the set of PFNs in the range [pfn_begin, pfn_end).
> --------------------------------------------------------------------

The current save record doesn't contain the p2m itself, but rather the
p2m_frame_list, an array of the MFNs (in the save record, PFNs) that
hold the actual p2m.  Frames in that list are used to populate the p2m
as memory is allocated on the receiving side.

I'm not sure what it would mean to allow the guest to change the
location of its p2m table (as distinct from the contents) on the fly
during a migration.  We would at least have to re-send the contents of
any frames that are no longer in the p2m table, in case the receiver
has already overwritten them.  And I think it should be fine to just
send the whole list every time (or else we need to manage deltas
carefully too).

Also, while I'm thinking about record names, this probably ought to be
called X86_PV_P2M or something like that.

Cheers,

Tim.

  parent reply	other threads:[~2014-02-12 16:36 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-10 17:20 Domain Save Image Format proposal (draft B) David Vrabel
2014-02-10 17:32 ` Andrew Cooper
2014-02-10 17:48 ` Frediano Ziglio
2014-02-10 20:00 ` Shriram Rajagopalan
2014-02-11  1:35   ` Andrew Cooper
2014-02-11  4:12     ` Shriram Rajagopalan
2014-02-11 10:58       ` Zoltan Kiss
2014-02-12 15:34       ` Tim Deegan
2014-02-11 11:58   ` David Vrabel
2014-02-11 16:13     ` Shriram Rajagopalan
2014-02-11  9:30 ` Ian Campbell
2014-02-11 11:40   ` David Vrabel
2014-02-11 12:09     ` Ian Campbell
2014-02-11 13:10       ` Jan Beulich
2014-02-11 13:12       ` Ian Campbell
2014-02-12 15:41   ` Tim Deegan
2014-02-11 10:06 ` Jan Beulich
2014-02-11 13:04   ` David Vrabel
2014-02-11 13:20     ` Jan Beulich
2014-02-11 16:45   ` Ian Jackson
2014-02-11 17:08     ` Shriram Rajagopalan
2014-02-11 17:15       ` Ian Campbell
2014-02-11 17:30         ` Ian Jackson
2014-02-11 17:31         ` Frediano Ziglio
2014-02-11 17:53           ` Ian Jackson
2014-02-11 17:59             ` Ian Campbell
2014-02-12  9:07               ` Frediano Ziglio
2014-02-12 11:27                 ` Frediano Ziglio
2014-02-12 11:34                   ` Ian Campbell
2014-02-11 16:38 ` Ian Jackson
2014-02-11 17:04   ` Andrew Cooper
2014-02-11 17:07     ` Ian Jackson
2014-02-11 16:49 ` Ian Jackson
2014-02-11 17:10   ` David Vrabel
2014-02-11 17:28     ` Ian Jackson
2014-02-12 16:36 ` Tim Deegan [this message]
2014-02-12 17:09   ` David Vrabel
2014-02-12 18:16     ` Frediano Ziglio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140212163625.GE91459@deinos.phlegethon.org \
    --to=tim@xen.org \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=Xen-devel@lists.xen.org \
    --cc=david.vrabel@citrix.com \
    --cc=ian.campbell@citrix.com \
    --cc=rshriram@cs.ubc.ca \
    --cc=stefano.stabellini@eu.citrix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.