From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tim Deegan
Subject: Re: Domain Save Image Format proposal (draft B)
Date: Wed, 12 Feb 2014 17:36:25 +0100
Message-ID: <20140212163625.GE91459@deinos.phlegethon.org>
References: <52F90A71.40802@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <52F90A71.40802@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel
Cc: Shriram Rajagopalan, Stefano Stabellini, Ian Jackson, Ian Campbell,
 "Xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

Hi,

This draft has my wholehearted support.  Even without addressing any of
the points under discussion something along these lines would be a
vast improvement on the current format.

I have two general questions:

- The existing save-format definition is spread across a number of
  places: libxc for hypervisor state, qemu for DM state, and the main
  toolstack (libxl/xend/xapi/&c) for other config runes and a general
  wrapper.  This is clearly a reworking of the libxc parts -- do you
  think there's anything currently defined elsewhere that belongs in
  this spec?

- Have you given any thought to making this into a wire protocol
  rather than just a file format?  Would there be any benefit to
  having records individually acked by the receiver in a live
  migration, or having the receiver send instructions about
  compatibility?  Or is that again left to the toolstack to manage?

and a few nits:

At 17:20 +0000 on 10 Feb (1392049249), David Vrabel wrote:
> Records
> =======
>
> A record has a record header, type specific data and a trailing
> footer.  If body_length is not a multiple of 8, the body is padded
> with zeroes to align the checksum field on an 8 octet boundary.
>
>   0     1     2     3     4     5     6     7 octet
> +-----------------------+-------------------------+
> | type                  | body_length             |
> +-----------+-----------+-------------------------+
> | options   | (reserved)                          |
> +-----------+-------------------------------------+
> ...
> Record body of length body_length octets followed by
> 0 to 7 octets of padding.
> ...
> +-----------------------+-------------------------+
> | checksum              | (reserved)              |
> +-----------------------+-------------------------+
>
> --------------------------------------------------------------------
> Field       Description
> ----------- -------------------------------------------------------
> type        0x00000000: END
>
>             0x00000001: PAGE_DATA
>
>             0x00000002: VCPU_INFO
>
>             0x00000003: VCPU_CONTEXT
>
>             0x00000004: X86_PV_INFO
>
>             0x00000005: P2M
>
>             0x00000006 - 0xFFFFFFFF: Reserved
>
> body_length Length in octets of the record body.
>
> options     Bit 0: 0 - checksum invalid, 1 = checksum valid.
>
>             Bit 1-15: Reserved.
>
> checksum    CRC-32 checksum of the record body (including any trailing
>             padding), or 0x00000000 if the checksum field is invalid.

Apart from any discussion of the merits of per-record vs whole-file
checksums, it would be useful for this checksum to cover the header
too.  E.g., by declaring it to be the checksum of header+data where the
checksum field is 0, or by declaring that it shall be that pattern
which causes the finished header+data to checksum to 0.

> VCPU_INFO
> ---------
>
> > [ This is a combination of parts of the extended-info and
> > XC_SAVE_ID_VCPU_INFO chunks. ]
>
> The VCPU_INFO record includes the maximum possible VCPU ID.  This will
> be followed a VCPU_CONTEXT record for each online VCPU.
>
>   0     1     2     3     4     5     6     7 octet
> +-----------------------+------------------------+
> | max_vcpu_id           | (reserved)             |
> +-----------------------+------------------------+
>
> --------------------------------------------------------------------
> Field       Description
> ----------- ---------------------------------------------------
> max_vcpu_id Maximum possible VCPU ID.
> --------------------------------------------------------------------

If this is all that's in this record, maybe it should be called
VCPU_COUNT?

> P2M
> ---
>
> [ This is a more flexible replacement for the old p2m_size field and
> p2m array. ]
>
> The P2M record contains a portion of the source domain's P2M.
> Multiple P2M records may be sent if the source P2M changes during the
> stream.
>
>   0     1     2     3     4     5     6     7 octet
> +-------------------------------------------------+
> | pfn_begin                                       |
> +-------------------------------------------------+
> | pfn_end                                         |
> +-------------------------------------------------+
> | mfn[0]                                          |
> +-------------------------------------------------+
> ...
> +-------------------------------------------------+
> | mfn[N-1]                                        |
> +-------------------------------------------------+
>
> --------------------------------------------------------------------
> Field       Description
> ----------- --------------------------------------------------------
> pfn_begin   The first PFN in this portion of the P2M
>
> pfn_end     One past the last PFN in this portion of the P2M.
>
> mfn         Array of (pfn_end - pfn-begin) MFNs corresponding to
>             the set of PFNs in the range [pfn_begin, pfn_end).
> --------------------------------------------------------------------

The current save record doesn't contain the p2m itself, but rather the
p2m_frame_list, an array of the MFNs (in the save record, PFNs) that
hold the actual p2m.  Frames in that list are used to populate the p2m
as memory is allocated on the receiving side.

I'm not sure what it would mean to allow the guest to change the
location of its p2m table (as distinct from the contents) on the fly
during a migration.  We would at least have to re-send the contents of
any frames that are no longer in the p2m table, in case the receiver
has already overwritten them.  And I think it should be fine to just
send the whole list every time (or else we need to manage deltas
carefully too).

Also, while I'm thinking about record names, this probably ought to be
called X86_PV_P2M or something like that.

Cheers,

Tim.
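
For illustration only, here is a minimal sketch of the first checksum
option mentioned above: the checksum covers the whole record (header,
padded body and footer) and is computed with the checksum field set
to 0.  Several things here are assumptions and not taken from the draft:
little-endian field encoding, the CRC-32 variant implemented by zlib, a
16-bit options field followed by 6 reserved octets, and the helper names
pad8, build_record and verify_record.

```python
import struct
import zlib

REC_PAGE_DATA = 0x00000001      # record type value from the quoted table
OPT_CHECKSUM_VALID = 0x0001     # options bit 0: checksum valid

HEADER_FMT = "<IIH6x"           # type, body_length, options, 6 reserved octets (assumed layout)
FOOTER_FMT = "<I4x"             # checksum, 4 reserved octets
FOOTER_LEN = struct.calcsize(FOOTER_FMT)


def pad8(body: bytes) -> bytes:
    """Pad the body with zeroes up to a multiple of 8 octets."""
    return body + b"\x00" * (-len(body) % 8)


def build_record(rec_type: int, body: bytes) -> bytes:
    """Build a record whose checksum covers header + padded body + footer,
    computed while the checksum field itself holds 0."""
    header = struct.pack(HEADER_FMT, rec_type, len(body), OPT_CHECKSUM_VALID)
    padded = pad8(body)
    zero_footer = struct.pack(FOOTER_FMT, 0)
    crc = zlib.crc32(header + padded + zero_footer) & 0xFFFFFFFF
    return header + padded + struct.pack(FOOTER_FMT, crc)


def verify_record(record: bytes) -> bool:
    """Recompute the CRC with the checksum field zeroed and compare."""
    if len(record) < struct.calcsize(HEADER_FMT) + FOOTER_LEN or len(record) % 8:
        return False
    (stored,) = struct.unpack_from("<I", record, len(record) - FOOTER_LEN)
    zeroed = record[:-FOOTER_LEN] + struct.pack("<I", 0) + record[-4:]
    return (zlib.crc32(zeroed) & 0xFFFFFFFF) == stored


if __name__ == "__main__":
    rec = build_record(REC_PAGE_DATA, b"hello")
    print(len(rec), verify_record(rec))
```

With the header covered, a corrupted type or body_length field is caught
by the same check that protects the body.  The second option (choosing
the field so that the finished record checksums to 0) is not shown here.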