From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Vrabel <david.vrabel@citrix.com>
Subject: Re: Domain Save Image Format proposal (draft B)
Date: Tue, 11 Feb 2014 13:04:08 +0000
Message-ID: <52FA1FC8.7010104@citrix.com>
References: <52F90A71.40802@citrix.com>
	<52FA043B020000780011B10C@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <52FA043B020000780011B10C@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>, "Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>, Ian Jackson <Ian.Jackson@eu.citrix.com>, Ian Campbell <ian.campbell@citrix.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>
List-Id: xen-devel@lists.xenproject.org

On 11/02/14 10:06, Jan Beulich wrote:
>>>> On 10.02.14 at 18:20, David Vrabel <david.vrabel@citrix.com> wrote:
>> Fields
>> ------
>>
>> All the fields within the headers and records have a fixed width.
>>
>> Fields are always aligned to their size.
>>
>> Padding and reserved fields are set to zero on save and must be
>> ignored during restore.
> 
> Meaning it would be impossible to assign a meaning to these fields
> later. I'd rather mandate that the restore side has to check these
> fields are zero, and bail if they aren't.

Reserved fields/bits can still be given a meaning later, but doing so
would require a new record type and a bump of the format version.

I was aiming to minimize the number of ways the format can be extended.
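
To make the two restore policies under discussion concrete, here is a
minimal sketch. It is not taken from the draft: the 8-octet layout (a
4-octet value followed by 4 reserved octets), the big-endian packing,
and all function names are hypothetical and chosen only for
illustration.

```python
import struct

# Hypothetical 8-octet record body: a 4-octet value followed by
# 4 reserved octets. Big-endian is assumed purely for illustration.
LAYOUT = struct.Struct(">II")


def save(value):
    # Reserved octets are always written as zero on save.
    return LAYOUT.pack(value, 0)


def restore_ignoring(buf):
    # Policy in the draft: reserved octets are ignored on restore.
    value, _reserved = LAYOUT.unpack(buf)
    return value


def restore_checking(buf):
    # Policy suggested in review: bail if reserved octets are non-zero.
    value, reserved = LAYOUT.unpack(buf)
    if reserved != 0:
        raise ValueError("reserved field is not zero")
    return value
```

With the first policy an older restorer silently accepts a stream whose
reserved octets carry data, which is why giving them a meaning would
need a version bump; with the second it fails early instead.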

>> Integer (numeric) fields in the image header are always in big-endian
>> byte order.
> 
> Why would big endian be preferable when both currently
> supported architectures use little endian?

Mostly to encourage tools to pay attention to byte order rather than
assuming native and getting away with it...
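
As a rough illustration of the failure mode, the sketch below decodes
the same four octets with an explicit big-endian format and with a
little-endian one (standing in for "native" on x86 and ARM). The 32-bit
field is hypothetical and is not one of the draft's header fields.

```python
import struct

# Four octets as they would appear in the stream for the value 2,
# assuming a hypothetical big-endian 32-bit field.
raw = bytes([0x00, 0x00, 0x00, 0x02])

# Byte order stated explicitly: the result is the same on every host.
explicit_be = struct.unpack(">I", raw)[0]

# What a little-endian host would read if the tool assumed native order.
assumed_le = struct.unpack("<I", raw)[0]

print(explicit_be)      # 2
print(hex(assumed_le))  # 0x2000000
```

A tool that assumes native order on a little-endian host gets an
obviously wrong value straight away, rather than working by accident.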

>> Domain Header
>> -------------
>>
>> The domain header includes general properties of the domain.
>>
>>      0      1     2     3     4     5     6     7 octet
>>     +-----------+-----------+-----------+-------------+
>>     | arch      | type      | page_shift| (reserved)  |
>>     +-----------+-----------+-----------+-------------+
>>
>> --------------------------------------------------------------------
>> Field       Description
>> ----------- --------------------------------------------------------
>> arch        0x0000: Reserved.
>>
>>             0x0001: x86.
>>
>>             0x0002: ARM.
>>
>> type        0x0000: Reserved.
>>
>>             0x0001: x86 PV.
>>
>>             0x0002 - 0xFFFF: Reserved.
> 
> So how would ARM, x86 HVM, and x86 PVH be expressed?

Something like:

  0x0001: x86 PV.
  0x0002: x86 HVM.
  0x0003: x86 PVH.
  0x0004: ARM.

Which does make the arch field a bit redundant, I suppose.
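
The redundancy can be seen in a small sketch: once the type values are
assigned as in the tentative list above, the architecture can be looked
up from the type alone. The enum and helper names are hypothetical, and
the numbering is only the "something like" proposal, not a final
assignment.

```python
from enum import IntEnum


class GuestType(IntEnum):
    # Tentative values from the list above; not a final assignment.
    X86_PV = 0x0001
    X86_HVM = 0x0002
    X86_PVH = 0x0003
    ARM = 0x0004


# The architecture follows from the type, so a separate arch field
# carries no extra information under this numbering.
ARCH_OF_TYPE = {
    GuestType.X86_PV: "x86",
    GuestType.X86_HVM: "x86",
    GuestType.X86_PVH: "x86",
    GuestType.ARM: "ARM",
}


def arch_for(type_code):
    # GuestType() raises ValueError for reserved or unknown values.
    return ARCH_OF_TYPE[GuestType(type_code)]
```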

>> P2M
>> ---
>>
>> [ This is a more flexible replacement for the old p2m_size field and
>> p2m array. ]
>>
>> The P2M record contains a portion of the source domain's P2M.
>> Multiple P2M records may be sent if the source P2M changes during the
>> stream.
>>
>>      0     1     2     3     4     5     6     7 octet
>>     +-------------------------------------------------+
>>     | pfn_begin                                       |
>>     +-------------------------------------------------+
>>     | pfn_end                                         |
>>     +-------------------------------------------------+
>>     | mfn[0]                                          |
>>     +-------------------------------------------------+
>>     ...
>>     +-------------------------------------------------+
>>     | mfn[N-1]                                        |
>>     +-------------------------------------------------+
>>
>> --------------------------------------------------------------------
>> Field       Description
>> ----------- --------------------------------------------------------
>> pfn_begin   The first PFN in this portion of the P2M
>>
>> pfn_end     One past the last PFN in this portion of the P2M.
> 
> I'd favor an inclusive range here, such that if we ever reach a
> fully populatable 64-bit PFN space (on some future architecture)
> there'd still be no issue with special casing the then unavoidable
> wraparound.

Ok, but a 64-bit PFN space would suggest 76 bits of address space (64
PFN bits plus a 12-bit page offset), which seems somewhat far off.  Is
that something we want to consider now?
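
For reference, the arithmetic and the wraparound being discussed can be
sketched as follows. The page shift of 12 (4 KiB pages) is an assumption
made for the calculation; the masking emulates a 64-bit field, since
Python integers do not overflow on their own.

```python
PAGE_SHIFT = 12            # assumption: 4 KiB pages
PFN_BITS = 64
U64_MASK = (1 << 64) - 1   # emulates a 64-bit unsigned field

# A full 64-bit PFN space implies this many bits of address space.
address_bits = PFN_BITS + PAGE_SHIFT

# Highest PFN in a fully populated 64-bit PFN space.
last_pfn = U64_MASK

# Exclusive end ("one past the last PFN") wraps to zero in 64 bits.
exclusive_end = (last_pfn + 1) & U64_MASK

# Inclusive end is the last PFN itself and stays representable.
inclusive_end = last_pfn

print(address_bits)        # 76
print(exclusive_end)       # 0
print(hex(inclusive_end))  # 0xffffffffffffffff
```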

>> Legacy Images (x86 only)
>> ========================
>>
>> Restoring legacy images from older tools shall be handled by
>> translating the legacy format image into this new format.
>>
>> It shall not be possible to save in the legacy format.
>>
>> There are two different legacy images depending on whether they were
>> generated by a 32-bit or a 64-bit toolstack. These shall be
>> distinguished by inspecting octets 4-7 in the image.  If these are
>> zero then it is a 64-bit image.
>>
>> Toolstack  Field                            Value
>> ---------  -----                            -----
>> 64-bit     Bit 31-63 of the p2m_size field  0 (since p2m_size < 2^32^)
> 
> Afaics this is being determined via xc_domain_maximum_gpfn(),
> which I don't think guarantees the result to be limited to 2^32.
> Or in fact the libxc interface wrongly limits the value (by
> truncating the "long" returned from the hypercall to an "int"). So
> in practice consistent images would have the field limited to 2^31
> on 64-bit tool stacks (since for larger values the negative function
> return value would get converted by sign-extension, but all sorts
> of other trouble would result due to the now huge p2m_size).

For the handling of legacy images I think we only need to consider
images that could have been practically generated by older tools.
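
Under that assumption the check described in the draft reduces to the
sketch below. The function name and the sample octets are hypothetical;
only the rule "octets 4-7 are zero means a 64-bit image" comes from the
draft. Comparing against four zero octets does not depend on byte order.

```python
def is_64bit_legacy_image(image):
    """Return True if octets 4-7 of a legacy image are all zero."""
    if len(image) < 8:
        raise ValueError("need at least 8 octets to classify the image")
    return image[4:8] == b"\x00\x00\x00\x00"


# Hypothetical leading octets, for illustration only.
from_64bit_tools = b"\x00\x00\x10\x00" + b"\x00\x00\x00\x00"
from_32bit_tools = b"\x00\x00\x10\x00" + b"\xff\xff\xff\xfe"

print(is_64bit_legacy_image(from_64bit_tools))  # True
print(is_64bit_legacy_image(from_32bit_tools))  # False
```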

>> Future Extensions
>> =================
>>
>> All changes to this format require the image version to be increased.
> 
> Oh, okay, this partly deals with the first question above. Question
> is whether that's a useful requirement, i.e. whether that wouldn't
> lead to an inflation of versions needing conversion (for a tool stack
> that wants to support more than just migration from N-1).

Only legacy images would be converted to the newest format.  I would
expect version V-1 images to be handled by (mostly) the same code as
V images, particularly if V is V-1 with extra record types.
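
One way this could look, as a sketch only: a per-version table of record
handlers where version V reuses every handler from V-1 and adds a few.
The version numbers, record type values, and handler names are all
hypothetical and are not defined by the draft.

```python
def handle_p2m(body):
    # Placeholder handler for a record type present in both versions.
    return ("p2m", len(body))


def handle_extra(body):
    # Placeholder handler for a record type added in the newer version.
    return ("extra", len(body))


# Hypothetical record type numbers.
V1_HANDLERS = {0x0001: handle_p2m}
V2_HANDLERS = {**V1_HANDLERS, 0x0002: handle_extra}

HANDLERS_BY_VERSION = {1: V1_HANDLERS, 2: V2_HANDLERS}


def process_record(version, rec_type, body):
    handlers = HANDLERS_BY_VERSION.get(version)
    if handlers is None:
        raise ValueError(f"unsupported image version {version}")
    handler = handlers.get(rec_type)
    if handler is None:
        raise ValueError(
            f"record type {rec_type:#06x} not valid in version {version}")
    return handler(body)
```

No conversion step is involved here: the older image goes through the
same dispatch path and simply never presents the newer record types.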

David