From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:37356)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <aliguori@us.ibm.com>) id 1RAhzh-0006HI-KQ
	for qemu-devel@nongnu.org; Mon, 03 Oct 2011 08:51:46 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <aliguori@us.ibm.com>) id 1RAhzf-00014p-Qy
	for qemu-devel@nongnu.org; Mon, 03 Oct 2011 08:51:41 -0400
Received: from e3.ny.us.ibm.com ([32.97.182.143]:54641)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <aliguori@us.ibm.com>) id 1RAhzf-00014M-Nl
	for qemu-devel@nongnu.org; Mon, 03 Oct 2011 08:51:39 -0400
Received: from /spool/local by us.ibm.com with XMail ESMTP
	for <qemu-devel@nongnu.org> from <aliguori@us.ibm.com>;
	Mon, 3 Oct 2011 08:51:28 -0400
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167])
	by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id
	p93CpHDH049214
	for <qemu-devel@nongnu.org>; Mon, 3 Oct 2011 08:51:23 -0400
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1])
	by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP
	id p93CpGFQ013193
	for <qemu-devel@nongnu.org>; Mon, 3 Oct 2011 06:51:16 -0600
Message-ID: <4E89AFB4.8000103@us.ibm.com>
Date: Mon, 03 Oct 2011 07:51:00 -0500
From: Anthony Liguori <aliguori@us.ibm.com>
MIME-Version: 1.0
References: <1316443309-23843-1-git-send-email-mdroth@linux.vnet.ibm.com>
	<20111003064653.GA15380@redhat.com>
In-Reply-To: <20111003064653.GA15380@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] New Migration Protocol using Visitor
	Interface
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: aliguori@linux.vnet.ibm.com, Michael Roth <mdroth@linux.vnet.ibm.com>, qemu-devel@nongnu.org

On 10/03/2011 01:46 AM, Michael S. Tsirkin wrote:
> On Mon, Sep 19, 2011 at 09:41:41AM -0500, Michael Roth wrote:
>> OVERVIEW
>>
>> This patch series implements a QEMUFile Visitor class that's intended to abstract away direct calls to qemu_put_*/qemu_get_* for save/load functions. Currently this is done by always creating a QEMUFileInputVisitor/QEMUFileOutputVisitor pair with each call to qemu_fopen_ops() and maintaining a QEMUFile->I/O Visitor mapping. save/load functions that are to be converted would them simply use lookup routines to get a Visitor based on their QEMUFile arguments. Once these save/load functions are all converted, we can then change the interfaces in bulk and switch over to passing in the Visitor directly.
>>
>> An example conversion of Slirp, which still uses old-style save/load routines, is included as part of this series. Anthony I believe has VMState converted over to using Visitors, so theoretically all VMStatified devices are good to go (Anthony: if that's the case feel free to rebase on this or point me to the repo to pull in, and I'll work off the resulting branch).
>>
>> PLANS
>>
>> Ultimately, the goal is to implement a new migration protocol that is self-describing, such that incompatibilities or migration errors can be more easilly detected. Currently, a simple change in data types for a particular device can introduce subtle bugs that won't be detected by the target, since the target interprets the data according to it's own expectation of what those data types are. Currently the plan is to use ASN.1 BER in place of QEMUFile.
>>
>> By using a Visitor interface for migration, we also gain the ability to generate a migration schema (via, say, a JSONSchemaVisitor). This is similar to the VMState schema generator in Anthony's "Add live migration unit tests" series (http://thread.gmane.org/gmane.comp.emulators.qemu/97754/focus=97754), except that it is applicable to both VMState and old-style save/load functions.
>>
>> There is still quite a bit a work to get to that point. Anthony addressed some of the issues on the VMState side of things in the aforementioned series, and I believe he already has some patches that convert VMState over to using visitors insteads of qemu_put_*/qemu_get_* directly. That being the case, what's left is:
>>
>> 1) Convert old-style save/load functions over to using visitors. I'm not sure what the best approach is for this. We basically have 2 options: either by converting them to VMState, or converting the functions over to using visitors, such as with the example slirp conversion. Even if we do option 2 it would likely need to be done in 2 phases:
>>
>> phase 1: a mechanical conversion where we basically replace each qemu_put*/qemu_get* with a visit_*. This allows use to plop in a new, say BERVisitor to switch to using the new migration protocol. It should also be possible to do this without breaking migration compatibility with older versions. It does not however result in a well-specified schema if we were to plop in, say, a JSONSchemaVisitor. This is where we need phase 2.
>>
>> phase 2: Currently, runtime state modifies our "schema" the way things are currently done. Slirp instance and virtqueue enumeration, for instance:
>>
>> slirp/slirp.c:
>>      for (ex_ptr = slirp->exec_list; ex_ptr; ex_ptr = ex_ptr->ex_next)
>>          if (ex_ptr->ex_pty == 3) {
>>              struct socket *so;
>>              so = slirp_find_ctl_socket(slirp, ex_ptr->ex_addr,
>>                                         ntohs(ex_ptr->ex_fport));
>>              if (!so)
>>                  continue;
>>
>>              qemu_put_byte(f, 42);
>>              slirp_socket_save(f, so);
>>          }
>>
>> hw/virtio.c:
>>      for (i = 0; i<  VIRTIO_PCI_QUEUE_MAX; i++) {
>>          if (vdev->vq[i].vring.num == 0)
>>              break;
>>
>>          qemu_put_be32(f, vdev->vq[i].vring.num);
>>          qemu_put_be64(f, vdev->vq[i].pa);
>>          qemu_put_be16s(f,&vdev->vq[i].last_avail_idx);
>>          if (vdev->binding->save_queue)
>>              vdev->binding->save_queue(vdev->binding_opaque, i, f);
>>      }
>>
>>
>> To be able to build a schema around this we'd need to do a visit_start_array() or visit_start_list(), to denote a list of entries, and ideally we'd also describe structure of these entries. But often there is no structure of these entries...they may just be values taken from various data structures. So, either we do hacky things like using visitor intefaces to create "fake" structs (we say we're dealing with a struct when we're really not), or we actually put these into an intermediate struct which we then use to assign to store/load migration fields from.
>>
>> phase 2 sounds a whole lot like converting over to VMState. So I think we'll only want to go with option 2 in places where converting to VMState is not practical/planned. Or we can hold off on the phase 2 stuff and just focus on getting the new protocol in place ASAP. After which, we can rework things to support generating a well-specified schema.
>>
>> 2) Once the conversion stuff is done, we can modify the save/load interfaces in bulk to accept a Visitor in place of a QEMUFile (but continue using the old protocol/QEMUFile*Visitor for the time being)
>>
>> 3) Implement migration capabilities negotiation. This was discussed to some extent in the XBZRLE v4 thread (http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg01155.html), basically having a set of migration capabilities that can be queried via QMP. Management tools would then choose the optimal migration settings from the intersection of these.
>>
>> 4) Implement the BERVisitor and make this the default migration protocol.
>>
>> Most of the work will be in 1), though with the implementation in this series we should be able to do it incrementally. I'm not sure if the best approach is doing the mechanical phase 1 conversion, then doing phase 2 sometime after 4), doing phase 1 + 2 as part of 1), or just doing VMState conversions which gives basically the same capabilities as phase 1 + 2.
>>
>> Thoughts?
>
> Yes, I think this creates much more work than an upfront conversion
> of the protocol to BER, then cleaning up interfaces, would be.
>
> Here are some suggestions:
>
> - Let's make the protocol be BER directly.
>    As a first step, use a single octet string for
>    the whole of data. Next, start splitting this up.

This can't be done without breaking the old style migration protocol.  I don't 
have a problem with that but I do have a problem with repeatedly breaking 
migration protocol.

The Visitor interface is very little abstraction over a native BER interface. 
It lets us introduce BER incrementally while still supporting the old protocol.

>
> - Don't try to use arrays at all. They don't make sense
>    in either jason or asn. Instead, use sets for unordered
>    data and sequences for ordered data.

sequences map to visit_start_array() pretty well. You would do something like:

visit_start_array(v, "entries", errp);
for (int i = 0; i < s->size; i++) {
     visit_type_int(v, NULL, &s->entry[i], errp);
}
visit_end_array(v, errp);

> - Most data should be unordered: we don't normally care
>    in which order we get device attributes.
>    So we would have set of sequences, and
>    each sequence is attribute name (ascii string) + attribute data

I'd actually be inclined to make everything ordered by default and not encode 
explicit field names.  I'm afraid that encoding field names on the wire is going 
to easily increase the amount of device state by an order of magnitude.

> - This does not discuss how we will handle cross version
>    migration in a lot of detail. Presumably you expect capability magic to
>    work for this. However this does not work for migration to file.
>    A self describing format makes a much simpler solution possible:
>    separating attributes which destination must parse from those
>    it can skip safely. If destination sees an attribute
>    that was marked as must parse and does not recognize it,
>    migration fails. If not, just skip the attribute.

Compatibility and the wire format are more or less orthogonal.  A self 
describing format helps eliminate a class of bugs but getting compatibility 
right requires quite a bit more.

Regards,

Anthony Liguori