From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=33095 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Q0A6A-0006TH-1C
	for qemu-devel@nongnu.org; Thu, 17 Mar 2011 06:06:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1Q0A68-0002hh-S1
	for qemu-devel@nongnu.org; Thu, 17 Mar 2011 06:06:29 -0400
Received: from mx1.redhat.com ([209.132.183.28]:46273)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1Q0A68-0002ha-HD
	for qemu-devel@nongnu.org; Thu, 17 Mar 2011 06:06:28 -0400
Message-ID: <4D81DD6C.4000203@redhat.com>
Date: Thu, 17 Mar 2011 11:07:40 +0100
From: Kevin Wolf <kwolf@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: KVM call agenda for Jan 25
References: <20110124132559.GA25236@x200.localdomain>	<4D3DF7EA.4010807@codemonkey.ws>	<20110125115727.5f2b495e@doriath>	<20110125120244.5b18863d@doriath>	<AANLkTi=px6_UZ6vYFm=wvS8aTmecQbA1_Z3ra9UH9ecm@mail.gmail.com>	<4D43F0F5.10206@cse.iitd.ac.in>	<4D67E9EB.7090606@cse.iitd.ac.in>	<AANLkTik6de8nkuS6xz6btOkqjTMTKmOm_B+Jy9DCa2Tk@mail.gmail.com>	<4D6975B0.4060309@cse.iitd.ac.in>	<AANLkTikGeDvT7zq4UbBh-FpCUDrSQ-uoNBp_mJVpKx68@mail.gmail.com>	<4D6C087B.90603@cse.iitd.ac.in>	<AANLkTimMyCmy6yYJM6+5QdAt0DjCQq=WHE1iOwpJJHAq@mail.gmail.com>	<4D7E309B.6080800@cse.iitd.ac.in>	<4D7F3EFC.3050106@redhat.com>
	<AANLkTikRmnB=s8WiEK6mgEbupxoQ7UCBaW80GBaEo88c@mail.gmail.com>
In-Reply-To: <AANLkTikRmnB=s8WiEK6mgEbupxoQ7UCBaW80GBaEo88c@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Dushyant Bansal <cs5070214@cse.iitd.ac.in>, qemu-devel@nongnu.org

On 16.03.2011 18:47, Stefan Hajnoczi wrote:
> On Tue, Mar 15, 2011 at 10:27 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> On 14.03.2011 16:13, Dushyant Bansal wrote:
>>>>
>>>> Nice that qemu-img convert isn't that far out by default on raw :).
>>>>
>>>> About Google Summer of Code, I have posted my take on applying and
>>>> want to share that with you and qemu-devel:
>>>>
>>>> http://blog.vmsplice.net/2011/03/advice-for-students-applying-to-google.html
>>>>
>>> Thanks for sharing your experiences.
>>>
>>> After reading about qcow2 and qed and how they organize data (thanks to
>>> the newly added qcow2 doc and discussions on the mailing list), this is
>>> what I understand.
>>>
>>> So, the main difference between qed and qcow2 is the absence of a
>>> reference count structure in qed (which means less metadata).
>>> This improves performance because:
>>> 1. Write operations have less or no metadata to update.
>>> 2. Data writes and metadata writes can be done in any order.
>>>
>>> This also means these features are no longer supported:
>>> 1. Internal snapshots,
>>> 2. CPU/device state snapshots,
>>> 3. Compression,
>>> 4. Encryption
>>>
>>> Now, coming to qed<-->qcow2 conversion, I want to clarify some things.
>>>
>>> 1. header_size: variable in qed, equal to the cluster size in qcow2:
>>> When will it be larger than 1 cluster in qed? And what will happen to
>>> that extra data on qed->qcow2 conversion?
>>
>> If you have a feature that is used in the original image, but cannot be
>> represented in the new format, I think you should just get an error.
>>
>>> 2. L2 table size: equal to the L1 table size in qed, equal to the
>>> cluster size in qcow2:
>>> We need to take it into account during conversion.
>>
>> Right. I think we'll have to rewrite all of the metadata.
>>
>> I wonder if we can manage to have a nice block driver interface for
>> in-place image conversions so that we don't only get a qed<->qcow2
>> converter, but also can implement the interface in e.g. VMDK and get
>> VMDK<->qcow2 and VMDK<->qed as well.
> 
> I think this will be tricky but would be very interested if someone
> has ideas.  Code-wise an in-place converter probably needs access to
> both formats' on-disk structures or internal functions.  I don't think
> abstracting this is easy because the more you abstract the less
> control you have over keeping things in-place and cleanly putting the
> new structures in place.

Well, if it was easy, I would have suggested a specific way of doing it. ;-)

But it would be a really cool thing to have, and I think it's more fun
for a GSoC participant to actually think about a hard problem than just
doing mostly mechanical work.

> On the other hand, I think the starting point for a generic in-place
> converter would be a loop that does something like bdrv_is_allocated()
> but translates the guest position in the block device into an offset
> into the image file.  That, together with some sort of free map or
> space allocation bitmap would allow a generic approach to figuring out
> the data mapping and which parts of the file can be safely used.

We can discuss the detailed API later, but I agree that the critical
thing to convert is the mapping.
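
To make the "mapping plus free map" idea concrete, here is a rough sketch
in Python rather than QEMU's C. It is only a toy model under stated
assumptions: `build_mapping`, the `lookup` callback and `CLUSTER_SIZE`
are made-up names for illustration and do not exist in QEMU.

```python
# Toy model only: none of these names are QEMU APIs.
CLUSTER_SIZE = 65536  # assumed cluster size for the example


def build_mapping(lookup, num_guest_clusters, file_clusters):
    """Walk the guest address space and record where each allocated
    cluster lives in the image file.

    lookup(guest_cluster) is a hypothetical driver callback that returns
    the file offset holding that guest cluster, or None if unallocated.
    Returns (mapping, used): mapping is guest offset -> file offset,
    used is a per-file-cluster bitmap of clusters that hold guest data.
    """
    mapping = {}
    used = [False] * file_clusters
    for guest_cluster in range(num_guest_clusters):
        file_offset = lookup(guest_cluster)
        if file_offset is None:
            continue  # unallocated in the source image, nothing to carry over
        mapping[guest_cluster * CLUSTER_SIZE] = file_offset
        used[file_offset // CLUSTER_SIZE] = True
    return mapping, used
```

Note that `used` only covers guest data. Clusters holding the old
format's metadata are not marked, so they would have to be tracked
separately for as long as the old metadata is still needed.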

You would probably open the file with the source format driver read-only
and with the destination driver read-write. For qcow2 you would start
by writing a refcount table that marks the whole file as used; other
formats use the file size anyway. Then you can start creating L1 and L2
tables and copy the mapping over. Once this is done, you do an fsck to
free the metadata of the old format.
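
The order of those steps could be modelled roughly like this. Again a
toy sketch in Python, not a proposal for the real interface:
`convert_in_place` and its parameters are hypothetical, refcounts are a
plain list with one entry per cluster, the new metadata is assumed to be
appended at the end of the file, and cluster 0 is assumed to hold the
header.

```python
# Toy model only: the function and its parameters are hypothetical.
def convert_in_place(file_clusters, data_mapping, new_metadata_clusters):
    """Model the conversion order described above.

    data_mapping: guest cluster index -> file cluster index, as read
    through the source format driver.
    Returns (refcounts, new_mapping) for the destination format.
    """
    # Step 1: mark the whole existing file as used, so that allocating
    # new metadata can never overwrite old data or old metadata.
    refcounts = [1] * file_clusters

    # Step 2: allocate the new tables; with everything marked as used,
    # the only free space is past the current end of the file.
    first_new = len(refcounts)
    refcounts.extend([1] * new_metadata_clusters)

    # Step 3: copy the mapping over; the data clusters stay where they are.
    new_mapping = dict(data_mapping)

    # Step 4: fsck-like pass. Whatever the new format does not reference
    # (i.e. the old format's metadata) is freed.
    referenced = {0}  # header cluster, assumed to be at offset 0
    referenced.update(new_mapping.values())
    referenced.update(range(first_new, first_new + new_metadata_clusters))
    for cluster in range(len(refcounts)):
        if cluster not in referenced:
            refcounts[cluster] = 0
    return refcounts, new_mapping
```

For example, with a 5-cluster file where guest clusters 0 and 2 live in
file clusters 3 and 4, and two clusters of new tables, clusters 1 and 2
(the old metadata in this toy layout) end up free after the last step.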

One thing that may become tricky is the image header which both drivers
may want to use and which is fixed at offset 0. And of course, you must
make sure that the image is safe at any point if the converter crashes.

> The big benefit of doing an interface for in-place conversion is that we
> can support n-to-n conversions with at most n converters rather
> than having to code n * n - n different in-place converters.

Yes, this was the idea.
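
As a quick check of the arithmetic for the three formats mentioned in
this thread, a small illustrative snippet (the helper name is made up):

```python
from itertools import permutations


def dedicated_converters(formats):
    """List every ordered (source, destination) pair of distinct formats,
    i.e. the converters needed without a common interface."""
    return list(permutations(formats, 2))


formats = ["qcow2", "qed", "vmdk"]
pairs = dedicated_converters(formats)
n = len(formats)
# n * n - n dedicated converters versus n interface implementations
print(len(pairs), "dedicated converters vs", n, "interface implementations")
```

With three formats that is 6 dedicated converters against 3
implementations of the interface, and the gap grows quadratically with
every format that is added.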

Kevin