From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=58983 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OvFLC-0007o4-QK
	for qemu-devel@nongnu.org; Mon, 13 Sep 2010 16:09:28 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <aliguori@linux.vnet.ibm.com>) id 1OvFLB-0005OC-37
	for qemu-devel@nongnu.org; Mon, 13 Sep 2010 16:09:26 -0400
Received: from e38.co.us.ibm.com ([32.97.110.159]:46449)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <aliguori@linux.vnet.ibm.com>) id 1OvFLA-0005Nb-Qp
	for qemu-devel@nongnu.org; Mon, 13 Sep 2010 16:09:25 -0400
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com
	[9.17.195.106])
	by e38.co.us.ibm.com (8.14.4/8.13.1) with ESMTP id o8DK1lPG003851
	for <qemu-devel@nongnu.org>; Mon, 13 Sep 2010 14:01:47 -0600
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169])
	by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id
	o8DK9Jbh109522
	for <qemu-devel@nongnu.org>; Mon, 13 Sep 2010 14:09:19 -0600
Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1])
	by d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP
	id o8DK9Inf019762
	for <qemu-devel@nongnu.org>; Mon, 13 Sep 2010 14:09:18 -0600
Message-ID: <4C8E84EE.9040702@linux.vnet.ibm.com>
Date: Mon, 13 Sep 2010 15:09:18 -0500
From: Anthony Liguori <aliguori@linux.vnet.ibm.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [PATCH 3/3] disk: don't read from disk until
	the guest starts
References: <1284213896-12705-1-git-send-email-aliguori@us.ibm.com>
	<1284213896-12705-4-git-send-email-aliguori@us.ibm.com>
	<4C8DE19B.9090309@redhat.com>	<4C8E2747.9090806@linux.vnet.ibm.com>
	<4C8E2981.7000304@redhat.com>	<4C8E2A52.1000708@linux.vnet.ibm.com>
	<4C8E319A.4090103@redhat.com>
	<AANLkTinCc0UuR3rWmMuPoSjHyiSckLoHkDTJzqjmh1Fz@mail.gmail.com>
	<4C8E8390.6050607@redhat.com>
In-Reply-To: <4C8E8390.6050607@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>, Juan Quintela <quintela@redhat.com>

On 09/13/2010 03:03 PM, Kevin Wolf wrote:
> On 13.09.2010 21:29, Stefan Hajnoczi wrote:
>    
>> On Mon, Sep 13, 2010 at 3:13 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>      
>>> On 13.09.2010 15:42, Anthony Liguori wrote:
>>>        
>>>> On 09/13/2010 08:39 AM, Kevin Wolf wrote:
>>>>          
>>>>>> Yeah, one of the key design points of live migration is to minimize the
>>>>>> number of failure scenarios where you lose a VM.  If someone typed the
>>>>>> wrong command line or shared storage hasn't been mounted yet and we
>>>>>> delay failure until live migration is in the critical path, that would
>>>>>> be terribly unfortunate.
>>>>>>
>>>>>>              
>>>>> We would catch most of them if we try to open the image when migration
>>>>> starts and immediately close it again until migration is (almost)
>>>>> completed, so that no other code can possibly use it before the source
>>>>> has really closed it.
>>>>>
>>>>>            
>>>> I think the only real advantage is that we fix NFS migration, right?
>>>>          
>>> That's the one that we know about, yes.
>>>
>>> The rest is not a specific scenario, but a strong feeling that having an
>>> image opened twice at the same time feels dangerous. As soon as an
>>> open/close sequence writes to the image for some format, we probably
>>> have a bug. For example, what about this mounted flag that you were
>>> discussing for QED?
>>>        
>> There is some room left to work in, even if we can't check in open().
>> One idea would be to do the check asynchronously once I/O begins.  It
>> is actually easy to check L1/L2 tables as they are loaded.
>>
>> The only barrier relationship between I/O and checking is that an
>> allocating write (which will need to update L1/L2 tables) is only
>> allowed after check completes.  Otherwise reads and non-allocating
>> writes may proceed while the image is not yet fully checked.  We can
>> detect when a table element is an invalid offset and discard it.
>>      
> I'm not even talking about such complicated things. You wanted to have a
> dirty flag in the header, right? So when we allow opening an image
> twice, you get this sequence with migration:
>
> Source: open
> Destination: open (with dirty image)
> Source: close
>
> The image is now marked as clean, even though the destination is still
> working on it.
>    

The dirty flag should be read on demand, that is, the first time we
fetch an L1/L2 table.
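
A minimal sketch of what reading the flag on demand could look like for
the sequence above. This is a toy model, not QED or QEMU code:
ImageHeader, LazyDriver, fetch_table() and migrate_sequence() are all
hypothetical names, and the rule that a driver sets the flag on its
first table fetch and clears it on close is an assumption made only for
illustration.

```python
class ImageHeader:
    """Toy stand-in for the on-disk header shared by source and destination."""

    def __init__(self):
        self.dirty = False


class LazyDriver:
    """Hypothetical driver: open() leaves the header alone; the dirty flag
    is consulted only on the first table fetch."""

    def __init__(self, header):
        self.header = header
        self.flag_read = False      # has the dirty flag been consulted yet?
        self.needs_check = False    # was the image dirty when we first looked?

    def open(self):
        # Assumption: opening does no header I/O at all.
        pass

    def fetch_table(self):
        # First fetch: read the flag, then mark the image as in use.
        if not self.flag_read:
            self.needs_check = self.header.dirty
            self.header.dirty = True
            self.flag_read = True
        return self.needs_check

    def close(self):
        # Only a driver that set the flag clears it again.
        if self.flag_read:
            self.header.dirty = False
            self.flag_read = False


def migrate_sequence():
    header = ImageHeader()
    source, dest = LazyDriver(header), LazyDriver(header)
    source.open()
    source.fetch_table()               # source guest is running
    dest.open()                        # no header access yet
    source.close()                     # source marks the image clean
    saw_dirty = dest.fetch_table()     # destination guest starts
    return saw_dirty, header.dirty
```

In this model migrate_sequence() returns (False, True): the destination
first looks at the flag after the source has closed, sees a clean image,
and the header is marked dirty while the destination works on it. The
sketch does not cover a destination that fetches a table before the
source has closed.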

I agree that the life cycle of the block drivers is getting fuzzy.  This
needs quite a bit of thought.

Regards,

Anthony Liguori
