From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4C8CD1AF.3060904@codemonkey.ws>
Date: Sun, 12 Sep 2010 08:12:15 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC][PATCH 0/3] Fix caching issues with live migration
References: <1284213896-12705-1-git-send-email-aliguori@us.ibm.com> <4C8CAF9C.8090903@redhat.com>
In-Reply-To: <4C8CAF9C.8090903@redhat.com>
To: Avi Kivity
Cc: Kevin Wolf, qemu-devel@nongnu.org, Stefan Hajnoczi, Juan Quintela

On 09/12/2010 05:46 AM, Avi Kivity wrote:
> On 09/11/2010 05:04 PM, Anthony Liguori wrote:
>> Today, live migration only works when using shared storage that is
>> fully cache coherent using raw images.
>>
>> The failure case with weakly coherent storage (i.e. NFS) is subtle but
>> nonetheless still exists.  NFS only guarantees close-to-open coherence,
>> and when performing a live migration, we do an open on the source and
>> an open on the destination.  We fsync() on the source before launching
>> the destination, but since we have two simultaneous opens, we're not
>> guaranteed coherence.
>>
>> This is not necessarily a problem except that we are a bit gratuitous
>> in reading from the disk before launching a guest.  This means that,
>> as things stand today, we're guaranteed to read the first 64k of the
>> disk, and as such, if a client writes to that region during live
>> migration, corruption will result.
>>
>> The second failure condition has to do with image files (such as
>> qcow2).  Today, we aggressively cache metadata in all image formats,
>> and that cache is definitely not coherent even with fully coherent
>> shared storage.
>>
>> In all image formats, we prefetch at least the L1 table in open(),
>> which means that if there is a write operation that causes a
>> modification to an L1 table, corruption will ensue.
>>
>> This series attempts to address both of these issues.  Technically, if
>> an NFS client aggressively prefetches, this solution is not enough,
>> but in practice, Linux doesn't do that.
>
> I think it is unlikely that it will, but I prefer to be on the right
> side of the standards.

I've been asking around about this, and one thing that was suggested was
acquiring a file lock, since NFS requires that a lock acquisition drop
any client cache for the file.  I need to understand this a bit more, so
it's step #2.

> Why not delay image open until after migration completes?  I know
> your concern about the image not being there, but we can verify that
> with access().  If the image is deleted between access() and open(),
> then the user has much bigger problems.

3/3 would still be needed, because if we delay the open, we obviously
can't do a read until after the open.  So it's only really a choice
between invalidate_cache and delaying the open.  It's a far less
invasive change to just do invalidate_cache, though, and it has some
nice properties.

Regards,

Anthony Liguori

> Note that on NFS, removing (and I think chmoding) a file after it is
> opened will cause subsequent data access to fail, unlike POSIX.
>