From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4C8CD1AF.3060904@codemonkey.ws>
Date: Sun, 12 Sep 2010 08:12:15 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC][PATCH 0/3] Fix caching issues with live migration
References: <1284213896-12705-1-git-send-email-aliguori@us.ibm.com> <4C8CAF9C.8090903@redhat.com>
In-Reply-To: <4C8CAF9C.8090903@redhat.com>
To: Avi Kivity
Cc: Kevin Wolf, qemu-devel@nongnu.org, Stefan Hajnoczi, Juan Quintela

On 09/12/2010 05:46 AM, Avi Kivity wrote:
> On 09/11/2010 05:04 PM, Anthony Liguori wrote:
>> Today, live migration only works when using shared storage that is
>> fully cache coherent using raw images.
>>
>> The failure case with weakly coherent storage (i.e. NFS) is subtle but
>> nonetheless still exists.  NFS only guarantees close-to-open coherence,
>> and when performing a live migration, we do an open on the source and
>> an open on the destination.  We fsync() on the source before launching
>> the destination, but since we have two simultaneous opens, we're not
>> guaranteed coherence.
>>
>> This is not necessarily a problem except that we are a bit gratuitous
>> in reading from the disk before launching a guest.  This means that,
>> as things stand today, we're guaranteed to read the first 64k of the
>> disk, and as such, if a client writes to that region during live
>> migration, corruption will result.
>>
>> The second failure condition has to do with image files (such as
>> qcow2).  Today, we aggressively cache metadata in all image formats,
>> and that cache is definitely not coherent even with fully coherent
>> shared storage.
>>
>> In all image formats, we prefetch at least the L1 table in open(),
>> which means that if there is a write operation that causes a
>> modification to an L1 table, corruption will ensue.
>>
>> This series attempts to address both of these issues.  Technically, if
>> an NFS client aggressively prefetches, this solution is not enough,
>> but in practice, Linux doesn't do that.
>
> I think it is unlikely that it will, but I prefer to be on the right
> side of the standards.

I've been asking around about this, and one thing that was suggested was
acquiring a file lock, since NFS requires that a lock acquisition drop
any client cache for the file.  I need to understand this a bit more, so
it's step #2.

> Why not delay image open until after migration completes?  I know
> your concern about the image not being there, but we can verify that
> with access().  If the image is deleted between access() and open(),
> then the user has much bigger problems.

3/3 would still be needed, because if we delay the open, we obviously
can't do a read until after the open.  So it's only really a choice
between invalidate_cache and delaying the open.  It's a far less
invasive change to just do invalidate_cache, though, and it has some
nice properties.

Regards,

Anthony Liguori

> Note that on NFS, removing (and I think chmoding) a file after it is
> opened will cause subsequent data access to fail, unlike POSIX.
>