All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jamie Lokier <jamie@shareable.org>
To: Marc Bevand <m.bevand@gmail.com>
Cc: qemu-devel@nongnu.org, Gleb Natapov <gleb@redhat.com>,
	kvm@vger.kernel.org
Subject: Re: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change
Date: Sun, 15 Feb 2009 02:37:18 +0000	[thread overview]
Message-ID: <20090215023718.GD9281@shareable.org> (raw)
In-Reply-To: <aaccfcb60902132231v53b54070sf7a0151ee565214@mail.gmail.com>

Marc Bevand wrote:
> On Fri, Feb 13, 2009 at 8:23 AM, Jamie Lokier <jamie@shareable.org> wrote:
> >
> > Marc..  this is quite a serious bug you've reported.  Is there a
> > reason you didn't report it earlier?
> 
> Because I only started hitting that bug a couple weeks ago after
> having upgraded to a buggy kvm version.
> 
> > Is there a way to restructure the code and/or how it works so it's
> > more clearly correct?
> 
> I am seriously concerned about the general design of qcow2. The code
> base is more complex than it needs to be, the format itself is
> susceptible to race conditions causing cluster leaks when updating
> some internal datastructures, it gets easily fragmented, etc.

When I read it, I thought the code was remarkably compact for what it
does, although I agree that the leaks, fragmentation and inconsistency
on crashes are serious.  From elsewhere it sounds like the refcount
update cost is significant too.

> I am considering implementing a new disk image format that supports
> base images, snapshots (of the guest state), clones (of the disk
> content); that has a radically simpler design & code base; that is
> always consistent "on disk"; that is friendly to delta diffing (ie.
> space-efficient when used with ZFS snapshots or rsync); and that makes
> use of checksumming & replication to detect & fix corruption of
> critical data structures (ideally this should be implemented by the
> filesystem, unfortunately ZFS is not available everywhere :D).

You have just described a high quality modern filesystem or database
engine; both would certainly be far more complex than qcow2's code.
Especially with checksumming and replication :)

ZFS isn't everywhere, but it looks like everyone wants to clone ZFS's
best features everywhere (but not it's worst feature: lots of memory
required).

I've had similar thoughts myself, by the way :-)

> I believe the key to achieve these (seemingly utopian) goals is to
> represent a disk "image" as a set of sparse files, 1 per
> snapshot/clone.

You can already do this, if your filesystem supports snapshotting.  On
Linux hosts, any filesystem can snapshot by using LVM underneath it
(although it's not pretty to do).  A few experimental Linux
filesystems let you snapshot at the filesystem level.

A feature you missed in the utopian vision is sharing backing store
for equal parts of files between different snapshots _after_ they've
been written in separate branches (with the same data), and also among
different VMs.  It's becoming stylish to put similarity detection in
the filesystem somewhere too :-)

-- Jamie

  parent reply	other threads:[~2009-02-15  2:37 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-11  7:00 qcow2 corruption observed, fixed by reverting old change Jamie Lokier
2009-02-11  7:00 ` [Qemu-devel] " Jamie Lokier
2009-02-11  9:57 ` Kevin Wolf
2009-02-11 11:27   ` Jamie Lokier
2009-02-11 11:27     ` Jamie Lokier
2009-02-11 11:41   ` Jamie Lokier
2009-02-11 11:41     ` Jamie Lokier
2009-02-11 12:41     ` Kevin Wolf
2009-02-11 12:41       ` Kevin Wolf
2009-02-11 16:48       ` Jamie Lokier
2009-02-11 16:48         ` Jamie Lokier
2009-02-12 22:57         ` Consul
2009-02-12 22:57           ` [Qemu-devel] " Consul
2009-02-12 23:19           ` Consul
2009-02-12 23:19             ` [Qemu-devel] " Consul
2009-02-13  7:50             ` Marc Bevand
2009-02-16 12:44         ` [Qemu-devel] " Kevin Wolf
2009-02-17  0:43           ` Jamie Lokier
2009-02-17  0:43             ` Jamie Lokier
2009-03-06 22:37         ` Filip Navara
2009-03-06 22:37           ` Filip Navara
2009-02-12  5:45       ` Chris Wright
2009-02-12  5:45         ` Chris Wright
2009-02-12 11:08         ` Johannes Schindelin
2009-02-12 11:08           ` Johannes Schindelin
2009-02-13  6:41 ` Marc Bevand
2009-02-13 11:16   ` Kevin Wolf
2009-02-13 11:16     ` [Qemu-devel] " Kevin Wolf
2009-02-13 16:23     ` Jamie Lokier
2009-02-13 16:23       ` Jamie Lokier
2009-02-13 18:43       ` Chris Wright
2009-02-13 18:43         ` Chris Wright
2009-02-14  6:31       ` Marc Bevand
2009-02-14 22:28         ` Dor Laor
2009-02-14 22:28           ` Dor Laor
2009-02-15  2:27           ` Jamie Lokier
2009-02-15  7:56           ` Marc Bevand
2009-02-15  7:56             ` Marc Bevand
2009-02-15  2:37         ` Jamie Lokier [this message]
2009-02-15 10:57     ` Gleb Natapov
2009-02-15 10:57       ` [Qemu-devel] " Gleb Natapov
2009-02-15 11:46       ` Marc Bevand
2009-02-15 11:46         ` [Qemu-devel] " Marc Bevand
2009-02-15 11:54         ` Marc Bevand
2009-02-15 11:54           ` [Qemu-devel] " Marc Bevand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090215023718.GD9281@shareable.org \
    --to=jamie@shareable.org \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=m.bevand@gmail.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.