From: Kevin Wolf <kwolf@suse.de>
To: Marc Bevand <m.bevand@gmail.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Gleb Natapov <gleb@redhat.com>
Subject: Re: qcow2 corruption observed, fixed by reverting old change
Date: Fri, 13 Feb 2009 12:16:17 +0100
Message-ID: <49955681.9070301@suse.de>
In-Reply-To: <loom.20090213T060937-534@post.gmane.org>

Hi Marc,

You should not take qemu-devel out of the CC list. This is where the
bugs need to be fixed; they aren't KVM-specific. I'm quoting your
complete mail to forward it to where it belongs.

Marc Bevand wrote:
> Jamie Lokier <jamie <at> shareable.org> writes:
>> As you see from the subject, I'm getting qcow2 corruption.
>>
>> I have a Windows 2000 guest which boots and runs fine in kvm-72, fails
>> with a blue-screen indicating file corruption errors in kvm-73 through
>> to kvm-83 (the latest), and succeeds if I replace block-qcow2.c with
>> the version from kvm-72.
>>
>> The blue screen appears towards the end of the boot sequence, and
>> shows only briefly before rebooting.  It says:
>>
>>     STOP: c0000218 (Registry File Failure)
>>     The registry cannot load the hive (file):
>>     \SystemRoot\System32\Config\SOFTWARE
>>     or its log or alternate.
>>     It is corrupt, absent, or not writable.
>>
>>     Beginning dump of physical memory
>>     Physical memory dump complete. Contact your system administrator or
>>     technical support [...?]
> 
> I have got a massive KVM installation with hundreds of guests running dozens of
> different OSes, and have also noticed multiple qcow2 corruption bugs. All my
> guests are using the qcow2 format, and my hosts are running vanilla linux 2.6.28
> x86_64 kernels and use NPT (Opteron 'Barcelona' 23xx processors).
> 
> My Windows 2000 guests BSOD just like yours with kvm-73 or newer. I have to run
> kvm-75 (I need the NPT fixes it contains) with block-qcow2.c reverted to the
> version from kvm-72 to fix the BSOD.
> 
> kvm-73+ also causes some of my Windows 2003 guests to exhibit this exact
> registry corruption error:
> http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
> This bug is also fixed by reverting block-qcow2.c to the version from kvm-72.
> 
> I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because of the
> qcow2 performance regression caused by the default writethrough caching policy)
> but it randomly triggers an even worse bug: the moment I shut down a guest by
> typing "quit" in the monitor, it sometimes overwrites the first 4kB of the disk
> image with mostly NUL bytes (!) which completely destroys it. I am familiar with
> the qcow2 format and apparently this 4kB block seems to be an L2 table with most
> entries set to zero. I have had to restore at least 6 or 7 disk images from
> backup after occurrences of that bug. My intuition tells me this may be the qcow2
> code trying to allocate a cluster to write a new L2 table, but not noticing the
> allocation failed (represented by a 0 offset), and writing the L2 table at that
> 0 offset, overwriting the qcow2 header.
> 
> Fortunately this bug is also fixed by running kvm-75 with block-qcow2.c reverted
> to its kvm-72 version.
> 
> Basically qcow2 in kvm-73 or newer is completely unreliable.
> 
> -marc
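
The failure mode Marc hypothesizes above (an allocator that signals failure
by returning offset 0, and a caller that writes the new L2 table without
checking) can be illustrated with a toy model. This is only a sketch of the
hypothesis, not the actual block-qcow2.c code: the function names, the
in-memory "image", and the 4096-byte cluster size are all assumptions made
for illustration.

```python
CLUSTER_SIZE = 4096        # assumed; chosen to match the 4kB block in the report
QCOW2_MAGIC = b"QFI\xfb"   # the first four bytes of a qcow2 image


def new_image():
    """Toy image: a single cluster holding only the magic, zero-padded."""
    return bytearray(QCOW2_MAGIC + bytes(CLUSTER_SIZE - len(QCOW2_MAGIC)))


def alloc_cluster(image, fail=False):
    """Hypothetical allocator: returns the new cluster's offset, or 0 on failure."""
    if fail:
        return 0
    offset = len(image)
    image.extend(bytes(CLUSTER_SIZE))
    return offset


def write_l2_table_unchecked(image, l2_table, fail=False):
    """The suspected bug: the returned offset is used without being checked."""
    offset = alloc_cluster(image, fail)
    image[offset:offset + CLUSTER_SIZE] = l2_table  # offset 0 clobbers the header
    return offset


def write_l2_table_checked(image, l2_table, fail=False):
    """Same write, but an offset of 0 is treated as an allocation failure."""
    offset = alloc_cluster(image, fail)
    if offset == 0:
        raise OSError("cluster allocation failed")
    image[offset:offset + CLUSTER_SIZE] = l2_table
    return offset
```

In this model, calling write_l2_table_unchecked() with fail=True replaces the
first cluster, including the magic, with the mostly-zero table, which matches
the symptom described. The checked variant raises instead and leaves the
header untouched.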

I think the corruption is a completely unrelated bug. I would suspect it
was introduced in one of Gleb's patches in December. Adding him to CC.

Kevin
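
For anyone checking images for the header overwrite Marc reports, a quick
test is whether the file still starts with the qcow2 magic bytes. The helper
below is a minimal sketch: the function name is made up for illustration, and
it only detects a destroyed magic, not other kinds of corruption.

```python
QCOW2_MAGIC = b"QFI\xfb"  # the first four bytes of a qcow2 image


def qcow2_header_intact(first_bytes):
    """Return True if the given leading bytes start with the qcow2 magic."""
    return bytes(first_bytes[:4]) == QCOW2_MAGIC
```

It could be used as `qcow2_header_intact(open(path, "rb").read(4))`, where
`path` is the image file to check.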
