All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [Bug 1250360] [NEW] qcow2 image logical corruption after host crash
@ 2013-11-12  9:17 Blue
  2013-11-13 10:11 ` Stefan Hajnoczi
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Blue @ 2013-11-12  9:17 UTC (permalink / raw)
  To: qemu-devel

Public bug reported:

Description of problem:
In case of power failure disk images that were active and created in qcow2 format can become logically corrupt so that they actually appear as unused (full of zeroes).
Data seems to be there, but at this moment i cannot find any reliable method to recover it. Should it be a raw image, a recovery path would be available, but a qcow2 image only presents zeroes once it gets corrupted. My understanding is that the blockmap of the image gets reset and the image is then assumed to be unused.
My detailed setup :

Kernel 2.6.32-358.18.1.el6.x86_64
qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64
The image was used from a NFS share (the nfs server did NOT crash and remained permanently active).

qemu-img check finds no corruption;
qemu-img convert will fully convert the image to raw at a raw image full of zeroes. However, there is data in the file, and the storage backend was not restarted, inactivated during the incident.
I encountered this issue on two different machines, in both cases i was not able to recover the data.
Image was qcow2, thin provisioned, created like this :
 qemu-img create -f qcow2 -o cluster_size=2M imagename.img

While addressing the root cause in order to not have this issue repeat
would be the ideal scenario, a temporary workaround to run on the
affected qcow2 image to "patch" it and recover the data (eventually
after a full fsck/recovery inside the guest) would also be good.
Otherwise we are basically losing data on a large scale when using
qcow2.


Version-Release number of selected component (if applicable):
Kernel 2.6.32-358.18.1.el6.x86_64
qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64

How reproducible:
I am not able (and don't have at the moment enough resources to try to manually reproduce it), but the probability of the issue seems quite high as this is the second case of such corruption in weeks.
Additional info:
I can privately provide an image displaying the corruption.

The reported problem has actually two aspects : first is the cause that eventually produces this issue.
The second is the fact that once the logical corruption has occured, qemu-img check finds nothing wrong with the image - this is obviously wrong.

** Affects: qemu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1250360

Title:
  qcow2 image logical corruption after host crash

Status in QEMU:
  New

Bug description:
  Description of problem:
  In case of power failure disk images that were active and created in qcow2 format can become logically corrupt so that they actually appear as unused (full of zeroes).
  Data seems to be there, but at this moment i cannot find any reliable method to recover it. Should it be a raw image, a recovery path would be available, but a qcow2 image only presents zeroes once it gets corrupted. My understanding is that the blockmap of the image gets reset and the image is then assumed to be unused.
  My detailed setup :

  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64
  The image was used from a NFS share (the nfs server did NOT crash and remained permanently active).

  qemu-img check finds no corruption;
  qemu-img convert will fully convert the image to raw at a raw image full of zeroes. However, there is data in the file, and the storage backend was not restarted, inactivated during the incident.
  I encountered this issue on two different machines, in both cases i was not able to recover the data.
  Image was qcow2, thin provisioned, created like this :
   qemu-img create -f qcow2 -o cluster_size=2M imagename.img

  While addressing the root cause in order to not have this issue repeat
  would be the ideal scenario, a temporary workaround to run on the
  affected qcow2 image to "patch" it and recover the data (eventually
  after a full fsck/recovery inside the guest) would also be good.
  Otherwise we are basically losing data on a large scale when using
  qcow2.


  Version-Release number of selected component (if applicable):
  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64

  How reproducible:
  I am not able (and don't have at the moment enough resources to try to manually reproduce it), but the probability of the issue seems quite high as this is the second case of such corruption in weeks.
  Additional info:
  I can privately provide an image displaying the corruption.

  The reported problem has actually two aspects : first is the cause that eventually produces this issue.
  The second is the fact that once the logical corruption has occured, qemu-img check finds nothing wrong with the image - this is obviously wrong.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1250360/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Bug 1250360] [NEW] qcow2 image logical corruption after host crash
  2013-11-12  9:17 [Qemu-devel] [Bug 1250360] [NEW] qcow2 image logical corruption after host crash Blue
@ 2013-11-13 10:11 ` Stefan Hajnoczi
  2013-11-13 10:28 ` [Qemu-devel] [Bug 1250360] " Kevin Wolf
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2013-11-13 10:11 UTC (permalink / raw)
  To: Bug 1250360; +Cc: kwolf, qemu-devel

On Tue, Nov 12, 2013 at 09:17:34AM -0000, Blue wrote:

Please post the qemu command-line (ps aux | grep qemu) for the affected
VM.

What kind of workload is accessing the disk?  Guest OS and version?

Please also confirm that nothing else is accessing the image file while
the VM is running.  It is not safe to use tools like qemu-img on the
file while the VM has it open.

> Image was qcow2, thin provisioned, created like this :
>  qemu-img create -f qcow2 -o cluster_size=2M imagename.img

Interesting note, you are setting a custom cluster size.  It should work
but it's not the default configuration that is well-tested.

> I can privately provide an image displaying the corruption.

Please send a link to stefanha@redhat.com and kwolf@redhat.com.  We can
help you inspect the image to determine the nature of the corruption and
what data can be restored.

Stefan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] [Bug 1250360] Re: qcow2 image logical corruption after host crash
  2013-11-12  9:17 [Qemu-devel] [Bug 1250360] [NEW] qcow2 image logical corruption after host crash Blue
  2013-11-13 10:11 ` Stefan Hajnoczi
@ 2013-11-13 10:28 ` Kevin Wolf
  2013-11-13 10:39 ` Blue
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2013-11-13 10:28 UTC (permalink / raw)
  To: qemu-devel

See also https://bugzilla.redhat.com/show_bug.cgi?id=1029344 for some
more information

** Bug watch added: Red Hat Bugzilla #1029344
   https://bugzilla.redhat.com/show_bug.cgi?id=1029344

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1250360

Title:
  qcow2 image logical corruption after host crash

Status in QEMU:
  New

Bug description:
  Description of problem:
  In case of power failure disk images that were active and created in qcow2 format can become logically corrupt so that they actually appear as unused (full of zeroes).
  Data seems to be there, but at this moment i cannot find any reliable method to recover it. Should it be a raw image, a recovery path would be available, but a qcow2 image only presents zeroes once it gets corrupted. My understanding is that the blockmap of the image gets reset and the image is then assumed to be unused.
  My detailed setup :

  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64
  The image was used from a NFS share (the nfs server did NOT crash and remained permanently active).

  qemu-img check finds no corruption;
  qemu-img convert will fully convert the image to raw at a raw image full of zeroes. However, there is data in the file, and the storage backend was not restarted, inactivated during the incident.
  I encountered this issue on two different machines, in both cases i was not able to recover the data.
  Image was qcow2, thin provisioned, created like this :
   qemu-img create -f qcow2 -o cluster_size=2M imagename.img

  While addressing the root cause in order to not have this issue repeat
  would be the ideal scenario, a temporary workaround to run on the
  affected qcow2 image to "patch" it and recover the data (eventually
  after a full fsck/recovery inside the guest) would also be good.
  Otherwise we are basically losing data on a large scale when using
  qcow2.


  Version-Release number of selected component (if applicable):
  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64

  How reproducible:
  I am not able (and don't have at the moment enough resources to try to manually reproduce it), but the probability of the issue seems quite high as this is the second case of such corruption in weeks.
  Additional info:
  I can privately provide an image displaying the corruption.

  The reported problem has actually two aspects : first is the cause that eventually produces this issue.
  The second is the fact that once the logical corruption has occured, qemu-img check finds nothing wrong with the image - this is obviously wrong.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1250360/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] [Bug 1250360] Re: qcow2 image logical corruption after host crash
  2013-11-12  9:17 [Qemu-devel] [Bug 1250360] [NEW] qcow2 image logical corruption after host crash Blue
  2013-11-13 10:11 ` Stefan Hajnoczi
  2013-11-13 10:28 ` [Qemu-devel] [Bug 1250360] " Kevin Wolf
@ 2013-11-13 10:39 ` Blue
  2017-01-31 20:49 ` Thomas Huth
  2017-02-01  8:23 ` Thomas Huth
  4 siblings, 0 replies; 7+ messages in thread
From: Blue @ 2013-11-13 10:39 UTC (permalink / raw)
  To: qemu-devel

Indeed, it's the same issue, i opened the report @ centos, redhat and here. Thank you Kevin for posting the link before I got to do it :)
Can we link the reports like we can in rhel-centos bugzilla ?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1250360

Title:
  qcow2 image logical corruption after host crash

Status in QEMU:
  New

Bug description:
  Description of problem:
  In case of power failure disk images that were active and created in qcow2 format can become logically corrupt so that they actually appear as unused (full of zeroes).
  Data seems to be there, but at this moment i cannot find any reliable method to recover it. Should it be a raw image, a recovery path would be available, but a qcow2 image only presents zeroes once it gets corrupted. My understanding is that the blockmap of the image gets reset and the image is then assumed to be unused.
  My detailed setup :

  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64
  The image was used from a NFS share (the nfs server did NOT crash and remained permanently active).

  qemu-img check finds no corruption;
  qemu-img convert will fully convert the image to raw at a raw image full of zeroes. However, there is data in the file, and the storage backend was not restarted, inactivated during the incident.
  I encountered this issue on two different machines, in both cases i was not able to recover the data.
  Image was qcow2, thin provisioned, created like this :
   qemu-img create -f qcow2 -o cluster_size=2M imagename.img

  While addressing the root cause in order to not have this issue repeat
  would be the ideal scenario, a temporary workaround to run on the
  affected qcow2 image to "patch" it and recover the data (eventually
  after a full fsck/recovery inside the guest) would also be good.
  Otherwise we are basically losing data on a large scale when using
  qcow2.


  Version-Release number of selected component (if applicable):
  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64

  How reproducible:
  I am not able (and don't have at the moment enough resources to try to manually reproduce it), but the probability of the issue seems quite high as this is the second case of such corruption in weeks.
  Additional info:
  I can privately provide an image displaying the corruption.

  The reported problem has actually two aspects : first is the cause that eventually produces this issue.
  The second is the fact that once the logical corruption has occured, qemu-img check finds nothing wrong with the image - this is obviously wrong.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1250360/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] [Bug 1250360] Re: qcow2 image logical corruption after host crash
  2013-11-12  9:17 [Qemu-devel] [Bug 1250360] [NEW] qcow2 image logical corruption after host crash Blue
                   ` (2 preceding siblings ...)
  2013-11-13 10:39 ` Blue
@ 2017-01-31 20:49 ` Thomas Huth
  2017-01-31 21:49   ` Blue
  2017-02-01  8:23 ` Thomas Huth
  4 siblings, 1 reply; 7+ messages in thread
From: Thomas Huth @ 2017-01-31 20:49 UTC (permalink / raw)
  To: qemu-devel

Did this issue happen ever again with a recent version of QEMU? ... if
not, should we close this bug nowadays?

** Changed in: qemu
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1250360

Title:
  qcow2 image logical corruption after host crash

Status in QEMU:
  Incomplete

Bug description:
  Description of problem:
  In case of power failure disk images that were active and created in qcow2 format can become logically corrupt so that they actually appear as unused (full of zeroes).
  Data seems to be there, but at this moment i cannot find any reliable method to recover it. Should it be a raw image, a recovery path would be available, but a qcow2 image only presents zeroes once it gets corrupted. My understanding is that the blockmap of the image gets reset and the image is then assumed to be unused.
  My detailed setup :

  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64
  The image was used from a NFS share (the nfs server did NOT crash and remained permanently active).

  qemu-img check finds no corruption;
  qemu-img convert will fully convert the image to raw at a raw image full of zeroes. However, there is data in the file, and the storage backend was not restarted, inactivated during the incident.
  I encountered this issue on two different machines, in both cases i was not able to recover the data.
  Image was qcow2, thin provisioned, created like this :
   qemu-img create -f qcow2 -o cluster_size=2M imagename.img

  While addressing the root cause in order to not have this issue repeat
  would be the ideal scenario, a temporary workaround to run on the
  affected qcow2 image to "patch" it and recover the data (eventually
  after a full fsck/recovery inside the guest) would also be good.
  Otherwise we are basically losing data on a large scale when using
  qcow2.


  Version-Release number of selected component (if applicable):
  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64

  How reproducible:
  I am not able (and don't have at the moment enough resources to try to manually reproduce it), but the probability of the issue seems quite high as this is the second case of such corruption in weeks.
  Additional info:
  I can privately provide an image displaying the corruption.

  The reported problem has actually two aspects : first is the cause that eventually produces this issue.
  The second is the fact that once the logical corruption has occured, qemu-img check finds nothing wrong with the image - this is obviously wrong.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1250360/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Bug 1250360] Re: qcow2 image logical corruption after host crash
  2017-01-31 20:49 ` Thomas Huth
@ 2017-01-31 21:49   ` Blue
  0 siblings, 0 replies; 7+ messages in thread
From: Blue @ 2017-01-31 21:49 UTC (permalink / raw)
  To: qemu-devel

Since then, I switched all my vm images to (sparse) raw  and never
experienced corruption problems again.
I could not say if this can still be reproduced today, even then it was
probably a corner case.
I would suggest the closing of the issue as we cannot gather newer and more
relevant data.


On Tue, Jan 31, 2017 at 10:49 PM, Thomas Huth <1250360@bugs.launchpad.net>
wrote:

> Did this issue happen ever again with a recent version of QEMU? ... if
> not, should we close this bug nowadays?
>
> ** Changed in: qemu
>        Status: New => Incomplete
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1250360
>
> Title:
>   qcow2 image logical corruption after host crash
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/qemu/+bug/1250360/+subscriptions
>

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1250360

Title:
  qcow2 image logical corruption after host crash

Status in QEMU:
  Incomplete

Bug description:
  Description of problem:
  In case of power failure disk images that were active and created in qcow2 format can become logically corrupt so that they actually appear as unused (full of zeroes).
  Data seems to be there, but at this moment i cannot find any reliable method to recover it. Should it be a raw image, a recovery path would be available, but a qcow2 image only presents zeroes once it gets corrupted. My understanding is that the blockmap of the image gets reset and the image is then assumed to be unused.
  My detailed setup :

  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64
  The image was used from a NFS share (the nfs server did NOT crash and remained permanently active).

  qemu-img check finds no corruption;
  qemu-img convert will fully convert the image to raw at a raw image full of zeroes. However, there is data in the file, and the storage backend was not restarted, inactivated during the incident.
  I encountered this issue on two different machines, in both cases i was not able to recover the data.
  Image was qcow2, thin provisioned, created like this :
   qemu-img create -f qcow2 -o cluster_size=2M imagename.img

  While addressing the root cause in order to not have this issue repeat
  would be the ideal scenario, a temporary workaround to run on the
  affected qcow2 image to "patch" it and recover the data (eventually
  after a full fsck/recovery inside the guest) would also be good.
  Otherwise we are basically losing data on a large scale when using
  qcow2.


  Version-Release number of selected component (if applicable):
  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64

  How reproducible:
  I am not able (and don't have at the moment enough resources to try to manually reproduce it), but the probability of the issue seems quite high as this is the second case of such corruption in weeks.
  Additional info:
  I can privately provide an image displaying the corruption.

  The reported problem has actually two aspects : first is the cause that eventually produces this issue.
  The second is the fact that once the logical corruption has occured, qemu-img check finds nothing wrong with the image - this is obviously wrong.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1250360/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] [Bug 1250360] Re: qcow2 image logical corruption after host crash
  2013-11-12  9:17 [Qemu-devel] [Bug 1250360] [NEW] qcow2 image logical corruption after host crash Blue
                   ` (3 preceding siblings ...)
  2017-01-31 20:49 ` Thomas Huth
@ 2017-02-01  8:23 ` Thomas Huth
  4 siblings, 0 replies; 7+ messages in thread
From: Thomas Huth @ 2017-02-01  8:23 UTC (permalink / raw)
  To: qemu-devel

Ok, so let's close this now. Thanks for your reply!

** Changed in: qemu
       Status: Incomplete => Won't Fix

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1250360

Title:
  qcow2 image logical corruption after host crash

Status in QEMU:
  Won't Fix

Bug description:
  Description of problem:
  In case of power failure disk images that were active and created in qcow2 format can become logically corrupt so that they actually appear as unused (full of zeroes).
  Data seems to be there, but at this moment i cannot find any reliable method to recover it. Should it be a raw image, a recovery path would be available, but a qcow2 image only presents zeroes once it gets corrupted. My understanding is that the blockmap of the image gets reset and the image is then assumed to be unused.
  My detailed setup :

  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64
  The image was used from a NFS share (the nfs server did NOT crash and remained permanently active).

  qemu-img check finds no corruption;
  qemu-img convert will fully convert the image to raw at a raw image full of zeroes. However, there is data in the file, and the storage backend was not restarted, inactivated during the incident.
  I encountered this issue on two different machines, in both cases i was not able to recover the data.
  Image was qcow2, thin provisioned, created like this :
   qemu-img create -f qcow2 -o cluster_size=2M imagename.img

  While addressing the root cause in order to not have this issue repeat
  would be the ideal scenario, a temporary workaround to run on the
  affected qcow2 image to "patch" it and recover the data (eventually
  after a full fsck/recovery inside the guest) would also be good.
  Otherwise we are basically losing data on a large scale when using
  qcow2.


  Version-Release number of selected component (if applicable):
  Kernel 2.6.32-358.18.1.el6.x86_64
  qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.7.x86_64
  Used via libvirt libvirt-0.10.2-18.el6_4.14.x86_64

  How reproducible:
  I am not able (and don't have at the moment enough resources to try to manually reproduce it), but the probability of the issue seems quite high as this is the second case of such corruption in weeks.
  Additional info:
  I can privately provide an image displaying the corruption.

  The reported problem has actually two aspects : first is the cause that eventually produces this issue.
  The second is the fact that once the logical corruption has occured, qemu-img check finds nothing wrong with the image - this is obviously wrong.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1250360/+subscriptions

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-02-01  8:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-12  9:17 [Qemu-devel] [Bug 1250360] [NEW] qcow2 image logical corruption after host crash Blue
2013-11-13 10:11 ` Stefan Hajnoczi
2013-11-13 10:28 ` [Qemu-devel] [Bug 1250360] " Kevin Wolf
2013-11-13 10:39 ` Blue
2017-01-31 20:49 ` Thomas Huth
2017-01-31 21:49   ` Blue
2017-02-01  8:23 ` Thomas Huth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.