* [Qemu-devel] [Bug 1013714] [NEW] Data corruption after block migration (LV->LV)
@ 2012-06-15 14:53 Dennis Krul
  2012-06-15 16:52 ` [Qemu-devel] [Bug 1013714] " Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Dennis Krul @ 2012-06-15 14:53 UTC (permalink / raw)
  To: qemu-devel

Public bug reported:

We quite frequently use the live block migration feature to move VMs
between nodes without downtime. These migrations sometimes result in
data corruption on the receiving end. It only happens when the VM is
actually doing I/O (it doesn't take much I/O to trigger the issue).

We use logical volumes and each VM has two disks. We use cache=none for
all VM disks.
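
For context, a migration of this shape is typically driven from the QEMU
monitor; the host names, LV paths, port and resource sizes below are
hypothetical, and only the `-b` flag (which requests full block migration)
is essential to the scenario:

```shell
# Destination host: start the guest with empty, identically sized LVs
# and wait for the incoming migration (paths and port are hypothetical).
qemu-kvm -m 4096 -smp 2 \
    -drive file=/dev/vg0/vm1-root,if=virtio,cache=none \
    -drive file=/dev/vg0/vm1-data,if=virtio,cache=none \
    -incoming tcp:0:4444

# Source host, in the QEMU monitor: -b copies the full disk contents
# along with RAM (-i would request incremental block migration instead);
# -d detaches so the monitor stays usable while it runs.
(qemu) migrate -d -b tcp:dest-host:4444
(qemu) info migrate
```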

All guests use virtio (a mix of various Linux distros and Windows
2008R2).

We currently have two stacks in use and have seen the issue on both of
them:

Fedora - qemu-kvm 0.13
Scientific Linux 6.2 (RHEL derived) - qemu-kvm package 0.12.1.2

Even though we don't run the most recent versions of KVM, I strongly
suspect this issue is still unreported and that filing a bug is
therefore appropriate. (There doesn't seem to be any similar bug report
in Launchpad or Red Hat's Bugzilla, and nothing related in change logs,
release notes or git commit logs.)

I have no idea where to look or where to start debugging this issue, but
if there is any way I can provide useful debug information, please let
me know.

** Affects: qemu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1013714

Title:
  Data corruption after block migration (LV->LV)

Status in QEMU:
  New

Bug description:
  We quite frequently use the live block migration feature to move VMs
  between nodes without downtime. These migrations sometimes result in
  data corruption on the receiving end. It only happens when the VM is
  actually doing I/O (it doesn't take much I/O to trigger the issue).

  We use logical volumes and each VM has two disks. We use cache=none
  for all VM disks.

  All guests use virtio (a mix of various Linux distros and Windows
  2008R2).

  We currently have two stacks in use and have seen the issue on both of
  them:

  Fedora - qemu-kvm 0.13
  Scientific Linux 6.2 (RHEL derived) - qemu-kvm package 0.12.1.2

  Even though we don't run the most recent versions of KVM, I strongly
  suspect this issue is still unreported and that filing a bug is
  therefore appropriate. (There doesn't seem to be any similar bug
  report in Launchpad or Red Hat's Bugzilla, and nothing related in
  change logs, release notes or git commit logs.)

  I have no idea where to look or where to start debugging this issue,
  but if there is any way I can provide useful debug information,
  please let me know.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1013714/+subscriptions


* [Qemu-devel] [Bug 1013714] Re: Data corruption after block migration (LV->LV)
  2012-06-15 14:53 [Qemu-devel] [Bug 1013714] [NEW] Data corruption after block migration (LV->LV) Dennis Krul
@ 2012-06-15 16:52 ` Paolo Bonzini
  2012-06-15 19:24 ` Dennis Krul
  2017-04-07 13:24 ` Thomas Huth
  2 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2012-06-15 16:52 UTC (permalink / raw)
  To: qemu-devel

Hi, I suggest that you try a newer version. There were several fixes
that I think went only in 0.14, in particular commit
62155e2b51e3c282ddc30adbb6d7b8d3bb7c386e, commit
62155e2b51e3c282ddc30adbb6d7b8d3bb7c386e, commit
62155e2b51e3c282ddc30adbb6d7b8d3bb7c386e. RHEL6.2 doesn't have them.
With the fixes, it's much less likely that live block migration will
eat your data.

However, we have also been thinking about deprecating block migration,
so we are interested in hearing about your setup. The replacement would
be more powerful (it would allow migrating storage separately from the
VM), more efficient (storage and RAM streams would run in parallel on
different TCP ports), and easier for us to test and maintain.

However, the new mechanism would be more complicated to set up for
migration without shared storage. This is what live block migration
does, and it sounds like your use case requires migration without
shared storage. A true replacement of live block migration would likely
not be ready in time for the next release (1.2), so its removal would
also be delayed.


* [Qemu-devel] [Bug 1013714] Re: Data corruption after block migration (LV->LV)
  2012-06-15 14:53 [Qemu-devel] [Bug 1013714] [NEW] Data corruption after block migration (LV->LV) Dennis Krul
  2012-06-15 16:52 ` [Qemu-devel] [Bug 1013714] " Paolo Bonzini
@ 2012-06-15 19:24 ` Dennis Krul
  2017-04-07 13:24 ` Thomas Huth
  2 siblings, 0 replies; 4+ messages in thread
From: Dennis Krul @ 2012-06-15 19:24 UTC (permalink / raw)
  To: qemu-devel

Hello Paolo,

Thank you for your quick response!

Did you intend to mention 3 different commits or did you accidentally
paste the same commit thrice? ;) I came across that commit but somehow
thought it was already included in 0.13. Thanks!

We're of course in no position to ask, but I'll do it anyway: would you
be in a position to add patches for these commits to the qemu-kvm
package for RHEL6 (assuming they apply at all)? Or perhaps ask one of
the RH package maintainers to do so? We'd be very grateful!

A little bit of background (our use case for using live block
migration): We are an ISP and provide virtual private servers on KVM.

The way we see it, traditional centralized shared storage introduces
one big, expensive and complicated SPOF into a VM platform.

We actually have no problems dealing with the limitations of local
storage. For example, we have automated (offline) VM migrations to other
hosts when customers need to upgrade and the current host doesn't have
enough resources. It would be great if live block migration were stable
enough to do this online to reduce downtime for customers.

We sometimes use live block migration to reduce the server load by
migrating off a busy VM. It doesn't really matter if the migration takes
a while to complete. We also use it to migrate all VMs off a host in
case the hardware is being retired or we need to reinstall.

Live block migration is just not very useful for generic system
maintenance, like a reboot for a kernel or firmware update. In that case
we simply reboot the host (and most customers don't mind that once in a
while).

We would appreciate it if live block migration were not removed until
its superior replacement is ready. We don't mind if it's more
complex to work with, as long as it's well documented ;)


* [Qemu-devel] [Bug 1013714] Re: Data corruption after block migration (LV->LV)
  2012-06-15 14:53 [Qemu-devel] [Bug 1013714] [NEW] Data corruption after block migration (LV->LV) Dennis Krul
  2012-06-15 16:52 ` [Qemu-devel] [Bug 1013714] " Paolo Bonzini
  2012-06-15 19:24 ` Dennis Krul
@ 2017-04-07 13:24 ` Thomas Huth
  2 siblings, 0 replies; 4+ messages in thread
From: Thomas Huth @ 2017-04-07 13:24 UTC (permalink / raw)
  To: qemu-devel

Closing according to comment #3

** Changed in: qemu
       Status: New => Fix Released

