* qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Lucas Meneghel Rodrigues @ 2011-11-09 16:29 UTC
  To: QEMU devel, KVM mailing list, Avi Kivity, Marcelo Tosatti,
	Anthony Liguori

Hi guys, here I am, reporting yet another issue with qemu. This time, 
it's something that was first reported in January, and Juan proposed a 
patch for it:

http://comments.gmane.org/gmane.comp.emulators.qemu/89009

[PATCH 4/5] Reopen files after migration

The symptom: running a disk stress test or any other intense IO operation
in the guest while it is being migrated causes qcow2 corruption. We've
seen this consistently on the daily test jobs, both for qemu and qemu-kvm.
The test that triggers it is the autotest stress test running on a VM with
ping-pong background migration.

The fix proposed by Juan is on our RHEL branch, and the problem does not
happen there. So, what about reconsidering Juan's patch, or working out a
solution that is satisfactory to the upstream maintainers?

I'll open a Launchpad bug with this report.

Thanks,

Lucas

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Anthony Liguori @ 2011-11-09 16:39 UTC
  To: Lucas Meneghel Rodrigues
  Cc: Kevin Wolf, KVM mailing list, Michael S. Tsirkin,
	Marcelo Tosatti, QEMU devel, Juan Jose Quintela Carreira,
	Avi Kivity

On 11/09/2011 10:29 AM, Lucas Meneghel Rodrigues wrote:
> Hi guys, here I am, reporting yet another issue with qemu. This time, it's
> something that was first reported in January, and Juan proposed a patch for it:
>
> http://comments.gmane.org/gmane.comp.emulators.qemu/89009

Migration with qcow2 is not a supported feature for 1.0.  Migration is only 
supported with raw images using coherent shared storage[1].

[1] NFS is only coherent with close-to-open which right now is not good enough 
for migration.

Regards,

Anthony Liguori

>
> [PATCH 4/5] Reopen files after migration
>
> The symptom: running a disk stress test or any other intense IO operation
> in the guest while it is being migrated causes qcow2 corruption. We've
> seen this consistently on the daily test jobs, both for qemu and qemu-kvm.
> The test that triggers it is the autotest stress test running on a VM with
> ping-pong background migration.
>
> The fix proposed by Juan is on our RHEL branch, and the problem does not
> happen there. So, what about reconsidering Juan's patch, or working out a
> solution that is satisfactory to the upstream maintainers?
>
> I'll open a Launchpad bug with this report.
>
> Thanks,
>
> Lucas
>

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Avi Kivity @ 2011-11-09 17:02 UTC
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, QEMU devel, KVM mailing list,
	Marcelo Tosatti, Kevin Wolf, Michael S. Tsirkin,
	Juan Jose Quintela Carreira

On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>
> Migration with qcow2 is not a supported feature for 1.0.  Migration is
> only supported with raw images using coherent shared storage[1].
>
> [1] NFS is only coherent with close-to-open which right now is not
> good enough for migration.

Say what?

-- 
error compiling committee.c: too many arguments to function


* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Anthony Liguori @ 2011-11-09 17:35 UTC
  To: Avi Kivity
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/09/2011 11:02 AM, Avi Kivity wrote:
> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>
>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>> only supported with raw images using coherent shared storage[1].
>>
>> [1] NFS is only coherent with close-to-open which right now is not
>> good enough for migration.
>
> Say what?

Due to block format probing, we read at least the first sector of the disk 
during start up.

Strictly going by what NFS guarantees, since we don't open on the destination
*after* a close on the source, we aren't guaranteed to see what's written by
the source.

In practice, because of block format probing, unless we're using cache=none,
the first sector on the destination can be out of sync with the source.  If
you use cache=none on a Linux client against a Linux NFS server, at least,
you should be relatively safe.

Regards,

Anthony Liguori

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Juan Quintela @ 2011-11-09 19:25 UTC
  To: Lucas Meneghel Rodrigues
  Cc: QEMU devel, KVM mailing list, Avi Kivity, Marcelo Tosatti,
	Anthony Liguori, Kevin Wolf, Michael S. Tsirkin

Lucas Meneghel Rodrigues <lmr@redhat.com> wrote:
> Hi guys, here I am, reporting yet another issue with qemu. This time,
> it's something that was first reported in January, and Juan proposed a
> patch for it:
>
> http://comments.gmane.org/gmane.comp.emulators.qemu/89009
>
> [PATCH 4/5] Reopen files after migration
>
> The symptom: running a disk stress test or any other intense IO operation
> in the guest while it is being migrated causes qcow2 corruption. We've
> seen this consistently on the daily test jobs, both for qemu and qemu-kvm.
> The test that triggers it is the autotest stress test running on a VM with
> ping-pong background migration.
>
> The fix proposed by Juan is on our RHEL branch, and the problem does not
> happen there. So, what about reconsidering Juan's patch, or working out a
> solution that is satisfactory to the upstream maintainers?
>
> I'll open a Launchpad bug with this report.

I have just sent:

[RFC PATCH 0/2] Fix migration with NFS & iscsi/Fiber channel

Only the 1st patch in that series is needed to fix that problem.  Could
you try it?

Thanks, Juan.

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Juan Quintela @ 2011-11-09 19:53 UTC
  To: Anthony Liguori
  Cc: Avi Kivity, Lucas Meneghel Rodrigues, Kevin Wolf,
	KVM mailing list, Michael S. Tsirkin, Marcelo Tosatti,
	QEMU devel

Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>
>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>> only supported with raw images using coherent shared storage[1].
>>>
>>> [1] NFS is only coherent with close-to-open which right now is not
>>> good enough for migration.
>>
>> Say what?
>
> Due to block format probing, we read at least the first sector of the
> disk during start up.
>
> Strictly going by what NFS guarantees, since we don't open on the
> destination *after* a close on the source, we aren't guaranteed to
> see what's written by the source.
>
> In practice, because of block format probing, unless we're using
> cache=none, the first sector on the destination can be out of sync
> with the source.  If you use cache=none on a Linux client against a
> Linux NFS server, at least, you should be relatively safe.

You are not :-(

If you are using a format that "caches" data, like qcow2 with its L1/L2
cache, you are not safe.  You need to reopen (or discard the metadata and
re-read it).  Notice that raw nowadays also has metadata (we can resize
the image on the fly, and we need to reopen to find that out).

About the coherence problem, I just sent the patches that we had on RHEL
to the list.  With cache=none, NFS & iSCSI & Fibre Channel are all ok
(modulo the previous metadata problem).  If you look at the second
patch that I sent, it "tries" to flush the read cache for a block
device.  The problems with the patch (a sketch of the idea follows the
list) are:
- BLKFLSBUF is Linux-specific
- BLKFLSBUF only works for "some block devices"
- Christoph just NACKed it for the previous reasons.
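
For illustration, a minimal sketch of what that second patch attempts,
assuming standard Linux headers (the function name is made up; this is
not the actual patch):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>                /* BLKFLSBUF */

static int flush_blockdev_cache(const char *dev)
{
    int fd = open(dev, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    /* Ask the kernel to drop cached buffers for this block device.
     * Linux-specific, and not honoured by every block driver. */
    if (ioctl(fd, BLKFLSBUF, 0) < 0) {
        perror("ioctl(BLKFLSBUF)");
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}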

In summary:
- If we use raw images, never resize them, and use a clustered
  filesystem, qemu.git migration works.

- If we change metadata (qcow2, raw resize, ...) we need to re-read the
  metadata (we just close+open on RHEL).

- If we use NFS: we need cache=none, or close+open consistency.

- If we use iSCSI: we need cache=none; close+open is not enough for
  consistency.  The ioctl patch that I sent happens to work on
  Linux, but it is not even guaranteed to work there.  And if our block
  layer gurus tell us not to use the ioctl(), I think we need to do
  just that.

Later, Juan.


* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Michael S. Tsirkin @ 2011-11-09 20:18 UTC
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Juan Jose Quintela Carreira, Marcelo Tosatti, QEMU devel,
	Avi Kivity

On Wed, Nov 09, 2011 at 11:35:54AM -0600, Anthony Liguori wrote:
> On 11/09/2011 11:02 AM, Avi Kivity wrote:
> >On 11/09/2011 06:39 PM, Anthony Liguori wrote:
> >>
> >>Migration with qcow2 is not a supported feature for 1.0.  Migration is
> >>only supported with raw images using coherent shared storage[1].
> >>
> >>[1] NFS is only coherent with close-to-open which right now is not
> >>good enough for migration.
> >
> >Say what?
> 
> Due to block format probing, we read at least the first sector of
> the disk during start up.

A simple solution is not to do any probing before the VM is first
started on the incoming path.

Any issues with this?

-- 
MST

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Anthony Liguori @ 2011-11-09 20:22 UTC
  To: Michael S. Tsirkin
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Juan Jose Quintela Carreira, Marcelo Tosatti, QEMU devel,
	Avi Kivity

On 11/09/2011 02:18 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 09, 2011 at 11:35:54AM -0600, Anthony Liguori wrote:
>> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>>
>>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>>> only supported with raw images using coherent shared storage[1].
>>>>
>>>> [1] NFS is only coherent with close-to-open which right now is not
>>>> good enough for migration.
>>>
>>> Say what?
>>
>> Due to block format probing, we read at least the first sector of
>> the disk during start up.
>
> A simple solution is not to do any probing before the VM is first
> started on the incoming path.
>
> Any issues with this?
>

http://mid.gmane.org/1284213896-12705-4-git-send-email-aliguori@us.ibm.com

I think Kevin wanted the open to be delayed.

Regards,

Anthony Liguori

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Juan Quintela @ 2011-11-09 20:57 UTC
  To: Michael S. Tsirkin
  Cc: Anthony Liguori, Avi Kivity, Lucas Meneghel Rodrigues,
	Kevin Wolf, KVM mailing list, Marcelo Tosatti, QEMU devel

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Nov 09, 2011 at 11:35:54AM -0600, Anthony Liguori wrote:
>> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>> >On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>> >>
>> >>Migration with qcow2 is not a supported feature for 1.0.  Migration is
>> >>only supported with raw images using coherent shared storage[1].
>> >>
>> >>[1] NFS is only coherent with close-to-open which right now is not
>> >>good enough for migration.
>> >
>> >Say what?
>> 
>> Due to block format probing, we read at least the first sector of
>> the disk during start up.
>
> A simple solution is not to do any probing before the VM is first
> started on the incoming path.
>
> Any issues with this?

hw/pc.c: the CMOS init of the devices needs to know the disk geometry.

But looking at the code, "perhaps" this commit has made things work
correctly even if we open the images late.  Something like this was the
idea that Anthony suggested the last time I showed the problem.

Later, Juan.

commit c0897e0cb94e83ec1098867b81870e4f51f225b9
Author: Markus Armbruster <armbru@redhat.com>
Date:   Thu Jun 24 19:58:20 2010 +0200

    pc: Fix CMOS info for drives defined with -device

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Michael S. Tsirkin @ 2011-11-09 21:00 UTC
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Juan Jose Quintela Carreira, Marcelo Tosatti, QEMU devel,
	Avi Kivity

On Wed, Nov 09, 2011 at 02:22:02PM -0600, Anthony Liguori wrote:
> On 11/09/2011 02:18 PM, Michael S. Tsirkin wrote:
> >On Wed, Nov 09, 2011 at 11:35:54AM -0600, Anthony Liguori wrote:
> >>On 11/09/2011 11:02 AM, Avi Kivity wrote:
> >>>On 11/09/2011 06:39 PM, Anthony Liguori wrote:
> >>>>
> >>>>Migration with qcow2 is not a supported feature for 1.0.  Migration is
> >>>>only supported with raw images using coherent shared storage[1].
> >>>>
> >>>>[1] NFS is only coherent with close-to-open which right now is not
> >>>>good enough for migration.
> >>>
> >>>Say what?
> >>
> >>Due to block format probing, we read at least the first sector of
> >>the disk during start up.
> >
> >A simple solution is not to do any probing before the VM is first
> >started on the incoming path.
> >
> >Any issues with this?
> >
> 
> http://mid.gmane.org/1284213896-12705-4-git-send-email-aliguori@us.ibm.com
> I think Kevin wanted the open to be delayed.
> 
> Regards,
> 
> Anthony Liguori

So, this patchset just needs to be revived and polished up?

-- 
MST

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Anthony Liguori @ 2011-11-09 21:01 UTC
  To: Michael S. Tsirkin
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Juan Jose Quintela Carreira, Marcelo Tosatti, QEMU devel,
	Avi Kivity

On 11/09/2011 03:00 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 09, 2011 at 02:22:02PM -0600, Anthony Liguori wrote:
>> On 11/09/2011 02:18 PM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 09, 2011 at 11:35:54AM -0600, Anthony Liguori wrote:
>>>> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>>>>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>>>>
>>>>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>>>>> only supported with raw images using coherent shared storage[1].
>>>>>>
>>>>>> [1] NFS is only coherent with close-to-open which right now is not
>>>>>> good enough for migration.
>>>>>
>>>>> Say what?
>>>>
>>>> Due to block format probing, we read at least the first sector of
>>>> the disk during start up.
>>>
>>> A simple solution is not to do any probing before the VM is first
>>> started on the incoming path.
>>>
>>> Any issues with this?
>>>
>>
>> http://mid.gmane.org/1284213896-12705-4-git-send-email-aliguori@us.ibm.com
>> I think Kevin wanted the open to be delayed.
>>
>> Regards,
>>
>> Anthony Liguori
>
> So, this patchset just needs to be revived and polished up?

What I took from the feedback was that Kevin wanted to defer the open until
the device model started.  That eliminates the need to reopen or to have an
invalidation callback.
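
As a minimal sketch of that deferred-open idea (made-up names, not QEMU's
actual drive API): on the incoming side we only record the options; the
open, and with it any format probe, happens once the device model starts,
i.e. after the source has closed the file, which is what NFS close-to-open
consistency requires:

#include <fcntl.h>

struct drive_opts {
    const char *path;
    int flags;               /* e.g. O_RDWR | O_DIRECT for cache=none */
};

struct drive {
    struct drive_opts opts;
    int fd;                  /* -1 until actually opened */
};

/* called at startup on the incoming side: no I/O is done yet */
static void drive_init_deferred(struct drive *d, struct drive_opts opts)
{
    d->opts = opts;
    d->fd = -1;
}

/* called when the device model starts, after migration completes */
static int drive_activate(struct drive *d)
{
    d->fd = open(d->opts.path, d->opts.flags);
    return d->fd < 0 ? -1 : 0;
}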

I think it would be good for Kevin to comment here though because I might have 
misunderstood his feedback.

Regards,

Anthony Liguori

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Lucas Meneghel Rodrigues @ 2011-11-09 23:33 UTC
  To: quintela
  Cc: QEMU devel, KVM mailing list, Avi Kivity, Marcelo Tosatti,
	Anthony Liguori, Kevin Wolf, Michael S. Tsirkin

On 11/09/2011 05:25 PM, Juan Quintela wrote:
> Lucas Meneghel Rodrigues <lmr@redhat.com> wrote:
>> Hi guys, here I am, reporting yet another issue with qemu. This time,
>> it's something that was first reported in January, and Juan proposed a
>> patch for it:
>>
>> http://comments.gmane.org/gmane.comp.emulators.qemu/89009
>>
>> [PATCH 4/5] Reopen files after migration
>>
>> The symptom: running a disk stress test or any other intense IO operation
>> in the guest while it is being migrated causes qcow2 corruption. We've
>> seen this consistently on the daily test jobs, both for qemu and qemu-kvm.
>> The test that triggers it is the autotest stress test running on a VM with
>> ping-pong background migration.
>>
>> The fix proposed by Juan is on our RHEL branch, and the problem does not
>> happen there. So, what about reconsidering Juan's patch, or working out a
>> solution that is satisfactory to the upstream maintainers?
>>
>> I'll open a Launchpad bug with this report.
>
> I have just sent:
>
> [RFC PATCH 0/2] Fix migration with NFS & iscsi/Fiber channel
>
> Only the 1st patch in that series is needed to fix that problem.  Could
> you try it?

Just tried it; it conclusively fixes the corruption problems. I reported
the results in reply to the patch email.

> Thanks, Juan.


* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Avi Kivity @ 2011-11-10  8:55 UTC
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/09/2011 07:35 PM, Anthony Liguori wrote:
> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>
>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>> only supported with raw images using coherent shared storage[1].
>>>
>>> [1] NFS is only coherent with close-to-open which right now is not
>>> good enough for migration.
>>
>> Say what?
>
> Due to block format probing, we read at least the first sector of the
> disk during start up.
>
> Strictly going by what NFS guarantees, since we don't open on the
> destination *after* a close on the source, we aren't guaranteed to
> see what's written by the source.
>
> In practice, because of block format probing, unless we're using
> cache=none, the first sector on the destination can be out of sync
> with the source.  If you use cache=none on a Linux client against a
> Linux NFS server, at least, you should be relatively safe.
>

IMO, this should be a release blocker.  qemu 1.0 only supporting
migration on enterprise storage?

If we have to delay the release for a month to get it right, we should. 
Not that I think we have to.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Kevin Wolf @ 2011-11-10 10:41 UTC
  To: Anthony Liguori
  Cc: Michael S. Tsirkin, Lucas Meneghel Rodrigues, KVM mailing list,
	Juan Jose Quintela Carreira, Marcelo Tosatti, QEMU devel,
	Avi Kivity

Am 09.11.2011 22:01, schrieb Anthony Liguori:
> On 11/09/2011 03:00 PM, Michael S. Tsirkin wrote:
>> On Wed, Nov 09, 2011 at 02:22:02PM -0600, Anthony Liguori wrote:
>>> On 11/09/2011 02:18 PM, Michael S. Tsirkin wrote:
>>>> On Wed, Nov 09, 2011 at 11:35:54AM -0600, Anthony Liguori wrote:
>>>>> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>>>>>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>>>>>
>>>>>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>>>>>> only supported with raw images using coherent shared storage[1].
>>>>>>>
>>>>>>> [1] NFS is only coherent with close-to-open which right now is not
>>>>>>> good enough for migration.
>>>>>>
>>>>>> Say what?
>>>>>
>>>>> Due to block format probing, we read at least the first sector of
>>>>> the disk during start up.
>>>>
>>>> A simple solution is not to do any probing before the VM is first
>>>> started on the incoming path.
>>>>
>>>> Any issues with this?
>>>>
>>>
>>> http://mid.gmane.org/1284213896-12705-4-git-send-email-aliguori@us.ibm.com
>>> I think Kevin wanted the open to be delayed.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>
>> So, this patchset just needs to be revived and polished up?
> 
> What I took from the feedback was that Kevin wanted to defer the open until
> the device model started.  That eliminates the need to reopen or to have an
> invalidation callback.
>
> I think it would be good for Kevin to comment here though because I might
> have misunderstood his feedback.

Your approach was to delay reads, but still keep the image open. I think
I worried that we might have additional reads somewhere that we don't
know about, and this is why I proposed delaying the open as well, so
that any read would always fail.

I believe just reopening the image is (almost?) as good, and it's way
easier to do, so I would be inclined to do that for 1.0.

I'm not 100% sure about cases like iSCSI, where reopening doesn't help.
I think delaying the open doesn't help there either: if you migrate from
A to B and then back from B to A, you could still get old data. So for
iSCSI, cache=none probably remains the only safe choice, whatever we do.
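
A minimal sketch of the reopen approach, with a hypothetical per-drive
structure (this is not the actual RHEL patch): closing and reopening the
image on the destination after migration discards the stale cached view,
so metadata such as the qcow2 L1/L2 tables is re-read from storage:

#include <fcntl.h>
#include <unistd.h>

struct drive {                   /* hypothetical per-drive state */
    int fd;
    const char *path;
    int open_flags;              /* e.g. O_RDWR | O_DIRECT for cache=none */
};

static int drive_reopen_after_migration(struct drive *d)
{
    close(d->fd);                /* drop the cached view of the image */
    d->fd = open(d->path, d->open_flags);
    return d->fd < 0 ? -1 : 0;
}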

Kevin

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Juan Quintela @ 2011-11-10 16:50 UTC
  To: Kevin Wolf
  Cc: Anthony Liguori, Michael S. Tsirkin, Lucas Meneghel Rodrigues,
	KVM mailing list, Marcelo Tosatti, QEMU devel, Avi Kivity

Kevin Wolf <kwolf@redhat.com> wrote:

>> What I took from the feedback was that Kevin wanted to defer the open until
>> the device model started.  That eliminates the need to reopen or to have an
>> invalidation callback.
>> 
>> I think it would be good for Kevin to comment here though because I might have 
>> misunderstood his feedback.
>
> Your approach was to delay reads, but still keep the image open. I think
> I worried that we might have additional reads somewhere that we don't
> know about, and this is why I proposed delaying the open as well, so
> that any read would always fail.
>
> I believe just reopening the image is (almost?) as good, and it's way
> easier to do, so I would be inclined to do that for 1.0.
>
> I'm not 100% sure about cases like iSCSI, where reopening doesn't help.
> I think delaying the open doesn't help there either: if you migrate from
> A to B and then back from B to A, you could still get old data. So for
> iSCSI, cache=none probably remains the only safe choice, whatever we do.

iSCSI and NFS only work with cache=none.  Even on NFS with close+open,
we have trouble if anything else has the file open (think libvirt,
guestfs, whatever).  I really think that anything other than cache=none
on iSCSI or NFS is just gambling (and yes, it took Christoph a while to
convince me; I was trying to build a "poor man's" distributed lock
manager, and as everybody knows, that is a _difficult_ problem to
solve).
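
For context, cache=none means QEMU opens the image with O_DIRECT, so I/O
bypasses the host page cache and an NFS or iSCSI client cannot serve stale
cached data.  A minimal sketch of an uncached read under that assumption
(the function is made up; O_DIRECT needs sector-aligned buffers, hence
posix_memalign):

#define _GNU_SOURCE                  /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

static int read_first_sector_uncached(const char *path, void **out)
{
    void *buf;
    int fd;

    if (posix_memalign(&buf, 512, 512) != 0)
        return -1;
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    /* With O_DIRECT this read comes straight from storage, not from
     * the host page cache. */
    if (pread(fd, buf, 512, 0) != 512) {
        close(fd);
        free(buf);
        return -1;
    }
    close(fd);
    *out = buf;
    return 0;
}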

Later, Juan.

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Juan Quintela @ 2011-11-10 17:50 UTC
  To: Avi Kivity
  Cc: Anthony Liguori, Lucas Meneghel Rodrigues, Kevin Wolf,
	KVM mailing list, Michael S. Tsirkin, Marcelo Tosatti,
	QEMU devel

Avi Kivity <avi@redhat.com> wrote:
> On 11/09/2011 07:35 PM, Anthony Liguori wrote:
>> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>>
>>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>>> only supported with raw images using coherent shared storage[1].
>>>>
>>>> [1] NFS is only coherent with close-to-open which right now is not
>>>> good enough for migration.
>>>
>>> Say what?
>>
>> Due to block format probing, we read at least the first sector of the
>> disk during start up.
>>
>> Strictly going by what NFS guarantees, since we don't open on the
>> destination *after* a close on the source, we aren't guaranteed to
>> see what's written by the source.
>>
>> In practice, because of block format probing, unless we're using
>> cache=none, the first sector on the destination can be out of sync
>> with the source.  If you use cache=none on a Linux client against a
>> Linux NFS server, at least, you should be relatively safe.
>>
>
> IMO, this should be a release blocker.  qemu 1.0 only supporting
> migration on enterprise storage?
>
> If we have to delay the release for a month to get it right, we should. 
> Not that I think we have to.

I kind of agree here, but it is not my call.  Patch 1/2 has been used
on RHEL for almost 3 years, so it should be safe (TM).

Later, Juan.

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Anthony Liguori @ 2011-11-10 17:54 UTC
  To: Avi Kivity
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/10/2011 02:55 AM, Avi Kivity wrote:
> On 11/09/2011 07:35 PM, Anthony Liguori wrote:
>> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>>
>>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>>> only supported with raw images using coherent shared storage[1].
>>>>
>>>> [1] NFS is only coherent with close-to-open which right now is not
>>>> good enough for migration.
>>>
>>> Say what?
>>
>> Due to block format probing, we read at least the first sector of the
>> disk during start up.
>>
>> Strictly going by what NFS guarantees, since we don't open on the
>> destination *after* a close on the source, we aren't guaranteed to
>> see what's written by the source.
>>
>> In practice, because of block format probing, unless we're using
>> cache=none, the first sector on the destination can be out of sync
>> with the source.  If you use cache=none on a Linux client against a
>> Linux NFS server, at least, you should be relatively safe.
>>
>
> IMO, this should be a release blocker.  qemu 1.0 only supporting
> migration on enterprise storage?

No, this is not going to block the release.

You can't dump patches on the ML during -rc for an issue that has been 
understood for well over a year simply because it's release time.

If this was so important, it should have been fixed a year ago in the proper way.

Regards,

Anthony Liguori

>
> If we have to delay the release for a month to get it right, we should.
> Not that I think we have to.
>

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
From: Anthony Liguori @ 2011-11-10 17:59 UTC
  To: quintela
  Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel, Avi Kivity

On 11/10/2011 10:50 AM, Juan Quintela wrote:
> Kevin Wolf <kwolf@redhat.com> wrote:
>
>>> What I took from the feedback was that Kevin wanted to defer the open until
>>> the device model started.  That eliminates the need to reopen or to have an
>>> invalidation callback.
>>>
>>> I think it would be good for Kevin to comment here though because I might have
>>> misunderstood his feedback.
>>
>> Your approach was to delay reads, but still keep the image open. I think
>> I worried that we might have additional reads somewhere that we don't
>> know about, and this is why I proposed delaying the open as well, so
>> that any read would always fail.
>>
>> I believe just reopening the image is (almost?) as good and it's way
>> easier to do, so I would be inclined to do that for 1.0.
>>
>> I'm not 100% sure about cases like iscsi, where reopening doesn't help.
>> I think delaying the open doesn't help there either: if you migrate from
>> A to B and then back from B to A, you could still get old data. So for
>> iscsi probably cache=none remains the only safe choice, whatever we do.
>
> iSCSI and NFS only work with cache=none.  Even on NFS with close+open,
> we have troubles if anything else has the file opened (think libvirt,
> guestfs, whatever).

Reopening with iSCSI is strictly an issue with the in-kernel initiator, right? 
libiscsi should be safe with a delayed open, I would imagine.

Regards,

Anthony Liguori

> I really think that anything other than
> cache=none for iSCSI or NFS is just betting (and yes, it took a while
> for Christoph to convince me; I was trying to do a "poor man's" distributed
> lock manager, and as everybody knows, it is a _difficult_ problem to
> solve).
>
> Later, Juan.
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10 10:41                 ` Kevin Wolf
@ 2011-11-10 18:00                   ` Anthony Liguori
  -1 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-10 18:00 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Lucas Meneghel Rodrigues, KVM mailing list,
	Juan Jose Quintela Carreira, Marcelo Tosatti, QEMU devel,
	Michael S. Tsirkin, Avi Kivity

On 11/10/2011 04:41 AM, Kevin Wolf wrote:
> On 09.11.2011 22:01, Anthony Liguori wrote:
>> On 11/09/2011 03:00 PM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 09, 2011 at 02:22:02PM -0600, Anthony Liguori wrote:
>>>> On 11/09/2011 02:18 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Nov 09, 2011 at 11:35:54AM -0600, Anthony Liguori wrote:
>>>>>> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>>>>>>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>>>>>>
>>>>>>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>>>>>>> only supported with raw images using coherent shared storage[1].
>>>>>>>>
>>>>>>>> [1] NFS is only coherent with close-to-open which right now is not
>>>>>>>> good enough for migration.
>>>>>>>
>>>>>>> Say what?
>>>>>>
>>>>>> Due to block format probing, we read at least the first sector of
>>>>>> the disk during startup.
>>>>>
>>>>> A simple solution is not to do any probing before the VM is first
>>>>> started on the incoming path.
>>>>>
>>>>> Any issues with this?
>>>>>
>>>>
>>>> http://mid.gmane.org/1284213896-12705-4-git-send-email-aliguori@us.ibm.com
>>>> I think Kevin wanted open to get delayed.
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>>
>>> So, this patchset just needs to be revived and polished up?
>>
>> What I took from the feedback was that Kevin wanted to defer open until the
>> device model started.  That eliminates the need to reopen or have an invalidation
>> callback.
>>
>> I think it would be good for Kevin to comment here though because I might have
>> misunderstood his feedback.
>
> Your approach was to delay reads, but still keep the image open. I think
> I worried that we might have additional reads somewhere that we don't
> know about, and this is why I proposed delaying the open as well, so
> that any read would always fail.
>
> I believe just reopening the image is (almost?) as good and it's way
> easier to do, so I would be inclined to do that for 1.0.

I don't think reopen is good enough without delaying CHS probing too; that 
information is still potentially out of date.  At a minimum, you can't fix this 
problem without delaying CHS probing.
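
For context, CHS probing has to parse the MBR partition table in the disk's 
first sector, which is exactly the kind of early read that can be served from 
stale cache.  A minimal sketch of the idea in C (names and layout handling are 
illustrative, not QEMU's actual code):

    /* Sketch: guess CHS geometry from the MBR in sector 0.  This only
     * shows why the first sector must be read before the guest runs. */
    #include <stdint.h>

    struct chs { int cylinders, heads, sectors; };

    /* buf holds the 512-byte sector 0 (the MBR) of the disk image */
    static int guess_chs_from_mbr(const uint8_t *buf, struct chs *out)
    {
        if (buf[510] != 0x55 || buf[511] != 0xaa)
            return -1;                              /* no partition table */
        for (int i = 0; i < 4; i++) {
            const uint8_t *p = buf + 446 + 16 * i;  /* partition entry */
            if (p[4] == 0)
                continue;                           /* empty entry */
            out->heads     = p[5] + 1;              /* end head */
            out->sectors   = p[6] & 0x3f;           /* end sector */
            out->cylinders = (((p[6] & 0xc0) << 2) | p[7]) + 1;
            return 0;
        }
        return -1;
    }

If that read hits a stale page on the destination, the guest comes up with the 
wrong geometry even though the rest of the image is eventually consistent.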

Regards,

Anthony Liguori

>
> I'm not 100% sure about cases like iscsi, where reopening doesn't help.
> I think delaying the open doesn't help there either: if you migrate from
> A to B and then back from B to A, you could still get old data. So for
> iscsi probably cache=none remains the only safe choice, whatever we do.
>
> Kevin
>


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10  8:55       ` Avi Kivity
@ 2011-11-10 18:27           ` Anthony Liguori
  2011-11-10 17:54           ` [Qemu-devel] " Anthony Liguori
  2011-11-10 18:27           ` Anthony Liguori
  2 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-10 18:27 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Daniel P. Berrange, libvir-list

On 11/10/2011 02:55 AM, Avi Kivity wrote:
> On 11/09/2011 07:35 PM, Anthony Liguori wrote:
>> On 11/09/2011 11:02 AM, Avi Kivity wrote:
>>> On 11/09/2011 06:39 PM, Anthony Liguori wrote:
>>>>
>>>> Migration with qcow2 is not a supported feature for 1.0.  Migration is
>>>> only supported with raw images using coherent shared storage[1].
>>>>
>>>> [1] NFS is only coherent with close-to-open which right now is not
>>>> good enough for migration.
>>>
>>> Say what?
>>
>> Due to block format probing, we read at least the first sector of the
>> disk during startup.
>>
>> Strictly going by what NFS guarantees, since we don't open on the
>> destination *after* a close on the source, we aren't guaranteed to
>> see what's written by the source.
>>
>> In practice, because of block format probing, unless we're using
>> cache=none, the first sector can be out of sync with the source on the
>> destination.  If you use cache=none on a Linux client with at least a
>> Linux NFS server, you should be relatively safe.
>>
>
> IMO, this should be a release blocker.  qemu 1.0 only supporting
> migration on enterprise storage?
>
> If we have to delay the release for a month to get it right, we should.
> Not that I think we have to.
>

Adding libvirt to the discussion.

What does libvirt actually do in the monitor prior to migration completing on 
the destination?  The least invasive way of doing delayed open of block devices 
is probably to make -incoming create a monitor and run a main loop before the 
block devices (and full device model) is initialized.  Since this isolates the 
changes strictly to migration, I'd feel okay doing this for 1.0 (although it 
might need to be in the stable branch).

I know a monitor can run like this, as I've done it before, but some of the 
commands will not behave as expected, so it's pretty important to be comfortable 
with what commands are actually being used in this mode.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10 18:27           ` Anthony Liguori
@ 2011-11-10 18:42             ` Daniel P. Berrange
  -1 siblings, 0 replies; 102+ messages in thread
From: Daniel P. Berrange @ 2011-11-10 18:42 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote:
> What does libvirt actually do in the monitor prior to migration
> completing on the destination?  The least invasive way of doing
> delayed open of block devices is probably to make -incoming create a
> monitor and run a main loop before the block devices (and full
> device model) is initialized.  Since this isolates the changes
> strictly to migration, I'd feel okay doing this for 1.0 (although it
> might need to be in the stable branch).

The way migration works with libvirt wrt QEMU interactions is now
as follows

 1. Destination.
       Run   qemu -incoming ...args...
       Query chardevs via monitor
       Query vCPU threads via monitor
       Set disk / vnc passwords
       Set netdev link states
       Set balloon target

 2. Source
       Set  migration speed
       Set  migration max downtime
       Run  migrate command (detached)
       while 1
          Query migration status
          if status is failed or success
            break;

 3. Destination
      If final status was success
         Run  'cont' in monitor
      else
         kill QEMU process

 4. Source
      If final status was success and 'cont' on dest succeeded
         kill QEMU process
      else
         Run 'cont' in monitor


In older libvirt, the bits from step 4 would actually take place
at the end of step 2. This meant we could end up with no QEMU
on either the source or dest, if starting CPUs on the dest QEMU
failed for some reason.


We would still really like to have a 'query-migrate' command for
the destination, so that we can confirm that the destination has
consumed all incoming migrate data successfully, rather than just
blindly starting CPUs and hoping for the best.
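
For reference, step 2 above reduces to a small QMP exchange.  A rough sketch of 
the source-side loop in C (the socket path and the naive string matching are 
assumptions for illustration, not libvirt's actual code):

    /* Sketch: drive the source QEMU's migration over its QMP socket.
     * Real JSON parsing and error handling are deliberately omitted. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static void qmp(int fd, const char *json, char *reply, size_t len)
    {
        ssize_t n;

        write(fd, json, strlen(json));
        n = read(fd, reply, len - 1);          /* assume one reply per read */
        reply[n > 0 ? n : 0] = '\0';
    }

    int main(void)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        char reply[4096];
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        strcpy(addr.sun_path, "/var/run/src-qemu.qmp");     /* assumed path */
        if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)))
            return 1;
        read(fd, reply, sizeof(reply) - 1);                 /* QMP greeting */
        qmp(fd, "{\"execute\": \"qmp_capabilities\"}", reply, sizeof(reply));
        /* set migration speed and max downtime, then kick off migration */
        qmp(fd, "{\"execute\": \"migrate_set_speed\","
                " \"arguments\": {\"value\": 33554432}}", reply, sizeof(reply));
        qmp(fd, "{\"execute\": \"migrate_set_downtime\","
                " \"arguments\": {\"value\": 0.03}}", reply, sizeof(reply));
        qmp(fd, "{\"execute\": \"migrate\","
                " \"arguments\": {\"uri\": \"tcp:dst-host:4444\"}}",
            reply, sizeof(reply));
        for (;;) {                                          /* the step 2 loop */
            qmp(fd, "{\"execute\": \"query-migrate\"}", reply, sizeof(reply));
            if (strstr(reply, "\"completed\"") || strstr(reply, "\"failed\""))
                break;
            sleep(1);
        }
        close(fd);
        return 0;
    }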

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10 18:42             ` [Qemu-devel] " Daniel P. Berrange
  (?)
@ 2011-11-10 19:11             ` Anthony Liguori
  2011-11-10 20:06               ` Daniel P. Berrange
  -1 siblings, 1 reply; 102+ messages in thread
From: Anthony Liguori @ 2011-11-10 19:11 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 11/10/2011 12:42 PM, Daniel P. Berrange wrote:
> On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote:
>> What does libvirt actually do in the monitor prior to migration
>> completing on the destination?  The least invasive way of doing
>> delayed open of block devices is probably to make -incoming create a
>> monitor and run a main loop before the block devices (and full
>> device model) are initialized.  Since this isolates the changes
>> strictly to migration, I'd feel okay doing this for 1.0 (although it
>> might need to be in the stable branch).
>
> The way migration works with libvirt wrt QEMU interactions is now
> as follows
>
>   1. Destination.
>         Run   qemu -incoming ...args...
>         Query chardevs via monitor
>         Query vCPU threads via monitor
>         Set disk / vnc passwords

Since RHEL carries Juan's patch, and Juan's patch doesn't handle disk passwords 
gracefully, how does libvirt cope with that?

Regards,

Anthony Liguori

>         Set netdev link states
>         Set balloon target
>
>   2. Source
>         Set  migration speed
>         Set  migration max downtime
>         Run  migrate command (detached)
>         while 1
>            Query migration status
>            if status is failed or success
>              break;
>
>   3. Destination
>        If final status was success
>           Run  'cont' in monitor
>        else
>           kill QEMU process
>
>   4. Source
>        If final status was success and 'cont' on dest succeeded
>           kill QEMU process
>        else
>           Run 'cont' in monitor
>
>
> In older libvirt, the bits from step 4, would actually take place
> at the end of step 2. This meant we could end up with no QEMU
> on either the source or dest, if starting CPUs on the dest QEMU
> failed for some reason.
>
>
> We would still really like to have a 'query-migrate' command for
> the destination, so that we can confirm that the destination has
> consumed all incoming migrate data successfully, rather than just
> blindly starting CPUs and hoping for the best.
>
> Regards,
> Daniel


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10 19:11             ` Anthony Liguori
@ 2011-11-10 20:06               ` Daniel P. Berrange
  2011-11-10 20:07                 ` Anthony Liguori
  0 siblings, 1 reply; 102+ messages in thread
From: Daniel P. Berrange @ 2011-11-10 20:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On Thu, Nov 10, 2011 at 01:11:42PM -0600, Anthony Liguori wrote:
> On 11/10/2011 12:42 PM, Daniel P. Berrange wrote:
> >On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote:
> >>What does libvirt actually do in the monitor prior to migration
> >>completing on the destination?  The least invasive way of doing
> >>delayed open of block devices is probably to make -incoming create a
> >>monitor and run a main loop before the block devices (and full
> >>device model) are initialized.  Since this isolates the changes
> >>strictly to migration, I'd feel okay doing this for 1.0 (although it
> >>might need to be in the stable branch).
> >
> >The way migration works with libvirt wrt QEMU interactions is now
> >as follows
> >
> >  1. Destination.
> >        Run   qemu -incoming ...args...
> >        Query chardevs via monitor
> >        Query vCPU threads via monitor
> >        Set disk / vnc passwords
> 
> Since RHEL carries Juan's patch, and Juan's patch doesn't handle
> disk passwords gracefully, how does libvirt cope with that?

No idea, that's the first I've heard of any patch that causes
problems with passwords in QEMU.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10 20:06               ` Daniel P. Berrange
@ 2011-11-10 20:07                 ` Anthony Liguori
  0 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-10 20:07 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 11/10/2011 02:06 PM, Daniel P. Berrange wrote:
> On Thu, Nov 10, 2011 at 01:11:42PM -0600, Anthony Liguori wrote:
>> On 11/10/2011 12:42 PM, Daniel P. Berrange wrote:
>>> On Thu, Nov 10, 2011 at 12:27:30PM -0600, Anthony Liguori wrote:
>>>> What does libvirt actually do in the monitor prior to migration
>>>> completing on the destination?  The least invasive way of doing
>>>> delayed open of block devices is probably to make -incoming create a
>>>> monitor and run a main loop before the block devices (and full
>>>> device model) are initialized.  Since this isolates the changes
>>>> strictly to migration, I'd feel okay doing this for 1.0 (although it
>>>> might need to be in the stable branch).
>>>
>>> The way migration works with libvirt wrt QEMU interactions is now
>>> as follows
>>>
>>>   1. Destination.
>>>         Run   qemu -incoming ...args...
>>>         Query chardevs via monitor
>>>         Query vCPU threads via monitor
>>>         Set disk / vnc passwords
>>
>> Since RHEL carries Juan's patch, and Juan's patch doesn't handle
>> disk passwords gracefully, how does libvirt cope with that?
>
> No idea, that's the first I've heard of any patch that causes
> problems with passwords in QEMU.

My guess is that migration with a password-protected qcow2 file isn't a common 
test case.

Regards,

Anthony Liguori

>
> Daniel


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10 18:27           ` Anthony Liguori
@ 2011-11-10 21:30             ` Anthony Liguori
  -1 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-10 21:30 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Daniel P. Berrange, libvir-list

On 11/10/2011 12:27 PM, Anthony Liguori wrote:
> On 11/10/2011 02:55 AM, Avi Kivity wrote:
>> If we have to delay the release for a month to get it right, we should.
>> Not that I think we have to.
>>
>
> Adding libvirt to the discussion.
>
> What does libvirt actually do in the monitor prior to migration completing on
> the destination? The least invasive way of doing delayed open of block devices
> is probably to make -incoming create a monitor and run a main loop before the
> block devices (and full device model) is initialized. Since this isolates the
> changes strictly to migration, I'd feel okay doing this for 1.0 (although it
> might need to be in the stable branch).

This won't work.  libvirt needs things to be initialized.  Plus, once loadvm 
gets to loading the device model, the device model (and BDSes) need to be fully 
initialized.

I think I've convinced myself that without proper clustered shared storage, 
cache=none is a hard requirement.  That goes for iSCSI and NFS.  I don't see a 
way to do migration safely with NFS, and there's no way to really solve the page 
cache problem with iSCSI.

Even with the reopen, it's racing against the close on the source.  If you look 
at Daniel's description of what libvirt is doing and then compare that to Juan's 
patches, there's a race condition regarding whether the source gets closed 
before the reopen happens.  cache=none seems to be the only way to solve this.

Live migration with qcow2 or any other image format is just not going to work 
right now even with proper clustered storage.  I think doing a block level flush 
cache interface and letting block devices decide how to do it is the best approach.
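
One possible shape for such an interface, sketched in C (all names invented for 
illustration; QEMU later grew a similar per-driver invalidate hook, but this is 
not its actual API):

    /* Sketch: a per-format hook to drop cached state on the incoming
     * side after migration completes, before the first guest I/O. */
    #include <stddef.h>

    typedef struct BlockState BlockState;

    typedef struct FormatDriver {
        const char *format_name;
        /* throw away metadata read at open time (header, qcow2 L1/L2
         * and refcount tables) and re-read it from the image */
        void (*invalidate_cache)(BlockState *bs);
    } FormatDriver;

    struct BlockState {
        const FormatDriver *drv;
        void *opaque;                   /* format-specific state */
    };

    void block_invalidate_cache(BlockState *bs)
    {
        if (bs->drv && bs->drv->invalidate_cache)
            bs->drv->invalidate_cache(bs);
    }

The point of pushing this down to the driver is that raw can make it a no-op 
while qcow2 re-reads its metadata, instead of one global reopen policy.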

Regards,

Anthony Liguori

> I know a monitor can run like this, as I've done it before, but some of the
> commands will not behave as expected, so it's pretty important to be comfortable
> with what commands are actually being used in this mode.
>
> Regards,
>
> Anthony Liguori


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10 21:30             ` Anthony Liguori
@ 2011-11-11 10:15               ` Kevin Wolf
  -1 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2011-11-11 10:15 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael S. Tsirkin,
	libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 10.11.2011 22:30, Anthony Liguori wrote:
> Live migration with qcow2 or any other image format is just not going to work 
> right now even with proper clustered storage.  I think doing a block level flush 
> cache interface and letting block devices decide how to do it is the best approach.

I would really prefer reusing the existing open/close code. It means
less (duplicated) code, it's existing code that is well tested, and it
doesn't make migration much of a special case.

If you want to avoid reopening the file on the OS level, we can reopen
only the topmost layer (i.e. the format, but not the protocol) for now
and in 1.1 we can use bdrv_reopen().
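
A minimal sketch of that idea in C (field and function names are hypothetical; 
the real block layer is more involved):

    /* Sketch: re-instantiate only the format driver.  The OS-level file
     * (the protocol layer) is never closed, so there is no reopen race
     * with the source.  All names are hypothetical. */
    #include <stddef.h>

    typedef struct BDS BDS;

    typedef struct FormatOps {
        void (*close_format)(BDS *bs);            /* drop cached metadata */
        int  (*open_format)(BDS *bs, int flags);  /* re-read header etc.  */
    } FormatOps;

    struct BDS {
        const FormatOps *ops;
        BDS *file;              /* underlying protocol layer, kept open */
        int open_flags;
    };

    int reopen_format_layer(BDS *bs)
    {
        bs->ops->close_format(bs);
        return bs->ops->open_format(bs, bs->open_flags);
    }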

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-11 10:15               ` [Qemu-devel] " Kevin Wolf
  (?)
@ 2011-11-11 14:03               ` Anthony Liguori
  2011-11-11 14:29                 ` Kevin Wolf
  2011-11-12 10:27                 ` Avi Kivity
  -1 siblings, 2 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-11 14:03 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael S. Tsirkin,
	libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 11/11/2011 04:15 AM, Kevin Wolf wrote:
> On 10.11.2011 22:30, Anthony Liguori wrote:
>> Live migration with qcow2 or any other image format is just not going to work
>> right now even with proper clustered storage.  I think doing a block level flush
>> cache interface and letting block devices decide how to do it is the best approach.
>
> I would really prefer reusing the existing open/close code. It means
> less (duplicated) code, it's existing code that is well tested, and it
> doesn't make migration much of a special case.

Just to be clear, reopen only addresses image format migration.  It does not 
address NFS migration since it doesn't guarantee close-to-open semantics.

The problem I have with the reopen patches is that they introduce regressions 
and change semantics for a management tool.  If you look at the libvirt 
workflow with encrypted disks, it would break with the reopen patches.

>
> If you want to avoid reopening the file on the OS level, we can reopen
> only the topmost layer (i.e. the format, but not the protocol) for now
> and in 1.1 we can use bdrv_reopen().

I don't view not supporting migration with image formats as a regression as it's 
never been a feature we've supported.  While there might be confusion about 
support around NFS, I think it's always been clear that image formats cannot be 
used.

Given that, I don't think this is a candidate for 1.0.

Regards,

Anthony Liguori


>
> Kevin
>


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-11 14:03               ` Anthony Liguori
@ 2011-11-11 14:29                 ` Kevin Wolf
  2011-11-11 14:35                   ` Anthony Liguori
  2011-11-12 10:27                 ` Avi Kivity
  1 sibling, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2011-11-11 14:29 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael S. Tsirkin,
	libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 11.11.2011 15:03, Anthony Liguori wrote:
> On 11/11/2011 04:15 AM, Kevin Wolf wrote:
>> On 10.11.2011 22:30, Anthony Liguori wrote:
>>> Live migration with qcow2 or any other image format is just not going to work
>>> right now even with proper clustered storage.  I think doing a block level flush
>>> cache interface and letting block devices decide how to do it is the best approach.
>>
>> I would really prefer reusing the existing open/close code. It means
>> less (duplicated) code, it's existing code that is well tested, and it
>> doesn't make migration much of a special case.
> 
> Just to be clear, reopen only addresses image format migration.  It does not 
> address NFS migration since it doesn't guarantee close-to-open semantics.

Yes. But image formats are the only thing that is really completely
broken today. For NFS etc. we can tell users to use
cache=none/directsync and they will be good. There is no such option
that makes image formats safe.

>> The problem I have with the reopen patches is that they introduce regressions
>> and change semantics for a management tool.  If you look at the libvirt
> workflow with encrypted disks, it would break with the reopen patches.

Yes, this is nasty. But on the other hand: Today migration is broken for
all qcow2 images; with the reopen it's only broken for encrypted ones.
Certainly an improvement, even though there's still a bug left.

>> If you want to avoid reopening the file on the OS level, we can reopen
>> only the topmost layer (i.e. the format, but not the protocol) for now
>> and in 1.1 we can use bdrv_reopen().
> 
> I don't view not supporting migration with image formats as a regression as it's 
> never been a feature we've supported.  While there might be confusion about 
> support around NFS, I think it's always been clear that image formats cannot be 
> used.
> 
> Given that, I don't think this is a candidate for 1.0.

Nobody says it's a regression, but it's a bad bug and you're blocking a
solution for it for over a year now because the solution isn't perfect
enough in your eyes. :-(

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-11 14:29                 ` Kevin Wolf
@ 2011-11-11 14:35                   ` Anthony Liguori
  2011-11-11 14:44                     ` Kevin Wolf
  0 siblings, 1 reply; 102+ messages in thread
From: Anthony Liguori @ 2011-11-11 14:35 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael S. Tsirkin,
	libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 11/11/2011 08:29 AM, Kevin Wolf wrote:
> On 11.11.2011 15:03, Anthony Liguori wrote:
>> On 11/11/2011 04:15 AM, Kevin Wolf wrote:
>>> On 10.11.2011 22:30, Anthony Liguori wrote:
>>>> Live migration with qcow2 or any other image format is just not going to work
>>>> right now even with proper clustered storage.  I think doing a block level flush
>>>> cache interface and letting block devices decide how to do it is the best approach.
>>>
>>> I would really prefer reusing the existing open/close code. It means
>>> less (duplicated) code, it's existing code that is well tested, and it
>>> doesn't make migration much of a special case.
>>
>> Just to be clear, reopen only addresses image format migration.  It does not
>> address NFS migration since it doesn't guarantee close-to-open semantics.
>
> Yes. But image formats are the only thing that is really completely
> broken today. For NFS etc. we can tell users to use
> cache=none/directsync and they will be good. There is no such option
> that makes image formats safe.
>
>>> The problem I have with the reopen patches is that they introduce regressions
>>> and change semantics for a management tool.  If you look at the libvirt
>> workflow with encrypted disks, it would break with the reopen patches.
>
> Yes, this is nasty. But on the other hand: Today migration is broken for
> all qcow2 images; with the reopen it's only broken for encrypted ones.
> Certainly an improvement, even though there's still a bug left.

This sounds like a good thing to work through in the next release.

>
>>> If you want to avoid reopening the file on the OS level, we can reopen
>>> only the topmost layer (i.e. the format, but not the protocol) for now
>>> and in 1.1 we can use bdrv_reopen().
>>
>> I don't view not supporting migration with image formats as a regression as it's
>> never been a feature we've supported.  While there might be confusion about
>> support around NFS, I think it's always been clear that image formats cannot be
>> used.
>>
>> Given that, I don't think this is a candidate for 1.0.
>
> Nobody says it's a regression, but it's a bad bug and you're blocking a
> solution for it for over a year now because the solution isn't perfect
> enough in your eyes. :-(

This patch was posted a year ago.  Feedback was provided and there was never any 
follow-up[1].  I've never Nack'd this approach.  I can't see how I was blocking 
this since I never even responded in the thread.  If this came in before soft 
freeze, I wouldn't have objected if you wanted to go in this direction.

This is not a bug fix, this is a new feature.  We're long past feature freeze. 
It's not a simple and obvious fix either.  It only partially fixes the problem 
and introduces other problems.  It's not a good candidate for making an 
exception at this stage in the release.

[1] http://mid.gmane.org/cover.1294150511.git.quintela@redhat.com

Regards,

Anthony Liguori

>
> Kevin
>


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-11 14:35                   ` Anthony Liguori
@ 2011-11-11 14:44                     ` Kevin Wolf
  2011-11-11 20:38                       ` Anthony Liguori
  0 siblings, 1 reply; 102+ messages in thread
From: Kevin Wolf @ 2011-11-11 14:44 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael S. Tsirkin,
	libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 11.11.2011 15:35, Anthony Liguori wrote:
> On 11/11/2011 08:29 AM, Kevin Wolf wrote:
>> On 11.11.2011 15:03, Anthony Liguori wrote:
>>> On 11/11/2011 04:15 AM, Kevin Wolf wrote:
>>>> On 10.11.2011 22:30, Anthony Liguori wrote:
>>>>> Live migration with qcow2 or any other image format is just not going to work
>>>>> right now even with proper clustered storage.  I think doing a block level flush
>>>>> cache interface and letting block devices decide how to do it is the best approach.
>>>>
>>>> I would really prefer reusing the existing open/close code. It means
>>>> less (duplicated) code, it's existing code that is well tested, and it
>>>> doesn't make migration much of a special case.
>>>
>>> Just to be clear, reopen only addresses image format migration.  It does not
>>> address NFS migration since it doesn't guarantee close-to-open semantics.
>>
>> Yes. But image formats are the only thing that is really completely
>> broken today. For NFS etc. we can tell users to use
>> cache=none/directsync and they will be good. There is no such option
>> that makes image formats safe.
>>
>>> The problem I have with the reopen patches is that they introduce regressions
>>> and change semantics for a management tool.  If you look at the libvirt
>>> workflow with encrypted disks, it would break with the reopen patches.
>>
>> Yes, this is nasty. But on the other hand: Today migration is broken for
>> all qcow2 images; with the reopen it's only broken for encrypted ones.
>> Certainly an improvement, even though there's still a bug left.
> 
> This sounds like a good thing to work through in the next release.
> 
>>
>>>> If you want to avoid reopening the file on the OS level, we can reopen
>>>> only the topmost layer (i.e. the format, but not the protocol) for now
>>>> and in 1.1 we can use bdrv_reopen().
>>>
>>> I don't view not supporting migration with image formats as a regression as it's
>>> never been a feature we've supported.  While there might be confusion about
>>> support around NFS, I think it's always been clear that image formats cannot be
>>> used.
>>>
>>> Given that, I don't think this is a candidate for 1.0.
>>
>> Nobody says it's a regression, but it's a bad bug and you're blocking a
>> solution for it for over a year now because the solution isn't perfect
>> enough in your eyes. :-(
> 
> This patch was posted a year ago.  Feedback was provided and there was never any
> follow-up[1].  I've never Nack'd this approach.  I can't see how I was blocking
> this since I never even responded in the thread.  If this came in before soft 
> freeze, I wouldn't have objected if you wanted to go in this direction.
> 
> This is not a bug fix, this is a new feature.  We're long past feature freeze. 
> It's not a simple and obvious fix either.  It only partially fixes the problem 
> and introduces other problems.  It's not a good candidate for making an 
> exception at this stage in the release.
> 
> [1] http://mid.gmane.org/cover.1294150511.git.quintela@redhat.com

Then please send a fix that fails migration with non-raw images. Not
breaking images silently during migration is critical for 1.0, IMO.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-11 14:44                     ` Kevin Wolf
@ 2011-11-11 20:38                       ` Anthony Liguori
  0 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-11 20:38 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael S. Tsirkin,
	libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 11/11/2011 08:44 AM, Kevin Wolf wrote:
> On 11.11.2011 15:35, Anthony Liguori wrote:
>> This is not a bug fix, this is a new feature.  We're long past feature freeze.
>> It's not a simple and obvious fix either.  It only partially fixes the problem
>> and introduces other problems.  It's not a good candidate for making an
>> exception at this stage in the release.
>>
>> [1] http://mid.gmane.org/cover.1294150511.git.quintela@redhat.com
>
> Then please send a fix that fails migration with non-raw images. Not
> breaking images silently during migration is critical for 1.0, IMO.

I sent a quick series.  If you want to do things differently for the block 
layer, like adding a field to BlockDriver to indicate whether migration is 
supported and registering the blocker in the core code, feel free to do that.

Regards,

Anthony Liguori

>
> Kevin
>


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-10 17:54           ` [Qemu-devel] " Anthony Liguori
@ 2011-11-12 10:20             ` Avi Kivity
  -1 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2011-11-12 10:20 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/10/2011 07:54 PM, Anthony Liguori wrote:
>> IMO, this should be a release blocker.  qemu 1.0 only supporting
>> migration on enterprise storage?
>
>
> No, this is not going to block the release.
>
> You can't dump patches on the ML during -rc for an issue that has been
> understood for well over a year simply because it's release time.
>
> If this was so important, it should have been fixed a year ago in the
> proper way.

Nor can you yank support for migration this way.  Might as well put a
big sign on 1.0, "Do Not Use This Release".

Making formal plans and sticking to them is great, but not to the point
of ignoring reality.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-11 10:15               ` [Qemu-devel] " Kevin Wolf
@ 2011-11-12 10:25                 ` Avi Kivity
  -1 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2011-11-12 10:25 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Anthony Liguori, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Daniel P. Berrange, libvir-list

On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> On 10.11.2011 22:30, Anthony Liguori wrote:
> > Live migration with qcow2 or any other image format is just not going to work 
> > right now even with proper clustered storage.  I think doing a block level flush 
> > cache interface and letting block devices decide how to do it is the best approach.
>
> I would really prefer reusing the existing open/close code. It means
> less (duplicated) code, it's existing code that is well tested, and it
> doesn't make migration much of a special case.
>
> If you want to avoid reopening the file on the OS level, we can reopen
> only the topmost layer (i.e. the format, but not the protocol) for now
> and in 1.1 we can use bdrv_reopen().
>

Intuitively I dislike _reopen_-style interfaces.  If the second open
yields different results from the first, does it invalidate any
computations in between?

What's wrong with just delaying the open?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-11 14:03               ` Anthony Liguori
  2011-11-11 14:29                 ` Kevin Wolf
@ 2011-11-12 10:27                 ` Avi Kivity
  2011-11-12 13:39                   ` Anthony Liguori
  1 sibling, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2011-11-12 10:27 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/11/2011 04:03 PM, Anthony Liguori wrote:
>
> I don't view not supporting migration with image formats as a
> regression as it's never been a feature we've supported.  While there
> might be confusion about support around NFS, I think it's always been
> clear that image formats cannot be used.

Was there ever a statement to that effect?  It was never clear to me and
I doubt it was clear to anyone.

>
> Given that, I don't think this is a candidate for 1.0.
>

Let's just skip 1.0 and do 1.1 instead.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-12 10:20             ` [Qemu-devel] " Avi Kivity
@ 2011-11-12 13:30               ` Anthony Liguori
  -1 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-12 13:30 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/12/2011 04:20 AM, Avi Kivity wrote:
> On 11/10/2011 07:54 PM, Anthony Liguori wrote:
>>> IMO, this should be a release blocker.  qemu 1.0 only supporting
>>> migration on enterprise storage?
>>
>>
>> No, this is not going to block the release.
>>
>> You can't dump patches on the ML during -rc for an issue that has been
>> understood for well over a year simply because it's release time.
>>
>> If this was so important, it should have been fixed a year ago in the
>> proper way.
>
> Nor can you yank support for migration this way.  Might as well put a
> big sign on 1.0, "Do Not Use This Release".

You're joking, right?

Let's be very clear.  Live migration works perfectly fine when you use raw 
images and coherent shared storage.

NFS is *not* fully coherent, so in order to do live migration with NFS, you have 
to use cache=none.
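
Concretely, the supported setup looks something like this (paths and ports are 
illustrative):

    # source: raw image on coherent shared storage, page cache bypassed
    qemu -drive file=/mnt/shared/guest.img,format=raw,cache=none -m 1024

    # destination: same image, waiting for the migration stream
    qemu -drive file=/mnt/shared/guest.img,format=raw,cache=none -m 1024 \
         -incoming tcp:0:4444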

Live migration does not work with image formats.  There's no simple way to make 
it support image files, and so far no one has put in the work to change that.

>
> Making formal plans and sticking to them is great, but not to the point
> of ignoring reality.

Why do you think this is an end-of-the-world feature?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-12 10:27                 ` Avi Kivity
@ 2011-11-12 13:39                   ` Anthony Liguori
  2011-11-12 14:43                     ` Avi Kivity
  0 siblings, 1 reply; 102+ messages in thread
From: Anthony Liguori @ 2011-11-12 13:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/12/2011 04:27 AM, Avi Kivity wrote:
> On 11/11/2011 04:03 PM, Anthony Liguori wrote:
>>
>> I don't view not supporting migration with image formats as a
>> regression as it's never been a feature we've supported.  While there
>> might be confusion about support around NFS, I think it's always been
>> clear that image formats cannot be used.
>
> Was there ever a statement to that effect?  It was never clear to me and
> I doubt it was clear to anyone.

You literally reviewed a patch whose subject was "block: allow migration to work 
with image files"[1] that explained in gory detail what the problem was.

[1] http://mid.gmane.org/4C8CAD7C.5020102@redhat.com

>
>>
>> Given that, I don't think this is a candidate for 1.0.
>>
>
> Let's just skip 1.0 and do 1.1 instead.

Let's stop being overly dramatic.  You know as well as anyone that image format 
support up until the coroutine conversion has had enough problems that no one 
could practically be using them in a production environment.

Live migration is an availability feature.  Up until the 1.0 release, if you 
cared about availability and correctness, you would not be using an image format.

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-12 13:30               ` [Qemu-devel] " Anthony Liguori
@ 2011-11-12 14:36                 ` Avi Kivity
  -1 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2011-11-12 14:36 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Lucas Meneghel Rodrigues, Kevin Wolf, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/12/2011 03:30 PM, Anthony Liguori wrote:
>> Nor can you yank support for migration this way.  Might as well put a
>> big sign on 1.0, "Do Not Use This Release".
>
>
> You're joking, right?

No.

>
> Let's be very clear.  Live migration works perfectly fine when you use
> raw images and coherent shared storage.

This is not a realistic setup.

>
> NFS is *not* fully coherent so in order to do live migration with NFS,
> you have to use cache=none.

While putting restrictions on features is not ideal, requiring
cache=none is acceptable, as that's what's recommended anyway by the
qemu management tool writer's guide.

>
> Live migration does not work with image formats.  There's not a simple
> way to make it support image files.  So far, no one has put the work
> into making it support image files.

It must be possible, since RHEL's qemu-kvm supports it.  I'm sure Kevin
and Juan can make it work.

>
>>
>> Making formal plans and sticking to them is great, but not to the point
>> of ignoring reality.
>
> Why do you think this is an end of the world feature?
>

In 2007 when live migration was added, random restrictions like that
weren't important.  4.5 years later, pulling support for it makes us
look like a joke.  Even if it's not technically a regression, in terms
of users' expectations, it is.  It's simply impossible to delay it for
six more months, or however long the 1.1 cycle takes.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-12 13:39                   ` Anthony Liguori
@ 2011-11-12 14:43                     ` Avi Kivity
  2011-11-12 16:01                       ` Anthony Liguori
  0 siblings, 1 reply; 102+ messages in thread
From: Avi Kivity @ 2011-11-12 14:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/12/2011 03:39 PM, Anthony Liguori wrote:
> On 11/12/2011 04:27 AM, Avi Kivity wrote:
>> On 11/11/2011 04:03 PM, Anthony Liguori wrote:
>>>
>>> I don't view not supporting migration with image formats as a
>>> regression as it's never been a feature we've supported.  While there
>>> might be confusion about support around NFS, I think it's always been
>>> clear that image formats cannot be used.
>>
>> Was there ever a statement to that effect?  It was never clear to me and
>> I doubt it was clear to anyone.
>
> You literally reviewed a patch whose subject was "block: allow
> migration to work with image files"[1] that explained in gory detail
> what the problem was.
>
> [1] http://mid.gmane.org/4C8CAD7C.5020102@redhat.com
>

Isn't a patch fixing a problem with migrating image files a statement
that we do support migrating image files?


>>
>>>
>>> Given that, I don't think this is a candidate for 1.0.
>>>
>>
>> Let's just skip 1.0 and do 1.1 instead.
>
> Let's stop being overly dramatic.  You know as well as anyone that
> image format support up until the coroutine conversion has had enough
> problems that no one could practically be using them in a production
> environment.

They are used in production environments.

>
> Live migration is an availability feature.  Up until the 1.0 release,
> if you cared about availability and correctness, you would not be
> using an image format.
>

Nevertheless, people who care about both availability and correctness
do use image formats.  In reality, migration and image formats are
critical features for virtualization workloads.  Pretending they're not
makes the 1.0 release a joke.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-12 14:43                     ` Avi Kivity
@ 2011-11-12 16:01                       ` Anthony Liguori
  0 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-12 16:01 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira

On 11/12/2011 08:43 AM, Avi Kivity wrote:
> On 11/12/2011 03:39 PM, Anthony Liguori wrote:
>> On 11/12/2011 04:27 AM, Avi Kivity wrote:
>>> On 11/11/2011 04:03 PM, Anthony Liguori wrote:
>>>>
>>>> I don't view not supporting migration with image formats as a
>>>> regression as it's never been a feature we've supported.  While there
>>>> might be confusion about support around NFS, I think it's always been
>>>> clear that image formats cannot be used.
>>>
>>> Was there ever a statement to that effect?  It was never clear to me and
>>> I doubt it was clear to anyone.
>>
>> You literally reviewed a patch whose subject was "block: allow
>> migration to work with image files"[1] that explained in gory detail
>> what the problem was.
>>
>> [1] http://mid.gmane.org/4C8CAD7C.5020102@redhat.com
>>
>
> Isn't a patch fixing a problem with migrating image files a statement
> that we do support migrating image files?
>

You know, we could go 9 rounds about this and it really doesn't matter.

For 1.0, I feel very strongly that we cannot change the semantics of migration 
with raw as dramatically as has been proposed.  That's a huge regression risk.

But since live migration with qcow2 has never worked, there's really not a lot 
of harm in adding something that makes qcow2 with live migration work better 
than it does right now.

I just sent out a series that does this.  It's Kevin's original idea since 
actually reopening the file doesn't help anything.  The only thing that's 
different from what I expect Kevin would want is that this is restricted to qcow2.

I want this restricted for 1.0.  If, once 1.1 opens up, Kevin wants to promote that 
code to generic block layer code, that's fine with me.

It's also included as part of the migration blockers series so that we are very 
explicit about when we don't support migration for given image formats.
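
To illustrate the shape of the idea (a self-contained model, not the
actual patches; every name below is made up):

    /* Formats that provide an invalidate-cache hook may be migrated;
     * anything else is explicitly refused.  Hypothetical sketch. */
    #include <stdio.h>

    typedef struct BlockDriver {
        const char *format_name;
        void (*invalidate_cache)(void);  /* drop cached metadata, re-read it */
    } BlockDriver;

    static void qcow2_invalidate_cache(void)
    {
        /* real code would throw away cached L1/L2/refcount tables here */
        printf("qcow2: metadata caches invalidated\n");
    }

    static BlockDriver qcow2 = { "qcow2", qcow2_invalidate_cache };
    static BlockDriver other = { "someformat", NULL };

    static int migration_allowed(const BlockDriver *drv)
    {
        if (!drv->invalidate_cache) {
            fprintf(stderr, "Migration is not supported with format '%s'\n",
                    drv->format_name);
            return 0;
        }
        return 1;
    }

    int main(void)
    {
        if (migration_allowed(&qcow2)) {
            qcow2.invalidate_cache();  /* destination side, before resuming */
        }
        migration_allowed(&other);     /* would become a migration blocker */
        return 0;
    }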

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-12 10:25                 ` Avi Kivity
@ 2011-11-14  9:58                   ` Kevin Wolf
  -1 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2011-11-14  9:58 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Daniel P. Berrange, libvir-list

On 12.11.2011 11:25, Avi Kivity wrote:
> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
>> On 10.11.2011 22:30, Anthony Liguori wrote:
>>> Live migration with qcow2 or any other image format is just not going to work 
>>> right now even with proper clustered storage.  I think doing a block level flush 
>>> cache interface and letting block devices decide how to do it is the best approach.
>>
>> I would really prefer reusing the existing open/close code. It means
>> less (duplicated) code, is existing code that is well tested and doesn't
>> make migration much of a special case.
>>
>> If you want to avoid reopening the file on the OS level, we can reopen
>> only the topmost layer (i.e. the format, but not the protocol) for now
>> and in 1.1 we can use bdrv_reopen().
> 
> Intuitively I dislike _reopen style interfaces.  If the second open
> yields different results from the first, does it invalidate any
> computations in between?

Not sure what results and what computation you mean, but let me clarify
a bit about bdrv_reopen:

The main purpose of bdrv_reopen() is to change flags, for example toggle
O_SYNC during runtime in order to allow the guest to toggle WCE. This
doesn't necessarily mean a close()/open() sequence if there are other
means to change the flags, like fcntl() (or even using other protocols
than files).
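
A minimal sketch of that fcntl() route (not QEMU code):

    /* Toggle a file status flag on an already-open fd, avoiding a
     * close()/open() cycle.  Note that Linux only honours a subset of
     * flags via F_SETFL (O_APPEND, O_DIRECT, O_NONBLOCK, ...); O_SYNC
     * in particular cannot be changed this way on Linux, which is
     * exactly the kind of case where a real reopen is still needed. */
    #define _GNU_SOURCE            /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdbool.h>

    static int toggle_flag(int fd, int flag, bool enable)
    {
        int flags = fcntl(fd, F_GETFL);
        if (flags < 0) {
            return -1;
        }
        flags = enable ? (flags | flag) : (flags & ~flag);
        return fcntl(fd, F_SETFL, flags);
    }

    /* e.g. toggle_flag(fd, O_DIRECT, true) bypasses the host page cache */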

The idea here was to extend this to invalidate all caches if some
specific flag is set. As you don't change any other flag, this will
usually not be a reopen on a lower level.

If we need to use open() though, and it fails (this is really the only
"different" result that comes to mind) then bdrv_reopen() would fail and
the old fd would stay in use. Migration would have to fail, but I don't
think this case is ever needed for reopening after migration.

> What's wrong with just delaying the open?

Nothing, except that with today's code it's harder to do.

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14  9:58                   ` Kevin Wolf
@ 2011-11-14 10:10                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2011-11-14 10:10 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Lucas Meneghel Rodrigues, KVM mailing list,
	Juan Jose Quintela Carreira, libvir-list, Marcelo Tosatti,
	QEMU devel, Avi Kivity

On Mon, Nov 14, 2011 at 10:58:16AM +0100, Kevin Wolf wrote:
> On 12.11.2011 11:25, Avi Kivity wrote:
> > On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> >> On 10.11.2011 22:30, Anthony Liguori wrote:
> >>> Live migration with qcow2 or any other image format is just not going to work 
> >>> right now even with proper clustered storage.  I think doing a block level flush 
> >>> cache interface and letting block devices decide how to do it is the best approach.
> >>
> >> I would really prefer reusing the existing open/close code. It means
> >> less (duplicated) code, is existing code that is well tested and doesn't
> >> make migration much of a special case.
> >>
> >> If you want to avoid reopening the file on the OS level, we can reopen
> >> only the topmost layer (i.e. the format, but not the protocol) for now
> >> and in 1.1 we can use bdrv_reopen().
> > 
> > Intuitively I dislike _reopen style interfaces.  If the second open
> > yields different results from the first, does it invalidate any
> > computations in between?
> 
> Not sure what results and what computation you mean, but let me clarify
> a bit about bdrv_reopen:
> 
> The main purpose of bdrv_reopen() is to change flags, for example toggle
> O_SYNC during runtime in order to allow the guest to toggle WCE. This
> doesn't necessarily mean a close()/open() sequence if there are other
> means to change the flags, like fcntl() (or even using other protocols
> than files).
> 
> The idea here was to extend this to invalidate all caches if some
> specific flag is set. As you don't change any other flag, this will
> usually not be a reopen on a lower level.
> 
> If we need to use open() though, and it fails (this is really the only
> "different" result that comes to mind) then bdrv_reopen() would fail and
> the old fd would stay in use. Migration would have to fail, but I don't
> think this case is ever needed for reopening after migration.
> 
> > What's wrong with just delaying the open?
> 
> Nothing, except that with today's code it's harder to do.
> 
> Kevin

It seems cleaner, though, doesn't it?

-- 
MST

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-12 10:25                 ` Avi Kivity
@ 2011-11-14 10:16                   ` Daniel P. Berrange
  -1 siblings, 0 replies; 102+ messages in thread
From: Daniel P. Berrange @ 2011-11-14 10:16 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Kevin Wolf, Anthony Liguori, Lucas Meneghel Rodrigues,
	KVM mailing list, Michael S. Tsirkin, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > On 10.11.2011 22:30, Anthony Liguori wrote:
> > > Live migration with qcow2 or any other image format is just not going to work 
> > > right now even with proper clustered storage.  I think doing a block level flush 
> > > cache interface and letting block devices decide how to do it is the best approach.
> >
> > I would really prefer reusing the existing open/close code. It means
> > less (duplicated) code, is existing code that is well tested and doesn't
> > make migration much of a special case.
> >
> > If you want to avoid reopening the file on the OS level, we can reopen
> > only the topmost layer (i.e. the format, but not the protocol) for now
> > and in 1.1 we can use bdrv_reopen().
> >
> 
> Intuitively I dislike _reopen style interfaces.  If the second open
> yields different results from the first, does it invalidate any
> computations in between?
> 
> What's wrong with just delaying the open?

If you delay the 'open' until the mgmt app issues 'cont', then you lose
the ability to rollback to the source host upon open failure for most
deployed versions of libvirt. We only fairly recently switched to a five
stage migration handshake to cope with rollback when 'cont' fails.
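
For reference, the five stages are roughly the following (the phase
names and comments are my paraphrase of libvirt's V3 protocol, not its
actual identifiers):

    enum migration_phase {
        MIG_BEGIN,    /* source: validate, produce the migration cookie */
        MIG_PREPARE,  /* destination: start QEMU with -incoming */
        MIG_PERFORM,  /* source: stream the guest state across */
        MIG_FINISH,   /* destination: issue 'cont', report the result */
        MIG_CONFIRM   /* source: kill the old QEMU, or resume on failure */
    };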

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 10:16                   ` Daniel P. Berrange
@ 2011-11-14 10:24                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2011-11-14 10:24 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Avi Kivity, Kevin Wolf, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > On 10.11.2011 22:30, Anthony Liguori wrote:
> > > > Live migration with qcow2 or any other image format is just not going to work 
> > > > right now even with proper clustered storage.  I think doing a block level flush 
> > > > cache interface and letting block devices decide how to do it is the best approach.
> > >
> > > I would really prefer reusing the existing open/close code. It means
> > > less (duplicated) code, is existing code that is well tested and doesn't
> > > make migration much of a special case.
> > >
> > > If you want to avoid reopening the file on the OS level, we can reopen
> > > only the topmost layer (i.e. the format, but not the protocol) for now
> > > and in 1.1 we can use bdrv_reopen().
> > >
> > 
> > Intuitively I dislike _reopen style interfaces.  If the second open
> > yields different results from the first, does it invalidate any
> > computations in between?
> > 
> > What's wrong with just delaying the open?
> 
> > If you delay the 'open' until the mgmt app issues 'cont', then you lose
> the ability to rollback to the source host upon open failure for most
> deployed versions of libvirt. We only fairly recently switched to a five
> stage migration handshake to cope with rollback when 'cont' fails.
> 
> Daniel

I guess reopen can fail as well, so this seems to me to be an important
fix but not a blocker.

> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 10:24                     ` Michael S. Tsirkin
@ 2011-11-14 11:08                       ` Daniel P. Berrange
  -1 siblings, 0 replies; 102+ messages in thread
From: Daniel P. Berrange @ 2011-11-14 11:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Kevin Wolf, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > On 10.11.2011 22:30, Anthony Liguori wrote:
> > > > > Live migration with qcow2 or any other image format is just not going to work 
> > > > > right now even with proper clustered storage.  I think doing a block level flush 
> > > > > cache interface and letting block devices decide how to do it is the best approach.
> > > >
> > > > I would really prefer reusing the existing open/close code. It means
> > > > less (duplicated) code, is existing code that is well tested and doesn't
> > > > make migration much of a special case.
> > > >
> > > > If you want to avoid reopening the file on the OS level, we can reopen
> > > > only the topmost layer (i.e. the format, but not the protocol) for now
> > > > and in 1.1 we can use bdrv_reopen().
> > > >
> > > 
> > > Intuitively I dislike _reopen style interfaces.  If the second open
> > > yields different results from the first, does it invalidate any
> > > computations in between?
> > > 
> > > What's wrong with just delaying the open?
> > 
> > > If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > the ability to rollback to the source host upon open failure for most
> > deployed versions of libvirt. We only fairly recently switched to a five
> > stage migration handshake to cope with rollback when 'cont' fails.
> > 
> > Daniel
> 
> I guess reopen can fail as well, so this seems to me to be an important
> fix but not a blocker.

If the initial open succeeds, then it is far more likely that a later
re-open will succeed too, because you have already eliminated the possibility
of configuration mistakes, and will have caught most storage runtime errors
too. So there is a very significant difference in reliability between doing
an 'open at startup + reopen at cont' vs just 'open at cont'.

Based on the bug reports I see, we want to be very good at detecting and
gracefully handling open errors because they are pretty frequent.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:08                       ` Daniel P. Berrange
@ 2011-11-14 11:21                         ` Kevin Wolf
  -1 siblings, 0 replies; 102+ messages in thread
From: Kevin Wolf @ 2011-11-14 11:21 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Lucas Meneghel Rodrigues, KVM mailing list, Michael S. Tsirkin,
	libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On 14.11.2011 12:08, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
>> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
>>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
>>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
>>>>> On 10.11.2011 22:30, Anthony Liguori wrote:
>>>>>> Live migration with qcow2 or any other image format is just not going to work 
>>>>>> right now even with proper clustered storage.  I think doing a block level flush 
>>>>>> cache interface and letting block devices decide how to do it is the best approach.
>>>>>
>>>>> I would really prefer reusing the existing open/close code. It means
>>>>> less (duplicated) code, is existing code that is well tested and doesn't
>>>>> make migration much of a special case.
>>>>>
>>>>> If you want to avoid reopening the file on the OS level, we can reopen
>>>>> only the topmost layer (i.e. the format, but not the protocol) for now
>>>>> and in 1.1 we can use bdrv_reopen().
>>>>>
>>>>
>>>> Intuitively I dislike _reopen style interfaces.  If the second open
>>>> yields different results from the first, does it invalidate any
>>>> computations in between?
>>>>
>>>> What's wrong with just delaying the open?
>>>
>>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
>>> the ability to rollback to the source host upon open failure for most
>>> deployed versions of libvirt. We only fairly recently switched to a five
>>> stage migration handshake to cope with rollback when 'cont' fails.
>>>
>>> Daniel
>>
>> I guess reopen can fail as well, so this seems to me to be an important
>> fix but not a blocker.
> 
> If the initial open succeeds, then it is far more likely that a later
> re-open will succeed too, because you have already eliminated the possibility
> of configuration mistakes, and will have caught most storage runtime errors
> too. So there is a very significant difference in reliability between doing
> an 'open at startup + reopen at cont' vs just 'open at cont'
> 
> Based on the bug reports I see, we want to be very good at detecting and
> gracefully handling open errors because they are pretty frequent.

Do you have some more details on the kind of errors? Missing files,
permissions, something like this? Or rather something related to the
actual content of an image file?

I'm asking because for avoiding the former, things like access() could
be enough, whereas for the latter we'd have to do a full open.
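
In other words, something as cheap as this would catch the first class
(a sketch, not a proposal for the actual code):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Catches missing files and permission problems early; says nothing
     * about whether the image content itself is usable. */
    static int preflight_check(const char *path)
    {
        if (access(path, R_OK | W_OK) != 0) {
            int err = errno;
            fprintf(stderr, "cannot access %s: %s\n", path, strerror(err));
            return -err;
        }
        return 0;
    }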

Kevin

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:21                         ` [Qemu-devel] " Kevin Wolf
@ 2011-11-14 11:29                           ` Daniel P. Berrange
  -1 siblings, 0 replies; 102+ messages in thread
From: Daniel P. Berrange @ 2011-11-14 11:29 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Michael S. Tsirkin, Avi Kivity, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> On 14.11.2011 12:08, Daniel P. Berrange wrote:
> > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> >>>>> On 10.11.2011 22:30, Anthony Liguori wrote:
> >>>>>> Live migration with qcow2 or any other image format is just not going to work 
> >>>>>> right now even with proper clustered storage.  I think doing a block level flush 
> >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> >>>>>
> >>>>> I would really prefer reusing the existing open/close code. It means
> >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> >>>>> make migration much of a special case.
> >>>>>
> >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> >>>>> and in 1.1 we can use bdrv_reopen().
> >>>>>
> >>>>
> >>>> Intuitively I dislike _reopen style interfaces.  If the second open
> >>>> yields different results from the first, does it invalidate any
> >>>> computations in between?
> >>>>
> >>>> What's wrong with just delaying the open?
> >>>
> >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> >>> the ability to rollback to the source host upon open failure for most
> >>> deployed versions of libvirt. We only fairly recently switched to a five
> >>> stage migration handshake to cope with rollback when 'cont' fails.
> >>>
> >>> Daniel
> >>
> >> I guess reopen can fail as well, so this seems to me to be an important
> >> fix but not a blocker.
> > 
> > If the initial open succeeds, then it is far more likely that a later
> > re-open will succeed too, because you have already eliminated the possibility
> > of configuration mistakes, and will have caught most storage runtime errors
> > too. So there is a very significant difference in reliability between doing
> > an 'open at startup + reopen at cont' vs just 'open at cont'
> > 
> > Based on the bug reports I see, we want to be very good at detecting and
> > gracefully handling open errors because they are pretty frequent.
> 
> Do you have some more details on the kind of errors? Missing files,
> permissions, something like this? Or rather something related to the
> actual content of an image file?

Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
setup. Access permissions due to incorrect user / group setup, or read
only mounts, or SELinux denials. Actual I/O errors are less common and
are not so likely to cause QEMU to fail to start anyway, since QEMU is
likely to just report them to the guest OS instead.


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:08                       ` Daniel P. Berrange
@ 2011-11-14 11:32                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2011-11-14 11:32 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Avi Kivity, Kevin Wolf, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 11:08:02AM +0000, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > > On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > > On 10.11.2011 22:30, Anthony Liguori wrote:
> > > > > > Live migration with qcow2 or any other image format is just not going to work 
> > > > > > right now even with proper clustered storage.  I think doing a block level flush 
> > > > > > cache interface and letting block devices decide how to do it is the best approach.
> > > > >
> > > > > I would really prefer reusing the existing open/close code. It means
> > > > > less (duplicated) code, is existing code that is well tested and doesn't
> > > > > make migration much of a special case.
> > > > >
> > > > > If you want to avoid reopening the file on the OS level, we can reopen
> > > > > only the topmost layer (i.e. the format, but not the protocol) for now
> > > > > and in 1.1 we can use bdrv_reopen().
> > > > >
> > > > 
> > > > Intuitively I dislike _reopen style interfaces.  If the second open
> > > > yields different results from the first, does it invalidate any
> > > > computations in between?
> > > > 
> > > > What's wrong with just delaying the open?
> > > 
> > > If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > > the ability to rollback to the source host upon open failure for most
> > > deployed versions of libvirt. We only fairly recently switched to a five
> > > stage migration handshake to cope with rollback when 'cont' fails.
> > > 
> > > Daniel
> > 
> > I guess reopen can fail as well, so this seems to me to be an important
> > fix but not a blocker.
> 
> If the initial open succeeds, then it is far more likely that a later
> re-open will succeed too, because you have already eliminated the possibility
> of configuration mistakes, and will have caught most storage runtime errors
> too. So there is a very significant difference in reliability between doing
> an 'open at startup + reopen at cont' vs just 'open at cont'
> 
> Based on the bug reports I see, we want to be very good at detecting and
> gracefully handling open errors because they are pretty frequent.
> 
> Regards,
> Daniel

IIUC, the 'cont' that we were discussing is the startup of the VM
at the destination after migration completes. A failure results in
migration failure, which libvirt has been able to handle since forever.
In the case of the 'cont' command on the source upon migration failure,
qemu was running there previously, so its configuration is likely OK.

Am I confused? If no, libvirt seems unaffected.


> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:29                           ` Daniel P. Berrange
@ 2011-11-14 11:34                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2011-11-14 11:34 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Avi Kivity, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
> > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> > >>>>>> Live migration with qcow2 or any other image format is just not going to work 
> > >>>>>> right now even with proper clustered storage.  I think doing a block level flush 
> > >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> > >>>>>
> > >>>>> I would really prefer reusing the existing open/close code. It means
> > >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> > >>>>> make migration much of a special case.
> > >>>>>
> > >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> > >>>>> and in 1.1 we can use bdrv_reopen().
> > >>>>>
> > >>>>
> > >>>> Intuitively I dislike _reopen style interfaces.  If the second open
> > >>>> yields different results from the first, does it invalidate any
> > >>>> computations in between?
> > >>>>
> > >>>> What's wrong with just delaying the open?
> > >>>
> > >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > >>> the ability to rollback to the source host upon open failure for most
> > >>> deployed versions of libvirt. We only fairly recently switched to a five
> > >>> stage migration handshake to cope with rollback when 'cont' fails.
> > >>>
> > >>> Daniel
> > >>
> > >> I guess reopen can fail as well, so this seems to me to be an important
> > >> fix but not a blocker.
> > > 
> > > If the initial open succeeds, then it is far more likely that a later
> > > re-open will succeed too, because you have already eliminated the possibility
> > > of configuration mistakes, and will have caught most storage runtime errors
> > > too. So there is a very significant difference in reliability between doing
> > > an 'open at startup + reopen at cont' vs just 'open at cont'
> > > 
> > > Based on the bug reports I see, we want to be very good at detecting and
> > > gracefully handling open errors because they are pretty frequent.
> > 
> > Do you have some more details on the kind of errors? Missing files,
> > permissions, something like this? Or rather something related to the
> > actual content of an image file?
> 
> Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> setup. Access permissions due to incorrect user / group setup, or read
> only mounts, or SELinux denials. Actual I/O errors are less common and
> > are not so likely to cause QEMU to fail to start at all, since QEMU is
> likely to just report them to the guest OS instead.
> 
> 
> Daniel

Do you run qemu with -S, then give a 'cont' command to start it?

> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:29                           ` Daniel P. Berrange
  (?)
  (?)
@ 2011-11-14 11:36                           ` Gleb Natapov
  -1 siblings, 0 replies; 102+ messages in thread
From: Gleb Natapov @ 2011-11-14 11:36 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Avi Kivity

On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > Do you have some more details on the kind of errors? Missing files,
> > permissions, something like this? Or rather something related to the
> > actual content of an image file?
> 
> Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> setup. Access permissions due to incorrect user / group setup, or read
> only mounts, or SELinux denials. Actual I/O errors are less common and
> are not so likely to cause QEMU to fail to start at all, since QEMU is
> likely to just report them to the guest OS instead.
> 
If started correctly, QEMU should not report them to the guest OS, but
pause instead.
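
(Here "started correctly" presumably means the drive error policy was set
to stop - an illustrative command line, using the existing werror/rerror
drive options:

  -drive file=guest.qcow2,if=virtio,cache=none,werror=stop,rerror=stop

- so host-side I/O errors pause the guest instead of being reported to it.)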

--
			Gleb.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:34                             ` Michael S. Tsirkin
@ 2011-11-14 11:37                               ` Daniel P. Berrange
  -1 siblings, 0 replies; 102+ messages in thread
From: Daniel P. Berrange @ 2011-11-14 11:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Avi Kivity, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
> > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> > > >>>>>> Live migration with qcow2 or any other image format is just not going to work 
> > > >>>>>> right now even with proper clustered storage.  I think doing a block level flush 
> > > >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> > > >>>>>
> > > >>>>> I would really prefer reusing the existing open/close code. It means
> > > >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> > > >>>>> make migration much of a special case.
> > > >>>>>
> > > >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> > > >>>>> and in 1.1 we can use bdrv_reopen().
> > > >>>>>
> > > >>>>
> > > >>>> Intuitively I dislike _reopen style interfaces.  If the second open
> > > >>>> yields different results from the first, does it invalidate any
> > > >>>> computations in between?
> > > >>>>
> > > >>>> What's wrong with just delaying the open?
> > > >>>
> > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > > >>> the ability to rollback to the source host upon open failure for most
> > > >>> deployed versions of libvirt. We only fairly recently switched to a five
> > > >>> stage migration handshake to cope with rollback when 'cont' fails.
> > > >>>
> > > >>> Daniel
> > > >>
> > > >> I guess reopen can fail as well, so this seems to me to be an important
> > > >> fix but not a blocker.
> > > > 
> > > > If the initial open succeeds, then it is far more likely that a later
> > > > re-open will succeed too, because you have already eliminated the possibility
> > > > of configuration mistakes, and will have caught most storage runtime errors
> > > > too. So there is a very significant difference in reliability between doing
> > > > an 'open at startup + reopen at cont' vs just 'open at cont'
> > > > 
> > > > Based on the bug reports I see, we want to be very good at detecting and
> > > > gracefully handling open errors because they are pretty frequent.
> > > 
> > > Do you have some more details on the kind of errors? Missing files,
> > > permissions, something like this? Or rather something related to the
> > > actual content of an image file?
> > 
> > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> > setup. Access permissions due to incorrect user / group setup, or read
> > only mounts, or SELinux denials. Actual I/O errors are less common and
> > > are not so likely to cause QEMU to fail to start at all, since QEMU is
> > likely to just report them to the guest OS instead.
> 
> Do you run qemu with -S, then give a 'cont' command to start it?

Yes
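
That is, roughly (illustrative, not the exact libvirt invocation):

  qemu -S -incoming tcp:0.0.0.0:49152 ...
  ... migration stream and libvirt handshake complete ...
  (qemu) cont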

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:37                               ` Daniel P. Berrange
@ 2011-11-14 11:51                                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2011-11-14 11:51 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Avi Kivity, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
> > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> > > > >>>>>> Live migration with qcow2 or any other image format is just not going to work 
> > > > >>>>>> right now even with proper clustered storage.  I think doing a block level flush 
> > > > >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> > > > >>>>>
> > > > >>>>> I would really prefer reusing the existing open/close code. It means
> > > > >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> > > > >>>>> make migration much of a special case.
> > > > >>>>>
> > > > >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> > > > >>>>> and in 1.1 we can use bdrv_reopen().
> > > > >>>>>
> > > > >>>>
> > > > >>>> Intuitively I dislike _reopen style interfaces.  If the second open
> > > > >>>> yields different results from the first, does it invalidate any
> > > > >>>> computations in between?
> > > > >>>>
> > > > >>>> What's wrong with just delaying the open?
> > > > >>>
> > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > > > >>> the ability to rollback to the source host upon open failure for most
> > > > >>> deployed versions of libvirt. We only fairly recently switched to a five
> > > > >>> stage migration handshake to cope with rollback when 'cont' fails.
> > > > >>>
> > > > >>> Daniel
> > > > >>
> > > > >> I guess reopen can fail as well, so this seems to me to be an important
> > > > >> fix but not a blocker.
> > > > > 
> > > > > If the initial open succeeds, then it is far more likely that a later
> > > > > re-open will succeed too, because you have already eliminated the possibility
> > > > > of configuration mistakes, and will have caught most storage runtime errors
> > > > > too. So there is a very significant difference in reliability between doing
> > > > > an 'open at startup + reopen at cont' vs just 'open at cont'
> > > > > 
> > > > > Based on the bug reports I see, we want to be very good at detecting and
> > > > > gracefully handling open errors because they are pretty frequent.
> > > > 
> > > > Do you have some more details on the kind of errors? Missing files,
> > > > permissions, something like this? Or rather something related to the
> > > > actual content of an image file?
> > > 
> > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> > > setup. Access permissions due to incorrect user / group setup, or read
> > > only mounts, or SELinux denials. Actual I/O errors are less common and
> > > > are not so likely to cause QEMU to fail to start at all, since QEMU is
> > > likely to just report them to the guest OS instead.
> > 
> > Do you run qemu with -S, then give a 'cont' command to start it?
> 
> Yes
> 
> Daniel

Probably in an attempt to improve reliability :)

So this is in fact unrelated to migration. We can either ignore this
bug (assuming no distro ships a cutting-edge qemu with an old libvirt), or
special-case -S and do an open/close cycle on startup.
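
The open/close cycle could be a fairly small amount of code. An untested
sketch with invented names (assuming the usual BlockDriver callbacks and
an open_flags field on the state; presumably this is roughly what
bdrv_reopen() would become in 1.1):

/* Sketch only: re-read the format layer's metadata (qcow2 header,
 * L1/L2 tables, refcounts) by closing and reopening just the topmost
 * driver, leaving the protocol layer and its file descriptor alone. */
static int bdrv_reread_format_metadata(BlockDriverState *bs)
{
    BlockDriver *drv = bs->drv;
    int ret;

    if (!drv) {
        return -ENOMEDIUM;
    }

    drv->bdrv_close(bs);                  /* drop any cached metadata */
    ret = drv->bdrv_open(bs, bs->open_flags);
    if (ret < 0) {
        bs->drv = NULL;                   /* unusable beats corrupted */
    }
    return ret;
}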


> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:51                                 ` Michael S. Tsirkin
@ 2011-11-14 11:55                                   ` Daniel P. Berrange
  -1 siblings, 0 replies; 102+ messages in thread
From: Daniel P. Berrange @ 2011-11-14 11:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Avi Kivity, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 01:51:40PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote:
> > On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
> > > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> > > > > >>>>>> Live migration with qcow2 or any other image format is just not going to work 
> > > > > >>>>>> right now even with proper clustered storage.  I think doing a block level flush 
> > > > > >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> > > > > >>>>>
> > > > > >>>>> I would really prefer reusing the existing open/close code. It means
> > > > > >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> > > > > >>>>> make migration much of a special case.
> > > > > >>>>>
> > > > > >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> > > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> > > > > >>>>> and in 1.1 we can use bdrv_reopen().
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>> Intuitively I dislike _reopen style interfaces.  If the second open
> > > > > >>>> yields different results from the first, does it invalidate any
> > > > > >>>> computations in between?
> > > > > >>>>
> > > > > >>>> What's wrong with just delaying the open?
> > > > > >>>
> > > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > > > > >>> the ability to rollback to the source host upon open failure for most
> > > > > >>> deployed versions of libvirt. We only fairly recently switched to a five
> > > > > >>> stage migration handshake to cope with rollback when 'cont' fails.
> > > > > >>>
> > > > > >>> Daniel
> > > > > >>
> > > > > >> I guess reopen can fail as well, so this seems to me to be an important
> > > > > >> fix but not a blocker.
> > > > > > 
> > > > > > If the initial open succeeds, then it is far more likely that a later
> > > > > > re-open will succeed too, because you have already eliminated the possibility
> > > > > > of configuration mistakes, and will have caught most storage runtime errors
> > > > > > too. So there is a very significant difference in reliability between doing
> > > > > > an 'open at startup + reopen at cont' vs just 'open at cont'
> > > > > > 
> > > > > > Based on the bug reports I see, we want to be very good at detecting and
> > > > > > gracefully handling open errors because they are pretty frequent.
> > > > > 
> > > > > Do you have some more details on the kind of errors? Missing files,
> > > > > permissions, something like this? Or rather something related to the
> > > > > actual content of an image file?
> > > > 
> > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> > > > setup. Access permissions due to incorrect user / group setup, or read
> > > > only mounts, or SELinux denials. Actual I/O errors are less common and
> > > > > are not so likely to cause QEMU to fail to start at all, since QEMU is
> > > > likely to just report them to the guest OS instead.
> > > 
> > > Do you run qemu with -S, then give a 'cont' command to start it?
> 
> Probably in an attempt to improve reliability :)

Not really. We can't simply let QEMU start its own CPUs, because there are
various tasks that need to be performed after the migration transfer finishes,
but before the CPUs are allowed to be started, e.g.:

 - Finish 802.1Qb{g,h} (VEPA) network port profile association on target
 - Release leases for any resources associated with the source QEMU
   via a configured lock manager (e.g. sanlock)
 - Acquire leases for any resources associated with the target QEMU
   via a configured lock manager (e.g. sanlock)
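
So the effective destination-side ordering is:

  1. qemu -S -incoming ...   (images opened, configuration validated)
  2. migration stream completes
  3. port profile association and lease handover
  4. 'cont'                  (CPUs finally start)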

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:37                               ` Daniel P. Berrange
@ 2011-11-14 11:56                                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2011-11-14 11:56 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
	Juan Jose Quintela Carreira, libvir-list, Marcelo Tosatti,
	QEMU devel, Avi Kivity

On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
> > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> > > > >>>>>> Live migration with qcow2 or any other image format is just not going to work 
> > > > >>>>>> right now even with proper clustered storage.  I think doing a block level flush 
> > > > >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> > > > >>>>>
> > > > >>>>> I would really prefer reusing the existing open/close code. It means
> > > > >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> > > > >>>>> make migration much of a special case.
> > > > >>>>>
> > > > >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> > > > >>>>> and in 1.1 we can use bdrv_reopen().
> > > > >>>>>
> > > > >>>>
> > > > >>>> Intuitively I dislike _reopen style interfaces.  If the second open
> > > > >>>> yields different results from the first, does it invalidate any
> > > > >>>> computations in between?
> > > > >>>>
> > > > >>>> What's wrong with just delaying the open?
> > > > >>>
> > > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > > > >>> the ability to rollback to the source host upon open failure for most
> > > > >>> deployed versions of libvirt. We only fairly recently switched to a five
> > > > >>> stage migration handshake to cope with rollback when 'cont' fails.
> > > > >>>
> > > > >>> Daniel
> > > > >>
> > > > >> I guess reopen can fail as well, so this seems to me to be an important
> > > > >> fix but not a blocker.
> > > > > 
> > > > > > If the initial open succeeds, then it is far more likely that a later
> > > > > > re-open will succeed too, because you have already eliminated the possibility
> > > > > of configuration mistakes, and will have caught most storage runtime errors
> > > > > too. So there is a very significant difference in reliability between doing
> > > > > an 'open at startup + reopen at cont' vs just 'open at cont'
> > > > > 
> > > > > Based on the bug reports I see, we want to be very good at detecting and
> > > > > gracefully handling open errors because they are pretty frequent.
> > > > 
> > > > Do you have some more details on the kind of errors? Missing files,
> > > > permissions, something like this? Or rather something related to the
> > > > actual content of an image file?
> > > 
> > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> > > setup. Access permissions due to incorrect user / group setup, or read
> > > only mounts, or SELinux denials. Actual I/O errors are less common and
> > > are not so likely to cause QEMU to fail to start at all, since QEMU is
> > > likely to just report them to the guest OS instead.
> > 
> > Do you run qemu with -S, then give a 'cont' command to start it?
> 
> Yes
> 
> Daniel

OK, so let's go back one step now - how is this related to
'rollback to source host'?

-- 
MST

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:56                                 ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-11-14 11:58                                   ` Daniel P. Berrange
  -1 siblings, 0 replies; 102+ messages in thread
From: Daniel P. Berrange @ 2011-11-14 11:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Avi Kivity, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 01:56:36PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote:
> > On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
> > > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> > > > > >>>>>> Live migration with qcow2 or any other image format is just not going to work 
> > > > > >>>>>> right now even with proper clustered storage.  I think doing a block level flush 
> > > > > >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> > > > > >>>>>
> > > > > >>>>> I would really prefer reusing the existing open/close code. It means
> > > > > >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> > > > > >>>>> make migration much of a special case.
> > > > > >>>>>
> > > > > >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> > > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> > > > > >>>>> and in 1.1 we can use bdrv_reopen().
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>> Intuitively I dislike _reopen style interfaces.  If the second open
> > > > > >>>> yields different results from the first, does it invalidate any
> > > > > >>>> computations in between?
> > > > > >>>>
> > > > > >>>> What's wrong with just delaying the open?
> > > > > >>>
> > > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > > > > >>> the ability to rollback to the source host upon open failure for most
> > > > > >>> deployed versions of libvirt. We only fairly recently switched to a five
> > > > > >>> stage migration handshake to cope with rollback when 'cont' fails.
> > > > > >>>
> > > > > >>> Daniel
> > > > > >>
> > > > > >> I guess reopen can fail as well, so this seems to me to be an important
> > > > > >> fix but not a blocker.
> > > > > > 
> > > > > > If the initial open succeeds, then it is far more likely that a later
> > > > > > re-open will succeed too, because you have already eliminated the possibility
> > > > > > of configuration mistakes, and will have caught most storage runtime errors
> > > > > > too. So there is a very significant difference in reliability between doing
> > > > > > an 'open at startup + reopen at cont' vs just 'open at cont'
> > > > > > 
> > > > > > Based on the bug reports I see, we want to be very good at detecting and
> > > > > > gracefully handling open errors because they are pretty frequent.
> > > > > 
> > > > > Do you have some more details on the kind of errors? Missing files,
> > > > > permissions, something like this? Or rather something related to the
> > > > > actual content of an image file?
> > > > 
> > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> > > > setup. Access permissions due to incorrect user / group setup, or read
> > > > only mounts, or SELinux denials. Actual I/O errors are less common and
> > > > are not so likely to cause QEMU to fail to start at all, since QEMU is
> > > > likely to just report them to the guest OS instead.
> > > 
> > > Do you run qemu with -S, then give a 'cont' command to start it?
> > 
> > Yes
> 
> OK, so let's go back one step now - how is this related to
> 'rollback to source host'?

In the old libvirt migration protocol, by the time we run 'cont' on the
destination, the source QEMU has already been killed off, so there's
nothing to resume on failure.
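
As a rough sketch of the difference (phase names from memory):

  old (3 phases): Prepare(dst) -> Perform(src) -> Finish(dst: 'cont')
                  the source is already gone by Finish, so a failed
                  'cont' can't be rolled back

  new (5 phases): Begin(src) -> Prepare(dst) -> Perform(src) ->
                  Finish(dst: 'cont') -> Confirm(src)
                  the source is only killed in Confirm, after 'cont'
                  has succeeded on the destination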

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 11:58                                   ` Daniel P. Berrange
@ 2011-11-14 12:17                                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2011-11-14 12:17 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Avi Kivity, Anthony Liguori,
	Lucas Meneghel Rodrigues, KVM mailing list, Marcelo Tosatti,
	QEMU devel, Juan Jose Quintela Carreira, libvir-list

On Mon, Nov 14, 2011 at 11:58:14AM +0000, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 01:56:36PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote:
> > > On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > > > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > > > > > Am 14.11.2011 12:08, schrieb Daniel P. Berrange:
> > > > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > > > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > > > >>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> > > > > > >>>>>> Live migration with qcow2 or any other image format is just not going to work 
> > > > > > >>>>>> right now even with proper clustered storage.  I think doing a block level flush 
> > > > > > >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> > > > > > >>>>>
> > > > > > >>>>> I would really prefer reusing the existing open/close code. It means
> > > > > > >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> > > > > > >>>>> make migration much of a special case.
> > > > > > >>>>>
> > > > > > >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> > > > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> > > > > > >>>>> and in 1.1 we can use bdrv_reopen().
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>>> Intuitively I dislike _reopen style interfaces.  If the second open
> > > > > > >>>> yields different results from the first, does it invalidate any
> > > > > > >>>> computations in between?
> > > > > > >>>>
> > > > > > >>>> What's wrong with just delaying the open?
> > > > > > >>>
> > > > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > > > > > >>> the ability to roll back to the source host upon open failure for most
> > > > > > >>> deployed versions of libvirt. We only fairly recently switched to a
> > > > > > >>> five-stage migration handshake to cope with rollback when 'cont' fails.
> > > > > > >>>
> > > > > > >>> Daniel
> > > > > > >>
> > > > > > >> I guess reopen can fail as well, so this seems to me to be an important
> > > > > > >> fix but not a blocker.
> > > > > > > 
> > > > > > > If the initial open succeeds, then it is far more likely that a later
> > > > > > > re-open will succeed too, because you have already eliminated the possibility
> > > > > > > of configuration mistakes, and will have caught most storage runtime errors
> > > > > > > too. So there is a very significant difference in reliability between doing
> > > > > > > an 'open at startup + reopen at cont' vs just 'open at cont'
> > > > > > > 
> > > > > > > Based on the bug reports I see, we want to be very good at detecting and
> > > > > > > gracefully handling open errors because they are pretty frequent.
> > > > > > 
> > > > > > Do you have some more details on the kind of errors? Missing files,
> > > > > > permissions, something like this? Or rather something related to the
> > > > > > actual content of an image file?
> > > > > 
> > > > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> > > > > setup. Access permissions due to incorrect user / group setup, or read
> > > > > only mounts, or SELinux denials. Actual I/O errors are less common and
> > > > > are not so likely to cause QEMU to fail to start at all, since QEMU is
> > > > > likely to just report them to the guest OS instead.
> > > > 
> > > > Do you run qemu with -S, then give a 'cont' command to start it?
> > > 
> > > Yes
> > 
> > OK, so let's go back one step now - how is this related to
> > 'rollback to source host'?
> 
> In the old libvirt migration protocol, by the time we run 'cont' on the
> destination, the source QEMU has already been killed off, so there's
> nothing to resume on failure.
> 
> Daniel

I see. So again, there are two solutions:
1. ignore old libvirt, as it can't restart the source reliably anyway
2. open files when migration is completed (after startup, but before cont);
   a sketch of this ordering follows below
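
A minimal sketch of option 2's ordering on the destination, with every
name hypothetical (an illustration of the intended sequencing, not
QEMU's actual completion path):

    /* Open (or re-validate) the images after the last byte of migration
     * data arrives, but before the management app issues 'cont', so an
     * open failure is still reported while the source can be resumed. */
    extern int  reopen_guest_images(void);        /* hypothetical helper */
    extern void report_migration_failed(void);    /* hypothetical helper */
    extern void report_migration_complete(void);  /* hypothetical helper */

    static void incoming_migration_complete(void)
    {
        if (reopen_guest_images() < 0) {
            report_migration_failed();   /* source may still resume */
            return;
        }
        report_migration_complete();     /* mgmt app can now issue 'cont' */
    }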



> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 10:16                   ` Daniel P. Berrange
@ 2011-11-14 14:19                     ` Anthony Liguori
  0 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-14 14:19 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Avi Kivity, Kevin Wolf, Lucas Meneghel Rodrigues,
	KVM mailing list, Michael S. Tsirkin, libvir-list,
	Marcelo Tosatti, QEMU devel, Juan Jose Quintela Carreira

On 11/14/2011 04:16 AM, Daniel P. Berrange wrote:
> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
>>>> Live migration with qcow2 or any other image format is just not going to work
>>>> right now even with proper clustered storage.  I think doing a block level flush
>>>> cache interface and letting block devices decide how to do it is the best approach.
>>>
>>> I would really prefer reusing the existing open/close code. It means
>>> less (duplicated) code, is existing code that is well tested and doesn't
>>> make migration much of a special case.
>>>
>>> If you want to avoid reopening the file on the OS level, we can reopen
>>> only the topmost layer (i.e. the format, but not the protocol) for now
>>> and in 1.1 we can use bdrv_reopen().
>>>
>>
>> Intuitively I dislike _reopen style interfaces.  If the second open
>> yields different results from the first, does it invalidate any
>> computations in between?
>>
>> What's wrong with just delaying the open?
>
> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> the ability to roll back to the source host upon open failure for most
> deployed versions of libvirt. We only fairly recently switched to a
> five-stage migration handshake to cope with rollback when 'cont' fails.

Delayed open isn't a panacea.  With the series I sent, we should be able to 
migrate with a qcow2 file on coherent shared storage.

There are two other cases that we care about: migration with nfs cache!=none and 
direct attached storage with cache!=none

With NFS, whether the open is deferred matters less than whether the open 
happens after the close on the source.  To fix NFS cache!=none, we would have to do a 
bdrv_close() before sending the last byte of migration data and make sure that 
we bdrv_open() after receiving the last byte of migration data.

The problem with this IMHO is it creates a large window where no one has the file 
open and you're critically vulnerable to losing your VM.

I'm much more in favor of a smarter caching policy.  If we can fcntl() our way 
to O_DIRECT on NFS, that would be fairly interesting.  I'm not sure if this is 
supported today but it's something we could look into adding in the kernel. 
That way we could force NFS to O_DIRECT during migration which would solve this 
problem robustly.
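
For concreteness, a minimal sketch of that fcntl() toggle: F_SETFL can
change O_DIRECT on Linux, and whether a given filesystem (NFS included)
actually honors the switch is exactly the open question here:

    #define _GNU_SOURCE          /* for O_DIRECT */
    #include <fcntl.h>

    static int set_direct_io(int fd, int enable)
    {
        int flags = fcntl(fd, F_GETFL);
        if (flags < 0) {
            return -1;
        }
        flags = enable ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
        /* May fail with EINVAL on filesystems that cannot do
         * direct I/O on this descriptor. */
        return fcntl(fd, F_SETFL, flags);
    }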

Deferred open doesn't help with direct attached storage.  There simply is no 
guarantee that there isn't data in the page cache.

Again, I think defaulting DAS to cache=none|directsync is what makes the most 
sense here.

We can even add a migration blocker for DAS with cache=on.  If we can do dynamic 
toggling of the cache setting, then that's pretty friendly at the end of the day.

Regards,

Anthony Liguori

>
> Daniel


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14 14:19                     ` Anthony Liguori
@ 2011-11-15 13:20                       ` Juan Quintela
  0 siblings, 0 replies; 102+ messages in thread
From: Juan Quintela @ 2011-11-15 13:20 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Daniel P. Berrange, Avi Kivity, Kevin Wolf,
	Lucas Meneghel Rodrigues, KVM mailing list, Michael S. Tsirkin,
	libvir-list, Marcelo Tosatti, QEMU devel

Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/14/2011 04:16 AM, Daniel P. Berrange wrote:
>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
>>>>> Live migration with qcow2 or any other image format is just not going to work
>>>>> right now even with proper clustered storage.  I think doing a block level flush
>>>>> cache interface and letting block devices decide how to do it is the best approach.
>>>>
>>>> I would really prefer reusing the existing open/close code. It means
>>>> less (duplicated) code, is existing code that is well tested and doesn't
>>>> make migration much of a special case.
>>>>
>>>> If you want to avoid reopening the file on the OS level, we can reopen
>>>> only the topmost layer (i.e. the format, but not the protocol) for now
>>>> and in 1.1 we can use bdrv_reopen().
>>>>
>>>
>>> Intuitively I dislike _reopen style interfaces.  If the second open
>>> yields different results from the first, does it invalidate any
>>> computations in between?
>>>
>>> What's wrong with just delaying the open?
>>
>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
>> the ability to roll back to the source host upon open failure for most
>> deployed versions of libvirt. We only fairly recently switched to a
>> five-stage migration handshake to cope with rollback when 'cont' fails.
>
> Delayed open isn't a panacea.  With the series I sent, we should be
> able to migrate with a qcow2 file on coherent shared storage.
>
> There are two other cases that we care about: migration with nfs
> cache!=none and direct attached storage with cache!=none
>
> With NFS, whether the open is deferred matters less than whether the
> open happens after the close on the source.  To fix NFS cache!=none, we
> would have to do a bdrv_close() before sending the last byte of
> migration data and make sure that we bdrv_open() after receiving the
> last byte of migration data.
>
> The problem with this IMHO is it creates a large window where no one
> has the file open and you're critically vulnerable to losing your VM.

Red Hat's NFS guru told us that fsync() on the source + open() after that
on the target is enough.  But anyway, it still depends on nothing else
having the file open on the target.

> I'm much more in favor of a smarter caching policy.  If we can fcntl()
> our way to O_DIRECT on NFS, that would be fairly interesting.  I'm not
> sure if this is supported today but it's something we could look into
> adding in the kernel. That way we could force NFS to O_DIRECT during
> migration which would solve this problem robustly.

We would need O_DIRECT on the target during migration; I agree that that
would work.

> Deferred open doesn't help with direct attached storage.  There simply
> is no guarantee that there isn't data in the page cache.

Yeah, I asked the clustered filesystem people how they fixed this,
because clustered filesystems have the same problem. After lots of arm
twisting, I got ioctl(BLKFLSBUF, ...), but that only works:
- on linux
- on some block devices
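
For reference, the ioctl in question can be exercised as sketched below;
it carries exactly the portability limits just listed:

    #include <linux/fs.h>     /* BLKFLSBUF is Linux-only */
    #include <sys/ioctl.h>

    /* Flush and invalidate the kernel's buffer cache for a block
     * device; on a regular file the ioctl fails (ENOTTY), which is
     * the "only on some block devices" limitation above. */
    static int flush_blockdev_cache(int fd)
    {
        return ioctl(fd, BLKFLSBUF, 0);
    }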

So, we are back to square 1.

> Again, I think defaulting DAS to cache=none|directsync is what makes
> the most sense here.

I think it is the only sane solution.  Otherwise, we need to write the
equivalent of a lock manager, to know _who_ has the storage, and
distributed lock managers are a mess :-(

> We can even add a migration blocker for DAS with cache=on.  If we can
> do dynamic toggling of the cache setting, then that's pretty friendly
> at the end of the day.

That could fix the problem too.  At the moment we start migration,
we do an fsync() + switch to O_DIRECT for all filesystems.

As you said, time for implementing fcntl(O_DIRECT).

Later, Juan.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-14  9:58                   ` Kevin Wolf
@ 2011-11-15 13:28                     ` Avi Kivity
  0 siblings, 0 replies; 102+ messages in thread
From: Avi Kivity @ 2011-11-15 13:28 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Anthony Liguori, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, Marcelo Tosatti, QEMU devel,
	Juan Jose Quintela Carreira, Daniel P. Berrange, libvir-list

On 11/14/2011 11:58 AM, Kevin Wolf wrote:
> Am 12.11.2011 11:25, schrieb Avi Kivity:
> > On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> >> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> >>> Live migration with qcow2 or any other image format is just not going to work 
> >>> right now even with proper clustered storage.  I think doing a block level flush 
> >>> cache interface and letting block devices decide how to do it is the best approach.
> >>
> >> I would really prefer reusing the existing open/close code. It means
> >> less (duplicated) code, is existing code that is well tested and doesn't
> >> make migration much of a special case.
> >>
> >> If you want to avoid reopening the file on the OS level, we can reopen
> >> only the topmost layer (i.e. the format, but not the protocol) for now
> >> and in 1.1 we can use bdrv_reopen().
> > 
> > Intuitively I dislike _reopen style interfaces.  If the second open
> > yields different results from the first, does it invalidate any
> > computations in between?
>
> Not sure what results and what computation you mean,

Result = open succeeded.  Computation = anything that derives from the
image, like size, or reading some stuff to guess CHS or something.

>  but let me clarify
> a bit about bdrv_reopen:
>
> The main purpose of bdrv_reopen() is to change flags, for example toggle
> O_SYNC during runtime in order to allow the guest to toggle WCE. This
> doesn't necessarily mean a close()/open() sequence if there are other
> means to change the flags, like fcntl() (or even using other protocols
> than files).
>
> The idea here was to extend this to invalidate all caches if some
> specific flag is set. As you don't change any other flag, this will
> usually not be a reopen on a lower level.
>
> If we need to use open() though, and it fails (this is really the only
> "different" result that comes to mind)

(yes)

>  then bdrv_reopen() would fail and
> the old fd would stay in use. Migration would have to fail, but I don't
> think this case is ever needed for reopening after migration.

Okay.
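
That failure contract can be sketched as below; this is only an
illustration of the intended semantics, not QEMU's actual bdrv_reopen()
implementation:

    #include <fcntl.h>
    #include <unistd.h>

    /* Acquire the new descriptor first and swap it in only on success,
     * so a failed reopen leaves the old descriptor in use and the
     * caller can simply fail the migration. */
    static int reopen_with_flags(int *fdp, const char *path, int new_flags)
    {
        int new_fd = open(path, new_flags);
        if (new_fd < 0) {
            return -1;          /* old *fdp is still valid */
        }
        close(*fdp);
        *fdp = new_fd;
        return 0;
    }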

>
> > What's wrong with just delaying the open?
>
> Nothing, except that with today's code it's harder to do.
>

This has never stopped us (though it may delay us).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
  2011-11-15 13:20                       ` [Qemu-devel] " Juan Quintela
@ 2011-11-15 13:56                       ` Anthony Liguori
  0 siblings, 0 replies; 102+ messages in thread
From: Anthony Liguori @ 2011-11-15 13:56 UTC (permalink / raw)
  To: quintela
  Cc: Kevin Wolf, Lucas Meneghel Rodrigues, KVM mailing list,
	Michael S. Tsirkin, libvir-list, Marcelo Tosatti, QEMU devel,
	Avi Kivity

On 11/15/2011 07:20 AM, Juan Quintela wrote:
>> Again, I think defaulting DAS to cache=none|directsync is what makes
>> the most sense here.
>
> I think it is the only sane solution.  Otherwise, we need to write the
> equivalent of a lock manager, to know _who_ has the storage, and
> distributed lock managers are a mess :-(
>
>> We can even add a migration blocker for DAS with cache=on.  If we can
>> do dynamic toggling of the cache setting, then that's pretty friendly
>> at the end of the day.
>
> That could fix the problem too.  At the moment we start migration,
> we do an fsync() + switch to O_DIRECT for all filesystems.
>
> As you said, time for implementing fcntl(O_DIRECT).

Yeah, I think this ends up being a very elegant solution.

We always open block devices O_DIRECT to start with.  That ensures reads go 
directly to disk if it's DAS, or result in NFS protocol reads.

As long as we fsync on the source (and we do), then we're okay.

For cache=write{back,through}, we would then just fcntl() away O_DIRECT as soon 
as we start the guest.  Then we can start doing reads through the page cache.
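
Put together, the proposed sequence might look like the sketch below
(helper names hypothetical; set_direct_io() is the fcntl() toggle
sketched earlier in the thread):

    #define _GNU_SOURCE
    #include <fcntl.h>

    extern int set_direct_io(int fd, int enable);   /* hypothetical */

    /* Destination always starts with O_DIRECT, so early reads cannot
     * hit stale page-cache contents. */
    static int open_image_for_incoming(const char *path, int *fdp)
    {
        *fdp = open(path, O_RDWR | O_DIRECT);
        return *fdp < 0 ? -1 : 0;
    }

    /* Once the guest runs, fcntl() away O_DIRECT if the user asked for
     * a cached mode (cache=writeback or cache=writethrough). */
    static void on_guest_started(int fd, int cached_mode)
    {
        if (cached_mode) {
            set_direct_io(fd, 0);
        }
    }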

Regards,

Anthony Liguori

> Later, Juan.
>


^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2011-11-15 13:56 UTC | newest]

Thread overview: 102+ messages
2011-11-09 16:29 qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions Lucas Meneghel Rodrigues
2011-11-09 16:39 ` Anthony Liguori
2011-11-09 17:02   ` Avi Kivity
2011-11-09 17:35     ` Anthony Liguori
2011-11-09 19:53       ` Juan Quintela
2011-11-09 20:18       ` Michael S. Tsirkin
2011-11-09 20:22         ` Anthony Liguori
2011-11-09 21:00           ` Michael S. Tsirkin
2011-11-09 21:01             ` Anthony Liguori
2011-11-10 10:41               ` Kevin Wolf
2011-11-10 16:50                 ` Juan Quintela
2011-11-10 17:59                   ` Anthony Liguori
2011-11-10 18:00                 ` Anthony Liguori
2011-11-09 20:57         ` Juan Quintela
2011-11-10  8:55       ` Avi Kivity
2011-11-10 17:50         ` Juan Quintela
2011-11-10 17:54         ` Anthony Liguori
2011-11-12 10:20           ` Avi Kivity
2011-11-12 13:30             ` Anthony Liguori
2011-11-12 14:36               ` Avi Kivity
2011-11-10 18:27         ` Anthony Liguori
2011-11-10 18:42           ` Daniel P. Berrange
2011-11-10 19:11             ` Anthony Liguori
2011-11-10 20:06               ` Daniel P. Berrange
2011-11-10 20:07                 ` Anthony Liguori
2011-11-10 21:30           ` Anthony Liguori
2011-11-11 10:15             ` Kevin Wolf
2011-11-11 14:03               ` Anthony Liguori
2011-11-11 14:29                 ` Kevin Wolf
2011-11-11 14:35                   ` Anthony Liguori
2011-11-11 14:44                     ` Kevin Wolf
2011-11-11 20:38                       ` Anthony Liguori
2011-11-12 10:27                 ` Avi Kivity
2011-11-12 13:39                   ` Anthony Liguori
2011-11-12 14:43                     ` Avi Kivity
2011-11-12 16:01                       ` Anthony Liguori
2011-11-12 10:25               ` Avi Kivity
2011-11-14  9:58                 ` Kevin Wolf
2011-11-14 10:10                   ` Michael S. Tsirkin
2011-11-15 13:28                   ` Avi Kivity
2011-11-14 10:16                 ` Daniel P. Berrange
2011-11-14 10:24                   ` Michael S. Tsirkin
2011-11-14 11:08                     ` Daniel P. Berrange
2011-11-14 11:21                       ` Kevin Wolf
2011-11-14 11:29                         ` Daniel P. Berrange
2011-11-14 11:34                           ` Michael S. Tsirkin
2011-11-14 11:37                             ` Daniel P. Berrange
2011-11-14 11:51                               ` Michael S. Tsirkin
2011-11-14 11:55                                 ` Daniel P. Berrange
2011-11-14 11:56                               ` Michael S. Tsirkin
2011-11-14 11:58                                 ` Daniel P. Berrange
2011-11-14 12:17                                   ` Michael S. Tsirkin
2011-11-14 11:36                           ` Gleb Natapov
2011-11-14 11:32                       ` Michael S. Tsirkin
2011-11-14 14:19                   ` Anthony Liguori
2011-11-15 13:20                     ` Juan Quintela
2011-11-15 13:56                       ` Anthony Liguori
2011-11-09 19:25 ` Juan Quintela
2011-11-09 23:33   ` Lucas Meneghel Rodrigues
