* [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Martin Kletzander @ 2019-04-23 11:30 UTC
  To: qemu-devel; +Cc: Richard Jones, Kevin Wolf, Eric Blake


Hi,

I am using qemu-img with nbdkit to transfer a disk image and then update it with
extra data from newer snapshots.  The end image cannot be transferred because
the snapshots will be created later than the first transfer and we want to save
some time up front.  You might think of it as a continuous synchronisation.  It
looks something like this:

I first transfer the whole image:

  qemu-img convert -p $nbd disk.raw

Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`

Then, after the next snapshot is created, I can update it thanks to the `-n`
parameter (the $nbd now points to the newer snapshot with unchanged data looking
like holes in the file):

  qemu-img convert -p -n $nbd disk.raw

This is fast and efficient as it uses block status nbd extension, so it only
transfers new data.  This can be done over and over again to keep the local
`disk.raw` image up to date with the latest remote snapshot.

However, when the guest OS zeroes some of the data and it gets written into the
snapshot, qemu-img scans for those zeros and does not write them to the
destination image.  Checking the output of `qemu-img map --output=json $nbd`
shows that the zeroed data is properly marked as `data: true`.

Using `-S 0` would write zeros even where the holes are, effectively overwriting
the data from the last snapshot even though they should not be changed.

Having gone through some workarounds I would like there to be another way.  I
know this is far from the typical usage of qemu-img, but is this really the
expected behaviour or is this just something nobody really needed before?  If it
is the former, would it be possible to have a parameter that would control this
behaviour?  If the latter is the case, can that behaviour be changed so that it
properly replicates the data when `-n` parameter is used?

Basically the only thing we need is to either:

 1) write zeros where they actually are or

 2) turn off explicit sparsification without requesting dense image (basically
    sparsify only the part that is reported as hole on the source) or

 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
    but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
    believe, is effectively the same); see the sketch below
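
(As a rough illustration of option 3: assuming the destination sits on a
filesystem that supports hole punching, the same effect can be had manually
with util-linux's fallocate(1).  The offset and length below are taken from
the 2M demo further down; this is only a sketch of the idea, not a proposed
qemu-img interface.)

  # The source reported bytes 0-32767 as data, but they read back as all
  # zeros, so `qemu-img convert -n` skipped them and left the old contents
  # in the destination.  Punching the hole ourselves makes the range read
  # back as zeros while keeping the file sparse.
  fallocate --punch-hole --keep-size --offset 0 --length 32768 disk.raw

  # Check that the range now reads back as zeros:
  hexdump -C -n 32768 disk.raw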

If you want to try this out, I found the easiest reproducible way is using
nbdkit's data plugin, which can simulate whatever source image you like.

The first iteration, which transfers the whole image, can be simulated like this:

  nbdkit --run 'qemu-img convert -p $nbd output.raw' data data="1" size=2M

That command exposes an artificial 2MB disk whose first byte is '1' and the rest
is zeros/holes, and runs the specified qemu-img command on it ($nbd is supplied
by nbdkit, so the string needs to be enclosed in single quotes).

You can see how that data is exposed by running:

  nbdkit --run 'qemu-img map --output=json $nbd' data data="1" size=2M

For completeness I get this output:

  [{ "start": 0, "length": 32768, "depth": 0, "zero": false, "data": true},
   { "start": 32768, "length": 2064384, "depth": 0, "zero": true, "data": false}]

A subsequent update from a snapshot (with the first block explicitly zeroed) can
be simulated by running:

  nbdkit --run 'qemu-img convert -n -p $nbd output.raw' data data="0" size=2M

Again, the mapping exposed by nbdkit can be seen by running:

  nbdkit --run 'qemu-img map --output=json $nbd' data data="0" size=2M

For completeness I get this output:

  [{ "start": 0, "length": 32768, "depth": 0, "zero": true, "data": true},
   { "start": 32768, "length": 2064384, "depth": 0, "zero": true, "data": false}]

The resulting image still has `1` as its first byte (following is the output of
`hexdump -C output.raw`):

  00000000  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  *
  00200000

Have a nice day,
Martin


* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Richard W.M. Jones @ 2019-04-23 11:36 UTC
  To: Martin Kletzander; +Cc: qemu-devel, Kevin Wolf, Eric Blake

On Tue, Apr 23, 2019 at 01:30:28PM +0200, Martin Kletzander wrote:
> I am using qemu-img with nbdkit to transfer a disk image and then
> update it with extra data from newer snapshots.  The end image
> cannot be transferred because the snapshots will be created later
> than the first transfer and we want to save some time up front.  You
> might think of it as a continuous synchronisation.

It's important to note here that Martin is reading the data from a
VMware server, so this is not something that can be solved with qemu's
own snapshots.

[...]

I think the following is an even simpler demo which gets to the nub of
the problem as I understand it:

$ rm -f disk.img snap.img
$ dd if=/dev/urandom of=disk.img bs=2M count=1
$ dd if=/dev/zero of=snap.img bs=2M count=1
$ qemu-img convert -n -p snap.img disk.img
$ hexdump -C disk.img | head
00000000  18 30 e8 1f 09 f0 bb 2c  2f c7 b3 97 8f 12 fe 4b  |.0.....,/......K|
00000010  66 f7 28 cb 8e 72 2a 37  6b fa 98 2e a0 e6 d9 cf  |f.(..r*7k.......|
[etc] <- ie. not zeroes

Should we expect disk.img to contain zeroes at the end?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW


* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Daniel P. Berrangé @ 2019-04-23 11:55 UTC
  To: Richard W.M. Jones; +Cc: Martin Kletzander, Kevin Wolf, qemu-devel

On Tue, Apr 23, 2019 at 12:36:02PM +0100, Richard W.M. Jones wrote:
> On Tue, Apr 23, 2019 at 01:30:28PM +0200, Martin Kletzander wrote:
> > I am using qemu-img with nbdkit to transfer a disk image and then
> > update it with extra data from newer snapshots.  The end image
> > cannot be transferred because the snapshots will be created later
> > than the first transfer and we want to save some time up front.  You
> > might think of it as a continuous synchronisation.
> 
> It's important to note here that Martin is reading the data from a
> VMware server, so this is not something that can be solved with qemu's
> own snapshots.
> 
> [...]
> 
> I think the following is an even simpler demo which gets to the nub of
> the problem as I understand it:
> 
> $ rm -f disk.img snap.img
> $ dd if=/dev/urandom of=disk.img bs=2M count=1
> $ dd if=/dev/zero of=snap.img bs=2M count=1
> $ qemu-img convert -n -p snap.img disk.img
> $ hexdump -C disk.img | head
> 00000000  18 30 e8 1f 09 f0 bb 2c  2f c7 b3 97 8f 12 fe 4b  |.0.....,/......K|
> 00000010  66 f7 28 cb 8e 72 2a 37  6b fa 98 2e a0 e6 d9 cf  |f.(..r*7k.......|
> [etc] <- ie. not zeroes
> 
> Should we expect disk.img to contain zeroes at the end?

I'd expect disk.img and snap.img to be identical when read.
disk.img doesn't have to contain zeros (it could be full of
holes instead), but a read should return all zeros.

That doesn't seem to be the case here though. It looks like
QEMU is seeing that snap.img is all zeros and then neither
writing any zeros to disk.img, nor punching sparse holes in
it.
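
(A quick way to check that expectation, for what it's worth: qemu-img has a
compare subcommand that reads both images and reports whether their
guest-visible contents match, so if convert had propagated the zeroed blocks
it would report the two files as identical.)

$ qemu-img compare -f raw -F raw snap.img disk.img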

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Kevin Wolf @ 2019-04-23 12:12 UTC
  To: Martin Kletzander; +Cc: qemu-devel, Richard Jones, Eric Blake


Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
> Hi,
> 
> I am using qemu-img with nbdkit to transfer a disk image and then update it with
> extra data from newer snapshots.  The end image cannot be transferred because
> the snapshots will be created later than the first transfer and we want to save
> some time up front.  You might think of it as a continuous synchronisation.  It
> looks something like this:
> 
> I first transfer the whole image:
> 
>  qemu-img convert -p $nbd disk.raw
> 
> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
> 
> Then, after the next snapshot is created, I can update it thanks to the `-n`
> parameter (the $nbd now points to the newer snapshot with unchanged data looking
> like holes in the file):
> 
>  qemu-img convert -p -n $nbd disk.raw
> 
> This is fast and efficient as it uses block status nbd extension, so it only
> transfers new data.

This is an implementation detail. Don't rely on it. What you're doing is
abusing 'qemu-img convert', so problems like what you describe are to be
expected.

> This can be done over and over again to keep the local
> `disk.raw` image up to date with the latest remote snapshot.
> 
> However, when the guest OS zeroes some of the data and it gets written into the
> snapshot, qemu-img scans for those zeros and does not write them to the
> destination image.  Checking the output of `qemu-img map --output=json $nbd`
> shows that the zeroed data is properly marked as `data: true`.
> 
> Using `-S 0` would write zeros even where the holes are, effectively overwriting
> the data from the last snapshot even though they should not be changed.
> 
> Having gone through some workarounds I would like there to be another way.  I
> know this is far from the typical usage of qemu-img, but is this really the
> expected behaviour or is this just something nobody really needed before?  If it
> is the former, would it be possible to have a parameter that would control this
> behaviour?  If the latter is the case, can that behaviour be changed so that it
> properly replicates the data when `-n` parameter is used?
> 
> Basically the only thing we need is to either:
> 
> 1) write zeros where they actually are or
> 
> 2) turn off explicit sparsification without requesting dense image (basically
>    sparsify only the part that is reported as hole on the source) or
> 
> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
>    but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
>    believe, is effectively the same)
> 
> If you want to try this out, I found the easiest reproducible way is using
> nbdkit's data plugin, which can simulate whatever source image you like.

I think what you _really_ want is a commit block job. The problem is
just that you don't have a proper backing file chain, but just a bunch
of NBD connections.

Can't you get an NBD connection that already provides the condensed form
of the whole snapshot chain directly at the source? If the NBD server
was QEMU, this would actually be easier than providing each snapshot
individually.

If this isn't possible, I think you need to replicate the backing chain
on the destination instead of converting into the same image again and
again so that qemu-img knows that it must take existing data of the
backing file into consideration:

    qemu-img convert -O qcow2 nbd://... base.qcow2
    qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
    qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
    ...

And at the end you can merge the snapshot chain (using a commit or
stream block job, or qemu-img commit/rebase).
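
For example, the final merge could look something like this (only a sketch;
the overlay names match the convert commands above and merged.qcow2 is just
an illustrative name for a flattened copy):

    # Fold each overlay back into its backing file, starting from the top:
    qemu-img commit overlay2.qcow2
    qemu-img commit overlay1.qcow2

    # Or flatten the whole chain into one standalone image instead:
    qemu-img convert -O qcow2 overlay2.qcow2 merged.qcow2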

Kevin


* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Martin Kletzander @ 2019-04-23 14:26 UTC
  To: Kevin Wolf; +Cc: qemu-devel, Richard Jones, Eric Blake


On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
>Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
>> Hi,
>>
>> I am using qemu-img with nbdkit to transfer a disk image and then update it with
>> extra data from newer snapshots.  The end image cannot be transferred because
>> the snapshots will be created later than the first transfer and we want to save
>> some time up front.  You might think of it as a continuous synchronisation.  It
>> looks something like this:
>>
>> I first transfer the whole image:
>>
>>  qemu-img convert -p $nbd disk.raw
>>
>> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
>>
>> Then, after the next snapshot is created, I can update it thanks to the `-n`
>> parameter (the $nbd now points to the newer snapshot with unchanged data looking
>> like holes in the file):
>>
>>  qemu-img convert -p -n $nbd disk.raw
>>
>> This is fast and efficient as it uses block status nbd extension, so it only
>> transfers new data.
>
>This is an implementation detail. Don't rely on it. What you're doing is
>abusing 'qemu-img convert', so problems like what you describe are to be
>expected.
>
>> This can be done over and over again to keep the local
>> `disk.raw` image up to date with the latest remote snapshot.
>>
>> However, when the guest OS zeroes some of the data and it gets written into the
>> snapshot, qemu-img scans for those zeros and does not write them to the
>> destination image.  Checking the output of `qemu-img map --output=json $nbd`
>> shows that the zeroed data is properly marked as `data: true`.
>>
>> Using `-S 0` would write zeros even where the holes are, effectively overwriting
>> the data from the last snapshot even though they should not be changed.
>>
>> Having gone through some workarounds I would like there to be another way.  I
>> know this is far from the typical usage of qemu-img, but is this really the
>> expected behaviour or is this just something nobody really needed before?  If it
>> is the former, would it be possible to have a parameter that would control this
>> behaviour?  If the latter is the case, can that behaviour be changed so that it
>> properly replicates the data when `-n` parameter is used?
>>
>> Basically the only thing we need is to either:
>>
>> 1) write zeros where they actually are or
>>
>> 2) turn off explicit sparsification without requesting dense image (basically
>>    sparsify only the part that is reported as hole on the source) or
>>
>> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
>>    but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
>>    believe, is effectively the same)
>>
>> If you want to try this out, I found the easiest reproducible way is using
>> nbdkit's data plugin, which can simulate whatever source image you like.
>
>I think what you _really_ want is a commit block job. The problem is
>just that you don't have a proper backing file chain, but just a bunch
>of NBD connections.
>
>Can't you get an NBD connection that already provides the condensed form
>of the whole snapshot chain directly at the source? If the NBD server
>was QEMU, this would actually be easier than providing each snapshot
>individually.
>
>If this isn't possible, I think you need to replicate the backing chain
>on the destination instead of converting into the same image again and
>again so that qemu-img knows that it must take existing data of the
>backing file into consideration:
>
>    qemu-img convert -O qcow2 nbd://... base.qcow2
>    qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
>    qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
>    ...
>

I thought of this, but (to be honest) I did not know that `-B` would work for
nbd.  Does it assume that data are to be taken from the base image if and only
if the source (be it nbd server or just a plain file) says there is a hole?  If
yes, then it could nicely solve the issue.

>And at the end you can merge the snapshot chain (using a commit or
>stream block job, or qemu-img commit/rebase).
>
>Kevin




* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Kevin Wolf @ 2019-04-23 15:08 UTC
  To: Martin Kletzander; +Cc: qemu-devel, Richard Jones, Eric Blake


Am 23.04.2019 um 16:26 hat Martin Kletzander geschrieben:
> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
> > Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
> > > Hi,
> > > 
> > > I am using qemu-img with nbdkit to transfer a disk image and then update it with
> > > extra data from newer snapshots.  The end image cannot be transferred because
> > > the snapshots will be created later than the first transfer and we want to save
> > > some time up front.  You might think of it as a continuous synchronisation.  It
> > > looks something like this:
> > > 
> > > I first transfer the whole image:
> > > 
> > >  qemu-img convert -p $nbd disk.raw
> > > 
> > > Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
> > > 
> > > Then, after the next snapshot is created, I can update it thanks to the `-n`
> > > parameter (the $nbd now points to the newer snapshot with unchanged data looking
> > > like holes in the file):
> > > 
> > >  qemu-img convert -p -n $nbd disk.raw
> > > 
> > > This is fast and efficient as it uses block status nbd extension, so it only
> > > transfers new data.
> > 
> > This is an implementation detail. Don't rely on it. What you're doing is
> > abusing 'qemu-img convert', so problems like what you describe are to be
> > expected.
> > 
> > > This can be done over and over again to keep the local
> > > `disk.raw` image up to date with the latest remote snapshot.
> > > 
> > > However, when the guest OS zeroes some of the data and it gets written into the
> > > snapshot, qemu-img scans for those zeros and does not write them to the
> > > destination image.  Checking the output of `qemu-img map --output=json $nbd`
> > > shows that the zeroed data is properly marked as `data: true`.
> > > 
> > > Using `-S 0` would write zeros even where the holes are, effectively overwriting
> > > the data from the last snapshot even though they should not be changed.
> > > 
> > > Having gone through some workarounds I would like there to be another way.  I
> > > know this is far from the typical usage of qemu-img, but is this really the
> > > expected behaviour or is this just something nobody really needed before?  If it
> > > is the former, would it be possible to have a parameter that would control this
> > > behaviour?  If the latter is the case, can that behaviour be changed so that it
> > > properly replicates the data when `-n` parameter is used?
> > > 
> > > Basically the only thing we need is to either:
> > > 
> > > 1) write zeros where they actually are or
> > > 
> > > 2) turn off explicit sparsification without requesting dense image (basically
> > >    sparsify only the part that is reported as hole on the source) or
> > > 
> > > 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
> > >    but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
> > >    believe, is effectively the same)
> > > 
> > > If you want to try this out, I found the easiest reproducible way is using
> > > nbdkit's data plugin, which can simulate whatever source image you like.
> > 
> > I think what you _really_ want is a commit block job. The problem is
> > just that you don't have a proper backing file chain, but just a bunch
> > of NBD connections.
> > 
> > Can't you get an NBD connection that already provides the condensed form
> > of the whole snapshot chain directly at the source? If the NBD server
> > was QEMU, this would actually be easier than providing each snapshot
> > individually.
> > 
> > If this isn't possible, I think you need to replicate the backing chain
> > on the destination instead of converting into the same image again and
> > again so that qemu-img knows that it must take existing data of the
> > backing file into consideration:
> > 
> >    qemu-img convert -O qcow2 nbd://... base.qcow2
> >    qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
> >    qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
> >    ...
> > 
> 
> I thought of this, but (to be honest) I did not know that `-B` would
> work for nbd.

It still depends on the NBD server providing the right block allocation
status, but that's no worse than what you needed for -n. But whether -B
can be used at all depends on the target format, not the source.

> Does it assume that data are to be taken from the base image if and
> only if the source (be it nbd server or just a plain file) says there
> is a hole?  If yes, then it could nicely solve the issue.

I haven't tested it now, but yes, that's what I remember it to do.

Looking at the code, the requirement seems to be that the NBD server
flags the sparse blocks as a HOLE, but not as ZERO.
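
(As far as I can tell, that can be inspected with the map output used
earlier in the thread: in `qemu-img map --output=json` over NBD, "data":
false corresponds to the server reporting HOLE and "zero": true to it
reporting ZERO, so the "HOLE but not ZERO" case would show up as "data":
false together with "zero": false.  Reusing the nbdkit data-plugin example
from the first mail:)

  nbdkit --run 'qemu-img map --output=json $nbd' data data="0" size=2M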

Kevin


* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Vladimir Sementsov-Ogievskiy @ 2019-04-24  6:40 UTC
  To: Kevin Wolf, Martin Kletzander; +Cc: qemu-devel, Richard Jones

23.04.2019 18:08, Kevin Wolf wrote:
> Am 23.04.2019 um 16:26 hat Martin Kletzander geschrieben:
>> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
>>> Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
>>>> Hi,
>>>>
>>>> I am using qemu-img with nbdkit to transfer a disk image and then update it with
>>>> extra data from newer snapshots.  The end image cannot be transferred because
>>>> the snapshots will be created later than the first transfer and we want to save
>>>> some time up front.  You might think of it as a continuous synchronisation.  It
>>>> looks something like this:
>>>>
>>>> I first transfer the whole image:
>>>>
>>>>   qemu-img convert -p $nbd disk.raw
>>>>
>>>> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
>>>>
>>>> Then, after the next snapshot is created, I can update it thanks to the `-n`
>>>> parameter (the $nbd now points to the newer snapshot with unchanged data looking
>>>> like holes in the file):
>>>>
>>>>   qemu-img convert -p -n $nbd disk.raw
>>>>
>>>> This is fast and efficient as it uses block status nbd extension, so it only
>>>> transfers new data.
>>>
>>> This is an implementation detail. Don't rely on it. What you're doing is
>>> abusing 'qemu-img convert', so problems like what you describe are to be
>>> expected.
>>>
>>>> This can be done over and over again to keep the local
>>>> `disk.raw` image up to date with the latest remote snapshot.
>>>>
>>>> However, when the guest OS zeroes some of the data and it gets written into the
>>>> snapshot, qemu-img scans for those zeros and does not write them to the
>>>> destination image.  Checking the output of `qemu-img map --output=json $nbd`
>>>> shows that the zeroed data is properly marked as `data: true`.
>>>>
>>>> Using `-S 0` would write zeros even where the holes are, effectively overwriting
>>>> the data from the last snapshot even though they should not be changed.
>>>>
>>>> Having gone through some workarounds I would like there to be another way.  I
>>>> know this is far from the typical usage of qemu-img, but is this really the
>>>> expected behaviour or is this just something nobody really needed before?  If it
>>>> is the former, would it be possible to have a parameter that would control this
>>>> behaviour?  If the latter is the case, can that behaviour be changed so that it
>>>> properly replicates the data when `-n` parameter is used?
>>>>
>>>> Basically the only thing we need is to either:
>>>>
>>>> 1) write zeros where they actually are or
>>>>
>>>> 2) turn off explicit sparsification without requesting dense image (basically
>>>>     sparsify only the part that is reported as hole on the source) or
>>>>
>>>> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
>>>>     but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
>>>>     believe, is effectively the same)
>>>>
>>>> If you want to try this out, I found the easiest reproducible way is using
>>>> nbdkit's data plugin, which can simulate whatever source image you like.
>>>
>>> I think what you _really_ want is a commit block job. The problem is
>>> just that you don't have a proper backing file chain, but just a bunch
>>> of NBD connections.
>>>
>>> Can't you get an NBD connection that already provides the condensed form
>>> of the whole snapshot chain directly at the source? If the NBD server
>>> was QEMU, this would actually be easier than providing each snapshot
>>> individually.
>>>
>>> If this isn't possible, I think you need to replicate the backing chain
>>> on the destination instead of converting into the same image again and
>>> again so that qemu-img knows that it must take existing data of the
>>> backing file into consideration:
>>>
>>>     qemu-img convert -O qcow2 nbd://... base.qcow2
>>>     qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
>>>     qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
>>>     ...

Is this safe in general?

QEMU often considers rounding up allocated ranges to be safe, or just treats unknown areas as allocated.
If that happens, we'll convert an unallocated hole into allocated zeroes on the target, which will break
the backing chain.

This approach would be correct if the source behind the NBD server had a valid backing chain too, so in the
"rounding-up" case we'd just read valid data from the backing file. But that is not the case here (or sorry, if I misunderstood).

>>>
>>
>> I thought of this, but (to be honest) I did not know that `-B` would
>> work for nbd.
> 
> It still depends on the NBD server providing the right block allocation
> status, but that's no worse than what you needed for -n. But whether -B
> can be used at all depends on the target format, not the source.
> 
>> Does it assume that data are to be taken from the base image if and
>> only if the source (be it nbd server or just a plain file) says there
>> is a hole?  If yes, then it could nicely solve the issue.
> 
> I haven't tested it now, but yes, that's what I remember it to do.
> 
> Looking at the code, the requirement seems to be that the NBD server
> flags the sparse blocks as a HOLE, but not as ZERO.
> 
> Kevin
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Kevin Wolf @ 2019-04-24  7:19 UTC
  To: Vladimir Sementsov-Ogievskiy; +Cc: Martin Kletzander, qemu-devel, Richard Jones

Am 24.04.2019 um 08:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 23.04.2019 18:08, Kevin Wolf wrote:
> > Am 23.04.2019 um 16:26 hat Martin Kletzander geschrieben:
> >> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
> >>> Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
> >>>> Hi,
> >>>>
> >>>> I am using qemu-img with nbdkit to transfer a disk image and then update it with
> >>>> extra data from newer snapshots.  The end image cannot be transferred because
> >>>> the snapshots will be created later than the first transfer and we want to save
> >>>> some time up front.  You might think of it as a continuous synchronisation.  It
> >>>> looks something like this:
> >>>>
> >>>> I first transfer the whole image:
> >>>>
> >>>>   qemu-img convert -p $nbd disk.raw
> >>>>
> >>>> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
> >>>>
> >>>> Then, after the next snapshot is created, I can update it thanks to the `-n`
> >>>> parameter (the $nbd now points to the newer snapshot with unchanged data looking
> >>>> like holes in the file):
> >>>>
> >>>>   qemu-img convert -p -n $nbd disk.raw
> >>>>
> >>>> This is fast and efficient as it uses block status nbd extension, so it only
> >>>> transfers new data.
> >>>
> >>> This is an implementation detail. Don't rely on it. What you're doing is
> >>> abusing 'qemu-img convert', so problems like what you describe are to be
> >>> expected.
> >>>
> >>>> This can be done over and over again to keep the local
> >>>> `disk.raw` image up to date with the latest remote snapshot.
> >>>>
> >>>> However, when the guest OS zeroes some of the data and it gets written into the
> >>>> snapshot, qemu-img scans for those zeros and does not write them to the
> >>>> destination image.  Checking the output of `qemu-img map --output=json $nbd`
> >>>> shows that the zeroed data is properly marked as `data: true`.
> >>>>
> >>>> Using `-S 0` would write zeros even where the holes are, effectively overwriting
> >>>> the data from the last snapshot even though they should not be changed.
> >>>>
> >>>> Having gone through some workarounds I would like there to be another way.  I
> >>>> know this is far from the typical usage of qemu-img, but is this really the
> >>>> expected behaviour or is this just something nobody really needed before?  If it
> >>>> is the former, would it be possible to have a parameter that would control this
> >>>> behaviour?  If the latter is the case, can that behaviour be changed so that it
> >>>> properly replicates the data when `-n` parameter is used?
> >>>>
> >>>> Basically the only thing we need is to either:
> >>>>
> >>>> 1) write zeros where they actually are or
> >>>>
> >>>> 2) turn off explicit sparsification without requesting dense image (basically
> >>>>     sparsify only the part that is reported as hole on the source) or
> >>>>
> >>>> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
> >>>>     but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
> >>>>     believe, is effectively the same)
> >>>>
> >>>> If you want to try this out, I found the easiest reproducible way is using
> >>>> nbdkit's data plugin, which can simulate whatever source image you like.
> >>>
> >>> I think what you _really_ want is a commit block job. The problem is
> >>> just that you don't have a proper backing file chain, but just a bunch
> >>> of NBD connections.
> >>>
> >>> Can't you get an NBD connection that already provides the condensed form
> >>> of the whole snapshot chain directly at the source? If the NBD server
> >>> was QEMU, this would actually be easier than providing each snapshot
> >>> individually.
> >>>
> >>> If this isn't possible, I think you need to replicate the backing chain
> >>> on the destination instead of converting into the same image again and
> >>> again so that qemu-img knows that it must take existing data of the
> >>> backing file into consideration:
> >>>
> >>>     qemu-img convert -O qcow2 nbd://... base.qcow2
> >>>     qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
> >>>     qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
> >>>     ...
> 
> Is this safe in general?
> 
> QEMU often considers rounding up allocated ranges to be safe, or just
> treats unknown areas as allocated.  If that happens, we'll convert an
> unallocated hole into allocated zeroes on the target, which will break
> the backing chain.
> 
> This approach would be correct if the source behind the NBD server had
> a valid backing chain too, so in the "rounding-up" case we'd just read
> valid data from the backing file. But that is not the case here (or
> sorry, if I misunderstood).

As I said, it depends on the NBD server providing the right block
allocation status - and this includes alignment etc. as well.

It's not a very nice solution because NBD doesn't actually do backing
files, so we're relying on things that the spec doesn't talk about. But
as I understand it, we don't have control over the server side, so it's
probably the best we can do under these conditions.

If the NBD server already took the backing chain into consideration, it
would indeed be much more reliable.

Kevin


* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
From: Martin Kletzander @ 2019-04-24  9:04 UTC
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Richard Jones


On Wed, Apr 24, 2019 at 09:19:17AM +0200, Kevin Wolf wrote:
>Am 24.04.2019 um 08:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 23.04.2019 18:08, Kevin Wolf wrote:
>> > Am 23.04.2019 um 16:26 hat Martin Kletzander geschrieben:
>> >> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
>> >>> Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
>> >>>> Hi,
>> >>>>
>> >>>> I am using qemu-img with nbdkit to transfer a disk image and then update it with
>> >>>> extra data from newer snapshots.  The end image cannot be transferred because
>> >>>> the snapshots will be created later than the first transfer and we want to save
>> >>>> some time up front.  You might think of it as a continuous synchronisation.  It
>> >>>> looks something like this:
>> >>>>
>> >>>> I first transfer the whole image:
>> >>>>
>> >>>>   qemu-img convert -p $nbd disk.raw
>> >>>>
>> >>>> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
>> >>>>
>> >>>> Then, after the next snapshot is created, I can update it thanks to the `-n`
>> >>>> parameter (the $nbd now points to the newer snapshot with unchanged data looking
>> >>>> like holes in the file):
>> >>>>
>> >>>>   qemu-img convert -p -n $nbd disk.raw
>> >>>>
>> >>>> This is fast and efficient as it uses block status nbd extension, so it only
>> >>>> transfers new data.
>> >>>
>> >>> This is an implementation detail. Don't rely on it. What you're doing is
>> >>> abusing 'qemu-img convert', so problems like what you describe are to be
>> >>> expected.
>> >>>
>> >>>> This can be done over and over again to keep the local
>> >>>> `disk.raw` image up to date with the latest remote snapshot.
>> >>>>
>> >>>> However, when the guest OS zeroes some of the data and it gets written into the
>> >>>> snapshot, qemu-img scans for those zeros and does not write them to the
>> >>>> destination image.  Checking the output of `qemu-img map --output=json $nbd`
>> >>>> shows that the zeroed data is properly marked as `data: true`.
>> >>>>
>> >>>> Using `-S 0` would write zeros even where the holes are, effectively overwriting
>> >>>> the data from the last snapshot even though they should not be changed.
>> >>>>
>> >>>> Having gone through some workarounds I would like there to be another way.  I
>> >>>> know this is far from the typical usage of qemu-img, but is this really the
>> >>>> expected behaviour or is this just something nobody really needed before?  If it
>> >>>> is the former, would it be possible to have a parameter that would control this
>> >>>> behaviour?  If the latter is the case, can that behaviour be changed so that it
>> >>>> properly replicates the data when `-n` parameter is used?
>> >>>>
>> >>>> Basically the only thing we need is to either:
>> >>>>
>> >>>> 1) write zeros where they actually are or
>> >>>>
>> >>>> 2) turn off explicit sparsification without requesting dense image (basically
>> >>>>     sparsify only the par that is reported as hole on the source) or
>> >>>>
>> >>>> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
>> >>>>     but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
>> >>>>     believe, is effectively the same)
>> >>>>
>> >>>> If you want to try this out, I found the easiest reproducible way is using
>> >>>> nbdkit's data plugin, which can simulate whatever source image you like.
>> >>>
>> >>> I think what you _really_ want is a commit block job. The problem is
>> >>> just that you don't have a proper backing file chain, but just a bunch
>> >>> of NBD connections.
>> >>>
>> >>> Can't you get an NBD connection that already provides the condensed form
>> >>> of the whole snapshot chain directly at the source? If the NBD server
>> >>> was QEMU, this would actually be easier than providing each snapshot
>> >>> individually.
>> >>>
>> >>> If this isn't possible, I think you need to replicate the backing chain
>> >>> on the destination instead of converting into the same image again and
>> >>> again so that qemu-img knows that it must take existing data of the
>> >>> backing file into consideration:
>> >>>
>> >>>     qemu-img convert -O qcow2 nbd://... base.qcow2
>> >>>     qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
>> >>>     qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
>> >>>     ...
>>
>> Is it safe in general?
>>
>> Qemu often consider rounding-up allocated ranges to be safe, or just
>> consider unknown area as allocated.  And if this happen, we'll convert
>> unallocated hole to allocated zeroes on target, which will break
>> backing chain.
>>
>> This way would be correct if on source under nbd server we have valid
>> backing chain too, so in case of "rounding-up" we'll just read valid
>> data from backing. But it is not the case (or sorry, if I
>> misunderstood).
>
>As I said, it depends on the NBD server providing the right block
>allocation status - and this includes alignment etc. as well.
>
>It's not a very nice solution because NBD doesn't actually do backing
>files, so we're relying on things that the spec doesn't talk about. But

That is what I was concerned about: if I understand correctly, there is no
concept of backing chains in the NBD protocol.

>as I understand, we don't have control over the server side, so it's
>probably the best we can do under these conditions.
>
>If the NBD server already took the backing chain into consideration, it
>would indeed be much more reliable.
>

We *kind of* have control over the server.  The NBD server is nbdkit, which we
can make sure does the right thing; however, making it open the local file as a
backing image is something that does not really fit the design, at least not
yet.

But we can make sure the provided data is correct even for unallocated areas,
because the backing chain we replicated is also present on the source.  Reading
a couple more blocks is still a major improvement over reading all the data.

I tried your solution and it works nicely, even though it consumes more data
than needed.  I'm guessing this could be at least partially avoided by using
internal snapshots, if that were supported with `convert`, but that's not really
needed.  This is more than enough and, as several of us said, this usage is kind
of an abuse of what qemu-img is designed to do.
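
For reference, the whole flow then looks roughly like this (just a sketch; the
image names and the final flattening step are illustrative, $nbd points to the
corresponding snapshot each time, and it relies on the NBD server reporting
unchanged areas as unallocated):

  # initial full copy of the oldest snapshot
  qemu-img convert -p -O qcow2 $nbd base.qcow2

  # each newer snapshot becomes a thin overlay on top of the previous image;
  # unchanged areas stay unallocated and are read from the backing file
  qemu-img convert -p -O qcow2 -F qcow2 -B base.qcow2 $nbd overlay1.qcow2
  qemu-img convert -p -O qcow2 -F qcow2 -B overlay1.qcow2 $nbd overlay2.qcow2

  # whenever a standalone raw image is needed, flatten the chain locally
  qemu-img convert -f qcow2 -O raw overlay2.qcow2 disk.raw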

Thanks everyone for all the help!

Martin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
  2019-04-24  7:19         ` Kevin Wolf
  2019-04-24  9:04           ` Martin Kletzander
@ 2019-04-29  7:27           ` Martin Kletzander
  2019-04-29  8:58             ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 17+ messages in thread
From: Martin Kletzander @ 2019-04-29  7:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Richard Jones

[-- Attachment #1: Type: text/plain, Size: 5543 bytes --]

On Wed, Apr 24, 2019 at 09:19:17AM +0200, Kevin Wolf wrote:
>Am 24.04.2019 um 08:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 23.04.2019 18:08, Kevin Wolf wrote:
>> > Am 23.04.2019 um 16:26 hat Martin Kletzander geschrieben:
>> >> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
>> >>> Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
>> >>>> Hi,
>> >>>>
>> >>>> I am using qemu-img with nbdkit to transfer a disk image and the update it with
>> >>>> extra data from newer snapshots.  The end image cannot be transferred because
>> >>>> the snapshots will be created later than the first transfer and we want to save
>> >>>> some time up front.  You might think of it as a continuous synchronisation.  It
>> >>>> looks something like this:
>> >>>>
>> >>>> I first transfer the whole image:
>> >>>>
>> >>>>   qemu-img convert -p $nbd disk.raw
>> >>>>
>> >>>> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
>> >>>>
>> >>>> Then, after the next snapshot is created, I can update it thanks to the `-n`
>> >>>> parameter (the $nbd now points to the newer snapshot with unchanged data looking
>> >>>> like holes in the file):
>> >>>>
>> >>>>   qemu-img convert -p -n $nbd disk.raw
>> >>>>
>> >>>> This is fast and efficient as it uses block status nbd extension, so it only
>> >>>> transfers new data.
>> >>>
>> >>> This is an implementation detail. Don't rely on it. What you're doing is
>> >>> abusing 'qemu-img convert', so problems like what you describe are to be
>> >>> expected.
>> >>>
>> >>>> This can be done over and over again to keep the local
>> >>>> `disk.raw` image up to date with the latest remote snapshot.
>> >>>>
>> >>>> However, when the guest OS zeroes some of the data and it gets written into the
>> >>>> snapshot, qemu-img scans for those zeros and does not write them to the
>> >>>> destination image.  Checking the output of `qemu-img map --output=json $nbd`
>> >>>> shows that the zeroed data is properly marked as `data: true`.
>> >>>>
>> >>>> Using `-S 0` would write zeros even where the holes are, effectively overwriting
>> >>>> the data from the last snapshot even though they should not be changed.
>> >>>>
>> >>>> Having gone through some workarounds I would like there to be another way.  I
>> >>>> know this is far from the typical usage of qemu-img, but is this really the
>> >>>> expected behaviour or is this just something nobody really needed before?  If it
>> >>>> is the former, would it be possible to have a parameter that would control this
>> >>>> behaviour?  If the latter is the case, can that behaviour be changed so that it
>> >>>> properly replicates the data when `-n` parameter is used?
>> >>>>
>> >>>> Basically the only thing we need is to either:
>> >>>>
>> >>>> 1) write zeros where they actually are or
>> >>>>
>> >>>> 2) turn off explicit sparsification without requesting dense image (basically
>> >>>>     sparsify only the par that is reported as hole on the source) or
>> >>>>
>> >>>> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
>> >>>>     but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
>> >>>>     believe, is effectively the same)
>> >>>>
>> >>>> If you want to try this out, I found the easiest reproducible way is using
>> >>>> nbdkit's data plugin, which can simulate whatever source image you like.
>> >>>
>> >>> I think what you _really_ want is a commit block job. The problem is
>> >>> just that you don't have a proper backing file chain, but just a bunch
>> >>> of NBD connections.
>> >>>
>> >>> Can't you get an NBD connection that already provides the condensed form
>> >>> of the whole snapshot chain directly at the source? If the NBD server
>> >>> was QEMU, this would actually be easier than providing each snapshot
>> >>> individually.
>> >>>
>> >>> If this isn't possible, I think you need to replicate the backing chain
>> >>> on the destination instead of converting into the same image again and
>> >>> again so that qemu-img knows that it must take existing data of the
>> >>> backing file into consideration:
>> >>>
>> >>>     qemu-img convert -O qcow2 nbd://... base.qcow2
>> >>>     qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
>> >>>     qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
>> >>>     ...
>>

So I spoke too soon.  This approach fixed the one thing that I was struggling with, but broke the rest, because the result completely replicates the last image even when the source provides proper allocation data.  This is best shown with an illustration:

  $ rm -f disk.img snap.img
  $ dd if=/dev/urandom of=disk.img bs=2M count=1
  $ dd if=/dev/zero of=snap.img bs=1M count=1
  $ truncate -s 2M snap.img
  $ qemu-img map --output=json snap.img
  [{ "start": 0, "length": 1048576, "depth": 0, "zero": false, "data": true, "offset": 0},
  { "start": 1048576, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": 1048576}]
  $ qemu-img convert -f raw -O qcow2 disk.img disk.qcow2
  $ qemu-img convert -f raw -O qcow2 -B disk.qcow2 snap.img snap.qcow2
  $ qemu-img convert -f qcow2 -O raw snap.qcow2 output.raw
  $ hexdump -C output.raw
  00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  *
  00200000

And the final qemu-img convert from qcow2 to raw is not the part that is broken here.
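
The same should be visible one step earlier by checking the intermediate image
(expected, not verified, output):

  qemu-img map --output=json snap.qcow2

which I would expect to show the whole 2 MiB at depth 0 as zeroes instead of the
second megabyte reading from disk.qcow2 at depth 1, i.e. the data is already
lost after the `-B` convert step.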

So it looks like we either add support for this specific feature in qemu-img, or
we need to use our own client that does it.
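
For what it's worth, the hole-punching half of such a client is already covered
by util-linux; a sketch with made-up offsets (punching the second megabyte of a
raw destination so it reads back as zeroes while staying sparse) would be:

  fallocate --punch-hole --offset $((1 * 1024 * 1024)) --length $((1 * 1024 * 1024)) disk.raw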

Unless someone has other ideas, that is.

Martin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
  2019-04-29  7:27           ` Martin Kletzander
@ 2019-04-29  8:58             ` Vladimir Sementsov-Ogievskiy
  2019-04-29  9:16               ` Martin Kletzander
  0 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-29  8:58 UTC (permalink / raw)
  To: Martin Kletzander, Kevin Wolf; +Cc: qemu-devel, Richard Jones

29.04.2019 10:27, Martin Kletzander wrote:
> On Wed, Apr 24, 2019 at 09:19:17AM +0200, Kevin Wolf wrote:
>> Am 24.04.2019 um 08:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> 23.04.2019 18:08, Kevin Wolf wrote:
>>> > Am 23.04.2019 um 16:26 hat Martin Kletzander geschrieben:
>>> >> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
>>> >>> Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
>>> >>>> Hi,
>>> >>>>
>>> >>>> I am using qemu-img with nbdkit to transfer a disk image and the update it with
>>> >>>> extra data from newer snapshots.  The end image cannot be transferred because
>>> >>>> the snapshots will be created later than the first transfer and we want to save
>>> >>>> some time up front.  You might think of it as a continuous synchronisation.  It
>>> >>>> looks something like this:
>>> >>>>
>>> >>>> I first transfer the whole image:
>>> >>>>
>>> >>>>   qemu-img convert -p $nbd disk.raw
>>> >>>>
>>> >>>> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
>>> >>>>
>>> >>>> Then, after the next snapshot is created, I can update it thanks to the `-n`
>>> >>>> parameter (the $nbd now points to the newer snapshot with unchanged data looking
>>> >>>> like holes in the file):
>>> >>>>
>>> >>>>   qemu-img convert -p -n $nbd disk.raw
>>> >>>>
>>> >>>> This is fast and efficient as it uses block status nbd extension, so it only
>>> >>>> transfers new data.
>>> >>>
>>> >>> This is an implementation detail. Don't rely on it. What you're doing is
>>> >>> abusing 'qemu-img convert', so problems like what you describe are to be
>>> >>> expected.
>>> >>>
>>> >>>> This can be done over and over again to keep the local
>>> >>>> `disk.raw` image up to date with the latest remote snapshot.
>>> >>>>
>>> >>>> However, when the guest OS zeroes some of the data and it gets written into the
>>> >>>> snapshot, qemu-img scans for those zeros and does not write them to the
>>> >>>> destination image.  Checking the output of `qemu-img map --output=json $nbd`
>>> >>>> shows that the zeroed data is properly marked as `data: true`.
>>> >>>>
>>> >>>> Using `-S 0` would write zeros even where the holes are, effectively overwriting
>>> >>>> the data from the last snapshot even though they should not be changed.
>>> >>>>
>>> >>>> Having gone through some workarounds I would like there to be another way.  I
>>> >>>> know this is far from the typical usage of qemu-img, but is this really the
>>> >>>> expected behaviour or is this just something nobody really needed before?  If it
>>> >>>> is the former, would it be possible to have a parameter that would control this
>>> >>>> behaviour?  If the latter is the case, can that behaviour be changed so that it
>>> >>>> properly replicates the data when `-n` parameter is used?
>>> >>>>
>>> >>>> Basically the only thing we need is to either:
>>> >>>>
>>> >>>> 1) write zeros where they actually are or
>>> >>>>
>>> >>>> 2) turn off explicit sparsification without requesting dense image (basically
>>> >>>>     sparsify only the par that is reported as hole on the source) or
>>> >>>>
>>> >>>> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
>>> >>>>     but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
>>> >>>>     believe, is effectively the same)
>>> >>>>
>>> >>>> If you want to try this out, I found the easiest reproducible way is using
>>> >>>> nbdkit's data plugin, which can simulate whatever source image you like.
>>> >>>
>>> >>> I think what you _really_ want is a commit block job. The problem is
>>> >>> just that you don't have a proper backing file chain, but just a bunch
>>> >>> of NBD connections.
>>> >>>
>>> >>> Can't you get an NBD connection that already provides the condensed form
>>> >>> of the whole snapshot chain directly at the source? If the NBD server
>>> >>> was QEMU, this would actually be easier than providing each snapshot
>>> >>> individually.
>>> >>>
>>> >>> If this isn't possible, I think you need to replicate the backing chain
>>> >>> on the destination instead of converting into the same image again and
>>> >>> again so that qemu-img knows that it must take existing data of the
>>> >>> backing file into consideration:
>>> >>>
>>> >>>     qemu-img convert -O qcow2 nbd://... base.qcow2
>>> >>>     qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
>>> >>>     qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
>>> >>>     ...
>>>
> 
> So I spoke too soon.  This approach fixed the one thing that I was struggling with, but broke the rest, because it completely replicates the last image even when the source provides proper allocation data.  Best to show with an illustration:
> 
>   $ rm -f disk.img snap.img
>   $ dd if=/dev/urandom of=disk.img bs=2M count=1
>   $ dd if=/dev/zero of=snap.img bs=1M count=1
>   $ truncate -s 2M snap.img
>   $ qemu-img map --output=json snap.img
>   [{ "start": 0, "length": 1048576, "depth": 0, "zero": false, "data": true, "offset": 0},
>   { "start": 1048576, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": 1048576}]
>   $ qemu-img convert -f raw -O qcow2 disk.img disk.qcow2
>   $ qemu-img convert -f raw -O qcow2 -B disk.qcow2 snap.img snap.qcow2
>   $ qemu-img convert -f qcow2 -O raw snap.qcow2 output.raw
>   $ hexdump -C output.raw
>   00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>   *
>   00200000
> 


Raw file holes and qcow2 unallocated clusters are not the same thing.  Raw holes are reported
as zero=true (as we see in the map output), and that is considered "allocated" in terms of the
backing chain, so convert will mark the corresponding clusters as ZERO (not UNALLOCATED) in the
target qcow2.

But when you export a qcow2 image with unallocated clusters through NBD, the unallocated clusters
should be reported as zero=false data=false, and qemu-img will then convert them to UNALLOCATED
(not ZERO) clusters in the target qcow2, so it should work.

In qcow2, ZERO and UNALLOCATED clusters work like this:
ZERO: on write - allocate the cluster and write the data; on read - return zeroes
UNALLOCATED: on write - allocate the cluster and write the data; on read - read from the backing file if there is one, otherwise return zeroes
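
A quick way to see the difference (an untested sketch; the file names and sizes
are arbitrary, 64k being the default qcow2 cluster size):

  qemu-img create -f qcow2 base.qcow2 128k
  qemu-io -f qcow2 -c 'write -P 0xaa 0 128k' base.qcow2
  qemu-img create -f qcow2 -F qcow2 -b base.qcow2 top.qcow2
  # make the first cluster an explicit ZERO cluster, leave the second UNALLOCATED
  qemu-io -f qcow2 -c 'write -z 0 64k' top.qcow2
  qemu-img map --output=json top.qcow2

The map output should then report the first 64k at depth 0 as zeroes and the
second 64k at depth 1, i.e. read from base.qcow2; converting top.qcow2 to raw
should give 64k of zeroes followed by 64k of the 0xaa pattern.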


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] Possibly incorrect data sparsification by qemu-img
  2019-04-29  8:58             ` Vladimir Sementsov-Ogievskiy
@ 2019-04-29  9:16               ` Martin Kletzander
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Kletzander @ 2019-04-29  9:16 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: Kevin Wolf, qemu-devel, Richard Jones

[-- Attachment #1: Type: text/plain, Size: 6693 bytes --]

On Mon, Apr 29, 2019 at 08:58:37AM +0000, Vladimir Sementsov-Ogievskiy wrote:
>29.04.2019 10:27, Martin Kletzander wrote:
>> On Wed, Apr 24, 2019 at 09:19:17AM +0200, Kevin Wolf wrote:
>>> Am 24.04.2019 um 08:40 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 23.04.2019 18:08, Kevin Wolf wrote:
>>>> > Am 23.04.2019 um 16:26 hat Martin Kletzander geschrieben:
>>>> >> On Tue, Apr 23, 2019 at 02:12:18PM +0200, Kevin Wolf wrote:
>>>> >>> Am 23.04.2019 um 13:30 hat Martin Kletzander geschrieben:
>>>> >>>> Hi,
>>>> >>>>
>>>> >>>> I am using qemu-img with nbdkit to transfer a disk image and the update it with
>>>> >>>> extra data from newer snapshots.  The end image cannot be transferred because
>>>> >>>> the snapshots will be created later than the first transfer and we want to save
>>>> >>>> some time up front.  You might think of it as a continuous synchronisation.  It
>>>> >>>> looks something like this:
>>>> >>>>
>>>> >>>> I first transfer the whole image:
>>>> >>>>
>>>> >>>>   qemu-img convert -p $nbd disk.raw
>>>> >>>>
>>>> >>>> Where `$nbd` is something along the lines of `nbd+unix:///?socket=nbdkit.sock`
>>>> >>>>
>>>> >>>> Then, after the next snapshot is created, I can update it thanks to the `-n`
>>>> >>>> parameter (the $nbd now points to the newer snapshot with unchanged data looking
>>>> >>>> like holes in the file):
>>>> >>>>
>>>> >>>>   qemu-img convert -p -n $nbd disk.raw
>>>> >>>>
>>>> >>>> This is fast and efficient as it uses block status nbd extension, so it only
>>>> >>>> transfers new data.
>>>> >>>
>>>> >>> This is an implementation detail. Don't rely on it. What you're doing is
>>>> >>> abusing 'qemu-img convert', so problems like what you describe are to be
>>>> >>> expected.
>>>> >>>
>>>> >>>> This can be done over and over again to keep the local
>>>> >>>> `disk.raw` image up to date with the latest remote snapshot.
>>>> >>>>
>>>> >>>> However, when the guest OS zeroes some of the data and it gets written into the
>>>> >>>> snapshot, qemu-img scans for those zeros and does not write them to the
>>>> >>>> destination image.  Checking the output of `qemu-img map --output=json $nbd`
>>>> >>>> shows that the zeroed data is properly marked as `data: true`.
>>>> >>>>
>>>> >>>> Using `-S 0` would write zeros even where the holes are, effectively overwriting
>>>> >>>> the data from the last snapshot even though they should not be changed.
>>>> >>>>
>>>> >>>> Having gone through some workarounds I would like there to be another way.  I
>>>> >>>> know this is far from the typical usage of qemu-img, but is this really the
>>>> >>>> expected behaviour or is this just something nobody really needed before?  If it
>>>> >>>> is the former, would it be possible to have a parameter that would control this
>>>> >>>> behaviour?  If the latter is the case, can that behaviour be changed so that it
>>>> >>>> properly replicates the data when `-n` parameter is used?
>>>> >>>>
>>>> >>>> Basically the only thing we need is to either:
>>>> >>>>
>>>> >>>> 1) write zeros where they actually are or
>>>> >>>>
>>>> >>>> 2) turn off explicit sparsification without requesting dense image (basically
>>>> >>>>     sparsify only the par that is reported as hole on the source) or
>>>> >>>>
>>>> >>>> 3) ideally, just FALLOC_FL_PUNCH_HOLE in places where source did report data,
>>>> >>>>     but qemu-img found they are all zeros (or source reported HOLE+ZERO which, I
>>>> >>>>     believe, is effectively the same)
>>>> >>>>
>>>> >>>> If you want to try this out, I found the easiest reproducible way is using
>>>> >>>> nbdkit's data plugin, which can simulate whatever source image you like.
>>>> >>>
>>>> >>> I think what you _really_ want is a commit block job. The problem is
>>>> >>> just that you don't have a proper backing file chain, but just a bunch
>>>> >>> of NBD connections.
>>>> >>>
>>>> >>> Can't you get an NBD connection that already provides the condensed form
>>>> >>> of the whole snapshot chain directly at the source? If the NBD server
>>>> >>> was QEMU, this would actually be easier than providing each snapshot
>>>> >>> individually.
>>>> >>>
>>>> >>> If this isn't possible, I think you need to replicate the backing chain
>>>> >>> on the destination instead of converting into the same image again and
>>>> >>> again so that qemu-img knows that it must take existing data of the
>>>> >>> backing file into consideration:
>>>> >>>
>>>> >>>     qemu-img convert -O qcow2 nbd://... base.qcow2
>>>> >>>     qemu-img convert -O qcow2 -F qcow2 -B base.qcow2 nbd://... overlay1.qcow2
>>>> >>>     qemu-img convert -O qcow2 -F qcow2 -B overlay1.qcow2 nbd://... overlay2.qcow2
>>>> >>>     ...
>>>>
>>
>> So I spoke too soon.  This approach fixed the one thing that I was struggling with, but broke the rest, because it completely replicates the last image even when the source provides proper allocation data.  Best to show with an illustration:
>>
>>   $ rm -f disk.img snap.img
>>   $ dd if=/dev/urandom of=disk.img bs=2M count=1
>>   $ dd if=/dev/zero of=snap.img bs=1M count=1
>>   $ truncate -s 2M snap.img
>>   $ qemu-img map --output=json snap.img
>>   [{ "start": 0, "length": 1048576, "depth": 0, "zero": false, "data": true, "offset": 0},
>>   { "start": 1048576, "length": 1048576, "depth": 0, "zero": true, "data": false, "offset": 1048576}]
>>   $ qemu-img convert -f raw -O qcow2 disk.img disk.qcow2
>>   $ qemu-img convert -f raw -O qcow2 -B disk.qcow2 snap.img snap.qcow2
>>   $ qemu-img convert -f qcow2 -O raw snap.qcow2 output.raw
>>   $ hexdump -C output.raw
>>   00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>>   *
>>   00200000
>>
>
>
>Raw file holes and qcow2 unallocated clusters are not the same thing. Raw holes are reported
>as zero=true (as we see in map output), and this considered "allocated" in terms of backing-chain.
>And convert will mark corresponding clusters to be ZERO (not UNALLOCATED) in target qcow2.
>
>But when you export qcow2 with unallocated clusters through NBD, unallocated clusters should be
>reported as zero=false data=false, and qemu-img will convert them to UNALLOCATED (not ZERO)
>clusters in target qcow2 and it should work.
>
>In qcow2 ZERO and UNALLOCATED clusters works like this:
>ZERO: on write - allocate clusters and write data, on read - return zeroes
>UNALLOCATED: on write -allocate clusters and write data, on read - read from backing file if we have it, otherwise return zeroes
>

Oh, thanks for the clarification; this makes sense.  I'll try it out and see.

>
>-- 
>Best regards,
>Vladimir

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread

Thread overview: 17+ messages
2019-04-23 11:30 [Qemu-devel] Possibly incorrect data sparsification by qemu-img Martin Kletzander
2019-04-23 11:30 ` Martin Kletzander
2019-04-23 11:36 ` Richard W.M. Jones
2019-04-23 11:36   ` Richard W.M. Jones
2019-04-23 11:55   ` Daniel P. Berrangé
2019-04-23 12:12 ` Kevin Wolf
2019-04-23 12:12   ` Kevin Wolf
2019-04-23 14:26   ` Martin Kletzander
2019-04-23 14:26     ` Martin Kletzander
2019-04-23 15:08     ` Kevin Wolf
2019-04-23 15:08       ` Kevin Wolf
2019-04-24  6:40       ` Vladimir Sementsov-Ogievskiy
2019-04-24  7:19         ` Kevin Wolf
2019-04-24  9:04           ` Martin Kletzander
2019-04-29  7:27           ` Martin Kletzander
2019-04-29  8:58             ` Vladimir Sementsov-Ogievskiy
2019-04-29  9:16               ` Martin Kletzander
