* [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
@ 2018-09-13  3:33 lampahome
  2018-09-13 13:22 ` Max Reitz
  0 siblings, 1 reply; 10+ messages in thread
From: lampahome @ 2018-09-13  3:33 UTC
  To: QEMU Developers

I split my data into 3 chunks and store them in 3 separate images that
form a backing chain:
img.000 <-- img.001 <-- img.002
img.000 is the backing file of img.001, and img.001 is the backing file
of img.002.
img.000 holds the 1st chunk of data, img.001 holds the 2nd chunk, and
img.002 holds the 3rd chunk.

Now I have img.003, which stores COW data for the 1st chunk, and img.002
is the backing file of img.003.
The backing chain looks like this:
  img.000 <-- img.001 <-- img.002 <-- img.003

So img.003 covers the same range as img.000, but with different data.

I know I can use `qemu-img commit`, but it only commits the data from
img.003 into img.002.

If I use `qemu-img rebase -b img.000 img.003`, the data of img.001 and
img.002 will be merged into img.003.

What I want is to commit the data in img.003 only into img.000, because
the two images hold data for the same range (the 1st chunk).

Is there any way to commit (or merge) the data of the active image into
the corresponding backing file?

Thanks


* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-13  3:33 [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd? lampahome
@ 2018-09-13 13:22 ` Max Reitz
  2018-09-13 17:05   ` Eric Blake
  0 siblings, 1 reply; 10+ messages in thread
From: Max Reitz @ 2018-09-13 13:22 UTC
  To: lampahome, QEMU Developers, Qemu-block

On 13.09.18 05:33, lampahome wrote:
> I split data to 3 chunks and save it in 3 independent backing files like
> below:
> img.000 <-- img.001 <-- img.002
> img.000 is the backing file of img.001 and 001 is the backing file of 002.
> img.000 saves the 1st chunk of data and img.001 saves the 2nd chunk of
> data, and img.002 saves the 3rd chunk of data.
> 
> Now I have img.003 stores cow data of 1st chunk and img.002 is the backing
> file of img.003.
> The backing chain is like this:
>   img.000 <-- img.001 <-- img.002 <-- img.003
> 
> So that means the data of img.003 saves the same range with img.000 but
> different data.
> 
> I know I can use *`qemu-img commit'* but it only commit the data from
> img.003 to img.002.
> 
> If I use *`qemu-img rebase -b img.000 img.003`*, the data of img.001 and
> img.002 will merge into img.003.
> 
> What I want is only commit the data in img.003 into img.000 because the
> data of the two image are the same range(1st chunk)
> 
> Is there anyway to commit(or merge) data of active image into corresponding
> backing file?

So img.000, img.001, and img.002 all contain data at completely
different areas, and img.003 only contains data where img.000 contains
data as well?

Say like so:

$ qemu-img create -f qcow2 img.000 3M
$ qemu-img create -f qcow2 -b img.000 img.001
$ qemu-img create -f qcow2 -b img.001 img.002
$ qemu-img create -f qcow2 -b img.002 img.003
$ qemu-io -c 'write -P 1 0M 1M' img.000
$ qemu-io -c 'write -P 2 1M 1M' img.001
$ qemu-io -c 'write -P 3 2M 1M' img.002
$ qemu-io -c 'write -P 4 0M 1M' img.003

(img.000 contains 1s from 0M to 1M;
 img.001 contains 2s from 1M to 2M;
 img.002 contains 3s from 2M to 3M;
 img.003 contains 4s from 0M to 1M (the range of img.000))

In that case, rebase -u might be what you want, so the following should
work (although it can easily corrupt your data if it isn't the case[1]):

$ qemu-img rebase -u -b img.000 img.003
$ qemu-img commit img.003

(And then maybe
$ qemu-img rebase -u -b img.002 img.003
to return to the previous backing chain.)

Max


[1] It will corrupt your data if img.001 or img.002 contain any data
where img.003 also contains data; because then that data of img.003 will
be hidden when viewed through img.001 and img.002.



* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-13 13:22 ` Max Reitz
@ 2018-09-13 17:05   ` Eric Blake
  2018-09-13 18:37     ` Max Reitz
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Blake @ 2018-09-13 17:05 UTC
  To: Max Reitz, lampahome, QEMU Developers, Qemu-block, Markus Armbruster

[adding Markus, because of an interesting observation about --image-opts 
vs. JSON null - search for [1] below]

On 9/13/18 8:22 AM, Max Reitz wrote:
> On 13.09.18 05:33, lampahome wrote:
>> I split data to 3 chunks and save it in 3 independent backing files like
>> below:
>> img.000 <-- img.001 <-- img.002
>> img.000 is the backing file of img.001 and 001 is the backing file of 002.
>> img.000 saves the 1st chunk of data and img.001 saves the 2nd chunk of
>> data, and img.002 saves the 3rd chunk of data.

How have you ensured that these three files are visiting different 
ranges of guest data?

It sounds like you are trying to keep the sizes of .000, .001, and .002 
constant, but updating their respective contents.  Rather unusual, but 
not necessarily a bad idea.

>>
>> Now I have img.003 stores cow data of 1st chunk and img.002 is the backing
>> file of img.003.
>> The backing chain is like this:
>>    img.000 <-- img.001 <-- img.002 <-- img.003
>>
>> So that means the data of img.003 saves the same range with img.000 but
>> different data.
>>
>> I know I can use *`qemu-img commit'* but it only commit the data from
>> img.003 to img.002.

Which, if the guest ranges covered by .000 and .002 are originally
distinct, makes .002 grow in size for any changes that .003 has made
relative to .000 or .001, rather than writing to the respective backing
file.

>>
>> If I use *`qemu-img rebase -b img.000 img.003`*, the data of img.001 and
>> img.002 will merge into img.003.

Which makes .000 grow in size, because you didn't limit how much of .003 
gets committed.  But maybe it's possible to use the 'offset' and 'size' 
parameters to the raw format driver to make qemu-img see only a subset 
of img.003, at which point committing just that subset is easier.  Hmm - 
it might work for img.000, but not so easily for img.001 or img.002, 
because we don't have a clean way to copy from one source offset to a 
different destination offset.  Last month, I proposed a patch to enhance 
'qemu-img dd' to do that - but the argument was that 'qemu-img convert' 
should also be able to do it, with 'qemu-img dd' being a thin veneer 
over convert rather than doing everything itself, so there's still work 
to be done.

>>
>> What I want is only commit the data in img.003 into img.000 because the
>> data of the two image are the same range(1st chunk)
>>
>> Is there anyway to commit(or merge) data of active image into corresponding
>> backing file?
> 
> So img.000, img.001, and img.002 all contain data at completely
> different areas, and img.003 only contains data where img.000 contains
> data as well?
> 
> Say like so:
> 
> $ qemu-img create -f qcow2 img.000 3M
> $ qemu-img create -f qcow2 -b img.000 img.001
> $ qemu-img create -f qcow2 -b img.001 img.002
> $ qemu-img create -f qcow2 -b img.002 img.003

Missing -F qcow2 in those last three lines (you should always specify 
the backing format in the qcow2 metadata, otherwise you are setting 
yourself up for failures because probing is unsafe)

> $ qemu-io -c 'write -P 1 0M 1M' img.000
> $ qemu-io -c 'write -P 2 1M 1M' img.001
> $ qemu-io -c 'write -P 3 2M 1M' img.002
> $ qemu-io -c 'write -P 4 0M 1M' img.003

I'd modify this example to use:
  qemu-io -c 'write -P 4 0M 512k' -c 'write -P 4 1m 512k' \
    -c 'write -P 4 2m 512k' img.003

so that it becomes easier to see if we are ever committing more than 
desired.

> 
> (img.000 contains 1s from 0M to 1M;
>   img.001 contains 2s from 1M to 2M;
>   img.002 contains 3s from 2M to 3M;
>   img.003 contains 4s from 0M to 1M (the range of img.000))

Or, visually, with my tweak to img.003,

img.000     11----
img.001     --22--
img.002     ----33
img.003     4-4-4-
guest sees  414243

and your goal, if I'm understanding, is to do range-based commits so 
that you end up with:

img.000     41----
img.001     --42--
img.002     ----43
img.003     ------
guest sees  414243
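
To make the diagrams above easier to play with, here is a rough sketch of the lookup rule they rely on. It is only a toy string model, not qemu: one character stands for one 512k block, '-' means "not allocated, defer to the backing file", and guest_view is a name made up for this illustration.

```shell
# Toy model only: each layer is a string with one character per 512k block;
# '-' means "not allocated here, defer to the backing file".
# guest_view is a hypothetical helper, not a qemu tool.
guest_view() {
    # Arguments: the layers, from the base image up to the active image.
    view=""
    for layer in "$@"; do
        merged=""
        rest_view=$view
        rest_layer=$layer
        while [ -n "$rest_layer" ]; do
            top=${rest_layer%"${rest_layer#?}"}    # first block of the upper layer
            below=${rest_view%"${rest_view#?}"}    # same block as seen underneath
            [ -n "$below" ] || below="-"
            if [ "$top" = "-" ]; then
                merged="$merged$below"             # unallocated: fall through
            else
                merged="$merged$top"               # allocated: upper layer wins
            fi
            rest_layer=${rest_layer#?}
            rest_view=${rest_view#?}
        done
        view=$merged
    done
    printf '%s\n' "$view"
}

guest_view 11---- --22-- ----33 4-4-4-    # current chain
guest_view 41---- --42-- ----43 ------    # chain after range-based commits
```

Both calls print 414243, which is the point of the exercise: the guest-visible data must not change, only the layer in which each block lives.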

> 
> In that case, rebase -u might be what you want, so the following should
> work (although it can easily corrupt your data if it isn't the case[1]):
> 
> $ qemu-img rebase -u -b img.000 img.003
> $ qemu-img commit img.003

No, that still copies anything that img.003 has changed from .001 or
.002 into .000, making .000 grow in size (that is, your approach changed
img.000 to read 414-4-).  If you can view just a subset of img.003,
then you CAN commit just that subset into img.000 (but not into .001 or
.002, because we don't yet have 'qemu-img commit --target-image-opts' to
specify the 'offset=' argument to the raw driver).  So here's what I tried:

$ qemu-io -c 'r -P 4 0 512k' -c 'r -P 1 512k 512k' -c map --image-opts \
    driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003
read 524288/524288 bytes at offset 0
512 KiB, 1 ops; 0.0002 sec (1.719 GiB/sec and 3521.1268 ops/sec)
read 524288/524288 bytes at offset 524288
512 KiB, 1 ops; 0.0004 sec (1.218 GiB/sec and 2493.7656 ops/sec)
512 KiB (0x80000) bytes     allocated at offset 0 bytes (0x0)
512 KiB (0x80000) bytes not allocated at offset 512 KiB (0x80000)

Yep - that fancy --image-opts syntax lets us use a raw wrapper around
qcow2 to see just the first 1M of img.003.  Now:

$ qemu-img commit --image-opts -b img.000 \
    driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003
qemu-img: Did not find 'img.000' in the backing chain of 
'driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003'

Alas, since 'raw' does not have backing files on its own, qemu-img 
commit refuses to do anything (it will only commit into a known backing 
chain).  I know Max has a proposed series to make filters behave more 
sanely (so that the backing file of an original node is also seen to be 
the backing file of a filter node), but I don't know if that would 
completely help here (the fact that the raw format node is being used 
more as a filter is a bit different from normally using it as a format 
driver - maybe we want size/offset limitations to be an actual filter 
node, separate from the raw format driver?).

But I'm not giving up just yet - we can use qemu-img convert to create a 
temporary file that contains only the data we want committed:

$ qemu-img convert -O qcow2 -B img.000 --image-opts \
    driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003 \
    img.004

achieving:

img.000     11----
img.001     --22--
img.002     ----33
img.003     4-4-4-
guest sees  414243
img.004     4-

and now commit that:

$ qemu-img commit img.004

and double-check what img.000 now contains:

$ qemu-io -c 'r -P 4 0 512k' -c 'r -P 1 512k 512k' img.000
read 524288/524288 bytes at offset 0
512 KiB, 1 ops; 0.0001 sec (2.872 GiB/sec and 5882.3529 ops/sec)
read 524288/524288 bytes at offset 524288
512 KiB, 1 ops; 0.0002 sec (2.078 GiB/sec and 4255.3191 ops/sec)

so now we have achieved:

img.000     41----
img.001     --22--
img.002     ----33
img.003     4-4-4-
guest sees  414243
img.004     --
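
The difference between committing all of img.003 and committing only its first 1M can be sketched in the same toy notation (one character per 512k block, '-' for unallocated). This is not what qemu-img does internally; commit_into is a made-up helper that just overlays strings.

```shell
# Toy string model (one character per 512k block, '-' = unallocated);
# commit_into is a hypothetical helper, not a qemu-img subcommand.
commit_into() {
    # $1 = backing layer, $2 = layer being committed,
    # $3 = number of leading blocks to commit (default: all of them)
    base=$1 top=$2 limit=${3:-${#2}}
    out="" i=0
    while [ -n "$base" ]; do
        b=${base%"${base#?}"}     # current block of the backing layer
        t=${top%"${top#?}"}       # current block of the committed layer
        if [ "$i" -lt "$limit" ] && [ -n "$t" ] && [ "$t" != "-" ]; then
            out="$out$t"          # allocated and inside the window: copy down
        else
            out="$out$b"          # otherwise the backing layer is untouched
        fi
        base=${base#?} top=${top#?} i=$((i + 1))
    done
    printf '%s\n' "$out"
}

commit_into 11---- 4-4-4-      # whole layer, as with 'rebase -u' plus 'commit'
commit_into 11---- 4-4-4- 2    # first 1M only, as via the temporary img.004
```

The first call prints 414-4- (img.000 picks up blocks that belong to the other chunks), the second prints 41----, matching the img.000 row above.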

Which is not quite our end goal - we have not yet freed the storage in 
img.003, AND img.004 is still wasting storage space. We can delete 
img.004 now, but I know of no way to force img.003 to deallocate those 
clusters.  Attempting:

[1]
$ qemu-io -c 'discard 0 1m' --image-opts \
    driver=qcow2,backing=,file.driver=file,file.filename=img.003
warning: Use of "backing": "" is deprecated; use "backing": null instead
discard 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0002 sec (4.399 GiB/sec and 4504.5045 ops/sec)

doesn't work, as 'discard' causes img.003 to now make things read as
zero rather than deferring to the backing chain, even though I
specifically told qemu to operate as if img.003 has no backing image
(it DOES reduce the disk space occupied by img.003, though not the
file size - compare 'ls -l' and 'du' output before and after the
attempt - which means the 'discard' DID end up punching a hole in the
host file).

Also, that warning message is annoying.  We can't spell 'backing=null' 
because that tries to find a node named "null"; to avoid it, we'd have 
to support using --image-opts with JSON on the command line instead of 
dotted names, as in:

$ qemu-io -c 'discard 0 1m' --image-opts '{"driver":"qcow2", 
"backing":null, "file":{"driver":"file", "filename":"img.003"}}'

except THAT doesn't work yet (we haven't converted all our command line 
arguments to taking JSON yet). (end [1])

I guess I can avoid the warning message by using multiple steps for 
temporarily having no backing file:

$ qemu-img rebase -u -b '' img.003
$ qemu-io -c 'discard 0 1m' img.003
discard 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0002 sec (4.811 GiB/sec and 4926.1084 ops/sec)
$ qemu-img rebase -u -F qcow2 -b img.002 img.003

But whether I use the one-liner with --image-opts or the multi-step with
explicit 'rebase -u', I've botched things, because now I have:

img.000     41----
img.001     --22--
img.002     ----33
img.003     z-4-4-
guest sees  014243
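
Why the 'z' matters can be shown with a per-block lookup in the same toy notation. This is a sketch under the assumption that a discarded block ends up as a zero cluster, which is what the output above suggests; view_block is a made-up helper name.

```shell
# Toy model of resolving ONE block through the chain.
# '-' = unallocated, 'z' = zero cluster, anything else = allocated data.
# view_block is a hypothetical helper, not part of qemu.
view_block() {
    # Arguments: that block in each layer, active image first, base last.
    for c in "$@"; do
        case $c in
            -) continue ;;                        # unallocated: keep looking down
            z) printf '%s\n' 0; return ;;         # zero cluster: reads as zero, lookup stops
            *) printf '%s\n' "$c"; return ;;      # allocated data wins
        esac
    done
    printf '%s\n' -                               # nothing allocated anywhere
}

view_block 4 - - 4    # block 0 before the discard (img.003 still holds the 4)
view_block z - - 4    # block 0 after the discard: the zero cluster hides img.000
view_block - - - 4    # what was wanted: img.003 defers to img.000
```

In this model the three calls give 4, 0 and 4. The discard lands in the middle case rather than the last one, which is why the guest now reads 0 in the first block.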

To restore things back for further playing around, do
$ qemu-io -c 'w -P 4 0 512k' img.003

Hmm, another idea:
$ qemu-img rebase -f qcow2 -b img.002 -F qcow2 img.003

Nope, doesn't work - it doesn't do deduplication by removing clusters in 
img.003 that are identical to the clusters in the underlying backing 
chain (img.003 still contains '4-4-4-' instead of the desired '--4-4-'). 
So that sounds like yet another missing feature to add later.

> 
> (And then maybe
> $ qemu-img rebase -u -b img.002 img.003
> to return to the previous backing chain.)
> 
> Max
> 
> 
> [1] It will corrupt your data if img.001 or img.002 contain any data
> where img.003 also contains data; because then that data of img.003 will
> be hidden when viewed through img.001 and img.002.

Sorry - for all my experimenting, I could NOT find a reliable way to 
remove duplicated clusters out of img.003 once they were committed to 
img.000, nor a clean way to commit data from a subset of img.003 to the 
proper img.001 or img.002.  It is possible to manually use qemu-img map 
to learn which portions of img.003 should be copied, then use qemu-nbd 
to map both img.001 and img.003 to NBD devices, and use a series of dd 
commands to copy just those portions of the guest-visible data - but 
again, while that commits to the proper backing file, it does not 
discard the clusters from img.003.  Commit with "mode":"incremental" 
could be used to direct which portions of a file to commit, if you had 
an easy way to inject a bitmap describing that portion of the file, but 
we really don't have decent offline bitmap management via qemu-img yet.

So, while this thread has sparked some ideas for future improvements, 
the takeaway message for now is no, you really can't commit just a 
portion of one qcow2 image into another.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-13 17:05   ` Eric Blake
@ 2018-09-13 18:37     ` Max Reitz
  2018-09-13 19:41       ` Max Reitz
  2018-09-13 20:01       ` Eric Blake
  0 siblings, 2 replies; 10+ messages in thread
From: Max Reitz @ 2018-09-13 18:37 UTC
  To: Eric Blake, lampahome, QEMU Developers, Qemu-block, Markus Armbruster

On 13.09.18 19:05, Eric Blake wrote:
> [adding Markus, because of an interesting observation about --image-opts
> vs. JSON null - search for [1] below]
> 
> On 9/13/18 8:22 AM, Max Reitz wrote:
>> On 13.09.18 05:33, lampahome wrote:
>>> I split data to 3 chunks and save it in 3 independent backing files like
>>> below:
>>> img.000 <-- img.001 <-- img.002
>>> img.000 is the backing file of img.001 and 001 is the backing file of
>>> 002.
>>> img.000 saves the 1st chunk of data and img.001 saves the 2nd chunk of
>>> data, and img.002 saves the 3rd chunk of data.
> 
> How have you ensured that these three files are visiting different
> ranges of guest data?

He did say "independent".

> It sounds like you are trying to keep the sizes of .000, .001, and .002
> constant, but updating their respective contents.  Rather unusual, but
> not necessarily a bad idea.
> 
>>>
>>> Now I have img.003 stores cow data of 1st chunk and img.002 is the
>>> backing
>>> file of img.003.
>>> The backing chain is like this:
>>>    img.000 <-- img.001 <-- img.002 <-- img.003
>>>
>>> So that means the data of img.003 saves the same range with img.000 but
>>> different data.
>>>
>>> I know I can use *`qemu-img commit'* but it only commit the data from
>>> img.003 to img.002.
> 
> Which, if the guest range covered by .000 and .002 are originally
> distinct, makes .002 grow in size for any changes that .003 has made
> relative to .000 or .001, rather than writing to the respective backing
> file.
> 
>>>
>>> If I use *`qemu-img rebase -b img.000 img.003`*, the data of img.001 and
>>> img.002 will merge into img.003.
> 
> Which makes .000 grow in size, because you didn't limit how much of .003
> gets committed.

I probably shouldn't interpret intentions here, but he did say "img.003
stores cow data of 1st chunk".  Which to me sounded like .003 does not
have any changes relative to .001 or .002, so .000 should not grow in size.

> But maybe it's possible to use the 'offset' and 'size'
> parameters to the raw format driver to make qemu-img see only a subset
> of img.003, at which point committing just that subset is easier.

No, because raw is not marked a filter driver, so you cannot commit
through it.

(In fact, you cannot even commit through filter drivers now.)

And this is probably correct, because exactly that offset and size make
it so that the filter BDS presents different data than its child.  So it
isn't a filter.

> Hmm -
> it might work for img.000, but not so easily for img.001 or img.002,
> because we don't have a clean way to copy from one source offset to a
> different destination offset.  Last month, I proposed a patch to enhance
> 'qemu-img dd' to do that - but the argument was that 'qemu-img convert'
> should also be able to do it, with 'qemu-img dd' being a thin veneer
> over convert rather than doing everything itself, so there's still work
> to be done.
> 
>>>
>>> What I want is only commit the data in img.003 into img.000 because the
>>> data of the two image are the same range(1st chunk)
>>>
>>> Is there anyway to commit(or merge) data of active image into
>>> corresponding
>>> backing file?
>>
>> So img.000, img.001, and img.002 all contain data at completely
>> different areas, and img.003 only contains data where img.000 contains
>> data as well?
>>
>> Say like so:
>>
>> $ qemu-img create -f qcow2 img.000 3M
>> $ qemu-img create -f qcow2 -b img.000 img.001
>> $ qemu-img create -f qcow2 -b img.001 img.002
>> $ qemu-img create -f qcow2 -b img.002 img.003
> 
> Missing -F qcow2 in those last three lines (you should always specify
> the backing format in the qcow2 metadata, otherwise you are setting
> yourself up for failures because probing is unsafe)

Is it really unsafe for non-raw images?

>> $ qemu-io -c 'write -P 1 0M 1M' img.000
>> $ qemu-io -c 'write -P 2 1M 1M' img.001
>> $ qemu-io -c 'write -P 3 2M 1M' img.002
>> $ qemu-io -c 'write -P 4 0M 1M' img.003
> 
> I'd modify this example to use:
>  qemu-io -c 'write -P 4 0M 512k' -c 'write -P 4 1m 512k' \
>    -c 'write -P 4 2m 512k' img.003
> 
> so that it becomes easier to see if we are ever committing more than
> desired.

Well, I interpreted the problem in a way that .003 does not shadow any
data from .001 or .002.

>>
>> (img.000 contains 1s from 0M to 1M;
>>   img.001 contains 2s from 1M to 2M;
>>   img.002 contains 3s from 2M to 3M;
>>   img.003 contains 4s from 0M to 1M (the range of img.000))
> 
> Or, visually, with my tweak to img.003,
> 
> img.000     11----
> img.001     --22--
> img.002     ----33
> img.003     4-4-4-
> guest sees  414243
> 
> and your goal, if I'm understanding, is to do range-based commits so
> that you end up with:
> 
> img.000     41----
> img.001     --42--
> img.002     ----43
> img.003     ------
> guest sees  414243
> 
>>
>> In that case, rebase -u might be what you want, so the following should
>> work (although it can easily corrupt your data if it isn't the case[1]):
>>
>> $ qemu-img rebase -u -b img.000 img.003
>> $ qemu-img commit img.003
> 
> No, that still copies anything that img.003 has changed from .001 or
> .002 into .000, making .000 grow in size (that is, your approach changed
> img.000 to read 414-4-).

Well, I definitely misunderstood the issue if .003 changed anything from
.001 or .002, because I didn't read that from the description.  To me,
it sounded like .003 only changed data that's in .000.

> If you can view just a subset of img.003,
> then you CAN commit just that subset into img.000 (but not into .001 or
> .002, because we don't yet have 'qemu-img commit --target-image-opts' to
> specify the 'offset=' argument to the raw driver).  So here's what I tried:
> 
> $ qemu-io -c 'r -P 4 0 512k' -c 'r -P 1 512k 512k' -c map --image-opts
> driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003
> 
> read 524288/524288 bytes at offset 0
> 512 KiB, 1 ops; 0.0002 sec (1.719 GiB/sec and 3521.1268 ops/sec)
> read 524288/524288 bytes at offset 524288
> 512 KiB, 1 ops; 0.0004 sec (1.218 GiB/sec and 2493.7656 ops/sec)
> 512 KiB (0x80000) bytes     allocated at offset 0 bytes (0x0)
> 512 KiB (0x80000) bytes not allocated at offset 512 KiB (0x80000)
> 
> Yep - that fancy --image-opts syntax let us use a raw wrapper around
> qcow2 to see just the first 1M of image.003.  Now:
> 
> $ qemu-img commit --image-opts -b img.000
> driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003
> 
> qemu-img: Did not find 'img.000' in the backing chain of
> 'driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003'
> 
> 
> Alas, since 'raw' does not have backing files on its own, qemu-img
> commit refuses to do anything (it will only commit into a known backing
> chain).  I know Max has a proposed series to make filters behave more
> sanely (so that the backing file of an original node is also seen to be
> the backing file of a filter node), but I don't know if that would
> completely help here (the fact that the raw format node is being used
> more as a filter is a bit different from normally using it as a format
> driver - maybe we want size/offset limitations to be an actual filter
> node, separate from the raw format driver?).

As I said, that isn't a filter.  A filter does not change what data is
visible, and that's very important.

Because for instance, for committing, you need to be able to go
backwards.  So you read something at offset X from the filter, and you
want to commit it down the chain -- of course, you write it to offset X
in the target backing file.  But if you use a raw node with an offset,
that changes, so we'd need to be able to translate it back.

(More generally, if you change the data that's visible, the "data
filter" node would need to provide a way to translate the data.  Well,
the way is clearly there, it's the write function it provides, but we'd
need to do some funky stuff to employ it.)

> But I'm not giving up just yet - we can use qemu-img convert to create a
> temporary file that contains only the data we want committed:
> 
> $ qemu-img convert -O qcow2 -B img.000 --image-opts
> driver=raw,size=1m,file.driver=qcow2,file.file.driver=file,file.file.filename=img.003
> img.004
> 
> achieving:
> 
> img.000     11----
> img.001     --22--
> img.002     ----33
> img.003     4-4-4-
> guest sees  414243
> img.004     4-
> 
> and now commit that:
> 
> $ qemu-img commit img.004
> 
> and double-check what img.000 now contains:
> 
> $ qemu-io -c 'r -P 4 0 512k' -c 'r -P 1 512k 512k' img.000
> read 524288/524288 bytes at offset 0
> 512 KiB, 1 ops; 0.0001 sec (2.872 GiB/sec and 5882.3529 ops/sec)
> read 524288/524288 bytes at offset 524288
> 512 KiB, 1 ops; 0.0002 sec (2.078 GiB/sec and 4255.3191 ops/sec)
> 
> so now we have achieved:
> 
> img.000     41----
> img.001     --22--
> img.002     ----33
> img.003     4-4-4-
> guest sees  414243
> img.004     --
> 
> Which is not quite our end goal - we have not yet freed the storage in
> img.003, AND img.004 is still wasting storage space. We can delete
> img.004 now, but I know of no way to force img.003 to deallocate those
> clusters.  Attempting:
> 
> [1]
> $ qemu-io -c 'discard 0 1m' --image-opts
> driver=qcow2,backing=,file.driver=file,file.filename=img.003
> warning: Use of "backing": "" is deprecated; use "backing": null instead
> discard 1048576/1048576 bytes at offset 0
> 1 MiB, 1 ops; 0.0002 sec (4.399 GiB/sec and 4504.5045 ops/sec)
> 
> doesn't work, as 'discard' causes img.003 to now make things read as
> zero rather than deferring to the backing chain,

Which is intentional because making data re-appear from the backing
chain can be a security issue, as far as I remember.

> even though I
> specifically told qemu to operate as if img.003 has no backing image

discard just says "I don't care what data appears there".  For qcow2 v3
the simplest way is to make it a zero cluster.

> (although it DOES reduce the disk space occupied by img.003, although
> not the file size - compare 'ls -l' and 'du' output before and after the
> attempt - which means the 'discard' DID end up punching a hole in the
> host file).
> 
> Also, that warning message is annoying.  We can't spell 'backing=null'
> because that tries to find a node named "null"; to avoid it, we'd have
> to support using --image-opts with JSON on the command line instead of
> dotted names, as in:
> 
> $ qemu-io -c 'discard 0 1m' --image-opts '{"driver":"qcow2",
> "backing":null, "file":{"driver":"file", "filename":"img.003"}}'
> 
> except THAT doesn't work yet (we haven't converted all our command line
> arguments to taking JSON yet). (end [1])

I hate json:{}, but we have it, so why not use it?

$ qemu-io -c 'discard 0 1m' \
    "json:{'driver':'qcow2','backing':null,
           'file':{'driver':'file','filename':'img.003'}}"
discard 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0000 sec (10.389 GiB/sec and 10638.2979 ops/sec)

> I guess I can avoid the warning message by using multiple steps for
> temporarily having no backing file:
> 
> $ qemu-img rebase -u -b '' img.003
> $ qemu-io -c 'discard 0 1m' img.003
> discard 1048576/1048576 bytes at offset 0
> 1 MiB, 1 ops; 0.0002 sec (4.811 GiB/sec and 4926.1084 ops/sec)
> $ qemu-img rebase -u -F qcow2 -b img.002 img.003
> 
> But whether I use the one-liner with --image-opts or the multi-step with
> explicit 'rebase -u'  I've botched things, because now I have:
> 
> img.000     41----
> img.001     --22--
> img.002     ----33
> img.003     z-4-4-
> guest sees  014243
> 
> To restore things back for further playing around, do
> $ qemu-io -c 'w -P 4 0 512k' img.003
> 
> Hmm, another idea:
> $ qemu-img rebase -f qcow2 -b img.002 -F qcow2 img.003
> 
> Nope, doesn't work - it doesn't do deduplication by removing clusters in
> img.003 that are identical to the clusters in the underlying backing
> chain (img.003 still contains '4-4-4-' instead of the desired '--4-4-').
> So that sounds like yet another missing feature to add later.
> 
>>
>> (And then maybe
>> $ qemu-img rebase -u -b img.002 img.003
>> to return to the previous backing chain.)
>>
>> Max
>>
>>
>> [1] It will corrupt your data if img.001 or img.002 contain any data
>> where img.003 also contains data; because then that data of img.003 will
>> be hidden when viewed through img.001 and img.002.
> 
> Sorry - for all my experimenting, I could NOT find a reliable way to
> remove duplicated clusters out of img.003 once they were committed to
> img.000,

I'm not sure whether your experiments really concern what the reporter
needs in his exact case, but just for fun:

Basically, there is only one way to reliably make an image pass through
data from its backing files again.  Well, two, actually.  One is
qemu-img commit, which (for compatibility, mainly) makes the image empty
after the commit.  The other is just throwing the image away and
re-creating it from scratch.

So in any case, you cannot reliably do that for just a part of the image.

First, split .003 into the part we want to commit and the part we don't
want to commit.  This is a bit tricky without qemu-img dd @seek (or a
corresponding convert parameter), so we'll have to make do with
backing=null so we don't copy anything into the output from img.003's
backing chain.

Or, we would have to use backing=null, but for some reason that doesn't
work.  I'll have to investigate.

So a rebase will have to do instead:

$ qemu-img rebase -u -b '' img.003

$ qemu-img convert -O qcow2 \
    "json:{'driver':'raw','offset':0,'size':1048576,\
           'file':{'driver':'qcow2',\
                   'file':{'driver':'file','filename':'img.003'}}}" \
    "json:{'driver':'null-co','size':2097152}" \
    img.003.commit.000

$ qemu-img convert -O qcow2 \
    "json:{'driver':'null-co','size':1048576}" \
    "json:{'driver':'raw','offset':1048576,'size':2097152,\
           'file':{'driver':'qcow2',\
                   'file':{'driver':'file','filename':'img.003'}}}" \
    img.003.nocommit

Now let's set the backing files.  img.003.commit.000 has only data that
goes into img.000, so that goes there, and img.003.nocommit is going to
replace our old img.003, so that goes where that was:

$ qemu-img rebase -u -b img.000 img.003.commit.000
$ qemu-img rebase -u -b img.002 img.003.nocommit

And now let's commit:

$ qemu-img commit img.003.commit.000

And let's clean up:

$ rm img.003.commit.000
$ mv img.003.nocommit img.003

Done.

(If you want to commit all three parts of img.003 into the three
different base images, you would create img.003.commit.001 and
img.003.commit.002 similarly as above, and then commit those into the
respective base images.  Then you'd just rm img.003* and you're back to
the original state.)
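
For the three-way variant, the per-chunk commands follow a pattern, so a dry-run generator may help avoid typos. This is only a sketch: it prints commands and runs nothing, it assumes the 1M chunks and 3M image size of the example in this thread, it assumes img.003 has already been detached with "qemu-img rebase -u -b '' img.003", and plan_chunk is a made-up helper name.

```shell
# Dry run only: prints the commands for one chunk, executes nothing.
# Assumptions: 1M chunks, 3M virtual size, img.003 already has no backing file.
# plan_chunk and the img.003.commit.NNN names follow the pattern used above.
plan_chunk() {
    idx=$1                                   # chunk number: 0, 1 or 2
    chunk=1048576 total=3145728
    off=$((idx * chunk))                     # where this chunk starts
    pad_after=$((total - off - chunk))       # what remains behind it
    out=$(printf 'img.003.commit.%03d' "$idx")
    base=$(printf 'img.%03d' "$idx")
    src="json:{'driver':'raw','offset':$off,'size':$chunk,'file':{'driver':'qcow2','file':{'driver':'file','filename':'img.003'}}}"
    inputs="\"$src\""
    if [ "$off" -gt 0 ]; then                # pad in front with null-co
        inputs="\"json:{'driver':'null-co','size':$off}\" $inputs"
    fi
    if [ "$pad_after" -gt 0 ]; then          # pad behind with null-co
        inputs="$inputs \"json:{'driver':'null-co','size':$pad_after}\""
    fi
    printf '%s\n' "qemu-img convert -O qcow2 $inputs $out"
    printf '%s\n' "qemu-img rebase -u -b $base $out"
    printf '%s\n' "qemu-img commit $out"
    printf '%s\n' "rm $out"
}

plan_chunk 1    # review the output before running any of it
```

For chunk 0 this reproduces the first convert command above (raw window at offset 0, followed by a 2M null-co pad). As with the manual recipe, it is only safe if the base images really do not overlap in the ranges being committed.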

Max

> nor a clean way to commit data from a subset of img.003 to the
> proper img.001 or img.002.  It is possible to manually use qemu-img map
> to learn which portions of img.003 should be copied, then use qemu-nbd
> to map both img.001 and img.003 to NBD devices, and use a series of dd
> commands to copy just those portions of the guest-visible data - but
> again, while that commits to the proper backing file, it does not
> discard the clusters from img.003.  Commit with "mode":"incremental"
> could be used to direct which portions of a file to commit, if you had
> an easy way to inject a bitmap describing that portion of the file, but
> we really don't have decent offline bitmap management via qemu-img yet.
> 
> So, while this thread has sparked some ideas for future improvements,
> the takeaway message for now is no, you really can't commit just a
> portion of one qcow2 image into another.



* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-13 18:37     ` Max Reitz
@ 2018-09-13 19:41       ` Max Reitz
  2018-09-13 20:06         ` Eric Blake
  2018-09-13 20:01       ` Eric Blake
  1 sibling, 1 reply; 10+ messages in thread
From: Max Reitz @ 2018-09-13 19:41 UTC (permalink / raw)
  To: Eric Blake, lampahome, QEMU Developers, Qemu-block, Markus Armbruster

On 13.09.18 20:37, Max Reitz wrote:

[...]

> Or, we would have to use backing=null, but for some reason that doesn't
> work.  I'll have to investigate.

Turns out this was fixed in e59a0cf17b1b9932b65e6fc25d6856976f5e4831.

(Why does Fedora still have only qemu 2.11?)

> So rebase will need to do:
> 
> $ qemu-img rebase -u -b '' img.003
> 
> $ qemu-img convert -O qcow2 \
>     "json:{'driver':'raw','offset':0,'size':1048576,\
>            'file':{'driver':'qcow2',\
>                    'file':{'driver':'file','filename':'img.003'}}}" \
>     "json:{'driver':'null-co','size':2097152}" \
>     img.003.commit.000
> 
> $ qemu-img convert -O qcow2 \
>     "json:{'driver':'null-co','size':1048576}" \
>     "json:{'driver':'raw','offset':1048576,'size':2097152,\
>            'file':{'driver':'qcow2',\
>                    'file':{'driver':'file','filename':'img.003'}}}" \
>     img.003.nocommit

So starting with 2.12, putting a "'backing':null" behind
"'driver':'qcow2'," will work just as well as rebasing img.003.
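
A minimal sketch of that variant, assuming qemu 2.12 or later and the same 1 MiB / 3 MiB layout as the test images: the 'backing':null key sits inside the qcow2 layer, so img.003 itself does not need to be rebased first. The variable names are only illustrative, and the command is printed rather than run:

```shell
#!/bin/sh
# Sketch of the 'backing':null variant (needs qemu >= 2.12 per the note
# above). The qcow2 layer ignores img.003's backing chain, so only data
# stored in img.003 itself is copied. Prints the command; does not run it.
SRC=img.003
qcow2_nobacking="{'driver':'qcow2','backing':null,'file':{'driver':'file','filename':'$SRC'}}"
slice="json:{'driver':'raw','offset':0,'size':1048576,'file':$qcow2_nobacking}"
pad="json:{'driver':'null-co','size':2097152}"

printf 'qemu-img convert -O qcow2 "%s" "%s" %s.commit.000\n' "$slice" "$pad" "$SRC"
```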

Max



* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-13 18:37     ` Max Reitz
  2018-09-13 19:41       ` Max Reitz
@ 2018-09-13 20:01       ` Eric Blake
  2018-09-13 20:44         ` Max Reitz
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Blake @ 2018-09-13 20:01 UTC (permalink / raw)
  To: Max Reitz, lampahome, QEMU Developers, Qemu-block, Markus Armbruster

On 9/13/18 1:37 PM, Max Reitz wrote:
> On 13.09.18 19:05, Eric Blake wrote:
>> [adding Markus, because of an interesting observation about --image-opts
>> vs. JSON null - search for [1] below]
>>
>> On 9/13/18 8:22 AM, Max Reitz wrote:
>>> On 13.09.18 05:33, lampahome wrote:
>>>> I split data to 3 chunks and save it in 3 independent backing files like
>>>> below:
>>>> img.000 <-- img.001 <-- img.002
>>>> img.000 is the backing file of img.001 and 001 is the backing file of
>>>> 002.
>>>> img.000 saves the 1st chunk of data and img.001 saves the 2nd chunk of
>>>> data, and img.002 saves the 3rd chunk of data.
>>
>> How have you ensured that these three files are visiting different
>> ranges of guest data?
> 
> He did say "independent".

True, but I'm curious how they were created in the first place (our 
simple qemu-io -c 'write ...' is fine for testing, but nothing like 
knowing the real story)


>>> $ qemu-img create -f qcow2 img.000 3M
>>> $ qemu-img create -f qcow2 -b img.000 img.001
>>> $ qemu-img create -f qcow2 -b img.001 img.002
>>> $ qemu-img create -f qcow2 -b img.002 img.003
>>
>> Missing -F qcow2 in those last three lines (you should always specify
>> the backing format in the qcow2 metadata, otherwise you are setting
>> yourself up for failures because probing is unsafe)
> 
> Is it really unsafe for non-raw images?

In practice, not a problem for isolated testing. But it DOES interfere 
with libvirt - libvirt assumes that any image that was not explicitly 
specified is raw, rather than probing it, and treating img.002 as raw 
(with no access to img.000 or img.001) means reading through img.003 
sees garbage.
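
For illustration, the test chain could be created with the backing format recorded explicitly. This sketch only prints the commands; the helper name chain_cmds is made up here, and the image names follow the example in this thread:

```shell
#!/bin/sh
# Sketch: print the commands that build the test chain with an explicit
# backing format (-F qcow2) recorded in each overlay.
# Pipe the output to "sh" to actually run them.
chain_cmds() {
    n=$1       # number of images in the chain
    size=$2    # virtual size of the base image
    printf 'qemu-img create -f qcow2 img.000 %s\n' "$size"
    i=1
    while [ "$i" -lt "$n" ]; do
        printf 'qemu-img create -f qcow2 -b img.%03d -F qcow2 img.%03d\n' \
            "$((i - 1))" "$i"
        i=$((i + 1))
    done
}

chain_cmds 4 3M
```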

> 
>>> $ qemu-io -c 'write -P 1 0M 1M' img.000
>>> $ qemu-io -c 'write -P 2 1M 1M' img.001
>>> $ qemu-io -c 'write -P 3 2M 1M' img.002
>>> $ qemu-io -c 'write -P 4 0M 1M' img.003
>>
>> I'd modify this example to use:
>>   qemu-io -c 'write -P 4 0M 512k' -c 'write -P 4 1m 512k' \
>>     -c 'write -P 4 2m 512k' img.003
>>
>> so that it becomes easier to see if we are ever committing more than
>> desired.
> 
> Well, I interpreted the problem in a way that .003 does not shadow any
> data from .001 or .002.

True, but the question is again - how was the actual img.003 created, and
how do we ensure that it really does just touch clusters shadowed from
.000? (qemu-img map output helps, if it's not too verbose.)
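
One possible way to do that check, sketched under the assumption that the human-readable qemu-img map output has four columns (Offset, Length, Mapped to, File) with hex or decimal numbers. The helper name extents_within is hypothetical:

```shell
#!/bin/sh
# Sketch: read "qemu-img map" human-readable output on stdin and report
# whether every extent stored in FILE lies inside [0, LIMIT).
# Assumes four columns: Offset, Length, Mapped to, File.
extents_within() {
    file=$1
    limit=$2
    ok=yes
    while read -r off len _mapped name; do
        # Skip the header line and extents that live in other images.
        [ "$name" = "$file" ] || continue
        end=$(($off + $len))
        [ "$end" -le "$limit" ] || ok=no
    done
    echo "$ok"
}

# Usage: qemu-img map img.003 | extents_within img.003 1048576
```

It prints "yes" when img.003 only holds data in the first 1 MiB, and "no" as soon as one of its extents ends beyond that limit.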


>> $ qemu-io -c 'discard 0 1m' --image-opts
>> driver=qcow2,backing=,file.driver=file,file.filename=img.003
>> warning: Use of "backing": "" is deprecated; use "backing": null instead
>> discard 1048576/1048576 bytes at offset 0
>> 1 MiB, 1 ops; 0.0002 sec (4.399 GiB/sec and 4504.5045 ops/sec)
>>
>> doesn't work, as 'discard' causes img.003 to now make things read as
>> zero rather than deferring to the backing chain,
> 
> Which is intentional because making data re-appear from the backing
> chain can be a security issue, as far as I remember.

It can be a potential issue if there is a backing file (exposing data 
that you thought was wiped is not fun).  But where there is NO backing 
file, it's overly cautious, and gets in our way (we read all zeros from 
a file with no backing, whether the cluster is marked as 0 or as 
defer-to-backing).  I'm okay if we still keep the overly cautious way by 
default, but having a knob to say "discard this, and I really do mean 
discard rather than read back as 0" would be useful in qemu (after all, 
that's what fallocate(FALLOC_FL_NO_HIDE_STALE) has recently been used 
for in the kernel, as the knob for whether discarding on a block device 
must read back as zero or may go faster [2]).

[2] https://lore.kernel.org/patchwork/patch/953421/

>>
>> $ qemu-io -c 'discard 0 1m' --image-opts '{"driver":"qcow2",
>> "backing":null, "file":{"driver":"file", "filename":"img.003"}}'
>>
>> except THAT doesn't work yet (we haven't converted all our command line
>> arguments to taking JSON yet). (end [1])
> 
> I hate json:{}, but we have it, so why not use it?
> 
> $ qemu-io -c 'discard 0 1m' \
>      "json:{'driver':'qcow2','backing':null,
>             'file':{'driver':'file','filename':'img.003'}}"

Hmm - that's the pseudo-JSON protocol rather than --image-opts detecting 
a first character of '{'. But yeah, that works for getting at 
"backing":null cleaner than the "backing=" with intentionally empty 
argument via dotted syntax.


>> Sorry - for all my experimenting, I could NOT find a reliable way to
>> remove duplicated clusters out of img.003 once they were committed to
>> img.000,
> 
> I'm not sure whether your experiments really concern what the reporter
> needs in his exact case, but just for fun:

Indeed - lampahome, concrete tests with accurate reproduction 
instructions always make life easier for people trying to help you.

> 
> Basically, there is only one way to reliably make an image pass through
> data from its backing files again.  Well, two, actually.  One is
> qemu-img commit, which (for compatibility, mainly) makes the image empty
> after the commit.

And only if you did NOT use the -b option (in other words, it only 
empties the file if you are committing to the immediate backing file, 
not deep in the chain).

>  The other is just throwing the image away and
> re-creating it from scratch.

Well yeah, there's that. But now you have a transient problem of extra 
pressure on your storage, while you have duplicated blocks between old 
and new images, prior to being able to remove the old image.  If the 
goal is to make img.000 not grow during the commit, I was assuming that 
we are already storage-constrained, and any solution that does in-place 
modification is therefore better than one that has to create yet another 
copy of data, even if the end result is the same once all operations 
have finished.

> 
> So in any case, you cannot reliably do that for just a part of the image.
> 
> First, split .003 into the part we want to commit and the part we don't
> want to commit.  This is a bit tricky without qemu-img dd @seek (or a
> corresponding convert parameter), so we'll have to make do with
> backing=null so we don't copy anything into the output from img.003's
> backing chain.
> 
> Or, we would have to use backing=null, but for some reason that doesn't
> work.  I'll have to investigate.

Just so I'm following along, what didn't work? 'backing':null in a 
json:{...} pseudoformat, or driver=raw,file.driver=qcow2,file.backing=, 
in dotted syntax?

> 
> So rebase will need to do:
> 
> $ qemu-img rebase -u -b '' img.003
> 
> $ qemu-img convert -O qcow2 \
>      "json:{'driver':'raw','offset':0,'size':1048576,\
>             'file':{'driver':'qcow2',\
>                     'file':{'driver':'file','filename':'img.003'}}}" \
>      "json:{'driver':'null-co','size':2097152}" \
>      img.003.commit.000

Oh right - you can indeed concatenate multiple inputs into one output 
with qemu-img convert.

> 
> $ qemu-img convert -O qcow2 \
>      "json:{'driver':'null-co','size':1048576}" \
>      "json:{'driver':'raw','offset':1048576,'size':2097152,\
>             'file':{'driver':'qcow2',\
>                     'file':{'driver':'file','filename':'img.003'}}}" \
>      img.003.nocommit

So you created:

img.000             11----
img.001             --22--
img.002             ----33
img.003             4-4-4-
guest sees          414243
img.003.commit.000  4-----
img.003.nocommit    --4-4-


> 
> Now let's set the backing files.  img.003.commit.000 has only data that
> goes into img.000, so that goes there, and img.003.nocommit is going to
> replace our old img.003, so that goes where that was:
> 
> $ qemu-img rebase -u -b img.000 img.003.commit.000
> $ qemu-img rebase -u -b img.002 img.003.nocommit
> 
> And now let's commit:
> 
> $ qemu-img commit img.003.commit.000
> 
> And let's clean up:
> 
> $ rm img.003.commit.000
> $ mv img.003.nocommit img.003
> 
> Done.

Done, but with temporary storage usage higher than doing it in place.

> 
> (If you want to commit all three parts of img.003 into the three
> different base images, you would create img.003.commit.001 and
> img.003.commit.002 similarly as above, and then commit those into the
> respective base images.  Then you'd just rm img.003* and you're back to
> the original state.)

Your solution of qemu-img convert to concatenate null-co with an offset 
of img.003 is nice.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-13 19:41       ` Max Reitz
@ 2018-09-13 20:06         ` Eric Blake
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Blake @ 2018-09-13 20:06 UTC (permalink / raw)
  To: Max Reitz, lampahome, QEMU Developers, Qemu-block, Markus Armbruster

On 9/13/18 2:41 PM, Max Reitz wrote:
> On 13.09.18 20:37, Max Reitz wrote:
> 
> [...]
> 
>> Or, we would have to use backing=null, but for some reason that doesn't
>> work.  I'll have to investigate.
> 
> Turns out this was fixed in e59a0cf17b1b9932b65e6fc25d6856976f5e4831.
> 
> (Why does Fedora still have only qemu 2.11?)

Fedora 28 + virtmaint-sig-virt-preview COPR has qemu 3.0.  Come join us 
on the bleeding edge :)

> So starting with 2.12, putting a "'backing':null" behind
> "'driver':'qcow2'," will work just as well as rebasing img.003.

Good to hear.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-13 20:01       ` Eric Blake
@ 2018-09-13 20:44         ` Max Reitz
  2018-09-14  2:19           ` lampahome
  0 siblings, 1 reply; 10+ messages in thread
From: Max Reitz @ 2018-09-13 20:44 UTC (permalink / raw)
  To: Eric Blake, lampahome, QEMU Developers, Qemu-block, Markus Armbruster

On 13.09.18 22:01, Eric Blake wrote:
> On 9/13/18 1:37 PM, Max Reitz wrote:
>> On 13.09.18 19:05, Eric Blake wrote:

[...]

>>> $ qemu-io -c 'discard 0 1m' --image-opts
>>> driver=qcow2,backing=,file.driver=file,file.filename=img.003
>>> warning: Use of "backing": "" is deprecated; use "backing": null instead
>>> discard 1048576/1048576 bytes at offset 0
>>> 1 MiB, 1 ops; 0.0002 sec (4.399 GiB/sec and 4504.5045 ops/sec)
>>>
>>> doesn't work, as 'discard' causes img.003 to now make things read as
>>> zero rather than deferring to the backing chain,
>>
>> Which is intentional because making data re-appear from the backing
>> chain can be a security issue, as far as I remember.
> 
> It can be a potential issue if there is a backing file (exposing data
> that you thought was wiped is not fun).  But where there is NO backing
> file, it's overly cautious, and gets in our way (we read all zeros from
> a file with no backing, whether the cluster is marked as 0 or as
> defer-to-backing).  I'm okay if we still keep the overly cautious way by
> default, but having a knob to say "discard this, and I really do mean
> discard rather than read back as 0" would be useful in qemu (after all,
> that's what fallocate(FALLOC_FL_NO_HIDE_STALE) has recently been used
> for in the kernel, as the knob for whether discarding on a block device
> must read back as zero or may go faster [2]).
> 
> [2] https://lore.kernel.org/patchwork/patch/953421/

Maybe, but I don't see how this would improve anything for qcow2 v3.
Fully unmapping a cluster or making it a zero cluster is basically the
same.  Why would we make qcow2 present effectively random data, when we
can easily make it well-defined?

(It may make a difference for raw images, but this discussion is mainly
about qcow2 and how you could abuse such a feature for making backing
file content reappear. :-))

I just realized I myself have a need to punch such holes, though.  Deep
on my todo list there's this point of making active commit punch holes
in the overlay, because currently, it writes data twice: Once to the
overlay, once to the backing file (like every mirror).  But if for the
respective cluster the backing file is visible from the overlay, we
could simply punch a hole in it and could skip writing the data there.

[...]

>> Basically, there is only one way to reliably make an image pass through
>> data from its backing files again.  Well, two, actually.  One is
>> qemu-img commit, which (for compatibility, mainly) makes the image empty
>> after the commit.
> 
> And only if you did NOT use the -b option (in other words, it only
> empties the file if you are committing to the immediate backing file,
> not deep in the chain).

Yep, because all images between base and top will possibly become
garbage due to that operation.  So if we emptied top, it'd become
garbage, too.  Which is why we don't empty it, so it stays valid.

And technically, also only if you did not use the -d option, because
that skips the emptying.  Which is useful if you're just going to delete
the image anyway (as in the example I gave here).

>>  The other is just throwing the image away and
>> re-creating it from scratch.
> 
> Well yeah, there's that. But now you have a transient problem of extra
> pressure on your storage, while you have duplicated blocks between old
> and new images, prior to being able to remove the old image.  If the
> goal is to make img.000 not grow during the commit, I was assuming that
> we are already storage-constrained, and any solution that does in-place
> modification is therefore better than one that has to create yet another
> copy of data, even if the end result is the same once all operations
> have finished.

What if you use qemu-img create -n to overwrite it?

(But it's all just academic anyway.  What you'd want is a way to discard
parts of an image, and we just don't have that.)

[...]

>>
>> Now let's set the backing files.  img.003.commit.000 has only data that
>> goes into img.000, so that goes there, and img.003.nocommit is going to
>> replace our old img.003, so that goes where that was:
>>
>> $ qemu-img rebase -u -b img.000 img.003.commit.000
>> $ qemu-img rebase -u -b img.002 img.003.nocommit
>>
>> And now let's commit:
>>
>> $ qemu-img commit img.003.commit.000
>>
>> And let's clean up:
>>
>> $ rm img.003.commit.000
>> $ mv img.003.nocommit img.003
>>
>> Done.
> 
> Done, but with temporary storage usage higher than doing it in place.

Yes, that's true.

>> (If you want to commit all three parts of img.003 into the three
>> different base images, you would create img.003.commit.001 and
>> img.003.commit.002 similarly as above, and then commit those into the
>> respective base images.  Then you'd just rm img.003* and you're back to
>> the original state.)
> 
> Your solution of qemu-img convert to concatenate null-co with an offset
> of img.003 is nice.

I'm not sure whether I'd call it "nice".  "Interesting" probably, yes.

But it is rather obscure, probably nobody outside of qemu-img developers
knows that you can do something like that.  Also, it's only an offline
solution that doesn't readily translate into an online one.

Maybe you could mirror img.003 (filtered) to img.003.nocommit, then
complete the mirror, so the latter replaces the former, and then mirror
the to-be-committed part of img.003 (which is no longer in use) to
img.003.commit.000?  And then...  Well, what exactly.  The right thing
would probably be to attach img.003.commit.000 as an overlay of img.000
(currently requires a blockdev-del and blockdev-add with backing=img.000
(or backing=null and then blockdev-snapshot, but why)).  And then you'd
commit it down, if blockers allow it.

In that time, img.003.nocommit could have received new data in the
img.000 area, though, but that's probably OK.

Max



* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-13 20:44         ` Max Reitz
@ 2018-09-14  2:19           ` lampahome
  2018-09-14 14:48             ` Eric Blake
  0 siblings, 1 reply; 10+ messages in thread
From: lampahome @ 2018-09-14  2:19 UTC (permalink / raw)
  To: Max Reitz; +Cc: Eric Blake, QEMU Developers, Qemu-block, Markus Armbruster

Sorry, I need to explain what I am trying to do.

Goal: I want to back up a block device into qcow2 format images.
I ran into the file size limit of the filesystem, e.g. the maximum is
16TB for any file in ext4, but the block device may be 32TB or more.

One way I figured out is to divide the data of the device into 1TB chunks,
save every chunk into its own qcow2 image (because I am not changing the
filesystem), and connect the images with a backing chain.
(That's what I meant by the ranges being different.)
Ex: 1st chunk of device is saved into image.000
2nd chunk of device is saved into image.001
Nth chunk of device is saved into image.(N-1)
...etc

I can see all block device data when I mount image.(N-1) with qemu-nbd,
because the chunks don't overlap and all chunks are connected by the
backing chain.

Now I want to do the next thing: incremental backup.
When I modify data in the 1st chunk, my idea is to write the new 1st chunk
to a new image, image.N, and make image.(N-1) the backing file of
image.N.
That's because I want to keep the data from before the modification, so I
can roll back at any time.

So now I have two versions of the block device (like the concept of a
snapshot):
One is image.000 to image.(N-1). I can access the data from before the
modification by mounting image.(N-1) through qemu-nbd.
The other one is image.000 to image.N. I can access the data after the
modification by mounting image.N through qemu-nbd (because the visible 1st
chunk is in image.N).

Consider this situation:
000   A - - - - - - - -  <<<<<--- stores the 1st chunk of block device
001   - B - - - - - - -
002   - - C - - - - - - (1st state of block device)
003   A' - - - - - - - - <<<<<--- stores the 1st chunk of block device, but the data is different
004   - - - D - - - - - (2nd state of block device)
005   - - - - E - - - -  (3rd state of block device)

The original problem is: I want to remove the 2nd state (003 and 004), but
I need to keep the data of 003 and 004.
If I just commit 003, the A' of 003 gets committed into 002, because 002 is
the backing file of 003.
I am trying to find a way to make it commit only from 003 into 000.


* Re: [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd?
  2018-09-14  2:19           ` lampahome
@ 2018-09-14 14:48             ` Eric Blake
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Blake @ 2018-09-14 14:48 UTC (permalink / raw)
  To: lampahome, Max Reitz; +Cc: QEMU Developers, Qemu-block, Markus Armbruster

On 9/13/18 9:19 PM, lampahome wrote:
> Sorry, I need to explain what case I want to do
> 
> Todo: I want to *backup a block device into qcow2 format image.*
> I met a problem which is the *file size limit of filesystem* ex: Max is
> 16TB for any file in ext4, but the block device maybe 32TB or more.
> 
> I figure out one way is to *divide data of device into 1TB chunk* and save
> every chunk into qcow2 image cuz I don't change filesystem, and  connect
> with backing chain.

A better way would be to use a different filesystem that does not have 
those limits, or even better to just directly use a raw block device 
with the size you need instead of worrying about storing a file system 
on top of the block device just to introduce artificial size limitations 
into the mix.  LVM is great for that.

> *(That's what I said range is different)*
> Ex: 1st chunk of device will save into image.000
> 2nd chunk of device will save into image.001
> Nth chunk of device will save into image.(N-1)
> ...etc
> 
> I can see all block device data when I mount image.(N-1) by qemu-nbd cuz
> the chunk doesn't overlap and all chunks connect by backing chain.

How exactly did you create those images?  I'm trying to verify the steps 
you used to split the image. I know the concept of the split, but 
without seeing actual commands used, I don't know that you actually 
accomplished the split in the manner desired.  (It's okay if a 
reproduction uses smaller scales for speed, such as splitting a 32M 
image across 1M qcow2 files - the point remains that seeing the actual 
steps used may offer additional insights into your usage scenario).

Or are you trying to ask if it is possible to create such a fragmented 
design with current tools?  (The answer that we've given you is that no, 
it is not easy to do, because no one has needed it so far).  There's no 
way to tell a running qemu that writes to offsets 0-1M go into one file, 
while writes to offsets 1M to 2M go into another - ALL writes go into 
the currently active layer, regardless of the offset represented by the 
write.

It would be possible to come up with a new driver (or to add yet another 
mode to the existing quorum driver) that DOES allow runtime 
concatenation of multiple subsidiary devices, in order to present a 
linear view of those images as a single guest device.  To an extent, 
that's what 'qemu-img convert image1 image2 imageout' is doing, except 
that qemu-img is doing it via manual hacks, rather than something baked 
into the internal qemu block layer (we'd need it in the qemu block layer 
for it to work with a running guest with random access, rather than just 
a one-time conversion pass).  But no one has submitted patches for that yet.

> 
> Now I want to do next thing: *Incremental backup*
> When I modify data of 1st chunk, what I thought is to write new 1st chunk
> to new image *image.N* and let *image.(N-1)* be the backing file of
> *image.N* .
> That's cuz I want to store the data before modified to roll back anytime.

Qemu DOES support incremental backups via persistent bitmaps coupled 
with NBD exports.  See 
https://bugzilla.redhat.com/show_bug.cgi?id=1207657#c27 for a 
demonstration of all the steps involved, but it is quite possible to 
create an NBD export of a point-in-time incremental of a running guest, 
where you can then query over NBD which portions of the backup represent 
deltas from your earlier point in time (by using a bitmap to track which 
clusters were written from the earlier point in time), and where you can 
read the data from NBD in ANY manner you see fit (including reading 
dirty clusters from 0-1M to write into backup file .000, reading dirty 
clusters from 1M-2M to write into backup file .001, and so on).  So if 
you want to split your backing file into ranges (which I already 
questioned as to how you plan to do that, given that the subsequent 
writes are not split), you can at least create incremental backups that 
are also split.
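
To make the "dirty clusters from 0-1M go into .000, 1M-2M into .001" idea concrete, here is a small sketch of the arithmetic only. It does not talk to NBD or read any bitmap; the function name split_extent and the three-digit suffix scheme are assumptions for illustration:

```shell
#!/bin/sh
# Sketch: split one dirty extent (START, LENGTH in bytes) at CHUNK
# boundaries and print "suffix offset length" per piece, where suffix
# names the backup file (000, 001, ...) that covers that range.
split_extent() {
    start=$1
    remaining=$2
    chunk=$3
    while [ "$remaining" -gt 0 ]; do
        idx=$((start / chunk))
        # Bytes left before the next chunk boundary.
        room=$(((idx + 1) * chunk - start))
        if [ "$remaining" -lt "$room" ]; then
            piece=$remaining
        else
            piece=$room
        fi
        printf '%03d %d %d\n' "$idx" "$start" "$piece"
        start=$((start + piece))
        remaining=$((remaining - piece))
    done
}

# A 1 MiB write starting at 512 KiB straddles the first two 1 MiB chunks.
split_extent 524288 1048576 1048576
```

With 1 MiB chunks, the example call yields one piece for file .000 (the second half of the first chunk) and one for file .001 (the first half of the second chunk).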

> 
> So now I have two *version of block device(like concept of snapshot)*:
> One is image.000 to image.(N-1). I can access the data before modify by
> mount image.(N-1) through qemu-nbd
> The other one is image.000 to image.N.  I can access the data after modify
> by mount image.N through qemu-nbd(cuz the visible 1st chunk are in the
> image.N)
> 
> Consider about the situation:
> 000   A - - - - - - - -  <<<<<---  store the 1st chunk of block device
> 001   - B - - - - - - -
> 002   - - C - - - - - - (1st state of block device)
> 003   A' - - - - - - - - <<<<<--- store the 1st chunk of block device, but
> data is different
> 004   - - - D - - - - - (2nd state of block device)
> 005   - - - - E - - - -  (3rd state of block device)
> 
> The original problem is If I want to remove the 2nd state(003 and 004) but
> I need to keep the data of 003 and 004.
> If I just commit 003, the A' of 003 must be committed into 002 cuz 002 is
> the backing file of 003.
> I try to figure out some way to let it only commit from 003 into 000.
> 

I'm not quite following your diagram. My naive read (probably wrong) is 
that you are trying to present a 9M image (scaled M to G or T as 
appropriate) to the guest, as represented by the 9 characters, but that 
the initial image only populated 3M of the 9 with guest-visible contents 
represented by ABC------.  So you want to split that into files 000 
containing offsets 0-1M (A--------), 001 containing offsets 1M-2M 
(-B-------), and 002 containing offsets 2M-3M (--C------).  Then you 
want to run the guest, which does some modifications in offsets 0-1M 
(I'll write it as "a" instead of "A'", you could also have chosen a 
different letter except that your example already uses "D" elsewhere), 
so the guest now sees (aBC------), and you want to store that 
incremental backup in file 003, containing just (a--------).  But that's 
where I got confused - my original assumption was that 003 represented 
offsets 3M-4M (---X-----), but you are now showing it as representing 
offsets 0-1M.  It's also not clear which files in your list have which 
other files as backing files.

So, since I got confused, it may help if you spend more time giving even 
more details diagramming your data splits, with exact filenames that you 
are trying to manipulate, over multiple points in time.

Or, if you really do want to use the quorum block driver to implement a 
new block driver that concatenates multiple subsidiary drivers into a 
linear range, then it would indeed become possible to direct writes into 
a specific file.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


end of thread, other threads:[~2018-09-14 14:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-13  3:33 [Qemu-devel] Can I only commit from active image to corresponding range of its backing file by qemu cmd? lampahome
2018-09-13 13:22 ` Max Reitz
2018-09-13 17:05   ` Eric Blake
2018-09-13 18:37     ` Max Reitz
2018-09-13 19:41       ` Max Reitz
2018-09-13 20:06         ` Eric Blake
2018-09-13 20:01       ` Eric Blake
2018-09-13 20:44         ` Max Reitz
2018-09-14  2:19           ` lampahome
2018-09-14 14:48             ` Eric Blake
