qcow2 preallocation and backing files

From: Alberto Garcia @ 2019-11-20 12:06 UTC
  To: qemu-devel; +Cc: Kevin Wolf, qemu-block, Max Reitz

Hi,

as we discussed yesterday on IRC, there's an inconsistency in the way
qcow2 preallocation works.

Let's create an image and fill it with data:

   $ qemu-img create -f raw base.img 1M
   $ qemu-io -f raw -c 'write -P 0xFF 0 1M' base.img

Now QEMU won't let us create a new image backed by base.img using
preallocation:

   $ qemu-img create -f qcow2 -b base.img -o preallocation=metadata active.img
   qemu-img: active.img: Backing file and preallocation cannot be used at the same time

The reason is that once a cluster is preallocated (i.e. it has a valid
L2 entry pointing to a host offset) the guest won't see the contents
of the backing file, so those options conflict with each other.

It is possible however to create an image that is smaller than
the backing file and then resize it using preallocation. In this
case qemu-img will happily accept any --preallocation option, with
different results from the guest's point of view:

   # The whole image reads as 0xFF (the data comes from base.img)
   $ qemu-img create -f qcow2 -b base.img active.img 512K

   # Either: the second half of the image also reads as 0xFF
   $ qemu-img resize --preallocation=off active.img 1M

   # Or (instead of the previous command): the second half reads as zeroes
   $ qemu-img resize --preallocation=metadata active.img 1M
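
The difference between the two resize modes can be reproduced with a
toy model of cluster lookup. This is only a sketch and not QEMU code:
the Image class, its resize() method and the 64K cluster size are
assumptions made for illustration, and each cluster is reduced to a
single byte value.

```python
CLUSTER = 64 * 1024  # assumed cluster size


class Image:
    """Toy model: one byte value per cluster instead of real data."""

    def __init__(self, size, backing=None, fill=None):
        self.size = size
        self.backing = backing
        # Maps cluster index -> byte value; a missing key means "unallocated".
        self.clusters = {}
        if fill is not None:
            for i in range(size // CLUSTER):
                self.clusters[i] = fill

    def read(self, offset):
        """Return the byte the guest sees at `offset`."""
        if offset >= self.size:
            raise ValueError("read beyond end of image")
        idx = offset // CLUSTER
        if idx in self.clusters:
            # Allocated cluster: the backing file is never consulted.
            return self.clusters[idx]
        if self.backing is not None and offset < self.backing.size:
            return self.backing.read(offset)
        return 0x00  # unallocated and no backing data: reads as zeroes

    def resize(self, new_size, preallocation="off"):
        """Grow the image; "metadata" allocates the new clusters."""
        if preallocation == "metadata":
            for i in range(self.size // CLUSTER, new_size // CLUSTER):
                self.clusters[i] = 0x00
        self.size = new_size


def grow(preallocation):
    """Mirror the commands above and sample both halves of active.img."""
    base = Image(1024 * 1024, fill=0xFF)
    active = Image(512 * 1024, backing=base)
    active.resize(1024 * 1024, preallocation)
    return active.read(0), active.read(768 * 1024)


print(grow("off"))       # (255, 255)
print(grow("metadata"))  # (255, 0)
```

In this model the only thing that changes between the two runs is
whether the new clusters get an entry in the lookup table, which is
enough to change what the guest reads.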

Apart from "qemu-img resize", the QMP block-resize command can also
extend an image like this, although it always uses PREALLOC_MODE_OFF
and the user cannot change that.

It does not seem right that the guest-visible data changes depending
on the preallocation mode. This could be solved by returning an error
in img_resize() when
(backing_bs(blk_bs(blk)) && prealloc != PREALLOC_MODE_OFF).
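
The shape of such a check is sketched below in Python, as a standalone
illustration rather than a patch to img_resize(). The function name
check_resize_prealloc(), the has_backing flag (standing in for
backing_bs(blk_bs(blk)) being non-NULL) and the error text are all
hypothetical.

```python
from enum import Enum


class PreallocMode(Enum):
    # Names mirror QEMU's PREALLOC_MODE_* values; only OFF matters here.
    OFF = "off"
    METADATA = "metadata"
    FALLOC = "falloc"
    FULL = "full"


def check_resize_prealloc(has_backing, prealloc):
    """Reject a resize that would hide backing-file data."""
    if has_backing and prealloc is not PreallocMode.OFF:
        raise ValueError(
            "preallocation=%s cannot be used when resizing an image "
            "with a backing file" % prealloc.value
        )


# Accepted: no preallocation, or no backing file.
check_resize_prealloc(True, PreallocMode.OFF)
check_resize_prealloc(False, PreallocMode.METADATA)
```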

The important question, however, is: what behavior is the right one?
Should growing an image that was smaller than the backing file return
zeroes, or data from the backing file? I would opt for the latter, for
simplicity and consistency with the current behavior of block-resize,
although it was pointed out that this could be a security problem (I'm
not sure that I agree with that, but we can discuss it).

This also has consequences for how preallocation should be implemented
for images with subclusters. Extended L2 entries allow us to allocate
a cluster but leave each one of its subclusters unallocated. That
would give us a cluster that is allocated but whose data is still
read from the backing file. But it's up to us to decide if that's
what we should do when resizing an image.
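
To make the subcluster case concrete, here is a toy model of a cluster
with an extended L2 entry. It is a sketch under assumptions:
ExtendedL2Entry, mark_written() and read_source() are hypothetical
names, the count of 32 subclusters per cluster is assumed, and the
host offset is an arbitrary example value.

```python
SUBCLUSTERS = 32  # assumed number of subclusters per cluster


class ExtendedL2Entry:
    def __init__(self, host_offset=None):
        # None means the cluster has no host offset (not allocated).
        self.host_offset = host_offset
        # Bit i set: subcluster i holds data in this image.
        self.alloc_bitmap = 0

    def _check(self, sub):
        if not 0 <= sub < SUBCLUSTERS:
            raise ValueError("subcluster index out of range")

    def mark_written(self, sub):
        """Record that subcluster `sub` now has data in this image."""
        self._check(sub)
        if self.host_offset is None:
            raise ValueError("cluster has no host offset")
        self.alloc_bitmap |= 1 << sub

    def read_source(self, sub):
        """Return where a read of subcluster `sub` is served from."""
        self._check(sub)
        if self.host_offset is not None and self.alloc_bitmap & (1 << sub):
            return "image"
        return "backing"


# Preallocated cluster: host offset assigned, all subclusters unallocated.
entry = ExtendedL2Entry(host_offset=0x50000)
print(entry.read_source(3))  # backing
entry.mark_written(3)
print(entry.read_source(3))  # image
print(entry.read_source(4))  # backing
```

With this layout a preallocated cluster would keep exposing the
backing file until the guest writes to it, one subcluster at a time.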

Berto
