Re: [Qemu-devel] QCow2 compression

* Re: [Qemu-devel] QCow2 compression
@ 2016-03-05  0:11 mgreger
  0 siblings, 0 replies; 8+ messages in thread
From: mgreger @ 2016-03-05  0:11 UTC (permalink / raw)
  To: Eric Blake, qemu-devel

---- Eric Blake <eblake@redhat.com> wrote: 
> [any way you could convince your mailer to not break threading?]
> 
> On 03/03/2016 09:24 PM, mgreger@cinci.rr.com wrote:
> >>
> >> The zeros are not part of the compressed data, though, that's why the 
> >> Compressed Cluster Descriptor indicates a shorter size. Had another 
> >> compressed cluster been written to the same image, it might have ended 
> >> up where you are seeing the zero padding now. (The trick with 
> >> compression is that multiple guest clusters can end up in a single host 
> >> cluster.) 
> >>
> >  
> > Thanks, but the given length of 0x5600 is still short by 160(decimal) bytes 
> > compared to the 
> > non-zero data (which occupies an additional 133 bytes beyond the expected end at 
> > 0x3DED50) and zero 
> > padding (an additional 27 bytes beyond that). Could there be an off-by-one error 
> > somewhere? 
> > The file doesn't even end on a sector boundary let alone a cluster boundary. 
> 
> Based on an IRC conversation I had when you first asked the question, I
> think the spec is indeed weak, and that we DO have some fishy code.
> 
> Look at what the code does:
> 
> uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>                                                uint64_t offset,
>                                                int compressed_size)
> ...
>     nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
>                   (cluster_offset >> 9);
> 
>     cluster_offset |= QCOW_OFLAG_COMPRESSED |
>                       ((uint64_t)nb_csectors << s->csize_shift);
> 
> That sure does sound like an off-by-one.  cluster_offset does indeed
> look like a byte offset (from qcow2_alloc_bytes); so let's consider what
> happens if we've already allocated one compressed cluster in the past,
> going from 65536 to 66999.  So on this call, cluster_offset would be
> 67000, and if compressed size is 1025 (just round numbers to make
> discussion easy; no idea if gzip would really do this on any particular
> data), we are computing ((67000+1025-1)>>9) - (67000>>9) == 2, but 1025
> bytes occupies 3 sectors, not 2.
> 
> But we offset it by another off-by-one:
> 
> int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
> {
> ...
>         nb_csectors = ((cluster_offset >> s->csize_shift) &
> s->csize_mask) + 1;
> 
> Yuck.  That is horribly ugly.
> 
> We need to fix the documentation to make it obvious that the sector
> count is a _LOWER BOUND_ on the number of sectors occupied, and that you
> need to read at least one more cluster's worth of data before decompressing.
> 
> It would also be nice to fix qcow2 code to not have quite so many
> off-by-one computations, but be more precise about what data is going where.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 

Thanks for verifying the behavior for me. This will
allow me to add compression support to the code I've
already written.

I've implemented QCow2 support for a fork of the popular
DOSBox emulator.

The project (not mine), is called DOSBox-X, and is geared
towards emulating Windows9x era hardware with a specific
focus on games: http://dosbox-x.com/

If you'd like to give me any feedback on the QCow2 code I've written
please feel free to e-mail me directly. The relevant files are
qcow2_disk.cpp and qcow_disk.h. If you do, please be kind as
my C++ skills are very rusty. Overall I think it is pretty readable.
It took me about 2 and a half weeks working in my spare time to go
from reading the spec to the final code integrated into DOSBox-X.

Thanks,
Michael Greger

^ permalink raw reply	[flat|nested] 8+ messages in thread