On 27.04.20 21:26, Denis Plotnikov wrote: > > > On 27.04.2020 15:35, Max Reitz wrote: >> On 21.04.20 10:11, Denis Plotnikov wrote: >>> zstd significantly reduces cluster compression time. >>> It provides better compression performance maintaining >>> the same level of the compression ratio in comparison with >>> zlib, which, at the moment, is the only compression >>> method available. >>> >>> The performance test results: >>> Test compresses and decompresses qemu qcow2 image with just >>> installed rhel-7.6 guest. >>> Image cluster size: 64K. Image on disk size: 2.2G >>> >>> The test was conducted with brd disk to reduce the influence >>> of disk subsystem to the test results. >>> The results is given in seconds. >>> >>> compress cmd: >>>    time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd] >>>                    src.img [zlib|zstd]_compressed.img >>> decompress cmd >>>    time ./qemu-img convert -O qcow2 >>>                    [zlib|zstd]_compressed.img uncompressed.img >>> >>>             compression               decompression >>>           zlib       zstd           zlib         zstd >>> ------------------------------------------------------------ >>> real     65.5       16.3 (-75 %)    1.9          1.6 (-16 %) >>> user     65.0       15.8            5.3          2.5 >>> sys       3.3        0.2            2.0          2.0 >>> >>> Both ZLIB and ZSTD gave the same compression ratio: 1.57 >>> compressed image size in both cases: 1.4G >>> >>> Signed-off-by: Denis Plotnikov >>> Reviewed-by: Vladimir Sementsov-Ogievskiy >>> Reviewed-by: Alberto Garcia >>> QAPI part: >>> Acked-by: Markus Armbruster >>> --- >>>   docs/interop/qcow2.txt |   1 + >>>   configure              |   2 +- >>>   qapi/block-core.json   |   3 +- >>>   block/qcow2-threads.c  | 157 +++++++++++++++++++++++++++++++++++++++++ >>>   block/qcow2.c          |   7 ++ >>>   5 files changed, 168 insertions(+), 2 deletions(-) >> [...] >> >>> diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c >>> index 7dbaf53489..0525718704 100644 >>> --- a/block/qcow2-threads.c >>> +++ b/block/qcow2-threads.c >>> @@ -28,6 +28,11 @@ >>>   #define ZLIB_CONST >>>   #include >>>   +#ifdef CONFIG_ZSTD >>> +#include >>> +#include >>> +#endif >>> + >>>   #include "qcow2.h" >>>   #include "block/thread-pool.h" >>>   #include "crypto.h" >>> @@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void >>> *dest, size_t dest_size, >>>       return ret; >>>   } >>>   +#ifdef CONFIG_ZSTD >>> + >>> +/* >>> + * qcow2_zstd_compress() >>> + * >>> + * Compress @src_size bytes of data using zstd compression method >>> + * >>> + * @dest - destination buffer, @dest_size bytes >>> + * @src - source buffer, @src_size bytes >>> + * >>> + * Returns: compressed size on success >>> + *          -ENOMEM destination buffer is not enough to store >>> compressed data >>> + *          -EIO    on any other error >>> + */ >>> +static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size, >>> +                                   const void *src, size_t src_size) >>> +{ >>> +    ssize_t ret; >>> +    ZSTD_outBuffer output = { dest, dest_size, 0 }; >>> +    ZSTD_inBuffer input = { src, src_size, 0 }; >> Minor style note: I think it’d be nicer to use designated initializers >> here. >> >>> +    ZSTD_CCtx *cctx = ZSTD_createCCtx(); >>> + >>> +    if (!cctx) { >>> +        return -EIO; >>> +    } >>> +    /* >>> +     * Use the zstd streamed interface for symmetry with decompression, >>> +     * where streaming is essential since we don't record the exact >>> +     * compressed size. >>> +     * >>> +     * In the loop, we try to compress all the data into one zstd >>> frame. >>> +     * ZSTD_compressStream2 potentially can finish a frame earlier >>> +     * than the full input data is consumed. That's why we are looping >>> +     * until all the input data is consumed. >>> +     */ >>> +    while (input.pos < input.size) { >>> +        size_t zstd_ret; >>> +        /* >>> +         * ZSTD spec: "You must continue calling ZSTD_compressStream2() >>> +         * with ZSTD_e_end until it returns 0, at which point you are >>> +         * free to start a new frame". We assume that "start a new >>> frame" >>> +         * means call ZSTD_compressStream2 in the very beginning or >>> when >>> +         * ZSTD_compressStream2 has returned with 0. >>> +         */ >>> +        do { >>> +            zstd_ret = ZSTD_compressStream2(cctx, &output, &input, >>> ZSTD_e_end); >> The spec makes it sound to me like ZSTD_e_end will always complete in a >> single call if there’s enough space in the output buffer.  So the only >> team we have to loop would be when there isn’t enough space anyway: >> >> It says this about ZSTD_e_end: >>> flush operation is the same, and follows same rules as calling >>> ZSTD_compressStream2() with ZSTD_e_flush. >> Those rules being: >>> Note that, if `output->size` is too small, a single invocation with >>> ZSTD_e_flush might not be enough (return code > 0). >> So it seems like it will only return a value > 0 if the output buffer is >> definitely too small. >> >> The spec also notes that the return value is greater than 0 if: >>>> 0 if some data still present within internal buffer (the value is >>> minimal estimation of remaining size), >> So it’s a minimum estimate.  That’s another point that heavily implies >> to me that if the return value were less than what’s left in the buffer, >> the function wouldn’t return but still try to write it out, until it >> realizes that there isn’t enough space in the output buffer, and then >> return a value that exceeds the remaining output buffer size. >> >> (Because if the function just played it safe, I would expect it to >> return a maximum estimate.) >> >> >> OTOH, if it were actually possible for ZSTD_e_end to finish a frame >> earlier than the end of the input, I think it would make more sense to >> use ZSTD_e_continue until the input is done and then finish with >> ZSTD_e_end, like the spec seems to propose.  That way, we’d always end >> up with a single frame to make decompression simpler (and I think it >> would also make more sense overall). >> >> >> But anyway.  From how I understand the spec, this code simply always >> ends up creating a single frame or erroring out, without looping ever. >> So it isn’t exactly wrong, it just seems overly complicated.  (Again, >> assuming I understand the spec correctly.  Which seems like a tough >> thing to assume, because the spec is not exactly obvious to read...) >> >> (Running some quick tests by converting some images with zstd >> compression seems to confirm that whenever ZSTD_compressStream2() >> returns, either zstd_ret > output.size - output.pos, or zstd_ret == 0 >> and input.pos == input.size.  So none of the loops ever loop.) >> >> Max > > So, what should we do? > > 1. Rely on the test that there's no need for the loop: >    * make one ZSTD_compressStream2() call >    * make sure it returned with zstd_ret == 0 and >      input.pos == input.size. >      if so, return with the size >    * if not, check that zstd_ret > output.size - output.pos >      if so, return with -ENOMEM >    * if none above return with -EIO > >    This should cover the majority of the compressing cases According to how I interpret the spec, “none of the above” should never happen except for ZSTD_isError(zstd_ret), so this should cover all compressing cases, actually. > 2. Leave the loop as is, because of the documentation: >    "You *must* continue calling ZSTD_compressStream2() with ZSTD_e_end > until it returns 0, >     at which point you are free to start a new frame." As far as I can see, the return value is always 0 or greater than the remaining buffer space, so this will always be satisfied even without a loop. (We will always have one of three cases: (1) Success and all input has been consumed, (2) ZSTD_isError(zstd_ret), so we return -EIO, (3) zstd_ret > output.size - output.pos, so we return -ENOMEM. I interpret the “You *must* continue until it returns 0” as “If there is no sufficient space in the output buffer, this function will return a value greater than 0 indicating how much space is at least still required. The caller is free to supply a greater output buffer for the next call (by supplying a different ZSTD_outBuffer structure), and then we’ll try again.” (I.e., retrying with the same ZSTD_outBuffer will make the function return immediately because it knows that it’s insufficient.) Max