All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Denis Plotnikov <dplotnikov@virtuozzo.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, berto@igalia.com,
	qemu-block@nongnu.org, armbru@redhat.com, den@openvz.org
Subject: Re: [PATCH v20 3/4] qcow2: add zstd cluster compression
Date: Mon, 27 Apr 2020 14:35:24 +0200	[thread overview]
Message-ID: <f8b52ed6-9532-ff65-5c18-0b5142c3b550@redhat.com> (raw)
In-Reply-To: <20200421081117.7595-4-dplotnikov@virtuozzo.com>


[-- Attachment #1.1: Type: text/plain, Size: 7238 bytes --]

On 21.04.20 10:11, Denis Plotnikov wrote:
> zstd significantly reduces cluster compression time.
> It provides better compression performance maintaining
> the same level of the compression ratio in comparison with
> zlib, which, at the moment, is the only compression
> method available.
> 
> The performance test results:
> Test compresses and decompresses qemu qcow2 image with just
> installed rhel-7.6 guest.
> Image cluster size: 64K. Image on disk size: 2.2G
> 
> The test was conducted with brd disk to reduce the influence
> of disk subsystem to the test results.
> The results is given in seconds.
> 
> compress cmd:
>   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
>                   src.img [zlib|zstd]_compressed.img
> decompress cmd
>   time ./qemu-img convert -O qcow2
>                   [zlib|zstd]_compressed.img uncompressed.img
> 
>            compression               decompression
>          zlib       zstd           zlib         zstd
> ------------------------------------------------------------
> real     65.5       16.3 (-75 %)    1.9          1.6 (-16 %)
> user     65.0       15.8            5.3          2.5
> sys       3.3        0.2            2.0          2.0
> 
> Both ZLIB and ZSTD gave the same compression ratio: 1.57
> compressed image size in both cases: 1.4G
> 
> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> QAPI part:
> Acked-by: Markus Armbruster <armbru@redhat.com>
> ---
>  docs/interop/qcow2.txt |   1 +
>  configure              |   2 +-
>  qapi/block-core.json   |   3 +-
>  block/qcow2-threads.c  | 157 +++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.c          |   7 ++
>  5 files changed, 168 insertions(+), 2 deletions(-)

[...]

> diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
> index 7dbaf53489..0525718704 100644
> --- a/block/qcow2-threads.c
> +++ b/block/qcow2-threads.c
> @@ -28,6 +28,11 @@
>  #define ZLIB_CONST
>  #include <zlib.h>
>  
> +#ifdef CONFIG_ZSTD
> +#include <zstd.h>
> +#include <zstd_errors.h>
> +#endif
> +
>  #include "qcow2.h"
>  #include "block/thread-pool.h"
>  #include "crypto.h"
> @@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
>      return ret;
>  }
>  
> +#ifdef CONFIG_ZSTD
> +
> +/*
> + * qcow2_zstd_compress()
> + *
> + * Compress @src_size bytes of data using zstd compression method
> + *
> + * @dest - destination buffer, @dest_size bytes
> + * @src - source buffer, @src_size bytes
> + *
> + * Returns: compressed size on success
> + *          -ENOMEM destination buffer is not enough to store compressed data
> + *          -EIO    on any other error
> + */
> +static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
> +                                   const void *src, size_t src_size)
> +{
> +    ssize_t ret;
> +    ZSTD_outBuffer output = { dest, dest_size, 0 };
> +    ZSTD_inBuffer input = { src, src_size, 0 };

Minor style note: I think it’d be nicer to use designated initializers here.

> +    ZSTD_CCtx *cctx = ZSTD_createCCtx();
> +
> +    if (!cctx) {
> +        return -EIO;
> +    }
> +    /*
> +     * Use the zstd streamed interface for symmetry with decompression,
> +     * where streaming is essential since we don't record the exact
> +     * compressed size.
> +     *
> +     * In the loop, we try to compress all the data into one zstd frame.
> +     * ZSTD_compressStream2 potentially can finish a frame earlier
> +     * than the full input data is consumed. That's why we are looping
> +     * until all the input data is consumed.
> +     */
> +    while (input.pos < input.size) {
> +        size_t zstd_ret;
> +        /*
> +         * ZSTD spec: "You must continue calling ZSTD_compressStream2()
> +         * with ZSTD_e_end until it returns 0, at which point you are
> +         * free to start a new frame". We assume that "start a new frame"
> +         * means call ZSTD_compressStream2 in the very beginning or when
> +         * ZSTD_compressStream2 has returned with 0.
> +         */
> +        do {
> +            zstd_ret = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);

The spec makes it sound to me like ZSTD_e_end will always complete in a
single call if there’s enough space in the output buffer.  So the only
team we have to loop would be when there isn’t enough space anyway:

It says this about ZSTD_e_end:
> flush operation is the same, and follows same rules as calling
> ZSTD_compressStream2() with ZSTD_e_flush.

Those rules being:
> Note that, if `output->size` is too small, a single invocation with
> ZSTD_e_flush might not be enough (return code > 0).

So it seems like it will only return a value > 0 if the output buffer is
definitely too small.

The spec also notes that the return value is greater than 0 if:
> >0 if some data still present within internal buffer (the value is
> minimal estimation of remaining size),

So it’s a minimum estimate.  That’s another point that heavily implies
to me that if the return value were less than what’s left in the buffer,
the function wouldn’t return but still try to write it out, until it
realizes that there isn’t enough space in the output buffer, and then
return a value that exceeds the remaining output buffer size.

(Because if the function just played it safe, I would expect it to
return a maximum estimate.)


OTOH, if it were actually possible for ZSTD_e_end to finish a frame
earlier than the end of the input, I think it would make more sense to
use ZSTD_e_continue until the input is done and then finish with
ZSTD_e_end, like the spec seems to propose.  That way, we’d always end
up with a single frame to make decompression simpler (and I think it
would also make more sense overall).


But anyway.  From how I understand the spec, this code simply always
ends up creating a single frame or erroring out, without looping ever.
So it isn’t exactly wrong, it just seems overly complicated.  (Again,
assuming I understand the spec correctly.  Which seems like a tough
thing to assume, because the spec is not exactly obvious to read...)

(Running some quick tests by converting some images with zstd
compression seems to confirm that whenever ZSTD_compressStream2()
returns, either zstd_ret > output.size - output.pos, or zstd_ret == 0
and input.pos == input.size.  So none of the loops ever loop.)

Max

> +
> +            if (ZSTD_isError(zstd_ret)) {
> +                ret = -EIO;
> +                goto out;
> +            }
> +            /* Dest buffer isn't big enough to store compressed content */
> +            if (zstd_ret > output.size - output.pos) {
> +                ret = -ENOMEM;
> +                goto out;
> +            }
> +        } while (zstd_ret);
> +    }
> +    /* make sure we can safely return compressed buffer size with ssize_t */
> +    assert(output.pos <= SSIZE_MAX);
> +    ret = output.pos;
> +out:
> +    ZSTD_freeCCtx(cctx);
> +    return ret;
> +}


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-04-27 12:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-21  8:11 [PATCH v20 0/4] qcow2: Implement zstd cluster compression methodi Denis Plotnikov
2020-04-21  8:11 ` [PATCH v20 1/4] qcow2: introduce compression type feature Denis Plotnikov
2020-04-21 10:40   ` Alberto Garcia
2020-04-27 12:35   ` Max Reitz
2020-04-21  8:11 ` [PATCH v20 2/4] qcow2: rework the cluster compression routine Denis Plotnikov
2020-04-27 12:36   ` Max Reitz
2020-04-21  8:11 ` [PATCH v20 3/4] qcow2: add zstd cluster compression Denis Plotnikov
2020-04-27 12:35   ` Max Reitz [this message]
2020-04-27 19:26     ` Denis Plotnikov
2020-04-28  6:16       ` Max Reitz
2020-04-28  7:23         ` Denis Plotnikov
2020-04-28 10:17           ` Max Reitz
2020-04-21  8:11 ` [PATCH v20 4/4] iotests: 287: add qcow2 compression type test Denis Plotnikov
2020-04-21 12:06   ` Vladimir Sementsov-Ogievskiy
2020-04-27 13:29   ` Max Reitz
2020-04-28 11:41     ` Denis Plotnikov
2020-04-28 12:55       ` Max Reitz
2020-04-28 13:01         ` Eric Blake
2020-04-28 13:34           ` Denis Plotnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f8b52ed6-9532-ff65-5c18-0b5142c3b550@redhat.com \
    --to=mreitz@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berto@igalia.com \
    --cc=den@openvz.org \
    --cc=dplotnikov@virtuozzo.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.