Re: [Qemu-devel] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count

From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count
Date: Wed, 3 May 2017 20:28:05 +0200	[thread overview]
Message-ID: <0fb599e4-9f19-e0c5-9e71-4e697c43abfd@redhat.com> (raw)
In-Reply-To: <20170427014626.11553-12-eblake@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5790 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> Passing a byte offset, but sector count, when we ultimately
> want to operate on cluster granularity, is madness.  Clean up
> the external interfaces to take both offset and count as bytes,
> while still keeping the assertion added previously that the
> caller must align the values to a cluster.  Then rename things
> to make sure backports don't get confused by changed units:
> instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
> we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
> 
> The internal functions still operate on clusters at a time, and
> return an int for number of cleared clusters; but on an image
> with 2M clusters, a single L2 table holds 256k entries that each
> represent a 2M cluster, totalling well over INT_MAX bytes if we
> ever had a request for that many bytes at once.  All our callers
> currently limit themselves to 32-bit bytes (and therefore fewer
> clusters), but by making this function 64-bit clean, we have one
> less place to clean up if we later improve the block layer to
> support 64-bit bytes through all operations (with the block layer
> auto-fragmenting on behalf of more-limited drivers), rather than
> the current state where some interfaces are artificially limited
> to INT_MAX at a time.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: squash in fixup accounting for unaligned file end
> v9: rebase to earlier changes, drop R-b
> v7, v8: only earlier half of series submitted for 2.9
> v6: rebase due to context
> v5: s/count/byte/ to make the units obvious, and rework the math
> to ensure no 32-bit integer overflow on large clusters
> v4: improve function names, split assertion additions into earlier patch
> [no v3 or v2]
> v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
> ---
>  block/qcow2.h          |  9 +++++----
>  block/qcow2-cluster.c  | 40 +++++++++++++++++++++-------------------
>  block/qcow2-snapshot.c |  7 +++----
>  block/qcow2.c          | 21 +++++++++------------
>  4 files changed, 38 insertions(+), 39 deletions(-)

[...]

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 4f641a9..a47aadc 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1562,16 +1562,16 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>      return nb_clusters;
>  }
> 
> -int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
> -    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
> +int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
> +                          uint64_t bytes, enum qcow2_discard_type type,
> +                          bool full_discard)
>  {
>      BDRVQcow2State *s = bs->opaque;
> -    uint64_t end_offset;
> +    uint64_t end_offset = offset + bytes;
>      uint64_t nb_clusters;
> +    int64_t cleared;
>      int ret;
> 
> -    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
> -
>      /* Caller must pass aligned values, except at image end */
>      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||

Applying an s/end_offset - offset/bytes/ to the
nb_clusters = size_to_clusters(s, end_offset - offset) following this
would make sense (but won't functionally change anything).

> @@ -1583,13 +1583,15 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
> 
>      /* Each L2 table is handled by its own loop iteration */
>      while (nb_clusters > 0) {
> -        ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
> -        if (ret < 0) {
> +        cleared = discard_single_l2(bs, offset, nb_clusters, type,
> +                                    full_discard);
> +        if (cleared < 0) {
> +            ret = cleared;
>              goto fail;
>          }
> 
> -        nb_clusters -= ret;
> -        offset += (ret * s->cluster_size);
> +        nb_clusters -= cleared;
> +        offset += (cleared * s->cluster_size);
>      }
> 
>      ret = 0;

[...]

> diff --git a/block/qcow2.c b/block/qcow2.c
> index 8038793..4d34610 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -2858,18 +2857,16 @@ static int qcow2_make_empty(BlockDriverState *bs)
> 
>      /* This fallback code simply discards every active cluster; this is slow,
>       * but works in all cases */
> -    for (start_sector = 0; start_sector < bs->total_sectors;
> -         start_sector += sector_step)
> +    end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
> +    for (offset = 0; offset < end_offset; offset += step)
>      {

This opening brace should now be on the previous line.

With at least this fixed (I leave the other thing to your discretion):

Reviewed-by: Max Reitz <mreitz@redhat.com>

>          /* As this function is generally used after committing an external
>           * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
>           * default action for this kind of discard is to pass the discard,
>           * which will ideally result in an actually smaller image file, as
>           * is probably desired. */
> -        ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
> -                                     MIN(sector_step,
> -                                         bs->total_sectors - start_sector),
> -                                     QCOW2_DISCARD_SNAPSHOT, true);
> +        ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
> +                                    QCOW2_DISCARD_SNAPSHOT, true);
>          if (ret < 0) {
>              break;
>          }
> 

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]