Re: [PATCH 1/4] qcow2: Improve refcount structure rebuilding - Vladimir Sementsov-Ogievskiy

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Max Reitz <mreitz@redhat.com>, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [PATCH 1/4] qcow2: Improve refcount structure rebuilding
Date: Fri, 26 Mar 2021 14:48:41 +0300	[thread overview]
Message-ID: <969cd321-0cc7-ddc3-8a0d-75819be3a6bf@virtuozzo.com> (raw)
In-Reply-To: <20210310155906.147478-2-mreitz@redhat.com>

10.03.2021 18:59, Max Reitz wrote:
> When rebuilding the refcount structures (when qemu-img check -r found
> errors with refcount = 0, but reference count > 0), the new refcount
> table defaults to being put at the image file end[1].  There is no good
> reason for that except that it means we will not have to rewrite any
> refblocks we already wrote to disk.
> 
> Changing the code to rewrite those refblocks is not too difficult,
> though, so let us do that.  That is beneficial for images on block
> devices, where we cannot really write beyond the end of the image file.
> 
> [1] Unless there is something allocated in the area pointed to by the
>      last refblock, so we have to write that refblock.  In that case, we
>      try to put the reftable in there.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-refcount.c | 126 ++++++++++++++++++++++-------------------
>   1 file changed, 67 insertions(+), 59 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 8e649b008e..162caeeb8e 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -2352,8 +2352,9 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
>                                         int64_t *nb_clusters)
>   {
>       BDRVQcow2State *s = bs->opaque;
> -    int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
> +    int64_t first_free_cluster = 0, reftable_offset = -1, cluster;
>       int64_t refblock_offset, refblock_start, refblock_index;
> +    int64_t first_cluster, end_cluster;
>       uint32_t reftable_size = 0;
>       uint64_t *on_disk_reftable = NULL;
>       void *on_disk_refblock;
> @@ -2365,8 +2366,11 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
>   
>       qcow2_cache_empty(bs, s->refcount_block_cache);
>   
> +    first_cluster = 0;
> +    end_cluster = *nb_clusters;
> +
>   write_refblocks:
> -    for (; cluster < *nb_clusters; cluster++) {
> +    for (cluster = first_cluster; cluster < end_cluster; cluster++) {
>           if (!s->get_refcount(*refcount_table, cluster)) {
>               continue;
>           }
> @@ -2374,65 +2378,68 @@ write_refblocks:
>           refblock_index = cluster >> s->refcount_block_bits;
>           refblock_start = refblock_index << s->refcount_block_bits;
>   
> -        /* Don't allocate a cluster in a refblock already written to disk */
> -        if (first_free_cluster < refblock_start) {
> -            first_free_cluster = refblock_start;
> -        }
> -        refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
> -                                              nb_clusters, &first_free_cluster);
> -        if (refblock_offset < 0) {
> -            fprintf(stderr, "ERROR allocating refblock: %s\n",
> -                    strerror(-refblock_offset));
> -            res->check_errors++;
> -            ret = refblock_offset;
> -            goto fail;
> -        }
> -
> -        if (reftable_size <= refblock_index) {
> -            uint32_t old_reftable_size = reftable_size;
> -            uint64_t *new_on_disk_reftable;
> +        if (reftable_size > refblock_index &&
> +            on_disk_reftable[refblock_index])
> +        {
> +            refblock_offset = on_disk_reftable[refblock_index];

In this branch, we assign it to ..

> +        } else {
> +            int64_t refblock_cluster_index;
>   
> -            reftable_size = ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
> -                                     s->cluster_size) / REFTABLE_ENTRY_SIZE;
> -            new_on_disk_reftable = g_try_realloc(on_disk_reftable,
> -                                                 reftable_size *
> -                                                 REFTABLE_ENTRY_SIZE);
> -            if (!new_on_disk_reftable) {
> +            /* Don't allocate a cluster in a refblock already written to disk */
> +            if (first_free_cluster < refblock_start) {
> +                first_free_cluster = refblock_start;
> +            }
> +            refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
> +                                                  nb_clusters,
> +                                                  &first_free_cluster);
> +            if (refblock_offset < 0) {
> +                fprintf(stderr, "ERROR allocating refblock: %s\n",
> +                        strerror(-refblock_offset));
>                   res->check_errors++;
> -                ret = -ENOMEM;
> +                ret = refblock_offset;
>                   goto fail;
>               }
> -            on_disk_reftable = new_on_disk_reftable;
>   
> -            memset(on_disk_reftable + old_reftable_size, 0,
> -                   (reftable_size - old_reftable_size) * REFTABLE_ENTRY_SIZE);
> +            refblock_cluster_index = refblock_offset / s->cluster_size;
> +            if (refblock_cluster_index >= end_cluster) {
> +                /*
> +                 * We must write the refblock that holds this refblock's
> +                 * refcount
> +                 */
> +                end_cluster = refblock_cluster_index + 1;
> +            }
>   
> -            /* The offset we have for the reftable is now no longer valid;
> -             * this will leak that range, but we can easily fix that by running
> -             * a leak-fixing check after this rebuild operation */
> -            reftable_offset = -1;
> -        } else {
> -            assert(on_disk_reftable);
> -        }
> -        on_disk_reftable[refblock_index] = refblock_offset;
> +            if (reftable_size <= refblock_index) {
> +                uint32_t old_reftable_size = reftable_size;
> +                uint64_t *new_on_disk_reftable;
> +
> +                reftable_size =
> +                    ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
> +                             s->cluster_size) / REFTABLE_ENTRY_SIZE;
> +                new_on_disk_reftable =
> +                    g_try_realloc(on_disk_reftable,
> +                                  reftable_size * REFTABLE_ENTRY_SIZE);
> +                if (!new_on_disk_reftable) {
> +                    res->check_errors++;
> +                    ret = -ENOMEM;
> +                    goto fail;
> +                }
> +                on_disk_reftable = new_on_disk_reftable;
>   
> -        /* If this is apparently the last refblock (for now), try to squeeze the
> -         * reftable in */
> -        if (refblock_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
> -            reftable_offset < 0)
> -        {
> -            uint64_t reftable_clusters = size_to_clusters(s, reftable_size *
> -                                                          REFTABLE_ENTRY_SIZE);
> -            reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
> -                                                  refcount_table, nb_clusters,
> -                                                  &first_free_cluster);
> -            if (reftable_offset < 0) {
> -                fprintf(stderr, "ERROR allocating reftable: %s\n",
> -                        strerror(-reftable_offset));
> -                res->check_errors++;
> -                ret = reftable_offset;
> -                goto fail;
> +                memset(on_disk_reftable + old_reftable_size, 0,
> +                       (reftable_size - old_reftable_size) *
> +                       REFTABLE_ENTRY_SIZE);
> +
> +                /*
> +                 * The offset we have for the reftable is now no longer valid;
> +                 * this will leak that range, but we can easily fix that by
> +                 * running a leak-fixing check after this rebuild operation
> +                 */
> +                reftable_offset = -1;
> +            } else {
> +                assert(on_disk_reftable);
>               }
> +            on_disk_reftable[refblock_index] = refblock_offset;

only to write back again ?

>           }
>   
>           ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
> @@ -2459,15 +2466,12 @@ write_refblocks:
>       }
>   
>       if (reftable_offset < 0) {

at this point reftable_offset is always -1 now..

Ah not. and now I am a bit close to understanding all of this logic. this thing with "goto write_refblocks" is not obvious

> -        uint64_t post_refblock_start, reftable_clusters;
> +        uint64_t reftable_clusters;
>   
> -        post_refblock_start = ROUND_UP(*nb_clusters, s->refcount_block_size);
>           reftable_clusters =
>               size_to_clusters(s, reftable_size * REFTABLE_ENTRY_SIZE);
> -        /* Not pretty but simple */
> -        if (first_free_cluster < post_refblock_start) {
> -            first_free_cluster = post_refblock_start;
> -        }
> +
> +        first_free_cluster = 0;
>           reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>                                                 refcount_table, nb_clusters,
>                                                 &first_free_cluster);
> @@ -2479,6 +2483,10 @@ write_refblocks:
>               goto fail;
>           }
>   
> +        assert(offset_into_cluster(s, reftable_offset) == 0);
> +        first_cluster = reftable_offset / s->cluster_size;
> +        end_cluster = first_cluster + reftable_clusters;
> +
>           goto write_refblocks;

these three lines now looks like a function call in assembler :)

>       }
>   
> 

You didn't ping the series (more than two week old) so, I'm not sure that you are not preparing v2 now.. But I kept it "marked unred" all this time, and several times tried to look at it, and postponed, because I don't familiar with this place of qcow2 driver. And the function looks too difficult. Now finally I think I understand most of the logic that you change. Honestly, I think a bit of refactoring the rebuild_refcount_structure() prior to logic change would make it a lot easier to review..

-- 
Best regards,
Vladimir