Re: [PATCH v4 03/11] block: use int64_t instead of uint64_t in driver read handlers

From: Eric Blake <eblake@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: fam@euphon.net, kwolf@redhat.com, integration@gluster.org,
	berto@igalia.com, stefanha@redhat.com, pavel.dovgaluk@ispras.ru,
	sw@weilnetz.de, pl@kamp.de, qemu-devel@nongnu.org,
	mreitz@redhat.com, jsnow@redhat.com, ronniesahlberg@gmail.com,
	pbonzini@redhat.com, namei.unix@gmail.com, dillaman@redhat.com,
	ari@tuxera.com
Subject: Re: [PATCH v4 03/11] block: use int64_t instead of uint64_t in driver read handlers
Date: Tue, 11 May 2021 14:22:50 -0500	[thread overview]
Message-ID: <6d6b0719-4a43-7d2a-153f-cd19dfaa00e5@redhat.com> (raw)
In-Reply-To: <20210324205132.464899-4-vsementsov@virtuozzo.com>

On 3/24/21 3:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> We are going to convert .bdrv_co_preadv_part and .bdrv_co_pwritev_part
> to int64_t type for offset and bytes parameters (as it's already done
> for generic block/io.c layer).
> 
> In qcow2 .bdrv_co_preadv_part is used in some places, so let's add
> corresponding checks and assertions.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Unusual line, especially since...

> 
> block: use int64_t instead of uint64_t in driver read handlers
> 
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
> 
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
> 
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
> 
> So, convert driver read handlers parameters which are already 64bit to
> signed type.
> 
> While being here, convert also flags parameter to be BdrvRequestFlags.
> 
> Now let's consider all callers. Simple
> 
>   git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'
> 
> shows that's there three callers of driver function:
> 
>  bdrv_driver_preadv() in block/io.c, passes int64_t, checked by
>    bdrv_check_qiov_request() to be non-negative.
> 
>  qcow2_load_vmstate() does bdrv_check_qiov_request().
> 
>  do_perform_cow_read() has uint64_t argument. And a lot of things in
>  qcow2 driver are uint64_t, so converting it is big job. But we must
>  not work with requests that doesn't satisfy bdrv_check_qiov_request(),

s/doesn't/don't/

>  so let's just assert it here.
> 
> Still, the functions may be called directly, not only by drv->...
> Let's check:
> 
> git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
> awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
> while read func; do git grep "$func(" | \
> grep -v "$func(BlockDriverState"; done
> 
> The only one such caller:
> 
>     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
>     ...
>     ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
> 
> in tesTS/unit/test-bdrv-drain.c, and it's OK obviously.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

...it is repeated here. I'm fine dropping the first.

> ---
>  include/block/block_int.h        | 11 ++++++-----
>  block/backup-top.c               |  4 ++--

>  35 files changed, 120 insertions(+), 89 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index db7a909ea9..e6bcf74e46 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -234,8 +234,8 @@ struct BlockDriver {
>  
>      /* aio */
>      BlockAIOCB *(*bdrv_aio_preadv)(BlockDriverState *bs,
> -        uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags,
> -        BlockCompletionFunc *cb, void *opaque);
> +        int64_t offset, int64_t bytes, QEMUIOVector *qiov,
> +        BdrvRequestFlags flags, BlockCompletionFunc *cb, void *opaque);
>      BlockAIOCB *(*bdrv_aio_pwritev)(BlockDriverState *bs,
>          uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags,
>          BlockCompletionFunc *cb, void *opaque);
> @@ -264,10 +264,11 @@ struct BlockDriver {
>       * The buffer in @qiov may point directly to guest memory.
>       */
>      int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
> -        uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
> +        int64_t offset, int64_t bytes, QEMUIOVector *qiov,
> +        BdrvRequestFlags flags);
>      int coroutine_fn (*bdrv_co_preadv_part)(BlockDriverState *bs,
> -        uint64_t offset, uint64_t bytes,
> -        QEMUIOVector *qiov, size_t qiov_offset, int flags);
> +        int64_t offset, int64_t bytes,
> +        QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);

Lots of fallout, but I'm in favor of this signature change.

> +++ b/block/blkdebug.c
> @@ -614,8 +614,8 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>  }
>  
>  static int coroutine_fn
> -blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
> -                   QEMUIOVector *qiov, int flags)
> +blkdebug_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +                   QEMUIOVector *qiov, BdrvRequestFlags flags)
>  {
>      int err;

Still calls rule_check() with uint64_t parameters, but since we've
asserted the caller was within range, no behavior change.

> +++ b/block/blkverify.c
> @@ -221,8 +221,8 @@ blkverify_co_prwv(BlockDriverState *bs, BlkverifyRequest *r, uint64_t offset,
>  }
>  
>  static int coroutine_fn
> -blkverify_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
> -                    QEMUIOVector *qiov, int flags)
> +blkverify_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +                    QEMUIOVector *qiov, BdrvRequestFlags flags)
>  {

Similarly for the call to blkverify_co_prwv(), and probably elsewhere in
the patch.

> +++ b/block/crypto.c
> @@ -397,8 +397,8 @@ static int block_crypto_reopen_prepare(BDRVReopenState *state,
>  #define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024)
>  
>  static coroutine_fn int
> -block_crypto_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
> -                       QEMUIOVector *qiov, int flags)
> +block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +                       QEMUIOVector *qiov, BdrvRequestFlags flags)
>  {
>      BlockCrypto *crypto = bs->opaque;
>      uint64_t cur_bytes; /* number of bytes in current iteration */

We could perhaps change the type of local variables like cur_bytes and
bytes_done; but it doesn't affect semantics.

> +++ b/block/curl.c
> @@ -896,7 +896,8 @@ out:
>  }
>  
>  static int coroutine_fn curl_co_preadv(BlockDriverState *bs,
> -        uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
> +        int64_t offset, int64_t bytes, QEMUIOVector *qiov,
> +        BdrvRequestFlags flags)
>  {
>      CURLAIOCB acb = {
>          .co = qemu_coroutine_self(),

Likewise changing the type of CURLAIOCB.offset and .bytes.  Cleanups
like that can be followups.

> +++ b/block/file-posix.c
> @@ -2033,9 +2033,9 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
>      return raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
>  }
>  
> -static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
> -                                      uint64_t bytes, QEMUIOVector *qiov,
> -                                      int flags)
> +static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
> +                                      int64_t bytes, QEMUIOVector *qiov,
> +                                      BdrvRequestFlags flags)
>  {
>      return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_READ);

Similarly for fixing the signature of raw_co_prw() (after the
counterpart change to raw_co_pwritev)

> +++ b/block/nvme.c
> @@ -1221,8 +1221,9 @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>  }
>  
>  static coroutine_fn int nvme_co_preadv(BlockDriverState *bs,
> -                                       uint64_t offset, uint64_t bytes,
> -                                       QEMUIOVector *qiov, int flags)
> +                                       int64_t offset, int64_t bytes,
> +                                       QEMUIOVector *qiov,
> +                                       BdrvRequestFlags flags)
>  {
>      return nvme_co_prw(bs, offset, bytes, qiov, false, flags);
>  }

And for nvme_co_prw().

> +++ b/block/qcow2.c
> @@ -2281,9 +2281,10 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
>  }
>  
>  static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
> -                                             uint64_t offset, uint64_t bytes,
> +                                             int64_t offset, int64_t bytes,
>                                               QEMUIOVector *qiov,
> -                                             size_t qiov_offset, int flags)
> +                                             size_t qiov_offset,
> +                                             BdrvRequestFlags flags)
>  {
>      BDRVQcow2State *s = bs->opaque;
>      int ret = 0;

Unrelated to this patch; the loop sets cur_bytes = MIN(bytes, INT_MAX);
but it would be smarter to choose a cluster-aligned max instead of
INT_MAX.  It doesn't matter as long as the block layer has pre-clamped
requests to be < 32 bit to begin with, and then our later call to
qcow2_get_host_offset() further clamps it down to actual cluster
boundaries.  But it does look odd, compared to computing the original
MIN() to an aligned boundary in the first place, whether or not we ever
change the block layer to allow individual drivers to opt in to reading
more than 2G in one transaction.

qcow2_add_task() is another internal function worth improving in a followup.

> +++ b/block/quorum.c
> @@ -663,8 +663,8 @@ static int read_fifo_child(QuorumAIOCB *acb)
>      return ret;
>  }
>  
> -static int quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
> -                            uint64_t bytes, QEMUIOVector *qiov, int flags)
> +static int quorum_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
> +                            QEMUIOVector *qiov, BdrvRequestFlags flags)
>  {
>      BDRVQuorumState *s = bs->opaque;
>      QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes, flags);

and quorum_aio_get().

> diff --git a/block/raw-format.c b/block/raw-format.c
> index 7717578ed6..e3f459b2cb 100644
> --- a/block/raw-format.c
> +++ b/block/raw-format.c
> @@ -181,8 +181,8 @@ static void raw_reopen_abort(BDRVReopenState *state)
>  }
>  
>  /* Check and adjust the offset, against 'offset' and 'size' options. */
> -static inline int raw_adjust_offset(BlockDriverState *bs, uint64_t *offset,
> -                                    uint64_t bytes, bool is_write)
> +static inline int raw_adjust_offset(BlockDriverState *bs, int64_t *offset,
> +                                    int64_t bytes, bool is_write)
>  {
>      BDRVRawState *s = bs->opaque;
>  
> @@ -201,9 +201,9 @@ static inline int raw_adjust_offset(BlockDriverState *bs, uint64_t *offset,
>      return 0;
>  }
>  
> -static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
> -                                      uint64_t bytes, QEMUIOVector *qiov,
> -                                      int flags)
> +static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
> +                                      int64_t bytes, QEMUIOVector *qiov,
> +                                      BdrvRequestFlags flags)
>  {
>      int ret;
>  
> @@ -259,7 +259,7 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
>          qiov = &local_qiov;
>      }
>  
> -    ret = raw_adjust_offset(bs, &offset, bytes, true);
> +    ret = raw_adjust_offset(bs, (int64_t *)&offset, bytes, true);

Ugly type-punning; thankfully it's transient until the counterpart patch
to driver write handlers.

>      if (ret) {
>          goto fail;
>      }
> @@ -294,7 +294,7 @@ static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
>  {
>      int ret;
>  
> -    ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
> +    ret = raw_adjust_offset(bs, (int64_t *)&offset, bytes, true);

And now you should lose this cast...

>      if (ret) {
>          return ret;
>      }
> @@ -306,7 +306,7 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
>  {
>      int ret;
>  
> -    ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
> +    ret = raw_adjust_offset(bs, &offset, bytes, true);

...like you did here.

>      if (ret) {
>          return ret;
>      }
> @@ -541,7 +541,7 @@ static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
>  {
>      int ret;
>  
> -    ret = raw_adjust_offset(bs, &src_offset, bytes, false);
> +    ret = raw_adjust_offset(bs, (int64_t *)&src_offset, bytes, false);
>      if (ret) {
>          return ret;
>      }
> @@ -560,7 +560,7 @@ static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
>  {
>      int ret;
>  
> -    ret = raw_adjust_offset(bs, &dst_offset, bytes, true);
> +    ret = raw_adjust_offset(bs, (int64_t *)&dst_offset, bytes, true);

Also transient casts.

Easy enough to fix my nits for qcow2 and the commit message.
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org