Re: [PATCH v4 03/11] block: use int64_t instead of uint64_t in driver read handlers - Vladimir Sementsov-Ogievskiy

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>, qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, integration@gluster.org,
	namei.unix@gmail.com, dillaman@redhat.com, berto@igalia.com,
	pl@kamp.de, ronniesahlberg@gmail.com, fam@euphon.net,
	sw@weilnetz.de, stefanha@redhat.com, pbonzini@redhat.com,
	pavel.dovgaluk@ispras.ru, ari@tuxera.com, mreitz@redhat.com,
	kwolf@redhat.com, jsnow@redhat.com
Subject: Re: [PATCH v4 03/11] block: use int64_t instead of uint64_t in driver read handlers
Date: Mon, 24 May 2021 15:57:00 +0300	[thread overview]
Message-ID: <6bad7a4b-5443-d7a4-4d80-9bfd413aa504@virtuozzo.com> (raw)
In-Reply-To: <6d6b0719-4a43-7d2a-153f-cd19dfaa00e5@redhat.com>

11.05.2021 22:22, Eric Blake wrote:
> On 3/24/21 3:51 PM, Vladimir Sementsov-Ogievskiy wrote:
>> We are going to convert .bdrv_co_preadv_part and .bdrv_co_pwritev_part
>> to int64_t type for offset and bytes parameters (as it's already done
>> for generic block/io.c layer).
>>
>> In qcow2 .bdrv_co_preadv_part is used in some places, so let's add
>> corresponding checks and assertions.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> Unusual line, especially since...
> 
>>
>> block: use int64_t instead of uint64_t in driver read handlers
>>
>> We are generally moving to int64_t for both offset and bytes parameters
>> on all io paths.
>>
>> Main motivation is realization of 64-bit write_zeroes operation for
>> fast zeroing large disk chunks, up to the whole disk.
>>
>> We chose signed type, to be consistent with off_t (which is signed) and
>> with possibility for signed return type (where negative value means
>> error).
>>
>> So, convert driver read handlers parameters which are already 64bit to
>> signed type.
>>
>> While being here, convert also flags parameter to be BdrvRequestFlags.
>>
>> Now let's consider all callers. Simple
>>
>>    git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'
>>
>> shows that's there three callers of driver function:
>>
>>   bdrv_driver_preadv() in block/io.c, passes int64_t, checked by
>>     bdrv_check_qiov_request() to be non-negative.
>>
>>   qcow2_load_vmstate() does bdrv_check_qiov_request().
>>
>>   do_perform_cow_read() has uint64_t argument. And a lot of things in
>>   qcow2 driver are uint64_t, so converting it is big job. But we must
>>   not work with requests that doesn't satisfy bdrv_check_qiov_request(),
> 
> s/doesn't/don't/
> 
>>   so let's just assert it here.
>>
>> Still, the functions may be called directly, not only by drv->...
>> Let's check:
>>
>> git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
>> awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
>> while read func; do git grep "$func(" | \
>> grep -v "$func(BlockDriverState"; done
>>
>> The only one such caller:
>>
>>      QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
>>      ...
>>      ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
>>
>> in tesTS/unit/test-bdrv-drain.c, and it's OK obviously.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> ...it is repeated here. I'm fine dropping the first.
> 
>> ---
>>   include/block/block_int.h        | 11 ++++++-----
>>   block/backup-top.c               |  4 ++--
> 
>>   35 files changed, 120 insertions(+), 89 deletions(-)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index db7a909ea9..e6bcf74e46 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -234,8 +234,8 @@ struct BlockDriver {
>>   
>>       /* aio */
>>       BlockAIOCB *(*bdrv_aio_preadv)(BlockDriverState *bs,
>> -        uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags,
>> -        BlockCompletionFunc *cb, void *opaque);
>> +        int64_t offset, int64_t bytes, QEMUIOVector *qiov,
>> +        BdrvRequestFlags flags, BlockCompletionFunc *cb, void *opaque);
>>       BlockAIOCB *(*bdrv_aio_pwritev)(BlockDriverState *bs,
>>           uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags,
>>           BlockCompletionFunc *cb, void *opaque);
>> @@ -264,10 +264,11 @@ struct BlockDriver {
>>        * The buffer in @qiov may point directly to guest memory.
>>        */
>>       int coroutine_fn (*bdrv_co_preadv)(BlockDriverState *bs,
>> -        uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);
>> +        int64_t offset, int64_t bytes, QEMUIOVector *qiov,
>> +        BdrvRequestFlags flags);
>>       int coroutine_fn (*bdrv_co_preadv_part)(BlockDriverState *bs,
>> -        uint64_t offset, uint64_t bytes,
>> -        QEMUIOVector *qiov, size_t qiov_offset, int flags);
>> +        int64_t offset, int64_t bytes,
>> +        QEMUIOVector *qiov, size_t qiov_offset, BdrvRequestFlags flags);
> 
> Lots of fallout, but I'm in favor of this signature change.
> 
> 
>> +++ b/block/blkdebug.c
>> @@ -614,8 +614,8 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>>   }
>>   
>>   static int coroutine_fn
>> -blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>> -                   QEMUIOVector *qiov, int flags)
>> +blkdebug_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
>> +                   QEMUIOVector *qiov, BdrvRequestFlags flags)
>>   {
>>       int err;
> 
> Still calls rule_check() with uint64_t parameters, but since we've
> asserted the caller was within range, no behavior change.
> 
>> +++ b/block/blkverify.c
>> @@ -221,8 +221,8 @@ blkverify_co_prwv(BlockDriverState *bs, BlkverifyRequest *r, uint64_t offset,
>>   }
>>   
>>   static int coroutine_fn
>> -blkverify_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>> -                    QEMUIOVector *qiov, int flags)
>> +blkverify_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
>> +                    QEMUIOVector *qiov, BdrvRequestFlags flags)
>>   {
> 
> Similarly for the call to blkverify_co_prwv(), and probably elsewhere in
> the patch.
> 
>> +++ b/block/crypto.c
>> @@ -397,8 +397,8 @@ static int block_crypto_reopen_prepare(BDRVReopenState *state,
>>   #define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024)
>>   
>>   static coroutine_fn int
>> -block_crypto_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>> -                       QEMUIOVector *qiov, int flags)
>> +block_crypto_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
>> +                       QEMUIOVector *qiov, BdrvRequestFlags flags)
>>   {
>>       BlockCrypto *crypto = bs->opaque;
>>       uint64_t cur_bytes; /* number of bytes in current iteration */
> 
> We could perhaps change the type of local variables like cur_bytes and
> bytes_done; but it doesn't affect semantics.
> 
>> +++ b/block/curl.c
>> @@ -896,7 +896,8 @@ out:
>>   }
>>   
>>   static int coroutine_fn curl_co_preadv(BlockDriverState *bs,
>> -        uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
>> +        int64_t offset, int64_t bytes, QEMUIOVector *qiov,
>> +        BdrvRequestFlags flags)
>>   {
>>       CURLAIOCB acb = {
>>           .co = qemu_coroutine_self(),
> 
> Likewise changing the type of CURLAIOCB.offset and .bytes.  Cleanups
> like that can be followups.
> 
>> +++ b/block/file-posix.c
>> @@ -2033,9 +2033,9 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
>>       return raw_thread_pool_submit(bs, handle_aiocb_rw, &acb);
>>   }
>>   
>> -static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
>> -                                      uint64_t bytes, QEMUIOVector *qiov,
>> -                                      int flags)
>> +static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
>> +                                      int64_t bytes, QEMUIOVector *qiov,
>> +                                      BdrvRequestFlags flags)
>>   {
>>       return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_READ);
> 
> Similarly for fixing the signature of raw_co_prw() (after the
> counterpart change to raw_co_pwritev)
> 
>> +++ b/block/nvme.c
>> @@ -1221,8 +1221,9 @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>>   }
>>   
>>   static coroutine_fn int nvme_co_preadv(BlockDriverState *bs,
>> -                                       uint64_t offset, uint64_t bytes,
>> -                                       QEMUIOVector *qiov, int flags)
>> +                                       int64_t offset, int64_t bytes,
>> +                                       QEMUIOVector *qiov,
>> +                                       BdrvRequestFlags flags)
>>   {
>>       return nvme_co_prw(bs, offset, bytes, qiov, false, flags);
>>   }
> 
> And for nvme_co_prw().
> 
>> +++ b/block/qcow2.c
>> @@ -2281,9 +2281,10 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
>>   }
>>   
>>   static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
>> -                                             uint64_t offset, uint64_t bytes,
>> +                                             int64_t offset, int64_t bytes,
>>                                                QEMUIOVector *qiov,
>> -                                             size_t qiov_offset, int flags)
>> +                                             size_t qiov_offset,
>> +                                             BdrvRequestFlags flags)
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>>       int ret = 0;
> 
> Unrelated to this patch; the loop sets cur_bytes = MIN(bytes, INT_MAX);
> but it would be smarter to choose a cluster-aligned max instead of
> INT_MAX.  It doesn't matter as long as the block layer has pre-clamped
> requests to be < 32 bit to begin with, and then our later call to
> qcow2_get_host_offset() further clamps it down to actual cluster
> boundaries.  But it does look odd, compared to computing the original
> MIN() to an aligned boundary in the first place, whether or not we ever
> change the block layer to allow individual drivers to opt in to reading
> more than 2G in one transaction.
> 
> qcow2_add_task() is another internal function worth improving in a followup.
> 
>> +++ b/block/quorum.c
>> @@ -663,8 +663,8 @@ static int read_fifo_child(QuorumAIOCB *acb)
>>       return ret;
>>   }
>>   
>> -static int quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
>> -                            uint64_t bytes, QEMUIOVector *qiov, int flags)
>> +static int quorum_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
>> +                            QEMUIOVector *qiov, BdrvRequestFlags flags)
>>   {
>>       BDRVQuorumState *s = bs->opaque;
>>       QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes, flags);
> 
> and quorum_aio_get().
> 
>> diff --git a/block/raw-format.c b/block/raw-format.c
>> index 7717578ed6..e3f459b2cb 100644
>> --- a/block/raw-format.c
>> +++ b/block/raw-format.c
>> @@ -181,8 +181,8 @@ static void raw_reopen_abort(BDRVReopenState *state)
>>   }
>>   
>>   /* Check and adjust the offset, against 'offset' and 'size' options. */
>> -static inline int raw_adjust_offset(BlockDriverState *bs, uint64_t *offset,
>> -                                    uint64_t bytes, bool is_write)
>> +static inline int raw_adjust_offset(BlockDriverState *bs, int64_t *offset,
>> +                                    int64_t bytes, bool is_write)
>>   {
>>       BDRVRawState *s = bs->opaque;
>>   
>> @@ -201,9 +201,9 @@ static inline int raw_adjust_offset(BlockDriverState *bs, uint64_t *offset,
>>       return 0;
>>   }
>>   
>> -static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
>> -                                      uint64_t bytes, QEMUIOVector *qiov,
>> -                                      int flags)
>> +static int coroutine_fn raw_co_preadv(BlockDriverState *bs, int64_t offset,
>> +                                      int64_t bytes, QEMUIOVector *qiov,
>> +                                      BdrvRequestFlags flags)
>>   {
>>       int ret;
>>   
>> @@ -259,7 +259,7 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
>>           qiov = &local_qiov;
>>       }
>>   
>> -    ret = raw_adjust_offset(bs, &offset, bytes, true);
>> +    ret = raw_adjust_offset(bs, (int64_t *)&offset, bytes, true);
> 
> Ugly type-punning; thankfully it's transient until the counterpart patch
> to driver write handlers.
> 
>>       if (ret) {
>>           goto fail;
>>       }
>> @@ -294,7 +294,7 @@ static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
>>   {
>>       int ret;
>>   
>> -    ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
>> +    ret = raw_adjust_offset(bs, (int64_t *)&offset, bytes, true);
> 
> And now you should lose this cast...
> 
>>       if (ret) {
>>           return ret;
>>       }
>> @@ -306,7 +306,7 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
>>   {
>>       int ret;
>>   
>> -    ret = raw_adjust_offset(bs, (uint64_t *)&offset, bytes, true);
>> +    ret = raw_adjust_offset(bs, &offset, bytes, true);
> 
> ...like you did here.
> 
>>       if (ret) {
>>           return ret;
>>       }
>> @@ -541,7 +541,7 @@ static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs,
>>   {
>>       int ret;
>>   
>> -    ret = raw_adjust_offset(bs, &src_offset, bytes, false);
>> +    ret = raw_adjust_offset(bs, (int64_t *)&src_offset, bytes, false);
>>       if (ret) {
>>           return ret;
>>       }
>> @@ -560,7 +560,7 @@ static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs,
>>   {
>>       int ret;
>>   
>> -    ret = raw_adjust_offset(bs, &dst_offset, bytes, true);
>> +    ret = raw_adjust_offset(bs, (int64_t *)&dst_offset, bytes, true);
> 
> Also transient casts.
> 
> Easy enough to fix my nits for qcow2 and the commit message.
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 

But you said that qcow2 note is unrelated to that patch. So, as I understand, only commit message
and raw_co_pwrite_zeroes (drop extra case) should be fixed here, yes?

-- 
Best regards,
Vladimir