On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 19:11, Max Reitz wrote:
>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>
>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>> one-cluster-sized buffer to share for all iterations. We can't also
>>>>>> allocate buffer of full-request length it may be too large, so
>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>> at the point where we need the buffer for the first time.
>>>>>>
>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>>>>>> ---
>>>>>>    block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>    1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>> index d482d93458..65f7212c85 100644
>>>>>> --- a/block/backup.c
>>>>>> +++ b/block/backup.c
>>>>>> @@ -27,6 +27,7 @@
>>>>>>    #include "qemu/error-report.h"
>>>>>>    #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>    typedef struct CowRequest {
>>>>>>        int64_t start_byte;
>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>        qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>    }
>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>> - * error occurred, return a negative error number */
>>>>>> +/*
>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>> + * error occurred, return a negative error number
>>>>>> + *
>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>
>>>>> s/to/to be/
>>>>>
>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>> + */
>>>>>>    static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>                                                          int64_t start,
>>>>>>                                                          int64_t end,
>>>>>>                                                          bool is_write_notifier,
>>>>>>                                                          bool *error_is_read,
>>>>>> -                                                      void **bounce_buffer)
>>>>>> +                                                      void *bounce_buffer)
>>>>>>    {
>>>>>>        int ret;
>>>>>>        BlockBackend *blk = job->common.blk;
>>>>>> -    int nbytes;
>>>>>> +    int nbytes, remaining_bytes;
>>>>>>        int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>        assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>> -    if (!*bounce_buffer) {
>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>> -    }
>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>> -    if (ret < 0) {
>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>> -        if (error_is_read) {
>>>>>> -            *error_is_read = true;
>>>>>> +
>>>>>> +    remaining_bytes = nbytes;
>>>>>> +    while (remaining_bytes) {
>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>> +
>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>> +        if (ret < 0) {
>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>> +            if (error_is_read) {
>>>>>> +                *error_is_read = true;
>>>>>> +            }
>>>>>> +            goto fail;
>>>>>>            }
>>>>>> -        goto fail;
>>>>>> -    }
>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>> -                        job->write_flags);
>>>>>> -    if (ret < 0) {
>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>> -        if (error_is_read) {
>>>>>> -            *error_is_read = false;
>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>> +                            job->write_flags);
>>>>>> +        if (ret < 0) {
>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>> +            if (error_is_read) {
>>>>>> +                *error_is_read = false;
>>>>>> +            }
>>>>>> +            goto fail;
>>>>>>            }
>>>>>> -        goto fail;
>>>>>> +
>>>>>> +        start += chunk;
>>>>>> +        remaining_bytes -= chunk;
>>>>>>        }
>>>>>>        return nbytes;
>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>                }
>>>>>>            }
>>>>>>            if (!job->use_copy_range) {
>>>>>> +            if (!bounce_buffer) {
>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>> +            }
>>>>>
>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>
>>>> Oops, you are right, of course.
>>>>
>>>>>
>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>> once per job (the first time we get here, probably) to be of size
>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>
>>>>
>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>> several parallel copy-before-write operations.
>>>
>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>> large discards in parallel, this means that qemu will allocate a large
>>> amount of memory.
>>>
>>> It would be nice if there was a simple way to keep track of the total
>>> memory usage and let requests yield if they would exceed it.
>>
>> Agree, it should be fixed anyway.
>>
>
>
> But still..
>
> Synchronous mirror allocates full-request buffers on guest write. Is it correct?
>
> If we assume that it is correct to double memory usage of guest operations, than for backup
> the problem is only in write_zero and discard where guest-assumed memory usage should be zero.

Well, but that is the problem.  I didn’t say anything in v2, because I
only thought of normal writes and I found it fine to double the memory
usage there (a guest won’t issue huge write requests in parallel).  But
discard/write-zeroes are a different matter.

> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
> improvement to be after backup-top filter merged.

But do you need to distinguish it?  Why not just keep track of memory
usage and put the current I/O coroutine to sleep in a CoQueue or
something, and wake that up at the end of backup_do_cow()?

Max
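
For illustration, a minimal sketch of that idea.  The names below
(in_flight_bytes, mem_wait_queue, BACKUP_TOTAL_MEM_LIMIT, backup_mem_reserve,
backup_mem_release) are made up for this sketch and are not part of the
current code or of this patch:

/*
 * Assumed additions to BackupBlockJob (hypothetical):
 *     uint64_t in_flight_bytes;    bounce-buffer memory currently in use
 *     CoQueue  mem_wait_queue;     requests waiting for the memory budget
 * mem_wait_queue would be initialized with qemu_co_queue_init() in
 * backup_job_create().
 */

/*
 * Arbitrary example budget; it must be at least BACKUP_MAX_BOUNCE_BUFFER so
 * that a single request can always make progress even when the queue is empty.
 */
#define BACKUP_TOTAL_MEM_LIMIT (128 * 1024 * 1024)

/* Reserve part of the budget; sleep in the CoQueue until enough is free. */
static void coroutine_fn backup_mem_reserve(BackupBlockJob *job,
                                            uint64_t bytes)
{
    while (job->in_flight_bytes + bytes > BACKUP_TOTAL_MEM_LIMIT) {
        qemu_co_queue_wait(&job->mem_wait_queue, NULL);
    }
    job->in_flight_bytes += bytes;
}

/* Return the reservation and wake any requests waiting for memory. */
static void coroutine_fn backup_mem_release(BackupBlockJob *job,
                                            uint64_t bytes)
{
    job->in_flight_bytes -= bytes;
    qemu_co_queue_restart_all(&job->mem_wait_queue);
}

backup_do_cow() would call backup_mem_reserve() before allocating the bounce
buffer and backup_mem_release() on its way out, so discards and write-zeroes
would be throttled by the same budget as ordinary guest writes.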