On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 19:11, Max Reitz wrote:
>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>
>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>> one-cluster-sized buffer to share for all iterations. We can't also
>>>>>> allocate buffer of full-request length it may be too large, so
>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>> at the point where we need the buffer for the first time.
>>>>>>
>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>>>>>> ---
>>>>>>    block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>    1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>> index d482d93458..65f7212c85 100644
>>>>>> --- a/block/backup.c
>>>>>> +++ b/block/backup.c
>>>>>> @@ -27,6 +27,7 @@
>>>>>>    #include "qemu/error-report.h"
>>>>>>    #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>    typedef struct CowRequest {
>>>>>>        int64_t start_byte;
>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>        qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>    }
>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>> - * error occurred, return a negative error number */
>>>>>> +/*
>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>> + * error occurred, return a negative error number
>>>>>> + *
>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>
>>>>> s/to/to be/
>>>>>
>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>> + */
>>>>>>    static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>                                                          int64_t start,
>>>>>>                                                          int64_t end,
>>>>>>                                                          bool is_write_notifier,
>>>>>>                                                          bool *error_is_read,
>>>>>> -                                                      void **bounce_buffer)
>>>>>> +                                                      void *bounce_buffer)
>>>>>>    {
>>>>>>        int ret;
>>>>>>        BlockBackend *blk = job->common.blk;
>>>>>> -    int nbytes;
>>>>>> +    int nbytes, remaining_bytes;
>>>>>>        int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>        assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>> -    if (!*bounce_buffer) {
>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>> -    }
>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>> -    if (ret < 0) {
>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>> -        if (error_is_read) {
>>>>>> -            *error_is_read = true;
>>>>>> +
>>>>>> +    remaining_bytes = nbytes;
>>>>>> +    while (remaining_bytes) {
>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>> +
>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>> +        if (ret < 0) {
>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>> +            if (error_is_read) {
>>>>>> +                *error_is_read = true;
>>>>>> +            }
>>>>>> +            goto fail;
>>>>>>            }
>>>>>> -        goto fail;
>>>>>> -    }
>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>> -                        job->write_flags);
>>>>>> -    if (ret < 0) {
>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>> -        if (error_is_read) {
>>>>>> -            *error_is_read = false;
>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>> +                            job->write_flags);
>>>>>> +        if (ret < 0) {
>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>> +            if (error_is_read) {
>>>>>> +                *error_is_read = false;
>>>>>> +            }
>>>>>> +            goto fail;
>>>>>>            }
>>>>>> -        goto fail;
>>>>>> +
>>>>>> +        start += chunk;
>>>>>> +        remaining_bytes -= chunk;
>>>>>>        }
>>>>>>        return nbytes;
>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>                }
>>>>>>            }
>>>>>>            if (!job->use_copy_range) {
>>>>>> +            if (!bounce_buffer) {
>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>> +            }
>>>>>
>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>
>>>> Oops, you are right, of course.
>>>>
>>>>>
>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>> once per job (the first time we get here, probably) to be of size
>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>
>>>>
>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>> several parallel copy-before-write operations.
>>>
>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>> large discards in parallel, this means that qemu will allocate a large
>>> amount of memory.
>>>
>>> It would be nice if there was a simple way to keep track of the total
>>> memory usage and let requests yield if they would exceed it.
>>
>> Agree, it should be fixed anyway.
>>
>
>
> But still..
>
> Synchronous mirror allocates full-request buffers on guest write. Is it correct?
>
> If we assume that it is correct to double memory usage of guest operations, than for backup
> the problem is only in write_zero and discard where guest-assumed memory usage should be zero.

Well, but that is the problem.  I didn’t say anything in v2, because I
only thought of normal writes and I found it fine to double the memory
usage there (a guest won’t issue huge write requests in parallel).  But
discard/write-zeroes are a different matter.

> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
> improvement to be after backup-top filter merged.

But do you need to distinguish it?  Why not just keep track of memory
usage and put the current I/O coroutine to sleep in a CoQueue or
something, and wake that up at the end of backup_do_cow()?

Max
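
For illustration, a minimal sketch of that idea.  The names below
(in_flight_bytes, mem_wait_queue, BACKUP_TOTAL_MEM_LIMIT, backup_mem_reserve,
backup_mem_release) are made up for this sketch and are not part of the
current code or of this patch:

/*
 * Assumed additions to BackupBlockJob (hypothetical):
 *     uint64_t in_flight_bytes;    bounce-buffer memory currently in use
 *     CoQueue  mem_wait_queue;     requests waiting for the memory budget
 * mem_wait_queue would be initialized with qemu_co_queue_init() in
 * backup_job_create().
 */

/*
 * Arbitrary example budget; it must be at least BACKUP_MAX_BOUNCE_BUFFER so
 * that a single request can always make progress even when the queue is empty.
 */
#define BACKUP_TOTAL_MEM_LIMIT (128 * 1024 * 1024)

/* Reserve part of the budget; sleep in the CoQueue until enough is free. */
static void coroutine_fn backup_mem_reserve(BackupBlockJob *job,
                                            uint64_t bytes)
{
    while (job->in_flight_bytes + bytes > BACKUP_TOTAL_MEM_LIMIT) {
        qemu_co_queue_wait(&job->mem_wait_queue, NULL);
    }
    job->in_flight_bytes += bytes;
}

/* Return the reservation and wake any requests waiting for memory. */
static void coroutine_fn backup_mem_release(BackupBlockJob *job,
                                            uint64_t bytes)
{
    job->in_flight_bytes -= bytes;
    qemu_co_queue_restart_all(&job->mem_wait_queue);
}

backup_do_cow() would call backup_mem_reserve() before allocating the bounce
buffer and backup_mem_release() on its way out, so discards and write-zeroes
would be throttled by the same budget as ordinary guest writes.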