Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once - Vladimir Sementsov-Ogievskiy

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Max Reitz <mreitz@redhat.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: "fam@euphon.net" <fam@euphon.net>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	Denis Lunev <den@virtuozzo.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"jsnow@redhat.com" <jsnow@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
Date: Tue, 13 Aug 2019 14:39:48 +0000	[thread overview]
Message-ID: <cf9718b9-f36d-27f3-ad49-4f6fb1e1d7b3@virtuozzo.com> (raw)
In-Reply-To: <e05af208-c7cb-31e6-bfab-62a44f5281e7@redhat.com>

13.08.2019 17:23, Max Reitz wrote:
> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>> 12.08.2019 19:11, Max Reitz wrote:
>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>
>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>> one-cluster-sized buffer to share for all iterations. We can't also
>>>>>>> allocate buffer of full-request length it may be too large, so
>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>
>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>> ---
>>>>>>>     block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>     1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>> --- a/block/backup.c
>>>>>>> +++ b/block/backup.c
>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>     #include "qemu/error-report.h"
>>>>>>>     #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>     typedef struct CowRequest {
>>>>>>>         int64_t start_byte;
>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>         qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>     }
>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>> - * error occurred, return a negative error number */
>>>>>>> +/*
>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>> + * error occurred, return a negative error number
>>>>>>> + *
>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>
>>>>>> s/to/to be/
>>>>>>
>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>> + */
>>>>>>>     static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>                                                           int64_t start,
>>>>>>>                                                           int64_t end,
>>>>>>>                                                           bool is_write_notifier,
>>>>>>>                                                           bool *error_is_read,
>>>>>>> -                                                      void **bounce_buffer)
>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>     {
>>>>>>>         int ret;
>>>>>>>         BlockBackend *blk = job->common.blk;
>>>>>>> -    int nbytes;
>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>         int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>         assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>> -    if (!*bounce_buffer) {
>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>> -    }
>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>> -    if (ret < 0) {
>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>> -        if (error_is_read) {
>>>>>>> -            *error_is_read = true;
>>>>>>> +
>>>>>>> +    remaining_bytes = nbytes;
>>>>>>> +    while (remaining_bytes) {
>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>> +
>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>> +        if (ret < 0) {
>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>> +            if (error_is_read) {
>>>>>>> +                *error_is_read = true;
>>>>>>> +            }
>>>>>>> +            goto fail;
>>>>>>>             }
>>>>>>> -        goto fail;
>>>>>>> -    }
>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>> -                        job->write_flags);
>>>>>>> -    if (ret < 0) {
>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>> -        if (error_is_read) {
>>>>>>> -            *error_is_read = false;
>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>> +                            job->write_flags);
>>>>>>> +        if (ret < 0) {
>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>> +            if (error_is_read) {
>>>>>>> +                *error_is_read = false;
>>>>>>> +            }
>>>>>>> +            goto fail;
>>>>>>>             }
>>>>>>> -        goto fail;
>>>>>>> +
>>>>>>> +        start += chunk;
>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>         }
>>>>>>>         return nbytes;
>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>                 }
>>>>>>>             }
>>>>>>>             if (!job->use_copy_range) {
>>>>>>> +            if (!bounce_buffer) {
>>>>>>> +                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>> +                                 MAX(dirty_end - start, end - dirty_end));
>>>>>>> +                bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>> +            }
>>>>>>
>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>
>>>>> Oops, you are right, of course.
>>>>>
>>>>>>
>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob.  (And maybe add
>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>
>>>>>
>>>>> Once per job will not work, as we may have several guest writes in parallel and therefore
>>>>> several parallel copy-before-write operations.
>>>>
>>>> Hm.  I’m not quite happy with that because if the guest just issues many
>>>> large discards in parallel, this means that qemu will allocate a large
>>>> amount of memory.
>>>>
>>>> It would be nice if there was a simple way to keep track of the total
>>>> memory usage and let requests yield if they would exceed it.
>>>
>>> Agree, it should be fixed anyway.
>>>
>>
>>
>> But still..
>>
>> Synchronous mirror allocates full-request buffers on guest write. Is it correct?
>>
>> If we assume that it is correct to double memory usage of guest operations, than for backup
>> the problem is only in write_zero and discard where guest-assumed memory usage should be zero.
> 
> Well, but that is the problem.  I didn’t say anything in v2, because I
> only thought of normal writes and I found it fine to double the memory
> usage there (a guest won’t issue huge write requests in parallel).  But
> discard/write-zeroes are a different matter.
> 
>> And if we should distinguish writes from write_zeroes and discard, it's better to postpone this
>> improvement to be after backup-top filter merged.
> 
> But do you need to distinguish it?  Why not just keep track of memory
> usage and put the current I/O coroutine to sleep in a CoQueue or
> something, and wake that up at the end of backup_do_cow()?
> 

1. Because if we _can_ allow doubling of memory, it's more effective to not restrict allocations on
guest writes. It's just seems to be more effective technique.

2. Anyway, I'd allow some always-available size to allocate - let it be one cluster, which will correspond
to current behavior and prevent guest io hang in worst case. So I mean, if we have enough memory allow
individual CBW operation to allocate the whole buffer, and if we have no extra memory allow to allocate one
cluster.

-- 
Best regards,
Vladimir