All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Max Reitz <mreitz@redhat.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: "fam@euphon.net" <fam@euphon.net>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	Denis Lunev <den@virtuozzo.com>,
	"wencongyang2@huawei.com" <wencongyang2@huawei.com>,
	"xiechanglong.d@gmail.com" <xiechanglong.d@gmail.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"jsnow@redhat.com" <jsnow@redhat.com>,
	"stefanha@redhat.com" <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v9 13/13] block/backup: use backup-top instead of write notifiers
Date: Tue, 3 Sep 2019 08:06:01 +0000	[thread overview]
Message-ID: <9efd8378-2e19-b3f4-e449-0d46719baebb@virtuozzo.com> (raw)
In-Reply-To: <d0235117-87f7-2e8d-83e9-4678b5e4f1bb@redhat.com>

02.09.2019 19:34, Max Reitz wrote:
> On 29.08.19 16:55, Vladimir Sementsov-Ogievskiy wrote:
>> 28.08.2019 22:50, Max Reitz wrote:
>>> On 26.08.19 18:13, Vladimir Sementsov-Ogievskiy wrote:
>>>> Drop write notifiers and use filter node instead.
>>>>
>>>> = Changes =
>>>>
>>>> 1. add filter-node-name argument for backup qmp api. We have to do it
>>>> in this commit, as 257 needs to be fixed.
>>>
>>> I feel a bit bad about it not being an implicit node.  But I know you
>>> don’t like them, they’re probably just broken altogether and because
>>> libvirt doesn’t use backup (yet), probably nobody cares.
>>>
>>>> 2. there no move write notifiers here, so is_write_notifier parameter
>>>
>>> s/there/there are/, I suppose?
>>>
>>>> is dropped from block-copy paths.
>>>>
>>>> 3. Intersecting requests handling changed, now synchronization between
>>>> backup-top, backup and guest writes are all done in block/block-copy.c
>>>> and works as follows:
>>>>
>>>> On copy operation, we work only with dirty areas. If bits are dirty it
>>>> means that there are no requests intersecting with this area. We clear
>>>> dirty bits and take bdrv range lock (bdrv_co_try_lock) on this area to
>>>> prevent further operations from interaction with guest (only with
>>>> guest, as neither backup nor backup-top will touch non-dirty area). If
>>>> copy-operation failed we set dirty bits back together with releasing
>>>> the lock.
>>>>
>>>> The actual difference with old scheme is that on guest writes we
>>>> don't lock the whole region but only dirty-parts, and to be more
>>>> precise: only dirty-part we are currently operate on. In old scheme
>>>> guest write to non-dirty area (which may be safely ignored by backup)
>>>> may wait for intersecting request, touching some other area which is
>>>> dirty.
>>>>
>>>> 4. To sync with in-flight requests at job finish we now have drained
>>>> removing of the filter, we don't need rw-lock.
>>>>
>>>> = Notes =
>>>>
>>>> Note the consequence of three objects appearing: backup-top, backup job
>>>> and block-copy-state:
>>>>
>>>> 1. We want to insert backup-top before job creation, to behave similar
>>>> with mirror and commit, where job is started upon filter.
>>>>
>>>> 2. We also have to create block-copy-state after filter injection, as
>>>> we don't want it's source child be replaced by fitler. Instead we want
>>>
>>> s/it's/its/, s/filter/filter/ (although “fitler” does have an amusing
>>> ring to it)
>>>
>>>> to keep BCS.source to be real source node, as we want to use
>>>> bdrv_co_try_lock in CBW operations and it can't be used on filter, as
>>>> on filter we already have in-flight (write) request from upper layer.
>>>
>>> Reasonable, even more so as I suppose BCS.source should eventually be a
>>> BdrvChild of backup-top.
>>>
>>> What looks wrong is that the sync_bitmap is created on the source (“bs”
>>> in backup_job_create()), but backup_cleanup_sync_bitmap() still assumes
>>> it is on blk_bs(job->common.blk) (which is no longer true).
>>>
>>>> So, we firstly create inject backup-top, then create job and BCS. BCS
>>>> is the latest just to not create extra variable for it. Finally we set
>>>> bcs for backup-top filter.
>>>>
>>>> = Iotest changes =
>>>>
>>>> 56: op-blocker doesn't shot now, as we set it on source, but then check
>>>
>>> s/shot/show/?
>>>
>>>> on filter, when trying to start second backup, so error caught in
>>>> test_dismiss_collision is changed. It's OK anyway, as this test-case
>>>> seems to test that after some collision we can dismiss first job and
>>>> successfully start the second one. So, the change is that collision is
>>>> changed from op-blocker to file-posix locks.
>>>
>>> Well, but the op blocker belongs to the source, which should be covered
>>> by internal permissions.  The fact that it now shows up as a file-posix
>>> error thus shows that the conflict also moves from the source to the
>>> target.  It’s OK because we actually don’t have a conflict on the source.
>>>
>>> But I wonder whether it would be better for test_dismiss_collision() to
>>> do a blockdev-backup instead where we can see the collision on the target.
>>>
>>> Hm.  On second thought, why do we even get a conflict on the target?
>>> block-copy does share the WRITE permission for it...
>>
>> Not sure, but assume that this is because in file-posix.c in raw_co_create
>> we do want RESIZE perm.
>>
>> I can instead move this test to use specified job-id, to move the collision
>> to "job-id already exists" error. Is it better?
> 
> It’s probably the only thing that actually makes sense.
> 
>> I'm afraid that posix locks will not work if disable them in config.
> 
> Yes (or if the environment doesn’t support them); which is why I
> suggested using blockdev-backup.  (But that would have the same problem
> of “Where does the permission conflict even come from and is it
> inevitable?”)
> 
> [...]
> 
>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>> index 6828c46ba0..f3102ec3ff 100644
>>>> --- a/block/block-copy.c
>>>> +++ b/block/block-copy.c
>>>> @@ -98,27 +98,32 @@ fail:
>>>>     * error occurred, return a negative error number
>>>>     */
>>>>    static int coroutine_fn block_copy_with_bounce_buffer(
>>>> -        BlockCopyState *s, int64_t start, int64_t end, bool is_write_notifier,
>>>> +        BlockCopyState *s, int64_t start, int64_t end,
>>>>            bool *error_is_read, void **bounce_buffer)
>>>>    {
>>>>        int ret;
>>>> -    int nbytes;
>>>> -    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>> +    int nbytes = MIN(s->cluster_size, s->len - start);
>>>> +    void *lock = bdrv_co_try_lock(blk_bs(s->source), start, nbytes);
>>>> +
>>>> +    /*
>>>> +     * Function must be called only on full-dirty region, so nobody touching or
>>>> +     * touched these bytes. Therefore, we must successfully get lock.
>>>
>>> Which is OK because backup-top does not share the WRITE permission.  But
>>> it is organized a bit weirdly, because it’s actually block-copy here
>>> that relies on the WRITE permission not to be shared; which it itself
>>> cannot do because backup-top needs it.
>>>
>>> This of course results from the fact that backup-top and block-copy
>>> should really use the same object to access the source, namely the
>>> backing BdrvChild of backup-top.
>>>
>>> Maybe there should be a comment somewhere that points this out?
>>
>> But I wanted block-copy not to be directly connected to backup-top.
> 
> But it doesn’t really work.  There’s an obscure contract that already
> binds block-copy to backup-top.
> 
>> I think actually we rely on the fact that user of block-copy() guarantees that
>> nobody is touching dirty areas (except block_copy). And with backup, this is done by
>> backup-top filter. I'll add a comment on this.
> 
> Yes, that’s the contract.  And it’s weird, because block-copy has its
> own BB, so if it doesn’t want anyone to write

doesn't want anyone to write to dirty areas, and we don't have such option in the permission
system...

> to the source outside of
> its control, it should just unshare the WRITE permission – but it cannot
> do that, because its user needs WRITE access, and so it implicitly
> relies on that user to then unshare WRITE.
> 
> If both used the same single BdrvChild, it would be clear
> 
> (1) why the BdrvChild belongs to block-copy’s user (because block-copy’s
> user would need to own a node that then has this BdrvChild),
> 
> (2) why block-copy does not unshare the WRITE permission (it cannot,
> because it doesn’t create the BdrvChild),
> 
> (3) how block-copy could explicitly ensure that WRITE is unshared (by
> asserting !(child->shared_perm & WRITE)).
> 
> 
> I’m not saying that we need to abandon having BBs right now, but I think
> there are a couple of cases which show why I say it’s uglier than using
> BdrvChildren instead.
> 

OK. I'd prefer to move block-copy to BdrvChildren as follow-up, if you don't mind.


-- 
Best regards,
Vladimir

  reply	other threads:[~2019-09-03  8:07 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-26 16:12 [Qemu-devel] [PATCH v9 00/13] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 01/13] block/backup: fix backup_cow_with_offload for last cluster Vladimir Sementsov-Ogievskiy
2019-08-28 14:08   ` Max Reitz
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 02/13] block/backup: split shareable copying part from backup_do_cow Vladimir Sementsov-Ogievskiy
2019-08-28 14:22   ` Max Reitz
2019-08-28 14:27     ` Vladimir Sementsov-Ogievskiy
2019-08-28 14:32       ` Max Reitz
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 03/13] block/backup: introduce BlockCopyState Vladimir Sementsov-Ogievskiy
2019-08-28 15:59   ` Max Reitz
2019-08-29 10:52     ` Vladimir Sementsov-Ogievskiy
2019-09-02 16:09       ` Max Reitz
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 04/13] block/backup: adjust block-copy functions style Vladimir Sementsov-Ogievskiy
2019-08-28 16:06   ` Max Reitz
2019-08-29 11:25     ` Vladimir Sementsov-Ogievskiy
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 05/13] block: move block_copy from block/backup.c to separate file Vladimir Sementsov-Ogievskiy
2019-08-28 16:16   ` Max Reitz
2019-08-29 12:00     ` Vladimir Sementsov-Ogievskiy
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 06/13] block: teach bdrv_debug_breakpoint skip filters with backing Vladimir Sementsov-Ogievskiy
2019-08-28 16:21   ` Max Reitz
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 07/13] iotests: prepare 124 and 257 bitmap querying for backup-top filter Vladimir Sementsov-Ogievskiy
2019-08-28 16:40   ` Max Reitz
2019-08-29 13:22     ` Vladimir Sementsov-Ogievskiy
2019-09-02 16:10       ` Max Reitz
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 08/13] iotests: 257: drop unused Drive.device field Vladimir Sementsov-Ogievskiy
2019-08-28 16:50   ` Max Reitz
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 09/13] iotests: 257: drop device_add Vladimir Sementsov-Ogievskiy
2019-08-28 16:54   ` Max Reitz
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 10/13] block/io: refactor wait_serialising_requests Vladimir Sementsov-Ogievskiy
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 11/13] block: add lock/unlock range functions Vladimir Sementsov-Ogievskiy
2019-08-28 17:02   ` Max Reitz
2019-08-29  9:17     ` Vladimir Sementsov-Ogievskiy
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 12/13] block: introduce backup-top filter driver Vladimir Sementsov-Ogievskiy
2019-08-28 17:52   ` Max Reitz
2019-08-26 16:13 ` [Qemu-devel] [PATCH v9 13/13] block/backup: use backup-top instead of write notifiers Vladimir Sementsov-Ogievskiy
2019-08-28 19:50   ` Max Reitz
2019-08-29 14:55     ` Vladimir Sementsov-Ogievskiy
2019-09-02 16:34       ` Max Reitz
2019-09-03  8:06         ` Vladimir Sementsov-Ogievskiy [this message]
2019-09-09  9:12           ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9efd8378-2e19-b3f4-e449-0d46719baebb@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=armbru@redhat.com \
    --cc=den@virtuozzo.com \
    --cc=fam@euphon.net \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=wencongyang2@huawei.com \
    --cc=xiechanglong.d@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.