qemu-devel.nongnu.org archive mirror
* [Qemu-devel] backup bug or question
@ 2019-08-09 13:18 Vladimir Sementsov-Ogievskiy
  2019-08-09 20:13 ` John Snow
  2019-08-12 13:23 ` Kevin Wolf
  0 siblings, 2 replies; 9+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-09 13:18 UTC (permalink / raw)
  To: qemu block, qemu-devel, Kevin Wolf, Max Reitz, John Snow

Hi!

Hmm, while hacking on backup I have a question:

What prevents a guest write request after job_start but before the
write notifier is set?

code path:

qmp_drive_backup or transaction with backup

    job_start
       aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */

....

job_co_entry
    job_pause_point() /* it definitely yields, isn't it bad? */
    job->driver->run() /* backup_run */

----

backup_run()
    bdrv_add_before_write_notifier()

...

And what guarantees do we give to the user? Is it guaranteed that the write
notifier is set when the QMP command returns?

And I guess, if we start several backups in a transaction, it should be guaranteed
that the set of backups is consistent and corresponds to one point in time...

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] backup bug or question
  2019-08-09 13:18 [Qemu-devel] backup bug or question Vladimir Sementsov-Ogievskiy
@ 2019-08-09 20:13 ` John Snow
  2019-08-10 11:17   ` Vladimir Sementsov-Ogievskiy
  2019-08-12 13:23 ` Kevin Wolf
  1 sibling, 1 reply; 9+ messages in thread
From: John Snow @ 2019-08-09 20:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu block, qemu-devel, Kevin Wolf,
	Max Reitz



On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> Hmm, hacking around backup I have a question:
> 
> What prevents guest write request after job_start but before setting
> write notifier?
> 
> code path:
> 
> qmp_drive_backup or transaction with backup
> 
>     job_start
>        aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */
> 
> ....
> 
> job_co_entry
>     job_pause_point() /* it definitely yields, isn't it bad? */
>     job->driver->run() /* backup_run */
> 
> ----
> 
> backup_run()
>     bdrv_add_before_write_notifier()
> 
> ...
> 

I think you're right... :(


We create jobs like this:

job->paused        = true;
job->pause_count   = 1;


And then job_start does this:

job->co = qemu_coroutine_create(job_co_entry, job);
job->pause_count--;
job->busy = true;
job->paused = false;


Which means that job_co_entry is being called before we lift the pause:

assert(job && job->driver && job->driver->run);
job_pause_point(job);
job->ret = job->driver->run(job, &job->err);

...Which means that we are definitely yielding in job_pause_point.

Yeah, that's a race condition waiting to happen.

> And what guarantees we give to the user? Is it guaranteed that write notifier is
> set when qmp command returns?
> 
> And I guess, if we start several backups in a transaction it should be guaranteed
> that the set of backups is consistent and correspond to one point in time...
> 

I would have hoped that maybe the drain_all coupled with the individual
jobs taking drain_start and drain_end would save us, but I guess we
simply don't have a guarantee that all backup jobs WILL have installed
their handler by the time the transaction ends.

Or, if there is that guarantee, I don't know what provides it, so I
think we shouldn't count on it accidentally working anymore.



I think we should do two things:

1. Move the handler installation to creation time.
2. Modify backup_before_write_notify to return without invoking
backup_do_cow if the job isn't started yet.

I'll send a patch in just a moment ...
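
Roughly what I have in mind (untested sketch, from memory; the "started" flag is new
and the exact field/argument names may be off):

/* 1. In backup_job_create(), rather than in backup_run(): */
backup_job->before_write.notify = backup_before_write_notify;
bdrv_add_before_write_notifier(bs, &backup_job->before_write);

/* 2. In the notifier, do nothing until the job has really been started: */
static int coroutine_fn backup_before_write_notify(NotifierWithReturn *notifier,
                                                   void *opaque)
{
    BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
    BdrvTrackedRequest *req = opaque;

    if (!job->common.job.started) {   /* hypothetical flag, set in job_start() */
        return 0;
    }

    return backup_do_cow(job, req->offset, req->bytes, NULL, true);
}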

--js



* Re: [Qemu-devel] backup bug or question
  2019-08-09 20:13 ` John Snow
@ 2019-08-10 11:17   ` Vladimir Sementsov-Ogievskiy
  2019-08-12 17:46     ` John Snow
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-10 11:17 UTC (permalink / raw)
  To: John Snow, qemu block, qemu-devel, Kevin Wolf, Max Reitz

09.08.2019 23:13, John Snow wrote:
> 
> 
> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi!
>>
>> Hmm, hacking around backup I have a question:
>>
>> What prevents guest write request after job_start but before setting
>> write notifier?
>>
>> code path:
>>
>> qmp_drive_backup or transaction with backup
>>
>>      job_start
>>         aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */
>>
>> ....
>>
>> job_co_entry
>>      job_pause_point() /* it definitely yields, isn't it bad? */
>>      job->driver->run() /* backup_run */
>>
>> ----
>>
>> backup_run()
>>      bdrv_add_before_write_notifier()
>>
>> ...
>>
> 
> I think you're right... :(
> 
> 
> We create jobs like this:
> 
> job->paused        = true;
> job->pause_count   = 1;
> 
> 
> And then job_start does this:
> 
> job->co = qemu_coroutine_create(job_co_entry, job);
> job->pause_count--;
> job->busy = true;
> job->paused = false;
> 
> 
> Which means that job_co_entry is being called before we lift the pause:
> 
> assert(job && job->driver && job->driver->run);
> job_pause_point(job);
> job->ret = job->driver->run(job, &job->err);
> 
> ...Which means that we are definitely yielding in job_pause_point.
> 
> Yeah, that's a race condition waiting to happen.
> 
>> And what guarantees we give to the user? Is it guaranteed that write notifier is
>> set when qmp command returns?
>>
>> And I guess, if we start several backups in a transaction it should be guaranteed
>> that the set of backups is consistent and correspond to one point in time...
>>
> 
> I would have hoped that maybe the drain_all coupled with the individual
> jobs taking drain_start and drain_end would save us, but I guess we
> simply don't have a guarantee that all backup jobs WILL have installed
> their handler by the time the transaction ends.
> 
> Or, if there is that guarantee, I don't know what provides it, so I
> think we shouldn't count on it accidentally working anymore.
> 
> 
> 
> I think we should do two things:
> 
> 1. Move the handler installation to creation time.
> 2. Modify backup_before_write_notify to return without invoking
> backup_do_cow if the job isn't started yet.
> 

Hmm, I don't see how it helps... A no-op write notifier will not save us from
guest writes, will it?


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] backup bug or question
  2019-08-09 13:18 [Qemu-devel] backup bug or question Vladimir Sementsov-Ogievskiy
  2019-08-09 20:13 ` John Snow
@ 2019-08-12 13:23 ` Kevin Wolf
  2019-08-12 16:09   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2019-08-12 13:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: John Snow, qemu-devel, qemu block, Max Reitz

Am 09.08.2019 um 15:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi!
> 
> Hmm, hacking around backup I have a question:
> 
> What prevents guest write request after job_start but before setting
> write notifier?
> 
> code path:
> 
> qmp_drive_backup or transaction with backup
> 
>     job_start
>        aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */
> 
> ....
> 
> job_co_entry
>     job_pause_point() /* it definitely yields, isn't it bad? */
>     job->driver->run() /* backup_run */
> 
> ----
> 
> backup_run()
>     bdrv_add_before_write_notifier()
> 
> ...
> 
> And what guarantees we give to the user? Is it guaranteed that write notifier is
> set when qmp command returns?
> 
> And I guess, if we start several backups in a transaction it should be guaranteed
> that the set of backups is consistent and correspond to one point in time...

Do the patches to switch backup to a filter node solve this
automatically because that node would be inserted in
backup_job_create()?

Kevin



* Re: [Qemu-devel] backup bug or question
  2019-08-12 13:23 ` Kevin Wolf
@ 2019-08-12 16:09   ` Vladimir Sementsov-Ogievskiy
  2019-08-12 16:49     ` Kevin Wolf
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-12 16:09 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: John Snow, qemu-devel, qemu block, Max Reitz

12.08.2019 16:23, Kevin Wolf wrote:
> Am 09.08.2019 um 15:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi!
>>
>> Hmm, hacking around backup I have a question:
>>
>> What prevents guest write request after job_start but before setting
>> write notifier?
>>
>> code path:
>>
>> qmp_drive_backup or transaction with backup
>>
>>      job_start
>>         aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */
>>
>> ....
>>
>> job_co_entry
>>      job_pause_point() /* it definitely yields, isn't it bad? */
>>      job->driver->run() /* backup_run */
>>
>> ----
>>
>> backup_run()
>>      bdrv_add_before_write_notifier()
>>
>> ...
>>
>> And what guarantees we give to the user? Is it guaranteed that write notifier is
>> set when qmp command returns?
>>
>> And I guess, if we start several backups in a transaction it should be guaranteed
>> that the set of backups is consistent and correspond to one point in time...
> 
> Do the patches to switch backup to a filter node solve this
> automatically because that node would be inserted in
> backup_job_create()?
> 

Hmm, great, it looks like they should. At least it moves the scope of the problem to the
do_drive_backup and do_blockdev_backup functions...

Am I right that aio_context_acquire/aio_context_release guarantees that no new request is
created during the section? Or should we add a drained_begin/drained_end pair, or at least
a drain() at the start of qmp_blockdev_backup and qmp_drive_backup?
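
I mean something like this at the start of the QMP handler (very rough sketch, not sure
about the exact placement and ordering):

aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
bdrv_drained_begin(bs);        /* in-flight guest requests complete, new ones are blocked */

job = backup_job_create(...);  /* write notifier / filter would be installed here */
job_start(&job->job);

bdrv_drained_end(bs);          /* guest I/O resumes, now visible to the notifier/filter */
aio_context_release(aio_context);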

Assume a scenario like this:

1. fsfreeze
2. qmp backup
3. fsthaw

to make sure that the backup starting point is consistent. So in our QMP command we should:
1. complete all current requests, so that the drives correspond to the fsfreeze point;
2. initialize the write notifiers or filter before any new guest request, i.e. before fsthaw,
i.e. before the QMP command returns.

Transactions should be OK, as they use drained_begin/drained_end pairs, and additional
aio_context_acquire/aio_context_release pairs.

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] backup bug or question
  2019-08-12 16:09   ` Vladimir Sementsov-Ogievskiy
@ 2019-08-12 16:49     ` Kevin Wolf
  2019-08-12 17:02       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2019-08-12 16:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: John Snow, qemu-devel, qemu block, Max Reitz

Am 12.08.2019 um 18:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 12.08.2019 16:23, Kevin Wolf wrote:
> > Am 09.08.2019 um 15:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> Hi!
> >>
> >> Hmm, hacking around backup I have a question:
> >>
> >> What prevents guest write request after job_start but before setting
> >> write notifier?
> >>
> >> code path:
> >>
> >> qmp_drive_backup or transaction with backup
> >>
> >>      job_start
> >>         aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */
> >>
> >> ....
> >>
> >> job_co_entry
> >>      job_pause_point() /* it definitely yields, isn't it bad? */
> >>      job->driver->run() /* backup_run */
> >>
> >> ----
> >>
> >> backup_run()
> >>      bdrv_add_before_write_notifier()
> >>
> >> ...
> >>
> >> And what guarantees we give to the user? Is it guaranteed that write notifier is
> >> set when qmp command returns?
> >>
> >> And I guess, if we start several backups in a transaction it should be guaranteed
> >> that the set of backups is consistent and correspond to one point in time...
> > 
> > Do the patches to switch backup to a filter node solve this
> > automatically because that node would be inserted in
> > backup_job_create()?
> > 
> 
> Hmm, great, looks like they should. At least it moves scope of the
> problem to do_drive_backup and do_blockdev_backup functions..
> 
> Am I right that aio_context_acquire/aio_context_release guarantees no
> new request created during the section? Or should we add
> drained_begin/drained_end pair, or at least drain() at start of
> qmp_blockdev_backup and qmp_drive_backup?

Holding the AioContext lock should be enough for this.

But note that it doesn't make a difference if new requests are actually
incoming. The timing of the QMP command to start a backup job versus the
timing of guest requests is essentially random. QEMU doesn't know what
guest requests you mean to be included in the backup and which you don't
unless you stop sending new requests well ahead of time.

If you send a QMP request to start a backup, the backup will be
consistent for some arbitrary point in time between the time that you
sent the QMP request and the time that you received the reply to it.

Draining in the QMP command handler wouldn't change any of this, because
even the drain section starts at some arbitrary point in time.

> Assume scenario like the this,
> 
> 1. fsfreeze
> 2. qmp backup
> 3. fsthaw
> 
> to make sure that backup starting point is consistent. So in our qmp command we should:
> 1. complete all current requests to make drives corresponding to fsfreeze point
> 2. initialize write-notifiers or filter before any new guest request, i.e. before fsthaw,
> i.e. before qmp command return.

If I understand correctly, fsfreeze only returns success after it has
made sure that the guest has quiesced the device. So at any point
between receiving the successful return of the fsfreeze and calling
fsthaw, the state should be consistent.

> Transactions should be OK, as they use drained_begin/drained_end
> pairs, and additional aio_context_acquire/aio_context_release pairs.

Here, draining is actually important because you don't synchronise
against something external that you don't control anyway, but you just
make sure that you start the backup of all disks at the same point in
time (which is still an arbitrary point between the time that you send
the transaction QMP command and the time that you receive success), even
if no fsfreeze/fsthaw was used.
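
Conceptually it's something like this (hand-wavy pseudocode; the real qmp_transaction
code is structured differently):

bdrv_drain_all_begin();   /* in-flight guest requests complete, new ones are blocked */

/* every action's .prepare runs here, i.e. every backup job is created and its
 * notifier/filter installed while no guest I/O can run */

bdrv_drain_all_end();     /* guest I/O resumes; all backups share this point in time */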

Kevin



* Re: [Qemu-devel] backup bug or question
  2019-08-12 16:49     ` Kevin Wolf
@ 2019-08-12 17:02       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-12 17:02 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: John Snow, qemu-devel, qemu block, Max Reitz

12.08.2019 19:49, Kevin Wolf wrote:
> Am 12.08.2019 um 18:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 12.08.2019 16:23, Kevin Wolf wrote:
>>> Am 09.08.2019 um 15:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> Hi!
>>>>
>>>> Hmm, hacking around backup I have a question:
>>>>
>>>> What prevents guest write request after job_start but before setting
>>>> write notifier?
>>>>
>>>> code path:
>>>>
>>>> qmp_drive_backup or transaction with backup
>>>>
>>>>       job_start
>>>>          aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */
>>>>
>>>> ....
>>>>
>>>> job_co_entry
>>>>       job_pause_point() /* it definitely yields, isn't it bad? */
>>>>       job->driver->run() /* backup_run */
>>>>
>>>> ----
>>>>
>>>> backup_run()
>>>>       bdrv_add_before_write_notifier()
>>>>
>>>> ...
>>>>
>>>> And what guarantees we give to the user? Is it guaranteed that write notifier is
>>>> set when qmp command returns?
>>>>
>>>> And I guess, if we start several backups in a transaction it should be guaranteed
>>>> that the set of backups is consistent and correspond to one point in time...
>>>
>>> Do the patches to switch backup to a filter node solve this
>>> automatically because that node would be inserted in
>>> backup_job_create()?
>>>
>>
>> Hmm, great, looks like they should. At least it moves scope of the
>> problem to do_drive_backup and do_blockdev_backup functions..
>>
>> Am I right that aio_context_acquire/aio_context_release guarantees no
>> new request created during the section? Or should we add
>> drained_begin/drained_end pair, or at least drain() at start of
>> qmp_blockdev_backup and qmp_drive_backup?
> 
> Holding the AioContext lock should be enough for this.
> 
> But note that it doesn't make a difference if new requests are actually
> incoming. The timing of the QMP command to start a backup job versus the
> timing of guest requests is essentially random. QEMU doesn't know what
> guest requests you mean to be included in the backup and which you don't
> unless you stop sending new requests well ahead of time.
> 
> If you send a QMP request to start a backup, the backup will be
> consistent for some arbitrary point in time between the time that you
> sent the QMP request and the time that you received the reply to it.
> 
> Draining in the QMP command handler wouldn't change any of this, because
> even the drain section starts at some arbitrary point in time.

Hmm, and it doesn't even guarantee that requests started before the QMP command are
included in the backup, as they may have been started from the guest's point of view,
but not yet submitted to QEMU...

> 
>> Assume scenario like the this,
>>
>> 1. fsfreeze
>> 2. qmp backup
>> 3. fsthaw
>>
>> to make sure that backup starting point is consistent. So in our qmp command we should:
>> 1. complete all current requests to make drives corresponding to fsfreeze point
>> 2. initialize write-notifiers or filter before any new guest request, i.e. before fsthaw,
>> i.e. before qmp command return.
> 
> If I understand correctly, fsfreeze only returns success after it has
> made sure that the guest has quiesced the device. So at any point
> between receiving the successful return of the fsfreeze and calling
> fsthaw, the state should be consistent.
> 
>> Transactions should be OK, as they use drained_begin/drained_end
>> pairs, and additional aio_context_acquire/aio_context_release pairs.
> 
> Here, draining is actually important because you don't synchronise
> against something external that you don't control anyway, but you just
> make sure that you start the backup of all disks at the same point in
> time (which is still an arbitrary point between the time that you send
> the transaction QMP command and the time that you receive success), even
> if no fsfreeze/fsthaw was used.
> 
> Kevin
> 

OK, thanks for the explanation!

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] backup bug or question
  2019-08-10 11:17   ` Vladimir Sementsov-Ogievskiy
@ 2019-08-12 17:46     ` John Snow
  2019-08-12 17:59       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 9+ messages in thread
From: John Snow @ 2019-08-12 17:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu block, qemu-devel, Kevin Wolf,
	Max Reitz



On 8/10/19 7:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 23:13, John Snow wrote:
>>
>>
>> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi!
>>>
>>> Hmm, hacking around backup I have a question:
>>>
>>> What prevents guest write request after job_start but before setting
>>> write notifier?
>>>
>>> code path:
>>>
>>> qmp_drive_backup or transaction with backup
>>>
>>>      job_start
>>>         aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */
>>>
>>> ....
>>>
>>> job_co_entry
>>>      job_pause_point() /* it definitely yields, isn't it bad? */
>>>      job->driver->run() /* backup_run */
>>>
>>> ----
>>>
>>> backup_run()
>>>      bdrv_add_before_write_notifier()
>>>
>>> ...
>>>
>>
>> I think you're right... :(
>>
>>
>> We create jobs like this:
>>
>> job->paused        = true;
>> job->pause_count   = 1;
>>
>>
>> And then job_start does this:
>>
>> job->co = qemu_coroutine_create(job_co_entry, job);
>> job->pause_count--;
>> job->busy = true;
>> job->paused = false;
>>
>>
>> Which means that job_co_entry is being called before we lift the pause:
>>
>> assert(job && job->driver && job->driver->run);
>> job_pause_point(job);
>> job->ret = job->driver->run(job, &job->err);
>>
>> ...Which means that we are definitely yielding in job_pause_point.
>>
>> Yeah, that's a race condition waiting to happen.
>>
>>> And what guarantees we give to the user? Is it guaranteed that write notifier is
>>> set when qmp command returns?
>>>
>>> And I guess, if we start several backups in a transaction it should be guaranteed
>>> that the set of backups is consistent and correspond to one point in time...
>>>
>>
>> I would have hoped that maybe the drain_all coupled with the individual
>> jobs taking drain_start and drain_end would save us, but I guess we
>> simply don't have a guarantee that all backup jobs WILL have installed
>> their handler by the time the transaction ends.
>>
>> Or, if there is that guarantee, I don't know what provides it, so I
>> think we shouldn't count on it accidentally working anymore.
>>
>>
>>
>> I think we should do two things:
>>
>> 1. Move the handler installation to creation time.
>> 2. Modify backup_before_write_notify to return without invoking
>> backup_do_cow if the job isn't started yet.
>>
> 
> Hmm, I don't see, how it helps.. No-op write-notifier will not save as from
> guest write, is it?
> 
> 

The idea is that by installing the write notifier at creation time, the
write notifier can be switched on the instant job_start() is called,
regardless of whether we yield in the co_entry shim or not.

That way, no matter when we yield or when the backup_run coroutine
actually gets scheduled and executed, the write notifier is already active.

Or put another way: calling job_start() guarantees that the write
notifier is active.
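
The job_start() side would then just be (again hand-wavy; "started" is the same
hypothetical flag the notifier checks):

void job_start(Job *job)
{
    job->co = qemu_coroutine_create(job_co_entry, job);
    job->pause_count--;
    job->busy = true;
    job->paused = false;
    job->started = true;   /* hypothetical: from this point the notifier does real COW */
    aio_co_enter(job->aio_context, job->co);
}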

I think using filters will save us too, but I don't know how ready those
are. Do we still want a patch that guarantees this behavior in the meantime?

--js



* Re: [Qemu-devel] backup bug or question
  2019-08-12 17:46     ` John Snow
@ 2019-08-12 17:59       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-12 17:59 UTC (permalink / raw)
  To: John Snow, qemu block, qemu-devel, Kevin Wolf, Max Reitz

12.08.2019 20:46, John Snow wrote:
> 
> 
> On 8/10/19 7:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 23:13, John Snow wrote:
>>>
>>>
>>> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi!
>>>>
>>>> Hmm, hacking around backup I have a question:
>>>>
>>>> What prevents guest write request after job_start but before setting
>>>> write notifier?
>>>>
>>>> code path:
>>>>
>>>> qmp_drive_backup or transaction with backup
>>>>
>>>>       job_start
>>>>          aio_co_enter(job_co_entry) /* may only schedule execution, isn't it ? */
>>>>
>>>> ....
>>>>
>>>> job_co_entry
>>>>       job_pause_point() /* it definitely yields, isn't it bad? */
>>>>       job->driver->run() /* backup_run */
>>>>
>>>> ----
>>>>
>>>> backup_run()
>>>>       bdrv_add_before_write_notifier()
>>>>
>>>> ...
>>>>
>>>
>>> I think you're right... :(
>>>
>>>
>>> We create jobs like this:
>>>
>>> job->paused        = true;
>>> job->pause_count   = 1;
>>>
>>>
>>> And then job_start does this:
>>>
>>> job->co = qemu_coroutine_create(job_co_entry, job);
>>> job->pause_count--;
>>> job->busy = true;
>>> job->paused = false;
>>>
>>>
>>> Which means that job_co_entry is being called before we lift the pause:
>>>
>>> assert(job && job->driver && job->driver->run);
>>> job_pause_point(job);
>>> job->ret = job->driver->run(job, &job->err);
>>>
>>> ...Which means that we are definitely yielding in job_pause_point.
>>>
>>> Yeah, that's a race condition waiting to happen.
>>>
>>>> And what guarantees we give to the user? Is it guaranteed that write notifier is
>>>> set when qmp command returns?
>>>>
>>>> And I guess, if we start several backups in a transaction it should be guaranteed
>>>> that the set of backups is consistent and correspond to one point in time...
>>>>
>>>
>>> I would have hoped that maybe the drain_all coupled with the individual
>>> jobs taking drain_start and drain_end would save us, but I guess we
>>> simply don't have a guarantee that all backup jobs WILL have installed
>>> their handler by the time the transaction ends.
>>>
>>> Or, if there is that guarantee, I don't know what provides it, so I
>>> think we shouldn't count on it accidentally working anymore.
>>>
>>>
>>>
>>> I think we should do two things:
>>>
>>> 1. Move the handler installation to creation time.
>>> 2. Modify backup_before_write_notify to return without invoking
>>> backup_do_cow if the job isn't started yet.
>>>
>>
>> Hmm, I don't see, how it helps.. No-op write-notifier will not save as from
>> guest write, is it?
>>
>>
> 
> The idea is that by installing the write notifier during creation, the
> write notifier can be switched on the instant job_start is created,
> regardless of if we yield in the co_entry shim or not.
> 
> That way, no matter when we yield or when the backup_run coroutine
> actually gets scheduled and executed, the write notifier is active already.
> 
> Or put another way: calling job_start() guarantees that the write
> notifier is active.


Oh, got it, I feel stupid :)

> 
> I think using filters will save us too, but I don't know how ready those
> are. Do we still want a patch that guarantees this behavior in the meantime?
> 

I think we want it, of course. I'll look at it tomorrow.


-- 
Best regards,
Vladimir

