* [Qemu-devel] backup notifier fail policy
From: Vladimir Sementsov-Ogievskiy @ 2016-09-30 17:11 UTC (permalink / raw)
  To: qemu-devel, qemu block
  Cc: Jeff Cody, John Snow, Fam Zheng, Stefan Hajnoczi, Denis V. Lunev

Hi all!

Please, can somebody explain to me why we fail the guest request in case
of an I/O error in the write notifier? I think guest consistency is more
important than the success of an unfinished backup. Or what am I missing?

I'm talking about this code:

static int coroutine_fn backup_before_write_notify(
         NotifierWithReturn *notifier,
         void *opaque)
{
     BackupBlockJob *job = container_of(notifier, BackupBlockJob,
                                        before_write);
     BdrvTrackedRequest *req = opaque;
     int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
     int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;

     assert(req->bs == blk_bs(job->common.blk));
     assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
     assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);

     return backup_do_cow(job, sector_num, nb_sectors, NULL, true);
}

So, what about something like

ret = backup_do_cow(job, ...
if (ret < 0 && job->notif_ret == 0) {
    job->notif_ret = ret;
}

return 0;

and fail the block job if notif_ret < 0 elsewhere in the backup code?
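
For concreteness, here is a minimal sketch of the whole notifier with that
change applied (assuming BackupBlockJob gains a new "int notif_ret" field
initialized to 0; this is only an illustration of the idea, not a tested
patch):

static int coroutine_fn backup_before_write_notify(
         NotifierWithReturn *notifier,
         void *opaque)
{
     BackupBlockJob *job = container_of(notifier, BackupBlockJob,
                                        before_write);
     BdrvTrackedRequest *req = opaque;
     int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
     int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
     int ret;

     assert(req->bs == blk_bs(job->common.blk));
     assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
     assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);

     ret = backup_do_cow(job, sector_num, nb_sectors, NULL, true);
     if (ret < 0 && job->notif_ret == 0) {
         /* Remember the first copy-on-write failure so the block job
          * itself can be failed later. */
         job->notif_ret = ret;
     }

     /* Let the guest write proceed regardless of the backup error. */
     return 0;
}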

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] backup notifier fail policy
From: Vladimir Sementsov-Ogievskiy @ 2016-09-30 18:59 UTC (permalink / raw)
  To: qemu-devel, qemu block
  Cc: Jeff Cody, John Snow, Fam Zheng, Stefan Hajnoczi, Denis V. Lunev

On 30.09.2016 20:11, Vladimir Sementsov-Ogievskiy wrote:
> [...]

And a second question about notifiers in the backup block job: if the
block job is paused, the notifiers still work and can copy data. Is that
OK? The user thinks the job is paused, so they may do something with the
target disk, but that 'something' will race with the write notifiers. So
what assumptions can the user actually make about a paused backup job?
Are there any guarantees? Also, query-block-jobs will report job.busy =
false while a copy-on-write operation may actually be in flight.

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] backup notifier fail policy
From: Stefan Hajnoczi @ 2016-10-03 13:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu block, Jeff Cody, John Snow, Fam Zheng, Denis V. Lunev


On Fri, Sep 30, 2016 at 09:59:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 30.09.2016 20:11, Vladimir Sementsov-Ogievskiy wrote:
> > [...]
> 
> And a second question about notifiers in the backup block job: if the
> block job is paused, the notifiers still work and can copy data. Is that
> OK? The user thinks the job is paused, so they may do something with the
> target disk, but that 'something' will race with the write notifiers. So
> what assumptions can the user actually make about a paused backup job?
> Are there any guarantees? Also, query-block-jobs will report job.busy =
> false while a copy-on-write operation may actually be in flight.

I agree that the job should fail and the guest should continue running.

The backup job cannot do the usual ENOSPC stop/resume error handling,
since we lose snapshot consistency once guest writes are allowed to
proceed.  Backup errors need to be fatal; resuming is usually not
possible.  The user will have to retry the backup operation.

Stefan


* Re: [Qemu-devel] backup notifier fail policy
From: John Snow @ 2016-10-03 18:07 UTC (permalink / raw)
  To: Stefan Hajnoczi, Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu block, Jeff Cody, Fam Zheng, Denis V. Lunev



On 10/03/2016 09:11 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 30, 2016 at 09:59:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 30.09.2016 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>> [...]
>>
>> And a second question about notifiers in the backup block job: if the
>> block job is paused, the notifiers still work and can copy data. Is that
>> OK? The user thinks the job is paused, so they may do something with the
>> target disk, but that 'something' will race with the write notifiers. So
>> what assumptions can the user actually make about a paused backup job?
>> Are there any guarantees? Also, query-block-jobs will report job.busy =
>> false while a copy-on-write operation may actually be in flight.
>
> I agree that the job should fail and the guest should continue running.
>
> The backup job cannot do the usual ENOSPC stop/resume error handling,
> since we lose snapshot consistency once guest writes are allowed to
> proceed.  Backup errors need to be fatal; resuming is usually not
> possible.  The user will have to retry the backup operation.
>
> Stefan
>

If we fail and intercept the error for the backup write and HALT at that 
point, why would we lose consistency? If the backup write failed before 
we allowed the guest write to proceed, that data should still be there 
on disk, no?

I guess it is a little messier than the usual STOP case, but it doesn't 
seem inherently impossible to me...

Eh, regardless: If we're not using a STOP policy, it seems like the 
right thing to do is definitely to just fail the backup instead of 
failing the write.

As for paused guarantees... good point. If you want to truly pause a 
backup job, I think you necessarily begin accruing a backlog of data 
that needs to get written back out. Maybe it's not easily possible to 
truly pause a backup block job.

I'm not exactly sure what we should do about it, though I do know that 
eventually we want to replace write notifiers with block filters, but 
even those would likely remain operating during a pause.

'busy' means something very specific within QEMU, but perhaps the query 
function can be adjusted to return 'true' for busy as long as either the 
job is running OR it has latent portions still running (write notifiers, 
block filters, etc.).
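
Hypothetically, something like this in block_job_query(), assuming a new
job->in_flight_cow counter incremented and decremented around the
backup_do_cow() call in the notifier (neither exists today; just a sketch):

     /* Report the job as busy while latent work (write notifiers,
      * block filters, ...) is still in flight, not only while the
      * job coroutine itself is running. */
     info->busy = job->busy || job->in_flight_cow > 0;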

--js


* Re: [Qemu-devel] backup notifier fail policy
From: Stefan Hajnoczi @ 2016-10-04  9:23 UTC (permalink / raw)
  To: John Snow
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu block, Jeff Cody,
	Fam Zheng, Denis V. Lunev


On Mon, Oct 03, 2016 at 02:07:34PM -0400, John Snow wrote:
> [...]
> 
> If we fail and intercept the error for the backup write and HALT at that
> point, why would we lose consistency? If the backup write failed before we
> allowed the guest write to proceed, that data should still be there on disk,
> no?

I missed that there are two separate error handling approaches used in
block/backup.c:

1. In the write notifier, I/O errors are treated as if the guest write
failed.

2. In the backup_run() loop, I/O errors affect the block job's error
status.

I was thinking of case #2 instead of case #1.
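
(For reference, case #2 is the backup_error_action() path in the
backup_run() loop, roughly the following; quoting from memory, so
details may differ:

     ret = backup_do_cow(job, start * sectors_per_cluster,
                         sectors_per_cluster, &error_is_read, false);
     if (ret < 0) {
         /* Depending on error action, fail now or retry cluster */
         BlockErrorAction action =
             backup_error_action(job, error_is_read, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT) {
             break;
         } else {
             start--;
             continue;
         }
     }

where backup_error_action() maps the error to the job's on_source_error
or on_target_error policy.)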

> Eh, regardless: If we're not using a STOP policy, it seems like the right
> thing to do is definitely to just fail the backup instead of failing the
> write.

Even with a -drive werror=stop policy the user probably doesn't want
guest downtime if writing to the backup target fails.

Stefan


* Re: [Qemu-devel] backup notifier fail policy
From: Kevin Wolf @ 2016-10-04  9:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John Snow, Vladimir Sementsov-Ogievskiy, Fam Zheng, qemu block,
	Jeff Cody, qemu-devel, Denis V. Lunev


On 04.10.2016 11:23, Stefan Hajnoczi wrote:
> On Mon, Oct 03, 2016 at 02:07:34PM -0400, John Snow wrote:
> > [...]
> 
> > Eh, regardless: If we're not using a STOP policy, it seems like the right
> > thing to do is definitely to just fail the backup instead of failing the
> > write.
> 
> Even with a -drive werror=stop policy the user probably doesn't want
> guest downtime if writing to the backup target fails.

That's a policy decision that ultimately only the user can make. For one
user, it might be preferable to cancel the backup and keep the VM
running, but for another user it may be more important to keep a
consistent snapshot of the point in time when the backup job was started
than to keep the VM running.

Kevin


* Re: [Qemu-devel] backup notifier fail policy
From: Denis V. Lunev @ 2016-10-04 10:41 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi
  Cc: John Snow, Vladimir Sementsov-Ogievskiy, Fam Zheng, qemu block,
	Jeff Cody, qemu-devel

On 10/04/2016 12:34 PM, Kevin Wolf wrote:
> [...]
> 
> That's a policy decision that ultimately only the user can make. For one
> user, it might be preferable to cancel the backup and keep the VM
> running, but for another user it may be more important to keep a
> consistent snapshot of the point in time when the backup job was started
> than to keep the VM running.
>
> Kevin
In this case the policy for guest errors and the policy for backup
errors should be separate policies, unless I have missed something.

Den


* Re: [Qemu-devel] backup notifier fail policy
From: Kevin Wolf @ 2016-10-04 11:55 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Stefan Hajnoczi, John Snow, Vladimir Sementsov-Ogievskiy,
	Fam Zheng, qemu block, Jeff Cody, qemu-devel

On 04.10.2016 12:41, Denis V. Lunev wrote:
> On 10/04/2016 12:34 PM, Kevin Wolf wrote:
> > [...]
> >
> > That's a policy decision that ultimately only the user can make. For one
> > user, it might be preferable to cancel the backup and keep the VM
> > running, but for another user it may be more important to keep a
> > consistent snapshot of the point in time when the backup job was started
> > than to keep the VM running.
> >
> > Kevin
> In this case the policy for guest errors and the policy for backup
> errors should be separate policies, unless I have missed something.

I guess so.

Kevin


* Re: [Qemu-devel] backup notifier fail policy
From: Stefan Hajnoczi @ 2016-10-04 16:02 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Denis V. Lunev, John Snow, Vladimir Sementsov-Ogievskiy,
	Fam Zheng, qemu block, Jeff Cody, qemu-devel


On Tue, Oct 04, 2016 at 01:55:30PM +0200, Kevin Wolf wrote:
> On 04.10.2016 12:41, Denis V. Lunev wrote:
> > [...]
> >
> > In this case the policy for guest errors and the policy for backup
> > errors should be separate policies, unless I have missed something.
> 
> I guess so.

There are separate error policies for -device and the blockjob.  Perhaps
the blockjob error policy can be used in the write notifier code path if
the failure occurs while writing to the backup target.
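
A rough sketch of how the write notifier path could honor the job's
on_target_error policy instead of failing the guest request (untested,
just to illustrate the idea):

     ret = backup_do_cow(job, sector_num, nb_sectors, NULL, true);
     if (ret < 0) {
         /* Target write failed: consult the job's error policy
          * rather than failing the guest write outright. */
         BlockErrorAction action =
             block_job_error_action(&job->common, job->on_target_error,
                                    false, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT) {
             return ret;   /* fail the guest request, as today */
         }
         ret = 0;          /* otherwise let the guest write proceed */
     }
     return ret;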

Stefan


* Re: [Qemu-devel] backup notifier fail policy
From: John Snow @ 2016-10-04 16:03 UTC (permalink / raw)
  To: Stefan Hajnoczi, Kevin Wolf
  Cc: Denis V. Lunev, Vladimir Sementsov-Ogievskiy, Fam Zheng,
	qemu block, Jeff Cody, qemu-devel



On 10/04/2016 12:02 PM, Stefan Hajnoczi wrote:
> [...]
>
> There are separate error policies for -device and the blockjob.  Perhaps
> the blockjob error policy can be used in the write notifier code path if
> the failure occurs while writing to the backup target.
>
> Stefan
>

Sounds good to me.


* Re: [Qemu-devel] backup notifier fail policy
From: Denis V. Lunev @ 2016-10-04 16:19 UTC (permalink / raw)
  To: John Snow, Stefan Hajnoczi, Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, Fam Zheng, qemu block, Jeff Cody,
	qemu-devel

On 10/04/2016 07:03 PM, John Snow wrote:
> On 10/04/2016 12:02 PM, Stefan Hajnoczi wrote:
>> [...]
>>
>> There are separate error policies for -device and the blockjob.  Perhaps
>> the blockjob error policy can be used in the write notifier code path if
>> the failure occurs while writing to the backup target.
>>
>> Stefan
>>
>
> Sounds good to me.
And to me as well. Cool.

Den


* Re: [Qemu-devel] backup notifier fail policy
From: Kevin Wolf @ 2016-10-05  8:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Denis V. Lunev, John Snow, Vladimir Sementsov-Ogievskiy,
	Fam Zheng, qemu block, Jeff Cody, qemu-devel


On 04.10.2016 18:02, Stefan Hajnoczi wrote:
> On Tue, Oct 04, 2016 at 01:55:30PM +0200, Kevin Wolf wrote:
> > On 04.10.2016 12:41, Denis V. Lunev wrote:
> > > On 10/04/2016 12:34 PM, Kevin Wolf wrote:
> > > > On 04.10.2016 11:23, Stefan Hajnoczi wrote:
> > > >> On Mon, Oct 03, 2016 at 02:07:34PM -0400, John Snow wrote:
> > > >>> Eh, regardless: If we're not using a STOP policy, it seems like the right
> > > >>> thing to do is definitely to just fail the backup instead of failing the
> > > >>> write.
> > > >> Even with a -drive werror=stop policy the user probably doesn't want
> > > >> guest downtime if writing to the backup target fails.
> > > > That's a policy decision that ultimately only the user can make. For one
> > > > user, it might be preferable to cancel the backup and keep the VM
> > > > running, but for another user it may be more important to keep a
> > > > consistent snapshot of the point in time when the backup job was started
> > > > than to keep the VM running.
> > > >
> > > > Kevin
> > > In this case the policy for guest errors and the policy for backup
> > > errors should be separate policies, unless I have missed something.
> > 
> > I guess so.
> 
> There are separate error policies for -device and the blockjob.  Perhaps
> the blockjob error policy can be used in the write notifier code path if
> the failure occurs while writing to the backup target.

Isn't the block job policy used for the background copy? Or do you think
it's okay to use the same for both cases? That would mean that we stop
the VM even if it's just a background copy that fails.

Kevin


* Re: [Qemu-devel] backup notifier fail policy
From: Stefan Hajnoczi @ 2016-10-05 12:59 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Denis V. Lunev, John Snow, Vladimir Sementsov-Ogievskiy,
	Fam Zheng, qemu block, Jeff Cody, qemu-devel


On Wed, Oct 05, 2016 at 10:12:57AM +0200, Kevin Wolf wrote:
> On 04.10.2016 18:02, Stefan Hajnoczi wrote:
> > On Tue, Oct 04, 2016 at 01:55:30PM +0200, Kevin Wolf wrote:
> > > On 04.10.2016 12:41, Denis V. Lunev wrote:
> > > > On 10/04/2016 12:34 PM, Kevin Wolf wrote:
> > > > > On 04.10.2016 11:23, Stefan Hajnoczi wrote:
> > > > >> On Mon, Oct 03, 2016 at 02:07:34PM -0400, John Snow wrote:
> > > > >>> Eh, regardless: If we're not using a STOP policy, it seems like the right
> > > > >>> thing to do is definitely to just fail the backup instead of failing the
> > > > >>> write.
> > > > >> Even with a -drive werror=stop policy the user probably doesn't want
> > > > >> guest downtime if writing to the backup target fails.
> > > > > That's a policy decision that ultimately only the user can make. For one
> > > > > user, it might be preferable to cancel the backup and keep the VM
> > > > > running, but for another user it may be more important to keep a
> > > > > consistent snapshot of the point in time when the backup job was started
> > > > > than to keep the VM running.
> > > > >
> > > > > Kevin
> > > > In this case the policy for guest errors and the policy for backup
> > > > errors should be separate policies, unless I have missed something.
> > > 
> > > I guess so.
> > 
> > There are separate error policies for -device and the blockjob.  Perhaps
> > the blockjob error policy can be used in the write notifier code path if
> > the failure occurs while writing to the backup target.
> 
> Isn't the block job policy used for the background copy? Or do you think
> it's okay to use the same for both cases? That would mean that we stop
> the VM even if it's just a background copy that fails.

You're right: strictly speaking, the write notifier code path is a
separate case and should have a dedicated parameter.  At least if we
want to allow users to select from all possible policy combinations...

Stefan

