[Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2

From: Paolo Bonzini @ 2012-05-18 17:08 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Federico Simoncelli, Eric Blake, Stefan Hajnoczi,
	Luiz Capitulino

Hi all,

the current block job API is designed for streaming; one property of
streaming is that, after an error, it can be restarted from the point
where it left off.

In QEMU 1.2 I would like to add an implementation of mirroring (live
block copy) based on the block job API and on dirty-block tracking.
Unlike streaming, this operation is not restartable, because canceling
the job turns off dirty-block tracking.

To avoid this problem, I propose adding to block jobs options similar
to rerror/werror.  A few more details are required to get this to work,
and the purpose of this email is to summarize them.

New QMP commands
================

The following QMP commands are added.

* block-job-pause: Takes a block device (drive) and pauses the active
background block operation on that device.
This command returns immediately after marking the active background
block operation for pausing.  It is an error to call this command if no
operation is in progress.  The operation will pause as soon as possible
(it won't pause if the job is being cancelled).  No event is emitted
when the operation is actually paused.  Cancelling a paused job
automatically resumes it.

* block-job-resume: Takes a block device (drive) and resumes a paused
background block operation on that device.
This command returns immediately after resuming a paused background
block operation.  It is an error to call this command if no operation is
in progress.

A successful block-job-resume operation also resets the iostatus on the
device that is passed.

  Rationale: because block job errors can occur even while the VM is
  stopped, rerror=stop/werror=stop cannot reuse the "cont" monitor
  command to restart a block job.  So block-job-resume is required
  anyway.  Adding block-job-pause makes it simpler to test the new
  feature.
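
As an illustration only, the pause/resume/cancel rules above can be
modelled with a small Python toy.  This is not QEMU code: the class
ToyBlockJob, the JobError exception and the run_step helper are names
invented for this sketch, and iostatus handling is left out.

```python
class JobError(Exception):
    """Raised when a command is issued while no job is in progress."""


class ToyBlockJob:
    """Toy model of the proposed pause/resume rules (not QEMU code)."""

    def __init__(self):
        self.active = True            # a job is in progress
        self.pause_requested = False  # set by pause(), cleared by resume()
        self.paused = False           # the job has actually stopped working
        self.cancelled = False

    def _check_active(self):
        if not self.active:
            raise JobError("no operation in progress")

    def pause(self):
        # Returns immediately; the job pauses at its next run_step().
        self._check_active()
        self.pause_requested = True

    def resume(self):
        self._check_active()
        self.pause_requested = False
        self.paused = False

    def cancel(self):
        self._check_active()
        self.cancelled = True
        # Cancelling a paused job automatically resumes it.
        self.pause_requested = False
        self.paused = False

    def run_step(self):
        """One iteration of the job's main loop (hypothetical helper)."""
        if self.cancelled:
            self.active = False       # a job being cancelled never pauses
        elif self.pause_requested:
            self.paused = True
```

Note that pause() only records the request; the job becomes paused at
its next run_step(), which mirrors "returns immediately after marking
the active background block operation for pausing".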

Modified QMP commands
=====================

* block-stream: I propose adding two options to the existing
block-stream command.  If this is rejected, only mirroring will be able
to use rerror/werror.

The new options are of course rerror/werror.  They are enum options,
with the following possible values:

'report': The behavior is the same as in 1.1.  An I/O error during a
read (for rerror) or a write (for werror) will complete the job
immediately with an error code.

'ignore': An I/O error during a read (for rerror) or a write (for
werror) will be ignored.  For streaming, the job will complete with an
error and the backing file will be left in place.  For mirroring, the
sector will be marked again as dirty and re-examined later.

'stop': The VM *and* the job will be paused---the VM is stopped even if
the block device has neither rerror=stop nor werror={stop,enospc}.  The
error is recorded in the block device's iostatus (which can be examined
with query-block).  However, a BLOCK_IO_ERROR event will _never_ pause a
job.

  Rationale: stopping all I/O seems to be the best choice in order
  to limit the number of errors received.  However, for backwards
  compatibility with QEMU 1.1 we cannot pause the job when
  guest-initiated I/O causes an error.  We could do that if the block
  device has rerror=stop/werror={stop,enospc}, but it seems simpler
  to just never do it.

'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.

In all cases, even for 'report', the I/O error is reported as a QMP
event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.
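
A minimal Python sketch of how the four values could be resolved into
an action, and of the rule that the event is emitted in every case.
The names OnError, resolve_action and handle_job_error are invented
here, and the event fields ("device", "operation", "action") are an
assumption based on the existing BLOCK_IO_ERROR event.

```python
import errno
from enum import Enum


class OnError(Enum):
    REPORT = "report"
    IGNORE = "ignore"
    STOP = "stop"
    ENOSPC = "enospc"


def resolve_action(policy, err):
    """Map a configured policy and an errno to REPORT, IGNORE or STOP."""
    if policy is OnError.ENOSPC:
        # 'enospc' behaves as 'stop' for ENOSPC and as 'report' otherwise.
        return OnError.STOP if err == errno.ENOSPC else OnError.REPORT
    return policy


def handle_job_error(device, rerror, werror, is_read, err):
    """Pick the action for one failed request and build the QMP event."""
    policy = rerror if is_read else werror
    action = resolve_action(policy, err)
    # The event is built for every action, including 'report'.
    event = {
        "event": "BLOCK_JOB_ERROR",
        "data": {
            "device": device,
            "operation": "read" if is_read else "write",
            "action": action.value,
        },
    }
    return action, event
```

For example, a failed write with werror=enospc and errno ENOSPC
resolves to the 'stop' action, while the same policy with EIO resolves
to 'report'.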

It is possible that while stopping the VM a BLOCK_IO_ERROR event will be
reported and will clobber the BLOCK_JOB_ERROR event, or vice versa.
This is not really avoidable since stopping the VM completes all
pending I/O requests.  In fact, it is already possible now that a series
of BLOCK_IO_ERROR events is reported with rerror=stop.

After cancelling a job, the job implementation MAY choose to treat stop
and enospc values as report, i.e. complete the job immediately with an
error code, as long as block_job_is_cancelled(job) returns true when the
completion callback is called.
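
One permitted behavior after cancellation can be sketched as follows.
The function name effective_policy is invented for this sketch, and
since the rule is a MAY, a job implementation could equally keep the
configured policy.

```python
def effective_policy(policy: str, cancelled: bool) -> str:
    """Return the policy a job may apply once it has been cancelled.

    'stop' and 'enospc' are downgraded to 'report', so the job completes
    immediately with an error code instead of pausing.
    """
    if cancelled and policy in ("stop", "enospc"):
        return "report"
    return policy
```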

  Open problems: There could be unrecoverable errors in which the job
  will be completed as if rerror/werror were set to report (example:
  error while switching backing files).  Does it make sense to fire an
  event before the point in time where such errors can happen?

Other points specific to mirroring
==================================

* query-block-jobs: The returned JSON object will grow an additional
member, "target".  The target field is a dictionary with two fields,
"info" and "stats" (resembling the output of query-block and
query-blockstats but for the mirroring target).  Member "device" of the
BlockInfo structure will be made optional.

  Rationale: this allows libvirt to observe the high watermark of qcow2
  mirroring targets, and avoids putting a bad iostatus on a working
  migration source.
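
A hedged sketch of how a management application might read the new
member.  The reply below is made up: the proposal does not list the
fields inside "info" and "stats", so "io-status" and
"wr_highest_offset" are assumptions borrowed from the existing
query-block and query-blockstats output, and the helper name
target_high_watermark is invented.

```python
import json

# Hypothetical query-block-jobs reply with the proposed "target" member.
REPLY_TEXT = """
{"return": [
  {"type": "mirror", "device": "virtio0",
   "target": {"info": {"io-status": "ok"},
              "stats": {"wr_highest_offset": 1048576}}},
  {"type": "stream", "device": "virtio1"}
]}
"""


def target_high_watermark(job):
    """Return the target's highest written offset, or None if absent."""
    target = job.get("target")
    if target is None:
        return None  # e.g. a streaming job, which has no mirroring target
    return target.get("stats", {}).get("wr_highest_offset")


jobs = json.loads(REPLY_TEXT)["return"]
watermarks = {job["device"]: target_high_watermark(job) for job in jobs}
```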

* cont: even though cont does _not_ restart the block job that reported
an error, the iostatus is reset for all block devices that are attached
to a block job (like the mirroring target).

  Rationale: cont resets the iostatus for the streaming target or
  mirroring source anyway, because there is a single iostatus for the
  device and the job.  It is simpler to do the same also for the
  mirroring target.

* block-job-resume also resets the iostatus on the mirroring target.

* block-job-complete: new command specific to mirroring (switches the
device to the target), not related to the rest of the proposal.
