[Qemu-devel] [PATCH 00/47] Block job improvements for 1.2

From: Paolo Bonzini @ 2012-07-24 11:03 UTC
  To: qemu-devel; +Cc: kwolf, jcody, eblake, stefanha

Hi all, this is the first non-RFC submission of my block job patches
for 1.2.  Everything is there, including multiple in-flight operations
in the mirroring job and new testcases (for streaming, mirroring, and
the hierarchical bitmap).  The tests use blkdebug to test error reporting
for both streaming and mirroring.

This still does not include a persistent dirty bitmap, which will be work
for 1.3.

If you want to tinker with this, everything is available at
git://github.com/bonzini/qemu.git in branch blkmirror-job-1.2.

I know it's a lot of code, and I'm sorry for dropping this so close to
the feature freeze.  Unfortunately, preparing for the Linux merge window
and other non-QEMU tasks delayed this by 1-2 weeks more than I would
have liked.

The patches are organized as follows:

01-12   preparatory work for block job errors, including support for
        pausing and resuming jobs

13-17   introduce block job errors, and add support in block-stream

18-26   preparatory work for block mirroring, including creating
        new functions out of existing code.

27-34   introduce a simple version of mirroring.  The initial patch
        adds the mirroring logic, followed by the ability to switch to
        the destination of migration, to query the target file (for
        example, polling the high-water mark), and to handle errors
        during the job.  All these changes come with testcases.

35-43   These patches introduce the first optimizations, namely supporting
        an arbitrary granularity for the dirty bitmap.  The current default,
        1M, is too coarse to let the job converge quickly and in almost
        real-time.  These patches reimplement the block device dirty bitmap
        to allow efficient iteration, and add cluster copy-on-write logic.
        Cluster copy-on-write is needed because management will want to
        start the copy before the backing file is in place in the destination;
        if mirroring takes care of copy-on-write, BDRV_O_NO_BACKING can be
        used even if the granularity is smaller than the cluster size.

44-47   A second round of optimizations, replacing serialized read-write
        operations with multiple asynchronous I/O operations.  The various
        in-flight operations can be of arbitrary size.  The initial copy
        will end up reading large chunks sequentially (10M by default),
        while subsequent passes can mimic more closely the guest's I/O
        patterns.
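
As a rough illustration of how the pieces above look from the management
side, here is a minimal sketch (not code from the series) that builds the
QMP messages for starting a mirror and completing it once the job reports
that it is ready.  The command, event and argument names drive-mirror,
block-job-complete, BLOCK_JOB_READY, granularity, buf-size,
on-source-error and on-target-error come from the patch titles; the
device, target and sync arguments and all values are assumptions, so
check qapi-schema.json in the branch for the real schema.  The helper
also maps underscores to dashes in argument names, in the spirit of
patch 16.

```python
import json


def qmp_command(name, **kwargs):
    """Build a QMP command dict, mapping '_' to '-' in argument names."""
    args = {key.replace('_', '-'): value for key, value in kwargs.items()}
    cmd = {'execute': name}
    if args:
        cmd['arguments'] = args
    return cmd


# Start mirroring.  Every value here is an illustrative assumption.
start = qmp_command('drive-mirror',
                    device='drive0',             # hypothetical device name
                    target='/tmp/mirror.qcow2',  # hypothetical target path
                    sync='full',                 # assumed argument and value
                    granularity=65536,           # dirty bitmap granularity, bytes
                    buf_size=10 * 1024 * 1024,   # sent as "buf-size"
                    on_source_error='stop',
                    on_target_error='stop')


def next_command(event, device):
    """Return the command to send in response to a QMP event, or None."""
    if event.get('event') == 'BLOCK_JOB_READY':
        # Source and target are in sync; ask the job to finish.
        return qmp_command('block-job-complete', device=device)
    return None


print(json.dumps(start, indent=2))
```

A real client would send these dicts over the QMP socket and feed each
incoming event to next_command(); the socket handling is left out here.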

Compared to v1, the last four patches are entirely new, and so are many
of the testcase changes.  All comments from Eric's review are addressed.
In some cases the patches were modified (reversing if conditions or things
like that) in order to keep later patches simpler.  I also added several
new tracepoints.

Latency is vital to any migration scheme using a dirty bitmap, especially
because completion is entirely asynchronous, so I expect this to be used
either with pretty good storage, or on guests doing relatively little I/O.
I tested this both on my laptop and with moderately high-end SAS disks.
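
To make the role of the dirty bitmap and its granularity more concrete,
here is a toy sketch of a two-level dirty bitmap.  It is not the HBitmap
code from the series: the class and method names are hypothetical, there
is only one summary level, and the 64K granularity is just an assumed
example of a finer setting than the 1M default.  It shows two things: a
single small guest write dirties a whole granularity-sized chunk, and the
summary level lets iteration skip clean regions.

```python
class DirtyBitmap:
    """Toy two-level dirty bitmap; one bit covers `granularity` bytes."""

    WORD = 64  # number of lower-level bits summarized by one upper-level bit

    def __init__(self, size, granularity):
        self.granularity = granularity
        self.nbits = -(-size // granularity)       # ceiling division
        nwords = -(-self.nbits // self.WORD)
        self.words = [0] * nwords                  # lower level: one bit per chunk
        self.summary = 0                           # upper level: one bit per non-zero word

    def set_dirty(self, offset, length):
        """Mark every chunk touched by [offset, offset + length) as dirty."""
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for bit in range(first, last + 1):
            w, b = divmod(bit, self.WORD)
            self.words[w] |= 1 << b
            self.summary |= 1 << w

    def dirty_bytes(self):
        """Amount of data that still has to be copied."""
        return sum(bin(w).count('1') for w in self.words) * self.granularity

    def pop_dirty(self):
        """Yield (offset, length) for each dirty chunk, clearing it first."""
        while self.summary:
            # Index of the lowest non-zero word; clean words are never visited.
            w = (self.summary & -self.summary).bit_length() - 1
            while self.words[w]:
                word = self.words[w]
                b = (word & -word).bit_length() - 1
                self.words[w] = word & (word - 1)  # clear the bit before yielding
                yield (w * self.WORD + b) * self.granularity, self.granularity
            self.summary &= ~(1 << w)


coarse = DirtyBitmap(1 << 30, 1 << 20)  # 1 GiB disk, 1M granularity
fine = DirtyBitmap(1 << 30, 1 << 16)    # same disk, 64K granularity
for bitmap in (coarse, fine):
    bitmap.set_dirty(5 * (1 << 20) + 4096, 4096)  # one 4K guest write

print(coarse.dirty_bytes(), fine.dirty_bytes())
```

With the coarse bitmap the 4K write leaves 1M to copy, with the fine one
only 64K, which is why a smaller granularity helps the job keep up with
the guest.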

On the SAS disks, time between checkpoints (trace_mirror_before_flush)
on kernel compilation (-j3 to -j12, 4 or 8 vCPUs) is almost always within
1 second, and usually much less when targeting a local disk.  On hibernation,
which is a worst-case test (sequential I/O happening with no flushes
in between) and which failed completely to converge on my lowly laptop
hard disk, a checkpoint was reached every 0.5 to 3 seconds.  When targeting
a local qemu-nbd server, performance was similar.  Kernel compilation
showed occasional bumps, but they cleared up within 1.5-7 seconds.

Please review!

Paolo Bonzini (47):
  qapi: generalize documentation of streaming commands
  qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  block: move job APIs to separate files
  block: add block_job_query
  block: add support for job pause/resume
  qmp: add block-job-pause and block-job-resume
  qemu-iotests: add test for pausing a streaming operation
  block: rename block_job_complete to block_job_completed
  block: rename BlockErrorAction, BlockQMPEventAction
  block: move BlockdevOnError declaration to QAPI
  block: reorganize io error code
  block: sort BlockDeviceIoStatus errors by severity
  block: introduce block job error
  stream: add on-error argument
  blkdebug: process all set_state rules in the old state
  qemu-iotests: map underscore to dash in QMP argument names
  qemu-iotests: add tests for streaming error handling
  block: live snapshot documentation tweaks
  block: add bdrv_query_info
  block: add bdrv_query_stats
  block: add bdrv_ensure_backing_file
  block: make device optional in BlockInfo
  block: add target info to QMP query-blockjobs command
  block: introduce new dirty bitmap functionality
  block: add block-job-complete
  block: introduce BLOCK_JOB_READY event
  block: introduce mirror job
  qmp: add drive-mirror command
  mirror: support querying target file
  mirror: implement completion
  qemu-iotests: add mirroring test case
  block: forward bdrv_iostatus_reset to block job
  mirror: add support for on-source-error/on-target-error
  qmp: add pull_event function
  qemu-iotests: add testcases for mirroring
    on-source-error/on-target-error
  host-utils: add ffsl and flsl
  add hierarchical bitmap data type and test cases
  block: implement dirty bitmap using HBitmap
  block: make round_to_clusters public
  mirror: perform COW if the cluster size is bigger than the
    granularity
  block: return count of dirty sectors, not chunks
  block: allow customizing the granularity of the dirty bitmap
  mirror: allow customizing the granularity
  mirror: switch mirror_iteration to AIO
  mirror: add buf-size argument to drive-mirror
  mirror: support more than one in-flight AIO operation
  mirror: support arbitrarily-sized iterations

 Makefile.objs                 |    5 +-
 QMP/qmp-events.txt            |   43 +++
 QMP/qmp.py                    |   20 ++
 block-migration.c             |    8 +-
 block.c                       |  486 ++++++++++++------------------
 block.h                       |   37 ++-
 block/Makefile.objs           |    3 +-
 block/blkdebug.c              |   14 +-
 block/mirror.c                |  562 +++++++++++++++++++++++++++++++++++
 block/stream.c                |   33 +-
 block_int.h                   |  192 +++---------
 blockdev.c                    |  257 +++++++++++++---
 blockjob.c                    |  290 ++++++++++++++++++
 blockjob.h                    |  285 ++++++++++++++++++
 hbitmap.c                     |  394 ++++++++++++++++++++++++
 hbitmap.h                     |   51 ++++
 hmp-commands.hx               |   73 ++++-
 hmp.c                         |   65 +++-
 hmp.h                         |    4 +
 host-utils.h                  |   45 +++
 hw/fdc.c                      |    4 +-
 hw/ide/core.c                 |   20 +-
 hw/scsi-disk.c                |   23 +-
 hw/scsi-generic.c             |    4 +-
 hw/virtio-blk.c               |   19 +-
 monitor.c                     |    2 +
 monitor.h                     |    2 +
 qapi-schema.json              |  238 +++++++++++++--
 qemu-tool.c                   |    6 +
 qerror.c                      |   12 +
 qerror.h                      |    9 +
 qmp-commands.hx               |   72 ++++-
 tests/Makefile                |    2 +
 tests/qemu-iotests/030        |  178 ++++++++++-
 tests/qemu-iotests/039        |  661 +++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group      |    3 +-
 tests/qemu-iotests/iotests.py |   19 +-
 tests/test-hbitmap.c          |  384 ++++++++++++++++++++++++
 trace-events                  |   24 +-
 39 files changed, 3946 insertions(+), 603 deletions(-)
 create mode 100644 block/mirror.c
 create mode 100644 blockjob.c
 create mode 100644 blockjob.h
 create mode 100644 hbitmap.c
 create mode 100644 hbitmap.h
 create mode 100755 tests/qemu-iotests/039
 create mode 100644 tests/test-hbitmap.c

-- 
1.7.10.4
