[Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite

* [Qemu-devel] [PATCH 00/16] AioContext fine-grained locking, part 1 of 3, including bdrv_drain rewrite
@ 2016-02-16 17:56 Paolo Bonzini
  2016-02-16 17:56 ` [Qemu-devel] [PATCH 01/16] block: make bdrv_start_throttled_reqs return void Paolo Bonzini
                   ` (18 more replies)
  0 siblings, 19 replies; 48+ messages in thread
From: Paolo Bonzini @ 2016-02-16 17:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, stefanha

So the fine-grained locking series has grown from 2 parts to 3...

This first part stops where we remove RFifoLock.

In the previous submission, the bulk of the series took care of
associating an AioContext to a thread, so that aio_poll could run event
handlers only on that thread.  This was done to fix a race in bdrv_drain.
There were two disadvantages:

1) reporting progress from aio_poll to the main thread was Really Hard.
Despite being relatively sure of the correctness---also thanks to formal
models---it's not the kind of code that I'd lightly donate to posterity.

2) the strict dependency between bdrv_drain and a single AioContext
would have been bad for multiqueue.

Instead, this series does the same trick (do not run dataplane event handlers
from the main thread) but reports progress directly at the BlockDriverState
level.

To do so, the mechanism to track in-flight requests is changed to a
simple counter.  This is more flexible than BdrvTrackedRequest, and lets
the block/io.c code track precisely the initiation and completion of the
requests.  In particular, the counter remains nonzero when a bottom half
is required to "really" complete the request.  bdrv_drain doesn't rely
anymore on a non-blocking aio_poll to execute those bottom halves.

It is also more efficient; while BdrvTrackedRequest has to remain
for request serialisation, with this series we can in principle make
BdrvTrackedRequest optional and enable it only when something (automatic
RMW or copy-on-read) requires request serialisation.

In general this flows nicely, the only snag being patch 5.  The problem
here is that mirror is calling bdrv_drain from an AIO callback, which
used to be okay (because bdrv_drain more or less tried to guess when
all AIO callbacks were done) but now causes a deadlock (because bdrv_drain
really checks for AIO callbacks to be complete).  To add to the complication,
the bdrv_drain happens far away from the AIO callback, in the coroutine that
the AIO callback enters.  The situation here is admittedly underdefined,
and Stefan pointed out that the same solution is found in many other places
in the QEMU block layer.  Therefore I think it's acceptable.  I'm pointing
it out explicitly though, because (together with patch 8) it is an example
of the issues caused by the new, stricter definition of bdrv_drain.

Patches 1-7 organize bdrv_drain into separate phases, with a well-defined
order between the BDS and it children.  The phases are: 1) disable
throttling; 2) disable io_plug; 3) call bdrv_drain callbacks; 4) wait
for I/O to finish; 5) re-enable io_plug and throttling.  Previously,
throttling of throttling and io_plug were handled by bdrv_flush_io_queue,
which was repeatedly called as part of the I/O wait loop.  This part of
the series removes bdrv_flush_io_queue.

Patch 8-11 replace aio_poll with bdrv_drain so that the work done
so far applies to all former callers of aio_poll.

Patches 12-13 introduce the logic that lets the main thread wait remotely
for an iothread to drain the BDS.  Patches 14-16 then achieve the purpose
of this series, removing the long-running AioContext critical section
from iothread_run and removing RFifoLock.

The series passes check-block.sh with a few fixes applied for pre-existing
bugs:

    vl: fix migration from prelaunch state
    migration: fix incorrect memory_global_dirty_log_start outside BQL
    qed: fix bdrv_qed_drain

Paolo

Paolo Bonzini (16):
  block: make bdrv_start_throttled_reqs return void
  block: move restarting of throttled reqs to block/throttle-groups.c
  block: introduce bdrv_no_throttling_begin/end
  block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
  mirror: use bottom half to re-enter coroutine
  block: add BDS field to count in-flight requests
  block: change drain to look only at one child at a time
  blockjob: introduce .drain callback for jobs
  block: wait for all pending I/O when doing synchronous requests
  nfs: replace aio_poll with bdrv_drain
  sheepdog: disable dataplane
  aio: introduce aio_context_in_iothread
  block: only call aio_poll from iothread
  iothread: release AioContext around aio_poll
  qemu-thread: introduce QemuRecMutex
  aio: convert from RFifoLock to QemuRecMutex

 async.c                         |  28 +---
 block.c                         |   4 +-
 block/backup.c                  |   7 +
 block/io.c                      | 281 +++++++++++++++++++++++++---------------
 block/linux-aio.c               |  13 +-
 block/mirror.c                  |  37 +++++-
 block/nfs.c                     |  50 ++++---
 block/qed-table.c               |  16 +--
 block/qed.c                     |   4 +-
 block/raw-aio.h                 |   2 +-
 block/raw-posix.c               |  16 +--
 block/sheepdog.c                |  19 +++
 block/throttle-groups.c         |  19 +++
 blockjob.c                      |  16 ++-
 docs/multiple-iothreads.txt     |  40 +++---
 include/block/aio.h             |  13 +-
 include/block/block.h           |   3 +-
 include/block/block_int.h       |  22 +++-
 include/block/blockjob.h        |   7 +
 include/block/throttle-groups.h |   1 +
 include/qemu/rfifolock.h        |  54 --------
 include/qemu/thread-posix.h     |   6 +
 include/qemu/thread-win32.h     |  10 ++
 include/qemu/thread.h           |   3 +
 iothread.c                      |  20 +--
 stubs/Makefile.objs             |   1 +
 stubs/iothread.c                |   8 ++
 tests/.gitignore                |   1 -
 tests/Makefile                  |   2 -
 tests/qemu-iotests/060          |   8 +-
 tests/qemu-iotests/060.out      |   4 +-
 tests/test-aio.c                |  22 ++--
 tests/test-rfifolock.c          |  91 -------------
 util/Makefile.objs              |   1 -
 util/qemu-thread-posix.c        |  13 ++
 util/qemu-thread-win32.c        |  25 ++++
 util/rfifolock.c                |  78 -----------
 37 files changed, 471 insertions(+), 474 deletions(-)
 delete mode 100644 include/qemu/rfifolock.h
 create mode 100644 stubs/iothread.c
 delete mode 100644 tests/test-rfifolock.c
 delete mode 100644 util/rfifolock.c

-- 
2.5.0

^ permalink raw reply	[flat|nested] 48+ messages in thread