* [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
@ 2014-08-05  3:33 Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool Ming Lei
                   ` (18 more replies)
  0 siblings, 19 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Michael S. Tsirkin

Hi,

These patches bring in the following four changes:
        - introduce an object allocation pool and apply it to
        virtio-blk dataplane to improve its performance

        - introduce a selective coroutine bypass mechanism
        to improve the performance of virtio-blk dataplane
        with raw format images

        - linux-aio changes: fix the -EAGAIN and partial
        completion cases, increase max events to 256, and remove
        one unused field in 'struct qemu_laiocb'

        - support multi virtqueue for virtio-blk

The virtio-blk multi virtqueue feature will be added to virtio spec 1.1[1],
and the 3.17 Linux kernel[2] will support the feature in the virtio-blk
driver. For those who want to try it, the kernel-side patches can be found
in either Jens's block tree[3] or linux-next[4].

The following fio script, run from inside the VM, is used to test the improvement brought by these patches:

        [global]
        direct=1
        size=128G
        bsrange=4k-4k
        timeout=120
        numjobs=${JOBS}
        ioengine=libaio
        iodepth=64
        filename=/dev/vdc
        group_reporting=1

        [f]
        rw=randread

One quad-core VM (8G RAM) is created on the host below to run the above fio test:

        - server (16 cores: 8 physical cores, 2 threads per physical core)

Below is the test result on throughput improvement (IOPS) with
this patchset (4 virtqueues per virtio-blk device, 4 JOBS) against
QEMU 2.1.0: a 53% throughput improvement can be observed, and
scalability for parallel I/O improves even more (>100% throughput
improvement is observed in the 4 JOBS case).

From the above results, we can see that both scalability and
performance improve a lot.

After commit 580b6b2aa2 (dataplane: use the QEMU block
layer for I/O), the average time for submitting a single
request increased a lot: according to my traces, it has
doubled, even though the block plug & unplug mechanism was
introduced to ease the effect. That is why this patchset
first introduces the selective coroutine bypass mechanism
and the object allocation pool to win that time back. Based
on QEMU 2.0, the single virtio-blk dataplane multi virtqueue
patch alone achieved a bigger improvement than the current
result[5].

V1:
	- bypass co: add a check for making the bypass decision, to help
	remove the hint from the device in the future
	- bypass co: run acb->cb() via BH, as pointed out by Paolo and Stefan
	- virtio: drop the patch for decreasing the size of VirtQueueElement,
	which would break migration between different QEMU versions;
	another standalone patchset might do that
	- linux-aio: retry io_submit in the following completion cb for
	-EAGAIN, as suggested by Paolo
	- linux-aio: handle -EAGAIN for the non-plugged case, as suggested
	by Paolo
	- mq conversion: support multi virtqueue for non-dataplane too, as
	required by Paolo

TODO:
	- optimize the block layer for linux-aio, so that more time
	can be saved when submitting requests
	- support more than one AioContext to further improve
	virtio-blk performance

[1], http://marc.info/?l=linux-api&m=140486843317107&w=2
[2], http://marc.info/?l=linux-api&m=140418368421229&w=2
[3], http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/ #for-3.17/drivers
[4], https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/
[5], http://marc.info/?l=linux-api&m=140377573830230&w=2

 block.c                         |  233 ++++++++++++++++++++++++++++++++++-----
 block/linux-aio.c               |  124 ++++++++++++++++-----
 block/raw-posix.c               |   34 ++++++
 hw/block/dataplane/virtio-blk.c |  221 ++++++++++++++++++++++++++++---------
 hw/block/virtio-blk.c           |   39 +++++--
 include/block/block.h           |   12 ++
 include/block/block_int.h       |    3 +
 include/block/coroutine.h       |    8 ++
 include/block/coroutine_int.h   |    5 +
 include/hw/virtio/virtio-blk.h  |   14 ++-
 include/qemu/gc.h               |   56 ++++++++++
 include/qemu/obj_pool.h         |   64 +++++++++++
 qemu-coroutine-lock.c           |    4 +-
 qemu-coroutine.c                |   33 ++++++
 14 files changed, 734 insertions(+), 116 deletions(-)



Thanks,


* [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05 11:55   ` Eric Blake
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 02/17] dataplane: use object pool to speed up allocation for virtio blk request Ming Lei
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This patch introduces an object allocation pool for speeding up
object allocation in the fast path.
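
As a usage illustration (not part of the patch), here is a minimal
sketch; the request type, REQ_CNT, and the helper names are all
hypothetical (g_new()/g_free() come from glib, as usual in QEMU):

    #include "qemu/obj_pool.h"

    #define REQ_CNT 128

    typedef struct { int id; } Req;        /* hypothetical request type */

    static Req reqs[REQ_CNT];              /* backing storage for the pool */
    static void *free_objs[REQ_CNT];       /* free-list slots, one per object */
    static ObjPool pool;

    static void req_pool_setup(void)
    {
        obj_pool_init(&pool, reqs, free_objs, sizeof(Req), REQ_CNT);
    }

    static Req *req_alloc(void)
    {
        Req *r = obj_pool_get(&pool);      /* O(1) pop; NULL when exhausted */
        return r ? r : g_new(Req, 1);      /* fall back to the heap */
    }

    static void req_free(Req *r)
    {
        if (obj_pool_has_obj(&pool, r)) {  /* address inside the pool? */
            obj_pool_put(&pool, r);        /* O(1) push back */
        } else {
            g_free(r);
        }
    }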

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/qemu/obj_pool.h |   64 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 include/qemu/obj_pool.h

diff --git a/include/qemu/obj_pool.h b/include/qemu/obj_pool.h
new file mode 100644
index 0000000..94b5f49
--- /dev/null
+++ b/include/qemu/obj_pool.h
@@ -0,0 +1,64 @@
+#ifndef QEMU_OBJ_POOL_HEAD
+#define QEMU_OBJ_POOL_HEAD
+
+typedef struct {
+    unsigned int size;
+    unsigned int cnt;
+
+    void **free_obj;
+    int free_idx;
+
+    char *objs;
+} ObjPool;
+
+static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
+                                 unsigned int obj_size, unsigned cnt)
+{
+    int i;
+
+    op->objs = (char *)objs_buf;
+    op->free_obj = free_objs;
+    op->size = obj_size;
+    op->cnt = cnt;
+
+    for (i = 0; i < op->cnt; i++) {
+        op->free_obj[i] = (void *)&op->objs[i * op->size];
+    }
+    op->free_idx = op->cnt;
+}
+
+static inline void *obj_pool_get(ObjPool *op)
+{
+    void *obj;
+
+    if (!op) {
+        return NULL;
+    }
+
+    if (op->free_idx <= 0) {
+        return NULL;
+    }
+
+    obj = op->free_obj[--op->free_idx];
+    return obj;
+}
+
+static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
+{
+    return op && (unsigned long)obj >= (unsigned long)&op->objs[0] &&
+           (unsigned long)obj <=
+           (unsigned long)&op->objs[(op->cnt - 1) * op->size];
+}
+
+static inline void obj_pool_put(ObjPool *op, void *obj)
+{
+    if (!op || !obj_pool_has_obj(op, obj)) {
+        return;
+    }
+
+    assert(op->free_idx < op->cnt);
+
+    op->free_obj[op->free_idx++] = obj;
+}
+
+#endif
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 02/17] dataplane: use object pool to speed up allocation for virtio blk request
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05 12:30   ` Eric Blake
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 03/17] qemu coroutine: support bypass mode Ming Lei
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

g_slice_new(VirtIOBlockReq), its free pair, and accessing the instance
are a bit slow since sizeof(VirtIOBlockReq) is more than 40KB, so use
the object pool to speed up its allocation and release.

With this patch, a ~5%-10% throughput improvement is observed in the VM
running on the server described in the cover letter.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 hw/block/dataplane/virtio-blk.c |   12 ++++++++++++
 hw/block/virtio-blk.c           |   13 +++++++++++--
 include/hw/virtio/virtio-blk.h  |    2 ++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d6ba65c..c9a8cc2 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -24,6 +24,8 @@
 #include "hw/virtio/virtio-bus.h"
 #include "qom/object_interfaces.h"
 
+#define REQ_POOL_SZ 128
+
 struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
@@ -50,6 +52,10 @@ struct VirtIOBlockDataPlane {
     Error *blocker;
     void (*saved_complete_request)(struct VirtIOBlockReq *req,
                                    unsigned char status);
+
+    VirtIOBlockReq  reqs[REQ_POOL_SZ];
+    void *free_reqs[REQ_POOL_SZ];
+    ObjPool  req_pool;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -235,6 +241,10 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
         return;
     }
 
+    vblk->obj_pool = &s->req_pool;
+    obj_pool_init(vblk->obj_pool, s->reqs, s->free_reqs,
+                  sizeof(VirtIOBlockReq), REQ_POOL_SZ);
+
     /* Set up guest notifier (irq) */
     if (k->set_guest_notifiers(qbus->parent, 1, true) != 0) {
         fprintf(stderr, "virtio-blk failed to set guest notifier, "
@@ -291,6 +301,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
     aio_context_release(s->ctx);
 
+    vblk->obj_pool = NULL;
+
     /* Sync vring state back to virtqueue so that non-dataplane request
      * processing can continue when we disable the host notifier below.
      */
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index c241c50..2a11bc4 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -31,7 +31,11 @@
 
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 {
-    VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
+    VirtIOBlockReq *req = obj_pool_get(s->obj_pool);
+
+    if (!req) {
+        req = g_slice_new(VirtIOBlockReq);
+    }
     req->dev = s;
     req->qiov.size = 0;
     req->next = NULL;
@@ -41,7 +45,11 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 void virtio_blk_free_request(VirtIOBlockReq *req)
 {
     if (req) {
-        g_slice_free(VirtIOBlockReq, req);
+        if (obj_pool_has_obj(req->dev->obj_pool, req)) {
+            obj_pool_put(req->dev->obj_pool, req);
+        } else {
+            g_slice_free(VirtIOBlockReq, req);
+        }
     }
 }
 
@@ -801,6 +809,7 @@ static void virtio_blk_instance_init(Object *obj)
 {
     VirtIOBlock *s = VIRTIO_BLK(obj);
 
+    s->obj_pool = NULL;
     object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
                              (Object **)&s->blk.iothread,
                              qdev_prop_allow_set_link_before_realize,
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index afb7b8d..49ac234 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -18,6 +18,7 @@
 #include "hw/block/block.h"
 #include "sysemu/iothread.h"
 #include "block/block.h"
+#include "qemu/obj_pool.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
 #define VIRTIO_BLK(obj) \
@@ -135,6 +136,7 @@ typedef struct VirtIOBlock {
     Notifier migration_state_notifier;
     struct VirtIOBlockDataPlane *dataplane;
 #endif
+    ObjPool *obj_pool;
 } VirtIOBlock;
 
 typedef struct MultiReqBuffer {
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 03/17] qemu coroutine: support bypass mode
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 02/17] dataplane: use object pool to speed up allocation for virtio blk request Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 04/17] block: prepare for supporting selective bypass coroutine Ming Lei
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This patch introduces several APIs for supporting bypass of the qemu
coroutine in cases where it isn't necessary, for performance's sake.
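
A sketch of how a caller drives these APIs; this mirrors what the
block-layer patch later in this series does ('do_rw' and 'acb' are
placeholders):

    /* caller side: run a coroutine_fn directly, without creating a
     * coroutine, while the bypass flag is set */
    qemu_coroutine_set_bypass(true);
    do_rw(acb);                          /* completes or queues the I/O */
    qemu_coroutine_set_bypass(false);

    /* callee side: code shared with the coroutine path can test the
     * flag and stash per-request state instead of yielding */
    if (qemu_coroutine_self_bypassed()) {
        qemu_coroutine_set_var(acb);     /* make acb reachable downstream */
    }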

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/block/coroutine.h     |    8 ++++++++
 include/block/coroutine_int.h |    5 +++++
 qemu-coroutine-lock.c         |    4 ++--
 qemu-coroutine.c              |   33 +++++++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index b408f96..46d2642 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -223,4 +223,12 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
  * Note that this function clobbers the handlers for the file descriptor.
  */
 void coroutine_fn yield_until_fd_readable(int fd);
+
+/* qemu coroutine bypass APIs */
+void qemu_coroutine_set_bypass(bool bypass);
+bool qemu_coroutine_bypassed(Coroutine *self);
+bool qemu_coroutine_self_bypassed(void);
+void qemu_coroutine_set_var(void *var);
+void *qemu_coroutine_get_var(void);
+
 #endif /* QEMU_COROUTINE_H */
diff --git a/include/block/coroutine_int.h b/include/block/coroutine_int.h
index f133d65..106d0b2 100644
--- a/include/block/coroutine_int.h
+++ b/include/block/coroutine_int.h
@@ -39,6 +39,11 @@ struct Coroutine {
     Coroutine *caller;
     QSLIST_ENTRY(Coroutine) pool_next;
 
+    bool bypass;
+
+    /* only used in bypass mode */
+    void *opaque;
+
     /* Coroutines that should be woken up when we yield or terminate */
     QTAILQ_HEAD(, Coroutine) co_queue_wakeup;
     QTAILQ_ENTRY(Coroutine) co_queue_next;
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index e4860ae..7c69ff6 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -82,13 +82,13 @@ static bool qemu_co_queue_do_restart(CoQueue *queue, bool single)
 
 bool coroutine_fn qemu_co_queue_next(CoQueue *queue)
 {
-    assert(qemu_in_coroutine());
+    assert(qemu_in_coroutine() || qemu_coroutine_self_bypassed());
     return qemu_co_queue_do_restart(queue, true);
 }
 
 void coroutine_fn qemu_co_queue_restart_all(CoQueue *queue)
 {
-    assert(qemu_in_coroutine());
+    assert(qemu_in_coroutine() || qemu_coroutine_self_bypassed());
     qemu_co_queue_do_restart(queue, false);
 }
 
diff --git a/qemu-coroutine.c b/qemu-coroutine.c
index 4708521..0597ed9 100644
--- a/qemu-coroutine.c
+++ b/qemu-coroutine.c
@@ -137,3 +137,36 @@ void coroutine_fn qemu_coroutine_yield(void)
     self->caller = NULL;
     coroutine_swap(self, to);
 }
+
+void qemu_coroutine_set_bypass(bool bypass)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    self->bypass = bypass;
+}
+
+bool qemu_coroutine_bypassed(Coroutine *self)
+{
+    return self->bypass;
+}
+
+bool qemu_coroutine_self_bypassed(void)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    return qemu_coroutine_bypassed(self);
+}
+
+void qemu_coroutine_set_var(void *var)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    self->opaque = var;
+}
+
+void *qemu_coroutine_get_var(void)
+{
+    Coroutine *self = qemu_coroutine_self();
+
+    return self->opaque;
+}
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 04/17] block: prepare for supporting selective bypass coroutine
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (2 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 03/17] qemu coroutine: support bypass mode Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 05/17] garbage collector: introduced for support of " Ming Lei
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

If a device decides that it isn't necessary to apply a coroutine
in its performance-sensitive path, it can call
bdrv_set_bypass_co(bs, true) to bypass the coroutine
and just call the function directly in the aio read/write path.

One example is virtio-blk dataplane.
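
For illustration, a device would toggle the hint in its start/stop
paths, as the dataplane patch later in this series does; a sketch:

    bdrv_set_bypass_co(bs, true);     /* start: requests may skip coroutines */
    /* ... device running ... */
    bdrv_set_bypass_co(bs, false);    /* stop: restore the coroutine path */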

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block.c                   |   10 ++++++++++
 include/block/block.h     |    3 +++
 include/block/block_int.h |    3 +++
 3 files changed, 16 insertions(+)

diff --git a/block.c b/block.c
index 8cf519b..ac184ef 100644
--- a/block.c
+++ b/block.c
@@ -5840,3 +5840,13 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
         bdrv_flush_io_queue(bs->file);
     }
 }
+
+void bdrv_set_bypass_co(BlockDriverState *bs, bool bypass)
+{
+    bs->bypass_co = bypass;
+}
+
+bool bdrv_get_bypass_co(BlockDriverState *bs)
+{
+    return bs->bypass_co;
+}
diff --git a/include/block/block.h b/include/block/block.h
index f08471d..92f2f3a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -588,4 +588,7 @@ void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
 void bdrv_flush_io_queue(BlockDriverState *bs);
 
+void bdrv_set_bypass_co(BlockDriverState *bs, bool bypass);
+bool bdrv_get_bypass_co(BlockDriverState *bs);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7b541a0..9fa2f4c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -354,6 +354,9 @@ struct BlockDriverState {
     /* Whether produces zeros when read beyond eof */
     bool zero_beyond_eof;
 
+    /* Whether bypasses coroutine when doing aio read & write */
+    bool bypass_co;
+
     /* Alignment requirement for offset/length of I/O requests */
     unsigned int request_alignment;
 
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 05/17] garbage collector: introduced for support of bypass coroutine
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (3 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 04/17] block: prepare for supporting selective bypass coroutine Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05 12:43   ` Eric Blake
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 06/17] block: introduce bdrv_co_can_bypass_co Ming Lei
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

In the case of bypassing the coroutine, some buffers on the stack
have to be converted (moved to the heap) so that they survive the
whole I/O submit & completion cycle.

A garbage collector is one of the best data structures for this
purpose, as I see it.
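
A usage sketch of the helpers below (the buffer names and the custom
destructor are placeholders):

    SimpleGC gc;

    simple_gc_init(&gc);

    /* submit path: register heap buffers that must outlive this stack frame */
    simple_gc_add(&gc, head_buf, NULL);         /* NULL: freed via qemu_vfree() */
    simple_gc_add(&gc, local_qiov, free_qiov);  /* or via a custom destructor */

    /* completion path: release everything in one go */
    simple_gc_free_all(&gc);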

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/qemu/gc.h |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
 create mode 100644 include/qemu/gc.h

diff --git a/include/qemu/gc.h b/include/qemu/gc.h
new file mode 100644
index 0000000..b9a3f6e
--- /dev/null
+++ b/include/qemu/gc.h
@@ -0,0 +1,56 @@
+#ifndef QEMU_GC_HEADER
+#define QEMU_GC_HEADER
+
+#include "qemu/queue.h"
+
+/* simple garbage collector implementation for bypass coroutine */
+
+/* internal type and helper */
+typedef struct SimpleGCNode SimpleGCNode;
+struct SimpleGCNode {
+    void *addr;
+    void (*free)(void *data);
+    QLIST_ENTRY(SimpleGCNode) node;
+};
+
+static inline void simple_gc_free_one(SimpleGCNode *node)
+{
+    if (node->free) {
+        node->free(node->addr);
+    } else {
+        qemu_vfree(node->addr);
+    }
+
+    g_free(node);
+}
+
+/* public type and helpers */
+typedef struct {
+    QLIST_HEAD(, SimpleGCNode) head;
+} SimpleGC;
+
+static inline void simple_gc_init(SimpleGC *gc)
+{
+    QLIST_INIT(&gc->head);
+}
+
+static inline void simple_gc_add(SimpleGC *gc, void *addr,
+                                 void (*free)(void *data))
+{
+    SimpleGCNode *node = g_malloc0(sizeof(*node));
+
+    node->addr = addr;
+    node->free = free;
+    QLIST_INSERT_HEAD(&gc->head, node, node);
+}
+
+static inline void simple_gc_free_all(SimpleGC *gc)
+{
+    SimpleGCNode *curr, *next;
+
+    QLIST_FOREACH_SAFE(curr, &gc->head, node, next) {
+        QLIST_REMOVE(curr, node);
+        simple_gc_free_one(curr);
+    }
+}
+#endif
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 06/17] block: introduce bdrv_co_can_bypass_co
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (4 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 05/17] garbage collector: introduced for support of " Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 07/17] block: support to bypass qemu coroutine Ming Lei
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This function is introduced to check whether the current block
I/O can be allowed to run without a coroutine, for the sake of
performance.
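
For intuition, the alignment check used below can be restated as this
sketch (the helper name is a placeholder):

    static bool rw_aligned(uint64_t align, int64_t offset, int bytes)
    {
        /* both the start and the end of the request must fall on
         * 'align'-byte boundaries, align = MAX(512, request_alignment) */
        return !(offset & (align - 1)) && !((offset + bytes) & (align - 1));
    }

    /* rw_aligned(512, 4096, 4096) -> true
     * rw_aligned(512, 4095, 4096) -> false (unaligned start)
     * rw_aligned(512, 4096, 100)  -> false (unaligned end) */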

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/block.c b/block.c
index ac184ef..2326dab 100644
--- a/block.c
+++ b/block.c
@@ -4718,6 +4718,44 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
     qemu_bh_schedule(acb->bh);
 }
 
+static bool bdrv_rw_aligned(BlockDriverState *bs,
+                            int64_t offset,
+                            int bytes)
+{
+    uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+
+    if ((offset & (align - 1)) || ((offset + bytes) & (align - 1))) {
+        return false;
+    } else {
+        return true;
+    }
+}
+
+static bool bdrv_co_can_bypass_co(BlockDriverState *bs,
+                                  int64_t sector_num,
+                                  int nb_sectors,
+                                  BdrvRequestFlags flags,
+                                  bool is_write)
+{
+    if (flags || bs->copy_on_read || bs->io_limits_enabled) {
+        return false;
+    }
+
+    /* unaligned read is safe */
+    if (!is_write) {
+        return true;
+    }
+
+    if (!bs->enable_write_cache ||
+        bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF ||
+        !QLIST_EMPTY(&bs->before_write_notifiers.notifiers)) {
+        return false;
+    } else {
+        return bdrv_rw_aligned(bs, sector_num << BDRV_SECTOR_BITS,
+                               nb_sectors << BDRV_SECTOR_BITS);
+    }
+}
+
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
                                                int64_t sector_num,
                                                QEMUIOVector *qiov,
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 07/17] block: support to bypass qemu coroutine
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (5 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 06/17] block: introduce bdrv_co_can_bypass_co Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 08/17] Revert "raw-posix: drop raw_get_aio_fd() since it is no longer used" Ming Lei
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This patch adds support for bypassing the coroutine
in bdrv_co_aio_rw_vector(), which is on the fast path of
block devices, especially for virtio-blk dataplane.
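
The pointer arithmetic in bdrv_get_co_io_comp()/bdrv_get_aio_co()
below relies on the AIOCB layout set up by
bdrv_em_co_bypass_aiocb_info; as a sketch:

    /*
     * Layout of one bypass AIOCB allocation (aiocb_size =
     * sizeof(BlockDriverAIOCBCoroutine) + sizeof(CoroutineIOCompletion)):
     *
     *   +----------------------------+ <- acb, bdrv_get_aio_co(co)
     *   | BlockDriverAIOCBCoroutine  |
     *   +----------------------------+ <- bdrv_get_co_io_comp(acb)
     *   | CoroutineIOCompletion      |
     *   +----------------------------+
     */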

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block.c |  185 +++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 157 insertions(+), 28 deletions(-)

diff --git a/block.c b/block.c
index 2326dab..e1812a7 100644
--- a/block.c
+++ b/block.c
@@ -35,6 +35,7 @@
 #include "qmp-commands.h"
 #include "qemu/timer.h"
 #include "qapi-event.h"
+#include "qemu/gc.h"
 
 #ifdef CONFIG_BSD
 #include <sys/types.h>
@@ -55,6 +56,21 @@ struct BdrvDirtyBitmap {
     QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
+typedef struct CoroutineIOCompletion {
+    Coroutine *coroutine;
+    int ret;
+    bool bypass;
+    SimpleGC gc;
+} CoroutineIOCompletion;
+
+typedef struct BlockDriverAIOCBCoroutine {
+    BlockDriverAIOCB common;
+    BlockRequest req;
+    bool is_write;
+    bool *done;
+    QEMUBH *bh;
+} BlockDriverAIOCBCoroutine;
+
 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
 static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load);
@@ -120,6 +136,48 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+static CoroutineIOCompletion *bdrv_get_co_io_comp(void *acb)
+{
+    return (CoroutineIOCompletion *)(acb +
+               sizeof(BlockDriverAIOCBCoroutine));
+}
+
+static BlockDriverAIOCBCoroutine *bdrv_get_aio_co(void *co)
+{
+    assert(((CoroutineIOCompletion *)co)->bypass);
+
+    return (BlockDriverAIOCBCoroutine *)(co -
+               sizeof(BlockDriverAIOCBCoroutine));
+}
+
+static void bdrv_init_io_comp(CoroutineIOCompletion *co)
+{
+    co->coroutine = NULL;
+    co->bypass = false;
+    co->ret = 0;
+    simple_gc_init(&co->gc);
+}
+
+static void bdrv_free_qiov(void *addr)
+{
+    qemu_iovec_destroy((QEMUIOVector *)addr);
+    g_free(addr);
+}
+
+static void bdrv_gc_add_qiov(CoroutineIOCompletion *co,
+                             QEMUIOVector *qiov)
+{
+    QEMUIOVector *iov = g_malloc(sizeof(QEMUIOVector));
+
+    *iov = *qiov;
+    simple_gc_add(&co->gc, iov, bdrv_free_qiov);
+}
+
+static void bdrv_gc_add_buf(CoroutineIOCompletion *co, void *addr)
+{
+    simple_gc_add(&co->gc, addr, NULL);
+}
+
 /* throttling disk I/O limits */
 void bdrv_set_io_limits(BlockDriverState *bs,
                         ThrottleConfig *cfg)
@@ -3081,7 +3139,16 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
             ret = drv->bdrv_co_readv(bs, sector_num, local_sectors,
                                      &local_qiov);
 
-            qemu_iovec_destroy(&local_qiov);
+
+            if (qemu_coroutine_self_bypassed()) {
+                CoroutineIOCompletion *pco = bdrv_get_co_io_comp(
+                                             qemu_coroutine_get_var());
+
+                /* GC will destroy the local iov after IO is completed */
+                bdrv_gc_add_qiov(pco, &local_qiov);
+            } else {
+                qemu_iovec_destroy(&local_qiov);
+            }
         } else {
             ret = 0;
         }
@@ -3165,9 +3232,19 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
     tracked_request_end(&req);
 
     if (use_local_qiov) {
-        qemu_iovec_destroy(&local_qiov);
-        qemu_vfree(head_buf);
-        qemu_vfree(tail_buf);
+        if (!qemu_coroutine_self_bypassed()) {
+            qemu_iovec_destroy(&local_qiov);
+            qemu_vfree(head_buf);
+            qemu_vfree(tail_buf);
+        } else {
+            CoroutineIOCompletion *pco = bdrv_get_co_io_comp(
+                                         qemu_coroutine_get_var());
+
+            /* GC will release resources after IO is completed */
+            bdrv_gc_add_qiov(pco, &local_qiov);
+            if (head_buf) { bdrv_gc_add_buf(pco, head_buf); }
+            if (tail_buf) { bdrv_gc_add_buf(pco, tail_buf); }
+        }
     }
 
     return ret;
@@ -4659,15 +4736,6 @@ static BlockDriverAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
     return bdrv_aio_rw_vector(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
 }
 
-
-typedef struct BlockDriverAIOCBCoroutine {
-    BlockDriverAIOCB common;
-    BlockRequest req;
-    bool is_write;
-    bool *done;
-    QEMUBH* bh;
-} BlockDriverAIOCBCoroutine;
-
 static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb)
 {
     AioContext *aio_context = bdrv_get_aio_context(blockacb->bs);
@@ -4686,6 +4754,12 @@ static const AIOCBInfo bdrv_em_co_aiocb_info = {
     .cancel             = bdrv_aio_co_cancel_em,
 };
 
+static const AIOCBInfo bdrv_em_co_bypass_aiocb_info = {
+    .aiocb_size         = sizeof(BlockDriverAIOCBCoroutine) +
+                          sizeof(CoroutineIOCompletion),
+    .cancel             = bdrv_aio_co_cancel_em,
+};
+
 static void bdrv_co_em_bh(void *opaque)
 {
     BlockDriverAIOCBCoroutine *acb = opaque;
@@ -4705,6 +4779,13 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
 {
     BlockDriverAIOCBCoroutine *acb = opaque;
     BlockDriverState *bs = acb->common.bs;
+    bool bypass = qemu_coroutine_self_bypassed();
+    CoroutineIOCompletion *co = bdrv_get_co_io_comp(acb);
+
+    if (bypass) {
+        bdrv_init_io_comp(bdrv_get_co_io_comp(acb));
+        qemu_coroutine_set_var(acb);
+    }
 
     if (!acb->is_write) {
         acb->req.error = bdrv_co_do_readv(bs, acb->req.sector,
@@ -4714,8 +4795,11 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
             acb->req.nb_sectors, acb->req.qiov, acb->req.flags);
     }
 
-    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
-    qemu_bh_schedule(acb->bh);
+    /* co->bypass is used for detecting early completion */
+    if (!bypass || !co->bypass) {
+        acb->bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_em_bh, acb);
+        qemu_bh_schedule(acb->bh);
+    }
 }
 
 static bool bdrv_rw_aligned(BlockDriverState *bs,
@@ -4767,8 +4851,27 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
 {
     Coroutine *co;
     BlockDriverAIOCBCoroutine *acb;
+    const AIOCBInfo *aiocb_info;
+    bool bypass;
 
-    acb = qemu_aio_get(&bdrv_em_co_aiocb_info, bs, cb, opaque);
+    /*
+     * In the long term, coroutine creation should be pushed much further
+     * down, making a fast path for cases of unnecessary coroutine usage.
+     *
+     * Also, once the bypass mechanism is mature, the 'bypass_co' hint
+     * set by the device can be moved into the block layer so that
+     * bypass can be enabled automatically.
+     */
+    if (bs->bypass_co &&
+        bdrv_co_can_bypass_co(bs, sector_num, nb_sectors, flags, is_write)) {
+        aiocb_info = &bdrv_em_co_bypass_aiocb_info;
+        bypass = true;
+    } else {
+        aiocb_info = &bdrv_em_co_aiocb_info;
+        bypass = false;
+    }
+
+    acb = qemu_aio_get(aiocb_info, bs, cb, opaque);
     acb->req.sector = sector_num;
     acb->req.nb_sectors = nb_sectors;
     acb->req.qiov = qiov;
@@ -4776,8 +4879,14 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
     acb->is_write = is_write;
     acb->done = NULL;
 
-    co = qemu_coroutine_create(bdrv_co_do_rw);
-    qemu_coroutine_enter(co, acb);
+    if (!bypass) {
+        co = qemu_coroutine_create(bdrv_co_do_rw);
+        qemu_coroutine_enter(co, acb);
+    } else {
+        qemu_coroutine_set_bypass(true);
+        bdrv_co_do_rw(acb);
+        qemu_coroutine_set_bypass(false);
+    }
 
     return &acb->common;
 }
@@ -4871,17 +4980,23 @@ void qemu_aio_release(void *p)
 /**************************************************************/
 /* Coroutine block device emulation */
 
-typedef struct CoroutineIOCompletion {
-    Coroutine *coroutine;
-    int ret;
-} CoroutineIOCompletion;
-
 static void bdrv_co_io_em_complete(void *opaque, int ret)
 {
     CoroutineIOCompletion *co = opaque;
 
-    co->ret = ret;
-    qemu_coroutine_enter(co->coroutine, NULL);
+    if (!co->bypass) {
+        co->ret = ret;
+        qemu_coroutine_enter(co->coroutine, NULL);
+    } else {
+        BlockDriverAIOCBCoroutine *acb = bdrv_get_aio_co(co);
+
+        simple_gc_free_all(&co->gc);
+
+        acb->req.error = ret;
+        acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
+                             bdrv_co_em_bh, acb);
+        qemu_bh_schedule(acb->bh);
+    }
 }
 
 static int coroutine_fn bdrv_co_io_em(BlockDriverState *bs, int64_t sector_num,
@@ -4891,21 +5006,35 @@ static int coroutine_fn bdrv_co_io_em(BlockDriverState *bs, int64_t sector_num,
     CoroutineIOCompletion co = {
         .coroutine = qemu_coroutine_self(),
     };
+    CoroutineIOCompletion *pco = &co;
     BlockDriverAIOCB *acb;
 
+    if (qemu_coroutine_bypassed(pco->coroutine)) {
+        pco = bdrv_get_co_io_comp(qemu_coroutine_get_var());
+        pco->bypass = true;
+    }
+
     if (is_write) {
         acb = bs->drv->bdrv_aio_writev(bs, sector_num, iov, nb_sectors,
-                                       bdrv_co_io_em_complete, &co);
+                                       bdrv_co_io_em_complete, pco);
     } else {
         acb = bs->drv->bdrv_aio_readv(bs, sector_num, iov, nb_sectors,
-                                      bdrv_co_io_em_complete, &co);
+                                      bdrv_co_io_em_complete, pco);
     }
 
     trace_bdrv_co_io_em(bs, sector_num, nb_sectors, is_write, acb);
     if (!acb) {
+        /*
+         * no completion callback for failure case, let bdrv_co_do_rw
+         * handle completion.
+         */
+        pco->bypass = false;
         return -EIO;
     }
-    qemu_coroutine_yield();
+
+    if (!pco->bypass) {
+        qemu_coroutine_yield();
+    }
 
     return co.ret;
 }
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 08/17] Revert "raw-posix: drop raw_get_aio_fd() since it is no longer used"
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (6 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 07/17] block: support to bypass qemu coroutine Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 09/17] dataplane: enable selective bypassing coroutine Ming Lei
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This reverts commit 76ef2cf5493a215efc351f48ae7094d6c183fcac.

Reintroduce the raw_get_aio_fd() helper to enable coroutine
bypass mode in the case of raw images.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/raw-posix.c     |   34 ++++++++++++++++++++++++++++++++++
 include/block/block.h |    9 +++++++++
 2 files changed, 43 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 8e9758e..88715c8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -2436,6 +2436,40 @@ static BlockDriver bdrv_host_cdrom = {
 };
 #endif /* __FreeBSD__ */
 
+#ifdef CONFIG_LINUX_AIO
+/**
+ * Return the file descriptor for Linux AIO
+ *
+ * This function is a layering violation and should be removed when it becomes
+ * possible to call the block layer outside the global mutex.  It allows the
+ * caller to hijack the file descriptor so I/O can be performed outside the
+ * block layer.
+ */
+int raw_get_aio_fd(BlockDriverState *bs)
+{
+    BDRVRawState *s;
+
+    if (!bs->drv) {
+        return -ENOMEDIUM;
+    }
+
+    if (bs->drv == bdrv_find_format("raw")) {
+        bs = bs->file;
+    }
+
+    /* raw-posix has several protocols so just check for raw_aio_readv */
+    if (bs->drv->bdrv_aio_readv != raw_aio_readv) {
+        return -ENOTSUP;
+    }
+
+    s = bs->opaque;
+    if (!s->use_aio) {
+        return -ENOTSUP;
+    }
+    return s->fd;
+}
+#endif /* CONFIG_LINUX_AIO */
+
 static void bdrv_file_init(void)
 {
     /*
diff --git a/include/block/block.h b/include/block/block.h
index 92f2f3a..4450d26 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -482,6 +482,15 @@ void bdrv_op_block_all(BlockDriverState *bs, Error *reason);
 void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason);
 bool bdrv_op_blocker_is_empty(BlockDriverState *bs);
 
+#ifdef CONFIG_LINUX_AIO
+int raw_get_aio_fd(BlockDriverState *bs);
+#else
+static inline int raw_get_aio_fd(BlockDriverState *bs)
+{
+    return -ENOTSUP;
+}
+#endif
+
 enum BlockAcctType {
     BDRV_ACCT_READ,
     BDRV_ACCT_WRITE,
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 09/17] dataplane: enable selective bypassing coroutine
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (7 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 08/17] Revert "raw-posix: drop raw_get_aio_fd() since it is no longer used" Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 10/17] linux-aio: fix submit aio as a batch Ming Lei
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This patch enables selective bypassing of the
coroutine in bdrv_co_aio_rw_vector() if the image
format is raw.

With this patch, a ~10% throughput improvement for raw images is
observed in the VM running on the server described in the cover letter.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 hw/block/dataplane/virtio-blk.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index c9a8cc2..a0732e3 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -30,6 +30,7 @@ struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
     bool stopping;
+    bool raw_format;
 
     VirtIOBlkConf *blk;
 
@@ -199,6 +200,8 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
     error_setg(&s->blocker, "block device is in use by data plane");
     bdrv_op_block_all(blk->conf.bs, s->blocker);
 
+    s->raw_format = (raw_get_aio_fd(blk->conf.bs) >= 0);
+
     *dataplane = s;
 }
 
@@ -272,6 +275,10 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     /* Kick right away to begin processing requests already in vring */
     event_notifier_set(virtio_queue_get_host_notifier(vq));
 
+    if (s->raw_format) {
+        bdrv_set_bypass_co(s->blk->conf.bs, true);
+    }
+
     /* Get this show started by hooking up our callbacks */
     aio_context_acquire(s->ctx);
     aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
@@ -303,6 +310,9 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
 
     vblk->obj_pool = NULL;
 
+    if (s->raw_format) {
+        bdrv_set_bypass_co(s->blk->conf.bs, false);
+    }
     /* Sync vring state back to virtqueue so that non-dataplane request
      * processing can continue when we disable the host notifier below.
      */
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 10/17] linux-aio: fix submit aio as a batch
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (8 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 09/17] dataplane: enable selective bypassing coroutine Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 11/17] linux-aio: handling -EAGAIN for !s->io_q.plugged case Ming Lei
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

In the enqueue path, we can't complete requests, otherwise
"Co-routine re-entered recursively" may be triggered, so this
patch fixes the issue with the ideas below (see the io_submit()
sketch after the list):

	- for -EAGAIN or partial completion, retry the submission by
	scheduling a BH in the following completion cb
	- for partial completion, also update the io queue
	- for other failures, return the failure if in the enqueue path,
	otherwise abort all queued I/O
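
The io_submit() return-value convention these rules build on, as a
sketch (not part of the patch):

    int ret = io_submit(ctx, len, iocbs);

    if (ret == len) {
        /* all consumed: the queue becomes empty */
    } else if (ret >= 0) {
        /* partial: the first 'ret' iocbs were consumed; shift the
         * remaining len - ret entries to the front and retry later */
    } else if (ret == -EAGAIN) {
        /* nothing consumed: keep the queue and retry from the
         * completion BH */
    } else {
        /* hard error: fail or abort the queued requests */
    }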

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/linux-aio.c |   99 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 77 insertions(+), 22 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 7ac7e8c..4cdf507 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -38,11 +38,19 @@ struct qemu_laiocb {
     QLIST_ENTRY(qemu_laiocb) node;
 };
 
+/*
+ * TODO: support batching I/O from multiple bs in the same
+ * AioContext; one important use case is multi-lun SCSI,
+ * so in the future the IO queue should be per AioContext.
+ */
 typedef struct {
     struct iocb *iocbs[MAX_QUEUED_IO];
     int plugged;
     unsigned int size;
     unsigned int idx;
+
+    /* handle -EAGAIN and partial completion */
+    QEMUBH *retry;
 } LaioQueue;
 
 struct qemu_laio_state {
@@ -86,6 +94,12 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
     qemu_aio_release(laiocb);
 }
 
+static void qemu_laio_start_retry(struct qemu_laio_state *s)
+{
+    if (s->io_q.idx) {
+        qemu_bh_schedule(s->io_q.retry);
+    }
+}
+
 static void qemu_laio_completion_cb(EventNotifier *e)
 {
     struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
@@ -108,6 +122,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
             qemu_laio_process_completion(s, laiocb);
         }
     }
+    qemu_laio_start_retry(s);
 }
 
 static void laio_cancel(BlockDriverAIOCB *blockacb)
@@ -127,6 +142,7 @@ static void laio_cancel(BlockDriverAIOCB *blockacb)
     ret = io_cancel(laiocb->ctx->ctx, &laiocb->iocb, &event);
     if (ret == 0) {
         laiocb->ret = -ECANCELED;
+        qemu_laio_start_retry(laiocb->ctx);
         return;
     }
 
@@ -154,45 +170,80 @@ static void ioq_init(LaioQueue *io_q)
     io_q->plugged = 0;
 }
 
-static int ioq_submit(struct qemu_laio_state *s)
+static void abort_queue(struct qemu_laio_state *s)
+{
+    int i;
+    for (i = 0; i < s->io_q.idx; i++) {
+        struct qemu_laiocb *laiocb = container_of(s->io_q.iocbs[i],
+                                                  struct qemu_laiocb,
+                                                  iocb);
+        laiocb->ret = -EIO;
+        qemu_laio_process_completion(s, laiocb);
+    }
+}
+
+static int ioq_submit(struct qemu_laio_state *s, bool enqueue)
 {
     int ret, i = 0;
     int len = s->io_q.idx;
+    int j = 0;
 
-    do {
-        ret = io_submit(s->ctx, len, s->io_q.iocbs);
-    } while (i++ < 3 && ret == -EAGAIN);
+    if (!len) {
+        return 0;
+    }
+
+    ret = io_submit(s->ctx, len, s->io_q.iocbs);
+    if (ret == -EAGAIN) { /* retry in following completion cb */
+        return 0;
+    } else if (ret < 0) {
+        if (enqueue) {
+            return ret;
+        }
 
-    /* empty io queue */
-    s->io_q.idx = 0;
+        /* in non-queue path, all IOs have to be completed */
+        abort_queue(s);
+        ret = len;
+    } else if (ret == 0) {
+        goto out;
+    }
 
-    if (ret < 0) {
-        i = 0;
-    } else {
-        i = ret;
+    for (i = ret; i < len; i++) {
+        s->io_q.iocbs[j++] = s->io_q.iocbs[i];
     }
 
-    for (; i < len; i++) {
-        struct qemu_laiocb *laiocb =
-            container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);
+ out:
+    /*
+     * Update the io queue; for partial completion, the retry will be
+     * started automatically in the following completion cb.
+     */
+    s->io_q.idx -= ret;
 
-        laiocb->ret = (ret < 0) ? ret : -EIO;
-        qemu_laio_process_completion(s, laiocb);
-    }
     return ret;
 }
 
-static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
+static void ioq_submit_retry(void *opaque)
+{
+    struct qemu_laio_state *s = opaque;
+    ioq_submit(s, false);
+}
+
+static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
 {
     unsigned int idx = s->io_q.idx;
 
+    if (unlikely(idx == s->io_q.size)) {
+        return -1;
+    }
+
     s->io_q.iocbs[idx++] = iocb;
     s->io_q.idx = idx;
 
-    /* submit immediately if queue is full */
-    if (idx == s->io_q.size) {
-        ioq_submit(s);
+    /* submit immediately if queue depth is above 2/3 */
+    if (idx > s->io_q.size * 2 / 3) {
+        return ioq_submit(s, true);
     }
+
+    return 0;
 }
 
 void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
@@ -214,7 +265,7 @@ int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug)
     }
 
     if (s->io_q.idx > 0) {
-        ret = ioq_submit(s);
+        ret = ioq_submit(s, false);
     }
 
     return ret;
@@ -258,7 +309,9 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
             goto out_free_aiocb;
         }
     } else {
-        ioq_enqueue(s, iocbs);
+        if (ioq_enqueue(s, iocbs) < 0) {
+            goto out_free_aiocb;
+        }
     }
     return &laiocb->common;
 
@@ -272,6 +325,7 @@ void laio_detach_aio_context(void *s_, AioContext *old_context)
     struct qemu_laio_state *s = s_;
 
     aio_set_event_notifier(old_context, &s->e, NULL);
+    qemu_bh_delete(s->io_q.retry);
 }
 
 void laio_attach_aio_context(void *s_, AioContext *new_context)
@@ -279,6 +333,7 @@ void laio_attach_aio_context(void *s_, AioContext *new_context)
     struct qemu_laio_state *s = s_;
 
     aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
+    s->io_q.retry = aio_bh_new(new_context, ioq_submit_retry, s);
 }
 
 void *laio_init(void)
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 11/17] linux-aio: handling -EAGAIN for !s->io_q.plugged case
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (9 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 10/17] linux-aio: fix submit aio as a batch Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 12/17] linux-aio: increase max event to 256 Ming Lei
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

Previously -EAGAIN was simply ignored in the !s->io_q.plugged case,
which can easily cause -EIO to be reported to the VM, for example
with NVMe devices.

This patch handles -EAGAIN via the io queue in the !s->io_q.plugged
case; the request will be retried in the following aio completion cb.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/linux-aio.c |   22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 4cdf507..0e21f76 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -238,6 +238,11 @@ static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
     s->io_q.iocbs[idx++] = iocb;
     s->io_q.idx = idx;
 
+    /* in the non-plugged case, defer submission to the next completion (-EAGAIN) */
+    if (unlikely(!s->io_q.plugged)) {
+        return 0;
+    }
+
     /* submit immediately if queue depth is above 2/3 */
     if (idx > s->io_q.size * 2 / 3) {
         return ioq_submit(s, true);
@@ -305,10 +310,25 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
     io_set_eventfd(&laiocb->iocb, event_notifier_get_fd(&s->e));
 
     if (!s->io_q.plugged) {
-        if (io_submit(s->ctx, 1, &iocbs) < 0) {
+        int ret;
+
+        if (!s->io_q.idx) {
+            ret = io_submit(s->ctx, 1, &iocbs);
+        } else {
+            ret = -EAGAIN;
+        }
+        /*
+         * Switch to queue mode until -EAGAIN is handled: we assume there
+         * is always uncompleted I/O in flight, so enqueue the request;
+         * it will be submitted again in the following aio completion cb.
+         */
+        if (ret == -EAGAIN) {
+            goto enqueue;
+        } else if (ret < 0) {
             goto out_free_aiocb;
         }
     } else {
+ enqueue:
         if (ioq_enqueue(s, iocbs) < 0) {
             goto out_free_aiocb;
         }
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 12/17] linux-aio: increase max event to 256
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (10 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 11/17] linux-aio: handling -EAGAIN for !s->io_q.plugged case Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 13/17] linux-aio: remove 'node' from 'struct qemu_laiocb' Ming Lei
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This patch increases the max event count to 256 for the coming
virtio-blk multi virtqueue support.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/linux-aio.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 0e21f76..bf94ae9 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -23,7 +23,7 @@
  *      than this we will get EAGAIN from io_submit which is communicated to
  *      the guest as an I/O error.
  */
-#define MAX_EVENTS 128
+#define MAX_EVENTS 256
 
 #define MAX_QUEUED_IO  128
 
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 13/17] linux-aio: remove 'node' from 'struct qemu_laiocb'
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (11 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 12/17] linux-aio: increase max event to 256 Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 14/17] hw/virtio/virtio-blk.h: introduce VIRTIO_BLK_F_MQ Ming Lei
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

No one uses the 'node' field any more, so remove it from
'struct qemu_laiocb'; this saves 16 bytes in the struct on
64-bit arches.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/linux-aio.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index bf94ae9..da50ea5 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -35,7 +35,6 @@ struct qemu_laiocb {
     size_t nbytes;
     QEMUIOVector *qiov;
     bool is_read;
-    QLIST_ENTRY(qemu_laiocb) node;
 };
 
 /*
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 14/17] hw/virtio/virtio-blk.h: introduce VIRTIO_BLK_F_MQ
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (12 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 13/17] linux-aio: remove 'node' from 'struct qemu_laiocb' Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 15/17] virtio-blk: support multi queue for non-dataplane Ming Lei
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

Prepare for supporting multiple virtqueues per virtio-blk device.
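
As context, a guest driver would consume the new feature bit and
config field roughly like this (a sketch; read_config_le16() is a
placeholder accessor):

    uint16_t num_queues = 1;   /* default when VIRTIO_BLK_F_MQ is absent */

    if (features & (1u << VIRTIO_BLK_F_MQ)) {
        /* num_queues is only valid when the host offers the feature */
        num_queues = read_config_le16(
            offsetof(struct virtio_blk_config, num_queues));
    }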

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 include/hw/virtio/virtio-blk.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 49ac234..5b0fb91 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -42,6 +42,12 @@
 #define VIRTIO_BLK_F_TOPOLOGY   10      /* Topology information is available */
 #define VIRTIO_BLK_F_CONFIG_WCE 11      /* write cache configurable */
 
+/*
+ * support multi vqs, and virtio_blk_config.num_queues is only
+ * available when this feature is enabled
+ */
+#define VIRTIO_BLK_F_MQ		12
+
 #define VIRTIO_BLK_ID_BYTES     20      /* ID string length */
 
 struct virtio_blk_config
@@ -58,6 +64,8 @@ struct virtio_blk_config
     uint16_t min_io_size;
     uint32_t opt_io_size;
     uint8_t wce;
+    uint8_t unused;
+    uint16_t num_queues;	/* must be at the end */
 } QEMU_PACKED;
 
 /* These two define direction. */
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 15/17] virtio-blk: support multi queue for non-dataplane
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (13 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 14/17] hw/virtio/virtio-blk.h: introduce VIRTIO_BLK_F_MQ Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 16/17] virtio-blk: dataplane: support multi virtqueue Ming Lei
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This patch introduces support for multiple virtqueues in the
non-dataplane path; the conversion is fairly straightforward.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 hw/block/virtio-blk.c          |   25 +++++++++++++++++++------
 include/hw/virtio/virtio-blk.h |    4 +++-
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 2a11bc4..baec8f8 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -58,12 +58,13 @@ static void virtio_blk_complete_request(VirtIOBlockReq *req,
 {
     VirtIOBlock *s = req->dev;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
+    unsigned qid = req->qid;
 
     trace_virtio_blk_req_complete(req, status);
 
     stb_p(&req->in->status, status);
-    virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
-    virtio_notify(vdev, s->vq);
+    virtqueue_push(s->vqs[qid], &req->elem, req->qiov.size + sizeof(*req->in));
+    virtio_notify(vdev, s->vqs[qid]);
 }
 
 static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
@@ -123,11 +124,12 @@ static void virtio_blk_flush_complete(void *opaque, int ret)
     virtio_blk_free_request(req);
 }
 
-static VirtIOBlockReq *virtio_blk_get_request(VirtIOBlock *s)
+static VirtIOBlockReq *virtio_blk_get_request(VirtIOBlock *s, unsigned qid)
 {
     VirtIOBlockReq *req = virtio_blk_alloc_request(s);
 
-    if (!virtqueue_pop(s->vq, &req->elem)) {
+    req->qid = qid;
+    if (!virtqueue_pop(s->vqs[qid], &req->elem)) {
         virtio_blk_free_request(req);
         return NULL;
     }
@@ -439,6 +441,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     MultiReqBuffer mrb = {
         .num_writes = 0,
     };
+    unsigned qid = virtio_get_queue_index(vq);
 
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
     /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
@@ -450,7 +453,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     }
 #endif
 
-    while ((req = virtio_blk_get_request(s))) {
+    while ((req = virtio_blk_get_request(s, qid))) {
         virtio_blk_handle_request(req, &mrb);
     }
 
@@ -556,6 +559,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
     blkcfg.physical_block_exp = get_physical_block_exp(s->conf);
     blkcfg.alignment_offset = 0;
     blkcfg.wce = bdrv_enable_write_cache(s->bs);
+    stw_p(&blkcfg.num_queues, s->blk.num_queues);
     memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
 }
 
@@ -590,6 +594,10 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
     if (bdrv_is_read_only(s->bs))
         features |= 1 << VIRTIO_BLK_F_RO;
 
+    if (s->blk.num_queues > 1) {
+        features |= 1 << VIRTIO_BLK_F_MQ;
+    }
+
     return features;
 }
 
@@ -739,6 +747,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
     Error *err = NULL;
 #endif
+    int i;
     static int virtio_blk_id;
 
     if (!blk->conf.bs) {
@@ -765,7 +774,9 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     s->rq = NULL;
     s->sector_mask = (s->conf->logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
-    s->vq = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
+    s->vqs = g_malloc0(sizeof(VirtQueue *) * blk->num_queues);
+    for (i = 0; i < blk->num_queues; i++)
+        s->vqs[i] = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
     s->complete_request = virtio_blk_complete_request;
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
     virtio_blk_data_plane_create(vdev, blk, &s->dataplane, &err);
@@ -802,6 +813,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
     qemu_del_vm_change_state_handler(s->change);
     unregister_savevm(dev, "virtio-blk", s);
     blockdev_mark_auto_del(s->bs);
+    g_free(s->vqs);
     virtio_cleanup(vdev);
 }
 
@@ -809,6 +821,7 @@ static void virtio_blk_instance_init(Object *obj)
 {
     VirtIOBlock *s = VIRTIO_BLK(obj);
 
+    s->blk.num_queues = 1;    /* number of queues has to be at least 1 */
     s->obj_pool = NULL;
     object_property_add_link(obj, "iothread", TYPE_IOTHREAD,
                              (Object **)&s->blk.iothread,
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 5b0fb91..79c3017 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -122,6 +122,7 @@ struct VirtIOBlkConf
     uint32_t scsi;
     uint32_t config_wce;
     uint32_t data_plane;
+    uint32_t num_queues;
 };
 
 struct VirtIOBlockDataPlane;
@@ -130,7 +131,7 @@ struct VirtIOBlockReq;
 typedef struct VirtIOBlock {
     VirtIODevice parent_obj;
     BlockDriverState *bs;
-    VirtQueue *vq;
+    VirtQueue **vqs;
     void *rq;
     QEMUBH *bh;
     BlockConf *conf;
@@ -160,6 +161,7 @@ typedef struct VirtIOBlockReq {
     QEMUIOVector qiov;
     struct VirtIOBlockReq *next;
     BlockAcctCookie acct;
+    unsigned qid;
 } VirtIOBlockReq;
 
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s);
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 16/17] virtio-blk: dataplane: support multi virtqueue
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (14 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 15/17] virtio-blk: support multi queue for non-dataplane Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 17/17] hw/virtio-pci: introduce num_queues property Ming Lei
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This patch adds support for handling host notifies from multiple
virtqueues, while still processing and submitting I/O in a single
iothread.

One BH is introduced to process I/O from all virtqueues, so that
requests can be submitted to the kernel in batches as far as
possible.
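
In short, each host notifier handler only marks its virtqueue as
pending and schedules one BH; the BH then drains all pending
virtqueues in a single batch. Roughly (a distilled sketch of the
code in the patch below; 1UL avoids shifting a 32-bit constant):

    while ((qid = ffsl(pending))) {
        qid--;                       /* ffsl() returns a 1-based index */
        process_vq_notify(s, qid);
        pending &= ~(1UL << qid);
    }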

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 hw/block/dataplane/virtio-blk.c |  211 ++++++++++++++++++++++++++++-----------
 1 file changed, 153 insertions(+), 58 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index a0732e3..d61a920 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -26,6 +26,11 @@
 
 #define REQ_POOL_SZ 128
 
+typedef struct {
+    EventNotifier notifier;
+    VirtIOBlockDataPlane *s;
+} VirtIOBlockNotifier;
+
 struct VirtIOBlockDataPlane {
     bool started;
     bool starting;
@@ -35,9 +40,10 @@ struct VirtIOBlockDataPlane {
     VirtIOBlkConf *blk;
 
     VirtIODevice *vdev;
-    Vring vring;                    /* virtqueue vring */
-    EventNotifier *guest_notifier;  /* irq */
-    QEMUBH *bh;                     /* bh for guest notification */
+    Vring *vring;                    /* virtqueue vring */
+    EventNotifier **guest_notifier;  /* irq */
+    uint64_t   pending_guest_notifier;  /* pending guest notifiers, one bit per vq */
+    QEMUBH *bh;                      /* bh for guest notification */
 
     /* Note that these EventNotifiers are assigned by value.  This is
      * fine as long as you do not call event_notifier_cleanup on them
@@ -47,7 +53,9 @@ struct VirtIOBlockDataPlane {
     IOThread *iothread;
     IOThread internal_iothread_obj;
     AioContext *ctx;
-    EventNotifier host_notifier;    /* doorbell */
+    VirtIOBlockNotifier *host_notifier; /* doorbell */
+    uint64_t   pending_host_notifier;   /* pending host notifiers, one bit per vq */
+    QEMUBH *host_notifier_bh;           /* BH for handling host notifiers */
 
     /* Operation blocker on BDS */
     Error *blocker;
@@ -60,20 +68,26 @@ struct VirtIOBlockDataPlane {
 };
 
 /* Raise an interrupt to signal guest, if necessary */
-static void notify_guest(VirtIOBlockDataPlane *s)
+static void notify_guest(VirtIOBlockDataPlane *s, unsigned int qid)
 {
-    if (!vring_should_notify(s->vdev, &s->vring)) {
-        return;
+    if (vring_should_notify(s->vdev, &s->vring[qid])) {
+        event_notifier_set(s->guest_notifier[qid]);
     }
-
-    event_notifier_set(s->guest_notifier);
 }
 
 static void notify_guest_bh(void *opaque)
 {
     VirtIOBlockDataPlane *s = opaque;
+    unsigned int qid;
+    uint64_t pending = s->pending_guest_notifier;
+
+    s->pending_guest_notifier = 0;
 
-    notify_guest(s);
+    while ((qid = ffsl(pending))) {
+        qid--;
+        notify_guest(s, qid);
+        pending &= ~(1 << qid);
+    }
 }
 
 static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
@@ -81,7 +95,7 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
     VirtIOBlockDataPlane *s = req->dev->dataplane;
     stb_p(&req->in->status, status);
 
-    vring_push(&req->dev->dataplane->vring, &req->elem,
+    vring_push(&s->vring[req->qid], &req->elem,
                req->qiov.size + sizeof(*req->in));
 
     /* Suppress notification to guest by BH and its scheduled
@@ -90,17 +104,15 @@ static void complete_request_vring(VirtIOBlockReq *req, unsigned char status)
      * executed in dataplane aio context even after it is
      * stopped, so needn't worry about notification loss with BH.
      */
+    assert(req->qid < 64);
+    s->pending_guest_notifier |= (1 << req->qid);
     qemu_bh_schedule(s->bh);
 }
 
-static void handle_notify(EventNotifier *e)
+static void process_vq_notify(VirtIOBlockDataPlane *s, unsigned short qid)
 {
-    VirtIOBlockDataPlane *s = container_of(e, VirtIOBlockDataPlane,
-                                           host_notifier);
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
 
-    event_notifier_test_and_clear(&s->host_notifier);
-    bdrv_io_plug(s->blk->conf.bs);
     for (;;) {
         MultiReqBuffer mrb = {
             .num_writes = 0,
@@ -108,12 +120,13 @@ static void handle_notify(EventNotifier *e)
         int ret;
 
         /* Disable guest->host notifies to avoid unnecessary vmexits */
-        vring_disable_notification(s->vdev, &s->vring);
+        vring_disable_notification(s->vdev, &s->vring[qid]);
 
         for (;;) {
             VirtIOBlockReq *req = virtio_blk_alloc_request(vblk);
 
-            ret = vring_pop(s->vdev, &s->vring, &req->elem);
+            req->qid = qid;
+            ret = vring_pop(s->vdev, &s->vring[qid], &req->elem);
             if (ret < 0) {
                 virtio_blk_free_request(req);
                 break; /* no more requests */
@@ -132,16 +145,48 @@ static void handle_notify(EventNotifier *e)
             /* Re-enable guest->host notifies and stop processing the vring.
              * But if the guest has snuck in more descriptors, keep processing.
              */
-            if (vring_enable_notification(s->vdev, &s->vring)) {
+            if (vring_enable_notification(s->vdev, &s->vring[qid])) {
                 break;
             }
         } else { /* fatal error */
             break;
         }
     }
+}
+
+static void process_notify(void *opaque)
+{
+    VirtIOBlockDataPlane *s = opaque;
+    unsigned int qid;
+    uint64_t pending = s->pending_host_notifier;
+
+    s->pending_host_notifier = 0;
+
+    bdrv_io_plug(s->blk->conf.bs);
+    while ((qid = ffsl(pending))) {
+        qid--;
+        process_vq_notify(s, qid);
+        pending &= ~(1 << qid);
+    }
     bdrv_io_unplug(s->blk->conf.bs);
 }
 
+/* TODO: handle requests from other vqs together */
+static void handle_notify(EventNotifier *e)
+{
+    VirtIOBlockNotifier *n = container_of(e, VirtIOBlockNotifier,
+		                          notifier);
+    VirtIOBlockDataPlane *s = n->s;
+    unsigned int qid = n - &s->host_notifier[0];
+
+    assert(qid < 64);
+
+    event_notifier_test_and_clear(e);
+
+    s->pending_host_notifier |= (1 << qid);
+    qemu_bh_schedule(s->host_notifier_bh);
+}
+
 /* Context: QEMU global mutex held */
 void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
                                   VirtIOBlockDataPlane **dataplane,
@@ -197,6 +242,11 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
     s->ctx = iothread_get_aio_context(s->iothread);
     s->bh = aio_bh_new(s->ctx, notify_guest_bh, s);
 
+    s->vring = g_new0(Vring, blk->num_queues);
+    s->guest_notifier = g_new(EventNotifier *, blk->num_queues);
+    s->host_notifier = g_new(VirtIOBlockNotifier, blk->num_queues);
+    s->host_notifier_bh = aio_bh_new(s->ctx, process_notify, s);
+
     error_setg(&s->blocker, "block device is in use by data plane");
     bdrv_op_block_all(blk->conf.bs, s->blocker);
 
@@ -217,16 +267,83 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
     error_free(s->blocker);
     object_unref(OBJECT(s->iothread));
     qemu_bh_delete(s->bh);
+    qemu_bh_delete(s->host_notifier_bh);
+    g_free(s->vring);
+    g_free(s->guest_notifier);
+    g_free(s->host_notifier);
     g_free(s);
 }
 
+static int pre_start_vq(VirtIOBlockDataPlane *s, BusState *qbus,
+                        VirtioBusClass *k)
+{
+    int i;
+    int num = s->blk->num_queues;
+    VirtQueue *vq[num];
+
+    for (i = 0; i < num; i++) {
+        vq[i] = virtio_get_queue(s->vdev, i);
+        if (!vring_setup(&s->vring[i], s->vdev, i)) {
+            return -1;
+        }
+    }
+
+    /* Set up guest notifier (irq) */
+    if (k->set_guest_notifiers(qbus->parent, num, true) != 0) {
+        fprintf(stderr, "virtio-blk failed to set guest notifier, "
+                "ensure -enable-kvm is set\n");
+        exit(1);
+    }
+
+    for (i = 0; i < num; i++)
+        s->guest_notifier[i] = virtio_queue_get_guest_notifier(vq[i]);
+    s->pending_guest_notifier = 0;
+
+    /* Set up virtqueue notify */
+    for (i = 0; i < num; i++) {
+        if (k->set_host_notifier(qbus->parent, i, true) != 0) {
+            fprintf(stderr, "virtio-blk failed to set host notifier\n");
+            exit(1);
+        }
+        s->host_notifier[i].notifier = *virtio_queue_get_host_notifier(vq[i]);
+        s->host_notifier[i].s = s;
+    }
+    s->pending_host_notifier = 0;
+
+    return 0;
+}
+
+static void post_start_vq(VirtIOBlockDataPlane *s)
+{
+    int i;
+    int num = s->blk->num_queues;
+
+    for (i = 0; i < num; i++) {
+        VirtQueue *vq;
+        vq = virtio_get_queue(s->vdev, i);
+
+        /* Kick right away to begin processing requests already in vring */
+        event_notifier_set(virtio_queue_get_host_notifier(vq));
+    }
+
+    if (s->raw_format) {
+        bdrv_set_bypass_co(s->blk->conf.bs, true);
+    }
+
+    /* Get this show started by hooking up our callbacks */
+    aio_context_acquire(s->ctx);
+    for (i = 0; i < num; i++)
+        aio_set_event_notifier(s->ctx, &s->host_notifier[i].notifier,
+                               handle_notify);
+    aio_context_release(s->ctx);
+}
+
 /* Context: QEMU global mutex held */
 void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 {
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s->vdev)));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
-    VirtQueue *vq;
 
     if (s->started) {
         return;
@@ -238,51 +355,24 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
 
     s->starting = true;
 
-    vq = virtio_get_queue(s->vdev, 0);
-    if (!vring_setup(&s->vring, s->vdev, 0)) {
-        s->starting = false;
-        return;
-    }
-
     vblk->obj_pool = &s->req_pool;
     obj_pool_init(vblk->obj_pool, s->reqs, s->free_reqs,
                   sizeof(VirtIOBlockReq), REQ_POOL_SZ);
 
-    /* Set up guest notifier (irq) */
-    if (k->set_guest_notifiers(qbus->parent, 1, true) != 0) {
-        fprintf(stderr, "virtio-blk failed to set guest notifier, "
-                "ensure -enable-kvm is set\n");
-        exit(1);
-    }
-    s->guest_notifier = virtio_queue_get_guest_notifier(vq);
-
-    /* Set up virtqueue notify */
-    if (k->set_host_notifier(qbus->parent, 0, true) != 0) {
-        fprintf(stderr, "virtio-blk failed to set host notifier\n");
-        exit(1);
-    }
-    s->host_notifier = *virtio_queue_get_host_notifier(vq);
-
     s->saved_complete_request = vblk->complete_request;
     vblk->complete_request = complete_request_vring;
 
+    if (pre_start_vq(s, qbus, k)) {
+        s->starting = false;
+        return;
+    }
+
     s->starting = false;
     s->started = true;
     trace_virtio_blk_data_plane_start(s);
 
     bdrv_set_aio_context(s->blk->conf.bs, s->ctx);
-
-    /* Kick right away to begin processing requests already in vring */
-    event_notifier_set(virtio_queue_get_host_notifier(vq));
-
-    if (s->raw_format) {
-        bdrv_set_bypass_co(s->ctx, true);
-    }
-
-    /* Get this show started by hooking up our callbacks */
-    aio_context_acquire(s->ctx);
-    aio_set_event_notifier(s->ctx, &s->host_notifier, handle_notify);
-    aio_context_release(s->ctx);
+    post_start_vq(s);
 }
 
 /* Context: QEMU global mutex held */
@@ -291,6 +381,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s->vdev)));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
     VirtIOBlock *vblk = VIRTIO_BLK(s->vdev);
+    int i;
+    int num = s->blk->num_queues;
     if (!s->started || s->stopping) {
         return;
     }
@@ -301,7 +393,8 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
     aio_context_acquire(s->ctx);
 
     /* Stop notifications for new requests from guest */
-    aio_set_event_notifier(s->ctx, &s->host_notifier, NULL);
+    for (i = 0; i < num; i++)
+        aio_set_event_notifier(s->ctx, &s->host_notifier[i].notifier, NULL);
 
     /* Drain and switch bs back to the QEMU main loop */
     bdrv_set_aio_context(s->blk->conf.bs, qemu_get_aio_context());
@@ -311,17 +404,19 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
     vblk->obj_pool = NULL;
 
     if (s->raw_format) {
-        bdrv_set_bypass_co(s->ctx, false);
+        bdrv_set_bypass_co(s->blk->conf.bs, false);
     }
     /* Sync vring state back to virtqueue so that non-dataplane request
      * processing can continue when we disable the host notifier below.
      */
-    vring_teardown(&s->vring, s->vdev, 0);
+    for (i = 0; i < num; i++)
+        vring_teardown(&s->vring[i], s->vdev, 0);
 
-    k->set_host_notifier(qbus->parent, 0, false);
+    for (i = 0; i < num; i++)
+        k->set_host_notifier(qbus->parent, i, false);
 
     /* Clean up guest notifier (irq) */
-    k->set_guest_notifiers(qbus->parent, 1, false);
+    k->set_guest_notifiers(qbus->parent, num, false);
 
     s->started = false;
     s->stopping = false;
-- 
1.7.9.5


* [Qemu-devel] [PATCH v1 17/17] hw/virtio-pci: introduce num_queues property
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (15 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 16/17] virtio-blk: dataplane: support multi virtqueue Ming Lei
@ 2014-08-05  3:33 ` Ming Lei
  2014-08-05  9:38 ` [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Stefan Hajnoczi
  2014-08-05  9:48 ` Kevin Wolf
  18 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  3:33 UTC (permalink / raw)
  To: qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin

This patch introduces a 'num_queues' parameter,
so that virtio-blk can support multiple virtqueues.

The virtio-blk multi virtqueue feature will be added to
virtio spec 1.1[1], and the 3.17 Linux kernel[2] will support
the feature in the virtio-blk driver. For those who want to try
it now, the kernel-side patches can be found in either
Jens's block tree[3] or linux-next[4].

In my fio test in a VM hosted on the server, with "num_queues"
set to 4 and the fio JOBS parameter set to 4, throughput improves
by 25% compared with a single virtqueue at any JOBS value.

Scalability is improved much more than raw throughput,
for example:
        ---------------------------------------------------
                | VM in server host, 4 virtqueues vs. 1 virtqueue
        ---------------------------------------------------
        JOBS=2  | +10%
        ---------------------------------------------------
        JOBS=4  | +78%
        ---------------------------------------------------

[1], http://marc.info/?l=linux-api&m=140486843317107&w=2
[2], http://marc.info/?l=linux-api&m=140418368421229&w=2
[3], http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/ #for-3.17/drivers
[4], https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/
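
As an illustration (not part of this patch; the drive options shown
are just the usual ones), a 4-queue disk could then be created with:

    qemu-system-x86_64 ... \
        -drive if=none,id=drive0,file=test.img,format=raw,cache=none,aio=native \
        -device virtio-blk-pci,drive=drive0,num_queues=4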

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 hw/block/virtio-blk.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index baec8f8..58f8296 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -834,6 +834,7 @@ static Property virtio_blk_properties[] = {
     DEFINE_BLOCK_CHS_PROPERTIES(VirtIOBlock, blk.conf),
     DEFINE_PROP_STRING("serial", VirtIOBlock, blk.serial),
     DEFINE_PROP_BIT("config-wce", VirtIOBlock, blk.config_wce, 0, true),
+    DEFINE_PROP_UINT32("num_queues", VirtIOBlock, blk.num_queues, 1),
 #ifdef __linux__
     DEFINE_PROP_BIT("scsi", VirtIOBlock, blk.scsi, 0, true),
 #endif
-- 
1.7.9.5


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (16 preceding siblings ...)
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 17/17] hw/virtio-pci: introduce num_queues property Ming Lei
@ 2014-08-05  9:38 ` Stefan Hajnoczi
  2014-08-05  9:50   ` Ming Lei
  2014-08-05  9:48 ` Kevin Wolf
  18 siblings, 1 reply; 81+ messages in thread
From: Stefan Hajnoczi @ 2014-08-05  9:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Paolo Bonzini

On Tue, Aug 05, 2014 at 11:33:01AM +0800, Ming Lei wrote:
> These patches bring up below 4 changes:
>         - introduce object allocation pool and apply it to
>         virtio-blk dataplane for improving its performance
> 
>         - introduce selective coroutine bypass mechanism
>         for improving performance of virtio-blk dataplane with
>         raw format image
> 
>         - linux-aio changes: fixing for cases of -EAGAIN and partial
>         completion, increase max events to 256, and remove one unuseful
>         fields in 'struct qemu_laiocb'
> 
>         - support multi virtqueue for virtio-blk

Please split up this patch series into separate patch series.

These are independent changes and there is no reason to combine them.
You're doing yourself a disservice because changes that are ready to be
applied are getting held up by those that still need more discussion.

That will also make the performance discussions easier to follow since
each patch series should include performance results, making it easy to
understand how much improvement each change brings.

Stefan


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
                   ` (17 preceding siblings ...)
  2014-08-05  9:38 ` [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Stefan Hajnoczi
@ 2014-08-05  9:48 ` Kevin Wolf
  2014-08-05 10:00   ` Ming Lei
  18 siblings, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-05  9:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

Am 05.08.2014 um 05:33 hat Ming Lei geschrieben:
> Hi,
> 
> These patches bring up below 4 changes:
>         - introduce object allocation pool and apply it to
>         virtio-blk dataplane for improving its performance
> 
>         - introduce selective coroutine bypass mechanism
>         for improving performance of virtio-blk dataplane with
>         raw format image

Before applying any bypassing patches, I think we should understand in
detail where we are losing performance with coroutines enabled.

I also think that the device emulation has no business in deciding
whether the bypass is used (it depends solely on conditions outside of
the device) and that leaking the fd number out of raw-posix is wrong.
Both of them are layering violations that shouldn't be reintroduced.

>         - linux-aio changes: fixing for cases of -EAGAIN and partial
>         completion, increase max events to 256, and remove one unuseful
>         fields in 'struct qemu_laiocb'
> 
>         - support multi virtqueue for virtio-blk

Like Stefan said, the series should be split in four, one for each item
in your list, so that each optimisation can be evaluated on its own.

Kevin


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05  9:38 ` [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Stefan Hajnoczi
@ 2014-08-05  9:50   ` Ming Lei
  2014-08-05  9:56     ` Kevin Wolf
  2014-08-05 13:59     ` Stefan Hajnoczi
  0 siblings, 2 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05  9:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Paolo Bonzini

On Tue, Aug 5, 2014 at 5:38 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Tue, Aug 05, 2014 at 11:33:01AM +0800, Ming Lei wrote:
>> These patches bring up below 4 changes:
>>         - introduce object allocation pool and apply it to
>>         virtio-blk dataplane for improving its performance
>>
>>         - introduce selective coroutine bypass mechanism
>>         for improving performance of virtio-blk dataplane with
>>         raw format image
>>
>>         - linux-aio changes: fixing for cases of -EAGAIN and partial
>>         completion, increase max events to 256, and remove one unuseful
>>         fields in 'struct qemu_laiocb'
>>
>>         - support multi virtqueue for virtio-blk
>
> Please split up this patch series into separate patch series.
>
> These are independent changes and there is no reason to combine them.
> You're doing yourself a disservice because changes that are ready to be
> applied are getting held up by those that still need more discussion.

Without the previous optimization patches, the mq conversion can't
achieve as much improvement; that is why I put them together.

Also, the mq conversion depends on the linux-aio fix.

It also becomes difficult to test these patches if they are split up,
and describing the dependencies is a bit annoying too.

> That will also make the performance discussions easier to follow since
> each patch series should include performance results, making it easy to
> understand how much improvement each change brings.

The numbers can be found inside the patches: patch 02 has the
numbers for the object pool, patch 09 for bypassing coroutines,
and patch 17 for the mq conversion.

Thanks,


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05  9:50   ` Ming Lei
@ 2014-08-05  9:56     ` Kevin Wolf
  2014-08-05 10:50       ` Ming Lei
  2014-08-05 13:59     ` Stefan Hajnoczi
  1 sibling, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-05  9:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

Am 05.08.2014 um 11:50 hat Ming Lei geschrieben:
> On Tue, Aug 5, 2014 at 5:38 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Tue, Aug 05, 2014 at 11:33:01AM +0800, Ming Lei wrote:
> >> These patches bring up below 4 changes:
> >>         - introduce object allocation pool and apply it to
> >>         virtio-blk dataplane for improving its performance
> >>
> >>         - introduce selective coroutine bypass mechanism
> >>         for improving performance of virtio-blk dataplane with
> >>         raw format image
> >>
> >>         - linux-aio changes: fixing for cases of -EAGAIN and partial
> >>         completion, increase max events to 256, and remove one unuseful
> >>         fields in 'struct qemu_laiocb'
> >>
> >>         - support multi virtqueue for virtio-blk
> >
> > Please split up this patch series into separate patch series.
> >
> > These are independent changes and there is no reason to combine them.
> > You're doing yourself a disservice because changes that are ready to be
> > applied are getting held up by those that still need more discussion.
> 
> Without previous optimization patches, the mq conversion can't
> obtain so much improvement, that is why I put them together.
> 
> Also mq conversion depends on linux-aio fix too.
> 
> Also it becomes a difficult to test these patches if they are splitted,
> and describing the dependency is a bit annoying too.
> 
> > That will also make the performance discussions easier to follow since
> > each patch series should include performance results, making it easy to
> > understand how much improvement each change brings.
> 
> The number can be found inside patches, for example, patch 02 has
> the number for using obj pool, and patch 09 has the number for
> bypassing coroutine, and patch 17 has the number for mq conversion.

A claim like "~5%-10% throughput improvement" is neither hard numbers
nor a precise description of your benchmark setup.

Kevin


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05  9:48 ` Kevin Wolf
@ 2014-08-05 10:00   ` Ming Lei
  2014-08-05 11:44     ` Paolo Bonzini
  2014-08-05 13:48     ` Stefan Hajnoczi
  0 siblings, 2 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05 10:00 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Tue, Aug 5, 2014 at 5:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 05.08.2014 um 05:33 hat Ming Lei geschrieben:
>> Hi,
>>
>> These patches bring up below 4 changes:
>>         - introduce object allocation pool and apply it to
>>         virtio-blk dataplane for improving its performance
>>
>>         - introduce selective coroutine bypass mechanism
>>         for improving performance of virtio-blk dataplane with
>>         raw format image
>
> Before applying any bypassing patches, I think we should understand in
> detail where we are losing performance with coroutines enabled.

From the profiling data below, the CPU becomes slower at running
instructions with coroutines, and CPU dcache misses increase, so it
is very likely caused by frequent stack switching.

http://marc.info/?l=qemu-devel&m=140679721126306&w=2

http://pastebin.com/ae0vnQ6V
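
(For reference, numbers like these can be gathered with something
along the lines of the following; this is an illustration, not
necessarily the exact invocation used.)

    perf stat -e instructions,cycles,L1-dcache-load-misses \
        -p $(pidof qemu-system-x86_64) -- sleep 30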

>
> I also think that the device emulation has no business in deciding
> whether the bypass is used (it depends solely on conditions outside of
> the device) and that leaking the fd number out of raw-posix is wrong.
> Both of them are layering violations that shouldn't be reintroduced.

Yes, that is right, and I have added comments noting that the bypass
hint will be moved into the block layer completely in the future.

Thanks,


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05  9:56     ` Kevin Wolf
@ 2014-08-05 10:50       ` Ming Lei
  0 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-05 10:50 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Tue, Aug 5, 2014 at 5:56 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 05.08.2014 um 11:50 hat Ming Lei geschrieben:
>> On Tue, Aug 5, 2014 at 5:38 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> > On Tue, Aug 05, 2014 at 11:33:01AM +0800, Ming Lei wrote:
>> >> These patches bring up below 4 changes:
>> >>         - introduce object allocation pool and apply it to
>> >>         virtio-blk dataplane for improving its performance
>> >>
>> >>         - introduce selective coroutine bypass mechanism
>> >>         for improving performance of virtio-blk dataplane with
>> >>         raw format image
>> >>
>> >>         - linux-aio changes: fixing for cases of -EAGAIN and partial
>> >>         completion, increase max events to 256, and remove one unuseful
>> >>         fields in 'struct qemu_laiocb'
>> >>
>> >>         - support multi virtqueue for virtio-blk
>> >
>> > Please split up this patch series into separate patch series.
>> >
>> > These are independent changes and there is no reason to combine them.
>> > You're doing yourself a disservice because changes that are ready to be
>> > applied are getting held up by those that still need more discussion.
>>
>> Without previous optimization patches, the mq conversion can't
>> obtain so much improvement, that is why I put them together.
>>
>> Also mq conversion depends on linux-aio fix too.
>>
>> Also it becomes a difficult to test these patches if they are splitted,
>> and describing the dependency is a bit annoying too.
>>
>> > That will also make the performance discussions easier to follow since
>> > each patch series should include performance results, making it easy to
>> > understand how much improvement each change brings.
>>
>> The number can be found inside patches, for example, patch 02 has
>> the number for using obj pool, and patch 09 has the number for
>> bypassing coroutine, and patch 17 has the number for mq conversion.
>
> A claim like "~5%-10% throughput improvement" isn't numbers nor a

Sorry, it is a range from 5% to 10% in my tests.

> precise description of your benchmark setup.

The benchmark setup can be found in the 0/17 cover letter; it is
basically a fio (randread, libaio, direct I/O, ...) test running
in a VM.

Sorry for not describing it clearly.

Thanks,


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05 10:00   ` Ming Lei
@ 2014-08-05 11:44     ` Paolo Bonzini
  2014-08-05 13:48     ` Stefan Hajnoczi
  1 sibling, 0 replies; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-05 11:44 UTC (permalink / raw)
  To: Ming Lei, Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, qemu-devel, Stefan Hajnoczi,
	Michael S. Tsirkin

Il 05/08/2014 12:00, Ming Lei ha scritto:
>> >
>> > I also think that the device emulation has no business in deciding
>> > whether the bypass is used (it depends solely on conditions outside of
>> > the device) and that leaking the fd number out of raw-posix is wrong.
>> > Both of them are layering violations that shouldn't be reintroduced.
> Yes, that is right, and I have added comments that the bypass hint will
> be moved to block layer completely in future.

Actually, it will never be accepted in the first place.

We have told you repeatedly that the bypass as you wrote it is buggy and
a layering violation.

Paolo


* Re: [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool Ming Lei
@ 2014-08-05 11:55   ` Eric Blake
  2014-08-05 12:05     ` Michael S. Tsirkin
  2014-08-06  2:35     ` Ming Lei
  0 siblings, 2 replies; 81+ messages in thread
From: Eric Blake @ 2014-08-05 11:55 UTC (permalink / raw)
  To: Ming Lei, qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Michael S. Tsirkin

On 08/04/2014 09:33 PM, Ming Lei wrote:
> This patch introduces object allocation pool for speeding up
> object allocation in fast path.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  include/qemu/obj_pool.h |   64 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
>  create mode 100644 include/qemu/obj_pool.h
> 
> diff --git a/include/qemu/obj_pool.h b/include/qemu/obj_pool.h
> new file mode 100644
> index 0000000..94b5f49
> --- /dev/null
> +++ b/include/qemu/obj_pool.h
> @@ -0,0 +1,64 @@
> +#ifndef QEMU_OBJ_POOL_HEAD
> +#define QEMU_OBJ_POOL_HEAD

Missing copyright boilerplate.  According to LICENSE, that makes this
file GPLv2+, but I'd much rather you make it explicit.
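
For instance, the usual QEMU-style header could be used (a sketch;
fill in the actual author):

    /*
     * Object allocation pool
     *
     * Copyright (C) 2014 <author>
     *
     * This work is licensed under the terms of the GNU GPL, version 2
     * or later.  See the COPYING file in the top-level directory.
     */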

> +
> +typedef struct {
> +    unsigned int size;
> +    unsigned int cnt;

size_t feels better for sizes.  int may be okay in this case, but
definitely consider if size_t is appropriate.

> +
> +    void **free_obj;
> +    int free_idx;
> +
> +    char *objs;
> +} ObjPool;
> +
> +static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
> +                                 unsigned int obj_size, unsigned cnt)
> +{
> +    int i;
> +
> +    op->objs = (char *)objs_buf;

Why the cast? This is C, not C++.

> +    op->free_obj = free_objs;
> +    op->size = obj_size;
> +    op->cnt = cnt;
> +
> +    for (i = 0; i < op->cnt; i++) {
> +        op->free_obj[i] = (void *)&op->objs[i * op->size];

Again, why the cast?


> +static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
> +{
> +    return op && (unsigned long)obj >= (unsigned long)&op->objs[0] &&
> +           (unsigned long)obj <=
> +           (unsigned long)&op->objs[(op->cnt - 1) * op->size];

uintptr_t, not unsigned long.  You are asking for problems on 64-bit
mingw, where unsigned long is 32 bits but uintptr_t is 64 bits.
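
Something like this instead (sketch):

    static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
    {
        uintptr_t p = (uintptr_t)obj;

        return op && p >= (uintptr_t)&op->objs[0] &&
               p <= (uintptr_t)&op->objs[(op->cnt - 1) * op->size];
    }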

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool
  2014-08-05 11:55   ` Eric Blake
@ 2014-08-05 12:05     ` Michael S. Tsirkin
  2014-08-05 12:21       ` Eric Blake
  2014-08-06  2:35     ` Ming Lei
  1 sibling, 1 reply; 81+ messages in thread
From: Michael S. Tsirkin @ 2014-08-05 12:05 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Ming Lei, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Tue, Aug 05, 2014 at 05:55:49AM -0600, Eric Blake wrote:
> On 08/04/2014 09:33 PM, Ming Lei wrote:
> > This patch introduces object allocation pool for speeding up
> > object allocation in fast path.
> > 
> > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> > ---
> >  include/qemu/obj_pool.h |   64 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 64 insertions(+)
> >  create mode 100644 include/qemu/obj_pool.h
> > 
> > diff --git a/include/qemu/obj_pool.h b/include/qemu/obj_pool.h
> > new file mode 100644
> > index 0000000..94b5f49
> > --- /dev/null
> > +++ b/include/qemu/obj_pool.h
> > @@ -0,0 +1,64 @@
> > +#ifndef QEMU_OBJ_POOL_HEAD
> > +#define QEMU_OBJ_POOL_HEAD
> 
> Missing copyright boilerplate.  According to LICENSE, that makes this
> file GPLv2+, but I'd much rather you make it explicit.
> 
> > +
> > +typedef struct {
> > +    unsigned int size;
> > +    unsigned int cnt;
> 
> size_t feels better for sizes.  int may be okay in this case, but
> definitely consider if size_t is appropriate.
> 
> > +
> > +    void **free_obj;
> > +    int free_idx;
> > +
> > +    char *objs;
> > +} ObjPool;
> > +
> > +static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
> > +                                 unsigned int obj_size, unsigned cnt)
> > +{
> > +    int i;
> > +
> > +    op->objs = (char *)objs_buf;
> 
> Why the cast? This is C, not C++.

It's not needed in C++ either, right?

> > +    op->free_obj = free_objs;
> > +    op->size = obj_size;
> > +    op->cnt = cnt;
> > +
> > +    for (i = 0; i < op->cnt; i++) {
> > +        op->free_obj[i] = (void *)&op->objs[i * op->size];
> 
> Again, why the cast?
> 
> 
> > +static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
> > +{
> > +    return op && (unsigned long)obj >= (unsigned long)&op->objs[0] &&
> > +           (unsigned long)obj <=
> > +           (unsigned long)&op->objs[(op->cnt - 1) * op->size];
> 
> uintptr_t, not unsigned long.  You are asking for problems on 64-bit
> mingw, where unsigned long is 32 bits but uintptr_t is 64 bits.
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


* Re: [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool
  2014-08-05 12:05     ` Michael S. Tsirkin
@ 2014-08-05 12:21       ` Eric Blake
  2014-08-05 12:51         ` Michael S. Tsirkin
  0 siblings, 1 reply; 81+ messages in thread
From: Eric Blake @ 2014-08-05 12:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Ming Lei, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On 08/05/2014 06:05 AM, Michael S. Tsirkin wrote:
> On Tue, Aug 05, 2014 at 05:55:49AM -0600, Eric Blake wrote:
>> On 08/04/2014 09:33 PM, Ming Lei wrote:
>>> This patch introduces object allocation pool for speeding up
>>> object allocation in fast path.
>>>
>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>> ---
>>>  include/qemu/obj_pool.h |   64 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 64 insertions(+)
>>>  create mode 100644 include/qemu/obj_pool.h
>>>

>>> +
>>> +    char *objs;
>>> +} ObjPool;
>>> +
>>> +static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
>>> +                                 unsigned int obj_size, unsigned cnt)
>>> +{
>>> +    int i;
>>> +
>>> +    op->objs = (char *)objs_buf;
>>
>> Why the cast? This is C, not C++.
> 
> It's not needed in C++ either, right?

In C++, going from void* to a typed pointer requires a cast (that's why
in C++ you see casts on malloc results).  In C, void* can implicitly be
converted to any other pointer (modulo const-/volatile-correctness).
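
A two-line illustration:

    void *p = malloc(16);
    char *s = p;    /* implicit conversion: fine in C, error in C++ */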

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v1 02/17] dataplane: use object pool to speed up allocation for virtio blk request
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 02/17] dataplane: use object pool to speed up allocation for virtio blk request Ming Lei
@ 2014-08-05 12:30   ` Eric Blake
  2014-08-06  2:45     ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Eric Blake @ 2014-08-05 12:30 UTC (permalink / raw)
  To: Ming Lei, qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Michael S. Tsirkin

On 08/04/2014 09:33 PM, Ming Lei wrote:
> g_slice_new(VirtIOBlockReq), its free pair and access the instance

Took me a while to read this.  Maybe:

Calling g_slice_new(VirtIOBlockReq) and its free pair, and accessing the
instance, are a bit slow...

> is a bit slow since sizeof(VirtIOBlockReq) takes more than 40KB,
> so use object pool to speed up its allocation and release.
> 
> With this patch, ~5%-10% throughput improvement is observed in the VM
> based on server.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  hw/block/dataplane/virtio-blk.c |   12 ++++++++++++
>  hw/block/virtio-blk.c           |   13 +++++++++++--
>  include/hw/virtio/virtio-blk.h  |    2 ++
>  3 files changed, 25 insertions(+), 2 deletions(-)

> @@ -50,6 +52,10 @@ struct VirtIOBlockDataPlane {
>      Error *blocker;
>      void (*saved_complete_request)(struct VirtIOBlockReq *req,
>                                     unsigned char status);
> +
> +    VirtIOBlockReq  reqs[REQ_POOL_SZ];
> +    void *free_reqs[REQ_POOL_SZ];
> +    ObjPool  req_pool;

Why two instances of double spaces?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v1 05/17] garbage collector: introduced for support of bypass coroutine
  2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 05/17] garbage collector: introduced for support of " Ming Lei
@ 2014-08-05 12:43   ` Eric Blake
  0 siblings, 0 replies; 81+ messages in thread
From: Eric Blake @ 2014-08-05 12:43 UTC (permalink / raw)
  To: Ming Lei, qemu-devel, Peter Maydell, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Michael S. Tsirkin

On 08/04/2014 09:33 PM, Ming Lei wrote:
> In case of bypass coroutine, some buffers in stack have to convert
> to survive in the whole I/O submit & completion cycle.
> 
> Garbase collector is one of the best data structure for this purpose,

s/Garbase/A garbage/

> as I thought of.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  include/qemu/gc.h |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
>  create mode 100644 include/qemu/gc.h
> 
> diff --git a/include/qemu/gc.h b/include/qemu/gc.h
> new file mode 100644
> index 0000000..b9a3f6e
> --- /dev/null
> +++ b/include/qemu/gc.h
> @@ -0,0 +1,56 @@
> +#ifndef QEMU_GC_HEADER
> +#define QEMU_GC_HEADER

Missing copyright boilerplate.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool
  2014-08-05 12:21       ` Eric Blake
@ 2014-08-05 12:51         ` Michael S. Tsirkin
  0 siblings, 0 replies; 81+ messages in thread
From: Michael S. Tsirkin @ 2014-08-05 12:51 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Ming Lei, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Tue, Aug 05, 2014 at 06:21:55AM -0600, Eric Blake wrote:
> On 08/05/2014 06:05 AM, Michael S. Tsirkin wrote:
> > On Tue, Aug 05, 2014 at 05:55:49AM -0600, Eric Blake wrote:
> >> On 08/04/2014 09:33 PM, Ming Lei wrote:
> >>> This patch introduces object allocation pool for speeding up
> >>> object allocation in fast path.
> >>>
> >>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >>> ---
> >>>  include/qemu/obj_pool.h |   64 +++++++++++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 64 insertions(+)
> >>>  create mode 100644 include/qemu/obj_pool.h
> >>>
> 
> >>> +
> >>> +    char *objs;
> >>> +} ObjPool;
> >>> +
> >>> +static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
> >>> +                                 unsigned int obj_size, unsigned cnt)
> >>> +{
> >>> +    int i;
> >>> +
> >>> +    op->objs = (char *)objs_buf;
> >>
> >> Why the cast? This is C, not C++.
> > 
> > It's not needed in C++ either, right?
> 
> In C++, going from void* to a typed pointer requires a cast (that's why
> in C++ you see casts on malloc results).

Ah yes, I was confusing this with going from char * to void *.
You are right, thanks for the reminder.

>  In C, void* can implicitly be
> converted to any other pointer (modulo const-/volatile-correctness).

Yes: and const and volatile safety is exactly the reason
one *shouldn't* typically cast to/from void * explicitly.

> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05 10:00   ` Ming Lei
  2014-08-05 11:44     ` Paolo Bonzini
@ 2014-08-05 13:48     ` Stefan Hajnoczi
  2014-08-05 14:47       ` Kevin Wolf
  1 sibling, 1 reply; 81+ messages in thread
From: Stefan Hajnoczi @ 2014-08-05 13:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Paolo Bonzini

On Tue, Aug 05, 2014 at 06:00:22PM +0800, Ming Lei wrote:
> On Tue, Aug 5, 2014 at 5:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 05.08.2014 um 05:33 hat Ming Lei geschrieben:
> >> Hi,
> >>
> >> These patches bring up below 4 changes:
> >>         - introduce object allocation pool and apply it to
> >>         virtio-blk dataplane for improving its performance
> >>
> >>         - introduce selective coroutine bypass mechanism
> >>         for improving performance of virtio-blk dataplane with
> >>         raw format image
> >
> > Before applying any bypassing patches, I think we should understand in
> > detail where we are losing performance with coroutines enabled.
> 
> From the below profiling data, CPU becomes slow to run instructions
> with coroutine, and CPU dcache miss is increased so it is very
> likely caused by switching stack frequently.
> 
> http://marc.info/?l=qemu-devel&m=140679721126306&w=2
> 
> http://pastebin.com/ae0vnQ6V

I have been wondering how to prove that the root cause is the ucontext
coroutine mechanism (stack switching).  Here is an idea:

Hack your "bypass" code path to run the request inside a coroutine.
That way you can compare "bypass without coroutine" against "bypass with
coroutine".

Right now I think there are doubts because the bypass code path is
indeed a different (and not 100% correct) code path.  So this approach
might prove that the coroutines are adding the overhead and not
something that you bypassed.

Stefan


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05  9:50   ` Ming Lei
  2014-08-05  9:56     ` Kevin Wolf
@ 2014-08-05 13:59     ` Stefan Hajnoczi
  1 sibling, 0 replies; 81+ messages in thread
From: Stefan Hajnoczi @ 2014-08-05 13:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Paolo Bonzini

On Tue, Aug 05, 2014 at 05:50:42PM +0800, Ming Lei wrote:
> On Tue, Aug 5, 2014 at 5:38 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Tue, Aug 05, 2014 at 11:33:01AM +0800, Ming Lei wrote:
> >> These patches bring up below 4 changes:
> >>         - introduce object allocation pool and apply it to
> >>         virtio-blk dataplane for improving its performance
> >>
> >>         - introduce selective coroutine bypass mechanism
> >>         for improving performance of virtio-blk dataplane with
> >>         raw format image
> >>
> >>         - linux-aio changes: fixing for cases of -EAGAIN and partial
> >>         completion, increase max events to 256, and remove one unuseful
> >>         fields in 'struct qemu_laiocb'
> >>
> >>         - support multi virtqueue for virtio-blk
> >
> > Please split up this patch series into separate patch series.
> >
> > These are independent changes and there is no reason to combine them.
> > You're doing yourself a disservice because changes that are ready to be
> > applied are getting held up by those that still need more discussion.
> 
> Without previous optimization patches, the mq conversion can't
> obtain so much improvement, that is why I put them together.

You can post two sets of numbers: "independent results" and "together
with series X, Y, and Z".

> Also mq conversion depends on linux-aio fix too.

No problem, just include a note in the cover letter for the mq series
stating that this series depends on the linux-aio fix.

Maintainers keep an eye out for that and make sure that the dependencies
are merged before applying.

> Also it becomes a difficult to test these patches if they are splitted,
> and describing the dependency is a bit annoying too.

I understand that you need to manage extra branches and sometimes rebase
between them.  But that's life.

The reason people are pushing back is that you are throwing a blob at
the mailing list and expecting reviewers to dissect it.  Reviewers have
to put more effort in than necessary.  As a result they are scrutinizing
your changes and are not comfortable with them in their current form.

If you want to get patches merged smoothly, split them up and justify
each series with a cover letter and performance results (if it's an
optimization).  That way you get reviewers on your side; they understand
and agree with the benefit of the series.  Make them *want* to merge the
patches.


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05 13:48     ` Stefan Hajnoczi
@ 2014-08-05 14:47       ` Kevin Wolf
  2014-08-06  5:33         ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-05 14:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, Ming Lei,
	qemu-devel, Paolo Bonzini

Am 05.08.2014 um 15:48 hat Stefan Hajnoczi geschrieben:
> On Tue, Aug 05, 2014 at 06:00:22PM +0800, Ming Lei wrote:
> > On Tue, Aug 5, 2014 at 5:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > > Am 05.08.2014 um 05:33 hat Ming Lei geschrieben:
> > >> Hi,
> > >>
> > >> These patches bring up below 4 changes:
> > >>         - introduce object allocation pool and apply it to
> > >>         virtio-blk dataplane for improving its performance
> > >>
> > >>         - introduce selective coroutine bypass mechanism
> > >>         for improving performance of virtio-blk dataplane with
> > >>         raw format image
> > >
> > > Before applying any bypassing patches, I think we should understand in
> > > detail where we are losing performance with coroutines enabled.
> > 
> > From the below profiling data, CPU becomes slow to run instructions
> > with coroutine, and CPU dcache miss is increased so it is very
> > likely caused by switching stack frequently.
> > 
> > http://marc.info/?l=qemu-devel&m=140679721126306&w=2
> > 
> > http://pastebin.com/ae0vnQ6V
> 
> I have been wondering how to prove that the root cause is the ucontext
> coroutine mechanism (stack switching).  Here is an idea:
> 
> Hack your "bypass" code path to run the request inside a coroutine.
> That way you can compare "bypass without coroutine" against "bypass with
> coroutine".
> 
> Right now I think there are doubts because the bypass code path is
> indeed a different (and not 100% correct) code path.  So this approach
> might prove that the coroutines are adding the overhead and not
> something that you bypassed.

My doubts aren't only that the overhead might not come from the
coroutines, but also whether any coroutine-related overhead is really
unavoidable. If we can optimise coroutines, I'd strongly prefer to do
just that instead of introducing additional code paths.

Another thought I had was this: If the performance difference is indeed
only coroutines, then that is completely inside the block layer and we
don't actually need a VM to test it. We could instead have something
like a simple qemu-img based benchmark and should be observing the same.

I played a bit with the following; I hope it's not too naive. I couldn't
see a difference with your patches, but at least one reason for this is
probably that my laptop SSD isn't fast enough to make the CPU the
bottleneck. I haven't tried a ramdisk yet; that would probably be the
next thing. (I actually wrote the patch up just for some profiling on my
own, not for comparing throughput, but it should be usable for that as
well.)
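
(Invocation is something like "./qemu-img bench -t none -n test.img",
per the DEF() line in the patch.)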

Kevin


diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index d029609..ae64b3d 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -9,6 +9,12 @@ STEXI
 @table @option
 ETEXI
 
+DEF("bench", img_bench,
+    "bench [-q] [-f fmt] [-n] [-t cache] filename")
+STEXI
@item bench [-q] [-f @var{fmt}] [-n] [-t @var{cache}] filename
+ETEXI
+
 DEF("check", img_check,
     "check [-q] [-f fmt] [--output=ofmt]  [-r [leaks | all]] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index d4518e7..92e9529 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2789,6 +2789,132 @@ out:
     return 0;
 }
 
+typedef struct BenchData {
+    BlockDriverState *bs;
+    int bufsize;
+    int nrreq;
+    int n;
+    uint8_t *buf;
+    QEMUIOVector *qiov;
+
+    int in_flight;
+    uint64_t sector;
+} BenchData;
+
+static void bench_cb(void *opaque, int ret)
+{
+    BenchData *b = opaque;
+    BlockDriverAIOCB *acb;
+
+    if (ret < 0) {
+        error_report("Failed request: %s\n", strerror(-ret));
+        exit(EXIT_FAILURE);
+    }
+    if (b->in_flight > 0) {
+        b->n--;
+        b->in_flight--;
+    }
+
+    while (b->n > b->in_flight && b->in_flight < b->nrreq) {
+        acb = bdrv_aio_readv(b->bs, b->sector, b->qiov,
+                             b->bufsize >> BDRV_SECTOR_BITS,
+                             bench_cb, b);
+        if (!acb) {
+            error_report("Failed to issue request");
+            exit(EXIT_FAILURE);
+        }
+        b->in_flight++;
+        b->sector += b->bufsize;
+        b->sector %= b->bs->total_sectors;
+    }
+}
+
+static int img_bench(int argc, char **argv)
+{
+    int c, ret = 0;
+    const char *fmt = NULL, *filename;
+    bool quiet = false;
+    BlockDriverState *bs = NULL;
+    int flags = BDRV_O_FLAGS;
+    int i;
+
+    for (;;) {
+        c = getopt(argc, argv, "hf:nqt:");
+        if (c == -1) {
+            break;
+        }
+
+        switch (c) {
+            case 'h':
+            case '?':
+                help();
+                break;
+            case 'f':
+                fmt = optarg;
+                break;
+            case 'n':
+                flags |= BDRV_O_NATIVE_AIO;
+                break;
+            case 'q':
+                quiet = true;
+                break;
+            case 't':
+                ret = bdrv_parse_cache_flags(optarg, &flags);
+                if (ret < 0) {
+                    error_report("Invalid cache mode");
+                    ret = -1;
+                    goto out;
+                }
+                break;
+        }
+    }
+
+    if (optind != argc - 1) {
+        error_exit("Expecting one image file name");
+    }
+    filename = argv[argc - 1];
+
+    bs = bdrv_new_open("image", filename, fmt, flags, true, quiet);
+    if (!bs) {
+        error_report("Could not open image '%s'", filename);
+        ret = -1;
+        goto out;
+    }
+
+    data = (BenchData) {
+        .bs = bs,
+        .bufsize = 0x1000,
+        .nrreq = 64,
+        .n = 75000,
+    };
+
+    data.buf = qemu_blockalign(bs, data.nrreq * data.bufsize);
+    data.qiov = g_new(QEMUIOVector, data.nrreq);
+    for (i = 0; i < data.nrreq; i++) {
+        qemu_iovec_init(&data.qiov[i], 1);
+        qemu_iovec_add(&data.qiov[i],
+                       data.buf + i * data.bufsize, data.bufsize);
+    }
+
+    bench_cb(&data, 0);
+
+    while (data.n > 0) {
+        main_loop_wait(false);
+    }
+
+out:
+    qemu_vfree(data.buf);
+    if (bs) {
+        bdrv_unref(bs);
+    }
+
+    if (ret) {
+        return 1;
+    }
+    return 0;
+}
+
+
 static const img_cmd_t img_cmds[] = {
 #define DEF(option, callback, arg_string)        \
     { option, callback },

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool
  2014-08-05 11:55   ` Eric Blake
  2014-08-05 12:05     ` Michael S. Tsirkin
@ 2014-08-06  2:35     ` Ming Lei
  1 sibling, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-06  2:35 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Tue, Aug 5, 2014 at 7:55 PM, Eric Blake <eblake@redhat.com> wrote:
> On 08/04/2014 09:33 PM, Ming Lei wrote:
>> This patch introduces object allocation pool for speeding up
>> object allocation in fast path.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  include/qemu/obj_pool.h |   64 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 64 insertions(+)
>>  create mode 100644 include/qemu/obj_pool.h
>>
>> diff --git a/include/qemu/obj_pool.h b/include/qemu/obj_pool.h
>> new file mode 100644
>> index 0000000..94b5f49
>> --- /dev/null
>> +++ b/include/qemu/obj_pool.h
>> @@ -0,0 +1,64 @@
>> +#ifndef QEMU_OBJ_POOL_HEAD
>> +#define QEMU_OBJ_POOL_HEAD
>
> Missing copyright boilerplate.  According to LICENSE, that makes this
> file GPLv2+, but I'd much rather you make it explicit.
>
>> +
>> +typedef struct {
>> +    unsigned int size;
>> +    unsigned int cnt;
>
> size_t feels better for sizes.  int may be okay in this case, but
> definitely consider if size_t is appropriate.

Sounds good.

>
>> +
>> +    void **free_obj;
>> +    int free_idx;
>> +
>> +    char *objs;
>> +} ObjPool;
>> +
>> +static inline void obj_pool_init(ObjPool *op, void *objs_buf, void **free_objs,
>> +                                 unsigned int obj_size, unsigned cnt)
>> +{
>> +    int i;
>> +
>> +    op->objs = (char *)objs_buf;
>
> Why the cast? This is C, not C++.

Right, the cast isn't needed.

>
>> +    op->free_obj = free_objs;
>> +    op->size = obj_size;
>> +    op->cnt = cnt;
>> +
>> +    for (i = 0; i < op->cnt; i++) {
>> +        op->free_obj[i] = (void *)&op->objs[i * op->size];
>
> Again, why the cast?

Right too.

>
>
>> +static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
>> +{
>> +    return op && (unsigned long)obj >= (unsigned long)&op->objs[0] &&
>> +           (unsigned long)obj <=
>> +           (unsigned long)&op->objs[(op->cnt - 1) * op->size];
>
> uintptr_t, not unsigned long.  You are asking for problems on 64-bit
> mingw, where unsigned long is 32 bits but uintptr_t is 64 bits.

Good point, it's the first time I have run into the mingw long quirk.
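
For illustration, the check rewritten along those lines might look like
this (just a sketch of Eric's suggestion, not the final patch; uintptr_t
comes from <stdint.h>):

    static inline bool obj_pool_has_obj(ObjPool *op, void *obj)
    {
        uintptr_t p = (uintptr_t)obj;

        /* uintptr_t is guaranteed to hold a pointer value, unlike
         * unsigned long on 64-bit mingw */
        return op && p >= (uintptr_t)&op->objs[0] &&
               p <= (uintptr_t)&op->objs[(op->cnt - 1) * op->size];
    }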


Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 02/17] dataplane: use object pool to speed up allocation for virtio blk request
  2014-08-05 12:30   ` Eric Blake
@ 2014-08-06  2:45     ` Ming Lei
  0 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-06  2:45 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Tue, Aug 5, 2014 at 8:30 PM, Eric Blake <eblake@redhat.com> wrote:
> On 08/04/2014 09:33 PM, Ming Lei wrote:
>> g_slice_new(VirtIOBlockReq), its free pair and access the instance
>
> Took me a while to read this.  Maybe:
>
> Calling g_slice_new(VirtIOBlockReq) and its free pair, and accessing the
> instance, are a bit slow...

One point is that VirtIOBlockReq is very big, so using libc's allocator
is slow since a lock has to be held for thread safety, and brk() may be
involved in big allocations too.

Another point is that the object pool can easily keep these frequently
accessed objects in RAM, and decrease page faults when accessing
these buffers.

>
>> is a bit slow since sizeof(VirtIOBlockReq) takes more than 40KB,
>> so use object pool to speed up its allocation and release.
>>
>> With this patch, ~5%-10% throughput improvement is observed in the VM
>> based on server.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  hw/block/dataplane/virtio-blk.c |   12 ++++++++++++
>>  hw/block/virtio-blk.c           |   13 +++++++++++--
>>  include/hw/virtio/virtio-blk.h  |    2 ++
>>  3 files changed, 25 insertions(+), 2 deletions(-)
>
>> @@ -50,6 +52,10 @@ struct VirtIOBlockDataPlane {
>>      Error *blocker;
>>      void (*saved_complete_request)(struct VirtIOBlockReq *req,
>>                                     unsigned char status);
>> +
>> +    VirtIOBlockReq  reqs[REQ_POOL_SZ];
>> +    void *free_reqs[REQ_POOL_SZ];
>> +    ObjPool  req_pool;
>
> Why two instances of double spaces?

reqs is the real storage for the objects, and free_reqs is used to
implement allocation and release of the objects.
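
To make the relationship concrete, here is a minimal sketch of how the
two arrays cooperate (obj_pool_init() is in patch 01; the alloc/free
pair below is illustrative, since those functions are not quoted in
this thread):

    /* free_obj[0 .. free_idx-1] hold pointers into reqs[] that are
     * currently available: allocation pops, release pushes. */
    static inline void *obj_pool_alloc(ObjPool *op)
    {
        if (!op || op->free_idx == 0) {
            return NULL;    /* pool exhausted, caller falls back */
        }
        return op->free_obj[--op->free_idx];
    }

    static inline void obj_pool_free(ObjPool *op, void *obj)
    {
        if (!op || !obj_pool_has_obj(op, obj)) {
            return;         /* object did not come from this pool */
        }
        op->free_obj[op->free_idx++] = obj;
    }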

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-05 14:47       ` Kevin Wolf
@ 2014-08-06  5:33         ` Ming Lei
  2014-08-06  7:45           ` Paolo Bonzini
                             ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-06  5:33 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3877 bytes --]

Hi Kevin,

On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 05.08.2014 at 15:48, Stefan Hajnoczi wrote:
>> On Tue, Aug 05, 2014 at 06:00:22PM +0800, Ming Lei wrote:
>> > On Tue, Aug 5, 2014 at 5:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> > > On 05.08.2014 at 05:33, Ming Lei wrote:
>> > >> Hi,
>> > >>
>> > >> These patches bring up below 4 changes:
>> > >>         - introduce object allocation pool and apply it to
>> > >>         virtio-blk dataplane for improving its performance
>> > >>
>> > >>         - introduce selective coroutine bypass mechanism
>> > >>         for improving performance of virtio-blk dataplane with
>> > >>         raw format image
>> > >
>> > > Before applying any bypassing patches, I think we should understand in
>> > > detail where we are losing performance with coroutines enabled.
>> >
>> > From the below profiling data, CPU becomes slow to run instructions
>> > with coroutine, and CPU dcache miss is increased so it is very
>> > likely caused by switching stack frequently.
>> >
>> > http://marc.info/?l=qemu-devel&m=140679721126306&w=2
>> >
>> > http://pastebin.com/ae0vnQ6V
>>
>> I have been wondering how to prove that the root cause is the ucontext
>> coroutine mechanism (stack switching).  Here is an idea:
>>
>> Hack your "bypass" code path to run the request inside a coroutine.
>> That way you can compare "bypass without coroutine" against "bypass with
>> coroutine".
>>
>> Right now I think there are doubts because the bypass code path is
>> indeed a different (and not 100% correct) code path.  So this approach
>> might prove that the coroutines are adding the overhead and not
>> something that you bypassed.
>
> My doubts aren't only that the overhead might not come from the
> coroutines, but also whether any coroutine-related overhead is really
> unavoidable. If we can optimise coroutines, I'd strongly prefer to do
> just that instead of introducing additional code paths.

OK, thank you for taking a look at the problem; I hope we can
figure out the root cause, :-)

>
> Another thought I had was this: If the performance difference is indeed
> only coroutines, then that is completely inside the block layer and we
> don't actually need a VM to test it. We could instead have something
> like a simple qemu-img based benchmark and should be observing the same.

It is even simpler to run a coroutine-only benchmark, so I just wrote a
rough one, and it looks like coroutines do decrease performance a lot;
please see the attached patch. Thanks for your template, which helped me
add the 'co_bench' command to qemu-img.

From the profiling data in the link below:

    http://pastebin.com/YwH2uwbq

With coroutines, the running time for the same load is increased by
~50% (1.325s vs. 0.903s), dcache load events are increased by ~35%
(693M vs. 512M), and insns per cycle are decreased by ~17%
(1.35 vs. 1.63), compared with bypassing coroutines (-b parameter).

The bypass code in the benchmark is very similar to the approach
used in the bypass patch, since linux-aio with O_DIRECT seldom
blocks in the kernel I/O path.

Maybe the benchmark is a bit extreme, but given that modern storage
devices may reach millions of IOPS, it is very easy for coroutines
to slow down the I/O.

> I played a bit with the following, I hope it's not too naive. I couldn't
> see a difference with your patches, but at least one reason for this is
> probably that my laptop SSD isn't fast enough to make the CPU the
> bottleneck. Haven't tried ramdisk yet, that would probably be the next
> thing. (I actually wrote the patch up just for some profiling on my own,
> not for comparing throughput, but it should be usable for that as well.)

This might not be good for the test since it is basically a sequential
read test, which can be optimized a lot by the kernel. And I always use
a randread benchmark.


Thanks,

[-- Attachment #2: co_bench.patch --]
[-- Type: text/x-patch, Size: 3353 bytes --]

diff --git a/Makefile b/Makefile
index d6b9dc1..a59523c 100644
--- a/Makefile
+++ b/Makefile
@@ -211,7 +211,7 @@ util/module.o-cflags = -D'CONFIG_BLOCK_MODULES=$(block-modules)'
 
 qemu-img.o: qemu-img-cmds.h
 
-qemu-img$(EXESUF): qemu-img.o $(block-obj-y) libqemuutil.a libqemustub.a
+qemu-img$(EXESUF): qemu-img.o $(block-obj-y) libqemuutil.a libqemustub.a -lcrypt
 qemu-nbd$(EXESUF): qemu-nbd.o $(block-obj-y) libqemuutil.a libqemustub.a
 qemu-io$(EXESUF): qemu-io.o $(block-obj-y) libqemuutil.a libqemustub.a
 
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index ae64b3d..7601b9a 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -15,6 +15,12 @@ STEXI
 @item bench [-q] [-f @var{fmt}] [-n] [-t @var{cache}] filename
 ETEXI
 
+DEF("co_bench", co_bench,
+    "co_bench -c count -q -b")
+STEXI
+@item co_bench [-c] @var{count} [-b] [-q]
+ETEXI
+
 DEF("check", img_check,
     "check [-q] [-f fmt] [--output=ofmt]  [-r [leaks | all]] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index 92e9529..d73b171 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -366,6 +366,106 @@ static int add_old_style_options(const char *fmt, QemuOpts *opts,
     return 0;
 }
 
+struct crypt_data {
+    unsigned long sum;
+    bool bypass;
+};
+
+static unsigned long crypt_and_sum(const char *key, const char *salt)
+{
+    char *enc = crypt(key, salt);
+    int len = strlen(enc);
+    int i;
+    unsigned long sum = 0;
+
+    for (i = 0; i < len; i++)
+        sum += enc[i];
+
+    return sum;
+}
+
+static void gen_key(char *key, int len)
+{
+    char set[] = {
+        '0', '1', '2', '3', '4', '5', '6', '7', '8',
+        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
+        'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
+        's', 't', 'w', 'x', 'y', 'z', '_', '-', ' ',
+        '*', '$', '#', '%',
+    };
+    int cnt = sizeof(set) / sizeof(set[0]);
+    int i;
+
+    for (i = 0; i < len; i++) {
+        key[i] = set[rand() % cnt];
+    }
+}
+
+static void crypt_bench(void *opaque)
+{
+    struct crypt_data *data = opaque;
+    const int len = 8;
+    char key1[9] = "";   /* extra byte so crypt() gets NUL-terminated keys */
+    char key2[9] = "";
+    char salt[4] = "";
+
+    gen_key(key1, len);
+    gen_key(key2, len);
+    salt[0] = key1[0];
+    salt[1] = key2[7];
+    salt[2] = '0';
+
+    data->sum += crypt_and_sum(key1, salt);
+    data->sum += crypt_and_sum(key2, salt);
+
+    if (!data->bypass) {
+        qemu_coroutine_yield();
+    }
+}
+
+static int co_bench(int argc, char **argv)
+{
+    int c;
+    bool bypass = false;
+    unsigned long cnt = 1;
+    int num = 1;
+    unsigned long i;
+    struct crypt_data data = {
+        .sum = 0,
+        .bypass = bypass,
+    };
+
+    for (;;) {
+        c = getopt(argc, argv, "bc:q");
+        if (c == -1) {
+            break;
+        }
+        switch (c) {
+        case 'b':
+            bypass = true;
+            break;
+        case 'c':
+            num = atoi(optarg);
+            break;
+        }
+    }
+
+    data.bypass = bypass;
+
+    srand((unsigned int)(uintptr_t)&i);
+    srand(rand());
+    for (i = 0; i < num * cnt; i++) {
+        Coroutine *co;
+        if (!data.bypass) {
+            co = qemu_coroutine_create(crypt_bench);
+            qemu_coroutine_enter(co, &data);
+        } else {
+            crypt_bench(&data);
+        }
+    }
+    return (int)data.sum;
+}
+
 static int img_create(int argc, char **argv)
 {
     int c;

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  5:33         ` Ming Lei
@ 2014-08-06  7:45           ` Paolo Bonzini
  2014-08-06  8:38             ` Ming Lei
  2014-08-06  8:48           ` Kevin Wolf
  2014-08-06  9:37           ` Stefan Hajnoczi
  2 siblings, 1 reply; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-06  7:45 UTC (permalink / raw)
  To: Ming Lei, Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, qemu-devel, Stefan Hajnoczi,
	Michael S. Tsirkin

On 06/08/2014 at 07:33, Ming Lei wrote:
>> > I played a bit with the following, I hope it's not too naive. I couldn't
>> > see a difference with your patches, but at least one reason for this is
>> > probably that my laptop SSD isn't fast enough to make the CPU the
>> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
>> > thing. (I actually wrote the patch up just for some profiling on my own,
>> > not for comparing throughput, but it should be usable for that as well.)
> This might not be good for the test since it is basically a sequential
> read test, which can be optimized a lot by the kernel. And I always use
> a randread benchmark.

A microbenchmark already exists in tests/test-coroutine.c, and doesn't
really tell us much; it's obvious that coroutines execute more code, the
question is why it affects the iops performance.

The sequential read should be the right workload.  For fio, you want to
get as many iops as possible to QEMU and so you need randread.  But
qemu-img is not run in a guest and if the kernel optimizes sequential
reads then the bypass should have even more benefits because it makes
userspace proportionally more expensive.

In any case, the patches as written have no hope of being accepted.  If
you "invert" the logic from aio->co to co->aio, that would be much
better even if it's tedious.

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  7:45           ` Paolo Bonzini
@ 2014-08-06  8:38             ` Ming Lei
  2014-08-06  8:50               ` Paolo Bonzini
  0 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-06  8:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Stefan Hajnoczi

On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 06/08/2014 at 07:33, Ming Lei wrote:
>>> > I played a bit with the following, I hope it's not too naive. I couldn't
>>> > see a difference with your patches, but at least one reason for this is
>>> > probably that my laptop SSD isn't fast enough to make the CPU the
>>> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
>>> > thing. (I actually wrote the patch up just for some profiling on my own,
>>> > not for comparing throughput, but it should be usable for that as well.)
>> This might not be good for the test since it is basically a sequential
>> read test, which can be optimized a lot by the kernel. And I always use
>> a randread benchmark.
>
> A microbenchmark already exists in tests/test-coroutine.c, and doesn't
> really tell us much; it's obvious that coroutines execute more code, the
> question is why it affects the iops performance.

Could you take a look at the coroutine benchmark I wrote? The running
result shows that coroutines do decrease performance a lot compared
with bypassing them, as the patchset does.

>
> The sequential read should be the right workload.  For fio, you want to
> get as many iops as possible to QEMU and so you need randread.  But
> qemu-img is not run in a guest and if the kernel optimizes sequential
> reads then the bypass should have even more benefits because it makes
> userspace proportionally more expensive.
>
> In any case, the patches as written have no hope of being accepted.  If
> you "invert" the logic from aio->co to co->aio, that would be much
> better even if it's tedious.

Let's not talk about the bypass patch yet; let's first see whether
coroutines are really the cause of the performance drop.

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  5:33         ` Ming Lei
  2014-08-06  7:45           ` Paolo Bonzini
@ 2014-08-06  8:48           ` Kevin Wolf
  2014-08-06  9:37             ` Ming Lei
  2014-08-10  3:46             ` Ming Lei
  2014-08-06  9:37           ` Stefan Hajnoczi
  2 siblings, 2 replies; 81+ messages in thread
From: Kevin Wolf @ 2014-08-06  8:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On 06.08.2014 at 07:33, Ming Lei wrote:
> Hi Kevin,
> 
> On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > On 05.08.2014 at 15:48, Stefan Hajnoczi wrote:
> >> I have been wondering how to prove that the root cause is the ucontext
> >> coroutine mechanism (stack switching).  Here is an idea:
> >>
> >> Hack your "bypass" code path to run the request inside a coroutine.
> >> That way you can compare "bypass without coroutine" against "bypass with
> >> coroutine".
> >>
> >> Right now I think there are doubts because the bypass code path is
> >> indeed a different (and not 100% correct) code path.  So this approach
> >> might prove that the coroutines are adding the overhead and not
> >> something that you bypassed.
> >
> > My doubts aren't only that the overhead might not come from the
> > coroutines, but also whether any coroutine-related overhead is really
> > unavoidable. If we can optimise coroutines, I'd strongly prefer to do
> > just that instead of introducing additional code paths.
> 
> OK, thank you for taking a look at the problem; I hope we can
> figure out the root cause, :-)
> 
> >
> > Another thought I had was this: If the performance difference is indeed
> > only coroutines, then that is completely inside the block layer and we
> > don't actually need a VM to test it. We could instead have something
> > like a simple qemu-img based benchmark and should be observing the same.
> 
> It is even simpler to run a coroutine-only benchmark, so I just wrote a
> rough one, and it looks like coroutines do decrease performance a lot;
> please see the attached patch. Thanks for your template, which helped me
> add the 'co_bench' command to qemu-img.

Yes, we can look at coroutine microbenchmarks in isolation. I actually
did do that yesterday with the yield test from tests/test-coroutine.c.
And in fact profiling immediately showed something to optimise:
pthread_getspecific() was quite high, replacing it by __thread on
systems where it works is more efficient and helped the numbers a bit.
Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even
in qemu-img bench), maybe there's even something that can be done here.

However, I just wasn't sure whether a change on this level would be
relevant in a realistic environment. This is the reason why I wanted to
get a benchmark involving the block layer and some I/O.
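
To illustrate the __thread idea (a sketch of the direction, not the
actual patch; the names are simplified):

    /* One TLS load instead of a pthread_getspecific() call in the
     * hot path that looks up the currently running coroutine. */
    static __thread CoroutineUContext leader;
    static __thread Coroutine *current;

    static Coroutine *qemu_coroutine_self_tls(void)
    {
        if (!current) {
            current = &leader.base;
        }
        return current;
    }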

> From the profiling data in the link below:
>
>     http://pastebin.com/YwH2uwbq
>
> With coroutines, the running time for the same load is increased by
> ~50% (1.325s vs. 0.903s), dcache load events are increased by ~35%
> (693M vs. 512M), and insns per cycle are decreased by ~17%
> (1.35 vs. 1.63), compared with bypassing coroutines (-b parameter).
>
> The bypass code in the benchmark is very similar to the approach
> used in the bypass patch, since linux-aio with O_DIRECT seldom
> blocks in the kernel I/O path.
>
> Maybe the benchmark is a bit extreme, but given that modern storage
> devices may reach millions of IOPS, it is very easy for coroutines
> to slow down the I/O.

I think in order to optimise coroutines, such benchmarks are fair game.
It's just not guaranteed that the effects are exactly the same on real
workloads, so we should take the results with a grain of salt.

Anyhow, the coroutine version of your benchmark is buggy, it leaks all
coroutines instead of exiting them, so it can't make any use of the
coroutine pool. On my laptop, I get this (where fixed coroutine is a
version that simply removes the yield at the end):

                | bypass        | fixed coro    | buggy coro
----------------+---------------+---------------+--------------
time            | 1.09s         | 1.10s         | 1.62s
L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
insns per cycle | 2.39          | 2.39          | 1.90

Begs the question whether you see a similar effect on a real qemu and
the coroutine pool is still not big enough? With correct use of
coroutines, the difference seems to be barely measurable even without
any I/O involved.

> > I played a bit with the following, I hope it's not too naive. I couldn't
> > see a difference with your patches, but at least one reason for this is
> > probably that my laptop SSD isn't fast enough to make the CPU the
> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
> > thing. (I actually wrote the patch up just for some profiling on my own,
> > not for comparing throughput, but it should be usable for that as well.)
> 
> This might not be good for the test since it is basically a sequential
> read test, which can be optimized a lot by kernel. And I always use
> randread benchmark.

Yes, I shortly pondered whether I should implement random offsets
instead. But then I realised that a quicker kernel operation would only
help the benchmark because we want it to test the CPU consumption in
userspace. So the faster the kernel gets, the better for us, because it
should make the impact of coroutines bigger.

Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  8:38             ` Ming Lei
@ 2014-08-06  8:50               ` Paolo Bonzini
  2014-08-06 13:53                 ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-06  8:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Stefan Hajnoczi

On 06/08/2014 at 10:38, Ming Lei wrote:
> On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 06/08/2014 at 07:33, Ming Lei wrote:
>>>>> I played a bit with the following, I hope it's not too naive. I couldn't
>>>>> see a difference with your patches, but at least one reason for this is
>>>>> probably that my laptop SSD isn't fast enough to make the CPU the
>>>>> bottleneck. Haven't tried ramdisk yet, that would probably be the next
>>>>> thing. (I actually wrote the patch up just for some profiling on my own,
>>>>> not for comparing throughput, but it should be usable for that as well.)
>>> This might not be good for the test since it is basically a sequential
>>> read test, which can be optimized a lot by the kernel. And I always use
>>> a randread benchmark.
>>
>> A microbenchmark already exists in tests/test-coroutine.c, and doesn't
>> really tell us much; it's obvious that coroutines execute more code, the
>> question is why it affects the iops performance.
> 
> Could you take a look at the coroutine benchmark I wrote? The running
> result shows that coroutines do decrease performance a lot compared
> with bypassing them, as the patchset does.

Your benchmark is synchronous, while disk I/O is asynchronous.

Your benchmark doesn't add much compared to "time tests/test-coroutine
-m perf  -p /perf/yield".  It takes 8 seconds on my machine, and 10^8
function calls obviously take less than 8 seconds.  I've sent a patch to
add a "baseline" function call benchmark to test-coroutine.
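
For reference, such a baseline could be as simple as the following
sketch next to /perf/yield in tests/test-coroutine.c (illustrative,
not necessarily the actual patch):

    static __attribute__((noinline)) void empty_fn(void *opaque)
    {
    }

    static void perf_baseline(void)
    {
        unsigned int i, maxcycles = 100000000;
        double duration;

        g_test_timer_start();
        for (i = 0; i < maxcycles; i++) {
            empty_fn(NULL);     /* plain call, no coroutine involved */
        }
        duration = g_test_timer_elapsed();

        g_test_message("Function call %u iterations: %f s",
                       maxcycles, duration);
    }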

>> The sequential read should be the right workload.  For fio, you want to
>> get as many iops as possible to QEMU and so you need randread.  But
>> qemu-img is not run in a guest and if the kernel optimizes sequential
>> reads then the bypass should have even more benefits because it makes
>> userspace proportionally more expensive.

Do you agree with this?

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  8:48           ` Kevin Wolf
@ 2014-08-06  9:37             ` Ming Lei
  2014-08-06 10:09               ` Kevin Wolf
  2014-08-10  3:46             ` Ming Lei
  1 sibling, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-06  9:37 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 06.08.2014 at 07:33, Ming Lei wrote:
>> Hi Kevin,
>>
>> On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> > On 05.08.2014 at 15:48, Stefan Hajnoczi wrote:
>> >> I have been wondering how to prove that the root cause is the ucontext
>> >> coroutine mechanism (stack switching).  Here is an idea:
>> >>
>> >> Hack your "bypass" code path to run the request inside a coroutine.
>> >> That way you can compare "bypass without coroutine" against "bypass with
>> >> coroutine".
>> >>
>> >> Right now I think there are doubts because the bypass code path is
>> >> indeed a different (and not 100% correct) code path.  So this approach
>> >> might prove that the coroutines are adding the overhead and not
>> >> something that you bypassed.
>> >
>> > My doubts aren't only that the overhead might not come from the
>> > coroutines, but also whether any coroutine-related overhead is really
>> > unavoidable. If we can optimise coroutines, I'd strongly prefer to do
>> > just that instead of introducing additional code paths.
>>
>> OK, thank you for taking a look at the problem; I hope we can
>> figure out the root cause, :-)
>>
>> >
>> > Another thought I had was this: If the performance difference is indeed
>> > only coroutines, then that is completely inside the block layer and we
>> > don't actually need a VM to test it. We could instead have something
>> > like a simple qemu-img based benchmark and should be observing the same.
>>
>> It is even simpler to run a coroutine-only benchmark, so I just wrote a
>> rough one, and it looks like coroutines do decrease performance a lot;
>> please see the attached patch. Thanks for your template, which helped me
>> add the 'co_bench' command to qemu-img.
>
> Yes, we can look at coroutine microbenchmarks in isolation. I actually
> did do that yesterday with the yield test from tests/test-coroutine.c.
> And in fact profiling immediately showed something to optimise:
> pthread_getspecific() was quite high, replacing it by __thread on
> systems where it works is more efficient and helped the numbers a bit.
> Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even
> in qemu-img bench), maybe there's even something that can be done here.

The lock/unlock in dataplane is often from memory_region_find(), and Paolo
has already done lots of work on that.

>
> However, I just wasn't sure whether a change on this level would be
> relevant in a realistic environment. This is the reason why I wanted to
> get a benchmark involving the block layer and some I/O.
>
>> From the profiling data in the link below:
>>
>>     http://pastebin.com/YwH2uwbq
>>
>> With coroutines, the running time for the same load is increased by
>> ~50% (1.325s vs. 0.903s), dcache load events are increased by ~35%
>> (693M vs. 512M), and insns per cycle are decreased by ~17%
>> (1.35 vs. 1.63), compared with bypassing coroutines (-b parameter).
>>
>> The bypass code in the benchmark is very similar to the approach
>> used in the bypass patch, since linux-aio with O_DIRECT seldom
>> blocks in the kernel I/O path.
>>
>> Maybe the benchmark is a bit extreme, but given that modern storage
>> devices may reach millions of IOPS, it is very easy for coroutines
>> to slow down the I/O.
>
> I think in order to optimise coroutines, such benchmarks are fair game.
> It's just not guaranteed that the effects are exactly the same on real
> workloads, so we should take the results with a grain of salt.
>
> Anyhow, the coroutine version of your benchmark is buggy, it leaks all
> coroutines instead of exiting them, so it can't make any use of the
> coroutine pool. On my laptop, I get this (where fixed coroutine is a
> version that simply removes the yield at the end):
>
>                 | bypass        | fixed coro    | buggy coro
> ----------------+---------------+---------------+--------------
> time            | 1.09s         | 1.10s         | 1.62s
> L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> insns per cycle | 2.39          | 2.39          | 1.90
>
> Begs the question whether you see a similar effect on a real qemu and
> the coroutine pool is still not big enough? With correct use of
> coroutines, the difference seems to be barely measurable even without
> any I/O involved.

When I comment out qemu_coroutine_yield(), the results of bypass and
fixed coro look very similar, as in your test, and I am just wondering
whether the stack is always switched in qemu_coroutine_enter(), even
without calling qemu_coroutine_yield().

Without the yield, the benchmark can't emulate the coroutine usage in
the bdrv_aio_readv/writev() path any more, and the bypass in the
patchset skips two qemu_coroutine_enter() calls and one
qemu_coroutine_yield() for each bdrv_aio_readv/writev().

>
>> > I played a bit with the following, I hope it's not too naive. I couldn't
>> > see a difference with your patches, but at least one reason for this is
>> > probably that my laptop SSD isn't fast enough to make the CPU the
>> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
>> > thing. (I actually wrote the patch up just for some profiling on my own,
>> > not for comparing throughput, but it should be usable for that as well.)
>>
>> This might not be good for the test since it is basically a sequential
>> read test, which can be optimized a lot by the kernel. And I always use
>> a randread benchmark.
>
> Yes, I shortly pondered whether I should implement random offsets
> instead. But then I realised that a quicker kernel operation would only
> help the benchmark because we want it to test the CPU consumption in
> userspace. So the faster the kernel gets, the better for us, because it
> should make the impact of coroutines bigger.

OK, I will compare coroutine vs. bypass-co with the benchmark.


Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  5:33         ` Ming Lei
  2014-08-06  7:45           ` Paolo Bonzini
  2014-08-06  8:48           ` Kevin Wolf
@ 2014-08-06  9:37           ` Stefan Hajnoczi
  2 siblings, 0 replies; 81+ messages in thread
From: Stefan Hajnoczi @ 2014-08-06  9:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 664 bytes --]

On Wed, Aug 06, 2014 at 01:33:36PM +0800, Ming Lei wrote:
> With coroutine, the running time for same loading is increased
> ~50%(1.325s vs. 0.903s), and dcache load events is increased

I agree with Paolo about microbenchmarks.  We need to do I/O to get a
realistic picture of performance, since there is little point in
optimizing something that is not a significant factor in overall
performance.

But I also wanted to say that these benchmark durations are so short
that they can be greatly affected by outliers (e.g. scheduler behavior,
system background activity, etc).  Run benchmarks for 2 minutes to
reduce variance and give the system time to "warm up".

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  9:37             ` Ming Lei
@ 2014-08-06 10:09               ` Kevin Wolf
  2014-08-06 11:28                 ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-06 10:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On 06.08.2014 at 11:37, Ming Lei wrote:
> On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > On 06.08.2014 at 07:33, Ming Lei wrote:
> >> Hi Kevin,
> >>
> >> On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> > On 05.08.2014 at 15:48, Stefan Hajnoczi wrote:
> >> >> I have been wondering how to prove that the root cause is the ucontext
> >> >> coroutine mechanism (stack switching).  Here is an idea:
> >> >>
> >> >> Hack your "bypass" code path to run the request inside a coroutine.
> >> >> That way you can compare "bypass without coroutine" against "bypass with
> >> >> coroutine".
> >> >>
> >> >> Right now I think there are doubts because the bypass code path is
> >> >> indeed a different (and not 100% correct) code path.  So this approach
> >> >> might prove that the coroutines are adding the overhead and not
> >> >> something that you bypassed.
> >> >
> >> > My doubts aren't only that the overhead might not come from the
> >> > coroutines, but also whether any coroutine-related overhead is really
> >> > unavoidable. If we can optimise coroutines, I'd strongly prefer to do
> >> > just that instead of introducing additional code paths.
> >>
> >> OK, thank you for taking a look at the problem; I hope we can
> >> figure out the root cause, :-)
> >>
> >> >
> >> > Another thought I had was this: If the performance difference is indeed
> >> > only coroutines, then that is completely inside the block layer and we
> >> > don't actually need a VM to test it. We could instead have something
> >> > like a simple qemu-img based benchmark and should be observing the same.
> >>
> >> It is even simpler to run a coroutine-only benchmark, so I just wrote a
> >> rough one, and it looks like coroutines do decrease performance a lot;
> >> please see the attached patch. Thanks for your template, which helped me
> >> add the 'co_bench' command to qemu-img.
> >
> > Yes, we can look at coroutine microbenchmarks in isolation. I actually
> > did do that yesterday with the yield test from tests/test-coroutine.c.
> > And in fact profiling immediately showed something to optimise:
> > pthread_getspecific() was quite high, replacing it by __thread on
> > systems where it works is more efficient and helped the numbers a bit.
> > Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even
> > in qemu-img bench), maybe there's even something that can be done here.
> 
> The lock/unlock in dataplane is often from memory_region_find(), and Paolo
> has already done lots of work on that.
> 
> >
> > However, I just wasn't sure whether a change on this level would be
> > relevant in a realistic environment. This is the reason why I wanted to
> > get a benchmark involving the block layer and some I/O.
> >
> >> From the profiling data in the link below:
> >>
> >>     http://pastebin.com/YwH2uwbq
> >>
> >> With coroutines, the running time for the same load is increased by
> >> ~50% (1.325s vs. 0.903s), dcache load events are increased by ~35%
> >> (693M vs. 512M), and insns per cycle are decreased by ~17%
> >> (1.35 vs. 1.63), compared with bypassing coroutines (-b parameter).
> >>
> >> The bypass code in the benchmark is very similar to the approach
> >> used in the bypass patch, since linux-aio with O_DIRECT seldom
> >> blocks in the kernel I/O path.
> >>
> >> Maybe the benchmark is a bit extreme, but given that modern storage
> >> devices may reach millions of IOPS, it is very easy for coroutines
> >> to slow down the I/O.
> >
> > I think in order to optimise coroutines, such benchmarks are fair game.
> > It's just not guaranteed that the effects are exactly the same on real
> > workloads, so we should take the results with a grain of salt.
> >
> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
> > coroutines instead of exiting them, so it can't make any use of the
> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
> > version that simply removes the yield at the end):
> >
> >                 | bypass        | fixed coro    | buggy coro
> > ----------------+---------------+---------------+--------------
> > time            | 1.09s         | 1.10s         | 1.62s
> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> > insns per cycle | 2.39          | 2.39          | 1.90
> >
> > Begs the question whether you see a similar effect on a real qemu and
> > the coroutine pool is still not big enough? With correct use of
> > coroutines, the difference seems to be barely measurable even without
> > any I/O involved.
> 
> When I comment out qemu_coroutine_yield(), the results of bypass and
> fixed coro look very similar, as in your test, and I am just wondering
> whether the stack is always switched in qemu_coroutine_enter(), even
> without calling qemu_coroutine_yield().

Yes, definitely. qemu_coroutine_enter() always involves calling
qemu_coroutine_switch(), which is the stack switch.
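
Roughly (simplified from the 2014 ucontext backend, for illustration
only):

    void qemu_coroutine_enter(Coroutine *co, void *opaque)
    {
        Coroutine *self = qemu_coroutine_self();

        co->caller = self;
        co->entry_arg = opaque;
        /* unconditional stack switch, whether or not the coroutine
         * will ever yield */
        qemu_coroutine_switch(self, co, COROUTINE_ENTER);
    }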

> Without the yield, the benchmark can't emulate the coroutine usage in
> the bdrv_aio_readv/writev() path any more, and the bypass in the
> patchset skips two qemu_coroutine_enter() calls and one
> qemu_coroutine_yield() for each bdrv_aio_readv/writev().

It's not completely comparable anyway because you're not going through a
main loop and callbacks from there for your benchmark.

But fair enough: Keep the yield, but enter the coroutine twice then. You
get slightly worse results then, but that's more like doubling the very
small difference between "bypass" and "fixed coro" (1.11s / 946,434,327
/ 2.37), not like the horrible performance of the buggy version.
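
Concretely, the "enter twice" variant of the benchmark loop would be
something like this (a sketch against the coroutine API of that time,
where qemu_coroutine_enter() still took an opaque argument):

    for (i = 0; i < num * cnt; i++) {
        Coroutine *co = qemu_coroutine_create(crypt_bench);

        qemu_coroutine_enter(co, &data);  /* runs up to the yield */
        qemu_coroutine_enter(co, NULL);   /* resumes it so it terminates
                                           * and returns to the pool */
    }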

Actually, that's within the error of measurement for time and
insns/cycle, so running it for a bit longer:

                | bypass    | coro      | + yield   | buggy coro
----------------+-----------+-----------+-----------+--------------
time            | 21.45s    | 21.68s    | 21.83s    | 97.05s
L1-dcache-loads | 18,049 M  | 18,387 M  | 18,618 M  | 26,062 M
insns per cycle | 2.42      | 2.40      | 2.41      | 1.75

> >> > I played a bit with the following, I hope it's not too naive. I couldn't
> >> > see a difference with your patches, but at least one reason for this is
> >> > probably that my laptop SSD isn't fast enough to make the CPU the
> >> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
> >> > thing. (I actually wrote the patch up just for some profiling on my own,
> >> > not for comparing throughput, but it should be usable for that as well.)
> >>
> >> This might not be good for the test since it is basically a sequential
> >> read test, which can be optimized a lot by the kernel. And I always use
> >> a randread benchmark.
> >
> > Yes, I shortly pondered whether I should implement random offsets
> > instead. But then I realised that a quicker kernel operation would only
> > help the benchmark because we want it to test the CPU consumption in
> > userspace. So the faster the kernel gets, the better for us, because it
> > should make the impact of coroutines bigger.
> 
> OK, I will compare coroutine vs. bypass-co with the benchmark.

Ok, thanks.

Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06 10:09               ` Kevin Wolf
@ 2014-08-06 11:28                 ` Ming Lei
  2014-08-06 11:44                   ` Ming Lei
  2014-08-06 15:40                   ` Kevin Wolf
  0 siblings, 2 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-06 11:28 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 06.08.2014 at 11:37, Ming Lei wrote:
>> On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> > On 06.08.2014 at 07:33, Ming Lei wrote:
>> >> Hi Kevin,
>> >>
>> >> On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> >> > On 05.08.2014 at 15:48, Stefan Hajnoczi wrote:
>> >> >> I have been wondering how to prove that the root cause is the ucontext
>> >> >> coroutine mechanism (stack switching).  Here is an idea:
>> >> >>
>> >> >> Hack your "bypass" code path to run the request inside a coroutine.
>> >> >> That way you can compare "bypass without coroutine" against "bypass with
>> >> >> coroutine".
>> >> >>
>> >> >> Right now I think there are doubts because the bypass code path is
>> >> >> indeed a different (and not 100% correct) code path.  So this approach
>> >> >> might prove that the coroutines are adding the overhead and not
>> >> >> something that you bypassed.
>> >> >
>> >> > My doubts aren't only that the overhead might not come from the
>> >> > coroutines, but also whether any coroutine-related overhead is really
>> >> > unavoidable. If we can optimise coroutines, I'd strongly prefer to do
>> >> > just that instead of introducing additional code paths.
>> >>
>> >> OK, thank you for taking a look at the problem; I hope we can
>> >> figure out the root cause, :-)
>> >>
>> >> >
>> >> > Another thought I had was this: If the performance difference is indeed
>> >> > only coroutines, then that is completely inside the block layer and we
>> >> > don't actually need a VM to test it. We could instead have something
>> >> > like a simple qemu-img based benchmark and should be observing the same.
>> >>
>> >> It is even simpler to run a coroutine-only benchmark, so I just wrote a
>> >> rough one, and it looks like coroutines do decrease performance a lot;
>> >> please see the attached patch. Thanks for your template, which helped me
>> >> add the 'co_bench' command to qemu-img.
>> >
>> > Yes, we can look at coroutine microbenchmarks in isolation. I actually
>> > did do that yesterday with the yield test from tests/test-coroutine.c.
>> > And in fact profiling immediately showed something to optimise:
>> > pthread_getspecific() was quite high, replacing it by __thread on
>> > systems where it works is more efficient and helped the numbers a bit.
>> > Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even
>> > in qemu-img bench), maybe there's even something that can be done here.
>>
>> The lock/unlock in dataplane is often from memory_region_find(), and Paolo
>> has already done lots of work on that.
>>
>> >
>> > However, I just wasn't sure whether a change on this level would be
>> > relevant in a realistic environment. This is the reason why I wanted to
>> > get a benchmark involving the block layer and some I/O.
>> >
>> >> From the profiling data in the link below:
>> >>
>> >>     http://pastebin.com/YwH2uwbq
>> >>
>> >> With coroutines, the running time for the same load is increased by
>> >> ~50% (1.325s vs. 0.903s), dcache load events are increased by ~35%
>> >> (693M vs. 512M), and insns per cycle are decreased by ~17%
>> >> (1.35 vs. 1.63), compared with bypassing coroutines (-b parameter).
>> >>
>> >> The bypass code in the benchmark is very similar to the approach
>> >> used in the bypass patch, since linux-aio with O_DIRECT seldom
>> >> blocks in the kernel I/O path.
>> >>
>> >> Maybe the benchmark is a bit extreme, but given that modern storage
>> >> devices may reach millions of IOPS, it is very easy for coroutines
>> >> to slow down the I/O.
>> >
>> > I think in order to optimise coroutines, such benchmarks are fair game.
>> > It's just not guaranteed that the effects are exactly the same on real
>> > workloads, so we should take the results with a grain of salt.
>> >
>> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>> > coroutines instead of exiting them, so it can't make any use of the
>> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
>> > version that simply removes the yield at the end):
>> >
>> >                 | bypass        | fixed coro    | buggy coro
>> > ----------------+---------------+---------------+--------------
>> > time            | 1.09s         | 1.10s         | 1.62s
>> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>> > insns per cycle | 2.39          | 2.39          | 1.90
>> >
>> > Begs the question whether you see a similar effect on a real qemu and
>> > the coroutine pool is still not big enough? With correct use of
>> > coroutines, the difference seems to be barely measurable even without
>> > any I/O involved.
>>
>> When I comment out qemu_coroutine_yield(), the results of bypass and
>> fixed coro look very similar, as in your test, and I am just wondering
>> whether the stack is always switched in qemu_coroutine_enter(), even
>> without calling qemu_coroutine_yield().
>
> Yes, definitely. qemu_coroutine_enter() always involves calling
> qemu_coroutine_switch(), which is the stack switch.
>
>> Without the yield, the benchmark can't emulate the coroutine usage in
>> the bdrv_aio_readv/writev() path any more, and the bypass in the
>> patchset skips two qemu_coroutine_enter() calls and one
>> qemu_coroutine_yield() for each bdrv_aio_readv/writev().
>
> It's not completely comparable anyway because you're not going through a
> main loop and callbacks from there for your benchmark.
>
> But fair enough: Keep the yield, but enter the coroutine twice then. You
> get slightly worse results then, but that's more like doubling the very
> small difference between "bypass" and "fixed coro" (1.11s / 946,434,327
> / 2.37), not like the horrible performance of the buggy version.

Yes, I compared that too; it looks like there is no big difference.

>
> Actually, that's within the error of measurement for time and
> insns/cycle, so running it for a bit longer:
>
>                 | bypass    | coro      | + yield   | buggy coro
> ----------------+-----------+-----------+-----------+--------------
> time            | 21.45s    | 21.68s    | 21.83s    | 97.05s
> L1-dcache-loads | 18,049 M  | 18,387 M  | 18,618 M  | 26,062 M
> insns per cycle | 2.42      | 2.40      | 2.41      | 1.75
>
>> >> > I played a bit with the following, I hope it's not too naive. I couldn't
>> >> > see a difference with your patches, but at least one reason for this is
>> >> > probably that my laptop SSD isn't fast enough to make the CPU the
>> >> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
>> >> > thing. (I actually wrote the patch up just for some profiling on my own,
>> >> > not for comparing throughput, but it should be usable for that as well.)
>> >>
>> >> This might not be good for the test since it is basically a sequential
>> >> read test, which can be optimized a lot by the kernel. And I always use
>> >> a randread benchmark.
>> >
>> > Yes, I shortly pondered whether I should implement random offsets
>> > instead. But then I realised that a quicker kernel operation would only
>> > help the benchmark because we want it to test the CPU consumption in
>> > userspace. So the faster the kernel gets, the better for us, because it
>> > should make the impact of coroutines bigger.
>>
>> OK, I will compare coroutine vs. bypass-co with the benchmark.

I used the /dev/nullb0 block device for the test, which is available in
Linux kernel 3.13+; the difference follows, and it looks not very big (< 10%):

I added two parameters to your img-bench patch:

      -c CNT  # passed to 'data.n'
      -b      # enable the coroutine bypass introduced in this patchset

Another difference is that dataplane uses its own thread, while this
bench runs in the main loop.

ming@:~/git/qemu$ sudo ~/bin/perf stat -e L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses ./qemu-img bench -f raw -t off -n -c 10000000 -b /dev/nullb0
read time: 58024ms

 Performance counter stats for './qemu-img bench -f raw -t off -n -c 10000000 -b /dev/nullb0':

     34,874,462,357      L1-dcache-loads                                              [40.00%]
        714,018,039      L1-dcache-load-misses     #    2.05% of all L1-dcache hits   [40.00%]
    133,897,794,677      cpu-cycles                                                   [40.05%]
    116,714,230,004      instructions              #    0.87  insns per cycle         [50.02%]
     22,689,223,546      branch-instructions                                          [50.01%]
        391,673,952      branch-misses             #    1.73% of all branches         [50.00%]
     22,726,856,215      branch-loads                                                 [50.01%]
     18,570,766,783      branch-load-misses                                           [49.98%]
     34,944,839,907      dTLB-loads                                                   [39.99%]
         24,405,944      dTLB-load-misses          #    0.07% of all dTLB cache hits  [39.99%]

      58.040785989 seconds time elapsed


ming@:~/git/qemu$ sudo ~/bin/perf stat -e L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses ./qemu-img bench -f raw -t off -n -c 10000000 /dev/nullb0
read time: 63369ms

 Performance counter stats for './qemu-img bench -f raw -t off -n -c 10000000 /dev/nullb0':

     35,751,490,462      L1-dcache-loads                                              [39.97%]
      1,111,352,581      L1-dcache-load-misses     #    3.11% of all L1-dcache hits   [40.01%]
    143,731,446,722      cpu-cycles                                                   [40.01%]
    118,754,926,871      instructions              #    0.83  insns per cycle         [50.04%]
     22,870,542,314      branch-instructions                                          [50.07%]
        524,893,216      branch-misses             #    2.30% of all branches         [50.05%]
     22,903,688,861      branch-loads                                                 [50.00%]
     20,179,726,291      branch-load-misses                                           [49.99%]
     35,829,927,679      dTLB-loads                                                   [39.96%]
         42,964,365      dTLB-load-misses          #    0.12% of all dTLB cache hits  [39.97%]

      63.392832844 seconds time elapsed


Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06 11:28                 ` Ming Lei
@ 2014-08-06 11:44                   ` Ming Lei
  2014-08-06 15:40                   ` Kevin Wolf
  1 sibling, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-06 11:44 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Wed, Aug 6, 2014 at 7:28 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf <kwolf@redhat.com> wrote:

>
> I used the /dev/nullb0 block device for the test, which is available in
> Linux kernel 3.13+; the difference follows, and it looks not very big (< 10%):
>
> I added two parameters to your img-bench patch:
>
>       -c CNT  # passed to 'data.n'
>       -b      # enable the coroutine bypass introduced in this patchset
>
> Another difference is that dataplane uses its own thread, while this
> bench runs in the main loop.
>
> ming@:~/git/qemu$ sudo ~/bin/perf stat -e L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses ./qemu-img bench -f raw -t off -n -c 10000000 -b /dev/nullb0
> read time: 58024ms
>
>  Performance counter stats for './qemu-img bench -f raw -t off -n -c 10000000 -b /dev/nullb0':
>
>      34,874,462,357      L1-dcache-loads                                              [40.00%]
>         714,018,039      L1-dcache-load-misses     #    2.05% of all L1-dcache hits   [40.00%]
>     133,897,794,677      cpu-cycles                                                   [40.05%]
>     116,714,230,004      instructions              #    0.87  insns per cycle         [50.02%]
>      22,689,223,546      branch-instructions                                          [50.01%]
>         391,673,952      branch-misses             #    1.73% of all branches         [50.00%]
>      22,726,856,215      branch-loads                                                 [50.01%]
>      18,570,766,783      branch-load-misses                                           [49.98%]
>      34,944,839,907      dTLB-loads                                                   [39.99%]
>          24,405,944      dTLB-load-misses          #    0.07% of all dTLB cache hits  [39.99%]
>
>       58.040785989 seconds time elapsed
>
>
> ming@:~/git/qemu$ sudo ~/bin/perf stat -e L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses ./qemu-img bench -f raw -t off -n -c 10000000 /dev/nullb0
> read time: 63369ms

BTW, Stefan's coroutine resize patch is applied in both
tests (qemu-img bench).

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  8:50               ` Paolo Bonzini
@ 2014-08-06 13:53                 ` Ming Lei
  0 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-06 13:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
	qemu-devel, Stefan Hajnoczi

On Wed, Aug 6, 2014 at 4:50 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 06/08/2014 at 10:38, Ming Lei wrote:
>> On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> On 06/08/2014 at 07:33, Ming Lei wrote:
>>>>>> I played a bit with the following, I hope it's not too naive. I couldn't
>>>>>> see a difference with your patches, but at least one reason for this is
>>>>>> probably that my laptop SSD isn't fast enough to make the CPU the
>>>>>> bottleneck. Haven't tried ramdisk yet, that would probably be the next
>>>>>> thing. (I actually wrote the patch up just for some profiling on my own,
>>>>>> not for comparing throughput, but it should be usable for that as well.)
>>>> This might not be good for the test since it is basically a sequential
>>>> read test, which can be optimized a lot by the kernel. And I always use
>>>> a randread benchmark.
>>>
>>> A microbenchmark already exists in tests/test-coroutine.c, and doesn't
>>> really tell us much; it's obvious that coroutines execute more code, the
>>> question is why it affects the iops performance.
>>
>> Could you take a look at the coroutine benchmark I worte?  The running
>> result shows coroutine does decrease performance a lot compared with
>> bypass coroutine like the patchset is doing.
>
> Your benchmark is synchronous, while disk I/O is asynchronous.

It can be thought of as asynchronous too, since it doesn't sleep the way
synchronous I/O does.

Basically the I/O thread is CPU-bound in the linux-aio case, since
neither submission nor completion usually blocks, so my benchmark still
fits if we treat the completion as a nop.
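
For reference, a minimal sketch of the benchmark shape I mean, written
against QEMU's 2.1-era coroutine API (qemu_coroutine_create/enter/yield);
this is only an illustration, not the actual co_bench patch:

    #include "block/coroutine.h"

    static void coroutine_fn bench_co(void *opaque)
    {
        unsigned long *done = opaque;

        qemu_coroutine_yield();     /* stands in for "I/O submitted" */
        (*done)++;                  /* the completion is treated as a nop */
    }

    static void run_bench(unsigned long nr_requests)
    {
        unsigned long done = 0;
        unsigned long i;

        for (i = 0; i < nr_requests; i++) {
            Coroutine *co = qemu_coroutine_create(bench_co);
            qemu_coroutine_enter(co, &done);  /* runs up to the yield */
            qemu_coroutine_enter(co, NULL);   /* resumes past the yield;
                                                 the coroutine terminates
                                                 and returns to the pool */
        }
    }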

The current problem is that the single-coroutine benchmark suggests the
stack switch itself doesn't hurt performance, yet in Kevin's block AIO
benchmark, bypassing coroutines still obtains an observable improvement.

>
> Your benchmark doesn't add much compared to "time tests/test-coroutine
> -m perf  -p /perf/yield".  It takes 8 seconds on my machine, and 10^8
> function calls obviously take less than 8 seconds.  I've sent a patch to
> add a "baseline" function call benchmark to test-coroutine.
>
>>> The sequential read should be the right workload.  For fio, you want to
>>> get as many iops as possible to QEMU and so you need randread.  But
>>> qemu-img is not run in a guest and if the kernel optimizes sequential
>>> reads then the bypass should have even more benefits because it makes
>>> userspace proportionally more expensive.
>
> Do you agree with this?

Yes, I have posted the benchmark results, and they look basically
similar to my previous test on dataplane.

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06 11:28                 ` Ming Lei
  2014-08-06 11:44                   ` Ming Lei
@ 2014-08-06 15:40                   ` Kevin Wolf
  2014-08-07 10:27                     ` Ming Lei
  1 sibling, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-06 15:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

Am 06.08.2014 um 13:28 hat Ming Lei geschrieben:
> On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 06.08.2014 um 11:37 hat Ming Lei geschrieben:
> >> On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> > Am 06.08.2014 um 07:33 hat Ming Lei geschrieben:
> >> >> Hi Kevin,
> >> >>
> >> >> On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> >> > Am 05.08.2014 um 15:48 hat Stefan Hajnoczi geschrieben:
> >> >> >> I have been wondering how to prove that the root cause is the ucontext
> >> >> >> coroutine mechanism (stack switching).  Here is an idea:
> >> >> >>
> >> >> >> Hack your "bypass" code path to run the request inside a coroutine.
> >> >> >> That way you can compare "bypass without coroutine" against "bypass with
> >> >> >> coroutine".
> >> >> >>
> >> >> >> Right now I think there are doubts because the bypass code path is
> >> >> >> indeed a different (and not 100% correct) code path.  So this approach
> >> >> >> might prove that the coroutines are adding the overhead and not
> >> >> >> something that you bypassed.
> >> >> >
> >> >> > My doubts aren't only that the overhead might not come from the
> >> >> > coroutines, but also whether any coroutine-related overhead is really
> >> >> > unavoidable. If we can optimise coroutines, I'd strongly prefer to do
> >> >> > just that instead of introducing additional code paths.
> >> >>
> >> >> OK, thank you for taking look at the problem, and hope we can
> >> >> figure out the root cause, :-)
> >> >>
> >> >> >
> >> >> > Another thought I had was this: If the performance difference is indeed
> >> >> > only coroutines, then that is completely inside the block layer and we
> >> >> > don't actually need a VM to test it. We could instead have something
> >> >> > like a simple qemu-img based benchmark and should be observing the same.
> >> >>
> >> >> Even it is simpler to run a coroutine-only benchmark, and I just
> >> >> wrote a raw one, and looks coroutine does decrease performance
> >> >> a lot, please see the attachment patch, and thanks for your template
> >> >> to help me add the 'co_bench' command in qemu-img.
> >> >
> >> > Yes, we can look at coroutines microbenchmarks in isolation. I actually
> >> > did do that yesterday with the yield test from tests/test-coroutine.c.
> >> > And in fact profiling immediately showed something to optimise:
> >> > pthread_getspecific() was quite high, replacing it by __thread on
> >> > systems where it works is more efficient and helped the numbers a bit.
> >> > Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even
> >> > in qemu-img bench), maybe there's even something that can be done here.
> >>
> >> The lock/unlock in dataplane is often from memory_region_find(), and Paolo
> >> should have done lots of work on that.

qemu-img bench doesn't run that code. We have a few more locks that are
taken, and one of them (the coroutine pool lock) is avoided by your
bypass patches.
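
For reference, the pool lock guards the global free list of coroutines.
A rough sketch of the fast path of qemu_coroutine_create(), simplified
from what qemu-coroutine.c does in this era:

    static QSLIST_HEAD(, Coroutine) pool = QSLIST_HEAD_INITIALIZER(pool);
    static QemuMutex pool_lock;

    Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
    {
        Coroutine *co;

        qemu_mutex_lock(&pool_lock);            /* taken per request */
        co = QSLIST_FIRST(&pool);
        if (co) {
            QSLIST_REMOVE_HEAD(&pool, pool_next);
        }
        qemu_mutex_unlock(&pool_lock);

        if (!co) {
            co = qemu_coroutine_new();          /* slow path: new stack */
        }
        co->entry = entry;
        return co;
    }

Every request that goes through a coroutine pays for that lock/unlock
pair (and again on release), which the bypass avoids entirely.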

> >> >
> >> > However, I just wasn't sure whether a change on this level would be
> >> > relevant in a realistic environment. This is the reason why I wanted to
> >> > get a benchmark involving the block layer and some I/O.
> >> >
> >> >> From the profiling data in below link:
> >> >>
> >> >>     http://pastebin.com/YwH2uwbq
> >> >>
> >> >> With coroutine, the running time for same loading is increased
> >> >> ~50%(1.325s vs. 0.903s), and dcache load events is increased
> >> >> ~35%(693M vs. 512M), insns per cycle is decreased by ~50%(
> >> >> 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter).
> >> >>
> >> >> The bypass code in the benchmark is very similar with the approach
> >> >> used in the bypass patch, since linux-aio with O_DIRECT seldom
> >> >> blocks in the the kernel I/O path.
> >> >>
> >> >> Maybe the benchmark is a bit extremely, but given modern storage
> >> >> device may reach millions of IOPS, and it is very easy to slow down
> >> >> the I/O by coroutine.
> >> >
> >> > I think in order to optimise coroutines, such benchmarks are fair game.
> >> > It's just not guaranteed that the effects are exactly the same on real
> >> > workloads, so we should take the results with a grain of salt.
> >> >
> >> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
> >> > coroutines instead of exiting them, so it can't make any use of the
> >> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
> >> > version that simply removes the yield at the end):
> >> >
> >> >                 | bypass        | fixed coro    | buggy coro
> >> > ----------------+---------------+---------------+--------------
> >> > time            | 1.09s         | 1.10s         | 1.62s
> >> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> >> > insns per cycle | 2.39          | 2.39          | 1.90
> >> >
> >> > Begs the question whether you see a similar effect on a real qemu and
> >> > the coroutine pool is still not big enough? With correct use of
> >> > coroutines, the difference seems to be barely measurable even without
> >> > any I/O involved.
> >>
> >> When I comment qemu_coroutine_yield(), looks result of
> >> bypass and fixed coro is very similar as your test, and I am just
> >> wondering if stack is always switched in qemu_coroutine_enter()
> >> without calling qemu_coroutine_yield().
> >
> > Yes, definitely. qemu_coroutine_enter() always involves calling
> > qemu_coroutine_switch(), which is the stack switch.
> >
> >> Without the yield, the benchmark can't emulate coroutine usage in
> >> bdrv_aio_readv/writev() path any more, and bypass in the patchset
> >> skips two qemu_coroutine_enter() and one qemu_coroutine_yield()
> >> for each bdrv_aio_readv/writev().
> >
> > It's not completely comparable anyway because you're not going through a
> > main loop and callbacks from there for your benchmark.
> >
> > But fair enough: Keep the yield, but enter the coroutine twice then. You
> > get slightly worse results then, but that's more like doubling the very
> > small difference between "bypass" and "fixed coro" (1.11s / 946,434,327
> > / 2.37), not like the horrible performance of the buggy version.
> 
> Yes, I compared that too, looks no big difference.
> 
> >
> > Actually, that's within the error of measurement for time and
> > insns/cycle, so running it for a bit longer:
> >
> >                 | bypass    | coro      | + yield   | buggy coro
> > ----------------+-----------+-----------+-----------+--------------
> > time            | 21.45s    | 21.68s    | 21.83s    | 97.05s
> > L1-dcache-loads | 18,049 M  | 18,387 M  | 18,618 M  | 26,062 M
> > insns per cycle | 2.42      | 2.40      | 2.41      | 1.75
> >
> >> >> > I played a bit with the following, I hope it's not too naive. I couldn't
> >> >> > see a difference with your patches, but at least one reason for this is
> >> >> > probably that my laptop SSD isn't fast enough to make the CPU the
> >> >> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
> >> >> > thing. (I actually wrote the patch up just for some profiling on my own,
> >> >> > not for comparing throughput, but it should be usable for that as well.)
> >> >>
> >> >> This might not be good for the test since it is basically a sequential
> >> >> read test, which can be optimized a lot by kernel. And I always use
> >> >> randread benchmark.
> >> >
> >> > Yes, I shortly pondered whether I should implement random offsets
> >> > instead. But then I realised that a quicker kernel operation would only
> >> > help the benchmark because we want it to test the CPU consumption in
> >> > userspace. So the faster the kernel gets, the better for us, because it
> >> > should make the impact of coroutines bigger.
> >>
> >> OK, I will compare coroutine vs. bypass-co with the benchmark.
> 
> I use the /dev/nullb0 block device to test, which is available in linux kernel
> 3.13+, and follows the difference, which looks not very big(< 10%):

Sounds useful. I'm running on an older kernel, so I used a loop-mounted
file on tmpfs instead for my tests.

Anyway, at some point today I figured I should take a different approach
and not try to minimise the problems that coroutines introduce, but
rather make the most use of them when we have them. After all, the
raw-posix driver is still very callback-oriented and does things that
aren't really necessary with coroutines (such as AIOCB allocation).

The qemu-img bench time I ended up with looked quite nice. Maybe you
want to take a look if you can reproduce these results, both with
qemu-img bench and your real benchmark.


$ for i in $(seq 1 5); do time ./qemu-img bench -t none -n -c 2000000 /dev/loop0; done
Sending 2000000 requests, 4096 bytes each, 64 in parallel

        bypass (base) | bypass (patch) | coro (base) | coro (patch)
----------------------+----------------+-------------+---------------
run 1   0m5.966s      | 0m5.687s       |  0m6.224s   | 0m5.362s
run 2   0m5.826s      | 0m5.831s       |  0m5.994s   | 0m5.541s
run 3   0m6.145s      | 0m5.495s       |  0m6.253s   | 0m5.408s
run 4   0m5.683s      | 0m5.527s       |  0m6.045s   | 0m5.293s
run 5   0m5.904s      | 0m5.607s       |  0m6.238s   | 0m5.207s


You can find my working tree at:

    git://repo.or.cz/qemu/kevin.git perf-bypass

Please note that I added an even worse and even wronger hack to keep the
bypass working so I can compare it (raw-posix exposes now both bdrv_aio*
and bdrv_co_*, and enabling the bypass also switches). Also, once the
AIO code that I kept for the bypass mode is gone, we can make the
coroutine path even nicer.
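
For the archives, the shape of that coroutine-native path is roughly the
following; the names here are hypothetical, not the code in the branch.
The request coroutine submits the AIO and yields, and the completion
handler re-enters it, so no AIOCB allocation or intermediate callback is
needed:

    typedef struct LaioRequest {
        Coroutine *co;      /* coroutine waiting for this request */
        int ret;            /* filled in by the completion handler */
    } LaioRequest;

    /* called from the aio event handler once io_getevents() reports
     * the request as done */
    static void laio_complete(LaioRequest *req, int ret)
    {
        req->ret = ret;
        qemu_coroutine_enter(req->co, NULL);    /* resume the waiter */
    }

    /* runs inside the request coroutine */
    static int coroutine_fn laio_co_preadv(int fd, uint64_t offset,
                                           QEMUIOVector *qiov)
    {
        LaioRequest req = { .co = qemu_coroutine_self() };

        laio_submit_req(&req, fd, offset, qiov);  /* wraps io_submit() */
        qemu_coroutine_yield();     /* woken up by laio_complete() */
        return req.ret;
    }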

Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06 15:40                   ` Kevin Wolf
@ 2014-08-07 10:27                     ` Ming Lei
  2014-08-07 10:52                       ` Ming Lei
  2014-08-07 13:51                       ` Kevin Wolf
  0 siblings, 2 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-07 10:27 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Wed, Aug 6, 2014 at 11:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 06.08.2014 um 13:28 hat Ming Lei geschrieben:
>> On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> > Am 06.08.2014 um 11:37 hat Ming Lei geschrieben:
>> >> On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> >> > Am 06.08.2014 um 07:33 hat Ming Lei geschrieben:
>> >> >> Hi Kevin,
>> >> >>
>> >> >> On Tue, Aug 5, 2014 at 10:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> >> >> > Am 05.08.2014 um 15:48 hat Stefan Hajnoczi geschrieben:
>> >> >> >> I have been wondering how to prove that the root cause is the ucontext
>> >> >> >> coroutine mechanism (stack switching).  Here is an idea:
>> >> >> >>
>> >> >> >> Hack your "bypass" code path to run the request inside a coroutine.
>> >> >> >> That way you can compare "bypass without coroutine" against "bypass with
>> >> >> >> coroutine".
>> >> >> >>
>> >> >> >> Right now I think there are doubts because the bypass code path is
>> >> >> >> indeed a different (and not 100% correct) code path.  So this approach
>> >> >> >> might prove that the coroutines are adding the overhead and not
>> >> >> >> something that you bypassed.
>> >> >> >
>> >> >> > My doubts aren't only that the overhead might not come from the
>> >> >> > coroutines, but also whether any coroutine-related overhead is really
>> >> >> > unavoidable. If we can optimise coroutines, I'd strongly prefer to do
>> >> >> > just that instead of introducing additional code paths.
>> >> >>
>> >> >> OK, thank you for taking look at the problem, and hope we can
>> >> >> figure out the root cause, :-)
>> >> >>
>> >> >> >
>> >> >> > Another thought I had was this: If the performance difference is indeed
>> >> >> > only coroutines, then that is completely inside the block layer and we
>> >> >> > don't actually need a VM to test it. We could instead have something
>> >> >> > like a simple qemu-img based benchmark and should be observing the same.
>> >> >>
>> >> >> Even it is simpler to run a coroutine-only benchmark, and I just
>> >> >> wrote a raw one, and looks coroutine does decrease performance
>> >> >> a lot, please see the attachment patch, and thanks for your template
>> >> >> to help me add the 'co_bench' command in qemu-img.
>> >> >
>> >> > Yes, we can look at coroutines microbenchmarks in isolation. I actually
>> >> > did do that yesterday with the yield test from tests/test-coroutine.c.
>> >> > And in fact profiling immediately showed something to optimise:
>> >> > pthread_getspecific() was quite high, replacing it by __thread on
>> >> > systems where it works is more efficient and helped the numbers a bit.
>> >> > Also, a lot of time seems to be spent in pthread_mutex_lock/unlock (even
>> >> > in qemu-img bench), maybe there's even something that can be done here.
>> >>
>> >> The lock/unlock in dataplane is often from memory_region_find(), and Paolo
>> >> should have done lots of work on that.
>
> qemu-img bench doesn't run that code. We have a few more locks that are
> taken, and one of them (the coroutine pool lock) is avoided by your
> bypass patches.
>
>> >> >
>> >> > However, I just wasn't sure whether a change on this level would be
>> >> > relevant in a realistic environment. This is the reason why I wanted to
>> >> > get a benchmark involving the block layer and some I/O.
>> >> >
>> >> >> From the profiling data in below link:
>> >> >>
>> >> >>     http://pastebin.com/YwH2uwbq
>> >> >>
>> >> >> With coroutine, the running time for same loading is increased
>> >> >> ~50%(1.325s vs. 0.903s), and dcache load events is increased
>> >> >> ~35%(693M vs. 512M), insns per cycle is decreased by ~50%(
>> >> >> 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter).
>> >> >>
>> >> >> The bypass code in the benchmark is very similar with the approach
>> >> >> used in the bypass patch, since linux-aio with O_DIRECT seldom
>> >> >> blocks in the the kernel I/O path.
>> >> >>
>> >> >> Maybe the benchmark is a bit extremely, but given modern storage
>> >> >> device may reach millions of IOPS, and it is very easy to slow down
>> >> >> the I/O by coroutine.
>> >> >
>> >> > I think in order to optimise coroutines, such benchmarks are fair game.
>> >> > It's just not guaranteed that the effects are exactly the same on real
>> >> > workloads, so we should take the results with a grain of salt.
>> >> >
>> >> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>> >> > coroutines instead of exiting them, so it can't make any use of the
>> >> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
>> >> > version that simply removes the yield at the end):
>> >> >
>> >> >                 | bypass        | fixed coro    | buggy coro
>> >> > ----------------+---------------+---------------+--------------
>> >> > time            | 1.09s         | 1.10s         | 1.62s
>> >> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>> >> > insns per cycle | 2.39          | 2.39          | 1.90
>> >> >
>> >> > Begs the question whether you see a similar effect on a real qemu and
>> >> > the coroutine pool is still not big enough? With correct use of
>> >> > coroutines, the difference seems to be barely measurable even without
>> >> > any I/O involved.
>> >>
>> >> When I comment qemu_coroutine_yield(), looks result of
>> >> bypass and fixed coro is very similar as your test, and I am just
>> >> wondering if stack is always switched in qemu_coroutine_enter()
>> >> without calling qemu_coroutine_yield().
>> >
>> > Yes, definitely. qemu_coroutine_enter() always involves calling
>> > qemu_coroutine_switch(), which is the stack switch.
>> >
>> >> Without the yield, the benchmark can't emulate coroutine usage in
>> >> bdrv_aio_readv/writev() path any more, and bypass in the patchset
>> >> skips two qemu_coroutine_enter() and one qemu_coroutine_yield()
>> >> for each bdrv_aio_readv/writev().
>> >
>> > It's not completely comparable anyway because you're not going through a
>> > main loop and callbacks from there for your benchmark.
>> >
>> > But fair enough: Keep the yield, but enter the coroutine twice then. You
>> > get slightly worse results then, but that's more like doubling the very
>> > small difference between "bypass" and "fixed coro" (1.11s / 946,434,327
>> > / 2.37), not like the horrible performance of the buggy version.
>>
>> Yes, I compared that too, looks no big difference.
>>
>> >
>> > Actually, that's within the error of measurement for time and
>> > insns/cycle, so running it for a bit longer:
>> >
>> >                 | bypass    | coro      | + yield   | buggy coro
>> > ----------------+-----------+-----------+-----------+--------------
>> > time            | 21.45s    | 21.68s    | 21.83s    | 97.05s
>> > L1-dcache-loads | 18,049 M  | 18,387 M  | 18,618 M  | 26,062 M
>> > insns per cycle | 2.42      | 2.40      | 2.41      | 1.75
>> >
>> >> >> > I played a bit with the following, I hope it's not too naive. I couldn't
>> >> >> > see a difference with your patches, but at least one reason for this is
>> >> >> > probably that my laptop SSD isn't fast enough to make the CPU the
>> >> >> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
>> >> >> > thing. (I actually wrote the patch up just for some profiling on my own,
>> >> >> > not for comparing throughput, but it should be usable for that as well.)
>> >> >>
>> >> >> This might not be good for the test since it is basically a sequential
>> >> >> read test, which can be optimized a lot by kernel. And I always use
>> >> >> randread benchmark.
>> >> >
>> >> > Yes, I shortly pondered whether I should implement random offsets
>> >> > instead. But then I realised that a quicker kernel operation would only
>> >> > help the benchmark because we want it to test the CPU consumption in
>> >> > userspace. So the faster the kernel gets, the better for us, because it
>> >> > should make the impact of coroutines bigger.
>> >>
>> >> OK, I will compare coroutine vs. bypass-co with the benchmark.
>>
>> I use the /dev/nullb0 block device to test, which is available in linux kernel
>> 3.13+, and follows the difference, which looks not very big(< 10%):
>
> Sounds useful. I'm running on an older kernel, so I used a loop-mounted
> file on tmpfs instead for my tests.

Actually loop is a slow device; recently I used kernel AIO and blk-mq to
speed it up a lot.

>
> Anyway, at some point today I figured I should take a different approach
> and not try to minimise the problems that coroutines introduce, but
> rather make the most use of them when we have them. After all, the
> raw-posix driver is still very callback-oriented and does things that
> aren't really necessary with coroutines (such as AIOCB allocation).
>
> The qemu-img bench time I ended up with looked quite nice. Maybe you
> want to take a look if you can reproduce these results, both with
> qemu-img bench and your real benchmark.
>
>
> $ for i in $(seq 1 5); do time ./qemu-img bench -t none -n -c 2000000 /dev/loop0; done
> Sending 2000000 requests, 4096 bytes each, 64 in parallel
>
>         bypass (base) | bypass (patch) | coro (base) | coro (patch)
> ----------------------+----------------+-------------+---------------
> run 1   0m5.966s      | 0m5.687s       |  0m6.224s   | 0m5.362s
> run 2   0m5.826s      | 0m5.831s       |  0m5.994s   | 0m5.541s
> run 3   0m6.145s      | 0m5.495s       |  0m6.253s   | 0m5.408s
> run 4   0m5.683s      | 0m5.527s       |  0m6.045s   | 0m5.293s
> run 5   0m5.904s      | 0m5.607s       |  0m6.238s   | 0m5.207s

I suggest running the test a bit longer.

>
> You can find my working tree at:
>
>     git://repo.or.cz/qemu/kevin.git perf-bypass

I just tried your working tree, and qemu-img works well with your
linux-aio coro patches, but unfortunately there is little improvement
observed on my server; the result is basically the same as without
bypass. On my laptop the improvement can be observed, but it is still
at least 5% behind bypass.

Here is the result on my server:

ming@:~/git/qemu$ sudo ./qemu-img bench -f raw -t off -n -c 6400000 /dev/nullb5
Sending 6400000 requests, 4096 bytes each, 64 in parallel
    read time: 38351ms, 166.000000K IOPS
ming@:~/git/qemu$
ming@:~/git/qemu$ sudo ./qemu-img bench -f raw -t off -n -c 6400000 -b
/dev/nullb5
Sending 6400000 requests, 4096 bytes each, 64 in parallel
    read time: 35241ms, 181.000000K IOPS
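
For anyone reproducing this: the /dev/nullb* devices come from the
null_blk module (Linux 3.13+). A typical setup for this kind of
benchmark could be the line below; the parameter values are only an
example:

    # create /dev/nullb0../dev/nullb5, blk-mq mode, inline completions
    modprobe null_blk nr_devices=6 queue_mode=2 irqmode=0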

Also there are some problems with your patches; a VM can't boot in my
environment:

- __thread patch: it looks like no '__thread' is actually used, and the
patch basically makes bypass not workable.

- the bdrv_co_writev callback isn't set for raw-posix; it looks like my
rootfs needs to write during booting.

- another problem, which I am investigating: laio sometimes isn't
accessible in qemu_laio_process_completion().

Actually I do care about the performance boost from multi-queue, since
multi-queue can improve performance a lot against QEMU 2.0. Once I have
fixed these problems, I will run a VM to test mq performance with
linux-aio coroutines. Or could you give suggestions about these problems?

> Please note that I added an even worse and even wronger hack to keep the
> bypass working so I can compare it (raw-posix exposes now both bdrv_aio*
> and bdrv_co_*, and enabling the bypass also switches). Also, once the
> AIO code that I kept for the bypass mode is gone, we can make the
> coroutine path even nicer.

This approach looks nice since it saves the intermediate callback.

Basically the current bypass approach bypasses coroutines in the block
layer, while linux-aio takes a new coroutine, so they are two different
paths. And linux-aio's coroutine can still be bypassed easily too, :-)


Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-07 10:27                     ` Ming Lei
@ 2014-08-07 10:52                       ` Ming Lei
  2014-08-07 11:06                         ` Kevin Wolf
  2014-08-07 13:51                       ` Kevin Wolf
  1 sibling, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-07 10:52 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Thu, Aug 7, 2014 at 6:27 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Wed, Aug 6, 2014 at 11:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:

> Also there are some problems with your patches which can't boot a
> VM in my environment:
>
> - __thread patch: looks there is no '__thread' used, and the patch
> basically makes bypass not workable.
>
> - bdrv_co_writev callback isn't set for raw-posix, looks my rootfs need to
> write during booting
>
> - another problem, I am investigating: laio isn't accessable
> in qemu_laio_process_completion() sometimes

This one should be caused by accessing 'laiocb' after cb().
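
In other words, the completion path must not touch the request after
invoking its callback, because the callback may release it. A minimal
illustration of the fix (a generic sketch, not the actual QEMU code):

    static void process_completion(struct laio_request *req)
    {
        /* copy out everything still needed *before* the callback runs */
        int ret = req->ret;
        void (*cb)(void *opaque, int ret) = req->cb;
        void *opaque = req->opaque;

        cb(opaque, ret);    /* may free req */
        /* req must not be dereferenced past this point */
    }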

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-07 10:52                       ` Ming Lei
@ 2014-08-07 11:06                         ` Kevin Wolf
  2014-08-07 13:03                           ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-07 11:06 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

Am 07.08.2014 um 12:52 hat Ming Lei geschrieben:
> On Thu, Aug 7, 2014 at 6:27 PM, Ming Lei <ming.lei@canonical.com> wrote:
> > On Wed, Aug 6, 2014 at 11:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> 
> > Also there are some problems with your patches which can't boot a
> > VM in my environment:
> >
> > - __thread patch: looks there is no '__thread' used, and the patch
> > basically makes bypass not workable.
> >
> > - bdrv_co_writev callback isn't set for raw-posix, looks my rootfs need to
> > write during booting
> >
> > - another problem, I am investigating: laio isn't accessable
> > in qemu_laio_process_completion() sometimes
> 
> This one should be caused by accessing 'laiocb' after cb().

I stumbled across the same problems this morning when I tried to
actually run VMs with it instead of just qemu-img bench. They should all
be fixed in my git repo now. (Haven't figured out yet why __thread
doesn't work, so I have reverted that part, probably at the cost of some
performance.)
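
For context, the optimisation in question swaps the pthread-key lookup
of the current coroutine for compiler-level TLS, roughly as below; this
is a sketch, not the actual patch, and the guard macro is made up:

    #include <pthread.h>

    #ifdef HAVE_TLS                        /* hypothetical guard */
    static __thread Coroutine *current;    /* direct TLS load */
    #define get_current()  (current)
    #else
    static pthread_key_t current_key;      /* function call per lookup */
    #define get_current() \
        ((Coroutine *)pthread_getspecific(current_key))
    #endif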

Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-07 11:06                         ` Kevin Wolf
@ 2014-08-07 13:03                           ` Ming Lei
  0 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-07 13:03 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Thu, Aug 7, 2014 at 7:06 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 07.08.2014 um 12:52 hat Ming Lei geschrieben:
>> On Thu, Aug 7, 2014 at 6:27 PM, Ming Lei <ming.lei@canonical.com> wrote:
>> > On Wed, Aug 6, 2014 at 11:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>
>> > Also there are some problems with your patches which can't boot a
>> > VM in my environment:
>> >
>> > - __thread patch: looks there is no '__thread' used, and the patch
>> > basically makes bypass not workable.
>> >
>> > - bdrv_co_writev callback isn't set for raw-posix, looks my rootfs need to
>> > write during booting
>> >
>> > - another problem, I am investigating: laio isn't accessable
>> > in qemu_laio_process_completion() sometimes
>>
>> This one should be caused by accessing 'laiocb' after cb().
>
> I stumbled across the same problems this morning when I tried to
> actually run VMs with it instead of just qemu-img bench. They should all
> be fixed in my git repo now. (Haven't figured out yet why __thread
> doesn't work, so I have reverted that part, probably at the cost of some
> performance.)

In my test there was no obvious performance effect from that commit, or
from pthread_getspecific(), which should be fine in the fast path. I
also simply reverted it since __thread can't be added. Interestingly, my
other local change is basically the same as yours, :-)

Finally I implemented coroutine bypass on top of your linux-aio coro
patches, so the bypass effect can be compared easily; now both run
basically the same path except for the coroutine APIs:

       git://kernel.ubuntu.com/ming/qemu.git  v2.1.0-mq.1-kevin-perf

The above branch only holds three patches which are against the
latest 'perf-bypass' branch of your tree.

Then I run it in VM on my server and still use the same fio(linux aio,
direct, 4k bs, 120sec) to test virtio-blk dataplane performance, and the
virtio-blk is backed by the /dev/nullb0 block device too.

              | without bypass (linux-aio coro) | with bypass (linux-aio coro)
--------------+---------------------------------+------------------------------
 1 vq, 2 jobs |           101K IOPS             |          116K IOPS
--------------+---------------------------------+------------------------------
 4 vq, 4 jobs |           121K IOPS             |          142K IOPS

Looks like there is still some difference even with the linux-aio
coroutine patches applied.

Now I am a bit more confident that coroutines are the cause of the
performance difference...

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-07 10:27                     ` Ming Lei
  2014-08-07 10:52                       ` Ming Lei
@ 2014-08-07 13:51                       ` Kevin Wolf
  2014-08-08 10:32                         ` Ming Lei
  1 sibling, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-07 13:51 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

Am 07.08.2014 um 12:27 hat Ming Lei geschrieben:
> On Wed, Aug 6, 2014 at 11:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 06.08.2014 um 13:28 hat Ming Lei geschrieben:
> >> On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> > Am 06.08.2014 um 11:37 hat Ming Lei geschrieben:
> >> >> On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> >> >> > However, I just wasn't sure whether a change on this level would be
> >> >> > relevant in a realistic environment. This is the reason why I wanted to
> >> >> > get a benchmark involving the block layer and some I/O.
> >> >> >
> >> >> >> From the profiling data in below link:
> >> >> >>
> >> >> >>     http://pastebin.com/YwH2uwbq
> >> >> >>
> >> >> >> With coroutine, the running time for same loading is increased
> >> >> >> ~50%(1.325s vs. 0.903s), and dcache load events is increased
> >> >> >> ~35%(693M vs. 512M), insns per cycle is decreased by ~50%(
> >> >> >> 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter).
> >> >> >>
> >> >> >> The bypass code in the benchmark is very similar with the approach
> >> >> >> used in the bypass patch, since linux-aio with O_DIRECT seldom
> >> >> >> blocks in the the kernel I/O path.
> >> >> >>
> >> >> >> Maybe the benchmark is a bit extremely, but given modern storage
> >> >> >> device may reach millions of IOPS, and it is very easy to slow down
> >> >> >> the I/O by coroutine.
> >> >> >
> >> >> > I think in order to optimise coroutines, such benchmarks are fair game.
> >> >> > It's just not guaranteed that the effects are exactly the same on real
> >> >> > workloads, so we should take the results with a grain of salt.
> >> >> >
> >> >> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
> >> >> > coroutines instead of exiting them, so it can't make any use of the
> >> >> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
> >> >> > version that simply removes the yield at the end):
> >> >> >
> >> >> >                 | bypass        | fixed coro    | buggy coro
> >> >> > ----------------+---------------+---------------+--------------
> >> >> > time            | 1.09s         | 1.10s         | 1.62s
> >> >> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> >> >> > insns per cycle | 2.39          | 2.39          | 1.90
> >> >> >
> >> >> > Begs the question whether you see a similar effect on a real qemu and
> >> >> > the coroutine pool is still not big enough? With correct use of
> >> >> > coroutines, the difference seems to be barely measurable even without
> >> >> > any I/O involved.
> >> >>
> >> >> When I comment qemu_coroutine_yield(), looks result of
> >> >> bypass and fixed coro is very similar as your test, and I am just
> >> >> wondering if stack is always switched in qemu_coroutine_enter()
> >> >> without calling qemu_coroutine_yield().
> >> >
> >> > Yes, definitely. qemu_coroutine_enter() always involves calling
> >> > qemu_coroutine_switch(), which is the stack switch.
> >> >
> >> >> Without the yield, the benchmark can't emulate coroutine usage in
> >> >> bdrv_aio_readv/writev() path any more, and bypass in the patchset
> >> >> skips two qemu_coroutine_enter() and one qemu_coroutine_yield()
> >> >> for each bdrv_aio_readv/writev().
> >> >
> >> > It's not completely comparable anyway because you're not going through a
> >> > main loop and callbacks from there for your benchmark.
> >> >
> >> > But fair enough: Keep the yield, but enter the coroutine twice then. You
> >> > get slightly worse results then, but that's more like doubling the very
> >> > small difference between "bypass" and "fixed coro" (1.11s / 946,434,327
> >> > / 2.37), not like the horrible performance of the buggy version.
> >>
> >> Yes, I compared that too, looks no big difference.
> >>
> >> >
> >> > Actually, that's within the error of measurement for time and
> >> > insns/cycle, so running it for a bit longer:
> >> >
> >> >                 | bypass    | coro      | + yield   | buggy coro
> >> > ----------------+-----------+-----------+-----------+--------------
> >> > time            | 21.45s    | 21.68s    | 21.83s    | 97.05s
> >> > L1-dcache-loads | 18,049 M  | 18,387 M  | 18,618 M  | 26,062 M
> >> > insns per cycle | 2.42      | 2.40      | 2.41      | 1.75
> >> >
> >> >> >> > I played a bit with the following, I hope it's not too naive. I couldn't
> >> >> >> > see a difference with your patches, but at least one reason for this is
> >> >> >> > probably that my laptop SSD isn't fast enough to make the CPU the
> >> >> >> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
> >> >> >> > thing. (I actually wrote the patch up just for some profiling on my own,
> >> >> >> > not for comparing throughput, but it should be usable for that as well.)
> >> >> >>
> >> >> >> This might not be good for the test since it is basically a sequential
> >> >> >> read test, which can be optimized a lot by kernel. And I always use
> >> >> >> randread benchmark.
> >> >> >
> >> >> > Yes, I shortly pondered whether I should implement random offsets
> >> >> > instead. But then I realised that a quicker kernel operation would only
> >> >> > help the benchmark because we want it to test the CPU consumption in
> >> >> > userspace. So the faster the kernel gets, the better for us, because it
> >> >> > should make the impact of coroutines bigger.
> >> >>
> >> >> OK, I will compare coroutine vs. bypass-co with the benchmark.
> >>
> >> I use the /dev/nullb0 block device to test, which is available in linux kernel
> >> 3.13+, and follows the difference, which looks not very big(< 10%):
> >
> > Sounds useful. I'm running on an older kernel, so I used a loop-mounted
> > file on tmpfs instead for my tests.
> 
> Actually loop is a slow device, and recently I used kernel aio and blk-mq
> to speedup it a lot.

Yes, I have no doubts that it's slower than a proper ramdisk, but it
should still be way faster than my normal disk.

> > Anyway, at some point today I figured I should take a different approach
> > and not try to minimise the problems that coroutines introduce, but
> > rather make the most use of them when we have them. After all, the
> > raw-posix driver is still very callback-oriented and does things that
> > aren't really necessary with coroutines (such as AIOCB allocation).
> >
> > The qemu-img bench time I ended up with looked quite nice. Maybe you
> > want to take a look if you can reproduce these results, both with
> > qemu-img bench and your real benchmark.
> >
> >
> > $ for i in $(seq 1 5); do time ./qemu-img bench -t none -n -c 2000000 /dev/loop0; done
> > Sending 2000000 requests, 4096 bytes each, 64 in parallel
> >
> >         bypass (base) | bypass (patch) | coro (base) | coro (patch)
> > ----------------------+----------------+-------------+---------------
> > run 1   0m5.966s      | 0m5.687s       |  0m6.224s   | 0m5.362s
> > run 2   0m5.826s      | 0m5.831s       |  0m5.994s   | 0m5.541s
> > run 3   0m6.145s      | 0m5.495s       |  0m6.253s   | 0m5.408s
> > run 4   0m5.683s      | 0m5.527s       |  0m6.045s   | 0m5.293s
> > run 5   0m5.904s      | 0m5.607s       |  0m6.238s   | 0m5.207s
> 
> I suggest to run the test a bit long.

Okay, ran it again with -c 10000000 this time. I also used the updated
branch for the patched version. This means that the __thread patch is
not enabled; this is probably why the improvement for the bypass has
disappeared and the coroutine based version only approaches, but doesn't
beat it this time.

        bypass (base) | bypass (patch) | coro (base) | coro (patch)
----------------------+----------------+-------------+---------------
run 1   28.255s       |  28.615s       | 30.364s     | 28.318s
run 2   28.190s       |  28.926s       | 30.096s     | 28.437s
run 3   28.079s       |  29.603s       | 30.084s     | 28.567s
run 4   28.888s       |  28.581s       | 31.343s     | 28.605s
run 5   28.196s       |  28.924s       | 30.033s     | 27.935s

> > You can find my working tree at:
> >
> >     git://repo.or.cz/qemu/kevin.git perf-bypass
> 
> I just tried your work tree, and looks qemu-img can work well
> with your linux-aio coro patches, but unfortunately there is
> little improvement observed in my server, basically the result is
> same without bypass; in my laptop, the improvement can be
> observed but it is still at least 5% less than bypass.
> 
> Let's see the result in my server:
> 
> ming@:~/git/qemu$ sudo ./qemu-img bench -f raw -t off -n -c 6400000 /dev/nullb5
> Sending 6400000 requests, 4096 bytes each, 64 in parallel
>     read time: 38351ms, 166.000000K IOPS
> ming@:~/git/qemu$
> ming@:~/git/qemu$ sudo ./qemu-img bench -f raw -t off -n -c 6400000 -b
> /dev/nullb5
> Sending 6400000 requests, 4096 bytes each, 64 in parallel
>     read time: 35241ms, 181.000000K IOPS

Hm, interesting. Apparently our environments are different enough to
come to opposite conclusions.

I also tried running some fio benchmarks based on the configuration you
had in the cover letter (just a bit downsized to fit it in the ramdisk)
and came to completely different results: For me, git master is a lot
better than qemu 2.0. The optimisation branch showed small, but
measurable additional improvements, with coroutines consistently being a
bit ahead of the bypass mode.

> > Please note that I added an even worse and even wronger hack to keep the
> > bypass working so I can compare it (raw-posix exposes now both bdrv_aio*
> > and bdrv_co_*, and enabling the bypass also switches). Also, once the
> > AIO code that I kept for the bypass mode is gone, we can make the
> > coroutine path even nicer.
> 
> This approach looks nice since it saves the intermediate callback.
> 
> Basically current bypass approach is to bypass coroutine in block, but
> linux-aio takes a new coroutine, which are two different path. And
> linux-aio's coroutine still can be bypassed easily too , :-)

The patched linux-aio doesn't create a new coroutine, it simply stays
in the one coroutine that we have and in which we already are. Bypassing
it by making the yield conditional would still be possible, of course
(for testing anyway; I don't think anything like that can be merged
easily).

Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-07 13:51                       ` Kevin Wolf
@ 2014-08-08 10:32                         ` Ming Lei
  2014-08-08 11:26                           ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-08 10:32 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Thu, Aug 7, 2014 at 9:51 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 07.08.2014 um 12:27 hat Ming Lei geschrieben:
>> On Wed, Aug 6, 2014 at 11:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> > Am 06.08.2014 um 13:28 hat Ming Lei geschrieben:
>> >> On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> >> > Am 06.08.2014 um 11:37 hat Ming Lei geschrieben:
>> >> >> On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> >> >> > However, I just wasn't sure whether a change on this level would be
>> >> >> > relevant in a realistic environment. This is the reason why I wanted to
>> >> >> > get a benchmark involving the block layer and some I/O.
>> >> >> >
>> >> >> >> From the profiling data in below link:
>> >> >> >>
>> >> >> >>     http://pastebin.com/YwH2uwbq
>> >> >> >>
>> >> >> >> With coroutine, the running time for same loading is increased
>> >> >> >> ~50%(1.325s vs. 0.903s), and dcache load events is increased
>> >> >> >> ~35%(693M vs. 512M), insns per cycle is decreased by ~50%(
>> >> >> >> 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter).
>> >> >> >>
>> >> >> >> The bypass code in the benchmark is very similar with the approach
>> >> >> >> used in the bypass patch, since linux-aio with O_DIRECT seldom
>> >> >> >> blocks in the the kernel I/O path.
>> >> >> >>
>> >> >> >> Maybe the benchmark is a bit extremely, but given modern storage
>> >> >> >> device may reach millions of IOPS, and it is very easy to slow down
>> >> >> >> the I/O by coroutine.
>> >> >> >
>> >> >> > I think in order to optimise coroutines, such benchmarks are fair game.
>> >> >> > It's just not guaranteed that the effects are exactly the same on real
>> >> >> > workloads, so we should take the results with a grain of salt.
>> >> >> >
>> >> >> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>> >> >> > coroutines instead of exiting them, so it can't make any use of the
>> >> >> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
>> >> >> > version that simply removes the yield at the end):
>> >> >> >
>> >> >> >                 | bypass        | fixed coro    | buggy coro
>> >> >> > ----------------+---------------+---------------+--------------
>> >> >> > time            | 1.09s         | 1.10s         | 1.62s
>> >> >> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>> >> >> > insns per cycle | 2.39          | 2.39          | 1.90
>> >> >> >
>> >> >> > Begs the question whether you see a similar effect on a real qemu and
>> >> >> > the coroutine pool is still not big enough? With correct use of
>> >> >> > coroutines, the difference seems to be barely measurable even without
>> >> >> > any I/O involved.
>> >> >>
>> >> >> When I comment qemu_coroutine_yield(), looks result of
>> >> >> bypass and fixed coro is very similar as your test, and I am just
>> >> >> wondering if stack is always switched in qemu_coroutine_enter()
>> >> >> without calling qemu_coroutine_yield().
>> >> >
>> >> > Yes, definitely. qemu_coroutine_enter() always involves calling
>> >> > qemu_coroutine_switch(), which is the stack switch.
>> >> >
>> >> >> Without the yield, the benchmark can't emulate coroutine usage in
>> >> >> bdrv_aio_readv/writev() path any more, and bypass in the patchset
>> >> >> skips two qemu_coroutine_enter() and one qemu_coroutine_yield()
>> >> >> for each bdrv_aio_readv/writev().
>> >> >
>> >> > It's not completely comparable anyway because you're not going through a
>> >> > main loop and callbacks from there for your benchmark.
>> >> >
>> >> > But fair enough: Keep the yield, but enter the coroutine twice then. You
>> >> > get slightly worse results then, but that's more like doubling the very
>> >> > small difference between "bypass" and "fixed coro" (1.11s / 946,434,327
>> >> > / 2.37), not like the horrible performance of the buggy version.
>> >>
>> >> Yes, I compared that too, looks no big difference.
>> >>
>> >> >
>> >> > Actually, that's within the error of measurement for time and
>> >> > insns/cycle, so running it for a bit longer:
>> >> >
>> >> >                 | bypass    | coro      | + yield   | buggy coro
>> >> > ----------------+-----------+-----------+-----------+--------------
>> >> > time            | 21.45s    | 21.68s    | 21.83s    | 97.05s
>> >> > L1-dcache-loads | 18,049 M  | 18,387 M  | 18,618 M  | 26,062 M
>> >> > insns per cycle | 2.42      | 2.40      | 2.41      | 1.75
>> >> >
>> >> >> >> > I played a bit with the following, I hope it's not too naive. I couldn't
>> >> >> >> > see a difference with your patches, but at least one reason for this is
>> >> >> >> > probably that my laptop SSD isn't fast enough to make the CPU the
>> >> >> >> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
>> >> >> >> > thing. (I actually wrote the patch up just for some profiling on my own,
>> >> >> >> > not for comparing throughput, but it should be usable for that as well.)
>> >> >> >>
>> >> >> >> This might not be good for the test since it is basically a sequential
>> >> >> >> read test, which can be optimized a lot by kernel. And I always use
>> >> >> >> randread benchmark.
>> >> >> >
>> >> >> > Yes, I shortly pondered whether I should implement random offsets
>> >> >> > instead. But then I realised that a quicker kernel operation would only
>> >> >> > help the benchmark because we want it to test the CPU consumption in
>> >> >> > userspace. So the faster the kernel gets, the better for us, because it
>> >> >> > should make the impact of coroutines bigger.
>> >> >>
>> >> >> OK, I will compare coroutine vs. bypass-co with the benchmark.
>> >>
>> >> I use the /dev/nullb0 block device to test, which is available in linux kernel
>> >> 3.13+, and follows the difference, which looks not very big(< 10%):
>> >
>> > Sounds useful. I'm running on an older kernel, so I used a loop-mounted
>> > file on tmpfs instead for my tests.
>>
>> Actually loop is a slow device, and recently I used kernel aio and blk-mq
>> to speedup it a lot.
>
> Yes, I have no doubts that it's slower than a proper ramdisk, but it
> should still be way faster than my normal disk.
>
>> > Anyway, at some point today I figured I should take a different approach
>> > and not try to minimise the problems that coroutines introduce, but
>> > rather make the most use of them when we have them. After all, the
>> > raw-posix driver is still very callback-oriented and does things that
>> > aren't really necessary with coroutines (such as AIOCB allocation).
>> >
>> > The qemu-img bench time I ended up with looked quite nice. Maybe you
>> > want to take a look if you can reproduce these results, both with
>> > qemu-img bench and your real benchmark.
>> >
>> >
>> > $ for i in $(seq 1 5); do time ./qemu-img bench -t none -n -c 2000000 /dev/loop0; done
>> > Sending 2000000 requests, 4096 bytes each, 64 in parallel
>> >
>> >         bypass (base) | bypass (patch) | coro (base) | coro (patch)
>> > ----------------------+----------------+-------------+---------------
>> > run 1   0m5.966s      | 0m5.687s       |  0m6.224s   | 0m5.362s
>> > run 2   0m5.826s      | 0m5.831s       |  0m5.994s   | 0m5.541s
>> > run 3   0m6.145s      | 0m5.495s       |  0m6.253s   | 0m5.408s
>> > run 4   0m5.683s      | 0m5.527s       |  0m6.045s   | 0m5.293s
>> > run 5   0m5.904s      | 0m5.607s       |  0m6.238s   | 0m5.207s
>>
>> I suggest to run the test a bit long.
>
> Okay, ran it again with -c 10000000 this time. I also used the updated
> branch for the patched version. This means that the __thread patch is
> not enabled; this is probably why the improvement for the bypass has
> disappeared and the coroutine based version only approaches, but doesn't
> beat it this time.
>
>         bypass (base) | bypass (patch) | coro (base) | coro (patch)
> ----------------------+----------------+-------------+---------------
> run 1   28.255s       |  28.615s       | 30.364s     | 28.318s
> run 2   28.190s       |  28.926s       | 30.096s     | 28.437s
> run 3   28.079s       |  29.603s       | 30.084s     | 28.567s
> run 4   28.888s       |  28.581s       | 31.343s     | 28.605s
> run 5   28.196s       |  28.924s       | 30.033s     | 27.935s

Your result is quite good (>300K IOPS), much better than my result with
/dev/nullb0 (less than 200K). I also tried loop over a file in tmpfs,
which looks a bit quicker than /dev/nullb0 (still ~200K IOPS on my
server), so I guess your machine is very fast.

It is a bit similar to my observation:

- on my laptop (CPU: 2.6GHz), your coro patch improved things a lot, and
is only about 5% behind bypass
- on my server (CPU: 1.6GHz, same L1/L2 cache as the laptop, bigger L3
cache), your coro patch improved little, and is nearly 10% behind bypass

So does coroutine behave better on fast CPUs than on slow ones?

I would appreciate it if you could run the test in a VM, especially with
2 or 4 virtqueues and 2/4 jobs, to see what IOPS can be reached.
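
For concreteness, a hypothetical invocation along the lines of this
patchset (the num_queues property name is an assumption based on the mq
patch in this series; the iothread syntax follows QEMU 2.1 dataplane):

    qemu-system-x86_64 -enable-kvm -m 8192 -smp 4 \
        -object iothread,id=iot0 \
        -drive if=none,id=d0,file=/dev/nullb0,format=raw,cache=none,aio=native \
        -device virtio-blk-pci,drive=d0,iothread=iot0,num_queues=4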

>> > You can find my working tree at:
>> >
>> >     git://repo.or.cz/qemu/kevin.git perf-bypass
>>
>> I just tried your work tree, and looks qemu-img can work well
>> with your linux-aio coro patches, but unfortunately there is
>> little improvement observed in my server, basically the result is
>> same without bypass; in my laptop, the improvement can be
>> observed but it is still at least 5% less than bypass.
>>
>> Let's see the result in my server:
>>
>> ming@:~/git/qemu$ sudo ./qemu-img bench -f raw -t off -n -c 6400000 /dev/nullb5
>> Sending 6400000 requests, 4096 bytes each, 64 in parallel
>>     read time: 38351ms, 166.000000K IOPS
>> ming@:~/git/qemu$
>> ming@:~/git/qemu$ sudo ./qemu-img bench -f raw -t off -n -c 6400000 -b
>> /dev/nullb5
>> Sending 6400000 requests, 4096 bytes each, 64 in parallel
>>     read time: 35241ms, 181.000000K IOPS
>
> Hm, interesting. Apparently our environments are different enough to
> come to opposite conclusions.

Yes, it looks like coroutines behave better on a fast CPU than on a slow
one; as you can see, my result is much worse than yours.

ming@:~/git/qemu$ sudo losetup -a
/dev/loop0: [0014]:64892 (/run/shm/dd.img)
ming@:~/git/qemu$ sudo ./qemu-img bench -f raw -n -t off -c 2000000 -b
/dev/loop0
Sending 2000000 requests, 4096 bytes each, 64 in parallel
    read time: 9692ms, 206.000000K IOPS
ming@:~/git/qemu$ sudo ./qemu-img bench -f raw -n -t off -c 2000000 /dev/loop0
Sending 2000000 requests, 4096 bytes each, 64 in parallel
    read time: 10683ms, 187.000000K IOPS
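
The loop device above sits on a file in tmpfs, set up with something
like the following (the image size here is a guess):

    dd if=/dev/zero of=/run/shm/dd.img bs=1M count=1024
    sudo losetup /dev/loop0 /run/shm/dd.img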

>
> I also tried running some fio benchmarks based on the configuration you
> had in the cover letter (just a bit downsized to fit it in the ramdisk)
> and came to completely different results: For me, git master is a lot
> better than qemu 2.0. The optimisation branch showed small, but
> measurable additional improvements, with coroutines consistently being a
> bit ahead of the bypass mode.
>
>> > Please note that I added an even worse and even wronger hack to keep the
>> > bypass working so I can compare it (raw-posix exposes now both bdrv_aio*
>> > and bdrv_co_*, and enabling the bypass also switches). Also, once the
>> > AIO code that I kept for the bypass mode is gone, we can make the
>> > coroutine path even nicer.
>>
>> This approach looks nice since it saves the intermediate callback.
>>
>> Basically current bypass approach is to bypass coroutine in block, but
>> linux-aio takes a new coroutine, which are two different path. And
>> linux-aio's coroutine still can be bypassed easily too , :-)
>
> The patched linux-aio doesn't create a new coroutine, it simply stays
> in the one coroutine that we have and in which we already are. Bypassing
> it by making the yield conditional would still be possible, of course
> (for testing anyway; I don't think anything like that can be merged
> easily).


Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-08 10:32                         ` Ming Lei
@ 2014-08-08 11:26                           ` Ming Lei
  0 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-08 11:26 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Fri, Aug 8, 2014 at 6:32 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Thu, Aug 7, 2014 at 9:51 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 07.08.2014 um 12:27 hat Ming Lei geschrieben:
>>> On Wed, Aug 6, 2014 at 11:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> > Am 06.08.2014 um 13:28 hat Ming Lei geschrieben:
>>> >> On Wed, Aug 6, 2014 at 6:09 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> >> > Am 06.08.2014 um 11:37 hat Ming Lei geschrieben:
>>> >> >> On Wed, Aug 6, 2014 at 4:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> >> >> > However, I just wasn't sure whether a change on this level would be
>>> >> >> > relevant in a realistic environment. This is the reason why I wanted to
>>> >> >> > get a benchmark involving the block layer and some I/O.
>>> >> >> >
>>> >> >> >> From the profiling data in below link:
>>> >> >> >>
>>> >> >> >>     http://pastebin.com/YwH2uwbq
>>> >> >> >>
>>> >> >> >> With coroutine, the running time for same loading is increased
>>> >> >> >> ~50%(1.325s vs. 0.903s), and dcache load events is increased
>>> >> >> >> ~35%(693M vs. 512M), insns per cycle is decreased by ~50%(
>>> >> >> >> 1.35 vs. 1.63), compared with bypassing coroutine(-b parameter).
>>> >> >> >>
>>> >> >> >> The bypass code in the benchmark is very similar with the approach
>>> >> >> >> used in the bypass patch, since linux-aio with O_DIRECT seldom
>>> >> >> >> blocks in the the kernel I/O path.
>>> >> >> >>
>>> >> >> >> Maybe the benchmark is a bit extremely, but given modern storage
>>> >> >> >> device may reach millions of IOPS, and it is very easy to slow down
>>> >> >> >> the I/O by coroutine.
>>> >> >> >
>>> >> >> > I think in order to optimise coroutines, such benchmarks are fair game.
>>> >> >> > It's just not guaranteed that the effects are exactly the same on real
>>> >> >> > workloads, so we should take the results with a grain of salt.
>>> >> >> >
>>> >> >> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>>> >> >> > coroutines instead of exiting them, so it can't make any use of the
>>> >> >> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
>>> >> >> > version that simply removes the yield at the end):
>>> >> >> >
>>> >> >> >                 | bypass        | fixed coro    | buggy coro
>>> >> >> > ----------------+---------------+---------------+--------------
>>> >> >> > time            | 1.09s         | 1.10s         | 1.62s
>>> >> >> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>>> >> >> > insns per cycle | 2.39          | 2.39          | 1.90
>>> >> >> >
>>> >> >> > Begs the question whether you see a similar effect on a real qemu and
>>> >> >> > the coroutine pool is still not big enough? With correct use of
>>> >> >> > coroutines, the difference seems to be barely measurable even without
>>> >> >> > any I/O involved.
>>> >> >>
>>> >> >> When I comment qemu_coroutine_yield(), looks result of
>>> >> >> bypass and fixed coro is very similar as your test, and I am just
>>> >> >> wondering if stack is always switched in qemu_coroutine_enter()
>>> >> >> without calling qemu_coroutine_yield().
>>> >> >
>>> >> > Yes, definitely. qemu_coroutine_enter() always involves calling
>>> >> > qemu_coroutine_switch(), which is the stack switch.
>>> >> >
>>> >> >> Without the yield, the benchmark can't emulate coroutine usage in
>>> >> >> bdrv_aio_readv/writev() path any more, and bypass in the patchset
>>> >> >> skips two qemu_coroutine_enter() and one qemu_coroutine_yield()
>>> >> >> for each bdrv_aio_readv/writev().
>>> >> >
>>> >> > It's not completely comparable anyway because you're not going through a
>>> >> > main loop and callbacks from there for your benchmark.
>>> >> >
>>> >> > But fair enough: Keep the yield, but enter the coroutine twice then. You
>>> >> > get slightly worse results then, but that's more like doubling the very
>>> >> > small difference between "bypass" and "fixed coro" (1.11s / 946,434,327
>>> >> > / 2.37), not like the horrible performance of the buggy version.
>>> >>
>>> >> Yes, I compared that too, and saw no big difference.
>>> >>
>>> >> >
>>> >> > Actually, that's within the error of measurement for time and
>>> >> > insns/cycle, so running it for a bit longer:
>>> >> >
>>> >> >                 | bypass    | coro      | + yield   | buggy coro
>>> >> > ----------------+-----------+-----------+-----------+--------------
>>> >> > time            | 21.45s    | 21.68s    | 21.83s    | 97.05s
>>> >> > L1-dcache-loads | 18,049 M  | 18,387 M  | 18,618 M  | 26,062 M
>>> >> > insns per cycle | 2.42      | 2.40      | 2.41      | 1.75
>>> >> >
>>> >> >> >> > I played a bit with the following, I hope it's not too naive. I couldn't
>>> >> >> >> > see a difference with your patches, but at least one reason for this is
>>> >> >> >> > probably that my laptop SSD isn't fast enough to make the CPU the
>>> >> >> >> > bottleneck. Haven't tried ramdisk yet, that would probably be the next
>>> >> >> >> > thing. (I actually wrote the patch up just for some profiling on my own,
>>> >> >> >> > not for comparing throughput, but it should be usable for that as well.)
>>> >> >> >>
>>> >> >> This might not be good for the test since it is basically a sequential
>>> >> >> read test, which can be optimized a lot by the kernel. And I always use
>>> >> >> a randread benchmark.
>>> >> >> >
>>> >> >> > Yes, I shortly pondered whether I should implement random offsets
>>> >> >> > instead. But then I realised that a quicker kernel operation would only
>>> >> >> > help the benchmark because we want it to test the CPU consumption in
>>> >> >> > userspace. So the faster the kernel gets, the better for us, because it
>>> >> >> > should make the impact of coroutines bigger.
>>> >> >>
>>> >> >> OK, I will compare coroutine vs. bypass-co with the benchmark.
>>> >>
>>> >> I used the /dev/nullb0 block device to test, which is available in linux
>>> >> kernel 3.13+; the difference follows, and it looks not very big (< 10%):
>>> >
>>> > Sounds useful. I'm running on an older kernel, so I used a loop-mounted
>>> > file on tmpfs instead for my tests.
>>>
>>> Actually loop is a slow device, and recently I used kernel aio and blk-mq
>>> to speed it up a lot.
>>
>> Yes, I have no doubts that it's slower than a proper ramdisk, but it
>> should still be way faster than my normal disk.
>>
>>> > Anyway, at some point today I figured I should take a different approach
>>> > and not try to minimise the problems that coroutines introduce, but
>>> > rather make the most use of them when we have them. After all, the
>>> > raw-posix driver is still very callback-oriented and does things that
>>> > aren't really necessary with coroutines (such as AIOCB allocation).
>>> >
>>> > The qemu-img bench time I ended up with looked quite nice. Maybe you
>>> > want to take a look if you can reproduce these results, both with
>>> > qemu-img bench and your real benchmark.
>>> >
>>> >
>>> > $ for i in $(seq 1 5); do time ./qemu-img bench -t none -n -c 2000000 /dev/loop0; done
>>> > Sending 2000000 requests, 4096 bytes each, 64 in parallel
>>> >
>>> >         bypass (base) | bypass (patch) | coro (base) | coro (patch)
>>> > ----------------------+----------------+-------------+---------------
>>> > run 1   0m5.966s      | 0m5.687s       |  0m6.224s   | 0m5.362s
>>> > run 2   0m5.826s      | 0m5.831s       |  0m5.994s   | 0m5.541s
>>> > run 3   0m6.145s      | 0m5.495s       |  0m6.253s   | 0m5.408s
>>> > run 4   0m5.683s      | 0m5.527s       |  0m6.045s   | 0m5.293s
>>> > run 5   0m5.904s      | 0m5.607s       |  0m6.238s   | 0m5.207s
>>>
>>> I suggest running the test a bit longer.
>>
>> Okay, ran it again with -c 10000000 this time. I also used the updated
>> branch for the patched version. This means that the __thread patch is
>> not enabled; this is probably why the improvement for the bypass has
>> disappeared and the coroutine based version only approaches, but doesn't
>> beat it this time.
>>
>>         bypass (base) | bypass (patch) | coro (base) | coro (patch)
>> ----------------------+----------------+-------------+---------------
>> run 1   28.255s       |  28.615s       | 30.364s     | 28.318s
>> run 2   28.190s       |  28.926s       | 30.096s     | 28.437s
>> run 3   28.079s       |  29.603s       | 30.084s     | 28.567s
>> run 4   28.888s       |  28.581s       | 31.343s     | 28.605s
>> run 5   28.196s       |  28.924s       | 30.033s     | 27.935s
>
> Your result is quite good (>300K IOPS), much better than my result with
> /dev/nullb0 (less than 200K), and I also tried loop over a file in tmpfs,
> which looks a bit quicker than /dev/nullb0 (still ~200K IOPS on my server),
> so I guess your machine is very fast.
>
> It is a bit similar to my observation:
>
> - on my laptop (CPU: 2.6GHz), your coro patch improved things a lot, and
> is only less than 5% behind bypass
> - on my server (CPU: 1.6GHz, same L1/L2 cache as the laptop, bigger L3
> cache), your coro patch improved little, and it is about 10% behind bypass
>
> so it looks like coroutines behave better on fast CPUs than on slow ones?

I think it is true:

- using coroutines inevitably introduces some extra CPU load
(coroutine_swap, and the dcache misses introduced by switching stacks)

- the introduced load may not be a (big) deal for a fast CPU, but it makes
a difference for a slower CPU

- even for a fast CPU, 'perf stat' may still show some difference in
instructions per cycle, dcache loads and misses, branch misses, dTLB
misses...
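
For example, one way to compare the two runs is something along these lines
(a sketch; the exact perf event names depend on the CPU and kernel, and
co_bench is the benchmark command added by the patch later in this thread):

    perf stat -e instructions,cycles,L1-dcache-loads,branch-misses,dTLB-loads \
        ./qemu-img co_bench -c 10000000 -f somefile -s 512
    perf stat -e instructions,cycles,L1-dcache-loads,branch-misses,dTLB-loads \
        ./qemu-img co_bench -b -c 10000000 -f somefile -s 512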

BTW, Kevin, if we want to see the coroutine effect in the block I/O path,
it may be better to use the same path (bypassing the linux-aio coroutine
too, as in the tree below, or just qemu master with my patchset) to compare
results, since then the only difference is in using coroutines or not:

   git://kernel.ubuntu.com/ming/qemu.git  v2.1.0-mq.1-kevin-perf

In your perf-bypass branch, bypass and non-bypass run different paths, so
it is not suitable for doing the comparison and drawing conclusions.

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-06  8:48           ` Kevin Wolf
  2014-08-06  9:37             ` Ming Lei
@ 2014-08-10  3:46             ` Ming Lei
  2014-08-11 14:03               ` Kevin Wolf
  2014-08-11 19:37               ` Paolo Bonzini
  1 sibling, 2 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-10  3:46 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, tom.leiming,
	qemu-devel, Stefan Hajnoczi, Paolo Bonzini

Hi Kevin, Paolo, Stefan and all,


On Wed, 6 Aug 2014 10:48:55 +0200
Kevin Wolf <kwolf@redhat.com> wrote:

> On 06.08.2014 at 07:33, Ming Lei wrote:

> 
> Anyhow, the coroutine version of your benchmark is buggy, it leaks all
> coroutines instead of exiting them, so it can't make any use of the
> coroutine pool. On my laptop, I get this (where fixed coroutine is a
> version that simply removes the yield at the end):
> 
>                 | bypass        | fixed coro    | buggy coro
> ----------------+---------------+---------------+--------------
> time            | 1.09s         | 1.10s         | 1.62s
> L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> insns per cycle | 2.39          | 2.39          | 1.90
> 
> Begs the question whether you see a similar effect on a real qemu and
> the coroutine pool is still not big enough? With correct use of
> coroutines, the difference seems to be barely measurable even without
> any I/O involved.

Now I have fixed the coroutine leak bug. The previous crypt bench put a rather
heavy load on the CPU, which kept operations per second very low (~40K/sec), so
I wrote a new and simple one which can generate hundreds of thousands of
operations per second; that number should match some fast storage devices, and
it does show a non-trivial effect from coroutines.

In the extreme case where just the getppid() syscall is run in each iteration,
only 3M operations/sec can be reached with coroutines, while without coroutines
the number can reach 16M/sec: more than a 4x difference!
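
As a rough sanity check of those numbers (my own arithmetic, not a separate
measurement): the per-iteration times work out as

    1 / 3M  ops/sec ~= 333 ns per iteration (coroutine)
    1 / 16M ops/sec ~=  62 ns per iteration (bypass)
    difference      ~= 271 ns per iteration

which is in the right ballpark for one coroutine create plus two enters and
one yield per iteration.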

From another file-read bench, which is the default one:

      just doing open(file), read(fd, buf on stack, 512), sum and close() in each iteration

without using coroutines, operations per second increase by ~20% compared
with using coroutines. When reading 1024 bytes each time, the number still
increases by ~10%. The operations-per-second level is between 200K and 400K,
which should match the IOPS in the dataplane test, and the tests were done
on my Lenovo T410 notebook (CPU: 2.6GHz, dual core, four threads).

When reading 8192 or more bytes each time, the difference between using
coroutines and not can no longer be observed clearly.

Surely, the test result depends on how fast the machine is, but even
for a fast machine, I guess a similar result can still be observed by
decreasing the number of bytes read each time.


diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index ae64b3d..78c3b60 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -15,6 +15,12 @@ STEXI
 @item bench [-q] [-f @var{fmt]} [-n] [-t @var{cache}] filename
 ETEXI
 
+DEF("co_bench", co_bench,
+    "co_bench -c count -f read_file_name -s read_size -q -b")
+STEXI
+@item co_bench [-c @var{count}] [-f @var{filename}] [-s @var{read_size}] [-b] [-q]
+ETEXI
+
 DEF("check", img_check,
     "check [-q] [-f fmt] [--output=ofmt]  [-r [leaks | all]] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index 3e1b7c4..c9c7ac3 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -366,6 +366,138 @@ static int add_old_style_options(const char *fmt, QemuOpts *opts,
     return 0;
 }
 
+struct co_data {
+    const char *file_name;
+    unsigned long sum;
+    int read_size;
+    bool bypass;
+};
+
+static unsigned long file_bench(struct co_data *co)
+{
+    const int size = co->read_size;
+    int fd = open(co->file_name, O_RDONLY);
+    char buf[size];
+    int len, i;
+    unsigned long sum = 0;
+
+    if (fd < 0) {
+        perror("open file failed\n");
+        exit(-1);
+    }
+
+    /* the 1st page should have been in page cache, needn't worry about blocking */
+    len = read(fd, buf, size);
+    if (len != size) {
+        perror("open file failed\n");
+        exit(-1);
+    }
+    close(fd);
+
+    for (i = 0; i < len; i++) {
+        sum += buf[i];
+    }
+
+    return sum;
+}
+
+static void syscall_bench(void *opaque)
+{
+    struct co_data *data = opaque;
+
+#if 0
+    /*
+     * Doing getppid() only shows operations per sec may increase 5
+     * times on my T410 notebook via bypassing coroutines!!!
+     */
+    data->sum += getppid();
+#else
+    /*
+     * open, read 1024 bytes, and close will show a ~10% increase on my
+     * T410 notebook via bypassing coroutines!!!
+     *
+     * open, read 512 bytes, and close will show a ~20% increase on my
+     * T410 notebook via bypassing coroutines!!!
+     *
+     * Below link provides 'perf stat' on several hw events:
+     *
+     *       http://pastebin.com/5s750m8C
+     *
+     * And with bypassing coroutine, dcache loads decreases, insns per
+     * cycle increased 0.7, branch-misses ratio decreases 0.4%, and
+     * dTLB-loads decreases too.
+     */
+    data->sum += file_bench(data);
+#endif
+
+    if (!data->bypass) {
+        qemu_coroutine_yield();
+    }
+}
+
+static int co_bench(int argc, char **argv)
+{
+    int c;
+    unsigned long cnt = 1;
+    int num = 1;
+    unsigned long i;
+    struct co_data data = {
+        .file_name = argv[-1],
+        .sum = 0,
+        .read_size = 1024,
+        .bypass = false,
+    };
+    Coroutine *co, *last_co = NULL;
+    struct timeval t1, t2;
+    unsigned long tv = 0;
+
+    for (;;) {
+        c = getopt(argc, argv, "bc:s:f:");
+        if (c == -1) {
+            break;
+        }
+        switch (c) {
+        case 'b':
+            data.bypass = true;
+            break;
+        case 'c':
+            num = atoi(optarg);
+            break;
+        case 's':
+            data.read_size = atoi(optarg);
+            break;
+        case 'f':
+            data.file_name = optarg;
+            break;
+        }
+    }
+
+    printf("%s: iterations %d, bypass: %s, file %s, read_size: %d\n",
+           __func__, num,
+           data.bypass ? "yes" : "no",
+           data.file_name, data.read_size);
+    gettimeofday(&t1, NULL);
+    for (i = 0; i < num * cnt; i++) {
+        if (!data.bypass) {
+            if (last_co) {
+                qemu_coroutine_enter(last_co, NULL);
+            }
+            co = qemu_coroutine_create(syscall_bench);
+            last_co = co;
+            qemu_coroutine_enter(co, &data);
+        } else {
+            syscall_bench(&data);
+        }
+    }
+    gettimeofday(&t2, NULL);
+    tv = (t2.tv_sec - t1.tv_sec) * 1000000 +
+        (t2.tv_usec - t1.tv_usec);
+    printf("\ttotal time: %lums, %5.0fK ops per sec\n", tv / 1000,
+           (double)(cnt * num * 1000) / tv);
+
+    return (int)data.sum;
+}
+
 static int img_create(int argc, char **argv)
 {
     int c;


Thanks,
-- 
Ming Lei

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-10  3:46             ` Ming Lei
@ 2014-08-11 14:03               ` Kevin Wolf
  2014-08-12  7:53                 ` Ming Lei
  2014-08-11 19:37               ` Paolo Bonzini
  1 sibling, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-11 14:03 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, tom.leiming,
	qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 10.08.2014 at 05:46, Ming Lei wrote:
> Hi Kevin, Paolo, Stefan and all,
> 
> 
> On Wed, 6 Aug 2014 10:48:55 +0200
> Kevin Wolf <kwolf@redhat.com> wrote:
> 
> > On 06.08.2014 at 07:33, Ming Lei wrote:
> 
> > 
> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
> > coroutines instead of exiting them, so it can't make any use of the
> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
> > version that simply removes the yield at the end):
> > 
> >                 | bypass        | fixed coro    | buggy coro
> > ----------------+---------------+---------------+--------------
> > time            | 1.09s         | 1.10s         | 1.62s
> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> > insns per cycle | 2.39          | 2.39          | 1.90
> > 
> > Begs the question whether you see a similar effect on a real qemu and
> > the coroutine pool is still not big enough? With correct use of
> > coroutines, the difference seems to be barely measurable even without
> > any I/O involved.
> 
> Now I have fixed the coroutine leak bug. The previous crypt bench put a rather
> heavy load on the CPU, which kept operations per second very low (~40K/sec), so
> I wrote a new and simple one which can generate hundreds of thousands of
> operations per second; that number should match some fast storage devices, and
> it does show a non-trivial effect from coroutines.
> 
> In the extreme case where just the getppid() syscall is run in each iteration,
> only 3M operations/sec can be reached with coroutines, while without coroutines
> the number can reach 16M/sec: more than a 4x difference!

I see that you're measuring a lot of things, but the one thing that is
unclear to me is what question those benchmarks are supposed to answer.

Basically I see two different, useful types of benchmark:

1. Look at coroutines in isolation and try to get a directly coroutine-
   related function (like create/destroy or yield/reenter) faster. This
   is what tests/test-coroutine does.

   This is quite good at telling you what costs the coroutine functions
   have and where you need to optimise - without taking the practical
   benefits into account, so it's not suitable for comparison (see the
   sketch below).

2. Look at the whole thing in its realistic environment. This should
   probably involve at least some asynchronous I/O, but ideally use the
   whole block layer. qemu-img bench tries to do this. For being even
   closer to the real environment you'd have to use the virtio-blk code
   as well, which you currently only get with a full VM (perhaps qtest
   could do something interesting here in theory).

   This is good for telling how big the costs are in relation to the
   total workload (and code saved elsewhere) in practice. This is the
   set of tests that can meaningfully be compared to a callback-based
   solution.

Running arbitrary workloads like getppid() or open/read/close isn't as
useful as these. It doesn't isolate the coroutines as well as tests that
run literally nothing else than coroutine functions, and it is too
removed from the actual use case to get the relation between additional
costs, saving and total workload figured out for the real case.
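
For completeness, a type 1 measurement is essentially nothing but a loop of
enter/yield pairs; a sketch in the spirit of tests/test-coroutine (not the
actual test code) looks like this:

    static void coroutine_fn yield_loop(void *opaque)
    {
        unsigned int *counter = opaque;

        while ((*counter) > 0) {
            (*counter)--;
            qemu_coroutine_yield();
        }
    }

    static void perf_yield(void)
    {
        unsigned int i = 10000000;
        Coroutine *co = qemu_coroutine_create(yield_loop);

        /* each pass through this loop is exactly one reenter + one yield */
        while (i > 0) {
            qemu_coroutine_enter(co, &i);
        }
    }

Timing that loop gives you the per-enter/per-yield cost and nothing else.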

> From another file-read bench, which is the default one:
>
>       just doing open(file), read(fd, buf on stack, 512), sum and close() in each iteration
>
> without using coroutines, operations per second increase by ~20% compared
> with using coroutines. When reading 1024 bytes each time, the number still
> increases by ~10%. The operations-per-second level is between 200K and 400K,
> which should match the IOPS in the dataplane test, and the tests were done
> on my Lenovo T410 notebook (CPU: 2.6GHz, dual core, four threads).
>
> When reading 8192 or more bytes each time, the difference between using
> coroutines and not can no longer be observed clearly.

All it tells you is that the variation of the workload can make the
coroutine cost disappear in the noise. It doesn't tell you much about
the real use case.

And you're comparing apples and oranges anyway: The real question in
qemu is whether you use coroutines or pass around heap-allocated state
between callbacks. Your benchmark doesn't have a single callback because
it hasn't got any asynchronous operations and doesn't need to allocate
and pass any state.

It does, however, have an unnecessary yield() for the coroutine case
because you felt that the real case is more complex and does yield
(which is true, but it's more complex for both coroutines and
callbacks).

> Surely, the test result should depend on how fast the machine is, but even
> for fast machine, I guess the similar result still can be observed by
> decreasing read bytes each time.

Yes, results looked similar on my laptop. (They just don't tell me
much.)


Let's have a look at some fio results from my laptop:

aggrb KB/s  | master    | coroutine | bypass
------------+-----------+-----------+------------
run 1       | 419934    | 449518    | 445823
run 2       | 444358    | 456365    | 448332
run 3       | 444076    | 455209    | 441552


And here from my lab test box:

aggrb KB/s  | master    | coroutine | bypass
------------+-----------+-----------+------------
run 1       | 25330     | 56378     | 53541
run 2       | 26041     | 55709     | 54136
run 3       | 25811     | 56829     | 49080

The improvement of the bypass patches is barely measurable on my laptop
(if it even exists), whereas it seems to be a pretty big thing for my
lab test box. In any case, the optimised coroutine code seems to beat
the bypass on both machines. (That is for random reads anyway. For
sequential, I get a much larger variation, and on my lab test box bypass
is ahead, whereas on my laptop both are roughly on the same level.)


Another thing I tried is creating the coroutine already in virtio-blk to
avoid the overhead of the bdrv_aio_* emulation. I don't quite understand
the result of my benchmarks there, maybe you have an idea: For random
reads, I see a significant improvement, for sequential however a clear
degradation.

aggrb MB/s  | bypass    | coroutine | virtio-blk-created coroutine
------------+-----------+-----------+------------------------------
seq. read   | 738       | 738       | 694
random read | 442       | 459       | 475

I would appreciate any ideas about what's going on with sequential reads
here and how it can be fixed. Anyway, on my machines, coroutines don't
look like a lost case at all.
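
Concretely, the idea is roughly the following (a simplified sketch of the
approach, not the actual patch; the request fields are abbreviated):

    /* virtio-blk creates the coroutine itself and calls the coroutine
     * version of the block layer API directly, skipping the bdrv_aio_*
     * emulation */
    static void coroutine_fn virtio_blk_rw_co(void *opaque)
    {
        VirtIOBlockReq *req = opaque;
        int ret;

        ret = bdrv_co_readv(req->dev->bs, req->sector_num,
                            req->qiov.size / BDRV_SECTOR_SIZE, &req->qiov);
        virtio_blk_rw_complete(req, ret);
    }

    /* ... in the request handler ... */
    Coroutine *co = qemu_coroutine_create(virtio_blk_rw_co);
    qemu_coroutine_enter(co, req);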

Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-10  3:46             ` Ming Lei
  2014-08-11 14:03               ` Kevin Wolf
@ 2014-08-11 19:37               ` Paolo Bonzini
  2014-08-12  8:12                 ` Ming Lei
                                   ` (3 more replies)
  1 sibling, 4 replies; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-11 19:37 UTC (permalink / raw)
  To: Ming Lei, Kevin Wolf; +Cc: tom.leiming, Fam Zheng, qemu-devel, Stefan Hajnoczi

On 10/08/2014 05:46, Ming Lei wrote:
> Hi Kevin, Paolo, Stefan and all,
> 
> 
> On Wed, 6 Aug 2014 10:48:55 +0200
> Kevin Wolf <kwolf@redhat.com> wrote:
> 
>> On 06.08.2014 at 07:33, Ming Lei wrote:
> 
>>
>> Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>> coroutines instead of exiting them, so it can't make any use of the
>> coroutine pool. On my laptop, I get this (where fixed coroutine is a
>> version that simply removes the yield at the end):
>>
>>                 | bypass        | fixed coro    | buggy coro
>> ----------------+---------------+---------------+--------------
>> time            | 1.09s         | 1.10s         | 1.62s
>> L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>> insns per cycle | 2.39          | 2.39          | 1.90
>>
>> Begs the question whether you see a similar effect on a real qemu and
>> the coroutine pool is still not big enough? With correct use of
>> coroutines, the difference seems to be barely measurable even without
>> any I/O involved.
> 
> Now I have fixed the coroutine leak bug. The previous crypt bench put a rather
> heavy load on the CPU, which kept operations per second very low (~40K/sec), so
> I wrote a new and simple one which can generate hundreds of thousands of
> operations per second; that number should match some fast storage devices, and
> it does show a non-trivial effect from coroutines.
> 
> In the extreme case where just the getppid() syscall is run in each iteration,
> only 3M operations/sec can be reached with coroutines, while without coroutines
> the number can reach 16M/sec: more than a 4x difference!

I should be on vacation, but I'm following a couple of threads on the mailing
list and I'm a bit tired of hearing the same argument again and again...

The different characteristics of asynchronous I/O vs. any synchronous workload
are such that it is hard to be sure that microbenchmarks make sense.

The patch below is basically the minimal change to bypass coroutines.  Of course
the block.c part is not acceptable as is (the change to refresh_total_sectors
is broken, the others are just ugly), but it is a start.  Please run it with
your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
benchmark.

Paolo

diff --git a/block.c b/block.c
index 3e252a2..0b6e9cf 100644
--- a/block.c
+++ b/block.c
@@ -704,7 +704,7 @@ static int refresh_total_sectors(BlockDriverState *bs, int64_t hint)
         return 0;
 
     /* query actual device if possible, otherwise just trust the hint */
-    if (drv->bdrv_getlength) {
+    if (!hint && drv->bdrv_getlength) {
         int64_t length = drv->bdrv_getlength(bs);
         if (length < 0) {
             return length;
@@ -2651,9 +2651,6 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
     if (!bdrv_is_inserted(bs))
         return -ENOMEDIUM;
 
-    if (bs->growable)
-        return 0;
-
     len = bdrv_getlength(bs);
 
     if (offset < 0)
@@ -3107,7 +3104,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
     if (!drv) {
         return -ENOMEDIUM;
     }
-    if (bdrv_check_byte_request(bs, offset, bytes)) {
+    if (!bs->growable && bdrv_check_byte_request(bs, offset, bytes)) {
         return -EIO;
     }
 
@@ -3347,7 +3344,7 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
     if (bs->read_only) {
         return -EACCES;
     }
-    if (bdrv_check_byte_request(bs, offset, bytes)) {
+    if (!bs->growable && bdrv_check_byte_request(bs, offset, bytes)) {
         return -EIO;
     }
 
@@ -4356,6 +4353,20 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
 {
     trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
 
+    if (bs->drv && bs->drv->bdrv_aio_readv &&
+        bs->drv->bdrv_aio_readv != bdrv_aio_readv_em &&
+        nb_sectors >= 0 && nb_sectors <= (UINT_MAX >> BDRV_SECTOR_BITS) &&
+        !bdrv_check_byte_request(bs, sector_num << BDRV_SECTOR_BITS,
+                                 nb_sectors << BDRV_SECTOR_BITS) &&
+        !bs->copy_on_read && !bs->io_limits_enabled &&
+        bs->request_alignment <= BDRV_SECTOR_SIZE) {
+        BlockDriverAIOCB *acb =
+            bs->drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
+                                    cb, opaque);
+        assert(acb);
+        return acb;
+    }
+
     return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
                                  cb, opaque, false);
 }
@@ -4366,6 +4377,24 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
 {
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
+    if (bs->drv && bs->drv->bdrv_aio_writev &&
+        bs->drv->bdrv_aio_writev != bdrv_aio_writev_em &&
+        nb_sectors >= 0 && nb_sectors <= (UINT_MAX >> BDRV_SECTOR_BITS) &&
+        !bdrv_check_byte_request(bs, sector_num << BDRV_SECTOR_BITS,
+                                 nb_sectors << BDRV_SECTOR_BITS) &&
+        !bs->read_only && !bs->io_limits_enabled &&
+        bs->request_alignment <= BDRV_SECTOR_SIZE &&
+        bs->enable_write_cache &&
+        QLIST_EMPTY(&bs->before_write_notifiers.notifiers) &&
+        bs->wr_highest_sector >= sector_num + nb_sectors - 1 &&
+        QLIST_EMPTY(&bs->dirty_bitmaps)) {
+        BlockDriverAIOCB *acb =
+            bs->drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
+                                     cb, opaque);
+        assert(acb);
+        return acb;
+    }
+
     return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
                                  cb, opaque, true);
 }
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 492f58d..b86f26b 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -48,6 +48,22 @@ static int raw_reopen_prepare(BDRVReopenState *reopen_state,
     return 0;
 }
 
+static BlockDriverAIOCB *raw_aio_readv(BlockDriverState *bs, int64_t sector_num,
+                                     QEMUIOVector *qiov, int nb_sectors,
+                                    BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+    return bdrv_aio_readv(bs->file, sector_num, qiov, nb_sectors, cb, opaque);
+}
+
+static BlockDriverAIOCB *raw_aio_writev(BlockDriverState *bs, int64_t sector_num,
+                                      QEMUIOVector *qiov, int nb_sectors,
+                                     BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+    return bdrv_aio_writev(bs->file, sector_num, qiov, nb_sectors, cb, opaque);
+}
+
 static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num,
                                      int nb_sectors, QEMUIOVector *qiov)
 {
@@ -181,6 +197,8 @@ static BlockDriver bdrv_raw = {
     .bdrv_open            = &raw_open,
     .bdrv_close           = &raw_close,
     .bdrv_create          = &raw_create,
+    .bdrv_aio_readv       = &raw_aio_readv,
+    .bdrv_aio_writev      = &raw_aio_writev,
     .bdrv_co_readv        = &raw_co_readv,
     .bdrv_co_writev       = &raw_co_writev,
     .bdrv_co_write_zeroes = &raw_co_write_zeroes,

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-11 14:03               ` Kevin Wolf
@ 2014-08-12  7:53                 ` Ming Lei
  2014-08-12 11:40                   ` Kevin Wolf
  0 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-12  7:53 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Mon, Aug 11, 2014 at 10:03 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 10.08.2014 at 05:46, Ming Lei wrote:
>> Hi Kevin, Paolo, Stefan and all,
>>
>>
>> On Wed, 6 Aug 2014 10:48:55 +0200
>> Kevin Wolf <kwolf@redhat.com> wrote:
>>
>> > On 06.08.2014 at 07:33, Ming Lei wrote:
>>
>> >
>> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>> > coroutines instead of exiting them, so it can't make any use of the
>> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
>> > version that simply removes the yield at the end):
>> >
>> >                 | bypass        | fixed coro    | buggy coro
>> > ----------------+---------------+---------------+--------------
>> > time            | 1.09s         | 1.10s         | 1.62s
>> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>> > insns per cycle | 2.39          | 2.39          | 1.90
>> >
>> > Begs the question whether you see a similar effect on a real qemu and
>> > the coroutine pool is still not big enough? With correct use of
>> > coroutines, the difference seems to be barely measurable even without
>> > any I/O involved.
>>
>> Now I have fixed the coroutine leak bug. The previous crypt bench put a rather
>> heavy load on the CPU, which kept operations per second very low (~40K/sec), so
>> I wrote a new and simple one which can generate hundreds of thousands of
>> operations per second; that number should match some fast storage devices, and
>> it does show a non-trivial effect from coroutines.
>>
>> In the extreme case where just the getppid() syscall is run in each iteration,
>> only 3M operations/sec can be reached with coroutines, while without coroutines
>> the number can reach 16M/sec: more than a 4x difference!
>
> I see that you're measuring a lot of things, but the one thing that is
> unclear to me is what question those benchmarks are supposed to answer.
>
> Basically I see two different, useful types of benchmark:
>
> 1. Look at coroutines in isolation and try to get a directly coroutine-
>    related function (like create/destroy or yield/reenter) faster. This
>    is what tests/test-coroutine does.

Actually tests/test-coroutine does tell us there is a non-trivial cost
introduced by using coroutines, as Paolo computed in his environment[1]:

    - one yield takes 83ns
    - one enter takes 97ns
    - this will introduce an 8.3% cost from using coroutines if the block
      device can reach 300K IOPS, like your case of loop over tmpfs (see
      the arithmetic spelled out below)
    - it may cause a 13.8% cost if the block device can reach 500K IOPS

The cost may show up in IOPS, in CPU utilization, or both, depending on
how fast the CPU is.

The above computation assumes every coroutine allocation hits the pool,
and does not consider the effect of switching stacks. If both are taken
into account, the cost surely becomes bigger.

[1], https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg01544.html
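
For reference, the arithmetic behind those percentages (my own restatement,
assuming the per-request coroutine work is the two enters plus one yield
counted earlier in this thread):

    per-request overhead ~= 2 * 97ns + 83ns = 277ns
    at 300K IOPS: 277ns / 3333ns per request ~= 8.3%
    at 500K IOPS: 277ns / 2000ns per request ~= 13.8%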

>    This is quite good at telling you what costs the coroutine functions
>    have and where you need to optimise - without taking the practical
>    benefits into account, so it's not suitable for comparison.
>
> 2. Look at the whole thing in its realistic environment. This should
>    probably involve at least some asynchronous I/O, but ideally use the
>    whole block layer. qemu-img bench tries to do this. For being even
>    closer to the real environment you'd have to use the virtio-blk code
>    as well, which you currently only get with a full VM (perhaps qtest
>    could do something interesting here in theory).
>
>    This is good for telling how big the costs are in relation to the
>    total workload (and code saved elsewhere) in practice. This is the
>    set of tests that can meaningfully be compared to a callback-based
>    solution.
>
> Running arbitrary workloads like getppid() or open/read/close isn't as
> useful as these. It doesn't isolate the coroutines as well as tests that
> run literally nothing else than coroutine functions, and it is too
> removed from the actual use case to get the relation between additional
> costs, saving and total workload figured out for the real case.

If you think getppid() doesn't isolate the coroutine, you can just do a nop;
then you will find the cost may reach 90%.  Basically it has nothing to do
with what the load does, and everything to do with how fast the load can
run. The quicker the load, the more cost is introduced by using coroutines;
please see the computation in the link above.

Also, another reason I use getppid() is that:

     After I/O plug&unplug was introduced, bdrv_aio_readv/bdrv_aio_writev
     became much quicker, because most of the time they just queue the I/O
     request into the I/O queue, with no io_submit involved at all. Even
     though coroutine operations take little time (<100ns), they still may
     make a difference compared with the time for queuing the I/O only, at
     least for high-speed I/O, like > 300K IOPS in your case.
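
     The submission pattern there is roughly the following (a simplified
     sketch of the dataplane usage, not the actual code):

         bdrv_io_plug(bs);
         for (i = 0; i < num_reqs; i++) {
             /* usually just appends the request to the plugged queue */
             bdrv_aio_readv(bs, reqs[i].sector, &reqs[i].qiov,
                            reqs[i].nb_sectors, cb, &reqs[i]);
         }
         /* one batched io_submit for the whole queue */
         bdrv_io_unplug(bs);

     so what remains per request is mostly bookkeeping, and an extra couple
     of hundred nanoseconds of coroutine work becomes visible.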

>> From another file-read bench, which is the default one:
>>
>>       just doing open(file), read(fd, buf on stack, 512), sum and close() in each iteration
>>
>> without using coroutines, operations per second increase by ~20% compared
>> with using coroutines. When reading 1024 bytes each time, the number still
>> increases by ~10%. The operations-per-second level is between 200K and 400K,
>> which should match the IOPS in the dataplane test, and the tests were done
>> on my Lenovo T410 notebook (CPU: 2.6GHz, dual core, four threads).
>>
>> When reading 8192 or more bytes each time, the difference between using
>> coroutines and not can no longer be observed clearly.
>
> All it tells you is that the variation of the workload can make the
> coroutine cost disappear in the noise. It doesn't tell you much about
> the real use case.

When the cost disappears, the IOPS has already become very small. That
again suggests the coroutine cost matters for the high-speed I/O case.

> And you're comparing apples and oranges anyway: The real question in
> qemu is whether you use coroutines or pass around heap-allocated state
> between callbacks. Your benchmark doesn't have a single callback because
> it hasn't got any asynchronous operations and doesn't need to allocate
> and pass any state.
>
> It does, however, have an unnecessary yield() for the coroutine case
> because you felt that the real case is more complex and does yield
> (which is true, but it's more complex for both coroutines and
> callbacks).
>
>> Surely, the test result should depend on how fast the machine is, but even
>> for fast machine, I guess the similar result still can be observed by
>> decreasing read bytes each time.
>
> Yes, results looked similar on my laptop. (They just don't tell me
> much.)
>
>
> Let's have a look at some fio results from my laptop:
>
> aggrb KB/s  | master    | coroutine | bypass
> ------------+-----------+-----------+------------
> run 1       | 419934    | 449518    | 445823
> run 2       | 444358    | 456365    | 448332
> run 3       | 444076    | 455209    | 441552
>
>
> And here from my lab test box:
>
> aggrb KB/s  | master    | coroutine | bypass
> ------------+-----------+-----------+------------
> run 1       | 25330     | 56378     | 53541
> run 2       | 26041     | 55709     | 54136
> run 3       | 25811     | 56829     | 49080
>
> The improvement of the bypass patches is barely measurable on my laptop
> (if it even exists), whereas it seems to be a pretty big thing for my
> lab test box. In any case, the optimised coroutine code seems to beat
> the bypass on both machines. (That is for random reads anyway. For
> sequential, I get a much larger variation, and on my lab test box bypass
> is ahead, whereas on my laptop both are roughly on the same level.)
>
> Another thing I tried is creating the coroutine already in virtio-blk to
> avoid the overhead of the bdrv_aio_* emulation. I don't quite understand
> the result of my benchmarks there, maybe you have an idea: For random
> reads, I see a significant improvement, for sequential however a clear
> degradation.
>
> aggrb MB/s  | bypass    | coroutine | virtio-blk-created coroutine
> ------------+-----------+-----------+------------------------------
> seq. read   | 738       | 738       | 694
> random read | 442       | 459       | 475
>
> I would appreciate any ideas about what's going on with sequential reads
> here and how it can be fixed. Anyway, on my machines, coroutines don't
> look like a lost case at all.

Firstly I hope you can bypass the coroutine only to do the test, that said, use
same code path except for coroutine operation to observe effect from coroutine.

Secondly, maybe your machine is fast enough that we can't observe the
IOPS difference easily, but there should be a difference in CPU utilization,
since the above computation tells us the coroutine cost does exist. The
faster the block device, the bigger the cost.


Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-11 19:37               ` Paolo Bonzini
@ 2014-08-12  8:12                 ` Ming Lei
  2014-08-12 19:08                   ` Paolo Bonzini
  2014-08-13  8:55                 ` Stefan Hajnoczi
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-12  8:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

On Tue, Aug 12, 2014 at 3:37 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 10/08/2014 05:46, Ming Lei wrote:
>> Hi Kevin, Paolo, Stefan and all,
>>
>>
>> On Wed, 6 Aug 2014 10:48:55 +0200
>> Kevin Wolf <kwolf@redhat.com> wrote:
>>
>>> On 06.08.2014 at 07:33, Ming Lei wrote:
>>
>>>
>>> Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>>> coroutines instead of exiting them, so it can't make any use of the
>>> coroutine pool. On my laptop, I get this (where fixed coroutine is a
>>> version that simply removes the yield at the end):
>>>
>>>                 | bypass        | fixed coro    | buggy coro
>>> ----------------+---------------+---------------+--------------
>>> time            | 1.09s         | 1.10s         | 1.62s
>>> L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>>> insns per cycle | 2.39          | 2.39          | 1.90
>>>
>>> Begs the question whether you see a similar effect on a real qemu and
>>> the coroutine pool is still not big enough? With correct use of
>>> coroutines, the difference seems to be barely measurable even without
>>> any I/O involved.
>>
>> Now I have fixed the coroutine leak bug. The previous crypt bench put a rather
>> heavy load on the CPU, which kept operations per second very low (~40K/sec), so
>> I wrote a new and simple one which can generate hundreds of thousands of
>> operations per second; that number should match some fast storage devices, and
>> it does show a non-trivial effect from coroutines.
>>
>> In the extreme case where just the getppid() syscall is run in each iteration,
>> only 3M operations/sec can be reached with coroutines, while without coroutines
>> the number can reach 16M/sec: more than a 4x difference!
>
> I should be on vacation, but I'm following a couple threads in the mailing list
> and I'm a bit tired to hear the same argument again and again...

I am sorry to interrupt your vacation and make you tired, but the discussion
isn't simply repeating itself; something new comes up every time, or at
least most of the time.

>
> The different characteristics of asynchronous I/O vs. any synchronous workload
> are such that it is hard to be sure that microbenchmarks make sense.

I don't think it is related to asynchronous vs. synchronous I/O; there
isn't any sleep (or wait for completion) at all, and we can treat it as
AIO by thinking of the completion as a nop in this case (AIO model:
submit and complete).

IMO the getppid() bench is a simple simulation of bdrv_aio_readv/writev()
with I/O plug/unplug, wrt. coroutine usage.

BTW, do you agree with the computation of the coroutine cost in my previous
mail? I don't think that computation is related to the I/O type.

>
> The below patch is basically the minimal change to bypass coroutines.  Of course
> the block.c part is not acceptable as is (the change to refresh_total_sectors
> is broken, the others are just ugly), but it is a start.  Please run it with
> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
> benchmark.

Could you explain why the new change is introduced?

I will hold off on it until we can agree on the coroutine cost computation,
because that is very important for the discussion.

Thank you again for taking the time for this discussion.

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-12  7:53                 ` Ming Lei
@ 2014-08-12 11:40                   ` Kevin Wolf
  2014-08-12 12:14                     ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-12 11:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On 12.08.2014 at 09:53, Ming Lei wrote:
> On Mon, Aug 11, 2014 at 10:03 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> > On 10.08.2014 at 05:46, Ming Lei wrote:
> >> Hi Kevin, Paolo, Stefan and all,
> >>
> >>
> >> On Wed, 6 Aug 2014 10:48:55 +0200
> >> Kevin Wolf <kwolf@redhat.com> wrote:
> >>
> >> > On 06.08.2014 at 07:33, Ming Lei wrote:
> >>
> >> >
> >> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
> >> > coroutines instead of exiting them, so it can't make any use of the
> >> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
> >> > version that simply removes the yield at the end):
> >> >
> >> >                 | bypass        | fixed coro    | buggy coro
> >> > ----------------+---------------+---------------+--------------
> >> > time            | 1.09s         | 1.10s         | 1.62s
> >> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> >> > insns per cycle | 2.39          | 2.39          | 1.90
> >> >
> >> > Begs the question whether you see a similar effect on a real qemu and
> >> > the coroutine pool is still not big enough? With correct use of
> >> > coroutines, the difference seems to be barely measurable even without
> >> > any I/O involved.
> >>
> >> Now I have fixed the coroutine leak bug. The previous crypt bench put a rather
> >> heavy load on the CPU, which kept operations per second very low (~40K/sec), so
> >> I wrote a new and simple one which can generate hundreds of thousands of
> >> operations per second; that number should match some fast storage devices, and
> >> it does show a non-trivial effect from coroutines.
> >>
> >> In the extreme case where just the getppid() syscall is run in each iteration,
> >> only 3M operations/sec can be reached with coroutines, while without coroutines
> >> the number can reach 16M/sec: more than a 4x difference!
> >
> > I see that you're measuring a lot of things, but the one thing that is
> > unclear to me is what question those benchmarks are supposed to answer.
> >
> > Basically I see two different, useful types of benchmark:
> >
> > 1. Look at coroutines in isolation and try to get a directly coroutine-
> >    related function (like create/destroy or yield/reenter) faster. This
> >    is what tests/test-coroutine does.
> 
> Actually tests/test-coroutine does tell us there is a non-trivial cost
> introduced by using coroutines, as Paolo computed in his environment[1]:
> 
>     - one yield takes 83ns
>     - one enter takes 97ns

Okay so far (haven't checked the numbers, but I'll assume they are
right).

>>     - this will introduce an 8.3% cost from using coroutines if the block
>>       device can reach 300K IOPS, like your case of loop over tmpfs
>>     - it may cause a 13.8% cost if the block device can reach 500K IOPS

Here your argumentation goes downhill. I wrote "coroutines in isolation"
for a reason. Here you're starting to leave the isolation and draw
conclusions from the microbenchmark to the real environment. As if that
wasn't bad enough, you're comparing "using coroutines" to "doing
nothing, but magic happens and we get the right result anyway". This is
not a useful comparison.

> The cost may show up in IOPS, in CPU utilization, or both, depending on
> how fast the CPU is.
> 
> The above computation assumes every coroutine allocation hits the pool,
> and does not consider the effect of switching stacks. If both are taken
> into account, the cost surely becomes bigger.
> 
> [1], https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg01544.html
> 
> >    This is quite good at telling you what costs the coroutine functions
> >    have and where you need to optimise - without taking the practical
> >    benefits into account, so it's not suitable for comparison.
> >
> > 2. Look at the whole thing in its realistic environment. This should
> >    probably involve at least some asynchronous I/O, but ideally use the
> >    whole block layer. qemu-img bench tries to do this. For being even
> >    closer to the real environment you'd have to use the virtio-blk code
> >    as well, which you currently only get with a full VM (perhaps qtest
> >    could do something interesting here in theory).
> >
> >    This is good for telling how big the costs are in relation to the
> >    total workload (and code saved elsewhere) in practice. This is the
> >    set of tests that can meaningfully be compared to a callback-based
> >    solution.
> >
> > Running arbitrary workloads like getppid() or open/read/close isn't as
> > useful as these. It doesn't isolate the coroutines as well as tests that
> > run literally nothing else than coroutine functions, and it is too
> > removed from the actual use case to get the relation between additional
> > costs, saving and total workload figured out for the real case.
> 
> If you think getppid() doesn't isolate the coroutine, you can just do a nop;
> then you will find the cost may reach 90%.  Basically it has nothing to do
> with what the load does, and everything to do with how fast the load can
> run. The quicker the load, the more cost is introduced by using coroutines;
> please see the computation in the link above.

Correct, the arbitrary load is only blurring the result. If you want to
measure purely the cost of coroutine functions in isolation (but then
treat it as such!), tests/test-coroutine is your tool.

> Also, another reason I use getppid() is that:
> 
>      After I/O plug&unplug was introduced, bdrv_aio_readv/bdrv_aio_writev
>      became much quicker, because most of the time they just queue the I/O
>      request into the I/O queue, with no io_submit involved at all. Even
>      though coroutine operations take little time (<100ns), they still may
>      make a difference compared with the time for queuing the I/O only, at
>      least for high-speed I/O, like > 300K IOPS in your case.

You still go the whole path through the block layer (and back when the
request is completed).

> >> From another file-read bench, which is the default one:
> >>
> >>       just doing open(file), read(fd, buf on stack, 512), sum and close() in each iteration
> >>
> >> without using coroutines, operations per second increase by ~20% compared
> >> with using coroutines. When reading 1024 bytes each time, the number still
> >> increases by ~10%. The operations-per-second level is between 200K and 400K,
> >> which should match the IOPS in the dataplane test, and the tests were done
> >> on my Lenovo T410 notebook (CPU: 2.6GHz, dual core, four threads).
> >>
> >> When reading 8192 or more bytes each time, the difference between using
> >> coroutines and not can no longer be observed clearly.
> >
> > All it tells you is that the variation of the workload can make the
> > coroutine cost disappear in the noise. It doesn't tell you much about
> > the real use case.
> 
> When the cost disappears, the IOPS has already become very small. That
> again suggests the coroutine cost matters for the high-speed I/O case.
> 
> > And you're comparing apples and oranges anyway: The real question in
> > qemu is whether you use coroutines or pass around heap-allocated state
> > between callbacks. Your benchmark doesn't have a single callback because
> > it hasn't got any asynchronous operations and doesn't need to allocate
> > and pass any state.
> >
> > It does, however, have an unnecessary yield() for the coroutine case
> > because you felt that the real case is more complex and does yield
> > (which is true, but it's more complex for both coroutines and
> > callbacks).
> >
> >> Surely, the test result should depend on how fast the machine is, but even
> >> for fast machine, I guess the similar result still can be observed by
> >> decreasing read bytes each time.
> >
> > Yes, results looked similar on my laptop. (They just don't tell me
> > much.)
> >
> >
> > Let's have a look at some fio results from my laptop:
> >
> > aggrb KB/s  | master    | coroutine | bypass
> > ------------+-----------+-----------+------------
> > run 1       | 419934    | 449518    | 445823
> > run 2       | 444358    | 456365    | 448332
> > run 3       | 444076    | 455209    | 441552
> >
> >
> > And here from my lab test box:
> >
> > aggrb KB/s  | master    | coroutine | bypass
> > ------------+-----------+-----------+------------
> > run 1       | 25330     | 56378     | 53541
> > run 2       | 26041     | 55709     | 54136
> > run 3       | 25811     | 56829     | 49080
> >
> > The improvement of the bypass patches is barely measurable on my laptop
> > (if it even exists), whereas it seems to be a pretty big thing for my
> > lab test box. In any case, the optimised coroutine code seems to beat
> > the bypass on both machines. (That is for random reads anyway. For
> > sequential, I get a much larger variation, and on my lab test box bypass
> > is ahead, whereas on my laptop both are roughly on the same level.)
> >
> > Another thing I tried is creating the coroutine already in virtio-blk to
> > avoid the overhead of the bdrv_aio_* emulation. I don't quite understand
> > the result of my benchmarks there, maybe you have an idea: For random
> > reads, I see a significant improvement, for sequential however a clear
> > degradation.
> >
> > aggrb MB/s  | bypass    | coroutine | virtio-blk-created coroutine
> > ------------+-----------+-----------+------------------------------
> > seq. read   | 738       | 738       | 694
> > random read | 442       | 459       | 475
> >
> > I would appreciate any ideas about what's going on with sequential reads
> > here and how it can be fixed. Anyway, on my machines, coroutines don't
> > look like a lost case at all.
> 
> Firstly I hope you can bypass the coroutine only to do the test, that said, use
> same code path except for coroutine operation to observe effect from coroutine.

Sorry, I'm having a hard time parsing this sentence.

> Secondly, maybe your machine is fast enough that we can't observe the
> IOPS difference easily, but there should be a difference in CPU utilization,
> since the above computation tells us the coroutine cost does exist. The
> faster the block device, the bigger the cost.

You are constantly ignoring the fact that the AIO callback-oriented
style doesn't have zero cost either. Please stop doing this, otherwise
any discussion is useless.

If you want to benchmark in a scenario that is closer to what actually
happens in qemu, but still doesn't involve I/O to slow things down, try
the following patch. The operation there isn't synchronous any more, but
an external event (like I/O completion) is emulated by scheduling a BH.
The AIO/coroutine usage patterns in the code are the same as you can
find in the real block layer.

For me, in this benchmark callbacks are still a little faster, but you
don't get absurd numbers like you claimed above (because suddenly it's
clear that the AIO path has its costs, too).

Kevin


diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index d029609..3c2659b 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -9,6 +9,12 @@ STEXI
 @table @option
 ETEXI
 
+DEF("co_bench", img_co_bench,
+    "co_bench [-c count]")
+STEXI
+@item co_bench [-c @var{count}]
+ETEXI
+
 DEF("check", img_check,
     "check [-q] [-f fmt] [--output=ofmt]  [-r [leaks | all]] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index d4518e7..b2d9e56 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2789,6 +2789,173 @@ out:
     return 0;
 }
 
+static QEMUBH *external_event;
+
+typedef struct BenchACB {
+    BlockDriverAIOCB common;
+    QEMUBH *bh;
+    int result;
+    int i;
+    int counter;
+} BenchACB;
+
+static void external_event_aio_bh(void* opaque);
+
+static void bench_cb(void *opaque, int ret)
+{
+    BenchACB *acb = opaque;
+
+    acb->result += acb->i;
+
+    if (++acb->counter == 2) {
+        qemu_bh_schedule(acb->bh);
+    } else {
+        external_event = qemu_bh_new(external_event_aio_bh, acb);
+        qemu_bh_schedule(external_event);
+    }
+}
+
+static void external_event_aio_bh(void* opaque)
+{
+    qemu_bh_delete(external_event);
+    bench_cb(opaque, 0);
+}
+
+static void bdrv_aio_completion_bh(void* opaque)
+{
+    BenchACB *acb = opaque;
+    qemu_bh_delete(acb->bh);
+    acb->bh = NULL;
+    acb->common.cb(acb->common.opaque, acb->result);
+    qemu_aio_release(acb);
+}
+
+static const AIOCBInfo bench_aiocb_info = {
+    .aiocb_size         = sizeof(BenchACB),
+    .cancel             = NULL,
+};
+
+static BlockDriverAIOCB *bench_aio(int i,
+                                   BlockDriverCompletionFunc *cb,
+                                   void *opaque)
+{
+    BenchACB *acb = qemu_aio_get(&bench_aiocb_info, NULL, cb, opaque);
+    acb->counter = 0;
+    acb->result = 0;
+    acb->i = i;
+    acb->bh = qemu_bh_new(bdrv_aio_completion_bh, acb);
+
+    bench_cb(acb, 0);
+    return &acb->common;
+}
+
+static void bench_aio_completed(void *opaque, int ret)
+{
+    int *result = opaque;
+    *result = ret;
+}
+
+typedef struct BenchCo {
+    Coroutine *co;
+    int i;
+    int *result;
+} BenchCo;
+
+static void external_event_co_bh(void* opaque)
+{
+    Coroutine *co = opaque;
+    qemu_coroutine_enter(co, NULL);
+}
+
+static void bench_co(void *opaque)
+{
+    BenchCo *b = opaque;
+    int result = 0;
+
+    result += b->i;
+
+    external_event = qemu_bh_new(external_event_co_bh, b->co);
+    qemu_bh_schedule(external_event);
+    qemu_coroutine_yield();
+    qemu_bh_delete(external_event);
+
+    result += b->i;
+
+    *b->result = result;
+}
+
+static int img_co_bench(int argc, char **argv)
+{
+    int c;
+    bool bypass = false;
+    int count = 10000000;
+    struct timeval t1, t2;
+    int i;
+
+    for (;;) {
+        c = getopt(argc, argv, "hbc:");
+        if (c == -1) {
+            break;
+        }
+
+        switch (c) {
+        case 'h':
+        case '?':
+            help();
+            break;
+        case 'b':
+            bypass = true;
+            break;
+        case 'c':
+        {
+            char *end;
+            errno = 0;
+            count = strtoul(optarg, &end, 0);
+            if (errno || *end || count > INT_MAX) {
+                error_report("Invalid iteration count specified");
+                return 1;
+            }
+            break;
+        }
+        }
+    }
+
+    printf("Doing %d iterations (%s)\n", count,
+           bypass ? "callbacks" : "coroutines");
+
+    gettimeofday(&t1, NULL);
+
+    for (i = 0; i < count; i++) {
+        int result = -EINPROGRESS;
+
+        if (bypass) {
+            bench_aio(i, bench_aio_completed, &result);
+        } else {
+            Coroutine *co = qemu_coroutine_create(bench_co);
+            BenchCo b = {
+                .co = co,
+                .i = i,
+                .result = &result,
+            };
+            qemu_coroutine_enter(co, &b);
+        }
+
+        while (result == -EINPROGRESS) {
+            main_loop_wait(false);
+        }
+
+        assert(result == i * 2);
+    }
+    gettimeofday(&t2, NULL);
+
+    printf("Run completed in %3.3f seconds.\n",
+           (t2.tv_sec - t1.tv_sec)
+           + ((double)(t2.tv_usec - t1.tv_usec) / 1000000));
+
+    return 0;
+}
+
+
 static const img_cmd_t img_cmds[] = {
 #define DEF(option, callback, arg_string)        \
     { option, callback },

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-12 11:40                   ` Kevin Wolf
@ 2014-08-12 12:14                     ` Ming Lei
  0 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-12 12:14 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Maydell, Fam Zheng, Michael S. Tsirkin, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Tue, Aug 12, 2014 at 7:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 12.08.2014 um 09:53 hat Ming Lei geschrieben:
>> On Mon, Aug 11, 2014 at 10:03 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> > Am 10.08.2014 um 05:46 hat Ming Lei geschrieben:
>> >> Hi Kevin, Paolo, Stefan and all,
>> >>
>> >>
>> >> On Wed, 6 Aug 2014 10:48:55 +0200
>> >> Kevin Wolf <kwolf@redhat.com> wrote:
>> >>
>> >> > Am 06.08.2014 um 07:33 hat Ming Lei geschrieben:
>> >>
>> >> >
>> >> > Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>> >> > coroutines instead of exiting them, so it can't make any use of the
>> >> > coroutine pool. On my laptop, I get this (where fixed coroutine is a
>> >> > version that simply removes the yield at the end):
>> >> >
>> >> >                 | bypass        | fixed coro    | buggy coro
>> >> > ----------------+---------------+---------------+--------------
>> >> > time            | 1.09s         | 1.10s         | 1.62s
>> >> > L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>> >> > insns per cycle | 2.39          | 2.39          | 1.90
>> >> >
>> >> > Begs the question whether you see a similar effect on a real qemu and
>> >> > the coroutine pool is still not big enough? With correct use of
>> >> > coroutines, the difference seems to be barely measurable even without
>> >> > any I/O involved.
>> >>
>> >> Now I have fixed the coroutine leak bug. The previous crypt bench put a
>> >> rather heavy load on the CPU and kept operations per second very low
>> >> (~40K/sec), so I wrote a new, simpler one that can generate hundreds of
>> >> thousands of operations per second; that number should match some fast
>> >> storage devices, and it does show a non-trivial cost from coroutines.
>> >>
>> >> In the extreme case where just a getppid() syscall is run in each
>> >> iteration, only 3M operations/sec can be reached with coroutines, while
>> >> without coroutines the number reaches 16M/sec, more than a 4x difference!
>> >
>> > I see that you're measuring a lot of things, but the one thing that is
>> > unclear to me is what question those benchmarks are supposed to answer.
>> >
>> > Basically I see two different, useful types of benchmark:
>> >
>> > 1. Look at coroutines in isolation and try to get a directly coroutine-
>> >    related function (like create/destroy or yield/reenter) faster. This
>> >    is what tests/test-coroutine does.
>>
>> Actually tests/test-coroutine does tell us there is a non-trivial cost
>> introduced by using coroutines, per Paolo's computation in his environment[1]:
>>
>>     - one yield takes 83ns
>>     - one enter takes 97ns
>
> Okay so far (haven't checked the numbers, but I'll assume they are
> right).
>
>>     - this will introduce 8.3% cost by using coroutines if the block
>>       device can reach 300K IOPS, like your case of loop over tmpfs
>>     - it may cause 13.8% cost if the block device can reach 500K IOPS
>
> Here your argumentation goes downhill. I wrote "coroutines in isolation"
> for a reason. Here you're starting to leave the isolation and draw
> conclusions from the microbenchmark to the real environment. As if that

The conclusion only depends on how much time yield() and enter() take.

Compared with the test environment, the time that yield() and enter()
take should not decrease in a real environment, should it?

That is why I drew the conclusion; if you think the reasoning isn't
correct, please point it out explicitly.

> wasn't bad enough, you're comparing "using coroutines" to "doing
> nothing, but magic happens and we get the right result anyway". This is
> not a useful comparison.

It isn't related to what coroutine->entry() does, and the conclusion
only depends on how much time yield() and enter() take, as I said.

>
>> The cost may show up in IOPS, or in CPU utilization, or both, depending
>> on how fast the CPU is.
>>
>> The above computation supposes all coroutine allocations hit the pool,
>> and does not consider the effect of switching stacks. If both are
>> considered, the cost surely becomes larger.
>>
>> [1], https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg01544.html
>>
>> >    This is quite good at telling you what costs the coroutine functions
>> >    have and where you need to optimise - without taking the pratical
>> >    benefits into account, so it's not suitable for comparison.
>> >
>> > 2. Look at the whole thing in its realistic environment. This should
>> >    probably involve at least some asynchronous I/O, but ideally use the
>> >    whole block layer. qemu-img bench tries to do this. For being even
>> >    closer to the real environment you'd have to use the virtio-blk code
>> >    as well, which you currently only get with a full VM (perhaps qtest
>> >    could do something interesting here in theory).
>> >
>> >    This is good for telling how big the costs are in relation to the
>> >    total workload (and code saved elsewhere) in practice. This is the
>> >    set of tests that can meaningfully be compared to a callback-based
>> >    solution.
>> >
>> > Running arbitrary workloads like getppid() or open/read/close isn't as
>> > useful as these. It doesn't isolate the coroutines as well as tests that
>> > run literally nothing else than coroutine functions, and it is too
>> > removed from the actual use case to get the relation between additional
>> > costs, saving and total workload figured out for the real case.
>>
>> If you think getppid() doesn't isolate the coroutine, you can just do a
>> nop instead; then you will find the cost may reach 90%.  Basically it has
>> nothing to do with what the load does, and it is much more related to how
>> fast the load can run. The quicker the load, the more cost is introduced
>> by using coroutines; please see the computation in the link above.
>
> Correct, the arbitrary load is only blurring the result. If you want to
> measure purely the cost of coroutine functions in isolation (but then
> treat it as such!), tests/test-coroutine is your tool.

Yes, the tool tells us that yield() and enter() take about 83ns and
97ns respectively, and that is enough to draw the conclusion.

>
>> Another reason I use getppid() is that:
>>
>>      After I/O plug & unplug was introduced, bdrv_aio_readv/bdrv_aio_writev
>>      became much quicker, because most of the time they just put the I/O
>>      request into the I/O queue, with no io_submit involved at all. Even
>>      though coroutine operations take little time (<100ns), they may still
>>      make a difference compared with the time for queuing I/O only, at
>>      least for high-speed I/O like the > 300K IOPS in your case.
>
> You still go the whole path through the block layer (and back when the
> request is completed).

Yes, but that should have been not much computation, since what the
previous dataplane did is just write the request address into io queue,
and we should think it takes little time.

>
>> >> From another file-read bench, which is the default one:
>> >>
>> >>       just doing open(file), read(fd, buf on stack, 512), sum and
>> >>       close() in each iteration
>> >>
>> >> without using coroutines, operations per second can increase by ~20%
>> >> compared with using coroutines. When reading 1024 bytes each time, the
>> >> number can still increase by ~10%. The operations-per-second level is
>> >> between 200K and 400K per second, which should match the IOPS in the
>> >> dataplane test; the tests were done on my Lenovo T410 notebook (CPU:
>> >> 2.6GHz, dual core, four threads).
>> >>
>> >> When reading 8192 or more bytes each time, the difference between using
>> >> coroutines and not can't be clearly observed.
>> >
>> > All it tells you is that variation of the workload can make the
>> > coroutine cost disappear in the noise. It doesn't tell you much about
>> > the real use case.
>>
>> When the cost disappears, the IOPS has already become very small. That
>> also shows that coroutines may not fit the high-speed I/O case.
>>
>> > And you're comparing apples and oranges anyway: The real question in
>> > qemu is whether you use coroutines or pass around heap-allocated state
>> > between callbacks. Your benchmark doesn't have a single callback because
>> > it hasn't got any asynchronous operations and doesn't need to allocate
>> > and pass any state.
>> >
>> > It does, however, have an unnecessary yield() for the coroutine case
>> > because you felt that the real case is more complex and does yield
>> > (which is true, but it's more complex for both coroutines and
>> > callbacks).
>> >
>> >> Surely the test result depends on how fast the machine is, but even
>> >> on a fast machine I guess a similar result can still be observed by
>> >> decreasing the bytes read each time.
>> >
>> > Yes, results looked similar on my laptop. (They just don't tell me
>> > much.)
>> >
>> >
>> > Let's have a look at some fio results from my laptop:
>> >
>> > aggrb KB/s  | master    | coroutine | bypass
>> > ------------+-----------+-----------+------------
>> > run 1       | 419934    | 449518    | 445823
>> > run 2       | 444358    | 456365    | 448332
>> > run 3       | 444076    | 455209    | 441552
>> >
>> >
>> > And here from my lab test box:
>> >
>> > aggrb KB/s  | master    | coroutine | bypass
>> > ------------+-----------+-----------+------------
>> > run 1       | 25330     | 56378     | 53541
>> > run 2       | 26041     | 55709     | 54136
>> > run 3       | 25811     | 56829     | 49080
>> >
>> > The improvement of the bypass patches is barely measurable on my laptop
>> > (if it even exists), whereas it seems to be a pretty big thing for my
>> > lab test box. In any case, the optimised coroutine code seems to beat
>> > the bypass on both machines. (That is for random reads anyway. For
>> > sequential, I get a much larger variation, and on my lab test box bypass
>> > is ahead, whereas on my laptop both are roughly on the same level.)
>> >
>> > Another thing I tried is creating the coroutine already in virtio-blk to
>> > avoid the overhead of the bdrv_aio_* emulation. I don't quite understand
>> > the result of my benchmarks there, maybe you have an idea: For random
>> > reads, I see a significant improvement, for sequential however a clear
>> > degradation.
>> >
>> > aggrb MB/s  | bypass    | coroutine | virtio-blk-created coroutine
>> > ------------+-----------+-----------+------------------------------
>> > seq. read   | 738       | 738       | 694
>> > random read | 442       | 459       | 475
>> >
>> > I would appreciate any ideas about what's going on with sequential reads
>> > here and how it can be fixed. Anyway, on my machines, coroutines don't
>> > look like a lost case at all.
>>
>> Firstly I hope you can bypass the coroutine only to do the test, that said, use
>> same code path except for coroutine operation to observe effect from coroutine.
>
> Sorry, I'm having a hard time parsing this sentence.

In your previous test, you added linux-aio bdrv_co_readv()/writev() in
your patch, and you compared it against the previous bypass patch,
which bypasses coroutines in another path.

>
>> Secondly, maybe your machine is fast enough that we can't observe the
>> IOPS difference easily, but there should be a difference in CPU
>> utilization, since the above computation tells us the coroutine cost does
>> exist. The faster the block device, the bigger the cost.
>
> You are constantly ignoring the fact that the AIO callback-oriented
> style doesn't have zero cost either. Please stop doing this, otherwise
> any discussion is useless.

No, I didn't ignore it; the average time for handling one I/O already
includes the completion time. That is why I said it takes about 3.333us
to handle one I/O if the device is capable of 300K IOPS; the time taken
by two enter() calls and one yield() does make a difference against
that 3.333us, since they are required for every I/O.

Also, bypassing coroutines does _not_ increase the time taken to run
the AIO callback, does it?
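
For concreteness, the arithmetic can be checked with a trivial
standalone program (just my own sketch, plugging in the 83ns yield /
97ns enter figures quoted from tests/test-coroutine earlier):

#include <stdio.h>

int main(void)
{
    /* two enter() calls plus one yield() per request, using the
     * 97ns/83ns measurements quoted above */
    const double overhead_ns = 2 * 97 + 83;
    const double iops[] = { 300000.0, 500000.0 };

    for (int i = 0; i < 2; i++) {
        double per_req_ns = 1e9 / iops[i];   /* ~3333ns at 300K IOPS */
        printf("%.0fK IOPS: %.0f ns/request, overhead %.1f%%\n",
               iops[i] / 1000, per_req_ns,
               100.0 * overhead_ns / per_req_ns);
    }
    return 0;
}

This reproduces the ~8.3% and ~13.8% figures computed earlier in the
thread (modulo rounding).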

>
> If you want to benchmark in a scenario that is closer to what actually
> happens in qemu, but still doesn't involve I/O to slow things down, try
> the following patch. The operation there isn't synchronous any more, but
> an external event (like I/O completion) is emulated by scheduling a BH.
> The AIO/coroutine usage patterns in the code are the same as you can
> find in the real block layer.
>
> For me, in this benchmark callbacks are still a little faster, but you
> don't get absurd numbers like you claimed above (because suddenly it's
> clear that the AIO path has its costs, too).
>
> Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-12  8:12                 ` Ming Lei
@ 2014-08-12 19:08                   ` Paolo Bonzini
  2014-08-13  9:54                     ` Kevin Wolf
  2014-08-13 10:19                     ` Ming Lei
  0 siblings, 2 replies; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-12 19:08 UTC (permalink / raw)
  To: Ming Lei; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

Il 12/08/2014 10:12, Ming Lei ha scritto:
>> > The below patch is basically the minimal change to bypass coroutines.  Of course
>> > the block.c part is not acceptable as is (the change to refresh_total_sectors
>> > is broken, the others are just ugly), but it is a start.  Please run it with
>> > your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>> > benchmark.
> Could you explain why the new change is introduced?

It provides a fast path for bdrv_aio_readv/writev whenever there is
nothing to do after the driver routine returns.  In this case there is
no need to wrap the AIOCB returned by the driver routine.

It doesn't go all the way, and in particular it doesn't reverse
completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
 But it is enough to provide something that is not dataplane-specific,
does not break various functionality that we need to add to dataplane
virtio-blk, does not mess up the semantics of the block layer, and lets
you run benchmarks.

> I will hold it until we can align to the coroutine cost computation,
> because it is very important for the discussion.

First of all, note that the coroutine cost is totally pointless in the
discussion unless you have 100% CPU time and the dataplane thread
becomes CPU bound.  You haven't said if this is the case.

Second, if the coroutine cost is relevant, the profile is really too
flat to do much about it.  The only solution (and here I *think* I
disagree slightly with Kevin) is to get rid of it, which is not even too
hard to do.

The problem is that your patches touch too much code and subtly
break too much stuff.  The one I wrote does have a little breakage
because I don't understand bs->growable 100% and I didn't really put
much effort into it (my deadline being basically "be done as soon as the
shower is free"), and it is ugly as hell, _but_ it should be compatible
with the way the block layer works.

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-11 19:37               ` Paolo Bonzini
  2014-08-12  8:12                 ` Ming Lei
@ 2014-08-13  8:55                 ` Stefan Hajnoczi
  2014-08-13 11:43                 ` Ming Lei
  2014-08-14 10:46                 ` Kevin Wolf
  3 siblings, 0 replies; 81+ messages in thread
From: Stefan Hajnoczi @ 2014-08-13  8:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, tom.leiming, Ming Lei, Fam Zheng, qemu-devel


On Mon, Aug 11, 2014 at 09:37:01PM +0200, Paolo Bonzini wrote:
> Il 10/08/2014 05:46, Ming Lei ha scritto:
> @@ -4356,6 +4353,20 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>  {
>      trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>  
> +    if (bs->drv && bs->drv->bdrv_aio_readv &&
> +        bs->drv->bdrv_aio_readv != bdrv_aio_readv_em &&
> +        nb_sectors >= 0 && nb_sectors <= (UINT_MAX >> BDRV_SECTOR_BITS) &&
> +        !bdrv_check_byte_request(bs, sector_num << BDRV_SECTOR_BITS,
> +                                 nb_sectors << BDRV_SECTOR_BITS) &&
> +        !bs->copy_on_read && !bs->io_limits_enabled &&
> +        bs->request_alignment <= BDRV_SECTOR_SIZE) {
> +        BlockDriverAIOCB *acb =
> +            bs->drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
> +                                    cb, opaque);
> +        assert(acb);

Minor issue:

block.h:bdrv_aio_readv() guarantees the return value is non-NULL but
BlockDriver->bdrv_aio_readv() does not.

The floppy disk (fd_open()) code path in raw-posix.c can return NULL so
we would need to return a BlockDriverAIOCB and set up a BH that will
complete with -EIO.
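
Roughly something like the following, modelled on the BH completion in
the benchmark patch earlier in the thread (untested sketch; the EIOACB
and eio_aiocb_info names are made up for illustration):

typedef struct EIOACB {
    BlockDriverAIOCB common;
    QEMUBH *bh;
} EIOACB;

static void bdrv_aio_eio_bh(void *opaque)
{
    EIOACB *acb = opaque;

    qemu_bh_delete(acb->bh);
    acb->common.cb(acb->common.opaque, -EIO);
    qemu_aio_release(acb);
}

static const AIOCBInfo eio_aiocb_info = {
    .aiocb_size = sizeof(EIOACB),
};

    /* in the fast path, after calling the driver: */
    acb = bs->drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                  cb, opaque);
    if (acb == NULL) {
        EIOACB *eacb = qemu_aio_get(&eio_aiocb_info, bs, cb, opaque);

        eacb->bh = qemu_bh_new(bdrv_aio_eio_bh, eacb);
        qemu_bh_schedule(eacb->bh);
        acb = &eacb->common;
    }
    return acb;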

Stefan


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-12 19:08                   ` Paolo Bonzini
@ 2014-08-13  9:54                     ` Kevin Wolf
  2014-08-13 13:16                       ` Paolo Bonzini
  2014-08-13 10:19                     ` Ming Lei
  1 sibling, 1 reply; 81+ messages in thread
From: Kevin Wolf @ 2014-08-13  9:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ming Lei, Fam Zheng, qemu-devel, Stefan Hajnoczi

Am 12.08.2014 um 21:08 hat Paolo Bonzini geschrieben:
> Il 12/08/2014 10:12, Ming Lei ha scritto:
> >> > The below patch is basically the minimal change to bypass coroutines.  Of course
> >> > the block.c part is not acceptable as is (the change to refresh_total_sectors
> >> > is broken, the others are just ugly), but it is a start.  Please run it with
> >> > your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
> >> > benchmark.
> > Could you explain why the new change is introduced?
> 
> It provides a fast path for bdrv_aio_readv/writev whenever there is
> nothing to do after the driver routine returns.  In this case there is
> no need to wrap the AIOCB returned by the driver routine.
> 
> It doesn't go all the way, and in particular it doesn't reverse
> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.

That's actually why I think it's an option. Remember that, like you say
below, we're optimising for an extreme case here, and I certainly don't
want to hurt the common case for it. I can't imagine a way of reversing
the roles without multiplying the cost for the coroutine path.

Or do you have a clever solution for how you'd go about it without
having an impact on the common case?

>  But it is enough to provide something that is not dataplane-specific,
> does not break various functionality that we need to add to dataplane
> virtio-blk, does not mess up the semantics of the block layer, and lets
> you run benchmarks.
> 
> > I will hold it until we can align to the coroutine cost computation,
> > because it is very important for the discussion.
> 
> First of all, note that the coroutine cost is totally pointless in the
> discussion unless you have 100% CPU time and the dataplane thread
> becomes CPU bound.  You haven't said if this is the case.

That's probably the implicit assumption. As I said, it's an extreme
case we're trying to look at. I'm not sure how realistic it is when you
don't work with ramdisks...

> Second, if the coroutine cost is relevant, the profile is really too
> flat to do much about it.  The only solution (and here I *think* I
> disagree slightly with Kevin) is to get rid of it, which is not even too
> hard to do.

I think we just need to make the best use of coroutines. I would really
love to show you numbers, but I'm having a hard time benchmarking all
this stuff. When I test only the block layer with 'qemu-img bench', I
clearly have working optimisations, but it doesn't translate yet into
clear improvements for actual guests. I think other things in the way
from the guest to qemu slow it down so that in the end the coroutine
part doesn't matter much any more.

By the way, I just noticed that sequential reads were significantly
faster (~25%) for me without dataplane than with it. I didn't expect to
gain anything with dataplane on this setup, but certainly not to lose
that much. There might be more to gain there than by optimising or
removing coroutines.

> The problem is that your patches touch too much code and subtly
> break too much stuff.  The one I wrote does have a little breakage
> because I don't understand bs->growable 100% and I didn't really put
> much effort into it (my deadline being basically "be done as soon as the
> shower is free"), and it is ugly as hell, _but_ it should be compatible
> with the way the block layer works.

Yes, your patch is definitely much more palatable than Ming's. The part
that I still don't like about it is that it would be stating "in the
common case, we're only doing the second best thing". I'm not yet
convinced that coroutines necessarily perform worse than state-passing
callbacks.

Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-12 19:08                   ` Paolo Bonzini
  2014-08-13  9:54                     ` Kevin Wolf
@ 2014-08-13 10:19                     ` Ming Lei
  2014-08-13 12:35                       ` Paolo Bonzini
  1 sibling, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-13 10:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

On Wed, Aug 13, 2014 at 3:08 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 12/08/2014 10:12, Ming Lei ha scritto:
>>> > The below patch is basically the minimal change to bypass coroutines.  Of course
>>> > the block.c part is not acceptable as is (the change to refresh_total_sectors
>>> > is broken, the others are just ugly), but it is a start.  Please run it with
>>> > your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>>> > benchmark.
>> Could you explain why the new change is introduced?
>
> It provides a fast path for bdrv_aio_readv/writev whenever there is
> nothing to do after the driver routine returns.  In this case there is
> no need to wrap the AIOCB returned by the driver routine.
>
> It doesn't go all the way, and in particular it doesn't reverse
> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
>  But it is enough to provide something that is not dataplane-specific,
> does not break various functionality that we need to add to dataplane
> virtio-blk, does not mess up the semantics of the block layer, and lets
> you run benchmarks.
>
>> I will hold it until we can align to the coroutine cost computation,
>> because it is very important for the discussion.
>
> First of all, note that the coroutine cost is totally pointless in the
> discussion unless you have 100% CPU time and the dataplane thread
> becomes CPU bound.  You haven't said if this is the case.

No, it does make sense, especially for high-speed block devices.

In my test, the CPU is close to 100%; otherwise block throughput
would not have been affected.

Bypassing coroutines can also decrease CPU utilization when the CPU
isn't at 100%.

It is also related to CPU speed: on a slow machine, running coroutines
may introduce noticeable load, especially for high-IOPS block devices.

>
> Second, if the coroutine cost is relevant, the profile is really too

I have written a patch to measure the coroutine cost, which shows it
clearly, and you should be in the Cc list.

> flat to do much about it.  The only solution (and here I *think* I
> disagree slightly with Kevin) is to get rid of it, which is not even too
> hard to do.

I agree.

But it depends on how the coroutines are used. If the function run by
the coroutine isn't called very frequently, the effect of the coroutine
can be ignored.

For block devices that can reach hundreds of thousands of IOPS, as far
as I can see, the only solution is to not use coroutines in this case.

That is why I wrote the bypass-coroutine patch.

>
> The problem is that your patches touch too much code and subtly
> break too much stuff.  The one I wrote does have a little breakage

Could you give a hint about which things are broken? Last time you
mentioned that virtio-scsi needs to keep the AIOCB alive after
returning; I have fixed that in V1.

> because I don't understand bs->growable 100% and I didn't really put
> much effort into it (my deadline being basically "be done as soon as the
> shower is free"), and it is ugly as hell, _but_ it should be compatible
> with the way the block layer works.

I will take a careful look at your patch later.

If the coroutine is still there, I think it can still slow down performance.

Thanks,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-11 19:37               ` Paolo Bonzini
  2014-08-12  8:12                 ` Ming Lei
  2014-08-13  8:55                 ` Stefan Hajnoczi
@ 2014-08-13 11:43                 ` Ming Lei
  2014-08-13 12:35                   ` Paolo Bonzini
  2014-08-14 10:46                 ` Kevin Wolf
  3 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-13 11:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

Hi Paolo,

On Tue, Aug 12, 2014 at 3:37 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 10/08/2014 05:46, Ming Lei ha scritto:
>> Hi Kevin, Paolo, Stefan and all,
>>
>>
>> On Wed, 6 Aug 2014 10:48:55 +0200
>> Kevin Wolf <kwolf@redhat.com> wrote:
>>
>>> Am 06.08.2014 um 07:33 hat Ming Lei geschrieben:
>>
>>>
>>> Anyhow, the coroutine version of your benchmark is buggy, it leaks all
>>> coroutines instead of exiting them, so it can't make any use of the
>>> coroutine pool. On my laptop, I get this (where fixed coroutine is a
>>> version that simply removes the yield at the end):
>>>
>>>                 | bypass        | fixed coro    | buggy coro
>>> ----------------+---------------+---------------+--------------
>>> time            | 1.09s         | 1.10s         | 1.62s
>>> L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>>> insns per cycle | 2.39          | 2.39          | 1.90
>>>
>>> Begs the question whether you see a similar effect on a real qemu and
>>> the coroutine pool is still not big enough? With correct use of
>>> coroutines, the difference seems to be barely measurable even without
>>> any I/O involved.
>>
>> Now I have fixed the coroutine leak bug. The previous crypt bench put a
>> rather heavy load on the CPU and kept operations per second very low
>> (~40K/sec), so I wrote a new, simpler one that can generate hundreds of
>> thousands of operations per second; that number should match some fast
>> storage devices, and it does show a non-trivial cost from coroutines.
>>
>> In the extreme case where just a getppid() syscall is run in each
>> iteration, only 3M operations/sec can be reached with coroutines, while
>> without coroutines the number reaches 16M/sec, more than a 4x difference!
>
> I should be on vacation, but I'm following a couple of threads on the mailing
> list and I'm a bit tired of hearing the same argument again and again...
>
> The different characteristics of asynchronous I/O vs. any synchronous workload
> are such that it is hard to be sure that microbenchmarks make sense.
>
> The below patch is basically the minimal change to bypass coroutines.  Of course
> the block.c part is not acceptable as is (the change to refresh_total_sectors
> is broken, the others are just ugly), but it is a start.  Please run it with
> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
> benchmark.

I have to say this approach is much cleverer, and better than mine; I
just ran a quick fio randread test in a VM, and IOPS improves by more
than 10% compared with the bypass-coroutine patch.

Hope it can be merged soon :-)

Great thanks, Paolo.

Thanks,

> Paolo
>
> diff --git a/block.c b/block.c
> index 3e252a2..0b6e9cf 100644
> --- a/block.c
> +++ b/block.c
> @@ -704,7 +704,7 @@ static int refresh_total_sectors(BlockDriverState *bs, int64_t hint)
>          return 0;
>
>      /* query actual device if possible, otherwise just trust the hint */
> -    if (drv->bdrv_getlength) {
> +    if (!hint && drv->bdrv_getlength) {
>          int64_t length = drv->bdrv_getlength(bs);
>          if (length < 0) {
>              return length;
> @@ -2651,9 +2651,6 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
>      if (!bdrv_is_inserted(bs))
>          return -ENOMEDIUM;
>
> -    if (bs->growable)
> -        return 0;
> -
>      len = bdrv_getlength(bs);
>
>      if (offset < 0)
> @@ -3107,7 +3104,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
>      if (!drv) {
>          return -ENOMEDIUM;
>      }
> -    if (bdrv_check_byte_request(bs, offset, bytes)) {
> +    if (!bs->growable && bdrv_check_byte_request(bs, offset, bytes)) {
>          return -EIO;
>      }
>
> @@ -3347,7 +3344,7 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
>      if (bs->read_only) {
>          return -EACCES;
>      }
> -    if (bdrv_check_byte_request(bs, offset, bytes)) {
> +    if (!bs->growable && bdrv_check_byte_request(bs, offset, bytes)) {
>          return -EIO;
>      }
>
> @@ -4356,6 +4353,20 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
>  {
>      trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
>
> +    if (bs->drv && bs->drv->bdrv_aio_readv &&
> +        bs->drv->bdrv_aio_readv != bdrv_aio_readv_em &&
> +        nb_sectors >= 0 && nb_sectors <= (UINT_MAX >> BDRV_SECTOR_BITS) &&
> +        !bdrv_check_byte_request(bs, sector_num << BDRV_SECTOR_BITS,
> +                                 nb_sectors << BDRV_SECTOR_BITS) &&
> +        !bs->copy_on_read && !bs->io_limits_enabled &&
> +        bs->request_alignment <= BDRV_SECTOR_SIZE) {
> +        BlockDriverAIOCB *acb =
> +            bs->drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
> +                                    cb, opaque);
> +        assert(acb);
> +        return acb;
> +    }
> +
>      return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
>                                   cb, opaque, false);
>  }
> @@ -4366,6 +4377,24 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>  {
>      trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
>
> +    if (bs->drv && bs->drv->bdrv_aio_writev &&
> +        bs->drv->bdrv_aio_writev != bdrv_aio_writev_em &&
> +        nb_sectors >= 0 && nb_sectors <= (UINT_MAX >> BDRV_SECTOR_BITS) &&
> +        !bdrv_check_byte_request(bs, sector_num << BDRV_SECTOR_BITS,
> +                                 nb_sectors << BDRV_SECTOR_BITS) &&
> +        !bs->read_only && !bs->io_limits_enabled &&
> +        bs->request_alignment <= BDRV_SECTOR_SIZE &&
> +        bs->enable_write_cache &&
> +        QLIST_EMPTY(&bs->before_write_notifiers.notifiers) &&
> +        bs->wr_highest_sector >= sector_num + nb_sectors - 1 &&
> +        QLIST_EMPTY(&bs->dirty_bitmaps)) {
> +        BlockDriverAIOCB *acb =
> +            bs->drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
> +                                     cb, opaque);
> +        assert(acb);
> +        return acb;
> +    }
> +
>      return bdrv_co_aio_rw_vector(bs, sector_num, qiov, nb_sectors, 0,
>                                   cb, opaque, true);
>  }
> diff --git a/block/raw_bsd.c b/block/raw_bsd.c
> index 492f58d..b86f26b 100644
> --- a/block/raw_bsd.c
> +++ b/block/raw_bsd.c
> @@ -48,6 +48,22 @@ static int raw_reopen_prepare(BDRVReopenState *reopen_state,
>      return 0;
>  }
>
> +static BlockDriverAIOCB *raw_aio_readv(BlockDriverState *bs, int64_t sector_num,
> +                                     QEMUIOVector *qiov, int nb_sectors,
> +                                    BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
> +    return bdrv_aio_readv(bs->file, sector_num, qiov, nb_sectors, cb, opaque);
> +}
> +
> +static BlockDriverAIOCB *raw_aio_writev(BlockDriverState *bs, int64_t sector_num,
> +                                      QEMUIOVector *qiov, int nb_sectors,
> +                                     BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
> +    return bdrv_aio_writev(bs->file, sector_num, qiov, nb_sectors, cb, opaque);
> +}
> +
>  static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num,
>                                       int nb_sectors, QEMUIOVector *qiov)
>  {
> @@ -181,6 +197,8 @@ static BlockDriver bdrv_raw = {
>      .bdrv_open            = &raw_open,
>      .bdrv_close           = &raw_close,
>      .bdrv_create          = &raw_create,
> +    .bdrv_aio_readv       = &raw_aio_readv,
> +    .bdrv_aio_writev      = &raw_aio_writev,
>      .bdrv_co_readv        = &raw_co_readv,
>      .bdrv_co_writev       = &raw_co_writev,
>      .bdrv_co_write_zeroes = &raw_co_write_zeroes,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-13 10:19                     ` Ming Lei
@ 2014-08-13 12:35                       ` Paolo Bonzini
  0 siblings, 0 replies; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-13 12:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

Il 13/08/2014 12:19, Ming Lei ha scritto:
>> > The problem is that your patches touch too much code and subtly
>> > break too much stuff.  The one I wrote does have a little breakage
> Could you give a hint about which stuff are broken? Last time, you mention
> virtio-scsi need to keep AIOCB live after returning, I have fixed it in V1.

They are dataplane-specific, while there's no reason not to have the
same benefits elsewhere.  They are file-specific, while there's no
reason not to have the same benefits for e.g. iSCSI (though iSCSI now
uses coroutines instead of bdrv_aio_*).  They touch AioContext for no
reason, and introduce a bunch of layering violations everywhere.

They are simply the wrong API.

>> > because I don't understand bs->growable 100% and I didn't really put
>> > much effort into it (my deadline being basically "be done as soon as the
>> > shower is free"), and it is ugly as hell, _but_ it should be compatible
>> > with the way the block layer works.
> I will take a careful look to your patch later.
> 
> If coroutine is still there, I think it still can slow down performance.

No, it's not there.  Please try the patch.

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-13 11:43                 ` Ming Lei
@ 2014-08-13 12:35                   ` Paolo Bonzini
  2014-08-13 13:07                     ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-13 12:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

Il 13/08/2014 13:43, Ming Lei ha scritto:
>> > The below patch is basically the minimal change to bypass coroutines.  Of course
>> > the block.c part is not acceptable as is (the change to refresh_total_sectors
>> > is broken, the others are just ugly), but it is a start.  Please run it with
>> > your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>> > benchmark.
> I have to say this approach is much cleaver, and better than mine, and
> I just run a quick fio randread test in VM, and IOPS can improve > 10%
> than bypass coroutine patch.

Great, do you have a profile without and with the patch?

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-13 12:35                   ` Paolo Bonzini
@ 2014-08-13 13:07                     ` Ming Lei
  0 siblings, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-13 13:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

On Wed, Aug 13, 2014 at 8:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 13/08/2014 13:43, Ming Lei ha scritto:
>>> > The below patch is basically the minimal change to bypass coroutines.  Of course
>>> > the block.c part is not acceptable as is (the change to refresh_total_sectors
>>> > is broken, the others are just ugly), but it is a start.  Please run it with
>>> > your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>>> > benchmark.
>> I have to say this approach is much cleaver, and better than mine, and
>> I just run a quick fio randread test in VM, and IOPS can improve > 10%
>> than bypass coroutine patch.
>
> Great, do you have a profile without and with the patch?

Please see the link below:

http://pastebin.com/0VKSMKxv

Ming

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-13  9:54                     ` Kevin Wolf
@ 2014-08-13 13:16                       ` Paolo Bonzini
  2014-08-13 13:49                         ` Ming Lei
  0 siblings, 1 reply; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-13 13:16 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Ming Lei, Fam Zheng, qemu-devel, Stefan Hajnoczi

Il 13/08/2014 11:54, Kevin Wolf ha scritto:
> Am 12.08.2014 um 21:08 hat Paolo Bonzini geschrieben:
>> Il 12/08/2014 10:12, Ming Lei ha scritto:
>>>>> The below patch is basically the minimal change to bypass coroutines.  Of course
>>>>> the block.c part is not acceptable as is (the change to refresh_total_sectors
>>>>> is broken, the others are just ugly), but it is a start.  Please run it with
>>>>> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>>>>> benchmark.
>>> Could you explain why the new change is introduced?
>>
>> It provides a fast path for bdrv_aio_readv/writev whenever there is
>> nothing to do after the driver routine returns.  In this case there is
>> no need to wrap the AIOCB returned by the driver routine.
>>
>> It doesn't go all the way, and in particular it doesn't reverse
>> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
> 
> That's actually why I think it's an option. Remember that, like you say
> below, we're optimising for an extreme case here, and I certainly don't
> want to hurt the common case for it. I can't imagine a way of reversing
> the roles without multiplying the cost for the coroutine path.

I'm not that worried about it.  Perhaps it's enough to add an
!qemu_in_coroutine() to the AIO fast path, and let the driver provide
optimized coroutine paths like in your patches that allocate AIOCBs on
the stack.
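
Roughly, i.e. (untested, just reusing the condition from the patch
quoted earlier in this thread):

    if (!qemu_in_coroutine() &&
        bs->drv && bs->drv->bdrv_aio_readv &&
        bs->drv->bdrv_aio_readv != bdrv_aio_readv_em &&
        nb_sectors >= 0 && nb_sectors <= (UINT_MAX >> BDRV_SECTOR_BITS) &&
        !bdrv_check_byte_request(bs, sector_num << BDRV_SECTOR_BITS,
                                 nb_sectors << BDRV_SECTOR_BITS) &&
        !bs->copy_on_read && !bs->io_limits_enabled &&
        bs->request_alignment <= BDRV_SECTOR_SIZE) {
        /* fast path: call the driver directly, no coroutine involved */
        return bs->drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                       cb, opaque);
    }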

> Or do you have a clever solution for how you'd go about it without
> having an impact on the common case?

I don't really have any ace up my sleeve, but there are some things that
bother me in the block layer's AIO API and in block.c in general.

One is that block.c can do all the pre-processing it wants before
issuing AIO, but nothing before calling the callback.  This means that
my patches break bdrv_drain_all (they cannot call tracked_request_end).

Another is all the similar structs that we have (RwCo,
BdrvTrackedRequest, BlockRequest, etc.).

Perhaps it would help if we had a single "real" block request object,
which is an extension of the BlockDriverAIOCB and includes enough data
to subsume all these request structs.  That should help commonize
stuff between the coroutine and AIO paths, for the common case where a
single yield is enough.  I think the single-yield case is the one that
is really worth optimizing for.  If done properly, I think this can
simplify a lot of block.c code, but it is really difficult to get it
right, and unless the design is sound the code is going to come out
really ugly. :(
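
To make that slightly more concrete, something vaguely like this is
what I have in mind (all field names invented, nothing thought through):

typedef struct BdrvRequest {
    BlockDriverAIOCB common;   /* cb/opaque for the AIO path */
    Coroutine *co;             /* non-NULL when a coroutine is waiting */
    int64_t offset;
    unsigned int bytes;
    QEMUIOVector *qiov;
    bool is_write;
    int ret;
    /* tracked-request linkage, serialisation and throttling state
     * would go here, subsuming RwCo/BdrvTrackedRequest/BlockRequest */
} BdrvRequest;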

Another thing to evaluate is the performance gap (if there is any)
between aio=threads and aio=native.  The only advantage of aio=native,
AFAIU, is the batch submission of requests (plug/unplug).  But
aio=threads often ends up having better performance because the kernel
folks have optimized VFS a lot.  So, in the aio=threads case, we might
as well move the format code out of the iothread and into the worker
thread, and get rid of the coroutine cost simply by making everything
synchronous.  Looking within QEMU, this worked out very well for migration.

(We could do batch submission of requests to the thread pool if there
were a variant of sem_post that can add multiple signals to the same
semaphore, similar to ReleaseSemaphore on Windows).
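
For illustration only, such a batched post is easy to emulate with a
mutex/condvar pair; this is a plain-pthreads sketch, not QEMU code, and
the broadcast makes it less efficient than a native batched sem_post
would be, which is rather the point of the remark above:

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned count;
} BatchSem;

#define BATCH_SEM_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Wake up to n waiters with one lock acquisition and one broadcast,
 * instead of n separate sem_post() calls. */
static void batch_sem_post_n(BatchSem *s, unsigned n)
{
    pthread_mutex_lock(&s->lock);
    s->count += n;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

static void batch_sem_wait(BatchSem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    s->count--;
    pthread_mutex_unlock(&s->lock);
}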

>>> The problem is that your patches touch too much code and subtly
>> break too much stuff.  The one I wrote does have a little breakage
>> because I don't understand bs->growable 100% and I didn't really put
>> much effort into it (my deadline being basically "be done as soon as the
>> shower is free"), and it is ugly as hell, _but_ it should be compatible
>> with the way the block layer works.
> 
> Yes, your patch is definitely much more palatable than Ming's. The part
> that I still don't like about it is that it would be stating "in the
> common case, we're only doing the second best thing". I'm not yet
>> convinced that coroutines necessarily perform worse than state-passing
> callbacks.

Coroutines lump all the allocation costs together at the time you
allocate the stack, but have (much) more expensive context switching.
Your patches decrease the allocation costs by placing the AIOCB on the
stack.

Since you have to allocate an AIOCB anyway if the caller uses
bdrv_aio_readv/writev, coroutines do necessarily perform worse if the
drivers have little or no state to pass around.  This is the case for
both thread-pool and linux-aio AIOCBs, and it is also the case for the
generic block layer whenever the long "if" statements evaluate to true.

Paolo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-13 13:16                       ` Paolo Bonzini
@ 2014-08-13 13:49                         ` Ming Lei
  2014-08-14  9:39                           ` Stefan Hajnoczi
  0 siblings, 1 reply; 81+ messages in thread
From: Ming Lei @ 2014-08-13 13:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

On Wed, Aug 13, 2014 at 9:16 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 13/08/2014 11:54, Kevin Wolf ha scritto:
>> Am 12.08.2014 um 21:08 hat Paolo Bonzini geschrieben:
>>> Il 12/08/2014 10:12, Ming Lei ha scritto:
>>>>>> The below patch is basically the minimal change to bypass coroutines.  Of course
>>>>>> the block.c part is not acceptable as is (the change to refresh_total_sectors
>>>>>> is broken, the others are just ugly), but it is a start.  Please run it with
>>>>>> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>>>>>> benchmark.
>>>> Could you explain why the new change is introduced?
>>>
>>> It provides a fast path for bdrv_aio_readv/writev whenever there is
>>> nothing to do after the driver routine returns.  In this case there is
>>> no need to wrap the AIOCB returned by the driver routine.
>>>
>>> It doesn't go all the way, and in particular it doesn't reverse
>>> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
>>
>> That's actually why I think it's an option. Remember that, like you say
>> below, we're optimising for an extreme case here, and I certainly don't
>> want to hurt the common case for it. I can't imagine a way of reversing
>> the roles without multiplying the cost for the coroutine path.
>
> I'm not that worried about it.  Perhaps it's enough to add an
> !qemu_in_coroutine() to the AIO fast path, and let the driver provide
> optimized coroutine paths like in your patches that allocate AIOCBs on
> the stack.

IMO, it will not be an extreme case as SSDs and high-performance
storage become more popular; coroutines start to affect performance
once IOPS goes beyond 100K, per the previous computation.

>
>> Or do you have a clever solution for how you'd go about it without
>> having an impact on the common case?
>
> I don't really have any ace up my sleeve, but there are some things that
> bother me in the block layer's AIO API and in block.c in general.
>
> One is that block.c can do all the pre-processing it wants before
> issuing AIO, but nothing before calling the callback.  This means that
> my patches break bdrv_drain_all (they cannot call tracked_request_end).
>
> Another is all the similar structs that we have (RwCo,
> BdrvTrackedRequest, BlockRequest, etc.).
>
> Perhaps it would help if we had a single "real" block request object,
> which is an extension of the BlockDriverAIOCB and includes enough data
> to subsume all these request structs.  That should help commonizing
> stuff between the coroutine and AIO paths, for the common case where a
> single yield is enough.  I think the single-yield case is the one that
> is really worth optimizing for.  If done properly, I think this can
> simplify a lot of block.c code, but it is really difficult to get it
> right, and unless the design is sound the code is going to come up
> really ugly. :(
>
> Another thing to evaluate is the performance gap (if there is any)
> between aio=threads and aio=native.  The only advantage of aio=native,
> AFAIU, is the batch submission of requests (plug/unplug).  But
> aio=threads often ends up having better performance because the kernel
> folks have optimized VFS a lot.  So, in the aio=threads case, we might

From my tests, aio=native is much better than aio=threads, at least
for reads.

For aio=threads, it may be possible to use batch submission too, with
readv/writev to decrease syscalls.

> as well move the format code out of the iothread and into the worker
> thread, and get rid of the coroutine cost simply by making everything
> synchronous.  Looking within QEMU, this worked out very well for migration.
>
> (We could do batch submission of requests to the thread pool if there
> were a variant of sem_post that can add multiple signals to the same
> semaphore, similar to ReleaseSemaphore on Windows).
>
>>> The problem is that your patches touch too much code and subtly
>>> break too much stuff.  The one I wrote does have a little breakage
>>> because I don't understand bs->growable 100% and I didn't really put
>>> much effort into it (my deadline being basically "be done as soon as the
>>> shower is free"), and it is ugly as hell, _but_ it should be compatible
>>> with the way the block layer works.
>>
>> Yes, your patch is definitely much more palatable than Ming's. The part
>> that I still don't like about it is that it would be stating "in the
>> common case, we're only doing the second best thing". I'm not yet
>> convinced that coroutines necessarily perform worse than state-passing
>> callbacks.
>
> Coroutines lump all the allocation costs together at the time you
> allocate the stack, but have (much) more expensive context switching.

Yes, I agree.

In my tests, one malloc(128)/free(128) pair takes only 57ns, but two
enters and one yield take 240ns, not to mention the dcache reloads and
misses caused by switching stacks, and the allocation fallback.
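
FWIW the malloc/free figure is easy to reproduce with a quick
standalone test like the one below (numbers will of course vary with
the machine and allocator):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void * volatile sink;   /* keeps the pair from being optimized away */

int main(void)
{
    enum { N = 10 * 1000 * 1000 };
    struct timespec t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (int i = 0; i < N; i++) {
        void *p = malloc(128);
        sink = p;
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &t2);

    double ns = (t2.tv_sec - t1.tv_sec) * 1e9 + (t2.tv_nsec - t1.tv_nsec);
    printf("malloc(128)/free(128): %.1f ns per pair\n", ns / N);
    return 0;
}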


Ming

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-13 13:49                         ` Ming Lei
@ 2014-08-14  9:39                           ` Stefan Hajnoczi
  2014-08-14 10:12                             ` Ming Lei
  2014-08-15 20:16                             ` Paolo Bonzini
  0 siblings, 2 replies; 81+ messages in thread
From: Stefan Hajnoczi @ 2014-08-14  9:39 UTC (permalink / raw)
  To: Ming Lei; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel


On Wed, Aug 13, 2014 at 09:49:23PM +0800, Ming Lei wrote:
> On Wed, Aug 13, 2014 at 9:16 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 13/08/2014 11:54, Kevin Wolf ha scritto:
> >> Am 12.08.2014 um 21:08 hat Paolo Bonzini geschrieben:
> >>> Il 12/08/2014 10:12, Ming Lei ha scritto:
> >>>>>> The below patch is basically the minimal change to bypass coroutines.  Of course
> >>>>>> the block.c part is not acceptable as is (the change to refresh_total_sectors
> >>>>>> is broken, the others are just ugly), but it is a start.  Please run it with
> >>>>>> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
> >>>>>> benchmark.
> >>>> Could you explain why the new change is introduced?
> >>>
> >>> It provides a fast path for bdrv_aio_readv/writev whenever there is
> >>> nothing to do after the driver routine returns.  In this case there is
> >>> no need to wrap the AIOCB returned by the driver routine.
> >>>
> >>> It doesn't go all the way, and in particular it doesn't reverse
> >>> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
> >>
> >> That's actually why I think it's an option. Remember that, like you say
> >> below, we're optimising for an extreme case here, and I certainly don't
> >> want to hurt the common case for it. I can't imagine a way of reversing
> >> the roles without multiplying the cost for the coroutine path.
> >
> > I'm not that worried about it.  Perhaps it's enough to add an
> > !qemu_in_coroutine() to the AIO fast path, and let the driver provide
> > optimized coroutine paths like in your patches that allocate AIOCBs on
> > the stack.
> 
> IMO, it will not be an extreme case as SSDs and high-performance
> storage become more popular; coroutines start to affect performance
> once IOPS goes beyond 100K, per the previous computation.

The case you seem to care about is raw images on high IOPS devices.  You
mentioned 1M IOPS devices in another email.

You don't seem to want QEMU's block layer features; that is why you are
trying to bypass them instead of optimizing the block layer.

That raises the question of whether you should look at PCI passthrough
instead.

Stefan


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-14  9:39                           ` Stefan Hajnoczi
@ 2014-08-14 10:12                             ` Ming Lei
  2014-08-15 20:16                             ` Paolo Bonzini
  1 sibling, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-14 10:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel

On Thu, Aug 14, 2014 at 5:39 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Wed, Aug 13, 2014 at 09:49:23PM +0800, Ming Lei wrote:
>> On Wed, Aug 13, 2014 at 9:16 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> > Il 13/08/2014 11:54, Kevin Wolf ha scritto:
>> >> Am 12.08.2014 um 21:08 hat Paolo Bonzini geschrieben:
>> >>> Il 12/08/2014 10:12, Ming Lei ha scritto:
>> >>>>>> The below patch is basically the minimal change to bypass coroutines.  Of course
>> >>>>>> the block.c part is not acceptable as is (the change to refresh_total_sectors
>> >>>>>> is broken, the others are just ugly), but it is a start.  Please run it with
>> >>>>>> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>> >>>>>> benchmark.
>> >>>> Could you explain why the new change is introduced?
>> >>>
>> >>> It provides a fast path for bdrv_aio_readv/writev whenever there is
>> >>> nothing to do after the driver routine returns.  In this case there is
>> >>> no need to wrap the AIOCB returned by the driver routine.
>> >>>
>> >>> It doesn't go all the way, and in particular it doesn't reverse
>> >>> completely the roles of bdrv_co_readv/writev vs. bdrv_aio_readv/writev.
>> >>
>> >> That's actually why I think it's an option. Remember that, like you say
>> >> below, we're optimising for an extreme case here, and I certainly don't
>> >> want to hurt the common case for it. I can't imagine a way of reversing
>> >> the roles without multiplying the cost for the coroutine path.
>> >
>> > I'm not that worried about it.  Perhaps it's enough to add an
>> > !qemu_in_coroutine() check to the AIO fast path, and let the driver provide
>> > optimized coroutine paths like in your patches that allocate AIOCBs on
>> > the stack.
>>
>> IMO, it will not be an extreme case as SSDs and high-performance storage
>> become more popular; coroutines start to affect performance if IOPS
>> is more than 100K, as per the previous computation.
>
> The case you seem to care about is raw images on high IOPS devices.  You
> mentioned 1M IOPS devices in another email.

In reality, if someone cares about high IOPS, it looks like the raw format
has to be considered.

>
> You don't seem to want QEMU's block layer features; that is why you are
> trying to bypass them instead of optimizing the block layer.

I don't think bypassing coroutines is at odds with optimizing the
block layer.

As we know, coroutines always introduce some cost, which can't be
ignored on high-IOPS devices. If coroutines can be improved to fit
this case, I'd like to help do that, but I wonder whether that is doable.

I like rich features, and I like good performance too; the two
shouldn't be contradictory, and the block layer should be flexible
enough to support both.
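
Just to put a number on that cost: below is a minimal standalone sketch
(not QEMU code; it uses POSIX ucontext, which QEMU's ucontext coroutine
backend resembles, and the 1M iteration count and 64 KB stack are
arbitrary choices of mine) comparing a plain getppid() loop with one
user-space context switch per operation:

    #include <stdio.h>
    #include <time.h>
    #include <ucontext.h>
    #include <unistd.h>

    #define ITERS 1000000L

    static ucontext_t main_ctx, co_ctx;
    static char co_stack[64 * 1024];

    /* "coroutine" body: one syscall of work, then yield back to main */
    static void co_entry(void)
    {
        for (;;) {
            getppid();
            swapcontext(&co_ctx, &main_ctx);
        }
    }

    static double secs(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        struct timespec t0, t1;
        long i;

        /* plain loop: one getppid() per iteration, no switching */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < ITERS; i++)
            getppid();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("plain:    %.0f ops/sec\n", ITERS / secs(t0, t1));

        /* switched loop: two swapcontext() calls per iteration */
        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp = co_stack;
        co_ctx.uc_stack.ss_size = sizeof(co_stack);
        co_ctx.uc_link = &main_ctx;
        makecontext(&co_ctx, co_entry, 0);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < ITERS; i++)
            swapcontext(&main_ctx, &co_ctx);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("switched: %.0f ops/sec\n", ITERS / secs(t0, t1));

        return 0;
    }

Note that swapcontext() saves and restores the signal mask, so it is
heavier than QEMU's actual switch path; treat the gap as an upper bound
on switching cost, not a measurement of QEMU's coroutines.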

> That raises the question of whether you should look at PCI passthrough
> instead.

I am wondering why you raise this question: virtio-blk is said to be one of
the fastest block devices in the VM world, so it is worth optimizing. It
also supports live migration, which passthrough does not.

Thanks,


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-11 19:37               ` Paolo Bonzini
                                   ` (2 preceding siblings ...)
  2014-08-13 11:43                 ` Ming Lei
@ 2014-08-14 10:46                 ` Kevin Wolf
  2014-08-15 10:39                   ` Ming Lei
  2014-08-15 20:15                   ` Paolo Bonzini
  3 siblings, 2 replies; 81+ messages in thread
From: Kevin Wolf @ 2014-08-14 10:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: tom.leiming, Ming Lei, Fam Zheng, qemu-devel, Stefan Hajnoczi

On 11.08.2014 at 21:37, Paolo Bonzini wrote:
> On 10/08/2014 05:46, Ming Lei wrote:
> > Hi Kevin, Paolo, Stefan and all,
> > 
> > 
> > On Wed, 6 Aug 2014 10:48:55 +0200
> > Kevin Wolf <kwolf@redhat.com> wrote:
> > 
> >> On 06.08.2014 at 07:33, Ming Lei wrote:
> > 
> >>
> >> Anyhow, the coroutine version of your benchmark is buggy: it leaks all
> >> coroutines instead of exiting them, so it can't make any use of the
> >> coroutine pool. On my laptop, I get this (where fixed coroutine is a
> >> version that simply removes the yield at the end):
> >>
> >>                 | bypass        | fixed coro    | buggy coro
> >> ----------------+---------------+---------------+--------------
> >> time            | 1.09s         | 1.10s         | 1.62s
> >> L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
> >> insns per cycle | 2.39          | 2.39          | 1.90
> >>
> >> This raises the question of whether you see a similar effect on a real qemu,
> >> and whether the coroutine pool is still not big enough. With correct use of
> >> coroutines, the difference seems to be barely measurable even without
> >> any I/O involved.
> > 
> > Now I have fixed the coroutine leak bug. The previous crypt bench put a
> > fairly heavy load on each operation, which kept operations per second
> > quite low (~40K/sec), so I wrote a new and simpler one that can generate
> > hundreds of thousands of operations per second; that number should match
> > some fast storage devices, and it does show a non-trivial effect from
> > coroutines.
> > 
> > In the extreme case, where just a getppid() syscall is run in each
> > iteration, only 3M operations/sec can be reached with coroutines, while
> > without coroutines the number reaches 16M/sec: more than a 4x difference!
> 
> I should be on vacation, but I'm following a couple of threads on the
> mailing list, and I'm a bit tired of hearing the same argument again and
> again...
> 
> The different characteristics of asynchronous I/O vs. any synchronous workload
> are such that it is hard to be sure that microbenchmarks make sense.
> 
> The below patch is basically the minimal change to bypass coroutines.  Of course
> the block.c part is not acceptable as is (the change to refresh_total_sectors
> is broken, the others are just ugly), but it is a start.  Please run it with
> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
> benchmark.

So to finally reply with some numbers... I'm running fio tests based on
Ming's configuration on a loop-mounted tmpfs image using dataplane. I've
extended the tests to not only test random reads, but also sequential
reads. I have not yet tested writes, and ran almost no tests with block
sizes larger than 4k, so I'm not including those here.

The "base" case is with Ming's patches applied, but the set_bypass(true)
call commented out in the virtio-blk code. All other cases are patches
applied on top of this.

                | Random throughput | Sequential throughput
----------------+-------------------+-----------------------
master          | 442 MB/s          | 730 MB/s
base            | 453 MB/s          | 757 MB/s
bypass (Ming)   | 461 MB/s          | 734 MB/s
coroutine       | 468 MB/s          | 716 MB/s
bypass (Paolo)  | 476 MB/s          | 682 MB/s

So while your patches look pretty good in Ming's test case of random
reads, I think the sequential case is worrying. The same is true for my
latest coroutine optimisations, even though the degradation is smaller
there.

This needs some more investigation.

Kevin


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-14 10:46                 ` Kevin Wolf
@ 2014-08-15 10:39                   ` Ming Lei
  2014-08-15 20:15                   ` Paolo Bonzini
  1 sibling, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-15 10:39 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi

On Thu, Aug 14, 2014 at 6:46 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 11.08.2014 at 21:37, Paolo Bonzini wrote:
>> On 10/08/2014 05:46, Ming Lei wrote:
>> > Hi Kevin, Paolo, Stefan and all,
>> >
>> >
>> > On Wed, 6 Aug 2014 10:48:55 +0200
>> > Kevin Wolf <kwolf@redhat.com> wrote:
>> >
>> >> On 06.08.2014 at 07:33, Ming Lei wrote:
>> >
>> >>
>> >> Anyhow, the coroutine version of your benchmark is buggy: it leaks all
>> >> coroutines instead of exiting them, so it can't make any use of the
>> >> coroutine pool. On my laptop, I get this (where fixed coroutine is a
>> >> version that simply removes the yield at the end):
>> >>
>> >>                 | bypass        | fixed coro    | buggy coro
>> >> ----------------+---------------+---------------+--------------
>> >> time            | 1.09s         | 1.10s         | 1.62s
>> >> L1-dcache-loads | 921,836,360   | 932,781,747   | 1,298,067,438
>> >> insns per cycle | 2.39          | 2.39          | 1.90
>> >>
>> >> This raises the question of whether you see a similar effect on a real qemu,
>> >> and whether the coroutine pool is still not big enough. With correct use of
>> >> coroutines, the difference seems to be barely measurable even without
>> >> any I/O involved.
>> >
>> > Now I have fixed the coroutine leak bug. The previous crypt bench put a
>> > fairly heavy load on each operation, which kept operations per second
>> > quite low (~40K/sec), so I wrote a new and simpler one that can generate
>> > hundreds of thousands of operations per second; that number should match
>> > some fast storage devices, and it does show a non-trivial effect from
>> > coroutines.
>> >
>> > In the extreme case, where just a getppid() syscall is run in each
>> > iteration, only 3M operations/sec can be reached with coroutines, while
>> > without coroutines the number reaches 16M/sec: more than a 4x difference!
>>
>> I should be on vacation, but I'm following a couple of threads on the
>> mailing list, and I'm a bit tired of hearing the same argument again and
>> again...
>>
>> The different characteristics of asynchronous I/O vs. any synchronous workload
>> are such that it is hard to be sure that microbenchmarks make sense.
>>
>> The below patch is basically the minimal change to bypass coroutines.  Of course
>> the block.c part is not acceptable as is (the change to refresh_total_sectors
>> is broken, the others are just ugly), but it is a start.  Please run it with
>> your fio workloads, or write an aio-based version of a qemu-img/qemu-io *I/O*
>> benchmark.
>
> So to finally reply with some numbers... I'm running fio tests based on
> Ming's configuration on a loop-mounted tmpfs image using dataplane. I've
> extended the tests to not only test random reads, but also sequential
> reads. I have not yet tested writes, and ran almost no tests with block
> sizes larger than 4k, so I'm not including those here.
>
> The "base" case is with Ming's patches applied, but the set_bypass(true)
> call commented out in the virtio-blk code. All other cases are patches
> applied on top of this.
>
>                 | Random throughput | Sequential throughput
> ----------------+-------------------+-----------------------
> master          | 442 MB/s          | 730 MB/s
> base            | 453 MB/s          | 757 MB/s
> bypass (Ming)   | 461 MB/s          | 734 MB/s
> coroutine       | 468 MB/s          | 716 MB/s
> bypass (Paolo)  | 476 MB/s          | 682 MB/s

It looks like the difference between random read and sequential read is
quite big, which shouldn't be the case since the whole file is cached
in RAM.

>
> So while your patches look pretty good in Ming's test case of random
> reads, I think the sequential case is worrying. The same is true for my
> latest coroutine optimisations, even though the degradation is smaller
> there.

In my VM test, the random read and sequential read results are basically
the same, and the I/O thread's CPU utilization is more than 93% with
Paolo's patch, over both null_blk and loop over a file in tmpfs.

I am using a 3.16 kernel.

>
> This needs some more investigation.

Maybe it is caused by your test setup and environment, or by your VM
kernel; I'm not sure.


Thanks,
-- 
Ming Lei


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-14 10:46                 ` Kevin Wolf
  2014-08-15 10:39                   ` Ming Lei
@ 2014-08-15 20:15                   ` Paolo Bonzini
  2014-08-16  8:20                     ` Ming Lei
  2014-08-17  5:29                     ` Paolo Bonzini
  1 sibling, 2 replies; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-15 20:15 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: tom.leiming, Ming Lei, Fam Zheng, qemu-devel, Stefan Hajnoczi

On 14/08/2014 12:46, Kevin Wolf wrote:
> So to finally reply with some numbers... I'm running fio tests based on
> Ming's configuration on a loop-mounted tmpfs image using dataplane.

I'm not sure tmpfs is a particularly useful comparison, since it doesn't
support O_DIRECT.  O_DIRECT over ramdisk ("modprobe brd rd_nr=1
rd_size=524288 max_part=1", either directly or via a filesystem) is
probably a better benchmark.

Also, I'm not sure how the I/O scheduler works over tmpfs.  A ramdisk
should just do the right thing.  (Are you using deadline or cfq?)
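
For anyone reproducing this, a quick standalone probe (a hypothetical
helper of mine, not part of any patch in this thread) of the O_DIRECT
point: open(2) with O_DIRECT fails with EINVAL on tmpfs, while it
succeeds on a brd ramdisk or a real disk:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <path>\n", argv[0]);
            return 1;
        }
        /* tmpfs rejects O_DIRECT at open time; brd accepts it */
        fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) {
            printf("%s: no O_DIRECT support (%s)\n", argv[1], strerror(errno));
            return 1;
        }
        printf("%s: O_DIRECT open succeeded\n", argv[1]);
        close(fd);
        return 0;
    }

Running it against /dev/ram0 and against a file on tmpfs should show the
difference immediately.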

>                 | Random throughput | Sequential throughput
> ----------------+-------------------+-----------------------
> master          | 442 MB/s          | 730 MB/s
> base            | 453 MB/s          | 757 MB/s
> bypass (Ming)   | 461 MB/s          | 734 MB/s
> coroutine       | 468 MB/s          | 716 MB/s
> bypass (Paolo)  | 476 MB/s          | 682 MB/s

This is pretty large, but it really smells like either a setup problem
or a kernel bug...

Paolo


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-14  9:39                           ` Stefan Hajnoczi
  2014-08-14 10:12                             ` Ming Lei
@ 2014-08-15 20:16                             ` Paolo Bonzini
  1 sibling, 0 replies; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-15 20:16 UTC (permalink / raw)
  To: Stefan Hajnoczi, Ming Lei; +Cc: Kevin Wolf, Fam Zheng, qemu-devel

On 14/08/2014 11:39, Stefan Hajnoczi wrote:
> That raises the question of whether you should look at PCI passthrough
> instead.

Being able to use logical volumes, or to access multiple remote LUNs
through a single FC card in the host, is an obvious reason to avoid PCI
passthrough.

Paolo


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-15 20:15                   ` Paolo Bonzini
@ 2014-08-16  8:20                     ` Ming Lei
  2014-08-17  5:29                     ` Paolo Bonzini
  1 sibling, 0 replies; 81+ messages in thread
From: Ming Lei @ 2014-08-16  8:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi

On 8/16/14, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 14/08/2014 12:46, Kevin Wolf wrote:
>> So to finally reply with some numbers... I'm running fio tests based on
>> Ming's configuration on a loop-mounted tmpfs image using dataplane.
>
> I'm not sure tmpfs is a particularly useful comparison, since it doesn't
> support O_DIRECT.  O_DIRECT over ramdisk ("modprobe brd rd_nr=1
> rd_size=524288 max_part=1", either directly or via a filesystem) is
> probably a better benchmark.

If the loop device is backed by a file on tmpfs, that is fine, since loop
itself supports O_DIRECT and the synchronous ->read()/->write() calls
inside the loop driver can return without blocking.

I have tested loop over a tmpfs file, and can't reproduce Kevin's issue.

>
> Also, I'm not sure how the I/O scheduler works over tmpfs.  A ramdisk
> should just do the right thing.  (Are you using deadline or cfq?)
>
>>                 | Random throughput | Sequential throughput
>> ----------------+-------------------+-----------------------
>> master          | 442 MB/s          | 730 MB/s
>> base            | 453 MB/s          | 757 MB/s
>> bypass (Ming)   | 461 MB/s          | 734 MB/s
>> coroutine       | 468 MB/s          | 716 MB/s
>> bypass (Paolo)  | 476 MB/s          | 682 MB/s
>
> This is pretty large, but it really smells like either a setup problem
> or a kernel bug...
>
> Paolo
>

Thanks,
-- 
Ming Lei


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-15 20:15                   ` Paolo Bonzini
  2014-08-16  8:20                     ` Ming Lei
@ 2014-08-17  5:29                     ` Paolo Bonzini
  2014-08-18  8:58                       ` Kevin Wolf
  1 sibling, 1 reply; 81+ messages in thread
From: Paolo Bonzini @ 2014-08-17  5:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: tom.leiming, Ming Lei, Fam Zheng, qemu-devel, Stefan Hajnoczi

On 15/08/2014 22:15, Paolo Bonzini wrote:
>> >                 | Random throughput | Sequential throughput
>> > ----------------+-------------------+-----------------------
>> > master          | 442 MB/s          | 730 MB/s
>> > base            | 453 MB/s          | 757 MB/s
>> > bypass (Ming)   | 461 MB/s          | 734 MB/s
>> > coroutine       | 468 MB/s          | 716 MB/s
>> > bypass (Paolo)  | 476 MB/s          | 682 MB/s
> This is pretty large, but it really smells like either a setup problem
> or a kernel bug...

Thinking more about the I/O scheduler, it could simply be that faster
I/O = less coalescing = more bios actually reaching the driver = less speed.

It should be possible to find out whether this is true using blktrace.

(The reason why sequential I/O is faster is coalescing in the I/O
scheduler.)
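
blktrace gives the full per-bio picture; as a cheaper first check, the
per-device merge counters in sysfs already show whether coalescing
dropped. A small sketch (a hypothetical helper of mine; the field layout
follows Documentation/block/stat.txt, whose first two fields are read
I/Os completed and read I/Os merged) to sample them before and after a
fio run:

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char path[256];
        unsigned long long reads, merges;
        FILE *f;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <disk, e.g. vdc>\n", argv[0]);
            return 1;
        }
        snprintf(path, sizeof(path), "/sys/block/%s/stat", argv[1]);
        f = fopen(path, "r");
        if (!f) {
            perror(path);
            return 1;
        }
        /* fields 1 and 2: read I/Os completed, read I/Os merged */
        if (fscanf(f, "%llu %llu", &reads, &merges) != 2) {
            fprintf(stderr, "unexpected format in %s\n", path);
            fclose(f);
            return 1;
        }
        fclose(f);
        printf("%s: %llu reads completed, %llu reads merged\n",
               argv[1], reads, merges);
        return 0;
    }

A collapse of the merged/completed ratio in the faster configurations
would support this hypothesis.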

Paolo


* Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
  2014-08-17  5:29                     ` Paolo Bonzini
@ 2014-08-18  8:58                       ` Kevin Wolf
  0 siblings, 0 replies; 81+ messages in thread
From: Kevin Wolf @ 2014-08-18  8:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: tom.leiming, Ming Lei, Fam Zheng, qemu-devel, Stefan Hajnoczi

On 17.08.2014 at 07:29, Paolo Bonzini wrote:
> On 15/08/2014 22:15, Paolo Bonzini wrote:
> >> >                 | Random throughput | Sequential throughput
> >> > ----------------+-------------------+-----------------------
> >> > master          | 442 MB/s          | 730 MB/s
> >> > base            | 453 MB/s          | 757 MB/s
> >> > bypass (Ming)   | 461 MB/s          | 734 MB/s
> >> > coroutine       | 468 MB/s          | 716 MB/s
> >> > bypass (Paolo)  | 476 MB/s          | 682 MB/s
> > This is pretty large, but it really smells like either a setup problem
> > or a kernel bug...
> 
> Thinking more about the I/O scheduler, it could simply be that faster
> I/O = less coalescing = more bios actually reaching the driver = less speed.
> 
> It should be possible to find out whether this is true using blktrace.
> 
> (The reason why sequential I/O is faster is coalescing in the I/O
> scheduler.)

Yes, sorry, I should have posted an update on Friday. This was cfq in the
guest (apparently the host doesn't use any scheduler with loop devices,
which makes some sense); with noop (or deadline) the numbers look much
better: sequential throughput is on a slightly higher level and increases
with the optimisations, and random is much closer to it.
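
For reference, the guest-side scheduler switch is just a sysfs write,
i.e. the equivalent of "echo noop > /sys/block/vdc/queue/scheduler"
(device name assumed); a minimal C sketch:

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char path[256];
        FILE *f;

        if (argc != 3) {
            fprintf(stderr, "usage: %s <disk, e.g. vdc> <noop|deadline|cfq>\n",
                    argv[0]);
            return 1;
        }
        snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", argv[1]);
        f = fopen(path, "w");
        if (!f) {
            perror(path);
            return 1;
        }
        /* the block layer switches this queue to the named scheduler */
        fprintf(f, "%s\n", argv[2]);
        if (fclose(f) == EOF) {
            perror("fclose");
            return 1;
        }
        return 0;
    }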

Kevin


Thread overview: 81+ messages
2014-08-05  3:33 [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 01/17] qemu/obj_pool.h: introduce object allocation pool Ming Lei
2014-08-05 11:55   ` Eric Blake
2014-08-05 12:05     ` Michael S. Tsirkin
2014-08-05 12:21       ` Eric Blake
2014-08-05 12:51         ` Michael S. Tsirkin
2014-08-06  2:35     ` Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 02/17] dataplane: use object pool to speed up allocation for virtio blk request Ming Lei
2014-08-05 12:30   ` Eric Blake
2014-08-06  2:45     ` Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 03/17] qemu coroutine: support bypass mode Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 04/17] block: prepare for supporting selective bypass coroutine Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 05/17] garbage collector: introduced for support of " Ming Lei
2014-08-05 12:43   ` Eric Blake
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 06/17] block: introduce bdrv_co_can_bypass_co Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 07/17] block: support to bypass qemu coroutinue Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 08/17] Revert "raw-posix: drop raw_get_aio_fd() since it is no longer used" Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 09/17] dataplane: enable selective bypassing coroutine Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 10/17] linux-aio: fix submit aio as a batch Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 11/17] linux-aio: handling -EAGAIN for !s->io_q.plugged case Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 12/17] linux-aio: increase max event to 256 Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 13/17] linux-aio: remove 'node' from 'struct qemu_laiocb' Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 14/17] hw/virtio/virtio-blk.h: introduce VIRTIO_BLK_F_MQ Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 15/17] virtio-blk: support multi queue for non-dataplane Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 16/17] virtio-blk: dataplane: support multi virtqueue Ming Lei
2014-08-05  3:33 ` [Qemu-devel] [PATCH v1 17/17] hw/virtio-pci: introduce num_queues property Ming Lei
2014-08-05  9:38 ` [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support Stefan Hajnoczi
2014-08-05  9:50   ` Ming Lei
2014-08-05  9:56     ` Kevin Wolf
2014-08-05 10:50       ` Ming Lei
2014-08-05 13:59     ` Stefan Hajnoczi
2014-08-05  9:48 ` Kevin Wolf
2014-08-05 10:00   ` Ming Lei
2014-08-05 11:44     ` Paolo Bonzini
2014-08-05 13:48     ` Stefan Hajnoczi
2014-08-05 14:47       ` Kevin Wolf
2014-08-06  5:33         ` Ming Lei
2014-08-06  7:45           ` Paolo Bonzini
2014-08-06  8:38             ` Ming Lei
2014-08-06  8:50               ` Paolo Bonzini
2014-08-06 13:53                 ` Ming Lei
2014-08-06  8:48           ` Kevin Wolf
2014-08-06  9:37             ` Ming Lei
2014-08-06 10:09               ` Kevin Wolf
2014-08-06 11:28                 ` Ming Lei
2014-08-06 11:44                   ` Ming Lei
2014-08-06 15:40                   ` Kevin Wolf
2014-08-07 10:27                     ` Ming Lei
2014-08-07 10:52                       ` Ming Lei
2014-08-07 11:06                         ` Kevin Wolf
2014-08-07 13:03                           ` Ming Lei
2014-08-07 13:51                       ` Kevin Wolf
2014-08-08 10:32                         ` Ming Lei
2014-08-08 11:26                           ` Ming Lei
2014-08-10  3:46             ` Ming Lei
2014-08-11 14:03               ` Kevin Wolf
2014-08-12  7:53                 ` Ming Lei
2014-08-12 11:40                   ` Kevin Wolf
2014-08-12 12:14                     ` Ming Lei
2014-08-11 19:37               ` Paolo Bonzini
2014-08-12  8:12                 ` Ming Lei
2014-08-12 19:08                   ` Paolo Bonzini
2014-08-13  9:54                     ` Kevin Wolf
2014-08-13 13:16                       ` Paolo Bonzini
2014-08-13 13:49                         ` Ming Lei
2014-08-14  9:39                           ` Stefan Hajnoczi
2014-08-14 10:12                             ` Ming Lei
2014-08-15 20:16                             ` Paolo Bonzini
2014-08-13 10:19                     ` Ming Lei
2014-08-13 12:35                       ` Paolo Bonzini
2014-08-13  8:55                 ` Stefan Hajnoczi
2014-08-13 11:43                 ` Ming Lei
2014-08-13 12:35                   ` Paolo Bonzini
2014-08-13 13:07                     ` Ming Lei
2014-08-14 10:46                 ` Kevin Wolf
2014-08-15 10:39                   ` Ming Lei
2014-08-15 20:15                   ` Paolo Bonzini
2014-08-16  8:20                     ` Ming Lei
2014-08-17  5:29                     ` Paolo Bonzini
2014-08-18  8:58                       ` Kevin Wolf
2014-08-06  9:37           ` Stefan Hajnoczi
