* [PATCH for-6.1? v2 0/3] linux-aio: limit the batch size to reduce queue latency
@ 2021-07-21  9:42 Stefano Garzarella
  2021-07-21  9:42 ` [PATCH for-6.1? v2 1/3] iothread: generalize iothread_set_param/iothread_get_param Stefano Garzarella
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Stefano Garzarella @ 2021-07-21  9:42 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Daniel P. Berrangé,
	Eduardo Habkost, qemu-block, Stefan Weil, Markus Armbruster,
	Max Reitz, Paolo Bonzini, Eric Blake, Dr. David Alan Gilbert

Since this fixes a performance regression, it could be included in 6.1-rc1
if possible. The changes should be uncontroversial.

v1: https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg01526.html
v2:
  - s/bacth/batch/ [stefanha]
  - limit the batch with the number of available events [stefanha]
  - rebased on master
  - re-run benchmarks

Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger hardcoded
value") changed MAX_EVENTS from 128 to 1024, to increase the number of
in-flight requests. But this change also increased the potential maximum batch
to 1024 elements.

The problem is noticeable when we have many requests in flight and multiple
queues attached to the same AIO context: in that case we can build very
large batches. With a single queue, by contrast, the batch stays limited,
because io_submit(2) is called when the queue is unplugged.
In practice, io_submit(2) was called only when there were no more queues
plugged in or when the AIO queue filled up (MAX_EVENTS = 1024).

This series limits the batch size (the number of requests submitted to the
kernel through a single io_submit(2) call) in the Linux AIO backend, and adds
a new `aio-max-batch` parameter to IOThread to allow tuning it.
If `aio-max-batch` is equal to 0 (the default value), the AIO engine uses its
own default maximum batch size.

I ran some benchmarks to choose 32 as the default batch value for Linux AIO.
Below are the kIOPS measured with fio running in the guest (average over 3 runs):

                   |   master  |           with this series applied            |
                   |143c2e04328| maxbatch=8|maxbatch=16|maxbatch=32|maxbatch=64|
          # queues | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs | 1q  | 4qs |
-- randread tests -|-----------------------------------------------------------|
bs=4k iodepth=1    | 200 | 195 | 181 | 208 | 200 | 203 | 206 | 212 | 200 | 204 |
bs=4k iodepth=8    | 269 | 231 | 256 | 244 | 255 | 260 | 266 | 268 | 270 | 250 |
bs=4k iodepth=64   | 230 | 198 | 262 | 265 | 261 | 253 | 260 | 273 | 253 | 263 |
bs=4k iodepth=128  | 217 | 181 | 261 | 253 | 249 | 276 | 250 | 278 | 255 | 278 |
bs=16k iodepth=1   | 130 | 130 | 130 | 130 | 130 | 130 | 137 | 130 | 130 | 130 |
bs=16k iodepth=8   | 130 | 131 | 130 | 131 | 130 | 130 | 137 | 131 | 131 | 130 |
bs=16k iodepth=64  | 130 | 102 | 131 | 128 | 131 | 128 | 137 | 140 | 130 | 128 |
bs=16k iodepth=128 | 131 | 100 | 130 | 128 | 131 | 129 | 137 | 141 | 130 | 129 |

1q  = virtio-blk device with a single queue
4qs = virtio-blk device with multiple queues (one queue per vCPU - 4)

I reported only the most significant tests, but I also ran other tests to
make sure there were no regressions; here is the full report:
https://docs.google.com/spreadsheets/d/11X3_5FJu7pnMTlf4ZatRDvsnU9K3EPj6Mn3aJIsE4tI

Test environment:
- Disk: Intel Corporation NVMe Datacenter SSD [Optane]
- CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
- QEMU: qemu-system-x86_64 -machine q35,accel=kvm -smp 4 -m 4096 \
          ... \
          -object iothread,id=iothread0,aio-max-batch=${MAX_BATCH} \
          -device virtio-blk-pci,iothread=iothread0,num-queues=${NUM_QUEUES}

- benchmark: fio --ioengine=libaio --thread --group_reporting \
                 --number_ios=200000 --direct=1 --filename=/dev/vdb \
                 --rw=${TEST} --bs=${BS} --iodepth=${IODEPTH} --numjobs=16

Next steps:
 - benchmark io_uring and use `aio-max-batch` also there
 - make MAX_EVENTS parametric adding a new `aio-max-events` parameter

Thanks,
Stefano

Stefano Garzarella (3):
  iothread: generalize iothread_set_param/iothread_get_param
  iothread: add aio-max-batch parameter
  linux-aio: limit the batch size using `aio-max-batch` parameter

 qapi/misc.json            |  6 ++-
 qapi/qom.json             |  7 +++-
 include/block/aio.h       | 12 ++++++
 include/sysemu/iothread.h |  3 ++
 block/linux-aio.c         |  9 ++++-
 iothread.c                | 82 ++++++++++++++++++++++++++++++++++-----
 monitor/hmp-cmds.c        |  2 +
 util/aio-posix.c          | 12 ++++++
 util/aio-win32.c          |  5 +++
 util/async.c              |  2 +
 qemu-options.hx           |  8 +++-
 11 files changed, 134 insertions(+), 14 deletions(-)

-- 
2.31.1



