* [Qemu-devel] [PATCH 0/6] tcg: fix icount super slowdown
@ 2017-03-03 13:11 Paolo Bonzini
From: Paolo Bonzini @ 2017-03-03 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

icount has become much slower after tcg_cpu_exec has stopped
using the BQL.  There is also a latent bug that is masked by
the slowness.

The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL
timer now has to wake up the I/O thread and wait for it.  The rendezvous
is mediated by the BQL QemuMutex:

- handle_icount_deadline wakes up the I/O thread with BQL taken
- the I/O thread wakes up and waits on the BQL 
- the VCPU thread releases the BQL a little later
- the I/O thread raises an interrupt, which calls qemu_cpu_kick
- the VCPU thread notices the interrupt, takes the BQL to
  process it and waits on it

All this back and forth is extremely expensive, causing a 6 to 8-fold
slowdown when icount is turned on.

One may think that the issue is that the VCPU thread is too dependent
on the BQL, but then the latent bug comes in.  I first tried removing
the BQL completely from the x86 cpu_exec.  Every guest then hung, and
the only way to fix it (and make everything slow again) was to add a dummy
BQL lock/unlock pair to qemu_tcg_wait_io_event.

This is because in -icount mode you really have to process the events
before the CPU restarts executing the next instruction.  Therefore, this
series moves the processing of QEMU_CLOCK_VIRTUAL timers straight into
the vCPU thread when running in icount mode.  This is limited to the
main TimerListGroup.  QEMU_CLOCK_VIRTUAL timers in AioContexts still run
outside the vCPU thread.
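
For readers who want the idea in code, here is a toy C sketch of the
dispatch described above.  It is not taken from the patches: every name
in it (ToyTimer, ToyState, toy_run_timers, toy_handle_deadline) is
hypothetical, and the I/O thread handoff is modelled as a plain counter
rather than a real thread wakeup.  It only shows the decision being made:
in icount mode the vCPU thread runs expired virtual-clock timers itself,
otherwise the work is handed to the I/O thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are QEMU's real types or functions. */
typedef struct ToyTimer {
    int64_t expire_ns;      /* virtual-clock deadline */
    bool fired;             /* set once the timer has run */
} ToyTimer;

typedef struct ToyState {
    bool use_icount;        /* models -icount being enabled */
    bool in_vcpu_thread;    /* models "the caller is the vCPU thread" */
    int io_thread_wakeups;  /* counts cross-thread handoffs */
} ToyState;

/* Run every timer whose deadline has passed; return how many fired. */
static int toy_run_timers(ToyTimer *timers, size_t n, int64_t now_ns)
{
    int fired = 0;
    for (size_t i = 0; i < n; i++) {
        if (!timers[i].fired && timers[i].expire_ns <= now_ns) {
            timers[i].fired = true;
            fired++;
        }
    }
    return fired;
}

/* Called when the virtual clock reaches a timer deadline. */
static int toy_handle_deadline(ToyState *s, ToyTimer *timers, size_t n,
                               int64_t now_ns)
{
    assert(s != NULL && timers != NULL);
    if (s->use_icount && s->in_vcpu_thread) {
        /* icount: process timers before the next guest instruction. */
        return toy_run_timers(timers, n, now_ns);
    }
    /* Otherwise hand the work to the I/O thread (here: just count it). */
    s->io_thread_wakeups++;
    return 0;
}
```

In this model the icount path never increments io_thread_wakeups, which
is the point: the expensive rendezvous with the I/O thread is skipped.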

With this change, icount mode is pretty much running as fast as in 2.8.
I tested the patches on top of Alex's series with both x86 and aarch64
guests, but they should be pretty much independent.

The good thing is that the infrastructure to do this is basically
already there, in the form of QEMUTimerListNotifyCB.  It only needs to
be generalized a bit (patches 2 and 3) and bugfixed (patches 1 and 4; the
latter is necessary to avoid the "I/O thread spun for 1000 iterations"
message and the consequent slowdown of the vCPU thread).

The bad things are:

- I am not sure of what was different before the patch that removed the
BQL from tcg_cpu_exec (and I don't really have time to profile it right
now---I should not be fixing this in fact...).

- the solution sounds a bit ugly and it probably is---though the patch
itself is pretty small, adding only about 30 lines of new code.

Paolo

Paolo Bonzini (5):
  qemu-timer: fix off-by-one
  qemu-timer: do not include sysemu/cpus.h from util/qemu-timer.h
  cpus: define QEMUTimerListNotifyCB for QEMU system emulation
  main-loop: remove now unnecessary optimization
  icount: process QEMU_CLOCK_VIRTUAL timers in vCPU thread

 cpu-exec.c                   |  1 +
 cpus.c                       | 29 +++++++++++++++++++++++++++--
 hw/core/ptimer.c             |  1 +
 include/qemu/timer.h         | 29 ++++++++++++++++++++++++++---
 include/sysemu/cpus.h        |  3 +++
 kvm-all.c                    |  1 +
 monitor.c                    |  1 +
 replay/replay.c              |  1 +
 stubs/cpu-get-icount.c       |  6 ++++++
 tests/test-aio-multithread.c |  2 +-
 tests/test-aio.c             |  2 +-
 translate-all.c              |  1 +
 util/async.c                 |  2 +-
 util/main-loop.c             |  3 ++-
 util/qemu-timer.c            | 17 ++++++++++-------
 vl.c                         |  5 +----
 16 files changed, 84 insertions(+), 20 deletions(-)


Thread overview: 26+ messages
2017-03-03 13:11 [Qemu-devel] [PATCH 0/6] tcg: fix icount super slowdown Paolo Bonzini
2017-03-03 13:11 ` [Qemu-devel] [PATCH 1/5] qemu-timer: fix off-by-one Paolo Bonzini
2017-03-03 13:48   ` Edgar E. Iglesias
2017-03-10  9:46   ` Alex Bennée
2017-03-03 13:11 ` [Qemu-devel] [PATCH 2/5] qemu-timer: do not include sysemu/cpus.h from util/qemu-timer.h Paolo Bonzini
2017-03-03 13:48   ` Edgar E. Iglesias
2017-03-03 14:50   ` Alex Bennée
2017-03-03 14:55     ` Paolo Bonzini
2017-03-10  7:42   ` [Qemu-devel] [PATCH] fixup! " Alex Bennée
2017-03-10  8:27     ` Peter Maydell
2017-03-10  9:47   ` [Qemu-devel] [PATCH 2/5] " Alex Bennée
2017-03-03 13:11 ` [Qemu-devel] [PATCH 3/5] cpus: define QEMUTimerListNotifyCB for QEMU system emulation Paolo Bonzini
2017-03-03 13:53   ` Edgar E. Iglesias
2017-03-03 13:11 ` [Qemu-devel] [PATCH 4/5] main-loop: remove now unnecessary optimization Paolo Bonzini
2017-03-03 13:53   ` Edgar E. Iglesias
2017-03-13 16:23   ` Alex Bennée
2017-03-03 13:11 ` [Qemu-devel] [PATCH 5/5] icount: process QEMU_CLOCK_VIRTUAL timers in vCPU thread Paolo Bonzini
2017-03-13 16:53   ` Alex Bennée
2017-03-13 17:16     ` Paolo Bonzini
2017-03-13 18:15       ` Alex Bennée
2017-03-14 10:05         ` Paolo Bonzini
2017-03-14 12:57           ` Paolo Bonzini
2017-03-14 15:43             ` Alex Bennée
2017-03-14 16:23               ` Paolo Bonzini
2017-03-09 17:19 ` [Qemu-devel] [PATCH 0/6] tcg: fix icount super slowdown Alex Bennée
2017-03-09 17:22   ` Paolo Bonzini
