From: Chegu Vinod
Date: Mon, 24 Jun 2013 03:49:40 -0600
Message-Id: <1372067382-141082-1-git-send-email-chegu_vinod@hp.com>
Subject: [Qemu-devel] [PATCH v8 0/3] Throttle-down guest to help with live migration convergence
To: eblake@redhat.com, anthony@codemonkey.ws, quintela@redhat.com, owasserm@redhat.com, qemu-devel@nongnu.org, pbonzini@redhat.com
Cc: chegu_vinod@hp.com

Busy enterprise workloads hosted on large VMs tend to dirty memory faster than live guest migration can transfer it. Despite some good recent improvements (and even with dedicated 10Gig NICs between the hosts), live migration does NOT converge.

If a user chooses to force convergence of their migration via a new migration capability, "auto-converge", this change auto-detects the lack-of-convergence scenario and triggers a slowdown of the workload by explicitly preventing the VCPUs from spending much time in VM context. The migration thread tries to catch up, and this eventually leads to convergence in some "deterministic" amount of time. Yes, it does impact the performance of all the VCPUs, but in my observation that lasts only for a short time, i.e. the migration ends up entering stage 3 (the downtime phase) soon afterwards.

No external trigger is required.

Thanks to Juan and Paolo for their useful suggestions.
---
Changes from v7:
- added a missing else to patch 3/3.

Changes from v6:
- incorporated feedback from Paolo.
- rebased to latest qemu.git and removed RFC.

Changes from v5:
- incorporated feedback from Paolo & Igor.
- rebased to latest qemu.git

Changes from v4:
- incorporated feedback from Paolo.
- split into 3 patches.

Changes from v3:
- incorporated feedback from Paolo and Eric
- rebased to latest qemu.git

Changes from v2:
- incorporated feedback from Orit, Juan and Eric
- stop the throttling thread at the start of stage 3
- rebased to latest qemu.git

Changes from v1:
- rebased to latest qemu.git
- added auto-converge capability (default off), as suggested by Anthony Liguori & Eric Blake.

Signed-off-by: Chegu Vinod

---

Chegu Vinod (3):
  Introduce async_run_on_cpu()
  Add 'auto-converge' migration capability
  Force auto-convegence of live migration

 arch_init.c                   | 85 +++++++++++++++++++++++++++++++++++++++++
 cpus.c                        | 29 ++++++++++++++
 include/migration/migration.h |  2 +
 include/qemu-common.h         |  1 +
 include/qom/cpu.h             | 10 +++++
 migration.c                   |  9 ++++
 qapi-schema.json              |  5 ++-
 7 files changed, 140 insertions(+), 1 deletions(-)