Message-ID: <55009BC0.3010905@linux.vnet.ibm.com>
Date: Wed, 11 Mar 2015 15:47:12 -0400
From: "Jason J. Herne"
Subject: Re: [Qemu-devel] Migration auto-converge problem
Reply-To: jjherne@linux.vnet.ibm.com
To: "qemu-devel@nongnu.org qemu-devel", Christian Borntraeger, quintela@redhat.com, amit.shah@redhat.com
In-Reply-To: <54F4D076.3040402@linux.vnet.ibm.com>
References: <54F4D076.3040402@linux.vnet.ibm.com>

On 03/02/2015 04:04 PM, Jason J. Herne wrote:
> We have a test case that dirties memory very, very quickly. When we run
> this test case in a guest and attempt a migration, that migration never
> converges, even with auto-converge on.
>
> The auto-converge behavior of Qemu serves a different purpose than I had
> expected. In my mind, I expected auto-converge to continuously apply
> adaptive throttling of the cpu utilization of a busy guest if Qemu
> detects that progress is not being made quickly enough in the guest
> memory transfer. The idea is that a guest dirtying pages too quickly
> will be adaptively slowed down by the throttling until migration is able
> to transfer pages fast enough to complete the migration within the max
> downtime. Qemu's current auto-converge does not appear to do this in
> practice.
>
> A quick look at the source code shows the following:
> - Auto-converge keeps a counter. This counter is only incremented if, for
>   a completed memory pass, the guest is dirtying pages at a rate of 50%
>   (or more) of our transfer rate.
> - The counter increments at most once per pass through memory.
> - The counter must reach 4 before any throttling is done (so a minimum of
>   4 memory passes have to occur).
> - Once the counter reaches 4, it is immediately reset to 0, and then the
>   throttling action is taken.
> - Throttling occurs by doing an async sleep on each guest cpu for 30ms,
>   exactly one time.
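>
> To make the above concrete, the logic boils down to roughly the
> following sketch (illustrative only, not the actual Qemu code; the
> helper names here are made up):
>
>     #include <stdint.h>
>
>     #define DIRTY_RATE_HIGH_MAX  4    /* passes counted before throttling   */
>     #define THROTTLE_SLEEP_MS    30   /* one-shot async sleep per guest cpu */
>
>     static int      dirty_rate_high_cnt;
>     static uint64_t bytes_xfer_prev;
>
>     /* Assumed stand-ins for the real accounting / vcpu throttling code. */
>     extern uint64_t total_bytes_transferred(void);
>     extern uint64_t bytes_dirtied_this_pass(void);
>     extern void     async_sleep_all_guest_cpus(int ms);
>
>     /* Called once per completed pass over guest memory. */
>     static void auto_converge_check(void)
>     {
>         uint64_t bytes_xfer_now = total_bytes_transferred();
>         uint64_t transferred = bytes_xfer_now - bytes_xfer_prev;
>
>         /* Count this pass only if the guest dirtied >= 50% of what we sent. */
>         if (bytes_dirtied_this_pass() >= transferred / 2) {
>             if (++dirty_rate_high_cnt >= DIRTY_RATE_HIGH_MAX) {
>                 dirty_rate_high_cnt = 0;                       /* reset...      */
>                 async_sleep_all_guest_cpus(THROTTLE_SLEEP_MS); /* ...and sleep  */
>             }
>         }
>
>         bytes_xfer_prev = bytes_xfer_now;
>     }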
>
> Now consider the scenario auto-converge is meant to solve (I think): a
> guest touching lots of memory very quickly. Each pass through memory is
> going to send a lot of pages, and thus take a decent amount of time to
> complete. If, for every four passes, we are *only* sleeping the guest
> for 30ms, our guest is still going to be able to dirty pages faster than
> we can transfer them. We will never catch up, because the sleep time
> relative to guest execution time is very, very small.
>
> Auto-converge, as it is implemented today, does not address the problem
> I expected it to solve. However, after rapidly prototyping a new version
> of auto-converge that performs adaptive modeling, I've learned something.
> The workload I'm attempting to migrate is actually a pathological case.
> It is an excellent example of why throttling cpu is not always a good
> method of limiting memory access. In this test case we are able to touch
> over 600 MB of pages in 50 ms of continuous execution. At that rate, even
> if I throttle the guest to 5% (50ms runtime, 950ms sleep), it still
> dirties roughly 600 MB per wall-clock second, so we cannot even come
> close to catching up, even with a fairly speedy network link (which not
> every user will have).
>
> Given the above, I believe that some workloads touch memory too fast and
> we'll never be able to live migrate them with auto-converge. On the
> lower end there are workloads that have a very small/stagnant working
> set size, which will be live migratable without the need for
> auto-converge. Lastly, we have "the nebulous middle". These are
> workloads that would benefit from auto-converge because they touch pages
> too fast for migration to be able to deal with them, AND (important
> conditional here) throttling will (may?) actually reduce their rate of
> page modifications. I would like to try to define this "middle" set of
> workloads.
>
> A question with no obvious answer: How much throttling is acceptable? If
> I have to throttle a guest 90% and he ends up failing 75% of whatever
> transactions he is attempting to process, then we have quite likely
> defeated the entire purpose of "live" migration. Perhaps it would be
> better in this case to just stop the guest and do a non-live migration.
> Maybe by reverting to non-live we actually save time, and thus more
> transactions would complete. This one may take some experimenting to get
> a good idea of what makes the most sense. Maybe even make the maximum
> throttling user configurable.
>
> With all this said, I still wonder exactly how big this "nebulous
> middle" really is. If, in practice, that "middle" only accounts for 1%
> of the workloads out there, then is it really worth spending time fixing
> it? Keep in mind this is a two-pronged test:
> 1. The guest cannot migrate because it changes memory too fast.
> 2. Cpu throttling slows the guest's memory writes down enough that it
>    can now migrate.
>
> I'm interested in any thoughts anyone has. Thanks!
>

Ping,
Just wondering if anyone has any thoughts on this issue?

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)