Message-ID: <55009BC0.3010905@linux.vnet.ibm.com>
Date: Wed, 11 Mar 2015 15:47:12 -0400
From: "Jason J. Herne"
Subject: Re: [Qemu-devel] Migration auto-converge problem
Reply-To: jjherne@linux.vnet.ibm.com
To: "qemu-devel@nongnu.org qemu-devel", Christian Borntraeger, quintela@redhat.com, amit.shah@redhat.com
In-Reply-To: <54F4D076.3040402@linux.vnet.ibm.com>
References: <54F4D076.3040402@linux.vnet.ibm.com>

On 03/02/2015 04:04 PM, Jason J. Herne wrote:
> We have a test case that dirties memory very, very quickly. When we run
> this test case in a guest and attempt a migration, that migration never
> converges, even with auto-converge on.
>
> The auto-converge behavior of Qemu serves a different purpose than I had
> expected. In my mind, I expected auto-converge to continuously apply
> adaptive throttling of the cpu utilization of a busy guest if Qemu
> detects that progress is not being made quickly enough in the guest
> memory transfer. The idea is that a guest dirtying pages too quickly
> will be adaptively slowed down by the throttling until migration is able
> to transfer pages fast enough to complete the migration within the max
> downtime. Qemu's current auto-converge does not appear to do this in
> practice.
>
> A quick look at the source code shows the following:
> - Auto-converge keeps a counter. This counter is only incremented if, for
>   a completed memory pass, the guest is dirtying pages at a rate of 50%
>   (or more) of our transfer rate.
> - The counter increments at most once per pass through memory.
> - The counter must reach 4 before any throttling is done (so a minimum of
>   4 memory passes have to occur).
> - Once the counter reaches 4, it is immediately reset to 0, and then the
>   throttling action is taken.
> - Throttling occurs by doing an async sleep on each guest cpu for 30ms,
>   exactly one time.
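>
> To make the above concrete, the logic boils down to roughly the
> following sketch (illustrative only, not the actual Qemu code; the
> helper names here are made up):
>
>     #include <stdint.h>
>
>     #define DIRTY_RATE_HIGH_MAX  4    /* passes counted before throttling   */
>     #define THROTTLE_SLEEP_MS    30   /* one-shot async sleep per guest cpu */
>
>     static int      dirty_rate_high_cnt;
>     static uint64_t bytes_xfer_prev;
>
>     /* Assumed stand-ins for the real accounting / vcpu throttling code. */
>     extern uint64_t total_bytes_transferred(void);
>     extern uint64_t bytes_dirtied_this_pass(void);
>     extern void     async_sleep_all_guest_cpus(int ms);
>
>     /* Called once per completed pass over guest memory. */
>     static void auto_converge_check(void)
>     {
>         uint64_t bytes_xfer_now = total_bytes_transferred();
>         uint64_t transferred = bytes_xfer_now - bytes_xfer_prev;
>
>         /* Count this pass only if the guest dirtied >= 50% of what we sent. */
>         if (bytes_dirtied_this_pass() >= transferred / 2) {
>             if (++dirty_rate_high_cnt >= DIRTY_RATE_HIGH_MAX) {
>                 dirty_rate_high_cnt = 0;                       /* reset...      */
>                 async_sleep_all_guest_cpus(THROTTLE_SLEEP_MS); /* ...and sleep  */
>             }
>         }
>
>         bytes_xfer_prev = bytes_xfer_now;
>     }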
>
> Now consider the scenario auto-converge is meant to solve (I think): a
> guest touching lots of memory very quickly. Each pass through memory is
> going to send a lot of pages, and thus take a decent amount of time to
> complete. If, for every four passes, we are *only* sleeping the guest
> for 30ms, our guest is still going to be able to dirty pages faster than
> we can transfer them. We will never catch up, because the sleep time
> relative to guest execution time is very, very small.
>
> Auto-converge, as it is implemented today, does not address the problem
> I expected it to solve. However, after rapidly prototyping a new version
> of auto-converge that performs adaptive modeling, I've learned something.
> The workload I'm attempting to migrate is actually a pathological case.
> It is an excellent example of why throttling cpu is not always a good
> method of limiting memory access. In this test case we are able to touch
> over 600 MB of pages in 50 ms of continuous execution. At that rate, even
> if I throttle the guest to 5% (50ms runtime, 950ms sleep), it still
> dirties roughly 600 MB per wall-clock second, so we cannot even come
> close to catching up, even with a fairly speedy network link (which not
> every user will have).
>
> Given the above, I believe that some workloads touch memory too fast and
> we'll never be able to live migrate them with auto-converge. On the
> lower end there are workloads that have a very small/stagnant working
> set size, which will be live migratable without the need for
> auto-converge. Lastly, we have "the nebulous middle". These are
> workloads that would benefit from auto-converge because they touch pages
> too fast for migration to be able to deal with them, AND (important
> conditional here) throttling will (may?) actually reduce their rate of
> page modifications. I would like to try to define this "middle" set of
> workloads.
>
> A question with no obvious answer: How much throttling is acceptable? If
> I have to throttle a guest 90% and he ends up failing 75% of whatever
> transactions he is attempting to process, then we have quite likely
> defeated the entire purpose of "live" migration. Perhaps it would be
> better in this case to just stop the guest and do a non-live migration.
> Maybe by reverting to non-live we actually save time, and thus more
> transactions would complete. This one may take some experimenting to get
> a good idea of what makes the most sense. Maybe even make the maximum
> throttling user configurable.
>
> With all this said, I still wonder exactly how big this "nebulous
> middle" really is. If, in practice, that "middle" only accounts for 1%
> of the workloads out there, then is it really worth spending time fixing
> it? Keep in mind this is a two-pronged test:
> 1. The guest cannot migrate because it changes memory too fast.
> 2. Cpu throttling slows the guest's memory writes down enough that it
>    can now migrate.
>
> I'm interested in any thoughts anyone has. Thanks!
>

Ping,
Just wondering if anyone has any thoughts on this issue?

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)