From mboxrd@z Thu Jan 1 00:00:00 1970
From: Takuya Yoshikawa
Subject: Re: [PATCH 00/21][RFC] postcopy live migration
Date: Wed, 04 Jan 2012 10:30:09 +0900
Message-ID: <4F03ABA1.7090000@oss.ntt.co.jp>
References: <4EFCEC38.3080308@codemonkey.ws> <4F002CDB.7070708@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Anthony Liguori, Isaku Yamahata, kvm@vger.kernel.org, Juan Quintela, t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp, Michael Roth, qemu-devel@nongnu.org, Umesh Deshpande
To: dlaor@redhat.com
Received: from serv2.oss.ntt.co.jp ([222.151.198.100]:33135 "EHLO serv2.oss.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755029Ab2ADB3D; Tue, 3 Jan 2012 20:29:03 -0500
In-Reply-To: <4F002CDB.7070708@redhat.com>
Sender: kvm-owner@vger.kernel.org

(2012/01/01 18:52), Dor Laor wrote:
>> But we really need to think hard about whether this is the right thing
>> to take into the tree. I worry a lot about the fact that we don't test
>> pre-copy migration nearly enough and adding a second form just
>> introduces more things to test.
>
> It is an issue but it can't be a merge criteria, Isaku is not blame of pre copy live migration lack of testing.
>
> I would say that 90% of issues of live migration problems are not related to the pre|post stage but more of issues of device model save state. So post-copy shouldn't add a significant regression here.

Though they may be only 10% the remaining issues tend to be hard to find.

>
> Probably it will be good to ask every migration patch writer to write an additional unit test for migration.
>
>> It's also not clear to me why post-copy is better. If you were going to
>> sit down and explain to someone building a management tool when they
>> should use pre-copy and when they should use post-copy, what would you
>> tell them?
>
> Today, we have a default of max-downtime of 100ms.
> If either the guest work set size or the host networking throughput can't match the downtime, migration won't end.
> The mgmt user options are:
> - increase the downtime more and more to an actual stop
> - fail migrate
>
> W/ post-copy there is another option.
> Performance measurements will teach us (probably prior to commit) when this stage is valuable. Most likely, we better try first with pre-copy and if we can't meet the downtime we can optionally use post-copy.

It is difficult to recommend mixing two methods that place different requirements on users: post-copy cannot be canceled and probably needs dedicated, reliable lines to make sure that guests will not be broken during the copy stage.

What we want to know, from the user's point of view, are clear and simple criteria:

- what is needed for post-copy
- for what services we should select post-copy

	Takuya

>
> Here's a paper by Umesh (the migration thread writer):
> http://osnet.cs.binghamton.edu/publications/hines09postcopy_osr.pdf
>
> Regards,
> Dor
>
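
To make the "try pre-copy first, fall back to post-copy" idea from the quoted text concrete, here is a rough sketch of the convergence condition being discussed. It is a toy model, not QEMU's actual algorithm: the function names, the constant dirty rate, and the round limit are all assumptions made for illustration. The model says pre-copy can finish only when the data still dirty can be sent within max-downtime, which in turn requires the guest to dirty memory more slowly than the link can carry it.

```python
from dataclasses import dataclass


@dataclass
class PrecopyResult:
    converged: bool          # True if the final stop-and-copy fits in max-downtime
    rounds: int              # number of live copy rounds performed before stopping
    final_downtime_s: float  # estimated downtime if the guest were stopped now


def simulate_precopy(ram_mb, dirty_rate_mb_s, bandwidth_mb_s,
                     max_downtime_s=0.1, max_rounds=30):
    """Toy model of iterative pre-copy (hypothetical, not QEMU code).

    Each round sends the currently dirty data while the guest keeps running
    and dirtying memory at a constant rate (a simplifying assumption).
    """
    remaining_mb = float(ram_mb)  # the first round has to send all of RAM
    for round_no in range(1, max_rounds + 1):
        send_time_s = remaining_mb / bandwidth_mb_s
        if send_time_s <= max_downtime_s:
            # What is left can be sent while the guest is paused.
            return PrecopyResult(True, round_no - 1, send_time_s)
        # Guest keeps running during the transfer and dirties more memory,
        # but never more than the whole of RAM.
        remaining_mb = min(float(ram_mb), dirty_rate_mb_s * send_time_s)
    return PrecopyResult(False, max_rounds, remaining_mb / bandwidth_mb_s)


def choose_strategy(ram_mb, dirty_rate_mb_s, bandwidth_mb_s, max_downtime_s=0.1):
    """Prefer pre-copy; suggest post-copy only when pre-copy cannot converge.

    Note the concern raised in this thread: post-copy cannot be canceled,
    so a real policy would also need to check network reliability.
    """
    result = simulate_precopy(ram_mb, dirty_rate_mb_s, bandwidth_mb_s, max_downtime_s)
    return "pre-copy" if result.converged else "post-copy"


if __name__ == "__main__":
    print(choose_strategy(1024, 10, 100))   # dirty rate well below bandwidth
    print(choose_strategy(1024, 200, 100))  # dirty rate above bandwidth
```

With a 1 GB guest on a 100 MB/s link, a 10 MB/s dirty rate shrinks the remaining data by a factor of ten each round, while a 200 MB/s dirty rate never shrinks it at all. That second case is where the quoted options (raise the downtime, fail the migration, or use post-copy) come into play.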