From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Roth
Subject: Re: [PATCH 00/21][RFC] postcopy live migration
Date: Tue, 03 Jan 2012 21:48:16 -0600
Message-ID: <4F03CC00.2010303@linux.vnet.ibm.com>
References: <4EFCEC38.3080308@codemonkey.ws> <4F002CDB.7070708@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org, Juan Quintela, t.hirofuchi@aist.go.jp,
 satoshi.itoh@aist.go.jp, qemu-devel@nongnu.org, Isaku Yamahata,
 Umesh Deshpande
To: dlaor@redhat.com
In-Reply-To: <4F002CDB.7070708@redhat.com>
List-Id: kvm.vger.kernel.org

On 01/01/2012 03:52 AM, Dor Laor wrote:
> On 12/30/2011 12:39 AM, Anthony Liguori wrote:
>> On 12/28/2011 07:25 PM, Isaku Yamahata wrote:
>>> Intro
>>> =====
>>> This patch series implements postcopy live migration. [1]
>>> As discussed at KVM Forum 2011, a dedicated character device is used
>>> for distributed shared memory between the migration source and
>>> destination.
>>> Now we can discuss/benchmark/compare with precopy. I believe there is
>>> much room for improvement.
>>>
>>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>>
>>>
>>> Usage
>>> =====
>>> You need to load the umem character device on the host before
>>> starting migration.
>>> Postcopy can be used with the tcg and kvm accelerators. The
>>> implementation depends only on the Linux umem character device, but
>>> the driver-dependent code is split into its own file.
>>> I tested only the host page size == guest page size case, but the
>>> implementation allows host page size != guest page size.
>>>
>>> The following options are added with this patch series.
>>> - incoming part
>>>   command line options
>>>   -postcopy [-postcopy-flags]
>>>   where flags is for changing behavior for benchmark/debugging.
>>>   Currently the following flags are available:
>>>   0: default
>>>   1: enable touching page request
>>>
>>>   example:
>>>   qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>>
>>> - outgoing part
>>>   options for the migrate command
>>>   migrate [-p [-n]] URI
>>>   -p: indicate postcopy migration
>>>   -n: disable background transfer of pages: this is for
>>>       benchmark/debugging
>>>
>>>   example:
>>>   migrate -p -n tcp::4444
>>>
>>>
>>> TODO
>>> ====
>>> - benchmark/evaluation. Especially how async page fault affects the
>>>   result.
>>
>> I'll review this series next week (Mike/Juan, please also review when
>> you can).
>>
>> But we really need to think hard about whether this is the right thing
>> to take into the tree. I worry a lot about the fact that we don't test
>> pre-copy migration nearly enough, and adding a second form just
>> introduces more things to test.
>
> It is an issue, but it can't be a merge criterion; Isaku is not to
> blame for pre-copy live migration's lack of testing.
>
> I would say that 90% of live migration problems are not related to the
> pre|post stage but are rather issues of device model save state. So
> post-copy shouldn't add a significant regression here.
>
> It would probably be good to ask every migration patch writer to write
> an additional unit test for migration.
>
>> It's also not clear to me why post-copy is better. If you were going to
>> sit down and explain to someone building a management tool when they
>> should use pre-copy and when they should use post-copy, what would you
>> tell them?
>
> Today, we have a default max-downtime of 100ms.
> If either the guest's working set size or the host networking
> throughput can't match the downtime, migration won't end.
> The mgmt user options are:
> - increase the downtime more and more, up to an actual stop
> - fail the migration
>
> W/ post-copy there is another option.
> Performance measurements will teach us (probably prior to commit) when
> this stage is valuable. Most likely, we'd better try first with
> pre-copy, and if we can't meet the downtime we can optionally use
> post-copy.

Umesh's paper already seems to have strong indications that at least one
iteration of pre-copy is optimal in terms of downtime, so I wonder if
we're starting off on the wrong track with the all-or-nothing approach
taken by this series? I only have Umesh's paper to go off of (which,
granted, notes shadow paging (which I guess we effectively have with
these patches) as a potential improvement to the pseudo swap device used
there), but otherwise it seems like we'd just get more downtime as the
target starts choking on network-based page faults.

It's probably not too useful to speculate on performance at this point,
but I think it'll be easier to get the data (and hit closer to the mark
suggested by the paper) if we start off assuming that pre-copy should
still be in play. Maybe something like:

  migrate -d tcp:host:123 -p[ostcopy] x

x=0 for post-copy only, no -p for pre-copy only.

Also seems a bit cleaner. And if post-copy proves to be optimal we just
make "-p 0" implied... minor details at this point, but the main thing
is that integrating better with the pre-copy code will make it easier to
determine where the sweet spot is and how much we stand to gain.
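To make the proposal concrete, the hybrid interface sketched above might
be driven like this (hypothetical: the -p iteration-count semantics, the
dest-host name, and the port are illustrative, not part of the posted
series; the helper only prints the monitor command it would issue):

```shell
#!/bin/sh
# Sketch of the proposed hybrid pre-/post-copy migrate command.
# Hypothetical flag semantics (NOT in the posted series):
#   -p <x> : run x pre-copy iterations, then switch to post-copy
#   -p 0   : post-copy only (what the current series does)
#   no -p  : plain pre-copy, today's default behavior
build_migrate_cmd() {
    # one pre-copy pass by default, per the sweet spot Umesh's paper suggests
    iterations=${1:-1}
    uri=${2:-tcp:dest-host:4444}
    # -d detaches the monitor, as with pre-copy today; actually sending
    # this to a live QEMU monitor socket is omitted here.
    echo "migrate -d -p ${iterations} ${uri}"
}

build_migrate_cmd 1 tcp:host:123
```

The point of folding the iteration count into -p is that management
tools could start with the conservative one-pass hybrid and only fall
back to "-p 0" once measurements justify pure post-copy.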
>
> Here's a paper by Umesh (the migration thread writer):
> http://osnet.cs.binghamton.edu/publications/hines09postcopy_osr.pdf
>
> Regards,
> Dor
>
>>
>> Regards,
>>
>> Anthony Liguori
>>
>