From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dor Laor
Subject: Re: [Qemu-devel] [PATCH 00/21][RFC] postcopy live migration
Date: Mon, 02 Jan 2012 11:28:49 +0200
Message-ID: <4F0178D1.9070706@redhat.com>
References: <4EFCEC38.3080308@codemonkey.ws> <4F002AC6.7080007@redhat.com>
Reply-To: dlaor@redhat.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Orit Wasserman, kvm@vger.kernel.org, Juan Quintela, t.hirofuchi@aist.go.jp, satoshi.itoh@aist.go.jp, Michael Roth, qemu-devel@nongnu.org, Isaku Yamahata
To: Stefan Hajnoczi
Received: from mx1.redhat.com ([209.132.183.28]:40763 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751084Ab2ABJ3F (ORCPT ); Mon, 2 Jan 2012 04:29:05 -0500
Sender: kvm-owner@vger.kernel.org

On 01/01/2012 06:27 PM, Stefan Hajnoczi wrote:
> On Sun, Jan 1, 2012 at 9:43 AM, Orit Wasserman wrote:
>> On 12/30/2011 12:39 AM, Anthony Liguori wrote:
>>> On 12/28/2011 07:25 PM, Isaku Yamahata wrote:
>>>> Intro
>>>> =====
>>>> This patch series implements postcopy live migration.[1]
>>>> As discussed at KVM forum 2011, a dedicated character device is used for
>>>> distributed shared memory between migration source and destination.
>>>> Now we can discuss/benchmark/compare with precopy. I believe there is
>>>> much room for improvement.
>>>>
>>>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>>>
>>>>
>>>> Usage
>>>> =====
>>>> You need to load the umem character device on the host before starting migration.
>>>> Postcopy can be used with the tcg and kvm accelerators. The implementation depends
>>>> only on the Linux umem character device, but the driver-dependent code is split
>>>> into a file.
>>>> I tested only the host page size == guest page size case, but the implementation
>>>> allows the host page size != guest page size case.
>>>>
>>>> The following options are added with this patch series.
>>>> - incoming part
>>>>   command line options
>>>>   -postcopy [-postcopy-flags]
>>>>   where flags is for changing behavior for benchmark/debugging
>>>>   Currently the following flags are available
>>>>   0: default
>>>>   1: enable touching page request
>>>>
>>>>   example:
>>>>   qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>>>
>>>> - outgoing part
>>>>   options for migrate command
>>>>   migrate [-p [-n]] URI
>>>>   -p: indicate postcopy migration
>>>>   -n: disable background transferring pages: This is for benchmark/debugging
>>>>
>>>>   example:
>>>>   migrate -p -n tcp::4444
>>>>
>>>>
>>>> TODO
>>>> ====
>>>> - benchmark/evaluation. Especially how async page fault affects the result.
>>>
>>> I'll review this series next week (Mike/Juan, please also review when you can).
>>>
>>> But we really need to think hard about whether this is the right thing to take into the tree. I worry a lot about the fact that we don't test pre-copy migration nearly enough and adding a second form just introduces more things to test.
>>>
>>> It's also not clear to me why post-copy is better. If you were going to sit down and explain to someone building a management tool when they should use pre-copy and when they should use post-copy, what would you tell them?
>>
>> Start with pre-copy, if it doesn't converge switch to post-copy
>
> Post-copy throttles the guest when page faults are encountered because
> the destination machine waits for memory pages from the source
> machine. Is there a reason this page fault-based throttling cannot be
> done on the source machine with pre-copy migration? I'm not sure
> post-copy provides new behavior in terms of convergence, we could do
> the same with pre-copy migration.

There are differences between these two approaches:

1. Post-copy lets vcpus that are not faulting at the moment make progress.
   Assuming a subset of the guest vcpus can execute freely with their memory
   already at the destination, they can get 100% cpu time.
   The slowing-down approach on the source host slows down all vcpus.

2. Different page access pattern: post-copy transfers pages on demand, like
   paging, so the pages that are really required get transferred. The
   slow-down approach can only guess which pages to send first.

>
> Post-copy has other advantages though, it immediately frees logical
> CPUs on the source machine (though RAM and network bandwidth is still
> required until migration completes).

With post-copy you can immediately free any page that has been transferred
to the destination.

At the end of the day, it is performance testing across various scenarios
that will tell us whether post-copy is worth the extra complexity over
slowing down the guest on the source.

Cheers,
Dor

>
> Stefan
>
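
The rule of thumb quoted in this thread (start with pre-copy, switch to post-copy if it does not converge) can be illustrated with a toy model. This is only a sketch, not QEMU's migration code: the function names, the constant dirty rate, the page-count downtime budget, and the round limit are all assumptions made for the example.

```python
def precopy_rounds(total_pages, dirty_rate, bandwidth, downtime_pages,
                   max_rounds=30):
    """Toy model of iterative pre-copy (hypothetical, not QEMU's algorithm).

    total_pages    : guest RAM size, in pages
    dirty_rate     : pages the guest dirties per second while it keeps running
    bandwidth      : pages the migration link can transfer per second
    downtime_pages : remaining-page count small enough to stop the guest
                     and send the rest within the allowed downtime
    Returns (converged, rounds_used, pages_left).
    """
    remaining = total_pages
    for round_no in range(1, max_rounds + 1):
        if remaining <= downtime_pages:
            return True, round_no - 1, remaining
        # Sending `remaining` pages takes remaining / bandwidth seconds.
        # Pages dirtied in that time must be resent in the next round.
        # Integer arithmetic avoids floating-point rounding; the guest
        # cannot dirty more pages than it has.
        remaining = min(total_pages, remaining * dirty_rate // bandwidth)
    return remaining <= downtime_pages, max_rounds, remaining


def choose_strategy(total_pages, dirty_rate, bandwidth, downtime_pages):
    """Pick post-copy only when the pre-copy model fails to converge."""
    converged, _rounds, _left = precopy_rounds(
        total_pages, dirty_rate, bandwidth, downtime_pages)
    return "pre-copy" if converged else "post-copy"
```

In this model pre-copy converges only when the guest dirties pages more slowly than the link can send them. When the dirty rate is at or above the bandwidth, the remaining set never shrinks, which is the case where either throttling the guest on the source or switching to post-copy becomes necessary. The model does not capture the per-vcpu and access-pattern differences described in the reply above.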