Message-ID: <51B57727.9080903@kamp.de>
Date: Mon, 10 Jun 2013 08:50:15 +0200
From: Peter Lieven
References: <51A7036A.3050407@ozlabs.ru> <51A7049F.6040207@redhat.com>
 <51A70B3D.90609@ozlabs.ru> <51A71705.6060009@kamp.de>
 <51A74D79.7040204@redhat.com> <2765FDFA-8050-4AA3-8621-7E9EA2C89F9C@kamp.de>
 <51A764FC.7080705@redhat.com> <51ADF122.70307@kamp.de>
 <51ADF637.7060804@redhat.com> <51ADFBCE.3080200@kamp.de>
 <51ADFC7A.7030009@redhat.com> <51AE035A.5070301@kamp.de>
 <51B2EB0A.7000704@linux.vnet.ibm.com> <51B2EBA2.5060401@ozlabs.ru>
 <51B3E58C.50301@linux.vnet.ibm.com> <51B3E9A8.5010705@ozlabs.ru>
 <51B3EFFA.4040608@linux.vnet.ibm.com> <51B3F1FD.1090401@ozlabs.ru>
 <51B57489.20802@ozlabs.ru>
In-Reply-To: <51B57489.20802@ozlabs.ru>
Subject: Re: [Qemu-devel] broken incoming migration
To: Alexey Kardashevskiy
Cc: Paolo Bonzini, David Gibson, "qemu-ppc@nongnu.org", Wenchao Xia,
 "qemu-devel@nongnu.org"

On 10.06.2013 08:39, Alexey Kardashevskiy wrote:
> On 06/09/2013 05:27 PM, Peter Lieven wrote:
>> On 09.06.2013 at 05:09, Alexey Kardashevskiy wrote:
>>
>>> On 06/09/2013 01:01 PM, Wenchao Xia wrote:
>>>> On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
>>>>> On 06/09/2013 12:16 PM, Wenchao Xia
wrote:
>>>>>> On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
>>>>>>> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>>>>>>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>>>>>>>> On 04/06/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>>>>>>>> On 04/06/2013 15:52, Peter Lieven wrote:
>>>>>>>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>>>>>>>> On 30/05/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>>>>> You could also scan the page for nonzero
>>>>>>>>>>>>>>>>> values before writing it.
>>>>>>>>>>>>>>> i had this in mind, but then chose the other
>>>>>>>>>>>>>>> approach.... turned out to be a bad idea.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> alexey: i will prepare a patch later today,
>>>>>>>>>>>>>>> could you then please verify it fixes your
>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> paolo: would we still need the madvise or is
>>>>>>>>>>>>>>> it enough to not write the zeroes?
>>>>>>>>>>>>>> It should be enough to not write them.
>>>>>>>>>>>>> Problem: checking the pages for zero allocates
>>>>>>>>>>>>> them. even at the source.
>>>>>>>>>>>> It doesn't look like it. I tried this program and top
>>>>>>>>>>>> doesn't show an increasing amount of reserved
>>>>>>>>>>>> memory:
>>>>>>>>>>>>
>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>>> int main() {
>>>>>>>>>>>>     char *x = malloc(500 << 20);
>>>>>>>>>>>>     int i, j;
>>>>>>>>>>>>     for (i = 0; i < 500; i += 10) {
>>>>>>>>>>>>         /* read (never write) one byte per 4 KiB page */
>>>>>>>>>>>>         for (j = 0; j < 10 << 20; j += 4096) {
>>>>>>>>>>>>             *(volatile char*) (x + (i << 20) + j);
>>>>>>>>>>>>         }
>>>>>>>>>>>>         getchar();
>>>>>>>>>>>>     }
>>>>>>>>>>>> }
>>>>>>>>>>> strange. we are talking about RSS size, right?
>>>>>>>>>> None of the three top values change, and only VIRT is
>>>>>>>>>> 500 MB.
>>>>>>>>>>> is the malloc above using mmapped memory?
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> which kernel version do you use?
>>>>>>>>>> 3.9.
>>>>>>>>>>
>>>>>>>>>>> what avoids allocating the memory for me is the
>>>>>>>>>>> following (with whatever side effects it has ;-))
>>>>>>>>>> This would also fail to migrate any page that is swapped
>>>>>>>>>> out, breaking overcommit in a more subtle way. :)
>>>>>>>>>>
>>>>>>>>>> Paolo
>>>>>>>>> the following does also not allocate memory, but qemu
>>>>>>>>> does...
>>>>>>>> Hi, Peter
>>>>>>>> As the patch writes
>>>>>>>>
>>>>>>>> "not sending zero pages breaks migration if a page is zero
>>>>>>>> at the source but not at the destination."
>>>>>>>>
>>>>>>>> I don't understand why it would be trouble, shouldn't all
>>>>>>>> pages not received at the destination be treated as zero pages?
>>>>>>>
>>>>>>> How would the destination guest know if some page must be
>>>>>>> cleared? The previous patch (which Peter reverted) did not
>>>>>>> send anything for the pages which were zero on the source
>>>>>>> side.
>>>>>> If a page was not received and the destination knows that page
>>>>>> should exist according to the total size, fill it with zero at
>>>>>> the destination, would that solve the problem?
>>>>> It is _live_ migration, the source sends changes, the same pages can
>>>>> change and be sent several times. So we would need to turn
>>>>> tracking on at the destination to know if some page was received
>>>>> from the source or changed by the destination itself (by writing
>>>>> bios/firmware images there, etc) and then clear pages which were
>>>>> touched by the destination and were not sent by the source.
>>>> OK, I can understand the problem is, for example: Destination boots
>>>> up with 0x0000-0xFFFF filled with bios image. Source forgot to send
>>>> zero pages in 0x0000-0xFFFF.
>>>
>>> The source did not forget, instead it zeroed these pages during its
>>> life and thought that they must be zeroed at the destination already
>>> (as the destination did not start and did not have a chance to write
>>> something there).
>>>
>>>
>>>> After migration the destination got 0x0000-0xFFFF dirty (different from
>>>> source)
>>> Yep. And those pages were empty on the source, which made debugging very
>>> easy :)
>>>
>>>
>>>> Thanks for explaining.
>>>>
>>>> This seems to refer to the migration protocol: how should the guest
>>>> treat unsent pages. The patch causing the problem actually treats
>>>> zero pages as "not to send" at the source, but the other half is missing:
>>>> treat "not received" as zero pages at the destination. I guess if the second
>>>> half is added, the problem is gone: after page transfer has completed,
>>>> before the destination resumes, fill zero in "not received" pages.
>>>
>>>
>>> Make a working patch, we'll discuss it :) I do not see much
>>> acceleration coming from there.
>> I would also not spend much time on this. I would either look to find
>> an easy way to fix the initialization code to not unnecessarily load
>> data into RAM or i will send a v2 of my patch following Eric's
>> concerns.
> There is no easy way to implement the flag and keep your original patch as
> we have to implement this flag in all architectures which got broken by
> your patch and I personally can fix only PPC64-pseries but not the others.
>
> Furthermore your revert + new patches perfectly solve the problem, why
> would we want to bother now with this new flag which nobody really needs
> right now?
>
> Please, please, revert the original patch or I'll try to do it :)
>
>

I tried, but there were concerns from the community. Alternatively, I found
the following solution.
Please drop the 2 patches and try the following:

diff --git a/arch_init.c b/arch_init.c
index 5d32ecf..458bf8c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -799,6 +799,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 while (total_ram_bytes) {
                     RAMBlock *block;
                     uint8_t len;
+                    void *base;
+                    ram_addr_t offset;
 
                     len = qemu_get_byte(f);
                     qemu_get_buffer(f, (uint8_t *)id, len);
@@ -822,6 +824,14 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                         goto done;
                     }
 
+                    base = memory_region_get_ram_ptr(block->mr);
+                    for (offset = 0; offset < block->length;
+                         offset += TARGET_PAGE_SIZE) {
+                        if (!is_zero_page(base + offset)) {
+                            memset(base + offset, 0x00, TARGET_PAGE_SIZE);
+                        }
+                    }
+
                     total_ram_bytes -= length;
                 }
             }

This is done at setup time so there is no additional cost for zero
checking at each compressed page coming in.

Peter
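
For anyone who wants to reason about this outside of QEMU, here is a
minimal standalone sketch of the idea. It is not QEMU code and only
models the behaviour discussed in this thread. All names in it are
made up for illustration: DEMO_PAGE_SIZE stands in for
TARGET_PAGE_SIZE, page_is_zero() for is_zero_page(), scrub_block() for
the loop added to ram_load() above, and load_nonzero_pages() for a
source that does not send zero pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a fixed 4 KiB page, standing in for TARGET_PAGE_SIZE. */
#define DEMO_PAGE_SIZE 4096u

/* Hypothetical stand-in for is_zero_page(): true if every byte is 0. */
static bool page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Setup-time scrub: write zeroes only to pages that are not already
 * zero, so pages that are still zero are read but never written.
 * Returns the number of pages that had to be cleared.
 */
static size_t scrub_block(uint8_t *base, size_t length)
{
    size_t cleared = 0;

    assert(length % DEMO_PAGE_SIZE == 0);
    for (size_t offset = 0; offset < length; offset += DEMO_PAGE_SIZE) {
        if (!page_is_zero(base + offset)) {
            memset(base + offset, 0, DEMO_PAGE_SIZE);
            cleared++;
        }
    }
    return cleared;
}

/*
 * Model of a migration stream in which zero pages are skipped:
 * only non-zero source pages are copied to the destination.
 */
static void load_nonzero_pages(uint8_t *dst, const uint8_t *src,
                               size_t length)
{
    assert(length % DEMO_PAGE_SIZE == 0);
    for (size_t offset = 0; offset < length; offset += DEMO_PAGE_SIZE) {
        if (!page_is_zero(src + offset)) {
            memcpy(dst + offset, src + offset, DEMO_PAGE_SIZE);
        }
    }
}
```

If the destination buffer starts out with a firmware image in its first
page and the source has that page zeroed, calling load_nonzero_pages()
alone leaves the stale image in place. Calling scrub_block() on the
destination first and then load_nonzero_pages() gives a buffer that
matches the source.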