From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37022)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Ujouq-0000Hk-3M for qemu-devel@nongnu.org;
	Tue, 04 Jun 2013 06:56:44 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Ujouj-0005MT-5U for qemu-devel@nongnu.org;
	Tue, 04 Jun 2013 06:56:35 -0400
Received: from mx.ipv6.kamp.de ([2a02:248:0:51::16]:52813 helo=mx01.kamp.de)
	by eggs.gnu.org with smtp (Exim 4.71) (envelope-from )
	id 1Ujoui-0005M9-TB for qemu-devel@nongnu.org;
	Tue, 04 Jun 2013 06:56:29 -0400
Message-ID: <51ADC7D3.8010008@kamp.de>
Date: Tue, 04 Jun 2013 12:56:19 +0200
From: Peter Lieven
MIME-Version: 1.0
References: <51A7036A.3050407@ozlabs.ru> <51A7049F.6040207@redhat.com>
	<51A70B3D.90609@ozlabs.ru> <51A71705.6060009@kamp.de>
	<51A74D79.7040204@redhat.com>
	<2765FDFA-8050-4AA3-8621-7E9EA2C89F9C@kamp.de>
	<51AC6A26.7060309@ozlabs.ru>
In-Reply-To: <51AC6A26.7060309@ozlabs.ru>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] broken incoming migration
To: Alexey Kardashevskiy
Cc: Paolo Bonzini, "qemu-ppc@nongnu.org", "qemu-devel@nongnu.org",
	David Gibson

On 03.06.2013 12:04, Alexey Kardashevskiy wrote:
> On 05/31/2013 12:38 AM, Peter Lieven wrote:
>>
>>
>> On 30.05.2013 at 15:41, "Paolo Bonzini" wrote:
>>
>>> On 30/05/2013 11:08, Peter Lieven wrote:
>>>> On 30.05.2013 10:18, Alexey Kardashevskiy wrote:
>>>>> On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
>>>>>> On 30/05/2013 09:44, Alexey Kardashevskiy wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> I found the migration broken on pseries platform, specifically, this patch
>>>>>>> broke it:
>>>>>>>
>>>>>>> f1c72795af573b24a7da5eb52375c9aba8a37972
>>>>>>> migration: do not sent zero pages in bulk stage
>>>>>>>
>>>>>>> The idea is not to send zero pages to the destination guest which is
>>>>>>> expected to have 100% empty RAM.
>>>>>>>
>>>>>>> However on pseries platform the guest always has some stuff in the RAM as a
>>>>>>> part of initialization (device tree, system firmware and rtas (?)) so it is
>>>>>>> not completely empty. As the source guest cannot detect this, it skips some
>>>>>>> pages during migration and we get a broken destination guest. Bug.
>>>>>>>
>>>>>>> While the idea is ok in general, I do not see any easy way to fix it as
>>>>>>> neither QEMUMachine::init nor QEMUMachine::reset callbacks have information
>>>>>>> about whether we are about to receive a migration or not (-incoming
>>>>>>> parameter) and we cannot move device-tree and system firmware
>>>>>>> initialization anywhere else.
>>>>>>>
>>>>>>> ram_bulk_stage is static and cannot be disabled from the platform
>>>>>>> initialization code.
>>>>>>>
>>>>>>> So what would the community suggest?
>>>>>> Revert the patch. :)
>>>>> I'll wait for 24 hours (forgot to cc: the author) and then post a revert
>>>>> patch :)
>>>> does this problem only occur on pseries emulation?
>>> Probably not. On a PC, it would occur if you had 4K of zeros in the
>>> source BIOS but not in the destination BIOS. When you reboot, the BIOS
>>> image is wrong.
>>>
>>>> not sending zero pages is not only a performance benefit, it also makes
>>>> overcommitted memory usable. the madv_dontneed seems to kick in asynchronously
>>>> and memory is not available immediately.
>>> You could also scan the page for nonzero values before writing it.
>> i had this in mind, but then chose the other approach.... turned out to be a bad idea.
>>
>> alexey: i will prepare a patch later today, could you then please verify it fixes your problem.
>
> Yes I can, where is the patch? :)

it's on my todo for today. sorry, have been a bit busy lately.

Peter
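
For reference, the destination-side check suggested in the quoted thread
("scan the page for nonzero values before writing it") could look roughly
like the sketch below. This is only an illustration under stated
assumptions, not the patch discussed here: page_is_zero(),
handle_fill_page() and SKETCH_PAGE_SIZE are hypothetical names, and the
byte-by-byte scan stands in for whatever optimized check a real
implementation would use. The point is that a zero page sent by the source
is written only if the destination page actually holds nonzero data, so
pages that are already zero are never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size, for illustration only */

/* Hypothetical helper: returns true if every byte of the page is zero. */
static bool page_is_zero(const unsigned char *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical destination-side handler for a page that the source
 * reported as filled with the single byte `fill`.
 * Returns true if the destination page was written.
 */
static bool handle_fill_page(unsigned char *dest, unsigned char fill,
                             size_t len)
{
    assert(dest != NULL);

    /* A zero page arriving over an already-zero page needs no write. */
    if (fill == 0 && page_is_zero(dest, len)) {
        return false;
    }

    /* Otherwise overwrite whatever the destination had (e.g. firmware). */
    memset(dest, fill, len);
    return true;
}
```

With this shape the source can keep sending zero pages, so a destination
page that was pre-filled at init time (device tree, firmware) still gets
cleared, while untouched pages are only read, not written.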