From: "Peter Lieven"
To: Paolo Bonzini
Cc: Alexey Kardashevskiy, "qemu-ppc@nongnu.org", "qemu-devel@nongnu.org", David Gibson
Date: Thu, 30 May 2013 16:38:23 +0200
Subject: Re: [Qemu-devel] broken incoming migration
Message-Id: <2765FDFA-8050-4AA3-8621-7E9EA2C89F9C@kamp.de>
In-Reply-To: <51A74D79.7040204@redhat.com>
References: <51A7036A.3050407@ozlabs.ru> <51A7049F.6040207@redhat.com> <51A70B3D.90609@ozlabs.ru> <51A71705.6060009@kamp.de> <51A74D79.7040204@redhat.com>

On 30.05.2013 at 15:41, "Paolo Bonzini" wrote:

> On 30/05/2013 11:08, Peter Lieven wrote:
>> On 30.05.2013 10:18, Alexey Kardashevskiy wrote:
>>> On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
>>>> On 30/05/2013 09:44, Alexey Kardashevskiy wrote:
>>>>> Hi!
>>>>>
>>>>> I found the migration broken on pseries platform, specifically, this patch
>>>>> broke it:
>>>>>
>>>>> f1c72795af573b24a7da5eb52375c9aba8a37972
>>>>> migration: do not sent zero pages in bulk stage
>>>>>
>>>>> The idea is not to send zero pages to the destination guest which is
>>>>> expected to have 100% empty RAM.
>>>>>
>>>>> However on pseries platform the guest always has some stuff in the RAM as a
>>>>> part of initialization (device tree, system firmware and rtas (?)) so it is
>>>>> not completely empty. As the source guest cannot detect this, it skips some
>>>>> pages during migration and we get a broken destination guest. Bug.
>>>>>
>>>>> While the idea is ok in general, I do not see any easy way to fix it as
>>>>> neither QEMUMachine::init nor QEMUMachine::reset callbacks have information
>>>>> about whether we are about to receive a migration or not (-incoming
>>>>> parameter) and we cannot move device-tree and system firmware
>>>>> initialization anywhere else.
>>>>>
>>>>> ram_bulk_stage is static and cannot be disabled from the platform
>>>>> initialization code.
>>>>>
>>>>> So what would the community suggest?
>>>> Revert the patch. :)
>>> I'll wait for 24 hours (forgot to cc: the author) and then post a revert
>>> patch :)
>> does this problem only occur on pseries emulation?
>
> Probably not. On a PC, it would occur if you had 4K of zeros in the
> source BIOS but not in the destination BIOS. When you reboot, the BIOS
> image is wrong.
>
>> not sending zero pages is not only a performance benefit, it also makes
>> overcommitted memory usable. the MADV_DONTNEED seems to kick in asynchronously
>> and memory is not available immediately.
>
> You could also scan the page for nonzero values before writing it.

i had this in mind, but then chose the other approach.... turned out to be a bad idea.

alexey: i will prepare a patch later today, could you then please verify it fixes your problem.

paolo: would we still need the madvise or is it enough to not write the zeroes?

Peter

>
> Paolo
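
For illustration, the suggestion above to scan the page for nonzero values before writing it could look roughly like the sketch below. This is not the patch discussed in this thread and not QEMU code: page_is_zero() and store_zero_page() are hypothetical names, and the byte-wise loop is only the simplest possible check. The point is that an already-zero destination page is never written (so it is not touched and allocated), while a page that holds leftover data such as firmware or a device tree is still cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if all len bytes of the page are zero. */
static bool page_is_zero(const unsigned char *page, size_t len)
{
    assert(page != NULL);
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical destination-side handler for an incoming zero page.
 * Writes zeroes only if the page currently holds nonzero data.
 * Returns true if the page had to be written, false if it was skipped.
 */
static bool store_zero_page(unsigned char *host_page, size_t len)
{
    if (page_is_zero(host_page, len)) {
        return false;          /* already zero: leave the page untouched */
    }
    memset(host_page, 0, len); /* stale contents: clear them */
    return true;
}
```

Whether an madvise call is still needed on top of this is the open question in the mail above; the sketch does not answer it.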