From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:44950)
	by lists.gnu.org with esmtp (Exim 4.71)
	id 1UhyqY-0006r5-20
	for qemu-devel@nongnu.org; Thu, 30 May 2013 05:08:40 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	id 1UhyqR-0007CP-JN
	for qemu-devel@nongnu.org; Thu, 30 May 2013 05:08:33 -0400
Received: from mx.ipv6.kamp.de ([2a02:248:0:51::16]:38270 helo=mx01.kamp.de)
	by eggs.gnu.org with smtp (Exim 4.71)
	id 1UhyqR-0007C9-9Z
	for qemu-devel@nongnu.org; Thu, 30 May 2013 05:08:27 -0400
Message-ID: <51A71705.6060009@kamp.de>
Date: Thu, 30 May 2013 11:08:21 +0200
From: Peter Lieven
MIME-Version: 1.0
References: <51A7036A.3050407@ozlabs.ru> <51A7049F.6040207@redhat.com> <51A70B3D.90609@ozlabs.ru>
In-Reply-To: <51A70B3D.90609@ozlabs.ru>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] broken incoming migration
To: Alexey Kardashevskiy
Cc: Paolo Bonzini, "qemu-ppc@nongnu.org", "qemu-devel@nongnu.org", David Gibson

On 30.05.2013 10:18, Alexey Kardashevskiy wrote:
> On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
>> On 30/05/2013 09:44, Alexey Kardashevskiy wrote:
>>> Hi!
>>>
>>> I found the migration broken on pseries platform, specifically, this patch
>>> broke it:
>>>
>>> f1c72795af573b24a7da5eb52375c9aba8a37972
>>> migration: do not sent zero pages in bulk stage
>>>
>>> The idea is not to send zero pages to the destination guest which is
>>> expected to have 100% empty RAM.
>>>
>>> However on pseries platform the guest always has some stuff in the RAM as a
>>> part of initialization (device tree, system firmware and rtas (?)) so it is
>>> not completely empty. As the source guest cannot detect this, it skips some
>>> pages during migration and we get a broken destination guest. Bug.
>>>
>>> While the idea is ok in general, I do not see any easy way to fix it as
>>> neither QEMUMachine::init nor QEMUMachine::reset callback has information
>>> about whether we are about to receive a migration or not (-incoming
>>> parameter) and we cannot move device-tree and system firmware
>>> initialization anywhere else.
>>>
>>> ram_bulk_stage is static and cannot be disabled from the platform
>>> initialization code.
>>>
>>> So what would the community suggest?
>> Revert the patch. :)
> I'll wait for 24 hours (forgot to cc: the author) and then post a revert
> patch :)
>
>

Does this problem occur only on pseries emulation?

Not sending zero pages is not only a performance benefit; it also makes
overcommitted memory usable. The MADV_DONTNEED seems to kick in
asynchronously, so the memory is not available immediately.

What I do not understand: if a memory region is not empty at the
destination due to device tree, firmware etc., it should not be empty at
the source either, so in theory this should not be a problem.

Peter
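
To make the question above concrete, here is a toy Python model of the
bulk-stage behaviour being discussed. It is only a sketch, not QEMU code:
every name in it (migrate_bulk, make_ram, the page numbers and fill bytes)
is made up for illustration. It assumes one hypothetical way the two sides
could differ: machine init puts firmware into a page on the destination,
while the running source guest has since zeroed that same page. Whether
that is what actually happens on pseries is not established in this thread.

```python
# Toy model of skipping zero pages during the bulk stage of a migration.
# All names are hypothetical; this does not mirror QEMU's real code.

PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)


def is_zero_page(page):
    """Return True if every byte of the page is zero."""
    return page == ZERO_PAGE


def make_ram(num_pages, populated):
    """Build RAM as a list of pages; `populated` maps page index -> fill byte."""
    ram = [ZERO_PAGE] * num_pages
    for index, fill in populated.items():
        ram[index] = bytes([fill]) * PAGE_SIZE
    return ram


def migrate_bulk(source, dest, skip_zero_pages):
    """Copy source pages into dest and return the number of pages sent.

    With skip_zero_pages=True, zero pages are not sent, which silently
    relies on the matching destination page already being zero.
    """
    sent = 0
    for index, page in enumerate(source):
        if skip_zero_pages and is_zero_page(page):
            continue
        dest[index] = page
        sent += 1
    return sent


# Assumed scenario: init places firmware in page 1 of the destination.
# The source guest has been running, has cleared page 1, and uses page 2.
source = make_ram(4, {2: 0xAA})
dest_skip = make_ram(4, {1: 0xFF})
dest_full = make_ram(4, {1: 0xFF})

sent_skip = migrate_bulk(source, dest_skip, skip_zero_pages=True)
sent_full = migrate_bulk(source, dest_full, skip_zero_pages=False)

# Pages whose destination content differs from the source after migration.
stale = [i for i in range(len(source)) if dest_skip[i] != source[i]]

print("pages sent with skipping:", sent_skip)
print("pages sent without skipping:", sent_full)
print("stale destination pages with skipping:", stale)
```

In this model the skipping variant sends fewer pages but leaves page 1
stale on the destination, while sending every page reproduces the source
exactly. If the source still held the same init content in page 1, nothing
would be skipped there and both variants would agree, which is the case
described in the question above.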