From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:47242)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1Ulxi9-0002gG-Ka
	for qemu-devel@nongnu.org; Mon, 10 Jun 2013 04:44:30 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pl@kamp.de>) id 1Ulxi0-0005BQ-QL
	for qemu-devel@nongnu.org; Mon, 10 Jun 2013 04:44:21 -0400
Received: from mx.ipv6.kamp.de ([2a02:248:0:51::16]:49767 helo=mx01.kamp.de)
	by eggs.gnu.org with smtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1Ulxi0-0005B3-DB
	for qemu-devel@nongnu.org; Mon, 10 Jun 2013 04:44:12 -0400
Message-ID: <51B591D1.5040705@kamp.de>
Date: Mon, 10 Jun 2013 10:44:01 +0200
From: Peter Lieven <pl@kamp.de>
MIME-Version: 1.0
References: <51A7036A.3050407@ozlabs.ru> <51A7049F.6040207@redhat.com>
	<51A70B3D.90609@ozlabs.ru> <51A71705.6060009@kamp.de>
	<51A74D79.7040204@redhat.com>
	<2765FDFA-8050-4AA3-8621-7E9EA2C89F9C@kamp.de>
	<51A764FC.7080705@redhat.com> <51ADF122.70307@kamp.de>
	<51ADF637.7060804@redhat.com> <51ADFBCE.3080200@kamp.de>
	<51ADFC7A.7030009@redhat.com> <51AE035A.5070301@kamp.de>
	<51B2EB0A.7000704@linux.vnet.ibm.com> <51B2EBA2.5060401@ozlabs.ru>
	<51B3E58C.50301@linux.vnet.ibm.com> <51B3E9A8.5010705@ozlabs.ru>
	<51B3EFFA.4040608@linux.vnet.ibm.com> <51B3F1FD.1090401@ozlabs.ru>
	<F4010622-8072-434E-94CA-1A63C251BB45@kamp.de>
	<51B57489.20802@ozlabs.ru> <51B57727.9080903@kamp.de>
	<51B5785B.6040704@ozlabs.ru>
In-Reply-To: <51B5785B.6040704@ozlabs.ru>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] broken incoming migration
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Paolo Bonzini <pbonzini@redhat.com>, David Gibson <david@gibson.dropbear.id.au>, "qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>, Wenchao Xia <xiawenc@linux.vnet.ibm.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>

On 10.06.2013 08:55, Alexey Kardashevskiy wrote:
> On 06/10/2013 04:50 PM, Peter Lieven wrote:
>> On 10.06.2013 08:39, Alexey Kardashevskiy wrote:
>>> On 06/09/2013 05:27 PM, Peter Lieven wrote:
>>>> On 09.06.2013 at 05:09, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>
>>>>> On 06/09/2013 01:01 PM, Wenchao Xia wrote:
>>>>>> On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
>>>>>>> On 06/09/2013 12:16 PM, Wenchao Xia wrote:
>>>>>>>> On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
>>>>>>>>> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>>>>>>>>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>>>>>>>>>> On 04/06/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>>>>>>>>>> On 04/06/2013 15:52, Peter Lieven wrote:
>>>>>>>>>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>>>>>>>>>> On 30/05/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>>>>>>> You could also scan the page for nonzero
>>>>>>>>>>>>>>>>>>> values before writing it.
>>>>>>>>>>>>>>>>> I had this in mind, but then chose the other
>>>>>>>>>>>>>>>>> approach... which turned out to be a bad idea.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Alexey: I will prepare a patch later today;
>>>>>>>>>>>>>>>>> could you then please verify that it fixes your
>>>>>>>>>>>>>>>>> problem?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Paolo: would we still need the madvise, or is
>>>>>>>>>>>>>>>>> it enough not to write the zeroes?
>>>>>>>>>>>>>>>> It should be enough to not write them.
>>>>>>>>>>>>>>> Problem: checking the pages for zero allocates
>>>>>>>>>>>>>>> them, even at the source.
>>>>>>>>>>>>>> It doesn't look like it. I tried this program, and top
>>>>>>>>>>>>>> doesn't show an increasing amount of reserved
>>>>>>>>>>>>>> memory:
>>>>>>>>>>>>>>
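>>>>>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>     char *x = malloc(500 << 20);   /* 500 MiB, large enough to be mmap()ed */
>>>>>>>>>>>>>>     int i, j;
>>>>>>>>>>>>>>     for (i = 0; i < 500; i += 10) {
>>>>>>>>>>>>>>         for (j = 0; j < 10 << 20; j += 4096)   /* read (never write) one byte per 4 KiB page */
>>>>>>>>>>>>>>             *(volatile char*) (x + (i << 20) + j);
>>>>>>>>>>>>>>         getchar();   /* pause after each 10 MiB so top can be checked */
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>> }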
>>>>>>>>>>>>> Strange. We are talking about RSS, right?
>>>>>>>>>>>> None of the three top values change, and only VIRT is 500 MB.
>>>>>>>>>>>>> Is the malloc above using mmapped memory?
>>>>>>>>>>>> Yes.
>>>>>>>>>>>>
>>>>>>>>>>>>> Which kernel version are you using?
>>>>>>>>>>>> 3.9.
>>>>>>>>>>>>
>>>>>>>>>>>>> What avoids allocating the memory for me is the
>>>>>>>>>>>>> following (with whatever side effects it has ;-)):
>>>>>>>>>>>> This would also fail to migrate any page that is swapped
>>>>>>>>>>>> out, breaking overcommit in a more subtle way. :)
>>>>>>>>>>>>
>>>>>>>>>>>> Paolo
>>>>>>>>>>> The following also does not allocate memory, but QEMU
>>>>>>>>>>> does...
>>>>>>>>>> Hi Peter, as the patch description says:
>>>>>>>>>>
>>>>>>>>>> "not sending zero pages breaks migration if a page is zero
>>>>>>>>>> at the source but not at the destination."
>>>>>>>>>>
>>>>>>>>>> I don't understand why it would be a problem. Shouldn't all
>>>>>>>>>> pages not received at the destination be treated as zero pages?
>>>>>>>>> How would the destination guest know if some page must be
>>>>>>>>> cleared? The previous patch (which Peter reverted) did not
>>>>>>>>> send anything for the pages which were zero on the source
>>>>>>>>> side.
>>>>>>>> If a page was not received, and the destination knows that the page
>>>>>>>> should exist according to the total size, filling it with zeroes at
>>>>>>>> the destination would solve the problem, wouldn't it?
>>>>>>> It is _live_ migration: the source sends changes, and the same pages can
>>>>>>> change and be sent several times. So we would need to turn on
>>>>>>> tracking on the destination to know whether a page was received
>>>>>>> from the source or changed by the destination itself (by writing
>>>>>>> BIOS/firmware images there, etc.) and then clear the pages that were
>>>>>>> touched by the destination and were not sent by the source.
>>>>>> OK, so as I understand it, the problem is, for example: the destination boots
>>>>>> up with 0x0000-0xFFFF filled with a BIOS image, and the source forgot to send
>>>>>> the zero pages in 0x0000-0xFFFF.
>>>>> The source did not forget; rather, it zeroed these pages during its
>>>>> lifetime and assumed they must already be zero at the destination
>>>>> (as the destination had not started and had no chance to write
>>>>> anything there).
>>>>>
>>>>>
>>>>>> After migration, the destination has 0x0000-0xFFFF dirty (different from
>>>>>> the source).
>>>>> Yep. And those pages were empty on the source, which made debugging very
>>>>> easy :)
>>>>>
>>>>>
>>>>>> Thanks for the explanation.
>>>>>>
>>>>>> This seems to concern the migration protocol: how should the guest
>>>>>> treat unsent pages? The patch causing the problem treats zero pages
>>>>>> as "not to be sent" at the source, but the other half is missing:
>>>>>> treating "not received" pages as zero pages at the destination. I guess
>>>>>> if the second half is added, the problem goes away: after the page transfer
>>>>>> completes and before the destination resumes, fill the "not received" pages with zeroes.
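A minimal sketch of that second half (zero-fill every page that was never received before the destination resumes), assuming a flat RAM array, 4 KiB pages and one "received" flag per page. `DestRam`, `dest_ram_init()`, `dest_receive_page()`, `dest_zero_unreceived()` and `dest_ram_free()` are hypothetical names, not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumption: 4 KiB target pages */

/* Hypothetical destination-side view of guest RAM (not QEMU's RAMBlock). */
typedef struct {
    uint8_t *ram;     /* npages * PAGE_SIZE bytes of guest memory */
    size_t npages;
    bool *received;   /* received[i] becomes true once page i arrives */
} DestRam;

/* Allocate RAM and the received flags; returns false on allocation failure. */
static bool dest_ram_init(DestRam *d, size_t npages)
{
    d->npages = npages;
    d->ram = malloc(npages * PAGE_SIZE);
    d->received = calloc(npages, sizeof *d->received);
    return d->ram != NULL && d->received != NULL;
}

static void dest_ram_free(DestRam *d)
{
    free(d->ram);
    free(d->received);
}

/* Store one page from the migration stream and remember that it arrived.
 * A page sent several times during live migration simply overwrites itself. */
static void dest_receive_page(DestRam *d, size_t idx, const uint8_t *data)
{
    assert(idx < d->npages);
    memcpy(d->ram + idx * PAGE_SIZE, data, PAGE_SIZE);
    d->received[idx] = true;
}

/* After the last page has arrived and before the guest resumes:
 * clear every page the source never sent, so anything written there during
 * destination machine init (e.g. a BIOS image) does not survive.
 * Returns the number of pages cleared. */
static size_t dest_zero_unreceived(DestRam *d)
{
    size_t cleared = 0;
    for (size_t i = 0; i < d->npages; i++) {
        if (!d->received[i]) {
            memset(d->ram + i * PAGE_SIZE, 0, PAGE_SIZE);
            cleared++;
        }
    }
    return cleared;
}
```

One cost of this version: memset() on a page that was never touched forces the kernel to allocate it, which is exactly the allocation concern raised earlier in this thread.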
>>>>>
>>>>> Make a working patch and we'll discuss it :) I do not expect much
>>>>> of a speedup from it, though.
>>>> I would also not spend much time on this. I would either look for
>>>> an easy way to fix the initialization code so that it does not unnecessarily
>>>> load data into RAM, or I will send a v2 of my patch addressing Eric's
>>>> concerns.
>>> There is no easy way to implement the flag and keep your original patch, as
>>> we would have to implement this flag in all architectures that were broken by
>>> your patch, and I personally can fix only PPC64-pseries, not the others.
>>>
>>> Furthermore, your revert plus the new patches solve the problem perfectly, so why
>>> would we want to bother now with this new flag, which nobody really needs
>>> right now?
>>>
>>> Please, please revert the original patch or I'll try to do it :)
>>>
>>>
>> I tried, but there were concerns from the community.
>
> Was there anybody here who did not want to revert the patch (besides you)?
> I did not notice anyone.
Eric said I should not drop the skipped_pages stuff in the monitor.
>
>
>> Alternatively, I found another solution. Please drop the 2 patches and try the
>> following:
>
> How is it going to work if upstream QEMU doesn't send anything about empty
> pages at all (this is why I want to revert that patch)?
I do not understand your question. The patch below zeroes out the destination
memory if it is not already zero (e.g. if a BIOS has already been copied into
memory during machine init).
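For illustration only, a minimal sketch of that check-before-write idea (not the actual patch), assuming the source still sends a zero-page record and the destination calls a handler for it. `page_is_zero()` and `dest_handle_zero_page()` are hypothetical names. Because the page is only read first, a page that is already zero is never written; on Linux, a read of an untouched anonymous page typically maps the shared zero page rather than allocating memory, which is what Paolo's test program probes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumption: 4 KiB target pages */

/* Return true if all len bytes at p are zero. Only reads the memory. */
static bool page_is_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Handle a zero-page record for the destination page at host_page.
 * Writes only if the page is not already zero (e.g. a BIOS image was
 * copied there during machine init). Returns true if it had to clear it. */
static bool dest_handle_zero_page(uint8_t *host_page)
{
    assert(host_page != NULL);
    if (page_is_zero(host_page, PAGE_SIZE)) {
        return false;   /* already zero: leave the page untouched */
    }
    memset(host_page, 0, PAGE_SIZE);
    return true;
}
```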

I would prefer not to completely drop the patch since it saves bandwidth and
resources.
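To make the bandwidth argument concrete, a simplified sketch of a bulk-stage sender that skips all-zero pages and counts them, in the spirit of the skipped_pages statistic mentioned above. `send_page_fn`, `bulk_stage_send()` and `count_bytes()` are hypothetical; this is not QEMU's RAM save code. As discussed in this thread, skipping is only safe if the destination also clears whatever it may have written into pages it never receives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096  /* assumption: 4 KiB target pages */

/* Hypothetical transport hook: put page idx on the wire. */
typedef void (*send_page_fn)(size_t idx, const uint8_t *data, void *opaque);

/* Return true if all len bytes at p are zero. */
static bool page_is_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* One pass over guest RAM: send non-zero pages, skip zero pages.
 * Returns how many pages were skipped (a skipped_pages-style counter). */
static size_t bulk_stage_send(const uint8_t *ram, size_t npages,
                              send_page_fn send, void *opaque)
{
    size_t skipped = 0;
    assert(send != NULL);
    for (size_t i = 0; i < npages; i++) {
        const uint8_t *page = ram + i * PAGE_SIZE;
        if (page_is_zero(page, PAGE_SIZE)) {
            skipped++;   /* nothing is transmitted for this page */
            continue;
        }
        send(i, page, opaque);
    }
    return skipped;
}

/* Example hook: only accumulates the payload bytes that would be sent. */
static void count_bytes(size_t idx, const uint8_t *data, void *opaque)
{
    (void)idx;
    (void)data;
    *(size_t *)opaque += PAGE_SIZE;
}
```

For a freshly booted guest whose RAM is mostly still zero, most pages take the skipped branch, which is where the savings come from.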

Peter