From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43317)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Ztbvp-0004Ur-7j
	for qemu-devel@nongnu.org; Tue, 03 Nov 2015 08:47:26 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Ztbvm-0008Jl-1N
	for qemu-devel@nongnu.org; Tue, 03 Nov 2015 08:47:25 -0500
Received: from mx1.redhat.com ([209.132.183.28]:57895)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1Ztbvl-0008Jh-SK
	for qemu-devel@nongnu.org; Tue, 03 Nov 2015 08:47:21 -0500
Date: Tue, 3 Nov 2015 13:47:17 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20151103134716.GC17670@work-vm>
References: <20151103122353.GB17670@work-vm>
 <874mh3z1hb.fsf@emacs.mitica>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <874mh3z1hb.fsf@emacs.mitica>
Subject: Re: [Qemu-devel] safety of migration_bitmap_extend
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Juan Quintela <quintela@redhat.com>
Cc: den@openvz.org, qemu-devel@nongnu.org, lizhijian@cn.fujitsu.com

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > Hi,
> >   I'm trying to understand why migration_bitmap_extend is correct/safe;
> > If I understand correctly, you're arguing that:
> >
> >   1) the migration_bitmap_mutex around the extend, stops any sync's happening
> >      and so no new bits will be set during the extend.
> >
> >   2) If migration sends a page and clears a bitmap entry, it doesn't
> >      matter if we lose the 'clear' because we're copying it as
> >      we extend it, because losing the clear just means the page
> >      gets resent, and so the data is OK.
> >
> > However, doesn't (2) mean that migration_dirty_pages might be wrong?
> > If a page was sent, the bit cleared, and migration_dirty_pages decremented,
> > then if we copy over that bitmap and 'set' that bit again then migration_dirty_pages
> > is too small; that means that either migration would finish too early,
> > or more likely, migration_dirty_pages would wrap-around -ve and
> > never finish.
> >
> > Is there a reason it's really safe?
> 
> No.  It is reasonably safe.  Various values of reasonably.
> 
> migration_dirty_pages should never arrive at values near zero.  Because
> we move to the completion stage way before it gets a value near zero.
> (We could have very, very bad luck, as in it is not safe).

That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate.
If we don't hit the rate limit (e.g. because we're CPU or network limited
to slower than the set limit) then I think ram_save_iterate will go all the
way to sending every page. If that happens it'll go once more around the
main migration loop and call the pending routine, which now gets a -ve
(i.e. wrapped to a very large +ve) number of pending pages, so we
continuously do ram_save_iterate again.
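
To make the failure concrete, here is a minimal single-threaded sketch of
the lost 'clear'. It is not the QEMU code: struct model, send_page() and
lost_clear_demo() are hypothetical names, and the concurrent extend is
modelled as copy, then send, then swap. It only assumes the counter is
unsigned and is decremented whenever a set bit is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NPAGES 8

/* Hypothetical stand-in for the migration bitmap and migration_dirty_pages. */
struct model {
    bool bitmap[NPAGES];
    uint64_t dirty_pages;
};

/* "Send" a page: clear its bit and decrement the counter if it was set. */
static void send_page(struct model *m, int page)
{
    assert(page >= 0 && page < NPAGES);
    if (m->bitmap[page]) {
        m->bitmap[page] = false;
        m->dirty_pages--;
    }
}

/* Returns the counter after every page has been sent. */
uint64_t lost_clear_demo(void)
{
    struct model m = { .dirty_pages = 0 };
    bool copy[NPAGES];
    int i;

    /* Start with every page dirty and the counter in step with the bitmap. */
    for (i = 0; i < NPAGES; i++) {
        m.bitmap[i] = true;
        m.dirty_pages++;
    }

    /* The extend copies the old bitmap... */
    memcpy(copy, m.bitmap, sizeof(copy));

    /* ...the migration thread sends page 0, clearing it in the old bitmap... */
    send_page(&m, 0);

    /* ...and the copy replaces the old bitmap: the clear is lost, but the
     * counter has already been decremented (7 vs. 8 bits set). */
    memcpy(m.bitmap, copy, sizeof(copy));

    /* No rate limit is hit, so the iterate sends every remaining page. */
    for (i = 0; i < NPAGES; i++) {
        send_page(&m, i);
    }

    /* One decrement too many: the unsigned counter has wrapped. */
    return m.dirty_pages;
}
```

In this model lost_clear_demo() returns UINT64_MAX rather than 0, so a
pending calculation based on that counter would still see a huge amount of
outstanding data even though every bit in the bitmap is clear.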

We've had that type of bug before when we messed up the dirty-pages calculation
during hotplug.

> Now, do we really care if migration_dirty_pages is exact?  Not really,
> we just use it to calculate if we should start the throttle or not.
> We only test that once per second, so if we have written a couple of
> pages that we are not accounting for, things should be reasonably safe.
> 
> That said, I don't know why we didn't catch that problem during
> review (yes, I am guilty here).  Not sure how to really fix it,
> though.  I think that the problem is more theoretical than real, but

Dave

> ....
> 
> Thanks, Juan.
> 
> >
> > Dave
> >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK