Date: Tue, 3 Nov 2015 13:47:17 +0000
From: "Dr. David Alan Gilbert"
To: Juan Quintela
Cc: den@openvz.org, qemu-devel@nongnu.org, lizhijian@cn.fujitsu.com
Subject: Re: [Qemu-devel] safety of migration_bitmap_extend
Message-ID: <20151103134716.GC17670@work-vm>
In-Reply-To: <874mh3z1hb.fsf@emacs.mitica>
References: <20151103122353.GB17670@work-vm> <874mh3z1hb.fsf@emacs.mitica>

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" wrote:
> > Hi,
> >   I'm trying to understand why migration_bitmap_extend is correct/safe;
> > if I understand correctly, you're arguing that:
> >
> >   1) the migration_bitmap_mutex around the extend stops any syncs
> >      happening, so no new bits will be set during the extend.
> >
> >   2) If migration sends a page and clears a bitmap entry, it doesn't
> >      matter if we lose the 'clear' because we're copying it as
> >      we extend it, because losing the clear just means the page
> >      gets resent, and so the data is OK.
> >
> > However, doesn't (2) mean that migration_dirty_pages might be wrong?
> > If a page was sent, the bit cleared, and migration_dirty_pages decremented,
> > then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
> > is too small; that means that either migration would finish too early,
> > or, more likely, migration_dirty_pages would wrap around to -ve and
> > never finish.
> >
> > Is there a reason it's really safe?
>
> No. It is reasonably safe. Various values of reasonably.
>
> migration_dirty_pages should never get near zero, because we move
> to the completion stage well before it does.
> (We could have very, very bad luck; in that sense it is not safe.)

That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate;
if we don't hit the rate limit (e.g. because we're CPU or network limited
to slower than the set limit) then I think ram_save_iterate will go all
the way to sending every page. If that happens, it'll go once more
around the main migration loop, call the pending routine, and now get
a -ve (i.e. wrapped-around, very large +ve) number of pending pages, and so
keep calling ram_save_iterate again and again.

We've had that type of bug before, when we messed up the dirty-pages
calculation during hotplug.

> Now, do we really care if migration_dirty_pages is exact? Not really;
> we just use it to decide whether we should start the throttle or not.
> We only check that once per second, so if we have written a couple of
> pages that we are not accounting for, things should be reasonably safe.
>
> That said, I don't know why we didn't catch that problem during
> review (yes, I am guilty here). Not sure how to really fix it,
> though. I think that the problem is more theoretical than real, but Dave
> ....
>
> Thanks, Juan.
>
> >
> > Dave
> >
> > --
> >  Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK