Re: [Qemu-devel] safety of migration_bitmap_extend

From: Wen Congyang <wency@cn.fujitsu.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Juan Quintela <quintela@redhat.com>
Cc: den@openvz.org, qemu-devel@nongnu.org, lizhijian@cn.fujitsu.com
Subject: Re: [Qemu-devel] safety of migration_bitmap_extend
Date: Wed, 4 Nov 2015 11:10:03 +0800
Message-ID: <5639770B.4090103@cn.fujitsu.com>
In-Reply-To: <20151103134716.GC17670@work-vm>

On 11/03/2015 09:47 PM, Dr. David Alan Gilbert wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>> Hi,
>>>   I'm trying to understand why migration_bitmap_extend is correct/safe;
>>> If I understand correctly, you're arguing that:
>>>
>>>   1) the migration_bitmap_mutex around the extend, stops any sync's happening
>>>      and so no new bits will be set during the extend.
>>>
>>>   2) If migration sends a page and clears a bitmap entry, it doesn't
>>>      matter if we lose the 'clear' because we're copying it as
>>>      we extend it, because losing the clear just means the page
>>>      gets resent, and so the data is OK.
>>>
>>> However, doesn't (2) mean that migration_dirty_pages might be wrong?
>>> If a page was sent, the bit cleared, and migration_dirty_pages decremented,
>>> then if we copy over that bitmap and 'set' that bit again then migration_dirty_pages
>>> is too small; that means that either migration would finish too early,
>>> or more likely, migration_dirty_pages would wrap-around -ve and
>>> never finish.
>>>
>>> Is there a reason it's really safe?
>>
>> No.  It is reasonably safe.  Various values of reasonably.
>>
>> migration_dirty_pages should never arrive at values near zero.  Because
>> we move to the completion stage way before it gets a value near zero.
>> (We could have very, very bad luck, as in it is not safe).
> 
> That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate;
> if we don't hit the rate limit (e.g. because we're CPU or network limited
> to slower than the set limit) then I think ram_save_iterate will go all the
> way to sending every page; if that happens it'll go once more
> around the main migration loop, and call the pending routine, and now get
> a -ve (very +ve) number of pending pages, so continuously do ram_save_iterate
> again.
> 
> We've had that type of bug before when we messed up the dirty-pages calculation
> during hotplug.
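
To make the wrap-around concern above concrete, here is a minimal standalone
sketch. It is not QEMU code: `struct model`, `send_page()` and
`simulate_lost_clear()` are hypothetical names, and the "bitmap" is one byte
per page. It only assumes that the counter is an unsigned 64-bit value that is
decremented when a set bit is cleared.

```c
#include <stdint.h>
#include <string.h>

#define NPAGES 8

/* Hypothetical model, not QEMU code: one byte per page instead of a bitmap. */
struct model {
    uint8_t bitmap[NPAGES];   /* 1 = page is dirty */
    uint64_t dirty_pages;     /* stand-in for migration_dirty_pages */
};

/* Send a page: clear its bit and, if it was set, decrement the counter. */
static void send_page(struct model *m, int page)
{
    if (m->bitmap[page]) {
        m->bitmap[page] = 0;
        m->dirty_pages--;
    }
}

/* Returns the counter value after one 'clear' is lost during the copy. */
uint64_t simulate_lost_clear(void)
{
    struct model m = { .bitmap = { 1 }, .dirty_pages = 1 };
    uint8_t snapshot[NPAGES];

    /* The extend starts copying the old bitmap. */
    memcpy(snapshot, m.bitmap, sizeof(snapshot));
    /* Migration sends page 0 meanwhile: counter goes 1 -> 0. */
    send_page(&m, 0);
    /* The copy is installed; bit 0 is set again, counter is not adjusted. */
    memcpy(m.bitmap, snapshot, sizeof(snapshot));
    /* Page 0 is resent: counter goes 0 -> UINT64_MAX (unsigned wrap). */
    send_page(&m, 0);

    return m.dirty_pages;
}
```

In this model the page data is still correct (the page is just sent twice),
but the counter ends up at UINT64_MAX, which is the "very +ve" pending value
described above.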

IIUC, migration_bitmap_extend() is called when we hotplug a device while
migration is running.

In this case, I think we hold the iothread mutex when migration_bitmap_extend() is called.

ram_save_complete() is also protected by the iothread mutex.

So if migration_bitmap_extend() is called, the migration thread may be blocked in
migration_completion(), waiting for that mutex. qemu_savevm_state_complete() will be
called after migration_completion() returns.

Thanks
Wen Congyang

> 
>> Now, do we really care if migration_dirty_pages is exact?  Not really,
>> we just use it to calculate if we should start the throttle or not.
>> That is only tested once per second, so if we have written a couple of
>> pages that we are not accounting for, things should be reasonably safe.
>>
>> Having said that, I don't know why we didn't catch that problem during
>> review (yes, I am guilty here).  Not sure how to really fix it,
>> though.  I think that the problem is more theoretical than real, but
> 
> Dave
> 
>> ....
>>
>> Thanks, Juan.
>>
>>>
>>> Dave
>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK