* [Qemu-devel] safety of migration_bitmap_extend
From: Dr. David Alan Gilbert @ 2015-11-03 12:23 UTC
  To: lizhijian, den, quintela; +Cc: qemu-devel

Hi,
  I'm trying to understand why migration_bitmap_extend is correct/safe;
If I understand correctly, you're arguing that:

  1) the migration_bitmap_mutex around the extend stops any syncs happening
     and so no new bits will be set during the extend.

  2) If migration sends a page and clears a bitmap entry, it doesn't
     matter if we lose the 'clear' because we're copying it as
     we extend it: losing the clear just means the page
     gets resent, and so the data is OK.

However, doesn't (2) mean that migration_dirty_pages might be wrong?
If a page was sent, the bit cleared, and migration_dirty_pages decremented,
then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
is too small; that means that either migration would finish too early
or, more likely, migration_dirty_pages would wrap around -ve and
never finish.
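
To make that concrete, here's a minimal single-threaded model of the
interleaving I'm worried about (hypothetical names and layout, not the
actual QEMU code):

    #include <inttypes.h>
    #include <stdio.h>
    #include <string.h>

    static unsigned long bmap[1] = { 0x3UL };  /* pages 0 and 1 dirty */
    static uint64_t dirty_pages = 2;

    int main(void)
    {
        unsigned long new_bmap[2] = { 0 };     /* the extended bitmap */

        memcpy(new_bmap, bmap, sizeof(bmap));  /* extend: copy old bitmap */

        bmap[0] &= ~1UL;                       /* sender: clear page 0... */
        dirty_pages--;                         /* ...and account for it   */

        /* extend now publishes new_bmap: bit 0 is set there again, but
         * the decrement above survives, so dirty_pages (1) no longer
         * matches the set bits (2); resending page 0 decrements again. */
        printf("set bits = 2, dirty_pages = %" PRIu64 "\n", dirty_pages);
        return 0;
    }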

Is there a reason it's really safe?

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] safety of migration_bitmap_extend
From: Juan Quintela @ 2015-11-03 12:55 UTC
  To: Dr. David Alan Gilbert; +Cc: den, qemu-devel, lizhijian

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> Hi,
>   I'm trying to understand why migration_bitmap_extend is correct/safe;
> If I understand correctly, you're arguing that:
>
>   1) the migration_bitmap_mutex around the extend stops any syncs happening
>      and so no new bits will be set during the extend.
>
>   2) If migration sends a page and clears a bitmap entry, it doesn't
>      matter if we lose the 'clear' because we're copying it as
>      we extend it: losing the clear just means the page
>      gets resent, and so the data is OK.
>
> However, doesn't (2) mean that migration_dirty_pages might be wrong?
> If a page was sent, the bit cleared, and migration_dirty_pages decremented,
> then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
> is too small; that means that either migration would finish too early
> or, more likely, migration_dirty_pages would wrap around -ve and
> never finish.
>
> Is there a reason it's really safe?

No.  It is reasonably safe.  Various values of reasonably.

migration_dirty_pages should never arrive at values near zero, because
we move to the completion stage way before it gets a value near zero.
(We could have very, very bad luck, as in it is not safe).

Now, do we really care if migration_dirty_pages is exact?  Not really;
we just use it to calculate whether we should start the throttle or not.
That is only tested once per second, so if we have written a couple of
pages that we are not accounting for, things should be reasonably safe.
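
(For reference, the shape of that check, paraphrased rather than quoted
from ram.c, is roughly:

    /* in migration_bitmap_sync(), evaluated about once per second */
    if (num_dirty_pages_period * TARGET_PAGE_SIZE >
        (bytes_xfer_now - bytes_xfer_prev) / 2) {
        /* dirtying outpaces transfer: start throttling the guest */
    }

so a handful of unaccounted pages barely moves that comparison.)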

That said, I don't know why we didn't catch that problem during
review (yes, I am guilty here).  Not sure how to really fix it,
though.  I think that the problem is more theoretical than real, but
....

Thanks, Juan.

>
> Dave
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] safety of migration_bitmap_extend
From: Dr. David Alan Gilbert @ 2015-11-03 13:47 UTC
  To: Juan Quintela; +Cc: den, qemu-devel, lizhijian

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > Hi,
> >   I'm trying to understand why migration_bitmap_extend is correct/safe;
> > If I understand correctly, you're arguing that:
> >
> >   1) the migration_bitmap_mutex around the extend stops any syncs happening
> >      and so no new bits will be set during the extend.
> >
> >   2) If migration sends a page and clears a bitmap entry, it doesn't
> >      matter if we lose the 'clear' because we're copying it as
> >      we extend it: losing the clear just means the page
> >      gets resent, and so the data is OK.
> >
> > However, doesn't (2) mean that migration_dirty_pages might be wrong?
> > If a page was sent, the bit cleared, and migration_dirty_pages decremented,
> > then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
> > is too small; that means that either migration would finish too early
> > or, more likely, migration_dirty_pages would wrap around -ve and
> > never finish.
> >
> > Is there a reason it's really safe?
> 
> No.  It is reasonably safe.  Various values of reasonably.
> 
> migration_dirty_pages should never arrive at values near zero, because
> we move to the completion stage way before it gets a value near zero.
> (We could have very, very bad luck, as in it is not safe).

That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate;
if we don't hit the rate limit (e.g. because we're CPU or network limited
to slower than the set limit) then I think ram_save_iterate will go all the
way to sending every page; if that happens it'll go once more
around the main migration loop, call the pending routine, and now get
a -ve (very +ve) number of pending pages, so it will continuously do ram_save_iterate
again.

We've had that type of bug before when we messed up the dirty-pages calculation
during hotplug.
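
A toy model of the spin (paraphrasing the shape of migration_thread, not
its actual code):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t dirty_pages = 2;  /* counter ended up 2 too small */
        uint64_t max_size = 100;   /* what the bandwidth allows per round */
        int round;

        dirty_pages -= 4;          /* iterate really sends all 4 pages */
        for (round = 0; dirty_pages > max_size && round < 5; round++) {
            /* ram_save_iterate() finds nothing left to send, so the
             * "pending" value never drops and we'd loop forever
             * (capped at 5 rounds for this demo) */
        }
        printf("pending looks like %" PRIu64 "\n", dirty_pages);
        return 0;
    }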

> Now, do we really care if migration_dirty_pages is exact?  Not really;
> we just use it to calculate whether we should start the throttle or not.
> That is only tested once per second, so if we have written a couple of
> pages that we are not accounting for, things should be reasonably safe.
> 
> That said, I don't know why we didn't catch that problem during
> review (yes, I am guilty here).  Not sure how to really fix it,
> though.  I think that the problem is more theoretical than real, but

Dave

> ....
> 
> Thanks, Juan.
> 
> >
> > Dave
> >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] safety of migration_bitmap_extend
From: Wen Congyang @ 2015-11-04  3:10 UTC
  To: Dr. David Alan Gilbert, Juan Quintela; +Cc: den, qemu-devel, lizhijian

On 11/03/2015 09:47 PM, Dr. David Alan Gilbert wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>> Hi,
>>>   I'm trying to understand why migration_bitmap_extend is correct/safe;
>>> If I understand correctly, you're arguing that:
>>>
>>>   1) the migration_bitmap_mutex around the extend stops any syncs happening
>>>      and so no new bits will be set during the extend.
>>>
>>>   2) If migration sends a page and clears a bitmap entry, it doesn't
>>>      matter if we lose the 'clear' because we're copying it as
>>>      we extend it: losing the clear just means the page
>>>      gets resent, and so the data is OK.
>>>
>>> However, doesn't (2) mean that migration_dirty_pages might be wrong?
>>> If a page was sent, the bit cleared, and migration_dirty_pages decremented,
>>> then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
>>> is too small; that means that either migration would finish too early
>>> or, more likely, migration_dirty_pages would wrap around -ve and
>>> never finish.
>>>
>>> Is there a reason it's really safe?
>>
>> No.  It is reasonably safe.  Various values of reasonably.
>>
>> migration_dirty_pages should never arrive at values near zero, because
>> we move to the completion stage way before it gets a value near zero.
>> (We could have very, very bad luck, as in it is not safe).
> 
> That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate;
> if we don't hit the rate limit (e.g. because we're CPU or network limited
> to slower than the set limit) then I think ram_save_iterate will go all the
> way to sending every page; if that happens it'll go once more
> around the main migration loop, call the pending routine, and now get
> a -ve (very +ve) number of pending pages, so it will continuously do ram_save_iterate
> again.
> 
> We've had that type of bug before when we messed up the dirty-pages calculation
> during hotplug.

IIUC, migration_bitmap_extend() is called when migration is running and we hotplug
a device.

In this case, I think we hold the iothread mutex when migration_bitmap_extend() is called.

ram_save_complete() is also protected by the iothread mutex.

So if migration_bitmap_extend() is called, the migration thread may be blocked in
migration_completion(), waiting for it. qemu_savevm_state_complete() will be called after
migration_completion() returns.
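
Something like this serialization, as a simplified sketch of the two paths
(not the exact code):

    hotplug path (holds iothread mutex)    migration thread
    -----------------------------------    ----------------
    qemu_mutex_lock_iothread()
      ...
      migration_bitmap_extend()            migration_completion():
      ...                                    qemu_mutex_lock_iothread() <- blocks
    qemu_mutex_unlock_iothread()             qemu_savevm_state_complete()
                                             qemu_mutex_unlock_iothread()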

Thanks
Wen Congyang

> 
>> Now, do we really care if migration_dirty_pages is exact?  Not really;
>> we just use it to calculate whether we should start the throttle or not.
>> That is only tested once per second, so if we have written a couple of
>> pages that we are not accounting for, things should be reasonably safe.
>>
>> That said, I don't know why we didn't catch that problem during
>> review (yes, I am guilty here).  Not sure how to really fix it,
>> though.  I think that the problem is more theoretical than real, but
> 
> Dave
> 
>> ....
>>
>> Thanks, Juan.
>>
>>>
>>> Dave
>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> .
> 

* Re: [Qemu-devel] safety of migration_bitmap_extend
From: Dr. David Alan Gilbert @ 2015-11-04  9:05 UTC
  To: Wen Congyang; +Cc: den, qemu-devel, lizhijian, Juan Quintela

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 11/03/2015 09:47 PM, Dr. David Alan Gilbert wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >>> Hi,
> >>>   I'm trying to understand why migration_bitmap_extend is correct/safe;
> >>> If I understand correctly, you're arguing that:
> >>>
> >>>   1) the migration_bitmap_mutex around the extend stops any syncs happening
> >>>      and so no new bits will be set during the extend.
> >>>
> >>>   2) If migration sends a page and clears a bitmap entry, it doesn't
> >>>      matter if we lose the 'clear' because we're copying it as
> >>>      we extend it: losing the clear just means the page
> >>>      gets resent, and so the data is OK.
> >>>
> >>> However, doesn't (2) mean that migration_dirty_pages might be wrong?
> >>> If a page was sent, the bit cleared, and migration_dirty_pages decremented,
> >>> then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
> >>> is too small; that means that either migration would finish too early
> >>> or, more likely, migration_dirty_pages would wrap around -ve and
> >>> never finish.
> >>>
> >>> Is there a reason it's really safe?
> >>
> >> No.  It is reasonably safe.  Various values of reasonably.
> >>
> >> migration_dirty_pages should never arrive at values near zero, because
> >> we move to the completion stage way before it gets a value near zero.
> >> (We could have very, very bad luck, as in it is not safe).
> > 
> > That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate;
> > if we don't hit the rate limit (e.g. because we're CPU or network limited
> > to slower than the set limit) then I think ram_save_iterate will go all the
> > way to sending every page; if that happens it'll go once more
> > around the main migration loop, call the pending routine, and now get
> > a -ve (very +ve) number of pending pages, so it will continuously do ram_save_iterate
> > again.
> > 
> > We've had that type of bug before when we messed up the dirty-pages calculation
> > during hotplug.
> 
> IIUC, migration_bitmap_extend() is called when migration is running and we hotplug
> a device.
> 
> In this case, I think we hold the iothread mutex when migration_bitmap_extend() is called.
> 
> ram_save_complete() is also protected by the iothread mutex.
>
> So if migration_bitmap_extend() is called, the migration thread may be blocked in
> migration_completion(), waiting for it. qemu_savevm_state_complete() will be called after
> migration_completion() returns.

But I don't think ram_save_iterate is protected by that lock, and my concern
is that the dirty-pages calculation is wrong during the iteration phase, and then
the iteration phase will never exit and never try and get to ram_save_complete.

Dave

> 
> Thanks
> Wen Congyang
> 
> > 
> >> Now, do we really care if migration_dirty_pages is exact?  Not really;
> >> we just use it to calculate whether we should start the throttle or not.
> >> That is only tested once per second, so if we have written a couple of
> >> pages that we are not accounting for, things should be reasonably safe.
> >>
> >> That said, I don't know why we didn't catch that problem during
> >> review (yes, I am guilty here).  Not sure how to really fix it,
> >> though.  I think that the problem is more theoretical than real, but
> > 
> > Dave
> > 
> >> ....
> >>
> >> Thanks, Juan.
> >>
> >>>
> >>> Dave
> >>>
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] safety of migration_bitmap_extend
From: Wen Congyang @ 2015-11-04  9:13 UTC
  To: Dr. David Alan Gilbert; +Cc: den, qemu-devel, lizhijian, Juan Quintela

On 11/04/2015 05:05 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 11/03/2015 09:47 PM, Dr. David Alan Gilbert wrote:
>>> * Juan Quintela (quintela@redhat.com) wrote:
>>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>>>> Hi,
>>>>>   I'm trying to understand why migration_bitmap_extend is correct/safe;
>>>>> If I understand correctly, you're arguing that:
>>>>>
>>>>>   1) the migration_bitmap_mutex around the extend stops any syncs happening
>>>>>      and so no new bits will be set during the extend.
>>>>>
>>>>>   2) If migration sends a page and clears a bitmap entry, it doesn't
>>>>>      matter if we lose the 'clear' because we're copying it as
>>>>>      we extend it: losing the clear just means the page
>>>>>      gets resent, and so the data is OK.
>>>>>
>>>>> However, doesn't (2) mean that migration_dirty_pages might be wrong?
>>>>> If a page was sent, the bit cleared, and migration_dirty_pages decremented,
>>>>> then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
>>>>> is too small; that means that either migration would finish too early
>>>>> or, more likely, migration_dirty_pages would wrap around -ve and
>>>>> never finish.
>>>>>
>>>>> Is there a reason it's really safe?
>>>>
>>>> No.  It is reasonably safe.  Various values of reasonably.
>>>>
>>>> migration_dirty_pages should never arrive at values near zero, because
>>>> we move to the completion stage way before it gets a value near zero.
>>>> (We could have very, very bad luck, as in it is not safe).
>>>
>>> That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate;
>>> if we don't hit the rate limit (e.g. because we're CPU or network limited
>>> to slower than the set limit) then I think ram_save_iterate will go all the
>>> way to sending every page; if that happens it'll go once more
>>> around the main migration loop, call the pending routine, and now get
>>> a -ve (very +ve) number of pending pages, so it will continuously do ram_save_iterate
>>> again.
>>>
>>> We've had that type of bug before when we messed up the dirty-pages calculation
>>> during hotplug.
>>
>> IIUC, migration_bitmap_extend() is called when migration is running and we hotplug
>> a device.
>>
>> In this case, I think we hold the iothread mutex when migration_bitmap_extend() is called.
>>
>> ram_save_complete() is also protected by the iothread mutex.
>>
>> So if migration_bitmap_extend() is called, the migration thread may be blocked in
>> migration_completion(), waiting for it. qemu_savevm_state_complete() will be called after
>> migration_completion() returns.
> 
> But I don't think ram_save_iterate is protected by that lock, and my concern
> is that the dirty-pages calculation is wrong during the iteration phase, and then
> the iteration phase will never exit and never try and get to ram_save_complete.

Yes, the dirty-pages count may be wrong. But it is smaller, not larger, than the exact value.
Why will the iteration phase never exit?

Thanks
Wen Congyang

> 
> Dave
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>> Now, do we really care if migration_dirty_pages is exact?  Not really;
>>>> we just use it to calculate whether we should start the throttle or not.
>>>> That is only tested once per second, so if we have written a couple of
>>>> pages that we are not accounting for, things should be reasonably safe.
>>>>
>>>> That said, I don't know why we didn't catch that problem during
>>>> review (yes, I am guilty here).  Not sure how to really fix it,
>>>> though.  I think that the problem is more theoretical than real, but
>>>
>>> Dave
>>>
>>>> ....
>>>>
>>>> Thanks, Juan.
>>>>
>>>>>
>>>>> Dave
>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

* Re: [Qemu-devel] safety of migration_bitmap_extend
From: Dr. David Alan Gilbert @ 2015-11-04  9:19 UTC
  To: Wen Congyang; +Cc: den, qemu-devel, lizhijian, Juan Quintela

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 11/04/2015 05:05 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> On 11/03/2015 09:47 PM, Dr. David Alan Gilbert wrote:
> >>> * Juan Quintela (quintela@redhat.com) wrote:
> >>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >>>>> Hi,
> >>>>>   I'm trying to understand why migration_bitmap_extend is correct/safe;
> >>>>> If I understand correctly, you're arguing that:
> >>>>>
> >>>>>   1) the migration_bitmap_mutex around the extend stops any syncs happening
> >>>>>      and so no new bits will be set during the extend.
> >>>>>
> >>>>>   2) If migration sends a page and clears a bitmap entry, it doesn't
> >>>>>      matter if we lose the 'clear' because we're copying it as
> >>>>>      we extend it: losing the clear just means the page
> >>>>>      gets resent, and so the data is OK.
> >>>>>
> >>>>> However, doesn't (2) mean that migration_dirty_pages might be wrong?
> >>>>> If a page was sent, the bit cleared, and migration_dirty_pages decremented,
> >>>>> then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
> >>>>> is too small; that means that either migration would finish too early
> >>>>> or, more likely, migration_dirty_pages would wrap around -ve and
> >>>>> never finish.
> >>>>>
> >>>>> Is there a reason it's really safe?
> >>>>
> >>>> No.  It is reasonably safe.  Various values of reasonably.
> >>>>
> >>>> migration_dirty_pages should never arrive at values near zero, because
> >>>> we move to the completion stage way before it gets a value near zero.
> >>>> (We could have very, very bad luck, as in it is not safe).
> >>>
> >>> That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate;
> >>> if we don't hit the rate limit (e.g. because we're CPU or network limited
> >>> to slower than the set limit) then I think ram_save_iterate will go all the
> >>> way to sending every page; if that happens it'll go once more
> >>> around the main migration loop, call the pending routine, and now get
> >>> a -ve (very +ve) number of pending pages, so it will continuously do ram_save_iterate
> >>> again.
> >>>
> >>> We've had that type of bug before when we messed up the dirty-pages calculation
> >>> during hotplug.
> >>
> >> IIUC, migration_bitmap_extend() is called when migration is running and we hotplug
> >> a device.
> >>
> >> In this case, I think we hold the iothread mutex when migration_bitmap_extend() is called.
> >>
> >> ram_save_complete() is also protected by the iothread mutex.
> >>
> >> So if migration_bitmap_extend() is called, the migration thread may be blocked in
> >> migration_completion(), waiting for it. qemu_savevm_state_complete() will be called after
> >> migration_completion() returns.
> > 
> > But I don't think ram_save_iterate is protected by that lock, and my concern
> > is that the dirty-pages calculation is wrong during the iteration phase, and then
> > the iteration phase will never exit and never try and get to ram_save_complete.
> 
> Yes, the dirty-pages count may be wrong. But it is smaller, not larger, than the exact value.
> Why will the iteration phase never exit?

Imagine that migration_dirty_pages is slightly too small and we enter ram_save_iterate;
ram_save_iterate now sends *all* its pages, decrementing migration_dirty_pages for
every page sent.  At the end of ram_save_iterate, migration_dirty_pages would be negative.
But migration_dirty_pages is *u*int64_t; so we exit ram_save_iterate,
go around the main migration_thread loop again and call qemu_savevm_state_pending, and
it returns a very large number (because it's actually a negative number), so we keep
going around the loop, because it never gets smaller.
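
The arithmetic in isolation (illustration only):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t migration_dirty_pages = 3;  /* slightly too small */
        int pages_actually_sent = 5;

        while (pages_actually_sent--) {
            migration_dirty_pages--;         /* one decrement per page sent */
        }
        /* prints 18446744073709551614; "remaining" now looks enormous */
        printf("%" PRIu64 "\n", migration_dirty_pages);
        return 0;
    }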

Dave

> 
> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>>> Now, do we really care if migration_dirty_pages is exact?  Not really;
> >>>> we just use it to calculate whether we should start the throttle or not.
> >>>> That is only tested once per second, so if we have written a couple of
> >>>> pages that we are not accounting for, things should be reasonably safe.
> >>>>
> >>>> That said, I don't know why we didn't catch that problem during
> >>>> review (yes, I am guilty here).  Not sure how to really fix it,
> >>>> though.  I think that the problem is more theoretical than real, but
> >>>
> >>> Dave
> >>>
> >>>> ....
> >>>>
> >>>> Thanks, Juan.
> >>>>
> >>>>>
> >>>>> Dave
> >>>>>
> >>>>> --
> >>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>
> >>> .
> >>>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] safety of migration_bitmap_extend
From: Wen Congyang @ 2015-11-12  8:33 UTC
  To: Dr. David Alan Gilbert; +Cc: den, qemu-devel, lizhijian, Juan Quintela

On 11/04/2015 05:19 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 11/04/2015 05:05 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>> On 11/03/2015 09:47 PM, Dr. David Alan Gilbert wrote:
>>>>> * Juan Quintela (quintela@redhat.com) wrote:
>>>>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>>>>>> Hi,
>>>>>>>   I'm trying to understand why migration_bitmap_extend is correct/safe;
>>>>>>> If I understand correctly, you're arguing that:
>>>>>>>
>>>>>>>   1) the migration_bitmap_mutex around the extend stops any syncs happening
>>>>>>>      and so no new bits will be set during the extend.
>>>>>>>
>>>>>>>   2) If migration sends a page and clears a bitmap entry, it doesn't
>>>>>>>      matter if we lose the 'clear' because we're copying it as
>>>>>>>      we extend it: losing the clear just means the page
>>>>>>>      gets resent, and so the data is OK.
>>>>>>>
>>>>>>> However, doesn't (2) mean that migration_dirty_pages might be wrong?
>>>>>>> If a page was sent, the bit cleared, and migration_dirty_pages decremented,
>>>>>>> then if we copy over that bitmap and 'set' that bit again, migration_dirty_pages
>>>>>>> is too small; that means that either migration would finish too early
>>>>>>> or, more likely, migration_dirty_pages would wrap around -ve and
>>>>>>> never finish.
>>>>>>>
>>>>>>> Is there a reason it's really safe?
>>>>>>
>>>>>> No.  It is reasonably safe.  Various values of reasonably.
>>>>>>
>>>>>> migration_dirty_pages should never arrive at values near zero, because
>>>>>> we move to the completion stage way before it gets a value near zero.
>>>>>> (We could have very, very bad luck, as in it is not safe).
>>>>>
>>>>> That's only true if we hit the qemu_file_rate_limit() in ram_save_iterate;
>>>>> if we don't hit the rate limit (e.g. because we're CPU or network limited
>>>>> to slower than the set limit) then I think ram_save_iterate will go all the
>>>>> way to sending every page; if that happens it'll go once more
>>>>> around the main migration loop, call the pending routine, and now get
>>>>> a -ve (very +ve) number of pending pages, so it will continuously do ram_save_iterate
>>>>> again.
>>>>>
>>>>> We've had that type of bug before when we messed up the dirty-pages calculation
>>>>> during hotplug.
>>>>
>>>> IIUC, migration_bitmap_extend() is called when migration is running and we hotplug
>>>> a device.
>>>>
>>>> In this case, I think we hold the iothread mutex when migration_bitmap_extend() is called.
>>>>
>>>> ram_save_complete() is also protected by the iothread mutex.
>>>>
>>>> So if migration_bitmap_extend() is called, the migration thread may be blocked in
>>>> migration_completion(), waiting for it. qemu_savevm_state_complete() will be called after
>>>> migration_completion() returns.
>>>
>>> But I don't think ram_save_iterate is protected by that lock, and my concern
>>> is that the dirty-pages calculation is wrong during the iteration phase, and then
>>> the iteration phase will never exit and never try and get to ram_save_complete.
>>
>> Yes, the dirty-pages count may be wrong. But it is smaller, not larger, than the exact value.
>> Why will the iteration phase never exit?
> 
> Imagine that migration_dirty_pages is slightly too small and we enter ram_save_iterate;
> ram_save_iterate now sends *all* its pages, decrementing migration_dirty_pages for
> every page sent.  At the end of ram_save_iterate, migration_dirty_pages would be negative.
> But migration_dirty_pages is *u*int64_t; so we exit ram_save_iterate,
> go around the main migration_thread loop again and call qemu_savevm_state_pending, and
> it returns a very large number (because it's actually a negative number), so we keep
> going around the loop, because it never gets smaller.

I don't know how to trigger the problem. I think storing migration_dirty_pages in BitmapRcu
can fix this problem.

Thanks
Wen Congyang

> 
> Dave
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> Dave
>>>
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>> Now, do we really care if migration_dirty_pages is exact?  Not really;
>>>>>> we just use it to calculate whether we should start the throttle or not.
>>>>>> That is only tested once per second, so if we have written a couple of
>>>>>> pages that we are not accounting for, things should be reasonably safe.
>>>>>>
>>>>>> That said, I don't know why we didn't catch that problem during
>>>>>> review (yes, I am guilty here).  Not sure how to really fix it,
>>>>>> though.  I think that the problem is more theoretical than real, but
>>>>>
>>>>> Dave
>>>>>
>>>>>> ....
>>>>>>
>>>>>> Thanks, Juan.
>>>>>>
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> --
>>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>
>>>>> .
>>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

* Re: [Qemu-devel] safety of migration_bitmap_extend
From: Li Zhijian @ 2015-11-13  8:55 UTC
  To: Wen Congyang, Dr. David Alan Gilbert; +Cc: den, qemu-devel, Juan Quintela


On 11/12/2015 04:33 PM, Wen Congyang wrote:

>> >Imagine that migration_dirty_pages is slightly too small and we enter ram_save_iterate;
>> >ram_save_iterate now sends *all* its pages, decrementing migration_dirty_pages for
>> >every page sent.  At the end of ram_save_iterate, migration_dirty_pages would be negative.
>> >But migration_dirty_pages is *u*int64_t; so we exit ram_save_iterate,
>> >go around the main migration_thread loop again and call qemu_savevm_state_pending, and
>> >it returns a very large number (because it's actually a negative number), so we keep
>> >going around the loop, because it never gets smaller.
> I don't know how to trigger the problem. I think storing migration_dirty_pages in BitmapRcu
> can fix this problem.
>
>
hi, David

It seems that it's not easy to reproduce this problem in my environment,
and the following two patches are meant to fix this issue; can you help to review and test them?
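
For review convenience, this is the behaviour the new bitmap_weight() helper
in patch 1 is intended to have (illustrative values, not taken from the patch):

    unsigned long map[2] = { 0xffUL, 0x1UL };
    bitmap_weight(map, 2 * BITS_PER_LONG);  /* == 9 (8 + 1 set bits)          */
    bitmap_weight(map, 4);                  /* == 4 (only low 4 bits counted) */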

thx

Li


[-- Attachment #2: 0002-migration-use-rcu-to-protect-migration_dirty_pages.patch --]
[-- Type: text/x-patch, Size: 4978 bytes --]

From 81f27571b3ec2be2afb76576b47f42fb96a06411 Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Fri, 13 Nov 2015 13:16:25 +0800
Subject: [PATCH 2/2] migration: use rcu to protect migration_dirty_pages

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 migration/ram.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7f32696..ef259f9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -221,7 +221,6 @@ static RAMBlock *last_seen_block;
 static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
 static QemuMutex migration_bitmap_mutex;
-static uint64_t migration_dirty_pages;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
@@ -246,6 +245,7 @@ static struct BitmapRcu {
      * of the postcopy phase
      */
     unsigned long *unsentmap;
+    uint64_t dirty_pages;
 } *migration_bitmap_rcu;
 
 struct CompressParam {
@@ -580,7 +580,7 @@ static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
     ret = test_and_clear_bit(nr, bitmap);
 
     if (ret) {
-        migration_dirty_pages--;
+        atomic_rcu_read(&migration_bitmap_rcu)->dirty_pages--;
     }
     return ret;
 }
@@ -589,7 +589,7 @@ static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
     unsigned long *bitmap;
     bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
-    migration_dirty_pages +=
+    atomic_rcu_read(&migration_bitmap_rcu)->dirty_pages +=
         cpu_physical_memory_sync_dirty_bitmap(bitmap, start, length);
 }
 
@@ -613,7 +613,7 @@ static void migration_bitmap_sync_init(void)
 static void migration_bitmap_sync(void)
 {
     RAMBlock *block;
-    uint64_t num_dirty_pages_init = migration_dirty_pages;
+    uint64_t num_dirty_pages_init, num_dirty_pages_new;
     MigrationState *s = migrate_get_current();
     int64_t end_time;
     int64_t bytes_xfer_now;
@@ -633,15 +633,17 @@ static void migration_bitmap_sync(void)
 
     qemu_mutex_lock(&migration_bitmap_mutex);
     rcu_read_lock();
+    num_dirty_pages_init = atomic_rcu_read(&migration_bitmap_rcu)->dirty_pages;
     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
         migration_bitmap_sync_range(block->offset, block->used_length);
     }
+    num_dirty_pages_new = atomic_rcu_read(&migration_bitmap_rcu)->dirty_pages;
     rcu_read_unlock();
     qemu_mutex_unlock(&migration_bitmap_mutex);
 
-    trace_migration_bitmap_sync_end(migration_dirty_pages
+    trace_migration_bitmap_sync_end(num_dirty_pages_new
                                     - num_dirty_pages_init);
-    num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
+    num_dirty_pages_period += num_dirty_pages_new - num_dirty_pages_init;
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
     /* more than 1 second = 1000 millisecons */
@@ -1364,7 +1366,13 @@ void acct_update_position(QEMUFile *f, size_t size, bool zero)
 
 static ram_addr_t ram_save_remaining(void)
 {
-    return migration_dirty_pages;
+    ram_addr_t dirty_pages;
+
+    rcu_read_lock();
+    dirty_pages = atomic_rcu_read(&migration_bitmap_rcu)->dirty_pages;
+    rcu_read_unlock();
+
+    return dirty_pages;
 }
 
 uint64_t ram_bytes_remaining(void)
@@ -1454,7 +1462,9 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
          */
         qemu_mutex_lock(&migration_bitmap_mutex);
         bitmap_copy(bitmap->bmap, old_bitmap->bmap, old);
+        bitmap->dirty_pages = bitmap_weight(bitmap->bmap, old);
         bitmap_set(bitmap->bmap, old, new - old);
+        bitmap->dirty_pages += new - old;
 
         /* We don't have a way to safely extend the sentmap
          * with RCU; so mark it as missing, entry to postcopy
@@ -1464,7 +1474,6 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
 
         atomic_rcu_set(&migration_bitmap_rcu, bitmap);
         qemu_mutex_unlock(&migration_bitmap_mutex);
-        migration_dirty_pages += new - old;
         call_rcu(old_bitmap, migration_bitmap_free, rcu);
     }
 }
@@ -1692,7 +1701,8 @@ static void postcopy_chunk_hostpages_pass(MigrationState *ms, bool unsent_pass,
                  * Remark them as dirty, updating the count for any pages
                  * that weren't previously dirty.
                  */
-                migration_dirty_pages += !test_and_set_bit(page, bitmap);
+                atomic_rcu_read(&migration_bitmap_rcu)->dirty_pages +=
+                    !test_and_set_bit(page, bitmap);
             }
         }
 
@@ -1924,7 +1934,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
      * Count the total number of pages used by ram blocks not including any
      * gaps due to alignment or unplugs.
      */
-    migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
+    migration_bitmap_rcu->dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
 
     memory_global_dirty_log_start();
     migration_bitmap_sync();
-- 
2.1.4


[-- Attachment #3: 0001-bitmap-add-bitmap_weight-api.patch --]
[-- Type: text/x-patch, Size: 1590 bytes --]

From 7a2cffb8802e955ede85fb907e94a7172b4a0fa6 Mon Sep 17 00:00:00 2001
From: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Fri, 13 Nov 2015 15:51:48 +0800
Subject: [PATCH 1/2] bitmap: add bitmap_weight() api

count the number of bits set in a bitmap

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 include/qemu/bitmap.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 86dd9cd..6e48429 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -43,6 +43,7 @@
  * bitmap_clear(dst, pos, nbits)		Clear specified bit area
  * bitmap_test_and_clear_atomic(dst, pos, nbits)    Test and clear area
  * bitmap_find_next_zero_area(buf, len, pos, n, mask)	Find bit free area
+ * bitmap_weight(src, nbits)                    Hamming Weight: number set bits
  */
 
 /*
@@ -227,6 +228,23 @@ static inline int bitmap_intersects(const unsigned long *src1,
     }
 }
 
+static inline int bitmap_weight(const unsigned long *src, long nbits)
+{
+    int i, count = 0, nlong = nbits / BITS_PER_LONG;
+
+    if (small_nbits(nbits)) {
+        return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
+    }
+    for (i = 0; i < nlong; i++) {
+        count += hweight_long(src[i]);
+    }
+    if (nbits % BITS_PER_LONG) {
+        count += hweight_long(src[i] & BITMAP_LAST_WORD_MASK(nbits));
+    }
+
+    return count;
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
2.1.4

