From: Wei Wang <wei.w.wang@intel.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org,
	quintela@redhat.com, dgilbert@redhat.com, pbonzini@redhat.com,
	liliang.opensource@gmail.com, yang.zhang.wz@gmail.com,
	quan.xu0@gmail.com, nilal@redhat.com, riel@redhat.com,
	zhang.zhanghailiang@huawei.com
Subject: Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
Date: Thu, 07 Jun 2018 19:59:22 +0800
Message-ID: <5B191E1A.1000605@intel.com>
In-Reply-To: <20180607063241.GA750@xz-mi>

On 06/07/2018 02:32 PM, Peter Xu wrote:
> On Thu, Jun 07, 2018 at 01:24:29PM +0800, Wei Wang wrote:
>> On 06/06/2018 07:02 PM, Peter Xu wrote:
>>> On Wed, Jun 06, 2018 at 06:04:23PM +0800, Wei Wang wrote:
>>>> On 06/06/2018 01:42 PM, Peter Xu wrote:
>>>>> IMHO migration states do not suit here.  Bitmap syncing is too
>>>>> frequent an operation, especially at the end of a precopy migration.
>>>>> If you really want to introduce some notifiers, I would prefer
>>>>> something new rather than fiddling around with the migration state.
>>>>> E.g., maybe new migration event notifiers, introducing two new
>>>>> events for the start/end of bitmap syncing.
>>>> Please see if the below aligns with what you meant:
>>>>
>>>> MigrationState {
>>>>     ...
>>>> +   int ram_save_state;
>>>> }
>>>>
>>>> typedef enum RamSaveState {
>>>>     RAM_SAVE_BEGIN = 0,
>>>>     RAM_SAVE_END = 1,
>>>>     RAM_SAVE_MAX = 2,
>>>> } RamSaveState;
>>>>
>>>> Then at steps 1) and 3), which you concluded somewhere below, we
>>>> change the state and invoke the callback.
>>> I mean something like this:
>>>
>>> 1693c64c27 ("postcopy: Add notifier chain", 2018-03-20)
>>>
>>> That was a postcopy-only notifier.  Maybe we can generalize it into a
>>> more common notifier for the migration framework so that we can even
>>> register for non-postcopy events like bitmap syncing?
>> Precopy already has its own notifiers: commit 99a0db9b.
>> If we want to reuse something, that one would be more suitable. I think
>> mixing unrelated events into one notifier list isn't nice.
> I think that's only for migration state changes?
>
>>>> Btw, migration_state_notifiers is already there, but it seems not
>>>> really used (I only found spice-core.c calling
>>>> add_migration_state_change_notifier). I thought adding new migration
>>>> states could reuse all that we have.
>>>> What's your real concern about that? (I'm not sure how defining new
>>>> events would make a difference.)
>>> Migration state is exposed via the control path (QMP).  Adding new
>>> states means that QMP clients will see more.  IMO that's not really
>>> anything a QMP client needs to know; instead we can keep it
>>> internal.  That's one reason, from a compatibility point of view.
>>>
>>> Meanwhile, it's not really a state at all to me.  It looks
>>> much more like a hook or an event (start/stop of sync).
>> Thanks for sharing your concerns in detail; they are quite helpful for
>> the discussion. To reuse 99a0db9b, we can also add sub-states (or, say,
>> events) instead of new migration states.
>> For example, we can still define "enum RamSaveState" as above, which
>> can be an indication for a notifier queued on the 99a0db9b
>> notifier_list to decide whether to call start or stop.
>> Does this solve your concern?
> Frankly speaking I don't fully understand how you would add that
> sub-state.  If you are confident with the idea, maybe you can post
> your new version with the change, then I can read the code.

Sure. Code is more straightforward for this one; let's look at it in the
new version.
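
To make the idea concrete, here is a minimal sketch of how such an event
could be delivered through a generic notifier list. The names
precopy_notifier_list and PRECOPY_NOTIFY_* are illustrative placeholders
(they mirror the RamSaveState enum quoted above), not the final patch:

    #include "qemu/notify.h"

    /* Illustrative only: values mirror RamSaveState from the quote
     * above; none of these names exist in the tree yet. */
    typedef enum PrecopyNotifyReason {
        PRECOPY_NOTIFY_RAM_SAVE_BEGIN = 0,
        PRECOPY_NOTIFY_RAM_SAVE_END = 1,
        PRECOPY_NOTIFY_MAX = 2,
    } PrecopyNotifyReason;

    static NotifierList precopy_notifier_list =
        NOTIFIER_LIST_INITIALIZER(precopy_notifier_list);

    void precopy_add_notifier(Notifier *n)
    {
        notifier_list_add(&precopy_notifier_list, n);
    }

    void precopy_notify(PrecopyNotifyReason reason)
    {
        /* Each registered callback receives the reason and decides
         * whether to start or stop free page hinting. */
        notifier_list_notify(&precopy_notifier_list, &reason);
    }

The balloon device would then register a Notifier whose callback checks
the reason and starts or stops the hint polling accordingly.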

>>>>> This is not that obvious to me.  For now I think it's true: when
>>>>> we call stop() we'll take the mutex, while the mutex is actually
>>>>> always held by the iothread (in the big loop in
>>>>> virtio_balloon_poll_free_page_hints) until either:
>>>>>
>>>>> - it sleeps in qemu_cond_wait() [1], or
>>>>> - it leaves the big loop [2]
>>>>>
>>>>> Since I don't see anyone who will set dev->block_iothread to true
>>>>> for the balloon device, [1] cannot happen;
>>>> There is a case in virtio_balloon_set_status which sets
>>>> dev->block_iothread to true.
>>>>
>>>> Did you mean the free_page_lock mutex? It is released at the bottom
>>>> of the while () loop in virtio_balloon_poll_free_page_hints; it's
>>>> actually released for every hint. That is,
>>>>
>>>> while (1) {
>>>>     take the lock;
>>>>     process one hint from the vq;
>>>>     release the lock;
>>>> }
>>> Ah, so now I understand why you need the lock to be inside the loop:
>>> the loop is actually busy polling.  Is it possible to do this in
>>> an async way?
>> We need to use polling here because of some back story on the guest
>> side (some locks being held) that makes it prohibitive to send a
>> notification for each hint.
> Any link to the "back story" that I can learn about? :) If it's too
> complicated a problem and you think I don't need to understand it at
> all, please feel free to say so.

I searched a little bit and forgot where we discussed this one, but the
conclusion was that we don't want a kick to happen while the mm lock is
held. Polling also looks like a good fit here to me.
There are 32 versions of kernel patch discussions scattered around;
interesting to read, but it might take too much time. Also, people
usually have different thoughts (sometimes with a partial understanding)
when they look at something (we even had many different versions of the
implementation ourselves, if you check all 32 versions). It wasn't easy
to reach consensus. That's why I hope our discussion can stay focused on
the migration part, which is the last part that has not been fully
finalized.
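
To make the shape of that polling concrete, here is a simplified sketch
of the loop under discussion. The structure follows this series, but the
field names and details are abridged and should be treated as
illustrative rather than the exact patch:

    static void virtio_balloon_poll_free_page_hints(VirtIOBalloon *dev)
    {
        VirtQueueElement *elem;
        VirtQueue *vq = dev->free_page_vq;

        while (dev->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
            qemu_mutex_lock(&dev->free_page_lock);
            /* Busy poll: the guest does not kick per hint, so we just
             * pop whatever is in the vq. */
            elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
            if (elem) {
                /* ... use the hinted range to clear bits in the
                 * dirty bitmap ... */
                virtqueue_push(vq, elem, 0);
                g_free(elem);
            }
            qemu_mutex_unlock(&dev->free_page_lock);
        }
    }

Note how the lock is taken and released once per hint, which is why
stop() can grab it between iterations.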



> Then I would assume at least Michael has
> fully acknowledged that idea, and I can just stop putting more time
> into this part.

Yes, he's been in the loop since the beginning.


>
> Besides, if you are going to use a busy loop, then I would not be
> quite sure whether you really want to share that iothread with
> others, since AFAIU that's not how an iothread is designed (it is
> mostly event-based and should not welcome out-of-control blocking in
> the handlers of events).  Though I am not 100% confident in my
> understanding of that, I only raise the question.  Anyway, you'll
> just take over the thread for a while without sharing, and after the
> burst of IOs it's mostly never used (until the next bitmap sync).  So
> it seems a bit confusing to me why you need to share it after all.

We don't necessarily _need_ to share it; I meant it can be shared via
the QEMU command line.
Live migration doesn't happen all the time, and that optimization
doesn't run for long, so if users want other BHs to run in this iothread
context, they can just create one iothread via the QEMU command line and
share it.
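
For illustration, sharing could look something like the command line
below. The free-page-hint property name is taken from this series and
the drive id is made up, so treat the exact spelling as an assumption:

    qemu-system-x86_64 ... \
        -object iothread,id=iothread0 \
        -device virtio-balloon-pci,free-page-hint=on,iothread=iothread0 \
        -device virtio-blk-pci,drive=drive0,iothread=iothread0

Here the balloon's hint polling and the virtio-blk BHs would run in the
same iothread0 context.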


>
>>> I'm a bit curious how much time it will take to do
>>> one round of the free page hints (e.g., an idle guest with 8G mem, or
>>> any configuration you tested)?  I suppose during that time the
>>> iothread will be held steady at 100% cpu usage, am I right?
>> Compared to the time the legacy migration spends sending free pages,
>> that small amount of CPU usage spent on filtering free pages is
>> negligible. Sharpening the axe will not hold up the work of cutting
>> firewood :)
> Sorry I didn't express myself clearly.
>
> My question was: have you measured how long it takes
> from the start of the free page hints (when the balloon state is set to
> FREE_PAGE_REPORT_S_REQUESTED) until it completes (when QEMU receives
> VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID and then sets the status to
> FREE_PAGE_REPORT_S_STOP)?
>

I vaguely remember it was several ms (for around 7.5G of free pages),
measured a long time ago. What is the concern behind the number you want
to know?

Best,
Wei


