From: Peter Xu <peterx@redhat.com>
To: Wei Wang <wei.w.wang@intel.com>
Cc: virtio-dev@lists.oasis-open.org, quintela@redhat.com,
	liliang.opensource@gmail.com, mst@redhat.com,
	qemu-devel@nongnu.org, dgilbert@redhat.com, pbonzini@redhat.com,
	nilal@redhat.com
Subject: Re: [Qemu-devel] [PATCH v9 5/8] migration/ram.c: add a notifier chain for precopy
Date: Thu, 29 Nov 2018 13:47:22 +0800
Message-ID: <20181129054722.GD29246@xz-x1>
In-Reply-To: <20181129051014.GC29246@xz-x1>

On Thu, Nov 29, 2018 at 01:10:14PM +0800, Peter Xu wrote:
> On Thu, Nov 29, 2018 at 11:40:57AM +0800, Wei Wang wrote:
> > On 11/28/2018 05:32 PM, Peter Xu wrote:
> > > 
> > > So what I am worrying here are corner cases where we might forget to
> > > stop the hinting.  I'm fabricating one example sequence of events:
> > > 
> > >    (start migration)
> > >    START_MIGRATION
> > >    BEFORE_SYNC
> > >    AFTER_SYNC
> > >    ...
> > >    BEFORE_SYNC
> > >    AFTER_SYNC
> > >    (some SaveStateEntry failed rather than RAM, then
> > >     migration_detect_error returned MIG_THR_ERR_FATAL so we need to
> > >     fail the migration, however when running the previous
> > >     ram_save_iterate for RAM's specific SaveStateEntry we didn't see
> > >     any error so no ERROR event detected)
> > > 
> > > Then it seems the hinting will last forever.  Considering that now I'm
> > > not sure whether this can be done ram-only, since even if you capture
> > > ram_save_complete() and at the same time you introduce PRECOPY_END you
> > > may still miss the PRECOPY_END event since AFAIU ram_save_complete()
> > > won't be called at all in this case.
> > > 
> > > Could this happen?
> > 
> > Thanks, indeed this case could happen if we add PRECOPY_END in
> > ram_save_complete.
> > 
> > How about putting PRECOPY_END in ram_save_cleanup?
> > I think it would be called in any case.
> 
> Sounds good.
> 
> > 
> > I'm also thinking probably we don't need PRECOPY_ERR when we have
> > PRECOPY_END,
> > and what do you think of the notifier names below:
> > 
> > +typedef enum PrecopyNotifyReason {
> > +    PRECOPY_NOTIFY_RAM_SAVE_END = 0,
> > +    PRECOPY_NOTIFY_RAM_SAVE_START = 1,
> > +    PRECOPY_NOTIFY_RAM_SAVE_BEFORE_SYNC_BITMAP = 2,
> > +    PRECOPY_NOTIFY_RAM_SAVE_AFTER_SYNC_BITMAP = 3,
> > +    PRECOPY_NOTIFY_RAM_SAVE_MAX = 4,
> > +} PrecopyNotifyReason;
> 
> (please see below [1]...)
> 
> > 
> > 
> > > 
> > > > 
> > > > > [1]
> > > > > 
> > > > > > > Another thing to mention about the "reasons" (though I see it more
> > > > > > > like "events"): have you thought about adding a PRECOPY_NOTIFY_END?
> > > > > > > It might help in some cases:
> > > > > > > 
> > > > > > >      - then you don't need to trickily export the migrate_postcopy()
> > > > > > >        since you'll notify that before postcopy starts
> > > > > > I'm thinking probably we don't need to export migrate_postcopy even now.
> > > > > > It's more like a sanity check, and not needed because now we have the
> > > > > > notifier registered to the precopy specific callchain, which has ensured
> > > > > > that
> > > > > > it is invoked via precopy.
> > > > > But postcopy will always start with precopy, no?
> > > > Yes, but I think we could add the check in precopy_notify()
> > > I'm not sure that's good.  If the notifier could potentially have
> > > other users, they might still work with postcopy, and they might expect
> > > e.g. BEFORE_SYNC to be called for every sync, even if it's at the
> > > precopy stage of a postcopy.
> > 
> > I think this precopy notifier callchain is expected to be used only for
> > the precopy mode. Postcopy has its dedicated notifier callchain that
> > users could use.
> > 
> > How about changing the migrate_postcopy() check to "ms->start_postcopy":
> > 
> > bool migration_postcopy_start(void)
> > {
> >     MigrationState *s;
> > 
> >     s = migrate_get_current();
> > 
> >     return atomic_read(&s->start_postcopy);
> > }
> > 
> > 
> > static void precopy_notify(PrecopyNotifyReason reason)
> > {
> >     if (migration_postcopy_start())
> >         return;
> > 
> >     notifier_list_notify(&precopy_notifier_list, &reason);
> > }
> > 
> > If postcopy started with precopy, the precopy optimization feature
> > could still be used until it switches to the postcopy mode.
> 
> I'm not sure we can use start_postcopy.  It's a variable being set in
> the QMP handler but it does not mean postcopy has started.  I'm afraid
> there can be a race where it's still precopy but the variable is set, so
> an event could be missed...
> 
> IMHO the problem is not that complicated.  How about this proposal:
> 
> [1]
> 
>   typedef enum PrecopyNotifyReason {
>     PRECOPY_NOTIFY_RAM_START,
>     PRECOPY_NOTIFY_RAM_BEFORE_SYNC,
>     PRECOPY_NOTIFY_RAM_AFTER_SYNC,
>     PRECOPY_NOTIFY_COMPLETE,
>     PRECOPY_NOTIFY_RAM_CLEANUP,
>   } PrecopyNotifyReason;
> 
> The first three stay the same as your old ones.  Notify RAM_CLEANUP in
> ram_save_cleanup() to make sure it'll always be cleaned up (the same
> as PRECOPY_END, just another name).  Notify COMPLETE in
> qemu_savevm_state_complete_precopy() to show that precopy is
> completed.  Meanwhile on balloon side you should stop the hinting for
> either RAM_CLEANUP or COMPLETE event.  Then either:
> 
>   - precopy is switching to postcopy, or
>   - precopy completed, or
>   - precopy failed/cancelled
> 
> You should always get at least a notification to stop the balloon.
> Though you could also get one RAM_CLEANUP after one COMPLETE, but
> the balloon should easily handle it (stop the hinting twice).
> 
> Here maybe you can even remove the "RAM_" in both RAM_START and
> RAM_CLEANUP if we're going to have COMPLETE since after all it'll be
> not only limited to RAM.

Oh, maybe we can remove all the RAM_ prefixes to make it general to
precopy...

  typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_SETUP,
    PRECOPY_NOTIFY_BEFORE_SYNC,
    PRECOPY_NOTIFY_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_CLEANUP,
  } PrecopyNotifyReason;

Then we can just hook everything with the corresponding names:

  SETUP:    hooks with qemu_savevm_state_setup
  COMPLETE: hooks with qemu_savevm_state_complete_precopy
  CLEANUP:  hooks with qemu_savevm_state_cleanup

I'm not sure whether you'll need another hook in ram_state_reset in
the future, but I don't see it as necessary for now: I don't think
ram_list.version would change during migration, so ram_state_reset
should only be called during setup.
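
To make the intended behaviour concrete, here is a minimal standalone
sketch of the five events and a balloon-side handler that stops hinting
on either COMPLETE or CLEANUP.  It is not QEMU code: the handler table,
precopy_add_handler(), BalloonHint and the simulate_*() helpers are all
hypothetical stand-ins for the real notifier list, and starting the
hinting at SETUP is only an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event names follow the proposal above; everything else is a stand-in. */
typedef enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_SETUP,
    PRECOPY_NOTIFY_BEFORE_SYNC,
    PRECOPY_NOTIFY_AFTER_SYNC,
    PRECOPY_NOTIFY_COMPLETE,
    PRECOPY_NOTIFY_CLEANUP,
} PrecopyNotifyReason;

/* Hypothetical handler type (not QEMU's notifier API). */
typedef void (*PrecopyHandler)(PrecopyNotifyReason reason, void *opaque);

#define MAX_HANDLERS 8

static PrecopyHandler handlers[MAX_HANDLERS];
static void *handler_opaque[MAX_HANDLERS];
static size_t nr_handlers;

/* Register a handler; returns false when the fixed-size table is full. */
static bool precopy_add_handler(PrecopyHandler fn, void *opaque)
{
    assert(fn != NULL);
    if (nr_handlers == MAX_HANDLERS) {
        return false;
    }
    handlers[nr_handlers] = fn;
    handler_opaque[nr_handlers] = opaque;
    nr_handlers++;
    return true;
}

/* Deliver one event to every registered handler, in registration order. */
static void precopy_notify(PrecopyNotifyReason reason)
{
    for (size_t i = 0; i < nr_handlers; i++) {
        handlers[i](reason, handler_opaque[i]);
    }
}

/* Hypothetical balloon state: counters exist only to observe behaviour. */
typedef struct BalloonHint {
    bool hinting;
    int starts;
    int stops;
} BalloonHint;

static void balloon_precopy_handler(PrecopyNotifyReason reason, void *opaque)
{
    BalloonHint *b = opaque;

    switch (reason) {
    case PRECOPY_NOTIFY_SETUP:
        /* Assumption: hinting starts at setup. */
        b->hinting = true;
        b->starts++;
        break;
    case PRECOPY_NOTIFY_COMPLETE:
    case PRECOPY_NOTIFY_CLEANUP:
        /* Stop is idempotent: a CLEANUP after a COMPLETE is a no-op. */
        if (b->hinting) {
            b->hinting = false;
            b->stops++;
        }
        break;
    default:
        break;
    }
}

/* A non-RAM device fails: COMPLETE is never reached, only CLEANUP fires. */
static void simulate_failed_migration(void)
{
    precopy_notify(PRECOPY_NOTIFY_SETUP);
    precopy_notify(PRECOPY_NOTIFY_BEFORE_SYNC);
    precopy_notify(PRECOPY_NOTIFY_AFTER_SYNC);
    precopy_notify(PRECOPY_NOTIFY_CLEANUP);
}

/* Normal path: COMPLETE first, then CLEANUP arrives as a second stop. */
static void simulate_successful_migration(void)
{
    precopy_notify(PRECOPY_NOTIFY_SETUP);
    precopy_notify(PRECOPY_NOTIFY_BEFORE_SYNC);
    precopy_notify(PRECOPY_NOTIFY_AFTER_SYNC);
    precopy_notify(PRECOPY_NOTIFY_COMPLETE);
    precopy_notify(PRECOPY_NOTIFY_CLEANUP);
}
```

Registering balloon_precopy_handler with a BalloonHint and then running
either simulate_*() helper should leave hinting == false in both cases,
with exactly one effective stop per run, which is the property the
COMPLETE plus CLEANUP pairing is meant to guarantee.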

> 
> Another suggestion is that you can add an Error into the notify hooks,
> please refer to the postcopy one:
> 
>   int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
> 
> So the hook functions have a way to even stop the migration (though
> for balloon hinting it'll be always optional so no error should be
> reported...), then the two interfaces are matched.
> 
> > 
> > 
> > 
> > > In that sense I still feel the
> > > PRECOPY_END is better (so constantly call it at the end of precopy, no
> > > matter whether there's another postcopy afterwards).  It sounds like a
> > > cleaner interface.
> > 
> > Probably I still haven't got the point how PRECOPY_END could help above yet.
> 
> Please have a look at above proposal.  Thanks,
> 
> -- 
> Peter Xu
> 

Regards,

-- 
Peter Xu
