All of lore.kernel.org
 help / color / mirror / Atom feed
From: Raphael Norwitz <raphael.s.norwitz@gmail.com>
To: Dima Stepanov <dimastep@yandex-team.ru>,
	mst@redhat.com, fengli@smartx.com
Cc: fam@euphon.net, kwolf@redhat.com, stefanha@redhat.com,
	qemu-block@nongnu.org, jasowang@redhat.com,
	qemu-devel@nongnu.org, dgilbert@redhat.com,
	raphael.norwitz@nutanix.com, arei.gonglei@huawei.com,
	yc-core@yandex-team.ru, pbonzini@redhat.com,
	marcandre.lureau@redhat.com, mreitz@redhat.com
Subject: Re: [PATCH v2 5/5] vhost: add device started check in migration set log
Date: Wed, 6 May 2020 18:08:34 -0400	[thread overview]
Message-ID: <CAFubqFsvn+wNm-rAU1sLBic6J70-ZP_UT4xMDcvGeZ9TarYdjw@mail.gmail.com> (raw)
In-Reply-To: <d25241eb1fe7a55fc7dbe63ecedb4f1adf407837.1588252862.git.dimastep@yandex-team.ru>

As you correctly point out, this code needs to be looked at more
carefully so that
if the device does disconnect in the background we can handle the migration path
gracefully. In particular, we need to decide whether a migration
should be allowed
to continue if a device disconnects durning the migration stage.

mst, any thoughts?

Have you looked at the suggestion I gave Li Feng to move vhost_dev_cleanup()
into the connection path in vhost-user-blk? I’m not sure if he’s
actively working on it,
but I would prefer if we can find a way to keep some state around
between reconnects
so we aren’t constantly checking dev->started. A device can be stopped
for reasons
other than backend disconnect so I’d rather not reuse this field to
check for backend
disconnect failures.

On Thu, Apr 30, 2020 at 9:57 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> If vhost-user daemon is used as a backend for the vhost device, then we
> should consider a possibility of disconnect at any moment. If such
> disconnect happened in the vhost_migration_log() routine the vhost
> device structure will be clean up.
> At the start of the vhost_migration_log() function there is a check:
>   if (!dev->started) {
>       dev->log_enabled = enable;
>       return 0;
>   }
> To be consistent with this check add the same check after calling the
> vhost_dev_set_log() routine. This in general help not to break a

Could you point to the specific asserts which are being triggered?

> migration due the assert() message. But it looks like that this code
> should be revised to handle these errors more carefully.
>
> In case of vhost-user device backend the fail paths should consider the
> state of the device. In this case we should skip some function calls
> during rollback on the error paths, so not to get the NULL dereference
> errors.
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> ---
>  hw/virtio/vhost.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 3ee50c4..d5ab96d 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -787,6 +787,17 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>  static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>  {
>      int r, i, idx;

A couple points here


(1) This will fail the live migration if the device is disconnected.
That my be the right thing
      to do, but if there are cases where migrations can proceed with
a disconnected device,
      this may not be desirable.

(2) This looks racy. As far as I can tell vhost_dev_set_log() is only
called by vhost_migration_log(),
      and as you say one of the first things vhost_migration_log does
is return if dev->started is not
      set. What’s to stop a disconnect from clearing the vdev right
after this check, just before
      vhost_dev_set_features() is called?

As stated above, I would prefer it if we could add some state which
would persist between
reconnects which could then be checked in the vhost-user code before
interacting with
the backend. I understand this will be a much more involved change and
will require a lot
of thought.

Also, regarding (1) above, if the original check in
vhost_migration_log() returns success if the
device is not started why return an error here? I imagine this could
lead to some inconsistent
behavior if the device disconnects before the first check verses
before the second.

> +
> +    if (!dev->started) {
> +        /*
> +         * If vhost-user daemon is used as a backend for the
> +         * device and the connection is broken, then the vhost_dev
> +         * structure will be reset all its values to 0.
> +         * Add additional check for the device state.
> +         */
> +        return -1;
> +    }
> +
>      r = vhost_dev_set_features(dev, enable_log);
>      if (r < 0) {
>          goto err_features;
> @@ -801,12 +812,19 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
>      }


  reply	other threads:[~2020-05-06 22:09 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-30 13:36 [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
2020-05-06  8:54   ` Li Feng
2020-05-06  9:46   ` Marc-André Lureau
2020-04-30 13:36 ` [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
2020-05-04  0:36   ` Raphael Norwitz
2020-05-06  8:54     ` Dima Stepanov
2020-05-11  3:03   ` Jason Wang
2020-05-11  8:55     ` Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
2020-05-04  1:06   ` Raphael Norwitz
2020-05-06  8:51     ` Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 4/5] vhost: check vring address before calling unmap Dima Stepanov
2020-05-04  1:13   ` Raphael Norwitz
2020-05-11  3:05   ` Jason Wang
2020-05-11  9:11     ` Dima Stepanov
2020-05-12  3:26       ` Jason Wang
2020-05-12  9:08         ` Dima Stepanov
2020-05-13  3:00           ` Jason Wang
2020-05-13  9:36             ` Dima Stepanov
2020-05-14  7:28               ` Jason Wang
2020-04-30 13:36 ` [PATCH v2 5/5] vhost: add device started check in migration set log Dima Stepanov
2020-05-06 22:08   ` Raphael Norwitz [this message]
2020-05-07  7:15     ` Michael S. Tsirkin
2020-05-07 15:35     ` Dima Stepanov
2020-05-11  0:03       ` Raphael Norwitz
2020-05-11  9:43         ` Dima Stepanov
2020-05-11  3:15   ` Jason Wang
2020-05-11  9:25     ` Dima Stepanov
2020-05-12  3:32       ` Jason Wang
2020-05-12  3:47         ` Li Feng
2020-05-12  9:23           ` Dima Stepanov
2020-05-12  9:35         ` Dima Stepanov
2020-05-13  3:20           ` Jason Wang
2020-05-13  9:39             ` Dima Stepanov
2020-05-13  4:15           ` Michael S. Tsirkin
2020-05-13  5:56             ` Jason Wang
2020-05-13  9:47               ` Dima Stepanov
2020-05-14  7:34                 ` Jason Wang
2020-05-15 16:54                   ` Dima Stepanov
2020-05-16  3:20                     ` Li Feng
2020-05-18  2:52                       ` Jason Wang
2020-05-18  9:33                         ` Dima Stepanov
2020-05-18  9:27                       ` Dima Stepanov
2020-05-18  2:50                     ` Jason Wang
2020-05-18  9:41                       ` Dima Stepanov
2020-05-18  9:53                         ` Dr. David Alan Gilbert
2020-05-19  9:07                           ` Dima Stepanov
2020-05-19 10:24                             ` Dr. David Alan Gilbert
2020-05-19  9:59                     ` Michael S. Tsirkin
2020-05-19  9:13               ` Dima Stepanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFubqFsvn+wNm-rAU1sLBic6J70-ZP_UT4xMDcvGeZ9TarYdjw@mail.gmail.com \
    --to=raphael.s.norwitz@gmail.com \
    --cc=arei.gonglei@huawei.com \
    --cc=dgilbert@redhat.com \
    --cc=dimastep@yandex-team.ru \
    --cc=fam@euphon.net \
    --cc=fengli@smartx.com \
    --cc=jasowang@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=raphael.norwitz@nutanix.com \
    --cc=stefanha@redhat.com \
    --cc=yc-core@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.