qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization
@ 2020-05-28  9:11 Dima Stepanov
  2020-05-28  9:11 ` [PATCH v4 1/2] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dima Stepanov @ 2020-05-28  9:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, qemu-block, mst, jasowang, dgilbert, mreitz, fengli,
	yc-core, marcandre.lureau, pbonzini, raphael.norwitz

Changes is v4:
- Update the "[PATCH v4 2/2] vhost-user-blk: delay
  vhost_user_blk_disconnect" patch based on Raphael's comment and Li
  Feng previous commit:
  https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg02255.html
  Don't change the vhost_user_blk_device_realize() function. Update the
  comment for the CHR_EVENT_CLOSED event.

Changes in v3:
- "[PATCH v3 1/2] char-socket: return -1 in case of disconnect during
  tcp_chr_write" made a small cleanup suggested by Li Feng. Added
  "Reviewed-by: Marc-André Lureau"
- Rework the vhost_user_blk_disconnect call logic to delay it.
- Remove the migration patch from the patch set, since we are still
  having some discussion about it. In general the current idea is good,
  but need to make some more investigation of how to handle reconnect
  during migration properly

Changes in v2:
- Add to CC list: Li Feng <fengli@smartx.com>, since it looks like that we
are working on pretty similar issues
- Remove [RFC PATCH v1 1/7] contrib/vhost-user-blk: add option to simulate
disconnect on init. Going to send this functionality in the separate
patch, with the LIBVHOST_USER_DEBUG rework. Need to think how to reuse
this option and silence the messages first.
- Remove [RFC PATCH v1 3/7] char-socket: initialize reconnect timer only if
close is emitted. This will be handled in the separate patchset:
[PATCH 3/4] char-socket: avoid double call tcp_chr_free_connection by Li
Feng

v1:
During vhost-user reconnect functionality we hit several issues, if
vhost-user-blk daemon is "crashed" or made disconnect during vhost
initialization. The general scenario is as follows:
  - vhost start routine is called
  - vhost write failed due to SIGPIPE
  - this call the disconnect routine and vhost_dev_cleanup routine
    which set to 0 all the field of the vhost_dev structure
  - return back to vhost start routine with the error
  - on the fail path vhost start routine tries to rollback the changes
    by using vhost_dev struct fields which were already reset
  - sometimes this leads to SIGSEGV, sometimes to SIGABRT
Before revising the vhost-user initialization code, we suggest adding
the sanity checks to be aware of the possible disconnect event and that
the vhost_dev structure can be in "uninitialized" state.

The vhost-user-blk daemon is updated with the additional
"--simulate-disconnect-stage=CASENUM" argument to simulate disconnect during
VHOST device initialization. For instance:
  1. $ ./vhost-user-blk -s ./vhost.sock -b test-img.raw --simulate-disconnect-stage=1
     This command will simulate disconnect in the SET_VRING_CALL handler.
     In this case the vhost device in QEMU is not set the started field to
     true.
  2. $ ./vhost-user-blk -s ./vhost.sock -b test-img.raw --simulate-disconnect-stage=2
     This command will simulate disconnect in the SET_VRING_NUM handler.
     In this case the started field is set to true.
These two cases test different QEMU parts. Also to trigger different code paths
disconnect should be simulated in two ways:
  - before any successful initialization
  - make successful initialization once and try to simulate disconnects
Also we catch SIGABRT on the migration start if vhost-user daemon disconnected
during vhost-user set log commands communication.

Dima Stepanov (2):
  char-socket: return -1 in case of disconnect during tcp_chr_write
  vhost-user-blk: delay vhost_user_blk_disconnect

 chardev/char-socket.c     |  7 ++++---
 hw/block/vhost-user-blk.c | 38 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 4 deletions(-)

-- 
2.7.4



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 1/2] char-socket: return -1 in case of disconnect during tcp_chr_write
  2020-05-28  9:11 [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization Dima Stepanov
@ 2020-05-28  9:11 ` Dima Stepanov
  2020-05-28  9:11 ` [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect Dima Stepanov
  2020-09-14 13:40 ` [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization Stefan Hajnoczi
  2 siblings, 0 replies; 7+ messages in thread
From: Dima Stepanov @ 2020-05-28  9:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, qemu-block, mst, jasowang, dgilbert, mreitz, fengli,
	yc-core, marcandre.lureau, pbonzini, raphael.norwitz

During testing of the vhost-user-blk reconnect functionality the qemu
SIGSEGV was triggered:
 start qemu as:
 x86_64-softmmu/qemu-system-x86_64 -m 1024M -M q35 \
   -object memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/shm/qemu,share=on \
   -numa node,cpus=0,memdev=ram-node0 \
   -chardev socket,id=chardev0,path=./vhost.sock,noserver,reconnect=1 \
   -device vhost-user-blk-pci,chardev=chardev0,num-queues=4 --enable-kvm
 start vhost-user-blk daemon:
 ./vhost-user-blk -s ./vhost.sock -b test-img.raw

If vhost-user-blk will be killed during the vhost initialization
process, for instance after getting VHOST_SET_VRING_CALL command, then
QEMU will fail with the following backtrace:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
    at ./hw/virtio/vhost-user.c:260
260         CharBackend *chr = u->user->chr;

 #0  0x00005555559272bb in vhost_user_read (dev=0x7fffef2d53e0, msg=0x7fffffffd5b0)
    at ./hw/virtio/vhost-user.c:260
 #1  0x000055555592acb8 in vhost_user_get_config (dev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
    at ./hw/virtio/vhost-user.c:1645
 #2  0x0000555555925525 in vhost_dev_get_config (hdev=0x7fffef2d53e0, config=0x7fffef2d5394 "", config_len=60)
    at ./hw/virtio/vhost.c:1490
 #3  0x00005555558cc46b in vhost_user_blk_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd8f0)
    at ./hw/block/vhost-user-blk.c:429
 #4  0x0000555555920090 in virtio_device_realize (dev=0x7fffef2d51a0, errp=0x7fffffffd948)
    at ./hw/virtio/virtio.c:3615
 #5  0x0000555555a9779c in device_set_realized (obj=0x7fffef2d51a0, value=true, errp=0x7fffffffdb88)
    at ./hw/core/qdev.c:891
 ...

The problem is that vhost_user_write doesn't get an error after
disconnect and try to call vhost_user_read(). The tcp_chr_write()
routine should return -1 in case of disconnect. Indicate the EIO error
if this routine is called in the disconnected state.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 chardev/char-socket.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 232e0a8..c2462e0 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -174,15 +174,16 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
 
         if (ret < 0 && errno != EAGAIN) {
             if (tcp_chr_read_poll(chr) <= 0) {
+                /* Perform disconnect and return error. */
                 tcp_chr_disconnect_locked(chr);
-                return len;
             } /* else let the read handler finish it properly */
         }
 
         return ret;
     } else {
-        /* XXX: indicate an error ? */
-        return len;
+        /* Indicate an error. */
+        errno = EIO;
+        return -1;
     }
 }
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect
  2020-05-28  9:11 [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization Dima Stepanov
  2020-05-28  9:11 ` [PATCH v4 1/2] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
@ 2020-05-28  9:11 ` Dima Stepanov
  2020-05-31  0:55   ` Raphael Norwitz
  2020-09-14 13:40 ` [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization Stefan Hajnoczi
  2 siblings, 1 reply; 7+ messages in thread
From: Dima Stepanov @ 2020-05-28  9:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, qemu-block, mst, jasowang, dgilbert, mreitz, fengli,
	yc-core, marcandre.lureau, pbonzini, raphael.norwitz

A socket write during vhost-user communication may trigger a disconnect
event, calling vhost_user_blk_disconnect() and clearing all the
vhost_dev structures holding data that vhost-user functions expect to
remain valid to roll back initialization correctly. Delay the cleanup to
keep vhost_dev structure valid.
There are two possible states to handle:
1. RUN_STATE_PRELAUNCH: skip bh oneshot call and perform disconnect in
the caller routine.
2. RUN_STATE_RUNNING: delay by using bh

BH changes are based on the similar changes for the vhost-user-net
device:
  commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
  "vhost-user: delay vhost_user_stop"

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
---
 hw/block/vhost-user-blk.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9d8c0b3..76838e7 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -349,6 +349,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
     vhost_dev_cleanup(&s->dev);
 }
 
+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
+
+static void vhost_user_blk_chr_closed_bh(void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+    vhost_user_blk_disconnect(dev);
+    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
+            NULL, opaque, NULL, true);
+}
+
 static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
 {
     DeviceState *dev = opaque;
@@ -363,7 +376,30 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
         }
         break;
     case CHR_EVENT_CLOSED:
-        vhost_user_blk_disconnect(dev);
+        /*
+         * A close event may happen during a read/write, but vhost
+         * code assumes the vhost_dev remains setup, so delay the
+         * stop & clear. There are two possible paths to hit this
+         * disconnect event:
+         * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
+         * vhost_user_blk_device_realize() is a caller.
+         * 2. In tha main loop phase after VM start.
+         *
+         * For p2 the disconnect event will be delayed. We can't
+         * do the same for p1, because we are not running the loop
+         * at this moment. So just skip this step and perform
+         * disconnect in the caller function.
+         *
+         * TODO: maybe it is a good idea to make the same fix
+         * for other vhost-user devices.
+         */
+        if (runstate_is_running()) {
+            AioContext *ctx = qemu_get_current_aio_context();
+
+            qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
+                    NULL, NULL, false);
+            aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
+        }
         break;
     case CHR_EVENT_BREAK:
     case CHR_EVENT_MUX_IN:
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect
  2020-05-28  9:11 ` [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect Dima Stepanov
@ 2020-05-31  0:55   ` Raphael Norwitz
  2020-06-02  3:16     ` Li Feng
  2020-06-02  8:31     ` Dima Stepanov
  0 siblings, 2 replies; 7+ messages in thread
From: Raphael Norwitz @ 2020-05-31  0:55 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: Kevin Wolf, qemu-block, Michael S. Tsirkin, jasowang,
	Dr. David Alan Gilbert, QEMU, Raphael Norwitz, fengli, yc-core,
	Paolo Bonzini, Marc-André Lureau, Max Reitz

On Thu, May 28, 2020 at 5:13 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
>
> A socket write during vhost-user communication may trigger a disconnect
> event, calling vhost_user_blk_disconnect() and clearing all the
> vhost_dev structures holding data that vhost-user functions expect to
> remain valid to roll back initialization correctly. Delay the cleanup to
> keep vhost_dev structure valid.
> There are two possible states to handle:
> 1. RUN_STATE_PRELAUNCH: skip bh oneshot call and perform disconnect in
> the caller routine.
> 2. RUN_STATE_RUNNING: delay by using bh
>
> BH changes are based on the similar changes for the vhost-user-net
> device:
>   commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
>   "vhost-user: delay vhost_user_stop"
>
> Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>

Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>

Li Feng - would you also like to sign off here?

> ---
>  hw/block/vhost-user-blk.c | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9d8c0b3..76838e7 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -349,6 +349,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
>      vhost_dev_cleanup(&s->dev);
>  }
>
> +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> +
> +static void vhost_user_blk_chr_closed_bh(void *opaque)
> +{
> +    DeviceState *dev = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
> +
> +    vhost_user_blk_disconnect(dev);
> +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> +            NULL, opaque, NULL, true);
> +}
> +
>  static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>  {
>      DeviceState *dev = opaque;
> @@ -363,7 +376,30 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>          }
>          break;
>      case CHR_EVENT_CLOSED:
> -        vhost_user_blk_disconnect(dev);
> +        /*
> +         * A close event may happen during a read/write, but vhost
> +         * code assumes the vhost_dev remains setup, so delay the
> +         * stop & clear. There are two possible paths to hit this
> +         * disconnect event:
> +         * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
> +         * vhost_user_blk_device_realize() is a caller.
> +         * 2. In tha main loop phase after VM start.
> +         *
> +         * For p2 the disconnect event will be delayed. We can't
> +         * do the same for p1, because we are not running the loop
> +         * at this moment. So just skip this step and perform
> +         * disconnect in the caller function.
> +         *
> +         * TODO: maybe it is a good idea to make the same fix
> +         * for other vhost-user devices.
> +         */
> +        if (runstate_is_running()) {
> +            AioContext *ctx = qemu_get_current_aio_context();
> +
> +            qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
> +                    NULL, NULL, false);
> +            aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
> +        }
>          break;
>      case CHR_EVENT_BREAK:
>      case CHR_EVENT_MUX_IN:
> --
> 2.7.4
>
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect
  2020-05-31  0:55   ` Raphael Norwitz
@ 2020-06-02  3:16     ` Li Feng
  2020-06-02  8:31     ` Dima Stepanov
  1 sibling, 0 replies; 7+ messages in thread
From: Li Feng @ 2020-06-02  3:16 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: Kevin Wolf, open list:Block layer core, Michael S. Tsirkin,
	Jason Wang, QEMU, Dr. David Alan Gilbert, Raphael Norwitz,
	yc-core, Paolo Bonzini, Marc-André Lureau, Max Reitz,
	Dima Stepanov

Hi Raphael,
I'm sorry. I just end my journey today.

Yes, pls sign off me here.
this patch is nearly the same as my previous patch.

Thanks,
Feng Li

Raphael Norwitz <raphael.s.norwitz@gmail.com> 于2020年5月31日周日 上午8:55写道:
>
> On Thu, May 28, 2020 at 5:13 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
> >
> > A socket write during vhost-user communication may trigger a disconnect
> > event, calling vhost_user_blk_disconnect() and clearing all the
> > vhost_dev structures holding data that vhost-user functions expect to
> > remain valid to roll back initialization correctly. Delay the cleanup to
> > keep vhost_dev structure valid.
> > There are two possible states to handle:
> > 1. RUN_STATE_PRELAUNCH: skip bh oneshot call and perform disconnect in
> > the caller routine.
> > 2. RUN_STATE_RUNNING: delay by using bh
> >
> > BH changes are based on the similar changes for the vhost-user-net
> > device:
> >   commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
> >   "vhost-user: delay vhost_user_stop"
> >
> > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
>
> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
>
> Li Feng - would you also like to sign off here?
>
> > ---
> >  hw/block/vhost-user-blk.c | 38 +++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 9d8c0b3..76838e7 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -349,6 +349,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> >      vhost_dev_cleanup(&s->dev);
> >  }
> >
> > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> > +
> > +static void vhost_user_blk_chr_closed_bh(void *opaque)
> > +{
> > +    DeviceState *dev = opaque;
> > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > +
> > +    vhost_user_blk_disconnect(dev);
> > +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> > +            NULL, opaque, NULL, true);
> > +}
> > +
> >  static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >  {
> >      DeviceState *dev = opaque;
> > @@ -363,7 +376,30 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >          }
> >          break;
> >      case CHR_EVENT_CLOSED:
> > -        vhost_user_blk_disconnect(dev);
> > +        /*
> > +         * A close event may happen during a read/write, but vhost
> > +         * code assumes the vhost_dev remains setup, so delay the
> > +         * stop & clear. There are two possible paths to hit this
> > +         * disconnect event:
> > +         * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
> > +         * vhost_user_blk_device_realize() is a caller.
> > +         * 2. In tha main loop phase after VM start.
> > +         *
> > +         * For p2 the disconnect event will be delayed. We can't
> > +         * do the same for p1, because we are not running the loop
> > +         * at this moment. So just skip this step and perform
> > +         * disconnect in the caller function.
> > +         *
> > +         * TODO: maybe it is a good idea to make the same fix
> > +         * for other vhost-user devices.
> > +         */
> > +        if (runstate_is_running()) {
> > +            AioContext *ctx = qemu_get_current_aio_context();
> > +
> > +            qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
> > +                    NULL, NULL, false);
> > +            aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
> > +        }
> >          break;
> >      case CHR_EVENT_BREAK:
> >      case CHR_EVENT_MUX_IN:
> > --
> > 2.7.4
> >
> >


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect
  2020-05-31  0:55   ` Raphael Norwitz
  2020-06-02  3:16     ` Li Feng
@ 2020-06-02  8:31     ` Dima Stepanov
  1 sibling, 0 replies; 7+ messages in thread
From: Dima Stepanov @ 2020-06-02  8:31 UTC (permalink / raw)
  To: Raphael Norwitz
  Cc: Kevin Wolf, qemu-block, Michael S. Tsirkin, jasowang,
	Dr. David Alan Gilbert, QEMU, Raphael Norwitz, fengli, yc-core,
	Paolo Bonzini, Marc-André Lureau, Max Reitz

On Sat, May 30, 2020 at 08:55:30PM -0400, Raphael Norwitz wrote:
> On Thu, May 28, 2020 at 5:13 AM Dima Stepanov <dimastep@yandex-team.ru> wrote:
> >
> > A socket write during vhost-user communication may trigger a disconnect
> > event, calling vhost_user_blk_disconnect() and clearing all the
> > vhost_dev structures holding data that vhost-user functions expect to
> > remain valid to roll back initialization correctly. Delay the cleanup to
> > keep vhost_dev structure valid.
> > There are two possible states to handle:
> > 1. RUN_STATE_PRELAUNCH: skip bh oneshot call and perform disconnect in
> > the caller routine.
> > 2. RUN_STATE_RUNNING: delay by using bh
> >
> > BH changes are based on the similar changes for the vhost-user-net
> > device:
> >   commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
> >   "vhost-user: delay vhost_user_stop"
> >
> > Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
> 
> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> 
> Li Feng - would you also like to sign off here?
Raphael,

Will you take this patchset for merging or what is the next step? )

Thanks, Dima.

> 
> > ---
> >  hw/block/vhost-user-blk.c | 38 +++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 9d8c0b3..76838e7 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -349,6 +349,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> >      vhost_dev_cleanup(&s->dev);
> >  }
> >
> > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> > +
> > +static void vhost_user_blk_chr_closed_bh(void *opaque)
> > +{
> > +    DeviceState *dev = opaque;
> > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > +
> > +    vhost_user_blk_disconnect(dev);
> > +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> > +            NULL, opaque, NULL, true);
> > +}
> > +
> >  static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >  {
> >      DeviceState *dev = opaque;
> > @@ -363,7 +376,30 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >          }
> >          break;
> >      case CHR_EVENT_CLOSED:
> > -        vhost_user_blk_disconnect(dev);
> > +        /*
> > +         * A close event may happen during a read/write, but vhost
> > +         * code assumes the vhost_dev remains setup, so delay the
> > +         * stop & clear. There are two possible paths to hit this
> > +         * disconnect event:
> > +         * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
> > +         * vhost_user_blk_device_realize() is a caller.
> > +         * 2. In tha main loop phase after VM start.
> > +         *
> > +         * For p2 the disconnect event will be delayed. We can't
> > +         * do the same for p1, because we are not running the loop
> > +         * at this moment. So just skip this step and perform
> > +         * disconnect in the caller function.
> > +         *
> > +         * TODO: maybe it is a good idea to make the same fix
> > +         * for other vhost-user devices.
> > +         */
> > +        if (runstate_is_running()) {
> > +            AioContext *ctx = qemu_get_current_aio_context();
> > +
> > +            qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
> > +                    NULL, NULL, false);
> > +            aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
> > +        }
> >          break;
> >      case CHR_EVENT_BREAK:
> >      case CHR_EVENT_MUX_IN:
> > --
> > 2.7.4
> >
> >


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization
  2020-05-28  9:11 [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization Dima Stepanov
  2020-05-28  9:11 ` [PATCH v4 1/2] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
  2020-05-28  9:11 ` [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect Dima Stepanov
@ 2020-09-14 13:40 ` Stefan Hajnoczi
  2 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2020-09-14 13:40 UTC (permalink / raw)
  To: Dima Stepanov
  Cc: kwolf, Qing Wang, qemu-block, mst, jasowang, Cong Li, dgilbert,
	qemu-devel, raphael.norwitz, fengli, yc-core, pbonzini,
	marcandre.lureau, mreitz

[-- Attachment #1: Type: text/plain, Size: 1872 bytes --]

On Thu, May 28, 2020 at 12:11:17PM +0300, Dima Stepanov wrote:
> Before revising the vhost-user initialization code, we suggest adding
> the sanity checks to be aware of the possible disconnect event and that
> the vhost_dev structure can be in "uninitialized" state.
> 
> The vhost-user-blk daemon is updated with the additional
> "--simulate-disconnect-stage=CASENUM" argument to simulate disconnect during
> VHOST device initialization. For instance:
>   1. $ ./vhost-user-blk -s ./vhost.sock -b test-img.raw --simulate-disconnect-stage=1
>      This command will simulate disconnect in the SET_VRING_CALL handler.
>      In this case the vhost device in QEMU is not set the started field to
>      true.
>   2. $ ./vhost-user-blk -s ./vhost.sock -b test-img.raw --simulate-disconnect-stage=2
>      This command will simulate disconnect in the SET_VRING_NUM handler.
>      In this case the started field is set to true.
> These two cases test different QEMU parts. Also to trigger different code paths
> disconnect should be simulated in two ways:
>   - before any successful initialization
>   - make successful initialization once and try to simulate disconnects
> Also we catch SIGABRT on the migration start if vhost-user daemon disconnected
> during vhost-user set log commands communication.
> 
> Dima Stepanov (2):
>   char-socket: return -1 in case of disconnect during tcp_chr_write
>   vhost-user-blk: delay vhost_user_blk_disconnect
> 
>  chardev/char-socket.c     |  7 ++++---
>  hw/block/vhost-user-blk.c | 38 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 41 insertions(+), 4 deletions(-)

Hi Dima,
The cover letter mentions ./vhost-user-blk --simulate-disconnect-stage=N
to simulate unexpected disconnects but I don't see that option in
contrib/vhost-user-blk/.

Is there a test case?

Thanks,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-09-14 13:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-28  9:11 [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization Dima Stepanov
2020-05-28  9:11 ` [PATCH v4 1/2] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
2020-05-28  9:11 ` [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect Dima Stepanov
2020-05-31  0:55   ` Raphael Norwitz
2020-06-02  3:16     ` Li Feng
2020-06-02  8:31     ` Dima Stepanov
2020-09-14 13:40 ` [PATCH v4 0/2] vhost-user reconnect issues during vhost initialization Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).