All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dima Stepanov <dimastep@yandex-team.ru>
To: Jason Wang <jasowang@redhat.com>
Cc: "Fam Zheng" <fam@euphon.net>, "Kevin Wolf" <kwolf@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"open list:Block layer core" <qemu-block@nongnu.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"open list:All patches CC here" <qemu-devel@nongnu.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Gonglei <arei.gonglei@huawei.com>, "Li Feng" <fengli@smartx.com>,
	yc-core@yandex-team.ru, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Raphael Norwitz" <raphael.norwitz@nutanix.com>,
	"Max Reitz" <mreitz@redhat.com>
Subject: Re: [PATCH v2 5/5] vhost: add device started check in migration set log
Date: Mon, 18 May 2020 12:33:19 +0300	[thread overview]
Message-ID: <20200518093313.GA6489@dimastep-nix> (raw)
In-Reply-To: <6bf8a077-01fb-dfd1-164e-440d313503d3@redhat.com>

On Mon, May 18, 2020 at 10:52:08AM +0800, Jason Wang wrote:
> 
> On 2020/5/16 上午11:20, Li Feng wrote:
> >Hi, Dima.
> >This abort is what I have mentioned in my previous email.
> >I have triggered this crash without any fix a week ago.
> >And I have written a test patch to let vhost_log_global_start return
> >int and propagate the error to up layer.
> >However, my change is a little large, because the origin callback
> >return void, and don't do some rollback.
> >After test, the migration could migrate to dst successfully, and fio
> >is still running perfectly, but the src vm is still stuck here, no
> >crash.
> >
> >Is it right to return this error to the up layer?
> 
> 
> That could be a solution or we may ask David for more suggestion.
> 
> Another thing that might be useful is to block re connection during
> migration.
I've written a little more information as answer to Feng's mail. But
what if add some new callback to get the device started state (started or not).
And for the vhost-user (or at least vhost-usr-blk) devices it will use
the connected field also to return the device state:
  - disconnect -> not started
For other devices we can just return the started field value as it is
right now.

No other comments mixed in below.

> 
> Thanks
> 
> 
> >
> >Thanks,
> >Feng Li
> >
> >Dima Stepanov <dimastep@yandex-team.ru> 于2020年5月16日周六 上午12:55写道:
> >>On Thu, May 14, 2020 at 03:34:24PM +0800, Jason Wang wrote:
> >>>On 2020/5/13 下午5:47, Dima Stepanov wrote:
> >>>>>>     case CHR_EVENT_CLOSED:
> >>>>>>         /* a close event may happen during a read/write, but vhost
> >>>>>>          * code assumes the vhost_dev remains setup, so delay the
> >>>>>>          * stop & clear to idle.
> >>>>>>          * FIXME: better handle failure in vhost code, remove bh
> >>>>>>          */
> >>>>>>         if (s->watch) {
> >>>>>>             AioContext *ctx = qemu_get_current_aio_context();
> >>>>>>
> >>>>>>             g_source_remove(s->watch);
> >>>>>>             s->watch = 0;
> >>>>>>             qemu_chr_fe_set_handlers(&s->chr, NULL, NULL, NULL, NULL,
> >>>>>>                                      NULL, NULL, false);
> >>>>>>
> >>>>>>             aio_bh_schedule_oneshot(ctx, chr_closed_bh, opaque);
> >>>>>>         }
> >>>>>>         break;
> >>>>>>
> >>>>>>I think it's time we dropped the FIXME and moved the handling to common
> >>>>>>code. Jason? Marc-André?
> >>>>>I agree. Just to confirm, do you prefer bh or doing changes like what is
> >>>>>done in this series? It looks to me bh can have more easier codes.
> >>>>Could it be a good idea just to make disconnect in the char device but
> >>>>postphone clean up in the vhost-user-blk (or any other vhost-user
> >>>>device) itself? So we are moving the postphone logic and decision from
> >>>>the char device to vhost-user device. One of the idea i have is as
> >>>>follows:
> >>>>   - Put ourself in the INITIALIZATION state
> >>>>   - Start these vhost-user "handshake" commands
> >>>>   - If we got a disconnect error, perform disconnect, but don't clean up
> >>>>     device (it will be clean up on the roll back). I can be done by
> >>>>     checking the state in vhost_user_..._disconnect routine or smth like it
> >>>
> >>>Any issue you saw just using the aio bh as Michael posted above.
> >>>
> >>>Then we don't need to deal with the silent vhost_dev_stop() and we will have
> >>>codes that is much more easier to understand.
> >>I've implemented this solution inside
> >>hw/block/vhost-user-blk.c:vhost_user_blk_event() in the similar way by
> >>using the s->connected field. Looks good and more correct fix ). I have
> >>two questions here before i'll rework the fixes:
> >>1. Is it okay to make the similar fix inside vhost_user_blk_event() or
> >>we are looking for more generic vhost-user solution? What do you think?
> >>2. For migration we require an additional information that for the
> >>vhost-user device it isn't an error, because i'm trigerring the
> >>following assert error:
> >>   Core was generated by `x86_64-softmmu/qemu-system-x86_64 -nodefaults -no-user-config -M q35,sata=false'.
> >>   Program terminated with signal SIGABRT, Aborted.
> >>   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >>   [Current thread is 1 (Thread 0x7fb486ef5700 (LWP 527734))]
> >>
> >>   (gdb) bt
> >>   #0  0x00007fb56e729428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >>   #1  0x00007fb56e72b02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> >>   #2  0x00005648ea376ee6 in vhost_log_global_start
> >>       (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> >>   #3  0x00005648ea2dde7e in memory_global_dirty_log_start ()
> >>       at ./memory.c:2611
> >>   #4  0x00005648ea2e68e7 in ram_init_bitmaps (rs=0x7fb4740008c0)
> >>       at ./migration/ram.c:2305
> >>   #5  0x00005648ea2e698b in ram_init_all (rsp=0x5648eb1f0f20 <ram_state>)
> >>       at ./migration/ram.c:2323
> >>   #6  0x00005648ea2e6cc5 in ram_save_setup (f=0x5648ec609e00,
> >>       opaque=0x5648eb1f0f20 <ram_state>)
> >>       at ./migration/ram.c:2436
> >>   #7  0x00005648ea67b7d3 in qemu_savevm_state_setup (f=0x5648ec609e00) at
> >>       migration/savevm.c:1176
> >>   #8  0x00005648ea674511 in migration_thread (opaque=0x5648ec031ff0) at
> >>       migration/migration.c:3416
> >>   #9  0x00005648ea85d65d in qemu_thread_start (args=0x5648ec6057f0) at
> >>       util/qemu-thread-posix.c:519
> >>   #10 0x00007fb56eac56ba in start_thread () from
> >>       /lib/x86_64-linux-gnu/libpthread.so.0
> >>   #11 0x00007fb56e7fb41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> >>   (gdb) frame 2
> >>   #2  0x00005648ea376ee6 in vhost_log_global_start
> >>      (listener=0x5648ece4eb08) at ./hw/virtio/vhost.c:857
> >>   857             abort();
> >>   (gdb) list
> >>   852     {
> >>   853         int r;
> >>   854
> >>   855         r = vhost_migration_log(listener, true);
> >>   856         if (r < 0) {
> >>   857             abort();
> >>   858         }
> >>   859     }
> >>   860
> >>   861     static void vhost_log_global_stop(MemoryListener *listener)
> >>Since bh postphone the clean up, we can't use the ->started field.
> >>Do we have any mechanism to get the device type/state in the common
> >>vhost_migration_log() routine? So for example for the vhost-user/disconnect
> >>device we will be able to return 0. Or should we implement it and introduce
> >>it in this patch set?
> >>
> >>Thanks, Dima.
> >>
> >>>Thank
> >>>
> >>>
> >>>>   - vhost-user command returns error back to the _start() routine
> >>>>   - Rollback in one place in the start() routine, by calling this
> >>>>     postphoned clean up for the disconnect
> >>>>
> 


  reply	other threads:[~2020-05-18  9:34 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-30 13:36 [PATCH v2 0/5] vhost-user reconnect issues during vhost initialization Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 1/5] char-socket: return -1 in case of disconnect during tcp_chr_write Dima Stepanov
2020-05-06  8:54   ` Li Feng
2020-05-06  9:46   ` Marc-André Lureau
2020-04-30 13:36 ` [PATCH v2 2/5] vhost: introduce wrappers to set guest notifiers for virtio device Dima Stepanov
2020-05-04  0:36   ` Raphael Norwitz
2020-05-06  8:54     ` Dima Stepanov
2020-05-11  3:03   ` Jason Wang
2020-05-11  8:55     ` Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 3/5] vhost-user-blk: add mechanism to track the guest notifiers init state Dima Stepanov
2020-05-04  1:06   ` Raphael Norwitz
2020-05-06  8:51     ` Dima Stepanov
2020-04-30 13:36 ` [PATCH v2 4/5] vhost: check vring address before calling unmap Dima Stepanov
2020-05-04  1:13   ` Raphael Norwitz
2020-05-11  3:05   ` Jason Wang
2020-05-11  9:11     ` Dima Stepanov
2020-05-12  3:26       ` Jason Wang
2020-05-12  9:08         ` Dima Stepanov
2020-05-13  3:00           ` Jason Wang
2020-05-13  9:36             ` Dima Stepanov
2020-05-14  7:28               ` Jason Wang
2020-04-30 13:36 ` [PATCH v2 5/5] vhost: add device started check in migration set log Dima Stepanov
2020-05-06 22:08   ` Raphael Norwitz
2020-05-07  7:15     ` Michael S. Tsirkin
2020-05-07 15:35     ` Dima Stepanov
2020-05-11  0:03       ` Raphael Norwitz
2020-05-11  9:43         ` Dima Stepanov
2020-05-11  3:15   ` Jason Wang
2020-05-11  9:25     ` Dima Stepanov
2020-05-12  3:32       ` Jason Wang
2020-05-12  3:47         ` Li Feng
2020-05-12  9:23           ` Dima Stepanov
2020-05-12  9:35         ` Dima Stepanov
2020-05-13  3:20           ` Jason Wang
2020-05-13  9:39             ` Dima Stepanov
2020-05-13  4:15           ` Michael S. Tsirkin
2020-05-13  5:56             ` Jason Wang
2020-05-13  9:47               ` Dima Stepanov
2020-05-14  7:34                 ` Jason Wang
2020-05-15 16:54                   ` Dima Stepanov
2020-05-16  3:20                     ` Li Feng
2020-05-18  2:52                       ` Jason Wang
2020-05-18  9:33                         ` Dima Stepanov [this message]
2020-05-18  9:27                       ` Dima Stepanov
2020-05-18  2:50                     ` Jason Wang
2020-05-18  9:41                       ` Dima Stepanov
2020-05-18  9:53                         ` Dr. David Alan Gilbert
2020-05-19  9:07                           ` Dima Stepanov
2020-05-19 10:24                             ` Dr. David Alan Gilbert
2020-05-19  9:59                     ` Michael S. Tsirkin
2020-05-19  9:13               ` Dima Stepanov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200518093313.GA6489@dimastep-nix \
    --to=dimastep@yandex-team.ru \
    --cc=arei.gonglei@huawei.com \
    --cc=dgilbert@redhat.com \
    --cc=fam@euphon.net \
    --cc=fengli@smartx.com \
    --cc=jasowang@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=raphael.norwitz@nutanix.com \
    --cc=stefanha@redhat.com \
    --cc=yc-core@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.