All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Marc-André Lureau" <marcandre.lureau@gmail.com>
Cc: Tetsuya Mukawa <mukawa@igel.co.jp>,
	yuanhan liu <yuanhan.liu@linux.intel.com>,
	Victor Kaplansky <victork@redhat.com>,
	jonshin@cisco.com, QEMU <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail
Date: Mon, 4 Jul 2016 18:43:51 +0300	[thread overview]
Message-ID: <20160704183940-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <CAJ+F1CLOEfrrQ-YxMNsKcG12OnoW26bvn8FkEqqkqD9jumu3Gw@mail.gmail.com>

On Mon, Jun 27, 2016 at 11:01:42AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Sat, Jun 25, 2016 at 12:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote:
> >> On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> > If it's ok and we can recover, then why should we print errors?
> >> >>
> >> >> To me, the current disconnect handling is not handled cleanly. There
> >> >> is not much/nothing in the protocol that tells when and how you can
> >> >> disconnect. If qemu vhost-user reconnection works today, it is mostly
> >> >> by luck. Imho, we should print errors if any call to vhost user fails
> >> >> to help further analysis of broken behaviour.
> >> >
> >> > But we decided disconnect is OK, did we not? So now a failure like this
> >> > is just expected. It's not broken behaviour anymore ...
> >>
> >> As you know, there are many ways qemu or the vm can break when the
> >> backend is disconnected.  For now, it would help a lot if we had a bit
> >> of error messages in this case. But do we really want to support
> >> spurious disconnect/reconnect at any time? It's going to be
> >> challenging, to maintain, to test... Is it really worthwhile? I doubt
> >> it. I think it would be easier and more future-proof to have a
> >> dedicated vhost-user request for that and only do a best effort for
> >> the ungraceful disconnect.
> >
> > ungraceful might take a while. But I don't see a need for
> > message exchanges for shutdown. Do you?
> 
> A message exchange would allow the server to do something before
> actually shuting down.: to check if shutdown is allowed/clean, to
> flush some pending operations, to change device state, to request
> something before shutdown (such as ring base etc),...

Who's the server here? It makes sense for backend to flush things,
but qemu doesn't seem to maintain too much state.

Could you be a bit more explicit?


I think it's ok to add a new message if it actually brings some
benefit, but I'm not sure why it makes sense to do it just in case.

> 
> >> >> And we shoud have a
> >> >> clean mechanism to handle shutdown/disconnect/reconnect.
> >> >
> >> >
> >> > At some level this is something that's missing here.
> >> >
> >> > How about we patch docs/specs/vhost-user.txt
> >> > and describe how is reconnection supposed to work?
> >> >
> >> > All we have is:
> >> >         If Master is unable to send the full message or receives a wrong reply
> >> >         it will close the connection. An optional reconnection mechanism can be
> >> >         implemented.
> >>
> >> This text is there from the original commit but it doesn't say how to
> >> reconnect and or how to recover from the ring. I don't have enough
> >> experience with virtio in general to say when it is acceptable for
> >> backend to be gone (not processing the ring), I can imagine the
> >> implementation vary widely, and the requirements depend on the kind of
> >> device.
> >>
> >> If we limit the spec to vhost-user protocol only and leave it open on
> >> how each virtio device should support ungraceful disconnect/reconnect,
> >> then it sounds reasonable to document that after such disconnect, on
> >> reconnect:
> >> - the server assumes the backend has no knowledge of previous connection
> >> - the state can be changed between reconnections (new ring state, mem table etc)
> >> - the server will check feature compatibility with previous backend
> >> and reject incompatible backends
> >> - backend must restore ring processing from used->idx and ignore
> >> VHOST_USER_SET_VRING_BASE (or should we change and document that in
> >> this case the ring base is updated from used->idx?)
> >
> > I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff
> > from used->idx.
> 
> The problem with this approach is that it depends on how the backend
> process the rings. Processing packets in order doesn't seem to be
> required in general, or is it?


The requirement is that packets are flushed before socket
is closed.

If they are in order then you can close at any point.

With out of order, you need to flush them before closing socket.




> >> (it's probably not a complete list, I am not good at imagining all
> >> possible cases)
> >
> > used ring must be flushed before disconnect (so all entries
> > before used->idx have been processed, and no entries
> > after used->idx have been processed).
> > If enabled, dirty log must be flushed before disconnect.
> 
> Ok that sounds like a good requirement for "non-explicit graceful
> shutdown" (how would you name it?) and seem to solve the processing
> order problem. Yet, at this point, I think it would be easy for the
> backend to do a proper shutdown request with all the benefits I
> listed.
> 
> 
> -- 
> Marc-André Lureau

  reply	other threads:[~2016-07-04 15:44 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-21 10:02 [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 01/24] misc: indentation marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 02/24] vhost-user: minor simplification marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 03/24] qemu-char: check socket is actually connected marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 04/24] vhost-user: check qemu_chr_fe_set_msgfds() return value marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail marcandre.lureau
2016-06-23  4:27   ` Michael S. Tsirkin
2016-06-23  9:19     ` Marc-André Lureau
2016-06-23 17:13       ` Michael S. Tsirkin
2016-06-24 13:25         ` Marc-André Lureau
2016-06-24 22:38           ` Michael S. Tsirkin
2016-06-27  9:01             ` Marc-André Lureau
2016-07-04 15:43               ` Michael S. Tsirkin [this message]
2016-07-06 13:47                 ` Marc-André Lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 06/24] vhost-user: check vhost_user_write() return value marcandre.lureau
2016-06-23  4:27   ` Michael S. Tsirkin
2016-06-23  9:16     ` Marc-André Lureau
2016-06-23 17:08       ` Michael S. Tsirkin
2016-06-24 12:49         ` Marc-André Lureau
2016-07-04 15:46           ` Michael S. Tsirkin
2016-07-04 22:01             ` Marc-André Lureau
2016-07-04 22:36               ` Michael S. Tsirkin
2016-06-21 10:02 ` [Qemu-devel] [PATCH 07/24] vhost: use error_report() instead of fprintf(stderr, ...) marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 08/24] vhost-user: return a read error marcandre.lureau
2016-06-23  4:27   ` Michael S. Tsirkin
2016-06-23  9:14     ` Marc-André Lureau
2016-06-23 17:03       ` Michael S. Tsirkin
2016-06-24 12:46         ` Marc-André Lureau
2016-07-04 15:47           ` Michael S. Tsirkin
2016-07-04 21:56             ` Marc-André Lureau
2016-07-04 22:35               ` Michael S. Tsirkin
2016-07-05  9:18                 ` Marc-André Lureau
2016-07-05 11:12                   ` Michael S. Tsirkin
2016-07-06 13:40                     ` Marc-André Lureau
2016-07-06 13:52                       ` Marc-André Lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 09/24] vhost: make vhost_log_put() idempotent marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 10/24] vhost: call vhost_log_put() on cleanup marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 11/24] vhost: add vhost device only after all success marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 12/24] vhost: make vhost_dev_cleanup() idempotent marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 13/24] vhost-net: always call vhost_dev_cleanup() on failure marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 14/24] vhost: don't assume opaque is a fd, use backend cleanup marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 15/24] vhost: fix calling vhost_dev_cleanup() after vhost_dev_init() marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 16/24] vhost-user: keep vhost_net after a disconnection marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 17/24] Revert "vhost-net: do not crash if backend is not present" marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 18/24] get_vhost_net() should be != null after vhost_user_init marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 19/24] vhost-net: success if backend has no ops->vhost_migration_done marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 20/24] vhost: add assert() to check runtime behaviour marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 21/24] char: add chr_wait_connected callback marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 22/24] char: add and use tcp_chr_wait_connected marcandre.lureau
2016-06-21 10:02 ` [Qemu-devel] [PATCH 23/24] vhost-user: wait until link is up marcandre.lureau
2016-06-23  4:25   ` Michael S. Tsirkin
2016-06-23  9:11     ` Marc-André Lureau
2016-06-23 16:45       ` Michael S. Tsirkin
2016-06-21 10:02 ` [Qemu-devel] [PATCH 24/24] tests: add /vhost-user/connect-fail test marcandre.lureau
2016-06-23  4:31 ` [Qemu-devel] [PATCH 00/24] vhost-user reconnect fixes Michael S. Tsirkin
2016-06-24 13:22   ` Marc-André Lureau
2016-06-24 22:19     ` Michael S. Tsirkin
2016-06-27 12:37       ` Marc-André Lureau
2016-06-23  4:32 ` Michael S. Tsirkin
2016-06-24 13:17   ` Marc-André Lureau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160704183940-mutt-send-email-mst@redhat.com \
    --to=mst@redhat.com \
    --cc=jonshin@cisco.com \
    --cc=marcandre.lureau@gmail.com \
    --cc=mukawa@igel.co.jp \
    --cc=qemu-devel@nongnu.org \
    --cc=victork@redhat.com \
    --cc=yuanhan.liu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.