From: Yongji Xie <elohimes@gmail.com>
To: jasowang@redhat.com
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	zhangyu31@baidu.com, Xie Yongji <xieyongji@baidu.com>,
	lilin24@baidu.com, qemu-devel@nongnu.org, chaiwen@baidu.com,
	marcandre.lureau@redhat.com, nixun@baidu.com
Subject: Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
Date: Wed, 12 Dec 2018 17:18:05 +0800
Message-ID: <CAONzpcZ_RqUujdSoVCPUHH0A2g4hHY=oM9x-04e-6WugYFQOiw@mail.gmail.com>
In-Reply-To: <cc6a464a-3193-f95b-08e5-c67e95d72dd5@redhat.com>

On Wed, 12 Dec 2018 at 15:47, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2018/12/12 2:41 PM, Yongji Xie wrote:
> > On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2018/12/12 11:21 AM, Yongji Xie wrote:
> >>> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2018/12/12 10:48 AM, Yongji Xie wrote:
> >>>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On 2018/12/6 9:59 PM, Michael S. Tsirkin wrote:
> >>>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >>>>>>>> On 2018/12/6 2:35 PM, elohimes@gmail.com wrote:
> >>>>>>>>> From: Xie Yongji <xieyongji@baidu.com>
> >>>>>>>>>
> >>>>>>>>> This patchset aims to let qemu reconnect to the
> >>>>>>>>> vhost-user-blk backend after the backend crashes or
> >>>>>>>>> restarts.
> >>>>>>>>>
> >>>>>>>>> Patch 1 tries to implement the sync connection for
> >>>>>>>>> "reconnect socket".
> >>>>>>>>>
> >>>>>>>>> Patch 2 introduces a new message, VHOST_USER_SET_VRING_INFLIGHT,
> >>>>>>>>> which offers shared memory to the backend so that it can record
> >>>>>>>>> its inflight I/O.
> >>>>>>>>>
> >>>>>>>>> Patches 3 and 4 are the corresponding libvhost-user patches for
> >>>>>>>>> patch 2. They make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >>>>>>>>>
> >>>>>>>>> Patch 5 lets vhost-user-blk reconnect to the backend when the
> >>>>>>>>> connection is closed.
> >>>>>>>>>
> >>>>>>>>> Patch 6 tells qemu that we support reconnecting now.
> >>>>>>>>>
> >>>>>>>>> To use it, we could start qemu with:
> >>>>>>>>>
> >>>>>>>>> qemu-system-x86_64 \
> >>>>>>>>>              -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> >>>>>>>>>              -device vhost-user-blk-pci,chardev=char0
> >>>>>>>>>
> >>>>>>>>> and start vhost-user-blk backend with:
> >>>>>>>>>
> >>>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
> >>>>>>>>>
> >>>>>>>>> Then we can restart vhost-user-blk at any time while the VM is running.
> >>>>>>>> I wonder whether it would be better to handle this at the level of the
> >>>>>>>> virtio protocol itself instead of at the vhost-user level. E.g., exposing
> >>>>>>>> last_avail_idx to the driver might be sufficient?
> >>>>>>>>
> >>>>>>>> Another possible issue: it looks like you need to deal with different
> >>>>>>>> kinds of ring layouts, e.g. packed virtqueues.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>> I'm not sure I understand your comments here.
> >>>>>>> All these would be guest-visible extensions.
> >>>>>> It seems not: it only introduces shared memory between qemu and
> >>>>>> the vhost-user backend?
> >>>>>>
> >>>>>>
> >>>>>>> Possible for sure but how is this related to
> >>>>>>> a patch supporting transparent reconnects?
> >>>>>> I might be missing something. My understanding is that we support transparent
> >>>>>> reconnects, but we can't deduce an accurate last_avail_idx, and that is
> >>>>>> the capability this series tries to add. To me, this series is functionally
> >>>>>> equivalent to exposing last_avail_idx (or avail_idx_cons) in the available
> >>>>>> ring. So the information is inside guest memory, and the vhost-user backend
> >>>>>> can access and update it directly. I believe this is what some modern NICs
> >>>>>> do as well (but the index is in an MMIO area, of course).
> >>>>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> If my understanding is correct, it might not be enough to only expose
> >>>>> last_avail_idx, because sometimes we do not process descriptors in the
> >>>>> same order in which they were made available. In that case, we can't get
> >>>>> correct inflight I/O information from the available ring.
> >>>> You can get this with the help of both the used ring and last_avail_idx,
> >>>> I believe. Or maybe you can give us an example?
> >>>>
> >>> A simple example, assuming the ring size is 8:
> >>>
> >>> 1. The guest fills the avail ring
> >>>
> >>> avail ring: 0 1 2 3 4 5 6 7
> >>> used ring:
> >>>
> >>> 2. The vhost-user backend completes 4, 5, 6, 7 and fills the used ring
> >>>
> >>> avail ring: 0 1 2 3 4 5 6 7
> >>> used ring: 4 5 6 7
> >>>
> >>> 3. The guest fills the avail ring again
> >>>
> >>> avail ring: 4 5 6 7 4 5 6 7
> >>> used ring: 4 5 6 7
> >>>
> >>> 4. The vhost-user backend crashes
> >>>
> >>> The inflight descriptors 0, 1, 2, 3 are lost.
> >>>
> >>> Thanks,
> >>> Yongji
> >>
> >> OK, then we can simply forbid increasing the avail_idx in this case?
> >>
> >> Basically, it's a question of whether it's better to do this at
> >> the level of virtio instead of vhost. I'm pretty sure that if we expose
> >> sufficient information, it could be done without touching vhost-user.
> >> And we wouldn't need to deal with e.g. migration and other cases.
> >>
> > OK, I get your point. That's indeed an alternative way. But this feature seems
> > to be useful only to vhost-user backends.
>
>
> I admit I could not think of a use case other than vhost-user.
>
>
> > I'm not sure whether it makes sense to
> > touch the virtio protocol for this feature.
>
>
> Some possible advantages:
>
> - The feature could be determined and noticed by the user or the management layer.
>
> - There's no need to invent a ring-layout-specific protocol to record
> inflight descriptors. E.g., if my understanding is correct, this series,
> as in the example above, still cannot work for packed virtqueues,
> since the descriptor id is not sufficient (a descriptor could be overwritten
> by a used one). You would probably need a (partial) copy of the descriptor
> ring for this.
>
> - No need to deal with migration; all the information is in guest memory.
>

Yes, those are real advantages. But handling this at the vhost-user level
seems easier to maintain in a production environment: we can support old
guests, and bug fixes will not depend on updating the guest kernel.
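
As a rough illustration of the ring example quoted above, here is a
minimal Python sketch. It is a toy model, not code from this series: the
SplitRing class, its method names, and the per-descriptor inflight flags
are all hypothetical, and the flags only stand in for the idea of
recording inflight I/O in memory that outlives the backend process.

```python
RING_SIZE = 8


class SplitRing:
    """Toy split virtqueue: tracks descriptor head indices only."""

    def __init__(self, size):
        self.size = size
        self.avail = [None] * size   # slots written by the guest
        self.avail_idx = 0           # free-running index owned by the guest
        self.used = []               # heads completed by the backend, in order
        # Hypothetical flags kept outside the rings, in memory assumed to
        # survive a backend restart (an assumption of this sketch).
        self.inflight = [False] * size

    def guest_post(self, heads):
        # The guest publishes heads, wrapping around and overwriting old slots.
        for head in heads:
            self.avail[self.avail_idx % self.size] = head
            self.avail_idx += 1

    def backend_fetch(self, last_avail_idx):
        # The backend picks up everything published since last_avail_idx.
        heads = []
        while last_avail_idx != self.avail_idx:
            head = self.avail[last_avail_idx % self.size]
            self.inflight[head] = True
            heads.append(head)
            last_avail_idx += 1
        return heads, last_avail_idx

    def backend_complete(self, head):
        self.used.append(head)
        self.inflight[head] = False


def run_scenario():
    ring = SplitRing(RING_SIZE)
    ring.guest_post(range(8))               # step 1: guest fills the avail ring
    ring.backend_fetch(0)                   # backend takes all eight requests
    for head in (4, 5, 6, 7):               # step 2: out-of-order completion
        ring.backend_complete(head)
    ring.guest_post((4, 5, 6, 7))           # step 3: guest reuses 4..7
    # Step 4: the backend crashes here and loses its private state.
    visible = set(ring.avail) | set(ring.used)
    recorded = {h for h, busy in enumerate(ring.inflight) if busy}
    return ring.avail, ring.used, visible, recorded


if __name__ == "__main__":
    avail, used, visible, recorded = run_scenario()
    print("avail ring:", avail)
    print("used ring: ", used)
    print("heads visible in the rings:", sorted(visible))
    print("heads recorded as inflight:", sorted(recorded))
```

In this model, heads 0..3 no longer appear in either ring after step 3,
so only the separately recorded flags still identify them as inflight.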

Thanks,
Yongji

