From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39979)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <marcandre.lureau@gmail.com>) id 1akrGI-0000pF-Ja
	for qemu-devel@nongnu.org; Tue, 29 Mar 2016 06:52:39 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <marcandre.lureau@gmail.com>) id 1akrGE-0005Vx-9V
	for qemu-devel@nongnu.org; Tue, 29 Mar 2016 06:52:38 -0400
Received: from mail-oi0-x22e.google.com ([2607:f8b0:4003:c06::22e]:36831)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <marcandre.lureau@gmail.com>) id 1akrGE-0005Vt-29
	for qemu-devel@nongnu.org; Tue, 29 Mar 2016 06:52:34 -0400
Received: by mail-oi0-x22e.google.com with SMTP id r187so15894025oih.3
	for <qemu-devel@nongnu.org>; Tue, 29 Mar 2016 03:52:33 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20160329081010.GA3080@yliu-dev.sh.intel.com>
References: <1441753806-14225-1-git-send-email-marcandre.lureau@redhat.com>
	<20151126121944-mutt-send-email-mst@redhat.com>
	<20160324071001.GA4525@yliu-dev.sh.intel.com>
	<CAJ+F1CL0MQsz5JHhWQDMTR-29hh_D65E5d+FUCdg4YQ8453TLA@mail.gmail.com>
	<20160329081010.GA3080@yliu-dev.sh.intel.com>
Date: Tue, 29 Mar 2016 12:52:32 +0200
Message-ID: <CAJ+F1CJziBRyq3SZJ7DYZij2fREY2PRMQtESy4xBoyKamnLXXw@mail.gmail.com>
From: =?UTF-8?B?TWFyYy1BbmRyw6kgTHVyZWF1?= <marcandre.lureau@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and
	reconnection
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>, QEMU <qemu-devel@nongnu.org>, "Wang,
	Zhihong" <zhihong.wang@intel.com>, Tetsuya Mukawa <mukawa@igel.co.jp>, "Kavanagh,
	Mark B" <mark.b.kavanagh@intel.com>, "Traynor,
	Kevin" <kevin.traynor@intel.com>

Hi

On Tue, Mar 29, 2016 at 10:10 AM, Yuanhan Liu
<yuanhan.liu@linux.intel.com> wrote:
> Backend crash may be not that normal in production usage, it is in
> development stage. It would be handy if we can handle that as well,
> as it's a painful experience for me that every time I do a VM2VM test
> and once my changes to backend causes a crash (unfortunately, it
> happens often in early stage), I have to reboot the two VMs.
>
> If we have reconnect, life could be easier. I then don't need worry
> that I will mess the backend and crash it any more.
>
> And again, the reason I mentioned crash here again is not because
> we need handle it, specially. Instead, I was thinking we might be
> able to handle the two reasons both.

I think crash could be handle with queue reset Michael proposed, but
that would probably require guest side changes.

>> Conceptually, I think if we allow the backend to disconnect, it makes
>> sense that qemu is actually the socket server. But it doesn't matter
>> much, it's simple to teach qemu to reconnect a timer...
>
> We already have that, right? I mean, the char-dev "reconnect" option.

Sure, what I mean is that it sounds cleaner from a design pov for qemu
to be the server (since it is the one actually waiting for backend in
this case), beside a timer is often a pretty rough solution to
anything.

>> So we should
>> probably allow both cases anyway.
>
> Yes, I think both should work. I may be wrong (that I may miss
> something), but it seems it's (far) easier to me to keep QEMU
> as the client, and adding the "reconnect" option, judging that
> we have all stuff to make it work ready. In this way,
>
> - if backend crashes/quits, you just need restart the backend,
>   and QEMU will retry the connection.
>
> - if QEMU crashes/quits, you just need restart the QEMU, and
>   then QEMU will start a fresh connection.

It may all depend on use cases, it's not more obvious or easy than the
other to me.

> However, if let QEMU as the server, there are 2 major works need
> to be done in the backend side (take DPDK as example, that just
> acts as vhost-user server only so far):
>
> - Introducing a new API or extending current API to let it to
>   connect the server socket from QEMU.
>
> - If QEMU crashes/quits, we need add code to backend to keep
>   reconnecting unless connection is established, which is something
>   similar to the "reconnect" stuff in QEMU.
> As you can see, it needs more effort (though it's not something
> you care :). And it has duplicate work.


Ah, I am looking at this from qemu angle, backend may need to adapt if
it doesn't already handle both "socket role" (client & server).

>
>>
>> > - start DPDK vhost-switch example
>> >
>> > - start QEMU, which will connect to DPDK vhost-user
>> >
>> >   link is good now.
>> >
>> > - kill DPDK vhost-switch
>> >
>> >   link is broken at this stage
>> >
>> > - start DPDK vhost-switch again
>> >
>> >   you will find that the link is back again.
>> >
>> >
>> > Will that makes sense to you? If so, we may need do nothing (or just
>> > very few) changes at all to DPDK to get the reconnect work.
>>
>> The main issue with handling crashes (gone at any time) is that the
>> backend my not have time to sync the used idx (at the least). It may
>> already have processed incoming packets, so on reconnect, it may
>> duplicate the receiving/dispatching work.
>
> That's not the case for DPDK backend implementation: incoming packets
> won't be delivered for processing before we update the used idx.

It could be ok on incoming packet side (vm->dpdk), but I don't see how
you could avoid packet loss on the other side (dpdk->vm), since the
packets must be added to queues before updating used idx.

>
>> Similarly, on the backend
>> receiving end, some packets may be lost, never received by the VM, and
>> later overwritten by the backend after reconnect (for the same used
>> idx update reason). This may not be a big deal for unreliable
>> protocols, but I am not familiar enough with network usage to know if
>> that's fine in all cases. It may be fine for some packets, such as
>> udp.
>>
>> However, in general, vhost-user should not be specific to network
>> transmission, and it would be nice to have a reliable way for the the
>> backend to reconnect. That's what I try to do in this series.
>
> Agreed.
>
>> I'll
>> repost it after I have done more testing.
>
> Great, and thanks for the work. I will also take some time to give
> closer review.

thanks


--=20
Marc-Andr=C3=A9 Lureau