From: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Date: Thu, 13 May 2021 16:20:22 +0800
Subject: Re: [Virtio-fs] vhost-user reconnection and crash recovery
To: "Boeuf, Sebastien" <sebastien.boeuf@intel.com>
Cc: virtio-fs@redhat.com, Yongji Xie, qemu-devel@nongnu.org, Stefan Hajnoczi, fam.zheng@bytedance.com

Hi Stefan and Sebastien,

I think I should give some background context from my perspective.

Regarding virtiofsd crash reconnection (recovery) to QEMU: as Stefan said, we discussed the possible implementation on the bi-weekly virtio-fs call. I also sent an RFC patch to the virtio-fs mailing list (https://patchwork.kernel.org/project/qemu-devel/cover/20201215162119.27360-1-zhangjiachen.jaycee@bytedance.com/), and we also discussed the direction for further revision in that thread.

We also need to support virtiofsd crash recovery when it is used with cloud-hypervisor (https://github.com/cloud-hypervisor/cloud-hypervisor). However, the virtiofsd crash reconnection RFC patch relies on QEMU's vhost-user socket reconnection feature and QEMU's vhost-user inflight I/O tracking feature, neither of which is supported by cloud-hypervisor.

So I also opened an initial pull request for cloud-hypervisor vhost-user socket reconnection (https://github.com/cloud-hypervisor/cloud-hypervisor/pull/2387), which Sebastien reviewed. Building on vhost-user socket reconnection, we also want to develop the vhost-user inflight I/O tracking feature for cloud-hypervisor, and finally support virtiofsd crash reconnection.
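
For context on what the inflight tracking feature involves: in QEMU's vhost-user protocol (docs/interop/vhost-user.rst), the backend exposes a shared-memory region describing in-flight descriptors via VHOST_USER_GET_INFLIGHT_FD, and the frontend hands the same fd back with VHOST_USER_SET_INFLIGHT_FD after a reconnect, so the restarted backend can resubmit unfinished requests. A sketch of the message payload, following the layout used in QEMU (illustrative, not a drop-in definition):

#include <stdint.h>

/* Payload of VHOST_USER_GET_INFLIGHT_FD / VHOST_USER_SET_INFLIGHT_FD.
 * It travels together with a memfd carrying the actual in-flight
 * descriptor state, which survives a backend crash. */
typedef struct VhostUserInflight {
    uint64_t mmap_size;   /* size of the shared in-flight region */
    uint64_t mmap_offset; /* offset of the region within the fd */
    uint16_t num_queues;  /* number of virtqueues being tracked */
    uint16_t queue_size;  /* size of each tracked virtqueue */
} VhostUserInflight;

Roughly this frontend side is what cloud-hypervisor would need to implement once socket reconnection is in place.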

I am sorry for the delayed revisions of the two patch sets. I hope to free up some time in the coming two months to make further progress.

All the best,
Jiachen

On Tue, May 11, 2021 at 11:02 PM Boeuf, Sebastien <sebastien.boeuf@intel.com> wrote:
Hi Stefan,

Thanks for the explanation.

So reconnection for vhost-user is not a well-defined behavior,
and QEMU is doing its best to retry when possible, depending
on each device.

The guest does not know about it, so it's never notified that
the device needs to be reset.

But what about the vhost-user backend initialization? Does
QEMU go through initializing the memory table, vrings, etc. again,
since it can't assume anything about the backend?

Thanks,
Sebastien


From: Stefan Hajnoczi
Sent: Tuesday, May 11, 2021 2:45 PM
To: Boeuf, Sebastien
Cc: virtio-fs@redhat.com; qemu-devel@nongnu.org
Subject: vhost-user reconnection and crash recovery

Hi Sebastien,
On #virtio-fs IRC you asked:

  I have a vhost-user question regarding disconnection/reconnection. How
  should this be handled? Let's say the vhost-user backend disconnects,
  and reconnects later on: does QEMU reset the virtio device by notifying
  the guest? Or does it simply reconnect to the backend without letting
  the guest know what happened?

The vhost-user protocol does not have a generic reconnection solution.
Reconnection is handled on a case-by-case basis because device-specific
and implementation-specific state is involved.
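
As a concrete illustration, reconnection on the QEMU side is opted into per device through the chardev layer. A sketch of how a vhost-user-fs device is typically wired up, with reconnect=1 asking QEMU to retry the socket connection every second (socket path and tag are placeholders; shared guest memory is required for any vhost-user device):

qemu-system-x86_64 ... \
    -object memory-backend-memfd,id=mem,size=4G,share=on \
    -numa node,memdev=mem \
    -chardev socket,id=char0,path=/tmp/vhostqemu,reconnect=1 \
    -device vhost-user-fs-pci,chardev=char0,tag=myfs

Whether the device actually resumes cleanly after such a reconnect is exactly the open, device-specific question discussed here.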

The vhost-user-fs-pci device in QEMU has not been tested with
reconnection as far as I know.

The ideal reconnection behavior is to resume the device from its
previous state without disrupting the guest. Device state must survive
reconnection in order for this to work. Neither QEMU virtiofsd nor
virtiofsd-rs implement this today.

virtiofs has a lot of state, making it particularly difficult to support either DEVICE_NEEDS_RESET or transparent vhost-user reconnection. We
have discussed virtiofs crash recovery on the bi-weekly virtiofs call
(https://etherpad.opendev.org/p/virtiofs-external-meeting). If you want
to work on this then joining the call would be a good starting point to
coordinate with others.

One approach for transparent crash recovery is for virtiofsd to keep its state in tmpfs (e.g. inode/fd mappings) and open fds shared with a
clone(2) process via CLONE_FILES. This way the virtiofsd process can
terminate but its state persists in memory thanks to its clone process.
The clone can then be used to launch the new virtiofsd process from the
old state. This would allow the device to resume transparently with QEMU only reconnecting the vhost-user UNIX domain socket. This is an idea
that we discussed in the bi-weekly virtiofs call.
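
A minimal sketch of that fd-preservation idea, assuming a hypothetical helper named spawn_fd_holder() (Linux-specific, error handling elided; this illustrates the proposal, not actual virtiofsd code):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <unistd.h>

/* Stack for the clone(2) child; the child only sleeps, so small is fine. */
static char holder_stack[64 * 1024];

/* The child shares our file descriptor table (CLONE_FILES), so every
 * fd stays open as long as it is alive, even if the main virtiofsd
 * process crashes. A restarted virtiofsd could then reclaim the fds
 * from the holder (e.g. via pidfd_getfd(2)) together with the state
 * kept in tmpfs, and resume without disturbing the guest. */
static int fd_holder(void *arg)
{
    (void)arg;
    for (;;) {
        pause(); /* do nothing; just keep the shared fd table alive */
    }
    return 0;
}

int spawn_fd_holder(void)
{
    /* CLONE_FILES shares the fd table; SIGCHLD gives normal wait semantics. */
    return clone(fd_holder, holder_stack + sizeof(holder_stack),
                 CLONE_FILES | SIGCHLD, NULL);
}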

You mentioned device reset. VIRTIO 1.1 has the Device Status Field
DEVICE_NEEDS_RESET flag that the device can use to tell the driver that
a reset is necessary. This feature is present in the specification but
not implemented in the Linux guest drivers. Again the reason is that
handling it requires driver-specific logic for restoring state after
reset...otherwise the device reset would be visible to userspace.
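
For reference, Linux already defines the corresponding status bit as VIRTIO_CONFIG_S_NEEDS_RESET (0x40) in include/uapi/linux/virtio_config.h; what is missing is the driver logic behind it. A hypothetical sketch of where such handling would hook in (check_needs_reset() and its comments are illustrative, not existing kernel code):

#include <stdint.h>

/* Device status bits from the VIRTIO 1.1 spec (device status field). */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE  0x01
#define VIRTIO_CONFIG_S_DRIVER       0x02
#define VIRTIO_CONFIG_S_DRIVER_OK    0x04
#define VIRTIO_CONFIG_S_FEATURES_OK  0x08
#define VIRTIO_CONFIG_S_NEEDS_RESET  0x40
#define VIRTIO_CONFIG_S_FAILED       0x80

/* A driver would have to notice DEVICE_NEEDS_RESET, re-run device
 * initialization, and restore enough state that userspace never sees
 * the reset -- this driver-specific restore step is what the Linux
 * guest drivers currently lack. */
void check_needs_reset(uint8_t status)
{
    if (status & VIRTIO_CONFIG_S_NEEDS_RESET) {
        /* quiesce I/O, reset the device, renegotiate features,
         * requeue in-flight requests, then resume */
    }
}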

Stefan

