From: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
To: "Marc-André Lureau" <marcandre.lureau@gmail.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	QEMU <qemu-devel@nongnu.org>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	virtio-fs@redhat.com, Xie Yongji <xieyongji@bytedance.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [External] Re: [RFC PATCH 0/9] Support for Virtio-fs daemon crash reconnection
Date: Fri, 18 Dec 2020 17:39:34 +0800
Message-ID: <CAFQAk7hCqSMMfRjUO8vtK-B2cKxJZZTJgSDAbRycd1AOSktM_w@mail.gmail.com>
In-Reply-To: <CAJ+F1CLZ4VtgKp5fEdC70m22PgV2VHvRHunR-nPOWDnJPFvqqg@mail.gmail.com>

On Wed, Dec 16, 2020 at 11:36 PM Marc-André Lureau <
marcandre.lureau@gmail.com> wrote:

> Hi
>
> On Tue, Dec 15, 2020 at 8:22 PM Jiachen Zhang <
> zhangjiachen.jaycee@bytedance.com> wrote:
>
>> Hi, all
>>
>> We implement virtio-fs crash reconnection in this patchset. The crash
>> reconnection of virtiofsd here is completely transparent to the guest: no
>> remount in the guest is needed, and even the inflight requests can be
>> handled normally after reconnection. We are looking forward to any comments.
>>
>> Thanks,
>> Jiachen
>>
>>
>> OVERVIEW:
>>
>> To support virtio-fs crash reconnection, we need to support the recovery
>> of 1) inflight FUSE requests, and 2) virtiofsd internal status information.
>>
>> Fortunately, QEMU's vhost-user reconnection framework already supports
>> inflight I/O tracking by using VHOST_USER_GET_INFLIGHT_FD and
>> VHOST_USER_SET_INFLIGHT_FD (see 5ad204bf2 and 5f9ff1eff for details).
>> As the FUSE requests are transferred by virtqueue I/O requests, by using
>> the vhost-user inflight I/O tracking, we can recover the inflight FUSE
>> requests.
>>
>> To support virtiofsd internal status recovery, we introduce 4 new
>> vhost-user message types. As shown in the following diagram, two of them
>> are used to persist shared lo_maps and opened fds to QEMU, and the other
>> two are used to restore the status when reconnecting.
>>
>>                                VHOST_USER_SLAVE_SHM
>>                                VHOST_USER_SLAVE_FD
>>     +--------------+       Persist       +--------------------+
>>     |              <---------------------+                    |
>>     |     QEMU     |                     |  Virtio-fs Daemon  |
>>     |              +--------------------->                    |
>>     +--------------+       Restore       +--------------------+
>>             VHOST_USER_SET_SHM
>>             VHOST_USER_SET_FD
>>
>> Although the 4 newly added message types are meant to support virtiofsd
>> reconnection in this patchset, they might be useful in other situations.
>> So we tried to keep them general when adding them to the vhost-related
>> source files. VHOST_USER_SLAVE_SHM and VHOST_USER_SET_SHM can be
>> used for memory sharing between a vhost-user daemon and QEMU;
>> VHOST_USER_SLAVE_FD and VHOST_USER_SET_FD would be useful if we want to
>> share opened fds between the QEMU process and a vhost-user daemon process.
>>
>
> Before adding new messages to the already complex vhost-user protocol, can
> we evaluate other options?
>
> First thing that came to my mind is that the memory state could be saved
> to disk or with a POSIX shared memory object.
>
> Eventually, the protocol could just pass around the fds, and not make a
> special treatment for shared memory.
>
> Then I remember systemd has a pretty good API & protocol for this sort of
> thing: sd_notify(3) (afaik, it is quite easy to implement a minimal handler)
>
> You can store fds with FDSTORE=1 (with an optional associated FDNAME).
> sd_listen_fds() & others to get them back (note: passed by inheritance only
> I think). systemd seems to not make shm a special case either, just treat
> it like an opened fd to restore.
>
> If we consider backend processes are going to be managed by libvirt or
> even a systemd service, is it a better alternative? sd_notify() offers a
> number of interesting features as well to monitor services.
>
>

Thanks for the suggestions. We chose to save all state information to
QEMU because a virtiofsd has the same lifecycle as its QEMU master.
However, saving the state to a file does avoid communication with QEMU,
and we would no longer need to increase the complexity of the vhost-user
protocol. The suggestion to save fds to systemd is also very reasonable
if we don't consider the lifecycle issues. We will try it.
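
To make the fd-store idea concrete, here is a minimal sketch of what handing
an fd to a supervisor could look like. It only assumes what the sd_notify(3)
protocol documents: a datagram of newline-separated KEY=VALUE fields
(FDSTORE=1, FDNAME=...) with the fd attached as SCM_RIGHTS ancillary data.
The socketpair stands in for the real $NOTIFY_SOCKET, and the function names,
the "lo_map" label and the payload are hypothetical. It is not code from the
patchset and needs Python 3.9+ for socket.send_fds().

```python
import os
import socket
import tempfile


def build_fdstore_message(name):
    # sd_notify(3) messages are newline-separated KEY=VALUE assignments.
    return f"FDSTORE=1\nFDNAME={name}".encode()


def store_fd(sock, fd, name):
    # The fd travels as SCM_RIGHTS ancillary data next to the text payload.
    socket.send_fds(sock, [build_fdstore_message(name)], [fd])


def receive_stored_fd(sock):
    # Stand-in for the supervisor side: parse the fields and keep the fd.
    data, fds, _flags, _addr = socket.recv_fds(sock, 4096, 1)
    fields = dict(line.split("=", 1) for line in data.decode().splitlines())
    return fields, fds[0]


def demo():
    # Hypothetical setup: a datagram socketpair replaces $NOTIFY_SOCKET.
    manager, daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

    with tempfile.TemporaryFile() as state:
        state.write(b"inode-table-v1")
        state.flush()
        store_fd(daemon, state.fileno(), "lo_map")
    # The daemon's own copy of the fd is closed here, as it would be after a crash.
    daemon.close()

    fields, kept_fd = receive_stored_fd(manager)
    with os.fdopen(kept_fd, "rb") as kept:
        kept.seek(0)
        content = kept.read()
    manager.close()
    return fields, content
```

The file stays readable through the supervisor's copy of the fd after the
daemon side has closed its own, which is the property the restore path
would rely on.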

All the best,
Jiachen



>>
>> USAGE and NOTES:
>>
>> - The commits are rebased to a recent QEMU master commit
>> b4d939133dca0fa2b.
>>
>> - ",reconnect=1" should be added to the "-chardev socket" of
>> vhost-user-fs-pci in the QEMU command line, for example:
>>
>>     qemu-system-x86_64 ... \
>>     -chardev socket,id=char0,path=/tmp/vhostqemu,reconnect=1 \
>>     -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=myfs \
>>     ...
>>
>> - We add new options for virtiofsd to enable or disable crash
>> reconnection. Some options are not supported with crash reconnection,
>> so add the following options to virtiofsd to enable reconnection:
>>
>>     virtiofsd ... -o reconnect -o no_mount_ns -o no_flock -o no_posix_lock
>>     -o no_xattr ...
>>
>> - The reasons why virtiofsd-side locking, extended attributes, and mount
>> namespaces are not supported are explained in the commit message of the
>> 6th patch (virtiofsd: Add two new options for crash reconnection).
>>
>> - The 9th patch is a work-around that will not affect the overall
>> correctness. We remove the qsort-related code because we found that when
>> resubmit_num is larger than 64, seccomp will kill the virtiofsd process.
>>
>> - Support for the DAX version of virtiofsd is very possible and requires
>> almost no additional change to this patchset.
>>
>>
>> Jiachen Zhang (9):
>>   vhost-user-fs: Add support for reconnection of vhost-user-fs backend
>>   vhost: Add vhost-user message types for sending shared memory and file
>>     fds
>>   vhost-user-fs: Support virtiofsd crash reconnection
>>   libvhost-user: Add vhost-user message types for sending shared memory
>>     and file fds
>>   virtiofsd: Convert the struct lo_map array to a more flatten layout
>>   virtiofsd: Add two new options for crash reconnection
>>   virtiofsd: Persist/restore lo_map and opened fds to/from QEMU
>>   virtiofsd: Ensure crash consistency after reconnection
>>   virtiofsd: (work around) Comment qsort in inflight I/O tracking
>>
>>  contrib/libvhost-user/libvhost-user.c | 106 +++-
>>  contrib/libvhost-user/libvhost-user.h |  70 +++
>>  docs/interop/vhost-user.rst           |  41 ++
>>  hw/virtio/vhost-user-fs.c             | 334 ++++++++++-
>>  hw/virtio/vhost-user.c                | 123 ++++
>>  hw/virtio/vhost.c                     |  42 ++
>>  include/hw/virtio/vhost-backend.h     |   6 +
>>  include/hw/virtio/vhost-user-fs.h     |  16 +-
>>  include/hw/virtio/vhost.h             |  42 ++
>>  tools/virtiofsd/fuse_lowlevel.c       |  24 +-
>>  tools/virtiofsd/fuse_virtio.c         |  44 ++
>>  tools/virtiofsd/fuse_virtio.h         |   1 +
>>  tools/virtiofsd/helper.c              |   9 +
>>  tools/virtiofsd/passthrough_helpers.h |   2 +-
>>  tools/virtiofsd/passthrough_ll.c      | 830 ++++++++++++++++++--------
>>  tools/virtiofsd/passthrough_seccomp.c |   1 +
>>  16 files changed, 1413 insertions(+), 278 deletions(-)
>>
>> --
>> 2.20.1
>>
>>
>>
>
> --
> Marc-André Lureau
>
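
As a side note on the inflight tracking mentioned in the quoted overview, the
general idea can be sketched as follows: request state lives in memory that
outlives the daemon, so a restarted daemon can find the requests it never
completed and resubmit them in submission order. The slot layout, class name
and counter field below are hypothetical and much simpler than what
libvhost-user actually keeps in the inflight region. A temporary file stands
in for the fd that QEMU would hold.

```python
import mmap
import struct
import tempfile

# Hypothetical slot layout: 1-byte inflight flag + 8-byte submission counter.
SLOT = struct.Struct("<BQ")


class InflightRegion:
    def __init__(self, fileobj, num_slots):
        self.num_slots = num_slots
        size = SLOT.size * num_slots
        # Extends a fresh file with zeros; a no-op when re-attaching.
        fileobj.truncate(size)
        self.mem = mmap.mmap(fileobj.fileno(), size)

    def mark_inflight(self, slot, counter):
        SLOT.pack_into(self.mem, slot * SLOT.size, 1, counter)

    def mark_done(self, slot):
        SLOT.pack_into(self.mem, slot * SLOT.size, 0, 0)

    def pending(self):
        # Collect slots still flagged inflight, ordered by submission counter.
        found = []
        for slot in range(self.num_slots):
            inflight, counter = SLOT.unpack_from(self.mem, slot * SLOT.size)
            if inflight:
                found.append((counter, slot))
        return [slot for _counter, slot in sorted(found)]


def demo():
    with tempfile.TemporaryFile() as backing:
        region = InflightRegion(backing, 8)
        region.mark_inflight(5, counter=1)
        region.mark_inflight(2, counter=2)
        region.mark_inflight(7, counter=3)
        region.mark_done(2)
        region.mem.close()  # simulated crash: the mapping is gone, the file is not

        recovered = InflightRegion(backing, 8)
        result = recovered.pending()
        recovered.mem.close()
        return result
```

Here the re-attached region reports slots 5 and 7 as still pending, in the
order they were submitted, while the completed slot 2 is skipped.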

Thread overview: 60+ messages
2020-12-15 16:21 [RFC PATCH 0/9] Support for Virtio-fs daemon crash reconnection Jiachen Zhang
2020-12-15 16:21 ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 1/9] vhost-user-fs: Add support for reconnection of vhost-user-fs backend Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 2/9] vhost: Add vhost-user message types for sending shared memory and file fds Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 3/9] vhost-user-fs: Support virtiofsd crash reconnection Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 4/9] libvhost-user: Add vhost-user message types for sending shared memory and file fds Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 5/9] virtiofsd: Convert the struct lo_map array to a more flatten layout Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 6/9] virtiofsd: Add two new options for crash reconnection Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2021-02-04 12:08   ` Dr. David Alan Gilbert
2021-02-04 12:08     ` [Virtio-fs] " Dr. David Alan Gilbert
2021-02-04 14:16     ` [External] " Jiachen Zhang
2021-02-04 14:16       ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 7/9] virtiofsd: Persist/restore lo_map and opened fds to/from QEMU Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 8/9] virtiofsd: Ensure crash consistency after reconnection Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 9/9] virtiofsd: (work around) Comment qsort in inflight I/O tracking Jiachen Zhang
2020-12-15 16:21   ` [Virtio-fs] " Jiachen Zhang
2021-02-04 12:15   ` Dr. David Alan Gilbert
2021-02-04 12:15     ` [Virtio-fs] " Dr. David Alan Gilbert
2021-02-04 14:20     ` [External] " Jiachen Zhang
2021-02-04 14:20       ` [Virtio-fs] " Jiachen Zhang
2020-12-15 22:51 ` [RFC PATCH 0/9] Support for Virtio-fs daemon crash reconnection no-reply
2020-12-15 22:51   ` [Virtio-fs] " no-reply
2020-12-16 15:36 ` Marc-André Lureau
2020-12-16 15:36   ` [Virtio-fs] " Marc-André Lureau
2020-12-18  9:39   ` Jiachen Zhang [this message]
2020-12-18  9:39     ` [Virtio-fs] [External] " Jiachen Zhang
2021-03-17 10:05     ` Stefan Hajnoczi
2021-03-17 10:05       ` [Virtio-fs] " Stefan Hajnoczi
2021-03-17 11:49       ` Christian Schoenebeck
2021-03-17 11:49         ` [Virtio-fs] " Christian Schoenebeck
2021-03-17 12:57         ` Jiachen Zhang
2021-03-17 12:57           ` [Virtio-fs] " Jiachen Zhang
2021-03-18 11:58           ` Christian Schoenebeck
2021-03-18 11:58             ` [Virtio-fs] " Christian Schoenebeck
2021-03-22 10:54             ` Stefan Hajnoczi
2021-03-22 10:54               ` [Virtio-fs] " Stefan Hajnoczi
2021-03-23 12:54               ` Christian Schoenebeck
2021-03-23 12:54                 ` [Virtio-fs] " Christian Schoenebeck
2021-03-23 14:25                 ` Stefan Hajnoczi
2021-03-23 14:25                   ` [Virtio-fs] " Stefan Hajnoczi
2021-03-17 12:32       ` Jiachen Zhang
2021-03-17 12:32         ` [Virtio-fs] " Jiachen Zhang
2021-03-22 11:00         ` Stefan Hajnoczi
2021-03-22 11:00           ` [Virtio-fs] " Stefan Hajnoczi
2021-03-22 20:13           ` Vivek Goyal
2021-03-22 20:13             ` Vivek Goyal
2021-03-23 13:45             ` Stefan Hajnoczi
2021-03-23 13:45               ` Stefan Hajnoczi
2021-05-10 14:38 ` Jiachen Zhang
2021-05-10 14:38   ` [Virtio-fs] " Jiachen Zhang
2021-05-13 15:17   ` Stefan Hajnoczi
2021-05-13 15:17     ` [Virtio-fs] " Stefan Hajnoczi
