qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Cc: QEMU <qemu-devel@nongnu.org>,
	Xie Yongji <xieyongji@bytedance.com>,
	virtio-fs-list <virtio-fs@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [RFC PATCH 0/9] Support for Virtio-fs daemon crash reconnection
Date: Thu, 13 May 2021 16:17:03 +0100	[thread overview]
Message-ID: <YJ1C752kyBYW9ltm@stefanha-x1.localdomain> (raw)
In-Reply-To: <CAFQAk7gH7DUi0-wDANQQBHTPgdtQxv34k+6tr9vzftPLqJt6KQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3294 bytes --]

On Mon, May 10, 2021 at 10:38:05PM +0800, Jiachen Zhang wrote:
> Hi all,
> 
> 
> We are going to develop the v2 patch for virtio-fs crash reconnection. As
> suggested by Marc-André and Stefan, except for the inflight I/O tracking
> log area, all the other internal statuses of virtiofsd will be saved to
> some places other than QEMU. Specifically, the three lo_maps (ino_map,
> dirp_map, and fd_map) could be saved to several mmapped files, and the
> opened fds could be saved to systemd. I'd like to get some feedback on our
> further thoughts before we work on the revision.
> 
> 
> 1. What about by default save the opened fds as file handles to host
> kernel, instead of saving them to systemd. After some internal discussion,
> we think introducing systemd may introduce more uncertainness to the
> system, as we need to create one service for each daemon, and all the
> daemons may suffer the single-point failure of the systemd process.

I don't think saving file handles works 100%. The difference between an
open fd and a file handle is what happens when the inode is deleted. If
an external process deletes the inode during restart and then the fd
keeps it alive while a file handle becomes stale and the inode is gone.

Regarding systemd, it's pid 1 and cannot die - otherwise the system is
broken.

But in any case I think there are multiple options here. Whether you
choose to systemd, implement the sd_notify(3) protocol in your own
parent process, or take a different approach like a parent process with
clone(2) CLONE_FILES to avoid the communication overhead for saving
every fd, I think all of those approaches would be reasonable.

> 2. Like the btree map implementation (multikey.rs) of virtiofsd-rs, what
> about splitting the flatten lo_map implementation, which supports to be
> persisted to files, from passhtrough_ll.c to a new separated source file.
> This way, maybe we can more easily wrap it with some Rust compatible
> interfaces, and enable crash recovery for virtiofsd-rs based on it.

In the past two months I've noticed the number of virtiofsd-rs merge
requests has increased and I think the trend is that new development is
focussing on virtiofsd-rs.

If it fits into your plans then focussing on virtiofsd-rs would be fine
and then there is no need to worry about Rust compatible interfaces for
C virtiofsd.

> 3. What about dropping the dirp_map, and integrate the opened directory fds
> to fd_map. The virtiofsd-rs implementation only has two maps (inodes and
> handles). In the C version, dirp_map may also unnecessary.

Maybe, but carefully:

The purpose of the maps is to safely isolate the client from the
virtiofsd's internal objects. The way I remember it is that C virtiofsd
has a separate dirp map to prevent type confusion between regular open
files and directories. The client must not trick the server into calling
readdir(3) on something that's not a struct dirent because that could be
a security issue.

However, it's possible that virtiofsd-rs is able to combine the two
because it uses syscall APIs on file descriptors instead of libc
opendir(3) so there is no possibility of type confusion. The syscall
would simply fail if the file descriptor is not O_DIRECTORY.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

      reply	other threads:[~2021-05-13 15:18 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-15 16:21 [RFC PATCH 0/9] Support for Virtio-fs daemon crash reconnection Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 1/9] vhost-user-fs: Add support for reconnection of vhost-user-fs backend Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 2/9] vhost: Add vhost-user message types for sending shared memory and file fds Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 3/9] vhost-user-fs: Support virtiofsd crash reconnection Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 4/9] libvhost-user: Add vhost-user message types for sending shared memory and file fds Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 5/9] virtiofsd: Convert the struct lo_map array to a more flatten layout Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 6/9] virtiofsd: Add two new options for crash reconnection Jiachen Zhang
2021-02-04 12:08   ` Dr. David Alan Gilbert
2021-02-04 14:16     ` [External] " Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 7/9] virtiofsd: Persist/restore lo_map and opened fds to/from QEMU Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 8/9] virtiofsd: Ensure crash consistency after reconnection Jiachen Zhang
2020-12-15 16:21 ` [RFC PATCH 9/9] virtiofsd: (work around) Comment qsort in inflight I/O tracking Jiachen Zhang
2021-02-04 12:15   ` Dr. David Alan Gilbert
2021-02-04 14:20     ` [External] " Jiachen Zhang
2020-12-15 22:51 ` [RFC PATCH 0/9] Support for Virtio-fs daemon crash reconnection no-reply
2020-12-16 15:36 ` Marc-André Lureau
2020-12-18  9:39   ` [External] " Jiachen Zhang
2021-03-17 10:05     ` Stefan Hajnoczi
2021-03-17 11:49       ` Christian Schoenebeck
2021-03-17 12:57         ` Jiachen Zhang
2021-03-18 11:58           ` Christian Schoenebeck
2021-03-22 10:54             ` Stefan Hajnoczi
2021-03-23 12:54               ` Christian Schoenebeck
2021-03-23 14:25                 ` Stefan Hajnoczi
2021-03-17 12:32       ` Jiachen Zhang
2021-03-22 11:00         ` Stefan Hajnoczi
2021-03-22 20:13           ` [Virtio-fs] " Vivek Goyal
2021-03-23 13:45             ` Stefan Hajnoczi
2021-05-10 14:38 ` Jiachen Zhang
2021-05-13 15:17   ` Stefan Hajnoczi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YJ1C752kyBYW9ltm@stefanha-x1.localdomain \
    --to=stefanha@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=virtio-fs@redhat.com \
    --cc=xieyongji@bytedance.com \
    --cc=zhangjiachen.jaycee@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).