All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eryu Guan <eguan@linux.alibaba.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	fuse-devel <fuse-devel@lists.sourceforge.net>,
	Max Reitz <mreitz@redhat.com>,
	virtio-fs-list <virtio-fs@redhat.com>,
	bergwolf@hyper.sh, gerry@linux.alibaba.com,
	Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [Virtio-fs] [fuse-devel] 'FORGET' ordering semantics (vs unlink & NFS)
Date: Fri, 8 Jan 2021 12:12:52 +0800	[thread overview]
Message-ID: <20210108041252.GT80581@e18g06458.et15sqa> (raw)
In-Reply-To: <CAOQ4uxjTerAAxvjpDmDMQO363ju01MsEhbgBmHHSQb92hoL1_g@mail.gmail.com>

On Thu, Jan 07, 2021 at 12:42:00PM +0200, Amir Goldstein wrote:
> On Thu, Jan 7, 2021 at 10:44 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Wed, Jan 6, 2021 at 5:58 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Wed, Jan 06, 2021 at 02:40:45PM +0100, Miklos Szeredi wrote:
> >
> > > > @@ -901,6 +906,18 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > > >
> > > >      inode = lo_find(lo, &e->attr, mnt_id);
> > > >      if (inode) {
> > > > +        char buf1[PATH_MAX + 1], buf2[PATH_MAX + 1], procname[64];
> > > > +        ssize_t siz1, siz2;
> > > > +
> > > > +        sprintf(procname, "%i", inode->fd);
> > > > +        siz1 = readlinkat(lo->proc_self_fd, procname, buf1, sizeof(buf1));
> > > > +        sprintf(procname, "%i", newfd);
> > > > +        siz2 = readlinkat(lo->proc_self_fd, procname, buf2, sizeof(buf2));
> > > > +
> > > > +        /* disable close on unlink if alias is detected */
> > > > +        if (siz1 != siz2 || memcmp(buf1, buf2, siz1))
> > > > +            g_atomic_int_inc(&inode->opencount);
> > > > +
> > >
> > > Hi Miklos,
> > >
> > > So if I have a hard links to a file in a separate directories, then this
> > > path can hit. (Say dir1/file.txt and dir2/file-link.txt). IIUC, we will
> > > disable automatic inode->fd closing on these hardlinked files. That
> > > means this solution will not solve problem at hand for hard linked files.
> > > Am I missing something.
> >
> > You are correct, this is a limited solution.  The one above is a
> > corner case, and probably not worth worrying about too much.  However
> > it  can be fixed by having separate O_PATH fd's open for each alias
> > (or at least for each alias that resides in a distinct directory).
> >
> > NB: doing a synchronous FORGET on unlink does not solve this case
> > either, since the inode will not receive a FORGET until the last alias
> > is dropped.
> >
> > > >          close(newfd);
> > > >      } else {
> > > >          inode = calloc(1, sizeof(struct lo_inode));
> > > > @@ -917,6 +934,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > > >           */
> > > >          g_atomic_int_set(&inode->refcount, 2);
> > > >
> > > > +        g_atomic_int_set(&inode->opencount, 0);
> > > >          inode->nlookup = 1;
> > > >          inode->fd = newfd;
> > > >          inode->key.ino = e->attr.st_ino;
> > > > @@ -1295,6 +1313,10 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
> > > >      res = unlinkat(lo_fd(req, parent), name, 0);
> > > >
> > > >      fuse_reply_err(req, res == -1 ? errno : 0);
> > > > +    if (!g_atomic_int_get(&inode->opencount)) {
> > > > +        close(inode->fd);
> > > > +        inode->fd = -1;
> > > > +    }
> > >
> > > Can this be racy w.r.t lo_lookup(). IOW, say dir1/file.txt is being
> > > unlinked and we closed inode->fd. And before we could execute
> > > unref_inode_lolocked(), another parallel lookup of dir2/file-link.txt
> > > happens if it gets lo->muxtex lock first, it can still find this
> > > inode and bump up reference count. And that means lo_unlink() will
> > > not free inode (but close inode->fd) and now we will use an inode
> > > with closed O_PATH fd which lead to other failures later.
> >
> > Yes, that's a bug.   Also needs serialization against all access to inode->fd.
> >
> 
> Miklos,
> 
> I would like to point out that this discussion is relevant to any low level
> fuse filesystem, but especially those that are proxying a real filesystem.
> 
> I have raised this question before in the FUSE_PASSTHROUGH threads.
> There is a lot of code duplication among various low-level fuse projects and
> as we get to NFS export support and complex issues like this one, is it
> getting unlikely that all projects will handle this correctly.
> 
> Do you think there is room for some more generic code in libfuse and if
> so how? Following an example implementation (assuming it gets fixed)
> and hand picking fixes to projects cannot scale for long.
> 
> The challenge is that most of the generic code would be in lookup and
> maintaining the internal inode cache, but each filesystem may need
> different representations of the Inode object.
> 
> I was thinking of something along the lines of an OO library that
> implements the generic lookup/inode cache for a base FuseInode class
> that implementers can inherit from.
> 
> This doesn't need to be in the libfuse project at all.
> Seeing the virtiofsd already has a Rust implementation that is BSD
> licensed, maybe that can be used as a starting point?
> 
> David,
> 
> How hard do you think it would be to re-factor virtiofsd-rs to
> an implementation that inherits from a base passthroughfsd-rs?
> 
> BTW, is virtiofsd-rs the offical virtiofsd or an experimental one?
> Which tree does Miklos' patch apply to?
> 
> Anyone has other thoughts about how to reduce fragmentation in
> implementations?

There's an fuse-backend-rs[1] project hosted on cloud-hypervisor, it is
a library to communicate with the Linux FUSE clients, which includes:

- ABI layer, which defines all data structures shared between linux Fuse
  framework and Fuse daemons.
- API layer, defines the interfaces for Fuse daemons to implement a
  userspace file system.
- Transport layer, which supports both the Linux Fuse device and
  virtio-fs protocol.
- VFS/pseudo_fs, an abstraction layer to support multiple file systems
  by a single virtio-fs device.
- A sample passthrough file system implementation, which passes through
  files from daemons to clients.

I'm wondering if fuse-backend-rs is a proper project to work on, and
maybe virtiofsd-rs could be switched to use it as well in the future.

Thanks,
Eryu

[1] https://github.com/cloud-hypervisor/fuse-backend-rs


  parent reply	other threads:[~2021-01-08  4:12 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-04 16:00 [Virtio-fs] 'FORGET' ordering semantics (vs unlink & NFS) Dr. David Alan Gilbert
2021-01-04 18:45 ` Vivek Goyal
2021-01-04 18:56   ` Dr. David Alan Gilbert
2021-01-04 19:04     ` Vivek Goyal
2021-01-04 19:16       ` Vivek Goyal
2021-01-05 11:24     ` [Virtio-fs] [fuse-devel] " Miklos Szeredi
2021-01-05 15:42       ` Vivek Goyal
2021-01-06  4:29     ` Amir Goldstein
2021-01-06  8:01       ` Miklos Szeredi
2021-01-06  9:16         ` Amir Goldstein
2021-01-06  9:27           ` Amir Goldstein
2021-01-06 13:40           ` Miklos Szeredi
2021-01-06 16:57             ` Vivek Goyal
2021-01-07  8:44               ` Miklos Szeredi
2021-01-07 10:42                 ` Amir Goldstein
2021-01-07 20:10                   ` Dr. David Alan Gilbert
2021-01-08  4:12                   ` Eryu Guan [this message]
2021-01-08  9:08                     ` Amir Goldstein
2021-01-08  9:25                       ` Liu, Jiang
2021-01-08 10:18                       ` Eryu Guan
2021-01-08 15:26                     ` Vivek Goyal
2021-01-15 10:20                       ` Peng Tao
2021-01-08 15:55           ` Vivek Goyal
2021-01-11 15:48           ` Dr. David Alan Gilbert
2021-01-05 10:11 ` Nikolaus Rath
2021-01-05 12:28   ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210108041252.GT80581@e18g06458.et15sqa \
    --to=eguan@linux.alibaba.com \
    --cc=amir73il@gmail.com \
    --cc=bergwolf@hyper.sh \
    --cc=fuse-devel@lists.sourceforge.net \
    --cc=gerry@linux.alibaba.com \
    --cc=miklos@szeredi.hu \
    --cc=mreitz@redhat.com \
    --cc=vgoyal@redhat.com \
    --cc=virtio-fs@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.