From: Neil Brown <neilb@suse.de>
To: Valerie Aurora <vaurora@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	viro@zeniv.linux.org.uk, jblunck@suse.de, hch@infradead.org
Subject: Re: [PATCH 5/5] union: hybrid union filesystem prototype
Date: Fri, 3 Sep 2010 15:10:35 +1000
Message-ID: <20100903151035.6d5c3c11@notabene>
In-Reply-To: <20100902213315.GA16004@shell>

On Thu, 2 Sep 2010 17:33:15 -0400
Valerie Aurora <vaurora@redhat.com> wrote:

> On Thu, Sep 02, 2010 at 11:19:41AM +0200, Miklos Szeredi wrote:
> > On Wed, 1 Sep 2010, Valerie Aurora wrote:
> > > > +
> > > > +		err = vfs_create(upperdir, newdentry, attr->ia_mode, NULL);
> > > 
> > > Passing a NULL nameidata pointer to vfs_create() is a convenient
> > > temporary hack, but unfortunately NFS, ceph, etc. still use the
> > > nameidata passed to vfs_create() and other ops.
> > > 
> > > The way union mounts gets a valid nameidata is by doing the create in
> > > the VFS before calling file system ops that may trigger a copyup,
> > > while we still have the original nameidata.  This is one of the major
> > > reasons union mounts lives in the VFS.
> > 
> > Not a big deal, just set up nd as if this was a single component
> > lookup.  The previous version did it like this:
> > 
> > +       struct nameidata nd = {
> > +               .last_type = LAST_NORM,
> > +               .last = *name,
> > +       };
> > +
> > +       nd.path = pue->upperpath;
> > +       path_get(&nd.path);
> > +
> > +       newdentry = lookup_create(&nd, S_ISDIR(attr->ia_mode));
> > 
> > But that's not a solution to the NFS suckage, it's just a workaround.
> 
> Hm, I suspect it's more complicated than this.  I looked at how
> unionfs does it in init_lower_nd() and it requires poking around in
> VFS internal details in the file system implementation.  So unioning
> code is not in the VFS, but VFS code is in the union fs.  Progress?  I
> dunno.

Slightly off-topic, but my personal definition of 'progress' in this context
would be giving more control to the filesystems rather than the VFS telling
them how they have to behave.  The VFS should largely be a library that the
filesystems can call on to do common tasks, but where they can augment what
libVFS does, or just ignore it as they choose.  This would be more like the
model of the page-cache.  It is really easy for a filesystem to use the
pagecache to store file content, and really easy for it to do something else
if that works better.

In this particular situation - where unionfs has a dentry and wants to copy
that file to a different dentry, I think what we really want to do is call
the section of code in the middle of do_filp_open, roughly from the "We have
the parent and last component"  comment to the do_last() call.  If that could
be factored out and exported it would get close to what we want.

I had a look at NFS and ceph, and they want to see LOOKUP_CREATE and
LOOKUP_OPEN set, and want the intent.open.file to exist.  do_filp_open can do
all that for you.
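
To make that requirement concrete, here is a rough userspace model of the
check, not kernel code.  struct nameidata_model is a cut-down stand-in for
struct nameidata, the flag values are placeholders rather than the kernel's,
and init_create_nd() and nd_usable_for_create() are hypothetical names used
only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bits: stand-ins for the kernel's lookup flags. */
#define LOOKUP_OPEN   0x0100
#define LOOKUP_CREATE 0x0200

struct file;	/* opaque in this model */

/* Cut-down stand-in for the open intent carried in a nameidata. */
struct open_intent {
	int flags;
	int create_mode;
	struct file *file;
};

/* Hypothetical model: only the fields discussed above. */
struct nameidata_model {
	unsigned int flags;
	struct {
		struct open_intent open;
	} intent;
};

/* Fill in what a create-on-open caller would be expected to provide. */
static void init_create_nd(struct nameidata_model *nd, struct file *filp,
			   int open_flags, int mode)
{
	nd->flags = LOOKUP_OPEN | LOOKUP_CREATE;
	nd->intent.open.flags = open_flags;
	nd->intent.open.create_mode = mode;
	nd->intent.open.file = filp;
}

/* True if nd carries both flags and a non-NULL intent.open.file. */
static bool nd_usable_for_create(const struct nameidata_model *nd)
{
	const unsigned int want = LOOKUP_OPEN | LOOKUP_CREATE;

	if (nd == NULL)
		return false;
	if ((nd->flags & want) != want)
		return false;
	return nd->intent.open.file != NULL;
}
```

A NULL nameidata, or one with only last_type and last filled in, fails this
check, which is why a factored-out piece of do_filp_open looks more
attractive than building the structure by hand.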


> 
> > "Fortunately" NFS isn't good for a writable layer of a union for other
> > reasons, so this isn't a big concern at the moment.
> 
> It's the long-term effect on the code structure that concerns me more.

Code structure:  absolutely agree this is important.  But I don't think it
    needs to be a problem - just refactor "VFS" code and call into it.
    (I note that nfsd always passes a NULL nameidata - when refactoring that
    code it would be worth aiming to make it usable by nfsd too).

NFS as writable layer:  Not a concern at the moment, no.  But I think it is
   worth keeping it in mind.
   The biggest problem is, I think, the lack of xattrs which are currently
   needed for whiteout and opaque.
   I think there would be little cost in allowing a symlink to
   (union-whiteout) to be treated as a whiteout even though it has no xattrs
   (maybe as a mount option).
   For opaque you would need a somewhat less elegant workaround, e.g. if the
   directory contains a symlink to (union-opaque) called ._.union_opaque,
   then that symlink is hidden, and the directory is opaque.  This could be
   enabled by that same mount option.
   This might not be as efficient as xattrs, but then people don't use
   networked filesystems for their speed - they have other benefits.

NeilBrown


