All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever III <chuck.lever@oracle.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 8/8] NFSD: Instantiate a struct file when creating a regular NFSv4 file
Date: Sat, 14 May 2022 16:34:08 +0000	[thread overview]
Message-ID: <F0D6DEC4-2589-4501-9A35-93C68CBE642F@oracle.com> (raw)
In-Reply-To: <Yn8zpAbwe9yFq8/i@zeniv-ca.linux.org.uk>



> On May 14, 2022, at 12:44 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> On Fri, May 13, 2022 at 10:20:18PM +0000, Al Viro wrote:
> 
>> Yuck.  dget_parent() is not entirely without valid uses, but this isn't
>> one.  It's for the cases when parent is *not* stable and you need to grab
>> what had been the parent at some point (even though it might not be the
>> parent anymore by the time dget_parent() returns).  Here you seriously
>> depend upon it remaining the parent of that sucker all the way through -
>> otherwise vfs_create() would break.  And you really, really depend upon
>> its survival - the caller is holding it locked, so they would better
>> have it pinned.
> 
> As an aside, the reason why vfs_create() takes inode of parent directory
> and dentry of child is basically that it's easier to describe the locking
> rules that way: vfs_create(..., dir, child, ...) must be called with
> 1) dir being held by caller (exclusive) and
> 2) child->d_parent->d_inode == dir, which is stabilized by (1)
> 
> inode of parent directory is a redundant argument - it can be easily
> derived from the child dentry, for all that family.  The only real
> objection against dropping it from vfs_create() and friends is that
> having rules described as "inode of parent dentry of child must be held
> exclusive by the caller" invites breakage along the lines of
> 
> 	parent = dget_parent(child);
> 	inode_lock(d_inode(parent));
> 	vfs_create(..., child, ...);	// WRONG
> 	inode_unlock(d_inode(parent));
> 	dput(parent);
> 
> which *seems* to match the rules, but actually breaks them badly -
> 'parent' in the snippet above might be no longer related to child by the
> time dget_parent() returns it, so we end up calling vfs_create() with
> wrong directory locked, child->d_parent being completely unstable, etc.
> Note that the difference from your code (which is correct, if redundant) is
> rather subtle.

I'm not sure I have anything informed to say, but here are some
random thoughts:

vfs_create() has to work for both exclusive and non-exclusive
create requests, but its caller is responsible for those semantics.
So perhaps the current vfs_create() synopsis is correct (though as
you point out above, the locking requirements could be documented
a little better?).

NFSD also has the interesting problem of re-exporting NFS mounts.
We'd like the exclusiveness of the creation request to be passed
through to the originating NFS server properly (maybe via
->atomic_open?). Perhaps that's for another day.

But locking the parent means that file creation is serialized,
which for large directories or workloads like "tar", impacts
throughput. Again, perhaps that's for another day.


> If you have any suggestions how to describe these locking rules without
> an explicit inode-of-parent argument, I would really like to hear those.
> The best I'd been able to come up with had been "there's an inode
> locked exclusive by the caller that had been observed to be equal to
> child->d_parent->d_inode at some point after it had been locked", which
> is both cumbersome and confusing...

I have a v2 of my patch which hopefully addresses your comments
(many thanks for those). I'm not sure I captured the gist of your
documentation request though. Looking at some other callers of
vfs_create() such as do_mknodat(), dentry_create's use of the
@path argument might be, well, unexpected. If it looks insane to
you, it can be adjusted.

I'll post v2 shortly and we can iterate on that.


--
Chuck Lever




      reply	other threads:[~2022-05-14 16:34 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-13 19:39 [PATCH 0/8] Make NFSv4 OPEN(CREATE) less brittle Chuck Lever
2022-05-13 19:39 ` [PATCH 1/8] NFSD: Clean up nfsd3_proc_create() Chuck Lever
2022-05-13 19:39 ` [PATCH 2/8] NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() Chuck Lever
2022-05-13 19:39 ` [PATCH 3/8] NFSD: Refactor nfsd_create_setattr() Chuck Lever
2022-05-13 19:39 ` [PATCH 4/8] NFSD: Refactor NFSv3 CREATE Chuck Lever
2022-05-13 19:39 ` [PATCH 5/8] NFSD: Refactor NFSv4 OPEN(CREATE) Chuck Lever
2022-05-13 19:40 ` [PATCH 6/8] NFSD: Remove do_nfsd_create() Chuck Lever
2022-05-13 19:40 ` [PATCH 7/8] NFSD: Clean up nfsd_open_verified() Chuck Lever
2022-05-13 19:40 ` [PATCH 8/8] NFSD: Instantiate a struct file when creating a regular NFSv4 file Chuck Lever
2022-05-13 22:20   ` Al Viro
2022-05-14  4:44     ` Al Viro
2022-05-14 16:34       ` Chuck Lever III [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F0D6DEC4-2589-4501-9A35-93C68CBE642F@oracle.com \
    --to=chuck.lever@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.