linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	wugyuan@cn.ibm.com, jlayton@kernel.org, hsiangkao@aol.com,
	Jan Kara <jack@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH RESEND 1/1] vfs: Really check for inode ptr in lookup_fast
Date: Fri, 1 Nov 2019 23:46:22 +0000	[thread overview]
Message-ID: <20191101234622.GM26530@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20191023110551.D04AE4C044@d06av22.portsmouth.uk.ibm.com>

On Wed, Oct 23, 2019 at 04:35:50PM +0530, Ritesh Harjani wrote:

> > > What we have guaranteed is
> > > 	* ->d_lock serializes ->d_flags/->d_inode changes
> > > 	* ->d_seq is bumped before/after such changes
> > > 	* positive dentry never changes ->d_inode as long as you hold
> > > a reference (negative dentry *can* become positive right under you)
> > > 
> > > So there are 3 classes of valid users: those holding ->d_lock, those
> > > sampling and rechecking ->d_seq and those relying upon having observed
> > > the sucker they've pinned to be positive.
> 
> :) Thanks for simplifying like this. Agreed.

FWIW, after fixing several ceph bugs, add to that the following:
	* all places that turn a negative dentry into positive one are
holding its parent exclusive or dentry has not been observable for
anybody else.  It had been present in the parent's list of children
(negative and unhashed) and it might have been present in in-lookup
hashtable.  However, nobody is going to grab a reference to it from there
without having grabbed ->d_lock on it and observed the state after
it became positive. 

Which means that holding a reference to dentry *and* holding its
parent at least shared stabilizes both ->d_inode and type bits in
->d_flags.  The situation with barriers is more subtle - *IF* we
had sufficient barriers to have ->d_inode/type bits seen right
after having gotten the reference, we are fine.  The only change
possible after that point is negative->positive transition and
that gets taken care of by barriers provided by ->i_rwsem.

If we'd obtained that reference by d_lookup() or __d_lookup(),
we are fine - ->d_lock gives a barrier.  The same goes for places
that grab references during a tree traversal, provided that they
hold ->d_lock around that (fs/autofs/expire.c stuff).  The same goes
for having it found in inode's aliases list (->i_lock).

I really hope that the same applies to accesses to file_dentry(file);
on anything except alpha that would be pretty much automatic and
on alpha we get the things along the lines of

	f = fdt[n]
	mb
	d = f->f_path.dentry
	i = d->d_inode
	assert(i != NULL)
vs.
	see that d->d_inode is non-NULL
	f->f_path.dentry = d
	mb
	fdt[n] = f

IOW, the barriers that make it safe to fetch the fields of struct file
(rcu_dereference_raw() in __fcheck_files() vs. smp_store_release()
in __fd_install() in the above) should *hopefully* take care of all
stores visible by the time of do_dentry_open().  Sure, alpha cache
coherency is insane, but AFAICS it's not _that_ insane.

Question to folks familiar with alpha memory model:

A = 0, B = NULL, C = NULL
CPU1:
	A = 1

CPU2:
	r1 = A
	if (r1) {
		B = &A
		mb
		C = &B
	}

CPU3:
	r2 = C;
	mb
	if (r2) {	// &B
		r3 = *r2	// &A
		r4 = *r3	// 1
		assert(r4 == 1)
	}

is the above safe on alpha?

[snip]

> We may also need similar guarantees with __d_clear_type_and_inode().

Not really - pinned dentry can't go negative.  In any case, with the
audit I've done so far, I don't believe that blanket solutions like
that are good idea - most of the places doing checks are safe as it is.
The surface that needs to be taken care of is fairly small, actually;
most of that is in fs/namei.c and fs/dcache.c.

  reply	other threads:[~2019-11-01 23:46 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-27  4:42 [PATCH RESEND 1/1] vfs: Really check for inode ptr in lookup_fast Ritesh Harjani
2019-10-15  4:07 ` Ritesh Harjani
2019-10-22 13:38   ` Ritesh Harjani
2019-10-22 14:37     ` Al Viro
2019-10-22 14:50       ` Al Viro
2019-10-22 20:11       ` Al Viro
2019-10-23 11:05         ` Ritesh Harjani
2019-11-01 23:46           ` Al Viro [this message]
2019-11-02  6:17             ` Al Viro
2019-11-02 17:24               ` Paul E. McKenney
2019-11-02 17:22             ` Paul E. McKenney
2019-11-02 18:08               ` Al Viro
2019-11-03 14:41                 ` Paul E. McKenney
2019-11-03 16:35                 ` [RFC] lookup_one_len_unlocked() lousy calling conventions Al Viro
2019-11-03 18:20                   ` Al Viro
2019-11-03 18:51                     ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Al Viro
2019-11-03 19:03                       ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either Al Viro
2019-11-13  7:01                       ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Amir Goldstein
2019-11-13 12:52                         ` Al Viro
2019-11-13 16:22                           ` Amir Goldstein
2019-11-13 20:18                           ` Jean-Louis Biasini
2019-11-03 17:05                 ` [PATCH][RFC] ecryptfs unlink/rmdir breakage (similar to caught in ecryptfs rename last year) Al Viro
2019-11-09  3:13                 ` [PATCH][RFC] race in exportfs_decode_fh() Al Viro
2019-11-09 16:55                   ` Linus Torvalds
2019-11-09 18:26                     ` Al Viro
2019-11-11  9:16                   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191101234622.GM26530@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hsiangkao@aol.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=riteshh@linux.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=wugyuan@cn.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).