Linux-Fsdevel Archive on
 help / color / Atom feed
From: Al Viro <>
To: Linus Torvalds <>
Cc: "J. Bruce Fields" <>,
	Ritesh Harjani <>,
	linux-fsdevel <>,
	Linux Kernel Mailing List <>,, Jeff Layton <>,, Jan Kara <>,
	Christoph Hellwig <>
Subject: Re: [PATCH][RFC] race in exportfs_decode_fh()
Date: Sat, 9 Nov 2019 18:26:01 +0000
Message-ID: <> (raw)
In-Reply-To: <>

On Sat, Nov 09, 2019 at 08:55:38AM -0800, Linus Torvalds wrote:
> On Fri, Nov 8, 2019 at 7:13 PM Al Viro <> wrote:
> >
> > We have derived the parent from fhandle, we have a disconnected dentry for child,
> > we go look for the name.  We even find it.  Now, we want to look it up.  And
> > some bastard goes and unlinks it, just as we are trying to lock the parent.
> > We do a lookup, and get a negative dentry.  Then we unlock the parent... and
> > some other bastard does e.g. mkdir with the same name.  OK, nresult->d_inode
> > is not NULL (anymore).  It has fuck-all to do with the original fhandle
> > (different inumber, etc.) but we happily accept it.
> No arguments with your patch, although I doubt that this case has
> actually ever happened in practice ;)

Frankly, by this point I'm rather tempted to introduce new sparse annotation for
dentries - "might not be positive".  The thing is, there are four cases when
->d_inode is guaranteed to be stable:
	1) nobody else has seen that dentry; that includes in-lookup ones - they
can be found in in-lookup hash by d_alloc_parallel(), but it'll wait until they
cease to be in-lookup.
	2) ->d_lock is held
	3) pinned positive
	4) pinned, parent held at least shared
Anything else can have ->d_inode changed by another thread.  And class 3 is by
far the most common.  As the matter of fact, most of dentry pointers in the
system (as opposed to (h)lists traversing dentries) are such.

For obvious reasons we have shitloads of ->d_inode accesses; I'd been going
through a tree-wide audit of those and doing that manually is bloody unpleasant.
And we do have races in that area - the one above is the latest I'd caught;
there had been more and I'm fairly sure that it's not the last.

Redoing that kind of audit every once in a while is something I wouldn't
wish on anyone; it would be nice to use sparse to narrow the field.  Note
that simply checking ->d_inode (or ->d_flags) is not enough unless we
know them to be stable.  E.g. callers of lookup_one_len_unlocked() tend
to be racy exactly because of such tests.  I'm going to add
lookup_positive_unlocked() that would return ERR_PTR(-ENOENT) on
negatives and convert the callers - with one exception they treat
negatives the same way they would treat ERR_PTR(-ENOENT) and their
tests are racy.  I'd rather have sufficient barriers done in one common
helper, instead of trying to add them in 11 places and hope that new
users won't fuck it up...

I'm still not sure what's the best way to do the annotations - propagation
is not a big deal, but expressing the transitions and checking that
maybe-negative ones are not misused is trickier.  The last part can
be actually left to manual audit - annotations would already reduce
the search area big way.  A not too ugly way to say that now this dentry
is known to be non-negative, OTOH...  I want to finish the audit and
take a good look at the list of places where such transitions happen.

  reply index

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-27  4:42 [PATCH RESEND 1/1] vfs: Really check for inode ptr in lookup_fast Ritesh Harjani
2019-10-15  4:07 ` Ritesh Harjani
2019-10-22 13:38   ` Ritesh Harjani
2019-10-22 14:37     ` Al Viro
2019-10-22 14:50       ` Al Viro
2019-10-22 20:11       ` Al Viro
2019-10-23 11:05         ` Ritesh Harjani
2019-11-01 23:46           ` Al Viro
2019-11-02  6:17             ` Al Viro
2019-11-02 17:24               ` Paul E. McKenney
2019-11-02 17:22             ` Paul E. McKenney
2019-11-02 18:08               ` Al Viro
2019-11-03 14:41                 ` Paul E. McKenney
2019-11-03 16:35                 ` [RFC] lookup_one_len_unlocked() lousy calling conventions Al Viro
2019-11-03 18:20                   ` Al Viro
2019-11-03 18:51                     ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Al Viro
2019-11-03 19:03                       ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either Al Viro
2019-11-13  7:01                       ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Amir Goldstein
2019-11-13 12:52                         ` Al Viro
2019-11-13 16:22                           ` Amir Goldstein
2019-11-13 20:18                           ` Jean-Louis Biasini
2019-11-03 17:05                 ` [PATCH][RFC] ecryptfs unlink/rmdir breakage (similar to caught in ecryptfs rename last year) Al Viro
2019-11-09  3:13                 ` [PATCH][RFC] race in exportfs_decode_fh() Al Viro
2019-11-09 16:55                   ` Linus Torvalds
2019-11-09 18:26                     ` Al Viro [this message]
2019-11-11  9:16                   ` Christoph Hellwig

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on

Archives are clonable:
	git clone --mirror linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ \
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone