From: Al Viro <viro@zeniv.linux.org.uk>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
wugyuan@cn.ibm.com, jlayton@kernel.org, hsiangkao@aol.com,
Jan Kara <jack@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH RESEND 1/1] vfs: Really check for inode ptr in lookup_fast
Date: Tue, 22 Oct 2019 21:11:31 +0100
Message-ID: <20191022201131.GZ26530@ZenIV.linux.org.uk>
In-Reply-To: <20191022143736.GX26530@ZenIV.linux.org.uk>
On Tue, Oct 22, 2019 at 03:37:36PM +0100, Al Viro wrote:
> On Tue, Oct 22, 2019 at 07:08:54PM +0530, Ritesh Harjani wrote:
> > I think we have still not taken this patch. Al?
> You've picked the easiest one to hit, but on e.g. KVM setups you can have the
> host thread representing the CPU where __d_set_inode_and_type() runs get
> preempted (by the host kernel), leaving others with much wider window.
>
> Sure, we can do that to all callers of d_is_negative/d_is_positive, but...
> the same goes for any place that assumes that d_is_dir() implies that
> the sucker is positive, etc.
>
> What we have guaranteed is
> * ->d_lock serializes ->d_flags/->d_inode changes
> * ->d_seq is bumped before/after such changes
> * positive dentry never changes ->d_inode as long as you hold
> a reference (negative dentry *can* become positive right under you)
>
> So there are 3 classes of valid users: those holding ->d_lock, those
> sampling and rechecking ->d_seq and those relying upon having observed
> the sucker they've pinned to be positive.
>
> What you've been hitting is "we have it pinned, ->d_flags says it's
> positive but we still observe the value of ->d_inode from before the
> store to ->d_flags that has made it look positive".
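The second class of valid users listed above (sampling and rechecking ->d_seq) can be sketched in userspace C11 roughly as follows. This is a minimal model, not the kernel's seqcount API: struct seq_dentry, seq_set_inode_and_type() and seq_read_inode_and_type() are hypothetical names, the writer is assumed to be serialized by a lock standing in for ->d_lock, and the fence placement follows the commonly used C11 seqlock recipe. The point it illustrates is that a reader either gets ->d_inode and the type bits from the same update or retries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of a dentry guarded by a d_seq-like counter. */
struct seq_dentry {
	atomic_uint seq;           /* models ->d_seq; odd while a change is in flight */
	atomic_uintptr_t inode;    /* models ->d_inode; 0 means negative */
	atomic_uint type;          /* models the type bits of ->d_flags */
};

/* Writer: assumed to run under a lock that stands in for ->d_lock. */
static void seq_set_inode_and_type(struct seq_dentry *d, uintptr_t inode,
				   unsigned type)
{
	unsigned s = atomic_load_explicit(&d->seq, memory_order_relaxed);

	assert((s & 1) == 0);      /* writers are serialized, so seq is even here */
	atomic_store_explicit(&d->seq, s + 1, memory_order_relaxed);  /* now odd */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->inode, inode, memory_order_relaxed);
	atomic_store_explicit(&d->type, type, memory_order_relaxed);
	atomic_store_explicit(&d->seq, s + 2, memory_order_release);  /* even again */
}

/* Reader: sample seq, read both fields, recheck seq; retry on any change. */
static void seq_read_inode_and_type(struct seq_dentry *d, uintptr_t *inode,
				    unsigned *type)
{
	unsigned s1, s2;

	do {
		s1 = atomic_load_explicit(&d->seq, memory_order_acquire);
		*inode = atomic_load_explicit(&d->inode, memory_order_relaxed);
		*type = atomic_load_explicit(&d->type, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&d->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

A reader that never needs a stable answer could skip the loop, but then it falls outside all three classes and can observe exactly the mismatched ->d_inode/->d_flags pair described above.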

Actually, your patch opens another problem there. Suppose you make
it d_really_is_positive() and hit the same race sans reordering.
Dentry is found by __d_lookup() and is negative. Right after we
return from __d_lookup(), another thread makes it positive (a symlink):
->d_inode is set, so d_really_is_positive() becomes true. OK, on we
go, pick the inode and move on. Right? But ->d_flags has not yet been
updated by the thread that made it positive. We return from lookup_fast()
and call step_into(). And get to
	if (likely(!d_is_symlink(path->dentry)) ||
Which checks ->d_flags and sees the value from before the sucker
became positive. IOW, d_is_symlink() is false here. If that
was the last path component and we'd been told to follow links,
we will end up with a positive dentry of a symlink coming out of
pathname resolution.

Similar fun happens if you have mkdir racing with lookup: ENOENT
is what should happen if lookup comes first, success if mkdir
does. This way we can hit ENOTDIR instead, due to a false negative
from d_can_lookup().

IOW, d_really_is_negative() in lookup_fast() will paper over
one of the oopsen, but it
* won't cover similar oopsen on other codepaths and
* will lead to bogus behaviour.
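Both races described above can be replayed deterministically on a single thread. The sketch below is a simplified model, not kernel code: fake_dentry, instantiate(), resolve_last_symlink() and walk_into_new_dir() are hypothetical, and `steps` encodes how many of the instantiating side's two stores (->d_inode first, then ->d_flags, as in the scenario above) the lookup side gets to see. With a d_really_is_negative()-style check, steps == 1 produces the bogus outcomes: an unfollowed positive symlink, and ENOTDIR where only ENOENT or success would be legitimate.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the type bits kept in ->d_flags. */
enum fake_type { FAKE_MISS, FAKE_DIR, FAKE_SYMLINK };

struct fake_inode { int ino; };

struct fake_dentry {
	struct fake_inode *inode;   /* models ->d_inode */
	enum fake_type type;        /* models the type bits of ->d_flags */
};

/*
 * The instantiating side does two stores: ->d_inode first, ->d_flags second.
 * `steps` (0, 1 or 2) is how many of them the lookup side gets to see.
 */
static void instantiate(struct fake_dentry *d, struct fake_inode *inode,
			enum fake_type type, int steps)
{
	assert(steps >= 0 && steps <= 2);
	d->inode = NULL;
	d->type = FAKE_MISS;
	if (steps >= 1)
		d->inode = inode;
	if (steps >= 2)
		d->type = type;
}

enum link_result { LINK_ENOENT, LINK_FOLLOWED, LINK_NOT_FOLLOWED };

/* Trailing symlink being created while we resolve it with "follow links". */
static enum link_result resolve_last_symlink(int steps)
{
	struct fake_inode lnk = { .ino = 1 };
	struct fake_dentry d;

	instantiate(&d, &lnk, FAKE_SYMLINK, steps);
	if (d.inode == NULL)              /* d_really_is_negative()-style check */
		return LINK_ENOENT;
	if (d.type != FAKE_SYMLINK)       /* d_is_symlink() in step_into() */
		return LINK_NOT_FOLLOWED; /* bogus: positive symlink escapes */
	return LINK_FOLLOWED;
}

/* Directory being created by mkdir while lookup walks into it. */
static int walk_into_new_dir(int steps)
{
	struct fake_inode dir = { .ino = 2 };
	struct fake_dentry d;

	instantiate(&d, &dir, FAKE_DIR, steps);
	if (d.inode == NULL)              /* d_really_is_negative()-style check */
		return -ENOENT;
	if (d.type != FAKE_DIR)           /* d_can_lookup() */
		return -ENOTDIR;          /* bogus: neither ENOENT nor success */
	return 0;
}
```

steps == 0 and steps == 2 correspond to the two legitimate orderings (lookup first, creation first); only steps == 1 is the window in question.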

I'm not sure that a blanket conversion of d_is_... to smp_load_acquire()
is the right solution; it might very well be that we need to do that
only at a small subset of call sites, lookup_fast() being one of
those. But we do at least want to be certain that anything we get
from lookup_fast() in non-RCU mode already has its ->d_flags visible.
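The acquire-based approach can be sketched in userspace C11 as below, assuming the writer stores ->d_inode first and then publishes the type bits with a release store (the role smp_store_release() would play), while the reader checks the type with an acquire load (the smp_load_acquire() counterpart). struct pub_dentry, publish() and positive_inode_acquire() are hypothetical names. If the reader sees a positive type, the release/acquire pairing guarantees that the ->d_inode store made before it is visible too. Note the direction: the check has to go through the type bits; reading ->d_inode first and ->d_flags afterwards, as in the race above, gains nothing from this pairing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical type values; the real ones live in ->d_flags. */
enum { TYPE_MISS = 0, TYPE_DIR = 1, TYPE_SYMLINK = 2 };

struct inode_model { int ino; };

struct pub_dentry {
	struct inode_model *inode;  /* plain field, published via ->type */
	atomic_uint type;           /* models the type bits of ->d_flags */
};

/* Writer: store ->d_inode first, then publish the type with release. */
static void publish(struct pub_dentry *d, struct inode_model *inode,
		    unsigned type)
{
	assert((type == TYPE_MISS) == (inode == NULL)); /* type and inode agree */
	d->inode = inode;
	atomic_store_explicit(&d->type, type, memory_order_release);
}

/*
 * Reader: acquire-load the type first.  Only if it says "positive" do we
 * read ->inode; the acquire load makes the writer's earlier ->inode store
 * visible, so the pointer matches the type we just observed.
 */
static struct inode_model *positive_inode_acquire(struct pub_dentry *d,
						  unsigned *type)
{
	*type = atomic_load_explicit(&d->type, memory_order_acquire);
	if (*type == TYPE_MISS)
		return NULL;
	return d->inode;
}
```

Whether every d_is_...() call site needs this, or only a few such as lookup_fast(), is exactly the open question above; the sketch only shows what the pairing buys a single caller.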

I'm going through the callers right now and will post a followup once
things get cleaner...