All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Mikhail Efremov <sem@altlinux.org>
Subject: Re: [PATCH v2] vfs: Don't exchange "short" filenames unconditionally.
Date: Wed, 1 Oct 2014 22:38:47 -0700	[thread overview]
Message-ID: <20141002053847.GI5015@linux.vnet.ibm.com> (raw)
In-Reply-To: <20141001001615.GT7996@ZenIV.linux.org.uk>

On Wed, Oct 01, 2014 at 01:16:15AM +0100, Al Viro wrote:
> On Mon, Sep 29, 2014 at 11:42:18AM -0700, Paul E. McKenney wrote:
> 
> > Assuming that incrementing the external name's reference count is
> > atomic_add_unless, I could believe this part.  Or if you have some
> > locking that makes it impossible to increment the reference count
> > in any case where there is any risk that it might be decremented
> > to zero, I guess.
> > 
> > Which might well be the pair of write_seqcount_begin() calls in __d_move(),
> > now that I look more carefully.  So if the name has to be in use somewhere
> > before it can be copied, then a copy can only be created if there is at
> > least one copy that is not currently being removed?  If so, OK.
> 
> Huh?  copy_name() does copy a _reference_, not the name itself.  All the
> copying involved is source->d_name.name = target->d_name.name.  And those
> are simply unsigned char *.
> 
> write_seqcount_begin() is irrelevant here.  Look: all callers of
> __d_move(x, y) are holding references both to x and y.  Contributing to
> the refcount of dentries themselves, that is, not the names.
> 
> That gives exclusion between __d_move() and free_dentry() - the latter cannot
> be called until dentry refcount reaches zero.  RCU is completely irrelevant
> here.  In fact, no call chain leads to __d_move() under rcu_read_lock().
> You must hold the target dentry hard, or it could simply be freed right
> under you.
> 
> And __d_move() is taking ->d_lock on all dentries involved (in
> addition to rename_lock serializing it system-wide).
> 
> What could possibly lead to refcount zero being observed on target of
> __d_move()?  The history of any dentry is this:
> 	* it is created by __d_alloc().  Nobody can see it until __d_alloc()
> returns.  Dentry refcount (not to be confused with refcount of external
> name) is 1.
> 	* it passes through some (usually - zero) __d_move() calls.
> Some - as the first argument, some - as the second one.  All those
> calls are serialized by global seqlock - callers must hold rename_lock.
> And all of them are done by somebody who is holding a counting reference
> to dentries in question.
> 	* counting references to dentry might be taken and dropped;
> eventually refcount reaches zero (under ->d_lock) and no further
> counting references can be taken after that.  See __dentry_kill() - the
> first thing it does is poisoning the refcount, so that any future
> attempt to increment it would fail.  __dentry_kill() (still under ->d_lock
> of dentry, ->d_lock of its parent and ->i_lock of its inode) removes
> dentry from the tree, from hash and from the alias list of inode;
> Then it drops the locks.  At that point the only search structure dentry
> might be found in is shrink list; if it's not on such list, free_dentry()
> is called immediately, otherwise it's marked so that the code processing
> the shrink list in question would, as soon as it gets to that sucker,
> remove it from the shrink list and call the same free_dentry().  And that's
> the only thing done to such dentry by somebody finding it via a shrink list.
> 	* once free_dentry() has been reached, dentry can can be only seen
> by RCU lookups, and after the grace period ends it gets physically freed.
> 
> free_dentry() isn't allowed to overlap __d_move(); to have that happen is
> a serious dentry refcounting bug.  No __d_move() is allowed _after_
> free_dentry() has been entered, either.  Again, it would take a refcounting
> bug for dentries to have that happen - basically, double dput() somewhere.
> If that happens, all bets are off, of course - if dentry gets unexpectedly
> freed under somebody who has grabbed a reference to it and has not dropped
> it yet, we are fucked.
> 
> Nothing outside of __d_move() is allowed to change ->d_name.name.  RCU-critical
> code is allowed to fetch and dereference it, and such code relies upon
> 	a) freeing of name seen by somebody who'd done rcu_read_lock() being
> delayed until after the matching rcu_read_unlock()
> 	b) store of terminating NUL done by __d_alloc() (and never overwritten
> afterwards) being seen by RCU-critical code that has found the pointer to
> that name in dentry->d_name.name
> 
> All other code accessing ->d_name.name is required to hold one of the locks
> that are held by __d_move() and its callers.  Grabbing any of those leads
> to smp_mb() on alpha, which serves as data dependency barrier there, so
> we don't need explicit barrier there as we do in RCU-critical places - guarding
> NUL will be seen.

Please accept my apologies for the noise!

							Thanx, Paul


  reply	other threads:[~2014-10-02  5:38 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-24 18:14 [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Mikhail Efremov
2014-09-24 18:55 ` Al Viro
2014-09-24 19:20   ` Linus Torvalds
2014-09-24 19:27     ` Linus Torvalds
2014-09-24 20:18     ` Al Viro
2014-09-25  4:46       ` Al Viro
2014-09-26 16:44         ` Al Viro
2014-09-27  4:45           ` Al Viro
2014-09-27 17:56             ` Linus Torvalds
2014-09-27 18:31               ` Al Viro
2014-09-27 19:16                 ` Al Viro
2014-09-27 19:37                   ` Linus Torvalds
2014-09-27 19:39                     ` Linus Torvalds
2014-09-27 19:49                       ` Al Viro
2014-09-27 19:55                         ` Linus Torvalds
2014-09-27 21:48                           ` [git pull] vfs.git for 3.17-rc7 Al Viro
2014-09-27 19:45                   ` [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Al Viro
2014-09-28  7:47                   ` Al Viro
2014-09-28 18:05                     ` Al Viro
2014-09-28 21:51                       ` Al Viro
2014-09-29  1:06                         ` [PATCH] missing data dependency barrier in prepend_name() Al Viro
2014-09-29 15:15                       ` [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Linus Torvalds
2014-09-29 15:59                         ` Al Viro
2014-09-29 16:07                           ` Linus Torvalds
2014-09-29 16:27                             ` Al Viro
2014-09-29 17:54                               ` Linus Torvalds
2014-09-29 19:04                                 ` Al Viro
2014-09-29 20:45                                   ` Linus Torvalds
2014-09-29 18:42                       ` Paul E. McKenney
2014-10-01  0:16                         ` Al Viro
2014-10-02  5:38                           ` Paul E. McKenney [this message]
2014-10-02 10:35                           ` Chuck Ebbert
2014-10-03  2:11                             ` Paul E. McKenney
2014-09-29 13:16                     ` Paul E. McKenney
2014-09-29 15:04                       ` Al Viro
2014-09-28 15:01               ` Mikhail Efremov
2014-09-26 20:23         ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141002053847.GI5015@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sem@altlinux.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.