All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v2] vfs: Don't exchange "short" filenames unconditionally.
Date: Mon, 29 Sep 2014 16:04:45 +0100	[thread overview]
Message-ID: <20140929150445.GE7996@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20140929131613.GL5015@linux.vnet.ibm.com>

On Mon, Sep 29, 2014 at 06:16:13AM -0700, Paul E. McKenney wrote:

> The "textbook" approach is to avoid starting the grace period until
> the after the final reference count is dropped (and of course after
> the name has been made inaccessible to all readers).  Not sure what
> the best way to adapt the current code to this approach (if it is
> in fact feasible to begin with), but one approach would be something
> like this:

> static void __d_free(struct rcu_head *head)
> {
> 	struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
> 
> 	WARN_ON(!hlist_unhashed(&dentry->d_alias));
> 	if (dname_external(dentry) && atomic_dec_return(&dentry->refcnt))
> 		call_rcu(&dentry->rh, __d_free_name_rcu);
> 	kmem_cache_free(dentry_cache, dentry); 
> }
> 
> Of course, this means that the link side has to do something like
> atomic_add_unless(&&dentry->refcnt, 1, 0), and create a new name
> if the old one is on its way out.  If that is too painful, another

What do you mean, "link"?  Rename, perhaps?

> approach is to increment the count for the grace period and decrement
> it at the end of the grace period, and to skip the free if non-zero
> after that decrement.  And this means adding an rcu_head to the
> qstr structure, which might not help memory footprint.

> Presumably the write_seqcount_begin() prevents confusion from ephemeral
> names...

We hold rename_lock, so __d_move() is serialized.  *And* both arguments
are pinned, so they couldn't have reached free_dentry() yet.

> > +static void copy_name(struct dentry *dentry, struct dentry *target)
> > +{
> > +	struct ext_name *old_name = ext_name(dentry);
> > +	if (unlikely(dname_external(target))) {
> > +		atomic_inc(&ext_name(target)->count);
> 
> I might be missing something, but I believe that the above needs to
> be atomic_add_unless().

It would better not; both dentry and target are pinned.

> Might also need to be in an RCU read-side critical section, but I am not
> so sure about that.  If it is not in an RCU read-side critical section,
> I don't see how the kfree_rcu() knows how long to wait before freeing
> the name.

It is a *writer*, not reader.  Readers of ->d_name are either under
rcu_read_lock(), or have hard exclusion wrt that __d_move() one way or another
and hold a reference to dentry, preventing dentry_free().  We really
need to care only about the ones under rcu_read_lock(), so kfree_rcu()
will do.  Non-RCU readers are relying on something like "I'm holding
a reference to dentry and ->i_mutex on the parent directory or
->s_vfs_rename_mutex on the entire filesystem".  Or "I'm holding that
reference and I know that on this filesystem nothing could lead to
d_move() and friends".

Anyway, see followup to that patch; it's equivalent to this one, but instead
of separately delayed freeing of dentry and ext name it makes __d_free()
free just the dentry itself and __d_free_ext() freeing both.  With
free_dentry() choosing which one to call_rcu().

FWIW, the thing I had been worrying about turned out to be a red herring (for
the same reason why RCU list removals don't need barriers - call_rcu() acts
as a barrier, so that reader managing to see the old pointer after
rcu_read_lock() will be seen by call_rcu() from writer, making the callback
delayed until that reader does rcu_read_unlock(); AFAICS, that's guaranteed
by smp_mb() in __call_rcu()).  However, the readers are, indeed, in trouble.
Already.  There's a data dependency - we want to make sure that the string
we observe via ->d_name.name is NUL-terminated.  We have dentry allocation
doing this
        dname[name->len] = 0;

        /* Make sure we always see the terminating NUL character */
        smp_wmb();
        dentry->d_name.name = dname;
and dentry_cmp() (one of the RCU readers) has
        cs = ACCESS_ONCE(dentry->d_name.name);
        smp_read_barrier_depends();
before looking at the string itself.  However, another such reader
(prepend_name()) doesn't - it just does ACCESS_ONCE() and assumes that
store of terminating NUL will be seen.  AFAICS, not guaranteed on Alpha,
so we need the same smp_read_barrier_depends() there.

Another thing: dentry_free() is called only when all non-RCU readers can't
see the dentry.  So that's where we need to decrement the refcount of ext
name, AFAICS.  Same in copy_name() (from __d_move()) where we lose the
reference to old name.  Again, non-RCU readers are not a problem - they
have exclusion wrt __d_move().  For them it's atomic.

Refcount is equal to the number of dentries visible to non-RCU readers with
->d_name.name pointing to that external name.  Freeing isn't scheduled until
that reaches 0 and is done via kfree_rcu() (or, in the later version of patch,
kfree_rcu() on one path and call_rcu() on another)...

  reply	other threads:[~2014-09-29 15:04 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-24 18:14 [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Mikhail Efremov
2014-09-24 18:55 ` Al Viro
2014-09-24 19:20   ` Linus Torvalds
2014-09-24 19:27     ` Linus Torvalds
2014-09-24 20:18     ` Al Viro
2014-09-25  4:46       ` Al Viro
2014-09-26 16:44         ` Al Viro
2014-09-27  4:45           ` Al Viro
2014-09-27 17:56             ` Linus Torvalds
2014-09-27 18:31               ` Al Viro
2014-09-27 19:16                 ` Al Viro
2014-09-27 19:37                   ` Linus Torvalds
2014-09-27 19:39                     ` Linus Torvalds
2014-09-27 19:49                       ` Al Viro
2014-09-27 19:55                         ` Linus Torvalds
2014-09-27 21:48                           ` [git pull] vfs.git for 3.17-rc7 Al Viro
2014-09-27 19:45                   ` [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Al Viro
2014-09-28  7:47                   ` Al Viro
2014-09-28 18:05                     ` Al Viro
2014-09-28 21:51                       ` Al Viro
2014-09-29  1:06                         ` [PATCH] missing data dependency barrier in prepend_name() Al Viro
2014-09-29 15:15                       ` [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Linus Torvalds
2014-09-29 15:59                         ` Al Viro
2014-09-29 16:07                           ` Linus Torvalds
2014-09-29 16:27                             ` Al Viro
2014-09-29 17:54                               ` Linus Torvalds
2014-09-29 19:04                                 ` Al Viro
2014-09-29 20:45                                   ` Linus Torvalds
2014-09-29 18:42                       ` Paul E. McKenney
2014-10-01  0:16                         ` Al Viro
2014-10-02  5:38                           ` Paul E. McKenney
2014-10-02 10:35                           ` Chuck Ebbert
2014-10-03  2:11                             ` Paul E. McKenney
2014-09-29 13:16                     ` Paul E. McKenney
2014-09-29 15:04                       ` Al Viro [this message]
2014-09-28 15:01               ` Mikhail Efremov
2014-09-26 20:23         ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140929150445.GE7996@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.