All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH v2] vfs: Don't exchange "short" filenames unconditionally.
Date: Sun, 28 Sep 2014 08:47:47 +0100	[thread overview]
Message-ID: <20140928074747.GZ7996@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20140927191657.GU7996@ZenIV.linux.org.uk>

On Sat, Sep 27, 2014 at 08:16:57PM +0100, Al Viro wrote:
> On Sat, Sep 27, 2014 at 07:31:39PM +0100, Al Viro wrote:
> 
> > We can get the long name cases right, and I agree that it'll make the
> > things nicer, but it might take a couple of days to get right.  The thing
> > I'm concerned about is not screwing DCACHE_RCUACCESS up.
> 
> FWIW, I suspect that the right approach is to put refcount + rcu_head in
> front of external name and do the following:
> 	* __d_free() checks if we have an external name, gets its containing
> structure and does if (atomic_dec_and_test(&name->count)) kfree(name);
> 	* switch_names() in non-exchange case (I'd probably call it copy_name,
> not move_names, but anyway) sets DCACHE_RCUACCESS on target (source has
> already gotten it from __d_rehash()), increments refcount on target's name
> if external and, if the source old name is external, decrements its refcount
> and calls kfree_rcu() if it has hit zero.
> 
> AFAICS, it guarantees that we'll schedule an RCU callback on name's rch_head
> at most once, that we won't free it while RCU callback on it is scheduled
> and we won't free it until a grace period has expired since the last time
> it had been referenced by observable dentries.  Do you see any holes in that?

Bugger...  I see one, actually.  Look: d1 and d2 come to share name at some
point.  name has refcount 2.  d1 gets dropped.  OK, we get RCU-delayed
__d_free().  refcount is still 2.  Now somebody starts looking through the
hash chain.  And somebody else renames d2.  __d_move() decrements refcount of
name to 1.  No kfree_rcu().  OK, _now_ __d_free() on d1 happens; it was
RCU-delayed, but that won't wait for rcu_read_lock() done after we'd
done call_rcu().  So __d_free() is called, name refcount reaches 0 and the
name is freed.  Just as the hash lookup has gotten around to looking at what
used to be the name of d2.  Oops...

The root cause, of course, is that we delay decrementing the refcount on
dentry_free() path...  One variant is to rip freeing these suckers out of
__d_free() and have dentry_free() do atomic_dec_and_test + kfree_rcu.
That works (and we don't need to set DCACHE_RCUACCESS in copy_name();
the interesting part of never-hashed case is for short names anyway), but
the cost is twice the number of rcu callbacks on freeing long-name dentries.
Is that really painful enough to care?  If not, something like the following
would do; if it is... well, we could probably do make free_dentry() mark
dentry->d_flags for __d_free() with "free the external name hanging off
this one" in case if refcount decrement has ended up with zero.  Doable,
but more complicated (and somewhat messy, since ->d_lock is not held
at that point; OTOH, we could play with ->d_name instead of ->d_flags...).
Anyway, the delta below might suffice.  Comments?

diff --git a/fs/dcache.c b/fs/dcache.c
index cb25a1a..b52f4ee 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -235,18 +235,34 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 	return dentry_string_cmp(cs, ct, tcount);
 }
 
+struct ext_name {
+	union {
+		atomic_t count;
+		struct rcu_head head;
+	};
+	unsigned char name[];
+};
+
+static inline struct ext_name *ext_name(struct dentry *dentry)
+{
+	if (likely(!dname_external(dentry)))
+		return NULL;
+	return container_of(dentry->d_name.name, struct ext_name, name[0]);
+}
+
 static void __d_free(struct rcu_head *head)
 {
 	struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
 
 	WARN_ON(!hlist_unhashed(&dentry->d_alias));
-	if (dname_external(dentry))
-		kfree(dentry->d_name.name);
 	kmem_cache_free(dentry_cache, dentry); 
 }
 
 static void dentry_free(struct dentry *dentry)
 {
+	struct ext_name *p = ext_name(dentry);
+	if (unlikely(p) && likely(atomic_dec_and_test(&p->count)))
+		kfree_rcu(p, head);
 	/* if dentry was never visible to RCU, immediate free is OK */
 	if (!(dentry->d_flags & DCACHE_RCUACCESS))
 		__d_free(&dentry->d_u.d_rcu);
@@ -1438,11 +1454,14 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	 */
 	dentry->d_iname[DNAME_INLINE_LEN-1] = 0;
 	if (name->len > DNAME_INLINE_LEN-1) {
-		dname = kmalloc(name->len + 1, GFP_KERNEL);
-		if (!dname) {
+		size_t size = offsetof(struct ext_name, name) + name->len + 1;
+		struct ext_name *p = kmalloc(size, GFP_KERNEL);
+		if (!p) {
 			kmem_cache_free(dentry_cache, dentry); 
 			return NULL;
 		}
+		atomic_set(&p->count, 1);
+		dname = p->name;
 	} else  {
 		dname = dentry->d_iname;
 	}	
@@ -2372,11 +2391,10 @@ void dentry_update_name_case(struct dentry *dentry, struct qstr *name)
 }
 EXPORT_SYMBOL(dentry_update_name_case);
 
-static void switch_names(struct dentry *dentry, struct dentry *target,
-			 bool exchange)
+static void swap_names(struct dentry *dentry, struct dentry *target)
 {
-	if (dname_external(target)) {
-		if (dname_external(dentry)) {
+	if (unlikely(dname_external(target))) {
+		if (unlikely(dname_external(dentry))) {
 			/*
 			 * Both external: swap the pointers
 			 */
@@ -2392,7 +2410,7 @@ static void switch_names(struct dentry *dentry, struct dentry *target,
 			target->d_name.name = target->d_iname;
 		}
 	} else {
-		if (dname_external(dentry)) {
+		if (unlikely(dname_external(dentry))) {
 			/*
 			 * dentry:external, target:internal.  Give dentry's
 			 * storage to target and make dentry internal
@@ -2407,12 +2425,6 @@ static void switch_names(struct dentry *dentry, struct dentry *target,
 			 */
 			unsigned int i;
 			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
-			if (!exchange) {
-				memcpy(dentry->d_iname, target->d_name.name,
-						target->d_name.len + 1);
-				dentry->d_name.hash_len = target->d_name.hash_len;
-				return;
-			}
 			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
 				swap(((long *) &dentry->d_iname)[i],
 				     ((long *) &target->d_iname)[i]);
@@ -2422,6 +2434,22 @@ static void switch_names(struct dentry *dentry, struct dentry *target,
 	swap(dentry->d_name.hash_len, target->d_name.hash_len);
 }
 
+static void copy_name(struct dentry *dentry, struct dentry *target)
+{
+	struct ext_name *old_name = ext_name(dentry);
+	if (unlikely(dname_external(target))) {
+		atomic_inc(&ext_name(target)->count);
+		dentry->d_name = target->d_name;
+	} else {
+		memcpy(dentry->d_iname, target->d_name.name,
+				target->d_name.len + 1);
+		dentry->d_name.name = dentry->d_iname;
+		dentry->d_name.hash_len = target->d_name.hash_len;
+	}
+	if (unlikely(old_name) && likely(atomic_dec_and_test(&old_name->count)))
+		kfree_rcu(old_name, head);
+}
+
 static void dentry_lock_for_move(struct dentry *dentry, struct dentry *target)
 {
 	/*
@@ -2518,7 +2546,10 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	}
 
 	/* Switch the names.. */
-	switch_names(dentry, target, exchange);
+	if (exchange)
+		swap_names(dentry, target);
+	else
+		copy_name(dentry, target);
 
 	/* ... and switch them in the tree */
 	if (IS_ROOT(dentry)) {

  parent reply	other threads:[~2014-09-28  7:47 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-24 18:14 [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Mikhail Efremov
2014-09-24 18:55 ` Al Viro
2014-09-24 19:20   ` Linus Torvalds
2014-09-24 19:27     ` Linus Torvalds
2014-09-24 20:18     ` Al Viro
2014-09-25  4:46       ` Al Viro
2014-09-26 16:44         ` Al Viro
2014-09-27  4:45           ` Al Viro
2014-09-27 17:56             ` Linus Torvalds
2014-09-27 18:31               ` Al Viro
2014-09-27 19:16                 ` Al Viro
2014-09-27 19:37                   ` Linus Torvalds
2014-09-27 19:39                     ` Linus Torvalds
2014-09-27 19:49                       ` Al Viro
2014-09-27 19:55                         ` Linus Torvalds
2014-09-27 21:48                           ` [git pull] vfs.git for 3.17-rc7 Al Viro
2014-09-27 19:45                   ` [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Al Viro
2014-09-28  7:47                   ` Al Viro [this message]
2014-09-28 18:05                     ` Al Viro
2014-09-28 21:51                       ` Al Viro
2014-09-29  1:06                         ` [PATCH] missing data dependency barrier in prepend_name() Al Viro
2014-09-29 15:15                       ` [PATCH v2] vfs: Don't exchange "short" filenames unconditionally Linus Torvalds
2014-09-29 15:59                         ` Al Viro
2014-09-29 16:07                           ` Linus Torvalds
2014-09-29 16:27                             ` Al Viro
2014-09-29 17:54                               ` Linus Torvalds
2014-09-29 19:04                                 ` Al Viro
2014-09-29 20:45                                   ` Linus Torvalds
2014-09-29 18:42                       ` Paul E. McKenney
2014-10-01  0:16                         ` Al Viro
2014-10-02  5:38                           ` Paul E. McKenney
2014-10-02 10:35                           ` Chuck Ebbert
2014-10-03  2:11                             ` Paul E. McKenney
2014-09-29 13:16                     ` Paul E. McKenney
2014-09-29 15:04                       ` Al Viro
2014-09-28 15:01               ` Mikhail Efremov
2014-09-26 20:23         ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140928074747.GZ7996@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.