All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] VFS: close race between getcwd() and d_move()
Date: Fri, 10 Nov 2017 09:14:58 +1100	[thread overview]
Message-ID: <8760ajf6al.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <CA+55aFxaoyND1DDk5HqUw97i6-+uMRBbCx4mo=yTKYnBnO2owg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4578 bytes --]

On Thu, Nov 09 2017, Linus Torvalds wrote:

> On Wed, Nov 8, 2017 at 7:22 PM, NeilBrown <neilb@suse.com> wrote:
>> d_move() will call __d_drop() and then __d_rehash()
>> on the dentry being moved.  This creates a small window
>> when the dentry appears to be unhashed.  Many tests
>> of d_unhashed() are made under ->d_lock and so are safe
>> from racing with this window, but some aren't.
>> In particular, getcwd() calls d_unlinked() (which calls
>> d_unhashed()) without d_lock protection, so it can race.
>
> Hmm.
>
> I see what you're doing, but I don't necessarily agree.
>
> I would actually almost prefer that we simply change __d_move() itself.
>
> The problem is that __d_move() really wants to move the hashes things
> atomically, but instead of doing that it does a "unhash and then
> rehash".
>
> How nasty would it be to just expand the calls to __d_drop/__d_rehash
> into __d_move itself, and take both has list locks at the same time
> (with the usual ordering and checking if it's the same list, of
> course).
>
>                      Linus

something like this?
I can probably do better than "b1" and "b2".
I assume target must always be hashed ??
Do you like it enough for me to make it into a real patch?
I'd probably move hlist_bl_lock_two() into list_bl.h.
I'm not generally keen on open-coding subtle code, but maybe it is
justified here.

Thanks,
NeilBrown


diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..1a329fedf23c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -472,6 +472,9 @@ static void dentry_lru_add(struct dentry *dentry)
  */
 void __d_drop(struct dentry *dentry)
 {
+	/* WARNING: any changes here should be reflected in __d_move()
+	 * which open-codes some of this functionality.
+	 */
 	if (!d_unhashed(dentry)) {
 		struct hlist_bl_head *b;
 		/*
@@ -2380,6 +2383,9 @@ EXPORT_SYMBOL(d_delete);
 
 static void __d_rehash(struct dentry *entry)
 {
+	/* WARNING: any changes here should be reflected in __d_move()
+	 * which open-codes some of this functionality.
+	 */
 	struct hlist_bl_head *b = d_hash(entry->d_name.hash);
 	BUG_ON(!d_unhashed(entry));
 	hlist_bl_lock(b);
@@ -2796,11 +2802,23 @@ static void dentry_unlock_for_move(struct dentry *dentry, struct dentry *target)
  * rename_lock, the i_mutex of the source and target directories,
  * and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
  */
+static void hlist_bl_lock_two(struct hlist_bl_head *b1, struct  hlist_bl_head *b2)
+{
+	if (b1 && b1 < b2)
+		hlist_bl_lock(b1);
+	if (b2)
+		hlist_bl_lock(b2);
+	if (b1 > b2)
+		hlist_bl_lock(b1);
+}
+
 static void __d_move(struct dentry *dentry, struct dentry *target,
 		     bool exchange)
 {
 	struct inode *dir = NULL;
 	unsigned n;
+	struct hlist_bl_head *b1, *b2;
+
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
 
@@ -2817,10 +2835,24 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 	write_seqcount_begin(&dentry->d_seq);
 	write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);
 
-	/* unhash both */
-	/* __d_drop does write_seqcount_barrier, but they're OK to nest. */
-	__d_drop(dentry);
-	__d_drop(target);
+	/* We want to unhash both, change names, then rehash one or both.
+	 * If we use __d_drop() and __d_rehash() there will be a window
+	 * when dentry appears to be d_unhashed() which can race with lockless
+	 * checking.  So instead we open-code the important parts of __d_drop()
+	 * and __d_rehash().
+	 * @target must already be hashed, @dentry must be if @exchange.
+	 */
+	BUG_ON(d_unhashed(dentry) && exchange);
+	BUG_ON(d_unhashed(target));
+
+	b1 = d_unhashed(dentry) ? NULL : d_hash(dentry->d_name.hash);
+	b2 = d_hash(target->d_name.hash);
+	hlist_bl_lock_two(b1, b2);
+	if (b1)
+		__hlist_bl_del(&dentry->d_hash);
+	__hlist_bl_del(&target->d_hash);
+	write_seqcount_invalidate(&dentry->d_seq);
+	write_seqcount_invalidate(&target->d_seq);
 
 	/* Switch the names.. */
 	if (exchange)
@@ -2829,9 +2861,14 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 		copy_name(dentry, target);
 
 	/* rehash in new place(s) */
-	__d_rehash(dentry);
+	hlist_bl_add_head_rcu(&dentry->d_hash, b2);
 	if (exchange)
-		__d_rehash(target);
+		hlist_bl_add_head_rcu(&target->d_hash, b1);
+	else
+		target->d_hash.pprev = NULL;
+	if (b1 && b1 != b2)
+		hlist_bl_unlock(b1);
+	hlist_bl_unlock(b2);
 
 	/* ... and switch them in the tree */
 	if (IS_ROOT(dentry)) {

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2017-11-09 22:15 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-09  3:22 [PATCH 0/3] Three VFS patch resends NeilBrown
2017-11-09  3:22 ` [PATCH 1/3] VFS: use synchronize_rcu_expedited() in namespace_unlock() NeilBrown
2017-11-09  3:22 ` [PATCH 3/3] VFS: close race between getcwd() and d_move() NeilBrown
2017-11-09 11:41   ` Nikolay Borisov
2017-11-09 13:08     ` Matthew Wilcox
2017-11-09 16:02       ` Nikolay Borisov
2017-11-09 20:23   ` Linus Torvalds
2017-11-09 22:14     ` NeilBrown [this message]
2017-11-10  1:40       ` Linus Torvalds
2017-11-10  4:45         ` NeilBrown
2017-11-10 19:52           ` Linus Torvalds
2017-11-10 20:53           ` Al Viro
2017-11-21 23:50             ` Al Viro
2017-11-22  1:31               ` NeilBrown
2017-11-09  3:22 ` [PATCH 2/3] Improve fairness when locking the per-superblock s_anon list NeilBrown
2017-11-09 19:52   ` Linus Torvalds
2017-11-09 20:50     ` Al Viro
2017-11-09 23:09       ` NeilBrown
2017-11-09 23:19         ` Al Viro
2017-11-10  0:02       ` Linus Torvalds
2017-11-10  8:50     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8760ajf6al.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.