linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] vfs: Make icache searchable under RCU [ver #2]
@ 2019-04-25 15:01 David Howells
  2019-04-25 15:01 ` [PATCH 1/6] vfs, coda: Fix the lack of locking in FID replacement inode rehashing " David Howells
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: David Howells @ 2019-04-25 15:01 UTC (permalink / raw)
  To: viro
  Cc: Jan Harkes, Theodore Ts'o, Andreas Dilger, codalist, coda,
	linux-ext4, linux-afs, dhowells, linux-afs, linux-ext4,
	linux-ntfs-dev, linux-fsdevel, linux-kernel


Hi Al

Here are some patches that make the icache searchable under RCU.  This
benefits ext4 (which can use it to find an inode to update the timestamps on)
and afs (which can use it to find a vnode to invalidate the callback promise
on).

It might also benefit NTFS, if the new interface can be substituted for its
use of ilookup5_nowait().  I'm not sure whether NTFS actually needs to wait
for inodes that are undergoing deletion.

The first patch in the series fixes the lack of locking in Coda when it
moves an inode between hash buckets because it needs to update the search
key.

Changes:

 (v2)
	Remove the old rehashing code from coda.
	Use READ_ONCE() in ext4 to access i_state.
	Minor fixes.

The patches can also be found on the following branch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=icache-rcu

Thanks,
David
---
David Howells (6):
      vfs, coda: Fix the lack of locking in FID replacement inode rehashing
      vfs: Change inode_hash_lock to a seqlock
      vfs: Allow searching of the icache under RCU conditions
      afs: Use RCU inode cache search for callback resolution
      ext4: Search for an inode to update under the RCU lock if we can
      vfs: Delete find_inode_nowait()


 fs/afs/callback.c  |   12 ++
 fs/coda/cnode.c    |   16 ++-
 fs/ext4/inode.c    |   54 +++++----
 fs/inode.c         |  307 ++++++++++++++++++++++++++++++++++++----------------
 include/linux/fs.h |   11 +-
 5 files changed, 264 insertions(+), 136 deletions(-)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/6] vfs, coda: Fix the lack of locking in FID replacement inode rehashing [ver #2]
  2019-04-25 15:01 [PATCH 0/6] vfs: Make icache searchable under RCU [ver #2] David Howells
@ 2019-04-25 15:01 ` David Howells
  2019-04-25 15:01 ` [PATCH 2/6] vfs: Change inode_hash_lock to a seqlock " David Howells
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: David Howells @ 2019-04-25 15:01 UTC (permalink / raw)
  To: viro
  Cc: Jan Harkes, coda, codalist, dhowells, linux-afs, linux-ext4,
	linux-ntfs-dev, linux-fsdevel, linux-kernel

When coda attempts to recover from disconnected operation, a file created
during the disconnection may not be creatable on the server with the same
file identifier (FID).  In such a case, coda_replace_fid() has to rehash the
inode and move it between chains - but, as the comment notes, this really
needs some locking.

Fix this by moving the core part of the code to fs/inode.c and providing it
with a set() function akin to iget5().  We can then take the inode cache
lock whilst performing the move.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jan Harkes <jaharkes@cs.cmu.edu>
cc: coda@cs.cmu.edu
cc: codalist@coda.cs.cmu.edu
---

 fs/coda/cnode.c    |   16 ++++++++++------
 fs/inode.c         |   23 +++++++++++++++++++++++
 include/linux/fs.h |    3 +++
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/fs/coda/cnode.c b/fs/coda/cnode.c
index 845b5a66952a..a7e9a1b4ba2f 100644
--- a/fs/coda/cnode.c
+++ b/fs/coda/cnode.c
@@ -107,6 +107,15 @@ struct inode *coda_cnode_make(struct CodaFid *fid, struct super_block *sb)
 }
 
 
+static void coda_reset_inode(struct inode *inode, unsigned long hash, void *data)
+{
+	struct CodaFid *fid = (struct CodaFid *)data;
+	struct coda_inode_info *cii = ITOC(inode);
+
+	cii->c_fid = *fid;
+	inode->i_ino = hash;
+}
+
 /* Although we treat Coda file identifiers as immutable, there is one
  * special case for files created during a disconnection where they may
  * not be globally unique. When an identifier collision is detected we
@@ -123,12 +132,7 @@ void coda_replace_fid(struct inode *inode, struct CodaFid *oldfid,
 	
 	BUG_ON(!coda_fideq(&cii->c_fid, oldfid));
 
-	/* replace fid and rehash inode */
-	/* XXX we probably need to hold some lock here! */
-	remove_inode_hash(inode);
-	cii->c_fid = *newfid;
-	inode->i_ino = hash;
-	__insert_inode_hash(inode, hash);
+	rehash_inode(inode, hash, coda_reset_inode, newfid);
 }
 
 /* convert a fid to an inode. */
diff --git a/fs/inode.c b/fs/inode.c
index e9d97add2b36..00bb48ca3642 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -501,6 +501,29 @@ void __remove_inode_hash(struct inode *inode)
 }
 EXPORT_SYMBOL(__remove_inode_hash);
 
+/**
+ * rehash_inode - Relabel and rehash an inode
+ * @inode: Inode to rehash
+ * @hashval: New hash value (usually inode number) to get
+ * @reset: Callback used to relabel the inode
+ * @data: Opaque data pointer to pass to @reset
+ */
+void rehash_inode(struct inode *inode, unsigned long hashval,
+		  void (*reset)(struct inode *inode, unsigned long hashval, void *data),
+		  void *data)
+{
+	struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);
+
+	spin_lock(&inode_hash_lock);
+	spin_lock(&inode->i_lock);
+	hlist_del_init(&inode->i_hash);
+	reset(inode, hashval, data);
+	hlist_add_head(&inode->i_hash, b);
+	spin_unlock(&inode->i_lock);
+	spin_unlock(&inode_hash_lock);
+}
+EXPORT_SYMBOL(rehash_inode);
+
 void clear_inode(struct inode *inode)
 {
 	/*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd28e7679089..6442ff08b28c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3019,6 +3019,9 @@ static inline void remove_inode_hash(struct inode *inode)
 		__remove_inode_hash(inode);
 }
 
+extern void rehash_inode(struct inode *, unsigned long,
+			 void (*reset)(struct inode *, unsigned long, void *), void *);
+
 extern void inode_sb_list_add(struct inode *inode);
 
 #ifdef CONFIG_BLOCK



* [PATCH 2/6] vfs: Change inode_hash_lock to a seqlock [ver #2]
  2019-04-25 15:01 [PATCH 0/6] vfs: Make icache searchable under RCU [ver #2] David Howells
  2019-04-25 15:01 ` [PATCH 1/6] vfs, coda: Fix the lack of locking in FID replacement inode rehashing " David Howells
@ 2019-04-25 15:01 ` David Howells
  2019-04-25 15:02 ` [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions " David Howells
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: David Howells @ 2019-04-25 15:01 UTC (permalink / raw)
  To: viro
  Cc: dhowells, linux-afs, linux-ext4, linux-ntfs-dev, linux-fsdevel,
	linux-kernel

Change inode_hash_lock to a seqlock so that a subsequent patch can try
doing lockless searches of the icache, but we can still force a retry under
lock by bumping the sequence counter.

For the moment, all locking is done with read_seqlock_excl(), which is just
a wrapper around spin_lock() and doesn't bump the sequence counter at all.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/inode.c |   62 ++++++++++++++++++++++++++++++------------------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 00bb48ca3642..cc2b08d82618 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -56,7 +56,7 @@
 static unsigned int i_hash_mask __read_mostly;
 static unsigned int i_hash_shift __read_mostly;
 static struct hlist_head *inode_hashtable __read_mostly;
-static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);
+static __cacheline_aligned_in_smp DEFINE_SEQLOCK(inode_hash_lock);
 
 /*
  * Empty aops. Can be used for the cases where the user does not
@@ -477,11 +477,11 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
 {
 	struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);
 
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
 	hlist_add_head(&inode->i_hash, b);
 	spin_unlock(&inode->i_lock);
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 }
 EXPORT_SYMBOL(__insert_inode_hash);
 
@@ -493,11 +493,11 @@ EXPORT_SYMBOL(__insert_inode_hash);
  */
 void __remove_inode_hash(struct inode *inode)
 {
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
 	hlist_del_init(&inode->i_hash);
 	spin_unlock(&inode->i_lock);
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 }
 EXPORT_SYMBOL(__remove_inode_hash);
 
@@ -514,13 +514,13 @@ void rehash_inode(struct inode *inode, unsigned long hashval,
 {
 	struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);
 
-	spin_lock(&inode_hash_lock);
+	write_seqlock(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
 	hlist_del_init(&inode->i_hash);
 	reset(inode, hashval, data);
 	hlist_add_head(&inode->i_hash, b);
 	spin_unlock(&inode->i_lock);
-	spin_unlock(&inode_hash_lock);
+	write_sequnlock(&inode_hash_lock);
 }
 EXPORT_SYMBOL(rehash_inode);
 
@@ -1076,14 +1076,14 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 	bool creating = inode->i_state & I_CREATING;
 
 again:
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 	old = find_inode(inode->i_sb, head, test, data);
 	if (unlikely(old)) {
 		/*
 		 * Uhhuh, somebody else created the same inode under us.
 		 * Use the old inode instead of the preallocated one.
 		 */
-		spin_unlock(&inode_hash_lock);
+		read_sequnlock_excl(&inode_hash_lock);
 		if (IS_ERR(old))
 			return NULL;
 		wait_on_inode(old);
@@ -1110,7 +1110,7 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 	if (!creating)
 		inode_sb_list_add(inode);
 unlock:
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 
 	return inode;
 }
@@ -1174,9 +1174,9 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 	struct hlist_head *head = inode_hashtable + hash(sb, ino);
 	struct inode *inode;
 again:
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 	inode = find_inode_fast(sb, head, ino);
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 	if (inode) {
 		if (IS_ERR(inode))
 			return NULL;
@@ -1192,17 +1192,17 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 	if (inode) {
 		struct inode *old;
 
-		spin_lock(&inode_hash_lock);
+		read_seqlock_excl(&inode_hash_lock);
 		/* We released the lock, so.. */
 		old = find_inode_fast(sb, head, ino);
 		if (!old) {
 			inode->i_ino = ino;
 			spin_lock(&inode->i_lock);
 			inode->i_state = I_NEW;
-			hlist_add_head(&inode->i_hash, head);
+			hlist_add_head_rcu(&inode->i_hash, head);
 			spin_unlock(&inode->i_lock);
 			inode_sb_list_add(inode);
-			spin_unlock(&inode_hash_lock);
+			read_sequnlock_excl(&inode_hash_lock);
 
 			/* Return the locked inode with I_NEW set, the
 			 * caller is responsible for filling in the contents
@@ -1215,7 +1215,7 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 		 * us. Use the old inode instead of the one we just
 		 * allocated.
 		 */
-		spin_unlock(&inode_hash_lock);
+		read_sequnlock_excl(&inode_hash_lock);
 		destroy_inode(inode);
 		if (IS_ERR(old))
 			return NULL;
@@ -1242,14 +1242,14 @@ static int test_inode_iunique(struct super_block *sb, unsigned long ino)
 	struct hlist_head *b = inode_hashtable + hash(sb, ino);
 	struct inode *inode;
 
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 	hlist_for_each_entry(inode, b, i_hash) {
 		if (inode->i_ino == ino && inode->i_sb == sb) {
-			spin_unlock(&inode_hash_lock);
+			read_sequnlock_excl(&inode_hash_lock);
 			return 0;
 		}
 	}
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 
 	return 1;
 }
@@ -1332,9 +1332,9 @@ struct inode *ilookup5_nowait(struct super_block *sb, unsigned long hashval,
 	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
 	struct inode *inode;
 
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 	inode = find_inode(sb, head, test, data);
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 
 	return IS_ERR(inode) ? NULL : inode;
 }
@@ -1387,9 +1387,9 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
 	struct hlist_head *head = inode_hashtable + hash(sb, ino);
 	struct inode *inode;
 again:
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 	inode = find_inode_fast(sb, head, ino);
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 
 	if (inode) {
 		if (IS_ERR(inode))
@@ -1437,7 +1437,7 @@ struct inode *find_inode_nowait(struct super_block *sb,
 	struct inode *inode, *ret_inode = NULL;
 	int mval;
 
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 	hlist_for_each_entry(inode, head, i_hash) {
 		if (inode->i_sb != sb)
 			continue;
@@ -1449,7 +1449,7 @@ struct inode *find_inode_nowait(struct super_block *sb,
 		goto out;
 	}
 out:
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 	return ret_inode;
 }
 EXPORT_SYMBOL(find_inode_nowait);
@@ -1462,7 +1462,7 @@ int insert_inode_locked(struct inode *inode)
 
 	while (1) {
 		struct inode *old = NULL;
-		spin_lock(&inode_hash_lock);
+		read_seqlock_excl(&inode_hash_lock);
 		hlist_for_each_entry(old, head, i_hash) {
 			if (old->i_ino != ino)
 				continue;
@@ -1480,17 +1480,17 @@ int insert_inode_locked(struct inode *inode)
 			inode->i_state |= I_NEW | I_CREATING;
 			hlist_add_head(&inode->i_hash, head);
 			spin_unlock(&inode->i_lock);
-			spin_unlock(&inode_hash_lock);
+			read_sequnlock_excl(&inode_hash_lock);
 			return 0;
 		}
 		if (unlikely(old->i_state & I_CREATING)) {
 			spin_unlock(&old->i_lock);
-			spin_unlock(&inode_hash_lock);
+			read_sequnlock_excl(&inode_hash_lock);
 			return -EBUSY;
 		}
 		__iget(old);
 		spin_unlock(&old->i_lock);
-		spin_unlock(&inode_hash_lock);
+		read_sequnlock_excl(&inode_hash_lock);
 		wait_on_inode(old);
 		if (unlikely(!inode_unhashed(old))) {
 			iput(old);
@@ -1932,10 +1932,10 @@ static void __wait_on_freeing_inode(struct inode *inode)
 	wq = bit_waitqueue(&inode->i_state, __I_NEW);
 	prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
 	spin_unlock(&inode->i_lock);
-	spin_unlock(&inode_hash_lock);
+	read_sequnlock_excl(&inode_hash_lock);
 	schedule();
 	finish_wait(wq, &wait.wq_entry);
-	spin_lock(&inode_hash_lock);
+	read_seqlock_excl(&inode_hash_lock);
 }
 
 static __initdata unsigned long ihash_entries;



* [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions [ver #2]
  2019-04-25 15:01 [PATCH 0/6] vfs: Make icache searchable under RCU [ver #2] David Howells
  2019-04-25 15:01 ` [PATCH 1/6] vfs, coda: Fix the lack of locking in FID replacement inode rehashing " David Howells
  2019-04-25 15:01 ` [PATCH 2/6] vfs: Change inode_hash_lock to a seqlock " David Howells
@ 2019-04-25 15:02 ` David Howells
  2019-04-25 15:19   ` Al Viro
  2019-04-25 15:45   ` David Howells
  2019-04-25 15:02 ` [PATCH 4/6] afs: Use RCU inode cache search for callback resolution " David Howells
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 9+ messages in thread
From: David Howells @ 2019-04-25 15:02 UTC (permalink / raw)
  To: viro
  Cc: dhowells, linux-afs, linux-ext4, linux-ntfs-dev, linux-fsdevel,
	linux-kernel

Allow searching of the inode cache under RCU conditions - but with a
footnote that this is redone under lock under certain conditions.

The following changes are made:

 (1) Use hlist_add_head_rcu() and hlist_del_init_rcu() to add and remove
     an inode to/from a bucket.

 (2) In rehash_inode(), called by Coda to change the identifying parameters
     on an inode during resolution of disconnected operation, lock
     inode_hash_lock with write_seqlock(), which takes the spinlock and
     bumps the sequence counter.

 (3) Provide __find_inode_rcu() and __find_inode_by_ino_rcu() which do an
     RCU-safe crawl through a hash bucket.

 (4) Provide find_inode_rcu() and find_inode_by_ino_rcu() which do a
     read_seqbegin_or_lock() conditional lock-loop on inode_hash_lock to
     cover searching the icache.  Normally this will work without needing
     to retry, but in case (2), where an inode may be moved between chains,
     we need to retry with the lock held.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/inode.c         |  200 ++++++++++++++++++++++++++++++++++++++++++++--------
 include/linux/fs.h |    3 +
 2 files changed, 174 insertions(+), 29 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index cc2b08d82618..f13e2db7cc1d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -479,7 +479,7 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
 
 	read_seqlock_excl(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
-	hlist_add_head(&inode->i_hash, b);
+	hlist_add_head_rcu(&inode->i_hash, b);
 	spin_unlock(&inode->i_lock);
 	read_sequnlock_excl(&inode_hash_lock);
 }
@@ -495,7 +495,7 @@ void __remove_inode_hash(struct inode *inode)
 {
 	read_seqlock_excl(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
-	hlist_del_init(&inode->i_hash);
+	hlist_del_init_rcu(&inode->i_hash);
 	spin_unlock(&inode->i_lock);
 	read_sequnlock_excl(&inode_hash_lock);
 }
@@ -507,6 +507,11 @@ EXPORT_SYMBOL(__remove_inode_hash);
  * @hashval: New hash value (usually inode number) to get
  * @reset: Callback used to relabel the inode
  * @data: Opaque data pointer to pass to @reset
+ *
+ * Reinsert the inode into the hash, potentially moving it between queues - but
+ * we have to be aware that someone might be trawling either list under the RCU
+ * read lock whilst we do so; to deal with this, we bump the seq counter on
+ * the hash table lock.
  */
 void rehash_inode(struct inode *inode, unsigned long hashval,
 		  void (*reset)(struct inode *inode, unsigned long hashval, void *data),
@@ -516,9 +521,11 @@ void rehash_inode(struct inode *inode, unsigned long hashval,
 
 	write_seqlock(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
-	hlist_del_init(&inode->i_hash);
+
+	hlist_del_init_rcu(&inode->i_hash);
 	reset(inode, hashval, data);
-	hlist_add_head(&inode->i_hash, b);
+	hlist_add_head_rcu(&inode->i_hash, b);
+
 	spin_unlock(&inode->i_lock);
 	write_sequnlock(&inode_hash_lock);
 }
@@ -806,8 +813,31 @@ long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
 }
 
 static void __wait_on_freeing_inode(struct inode *inode);
+
 /*
- * Called with the inode lock held.
+ * Find an inode.  Can be called with either the RCU read lock or the
+ * inode cache lock held.  No check is made as to the validity of the
+ * inode found.
+ */
+static struct inode *__find_inode_rcu(struct super_block *sb,
+				      struct hlist_head *head,
+				      int (*test)(struct inode *, void *),
+				      void *data)
+{
+	struct inode *inode;
+
+	hlist_for_each_entry_rcu(inode, head, i_hash) {
+		if (inode->i_sb == sb &&
+		    test(inode, data))
+			return inode;
+	}
+
+	return NULL;
+}
+
+/*
+ * Called with the inode hash lock held.  Waits until dying inodes are freed,
+ * dropping the inode hash lock temporarily to do so.
  */
 static struct inode *find_inode(struct super_block *sb,
 				struct hlist_head *head,
@@ -817,11 +847,8 @@ static struct inode *find_inode(struct super_block *sb,
 	struct inode *inode = NULL;
 
 repeat:
-	hlist_for_each_entry(inode, head, i_hash) {
-		if (inode->i_sb != sb)
-			continue;
-		if (!test(inode, data))
-			continue;
+	inode = __find_inode_rcu(sb, head, test, data);
+	if (inode) {
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
 			__wait_on_freeing_inode(inode);
@@ -838,6 +865,26 @@ static struct inode *find_inode(struct super_block *sb,
 	return NULL;
 }
 
+/*
+ * Find an inode by inode number.  Can be called with either the RCU
+ * read lock or the inode cache lock held.  No check is made as to the
+ * validity of the inode found.
+ */
+static struct inode *__find_inode_by_ino_rcu(struct super_block *sb,
+					     struct hlist_head *head,
+					     unsigned long ino)
+{
+	struct inode *inode;
+
+	hlist_for_each_entry_rcu(inode, head, i_hash) {
+		if (inode->i_ino == ino &&
+		    inode->i_sb == sb)
+			return inode;
+	}
+
+	return NULL;
+}
+
 /*
  * find_inode_fast is the fast path version of find_inode, see the comment at
  * iget_locked for details.
@@ -848,11 +895,8 @@ static struct inode *find_inode_fast(struct super_block *sb,
 	struct inode *inode = NULL;
 
 repeat:
-	hlist_for_each_entry(inode, head, i_hash) {
-		if (inode->i_ino != ino)
-			continue;
-		if (inode->i_sb != sb)
-			continue;
+	inode = __find_inode_by_ino_rcu(sb, head, ino);
+	if (inode) {
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
 			__wait_on_freeing_inode(inode);
@@ -1105,7 +1149,7 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 	 */
 	spin_lock(&inode->i_lock);
 	inode->i_state |= I_NEW;
-	hlist_add_head(&inode->i_hash, head);
+	hlist_add_head_rcu(&inode->i_hash, head);
 	spin_unlock(&inode->i_lock);
 	if (!creating)
 		inode_sb_list_add(inode);
@@ -1243,15 +1287,9 @@ static int test_inode_iunique(struct super_block *sb, unsigned long ino)
 	struct inode *inode;
 
 	read_seqlock_excl(&inode_hash_lock);
-	hlist_for_each_entry(inode, b, i_hash) {
-		if (inode->i_ino == ino && inode->i_sb == sb) {
-			read_sequnlock_excl(&inode_hash_lock);
-			return 0;
-		}
-	}
+	inode = __find_inode_by_ino_rcu(sb, b, ino);
 	read_sequnlock_excl(&inode_hash_lock);
-
-	return 1;
+	return inode ? 0 : 1;
 }
 
 /**
@@ -1323,6 +1361,7 @@ EXPORT_SYMBOL(igrab);
  *
  * Note: I_NEW is not waited upon so you have to be very careful what you do
  * with the returned inode.  You probably should be using ilookup5() instead.
+ * It may still sleep waiting for I_FREEING and I_WILL_FREE, however.
  *
  * Note2: @test is called with the inode_hash_lock held, so can't sleep.
  */
@@ -1454,6 +1493,104 @@ struct inode *find_inode_nowait(struct super_block *sb,
 }
 EXPORT_SYMBOL(find_inode_nowait);
 
+/**
+ * find_inode_rcu - find an inode in the inode cache
+ * @sb:		Super block of file system to search
+ * @hashval:	Key to hash
+ * @test:	Function to test match on an inode
+ * @data:	Data for test function
+ *
+ * Search for the inode specified by @hashval and @data in the inode cache,
+ * where the helper function @test will return 0 if the inode does not match
+ * and 1 if it does.  The @test function must be responsible for taking the
+ * i_lock spin_lock and checking i_state for an inode being freed or being
+ * initialized.
+ *
+ * If successful, this will return the inode for which the @test function
+ * returned 1 and NULL otherwise.
+ *
+ * The @test function is not permitted to take a ref on any inode presented
+ * unless the caller is holding the inode hashtable lock.  It is also not
+ * permitted to sleep, since it may be called with the RCU read lock held.
+ *
+ * The caller must hold either the RCU read lock or the inode hashtable lock.
+ */
+struct inode *find_inode_rcu(struct super_block *sb, unsigned long hashval,
+			     int (*test)(struct inode *, void *), void *data)
+{
+	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
+	struct inode *inode;
+	unsigned int seq = 0;
+
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+			 "suspicious find_inode_rcu() usage");
+
+	do {
+		read_seqbegin_or_lock(&inode_hash_lock, &seq);
+
+		hlist_for_each_entry_rcu(inode, head, i_hash) {
+			if (inode->i_sb == sb &&
+			    !(READ_ONCE(inode->i_state) & (I_FREEING | I_WILL_FREE)) &&
+			    test(inode, data))
+				goto done;
+		}
+	} while (need_seqretry(&inode_hash_lock, seq));
+
+	inode = NULL;
+done:
+	done_seqretry(&inode_hash_lock, seq);
+	return inode;
+}
+EXPORT_SYMBOL(find_inode_rcu);
+
+/**
+ * find_inode_by_ino_rcu - Find an inode in the inode cache
+ * @sb:		Super block of file system to search
+ * @ino:	The inode number to match
+ *
+ * Search the inode cache for an inode whose inode number matches @ino and
+ * whose superblock matches @sb.  Inodes that are being freed are skipped
+ * rather than waited for.
+ *
+ * If successful, this will return the matching inode and NULL otherwise.
+ *
+ * The caller must hold either the RCU read lock or the inode hashtable lock.
+ */
+struct inode *find_inode_by_ino_rcu(struct super_block *sb,
+				    unsigned long ino)
+{
+	struct hlist_head *head = inode_hashtable + hash(sb, ino);
+	struct inode *inode;
+	unsigned int seq = 0;
+
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+			 "suspicious find_inode_by_ino_rcu() usage");
+
+	do {
+		read_seqbegin_or_lock(&inode_hash_lock, &seq);
+
+		hlist_for_each_entry_rcu(inode, head, i_hash) {
+			if (inode->i_ino == ino &&
+			    inode->i_sb == sb &&
+			    !(READ_ONCE(inode->i_state) & (I_FREEING | I_WILL_FREE)))
+				goto done;
+		}
+	} while (need_seqretry(&inode_hash_lock, seq));
+
+	inode = NULL;
+done:
+	done_seqretry(&inode_hash_lock, seq);
+	return inode;
+}
+EXPORT_SYMBOL(find_inode_by_ino_rcu);
+
 int insert_inode_locked(struct inode *inode)
 {
 	struct super_block *sb = inode->i_sb;
@@ -1478,7 +1615,7 @@ int insert_inode_locked(struct inode *inode)
 		if (likely(!old)) {
 			spin_lock(&inode->i_lock);
 			inode->i_state |= I_NEW | I_CREATING;
-			hlist_add_head(&inode->i_hash, head);
+			hlist_add_head_rcu(&inode->i_hash, head);
 			spin_unlock(&inode->i_lock);
 			read_sequnlock_excl(&inode_hash_lock);
 			return 0;
@@ -1538,6 +1675,7 @@ static void iput_final(struct inode *inode)
 {
 	struct super_block *sb = inode->i_sb;
 	const struct super_operations *op = inode->i_sb->s_op;
+	unsigned long state;
 	int drop;
 
 	WARN_ON(inode->i_state & I_NEW);
@@ -1553,16 +1691,20 @@ static void iput_final(struct inode *inode)
 		return;
 	}
 
+	state = READ_ONCE(inode->i_state);
 	if (!drop) {
-		inode->i_state |= I_WILL_FREE;
+		WRITE_ONCE(inode->i_state, state | I_WILL_FREE);
 		spin_unlock(&inode->i_lock);
+
 		write_inode_now(inode, 1);
+
 		spin_lock(&inode->i_lock);
-		WARN_ON(inode->i_state & I_NEW);
-		inode->i_state &= ~I_WILL_FREE;
+		state = READ_ONCE(inode->i_state);
+		WARN_ON(state & I_NEW);
+		state &= ~I_WILL_FREE;
 	}
 
-	inode->i_state |= I_FREEING;
+	WRITE_ONCE(inode->i_state, state | I_FREEING);
 	if (!list_empty(&inode->i_lru))
 		inode_lru_list_del(inode);
 	spin_unlock(&inode->i_lock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6442ff08b28c..d252242b0ac0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2984,6 +2984,9 @@ extern struct inode *find_inode_nowait(struct super_block *,
 				       int (*match)(struct inode *,
 						    unsigned long, void *),
 				       void *data);
+extern struct inode *find_inode_rcu(struct super_block *, unsigned long,
+				    int (*)(struct inode *, void *), void *);
+extern struct inode *find_inode_by_ino_rcu(struct super_block *, unsigned long);
 extern int insert_inode_locked4(struct inode *, unsigned long, int (*test)(struct inode *, void *), void *);
 extern int insert_inode_locked(struct inode *);
 #ifdef CONFIG_DEBUG_LOCK_ALLOC



* [PATCH 4/6] afs: Use RCU inode cache search for callback resolution [ver #2]
  2019-04-25 15:01 [PATCH 0/6] vfs: Make icache searchable under RCU [ver #2] David Howells
                   ` (2 preceding siblings ...)
  2019-04-25 15:02 ` [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions " David Howells
@ 2019-04-25 15:02 ` David Howells
  2019-04-25 15:02 ` [PATCH 5/6] ext4: Search for an inode to update under the RCU lock if we can " David Howells
  2019-04-25 15:02 ` [PATCH 6/6] vfs: Delete find_inode_nowait() " David Howells
  5 siblings, 0 replies; 9+ messages in thread
From: David Howells @ 2019-04-25 15:02 UTC (permalink / raw)
  To: viro
  Cc: linux-afs, dhowells, linux-afs, linux-ext4, linux-ntfs-dev,
	linux-fsdevel, linux-kernel

Search the inode cache under "RCU conditions" when trying to find a vnode
to break the callback on.  We don't want to be taking the icache lock if we
can avoid it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
---

 fs/afs/callback.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 128f2dbe256a..3d0280d4fbf0 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -251,6 +251,7 @@ static void afs_break_one_callback(struct afs_server *server,
 	struct afs_vnode *vnode;
 	struct inode *inode;
 
+	rcu_read_lock();
 	read_lock(&server->cb_break_lock);
 	hlist_for_each_entry(vi, &server->cb_volumes, srv_link) {
 		if (vi->vid < fid->vid)
@@ -284,18 +285,23 @@ static void afs_break_one_callback(struct afs_server *server,
 		} else {
 			data.volume = NULL;
 			data.fid = *fid;
-			inode = ilookup5_nowait(cbi->sb, fid->vnode,
-						afs_iget5_test, &data);
+
+			/* See if we can find a matching inode - even an I_NEW
+			 * inode needs to be marked as it can have its callback
+			 * broken before we finish setting up the local inode.
+			 */
+			inode = find_inode_rcu(cbi->sb, fid->vnode,
+					       afs_iget5_test, &data);
 			if (inode) {
 				vnode = AFS_FS_I(inode);
 				afs_break_callback(vnode);
-				iput(inode);
 			}
 		}
 	}
 
 out:
 	read_unlock(&server->cb_break_lock);
+	rcu_read_unlock();
 }
 
 /*



* [PATCH 5/6] ext4: Search for an inode to update under the RCU lock if we can [ver #2]
  2019-04-25 15:01 [PATCH 0/6] vfs: Make icache searchable under RCU [ver #2] David Howells
                   ` (3 preceding siblings ...)
  2019-04-25 15:02 ` [PATCH 4/6] afs: Use RCU inode cache search for callback resolution " David Howells
@ 2019-04-25 15:02 ` David Howells
  2019-04-25 15:02 ` [PATCH 6/6] vfs: Delete find_inode_nowait() " David Howells
  5 siblings, 0 replies; 9+ messages in thread
From: David Howells @ 2019-04-25 15:02 UTC (permalink / raw)
  To: viro
  Cc: Theodore Ts'o, Andreas Dilger, linux-ext4, dhowells,
	linux-afs, linux-ext4, linux-ntfs-dev, linux-fsdevel,
	linux-kernel

When we're updating timestamps in an inode, search for that inode under RCU
conditions to avoid holding the icache lock if we can.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: "Theodore Ts'o" <tytso@mit.edu>
cc: Andreas Dilger <adilger.kernel@dilger.ca>
cc: linux-ext4@vger.kernel.org
---

 fs/ext4/inode.c |   54 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b32a57bc5d5d..b1eb55d5c329 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5144,41 +5144,43 @@ static int ext4_inode_blocks_set(handle_t *handle,
 	return 0;
 }
 
-struct other_inode {
-	unsigned long		orig_ino;
-	struct ext4_inode	*raw_inode;
-};
-
-static int other_inode_match(struct inode * inode, unsigned long ino,
-			     void *data)
+static void __ext4_update_other_inode_time(struct super_block *sb,
+					   unsigned long orig_ino,
+					   unsigned long ino,
+					   struct ext4_inode *raw_inode)
 {
-	struct other_inode *oi = (struct other_inode *) data;
+	struct inode *inode;
+	unsigned long state;
+
+	inode = find_inode_by_ino_rcu(sb, ino);
+	if (!inode)
+		return;
 
+	state = READ_ONCE(inode->i_state);
 	if ((inode->i_ino != ino) ||
-	    (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
-			       I_DIRTY_INODE)) ||
-	    ((inode->i_state & I_DIRTY_TIME) == 0))
-		return 0;
+	    (state & (I_FREEING | I_WILL_FREE | I_NEW | I_DIRTY_INODE)) ||
+	    ((state & I_DIRTY_TIME) == 0))
+		return;
+
 	spin_lock(&inode->i_lock);
-	if (((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
-				I_DIRTY_INODE)) == 0) &&
-	    (inode->i_state & I_DIRTY_TIME)) {
+	state = READ_ONCE(inode->i_state);
+	if (((state & (I_FREEING | I_WILL_FREE | I_NEW | I_DIRTY_INODE)) == 0) &&
+	    (state & I_DIRTY_TIME)) {
 		struct ext4_inode_info	*ei = EXT4_I(inode);
 
 		inode->i_state &= ~(I_DIRTY_TIME | I_DIRTY_TIME_EXPIRED);
 		spin_unlock(&inode->i_lock);
 
 		spin_lock(&ei->i_raw_lock);
-		EXT4_INODE_SET_XTIME(i_ctime, inode, oi->raw_inode);
-		EXT4_INODE_SET_XTIME(i_mtime, inode, oi->raw_inode);
-		EXT4_INODE_SET_XTIME(i_atime, inode, oi->raw_inode);
-		ext4_inode_csum_set(inode, oi->raw_inode, ei);
+		EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+		EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
+		EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
+		ext4_inode_csum_set(inode, raw_inode, ei);
 		spin_unlock(&ei->i_raw_lock);
-		trace_ext4_other_inode_update_time(inode, oi->orig_ino);
-		return -1;
+		trace_ext4_other_inode_update_time(inode, orig_ino);
+		return;
 	}
 	spin_unlock(&inode->i_lock);
-	return -1;
 }
 
 /*
@@ -5188,24 +5190,24 @@ static int other_inode_match(struct inode * inode, unsigned long ino,
 static void ext4_update_other_inodes_time(struct super_block *sb,
 					  unsigned long orig_ino, char *buf)
 {
-	struct other_inode oi;
 	unsigned long ino;
 	int i, inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
 	int inode_size = EXT4_INODE_SIZE(sb);
 
-	oi.orig_ino = orig_ino;
 	/*
 	 * Calculate the first inode in the inode table block.  Inode
 	 * numbers are one-based.  That is, the first inode in a block
 	 * (assuming 4k blocks and 256 byte inodes) is (n*16 + 1).
 	 */
 	ino = ((orig_ino - 1) & ~(inodes_per_block - 1)) + 1;
+	rcu_read_lock();
 	for (i = 0; i < inodes_per_block; i++, ino++, buf += inode_size) {
 		if (ino == orig_ino)
 			continue;
-		oi.raw_inode = (struct ext4_inode *) buf;
-		(void) find_inode_nowait(sb, ino, other_inode_match, &oi);
+		__ext4_update_other_inode_time(sb, orig_ino, ino,
+					       (struct ext4_inode *)buf);
 	}
+	rcu_read_unlock();
 }
 
 /*


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 6/6] vfs: Delete find_inode_nowait() [ver #2]
  2019-04-25 15:01 [PATCH 0/6] vfs: Make icache searchable under RCU [ver #2] David Howells
                   ` (4 preceding siblings ...)
  2019-04-25 15:02 ` [PATCH 5/6] ext4: Search for an inode to update under the RCU lock if we can " David Howells
@ 2019-04-25 15:02 ` David Howells
  5 siblings, 0 replies; 9+ messages in thread
From: David Howells @ 2019-04-25 15:02 UTC (permalink / raw)
  To: viro
  Cc: dhowells, linux-afs, linux-ext4, linux-ntfs-dev, linux-fsdevel,
	linux-kernel

Delete find_inode_nowait() as it's no longer used.
---

 fs/inode.c         |   50 --------------------------------------------------
 include/linux/fs.h |    5 -----
 2 files changed, 55 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index f13e2db7cc1d..6aae3aa7aa9a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1443,56 +1443,6 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
 }
 EXPORT_SYMBOL(ilookup);
 
-/**
- * find_inode_nowait - find an inode in the inode cache
- * @sb:		super block of file system to search
- * @hashval:	hash value (usually inode number) to search for
- * @match:	callback used for comparisons between inodes
- * @data:	opaque data pointer to pass to @match
- *
- * Search for the inode specified by @hashval and @data in the inode
- * cache, where the helper function @match will return 0 if the inode
- * does not match, 1 if the inode does match, and -1 if the search
- * should be stopped.  The @match function must be responsible for
- * taking the i_lock spin_lock and checking i_state for an inode being
- * freed or being initialized, and incrementing the reference count
- * before returning 1.  It also must not sleep, since it is called with
- * the inode_hash_lock spinlock held.
- *
- * This is a even more generalized version of ilookup5() when the
- * function must never block --- find_inode() can block in
- * __wait_on_freeing_inode() --- or when the caller can not increment
- * the reference count because the resulting iput() might cause an
- * inode eviction.  The tradeoff is that the @match funtion must be
- * very carefully implemented.
- */
-struct inode *find_inode_nowait(struct super_block *sb,
-				unsigned long hashval,
-				int (*match)(struct inode *, unsigned long,
-					     void *),
-				void *data)
-{
-	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
-	struct inode *inode, *ret_inode = NULL;
-	int mval;
-
-	read_seqlock_excl(&inode_hash_lock);
-	hlist_for_each_entry(inode, head, i_hash) {
-		if (inode->i_sb != sb)
-			continue;
-		mval = match(inode, hashval, data);
-		if (mval == 0)
-			continue;
-		if (mval == 1)
-			ret_inode = inode;
-		goto out;
-	}
-out:
-	read_sequnlock_excl(&inode_hash_lock);
-	return ret_inode;
-}
-EXPORT_SYMBOL(find_inode_nowait);
-
 /**
  * find_inode_rcu - find an inode in the inode cache
  * @sb:		Super block of file system to search
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d252242b0ac0..3e8689ff3a2f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2979,11 +2979,6 @@ extern struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 		void *data);
 extern struct inode * iget5_locked(struct super_block *, unsigned long, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *);
 extern struct inode * iget_locked(struct super_block *, unsigned long);
-extern struct inode *find_inode_nowait(struct super_block *,
-				       unsigned long,
-				       int (*match)(struct inode *,
-						    unsigned long, void *),
-				       void *data);
 extern struct inode *find_inode_rcu(struct super_block *, unsigned long,
 				    int (*)(struct inode *, void *), void *);
 extern struct inode *find_inode_by_ino_rcu(struct super_block *, unsigned long);


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions [ver #2]
  2019-04-25 15:02 ` [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions " David Howells
@ 2019-04-25 15:19   ` Al Viro
  2019-04-25 15:45   ` David Howells
  1 sibling, 0 replies; 9+ messages in thread
From: Al Viro @ 2019-04-25 15:19 UTC (permalink / raw)
  To: David Howells
  Cc: linux-afs, linux-ext4, linux-ntfs-dev, linux-fsdevel, linux-kernel

On Thu, Apr 25, 2019 at 04:02:11PM +0100, David Howells wrote:
> Allow searching of the inode cache under RCU conditions - but with a
> footnote that this is redone under lock under certain conditions.
> 
> The following changes are made:
> 
>  (1) Use hlist_add_head_rcu() and hlist_del_init_rcu() to add and remove
>      an inode to/from a bucket.
> 
>  (2) In rehash_inode(), called by Coda to change the identifying parameters
>      on an inode during resolution of disconnected operation, lock
>      inode_hash_lock with write_seqlock(), which takes the spinlock and
>      bumps the sequence counter.
> 
>  (3) Provide __find_inode_rcu() and __find_inode_by_ino_rcu() which do an
>      RCU-safe crawl through a hash bucket.
> 
>  (4) Provide find_inode_rcu() and find_inode_by_ino_rcu() which do a
>      read_seqbegin_or_lock() conditional lock-loop on inode_hash_lock to
>      cover searching the icache.  Normally this will work without needing
>      to retry, but in case (2), where an inode may be moved between lists,
>      we need to retry with the lock held.

Hmm...  Why do these stores to ->i_state need WRITE_ONCE, while an arseload
of similar in fs/fs-writeback.c does not?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions [ver #2]
  2019-04-25 15:02 ` [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions " David Howells
  2019-04-25 15:19   ` Al Viro
@ 2019-04-25 15:45   ` David Howells
  1 sibling, 0 replies; 9+ messages in thread
From: David Howells @ 2019-04-25 15:45 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, linux-afs, linux-ext4, linux-ntfs-dev, linux-fsdevel,
	linux-kernel

Al Viro <viro@zeniv.linux.org.uk> wrote:

> Hmm...  Why do these stores to ->i_state need WRITE_ONCE, while an arseload
> of similar in fs/fs-writeback.c does not?

Because what matters in find_inode_rcu() are the I_WILL_FREE and I_FREEING
flags - and there's a gap during iput_final() where neither is set.

	if (!drop) {
		inode->i_state |= I_WILL_FREE;
		spin_unlock(&inode->i_lock);
		write_inode_now(inode, 1);
		spin_lock(&inode->i_lock);
		WARN_ON(inode->i_state & I_NEW);
		inode->i_state &= ~I_WILL_FREE;
 --->
	}

	inode->i_state |= I_FREEING;

It's normally covered by i_lock, but it's a problem if anyone looks at the
pair without taking i_lock.

Even flipping the order:

	if (!drop) {
		inode->i_state |= I_WILL_FREE;
		spin_unlock(&inode->i_lock);
		write_inode_now(inode, 1);
		spin_lock(&inode->i_lock);
		WARN_ON(inode->i_state & I_NEW);
		inode->i_state |= I_FREEING;
		inode->i_state &= ~I_WILL_FREE;
	} else {
		inode->i_state |= I_FREEING;
	}

isn't a guarantee of the order in which the compiler will do things AIUI.
Maybe I've been listening to Paul McKenney too much.  So the WRITE_ONCE()
should guarantee that both bits will change atomically.

Note that ocfs2_drop_inode() looks a tad suspicious:

	int ocfs2_drop_inode(struct inode *inode)
	{
		struct ocfs2_inode_info *oi = OCFS2_I(inode);

		trace_ocfs2_drop_inode((unsigned long long)oi->ip_blkno,
					inode->i_nlink, oi->ip_flags);

		assert_spin_locked(&inode->i_lock);
		inode->i_state |= I_WILL_FREE;
		spin_unlock(&inode->i_lock);
		write_inode_now(inode, 1);
		spin_lock(&inode->i_lock);
		WARN_ON(inode->i_state & I_NEW);
		inode->i_state &= ~I_WILL_FREE;

		return 1;
	}

David

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-04-25 15:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-25 15:01 [PATCH 0/6] vfs: Make icache searchable under RCU [ver #2] David Howells
2019-04-25 15:01 ` [PATCH 1/6] vfs, coda: Fix the lack of locking in FID replacement inode rehashing " David Howells
2019-04-25 15:01 ` [PATCH 2/6] vfs: Change inode_hash_lock to a seqlock " David Howells
2019-04-25 15:02 ` [PATCH 3/6] vfs: Allow searching of the icache under RCU conditions " David Howells
2019-04-25 15:19   ` Al Viro
2019-04-25 15:45   ` David Howells
2019-04-25 15:02 ` [PATCH 4/6] afs: Use RCU inode cache search for callback resolution " David Howells
2019-04-25 15:02 ` [PATCH 5/6] ext4: Search for an inode to update under the RCU lock if we can " David Howells
2019-04-25 15:02 ` [PATCH 6/6] vfs: Delete find_inode_nowait() " David Howells
