linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] vfs: add non-blocking mode for function find_inode_nowait()
@ 2020-01-30 17:40 Konstantin Khlebnikov
  2020-01-30 17:40 ` [PATCH v2 2/2] ext4: avoid spinlock congestion in lazytime optimization Konstantin Khlebnikov
  2020-03-09 16:12 ` [PATCH v2 1/2] vfs: add non-blocking mode for function find_inode_nowait() David Howells
  0 siblings, 2 replies; 3+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-30 17:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4, Theodore Ts'o, linux-kernel,
	Alexander Viro
  Cc: David Howells, Andreas Dilger, Dmitry Monakhov

Currently concurrent inode lookup by number does not scale well because
inode hash table is protected with single spinlock. Someday inode hash
will become searchable under RCU (see linked patchset by David Howells).

For now main user of this function: ext4_update_other_inodes_time()
could live with optimistic non-blocking try-lock/try-find semantics.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/lkml/158031264567.6836.126132376018905207.stgit@buzz/T/#m83e4432ea81bc6c0ec0d1cca87e97bd89ff671d9
Link: https://lore.kernel.org/lkml/155620449631.4720.8762546550728087460.stgit@warthog.procyon.org.uk/ (RCU)
---
 fs/ext4/inode.c    |    2 +-
 fs/f2fs/node.c     |    2 +-
 fs/inode.c         |    9 +++++++--
 include/linux/fs.h |    2 +-
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 629a25d999f0..9512eb771820 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4865,7 +4865,7 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
 		if (ino == orig_ino)
 			continue;
 		oi.raw_inode = (struct ext4_inode *) buf;
-		(void) find_inode_nowait(sb, ino, other_inode_match, &oi);
+		(void)find_inode_nowait(sb, ino, other_inode_match, &oi, false);
 	}
 }
 
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 3314a0f3405e..1b8515e4f451 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1793,7 +1793,7 @@ static bool flush_dirty_inode(struct page *page)
 	struct inode *inode;
 	nid_t ino = ino_of_node(page);
 
-	inode = find_inode_nowait(sbi->sb, ino, f2fs_match_ino, NULL);
+	inode = find_inode_nowait(sbi->sb, ino, f2fs_match_ino, NULL, false);
 	if (!inode)
 		return false;
 
diff --git a/fs/inode.c b/fs/inode.c
index ea15c6d9f274..6e52ee027b88 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1411,6 +1411,7 @@ EXPORT_SYMBOL(ilookup);
  * @hashval:	hash value (usually inode number) to search for
  * @match:	callback used for comparisons between inodes
  * @data:	opaque data pointer to pass to @match
+ * @nonblock:	if true do not wait for lock and fail with EAGAIN
  *
  * Search for the inode specified by @hashval and @data in the inode
  * cache, where the helper function @match will return 0 if the inode
@@ -1432,13 +1433,17 @@ struct inode *find_inode_nowait(struct super_block *sb,
 				unsigned long hashval,
 				int (*match)(struct inode *, unsigned long,
 					     void *),
-				void *data)
+				void *data, bool nonblock)
 {
 	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
 	struct inode *inode, *ret_inode = NULL;
 	int mval;
 
-	spin_lock(&inode_hash_lock);
+	if (!nonblock)
+		spin_lock(&inode_hash_lock);
+	else if (!spin_trylock(&inode_hash_lock))
+		return ERR_PTR(-EAGAIN);
+
 	hlist_for_each_entry(inode, head, i_hash) {
 		if (inode->i_sb != sb)
 			continue;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 40be2ccb87f3..8eef77dbf0af 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3039,7 +3039,7 @@ extern struct inode *find_inode_nowait(struct super_block *,
 				       unsigned long,
 				       int (*match)(struct inode *,
 						    unsigned long, void *),
-				       void *data);
+				       void *data, bool nonblock);
 extern int insert_inode_locked4(struct inode *, unsigned long, int (*test)(struct inode *, void *), void *);
 extern int insert_inode_locked(struct inode *);
 #ifdef CONFIG_DEBUG_LOCK_ALLOC


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH v2 2/2] ext4: avoid spinlock congestion in lazytime optimization
  2020-01-30 17:40 [PATCH v2 1/2] vfs: add non-blocking mode for function find_inode_nowait() Konstantin Khlebnikov
@ 2020-01-30 17:40 ` Konstantin Khlebnikov
  2020-03-09 16:12 ` [PATCH v2 1/2] vfs: add non-blocking mode for function find_inode_nowait() David Howells
  1 sibling, 0 replies; 3+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-30 17:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4, Theodore Ts'o, linux-kernel,
	Alexander Viro
  Cc: David Howells, Andreas Dilger, Dmitry Monakhov

Function ext4_update_other_inodes_time() implements optimization which
opportunistically updates times for inodes within same inode table block.

For now concurrent inode lookup by number does not scale well because
inode hash table is protected with single spinlock. It could become very
hot at concurrent writes to fast nvme when inode cache has enough inodes.

Let's use here non-blocking variant of function find_inode_nowait() and
skip opportunistic inode updates if spinlock is congested.


Synthetic testcase by Dmitry Monakhov:

modprobe brd rd_size=10240000 rd_nr=1
mkfs.ext4 -F -I 256 -b 4096 -q /dev/ram0
mkdir -p m
mount /dev/ram0 m -o lazytime,noload,barrier=0
mkdir m/{0..31}
fio --ioengine=ftruncate --direct=1 --bs=4k --filesize=16m --time_based=1 \
--runtime=10 --numjobs=32 --group_reporting --norandommap --name=write-32 \
--rw=randwrite --directory=m --filename_format='$jobnum/t'
umount m
rmdir m
rmmod brd

Before patch:	50k op/s
After patch:	3000k op/s

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Link: https://lore.kernel.org/lkml/158031264567.6836.126132376018905207.stgit@buzz/T/#u (v1)
---
 fs/ext4/inode.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9512eb771820..3dea7327e7ab 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4853,6 +4853,7 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
 	unsigned long ino;
 	int i, inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
 	int inode_size = EXT4_INODE_SIZE(sb);
+	struct inode *res;
 
 	oi.orig_ino = orig_ino;
 	/*
@@ -4865,7 +4866,10 @@ static void ext4_update_other_inodes_time(struct super_block *sb,
 		if (ino == orig_ino)
 			continue;
 		oi.raw_inode = (struct ext4_inode *) buf;
-		(void)find_inode_nowait(sb, ino, other_inode_match, &oi, false);
+		/* Try to find inode, stop if inode_hash_lock is congested. */
+		res = find_inode_nowait(sb, ino, other_inode_match, &oi, true);
+		if (res == ERR_PTR(-EAGAIN))
+			break;
 	}
 }
 


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2 1/2] vfs: add non-blocking mode for function find_inode_nowait()
  2020-01-30 17:40 [PATCH v2 1/2] vfs: add non-blocking mode for function find_inode_nowait() Konstantin Khlebnikov
  2020-01-30 17:40 ` [PATCH v2 2/2] ext4: avoid spinlock congestion in lazytime optimization Konstantin Khlebnikov
@ 2020-03-09 16:12 ` David Howells
  1 sibling, 0 replies; 3+ messages in thread
From: David Howells @ 2020-03-09 16:12 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: dhowells, linux-fsdevel, linux-ext4, Theodore Ts'o,
	linux-kernel, Alexander Viro, Andreas Dilger, Dmitry Monakhov

Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:

> Currently concurrent inode lookup by number does not scale well because
> inode hash table is protected with single spinlock. Someday inode hash
> will become searchable under RCU (see linked patchset by David Howells).

Something like this?

David
---
vfs: Make the inode hash table RCU searchable
    
Make the inode hash table RCU searchable so that searches that want to
access or modify an inode without taking a ref on that inode can do so
without taking the inode hash table lock.

The main thing this requires is some RCU annotation on the list
manipulation operations.  Inodes are already freed by RCU in most cases.

Users of this interface must take care as the inode may be still under
construction or may be being torn down around them.

There are at least two instances where this can be of use:

 (1) Ext4 date stamp updating.

 (2) AFS callback breaking.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
cc: linux-ext4@vger.kernel.org
---
 fs/afs/callback.c  |   12 ++-
 fs/ext4/inode.c    |   44 +++++------
 fs/inode.c         |  209 ++++++++++++++++++++++++++++++++++-------------------
 include/linux/fs.h |    8 --
 4 files changed, 172 insertions(+), 101 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 2dca8df1a18d..0dcbd40732d1 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -252,6 +252,7 @@ static void afs_break_one_callback(struct afs_server *server,
 	struct afs_vnode *vnode;
 	struct inode *inode;
 
+	rcu_read_lock();
 	read_lock(&server->cb_break_lock);
 	hlist_for_each_entry(vi, &server->cb_volumes, srv_link) {
 		if (vi->vid < fid->vid)
@@ -287,12 +288,16 @@ static void afs_break_one_callback(struct afs_server *server,
 		} else {
 			data.volume = NULL;
 			data.fid = *fid;
-			inode = ilookup5_nowait(cbi->sb, fid->vnode,
-						afs_iget5_test, &data);
+
+			/* See if we can find a matching inode - even an I_NEW
+			 * inode needs to be marked as it can have its callback
+			 * broken before we finish setting up the local inode.
+			 */
+			inode = find_inode_rcu(cbi->sb, fid->vnode,
+					       afs_iget5_test, &data);
 			if (inode) {
 				vnode = AFS_FS_I(inode);
 				afs_break_callback(vnode, afs_cb_break_for_callback);
-				iput(inode);
 			} else {
 				trace_afs_cb_miss(fid, afs_cb_break_for_callback);
 			}
@@ -301,6 +306,7 @@ static void afs_break_one_callback(struct afs_server *server,
 
 out:
 	read_unlock(&server->cb_break_lock);
+	rcu_read_unlock();
 }
 
 /*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fa0ff78dc033..9eeb8a22dfdb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4846,21 +4846,22 @@ static int ext4_inode_blocks_set(handle_t *handle,
 	return 0;
 }
 
-struct other_inode {
-	unsigned long		orig_ino;
-	struct ext4_inode	*raw_inode;
-};
-
-static int other_inode_match(struct inode * inode, unsigned long ino,
-			     void *data)
+static void __ext4_update_other_inode_time(struct super_block *sb,
+					   unsigned long orig_ino,
+					   unsigned long ino,
+					   struct ext4_inode *raw_inode)
 {
-	struct other_inode *oi = (struct other_inode *) data;
+	struct inode *inode;
+
+	inode = find_inode_by_ino_rcu(sb, ino);
+	if (!inode)
+		return;
 
-	if ((inode->i_ino != ino) ||
-	    (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
+	if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
 			       I_DIRTY_INODE)) ||
 	    ((inode->i_state & I_DIRTY_TIME) == 0))
-		return 0;
+		return;
+
 	spin_lock(&inode->i_lock);
 	if (((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW |
 				I_DIRTY_INODE)) == 0) &&
@@ -4871,16 +4872,15 @@ static int other_inode_match(struct inode * inode, unsigned long ino,
 		spin_unlock(&inode->i_lock);
 
 		spin_lock(&ei->i_raw_lock);
-		EXT4_INODE_SET_XTIME(i_ctime, inode, oi->raw_inode);
-		EXT4_INODE_SET_XTIME(i_mtime, inode, oi->raw_inode);
-		EXT4_INODE_SET_XTIME(i_atime, inode, oi->raw_inode);
-		ext4_inode_csum_set(inode, oi->raw_inode, ei);
+		EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+		EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
+		EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
+		ext4_inode_csum_set(inode, raw_inode, ei);
 		spin_unlock(&ei->i_raw_lock);
-		trace_ext4_other_inode_update_time(inode, oi->orig_ino);
-		return -1;
+		trace_ext4_other_inode_update_time(inode, orig_ino);
+		return;
 	}
 	spin_unlock(&inode->i_lock);
-	return -1;
 }
 
 /*
@@ -4890,24 +4890,24 @@ static int other_inode_match(struct inode * inode, unsigned long ino,
 static void ext4_update_other_inodes_time(struct super_block *sb,
 					  unsigned long orig_ino, char *buf)
 {
-	struct other_inode oi;
 	unsigned long ino;
 	int i, inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
 	int inode_size = EXT4_INODE_SIZE(sb);
 
-	oi.orig_ino = orig_ino;
 	/*
 	 * Calculate the first inode in the inode table block.  Inode
 	 * numbers are one-based.  That is, the first inode in a block
 	 * (assuming 4k blocks and 256 byte inodes) is (n*16 + 1).
 	 */
 	ino = ((orig_ino - 1) & ~(inodes_per_block - 1)) + 1;
+	rcu_read_lock();
 	for (i = 0; i < inodes_per_block; i++, ino++, buf += inode_size) {
 		if (ino == orig_ino)
 			continue;
-		oi.raw_inode = (struct ext4_inode *) buf;
-		(void) find_inode_nowait(sb, ino, other_inode_match, &oi);
+		__ext4_update_other_inode_time(sb, orig_ino, ino,
+					       (struct ext4_inode *)buf);
 	}
+	rcu_read_unlock();
 }
 
 /*
diff --git a/fs/inode.c b/fs/inode.c
index 7d57068b6b7a..a190c31e035f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -496,7 +496,7 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval)
 
 	spin_lock(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
-	hlist_add_head(&inode->i_hash, b);
+	hlist_add_head_rcu(&inode->i_hash, b);
 	spin_unlock(&inode->i_lock);
 	spin_unlock(&inode_hash_lock);
 }
@@ -512,7 +512,7 @@ void __remove_inode_hash(struct inode *inode)
 {
 	spin_lock(&inode_hash_lock);
 	spin_lock(&inode->i_lock);
-	hlist_del_init(&inode->i_hash);
+	hlist_del_init_rcu(&inode->i_hash);
 	spin_unlock(&inode->i_lock);
 	spin_unlock(&inode_hash_lock);
 }
@@ -807,8 +807,31 @@ long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
 }
 
 static void __wait_on_freeing_inode(struct inode *inode);
+
 /*
- * Called with the inode lock held.
+ * Find an inode.  Can be called with either the RCU read lock or the
+ * inode cache lock held.  No check is made as to the validity of the
+ * inode found.
+ */
+static struct inode *__find_inode_rcu(struct super_block *sb,
+				      struct hlist_head *head,
+				      int (*test)(struct inode *, void *),
+				      void *data)
+{
+	struct inode *inode;
+
+	hlist_for_each_entry_rcu(inode, head, i_hash) {
+		if (inode->i_sb == sb &&
+		    test(inode, data))
+			return inode;
+	}
+
+	return NULL;
+}
+
+/*
+ * Called with the inode hash lock held.  Waits until dying inodes are freed,
+ * dropping the inode hash lock temporarily to do so.
  */
 static struct inode *find_inode(struct super_block *sb,
 				struct hlist_head *head,
@@ -818,11 +841,8 @@ static struct inode *find_inode(struct super_block *sb,
 	struct inode *inode = NULL;
 
 repeat:
-	hlist_for_each_entry(inode, head, i_hash) {
-		if (inode->i_sb != sb)
-			continue;
-		if (!test(inode, data))
-			continue;
+	inode = __find_inode_rcu(sb, head, test, data);
+	if (inode) {
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
 			__wait_on_freeing_inode(inode);
@@ -839,6 +859,26 @@ static struct inode *find_inode(struct super_block *sb,
 	return NULL;
 }
 
+/*
+ * Find an inode by inode number.  Can be called with either the RCU
+ * read lock or the inode cache lock held.  No check is made as to the
+ * validity of the inode found.
+ */
+static struct inode *__find_inode_by_ino_rcu(struct super_block *sb,
+					     struct hlist_head *head,
+					     unsigned long ino)
+{
+	struct inode *inode;
+
+	hlist_for_each_entry_rcu(inode, head, i_hash) {
+		if (inode->i_ino == ino &&
+		    inode->i_sb == sb)
+			return inode;
+	}
+
+	return NULL;
+}
+
 /*
  * find_inode_fast is the fast path version of find_inode, see the comment at
  * iget_locked for details.
@@ -849,11 +889,8 @@ static struct inode *find_inode_fast(struct super_block *sb,
 	struct inode *inode = NULL;
 
 repeat:
-	hlist_for_each_entry(inode, head, i_hash) {
-		if (inode->i_ino != ino)
-			continue;
-		if (inode->i_sb != sb)
-			continue;
+	inode = __find_inode_by_ino_rcu(sb, head, ino);
+	if (inode) {
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
 			__wait_on_freeing_inode(inode);
@@ -1106,7 +1143,7 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 	 */
 	spin_lock(&inode->i_lock);
 	inode->i_state |= I_NEW;
-	hlist_add_head(&inode->i_hash, head);
+	hlist_add_head_rcu(&inode->i_hash, head);
 	spin_unlock(&inode->i_lock);
 	if (!creating)
 		inode_sb_list_add(inode);
@@ -1200,7 +1237,7 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 			inode->i_ino = ino;
 			spin_lock(&inode->i_lock);
 			inode->i_state = I_NEW;
-			hlist_add_head(&inode->i_hash, head);
+			hlist_add_head_rcu(&inode->i_hash, head);
 			spin_unlock(&inode->i_lock);
 			inode_sb_list_add(inode);
 			spin_unlock(&inode_hash_lock);
@@ -1244,15 +1281,9 @@ static int test_inode_iunique(struct super_block *sb, unsigned long ino)
 	struct inode *inode;
 
 	spin_lock(&inode_hash_lock);
-	hlist_for_each_entry(inode, b, i_hash) {
-		if (inode->i_ino == ino && inode->i_sb == sb) {
-			spin_unlock(&inode_hash_lock);
-			return 0;
-		}
-	}
+	inode = __find_inode_by_ino_rcu(sb, b, ino);
 	spin_unlock(&inode_hash_lock);
-
-	return 1;
+	return inode ? 0 : 1;
 }
 
 /**
@@ -1324,6 +1355,7 @@ EXPORT_SYMBOL(igrab);
  *
  * Note: I_NEW is not waited upon so you have to be very careful what you do
  * with the returned inode.  You probably should be using ilookup5() instead.
+ * It may still sleep waiting for I_FREE and I_WILL_FREE, however.
  *
  * Note2: @test is called with the inode_hash_lock held, so can't sleep.
  */
@@ -1406,54 +1438,84 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
 EXPORT_SYMBOL(ilookup);
 
 /**
- * find_inode_nowait - find an inode in the inode cache
- * @sb:		super block of file system to search
- * @hashval:	hash value (usually inode number) to search for
- * @match:	callback used for comparisons between inodes
- * @data:	opaque data pointer to pass to @match
- *
- * Search for the inode specified by @hashval and @data in the inode
- * cache, where the helper function @match will return 0 if the inode
- * does not match, 1 if the inode does match, and -1 if the search
- * should be stopped.  The @match function must be responsible for
- * taking the i_lock spin_lock and checking i_state for an inode being
- * freed or being initialized, and incrementing the reference count
- * before returning 1.  It also must not sleep, since it is called with
- * the inode_hash_lock spinlock held.
- *
- * This is a even more generalized version of ilookup5() when the
- * function must never block --- find_inode() can block in
- * __wait_on_freeing_inode() --- or when the caller can not increment
- * the reference count because the resulting iput() might cause an
- * inode eviction.  The tradeoff is that the @match funtion must be
- * very carefully implemented.
- */
-struct inode *find_inode_nowait(struct super_block *sb,
-				unsigned long hashval,
-				int (*match)(struct inode *, unsigned long,
-					     void *),
-				void *data)
+ * find_inode_rcu - find an inode in the inode cache
+ * @sb:		Super block of file system to search
+ * @hashval:	Key to hash
+ * @test:	Function to test match on an inode
+ * @data:	Data for test function
+ *
+ * Search for the inode specified by @hashval and @data in the inode cache,
+ * where the helper function @test will return 0 if the inode does not match
+ * and 1 if it does.  The @test function must be responsible for taking the
+ * i_lock spin_lock and checking i_state for an inode being freed or being
+ * initialized.
+ *
+ * If successful, this will return the inode for which the @test function
+ * returned 1 and NULL otherwise.
+ *
+ * The @test function is not permitted to take a ref on any inode presented
+ * unless the caller is holding the inode hashtable lock.  It is also not
+ * permitted to sleep, since it may be called with the RCU read lock held.
+ *
+ * The caller must hold either the RCU read lock or the inode hashtable lock.
+ */
+struct inode *find_inode_rcu(struct super_block *sb, unsigned long hashval,
+			     int (*test)(struct inode *, void *), void *data)
 {
 	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
-	struct inode *inode, *ret_inode = NULL;
-	int mval;
+	struct inode *inode;
 
-	spin_lock(&inode_hash_lock);
-	hlist_for_each_entry(inode, head, i_hash) {
-		if (inode->i_sb != sb)
-			continue;
-		mval = match(inode, hashval, data);
-		if (mval == 0)
-			continue;
-		if (mval == 1)
-			ret_inode = inode;
-		goto out;
+	RCU_LOCKDEP_WARN(!lockdep_is_held(&inode_hash_lock) && !rcu_read_lock_held(),
+			 "suspicious find_inode_by_ino_rcu() usage");
+
+	hlist_for_each_entry_rcu(inode, head, i_hash) {
+		if (inode->i_sb == sb &&
+		    !(READ_ONCE(inode->i_state) & (I_FREEING | I_WILL_FREE)) &&
+		    test(inode, data))
+			return inode;
 	}
-out:
-	spin_unlock(&inode_hash_lock);
-	return ret_inode;
+	return NULL;
+}
+EXPORT_SYMBOL(find_inode_rcu);
+
+/**
+ * find_inode_by_rcu - Find an inode in the inode cache
+ * @sb:		Super block of file system to search
+ * @ino:	The inode number to match
+ *
+ * Search for the inode specified by @hashval and @data in the inode cache,
+ * where the helper function @test will return 0 if the inode does not match
+ * and 1 if it does.  The @test function must be responsible for taking the
+ * i_lock spin_lock and checking i_state for an inode being freed or being
+ * initialized.
+ *
+ * If successful, this will return the inode for which the @test function
+ * returned 1 and NULL otherwise.
+ *
+ * The @test function is not permitted to take a ref on any inode presented
+ * unless the caller is holding the inode hashtable lock.  It is also not
+ * permitted to sleep, since it may be called with the RCU read lock held.
+ *
+ * The caller must hold either the RCU read lock or the inode hashtable lock.
+ */
+struct inode *find_inode_by_ino_rcu(struct super_block *sb,
+				    unsigned long ino)
+{
+	struct hlist_head *head = inode_hashtable + hash(sb, ino);
+	struct inode *inode;
+
+	RCU_LOCKDEP_WARN(!lockdep_is_held(&inode_hash_lock) && !rcu_read_lock_held(),
+			 "suspicious find_inode_by_ino_rcu() usage");
+
+	hlist_for_each_entry_rcu(inode, head, i_hash) {
+		if (inode->i_ino == ino &&
+		    inode->i_sb == sb &&
+		    !(READ_ONCE(inode->i_state) & (I_FREEING | I_WILL_FREE)))
+		    return inode;
+	}
+	return NULL;
 }
-EXPORT_SYMBOL(find_inode_nowait);
+EXPORT_SYMBOL(find_inode_by_ino_rcu);
 
 int insert_inode_locked(struct inode *inode)
 {
@@ -1479,7 +1541,7 @@ int insert_inode_locked(struct inode *inode)
 		if (likely(!old)) {
 			spin_lock(&inode->i_lock);
 			inode->i_state |= I_NEW | I_CREATING;
-			hlist_add_head(&inode->i_hash, head);
+			hlist_add_head_rcu(&inode->i_hash, head);
 			spin_unlock(&inode->i_lock);
 			spin_unlock(&inode_hash_lock);
 			return 0;
@@ -1539,6 +1601,7 @@ static void iput_final(struct inode *inode)
 {
 	struct super_block *sb = inode->i_sb;
 	const struct super_operations *op = inode->i_sb->s_op;
+	unsigned long state;
 	int drop;
 
 	WARN_ON(inode->i_state & I_NEW);
@@ -1554,16 +1617,20 @@ static void iput_final(struct inode *inode)
 		return;
 	}
 
+	state = READ_ONCE(inode->i_state);
 	if (!drop) {
-		inode->i_state |= I_WILL_FREE;
+		WRITE_ONCE(inode->i_state, state | I_WILL_FREE);
 		spin_unlock(&inode->i_lock);
+
 		write_inode_now(inode, 1);
+
 		spin_lock(&inode->i_lock);
-		WARN_ON(inode->i_state & I_NEW);
-		inode->i_state &= ~I_WILL_FREE;
+		state = READ_ONCE(inode->i_state);
+		WARN_ON(state & I_NEW);
+		state &= ~I_WILL_FREE;
 	}
 
-	inode->i_state |= I_FREEING;
+	WRITE_ONCE(inode->i_state, state | I_FREEING);
 	if (!list_empty(&inode->i_lru))
 		inode_lru_list_del(inode);
 	spin_unlock(&inode->i_lock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3cd4fe6b845e..41f7cb439e34 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3057,11 +3057,9 @@ extern struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
 		void *data);
 extern struct inode * iget5_locked(struct super_block *, unsigned long, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *);
 extern struct inode * iget_locked(struct super_block *, unsigned long);
-extern struct inode *find_inode_nowait(struct super_block *,
-				       unsigned long,
-				       int (*match)(struct inode *,
-						    unsigned long, void *),
-				       void *data);
+extern struct inode *find_inode_rcu(struct super_block *, unsigned long,
+				    int (*)(struct inode *, void *), void *);
+extern struct inode *find_inode_by_ino_rcu(struct super_block *, unsigned long);
 extern int insert_inode_locked4(struct inode *, unsigned long, int (*test)(struct inode *, void *), void *);
 extern int insert_inode_locked(struct inode *);
 #ifdef CONFIG_DEBUG_LOCK_ALLOC


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-03-09 16:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-30 17:40 [PATCH v2 1/2] vfs: add non-blocking mode for function find_inode_nowait() Konstantin Khlebnikov
2020-01-30 17:40 ` [PATCH v2 2/2] ext4: avoid spinlock congestion in lazytime optimization Konstantin Khlebnikov
2020-03-09 16:12 ` [PATCH v2 1/2] vfs: add non-blocking mode for function find_inode_nowait() David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).