All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] ext4: improve commit path performance for fast commit
@ 2022-03-08 16:33 Harshad Shirwadkar
  2022-03-08 16:33 ` [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Harshad Shirwadkar @ 2022-03-08 16:33 UTC (permalink / raw)
  To: linux-ext4; +Cc: riteshh, jack, tytso, Harshad Shirwadkar

From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

This is the V2 of the patch series after handling Jan's comments on
the first version.

This patch series supersedes the patch "ext4: remove journal barrier
during fast commit" sent in Feb 2022.

This patch series reworks the fast commit's commit path to improve
overall performance of the commit path. Following optimizations have
been added in this series:

* Avoid having to lock the journal throughout the fast commit.
* Remove tracking of open handles per inode.

With the changes introduced in this patch series, now the commit path
for fast commits is as follows:

 [1] Lock the journal by calling jbd2_journal_lock_updates. This
     ensures that all the exsiting handles finish and no new handles
     can start.
 [2] Mark all the fast commit eligible inodes as undergoing fast commit
     by setting "EXT4_STATE_FC_COMMITTING" state.
 [3] Unlock the journal by calling jbd2_journal_unlock_updates. This allows
     starting of new handles. If new handles try to start an update on
     any of the inodes that are being committed, ext4_fc_track_inode()
     will block until those inodes have finished the fast commit.
 [4] Submit data buffers of all the committing inodes.
 [5] Wait for [4] to complete.
 [6] Commit all the directory entry updates in the fast commit space.
 [7] Commit all the changed inodes in the fast commit space and clear
     "EXT4_STATE_FC_COMMITTING" for all the inodes.
 [8] Write tail tag to ensure atomicity of commits.

(The above flow has been documented in the code as well)

I verified that the patch series introduces no regressions in "quick"
and "log" groups when "fast_commit" feature is enabled.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

Harshad Shirwadkar (5):
  ext4: convert i_fc_lock to spinlock
  ext4: for committing inode, make ext4_fc_track_inode wait
  ext4: rework fast commit commit path
  ext4: drop i_fc_updates from inode fc info
  ext4: update code documentation

 fs/ext4/ext4.h        |  12 +--
 fs/ext4/ext4_jbd2.c   |  12 +++
 fs/ext4/ext4_jbd2.h   |  13 +--
 fs/ext4/fast_commit.c | 227 ++++++++++++++++++++----------------------
 fs/ext4/inode.c       |   3 +-
 fs/ext4/super.c       |   2 +-
 fs/jbd2/journal.c     |   2 -
 7 files changed, 130 insertions(+), 141 deletions(-)

-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock
  2022-03-08 16:33 [PATCH v2 0/5] ext4: improve commit path performance for fast commit Harshad Shirwadkar
@ 2022-03-08 16:33 ` Harshad Shirwadkar
  2022-03-09 10:10   ` Jan Kara
  2022-03-08 16:33 ` [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait Harshad Shirwadkar
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Harshad Shirwadkar @ 2022-03-08 16:33 UTC (permalink / raw)
  To: linux-ext4; +Cc: riteshh, jack, tytso, Harshad Shirwadkar

From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

Convert ext4_inode_info->i_fc_lock to spinlock to avoid sleeping
in invalid contexts.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
 fs/ext4/ext4.h        |  7 +++++--
 fs/ext4/fast_commit.c | 28 +++++++++++++++-------------
 fs/ext4/super.c       |  2 +-
 3 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3f87cca49f0c..fb6d65f1176f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1065,8 +1065,11 @@ struct ext4_inode_info {
 	/* Fast commit wait queue for this inode */
 	wait_queue_head_t i_fc_wait;
 
-	/* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
-	struct mutex i_fc_lock;
+	/*
+	 * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len
+	 * and inode's EXT4_FC_STATE_COMMITTING state bit.
+	 */
+	spinlock_t i_fc_lock;
 
 	/*
 	 * i_disksize keeps track of what the inode size is ON DISK, not
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 5ac594e03402..9913de655b61 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -387,7 +387,7 @@ static int ext4_fc_track_template(
 		return -EINVAL;
 
 	tid = handle->h_transaction->t_tid;
-	mutex_lock(&ei->i_fc_lock);
+	spin_lock(&ei->i_fc_lock);
 	if (tid == ei->i_sync_tid) {
 		update = true;
 	} else {
@@ -395,7 +395,7 @@ static int ext4_fc_track_template(
 		ei->i_sync_tid = tid;
 	}
 	ret = __fc_track_fn(inode, args, update);
-	mutex_unlock(&ei->i_fc_lock);
+	spin_unlock(&ei->i_fc_lock);
 
 	if (!enqueue)
 		return ret;
@@ -427,11 +427,11 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
 	struct dentry *dentry = dentry_update->dentry;
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 
-	mutex_unlock(&ei->i_fc_lock);
+	spin_unlock(&ei->i_fc_lock);
 	node = kmem_cache_alloc(ext4_fc_dentry_cachep, GFP_NOFS);
 	if (!node) {
 		ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_NOMEM, NULL);
-		mutex_lock(&ei->i_fc_lock);
+		spin_lock(&ei->i_fc_lock);
 		return -ENOMEM;
 	}
 
@@ -444,7 +444,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
 			kmem_cache_free(ext4_fc_dentry_cachep, node);
 			ext4_fc_mark_ineligible(inode->i_sb,
 				EXT4_FC_REASON_NOMEM, NULL);
-			mutex_lock(&ei->i_fc_lock);
+			spin_lock(&ei->i_fc_lock);
 			return -ENOMEM;
 		}
 		memcpy((u8 *)node->fcd_name.name, dentry->d_name.name,
@@ -478,7 +478,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
 		list_add_tail(&node->fcd_dilist, &ei->i_fc_dilist);
 	}
 	spin_unlock(&sbi->s_fc_lock);
-	mutex_lock(&ei->i_fc_lock);
+	spin_lock(&ei->i_fc_lock);
 
 	return 0;
 }
@@ -580,10 +580,8 @@ static int __track_range(struct inode *inode, void *arg, bool update)
 	struct __track_range_args *__arg =
 		(struct __track_range_args *)arg;
 
-	if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb)) {
-		ext4_debug("Special inode %ld being modified\n", inode->i_ino);
+	if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb))
 		return -ECANCELED;
-	}
 
 	oldstart = ei->i_fc_lblk_start;
 
@@ -867,15 +865,15 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
 	struct ext4_extent *ex;
 	int ret;
 
-	mutex_lock(&ei->i_fc_lock);
+	spin_lock(&ei->i_fc_lock);
 	if (ei->i_fc_lblk_len == 0) {
-		mutex_unlock(&ei->i_fc_lock);
+		spin_unlock(&ei->i_fc_lock);
 		return 0;
 	}
 	old_blk_size = ei->i_fc_lblk_start;
 	new_blk_size = ei->i_fc_lblk_start + ei->i_fc_lblk_len - 1;
 	ei->i_fc_lblk_len = 0;
-	mutex_unlock(&ei->i_fc_lock);
+	spin_unlock(&ei->i_fc_lock);
 
 	cur_lblk_off = old_blk_size;
 	jbd_debug(1, "%s: will try writing %d to %d for inode %ld\n",
@@ -972,9 +970,13 @@ static int ext4_fc_wait_inode_data_all(journal_t *journal)
 
 	spin_lock(&sbi->s_fc_lock);
 	list_for_each_entry_safe(pos, n, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
+		spin_lock(&pos->i_fc_lock);
 		if (!ext4_test_inode_state(&pos->vfs_inode,
-					   EXT4_STATE_FC_COMMITTING))
+					   EXT4_STATE_FC_COMMITTING)) {
+			spin_unlock(&pos->i_fc_lock);
 			continue;
+		}
+		spin_unlock(&pos->i_fc_lock);
 		spin_unlock(&sbi->s_fc_lock);
 
 		ret = jbd2_wait_inode_data(journal, pos->jinode);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1e5f4994fe57..38d63113c383 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1346,7 +1346,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 	atomic_set(&ei->i_unwritten, 0);
 	INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
 	ext4_fc_init_inode(&ei->vfs_inode);
-	mutex_init(&ei->i_fc_lock);
+	spin_lock_init(&ei->i_fc_lock);
 	return &ei->vfs_inode;
 }
 
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait
  2022-03-08 16:33 [PATCH v2 0/5] ext4: improve commit path performance for fast commit Harshad Shirwadkar
  2022-03-08 16:33 ` [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
@ 2022-03-08 16:33 ` Harshad Shirwadkar
  2022-03-09 10:14   ` Jan Kara
  2022-03-08 16:33 ` [PATCH v2 3/5] ext4: rework fast commit commit path Harshad Shirwadkar
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Harshad Shirwadkar @ 2022-03-08 16:33 UTC (permalink / raw)
  To: linux-ext4; +Cc: riteshh, jack, tytso, Harshad Shirwadkar

From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

If the inode that's being requested to track using ext4_fc_track_inode
is being committed, then wait until the inode finishes the commit.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
 fs/ext4/ext4_jbd2.c   | 12 ++++++++++++
 fs/ext4/ext4_jbd2.h   | 13 ++++---------
 fs/ext4/fast_commit.c | 28 ++++++++++++++++++++++++++++
 fs/ext4/inode.c       |  3 ++-
 4 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 3477a16d08ae..7fa301b0a35a 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -106,6 +106,18 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
 				   GFP_NOFS, type, line);
 }
 
+handle_t *__ext4_journal_start(struct inode *inode, unsigned int line,
+				  int type, int blocks, int rsv_blocks,
+				  int revoke_creds)
+{
+	handle_t *handle = __ext4_journal_start_sb(inode->i_sb, line,
+						   type, blocks, rsv_blocks,
+						   revoke_creds);
+	if (ext4_handle_valid(handle) && !IS_ERR(handle))
+		ext4_fc_track_inode(handle, inode);
+	return handle;
+}
+
 int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
 {
 	struct super_block *sb;
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index db2ae4a2b38d..e408622fe896 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -302,6 +302,10 @@ static inline int ext4_trans_default_revoke_credits(struct super_block *sb)
 	return ext4_free_metadata_revoke_credits(sb, 8);
 }
 
+handle_t *__ext4_journal_start(struct inode *inode, unsigned int line,
+			       int type, int blocks, int rsv_blocks,
+			       int revoke_creds);
+
 #define ext4_journal_start_sb(sb, type, nblocks)			\
 	__ext4_journal_start_sb((sb), __LINE__, (type), (nblocks), 0,	\
 				ext4_trans_default_revoke_credits(sb))
@@ -318,15 +322,6 @@ static inline int ext4_trans_default_revoke_credits(struct super_block *sb)
 	__ext4_journal_start((inode), __LINE__, (type), (blocks), 0,	\
 			     (revoke_creds))
 
-static inline handle_t *__ext4_journal_start(struct inode *inode,
-					     unsigned int line, int type,
-					     int blocks, int rsv_blocks,
-					     int revoke_creds)
-{
-	return __ext4_journal_start_sb(inode->i_sb, line, type, blocks,
-				       rsv_blocks, revoke_creds);
-}
-
 #define ext4_journal_stop(handle) \
 	__ext4_journal_stop(__func__, __LINE__, (handle))
 
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 9913de655b61..be8c5b3456ec 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -551,8 +551,14 @@ static int __track_inode(struct inode *inode, void *arg, bool update)
 	return 0;
 }
 
+/*
+ * Track inode as part of the next fast commit. If the inode is being
+ * committed, this function will wait for the commit to finish.
+ */
 void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
 {
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	wait_queue_head_t *wq;
 	int ret;
 
 	if (S_ISDIR(inode->i_mode))
@@ -564,6 +570,28 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
 		return;
 	}
 
+	if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
+	    (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
+		return;
+
+	while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
+#if (BITS_PER_LONG < 64)
+		DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
+				EXT4_STATE_FC_COMMITTING);
+		wq = bit_waitqueue(&ei->i_state_flags,
+				   EXT4_STATE_FC_COMMITTING);
+#else
+		DEFINE_WAIT_BIT(wait, &ei->i_flags,
+				EXT4_STATE_FC_COMMITTING);
+		wq = bit_waitqueue(&ei->i_flags,
+				   EXT4_STATE_FC_COMMITTING);
+#endif
+		prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
+		if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
+			schedule();
+		finish_wait(wq, &wait.wq_entry);
+	}
+
 	ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1);
 	trace_ext4_fc_track_inode(inode, ret);
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 531a94f48637..7a01f5bd377c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -629,6 +629,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 	 * with create == 1 flag.
 	 */
 	down_write(&EXT4_I(inode)->i_data_sem);
+	ext4_fc_track_inode(handle, inode);
 
 	/*
 	 * We need to check for EXT4 here because migrate
@@ -5690,7 +5691,6 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 		put_bh(iloc->bh);
 		return -EIO;
 	}
-	ext4_fc_track_inode(handle, inode);
 
 	if (IS_I_VERSION(inode))
 		inode_inc_iversion(inode);
@@ -5727,6 +5727,7 @@ ext4_reserve_inode_write(handle_t *handle, struct inode *inode,
 			brelse(iloc->bh);
 			iloc->bh = NULL;
 		}
+		ext4_fc_track_inode(handle, inode);
 	}
 	ext4_std_error(inode->i_sb, err);
 	return err;
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/5] ext4: rework fast commit commit path
  2022-03-08 16:33 [PATCH v2 0/5] ext4: improve commit path performance for fast commit Harshad Shirwadkar
  2022-03-08 16:33 ` [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
  2022-03-08 16:33 ` [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait Harshad Shirwadkar
@ 2022-03-08 16:33 ` Harshad Shirwadkar
  2022-03-09 10:17   ` Jan Kara
  2022-03-08 16:33 ` [PATCH v2 4/5] ext4: drop i_fc_updates from inode fc info Harshad Shirwadkar
  2022-03-08 16:33 ` [PATCH v2 5/5] ext4: update code documentation Harshad Shirwadkar
  4 siblings, 1 reply; 16+ messages in thread
From: Harshad Shirwadkar @ 2022-03-08 16:33 UTC (permalink / raw)
  To: linux-ext4; +Cc: riteshh, jack, tytso, Harshad Shirwadkar

From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

This patch reworks fast commit's commit path to remove locking the
journal for the entire duration of a fast commit. Instead, we only lock
the journal while marking all the eligible inodes as "committing". This
allows handles to make progress in parallel with the fast commit.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
 fs/ext4/fast_commit.c | 77 ++++++++++++++++++++++++++-----------------
 fs/jbd2/journal.c     |  2 --
 2 files changed, 47 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index be8c5b3456ec..eedcf8b4d47b 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -287,20 +287,30 @@ void ext4_fc_del(struct inode *inode)
 	    (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
 		return;
 
-restart:
 	spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
 	if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
 		spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
 		return;
 	}
 
-	if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
-		ext4_fc_wait_committing_inode(inode);
-		goto restart;
-	}
-
-	if (!list_empty(&ei->i_fc_list))
-		list_del_init(&ei->i_fc_list);
+	/*
+	 * Since ext4_fc_del is called from ext4_evict_inode while having a
+	 * handle open, there is no need for us to wait here even if a fast
+	 * commit is going on. That is because, if this inode is being
+	 * committed, ext4_mark_inode_dirty would have waited for inode commit
+	 * operation to finish before we come here. So, by the time we come
+	 * here, inode's EXT4_STATE_FC_COMMITTING would have been cleared. So,
+	 * we shouldn't see EXT4_STATE_FC_COMMITTING to be set on this inode
+	 * here.
+	 *
+	 * We may come here without any handles open in the "no_delete" case of
+	 * ext4_evict_inode as well. However, if that happens, we first mark the
+	 * file system as fast commit ineligible anyway. So, even in that case,
+	 * it is okay to remove the inode from the fc list.
+	 */
+	WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
+		&& !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
+	list_del_init(&ei->i_fc_list);
 
 	/*
 	 * Since this inode is getting removed, let's also remove all FC
@@ -323,8 +333,6 @@ void ext4_fc_del(struct inode *inode)
 		fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
 		kfree(fc_dentry->fcd_name.name);
 	kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
-
-	return;
 }
 
 /*
@@ -964,19 +972,6 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
 
 	spin_lock(&sbi->s_fc_lock);
 	list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
-		ext4_set_inode_state(&ei->vfs_inode, EXT4_STATE_FC_COMMITTING);
-		while (atomic_read(&ei->i_fc_updates)) {
-			DEFINE_WAIT(wait);
-
-			prepare_to_wait(&ei->i_fc_wait, &wait,
-						TASK_UNINTERRUPTIBLE);
-			if (atomic_read(&ei->i_fc_updates)) {
-				spin_unlock(&sbi->s_fc_lock);
-				schedule();
-				spin_lock(&sbi->s_fc_lock);
-			}
-			finish_wait(&ei->i_fc_wait, &wait);
-		}
 		spin_unlock(&sbi->s_fc_lock);
 		ret = jbd2_submit_inode_data(ei->jinode);
 		if (ret)
@@ -998,13 +993,9 @@ static int ext4_fc_wait_inode_data_all(journal_t *journal)
 
 	spin_lock(&sbi->s_fc_lock);
 	list_for_each_entry_safe(pos, n, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
-		spin_lock(&pos->i_fc_lock);
 		if (!ext4_test_inode_state(&pos->vfs_inode,
-					   EXT4_STATE_FC_COMMITTING)) {
-			spin_unlock(&pos->i_fc_lock);
+					   EXT4_STATE_FC_COMMITTING))
 			continue;
-		}
-		spin_unlock(&pos->i_fc_lock);
 		spin_unlock(&sbi->s_fc_lock);
 
 		ret = jbd2_wait_inode_data(journal, pos->jinode);
@@ -1093,6 +1084,16 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	int ret = 0;
 	u32 crc = 0;
 
+	/* Lock the journal */
+	jbd2_journal_lock_updates(journal);
+	spin_lock(&sbi->s_fc_lock);
+	list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
+		ext4_set_inode_state(&iter->vfs_inode,
+				     EXT4_STATE_FC_COMMITTING);
+	}
+	spin_unlock(&sbi->s_fc_lock);
+	jbd2_journal_unlock_updates(journal);
+
 	ret = ext4_fc_submit_inode_data_all(journal);
 	if (ret)
 		return ret;
@@ -1143,6 +1144,18 @@ static int ext4_fc_perform_commit(journal_t *journal)
 		ret = ext4_fc_write_inode(inode, &crc);
 		if (ret)
 			goto out;
+		ext4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);
+		/*
+		 * Make sure clearing of EXT4_STATE_FC_COMMITTING is
+		 * visible before we send the wakeup. Pairs with implicit
+		 * barrier in prepare_to_wait() in ext4_fc_track_inode().
+		 */
+		smp_mb();
+#if (BITS_PER_LONG < 64)
+		wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
+#else
+		wake_up_bit(&iter->i_flags, EXT4_STATE_FC_COMMITTING);
+#endif
 		spin_lock(&sbi->s_fc_lock);
 	}
 	spin_unlock(&sbi->s_fc_lock);
@@ -1276,13 +1289,17 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 	spin_lock(&sbi->s_fc_lock);
 	list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
 				 i_fc_list) {
-		list_del_init(&iter->i_fc_list);
 		ext4_clear_inode_state(&iter->vfs_inode,
 				       EXT4_STATE_FC_COMMITTING);
 		if (iter->i_sync_tid <= tid)
 			ext4_fc_reset_inode(&iter->vfs_inode);
-		/* Make sure EXT4_STATE_FC_COMMITTING bit is clear */
+		/*
+		 * Make sure clearing of EXT4_STATE_FC_COMMITTING is
+		 * visible before we send the wakeup. Pairs with implicit
+		 * barrier in prepare_to_wait() in ext4_fc_track_inode().
+		 */
 		smp_mb();
+		list_del_init(&iter->i_fc_list);
 #if (BITS_PER_LONG < 64)
 		wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
 #else
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index c2cf74b01ddb..06b885628b1c 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -757,7 +757,6 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid)
 	}
 	journal->j_flags |= JBD2_FAST_COMMIT_ONGOING;
 	write_unlock(&journal->j_state_lock);
-	jbd2_journal_lock_updates(journal);
 
 	return 0;
 }
@@ -769,7 +768,6 @@ EXPORT_SYMBOL(jbd2_fc_begin_commit);
  */
 static int __jbd2_fc_end_commit(journal_t *journal, tid_t tid, bool fallback)
 {
-	jbd2_journal_unlock_updates(journal);
 	if (journal->j_fc_cleanup_callback)
 		journal->j_fc_cleanup_callback(journal, 0, tid);
 	write_lock(&journal->j_state_lock);
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 4/5] ext4: drop i_fc_updates from inode fc info
  2022-03-08 16:33 [PATCH v2 0/5] ext4: improve commit path performance for fast commit Harshad Shirwadkar
                   ` (2 preceding siblings ...)
  2022-03-08 16:33 ` [PATCH v2 3/5] ext4: rework fast commit commit path Harshad Shirwadkar
@ 2022-03-08 16:33 ` Harshad Shirwadkar
  2022-03-09 10:17   ` Jan Kara
  2022-03-08 16:33 ` [PATCH v2 5/5] ext4: update code documentation Harshad Shirwadkar
  4 siblings, 1 reply; 16+ messages in thread
From: Harshad Shirwadkar @ 2022-03-08 16:33 UTC (permalink / raw)
  To: linux-ext4; +Cc: riteshh, jack, tytso, Harshad Shirwadkar

From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

The new logic introduced in this series does not require tracking number
of active handles open on an inode. So, drop it.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
 fs/ext4/ext4.h        |  5 ----
 fs/ext4/fast_commit.c | 70 -------------------------------------------
 2 files changed, 75 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index fb6d65f1176f..6861a3127a42 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1059,9 +1059,6 @@ struct ext4_inode_info {
 	/* End of lblk range that needs to be committed in this fast commit */
 	ext4_lblk_t i_fc_lblk_len;
 
-	/* Number of ongoing updates on this inode */
-	atomic_t  i_fc_updates;
-
 	/* Fast commit wait queue for this inode */
 	wait_queue_head_t i_fc_wait;
 
@@ -2930,8 +2927,6 @@ void __ext4_fc_track_create(handle_t *handle, struct inode *inode,
 void ext4_fc_track_create(handle_t *handle, struct dentry *dentry);
 void ext4_fc_track_inode(handle_t *handle, struct inode *inode);
 void ext4_fc_mark_ineligible(struct super_block *sb, int reason, handle_t *handle);
-void ext4_fc_start_update(struct inode *inode);
-void ext4_fc_stop_update(struct inode *inode);
 void ext4_fc_del(struct inode *inode);
 bool ext4_fc_replay_check_excluded(struct super_block *sb, ext4_fsblk_t block);
 void ext4_fc_replay_cleanup(struct super_block *sb);
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index eedcf8b4d47b..eea19e3ea9ba 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -201,76 +201,6 @@ void ext4_fc_init_inode(struct inode *inode)
 	INIT_LIST_HEAD(&ei->i_fc_list);
 	INIT_LIST_HEAD(&ei->i_fc_dilist);
 	init_waitqueue_head(&ei->i_fc_wait);
-	atomic_set(&ei->i_fc_updates, 0);
-}
-
-/* This function must be called with sbi->s_fc_lock held. */
-static void ext4_fc_wait_committing_inode(struct inode *inode)
-__releases(&EXT4_SB(inode->i_sb)->s_fc_lock)
-{
-	wait_queue_head_t *wq;
-	struct ext4_inode_info *ei = EXT4_I(inode);
-
-#if (BITS_PER_LONG < 64)
-	DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
-			EXT4_STATE_FC_COMMITTING);
-	wq = bit_waitqueue(&ei->i_state_flags,
-				EXT4_STATE_FC_COMMITTING);
-#else
-	DEFINE_WAIT_BIT(wait, &ei->i_flags,
-			EXT4_STATE_FC_COMMITTING);
-	wq = bit_waitqueue(&ei->i_flags,
-				EXT4_STATE_FC_COMMITTING);
-#endif
-	lockdep_assert_held(&EXT4_SB(inode->i_sb)->s_fc_lock);
-	prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
-	spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
-	schedule();
-	finish_wait(wq, &wait.wq_entry);
-}
-
-/*
- * Inform Ext4's fast about start of an inode update
- *
- * This function is called by the high level call VFS callbacks before
- * performing any inode update. This function blocks if there's an ongoing
- * fast commit on the inode in question.
- */
-void ext4_fc_start_update(struct inode *inode)
-{
-	struct ext4_inode_info *ei = EXT4_I(inode);
-
-	if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
-	    (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
-		return;
-
-restart:
-	spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
-	if (list_empty(&ei->i_fc_list))
-		goto out;
-
-	if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
-		ext4_fc_wait_committing_inode(inode);
-		goto restart;
-	}
-out:
-	atomic_inc(&ei->i_fc_updates);
-	spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
-}
-
-/*
- * Stop inode update and wake up waiting fast commits if any.
- */
-void ext4_fc_stop_update(struct inode *inode)
-{
-	struct ext4_inode_info *ei = EXT4_I(inode);
-
-	if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
-	    (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
-		return;
-
-	if (atomic_dec_and_test(&ei->i_fc_updates))
-		wake_up_all(&ei->i_fc_wait);
 }
 
 /*
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 5/5] ext4: update code documentation
  2022-03-08 16:33 [PATCH v2 0/5] ext4: improve commit path performance for fast commit Harshad Shirwadkar
                   ` (3 preceding siblings ...)
  2022-03-08 16:33 ` [PATCH v2 4/5] ext4: drop i_fc_updates from inode fc info Harshad Shirwadkar
@ 2022-03-08 16:33 ` Harshad Shirwadkar
  2022-03-09 10:19   ` Jan Kara
  4 siblings, 1 reply; 16+ messages in thread
From: Harshad Shirwadkar @ 2022-03-08 16:33 UTC (permalink / raw)
  To: linux-ext4; +Cc: riteshh, jack, tytso, Harshad Shirwadkar

From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

This patch updates code documentation to reflect the commit path changes
made in this series.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
 fs/ext4/fast_commit.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index eea19e3ea9ba..c14e6d34d552 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -49,14 +49,21 @@
  * that need to be committed during a fast commit in another in memory queue of
  * inodes. During the commit operation, we commit in the following order:
  *
- * [1] Lock inodes for any further data updates by setting COMMITTING state
- * [2] Submit data buffers of all the inodes
- * [3] Wait for [2] to complete
- * [4] Commit all the directory entry updates in the fast commit space
- * [5] Commit all the changed inode structures
- * [6] Write tail tag (this tag ensures the atomicity, please read the following
+ * [1] Lock the journal by calling jbd2_journal_lock_updates. This ensures that
+ *     all the exsiting handles finish and no new handles can start.
+ * [2] Mark all the fast commit eligible inodes as undergoing fast commit
+ *     by setting "EXT4_STATE_FC_COMMITTING" state.
+ * [3] Unlock the journal by calling jbd2_journal_unlock_updates. This allows
+ *     starting of new handles. If new handles try to start an update on
+ *     any of the inodes that are being committed, ext4_fc_track_inode()
+ *     will block until those inodes have finished the fast commit.
+ * [4] Submit data buffers of all the committing inodes.
+ * [5] Wait for [4] to complete.
+ * [6] Commit all the directory entry updates in the fast commit space.
+ * [7] Commit all the changed inodes in the fast commit space and clear
+ *     "EXT4_STATE_FC_COMMITTING" for these inodes.
+ * [8] Write tail tag (this tag ensures the atomicity, please read the following
  *     section for more details).
- * [7] Wait for [4], [5] and [6] to complete.
  *
  * All the inode updates must call ext4_fc_start_update() before starting an
  * update. If such an ongoing update is present, fast commit waits for it to
@@ -142,6 +149,13 @@
  * similarly. Thus, by converting a non-idempotent procedure into a series of
  * idempotent outcomes, fast commits ensured idempotence during the replay.
  *
+ * Locking
+ * -------
+ * sbi->s_fc_lock protects the fast commit inodes queue and the fast commit
+ * dentry queue. ei->i_fc_lock protects the fast commit related info in a given
+ * inode. Most of the code avoids acquiring both the locks, but if one must do
+ * that then sbi->s_fc_lock must be acquired before ei->i_fc_lock.
+ *
  * TODOs
  * -----
  *
@@ -156,13 +170,7 @@
  *    fast commit recovery even if that area is invalidated by later full
  *    commits.
  *
- * 1) Fast commit's commit path locks the entire file system during fast
- *    commit. This has significant performance penalty. Instead of that, we
- *    should use ext4_fc_start/stop_update functions to start inode level
- *    updates from ext4_journal_start/stop. Once we do that we can drop file
- *    system locking during commit path.
- *
- * 2) Handle more ineligible cases.
+ * 1) Handle more ineligible cases.
  */
 
 #include <trace/events/ext4.h>
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock
  2022-03-08 16:33 ` [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
@ 2022-03-09 10:10   ` Jan Kara
  2022-03-11  4:02     ` harshad shirwadkar
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2022-03-09 10:10 UTC (permalink / raw)
  To: Harshad Shirwadkar; +Cc: linux-ext4, riteshh, jack, tytso

On Tue 08-03-22 08:33:15, Harshad Shirwadkar wrote:
> From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> 
> Convert ext4_inode_info->i_fc_lock to spinlock to avoid sleeping
> in invalid contexts.
> 
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

One comment below...

> @@ -972,9 +970,13 @@ static int ext4_fc_wait_inode_data_all(journal_t *journal)
>  
>  	spin_lock(&sbi->s_fc_lock);
>  	list_for_each_entry_safe(pos, n, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> +		spin_lock(&pos->i_fc_lock);
>  		if (!ext4_test_inode_state(&pos->vfs_inode,
> -					   EXT4_STATE_FC_COMMITTING))
> +					   EXT4_STATE_FC_COMMITTING)) {
> +			spin_unlock(&pos->i_fc_lock);
>  			continue;
> +		}
> +		spin_unlock(&pos->i_fc_lock);
>  		spin_unlock(&sbi->s_fc_lock);

Why do you add a lock here in a pure lock-conversion patch? Furthermore I
don't think the lock is needed...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait
  2022-03-08 16:33 ` [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait Harshad Shirwadkar
@ 2022-03-09 10:14   ` Jan Kara
  2022-03-11  4:17     ` harshad shirwadkar
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2022-03-09 10:14 UTC (permalink / raw)
  To: Harshad Shirwadkar; +Cc: linux-ext4, riteshh, jack, tytso

On Tue 08-03-22 08:33:16, Harshad Shirwadkar wrote:
> From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> 
> If the inode that's being requested to track using ext4_fc_track_inode
> is being committed, then wait until the inode finishes the commit.
> 
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

One comment below...

> --- a/fs/ext4/ext4_jbd2.c
> +++ b/fs/ext4/ext4_jbd2.c
> @@ -106,6 +106,18 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
>  				   GFP_NOFS, type, line);
>  }
>  
> +handle_t *__ext4_journal_start(struct inode *inode, unsigned int line,
> +				  int type, int blocks, int rsv_blocks,
> +				  int revoke_creds)
> +{
> +	handle_t *handle = __ext4_journal_start_sb(inode->i_sb, line,
> +						   type, blocks, rsv_blocks,
> +						   revoke_creds);
> +	if (ext4_handle_valid(handle) && !IS_ERR(handle))
> +		ext4_fc_track_inode(handle, inode);
> +	return handle;
> +}
> +

Please fix fs/ext4/inline.c rather than papering over the problem like
this. Because it is just a landmine waiting to explode in some strange
cornercase when someone does not call ext4_journal_start() but other handle
starting function.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/5] ext4: rework fast commit commit path
  2022-03-08 16:33 ` [PATCH v2 3/5] ext4: rework fast commit commit path Harshad Shirwadkar
@ 2022-03-09 10:17   ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2022-03-09 10:17 UTC (permalink / raw)
  To: Harshad Shirwadkar; +Cc: linux-ext4, riteshh, jack, tytso

On Tue 08-03-22 08:33:17, Harshad Shirwadkar wrote:
> From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> 
> This patch reworks fast commit's commit path to remove locking the
> journal for the entire duration of a fast commit. Instead, we only lock
> the journal while marking all the eligible inodes as "committing". This
> allows handles to make progress in parallel with the fast commit.
> 
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

The patch looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  fs/ext4/fast_commit.c | 77 ++++++++++++++++++++++++++-----------------
>  fs/jbd2/journal.c     |  2 --
>  2 files changed, 47 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index be8c5b3456ec..eedcf8b4d47b 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -287,20 +287,30 @@ void ext4_fc_del(struct inode *inode)
>  	    (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
>  		return;
>  
> -restart:
>  	spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
>  	if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
>  		spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
>  		return;
>  	}
>  
> -	if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> -		ext4_fc_wait_committing_inode(inode);
> -		goto restart;
> -	}
> -
> -	if (!list_empty(&ei->i_fc_list))
> -		list_del_init(&ei->i_fc_list);
> +	/*
> +	 * Since ext4_fc_del is called from ext4_evict_inode while having a
> +	 * handle open, there is no need for us to wait here even if a fast
> +	 * commit is going on. That is because, if this inode is being
> +	 * committed, ext4_mark_inode_dirty would have waited for inode commit
> +	 * operation to finish before we come here. So, by the time we come
> +	 * here, inode's EXT4_STATE_FC_COMMITTING would have been cleared. So,
> +	 * we shouldn't see EXT4_STATE_FC_COMMITTING to be set on this inode
> +	 * here.
> +	 *
> +	 * We may come here without any handles open in the "no_delete" case of
> +	 * ext4_evict_inode as well. However, if that happens, we first mark the
> +	 * file system as fast commit ineligible anyway. So, even in that case,
> +	 * it is okay to remove the inode from the fc list.
> +	 */
> +	WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
> +		&& !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
> +	list_del_init(&ei->i_fc_list);
>  
>  	/*
>  	 * Since this inode is getting removed, let's also remove all FC
> @@ -323,8 +333,6 @@ void ext4_fc_del(struct inode *inode)
>  		fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
>  		kfree(fc_dentry->fcd_name.name);
>  	kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
> -
> -	return;
>  }
>  
>  /*
> @@ -964,19 +972,6 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
>  
>  	spin_lock(&sbi->s_fc_lock);
>  	list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> -		ext4_set_inode_state(&ei->vfs_inode, EXT4_STATE_FC_COMMITTING);
> -		while (atomic_read(&ei->i_fc_updates)) {
> -			DEFINE_WAIT(wait);
> -
> -			prepare_to_wait(&ei->i_fc_wait, &wait,
> -						TASK_UNINTERRUPTIBLE);
> -			if (atomic_read(&ei->i_fc_updates)) {
> -				spin_unlock(&sbi->s_fc_lock);
> -				schedule();
> -				spin_lock(&sbi->s_fc_lock);
> -			}
> -			finish_wait(&ei->i_fc_wait, &wait);
> -		}
>  		spin_unlock(&sbi->s_fc_lock);
>  		ret = jbd2_submit_inode_data(ei->jinode);
>  		if (ret)
> @@ -998,13 +993,9 @@ static int ext4_fc_wait_inode_data_all(journal_t *journal)
>  
>  	spin_lock(&sbi->s_fc_lock);
>  	list_for_each_entry_safe(pos, n, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> -		spin_lock(&pos->i_fc_lock);
>  		if (!ext4_test_inode_state(&pos->vfs_inode,
> -					   EXT4_STATE_FC_COMMITTING)) {
> -			spin_unlock(&pos->i_fc_lock);
> +					   EXT4_STATE_FC_COMMITTING))
>  			continue;
> -		}
> -		spin_unlock(&pos->i_fc_lock);
>  		spin_unlock(&sbi->s_fc_lock);
>  
>  		ret = jbd2_wait_inode_data(journal, pos->jinode);
> @@ -1093,6 +1084,16 @@ static int ext4_fc_perform_commit(journal_t *journal)
>  	int ret = 0;
>  	u32 crc = 0;
>  
> +	/* Lock the journal */
> +	jbd2_journal_lock_updates(journal);
> +	spin_lock(&sbi->s_fc_lock);
> +	list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> +		ext4_set_inode_state(&iter->vfs_inode,
> +				     EXT4_STATE_FC_COMMITTING);
> +	}
> +	spin_unlock(&sbi->s_fc_lock);
> +	jbd2_journal_unlock_updates(journal);
> +
>  	ret = ext4_fc_submit_inode_data_all(journal);
>  	if (ret)
>  		return ret;
> @@ -1143,6 +1144,18 @@ static int ext4_fc_perform_commit(journal_t *journal)
>  		ret = ext4_fc_write_inode(inode, &crc);
>  		if (ret)
>  			goto out;
> +		ext4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);
> +		/*
> +		 * Make sure clearing of EXT4_STATE_FC_COMMITTING is
> +		 * visible before we send the wakeup. Pairs with implicit
> +		 * barrier in prepare_to_wait() in ext4_fc_track_inode().
> +		 */
> +		smp_mb();
> +#if (BITS_PER_LONG < 64)
> +		wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
> +#else
> +		wake_up_bit(&iter->i_flags, EXT4_STATE_FC_COMMITTING);
> +#endif
>  		spin_lock(&sbi->s_fc_lock);
>  	}
>  	spin_unlock(&sbi->s_fc_lock);
> @@ -1276,13 +1289,17 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>  	spin_lock(&sbi->s_fc_lock);
>  	list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
>  				 i_fc_list) {
> -		list_del_init(&iter->i_fc_list);
>  		ext4_clear_inode_state(&iter->vfs_inode,
>  				       EXT4_STATE_FC_COMMITTING);
>  		if (iter->i_sync_tid <= tid)
>  			ext4_fc_reset_inode(&iter->vfs_inode);
> -		/* Make sure EXT4_STATE_FC_COMMITTING bit is clear */
> +		/*
> +		 * Make sure clearing of EXT4_STATE_FC_COMMITTING is
> +		 * visible before we send the wakeup. Pairs with implicit
> +		 * barrier in prepare_to_wait() in ext4_fc_track_inode().
> +		 */
>  		smp_mb();
> +		list_del_init(&iter->i_fc_list);
>  #if (BITS_PER_LONG < 64)
>  		wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
>  #else
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index c2cf74b01ddb..06b885628b1c 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -757,7 +757,6 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid)
>  	}
>  	journal->j_flags |= JBD2_FAST_COMMIT_ONGOING;
>  	write_unlock(&journal->j_state_lock);
> -	jbd2_journal_lock_updates(journal);
>  
>  	return 0;
>  }
> @@ -769,7 +768,6 @@ EXPORT_SYMBOL(jbd2_fc_begin_commit);
>   */
>  static int __jbd2_fc_end_commit(journal_t *journal, tid_t tid, bool fallback)
>  {
> -	jbd2_journal_unlock_updates(journal);
>  	if (journal->j_fc_cleanup_callback)
>  		journal->j_fc_cleanup_callback(journal, 0, tid);
>  	write_lock(&journal->j_state_lock);
> -- 
> 2.35.1.616.g0bdcbb4464-goog
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 4/5] ext4: drop i_fc_updates from inode fc info
  2022-03-08 16:33 ` [PATCH v2 4/5] ext4: drop i_fc_updates from inode fc info Harshad Shirwadkar
@ 2022-03-09 10:17   ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2022-03-09 10:17 UTC (permalink / raw)
  To: Harshad Shirwadkar; +Cc: linux-ext4, riteshh, jack, tytso

On Tue 08-03-22 08:33:18, Harshad Shirwadkar wrote:
> From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> 
> The new logic introduced in this series does not require tracking number
> of active handles open on an inode. So, drop it.
> 
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/ext4.h        |  5 ----
>  fs/ext4/fast_commit.c | 70 -------------------------------------------
>  2 files changed, 75 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index fb6d65f1176f..6861a3127a42 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1059,9 +1059,6 @@ struct ext4_inode_info {
>  	/* End of lblk range that needs to be committed in this fast commit */
>  	ext4_lblk_t i_fc_lblk_len;
>  
> -	/* Number of ongoing updates on this inode */
> -	atomic_t  i_fc_updates;
> -
>  	/* Fast commit wait queue for this inode */
>  	wait_queue_head_t i_fc_wait;
>  
> @@ -2930,8 +2927,6 @@ void __ext4_fc_track_create(handle_t *handle, struct inode *inode,
>  void ext4_fc_track_create(handle_t *handle, struct dentry *dentry);
>  void ext4_fc_track_inode(handle_t *handle, struct inode *inode);
>  void ext4_fc_mark_ineligible(struct super_block *sb, int reason, handle_t *handle);
> -void ext4_fc_start_update(struct inode *inode);
> -void ext4_fc_stop_update(struct inode *inode);
>  void ext4_fc_del(struct inode *inode);
>  bool ext4_fc_replay_check_excluded(struct super_block *sb, ext4_fsblk_t block);
>  void ext4_fc_replay_cleanup(struct super_block *sb);
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index eedcf8b4d47b..eea19e3ea9ba 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -201,76 +201,6 @@ void ext4_fc_init_inode(struct inode *inode)
>  	INIT_LIST_HEAD(&ei->i_fc_list);
>  	INIT_LIST_HEAD(&ei->i_fc_dilist);
>  	init_waitqueue_head(&ei->i_fc_wait);
> -	atomic_set(&ei->i_fc_updates, 0);
> -}
> -
> -/* This function must be called with sbi->s_fc_lock held. */
> -static void ext4_fc_wait_committing_inode(struct inode *inode)
> -__releases(&EXT4_SB(inode->i_sb)->s_fc_lock)
> -{
> -	wait_queue_head_t *wq;
> -	struct ext4_inode_info *ei = EXT4_I(inode);
> -
> -#if (BITS_PER_LONG < 64)
> -	DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> -			EXT4_STATE_FC_COMMITTING);
> -	wq = bit_waitqueue(&ei->i_state_flags,
> -				EXT4_STATE_FC_COMMITTING);
> -#else
> -	DEFINE_WAIT_BIT(wait, &ei->i_flags,
> -			EXT4_STATE_FC_COMMITTING);
> -	wq = bit_waitqueue(&ei->i_flags,
> -				EXT4_STATE_FC_COMMITTING);
> -#endif
> -	lockdep_assert_held(&EXT4_SB(inode->i_sb)->s_fc_lock);
> -	prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> -	spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> -	schedule();
> -	finish_wait(wq, &wait.wq_entry);
> -}
> -
> -/*
> - * Inform Ext4's fast about start of an inode update
> - *
> - * This function is called by the high level call VFS callbacks before
> - * performing any inode update. This function blocks if there's an ongoing
> - * fast commit on the inode in question.
> - */
> -void ext4_fc_start_update(struct inode *inode)
> -{
> -	struct ext4_inode_info *ei = EXT4_I(inode);
> -
> -	if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
> -	    (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
> -		return;
> -
> -restart:
> -	spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> -	if (list_empty(&ei->i_fc_list))
> -		goto out;
> -
> -	if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> -		ext4_fc_wait_committing_inode(inode);
> -		goto restart;
> -	}
> -out:
> -	atomic_inc(&ei->i_fc_updates);
> -	spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> -}
> -
> -/*
> - * Stop inode update and wake up waiting fast commits if any.
> - */
> -void ext4_fc_stop_update(struct inode *inode)
> -{
> -	struct ext4_inode_info *ei = EXT4_I(inode);
> -
> -	if (!test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT) ||
> -	    (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY))
> -		return;
> -
> -	if (atomic_dec_and_test(&ei->i_fc_updates))
> -		wake_up_all(&ei->i_fc_wait);
>  }
>  
>  /*
> -- 
> 2.35.1.616.g0bdcbb4464-goog
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 5/5] ext4: update code documentation
  2022-03-08 16:33 ` [PATCH v2 5/5] ext4: update code documentation Harshad Shirwadkar
@ 2022-03-09 10:19   ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2022-03-09 10:19 UTC (permalink / raw)
  To: Harshad Shirwadkar; +Cc: linux-ext4, riteshh, jack, tytso

On Tue 08-03-22 08:33:19, Harshad Shirwadkar wrote:
> From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> 
> This patch updates code documentation to reflect the commit path changes
> made in this series.
> 
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/fast_commit.c | 36 ++++++++++++++++++++++--------------
>  1 file changed, 22 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index eea19e3ea9ba..c14e6d34d552 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -49,14 +49,21 @@
>   * that need to be committed during a fast commit in another in memory queue of
>   * inodes. During the commit operation, we commit in the following order:
>   *
> - * [1] Lock inodes for any further data updates by setting COMMITTING state
> - * [2] Submit data buffers of all the inodes
> - * [3] Wait for [2] to complete
> - * [4] Commit all the directory entry updates in the fast commit space
> - * [5] Commit all the changed inode structures
> - * [6] Write tail tag (this tag ensures the atomicity, please read the following
> + * [1] Lock the journal by calling jbd2_journal_lock_updates. This ensures that
> + *     all the exsiting handles finish and no new handles can start.
> + * [2] Mark all the fast commit eligible inodes as undergoing fast commit
> + *     by setting "EXT4_STATE_FC_COMMITTING" state.
> + * [3] Unlock the journal by calling jbd2_journal_unlock_updates. This allows
> + *     starting of new handles. If new handles try to start an update on
> + *     any of the inodes that are being committed, ext4_fc_track_inode()
> + *     will block until those inodes have finished the fast commit.
> + * [4] Submit data buffers of all the committing inodes.
> + * [5] Wait for [4] to complete.
> + * [6] Commit all the directory entry updates in the fast commit space.
> + * [7] Commit all the changed inodes in the fast commit space and clear
> + *     "EXT4_STATE_FC_COMMITTING" for these inodes.
> + * [8] Write tail tag (this tag ensures the atomicity, please read the following
>   *     section for more details).
> - * [7] Wait for [4], [5] and [6] to complete.
>   *
>   * All the inode updates must call ext4_fc_start_update() before starting an
>   * update. If such an ongoing update is present, fast commit waits for it to
> @@ -142,6 +149,13 @@
>   * similarly. Thus, by converting a non-idempotent procedure into a series of
>   * idempotent outcomes, fast commits ensured idempotence during the replay.
>   *
> + * Locking
> + * -------
> + * sbi->s_fc_lock protects the fast commit inodes queue and the fast commit
> + * dentry queue. ei->i_fc_lock protects the fast commit related info in a given
> + * inode. Most of the code avoids acquiring both the locks, but if one must do
> + * that then sbi->s_fc_lock must be acquired before ei->i_fc_lock.
> + *
>   * TODOs
>   * -----
>   *
> @@ -156,13 +170,7 @@
>   *    fast commit recovery even if that area is invalidated by later full
>   *    commits.
>   *
> - * 1) Fast commit's commit path locks the entire file system during fast
> - *    commit. This has significant performance penalty. Instead of that, we
> - *    should use ext4_fc_start/stop_update functions to start inode level
> - *    updates from ext4_journal_start/stop. Once we do that we can drop file
> - *    system locking during commit path.
> - *
> - * 2) Handle more ineligible cases.
> + * 1) Handle more ineligible cases.
>   */
>  
>  #include <trace/events/ext4.h>
> -- 
> 2.35.1.616.g0bdcbb4464-goog
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock
  2022-03-09 10:10   ` Jan Kara
@ 2022-03-11  4:02     ` harshad shirwadkar
  0 siblings, 0 replies; 16+ messages in thread
From: harshad shirwadkar @ 2022-03-11  4:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, Ritesh Harjani, Theodore Y. Ts'o

On Wed, 9 Mar 2022 at 02:10, Jan Kara <jack@suse.cz> wrote:
>
> On Tue 08-03-22 08:33:15, Harshad Shirwadkar wrote:
> > From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> >
> > Convert ext4_inode_info->i_fc_lock to spinlock to avoid sleeping
> > in invalid contexts.
> >
> > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
>
> One comment below...
>
> > @@ -972,9 +970,13 @@ static int ext4_fc_wait_inode_data_all(journal_t *journal)
> >
> >       spin_lock(&sbi->s_fc_lock);
> >       list_for_each_entry_safe(pos, n, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> > +             spin_lock(&pos->i_fc_lock);
> >               if (!ext4_test_inode_state(&pos->vfs_inode,
> > -                                        EXT4_STATE_FC_COMMITTING))
> > +                                        EXT4_STATE_FC_COMMITTING)) {
> > +                     spin_unlock(&pos->i_fc_lock);
> >                       continue;
> > +             }
> > +             spin_unlock(&pos->i_fc_lock);
> >               spin_unlock(&sbi->s_fc_lock);
>
> Why do you add a lock here in a pure lock-conversion patch? Furthermore I
> don't think the lock is needed...
Oops sorry, this was an unintentional leftover from the first version,
I'll remove it in the next one, thanks!

- Harshad
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait
  2022-03-09 10:14   ` Jan Kara
@ 2022-03-11  4:17     ` harshad shirwadkar
  2022-03-11  8:25       ` harshad shirwadkar
  0 siblings, 1 reply; 16+ messages in thread
From: harshad shirwadkar @ 2022-03-11  4:17 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, Ritesh Harjani, Theodore Y. Ts'o

Thanks for the reviews Jan! I'll update inline.c as you mentioned in
the next version.

- Harshad

On Wed, 9 Mar 2022 at 02:14, Jan Kara <jack@suse.cz> wrote:
>
> On Tue 08-03-22 08:33:16, Harshad Shirwadkar wrote:
> > From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> >
> > If the inode that's being requested to track using ext4_fc_track_inode
> > is being committed, then wait until the inode finishes the commit.
> >
> > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
>
> One comment below...
>
> > --- a/fs/ext4/ext4_jbd2.c
> > +++ b/fs/ext4/ext4_jbd2.c
> > @@ -106,6 +106,18 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
> >                                  GFP_NOFS, type, line);
> >  }
> >
> > +handle_t *__ext4_journal_start(struct inode *inode, unsigned int line,
> > +                               int type, int blocks, int rsv_blocks,
> > +                               int revoke_creds)
> > +{
> > +     handle_t *handle = __ext4_journal_start_sb(inode->i_sb, line,
> > +                                                type, blocks, rsv_blocks,
> > +                                                revoke_creds);
> > +     if (ext4_handle_valid(handle) && !IS_ERR(handle))
> > +             ext4_fc_track_inode(handle, inode);
> > +     return handle;
> > +}
> > +
>
> Please fix fs/ext4/inline.c rather than papering over the problem like
> this. Because it is just a landmine waiting to explode in some strange
> cornercase when someone does not call ext4_journal_start() but other handle
> starting function.
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait
  2022-03-11  4:17     ` harshad shirwadkar
@ 2022-03-11  8:25       ` harshad shirwadkar
  2022-03-21 14:06         ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: harshad shirwadkar @ 2022-03-11  8:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, Ritesh Harjani, Theodore Y. Ts'o

Hmm, after removing ext4_fc_track_inode() from ext4_journal_start(), I
see one deadlock - there are some places in code where
ext4_mark_inode_dirty gets called while holding i_data_sem. Commit
path requires i_data_sem to commit inode data (via ext4_map_blocks).
So, if an inode is being committed, ext4_mark_inode_dirty may start
waiting for the inode to commit while holding i_data_sem and the
commit path may wait on i_data_sem. The right way to fix this is to
call ext4_fc_track_inode in such places before acquiring i_data_sem in
write mode. But that would mean we sprinkle code with more
ext4_fc_track_inode() calls which is something that I preferably
wanted to avoid.

This makes me wonder though, for fast commits, is it a terrible idea
to extend the meaning of ext4_journal_start() from "start a new
handle" to "start a new handle with an intention to modify the passed
inode"? Most of the handles modify only one inode, and for other
places where we do modify multiple inodes, ext4_reserve_inode_write()
would ensure that those inodes are tracked as well. All of the
existing places where inode gets modified after grabbing i_data_sem,
i_data_sem is grabbed only after starting the handle. This would take
care of the deadlock mentioned above and similar deadlocks. Another
advantage with doing this is that developers wouldn't need to worry
about adding explicit ext4_fc_track_inode() calls for future changes.

If we decide to do this, we would need to do a thorough code review to
ensure that the above rule is followed everywhere. But note that
ext4_fc_track_inode() is idempotent, so it doesn't matter if this
function gets called multiple times in the same handle. So to avoid
breaking fast commits, we can be super careful and in the first
version, we can have ext4_fc_track_range() calls in
ext4_reserve_inode_dirty(), ext4_journal_start(), inline.c and in
handles where i_data_sem gets acquired in write mode. We can then
carefully evaluate each code path and remove redundant
ext4_fc_track_range() calls.

What do you think?

- Harshad

On Thu, 10 Mar 2022 at 20:17, harshad shirwadkar
<harshadshirwadkar@gmail.com> wrote:
>
> Thanks for the reviews Jan! I'll update inline.c as you mentioned in
> the next version.
>
> - Harshad
>
> On Wed, 9 Mar 2022 at 02:14, Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 08-03-22 08:33:16, Harshad Shirwadkar wrote:
> > > From: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> > >
> > > If the inode that's being requested to track using ext4_fc_track_inode
> > > is being committed, then wait until the inode finishes the commit.
> > >
> > > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> >
> > One comment below...
> >
> > > --- a/fs/ext4/ext4_jbd2.c
> > > +++ b/fs/ext4/ext4_jbd2.c
> > > @@ -106,6 +106,18 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
> > >                                  GFP_NOFS, type, line);
> > >  }
> > >
> > > +handle_t *__ext4_journal_start(struct inode *inode, unsigned int line,
> > > +                               int type, int blocks, int rsv_blocks,
> > > +                               int revoke_creds)
> > > +{
> > > +     handle_t *handle = __ext4_journal_start_sb(inode->i_sb, line,
> > > +                                                type, blocks, rsv_blocks,
> > > +                                                revoke_creds);
> > > +     if (ext4_handle_valid(handle) && !IS_ERR(handle))
> > > +             ext4_fc_track_inode(handle, inode);
> > > +     return handle;
> > > +}
> > > +
> >
> > Please fix fs/ext4/inline.c rather than papering over the problem like
> > this. Because it is just a landmine waiting to explode in some strange
> > cornercase when someone does not call ext4_journal_start() but other handle
> > starting function.
> >
> >                                                                 Honza
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait
  2022-03-11  8:25       ` harshad shirwadkar
@ 2022-03-21 14:06         ` Jan Kara
  2022-04-18 21:02           ` harshad shirwadkar
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2022-03-21 14:06 UTC (permalink / raw)
  To: harshad shirwadkar
  Cc: Jan Kara, Ext4 Developers List, Ritesh Harjani, Theodore Y. Ts'o

Sorry for delayed reply, I've got dragged into other things and somewhat
forgot about this...

On Fri 11-03-22 00:25:48, harshad shirwadkar wrote:
> Hmm, after removing ext4_fc_track_inode() from ext4_journal_start(), I
> see one deadlock - there are some places in code where
> ext4_mark_inode_dirty gets called while holding i_data_sem. Commit
> path requires i_data_sem to commit inode data (via ext4_map_blocks).
> So, if an inode is being committed, ext4_mark_inode_dirty may start
> waiting for the inode to commit while holding i_data_sem and the
> commit path may wait on i_data_sem.

Indeed, that is a problem.

> The right way to fix this is to
> call ext4_fc_track_inode in such places before acquiring i_data_sem in
> write mode. But that would mean we sprinkle code with more
> ext4_fc_track_inode() calls which is something that I preferably
> wanted to avoid.

So I agree calling ext4_fc_track_inode() from ext4_reserve_inode_write()
isn't going to fly as is. We need to find a better way of doing things.

> This makes me wonder though, for fast commits, is it a terrible idea
> to extend the meaning of ext4_journal_start() from "start a new
> handle" to "start a new handle with an intention to modify the passed
> inode"? Most of the handles modify only one inode, and for other
> places where we do modify multiple inodes, ext4_reserve_inode_write()
> would ensure that those inodes are tracked as well.

Well but to avoid deadlocks like you've described above you would have to
start tracking inode with explicit ext4_fc_track_inode() calls before
grabbing i_data_sem. ext4_reserve_inode_write() would be only useful for an
assertion check that indeed the inode is already tracked.

> All of the
> existing places where inode gets modified after grabbing i_data_sem,
> i_data_sem is grabbed only after starting the handle. This would take
> care of the deadlock mentioned above and similar deadlocks. Another
> advantage with doing this is that developers wouldn't need to worry
> about adding explicit ext4_fc_track_inode() calls for future changes.

Well, at least for the simple case where only one inode is modified. But I
agree that's a majority.

> If we decide to do this, we would need to do a thorough code review to
> ensure that the above rule is followed everywhere. But note that
> ext4_fc_track_inode() is idempotent, so it doesn't matter if this
> function gets called multiple times in the same handle. So to avoid
> breaking fast commits, we can be super careful and in the first
> version, we can have ext4_fc_track_range() calls in
> ext4_reserve_inode_dirty(), ext4_journal_start(), inline.c and in
> handles where i_data_sem gets acquired in write mode. We can then
> carefully evaluate each code path and remove redundant
> ext4_fc_track_range() calls.

As I wrote above, if we go this path, I'd be for ext4_fc_track_inode()
calls in ext4_journal_start() and then adding explicit calls to
ext4_fc_track_inode() where additionally needed and have only assertion
checks in ext4_reserve_inode_dirty() and other places which modify inode
metadata, to catch places which need explicit calls to
ext4_fc_track_inode(). That way we won't have any hidden deadlocks in the
code waiting to happen.

I have also another proposal for consideration how we could handle this.
Mostly branstorming now: We could also drop the need for fastcommit code to
acquire i_data_sem during commit. We could use just information in extent
status tree to provide block mapping for the fastcommit code (that does not
need i_data_sem). The only thing is we'd need to make sure modified extents
from the status tree are not evicted from memory before the appropriate
transaction commits so that they are available for appropriate fastcommit -
for that we'd probably need to add TID of the transaction that last
modified an extent and add check into the shrinker to avoid shrinking
uncommitted extents. As a bonus, we could now add to fastcommit only
extents with appropriate TID and thus save on extent logging for sparsely
modified inode.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait
  2022-03-21 14:06         ` Jan Kara
@ 2022-04-18 21:02           ` harshad shirwadkar
  0 siblings, 0 replies; 16+ messages in thread
From: harshad shirwadkar @ 2022-04-18 21:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List, Ritesh Harjani, Theodore Y. Ts'o

Sorry for the late reply, I was on vacation and getting to this just now.

On Mon, 21 Mar 2022 at 07:06, Jan Kara <jack@suse.cz> wrote:
>
> Sorry for delayed reply, I've got dragged into other things and somewhat
> forgot about this...
>
> On Fri 11-03-22 00:25:48, harshad shirwadkar wrote:
> > Hmm, after removing ext4_fc_track_inode() from ext4_journal_start(), I
> > see one deadlock - there are some places in code where
> > ext4_mark_inode_dirty gets called while holding i_data_sem. Commit
> > path requires i_data_sem to commit inode data (via ext4_map_blocks).
> > So, if an inode is being committed, ext4_mark_inode_dirty may start
> > waiting for the inode to commit while holding i_data_sem and the
> > commit path may wait on i_data_sem.
>
> Indeed, that is a problem.
>
> > The right way to fix this is to
> > call ext4_fc_track_inode in such places before acquiring i_data_sem in
> > write mode. But that would mean we sprinkle code with more
> > ext4_fc_track_inode() calls which is something that I preferably
> > wanted to avoid.
>
> So I agree calling ext4_fc_track_inode() from ext4_reserve_inode_write()
> isn't going to fly as is. We need to find a better way of doing things.
>
> > This makes me wonder though, for fast commits, is it a terrible idea
> > to extend the meaning of ext4_journal_start() from "start a new
> > handle" to "start a new handle with an intention to modify the passed
> > inode"? Most of the handles modify only one inode, and for other
> > places where we do modify multiple inodes, ext4_reserve_inode_write()
> > would ensure that those inodes are tracked as well.
>
> Well but to avoid deadlocks like you've described above you would have to
> start tracking inode with explicit ext4_fc_track_inode() calls before
> grabbing i_data_sem. ext4_reserve_inode_write() would be only useful for an
> assertion check that indeed the inode is already tracked.
>
> > All of the
> > existing places where inode gets modified after grabbing i_data_sem,
> > i_data_sem is grabbed only after starting the handle. This would take
> > care of the deadlock mentioned above and similar deadlocks. Another
> > advantage with doing this is that developers wouldn't need to worry
> > about adding explicit ext4_fc_track_inode() calls for future changes.
>
> Well, at least for the simple case where only one inode is modified. But I
> agree that's a majority.
>
> > If we decide to do this, we would need to do a thorough code review to
> > ensure that the above rule is followed everywhere. But note that
> > ext4_fc_track_inode() is idempotent, so it doesn't matter if this
> > function gets called multiple times in the same handle. So to avoid
> > breaking fast commits, we can be super careful and in the first
> > version, we can have ext4_fc_track_range() calls in
> > ext4_reserve_inode_dirty(), ext4_journal_start(), inline.c and in
> > handles where i_data_sem gets acquired in write mode. We can then
> > carefully evaluate each code path and remove redundant
> > ext4_fc_track_range() calls.
>
> As I wrote above, if we go this path, I'd be for ext4_fc_track_inode()
> calls in ext4_journal_start() and then adding explicit calls to
> ext4_fc_track_inode() where additionally needed and have only assertion
> checks in ext4_reserve_inode_dirty() and other places which modify inode
> metadata, to catch places which need explicit calls to
> ext4_fc_track_inode(). That way we won't have any hidden deadlocks in the
> code waiting to happen.
Okay sounds good, so based on your response, I'll rework the patch
series to make following changes:
1) Add ext4_fc_track_inode() calls in ext4_journal_start()
2) Add ext4_fc_track_inode calls in places where more than one inode
are changed in a handle and / or i_data_sem is being acquired.
3) Add an assertion in ext4_reserve_inode_dirty() to check if the
inode on which write is being reserved is already being tracked.

Does that sound good?
>
> I have also another proposal for consideration how we could handle this.
> Mostly branstorming now: We could also drop the need for fastcommit code to
> acquire i_data_sem during commit. We could use just information in extent
> status tree to provide block mapping for the fastcommit code (that does not
> need i_data_sem). The only thing is we'd need to make sure modified extents
> from the status tree are not evicted from memory before the appropriate
> transaction commits so that they are available for appropriate fastcommit -
> for that we'd probably need to add TID of the transaction that last
> modified an extent and add check into the shrinker to avoid shrinking
> uncommitted extents. As a bonus, we could now add to fastcommit only
> extents with appropriate TID and thus save on extent logging for sparsely
> modified inode.
Yeah, I like this idea. I'll add that as a TODO in the code just so
that we don't lose it.

Thanks,
Harshad
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-04-18 21:03 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-08 16:33 [PATCH v2 0/5] ext4: improve commit path performance for fast commit Harshad Shirwadkar
2022-03-08 16:33 ` [PATCH v2 1/5] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
2022-03-09 10:10   ` Jan Kara
2022-03-11  4:02     ` harshad shirwadkar
2022-03-08 16:33 ` [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait Harshad Shirwadkar
2022-03-09 10:14   ` Jan Kara
2022-03-11  4:17     ` harshad shirwadkar
2022-03-11  8:25       ` harshad shirwadkar
2022-03-21 14:06         ` Jan Kara
2022-04-18 21:02           ` harshad shirwadkar
2022-03-08 16:33 ` [PATCH v2 3/5] ext4: rework fast commit commit path Harshad Shirwadkar
2022-03-09 10:17   ` Jan Kara
2022-03-08 16:33 ` [PATCH v2 4/5] ext4: drop i_fc_updates from inode fc info Harshad Shirwadkar
2022-03-09 10:17   ` Jan Kara
2022-03-08 16:33 ` [PATCH v2 5/5] ext4: update code documentation Harshad Shirwadkar
2022-03-09 10:19   ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.