linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups
@ 2022-01-13  3:26 Ritesh Harjani
  2022-01-13  3:26 ` [PATCH 1/6] ext4: Fix error handling in ext4_restore_inline_data() Ritesh Harjani
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13  3:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, Jan Kara, Andreas Dilger, tytso,
	Eric Whitney, Ritesh Harjani

Hellos,

Patch[1]: fixes BUG_ON with inline_data which was reported [1] with generic/475.

Patch[2]: is mostly cleanup found during code review of inline_data code.

Patch[3]: is a possible memory corruption fix in case of krealloc failure.

Patch[4-5]: Cleanups.

Patch[6]: Needs careful review. As it gets rid of t_handle_lock spinlock
in jbd2_journal_wait_updates(). From the code review I found it to be not
required. But let me know if I missed anything here.

[1]: https://lore.kernel.org/linux-ext4/20210527192418.GA2633@localhost.localdomain/

Ritesh Harjani (6):
  ext4: Fix error handling in ext4_restore_inline_data()
  ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin()
  ext4: Fix error handling in ext4_fc_record_modified_inode()
  jbd2: Cleanup unused functions declarations from jbd2.h
  jbd2: Refactor wait logic for transaction updates into a common function
  jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates

 fs/ext4/fast_commit.c | 64 ++++++++++++++++++++-----------------------
 fs/ext4/inline.c      | 23 +++++++++-------
 fs/jbd2/commit.c      | 19 ++-----------
 fs/jbd2/transaction.c | 24 ++--------------
 include/linux/jbd2.h  | 34 +++++++++++++++++------
 5 files changed, 74 insertions(+), 90 deletions(-)

--
2.31.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] ext4: Fix error handling in ext4_restore_inline_data()
  2022-01-13  3:26 [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups Ritesh Harjani
@ 2022-01-13  3:26 ` Ritesh Harjani
  2022-01-13 10:58   ` Jan Kara
  2022-01-13  3:26 ` [PATCH 2/6] ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin() Ritesh Harjani
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13  3:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, Jan Kara, Andreas Dilger, tytso,
	Eric Whitney, Ritesh Harjani

While running "./check -I 200 generic/475" it sometimes gives below
kernel BUG(). Ideally we should not call ext4_write_inline_data() if
ext4_create_inline_data() has failed.

<log snip>
[73131.453234] kernel BUG at fs/ext4/inline.c:223!

<code snip>
 212 static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
 213                                    void *buffer, loff_t pos, unsigned int len)
 214 {
<...>
 223         BUG_ON(!EXT4_I(inode)->i_inline_off);
 224         BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);

This patch handles the error and prints out a emergency msg saying potential
data loss for the given inode (since we couldn't restore the original
inline_data due to some previous error).

[ 9571.070313] EXT4-fs (dm-0): error restoring inline_data for inode -- potential data loss! (inode 1703982, error -30)

Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/inline.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 534c0329e110..31741e8a462e 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1135,7 +1135,15 @@ static void ext4_restore_inline_data(handle_t *handle, struct inode *inode,
 				     struct ext4_iloc *iloc,
 				     void *buf, int inline_size)
 {
-	ext4_create_inline_data(handle, inode, inline_size);
+	int ret;
+
+	ret = ext4_create_inline_data(handle, inode, inline_size);
+	if (ret) {
+		ext4_msg(inode->i_sb, KERN_EMERG,
+			"error restoring inline_data for inode -- potential data loss! (inode %lu, error %d)",
+			inode->i_ino, ret);
+		return;
+	}
 	ext4_write_inline_data(inode, iloc, buf, 0, inline_size);
 	ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/6] ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin()
  2022-01-13  3:26 [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups Ritesh Harjani
  2022-01-13  3:26 ` [PATCH 1/6] ext4: Fix error handling in ext4_restore_inline_data() Ritesh Harjani
@ 2022-01-13  3:26 ` Ritesh Harjani
  2022-01-13 10:58   ` Jan Kara
  2022-01-13  3:26 ` [PATCH 3/6] ext4: Fix error handling in ext4_fc_record_modified_inode() Ritesh Harjani
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13  3:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, Jan Kara, Andreas Dilger, tytso,
	Eric Whitney, Ritesh Harjani

ext4_prepare_inline_data() already checks for ext4_get_max_inline_size()
and returns -ENOSPC. So there is no need to check it twice within
ext4_da_write_inline_data_begin(). This patch removes the extra check.

It also makes it more clean.

No functionality change in this patch.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/inline.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 31741e8a462e..c52b0037983d 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -913,7 +913,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
 				    struct page **pagep,
 				    void **fsdata)
 {
-	int ret, inline_size;
+	int ret;
 	handle_t *handle;
 	struct page *page;
 	struct ext4_iloc iloc;
@@ -930,14 +930,9 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
 		goto out;
 	}
 
-	inline_size = ext4_get_max_inline_size(inode);
-
-	ret = -ENOSPC;
-	if (inline_size >= pos + len) {
-		ret = ext4_prepare_inline_data(handle, inode, pos + len);
-		if (ret && ret != -ENOSPC)
-			goto out_journal;
-	}
+	ret = ext4_prepare_inline_data(handle, inode, pos + len);
+	if (ret && ret != -ENOSPC)
+		goto out_journal;
 
 	/*
 	 * We cannot recurse into the filesystem as the transaction
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/6] ext4: Fix error handling in ext4_fc_record_modified_inode()
  2022-01-13  3:26 [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups Ritesh Harjani
  2022-01-13  3:26 ` [PATCH 1/6] ext4: Fix error handling in ext4_restore_inline_data() Ritesh Harjani
  2022-01-13  3:26 ` [PATCH 2/6] ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin() Ritesh Harjani
@ 2022-01-13  3:26 ` Ritesh Harjani
  2022-01-13 11:00   ` Jan Kara
  2022-01-13  3:26 ` [PATCH 4/6] jbd2: Cleanup unused functions declarations from jbd2.h Ritesh Harjani
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13  3:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, Jan Kara, Andreas Dilger, tytso,
	Eric Whitney, Ritesh Harjani, luo penghao, Lukas Czerner

Current code does not fully takes care of krealloc() error case,
which could lead to silent memory corruption or a kernel bug.
This patch fixes that.

Also it cleans up some duplicated error handling logic from various functions
in fast_commit.c file.

Reported-by: luo penghao <luo.penghao@zte.com.cn>
Suggested-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/fast_commit.c | 64 ++++++++++++++++++++-----------------------
 1 file changed, 29 insertions(+), 35 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 5ae8026a0c56..4541c8468c01 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1392,14 +1392,15 @@ static int ext4_fc_record_modified_inode(struct super_block *sb, int ino)
 		if (state->fc_modified_inodes[i] == ino)
 			return 0;
 	if (state->fc_modified_inodes_used == state->fc_modified_inodes_size) {
-		state->fc_modified_inodes_size +=
-			EXT4_FC_REPLAY_REALLOC_INCREMENT;
 		state->fc_modified_inodes = krealloc(
-					state->fc_modified_inodes, sizeof(int) *
-					state->fc_modified_inodes_size,
-					GFP_KERNEL);
+				state->fc_modified_inodes,
+				sizeof(int) * (state->fc_modified_inodes_size +
+				EXT4_FC_REPLAY_REALLOC_INCREMENT),
+				GFP_KERNEL);
 		if (!state->fc_modified_inodes)
 			return -ENOMEM;
+		state->fc_modified_inodes_size +=
+			EXT4_FC_REPLAY_REALLOC_INCREMENT;
 	}
 	state->fc_modified_inodes[state->fc_modified_inodes_used++] = ino;
 	return 0;
@@ -1431,7 +1432,9 @@ static int ext4_fc_replay_inode(struct super_block *sb, struct ext4_fc_tl *tl,
 	}
 	inode = NULL;
 
-	ext4_fc_record_modified_inode(sb, ino);
+	ret = ext4_fc_record_modified_inode(sb, ino);
+	if (ret)
+		goto out;
 
 	raw_fc_inode = (struct ext4_inode *)
 		(val + offsetof(struct ext4_fc_inode, fc_raw_inode));
@@ -1621,6 +1624,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
 	}
 
 	ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
+	if (ret)
+		goto out;
 
 	start = le32_to_cpu(ex->ee_block);
 	start_pblk = ext4_ext_pblock(ex);
@@ -1638,18 +1643,14 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
 		map.m_pblk = 0;
 		ret = ext4_map_blocks(NULL, inode, &map, 0);
 
-		if (ret < 0) {
-			iput(inode);
-			return 0;
-		}
+		if (ret < 0)
+			goto out;
 
 		if (ret == 0) {
 			/* Range is not mapped */
 			path = ext4_find_extent(inode, cur, NULL, 0);
-			if (IS_ERR(path)) {
-				iput(inode);
-				return 0;
-			}
+			if (IS_ERR(path))
+				goto out;
 			memset(&newex, 0, sizeof(newex));
 			newex.ee_block = cpu_to_le32(cur);
 			ext4_ext_store_pblock(
@@ -1663,10 +1664,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
 			up_write((&EXT4_I(inode)->i_data_sem));
 			ext4_ext_drop_refs(path);
 			kfree(path);
-			if (ret) {
-				iput(inode);
-				return 0;
-			}
+			if (ret)
+				goto out;
 			goto next;
 		}
 
@@ -1679,10 +1678,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
 			ret = ext4_ext_replay_update_ex(inode, cur, map.m_len,
 					ext4_ext_is_unwritten(ex),
 					start_pblk + cur - start);
-			if (ret) {
-				iput(inode);
-				return 0;
-			}
+			if (ret)
+				goto out;
 			/*
 			 * Mark the old blocks as free since they aren't used
 			 * anymore. We maintain an array of all the modified
@@ -1702,10 +1699,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
 			ext4_ext_is_unwritten(ex), map.m_pblk);
 		ret = ext4_ext_replay_update_ex(inode, cur, map.m_len,
 					ext4_ext_is_unwritten(ex), map.m_pblk);
-		if (ret) {
-			iput(inode);
-			return 0;
-		}
+		if (ret)
+			goto out;
 		/*
 		 * We may have split the extent tree while toggling the state.
 		 * Try to shrink the extent tree now.
@@ -1717,6 +1712,7 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
 	}
 	ext4_ext_replay_shrink_inode(inode, i_size_read(inode) >>
 					sb->s_blocksize_bits);
+out:
 	iput(inode);
 	return 0;
 }
@@ -1746,6 +1742,8 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
 	}
 
 	ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
+	if (ret)
+		goto out;
 
 	jbd_debug(1, "DEL_RANGE, inode %ld, lblk %d, len %d\n",
 			inode->i_ino, le32_to_cpu(lrange.fc_lblk),
@@ -1755,10 +1753,8 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
 		map.m_len = remaining;
 
 		ret = ext4_map_blocks(NULL, inode, &map, 0);
-		if (ret < 0) {
-			iput(inode);
-			return 0;
-		}
+		if (ret < 0)
+			goto out;
 		if (ret > 0) {
 			remaining -= ret;
 			cur += ret;
@@ -1773,15 +1769,13 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
 	ret = ext4_ext_remove_space(inode, lrange.fc_lblk,
 				lrange.fc_lblk + lrange.fc_len - 1);
 	up_write(&EXT4_I(inode)->i_data_sem);
-	if (ret) {
-		iput(inode);
-		return 0;
-	}
+	if (ret)
+		goto out;
 	ext4_ext_replay_shrink_inode(inode,
 		i_size_read(inode) >> sb->s_blocksize_bits);
 	ext4_mark_inode_dirty(NULL, inode);
+out:
 	iput(inode);
-
 	return 0;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/6] jbd2: Cleanup unused functions declarations from jbd2.h
  2022-01-13  3:26 [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups Ritesh Harjani
                   ` (2 preceding siblings ...)
  2022-01-13  3:26 ` [PATCH 3/6] ext4: Fix error handling in ext4_fc_record_modified_inode() Ritesh Harjani
@ 2022-01-13  3:26 ` Ritesh Harjani
  2022-01-13 11:01   ` Jan Kara
  2022-01-13  3:26 ` [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function Ritesh Harjani
  2022-01-13  3:26 ` [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates Ritesh Harjani
  5 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13  3:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, Jan Kara, Andreas Dilger, tytso,
	Eric Whitney, Ritesh Harjani

During code review found no references of few of these below function
declarations. This patch cleans those up from jbd2.h

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 include/linux/jbd2.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index fd933c45281a..f76598265896 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1419,9 +1419,7 @@ extern void jbd2_journal_unfile_buffer(journal_t *, struct journal_head *);
 extern bool __jbd2_journal_refile_buffer(struct journal_head *);
 extern void jbd2_journal_refile_buffer(journal_t *, struct journal_head *);
 extern void __jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
-extern void __journal_free_buffer(struct journal_head *bh);
 extern void jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
-extern void __journal_clean_data_list(transaction_t *transaction);
 static inline void jbd2_file_log_bh(struct list_head *head, struct buffer_head *bh)
 {
 	list_add_tail(&bh->b_assoc_buffers, head);
@@ -1486,9 +1484,6 @@ extern int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
 					      struct buffer_head **bh_out,
 					      sector_t blocknr);
 
-/* Transaction locking */
-extern void		__wait_on_journal (journal_t *);
-
 /* Transaction cache support */
 extern void jbd2_journal_destroy_transaction_cache(void);
 extern int __init jbd2_journal_init_transaction_cache(void);
@@ -1774,8 +1769,6 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
 #define BJ_Reserved	4	/* Buffer is reserved for access by journal */
 #define BJ_Types	5
 
-extern int jbd_blocks_per_page(struct inode *inode);
-
 /* JBD uses a CRC32 checksum */
 #define JBD_MAX_CHECKSUM_SIZE 4
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function
  2022-01-13  3:26 [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups Ritesh Harjani
                   ` (3 preceding siblings ...)
  2022-01-13  3:26 ` [PATCH 4/6] jbd2: Cleanup unused functions declarations from jbd2.h Ritesh Harjani
@ 2022-01-13  3:26 ` Ritesh Harjani
  2022-01-13 11:30   ` Jan Kara
  2022-01-13  3:26 ` [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates Ritesh Harjani
  5 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13  3:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, Jan Kara, Andreas Dilger, tytso,
	Eric Whitney, Ritesh Harjani

No functionality change as such in this patch. This only refactors the
common piece of code which waits for t_updates to finish into a common
function named as jbd2_journal_wait_updates(journal_t *)

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/jbd2/commit.c      | 19 +++----------------
 fs/jbd2/transaction.c | 24 +++---------------------
 include/linux/jbd2.h  | 31 ++++++++++++++++++++++++++++++-
 3 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 3cc4ab2ba7f4..428364f107be 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -484,22 +484,9 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
 					      stats.run.rs_locked);
 
-	spin_lock(&commit_transaction->t_handle_lock);
-	while (atomic_read(&commit_transaction->t_updates)) {
-		DEFINE_WAIT(wait);
+	// waits for any t_updates to finish
+	jbd2_journal_wait_updates(journal);
 
-		prepare_to_wait(&journal->j_wait_updates, &wait,
-					TASK_UNINTERRUPTIBLE);
-		if (atomic_read(&commit_transaction->t_updates)) {
-			spin_unlock(&commit_transaction->t_handle_lock);
-			write_unlock(&journal->j_state_lock);
-			schedule();
-			write_lock(&journal->j_state_lock);
-			spin_lock(&commit_transaction->t_handle_lock);
-		}
-		finish_wait(&journal->j_wait_updates, &wait);
-	}
-	spin_unlock(&commit_transaction->t_handle_lock);
 	commit_transaction->t_state = T_SWITCH;
 	write_unlock(&journal->j_state_lock);
 
@@ -817,7 +804,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	commit_transaction->t_state = T_COMMIT_DFLUSH;
 	write_unlock(&journal->j_state_lock);
 
-	/* 
+	/*
 	 * If the journal is not located on the file system device,
 	 * then we must flush the file system device before we issue
 	 * the commit record
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 6a3caedd2285..89a955ab1557 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -449,7 +449,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
 	}
 
 	/* OK, account for the buffers that this operation expects to
-	 * use and add the handle to the running transaction. 
+	 * use and add the handle to the running transaction.
 	 */
 	update_t_max_wait(transaction, ts);
 	handle->h_transaction = transaction;
@@ -863,27 +863,9 @@ void jbd2_journal_lock_updates(journal_t *journal)
 		write_lock(&journal->j_state_lock);
 	}
 
-	/* Wait until there are no running updates */
-	while (1) {
-		transaction_t *transaction = journal->j_running_transaction;
+	/* Wait until there are no running t_updates */
+	jbd2_journal_wait_updates(journal);
 
-		if (!transaction)
-			break;
-
-		spin_lock(&transaction->t_handle_lock);
-		prepare_to_wait(&journal->j_wait_updates, &wait,
-				TASK_UNINTERRUPTIBLE);
-		if (!atomic_read(&transaction->t_updates)) {
-			spin_unlock(&transaction->t_handle_lock);
-			finish_wait(&journal->j_wait_updates, &wait);
-			break;
-		}
-		spin_unlock(&transaction->t_handle_lock);
-		write_unlock(&journal->j_state_lock);
-		schedule();
-		finish_wait(&journal->j_wait_updates, &wait);
-		write_lock(&journal->j_state_lock);
-	}
 	write_unlock(&journal->j_state_lock);
 
 	/*
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f76598265896..34b051aa9009 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -594,7 +594,7 @@ struct transaction_s
 	 */
 	unsigned long		t_log_start;
 
-	/* 
+	/*
 	 * Number of buffers on the t_buffers list [j_list_lock, no locks
 	 * needed for jbd2 thread]
 	 */
@@ -1757,6 +1757,35 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
 	return max_t(long, free, 0);
 }
 
+/*
+ * Waits for any outstanding t_updates to finish.
+ * This is called with write j_state_lock held.
+ */
+static inline void jbd2_journal_wait_updates(journal_t *journal)
+{
+	transaction_t *commit_transaction = journal->j_running_transaction;
+
+	if (!commit_transaction)
+		return;
+
+	spin_lock(&commit_transaction->t_handle_lock);
+	while (atomic_read(&commit_transaction->t_updates)) {
+		DEFINE_WAIT(wait);
+
+		prepare_to_wait(&journal->j_wait_updates, &wait,
+					TASK_UNINTERRUPTIBLE);
+		if (atomic_read(&commit_transaction->t_updates)) {
+			spin_unlock(&commit_transaction->t_handle_lock);
+			write_unlock(&journal->j_state_lock);
+			schedule();
+			write_lock(&journal->j_state_lock);
+			spin_lock(&commit_transaction->t_handle_lock);
+		}
+		finish_wait(&journal->j_wait_updates, &wait);
+	}
+	spin_unlock(&commit_transaction->t_handle_lock);
+}
+
 /*
  * Definitions which augment the buffer_head layer
  */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates
  2022-01-13  3:26 [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups Ritesh Harjani
                   ` (4 preceding siblings ...)
  2022-01-13  3:26 ` [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function Ritesh Harjani
@ 2022-01-13  3:26 ` Ritesh Harjani
  2022-01-13 11:27   ` Jan Kara
  5 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13  3:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, Jan Kara, Andreas Dilger, tytso,
	Eric Whitney, Ritesh Harjani

Since jbd2_journal_wait_updates() uses waitq based on t_updates atomic_t
variable. So from code review it looks like we don't need to use
t_handle_lock spinlock for checking t_updates value.
Hence this patch gets rid of the spinlock protection in
jbd2_journal_wait_updates()

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 include/linux/jbd2.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 34b051aa9009..9bef47622b9d 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1768,22 +1768,18 @@ static inline void jbd2_journal_wait_updates(journal_t *journal)
 	if (!commit_transaction)
 		return;
 
-	spin_lock(&commit_transaction->t_handle_lock);
 	while (atomic_read(&commit_transaction->t_updates)) {
 		DEFINE_WAIT(wait);
 
 		prepare_to_wait(&journal->j_wait_updates, &wait,
 					TASK_UNINTERRUPTIBLE);
 		if (atomic_read(&commit_transaction->t_updates)) {
-			spin_unlock(&commit_transaction->t_handle_lock);
 			write_unlock(&journal->j_state_lock);
 			schedule();
 			write_lock(&journal->j_state_lock);
-			spin_lock(&commit_transaction->t_handle_lock);
 		}
 		finish_wait(&journal->j_wait_updates, &wait);
 	}
-	spin_unlock(&commit_transaction->t_handle_lock);
 }
 
 /*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin()
  2022-01-13  3:26 ` [PATCH 2/6] ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin() Ritesh Harjani
@ 2022-01-13 10:58   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2022-01-13 10:58 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On Thu 13-01-22 08:56:25, Ritesh Harjani wrote:
> ext4_prepare_inline_data() already checks for ext4_get_max_inline_size()
> and returns -ENOSPC. So there is no need to check it twice within
> ext4_da_write_inline_data_begin(). This patch removes the extra check.
> 
> It also makes it more clean.
> 
> No functionality change in this patch.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inline.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 31741e8a462e..c52b0037983d 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -913,7 +913,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
>  				    struct page **pagep,
>  				    void **fsdata)
>  {
> -	int ret, inline_size;
> +	int ret;
>  	handle_t *handle;
>  	struct page *page;
>  	struct ext4_iloc iloc;
> @@ -930,14 +930,9 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
>  		goto out;
>  	}
>  
> -	inline_size = ext4_get_max_inline_size(inode);
> -
> -	ret = -ENOSPC;
> -	if (inline_size >= pos + len) {
> -		ret = ext4_prepare_inline_data(handle, inode, pos + len);
> -		if (ret && ret != -ENOSPC)
> -			goto out_journal;
> -	}
> +	ret = ext4_prepare_inline_data(handle, inode, pos + len);
> +	if (ret && ret != -ENOSPC)
> +		goto out_journal;
>  
>  	/*
>  	 * We cannot recurse into the filesystem as the transaction
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] ext4: Fix error handling in ext4_restore_inline_data()
  2022-01-13  3:26 ` [PATCH 1/6] ext4: Fix error handling in ext4_restore_inline_data() Ritesh Harjani
@ 2022-01-13 10:58   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2022-01-13 10:58 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On Thu 13-01-22 08:56:24, Ritesh Harjani wrote:
> While running "./check -I 200 generic/475" it sometimes gives below
> kernel BUG(). Ideally we should not call ext4_write_inline_data() if
> ext4_create_inline_data() has failed.
> 
> <log snip>
> [73131.453234] kernel BUG at fs/ext4/inline.c:223!
> 
> <code snip>
>  212 static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
>  213                                    void *buffer, loff_t pos, unsigned int len)
>  214 {
> <...>
>  223         BUG_ON(!EXT4_I(inode)->i_inline_off);
>  224         BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);
> 
> This patch handles the error and prints out a emergency msg saying potential
> data loss for the given inode (since we couldn't restore the original
> inline_data due to some previous error).
> 
> [ 9571.070313] EXT4-fs (dm-0): error restoring inline_data for inode -- potential data loss! (inode 1703982, error -30)
> 
> Reported-by: Eric Whitney <enwlinux@gmail.com>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Makes sence. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inline.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 534c0329e110..31741e8a462e 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -1135,7 +1135,15 @@ static void ext4_restore_inline_data(handle_t *handle, struct inode *inode,
>  				     struct ext4_iloc *iloc,
>  				     void *buf, int inline_size)
>  {
> -	ext4_create_inline_data(handle, inode, inline_size);
> +	int ret;
> +
> +	ret = ext4_create_inline_data(handle, inode, inline_size);
> +	if (ret) {
> +		ext4_msg(inode->i_sb, KERN_EMERG,
> +			"error restoring inline_data for inode -- potential data loss! (inode %lu, error %d)",
> +			inode->i_ino, ret);
> +		return;
> +	}
>  	ext4_write_inline_data(inode, iloc, buf, 0, inline_size);
>  	ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
>  }
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/6] ext4: Fix error handling in ext4_fc_record_modified_inode()
  2022-01-13  3:26 ` [PATCH 3/6] ext4: Fix error handling in ext4_fc_record_modified_inode() Ritesh Harjani
@ 2022-01-13 11:00   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2022-01-13 11:00 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney, luo penghao, Lukas Czerner

On Thu 13-01-22 08:56:26, Ritesh Harjani wrote:
> Current code does not fully takes care of krealloc() error case,
> which could lead to silent memory corruption or a kernel bug.
> This patch fixes that.
> 
> Also it cleans up some duplicated error handling logic from various functions
> in fast_commit.c file.
> 
> Reported-by: luo penghao <luo.penghao@zte.com.cn>
> Suggested-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/fast_commit.c | 64 ++++++++++++++++++++-----------------------
>  1 file changed, 29 insertions(+), 35 deletions(-)
> 
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 5ae8026a0c56..4541c8468c01 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -1392,14 +1392,15 @@ static int ext4_fc_record_modified_inode(struct super_block *sb, int ino)
>  		if (state->fc_modified_inodes[i] == ino)
>  			return 0;
>  	if (state->fc_modified_inodes_used == state->fc_modified_inodes_size) {
> -		state->fc_modified_inodes_size +=
> -			EXT4_FC_REPLAY_REALLOC_INCREMENT;
>  		state->fc_modified_inodes = krealloc(
> -					state->fc_modified_inodes, sizeof(int) *
> -					state->fc_modified_inodes_size,
> -					GFP_KERNEL);
> +				state->fc_modified_inodes,
> +				sizeof(int) * (state->fc_modified_inodes_size +
> +				EXT4_FC_REPLAY_REALLOC_INCREMENT),
> +				GFP_KERNEL);
>  		if (!state->fc_modified_inodes)
>  			return -ENOMEM;
> +		state->fc_modified_inodes_size +=
> +			EXT4_FC_REPLAY_REALLOC_INCREMENT;
>  	}
>  	state->fc_modified_inodes[state->fc_modified_inodes_used++] = ino;
>  	return 0;
> @@ -1431,7 +1432,9 @@ static int ext4_fc_replay_inode(struct super_block *sb, struct ext4_fc_tl *tl,
>  	}
>  	inode = NULL;
>  
> -	ext4_fc_record_modified_inode(sb, ino);
> +	ret = ext4_fc_record_modified_inode(sb, ino);
> +	if (ret)
> +		goto out;
>  
>  	raw_fc_inode = (struct ext4_inode *)
>  		(val + offsetof(struct ext4_fc_inode, fc_raw_inode));
> @@ -1621,6 +1624,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
>  	}
>  
>  	ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
> +	if (ret)
> +		goto out;
>  
>  	start = le32_to_cpu(ex->ee_block);
>  	start_pblk = ext4_ext_pblock(ex);
> @@ -1638,18 +1643,14 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
>  		map.m_pblk = 0;
>  		ret = ext4_map_blocks(NULL, inode, &map, 0);
>  
> -		if (ret < 0) {
> -			iput(inode);
> -			return 0;
> -		}
> +		if (ret < 0)
> +			goto out;
>  
>  		if (ret == 0) {
>  			/* Range is not mapped */
>  			path = ext4_find_extent(inode, cur, NULL, 0);
> -			if (IS_ERR(path)) {
> -				iput(inode);
> -				return 0;
> -			}
> +			if (IS_ERR(path))
> +				goto out;
>  			memset(&newex, 0, sizeof(newex));
>  			newex.ee_block = cpu_to_le32(cur);
>  			ext4_ext_store_pblock(
> @@ -1663,10 +1664,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
>  			up_write((&EXT4_I(inode)->i_data_sem));
>  			ext4_ext_drop_refs(path);
>  			kfree(path);
> -			if (ret) {
> -				iput(inode);
> -				return 0;
> -			}
> +			if (ret)
> +				goto out;
>  			goto next;
>  		}
>  
> @@ -1679,10 +1678,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
>  			ret = ext4_ext_replay_update_ex(inode, cur, map.m_len,
>  					ext4_ext_is_unwritten(ex),
>  					start_pblk + cur - start);
> -			if (ret) {
> -				iput(inode);
> -				return 0;
> -			}
> +			if (ret)
> +				goto out;
>  			/*
>  			 * Mark the old blocks as free since they aren't used
>  			 * anymore. We maintain an array of all the modified
> @@ -1702,10 +1699,8 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
>  			ext4_ext_is_unwritten(ex), map.m_pblk);
>  		ret = ext4_ext_replay_update_ex(inode, cur, map.m_len,
>  					ext4_ext_is_unwritten(ex), map.m_pblk);
> -		if (ret) {
> -			iput(inode);
> -			return 0;
> -		}
> +		if (ret)
> +			goto out;
>  		/*
>  		 * We may have split the extent tree while toggling the state.
>  		 * Try to shrink the extent tree now.
> @@ -1717,6 +1712,7 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
>  	}
>  	ext4_ext_replay_shrink_inode(inode, i_size_read(inode) >>
>  					sb->s_blocksize_bits);
> +out:
>  	iput(inode);
>  	return 0;
>  }
> @@ -1746,6 +1742,8 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
>  	}
>  
>  	ret = ext4_fc_record_modified_inode(sb, inode->i_ino);
> +	if (ret)
> +		goto out;
>  
>  	jbd_debug(1, "DEL_RANGE, inode %ld, lblk %d, len %d\n",
>  			inode->i_ino, le32_to_cpu(lrange.fc_lblk),
> @@ -1755,10 +1753,8 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
>  		map.m_len = remaining;
>  
>  		ret = ext4_map_blocks(NULL, inode, &map, 0);
> -		if (ret < 0) {
> -			iput(inode);
> -			return 0;
> -		}
> +		if (ret < 0)
> +			goto out;
>  		if (ret > 0) {
>  			remaining -= ret;
>  			cur += ret;
> @@ -1773,15 +1769,13 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
>  	ret = ext4_ext_remove_space(inode, lrange.fc_lblk,
>  				lrange.fc_lblk + lrange.fc_len - 1);
>  	up_write(&EXT4_I(inode)->i_data_sem);
> -	if (ret) {
> -		iput(inode);
> -		return 0;
> -	}
> +	if (ret)
> +		goto out;
>  	ext4_ext_replay_shrink_inode(inode,
>  		i_size_read(inode) >> sb->s_blocksize_bits);
>  	ext4_mark_inode_dirty(NULL, inode);
> +out:
>  	iput(inode);
> -
>  	return 0;
>  }
>  
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] jbd2: Cleanup unused functions declarations from jbd2.h
  2022-01-13  3:26 ` [PATCH 4/6] jbd2: Cleanup unused functions declarations from jbd2.h Ritesh Harjani
@ 2022-01-13 11:01   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2022-01-13 11:01 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On Thu 13-01-22 08:56:27, Ritesh Harjani wrote:
> During code review found no references of few of these below function
> declarations. This patch cleans those up from jbd2.h
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Spring cleaning :). Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/jbd2.h | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index fd933c45281a..f76598265896 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1419,9 +1419,7 @@ extern void jbd2_journal_unfile_buffer(journal_t *, struct journal_head *);
>  extern bool __jbd2_journal_refile_buffer(struct journal_head *);
>  extern void jbd2_journal_refile_buffer(journal_t *, struct journal_head *);
>  extern void __jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
> -extern void __journal_free_buffer(struct journal_head *bh);
>  extern void jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int);
> -extern void __journal_clean_data_list(transaction_t *transaction);
>  static inline void jbd2_file_log_bh(struct list_head *head, struct buffer_head *bh)
>  {
>  	list_add_tail(&bh->b_assoc_buffers, head);
> @@ -1486,9 +1484,6 @@ extern int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
>  					      struct buffer_head **bh_out,
>  					      sector_t blocknr);
>  
> -/* Transaction locking */
> -extern void		__wait_on_journal (journal_t *);
> -
>  /* Transaction cache support */
>  extern void jbd2_journal_destroy_transaction_cache(void);
>  extern int __init jbd2_journal_init_transaction_cache(void);
> @@ -1774,8 +1769,6 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
>  #define BJ_Reserved	4	/* Buffer is reserved for access by journal */
>  #define BJ_Types	5
>  
> -extern int jbd_blocks_per_page(struct inode *inode);
> -
>  /* JBD uses a CRC32 checksum */
>  #define JBD_MAX_CHECKSUM_SIZE 4
>  
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates
  2022-01-13  3:26 ` [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates Ritesh Harjani
@ 2022-01-13 11:27   ` Jan Kara
  2022-01-13 12:38     ` Ritesh Harjani
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2022-01-13 11:27 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On Thu 13-01-22 08:56:29, Ritesh Harjani wrote:
> Since jbd2_journal_wait_updates() uses waitq based on t_updates atomic_t
> variable. So from code review it looks like we don't need to use
> t_handle_lock spinlock for checking t_updates value.
> Hence this patch gets rid of the spinlock protection in
> jbd2_journal_wait_updates()
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

This patch looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

Actually looking at it, t_handle_lock seems to be very much unused. I agree
we don't need it when waiting for outstanding handles but the only
remaining uses are:

1) jbd2_journal_extend() where it is not needed either - we use
atomic_add_return() to manipulate t_outstanding_credits and hold
j_state_lock for reading which provides us enough exclusion.

2) update_t_max_wait() - this is the only valid use of t_handle_lock but we
can just switch it to cmpxchg loop with a bit of care. Something like:

	unsigned long old;

	ts = jbd2_time_diff(ts, transaction->t_start);
	old = transaction->t_max_wait;
	while (old < ts)
		old = cmpxchg(&transaction->t_max_wait, old, ts);

So perhaps you can add two more patches to remove other t_handle_lock uses
and drop it completely.

								Honza

> ---
>  include/linux/jbd2.h | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 34b051aa9009..9bef47622b9d 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1768,22 +1768,18 @@ static inline void jbd2_journal_wait_updates(journal_t *journal)
>  	if (!commit_transaction)
>  		return;
>  
> -	spin_lock(&commit_transaction->t_handle_lock);
>  	while (atomic_read(&commit_transaction->t_updates)) {
>  		DEFINE_WAIT(wait);
>  
>  		prepare_to_wait(&journal->j_wait_updates, &wait,
>  					TASK_UNINTERRUPTIBLE);
>  		if (atomic_read(&commit_transaction->t_updates)) {
> -			spin_unlock(&commit_transaction->t_handle_lock);
>  			write_unlock(&journal->j_state_lock);
>  			schedule();
>  			write_lock(&journal->j_state_lock);
> -			spin_lock(&commit_transaction->t_handle_lock);
>  		}
>  		finish_wait(&journal->j_wait_updates, &wait);
>  	}
> -	spin_unlock(&commit_transaction->t_handle_lock);
>  }
>  
>  /*
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function
  2022-01-13  3:26 ` [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function Ritesh Harjani
@ 2022-01-13 11:30   ` Jan Kara
  2022-01-13 12:17     ` Ritesh Harjani
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2022-01-13 11:30 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On Thu 13-01-22 08:56:28, Ritesh Harjani wrote:
> No functionality change as such in this patch. This only refactors the
> common piece of code which waits for t_updates to finish into a common
> function named as jbd2_journal_wait_updates(journal_t *)
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Just one nit, otherwise. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

> @@ -1757,6 +1757,35 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
>  	return max_t(long, free, 0);
>  }
>  
> +/*
> + * Waits for any outstanding t_updates to finish.
> + * This is called with write j_state_lock held.
> + */
> +static inline void jbd2_journal_wait_updates(journal_t *journal)
> +{
> +	transaction_t *commit_transaction = journal->j_running_transaction;
> +
> +	if (!commit_transaction)
> +		return;
> +
> +	spin_lock(&commit_transaction->t_handle_lock);
> +	while (atomic_read(&commit_transaction->t_updates)) {
> +		DEFINE_WAIT(wait);
> +
> +		prepare_to_wait(&journal->j_wait_updates, &wait,
> +					TASK_UNINTERRUPTIBLE);
> +		if (atomic_read(&commit_transaction->t_updates)) {
> +			spin_unlock(&commit_transaction->t_handle_lock);
> +			write_unlock(&journal->j_state_lock);
> +			schedule();
> +			write_lock(&journal->j_state_lock);
> +			spin_lock(&commit_transaction->t_handle_lock);
> +		}
> +		finish_wait(&journal->j_wait_updates, &wait);
> +	}
> +	spin_unlock(&commit_transaction->t_handle_lock);
> +}
> +

I don't think making this inline makes sence. Neither the commit code nor
jbd2_journal_lock_updates() are so hot that it would warrant this large
inline function...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function
  2022-01-13 11:30   ` Jan Kara
@ 2022-01-13 12:17     ` Ritesh Harjani
  0 siblings, 0 replies; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13 12:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On 22/01/13 12:30PM, Jan Kara wrote:
> On Thu 13-01-22 08:56:28, Ritesh Harjani wrote:
> > No functionality change as such in this patch. This only refactors the
> > common piece of code which waits for t_updates to finish into a common
> > function named as jbd2_journal_wait_updates(journal_t *)
> >
> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>
> Just one nit, otherwise. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> > @@ -1757,6 +1757,35 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
> >  	return max_t(long, free, 0);
> >  }
> >
> > +/*
> > + * Waits for any outstanding t_updates to finish.
> > + * This is called with write j_state_lock held.
> > + */
> > +static inline void jbd2_journal_wait_updates(journal_t *journal)
> > +{
> > +	transaction_t *commit_transaction = journal->j_running_transaction;
> > +
> > +	if (!commit_transaction)
> > +		return;
> > +
> > +	spin_lock(&commit_transaction->t_handle_lock);
> > +	while (atomic_read(&commit_transaction->t_updates)) {
> > +		DEFINE_WAIT(wait);
> > +
> > +		prepare_to_wait(&journal->j_wait_updates, &wait,
> > +					TASK_UNINTERRUPTIBLE);
> > +		if (atomic_read(&commit_transaction->t_updates)) {
> > +			spin_unlock(&commit_transaction->t_handle_lock);
> > +			write_unlock(&journal->j_state_lock);
> > +			schedule();
> > +			write_lock(&journal->j_state_lock);
> > +			spin_lock(&commit_transaction->t_handle_lock);
> > +		}
> > +		finish_wait(&journal->j_wait_updates, &wait);
> > +	}
> > +	spin_unlock(&commit_transaction->t_handle_lock);
> > +}
> > +
>
> I don't think making this inline makes sence. Neither the commit code nor
> jbd2_journal_lock_updates() are so hot that it would warrant this large
> inline function...

Yes, make sense. Thanks for the review.
Will do the needful in v2.

-ritesh

>
> 								Honza
>
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates
  2022-01-13 11:27   ` Jan Kara
@ 2022-01-13 12:38     ` Ritesh Harjani
  2022-01-17 12:55       ` Ritesh Harjani
  0 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-13 12:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On 22/01/13 12:27PM, Jan Kara wrote:
> On Thu 13-01-22 08:56:29, Ritesh Harjani wrote:
> > Since jbd2_journal_wait_updates() uses waitq based on t_updates atomic_t
> > variable. So from code review it looks like we don't need to use
> > t_handle_lock spinlock for checking t_updates value.
> > Hence this patch gets rid of the spinlock protection in
> > jbd2_journal_wait_updates()
> >
> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>
> This patch looks good. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> Actually looking at it, t_handle_lock seems to be very much unused. I agree

I too had this thought in mind. Thanks for taking a deeper look into it :)

>
> we don't need it when waiting for outstanding handles but the only
> remaining uses are:
>
> 1) jbd2_journal_extend() where it is not needed either - we use
> atomic_add_return() to manipulate t_outstanding_credits and hold
> j_state_lock for reading which provides us enough exclusion.
>
> 2) update_t_max_wait() - this is the only valid use of t_handle_lock but we
> can just switch it to cmpxchg loop with a bit of care. Something like:
>
> 	unsigned long old;
>
> 	ts = jbd2_time_diff(ts, transaction->t_start);
> 	old = transaction->t_max_wait;
> 	while (old < ts)
> 		old = cmpxchg(&transaction->t_max_wait, old, ts);
>
> So perhaps you can add two more patches to remove other t_handle_lock uses
> and drop it completely.

Thanks for providing the details Jan :)
Agree with jbd2_journal_extend(). I did looked a bit around t_max_wait and
I agree that something like above could work. I will spend some more time around
that code and will submit those changes together in v2.

-ritesh

>
> 								Honza
>
> > ---
> >  include/linux/jbd2.h | 4 ----
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index 34b051aa9009..9bef47622b9d 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1768,22 +1768,18 @@ static inline void jbd2_journal_wait_updates(journal_t *journal)
> >  	if (!commit_transaction)
> >  		return;
> >
> > -	spin_lock(&commit_transaction->t_handle_lock);
> >  	while (atomic_read(&commit_transaction->t_updates)) {
> >  		DEFINE_WAIT(wait);
> >
> >  		prepare_to_wait(&journal->j_wait_updates, &wait,
> >  					TASK_UNINTERRUPTIBLE);
> >  		if (atomic_read(&commit_transaction->t_updates)) {
> > -			spin_unlock(&commit_transaction->t_handle_lock);
> >  			write_unlock(&journal->j_state_lock);
> >  			schedule();
> >  			write_lock(&journal->j_state_lock);
> > -			spin_lock(&commit_transaction->t_handle_lock);
> >  		}
> >  		finish_wait(&journal->j_wait_updates, &wait);
> >  	}
> > -	spin_unlock(&commit_transaction->t_handle_lock);
> >  }
> >
> >  /*
> > --
> > 2.31.1
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates
  2022-01-13 12:38     ` Ritesh Harjani
@ 2022-01-17 12:55       ` Ritesh Harjani
  2022-01-17 14:38         ` Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Ritesh Harjani @ 2022-01-17 12:55 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On 22/01/13 06:08PM, Ritesh Harjani wrote:
> On 22/01/13 12:27PM, Jan Kara wrote:
> > On Thu 13-01-22 08:56:29, Ritesh Harjani wrote:
> > > Since jbd2_journal_wait_updates() uses waitq based on t_updates atomic_t
> > > variable. So from code review it looks like we don't need to use
> > > t_handle_lock spinlock for checking t_updates value.
> > > Hence this patch gets rid of the spinlock protection in
> > > jbd2_journal_wait_updates()
> > >
> > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> >
> > This patch looks good. Feel free to add:
> >
> > Reviewed-by: Jan Kara <jack@suse.cz>
> >
> > Actually looking at it, t_handle_lock seems to be very much unused. I agree

Thanks Jan for your help in this.
I have dropped this patch from v2 in order to discuss few more things and I felt
killing t_handle_lock completely can be sent in a seperate patch series.


>
> I too had this thought in mind. Thanks for taking a deeper look into it :)
>
> >
> > we don't need it when waiting for outstanding handles but the only
> > remaining uses are:
> >
> > 1) jbd2_journal_extend() where it is not needed either - we use
> > atomic_add_return() to manipulate t_outstanding_credits and hold
> > j_state_lock for reading which provides us enough exclusion.

I looked into jbd2_journal_extend and yes, we don't need t_handle_lock
for updating transaction->t_outstanding_credits, since it already happens with
atomic API calls.

Now I do see we update handle->h_**_credits in that function.
But I think this is per process (based on task_struct, current->journal_info)
and doesn't need a lock protection right?


> >
> > 2) update_t_max_wait() - this is the only valid use of t_handle_lock but we
> > can just switch it to cmpxchg loop with a bit of care. Something like:
> >
> > 	unsigned long old;
> >
> > 	ts = jbd2_time_diff(ts, transaction->t_start);
> > 	old = transaction->t_max_wait;
> > 	while (old < ts)
> > 		old = cmpxchg(&transaction->t_max_wait, old, ts);

I think there might be a simpler and more straight forward way for updating
t_max_wait.

I did look into the t_max_wait logic and where all we are updating it.

t_max_wait is the max wait time in starting (&attaching) a _new_ running
transaction by a handle. Is this understaning correct?
From code I don't see t_max_wait getting updated for the time taken in order
to start the handle by a existing running transaction.

Here is how -
update_t_max_wait() will only update t_max_wait if the
transaction->t_start is after ts
(ts is nothing but when start_this_handle() was called).

1. This means that for transaction->t_start to be greater than ts, it has to be
   the new transaction that gets started right (in start_this_handle() func)?

2. Second place where transaction->t_start is updated is just after the start of
   commit phase 7. But this only means that this transaction has become the
   commit transaction. That means someone has to alloc a new running transaction
   which again is case-1.

Now I think this spinlock was added since multiple processes can start a handle
in parallel and attach a running transaction.

Also this was then moved within CONFIG_JBD2_DEBUG since to avoid spinlock
contention on a SMP system in starting multiple handles by different processes.

Now looking at all of above, I think we can move update_t_max_wait()
inside jbd2_get_transaction() in start_this_handle(). Because that is where
a new transaction will be started and transaction->t_start will be greater then
ts. This also is protected within j_state_lock write_lock, so we don't need
spinlock.

This would also mean that we can move t_max_wait outside of CONFIG_JBD2_DEBUG
and jbd2_journal_enable_debug.

Jan, could you confirm if above understaning is correct and shall I go ahead
with above changes?

-ritesh

> >
> > So perhaps you can add two more patches to remove other t_handle_lock uses
> > and drop it completely.
>
> Thanks for providing the details Jan :)
> Agree with jbd2_journal_extend().





> I did looked a bit around t_max_wait and
> I agree that something like above could work. I will spend some more time around
> that code and will submit those changes together in v2.
>
> -ritesh
>
> >
> > 								Honza
> >
> > > ---
> > >  include/linux/jbd2.h | 4 ----
> > >  1 file changed, 4 deletions(-)
> > >
> > > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > > index 34b051aa9009..9bef47622b9d 100644
> > > --- a/include/linux/jbd2.h
> > > +++ b/include/linux/jbd2.h
> > > @@ -1768,22 +1768,18 @@ static inline void jbd2_journal_wait_updates(journal_t *journal)
> > >  	if (!commit_transaction)
> > >  		return;
> > >
> > > -	spin_lock(&commit_transaction->t_handle_lock);
> > >  	while (atomic_read(&commit_transaction->t_updates)) {
> > >  		DEFINE_WAIT(wait);
> > >
> > >  		prepare_to_wait(&journal->j_wait_updates, &wait,
> > >  					TASK_UNINTERRUPTIBLE);
> > >  		if (atomic_read(&commit_transaction->t_updates)) {
> > > -			spin_unlock(&commit_transaction->t_handle_lock);
> > >  			write_unlock(&journal->j_state_lock);
> > >  			schedule();
> > >  			write_lock(&journal->j_state_lock);
> > > -			spin_lock(&commit_transaction->t_handle_lock);
> > >  		}
> > >  		finish_wait(&journal->j_wait_updates, &wait);
> > >  	}
> > > -	spin_unlock(&commit_transaction->t_handle_lock);
> > >  }
> > >
> > >  /*
> > > --
> > > 2.31.1
> > >
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates
  2022-01-17 12:55       ` Ritesh Harjani
@ 2022-01-17 14:38         ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2022-01-17 14:38 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Jan Kara, linux-ext4, linux-fsdevel, linux-kernel, Jan Kara,
	Andreas Dilger, tytso, Eric Whitney

On Mon 17-01-22 18:25:27, Ritesh Harjani wrote:
> On 22/01/13 06:08PM, Ritesh Harjani wrote:
> > On 22/01/13 12:27PM, Jan Kara wrote:
> > > On Thu 13-01-22 08:56:29, Ritesh Harjani wrote:
> > > > Since jbd2_journal_wait_updates() uses waitq based on t_updates atomic_t
> > > > variable. So from code review it looks like we don't need to use
> > > > t_handle_lock spinlock for checking t_updates value.
> > > > Hence this patch gets rid of the spinlock protection in
> > > > jbd2_journal_wait_updates()
> > > >
> > > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > >
> > > This patch looks good. Feel free to add:
> > >
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > >
> > > Actually looking at it, t_handle_lock seems to be very much unused. I agree
> 
> Thanks Jan for your help in this.
> I have dropped this patch from v2 in order to discuss few more things and I felt
> killing t_handle_lock completely can be sent in a seperate patch series.

Yes, probably a good choice.

> > I too had this thought in mind. Thanks for taking a deeper look into it :)
> >
> > >
> > > we don't need it when waiting for outstanding handles but the only
> > > remaining uses are:
> > >
> > > 1) jbd2_journal_extend() where it is not needed either - we use
> > > atomic_add_return() to manipulate t_outstanding_credits and hold
> > > j_state_lock for reading which provides us enough exclusion.
> 
> I looked into jbd2_journal_extend and yes, we don't need t_handle_lock
> for updating transaction->t_outstanding_credits, since it already happens with
> atomic API calls.
> 
> Now I do see we update handle->h_**_credits in that function.
> But I think this is per process (based on task_struct, current->journal_info)
> and doesn't need a lock protection right?

Yes, handle is per process so no lock is needed there.

> > > 2) update_t_max_wait() - this is the only valid use of t_handle_lock but we
> > > can just switch it to cmpxchg loop with a bit of care. Something like:
> > >
> > > 	unsigned long old;
> > >
> > > 	ts = jbd2_time_diff(ts, transaction->t_start);
> > > 	old = transaction->t_max_wait;
> > > 	while (old < ts)
> > > 		old = cmpxchg(&transaction->t_max_wait, old, ts);
> 
> I think there might be a simpler and more straight forward way for updating
> t_max_wait.
> 
> I did look into the t_max_wait logic and where all we are updating it.
> 
> t_max_wait is the max wait time in starting (&attaching) a _new_ running
> transaction by a handle. Is this understaning correct?

Correct. It is the maximum time we had to wait for a new transaction to be
created.

> From code I don't see t_max_wait getting updated for the time taken in order
> to start the handle by a existing running transaction.
> 
> Here is how -
> update_t_max_wait() will only update t_max_wait if the
> transaction->t_start is after ts
> (ts is nothing but when start_this_handle() was called).
> 
> 1. This means that for transaction->t_start to be greater than ts, it has to be
>    the new transaction that gets started right (in start_this_handle() func)?
>
> 2. Second place where transaction->t_start is updated is just after the start of
>    commit phase 7. But this only means that this transaction has become the
>    commit transaction. That means someone has to alloc a new running transaction
>    which again is case-1.
> 
> Now I think this spinlock was added since multiple processes can start a handle
> in parallel and attach a running transaction.
> 
> Also this was then moved within CONFIG_JBD2_DEBUG since to avoid spinlock
> contention on a SMP system in starting multiple handles by different processes.
> 
> Now looking at all of above, I think we can move update_t_max_wait()
> inside jbd2_get_transaction() in start_this_handle(). Because that is where
> a new transaction will be started and transaction->t_start will be greater then
> ts. This also is protected within j_state_lock write_lock, so we don't need
> spinlock.

All above is correct upto this point. The catch is there can be (and often
are) more processes in start_this_handle() waiting in
wait_transaction_switching() and then racing to create the new transaction.
The process calling jbd2_get_transaction() is not necessarily the one which
entered start_this_handle() first and thus t_max_wait would not be really
the maximum time someone had to wait.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-01-17 14:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-13  3:26 [PATCH 0/6] ext4/jbd2: inline_data fixes and some cleanups Ritesh Harjani
2022-01-13  3:26 ` [PATCH 1/6] ext4: Fix error handling in ext4_restore_inline_data() Ritesh Harjani
2022-01-13 10:58   ` Jan Kara
2022-01-13  3:26 ` [PATCH 2/6] ext4: Remove redundant max inline_size check in ext4_da_write_inline_data_begin() Ritesh Harjani
2022-01-13 10:58   ` Jan Kara
2022-01-13  3:26 ` [PATCH 3/6] ext4: Fix error handling in ext4_fc_record_modified_inode() Ritesh Harjani
2022-01-13 11:00   ` Jan Kara
2022-01-13  3:26 ` [PATCH 4/6] jbd2: Cleanup unused functions declarations from jbd2.h Ritesh Harjani
2022-01-13 11:01   ` Jan Kara
2022-01-13  3:26 ` [PATCH 5/6] jbd2: Refactor wait logic for transaction updates into a common function Ritesh Harjani
2022-01-13 11:30   ` Jan Kara
2022-01-13 12:17     ` Ritesh Harjani
2022-01-13  3:26 ` [PATCH 6/6] jbd2: No need to use t_handle_lock in jbd2_journal_wait_updates Ritesh Harjani
2022-01-13 11:27   ` Jan Kara
2022-01-13 12:38     ` Ritesh Harjani
2022-01-17 12:55       ` Ritesh Harjani
2022-01-17 14:38         ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).