All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page()
@ 2021-05-27 13:56 Zhang Yi
  2021-05-27 13:56 ` [RFC PATCH v3 1/8] jbd2: remove the out label in __jbd2_journal_remove_checkpoint() Zhang Yi
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Patch 1-3: fix a potential filesystem inconsistency problem.
Patch 4-8: add a shrinker to release journal_head and remove
           bdev_try_to_free_page() callback, and also do some cleanup.

Changes since v2:
 - Fix some comments and spelling mistakes on patch 2 and 3.
 - Give up the solution of add refcount on super_block and fix the
   use-after-free issue in bdev_try_to_free_page(), switch to introduce
   a shrinker to free checkpoint buffers' journal_head and remove the
   whole callback at all.

Thanks,
Yi.

---------

Changes since v1:
 - Do not use j_checkpoint_mutex to fix the filesystem inconsistency
   problem, introduce a new mark instead.
 - Fix superblock use-after-free issue in blkdev_releasepage().
 - Avoid race between bdev_try_to_free_page() and ext4_put_super().



Zhang Yi (8):
  jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
  jbd2: ensure abort the journal if detect IO error when writing
    original buffer back
  jbd2: don't abort the journal when freeing buffers
  jbd2: remove redundant buffer io error checks
  jbd2,ext4: add a shrinker to release checkpointed buffers
  jbd2: simplify journal_clean_one_cp_list()
  ext4: remove bdev_try_to_free_page() callback
  fs: remove bdev_try_to_free_page callback

 fs/block_dev.c              |  15 ---
 fs/ext4/super.c             |  29 ++----
 fs/jbd2/checkpoint.c        | 200 ++++++++++++++++++++++++++++++------
 fs/jbd2/journal.c           | 101 ++++++++++++++++++
 fs/jbd2/transaction.c       |  17 ---
 include/linux/fs.h          |   1 -
 include/linux/jbd2.h        |  37 +++++++
 include/trace/events/jbd2.h | 101 ++++++++++++++++++
 8 files changed, 413 insertions(+), 88 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 1/8] jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
  2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
@ 2021-05-27 13:56 ` Zhang Yi
  2021-05-27 13:56 ` [RFC PATCH v3 2/8] jbd2: ensure abort the journal if detect IO error when writing original buffer back Zhang Yi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

The 'out' lable just return the 'ret' value and seems not required, so
remove this label and switch to return appropriate value immediately.
This patch also do some minor cleanup, no logical change.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/checkpoint.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 63b526d44886..bf5511d19ac5 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -564,13 +564,13 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 	struct transaction_chp_stats_s *stats;
 	transaction_t *transaction;
 	journal_t *journal;
-	int ret = 0;
 
 	JBUFFER_TRACE(jh, "entry");
 
-	if ((transaction = jh->b_cp_transaction) == NULL) {
+	transaction = jh->b_cp_transaction;
+	if (!transaction) {
 		JBUFFER_TRACE(jh, "not on transaction");
-		goto out;
+		return 0;
 	}
 	journal = transaction->t_journal;
 
@@ -579,9 +579,9 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 	jh->b_cp_transaction = NULL;
 	jbd2_journal_put_journal_head(jh);
 
-	if (transaction->t_checkpoint_list != NULL ||
-	    transaction->t_checkpoint_io_list != NULL)
-		goto out;
+	/* Is this transaction empty? */
+	if (transaction->t_checkpoint_list || transaction->t_checkpoint_io_list)
+		return 0;
 
 	/*
 	 * There is one special case to worry about: if we have just pulled the
@@ -593,10 +593,12 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 	 * See the comment at the end of jbd2_journal_commit_transaction().
 	 */
 	if (transaction->t_state != T_FINISHED)
-		goto out;
+		return 0;
 
-	/* OK, that was the last buffer for the transaction: we can now
-	   safely remove this transaction from the log */
+	/*
+	 * OK, that was the last buffer for the transaction, we can now
+	 * safely remove this transaction from the log.
+	 */
 	stats = &transaction->t_chp_stats;
 	if (stats->cs_chp_time)
 		stats->cs_chp_time = jbd2_time_diff(stats->cs_chp_time,
@@ -606,9 +608,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 
 	__jbd2_journal_drop_transaction(journal, transaction);
 	jbd2_journal_free_transaction(transaction);
-	ret = 1;
-out:
-	return ret;
+	return 1;
 }
 
 /*
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 2/8] jbd2: ensure abort the journal if detect IO error when writing original buffer back
  2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
  2021-05-27 13:56 ` [RFC PATCH v3 1/8] jbd2: remove the out label in __jbd2_journal_remove_checkpoint() Zhang Yi
@ 2021-05-27 13:56 ` Zhang Yi
  2021-06-03 16:21   ` Jan Kara
  2021-05-27 13:56 ` [RFC PATCH v3 3/8] jbd2: don't abort the journal when freeing buffers Zhang Yi
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Although we merged c044f3d8360 ("jbd2: abort journal if free a async
write error metadata buffer"), there is a race between
jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the
jbd2_log_do_checkpoint() may still fail to detect the buffer write
io error flag which may lead to filesystem inconsistency.

jbd2_journal_try_to_free_buffers()     ext4_put_super()
                                        jbd2_journal_destroy()
  __jbd2_journal_remove_checkpoint()
  detect buffer write error              jbd2_log_do_checkpoint()
                                         jbd2_cleanup_journal_tail()
                                           <--- lead to inconsistency
  jbd2_journal_abort()

Fix this issue by introducing a new atomic flag which only have one
JBD2_CHECKPOINT_IO_ERROR bit now, and set it in
__jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer
which has write_io_error flag. Then jbd2_journal_destroy() will detect
this mark and abort the journal to prevent updating log tail.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/checkpoint.c | 12 ++++++++++++
 fs/jbd2/journal.c    | 14 ++++++++++++++
 include/linux/jbd2.h | 11 +++++++++++
 3 files changed, 37 insertions(+)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index bf5511d19ac5..2cbac0e3cff3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -564,6 +564,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 	struct transaction_chp_stats_s *stats;
 	transaction_t *transaction;
 	journal_t *journal;
+	struct buffer_head *bh = jh2bh(jh);
 
 	JBUFFER_TRACE(jh, "entry");
 
@@ -575,6 +576,17 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 	journal = transaction->t_journal;
 
 	JBUFFER_TRACE(jh, "removing from transaction");
+
+	/*
+	 * If we have failed to write the buffer out to disk, the filesystem
+	 * may become inconsistent. We cannot abort the journal here since
+	 * we hold j_list_lock and we have to careful about races with
+	 * jbd2_journal_destroy(). So mark the writeback IO error in the
+	 * journal here and we abort the journal later from a better context.
+	 */
+	if (buffer_write_io_error(bh))
+		set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags);
+
 	__buffer_unlink(jh);
 	jh->b_cp_transaction = NULL;
 	jbd2_journal_put_journal_head(jh);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 2dc944442802..90146755941f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1618,6 +1618,10 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 
 	if (is_journal_aborted(journal))
 		return -EIO;
+	if (test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags)) {
+		jbd2_journal_abort(journal, -EIO);
+		return -EIO;
+	}
 
 	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
 	jbd_debug(1, "JBD2: updating superblock (start %lu, seq %u)\n",
@@ -1995,6 +1999,16 @@ int jbd2_journal_destroy(journal_t *journal)
 	J_ASSERT(journal->j_checkpoint_transactions == NULL);
 	spin_unlock(&journal->j_list_lock);
 
+	/*
+	 * OK, all checkpoint transactions have been checked, now check the
+	 * write out io error flag and abort the journal if some buffer failed
+	 * to write back to the original location, otherwise the filesystem
+	 * may become inconsistent.
+	 */
+	if (!is_journal_aborted(journal) &&
+	    test_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags))
+		jbd2_journal_abort(journal, -EIO);
+
 	if (journal->j_sb_buffer) {
 		if (!is_journal_aborted(journal)) {
 			mutex_lock_io(&journal->j_checkpoint_mutex);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index db0e1920cb12..f9b5e657b8f3 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -779,6 +779,11 @@ struct journal_s
 	 */
 	unsigned long		j_flags;
 
+	/**
+	 * @j_atomic_flags: Atomic journaling state flags.
+	 */
+	unsigned long		j_atomic_flags;
+
 	/**
 	 * @j_errno:
 	 *
@@ -1371,6 +1376,12 @@ JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit,	FAST_COMMIT)
 #define JBD2_FAST_COMMIT_ONGOING	0x100	/* Fast commit is ongoing */
 #define JBD2_FULL_COMMIT_ONGOING	0x200	/* Full commit is ongoing */
 
+/*
+ * Journal atomic flag definitions
+ */
+#define JBD2_CHECKPOINT_IO_ERROR	0x001	/* Detect io error while writing
+						 * buffer back to disk */
+
 /*
  * Function declarations for the journaling transaction and buffer
  * management
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 3/8] jbd2: don't abort the journal when freeing buffers
  2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
  2021-05-27 13:56 ` [RFC PATCH v3 1/8] jbd2: remove the out label in __jbd2_journal_remove_checkpoint() Zhang Yi
  2021-05-27 13:56 ` [RFC PATCH v3 2/8] jbd2: ensure abort the journal if detect IO error when writing original buffer back Zhang Yi
@ 2021-05-27 13:56 ` Zhang Yi
  2021-05-27 13:56 ` [RFC PATCH v3 4/8] jbd2: remove redundant buffer io error checks Zhang Yi
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Now that we can be sure the journal is aborted once a buffer has failed
to be written back to disk, we can remove the journal abort logic in
jbd2_journal_try_to_free_buffers() which was introduced in
commit c044f3d8360d ("jbd2: abort journal if free a async write error
metadata buffer"), because it may cost and propably is not safe.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/transaction.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e8fc45fd751f..8804e126805f 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2123,7 +2123,6 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 {
 	struct buffer_head *head;
 	struct buffer_head *bh;
-	bool has_write_io_error = false;
 	int ret = 0;
 
 	J_ASSERT(PageLocked(page));
@@ -2148,26 +2147,10 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 		jbd2_journal_put_journal_head(jh);
 		if (buffer_jbd(bh))
 			goto busy;
-
-		/*
-		 * If we free a metadata buffer which has been failed to
-		 * write out, the jbd2 checkpoint procedure will not detect
-		 * this failure and may lead to filesystem inconsistency
-		 * after cleanup journal tail.
-		 */
-		if (buffer_write_io_error(bh)) {
-			pr_err("JBD2: Error while async write back metadata bh %llu.",
-			       (unsigned long long)bh->b_blocknr);
-			has_write_io_error = true;
-		}
 	} while ((bh = bh->b_this_page) != head);
 
 	ret = try_to_free_buffers(page);
-
 busy:
-	if (has_write_io_error)
-		jbd2_journal_abort(journal, -EIO);
-
 	return ret;
 }
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 4/8] jbd2: remove redundant buffer io error checks
  2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
                   ` (2 preceding siblings ...)
  2021-05-27 13:56 ` [RFC PATCH v3 3/8] jbd2: don't abort the journal when freeing buffers Zhang Yi
@ 2021-05-27 13:56 ` Zhang Yi
  2021-06-03 16:28   ` Jan Kara
  2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Now that __jbd2_journal_remove_checkpoint() can detect buffer io error
and mark journal checkpoint error, then we abort the journal later
before updating log tail to ensure the filesystem works consistently.
So we could remove other redundant buffer io error checkes.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/checkpoint.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 2cbac0e3cff3..c1f746a5cc1a 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -91,8 +91,7 @@ static int __try_to_free_cp_buf(struct journal_head *jh)
 	int ret = 0;
 	struct buffer_head *bh = jh2bh(jh);
 
-	if (jh->b_transaction == NULL && !buffer_locked(bh) &&
-	    !buffer_dirty(bh) && !buffer_write_io_error(bh)) {
+	if (!jh->b_transaction && !buffer_locked(bh) && !buffer_dirty(bh)) {
 		JBUFFER_TRACE(jh, "remove from checkpoint list");
 		ret = __jbd2_journal_remove_checkpoint(jh) + 1;
 	}
@@ -295,8 +294,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 			goto restart;
 		}
 		if (!buffer_dirty(bh)) {
-			if (unlikely(buffer_write_io_error(bh)) && !result)
-				result = -EIO;
 			BUFFER_TRACE(bh, "remove from checkpoint");
 			if (__jbd2_journal_remove_checkpoint(jh))
 				/* The transaction was released; we're done */
@@ -356,8 +353,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 			spin_lock(&journal->j_list_lock);
 			goto restart2;
 		}
-		if (unlikely(buffer_write_io_error(bh)) && !result)
-			result = -EIO;
 
 		/*
 		 * Now in whatever state the buffer currently is, we
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers
  2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
                   ` (3 preceding siblings ...)
  2021-05-27 13:56 ` [RFC PATCH v3 4/8] jbd2: remove redundant buffer io error checks Zhang Yi
@ 2021-05-27 13:56 ` Zhang Yi
  2021-05-27 15:10   ` [RFC PATCH v3 5/8] jbd2, ext4: " kernel test robot
                     ` (5 more replies)
  2021-05-27 13:56 ` [RFC PATCH v3 6/8] jbd2: simplify journal_clean_one_cp_list() Zhang Yi
                   ` (2 subsequent siblings)
  7 siblings, 6 replies; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Current metadata buffer release logic in bdev_try_to_free_page() have
a lot of use-after-free issues when umount filesystem concurrently, and
it is difficult to fix directly because ext4 is the only user of
s_op->bdev_try_to_free_page callback and we may have to add more special
refcount or lock that is only used by ext4 into the common vfs layer,
which is unacceptable.

One better solution is remove the bdev_try_to_free_page callback, but
the real problem is we cannot easily release journal_head on the
checkpointed buffer, so try_to_free_buffers() cannot release buffers and
page under memory pressure, which is more likely to trigger
out-of-memory. So we cannot remove the callback directly before we find
another way to release journal_head.

This patch introduce a shrinker to free journal_head on the checkpointed
transaction. After the journal_head got freed, try_to_free_buffers()
could free buffer properly.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/super.c             |   8 ++
 fs/jbd2/checkpoint.c        | 147 ++++++++++++++++++++++++++++++++++++
 fs/jbd2/journal.c           |  87 +++++++++++++++++++++
 include/linux/jbd2.h        |  26 +++++++
 include/trace/events/jbd2.h | 101 +++++++++++++++++++++++++
 5 files changed, 369 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7dc94f3e18e6..bf6d0085e1b7 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1174,6 +1174,7 @@ static void ext4_put_super(struct super_block *sb)
 	ext4_unregister_sysfs(sb);
 
 	if (sbi->s_journal) {
+		jbd2_journal_unregister_shrinker(sbi->s_journal);
 		aborted = is_journal_aborted(sbi->s_journal);
 		err = jbd2_journal_destroy(sbi->s_journal);
 		sbi->s_journal = NULL;
@@ -5172,6 +5173,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_ea_block_cache = NULL;
 
 	if (sbi->s_journal) {
+		jbd2_journal_unregister_shrinker(sbi->s_journal);
 		jbd2_journal_destroy(sbi->s_journal);
 		sbi->s_journal = NULL;
 	}
@@ -5497,6 +5499,12 @@ static int ext4_load_journal(struct super_block *sb,
 		ext4_commit_super(sb);
 	}
 
+	err = jbd2_journal_register_shrinker(journal);
+	if (err) {
+		EXT4_SB(sb)->s_journal = NULL;
+		goto err_out;
+	}
+
 	return 0;
 
 err_out:
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index c1f746a5cc1a..727389185d24 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -79,6 +79,18 @@ static inline void __buffer_relink_io(struct journal_head *jh)
 	transaction->t_checkpoint_io_list = jh;
 }
 
+/*
+ * Check a checkpoint buffer could be release or not.
+ *
+ * Requires j_list_lock
+ */
+static inline bool __cp_buffer_busy(struct journal_head *jh)
+{
+	struct buffer_head *bh = jh2bh(jh);
+
+	return (jh->b_transaction || buffer_locked(bh) || buffer_dirty(bh));
+}
+
 /*
  * Try to release a checkpointed buffer from its transaction.
  * Returns 1 if we released it and 2 if we also released the
@@ -462,6 +474,137 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
 	return 0;
 }
 
+/*
+ * journal_shrink_one_cp_list
+ *
+ * Find 'nr_to_scan' written-back checkpoint buffers in the given list
+ * and try to release them. If the whole transaction is released, set
+ * the 'released' parameter. Return the number of released checkpointed
+ * buffers.
+ *
+ * Called with j_list_lock held.
+ */
+static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
+						unsigned long *nr_to_scan,
+						bool *released)
+{
+	struct journal_head *last_jh;
+	struct journal_head *next_jh = jh;
+	unsigned long nr_freed = 0;
+	int ret;
+
+	if (!jh || *nr_to_scan == 0)
+		return 0;
+
+	last_jh = jh->b_cpprev;
+	do {
+		jh = next_jh;
+		next_jh = jh->b_cpnext;
+
+		(*nr_to_scan)--;
+		if (__cp_buffer_busy(jh))
+			continue;
+
+		nr_freed++;
+		ret = __jbd2_journal_remove_checkpoint(jh);
+		if (ret) {
+			*released = true;
+			break;
+		}
+
+		if (need_resched())
+			break;
+	} while (jh != last_jh && *nr_to_scan);
+
+	return nr_freed;
+}
+
+/*
+ * jbd2_journal_shrink_checkpoint_list
+ *
+ * Find 'nr_to_scan' written-back checkpoint buffers in the journal
+ * and try to release them. Return the number of released checkpointed
+ * buffers.
+ *
+ * Called with j_list_lock held.
+ */
+unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
+						  unsigned long *nr_to_scan)
+{
+	transaction_t *transaction, *last_transaction, *next_transaction;
+	bool released;
+	tid_t first_tid = 0, last_tid = 0, next_tid = 0;
+	tid_t tid = 0;
+	unsigned long nr_freed = 0;
+	unsigned long nr_scanned = *nr_to_scan;
+
+again:
+	spin_lock(&journal->j_list_lock);
+	if (!journal->j_checkpoint_transactions) {
+		spin_unlock(&journal->j_list_lock);
+		goto out;
+	}
+
+	/*
+	 * Get next shrink transaction, resume previous scan or start
+	 * over again. If some others do checkpoint and drop transaction
+	 * from the checkpoint list, we ignore saved j_shrink_transaction
+	 * and start over unconditionally.
+	 */
+	if (journal->j_shrink_transaction)
+		transaction = journal->j_shrink_transaction;
+	else
+		transaction = journal->j_checkpoint_transactions;
+
+	if (!first_tid)
+		first_tid = transaction->t_tid;
+	last_transaction = journal->j_checkpoint_transactions->t_cpprev;
+	next_transaction = transaction;
+	last_tid = last_transaction->t_tid;
+	do {
+		transaction = next_transaction;
+		next_transaction = transaction->t_cpnext;
+		tid = transaction->t_tid;
+		released = false;
+
+		nr_freed += journal_shrink_one_cp_list(transaction->t_checkpoint_list,
+						       nr_to_scan, &released);
+		if (*nr_to_scan == 0)
+			break;
+		if (need_resched() || spin_needbreak(&journal->j_list_lock))
+			break;
+		if (released)
+			continue;
+
+		nr_freed += journal_shrink_one_cp_list(transaction->t_checkpoint_io_list,
+						       nr_to_scan, &released);
+		if (*nr_to_scan == 0)
+			break;
+		if (need_resched() || spin_needbreak(&journal->j_list_lock))
+			break;
+	} while (transaction != last_transaction);
+
+	if (transaction != last_transaction) {
+		journal->j_shrink_transaction = next_transaction;
+		next_tid = next_transaction->t_tid;
+	} else {
+		journal->j_shrink_transaction = NULL;
+		next_tid = 0;
+	}
+
+	spin_unlock(&journal->j_list_lock);
+	cond_resched();
+
+	if (*nr_to_scan && next_tid)
+		goto again;
+out:
+	nr_scanned -= *nr_to_scan;
+	trace_jbd2_shrink_checkpoint_list(journal, first_tid, tid, last_tid,
+					  nr_freed, nr_scanned, next_tid);
+
+	return nr_freed;
+}
+
 /*
  * journal_clean_checkpoint_list
  *
@@ -584,6 +727,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 
 	__buffer_unlink(jh);
 	jh->b_cp_transaction = NULL;
+	percpu_counter_dec(&journal->j_jh_shrink_count);
 	jbd2_journal_put_journal_head(jh);
 
 	/* Is this transaction empty? */
@@ -646,6 +790,7 @@ void __jbd2_journal_insert_checkpoint(struct journal_head *jh,
 		jh->b_cpnext->b_cpprev = jh;
 	}
 	transaction->t_checkpoint_list = jh;
+	percpu_counter_inc(&transaction->t_journal->j_jh_shrink_count);
 }
 
 /*
@@ -661,6 +806,8 @@ void __jbd2_journal_insert_checkpoint(struct journal_head *jh,
 void __jbd2_journal_drop_transaction(journal_t *journal, transaction_t *transaction)
 {
 	assert_spin_locked(&journal->j_list_lock);
+
+	journal->j_shrink_transaction = NULL;
 	if (transaction->t_cpnext) {
 		transaction->t_cpnext->t_cpprev = transaction->t_cpprev;
 		transaction->t_cpprev->t_cpnext = transaction->t_cpnext;
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 90146755941f..cfad32036857 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1954,6 +1954,91 @@ int jbd2_journal_load(journal_t *journal)
 	return -EIO;
 }
 
+/**
+ * jbd2_journal_shrink_scan()
+ *
+ * Scan the checkpointed buffer on the checkpoint list and release the
+ * journal_head.
+ */
+unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
+				       struct shrink_control *sc)
+{
+	journal_t *journal = container_of(shrink, journal_t, j_shrinker);
+	unsigned long nr_to_scan = sc->nr_to_scan;
+	unsigned long nr_shrunk;
+	unsigned long count;
+
+	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
+	trace_jbd2_shrink_scan_enter(journal, sc->nr_to_scan, count);
+
+	nr_shrunk = jbd2_journal_shrink_checkpoint_list(journal, &nr_to_scan);
+
+	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
+	trace_jbd2_shrink_scan_exit(journal, nr_to_scan, nr_shrunk, count);
+
+	return nr_shrunk;
+}
+
+/**
+ * jbd2_journal_shrink_scan()
+ *
+ * Count the number of checkpoint buffers on the checkpoint list.
+ */
+unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
+					struct shrink_control *sc)
+{
+	journal_t *journal = container_of(shrink, journal_t, j_shrinker);
+	unsigned long count;
+
+	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
+	trace_jbd2_shrink_count(journal, sc->nr_to_scan, count);
+
+	return count;
+}
+
+/**
+ * jbd2_journal_register_shrinker()
+ * @journal: Journal to act on.
+ *
+ * Init a percpu counter to record the checkpointed buffers on the checkpoint
+ * list and register a shrinker to release their journal_head.
+ */
+int jbd2_journal_register_shrinker(journal_t *journal)
+{
+	int err;
+
+	journal->j_shrink_transaction = NULL;
+
+	err = percpu_counter_init(&journal->j_jh_shrink_count, 0, GFP_KERNEL);
+	if (err)
+		return err;
+
+	journal->j_shrinker.scan_objects = jbd2_journal_shrink_scan;
+	journal->j_shrinker.count_objects = jbd2_journal_shrink_count;
+	journal->j_shrinker.seeks = DEFAULT_SEEKS;
+	journal->j_shrinker.batch = journal->j_max_transaction_buffers;
+
+	err = register_shrinker(&journal->j_shrinker);
+	if (err) {
+		percpu_counter_destroy(&journal->j_jh_shrink_count);
+		return err;
+	}
+
+	return 0;
+}
+
+/**
+ * jbd2_journal_unregister_shrinker()
+ * @journal: Journal to act on.
+ *
+ * Unregister the checkpointed buffer shrinker and destroy the percpu counter.
+ */
+void jbd2_journal_unregister_shrinker(journal_t *journal)
+{
+	percpu_counter_destroy(&journal->j_jh_shrink_count);
+	unregister_shrinker(&journal->j_shrinker);
+}
+
 /**
  * jbd2_journal_destroy() - Release a journal_t structure.
  * @journal: Journal to act on.
@@ -2026,6 +2111,8 @@ int jbd2_journal_destroy(journal_t *journal)
 		brelse(journal->j_sb_buffer);
 	}
 
+	jbd2_journal_unregister_shrinker(journal);
+
 	if (journal->j_proc_entry)
 		jbd2_stats_proc_exit(journal);
 	iput(journal->j_inode);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f9b5e657b8f3..23578506215f 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -909,6 +909,29 @@ struct journal_s
 	 */
 	struct buffer_head	*j_chkpt_bhs[JBD2_NR_BATCH];
 
+	/**
+	 * @j_shrinker:
+	 *
+	 * Journal head shrinker, reclaim buffer's journal head which
+	 * has been written back.
+	 */
+	struct shrinker		j_shrinker;
+
+	/**
+	 * @j_jh_shrink_count:
+	 *
+	 * Number of journal buffers on the checkpoint list. [j_list_lock]
+	 */
+	struct percpu_counter	j_jh_shrink_count;
+
+	/**
+	 * @j_shrink_transaction:
+	 *
+	 * Record next transaction will shrink on the checkpoint list.
+	 * [j_list_lock]
+	 */
+	transaction_t		*j_shrink_transaction;
+
 	/**
 	 * @j_head:
 	 *
@@ -1418,6 +1441,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
 
 /* Checkpoint list management */
 void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
+unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
 int __jbd2_journal_remove_checkpoint(struct journal_head *);
 void jbd2_journal_destroy_checkpoint(journal_t *journal);
 void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
@@ -1528,6 +1552,8 @@ extern int	   jbd2_journal_set_features
 		   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern void	   jbd2_journal_clear_features
 		   (journal_t *, unsigned long, unsigned long, unsigned long);
+extern int	   jbd2_journal_register_shrinker(journal_t *journal);
+extern void	   jbd2_journal_unregister_shrinker(journal_t *journal);
 extern int	   jbd2_journal_load       (journal_t *journal);
 extern int	   jbd2_journal_destroy    (journal_t *);
 extern int	   jbd2_journal_recover    (journal_t *journal);
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index d16a32867f3a..a4dfe005983d 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -394,6 +394,107 @@ TRACE_EVENT(jbd2_lock_buffer_stall,
 		__entry->stall_ms)
 );
 
+DECLARE_EVENT_CLASS(jbd2_journal_shrink,
+
+	TP_PROTO(journal_t *journal, unsigned long nr_to_scan,
+		 unsigned long count),
+
+	TP_ARGS(journal, nr_to_scan, count),
+
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, nr_to_scan)
+		__field(unsigned long, count)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= journal->j_fs_dev->bd_dev;
+		__entry->nr_to_scan	= nr_to_scan;
+		__entry->count		= count;
+	),
+
+	TP_printk("dev %d,%d nr_to_scan %lu count %lu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->nr_to_scan, __entry->count)
+);
+
+DEFINE_EVENT(jbd2_journal_shrink, jbd2_shrink_count,
+
+	TP_PROTO(journal_t *journal, unsigned long nr_to_scan, unsigned long count),
+
+	TP_ARGS(journal, nr_to_scan, count)
+);
+
+DEFINE_EVENT(jbd2_journal_shrink, jbd2_shrink_scan_enter,
+
+	TP_PROTO(journal_t *journal, unsigned long nr_to_scan, unsigned long count),
+
+	TP_ARGS(journal, nr_to_scan, count)
+);
+
+TRACE_EVENT(jbd2_shrink_scan_exit,
+
+	TP_PROTO(journal_t *journal, unsigned long nr_to_scan,
+		 unsigned long nr_shrunk, unsigned long count),
+
+	TP_ARGS(journal, nr_to_scan, nr_shrunk, count),
+
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, nr_to_scan)
+		__field(unsigned long, nr_shrunk)
+		__field(unsigned long, count)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= journal->j_fs_dev->bd_dev;
+		__entry->nr_to_scan	= nr_to_scan;
+		__entry->nr_shrunk	= nr_shrunk;
+		__entry->count		= count;
+	),
+
+	TP_printk("dev %d,%d nr_to_scan %lu nr_shrunk %lu count %lu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->nr_to_scan, __entry->nr_shrunk,
+		  __entry->count)
+);
+
+TRACE_EVENT(jbd2_shrink_checkpoint_list,
+
+	TP_PROTO(journal_t *journal, tid_t first_tid, tid_t tid, tid_t last_tid,
+		 unsigned long nr_freed, unsigned long nr_scanned,
+		 tid_t next_tid),
+
+	TP_ARGS(journal, first_tid, tid, last_tid, nr_freed,
+		nr_scanned, next_tid),
+
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(tid_t, first_tid)
+		__field(tid_t, tid)
+		__field(tid_t, last_tid)
+		__field(unsigned long, nr_freed)
+		__field(unsigned long, nr_scanned)
+		__field(tid_t, next_tid)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= journal->j_fs_dev->bd_dev;
+		__entry->first_tid	= first_tid;
+		__entry->tid		= tid;
+		__entry->last_tid	= last_tid;
+		__entry->nr_freed	= nr_freed;
+		__entry->nr_scanned	= nr_scanned;
+		__entry->next_tid	= next_tid;
+	),
+
+	TP_printk("dev %d,%d shrink transaction %u-%u(%u) freed %lu "
+		  "scanned %lu next transaction %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->first_tid, __entry->tid, __entry->last_tid,
+		  __entry->nr_freed, __entry->nr_scanned, __entry->next_tid)
+);
+
 #endif /* _TRACE_JBD2_H */
 
 /* This part must be outside protection */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 6/8] jbd2: simplify journal_clean_one_cp_list()
  2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
                   ` (4 preceding siblings ...)
  2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
@ 2021-05-27 13:56 ` Zhang Yi
  2021-06-07 15:50   ` Jan Kara
  2021-05-27 13:56 ` [RFC PATCH v3 7/8] ext4: remove bdev_try_to_free_page() callback Zhang Yi
  2021-05-27 13:56 ` [RFC PATCH v3 8/8] fs: remove bdev_try_to_free_page callback Zhang Yi
  7 siblings, 1 reply; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Now that __try_to_free_cp_buf() remove checkpointed buffer or transaction
when the buffer is not 'busy', which is only called by
journal_clean_one_cp_list(). This patch simplify this function by remove
__try_to_free_cp_buf() and invoke __cp_buffer_busy() directly.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/checkpoint.c | 30 ++++--------------------------
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 727389185d24..7dea46cc7099 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -91,25 +91,6 @@ static inline bool __cp_buffer_busy(struct journal_head *jh)
 	return (jh->b_transaction || buffer_locked(bh) || buffer_dirty(bh));
 }
 
-/*
- * Try to release a checkpointed buffer from its transaction.
- * Returns 1 if we released it and 2 if we also released the
- * whole transaction.
- *
- * Requires j_list_lock
- */
-static int __try_to_free_cp_buf(struct journal_head *jh)
-{
-	int ret = 0;
-	struct buffer_head *bh = jh2bh(jh);
-
-	if (!jh->b_transaction && !buffer_locked(bh) && !buffer_dirty(bh)) {
-		JBUFFER_TRACE(jh, "remove from checkpoint list");
-		ret = __jbd2_journal_remove_checkpoint(jh) + 1;
-	}
-	return ret;
-}
-
 /*
  * __jbd2_log_wait_for_space: wait until there is space in the journal.
  *
@@ -444,7 +425,6 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
 {
 	struct journal_head *last_jh;
 	struct journal_head *next_jh = jh;
-	int ret;
 
 	if (!jh)
 		return 0;
@@ -453,13 +433,11 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
 	do {
 		jh = next_jh;
 		next_jh = jh->b_cpnext;
-		if (!destroy)
-			ret = __try_to_free_cp_buf(jh);
-		else
-			ret = __jbd2_journal_remove_checkpoint(jh) + 1;
-		if (!ret)
+
+		if (!destroy && __cp_buffer_busy(jh))
 			return 0;
-		if (ret == 2)
+
+		if (__jbd2_journal_remove_checkpoint(jh))
 			return 1;
 		/*
 		 * This function only frees up some memory
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 7/8] ext4: remove bdev_try_to_free_page() callback
  2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
                   ` (5 preceding siblings ...)
  2021-05-27 13:56 ` [RFC PATCH v3 6/8] jbd2: simplify journal_clean_one_cp_list() Zhang Yi
@ 2021-05-27 13:56 ` Zhang Yi
  2021-06-07 15:50   ` Jan Kara
  2021-05-27 13:56 ` [RFC PATCH v3 8/8] fs: remove bdev_try_to_free_page callback Zhang Yi
  7 siblings, 1 reply; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

After we introduce a jbd2 shrinker to release checkpointed buffer's
journal head, we could free buffer without bdev_try_to_free_page()
under memory pressure. So this patch remove the whole
bdev_try_to_free_page() callback directly. It also remove many
use-after-free issues relate to it together.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/super.c | 21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index bf6d0085e1b7..b778236d06e6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1442,26 +1442,6 @@ static int ext4_nfs_commit_metadata(struct inode *inode)
 	return ext4_write_inode(inode, &wbc);
 }
 
-/*
- * Try to release metadata pages (indirect blocks, directories) which are
- * mapped via the block device.  Since these pages could have journal heads
- * which would prevent try_to_free_buffers() from freeing them, we must use
- * jbd2 layer's try_to_free_buffers() function to release them.
- */
-static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
-				 gfp_t wait)
-{
-	journal_t *journal = EXT4_SB(sb)->s_journal;
-
-	WARN_ON(PageChecked(page));
-	if (!page_has_buffers(page))
-		return 0;
-	if (journal)
-		return jbd2_journal_try_to_free_buffers(journal, page);
-
-	return try_to_free_buffers(page);
-}
-
 #ifdef CONFIG_FS_ENCRYPTION
 static int ext4_get_context(struct inode *inode, void *ctx, size_t len)
 {
@@ -1656,7 +1636,6 @@ static const struct super_operations ext4_sops = {
 	.quota_write	= ext4_quota_write,
 	.get_dquots	= ext4_get_dquots,
 #endif
-	.bdev_try_to_free_page = bdev_try_to_free_page,
 };
 
 static const struct export_operations ext4_export_ops = {
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH v3 8/8] fs: remove bdev_try_to_free_page callback
  2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
                   ` (6 preceding siblings ...)
  2021-05-27 13:56 ` [RFC PATCH v3 7/8] ext4: remove bdev_try_to_free_page() callback Zhang Yi
@ 2021-05-27 13:56 ` Zhang Yi
  2021-06-07 15:50   ` Jan Kara
  7 siblings, 1 reply; 22+ messages in thread
From: Zhang Yi @ 2021-05-27 13:56 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

After remove the unique user of sop->bdev_try_to_free_page() callback,
we could remove the callback and the corresponding blkdev_releasepage()
at all.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/block_dev.c     | 15 ---------------
 include/linux/fs.h |  1 -
 2 files changed, 16 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6cc4d4cfe0c2..e215da6d49b4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1733,20 +1733,6 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 }
 EXPORT_SYMBOL_GPL(blkdev_read_iter);
 
-/*
- * Try to release a page associated with block device when the system
- * is under memory pressure.
- */
-static int blkdev_releasepage(struct page *page, gfp_t wait)
-{
-	struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super;
-
-	if (super && super->s_op->bdev_try_to_free_page)
-		return super->s_op->bdev_try_to_free_page(super, page, wait);
-
-	return try_to_free_buffers(page);
-}
-
 static int blkdev_writepages(struct address_space *mapping,
 			     struct writeback_control *wbc)
 {
@@ -1760,7 +1746,6 @@ static const struct address_space_operations def_blk_aops = {
 	.write_begin	= blkdev_write_begin,
 	.write_end	= blkdev_write_end,
 	.writepages	= blkdev_writepages,
-	.releasepage	= blkdev_releasepage,
 	.direct_IO	= blkdev_direct_IO,
 	.migratepage	= buffer_migrate_page_norefs,
 	.is_dirty_writeback = buffer_check_dirty_writeback,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..c3277b445f96 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2171,7 +2171,6 @@ struct super_operations {
 	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 	struct dquot **(*get_dquots)(struct inode *);
 #endif
-	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
 	long (*nr_cached_objects)(struct super_block *,
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 5/8] jbd2, ext4: add a shrinker to release checkpointed buffers
  2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
@ 2021-05-27 15:10   ` kernel test robot
  2021-05-27 15:41   ` kernel test robot
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2021-05-27 15:10 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4346 bytes --]

Hi Zhang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on ext4/dev]
[also build test WARNING on tip/perf/core block/for-next linus/master v5.13-rc3 next-20210527]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zhang-Yi/ext4-jbd2-fix-3-issues-about-bdev_try_to_free_page/20210527-215117
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
config: nds32-randconfig-r014-20210526 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/8e85274125ab168f6a594e0a6ac0db18e04a1ff2
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zhang-Yi/ext4-jbd2-fix-3-issues-about-bdev_try_to_free_page/20210527-215117
        git checkout 8e85274125ab168f6a594e0a6ac0db18e04a1ff2
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/jbd2/journal.c:1963:15: warning: no previous prototype for 'jbd2_journal_shrink_scan' [-Wmissing-prototypes]
    1963 | unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
         |               ^~~~~~~~~~~~~~~~~~~~~~~~
>> fs/jbd2/journal.c:1987:15: warning: no previous prototype for 'jbd2_journal_shrink_count' [-Wmissing-prototypes]
    1987 | unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
         |               ^~~~~~~~~~~~~~~~~~~~~~~~~
--
   fs/jbd2/journal.c:1965: warning: Function parameter or member 'shrink' not described in 'jbd2_journal_shrink_scan'
   fs/jbd2/journal.c:1965: warning: Function parameter or member 'sc' not described in 'jbd2_journal_shrink_scan'
   fs/jbd2/journal.c:1989: warning: Function parameter or member 'shrink' not described in 'jbd2_journal_shrink_count'
   fs/jbd2/journal.c:1989: warning: Function parameter or member 'sc' not described in 'jbd2_journal_shrink_count'
>> fs/jbd2/journal.c:1989: warning: expecting prototype for jbd2_journal_shrink_scan(). Prototype was for jbd2_journal_shrink_count() instead


vim +/jbd2_journal_shrink_scan +1963 fs/jbd2/journal.c

  1956	
  1957	/**
  1958	 * jbd2_journal_shrink_scan()
  1959	 *
  1960	 * Scan the checkpointed buffer on the checkpoint list and release the
  1961	 * journal_head.
  1962	 */
> 1963	unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
  1964					       struct shrink_control *sc)
  1965	{
  1966		journal_t *journal = container_of(shrink, journal_t, j_shrinker);
  1967		unsigned long nr_to_scan = sc->nr_to_scan;
  1968		unsigned long nr_shrunk;
  1969		unsigned long count;
  1970	
  1971		count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
  1972		trace_jbd2_shrink_scan_enter(journal, sc->nr_to_scan, count);
  1973	
  1974		nr_shrunk = jbd2_journal_shrink_checkpoint_list(journal, &nr_to_scan);
  1975	
  1976		count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
  1977		trace_jbd2_shrink_scan_exit(journal, nr_to_scan, nr_shrunk, count);
  1978	
  1979		return nr_shrunk;
  1980	}
  1981	
  1982	/**
  1983	 * jbd2_journal_shrink_scan()
  1984	 *
  1985	 * Count the number of checkpoint buffers on the checkpoint list.
  1986	 */
> 1987	unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
  1988						struct shrink_control *sc)
> 1989	{
  1990		journal_t *journal = container_of(shrink, journal_t, j_shrinker);
  1991		unsigned long count;
  1992	
  1993		count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
  1994		trace_jbd2_shrink_count(journal, sc->nr_to_scan, count);
  1995	
  1996		return count;
  1997	}
  1998	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 25120 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 5/8] jbd2, ext4: add a shrinker to release checkpointed buffers
  2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
  2021-05-27 15:10   ` [RFC PATCH v3 5/8] jbd2, ext4: " kernel test robot
@ 2021-05-27 15:41   ` kernel test robot
  2021-05-27 18:25   ` kernel test robot
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2021-05-27 15:41 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4322 bytes --]

Hi Zhang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on ext4/dev]
[also build test WARNING on tip/perf/core block/for-next linus/master v5.13-rc3 next-20210527]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zhang-Yi/ext4-jbd2-fix-3-issues-about-bdev_try_to_free_page/20210527-215117
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
config: powerpc64-randconfig-r016-20210526 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 6505c630407c5feec818f0bb1c284f9eeebf2071)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc64 cross compiling tool for clang build
        # apt-get install binutils-powerpc64-linux-gnu
        # https://github.com/0day-ci/linux/commit/8e85274125ab168f6a594e0a6ac0db18e04a1ff2
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zhang-Yi/ext4-jbd2-fix-3-issues-about-bdev_try_to_free_page/20210527-215117
        git checkout 8e85274125ab168f6a594e0a6ac0db18e04a1ff2
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/jbd2/journal.c:1963:15: warning: no previous prototype for function 'jbd2_journal_shrink_scan' [-Wmissing-prototypes]
   unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
                 ^
   fs/jbd2/journal.c:1963:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
   ^
   static 
>> fs/jbd2/journal.c:1987:15: warning: no previous prototype for function 'jbd2_journal_shrink_count' [-Wmissing-prototypes]
   unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
                 ^
   fs/jbd2/journal.c:1987:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
   ^
   static 
   2 warnings generated.


vim +/jbd2_journal_shrink_scan +1963 fs/jbd2/journal.c

  1956	
  1957	/**
  1958	 * jbd2_journal_shrink_scan()
  1959	 *
  1960	 * Scan the checkpointed buffer on the checkpoint list and release the
  1961	 * journal_head.
  1962	 */
> 1963	unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
  1964					       struct shrink_control *sc)
  1965	{
  1966		journal_t *journal = container_of(shrink, journal_t, j_shrinker);
  1967		unsigned long nr_to_scan = sc->nr_to_scan;
  1968		unsigned long nr_shrunk;
  1969		unsigned long count;
  1970	
  1971		count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
  1972		trace_jbd2_shrink_scan_enter(journal, sc->nr_to_scan, count);
  1973	
  1974		nr_shrunk = jbd2_journal_shrink_checkpoint_list(journal, &nr_to_scan);
  1975	
  1976		count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
  1977		trace_jbd2_shrink_scan_exit(journal, nr_to_scan, nr_shrunk, count);
  1978	
  1979		return nr_shrunk;
  1980	}
  1981	
  1982	/**
  1983	 * jbd2_journal_shrink_scan()
  1984	 *
  1985	 * Count the number of checkpoint buffers on the checkpoint list.
  1986	 */
> 1987	unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
  1988						struct shrink_control *sc)
  1989	{
  1990		journal_t *journal = container_of(shrink, journal_t, j_shrinker);
  1991		unsigned long count;
  1992	
  1993		count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
  1994		trace_jbd2_shrink_count(journal, sc->nr_to_scan, count);
  1995	
  1996		return count;
  1997	}
  1998	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 34451 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 5/8] jbd2, ext4: add a shrinker to release checkpointed buffers
  2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
  2021-05-27 15:10   ` [RFC PATCH v3 5/8] jbd2, ext4: " kernel test robot
  2021-05-27 15:41   ` kernel test robot
@ 2021-05-27 18:25   ` kernel test robot
  2021-05-27 18:25   ` [RFC PATCH] jbd2, ext4: jbd2_journal_shrink_scan() can be static kernel test robot
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2021-05-27 18:25 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1860 bytes --]

Hi Zhang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on ext4/dev]
[also build test WARNING on tip/perf/core block/for-next linus/master v5.13-rc3 next-20210527]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zhang-Yi/ext4-jbd2-fix-3-issues-about-bdev_try_to_free_page/20210527-215117
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
config: x86_64-randconfig-s021-20210527 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/8e85274125ab168f6a594e0a6ac0db18e04a1ff2
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zhang-Yi/ext4-jbd2-fix-3-issues-about-bdev_try_to_free_page/20210527-215117
        git checkout 8e85274125ab168f6a594e0a6ac0db18e04a1ff2
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> fs/jbd2/journal.c:1963:15: sparse: sparse: symbol 'jbd2_journal_shrink_scan' was not declared. Should it be static?
>> fs/jbd2/journal.c:1987:15: sparse: sparse: symbol 'jbd2_journal_shrink_count' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33701 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH] jbd2, ext4: jbd2_journal_shrink_scan() can be static
  2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
                     ` (2 preceding siblings ...)
  2021-05-27 18:25   ` kernel test robot
@ 2021-05-27 18:25   ` kernel test robot
  2021-06-07 15:45   ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Jan Kara
  2021-06-09  3:00   ` Dave Chinner
  5 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2021-05-27 18:25 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1529 bytes --]

fs/jbd2/journal.c:1963:15: warning: symbol 'jbd2_journal_shrink_scan' was not declared. Should it be static?
fs/jbd2/journal.c:1987:15: warning: symbol 'jbd2_journal_shrink_count' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---
 journal.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index cfad32036857e..491c311dabec1 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1960,8 +1960,8 @@ int jbd2_journal_load(journal_t *journal)
  * Scan the checkpointed buffer on the checkpoint list and release the
  * journal_head.
  */
-unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
-				       struct shrink_control *sc)
+static unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
+					      struct shrink_control *sc)
 {
 	journal_t *journal = container_of(shrink, journal_t, j_shrinker);
 	unsigned long nr_to_scan = sc->nr_to_scan;
@@ -1984,8 +1984,8 @@ unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
  *
  * Count the number of checkpoint buffers on the checkpoint list.
  */
-unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
-					struct shrink_control *sc)
+static unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
+					       struct shrink_control *sc)
 {
 	journal_t *journal = container_of(shrink, journal_t, j_shrinker);
 	unsigned long count;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 2/8] jbd2: ensure abort the journal if detect IO error when writing original buffer back
  2021-05-27 13:56 ` [RFC PATCH v3 2/8] jbd2: ensure abort the journal if detect IO error when writing original buffer back Zhang Yi
@ 2021-06-03 16:21   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2021-06-03 16:21 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu 27-05-21 21:56:35, Zhang Yi wrote:
> Although we merged c044f3d8360 ("jbd2: abort journal if free a async
> write error metadata buffer"), there is a race between
> jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the
> jbd2_log_do_checkpoint() may still fail to detect the buffer write
> io error flag which may lead to filesystem inconsistency.
> 
> jbd2_journal_try_to_free_buffers()     ext4_put_super()
>                                         jbd2_journal_destroy()
>   __jbd2_journal_remove_checkpoint()
>   detect buffer write error              jbd2_log_do_checkpoint()
>                                          jbd2_cleanup_journal_tail()
>                                            <--- lead to inconsistency
>   jbd2_journal_abort()
> 
> Fix this issue by introducing a new atomic flag which only have one
> JBD2_CHECKPOINT_IO_ERROR bit now, and set it in
> __jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer
> which has write_io_error flag. Then jbd2_journal_destroy() will detect
> this mark and abort the journal to prevent updating log tail.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Just one spelling fix below. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> @@ -575,6 +576,17 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
>  	journal = transaction->t_journal;
>  
>  	JBUFFER_TRACE(jh, "removing from transaction");
> +
> +	/*
> +	 * If we have failed to write the buffer out to disk, the filesystem
> +	 * may become inconsistent. We cannot abort the journal here since
> +	 * we hold j_list_lock and we have to careful about races with
					   ^^^ to be careful ...

> +	 * jbd2_journal_destroy(). So mark the writeback IO error in the
> +	 * journal here and we abort the journal later from a better context.
> +	 */
> +	if (buffer_write_io_error(bh))
> +		set_bit(JBD2_CHECKPOINT_IO_ERROR, &journal->j_atomic_flags);
> +
>  	__buffer_unlink(jh);
>  	jh->b_cp_transaction = NULL;
>  	jbd2_journal_put_journal_head(jh);
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 4/8] jbd2: remove redundant buffer io error checks
  2021-05-27 13:56 ` [RFC PATCH v3 4/8] jbd2: remove redundant buffer io error checks Zhang Yi
@ 2021-06-03 16:28   ` Jan Kara
  2021-06-10  2:05     ` Zhang Yi
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2021-06-03 16:28 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu 27-05-21 21:56:37, Zhang Yi wrote:
> Now that __jbd2_journal_remove_checkpoint() can detect buffer io error
> and mark journal checkpoint error, then we abort the journal later
> before updating log tail to ensure the filesystem works consistently.
> So we could remove other redundant buffer io error checkes.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
>  fs/jbd2/checkpoint.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index 2cbac0e3cff3..c1f746a5cc1a 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -91,8 +91,7 @@ static int __try_to_free_cp_buf(struct journal_head *jh)
>  	int ret = 0;
>  	struct buffer_head *bh = jh2bh(jh);
>  
> -	if (jh->b_transaction == NULL && !buffer_locked(bh) &&
> -	    !buffer_dirty(bh) && !buffer_write_io_error(bh)) {
> +	if (!jh->b_transaction && !buffer_locked(bh) && !buffer_dirty(bh)) {
>  		JBUFFER_TRACE(jh, "remove from checkpoint list");
>  		ret = __jbd2_journal_remove_checkpoint(jh) + 1;
>  	}
> @@ -295,8 +294,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>  			goto restart;
>  		}
>  		if (!buffer_dirty(bh)) {
> -			if (unlikely(buffer_write_io_error(bh)) && !result)
> -				result = -EIO;
>  			BUFFER_TRACE(bh, "remove from checkpoint");
>  			if (__jbd2_journal_remove_checkpoint(jh))
>  				/* The transaction was released; we're done */
> @@ -356,8 +353,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>  			spin_lock(&journal->j_list_lock);
>  			goto restart2;
>  		}
> -		if (unlikely(buffer_write_io_error(bh)) && !result)
> -			result = -EIO;
>  
>  		/*
>  		 * Now in whatever state the buffer currently is, we

You can also drop:

	if (result < 0)
                jbd2_journal_abort(journal, result);

in jbd2_log_do_checkpoint() as there's now nothing which can set 'result'
in the loops... Otherwise looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers
  2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
                     ` (3 preceding siblings ...)
  2021-05-27 18:25   ` [RFC PATCH] jbd2, ext4: jbd2_journal_shrink_scan() can be static kernel test robot
@ 2021-06-07 15:45   ` Jan Kara
  2021-06-09  3:00   ` Dave Chinner
  5 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2021-06-07 15:45 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu 27-05-21 21:56:38, Zhang Yi wrote:
> Current metadata buffer release logic in bdev_try_to_free_page() have
> a lot of use-after-free issues when umount filesystem concurrently, and
> it is difficult to fix directly because ext4 is the only user of
> s_op->bdev_try_to_free_page callback and we may have to add more special
> refcount or lock that is only used by ext4 into the common vfs layer,
> which is unacceptable.
> 
> One better solution is remove the bdev_try_to_free_page callback, but
> the real problem is we cannot easily release journal_head on the
> checkpointed buffer, so try_to_free_buffers() cannot release buffers and
> page under memory pressure, which is more likely to trigger
> out-of-memory. So we cannot remove the callback directly before we find
> another way to release journal_head.
> 
> This patch introduce a shrinker to free journal_head on the checkpointed
> transaction. After the journal_head got freed, try_to_free_buffers()
> could free buffer properly.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> Suggested-by: Jan Kara <jack@suse.cz>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 6/8] jbd2: simplify journal_clean_one_cp_list()
  2021-05-27 13:56 ` [RFC PATCH v3 6/8] jbd2: simplify journal_clean_one_cp_list() Zhang Yi
@ 2021-06-07 15:50   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2021-06-07 15:50 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu 27-05-21 21:56:39, Zhang Yi wrote:
> Now that __try_to_free_cp_buf() remove checkpointed buffer or transaction
> when the buffer is not 'busy', which is only called by
> journal_clean_one_cp_list(). This patch simplify this function by remove
> __try_to_free_cp_buf() and invoke __cp_buffer_busy() directly.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  fs/jbd2/checkpoint.c | 30 ++++--------------------------
>  1 file changed, 4 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index 727389185d24..7dea46cc7099 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -91,25 +91,6 @@ static inline bool __cp_buffer_busy(struct journal_head *jh)
>  	return (jh->b_transaction || buffer_locked(bh) || buffer_dirty(bh));
>  }
>  
> -/*
> - * Try to release a checkpointed buffer from its transaction.
> - * Returns 1 if we released it and 2 if we also released the
> - * whole transaction.
> - *
> - * Requires j_list_lock
> - */
> -static int __try_to_free_cp_buf(struct journal_head *jh)
> -{
> -	int ret = 0;
> -	struct buffer_head *bh = jh2bh(jh);
> -
> -	if (!jh->b_transaction && !buffer_locked(bh) && !buffer_dirty(bh)) {
> -		JBUFFER_TRACE(jh, "remove from checkpoint list");
> -		ret = __jbd2_journal_remove_checkpoint(jh) + 1;
> -	}
> -	return ret;
> -}
> -
>  /*
>   * __jbd2_log_wait_for_space: wait until there is space in the journal.
>   *
> @@ -444,7 +425,6 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
>  {
>  	struct journal_head *last_jh;
>  	struct journal_head *next_jh = jh;
> -	int ret;
>  
>  	if (!jh)
>  		return 0;
> @@ -453,13 +433,11 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
>  	do {
>  		jh = next_jh;
>  		next_jh = jh->b_cpnext;
> -		if (!destroy)
> -			ret = __try_to_free_cp_buf(jh);
> -		else
> -			ret = __jbd2_journal_remove_checkpoint(jh) + 1;
> -		if (!ret)
> +
> +		if (!destroy && __cp_buffer_busy(jh))
>  			return 0;
> -		if (ret == 2)
> +
> +		if (__jbd2_journal_remove_checkpoint(jh))
>  			return 1;
>  		/*
>  		 * This function only frees up some memory
> -- 
> 2.25.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 7/8] ext4: remove bdev_try_to_free_page() callback
  2021-05-27 13:56 ` [RFC PATCH v3 7/8] ext4: remove bdev_try_to_free_page() callback Zhang Yi
@ 2021-06-07 15:50   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2021-06-07 15:50 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu 27-05-21 21:56:40, Zhang Yi wrote:
> After we introduce a jbd2 shrinker to release checkpointed buffer's
> journal head, we could free buffer without bdev_try_to_free_page()
> under memory pressure. So this patch remove the whole
> bdev_try_to_free_page() callback directly. It also remove many
> use-after-free issues relate to it together.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/super.c | 21 ---------------------
>  1 file changed, 21 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index bf6d0085e1b7..b778236d06e6 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1442,26 +1442,6 @@ static int ext4_nfs_commit_metadata(struct inode *inode)
>  	return ext4_write_inode(inode, &wbc);
>  }
>  
> -/*
> - * Try to release metadata pages (indirect blocks, directories) which are
> - * mapped via the block device.  Since these pages could have journal heads
> - * which would prevent try_to_free_buffers() from freeing them, we must use
> - * jbd2 layer's try_to_free_buffers() function to release them.
> - */
> -static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
> -				 gfp_t wait)
> -{
> -	journal_t *journal = EXT4_SB(sb)->s_journal;
> -
> -	WARN_ON(PageChecked(page));
> -	if (!page_has_buffers(page))
> -		return 0;
> -	if (journal)
> -		return jbd2_journal_try_to_free_buffers(journal, page);
> -
> -	return try_to_free_buffers(page);
> -}
> -
>  #ifdef CONFIG_FS_ENCRYPTION
>  static int ext4_get_context(struct inode *inode, void *ctx, size_t len)
>  {
> @@ -1656,7 +1636,6 @@ static const struct super_operations ext4_sops = {
>  	.quota_write	= ext4_quota_write,
>  	.get_dquots	= ext4_get_dquots,
>  #endif
> -	.bdev_try_to_free_page = bdev_try_to_free_page,
>  };
>  
>  static const struct export_operations ext4_export_ops = {
> -- 
> 2.25.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 8/8] fs: remove bdev_try_to_free_page callback
  2021-05-27 13:56 ` [RFC PATCH v3 8/8] fs: remove bdev_try_to_free_page callback Zhang Yi
@ 2021-06-07 15:50   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2021-06-07 15:50 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu 27-05-21 21:56:41, Zhang Yi wrote:
> After remove the unique user of sop->bdev_try_to_free_page() callback,
> we could remove the callback and the corresponding blkdev_releasepage()
> at all.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/block_dev.c     | 15 ---------------
>  include/linux/fs.h |  1 -
>  2 files changed, 16 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 6cc4d4cfe0c2..e215da6d49b4 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1733,20 +1733,6 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  }
>  EXPORT_SYMBOL_GPL(blkdev_read_iter);
>  
> -/*
> - * Try to release a page associated with block device when the system
> - * is under memory pressure.
> - */
> -static int blkdev_releasepage(struct page *page, gfp_t wait)
> -{
> -	struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super;
> -
> -	if (super && super->s_op->bdev_try_to_free_page)
> -		return super->s_op->bdev_try_to_free_page(super, page, wait);
> -
> -	return try_to_free_buffers(page);
> -}
> -
>  static int blkdev_writepages(struct address_space *mapping,
>  			     struct writeback_control *wbc)
>  {
> @@ -1760,7 +1746,6 @@ static const struct address_space_operations def_blk_aops = {
>  	.write_begin	= blkdev_write_begin,
>  	.write_end	= blkdev_write_end,
>  	.writepages	= blkdev_writepages,
> -	.releasepage	= blkdev_releasepage,
>  	.direct_IO	= blkdev_direct_IO,
>  	.migratepage	= buffer_migrate_page_norefs,
>  	.is_dirty_writeback = buffer_check_dirty_writeback,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c3c88fdb9b2a..c3277b445f96 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2171,7 +2171,6 @@ struct super_operations {
>  	ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
>  	struct dquot **(*get_dquots)(struct inode *);
>  #endif
> -	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
>  	long (*nr_cached_objects)(struct super_block *,
>  				  struct shrink_control *);
>  	long (*free_cached_objects)(struct super_block *,
> -- 
> 2.25.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers
  2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
                     ` (4 preceding siblings ...)
  2021-06-07 15:45   ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Jan Kara
@ 2021-06-09  3:00   ` Dave Chinner
  2021-06-10  2:38     ` Zhang Yi
  5 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2021-06-09  3:00 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu, May 27, 2021 at 09:56:38PM +0800, Zhang Yi wrote:
> +/**
> + * jbd2_journal_shrink_scan()
> + *
> + * Scan the checkpointed buffer on the checkpoint list and release the
> + * journal_head.
> + */
> +unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
> +				       struct shrink_control *sc)
> +{
> +	journal_t *journal = container_of(shrink, journal_t, j_shrinker);
> +	unsigned long nr_to_scan = sc->nr_to_scan;
> +	unsigned long nr_shrunk;
> +	unsigned long count;
> +
> +	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
> +	trace_jbd2_shrink_scan_enter(journal, sc->nr_to_scan, count);
> +
> +	nr_shrunk = jbd2_journal_shrink_checkpoint_list(journal, &nr_to_scan);
> +
> +	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
> +	trace_jbd2_shrink_scan_exit(journal, nr_to_scan, nr_shrunk, count);
> +
> +	return nr_shrunk;
> +}
> +
> +/**
> + * jbd2_journal_shrink_scan()
> + *
> + * Count the number of checkpoint buffers on the checkpoint list.
> + */
> +unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
> +					struct shrink_control *sc)
> +{
> +	journal_t *journal = container_of(shrink, journal_t, j_shrinker);
> +	unsigned long count;
> +
> +	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
> +	trace_jbd2_shrink_count(journal, sc->nr_to_scan, count);
> +
> +	return count;
> +}

NACK.

You should not be putting an expensive percpu counter sum in a
shrinker object count routine. These gets called a *lot* under
memory pressure, and summing over hundreds/thousands of CPUs on
every direct reclaim instance over every mounted filesystem
instance that uses jbd2 is really, really going to hurt system
performance under memory pressure.

And, quite frankly, summing twice in the scan routine just to trace
the result with no other purpose is unnecessary and excessive CPU
overhead for a shrinker.

Just use percpu_counter_read() in all cases here - it is more than
accurate enough for the purposes of both tracing and memory reclaim.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 4/8] jbd2: remove redundant buffer io error checks
  2021-06-03 16:28   ` Jan Kara
@ 2021-06-10  2:05     ` Zhang Yi
  0 siblings, 0 replies; 22+ messages in thread
From: Zhang Yi @ 2021-06-10  2:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, tytso, adilger.kernel, yukuai3

On 2021/6/4 0:28, Jan Kara wrote:
> On Thu 27-05-21 21:56:37, Zhang Yi wrote:
>> Now that __jbd2_journal_remove_checkpoint() can detect buffer io error
>> and mark journal checkpoint error, then we abort the journal later
>> before updating log tail to ensure the filesystem works consistently.
>> So we could remove other redundant buffer io error checkes.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>>  fs/jbd2/checkpoint.c | 7 +------
>>  1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
>> index 2cbac0e3cff3..c1f746a5cc1a 100644
>> --- a/fs/jbd2/checkpoint.c
>> +++ b/fs/jbd2/checkpoint.c
>> @@ -91,8 +91,7 @@ static int __try_to_free_cp_buf(struct journal_head *jh)
>>  	int ret = 0;
>>  	struct buffer_head *bh = jh2bh(jh);
>>  
>> -	if (jh->b_transaction == NULL && !buffer_locked(bh) &&
>> -	    !buffer_dirty(bh) && !buffer_write_io_error(bh)) {
>> +	if (!jh->b_transaction && !buffer_locked(bh) && !buffer_dirty(bh)) {
>>  		JBUFFER_TRACE(jh, "remove from checkpoint list");
>>  		ret = __jbd2_journal_remove_checkpoint(jh) + 1;
>>  	}
>> @@ -295,8 +294,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>>  			goto restart;
>>  		}
>>  		if (!buffer_dirty(bh)) {
>> -			if (unlikely(buffer_write_io_error(bh)) && !result)
>> -				result = -EIO;
>>  			BUFFER_TRACE(bh, "remove from checkpoint");
>>  			if (__jbd2_journal_remove_checkpoint(jh))
>>  				/* The transaction was released; we're done */
>> @@ -356,8 +353,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
>>  			spin_lock(&journal->j_list_lock);
>>  			goto restart2;
>>  		}
>> -		if (unlikely(buffer_write_io_error(bh)) && !result)
>> -			result = -EIO;
>>  
>>  		/*
>>  		 * Now in whatever state the buffer currently is, we
> 
> You can also drop:
> 
> 	if (result < 0)
>                 jbd2_journal_abort(journal, result);
> 
> in jbd2_log_do_checkpoint() as there's now nothing which can set 'result'
> in the loops... Otherwise looks good. Feel free to add:
> 

Yes, I will remove it in the next iteration, thanks for the review.

Thanks,
Yi.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers
  2021-06-09  3:00   ` Dave Chinner
@ 2021-06-10  2:38     ` Zhang Yi
  0 siblings, 0 replies; 22+ messages in thread
From: Zhang Yi @ 2021-06-10  2:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

Hi, Dave

On 2021/6/9 11:00, Dave Chinner wrote:
> On Thu, May 27, 2021 at 09:56:38PM +0800, Zhang Yi wrote:
>> +/**
>> + * jbd2_journal_shrink_scan()
>> + *
>> + * Scan the checkpointed buffer on the checkpoint list and release the
>> + * journal_head.
>> + */
>> +unsigned long jbd2_journal_shrink_scan(struct shrinker *shrink,
>> +				       struct shrink_control *sc)
>> +{
>> +	journal_t *journal = container_of(shrink, journal_t, j_shrinker);
>> +	unsigned long nr_to_scan = sc->nr_to_scan;
>> +	unsigned long nr_shrunk;
>> +	unsigned long count;
>> +
>> +	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
>> +	trace_jbd2_shrink_scan_enter(journal, sc->nr_to_scan, count);
>> +
>> +	nr_shrunk = jbd2_journal_shrink_checkpoint_list(journal, &nr_to_scan);
>> +
>> +	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
>> +	trace_jbd2_shrink_scan_exit(journal, nr_to_scan, nr_shrunk, count);
>> +
>> +	return nr_shrunk;
>> +}
>> +
>> +/**
>> + * jbd2_journal_shrink_scan()
>> + *
>> + * Count the number of checkpoint buffers on the checkpoint list.
>> + */
>> +unsigned long jbd2_journal_shrink_count(struct shrinker *shrink,
>> +					struct shrink_control *sc)
>> +{
>> +	journal_t *journal = container_of(shrink, journal_t, j_shrinker);
>> +	unsigned long count;
>> +
>> +	count = percpu_counter_sum_positive(&journal->j_jh_shrink_count);
>> +	trace_jbd2_shrink_count(journal, sc->nr_to_scan, count);
>> +
>> +	return count;
>> +}
> 
> NACK.
> 
> You should not be putting an expensive percpu counter sum in a
> shrinker object count routine. These gets called a *lot* under
> memory pressure, and summing over hundreds/thousands of CPUs on
> every direct reclaim instance over every mounted filesystem
> instance that uses jbd2 is really, really going to hurt system
> performance under memory pressure.
> 
> And, quite frankly, summing twice in the scan routine just to trace
> the result with no other purpose is unnecessary and excessive CPU
> overhead for a shrinker.
> 
> Just use percpu_counter_read() in all cases here - it is more than
> accurate enough for the purposes of both tracing and memory reclaim.

OK, I missed this point, thanks for the comments, I will use
percpu_counter_read_positive() in the next iteration.

Thanks,
Yi.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-06-10  2:38 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-27 13:56 [RFC PATCH v3 0/8] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
2021-05-27 13:56 ` [RFC PATCH v3 1/8] jbd2: remove the out label in __jbd2_journal_remove_checkpoint() Zhang Yi
2021-05-27 13:56 ` [RFC PATCH v3 2/8] jbd2: ensure abort the journal if detect IO error when writing original buffer back Zhang Yi
2021-06-03 16:21   ` Jan Kara
2021-05-27 13:56 ` [RFC PATCH v3 3/8] jbd2: don't abort the journal when freeing buffers Zhang Yi
2021-05-27 13:56 ` [RFC PATCH v3 4/8] jbd2: remove redundant buffer io error checks Zhang Yi
2021-06-03 16:28   ` Jan Kara
2021-06-10  2:05     ` Zhang Yi
2021-05-27 13:56 ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Zhang Yi
2021-05-27 15:10   ` [RFC PATCH v3 5/8] jbd2, ext4: " kernel test robot
2021-05-27 15:41   ` kernel test robot
2021-05-27 18:25   ` kernel test robot
2021-05-27 18:25   ` [RFC PATCH] jbd2, ext4: jbd2_journal_shrink_scan() can be static kernel test robot
2021-06-07 15:45   ` [RFC PATCH v3 5/8] jbd2,ext4: add a shrinker to release checkpointed buffers Jan Kara
2021-06-09  3:00   ` Dave Chinner
2021-06-10  2:38     ` Zhang Yi
2021-05-27 13:56 ` [RFC PATCH v3 6/8] jbd2: simplify journal_clean_one_cp_list() Zhang Yi
2021-06-07 15:50   ` Jan Kara
2021-05-27 13:56 ` [RFC PATCH v3 7/8] ext4: remove bdev_try_to_free_page() callback Zhang Yi
2021-06-07 15:50   ` Jan Kara
2021-05-27 13:56 ` [RFC PATCH v3 8/8] fs: remove bdev_try_to_free_page callback Zhang Yi
2021-06-07 15:50   ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.