[PATCH AUTOSEL 5.4 026/459] ext4: fix ext4_dax_read/write inode locking sequence for IOCB

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 5.4 026/459] ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT
       [not found] <20200214160149.11681-1-sashal@kernel.org>
@ 2020-02-14 15:54 ` Sasha Levin
  2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 073/459] jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal Sasha Levin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-02-14 15:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ritesh Harjani, Jan Kara, Matthew Bobrowski, Joseph Qi,
	Theodore Ts'o, Sasha Levin, linux-ext4

From: Ritesh Harjani <riteshh@linux.ibm.com>

[ Upstream commit f629afe3369e9885fd6e9cc7a4f514b6a65cf9e9 ]

Apparently our current rwsem code doesn't like doing the trylock, then
lock for real scheme.  So change our dax read/write methods to just do the
trylock for the RWF_NOWAIT case.
This seems to fix AIM7 regression in some scalable filesystems upto ~25%
in some cases. Claimed in commit 942491c9e6d6 ("xfs: fix AIM7 regression")

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20191212055557.11151-2-riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/file.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8d2bbcc2d8133..fd7ce3573a00a 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -40,9 +40,10 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	struct inode *inode = file_inode(iocb->ki_filp);
 	ssize_t ret;
 
-	if (!inode_trylock_shared(inode)) {
-		if (iocb->ki_flags & IOCB_NOWAIT)
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!inode_trylock_shared(inode))
 			return -EAGAIN;
+	} else {
 		inode_lock_shared(inode);
 	}
 	/*
@@ -190,9 +191,10 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *inode = file_inode(iocb->ki_filp);
 	ssize_t ret;
 
-	if (!inode_trylock(inode)) {
-		if (iocb->ki_flags & IOCB_NOWAIT)
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!inode_trylock(inode))
 			return -EAGAIN;
+	} else {
 		inode_lock(inode);
 	}
 	ret = ext4_write_checks(iocb, from);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.4 073/459] jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
       [not found] <20200214160149.11681-1-sashal@kernel.org>
  2020-02-14 15:54 ` [PATCH AUTOSEL 5.4 026/459] ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT Sasha Levin
@ 2020-02-14 15:55 ` Sasha Levin
  2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 074/459] ext4: fix deadlock allocating bio_post_read_ctx from mempool Sasha Levin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-02-14 15:55 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Kai Li, Theodore Ts'o, Sasha Levin, linux-ext4

From: Kai Li <li.kai4@h3c.com>

[ Upstream commit a09decff5c32060639a685581c380f51b14e1fc2 ]

If the journal is dirty when the filesystem is mounted, jbd2 will replay
the journal but the journal superblock will not be updated by
journal_reset() because JBD2_ABORT flag is still set (it was set in
journal_init_common()). This is problematic because when a new transaction
is then committed, it will be recorded in block 1 (journal->j_tail was set
to 1 in journal_reset()). If unclean shutdown happens again before the
journal superblock is updated, the new recorded transaction will not be
replayed during the next mount (because of stale sb->s_start and
sb->s_sequence values) which can lead to filesystem corruption.

Fixes: 85e0c4e89c1b ("jbd2: if the journal is aborted then don't allow update of the log tail")
Signed-off-by: Kai Li <li.kai4@h3c.com>
Link: https://lore.kernel.org/r/20200111022542.5008-1-li.kai4@h3c.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/jbd2/journal.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index ef485f892d1b0..389c9be4e7919 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1682,6 +1682,11 @@ int jbd2_journal_load(journal_t *journal)
 		       journal->j_devname);
 		return -EFSCORRUPTED;
 	}
+	/*
+	 * clear JBD2_ABORT flag initialized in journal_init_common
+	 * here to update log tail information with the newest seq.
+	 */
+	journal->j_flags &= ~JBD2_ABORT;
 
 	/* OK, we've finished with the dynamic journal bits:
 	 * reinitialise the dynamic contents of the superblock in memory
@@ -1689,7 +1694,6 @@ int jbd2_journal_load(journal_t *journal)
 	if (journal_reset(journal))
 		goto recovery_error;
 
-	journal->j_flags &= ~JBD2_ABORT;
 	journal->j_flags |= JBD2_LOADED;
 	return 0;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.4 074/459] ext4: fix deadlock allocating bio_post_read_ctx from mempool
       [not found] <20200214160149.11681-1-sashal@kernel.org>
  2020-02-14 15:54 ` [PATCH AUTOSEL 5.4 026/459] ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT Sasha Levin
  2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 073/459] jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal Sasha Levin
@ 2020-02-14 15:55 ` Sasha Levin
  2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 091/459] ext4, jbd2: ensure panic when aborting with zero errno Sasha Levin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-02-14 15:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Eric Biggers, Theodore Ts'o, Sasha Levin, linux-ext4

From: Eric Biggers <ebiggers@google.com>

[ Upstream commit 68e45330e341dad2d3a0a3f8ef2ec46a2a0a3bbc ]

Without any form of coordination, any case where multiple allocations
from the same mempool are needed at a time to make forward progress can
deadlock under memory pressure.

This is the case for struct bio_post_read_ctx, as one can be allocated
to decrypt a Merkle tree page during fsverity_verify_bio(), which itself
is running from a post-read callback for a data bio which has its own
struct bio_post_read_ctx.

Fix this by freeing the first bio_post_read_ctx before calling
fsverity_verify_bio().  This works because verity (if enabled) is always
the last post-read step.

This deadlock can be reproduced by trying to read from an encrypted
verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching
mempool_alloc() to pretend that pool->alloc() always fails.

Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, to actually
hit this bug in practice would require reading from lots of encrypted
verity files at the same time.  But it's theoretically possible, as N
available objects isn't enough to guarantee forward progress when > N/2
threads each need 2 objects at a time.

Fixes: 22cfe4b48ccb ("ext4: add fs-verity read support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20191231181222.47684-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/readpage.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index a30b203fa461c..a5f55fece9b04 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -57,6 +57,7 @@ enum bio_post_read_step {
 	STEP_INITIAL = 0,
 	STEP_DECRYPT,
 	STEP_VERITY,
+	STEP_MAX,
 };

 struct bio_post_read_ctx {
@@ -106,10 +107,22 @@ static void verity_work(struct work_struct *work)
 {
 	struct bio_post_read_ctx *ctx =
 		container_of(work, struct bio_post_read_ctx, work);
+	struct bio *bio = ctx->bio;

-	fsverity_verify_bio(ctx->bio);
+	/*
+	 * fsverity_verify_bio() may call readpages() again, and although verity
+	 * will be disabled for that, decryption may still be needed, causing
+	 * another bio_post_read_ctx to be allocated.  So to guarantee that
+	 * mempool_alloc() never deadlocks we must free the current ctx first.
+	 * This is safe because verity is the last post-read step.
+	 */
+	BUILD_BUG_ON(STEP_VERITY + 1 != STEP_MAX);
+	mempool_free(ctx, bio_post_read_ctx_pool);
+	bio->bi_private = NULL;

-	bio_post_read_processing(ctx);
+	fsverity_verify_bio(bio);
+
+	__read_end_io(bio);
 }

 static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.4 091/459] ext4, jbd2: ensure panic when aborting with zero errno
       [not found] <20200214160149.11681-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 074/459] ext4: fix deadlock allocating bio_post_read_ctx from mempool Sasha Levin
@ 2020-02-14 15:55 ` Sasha Levin
  2020-02-14 16:00 ` [PATCH AUTOSEL 5.4 398/459] jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record Sasha Levin
  2020-02-14 16:00 ` [PATCH AUTOSEL 5.4 399/459] jbd2: make sure ESHUTDOWN to be recorded in the journal superblock Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-02-14 15:55 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: zhangyi (F), Jan Kara, Theodore Ts'o, Sasha Levin, linux-ext4

From: "zhangyi (F)" <yi.zhang@huawei.com>

[ Upstream commit 51f57b01e4a3c7d7bdceffd84de35144e8c538e7 ]

JBD2_REC_ERR flag used to indicate the errno has been updated when jbd2
aborted, and then __ext4_abort() and ext4_handle_error() can invoke
panic if ERRORS_PANIC is specified. But if the journal has been aborted
with zero errno, jbd2_journal_abort() didn't set this flag so we can
no longer panic. Fix this by always record the proper errno in the
journal superblock.

Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191204124614.45424-3-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/jbd2/checkpoint.c |  2 +-
 fs/jbd2/journal.c    | 15 ++++-----------
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index a1909066bde66..62cf497f18eb4 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -164,7 +164,7 @@ void __jbd2_log_wait_for_space(journal_t *journal)
 				       "journal space in %s\n", __func__,
 				       journal->j_devname);
 				WARN_ON(1);
-				jbd2_journal_abort(journal, 0);
+				jbd2_journal_abort(journal, -EIO);
 			}
 			write_lock(&journal->j_state_lock);
 		} else {
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 389c9be4e7919..65e78d3a2f64c 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2123,12 +2123,10 @@ static void __journal_abort_soft (journal_t *journal, int errno)
 
 	__jbd2_journal_abort_hard(journal);
 
-	if (errno) {
-		jbd2_journal_update_sb_errno(journal);
-		write_lock(&journal->j_state_lock);
-		journal->j_flags |= JBD2_REC_ERR;
-		write_unlock(&journal->j_state_lock);
-	}
+	jbd2_journal_update_sb_errno(journal);
+	write_lock(&journal->j_state_lock);
+	journal->j_flags |= JBD2_REC_ERR;
+	write_unlock(&journal->j_state_lock);
 }
 
 /**
@@ -2170,11 +2168,6 @@ static void __journal_abort_soft (journal_t *journal, int errno)
  * failure to disk.  ext3_error, for example, now uses this
  * functionality.
  *
- * Errors which originate from within the journaling layer will NOT
- * supply an errno; a null errno implies that absolutely no further
- * writes are done to the journal (unless there are any already in
- * progress).
- *
  */
 
 void jbd2_journal_abort(journal_t *journal, int errno)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.4 398/459] jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
       [not found] <20200214160149.11681-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 091/459] ext4, jbd2: ensure panic when aborting with zero errno Sasha Levin
@ 2020-02-14 16:00 ` Sasha Levin
  2020-02-14 16:00 ` [PATCH AUTOSEL 5.4 399/459] jbd2: make sure ESHUTDOWN to be recorded in the journal superblock Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-02-14 16:00 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: zhangyi (F), Jan Kara, Theodore Ts'o, Sasha Levin, linux-ext4

From: "zhangyi (F)" <yi.zhang@huawei.com>

[ Upstream commit d0a186e0d3e7ac05cc77da7c157dae5aa59f95d9 ]

We invoke jbd2_journal_abort() to abort the journal and record errno
in the jbd2 superblock when committing journal transaction besides the
failure on submitting the commit record. But there is no need for the
case and we can also invoke jbd2_journal_abort() instead of
__jbd2_journal_abort_hard().

Fixes: 818d276ceb83a ("ext4: Add the journal checksum feature")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191204124614.45424-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/jbd2/commit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index c43591cd70f1c..b33adae434331 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -784,7 +784,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		err = journal_submit_commit_record(journal, commit_transaction,
 						 &cbh, crc32_sum);
 		if (err)
-			__jbd2_journal_abort_hard(journal);
+			jbd2_journal_abort(journal, err);
 	}
 
 	blk_finish_plug(&plug);
@@ -877,7 +877,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		err = journal_submit_commit_record(journal, commit_transaction,
 						&cbh, crc32_sum);
 		if (err)
-			__jbd2_journal_abort_hard(journal);
+			jbd2_journal_abort(journal, err);
 	}
 	if (cbh)
 		err = journal_wait_on_commit_record(journal, cbh);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH AUTOSEL 5.4 399/459] jbd2: make sure ESHUTDOWN to be recorded in the journal superblock
       [not found] <20200214160149.11681-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2020-02-14 16:00 ` [PATCH AUTOSEL 5.4 398/459] jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record Sasha Levin
@ 2020-02-14 16:00 ` Sasha Levin
  5 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2020-02-14 16:00 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: zhangyi (F), Jan Kara, Theodore Ts'o, Sasha Levin, linux-ext4

From: "zhangyi (F)" <yi.zhang@huawei.com>

[ Upstream commit 0e98c084a21177ef136149c6a293b3d1eb33ff92 ]

Commit fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") want
to allow jbd2 layer to distinguish shutdown journal abort from other
error cases. So the ESHUTDOWN should be taken precedence over any other
errno which has already been recoded after EXT4_FLAGS_SHUTDOWN is set,
but it only update errno in the journal suoerblock now if the old errno
is 0.

Fixes: fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191204124614.45424-4-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/jbd2/journal.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 65e78d3a2f64c..c1ce2805c5639 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2114,8 +2114,7 @@ static void __journal_abort_soft (journal_t *journal, int errno)
 
 	if (journal->j_flags & JBD2_ABORT) {
 		write_unlock(&journal->j_state_lock);
-		if (!old_errno && old_errno != -ESHUTDOWN &&
-		    errno == -ESHUTDOWN)
+		if (old_errno != -ESHUTDOWN && errno == -ESHUTDOWN)
 			jbd2_journal_update_sb_errno(journal);
 		return;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-02-14 17:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200214160149.11681-1-sashal@kernel.org>
2020-02-14 15:54 ` [PATCH AUTOSEL 5.4 026/459] ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT Sasha Levin
2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 073/459] jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal Sasha Levin
2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 074/459] ext4: fix deadlock allocating bio_post_read_ctx from mempool Sasha Levin
2020-02-14 15:55 ` [PATCH AUTOSEL 5.4 091/459] ext4, jbd2: ensure panic when aborting with zero errno Sasha Levin
2020-02-14 16:00 ` [PATCH AUTOSEL 5.4 398/459] jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record Sasha Levin
2020-02-14 16:00 ` [PATCH AUTOSEL 5.4 399/459] jbd2: make sure ESHUTDOWN to be recorded in the journal superblock Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).