All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] Miscellaneous error handling patches
@ 2021-09-27 18:05 Josef Bacik
  2021-09-27 18:05 ` [PATCH v3 1/5] btrfs: change handle_fs_error in recover_log_trees to aborts Josef Bacik
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Josef Bacik @ 2021-09-27 18:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

This series is left overs from a few different series.  The error handling
patches look like they just got missed somehow.  The FS_ERROR helper has been
updated based on the comments from Dave.  I'm attaching this to the open GH
thing that I was looking at to update, but really just has the FS_ERROR helper
patch from that series.  Thanks,

Josef

Josef Bacik (5):
  btrfs: change handle_fs_error in recover_log_trees to aborts
  btrfs: change error handling for btrfs_delete_*_in_log
  btrfs: add a BTRFS_FS_ERROR helper
  btrfs: do not infinite loop in data reclaim if we aborted
  btrfs: fix abort logic in btrfs_replace_file_extents

 fs/btrfs/ctree.h       |  3 +++
 fs/btrfs/disk-io.c     |  8 +++---
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/file.c        |  7 ++---
 fs/btrfs/inode.c       | 22 +++++-----------
 fs/btrfs/scrub.c       |  2 +-
 fs/btrfs/space-info.c  | 27 ++++++++++++++++---
 fs/btrfs/super.c       |  2 +-
 fs/btrfs/transaction.c | 11 ++++----
 fs/btrfs/tree-log.c    | 59 ++++++++++++++++--------------------------
 fs/btrfs/tree-log.h    | 16 ++++++------
 11 files changed, 78 insertions(+), 81 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/5] btrfs: change handle_fs_error in recover_log_trees to aborts
  2021-09-27 18:05 [PATCH v3 0/5] Miscellaneous error handling patches Josef Bacik
@ 2021-09-27 18:05 ` Josef Bacik
  2021-09-28  5:46   ` Anand Jain
  2021-09-27 18:05 ` [PATCH v3 2/5] btrfs: change error handling for btrfs_delete_*_in_log Josef Bacik
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2021-09-27 18:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Johannes Thumshirn

During inspection of the return path for replay I noticed that we don't
actually abort the transaction if we get a failure during replay.  This
isn't a problem necessarily, as we properly return the error and will
fail to mount.  However we still leave this dangling transaction that
could conceivably be committed without thinking there was an error.
We were using btrfs_handle_fs_error() here, but that pre-dates the
transaction abort code.  Simply replace the btrfs_handle_fs_error()
calls with transaction aborts, so we still know where exactly things
went wrong, and add a few in some other un-handled error cases.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/tree-log.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 30590ddd69ac..e0c2d4c7f939 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -6527,8 +6527,7 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
 		ret = btrfs_search_slot(NULL, log_root_tree, &key, path, 0, 0);
 
 		if (ret < 0) {
-			btrfs_handle_fs_error(fs_info, ret,
-				    "Couldn't find tree log root.");
+			btrfs_abort_transaction(trans, ret);
 			goto error;
 		}
 		if (ret > 0) {
@@ -6545,8 +6544,7 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
 		log = btrfs_read_tree_root(log_root_tree, &found_key);
 		if (IS_ERR(log)) {
 			ret = PTR_ERR(log);
-			btrfs_handle_fs_error(fs_info, ret,
-				    "Couldn't read tree log root.");
+			btrfs_abort_transaction(trans, ret);
 			goto error;
 		}
 
@@ -6574,8 +6572,7 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
 
 			if (!ret)
 				goto next;
-			btrfs_handle_fs_error(fs_info, ret,
-				"Couldn't read target root for tree log recovery.");
+			btrfs_abort_transaction(trans, ret);
 			goto error;
 		}
 
@@ -6583,14 +6580,15 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
 		ret = btrfs_record_root_in_trans(trans, wc.replay_dest);
 		if (ret)
 			/* The loop needs to continue due to the root refs */
-			btrfs_handle_fs_error(fs_info, ret,
-				"failed to record the log root in transaction");
+			btrfs_abort_transaction(trans, ret);
 		else
 			ret = walk_log_tree(trans, log, &wc);
 
 		if (!ret && wc.stage == LOG_WALK_REPLAY_ALL) {
 			ret = fixup_inode_link_counts(trans, wc.replay_dest,
 						      path);
+			if (ret)
+				btrfs_abort_transaction(trans, ret);
 		}
 
 		if (!ret && wc.stage == LOG_WALK_REPLAY_ALL) {
@@ -6607,6 +6605,8 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
 			 * could only happen during mount.
 			 */
 			ret = btrfs_init_root_free_objectid(root);
+			if (ret)
+				btrfs_abort_transaction(trans, ret);
 		}
 
 		wc.replay_dest->log_root = NULL;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/5] btrfs: change error handling for btrfs_delete_*_in_log
  2021-09-27 18:05 [PATCH v3 0/5] Miscellaneous error handling patches Josef Bacik
  2021-09-27 18:05 ` [PATCH v3 1/5] btrfs: change handle_fs_error in recover_log_trees to aborts Josef Bacik
@ 2021-09-27 18:05 ` Josef Bacik
  2021-09-28  9:53   ` Filipe Manana
  2021-09-27 18:05 ` [PATCH v3 3/5] btrfs: add a BTRFS_FS_ERROR helper Josef Bacik
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2021-09-27 18:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Currently we will abort the transaction if we get a random error (like
-EIO) while trying to remove the directory entries from the root log
during rename.

However the log sync stuff doesn't actually honor the transaction abort
flag, it'll happily commit the log even if we've aborted the transaction
for reasons related to the log, which is a pretty bad problem.

Fix this by making these functions void, as we don't actually care if
this fails.  Instead mark the log as requiring a full commit on error.
This will keep us from committing this bad log, and if we fsync we'll
force a full transaction commit and have a consistent file system.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c    | 16 +++-------------
 fs/btrfs/tree-log.c | 41 ++++++++++++++---------------------------
 fs/btrfs/tree-log.h | 16 ++++++++--------
 3 files changed, 25 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a643be30c18a..03a843b9659b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4108,19 +4108,9 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans,
 		goto err;
 	}
 
-	ret = btrfs_del_inode_ref_in_log(trans, root, name, name_len, inode,
-			dir_ino);
-	if (ret != 0 && ret != -ENOENT) {
-		btrfs_abort_transaction(trans, ret);
-		goto err;
-	}
-
-	ret = btrfs_del_dir_entries_in_log(trans, root, name, name_len, dir,
-			index);
-	if (ret == -ENOENT)
-		ret = 0;
-	else if (ret)
-		btrfs_abort_transaction(trans, ret);
+	btrfs_del_inode_ref_in_log(trans, root, name, name_len, inode,
+				   dir_ino);
+	btrfs_del_dir_entries_in_log(trans, root, name, name_len, dir, index);
 
 	/*
 	 * If we have a pending delayed iput we could end up with the final iput
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index e0c2d4c7f939..720723611875 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3500,10 +3500,10 @@ static bool inode_logged(struct btrfs_trans_handle *trans,
  * This optimizations allows us to avoid relogging the entire inode
  * or the entire directory.
  */
-int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
-				 struct btrfs_root *root,
-				 const char *name, int name_len,
-				 struct btrfs_inode *dir, u64 index)
+void btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *root,
+				  const char *name, int name_len,
+				  struct btrfs_inode *dir, u64 index)
 {
 	struct btrfs_root *log;
 	struct btrfs_dir_item *di;
@@ -3513,11 +3513,11 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
 	u64 dir_ino = btrfs_ino(dir);
 
 	if (!inode_logged(trans, dir))
-		return 0;
+		return;
 
 	ret = join_running_log_trans(root);
 	if (ret)
-		return 0;
+		return;
 
 	mutex_lock(&dir->log_mutex);
 
@@ -3565,49 +3565,36 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
 	btrfs_free_path(path);
 out_unlock:
 	mutex_unlock(&dir->log_mutex);
-	if (err == -ENOSPC) {
+	if (err < 0 && err != -ENOENT)
 		btrfs_set_log_full_commit(trans);
-		err = 0;
-	} else if (err < 0 && err != -ENOENT) {
-		/* ENOENT can be returned if the entry hasn't been fsynced yet */
-		btrfs_abort_transaction(trans, err);
-	}
-
 	btrfs_end_log_trans(root);
-
-	return err;
 }
 
 /* see comments for btrfs_del_dir_entries_in_log */
-int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
-			       struct btrfs_root *root,
-			       const char *name, int name_len,
-			       struct btrfs_inode *inode, u64 dirid)
+void btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
+				struct btrfs_root *root,
+				const char *name, int name_len,
+				struct btrfs_inode *inode, u64 dirid)
 {
 	struct btrfs_root *log;
 	u64 index;
 	int ret;
 
 	if (!inode_logged(trans, inode))
-		return 0;
+		return;
 
 	ret = join_running_log_trans(root);
 	if (ret)
-		return 0;
+		return;
 	log = root->log_root;
 	mutex_lock(&inode->log_mutex);
 
 	ret = btrfs_del_inode_ref(trans, log, name, name_len, btrfs_ino(inode),
 				  dirid, &index);
 	mutex_unlock(&inode->log_mutex);
-	if (ret == -ENOSPC) {
+	if (ret < 0 && ret != -ENOENT)
 		btrfs_set_log_full_commit(trans);
-		ret = 0;
-	} else if (ret < 0 && ret != -ENOENT)
-		btrfs_abort_transaction(trans, ret);
 	btrfs_end_log_trans(root);
-
-	return ret;
 }
 
 /*
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 3ce6bdb76009..f6811c3df38a 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -70,14 +70,14 @@ int btrfs_recover_log_trees(struct btrfs_root *tree_root);
 int btrfs_log_dentry_safe(struct btrfs_trans_handle *trans,
 			  struct dentry *dentry,
 			  struct btrfs_log_ctx *ctx);
-int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
-				 struct btrfs_root *root,
-				 const char *name, int name_len,
-				 struct btrfs_inode *dir, u64 index);
-int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
-			       struct btrfs_root *root,
-			       const char *name, int name_len,
-			       struct btrfs_inode *inode, u64 dirid);
+void btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
+				  struct btrfs_root *root,
+				  const char *name, int name_len,
+				  struct btrfs_inode *dir, u64 index);
+void btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
+				struct btrfs_root *root,
+				const char *name, int name_len,
+				struct btrfs_inode *inode, u64 dirid);
 void btrfs_end_log_trans(struct btrfs_root *root);
 void btrfs_pin_log_trans(struct btrfs_root *root);
 void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/5] btrfs: add a BTRFS_FS_ERROR helper
  2021-09-27 18:05 [PATCH v3 0/5] Miscellaneous error handling patches Josef Bacik
  2021-09-27 18:05 ` [PATCH v3 1/5] btrfs: change handle_fs_error in recover_log_trees to aborts Josef Bacik
  2021-09-27 18:05 ` [PATCH v3 2/5] btrfs: change error handling for btrfs_delete_*_in_log Josef Bacik
@ 2021-09-27 18:05 ` Josef Bacik
  2021-09-28  7:38   ` Anand Jain
  2021-09-27 18:05 ` [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted Josef Bacik
  2021-09-27 18:05 ` [PATCH v3 5/5] btrfs: fix abort logic in btrfs_replace_file_extents Josef Bacik
  4 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2021-09-27 18:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We have a few flags that are inconsistently used to describe the fs in
different states of failure.  As of

	btrfs: always abort the transaction if we abort a trans handle

we will always set BTRFS_FS_STATE_ERROR if we abort, so we don't have to
check both ABORTED and ERROR to see if things have gone wrong.  Add a
helper to check BTRFS_FS_STATE_HELPER and then convert all checkers of
FS_STATE_ERROR to use the helper.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h       |  3 +++
 fs/btrfs/disk-io.c     |  8 +++-----
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/file.c        |  2 +-
 fs/btrfs/inode.c       |  6 +++---
 fs/btrfs/scrub.c       |  2 +-
 fs/btrfs/super.c       |  2 +-
 fs/btrfs/transaction.c | 11 +++++------
 fs/btrfs/tree-log.c    |  2 +-
 9 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 522ded06fd85..53f15decb523 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3576,6 +3576,9 @@ do {								\
 			  (errno), fmt, ##args);		\
 } while (0)
 
+#define BTRFS_FS_ERROR(fs_info)	(unlikely(test_bit(BTRFS_FS_STATE_ERROR, \
+						   &(fs_info)->fs_state)))
+
 __printf(5, 6)
 __cold
 void __btrfs_panic(struct btrfs_fs_info *fs_info, const char *function,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 37637539c5ab..1ae30b29f2b5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1954,8 +1954,7 @@ static int transaction_kthread(void *arg)
 		wake_up_process(fs_info->cleaner_kthread);
 		mutex_unlock(&fs_info->transaction_kthread_mutex);
 
-		if (unlikely(test_bit(BTRFS_FS_STATE_ERROR,
-				      &fs_info->fs_state)))
+		if (BTRFS_FS_ERROR(fs_info))
 			btrfs_cleanup_transaction(fs_info);
 		if (!kthread_should_stop() &&
 				(!btrfs_transaction_blocked(fs_info) ||
@@ -4232,7 +4231,7 @@ void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
 		drop_ref = true;
 	spin_unlock(&fs_info->fs_roots_radix_lock);
 
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
+	if (BTRFS_FS_ERROR(fs_info)) {
 		ASSERT(root->log_root == NULL);
 		if (root->reloc_root) {
 			btrfs_put_root(root->reloc_root);
@@ -4383,8 +4382,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 			btrfs_err(fs_info, "commit super ret %d", ret);
 	}
 
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state) ||
-	    test_bit(BTRFS_FS_STATE_TRANS_ABORTED, &fs_info->fs_state))
+	if (BTRFS_FS_ERROR(fs_info))
 		btrfs_error_commit_super(fs_info);
 
 	kthread_stop(fs_info->transaction_kthread);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c56973f7daae..f6f004d673a0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4876,7 +4876,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 	 *   extent io tree. Thus we don't want to submit such wild eb
 	 *   if the fs already has error.
 	 */
-	if (!test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
+	if (!BTRFS_FS_ERROR(fs_info)) {
 		ret = flush_write_bio(&epd);
 	} else {
 		ret = -EROFS;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7ff577005d0f..fdceab28587e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2013,7 +2013,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	 * have opened a file as writable, we have to stop this write operation
 	 * to ensure consistency.
 	 */
-	if (test_bit(BTRFS_FS_STATE_ERROR, &inode->root->fs_info->fs_state))
+	if (BTRFS_FS_ERROR(inode->root->fs_info))
 		return -EROFS;
 
 	if (!(iocb->ki_flags & IOCB_DIRECT) &&
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 03a843b9659b..26d155e72152 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4368,7 +4368,7 @@ static void btrfs_prune_dentries(struct btrfs_root *root)
 	struct inode *inode;
 	u64 objectid = 0;
 
-	if (!test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
+	if (!BTRFS_FS_ERROR(fs_info))
 		WARN_ON(btrfs_root_refs(&root->root_item) != 0);
 
 	spin_lock(&root->inode_lock);
@@ -9981,7 +9981,7 @@ int btrfs_start_delalloc_snapshot(struct btrfs_root *root, bool in_reclaim_conte
 	};
 	struct btrfs_fs_info *fs_info = root->fs_info;
 
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
+	if (BTRFS_FS_ERROR(fs_info))
 		return -EROFS;
 
 	return start_delalloc_inodes(root, &wbc, true, in_reclaim_context);
@@ -10000,7 +10000,7 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
 	struct list_head splice;
 	int ret;
 
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
+	if (BTRFS_FS_ERROR(fs_info))
 		return -EROFS;
 
 	INIT_LIST_HEAD(&splice);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index bd3cd7427391..b1c26a90243b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3955,7 +3955,7 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 	int	ret;
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
 
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
+	if (BTRFS_FS_ERROR(fs_info))
 		return -EROFS;
 
 	/* Seed devices of a new filesystem has their own generation. */
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7f91d62c2225..a1c54a2c787c 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2006,7 +2006,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		if (ret)
 			goto restore;
 	} else {
-		if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
+		if (BTRFS_FS_ERROR(fs_info)) {
 			btrfs_err(fs_info,
 				"Remounting read-write after error is not allowed");
 			ret = -EINVAL;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 14b9fdc8aaa9..1c3a1189c0bd 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -283,7 +283,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	spin_lock(&fs_info->trans_lock);
 loop:
 	/* The file system has been taken offline. No new transactions. */
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
+	if (BTRFS_FS_ERROR(fs_info)) {
 		spin_unlock(&fs_info->trans_lock);
 		return -EROFS;
 	}
@@ -331,7 +331,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 		 */
 		kfree(cur_trans);
 		goto loop;
-	} else if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
+	} else if (BTRFS_FS_ERROR(fs_info)) {
 		spin_unlock(&fs_info->trans_lock);
 		kfree(cur_trans);
 		return -EROFS;
@@ -579,7 +579,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 	bool do_chunk_alloc = false;
 	int ret;
 
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
+	if (BTRFS_FS_ERROR(fs_info))
 		return ERR_PTR(-EROFS);
 
 	if (current->journal_info) {
@@ -991,8 +991,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	if (throttle)
 		btrfs_run_delayed_iputs(info);
 
-	if (TRANS_ABORTED(trans) ||
-	    test_bit(BTRFS_FS_STATE_ERROR, &info->fs_state)) {
+	if (TRANS_ABORTED(trans) || BTRFS_FS_ERROR(info)) {
 		wake_up_process(info->transaction_kthread);
 		if (TRANS_ABORTED(trans))
 			err = trans->aborted;
@@ -2155,7 +2154,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		 * abort to prevent writing a new superblock that reflects a
 		 * corrupt state (pointing to trees with unwritten nodes/leafs).
 		 */
-		if (test_bit(BTRFS_FS_STATE_TRANS_ABORTED, &fs_info->fs_state)) {
+		if (BTRFS_FS_ERROR(fs_info)) {
 			ret = -EROFS;
 			goto cleanup_transaction;
 		}
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 720723611875..9e5937685896 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3335,7 +3335,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	 * writing the super here would result in transid mismatches.  If there
 	 * is an error here just bail.
 	 */
-	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
+	if (BTRFS_FS_ERROR(fs_info)) {
 		ret = -EIO;
 		btrfs_set_log_full_commit(trans);
 		btrfs_abort_transaction(trans, ret);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted
  2021-09-27 18:05 [PATCH v3 0/5] Miscellaneous error handling patches Josef Bacik
                   ` (2 preceding siblings ...)
  2021-09-27 18:05 ` [PATCH v3 3/5] btrfs: add a BTRFS_FS_ERROR helper Josef Bacik
@ 2021-09-27 18:05 ` Josef Bacik
  2021-09-28 10:22   ` Filipe Manana
  2021-10-01 21:48     ` kernel test robot
  2021-09-27 18:05 ` [PATCH v3 5/5] btrfs: fix abort logic in btrfs_replace_file_extents Josef Bacik
  4 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2021-09-27 18:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Error injection stressing uncovered a busy loop in our data reclaim
loop.  There are two cases here, one where we loop creating block groups
until space_info->full is set, or in the main loop we will skip erroring
out any tickets if space_info->full == 0.  Unfortunately if we aborted
the transaction then we will never allocate chunks or reclaim any space
and thus never get ->full, and you'll see stack traces like this

watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/u4:4:139]
CPU: 0 PID: 139 Comm: kworker/u4:4 Tainted: G        W         5.13.0-rc1+ #328
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: events_unbound btrfs_async_reclaim_data_space
RIP: 0010:btrfs_join_transaction+0x12/0x20
RSP: 0018:ffffb2b780b77de0 EFLAGS: 00000246
RAX: ffffb2b781863d58 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000801 RSI: ffff987952b57400 RDI: ffff987940aa3000
RBP: ffff987954d55000 R08: 0000000000000001 R09: ffff98795539e8f0
R10: 000000000000000f R11: 000000000000000f R12: ffffffffffffffff
R13: ffff987952b574c8 R14: ffff987952b57400 R15: 0000000000000008
FS:  0000000000000000(0000) GS:ffff9879bbc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0703da4000 CR3: 0000000113398004 CR4: 0000000000370ef0
Call Trace:
 flush_space+0x4a8/0x660
 btrfs_async_reclaim_data_space+0x55/0x130
 process_one_work+0x1e9/0x380
 worker_thread+0x53/0x3e0
 ? process_one_work+0x380/0x380
 kthread+0x118/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x1f/0x30

Fix this by checking to see if we have a btrfs fs error in either of the
reclaim loops, and if so fail the tickets and bail.  In addition to
this, fix maybe_fail_all_tickets() to not try to grant tickets if we've
aborted, simply fail everything.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index aa5be0b24987..63423f9729c5 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -885,6 +885,7 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
 {
 	struct reserve_ticket *ticket;
 	u64 tickets_id = space_info->tickets_id;
+	const bool aborted = btrfs_has_fs_error(fs_info);
 
 	trace_btrfs_fail_all_tickets(fs_info, space_info);
 
@@ -898,16 +899,19 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
 		ticket = list_first_entry(&space_info->tickets,
 					  struct reserve_ticket, list);
 
-		if (ticket->steal &&
+		if (!aborted && ticket->steal &&
 		    steal_from_global_rsv(fs_info, space_info, ticket))
 			return true;
 
-		if (btrfs_test_opt(fs_info, ENOSPC_DEBUG))
+		if (!aborted && btrfs_test_opt(fs_info, ENOSPC_DEBUG))
 			btrfs_info(fs_info, "failing ticket with %llu bytes",
 				   ticket->bytes);
 
 		remove_ticket(space_info, ticket);
-		ticket->error = -ENOSPC;
+		if (aborted)
+			ticket->error = -EIO;
+		else
+			ticket->error = -ENOSPC;
 		wake_up(&ticket->wait);
 
 		/*
@@ -916,7 +920,8 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
 		 * here to see if we can make progress with the next ticket in
 		 * the list.
 		 */
-		btrfs_try_granting_tickets(fs_info, space_info);
+		if (!aborted)
+			btrfs_try_granting_tickets(fs_info, space_info);
 	}
 	return (tickets_id != space_info->tickets_id);
 }
@@ -1172,6 +1177,10 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 			spin_unlock(&space_info->lock);
 			return;
 		}
+
+		/* Something happened, fail everything and bail. */
+		if (btrfs_has_fs_error(fs_info))
+			goto aborted_fs;
 		last_tickets_id = space_info->tickets_id;
 		spin_unlock(&space_info->lock);
 	}
@@ -1202,9 +1211,19 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
 			} else {
 				flush_state = 0;
 			}
+
+			/* Something happened, fail everything and bail. */
+			if (btrfs_has_fs_error(fs_info))
+				goto aborted_fs;
+
 		}
 		spin_unlock(&space_info->lock);
 	}
+	return;
+aborted_fs:
+	maybe_fail_all_tickets(fs_info, space_info);
+	space_info->flush = 0;
+	spin_unlock(&space_info->lock);
 }
 
 void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 5/5] btrfs: fix abort logic in btrfs_replace_file_extents
  2021-09-27 18:05 [PATCH v3 0/5] Miscellaneous error handling patches Josef Bacik
                   ` (3 preceding siblings ...)
  2021-09-27 18:05 ` [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted Josef Bacik
@ 2021-09-27 18:05 ` Josef Bacik
  2021-09-28 10:07   ` Filipe Manana
  4 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2021-09-27 18:05 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Error injection testing uncovered a case where we'd end up with a
corrupt file system with a missing extent in the middle of a file.  This
occurs because the if statement to decide if we should abort is wrong.
The only way we would abort in this case is if we got a ret !=
-EOPNOTSUPP and we called from the file clone code.  However the
prealloc code uses this path too.  Instead we need to abort if there is
an error, and the only error we _don't_ abort on is -EOPNOTSUPP and only
if we came from the clone file code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/file.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fdceab28587e..f9a1498cf030 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2710,8 +2710,9 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
 			 * returned by __btrfs_drop_extents() without having
 			 * changed anything in the file.
 			 */
-			if (extent_info && !extent_info->is_new_extent &&
-			    ret && ret != -EOPNOTSUPP)
+			if (ret &&
+			    (ret != -EOPNOTSUPP ||
+			     (extent_info && extent_info->is_new_extent)))
 				btrfs_abort_transaction(trans, ret);
 			break;
 		}
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/5] btrfs: change handle_fs_error in recover_log_trees to aborts
  2021-09-27 18:05 ` [PATCH v3 1/5] btrfs: change handle_fs_error in recover_log_trees to aborts Josef Bacik
@ 2021-09-28  5:46   ` Anand Jain
  0 siblings, 0 replies; 13+ messages in thread
From: Anand Jain @ 2021-09-28  5:46 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Johannes Thumshirn

On 28/09/2021 02:05, Josef Bacik wrote:
> During inspection of the return path for replay I noticed that we don't
> actually abort the transaction if we get a failure during replay.  This
> isn't a problem necessarily, as we properly return the error and will
> fail to mount.  However we still leave this dangling transaction that
> could conceivably be committed without thinking there was an error.
> We were using btrfs_handle_fs_error() here, but that pre-dates the
> transaction abort code.  Simply replace the btrfs_handle_fs_error()
> calls with transaction aborts, so we still know where exactly things
> went wrong, and add a few in some other un-handled error cases.
> 
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

  After this patch, failure from walk_log_tree() still won't call 
btrfs_abort_transaction(), is it intentional?

  Maybe it is time to enhance btrfs_abort_transaction() with an argument 
to pass those kernel-error-messages in the 3rd argument of 
btrfs_handle_fs_error() in btrfs_recover_log_trees().

  After this patch, it will continue to print function line numbers, but 
those kernel-error messages are more informative if printed.

Thanks, Anand

> ---
>   fs/btrfs/tree-log.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 30590ddd69ac..e0c2d4c7f939 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -6527,8 +6527,7 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
>   		ret = btrfs_search_slot(NULL, log_root_tree, &key, path, 0, 0);
>   
>   		if (ret < 0) {
> -			btrfs_handle_fs_error(fs_info, ret,
> -				    "Couldn't find tree log root.");
> +			btrfs_abort_transaction(trans, ret);
>   			goto error;
>   		}
>   		if (ret > 0) {
> @@ -6545,8 +6544,7 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
>   		log = btrfs_read_tree_root(log_root_tree, &found_key);
>   		if (IS_ERR(log)) {
>   			ret = PTR_ERR(log);
> -			btrfs_handle_fs_error(fs_info, ret,
> -				    "Couldn't read tree log root.");
> +			btrfs_abort_transaction(trans, ret);
>   			goto error;
>   		}
>   
> @@ -6574,8 +6572,7 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
>   
>   			if (!ret)
>   				goto next;
> -			btrfs_handle_fs_error(fs_info, ret,
> -				"Couldn't read target root for tree log recovery.");
> +			btrfs_abort_transaction(trans, ret);
>   			goto error;
>   		}
>   
> @@ -6583,14 +6580,15 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
>   		ret = btrfs_record_root_in_trans(trans, wc.replay_dest);
>   		if (ret)
>   			/* The loop needs to continue due to the root refs */
> -			btrfs_handle_fs_error(fs_info, ret,
> -				"failed to record the log root in transaction");
> +			btrfs_abort_transaction(trans, ret);
>   		else
>   			ret = walk_log_tree(trans, log, &wc);
>   
>   		if (!ret && wc.stage == LOG_WALK_REPLAY_ALL) {
>   			ret = fixup_inode_link_counts(trans, wc.replay_dest,
>   						      path);
> +			if (ret)
> +				btrfs_abort_transaction(trans, ret);
>   		}
>   
>   		if (!ret && wc.stage == LOG_WALK_REPLAY_ALL) {
> @@ -6607,6 +6605,8 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
>   			 * could only happen during mount.
>   			 */
>   			ret = btrfs_init_root_free_objectid(root);
> +			if (ret)
> +				btrfs_abort_transaction(trans, ret);
>   		}
>   
>   		wc.replay_dest->log_root = NULL;
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/5] btrfs: add a BTRFS_FS_ERROR helper
  2021-09-27 18:05 ` [PATCH v3 3/5] btrfs: add a BTRFS_FS_ERROR helper Josef Bacik
@ 2021-09-28  7:38   ` Anand Jain
  0 siblings, 0 replies; 13+ messages in thread
From: Anand Jain @ 2021-09-28  7:38 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 28/09/2021 02:05, Josef Bacik wrote:
> We have a few flags that are inconsistently used to describe the fs in
> different states of failure.  As of
> 
> 	btrfs: always abort the transaction if we abort a trans handle
> 

> we will always set BTRFS_FS_STATE_ERROR if we abort, so we don't have to
> check both ABORTED and ERROR to see if things have gone wrong.

>  Add a
> helper to check BTRFS_FS_STATE_HELPER and then convert all checkers of
         Nit:                ERROR  ^^^

> FS_STATE_ERROR to use the helper.


> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

LGTM. So additionally, this patch will add compiler optimize unlikely()
at most of the places which are good IMO.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

> ---
>   fs/btrfs/ctree.h       |  3 +++
>   fs/btrfs/disk-io.c     |  8 +++-----
>   fs/btrfs/extent_io.c   |  2 +-
>   fs/btrfs/file.c        |  2 +-
>   fs/btrfs/inode.c       |  6 +++---
>   fs/btrfs/scrub.c       |  2 +-
>   fs/btrfs/super.c       |  2 +-
>   fs/btrfs/transaction.c | 11 +++++------
>   fs/btrfs/tree-log.c    |  2 +-
>   9 files changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 522ded06fd85..53f15decb523 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3576,6 +3576,9 @@ do {								\
>   			  (errno), fmt, ##args);		\
>   } while (0)
>   
> +#define BTRFS_FS_ERROR(fs_info)	(unlikely(test_bit(BTRFS_FS_STATE_ERROR, \
> +						   &(fs_info)->fs_state)))
> +
>   __printf(5, 6)
>   __cold
>   void __btrfs_panic(struct btrfs_fs_info *fs_info, const char *function,
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 37637539c5ab..1ae30b29f2b5 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1954,8 +1954,7 @@ static int transaction_kthread(void *arg)
>   		wake_up_process(fs_info->cleaner_kthread);
>   		mutex_unlock(&fs_info->transaction_kthread_mutex);
>   
> -		if (unlikely(test_bit(BTRFS_FS_STATE_ERROR,
> -				      &fs_info->fs_state)))
> +		if (BTRFS_FS_ERROR(fs_info))
>   			btrfs_cleanup_transaction(fs_info);
>   		if (!kthread_should_stop() &&
>   				(!btrfs_transaction_blocked(fs_info) ||
> @@ -4232,7 +4231,7 @@ void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
>   		drop_ref = true;
>   	spin_unlock(&fs_info->fs_roots_radix_lock);
>   
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
> +	if (BTRFS_FS_ERROR(fs_info)) {
>   		ASSERT(root->log_root == NULL);
>   		if (root->reloc_root) {
>   			btrfs_put_root(root->reloc_root);
> @@ -4383,8 +4382,7 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>   			btrfs_err(fs_info, "commit super ret %d", ret);
>   	}
>   
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state) ||
> -	    test_bit(BTRFS_FS_STATE_TRANS_ABORTED, &fs_info->fs_state))
> +	if (BTRFS_FS_ERROR(fs_info))
>   		btrfs_error_commit_super(fs_info);
>   
>   	kthread_stop(fs_info->transaction_kthread);
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c56973f7daae..f6f004d673a0 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4876,7 +4876,7 @@ int btree_write_cache_pages(struct address_space *mapping,
>   	 *   extent io tree. Thus we don't want to submit such wild eb
>   	 *   if the fs already has error.
>   	 */
> -	if (!test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
> +	if (!BTRFS_FS_ERROR(fs_info)) {
>   		ret = flush_write_bio(&epd);
>   	} else {
>   		ret = -EROFS;
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 7ff577005d0f..fdceab28587e 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2013,7 +2013,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>   	 * have opened a file as writable, we have to stop this write operation
>   	 * to ensure consistency.
>   	 */
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &inode->root->fs_info->fs_state))
> +	if (BTRFS_FS_ERROR(inode->root->fs_info))
>   		return -EROFS;
>   
>   	if (!(iocb->ki_flags & IOCB_DIRECT) &&
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 03a843b9659b..26d155e72152 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4368,7 +4368,7 @@ static void btrfs_prune_dentries(struct btrfs_root *root)
>   	struct inode *inode;
>   	u64 objectid = 0;
>   
> -	if (!test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
> +	if (!BTRFS_FS_ERROR(fs_info))
>   		WARN_ON(btrfs_root_refs(&root->root_item) != 0);
>   
>   	spin_lock(&root->inode_lock);
> @@ -9981,7 +9981,7 @@ int btrfs_start_delalloc_snapshot(struct btrfs_root *root, bool in_reclaim_conte
>   	};
>   	struct btrfs_fs_info *fs_info = root->fs_info;
>   
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
> +	if (BTRFS_FS_ERROR(fs_info))
>   		return -EROFS;
>   
>   	return start_delalloc_inodes(root, &wbc, true, in_reclaim_context);
> @@ -10000,7 +10000,7 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
>   	struct list_head splice;
>   	int ret;
>   
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
> +	if (BTRFS_FS_ERROR(fs_info))
>   		return -EROFS;
>   
>   	INIT_LIST_HEAD(&splice);
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index bd3cd7427391..b1c26a90243b 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -3955,7 +3955,7 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
>   	int	ret;
>   	struct btrfs_fs_info *fs_info = sctx->fs_info;
>   
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
> +	if (BTRFS_FS_ERROR(fs_info))
>   		return -EROFS;
>   
>   	/* Seed devices of a new filesystem has their own generation. */
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 7f91d62c2225..a1c54a2c787c 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2006,7 +2006,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>   		if (ret)
>   			goto restore;
>   	} else {
> -		if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
> +		if (BTRFS_FS_ERROR(fs_info)) {
>   			btrfs_err(fs_info,
>   				"Remounting read-write after error is not allowed");
>   			ret = -EINVAL;
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 14b9fdc8aaa9..1c3a1189c0bd 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -283,7 +283,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
>   	spin_lock(&fs_info->trans_lock);
>   loop:
>   	/* The file system has been taken offline. No new transactions. */
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
> +	if (BTRFS_FS_ERROR(fs_info)) {
>   		spin_unlock(&fs_info->trans_lock);
>   		return -EROFS;
>   	}
> @@ -331,7 +331,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
>   		 */
>   		kfree(cur_trans);
>   		goto loop;
> -	} else if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
> +	} else if (BTRFS_FS_ERROR(fs_info)) {
>   		spin_unlock(&fs_info->trans_lock);
>   		kfree(cur_trans);
>   		return -EROFS;
> @@ -579,7 +579,7 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
>   	bool do_chunk_alloc = false;
>   	int ret;
>   
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
> +	if (BTRFS_FS_ERROR(fs_info))
>   		return ERR_PTR(-EROFS);
>   
>   	if (current->journal_info) {
> @@ -991,8 +991,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>   	if (throttle)
>   		btrfs_run_delayed_iputs(info);
>   
> -	if (TRANS_ABORTED(trans) ||
> -	    test_bit(BTRFS_FS_STATE_ERROR, &info->fs_state)) {
> +	if (TRANS_ABORTED(trans) || BTRFS_FS_ERROR(info)) {
>   		wake_up_process(info->transaction_kthread);
>   		if (TRANS_ABORTED(trans))
>   			err = trans->aborted;
> @@ -2155,7 +2154,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
>   		 * abort to prevent writing a new superblock that reflects a
>   		 * corrupt state (pointing to trees with unwritten nodes/leafs).
>   		 */
> -		if (test_bit(BTRFS_FS_STATE_TRANS_ABORTED, &fs_info->fs_state)) {
> +		if (BTRFS_FS_ERROR(fs_info)) {
>   			ret = -EROFS;
>   			goto cleanup_transaction;
>   		}
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 720723611875..9e5937685896 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3335,7 +3335,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
>   	 * writing the super here would result in transid mismatches.  If there
>   	 * is an error here just bail.
>   	 */
> -	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
> +	if (BTRFS_FS_ERROR(fs_info)) {
>   		ret = -EIO;
>   		btrfs_set_log_full_commit(trans);
>   		btrfs_abort_transaction(trans, ret);
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/5] btrfs: change error handling for btrfs_delete_*_in_log
  2021-09-27 18:05 ` [PATCH v3 2/5] btrfs: change error handling for btrfs_delete_*_in_log Josef Bacik
@ 2021-09-28  9:53   ` Filipe Manana
  0 siblings, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2021-09-28  9:53 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Sep 27, 2021 at 7:06 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Currently we will abort the transaction if we get a random error (like
> -EIO) while trying to remove the directory entries from the root log
> during rename.
>
> However the log sync stuff doesn't actually honor the transaction abort
> flag, it'll happily commit the log even if we've aborted the transaction
> for reasons related to the log, which is a pretty bad problem.

I'm not seeing how that would happen.
Since your commit 165ea85f14831f ("btrfs: do not write supers if we
have an fs error"), log syncing actually checks if a transaction was
aborted, and skips the log commit.
We're also aborting the transaction while holding the log pinned.
So I don't see how we can end up committing an inconsistent log if any
of the calls to remove dirents from the log fails.

Maybe this patch was prepared before that other patch?

Either way it seems the change log needs to be updated, because log
syncing checks for transaction aborts / fs error state.

>
> Fix this by making these functions void, as we don't actually care if
> this fails.  Instead mark the log as requiring a full commit on error.

This makes sense, there's really no need to abort a transaction,
forcing the next log commit attempt to fallback to a transaction
commit is more than enough.

> This will keep us from committing this bad log, and if we fsync we'll
> force a full transaction commit and have a consistent file system.

Codewise it looks good, I just don't think the part of committing a
bad log can happen after that other commit.

Thanks.

>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/inode.c    | 16 +++-------------
>  fs/btrfs/tree-log.c | 41 ++++++++++++++---------------------------
>  fs/btrfs/tree-log.h | 16 ++++++++--------
>  3 files changed, 25 insertions(+), 48 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a643be30c18a..03a843b9659b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4108,19 +4108,9 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans,
>                 goto err;
>         }
>
> -       ret = btrfs_del_inode_ref_in_log(trans, root, name, name_len, inode,
> -                       dir_ino);
> -       if (ret != 0 && ret != -ENOENT) {
> -               btrfs_abort_transaction(trans, ret);
> -               goto err;
> -       }
> -
> -       ret = btrfs_del_dir_entries_in_log(trans, root, name, name_len, dir,
> -                       index);
> -       if (ret == -ENOENT)
> -               ret = 0;
> -       else if (ret)
> -               btrfs_abort_transaction(trans, ret);
> +       btrfs_del_inode_ref_in_log(trans, root, name, name_len, inode,
> +                                  dir_ino);
> +       btrfs_del_dir_entries_in_log(trans, root, name, name_len, dir, index);
>
>         /*
>          * If we have a pending delayed iput we could end up with the final iput
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index e0c2d4c7f939..720723611875 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3500,10 +3500,10 @@ static bool inode_logged(struct btrfs_trans_handle *trans,
>   * This optimizations allows us to avoid relogging the entire inode
>   * or the entire directory.
>   */
> -int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
> -                                struct btrfs_root *root,
> -                                const char *name, int name_len,
> -                                struct btrfs_inode *dir, u64 index)
> +void btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
> +                                 struct btrfs_root *root,
> +                                 const char *name, int name_len,
> +                                 struct btrfs_inode *dir, u64 index)
>  {
>         struct btrfs_root *log;
>         struct btrfs_dir_item *di;
> @@ -3513,11 +3513,11 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
>         u64 dir_ino = btrfs_ino(dir);
>
>         if (!inode_logged(trans, dir))
> -               return 0;
> +               return;
>
>         ret = join_running_log_trans(root);
>         if (ret)
> -               return 0;
> +               return;
>
>         mutex_lock(&dir->log_mutex);
>
> @@ -3565,49 +3565,36 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
>         btrfs_free_path(path);
>  out_unlock:
>         mutex_unlock(&dir->log_mutex);
> -       if (err == -ENOSPC) {
> +       if (err < 0 && err != -ENOENT)
>                 btrfs_set_log_full_commit(trans);
> -               err = 0;
> -       } else if (err < 0 && err != -ENOENT) {
> -               /* ENOENT can be returned if the entry hasn't been fsynced yet */
> -               btrfs_abort_transaction(trans, err);
> -       }
> -
>         btrfs_end_log_trans(root);
> -
> -       return err;
>  }
>
>  /* see comments for btrfs_del_dir_entries_in_log */
> -int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
> -                              struct btrfs_root *root,
> -                              const char *name, int name_len,
> -                              struct btrfs_inode *inode, u64 dirid)
> +void btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
> +                               struct btrfs_root *root,
> +                               const char *name, int name_len,
> +                               struct btrfs_inode *inode, u64 dirid)
>  {
>         struct btrfs_root *log;
>         u64 index;
>         int ret;
>
>         if (!inode_logged(trans, inode))
> -               return 0;
> +               return;
>
>         ret = join_running_log_trans(root);
>         if (ret)
> -               return 0;
> +               return;
>         log = root->log_root;
>         mutex_lock(&inode->log_mutex);
>
>         ret = btrfs_del_inode_ref(trans, log, name, name_len, btrfs_ino(inode),
>                                   dirid, &index);
>         mutex_unlock(&inode->log_mutex);
> -       if (ret == -ENOSPC) {
> +       if (ret < 0 && ret != -ENOENT)
>                 btrfs_set_log_full_commit(trans);
> -               ret = 0;
> -       } else if (ret < 0 && ret != -ENOENT)
> -               btrfs_abort_transaction(trans, ret);
>         btrfs_end_log_trans(root);
> -
> -       return ret;
>  }
>
>  /*
> diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
> index 3ce6bdb76009..f6811c3df38a 100644
> --- a/fs/btrfs/tree-log.h
> +++ b/fs/btrfs/tree-log.h
> @@ -70,14 +70,14 @@ int btrfs_recover_log_trees(struct btrfs_root *tree_root);
>  int btrfs_log_dentry_safe(struct btrfs_trans_handle *trans,
>                           struct dentry *dentry,
>                           struct btrfs_log_ctx *ctx);
> -int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
> -                                struct btrfs_root *root,
> -                                const char *name, int name_len,
> -                                struct btrfs_inode *dir, u64 index);
> -int btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
> -                              struct btrfs_root *root,
> -                              const char *name, int name_len,
> -                              struct btrfs_inode *inode, u64 dirid);
> +void btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
> +                                 struct btrfs_root *root,
> +                                 const char *name, int name_len,
> +                                 struct btrfs_inode *dir, u64 index);
> +void btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
> +                               struct btrfs_root *root,
> +                               const char *name, int name_len,
> +                               struct btrfs_inode *inode, u64 dirid);
>  void btrfs_end_log_trans(struct btrfs_root *root);
>  void btrfs_pin_log_trans(struct btrfs_root *root);
>  void btrfs_record_unlink_dir(struct btrfs_trans_handle *trans,
> --
> 2.26.3
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 5/5] btrfs: fix abort logic in btrfs_replace_file_extents
  2021-09-27 18:05 ` [PATCH v3 5/5] btrfs: fix abort logic in btrfs_replace_file_extents Josef Bacik
@ 2021-09-28 10:07   ` Filipe Manana
  0 siblings, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2021-09-28 10:07 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Sep 27, 2021 at 7:07 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Error injection testing uncovered a case where we'd end up with a
> corrupt file system with a missing extent in the middle of a file.  This
> occurs because the if statement to decide if we should abort is wrong.
> The only way we would abort in this case is if we got a ret !=
> -EOPNOTSUPP and we called from the file clone code.  However the
> prealloc code uses this path too.  Instead we need to abort if there is
> an error, and the only error we _don't_ abort on is -EOPNOTSUPP and only
> if we came from the clone file code.

Looks good, but the comment above the conditional in the code needs to
be updated, to reflect the new logic as mentioned in the changelog.

Thanks.

>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/file.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index fdceab28587e..f9a1498cf030 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2710,8 +2710,9 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
>                          * returned by __btrfs_drop_extents() without having
>                          * changed anything in the file.
>                          */
> -                       if (extent_info && !extent_info->is_new_extent &&
> -                           ret && ret != -EOPNOTSUPP)
> +                       if (ret &&
> +                           (ret != -EOPNOTSUPP ||
> +                            (extent_info && extent_info->is_new_extent)))
>                                 btrfs_abort_transaction(trans, ret);
>                         break;
>                 }
> --
> 2.26.3
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted
  2021-09-27 18:05 ` [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted Josef Bacik
@ 2021-09-28 10:22   ` Filipe Manana
  2021-10-01 21:48     ` kernel test robot
  1 sibling, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2021-09-28 10:22 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Sep 27, 2021 at 7:07 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Error injection stressing uncovered a busy loop in our data reclaim
> loop.  There are two cases here, one where we loop creating block groups
> until space_info->full is set, or in the main loop we will skip erroring
> out any tickets if space_info->full == 0.  Unfortunately if we aborted
> the transaction then we will never allocate chunks or reclaim any space
> and thus never get ->full, and you'll see stack traces like this
>
> watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/u4:4:139]
> CPU: 0 PID: 139 Comm: kworker/u4:4 Tainted: G        W         5.13.0-rc1+ #328
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Workqueue: events_unbound btrfs_async_reclaim_data_space
> RIP: 0010:btrfs_join_transaction+0x12/0x20
> RSP: 0018:ffffb2b780b77de0 EFLAGS: 00000246
> RAX: ffffb2b781863d58 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000801 RSI: ffff987952b57400 RDI: ffff987940aa3000
> RBP: ffff987954d55000 R08: 0000000000000001 R09: ffff98795539e8f0
> R10: 000000000000000f R11: 000000000000000f R12: ffffffffffffffff
> R13: ffff987952b574c8 R14: ffff987952b57400 R15: 0000000000000008
> FS:  0000000000000000(0000) GS:ffff9879bbc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f0703da4000 CR3: 0000000113398004 CR4: 0000000000370ef0
> Call Trace:
>  flush_space+0x4a8/0x660
>  btrfs_async_reclaim_data_space+0x55/0x130
>  process_one_work+0x1e9/0x380
>  worker_thread+0x53/0x3e0
>  ? process_one_work+0x380/0x380
>  kthread+0x118/0x140
>  ? __kthread_bind_mask+0x60/0x60
>  ret_from_fork+0x1f/0x30
>
> Fix this by checking to see if we have a btrfs fs error in either of the
> reclaim loops, and if so fail the tickets and bail.  In addition to
> this, fix maybe_fail_all_tickets() to not try to grant tickets if we've
> aborted, simply fail everything.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/space-info.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index aa5be0b24987..63423f9729c5 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -885,6 +885,7 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
>  {
>         struct reserve_ticket *ticket;
>         u64 tickets_id = space_info->tickets_id;
> +       const bool aborted = btrfs_has_fs_error(fs_info);
>
>         trace_btrfs_fail_all_tickets(fs_info, space_info);
>
> @@ -898,16 +899,19 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
>                 ticket = list_first_entry(&space_info->tickets,
>                                           struct reserve_ticket, list);
>
> -               if (ticket->steal &&
> +               if (!aborted && ticket->steal &&
>                     steal_from_global_rsv(fs_info, space_info, ticket))
>                         return true;
>
> -               if (btrfs_test_opt(fs_info, ENOSPC_DEBUG))
> +               if (!aborted && btrfs_test_opt(fs_info, ENOSPC_DEBUG))
>                         btrfs_info(fs_info, "failing ticket with %llu bytes",
>                                    ticket->bytes);
>
>                 remove_ticket(space_info, ticket);
> -               ticket->error = -ENOSPC;
> +               if (aborted)
> +                       ticket->error = -EIO;
> +               else
> +                       ticket->error = -ENOSPC;
>                 wake_up(&ticket->wait);
>
>                 /*
> @@ -916,7 +920,8 @@ static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
>                  * here to see if we can make progress with the next ticket in
>                  * the list.
>                  */
> -               btrfs_try_granting_tickets(fs_info, space_info);
> +               if (!aborted)
> +                       btrfs_try_granting_tickets(fs_info, space_info);
>         }
>         return (tickets_id != space_info->tickets_id);
>  }
> @@ -1172,6 +1177,10 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
>                         spin_unlock(&space_info->lock);
>                         return;
>                 }
> +
> +               /* Something happened, fail everything and bail. */
> +               if (btrfs_has_fs_error(fs_info))
> +                       goto aborted_fs;
>                 last_tickets_id = space_info->tickets_id;
>                 spin_unlock(&space_info->lock);
>         }
> @@ -1202,9 +1211,19 @@ static void btrfs_async_reclaim_data_space(struct work_struct *work)
>                         } else {
>                                 flush_state = 0;
>                         }
> +
> +                       /* Something happened, fail everything and bail. */
> +                       if (btrfs_has_fs_error(fs_info))
> +                               goto aborted_fs;
> +
>                 }
>                 spin_unlock(&space_info->lock);
>         }
> +       return;
> +aborted_fs:
> +       maybe_fail_all_tickets(fs_info, space_info);
> +       space_info->flush = 0;
> +       spin_unlock(&space_info->lock);
>  }
>
>  void btrfs_init_async_reclaim_work(struct btrfs_fs_info *fs_info)
> --
> 2.26.3
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted
  2021-09-27 18:05 ` [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted Josef Bacik
@ 2021-10-01 21:48     ` kernel test robot
  2021-10-01 21:48     ` kernel test robot
  1 sibling, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-10-01 21:48 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4490 bytes --]

Hi Josef,

I love your patch! Yet something to improve:

[auto build test ERROR on v5.15-rc3]
[cannot apply to kdave/for-next next-20210922]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Josef-Bacik/Miscellaneous-error-handling-patches/20210929-185151
base:    5816b3e6577eaa676ceb00a848f0fd65fe2adc29
config: nios2-allyesconfig (attached as .config)
compiler: nios2-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/4d626ae95cb373b954751bcdadacf6b0f92f3a6c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Josef-Bacik/Miscellaneous-error-handling-patches/20210929-185151
        git checkout 4d626ae95cb373b954751bcdadacf6b0f92f3a6c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=nios2 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/btrfs/space-info.c: In function 'maybe_fail_all_tickets':
>> fs/btrfs/space-info.c:888:30: error: implicit declaration of function 'btrfs_has_fs_error'; did you mean 'btrfs_handle_fs_error'? [-Werror=implicit-function-declaration]
     888 |         const bool aborted = btrfs_has_fs_error(fs_info);
         |                              ^~~~~~~~~~~~~~~~~~
         |                              btrfs_handle_fs_error
   cc1: all warnings being treated as errors


vim +888 fs/btrfs/space-info.c

   867	
   868	/*
   869	 * maybe_fail_all_tickets - we've exhausted our flushing, start failing tickets
   870	 * @fs_info - fs_info for this fs
   871	 * @space_info - the space info we were flushing
   872	 *
   873	 * We call this when we've exhausted our flushing ability and haven't made
   874	 * progress in satisfying tickets.  The reservation code handles tickets in
   875	 * order, so if there is a large ticket first and then smaller ones we could
   876	 * very well satisfy the smaller tickets.  This will attempt to wake up any
   877	 * tickets in the list to catch this case.
   878	 *
   879	 * This function returns true if it was able to make progress by clearing out
   880	 * other tickets, or if it stumbles across a ticket that was smaller than the
   881	 * first ticket.
   882	 */
   883	static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
   884					   struct btrfs_space_info *space_info)
   885	{
   886		struct reserve_ticket *ticket;
   887		u64 tickets_id = space_info->tickets_id;
 > 888		const bool aborted = btrfs_has_fs_error(fs_info);
   889	
   890		trace_btrfs_fail_all_tickets(fs_info, space_info);
   891	
   892		if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
   893			btrfs_info(fs_info, "cannot satisfy tickets, dumping space info");
   894			__btrfs_dump_space_info(fs_info, space_info);
   895		}
   896	
   897		while (!list_empty(&space_info->tickets) &&
   898		       tickets_id == space_info->tickets_id) {
   899			ticket = list_first_entry(&space_info->tickets,
   900						  struct reserve_ticket, list);
   901	
   902			if (!aborted && ticket->steal &&
   903			    steal_from_global_rsv(fs_info, space_info, ticket))
   904				return true;
   905	
   906			if (!aborted && btrfs_test_opt(fs_info, ENOSPC_DEBUG))
   907				btrfs_info(fs_info, "failing ticket with %llu bytes",
   908					   ticket->bytes);
   909	
   910			remove_ticket(space_info, ticket);
   911			if (aborted)
   912				ticket->error = -EIO;
   913			else
   914				ticket->error = -ENOSPC;
   915			wake_up(&ticket->wait);
   916	
   917			/*
   918			 * We're just throwing tickets away, so more flushing may not
   919			 * trip over btrfs_try_granting_tickets, so we need to call it
   920			 * here to see if we can make progress with the next ticket in
   921			 * the list.
   922			 */
   923			if (!aborted)
   924				btrfs_try_granting_tickets(fs_info, space_info);
   925		}
   926		return (tickets_id != space_info->tickets_id);
   927	}
   928	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 60767 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted
@ 2021-10-01 21:48     ` kernel test robot
  0 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-10-01 21:48 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4597 bytes --]

Hi Josef,

I love your patch! Yet something to improve:

[auto build test ERROR on v5.15-rc3]
[cannot apply to kdave/for-next next-20210922]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Josef-Bacik/Miscellaneous-error-handling-patches/20210929-185151
base:    5816b3e6577eaa676ceb00a848f0fd65fe2adc29
config: nios2-allyesconfig (attached as .config)
compiler: nios2-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/4d626ae95cb373b954751bcdadacf6b0f92f3a6c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Josef-Bacik/Miscellaneous-error-handling-patches/20210929-185151
        git checkout 4d626ae95cb373b954751bcdadacf6b0f92f3a6c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=nios2 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/btrfs/space-info.c: In function 'maybe_fail_all_tickets':
>> fs/btrfs/space-info.c:888:30: error: implicit declaration of function 'btrfs_has_fs_error'; did you mean 'btrfs_handle_fs_error'? [-Werror=implicit-function-declaration]
     888 |         const bool aborted = btrfs_has_fs_error(fs_info);
         |                              ^~~~~~~~~~~~~~~~~~
         |                              btrfs_handle_fs_error
   cc1: all warnings being treated as errors


vim +888 fs/btrfs/space-info.c

   867	
   868	/*
   869	 * maybe_fail_all_tickets - we've exhausted our flushing, start failing tickets
   870	 * @fs_info - fs_info for this fs
   871	 * @space_info - the space info we were flushing
   872	 *
   873	 * We call this when we've exhausted our flushing ability and haven't made
   874	 * progress in satisfying tickets.  The reservation code handles tickets in
   875	 * order, so if there is a large ticket first and then smaller ones we could
   876	 * very well satisfy the smaller tickets.  This will attempt to wake up any
   877	 * tickets in the list to catch this case.
   878	 *
   879	 * This function returns true if it was able to make progress by clearing out
   880	 * other tickets, or if it stumbles across a ticket that was smaller than the
   881	 * first ticket.
   882	 */
   883	static bool maybe_fail_all_tickets(struct btrfs_fs_info *fs_info,
   884					   struct btrfs_space_info *space_info)
   885	{
   886		struct reserve_ticket *ticket;
   887		u64 tickets_id = space_info->tickets_id;
 > 888		const bool aborted = btrfs_has_fs_error(fs_info);
   889	
   890		trace_btrfs_fail_all_tickets(fs_info, space_info);
   891	
   892		if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
   893			btrfs_info(fs_info, "cannot satisfy tickets, dumping space info");
   894			__btrfs_dump_space_info(fs_info, space_info);
   895		}
   896	
   897		while (!list_empty(&space_info->tickets) &&
   898		       tickets_id == space_info->tickets_id) {
   899			ticket = list_first_entry(&space_info->tickets,
   900						  struct reserve_ticket, list);
   901	
   902			if (!aborted && ticket->steal &&
   903			    steal_from_global_rsv(fs_info, space_info, ticket))
   904				return true;
   905	
   906			if (!aborted && btrfs_test_opt(fs_info, ENOSPC_DEBUG))
   907				btrfs_info(fs_info, "failing ticket with %llu bytes",
   908					   ticket->bytes);
   909	
   910			remove_ticket(space_info, ticket);
   911			if (aborted)
   912				ticket->error = -EIO;
   913			else
   914				ticket->error = -ENOSPC;
   915			wake_up(&ticket->wait);
   916	
   917			/*
   918			 * We're just throwing tickets away, so more flushing may not
   919			 * trip over btrfs_try_granting_tickets, so we need to call it
   920			 * here to see if we can make progress with the next ticket in
   921			 * the list.
   922			 */
   923			if (!aborted)
   924				btrfs_try_granting_tickets(fs_info, space_info);
   925		}
   926		return (tickets_id != space_info->tickets_id);
   927	}
   928	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 60767 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-10-01 21:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-27 18:05 [PATCH v3 0/5] Miscellaneous error handling patches Josef Bacik
2021-09-27 18:05 ` [PATCH v3 1/5] btrfs: change handle_fs_error in recover_log_trees to aborts Josef Bacik
2021-09-28  5:46   ` Anand Jain
2021-09-27 18:05 ` [PATCH v3 2/5] btrfs: change error handling for btrfs_delete_*_in_log Josef Bacik
2021-09-28  9:53   ` Filipe Manana
2021-09-27 18:05 ` [PATCH v3 3/5] btrfs: add a BTRFS_FS_ERROR helper Josef Bacik
2021-09-28  7:38   ` Anand Jain
2021-09-27 18:05 ` [PATCH v3 4/5] btrfs: do not infinite loop in data reclaim if we aborted Josef Bacik
2021-09-28 10:22   ` Filipe Manana
2021-10-01 21:48   ` kernel test robot
2021-10-01 21:48     ` kernel test robot
2021-09-27 18:05 ` [PATCH v3 5/5] btrfs: fix abort logic in btrfs_replace_file_extents Josef Bacik
2021-09-28 10:07   ` Filipe Manana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.