linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] btrfs: some log tree fixes and cleanups
@ 2023-01-10 14:56 fdmanana
  2023-01-10 14:56 ` [PATCH 1/8] btrfs: fix missing error handling when logging directory items fdmanana
                   ` (9 more replies)
  0 siblings, 10 replies; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Some fixes mostly around directory logging and some cleanups (the last
three patches).

Filipe Manana (8):
  btrfs: fix missing error handling when logging directory items
  btrfs: fix directory logging due to race with concurrent index key deletion
  btrfs: add missing setup of log for full commit at add_conflicting_inode()
  btrfs: do not abort transaction on failure to write log tree when syncing log
  btrfs: do not abort transaction on failure to update log root
  btrfs: simplify update of last_dir_index_offset when logging a directory
  btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT
  btrfs: use a single variable to track return value for log_dir_items()

 fs/btrfs/disk-io.c  |  9 ++++-
 fs/btrfs/tree-log.c | 91 ++++++++++++++++++++++++++++-----------------
 fs/btrfs/tree-log.h | 11 ++++--
 3 files changed, 71 insertions(+), 40 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/8] btrfs: fix missing error handling when logging directory items
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
@ 2023-01-10 14:56 ` fdmanana
  2023-01-10 14:56 ` [PATCH 2/8] btrfs: fix directory logging due to race with concurrent index key deletion fdmanana
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When logging a directory, at log_dir_items(), if we get an error when
attempting to search the subvolume tree for a dir index item, we end up
returning 0 (success) from log_dir_items() because 'err' is left with a
value of 0.

This can lead to a few problems, specially in the case the variable
'last_offset' has a value of (u64)-1 (and it's initialized to that when
it was declared):

1) By returning from log_dir_items() with success (0) and a value of
   (u64)-1 for '*last_offset_ret', we end up not logging any other dir
   index keys that follow the missing, just deleted, index key. The
   (u64)-1 value makes log_directory_changes() not call log_dir_items()
   again;

2) Before returning with success (0), log_dir_items(), will log a dir
   index range item covering a range from the last old dentry index
   (stored in the variable 'last_old_dentry_offset') to the value of
   'last_offset'. If 'last_offset' has a value of (u64)-1, then it means
   if the log is persisted and replayed after a power failure, it will
   cause deletion of all the directory entries that have an index number
   between last_old_dentry_offset + 1 and (u64)-1;

3) We can end up returning from log_dir_items() with
   ctx->last_dir_item_offset having a lower value than
   inode->last_dir_index_offset, because the former is set to the current
   key we are processing at process_dir_items_leaf(), and at the end of
   log_directory_changes() we set inode->last_dir_index_offset to the
   current value of ctx->last_dir_item_offset. So if for example a
   deletion of a lower dir index key happened, we set
   ctx->last_dir_item_offset to that index value, then if we return from
   log_dir_items() because btrfs_search_slot() returned an error, we end up
   returning without any error from log_dir_items() and then
   log_directory_changes() sets inode->last_dir_index_offset to a lower
   value than it had before.
   This can result in unpredictable and unexpected behaviour when we
   need to log again the directory in the same transaction, and can result
   in ending up with a log tree leaf that has duplicated keys, as we do
   batch insertions of dir index keys into a log tree.

Fix this by setting 'err' to the value of 'ret' in case
btrfs_search_slot() or btrfs_previous_item() returned an error. That will
result in falling back to a full transaction commit.

Reported-by: David Arendt <admin@prnet.org>
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
Fixes: e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index fb52aa060093..3ef0266e9527 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3826,7 +3826,10 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 					      path->slots[0]);
 			if (tmp.type == BTRFS_DIR_INDEX_KEY)
 				last_old_dentry_offset = tmp.offset;
+		} else if (ret < 0) {
+			err = ret;
 		}
+
 		goto done;
 	}
 
@@ -3846,7 +3849,11 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 		 */
 		if (tmp.type == BTRFS_DIR_INDEX_KEY)
 			last_old_dentry_offset = tmp.offset;
+	} else if (ret < 0) {
+		err = ret;
+		goto done;
 	}
+
 	btrfs_release_path(path);
 
 	/*
@@ -3859,6 +3866,8 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 	 */
 search:
 	ret = btrfs_search_slot(NULL, root, &min_key, path, 0, 0);
+	if (ret < 0)
+		err = ret;
 	if (ret != 0)
 		goto done;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/8] btrfs: fix directory logging due to race with concurrent index key deletion
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
  2023-01-10 14:56 ` [PATCH 1/8] btrfs: fix missing error handling when logging directory items fdmanana
@ 2023-01-10 14:56 ` fdmanana
  2023-01-10 14:56 ` [PATCH 3/8] btrfs: add missing setup of log for full commit at add_conflicting_inode() fdmanana
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Sometimes we log a directory without holding its VFS lock, so while we
logging it, dir index entries may be added or removed. This typically
happens when logging a dentry from a parent directory that points to a
new directory, through log_new_dir_dentries(), or when while logging
some other inode we also need to log its parent directories (through
btrfs_log_all_parents()).

This means that while we are at log_dir_items(), we may not find a dir
index key we found before, because it was deleted in the meanwhile, so
a call to btrfs_search_slot() may return 1 (key not found). In that case
we return from log_dir_items() with a success value (the variable 'err'
has a value of 0). This can lead to a few problems, specially in the case
where the variable 'last_offset' has a value of (u64)-1 (and it's
initialized to that when it was declared):

1) By returning from log_dir_items() with success (0) and a value of
   (u64)-1 for '*last_offset_ret', we end up not logging any other dir
   index keys that follow the missing, just deleted, index key. The
   (u64)-1 value makes log_directory_changes() not call log_dir_items()
   again;

2) Before returning with success (0), log_dir_items(), will log a dir
   index range item covering a range from the last old dentry index
   (stored in the variable 'last_old_dentry_offset') to the value of
   'last_offset'. If 'last_offset' has a value of (u64)-1, then it means
   if the log is persisted and replayed after a power failure, it will
   cause deletion of all the directory entries that have an index number
   between last_old_dentry_offset + 1 and (u64)-1;

3) We can end up returning from log_dir_items() with
   ctx->last_dir_item_offset having a lower value than
   inode->last_dir_index_offset, because the former is set to the current
   key we are processing at process_dir_items_leaf(), and at the end of
   log_directory_changes() we set inode->last_dir_index_offset to the
   current value of ctx->last_dir_item_offset. So if for example a
   deletion of a lower dir index key happened, we set
   ctx->last_dir_item_offset to that index value, then if we return from
   log_dir_items() because btrfs_search_slot() returned 1, we end up
   returning from log_dir_items() with success (0) and then
   log_directory_changes() sets inode->last_dir_index_offset to a lower
   value than it had before.
   This can result in unpredictable and unexpected behaviour when we
   need to log again the directory in the same transaction, and can result
   in ending up with a log tree leaf that has duplicated keys, as we do
   batch insertions of dir index keys into a log tree.

So fix this by making log_dir_items() move on to the next dir index key
if it does not find the one it was looking for.

Reported-by: David Arendt <admin@prnet.org>
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3ef0266e9527..c09daab3f19e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3857,17 +3857,26 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 	btrfs_release_path(path);
 
 	/*
-	 * Find the first key from this transaction again.  See the note for
-	 * log_new_dir_dentries, if we're logging a directory recursively we
-	 * won't be holding its i_mutex, which means we can modify the directory
-	 * while we're logging it.  If we remove an entry between our first
-	 * search and this search we'll not find the key again and can just
-	 * bail.
+	 * Find the first key from this transaction again or the one we were at
+	 * in the loop below in case we had to reschedule. We may be logging the
+	 * directory without holding its VFS lock, which happen when logging new
+	 * dentries (through log_new_dir_dentries()) or in some cases when we
+	 * need to log the parent directory of an inode. This means a dir index
+	 * key might be deleted from the inode's root, and therefore we may not
+	 * find it anymore. If we can't find it, just move to the next key. We
+	 * can not bail out and ignore, because if we do that we will simply
+	 * not log dir index keys that come after the one that was just deleted
+	 * and we can end up logging a dir index range that ends at (u64)-1
+	 * (@last_offset is initialized to that), resulting in removing dir
+	 * entries we should not remove at log replay time.
 	 */
 search:
 	ret = btrfs_search_slot(NULL, root, &min_key, path, 0, 0);
+	if (ret > 0)
+		ret = btrfs_next_item(root, path);
 	if (ret < 0)
 		err = ret;
+	/* If ret is 1, there are no more keys in the inode's root. */
 	if (ret != 0)
 		goto done;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/8] btrfs: add missing setup of log for full commit at add_conflicting_inode()
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
  2023-01-10 14:56 ` [PATCH 1/8] btrfs: fix missing error handling when logging directory items fdmanana
  2023-01-10 14:56 ` [PATCH 2/8] btrfs: fix directory logging due to race with concurrent index key deletion fdmanana
@ 2023-01-10 14:56 ` fdmanana
  2023-01-10 14:56 ` [PATCH 4/8] btrfs: do not abort transaction on failure to write log tree when syncing log fdmanana
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When logging conflicting inodes, if we reach the maximum limit of inodes,
we return BTRFS_LOG_FORCE_COMMIT to force a transaction commit. However
we don't mark the log for full commit (with btrfs_set_log_full_commit()),
which means that once we leave the log transaction and before we commit
the transaction, some other task may sync the log, which is incomplete
as we have not logged all conflicting inodes, leading to some inconsistent
in case that log ends up being replayed.

So also call btrfs_set_log_full_commit() at add_conflicting_inode().

Fixes: e09d94c9e448 ("btrfs: log conflicting inodes without holding log mutex of the initial inode")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c09daab3f19e..afad44a0becf 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5598,8 +5598,10 @@ static int add_conflicting_inode(struct btrfs_trans_handle *trans,
 	 * LOG_INODE_EXISTS mode) and slow down other fsyncs or transaction
 	 * commits.
 	 */
-	if (ctx->num_conflict_inodes >= MAX_CONFLICT_INODES)
+	if (ctx->num_conflict_inodes >= MAX_CONFLICT_INODES) {
+		btrfs_set_log_full_commit(trans);
 		return BTRFS_LOG_FORCE_COMMIT;
+	}
 
 	inode = btrfs_iget(root->fs_info->sb, ino, root);
 	/*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/8] btrfs: do not abort transaction on failure to write log tree when syncing log
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
                   ` (2 preceding siblings ...)
  2023-01-10 14:56 ` [PATCH 3/8] btrfs: add missing setup of log for full commit at add_conflicting_inode() fdmanana
@ 2023-01-10 14:56 ` fdmanana
  2023-01-10 14:56 ` [PATCH 5/8] btrfs: do not abort transaction on failure to update log root fdmanana
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When syncing the log, if we fail to write log tree extent buffers, we mark
the log for a full commit and abort the transaction. However we don't need
to abort the transaction, all we really need to do is to make sure no one
can commit a superblock pointing to new log tree roots. Just because we
got a failure writing extent buffers for a log tree, it does not mean we
will also fail to do a transaction commit.

One particular case is if due to a bug somewhere, when writing log tree
extent buffers, the tree checker detects some corruption and the writeout
fails because of that. Aborting the transaction can be very disruptive for
a user, specially if the issue happened on a root filesystem. One example
is the scenario in the Link tag below, where an isolated corruption on log
tree leaves was causing transaction aborts when syncing the log.

Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/disk-io.c  | 9 ++++++++-
 fs/btrfs/tree-log.c | 2 --
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7d5a3967ba0..7586a8e9b718 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -367,7 +367,14 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
 	btrfs_print_tree(eb, 0);
 	btrfs_err(fs_info, "block=%llu write time tree block corruption detected",
 		  eb->start);
-	WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+	/*
+	 * Be noisy if this is an extent buffer from a log tree. We don't abort
+	 * a transaction in case there's a bad log tree extent buffer, we just
+	 * fallback to a transaction commit. Still we want to know when there is
+	 * a bad log tree extent buffer, as that may signal a bug somewhere.
+	 */
+	WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG) ||
+		btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID);
 	return ret;
 }
 
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index afad44a0becf..1f70d4ebffae 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2980,7 +2980,6 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 		ret = 0;
 	if (ret) {
 		blk_finish_plug(&plug);
-		btrfs_abort_transaction(trans, ret);
 		btrfs_set_log_full_commit(trans);
 		mutex_unlock(&root->log_mutex);
 		goto out;
@@ -3112,7 +3111,6 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 		goto out_wake_log_root;
 	} else if (ret) {
 		btrfs_set_log_full_commit(trans);
-		btrfs_abort_transaction(trans, ret);
 		mutex_unlock(&log_root_tree->log_mutex);
 		goto out_wake_log_root;
 	}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/8] btrfs: do not abort transaction on failure to update log root
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
                   ` (3 preceding siblings ...)
  2023-01-10 14:56 ` [PATCH 4/8] btrfs: do not abort transaction on failure to write log tree when syncing log fdmanana
@ 2023-01-10 14:56 ` fdmanana
  2023-01-10 14:56 ` [PATCH 6/8] btrfs: simplify update of last_dir_index_offset when logging a directory fdmanana
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When syncing a log, if we fail to update a log root in the log root tree,
we are aborting the transaction if the failure was not -ENOSPC. This is
excessive because there is a chance that a transaction commit can succeed,
and therefore avoid to turn the filesystem into RO mode. All we need to be
careful about is to mark the log for a full commit, which we already do,
to make sure no one commits a super block pointing to an oudated log root
tree.

So don't abort the transaction if we fail to update a log root in the log
root tree, and log an error if the failure is not -ENOSPC, so that it does
not go completely unnoticed.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 1f70d4ebffae..d43261545264 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3044,15 +3044,12 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 
 		blk_finish_plug(&plug);
 		btrfs_set_log_full_commit(trans);
-
-		if (ret != -ENOSPC) {
-			btrfs_abort_transaction(trans, ret);
-			mutex_unlock(&log_root_tree->log_mutex);
-			goto out;
-		}
+		if (ret != -ENOSPC)
+			btrfs_err(fs_info,
+				  "failed to update log for root %llu ret %d",
+				  root->root_key.objectid, ret);
 		btrfs_wait_tree_log_extents(log, mark);
 		mutex_unlock(&log_root_tree->log_mutex);
-		ret = BTRFS_LOG_FORCE_COMMIT;
 		goto out;
 	}
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/8] btrfs: simplify update of last_dir_index_offset when logging a directory
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
                   ` (4 preceding siblings ...)
  2023-01-10 14:56 ` [PATCH 5/8] btrfs: do not abort transaction on failure to update log root fdmanana
@ 2023-01-10 14:56 ` fdmanana
  2023-01-31 17:42   ` Filipe Manana
  2023-01-10 14:56 ` [PATCH 7/8] btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT fdmanana
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When logging a directory, we always set the inode's last_dir_index_offset
to the offset of the last dir index item we found. This is using an extra
field in the log context structure, and it makes more sense to update it
only after we insert dir index items, and we could directly update the
inode's last_dir_index_offset field instead.

So make this simpler by updating the inode's last_dir_index_offset only
when we actually insert dir index keys in the log tree, and getting rid
of the last_dir_item_offset field in the log context structure.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 23 +++++++++++++++++------
 fs/btrfs/tree-log.h |  2 --
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index d43261545264..58599189bd18 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3576,17 +3576,19 @@ static noinline int insert_dir_log_key(struct btrfs_trans_handle *trans,
 }
 
 static int flush_dir_items_batch(struct btrfs_trans_handle *trans,
-				 struct btrfs_root *log,
+				 struct btrfs_inode *inode,
 				 struct extent_buffer *src,
 				 struct btrfs_path *dst_path,
 				 int start_slot,
 				 int count)
 {
+	struct btrfs_root *log = inode->root->log_root;
 	char *ins_data = NULL;
 	struct btrfs_item_batch batch;
 	struct extent_buffer *dst;
 	unsigned long src_offset;
 	unsigned long dst_offset;
+	u64 last_index;
 	struct btrfs_key key;
 	u32 item_size;
 	int ret;
@@ -3644,6 +3646,19 @@ static int flush_dir_items_batch(struct btrfs_trans_handle *trans,
 	src_offset = btrfs_item_ptr_offset(src, start_slot + count - 1);
 	copy_extent_buffer(dst, src, dst_offset, src_offset, batch.total_data_size);
 	btrfs_release_path(dst_path);
+
+	last_index = batch.keys[count - 1].offset;
+	ASSERT(last_index > inode->last_dir_index_offset);
+
+	/*
+	 * If for some unexpected reason the last item's index is not greater
+	 * than the last index we logged, warn and return an error to fallback
+	 * to a transaction commit.
+	 */
+	if (WARN_ON(last_index <= inode->last_dir_index_offset))
+		ret = -EUCLEAN;
+	else
+		inode->last_dir_index_offset = last_index;
 out:
 	kfree(ins_data);
 
@@ -3693,7 +3708,6 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
 		}
 
 		di = btrfs_item_ptr(src, i, struct btrfs_dir_item);
-		ctx->last_dir_item_offset = key.offset;
 
 		/*
 		 * Skip ranges of items that consist only of dir item keys created
@@ -3756,7 +3770,7 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
 	if (batch_size > 0) {
 		int ret;
 
-		ret = flush_dir_items_batch(trans, log, src, dst_path,
+		ret = flush_dir_items_batch(trans, inode, src, dst_path,
 					    batch_start, batch_size);
 		if (ret < 0)
 			return ret;
@@ -4044,7 +4058,6 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
 
 	min_key = BTRFS_DIR_START_INDEX;
 	max_key = 0;
-	ctx->last_dir_item_offset = inode->last_dir_index_offset;
 
 	while (1) {
 		ret = log_dir_items(trans, inode, path, dst_path,
@@ -4056,8 +4069,6 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
 		min_key = max_key + 1;
 	}
 
-	inode->last_dir_index_offset = ctx->last_dir_item_offset;
-
 	return 0;
 }
 
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 85b43075ac58..85cd24cb0540 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -24,8 +24,6 @@ struct btrfs_log_ctx {
 	bool logging_new_delayed_dentries;
 	/* Indicate if the inode being logged was logged before. */
 	bool logged_before;
-	/* Tracks the last logged dir item/index key offset. */
-	u64 last_dir_item_offset;
 	struct inode *inode;
 	struct list_head list;
 	/* Only used for fast fsyncs. */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/8] btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
                   ` (5 preceding siblings ...)
  2023-01-10 14:56 ` [PATCH 6/8] btrfs: simplify update of last_dir_index_offset when logging a directory fdmanana
@ 2023-01-10 14:56 ` fdmanana
  2023-01-10 14:56 ` [PATCH 8/8] btrfs: use a single variable to track return value for log_dir_items() fdmanana
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Currently we use the value 1 for BTRFS_LOG_FORCE_COMMIT, but that value
has a few inconveniences:

1) If it's ever used by btrfs_log_inode(), or any function down the call
   chain, we have to remember to btrfs_set_log_full_commit(), which is
   repetitive and has a chance to be forgotten in future use cases.
   btrfs_log_inode_parent() only calls btrfs_set_log_full_commit() when
   it gets a negative value from btrfs_log_inode();

2) Down the call chain of btrfs_log_inode(), we may have functions that
   need to force a log commit, but can return either an error (negative
   value), false (0) or true (1). So they are forced to return some
   random negative to force a log commit - using BTRFS_LOG_FORCE_COMMIT
   would make the intention more clear. Currently the only example is
   flush_dir_items_batch().

So turn BTRFS_LOG_FORCE_COMMIT into a negative value. The chosen value
is -(MAX_ERRNO + 1), so that it does not overlap any errno value and makes
it easier to debug.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 10 +++-------
 fs/btrfs/tree-log.h |  9 +++++++--
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 58599189bd18..94fc8b08254c 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3652,11 +3652,10 @@ static int flush_dir_items_batch(struct btrfs_trans_handle *trans,
 
 	/*
 	 * If for some unexpected reason the last item's index is not greater
-	 * than the last index we logged, warn and return an error to fallback
-	 * to a transaction commit.
+	 * than the last index we logged, warn and force a transaction commit.
 	 */
 	if (WARN_ON(last_index <= inode->last_dir_index_offset))
-		ret = -EUCLEAN;
+		ret = BTRFS_LOG_FORCE_COMMIT;
 	else
 		inode->last_dir_index_offset = last_index;
 out:
@@ -5604,10 +5603,8 @@ static int add_conflicting_inode(struct btrfs_trans_handle *trans,
 	 * LOG_INODE_EXISTS mode) and slow down other fsyncs or transaction
 	 * commits.
 	 */
-	if (ctx->num_conflict_inodes >= MAX_CONFLICT_INODES) {
-		btrfs_set_log_full_commit(trans);
+	if (ctx->num_conflict_inodes >= MAX_CONFLICT_INODES)
 		return BTRFS_LOG_FORCE_COMMIT;
-	}
 
 	inode = btrfs_iget(root->fs_info->sb, ino, root);
 	/*
@@ -6466,7 +6463,6 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 	 * result in losing the file after a log replay.
 	 */
 	if (full_dir_logging && inode->last_unlink_trans >= trans->transid) {
-		btrfs_set_log_full_commit(trans);
 		ret = BTRFS_LOG_FORCE_COMMIT;
 		goto out_unlock;
 	}
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 85cd24cb0540..bdeb5216718f 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -13,8 +13,13 @@
 /* return value for btrfs_log_dentry_safe that means we don't need to log it at all */
 #define BTRFS_NO_LOG_SYNC 256
 
-/* We can't use the tree log for whatever reason, force a transaction commit */
-#define BTRFS_LOG_FORCE_COMMIT				(1)
+/*
+ * We can't use the tree log for whatever reason, force a transaction commit.
+ * We use a negative value because there are functions through the logging code
+ * that need to return an error (< 0 value), false (0) or true (1). Any negative
+ * value will do, as it will cause the log to be marked for a full sync.
+ */
+#define BTRFS_LOG_FORCE_COMMIT				(-(MAX_ERRNO + 1))
 
 struct btrfs_log_ctx {
 	int log_ret;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 8/8] btrfs: use a single variable to track return value for log_dir_items()
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
                   ` (6 preceding siblings ...)
  2023-01-10 14:56 ` [PATCH 7/8] btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT fdmanana
@ 2023-01-10 14:56 ` fdmanana
  2023-01-11 21:24 ` [PATCH 0/8] btrfs: some log tree fixes and cleanups Josef Bacik
  2023-01-12 14:45 ` David Sterba
  9 siblings, 0 replies; 13+ messages in thread
From: fdmanana @ 2023-01-10 14:56 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

We currently use 'ret' and 'err' to track the return value for
log_dir_items(), which is confusing and likely the cause for previous
bugs where log_dir_items() did not return an error when it should, fixed
in previous patches.

So change this and use only a single variable, 'ret', to track the return
value. This is simpler and makes it similar to most of the existing code.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-log.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 94fc8b08254c..997ba92481cb 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3793,7 +3793,6 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 	struct btrfs_key min_key;
 	struct btrfs_root *root = inode->root;
 	struct btrfs_root *log = root->log_root;
-	int err = 0;
 	int ret;
 	u64 last_old_dentry_offset = min_offset - 1;
 	u64 last_offset = (u64)-1;
@@ -3834,8 +3833,8 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 					      path->slots[0]);
 			if (tmp.type == BTRFS_DIR_INDEX_KEY)
 				last_old_dentry_offset = tmp.offset;
-		} else if (ret < 0) {
-			err = ret;
+		} else if (ret > 0) {
+			ret = 0;
 		}
 
 		goto done;
@@ -3858,7 +3857,6 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 		if (tmp.type == BTRFS_DIR_INDEX_KEY)
 			last_old_dentry_offset = tmp.offset;
 	} else if (ret < 0) {
-		err = ret;
 		goto done;
 	}
 
@@ -3880,12 +3878,15 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 	 */
 search:
 	ret = btrfs_search_slot(NULL, root, &min_key, path, 0, 0);
-	if (ret > 0)
+	if (ret > 0) {
 		ret = btrfs_next_item(root, path);
+		if (ret > 0) {
+			/* There are no more keys in the inode's root. */
+			ret = 0;
+			goto done;
+		}
+	}
 	if (ret < 0)
-		err = ret;
-	/* If ret is 1, there are no more keys in the inode's root. */
-	if (ret != 0)
 		goto done;
 
 	/*
@@ -3896,8 +3897,8 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 		ret = process_dir_items_leaf(trans, inode, path, dst_path, ctx,
 					     &last_old_dentry_offset);
 		if (ret != 0) {
-			if (ret < 0)
-				err = ret;
+			if (ret > 0)
+				ret = 0;
 			goto done;
 		}
 		path->slots[0] = btrfs_header_nritems(path->nodes[0]);
@@ -3908,10 +3909,10 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 		 */
 		ret = btrfs_next_leaf(root, path);
 		if (ret) {
-			if (ret == 1)
+			if (ret == 1) {
 				last_offset = (u64)-1;
-			else
-				err = ret;
+				ret = 0;
+			}
 			goto done;
 		}
 		btrfs_item_key_to_cpu(path->nodes[0], &min_key, path->slots[0]);
@@ -3942,7 +3943,7 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 	btrfs_release_path(path);
 	btrfs_release_path(dst_path);
 
-	if (err == 0) {
+	if (ret == 0) {
 		*last_offset_ret = last_offset;
 		/*
 		 * In case the leaf was changed in the current transaction but
@@ -3953,15 +3954,13 @@ static noinline int log_dir_items(struct btrfs_trans_handle *trans,
 		 * a range, last_old_dentry_offset is == to last_offset.
 		 */
 		ASSERT(last_old_dentry_offset <= last_offset);
-		if (last_old_dentry_offset < last_offset) {
+		if (last_old_dentry_offset < last_offset)
 			ret = insert_dir_log_key(trans, log, path, ino,
 						 last_old_dentry_offset + 1,
 						 last_offset);
-			if (ret)
-				err = ret;
-		}
 	}
-	return err;
+
+	return ret;
 }
 
 /*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/8] btrfs: some log tree fixes and cleanups
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
                   ` (7 preceding siblings ...)
  2023-01-10 14:56 ` [PATCH 8/8] btrfs: use a single variable to track return value for log_dir_items() fdmanana
@ 2023-01-11 21:24 ` Josef Bacik
  2023-01-12 14:45 ` David Sterba
  9 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2023-01-11 21:24 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Tue, Jan 10, 2023 at 02:56:33PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Some fixes mostly around directory logging and some cleanups (the last
> three patches).
> 

You can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/8] btrfs: some log tree fixes and cleanups
  2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
                   ` (8 preceding siblings ...)
  2023-01-11 21:24 ` [PATCH 0/8] btrfs: some log tree fixes and cleanups Josef Bacik
@ 2023-01-12 14:45 ` David Sterba
  9 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2023-01-12 14:45 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Tue, Jan 10, 2023 at 02:56:33PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Some fixes mostly around directory logging and some cleanups (the last
> three patches).
> 
> Filipe Manana (8):
>   btrfs: fix missing error handling when logging directory items
>   btrfs: fix directory logging due to race with concurrent index key deletion
>   btrfs: add missing setup of log for full commit at add_conflicting_inode()
>   btrfs: do not abort transaction on failure to write log tree when syncing log
>   btrfs: do not abort transaction on failure to update log root
>   btrfs: simplify update of last_dir_index_offset when logging a directory
>   btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT
>   btrfs: use a single variable to track return value for log_dir_items()

Added to misc-next, thanks. The first 5 patches will go to the next pull
request to fix the recently reported tree-log bugs.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/8] btrfs: simplify update of last_dir_index_offset when logging a directory
  2023-01-10 14:56 ` [PATCH 6/8] btrfs: simplify update of last_dir_index_offset when logging a directory fdmanana
@ 2023-01-31 17:42   ` Filipe Manana
  2023-02-06 18:40     ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: Filipe Manana @ 2023-01-31 17:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

On Tue, Jan 10, 2023 at 3:20 PM <fdmanana@kernel.org> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> When logging a directory, we always set the inode's last_dir_index_offset
> to the offset of the last dir index item we found. This is using an extra
> field in the log context structure, and it makes more sense to update it
> only after we insert dir index items, and we could directly update the
> inode's last_dir_index_offset field instead.
>
> So make this simpler by updating the inode's last_dir_index_offset only
> when we actually insert dir index keys in the log tree, and getting rid
> of the last_dir_item_offset field in the log context structure.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reported-by: David Arendt <admin@prnet.org>
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/Y8voyTXdnPDz8xwY@mail.gmail.com/
Reported-by: Hunter Wardlaw <wardlawhunter@gmail.com>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1207231
CC: stable@vger.kernel.org # 6.1+

David, can you please add the following tags to the patch?
It happens to fix an issue those users reported, all on 6.1 only.

Thanks.

> ---
>  fs/btrfs/tree-log.c | 23 +++++++++++++++++------
>  fs/btrfs/tree-log.h |  2 --
>  2 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index d43261545264..58599189bd18 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3576,17 +3576,19 @@ static noinline int insert_dir_log_key(struct btrfs_trans_handle *trans,
>  }
>
>  static int flush_dir_items_batch(struct btrfs_trans_handle *trans,
> -                                struct btrfs_root *log,
> +                                struct btrfs_inode *inode,
>                                  struct extent_buffer *src,
>                                  struct btrfs_path *dst_path,
>                                  int start_slot,
>                                  int count)
>  {
> +       struct btrfs_root *log = inode->root->log_root;
>         char *ins_data = NULL;
>         struct btrfs_item_batch batch;
>         struct extent_buffer *dst;
>         unsigned long src_offset;
>         unsigned long dst_offset;
> +       u64 last_index;
>         struct btrfs_key key;
>         u32 item_size;
>         int ret;
> @@ -3644,6 +3646,19 @@ static int flush_dir_items_batch(struct btrfs_trans_handle *trans,
>         src_offset = btrfs_item_ptr_offset(src, start_slot + count - 1);
>         copy_extent_buffer(dst, src, dst_offset, src_offset, batch.total_data_size);
>         btrfs_release_path(dst_path);
> +
> +       last_index = batch.keys[count - 1].offset;
> +       ASSERT(last_index > inode->last_dir_index_offset);
> +
> +       /*
> +        * If for some unexpected reason the last item's index is not greater
> +        * than the last index we logged, warn and return an error to fallback
> +        * to a transaction commit.
> +        */
> +       if (WARN_ON(last_index <= inode->last_dir_index_offset))
> +               ret = -EUCLEAN;
> +       else
> +               inode->last_dir_index_offset = last_index;
>  out:
>         kfree(ins_data);
>
> @@ -3693,7 +3708,6 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
>                 }
>
>                 di = btrfs_item_ptr(src, i, struct btrfs_dir_item);
> -               ctx->last_dir_item_offset = key.offset;
>
>                 /*
>                  * Skip ranges of items that consist only of dir item keys created
> @@ -3756,7 +3770,7 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
>         if (batch_size > 0) {
>                 int ret;
>
> -               ret = flush_dir_items_batch(trans, log, src, dst_path,
> +               ret = flush_dir_items_batch(trans, inode, src, dst_path,
>                                             batch_start, batch_size);
>                 if (ret < 0)
>                         return ret;
> @@ -4044,7 +4058,6 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
>
>         min_key = BTRFS_DIR_START_INDEX;
>         max_key = 0;
> -       ctx->last_dir_item_offset = inode->last_dir_index_offset;
>
>         while (1) {
>                 ret = log_dir_items(trans, inode, path, dst_path,
> @@ -4056,8 +4069,6 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
>                 min_key = max_key + 1;
>         }
>
> -       inode->last_dir_index_offset = ctx->last_dir_item_offset;
> -
>         return 0;
>  }
>
> diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
> index 85b43075ac58..85cd24cb0540 100644
> --- a/fs/btrfs/tree-log.h
> +++ b/fs/btrfs/tree-log.h
> @@ -24,8 +24,6 @@ struct btrfs_log_ctx {
>         bool logging_new_delayed_dentries;
>         /* Indicate if the inode being logged was logged before. */
>         bool logged_before;
> -       /* Tracks the last logged dir item/index key offset. */
> -       u64 last_dir_item_offset;
>         struct inode *inode;
>         struct list_head list;
>         /* Only used for fast fsyncs. */
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 6/8] btrfs: simplify update of last_dir_index_offset when logging a directory
  2023-01-31 17:42   ` Filipe Manana
@ 2023-02-06 18:40     ` David Sterba
  0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2023-02-06 18:40 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, David Sterba

On Tue, Jan 31, 2023 at 05:42:12PM +0000, Filipe Manana wrote:
> On Tue, Jan 10, 2023 at 3:20 PM <fdmanana@kernel.org> wrote:
> >
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When logging a directory, we always set the inode's last_dir_index_offset
> > to the offset of the last dir index item we found. This is using an extra
> > field in the log context structure, and it makes more sense to update it
> > only after we insert dir index items, and we could directly update the
> > inode's last_dir_index_offset field instead.
> >
> > So make this simpler by updating the inode's last_dir_index_offset only
> > when we actually insert dir index keys in the log tree, and getting rid
> > of the last_dir_item_offset field in the log context structure.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> 
> Reported-by: David Arendt <admin@prnet.org>
> Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
> Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/Y8voyTXdnPDz8xwY@mail.gmail.com/
> Reported-by: Hunter Wardlaw <wardlawhunter@gmail.com>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1207231
> CC: stable@vger.kernel.org # 6.1+
> 
> David, can you please add the following tags to the patch?
> It happens to fix an issue those users reported, all on 6.1 only.

Updated and added to pull request queue, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-02-06 18:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-10 14:56 [PATCH 0/8] btrfs: some log tree fixes and cleanups fdmanana
2023-01-10 14:56 ` [PATCH 1/8] btrfs: fix missing error handling when logging directory items fdmanana
2023-01-10 14:56 ` [PATCH 2/8] btrfs: fix directory logging due to race with concurrent index key deletion fdmanana
2023-01-10 14:56 ` [PATCH 3/8] btrfs: add missing setup of log for full commit at add_conflicting_inode() fdmanana
2023-01-10 14:56 ` [PATCH 4/8] btrfs: do not abort transaction on failure to write log tree when syncing log fdmanana
2023-01-10 14:56 ` [PATCH 5/8] btrfs: do not abort transaction on failure to update log root fdmanana
2023-01-10 14:56 ` [PATCH 6/8] btrfs: simplify update of last_dir_index_offset when logging a directory fdmanana
2023-01-31 17:42   ` Filipe Manana
2023-02-06 18:40     ` David Sterba
2023-01-10 14:56 ` [PATCH 7/8] btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT fdmanana
2023-01-10 14:56 ` [PATCH 8/8] btrfs: use a single variable to track return value for log_dir_items() fdmanana
2023-01-11 21:24 ` [PATCH 0/8] btrfs: some log tree fixes and cleanups Josef Bacik
2023-01-12 14:45 ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).