All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it
@ 2023-04-12 10:33 fdmanana
  2023-04-12 10:33 ` [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice fdmanana
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: fdmanana @ 2023-04-12 10:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

A fix for btrfs_prev_leaf() where due to races with concurrent insertions,
can return the same key again, and unexport it since it's not used outside
ctree.c. More details on the individual changelogs.

Filipe Manana (2):
  btrfs: fix btrfs_prev_leaf() to not return the same key twice
  btrfs: unexport btrfs_prev_leaf()

 fs/btrfs/ctree.c | 131 +++++++++++++++++++++++++++++------------------
 fs/btrfs/ctree.h |   1 -
 2 files changed, 81 insertions(+), 51 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice
  2023-04-12 10:33 [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it fdmanana
@ 2023-04-12 10:33 ` fdmanana
  2023-04-13 19:57   ` Wang Yugui
  2023-04-12 10:33 ` [PATCH 2/2] btrfs: unexport btrfs_prev_leaf() fdmanana
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: fdmanana @ 2023-04-12 10:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

A call to btrfs_prev_leaf() may end up returning a path that points to the
same item (key) again. This happens if while btrfs_prev_leaf(), after we
release the path, a concurrent insertion happens, which moves items off
from a sibbling into the front of the previous leaf, and an item with the
computed previous key does not exists.

For example, suppose we have the two following leaves:

  Leaf A

  -------------------------------------------------------------
  | ...   key (300 96 10)   key (300 96 15)   key (300 96 16) |
  -------------------------------------------------------------
              slot 20             slot 21             slot 22

  Leaf B

  -------------------------------------------------------------
  | key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
  -------------------------------------------------------------
      slot 0             slot 1             slot 2

If we call btrfs_prev_leaf(), from btrfs_previous_item() for example, with
a path pointing to leaf B and slot 0 and the following happens:

1) At btrfs_prev_leaf() we compute the previous key to search as:
   (300 96 19), which is a key that does not exists in the tree;

2) Then we call btrfs_release_path() at btrfs_prev_leaf();

3) Some other task inserts a key at leaf A, that sorts before the key at
   slot 20, for example it has an objectid of 299. In order to make room
   for the new key, the key at slot 22 is moved to the front of leaf B.
   This happens at push_leaf_right(), called from split_leaf().

   After this leaf B now looks like:

  --------------------------------------------------------------------------------
  | key (300 96 16)    key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
  --------------------------------------------------------------------------------
       slot 0              slot 1             slot 2             slot 3

4) At btrfs_prev_leaf() we call btrfs_search_slot() for the computed
   previous key: (300 96 19). Since the key does not exists,
   btrfs_search_slot() returns 1 and with a path pointing to leaf B
   and slot 1, the item with key (300 96 20);

5) This makes btrfs_prev_leaf() return a path that points to slot 1 of
   leaf B, the same key as before it was called, since the key at slot 0
   of leaf B (300 96 16) is less than the computed previous key, which is
   (300 96 19);

6) As a consequence btrfs_previous_item() returns a path that points again
   to the item with key (300 96 20).

For some users of btrfs_prev_leaf() or btrfs_previous_item() this may not
be functional a problem, despite not making sense to return a new path
pointing again to the same item/key. However for a caller such as
tree-log.c:log_dir_items(), this has a bad consequence, as it can result
in not logging some dir index deletions in case the directory is being
logged without holding the inode's VFS lock (logging triggered while
logging a child inode for example) - for the example scenario above, in
case the dir index keys 17, 18 and 19 were deleted in the current
transaction.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 3c983c70028a..6bdd8e42c95e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -4478,10 +4478,12 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
 {
 	struct btrfs_key key;
+	struct btrfs_key orig_key;
 	struct btrfs_disk_key found_key;
 	int ret;
 
 	btrfs_item_key_to_cpu(path->nodes[0], &key, 0);
+	orig_key = key;
 
 	if (key.offset > 0) {
 		key.offset--;
@@ -4498,8 +4500,36 @@ int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
 
 	btrfs_release_path(path);
 	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
-	if (ret < 0)
+	if (ret <= 0)
 		return ret;
+
+	/*
+	 * Previous key not found. Even if we were at slot 0 of the leaf we had
+	 * before releasing the path and calling btrfs_search_slot(), we now may
+	 * be in a slot pointing to the same original key - this can happen if
+	 * after we released the path, one of more items were moved from a
+	 * sibbling leaf into the front of the leaf we had due to an insertion
+	 * (see push_leaf_right()).
+	 * If we hit this case and our slot is > 0 and just decrement the slot
+	 * so that the caller does not process the same key again, which may or
+	 * may not break the caller, depending on its logic.
+	 */
+	if (path->slots[0] < btrfs_header_nritems(path->nodes[0])) {
+		btrfs_item_key(path->nodes[0], &found_key, path->slots[0]);
+		ret = comp_keys(&found_key, &orig_key);
+		if (ret == 0) {
+			if (path->slots[0] > 0) {
+				path->slots[0]--;
+				return 0;
+			}
+			/*
+			 * At slot 0, same key as before, it means orig_key is
+			 * the lowest, leftmost, key in the tree. We're done.
+			 */
+			return 1;
+		}
+	}
+
 	btrfs_item_key(path->nodes[0], &found_key, 0);
 	ret = comp_keys(&found_key, &key);
 	/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] btrfs: unexport btrfs_prev_leaf()
  2023-04-12 10:33 [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it fdmanana
  2023-04-12 10:33 ` [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice fdmanana
@ 2023-04-12 10:33 ` fdmanana
  2023-04-15  7:14   ` Anand Jain
  2023-04-12 13:47 ` [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it Josef Bacik
  2023-04-17 18:53 ` David Sterba
  3 siblings, 1 reply; 9+ messages in thread
From: fdmanana @ 2023-04-12 10:33 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

btrfs_prev_leaf() is not used outside ctree.c, so there's no need to
export it at ctree.h - just make it static at ctree.c and move its
definition above btrfs_search_slot_for_read(), since that function
calls it.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 161 ++++++++++++++++++++++++-----------------------
 fs/btrfs/ctree.h |   1 -
 2 files changed, 81 insertions(+), 81 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 6bdd8e42c95e..d94429e0f16a 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2378,6 +2378,87 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
 	return ret;
 }
 
+/*
+ * Search the tree again to find a leaf with smaller keys.
+ * Returns 0 if it found something.
+ * Returns 1 if there are no smaller keys.
+ * Returns < 0 on error.
+ *
+ * This may release the path, and so you may lose any locks held at the
+ * time you call it.
+ */
+static int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
+{
+	struct btrfs_key key;
+	struct btrfs_key orig_key;
+	struct btrfs_disk_key found_key;
+	int ret;
+
+	btrfs_item_key_to_cpu(path->nodes[0], &key, 0);
+	orig_key = key;
+
+	if (key.offset > 0) {
+		key.offset--;
+	} else if (key.type > 0) {
+		key.type--;
+		key.offset = (u64)-1;
+	} else if (key.objectid > 0) {
+		key.objectid--;
+		key.type = (u8)-1;
+		key.offset = (u64)-1;
+	} else {
+		return 1;
+	}
+
+	btrfs_release_path(path);
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret <= 0)
+		return ret;
+
+	/*
+	 * Previous key not found. Even if we were at slot 0 of the leaf we had
+	 * before releasing the path and calling btrfs_search_slot(), we now may
+	 * be in a slot pointing to the same original key - this can happen if
+	 * after we released the path, one of more items were moved from a
+	 * sibbling leaf into the front of the leaf we had due to an insertion
+	 * (see push_leaf_right()).
+	 * If we hit this case and our slot is > 0 and just decrement the slot
+	 * so that the caller does not process the same key again, which may or
+	 * may not break the caller, depending on its logic.
+	 */
+	if (path->slots[0] < btrfs_header_nritems(path->nodes[0])) {
+		btrfs_item_key(path->nodes[0], &found_key, path->slots[0]);
+		ret = comp_keys(&found_key, &orig_key);
+		if (ret == 0) {
+			if (path->slots[0] > 0) {
+				path->slots[0]--;
+				return 0;
+			}
+			/*
+			 * At slot 0, same key as before, it means orig_key is
+			 * the lowest, leftmost, key in the tree. We're done.
+			 */
+			return 1;
+		}
+	}
+
+	btrfs_item_key(path->nodes[0], &found_key, 0);
+	ret = comp_keys(&found_key, &key);
+	/*
+	 * We might have had an item with the previous key in the tree right
+	 * before we released our path. And after we released our path, that
+	 * item might have been pushed to the first slot (0) of the leaf we
+	 * were holding due to a tree balance. Alternatively, an item with the
+	 * previous key can exist as the only element of a leaf (big fat item).
+	 * Therefore account for these 2 cases, so that our callers (like
+	 * btrfs_previous_item) don't miss an existing item with a key matching
+	 * the previous key we computed above.
+	 */
+	if (ret <= 0)
+		return 0;
+	return 1;
+}
+
 /*
  * helper to use instead of search slot if no exact match is needed but
  * instead the next or previous item should be returned.
@@ -4467,86 +4548,6 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	return ret;
 }
 
-/*
- * search the tree again to find a leaf with lesser keys
- * returns 0 if it found something or 1 if there are no lesser leaves.
- * returns < 0 on io errors.
- *
- * This may release the path, and so you may lose any locks held at the
- * time you call it.
- */
-int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path)
-{
-	struct btrfs_key key;
-	struct btrfs_key orig_key;
-	struct btrfs_disk_key found_key;
-	int ret;
-
-	btrfs_item_key_to_cpu(path->nodes[0], &key, 0);
-	orig_key = key;
-
-	if (key.offset > 0) {
-		key.offset--;
-	} else if (key.type > 0) {
-		key.type--;
-		key.offset = (u64)-1;
-	} else if (key.objectid > 0) {
-		key.objectid--;
-		key.type = (u8)-1;
-		key.offset = (u64)-1;
-	} else {
-		return 1;
-	}
-
-	btrfs_release_path(path);
-	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
-	if (ret <= 0)
-		return ret;
-
-	/*
-	 * Previous key not found. Even if we were at slot 0 of the leaf we had
-	 * before releasing the path and calling btrfs_search_slot(), we now may
-	 * be in a slot pointing to the same original key - this can happen if
-	 * after we released the path, one of more items were moved from a
-	 * sibbling leaf into the front of the leaf we had due to an insertion
-	 * (see push_leaf_right()).
-	 * If we hit this case and our slot is > 0 and just decrement the slot
-	 * so that the caller does not process the same key again, which may or
-	 * may not break the caller, depending on its logic.
-	 */
-	if (path->slots[0] < btrfs_header_nritems(path->nodes[0])) {
-		btrfs_item_key(path->nodes[0], &found_key, path->slots[0]);
-		ret = comp_keys(&found_key, &orig_key);
-		if (ret == 0) {
-			if (path->slots[0] > 0) {
-				path->slots[0]--;
-				return 0;
-			}
-			/*
-			 * At slot 0, same key as before, it means orig_key is
-			 * the lowest, leftmost, key in the tree. We're done.
-			 */
-			return 1;
-		}
-	}
-
-	btrfs_item_key(path->nodes[0], &found_key, 0);
-	ret = comp_keys(&found_key, &key);
-	/*
-	 * We might have had an item with the previous key in the tree right
-	 * before we released our path. And after we released our path, that
-	 * item might have been pushed to the first slot (0) of the leaf we
-	 * were holding due to a tree balance. Alternatively, an item with the
-	 * previous key can exist as the only element of a leaf (big fat item).
-	 * Therefore account for these 2 cases, so that our callers (like
-	 * btrfs_previous_item) don't miss an existing item with a key matching
-	 * the previous key we computed above.
-	 */
-	if (ret <= 0)
-		return 0;
-	return 1;
-}
-
 /*
  * A helper function to walk down the tree starting at min_key, and looking
  * for nodes or leaves that are have a minimum transaction id.
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4c1986cd5bed..e81f10e12867 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -633,7 +633,6 @@ static inline int btrfs_insert_empty_item(struct btrfs_trans_handle *trans,
 	return btrfs_insert_empty_items(trans, root, path, &batch);
 }
 
-int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path);
 int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
 			u64 time_seq);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it
  2023-04-12 10:33 [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it fdmanana
  2023-04-12 10:33 ` [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice fdmanana
  2023-04-12 10:33 ` [PATCH 2/2] btrfs: unexport btrfs_prev_leaf() fdmanana
@ 2023-04-12 13:47 ` Josef Bacik
  2023-04-17 18:53 ` David Sterba
  3 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2023-04-12 13:47 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Apr 12, 2023 at 11:33:08AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> A fix for btrfs_prev_leaf() where due to races with concurrent insertions,
> can return the same key again, and unexport it since it's not used outside
> ctree.c. More details on the individual changelogs.
> 
> Filipe Manana (2):
>   btrfs: fix btrfs_prev_leaf() to not return the same key twice
>   btrfs: unexport btrfs_prev_leaf()
> 
>  fs/btrfs/ctree.c | 131 +++++++++++++++++++++++++++++------------------
>  fs/btrfs/ctree.h |   1 -
>  2 files changed, 81 insertions(+), 51 deletions(-)
> 

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice
  2023-04-12 10:33 ` [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice fdmanana
@ 2023-04-13 19:57   ` Wang Yugui
  2023-04-14  3:43     ` Wang Yugui
  0 siblings, 1 reply; 9+ messages in thread
From: Wang Yugui @ 2023-04-13 19:57 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

Hi,

> From: Filipe Manana <fdmanana@suse.com>
> 
> A call to btrfs_prev_leaf() may end up returning a path that points to the
> same item (key) again. This happens if while btrfs_prev_leaf(), after we
> release the path, a concurrent insertion happens, which moves items off
> from a sibbling into the front of the previous leaf, and an item with the
> computed previous key does not exists.
> 
> For example, suppose we have the two following leaves:
> 
>   Leaf A
> 
>   -------------------------------------------------------------
>   | ...   key (300 96 10)   key (300 96 15)   key (300 96 16) |
>   -------------------------------------------------------------
>               slot 20             slot 21             slot 22
> 
>   Leaf B
> 
>   -------------------------------------------------------------
>   | key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
>   -------------------------------------------------------------
>       slot 0             slot 1             slot 2
> 
> If we call btrfs_prev_leaf(), from btrfs_previous_item() for example, with
> a path pointing to leaf B and slot 0 and the following happens:
> 
> 1) At btrfs_prev_leaf() we compute the previous key to search as:
>    (300 96 19), which is a key that does not exists in the tree;
> 
> 2) Then we call btrfs_release_path() at btrfs_prev_leaf();
> 
> 3) Some other task inserts a key at leaf A, that sorts before the key at
>    slot 20, for example it has an objectid of 299. In order to make room
>    for the new key, the key at slot 22 is moved to the front of leaf B.
>    This happens at push_leaf_right(), called from split_leaf().
> 
>    After this leaf B now looks like:
> 
>   --------------------------------------------------------------------------------
>   | key (300 96 16)    key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
>   --------------------------------------------------------------------------------
>        slot 0              slot 1             slot 2             slot 3
> 
> 4) At btrfs_prev_leaf() we call btrfs_search_slot() for the computed
>    previous key: (300 96 19). Since the key does not exists,
>    btrfs_search_slot() returns 1 and with a path pointing to leaf B
>    and slot 1, the item with key (300 96 20);
> 
> 5) This makes btrfs_prev_leaf() return a path that points to slot 1 of
>    leaf B, the same key as before it was called, since the key at slot 0
>    of leaf B (300 96 16) is less than the computed previous key, which is
>    (300 96 19);
> 
> 6) As a consequence btrfs_previous_item() returns a path that points again
>    to the item with key (300 96 20).
> 
> For some users of btrfs_prev_leaf() or btrfs_previous_item() this may not
> be functional a problem, despite not making sense to return a new path
> pointing again to the same item/key. However for a caller such as
> tree-log.c:log_dir_items(), this has a bad consequence, as it can result
> in not logging some dir index deletions in case the directory is being
> logged without holding the inode's VFS lock (logging triggered while
> logging a child inode for example) - for the example scenario above, in
> case the dir index keys 17, 18 and 19 were deleted in the current
> transaction.

fstests(btrfs/080) had few frequency(<1/10) to fail after this pacth is applied.
but it is yet not clear enough.

# cat results//btrfs/080.out.bad
QA output created by 080
Unexpected digest for file /mnt/scratch/20_40_30_986878191_snap/foobar_63
(see /usr/hpc-bio/xfstests/results//btrfs/080.full for details)

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/04/14



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice
  2023-04-13 19:57   ` Wang Yugui
@ 2023-04-14  3:43     ` Wang Yugui
  2023-04-17 10:18       ` Filipe Manana
  0 siblings, 1 reply; 9+ messages in thread
From: Wang Yugui @ 2023-04-14  3:43 UTC (permalink / raw)
  To: fdmanana, linux-btrfs

Hi,

> Hi,
> 
> > From: Filipe Manana <fdmanana@suse.com>
> > 
> > A call to btrfs_prev_leaf() may end up returning a path that points to the
> > same item (key) again. This happens if while btrfs_prev_leaf(), after we
> > release the path, a concurrent insertion happens, which moves items off
> > from a sibbling into the front of the previous leaf, and an item with the
> > computed previous key does not exists.
> > 
> > For example, suppose we have the two following leaves:
> > 
> >   Leaf A
> > 
> >   -------------------------------------------------------------
> >   | ...   key (300 96 10)   key (300 96 15)   key (300 96 16) |
> >   -------------------------------------------------------------
> >               slot 20             slot 21             slot 22
> > 
> >   Leaf B
> > 
> >   -------------------------------------------------------------
> >   | key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
> >   -------------------------------------------------------------
> >       slot 0             slot 1             slot 2
> > 
> > If we call btrfs_prev_leaf(), from btrfs_previous_item() for example, with
> > a path pointing to leaf B and slot 0 and the following happens:
> > 
> > 1) At btrfs_prev_leaf() we compute the previous key to search as:
> >    (300 96 19), which is a key that does not exists in the tree;
> > 
> > 2) Then we call btrfs_release_path() at btrfs_prev_leaf();
> > 
> > 3) Some other task inserts a key at leaf A, that sorts before the key at
> >    slot 20, for example it has an objectid of 299. In order to make room
> >    for the new key, the key at slot 22 is moved to the front of leaf B.
> >    This happens at push_leaf_right(), called from split_leaf().
> > 
> >    After this leaf B now looks like:
> > 
> >   --------------------------------------------------------------------------------
> >   | key (300 96 16)    key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
> >   --------------------------------------------------------------------------------
> >        slot 0              slot 1             slot 2             slot 3
> > 
> > 4) At btrfs_prev_leaf() we call btrfs_search_slot() for the computed
> >    previous key: (300 96 19). Since the key does not exists,
> >    btrfs_search_slot() returns 1 and with a path pointing to leaf B
> >    and slot 1, the item with key (300 96 20);
> > 
> > 5) This makes btrfs_prev_leaf() return a path that points to slot 1 of
> >    leaf B, the same key as before it was called, since the key at slot 0
> >    of leaf B (300 96 16) is less than the computed previous key, which is
> >    (300 96 19);
> > 
> > 6) As a consequence btrfs_previous_item() returns a path that points again
> >    to the item with key (300 96 20).
> > 
> > For some users of btrfs_prev_leaf() or btrfs_previous_item() this may not
> > be functional a problem, despite not making sense to return a new path
> > pointing again to the same item/key. However for a caller such as
> > tree-log.c:log_dir_items(), this has a bad consequence, as it can result
> > in not logging some dir index deletions in case the directory is being
> > logged without holding the inode's VFS lock (logging triggered while
> > logging a child inode for example) - for the example scenario above, in
> > case the dir index keys 17, 18 and 19 were deleted in the current
> > transaction.
> 
> fstests(btrfs/080) had few frequency(<1/10) to fail after this pacth is applied.
> but it is yet not clear enough.
> 
> # cat results//btrfs/080.out.bad
> QA output created by 080
> Unexpected digest for file /mnt/scratch/20_40_30_986878191_snap/foobar_63
> (see /usr/hpc-bio/xfstests/results//btrfs/080.full for details)

fstests(btrfs/080) failure happened without this patch too.

the next info update may take some long time, because of the low reproduce
frequency.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/04/14



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] btrfs: unexport btrfs_prev_leaf()
  2023-04-12 10:33 ` [PATCH 2/2] btrfs: unexport btrfs_prev_leaf() fdmanana
@ 2023-04-15  7:14   ` Anand Jain
  0 siblings, 0 replies; 9+ messages in thread
From: Anand Jain @ 2023-04-15  7:14 UTC (permalink / raw)
  To: fdmanana, linux-btrfs

On 12/4/23 18:33, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> btrfs_prev_leaf() is not used outside ctree.c, so there's no need to
> export it at ctree.h - just make it static at ctree.c and move its
> definition above btrfs_search_slot_for_read(), since that function
> calls it.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

LGTM

Reviewed-by: Anand Jain <anand.jain@oracle.com>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice
  2023-04-14  3:43     ` Wang Yugui
@ 2023-04-17 10:18       ` Filipe Manana
  0 siblings, 0 replies; 9+ messages in thread
From: Filipe Manana @ 2023-04-17 10:18 UTC (permalink / raw)
  To: Wang Yugui; +Cc: linux-btrfs

On Fri, Apr 14, 2023 at 4:44 AM Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi,
>
> > Hi,
> >
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > A call to btrfs_prev_leaf() may end up returning a path that points to the
> > > same item (key) again. This happens if while btrfs_prev_leaf(), after we
> > > release the path, a concurrent insertion happens, which moves items off
> > > from a sibbling into the front of the previous leaf, and an item with the
> > > computed previous key does not exists.
> > >
> > > For example, suppose we have the two following leaves:
> > >
> > >   Leaf A
> > >
> > >   -------------------------------------------------------------
> > >   | ...   key (300 96 10)   key (300 96 15)   key (300 96 16) |
> > >   -------------------------------------------------------------
> > >               slot 20             slot 21             slot 22
> > >
> > >   Leaf B
> > >
> > >   -------------------------------------------------------------
> > >   | key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
> > >   -------------------------------------------------------------
> > >       slot 0             slot 1             slot 2
> > >
> > > If we call btrfs_prev_leaf(), from btrfs_previous_item() for example, with
> > > a path pointing to leaf B and slot 0 and the following happens:
> > >
> > > 1) At btrfs_prev_leaf() we compute the previous key to search as:
> > >    (300 96 19), which is a key that does not exists in the tree;
> > >
> > > 2) Then we call btrfs_release_path() at btrfs_prev_leaf();
> > >
> > > 3) Some other task inserts a key at leaf A, that sorts before the key at
> > >    slot 20, for example it has an objectid of 299. In order to make room
> > >    for the new key, the key at slot 22 is moved to the front of leaf B.
> > >    This happens at push_leaf_right(), called from split_leaf().
> > >
> > >    After this leaf B now looks like:
> > >
> > >   --------------------------------------------------------------------------------
> > >   | key (300 96 16)    key (300 96 20)   key (300 96 21)   key (300 96 22)   ... |
> > >   --------------------------------------------------------------------------------
> > >        slot 0              slot 1             slot 2             slot 3
> > >
> > > 4) At btrfs_prev_leaf() we call btrfs_search_slot() for the computed
> > >    previous key: (300 96 19). Since the key does not exists,
> > >    btrfs_search_slot() returns 1 and with a path pointing to leaf B
> > >    and slot 1, the item with key (300 96 20);
> > >
> > > 5) This makes btrfs_prev_leaf() return a path that points to slot 1 of
> > >    leaf B, the same key as before it was called, since the key at slot 0
> > >    of leaf B (300 96 16) is less than the computed previous key, which is
> > >    (300 96 19);
> > >
> > > 6) As a consequence btrfs_previous_item() returns a path that points again
> > >    to the item with key (300 96 20).
> > >
> > > For some users of btrfs_prev_leaf() or btrfs_previous_item() this may not
> > > be functional a problem, despite not making sense to return a new path
> > > pointing again to the same item/key. However for a caller such as
> > > tree-log.c:log_dir_items(), this has a bad consequence, as it can result
> > > in not logging some dir index deletions in case the directory is being
> > > logged without holding the inode's VFS lock (logging triggered while
> > > logging a child inode for example) - for the example scenario above, in
> > > case the dir index keys 17, 18 and 19 were deleted in the current
> > > transaction.
> >
> > fstests(btrfs/080) had few frequency(<1/10) to fail after this pacth is applied.
> > but it is yet not clear enough.
> >
> > # cat results//btrfs/080.out.bad
> > QA output created by 080
> > Unexpected digest for file /mnt/scratch/20_40_30_986878191_snap/foobar_63
> > (see /usr/hpc-bio/xfstests/results//btrfs/080.full for details)
>
> fstests(btrfs/080) failure happened without this patch too.
>
> the next info update may take some long time, because of the low reproduce
> frequency.

That test sporadically fails without the patch yes.

Also, more important: the test does not exercise at all the function
changed by this patch.
So you can save time and skip doing that pointless info update on
frequency failure...

Thanks.

>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2023/04/14
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it
  2023-04-12 10:33 [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it fdmanana
                   ` (2 preceding siblings ...)
  2023-04-12 13:47 ` [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it Josef Bacik
@ 2023-04-17 18:53 ` David Sterba
  3 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2023-04-17 18:53 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Apr 12, 2023 at 11:33:08AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> A fix for btrfs_prev_leaf() where due to races with concurrent insertions,
> can return the same key again, and unexport it since it's not used outside
> ctree.c. More details on the individual changelogs.
> 
> Filipe Manana (2):
>   btrfs: fix btrfs_prev_leaf() to not return the same key twice
>   btrfs: unexport btrfs_prev_leaf()

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-04-17 18:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-12 10:33 [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it fdmanana
2023-04-12 10:33 ` [PATCH 1/2] btrfs: fix btrfs_prev_leaf() to not return the same key twice fdmanana
2023-04-13 19:57   ` Wang Yugui
2023-04-14  3:43     ` Wang Yugui
2023-04-17 10:18       ` Filipe Manana
2023-04-12 10:33 ` [PATCH 2/2] btrfs: unexport btrfs_prev_leaf() fdmanana
2023-04-15  7:14   ` Anand Jain
2023-04-12 13:47 ` [PATCH 0/2] btrfs: fix for btrfs_prev_leaf() and unexport it Josef Bacik
2023-04-17 18:53 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.