All of lore.kernel.org
 help / color / mirror / Atom feed
From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Cc: ce3g8jdj@umail.furryterror.org, Filipe Manana <fdmanana@suse.com>
Subject: [PATCH 2/9] btrfs: always pin deleted leaves when there are active tree mod log users
Date: Thu, 11 Mar 2021 14:31:06 +0000	[thread overview]
Message-ID: <d9d8cda5b3ab2a262d4d66e9fe8abd75912f252f.1615472583.git.fdmanana@suse.com> (raw)
In-Reply-To: <cover.1615472583.git.fdmanana@suse.com>

From: Filipe Manana <fdmanana@suse.com>

When freeing a tree block we may end up adding its extent back to the
free space cache/tree, as long as there are no more references for it,
it was created in the current transaction and writeback for it never
happened. This is generally fine, however when we have tree mod log
operations it can result in inconsistent versions of a btree after
unwinding extent buffers with the recorded tree mod log operations.

This is because:

* We only log operations for nodes (adding and removing key/pointers),
  for leaves we don't do anything;

* This means that we can log a MOD_LOG_KEY_REMOVE_WHILE_FREEING operation
  for a node that points to a leaf that was deleted;

* Before we apply the logged operation to unwind a node, we can have
  that leaf's extent allocated again, either as a node or as a leaf, and
  possibly for another btree. This is possible if the leaf was created in
  the current transaction and writeback for it never started, in which
  case btrfs_free_tree_block() returns its extent back to the free space
  cache/tree;

* Then, before applying the tree mod log operation, some task allocates
  the metadata extent just freed before, and uses it either as a leaf or
  as a node for some btree (can be the same or another one, it does not
  matter);

* After applying the MOD_LOG_KEY_REMOVE_WHILE_FREEING operation we now
  get the target node with an item pointing to the metadata extent that
  now has content different from what it had before the leaf was deleted.
  It might now belong to a different btree and be a node and not a leaf
  anymore.

  As a consequence, the results of searches after the unwinding can be
  unpredictable and produce unexpected results.

So make sure we pin extent buffers corresponding to leaves when there
are tree mod log users.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/extent-tree.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5e228d6ad63f..2482b26b1971 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3310,6 +3310,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 
 	if (last_ref && btrfs_header_generation(buf) == trans->transid) {
 		struct btrfs_block_group *cache;
+		bool must_pin = false;
 
 		if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
 			ret = check_ref_cleanup(trans, buf->start);
@@ -3327,7 +3328,27 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 			goto out;
 		}
 
-		if (btrfs_is_zoned(fs_info)) {
+		/*
+		 * If this is a leaf and there are tree mod log users, we may
+		 * have recorded mod log operations that point to this leaf.
+		 * So we must make sure no one reuses this leaf's extent before
+		 * mod log operations are applied to a node, otherwise after
+		 * rewinding a node using the mod log operations we get an
+		 * inconsistent btree, as the leaf's extent may now be used as
+		 * a node or leaf for another different btree.
+		 * We are safe from races here because at this point no other
+		 * node or root points to this extent buffer, so if after this
+		 * check a new tree mod log user joins, it will not be able to
+		 * find a node pointing to this leaf and record operations that
+		 * point to this leaf.
+		 */
+		if (btrfs_header_level(buf) == 0) {
+			read_lock(&fs_info->tree_mod_log_lock);
+			must_pin = !list_empty(&fs_info->tree_mod_seq_list);
+			read_unlock(&fs_info->tree_mod_log_lock);
+		}
+
+		if (must_pin || btrfs_is_zoned(fs_info)) {
 			btrfs_redirty_list_add(trans->transaction, buf);
 			pin_down_extent(trans, cache, buf->start, buf->len, 1);
 			btrfs_put_block_group(cache);
-- 
2.28.0


  parent reply	other threads:[~2021-03-11 14:32 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-11 14:31 [PATCH 0/9] btrfs: bug fixes for the tree mod log and small refactorings fdmanana
2021-03-11 14:31 ` [PATCH 1/9] btrfs: fix race when cloning extent buffer during rewind of an old root fdmanana
2021-03-11 14:31 ` fdmanana [this message]
2021-03-15 19:28   ` [PATCH 2/9] btrfs: always pin deleted leaves when there are active tree mod log users David Sterba
2021-03-16 11:49     ` Filipe Manana
2021-03-16 11:54       ` Johannes Thumshirn
2021-03-11 14:31 ` [PATCH 3/9] btrfs: move the tree mod log code into its own file fdmanana
2021-03-11 17:26   ` kernel test robot
2021-03-11 17:26     ` kernel test robot
2021-03-11 17:41     ` Filipe Manana
2021-03-11 17:41       ` Filipe Manana
2021-03-12  8:50   ` Anand Jain
2021-03-11 14:31 ` [PATCH 4/9] btrfs: use booleans where appropriate for the tree mod log functions fdmanana
2021-03-12 12:44   ` Anand Jain
2021-03-11 14:31 ` [PATCH 5/9] btrfs: use a bit to track the existence of tree mod log users fdmanana
2021-03-13  7:26   ` Wang Yugui
2021-03-15  9:52     ` Filipe Manana
2021-03-11 14:31 ` [PATCH 6/9] btrfs: use the new bit BTRFS_FS_TREE_MOD_LOG_USERS at btrfs_free_tree_block() fdmanana
2021-03-11 14:31 ` [PATCH 7/9] btrfs: remove unnecessary leaf check at btrfs_tree_mod_log_free_eb() fdmanana
2021-03-11 14:31 ` [PATCH 8/9] btrfs: add and use helper to get lowest sequence number for the tree mod log fdmanana
2021-03-11 14:31 ` [PATCH 9/9] btrfs: update debug message when checking seq number of a delayed ref fdmanana
2021-03-16 16:58 ` [PATCH 0/9] btrfs: bug fixes for the tree mod log and small refactorings David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9d8cda5b3ab2a262d4d66e9fe8abd75912f252f.1615472583.git.fdmanana@suse.com \
    --to=fdmanana@kernel.org \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=fdmanana@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.