From: fdmanana@kernel.org
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 2/6] btrfs: fix race that results in logging old extents during a fast fsync
Date: Wed, 25 Nov 2020 12:19:24 +0000 [thread overview]
Message-ID: <9f9b13e0edb79adea30d7e92dccfea8daf9cac88.1606305501.git.fdmanana@suse.com> (raw)
In-Reply-To: <cover.1606305501.git.fdmanana@suse.com>
In-Reply-To: <cover.1606305501.git.fdmanana@suse.com>
From: Filipe Manana <fdmanana@suse.com>
When logging the extents of an inode during a fast fsync, we have a time
window where we can log extents that are from the previous transaction and
already persisted. This only makes us waste time unnecessarily.
The following sequence of steps shows how this can happen:
1) We are at transaction 1000;
2) An ordered extent E from inode I completes, that is it has gone through
btrfs_finish_ordered_io(), and it set the extent maps' generation to
1000 when we unpin the extent, which is the generation of the current
transaction;
3) The commit for transaction 1000 starts by task A;
4) The task committing transaction 1000 sets the transaction state to
unblocked, writes the dirty extent buffers and the super blocks, then
unlocks tree_log_mutex;
5) Some change is made to inode I, resulting in creation of a new
transaction with a generation of 1001;
6) The transaction 1000 commit starts unpinning extents. At this point
fs_info->last_trans_committed still has a value of 999;
7) Task B starts an fsync on inode I, and when it gets to
btrfs_log_changed_extents() sees the extent map for extent E in the
list of modified extents. It sees the extent map has a generation of
1000 and fs_info->last_trans_committed has a value of 999, so it
proceeds to logging the respective file extent item and all the
checksums covering its range.
So we end up wasting time since the extent was already persisted and
is reachable through the trees pointed to by the super block committed
by transaction 1000.
So just fix this by comparing the extent maps generation against the
generation of the transaction handle - if it is smaller then the id in the
handle, we know the extent was already persisted and we do not need to log
it.
This patch belongs to a patch set that is comprised of the following
patches:
btrfs: fix race causing unnecessary inode logging during link and rename
btrfs: fix race that results in logging old extents during a fast fsync
btrfs: fix race that causes unnecessary logging of ancestor inodes
btrfs: fix race that makes inode logging fallback to transaction commit
btrfs: fix race leading to unnecessary transaction commit when logging inode
btrfs: do not block inode logging for so long during transaction commit
Performance results are mentioned in the change log of the last patch.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index b63f5c2b982a..596d72d239e9 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4416,14 +4416,12 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
struct extent_map *em, *n;
struct list_head extents;
struct extent_map_tree *tree = &inode->extent_tree;
- u64 test_gen;
int ret = 0;
int num = 0;
INIT_LIST_HEAD(&extents);
write_lock(&tree->lock);
- test_gen = root->fs_info->last_trans_committed;
list_for_each_entry_safe(em, n, &tree->modified_extents, list) {
list_del_init(&em->list);
@@ -4439,7 +4437,7 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
goto process;
}
- if (em->generation <= test_gen)
+ if (em->generation < trans->transid)
continue;
/* We log prealloc extents beyond eof later. */
--
2.28.0
next prev parent reply other threads:[~2020-11-25 12:19 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-25 12:19 [PATCH 0/6] btrfs: some performance improvements for dbench alike workloads fdmanana
2020-11-25 12:19 ` [PATCH 1/6] btrfs: fix race causing unnecessary inode logging during link and rename fdmanana
2020-11-25 12:19 ` fdmanana [this message]
2020-11-25 12:19 ` [PATCH 3/6] btrfs: fix race that causes unnecessary logging of ancestor inodes fdmanana
2020-11-25 12:19 ` [PATCH 4/6] btrfs: fix race that makes inode logging fallback to transaction commit fdmanana
2020-11-25 12:19 ` [PATCH 5/6] btrfs: fix race leading to unnecessary transaction commit when logging inode fdmanana
2020-11-25 12:19 ` [PATCH 6/6] btrfs: do not block inode logging for so long during transaction commit fdmanana
2021-02-02 5:38 ` Wang Yugui
2021-02-02 11:28 ` Filipe Manana
2021-02-02 13:01 ` Wang Yugui
2021-02-02 14:09 ` Wang Yugui
2021-02-03 10:51 ` Filipe Manana
2021-02-03 11:03 ` Nikolay Borisov
2021-02-03 15:16 ` Wang Yugui
2021-02-04 3:17 ` Wang Yugui
2021-02-04 6:19 ` Nikolay Borisov
2021-02-04 11:34 ` Wang Yugui
2021-02-04 11:58 ` Nikolay Borisov
2021-02-05 1:47 ` Wang Yugui
2020-11-30 17:30 ` [PATCH 0/6] btrfs: some performance improvements for dbench alike workloads David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9f9b13e0edb79adea30d7e92dccfea8daf9cac88.1606305501.git.fdmanana@suse.com \
--to=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.