From: Filipe Manana <fdmanana@kernel.org>
To: linux-btrfs <linux-btrfs@vger.kernel.org>,
David Sterba <dsterba@suse.com>
Subject: Re: [PATCH 5/6] btrfs: avoid inode logging during rename and link when possible
Date: Wed, 2 Feb 2022 16:49:46 +0000 [thread overview]
Message-ID: <CAL3q7H4b7NcJMgkfGOoBi0HPTQS8zFy6mGxyZamU_51dtKp2HQ@mail.gmail.com> (raw)
In-Reply-To: <e4f702b102c093d3b67a36194fbe6c51a9e5a94c.1642676248.git.fdmanana@suse.com>
On Fri, Jan 21, 2022 at 11:20 PM <fdmanana@kernel.org> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> During a rename or link operation, we need to determine if an inode was
> previously logged or not, and if it was, do some update to the logged
> inode. We used to rely exclusively on the logged_trans field of struct
> btrfs_inode to determine that, but that was not reliable because the
> value of that field is not persisted in the inode item, so it's lost
> when an inode is evicted and loaded back again. That led to several
> issues in the past, such as not persisting deletions (such as the case
> fixed by commit 803f0f64d17769 ("Btrfs: fix fsync not persisting dentry
> deletions due to inode evictions")), or resulting in losing a file
> after an inode eviction followed by a rename (commit ecc64fab7d49c6
> ("btrfs: fix lost inode on log replay after mix of fsync, rename and
> inode eviction")), besides other issues.
>
> So the inode_logged() helper was introduced and used to determine if an
> inode was possibly logged before in the current transaction, with the
> caveat that it could return false positives, in the sense that even if an
> inode was not logged before in the current transaction, it could still
> return true, but never to return false in case the inode was logged.
> From a functional point of view that is fine, but from a performance
> perspective it can introduce significant latencies to rename and link
> operations, as they will end up doing inode logging even when it is not
> necessary.
>
> Recently on a 5.15 kernel, an openSUSE Tumbleweed user reported package
> installations and upgrades, with the zypper tool, were often taking a
> long time to complete. With strace it could be observed that zypper was
> spending about 99% of its time on rename operations, and then with
> further analysis we checked that directory logging was happening too
> frequently. Taking into account that installation/upgrade of some of the
> packages needed a few thousand file renames, the slowdown was very
> noticeable for the user.
>
> The issue was caused indirectly due to an excessive number of inode
> evictions on a 5.15 kernel, about 100x more compared to a 5.13, 5.14 or
> a 5.16-rc8 kernel. While triggering the inode evictions if something
> outside btrfs' control, btrfs could still behave better by eliminating
> the false positives from the inode_logged() helper.
>
> So change inode_logged() to actually eliminate such false positives caused
> by inode eviction and when an inode was never logged since the filesystem
> was mounted, as both cases relate to when the logges_trans field of struct
> btrfs_inode has a value of zero. When it can not determine if the inode
> was logged based only on the logged_trans value, lookup for the existence
> of the inode item in the log tree - if it's there then we known the inode
> was logged, if it's not there then it can not have been logged in the
> current transaction. Once we determine if the inode was logged, update
> the logged_trans value to avoid future calls to have to search in the log
> tree again.
>
> Alternatively, we could start storing logged_trans in the on disk inode
> item structure (struct btrfs_inode_item) in the unused space it still has,
> but that would be a bit odd because:
>
> 1) We only care about logged_trans since the filesystem was mounted, we
> don't care about its value from a previous mount. Having it persisted
> in the inode item structure would not make the best use of the precious
> unused space;
>
> 2) In order to get logged_trans persisted before inode eviction, we would
> have to update the delayed inode when we finish logging the inode and
> update its logged_trans in struct btrfs_inode, which makes it a bit
> cumbersome since we need to check if the delayed inode exists, if not
> create it and populate it and deal with any errors (-ENOMEM mostly).
>
> This change is part of a patchset comprised of the following patches:
>
> 1/5 btrfs: add helper to delete a dir entry from a log tree
> 2/5 btrfs: pass the dentry to btrfs_log_new_name() instead of the inode
> 3/5 btrfs: avoid logging all directory changes during renames
> 4/5 btrfs: stop doing unnecessary log updates during a rename
> 5/5 btrfs: avoid inode logging during rename and link when possible
>
> The following test script mimics part of what the zypper tool does during
> package installations/upgrades. It does not triggers inode evictions, but
> it's similar because it triggers false positives from the inode_logged()
> helper, because the inodes have a logged_trans of 0, there's a log tree
> due to a fsync of an unrelated file and the directory inode has its
> last_trans field set to the current transaction:
>
> $ cat test.sh
>
> #!/bin/bash
>
> DEV=/dev/nvme0n1
> MNT=/mnt/nvme0n1
>
> NUM_FILES=10000
>
> mkfs.btrfs -f $DEV
> mount $DEV $MNT
>
> mkdir $MNT/testdir
>
> for ((i = 1; i <= $NUM_FILES; i++)); do
> echo -n > $MNT/testdir/file_$i
> done
>
> sync
>
> # Now do some change to an unrelated file and fsync it.
> # This is just to create a log tree to make sure that inode_logged()
> # does not return false when called against "testdir".
> xfs_io -f -c "pwrite 0 4K" -c "fsync" $MNT/foo
>
> # Do some change to testdir. This is to make sure inode_logged()
> # will return true when called against "testdir", because its
> # logged_trans is 0, it was changed in the current transaction
> # and there's a log tree.
> echo -n > $MNT/testdir/file_$((NUM_FILES + 1))
>
> echo "Renaming $NUM_FILES files..."
> start=$(date +%s%N)
> for ((i = 1; i <= $NUM_FILES; i++)); do
> mv $MNT/testdir/file_$i $MNT/testdir/file_$i-RPMDELETE
> done
> end=$(date +%s%N)
>
> dur=$(( (end - start) / 1000000 ))
> echo "Renames took $dur milliseconds"
>
> umount $MNT
>
> Testing this change on a box using a non-debug kernel (Debian's default
> kernel config) gave the following results:
>
> NUM_FILES=10000, before patchset: 27837 ms
> NUM_FILES=10000, after patches 1/5 to 4/5 applied: 9236 ms (-66.8%)
> NUM_FILES=10000, after whole patchset applied: 8902 ms (-68.0%)
>
> NUM_FILES=5000, before patchset: 9127 ms
> NUM_FILES=5000, after patches 1/5 to 4/5 applied: 4640 ms (-49.2%)
> NUM_FILES=5000, after whole patchset applied: 4441 ms (-51.3%)
>
> NUM_FILES=2000, before patchset: 2528 ms
> NUM_FILES=2000, after patches 1/5 to 4/5 applied: 1983 ms (-21.6%)
> NUM_FILES=2000, after whole patchset applied: 1747 ms (-30.9%)
>
> NUM_FILES=1000, before patchset: 1085 ms
> NUM_FILES=1000, after patches 1/5 to 4/5 applied: 893 ms (-17.7%)
> NUM_FILES=1000, after whole patchset applied: 867 ms (-20.1%)
>
> Running dbench on the same physical machine with the following script:
>
> $ cat run-dbench.sh
> #!/bin/bash
>
> NUM_JOBS=$(nproc --all)
>
> DEV=/dev/nvme0n1
> MNT=/mnt/nvme0n1
> MOUNT_OPTIONS="-o ssd"
> MKFS_OPTIONS="-O no-holes -R free-space-tree"
>
> echo "performance" | \
> tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>
> mkfs.btrfs -f $MKFS_OPTIONS $DEV
> mount $MOUNT_OPTIONS $DEV $MNT
>
> dbench -D $MNT -t 120 $NUM_JOBS
>
> umount $MNT
>
> Before patchset:
>
> Operation Count AvgLat MaxLat
> ----------------------------------------
> NTCreateX 3761352 0.032 143.843
> Close 2762770 0.002 2.273
> Rename 159304 0.291 67.037
> Unlink 759784 0.207 143.998
> Deltree 72 4.028 15.977
> Mkdir 36 0.003 0.006
> Qpathinfo 3409780 0.013 9.678
> Qfileinfo 596772 0.001 0.878
> Qfsinfo 625189 0.003 1.245
> Sfileinfo 306443 0.006 1.840
> Find 1318106 0.063 19.798
> WriteX 1871137 0.021 8.532
> ReadX 5897325 0.003 3.567
> LockX 12252 0.003 0.258
> UnlockX 12252 0.002 0.100
> Flush 263666 3.327 155.632
>
> Throughput 980.047 MB/sec 12 clients 12 procs max_latency=155.636 ms
>
> After whole patchset applied:
>
> Operation Count AvgLat MaxLat
> ----------------------------------------
> NTCreateX 4195584 0.033 107.742
> Close 3081932 0.002 1.935
> Rename 177641 0.218 14.905
> Unlink 847333 0.166 107.822
> Deltree 118 5.315 15.247
> Mkdir 59 0.004 0.048
> Qpathinfo 3802612 0.014 10.302
> Qfileinfo 666748 0.001 1.034
> Qfsinfo 697329 0.003 0.944
> Sfileinfo 341712 0.006 2.099
> Find 1470365 0.065 9.359
> WriteX 2093921 0.021 8.087
> ReadX 6576234 0.003 3.407
> LockX 13660 0.003 0.308
> UnlockX 13660 0.002 0.114
> Flush 294090 2.906 115.539
>
> Throughput 1093.11 MB/sec 12 clients 12 procs max_latency=115.544 ms
>
> +11.5% throughput -25.8% max latency rename max latency -77.8%
>
> Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1193549
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/tree-log.c | 245 ++++++++++++++++++++++++++++++++------------
> fs/btrfs/tree-log.h | 3 +
> 2 files changed, 183 insertions(+), 65 deletions(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index bf529c6cb3ff..6f9829188948 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3459,35 +3459,121 @@ int btrfs_free_log_root_tree(struct btrfs_trans_handle *trans,
> }
>
> /*
> - * Check if an inode was logged in the current transaction. This may often
> - * return some false positives, because logged_trans is an in memory only field,
> - * not persisted anywhere. This is meant to be used in contexts where a false
> - * positive has no functional consequences.
> + * Check if an inode was logged in the current transaction. This correctly deals
> + * with the case where the inode was logged but has a logged_trans of 0, which
> + * happens if the inode is evicted and loaded again, as logged_trans is an in
> + * memory only field (not persisted).
> + *
> + * Returns 1 if the inode was logged before in the transaction, 0 if it was not,
> + * and < 0 on error.
> */
> -static bool inode_logged(struct btrfs_trans_handle *trans,
> - struct btrfs_inode *inode)
> +static int inode_logged(struct btrfs_trans_handle *trans,
> + struct btrfs_inode *inode,
> + struct btrfs_path *path_in)
> {
> + struct btrfs_path *path = path_in;
> + struct btrfs_key key;
> + int ret;
> +
> if (inode->logged_trans == trans->transid)
> - return true;
> + return 1;
>
> - if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &inode->root->state))
> - return false;
> + /*
> + * If logged_trans is not 0, then we know the inode logged was not logged
> + * in this transaction, so we can return false right away.
> + */
> + if (inode->logged_trans > 0)
> + return 0;
> +
> + /*
> + * If no log tree was created for this root in this transaction, then
> + * the inode can not have been logged in this transaction. In that case
> + * set logged_trans to anything greater than 0 and less than the current
> + * transaction's ID, to avoid the search below in a future call in case
> + * a log tree gets created after this.
> + */
> + if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &inode->root->state)) {
> + inode->logged_trans = trans->transid - 1;
> + return 0;
> + }
> +
> + /*
> + * We have a log tree and the inode's logged_trans is 0. We can't tell
> + * for sure if the inode was logged before in this transaction by looking
> + * only at logged_trans. We could be pessimistic and assume it was, but
> + * that can lead to unnecessarily logging an inode during rename and link
> + * operations, and then further updating the log in followup rename and
> + * link operations, specially if it's a directory, which adds latency
> + * visible to applications doing a series of rename or link operations.
> + *
> + * A logged_trans of 0 here can mean several things:
> + *
> + * 1) The inode was never logged since the filesystem was mounted, and may
> + * or may have not been evicted and loaded again;
> + *
> + * 2) The inode was logged in a previous transaction, then evicted and
> + * then loaded again;
> + *
> + * 3) The inode was logged in the current transaction, then evicted and
> + * then loaded again.
> + *
> + * For cases 1) and 2) we don't want to return true, but we need to detect
> + * case 3) and return true. So we do a search in the log root for the inode
> + * item.
> + */
> + key.objectid = btrfs_ino(inode);
> + key.type = BTRFS_INODE_ITEM_KEY;
> + key.offset = 0;
> +
> + if (!path) {
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> + }
> +
> + ret = btrfs_search_slot(NULL, inode->root->log_root, &key, path, 0, 0);
> +
> + if (path_in)
> + btrfs_release_path(path);
> + else
> + btrfs_free_path(path);
>
> /*
> - * The inode's logged_trans is always 0 when we load it (because it is
> - * not persisted in the inode item or elsewhere). So if it is 0, the
> - * inode was last modified in the current transaction then the inode may
> - * have been logged before in the current transaction, then evicted and
> - * loaded again in the current transaction - or may have never been logged
> - * in the current transaction, but since we can not be sure, we have to
> - * assume it was, otherwise our callers can leave an inconsistent log.
> + * Logging an inode always results in logging its inode item. So if we
> + * did not find the item we know the inode was not logged for sure.
> */
> - if (inode->logged_trans == 0 &&
> - inode->last_trans == trans->transid &&
> - !test_bit(BTRFS_FS_LOG_RECOVERING, &trans->fs_info->flags))
> - return true;
> + if (ret < 0) {
> + return ret;
> + } else if (ret > 0) {
> + /*
> + * Set logged_trans to a value greater than 0 and less then the
> + * current transaction to avoid doing the search in future calls.
> + */
> + inode->logged_trans = trans->transid - 1;
> + return 0;
> + }
> +
> + /*
> + * The inode was previously logged and then evicted, set logged_trans to
> + * the current transacion's ID, to avoid future tree searches as long as
> + * the inode is not evicted again.
> + */
> + inode->logged_trans = trans->transid;
> +
> + /*
> + * If it's a directory, then we must set last_dir_index_offset to the
> + * maximum possible value, so that the next attempt to log the inode does
> + * not skip checking if dir index keys found in modified subvolume tree
> + * leaves have been logged before, otherwise it would result in attempts
> + * to insert duplicate dir index keys in the log tree. This must be done
> + * because last_dir_index_offset is an in-memory only field, not persisted
> + * in the inode item or any other on-disk structure, so its value is lost
> + * once the inode is evicted.
> + */
> + if (S_ISDIR(inode->vfs_inode.i_mode))
> + inode->last_dir_index_offset = (u64)-1;
>
> - return false;
> + return 1;
> }
>
> /*
> @@ -3552,8 +3638,13 @@ void btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
> struct btrfs_path *path;
> int ret;
>
> - if (!inode_logged(trans, dir))
> + ret = inode_logged(trans, dir, NULL);
> + if (ret == 0)
> + return;
> + else if (ret < 0) {
> + btrfs_set_log_full_commit(trans);
> return;
> + }
>
> ret = join_running_log_trans(root);
> if (ret)
> @@ -3587,8 +3678,13 @@ void btrfs_del_inode_ref_in_log(struct btrfs_trans_handle *trans,
> u64 index;
> int ret;
>
> - if (!inode_logged(trans, inode))
> + ret = inode_logged(trans, inode, NULL);
> + if (ret == 0)
> return;
> + else if (ret < 0) {
> + btrfs_set_log_full_commit(trans);
> + return;
> + }
>
> ret = join_running_log_trans(root);
> if (ret)
> @@ -3720,7 +3816,6 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
> struct extent_buffer *src = path->nodes[0];
> const int nritems = btrfs_header_nritems(src);
> const u64 ino = btrfs_ino(inode);
> - const bool inode_logged_before = inode_logged(trans, inode);
> bool last_found = false;
> int batch_start = 0;
> int batch_size = 0;
> @@ -3796,14 +3891,16 @@ static int process_dir_items_leaf(struct btrfs_trans_handle *trans,
> ctx->log_new_dentries = true;
> }
>
> - if (!inode_logged_before)
> + if (!ctx->logged_before)
> goto add_to_batch;
>
> /*
> * If we were logged before and have logged dir items, we can skip
> * checking if any item with a key offset larger than the last one
> * we logged is in the log tree, saving time and avoiding adding
> - * contention on the log tree.
> + * contention on the log tree. We can only rely on the value of
> + * last_dir_index_offset when we know for sure that the inode was
> + * previously logged in the current transaction.
> */
> if (key.offset > inode->last_dir_index_offset)
> goto add_to_batch;
> @@ -4046,21 +4143,6 @@ static noinline int log_directory_changes(struct btrfs_trans_handle *trans,
> u64 max_key;
> int ret;
>
> - /*
> - * If this is the first time we are being logged in the current
> - * transaction, or we were logged before but the inode was evicted and
> - * reloaded later, in which case its logged_trans is 0, reset the value
> - * of the last logged key offset. Note that we don't use the helper
> - * function inode_logged() here - that is because the function returns
> - * true after an inode eviction, assuming the worst case as it can not
> - * know for sure if the inode was logged before. So we can not skip key
> - * searches in the case the inode was evicted, because it may not have
> - * been logged in this transaction and may have been logged in a past
> - * transaction, so we need to reset the last dir index offset to (u64)-1.
> - */
> - if (inode->logged_trans != trans->transid)
> - inode->last_dir_index_offset = (u64)-1;
> -
> min_key = BTRFS_DIR_START_INDEX;
> max_key = 0;
> ctx->last_dir_item_offset = inode->last_dir_index_offset;
> @@ -4097,9 +4179,6 @@ static int drop_inode_items(struct btrfs_trans_handle *trans,
> struct btrfs_key found_key;
> int start_slot;
>
> - if (!inode_logged(trans, inode))
> - return 0;
> -
> key.objectid = btrfs_ino(inode);
> key.type = max_key_type;
> key.offset = (u64)-1;
> @@ -4597,7 +4676,7 @@ static int log_one_extent(struct btrfs_trans_handle *trans,
> * are small, with a root at level 2 or 3 at most, due to their short
> * life span.
> */
> - if (inode_logged(trans, inode)) {
> + if (ctx->logged_before) {
> drop_args.path = path;
> drop_args.start = em->start;
> drop_args.end = em->start + em->len;
> @@ -5594,6 +5673,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
> bool xattrs_logged = false;
> bool recursive_logging = false;
> bool inode_item_dropped = true;
> + const bool orig_logged_before = ctx->logged_before;
>
> path = btrfs_alloc_path();
> if (!path)
> @@ -5643,6 +5723,18 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
> mutex_lock(&inode->log_mutex);
> }
>
> + /*
> + * Before logging the inode item, cache the value returned by
> + * inode_logged(), because after that we have the need to figure out if
> + * the inode was previously logged in this transaction.
> + */
> + ret = inode_logged(trans, inode, path);
> + if (ret < 0) {
> + err = ret;
> + goto out;
This should be 'goto out_unlock', otherwise it returns without
unlocking the inode's log_mutex.
David, do you prefer me to send a new version of the patchset or can
you edit this in misc-next?
Thanks.
> + }
> + ctx->logged_before = (ret == 1);
> +
> /*
> * This is for cases where logging a directory could result in losing a
> * a file after replaying the log. For example, if we move a file from a
> @@ -5668,9 +5760,11 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
> clear_bit(BTRFS_INODE_COPY_EVERYTHING, &inode->runtime_flags);
> if (inode_only == LOG_INODE_EXISTS)
> max_key_type = BTRFS_XATTR_ITEM_KEY;
> - ret = drop_inode_items(trans, log, path, inode, max_key_type);
> + if (ctx->logged_before)
> + ret = drop_inode_items(trans, log, path, inode,
> + max_key_type);
> } else {
> - if (inode_only == LOG_INODE_EXISTS && inode_logged(trans, inode)) {
> + if (inode_only == LOG_INODE_EXISTS && ctx->logged_before) {
> /*
> * Make sure the new inode item we write to the log has
> * the same isize as the current one (if it exists).
> @@ -5692,14 +5786,15 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
> &inode->runtime_flags)) {
> if (inode_only == LOG_INODE_EXISTS) {
> max_key.type = BTRFS_XATTR_ITEM_KEY;
> - ret = drop_inode_items(trans, log, path, inode,
> - max_key.type);
> + if (ctx->logged_before)
> + ret = drop_inode_items(trans, log, path,
> + inode, max_key.type);
> } else {
> clear_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
> &inode->runtime_flags);
> clear_bit(BTRFS_INODE_COPY_EVERYTHING,
> &inode->runtime_flags);
> - if (inode_logged(trans, inode))
> + if (ctx->logged_before)
> ret = truncate_inode_items(trans, log,
> inode, 0, 0);
> }
> @@ -5709,8 +5804,9 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
> if (inode_only == LOG_INODE_ALL)
> fast_search = true;
> max_key.type = BTRFS_XATTR_ITEM_KEY;
> - ret = drop_inode_items(trans, log, path, inode,
> - max_key.type);
> + if (ctx->logged_before)
> + ret = drop_inode_items(trans, log, path, inode,
> + max_key.type);
> } else {
> if (inode_only == LOG_INODE_ALL)
> fast_search = true;
> @@ -5830,6 +5926,10 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
> out:
> btrfs_free_path(path);
> btrfs_free_path(dst_path);
> +
> + if (recursive_logging)
> + ctx->logged_before = orig_logged_before;
> +
> return err;
> }
>
> @@ -6774,7 +6874,7 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
> struct btrfs_root *root = inode->root;
> struct btrfs_log_ctx ctx;
> bool log_pinned = false;
> - int ret = 0;
> + int ret;
>
> /*
> * this will force the logging code to walk the dentry chain
> @@ -6787,9 +6887,24 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
> * if this inode hasn't been logged and directory we're renaming it
> * from hasn't been logged, we don't need to log it
> */
> - if (!inode_logged(trans, inode) &&
> - (!old_dir || !inode_logged(trans, old_dir)))
> - return;
> + ret = inode_logged(trans, inode, NULL);
> + if (ret < 0) {
> + goto out;
> + } else if (ret == 0) {
> + if (!old_dir)
> + return;
> + /*
> + * If the inode was not logged and we are doing a rename (old_dir is not
> + * NULL), check if old_dir was logged - if it was not we can return and
> + * do nothing.
> + */
> + ret = inode_logged(trans, old_dir, NULL);
> + if (ret < 0)
> + goto out;
> + else if (ret == 0)
> + return;
> + }
> + ret = 0;
>
> /*
> * If we are doing a rename (old_dir is not NULL) from a directory that
> @@ -6849,15 +6964,15 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
> */
> btrfs_log_inode_parent(trans, inode, parent, LOG_INODE_EXISTS, &ctx);
> out:
> - if (log_pinned) {
> - /*
> - * If an error happened mark the log for a full commit because
> - * it's not consistent and up to date. Do it before unpinning the
> - * log, to avoid any races with someone else trying to commit it.
> - */
> - if (ret < 0)
> - btrfs_set_log_full_commit(trans);
> + /*
> + * If an error happened mark the log for a full commit because it's not
> + * consistent and up to date or we couldn't find out if one of the
> + * inodes was logged before in this transaction. Do it before unpinning
> + * the log, to avoid any races with someone else trying to commit it.
> + */
> + if (ret < 0)
> + btrfs_set_log_full_commit(trans);
> + if (log_pinned)
> btrfs_end_log_trans(root);
> - }
> }
>
> diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
> index f1acb7fa944c..1620f8170629 100644
> --- a/fs/btrfs/tree-log.h
> +++ b/fs/btrfs/tree-log.h
> @@ -17,6 +17,8 @@ struct btrfs_log_ctx {
> int log_transid;
> bool log_new_dentries;
> bool logging_new_name;
> + /* Indicate if the inode being logged was logged before. */
> + bool logged_before;
> /* Tracks the last logged dir item/index key offset. */
> u64 last_dir_item_offset;
> struct inode *inode;
> @@ -32,6 +34,7 @@ static inline void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx,
> ctx->log_transid = 0;
> ctx->log_new_dentries = false;
> ctx->logging_new_name = false;
> + ctx->logged_before = false;
> ctx->inode = inode;
> INIT_LIST_HEAD(&ctx->list);
> INIT_LIST_HEAD(&ctx->ordered_extents);
> --
> 2.33.0
>
next prev parent reply other threads:[~2022-02-02 16:50 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-20 11:00 [PATCH 0/6] btrfs: speedup and avoid inode logging during rename/link fdmanana
2022-01-20 11:00 ` [PATCH 1/6] btrfs: add helper to delete a dir entry from a log tree fdmanana
2022-01-20 11:00 ` [PATCH 2/6] btrfs: pass the dentry to btrfs_log_new_name() instead of the inode fdmanana
2022-01-20 11:00 ` [PATCH 3/6] btrfs: avoid logging all directory changes during renames fdmanana
2022-01-20 11:00 ` [PATCH 4/6] btrfs: stop doing unnecessary log updates during a rename fdmanana
2022-01-20 11:00 ` [PATCH 5/6] btrfs: avoid inode logging during rename and link when possible fdmanana
2022-02-02 16:49 ` Filipe Manana [this message]
2022-01-20 11:00 ` [PATCH 6/6] btrfs: use single variable to track return value at btrfs_log_inode() fdmanana
2022-01-25 17:42 ` [PATCH 0/6] btrfs: speedup and avoid inode logging during rename/link David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL3q7H4b7NcJMgkfGOoBi0HPTQS8zFy6mGxyZamU_51dtKp2HQ@mail.gmail.com \
--to=fdmanana@kernel.org \
--cc=dsterba@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).