All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>,
	clm@fb.com, jbacik@fb.com, linux-btrfs@vger.kernel.org
Subject: [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode
Date: Mon, 28 Mar 2022 15:41:52 -0400	[thread overview]
Message-ID: <20220328194157.1585642-17-sashal@kernel.org> (raw)
In-Reply-To: <20220328194157.1585642-1-sashal@kernel.org>

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 23e3337faf73e5bb2610697977e175313d48acb0 ]

When an inode has a last_reflink_trans matching the current transaction,
we have to take special care when logging its checksums in order to
avoid getting checksum items with overlapping ranges in a log tree,
which could result in missing checksums after log replay (more on that
in the changelogs of commit 40e046acbd2f36 ("Btrfs: fix missing data
checksums after replaying a log tree") and commit e289f03ea79bbc ("btrfs:
fix corrupt log due to concurrent fsync of inodes with shared extents")).
We also need to make sure a full fsync will copy all old file extent
items it finds in modified leaves, because they might have been copied
from some other inode.

However once we fsync an inode, we don't need to keep paying the price of
that extra special care in future fsyncs done in the same transaction,
unless the inode is used for another reflink operation or the full sync
flag is set on it (truncate, failure to allocate extent maps for holes,
and other exceptional and infrequent cases).

So after we fsync an inode reset its last_unlink_trans to zero. In case
another reflink happens, we continue to update the last_reflink_trans of
the inode, just as before. Also set last_reflink_trans to the generation
of the last transaction that modified the inode whenever we need to set
the full sync flag on the inode, just like when we need to load an inode
from disk after eviction.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/btrfs_inode.h | 30 ++++++++++++++++++++++++++++++
 fs/btrfs/file.c        |  7 +++----
 fs/btrfs/inode.c       | 12 +++++-------
 fs/btrfs/reflink.c     |  5 ++---
 fs/btrfs/tree-log.c    |  8 ++++++++
 5 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index b3e46aabc3d8..d0b52b106041 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -333,6 +333,36 @@ static inline void btrfs_set_inode_last_sub_trans(struct btrfs_inode *inode)
 	spin_unlock(&inode->lock);
 }
 
+/*
+ * Should be called while holding the inode's VFS lock in exclusive mode or in a
+ * context where no one else can access the inode concurrently (during inode
+ * creation or when loading an inode from disk).
+ */
+static inline void btrfs_set_inode_full_sync(struct btrfs_inode *inode)
+{
+	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
+	/*
+	 * The inode may have been part of a reflink operation in the last
+	 * transaction that modified it, and then a fsync has reset the
+	 * last_reflink_trans to avoid subsequent fsyncs in the same
+	 * transaction to do unnecessary work. So update last_reflink_trans
+	 * to the last_trans value (we have to be pessimistic and assume a
+	 * reflink happened).
+	 *
+	 * The ->last_trans is protected by the inode's spinlock and we can
+	 * have a concurrent ordered extent completion update it. Also set
+	 * last_reflink_trans to ->last_trans only if the former is less than
+	 * the later, because we can be called in a context where
+	 * last_reflink_trans was set to the current transaction generation
+	 * while ->last_trans was not yet updated in the current transaction,
+	 * and therefore has a lower value.
+	 */
+	spin_lock(&inode->lock);
+	if (inode->last_reflink_trans < inode->last_trans)
+		inode->last_reflink_trans = inode->last_trans;
+	spin_unlock(&inode->lock);
+}
+
 static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
 {
 	bool ret = false;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a0179cc62913..f38cc706a6cf 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2474,7 +2474,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 	hole_em = alloc_extent_map();
 	if (!hole_em) {
 		btrfs_drop_extent_cache(inode, offset, end - 1, 0);
-		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
+		btrfs_set_inode_full_sync(inode);
 	} else {
 		hole_em->start = offset;
 		hole_em->len = end - offset;
@@ -2495,8 +2495,7 @@ static int fill_holes(struct btrfs_trans_handle *trans,
 		} while (ret == -EEXIST);
 		free_extent_map(hole_em);
 		if (ret)
-			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-					&inode->runtime_flags);
+			btrfs_set_inode_full_sync(inode);
 	}
 
 	return 0;
@@ -2850,7 +2849,7 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
 	 * maps for the replacement extents (or holes).
 	 */
 	if (extent_info && !extent_info->is_new_extent)
-		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
+		btrfs_set_inode_full_sync(inode);
 
 	if (ret)
 		goto out_trans;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5aace4c13519..3783fdf78da8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -423,7 +423,7 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start,
 		goto out;
 	}
 
-	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &inode->runtime_flags);
+	btrfs_set_inode_full_sync(inode);
 out:
 	/*
 	 * Don't forget to free the reserved space, as for inlined extent
@@ -4882,8 +4882,7 @@ int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size)
 						cur_offset + hole_size - 1, 0);
 			hole_em = alloc_extent_map();
 			if (!hole_em) {
-				set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-					&inode->runtime_flags);
+				btrfs_set_inode_full_sync(inode);
 				goto next;
 			}
 			hole_em->start = cur_offset;
@@ -6146,7 +6145,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	 * sync since it will be a full sync anyway and this will blow away the
 	 * old info in the log.
 	 */
-	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
+	btrfs_set_inode_full_sync(BTRFS_I(inode));
 
 	key[0].objectid = objectid;
 	key[0].type = BTRFS_INODE_ITEM_KEY;
@@ -8740,7 +8739,7 @@ static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 	 * extents beyond i_size to drop.
 	 */
 	if (control.extents_found > 0)
-		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
+		btrfs_set_inode_full_sync(BTRFS_I(inode));
 
 	return ret;
 }
@@ -10027,8 +10026,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 
 		em = alloc_extent_map();
 		if (!em) {
-			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-				&BTRFS_I(inode)->runtime_flags);
+			btrfs_set_inode_full_sync(BTRFS_I(inode));
 			goto next;
 		}
 
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index a3930da4eb3f..e37a61ad87df 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -277,7 +277,7 @@ static int clone_copy_inline_extent(struct inode *dst,
 						  path->slots[0]),
 			    size);
 	btrfs_update_inode_bytes(BTRFS_I(dst), datal, drop_args.bytes_found);
-	set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(dst)->runtime_flags);
+	btrfs_set_inode_full_sync(BTRFS_I(dst));
 	ret = btrfs_inode_set_file_extent_range(BTRFS_I(dst), 0, aligned_end);
 out:
 	if (!ret && !trans) {
@@ -575,8 +575,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 		 * replaced file extent items.
 		 */
 		if (last_dest_end >= i_size_read(inode))
-			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
-				&BTRFS_I(inode)->runtime_flags);
+			btrfs_set_inode_full_sync(BTRFS_I(inode));
 
 		ret = btrfs_replace_file_extents(BTRFS_I(inode), path,
 				last_dest_end, destoff + len - 1, NULL, &trans);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6bc8834ac8f7..607527a924c2 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5836,6 +5836,14 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 	if (inode_only != LOG_INODE_EXISTS)
 		inode->last_log_commit = inode->last_sub_trans;
 	spin_unlock(&inode->lock);
+
+	/*
+	 * Reset the last_reflink_trans so that the next fsync does not need to
+	 * go through the slower path when logging extents and their checksums.
+	 */
+	if (inode_only == LOG_INODE_ALL)
+		inode->last_reflink_trans = 0;
+
 out_unlock:
 	mutex_unlock(&inode->log_mutex);
 out:
-- 
2.34.1


  parent reply	other threads:[~2022-03-28 19:42 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-28 19:41 [PATCH AUTOSEL 5.17 01/21] atomics: Fix atomic64_{read_acquire,set_release} fallbacks Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 02/21] locking/lockdep: Iterate lock_classes directly when reading lockdep files Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 03/21] ext4: correct cluster len and clusters changed accounting in ext4_mb_mark_bb Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 04/21] ext4: fix ext4_mb_mark_bb() with flex_bg with fast_commit Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 05/21] sched/tracing: Don't re-read p->state when emitting sched_switch event Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 06/21] sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 07/21] ext4: don't BUG if someone dirty pages without asking ext4 first Sasha Levin
2022-03-28 19:42   ` syzbot
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 08/21] f2fs: fix to do sanity check on curseg->alloc_type Sasha Levin
2022-03-28 19:41   ` [f2fs-dev] " Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 09/21] NFSD: Fix nfsd_breaker_owns_lease() return values Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 10/21] f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs Sasha Levin
2022-03-28 19:41   ` [f2fs-dev] " Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 11/21] btrfs: harden identification of a stale device Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 12/21] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Sasha Levin
2022-03-29 19:57   ` Omar Sandoval
2022-03-31 16:58     ` Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 13/21] btrfs: make search_csum_tree return 0 if we get -EFBIG Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 14/21] btrfs: handle csum lookup errors properly on reads Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 15/21] btrfs: do not double complete bio on errors during compressed reads Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 16/21] btrfs: do not clean up repair bio if submit fails Sasha Levin
2022-03-28 19:41 ` Sasha Levin [this message]
2022-03-29  9:59   ` [PATCH AUTOSEL 5.17 17/21] btrfs: reset last_reflink_trans after fsyncing inode Filipe Manana
2022-03-31 16:59     ` Sasha Levin
2022-03-31 17:57       ` Filipe Manana
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 18/21] f2fs: use spin_lock to avoid hang Sasha Levin
2022-03-28 19:41   ` [f2fs-dev] " Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 19/21] f2fs: compress: fix to print raw data size in error path of lz4 decompression Sasha Levin
2022-03-28 19:41   ` [f2fs-dev] " Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 20/21] Adjust cifssb maximum read size Sasha Levin
2022-03-28 19:41 ` [PATCH AUTOSEL 5.17 21/21] ntfs: add sanity check on allocation size Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220328194157.1585642-17-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.