linux-btrfs.vger.kernel.org archive mirror
* [PATCH] btrfs: send: fix a failure when looking for data backrefs after relocation
@ 2021-12-02 10:21 fdmanana
  2021-12-02 19:28 ` Josef Bacik
  2021-12-06 17:52 ` David Sterba
  0 siblings, 2 replies; 4+ messages in thread
From: fdmanana @ 2021-12-02 10:21 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

During a send, when trying to find roots from which to clone data extents,
if the leaf of our file extent item was obtained before relocation for a
data block group finished, we can end up trying to look up backrefs
for an extent location (file extent item's disk_bytenr) that is not in
use anymore. That is, the extent was reallocated and the transaction used
for the relocation was committed. This makes the backref lookup not find
anything and we fail at find_extent_clone() with -EIO and log an error
message like the following:

  [ 7642.897365] BTRFS error (device sdc): did not find backref in send_root. inode=881, offset=2592768, disk_byte=1292025856 found extent=1292025856

This is because we are checking if relocation happened after we check if
we found the backref for the file extent item we are processing. We should
do it before, and in case relocation happened, do not attempt to clone and
instead fall back to issuing write commands, which will read the correct
data from the new extent location. The current check is being done too
late, so fix this by moving it to right after we do the backref lookup and
before checking if we found our own backref.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

David, this can be squashed into the patch:

   "btrfs: make send work with concurrent block group relocation"

 fs/btrfs/send.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f0015b5cf4b1..3fc144b8c0d8 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1431,6 +1431,26 @@ static int find_extent_clone(struct send_ctx *sctx,
 	if (ret < 0)
 		goto out;
 
+	down_read(&fs_info->commit_root_sem);
+	if (fs_info->last_reloc_trans > sctx->last_reloc_trans) {
+		/*
+		 * A transaction commit for a transaction in which block group
+		 * relocation was done just happened.
+		 * The disk_bytenr of the file extent item we processed is
+		 * possibly stale, referring to the extent's location before
+		 * relocation. So act as if we haven't found any clone sources
+		 * and fallback to write commands, which will read the correct
+		 * data from the new extent location. Otherwise we will fail
+		 * below because we haven't found our own back reference or we
+		 * could be getting incorrect sources in case the old extent
+		 * was already reallocated after the relocation.
+		 */
+		up_read(&fs_info->commit_root_sem);
+		ret = -ENOENT;
+		goto out;
+	}
+	up_read(&fs_info->commit_root_sem);
+
 	if (!backref_ctx.found_itself) {
 		/* found a bug in backref code? */
 		ret = -EIO;
@@ -1444,28 +1464,8 @@ static int find_extent_clone(struct send_ctx *sctx,
 		    "find_extent_clone: data_offset=%llu, ino=%llu, num_bytes=%llu, logical=%llu",
 		    data_offset, ino, num_bytes, logical);
 
-	if (backref_ctx.found > 0) {
-		down_read(&fs_info->commit_root_sem);
-		if (fs_info->last_reloc_trans > sctx->last_reloc_trans) {
-			/*
-			 * A transaction commit for a transaction in which block
-			 * group relocation was done just happened.
-			 * The disk_bytenr of the file extent item we processed
-			 * is possibly stale, referring to the extent's location
-			 * before relocation, so act as if we haven't found any
-			 * clone sources - otherwise we could end up later issuing
-			 * clone operations that could leave the receiver with
-			 * incorrect data, in case the old disk_bytenr got
-			 * reallocated for another extent.
-			 */
-			up_read(&fs_info->commit_root_sem);
-			ret = -ENOENT;
-			goto out;
-		}
-		up_read(&fs_info->commit_root_sem);
-	} else {
+	if (!backref_ctx.found)
 		btrfs_debug(fs_info, "no clones found");
-	}
 
 	cur_clone_root = NULL;
 	for (i = 0; i < sctx->clone_roots_cnt; i++) {
-- 
2.33.0
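
The reordering described in the changelog can be illustrated with a small userspace model. This is only a sketch: the struct and function names below are hypothetical stand-ins, not the kernel's structures, and locking (commit_root_sem) is left out. It assumes, as in the patch, that a stale lookup reports found_itself as false and that -ENOENT means "no clone source, use write commands".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel state; not the real btrfs structures. */
struct model_fs {
	uint64_t last_reloc_trans;	/* last committed transaction that did relocation */
};

struct model_send {
	uint64_t last_reloc_trans;	/* value the send context last observed */
};

struct model_backref {
	bool found_itself;		/* did the lookup see our own back reference? */
	int found;			/* number of candidate clone sources */
};

/* Pre-patch order: own-backref check first, relocation check afterwards. */
static int check_old_order(const struct model_fs *fs,
			   const struct model_send *s,
			   const struct model_backref *b)
{
	if (!b->found_itself)
		return -EIO;	/* a stale disk_bytenr is misreported as a bug */
	if (b->found > 0 && fs->last_reloc_trans > s->last_reloc_trans)
		return -ENOENT;
	return 0;
}

/* Patched order: relocation check first, own-backref check afterwards. */
static int check_new_order(const struct model_fs *fs,
			   const struct model_send *s,
			   const struct model_backref *b)
{
	if (fs->last_reloc_trans > s->last_reloc_trans)
		return -ENOENT;	/* caller falls back to write commands */
	if (!b->found_itself)
		return -EIO;	/* only reached when no relocation intervened */
	return 0;
}
```

With a filesystem generation newer than the one send observed and found_itself false, check_old_order() returns -EIO while check_new_order() returns -ENOENT, which is the behaviour change the patch is after.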


* Re: [PATCH] btrfs: send: fix a failure when looking for data backrefs after relocation
  2021-12-02 10:21 [PATCH] btrfs: send: fix a failure when looking for data backrefs after relocation fdmanana
@ 2021-12-02 19:28 ` Josef Bacik
  2021-12-03 11:13   ` Filipe Manana
  2021-12-06 17:52 ` David Sterba
  1 sibling, 1 reply; 4+ messages in thread
From: Josef Bacik @ 2021-12-02 19:28 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

I'm not against this in principle, but won't we come all the way back out of
this loop and re-search higher up because things changed?  Can we just do a
-EAGAIN, come out and re-search down to this key so we can still do the clone
properly?  If we can't then this is reasonable, but I'd like to avoid blowing up
a send stream because relocation was running if at all possible.

Josef
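
For illustration, the -EAGAIN alternative raised above could look roughly like the following userspace sketch. Everything here is hypothetical (names, the retry bound, and modelling a tree re-search as simply resyncing the observed relocation generation); it is not what the patch does, which returns -ENOENT directly.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model; these names are illustrative, not btrfs APIs. */
struct reloc_state {
	uint64_t fs_last_reloc_trans;	/* advanced by relocation commits */
	uint64_t seen_last_reloc_trans;	/* what the sender last observed */
};

/* One lookup attempt: -EAGAIN if a relocation commit happened meanwhile. */
static int try_find_clone(const struct reloc_state *st)
{
	if (st->fs_last_reloc_trans > st->seen_last_reloc_trans)
		return -EAGAIN;
	return 0;
}

/*
 * Retry the lookup after a "re-search" (modelled as picking up the
 * current generation). After max_retries re-searches give up with
 * -ENOENT so the caller can fall back to write commands.
 */
static int find_clone_with_retry(struct reloc_state *st, int max_retries)
{
	for (int attempt = 0; attempt <= max_retries; attempt++) {
		int ret = try_find_clone(st);

		if (ret != -EAGAIN)
			return ret;
		/* Re-search: resync with the current generation and retry. */
		st->seen_last_reloc_trans = st->fs_last_reloc_trans;
	}
	return -ENOENT;
}
```

The bound on retries is one possible way to keep the number of re-searches limited while still allowing a clone when relocation has settled.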


* Re: [PATCH] btrfs: send: fix a failure when looking for data backrefs after relocation
  2021-12-02 19:28 ` Josef Bacik
@ 2021-12-03 11:13   ` Filipe Manana
  0 siblings, 0 replies; 4+ messages in thread
From: Filipe Manana @ 2021-12-03 11:13 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On Thu, Dec 02, 2021 at 02:28:49PM -0500, Josef Bacik wrote:
> I'm not against this in principle, but won't we come all the way back out of
> this loop and re-search higher up because things changed?  Can we just do a
> -EAGAIN, come out and re-search down to this key so we can still do the clone
> properly?  If we can't then this is reasonable, but I'd like to avoid blowing up
> a send stream because relocation was running if at all possible.

It could be done, but I didn't do it that way because:

1) Mostly to keep it as simple as possible initially.

2) I wanted to avoid the possibility of too many tree re-searches.
   Though I have seen it happen rarely at find_extent_clone() during my
   testing, and that's because we do the check and re-search before advancing
   to the next key in the tree iteration code (full_send_tree() and
   btrfs_compare_trees()).

   Overall I haven't seen an excessive number of re-searches, and when they
   happen they are cheap as the extent buffers are already in memory.

   But I plan to later eliminate some unnecessary re-searches, by keeping
   track of what kind of block group was relocated (data/metadata/system) and
   its logical range.

   For now I wanted to make sure that it always produces correct results and
   that performance is acceptable. As it is, it may occasionally issue write
   operations instead of clone operations, but again it rarely happens and we
   already have a few cases where we skip cloning anyway (too many extent refs
   and a few edge cases).

Seems reasonable?

Thanks.

> 
> Josef
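
The planned optimization mentioned above (tracking the type and logical range of the relocated block group) might be sketched as follows. This is a guess at one possible shape, not existing code: the enum, struct and function names are hypothetical, and it assumes only data block group relocation can make a file extent item's disk_bytenr stale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the most recent relocation; not a btrfs type. */
enum bg_type { BG_DATA, BG_METADATA, BG_SYSTEM };

struct reloc_record {
	enum bg_type type;	/* kind of block group that was relocated */
	uint64_t start;		/* logical start of the block group */
	uint64_t length;	/* length of the block group in bytes */
};

/*
 * Return true if the relocation may have moved the extent at
 * [disk_bytenr, disk_bytenr + num_bytes). Ranges are half-open and
 * arithmetic overflow is ignored for brevity.
 */
static bool reloc_affects_extent(const struct reloc_record *r,
				 uint64_t disk_bytenr, uint64_t num_bytes)
{
	if (r->type != BG_DATA)
		return false;
	return disk_bytenr < r->start + r->length &&
	       r->start < disk_bytenr + num_bytes;
}
```

A check like this would let the sender skip the fallback (or a re-search) when the relocated range cannot contain the extent being processed.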


* Re: [PATCH] btrfs: send: fix a failure when looking for data backrefs after relocation
  2021-12-02 10:21 [PATCH] btrfs: send: fix a failure when looking for data backrefs after relocation fdmanana
  2021-12-02 19:28 ` Josef Bacik
@ 2021-12-06 17:52 ` David Sterba
  1 sibling, 0 replies; 4+ messages in thread
From: David Sterba @ 2021-12-06 17:52 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Thu, Dec 02, 2021 at 10:21:43AM +0000, fdmanana@kernel.org wrote:
> David, this can be squashed into the patch:
> 
>    "btrfs: make send work with concurrent block group relocation"

Squashed, thanks.

