All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] btrfs: send: correctly recreate changed inodes
@ 2021-01-25 19:42 Roman Anasal
  2021-01-25 19:42 ` [PATCH v2 1/4] btrfs: send: rename send_ctx.cur_inode_new_gen to cur_inode_recreated Roman Anasal
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Roman Anasal @ 2021-01-25 19:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Roman Anasal

This is v2 of the patch set submitted in [1]. Changes in the patch set:
- fixed mixed code/declarations according to feedback
- add patch to also handle the case of differing rdev with same gen
- add patch to fix invalid commands when using disconnected roots, see [2]

[1] https://lore.kernel.org/linux-btrfs/20210111190243.4152-1-roman.anasal@bdsu.de/
[2] https://lore.kernel.org/linux-btrfs/424d62853024d8b0bc5ca03206eeca35be6014a2.camel@bdsu.de/


TL;DR: the patches address issues arising only in edge cases when snapshots
used for incremental send are not stricly based off of a single writable
parent, i.e. when
- the subvolumes do not share a common anchestor
- the subvolumes are (based off of) snapshots that were modified within the
  same transaction after they were created

Outside of these cases it is safe to assume that an inode will only need to
be recreated if the generation changed - which happens when the inode was
deleted and a new inode reused the same inode number - because creating a
snapshot forces a transaction commit.

But in the above cases it is possible that inodes that are created within
same transaction in both subvolumes end up having the same inode number as
well as same generation. These would the produce invalid commands streams
that cause the receiver to fail or produce incorrect results.


I am aware that these edge cases lie clearly outside the expected use case
which is sending only related subvolumes incrementally. The reason I updated
the patches anyway was that I felt challenged by this and wanted to get some
programming/debugging practice with the kernel and get a deeper
understanding of btrfs.

I tried as good as I could to limit any effects of the patches to the
particluar cases I discovered and am fairly confident that they won't affect
the behavior in other cases - but of course am open to be correct here
by the far more experienced eyes on this.
But even if the patches will be rejected because the issues can only occur
in unsupported use of btrfs send/receive I wanted to document my findings
here; maybe they can still be picked up later.

In any case I'd appreciate any feedback and the code/patches themselves
since it's my first contribution (attempt) to the kernel (=


Roman Anasal (4):
  btrfs: send: rename send_ctx.cur_inode_new_gen to cur_inode_recreated
  btrfs: send: fix invalid commands for inodes with changed type but
    same gen
  btrfs: send: fix invalid commands for inodes with changed rdev but
    same gen
  btrfs: send: fix invalid commands for inodes in disconnected roots

 fs/btrfs/send.c | 69 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 24 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/4] btrfs: send: rename send_ctx.cur_inode_new_gen to cur_inode_recreated
  2021-01-25 19:42 [PATCH v2 0/4] btrfs: send: correctly recreate changed inodes Roman Anasal
@ 2021-01-25 19:42 ` Roman Anasal
  2021-01-25 19:42 ` [PATCH v2 2/4] btrfs: send: fix invalid commands for inodes with changed type but same gen Roman Anasal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Roman Anasal @ 2021-01-25 19:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Roman Anasal

cur_inode_new_gen is used to detect whether an inode was/has to be
recreated which is - currently (!) - only based on whether a changed
inode as differing generations.
To allow additional checks for recreating an inode (see following patch)
and still have a sane naming this change was made.

Signed-off-by: Roman Anasal <roman.anasal@bdsu.de>
---
 fs/btrfs/send.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index fee15c4d3..ca78f66a0 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -102,7 +102,7 @@ struct send_ctx {
 	u64 cur_ino;
 	u64 cur_inode_gen;
 	int cur_inode_new;
-	int cur_inode_new_gen;
+	int cur_inode_recreated;
 	int cur_inode_deleted;
 	u64 cur_inode_size;
 	u64 cur_inode_mode;
@@ -322,7 +322,7 @@ static int is_waiting_for_rm(struct send_ctx *sctx, u64 dir_ino, u64 gen);
 static int need_send_hole(struct send_ctx *sctx)
 {
 	return (sctx->parent_root && !sctx->cur_inode_new &&
-		!sctx->cur_inode_new_gen && !sctx->cur_inode_deleted &&
+		!sctx->cur_inode_recreated && !sctx->cur_inode_deleted &&
 		S_ISREG(sctx->cur_inode_mode));
 }
 
@@ -6265,7 +6265,7 @@ static int changed_inode(struct send_ctx *sctx,
 	u64 right_gen = 0;
 
 	sctx->cur_ino = key->objectid;
-	sctx->cur_inode_new_gen = 0;
+	sctx->cur_inode_recreated = 0;
 	sctx->cur_inode_last_extent = (u64)-1;
 	sctx->cur_inode_next_write_offset = 0;
 	sctx->ignore_cur_inode = false;
@@ -6306,7 +6306,7 @@ static int changed_inode(struct send_ctx *sctx,
 		 */
 		if (left_gen != right_gen &&
 		    sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
-			sctx->cur_inode_new_gen = 1;
+			sctx->cur_inode_recreated = 1;
 	}
 
 	/*
@@ -6364,7 +6364,7 @@ static int changed_inode(struct send_ctx *sctx,
 		 * reused the same inum. So we have to treat the old inode as
 		 * deleted and the new one as new.
 		 */
-		if (sctx->cur_inode_new_gen) {
+		if (sctx->cur_inode_recreated) {
 			/*
 			 * First, process the inode as if it was deleted.
 			 */
@@ -6401,7 +6401,8 @@ static int changed_inode(struct send_ctx *sctx,
 				goto out;
 			/*
 			 * Advance send_progress now as we did not get into
-			 * process_recorded_refs_if_needed in the new_gen case.
+			 * process_recorded_refs_if_needed in the
+			 * cur_inode_recreated case.
 			 */
 			sctx->send_progress = sctx->cur_ino + 1;
 
@@ -6418,7 +6419,7 @@ static int changed_inode(struct send_ctx *sctx,
 		} else {
 			sctx->cur_inode_gen = left_gen;
 			sctx->cur_inode_new = 0;
-			sctx->cur_inode_new_gen = 0;
+			sctx->cur_inode_recreated = 0;
 			sctx->cur_inode_deleted = 0;
 			sctx->cur_inode_size = btrfs_inode_size(
 					sctx->left_path->nodes[0], left_ii);
@@ -6435,7 +6436,7 @@ static int changed_inode(struct send_ctx *sctx,
  * We have to process new refs before deleted refs, but compare_trees gives us
  * the new and deleted refs mixed. To fix this, we record the new/deleted refs
  * first and later process them in process_recorded_refs.
- * For the cur_inode_new_gen case, we skip recording completely because
+ * For the cur_inode_recreated case, we skip recording completely because
  * changed_inode did already initiate processing of refs. The reason for this is
  * that in this case, compare_tree actually compares the refs of 2 different
  * inodes. To fix this, process_all_refs is used in changed_inode to handle all
@@ -6451,7 +6452,7 @@ static int changed_ref(struct send_ctx *sctx,
 		return -EIO;
 	}
 
-	if (!sctx->cur_inode_new_gen &&
+	if (!sctx->cur_inode_recreated &&
 	    sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID) {
 		if (result == BTRFS_COMPARE_TREE_NEW)
 			ret = record_new_ref(sctx);
@@ -6466,8 +6467,8 @@ static int changed_ref(struct send_ctx *sctx,
 
 /*
  * Process new/deleted/changed xattrs. We skip processing in the
- * cur_inode_new_gen case because changed_inode did already initiate processing
- * of xattrs. The reason is the same as in changed_ref
+ * cur_inode_recreated case because changed_inode did already initiate
+ * processing of xattrs. The reason is the same as in changed_ref
  */
 static int changed_xattr(struct send_ctx *sctx,
 			 enum btrfs_compare_tree_result result)
@@ -6479,7 +6480,7 @@ static int changed_xattr(struct send_ctx *sctx,
 		return -EIO;
 	}
 
-	if (!sctx->cur_inode_new_gen && !sctx->cur_inode_deleted) {
+	if (!sctx->cur_inode_recreated && !sctx->cur_inode_deleted) {
 		if (result == BTRFS_COMPARE_TREE_NEW)
 			ret = process_new_xattr(sctx);
 		else if (result == BTRFS_COMPARE_TREE_DELETED)
@@ -6493,8 +6494,8 @@ static int changed_xattr(struct send_ctx *sctx,
 
 /*
  * Process new/deleted/changed extents. We skip processing in the
- * cur_inode_new_gen case because changed_inode did already initiate processing
- * of extents. The reason is the same as in changed_ref
+ * cur_inode_recreated case because changed_inode did already initiate
+ * processing of extents. The reason is the same as in changed_ref
  */
 static int changed_extent(struct send_ctx *sctx,
 			  enum btrfs_compare_tree_result result)
@@ -6517,7 +6518,7 @@ static int changed_extent(struct send_ctx *sctx,
 	if (sctx->cur_ino != sctx->cmp_key->objectid)
 		return 0;
 
-	if (!sctx->cur_inode_new_gen && !sctx->cur_inode_deleted) {
+	if (!sctx->cur_inode_recreated && !sctx->cur_inode_deleted) {
 		if (result != BTRFS_COMPARE_TREE_DELETED)
 			ret = process_extent(sctx, sctx->left_path,
 					sctx->cmp_key);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/4] btrfs: send: fix invalid commands for inodes with changed type but same gen
  2021-01-25 19:42 [PATCH v2 0/4] btrfs: send: correctly recreate changed inodes Roman Anasal
  2021-01-25 19:42 ` [PATCH v2 1/4] btrfs: send: rename send_ctx.cur_inode_new_gen to cur_inode_recreated Roman Anasal
@ 2021-01-25 19:42 ` Roman Anasal
  2021-01-25 19:42 ` [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev " Roman Anasal
  2021-01-25 19:42 ` [PATCH v2 4/4] btrfs: send: fix invalid commands for inodes in disconnected roots Roman Anasal
  3 siblings, 0 replies; 11+ messages in thread
From: Roman Anasal @ 2021-01-25 19:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Roman Anasal

When doing a send, if a new inode has the same number as an inode in the
parent subvolume it will only be detected as to be recreated when the
genrations differ. But with inodes in the same generation the emitted
commands will cause the receiver to fail.

This case does not happen when doing incremental sends with snapshots
that are kept read-only by the user all the time, but it may happen if
- a snapshot was modified after it was created
- the subvol used as parent was created independently from the sent subvol

Example reproducers:

  # case 1: same ino at same path
  btrfs subvolume create subvol1
  btrfs subvolume create subvol2
  mkdir subvol1/a
  touch subvol2/a
  btrfs property set subvol1 ro true
  btrfs property set subvol2 ro true
  btrfs send -p subvol1 subvol2 | btrfs receive --dump

The produced tree state here is:
  |-- subvol1
  |   `-- a/        (ino 257)
  |
  `-- subvol2
      `-- a         (ino 257)

  Where subvol1/a/ is a directory and subvol2/a is a file with the same
  inode number and same generation.

Example output of the receive command:
  At subvol subvol2
  snapshot        ./subvol2                       uuid=19d2be0a-5af1-fa44-9b3f-f21815178d00 transid=9 parent_uuid=1bac8b12-ddb2-6441-8551-700456991785 parent_transid=9
  utimes          ./subvol2/                      atime=2021-01-11T13:41:36+0000 mtime=2021-01-11T13:41:36+0000 ctime=2021-01-11T13:41:36+0000
  link            ./subvol2/a                     dest=a
  unlink          ./subvol2/a
  utimes          ./subvol2/                      atime=2021-01-11T13:41:36+0000 mtime=2021-01-11T13:41:36+0000 ctime=2021-01-11T13:41:36+0000
  chmod           ./subvol2/a                     mode=644
  utimes          ./subvol2/a                     atime=2021-01-11T13:41:36+0000 mtime=2021-01-11T13:41:36+0000 ctime=2021-01-11T13:41:36+0000

=> the `link` command causes the receiver to fail with:
   ERROR: link a -> a failed: File exists

Second example:
  # case 2: same ino at different path
  btrfs subvolume create subvol1
  btrfs subvolume create subvol2
  mkdir subvol1/a
  touch subvol2/b
  btrfs property set subvol1 ro true
  btrfs property set subvol2 ro true
  btrfs send -p subvol1 subvol2 | btrfs receive --dump

The produced tree state here is:
  |-- subvol1
  |   `-- a/        (ino 257)
  |
  `-- subvol2
      `-- b         (ino 257)

  Where subvol1/a/ is a directory and subvol2/b is a file with the same
  inode number and same generation.

Example output of the receive command:
  At subvol subvol2
  snapshot        ./subvol2                       uuid=ea93c47a-5f47-724f-8a43-e15ce745aef0 transid=20 parent_uuid=f03578ef-5bca-1445-a480-3df63677fddf parent_transid=20
  utimes          ./subvol2/                      atime=2021-01-11T13:58:00+0000 mtime=2021-01-11T13:58:00+0000 ctime=2021-01-11T13:58:00+0000
  link            ./subvol2/b                     dest=a
  unlink          ./subvol2/a
  utimes          ./subvol2/                      atime=2021-01-11T13:58:00+0000 mtime=2021-01-11T13:58:00+0000 ctime=2021-01-11T13:58:00+0000
  chmod           ./subvol2/b                     mode=644
  utimes          ./subvol2/b                     atime=2021-01-11T13:58:00+0000 mtime=2021-01-11T13:58:00+0000 ctime=2021-01-11T13:58:00+0000

=> the `link` command causes the receiver to fail with:
   ERROR: link b -> a failed: Operation not permitted

Signed-off-by: Roman Anasal <roman.anasal@bdsu.de>
---
v2:
  - move declarations to the top
---
 fs/btrfs/send.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index ca78f66a0..c8b1f441f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6263,6 +6263,7 @@ static int changed_inode(struct send_ctx *sctx,
 	struct btrfs_inode_item *right_ii = NULL;
 	u64 left_gen = 0;
 	u64 right_gen = 0;
+	u64 left_type, right_type;
 
 	sctx->cur_ino = key->objectid;
 	sctx->cur_inode_recreated = 0;
@@ -6299,12 +6300,17 @@ static int changed_inode(struct send_ctx *sctx,
 		right_gen = btrfs_inode_generation(sctx->right_path->nodes[0],
 				right_ii);
 
+		left_type = S_IFMT & btrfs_inode_mode(
+				sctx->left_path->nodes[0], left_ii);
+		right_type = S_IFMT & btrfs_inode_mode(
+				sctx->right_path->nodes[0], right_ii);
+
 		/*
 		 * The cur_ino = root dir case is special here. We can't treat
 		 * the inode as deleted+reused because it would generate a
 		 * stream that tries to delete/mkdir the root dir.
 		 */
-		if (left_gen != right_gen &&
+		if ((left_gen != right_gen || left_type != right_type) &&
 		    sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
 			sctx->cur_inode_recreated = 1;
 	}
@@ -6359,10 +6365,10 @@ static int changed_inode(struct send_ctx *sctx,
 	} else if (result == BTRFS_COMPARE_TREE_CHANGED) {
 		/*
 		 * We need to do some special handling in case the inode was
-		 * reported as changed with a changed generation number. This
-		 * means that the original inode was deleted and new inode
-		 * reused the same inum. So we have to treat the old inode as
-		 * deleted and the new one as new.
+		 * reported as changed with a changed generation number or
+		 * changed inode type. This means that the original inode was
+		 * deleted and new inode reused the same inum. So we have to
+		 * treat the old inode as deleted and the new one as new.
 		 */
 		if (sctx->cur_inode_recreated) {
 			/*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev but same gen
  2021-01-25 19:42 [PATCH v2 0/4] btrfs: send: correctly recreate changed inodes Roman Anasal
  2021-01-25 19:42 ` [PATCH v2 1/4] btrfs: send: rename send_ctx.cur_inode_new_gen to cur_inode_recreated Roman Anasal
  2021-01-25 19:42 ` [PATCH v2 2/4] btrfs: send: fix invalid commands for inodes with changed type but same gen Roman Anasal
@ 2021-01-25 19:42 ` Roman Anasal
  2021-01-25 20:51   ` Filipe Manana
  2021-01-25 19:42 ` [PATCH v2 4/4] btrfs: send: fix invalid commands for inodes in disconnected roots Roman Anasal
  3 siblings, 1 reply; 11+ messages in thread
From: Roman Anasal @ 2021-01-25 19:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Roman Anasal

This is analogous to the preceding patch ("btrfs: send: fix invalid
commands for inodes with changed type but same gen") but for changed
rdev:

When doing an incremental send, if a new inode has the same number as an
inode in the parent subvolume, was created with the same generation but
has differing rdev it will not be detected as changed and thus not
recreated. This will lead to incorrect results on the receiver where the
inode will keep the rdev of the inode in the parent subvolume or even
fail when also the ref is unchanged.

This case does not happen when doing incremental sends with snapshots
that are kept read-only by the user all the time, but it may happen if
- a snapshot was modified in the same transaction as its parent after it
  was created
- the subvol used as parent was created independently from the sent subvol

Example reproducers:

  # case 1: same ino at same path
  btrfs subvolume create subvol1
  btrfs subvolume create subvol2
  mknod subvol1/a c 1 3
  mknod subvol2/a c 1 5
  btrfs property set subvol1 ro true
  btrfs property set subvol2 ro true
  btrfs send -p subvol1 subvol2 | btrfs receive --dump

The produced tree state here is:
  |-- subvol1
  |   `-- a         (ino 257, c 1,3)
  |
  `-- subvol2
      `-- a         (ino 257, c 1,5)

Where subvol1/a and subvol2/a are character devices with differing minor
numbers but same inode number and same generation.

Example output of the receive command:
  At subvol subvol2
  snapshot        ./subvol2                       uuid=7513941c-4ef7-f847-b05e-4fdfe003af7b transid=9 parent_uuid=b66f015b-c226-2548-9e39-048c7fdbec99 parent_transid=9
  utimes          ./subvol2/                      atime=2021-01-25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-25T17:14:36+0000
  link            ./subvol2/a                     dest=a
  unlink          ./subvol2/a
  utimes          ./subvol2/                      atime=2021-01-25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-25T17:14:36+0000
  utimes          ./subvol2/a                     atime=2021-01-25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-25T17:14:36+0000

=> the `link` command causes the receiver to fail with:
   ERROR: link a -> a failed: File exists

Second example:
  # case 2: same ino at different path
  btrfs subvolume create subvol1
  btrfs subvolume create subvol2
  mknod subvol1/a c 1 3
  mknod subvol2/b c 1 5
  btrfs property set subvol1 ro true
  btrfs property set subvol2 ro true
  btrfs send -p subvol1 subvol2 | btrfs receive --dump

The produced tree state here is:
  |-- subvol1
  |   `-- a         (ino 257, c 1,3)
  |
  `-- subvol2
      `-- b         (ino 257, c 1,5)

Where subvol1/a and subvol2/b are character devices with differing minor
numbers but same inode number and same generation.

Example output of the receive command:
  At subvol subvol2
  snapshot        ./subvol2                       uuid=1c175819-8b97-0046-a20e-5f95e37cbd40 transid=13 parent_uuid=bad4a908-21b4-6f40-9a08-6b0768346725 parent_transid=13
  utimes          ./subvol2/                      atime=2021-01-25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-25T17:18:46+0000
  link            ./subvol2/b                     dest=a
  unlink          ./subvol2/a
  utimes          ./subvol2/                      atime=2021-01-25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-25T17:18:46+0000
  utimes          ./subvol2/b                     atime=2021-01-25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-25T17:18:46+0000

=> subvol1/a is renamed to subvol2/b instead of recreated to updated
   rdev which results in received subvol2/b having the wrong minor
   number:

  257 crw-r--r--. 1 root root 1, 3 Jan 25 17:18 subvol2/b

Signed-off-by: Roman Anasal <roman.anasal@bdsu.de>
---
v2:
  - add this patch to also handle changed rdev
---
 fs/btrfs/send.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index c8b1f441f..ef544525f 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6263,6 +6263,7 @@ static int changed_inode(struct send_ctx *sctx,
 	struct btrfs_inode_item *right_ii = NULL;
 	u64 left_gen = 0;
 	u64 right_gen = 0;
+	u64 left_rdev, right_rdev;
 	u64 left_type, right_type;
 
 	sctx->cur_ino = key->objectid;
@@ -6285,6 +6286,8 @@ static int changed_inode(struct send_ctx *sctx,
 				struct btrfs_inode_item);
 		left_gen = btrfs_inode_generation(sctx->left_path->nodes[0],
 				left_ii);
+		left_rdev = btrfs_inode_rdev(sctx->left_path->nodes[0],
+				left_ii);
 	} else {
 		right_ii = btrfs_item_ptr(sctx->right_path->nodes[0],
 				sctx->right_path->slots[0],
@@ -6300,6 +6303,9 @@ static int changed_inode(struct send_ctx *sctx,
 		right_gen = btrfs_inode_generation(sctx->right_path->nodes[0],
 				right_ii);
 
+		right_rdev = btrfs_inode_rdev(sctx->right_path->nodes[0],
+				right_ii);
+
 		left_type = S_IFMT & btrfs_inode_mode(
 				sctx->left_path->nodes[0], left_ii);
 		right_type = S_IFMT & btrfs_inode_mode(
@@ -6310,7 +6316,8 @@ static int changed_inode(struct send_ctx *sctx,
 		 * the inode as deleted+reused because it would generate a
 		 * stream that tries to delete/mkdir the root dir.
 		 */
-		if ((left_gen != right_gen || left_type != right_type) &&
+		if ((left_gen != right_gen || left_type != right_type ||
+		    left_rdev != right_rdev) &&
 		    sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
 			sctx->cur_inode_recreated = 1;
 	}
@@ -6350,8 +6357,7 @@ static int changed_inode(struct send_ctx *sctx,
 				sctx->left_path->nodes[0], left_ii);
 		sctx->cur_inode_mode = btrfs_inode_mode(
 				sctx->left_path->nodes[0], left_ii);
-		sctx->cur_inode_rdev = btrfs_inode_rdev(
-				sctx->left_path->nodes[0], left_ii);
+		sctx->cur_inode_rdev = left_rdev;
 		if (sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
 			ret = send_create_inode_if_needed(sctx);
 	} else if (result == BTRFS_COMPARE_TREE_DELETED) {
@@ -6396,8 +6402,7 @@ static int changed_inode(struct send_ctx *sctx,
 					sctx->left_path->nodes[0], left_ii);
 			sctx->cur_inode_mode = btrfs_inode_mode(
 					sctx->left_path->nodes[0], left_ii);
-			sctx->cur_inode_rdev = btrfs_inode_rdev(
-					sctx->left_path->nodes[0], left_ii);
+			sctx->cur_inode_rdev = left_rdev;
 			ret = send_create_inode_if_needed(sctx);
 			if (ret < 0)
 				goto out;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 4/4] btrfs: send: fix invalid commands for inodes in disconnected roots
  2021-01-25 19:42 [PATCH v2 0/4] btrfs: send: correctly recreate changed inodes Roman Anasal
                   ` (2 preceding siblings ...)
  2021-01-25 19:42 ` [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev " Roman Anasal
@ 2021-01-25 19:42 ` Roman Anasal
  3 siblings, 0 replies; 11+ messages in thread
From: Roman Anasal @ 2021-01-25 19:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Roman Anasal

When doing an inceremental send using an independently created subvolume
as the send parent compare_refs will falsely report changed refs for
inodes with the same inode number, same generation and same ref name.

This is due to dir_changed returning true if the generations of the
parent directory differ in the subvolume and the send parent. Normally
this situation would cause the parent directory to be recreated and the
contained inodes would need to be moved (e.g. by link/unlink).

In the case of the root directory (ino BTRFS_FIRST_FREE_OBJECTID) though
changed_inode will not try to recreated it since this would produce a
stream trying to remove and recreate the subvolume itself.
For independently created subvolumes the generation will always be
different since create_subvol() commits the transaction after createing
a new subvolume.
By handling this case in dir_changed as well the produced commands are
correct again.

Summarizing the conditions for this:
- inode has same number, type/rdev, ref and generation in subvolume and
  send parent
- ref is a child of the subvolume root directory
- root directory has the same inode number - which is always
  BTRFS_FIRST_FREE_OBJECTID
- root directory has different generation in subvolume and send parent
  which is always true for independent subvolumes, i.e. if they're not
  snapshots (of snapshots) of one another

Example reproducer:
  btrfs subvolume create subvol1
  btrfs subvolume create subvol2
  touch subvol1/a subvol2/a
  btrfs property set subvol1 ro true
  btrfs property set subvol2 ro true
  btrfs send -p subvol1 subvol2 | btrfs receive --dump

The produced tree state here is (starting at gen 6):
  |-- subvol1       (ino 256, gen 6)
  |   `-- a         (ino 257, gen 8)
  |
  `-- subvol2       (ino 256, gen 7)
      `-- a         (ino 257, gen 8)

subvol1/a and subvol2/a are files with the same inode number, generation
and path name. The subvolume root directories have the same inode number
but different generations.

Example output of the receive command:
  At subvol subvol2
  snapshot        ./subvol2                       uuid=e783948d-4fd5-0d47-9777-43036e468170 transid=8 parent_uuid=7b0cefdb-738d-e342-a903-501df1877b01 parent_transid=8
  utimes          ./subvol2/                      atime=2021-01-25T18:07:42+0000 mtime=2021-01-25T18:07:42+0000 ctime=2021-01-25T18:07:42+0000
  link            ./subvol2/a                     dest=a
  unlink          ./subvol2/a
  utimes          ./subvol2/                      atime=2021-01-25T18:07:42+0000 mtime=2021-01-25T18:07:42+0000 ctime=2021-01-25T18:07:42+0000
  utimes          ./subvol2/a                     atime=2021-01-25T18:07:42+0000 mtime=2021-01-25T18:07:42+0000 ctime=2021-01-25T18:07:42+0000

=> the `link` command causes the receiver to fail with:
   ERROR: link a -> a failed: File exists

Signed-off-by: Roman Anasal <roman.anasal@bdsu.de>
---
v2:
  - add this patch based on feedback in
    https://lore.kernel.org/linux-btrfs/9e177865-0408-c321-951e-ce0f3ff33389@gmail.com/
---
 fs/btrfs/send.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index ef544525f..3114770be 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6543,6 +6543,15 @@ static int dir_changed(struct send_ctx *sctx, u64 dir)
 	u64 orig_gen, new_gen;
 	int ret;
 
+	/*
+	 * Treat the root dir case special here: changed_inode will never
+	 * produce a stream that tries to delete/rmdir/rename the root dir.
+	 * So we must assume the root always as unchanged here to not produce
+	 * incorrect link/rename commands for contained refs.
+	 */
+	if (dir == BTRFS_FIRST_FREE_OBJECTID)
+		return 0;
+
 	ret = get_inode_info(sctx->send_root, dir, NULL, &new_gen, NULL, NULL,
 			     NULL, NULL);
 	if (ret)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev but same gen
  2021-01-25 19:42 ` [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev " Roman Anasal
@ 2021-01-25 20:51   ` Filipe Manana
  2021-01-26 19:19     ` Roman Anasal | BDSU
  2021-01-31 15:52     ` Roman Anasal | BDSU
  0 siblings, 2 replies; 11+ messages in thread
From: Filipe Manana @ 2021-01-25 20:51 UTC (permalink / raw)
  To: Roman Anasal; +Cc: linux-btrfs

On Mon, Jan 25, 2021 at 7:51 PM Roman Anasal <roman.anasal@bdsu.de> wrote:
>
> This is analogous to the preceding patch ("btrfs: send: fix invalid
> commands for inodes with changed type but same gen") but for changed
> rdev:
>
> When doing an incremental send, if a new inode has the same number as an
> inode in the parent subvolume, was created with the same generation but
> has differing rdev it will not be detected as changed and thus not
> recreated. This will lead to incorrect results on the receiver where the
> inode will keep the rdev of the inode in the parent subvolume or even
> fail when also the ref is unchanged.
>
> This case does not happen when doing incremental sends with snapshots
> that are kept read-only by the user all the time, but it may happen if
> - a snapshot was modified in the same transaction as its parent after it
>   was created
> - the subvol used as parent was created independently from the sent subvol
>
> Example reproducers:
>
>   # case 1: same ino at same path
>   btrfs subvolume create subvol1
>   btrfs subvolume create subvol2
>   mknod subvol1/a c 1 3
>   mknod subvol2/a c 1 5
>   btrfs property set subvol1 ro true
>   btrfs property set subvol2 ro true
>   btrfs send -p subvol1 subvol2 | btrfs receive --dump
>
> The produced tree state here is:
>   |-- subvol1
>   |   `-- a         (ino 257, c 1,3)
>   |
>   `-- subvol2
>       `-- a         (ino 257, c 1,5)
>
> Where subvol1/a and subvol2/a are character devices with differing minor
> numbers but same inode number and same generation.
>
> Example output of the receive command:
>   At subvol subvol2
>   snapshot        ./subvol2                       uuid=7513941c-4ef7-f847-b05e-4fdfe003af7b transid=9 parent_uuid=b66f015b-c226-2548-9e39-048c7fdbec99 parent_transid=9
>   utimes          ./subvol2/                      atime=2021-01-25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-25T17:14:36+0000
>   link            ./subvol2/a                     dest=a
>   unlink          ./subvol2/a
>   utimes          ./subvol2/                      atime=2021-01-25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-25T17:14:36+0000
>   utimes          ./subvol2/a                     atime=2021-01-25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-25T17:14:36+0000
>
> => the `link` command causes the receiver to fail with:
>    ERROR: link a -> a failed: File exists
>
> Second example:
>   # case 2: same ino at different path
>   btrfs subvolume create subvol1
>   btrfs subvolume create subvol2
>   mknod subvol1/a c 1 3
>   mknod subvol2/b c 1 5
>   btrfs property set subvol1 ro true
>   btrfs property set subvol2 ro true
>   btrfs send -p subvol1 subvol2 | btrfs receive --dump

As I've told you before for the v1 patchset from a week or two ago,
this is not a supported scenario for incremental sends.
Incremental sends are meant to be used on RO snapshots of the same
subvolume, and those snapshots must never be changed after they were
created.

Incremental sends were simply not designed for these cases, and can
never be guaranteed to work with such cases.

The bug is not having incremental sends fail right away, with an
explicit error message, when the send and parent roots aren't RO
snapshots of the same subvolume.

>
> The produced tree state here is:
>   |-- subvol1
>   |   `-- a         (ino 257, c 1,3)
>   |
>   `-- subvol2
>       `-- b         (ino 257, c 1,5)
>
> Where subvol1/a and subvol2/b are character devices with differing minor
> numbers but same inode number and same generation.
>
> Example output of the receive command:
>   At subvol subvol2
>   snapshot        ./subvol2                       uuid=1c175819-8b97-0046-a20e-5f95e37cbd40 transid=13 parent_uuid=bad4a908-21b4-6f40-9a08-6b0768346725 parent_transid=13
>   utimes          ./subvol2/                      atime=2021-01-25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-25T17:18:46+0000
>   link            ./subvol2/b                     dest=a
>   unlink          ./subvol2/a
>   utimes          ./subvol2/                      atime=2021-01-25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-25T17:18:46+0000
>   utimes          ./subvol2/b                     atime=2021-01-25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-25T17:18:46+0000
>
> => subvol1/a is renamed to subvol2/b instead of recreated to updated
>    rdev which results in received subvol2/b having the wrong minor
>    number:
>
>   257 crw-r--r--. 1 root root 1, 3 Jan 25 17:18 subvol2/b
>
> Signed-off-by: Roman Anasal <roman.anasal@bdsu.de>
> ---
> v2:
>   - add this patch to also handle changed rdev
> ---
>  fs/btrfs/send.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index c8b1f441f..ef544525f 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -6263,6 +6263,7 @@ static int changed_inode(struct send_ctx *sctx,
>         struct btrfs_inode_item *right_ii = NULL;
>         u64 left_gen = 0;
>         u64 right_gen = 0;
> +       u64 left_rdev, right_rdev;
>         u64 left_type, right_type;
>
>         sctx->cur_ino = key->objectid;
> @@ -6285,6 +6286,8 @@ static int changed_inode(struct send_ctx *sctx,
>                                 struct btrfs_inode_item);
>                 left_gen = btrfs_inode_generation(sctx->left_path->nodes[0],
>                                 left_ii);
> +               left_rdev = btrfs_inode_rdev(sctx->left_path->nodes[0],
> +                               left_ii);
>         } else {
>                 right_ii = btrfs_item_ptr(sctx->right_path->nodes[0],
>                                 sctx->right_path->slots[0],
> @@ -6300,6 +6303,9 @@ static int changed_inode(struct send_ctx *sctx,
>                 right_gen = btrfs_inode_generation(sctx->right_path->nodes[0],
>                                 right_ii);
>
> +               right_rdev = btrfs_inode_rdev(sctx->right_path->nodes[0],
> +                               right_ii);
> +
>                 left_type = S_IFMT & btrfs_inode_mode(
>                                 sctx->left_path->nodes[0], left_ii);
>                 right_type = S_IFMT & btrfs_inode_mode(
> @@ -6310,7 +6316,8 @@ static int changed_inode(struct send_ctx *sctx,
>                  * the inode as deleted+reused because it would generate a
>                  * stream that tries to delete/mkdir the root dir.
>                  */
> -               if ((left_gen != right_gen || left_type != right_type) &&
> +               if ((left_gen != right_gen || left_type != right_type ||
> +                   left_rdev != right_rdev) &&
>                     sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
>                         sctx->cur_inode_recreated = 1;
>         }
> @@ -6350,8 +6357,7 @@ static int changed_inode(struct send_ctx *sctx,
>                                 sctx->left_path->nodes[0], left_ii);
>                 sctx->cur_inode_mode = btrfs_inode_mode(
>                                 sctx->left_path->nodes[0], left_ii);
> -               sctx->cur_inode_rdev = btrfs_inode_rdev(
> -                               sctx->left_path->nodes[0], left_ii);
> +               sctx->cur_inode_rdev = left_rdev;
>                 if (sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
>                         ret = send_create_inode_if_needed(sctx);
>         } else if (result == BTRFS_COMPARE_TREE_DELETED) {
> @@ -6396,8 +6402,7 @@ static int changed_inode(struct send_ctx *sctx,
>                                         sctx->left_path->nodes[0], left_ii);
>                         sctx->cur_inode_mode = btrfs_inode_mode(
>                                         sctx->left_path->nodes[0], left_ii);
> -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> -                                       sctx->left_path->nodes[0], left_ii);
> +                       sctx->cur_inode_rdev = left_rdev;
>                         ret = send_create_inode_if_needed(sctx);
>                         if (ret < 0)
>                                 goto out;
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev but same gen
  2021-01-25 20:51   ` Filipe Manana
@ 2021-01-26 19:19     ` Roman Anasal | BDSU
  2021-01-27 10:53       ` Filipe Manana
  2021-01-31 15:52     ` Roman Anasal | BDSU
  1 sibling, 1 reply; 11+ messages in thread
From: Roman Anasal | BDSU @ 2021-01-26 19:19 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

Am Montag, den 25.01.2021, 20:51 +0000 schrieb Filipe Manana:
> On Mon, Jan 25, 2021 at 7:51 PM Roman Anasal <roman.anasal@bdsu.de>
> wrote:
> > This is analogous to the preceding patch ("btrfs: send: fix invalid
> > commands for inodes with changed type but same gen") but for
> > changed
> > rdev:
> > 
> > When doing an incremental send, if a new inode has the same number
> > as an
> > inode in the parent subvolume, was created with the same generation
> > but
> > has differing rdev it will not be detected as changed and thus not
> > recreated. This will lead to incorrect results on the receiver
> > where the
> > inode will keep the rdev of the inode in the parent subvolume or
> > even
> > fail when also the ref is unchanged.
> > 
> > This case does not happen when doing incremental sends with
> > snapshots
> > that are kept read-only by the user all the time, but it may happen
> > if
> > - a snapshot was modified in the same transaction as its parent
> > after it
> >   was created
> > - the subvol used as parent was created independently from the sent
> > subvol
> > 
> > Example reproducers:
> > 
> >   # case 1: same ino at same path
> >   btrfs subvolume create subvol1
> >   btrfs subvolume create subvol2
> >   mknod subvol1/a c 1 3
> >   mknod subvol2/a c 1 5
> >   btrfs property set subvol1 ro true
> >   btrfs property set subvol2 ro true
> >   btrfs send -p subvol1 subvol2 | btrfs receive --dump
> > 
> > The produced tree state here is:
> >   |-- subvol1
> >   |   `-- a         (ino 257, c 1,3)
> >   |
> >   `-- subvol2
> >       `-- a         (ino 257, c 1,5)
> > 
> > Where subvol1/a and subvol2/a are character devices with differing
> > minor
> > numbers but same inode number and same generation.
> > 
> > Example output of the receive command:
> >   At subvol subvol2
> >   snapshot        ./subvol2                       uuid=7513941c-
> > 4ef7-f847-b05e-4fdfe003af7b transid=9 parent_uuid=b66f015b-c226-
> > 2548-9e39-048c7fdbec99 parent_transid=9
> >   utimes          ./subvol2/                      atime=2021-01-
> > 25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-
> > 25T17:14:36+0000
> >   link            ./subvol2/a                     dest=a
> >   unlink          ./subvol2/a
> >   utimes          ./subvol2/                      atime=2021-01-
> > 25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-
> > 25T17:14:36+0000
> >   utimes          ./subvol2/a                     atime=2021-01-
> > 25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-
> > 25T17:14:36+0000
> > 
> > => the `link` command causes the receiver to fail with:
> >    ERROR: link a -> a failed: File exists
> > 
> > Second example:
> >   # case 2: same ino at different path
> >   btrfs subvolume create subvol1
> >   btrfs subvolume create subvol2
> >   mknod subvol1/a c 1 3
> >   mknod subvol2/b c 1 5
> >   btrfs property set subvol1 ro true
> >   btrfs property set subvol2 ro true
> >   btrfs send -p subvol1 subvol2 | btrfs receive --dump
> 
> As I've told you before for the v1 patchset from a week or two ago,
> this is not a supported scenario for incremental sends.
> Incremental sends are meant to be used on RO snapshots of the same
> subvolume, and those snapshots must never be changed after they were
> created.
> 
> Incremental sends were simply not designed for these cases, and can
> never be guaranteed to work with such cases.
> 
> The bug is not having incremental sends fail right away, with an
> explicit error message, when the send and parent roots aren't RO
> snapshots of the same subvolume.

I am sorry, I didn't want to anger you or to appear to be just stubborn
by posting this.

As I wrote in the cover letter I am aware that this is not a supported
use case and I understand that that makes the patches likely to be
rejected.
As said the reason I _wrote_ the patches was simply to learn more about
the btrfs code and its internals and see if I would be able to
understand it enough. The reason I _submitted_ them was just to
document what I found out so others could have a look into it and just
in case it maybe useful at a later time.

I also don't want to claim that these will add full support for sending
unrelated roots - they don't! They only handle those very specific edge
cases I found, which are currently _possible_, although still not
supported.

I took a deeper look into the rest to see if it could be supported:
the comparing algorithm actually works fine, even with completely
unrelated subvolumes (i.e. btrfs_compare_trees, changed_cb,
changed_inode etc.), but the processing of the changes (i.e.
process_recorded_refs etc.) is heavily based on (ino, gen) as
identifying handle, which can not be changed without the high risk of
regression - just as you said in your earlier comments - since side
effects of any changes are hard to see or understand without a very
deep understanding of the whole code; which is why I didn't even try to
touch that parts.

I apologize if I appeared to be stubborn or ignorant of your feedback!
That really wasn't my intent.


> > The produced tree state here is:
> >   |-- subvol1
> >   |   `-- a         (ino 257, c 1,3)
> >   |
> >   `-- subvol2
> >       `-- b         (ino 257, c 1,5)
> > 
> > Where subvol1/a and subvol2/b are character devices with differing
> > minor
> > numbers but same inode number and same generation.
> > 
> > Example output of the receive command:
> >   At subvol subvol2
> >   snapshot        ./subvol2                       uuid=1c175819-
> > 8b97-0046-a20e-5f95e37cbd40 transid=13 parent_uuid=bad4a908-21b4-
> > 6f40-9a08-6b0768346725 parent_transid=13
> >   utimes          ./subvol2/                      atime=2021-01-
> > 25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-
> > 25T17:18:46+0000
> >   link            ./subvol2/b                     dest=a
> >   unlink          ./subvol2/a
> >   utimes          ./subvol2/                      atime=2021-01-
> > 25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-
> > 25T17:18:46+0000
> >   utimes          ./subvol2/b                     atime=2021-01-
> > 25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-
> > 25T17:18:46+0000
> > 
> > => subvol1/a is renamed to subvol2/b instead of recreated to
> > updated
> >    rdev which results in received subvol2/b having the wrong minor
> >    number:
> > 
> >   257 crw-r--r--. 1 root root 1, 3 Jan 25 17:18 subvol2/b
> > 
> > Signed-off-by: Roman Anasal <roman.anasal@bdsu.de>
> > ---
> > v2:
> >   - add this patch to also handle changed rdev
> > ---
> >  fs/btrfs/send.c | 15 ++++++++++-----
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index c8b1f441f..ef544525f 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -6263,6 +6263,7 @@ static int changed_inode(struct send_ctx
> > *sctx,
> >         struct btrfs_inode_item *right_ii = NULL;
> >         u64 left_gen = 0;
> >         u64 right_gen = 0;
> > +       u64 left_rdev, right_rdev;
> >         u64 left_type, right_type;
> > 
> >         sctx->cur_ino = key->objectid;
> > @@ -6285,6 +6286,8 @@ static int changed_inode(struct send_ctx
> > *sctx,
> >                                 struct btrfs_inode_item);
> >                 left_gen = btrfs_inode_generation(sctx->left_path-
> > >nodes[0],
> >                                 left_ii);
> > +               left_rdev = btrfs_inode_rdev(sctx->left_path-
> > >nodes[0],
> > +                               left_ii);
> >         } else {
> >                 right_ii = btrfs_item_ptr(sctx->right_path-
> > >nodes[0],
> >                                 sctx->right_path->slots[0],
> > @@ -6300,6 +6303,9 @@ static int changed_inode(struct send_ctx
> > *sctx,
> >                 right_gen = btrfs_inode_generation(sctx-
> > >right_path->nodes[0],
> >                                 right_ii);
> > 
> > +               right_rdev = btrfs_inode_rdev(sctx->right_path-
> > >nodes[0],
> > +                               right_ii);
> > +
> >                 left_type = S_IFMT & btrfs_inode_mode(
> >                                 sctx->left_path->nodes[0],
> > left_ii);
> >                 right_type = S_IFMT & btrfs_inode_mode(
> > @@ -6310,7 +6316,8 @@ static int changed_inode(struct send_ctx
> > *sctx,
> >                  * the inode as deleted+reused because it would
> > generate a
> >                  * stream that tries to delete/mkdir the root dir.
> >                  */
> > -               if ((left_gen != right_gen || left_type !=
> > right_type) &&
> > +               if ((left_gen != right_gen || left_type !=
> > right_type ||
> > +                   left_rdev != right_rdev) &&
> >                     sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
> >                         sctx->cur_inode_recreated = 1;
> >         }
> > @@ -6350,8 +6357,7 @@ static int changed_inode(struct send_ctx
> > *sctx,
> >                                 sctx->left_path->nodes[0],
> > left_ii);
> >                 sctx->cur_inode_mode = btrfs_inode_mode(
> >                                 sctx->left_path->nodes[0],
> > left_ii);
> > -               sctx->cur_inode_rdev = btrfs_inode_rdev(
> > -                               sctx->left_path->nodes[0],
> > left_ii);
> > +               sctx->cur_inode_rdev = left_rdev;
> >                 if (sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
> >                         ret = send_create_inode_if_needed(sctx);
> >         } else if (result == BTRFS_COMPARE_TREE_DELETED) {
> > @@ -6396,8 +6402,7 @@ static int changed_inode(struct send_ctx
> > *sctx,
> >                                         sctx->left_path->nodes[0],
> > left_ii);
> >                         sctx->cur_inode_mode = btrfs_inode_mode(
> >                                         sctx->left_path->nodes[0],
> > left_ii);
> > -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> > -                                       sctx->left_path->nodes[0],
> > left_ii);
> > +                       sctx->cur_inode_rdev = left_rdev;
> >                         ret = send_create_inode_if_needed(sctx);
> >                         if (ret < 0)
> >                                 goto out;
> > --
> > 2.26.2
> > 
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev but same gen
  2021-01-26 19:19     ` Roman Anasal | BDSU
@ 2021-01-27 10:53       ` Filipe Manana
  0 siblings, 0 replies; 11+ messages in thread
From: Filipe Manana @ 2021-01-27 10:53 UTC (permalink / raw)
  To: Roman Anasal | BDSU; +Cc: linux-btrfs

On Tue, Jan 26, 2021 at 7:19 PM Roman Anasal | BDSU
<roman.anasal@bdsu.de> wrote:
>
> Am Montag, den 25.01.2021, 20:51 +0000 schrieb Filipe Manana:
> > On Mon, Jan 25, 2021 at 7:51 PM Roman Anasal <roman.anasal@bdsu.de>
> > wrote:
> > > This is analogous to the preceding patch ("btrfs: send: fix invalid
> > > commands for inodes with changed type but same gen") but for
> > > changed
> > > rdev:
> > >
> > > When doing an incremental send, if a new inode has the same number
> > > as an
> > > inode in the parent subvolume, was created with the same generation
> > > but
> > > has differing rdev it will not be detected as changed and thus not
> > > recreated. This will lead to incorrect results on the receiver
> > > where the
> > > inode will keep the rdev of the inode in the parent subvolume or
> > > even
> > > fail when also the ref is unchanged.
> > >
> > > This case does not happen when doing incremental sends with
> > > snapshots
> > > that are kept read-only by the user all the time, but it may happen
> > > if
> > > - a snapshot was modified in the same transaction as its parent
> > > after it
> > >   was created
> > > - the subvol used as parent was created independently from the sent
> > > subvol
> > >
> > > Example reproducers:
> > >
> > >   # case 1: same ino at same path
> > >   btrfs subvolume create subvol1
> > >   btrfs subvolume create subvol2
> > >   mknod subvol1/a c 1 3
> > >   mknod subvol2/a c 1 5
> > >   btrfs property set subvol1 ro true
> > >   btrfs property set subvol2 ro true
> > >   btrfs send -p subvol1 subvol2 | btrfs receive --dump
> > >
> > > The produced tree state here is:
> > >   |-- subvol1
> > >   |   `-- a         (ino 257, c 1,3)
> > >   |
> > >   `-- subvol2
> > >       `-- a         (ino 257, c 1,5)
> > >
> > > Where subvol1/a and subvol2/a are character devices with differing
> > > minor
> > > numbers but same inode number and same generation.
> > >
> > > Example output of the receive command:
> > >   At subvol subvol2
> > >   snapshot        ./subvol2                       uuid=7513941c-
> > > 4ef7-f847-b05e-4fdfe003af7b transid=9 parent_uuid=b66f015b-c226-
> > > 2548-9e39-048c7fdbec99 parent_transid=9
> > >   utimes          ./subvol2/                      atime=2021-01-
> > > 25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-
> > > 25T17:14:36+0000
> > >   link            ./subvol2/a                     dest=a
> > >   unlink          ./subvol2/a
> > >   utimes          ./subvol2/                      atime=2021-01-
> > > 25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-
> > > 25T17:14:36+0000
> > >   utimes          ./subvol2/a                     atime=2021-01-
> > > 25T17:14:36+0000 mtime=2021-01-25T17:14:36+0000 ctime=2021-01-
> > > 25T17:14:36+0000
> > >
> > > => the `link` command causes the receiver to fail with:
> > >    ERROR: link a -> a failed: File exists
> > >
> > > Second example:
> > >   # case 2: same ino at different path
> > >   btrfs subvolume create subvol1
> > >   btrfs subvolume create subvol2
> > >   mknod subvol1/a c 1 3
> > >   mknod subvol2/b c 1 5
> > >   btrfs property set subvol1 ro true
> > >   btrfs property set subvol2 ro true
> > >   btrfs send -p subvol1 subvol2 | btrfs receive --dump
> >
> > As I've told you before for the v1 patchset from a week or two ago,
> > this is not a supported scenario for incremental sends.
> > Incremental sends are meant to be used on RO snapshots of the same
> > subvolume, and those snapshots must never be changed after they were
> > created.
> >
> > Incremental sends were simply not designed for these cases, and can
> > never be guaranteed to work with such cases.
> >
> > The bug is not having incremental sends fail right away, with an
> > explicit error message, when the send and parent roots aren't RO
> > snapshots of the same subvolume.
>
> I am sorry, I didn't want to anger you or to appear to be just stubborn
> by posting this.
>
> As I wrote in the cover letter I am aware that this is not a supported
> use case and I understand that that makes the patches likely to be
> rejected.

Ok, now I got the cover letter and the remaining v2 patches.
Vger has been having some lag this week, only got the mails during the
last evening.

Thanks.

> As said the reason I _wrote_ the patches was simply to learn more about
> the btrfs code and its internals and see if I would be able to
> understand it enough. The reason I _submitted_ them was just to
> document what I found out so others could have a look into it and just
> in case it maybe useful at a later time.
>
> I also don't want to claim that these will add full support for sending
> unrelated roots - they don't! They only handle those very specific edge
> cases I found, which are currently _possible_, although still not
> supported.
>
> I took a deeper look into the rest to see if it could be supported:
> the comparing algorithm actually works fine, even with completely
> unrelated subvolumes (i.e. btrfs_compare_trees, changed_cb,
> changed_inode etc.), but the processing of the changes (i.e.
> process_recorded_refs etc.) is heavily based on (ino, gen) as
> identifying handle, which can not be changed without the high risk of
> regression - just as you said in your earlier comments - since side
> effects of any changes are hard to see or understand without a very
> deep understanding of the whole code; which is why I didn't even try to
> touch that parts.
>
> I apologize if I appeared to be stubborn or ignorant of your feedback!
> That really wasn't my intent.
>
>
> > > The produced tree state here is:
> > >   |-- subvol1
> > >   |   `-- a         (ino 257, c 1,3)
> > >   |
> > >   `-- subvol2
> > >       `-- b         (ino 257, c 1,5)
> > >
> > > Where subvol1/a and subvol2/b are character devices with differing
> > > minor
> > > numbers but same inode number and same generation.
> > >
> > > Example output of the receive command:
> > >   At subvol subvol2
> > >   snapshot        ./subvol2                       uuid=1c175819-
> > > 8b97-0046-a20e-5f95e37cbd40 transid=13 parent_uuid=bad4a908-21b4-
> > > 6f40-9a08-6b0768346725 parent_transid=13
> > >   utimes          ./subvol2/                      atime=2021-01-
> > > 25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-
> > > 25T17:18:46+0000
> > >   link            ./subvol2/b                     dest=a
> > >   unlink          ./subvol2/a
> > >   utimes          ./subvol2/                      atime=2021-01-
> > > 25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-
> > > 25T17:18:46+0000
> > >   utimes          ./subvol2/b                     atime=2021-01-
> > > 25T17:18:46+0000 mtime=2021-01-25T17:18:46+0000 ctime=2021-01-
> > > 25T17:18:46+0000
> > >
> > > => subvol1/a is renamed to subvol2/b instead of recreated to
> > > updated
> > >    rdev which results in received subvol2/b having the wrong minor
> > >    number:
> > >
> > >   257 crw-r--r--. 1 root root 1, 3 Jan 25 17:18 subvol2/b
> > >
> > > Signed-off-by: Roman Anasal <roman.anasal@bdsu.de>
> > > ---
> > > v2:
> > >   - add this patch to also handle changed rdev
> > > ---
> > >  fs/btrfs/send.c | 15 ++++++++++-----
> > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > > index c8b1f441f..ef544525f 100644
> > > --- a/fs/btrfs/send.c
> > > +++ b/fs/btrfs/send.c
> > > @@ -6263,6 +6263,7 @@ static int changed_inode(struct send_ctx
> > > *sctx,
> > >         struct btrfs_inode_item *right_ii = NULL;
> > >         u64 left_gen = 0;
> > >         u64 right_gen = 0;
> > > +       u64 left_rdev, right_rdev;
> > >         u64 left_type, right_type;
> > >
> > >         sctx->cur_ino = key->objectid;
> > > @@ -6285,6 +6286,8 @@ static int changed_inode(struct send_ctx
> > > *sctx,
> > >                                 struct btrfs_inode_item);
> > >                 left_gen = btrfs_inode_generation(sctx->left_path-
> > > >nodes[0],
> > >                                 left_ii);
> > > +               left_rdev = btrfs_inode_rdev(sctx->left_path-
> > > >nodes[0],
> > > +                               left_ii);
> > >         } else {
> > >                 right_ii = btrfs_item_ptr(sctx->right_path-
> > > >nodes[0],
> > >                                 sctx->right_path->slots[0],
> > > @@ -6300,6 +6303,9 @@ static int changed_inode(struct send_ctx
> > > *sctx,
> > >                 right_gen = btrfs_inode_generation(sctx-
> > > >right_path->nodes[0],
> > >                                 right_ii);
> > >
> > > +               right_rdev = btrfs_inode_rdev(sctx->right_path-
> > > >nodes[0],
> > > +                               right_ii);
> > > +
> > >                 left_type = S_IFMT & btrfs_inode_mode(
> > >                                 sctx->left_path->nodes[0],
> > > left_ii);
> > >                 right_type = S_IFMT & btrfs_inode_mode(
> > > @@ -6310,7 +6316,8 @@ static int changed_inode(struct send_ctx
> > > *sctx,
> > >                  * the inode as deleted+reused because it would
> > > generate a
> > >                  * stream that tries to delete/mkdir the root dir.
> > >                  */
> > > -               if ((left_gen != right_gen || left_type !=
> > > right_type) &&
> > > +               if ((left_gen != right_gen || left_type !=
> > > right_type ||
> > > +                   left_rdev != right_rdev) &&
> > >                     sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
> > >                         sctx->cur_inode_recreated = 1;
> > >         }
> > > @@ -6350,8 +6357,7 @@ static int changed_inode(struct send_ctx
> > > *sctx,
> > >                                 sctx->left_path->nodes[0],
> > > left_ii);
> > >                 sctx->cur_inode_mode = btrfs_inode_mode(
> > >                                 sctx->left_path->nodes[0],
> > > left_ii);
> > > -               sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > -                               sctx->left_path->nodes[0],
> > > left_ii);
> > > +               sctx->cur_inode_rdev = left_rdev;
> > >                 if (sctx->cur_ino != BTRFS_FIRST_FREE_OBJECTID)
> > >                         ret = send_create_inode_if_needed(sctx);
> > >         } else if (result == BTRFS_COMPARE_TREE_DELETED) {
> > > @@ -6396,8 +6402,7 @@ static int changed_inode(struct send_ctx
> > > *sctx,
> > >                                         sctx->left_path->nodes[0],
> > > left_ii);
> > >                         sctx->cur_inode_mode = btrfs_inode_mode(
> > >                                         sctx->left_path->nodes[0],
> > > left_ii);
> > > -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > -                                       sctx->left_path->nodes[0],
> > > left_ii);
> > > +                       sctx->cur_inode_rdev = left_rdev;
> > >                         ret = send_create_inode_if_needed(sctx);
> > >                         if (ret < 0)
> > >                                 goto out;
> > > --
> > > 2.26.2
> > >
> >
> >



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev but same gen
  2021-01-25 20:51   ` Filipe Manana
  2021-01-26 19:19     ` Roman Anasal | BDSU
@ 2021-01-31 15:52     ` Roman Anasal | BDSU
  2021-02-02 11:56       ` Filipe Manana
  1 sibling, 1 reply; 11+ messages in thread
From: Roman Anasal | BDSU @ 2021-01-31 15:52 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Mon, Jan 25, 2021 at 20:51 +0000 Filipe Manana wrote:
> On Mon, Jan 25, 2021 at 7:51 PM Roman Anasal <roman.anasal@bdsu.de>
> wrote:
> > Second example:
> >   # case 2: same ino at different path
> >   btrfs subvolume create subvol1
> >   btrfs subvolume create subvol2
> >   mknod subvol1/a c 1 3
> >   mknod subvol2/b c 1 5
> >   btrfs property set subvol1 ro true
> >   btrfs property set subvol2 ro true
> >   btrfs send -p subvol1 subvol2 | btrfs receive --dump
> 
> As I've told you before for the v1 patchset from a week or two ago,
> this is not a supported scenario for incremental sends.
> Incremental sends are meant to be used on RO snapshots of the same
> subvolume, and those snapshots must never be changed after they were
> created.
> 
> Incremental sends were simply not designed for these cases, and can
> never be guaranteed to work with such cases.
> 
> The bug is not having incremental sends fail right away, with an
> explicit error message, when the send and parent roots aren't RO
> snapshots of the same subvolume.

Since this should be fixed then I'd like to propose to add the
following check:

The inodes of the subvolumes' root directories (ino
BTRFS_FIRST_FREE_OBJECTID = 256) must have the same generation.

Since create_subvol() will always commit the transaction, i.e.
increment the generation, no two _independently_ created subvolumes can
be created within the same generation (are there race conditions
possible here?).
Taking a snapshot of a subvolume does not modify the generation of the
root dir inode. Also it is not possible to change or delete/re-create
the root directory of a subvolume since this would delete the subvolume
itself.


So having two subvolumes with root directories created with different
generations means they were created independently and can not share a
common ancestor. Doing an incremental send with them is unsafe and thus
must return an error.
With the root directories at the same generation though the subvolumes
are based on a common ancestor which is a requirement for a safe
incremental send.

Are my assumptions and my understanding here correct? Then this check
would catch most of the unsafe parents.
If so I could have a shot at a patch for this if you'd like me to?


This check still does not solve the second edge case though, when
snapshots are modified afterwards and diverge independently form one
another. For this I still see no good solution besides a new on-disk
flag whether a snapshot was *ever* set to ro=false. But with that I'm
not sure how to (not) inherit that flag in a safe way ...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev but same gen
  2021-01-31 15:52     ` Roman Anasal | BDSU
@ 2021-02-02 11:56       ` Filipe Manana
  2021-02-03 16:20         ` Roman Anasal | BDSU
  0 siblings, 1 reply; 11+ messages in thread
From: Filipe Manana @ 2021-02-02 11:56 UTC (permalink / raw)
  To: Roman Anasal | BDSU; +Cc: linux-btrfs

On Sun, Jan 31, 2021 at 3:52 PM Roman Anasal | BDSU
<roman.anasal@bdsu.de> wrote:
>
> On Mon, Jan 25, 2021 at 20:51 +0000 Filipe Manana wrote:
> > On Mon, Jan 25, 2021 at 7:51 PM Roman Anasal <roman.anasal@bdsu.de>
> > wrote:
> > > Second example:
> > >   # case 2: same ino at different path
> > >   btrfs subvolume create subvol1
> > >   btrfs subvolume create subvol2
> > >   mknod subvol1/a c 1 3
> > >   mknod subvol2/b c 1 5
> > >   btrfs property set subvol1 ro true
> > >   btrfs property set subvol2 ro true
> > >   btrfs send -p subvol1 subvol2 | btrfs receive --dump
> >
> > As I've told you before for the v1 patchset from a week or two ago,
> > this is not a supported scenario for incremental sends.
> > Incremental sends are meant to be used on RO snapshots of the same
> > subvolume, and those snapshots must never be changed after they were
> > created.
> >
> > Incremental sends were simply not designed for these cases, and can
> > never be guaranteed to work with such cases.
> >
> > The bug is not having incremental sends fail right away, with an
> > explicit error message, when the send and parent roots aren't RO
> > snapshots of the same subvolume.
>
> Since this should be fixed then I'd like to propose to add the
> following check:
>
> The inodes of the subvolumes' root directories (ino
> BTRFS_FIRST_FREE_OBJECTID = 256) must have the same generation.
>
> Since create_subvol() will always commit the transaction, i.e.
> increment the generation, no two _independently_ created subvolumes can
> be created within the same generation (are there race conditions
> possible here?).

That is currently true, but it has been discussed and proposed the
ability to skip the transaction commit when creating a subvolume
Boris sent a proposal patch for that a few months ago.

I don't think that should be assumed. Avoiding the transaction commit,
either by default or optionally, is something that makes sense.
Plus for a case like snapshots, we can actually batch the creation of
several ones in a single transaction.

> Taking a snapshot of a subvolume does not modify the generation of the
> root dir inode. Also it is not possible to change or delete/re-create
> the root directory of a subvolume since this would delete the subvolume
> itself.
>
>
> So having two subvolumes with root directories created with different
> generations means they were created independently and can not share a
> common ancestor. Doing an incremental send with them is unsafe and thus
> must return an error.
> With the root directories at the same generation though the subvolumes
> are based on a common ancestor which is a requirement for a safe
> incremental send.
>
> Are my assumptions and my understanding here correct? Then this check
> would catch most of the unsafe parents.
> If so I could have a shot at a patch for this if you'd like me to?

That is too complex and makes too many assumptions.

To check if two roots are snapshots of the same subvolume (the send
and parent roots), you can simply check if they have non-null uuids in
the "parent_uuid" field of their root items and that they match.

While this is more straightforward to do in the kernel, I would prefer
to have it in btrfs-progs, because:

1) In btrfs-progs we can explicitly print an informative error message
to the user, while in the kernel you can only return an errno value
and log something dmesg/syslog, which is much less user friendly;

2) The check would be on by default but could be skipped with some new
flag - this is just being conservative to avoid breaking any existing
workflows we might not be aware of.
    In particular I'm thinking about people using "btrfs send" with -c
and omitting -p, in which case btrfs-progs selects one of the -c roots
to be used as the parent root,
    but the selected root might not be a snapshot of the same
subvolume as the send root.
    Then maybe one day that option to skip the check would be removed,
after we are more sure no one is using or really needs such workflows.

>
>
> This check still does not solve the second edge case though, when
> snapshots are modified afterwards and diverge independently form one
> another. For this I still see no good solution besides a new on-disk
> flag whether a snapshot was *ever* set to ro=false. But with that I'm
> not sure how to (not) inherit that flag in a safe way ...

I'm afraid there's nothing, codewise, to do about that case.

Setting some flag on the root to make it unusable for send in case it
was ever RW would break send in at least one way:

During a receive we create the root as RW, apply the send stream and
then change the root to RO.
After such change, it would mean we could not send the received
snapshot anymore. There's no way to make sure that only btrfs-receive
can do that, since anyone can use the ioctl.

Perhaps all that needs to be done is to document this well in the man
pages and wiki in case it's not already there.

Thanks.


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev but same gen
  2021-02-02 11:56       ` Filipe Manana
@ 2021-02-03 16:20         ` Roman Anasal | BDSU
  0 siblings, 0 replies; 11+ messages in thread
From: Roman Anasal | BDSU @ 2021-02-03 16:20 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Tue, 2021-02-02 at 11:56 +0000, Filipe Manana wrote:
> On Sun, Jan 31, 2021 at 3:52 PM Roman Anasal | BDSU
> <roman.anasal@bdsu.de> wrote:
> > On Mon, Jan 25, 2021 at 20:51 +0000 Filipe Manana wrote:
> > > On Mon, Jan 25, 2021 at 7:51 PM Roman Anasal <
> > > roman.anasal@bdsu.de>
> > > wrote:
> > > > Second example:
> > > >   # case 2: same ino at different path
> > > >   btrfs subvolume create subvol1
> > > >   btrfs subvolume create subvol2
> > > >   mknod subvol1/a c 1 3
> > > >   mknod subvol2/b c 1 5
> > > >   btrfs property set subvol1 ro true
> > > >   btrfs property set subvol2 ro true
> > > >   btrfs send -p subvol1 subvol2 | btrfs receive --dump
> > > 
> > > As I've told you before for the v1 patchset from a week or two
> > > ago,
> > > this is not a supported scenario for incremental sends.
> > > Incremental sends are meant to be used on RO snapshots of the
> > > same
> > > subvolume, and those snapshots must never be changed after they
> > > were
> > > created.
> > > 
> > > Incremental sends were simply not designed for these cases, and
> > > can
> > > never be guaranteed to work with such cases.
> > > 
> > > The bug is not having incremental sends fail right away, with an
> > > explicit error message, when the send and parent roots aren't RO
> > > snapshots of the same subvolume.
> > 
> > Since this should be fixed then I'd like to propose to add the
> > following check:
> > 
> > The inodes of the subvolumes' root directories (ino
> > BTRFS_FIRST_FREE_OBJECTID = 256) must have the same generation.
> > 
> > Since create_subvol() will always commit the transaction, i.e.
> > increment the generation, no two _independently_ created subvolumes
> > can
> > be created within the same generation (are there race conditions
> > possible here?).
> 
> That is currently true, but it has been discussed and proposed the
> ability to skip the transaction commit when creating a subvolume
> Boris sent a proposal patch for that a few months ago.

Ah, okay then, if this may change in the future then this idea isn't
safe and should be dismissed.


> I don't think that should be assumed. Avoiding the transaction
> commit,
> either by default or optionally, is something that makes sense.
> Plus for a case like snapshots, we can actually batch the creation of
> several ones in a single transaction.
> 
> > Taking a snapshot of a subvolume does not modify the generation of
> > the
> > root dir inode. Also it is not possible to change or delete/re-
> > create
> > the root directory of a subvolume since this would delete the
> > subvolume
> > itself.
> > 
> > 
> > So having two subvolumes with root directories created with
> > different
> > generations means they were created independently and can not share
> > a
> > common ancestor. Doing an incremental send with them is unsafe and
> > thus
> > must return an error.
> > With the root directories at the same generation though the
> > subvolumes
> > are based on a common ancestor which is a requirement for a safe
> > incremental send.
> > 
> > Are my assumptions and my understanding here correct? Then this
> > check
> > would catch most of the unsafe parents.
> > If so I could have a shot at a patch for this if you'd like me to?
> 
> That is too complex and makes too many assumptions.
> 
> To check if two roots are snapshots of the same subvolume (the send
> and parent roots), you can simply check if they have non-null uuids
> in
> the "parent_uuid" field of their root items and that they match.

I thought of this, too, but see it break in some scenarios I'd expect
it to work, mostly with "chains" of snapshots as they happen on a
receiving side.

Consider this scenario:

   btrfs subvolume create /subvol/
   # modify /subvol
   btrfs subvolume snapshot -r /subvol/ /snapshots/snap1
   # modify /subvol
   btrfs subvolume snapshot -r /subvol/ /snapshots/snap2
   # modify /subvol
btrfs subvolume snapshot -r /subvol/ /snapshots/snap3

I.e. have a single RW subvolume and taking incremental snapshots of it.

   cd /snapshots/
   btrfs send snap1 | btrfs receive /mnt/backups/
btrfs send -p snap1 snap2 | btrfs receive /mnt/backups/   btrfs send -p snap2 snap3 | btrfs receive /mnt/backups/

I.e. incrementally send the snapshots to another btrfs volume.

   cd /mnt/backups
   btrfs subvolume delete snap2
   btrfs send snap1 | btrfs receive /mnt/backups2/
   btrfs send -p snap1 snap3 | btrfs receive /mnt/backups2/

I.e. delete the intermediate snapshot snap2 and incrementally send
snap1 and snap3 from the receiving filesystem to yet another btrfs
filesystem.

The last command would fail since snap3 was based on snap2 which was
based on snap1; so neither is snap1 the (direct) parent of snap3 nor do
they share a common (direct) parent nor would it be possible to
reconstruct their relation by walking the chain since snap2 does no
longer exist.

While on the orignal filesystem all snapshots have the same parent on
the reciving volume it is a chain:

orignal volume:

        subvolume
        ^   ^   ^
       /    |    \
   snap1  snap2  snap3

receiving volume:

   snap1 <- snap2 <- snap3


So for this to work it would probably require another attribute
"original subvol UUID" for the root of the ancestry tree...


> While this is more straightforward to do in the kernel, I would
> prefer
> to have it in btrfs-progs, because:
> 
> 1) In btrfs-progs we can explicitly print an informative error
> message
> to the user, while in the kernel you can only return an errno value
> and log something dmesg/syslog, which is much less user friendly;

I was thinking about implementing it in the kernel as an (additional)
check to block unsafe sends regardless of the user space tool (are
there any besides btrfs-progs?); but proper handling and an explaining
error message must be imlpemented in btrfs-progs, totally.


> 2) The check would be on by default but could be skipped with some
> new
> flag - this is just being conservative to avoid breaking any existing
> workflows we might not be aware of.
>     In particular I'm thinking about people using "btrfs send" with
> -c
> and omitting -p, in which case btrfs-progs selects one of the -c
> roots
> to be used as the parent root,
>     but the selected root might not be a snapshot of the same
> subvolume as the send root.
>     Then maybe one day that option to skip the check would be
> removed,
> after we are more sure no one is using or really needs such
> workflows.

The way I read find_good_parent() it will only select a clone source as
parent if it is the parent subvolume of the send subvolume [1] or if
both have the same parent [2]?
Which makes sense since selecting an snapshot of an unrelated subvolume
would be unsafe.

[1] https://github.com/kdave/btrfs-
progs/blob/273380d98f4412ae8b0f35ad69debf682e48c6bd/cmds/send.c#L118
[2]
https://github.com/kdave/btrfs-
progs/blob/273380d98f4412ae8b0f35ad69debf682e48c6bd/cmds/send.c#L131

> > 
> > This check still does not solve the second edge case though, when
> > snapshots are modified afterwards and diverge independently form
> > one
> > another. For this I still see no good solution besides a new on-
> > disk
> > flag whether a snapshot was *ever* set to ro=false. But with that
> > I'm
> > not sure how to (not) inherit that flag in a safe way ...
> 
> I'm afraid there's nothing, codewise, to do about that case.
> 
> Setting some flag on the root to make it unusable for send in case it
> was ever RW would break send in at least one way:
> 
> During a receive we create the root as RW, apply the send stream and
> then change the root to RO.
> After such change, it would mean we could not send the received
> snapshot anymore. There's no way to make sure that only btrfs-receive
> can do that, since anyone can use the ioctl.

Another case where allowing to switch to RW before send would be
desirable: make snapshot RW, delete files you don't need anymore, make
RO again, send to backup disk.
Only deleting files/inodes should even be safe now.


> Perhaps all that needs to be done is to document this well in the man
> pages and wiki in case it's not already there.

Yes. Since these are all very unlikely edge cases and reliably detecting them without false positives is hard, just explicitly documenting them is probably the best solution.

> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-02-03 16:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-25 19:42 [PATCH v2 0/4] btrfs: send: correctly recreate changed inodes Roman Anasal
2021-01-25 19:42 ` [PATCH v2 1/4] btrfs: send: rename send_ctx.cur_inode_new_gen to cur_inode_recreated Roman Anasal
2021-01-25 19:42 ` [PATCH v2 2/4] btrfs: send: fix invalid commands for inodes with changed type but same gen Roman Anasal
2021-01-25 19:42 ` [PATCH v2 3/4] btrfs: send: fix invalid commands for inodes with changed rdev " Roman Anasal
2021-01-25 20:51   ` Filipe Manana
2021-01-26 19:19     ` Roman Anasal | BDSU
2021-01-27 10:53       ` Filipe Manana
2021-01-31 15:52     ` Roman Anasal | BDSU
2021-02-02 11:56       ` Filipe Manana
2021-02-03 16:20         ` Roman Anasal | BDSU
2021-01-25 19:42 ` [PATCH v2 4/4] btrfs: send: fix invalid commands for inodes in disconnected roots Roman Anasal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.