linux-btrfs.vger.kernel.org archive mirror
* [PATCH] btrfs: Add introduction to dev-replace.
@ 2020-01-22  7:20 Qu Wenruo
  0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2020-01-22  7:20 UTC
  To: linux-btrfs

The overall design of btrfs dev-replace is not that complex, but digging
into the code directly can waste time, so add an introduction to help
future readers.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
This is just a code-reading note for the dev-replace code.

The real problem I'm chasing is the data corruption bug reported by
Filipe, which can be reproduced by running the btrfs/06[45] and
btrfs/071 test cases in a loop.

The offending commit is b12de52896c0 ("btrfs: scrub: Don't check free
space before marking a block group RO").

The older commit 76a8efa171bf ("btrfs: Continue replace when
set_block_ro failed") also looks suspicious, since it allows dev-replace
to proceed without marking the block group RO at all.

In all observed cases, the data corruption happens in data chunks that
were not marked RO.
So commit b12de52896c0 increases the likelihood of a block group not
being marked RO.

But with the protection provided by write duplication, this shouldn't
happen at all.

Write duplication starts by setting fs_info::dev_replace, then waits for
all existing ordered extents, then commits the transaction.
So every write issued after btrfs_dev_replace_start() should also reach
the target device.
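To make the duplication step concrete, here is a minimal userspace sketch of what handle_ops_on_dev_replace() is described as doing from __btrfs_map_block(): once a replace is active, every stripe of a write that targets the source device gets a twin on the target device. The types and names (toy_stripe, toy_replace, toy_dup_write_stripes()) are hypothetical, not kernel API, and reusing the source stripe's physical offset for the target stripe is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapped stripe; not the kernel's type. */
struct toy_stripe {
	int devid;         /* device the stripe lives on */
	uint64_t physical; /* byte offset on that device */
};

/* Hypothetical replace state; "active" models fs_info::dev_replace being set. */
struct toy_replace {
	int active;
	int src_devid; /* device being replaced */
	int tgt_devid; /* new device */
};

/*
 * For each stripe of a write that lands on the source device, append a
 * duplicate stripe on the target device at the same physical offset
 * (assumed layout). Returns the new stripe count; @cap is the array size.
 */
static size_t toy_dup_write_stripes(const struct toy_replace *rep,
				    struct toy_stripe *stripes, size_t nr,
				    size_t cap)
{
	size_t orig_nr = nr;

	if (!rep->active)
		return nr; /* no replace running: the write is mapped as usual */

	for (size_t i = 0; i < orig_nr; i++) {
		if (stripes[i].devid != rep->src_devid)
			continue;
		assert(nr < cap); /* caller must leave room for duplicates */
		stripes[nr].devid = rep->tgt_devid;
		stripes[nr].physical = stripes[i].physical;
		nr++;
	}
	return nr;
}
```

With a replace from device 1 to device 3 active, a write mapped to { devid 1 @ 4096, devid 2 @ 8192 } gains a third stripe, devid 3 @ 4096; stripes on other devices are left alone.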

Although scrub only iterates through the commit root, once
fs_info::dev_replace is set there is no way for a new write to miss the
target device.
Thus, whether or not the block group is marked RO, it should be safe.
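The argument above can be checked with a toy model. The sketch below (hypothetical names, two small byte arrays standing in for the source and target devices) replays the one interleaving that could go wrong for a single scrub copy: scrub reads a commit-root block, a duplicated write lands, then scrub writes its copy to the target. It assumes ordinary COW writes, which never overwrite a block that scrub is copying from the commit root.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLOCKS 4

/* Hypothetical toy device: one byte of "data" per block. */
struct toy_dev {
	char blk[TOY_BLOCKS];
};

/*
 * Replay the worst-case interleaving for one scrub copy of block 0:
 *   1. scrub reads commit-root block 0 from the source,
 *   2. a new write, duplicated to both devices, lands in @write_block,
 *   3. scrub writes what it read in step 1 to block 0 on the target.
 * Returns 1 if the target matches the source afterwards.
 */
static int toy_replay(int write_block)
{
	struct toy_dev src = { { 'A', 0, 0, 0 } }; /* 'A' is in the commit root */
	struct toy_dev tgt = { { 0 } };            /* fresh target device */
	char scrub_buf;

	assert(write_block >= 0 && write_block < TOY_BLOCKS);

	scrub_buf = src.blk[0];     /* step 1: scrub read */
	src.blk[write_block] = 'B'; /* step 2: original write ... */
	tgt.blk[write_block] = 'B'; /* ... and its duplicate */
	tgt.blk[0] = scrub_buf;     /* step 3: scrub write to target */

	return memcmp(src.blk, tgt.blk, TOY_BLOCKS) == 0;
}
```

A COW write goes to a free block, so toy_replay(1) reports a consistent target. toy_replay(0), an in-place overwrite of the block being copied, does not, so the reasoning holds only as long as no write overwrites a commit-root extent in place.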

Any extra ideas are welcome.
---
 fs/btrfs/dev-replace.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index f639dde2a679..a3d8272d9d80 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -22,6 +22,37 @@
 #include "dev-replace.h"
 #include "sysfs.h"
 
+/*
+ * Introduction for dev-replace.
+ *
+ * [Objective]
+ * To copy all extents (both runtime and on-disk) from the source device
+ * to the target device, while still keeping the fs RW.
+ *
+ * [Method]
+ * There are two main methods involved:
+ * - Write duplication
+ *   All new writes will be written to both the target and source devices,
+ *   so that even if the replace gets canceled, the old device is still valid.
+ *
+ *   Location:		handle_ops_on_dev_replace() from __btrfs_map_block()
+ *   Start timing:	btrfs_dev_replace_start()
+ *   End timing:	btrfs_dev_replace_finishing()
+ *
+ * - Existing extents copy
+ *   This reuses the scrub facility, as scrub also iterates through
+ *   existing extents from the commit root.
+ *
+ *   Location:		scrub_write_block_to_dev_replace() from
+ *   			scrub_block_complete()
+ *
+ * After the replace is done, the finishing part is handled by:
+ * - Swapping the target and source devices
+ *   When the scrub finishes, swap the source device with the target device.
+ *
+ *   Location:		btrfs_dev_replace_update_device_in_mapping_tree() from
+ *   			btrfs_dev_replace_finishing()
+ */
 static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 				       int scrub_ret);
 static void btrfs_dev_replace_update_device_in_mapping_tree(
-- 
2.25.0
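The finishing step in the comment above can be pictured as rewriting the in-memory chunk mappings: every stripe that referenced the source device is pointed at the target device instead. This is a minimal sketch with hypothetical types and names (toy_chunk_map, toy_swap_device()); the real btrfs_dev_replace_update_device_in_mapping_tree() operates on the kernel's own mapping tree and locking, which are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_STRIPES 4

/* Hypothetical chunk mapping: the device id of each stripe of one chunk. */
struct toy_chunk_map {
	int nr_stripes;
	int devid[TOY_MAX_STRIPES];
};

/*
 * Point every stripe that references @src_devid at @tgt_devid instead.
 * Returns how many stripes were updated.
 */
static int toy_swap_device(struct toy_chunk_map *maps, size_t nr_maps,
			   int src_devid, int tgt_devid)
{
	int updated = 0;

	for (size_t i = 0; i < nr_maps; i++) {
		assert(maps[i].nr_stripes <= TOY_MAX_STRIPES);
		for (int s = 0; s < maps[i].nr_stripes; s++) {
			if (maps[i].devid[s] == src_devid) {
				maps[i].devid[s] = tgt_devid;
				updated++;
			}
		}
	}
	return updated;
}
```

Stripes on other devices are untouched, and a second pass finds nothing left to update, which is what one would expect once the source device is no longer referenced.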


* Re: [PATCH] btrfs: Add introduction to dev-replace.
  2020-01-23  7:44 Qu Wenruo
  2020-01-23  8:00 ` Anand Jain
@ 2020-01-30 12:46 ` David Sterba
  1 sibling, 0 replies; 4+ messages in thread
From: David Sterba @ 2020-01-30 12:46 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jan 23, 2020 at 03:44:50PM +0800, Qu Wenruo wrote:
> The overview of btrfs dev-replace is not that complex.
> But digging into the code directly can waste some extra time, so add
> such introduction to help later guys.
> 
> Also, it mentions some corner cases caused by the write duplication and
> scrub based data copy, to inform new comers not to get trapped by that
> pitfall.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Thanks for the docs. I've adjusted some wording and fixed a few typos.
Please try to proofread before sending; reviews should also catch
and point these out.

* Re: [PATCH] btrfs: Add introduction to dev-replace.
  2020-01-23  7:44 Qu Wenruo
@ 2020-01-23  8:00 ` Anand Jain
  2020-01-30 12:46 ` David Sterba
  1 sibling, 0 replies; 4+ messages in thread
From: Anand Jain @ 2020-01-23  8:00 UTC
  To: Qu Wenruo, linux-btrfs

On 1/23/20 3:44 PM, Qu Wenruo wrote:
> The overview of btrfs dev-replace is not that complex.
> But digging into the code directly can waste some extra time, so add
> such introduction to help later guys.
> 
> Also, it mentions some corner cases caused by the write duplication and
> scrub based data copy, to inform new comers not to get trapped by that
> pitfall.
> 

looks good.
Reviewed-by: Anand Jain <anand.jain@oracle.com>

nits below.

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/dev-replace.c | 38 ++++++++++++++++++++++++++++++++++++++
>   1 file changed, 38 insertions(+)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index f639dde2a679..5889c10ed8d2 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -22,6 +22,44 @@
>   #include "dev-replace.h"
>   #include "sysfs.h"
>   
> +/*
> + * Introduction for dev-replace.
> + *
> + * [Objective]
> + * To copy all extents (both runtime and on-disk) from source device
> + * to target device, while still keeps the fs RW.
> + *
> + * [Method]
> + * There are two main methods involved:
> + * - Write duplication
> + *   All newer write will to written to both target and source devices.
                           ^^^^^^^^^

> + *   So that even replace get canceled, old device is still valid.
> + *
> + *   Location:		handle_ops_on_dev_replace() from __btrfs_map_block()

The term "Location" is a bit confusing; would "Functions" do instead?

Thanks, Anand

> + *   Start timing:	btrfs_dev_replace_start()
> + *   End timing:	btrfs_dev_replace_finishing()
> + *   Content:		Latest data/meta
> + *
> + * - Existing extents copy
> + *   This happens by re-using scrub facility, as scrub also iterates through
> + *   exiting extents from commit root.
> + *
> + *   Location:		scrub_write_block_to_dev_replace() from
> + *   			scrub_block_complete()
> + *   Content:		Data/meta from commit root.
> + *
> + * Due to the content difference, we need to avoid nocow write when dev-replace
> + * is happening.
> + * This is done by marking the block group RO and wait for nocow writes.
> + *
> + * After replace is done, the finishing part is done by:
> + * - Swap target and source device
> + *   When the scrub finishes, swap the source device with target device.
> + *
> + *   Location:		btrfs_dev_replace_update_device_in_mapping_tree() from
> + *   			btrfs_dev_replace_finishing()
> + */
> +
>   static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
>   				       int scrub_ret);
>   static void btrfs_dev_replace_update_device_in_mapping_tree(
> 


* [PATCH] btrfs: Add introduction to dev-replace.
@ 2020-01-23  7:44 Qu Wenruo
  2020-01-23  8:00 ` Anand Jain
  2020-01-30 12:46 ` David Sterba
  0 siblings, 2 replies; 4+ messages in thread
From: Qu Wenruo @ 2020-01-23  7:44 UTC
  To: linux-btrfs

The overall design of btrfs dev-replace is not that complex, but digging
into the code directly can waste time, so add an introduction to help
future readers.

It also mentions some corner cases caused by combining write duplication
with the scrub-based data copy, so that newcomers don't fall into the
same pitfalls.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/dev-replace.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index f639dde2a679..5889c10ed8d2 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -22,6 +22,44 @@
 #include "dev-replace.h"
 #include "sysfs.h"
 
+/*
+ * Introduction for dev-replace.
+ *
+ * [Objective]
+ * To copy all extents (both runtime and on-disk) from the source device
+ * to the target device, while still keeping the fs RW.
+ *
+ * [Method]
+ * There are two main methods involved:
+ * - Write duplication
+ *   All new writes will be written to both the target and source devices,
+ *   so that even if the replace gets canceled, the old device is still valid.
+ *
+ *   Location:		handle_ops_on_dev_replace() from __btrfs_map_block()
+ *   Start timing:	btrfs_dev_replace_start()
+ *   End timing:	btrfs_dev_replace_finishing()
+ *   Content:		Latest data/meta
+ *
+ * - Existing extents copy
+ *   This reuses the scrub facility, as scrub also iterates through
+ *   existing extents from the commit root.
+ *
+ *   Location:		scrub_write_block_to_dev_replace() from
+ *   			scrub_block_complete()
+ *   Content:		Data/meta from the commit root.
+ *
+ * Due to the content difference, we need to avoid nocow writes while
+ * dev-replace is running.
+ * This is done by marking the block group RO and waiting for nocow writes.
+ *
+ * After the replace is done, the finishing part is handled by:
+ * - Swapping the target and source devices
+ *   When the scrub finishes, swap the source device with the target device.
+ *
+ *   Location:		btrfs_dev_replace_update_device_in_mapping_tree() from
+ *   			btrfs_dev_replace_finishing()
+ */
+
 static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 				       int scrub_ret);
 static void btrfs_dev_replace_update_device_in_mapping_tree(
-- 
2.25.0
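The nocow pitfall described in this version can be illustrated with a toy model. In the sketch below, a nocow write overwrites its existing block in place unless the block group is RO, in which case it is assumed to fall back to COW and go to a new location. All names are hypothetical, and waiting for nocow writes that are already in flight (the second half of the fix) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical placement rule: a nocow write goes in place only while the
 * block group is RW; otherwise (assumed) it falls back to COW.
 */
static int toy_pick_block(bool nocow, bool bg_ro, int existing, int fresh)
{
	assert(existing != fresh);
	if (nocow && !bg_ro)
		return existing; /* in-place overwrite */
	return fresh;            /* COW: newly allocated location */
}

/*
 * Worst-case interleaving for a scrub copy of block 0: scrub reads the
 * commit-root data, the duplicated write lands, then scrub writes its
 * (possibly stale) copy to the target. Returns true if both devices agree.
 */
static bool toy_replace_consistent(bool nocow, bool bg_ro)
{
	char src[2] = { 'A', 0 }, tgt[2] = { 0, 0 };
	int blk = toy_pick_block(nocow, bg_ro, 0, 1);
	char scrub_buf = src[0]; /* scrub reads commit-root data */

	src[blk] = 'B';          /* the write ... */
	tgt[blk] = 'B';          /* ... duplicated to the target */
	tgt[0] = scrub_buf;      /* scrub writes its copy to the target */

	return src[0] == tgt[0] && src[1] == tgt[1];
}
```

Only a nocow write into a block group that is still RW leaves stale data on the target; with the block group RO the same write is redirected and the copy stays consistent.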


Thread overview: 4+ messages
2020-01-22  7:20 [PATCH] btrfs: Add introduction to dev-replace Qu Wenruo
2020-01-23  7:44 Qu Wenruo
2020-01-23  8:00 ` Anand Jain
2020-01-30 12:46 ` David Sterba
