linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] btrfs: metadata uuid fixes and enhancements
@ 2019-12-12 11:01 damenly.su
  2019-12-12 11:01 ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan damenly.su
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: damenly.su @ 2019-12-12 11:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Su Yue

From: Su Yue <Damenly_Su@gmx.com>

This patchset fixes one reproducible bug and add two split-brain
cases ignored.

The origin code thinks the final state of successful synced device is
always has INCOMPAT_METADATA_UUID feature. However, a device without
the feature flag can be the one pull into disk. This is what handled
in the patchset. Test images are added in btrfs-progs part.

Patch[1] fixes a bug about wrong fsid copy.
Patch[2] is for the later patches.
Patch[3-5] add the forgotten cases.
Patch[6] just does simple code movement for grace.

The set passes xfstests-dev without regressions.

Su Yue (6):
  btrfs: metadata_uuid: fix failed assertion due to unsuccessful device
    scan
  btrfs: metadata_uuid: move split-brain handling from fs_id() to new
    function
  btrfs: split-brain case for scanned changing device with
    INCOMPAT_METADATA_UUID
  btrfs: split-brain case for scanned changed device without
    INCOMPAT_METADATA_UUID
  btrfs: copy fsid and metadata_uuid for pulled disk without
    INCOMPAT_METADATA_UUID
  btrfs: metadata_uuid: move partly logic into find_fsid_inprogress()

 fs/btrfs/volumes.c | 193 +++++++++++++++++++++++++++++----------------
 1 file changed, 125 insertions(+), 68 deletions(-)

-- 
2.24.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan
  2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
@ 2019-12-12 11:01 ` damenly.su
  2019-12-12 14:15   ` Nikolay Borisov
  2019-12-12 11:01 ` [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function damenly.su
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: damenly.su @ 2019-12-12 11:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Su Yue

From: Su Yue <Damenly_Su@gmx.com>

While running misc-tests/034 of btrfs-progs, easy to hit:
======================================================================
[ 1318.749685] BTRFS: device fsid 55abf31c-6b3a-47ad-8711-cfd5a249dd4c devid 2 transid 8 /dev/loop0 scanned by systemd-udevd (3530)
[ 1318.846791] BTRFS: device fsid 55abf31c-6b3a-47ad-8711-cfd5a249dd4c devid 2 transid 8 /dev/loop0 scanned by systemd-udevd (3530)
[ 1318.847812] BTRFS: device fsid 55abf31c-6b3a-47ad-8711-cfd5a249dd4c devid 2 transid 8 /dev/loop0 scanned by mount (3540)
[ 1318.901278] assertion failed: !memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE), in fs/btrfs/disk-io.c:2874
[ 1318.901499] ------------[ cut here ]------------
[ 1318.901503] kernel BUG at fs/btrfs/ctree.h:3118!
[ 1318.901582] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 1318.901644] CPU: 4 PID: 3540 Comm: mount Tainted: G           O      5.5.0-rc1-custom+ #41
[ 1318.901720] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 1318.901836] RIP: 0010:assfail.constprop.0+0x1c/0x1e [btrfs]
[ 1318.901894] Code: 00 00 44 89 c0 5b 41 5c 41 5d 41 5e 5d c3 55 89 f1 48 c7 c2 40 13 0f c1 48 89 fe 48 c7 c7 00 1b 0f c1 48 89 e5 e8 3e 12 0f dc <0f> 0b e8 93 01 39 dc be 56 03 00 00 48 c7 c7 a0 1b 0f c1 e8 cc ff
[ 1318.902069] RSP: 0018:ffff88814fea76e0 EFLAGS: 00010282
[ 1318.902124] RAX: 000000000000007c RBX: ffff88814d862400 RCX: 0000000000000000
[ 1318.902191] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffed1029fd4ecf
[ 1318.902259] RBP: ffff88814fea76e0 R08: 000000000000007c R09: ffffed102b3fe719
[ 1318.902326] R10: ffffed102b3fe718 R11: ffff888159ff38c7 R12: ffff8881398ac000
[ 1318.902394] R13: ffff888145742930 R14: ffff88812f56e000 R15: ffff88812f56e000
[ 1318.902464] FS:  00007f0753032500(0000) GS:ffff888159e00000(0000) knlGS:0000000000000000
[ 1318.902539] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1318.902595] CR2: 0000560357a9c018 CR3: 00000001502d4000 CR4: 0000000000340ee0
[ 1318.903852] Call Trace:
[ 1318.904958]  open_ctree+0x166c/0x373e [btrfs]
[ 1318.906029]  ? congestion_wait+0x2d0/0x2d0
[ 1318.907500]  ? wb_init+0x31e/0x400
[ 1318.908699]  ? close_ctree+0x52b/0x52b [btrfs]
[ 1318.909734]  btrfs_mount_root.cold+0xe/0x118 [btrfs]
[ 1318.910764]  ? btrfs_decode_error+0x40/0x40 [btrfs]
[ 1318.911784]  ? vfs_parse_fs_string+0xdc/0x130
[ 1318.912798]  ? rcu_read_lock_sched_held+0xa1/0xd0
[ 1318.913835]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 1318.914878]  ? legacy_parse_param+0x75/0x340
[ 1318.915946]  ? __lookup_constant+0x54/0x90
[ 1318.917005]  ? debug_lockdep_rcu_enabled.part.0+0x1a/0x30
[ 1318.918099]  ? kfree+0x2fd/0x360
[ 1318.919233]  ? btrfs_decode_error+0x40/0x40 [btrfs]
[ 1318.920361]  legacy_get_tree+0x89/0xd0
[ 1318.921480]  vfs_get_tree+0x52/0x140
[ 1318.922625]  fc_mount+0x14/0x70
[ 1318.923713]  vfs_kern_mount.part.0+0x78/0x90
[ 1318.924789]  vfs_kern_mount+0x13/0x20
[ 1318.925885]  btrfs_mount+0x1f3/0xb30 [btrfs]
[ 1318.926922]  ? sched_clock_cpu+0x1b/0x130
[ 1318.927936]  ? find_held_lock+0x95/0xb0
[ 1318.928964]  ? btrfs_remount+0x7e0/0x7e0 [btrfs]
[ 1318.929939]  ? vfs_parse_fs_string+0xdc/0x130
[ 1318.930875]  ? vfs_parse_fs_string+0xdc/0x130
[ 1318.931784]  ? rcu_read_lock_sched_held+0xa1/0xd0
[ 1318.932651]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 1318.933497]  ? legacy_parse_param+0x75/0x340
[ 1318.934327]  ? __lookup_constant+0x54/0x90
[ 1318.935173]  ? debug_lockdep_rcu_enabled.part.0+0x1a/0x30
[ 1318.936023]  ? kfree+0x2fd/0x360
[ 1318.937372]  ? cap_capable+0xb3/0xf0
[ 1318.938323]  ? btrfs_remount+0x7e0/0x7e0 [btrfs]
[ 1318.939126]  legacy_get_tree+0x89/0xd0
[ 1318.939921]  ? legacy_get_tree+0x89/0xd0
[ 1318.940687]  vfs_get_tree+0x52/0x140
[ 1318.941431]  do_mount+0xe01/0x1220
[ 1318.942189]  ? rcu_read_lock_bh_held+0xb0/0xb0
[ 1318.942935]  ? copy_mount_string+0x20/0x20
[ 1318.943690]  ? __kasan_check_write+0x14/0x20
[ 1318.944438]  ? memdup_user+0x52/0x90
[ 1318.945190]  ksys_mount+0x82/0xd0
[ 1318.945921]  __x64_sys_mount+0x67/0x80
[ 1318.946640]  do_syscall_64+0x79/0xe0
[ 1318.947364]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1318.948109] RIP: 0033:0x7f07531b5e4e
[ 1318.948851] Code: 48 8b 0d 35 00 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 02 00 0c 00 f7 d8 64 89 01 48
[ 1318.951249] RSP: 002b:00007ffeffa9c9f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1318.952119] RAX: ffffffffffffffda RBX: 00007f07532dc204 RCX: 00007f07531b5e4e
[ 1318.952988] RDX: 0000560357a97430 RSI: 0000560357a96230 RDI: 0000560357a93500
[ 1318.953867] RBP: 0000560357a932f0 R08: 0000000000000000 R09: 0000560357a990e0
[ 1318.954757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 1318.955638] R13: 0000560357a93500 R14: 0000560357a97430 R15: 0000560357a932f0
[ 1318.956540] Modules linked in: btrfs(O) xor zstd_decompress zstd_compress raid6_pq loop mousedev nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt crct10dif_pclmul iTCO_vendor_support snd_hda_codec_generic crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm input_leds snd_timer led_class psmouse pcspkr aesni_intel i2c_i801 snd intel_agp glue_helper crypto_simd cryptd soundcore lpc_ich intel_gtt rtc_cmos qemu_fw_cfg agpgart evdev mac_hid ip_tables x_tables xfs sr_mod cdrom sd_mod dm_mod virtio_scsi virtio_balloon virtio_rng virtio_blk virtio_console rng_core virtio_net net_failover failover ahci serio_raw libahci atkbd libps2 libata scsi_mod virtio_pci virtio_ring virtio i8042 serio [last unloaded: btrfs]
[ 1318.965122] ---[ end trace 51f0adac8fc1fe76 ]---
======================================================================

Acutally, there are two devices in the fs. Device 2 with
FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
fs_devices but failed to be added into since fs_devices->opened (
the thread is doing mount device 1). But device 1's fsid was copied
to fs_devices->fsid then the assertion failed.

The solution is that only if a new device was added into a existing
fs_device, then the fs_devices->fsid is allowed to be rewritten.

Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario during fsid change")
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
---
 fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d8e5560db285..9efa4123c335 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -732,6 +732,9 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
 	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
 					BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
+	bool fs_devices_found = false;
+
+	*new_device_added = false;
 
 	if (fsid_change_in_progress) {
 		if (!has_metadata_uuid) {
@@ -772,24 +775,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 
 		device = NULL;
 	} else {
+		fs_devices_found = true;
+
 		mutex_lock(&fs_devices->device_list_mutex);
 		device = btrfs_find_device(fs_devices, devid,
 				disk_super->dev_item.uuid, NULL, false);
-
-		/*
-		 * If this disk has been pulled into an fs devices created by
-		 * a device which had the CHANGING_FSID_V2 flag then replace the
-		 * metadata_uuid/fsid values of the fs_devices.
-		 */
-		if (has_metadata_uuid && fs_devices->fsid_change &&
-		    found_transid > fs_devices->latest_generation) {
-			memcpy(fs_devices->fsid, disk_super->fsid,
-					BTRFS_FSID_SIZE);
-			memcpy(fs_devices->metadata_uuid,
-					disk_super->metadata_uuid, BTRFS_FSID_SIZE);
-
-			fs_devices->fsid_change = false;
-		}
 	}
 
 	if (!device) {
@@ -912,6 +902,22 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 		}
 	}
 
+	/*
+	 * If the new added disk has been pulled into an fs devices created by
+	 * a device which had the CHANGING_FSID_V2 flag then replace the
+	 * metadata_uuid/fsid values of the fs_devices.
+	 */
+	if (*new_device_added && fs_devices_found &&
+	    has_metadata_uuid && fs_devices->fsid_change &&
+	    found_transid > fs_devices->latest_generation) {
+		memcpy(fs_devices->fsid, disk_super->fsid,
+		       BTRFS_FSID_SIZE);
+		memcpy(fs_devices->metadata_uuid,
+		       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
+
+		fs_devices->fsid_change = false;
+	}
+
 	/*
 	 * Unmount does not free the btrfs_device struct but would zero
 	 * generation along with most of the other members. So just update
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function
  2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
  2019-12-12 11:01 ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan damenly.su
@ 2019-12-12 11:01 ` damenly.su
  2019-12-12 13:05   ` Nikolay Borisov
  2019-12-12 11:01 ` [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID damenly.su
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: damenly.su @ 2019-12-12 11:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Su Yue

From: Su Yue <Damenly_Su@gmx.com>

The parameter @metadata_fsid of fs_id() is not NULL while scanned device
has metadata_uuid but not changing. Obviously, the cases handling part
in fs_id() is for this situation. Move the logic into new function
find_fsid_changing_metadata_uuid().

Signed-off-by: Su Yue <Damenly_Su@gmx.com>
---
 fs/btrfs/volumes.c | 78 ++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9efa4123c335..b08b06a89a77 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -438,40 +438,6 @@ static noinline struct btrfs_fs_devices *find_fsid(
 
 	ASSERT(fsid);
 
-	if (metadata_fsid) {
-		/*
-		 * Handle scanned device having completed its fsid change but
-		 * belonging to a fs_devices that was created by first scanning
-		 * a device which didn't have its fsid/metadata_uuid changed
-		 * at all and the CHANGING_FSID_V2 flag set.
-		 */
-		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-			if (fs_devices->fsid_change &&
-			    memcmp(metadata_fsid, fs_devices->fsid,
-				   BTRFS_FSID_SIZE) == 0 &&
-			    memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
-				   BTRFS_FSID_SIZE) == 0) {
-				return fs_devices;
-			}
-		}
-		/*
-		 * Handle scanned device having completed its fsid change but
-		 * belonging to a fs_devices that was created by a device that
-		 * has an outdated pair of fsid/metadata_uuid and
-		 * CHANGING_FSID_V2 flag set.
-		 */
-		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-			if (fs_devices->fsid_change &&
-			    memcmp(fs_devices->metadata_uuid,
-				   fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
-			    memcmp(metadata_fsid, fs_devices->metadata_uuid,
-				   BTRFS_FSID_SIZE) == 0) {
-				return fs_devices;
-			}
-		}
-	}
-
-	/* Handle non-split brain cases */
 	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
 		if (metadata_fsid) {
 			if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
@@ -712,6 +678,46 @@ static struct btrfs_fs_devices *find_fsid_changed(
 
 	return NULL;
 }
+
+static struct btrfs_fs_devices *find_fsid_changing_metada_uuid(
+					struct btrfs_super_block *disk_super)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	/*
+	 * Handle scanned device having completed its fsid change but
+	 * belonging to a fs_devices that was created by first scanning
+	 * a device which didn't have its fsid/metadata_uuid changed
+	 * at all and the CHANGING_FSID_V2 flag set.
+	 */
+	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		if (fs_devices->fsid_change &&
+		    memcmp(disk_super->metadata_uuid, fs_devices->fsid,
+			   BTRFS_FSID_SIZE) == 0 &&
+		    memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
+			   BTRFS_FSID_SIZE) == 0) {
+			return fs_devices;
+		}
+	}
+	/*
+	 * Handle scanned device having completed its fsid change but
+	 * belonging to a fs_devices that was created by a device that
+	 * has an outdated pair of fsid/metadata_uuid and
+	 * CHANGING_FSID_V2 flag set.
+	 */
+	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		if (fs_devices->fsid_change &&
+		    memcmp(fs_devices->metadata_uuid,
+			   fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
+		    memcmp(disk_super->metadata_uuid, fs_devices->metadata_uuid,
+			   BTRFS_FSID_SIZE) == 0) {
+			return fs_devices;
+		}
+	}
+
+	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
+}
+
 /*
  * Add new device to list of registered devices
  *
@@ -751,13 +757,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 			fs_devices = find_fsid_changed(disk_super);
 		}
 	} else if (has_metadata_uuid) {
-		fs_devices = find_fsid(disk_super->fsid,
-				       disk_super->metadata_uuid);
+		fs_devices = find_fsid_changing_metada_uuid(disk_super);
 	} else {
 		fs_devices = find_fsid(disk_super->fsid, NULL);
 	}
 
-
 	if (!fs_devices) {
 		if (has_metadata_uuid)
 			fs_devices = alloc_fs_devices(disk_super->fsid,
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID
  2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
  2019-12-12 11:01 ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan damenly.su
  2019-12-12 11:01 ` [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function damenly.su
@ 2019-12-12 11:01 ` damenly.su
  2019-12-12 13:24   ` Su Yue
  2019-12-12 13:34   ` Nikolay Borisov
  2019-12-12 11:01 ` [PATCH 4/6] btrfs: split-brain case for scanned changed device without INCOMPAT_METADATA_UUID damenly.su
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: damenly.su @ 2019-12-12 11:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Su Yue

From: Su Yue <Damenly_Su@gmx.com>

This patch adds the case for scanned changing device with
INCOMPAT_METADATA_UUID.
For this situation, the origin code only handles the case
the devices already pulled into disk with INCOMPAT_METADATA_UUID set.
There is an another case that the successful changed devices synced
without INCOMPAT_METADATA_UUID.
So add the check of Heather fsid of scanned device equals
metadata_uuid of fs_devices which is with INCOMPAT_METADATA_UUID
feature.

Signed-off-by: Su Yue <Damenly_Su@gmx.com>
---
 fs/btrfs/volumes.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b08b06a89a77..61b4a107bb58 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -654,7 +654,6 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
 	return NULL;
 }
 
-
 static struct btrfs_fs_devices *find_fsid_changed(
 					struct btrfs_super_block *disk_super)
 {
@@ -663,9 +662,14 @@ static struct btrfs_fs_devices *find_fsid_changed(
 	/*
 	 * Handles the case where scanned device is part of an fs that had
 	 * multiple successful changes of FSID but curently device didn't
-	 * observe it. Meaning our fsid will be different than theirs.
+	 * observe it.
+	 *
+	 * Case 1: the devices already changed still owns the feature, their
+	 * fsid must differ from the disk_super->fsid.
 	 */
 	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		if (fs_devices->fsid_change)
+			continue;
 		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
 			   BTRFS_FSID_SIZE) != 0 &&
 		    memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid,
@@ -676,7 +680,26 @@ static struct btrfs_fs_devices *find_fsid_changed(
 		}
 	}
 
-	return NULL;
+	/*
+	 * Case 2: the synced devices doesn't have the metadata_uuid feature.
+	 * NOTE: the fs_devices has same metadata_uuid and fsid in memory, but
+	 * they differs in disk, because fs_id is copied to
+	 * fs_devices->metadata_id while alloc_fs_devices if no metadata
+	 * feature.
+	 */
+	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
+			   BTRFS_FSID_SIZE) == 0 &&
+		    memcmp(fs_devices->fsid, disk_super->metadata_uuid,
+			   BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change)
+			return fs_devices;
+	}
+
+	/*
+	 * Okay, can't found any fs_devices already synced, back to
+	 * search devices unchanged or changing like the device.
+	 */
+	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
 }
 
 static struct btrfs_fs_devices *find_fsid_changing_metada_uuid(
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/6] btrfs: split-brain case for scanned changed device without INCOMPAT_METADATA_UUID
  2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
                   ` (2 preceding siblings ...)
  2019-12-12 11:01 ` [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID damenly.su
@ 2019-12-12 11:01 ` damenly.su
  2019-12-12 11:01 ` [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk " damenly.su
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: damenly.su @ 2019-12-12 11:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Su Yue

From: Su Yue <Damenly_Su@gmx.com>

This patch adds the case for scanned not changing device without
INCOMPAT_METADATA_UUID.
For this situation, the origin code only handles the case
the devices already pulled into disk with INCOMPAT_METADATA_UUID set.
There is an another case that the successful changed devices synced
without INCOMPAT_METADATA_UUID. And the scanned device is the exactly
changed one.

So we should check if there is any changing fs_devices with
INCOMPAT_METADATA_UUID.

Signed-off-by: Su Yue <Damenly_Su@gmx.com>
---
 fs/btrfs/volumes.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 61b4a107bb58..faf9cdd14f33 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -741,6 +741,35 @@ static struct btrfs_fs_devices *find_fsid_changing_metada_uuid(
 	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
 }
 
+static struct btrfs_fs_devices *find_fsid_changing(
+					struct btrfs_super_block *disk_super)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	/*
+	 * Handles the case where scanned device is part of an fs that had
+	 * multiple successful changes of FSID but currently device didn't
+	 * observe it.
+	 * Since the scanned devices does not own the metadata_uuid feature,
+	 * fsid and metadata_uuid of the changing devices must differ, and
+	 * their metadata_uuid must be same as disk_super->fsid.
+	 */
+	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		if (!fs_devices->fsid_change)
+			continue;
+		if (memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
+			   BTRFS_FSID_SIZE) != 0 &&
+		    memcmp(fs_devices->metadata_uuid, disk_super->fsid,
+			   BTRFS_FSID_SIZE) == 0)
+			return fs_devices;
+	}
+
+	/*
+	 * Back to find newer fs_devices is changeing or some in same stage.
+	 */
+	return find_fsid(disk_super->fsid, NULL);
+}
+
 /*
  * Add new device to list of registered devices
  *
@@ -782,7 +811,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	} else if (has_metadata_uuid) {
 		fs_devices = find_fsid_changing_metada_uuid(disk_super);
 	} else {
-		fs_devices = find_fsid(disk_super->fsid, NULL);
+		fs_devices = find_fsid_changing(disk_super);
 	}
 
 	if (!fs_devices) {
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk without INCOMPAT_METADATA_UUID
  2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
                   ` (3 preceding siblings ...)
  2019-12-12 11:01 ` [PATCH 4/6] btrfs: split-brain case for scanned changed device without INCOMPAT_METADATA_UUID damenly.su
@ 2019-12-12 11:01 ` damenly.su
  2020-01-06 15:12   ` Nikolay Borisov
  2019-12-12 11:01 ` [PATCH 6/6] btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() damenly.su
  2019-12-13  8:03 ` [PATCH 0/6] btrfs: metadata uuid fixes and enhancements Nikolay Borisov
  6 siblings, 1 reply; 26+ messages in thread
From: damenly.su @ 2019-12-12 11:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Su Yue

From: Su Yue <Damenly_Su@gmx.com>

Since a scanned device may be the device pulled into disk without
metadata_uuid feature, there may already be changing devices there.
Here copy fsid and metadata_uuid for above case.

Signed-off-by: Su Yue <Damenly_Su@gmx.com>
---
 fs/btrfs/volumes.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index faf9cdd14f33..b21ab45e76a0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -964,13 +964,16 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	 * metadata_uuid/fsid values of the fs_devices.
 	 */
 	if (*new_device_added && fs_devices_found &&
-	    has_metadata_uuid && fs_devices->fsid_change &&
+	    fs_devices->fsid_change &&
 	    found_transid > fs_devices->latest_generation) {
 		memcpy(fs_devices->fsid, disk_super->fsid,
 		       BTRFS_FSID_SIZE);
-		memcpy(fs_devices->metadata_uuid,
-		       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
-
+		if (has_metadata_uuid)
+			memcpy(fs_devices->metadata_uuid,
+			       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
+		else
+			memcpy(fs_devices->metadata_uuid,
+			       disk_super->fsid, BTRFS_FSID_SIZE);
 		fs_devices->fsid_change = false;
 	}
 
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/6] btrfs: metadata_uuid: move partly logic into find_fsid_inprogress()
  2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
                   ` (4 preceding siblings ...)
  2019-12-12 11:01 ` [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk " damenly.su
@ 2019-12-12 11:01 ` damenly.su
  2019-12-12 13:37   ` Nikolay Borisov
  2019-12-13  8:03 ` [PATCH 0/6] btrfs: metadata uuid fixes and enhancements Nikolay Borisov
  6 siblings, 1 reply; 26+ messages in thread
From: damenly.su @ 2019-12-12 11:01 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Su Yue

From: Su Yue <Damenly_Su@gmx.com>

The partly logic can be moved into find_fsid_inprogress() to
make code for fs_devices finding looks more elegant.

Signed-off-by: Su Yue <Damenly_Su@gmx.com>
---
 fs/btrfs/volumes.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b21ab45e76a0..7e05f96b1575 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -636,6 +636,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 /*
  * Handle scanned device having its CHANGING_FSID_V2 flag set and the fs_devices
  * being created with a disk that has already completed its fsid change.
+ * Or it might belong to fs with no UUID changes in effect, handle both.
  */
 static struct btrfs_fs_devices *find_fsid_inprogress(
 					struct btrfs_super_block *disk_super)
@@ -651,7 +652,7 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
 		}
 	}
 
-	return NULL;
+	return find_fsid(disk_super->fsid, NULL);
 }
 
 static struct btrfs_fs_devices *find_fsid_changed(
@@ -795,19 +796,10 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	*new_device_added = false;
 
 	if (fsid_change_in_progress) {
-		if (!has_metadata_uuid) {
-			/*
-			 * When we have an image which has CHANGING_FSID_V2 set
-			 * it might belong to either a filesystem which has
-			 * disks with completed fsid change or it might belong
-			 * to fs with no UUID changes in effect, handle both.
-			 */
+		if (!has_metadata_uuid)
 			fs_devices = find_fsid_inprogress(disk_super);
-			if (!fs_devices)
-				fs_devices = find_fsid(disk_super->fsid, NULL);
-		} else {
+		else
 			fs_devices = find_fsid_changed(disk_super);
-		}
 	} else if (has_metadata_uuid) {
 		fs_devices = find_fsid_changing_metada_uuid(disk_super);
 	} else {
-- 
2.21.0 (Apple Git-122.2)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function
  2019-12-12 11:01 ` [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function damenly.su
@ 2019-12-12 13:05   ` Nikolay Borisov
  2019-12-12 13:32     ` Su Yue
  0 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2019-12-12 13:05 UTC (permalink / raw)
  To: damenly.su, linux-btrfs; +Cc: Su Yue



On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
> From: Su Yue <Damenly_Su@gmx.com>
> 
> The parameter @metadata_fsid of fs_id() is not NULL while scanned device
> has metadata_uuid but not changing. Obviously, the cases handling part
> in fs_id() is for this situation. Move the logic into new function
> find_fsid_changing_metadata_uuid().

I think the following changelog might sound better.

metadata_fsid parameter of find_fsid is non-null in once specific case
in device_list_add. Factor the code out of find_fsid to a dedicated
function to improve readability.


> 
> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
> ---
>  fs/btrfs/volumes.c | 78 ++++++++++++++++++++++++----------------------
>  1 file changed, 41 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9efa4123c335..b08b06a89a77 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -438,40 +438,6 @@ static noinline struct btrfs_fs_devices *find_fsid(
>  
>  	ASSERT(fsid);
>  
> -	if (metadata_fsid) {
> -		/*
> -		 * Handle scanned device having completed its fsid change but
> -		 * belonging to a fs_devices that was created by first scanning
> -		 * a device which didn't have its fsid/metadata_uuid changed
> -		 * at all and the CHANGING_FSID_V2 flag set.
> -		 */
> -		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> -			if (fs_devices->fsid_change &&
> -			    memcmp(metadata_fsid, fs_devices->fsid,
> -				   BTRFS_FSID_SIZE) == 0 &&
> -			    memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
> -				   BTRFS_FSID_SIZE) == 0) {
> -				return fs_devices;
> -			}
> -		}
> -		/*
> -		 * Handle scanned device having completed its fsid change but
> -		 * belonging to a fs_devices that was created by a device that
> -		 * has an outdated pair of fsid/metadata_uuid and
> -		 * CHANGING_FSID_V2 flag set.
> -		 */
> -		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> -			if (fs_devices->fsid_change &&
> -			    memcmp(fs_devices->metadata_uuid,
> -				   fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
> -			    memcmp(metadata_fsid, fs_devices->metadata_uuid,
> -				   BTRFS_FSID_SIZE) == 0) {
> -				return fs_devices;
> -			}
> -		}
> -	}
> -
> -	/* Handle non-split brain cases */
>  	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>  		if (metadata_fsid) {
>  			if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
> @@ -712,6 +678,46 @@ static struct btrfs_fs_devices *find_fsid_changed(
>  
>  	return NULL;
>  }
> +
> +static struct btrfs_fs_devices *find_fsid_changing_metada_uuid(

That function name is long (and contains a typo). More like something like:

find_fsid_with_metadata_uuid.

> +					struct btrfs_super_block *disk_super)
> +{
> +	struct btrfs_fs_devices *fs_devices;
> +
> +	/*
> +	 * Handle scanned device having completed its fsid change but
> +	 * belonging to a fs_devices that was created by first scanning
> +	 * a device which didn't have its fsid/metadata_uuid changed
> +	 * at all and the CHANGING_FSID_V2 flag set.
> +	 */
> +	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> +		if (fs_devices->fsid_change &&
> +		    memcmp(disk_super->metadata_uuid, fs_devices->fsid,
> +			   BTRFS_FSID_SIZE) == 0 &&
> +		    memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
> +			   BTRFS_FSID_SIZE) == 0) {
> +			return fs_devices;
> +		}
> +	}
> +	/*
> +	 * Handle scanned device having completed its fsid change but
> +	 * belonging to a fs_devices that was created by a device that
> +	 * has an outdated pair of fsid/metadata_uuid and
> +	 * CHANGING_FSID_V2 flag set.
> +	 */
> +	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> +		if (fs_devices->fsid_change &&
> +		    memcmp(fs_devices->metadata_uuid,
> +			   fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
> +		    memcmp(disk_super->metadata_uuid, fs_devices->metadata_uuid,
> +			   BTRFS_FSID_SIZE) == 0) {
> +			return fs_devices;
> +		}
> +	}
> +
> +	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
> +}
> +
>  /*
>   * Add new device to list of registered devices
>   *
> @@ -751,13 +757,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  			fs_devices = find_fsid_changed(disk_super);
>  		}
>  	} else if (has_metadata_uuid) {
> -		fs_devices = find_fsid(disk_super->fsid,
> -				       disk_super->metadata_uuid);
> +		fs_devices = find_fsid_changing_metada_uuid(disk_super);
>  	} else {
>  		fs_devices = find_fsid(disk_super->fsid, NULL);
>  	}
>  
> -
>  	if (!fs_devices) {
>  		if (has_metadata_uuid)
>  			fs_devices = alloc_fs_devices(disk_super->fsid,
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID
  2019-12-12 11:01 ` [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID damenly.su
@ 2019-12-12 13:24   ` Su Yue
  2019-12-12 13:34   ` Nikolay Borisov
  1 sibling, 0 replies; 26+ messages in thread
From: Su Yue @ 2019-12-12 13:24 UTC (permalink / raw)
  To: damenly.su, linux-btrfs

On 2019/12/12 7:01 PM, damenly.su@gmail.com wrote:
> From: Su Yue <Damenly_Su@gmx.com>
>
> This patch adds the case for scanned changing device with
> INCOMPAT_METADATA_UUID.
> For this situation, the origin code only handles the case
> the devices already pulled into disk with INCOMPAT_METADATA_UUID set.
> There is an another case that the successful changed devices synced
> without INCOMPAT_METADATA_UUID.
> So add the check of Heather fsid of scanned device equals

s/Heather/whether/g, blame my spellchecker.

> metadata_uuid of fs_devices which is with INCOMPAT_METADATA_UUID
> feature.
>
> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
> ---
>   fs/btrfs/volumes.c | 29 ++++++++++++++++++++++++++---
>   1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b08b06a89a77..61b4a107bb58 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -654,7 +654,6 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
>   	return NULL;
>   }
>
> -
>   static struct btrfs_fs_devices *find_fsid_changed(
>   					struct btrfs_super_block *disk_super)
>   {
> @@ -663,9 +662,14 @@ static struct btrfs_fs_devices *find_fsid_changed(
>   	/*
>   	 * Handles the case where scanned device is part of an fs that had
>   	 * multiple successful changes of FSID but curently device didn't
> -	 * observe it. Meaning our fsid will be different than theirs.
> +	 * observe it.
> +	 *
> +	 * Case 1: the devices already changed still owns the feature, their
> +	 * fsid must differ from the disk_super->fsid.
>   	 */
>   	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> +		if (fs_devices->fsid_change)
> +			continue;
>   		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
>   			   BTRFS_FSID_SIZE) != 0 &&
>   		    memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid,
> @@ -676,7 +680,26 @@ static struct btrfs_fs_devices *find_fsid_changed(
>   		}
>   	}
>
> -	return NULL;
> +	/*
> +	 * Case 2: the synced devices doesn't have the metadata_uuid feature.
> +	 * NOTE: the fs_devices has same metadata_uuid and fsid in memory, but
> +	 * they differs in disk, because fs_id is copied to
> +	 * fs_devices->metadata_id while alloc_fs_devices if no metadata
> +	 * feature.
> +	 */
> +	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> +		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
> +			   BTRFS_FSID_SIZE) == 0 &&
> +		    memcmp(fs_devices->fsid, disk_super->metadata_uuid,
> +			   BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change)
> +			return fs_devices;
> +	}
> +
> +	/*
> +	 * Okay, can't found any fs_devices already synced, back to
> +	 * search devices unchanged or changing like the device.
> +	 */
> +	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
>   }
>
>   static struct btrfs_fs_devices *find_fsid_changing_metada_uuid(
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function
  2019-12-12 13:05   ` Nikolay Borisov
@ 2019-12-12 13:32     ` Su Yue
  0 siblings, 0 replies; 26+ messages in thread
From: Su Yue @ 2019-12-12 13:32 UTC (permalink / raw)
  To: Nikolay Borisov, damenly.su, linux-btrfs

On 2019/12/12 9:05 PM, Nikolay Borisov wrote:
>
>
> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>> From: Su Yue <Damenly_Su@gmx.com>
>>
>> The parameter @metadata_fsid of fs_id() is not NULL while scanned device
>> has metadata_uuid but not changing. Obviously, the cases handling part
>> in fs_id() is for this situation. Move the logic into new function
>> find_fsid_changing_metadata_uuid().
>
> I think the following changelog might sound better.
>
> metadata_fsid parameter of find_fsid is non-null in once specific case
> in device_list_add. Factor the code out of find_fsid to a dedicated
> function to improve readability.
>

Thanks for the changelog! will replace.
>
>>
>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>> ---
>>   fs/btrfs/volumes.c | 78 ++++++++++++++++++++++++----------------------
>>   1 file changed, 41 insertions(+), 37 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9efa4123c335..b08b06a89a77 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -438,40 +438,6 @@ static noinline struct btrfs_fs_devices *find_fsid(
>>
>>   	ASSERT(fsid);
>>
>> -	if (metadata_fsid) {
>> -		/*
>> -		 * Handle scanned device having completed its fsid change but
>> -		 * belonging to a fs_devices that was created by first scanning
>> -		 * a device which didn't have its fsid/metadata_uuid changed
>> -		 * at all and the CHANGING_FSID_V2 flag set.
>> -		 */
>> -		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>> -			if (fs_devices->fsid_change &&
>> -			    memcmp(metadata_fsid, fs_devices->fsid,
>> -				   BTRFS_FSID_SIZE) == 0 &&
>> -			    memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
>> -				   BTRFS_FSID_SIZE) == 0) {
>> -				return fs_devices;
>> -			}
>> -		}
>> -		/*
>> -		 * Handle scanned device having completed its fsid change but
>> -		 * belonging to a fs_devices that was created by a device that
>> -		 * has an outdated pair of fsid/metadata_uuid and
>> -		 * CHANGING_FSID_V2 flag set.
>> -		 */
>> -		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>> -			if (fs_devices->fsid_change &&
>> -			    memcmp(fs_devices->metadata_uuid,
>> -				   fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
>> -			    memcmp(metadata_fsid, fs_devices->metadata_uuid,
>> -				   BTRFS_FSID_SIZE) == 0) {
>> -				return fs_devices;
>> -			}
>> -		}
>> -	}
>> -
>> -	/* Handle non-split brain cases */
>>   	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>>   		if (metadata_fsid) {
>>   			if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
>> @@ -712,6 +678,46 @@ static struct btrfs_fs_devices *find_fsid_changed(
>>
>>   	return NULL;
>>   }
>> +
>> +static struct btrfs_fs_devices *find_fsid_changing_metada_uuid(
>
> That function name is long (and contains a typo). More like something like:
>
> find_fsid_with_metadata_uuid.

Fine. Naming is so hard to me.

Thanks.
>
>> +					struct btrfs_super_block *disk_super)
>> +{
>> +	struct btrfs_fs_devices *fs_devices;
>> +
>> +	/*
>> +	 * Handle scanned device having completed its fsid change but
>> +	 * belonging to a fs_devices that was created by first scanning
>> +	 * a device which didn't have its fsid/metadata_uuid changed
>> +	 * at all and the CHANGING_FSID_V2 flag set.
>> +	 */
>> +	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>> +		if (fs_devices->fsid_change &&
>> +		    memcmp(disk_super->metadata_uuid, fs_devices->fsid,
>> +			   BTRFS_FSID_SIZE) == 0 &&
>> +		    memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
>> +			   BTRFS_FSID_SIZE) == 0) {
>> +			return fs_devices;
>> +		}
>> +	}
>> +	/*
>> +	 * Handle scanned device having completed its fsid change but
>> +	 * belonging to a fs_devices that was created by a device that
>> +	 * has an outdated pair of fsid/metadata_uuid and
>> +	 * CHANGING_FSID_V2 flag set.
>> +	 */
>> +	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>> +		if (fs_devices->fsid_change &&
>> +		    memcmp(fs_devices->metadata_uuid,
>> +			   fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
>> +		    memcmp(disk_super->metadata_uuid, fs_devices->metadata_uuid,
>> +			   BTRFS_FSID_SIZE) == 0) {
>> +			return fs_devices;
>> +		}
>> +	}
>> +
>> +	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
>> +}
>> +
>>   /*
>>    * Add new device to list of registered devices
>>    *
>> @@ -751,13 +757,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>   			fs_devices = find_fsid_changed(disk_super);
>>   		}
>>   	} else if (has_metadata_uuid) {
>> -		fs_devices = find_fsid(disk_super->fsid,
>> -				       disk_super->metadata_uuid);
>> +		fs_devices = find_fsid_changing_metada_uuid(disk_super);
>>   	} else {
>>   		fs_devices = find_fsid(disk_super->fsid, NULL);
>>   	}
>>
>> -
>>   	if (!fs_devices) {
>>   		if (has_metadata_uuid)
>>   			fs_devices = alloc_fs_devices(disk_super->fsid,
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID
  2019-12-12 11:01 ` [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID damenly.su
  2019-12-12 13:24   ` Su Yue
@ 2019-12-12 13:34   ` Nikolay Borisov
  2019-12-12 14:19     ` Su Yue
  1 sibling, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2019-12-12 13:34 UTC (permalink / raw)
  To: damenly.su, linux-btrfs; +Cc: Su Yue



On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
> From: Su Yue <Damenly_Su@gmx.com>
> 
> This patch adds the case for scanned changing device with
> INCOMPAT_METADATA_UUID.
> For this situation, the origin code only handles the case
> the devices already pulled into disk with INCOMPAT_METADATA_UUID set.
> There is an another case that the successful changed devices synced
> without INCOMPAT_METADATA_UUID.
> So add the check of Heather fsid of scanned device equals
> metadata_uuid of fs_devices which is with INCOMPAT_METADATA_UUID
> feature.
> 

This is hard for me to parse and correctly understand what you mean.

> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
> ---
>  fs/btrfs/volumes.c | 29 ++++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b08b06a89a77..61b4a107bb58 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -654,7 +654,6 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
>  	return NULL;
>  }
>  
> -
>  static struct btrfs_fs_devices *find_fsid_changed(
>  					struct btrfs_super_block *disk_super)


find_fsid_changed handles the case where a device belongs to a
filesystem which had multiple successful fsid changed but it failed on
the last one.

>  {
> @@ -663,9 +662,14 @@ static struct btrfs_fs_devices *find_fsid_changed(
>  	/*
>  	 * Handles the case where scanned device is part of an fs that had
>  	 * multiple successful changes of FSID but curently device didn't
> -	 * observe it. Meaning our fsid will be different than theirs.
> +	 * observe it.
> +	 *
> +	 * Case 1: the devices already changed still owns the feature, their
> +	 * fsid must differ from the disk_super->fsid.

What do you mean by device to still owns the feature? Has the bit set or
something else?

>  	 */
>  	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> +		if (fs_devices->fsid_change)
> +			continue;

Why do you do this?

>  		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
>  			   BTRFS_FSID_SIZE) != 0 &&
>  		    memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid,
> @@ -676,7 +680,26 @@ static struct btrfs_fs_devices *find_fsid_changed(
>  		}
>  	}
>  
> -	return NULL;
> +	/*
> +	 * Case 2: the synced devices doesn't have the metadata_uuid feature.
> +	 * NOTE: the fs_devices has same metadata_uuid and fsid in memory, but
> +	 * they differs in disk, because fs_id is copied to
> +	 * fs_devices->metadata_id while alloc_fs_devices if no metadata

It's not possible for the device to have metadata_uuid feature because
this function is called from device_list_add iff the device has
METADATA_UUID flag:

if (fsid_change_in_progress) {

if (!has_metadata_uuid) {
} else {
 find_fsid_changed <-- here we are sure our device has METADATA_UUID set.
}
}

> +	 * feature.
> +	 */
> +	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> +		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
> +			   BTRFS_FSID_SIZE) == 0 &&
> +		    memcmp(fs_devices->fsid, disk_super->metadata_uuid,
> +			   BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change)
> +			return fs_devices;
> +	}
> +
> +	/*
> +	 * Okay, can't found any fs_devices already synced, back to
> +	 * search devices unchanged or changing like the device.
> +	 */
> +	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
>  }
>  
>  static struct btrfs_fs_devices *find_fsid_changing_metada_uuid(
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/6] btrfs: metadata_uuid: move partly logic into find_fsid_inprogress()
  2019-12-12 11:01 ` [PATCH 6/6] btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() damenly.su
@ 2019-12-12 13:37   ` Nikolay Borisov
  0 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2019-12-12 13:37 UTC (permalink / raw)
  To: damenly.su, linux-btrfs; +Cc: Su Yue



On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
> From: Su Yue <Damenly_Su@gmx.com>
> 
> The partly logic can be moved into find_fsid_inprogress() to
> make code for fs_devices finding looks more elegant.
> 
> Signed-off-by: Su Yue <Damenly_Su@gmx.com>

Code-wise the change is correct, on the other hand it's overloading what
find_fsid_inprogress handles. I did this to make it explicitly clear
what functions are called in what case as this code is somewhat tricky.

David, do you think this change is worth it.

> ---
>  fs/btrfs/volumes.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b21ab45e76a0..7e05f96b1575 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -636,6 +636,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>  /*
>   * Handle scanned device having its CHANGING_FSID_V2 flag set and the fs_devices
>   * being created with a disk that has already completed its fsid change.
> + * Or it might belong to fs with no UUID changes in effect, handle both.
>   */
>  static struct btrfs_fs_devices *find_fsid_inprogress(
>  					struct btrfs_super_block *disk_super)
> @@ -651,7 +652,7 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
>  		}
>  	}
>  
> -	return NULL;
> +	return find_fsid(disk_super->fsid, NULL);
>  }
>  
>  static struct btrfs_fs_devices *find_fsid_changed(
> @@ -795,19 +796,10 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  	*new_device_added = false;
>  
>  	if (fsid_change_in_progress) {
> -		if (!has_metadata_uuid) {
> -			/*
> -			 * When we have an image which has CHANGING_FSID_V2 set
> -			 * it might belong to either a filesystem which has
> -			 * disks with completed fsid change or it might belong
> -			 * to fs with no UUID changes in effect, handle both.
> -			 */
> +		if (!has_metadata_uuid)
>  			fs_devices = find_fsid_inprogress(disk_super);
> -			if (!fs_devices)
> -				fs_devices = find_fsid(disk_super->fsid, NULL);
> -		} else {
> +		else
>  			fs_devices = find_fsid_changed(disk_super);
> -		}
>  	} else if (has_metadata_uuid) {
>  		fs_devices = find_fsid_changing_metada_uuid(disk_super);
>  	} else {
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan
  2019-12-12 11:01 ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan damenly.su
@ 2019-12-12 14:15   ` Nikolay Borisov
  2019-12-13  2:30     ` Su Yue
  2019-12-13  2:46     ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted) Su Yue
  0 siblings, 2 replies; 26+ messages in thread
From: Nikolay Borisov @ 2019-12-12 14:15 UTC (permalink / raw)
  To: damenly.su, linux-btrfs; +Cc: Su Yue



On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:

<snip>

> Acutally, there are two devices in the fs. Device 2 with
> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
> fs_devices but failed to be added into since fs_devices->opened (

It's not clear why device 1 wasn't able to be added to the fs_devices
allocated by dev 2. Please elaborate?


> the thread is doing mount device 1). But device 1's fsid was copied
> to fs_devices->fsid then the assertion failed.


dev 1 fsid should be copied iff its transid is newer.

> 
> The solution is that only if a new device was added into a existing
> fs_device, then the fs_devices->fsid is allowed to be rewritten.

fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
to the transid.

> 
> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario during fsid change")
> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
> ---
>  fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>  1 file changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d8e5560db285..9efa4123c335 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -732,6 +732,9 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>  	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>  					BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
> +	bool fs_devices_found = false;
> +
> +	*new_device_added = false;
>  
>  	if (fsid_change_in_progress) {
>  		if (!has_metadata_uuid) {
> @@ -772,24 +775,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  
>  		device = NULL;
>  	} else {
> +		fs_devices_found = true;
> +
>  		mutex_lock(&fs_devices->device_list_mutex);
>  		device = btrfs_find_device(fs_devices, devid,
>  				disk_super->dev_item.uuid, NULL, false);
> -
> -		/*
> -		 * If this disk has been pulled into an fs devices created by
> -		 * a device which had the CHANGING_FSID_V2 flag then replace the
> -		 * metadata_uuid/fsid values of the fs_devices.
> -		 */
> -		if (has_metadata_uuid && fs_devices->fsid_change &&
> -		    found_transid > fs_devices->latest_generation) {
> -			memcpy(fs_devices->fsid, disk_super->fsid,
> -					BTRFS_FSID_SIZE);
> -			memcpy(fs_devices->metadata_uuid,
> -					disk_super->metadata_uuid, BTRFS_FSID_SIZE);
> -
> -			fs_devices->fsid_change = false;
> -		}
>  	}
>  
>  	if (!device) {
> @@ -912,6 +902,22 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  		}
>  	}
>  
> +	/*
> +	 * If the new added disk has been pulled into an fs devices created by
> +	 * a device which had the CHANGING_FSID_V2 flag then replace the
> +	 * metadata_uuid/fsid values of the fs_devices.
> +	 */
> +	if (*new_device_added && fs_devices_found &&
> +	    has_metadata_uuid && fs_devices->fsid_change &&
> +	    found_transid > fs_devices->latest_generation) {
> +		memcpy(fs_devices->fsid, disk_super->fsid,
> +		       BTRFS_FSID_SIZE);
> +		memcpy(fs_devices->metadata_uuid,
> +		       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
> +
> +		fs_devices->fsid_change = false;
> +	}
> +
>  	/*
>  	 * Unmount does not free the btrfs_device struct but would zero
>  	 * generation along with most of the other members. So just update
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID
  2019-12-12 13:34   ` Nikolay Borisov
@ 2019-12-12 14:19     ` Su Yue
  0 siblings, 0 replies; 26+ messages in thread
From: Su Yue @ 2019-12-12 14:19 UTC (permalink / raw)
  To: Nikolay Borisov, damenly.su, linux-btrfs

On 2019/12/12 9:34 PM, Nikolay Borisov wrote:
>
>
> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>> From: Su Yue <Damenly_Su@gmx.com>
>>
>> This patch adds the case for scanned changing device with
>> INCOMPAT_METADATA_UUID.
>> For this situation, the origin code only handles the case
>> the devices already pulled into disk with INCOMPAT_METADATA_UUID set.
>> There is an another case that the successful changed devices synced
>> without INCOMPAT_METADATA_UUID.
>> So add the check of Heather fsid of scanned device equals
>> metadata_uuid of fs_devices which is with INCOMPAT_METADATA_UUID
>> feature.
>>
>
> This is hard for me to parse and correctly understand what you mean.
>

Sorry..

>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>> ---
>>   fs/btrfs/volumes.c | 29 ++++++++++++++++++++++++++---
>>   1 file changed, 26 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index b08b06a89a77..61b4a107bb58 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -654,7 +654,6 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
>>   	return NULL;
>>   }
>>
>> -
>>   static struct btrfs_fs_devices *find_fsid_changed(
>>   					struct btrfs_super_block *disk_super)
>
>
> find_fsid_changed handles the case where a device belongs to a
> filesystem which had multiple successful fsid changed but it failed on
> the last one.
>
I go it while reading code. And "the last one" is the one where the
@disk_super read from. It has the metadata_uuid and FSID_CHANGING_V2.
Right?
What I want to express in changelog  is that those successful changed
devices *may*
not have the metadata_uuid feature. The code in btrfstune.c: line 141

         if (new_uuid && uuid_changed && memcmp(disk_super->metadata_uuid,
                                                new_fsid,
BTRFS_FSID_SIZE) == 0) {
                 /*
                  * Changing fsid to be the same as metadata uuid, so just
                  * disable the flag
                  */
                 memcpy(disk_super->fsid, &new_fsid, BTRFS_FSID_SIZE);
                 incompat_flags &= ~BTRFS_FEATURE_INCOMPAT_METADATA_UUID;
                 btrfs_set_super_incompat_flags(disk_super, incompat_flags);
                 memset(disk_super->metadata_uuid, 0, BTRFS_FSID_SIZE);

The INCOMPAT_METADATA_UUID can be cleared if changing fsid is same with
the metadata_uuid in deivce.

>>   {
>> @@ -663,9 +662,14 @@ static struct btrfs_fs_devices *find_fsid_changed(
>>   	/*
>>   	 * Handles the case where scanned device is part of an fs that had
>>   	 * multiple successful changes of FSID but curently device didn't
>> -	 * observe it. Meaning our fsid will be different than theirs.
>> +	 * observe it.
>> +	 *
>> +	 * Case 1: the devices already changed still owns the feature, their
>> +	 * fsid must differ from the disk_super->fsid.
>
> What do you mean by device to still owns the feature? Has the bit set or
> something else?
>
The metadata_uuid feature.
>>   	 */
>>   	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>> +		if (fs_devices->fsid_change)
>> +			continue;
>
> Why do you do this?

Just make cases separated.

>
>>   		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
>>   			   BTRFS_FSID_SIZE) != 0 &&
>>   		    memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid,
>> @@ -676,7 +680,26 @@ static struct btrfs_fs_devices *find_fsid_changed(
>>   		}
>>   	}
>>
>> -	return NULL;
>> +	/*
>> +	 * Case 2: the synced devices doesn't have the metadata_uuid feature.
>> +	 * NOTE: the fs_devices has same metadata_uuid and fsid in memory, but
>> +	 * they differs in disk, because fs_id is copied to
>> +	 * fs_devices->metadata_id while alloc_fs_devices if no metadata
>
> It's not possible for the device to have metadata_uuid feature because
> this function is called from device_list_add iff the device has
> METADATA_UUID flag:
>

Get and agree what you mean. As the reply above, the fs_devices already
allocated there(not the device scanning) may not have METADATA_UUID flag.

> if (fsid_change_in_progress) {
>
> if (!has_metadata_uuid) {
> } else {
>   find_fsid_changed <-- here we are sure our device has METADATA_UUID set.
> }
> }
>
>> +	 * feature.
>> +	 */
>> +	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>> +		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
>> +			   BTRFS_FSID_SIZE) == 0 &&
>> +		    memcmp(fs_devices->fsid, disk_super->metadata_uuid,
>> +			   BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change)
>> +			return fs_devices;
>> +	}
>> +
>> +	/*
>> +	 * Okay, can't found any fs_devices already synced, back to
>> +	 * search devices unchanged or changing like the device.
>> +	 */
>> +	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
>>   }
>>
>>   static struct btrfs_fs_devices *find_fsid_changing_metada_uuid(
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan
  2019-12-12 14:15   ` Nikolay Borisov
@ 2019-12-13  2:30     ` Su Yue
  2019-12-13  2:46     ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted) Su Yue
  1 sibling, 0 replies; 26+ messages in thread
From: Su Yue @ 2019-12-13  2:30 UTC (permalink / raw)
  To: Nikolay Borisov, damenly.su, linux-btrfs

On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
>
>
> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>
> <snip>
>
>> Acutally, there are two devices in the fs. Device 2 with
>> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
>> fs_devices but failed to be added into since fs_devices->opened (
>  > It's not clear why device 1 wasn't able to be added to the fs_devices
> allocated by dev 2. Please elaborate?
>
Because fs_devices is opened.
For example.

$cat test.sh
====================================================================
img1="/tmp/test1.img"
img2="/tmp/test2.img"

[ -f "$img1" ] || fallocate -l 300M "$img1"
[ -f "$img2" ] || fallocate -l 300M "$img2"

mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
losetup -D

dmesg -C
rmmod btrfs
modprobe btrfs

loop1=$(losetup --find --show "$img1")
loop2=$(losetup --find --show "$img2")

mount $loop1 /mnt || exit 1
umount /mnt
====================================================================

$dmesg
====================================================================
[  395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
[  395.210773] !!!!!!!!fs_device opened
[  395.213875] BTRFS info (device loop0): disk space caching is enabled
[  395.214994] BTRFS info (device loop0): has skinny extents
[  395.215891] BTRFS info (device loop0): flagging fs with big metadata
feature
[  395.222639] BTRFS error (device loop0): devid 2 uuid
adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
[  395.224147] BTRFS error (device loop0): failed to read the system
array: -2
[  395.246163] !!!!!!!!fs_device opened
[  395.338219] BTRFS error (device loop0): open_ctree failed
=====================================================================

The line "!!!!!!!!fs_device opened" is handy added by me in debug purpose.

=====================================================================
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -794,6 +794,7 @@ static noinline struct btrfs_device
*device_list_add(const char *path,

         if (!device) {
                 if (fs_devices->opened) {
+                       pr_info("!!!!!!!!fs_device opened\n");
                         mutex_unlock(&fs_devices->device_list_mutex);
                         return ERR_PTR(-EBUSY);
                 }
=====================================================================

To make it more clear. The following is in metadata_uuid situation.
Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
(newer transid).

Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
transid).

The workflow in misc-tests/034 is

loop1=$(losetup --find --show "$device2")
loop2=$(losetup --find --show "$device1")

mount $loop1 /mnt ---> fails here

Assume the fs_devices was allocated by systemd-udevd through
btrfs_control_ioctl() path after finish of scanning of device2.

Then:

Thread *mounting device2*               Thread *scanning device1*


btrfs_mount_root			btrfs_control_ioctl

   mutex_lock(&uuid_mutex);

     btrfs_read_disk_super
     btrfs_scan_one_device
     --> there is only device2
	in the fs_devices

     btrfs_open_devices
       fs_devices->opened = 1
       fs_devices->latest_bdev = device2

   mutex_unlock(&uuid_mutex);
   -->Here, fs_devices->fsid is same
      as device2's fsid.
					mutex_lock(&uuid_mutex);
					btrfs_scan_one_device

					  btrfs_read_disk_super
					  device_list_add
					    found fs_devices
					      device = btrfs_find_device

					      rewrite fs_deivces->fsid

                                               if scanned device is newer
                                               --> Change fs_devices->fsi
                                                   d to device1->fsid

					    if (!device)
						if fs_devices->opened
						--> the device1 adding
						    aborts since
						    fs_devices
						    was opened
					mutex_unlock(&uuid_mutex);
   btrfs_fill_super
     open_ctree
        btrfs_read_dev_super(
        fs_devices->latest_bdev)
        --> the latest_bdev is device2

        assert fs_devices->fsid equals device2's fsid.
        --> fs_device->fsid was rewritten by the scanning thread

>
>> the thread is doing mount device 1). But device 1's fsid was copied
>> to fs_devices->fsid then the assertion failed.
>
>
> dev 1 fsid should be copied iff its transid is newer.
>
Even it was failed to be added into the fs_devices?
>>
>> The solution is that only if a new device was added into a existing
>> fs_device, then the fs_devices->fsid is allowed to be rewritten.
>
> fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
> to the transid.

Then the assertion failed in above scenario. Just do not update the
fs_devices->fsid, let later btrfs_read_sys_array() report the device
missing then reject to mount.

Thanks
>
>>
>> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario during fsid change")
>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>> ---
>>   fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>>   1 file changed, 21 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index d8e5560db285..9efa4123c335 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -732,6 +732,9 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>   		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>   	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>>   					BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
>> +	bool fs_devices_found = false;
>> +
>> +	*new_device_added = false;
>>
>>   	if (fsid_change_in_progress) {
>>   		if (!has_metadata_uuid) {
>> @@ -772,24 +775,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>
>>   		device = NULL;
>>   	} else {
>> +		fs_devices_found = true;
>> +
>>   		mutex_lock(&fs_devices->device_list_mutex);
>>   		device = btrfs_find_device(fs_devices, devid,
>>   				disk_super->dev_item.uuid, NULL, false);
>> -
>> -		/*
>> -		 * If this disk has been pulled into an fs devices created by
>> -		 * a device which had the CHANGING_FSID_V2 flag then replace the
>> -		 * metadata_uuid/fsid values of the fs_devices.
>> -		 */
>> -		if (has_metadata_uuid && fs_devices->fsid_change &&
>> -		    found_transid > fs_devices->latest_generation) {
>> -			memcpy(fs_devices->fsid, disk_super->fsid,
>> -					BTRFS_FSID_SIZE);
>> -			memcpy(fs_devices->metadata_uuid,
>> -					disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>> -
>> -			fs_devices->fsid_change = false;
>> -		}
>>   	}
>>
>>   	if (!device) {
>> @@ -912,6 +902,22 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>   		}
>>   	}
>>
>> +	/*
>> +	 * If the new added disk has been pulled into an fs devices created by
>> +	 * a device which had the CHANGING_FSID_V2 flag then replace the
>> +	 * metadata_uuid/fsid values of the fs_devices.
>> +	 */
>> +	if (*new_device_added && fs_devices_found &&
>> +	    has_metadata_uuid && fs_devices->fsid_change &&
>> +	    found_transid > fs_devices->latest_generation) {
>> +		memcpy(fs_devices->fsid, disk_super->fsid,
>> +		       BTRFS_FSID_SIZE);
>> +		memcpy(fs_devices->metadata_uuid,
>> +		       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>> +
>> +		fs_devices->fsid_change = false;
>> +	}
>> +
>>   	/*
>>   	 * Unmount does not free the btrfs_device struct but would zero
>>   	 * generation along with most of the other members. So just update
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted)
  2019-12-12 14:15   ` Nikolay Borisov
  2019-12-13  2:30     ` Su Yue
@ 2019-12-13  2:46     ` Su Yue
  2019-12-13  5:36       ` Anand Jain
  1 sibling, 1 reply; 26+ messages in thread
From: Su Yue @ 2019-12-13  2:46 UTC (permalink / raw)
  To: Nikolay Borisov, damenly.su, linux-btrfs

On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
>
>
> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>
> <snip>
>
>> Acutally, there are two devices in the fs. Device 2 with
>> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
>> fs_devices but failed to be added into since fs_devices->opened (
>
> It's not clear why device 1 wasn't able to be added to the fs_devices
> allocated by dev 2. Please elaborate?
>
>
Sure, of course.

For example.

$cat test.sh
====================================================================
img1="/tmp/test1.img"
img2="/tmp/test2.img"

[ -f "$img1" ] || fallocate -l 300M "$img1"
[ -f "$img2" ] || fallocate -l 300M "$img2"

mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
losetup -D

dmesg -C
rmmod btrfs
modprobe btrfs

loop1=$(losetup --find --show "$img1")
loop2=$(losetup --find --show "$img2")

mount $loop1 /mnt || exit 1
umount /mnt
====================================================================

$dmesg
====================================================================
[  395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
[  395.210773] !!!!!!!!fs_device opened
[  395.213875] BTRFS info (device loop0): disk space caching is enabled
[  395.214994] BTRFS info (device loop0): has skinny extents
[  395.215891] BTRFS info (device loop0): flagging fs with big metadata
feature
[  395.222639] BTRFS error (device loop0): devid 2 uuid
adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
[  395.224147] BTRFS error (device loop0): failed to read the system
array: -2
[  395.246163] !!!!!!!!fs_device opened
[  395.338219] BTRFS error (device loop0): open_ctree failed
=====================================================================

The line "!!!!!!!!fs_device opened" is handy added by me in debug purpose.

=====================================================================
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -794,6 +794,7 @@ static noinline struct btrfs_device
*device_list_add(const char *path,

         if (!device) {
                 if (fs_devices->opened) {
+                       pr_info("!!!!!!!!fs_device opened\n");
                         mutex_unlock(&fs_devices->device_list_mutex);
                         return ERR_PTR(-EBUSY);
                 }
=====================================================================

To make it more clear. The following is in metadata_uuid situation.
Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
(newer transid).

Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
transid).

The workflow in misc-tests/034 is

loop1=$(losetup --find --show "$device2")
loop2=$(losetup --find --show "$device1")

mount $loop1 /mnt ---> fails here

Assume the fs_devices was allocated by systemd-udevd through
btrfs_control_ioctl() path after finish of scanning of device2.

Then:

Thread *mounting device2*            Thread *scanning device1*


btrfs_mount_root                     btrfs_control_ioctl

   mutex_lock(&uuid_mutex);

     btrfs_read_disk_super
     btrfs_scan_one_device
     --> there is only device2
     in the fs_devices

     btrfs_open_devices
       fs_devices->opened = 1
       fs_devices->latest_bdev = device2

     mutex_unlock(&uuid_mutex);

                                       mutex_lock(&uuid_mutex);
                                       btrfs_scan_one_device
                                         btrfs_read_disk_super

                                         device_list_add
                                           found fs_devices
                                             device = btrfs_find_device

                                             rewrite fs_deivces->fsid if
                                             scanned device1 is newer
                                              --> Change fs_devices->fsi
                                                   d to device1->fsid

                                           if (!device)
                                              if(fs_devices->opened)
						 return -EBUSY
                                              --> the device1 adding
                                                  aborts since
                                                  fs_devices was opened
                                       mutex_unlock(&uuid_mutex);
   btrfs_fill_super
     open_ctree
        btrfs_read_dev_super(
        fs_devices->latest_bdev)
        --> the latest_bdev is device2

        assert fs_devices->fsid equals
        device2's fsid.
        --> fs_device->fsid was rewritten by
            the scanning thread

The result is fs_device->fsid is from device1 but super->fsid is from
the lastest device2.

>> the thread is doing mount device 1). But device 1's fsid was copied
>> to fs_devices->fsid then the assertion failed.
>
>
> dev 1 fsid should be copied iff its transid is newer.
>

Even it was failed to be added into the fs_devices?

>>
>> The solution is that only if a new device was added into a existing
>> fs_device, then the fs_devices->fsid is allowed to be rewritten.
>
> fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
> to the transid.
>

Then the assertion failed in above scenario. Just do not update the
fs_devices->fsid, let later btrfs_read_sys_array() report the device
missing then reject to mount.

Thanks

>>
>> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario during fsid change")
>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>> ---
>>   fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>>   1 file changed, 21 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index d8e5560db285..9efa4123c335 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -732,6 +732,9 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>   		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>   	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>>   					BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
>> +	bool fs_devices_found = false;
>> +
>> +	*new_device_added = false;
>>
>>   	if (fsid_change_in_progress) {
>>   		if (!has_metadata_uuid) {
>> @@ -772,24 +775,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>
>>   		device = NULL;
>>   	} else {
>> +		fs_devices_found = true;
>> +
>>   		mutex_lock(&fs_devices->device_list_mutex);
>>   		device = btrfs_find_device(fs_devices, devid,
>>   				disk_super->dev_item.uuid, NULL, false);
>> -
>> -		/*
>> -		 * If this disk has been pulled into an fs devices created by
>> -		 * a device which had the CHANGING_FSID_V2 flag then replace the
>> -		 * metadata_uuid/fsid values of the fs_devices.
>> -		 */
>> -		if (has_metadata_uuid && fs_devices->fsid_change &&
>> -		    found_transid > fs_devices->latest_generation) {
>> -			memcpy(fs_devices->fsid, disk_super->fsid,
>> -					BTRFS_FSID_SIZE);
>> -			memcpy(fs_devices->metadata_uuid,
>> -					disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>> -
>> -			fs_devices->fsid_change = false;
>> -		}
>>   	}
>>
>>   	if (!device) {
>> @@ -912,6 +902,22 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>   		}
>>   	}
>>
>> +	/*
>> +	 * If the new added disk has been pulled into an fs devices created by
>> +	 * a device which had the CHANGING_FSID_V2 flag then replace the
>> +	 * metadata_uuid/fsid values of the fs_devices.
>> +	 */
>> +	if (*new_device_added && fs_devices_found &&
>> +	    has_metadata_uuid && fs_devices->fsid_change &&
>> +	    found_transid > fs_devices->latest_generation) {
>> +		memcpy(fs_devices->fsid, disk_super->fsid,
>> +		       BTRFS_FSID_SIZE);
>> +		memcpy(fs_devices->metadata_uuid,
>> +		       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>> +
>> +		fs_devices->fsid_change = false;
>> +	}
>> +
>>   	/*
>>   	 * Unmount does not free the btrfs_device struct but would zero
>>   	 * generation along with most of the other members. So just update
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted)
  2019-12-13  2:46     ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted) Su Yue
@ 2019-12-13  5:36       ` Anand Jain
  2019-12-13  7:15         ` Su Yue
  0 siblings, 1 reply; 26+ messages in thread
From: Anand Jain @ 2019-12-13  5:36 UTC (permalink / raw)
  To: Su Yue; +Cc: Nikolay Borisov, damenly.su, linux-btrfs



  metadata_uuid code is too confusing, a lot of if and if-nots
  it should be have been better.

  more below.

On 13/12/19 10:46 AM, Su Yue wrote:
> On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
>>
>>
>> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>>
>> <snip>
>>
>>> Acutally, there are two devices in the fs. Device 2 with
>>> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
>>> fs_devices but failed to be added into since fs_devices->opened (
>>
>> It's not clear why device 1 wasn't able to be added to the fs_devices
>> allocated by dev 2. Please elaborate?
>>
>>
> Sure, of course.
> 
> For example.
> 
> $cat test.sh
> ====================================================================
> img1="/tmp/test1.img"
> img2="/tmp/test2.img"
> 
> [ -f "$img1" ] || fallocate -l 300M "$img1"
> [ -f "$img2" ] || fallocate -l 300M "$img2"
> 
> mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
> losetup -D
> 
> dmesg -C
> rmmod btrfs
> modprobe btrfs
> 
> loop1=$(losetup --find --show "$img1")
> loop2=$(losetup --find --show "$img2")

  Can you explicitly show what devices should be scanned to make the
  device mount (below) successful. Fist you can cleanup the
  device list using

    btrfs device --forget

> mount $loop1 /mnt || exit 1
> umount /mnt
> ====================================================================
> 
> $dmesg
> ====================================================================
> [  395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
> devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
> [  395.210773] !!!!!!!!fs_device opened
> [  395.213875] BTRFS info (device loop0): disk space caching is enabled
> [  395.214994] BTRFS info (device loop0): has skinny extents
> [  395.215891] BTRFS info (device loop0): flagging fs with big metadata
> feature
> [  395.222639] BTRFS error (device loop0): devid 2 uuid
> adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
> [  395.224147] BTRFS error (device loop0): failed to read the system
> array: -2
> [  395.246163] !!!!!!!!fs_device opened
> [  395.338219] BTRFS error (device loop0): open_ctree failed
> =====================================================================
> 
> The line "!!!!!!!!fs_device opened" is handy added by me in debug purpose.
> 
> =====================================================================
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -794,6 +794,7 @@ static noinline struct btrfs_device
> *device_list_add(const char *path,
> 
>          if (!device) {
>                  if (fs_devices->opened) {
> +                       pr_info("!!!!!!!!fs_device opened\n");
>                          mutex_unlock(&fs_devices->device_list_mutex);
>                          return ERR_PTR(-EBUSY);
>                  }
> =====================================================================
> 
> To make it more clear. The following is in metadata_uuid situation.
> Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
> (newer transid).
> 
> Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
> transid).

How were you able to set both BTRFS_SUPER_FLAG_CHANGING_FSID_V2
and BTRFS_FEATURE_INCOMPAT_METADATA_UUID on only devid 2 ?


> The workflow in misc-tests/034 is
> 
> loop1=$(losetup --find --show "$device2")
> loop2=$(losetup --find --show "$device1")
> 
> mount $loop1 /mnt ---> fails here
> 
> Assume the fs_devices was allocated by systemd-udevd through
> btrfs_control_ioctl() path after finish of scanning of device2.
> 
> Then:
> 

In the two threads which are in race (below), the mount thread can't be 
successful unless -o degraded is used, if it does it means the devid 1 
is already scanned and for that btrfs_device to be in the
btrfs_fs_devices list the fsid has to match (does not matter metadata_uuid).

> Thread *mounting device2*            Thread *scanning device1*
> 
> 
> btrfs_mount_root                     btrfs_control_ioctl
> 
>    mutex_lock(&uuid_mutex);
> 
>      btrfs_read_disk_super
>      btrfs_scan_one_device
>      --> there is only device2
>      in the fs_devices
> 
>      btrfs_open_devices
>        fs_devices->opened = 1
>        fs_devices->latest_bdev = device2
> 
>      mutex_unlock(&uuid_mutex);
> 
>                                        mutex_lock(&uuid_mutex);
>                                        btrfs_scan_one_device
>                                          btrfs_read_disk_super
> 
>                                          device_list_add
>                                            found fs_devices
>                                              device = btrfs_find_device
> 
>                                              rewrite fs_deivces->fsid if
>                                              scanned device1 is newer
>                                               --> Change fs_devices->fsi
>                                                    d to device1->fsid
> 
>                                            if (!device)
>                                               if(fs_devices->opened)
>                           return -EBUSY
>                                               --> the device1 adding
>                                                   aborts since
>                                                   fs_devices was opened
>                                        mutex_unlock(&uuid_mutex);
>    btrfs_fill_super
>      open_ctree
>         btrfs_read_dev_super(
>         fs_devices->latest_bdev)
>         --> the latest_bdev is device2
> 
>         assert fs_devices->fsid equals
>         device2's fsid.
>         --> fs_device->fsid was rewritten by
>             the scanning thread
> 
> The result is fs_device->fsid is from device1 but super->fsid is from
> the lastest device2.
> 

  Oops that's not good. However still not able to image various devices
  and its fsid to achieve that condition. Is it possible to write a test
  case? It would help.

Thanks, Anand

>>> the thread is doing mount device 1). But device 1's fsid was copied
>>> to fs_devices->fsid then the assertion failed.
>>
>>
>> dev 1 fsid should be copied iff its transid is newer.
>>
> 
> Even it was failed to be added into the fs_devices?
> 
>>>
>>> The solution is that only if a new device was added into a existing
>>> fs_device, then the fs_devices->fsid is allowed to be rewritten.
>>
>> fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
>> to the transid.
>>
> 
> Then the assertion failed in above scenario. Just do not update the
> fs_devices->fsid, let later btrfs_read_sys_array() report the device
> missing then reject to mount.
> 
> Thanks
> 
>>>
>>> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario 
>>> during fsid change")
>>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>>> ---
>>>   fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>>>   1 file changed, 21 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index d8e5560db285..9efa4123c335 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -732,6 +732,9 @@ static noinline struct btrfs_device 
>>> *device_list_add(const char *path,
>>>           BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>>       bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>>>                       BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
>>> +    bool fs_devices_found = false;
>>> +
>>> +    *new_device_added = false;
>>>
>>>       if (fsid_change_in_progress) {
>>>           if (!has_metadata_uuid) {
>>> @@ -772,24 +775,11 @@ static noinline struct btrfs_device 
>>> *device_list_add(const char *path,
>>>
>>>           device = NULL;
>>>       } else {
>>> +        fs_devices_found = true;
>>> +
>>>           mutex_lock(&fs_devices->device_list_mutex);
>>>           device = btrfs_find_device(fs_devices, devid,
>>>                   disk_super->dev_item.uuid, NULL, false);
>>> -
>>> -        /*
>>> -         * If this disk has been pulled into an fs devices created by
>>> -         * a device which had the CHANGING_FSID_V2 flag then replace 
>>> the
>>> -         * metadata_uuid/fsid values of the fs_devices.
>>> -         */
>>> -        if (has_metadata_uuid && fs_devices->fsid_change &&
>>> -            found_transid > fs_devices->latest_generation) {
>>> -            memcpy(fs_devices->fsid, disk_super->fsid,
>>> -                    BTRFS_FSID_SIZE);
>>> -            memcpy(fs_devices->metadata_uuid,
>>> -                    disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>> -
>>> -            fs_devices->fsid_change = false;
>>> -        }
>>>       }
>>>
>>>       if (!device) {
>>> @@ -912,6 +902,22 @@ static noinline struct btrfs_device 
>>> *device_list_add(const char *path,
>>>           }
>>>       }
>>>
>>> +    /*
>>> +     * If the new added disk has been pulled into an fs devices 
>>> created by
>>> +     * a device which had the CHANGING_FSID_V2 flag then replace the
>>> +     * metadata_uuid/fsid values of the fs_devices.
>>> +     */
>>> +    if (*new_device_added && fs_devices_found &&
>>> +        has_metadata_uuid && fs_devices->fsid_change &&
>>> +        found_transid > fs_devices->latest_generation) {
>>> +        memcpy(fs_devices->fsid, disk_super->fsid,
>>> +               BTRFS_FSID_SIZE);
>>> +        memcpy(fs_devices->metadata_uuid,
>>> +               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>> +
>>> +        fs_devices->fsid_change = false;
>>> +    }
>>> +
>>>       /*
>>>        * Unmount does not free the btrfs_device struct but would zero
>>>        * generation along with most of the other members. So just update
>>>
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted)
  2019-12-13  5:36       ` Anand Jain
@ 2019-12-13  7:15         ` Su Yue
  2019-12-13  8:51           ` Anand Jain
  0 siblings, 1 reply; 26+ messages in thread
From: Su Yue @ 2019-12-13  7:15 UTC (permalink / raw)
  To: Anand Jain; +Cc: Nikolay Borisov, damenly.su, linux-btrfs

On 2019/12/13 1:36 PM, Anand Jain wrote:
> 
> 
>   metadata_uuid code is too confusing, a lot of if and if-nots
>   it should be have been better.
> 

Agreed. It costed much brain power of mine.

>   more below.
> 

Willing to answer from my understanding.
If something wrong, please point at.

> On 13/12/19 10:46 AM, Su Yue wrote:
>> On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>>>
>>> <snip>
>>>
>>>> Acutally, there are two devices in the fs. Device 2 with
>>>> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
>>>> fs_devices but failed to be added into since fs_devices->opened (
>>>
>>> It's not clear why device 1 wasn't able to be added to the fs_devices
>>> allocated by dev 2. Please elaborate?
>>>
>>>
>> Sure, of course.
>>
>> For example.
>>
>> $cat test.sh
>> ====================================================================
>> img1="/tmp/test1.img"
>> img2="/tmp/test2.img"
>>
>> [ -f "$img1" ] || fallocate -l 300M "$img1"
>> [ -f "$img2" ] || fallocate -l 300M "$img2"
>>
>> mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
>> losetup -D
>>
>> dmesg -C
>> rmmod btrfs
>> modprobe btrfs
>>
>> loop1=$(losetup --find --show "$img1")
>> loop2=$(losetup --find --show "$img2")
> 
>   Can you explicitly show what devices should be scanned to make the
>   device mount (below) successful. Fist you can cleanup the
>   device list using
> 
>     btrfs device --forget
> 

Thanks for the tip.
The purpose of simple script is to show that there
may be uncompleted/unsuccessful device(s) scanning due to
fs_devices->opened. Is the issue already known?

>> mount $loop1 /mnt || exit 1
>> umount /mnt
>> ====================================================================
>>
>> $dmesg
>> ====================================================================
>> [  395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
>> devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
>> [  395.210773] !!!!!!!!fs_device opened
>> [  395.213875] BTRFS info (device loop0): disk space caching is enabled
>> [  395.214994] BTRFS info (device loop0): has skinny extents
>> [  395.215891] BTRFS info (device loop0): flagging fs with big metadata
>> feature
>> [  395.222639] BTRFS error (device loop0): devid 2 uuid
>> adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
>> [  395.224147] BTRFS error (device loop0): failed to read the system
>> array: -2
>> [  395.246163] !!!!!!!!fs_device opened
>> [  395.338219] BTRFS error (device loop0): open_ctree failed
>> =====================================================================
>>
>> The line "!!!!!!!!fs_device opened" is handy added by me in debug 
>> purpose.
>>
>> =====================================================================
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -794,6 +794,7 @@ static noinline struct btrfs_device
>> *device_list_add(const char *path,
>>
>>          if (!device) {
>>                  if (fs_devices->opened) {
>> +                       pr_info("!!!!!!!!fs_device opened\n");
>>                          mutex_unlock(&fs_devices->device_list_mutex);
>>                          return ERR_PTR(-EBUSY);
>>                  }
>> =====================================================================
>>
>> To make it more clear. The following is in metadata_uuid situation.
>> Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
>> (newer transid).
>>
>> Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
>> transid).
> 
> How were you able to set FSID_CHANGING_V2
> and BTRFS_FEATURE_INCOMPAT_METADATA_UUID on only devid 2 ?
> 
> 
The device2 is simulated to be the device failed to sync due
to some expected reason (power loss).

mkfs on two devices, use v5.4 progs/btrfstune -m $device.
Then both two devices both have the BTRFS_FEATURE_INCOMPAT_METADATA_UUID.
Play some tricks in btrfstune.c to avoid final super block
write on one deivce. Like the ugly code to delete the device:
========================================================================
diff --git a/btrfstune.c b/btrfstune.c
index afa3aae3..f678b978 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -101,12 +101,14 @@ static int set_metadata_uuid(struct btrfs_root 
*root, const char *uuid_string)
         struct btrfs_super_block *disk_super;
         uuid_t new_fsid, unused1, unused2;
         struct btrfs_trans_handle *trans;
+       struct btrfs_device *dev, *next;
         bool new_uuid = true;
         u64 incompat_flags;
         bool uuid_changed;
         u64 super_flags;
         int ret;

         disk_super = root->fs_info->super_copy;
         super_flags = btrfs_super_flags(disk_super);
         incompat_flags = btrfs_super_incompat_flags(disk_super);
@@ -170,6 +172,14 @@ static int set_metadata_uuid(struct btrfs_root 
*root, const char *uuid_string)
                 return 0;
         }

+       list_for_each_entry_safe(dev, next, 
&root->fs_info->fs_devices->devices,
+                                dev_list) {
+               if (dev->devid == 2) {
+                       fsync(dev->fd);
+                       list_del_init(&dev->dev_list);
+               }
+       }
+==================================================================


Compile again. call btrfstune -m again.
Then we get a device with
BTRFS_FEATURE_INCOMPAT_METADATA_UUID and FSID_CHANGING_V2.

>> The workflow in misc-tests/034 is
>>
>> loop1=$(losetup --find --show "$device2")
>> loop2=$(losetup --find --show "$device1")
>>
>> mount $loop1 /mnt ---> fails here
>>
>> Assume the fs_devices was allocated by systemd-udevd through
>> btrfs_control_ioctl() path after finish of scanning of device2.
>>
>> Then:
>>
> 
> In the two threads which are in race (below), the mount thread can't be 
> successful unless -o degraded is used, if it does it means the devid 1 

Right.. The dmesg reports the device 1 is missing.

> is already scanned and for that btrfs_device to be in the
> btrfs_fs_devices list the fsid has to match (does not matter 
> metadata_uuid).

Sorry, I doesn't make much clear what you mean. In similar but no 
metadata_uuid situation, mount will fail too but the assertion won't
fail of course. The device1 was scanned but not added into the
fs_devices(already found) since the fs_devices was opened by the
mounting thread.

> 
>> Thread *mounting device2*            Thread *scanning device1*
>>
>>
>> btrfs_mount_root                     btrfs_control_ioctl
>>
>>    mutex_lock(&uuid_mutex);
>>
>>      btrfs_read_disk_super
>>      btrfs_scan_one_device
>>      --> there is only device2
>>      in the fs_devices
>>
>>      btrfs_open_devices
>>        fs_devices->opened = 1
>>        fs_devices->latest_bdev = device2
>>
>>      mutex_unlock(&uuid_mutex);
>>
>>                                        mutex_lock(&uuid_mutex);
>>                                        btrfs_scan_one_device
>>                                          btrfs_read_disk_super
>>
>>                                          device_list_add
>>                                            found fs_devices
>>                                              device = btrfs_find_device
>>
>>                                              rewrite fs_deivces->fsid if
>>                                              scanned device1 is newer
>>                                               --> Change fs_devices->fsi
>>                                                    d to device1->fsid
>>
>>                                            if (!device)
>>                                               if(fs_devices->opened)
>>                           return -EBUSY
>>                                               --> the device1 adding
>>                                                   aborts since
>>                                                   fs_devices was opened
>>                                        mutex_unlock(&uuid_mutex);
>>    btrfs_fill_super
>>      open_ctree
>>         btrfs_read_dev_super(
>>         fs_devices->latest_bdev)
>>         --> the latest_bdev is device2
>>
>>         assert fs_devices->fsid equals
>>         device2's fsid.
>>         --> fs_device->fsid was rewritten by
>>             the scanning thread
>>
>> The result is fs_device->fsid is from device1 but super->fsid is from
>> the lastest device2.
>>
> 
>   Oops that's not good. However still not able to image various devices
>   and its fsid to achieve that condition. Is it possible to write a test
>   case? It would help.
> 
It did happened in my test environment.. You can try misc-tests/034 
about 20 times on v5.4 progs and v5.5-rc1 kernel. As for the test case,
will give a try.

Thanks
> Thanks, Anand
> 
>>>> the thread is doing mount device 1). But device 1's fsid was copied
>>>> to fs_devices->fsid then the assertion failed.
>>>
>>>
>>> dev 1 fsid should be copied iff its transid is newer.
>>>
>>
>> Even it was failed to be added into the fs_devices?
>>
>>>>
>>>> The solution is that only if a new device was added into a existing
>>>> fs_device, then the fs_devices->fsid is allowed to be rewritten.
>>>
>>> fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
>>> to the transid.
>>>
>>
>> Then the assertion failed in above scenario. Just do not update the
>> fs_devices->fsid, let later btrfs_read_sys_array() report the device
>> missing then reject to mount.
>>
>> Thanks
>>
>>>>
>>>> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario 
>>>> during fsid change")
>>>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>>>> ---
>>>>   fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>>>>   1 file changed, 21 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index d8e5560db285..9efa4123c335 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -732,6 +732,9 @@ static noinline struct btrfs_device 
>>>> *device_list_add(const char *path,
>>>>           BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>>>       bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>>>>                       BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
>>>> +    bool fs_devices_found = false;
>>>> +
>>>> +    *new_device_added = false;
>>>>
>>>>       if (fsid_change_in_progress) {
>>>>           if (!has_metadata_uuid) {
>>>> @@ -772,24 +775,11 @@ static noinline struct btrfs_device 
>>>> *device_list_add(const char *path,
>>>>
>>>>           device = NULL;
>>>>       } else {
>>>> +        fs_devices_found = true;
>>>> +
>>>>           mutex_lock(&fs_devices->device_list_mutex);
>>>>           device = btrfs_find_device(fs_devices, devid,
>>>>                   disk_super->dev_item.uuid, NULL, false);
>>>> -
>>>> -        /*
>>>> -         * If this disk has been pulled into an fs devices created by
>>>> -         * a device which had the CHANGING_FSID_V2 flag then 
>>>> replace the
>>>> -         * metadata_uuid/fsid values of the fs_devices.
>>>> -         */
>>>> -        if (has_metadata_uuid && fs_devices->fsid_change &&
>>>> -            found_transid > fs_devices->latest_generation) {
>>>> -            memcpy(fs_devices->fsid, disk_super->fsid,
>>>> -                    BTRFS_FSID_SIZE);
>>>> -            memcpy(fs_devices->metadata_uuid,
>>>> -                    disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>> -
>>>> -            fs_devices->fsid_change = false;
>>>> -        }
>>>>       }
>>>>
>>>>       if (!device) {
>>>> @@ -912,6 +902,22 @@ static noinline struct btrfs_device 
>>>> *device_list_add(const char *path,
>>>>           }
>>>>       }
>>>>
>>>> +    /*
>>>> +     * If the new added disk has been pulled into an fs devices 
>>>> created by
>>>> +     * a device which had the CHANGING_FSID_V2 flag then replace the
>>>> +     * metadata_uuid/fsid values of the fs_devices.
>>>> +     */
>>>> +    if (*new_device_added && fs_devices_found &&
>>>> +        has_metadata_uuid && fs_devices->fsid_change &&
>>>> +        found_transid > fs_devices->latest_generation) {
>>>> +        memcpy(fs_devices->fsid, disk_super->fsid,
>>>> +               BTRFS_FSID_SIZE);
>>>> +        memcpy(fs_devices->metadata_uuid,
>>>> +               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>> +
>>>> +        fs_devices->fsid_change = false;
>>>> +    }
>>>> +
>>>>       /*
>>>>        * Unmount does not free the btrfs_device struct but would zero
>>>>        * generation along with most of the other members. So just 
>>>> update
>>>>
>>
> 


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] btrfs: metadata uuid fixes and enhancements
  2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
                   ` (5 preceding siblings ...)
  2019-12-12 11:01 ` [PATCH 6/6] btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() damenly.su
@ 2019-12-13  8:03 ` Nikolay Borisov
  2019-12-16  0:49   ` Su Yue
  6 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2019-12-13  8:03 UTC (permalink / raw)
  To: damenly.su, linux-btrfs; +Cc: Su Yue



On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
> From: Su Yue <Damenly_Su@gmx.com>
> 
> This patchset fixes one reproducible bug and add two split-brain
> cases ignored.
> 
> The origin code thinks the final state of successful synced device is
> always has INCOMPAT_METADATA_UUID feature. However, a device without
> the feature flag can be the one pull into disk. This is what handled
> in the patchset. Test images are added in btrfs-progs part.
> 
> Patch[1] fixes a bug about wrong fsid copy.
> Patch[2] is for the later patches.
> Patch[3-5] add the forgotten cases.
> Patch[6] just does simple code movement for grace.
> 
> The set passes xfstests-dev without regressions.
> 
> Su Yue (6):
>   btrfs: metadata_uuid: fix failed assertion due to unsuccessful device
>     scan
>   btrfs: metadata_uuid: move split-brain handling from fs_id() to new
>     function
>   btrfs: split-brain case for scanned changing device with
>     INCOMPAT_METADATA_UUID
>   btrfs: split-brain case for scanned changed device without
>     INCOMPAT_METADATA_UUID
>   btrfs: copy fsid and metadata_uuid for pulled disk without
>     INCOMPAT_METADATA_UUID
>   btrfs: metadata_uuid: move partly logic into find_fsid_inprogress()
> 
>  fs/btrfs/volumes.c | 193 +++++++++++++++++++++++++++++----------------
>  1 file changed, 125 insertions(+), 68 deletions(-)
> 


I'm currently on holiday but the fsid change feature has a design
document here:
https://github.com/btrfs/btrfs-dev-docs/blob/master/fsid-change.txt

it lists all the cases I have handled. If you think there are other
please first describe them in prose following the parlance set out in
the document to ease reasoning.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted)
  2019-12-13  7:15         ` Su Yue
@ 2019-12-13  8:51           ` Anand Jain
  2019-12-13 10:10             ` Su Yue
  0 siblings, 1 reply; 26+ messages in thread
From: Anand Jain @ 2019-12-13  8:51 UTC (permalink / raw)
  To: Su Yue; +Cc: Nikolay Borisov, damenly.su, linux-btrfs



On 12/13/19 3:15 PM, Su Yue wrote:
> On 2019/12/13 1:36 PM, Anand Jain wrote:
>>
>>
>>   metadata_uuid code is too confusing, a lot of if and if-nots
>>   it should be have been better.
>>
> 
> Agreed. It costed much brain power of mine.
> 
>>   more below.
>>
> 
> Willing to answer from my understanding.
> If something wrong, please point at.
> 
>> On 13/12/19 10:46 AM, Su Yue wrote:
>>> On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>>>>
>>>> <snip>
>>>>
>>>>> Acutally, there are two devices in the fs. Device 2 with
>>>>> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
>>>>> fs_devices but failed to be added into since fs_devices->opened (
>>>>
>>>> It's not clear why device 1 wasn't able to be added to the fs_devices
>>>> allocated by dev 2. Please elaborate?
>>>>
>>>>
>>> Sure, of course.
>>>
>>> For example.
>>>
>>> $cat test.sh
>>> ====================================================================
>>> img1="/tmp/test1.img"
>>> img2="/tmp/test2.img"
>>>
>>> [ -f "$img1" ] || fallocate -l 300M "$img1"
>>> [ -f "$img2" ] || fallocate -l 300M "$img2"
>>>
>>> mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
>>> losetup -D
>>>
>>> dmesg -C
>>> rmmod btrfs
>>> modprobe btrfs
>>>
>>> loop1=$(losetup --find --show "$img1")
>>> loop2=$(losetup --find --show "$img2")
>>
>>   Can you explicitly show what devices should be scanned to make the
>>   device mount (below) successful. Fist you can cleanup the
>>   device list using
>>
>>     btrfs device --forget
>>
> 
> Thanks for the tip.
> The purpose of simple script is to show that there
> may be uncompleted/unsuccessful device(s) scanning due to
> fs_devices->opened. Is the issue already known?


Do you mean at line 803.

-----------
  729 static noinline struct btrfs_device *device_list_add(const char *path,
  730                            struct btrfs_super_block *disk_super,
  731                            bool *new_device_added)
  732 {

::

  802         if (!device) {
  803                 if (fs_devices->opened) {
  804                       mutex_unlock(&fs_devices->device_list_mutex);
  805                       return ERR_PTR(-EBUSY);
  806                 }
------------

fs_devices->opened indicates mounted state of the device.

If there is a missing device, the %device will still be there,
we create a dummy %device with the dev_state set to
BTRFS_DEV_STATE_MISSING.

So its wrong if we encounter device == NULL for a given fsid
which is mounted. So by error and return we keep the mounted
fs safe and fail the hijacking attack.


>>> mount $loop1 /mnt || exit 1
>>> umount /mnt
>>> ====================================================================
>>>
>>> $dmesg
>>> ====================================================================
>>> [  395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
>>> devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
>>> [  395.210773] !!!!!!!!fs_device opened
>>> [  395.213875] BTRFS info (device loop0): disk space caching is enabled
>>> [  395.214994] BTRFS info (device loop0): has skinny extents
>>> [  395.215891] BTRFS info (device loop0): flagging fs with big metadata
>>> feature
>>> [  395.222639] BTRFS error (device loop0): devid 2 uuid
>>> adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
>>> [  395.224147] BTRFS error (device loop0): failed to read the system
>>> array: -2
>>> [  395.246163] !!!!!!!!fs_device opened
>>> [  395.338219] BTRFS error (device loop0): open_ctree failed
>>> =====================================================================
>>>
>>> The line "!!!!!!!!fs_device opened" is handy added by me in debug 
>>> purpose.
>>>
>>> =====================================================================
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -794,6 +794,7 @@ static noinline struct btrfs_device
>>> *device_list_add(const char *path,
>>>
>>>          if (!device) {
>>>                  if (fs_devices->opened) {
>>> +                       pr_info("!!!!!!!!fs_device opened\n");
>>>                          mutex_unlock(&fs_devices->device_list_mutex);
>>>                          return ERR_PTR(-EBUSY);
>>>                  }
>>> =====================================================================
>>>
>>> To make it more clear. The following is in metadata_uuid situation.
>>> Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
>>> (newer transid).
>>>
>>> Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
>>> transid).
>>
>> How were you able to set FSID_CHANGING_V2

  Sorry typo, it should be BTRFS_SUPER_FLAG_CHANGING_FSID_V2.

>> and BTRFS_FEATURE_INCOMPAT_METADATA_UUID on only devid 2 ?
>>
>>
> The device2 is simulated to be the device failed to sync due
> to some expected reason (power loss).

  Ah. power loss before FSID_CHANGING_V2 is cleared. ok.

> mkfs on two devices, use v5.4 progs/btrfstune -m $device.
> Then both two devices both have the BTRFS_FEATURE_INCOMPAT_METADATA_UUID.
> Play some tricks in btrfstune.c to avoid final super block
> write on one deivce. Like the ugly code to delete the device:
> ========================================================================
> diff --git a/btrfstune.c b/btrfstune.c
> index afa3aae3..f678b978 100644
> --- a/btrfstune.c
> +++ b/btrfstune.c
> @@ -101,12 +101,14 @@ static int set_metadata_uuid(struct btrfs_root 
> *root, const char *uuid_string)
>          struct btrfs_super_block *disk_super;
>          uuid_t new_fsid, unused1, unused2;
>          struct btrfs_trans_handle *trans;
> +       struct btrfs_device *dev, *next;
>          bool new_uuid = true;
>          u64 incompat_flags;
>          bool uuid_changed;
>          u64 super_flags;
>          int ret;
> 
>          disk_super = root->fs_info->super_copy;
>          super_flags = btrfs_super_flags(disk_super);
>          incompat_flags = btrfs_super_incompat_flags(disk_super);
> @@ -170,6 +172,14 @@ static int set_metadata_uuid(struct btrfs_root 
> *root, const char *uuid_string)
>                  return 0;
>          }
> 
> +       list_for_each_entry_safe(dev, next, 
> &root->fs_info->fs_devices->devices,
> +                                dev_list) {
> +               if (dev->devid == 2) {
> +                       fsync(dev->fd);
> +                       list_del_init(&dev->dev_list);
> +               }
> +       }
> +==================================================================
> 

  Not like this. If you want to simulate failed write to the
  disk its better to do it with IO failing tools / mechanism outside
  of the btrfs-progs which probably can be a real fstests test case
  as well.

> Compile again. call btrfstune -m again.
> Then we get a device with
> BTRFS_FEATURE_INCOMPAT_METADATA_UUID and FSID_CHANGING_V2.
> 
>>> The workflow in misc-tests/034 is
>>>
>>> loop1=$(losetup --find --show "$device2")
>>> loop2=$(losetup --find --show "$device1")
>>>
>>> mount $loop1 /mnt ---> fails here
>>>
>>> Assume the fs_devices was allocated by systemd-udevd through
>>> btrfs_control_ioctl() path after finish of scanning of device2.
>>>
>>> Then:
>>>
>>
>> In the two threads which are in race (below), the mount thread can't 
>> be successful unless -o degraded is used, if it does it means the devid 1 
> 
> Right.. The dmesg reports the device 1 is missing.

  But then if you aren't using -o degraded then the mount shouldn't
  be successful. Are you using -o degraded?

> 
>> is already scanned and for that btrfs_device to be in the
>> btrfs_fs_devices list the fsid has to match (does not matter 
>> metadata_uuid).
> 
> Sorry, I doesn't make much clear what you mean. In similar but no 
> metadata_uuid situation, mount will fail too but the assertion won't
> fail of course. The device1 was scanned but not added into the
> fs_devices(already found) since the fs_devices was opened by the
> mounting thread.

  That's correct. But your test case with -o degraded and metadata_uuid
  is not clearly understood, is it possible to write a real fstests
  test case? you can use dmerror you will have control when to fail
  the IO.


>>
>>> Thread *mounting device2*            Thread *scanning device1*
>>>
>>>
>>> btrfs_mount_root                     btrfs_control_ioctl
>>>
>>>    mutex_lock(&uuid_mutex);
>>>
>>>      btrfs_read_disk_super
>>>      btrfs_scan_one_device
>>>      --> there is only device2
>>>      in the fs_devices
>>>
>>>      btrfs_open_devices
>>>        fs_devices->opened = 1
>>>        fs_devices->latest_bdev = device2
>>>
>>>      mutex_unlock(&uuid_mutex);
>>>
>>>                                        mutex_lock(&uuid_mutex);
>>>                                        btrfs_scan_one_device
>>>                                          btrfs_read_disk_super
>>>
>>>                                          device_list_add
>>>                                            found fs_devices
>>>                                              device = btrfs_find_device
>>>
>>>                                              rewrite fs_deivces->fsid if
>>>                                              scanned device1 is newer
>>>                                               --> Change fs_devices->fsi
>>>                                                    d to device1->fsid
>>>
>>>                                            if (!device)
>>>                                               if(fs_devices->opened)
>>>                           return -EBUSY
>>>                                               --> the device1 adding
>>>                                                   aborts since
>>>                                                   fs_devices was opened
>>>                                        mutex_unlock(&uuid_mutex);
>>>    btrfs_fill_super
>>>      open_ctree
>>>         btrfs_read_dev_super(
>>>         fs_devices->latest_bdev)
>>>         --> the latest_bdev is device2
>>>
>>>         assert fs_devices->fsid equals
>>>         device2's fsid.
>>>         --> fs_device->fsid was rewritten by
>>>             the scanning thread
>>>
>>> The result is fs_device->fsid is from device1 but super->fsid is from
>>> the lastest device2.
>>>
>>
>>   Oops that's not good. However still not able to image various devices
>>   and its fsid to achieve that condition. Is it possible to write a test
>>   case? It would help.
>>
> It did happened in my test environment.. You can try misc-tests/034 
> about 20 times on v5.4 progs and v5.5-rc1 kernel. As for the test case,
> will give a try.

Let me try.

Thanks, Anand


> Thanks
>> Thanks, Anand
>>
>>>>> the thread is doing mount device 1). But device 1's fsid was copied
>>>>> to fs_devices->fsid then the assertion failed.
>>>>
>>>>
>>>> dev 1 fsid should be copied iff its transid is newer.
>>>>
>>>
>>> Even it was failed to be added into the fs_devices?
>>>
>>>>>
>>>>> The solution is that only if a new device was added into a existing
>>>>> fs_device, then the fs_devices->fsid is allowed to be rewritten.
>>>>
>>>> fs_devices->fsid must be re-written by any device which is _newer_ 
>>>> w.r.t
>>>> to the transid.
>>>>
>>>
>>> Then the assertion failed in above scenario. Just do not update the
>>> fs_devices->fsid, let later btrfs_read_sys_array() report the device
>>> missing then reject to mount.
>>>
>>> Thanks
>>>
>>>>>
>>>>> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario 
>>>>> during fsid change")
>>>>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>>>>> ---
>>>>>   fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>>>>>   1 file changed, 21 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>> index d8e5560db285..9efa4123c335 100644
>>>>> --- a/fs/btrfs/volumes.c
>>>>> +++ b/fs/btrfs/volumes.c
>>>>> @@ -732,6 +732,9 @@ static noinline struct btrfs_device 
>>>>> *device_list_add(const char *path,
>>>>>           BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>>>>       bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>>>>>                       BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
>>>>> +    bool fs_devices_found = false;
>>>>> +
>>>>> +    *new_device_added = false;
>>>>>
>>>>>       if (fsid_change_in_progress) {
>>>>>           if (!has_metadata_uuid) {
>>>>> @@ -772,24 +775,11 @@ static noinline struct btrfs_device 
>>>>> *device_list_add(const char *path,
>>>>>
>>>>>           device = NULL;
>>>>>       } else {
>>>>> +        fs_devices_found = true;
>>>>> +
>>>>>           mutex_lock(&fs_devices->device_list_mutex);
>>>>>           device = btrfs_find_device(fs_devices, devid,
>>>>>                   disk_super->dev_item.uuid, NULL, false);
>>>>> -
>>>>> -        /*
>>>>> -         * If this disk has been pulled into an fs devices created by
>>>>> -         * a device which had the CHANGING_FSID_V2 flag then 
>>>>> replace the
>>>>> -         * metadata_uuid/fsid values of the fs_devices.
>>>>> -         */
>>>>> -        if (has_metadata_uuid && fs_devices->fsid_change &&
>>>>> -            found_transid > fs_devices->latest_generation) {
>>>>> -            memcpy(fs_devices->fsid, disk_super->fsid,
>>>>> -                    BTRFS_FSID_SIZE);
>>>>> -            memcpy(fs_devices->metadata_uuid,
>>>>> -                    disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>>> -
>>>>> -            fs_devices->fsid_change = false;
>>>>> -        }
>>>>>       }
>>>>>
>>>>>       if (!device) {
>>>>> @@ -912,6 +902,22 @@ static noinline struct btrfs_device 
>>>>> *device_list_add(const char *path,
>>>>>           }
>>>>>       }
>>>>>
>>>>> +    /*
>>>>> +     * If the new added disk has been pulled into an fs devices 
>>>>> created by
>>>>> +     * a device which had the CHANGING_FSID_V2 flag then replace the
>>>>> +     * metadata_uuid/fsid values of the fs_devices.
>>>>> +     */
>>>>> +    if (*new_device_added && fs_devices_found &&
>>>>> +        has_metadata_uuid && fs_devices->fsid_change &&
>>>>> +        found_transid > fs_devices->latest_generation) {
>>>>> +        memcpy(fs_devices->fsid, disk_super->fsid,
>>>>> +               BTRFS_FSID_SIZE);
>>>>> +        memcpy(fs_devices->metadata_uuid,
>>>>> +               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>>> +
>>>>> +        fs_devices->fsid_change = false;
>>>>> +    }
>>>>> +
>>>>>       /*
>>>>>        * Unmount does not free the btrfs_device struct but would zero
>>>>>        * generation along with most of the other members. So just 
>>>>> update
>>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted)
  2019-12-13  8:51           ` Anand Jain
@ 2019-12-13 10:10             ` Su Yue
  0 siblings, 0 replies; 26+ messages in thread
From: Su Yue @ 2019-12-13 10:10 UTC (permalink / raw)
  To: Anand Jain; +Cc: Nikolay Borisov, damenly.su, linux-btrfs

On 2019/12/13 4:51 PM, Anand Jain wrote:
> 
> 
> On 12/13/19 3:15 PM, Su Yue wrote:
>> On 2019/12/13 1:36 PM, Anand Jain wrote:
>>>
>>>
>>>   metadata_uuid code is too confusing, a lot of if and if-nots
>>>   it should be have been better.
>>>
>>
>> Agreed. It costed much brain power of mine.
>>
>>>   more below.
>>>
>>
>> Willing to answer from my understanding.
>> If something wrong, please point at.
>>
>>> On 13/12/19 10:46 AM, Su Yue wrote:
>>>> On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
>>>>>
>>>>>
>>>>> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>>>>>
>>>>> <snip>
>>>>>
>>>>>> Acutally, there are two devices in the fs. Device 2 with
>>>>>> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
>>>>>> fs_devices but failed to be added into since fs_devices->opened (
>>>>>
>>>>> It's not clear why device 1 wasn't able to be added to the fs_devices
>>>>> allocated by dev 2. Please elaborate?
>>>>>
>>>>>
>>>> Sure, of course.
>>>>
>>>> For example.
>>>>
>>>> $cat test.sh
>>>> ====================================================================
>>>> img1="/tmp/test1.img"
>>>> img2="/tmp/test2.img"
>>>>
>>>> [ -f "$img1" ] || fallocate -l 300M "$img1"
>>>> [ -f "$img2" ] || fallocate -l 300M "$img2"
>>>>
>>>> mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
>>>> losetup -D
>>>>
>>>> dmesg -C
>>>> rmmod btrfs
>>>> modprobe btrfs
>>>>
>>>> loop1=$(losetup --find --show "$img1")
>>>> loop2=$(losetup --find --show "$img2")
>>>
>>>   Can you explicitly show what devices should be scanned to make the
>>>   device mount (below) successful. Fist you can cleanup the
>>>   device list using
>>>
>>>     btrfs device --forget
>>>
>>
>> Thanks for the tip.
>> The purpose of simple script is to show that there
>> may be uncompleted/unsuccessful device(s) scanning due to
>> fs_devices->opened. Is the issue already known?
> 
> 
> Do you mean at line 803.
> 
> -----------
>   729 static noinline struct btrfs_device *device_list_add(const char 
> *path,
>   730                            struct btrfs_super_block *disk_super,
>   731                            bool *new_device_added)
>   732 {
> 
> ::
> 
>   802         if (!device) {
>   803                 if (fs_devices->opened) {
>   804                       mutex_unlock(&fs_devices->device_list_mutex);
>   805                       return ERR_PTR(-EBUSY);
>   806                 }
> ------------
> 
> fs_devices->opened indicates mounted state of the device.
> 
> If there is a missing device, the %device will still be there,
> we create a dummy %device with the dev_state set to
> BTRFS_DEV_STATE_MISSING.
> 
> So its wrong if we encounter device == NULL for a given fsid
> which is mounted. So by error and return we keep the mounted
> fs safe and fail the hijacking attack.
> 
> 
Thanks for your patient explantion. If I understand the
BTRFS_DEV_STATE_MISSING correctly, dummmy device with the state
is adding only if "-o degraded" in read_one_chunk()?

>>>> mount $loop1 /mnt || exit 1
>>>> umount /mnt
>>>> ====================================================================
>>>>
>>>> $dmesg
>>>> ====================================================================
>>>> [  395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
>>>> devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
>>>> [  395.210773] !!!!!!!!fs_device opened
>>>> [  395.213875] BTRFS info (device loop0): disk space caching is enabled
>>>> [  395.214994] BTRFS info (device loop0): has skinny extents
>>>> [  395.215891] BTRFS info (device loop0): flagging fs with big metadata
>>>> feature
>>>> [  395.222639] BTRFS error (device loop0): devid 2 uuid
>>>> adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
>>>> [  395.224147] BTRFS error (device loop0): failed to read the system
>>>> array: -2
>>>> [  395.246163] !!!!!!!!fs_device opened
>>>> [  395.338219] BTRFS error (device loop0): open_ctree failed
>>>> =====================================================================
>>>>
>>>> The line "!!!!!!!!fs_device opened" is handy added by me in debug 
>>>> purpose.
>>>>
>>>> =====================================================================
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -794,6 +794,7 @@ static noinline struct btrfs_device
>>>> *device_list_add(const char *path,
>>>>
>>>>          if (!device) {
>>>>                  if (fs_devices->opened) {
>>>> +                       pr_info("!!!!!!!!fs_device opened\n");
>>>>                          mutex_unlock(&fs_devices->device_list_mutex);
>>>>                          return ERR_PTR(-EBUSY);
>>>>                  }
>>>> =====================================================================
>>>>
>>>> To make it more clear. The following is in metadata_uuid situation.
>>>> Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
>>>> (newer transid).
>>>>
>>>> Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
>>>> transid).
>>>
>>> How were you able to set FSID_CHANGING_V2
> 
>   Sorry typo, it should be BTRFS_SUPER_FLAG_CHANGING_FSID_V2.
> 
>>> and BTRFS_FEATURE_INCOMPAT_METADATA_UUID on only devid 2 ?
>>>
>>>
>> The device2 is simulated to be the device failed to sync due
>> to some expected reason (power loss).
> 
>   Ah. power loss before FSID_CHANGING_V2 is cleared. ok.
> 
>> mkfs on two devices, use v5.4 progs/btrfstune -m $device.
>> Then both two devices both have the BTRFS_FEATURE_INCOMPAT_METADATA_UUID.
>> Play some tricks in btrfstune.c to avoid final super block
>> write on one deivce. Like the ugly code to delete the device:
>> ========================================================================
>> diff --git a/btrfstune.c b/btrfstune.c
>> index afa3aae3..f678b978 100644
>> --- a/btrfstune.c
>> +++ b/btrfstune.c
>> @@ -101,12 +101,14 @@ static int set_metadata_uuid(struct btrfs_root 
>> *root, const char *uuid_string)
>>          struct btrfs_super_block *disk_super;
>>          uuid_t new_fsid, unused1, unused2;
>>          struct btrfs_trans_handle *trans;
>> +       struct btrfs_device *dev, *next;
>>          bool new_uuid = true;
>>          u64 incompat_flags;
>>          bool uuid_changed;
>>          u64 super_flags;
>>          int ret;
>>
>>          disk_super = root->fs_info->super_copy;
>>          super_flags = btrfs_super_flags(disk_super);
>>          incompat_flags = btrfs_super_incompat_flags(disk_super);
>> @@ -170,6 +172,14 @@ static int set_metadata_uuid(struct btrfs_root 
>> *root, const char *uuid_string)
>>                  return 0;
>>          }
>>
>> +       list_for_each_entry_safe(dev, next, 
>> &root->fs_info->fs_devices->devices,
>> +                                dev_list) {
>> +               if (dev->devid == 2) {
>> +                       fsync(dev->fd);
>> +                       list_del_init(&dev->dev_list);
>> +               }
>> +       }
>> +==================================================================
>>
> 
>   Not like this. If you want to simulate failed write to the
>   disk its better to do it with IO failing tools / mechanism outside
>   of the btrfs-progs which probably can be a real fstests test case
>   as well.
> 
>> Compile again. call btrfstune -m again.
>> Then we get a device with
>> BTRFS_FEATURE_INCOMPAT_METADATA_UUID and FSID_CHANGING_V2.
>>
>>>> The workflow in misc-tests/034 is
>>>>
>>>> loop1=$(losetup --find --show "$device2")
>>>> loop2=$(losetup --find --show "$device1")
>>>>
>>>> mount $loop1 /mnt ---> fails here
>>>>
>>>> Assume the fs_devices was allocated by systemd-udevd through
>>>> btrfs_control_ioctl() path after finish of scanning of device2.
>>>>
>>>> Then:
>>>>
>>>
>>> In the two threads which are in race (below), the mount thread can't 
>>> be successful unless -o degraded is used, if it does it means the 
>>> devid 1 
>>
>> Right.. The dmesg reports the device 1 is missing.
> 
>   But then if you aren't using -o degraded then the mount shouldn't
>   be successful. Are you using -o degraded?
> 
No.. I know what you mean, if the race happened the mount operation 
should be expected to fail. And degraded mount solves the problem.

It makes me quite confused whether the workflow is correct.
Should I alwats add the "-o degraded" after the two 'losteup'?

>>
>>> is already scanned and for that btrfs_device to be in the
>>> btrfs_fs_devices list the fsid has to match (does not matter 
>>> metadata_uuid).
>>
>> Sorry, I doesn't make much clear what you mean. In similar but no 
>> metadata_uuid situation, mount will fail too but the assertion won't
>> fail of course. The device1 was scanned but not added into the
>> fs_devices(already found) since the fs_devices was opened by the
>> mounting thread.
> 
>   That's correct. But your test case with -o degraded and metadata_uuid
>   is not clearly understood, is it possible to write a real fstests
>   test case? you can use dmerror you will have control when to fail
>   the IO.
> 
Of course, I will do.

Thanks

> 
>>>
>>>> Thread *mounting device2*            Thread *scanning device1*
>>>>
>>>>
>>>> btrfs_mount_root                     btrfs_control_ioctl
>>>>
>>>>    mutex_lock(&uuid_mutex);
>>>>
>>>>      btrfs_read_disk_super
>>>>      btrfs_scan_one_device
>>>>      --> there is only device2
>>>>      in the fs_devices
>>>>
>>>>      btrfs_open_devices
>>>>        fs_devices->opened = 1
>>>>        fs_devices->latest_bdev = device2
>>>>
>>>>      mutex_unlock(&uuid_mutex);
>>>>
>>>>                                        mutex_lock(&uuid_mutex);
>>>>                                        btrfs_scan_one_device
>>>>                                          btrfs_read_disk_super
>>>>
>>>>                                          device_list_add
>>>>                                            found fs_devices
>>>>                                              device = btrfs_find_device
>>>>
>>>>                                              rewrite 
>>>> fs_deivces->fsid if
>>>>                                              scanned device1 is newer
>>>>                                               --> Change 
>>>> fs_devices->fsi
>>>>                                                    d to device1->fsid
>>>>
>>>>                                            if (!device)
>>>>                                               if(fs_devices->opened)
>>>>                           return -EBUSY
>>>>                                               --> the device1 adding
>>>>                                                   aborts since
>>>>                                                   fs_devices was opened
>>>>                                        mutex_unlock(&uuid_mutex);
>>>>    btrfs_fill_super
>>>>      open_ctree
>>>>         btrfs_read_dev_super(
>>>>         fs_devices->latest_bdev)
>>>>         --> the latest_bdev is device2
>>>>
>>>>         assert fs_devices->fsid equals
>>>>         device2's fsid.
>>>>         --> fs_device->fsid was rewritten by
>>>>             the scanning thread
>>>>
>>>> The result is fs_device->fsid is from device1 but super->fsid is from
>>>> the lastest device2.
>>>>
>>>
>>>   Oops that's not good. However still not able to image various devices
>>>   and its fsid to achieve that condition. Is it possible to write a test
>>>   case? It would help.
>>>
>> It did happened in my test environment.. You can try misc-tests/034 
>> about 20 times on v5.4 progs and v5.5-rc1 kernel. As for the test case,
>> will give a try.
> 
> Let me try.
> 
> Thanks, Anand
> 
> 
>> Thanks
>>> Thanks, Anand
>>>
>>>>>> the thread is doing mount device 1). But device 1's fsid was copied
>>>>>> to fs_devices->fsid then the assertion failed.
>>>>>
>>>>>
>>>>> dev 1 fsid should be copied iff its transid is newer.
>>>>>
>>>>
>>>> Even it was failed to be added into the fs_devices?
>>>>
>>>>>>
>>>>>> The solution is that only if a new device was added into a existing
>>>>>> fs_device, then the fs_devices->fsid is allowed to be rewritten.
>>>>>
>>>>> fs_devices->fsid must be re-written by any device which is _newer_ 
>>>>> w.r.t
>>>>> to the transid.
>>>>>
>>>>
>>>> Then the assertion failed in above scenario. Just do not update the
>>>> fs_devices->fsid, let later btrfs_read_sys_array() report the device
>>>> missing then reject to mount.
>>>>
>>>> Thanks
>>>>
>>>>>>
>>>>>> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario 
>>>>>> during fsid change")
>>>>>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>>>>>> ---
>>>>>>   fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>>>>>>   1 file changed, 21 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>>> index d8e5560db285..9efa4123c335 100644
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -732,6 +732,9 @@ static noinline struct btrfs_device 
>>>>>> *device_list_add(const char *path,
>>>>>>           BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>>>>>       bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>>>>>>                       BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
>>>>>> +    bool fs_devices_found = false;
>>>>>> +
>>>>>> +    *new_device_added = false;
>>>>>>
>>>>>>       if (fsid_change_in_progress) {
>>>>>>           if (!has_metadata_uuid) {
>>>>>> @@ -772,24 +775,11 @@ static noinline struct btrfs_device 
>>>>>> *device_list_add(const char *path,
>>>>>>
>>>>>>           device = NULL;
>>>>>>       } else {
>>>>>> +        fs_devices_found = true;
>>>>>> +
>>>>>>           mutex_lock(&fs_devices->device_list_mutex);
>>>>>>           device = btrfs_find_device(fs_devices, devid,
>>>>>>                   disk_super->dev_item.uuid, NULL, false);
>>>>>> -
>>>>>> -        /*
>>>>>> -         * If this disk has been pulled into an fs devices 
>>>>>> created by
>>>>>> -         * a device which had the CHANGING_FSID_V2 flag then 
>>>>>> replace the
>>>>>> -         * metadata_uuid/fsid values of the fs_devices.
>>>>>> -         */
>>>>>> -        if (has_metadata_uuid && fs_devices->fsid_change &&
>>>>>> -            found_transid > fs_devices->latest_generation) {
>>>>>> -            memcpy(fs_devices->fsid, disk_super->fsid,
>>>>>> -                    BTRFS_FSID_SIZE);
>>>>>> -            memcpy(fs_devices->metadata_uuid,
>>>>>> -                    disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>>>> -
>>>>>> -            fs_devices->fsid_change = false;
>>>>>> -        }
>>>>>>       }
>>>>>>
>>>>>>       if (!device) {
>>>>>> @@ -912,6 +902,22 @@ static noinline struct btrfs_device 
>>>>>> *device_list_add(const char *path,
>>>>>>           }
>>>>>>       }
>>>>>>
>>>>>> +    /*
>>>>>> +     * If the new added disk has been pulled into an fs devices 
>>>>>> created by
>>>>>> +     * a device which had the CHANGING_FSID_V2 flag then replace the
>>>>>> +     * metadata_uuid/fsid values of the fs_devices.
>>>>>> +     */
>>>>>> +    if (*new_device_added && fs_devices_found &&
>>>>>> +        has_metadata_uuid && fs_devices->fsid_change &&
>>>>>> +        found_transid > fs_devices->latest_generation) {
>>>>>> +        memcpy(fs_devices->fsid, disk_super->fsid,
>>>>>> +               BTRFS_FSID_SIZE);
>>>>>> +        memcpy(fs_devices->metadata_uuid,
>>>>>> +               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>>>> +
>>>>>> +        fs_devices->fsid_change = false;
>>>>>> +    }
>>>>>> +
>>>>>>       /*
>>>>>>        * Unmount does not free the btrfs_device struct but would zero
>>>>>>        * generation along with most of the other members. So just 
>>>>>> update
>>>>>>
>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] btrfs: metadata uuid fixes and enhancements
  2019-12-13  8:03 ` [PATCH 0/6] btrfs: metadata uuid fixes and enhancements Nikolay Borisov
@ 2019-12-16  0:49   ` Su Yue
  0 siblings, 0 replies; 26+ messages in thread
From: Su Yue @ 2019-12-16  0:49 UTC (permalink / raw)
  To: Nikolay Borisov, damenly.su, linux-btrfs

On 2019/12/13 4:03 PM, Nikolay Borisov wrote:
>
>
> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>> From: Su Yue <Damenly_Su@gmx.com>
>>
>> This patchset fixes one reproducible bug and add two split-brain
>> cases ignored.
>>
>> The origin code thinks the final state of successful synced device is
>> always has INCOMPAT_METADATA_UUID feature. However, a device without
>> the feature flag can be the one pull into disk. This is what handled
>> in the patchset. Test images are added in btrfs-progs part.
>>
>> Patch[1] fixes a bug about wrong fsid copy.
>> Patch[2] is for the later patches.
>> Patch[3-5] add the forgotten cases.
>> Patch[6] just does simple code movement for grace.
>>
>> The set passes xfstests-dev without regressions.
>>
>> Su Yue (6):
>>    btrfs: metadata_uuid: fix failed assertion due to unsuccessful device
>>      scan
>>    btrfs: metadata_uuid: move split-brain handling from fs_id() to new
>>      function
>>    btrfs: split-brain case for scanned changing device with
>>      INCOMPAT_METADATA_UUID
>>    btrfs: split-brain case for scanned changed device without
>>      INCOMPAT_METADATA_UUID
>>    btrfs: copy fsid and metadata_uuid for pulled disk without
>>      INCOMPAT_METADATA_UUID
>>    btrfs: metadata_uuid: move partly logic into find_fsid_inprogress()
>>
>>   fs/btrfs/volumes.c | 193 +++++++++++++++++++++++++++++----------------
>>   1 file changed, 125 insertions(+), 68 deletions(-)
>>
>
>
> I'm currently on holiday but the fsid change feature has a design
> document here:
> https://github.com/btrfs/btrfs-dev-docs/blob/master/fsid-change.txt
>
> it lists all the cases I have handled. If you think there are other
> please first describe them in prose following the parlance set out in
> the document to ease reasoning.
>

Thanks. I am going to do it.
The following is just a rough version for people to understand.
The pull request version will be more official like expressions in
btrfs-dev-docs.


 > 4. Failure during transaction x + 1. When such a failure happens the
 > filesystem in question will be partitioned in two sets P and Q. Where P
 > would have the C flag and a new value for fsid written to it as well
as the
 > old FSID value written in the ‘metadata_uuid’ field. In contrast Q would
 > have just the old fsid and the IP flag, also the superblock’s generation
 > number of disks in P will be higher than those in Q. Again two cases
needs
 > to be handled:

I won't argue the Q has the old fsid and IP flag. There is another state
of P.


0) There are two devices with fsid A, without metadata_uuid. E.g
        d1[A, 0, 0]
        d2[A, 0, 0]
        (The first element is the FSID, the 2nd is METADATA_UUID, the 3rd is
         the incompat flag bits)

1) After running "btrfstune -M B":
        d1[B, A, METADATA_UUID]
        d2[B, A, METADATA_UUID]

2) During "btrfstune -M a",
2.1) After first btrfs_commit_transaction() of set_metadata_uuid() finished:

        d1[B, A, METADATA_UUID | FSID_CHANGING_V2]
        d2[B, A, METADATA_UUID | FSID_CHANGING_V2]

2.2) Then run into the branch  btrfstune.c: line 141.

	    if (new_uuid && uuid_changed && memcmp(disk_super->metadata_uuid,
                                                new_fsid,
BTRFS_FSID_SIZE) == 0) {
                 /*
                  * Changing fsid to be the same as metadata uuid, so just
                  * disable the flag
                  */
                 memcpy(disk_super->fsid, &new_fsid, BTRFS_FSID_SIZE);
                 incompat_flags &= ~BTRFS_FEATURE_INCOMPAT_METADATA_UUID;
                 btrfs_set_super_incompat_flags(disk_super, incompat_flags);
                 memset(disk_super->metadata_uuid, 0, BTRFS_FSID_SIZE);

      Now @disk_super is [A, 0, FSID_CHANGING_V2], without METADATA_UUID
flag.

2.3) Then we go to the final btrfs_commit_transaction() of
set_metadata_uuid().
      But powerloss happened, and only d1 get synced:

       d1[A, 0, 0]                                 ---> P
       d2[B, A, METADATA_UUID | FSID_CHANGING_V2]  ---> Q



      So there are 2 cases you forgot to consider.
      - P is scanned first, then Q.
      - Q is scanned first, then P.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk without INCOMPAT_METADATA_UUID
  2019-12-12 11:01 ` [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk " damenly.su
@ 2020-01-06 15:12   ` Nikolay Borisov
  2020-01-07  1:31     ` Su Yue
  0 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2020-01-06 15:12 UTC (permalink / raw)
  To: damenly.su, linux-btrfs; +Cc: Su Yue



On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
> From: Su Yue <Damenly_Su@gmx.com>
> 
> Since a scanned device may be the device pulled into disk without
> metadata_uuid feature, there may already be changing devices there.
> Here copy fsid and metadata_uuid for above case.
> 
> Signed-off-by: Su Yue <Damenly_Su@gmx.com>


What does this patch fix, why is it needed? It seems to be independent
of the split brain fixes?

> ---
>  fs/btrfs/volumes.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index faf9cdd14f33..b21ab45e76a0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -964,13 +964,16 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  	 * metadata_uuid/fsid values of the fs_devices.
>  	 */
>  	if (*new_device_added && fs_devices_found &&
> -	    has_metadata_uuid && fs_devices->fsid_change &&
> +	    fs_devices->fsid_change &&
>  	    found_transid > fs_devices->latest_generation) {
>  		memcpy(fs_devices->fsid, disk_super->fsid,
>  		       BTRFS_FSID_SIZE);
> -		memcpy(fs_devices->metadata_uuid,
> -		       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
> -
> +		if (has_metadata_uuid)
> +			memcpy(fs_devices->metadata_uuid,
> +			       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
> +		else
> +			memcpy(fs_devices->metadata_uuid,
> +			       disk_super->fsid, BTRFS_FSID_SIZE);
>  		fs_devices->fsid_change = false;
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk without INCOMPAT_METADATA_UUID
  2020-01-06 15:12   ` Nikolay Borisov
@ 2020-01-07  1:31     ` Su Yue
  2020-01-07  7:18       ` Nikolay Borisov
  0 siblings, 1 reply; 26+ messages in thread
From: Su Yue @ 2020-01-07  1:31 UTC (permalink / raw)
  To: Nikolay Borisov, damenly.su, linux-btrfs

On 2020/1/6 11:12 PM, Nikolay Borisov wrote:
>
>
> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>> From: Su Yue <Damenly_Su@gmx.com>
>>
>> Since a scanned device may be the device pulled into disk without
>> metadata_uuid feature, there may already be changing devices there.
>> Here copy fsid and metadata_uuid for above case.
>>
>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>
>
> What does this patch fix, why is it needed? It seems to be independent
> of the split brain fixes?
>

Sorry for the messy and short commit log.
It's one of the split brain fixes.

As mails I replied you earlier, the case
is for device which succeed to sync in
the second transaction and is without
metadata_uuid feature. If there is fs_devices
already scanned, the device's fsid instead of
metadata_uuid(NULL here) should be copied into
the fs_devices->metada_uuid field.

Thanks.

>> ---
>>   fs/btrfs/volumes.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index faf9cdd14f33..b21ab45e76a0 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -964,13 +964,16 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>>   	 * metadata_uuid/fsid values of the fs_devices.
>>   	 */
>>   	if (*new_device_added && fs_devices_found &&
>> -	    has_metadata_uuid && fs_devices->fsid_change &&
>> +	    fs_devices->fsid_change &&
>>   	    found_transid > fs_devices->latest_generation) {
>>   		memcpy(fs_devices->fsid, disk_super->fsid,
>>   		       BTRFS_FSID_SIZE);
>> -		memcpy(fs_devices->metadata_uuid,
>> -		       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>> -
>> +		if (has_metadata_uuid)
>> +			memcpy(fs_devices->metadata_uuid,
>> +			       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>> +		else
>> +			memcpy(fs_devices->metadata_uuid,
>> +			       disk_super->fsid, BTRFS_FSID_SIZE);
>>   		fs_devices->fsid_change = false;
>>   	}
>>
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk without INCOMPAT_METADATA_UUID
  2020-01-07  1:31     ` Su Yue
@ 2020-01-07  7:18       ` Nikolay Borisov
  2020-01-07  7:34         ` Su Yue
  0 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2020-01-07  7:18 UTC (permalink / raw)
  To: Su Yue, damenly.su, linux-btrfs



On 7.01.20 г. 3:31 ч., Su Yue wrote:
> On 2020/1/6 11:12 PM, Nikolay Borisov wrote:
>>
>>
>> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>>> From: Su Yue <Damenly_Su@gmx.com>
>>>
>>> Since a scanned device may be the device pulled into disk without
>>> metadata_uuid feature, there may already be changing devices there.
>>> Here copy fsid and metadata_uuid for above case.
>>>
>>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>>
>>
>> What does this patch fix, why is it needed? It seems to be independent
>> of the split brain fixes?
>>
> 
> Sorry for the messy and short commit log.
> It's one of the split brain fixes.
> 
> As mails I replied you earlier, the case
> is for device which succeed to sync in
> the second transaction and is without
> metadata_uuid feature. If there is fs_devices
> already scanned, the device's fsid instead of
> metadata_uuid(NULL here) should be copied into
> the fs_devices->metada_uuid field.

I figures as much as I started tackling the problem. So this must be
part of the patch which fixes aforementioned split brain scenario. I
have already developed some patches + tests. So will be sending those,
since they are very similar to what you posted originally I will retain
your SOB line.

> 
> Thanks.
> 
>>> ---
>>>   fs/btrfs/volumes.c | 11 +++++++----
>>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index faf9cdd14f33..b21ab45e76a0 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -964,13 +964,16 @@ static noinline struct btrfs_device
>>> *device_list_add(const char *path,
>>>        * metadata_uuid/fsid values of the fs_devices.
>>>        */
>>>       if (*new_device_added && fs_devices_found &&
>>> -        has_metadata_uuid && fs_devices->fsid_change &&
>>> +        fs_devices->fsid_change &&
>>>           found_transid > fs_devices->latest_generation) {
>>>           memcpy(fs_devices->fsid, disk_super->fsid,
>>>                  BTRFS_FSID_SIZE);
>>> -        memcpy(fs_devices->metadata_uuid,
>>> -               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>> -
>>> +        if (has_metadata_uuid)
>>> +            memcpy(fs_devices->metadata_uuid,
>>> +                   disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>> +        else
>>> +            memcpy(fs_devices->metadata_uuid,
>>> +                   disk_super->fsid, BTRFS_FSID_SIZE);
>>>           fs_devices->fsid_change = false;
>>>       }
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk without INCOMPAT_METADATA_UUID
  2020-01-07  7:18       ` Nikolay Borisov
@ 2020-01-07  7:34         ` Su Yue
  0 siblings, 0 replies; 26+ messages in thread
From: Su Yue @ 2020-01-07  7:34 UTC (permalink / raw)
  To: Nikolay Borisov, damenly.su, linux-btrfs

On 2020/1/7 3:18 PM, Nikolay Borisov wrote:
>
>
> On 7.01.20 г. 3:31 ч., Su Yue wrote:
>> On 2020/1/6 11:12 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>>>> From: Su Yue <Damenly_Su@gmx.com>
>>>>
>>>> Since a scanned device may be the device pulled into disk without
>>>> metadata_uuid feature, there may already be changing devices there.
>>>> Here copy fsid and metadata_uuid for above case.
>>>>
>>>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>>>
>>>
>>> What does this patch fix, why is it needed? It seems to be independent
>>> of the split brain fixes?
>>>
>>
>> Sorry for the messy and short commit log.
>> It's one of the split brain fixes.
>>
>> As mails I replied you earlier, the case
>> is for device which succeed to sync in
>> the second transaction and is without
>> metadata_uuid feature. If there is fs_devices
>> already scanned, the device's fsid instead of
>> metadata_uuid(NULL here) should be copied into
>> the fs_devices->metada_uuid field.
>
> I figures as much as I started tackling the problem. So this must be
> part of the patch which fixes aforementioned split brain scenario. I
> have already developed some patches + tests. So will be sending those,
> since they are very similar to what you posted originally I will retain
> your SOB line.
>

I'm okay about this way. Thanks.
>>
>> Thanks.
>>
>>>> ---
>>>>    fs/btrfs/volumes.c | 11 +++++++----
>>>>    1 file changed, 7 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index faf9cdd14f33..b21ab45e76a0 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -964,13 +964,16 @@ static noinline struct btrfs_device
>>>> *device_list_add(const char *path,
>>>>         * metadata_uuid/fsid values of the fs_devices.
>>>>         */
>>>>        if (*new_device_added && fs_devices_found &&
>>>> -        has_metadata_uuid && fs_devices->fsid_change &&
>>>> +        fs_devices->fsid_change &&
>>>>            found_transid > fs_devices->latest_generation) {
>>>>            memcpy(fs_devices->fsid, disk_super->fsid,
>>>>                   BTRFS_FSID_SIZE);
>>>> -        memcpy(fs_devices->metadata_uuid,
>>>> -               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>> -
>>>> +        if (has_metadata_uuid)
>>>> +            memcpy(fs_devices->metadata_uuid,
>>>> +                   disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>> +        else
>>>> +            memcpy(fs_devices->metadata_uuid,
>>>> +                   disk_super->fsid, BTRFS_FSID_SIZE);
>>>>            fs_devices->fsid_change = false;
>>>>        }
>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-01-07  7:34 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
2019-12-12 11:01 ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan damenly.su
2019-12-12 14:15   ` Nikolay Borisov
2019-12-13  2:30     ` Su Yue
2019-12-13  2:46     ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted) Su Yue
2019-12-13  5:36       ` Anand Jain
2019-12-13  7:15         ` Su Yue
2019-12-13  8:51           ` Anand Jain
2019-12-13 10:10             ` Su Yue
2019-12-12 11:01 ` [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function damenly.su
2019-12-12 13:05   ` Nikolay Borisov
2019-12-12 13:32     ` Su Yue
2019-12-12 11:01 ` [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID damenly.su
2019-12-12 13:24   ` Su Yue
2019-12-12 13:34   ` Nikolay Borisov
2019-12-12 14:19     ` Su Yue
2019-12-12 11:01 ` [PATCH 4/6] btrfs: split-brain case for scanned changed device without INCOMPAT_METADATA_UUID damenly.su
2019-12-12 11:01 ` [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk " damenly.su
2020-01-06 15:12   ` Nikolay Borisov
2020-01-07  1:31     ` Su Yue
2020-01-07  7:18       ` Nikolay Borisov
2020-01-07  7:34         ` Su Yue
2019-12-12 11:01 ` [PATCH 6/6] btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() damenly.su
2019-12-12 13:37   ` Nikolay Borisov
2019-12-13  8:03 ` [PATCH 0/6] btrfs: metadata uuid fixes and enhancements Nikolay Borisov
2019-12-16  0:49   ` Su Yue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).