All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
@ 2023-05-04 17:07 Guilherme G. Piccoli
  2023-05-04 17:07 ` [PATCH 1/2] btrfs: Introduce the virtual_fsid feature Guilherme G. Piccoli
                   ` (5 more replies)
  0 siblings, 6 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-04 17:07 UTC (permalink / raw)
  To: linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov, Guilherme G. Piccoli

Hi folks, this is an attempt of supporting same fsid mounting on btrfs.
Currently, we cannot reliably mount same fsid filesystems even one at
a time in btrfs, but if users want to mount them at the same time, it's
pretty much impossible. Other filesystems like ext4 are capable of that.

The goal is to allow systems with A/B partitioning scheme (like the
Steam Deck console or various mobile devices) to be able to hold
the same filesystem image in both partitions; it also allows to have
block device level check for filesystem integrity - this is used in the
Steam Deck image installation, to check if the current read-only image
is pristine. A bit more details are provided in the following ML thread:

https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/

The mechanism used to achieve it is based in the metadata_uuid feature,
leveraging such code infrastructure for that. The patches are based on
kernel 6.3 and were tested both in a virtual machine as well as in the
Steam Deck. Comments, suggestions and overall feedback is greatly
appreciated - thanks in advance!

Cheers,


Guilherme


Guilherme G. Piccoli (2):
  btrfs: Introduce the virtual_fsid feature
  btrfs: Add module parameter to enable non-mount scan skipping

 fs/btrfs/disk-io.c |  22 +++++++--
 fs/btrfs/ioctl.c   |  18 ++++++++
 fs/btrfs/super.c   |  41 ++++++++++++-----
 fs/btrfs/super.h   |   1 +
 fs/btrfs/volumes.c | 111 +++++++++++++++++++++++++++++++++++++++------
 fs/btrfs/volumes.h |  11 ++++-
 6 files changed, 174 insertions(+), 30 deletions(-)

-- 
2.40.0


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-04 17:07 [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Guilherme G. Piccoli
@ 2023-05-04 17:07 ` Guilherme G. Piccoli
  2023-05-05  7:21   ` Qu Wenruo
  2023-05-05 13:18   ` David Sterba
  2023-05-04 17:07 ` [PATCH 2/2] btrfs: Add module parameter to enable non-mount scan skipping Guilherme G. Piccoli
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-04 17:07 UTC (permalink / raw)
  To: linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov, Guilherme G. Piccoli

Btrfs doesn't currently support to mount 2 different devices holding the
same filesystem - the fsid is used as a unique identifier in the driver.
This case is supported though in some other common filesystems, like
ext4; one of the reasons for which is not trivial supporting this case
on btrfs is due to its multi-device filesystem nature, native RAID, etc.

Supporting the same-fsid mounts has the advantage of allowing btrfs to
be used in A/B partitioned devices, like mobile phones or the Steam Deck
for example. Without this support, it's not safe for users to keep the
same "image version" in both A and B partitions, a setup that is quite
common for development, for example. Also, as a big bonus, it allows fs
integrity check based on block devices for RO devices (whereas currently
it is required that both have different fsid, breaking the block device
hash comparison).

Such same-fsid mounting is hereby added through the usage of the
mount option "virtual_fsid" - when such option is used, btrfs generates
a random fsid for the filesystem and leverages the metadata_uuid
infrastructure (introduced by [0]) to enable the usage of this secondary
virtual fsid. But differently from the regular metadata_uuid flag, this
is not written into the disk superblock - effectively, this is a spoofed
fsid approach that enables the same filesystem in different devices to
appear as different filesystems to btrfs on runtime.

In order to prevent more code complexity and potential corner cases,
given the usage is aimed to single-devices / partitions, virtual_fsid
is not allowed when the metadata_uuid flag is already present on the fs,
or if the device is on fsid-change state. Device removal/replace is also
disabled for devices mounted with the virtual_fsid option.

[0] 7239ff4b2be8 ("btrfs: Introduce support for FSID change without metadata rewrite)

Cc: Nikolay Borisov <nborisov@suse.com>
Suggested-by: John Schoenick <johns@valvesoftware.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

---

Hi folks, first of all thanks in advance for reviews / suggestions!
Some design choices that worth a discussion:

(1) The first choice was for the random fsid versus user-passed id.
Initially we considered that the flag could be "virtual_fsid=%s", hence
userspace would be responsible to generate the fsid. But seems the
collision probability of fsids in near zero, also the random code
hereby proposed checks if any other fsid/metadata_uuid present in
the btrfs structures is a match, and if so, new random uuids are
created until they prove the unique within btrfs.

(2) About disabling device removal/replace: these cases could be
handled I guess, but with increased code complexity. If there is a
need for that, we could implement. Also worth mentioning that any
advice is appreciated with regards to potential incompatibilities
between "virtual_fsid" and any other btrfs feature / mount option.

(3) There is no proposed documentation about the "virtual_fsid" here;
seems the kernel docs points to a wiki page, so appreciate suggestions
on how to edit such wiki and how to coordinate this with the patch
development cycle.


 fs/btrfs/disk-io.c | 22 ++++++++++--
 fs/btrfs/ioctl.c   | 18 ++++++++++
 fs/btrfs/super.c   | 32 +++++++++++------
 fs/btrfs/volumes.c | 86 +++++++++++++++++++++++++++++++++++++++-------
 fs/btrfs/volumes.h |  8 ++++-
 5 files changed, 139 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9e1596bb208d..66c2bac343b8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -468,7 +468,8 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
 	 * seed devices it's forbidden to have their uuid changed so reading
 	 * ->fsid in this case is fine
 	 */
-	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
+	if (btrfs_fs_incompat(fs_info, METADATA_UUID) ||
+	    fs_devices->virtual_fsid)
 		metadata_uuid = fs_devices->metadata_uuid;
 	else
 		metadata_uuid = fs_devices->fsid;
@@ -2539,6 +2540,7 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
 {
 	u64 nodesize = btrfs_super_nodesize(sb);
 	u64 sectorsize = btrfs_super_sectorsize(sb);
+	u8 *fsid;
 	int ret = 0;
 
 	if (btrfs_super_magic(sb) != BTRFS_MAGIC) {
@@ -2619,8 +2621,22 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 	}
 
-	if (memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid,
-		   BTRFS_FSID_SIZE)) {
+	/*
+	 * When the virtual_fsid flag is passed at mount time, btrfs
+	 * creates a random fsid and makes use of the metadata_uuid
+	 * infrastructure in order to allow, for example, two devices
+	 * with same fsid getting mounted at the same time. But notice
+	 * no changes happen at the disk level, so the random fsid is a
+	 * driver abstraction for that single mount, not to be written
+	 * in the disk. That's the reason we're required here to compare
+	 * the fsid with the metadata_uuid if virtual_fsid was set.
+	 */
+	if (fs_info->fs_devices->virtual_fsid)
+		fsid = fs_info->fs_devices->metadata_uuid;
+	else
+		fsid = fs_info->fs_devices->fsid;
+
+	if (memcmp(fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE)) {
 		btrfs_err(fs_info,
 		"superblock fsid doesn't match fsid of fs_devices: %pU != %pU",
 			fs_info->super_copy->fsid, fs_info->fs_devices->fsid);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ba769a1eb87a..35e3a23f8c83 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2677,6 +2677,12 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (fs_info->fs_devices->virtual_fsid) {
+		btrfs_err(fs_info,
+			  "device removal is unsupported on devices mounted with virtual fsid\n");
+		return -EINVAL;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args))
 		return PTR_ERR(vol_args);
@@ -2743,6 +2749,12 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (fs_info->fs_devices->virtual_fsid) {
+		btrfs_err(fs_info,
+			  "device removal is unsupported on devices mounted with virtual fsid\n");
+		return -EINVAL;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args))
 		return PTR_ERR(vol_args);
@@ -3261,6 +3273,12 @@ static long btrfs_ioctl_dev_replace(struct btrfs_fs_info *fs_info,
 		return -EINVAL;
 	}
 
+	if (fs_info->fs_devices->virtual_fsid) {
+		btrfs_err(fs_info,
+			  "device replace is unsupported on devices mounted with virtual fsid\n");
+		return -EINVAL;
+	}
+
 	p = memdup_user(arg, sizeof(*p));
 	if (IS_ERR(p))
 		return PTR_ERR(p);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 366fb4cde145..8d9df169107a 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -115,6 +115,7 @@ enum {
 	Opt_thread_pool,
 	Opt_treelog, Opt_notreelog,
 	Opt_user_subvol_rm_allowed,
+	Opt_virtual_fsid,
 
 	/* Rescue options */
 	Opt_rescue,
@@ -188,6 +189,7 @@ static const match_table_t tokens = {
 	{Opt_treelog, "treelog"},
 	{Opt_notreelog, "notreelog"},
 	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
+	{Opt_virtual_fsid, "virtual_fsid"},
 
 	/* Rescue options */
 	{Opt_rescue, "rescue=%s"},
@@ -352,9 +354,10 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 		case Opt_subvol_empty:
 		case Opt_subvolid:
 		case Opt_device:
+		case Opt_virtual_fsid:
 			/*
 			 * These are parsed by btrfs_parse_subvol_options or
-			 * btrfs_parse_device_options and can be ignored here.
+			 * btrfs_parse_early_options and can be ignored here.
 			 */
 			break;
 		case Opt_nodatasum:
@@ -845,9 +848,10 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
  * All other options will be parsed on much later in the mount process and
  * only when we need to allocate a new super block.
  */
-static int btrfs_parse_device_options(const char *options, fmode_t flags,
-				      void *holder)
+static int btrfs_parse_early_options(const char *options, fmode_t flags,
+				    bool *virtual_fsid, void *holder)
 {
+	struct btrfs_scan_info info = { .vfsid = false };
 	substring_t args[MAX_OPT_ARGS];
 	char *device_name, *opts, *orig, *p;
 	struct btrfs_device *device = NULL;
@@ -874,14 +878,18 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
 			continue;
 
 		token = match_token(p, tokens, args);
+
+		if (token == Opt_virtual_fsid)
+			(*virtual_fsid) = true;
+
 		if (token == Opt_device) {
 			device_name = match_strdup(&args[0]);
 			if (!device_name) {
 				error = -ENOMEM;
 				goto out;
 			}
-			device = btrfs_scan_one_device(device_name, flags,
-					holder);
+			info.path = device_name;
+			device = btrfs_scan_one_device(&info, flags, holder);
 			kfree(device_name);
 			if (IS_ERR(device)) {
 				error = PTR_ERR(device);
@@ -913,7 +921,7 @@ static int btrfs_parse_subvol_options(const char *options, char **subvol_name,
 
 	/*
 	 * strsep changes the string, duplicate it because
-	 * btrfs_parse_device_options gets called later
+	 * btrfs_parse_early_options gets called later
 	 */
 	opts = kstrdup(options, GFP_KERNEL);
 	if (!opts)
@@ -1431,6 +1439,7 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
 static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 		int flags, const char *device_name, void *data)
 {
+	struct btrfs_scan_info info = { .path = NULL, .vfsid = false};
 	struct block_device *bdev = NULL;
 	struct super_block *s;
 	struct btrfs_device *device = NULL;
@@ -1472,13 +1481,14 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 	}
 
 	mutex_lock(&uuid_mutex);
-	error = btrfs_parse_device_options(data, mode, fs_type);
+	error = btrfs_parse_early_options(data, mode, &info.vfsid, fs_type);
 	if (error) {
 		mutex_unlock(&uuid_mutex);
 		goto error_fs_info;
 	}
 
-	device = btrfs_scan_one_device(device_name, mode, fs_type);
+	info.path = device_name;
+	device = btrfs_scan_one_device(&info, mode, fs_type);
 	if (IS_ERR(device)) {
 		mutex_unlock(&uuid_mutex);
 		error = PTR_ERR(device);
@@ -2169,6 +2179,7 @@ static int btrfs_control_open(struct inode *inode, struct file *file)
 static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
 				unsigned long arg)
 {
+	struct btrfs_scan_info info = { .vfsid = false };
 	struct btrfs_ioctl_vol_args *vol;
 	struct btrfs_device *device = NULL;
 	dev_t devt = 0;
@@ -2182,10 +2193,11 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
 		return PTR_ERR(vol);
 	vol->name[BTRFS_PATH_NAME_MAX] = '\0';
 
+	info.path = vol->name;
 	switch (cmd) {
 	case BTRFS_IOC_SCAN_DEV:
 		mutex_lock(&uuid_mutex);
-		device = btrfs_scan_one_device(vol->name, FMODE_READ,
+		device = btrfs_scan_one_device(&info, FMODE_READ,
 					       &btrfs_root_fs_type);
 		ret = PTR_ERR_OR_ZERO(device);
 		mutex_unlock(&uuid_mutex);
@@ -2200,7 +2212,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
 		break;
 	case BTRFS_IOC_DEVICES_READY:
 		mutex_lock(&uuid_mutex);
-		device = btrfs_scan_one_device(vol->name, FMODE_READ,
+		device = btrfs_scan_one_device(&info, FMODE_READ,
 					       &btrfs_root_fs_type);
 		if (IS_ERR(device)) {
 			mutex_unlock(&uuid_mutex);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c6d592870400..5a38b3482ec5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -745,6 +745,33 @@ static struct btrfs_fs_devices *find_fsid_reverted_metadata(
 
 	return NULL;
 }
+
+static void prepare_virtual_fsid(struct btrfs_super_block *disk_super,
+				 const char *path)
+{
+	struct btrfs_fs_devices *fs_devices;
+	u8 vfsid[BTRFS_FSID_SIZE];
+	bool dup_fsid = true;
+
+	while (dup_fsid) {
+		dup_fsid = false;
+		generate_random_uuid(vfsid);
+
+		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+			if (!memcmp(vfsid, fs_devices->fsid, BTRFS_FSID_SIZE) ||
+			    !memcmp(vfsid, fs_devices->metadata_uuid,
+				    BTRFS_FSID_SIZE))
+				dup_fsid = true;
+		}
+	}
+
+	memcpy(disk_super->metadata_uuid, disk_super->fsid, BTRFS_FSID_SIZE);
+	memcpy(disk_super->fsid, vfsid, BTRFS_FSID_SIZE);
+
+	pr_info("BTRFS: virtual fsid (%pU) set for device %s (real fsid %pU)\n",
+		disk_super->fsid, path, disk_super->metadata_uuid);
+}
+
 /*
  * Add new device to list of registered devices
  *
@@ -752,12 +779,15 @@ static struct btrfs_fs_devices *find_fsid_reverted_metadata(
  * device pointer which was just added or updated when successful
  * error pointer when failed
  */
-static noinline struct btrfs_device *device_list_add(const char *path,
-			   struct btrfs_super_block *disk_super,
-			   bool *new_device_added)
+static noinline
+struct btrfs_device *device_list_add(const struct btrfs_scan_info *info,
+				     struct btrfs_super_block *disk_super,
+				     bool *new_device_added)
 {
 	struct btrfs_device *device;
 	struct btrfs_fs_devices *fs_devices = NULL;
+	const char *path = info->path;
+	const bool virtual_fsid = info->vfsid;
 	struct rcu_string *name;
 	u64 found_transid = btrfs_super_generation(disk_super);
 	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
@@ -775,12 +805,25 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 		return ERR_PTR(error);
 	}
 
+	if (virtual_fsid) {
+		if (has_metadata_uuid || fsid_change_in_progress) {
+			btrfs_err(NULL,
+			  "failed to add device %s with virtual fsid (%s)\n",
+				  path, (has_metadata_uuid ?
+					 "the fs already has a metadata_uuid" :
+					 "fsid change in progress"));
+			return ERR_PTR(-EINVAL);
+		}
+
+		prepare_virtual_fsid(disk_super, path);
+	}
+
 	if (fsid_change_in_progress) {
 		if (!has_metadata_uuid)
 			fs_devices = find_fsid_inprogress(disk_super);
 		else
 			fs_devices = find_fsid_changed(disk_super);
-	} else if (has_metadata_uuid) {
+	} else if (has_metadata_uuid || virtual_fsid) {
 		fs_devices = find_fsid_with_metadata_uuid(disk_super);
 	} else {
 		fs_devices = find_fsid_reverted_metadata(disk_super);
@@ -790,7 +833,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 
 
 	if (!fs_devices) {
-		if (has_metadata_uuid)
+		if (has_metadata_uuid || virtual_fsid)
 			fs_devices = alloc_fs_devices(disk_super->fsid,
 						      disk_super->metadata_uuid);
 		else
@@ -799,6 +842,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 		if (IS_ERR(fs_devices))
 			return ERR_CAST(fs_devices);
 
+		fs_devices->virtual_fsid = virtual_fsid;
 		fs_devices->fsid_change = fsid_change_in_progress;
 
 		mutex_lock(&fs_devices->device_list_mutex);
@@ -870,11 +914,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	"BTRFS: device label %s devid %llu transid %llu %s scanned by %s (%d)\n",
 				disk_super->label, devid, found_transid, path,
 				current->comm, task_pid_nr(current));
-		else
-			pr_info(
+		else {
+			if (virtual_fsid)
+				pr_info(
+"BTRFS: device with virtual fsid %pU (real fsid %pU) devid %llu transid %llu %s scanned by %s (%d)\n",
+					disk_super->fsid,
+					disk_super->metadata_uuid, devid,
+					found_transid, path, current->comm,
+					task_pid_nr(current));
+			else
+				pr_info(
 	"BTRFS: device fsid %pU devid %llu transid %llu %s scanned by %s (%d)\n",
-				disk_super->fsid, devid, found_transid, path,
-				current->comm, task_pid_nr(current));
+					disk_super->fsid, devid, found_transid,
+					path, current->comm,
+					task_pid_nr(current));
+		}
 
 	} else if (!device->name || strcmp(device->name->str, path)) {
 		/*
@@ -1348,8 +1402,8 @@ int btrfs_forget_devices(dev_t devt)
  * and we are not allowed to call set_blocksize during the scan. The superblock
  * is read via pagecache
  */
-struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
-					   void *holder)
+struct btrfs_device *btrfs_scan_one_device(const struct btrfs_scan_info *info,
+					   fmode_t flags, void *holder)
 {
 	struct btrfs_super_block *disk_super;
 	bool new_device_added = false;
@@ -1377,7 +1431,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
 	 * values temporarily, as the device paths of the fsid are the only
 	 * required information for assembling the volume.
 	 */
-	bdev = blkdev_get_by_path(path, flags, holder);
+	bdev = blkdev_get_by_path(info->path, flags, holder);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
@@ -1394,7 +1448,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
 		goto error_bdev_put;
 	}
 
-	device = device_list_add(path, disk_super, &new_device_added);
+	device = device_list_add(info, disk_super, &new_device_added);
 	if (!IS_ERR(device) && new_device_added)
 		btrfs_free_stale_devices(device->devt, device);
 
@@ -2390,6 +2444,12 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
 
 	args->devid = btrfs_stack_device_id(&disk_super->dev_item);
 	memcpy(args->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE);
+
+	/*
+	 * Notice that the virtual_fsid feature is not handled here; device
+	 * removal/replace is instead forbidden when such feature is in-use,
+	 * but this note is for future users of btrfs_get_dev_args_from_path().
+	 */
 	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
 		memcpy(args->fsid, disk_super->metadata_uuid, BTRFS_FSID_SIZE);
 	else
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 7e51f2238f72..f2354e8288f9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -73,6 +73,11 @@ enum btrfs_raid_types {
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
 #define BTRFS_DEV_STATE_NO_READA	(5)
 
+struct btrfs_scan_info {
+	const char *path;
+	bool vfsid;
+};
+
 struct btrfs_zoned_device_info;
 
 struct btrfs_device {
@@ -278,6 +283,7 @@ struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
 	bool fsid_change;
+	bool virtual_fsid;
 	struct list_head fs_list;
 
 	/*
@@ -537,7 +543,7 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
 void btrfs_mapping_tree_free(struct extent_map_tree *tree);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       fmode_t flags, void *holder);
-struct btrfs_device *btrfs_scan_one_device(const char *path,
+struct btrfs_device *btrfs_scan_one_device(const struct btrfs_scan_info *info,
 					   fmode_t flags, void *holder);
 int btrfs_forget_devices(dev_t devt);
 void btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 2/2] btrfs: Add module parameter to enable non-mount scan skipping
  2023-05-04 17:07 [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Guilherme G. Piccoli
  2023-05-04 17:07 ` [PATCH 1/2] btrfs: Introduce the virtual_fsid feature Guilherme G. Piccoli
@ 2023-05-04 17:07 ` Guilherme G. Piccoli
  2023-05-04 19:28 ` [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Goffredo Baroncelli
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-04 17:07 UTC (permalink / raw)
  To: linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov, Guilherme G. Piccoli

In case there are 2 btrfs filesystems holding the same fsid but in
different block devices, the ioctl-based scanning prevents their
peaceful coexistence, due to checks during the device add to the fsid
list. Imagine an A/B partitioned OS, like in mobile devices or the Steam
Deck - if we have both partitions holding the exact same image, depending
on the order that udev triggers the scan and the filesystem generation
number, the users potentially won't be able to mount one of them, even
if the other was never mounted.

To avoid this case, introduce a btrfs parameter to allow users to select
devices to be excluded of non-mount scanning. The module parameter
"skip_scan=%s" accepts full device paths comma-separated, the same paths
passed as a parameter to btrfs_scan_one_device(). If a scan procedure
wasn't triggered from the mount path (meaning it was ioctl-based) and
the "skip_scan" parameter contains a valid device path, such device
scanning is skipped and this is informed on dmesg.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

---

Some design choices that should be discussed here:

(1) We could either have it as a parameter, or a flag in the superblock
(like the metadata_uuid) - the parameter approach seemed easier / less
invasive, but I might be wrong - appreciate feedback on this.

(2) The parameter name of course is irrelevant and if somebody has a
better idea for the name, I'm upfront okay with that =)

(3) Again, no documentation is provided here - appreciate suggestions
on how to proper document changes to btrfs (wiki, I assume?).

Thanks in advance for reviews and suggestions,

Guilherme


 fs/btrfs/super.c   | 13 +++++++++----
 fs/btrfs/super.h   |  1 +
 fs/btrfs/volumes.c | 27 ++++++++++++++++++++++++++-
 fs/btrfs/volumes.h |  3 ++-
 4 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8d9df169107a..4532cbc2bb57 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -62,6 +62,11 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/btrfs.h>
 
+char *skip_scan;
+module_param(skip_scan, charp, 0444);
+MODULE_PARM_DESC(skip_scan,
+		 "User list of devices to be skipped in non mount induced scans (comma separated)");
+
 static const struct super_operations btrfs_super_ops;
 
 /*
@@ -889,7 +894,7 @@ static int btrfs_parse_early_options(const char *options, fmode_t flags,
 				goto out;
 			}
 			info.path = device_name;
-			device = btrfs_scan_one_device(&info, flags, holder);
+			device = btrfs_scan_one_device(&info, flags, holder, true);
 			kfree(device_name);
 			if (IS_ERR(device)) {
 				error = PTR_ERR(device);
@@ -1488,7 +1493,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 	}
 
 	info.path = device_name;
-	device = btrfs_scan_one_device(&info, mode, fs_type);
+	device = btrfs_scan_one_device(&info, mode, fs_type, true);
 	if (IS_ERR(device)) {
 		mutex_unlock(&uuid_mutex);
 		error = PTR_ERR(device);
@@ -2198,7 +2203,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
 	case BTRFS_IOC_SCAN_DEV:
 		mutex_lock(&uuid_mutex);
 		device = btrfs_scan_one_device(&info, FMODE_READ,
-					       &btrfs_root_fs_type);
+					       &btrfs_root_fs_type, false);
 		ret = PTR_ERR_OR_ZERO(device);
 		mutex_unlock(&uuid_mutex);
 		break;
@@ -2213,7 +2218,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
 	case BTRFS_IOC_DEVICES_READY:
 		mutex_lock(&uuid_mutex);
 		device = btrfs_scan_one_device(&info, FMODE_READ,
-					       &btrfs_root_fs_type);
+					       &btrfs_root_fs_type, false);
 		if (IS_ERR(device)) {
 			mutex_unlock(&uuid_mutex);
 			ret = PTR_ERR(device);
diff --git a/fs/btrfs/super.h b/fs/btrfs/super.h
index 8dbb909b364f..6eddd196bb51 100644
--- a/fs/btrfs/super.h
+++ b/fs/btrfs/super.h
@@ -3,6 +3,7 @@
 #ifndef BTRFS_SUPER_H
 #define BTRFS_SUPER_H
 
+extern char *skip_scan;
 int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 			unsigned long new_flags);
 int btrfs_sync_fs(struct super_block *sb, int wait);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5a38b3482ec5..53da2ebb246c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -12,6 +12,7 @@
 #include <linux/uuid.h>
 #include <linux/list_sort.h>
 #include <linux/namei.h>
+#include <linux/string.h>
 #include "misc.h"
 #include "ctree.h"
 #include "extent_map.h"
@@ -1403,7 +1404,8 @@ int btrfs_forget_devices(dev_t devt)
  * is read via pagecache
  */
 struct btrfs_device *btrfs_scan_one_device(const struct btrfs_scan_info *info,
-					   fmode_t flags, void *holder)
+					   fmode_t flags, void *holder,
+					   bool mounting)
 {
 	struct btrfs_super_block *disk_super;
 	bool new_device_added = false;
@@ -1414,6 +1416,29 @@ struct btrfs_device *btrfs_scan_one_device(const struct btrfs_scan_info *info,
 
 	lockdep_assert_held(&uuid_mutex);
 
+	if (!mounting && skip_scan) {
+		char *p, *skip_devs, *orig;
+
+		skip_devs = kstrdup(skip_scan, GFP_KERNEL);
+		if (!skip_devs)
+			return ERR_PTR(-ENOMEM);
+
+		orig = skip_devs;
+		while ((p = strsep(&skip_devs, ",")) != NULL) {
+			if (!*p)
+				continue;
+
+			if (!strcmp(p, info->path)) {
+				pr_info(
+	"BTRFS: skipped non-mount scan on device %s due to module parameter\n",
+					info->path);
+				kfree(orig);
+				return ERR_PTR(-EINVAL);
+			}
+		}
+		kfree(orig);
+	}
+
 	/*
 	 * we would like to check all the supers, but that would make
 	 * a btrfs mount succeed after a mkfs from a different FS.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f2354e8288f9..3e83565b326a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -544,7 +544,8 @@ void btrfs_mapping_tree_free(struct extent_map_tree *tree);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       fmode_t flags, void *holder);
 struct btrfs_device *btrfs_scan_one_device(const struct btrfs_scan_info *info,
-					   fmode_t flags, void *holder);
+					   fmode_t flags, void *holder,
+					   bool mounting);
 int btrfs_forget_devices(dev_t devt);
 void btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
 void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices);
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-04 17:07 [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Guilherme G. Piccoli
  2023-05-04 17:07 ` [PATCH 1/2] btrfs: Introduce the virtual_fsid feature Guilherme G. Piccoli
  2023-05-04 17:07 ` [PATCH 2/2] btrfs: Add module parameter to enable non-mount scan skipping Guilherme G. Piccoli
@ 2023-05-04 19:28 ` Goffredo Baroncelli
  2023-05-04 20:10   ` Guilherme G. Piccoli
  2023-05-05  5:16 ` Anand Jain
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 38+ messages in thread
From: Goffredo Baroncelli @ 2023-05-04 19:28 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov

On 04/05/2023 19.07, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt of supporting same fsid mounting on btrfs.
> Currently, we cannot reliably mount same fsid filesystems even one at
> a time in btrfs, but if users want to mount them at the same time, it's
> pretty much impossible. Other filesystems like ext4 are capable of that.

Hi Guilherme,

did you tried to run "btrfs dev scan --forget /dev/sd.." before
mount the filesystem ?

Assuming that you have two devices /dev/sdA and /dev/sdB with two btrfs
filesystem with the same uuid, you should mount /dev/sdA

btrfs dev scan --forget /dev/sdB # you can use event /dev/sdA
mount /dev/sdA /mnt/target

and to mount /dev/sdB

btrfs dev scan --forget /dev/sdA # you can use event /dev/sdB
mount /dev/sdB /mnt/target

I made a quick test using two loop devices and it seems that it works
reliably.

Another option should be make a kernel change that "forget" the device
before mounting *if* the filesystem is composed by only one device (
and another few exceptions like the filesystem is already mounted).

This would avoid all the problem related to make a "temporary" uuid.

> 
> The goal is to allow systems with A/B partitioning scheme (like the
> Steam Deck console or various mobile devices) to be able to hold
> the same filesystem image in both partitions; it also allows to have
> block device level check for filesystem integrity - this is used in the
> Steam Deck image installation, to check if the current read-only image
> is pristine. A bit more details are provided in the following ML thread:
> 
> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
> 
> The mechanism used to achieve it is based in the metadata_uuid feature,
> leveraging such code infrastructure for that. The patches are based on
> kernel 6.3 and were tested both in a virtual machine as well as in the
> Steam Deck. Comments, suggestions and overall feedback is greatly
> appreciated - thanks in advance!
> 
> Cheers,
> 
> 
> Guilherme
> 
> 
> Guilherme G. Piccoli (2):
>    btrfs: Introduce the virtual_fsid feature
>    btrfs: Add module parameter to enable non-mount scan skipping
> 
>   fs/btrfs/disk-io.c |  22 +++++++--
>   fs/btrfs/ioctl.c   |  18 ++++++++
>   fs/btrfs/super.c   |  41 ++++++++++++-----
>   fs/btrfs/super.h   |   1 +
>   fs/btrfs/volumes.c | 111 +++++++++++++++++++++++++++++++++++++++------
>   fs/btrfs/volumes.h |  11 ++++-
>   6 files changed, 174 insertions(+), 30 deletions(-)
> 

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-04 19:28 ` [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Goffredo Baroncelli
@ 2023-05-04 20:10   ` Guilherme G. Piccoli
  2023-05-04 21:09     ` Goffredo Baroncelli
  0 siblings, 1 reply; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-04 20:10 UTC (permalink / raw)
  To: kreijack
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns

On 04/05/2023 16:28, Goffredo Baroncelli wrote:
> [...]
> Hi Guilherme,
> 
> did you tried to run "btrfs dev scan --forget /dev/sd.." before
> mount the filesystem ?
> 
> Assuming that you have two devices /dev/sdA and /dev/sdB with two btrfs
> filesystem with the same uuid, you should mount /dev/sdA
> 
> btrfs dev scan --forget /dev/sdB # you can use event /dev/sdA
> mount /dev/sdA /mnt/target
> 
> and to mount /dev/sdB
> 
> btrfs dev scan --forget /dev/sdA # you can use event /dev/sdB
> mount /dev/sdB /mnt/target
> 
> I made a quick test using two loop devices and it seems that it works
> reliably.

Hi Goffredo, thanks for your suggestion!

This seems interesting with regards the second patch here..indeed, I can
mount any of the 2 filesystems if I use the forget option - interesting
option, wasn't aware of that.

But unfortunately it seems limited to mounting one device at a time, and
we need to be able to mount *both* of them, due to an installation step.
If I try to forget the device that is mounted, it gets (obviously) a
"busy device" error.

Is there any missing step from my side, or mounting both devices is
really a limitation when using the forget option?


> 
> Another option should be make a kernel change that "forget" the device
> before mounting *if* the filesystem is composed by only one device (
> and another few exceptions like the filesystem is already mounted).
> 
> This would avoid all the problem related to make a "temporary" uuid.

I guess again this would be useful in the scope of the second patch
here...we could check the way you're proposing instead of having the
module parameter. In a way this is similar to the forget approach,
right? But it's kind of an "automatic" forget heh

How btrfs would know it is a case for single-device filesystem? In other
words: how would we distinguish between the cases we want to auto-forget
before mounting, and the cases in which this behavior is undesired?

Thanks again for your feedback, it is much appreciated.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-04 20:10   ` Guilherme G. Piccoli
@ 2023-05-04 21:09     ` Goffredo Baroncelli
  2023-05-05 16:21       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 38+ messages in thread
From: Goffredo Baroncelli @ 2023-05-04 21:09 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns

On 04/05/2023 22.10, Guilherme G. Piccoli wrote:
> On 04/05/2023 16:28, Goffredo Baroncelli wrote:
>> [...]
>> Hi Guilherme,
>>
>> did you tried to run "btrfs dev scan --forget /dev/sd.." before
>> mount the filesystem ?
>>
>> Assuming that you have two devices /dev/sdA and /dev/sdB with two btrfs
>> filesystem with the same uuid, you should mount /dev/sdA
>>
>> btrfs dev scan --forget /dev/sdB # you can use event /dev/sdA
>> mount /dev/sdA /mnt/target
>>
>> and to mount /dev/sdB
>>
>> btrfs dev scan --forget /dev/sdA # you can use event /dev/sdB
>> mount /dev/sdB /mnt/target
>>
>> I made a quick test using two loop devices and it seems that it works
>> reliably.
> 
> Hi Goffredo, thanks for your suggestion!
> 
> This seems interesting with regards the second patch here..indeed, I can
> mount any of the 2 filesystems if I use the forget option - interesting
> option, wasn't aware of that.
> 
> But unfortunately it seems limited to mounting one device at a time, and
> we need to be able to mount *both* of them, due to an installation step.
> If I try to forget the device that is mounted, it gets (obviously) a
> "busy device" error.

Ahh, I didn't realized that you want to mount the two FS at the
same time.

> 
> Is there any missing step from my side, or mounting both devices is
> really a limitation when using the forget option?
> 

 From my limited BTRFS internal knowledge, I think that your patches
takes the correct approach: using the "metadata_uuid" code to allow
two filesystems with the same uuid to exist at the same time.


> 
>>
>> Another option should be make a kernel change that "forget" the device
>> before mounting *if* the filesystem is composed by only one device (
>> and another few exceptions like the filesystem is already mounted).
>>
>> This would avoid all the problem related to make a "temporary" uuid.
> 
> I guess again this would be useful in the scope of the second patch
> here...we could check the way you're proposing instead of having the
> module parameter. In a way this is similar to the forget approach,
> right? But it's kind of an "automatic" forget heh
> 
> How btrfs would know it is a case for single-device filesystem? In other
> words: how would we distinguish between the cases we want to auto-forget
> before mounting, and the cases in which this behavior is undesired?

If I remember correctly in the super-block there is the number of disks
that compose the filesystem. If the count is 1, it should be safe to
forget-before-mount the filesystem (or to not store in the cache
after a scan)


> 
> Thanks again for your feedback, it is much appreciated.
> Cheers,
> 
> 
> Guilherme

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-04 17:07 [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Guilherme G. Piccoli
                   ` (2 preceding siblings ...)
  2023-05-04 19:28 ` [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Goffredo Baroncelli
@ 2023-05-05  5:16 ` Anand Jain
  2023-05-05 16:27   ` Guilherme G. Piccoli
  2023-05-07 23:10 ` Dave Chinner
  2023-08-03 15:47 ` Guilherme G. Piccoli
  5 siblings, 1 reply; 38+ messages in thread
From: Anand Jain @ 2023-05-05  5:16 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov

On 5/5/23 01:07, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt of supporting same fsid mounting on btrfs.
> Currently, we cannot reliably mount same fsid filesystems even one at
> a time in btrfs, but if users want to mount them at the same time, it's
> pretty much impossible. Other filesystems like ext4 are capable of that.
> 


> The goal is to allow systems with A/B partitioning scheme (like the
> Steam Deck console or various mobile devices) to be able to hold
> the same filesystem image in both partitions; it also allows to have
> block device level check for filesystem integrity - this is used in the
> Steam Deck image installation, to check if the current read-only image
> is pristine. A bit more details are provided in the following ML thread:
> 
> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/

Confused about your requirement: 2 identical filesystems mounted 
simultaneously or just one at a time? Latter works. Bugs were fixed.

Have you considered using the btrfs seed device feature to avoid 
sacrificing 50% capacity? Create read-only seed device as golden image, 
add writable device on top. Example:

   $ btrfstune -S1 /dev/rdonly-golden-img
   $ mount /dev/rdonly-golden-img /btrfs
   $ btrfs dev add /dev/rw-dev /btrfs
   $ mount -o remount,rw /dev/rw-dev /btrfs

To switch golden image:

   $ btrfs dev del /dev/rdonly-golden-img /btrfs
   $ umount /btrfs
   $ btrfstune -S1 /dev/rw-dev

Thanks, Anand


> The mechanism used to achieve it is based in the metadata_uuid feature,
> leveraging such code infrastructure for that. The patches are based on
> kernel 6.3 and were tested both in a virtual machine as well as in the
> Steam Deck. Comments, suggestions and overall feedback is greatly
> appreciated - thanks in advance!
> 
> Cheers,
> 
> 
> Guilherme
> 
> 
> Guilherme G. Piccoli (2):
>    btrfs: Introduce the virtual_fsid feature
>    btrfs: Add module parameter to enable non-mount scan skipping
> 
>   fs/btrfs/disk-io.c |  22 +++++++--
>   fs/btrfs/ioctl.c   |  18 ++++++++
>   fs/btrfs/super.c   |  41 ++++++++++++-----
>   fs/btrfs/super.h   |   1 +
>   fs/btrfs/volumes.c | 111 +++++++++++++++++++++++++++++++++++++++------
>   fs/btrfs/volumes.h |  11 ++++-
>   6 files changed, 174 insertions(+), 30 deletions(-)
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-04 17:07 ` [PATCH 1/2] btrfs: Introduce the virtual_fsid feature Guilherme G. Piccoli
@ 2023-05-05  7:21   ` Qu Wenruo
  2023-05-05 13:38     ` David Sterba
                       ` (3 more replies)
  2023-05-05 13:18   ` David Sterba
  1 sibling, 4 replies; 38+ messages in thread
From: Qu Wenruo @ 2023-05-05  7:21 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov



On 2023/5/5 01:07, Guilherme G. Piccoli wrote:
> Btrfs doesn't currently support to mount 2 different devices holding the
> same filesystem - the fsid is used as a unique identifier in the driver.
> This case is supported though in some other common filesystems, like
> ext4; one of the reasons for which is not trivial supporting this case
> on btrfs is due to its multi-device filesystem nature, native RAID, etc.

Exactly, the biggest problem is the multi-device support.

Btrfs needs to search and assemble all devices of a multi-device
filesystem, which is normally handled by things like LVM/DMraid, thus
other traditional fses won't need to bother that.

>
> Supporting the same-fsid mounts has the advantage of allowing btrfs to
> be used in A/B partitioned devices, like mobile phones or the Steam Deck
> for example. Without this support, it's not safe for users to keep the
> same "image version" in both A and B partitions, a setup that is quite
> common for development, for example. Also, as a big bonus, it allows fs
> integrity check based on block devices for RO devices (whereas currently
> it is required that both have different fsid, breaking the block device
> hash comparison).
>
> Such same-fsid mounting is hereby added through the usage of the
> mount option "virtual_fsid" - when such option is used, btrfs generates
> a random fsid for the filesystem and leverages the metadata_uuid
> infrastructure (introduced by [0]) to enable the usage of this secondary
> virtual fsid. But differently from the regular metadata_uuid flag, this
> is not written into the disk superblock - effectively, this is a spoofed
> fsid approach that enables the same filesystem in different devices to
> appear as different filesystems to btrfs on runtime.

I would prefer a much simpler but more explicit method.

Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.

By this, we can avoid multiple meanings of the same super member, nor
need any special mount option.
Remember, mount option is never a good way to enable/disable a new feature.

The better method to enable/disable a feature should be mkfs and btrfstune.

Then go mostly the same of your patch, but maybe with something extra:

- Disbale multi-dev code
   Include device add/replace/removal, this is already done in your
   patch.

- Completely skip device scanning
   I see no reason to keep btrfs with SINGLE_DEV feature to be added to
   the device list at all.
   It only needs to be scanned at mount time, and never be kept in the
   in-memory device list.

Thanks,
Qu

>
> In order to prevent more code complexity and potential corner cases,
> given the usage is aimed to single-devices / partitions, virtual_fsid
> is not allowed when the metadata_uuid flag is already present on the fs,
> or if the device is on fsid-change state. Device removal/replace is also
> disabled for devices mounted with the virtual_fsid option.
>
> [0] 7239ff4b2be8 ("btrfs: Introduce support for FSID change without metadata rewrite)
>
> Cc: Nikolay Borisov <nborisov@suse.com>
> Suggested-by: John Schoenick <johns@valvesoftware.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>
> ---
>
> Hi folks, first of all thanks in advance for reviews / suggestions!
> Some design choices that worth a discussion:
>
> (1) The first choice was for the random fsid versus user-passed id.
> Initially we considered that the flag could be "virtual_fsid=%s", hence
> userspace would be responsible to generate the fsid. But seems the
> collision probability of fsids in near zero, also the random code
> hereby proposed checks if any other fsid/metadata_uuid present in
> the btrfs structures is a match, and if so, new random uuids are
> created until they prove the unique within btrfs.
>
> (2) About disabling device removal/replace: these cases could be
> handled I guess, but with increased code complexity. If there is a
> need for that, we could implement. Also worth mentioning that any
> advice is appreciated with regards to potential incompatibilities
> between "virtual_fsid" and any other btrfs feature / mount option.
>
> (3) There is no proposed documentation about the "virtual_fsid" here;
> seems the kernel docs points to a wiki page, so appreciate suggestions
> on how to edit such wiki and how to coordinate this with the patch
> development cycle.
>
>
>   fs/btrfs/disk-io.c | 22 ++++++++++--
>   fs/btrfs/ioctl.c   | 18 ++++++++++
>   fs/btrfs/super.c   | 32 +++++++++++------
>   fs/btrfs/volumes.c | 86 +++++++++++++++++++++++++++++++++++++++-------
>   fs/btrfs/volumes.h |  8 ++++-
>   5 files changed, 139 insertions(+), 27 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9e1596bb208d..66c2bac343b8 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -468,7 +468,8 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
>   	 * seed devices it's forbidden to have their uuid changed so reading
>   	 * ->fsid in this case is fine
>   	 */
> -	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
> +	if (btrfs_fs_incompat(fs_info, METADATA_UUID) ||
> +	    fs_devices->virtual_fsid)
>   		metadata_uuid = fs_devices->metadata_uuid;
>   	else
>   		metadata_uuid = fs_devices->fsid;
> @@ -2539,6 +2540,7 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
>   {
>   	u64 nodesize = btrfs_super_nodesize(sb);
>   	u64 sectorsize = btrfs_super_sectorsize(sb);
> +	u8 *fsid;
>   	int ret = 0;
>
>   	if (btrfs_super_magic(sb) != BTRFS_MAGIC) {
> @@ -2619,8 +2621,22 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
>   		ret = -EINVAL;
>   	}
>
> -	if (memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid,
> -		   BTRFS_FSID_SIZE)) {
> +	/*
> +	 * When the virtual_fsid flag is passed at mount time, btrfs
> +	 * creates a random fsid and makes use of the metadata_uuid
> +	 * infrastructure in order to allow, for example, two devices
> +	 * with same fsid getting mounted at the same time. But notice
> +	 * no changes happen at the disk level, so the random fsid is a
> +	 * driver abstraction for that single mount, not to be written
> +	 * in the disk. That's the reason we're required here to compare
> +	 * the fsid with the metadata_uuid if virtual_fsid was set.
> +	 */
> +	if (fs_info->fs_devices->virtual_fsid)
> +		fsid = fs_info->fs_devices->metadata_uuid;
> +	else
> +		fsid = fs_info->fs_devices->fsid;
> +
> +	if (memcmp(fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE)) {
>   		btrfs_err(fs_info,
>   		"superblock fsid doesn't match fsid of fs_devices: %pU != %pU",
>   			fs_info->super_copy->fsid, fs_info->fs_devices->fsid);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index ba769a1eb87a..35e3a23f8c83 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2677,6 +2677,12 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EPERM;
>
> +	if (fs_info->fs_devices->virtual_fsid) {
> +		btrfs_err(fs_info,
> +			  "device removal is unsupported on devices mounted with virtual fsid\n");
> +		return -EINVAL;
> +	}
> +
>   	vol_args = memdup_user(arg, sizeof(*vol_args));
>   	if (IS_ERR(vol_args))
>   		return PTR_ERR(vol_args);
> @@ -2743,6 +2749,12 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EPERM;
>
> +	if (fs_info->fs_devices->virtual_fsid) {
> +		btrfs_err(fs_info,
> +			  "device removal is unsupported on devices mounted with virtual fsid\n");
> +		return -EINVAL;
> +	}
> +
>   	vol_args = memdup_user(arg, sizeof(*vol_args));
>   	if (IS_ERR(vol_args))
>   		return PTR_ERR(vol_args);
> @@ -3261,6 +3273,12 @@ static long btrfs_ioctl_dev_replace(struct btrfs_fs_info *fs_info,
>   		return -EINVAL;
>   	}
>
> +	if (fs_info->fs_devices->virtual_fsid) {
> +		btrfs_err(fs_info,
> +			  "device replace is unsupported on devices mounted with virtual fsid\n");
> +		return -EINVAL;
> +	}
> +
>   	p = memdup_user(arg, sizeof(*p));
>   	if (IS_ERR(p))
>   		return PTR_ERR(p);
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 366fb4cde145..8d9df169107a 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -115,6 +115,7 @@ enum {
>   	Opt_thread_pool,
>   	Opt_treelog, Opt_notreelog,
>   	Opt_user_subvol_rm_allowed,
> +	Opt_virtual_fsid,
>
>   	/* Rescue options */
>   	Opt_rescue,
> @@ -188,6 +189,7 @@ static const match_table_t tokens = {
>   	{Opt_treelog, "treelog"},
>   	{Opt_notreelog, "notreelog"},
>   	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
> +	{Opt_virtual_fsid, "virtual_fsid"},
>
>   	/* Rescue options */
>   	{Opt_rescue, "rescue=%s"},
> @@ -352,9 +354,10 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
>   		case Opt_subvol_empty:
>   		case Opt_subvolid:
>   		case Opt_device:
> +		case Opt_virtual_fsid:
>   			/*
>   			 * These are parsed by btrfs_parse_subvol_options or
> -			 * btrfs_parse_device_options and can be ignored here.
> +			 * btrfs_parse_early_options and can be ignored here.
>   			 */
>   			break;
>   		case Opt_nodatasum:
> @@ -845,9 +848,10 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
>    * All other options will be parsed on much later in the mount process and
>    * only when we need to allocate a new super block.
>    */
> -static int btrfs_parse_device_options(const char *options, fmode_t flags,
> -				      void *holder)
> +static int btrfs_parse_early_options(const char *options, fmode_t flags,
> +				    bool *virtual_fsid, void *holder)
>   {
> +	struct btrfs_scan_info info = { .vfsid = false };
>   	substring_t args[MAX_OPT_ARGS];
>   	char *device_name, *opts, *orig, *p;
>   	struct btrfs_device *device = NULL;
> @@ -874,14 +878,18 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
>   			continue;
>
>   		token = match_token(p, tokens, args);
> +
> +		if (token == Opt_virtual_fsid)
> +			(*virtual_fsid) = true;
> +
>   		if (token == Opt_device) {
>   			device_name = match_strdup(&args[0]);
>   			if (!device_name) {
>   				error = -ENOMEM;
>   				goto out;
>   			}
> -			device = btrfs_scan_one_device(device_name, flags,
> -					holder);
> +			info.path = device_name;
> +			device = btrfs_scan_one_device(&info, flags, holder);
>   			kfree(device_name);
>   			if (IS_ERR(device)) {
>   				error = PTR_ERR(device);
> @@ -913,7 +921,7 @@ static int btrfs_parse_subvol_options(const char *options, char **subvol_name,
>
>   	/*
>   	 * strsep changes the string, duplicate it because
> -	 * btrfs_parse_device_options gets called later
> +	 * btrfs_parse_early_options gets called later
>   	 */
>   	opts = kstrdup(options, GFP_KERNEL);
>   	if (!opts)
> @@ -1431,6 +1439,7 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
>   static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
>   		int flags, const char *device_name, void *data)
>   {
> +	struct btrfs_scan_info info = { .path = NULL, .vfsid = false};
>   	struct block_device *bdev = NULL;
>   	struct super_block *s;
>   	struct btrfs_device *device = NULL;
> @@ -1472,13 +1481,14 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
>   	}
>
>   	mutex_lock(&uuid_mutex);
> -	error = btrfs_parse_device_options(data, mode, fs_type);
> +	error = btrfs_parse_early_options(data, mode, &info.vfsid, fs_type);
>   	if (error) {
>   		mutex_unlock(&uuid_mutex);
>   		goto error_fs_info;
>   	}
>
> -	device = btrfs_scan_one_device(device_name, mode, fs_type);
> +	info.path = device_name;
> +	device = btrfs_scan_one_device(&info, mode, fs_type);
>   	if (IS_ERR(device)) {
>   		mutex_unlock(&uuid_mutex);
>   		error = PTR_ERR(device);
> @@ -2169,6 +2179,7 @@ static int btrfs_control_open(struct inode *inode, struct file *file)
>   static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>   				unsigned long arg)
>   {
> +	struct btrfs_scan_info info = { .vfsid = false };
>   	struct btrfs_ioctl_vol_args *vol;
>   	struct btrfs_device *device = NULL;
>   	dev_t devt = 0;
> @@ -2182,10 +2193,11 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>   		return PTR_ERR(vol);
>   	vol->name[BTRFS_PATH_NAME_MAX] = '\0';
>
> +	info.path = vol->name;
>   	switch (cmd) {
>   	case BTRFS_IOC_SCAN_DEV:
>   		mutex_lock(&uuid_mutex);
> -		device = btrfs_scan_one_device(vol->name, FMODE_READ,
> +		device = btrfs_scan_one_device(&info, FMODE_READ,
>   					       &btrfs_root_fs_type);
>   		ret = PTR_ERR_OR_ZERO(device);
>   		mutex_unlock(&uuid_mutex);
> @@ -2200,7 +2212,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>   		break;
>   	case BTRFS_IOC_DEVICES_READY:
>   		mutex_lock(&uuid_mutex);
> -		device = btrfs_scan_one_device(vol->name, FMODE_READ,
> +		device = btrfs_scan_one_device(&info, FMODE_READ,
>   					       &btrfs_root_fs_type);
>   		if (IS_ERR(device)) {
>   			mutex_unlock(&uuid_mutex);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c6d592870400..5a38b3482ec5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -745,6 +745,33 @@ static struct btrfs_fs_devices *find_fsid_reverted_metadata(
>
>   	return NULL;
>   }
> +
> +static void prepare_virtual_fsid(struct btrfs_super_block *disk_super,
> +				 const char *path)
> +{
> +	struct btrfs_fs_devices *fs_devices;
> +	u8 vfsid[BTRFS_FSID_SIZE];
> +	bool dup_fsid = true;
> +
> +	while (dup_fsid) {
> +		dup_fsid = false;
> +		generate_random_uuid(vfsid);
> +
> +		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> +			if (!memcmp(vfsid, fs_devices->fsid, BTRFS_FSID_SIZE) ||
> +			    !memcmp(vfsid, fs_devices->metadata_uuid,
> +				    BTRFS_FSID_SIZE))
> +				dup_fsid = true;
> +		}
> +	}
> +
> +	memcpy(disk_super->metadata_uuid, disk_super->fsid, BTRFS_FSID_SIZE);
> +	memcpy(disk_super->fsid, vfsid, BTRFS_FSID_SIZE);
> +
> +	pr_info("BTRFS: virtual fsid (%pU) set for device %s (real fsid %pU)\n",
> +		disk_super->fsid, path, disk_super->metadata_uuid);
> +}
> +
>   /*
>    * Add new device to list of registered devices
>    *
> @@ -752,12 +779,15 @@ static struct btrfs_fs_devices *find_fsid_reverted_metadata(
>    * device pointer which was just added or updated when successful
>    * error pointer when failed
>    */
> -static noinline struct btrfs_device *device_list_add(const char *path,
> -			   struct btrfs_super_block *disk_super,
> -			   bool *new_device_added)
> +static noinline
> +struct btrfs_device *device_list_add(const struct btrfs_scan_info *info,
> +				     struct btrfs_super_block *disk_super,
> +				     bool *new_device_added)
>   {
>   	struct btrfs_device *device;
>   	struct btrfs_fs_devices *fs_devices = NULL;
> +	const char *path = info->path;
> +	const bool virtual_fsid = info->vfsid;
>   	struct rcu_string *name;
>   	u64 found_transid = btrfs_super_generation(disk_super);
>   	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
> @@ -775,12 +805,25 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>   		return ERR_PTR(error);
>   	}
>
> +	if (virtual_fsid) {
> +		if (has_metadata_uuid || fsid_change_in_progress) {
> +			btrfs_err(NULL,
> +			  "failed to add device %s with virtual fsid (%s)\n",
> +				  path, (has_metadata_uuid ?
> +					 "the fs already has a metadata_uuid" :
> +					 "fsid change in progress"));
> +			return ERR_PTR(-EINVAL);
> +		}
> +
> +		prepare_virtual_fsid(disk_super, path);
> +	}
> +
>   	if (fsid_change_in_progress) {
>   		if (!has_metadata_uuid)
>   			fs_devices = find_fsid_inprogress(disk_super);
>   		else
>   			fs_devices = find_fsid_changed(disk_super);
> -	} else if (has_metadata_uuid) {
> +	} else if (has_metadata_uuid || virtual_fsid) {
>   		fs_devices = find_fsid_with_metadata_uuid(disk_super);
>   	} else {
>   		fs_devices = find_fsid_reverted_metadata(disk_super);
> @@ -790,7 +833,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>
>
>   	if (!fs_devices) {
> -		if (has_metadata_uuid)
> +		if (has_metadata_uuid || virtual_fsid)
>   			fs_devices = alloc_fs_devices(disk_super->fsid,
>   						      disk_super->metadata_uuid);
>   		else
> @@ -799,6 +842,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>   		if (IS_ERR(fs_devices))
>   			return ERR_CAST(fs_devices);
>
> +		fs_devices->virtual_fsid = virtual_fsid;
>   		fs_devices->fsid_change = fsid_change_in_progress;
>
>   		mutex_lock(&fs_devices->device_list_mutex);
> @@ -870,11 +914,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>   	"BTRFS: device label %s devid %llu transid %llu %s scanned by %s (%d)\n",
>   				disk_super->label, devid, found_transid, path,
>   				current->comm, task_pid_nr(current));
> -		else
> -			pr_info(
> +		else {
> +			if (virtual_fsid)
> +				pr_info(
> +"BTRFS: device with virtual fsid %pU (real fsid %pU) devid %llu transid %llu %s scanned by %s (%d)\n",
> +					disk_super->fsid,
> +					disk_super->metadata_uuid, devid,
> +					found_transid, path, current->comm,
> +					task_pid_nr(current));
> +			else
> +				pr_info(
>   	"BTRFS: device fsid %pU devid %llu transid %llu %s scanned by %s (%d)\n",
> -				disk_super->fsid, devid, found_transid, path,
> -				current->comm, task_pid_nr(current));
> +					disk_super->fsid, devid, found_transid,
> +					path, current->comm,
> +					task_pid_nr(current));
> +		}
>
>   	} else if (!device->name || strcmp(device->name->str, path)) {
>   		/*
> @@ -1348,8 +1402,8 @@ int btrfs_forget_devices(dev_t devt)
>    * and we are not allowed to call set_blocksize during the scan. The superblock
>    * is read via pagecache
>    */
> -struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
> -					   void *holder)
> +struct btrfs_device *btrfs_scan_one_device(const struct btrfs_scan_info *info,
> +					   fmode_t flags, void *holder)
>   {
>   	struct btrfs_super_block *disk_super;
>   	bool new_device_added = false;
> @@ -1377,7 +1431,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
>   	 * values temporarily, as the device paths of the fsid are the only
>   	 * required information for assembling the volume.
>   	 */
> -	bdev = blkdev_get_by_path(path, flags, holder);
> +	bdev = blkdev_get_by_path(info->path, flags, holder);
>   	if (IS_ERR(bdev))
>   		return ERR_CAST(bdev);
>
> @@ -1394,7 +1448,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
>   		goto error_bdev_put;
>   	}
>
> -	device = device_list_add(path, disk_super, &new_device_added);
> +	device = device_list_add(info, disk_super, &new_device_added);
>   	if (!IS_ERR(device) && new_device_added)
>   		btrfs_free_stale_devices(device->devt, device);
>
> @@ -2390,6 +2444,12 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
>
>   	args->devid = btrfs_stack_device_id(&disk_super->dev_item);
>   	memcpy(args->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE);
> +
> +	/*
> +	 * Notice that the virtual_fsid feature is not handled here; device
> +	 * removal/replace is instead forbidden when such feature is in-use,
> +	 * but this note is for future users of btrfs_get_dev_args_from_path().
> +	 */
>   	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
>   		memcpy(args->fsid, disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>   	else
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 7e51f2238f72..f2354e8288f9 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -73,6 +73,11 @@ enum btrfs_raid_types {
>   #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
>   #define BTRFS_DEV_STATE_NO_READA	(5)
>
> +struct btrfs_scan_info {
> +	const char *path;
> +	bool vfsid;
> +};
> +
>   struct btrfs_zoned_device_info;
>
>   struct btrfs_device {
> @@ -278,6 +283,7 @@ struct btrfs_fs_devices {
>   	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
>   	u8 metadata_uuid[BTRFS_FSID_SIZE];
>   	bool fsid_change;
> +	bool virtual_fsid;
>   	struct list_head fs_list;
>
>   	/*
> @@ -537,7 +543,7 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
>   void btrfs_mapping_tree_free(struct extent_map_tree *tree);
>   int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>   		       fmode_t flags, void *holder);
> -struct btrfs_device *btrfs_scan_one_device(const char *path,
> +struct btrfs_device *btrfs_scan_one_device(const struct btrfs_scan_info *info,
>   					   fmode_t flags, void *holder);
>   int btrfs_forget_devices(dev_t devt);
>   void btrfs_close_devices(struct btrfs_fs_devices *fs_devices);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-04 17:07 ` [PATCH 1/2] btrfs: Introduce the virtual_fsid feature Guilherme G. Piccoli
  2023-05-05  7:21   ` Qu Wenruo
@ 2023-05-05 13:18   ` David Sterba
  2023-05-05 16:18     ` Guilherme G. Piccoli
  1 sibling, 1 reply; 38+ messages in thread
From: David Sterba @ 2023-05-05 13:18 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns, nborisov

On Thu, May 04, 2023 at 02:07:07PM -0300, Guilherme G. Piccoli wrote:
> Btrfs doesn't currently support to mount 2 different devices holding the
> same filesystem - the fsid is used as a unique identifier in the driver.
> This case is supported though in some other common filesystems, like
> ext4; one of the reasons for which is not trivial supporting this case
> on btrfs is due to its multi-device filesystem nature, native RAID, etc.
> 
> Supporting the same-fsid mounts has the advantage of allowing btrfs to
> be used in A/B partitioned devices, like mobile phones or the Steam Deck
> for example. Without this support, it's not safe for users to keep the
> same "image version" in both A and B partitions, a setup that is quite
> common for development, for example. Also, as a big bonus, it allows fs
> integrity check based on block devices for RO devices (whereas currently
> it is required that both have different fsid, breaking the block device
> hash comparison).
> 
> Such same-fsid mounting is hereby added through the usage of the
> mount option "virtual_fsid" - when such option is used, btrfs generates
> a random fsid for the filesystem and leverages the metadata_uuid
> infrastructure (introduced by [0]) to enable the usage of this secondary
> virtual fsid. But differently from the regular metadata_uuid flag, this
> is not written into the disk superblock - effectively, this is a spoofed
> fsid approach that enables the same filesystem in different devices to
> appear as different filesystems to btrfs on runtime.

Have you evaluated if the metadata_uuid could be used for that? It is
stored on the filesystem image, but could you adapt the usecase to set
the UUID before mount manually (in tooling)?

The metadata_uuid is lightweight and meant to change the appearance of
the fs regarding uuids, verly close to what you describe. Adding yet
another uuid does not seem right so I'm first asking if and in what way
the metadata_uuid could be extended.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05  7:21   ` Qu Wenruo
@ 2023-05-05 13:38     ` David Sterba
  2023-05-08 11:27       ` Anand Jain
  2023-05-05 15:51     ` Guilherme G. Piccoli
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: David Sterba @ 2023-05-05 13:38 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Guilherme G. Piccoli, linux-btrfs, clm, josef, dsterba,
	linux-fsdevel, kernel, kernel-dev, vivek, ludovico.denittis,
	johns, nborisov

On Fri, May 05, 2023 at 03:21:35PM +0800, Qu Wenruo wrote:
> On 2023/5/5 01:07, Guilherme G. Piccoli wrote:
> > Btrfs doesn't currently support to mount 2 different devices holding the
> > same filesystem - the fsid is used as a unique identifier in the driver.
> > This case is supported though in some other common filesystems, like
> > ext4; one of the reasons for which is not trivial supporting this case
> > on btrfs is due to its multi-device filesystem nature, native RAID, etc.
> 
> Exactly, the biggest problem is the multi-device support.
> 
> Btrfs needs to search and assemble all devices of a multi-device
> filesystem, which is normally handled by things like LVM/DMraid, thus
> other traditional fses won't need to bother that.
> 
> >
> > Supporting the same-fsid mounts has the advantage of allowing btrfs to
> > be used in A/B partitioned devices, like mobile phones or the Steam Deck
> > for example. Without this support, it's not safe for users to keep the
> > same "image version" in both A and B partitions, a setup that is quite
> > common for development, for example. Also, as a big bonus, it allows fs
> > integrity check based on block devices for RO devices (whereas currently
> > it is required that both have different fsid, breaking the block device
> > hash comparison).
> >
> > Such same-fsid mounting is hereby added through the usage of the
> > mount option "virtual_fsid" - when such option is used, btrfs generates
> > a random fsid for the filesystem and leverages the metadata_uuid
> > infrastructure (introduced by [0]) to enable the usage of this secondary
> > virtual fsid. But differently from the regular metadata_uuid flag, this
> > is not written into the disk superblock - effectively, this is a spoofed
> > fsid approach that enables the same filesystem in different devices to
> > appear as different filesystems to btrfs on runtime.
> 
> I would prefer a much simpler but more explicit method.
> 
> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
> 
> By this, we can avoid multiple meanings of the same super member, nor
> need any special mount option.
> Remember, mount option is never a good way to enable/disable a new feature.
> 
> The better method to enable/disable a feature should be mkfs and btrfstune.
> 
> Then go mostly the same of your patch, but maybe with something extra:
> 
> - Disbale multi-dev code
>    Include device add/replace/removal, this is already done in your
>    patch.
> 
> - Completely skip device scanning
>    I see no reason to keep btrfs with SINGLE_DEV feature to be added to
>    the device list at all.
>    It only needs to be scanned at mount time, and never be kept in the
>    in-memory device list.

This is actually a good point, we can do that already. As a conterpart
to 5f58d783fd7823 ("btrfs: free device in btrfs_close_devices for a
single device filesystem") that drops single device from the list,
single fs devices wouldn't be added to the list but some checks could be
still done like superblock validation for eventual error reporting.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05  7:21   ` Qu Wenruo
  2023-05-05 13:38     ` David Sterba
@ 2023-05-05 15:51     ` Guilherme G. Piccoli
  2023-05-05 22:15       ` Qu Wenruo
  2023-05-05 17:34     ` Goffredo Baroncelli
  2023-07-05 22:52     ` Guilherme G. Piccoli
  3 siblings, 1 reply; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-05 15:51 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov

On 05/05/2023 04:21, Qu Wenruo wrote:
> [...]
> Exactly, the biggest problem is the multi-device support.
> 
> Btrfs needs to search and assemble all devices of a multi-device
> filesystem, which is normally handled by things like LVM/DMraid, thus
> other traditional fses won't need to bother that.

Hi Qu, thanks a bunch for your feedback, and for validating my
understanding of the issue!


>  [...]
> 
> I would prefer a much simpler but more explicit method.
> 
> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
> 
> By this, we can avoid multiple meanings of the same super member, nor
> need any special mount option.
> Remember, mount option is never a good way to enable/disable a new feature.
> 
> The better method to enable/disable a feature should be mkfs and btrfstune.
> 
> Then go mostly the same of your patch, but maybe with something extra:
> 
> - Disbale multi-dev code
>    Include device add/replace/removal, this is already done in your
>    patch.
> 
> - Completely skip device scanning
>    I see no reason to keep btrfs with SINGLE_DEV feature to be added to
>    the device list at all.
>    It only needs to be scanned at mount time, and never be kept in the
>    in-memory device list.
> 

This seems very interesting, but I'm a bit confused on how that would
work with 2 identical filesystem images mounted at the same time.

Imagine you have 2 devices, /dev/sda1 and /dev/sda2 holding the exact
same image, with the SINGLE_DEV feature set. They are identical, and
IIUC no matter if we skip scanning or disable any multi-device approach,
in the end both have the *same* fsid. How do we track this in the btrfs
code now? Once we try to mount the second one, it'll try to add the same
entity to the fs_uuids list...

That's the problem I faced when investigating the code and why the
proposal is to "spoof" the fsid with some random generated one.

Also, one more question: why do you say "Remember, mount option is never
a good way to enable/disable a new feature"? I'm not expert in
filesystems (by far heh), so I'm curious to fully understand your
point-of-view.

From my naive viewpoint, seems a mount option is "cheaper" than
introducing a new feature in the OS that requires changes on btrfs
userspace applications as well as to track incompatibilities in
different kernel versions.

Thanks again,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05 13:18   ` David Sterba
@ 2023-05-05 16:18     ` Guilherme G. Piccoli
  2023-05-05 23:00       ` David Sterba
  0 siblings, 1 reply; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-05 16:18 UTC (permalink / raw)
  To: dsterba
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns, Qu Wenruo

On 05/05/2023 10:18, David Sterba wrote:
> [...]
> Have you evaluated if the metadata_uuid could be used for that? It is
> stored on the filesystem image, but could you adapt the usecase to set
> the UUID before mount manually (in tooling)?
> 
> The metadata_uuid is lightweight and meant to change the appearance of
> the fs regarding uuids, verly close to what you describe. Adding yet
> another uuid does not seem right so I'm first asking if and in what way
> the metadata_uuid could be extended.

Hi David, thanks for your suggestion!

It might be possible, it seems a valid suggestion. But worth notice that
we cannot modify the FS at all. That's why I've implemented the feature
in a way it "fakes" the fsid for the driver, as a mount option, but
nothing changes in the FS.

The images on Deck are read-only. So, by using the metadata_uuid purely,
can we mount 2 identical images at the same time *not modifying* the
filesystem in any way? If it's possible, then we have only to implement
the skip scanning idea from Qu in the other thread (or else ioclt scans
would prevent mounting them).

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-04 21:09     ` Goffredo Baroncelli
@ 2023-05-05 16:21       ` Guilherme G. Piccoli
  0 siblings, 0 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-05 16:21 UTC (permalink / raw)
  To: kreijack
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns

On 04/05/2023 18:09, Goffredo Baroncelli wrote:
> [...]
>> Is there any missing step from my side, or mounting both devices is
>> really a limitation when using the forget option?
>>
> 
>  From my limited BTRFS internal knowledge, I think that your patches
> takes the correct approach: using the "metadata_uuid" code to allow
> two filesystems with the same uuid to exist at the same time.
> 
> [...] 
>> How btrfs would know it is a case for single-device filesystem? In other
>> words: how would we distinguish between the cases we want to auto-forget
>> before mounting, and the cases in which this behavior is undesired?
> 
> If I remember correctly in the super-block there is the number of disks
> that compose the filesystem. If the count is 1, it should be safe to
> forget-before-mount the filesystem (or to not store in the cache
> after a scan)

Thanks Goffredo, for the clarifications =)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-05  5:16 ` Anand Jain
@ 2023-05-05 16:27   ` Guilherme G. Piccoli
  2023-05-05 17:37     ` Goffredo Baroncelli
  2023-05-05 18:15     ` Vivek Dasmohapatra
  0 siblings, 2 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-05 16:27 UTC (permalink / raw)
  To: Anand Jain, johns, vivek, ludovico.denittis
  Cc: clm, josef, linux-btrfs, dsterba, linux-fsdevel, kernel, kernel-dev

On 05/05/2023 02:16, Anand Jain wrote:
> [...]
>>
>> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
> 
> Confused about your requirement: 2 identical filesystems mounted 
> simultaneously or just one at a time? Latter works. Bugs were fixed.

Hi Anand, apologies - in fact, in this old-ish thread I mentioned we
need to mount one at a time, and this corresponds for the majority of
the use case. BUT...it seems that for the installing step we require to
have *both* mounted at the same time for a while, so it was a change in
the requirement since last analysis, and this is really what we
implemented here.


> 
> Have you considered using the btrfs seed device feature to avoid 
> sacrificing 50% capacity? Create read-only seed device as golden image, 
> add writable device on top. Example:
> 
>    $ btrfstune -S1 /dev/rdonly-golden-img
>    $ mount /dev/rdonly-golden-img /btrfs
>    $ btrfs dev add /dev/rw-dev /btrfs
>    $ mount -o remount,rw /dev/rw-dev /btrfs
> 
> To switch golden image:
> 
>    $ btrfs dev del /dev/rdonly-golden-img /btrfs
>    $ umount /btrfs
>    $ btrfstune -S1 /dev/rw-dev
> 

Yeah, I'm aware that btrfs has some features that might fit and could
even save space, but due to some requirements on Deck it's not possible
to use them.

I'll defer a more detailed response for John / Vivek / Ludovico, that
are aware of the use case in a detail level I'm not, since they designed
the installation / update path from the ground up.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05  7:21   ` Qu Wenruo
  2023-05-05 13:38     ` David Sterba
  2023-05-05 15:51     ` Guilherme G. Piccoli
@ 2023-05-05 17:34     ` Goffredo Baroncelli
  2023-05-05 22:31       ` Qu Wenruo
  2023-07-05 22:52     ` Guilherme G. Piccoli
  3 siblings, 1 reply; 38+ messages in thread
From: Goffredo Baroncelli @ 2023-05-05 17:34 UTC (permalink / raw)
  To: Qu Wenruo, Guilherme G. Piccoli, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov

On 05/05/2023 09.21, Qu Wenruo wrote:
> 
> I would prefer a much simpler but more explicit method.
> 
> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.

It is not clear to me if we need that.

I don't understand in what checking for SINGLE_DEV is different from
btrfs_super_block.disks_num == 1.

Let me to argument:

I see two scenarios:
1) mount two different fs with the same UUID NOT at the same time:
This could be done now with small change in the kernel:
- we need to NOT store the data of a filesystem when a disk is
   scanned IF it is composed by only one disk
- after the unmount we need to discard the data too (checking again
   that the filesystem is composed by only one disk)

No limit is needed to add/replace a disk. Of course after a disk is
added a filesystem with the same UUID cannot be mounted without a
full cycle of --forget.

I have to point out that this problem would be easily solved in
userspace if we switch from the current model where the disks are
scanned asynchronously (udev which call btrfs dev scan) to a model
where the disk are scanned at mount time by a mount.btrfs helper.

A mount.btrfs helper, also could be a place to put some more clear error
message like "we cannot mount this filesystem because one disk of a
raid5 is missing, try passing -o degraded"
or "we cannot mount this filesystem because we detect a brain split
problem" ....

2) mount two different fs with the same UUID at the SAME time:
This is a bit more complicated; we need to store a virtual UUID
somewhere.

However sometime we need to use the real fsid (during a write),
and sometime we need to use the virtual_uuid (e.g. for /sys/fs/btrfs/<uuid>)

Both in 1) and 2) we need to/it is enough to have btrfs_super_block.disks_num == 1
In the case 2) using a virtual_uuid mount option will prevent
to add a disk.


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-05 16:27   ` Guilherme G. Piccoli
@ 2023-05-05 17:37     ` Goffredo Baroncelli
  2023-05-05 18:15     ` Vivek Dasmohapatra
  1 sibling, 0 replies; 38+ messages in thread
From: Goffredo Baroncelli @ 2023-05-05 17:37 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Anand Jain, johns, vivek, ludovico.denittis
  Cc: clm, josef, linux-btrfs, dsterba, linux-fsdevel, kernel, kernel-dev

On 05/05/2023 18.27, Guilherme G. Piccoli wrote:
> On 05/05/2023 02:16, Anand Jain wrote:
>> [...]
>>>
>>> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
>>
>> Confused about your requirement: 2 identical filesystems mounted
>> simultaneously or just one at a time? Latter works. Bugs were fixed.
> 
> Hi Anand, apologies - in fact, in this old-ish thread I mentioned we
> need to mount one at a time, and this corresponds for the majority of
> the use case. BUT...it seems that for the installing step we require to
> have *both* mounted at the same time for a while, so it was a change in
> the requirement since last analysis, and this is really what we
> implemented here.

What if the different images have different uuid from the begin ?


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-05 16:27   ` Guilherme G. Piccoli
  2023-05-05 17:37     ` Goffredo Baroncelli
@ 2023-05-05 18:15     ` Vivek Dasmohapatra
  1 sibling, 0 replies; 38+ messages in thread
From: Vivek Dasmohapatra @ 2023-05-05 18:15 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Anand Jain, johns, ludovico.denittis
  Cc: clm, josef, linux-btrfs, dsterba, linux-fsdevel, kernel, kernel-dev

On 05/05/2023 17:27, Guilherme G. Piccoli wrote:
> On 05/05/2023 02:16, Anand Jain wrote:

[cut]

> I'll defer a more detailed response for John / Vivek / Ludovico, that
> are aware of the use case in a detail level I'm not, since they designed
> the installation / update path from the ground up.
> 

The OS images are entirely independent. The goal is that you could
completely corrupt slot A and it would have no impact on the bootability
of slot B.

So, yes, we sacrifice space but as a trade off we get robustness which
is more important to us.

=========================================================================

When a new OS image is delivered, the normal flow is this (simplified):

While booted on slot A (for example) the update process is started.

Our client fetches the most recent image from the update server.

This is delivered as a block level diff between the image you
have and the image you want.

The partitions that are allocated to slot B have the new data written
into them.

As a final step, the root fs of the new slot is mounted and a couple of
initialisation steps are completed (mostly writing config into the
common boot partition: The slot B partitions contents are not modified
as a result of this).

The system is rebooted. If all goes well slot B is booted and becomes
the primary (current) image.

If it fails for some reason, the bootloader will (either automatically
or by user intervention) go back to booting slot A.

Note that other than the final mount to update the common boot partition
with information about the new image we don't care at all about the
contents or even the type of the filesystems we have delivered (and even
then all we care about is that we _can_ mount it, not what it is).
===========================================================================

Now normally this is not a problem: If the new image is not the same as
the current one we will have written entirely new filesystems into
the B partitions and there is no conflict.

However if the user wishes or needs to reinstall a fresh copy of the
_current_ image (for whatever reason: maybe the current image is damaged
in some way and they need to so a factory reset) then with btrfs in the
mix this breaks down:

Since btrfs won't (at present) tolerate a second fs with the same fsuuid
we have to check that the user is not installing the same image on both
slots.

If the user has a broken image which is also the latest release and
needs to recover we have to artificially select an _older_ image, put
that on slot B. boot into that, then the user needs to boot that and
upgrade _again_ to get a repaired A slot.

This sort of works but isn't a great user experience and introduces an
artificial restriction - suddenly the images _do_ affect one another.

If the user subverts our safety checks (or we mess up and put the same
image on both slots) then suddenly the whole system becomes unbootable
which is less than ideal.

Hope that clarifies the situation and explains why we care.




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05 15:51     ` Guilherme G. Piccoli
@ 2023-05-05 22:15       ` Qu Wenruo
  2023-05-08 22:49         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 38+ messages in thread
From: Qu Wenruo @ 2023-05-05 22:15 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Qu Wenruo, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov



On 2023/5/5 23:51, Guilherme G. Piccoli wrote:
> On 05/05/2023 04:21, Qu Wenruo wrote:
>> [...]
>> Exactly, the biggest problem is the multi-device support.
>>
>> Btrfs needs to search and assemble all devices of a multi-device
>> filesystem, which is normally handled by things like LVM/DMraid, thus
>> other traditional fses won't need to bother that.
> 
> Hi Qu, thanks a bunch for your feedback, and for validating my
> understanding of the issue!
> 
> 
>>   [...]
>>
>> I would prefer a much simpler but more explicit method.
>>
>> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
>>
>> By this, we can avoid multiple meanings of the same super member, nor
>> need any special mount option.
>> Remember, mount option is never a good way to enable/disable a new feature.
>>
>> The better method to enable/disable a feature should be mkfs and btrfstune.
>>
>> Then go mostly the same of your patch, but maybe with something extra:
>>
>> - Disbale multi-dev code
>>     Include device add/replace/removal, this is already done in your
>>     patch.
>>
>> - Completely skip device scanning
>>     I see no reason to keep btrfs with SINGLE_DEV feature to be added to
>>     the device list at all.
>>     It only needs to be scanned at mount time, and never be kept in the
>>     in-memory device list.
>>
> 
> This seems very interesting, but I'm a bit confused on how that would
> work with 2 identical filesystem images mounted at the same time.
> 
> Imagine you have 2 devices, /dev/sda1 and /dev/sda2 holding the exact
> same image, with the SINGLE_DEV feature set. They are identical, and
> IIUC no matter if we skip scanning or disable any multi-device approach,
> in the end both have the *same* fsid. How do we track this in the btrfs
> code now? Once we try to mount the second one, it'll try to add the same
> entity to the fs_uuids list...

My bad, I forgot to mention that, if we hit such SINGLE_DEV fses, we 
should also not add them to the fs_uuids list either.

So the fs_uuids list conflicts would not be a problem at all.

> 
> That's the problem I faced when investigating the code and why the
> proposal is to "spoof" the fsid with some random generated one.
> 
> Also, one more question: why do you say "Remember, mount option is never
> a good way to enable/disable a new feature"? I'm not expert in
> filesystems (by far heh), so I'm curious to fully understand your
> point-of-view.

We had a bad example in the past, free space tree (aka, v2 space cache).

It's initially a pretty convenient way to enable the new feature, but 
now it's making a lot of new features, which can depend on v2 cache, 
more complex to handle those old mount options.

The compatibility matrix would only grow, and all the (mostly one-time) 
logic need to be implemented in kernel.

So in the long run, we prefer offline convert tool.

Thanks,
Qu

> 
>  From my naive viewpoint, seems a mount option is "cheaper" than
> introducing a new feature in the OS that requires changes on btrfs
> userspace applications as well as to track incompatibilities in
> different kernel versions.
> 
> Thanks again,
> 
> 
> Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05 17:34     ` Goffredo Baroncelli
@ 2023-05-05 22:31       ` Qu Wenruo
  2023-05-06 17:30         ` Goffredo Baroncelli
  0 siblings, 1 reply; 38+ messages in thread
From: Qu Wenruo @ 2023-05-05 22:31 UTC (permalink / raw)
  To: kreijack, Qu Wenruo, Guilherme G. Piccoli, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov



On 2023/5/6 01:34, Goffredo Baroncelli wrote:
> On 05/05/2023 09.21, Qu Wenruo wrote:
>>
>> I would prefer a much simpler but more explicit method.
>>
>> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
> 
> It is not clear to me if we need that.
> 
> I don't understand in what checking for SINGLE_DEV is different from
> btrfs_super_block.disks_num == 1.

Because disks_num == 1 doesn't exclude the ability to add new disks in 
the future.

Without that new SINGLE_DEV compat_ro, we should still do the regular 
device scan.

> 
> Let me to argument:
> 
> I see two scenarios:
> 1) mount two different fs with the same UUID NOT at the same time:
> This could be done now with small change in the kernel:
> - we need to NOT store the data of a filesystem when a disk is
>    scanned IF it is composed by only one disk
> - after the unmount we need to discard the data too (checking again
>    that the filesystem is composed by only one disk)
> 
> No limit is needed to add/replace a disk. Of course after a disk is
> added a filesystem with the same UUID cannot be mounted without a
> full cycle of --forget.

The problem is, what if:

- Both btrfs have single disk
- Both btrfs have the same fsid
- Both btrfs have been mounted
- Then one of btrfs is going to add a new disk

We either:

- Don't scan nor trace the device/fsid anyway
   Then after unmount, the new two disks btrfs can not be mounted
   and need extra scan/forgot/whatever to reassemble list.
   And that would also cause fsid conflicts if not properly handled
   between the single and multi disk btrfs.

- Scan and record the fsid/device at device add time
   This means we should reject the device add.
   This can sometimes cause confusion to the end user, just because they
   have mounted another fs, now they can not add a new device.

   And this is going to change device add code path quite hugely.
   We currently expects all device scan/trace thing done way before
   mount.
   Such huge change can lead to hidden bugs.

To me, neither is good to the end users.

A SINGLE_DEV feature would reject the corner case in a way more 
user-friendly and clear way.

   With SINGLE_DEV feature, just no dev add/replace/delete no matter
   what.


> 
> I have to point out that this problem would be easily solved in
> userspace if we switch from the current model where the disks are
> scanned asynchronously (udev which call btrfs dev scan) to a model
> where the disk are scanned at mount time by a mount.btrfs helper.
> 
> A mount.btrfs helper, also could be a place to put some more clear error
> message like "we cannot mount this filesystem because one disk of a
> raid5 is missing, try passing -o degraded"
> or "we cannot mount this filesystem because we detect a brain split
> problem" ....
> 
> 2) mount two different fs with the same UUID at the SAME time:
> This is a bit more complicated; we need to store a virtual UUID
> somewhere.
> 
> However sometime we need to use the real fsid (during a write),
> and sometime we need to use the virtual_uuid (e.g. for 
> /sys/fs/btrfs/<uuid>)

Another thing is, we already have too many uuids.

Some are unavoidable like fsid and device uuid.

But I still prefer not to add a new layer of unnecessary uuids.

Thanks,
Qu

> 
> Both in 1) and 2) we need to/it is enough to have 
> btrfs_super_block.disks_num == 1
> In the case 2) using a virtual_uuid mount option will prevent
> to add a disk.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05 16:18     ` Guilherme G. Piccoli
@ 2023-05-05 23:00       ` David Sterba
  2023-05-08 22:59         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 38+ messages in thread
From: David Sterba @ 2023-05-05 23:00 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: dsterba, linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns, Qu Wenruo

On Fri, May 05, 2023 at 01:18:56PM -0300, Guilherme G. Piccoli wrote:
> On 05/05/2023 10:18, David Sterba wrote:
> > [...]
> > Have you evaluated if the metadata_uuid could be used for that? It is
> > stored on the filesystem image, but could you adapt the usecase to set
> > the UUID before mount manually (in tooling)?
> > 
> > The metadata_uuid is lightweight and meant to change the appearance of
> > the fs regarding uuids, verly close to what you describe. Adding yet
> > another uuid does not seem right so I'm first asking if and in what way
> > the metadata_uuid could be extended.
> 
> Hi David, thanks for your suggestion!
> 
> It might be possible, it seems a valid suggestion. But worth notice that
> we cannot modify the FS at all. That's why I've implemented the feature
> in a way it "fakes" the fsid for the driver, as a mount option, but
> nothing changes in the FS.
> 
> The images on Deck are read-only. So, by using the metadata_uuid purely,
> can we mount 2 identical images at the same time *not modifying* the
> filesystem in any way? If it's possible, then we have only to implement
> the skip scanning idea from Qu in the other thread (or else ioclt scans
> would prevent mounting them).

Ok, I see, the device is read-only. The metadata_uuid is now set on an
unmounted filesystem and we don't have any semantics for a mount option.

If there's an equivalent mount option (let's say metadata_uuid for
compatibility) with the same semantics as if set offline, on the first
commit the metadata_uuid would be written.

The question is if this would be sane for read-only devices. You've
implemented the uuid on the metadata_uuid base but named it differently,
but this effectively means that metadata_uuid could work on read-only
devices too, but with some necessary updates to the device scanning.

From the use case perspective this should work, the virtual uuid would
basically be the metadata_uuid set and on a read-only device. The
problems start in the state transitions in the device tracking, we had
some bugs there and the code is hard to grasp. For that I'd very much
vote for using the metadata_uuid but we can provide an interface on top
of that to make it work.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05 22:31       ` Qu Wenruo
@ 2023-05-06 17:30         ` Goffredo Baroncelli
  2023-05-06 23:00           ` Qu Wenruo
  0 siblings, 1 reply; 38+ messages in thread
From: Goffredo Baroncelli @ 2023-05-06 17:30 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, Guilherme G. Piccoli, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov

On 06/05/2023 00.31, Qu Wenruo wrote:
> 
> 
> On 2023/5/6 01:34, Goffredo Baroncelli wrote:
>> On 05/05/2023 09.21, Qu Wenruo wrote:
>>>
>>> I would prefer a much simpler but more explicit method.
>>>
>>> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
>>
>> It is not clear to me if we need that.
>>
>> I don't understand in what checking for SINGLE_DEV is different from
>> btrfs_super_block.disks_num == 1.
> 
> Because disks_num == 1 doesn't exclude the ability to add new disks in the future.
> 
> Without that new SINGLE_DEV compat_ro, we should still do the regular device scan.
> 
>>
>> Let me to argument:
>>
>> I see two scenarios:
>> 1) mount two different fs with the same UUID NOT at the same time:
>> This could be done now with small change in the kernel:
>> - we need to NOT store the data of a filesystem when a disk is
>>    scanned IF it is composed by only one disk
>> - after the unmount we need to discard the data too (checking again
>>    that the filesystem is composed by only one disk)
>>
>> No limit is needed to add/replace a disk. Of course after a disk is
>> added a filesystem with the same UUID cannot be mounted without a
>> full cycle of --forget.
> 
> The problem is, what if:
> 
> - Both btrfs have single disk
> - Both btrfs have the same fsid
> - Both btrfs have been mounted
> - Then one of btrfs is going to add a new disk
> 

Why the user should be prevented to add a disk. It may
a aware user that want to do that, knowing the possible consequence.


[...]

> 
> - Scan and record the fsid/device at device add time
>    This means we should reject the device add.
>    This can sometimes cause confusion to the end user, just because they
>    have mounted another fs, now they can not add a new device.

I agree about the confusion. But not about the cause.
The confusion is due to the poor communication between the kernel (where the error is
detected) and the user. Now the only solution is to look at dmesg.

Allowing to mount two filesystem with the same UUID is technically possible.
There are some constraints bat are well defined; there are some corner case
but are well defined (like add a device to a single device filesystem).

However when we hit one of these corner case, now it is difficult to inform
the user about the problem. Because now the user has to look at the dmesg
to understand what is the problem.

This is the real problem. The communication. And we have a lot of these
problem (like mount a multi device filesystem without some disk, or with a
brain slip problem, or better inform the user if it is possible the
mount -o degraded).

Look this in another way; what if we had a mount.btrfs helper that:

- look for the devices which compose the filesystem at mounting time
- check if these devices are consistent:
	- if the fs is one-device, we don't need further check; otherwise check
	- if all the devices are present
	- if all the device have the same transaction id
	- if ...
   if any of the check above fails, write an error message; otherwise
- register the device(s) in the kernel or (better) pass it in the mount command
   line
- finally mount the filesystem


No need of strange flag; all the corner case can be handle safely and avoid
any confusion to the user.






> 
>    And this is going to change device add code path quite hugely.
>    We currently expects all device scan/trace thing done way before
>    mount.
>    Such huge change can lead to hidden bugs.
> 
> To me, neither is good to the end users.
> 
> A SINGLE_DEV feature would reject the corner case in a way more user-friendly and clear way.
> 
>    With SINGLE_DEV feature, just no dev add/replace/delete no matter
>    what.
> 
> 
>>
>> I have to point out that this problem would be easily solved in
>> userspace if we switch from the current model where the disks are
>> scanned asynchronously (udev which call btrfs dev scan) to a model
>> where the disk are scanned at mount time by a mount.btrfs helper.
>>
>> A mount.btrfs helper, also could be a place to put some more clear error
>> message like "we cannot mount this filesystem because one disk of a
>> raid5 is missing, try passing -o degraded"
>> or "we cannot mount this filesystem because we detect a brain split
>> problem" ....
>>
>> 2) mount two different fs with the same UUID at the SAME time:
>> This is a bit more complicated; we need to store a virtual UUID
>> somewhere.
>>
>> However sometime we need to use the real fsid (during a write),
>> and sometime we need to use the virtual_uuid (e.g. for /sys/fs/btrfs/<uuid>)
> 
> Another thing is, we already have too many uuids.
> 
> Some are unavoidable like fsid and device uuid.
> 
> But I still prefer not to add a new layer of unnecessary uuids.
> 
> Thanks,
> Qu
> 
>>
>> Both in 1) and 2) we need to/it is enough to have btrfs_super_block.disks_num == 1
>> In the case 2) using a virtual_uuid mount option will prevent
>> to add a disk.
> 

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-06 17:30         ` Goffredo Baroncelli
@ 2023-05-06 23:00           ` Qu Wenruo
  0 siblings, 0 replies; 38+ messages in thread
From: Qu Wenruo @ 2023-05-06 23:00 UTC (permalink / raw)
  To: kreijack, Qu Wenruo, Guilherme G. Piccoli, linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov



On 2023/5/7 01:30, Goffredo Baroncelli wrote:
> On 06/05/2023 00.31, Qu Wenruo wrote:
>>
>>
>> On 2023/5/6 01:34, Goffredo Baroncelli wrote:
>>> On 05/05/2023 09.21, Qu Wenruo wrote:
>>>>
>>>> I would prefer a much simpler but more explicit method.
>>>>
>>>> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
>>>
>>> It is not clear to me if we need that.
>>>
>>> I don't understand in what checking for SINGLE_DEV is different from
>>> btrfs_super_block.disks_num == 1.
>>
>> Because disks_num == 1 doesn't exclude the ability to add new disks in
>> the future.
>>
>> Without that new SINGLE_DEV compat_ro, we should still do the regular
>> device scan.
>>
>>>
>>> Let me to argument:
>>>
>>> I see two scenarios:
>>> 1) mount two different fs with the same UUID NOT at the same time:
>>> This could be done now with small change in the kernel:
>>> - we need to NOT store the data of a filesystem when a disk is
>>>    scanned IF it is composed by only one disk
>>> - after the unmount we need to discard the data too (checking again
>>>    that the filesystem is composed by only one disk)
>>>
>>> No limit is needed to add/replace a disk. Of course after a disk is
>>> added a filesystem with the same UUID cannot be mounted without a
>>> full cycle of --forget.
>>
>> The problem is, what if:
>>
>> - Both btrfs have single disk
>> - Both btrfs have the same fsid
>> - Both btrfs have been mounted
>> - Then one of btrfs is going to add a new disk
>>
>
> Why the user should be prevented to add a disk. It may
> a aware user that want to do that, knowing the possible consequence.
>
>
> [...]
>
>>
>> - Scan and record the fsid/device at device add time
>>    This means we should reject the device add.
>>    This can sometimes cause confusion to the end user, just because they
>>    have mounted another fs, now they can not add a new device.
>
> I agree about the confusion. But not about the cause.
> The confusion is due to the poor communication between the kernel (where
> the error is
> detected) and the user. Now the only solution is to look at dmesg.
>
> Allowing to mount two filesystem with the same UUID is technically
> possible.
> There are some constraints bat are well defined; there are some corner case
> but are well defined (like add a device to a single device filesystem).
>
> However when we hit one of these corner case, now it is difficult to inform
> the user about the problem. Because now the user has to look at the dmesg
> to understand what is the problem.
>
> This is the real problem. The communication. And we have a lot of these
> problem (like mount a multi device filesystem without some disk, or with a
> brain slip problem, or better inform the user if it is possible the
> mount -o degraded).
>
> Look this in another way; what if we had a mount.btrfs helper that:
>
> - look for the devices which compose the filesystem at mounting time
> - check if these devices are consistent:
>      - if the fs is one-device, we don't need further check; otherwise
> check
>      - if all the devices are present
>      - if all the device have the same transaction id
>      - if ...
>    if any of the check above fails, write an error message; otherwise
> - register the device(s) in the kernel or (better) pass it in the mount
> command
>    line
> - finally mount the filesystem
>
>
> No need of strange flag; all the corner case can be handle safely and avoid
> any confusion to the user.

Just handling corner cases is not good for maintaining already.

So nope, I don't believe this is the good way to go.

Furthermore, if someone really found the same fsid limits is a problem,
they should go with the SINGLE_DEV features, not relying on some extra
corner cases handling, which may or may not be supported on certain kernels.

On the other handle, a compat_ro flag is clear dedicated way to tell the
compatibility.

In fact, if you're going to introduce an unexpected behavior change,
it's way more strange to do without any compat_ro/ro/incompact flags.

Thanks,
Qu
>
>
>
>
>
>
>>
>>    And this is going to change device add code path quite hugely.
>>    We currently expects all device scan/trace thing done way before
>>    mount.
>>    Such huge change can lead to hidden bugs.
>>
>> To me, neither is good to the end users.
>>
>> A SINGLE_DEV feature would reject the corner case in a way more
>> user-friendly and clear way.
>>
>>    With SINGLE_DEV feature, just no dev add/replace/delete no matter
>>    what.
>>
>>
>>>
>>> I have to point out that this problem would be easily solved in
>>> userspace if we switch from the current model where the disks are
>>> scanned asynchronously (udev which call btrfs dev scan) to a model
>>> where the disk are scanned at mount time by a mount.btrfs helper.
>>>
>>> A mount.btrfs helper, also could be a place to put some more clear error
>>> message like "we cannot mount this filesystem because one disk of a
>>> raid5 is missing, try passing -o degraded"
>>> or "we cannot mount this filesystem because we detect a brain split
>>> problem" ....
>>>
>>> 2) mount two different fs with the same UUID at the SAME time:
>>> This is a bit more complicated; we need to store a virtual UUID
>>> somewhere.
>>>
>>> However sometime we need to use the real fsid (during a write),
>>> and sometime we need to use the virtual_uuid (e.g. for
>>> /sys/fs/btrfs/<uuid>)
>>
>> Another thing is, we already have too many uuids.
>>
>> Some are unavoidable like fsid and device uuid.
>>
>> But I still prefer not to add a new layer of unnecessary uuids.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Both in 1) and 2) we need to/it is enough to have
>>> btrfs_super_block.disks_num == 1
>>> In the case 2) using a virtual_uuid mount option will prevent
>>> to add a disk.
>>
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-04 17:07 [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Guilherme G. Piccoli
                   ` (3 preceding siblings ...)
  2023-05-05  5:16 ` Anand Jain
@ 2023-05-07 23:10 ` Dave Chinner
  2023-05-08 22:45   ` Guilherme G. Piccoli
  2023-08-03 15:47 ` Guilherme G. Piccoli
  5 siblings, 1 reply; 38+ messages in thread
From: Dave Chinner @ 2023-05-07 23:10 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns, nborisov

On Thu, May 04, 2023 at 02:07:06PM -0300, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt of supporting same fsid mounting on btrfs.
> Currently, we cannot reliably mount same fsid filesystems even one at
> a time in btrfs, but if users want to mount them at the same time, it's
> pretty much impossible. Other filesystems like ext4 are capable of that.
> 
> The goal is to allow systems with A/B partitioning scheme (like the
> Steam Deck console or various mobile devices) to be able to hold
> the same filesystem image in both partitions; it also allows to have
> block device level check for filesystem integrity - this is used in the
> Steam Deck image installation, to check if the current read-only image
> is pristine. A bit more details are provided in the following ML thread:
> 
> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
> 
> The mechanism used to achieve it is based in the metadata_uuid feature,
> leveraging such code infrastructure for that. The patches are based on
> kernel 6.3 and were tested both in a virtual machine as well as in the
> Steam Deck. Comments, suggestions and overall feedback is greatly
> appreciated - thanks in advance!

So how does this work if someone needs to mount 3 copies of the same
filesystem at the same time?

On XFS, we have the "nouuid" mount option which skips the duplicate
UUID checking done at mount time so that multiple snapshots or
images of the same filesystem can be mounted at the same time. This
means we don't get the same filesystem mounted by accident, but also
allows all the cases we know about where multiple versions of the
filesystem need to be mounted at the same time.

I know, fs UUIDs are used differently in btrfs vs XFS, but it would
be nice for users if filesystems shared the same interfaces for
doing the same sort of management operations...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05 13:38     ` David Sterba
@ 2023-05-08 11:27       ` Anand Jain
  2023-05-08 11:50         ` Qu Wenruo
  0 siblings, 1 reply; 38+ messages in thread
From: Anand Jain @ 2023-05-08 11:27 UTC (permalink / raw)
  To: dsterba, Qu Wenruo
  Cc: Guilherme G. Piccoli, linux-btrfs, clm, josef, dsterba,
	linux-fsdevel, kernel, kernel-dev, vivek, ludovico.denittis,
	johns, nborisov

On 05/05/2023 21:38, David Sterba wrote:
> On Fri, May 05, 2023 at 03:21:35PM +0800, Qu Wenruo wrote:
>> On 2023/5/5 01:07, Guilherme G. Piccoli wrote:
>>> Btrfs doesn't currently support to mount 2 different devices holding the
>>> same filesystem - the fsid is used as a unique identifier in the driver.
>>> This case is supported though in some other common filesystems, like
>>> ext4; one of the reasons for which is not trivial supporting this case
>>> on btrfs is due to its multi-device filesystem nature, native RAID, etc.
>>
>> Exactly, the biggest problem is the multi-device support.
>>
>> Btrfs needs to search and assemble all devices of a multi-device
>> filesystem, which is normally handled by things like LVM/DMraid, thus
>> other traditional fses won't need to bother that.
>>
>>>
>>> Supporting the same-fsid mounts has the advantage of allowing btrfs to
>>> be used in A/B partitioned devices, like mobile phones or the Steam Deck
>>> for example. Without this support, it's not safe for users to keep the
>>> same "image version" in both A and B partitions, a setup that is quite
>>> common for development, for example. Also, as a big bonus, it allows fs
>>> integrity check based on block devices for RO devices (whereas currently
>>> it is required that both have different fsid, breaking the block device
>>> hash comparison).
>>>
>>> Such same-fsid mounting is hereby added through the usage of the
>>> mount option "virtual_fsid" - when such option is used, btrfs generates
>>> a random fsid for the filesystem and leverages the metadata_uuid
>>> infrastructure (introduced by [0]) to enable the usage of this secondary
>>> virtual fsid. But differently from the regular metadata_uuid flag, this
>>> is not written into the disk superblock - effectively, this is a spoofed
>>> fsid approach that enables the same filesystem in different devices to
>>> appear as different filesystems to btrfs on runtime.
>>
>> I would prefer a much simpler but more explicit method.
>>
>> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
>>
>> By this, we can avoid multiple meanings of the same super member, nor
>> need any special mount option.
>> Remember, mount option is never a good way to enable/disable a new feature.
>>
>> The better method to enable/disable a feature should be mkfs and btrfstune.
>>
>> Then go mostly the same of your patch, but maybe with something extra:
>>
>> - Disbale multi-dev code
>>     Include device add/replace/removal, this is already done in your
>>     patch.
>>
>> - Completely skip device scanning
>>     I see no reason to keep btrfs with SINGLE_DEV feature to be added to
>>     the device list at all.
>>     It only needs to be scanned at mount time, and never be kept in the
>>     in-memory device list.
> 
> This is actually a good point, we can do that already. As a conterpart
> to 5f58d783fd7823 ("btrfs: free device in btrfs_close_devices for a
> single device filesystem") that drops single device from the list,
> single fs devices wouldn't be added to the list but some checks could be
> still done like superblock validation for eventual error reporting.

Something similar occurred to me earlier. However, even for a single
device, we need to perform the scan because there may be an unfinished
replace target from a previous reboot, or a sprout Btrfs filesystem may
have a single seed device. If we were to make an exception for replace
targets and seed devices, it would only complicate the scan logic, which
goes against our attempt to simplify it.

Thanks, Anand



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-08 11:27       ` Anand Jain
@ 2023-05-08 11:50         ` Qu Wenruo
  2023-05-11 11:51           ` David Sterba
  0 siblings, 1 reply; 38+ messages in thread
From: Qu Wenruo @ 2023-05-08 11:50 UTC (permalink / raw)
  To: Anand Jain, dsterba
  Cc: Guilherme G. Piccoli, linux-btrfs, clm, josef, dsterba,
	linux-fsdevel, kernel, kernel-dev, vivek, ludovico.denittis,
	johns, nborisov



On 2023/5/8 19:27, Anand Jain wrote:
> On 05/05/2023 21:38, David Sterba wrote:
>> On Fri, May 05, 2023 at 03:21:35PM +0800, Qu Wenruo wrote:
>>> On 2023/5/5 01:07, Guilherme G. Piccoli wrote:
>>>> Btrfs doesn't currently support to mount 2 different devices holding
>>>> the
>>>> same filesystem - the fsid is used as a unique identifier in the
>>>> driver.
>>>> This case is supported though in some other common filesystems, like
>>>> ext4; one of the reasons for which is not trivial supporting this case
>>>> on btrfs is due to its multi-device filesystem nature, native RAID,
>>>> etc.
>>>
>>> Exactly, the biggest problem is the multi-device support.
>>>
>>> Btrfs needs to search and assemble all devices of a multi-device
>>> filesystem, which is normally handled by things like LVM/DMraid, thus
>>> other traditional fses won't need to bother that.
>>>
>>>>
>>>> Supporting the same-fsid mounts has the advantage of allowing btrfs to
>>>> be used in A/B partitioned devices, like mobile phones or the Steam
>>>> Deck
>>>> for example. Without this support, it's not safe for users to keep the
>>>> same "image version" in both A and B partitions, a setup that is quite
>>>> common for development, for example. Also, as a big bonus, it allows fs
>>>> integrity check based on block devices for RO devices (whereas
>>>> currently
>>>> it is required that both have different fsid, breaking the block device
>>>> hash comparison).
>>>>
>>>> Such same-fsid mounting is hereby added through the usage of the
>>>> mount option "virtual_fsid" - when such option is used, btrfs generates
>>>> a random fsid for the filesystem and leverages the metadata_uuid
>>>> infrastructure (introduced by [0]) to enable the usage of this
>>>> secondary
>>>> virtual fsid. But differently from the regular metadata_uuid flag, this
>>>> is not written into the disk superblock - effectively, this is a
>>>> spoofed
>>>> fsid approach that enables the same filesystem in different devices to
>>>> appear as different filesystems to btrfs on runtime.
>>>
>>> I would prefer a much simpler but more explicit method.
>>>
>>> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
>>>
>>> By this, we can avoid multiple meanings of the same super member, nor
>>> need any special mount option.
>>> Remember, mount option is never a good way to enable/disable a new
>>> feature.
>>>
>>> The better method to enable/disable a feature should be mkfs and
>>> btrfstune.
>>>
>>> Then go mostly the same of your patch, but maybe with something extra:
>>>
>>> - Disbale multi-dev code
>>>     Include device add/replace/removal, this is already done in your
>>>     patch.
>>>
>>> - Completely skip device scanning
>>>     I see no reason to keep btrfs with SINGLE_DEV feature to be added to
>>>     the device list at all.
>>>     It only needs to be scanned at mount time, and never be kept in the
>>>     in-memory device list.
>>
>> This is actually a good point, we can do that already. As a conterpart
>> to 5f58d783fd7823 ("btrfs: free device in btrfs_close_devices for a
>> single device filesystem") that drops single device from the list,
>> single fs devices wouldn't be added to the list but some checks could be
>> still done like superblock validation for eventual error reporting.
>
> Something similar occurred to me earlier. However, even for a single
> device, we need to perform the scan because there may be an unfinished
> replace target from a previous reboot, or a sprout Btrfs filesystem may
> have a single seed device. If we were to make an exception for replace
> targets and seed devices, it would only complicate the scan logic, which
> goes against our attempt to simplify it.

If we go SINGLE_DEV compat_ro flags, then no such problem at all, we can
easily reject any multi-dev features from such SINGLE_DEV fs.

Thanks,
Qu

>
> Thanks, Anand
>
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-07 23:10 ` Dave Chinner
@ 2023-05-08 22:45   ` Guilherme G. Piccoli
  0 siblings, 0 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-08 22:45 UTC (permalink / raw)
  To: Dave Chinner, Qu Wenruo, dsterba
  Cc: linux-btrfs, clm, josef, linux-fsdevel, kernel, kernel-dev,
	vivek, ludovico.denittis, johns

On 07/05/2023 20:10, Dave Chinner wrote:
> [...]
> So how does this work if someone needs to mount 3 copies of the same
> filesystem at the same time?
> 
> On XFS, we have the "nouuid" mount option which skips the duplicate
> UUID checking done at mount time so that multiple snapshots or
> images of the same filesystem can be mounted at the same time. This
> means we don't get the same filesystem mounted by accident, but also
> allows all the cases we know about where multiple versions of the
> filesystem need to be mounted at the same time.
> 
> I know, fs UUIDs are used differently in btrfs vs XFS, but it would
> be nice for users if filesystems shared the same interfaces for
> doing the same sort of management operations...
> 
> Cheers,
> 
> Dave.

Hi Dave, thanks for the information / suggestion.

I see no reason for the virtual_fsid fails with 3 or more devices; the
idea is that it creates random fsids for the every device in which you
mount with the flag, so shouldn't be a problem.

Of course renaming to "nouuid" would be completely fine (at least for
me) to keep consistency among filesystems; the only question that
remains is if we should go with a mount option or the compat_ro flag as
strongly suggest by Qu.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05 22:15       ` Qu Wenruo
@ 2023-05-08 22:49         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-08 22:49 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs, dsterba
  Cc: clm, josef, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov

On 05/05/2023 19:15, Qu Wenruo wrote:
> [...] 
>> Imagine you have 2 devices, /dev/sda1 and /dev/sda2 holding the exact
>> same image, with the SINGLE_DEV feature set. They are identical, and
>> IIUC no matter if we skip scanning or disable any multi-device approach,
>> in the end both have the *same* fsid. How do we track this in the btrfs
>> code now? Once we try to mount the second one, it'll try to add the same
>> entity to the fs_uuids list...
> 
> My bad, I forgot to mention that, if we hit such SINGLE_DEV fses, we 
> should also not add them to the fs_uuids list either.
> 
> So the fs_uuids list conflicts would not be a problem at all.

Awesome, thanks for clarifying Qu! Now I understand it =)

> [...]
>> Also, one more question: why do you say "Remember, mount option is never
>> a good way to enable/disable a new feature"? I'm not expert in
>> filesystems (by far heh), so I'm curious to fully understand your
>> point-of-view.
> 
> We had a bad example in the past, free space tree (aka, v2 space cache).
> 
> It's initially a pretty convenient way to enable the new feature, but 
> now it's making a lot of new features, which can depend on v2 cache, 
> more complex to handle those old mount options.
> 
> The compatibility matrix would only grow, and all the (mostly one-time) 
> logic need to be implemented in kernel.
> 
> So in the long run, we prefer offline convert tool.

OK, I understand your point. I guess I could rewrite it to make use of
such compat_ro flag, it'd be fine. *Personally* (thinking as an user), I
much rather have mount options, I think it's consistent with other
filesystems and doesn't require specific btrfs tooling usage...

But of course I'll defer the decision to the maintainers!

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05 23:00       ` David Sterba
@ 2023-05-08 22:59         ` Guilherme G. Piccoli
  2023-05-08 23:18           ` Qu Wenruo
  0 siblings, 1 reply; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-08 22:59 UTC (permalink / raw)
  To: dsterba, Dave Chinner, Qu Wenruo, Qu Wenruo
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns

On 05/05/2023 20:00, David Sterba wrote:
> [...]
>> Hi David, thanks for your suggestion!
>>
>> It might be possible, it seems a valid suggestion. But worth notice that
>> we cannot modify the FS at all. That's why I've implemented the feature
>> in a way it "fakes" the fsid for the driver, as a mount option, but
>> nothing changes in the FS.
>>
>> The images on Deck are read-only. So, by using the metadata_uuid purely,
>> can we mount 2 identical images at the same time *not modifying* the
>> filesystem in any way? If it's possible, then we have only to implement
>> the skip scanning idea from Qu in the other thread (or else ioclt scans
>> would prevent mounting them).
> 
> Ok, I see, the device is read-only. The metadata_uuid is now set on an
> unmounted filesystem and we don't have any semantics for a mount option.
> 
> If there's an equivalent mount option (let's say metadata_uuid for
> compatibility) with the same semantics as if set offline, on the first
> commit the metadata_uuid would be written.
> 
> The question is if this would be sane for read-only devices. You've
> implemented the uuid on the metadata_uuid base but named it differently,
> but this effectively means that metadata_uuid could work on read-only
> devices too, but with some necessary updates to the device scanning.
> 
> From the use case perspective this should work, the virtual uuid would
> basically be the metadata_uuid set and on a read-only device. The
> problems start in the state transitions in the device tracking, we had
> some bugs there and the code is hard to grasp. For that I'd very much
> vote for using the metadata_uuid but we can provide an interface on top
> of that to make it work.

OK, being completely honest here, I couldn't parse fully what you're
proposing - I blame it to my lack of knowledge on btrfs, so apologies heh

Could you clarify it a bit more? Are you suggesting we somewhat rework
"metadata_uuid", to kinda overload its meaning to be able to accomplish
this same-fsid mounting using "metadata_uuid" purely?

I see that we seem to have 3 proposals here:

(a) The compat_ro flag from Qu;

(b) Your idea (that requires some clarification for my fully
understanding - thanks in advance!);

(c) Renaming the mount option "virtual_fsid" to "nouuid" to keep
filesystem consistency, like XFS (courtesy of Dave Chinner) - please
correct me here if I misunderstood you Dave =)

I'd like to thank you all for the suggestions, and I'm willing to follow
the preferred one - as long we have a consensus / blessing from the
maintainers, I'm happy to rework this as the best possible approach for
btrfs.

Also, what about patch 2, does it make sense or should we kinda "embed"
the idea of scan skipping into the same-fsid mounting? Per my current
understanding, the idea (a) from Qu includes/fixes the scan thing and
makes patch 2 unnecessary.

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-08 22:59         ` Guilherme G. Piccoli
@ 2023-05-08 23:18           ` Qu Wenruo
  2023-05-08 23:49             ` Guilherme G. Piccoli
  0 siblings, 1 reply; 38+ messages in thread
From: Qu Wenruo @ 2023-05-08 23:18 UTC (permalink / raw)
  To: Guilherme G. Piccoli, dsterba, Dave Chinner, Qu Wenruo
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns



On 2023/5/9 06:59, Guilherme G. Piccoli wrote:
> On 05/05/2023 20:00, David Sterba wrote:
>> [...]
>>> Hi David, thanks for your suggestion!
>>>
>>> It might be possible, it seems a valid suggestion. But worth notice that
>>> we cannot modify the FS at all. That's why I've implemented the feature
>>> in a way it "fakes" the fsid for the driver, as a mount option, but
>>> nothing changes in the FS.
>>>
>>> The images on Deck are read-only. So, by using the metadata_uuid purely,
>>> can we mount 2 identical images at the same time *not modifying* the
>>> filesystem in any way? If it's possible, then we have only to implement
>>> the skip scanning idea from Qu in the other thread (or else ioclt scans
>>> would prevent mounting them).
>>
>> Ok, I see, the device is read-only. The metadata_uuid is now set on an
>> unmounted filesystem and we don't have any semantics for a mount option.
>>
>> If there's an equivalent mount option (let's say metadata_uuid for
>> compatibility) with the same semantics as if set offline, on the first
>> commit the metadata_uuid would be written.
>>
>> The question is if this would be sane for read-only devices. You've
>> implemented the uuid on the metadata_uuid base but named it differently,
>> but this effectively means that metadata_uuid could work on read-only
>> devices too, but with some necessary updates to the device scanning.
>>
>>  From the use case perspective this should work, the virtual uuid would
>> basically be the metadata_uuid set and on a read-only device. The
>> problems start in the state transitions in the device tracking, we had
>> some bugs there and the code is hard to grasp. For that I'd very much
>> vote for using the metadata_uuid but we can provide an interface on top
>> of that to make it work.
>
> OK, being completely honest here, I couldn't parse fully what you're
> proposing - I blame it to my lack of knowledge on btrfs, so apologies heh
>
> Could you clarify it a bit more? Are you suggesting we somewhat rework
> "metadata_uuid", to kinda overload its meaning to be able to accomplish
> this same-fsid mounting using "metadata_uuid" purely?
>
> I see that we seem to have 3 proposals here:
>
> (a) The compat_ro flag from Qu;
>
> (b) Your idea (that requires some clarification for my fully
> understanding - thanks in advance!);
>
> (c) Renaming the mount option "virtual_fsid" to "nouuid" to keep
> filesystem consistency, like XFS (courtesy of Dave Chinner) - please
> correct me here if I misunderstood you Dave =)

To me, (a) and (c) don't conflict at all.

We can allow "nouuid" only to work with SINGLE_DEV compat_ro.

That compat_ro flags is more like a better guarantee that the fs will
never have more disks.

As even with SINGLE_DEV compat_ro flags, we may still want some checks
to prevent the same fs being RW mounted at different instances, which
can cause other problems, thus dedicated "nouuid" may still be needed.

Thanks,
Qu

>
> I'd like to thank you all for the suggestions, and I'm willing to follow
> the preferred one - as long we have a consensus / blessing from the
> maintainers, I'm happy to rework this as the best possible approach for
> btrfs.
>
> Also, what about patch 2, does it make sense or should we kinda "embed"
> the idea of scan skipping into the same-fsid mounting? Per my current
> understanding, the idea (a) from Qu includes/fixes the scan thing and
> makes patch 2 unnecessary.
>
> Thanks,
>
>
> Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-08 23:18           ` Qu Wenruo
@ 2023-05-08 23:49             ` Guilherme G. Piccoli
  2023-05-09  0:02               ` Qu Wenruo
  0 siblings, 1 reply; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-08 23:49 UTC (permalink / raw)
  To: Qu Wenruo, dsterba, Dave Chinner, Qu Wenruo
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns

On 08/05/2023 20:18, Qu Wenruo wrote:
> [...]
>> I see that we seem to have 3 proposals here:
>>
>> (a) The compat_ro flag from Qu;
>>
>> (b) Your idea (that requires some clarification for my fully
>> understanding - thanks in advance!);
>>
>> (c) Renaming the mount option "virtual_fsid" to "nouuid" to keep
>> filesystem consistency, like XFS (courtesy of Dave Chinner) - please
>> correct me here if I misunderstood you Dave =)
> 
> To me, (a) and (c) don't conflict at all.
> 
> We can allow "nouuid" only to work with SINGLE_DEV compat_ro.
> 
> That compat_ro flags is more like a better guarantee that the fs will
> never have more disks.
> 
> As even with SINGLE_DEV compat_ro flags, we may still want some checks
> to prevent the same fs being RW mounted at different instances, which
> can cause other problems, thus dedicated "nouuid" may still be needed.
> 
> Thanks,
> Qu

Hey Qu, I confess now I'm a bit confused heh

The whole idea of (a) was to *not* use a mount option, right?! Per my
understanding of your objections in this thread, you're not into a mount
option for this same-fsid feature (based on a bad previous experience,
as you explained).

If we're keeping the "nouuid" mount option, why we'd require the
compat_ro flag? Or vice-versa: having the compat_ro flag, why we'd need
the mount option?

Thanks in advance for clarifications,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-08 23:49             ` Guilherme G. Piccoli
@ 2023-05-09  0:02               ` Qu Wenruo
  0 siblings, 0 replies; 38+ messages in thread
From: Qu Wenruo @ 2023-05-09  0:02 UTC (permalink / raw)
  To: Guilherme G. Piccoli, dsterba, Dave Chinner, Qu Wenruo
  Cc: linux-btrfs, clm, josef, dsterba, linux-fsdevel, kernel,
	kernel-dev, vivek, ludovico.denittis, johns



On 2023/5/9 07:49, Guilherme G. Piccoli wrote:
> On 08/05/2023 20:18, Qu Wenruo wrote:
>> [...]
>>> I see that we seem to have 3 proposals here:
>>>
>>> (a) The compat_ro flag from Qu;
>>>
>>> (b) Your idea (that requires some clarification for my fully
>>> understanding - thanks in advance!);
>>>
>>> (c) Renaming the mount option "virtual_fsid" to "nouuid" to keep
>>> filesystem consistency, like XFS (courtesy of Dave Chinner) - please
>>> correct me here if I misunderstood you Dave =)
>>
>> To me, (a) and (c) don't conflict at all.
>>
>> We can allow "nouuid" only to work with SINGLE_DEV compat_ro.
>>
>> That compat_ro flags is more like a better guarantee that the fs will
>> never have more disks.
>>
>> As even with SINGLE_DEV compat_ro flags, we may still want some checks
>> to prevent the same fs being RW mounted at different instances, which
>> can cause other problems, thus dedicated "nouuid" may still be needed.
>>
>> Thanks,
>> Qu
>
> Hey Qu, I confess now I'm a bit confused heh
>
> The whole idea of (a) was to *not* use a mount option, right?! Per my
> understanding of your objections in this thread, you're not into a mount
> option for this same-fsid feature (based on a bad previous experience,
> as you explained).

My bad, I initially thought there would be some extra checks inside VFS
layer rejecting the same fs.

But after checking the xfs code, it looks like it's handling the nouuid
internally.

Thus no need for nouuid mount option if we go compat_ro.

Although I would still like a nouuid mount option, which is only valid
if the fs has the compat_ro flags, to provide a generic behavior just
like XFS.

Thanks,
Qu
>
> If we're keeping the "nouuid" mount option, why we'd require the
> compat_ro flag? Or vice-versa: having the compat_ro flag, why we'd need
> the mount option? >
> Thanks in advance for clarifications,
>
>
> Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-08 11:50         ` Qu Wenruo
@ 2023-05-11 11:51           ` David Sterba
  2023-05-11 14:12             ` Anand Jain
  2023-05-14 21:25             ` Guilherme G. Piccoli
  0 siblings, 2 replies; 38+ messages in thread
From: David Sterba @ 2023-05-11 11:51 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Anand Jain, Guilherme G. Piccoli, linux-btrfs, clm, josef,
	dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov

On Mon, May 08, 2023 at 07:50:55PM +0800, Qu Wenruo wrote:
> On 2023/5/8 19:27, Anand Jain wrote:
> > On 05/05/2023 21:38, David Sterba wrote:
> >> On Fri, May 05, 2023 at 03:21:35PM +0800, Qu Wenruo wrote:
> >>> On 2023/5/5 01:07, Guilherme G. Piccoli wrote:
> >> This is actually a good point, we can do that already. As a conterpart
> >> to 5f58d783fd7823 ("btrfs: free device in btrfs_close_devices for a
> >> single device filesystem") that drops single device from the list,
> >> single fs devices wouldn't be added to the list but some checks could be
> >> still done like superblock validation for eventual error reporting.
> >
> > Something similar occurred to me earlier. However, even for a single
> > device, we need to perform the scan because there may be an unfinished
> > replace target from a previous reboot, or a sprout Btrfs filesystem may
> > have a single seed device. If we were to make an exception for replace
> > targets and seed devices, it would only complicate the scan logic, which
> > goes against our attempt to simplify it.
> 
> If we go SINGLE_DEV compat_ro flags, then no such problem at all, we can
> easily reject any multi-dev features from such SINGLE_DEV fs.

With the scanning complications that Anand mentions the compat_ro flag
might make more sense, with all the limitations but allowing a safe use
of the duplicated UUIDs.

The flag would have to be set at mkfs time or by btrfsune on an
unmounted filesystem. Doing that on a mounted filesystem is possible too
but brings problems with updating the state of scanned device,
potentially ongoing operations like dev-replace and more.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-11 11:51           ` David Sterba
@ 2023-05-11 14:12             ` Anand Jain
  2023-05-14 21:25             ` Guilherme G. Piccoli
  1 sibling, 0 replies; 38+ messages in thread
From: Anand Jain @ 2023-05-11 14:12 UTC (permalink / raw)
  To: dsterba, Qu Wenruo
  Cc: Guilherme G. Piccoli, linux-btrfs, clm, josef, dsterba,
	linux-fsdevel, kernel, kernel-dev, vivek, ludovico.denittis,
	johns, nborisov

On 11/5/23 19:51, David Sterba wrote:
> On Mon, May 08, 2023 at 07:50:55PM +0800, Qu Wenruo wrote:
>> On 2023/5/8 19:27, Anand Jain wrote:
>>> On 05/05/2023 21:38, David Sterba wrote:
>>>> On Fri, May 05, 2023 at 03:21:35PM +0800, Qu Wenruo wrote:
>>>>> On 2023/5/5 01:07, Guilherme G. Piccoli wrote:
>>>> This is actually a good point, we can do that already. As a conterpart
>>>> to 5f58d783fd7823 ("btrfs: free device in btrfs_close_devices for a
>>>> single device filesystem") that drops single device from the list,
>>>> single fs devices wouldn't be added to the list but some checks could be
>>>> still done like superblock validation for eventual error reporting.
>>>
>>> Something similar occurred to me earlier. However, even for a single
>>> device, we need to perform the scan because there may be an unfinished
>>> replace target from a previous reboot, or a sprout Btrfs filesystem may
>>> have a single seed device. If we were to make an exception for replace
>>> targets and seed devices, it would only complicate the scan logic, which
>>> goes against our attempt to simplify it.
>>
>> If we go SINGLE_DEV compat_ro flags, then no such problem at all, we can
>> easily reject any multi-dev features from such SINGLE_DEV fs.
> 

For a single device, if we remove device replacement and seeding for a
specific flag, scanning is unnecessary.

> With the scanning complications that Anand mentions the compat_ro flag
> might make more sense, with all the limitations but allowing a safe use
> of the duplicated UUIDs.
> 
> The flag would have to be set at mkfs time or by btrfsune on an
> unmounted filesystem. Doing that on a mounted filesystem is possible too
> but brings problems with updating the state of scanned device,
> potentially ongoing operations like dev-replace and more.

Setting the flag at mkfs time is preferable, in my opinion. We could
still support the btrfstune method and/or online method later.

While we are here, I'm taking the opportunity to consolidate the
scattered metadata UUID checking, which has been a long-standing
goal of mine to clean up. This will make adding multi-UUID support
cleaner. If you have any ideas, please share.

Thanks, Anand

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-11 11:51           ` David Sterba
  2023-05-11 14:12             ` Anand Jain
@ 2023-05-14 21:25             ` Guilherme G. Piccoli
  1 sibling, 0 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-05-14 21:25 UTC (permalink / raw)
  To: dsterba
  Cc: Anand Jain, linux-btrfs, clm, josef, dsterba, linux-fsdevel,
	kernel, kernel-dev, vivek, ludovico.denittis, johns, Qu Wenruo,
	Qu Wenruo

On 11/05/2023 08:51, David Sterba wrote:
> [...]
> With the scanning complications that Anand mentions the compat_ro flag
> might make more sense, with all the limitations but allowing a safe use
> of the duplicated UUIDs.
> 
> The flag would have to be set at mkfs time or by btrfsune on an
> unmounted filesystem. Doing that on a mounted filesystem is possible too
> but brings problems with updating the state of scanned device,
> potentially ongoing operations like dev-replace and more.

Hi David, thank you! So, would it make sense to also have a
"nouuid"-like mount option along with the compat_ro flag? I'm saying
this because I'm considering the "old"/existing SteamOS images heh

If we go only with the compat_ro flag, we'll only be able to mount 2
images at same time iff they have it set, meaning it'll only work for
newer images.

Anyway, I'm glad to implement the compat_ro flag code - I'll be out some
weeks on holidays, and will retake this work as soon as I'm back.

Thanks all that provided feedback / suggestions in this thread!
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-05-05  7:21   ` Qu Wenruo
                       ` (2 preceding siblings ...)
  2023-05-05 17:34     ` Goffredo Baroncelli
@ 2023-07-05 22:52     ` Guilherme G. Piccoli
  2023-07-06  0:53       ` Qu Wenruo
  3 siblings, 1 reply; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-07-05 22:52 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, linux-btrfs, Anand Jain

On 05/05/2023 04:21, Qu Wenruo wrote:
> [...]
> I would prefer a much simpler but more explicit method.
> 
> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
> 
> By this, we can avoid multiple meanings of the same super member, nor
> need any special mount option.
> Remember, mount option is never a good way to enable/disable a new feature.
> 
> The better method to enable/disable a feature should be mkfs and btrfstune.
> 
> Then go mostly the same of your patch, but maybe with something extra:
> 
> - Disbale multi-dev code
>    Include device add/replace/removal, this is already done in your
>    patch.
> 
> - Completely skip device scanning
>    I see no reason to keep btrfs with SINGLE_DEV feature to be added to
>    the device list at all.
>    It only needs to be scanned at mount time, and never be kept in the
>    in-memory device list.
> 

Hi Qu, I'm implementing this compat_ro idea of yours, but I'd like to
ask your input in some "design decisions" I'm facing here.

(a) I've skipped the device_list_add() step of adding the recent created
fs_devices struct to fs_uuids list, but I kept the btrfs_device creation
step. With that, the mount of two filesystems with same fsid fails..at
sysfs directory creation!

Of course - because it tries to add the same folder name to
/sys/fs/btrfs/ !!! I have some options here:

(I) Should I keep using a random generated fsid for single_dev devices,
in order we can mount many of them while not messing too much with the
code? I'd continue "piggybacking" on metadata_uuid idea if (I) is the
proper choice.

(II) Or maybe track down all fsid usage in the code (like this sysfs
case) and deal with that? In the sysfs case, we could change that folder
name to some other format, like fsid.NUM for single_dev devices, whereas
NUM is an incremental value for devices mounted with same fsid.

I'm not too fond of this alternative due to its complexity and "API"
breakage - userspace already expects /sys/fs/btrfs/ entries to be the fsid.

(III) Should we hide the filesystem from sysfs (and other potential
conflicts that come from same fsid mounting points)? Seems a hideous
approach, due to API breakage and bug potentials.

Maybe there are other choices, better than mine - lemme know if you have
some ideas!

Also, one last question/confirmation: you mentioned that "The better
method to enable/disable a feature should be mkfs" - you mean the same
way mkfs could be used to set features like "raid56" or "no-holes"?

By checking "mkfs.btrfs -O list-all", I don't see metadata_uuid for
example, which is confined to btrfstune it seems. I'm already modifying
btrfs-progs/mkfs, but since I'm emailing you, why not confirm, right? heh

Thanks again for the advice and suggestions - much appreciated!
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-07-05 22:52     ` Guilherme G. Piccoli
@ 2023-07-06  0:53       ` Qu Wenruo
  2023-07-06 22:32         ` Guilherme G. Piccoli
  0 siblings, 1 reply; 38+ messages in thread
From: Qu Wenruo @ 2023-07-06  0:53 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Qu Wenruo
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, linux-btrfs, Anand Jain



On 2023/7/6 06:52, Guilherme G. Piccoli wrote:
> On 05/05/2023 04:21, Qu Wenruo wrote:
>> [...]
>> I would prefer a much simpler but more explicit method.
>>
>> Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.
>>
>> By this, we can avoid multiple meanings of the same super member, nor
>> need any special mount option.
>> Remember, mount option is never a good way to enable/disable a new feature.
>>
>> The better method to enable/disable a feature should be mkfs and btrfstune.
>>
>> Then go mostly the same of your patch, but maybe with something extra:
>>
>> - Disbale multi-dev code
>>     Include device add/replace/removal, this is already done in your
>>     patch.
>>
>> - Completely skip device scanning
>>     I see no reason to keep btrfs with SINGLE_DEV feature to be added to
>>     the device list at all.
>>     It only needs to be scanned at mount time, and never be kept in the
>>     in-memory device list.
>>
>
> Hi Qu, I'm implementing this compat_ro idea of yours, but I'd like to
> ask your input in some "design decisions" I'm facing here.
>
> (a) I've skipped the device_list_add() step of adding the recent created
> fs_devices struct to fs_uuids list, but I kept the btrfs_device creation
> step. With that, the mount of two filesystems with same fsid fails..at
> sysfs directory creation!
>
> Of course - because it tries to add the same folder name to
> /sys/fs/btrfs/ !!! I have some options here:
>
> (I) Should I keep using a random generated fsid for single_dev devices,
> in order we can mount many of them while not messing too much with the
> code? I'd continue "piggybacking" on metadata_uuid idea if (I) is the
> proper choice.
>
> (II) Or maybe track down all fsid usage in the code (like this sysfs
> case) and deal with that? In the sysfs case, we could change that folder
> name to some other format, like fsid.NUM for single_dev devices, whereas
> NUM is an incremental value for devices mounted with same fsid.
>
> I'm not too fond of this alternative due to its complexity and "API"
> breakage - userspace already expects /sys/fs/btrfs/ entries to be the fsid.
>
> (III) Should we hide the filesystem from sysfs (and other potential
> conflicts that come from same fsid mounting points)? Seems a hideous
> approach, due to API breakage and bug potentials.

Personally speaking, I would go one of the following solution:

- Keep the sysfs, but adds a refcount to the sysfs related structures
   If we still go register the sysfs interface, then we have to keep a
   refcount, or any of the same fsid got unmounted, we would remove the
   whole sysfs entry.

- Skip the sysfs entry completely for any fs with the new compat_ro flag
   This would your idea (III), but the sysfs interface itself is not that
   critical (we add and remove entries from time to time), so I believe
   it's feasible to hide the sysfs for certain features.

>
> Maybe there are other choices, better than mine - lemme know if you have
> some ideas!
>
> Also, one last question/confirmation: you mentioned that "The better
> method to enable/disable a feature should be mkfs" - you mean the same
> way mkfs could be used to set features like "raid56" or "no-holes"?

Yes.

>
> By checking "mkfs.btrfs -O list-all", I don't see metadata_uuid for
> example, which is confined to btrfstune it seems. I'm already modifying
> btrfs-progs/mkfs, but since I'm emailing you, why not confirm, right? heh

I'm not familiar with metadata_uuid, but there are similar features like
seeding, which is only available in btrfstune, but not in mkfs.

It's not that uncommon, but yeah, you have found something we should
improve on.

Thanks,
Qu

>
> Thanks again for the advice and suggestions - much appreciated!
> Cheers,
>
>
> Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature
  2023-07-06  0:53       ` Qu Wenruo
@ 2023-07-06 22:32         ` Guilherme G. Piccoli
  0 siblings, 0 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-07-06 22:32 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, dsterba, Anand Jain
  Cc: clm, josef, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, linux-btrfs

On 05/07/2023 21:53, Qu Wenruo wrote:
> [...]
> Personally speaking, I would go one of the following solution:
> 
> - Keep the sysfs, but adds a refcount to the sysfs related structures
>    If we still go register the sysfs interface, then we have to keep a
>    refcount, or any of the same fsid got unmounted, we would remove the
>    whole sysfs entry.
> 
> - Skip the sysfs entry completely for any fs with the new compat_ro flag
>    This would your idea (III), but the sysfs interface itself is not that
>    critical (we add and remove entries from time to time), so I believe
>    it's feasible to hide the sysfs for certain features.
> 

Hi Qu, thanks for you prompt response.
I've been trying to mess with btrfs sysfs to allow two same fsid
co-existing, without success. For each corner case I handle, two more
show-up heh

Seems quite tricky and error-prone to have this "special-casing" of
sysfs just to accommodate this feature.

Are you strongly against keeping the previous idea, of a spoofed/virtual
fsid, but applied to the compat_ro single_dev idea? This way, all of
this sysfs situation (and other potentially hidden corner cases) would
be avoided. That's like my suggestion (I).

David / Anand, any thoughts/ideas? Thanks in advance!


>> [...]
>> Also, one last question/confirmation: you mentioned that "The better
>> method to enable/disable a feature should be mkfs" - you mean the same
>> way mkfs could be used to set features like "raid56" or "no-holes"?
> 
> Yes.
> 
>> [...]
> I'm not familiar with metadata_uuid, but there are similar features like
> seeding, which is only available in btrfstune, but not in mkfs.
> 
> It's not that uncommon, but yeah, you have found something we should
> improve on.
> 

Thanks for confirming, I could implement it in both mkfs and btrfstune -
seems the more consistent path.
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs
  2023-05-04 17:07 [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Guilherme G. Piccoli
                   ` (4 preceding siblings ...)
  2023-05-07 23:10 ` Dave Chinner
@ 2023-08-03 15:47 ` Guilherme G. Piccoli
  5 siblings, 0 replies; 38+ messages in thread
From: Guilherme G. Piccoli @ 2023-08-03 15:47 UTC (permalink / raw)
  To: linux-btrfs
  Cc: clm, josef, dsterba, linux-fsdevel, kernel, kernel-dev, vivek,
	ludovico.denittis, johns, nborisov

On 04/05/2023 14:07, Guilherme G. Piccoli wrote:
> [...]

Hi folks, V2 sent here:
https://lore.kernel.org/linux-btrfs/20230803154453.1488248-1-gpiccoli@igalia.com/

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2023-08-03 15:47 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-04 17:07 [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Guilherme G. Piccoli
2023-05-04 17:07 ` [PATCH 1/2] btrfs: Introduce the virtual_fsid feature Guilherme G. Piccoli
2023-05-05  7:21   ` Qu Wenruo
2023-05-05 13:38     ` David Sterba
2023-05-08 11:27       ` Anand Jain
2023-05-08 11:50         ` Qu Wenruo
2023-05-11 11:51           ` David Sterba
2023-05-11 14:12             ` Anand Jain
2023-05-14 21:25             ` Guilherme G. Piccoli
2023-05-05 15:51     ` Guilherme G. Piccoli
2023-05-05 22:15       ` Qu Wenruo
2023-05-08 22:49         ` Guilherme G. Piccoli
2023-05-05 17:34     ` Goffredo Baroncelli
2023-05-05 22:31       ` Qu Wenruo
2023-05-06 17:30         ` Goffredo Baroncelli
2023-05-06 23:00           ` Qu Wenruo
2023-07-05 22:52     ` Guilherme G. Piccoli
2023-07-06  0:53       ` Qu Wenruo
2023-07-06 22:32         ` Guilherme G. Piccoli
2023-05-05 13:18   ` David Sterba
2023-05-05 16:18     ` Guilherme G. Piccoli
2023-05-05 23:00       ` David Sterba
2023-05-08 22:59         ` Guilherme G. Piccoli
2023-05-08 23:18           ` Qu Wenruo
2023-05-08 23:49             ` Guilherme G. Piccoli
2023-05-09  0:02               ` Qu Wenruo
2023-05-04 17:07 ` [PATCH 2/2] btrfs: Add module parameter to enable non-mount scan skipping Guilherme G. Piccoli
2023-05-04 19:28 ` [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs Goffredo Baroncelli
2023-05-04 20:10   ` Guilherme G. Piccoli
2023-05-04 21:09     ` Goffredo Baroncelli
2023-05-05 16:21       ` Guilherme G. Piccoli
2023-05-05  5:16 ` Anand Jain
2023-05-05 16:27   ` Guilherme G. Piccoli
2023-05-05 17:37     ` Goffredo Baroncelli
2023-05-05 18:15     ` Vivek Dasmohapatra
2023-05-07 23:10 ` Dave Chinner
2023-05-08 22:45   ` Guilherme G. Piccoli
2023-08-03 15:47 ` Guilherme G. Piccoli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.