* [RFC] btrfs: strategy to perform a rollback at boot time
@ 2020-07-21 20:33 Goffredo Baroncelli
  2020-07-21 20:33 ` [PATCH] btrfs: allow more subvol= option Goffredo Baroncelli
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Goffredo Baroncelli @ 2020-07-21 20:33 UTC
  To: linux-btrfs


Hi all,

this is an RFC to discuss an idea of mine: allowing a simple rollback of the
root filesystem at boot time.

The problem I want to solve is the following: DPKG is very slow on a
BTRFS filesystem, because it makes massive use of sync()/fsync() to
guarantee that the filesystem is always consistent, even in case of a
sudden shutdown.

The same approach could also be useful for RPM-based Linux distributions
(although RPM suffers less than DPKG).

A way to avoid the sync()/fsync() calls without losing the DPKG
guarantees is:
1) perform a snapshot of the root filesystem (the rollback one)
2) upgrade the filesystem without using sync/fsync
3) final (global) sync
4) destroy the rollback snapshot

If an unclean shutdown happens between 1) and 4), two subvolumes exist:
the 'main' one and the 'rollback' one (the snapshot taken before the
update). In this case, at boot time the system should mount the "rollback"
subvolume instead of the "main" one. Otherwise, after a clean shutdown, the
"rollback" subvolume doesn't exist and only the "main" one can be
mounted.
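The reasoning can be checked with a small model. This is not btrfs code: the enum and function names are hypothetical, "done" counts the completed steps of the list above, an unsynced upgrade is treated as possibly garbage, and the snapshot from step 1 is assumed durable once it has been created. Boot is assumed to prefer "rollback" and fall back to "main".

```c
#include <stdbool.h>

/* Hypothetical model, not btrfs code: what the root filesystem contains
 * after booting with the fallback order "rollback, then main". */
enum root_content { ROOT_OLD, ROOT_NEW, ROOT_GARBAGE };

/* done = last protocol step completed before power loss
 * (0 = before step 1, 4 = after step 4). */
static enum root_content boot_after_crash(int done)
{
	/* 1) creates the snapshot (assumed durable once created),
	 * 4) destroys it. */
	bool rollback_exists = done >= 1 && done < 4;
	/* 2) writes without fsync, so until the global sync in 3)
	 * "main" may be half-written. */
	enum root_content main_subvol = done < 1 ? ROOT_OLD
				      : done < 3 ? ROOT_GARBAGE : ROOT_NEW;

	/* the fallback order picks "rollback" whenever it exists */
	return rollback_exists ? ROOT_OLD : main_subvol;
}
```

The model never boots a half-written root. The price is the window between steps 3) and 4): the update is already on disk, but because the snapshot still exists the system comes back with the old root and the upgrade has to be run again.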

In [1] I discussed a way to implement steps 1 to 4 (OK, I missed
step 3).

What has been missing until now is an automatic way to mount the
rollback subvolume at boot time when it is present.

My idea is to allow multiple 'subvol=' options. In that case BTRFS tries the
given subvolumes in order until one of them can be mounted. For example,
with the kernel command line:

  linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro

the kernel first tries to mount the 'rollback' subvolume; if it doesn't
exist, the 'main' subvolume is mounted instead.
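As a rough userspace illustration of these semantics (plain libc, not kernel code): parse_subvols() collects every "subvol=" value in the order given, with the same limit of 5 used by the accompanying patch, and first_existing() returns the first candidate present on disk, a name lookup standing in for the actual mount attempt. Both functions are hypothetical; the patch itself parses with match_token()/match_strdup() and mounts with mount_subtree().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SUBVOL_NAMES_MAX 5	/* same limit as SUBVOL_NAMES_COUNT in the patch */

/* Collect the values of all "subvol=" options, in order. Returns how many
 * were found, or -E2BIG / -ENOMEM (nothing stays allocated on error).
 * Other options, including "subvolid=", are skipped. */
static int parse_subvols(const char *options, char *names[SUBVOL_NAMES_MAX])
{
	char *copy = strdup(options);
	char *save = NULL, *tok;
	int n = 0, err = 0;

	if (!copy)
		return -ENOMEM;
	for (tok = strtok_r(copy, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		if (strncmp(tok, "subvol=", 7) != 0)
			continue;
		if (n == SUBVOL_NAMES_MAX) {
			err = -E2BIG;
			break;
		}
		names[n] = strdup(tok + 7);
		if (!names[n]) {
			err = -ENOMEM;
			break;
		}
		n++;
	}
	free(copy);
	if (err) {
		while (n > 0) {
			n--;
			free(names[n]);
			names[n] = NULL;
		}
		return err;
	}
	return n;
}

/* Index of the first candidate that exists on disk, or -1 if none does:
 * "try each subvolume until the first one succeeds". */
static int first_existing(char *const names[], int n,
			  const char *const on_disk[], int n_on_disk)
{
	for (int i = 0; i < n; i++)
		for (int j = 0; j < n_on_disk; j++)
			if (strcmp(names[i], on_disk[j]) == 0)
				return i;
	return -1;
}
```

With "ro,subvol=rollback,subvolid=256,subvol=main", a disk holding only "main" yields index 1, and a disk holding both subvolumes yields index 0 ("rollback").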

Of course, after the mount the system has to clean up the subvolumes: if
a "rollback" subvolume exists, the system should destroy the "main" one
(which contains garbage) and rename "rollback" to "main". More precisely:

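	# assumption: run from the directory holding the subvolumes,
	# e.g. the top-level subvolume (subvolid=5) mounted separately
	if test -d "rollback"; then
		# leftover from a previously interrupted cleanup
		if test -d "old"; then
			btrfs sub del "old"
		fi
		# "main" holds the interrupted, possibly inconsistent update
		if test -d "main"; then
			mv "main" "old"
		fi
		mv "rollback" "main"
		# "old" exists only if "main" existed above
		if test -d "old"; then
			btrfs sub del "old"
		fi
	fi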

Comments are welcome
BR
G.Baroncelli

[1] http://lore.kernel.org/linux-btrfs/69396573-b5b3-b349-06f5-f5b74eb9720d@libero.it/

P.S.
I am wondering whether an idea like this could be applied to a single
file, e.g. an SQLite database that, instead of relying on sync/fsync,
creates a reflink copy of the file as "rollback" in case something goes
wrong... The ordering is preserved, not the durability.
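A rough sketch of that file-level variant, assuming Linux's FICLONE ioctl (see ioctl_ficlone(2)) on a filesystem with reflink support such as btrfs; elsewhere the ioctl fails and make_rollback() returns a negative errno. The helper names and the ".rollback" suffix are hypothetical, and skipping per-write fsync() is only safe under the ordering assumption above, which is not a general POSIX guarantee.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>	/* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helpers: "<path>.rollback" marks an update in progress. */

/* Before modifying <path>: reflink it to "<path>.rollback". The clone is
 * built under a temporary name and then renamed, so (given the ordering
 * assumption) a partial clone never appears under the final name. */
static int make_rollback(const char *path)
{
	char tmp[PATH_MAX], dst[PATH_MAX];
	int src, fd, ret = 0;

	snprintf(tmp, sizeof(tmp), "%s.rollback.tmp", path);
	snprintf(dst, sizeof(dst), "%s.rollback", path);
	src = open(path, O_RDONLY);
	if (src < 0)
		return -errno;
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) {
		ret = -errno;
		close(src);
		return ret;
	}
	if (ioctl(fd, FICLONE, src) < 0)
		ret = -errno;	/* e.g. no reflink support on this filesystem */
	close(fd);
	close(src);
	if (ret == 0 && rename(tmp, dst) < 0)
		ret = -errno;
	if (ret)
		unlink(tmp);
	return ret;
}

/* At startup: a surviving rollback file means the last update never
 * completed, so restore the pre-update content.
 * Returns 1 if restored, 0 if nothing to do, -errno on error. */
static int recover(const char *path)
{
	char dst[PATH_MAX];

	snprintf(dst, sizeof(dst), "%s.rollback", path);
	if (access(dst, F_OK) != 0)
		return 0;
	return rename(dst, path) == 0 ? 1 : -errno;
}

/* After the update: one global flush, then drop the rollback copy. */
static int commit_update(const char *path)
{
	char dst[PATH_MAX];

	snprintf(dst, sizeof(dst), "%s.rollback", path);
	sync();
	return unlink(dst) == 0 ? 0 : -errno;
}
```

An application would call recover() at startup, then make_rollback() before an update, modify the file without intermediate fsync() calls, and finish with commit_update().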


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] btrfs: allow more subvol= option
  2020-07-21 20:33 [RFC] btrfs: strategy to perform a rollback at boot time Goffredo Baroncelli
@ 2020-07-21 20:33 ` Goffredo Baroncelli
  2020-07-21 20:50   ` Steven Davies
  2020-07-22  1:12     ` kernel test robot
  2020-07-21 20:55 ` [RFC] btrfs: strategy to perform a rollback at boot time Steven Davies
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Goffredo Baroncelli @ 2020-07-21 20:33 UTC
  To: linux-btrfs; +Cc: Goffredo Baroncelli

From: Goffredo Baroncelli <kreijack@inwind.it>

When more than one subvol= option is passed, btrfs tries to mount
each subvolume in turn until one succeeds. Up to 5 subvol= options
can be passed.

Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>

---
 fs/btrfs/super.c | 71 ++++++++++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index bc73fd670702..12d066e8d52c 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -52,6 +52,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/btrfs.h>
 
+#define SUBVOL_NAMES_COUNT 5
+
 static const struct super_operations btrfs_super_ops;
 
 /*
@@ -974,12 +976,13 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
  *
  * The value is later passed to mount_subvol()
  */
-static int btrfs_parse_subvol_options(const char *options, char **subvol_name,
-		u64 *subvol_objectid)
+static int btrfs_parse_subvol_options(const char *options, char **subvol_names,
+					u64 *subvol_objectid)
 {
 	substring_t args[MAX_OPT_ARGS];
 	char *opts, *orig, *p;
 	int error = 0;
+	int svi = 0;
 	u64 subvolid;
 
 	if (!options)
@@ -1002,12 +1005,17 @@ static int btrfs_parse_subvol_options(const char *options, char **subvol_name,
 		token = match_token(p, tokens, args);
 		switch (token) {
 		case Opt_subvol:
-			kfree(*subvol_name);
-			*subvol_name = match_strdup(&args[0]);
-			if (!*subvol_name) {
+			if (svi >= SUBVOL_NAMES_COUNT) {
+				pr_err("BTRFS: too much 'subvol=' mount options\n");
+				error = -E2BIG;
+				goto out;
+			}
+			subvol_names[svi] = match_strdup(&args[0]);
+			if (!subvol_names[svi]) {
 				error = -ENOMEM;
 				goto out;
 			}
+			svi++;
 			break;
 		case Opt_subvolid:
 			error = match_u64(&args[0], &subvolid);
@@ -1429,13 +1437,16 @@ static inline int is_subvolume_inode(struct inode *inode)
 	return 0;
 }
 
-static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
+static struct dentry *mount_subvol(char **subvol_names,
+				   u64 subvol_objectid,
 				   struct vfsmount *mnt)
 {
 	struct dentry *root;
 	int ret;
+	const char *sv;
+	int i;
 
-	if (!subvol_name) {
+	if (!subvol_names[0]) {
 		if (!subvol_objectid) {
 			ret = get_default_subvol_objectid(btrfs_sb(mnt->mnt_sb),
 							  &subvol_objectid);
@@ -1444,17 +1455,27 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
 				goto out;
 			}
 		}
-		subvol_name = btrfs_get_subvol_name_from_objectid(
+		subvol_names[0] = btrfs_get_subvol_name_from_objectid(
 					btrfs_sb(mnt->mnt_sb), subvol_objectid);
-		if (IS_ERR(subvol_name)) {
-			root = ERR_CAST(subvol_name);
-			subvol_name = NULL;
+		if (IS_ERR(subvol_names[0])) {
+			root = ERR_CAST(subvol_names[0]);
+			subvol_names[0] = NULL;
 			goto out;
 		}
 
 	}
 
-	root = mount_subtree(mnt, subvol_name);
+	for (i = 0 ; i < SUBVOL_NAMES_COUNT ; i++) {
+		if (!subvol_names[i])
+			break;
+
+		root = mount_subtree(mnt, subvol_names[i]);
+		if (!IS_ERR(root)) {
+			sv = subvol_names[i];
+			break;
+		}
+	}
+
 	/* mount_subtree() drops our reference on the vfsmount. */
 	mnt = NULL;
 
@@ -1466,8 +1487,7 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
 
 		ret = 0;
 		if (!is_subvolume_inode(root_inode)) {
-			btrfs_err(fs_info, "'%s' is not a valid subvolume",
-			       subvol_name);
+			btrfs_err(fs_info, "'%s' is not a valid subvolume", sv);
 			ret = -EINVAL;
 		}
 		if (subvol_objectid && root_objectid != subvol_objectid) {
@@ -1478,7 +1498,7 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
 			 */
 			btrfs_err(fs_info,
 				  "subvol '%s' does not match subvolid %llu",
-				  subvol_name, subvol_objectid);
+				  sv, subvol_objectid);
 			ret = -EINVAL;
 		}
 		if (ret) {
@@ -1490,7 +1510,6 @@ static struct dentry *mount_subvol(const char *subvol_name, u64 subvol_objectid,
 
 out:
 	mntput(mnt);
-	kfree(subvol_name);
 	return root;
 }
 
@@ -1636,15 +1655,16 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
 {
 	struct vfsmount *mnt_root;
 	struct dentry *root;
-	char *subvol_name = NULL;
+	int i;
+	char *subvol_names[SUBVOL_NAMES_COUNT] = {0,};
 	u64 subvol_objectid = 0;
 	int error = 0;
 
-	error = btrfs_parse_subvol_options(data, &subvol_name,
-					&subvol_objectid);
+	error = btrfs_parse_subvol_options(data, subvol_names,
+				&subvol_objectid);
 	if (error) {
-		kfree(subvol_name);
-		return ERR_PTR(error);
+		root = ERR_PTR(error);
+		goto out;
 	}
 
 	/* mount device's root (/) */
@@ -1658,7 +1678,6 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
 				flags | SB_RDONLY, device_name, data);
 			if (IS_ERR(mnt_root)) {
 				root = ERR_CAST(mnt_root);
-				kfree(subvol_name);
 				goto out;
 			}
 
@@ -1668,21 +1687,21 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
 			if (error < 0) {
 				root = ERR_PTR(error);
 				mntput(mnt_root);
-				kfree(subvol_name);
 				goto out;
 			}
 		}
 	}
 	if (IS_ERR(mnt_root)) {
 		root = ERR_CAST(mnt_root);
-		kfree(subvol_name);
 		goto out;
 	}
 
-	/* mount_subvol() will free subvol_name and mnt_root */
-	root = mount_subvol(subvol_name, subvol_objectid, mnt_root);
+	/* mount_subvol() will free mnt_root */
+	root = mount_subvol(subvol_names, subvol_objectid, mnt_root);
 
 out:
+	for (i = 0 ; i < SUBVOL_NAMES_COUNT ; i++)
+		kfree(subvol_names[i]);
 	return root;
 }
 
-- 
2.28.0.rc1


* Re: [PATCH] btrfs: allow more subvol= option
  2020-07-21 20:33 ` [PATCH] btrfs: allow more subvol= option Goffredo Baroncelli
@ 2020-07-21 20:50   ` Steven Davies
  2020-07-22  1:12     ` kernel test robot
  1 sibling, 0 replies; 17+ messages in thread
From: Steven Davies @ 2020-07-21 20:50 UTC
  To: Goffredo Baroncelli, linux-btrfs; +Cc: Goffredo Baroncelli

On 21/07/2020 21:33, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> When more than one subvol= options are passed, btrfs try to mount
> each subvolume until the first one succeed. Up to 5 subvol= options
> can be passed.
> 
> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> 
> ---
>   fs/btrfs/super.c | 71 ++++++++++++++++++++++++++++++------------------
>   1 file changed, 45 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index bc73fd670702..12d066e8d52c 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -52,6 +52,8 @@
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/btrfs.h>
>   
> +#define SUBVOL_NAMES_COUNT 5

As this is a maximum, perhaps MAX_SUBVOL_NAMES or SUBVOL_NAMES_MAX would be a better name.

> +
>   static const struct super_operations btrfs_super_ops;
>   
>   /*
> @@ -974,12 +976,13 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
>    *
>    * The value is later passed to mount_subvol()
>    */
> -static int btrfs_parse_subvol_options(const char *options, char **subvol_name,
> -		u64 *subvol_objectid)
> +static int btrfs_parse_subvol_options(const char *options, char **subvol_names,
> +					u64 *subvol_objectid)
>   {
>   	substring_t args[MAX_OPT_ARGS];
>   	char *opts, *orig, *p;
>   	int error = 0;
> +	int svi = 0;
>   	u64 subvolid;
>   
>   	if (!options)
> @@ -1002,12 +1005,17 @@ static int btrfs_parse_subvol_options(const char *options, char **subvol_name,
>   		token = match_token(p, tokens, args);
>   		switch (token) {
>   		case Opt_subvol:
> -			kfree(*subvol_name);
> -			*subvol_name = match_strdup(&args[0]);
> -			if (!*subvol_name) {
> +			if (svi >= SUBVOL_NAMES_COUNT) {
> +				pr_err("BTRFS: too much 'subvol=' mount options\n");

s/too much/too many/

Perhaps also include ", maximum is %d", SUBVOL_NAMES_COUNT

--snip--

-- 
Steven Davies

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-21 20:33 [RFC] btrfs: strategy to perform a rollback at boot time Goffredo Baroncelli
  2020-07-21 20:33 ` [PATCH] btrfs: allow more subvol= option Goffredo Baroncelli
@ 2020-07-21 20:55 ` Steven Davies
  2020-07-23 19:52   ` Goffredo Baroncelli
  2020-07-21 21:09 ` Chris Murphy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Steven Davies @ 2020-07-21 20:55 UTC
  To: Goffredo Baroncelli, linux-btrfs

On 21/07/2020 21:33, Goffredo Baroncelli wrote:
> [...]
> Of course after the mount, the system should perform a cleanup of the
> subvolumes: i.e. if a rollback subvolume exists, the system should destroy
> the "main" one (which contains garbage) and rename "rollback" to "main".
> To be more precise:
> 
> 	if test -d "rollback"; then
> 		if test -d "old"; then
> 			btrfs sub del "old"
> 		fi
> 		if test -d "main"; then
> 			mv "main" "old"
> 		fi
> 		mv "rollback" "main"
> 		btrfs sub del "old"
> 	fi
> 
> Comments are welcome

I like this idea. Do we have an easy way of detecting which subvolume 
has been mounted (through sysfs or similar), or would you expect to 
always be testing this based on the existence of certain 
subvolumes/directories?

-- 
Steven Davies

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-21 20:33 [RFC] btrfs: strategy to perform a rollback at boot time Goffredo Baroncelli
  2020-07-21 20:33 ` [PATCH] btrfs: allow more subvol= option Goffredo Baroncelli
  2020-07-21 20:55 ` [RFC] btrfs: strategy to perform a rollback at boot time Steven Davies
@ 2020-07-21 21:09 ` Chris Murphy
  2020-07-22  0:21 ` Nicholas D Steeves
  2020-07-23 21:53 ` Zygo Blaxell
  4 siblings, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2020-07-21 21:09 UTC
  To: Goffredo Baroncelli; +Cc: Btrfs BTRFS

On Tue, Jul 21, 2020 at 2:33 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
>
> [...]
>
> My idea is to allow more 'subvol=' option. In this case BTRFS tries all the
> passed subvolumes until the first succeed. So invoking the kernel as:
>
>   linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro
>
> First, the kernel tries to mount the 'rollback' subvolume. If the rollback
> subvolume doesn't exist then it mounts the 'main' subvolume.
>
> [...]

One way:
btrfs sub snap main rollback
change the bootloader to rootflags=subvol=rollback, and /etc/fstab
accordingly (or use btrfs sub set-default)
do the update to main
- if it blows up at any time, rollback is what gets booted: delete main
and rename rollback to main
- if it succeeds, revert the bootloader changes so main boots, but
keep rollback in case booting main fails

Another way:
btrfs sub snap main update
lock /var, /etc and /boot of main against changes: no configuration
changes, no package changes, but the user can keep working on user-space
things
use bwrap/nspawn/podman to load up and assemble the update tree and
perform the update out of band
- if the update blows up, just delete the update snapshot and then lift
the lock on system changes
- if the update succeeds, main can be renamed mainold and update can be
renamed main, and the bootloader updated; everything stays locked and
the user can keep working on user-space things until they're ready
to reboot. A nice thing about containers is that you can apply cgroup v2
controls to make sure the update has no resource impact on the
user's current work

Personally I prefer the latter: doing the update out of band rather
than applying it either to a running sysroot or as an offline update
(rebooting into a minimal environment). I think locking the user out
of system changes is acceptable for such an out-of-band update. The
alternative is something like merging the /etc and /var changes made
since the update was initiated; I don't think that's worth the
complexity, but if someone wants to build it, OK.


-- 
Chris Murphy

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-21 20:33 [RFC] btrfs: strategy to perform a rollback at boot time Goffredo Baroncelli
                   ` (2 preceding siblings ...)
  2020-07-21 21:09 ` Chris Murphy
@ 2020-07-22  0:21 ` Nicholas D Steeves
  2020-07-23 20:02   ` Goffredo Baroncelli
  2020-07-23 21:53 ` Zygo Blaxell
  4 siblings, 1 reply; 17+ messages in thread
From: Nicholas D Steeves @ 2020-07-22  0:21 UTC
  To: Goffredo Baroncelli, linux-btrfs

Hi,

Reply follows inline.

Goffredo Baroncelli <kreijack@libero.it> writes:

> [...]
>
> If an unclean shutdown happens between 1) and 4), two subvolume exists:
> the 'main' one and the 'rollback' one (which is the snapshot before the
> update). In this case the system at boot time should mount the "rollback"
> subvolume instead of the "main" one. Otherwise in case of a "clean" boot, the
> "rollback" subvolume doesn't exist and only the "main" one can be
> mounted.
>
> [...]

I like the idea of defaulting to a known-good snapshot on unclean
shutdown :-)

Is anyone on this list aware of grub-btrfs
https://github.com/Antynea/grub-btrfs ?  It's not my project, by the
way, but I'm curious what advantages your method has compared to the
ZFS-like Boot Environment support that grub-btrfs claims to offer. In
particular, I wonder whether the problem has already been solved by that
project's snapper support, and whether it just needs more exposure,
general testing, and integration into other distributions.

Oh, and to get apt to trigger snapshot creation:
https://github.com/stefxh/apt-btrfs-snapper

IIRC there are a couple of other apt-btrfs snapshot creation projects.

Best,
Nicholas

* Re: [PATCH] btrfs: allow more subvol= option
  2020-07-21 20:33 ` [PATCH] btrfs: allow more subvol= option Goffredo Baroncelli
@ 2020-07-22  1:12     ` kernel test robot
  2020-07-22  1:12     ` kernel test robot
  1 sibling, 0 replies; 17+ messages in thread
From: kernel test robot @ 2020-07-22  1:12 UTC
  To: Goffredo Baroncelli, linux-btrfs; +Cc: kbuild-all, Goffredo Baroncelli

Hi Goffredo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on v5.8-rc6 next-20200721]
[cannot apply to btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Goffredo-Baroncelli/btrfs-allow-more-subvol-option/20200722-043357
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-s022-20200719 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-14) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-49-g707c5017-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   [... the line above repeated 34 more times ...]
   include/trace/events/btrfs.h:1335:1: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected unsigned long flags @@     got restricted gfp_t [usertype] mask @@
   include/trace/events/btrfs.h:1335:1: sparse:     expected unsigned long flags
   include/trace/events/btrfs.h:1335:1: sparse:     got restricted gfp_t [usertype] mask
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast to restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast to restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: restricted gfp_t degrades to integer
   include/trace/events/btrfs.h:1335:1: sparse: sparse: restricted gfp_t degrades to integer
>> fs/btrfs/super.c:1714:51: sparse: sparse: Using plain integer as NULL pointer
   fs/btrfs/super.c:2394:31: sparse: sparse: incompatible types in comparison expression (different address spaces):
   fs/btrfs/super.c:2394:31: sparse:    struct rcu_string [noderef] __rcu *
   fs/btrfs/super.c:2394:31: sparse:    struct rcu_string *

vim +1714 fs/btrfs/super.c

  1685	
  1686	/*
  1687	 * Mount function which is called by VFS layer.
  1688	 *
  1689	 * In order to allow mounting a subvolume directly, btrfs uses mount_subtree()
  1690	 * which needs vfsmount* of device's root (/).  This means device's root has to
  1691	 * be mounted internally in any case.
  1692	 *
  1693	 * Operation flow:
  1694	 *   1. Parse subvol id related options for later use in mount_subvol().
  1695	 *
  1696	 *   2. Mount device's root (/) by calling vfs_kern_mount().
  1697	 *
  1698	 *      NOTE: vfs_kern_mount() is used by VFS to call btrfs_mount() in the
  1699	 *      first place. In order to avoid calling btrfs_mount() again, we use
  1700	 *      different file_system_type which is not registered to VFS by
  1701	 *      register_filesystem() (btrfs_root_fs_type). As a result,
  1702	 *      btrfs_mount_root() is called. The return value will be used by
  1703	 *      mount_subtree() in mount_subvol().
  1704	 *
  1705	 *   3. Call mount_subvol() to get the dentry of subvolume. Since there is
  1706	 *      "btrfs subvolume set-default", mount_subvol() is called always.
  1707	 */
  1708	static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
  1709			const char *device_name, void *data)
  1710	{
  1711		struct vfsmount *mnt_root;
  1712		struct dentry *root;
  1713		int i;
> 1714		char *subvol_names[SUBVOL_NAMES_COUNT] = {0,};
  1715		u64 subvol_objectid = 0;
  1716		int error = 0;
  1717	
  1718		error = btrfs_parse_subvol_options(data, subvol_names,
  1719					&subvol_objectid);
  1720		if (error) {
  1721			root = ERR_PTR(error);
  1722			goto out;
  1723		}
  1724	
  1725		/* mount device's root (/) */
  1726		mnt_root = vfs_kern_mount(&btrfs_root_fs_type, flags, device_name, data);
  1727		if (PTR_ERR_OR_ZERO(mnt_root) == -EBUSY) {
  1728			if (flags & SB_RDONLY) {
  1729				mnt_root = vfs_kern_mount(&btrfs_root_fs_type,
  1730					flags & ~SB_RDONLY, device_name, data);
  1731			} else {
  1732				mnt_root = vfs_kern_mount(&btrfs_root_fs_type,
  1733					flags | SB_RDONLY, device_name, data);
  1734				if (IS_ERR(mnt_root)) {
  1735					root = ERR_CAST(mnt_root);
  1736					goto out;
  1737				}
  1738	
  1739				down_write(&mnt_root->mnt_sb->s_umount);
  1740				error = btrfs_remount(mnt_root->mnt_sb, &flags, NULL);
  1741				up_write(&mnt_root->mnt_sb->s_umount);
  1742				if (error < 0) {
  1743					root = ERR_PTR(error);
  1744					mntput(mnt_root);
  1745					goto out;
  1746				}
  1747			}
  1748		}
  1749		if (IS_ERR(mnt_root)) {
  1750			root = ERR_CAST(mnt_root);
  1751			goto out;
  1752		}
  1753	
  1754		/* mount_subvol() will free mnt_root */
  1755		root = mount_subvol(subvol_names, subvol_objectid, mnt_root);
  1756	
  1757	out:
  1758		for (i = 0 ; i < SUBVOL_NAMES_COUNT ; i++)
  1759			kfree(subvol_names[i]);
  1760		return root;
  1761	}
  1762	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33348 bytes --]

   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast from restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected unsigned long flags @@     got restricted gfp_t [usertype] mask @@
   include/trace/events/btrfs.h:1335:1: sparse:     expected unsigned long flags
   include/trace/events/btrfs.h:1335:1: sparse:     got restricted gfp_t [usertype] mask
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast to restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: cast to restricted gfp_t
   include/trace/events/btrfs.h:1335:1: sparse: sparse: restricted gfp_t degrades to integer
   include/trace/events/btrfs.h:1335:1: sparse: sparse: restricted gfp_t degrades to integer
>> fs/btrfs/super.c:1714:51: sparse: sparse: Using plain integer as NULL pointer
   fs/btrfs/super.c:2394:31: sparse: sparse: incompatible types in comparison expression (different address spaces):
   fs/btrfs/super.c:2394:31: sparse:    struct rcu_string [noderef] __rcu *
   fs/btrfs/super.c:2394:31: sparse:    struct rcu_string *

vim +1714 fs/btrfs/super.c

  1685	
  1686	/*
  1687	 * Mount function which is called by VFS layer.
  1688	 *
  1689	 * In order to allow mounting a subvolume directly, btrfs uses mount_subtree()
  1690	 * which needs vfsmount* of device's root (/).  This means device's root has to
  1691	 * be mounted internally in any case.
  1692	 *
  1693	 * Operation flow:
  1694	 *   1. Parse subvol id related options for later use in mount_subvol().
  1695	 *
  1696	 *   2. Mount device's root (/) by calling vfs_kern_mount().
  1697	 *
  1698	 *      NOTE: vfs_kern_mount() is used by VFS to call btrfs_mount() in the
  1699	 *      first place. In order to avoid calling btrfs_mount() again, we use
  1700	 *      different file_system_type which is not registered to VFS by
  1701	 *      register_filesystem() (btrfs_root_fs_type). As a result,
  1702	 *      btrfs_mount_root() is called. The return value will be used by
  1703	 *      mount_subtree() in mount_subvol().
  1704	 *
  1705	 *   3. Call mount_subvol() to get the dentry of subvolume. Since there is
  1706	 *      "btrfs subvolume set-default", mount_subvol() is called always.
  1707	 */
  1708	static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
  1709			const char *device_name, void *data)
  1710	{
  1711		struct vfsmount *mnt_root;
  1712		struct dentry *root;
  1713		int i;
> 1714		char *subvol_names[SUBVOL_NAMES_COUNT] = {0,};
  1715		u64 subvol_objectid = 0;
  1716		int error = 0;
  1717	
  1718		error = btrfs_parse_subvol_options(data, subvol_names,
  1719					&subvol_objectid);
  1720		if (error) {
  1721			root = ERR_PTR(error);
  1722			goto out;
  1723		}
  1724	
  1725		/* mount device's root (/) */
  1726		mnt_root = vfs_kern_mount(&btrfs_root_fs_type, flags, device_name, data);
  1727		if (PTR_ERR_OR_ZERO(mnt_root) == -EBUSY) {
  1728			if (flags & SB_RDONLY) {
  1729				mnt_root = vfs_kern_mount(&btrfs_root_fs_type,
  1730					flags & ~SB_RDONLY, device_name, data);
  1731			} else {
  1732				mnt_root = vfs_kern_mount(&btrfs_root_fs_type,
  1733					flags | SB_RDONLY, device_name, data);
  1734				if (IS_ERR(mnt_root)) {
  1735					root = ERR_CAST(mnt_root);
  1736					goto out;
  1737				}
  1738	
  1739				down_write(&mnt_root->mnt_sb->s_umount);
  1740				error = btrfs_remount(mnt_root->mnt_sb, &flags, NULL);
  1741				up_write(&mnt_root->mnt_sb->s_umount);
  1742				if (error < 0) {
  1743					root = ERR_PTR(error);
  1744					mntput(mnt_root);
  1745					goto out;
  1746				}
  1747			}
  1748		}
  1749		if (IS_ERR(mnt_root)) {
  1750			root = ERR_CAST(mnt_root);
  1751			goto out;
  1752		}
  1753	
  1754		/* mount_subvol() will free mnt_root */
  1755		root = mount_subvol(subvol_names, subvol_objectid, mnt_root);
  1756	
  1757	out:
  1758		for (i = 0 ; i < SUBVOL_NAMES_COUNT ; i++)
  1759			kfree(subvol_names[i]);
  1760		return root;
  1761	}
  1762	
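The new sparse warning comes from the `{0,}` initializer on line 1714: a plain integer `0` is used where a pointer is expected. Below is a minimal sketch of a sparse-clean equivalent, written as standalone userspace C so it can be compiled on its own. `free()` stands in for `kfree()`, and `SUBVOL_NAMES_COUNT = 4` is an assumed value, since the patch's real definition is not shown in this report. Initialising with `{ NULL }` leaves every remaining element as a null pointer, so the cleanup loop at `out:` stays safe. The GNU empty initializer `{}`, which the kernel also uses, would work as well.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value for illustration; the patch's real definition is not shown. */
#define SUBVOL_NAMES_COUNT 4

/* Count the filled slots; slots that were never assigned must be NULL. */
static int count_subvol_names(char *const names[])
{
	int i, n = 0;

	for (i = 0; i < SUBVOL_NAMES_COUNT; i++)
		if (names[i])
			n++;
	return n;
}

/* Mirrors the init/fill/free pattern of btrfs_mount(), in userspace. */
static int demo_subvol_names(void)
{
	/*
	 * NULL instead of the plain integer 0 keeps sparse quiet; C
	 * zero-initialises the remaining elements, i.e. sets them to NULL.
	 */
	char *subvol_names[SUBVOL_NAMES_COUNT] = { NULL };
	int i, filled;

	subvol_names[0] = strdup("rollback");
	subvol_names[1] = strdup("main");
	assert(subvol_names[0] && subvol_names[1]);
	filled = count_subvol_names(subvol_names);

	/* free(NULL), like kfree(NULL), is a no-op: no per-slot check needed. */
	for (i = 0; i < SUBVOL_NAMES_COUNT; i++)
		free(subvol_names[i]);
	return filled;
}
```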

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33348 bytes --]

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-21 20:55 ` [RFC] btrfs: strategy to perform a rollback at boot time Steven Davies
@ 2020-07-23 19:52   ` Goffredo Baroncelli
  0 siblings, 0 replies; 17+ messages in thread
From: Goffredo Baroncelli @ 2020-07-23 19:52 UTC (permalink / raw)
  To: Steven Davies, linux-btrfs

On 7/21/20 10:55 PM, Steven Davies wrote:
> On 21/07/2020 21:33, Goffredo Baroncelli wrote:
>>
>> Hi all,
>>
>> this is an RFC to discuss a my idea to allow a simple rollback of the
>> root filesystem at boot time.
>>
>> The problem that I want to solve is the following: DPKG is very slow on
>> a BTRFS filesystem. The reason is that DPKG massively uses
>> sync()/fsync() to guarantee that the filesystem is always coherent even
>> in case of sudden shutdown.
>>
>> The same can be useful even to the RPM Linux based distribution (which however
>> suffer less than DPKG).
>>
>> A way to avoid the sync()/fsync() calls without loosing the DPKG
>> guarantees, is:
>> 1) perform a snapshot of the root filesystem (the rollback one)
>> 2) upgrade the filesystem without using sync/fsync
>> 3) final (global) sync
>> 4) destroy the rollback snapshot
>>
>> If an unclean shutdown happens between 1) and 4), two subvolume exists:
>> the 'main' one and the 'rollback' one (which is the snapshot before the
>> update). In this case the system at boot time should mount the "rollback"
>> subvolume instead of the "main" one. Otherwise in case of a "clean" boot, the
>> "rollback" subvolume doesn't exist and only the "main" one can be
>> mounted.
>>
>> In [1] I discussed a way to implement the steps 1 to 4. (ok, I missed
>> the point 3) ).
>>
>> The part that was missed until now, is an automatic way to mount the rollback
>> subvolume at boot time when it is present.
>>
>> My idea is to allow more 'subvol=' option. In this case BTRFS tries all the
>> passed subvolumes until the first succeed. So invoking the kernel as:
>>
>>    linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro
>>
>> First, the kernel tries to mount the 'rollback' subvolume. If the rollback
>> subvolume doesn't exist then it mounts the 'main' subvolume.
>>
>> Of course after the mount, the system should perform a cleanup of the
>> subvolumes: i.e. if a rollback subvolume exists, the system should destroy
>> the "main" one (which contains garbage) and rename "rollback" to "main".
>> To be more precise:
>>
>>     if test -d "rollback"; then
>>         if test -d "old"; then
>>             btrfs sub del "old"
>>         fi
>>         if test -d "main"; then
>>             mv "main" "old"
>>         fi
>>         mv "rollback" "main"
>>         btrfs sub del "old"
>>     fi
>>
>> Comments are welcome
> 
> I like this idea. Do we have an easy way of detecting which subvolume has been mounted (through sysfs or similar), or would you expect to always be testing this based on the existence of certain subvolumes/directories?

You can use findmnt or cat /proc/self/mountinfo

$  findmnt  | egrep btrfs
/                                     /dev/sde3[/debian] btrfs       rw,noatime,nodiratime,nossd,space_cache,subvolid=257,subvol=/debian
├─/boot                               /dev/sde3[/boot]   btrfs       rw,noatime,nodiratime,nossd,space_cache,subvolid=258,subvol=/boot
├─/var/btrfs                          /dev/sde3          btrfs       rw,noatime,nodiratime,nossd,space_cache,subvolid=5,subvol=/
└─/mnt/btrfs-raid1                    /dev/sdd2          btrfs       rw,noatime,nodiratime,space_cache,subvolid=5,subvol=/


$ cat /proc/self/mountinfo  | egrep btrfs
26 1 0:22 /debian / rw,noatime,nodiratime shared:1 - btrfs /dev/sde3 rw,nossd,space_cache,subvolid=257,subvol=/debian
113 26 0:22 / /var/btrfs rw,noatime,nodiratime shared:61 - btrfs /dev/sde3 rw,nossd,space_cache,subvolid=5,subvol=/
112 26 0:22 /boot /boot rw,noatime,nodiratime shared:63 - btrfs /dev/sde3 rw,nossd,space_cache,subvolid=258,subvol=/boot
127 26 0:46 / /mnt/btrfs-raid1 rw,noatime,nodiratime shared:71 - btrfs /dev/sdd2 rw,space_cache,subvolid=5,subvol=/
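Steven's question (which subvolume was actually mounted) can also be answered programmatically from /proc/self/mountinfo. The format is visible above: the optional fields end at the ` - ` separator, which is followed by the filesystem type, the mount source, and the superblock options. The superblock options include `subvol=`. Below is a minimal sketch under these assumptions. `mountinfo_subvol()` and `root_subvol()` are hypothetical helper names, and octal escapes such as `\040` in paths are not decoded.

```c
#define _POSIX_C_SOURCE 200809L	/* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * If `line` (one /proc/self/mountinfo record) describes a btrfs mount at
 * `mountpoint`, copy its subvol= value into `out` and return 1; else 0.
 * Assumption: the fields contain no octal escapes (e.g. \040 for a space).
 */
static int mountinfo_subvol(const char *line, const char *mountpoint,
			    char *out, size_t outlen)
{
	char mnt[4096], fstype[64];
	const char *sep, *opts, *p;
	size_t n;

	assert(out && outlen > 0);
	/* Fields: mount ID, parent ID, major:minor, root, mount point, ... */
	if (sscanf(line, "%*d %*d %*s %*s %4095s", mnt) != 1 ||
	    strcmp(mnt, mountpoint) != 0)
		return 0;

	/* Optional fields end at " - "; then fstype, source, super options. */
	sep = strstr(line, " - ");
	if (!sep || sscanf(sep + 3, "%63s", fstype) != 1 ||
	    strcmp(fstype, "btrfs") != 0)
		return 0;
	opts = strchr(sep + 3, ' ');			/* skip fstype */
	opts = opts ? strchr(opts + 1, ' ') : NULL;	/* skip source */
	if (!opts)
		return 0;
	opts++;

	/* Match "subvol=" only at the start of a comma-separated option. */
	for (p = strstr(opts, "subvol="); p; p = strstr(p + 1, "subvol="))
		if (p == opts || p[-1] == ',')
			break;
	if (!p)
		return 0;
	p += strlen("subvol=");
	n = strcspn(p, ", \n");
	if (n >= outlen)
		return 0;
	memcpy(out, p, n);
	out[n] = '\0';
	return 1;
}

/* Scan the live mount table for the btrfs subvolume mounted at "/". */
static int root_subvol(char *out, size_t outlen)
{
	FILE *f = fopen("/proc/self/mountinfo", "r");
	char *line = NULL;
	size_t cap = 0;
	int found = 0;

	if (!f)
		return 0;
	while (!found && getline(&line, &cap, f) != -1)
		found = mountinfo_subvol(line, "/", out, outlen);
	free(line);
	fclose(f);
	return found;
}
```

Fed the first mountinfo line above, `mountinfo_subvol(line, "/", ...)` yields `/debian`. On a system booted with the proposed option, `root_subvol()` would report whichever candidate was picked.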
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-22  0:21 ` Nicholas D Steeves
@ 2020-07-23 20:02   ` Goffredo Baroncelli
  0 siblings, 0 replies; 17+ messages in thread
From: Goffredo Baroncelli @ 2020-07-23 20:02 UTC (permalink / raw)
  To: Nicholas D Steeves, linux-btrfs

On 7/22/20 2:21 AM, Nicholas D Steeves wrote:
> Hi,
> 
> Reply follows inline.
> 
> Goffredo Baroncelli <kreijack@libero.it> writes:
> 
>> Hi all,
>>
>> this is an RFC to discuss a my idea to allow a simple rollback of the
>> root filesystem at boot time.
>>
>> The problem that I want to solve is the following: DPKG is very slow on
>> a BTRFS filesystem. The reason is that DPKG massively uses
>> sync()/fsync() to guarantee that the filesystem is always coherent even
>> in case of sudden shutdown.
>>
>> The same can be useful even to the RPM Linux based distribution (which however
>> suffer less than DPKG).
>>
>> A way to avoid the sync()/fsync() calls without loosing the DPKG
>> guarantees, is:
>> 1) perform a snapshot of the root filesystem (the rollback one)
>> 2) upgrade the filesystem without using sync/fsync
>> 3) final (global) sync
>> 4) destroy the rollback snapshot
>>
>> If an unclean shutdown happens between 1) and 4), two subvolume exists:
>> the 'main' one and the 'rollback' one (which is the snapshot before the
>> update). In this case the system at boot time should mount the "rollback"
>> subvolume instead of the "main" one. Otherwise in case of a "clean" boot, the
>> "rollback" subvolume doesn't exist and only the "main" one can be
>> mounted.
>>
>> In [1] I discussed a way to implement the steps 1 to 4. (ok, I missed
>> the point 3) ).
>>
>> The part that was missed until now, is an automatic way to mount the rollback
>> subvolume at boot time when it is present.
>>
>> My idea is to allow more 'subvol=' option. In this case BTRFS tries all the
>> passed subvolumes until the first succeed. So invoking the kernel as:
>>
>>    linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro
>>
>> First, the kernel tries to mount the 'rollback' subvolume. If the rollback
>> subvolume doesn't exist then it mounts the 'main' subvolume.
>>
>> Of course after the mount, the system should perform a cleanup of the
>> subvolumes: i.e. if a rollback subvolume exists, the system should destroy
>> the "main" one (which contains garbage) and rename "rollback" to "main".
>> To be more precise:
>>
> 
> I like the idea of defaulting to a known-good snapshot on unclean
> shutdown :-)
> 
> Is anyone on this list aware of grub-btrfs
> https://github.com/Antynea/grub-btrfs ?  It's not my project, by the
> way, but I'm curious what advantages your method has compared to the
> alleged ZFS-like Boot Environment support of grub-btrfs?  

Looking at the script, it seems that grub-btrfs updates grub.cfg whenever a new snapshot is created (using the systemd capability to trigger an event when a certain filesystem change happens).

It is a nice idea; however, it requires regenerating grub.cfg at every snapshot. I think that this increases the likelihood of a corrupted grub.cfg in case of an unclean shutdown.

I do think that grub has the capability to boot from a rollback subvolume. However, this means that grub is needed to roll back the system, which is too strong a requirement.

Update:
Looking at this answer [https://unix.stackexchange.com/questions/415049/generating-menuentry-for-iso-images-dynamically-in-grub-cfg] it seems that grub can generate a menu dynamically, based on the directory structure.

> In particular,
> I wonder if the problem have already been solved solved due to that
> project's snapper support, and if it just needs more exposure, general
> testing, and integration for other distributions.
> 
> Oh, and to get apt to trigger snapshot creation:
> https://github.com/stefxh/apt-btrfs-snapper
> 
> Iirc there are a couple other apt-btrfs snapshot creation projects
> 
> Best,
> Nicholas
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-21 20:33 [RFC] btrfs: strategy to perform a rollback at boot time Goffredo Baroncelli
                   ` (3 preceding siblings ...)
  2020-07-22  0:21 ` Nicholas D Steeves
@ 2020-07-23 21:53 ` Zygo Blaxell
  2020-07-24 11:56   ` Goffredo Baroncelli
  4 siblings, 1 reply; 17+ messages in thread
From: Zygo Blaxell @ 2020-07-23 21:53 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: linux-btrfs

On Tue, Jul 21, 2020 at 10:33:39PM +0200, Goffredo Baroncelli wrote:
> 
> Hi all,
> 
> this is an RFC to discuss a my idea to allow a simple rollback of the
> root filesystem at boot time.
> 
> The problem that I want to solve is the following: DPKG is very slow on
> a BTRFS filesystem. The reason is that DPKG massively uses
> sync()/fsync() to guarantee that the filesystem is always coherent even
> in case of sudden shutdown.
> 
> The same can be useful even to the RPM Linux based distribution (which however
> suffer less than DPKG).
> 
> A way to avoid the sync()/fsync() calls without loosing the DPKG
> guarantees, is:
> 1) perform a snapshot of the root filesystem (the rollback one)
> 2) upgrade the filesystem without using sync/fsync
> 3) final (global) sync
> 4) destroy the rollback snapshot

The idea sounds OK, but there are alternatives:

	1) perform snapshot of root filesystem
	2) chroot snapshot eatmydata apt dist-upgrade (*)
	3) sync -f snapshot
	4) renameat2(..., snapshot, ..., root, RENAME_EXCHANGE)
	5) delete snapshot

(*) OK you have to set up /dev, /proc, /sys, etc, probably a whole
namespace.

This may not play well with maintainer scripts on some distros, but it
does mean you don't have a half-broken system _during_ the upgrade.
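Steps 3 and 4 of this alternative correspond to two Linux syscalls. `sync -f` is `syncfs(2)` on a file descriptor inside the snapshot. The swap is `renameat2(2)` with `RENAME_EXCHANGE`, which atomically exchanges two existing names, so at no point does either name fail to exist. Below is a minimal sketch, assuming Linux headers that define `SYS_renameat2`; the helper names are hypothetical. The demo uses plain directories under /tmp rather than btrfs subvolumes. It therefore treats a refusal of the flag (for example `EINVAL` from a filesystem that does not support it) as "unsupported" rather than as a failure.

```c
#define _GNU_SOURCE		/* syncfs(), syscall(), mkdtemp() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)	/* value from <linux/fs.h> */
#endif

/* Step 3: flush the whole filesystem containing `dirfd` (like sync -f). */
static int flush_fs(int dirfd)
{
	return syncfs(dirfd) == 0 ? 0 : -errno;
}

/* Step 4: atomically swap two existing names under `dirfd`. */
static int exchange_names(int dirfd, const char *a, const char *b)
{
	assert(a && b);
	if (syscall(SYS_renameat2, dirfd, a, dirfd, b, RENAME_EXCHANGE) == -1)
		return -errno;
	return 0;
}

/*
 * Demo on plain directories: returns 1 if "main" holds the upgraded tree
 * after the swap, 0 if RENAME_EXCHANGE is refused here, -1 on error.
 */
static int demo_exchange(void)
{
	char tmpl[] = "/tmp/xchgXXXXXX";
	int dfd, rc, result = -1;

	if (!mkdtemp(tmpl))
		return -1;
	dfd = open(tmpl, O_RDONLY | O_DIRECTORY);
	if (dfd < 0) {
		rmdir(tmpl);
		return -1;
	}
	/* "snapshot/upgraded" marks the tree that went through the upgrade. */
	if (mkdirat(dfd, "main", 0755) == 0 &&
	    mkdirat(dfd, "snapshot", 0755) == 0 &&
	    mkdirat(dfd, "snapshot/upgraded", 0755) == 0) {
		(void)flush_fs(dfd);
		rc = exchange_names(dfd, "snapshot", "main");
		if (rc == 0)
			result = faccessat(dfd, "main/upgraded", F_OK, 0) == 0 ? 1 : -1;
		else if (rc == -EINVAL || rc == -ENOSYS || rc == -EPERM)
			result = 0;	/* refused by fs, kernel or a seccomp filter */
	}
	/* Remove whatever layout is left, innermost entries first. */
	unlinkat(dfd, "main/upgraded", AT_REMOVEDIR);
	unlinkat(dfd, "snapshot/upgraded", AT_REMOVEDIR);
	unlinkat(dfd, "main", AT_REMOVEDIR);
	unlinkat(dfd, "snapshot", AT_REMOVEDIR);
	close(dfd);
	rmdir(tmpl);
	return result;
}
```

The exchange is atomic with respect to the names only. It does not make the new directory entries durable by itself, so another flush after the swap would still be needed.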

Sometimes when I have a really problematic upgrade I rsync the system
to another box, do the upgrade there, and then rsync the system back
to the problematic box.  As a side-effect it also allows me to do a
verification test to make sure the upgrade worked before throwing it
onto a production system.  The snapshot/rollback thing would be a
local version of that.

> If an unclean shutdown happens between 1) and 4), two subvolume exists:
> the 'main' one and the 'rollback' one (which is the snapshot before the
> update). In this case the system at boot time should mount the "rollback"
> subvolume instead of the "main" one. Otherwise in case of a "clean" boot, the
> "rollback" subvolume doesn't exist and only the "main" one can be
> mounted.
> 
> In [1] I discussed a way to implement the steps 1 to 4. (ok, I missed
> the point 3) ).
> 
> The part that was missed until now, is an automatic way to mount the rollback
> subvolume at boot time when it is present.
> 
> My idea is to allow more 'subvol=' option. In this case BTRFS tries all the
> passed subvolumes until the first succeed. So invoking the kernel as:
> 
>   linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro 
> 
> First, the kernel tries to mount the 'rollback' subvolume. If the rollback
> subvolume doesn't exist then it mounts the 'main' subvolume.

This could be done already from the initramfs.
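Whichever layer ends up doing it (kernel, initramfs or bootloader), the proposed selection rule is small: collect the `subvol=` values in the order given, then use the first one that exists. Below is a minimal userspace sketch, assuming a comma-separated option string like the `rootflags=` value above. `parse_subvols()`, `pick_subvol()` and `MAX_SUBVOLS` are hypothetical names. `name_in_list()` only stands in for a real existence check, such as stat()ing the path on a mount of the top-level subvolume.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBVOLS 8	/* hypothetical cap on subvol= candidates */

/*
 * Collect the values of every "subvol=" option, in order, from a string
 * such as "subvol=rollback,subvol=main,ro". Modifies `opts` in place;
 * returns the number of candidates stored in `names`.
 */
static size_t parse_subvols(char *opts, char *names[], size_t max)
{
	size_t n = 0;
	char *save = NULL, *tok;

	for (tok = strtok_r(opts, ",", &save); tok && n < max;
	     tok = strtok_r(NULL, ",", &save))
		if (strncmp(tok, "subvol=", strlen("subvol=")) == 0)
			names[n++] = tok + strlen("subvol=");
	return n;
}

/* Stand-in existence check: is `name` in the NULL-terminated list `ctx`? */
static bool name_in_list(const char *name, void *ctx)
{
	const char *const *list = ctx;

	for (; *list; list++)
		if (strcmp(*list, name) == 0)
			return true;
	return false;
}

/* Return the first candidate for which exists() is true, or NULL. */
static const char *pick_subvol(char *const names[], size_t n,
			       bool (*exists)(const char *, void *), void *ctx)
{
	size_t i;

	assert(exists);
	for (i = 0; i < n; i++)
		if (exists(names[i], ctx))
			return names[i];
	return NULL;
}
```

With `subvol=rollback,subvol=main`, a clean boot (only "main" present) picks "main". A boot after an interrupted upgrade (both present) picks "rollback".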

> Of course after the mount, the system should perform a cleanup of the
> subvolumes: i.e. if a rollback subvolume exists, the system should destroy
> the "main" one (which contains garbage) and rename "rollback" to "main".
> To be more precise:
> 
> 	if test -d "rollback"; then
> 		if test -d "old"; then
> 			btrfs sub del "old"
> 		fi
> 		if test -d "main"; then
> 			mv "main" "old"
> 		fi
> 		mv "rollback" "main"
> 		btrfs sub del "old"
> 	fi
> 
> Comments are welcome
> BR
> G.Baroncelli
> 
> [1] http://lore.kernel.org/linux-btrfs/69396573-b5b3-b349-06f5-f5b74eb9720d@libero.it/
> 
> P.S.
> I am guessing if an idea like this can be applied to a file. E.g. a sqlite
> database that instead of reling to sync/fsync, creates a reflink file as
> "rollback" if something goes wrong.... The ordering is preserved. Not the
> duration.
> 
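The P.S. idea can be sketched at the file level in three steps:
1) Before an unsynced update, make a reflink copy `<file>.rollback` with the `FICLONE` ioctl.
2) Once the update has been made durable, delete the copy.
3) At startup, treat a leftover copy as a sign that the update never completed, and restore it with rename(2).

This is a minimal sketch of the idea as stated, not a tested durability scheme. Whether btrfs preserves the needed ordering without an fsync() is exactly the open question in the P.S. The helper names and the `.rollback` suffix are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)	/* value from <linux/fs.h> */
#endif

/*
 * Step 1: reflink `path` to `rb` (the rollback copy) before updating.
 * O_EXCL refuses to overwrite a leftover copy: call recover() first.
 */
static int begin_update(const char *path, const char *rb)
{
	int in, out, rc;

	assert(path && rb);
	in = open(path, O_RDONLY);
	if (in < 0)
		return -errno;
	out = open(rb, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (out < 0) {
		rc = -errno;
		close(in);
		return rc;
	}
	rc = ioctl(out, FICLONE, in) == 0 ? 0 : -errno;
	close(out);
	close(in);
	if (rc != 0)
		unlink(rb);	/* no reflink support: leave no rollback copy */
	return rc;
}

/* Final sync, then drop the rollback copy (steps 3-4 of the RFC). */
static int commit_update(int fd, const char *rb)
{
	if (fsync(fd) != 0)
		return -errno;
	return unlink(rb) == 0 ? 0 : -errno;
}

/* At startup: a leftover rollback copy means the update never committed. */
static int recover(const char *path, const char *rb)
{
	if (access(rb, F_OK) != 0)
		return 0;				/* clean: nothing to do */
	return rename(rb, path) == 0 ? 1 : -errno;	/* 1: rolled back */
}
```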

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-23 21:53 ` Zygo Blaxell
@ 2020-07-24 11:56   ` Goffredo Baroncelli
  2020-07-24 22:08     ` Chris Murphy
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Goffredo Baroncelli @ 2020-07-24 11:56 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs, Chris Murphy

On 7/23/20 11:53 PM, Zygo Blaxell wrote:
> On Tue, Jul 21, 2020 at 10:33:39PM +0200, Goffredo Baroncelli wrote:
>>
>> Hi all,
>>
>> this is an RFC to discuss a my idea to allow a simple rollback of the
>> root filesystem at boot time.
>>
>> The problem that I want to solve is the following: DPKG is very slow on
>> a BTRFS filesystem. The reason is that DPKG massively uses
>> sync()/fsync() to guarantee that the filesystem is always coherent even
>> in case of sudden shutdown.
>>
>> The same can be useful even to the RPM Linux based distribution (which however
>> suffer less than DPKG).
>>
>> A way to avoid the sync()/fsync() calls without loosing the DPKG
>> guarantees, is:
>> 1) perform a snapshot of the root filesystem (the rollback one)
>> 2) upgrade the filesystem without using sync/fsync
>> 3) final (global) sync
>> 4) destroy the rollback snapshot
> 
> The idea sounds OK, but there are alternatives:
> 
> 	1) perform snapshot of root filesystem
> 	2) chroot snapshot eatmydata apt dist-upgrade (*)
> 	3) sync -f snapshot
> 	4) renameat2(..., snapshot, ..., root, RENAME_EXCHANGE)
> 	5) delete snapshot
> 
> (*) OK you have to set up /dev, /proc, /sys, etc, probably a whole
> namespace.
> 
> This may not play well with maintainer scripts on some distros, but it
> does mean you don't have a half-broken system _during_ the upgrade.

Chris also suggested that. However, I don't think that it is a viable solution:
1) As you pointed out, most of the maintainer pre/post-install scripts assume that the system is "live". So I don't think that it would be possible without auditing and updating all the packages.
2) What happens in case of an unclean shutdown during step 4? To me it seems that we are performing two installations :-) The first one is at step 2 and the second one is at step 3. Moreover, a move between two subvolumes is not allowed (it is like a copy):
ghigo@venice:/tmp$ btrfs sub crea sub1
Create subvolume './sub1'
ghigo@venice:/tmp$ btrfs sub crea sub2
Create subvolume './sub2'
ghigo@venice:/tmp$ touch sub1/file1
ghigo@venice:/tmp$ python
Python 2.7.18 (default, Apr 20 2020, 20:30:41)
[GCC 9.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.rename("sub1/file1", "sub2/file")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
OSError: [Errno 18] Invalid cross-device link


This means that there is a high risk of an incomplete write in case of an unplanned shutdown (even though clone is allowed...)
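The `EXDEV` ("Invalid cross-device link") error above is what rename(2) returns when a file is moved between two btrfs subvolumes. The usual fallback, the "clone" mentioned above, is to reflink the data with the `FICLONE` ioctl and then unlink the source. Below is a minimal sketch; `move_or_clone()` is a hypothetical helper. It copies only file data, not mode, ownership or xattrs. As noted above, the fallback is not atomic: a crash between the clone and the unlink can leave both copies.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)	/* value from <linux/fs.h> */
#endif

/*
 * Move `src` to `dst`. If rename() fails with EXDEV (e.g. different
 * btrfs subvolumes), reflink the data into a new `dst` and unlink `src`.
 * Returns 0 or -errno. Not atomic: a crash mid-way can leave both files.
 */
static int move_or_clone(const char *src, const char *dst)
{
	int in, out, rc;

	assert(src && dst);
	if (rename(src, dst) == 0)
		return 0;
	if (errno != EXDEV)
		return -errno;

	in = open(src, O_RDONLY);
	if (in < 0)
		return -errno;
	out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (out < 0) {
		rc = -errno;
		close(in);
		return rc;
	}
	/* Share src's extents with dst, then make dst durable. */
	rc = (ioctl(out, FICLONE, in) == 0 && fsync(out) == 0) ? 0 : -errno;
	close(out);
	close(in);
	if (rc != 0) {
		unlink(dst);	/* do not leave a partial destination behind */
		return rc;
	}
	return unlink(src) == 0 ? 0 : -errno;
}
```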


> 
> Sometimes when I have a really problematic upgrade I rsync the system
> to another box, do the upgrade there, and then rsync the system back
> to the problematic box.  As a side-effect it also allows me to do a
> verification test to make sure the upgrade worked before throwing it
> onto a production system.  The snapshot/rollback thing would be a
> local version of that.
> 
>> If an unclean shutdown happens between 1) and 4), two subvolume exists:
>> the 'main' one and the 'rollback' one (which is the snapshot before the
>> update). In this case the system at boot time should mount the "rollback"
>> subvolume instead of the "main" one. Otherwise in case of a "clean" boot, the
>> "rollback" subvolume doesn't exist and only the "main" one can be
>> mounted.
>>
>> In [1] I discussed a way to implement the steps 1 to 4. (ok, I missed
>> the point 3) ).
>>
>> The part that was missed until now, is an automatic way to mount the rollback
>> subvolume at boot time when it is present.
>>
>> My idea is to allow more 'subvol=' option. In this case BTRFS tries all the
>> passed subvolumes until the first succeed. So invoking the kernel as:
>>
>>    linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro
>>
>> First, the kernel tries to mount the 'rollback' subvolume. If the rollback
>> subvolume doesn't exist then it mounts the 'main' subvolume.
> 
> This could be done already from the initramfs.

OK, this means that we have three possibilities:
1) do this at the bootloader level (e.g. grub)
2) do this in the initramfs
3) do this at the kernel level (see my patch)

All these possibilities are viable solutions. However, I find 1) and 2) more "intrusive" and distro-specific. My fear is that each distro will make a different choice, leading to more fragmentation.
I hoped that solution no. 3 could help to find a unique solution....




> 
>> Of course after the mount, the system should perform a cleanup of the
>> subvolumes: i.e. if a rollback subvolume exists, the system should destroy
>> the "main" one (which contains garbage) and rename "rollback" to "main".
>> To be more precise:
>>
>> 	if test -d "rollback"; then
>> 		if test -d "old"; then
>> 			btrfs sub del "old"
>> 		fi
>> 		if test -d "main"; then
>> 			mv "main" "old"
>> 		fi
>> 		mv "rollback" "main"
>> 		btrfs sub del "old"
>> 	fi
>>
>> Comments are welcome
>> BR
>> G.Baroncelli
>>
>> [1] http://lore.kernel.org/linux-btrfs/69396573-b5b3-b349-06f5-f5b74eb9720d@libero.it/
>>
>> P.S.
>> I am guessing if an idea like this can be applied to a file. E.g. a sqlite
>> database that instead of reling to sync/fsync, creates a reflink file as
>> "rollback" if something goes wrong.... The ordering is preserved. Not the
>> duration.
>>


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-24 11:56   ` Goffredo Baroncelli
@ 2020-07-24 22:08     ` Chris Murphy
  2020-07-25  2:37     ` Zygo Blaxell
  2020-07-27 12:26     ` David Sterba
  2 siblings, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2020-07-24 22:08 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: Zygo Blaxell, Btrfs BTRFS, Chris Murphy

On Fri, Jul 24, 2020 at 5:56 AM Goffredo Baroncelli <kreijack@libero.it> wrote:
>
> On 7/23/20 11:53 PM, Zygo Blaxell wrote:
> > On Tue, Jul 21, 2020 at 10:33:39PM +0200, Goffredo Baroncelli wrote:
> >>
> >> Hi all,
> >>
> >> this is an RFC to discuss a my idea to allow a simple rollback of the
> >> root filesystem at boot time.
> >>
> >> The problem that I want to solve is the following: DPKG is very slow on
> >> a BTRFS filesystem. The reason is that DPKG massively uses
> >> sync()/fsync() to guarantee that the filesystem is always coherent even
> >> in case of sudden shutdown.
> >>
> >> The same can be useful even to the RPM Linux based distribution (which however
> >> suffer less than DPKG).
> >>
> >> A way to avoid the sync()/fsync() calls without loosing the DPKG
> >> guarantees, is:
> >> 1) perform a snapshot of the root filesystem (the rollback one)
> >> 2) upgrade the filesystem without using sync/fsync
> >> 3) final (global) sync
> >> 4) destroy the rollback snapshot
> >
> > The idea sounds OK, but there are alternatives:
> >
> >       1) perform snapshot of root filesystem
> >       2) chroot snapshot eatmydata apt dist-upgrade (*)
> >       3) sync -f snapshot
> >       4) renameat2(..., snapshot, ..., root, RENAME_EXCHANGE)
> >       5) delete snapshot
> >
> > (*) OK you have to set up /dev, /proc, /sys, etc, probably a whole
> > namespace.
> >
> > This may not play well with maintainer scripts on some distros, but it
> > does mean you don't have a half-broken system _during_ the upgrade.
>
> Also Chris, suggested that. However I don't think that it is a viable solution:
> 1) as you pointed out, most of the maintainer pre/post install scripts assume that the system is "live". So I don't think that it would be possible without auditing and updating all the packages.

The FHS is distinctly unhelpful here. I'd go so far as to say it's a
problem. However...

If only 3-4 locations are excluded from the snapshot and rollback
regime (/home, /var/tmp, /var/lib/libvirt/images, /var/log), almost
everything else can be locked against modification while the update is
occurring, prior to the reboot.

Also, users find having to reboot to deploy things annoying. And this
would let them keep working while the update happens, and they can
reboot whenever they want - albeit with an administratively locked
system, until reboot happens and succeeds. Yanking binaries and
libraries out from under a running system is objectively worse UX and
risk.


> 2) what happens in case of unclean shutdown during step 4 ? To me it seems that we are performing two installations :-) The first one is at step 2 and the second one is at step 3. Moreover a move between two subvolumes is not allowed (it like a copy)

mv right now is an expensive copy unless the mv happens via a
subvolume in common to both locations, e.g. top-level. I don't know
what kernel changes are needed to figure this out automatically and
not require an explicit mount of that in-common subvol. But it surely
would be nice to have.


>
> Ok, this means that we have three possibility:
> 1) do this at bootloder level (eg grub)
> 2) do this at initramfs
> 3) do this at kernel level (see my patch)
>
> All these possibilities are a viable solution. However I find 1) and 2) the more "intrusive", and distro specific. My fear is that each distro will take a different choice, leading to a more fragmentation.
> I hoped that the solution nr 3, could help to find a unique solution....

I think (1) is straight out due to the lack of good a11y and i18n
support in GRUB. Same for the initramfs. This is unfortunate, because
the snapshot discovery and read-only boot work in SUSE, but I don't
think many people want to spend more effort on things that don't
support localization and accessibility.

Is there some test that could be done automatically? Fedora has a
boot success test patch in its GRUB, used to allow the GRUB menu to be
invisible to the user unless there's a boot problem.

https://src.fedoraproject.org/rpms/grub2/blob/master/f/0131-Add-grub-set-bootflag-utility.patch

Other ideas and related work:
https://btrfs.wiki.kernel.org/index.php/Autosnap#Timeslider
https://github.com/coreos/rpm-ostree  (One item of interest that might
help with FHS madness is the logic it uses for merging /etc following
updates.)

I'm not convinced there is a role for the kernel in any of this,
beyond what functionality we've already got (and the aforementioned
reflink copies without having to mount an in-common subvol).


-- 
Chris Murphy

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-24 11:56   ` Goffredo Baroncelli
  2020-07-24 22:08     ` Chris Murphy
@ 2020-07-25  2:37     ` Zygo Blaxell
  2020-07-27 12:26     ` David Sterba
  2 siblings, 0 replies; 17+ messages in thread
From: Zygo Blaxell @ 2020-07-25  2:37 UTC (permalink / raw)
  To: kreijack; +Cc: linux-btrfs, Chris Murphy

On Fri, Jul 24, 2020 at 01:56:58PM +0200, Goffredo Baroncelli wrote:
> On 7/23/20 11:53 PM, Zygo Blaxell wrote:
> > On Tue, Jul 21, 2020 at 10:33:39PM +0200, Goffredo Baroncelli wrote:
> > > 
> > > Hi all,
> > > 
> > > this is an RFC to discuss my idea for a simple rollback of the
> > > root filesystem at boot time.
> > > 
> > > The problem that I want to solve is the following: DPKG is very slow on
> > > a BTRFS filesystem. The reason is that DPKG makes heavy use of
> > > sync()/fsync() to guarantee that the filesystem is always coherent, even
> > > in case of a sudden shutdown.
> > > 
> > > The same approach could also be useful for RPM-based Linux distributions
> > > (which, however, suffer less than DPKG).
> > > 
> > > A way to avoid the sync()/fsync() calls without losing the DPKG
> > > guarantees is:
> > > 1) perform a snapshot of the root filesystem (the rollback one)
> > > 2) upgrade the filesystem without using sync/fsync
> > > 3) final (global) sync
> > > 4) destroy the rollback snapshot
> > 
> > The idea sounds OK, but there are alternatives:
> > 
> > 	1) perform snapshot of root filesystem
> > 	2) chroot snapshot eatmydata apt dist-upgrade (*)
> > 	3) sync -f snapshot
> > 	4) renameat2(..., snapshot, ..., root, RENAME_EXCHANGE)
> > 	5) delete snapshot
> > 
> > (*) OK you have to set up /dev, /proc, /sys, etc, probably a whole
> > namespace.
> > 
> > This may not play well with maintainer scripts on some distros, but it
> > does mean you don't have a half-broken system _during_ the upgrade.
> 
> Chris also suggested that. However, I don't think it is a viable solution:
> 1) as you pointed out, most maintainer pre/post-install scripts assume that the system is "live". So I don't think it would be possible without auditing and updating all the packages.
> 2) what happens in case of an unclean shutdown during step 4? To me it seems that we are performing two installations :-) The first one is at step 2 and the second one is at step 3. Moreover, a move between two subvolumes is not allowed (it is like a copy):
> ghigo@venice:/tmp$ btrfs sub crea sub1
> Create subvolume './sub1'
> ghigo@venice:/tmp$ btrfs sub crea sub2
> Create subvolume './sub2'
> ghigo@venice:/tmp$ touch sub1/file1
> ghigo@venice:/tmp$ python
> Python 2.7.18 (default, Apr 20 2020, 20:30:41)
> [GCC 9.3.0] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> >>> os.rename("sub1/file1", "sub2/file")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OSError: [Errno 18] Invalid cross-device link
> 
> This means that there is a high risk of an incomplete write in case of an unplanned shutdown (even though clone is allowed...)

renameat2() can atomically swap two directories, so it would solve this
specific part of the problem.
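The swap Zygo refers to can be tried on any Linux filesystem that supports RENAME_EXCHANGE. Python's os module has no renameat2() wrapper, so this minimal sketch calls the glibc function through ctypes; it assumes glibc 2.28 or later and uses two plain directories standing in for the main subvolume and its snapshot. It does not test btrfs itself: that two subvolume roots in the same parent directory can be exchanged this way is what the thread assumes. Note that this is a different operation from the failing os.rename() above, which tried to move a file from one subvolume into another.

```python
import ctypes
import os
import tempfile

# Linux constants from <fcntl.h> and <linux/fs.h>.
AT_FDCWD = -100
RENAME_EXCHANGE = 1 << 1

# Assumption: glibc >= 2.28, which exports a renameat2() wrapper.
_libc = ctypes.CDLL(None, use_errno=True)
_renameat2 = _libc.renameat2
_renameat2.argtypes = [ctypes.c_int, ctypes.c_char_p,
                       ctypes.c_int, ctypes.c_char_p, ctypes.c_uint]
_renameat2.restype = ctypes.c_int


def exchange(path_a: str, path_b: str) -> None:
    """Atomically swap two existing paths on the same filesystem."""
    rc = _renameat2(AT_FDCWD, os.fsencode(path_a),
                    AT_FDCWD, os.fsencode(path_b), RENAME_EXCHANGE)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path_a, None, path_b)


def demo_swap():
    """Swap two plain directories and report what each name now contains."""
    with tempfile.TemporaryDirectory() as top:
        main = os.path.join(top, "main")
        snap = os.path.join(top, "snapshot")
        os.mkdir(main)
        os.mkdir(snap)
        open(os.path.join(main, "from-main"), "w").close()
        open(os.path.join(snap, "from-snapshot"), "w").close()
        exchange(main, snap)  # both names exist before and after the call
        return sorted(os.listdir(main)), sorted(os.listdir(snap))


if __name__ == "__main__":
    print(demo_swap())  # (['from-snapshot'], ['from-main'])
```

Because both names exist at every instant, a crash can only observe the state before or after the exchange, never a missing directory.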

> 
> > 
> > Sometimes when I have a really problematic upgrade I rsync the system
> > to another box, do the upgrade there, and then rsync the system back
> > to the problematic box.  As a side-effect it also allows me to do a
> > verification test to make sure the upgrade worked before throwing it
> > onto a production system.  The snapshot/rollback thing would be a
> > local version of that.
> > 
> > > If an unclean shutdown happens between 1) and 4), two subvolumes exist:
> > > the 'main' one and the 'rollback' one (which is the snapshot taken before the
> > > update). In this case the system at boot time should mount the "rollback"
> > > subvolume instead of the "main" one. Otherwise, in case of a "clean" boot, the
> > > "rollback" subvolume doesn't exist and only the "main" one can be
> > > mounted.
> > > 
> > > In [1] I discussed a way to implement steps 1 to 4 (OK, I missed
> > > point 3).
> > > 
> > > The part that has been missing until now is an automatic way to mount the rollback
> > > subvolume at boot time when it is present.
> > > 
> > > My idea is to allow more than one 'subvol=' option. In this case BTRFS tries each
> > > passed subvolume in order until the first one succeeds. So, invoking the kernel as:
> > > 
> > >    linux root=UUID=xxxx rootflags=subvol=rollback,subvol=main ro
> > > 
> > > First, the kernel tries to mount the 'rollback' subvolume. If the rollback
> > > subvolume doesn't exist then it mounts the 'main' subvolume.
> > 
> > This could be done already from the initramfs.
> 
> OK, this means that we have three possibilities:
> 1) do this at the bootloader level (e.g. GRUB)
> 2) do this in the initramfs
> 3) do this at the kernel level (see my patch)
> 
> All these possibilities are viable solutions. However, I find 1) and 2) the most "intrusive" and distro-specific. My fear is that each distro will make a different choice, leading to more fragmentation.
> I hoped that solution nr. 3 could help to find a unique solution....
> 
> 
> 
> 
> > 
> > > Of course after the mount, the system should perform a cleanup of the
> > > subvolumes: i.e. if a rollback subvolume exists, the system should destroy
> > > the "main" one (which contains garbage) and rename "rollback" to "main".
> > > To be more precise:
> > > 
> > > 	if test -d "rollback"; then
> > > 		if test -d "old"; then
> > > 			btrfs sub del "old"
> > > 		fi
> > > 		if test -d "main"; then
> > > 			mv "main" "old"
> > > 		fi
> > > 		mv "rollback" "main"
> > > 		if test -d "old"; then
> > > 			btrfs sub del "old"
> > > 		fi
> > > 	fi
> > > 
> > > Comments are welcome
> > > BR
> > > G.Baroncelli
> > > 
> > > [1] http://lore.kernel.org/linux-btrfs/69396573-b5b3-b349-06f5-f5b74eb9720d@libero.it/
> > > 
> > > P.S.
> > > I am guessing if an idea like this can be applied to a file. E.g. a sqlite
> > > database that instead of reling to sync/fsync, creates a reflink file as
> > > "rollback" if something goes wrong.... The ordering is preserved. Not the
> > > duration.
> > > 
> 
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
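The boot-time cleanup script quoted above can be restated as a pure function, which makes its ordering easy to check without a btrfs filesystem. This is a minimal sketch, not part of the RFC: plan_cleanup is a hypothetical name, it only returns the commands instead of running them, and it emits the final deletion of "old" only when a subvolume of that name exists at that point.

```python
def plan_cleanup(existing):
    """Return, in order, the commands the boot-time cleanup would run.

    `existing` holds the subvolume names found in the top-level directory;
    "main", "rollback" and "old" are the names used in the RFC.
    """
    present = set(existing)
    cmds = []
    if "rollback" not in present:
        return cmds  # clean boot: the upgrade finished, nothing to undo
    if "old" in present:  # leftover from an earlier interrupted cleanup
        cmds.append("btrfs subvolume delete old")
        present.discard("old")
    if "main" in present:  # half-upgraded tree, set it aside
        cmds.append("mv main old")
        present.discard("main")
        present.add("old")
    cmds.append("mv rollback main")  # pre-upgrade snapshot becomes main
    present.discard("rollback")
    present.add("main")
    if "old" in present:
        cmds.append("btrfs subvolume delete old")
    return cmds


if __name__ == "__main__":
    print(plan_cleanup({"main", "rollback"}))
    # ['mv main old', 'mv rollback main', 'btrfs subvolume delete old']
```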

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-24 11:56   ` Goffredo Baroncelli
  2020-07-24 22:08     ` Chris Murphy
  2020-07-25  2:37     ` Zygo Blaxell
@ 2020-07-27 12:26     ` David Sterba
  2020-07-27 17:25       ` Goffredo Baroncelli
  2 siblings, 1 reply; 17+ messages in thread
From: David Sterba @ 2020-07-27 12:26 UTC
  To: kreijack; +Cc: Zygo Blaxell, linux-btrfs, Chris Murphy

On Fri, Jul 24, 2020 at 01:56:58PM +0200, Goffredo Baroncelli wrote:
> > This could be done already from the initramfs.
> 
> OK, this means that we have three possibilities:
> 1) do this at the bootloader level (e.g. GRUB)
> 2) do this in the initramfs
> 3) do this at the kernel level (see my patch)
> 
> All these possibilities are viable solutions. However, I find 1) and
> 2) the most "intrusive" and distro-specific. My fear is that each
> distro will make a different choice, leading to more fragmentation.
> I hoped that solution nr. 3 could help to find a unique solution....

IMO the bootloader or the initrd are the right places to do the mount test and
the eventual rollback. What the kernel provides is the ability to mount the
subvolume; it's up to the user to supply the right one. When I read the
proposal, option 2 was my first thought, as it can be implemented with the
existing kernel support.

Distros take different approaches to various problems, and this is fine.
Here the list of fallback subvolumes, their naming, where it's stored, etc.
may differ, while the kernel provides the base functionality.

It would make sense to push that down one level if all distros ended up
repeating the same code and there were an established way to do the
main/rollback switch.
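Option 2 needs nothing from the kernel beyond the existing subvol= mount option. A minimal sketch of the selection step an initramfs hook could perform before mounting the real root, assuming the hook can test whether a subvolume name exists (for example by first mounting the top-level subvolume, subvolid=5, at a temporary path and checking for a directory there). pick_subvol is a hypothetical helper; real initramfs hooks are typically shell scripts, and this only shows the "first existing subvol= wins" rule proposed in the RFC.

```python
from typing import Callable


def pick_subvol(rootflags: str, exists: Callable[[str], bool]) -> str:
    """Reduce a fallback list such as "subvol=rollback,subvol=main" to the
    mount options for the first subvolume that exists."""
    candidates, others = [], []
    for opt in rootflags.split(","):
        if opt.startswith("subvol="):
            candidates.append(opt[len("subvol="):])
        elif opt:
            others.append(opt)  # keep unrelated options such as compress=
    for name in candidates:
        if exists(name):
            return ",".join(others + ["subvol=" + name])
    raise FileNotFoundError("no candidate subvolume exists: " + ", ".join(candidates))


if __name__ == "__main__":
    present = {"main"}  # e.g. after a clean shutdown no "rollback" exists
    print(pick_subvol("compress=zstd,subvol=rollback,subvol=main", present.__contains__))
    # compress=zstd,subvol=main
```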


* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-27 12:26     ` David Sterba
@ 2020-07-27 17:25       ` Goffredo Baroncelli
  2020-07-27 17:34         ` Goffredo Baroncelli
  0 siblings, 1 reply; 17+ messages in thread
From: Goffredo Baroncelli @ 2020-07-27 17:25 UTC
  To: dsterba, Zygo Blaxell, linux-btrfs, Chris Murphy

On 7/27/20 2:26 PM, David Sterba wrote:
> On Fri, Jul 24, 2020 at 01:56:58PM +0200, Goffredo Baroncelli wrote:
>>> This could be done already from the initramfs.
>>
>> OK, this means that we have three possibilities:
>> 1) do this at the bootloader level (e.g. GRUB)
>> 2) do this in the initramfs
>> 3) do this at the kernel level (see my patch)
>>
>> All these possibilities are viable solutions. However, I find 1) and
>> 2) the most "intrusive" and distro-specific. My fear is that each
>> distro will make a different choice, leading to more fragmentation.
>> I hoped that solution nr. 3 could help to find a unique solution....
> 
> IMO the bootloader or the initrd are the right places to do the mount test and
> the eventual rollback. What the kernel provides is the ability to mount the
> subvolume; it's up to the user to supply the right one. When I read the
> proposal, option 2 was my first thought, as it can be implemented with the
> existing kernel support.
> 
> Distros take different approaches to various problems, and this is fine.
> Here the list of fallback subvolumes, their naming, where it's stored, etc.
> may differ, while the kernel provides the base functionality.
> 
> It would make sense to push that down one level if all distros ended up
> repeating the same code and there were an established way to do the
> main/rollback switch.

I am looking at another solution, based on some suggestions from
Zygo and Chris. This solution requires no changes to the initrd, the kernel or the bootloader.

Roughly, the sequence is the following:

During the upgrade
==================
1 clean up any subvolume/snapshot pairs left over from a previous unexpected abort
2 take a snapshot of the main subvolume
3 swap (atomically via renameat2) the original subvolume and its snapshot;
	this means that in case of an unexpected reboot, the system starts
	from the snapshot
4 update the filesystem
5 re-swap the original subvolume and its snapshot
6 delete the snapshot (or keep it to provide a way to return to the previous configuration)

This procedure has three possible endings:
1) all OK, nothing to do
2) an unexpected reboot happened; at startup a cleanup of the subvolumes is required
3) an unexpected abort happened without a reboot; we still have the two subvolumes, and at the latest
during shutdown the subvolume and its snapshot have to be swapped (if required)

During the startup
==================
A script checks whether the system started from the snapshot and, if so,
deletes the original subvolume (or keeps it to provide a history).


During the shutdown
===================
A script checks whether both the subvolume and its snapshot are present.
This happens if the upgrade procedure aborted for some reason (but the system didn't reboot).
In this case I think it is safe to swap the snapshot and the original subvolume and
drop the snapshot (or keep it to provide....)
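The crash behaviour of steps 2 to 6 can be checked with a small model. This is a sketch under stated assumptions, not Goffredo's implementation: the names root and root.snap are hypothetical; the bootloader is assumed to always mount root; step 4 is split into "begin" and "finish and sync" (the final sync of the original proposal); after a crash, btrfs is assumed to come back with exactly some prefix of the actions applied (the transaction ordering the thread relies on). The startup script is simplified to "delete whichever member of the pair was not booted", which is slightly more eager than the rule above because it also removes a leftover snapshot that step 1 of the next upgrade would otherwise clean up.

```python
from dataclasses import dataclass


@dataclass
class Subvol:
    content: str  # "old", "partial" or "new"


ROOT, SNAP = "root", "root.snap"  # hypothetical names; the bootloader mounts ROOT

# Steps 2-6 of the upgrade; step 4 is split so a crash mid-update can be modelled.
ACTIONS = ["snapshot", "swap", "update-begin", "update-end+sync", "reswap", "delete"]


def upgrade(crash_after: int) -> dict:
    """Apply the first `crash_after` actions, then lose power.

    Assumes btrfs comes back with exactly that prefix of actions applied.
    Returns the name -> Subvol table seen at the next boot.
    """
    fs = {ROOT: Subvol("old")}
    live = fs[ROOT]  # the subvolume mounted as / during the whole upgrade
    for action in ACTIONS[:crash_after]:
        if action == "snapshot":
            fs[SNAP] = Subvol(live.content)
        elif action in ("swap", "reswap"):  # renameat2(..., RENAME_EXCHANGE)
            fs[ROOT], fs[SNAP] = fs[SNAP], fs[ROOT]
        elif action == "update-begin":
            live.content = "partial"
        elif action == "update-end+sync":
            live.content = "new"
        elif action == "delete":
            del fs[SNAP]
    return fs


def boot(fs: dict) -> str:
    """Mount ROOT, then delete the other member of the pair if it is still there."""
    fs.pop(SNAP, None)
    return fs[ROOT].content


if __name__ == "__main__":
    print([boot(upgrade(k)) for k in range(len(ACTIONS) + 1)])
    # ['old', 'old', 'old', 'old', 'old', 'new', 'new']
```

In this model any crash before the re-swap boots the untouched pre-upgrade tree (discarding even a finished update), and no crash point boots the half-written "partial" tree.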


> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: [RFC] btrfs: strategy to perform a rollback at boot time
  2020-07-27 17:25       ` Goffredo Baroncelli
@ 2020-07-27 17:34         ` Goffredo Baroncelli
  0 siblings, 0 replies; 17+ messages in thread
From: Goffredo Baroncelli @ 2020-07-27 17:34 UTC
  To: dsterba, Zygo Blaxell, linux-btrfs, Chris Murphy

On 7/27/20 7:25 PM, Goffredo Baroncelli wrote:
> On 7/27/20 2:26 PM, David Sterba wrote:
>> On Fri, Jul 24, 2020 at 01:56:58PM +0200, Goffredo Baroncelli wrote:
>>>> This could be done already from the initramfs.
>>>
>>> OK, this means that we have three possibilities:
>>> 1) do this at the bootloader level (e.g. GRUB)
>>> 2) do this in the initramfs
>>> 3) do this at the kernel level (see my patch)
>>>
>>> All these possibilities are viable solutions. However, I find 1) and
>>> 2) the most "intrusive" and distro-specific. My fear is that each
>>> distro will make a different choice, leading to more fragmentation.
>>> I hoped that solution nr. 3 could help to find a unique solution....
>>
>> IMO the bootloader or the initrd are the right places to do the mount test and
>> the eventual rollback. What the kernel provides is the ability to mount the
>> subvolume; it's up to the user to supply the right one. When I read the
>> proposal, option 2 was my first thought, as it can be implemented with the
>> existing kernel support.
>>
>> Distros take different approaches to various problems, and this is fine.
>> Here the list of fallback subvolumes, their naming, where it's stored, etc.
>> may differ, while the kernel provides the base functionality.
>>
>> It would make sense to push that down one level if all distros ended up
>> repeating the same code and there were an established way to do the
>> main/rollback switch.
> 
> I am looking at another solution, based on some suggestions from
> Zygo and Chris. This solution requires no changes to the initrd, the kernel or the bootloader.
> 
> Roughly, the sequence is the following:
> 
> During the upgrade
> ==================
> 1 clean up any subvolume/snapshot pairs left over from a previous unexpected abort
> 2 take a snapshot of the main subvolume
> 3 swap (atomically via renameat2) the original subvolume and its snapshot;
>      this means that in case of an unexpected reboot, the system starts
>      from the snapshot
> 4 update the filesystem
> 5 re-swap the original subvolume and its snapshot
> 6 delete the snapshot (or keep it to provide a way to return to the previous configuration)

Of course the devil is in the details: with the process described above, during a "grub" upgrade
grub.cfg will be generated for the current root subvolume, which at that point carries the snapshot's name

:-(


> 
> This procedure has three possible endings:
> 1) all OK, nothing to do
> 2) an unexpected reboot happened; at startup a cleanup of the subvolumes is required
> 3) an unexpected abort happened without a reboot; we still have the two subvolumes, and at the latest
> during shutdown the subvolume and its snapshot have to be swapped (if required)
> 
> During the startup
> ==================
> A script checks whether the system started from the snapshot and, if so,
> deletes the original subvolume (or keeps it to provide a history).
> 
> 
> During the shutdown
> ===================
> A script checks whether both the subvolume and its snapshot are present.
> This happens if the upgrade procedure aborted for some reason (but the system didn't reboot).
> In this case I think it is safe to swap the snapshot and the original subvolume and
> drop the snapshot (or keep it to provide....)
> 
> 
>>
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Thread overview: 17+ messages
2020-07-21 20:33 [RFC] btrfs: strategy to perform a rollback at boot time Goffredo Baroncelli
2020-07-21 20:33 ` [PATCH] btrfs: allow more subvol= option Goffredo Baroncelli
2020-07-21 20:50   ` Steven Davies
2020-07-22  1:12   ` kernel test robot
2020-07-22  1:12     ` kernel test robot
2020-07-21 20:55 ` [RFC] btrfs: strategy to perform a rollback at boot time Steven Davies
2020-07-23 19:52   ` Goffredo Baroncelli
2020-07-21 21:09 ` Chris Murphy
2020-07-22  0:21 ` Nicholas D Steeves
2020-07-23 20:02   ` Goffredo Baroncelli
2020-07-23 21:53 ` Zygo Blaxell
2020-07-24 11:56   ` Goffredo Baroncelli
2020-07-24 22:08     ` Chris Murphy
2020-07-25  2:37     ` Zygo Blaxell
2020-07-27 12:26     ` David Sterba
2020-07-27 17:25       ` Goffredo Baroncelli
2020-07-27 17:34         ` Goffredo Baroncelli
