[PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication
@ 2018-12-12 18:05 fdmanana
  2018-12-12 18:05 ` [PATCH 1/4] Btrfs: move duplicated nodatasum check into common reflink/dedupe helper fdmanana
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: fdmanana @ 2018-12-12 18:05 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

This is a follow up to the recent change that makes btrfs use the generic
helper for cloning and deduplication [1]. It contains a mix of cleanups
and fixes, which existed before that change.

[1] Btrfs: use generic_remap_file_range_prep() for cloning and deduplication

Filipe Manana (4):
  Btrfs: move duplicated nodatasum check into common reflink/dedupe
    helper
  Btrfs: use cross mount point check for cloning and deduplication
  Btrfs: check if destination root is read-only for deduplication
  Btrfs: remove no longer needed range length checks for deduplication

 fs/btrfs/ioctl.c | 50 ++++++++++++++------------------------------------
 1 file changed, 14 insertions(+), 36 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/4] Btrfs: move duplicated nodatasum check into common reflink/dedupe helper
  2018-12-12 18:05 [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication fdmanana
@ 2018-12-12 18:05 ` fdmanana
  2019-01-11 14:55   ` David Sterba
  2018-12-12 18:05 ` [PATCH 2/4] Btrfs: use cross mount point check for cloning and deduplication fdmanana
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: fdmanana @ 2018-12-12 18:05 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Move the check that verifies if both inodes have checksums disabled or
both have them enabled, from the clone and deduplication functions into
the new common helper btrfs_remap_file_range_prep().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 321fb9bc149d..1e90d10b5638 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3245,11 +3245,6 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	int num_pages = PAGE_ALIGN(BTRFS_MAX_DEDUPE_LEN) >> PAGE_SHIFT;
 	u64 i, tail_len, chunk_count;
 
-	/* don't make the dst file partly checksummed */
-	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
-	    (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
-		return -EINVAL;
-
 	tail_len = olen % BTRFS_MAX_DEDUPE_LEN;
 	chunk_count = div_u64(olen, BTRFS_MAX_DEDUPE_LEN);
 	if (chunk_count == 0)
@@ -3869,11 +3864,6 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
 	 *   be either compressed or non-compressed.
 	 */
 
-	/* don't make the dst file partly checksummed */
-	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
-	    (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
-		return -EINVAL;
-
 	/*
 	 * VFS's generic_remap_file_range_prep() protects us from cloning the
 	 * eof block into the middle of a file, which would result in corruption
@@ -3933,6 +3923,13 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 	else
 		btrfs_double_inode_lock(inode_in, inode_out);
 
+	/* don't make the dst file partly checksummed */
+	if ((BTRFS_I(inode_in)->flags & BTRFS_INODE_NODATASUM) !=
+	    (BTRFS_I(inode_out)->flags & BTRFS_INODE_NODATASUM)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
 	/*
 	 * Now that the inodes are locked, we need to start writeback ourselves
 	 * and can not rely on the writeback from the VFS's generic helper
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/4] Btrfs: use cross mount point check for cloning and deduplication
  2018-12-12 18:05 [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication fdmanana
  2018-12-12 18:05 ` [PATCH 1/4] Btrfs: move duplicated nodatasum check into common reflink/dedupe helper fdmanana
@ 2018-12-12 18:05 ` fdmanana
  2018-12-13 16:02   ` David Sterba
  2018-12-12 18:05 ` [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication fdmanana
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: fdmanana @ 2018-12-12 18:05 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

There is no reason why this check was performed for clone operations but
not for deduplication operations, after all deduplication is a special
case of cloning. So make the check happen for deduplication as well.

This check used to be done and got removed by accident in commit
2b3909f8a7fe9 ("btrfs: use new dedupe data function pointer").

Fixes: 2b3909f8a7fe9 ("btrfs: use new dedupe data function pointer").
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 1e90d10b5638..ffe940ceb80a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3912,12 +3912,12 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 
 		if (btrfs_root_readonly(root_out))
 			return -EROFS;
-
-		if (file_in->f_path.mnt != file_out->f_path.mnt ||
-		    inode_in->i_sb != inode_out->i_sb)
-			return -EXDEV;
 	}
 
+	if (file_in->f_path.mnt != file_out->f_path.mnt ||
+	    inode_in->i_sb != inode_out->i_sb)
+		return -EXDEV;
+
 	if (same_inode)
 		inode_lock(inode_in);
 	else
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2018-12-12 18:05 [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication fdmanana
  2018-12-12 18:05 ` [PATCH 1/4] Btrfs: move duplicated nodatasum check into common reflink/dedupe helper fdmanana
  2018-12-12 18:05 ` [PATCH 2/4] Btrfs: use cross mount point check for cloning and deduplication fdmanana
@ 2018-12-12 18:05 ` fdmanana
  2018-12-13 16:07   ` David Sterba
  2019-02-18 16:01   ` David Sterba
  2018-12-12 18:05 ` [PATCH 4/4] Btrfs: remove no longer needed range length checks " fdmanana
  2018-12-13 12:19 ` [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication Nikolay Borisov
  4 siblings, 2 replies; 26+ messages in thread
From: fdmanana @ 2018-12-12 18:05 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Checking if the destination root is read-only was being performed only for
clone operations. Make deduplication check it as well, as it does not make
sense to not do it, even if it is an operation that does not change the
file contents (such as defrag for example, which checks first if the root
is read-only).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ffe940ceb80a..4e9efc93340e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3907,12 +3907,8 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 	u64 wb_len;
 	int ret;
 
-	if (!(remap_flags & REMAP_FILE_DEDUP)) {
-		struct btrfs_root *root_out = BTRFS_I(inode_out)->root;
-
-		if (btrfs_root_readonly(root_out))
-			return -EROFS;
-	}
+	if (btrfs_root_readonly(BTRFS_I(inode_out)->root))
+		return -EROFS;
 
 	if (file_in->f_path.mnt != file_out->f_path.mnt ||
 	    inode_in->i_sb != inode_out->i_sb)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] Btrfs: remove no longer needed range length checks for deduplication
  2018-12-12 18:05 [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication fdmanana
                   ` (2 preceding siblings ...)
  2018-12-12 18:05 ` [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication fdmanana
@ 2018-12-12 18:05 ` fdmanana
  2018-12-13 12:20   ` Nikolay Borisov
  2019-01-31 16:31   ` Filipe Manana
  2018-12-13 12:19 ` [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication Nikolay Borisov
  4 siblings, 2 replies; 26+ messages in thread
From: fdmanana @ 2018-12-12 18:05 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

Comparing the content of the pages in the range to deduplicate is now done
by the generic helper generic_remap_file_range_prep(), which takes care of
ensuring we do not compare/deduplicate undefined data beyond a file's eof
(range from eof to the next block boundary). So remove these checks which
are now redundant.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4e9efc93340e..3a27efa2b955 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3206,31 +3206,16 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
 	inode_lock_nested(inode2, I_MUTEX_CHILD);
 }
 
-static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 olen,
+static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 len,
 				   struct inode *dst, u64 dst_loff)
 {
-	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
 	int ret;
-	u64 len = olen;
-
-	if (loff + len == src->i_size)
-		len = ALIGN(src->i_size, bs) - loff;
-	/*
-	 * For same inode case we don't want our length pushed out past i_size
-	 * as comparing that data range makes no sense.
-	 *
-	 * This effectively means we require aligned extents for the single
-	 * inode case, whereas the other cases allow an unaligned length so long
-	 * as it ends at i_size.
-	 */
-	if (dst == src && len != olen)
-		return -EINVAL;
 
 	/*
 	 * Lock destination range to serialize with concurrent readpages().
 	 */
 	lock_extent(&BTRFS_I(dst)->io_tree, dst_loff, dst_loff + len - 1);
-	ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	ret = btrfs_clone(src, dst, loff, len, len, dst_loff, 1);
 	unlock_extent(&BTRFS_I(dst)->io_tree, dst_loff, dst_loff + len - 1);
 
 	return ret;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication
  2018-12-12 18:05 [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication fdmanana
                   ` (3 preceding siblings ...)
  2018-12-12 18:05 ` [PATCH 4/4] Btrfs: remove no longer needed range length checks " fdmanana
@ 2018-12-13 12:19 ` Nikolay Borisov
  4 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-13 12:19 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 12.12.18 г. 20:05 ч., fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> This is a follow up to the recent change that makes btrfs use the generic
> helper for cloning and deduplication [1]. It contains a mix of cleanups
> and fixes, which existed before that change.
> 
> [1] Btrfs: use generic_remap_file_range_prep() for cloning and deduplication
> 
> Filipe Manana (4):
>   Btrfs: move duplicated nodatasum check into common reflink/dedupe
>     helper
>   Btrfs: use cross mount point check for cloning and deduplication
>   Btrfs: check if destination root is read-only for deduplication
>   Btrfs: remove no longer needed range length checks for deduplication
> 
>  fs/btrfs/ioctl.c | 50 ++++++++++++++------------------------------------
>  1 file changed, 14 insertions(+), 36 deletions(-)

For the whole series:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] Btrfs: remove no longer needed range length checks for deduplication
  2018-12-12 18:05 ` [PATCH 4/4] Btrfs: remove no longer needed range length checks " fdmanana
@ 2018-12-13 12:20   ` Nikolay Borisov
  2019-01-31 16:31   ` Filipe Manana
  1 sibling, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-12-13 12:20 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 12.12.18 г. 20:05 ч., fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Comparing the content of the pages in the range to deduplicate is now done
> by the generic helper generic_remap_file_range_prep(), which takes care of

very minor nit: the checks are performed in generic_remap_checks which
is called by generic_remap_file_range_prep. But even without this change
LGTM.

> ensuring we do not compare/deduplicate undefined data beyond a file's eof
> (range from eof to the next block boundary). So remove these checks which
> are now redundant.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/ioctl.c | 19 ++-----------------
>  1 file changed, 2 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 4e9efc93340e..3a27efa2b955 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3206,31 +3206,16 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
>  	inode_lock_nested(inode2, I_MUTEX_CHILD);
>  }
>  
> -static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 olen,
> +static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 len,
>  				   struct inode *dst, u64 dst_loff)
>  {
> -	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
>  	int ret;
> -	u64 len = olen;
> -
> -	if (loff + len == src->i_size)
> -		len = ALIGN(src->i_size, bs) - loff;
> -	/*
> -	 * For same inode case we don't want our length pushed out past i_size
> -	 * as comparing that data range makes no sense.
> -	 *
> -	 * This effectively means we require aligned extents for the single
> -	 * inode case, whereas the other cases allow an unaligned length so long
> -	 * as it ends at i_size.
> -	 */
> -	if (dst == src && len != olen)
> -		return -EINVAL;
>  
>  	/*
>  	 * Lock destination range to serialize with concurrent readpages().
>  	 */
>  	lock_extent(&BTRFS_I(dst)->io_tree, dst_loff, dst_loff + len - 1);
> -	ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> +	ret = btrfs_clone(src, dst, loff, len, len, dst_loff, 1);
>  	unlock_extent(&BTRFS_I(dst)->io_tree, dst_loff, dst_loff + len - 1);
>  
>  	return ret;
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] Btrfs: use cross mount point check for cloning and deduplication
  2018-12-12 18:05 ` [PATCH 2/4] Btrfs: use cross mount point check for cloning and deduplication fdmanana
@ 2018-12-13 16:02   ` David Sterba
  2019-01-11 14:38     ` David Sterba
  0 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2018-12-13 16:02 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Dec 12, 2018 at 06:05:57PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> There is no reason why this check was performed for clone operations but
> not for deduplication operations, after all deduplication is a special
> case of cloning. So make the check happen for deduplication as well.
> 
> This check used to be done and got removed by accident in commit
> 2b3909f8a7fe9 ("btrfs: use new dedupe data function pointer").
> 
> Fixes: 2b3909f8a7fe9 ("btrfs: use new dedupe data function pointer").
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/ioctl.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 1e90d10b5638..ffe940ceb80a 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3912,12 +3912,12 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
>  
>  		if (btrfs_root_readonly(root_out))
>  			return -EROFS;
> -
> -		if (file_in->f_path.mnt != file_out->f_path.mnt ||
> -		    inode_in->i_sb != inode_out->i_sb)
> -			return -EXDEV;
>  	}
>  
> +	if (file_in->f_path.mnt != file_out->f_path.mnt ||
> +	    inode_in->i_sb != inode_out->i_sb)
> +		return -EXDEV;

I'm checking if this is or is not a change in the dedupe behaviour
regarding dedupe on different mont points.

I can't see the f_path.mnt check anywhere in the VFS calls so after that
patch deduping stops to work in some cases.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2018-12-12 18:05 ` [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication fdmanana
@ 2018-12-13 16:07   ` David Sterba
  2019-01-31 16:39     ` Filipe Manana
  2019-02-18 16:01   ` David Sterba
  1 sibling, 1 reply; 26+ messages in thread
From: David Sterba @ 2018-12-13 16:07 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Checking if the destination root is read-only was being performed only for
> clone operations. Make deduplication check it as well, as it does not make
> sense to not do it, even if it is an operation that does not change the
> file contents (such as defrag for example, which checks first if the root
> is read-only).

And this is also change in user-visible behaviour of dedupe, so this
needs to be verified if it's not breaking existing tools.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] Btrfs: use cross mount point check for cloning and deduplication
  2018-12-13 16:02   ` David Sterba
@ 2019-01-11 14:38     ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2019-01-11 14:38 UTC (permalink / raw)
  To: dsterba, fdmanana, linux-btrfs

On Thu, Dec 13, 2018 at 05:02:24PM +0100, David Sterba wrote:
> On Wed, Dec 12, 2018 at 06:05:57PM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> > 
> > There is no reason why this check was performed for clone operations but
> > not for deduplication operations, after all deduplication is a special
> > case of cloning. So make the check happen for deduplication as well.
> > 
> > This check used to be done and got removed by accident in commit
> > 2b3909f8a7fe9 ("btrfs: use new dedupe data function pointer").
> > 
> > Fixes: 2b3909f8a7fe9 ("btrfs: use new dedupe data function pointer").
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >  fs/btrfs/ioctl.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 1e90d10b5638..ffe940ceb80a 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3912,12 +3912,12 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in,
> >  
> >  		if (btrfs_root_readonly(root_out))
> >  			return -EROFS;
> > -
> > -		if (file_in->f_path.mnt != file_out->f_path.mnt ||
> > -		    inode_in->i_sb != inode_out->i_sb)
> > -			return -EXDEV;
> >  	}
> >  
> > +	if (file_in->f_path.mnt != file_out->f_path.mnt ||
> > +	    inode_in->i_sb != inode_out->i_sb)
> > +		return -EXDEV;
> 
> I'm checking if this is or is not a change in the dedupe behaviour
> regarding dedupe on different mont points.
> 
> I can't see the f_path.mnt check anywhere in the VFS calls so after that
> patch deduping stops to work in some cases.

Dedupe on cross-mount has been broken (or working) since 4.5, that's a
lot of releases with at least 4 stables in between. Changing that back
might break applications, I've asked on irc but none of the widely used
tools depend on that.

As usual we can never know for sure, so I'd like to explore the option
to keep it working. We'd also like to drop the cross-mount limitation
for clone that's something users do want. However this still has some
opposition from VFS people.

Unless there's a technical/stability reason to add the check back, I'd
like to keep it working.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] Btrfs: move duplicated nodatasum check into common reflink/dedupe helper
  2018-12-12 18:05 ` [PATCH 1/4] Btrfs: move duplicated nodatasum check into common reflink/dedupe helper fdmanana
@ 2019-01-11 14:55   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2019-01-11 14:55 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Dec 12, 2018 at 06:05:56PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Move the check that verifies if both inodes have checksums disabled or
> both have them enabled, from the clone and deduplication functions into
> the new common helper btrfs_remap_file_range_prep().
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] Btrfs: remove no longer needed range length checks for deduplication
  2018-12-12 18:05 ` [PATCH 4/4] Btrfs: remove no longer needed range length checks " fdmanana
  2018-12-13 12:20   ` Nikolay Borisov
@ 2019-01-31 16:31   ` Filipe Manana
  2019-02-12 17:58     ` Filipe Manana
  2019-02-18 15:10     ` David Sterba
  1 sibling, 2 replies; 26+ messages in thread
From: Filipe Manana @ 2019-01-31 16:31 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

On Wed, Dec 12, 2018 at 6:07 PM <fdmanana@kernel.org> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> Comparing the content of the pages in the range to deduplicate is now done
> by the generic helper generic_remap_file_range_prep(), which takes care of
> ensuring we do not compare/deduplicate undefined data beyond a file's eof
> (range from eof to the next block boundary). So remove these checks which
> are now redundant.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Any reason why this was not yet picked?
Thanks.

> ---
>  fs/btrfs/ioctl.c | 19 ++-----------------
>  1 file changed, 2 insertions(+), 17 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 4e9efc93340e..3a27efa2b955 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3206,31 +3206,16 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
>         inode_lock_nested(inode2, I_MUTEX_CHILD);
>  }
>
> -static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 olen,
> +static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 len,
>                                    struct inode *dst, u64 dst_loff)
>  {
> -       u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
>         int ret;
> -       u64 len = olen;
> -
> -       if (loff + len == src->i_size)
> -               len = ALIGN(src->i_size, bs) - loff;
> -       /*
> -        * For same inode case we don't want our length pushed out past i_size
> -        * as comparing that data range makes no sense.
> -        *
> -        * This effectively means we require aligned extents for the single
> -        * inode case, whereas the other cases allow an unaligned length so long
> -        * as it ends at i_size.
> -        */
> -       if (dst == src && len != olen)
> -               return -EINVAL;
>
>         /*
>          * Lock destination range to serialize with concurrent readpages().
>          */
>         lock_extent(&BTRFS_I(dst)->io_tree, dst_loff, dst_loff + len - 1);
> -       ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> +       ret = btrfs_clone(src, dst, loff, len, len, dst_loff, 1);
>         unlock_extent(&BTRFS_I(dst)->io_tree, dst_loff, dst_loff + len - 1);
>
>         return ret;
> --
> 2.11.0
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2018-12-13 16:07   ` David Sterba
@ 2019-01-31 16:39     ` Filipe Manana
  2019-01-31 16:44       ` Hugo Mills
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Filipe Manana @ 2019-01-31 16:39 UTC (permalink / raw)
  To: dsterba, Filipe Manana, linux-btrfs

On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Checking if the destination root is read-only was being performed only for
> > clone operations. Make deduplication check it as well, as it does not make
> > sense to not do it, even if it is an operation that does not change the
> > file contents (such as defrag for example, which checks first if the root
> > is read-only).
>
> And this is also change in user-visible behaviour of dedupe, so this
> needs to be verified if it's not breaking existing tools.

Have you had the chance to do such verification?

This actually conflicts with send. Send does not expect a root/tree to
change, and with dedupe on read-only roots happening
in parallel with send is going to cause all sorts of unexpected and
undesired problems...

This is a problem introduced by dedupe ioctl when it landed, since
send existed for a longer time (when nothing else was
allowed to change read-only roots, including defrag).

I understand it can break some applications, but adding other solution
such as preventing send and dedupe from running in parallel
(erroring out or block and wait for each other, etc) is going to be
really ugly. There's always the workaround for apps to set the
subvolume
to RW mode, do the dedupe, then switch it back to RO mode.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-01-31 16:39     ` Filipe Manana
@ 2019-01-31 16:44       ` Hugo Mills
  2019-02-18 15:38         ` David Sterba
  2019-02-12 17:59       ` Filipe Manana
  2019-02-20 16:41       ` Zygo Blaxell
  2 siblings, 1 reply; 26+ messages in thread
From: Hugo Mills @ 2019-01-31 16:44 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2120 bytes --]

On Thu, Jan 31, 2019 at 04:39:22PM +0000, Filipe Manana wrote:
> On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > Checking if the destination root is read-only was being performed only for
> > > clone operations. Make deduplication check it as well, as it does not make
> > > sense to not do it, even if it is an operation that does not change the
> > > file contents (such as defrag for example, which checks first if the root
> > > is read-only).
> >
> > And this is also change in user-visible behaviour of dedupe, so this
> > needs to be verified if it's not breaking existing tools.
> 
> Have you had the chance to do such verification?
> 
> This actually conflicts with send. Send does not expect a root/tree to
> change, and with dedupe on read-only roots happening
> in parallel with send is going to cause all sorts of unexpected and
> undesired problems...
> 
> This is a problem introduced by dedupe ioctl when it landed, since
> send existed for a longer time (when nothing else was
> allowed to change read-only roots, including defrag).
> 
> I understand it can break some applications, but adding other solution
> such as preventing send and dedupe from running in parallel
> (erroring out or block and wait for each other, etc) is going to be
> really ugly. There's always the workaround for apps to set the
> subvolume
> to RW mode, do the dedupe, then switch it back to RO mode.

   Only if you want your incremental send chain to break on the way
past...

   I think it's fairly clear by now (particularly from the last thread
on this a couple of weeks ago) that making RO subvols RW and then back
again is a fast way to broken incremental receives.

   Hugo.

-- 
Hugo Mills             | A clear conscience. Where did you get this taste for
hugo@... carfax.org.uk | luxuries, Bernard?
http://carfax.org.uk/  |                                          Sir Humphrey
PGP: E2AB1DE4          |                                   Yes, Prime Minister

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] Btrfs: remove no longer needed range length checks for deduplication
  2019-01-31 16:31   ` Filipe Manana
@ 2019-02-12 17:58     ` Filipe Manana
  2019-02-18 15:10     ` David Sterba
  1 sibling, 0 replies; 26+ messages in thread
From: Filipe Manana @ 2019-02-12 17:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

On Thu, Jan 31, 2019 at 4:31 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Wed, Dec 12, 2018 at 6:07 PM <fdmanana@kernel.org> wrote:
> >
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Comparing the content of the pages in the range to deduplicate is now done
> > by the generic helper generic_remap_file_range_prep(), which takes care of
> > ensuring we do not compare/deduplicate undefined data beyond a file's eof
> > (range from eof to the next block boundary). So remove these checks which
> > are now redundant.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Any reason why this was not yet picked?
> Thanks.

Another ping. Thanks.

>
> > ---
> >  fs/btrfs/ioctl.c | 19 ++-----------------
> >  1 file changed, 2 insertions(+), 17 deletions(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 4e9efc93340e..3a27efa2b955 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3206,31 +3206,16 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
> >         inode_lock_nested(inode2, I_MUTEX_CHILD);
> >  }
> >
> > -static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 olen,
> > +static int btrfs_extent_same_range(struct inode *src, u64 loff, u64 len,
> >                                    struct inode *dst, u64 dst_loff)
> >  {
> > -       u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
> >         int ret;
> > -       u64 len = olen;
> > -
> > -       if (loff + len == src->i_size)
> > -               len = ALIGN(src->i_size, bs) - loff;
> > -       /*
> > -        * For same inode case we don't want our length pushed out past i_size
> > -        * as comparing that data range makes no sense.
> > -        *
> > -        * This effectively means we require aligned extents for the single
> > -        * inode case, whereas the other cases allow an unaligned length so long
> > -        * as it ends at i_size.
> > -        */
> > -       if (dst == src && len != olen)
> > -               return -EINVAL;
> >
> >         /*
> >          * Lock destination range to serialize with concurrent readpages().
> >          */
> >         lock_extent(&BTRFS_I(dst)->io_tree, dst_loff, dst_loff + len - 1);
> > -       ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> > +       ret = btrfs_clone(src, dst, loff, len, len, dst_loff, 1);
> >         unlock_extent(&BTRFS_I(dst)->io_tree, dst_loff, dst_loff + len - 1);
> >
> >         return ret;
> > --
> > 2.11.0
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-01-31 16:39     ` Filipe Manana
  2019-01-31 16:44       ` Hugo Mills
@ 2019-02-12 17:59       ` Filipe Manana
  2019-02-20 16:41       ` Zygo Blaxell
  2 siblings, 0 replies; 26+ messages in thread
From: Filipe Manana @ 2019-02-12 17:59 UTC (permalink / raw)
  To: dsterba, Filipe Manana, linux-btrfs

On Thu, Jan 31, 2019 at 4:39 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > Checking if the destination root is read-only was being performed only for
> > > clone operations. Make deduplication check it as well, as it does not make
> > > sense to not do it, even if it is an operation that does not change the
> > > file contents (such as defrag for example, which checks first if the root
> > > is read-only).
> >
> > And this is also change in user-visible behaviour of dedupe, so this
> > needs to be verified if it's not breaking existing tools.
>
> Have you had the chance to do such verification?
>
> This actually conflicts with send. Send does not expect a root/tree to
> change, and with dedupe on read-only roots happening
> in parallel with send is going to cause all sorts of unexpected and
> undesired problems...
>
> This is a problem introduced by dedupe ioctl when it landed, since
> send existed for a longer time (when nothing else was
> allowed to change read-only roots, including defrag).

Another ping.
This is a problem that has to be solved one way or another.

Thanks.

>
> I understand it can break some applications, but adding other solution
> such as preventing send and dedupe from running in parallel
> (erroring out or block and wait for each other, etc) is going to be
> really ugly. There's always the workaround for apps to set the
> subvolume
> to RW mode, do the dedupe, then switch it back to RO mode.
>
> Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] Btrfs: remove no longer needed range length checks for deduplication
  2019-01-31 16:31   ` Filipe Manana
  2019-02-12 17:58     ` Filipe Manana
@ 2019-02-18 15:10     ` David Sterba
  1 sibling, 0 replies; 26+ messages in thread
From: David Sterba @ 2019-02-18 15:10 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs, David Sterba

On Thu, Jan 31, 2019 at 04:31:49PM +0000, Filipe Manana wrote:
> On Wed, Dec 12, 2018 at 6:07 PM <fdmanana@kernel.org> wrote:
> >
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Comparing the content of the pages in the range to deduplicate is now done
> > by the generic helper generic_remap_file_range_prep(), which takes care of
> > ensuring we do not compare/deduplicate undefined data beyond a file's eof
> > (range from eof to the next block boundary). So remove these checks which
> > are now redundant.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> 
> Any reason why this was not yet picked?

Added to misc-next now.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-01-31 16:44       ` Hugo Mills
@ 2019-02-18 15:38         ` David Sterba
  2019-02-18 16:55           ` Filipe Manana
  0 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2019-02-18 15:38 UTC (permalink / raw)
  To: Hugo Mills, Filipe Manana, dsterba, linux-btrfs

On Thu, Jan 31, 2019 at 04:44:39PM +0000, Hugo Mills wrote:
> On Thu, Jan 31, 2019 at 04:39:22PM +0000, Filipe Manana wrote:
> > On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> > >
> > > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > > From: Filipe Manana <fdmanana@suse.com>
> > > >
> > > > Checking if the destination root is read-only was being performed only for
> > > > clone operations. Make deduplication check it as well, as it does not make
> > > > sense to not do it, even if it is an operation that does not change the
> > > > file contents (such as defrag for example, which checks first if the root
> > > > is read-only).
> > >
> > > And this is also change in user-visible behaviour of dedupe, so this
> > > needs to be verified if it's not breaking existing tools.
> > 
> > Have you had the chance to do such verification?
> > 
> > This actually conflicts with send. Send does not expect a root/tree to
> > change, and with dedupe on read-only roots happening
> > in parallel with send is going to cause all sorts of unexpected and
> > undesired problems...
> > 
> > This is a problem introduced by dedupe ioctl when it landed, since
> > send existed for a longer time (when nothing else was
> > allowed to change read-only roots, including defrag).
> > 
> > I understand it can break some applications, but adding other solution
> > such as preventing send and dedupe from running in parallel
> > (erroring out or block and wait for each other, etc) is going to be
> > really ugly. There's always the workaround for apps to set the
> > subvolume
> > to RW mode, do the dedupe, then switch it back to RO mode.
> 
>    Only if you want your incremental send chain to break on the way
> past...
> 
>    I think it's fairly clear by now (particularly from the last thread
> on this a couple of weeks ago) that making RO subvols RW and then back
> again is a fast way to broken incremental receives.

So, I think the way it should be fixed is to keep deduplication off the
read-only subvolumes. The main reason I see is to avoid the random
problems that arise from send + dedupe interaction. The cost is lower
deduplication ratio.

The main usecase being a primary subvolume with RO snapshots over time,
with optional incremental send. That I know is common and widely used.

A problematic usecase that would utilize deduplication over RO snapshots
does could be something like a set of subvolumes that have very similar
data, get snapshotted or set RO, followed by a dedupe pass.

To support the latter I'd go only via the RO->RW, dedupe, RW->RO way, as
this records that there was a change and does not collide with send
assumptions. That this leads to loss of incremental send must be
communicated to the user with possible override, eg. --I-know-what-I-am-doing
option.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2018-12-12 18:05 ` [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication fdmanana
  2018-12-13 16:07   ` David Sterba
@ 2019-02-18 16:01   ` David Sterba
  1 sibling, 0 replies; 26+ messages in thread
From: David Sterba @ 2019-02-18 16:01 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Checking if the destination root is read-only was being performed only for
> clone operations. Make deduplication check it as well, as it does not make
> sense to not do it, even if it is an operation that does not change the
> file contents (such as defrag for example, which checks first if the root
> is read-only).
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

I've copied the portions of our discussion and added the patch to
misc-next.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-02-18 15:38         ` David Sterba
@ 2019-02-18 16:55           ` Filipe Manana
  0 siblings, 0 replies; 26+ messages in thread
From: Filipe Manana @ 2019-02-18 16:55 UTC (permalink / raw)
  To: dsterba, Hugo Mills, Filipe Manana, linux-btrfs

On Mon, Feb 18, 2019 at 3:36 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Thu, Jan 31, 2019 at 04:44:39PM +0000, Hugo Mills wrote:
> > On Thu, Jan 31, 2019 at 04:39:22PM +0000, Filipe Manana wrote:
> > > On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> > > >
> > > > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > > > From: Filipe Manana <fdmanana@suse.com>
> > > > >
> > > > > Checking if the destination root is read-only was being performed only for
> > > > > clone operations. Make deduplication check it as well, as it does not make
> > > > > sense to not do it, even if it is an operation that does not change the
> > > > > file contents (such as defrag for example, which checks first if the root
> > > > > is read-only).
> > > >
> > > > And this is also change in user-visible behaviour of dedupe, so this
> > > > needs to be verified if it's not breaking existing tools.
> > >
> > > Have you had the chance to do such verification?
> > >
> > > This actually conflicts with send. Send does not expect a root/tree to
> > > change, and with dedupe on read-only roots happening
> > > in parallel with send is going to cause all sorts of unexpected and
> > > undesired problems...
> > >
> > > This is a problem introduced by dedupe ioctl when it landed, since
> > > send existed for a longer time (when nothing else was
> > > allowed to change read-only roots, including defrag).
> > >
> > > I understand it can break some applications, but adding other solution
> > > such as preventing send and dedupe from running in parallel
> > > (erroring out or block and wait for each other, etc) is going to be
> > > really ugly. There's always the workaround for apps to set the
> > > subvolume
> > > to RW mode, do the dedupe, then switch it back to RO mode.
> >
> >    Only if you want your incremental send chain to break on the way
> > past...
> >
> >    I think it's fairly clear by now (particularly from the last thread
> > on this a couple of weeks ago) that making RO subvols RW and then back
> > again is a fast way to broken incremental receives.
>
> So, I think the way it should be fixed is to keep deduplication off the
> read-only subvolumes. The main reason I see is to avoid the random
> problems that arise from send + dedupe interaction. The cost is lower
> deduplication ratio.
>
> The main usecase being a primary subvolume with RO snapshots over time,
> with optional incremental send. That I know is common and widely used.
>
> A problematic usecase that would utilize deduplication over RO snapshots
> does could be something like a set of subvolumes that have very similar
> data, get snapshotted or set RO, followed by a dedupe pass.
>
> To support the latter I'd go only via the RO->RW, dedupe, RW->RO way, as
> this records that there was a change and does not collide with send
> assumptions. That this leads to loss of incremental send must be
> communicated to the user with possible override, eg. --I-know-what-I-am-doing
> option.

I fully agree.
Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-01-31 16:39     ` Filipe Manana
  2019-01-31 16:44       ` Hugo Mills
  2019-02-12 17:59       ` Filipe Manana
@ 2019-02-20 16:41       ` Zygo Blaxell
  2019-02-20 16:54         ` Filipe Manana
  2 siblings, 1 reply; 26+ messages in thread
From: Zygo Blaxell @ 2019-02-20 16:41 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2823 bytes --]

On Thu, Jan 31, 2019 at 04:39:22PM +0000, Filipe Manana wrote:
> On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > Checking if the destination root is read-only was being performed only for
> > > clone operations. Make deduplication check it as well, as it does not make
> > > sense to not do it, even if it is an operation that does not change the
> > > file contents (such as defrag for example, which checks first if the root
> > > is read-only).
> >
> > And this is also change in user-visible behaviour of dedupe, so this
> > needs to be verified if it's not breaking existing tools.
> 
> Have you had the chance to do such verification?
> 
> This actually conflicts with send. Send does not expect a root/tree to
> change, and with dedupe on read-only roots happening
> in parallel with send is going to cause all sorts of unexpected and
> undesired problems...

This is a problem bees ran into.  There is a workaround in bees (called
--workaround-btrfs-send) that avoids RO subvols as dedupe targets.
As the name of the option implies, it works around problems in btrfs send.

This kernel change makes the workaround mandatory now, as the default
case (without workaround) will fail on every RO subvol even if that
behavior is desired by the user.  That breaks an important use case on
the receiving side of sends--to dedupe the received subvols together
while also protecting them against modification on the target system
with the RO flag--and preserving that use case is why the send workaround
was optional (and not default) in bees.

bees also won't handle the RO/RW/RO transition correctly, as it didn't
seem like a sane thing to support at the time.  That is arguably something
to be fixed in bees.

> This is a problem introduced by dedupe ioctl when it landed, since
> send existed for a longer time (when nothing else was
> allowed to change read-only roots, including defrag).

Is there a reason why incremental send can't simply be fixed?  As far
as I can tell, send is failing because of a runtime check that seems to
be too strict; however, I haven't tried removing that check to see if
it fixes the problem in send, or just hides the next problem.

More details at:

	https://github.com/Zygo/bees/issues/79#issuecomment-429039036

> I understand it can break some applications, but adding other solution
> such as preventing send and dedupe from running in parallel
> (erroring out or block and wait for each other, etc) is going to be
> really ugly. There's always the workaround for apps to set the
> subvolume
> to RW mode, do the dedupe, then switch it back to RO mode.
> 
> Thanks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-02-20 16:41       ` Zygo Blaxell
@ 2019-02-20 16:54         ` Filipe Manana
  2019-02-20 17:17           ` Zygo Blaxell
  2019-02-21 16:54           ` Zygo Blaxell
  0 siblings, 2 replies; 26+ messages in thread
From: Filipe Manana @ 2019-02-20 16:54 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: dsterba, linux-btrfs

On Wed, Feb 20, 2019 at 4:42 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> On Thu, Jan 31, 2019 at 04:39:22PM +0000, Filipe Manana wrote:
> > On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> > >
> > > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > > From: Filipe Manana <fdmanana@suse.com>
> > > >
> > > > Checking if the destination root is read-only was being performed only for
> > > > clone operations. Make deduplication check it as well, as it does not make
> > > > sense to not do it, even if it is an operation that does not change the
> > > > file contents (such as defrag for example, which checks first if the root
> > > > is read-only).
> > >
> > > And this is also change in user-visible behaviour of dedupe, so this
> > > needs to be verified if it's not breaking existing tools.
> >
> > Have you had the chance to do such verification?
> >
> > This actually conflicts with send. Send does not expect a root/tree to
> > change, and with dedupe on read-only roots happening
> > in parallel with send is going to cause all sorts of unexpected and
> > undesired problems...
>
> This is a problem bees ran into.  There is a workaround in bees (called
> --workaround-btrfs-send) that avoids RO subvols as dedupe targets.
> As the name of the option implies, it works around problems in btrfs send.
>
> This kernel change makes the workaround mandatory now, as the default
> case (without workaround) will fail on every RO subvol even if that
> behavior is desired by the user.  That breaks an important use case on
> the receiving side of sends--to dedupe the received subvols together
> while also protecting them against modification on the target system
> with the RO flag--and preserving that use case is why the send workaround
> was optional (and not default) in bees.
>
> bees also won't handle the RO/RW/RO transition correctly, as it didn't
> seem like a sane thing to support at the time.  That is arguably something
> to be fixed in bees.
>
> > This is a problem introduced by dedupe ioctl when it landed, since
> > send existed for a longer time (when nothing else was
> > allowed to change read-only roots, including defrag).
>
> Is there a reason why incremental send can't simply be fixed?

This is a problem that affects both incremental and non-incremental (full) send.

> As far
> as I can tell, send is failing because of a runtime check that seems to
> be too strict; however, I haven't tried removing that check to see if
> it fixes the problem in send, or just hides the next problem.

The problem is send was designed with the idea that read-only roots
don't ever change.

The failures that can happen are many and unpredictable, from
occasionally failing with some error, to invalid memory accesses, use
after free problems, etc.
Essentially all caused by races when the nodes/leafs from the
read-only tree change while send is running.

I don't know what runtime check you are mentioning that is too strict.
You can definitely do dedupe on a read-only root and after it finishes
do a send (either full or incremental), and it will work.

>
> More details at:
>
>         https://github.com/Zygo/bees/issues/79#issuecomment-429039036
>
> > I understand it can break some applications, but adding other solution
> > such as preventing send and dedupe from running in parallel
> > (erroring out or block and wait for each other, etc) is going to be
> > really ugly. There's always the workaround for apps to set the
> > subvolume
> > to RW mode, do the dedupe, then switch it back to RO mode.
> >
> > Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-02-20 16:54         ` Filipe Manana
@ 2019-02-20 17:17           ` Zygo Blaxell
  2019-02-22 11:13             ` Filipe Manana
  2019-02-21 16:54           ` Zygo Blaxell
  1 sibling, 1 reply; 26+ messages in thread
From: Zygo Blaxell @ 2019-02-20 17:17 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 7997 bytes --]

On Wed, Feb 20, 2019 at 04:54:09PM +0000, Filipe Manana wrote:
> On Wed, Feb 20, 2019 at 4:42 PM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > On Thu, Jan 31, 2019 at 04:39:22PM +0000, Filipe Manana wrote:
> > > On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> > > >
> > > > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > > > From: Filipe Manana <fdmanana@suse.com>
> > > > >
> > > > > Checking if the destination root is read-only was being performed only for
> > > > > clone operations. Make deduplication check it as well, as it does not make
> > > > > sense to not do it, even if it is an operation that does not change the
> > > > > file contents (such as defrag for example, which checks first if the root
> > > > > is read-only).
> > > >
> > > > And this is also change in user-visible behaviour of dedupe, so this
> > > > needs to be verified if it's not breaking existing tools.
> > >
> > > Have you had the chance to do such verification?
> > >
> > > This actually conflicts with send. Send does not expect a root/tree to
> > > change, and with dedupe on read-only roots happening
> > > in parallel with send is going to cause all sorts of unexpected and
> > > undesired problems...
> >
> > This is a problem bees ran into.  There is a workaround in bees (called
> > --workaround-btrfs-send) that avoids RO subvols as dedupe targets.
> > As the name of the option implies, it works around problems in btrfs send.
> >
> > This kernel change makes the workaround mandatory now, as the default
> > case (without workaround) will fail on every RO subvol even if that
> > behavior is desired by the user.  That breaks an important use case on
> > the receiving side of sends--to dedupe the received subvols together
> > while also protecting them against modification on the target system
> > with the RO flag--and preserving that use case is why the send workaround
> > was optional (and not default) in bees.
> >
> > bees also won't handle the RO/RW/RO transition correctly, as it didn't
> > seem like a sane thing to support at the time.  That is arguably something
> > to be fixed in bees.
> >
> > > This is a problem introduced by dedupe ioctl when it landed, since
> > > send existed for a longer time (when nothing else was
> > > allowed to change read-only roots, including defrag).
> >
> > Is there a reason why incremental send can't simply be fixed?
> 
> This is a problem that affects both incremental and non-incremental (full) send.
> 
> > As far
> > as I can tell, send is failing because of a runtime check that seems to
> > be too strict; however, I haven't tried removing that check to see if
> > it fixes the problem in send, or just hides the next problem.
> 
> The problem is send was designed with the idea that read-only roots
> don't ever change.

...except when they're snapshotted, balanced, or deduped (to list places
where that assumption hasn't held so far).

> The failures that can happen are many and unpredictable, from
> occasionally failing with some error, to invalid memory accesses, use
> after free problems, etc.
> Essentially all caused by races when the nodes/leafs from the
> read-only tree change while send is running.

Maybe you can explain this further?  As far as I can tell, all of those
are send bugs that should just be (and over the years have been) fixed.

> I don't know what runtime check you are mentioning that is too strict.

It's this one, in send.c:

                        /*
                         * We may have found an extent item that has changed
                         * only its disk_bytenr field and the corresponding
                         * inode item was not updated. This case happens due to
                         * very specific timings during relocation when a leaf
                         * that contains file extent items is COWed while
                         * relocation is ongoing and its in the stage where it
                         * updates data pointers. So when this happens we can
                         * safely ignore it since we know it's the same extent,
                         * but just at different logical and physical locations
                         * (when an extent is fully replaced with a new one, we
                         * know the generation number must have changed too,
                         * since snapshot creation implies committing the current
                         * transaction, and the inode item must have been updated
                         * as well).
                         * This replacement of the disk_bytenr happens at
                         * relocation.c:replace_file_extents() through
                         * relocation.c:btrfs_reloc_cow_block().
                         */
                        if (btrfs_file_extent_generation(leaf_l, ei_l) ==
                            btrfs_file_extent_generation(leaf_r, ei_r) &&
                            btrfs_file_extent_ram_bytes(leaf_l, ei_l) ==
                            btrfs_file_extent_ram_bytes(leaf_r, ei_r) &&
                            btrfs_file_extent_compression(leaf_l, ei_l) ==
                            btrfs_file_extent_compression(leaf_r, ei_r) &&
                            btrfs_file_extent_encryption(leaf_l, ei_l) ==
                            btrfs_file_extent_encryption(leaf_r, ei_r) &&
                            btrfs_file_extent_other_encoding(leaf_l, ei_l) ==
                            btrfs_file_extent_other_encoding(leaf_r, ei_r) &&
                            btrfs_file_extent_type(leaf_l, ei_l) ==
                            btrfs_file_extent_type(leaf_r, ei_r) &&
                            btrfs_file_extent_disk_bytenr(leaf_l, ei_l) !=
                            btrfs_file_extent_disk_bytenr(leaf_r, ei_r) &&
                            btrfs_file_extent_disk_num_bytes(leaf_l, ei_l) ==
                            btrfs_file_extent_disk_num_bytes(leaf_r, ei_r) &&
                            btrfs_file_extent_offset(leaf_l, ei_l) ==
                            btrfs_file_extent_offset(leaf_r, ei_r) &&
                            btrfs_file_extent_num_bytes(leaf_l, ei_l) ==
                            btrfs_file_extent_num_bytes(leaf_r, ei_r))
                                return 0;
                }

                inconsistent_snapshot_error(sctx, result, "extent");
                return -EIO;

This is the point where bees users report send failures.

I don't really understand what that code is trying to achieve, though.
If we are diffing two subvol trees then we should just send anything
that's different--even if we occasionally send a few redundant extents
because we happen to be sending during a balance, because that's better
than bombing out completely.

Or is the problem more like we lose our ei pointer in one of the subvols
when the extent tree changes under it?  That would be harder to solve,
it would have to keep releasing and reacquiring everything.

> You can definitely do dedupe on a read-only root and after it finishes
> do a send (either full or incremental), and it will work.

If there's a way to make a subvol RW temporarily _without breaking
incremental send from that subvol_ (i.e. without clearing the parent
UUIDs, maybe also allowing only dedupe) then I have no objection.

> > More details at:
> >
> >         https://github.com/Zygo/bees/issues/79#issuecomment-429039036
> >
> > > I understand it can break some applications, but adding other solution
> > > such as preventing send and dedupe from running in parallel
> > > (erroring out or block and wait for each other, etc) is going to be
> > > really ugly. There's always the workaround for apps to set the
> > > subvolume
> > > to RW mode, do the dedupe, then switch it back to RO mode.
> > >
> > > Thanks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-02-20 16:54         ` Filipe Manana
  2019-02-20 17:17           ` Zygo Blaxell
@ 2019-02-21 16:54           ` Zygo Blaxell
  1 sibling, 0 replies; 26+ messages in thread
From: Zygo Blaxell @ 2019-02-21 16:54 UTC (permalink / raw)
  To: Filipe Manana; +Cc: dsterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 5402 bytes --]

On Wed, Feb 20, 2019 at 04:54:09PM +0000, Filipe Manana wrote:
> On Wed, Feb 20, 2019 at 4:42 PM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > On Thu, Jan 31, 2019 at 04:39:22PM +0000, Filipe Manana wrote:
> > > On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> > > >
> > > > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > > > From: Filipe Manana <fdmanana@suse.com>
> > > > >
> > > > > Checking if the destination root is read-only was being performed only for
> > > > > clone operations. Make deduplication check it as well, as it does not make
> > > > > sense to not do it, even if it is an operation that does not change the
> > > > > file contents (such as defrag for example, which checks first if the root
> > > > > is read-only).
> > > >
> > > > And this is also change in user-visible behaviour of dedupe, so this
> > > > needs to be verified if it's not breaking existing tools.
> > >
> > > Have you had the chance to do such verification?
> > >
> > > This actually conflicts with send. Send does not expect a root/tree to
> > > change, and with dedupe on read-only roots happening
> > > in parallel with send is going to cause all sorts of unexpected and
> > > undesired problems...
> >
> > This is a problem bees ran into.  There is a workaround in bees (called
> > --workaround-btrfs-send) that avoids RO subvols as dedupe targets.
> > As the name of the option implies, it works around problems in btrfs send.
> >
> > This kernel change makes the workaround mandatory now, as the default
> > case (without workaround) will fail on every RO subvol even if that
> > behavior is desired by the user.  That breaks an important use case on
> > the receiving side of sends--to dedupe the received subvols together
> > while also protecting them against modification on the target system
> > with the RO flag--and preserving that use case is why the send workaround
> > was optional (and not default) in bees.
> >
> > bees also won't handle the RO/RW/RO transition correctly, as it didn't
> > seem like a sane thing to support at the time.  That is arguably something
> > to be fixed in bees.
> >
> > > This is a problem introduced by dedupe ioctl when it landed, since
> > > send existed for a longer time (when nothing else was
> > > allowed to change read-only roots, including defrag).
> >
> > Is there a reason why incremental send can't simply be fixed?
> 
> This is a problem that affects both incremental and non-incremental (full) send.
> 
> > As far
> > as I can tell, send is failing because of a runtime check that seems to
> > be too strict; however, I haven't tried removing that check to see if
> > it fixes the problem in send, or just hides the next problem.
> 
> The problem is send was designed with the idea that read-only roots
> don't ever change.

Wait...don't ever change, or don't change while send is running?  If the
only thing we have to do is prevent concurrent send and dedupe, then that
might be much easier--and much less intrusive than breaking dedupe of
all RO subvols.

How does concurrent send and balance work now?  Maybe dedupe can use the
same approach.

> The failures that can happen are many and unpredictable, from
> occasionally failing with some error, to invalid memory accesses, use
> after free problems, etc.
> Essentially all caused by races when the nodes/leafs from the
> read-only tree change while send is running.
> 
> I don't know what runtime check you are mentioning that is too strict.
> You can definitely do dedupe on a read-only root and after it finishes
> do a send (either full or incremental), and it will work.

OK, after rereading https://github.com/Zygo/bees/issues/79 it's not
clear to me whether the problem reported was "send fails while bees is
running" or "send fails if bees is run between two incremental sends."
I had concluded it was the latter, which would definitely be a send bug,
but I can't find any statement to that effect from the bug reporter.

If it's the former, then all we need is some way to prevent dedupe while
sending on the same subvol.  Maybe send could set a flag (rwlock?) on the
subvol data structure, and dedupe can check the flag, with this behavior:

	- if dedupe is running while send starts, block the send until
	the dedupe is finished

	- if send is running when dedupe starts, reject the dedupe
	with...EBUSY?

Userspace dedupe could look for EBUSY and reschedule dedupe of the RO
subvol later.

Dedupe could block until the send is finished (similar to the way
that subvol deletes block until balance is finished) but I'd prefer to
have dedupe return immediately with an error so userspace can schedule
something different to dedupe (similar to the way that multiple concurrent
balances exclude each other).


> >
> > More details at:
> >
> >         https://github.com/Zygo/bees/issues/79#issuecomment-429039036
> >
> > > I understand it can break some applications, but adding other solution
> > > such as preventing send and dedupe from running in parallel
> > > (erroring out or block and wait for each other, etc) is going to be
> > > really ugly. There's always the workaround for apps to set the
> > > subvolume
> > > to RW mode, do the dedupe, then switch it back to RO mode.
> > >
> > > Thanks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-02-20 17:17           ` Zygo Blaxell
@ 2019-02-22 11:13             ` Filipe Manana
  2019-02-22 17:25               ` David Sterba
  0 siblings, 1 reply; 26+ messages in thread
From: Filipe Manana @ 2019-02-22 11:13 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: dsterba, linux-btrfs

On Wed, Feb 20, 2019 at 5:17 PM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> On Wed, Feb 20, 2019 at 04:54:09PM +0000, Filipe Manana wrote:
> > On Wed, Feb 20, 2019 at 4:42 PM Zygo Blaxell
> > <ce3g8jdj@umail.furryterror.org> wrote:
> > >
> > > On Thu, Jan 31, 2019 at 04:39:22PM +0000, Filipe Manana wrote:
> > > > On Thu, Dec 13, 2018 at 4:08 PM David Sterba <dsterba@suse.cz> wrote:
> > > > >
> > > > > On Wed, Dec 12, 2018 at 06:05:58PM +0000, fdmanana@kernel.org wrote:
> > > > > > From: Filipe Manana <fdmanana@suse.com>
> > > > > >
> > > > > > Checking if the destination root is read-only was being performed only for
> > > > > > clone operations. Make deduplication check it as well, as it does not make
> > > > > > sense to not do it, even if it is an operation that does not change the
> > > > > > file contents (such as defrag for example, which checks first if the root
> > > > > > is read-only).
> > > > >
> > > > > And this is also change in user-visible behaviour of dedupe, so this
> > > > > needs to be verified if it's not breaking existing tools.
> > > >
> > > > Have you had the chance to do such verification?
> > > >
> > > > This actually conflicts with send. Send does not expect a root/tree to
> > > > change, and with dedupe on read-only roots happening
> > > > in parallel with send is going to cause all sorts of unexpected and
> > > > undesired problems...
> > >
> > > This is a problem bees ran into.  There is a workaround in bees (called
> > > --workaround-btrfs-send) that avoids RO subvols as dedupe targets.
> > > As the name of the option implies, it works around problems in btrfs send.
> > >
> > > This kernel change makes the workaround mandatory now, as the default
> > > case (without workaround) will fail on every RO subvol even if that
> > > behavior is desired by the user.  That breaks an important use case on
> > > the receiving side of sends--to dedupe the received subvols together
> > > while also protecting them against modification on the target system
> > > with the RO flag--and preserving that use case is why the send workaround
> > > was optional (and not default) in bees.
> > >
> > > bees also won't handle the RO/RW/RO transition correctly, as it didn't
> > > seem like a sane thing to support at the time.  That is arguably something
> > > to be fixed in bees.
> > >
> > > > This is a problem introduced by dedupe ioctl when it landed, since
> > > > send existed for a longer time (when nothing else was
> > > > allowed to change read-only roots, including defrag).
> > >
> > > Is there a reason why incremental send can't simply be fixed?
> >
> > This is a problem that affects both incremental and non-incremental (full) send.
> >
> > > As far
> > > as I can tell, send is failing because of a runtime check that seems to
> > > be too strict; however, I haven't tried removing that check to see if
> > > it fixes the problem in send, or just hides the next problem.
> >
> > The problem is send was designed with the idea that read-only roots
> > don't ever change.
>
> ...except when they're snapshotted, balanced, or deduped (to list places
> where that assumption hasn't held so far).

Yes, we had bugs related to that (I fixed some of them). However, the
snapshotting ones [1, 2] happened to be easy to solve because only the
root node is COWed.
The one related to balance [3], which I only remember one and matches
the piece of code you pointed to (which I fixed long ago), actually
only solves
one problem, which was hitting a BUG_ON() (if the extent item was
changed by relocation we know no data changed and we can ignore the
change).
The other problem it doesn't solve is essentially the same problem
that a concurrent dedupe can cause, which I only realized some weeks
ago after noticing
dedup was allowed on RO roots (subvolumes/snapshots).

I'll explain below what problem it is.

>
> > The failures that can happen are many and unpredictable, from
> > occasionally failing with some error, to invalid memory accesses, use
> > after free problems, etc.
> > Essentially all caused by races when the nodes/leafs from the
> > read-only tree change while send is running.
>
> Maybe you can explain this further?  As far as I can tell, all of those
> are send bugs that should just be (and over the years have been) fixed.

So send was always lockless and transactionless, because it's a
potentially long running operation that could delay everything
else if it were not (specially if it held transactions open) and
because it needs to operate on a tree that never changes, so that it
always
sees a consistent state (often it needs to do multiple searches on a
tree for many different reasons). Well this second reason is actually
what
matters and the first reason is more a consequence of the second
reason. But well, the world is not perfect
and it seems that during the design/implementation stage of send,
balance and snapshotting of a snapshot being used by send were
forgotten.
Then after send came, dedupe came and because it allows to dedupe
against files in a RO tree, brought another way of being able to
modify a RO tree that can be in use by a concurrent send - this was
missed during the review process obviously, otherwise it would have
never allowed dedupe on RO trees or send would have been changed as
well to somehow work with a concurrent dedupe.

The big problem, for which I don't have a solution, is that dedupe and
balance COW entire paths, and not just the root node of a tree like
the snapshotting case.
For example while send is doing work with a leaf from a RO snapshot
tree, if that leaf is COWed, its extent is marked as free once
the transaction that COWed it is committed. That means that
immediately after that, the extent can be reused, allocated to some
other tree
leaf or node - while send is still using the respective in-memory
representation of the leaf, the extent buffer. So some other task will
be modifying the
extent buffer while send is using it as well - this is what brings
unexpected results that can result in crashes or, with luck, some
random failures that make
send return some error only. For a leaf we could just clone the extent
buffer while preventing transaction commits, and then allow them to
happen again after
cloning.

Now COWing a leaf also results in COWing its parent node, parent of
the parent, etc, all the way up to the root node (pointers need to be
changed in every parent). So if a node is COWed while send is using
it, and before it finishes using it the same extent gets reused for
another tree/node, send would
follow an incorrect path since now the node has different pointers or
it's in an undefined/inconsistent state due to the concurrent
modification. Creating a copy/clone of
the node while transaction commits are disabled is not an option,
since when using the copy, nodes below it might have been COWed and no
longer exist, so we are
accessing stale pointers.

Getting the extent of a node/leaf immediately reused after COWing it
and after its transaction is committed is not very common and happens
mostly when running close
to ENOSPC, plus the time spent processing a node/leaf by send is very
short. So this is effectively a hard to hit problem.

The bug fixes for the snapshotting of a RO snapshot work and are
simple because snapshotting only COWs the root of a tree, so we stop
transactions commits
temporarily, make a copy of the root for send to use and allow
transaction commits to happen again, while send is happy doing what it
has to using the cloned root
nodes - works because snapshotting doesn't COW any other nodes/leafs
of the tree, i.e., the pointers in the root node don't change.

>
> > I don't know what runtime check you are mentioning that is too strict.
>
> It's this one, in send.c:
>
>                         /*
>                          * We may have found an extent item that has changed
>                          * only its disk_bytenr field and the corresponding
>                          * inode item was not updated. This case happens due to
>                          * very specific timings during relocation when a leaf
>                          * that contains file extent items is COWed while
>                          * relocation is ongoing and its in the stage where it
>                          * updates data pointers. So when this happens we can
>                          * safely ignore it since we know it's the same extent,
>                          * but just at different logical and physical locations
>                          * (when an extent is fully replaced with a new one, we
>                          * know the generation number must have changed too,
>                          * since snapshot creation implies committing the current
>                          * transaction, and the inode item must have been updated
>                          * as well).
>                          * This replacement of the disk_bytenr happens at
>                          * relocation.c:replace_file_extents() through
>                          * relocation.c:btrfs_reloc_cow_block().
>                          */
>                         if (btrfs_file_extent_generation(leaf_l, ei_l) ==
>                             btrfs_file_extent_generation(leaf_r, ei_r) &&
>                             btrfs_file_extent_ram_bytes(leaf_l, ei_l) ==
>                             btrfs_file_extent_ram_bytes(leaf_r, ei_r) &&
>                             btrfs_file_extent_compression(leaf_l, ei_l) ==
>                             btrfs_file_extent_compression(leaf_r, ei_r) &&
>                             btrfs_file_extent_encryption(leaf_l, ei_l) ==
>                             btrfs_file_extent_encryption(leaf_r, ei_r) &&
>                             btrfs_file_extent_other_encoding(leaf_l, ei_l) ==
>                             btrfs_file_extent_other_encoding(leaf_r, ei_r) &&
>                             btrfs_file_extent_type(leaf_l, ei_l) ==
>                             btrfs_file_extent_type(leaf_r, ei_r) &&
>                             btrfs_file_extent_disk_bytenr(leaf_l, ei_l) !=
>                             btrfs_file_extent_disk_bytenr(leaf_r, ei_r) &&
>                             btrfs_file_extent_disk_num_bytes(leaf_l, ei_l) ==
>                             btrfs_file_extent_disk_num_bytes(leaf_r, ei_r) &&
>                             btrfs_file_extent_offset(leaf_l, ei_l) ==
>                             btrfs_file_extent_offset(leaf_r, ei_r) &&
>                             btrfs_file_extent_num_bytes(leaf_l, ei_l) ==
>                             btrfs_file_extent_num_bytes(leaf_r, ei_r))
>                                 return 0;
>                 }
>
>                 inconsistent_snapshot_error(sctx, result, "extent");
>                 return -EIO;
>
> This is the point where bees users report send failures.

That seems yo happen when dedupe is running in parallel with send.
That can be fixed by ignoring file extent items that changed without
the inode item having
changed as well (assuming this can only be caused by balance and
dedupe, and there's no other path for this).
When I added tjat check to fix a BUG_ON hit due to balance changing
file extent items (without updating inode items), I wasn't aware that
an ongoing dedupe could
cause the same problem (back then, when I added that check [3], I
thought that dedupe was like clone and would refuse to work on RO
subvolumes/snapshots).
However I don't see immediately how it's possible, since dedupe
updates file extent items (or inserts new ones) in the same
transaction where it updates the inode
item, so it's really weird how send would find the new/modified file
extent items from the commit root without seeing an update inode item
as well.

Btw, why weren't those reports reported to the btrfs mailing list?

However the bigger problem, mentioned above, I really don't have a
solution for it now, and it's far from being a trivial problem
(perhaps simply making send use read locks
for searches on the commit root would work). It can be caused by
either a concurrent dedupe or a concurrent balance (and hopefully
there's nothing else that can modify RO trees),
so a generic solution needs to be found. Preventing dedupe and send
from running in parallel, as you suggest in another reply, is
something I suggested too earlier in this thread.
Preventing balance and send running in parallel, well... it starts to get messy.

So not doing nothing for now, that is, not applying this patch to
disable dedupe on RO roots and wait for a better solution (best case,
for 5.2 merge window), is reasonable
and not something I'm against.
David, would you consider at least excluding it from 5.1 to allow for
a different solution to pop up for another merge window?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be6821f82c3cc36e026f5afd10249988852b35ea
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f2f0b394b54e2b159ef969a0b5274e9bbf82ff2
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5e84fd8d0634d056248b67463b42f6c85896a19

>
> I don't really understand what that code is trying to achieve, though.
> If we are diffing two subvol trees then we should just send anything
> that's different--even if we occasionally send a few redundant extents
> because we happen to be sending during a balance, because that's better
> than bombing out completely.
>
> Or is the problem more like we lose our ei pointer in one of the subvols
> when the extent tree changes under it?  That would be harder to solve,
> it would have to keep releasing and reacquiring everything.
>
> > You can definitely do dedupe on a read-only root and after it finishes
> > do a send (either full or incremental), and it will work.
>
> If there's a way to make a subvol RW temporarily _without breaking
> incremental send from that subvol_ (i.e. without clearing the parent
> UUIDs, maybe also allowing only dedupe) then I have no objection.
>
> > > More details at:
> > >
> > >         https://github.com/Zygo/bees/issues/79#issuecomment-429039036
> > >
> > > > I understand it can break some applications, but adding other solution
> > > > such as preventing send and dedupe from running in parallel
> > > > (erroring out or block and wait for each other, etc) is going to be
> > > > really ugly. There's always the workaround for apps to set the
> > > > subvolume
> > > > to RW mode, do the dedupe, then switch it back to RO mode.
> > > >
> > > > Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication
  2019-02-22 11:13             ` Filipe Manana
@ 2019-02-22 17:25               ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2019-02-22 17:25 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Zygo Blaxell, linux-btrfs

On Fri, Feb 22, 2019 at 11:13:48AM +0000, Filipe Manana wrote:
> So not doing nothing for now, that is, not applying this patch to
> disable dedupe on RO roots and wait for a better solution (best case,
> for 5.2 merge window), is reasonable
> and not something I'm against.
> David, would you consider at least excluding it from 5.1 to allow for
> a different solution to pop up for another merge window?

Ok, I'll remove it from the 5.1 queue for now. Thanks for the detailed
analysis. This fix favors send (for the correctness reasons), but the
deduplication usecase is also important. I hope we'll be able to come up
with a solution that does not hurt usability too much (or affects send
and dedupe equally with a sane fallback behaviour). The exclusion of
send and other operation (snapshot) already exists so the suggested
-EBUSY approach.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-02-22 17:24 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12 18:05 [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication fdmanana
2018-12-12 18:05 ` [PATCH 1/4] Btrfs: move duplicated nodatasum check into common reflink/dedupe helper fdmanana
2019-01-11 14:55   ` David Sterba
2018-12-12 18:05 ` [PATCH 2/4] Btrfs: use cross mount point check for cloning and deduplication fdmanana
2018-12-13 16:02   ` David Sterba
2019-01-11 14:38     ` David Sterba
2018-12-12 18:05 ` [PATCH 3/4] Btrfs: check if destination root is read-only for deduplication fdmanana
2018-12-13 16:07   ` David Sterba
2019-01-31 16:39     ` Filipe Manana
2019-01-31 16:44       ` Hugo Mills
2019-02-18 15:38         ` David Sterba
2019-02-18 16:55           ` Filipe Manana
2019-02-12 17:59       ` Filipe Manana
2019-02-20 16:41       ` Zygo Blaxell
2019-02-20 16:54         ` Filipe Manana
2019-02-20 17:17           ` Zygo Blaxell
2019-02-22 11:13             ` Filipe Manana
2019-02-22 17:25               ` David Sterba
2019-02-21 16:54           ` Zygo Blaxell
2019-02-18 16:01   ` David Sterba
2018-12-12 18:05 ` [PATCH 4/4] Btrfs: remove no longer needed range length checks " fdmanana
2018-12-13 12:20   ` Nikolay Borisov
2019-01-31 16:31   ` Filipe Manana
2019-02-12 17:58     ` Filipe Manana
2019-02-18 15:10     ` David Sterba
2018-12-13 12:19 ` [PATCH 0/4] Btrfs: a few more cleanups and fixes for clone/deduplication Nikolay Borisov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).