All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] btrfs: dedupe fixes, features V4
@ 2015-06-26 21:00 Mark Fasheh
  2015-06-26 21:00 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-26 21:00 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs

Hi Chris,

   The following patches are based on top of my patch titled "btrfs:
Handle unaligned length in extent_same" which you have in your
'integration-4.2' branch:

https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=e1d227a42ea2b4664f94212bd1106b9a3413ffb8

The series contains three fixes and two features.

The first patch in the series fixes a bug where we were sometimes
passing the aligned length to our comparison function. We actually can
stop at the user passed length for this as we don't need to compare
data past i_size (and we only align if the extents go out to i_size).

The 2nd patch fixes a deadlock between btrfs readpage and
btrfs_extent_same. This was reported on the list some months ago -
basically we had the page and extent locking reversed. My patch fixes
up the locking to be in the right order.

The 3rd patch fixes a deadlocks in clone() (wrt extent-same) which
David found while reviewing my fixes. I also found that clone doesn't
lock extent ranges in any particular order which could obvioulsy be a
problem so that is fixed there too.

The last two patches add features which have been requested often by
users - the 4th adds the ability to dedupe within the same inode, and
the last patch fixes up the clone code to skip updating mtime when
called from extent-same (this helps with backup software which is
being confused into re-backing up deduped files).

Changes from V3 to V4:
  - change mtime patch from a flag to default behavior

Changes from V2 to V3:
  - better flag naming in patch #5, thanks to David

Changes from V1 to V2:
  - added patches 3-5

Thanks,
   --Mark

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data()
  2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
@ 2015-06-26 21:00 ` Mark Fasheh
  2015-06-26 21:00 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-26 21:00 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

In the case that we dedupe the tail of a file, we might expand the dedupe
len out to the end of our last block. We don't want to compare data past
i_size however, so pass the original length to btrfs_cmp_data().

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2d24ff4..2deea1f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2933,7 +2933,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 		goto out_unlock;
 	}
 
-	ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
+	/* pass original length for comparison so we stay within i_size */
+	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
 	if (ret == 0)
 		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
 
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage
  2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
  2015-06-26 21:00 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
@ 2015-06-26 21:00 ` Mark Fasheh
  2015-06-26 21:00 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-26 21:00 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

->readpage() does page_lock() before extent_lock(), we do the opposite in
extent-same. We want to reverse the order in btrfs_extent_same() but it's
not quite straightforward since the page locks are taken inside btrfs_cmp_data().

So I split btrfs_cmp_data() into 3 parts with a small context structure that
is passed between them. The first, btrfs_cmp_data_prepare() gathers up the
pages needed (taking page lock as required) and puts them on our context
structure. At this point, we are safe to lock the extent range. Afterwards,
we use btrfs_cmp_data() to do the data compare as usual and btrfs_cmp_data_free()
to clean up our context.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 148 +++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 117 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2deea1f..b899584 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2755,14 +2755,11 @@ out:
 	return ret;
 }
 
-static struct page *extent_same_get_page(struct inode *inode, u64 off)
+static struct page *extent_same_get_page(struct inode *inode, pgoff_t index)
 {
 	struct page *page;
-	pgoff_t index;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 
-	index = off >> PAGE_CACHE_SHIFT;
-
 	page = grab_cache_page(inode->i_mapping, index);
 	if (!page)
 		return NULL;
@@ -2783,6 +2780,20 @@ static struct page *extent_same_get_page(struct inode *inode, u64 off)
 	return page;
 }
 
+static int gather_extent_pages(struct inode *inode, struct page **pages,
+			       int num_pages, u64 off)
+{
+	int i;
+	pgoff_t index = off >> PAGE_CACHE_SHIFT;
+
+	for (i = 0; i < num_pages; i++) {
+		pages[i] = extent_same_get_page(inode, index + i);
+		if (!pages[i])
+			return -ENOMEM;
+	}
+	return 0;
+}
+
 static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
 {
 	/* do any pending delalloc/csum calc on src, one way or
@@ -2808,52 +2819,120 @@ static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
 	}
 }
 
-static void btrfs_double_unlock(struct inode *inode1, u64 loff1,
-				struct inode *inode2, u64 loff2, u64 len)
+static void btrfs_double_inode_unlock(struct inode *inode1, struct inode *inode2)
 {
-	unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
-	unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
-
 	mutex_unlock(&inode1->i_mutex);
 	mutex_unlock(&inode2->i_mutex);
 }
 
-static void btrfs_double_lock(struct inode *inode1, u64 loff1,
-			      struct inode *inode2, u64 loff2, u64 len)
+static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
+{
+	if (inode1 < inode2)
+		swap(inode1, inode2);
+
+	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+	if (inode1 != inode2)
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+}
+
+static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
+				      struct inode *inode2, u64 loff2, u64 len)
+{
+	unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
+	unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
+}
+
+static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
+				     struct inode *inode2, u64 loff2, u64 len)
 {
 	if (inode1 < inode2) {
 		swap(inode1, inode2);
 		swap(loff1, loff2);
 	}
-
-	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
 	lock_extent_range(inode1, loff1, len);
-	if (inode1 != inode2) {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	if (inode1 != inode2)
 		lock_extent_range(inode2, loff2, len);
+}
+
+struct cmp_pages {
+	int		num_pages;
+	struct page	**src_pages;
+	struct page	**dst_pages;
+};
+
+static void btrfs_cmp_data_free(struct cmp_pages *cmp)
+{
+	int i;
+	struct page *pg;
+
+	for (i = 0; i < cmp->num_pages; i++) {
+		pg = cmp->src_pages[i];
+		if (pg)
+			page_cache_release(pg);
+		pg = cmp->dst_pages[i];
+		if (pg)
+			page_cache_release(pg);
+	}
+	kfree(cmp->src_pages);
+	kfree(cmp->dst_pages);
+}
+
+static int btrfs_cmp_data_prepare(struct inode *src, u64 loff,
+				  struct inode *dst, u64 dst_loff,
+				  u64 len, struct cmp_pages *cmp)
+{
+	int ret;
+	int num_pages = PAGE_CACHE_ALIGN(len) >> PAGE_CACHE_SHIFT;
+	struct page **src_pgarr, **dst_pgarr;
+
+	/*
+	 * We must gather up all the pages before we initiate our
+	 * extent locking. We use an array for the page pointers. Size
+	 * of the array is bounded by len, which is in turn bounded by
+	 * BTRFS_MAX_DEDUPE_LEN.
+	 */
+	src_pgarr = kzalloc(num_pages * sizeof(struct page *), GFP_NOFS);
+	dst_pgarr = kzalloc(num_pages * sizeof(struct page *), GFP_NOFS);
+	if (!src_pgarr || !dst_pgarr) {
+		kfree(src_pgarr);
+		kfree(dst_pgarr);
+		return -ENOMEM;
 	}
+	cmp->num_pages = num_pages;
+	cmp->src_pages = src_pgarr;
+	cmp->dst_pages = dst_pgarr;
+
+	ret = gather_extent_pages(src, cmp->src_pages, cmp->num_pages, loff);
+	if (ret)
+		goto out;
+
+	ret = gather_extent_pages(dst, cmp->dst_pages, cmp->num_pages, dst_loff);
+
+out:
+	if (ret)
+		btrfs_cmp_data_free(cmp);
+	return 0;
 }
 
 static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
-			  u64 dst_loff, u64 len)
+			  u64 dst_loff, u64 len, struct cmp_pages *cmp)
 {
 	int ret = 0;
+	int i;
 	struct page *src_page, *dst_page;
 	unsigned int cmp_len = PAGE_CACHE_SIZE;
 	void *addr, *dst_addr;
 
+	i = 0;
 	while (len) {
 		if (len < PAGE_CACHE_SIZE)
 			cmp_len = len;
 
-		src_page = extent_same_get_page(src, loff);
-		if (!src_page)
-			return -EINVAL;
-		dst_page = extent_same_get_page(dst, dst_loff);
-		if (!dst_page) {
-			page_cache_release(src_page);
-			return -EINVAL;
-		}
+		BUG_ON(i >= cmp->num_pages);
+
+		src_page = cmp->src_pages[i];
+		dst_page = cmp->dst_pages[i];
+
 		addr = kmap_atomic(src_page);
 		dst_addr = kmap_atomic(dst_page);
 
@@ -2865,15 +2944,12 @@ static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
 
 		kunmap_atomic(addr);
 		kunmap_atomic(dst_addr);
-		page_cache_release(src_page);
-		page_cache_release(dst_page);
 
 		if (ret)
 			break;
 
-		loff += cmp_len;
-		dst_loff += cmp_len;
 		len -= cmp_len;
+		i++;
 	}
 
 	return ret;
@@ -2904,6 +2980,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 {
 	int ret;
 	u64 len = olen;
+	struct cmp_pages cmp;
 
 	/*
 	 * btrfs_clone() can't handle extents in the same file
@@ -2916,7 +2993,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	if (len == 0)
 		return 0;
 
-	btrfs_double_lock(src, loff, dst, dst_loff, len);
+	btrfs_double_inode_lock(src, dst);
 
 	ret = extent_same_check_offsets(src, loff, &len, olen);
 	if (ret)
@@ -2933,13 +3010,22 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 		goto out_unlock;
 	}
 
+	ret = btrfs_cmp_data_prepare(src, loff, dst, dst_loff, olen, &cmp);
+	if (ret)
+		goto out_unlock;
+
+	btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
+
 	/* pass original length for comparison so we stay within i_size */
-	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
+	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
 	if (ret == 0)
 		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
 
+	btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
+
+	btrfs_cmp_data_free(&cmp);
 out_unlock:
-	btrfs_double_unlock(src, loff, dst, dst_loff, len);
+	btrfs_double_inode_unlock(src, dst);
 
 	return ret;
 }
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/5] btrfs: fix clone / extent-same deadlocks
  2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
  2015-06-26 21:00 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
  2015-06-26 21:00 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
@ 2015-06-26 21:00 ` Mark Fasheh
  2015-06-26 21:01 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
  2015-06-26 21:01 ` [PATCH 5/5] btrfs: don't update mtime on deduped inodes Mark Fasheh
  4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-26 21:00 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

Clone and extent same lock their source and target inodes in opposite order.
In addition to this, the range locking in clone doesn't take ordering into
account. Fix this by having clone use the same locking helpers as
btrfs-extent-same.

In addition, I do a small cleanup of the locking helpers, removing a case
(both inodes being the same) which was poorly accounted for and never
actually used by the callers.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b899584..8d6887d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2831,8 +2831,7 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
 		swap(inode1, inode2);
 
 	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-	if (inode1 != inode2)
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
 }
 
 static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
@@ -2850,8 +2849,7 @@ static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
 		swap(loff1, loff2);
 	}
 	lock_extent_range(inode1, loff1, len);
-	if (inode1 != inode2)
-		lock_extent_range(inode2, loff2, len);
+	lock_extent_range(inode2, loff2, len);
 }
 
 struct cmp_pages {
@@ -3713,13 +3711,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		goto out_fput;
 
 	if (!same_inode) {
-		if (inode < src) {
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD);
-		} else {
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		}
+		btrfs_double_inode_lock(src, inode);
 	} else {
 		mutex_lock(&src->i_mutex);
 	}
@@ -3769,8 +3761,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		lock_extent_range(src, lock_start, lock_len);
 	} else {
-		lock_extent_range(src, off, len);
-		lock_extent_range(inode, destoff, len);
+		btrfs_double_extent_lock(src, off, inode, destoff, len);
 	}
 
 	ret = btrfs_clone(src, inode, off, olen, len, destoff);
@@ -3781,9 +3772,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		unlock_extent(&BTRFS_I(src)->io_tree, lock_start, lock_end);
 	} else {
-		unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
-		unlock_extent(&BTRFS_I(inode)->io_tree, destoff,
-			      destoff + len - 1);
+		btrfs_double_extent_unlock(src, off, inode, destoff, len);
 	}
 	/*
 	 * Truncate page cache pages so that future reads will see the cloned
@@ -3792,17 +3781,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	truncate_inode_pages_range(&inode->i_data, destoff,
 				   PAGE_CACHE_ALIGN(destoff + len) - 1);
 out_unlock:
-	if (!same_inode) {
-		if (inode < src) {
-			mutex_unlock(&src->i_mutex);
-			mutex_unlock(&inode->i_mutex);
-		} else {
-			mutex_unlock(&inode->i_mutex);
-			mutex_unlock(&src->i_mutex);
-		}
-	} else {
+	if (!same_inode)
+		btrfs_double_inode_unlock(src, inode);
+	else
 		mutex_unlock(&src->i_mutex);
-	}
 out_fput:
 	fdput(src_file);
 out_drop_write:
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/5] btrfs: allow dedupe of same inode
  2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
                   ` (2 preceding siblings ...)
  2015-06-26 21:00 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
@ 2015-06-26 21:01 ` Mark Fasheh
  2015-06-26 21:01 ` [PATCH 5/5] btrfs: don't update mtime on deduped inodes Mark Fasheh
  4 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-26 21:01 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

clone() supports cloning within an inode so extent-same can do
the same now. This patch fixes up the locking in extent-same to
know about the single-inode case. In addition to that, we add a
check for overlapping ranges, which clone does not allow.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 76 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 60 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8d6887d..83f4679 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2979,27 +2979,61 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	int ret;
 	u64 len = olen;
 	struct cmp_pages cmp;
+	int same_inode = 0;
+	u64 same_lock_start = 0;
+	u64 same_lock_len = 0;
 
-	/*
-	 * btrfs_clone() can't handle extents in the same file
-	 * yet. Once that works, we can drop this check and replace it
-	 * with a check for the same inode, but overlapping extents.
-	 */
 	if (src == dst)
-		return -EINVAL;
+		same_inode = 1;
 
 	if (len == 0)
 		return 0;
 
-	btrfs_double_inode_lock(src, dst);
+	if (same_inode) {
+		mutex_lock(&src->i_mutex);
 
-	ret = extent_same_check_offsets(src, loff, &len, olen);
-	if (ret)
-		goto out_unlock;
+		ret = extent_same_check_offsets(src, loff, &len, olen);
+		if (ret)
+			goto out_unlock;
 
-	ret = extent_same_check_offsets(dst, dst_loff, &len, olen);
-	if (ret)
-		goto out_unlock;
+		/*
+		 * Single inode case wants the same checks, except we
+		 * don't want our length pushed out past i_size as
+		 * comparing that data range makes no sense.
+		 *
+		 * extent_same_check_offsets() will do this for an
+		 * unaligned length at i_size, so catch it here and
+		 * reject the request.
+		 *
+		 * This effectively means we require aligned extents
+		 * for the single-inode case, whereas the other cases
+		 * allow an unaligned length so long as it ends at
+		 * i_size.
+		 */
+		if (len != olen) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		/* Check for overlapping ranges */
+		if (dst_loff + len > loff && dst_loff < loff + len) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		same_lock_start = min_t(u64, loff, dst_loff);
+		same_lock_len = max_t(u64, loff, dst_loff) + len - same_lock_start;
+	} else {
+		btrfs_double_inode_lock(src, dst);
+
+		ret = extent_same_check_offsets(src, loff, &len, olen);
+		if (ret)
+			goto out_unlock;
+
+		ret = extent_same_check_offsets(dst, dst_loff, &len, olen);
+		if (ret)
+			goto out_unlock;
+	}
 
 	/* don't make the dst file partly checksummed */
 	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
@@ -3012,18 +3046,28 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	if (ret)
 		goto out_unlock;
 
-	btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
+	if (same_inode)
+		lock_extent_range(src, same_lock_start, same_lock_len);
+	else
+		btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
 
 	/* pass original length for comparison so we stay within i_size */
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
 	if (ret == 0)
 		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
 
-	btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
+	if (same_inode)
+		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
+			      same_lock_start + same_lock_len - 1);
+	else
+		btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
 
 	btrfs_cmp_data_free(&cmp);
 out_unlock:
-	btrfs_double_inode_unlock(src, dst);
+	if (same_inode)
+		mutex_unlock(&src->i_mutex);
+	else
+		btrfs_double_inode_unlock(src, dst);
 
 	return ret;
 }
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/5] btrfs: don't update mtime on deduped inodes
  2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
                   ` (3 preceding siblings ...)
  2015-06-26 21:01 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
@ 2015-06-26 21:01 ` Mark Fasheh
  2015-06-27 21:44   ` Zygo Blaxell
  4 siblings, 1 reply; 10+ messages in thread
From: Mark Fasheh @ 2015-06-26 21:01 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

One issue users have reported is that dedupe changes mtime on files,
resulting in tools like rsync thinking that their contents have changed when
in fact the data is exactly the same. Clone still wants an mtime change, so
we special case this in the code.

This was tested with the btrfs-extent-same tool.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/ioctl.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 83f4679..0af0f13 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
 
 
 static int btrfs_clone(struct inode *src, struct inode *inode,
-		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
+		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
+		       int no_mtime);
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	/* pass original length for comparison so we stay within i_size */
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
 	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
+		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
 
 	if (same_inode)
 		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
@@ -3219,13 +3220,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 				     struct inode *inode,
 				     u64 endoff,
 				     const u64 destoff,
-				     const u64 olen)
+				     const u64 olen,
+				     int no_mtime)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	int ret;
 
 	inode_inc_iversion(inode);
-	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	if (no_mtime)
+		inode->i_ctime = CURRENT_TIME;
+	else
+		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	/*
 	 * We round up to the block size at eof when determining which
 	 * extents to clone above, but shouldn't round up the file size.
@@ -3310,13 +3315,13 @@ static void clone_update_extent_map(struct inode *inode,
  * @inode: Inode to clone to
  * @off: Offset within source to start clone from
  * @olen: Original length, passed by user, of range to clone
- * @olen_aligned: Block-aligned value of olen, extent_same uses
- *               identical values here
+ * @olen_aligned: Block-aligned value of olen
  * @destoff: Offset within @inode to start clone
+ * @no_mtime: Whether to update mtime on the target inode
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,
 		       const u64 off, const u64 olen, const u64 olen_aligned,
-		       const u64 destoff)
+		       const u64 destoff, int no_mtime)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_path *path = NULL;
@@ -3640,7 +3645,7 @@ process_slot:
 					      root->sectorsize);
 			ret = clone_finish_inode_update(trans, inode,
 							last_dest_end,
-							destoff, olen);
+							destoff, olen, no_mtime);
 			if (ret)
 				goto out;
 			if (new_key.offset + datal >= destoff + len)
@@ -3678,7 +3683,7 @@ process_slot:
 		clone_update_extent_map(inode, trans, NULL, last_dest_end,
 					destoff + len - last_dest_end);
 		ret = clone_finish_inode_update(trans, inode, destoff + len,
-						destoff, olen);
+						destoff, olen, no_mtime);
 	}
 
 out:
@@ -3808,7 +3813,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		btrfs_double_extent_lock(src, off, inode, destoff, len);
 	}
 
-	ret = btrfs_clone(src, inode, off, olen, len, destoff);
+	ret = btrfs_clone(src, inode, off, olen, len, destoff, 0);
 
 	if (same_inode) {
 		u64 lock_start = min_t(u64, off, destoff);
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes
  2015-06-26 21:01 ` [PATCH 5/5] btrfs: don't update mtime on deduped inodes Mark Fasheh
@ 2015-06-27 21:44   ` Zygo Blaxell
  2015-06-29 17:52     ` Mark Fasheh
  0 siblings, 1 reply; 10+ messages in thread
From: Zygo Blaxell @ 2015-06-27 21:44 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4596 bytes --]

On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote:
> One issue users have reported is that dedupe changes mtime on files,
> resulting in tools like rsync thinking that their contents have changed when
> in fact the data is exactly the same. Clone still wants an mtime change, so
> we special case this in the code.
> 
> This was tested with the btrfs-extent-same tool.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> ---
>  fs/btrfs/ioctl.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 83f4679..0af0f13 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
>  
>  
>  static int btrfs_clone(struct inode *src, struct inode *inode,
> -		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> +		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> +		       int no_mtime);
>  
>  /* Mask out flags that are inappropriate for the given type of inode. */
>  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> @@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
>  	/* pass original length for comparison so we stay within i_size */
>  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
>  	if (ret == 0)
> -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> +		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
>  
>  	if (same_inode)
>  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> @@ -3219,13 +3220,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
>  				     struct inode *inode,
>  				     u64 endoff,
>  				     const u64 destoff,
> -				     const u64 olen)
> +				     const u64 olen,
> +				     int no_mtime)
>  {
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	int ret;
>  
>  	inode_inc_iversion(inode);
> -	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> +	if (no_mtime)
> +		inode->i_ctime = CURRENT_TIME;

I don't see a good reason to modify the ctime either.  Again, nothing
is changing here.  All we are doing is shuffling physical storage around.

Defrag and balance (which also move physical extents around) don't
touch ctime, mtime, or even atime.

> +	else
> +		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
>  	/*
>  	 * We round up to the block size at eof when determining which
>  	 * extents to clone above, but shouldn't round up the file size.
> @@ -3310,13 +3315,13 @@ static void clone_update_extent_map(struct inode *inode,
>   * @inode: Inode to clone to
>   * @off: Offset within source to start clone from
>   * @olen: Original length, passed by user, of range to clone
> - * @olen_aligned: Block-aligned value of olen, extent_same uses
> - *               identical values here
> + * @olen_aligned: Block-aligned value of olen
>   * @destoff: Offset within @inode to start clone
> + * @no_mtime: Whether to update mtime on the target inode
>   */
>  static int btrfs_clone(struct inode *src, struct inode *inode,
>  		       const u64 off, const u64 olen, const u64 olen_aligned,
> -		       const u64 destoff)
> +		       const u64 destoff, int no_mtime)
>  {
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	struct btrfs_path *path = NULL;
> @@ -3640,7 +3645,7 @@ process_slot:
>  					      root->sectorsize);
>  			ret = clone_finish_inode_update(trans, inode,
>  							last_dest_end,
> -							destoff, olen);
> +							destoff, olen, no_mtime);
>  			if (ret)
>  				goto out;
>  			if (new_key.offset + datal >= destoff + len)
> @@ -3678,7 +3683,7 @@ process_slot:
>  		clone_update_extent_map(inode, trans, NULL, last_dest_end,
>  					destoff + len - last_dest_end);
>  		ret = clone_finish_inode_update(trans, inode, destoff + len,
> -						destoff, olen);
> +						destoff, olen, no_mtime);
>  	}
>  
>  out:
> @@ -3808,7 +3813,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
>  		btrfs_double_extent_lock(src, off, inode, destoff, len);
>  	}
>  
> -	ret = btrfs_clone(src, inode, off, olen, len, destoff);
> +	ret = btrfs_clone(src, inode, off, olen, len, destoff, 0);
>  
>  	if (same_inode) {
>  		u64 lock_start = min_t(u64, off, destoff);
> -- 
> 2.1.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes
  2015-06-27 21:44   ` Zygo Blaxell
@ 2015-06-29 17:52     ` Mark Fasheh
  2015-06-29 19:35       ` Zygo Blaxell
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Fasheh @ 2015-06-29 17:52 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Sat, Jun 27, 2015 at 05:44:28PM -0400, Zygo Blaxell wrote:
> On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote:
> > One issue users have reported is that dedupe changes mtime on files,
> > resulting in tools like rsync thinking that their contents have changed when
> > in fact the data is exactly the same. Clone still wants an mtime change, so
> > we special case this in the code.
> > 
> > This was tested with the btrfs-extent-same tool.
> > 
> > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > ---
> >  fs/btrfs/ioctl.c | 25 +++++++++++++++----------
> >  1 file changed, 15 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 83f4679..0af0f13 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
> >  
> >  
> >  static int btrfs_clone(struct inode *src, struct inode *inode,
> > -		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> > +		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> > +		       int no_mtime);
> >  
> >  /* Mask out flags that are inappropriate for the given type of inode. */
> >  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> > @@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> >  	/* pass original length for comparison so we stay within i_size */
> >  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> >  	if (ret == 0)
> > -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> > +		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> >  
> >  	if (same_inode)
> >  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> > @@ -3219,13 +3220,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> >  				     struct inode *inode,
> >  				     u64 endoff,
> >  				     const u64 destoff,
> > -				     const u64 olen)
> > +				     const u64 olen,
> > +				     int no_mtime)
> >  {
> >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> >  	int ret;
> >  
> >  	inode_inc_iversion(inode);
> > -	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > +	if (no_mtime)
> > +		inode->i_ctime = CURRENT_TIME;
> 
> I don't see a good reason to modify the ctime either.  Again, nothing
> is changing here.  All we are doing is shuffling physical storage around.
> 
> Defrag and balance (which also move physical extents around) don't
> touch ctime, mtime, or even atime.

To be fair, those may actually be oversights, it's not uncommon to update
ctime on metadata changes.

Does a ctime change hurt any backup software (the reason for my first
patch)? I guess it could cause revaluation of meta data, but does that
actually happen? From what I can tell stuff like rsync is using mtime +
i_size to see if an inode changed.

Is there any software out there that monitors an inodes extent state which
might *want* ctime updates when this happens? Is that kind of usage a
stretch (or even something we care about?).

So my thinking is if it doesn't hurt anything, leave it in. Obviously if it
*is* causing issues then we should take it right out :)

Thanks for the discussion and review btw,
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes
  2015-06-29 17:52     ` Mark Fasheh
@ 2015-06-29 19:35       ` Zygo Blaxell
  2015-06-29 20:29         ` Mark Fasheh
  0 siblings, 1 reply; 10+ messages in thread
From: Zygo Blaxell @ 2015-06-29 19:35 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4632 bytes --]

On Mon, Jun 29, 2015 at 10:52:41AM -0700, Mark Fasheh wrote:
> On Sat, Jun 27, 2015 at 05:44:28PM -0400, Zygo Blaxell wrote:
> > On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote:
> > > One issue users have reported is that dedupe changes mtime on files,
> > > resulting in tools like rsync thinking that their contents have changed when
> > > in fact the data is exactly the same. Clone still wants an mtime change, so
> > > we special case this in the code.
> > > 
> > > This was tested with the btrfs-extent-same tool.
> > > 
> > > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > > ---
> > >  fs/btrfs/ioctl.c | 25 +++++++++++++++----------
> > >  1 file changed, 15 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > index 83f4679..0af0f13 100644
> > > --- a/fs/btrfs/ioctl.c
> > > +++ b/fs/btrfs/ioctl.c
> > > @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
> > >  
> > >  
> > >  static int btrfs_clone(struct inode *src, struct inode *inode,
> > > -		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> > > +		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> > > +		       int no_mtime);
> > >  
> > >  /* Mask out flags that are inappropriate for the given type of inode. */
> > >  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> > > @@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> > >  	/* pass original length for comparison so we stay within i_size */
> > >  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> > >  	if (ret == 0)
> > > -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> > > +		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> > >  
> > >  	if (same_inode)
> > >  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> > > @@ -3219,13 +3220,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> > >  				     struct inode *inode,
> > >  				     u64 endoff,
> > >  				     const u64 destoff,
> > > -				     const u64 olen)
> > > +				     const u64 olen,
> > > +				     int no_mtime)
> > >  {
> > >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> > >  	int ret;
> > >  
> > >  	inode_inc_iversion(inode);
> > > -	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > > +	if (no_mtime)
> > > +		inode->i_ctime = CURRENT_TIME;
> > 
> > I don't see a good reason to modify the ctime either.  Again, nothing
> > is changing here.  All we are doing is shuffling physical storage around.
> > 
> > Defrag and balance (which also move physical extents around) don't
> > touch ctime, mtime, or even atime.
> 
> To be fair, those may actually be oversights, it's not uncommon to update
> ctime on metadata changes.

It makes no sense semantically.  There are no changes to any inode
fields here.  Normally when extents are moved around inodes don't change
at all.

The current balance behavior is definitely not an oversight.  Balance
would have to rewrite every inode on the filesystem (actually every
*snapshot* of every inode) to update ctime.

Defrag is copy-in-place while holding a lock to prevent concurrent
modifications.  Defrag does the complementary operation to extent-same.
Defrag will also not change any file contents, and it's already got the
correct ctime behavior.  ;)

> Does a ctime change hurt any backup software (the reason for my first
> patch)? I guess it could cause revaluation of meta data, but does that
> actually happen? From what I can tell stuff like rsync is using mtime +
> i_size to see if an inode changed.

Off the top of my head, git uses ctime to figure out whether its index is
up to date.  It probably borrowed that from other SCMs.

Some admins (myself included) build ad-hoc (and even formal) forensic
timelines from ctime data.  This doesn't work if dedup wipes out the
ctime, especially if you are doing aggressive dedup across multiple
snapshots.

The core problem is that deduping stops being a transparent rearrangement
of the physical layout of the data (like defrag or balance) if the ctime
changes whenever you do it.

> Is there any software out there that monitors an inodes extent state which
> might *want* ctime updates when this happens? Is that kind of usage a
> stretch (or even something we care about?).
> 
> So my thinking is if it doesn't hurt anything, leave it in. Obviously if it
> *is* causing issues then we should take it right out :)
> 
> Thanks for the discussion and review btw,
> 	--Mark
> 
> --
> Mark Fasheh
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 5/5] btrfs: don't update mtime on deduped inodes
  2015-06-29 19:35       ` Zygo Blaxell
@ 2015-06-29 20:29         ` Mark Fasheh
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Fasheh @ 2015-06-29 20:29 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jun 29, 2015 at 03:35:02PM -0400, Zygo Blaxell wrote:
> On Mon, Jun 29, 2015 at 10:52:41AM -0700, Mark Fasheh wrote:
> > On Sat, Jun 27, 2015 at 05:44:28PM -0400, Zygo Blaxell wrote:
> > > On Fri, Jun 26, 2015 at 02:01:01PM -0700, Mark Fasheh wrote:
> > > > One issue users have reported is that dedupe changes mtime on files,
> > > > resulting in tools like rsync thinking that their contents have changed when
> > > > in fact the data is exactly the same. Clone still wants an mtime change, so
> > > > we special case this in the code.
> > > > 
> > > > This was tested with the btrfs-extent-same tool.
> > > > 
> > > > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > > > ---
> > > >  fs/btrfs/ioctl.c | 25 +++++++++++++++----------
> > > >  1 file changed, 15 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > > index 83f4679..0af0f13 100644
> > > > --- a/fs/btrfs/ioctl.c
> > > > +++ b/fs/btrfs/ioctl.c
> > > > @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
> > > >  
> > > >  
> > > >  static int btrfs_clone(struct inode *src, struct inode *inode,
> > > > -		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> > > > +		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> > > > +		       int no_mtime);
> > > >  
> > > >  /* Mask out flags that are inappropriate for the given type of inode. */
> > > >  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> > > > @@ -3054,7 +3055,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> > > >  	/* pass original length for comparison so we stay within i_size */
> > > >  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> > > >  	if (ret == 0)
> > > > -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> > > > +		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
> > > >  
> > > >  	if (same_inode)
> > > >  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> > > > @@ -3219,13 +3220,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> > > >  				     struct inode *inode,
> > > >  				     u64 endoff,
> > > >  				     const u64 destoff,
> > > > -				     const u64 olen)
> > > > +				     const u64 olen,
> > > > +				     int no_mtime)
> > > >  {
> > > >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> > > >  	int ret;
> > > >  
> > > >  	inode_inc_iversion(inode);
> > > > -	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > > > +	if (no_mtime)
> > > > +		inode->i_ctime = CURRENT_TIME;
> > > 
> > > I don't see a good reason to modify the ctime either.  Again, nothing
> > > is changing here.  All we are doing is shuffling physical storage around.
> > > 
> > > Defrag and balance (which also move physical extents around) don't
> > > touch ctime, mtime, or even atime.
> > 
> > To be fair, those may actually be oversights, it's not uncommon to update
> > ctime on metadata changes.
> 
> It makes no sense semantically.  There are no changes to any inode
> fields here.  Normally when extents are moved around inodes don't change
> at all.
> 
> The current balance behavior is definitely not an oversight.  Balance
> would have to rewrite every inode on the filesystem (actually every
> *snapshot* of every inode) to update ctime.
> 
> Defrag is copy-in-place while holding a lock to prevent concurrent
> modifications.  Defrag does the complementary operation to extent-same.
> Defrag will also not change any file contents, and it's already got the
> correct ctime behavior.  ;)

Ahh ok sure that makes sense :)


> > Does a ctime change hurt any backup software (the reason for my first
> > patch)? I guess it could cause revaluation of meta data, but does that
> > actually happen? From what I can tell stuff like rsync is using mtime +
> > i_size to see if an inode changed.
> 
> Off the top of my head, git uses ctime to figure out whether its index is
> up to date.  It probably borrowed that from other SCMs.
> 
> Some admins (myself included) build ad-hoc (and even formal) forensic
> timelines from ctime data.  This doesn't work if dedup wipes out the
> ctime, especially if you are doing aggressive dedup across multiple
> snapshots.
> 
> The core problem is that deduping stops being a transparent rearrangement
> of the physical layout of the data (like defrag or balance) if the ctime
> changes whenever you do it.

Ok, thanks for describing those use cases. It makes sense now to me why we
would want to avoid the ctime changes. I should have another patch out
shortly.

Thanks again,
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-06-29 20:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
2015-06-26 21:00 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
2015-06-26 21:00 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
2015-06-26 21:00 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
2015-06-26 21:01 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
2015-06-26 21:01 ` [PATCH 5/5] btrfs: don't update mtime on deduped inodes Mark Fasheh
2015-06-27 21:44   ` Zygo Blaxell
2015-06-29 17:52     ` Mark Fasheh
2015-06-29 19:35       ` Zygo Blaxell
2015-06-29 20:29         ` Mark Fasheh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.