All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] btrfs: dedupe fixes, features V2
@ 2015-06-22 22:47 Mark Fasheh
  2015-06-22 22:47 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Mark Fasheh @ 2015-06-22 22:47 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs

Hi Chris,

   The following patches are based on top of my patch titled "btrfs:
Handle unaligned length in extent_same" which you have in your
'integration-4.2' branch:

https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=e1d227a42ea2b4664f94212bd1106b9a3413ffb8


I sent out the 1st two patches from this series last week, the last 3 are   
new to the list:

http://www.spinics.net/lists/linux-btrfs/msg44849.html


The first patch in the series fixes a bug where we were sometimes
passing the aligned length to our comparison function. We actually can
stop at the user passed length for this as we don't need to compare
data past i_size (and we only align if the extents go out to i_size).

The 2nd patch fixes a deadlock between btrfs readpage and
btrfs_extent_same. This was reported on the list some months ago -
basically we had the page and extent locking reversed. My patch fixes
up the locking to be in the right order.

The 3rd patch fixes a deadlocks in clone() (wrt extent-same) which
David found while reviewing my fixes. I also found that clone doesn't
lock extent ranges in any particular order which could obvioulsy be a
problem so that is fixed there too.

The last two patches add features which have been requested often by
users - the 4th adds the ability to dedupe within the same inode, and
the last patch adds a dedupe flag to avoid mtime updates (this helps
with backup software).

These patches have been tested with the 'btrfs-extent-same' tool that
can be found at:

https://github.com/markfasheh/duperemove/blob/nomtime/btrfs-extent-same.c

Thanks,
   --Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data()
  2015-06-22 22:47 [PATCH 0/5] btrfs: dedupe fixes, features V2 Mark Fasheh
@ 2015-06-22 22:47 ` Mark Fasheh
  2015-06-22 22:47 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2015-06-22 22:47 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

In the case that we dedupe the tail of a file, we might expand the dedupe
len out to the end of our last block. We don't want to compare data past
i_size however, so pass the original length to btrfs_cmp_data().

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2d24ff4..2deea1f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2933,7 +2933,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 		goto out_unlock;
 	}
 
-	ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
+	/* pass original length for comparison so we stay within i_size */
+	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
 	if (ret == 0)
 		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
 
-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage
  2015-06-22 22:47 [PATCH 0/5] btrfs: dedupe fixes, features V2 Mark Fasheh
  2015-06-22 22:47 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
@ 2015-06-22 22:47 ` Mark Fasheh
  2015-06-22 22:47 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2015-06-22 22:47 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

->readpage() does page_lock() before extent_lock(), we do the opposite in
extent-same. We want to reverse the order in btrfs_extent_same() but it's
not quite straightforward since the page locks are taken inside btrfs_cmp_data().

So I split btrfs_cmp_data() into 3 parts with a small context structure that
is passed between them. The first, btrfs_cmp_data_prepare() gathers up the
pages needed (taking page lock as required) and puts them on our context
structure. At this point, we are safe to lock the extent range. Afterwards,
we use btrfs_cmp_data() to do the data compare as usual and btrfs_cmp_data_free()
to clean up our context.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 148 +++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 117 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2deea1f..b899584 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2755,14 +2755,11 @@ out:
 	return ret;
 }
 
-static struct page *extent_same_get_page(struct inode *inode, u64 off)
+static struct page *extent_same_get_page(struct inode *inode, pgoff_t index)
 {
 	struct page *page;
-	pgoff_t index;
 	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
 
-	index = off >> PAGE_CACHE_SHIFT;
-
 	page = grab_cache_page(inode->i_mapping, index);
 	if (!page)
 		return NULL;
@@ -2783,6 +2780,20 @@ static struct page *extent_same_get_page(struct inode *inode, u64 off)
 	return page;
 }
 
+static int gather_extent_pages(struct inode *inode, struct page **pages,
+			       int num_pages, u64 off)
+{
+	int i;
+	pgoff_t index = off >> PAGE_CACHE_SHIFT;
+
+	for (i = 0; i < num_pages; i++) {
+		pages[i] = extent_same_get_page(inode, index + i);
+		if (!pages[i])
+			return -ENOMEM;
+	}
+	return 0;
+}
+
 static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
 {
 	/* do any pending delalloc/csum calc on src, one way or
@@ -2808,52 +2819,120 @@ static inline void lock_extent_range(struct inode *inode, u64 off, u64 len)
 	}
 }
 
-static void btrfs_double_unlock(struct inode *inode1, u64 loff1,
-				struct inode *inode2, u64 loff2, u64 len)
+static void btrfs_double_inode_unlock(struct inode *inode1, struct inode *inode2)
 {
-	unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
-	unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
-
 	mutex_unlock(&inode1->i_mutex);
 	mutex_unlock(&inode2->i_mutex);
 }
 
-static void btrfs_double_lock(struct inode *inode1, u64 loff1,
-			      struct inode *inode2, u64 loff2, u64 len)
+static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
+{
+	if (inode1 < inode2)
+		swap(inode1, inode2);
+
+	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+	if (inode1 != inode2)
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+}
+
+static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
+				      struct inode *inode2, u64 loff2, u64 len)
+{
+	unlock_extent(&BTRFS_I(inode1)->io_tree, loff1, loff1 + len - 1);
+	unlock_extent(&BTRFS_I(inode2)->io_tree, loff2, loff2 + len - 1);
+}
+
+static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
+				     struct inode *inode2, u64 loff2, u64 len)
 {
 	if (inode1 < inode2) {
 		swap(inode1, inode2);
 		swap(loff1, loff2);
 	}
-
-	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
 	lock_extent_range(inode1, loff1, len);
-	if (inode1 != inode2) {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	if (inode1 != inode2)
 		lock_extent_range(inode2, loff2, len);
+}
+
+struct cmp_pages {
+	int		num_pages;
+	struct page	**src_pages;
+	struct page	**dst_pages;
+};
+
+static void btrfs_cmp_data_free(struct cmp_pages *cmp)
+{
+	int i;
+	struct page *pg;
+
+	for (i = 0; i < cmp->num_pages; i++) {
+		pg = cmp->src_pages[i];
+		if (pg)
+			page_cache_release(pg);
+		pg = cmp->dst_pages[i];
+		if (pg)
+			page_cache_release(pg);
+	}
+	kfree(cmp->src_pages);
+	kfree(cmp->dst_pages);
+}
+
+static int btrfs_cmp_data_prepare(struct inode *src, u64 loff,
+				  struct inode *dst, u64 dst_loff,
+				  u64 len, struct cmp_pages *cmp)
+{
+	int ret;
+	int num_pages = PAGE_CACHE_ALIGN(len) >> PAGE_CACHE_SHIFT;
+	struct page **src_pgarr, **dst_pgarr;
+
+	/*
+	 * We must gather up all the pages before we initiate our
+	 * extent locking. We use an array for the page pointers. Size
+	 * of the array is bounded by len, which is in turn bounded by
+	 * BTRFS_MAX_DEDUPE_LEN.
+	 */
+	src_pgarr = kzalloc(num_pages * sizeof(struct page *), GFP_NOFS);
+	dst_pgarr = kzalloc(num_pages * sizeof(struct page *), GFP_NOFS);
+	if (!src_pgarr || !dst_pgarr) {
+		kfree(src_pgarr);
+		kfree(dst_pgarr);
+		return -ENOMEM;
 	}
+	cmp->num_pages = num_pages;
+	cmp->src_pages = src_pgarr;
+	cmp->dst_pages = dst_pgarr;
+
+	ret = gather_extent_pages(src, cmp->src_pages, cmp->num_pages, loff);
+	if (ret)
+		goto out;
+
+	ret = gather_extent_pages(dst, cmp->dst_pages, cmp->num_pages, dst_loff);
+
+out:
+	if (ret)
+		btrfs_cmp_data_free(cmp);
+	return 0;
 }
 
 static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
-			  u64 dst_loff, u64 len)
+			  u64 dst_loff, u64 len, struct cmp_pages *cmp)
 {
 	int ret = 0;
+	int i;
 	struct page *src_page, *dst_page;
 	unsigned int cmp_len = PAGE_CACHE_SIZE;
 	void *addr, *dst_addr;
 
+	i = 0;
 	while (len) {
 		if (len < PAGE_CACHE_SIZE)
 			cmp_len = len;
 
-		src_page = extent_same_get_page(src, loff);
-		if (!src_page)
-			return -EINVAL;
-		dst_page = extent_same_get_page(dst, dst_loff);
-		if (!dst_page) {
-			page_cache_release(src_page);
-			return -EINVAL;
-		}
+		BUG_ON(i >= cmp->num_pages);
+
+		src_page = cmp->src_pages[i];
+		dst_page = cmp->dst_pages[i];
+
 		addr = kmap_atomic(src_page);
 		dst_addr = kmap_atomic(dst_page);
 
@@ -2865,15 +2944,12 @@ static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
 
 		kunmap_atomic(addr);
 		kunmap_atomic(dst_addr);
-		page_cache_release(src_page);
-		page_cache_release(dst_page);
 
 		if (ret)
 			break;
 
-		loff += cmp_len;
-		dst_loff += cmp_len;
 		len -= cmp_len;
+		i++;
 	}
 
 	return ret;
@@ -2904,6 +2980,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 {
 	int ret;
 	u64 len = olen;
+	struct cmp_pages cmp;
 
 	/*
 	 * btrfs_clone() can't handle extents in the same file
@@ -2916,7 +2993,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	if (len == 0)
 		return 0;
 
-	btrfs_double_lock(src, loff, dst, dst_loff, len);
+	btrfs_double_inode_lock(src, dst);
 
 	ret = extent_same_check_offsets(src, loff, &len, olen);
 	if (ret)
@@ -2933,13 +3010,22 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 		goto out_unlock;
 	}
 
+	ret = btrfs_cmp_data_prepare(src, loff, dst, dst_loff, olen, &cmp);
+	if (ret)
+		goto out_unlock;
+
+	btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
+
 	/* pass original length for comparison so we stay within i_size */
-	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen);
+	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
 	if (ret == 0)
 		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
 
+	btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
+
+	btrfs_cmp_data_free(&cmp);
 out_unlock:
-	btrfs_double_unlock(src, loff, dst, dst_loff, len);
+	btrfs_double_inode_unlock(src, dst);
 
 	return ret;
 }
-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] btrfs: fix clone / extent-same deadlocks
  2015-06-22 22:47 [PATCH 0/5] btrfs: dedupe fixes, features V2 Mark Fasheh
  2015-06-22 22:47 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
  2015-06-22 22:47 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
@ 2015-06-22 22:47 ` Mark Fasheh
  2015-06-23 14:56   ` David Sterba
  2015-06-22 22:47 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
  2015-06-22 22:47 ` [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same Mark Fasheh
  4 siblings, 1 reply; 18+ messages in thread
From: Mark Fasheh @ 2015-06-22 22:47 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

Clone and extent same lock their source and target inodes in opposite order.
In addition to this, the range locking in clone doesn't take ordering into
account. Fix this by having clone use the same locking helpers as
btrfs-extent-same.

In addition, I do a small cleanup of the locking helpers, removing a case
(both inodes being the same) which was poorly accounted for and never
actually used by the callers.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/ioctl.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b899584..8d6887d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2831,8 +2831,7 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
 		swap(inode1, inode2);
 
 	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-	if (inode1 != inode2)
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
 }
 
 static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
@@ -2850,8 +2849,7 @@ static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
 		swap(loff1, loff2);
 	}
 	lock_extent_range(inode1, loff1, len);
-	if (inode1 != inode2)
-		lock_extent_range(inode2, loff2, len);
+	lock_extent_range(inode2, loff2, len);
 }
 
 struct cmp_pages {
@@ -3713,13 +3711,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		goto out_fput;
 
 	if (!same_inode) {
-		if (inode < src) {
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD);
-		} else {
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		}
+		btrfs_double_inode_lock(src, inode);
 	} else {
 		mutex_lock(&src->i_mutex);
 	}
@@ -3769,8 +3761,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		lock_extent_range(src, lock_start, lock_len);
 	} else {
-		lock_extent_range(src, off, len);
-		lock_extent_range(inode, destoff, len);
+		btrfs_double_extent_lock(src, off, inode, destoff, len);
 	}
 
 	ret = btrfs_clone(src, inode, off, olen, len, destoff);
@@ -3781,9 +3772,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		unlock_extent(&BTRFS_I(src)->io_tree, lock_start, lock_end);
 	} else {
-		unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
-		unlock_extent(&BTRFS_I(inode)->io_tree, destoff,
-			      destoff + len - 1);
+		btrfs_double_extent_unlock(src, off, inode, destoff, len);
 	}
 	/*
 	 * Truncate page cache pages so that future reads will see the cloned
@@ -3792,17 +3781,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	truncate_inode_pages_range(&inode->i_data, destoff,
 				   PAGE_CACHE_ALIGN(destoff + len) - 1);
 out_unlock:
-	if (!same_inode) {
-		if (inode < src) {
-			mutex_unlock(&src->i_mutex);
-			mutex_unlock(&inode->i_mutex);
-		} else {
-			mutex_unlock(&inode->i_mutex);
-			mutex_unlock(&src->i_mutex);
-		}
-	} else {
+	if (!same_inode)
+		btrfs_double_inode_unlock(src, inode);
+	else
 		mutex_unlock(&src->i_mutex);
-	}
 out_fput:
 	fdput(src_file);
 out_drop_write:
-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] btrfs: allow dedupe of same inode
  2015-06-22 22:47 [PATCH 0/5] btrfs: dedupe fixes, features V2 Mark Fasheh
                   ` (2 preceding siblings ...)
  2015-06-22 22:47 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
@ 2015-06-22 22:47 ` Mark Fasheh
  2015-06-23 14:59   ` David Sterba
  2015-06-22 22:47 ` [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same Mark Fasheh
  4 siblings, 1 reply; 18+ messages in thread
From: Mark Fasheh @ 2015-06-22 22:47 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

clone() supports cloning within an inode so extent-same can do
the same now. This patch fixes up the locking in extent-same to
know about the single-inode case. In addition to that, we add a
check for overlapping ranges, which clone does not allow.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/ioctl.c | 76 ++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 60 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8d6887d..83f4679 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2979,27 +2979,61 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	int ret;
 	u64 len = olen;
 	struct cmp_pages cmp;
+	int same_inode = 0;
+	u64 same_lock_start = 0;
+	u64 same_lock_len = 0;
 
-	/*
-	 * btrfs_clone() can't handle extents in the same file
-	 * yet. Once that works, we can drop this check and replace it
-	 * with a check for the same inode, but overlapping extents.
-	 */
 	if (src == dst)
-		return -EINVAL;
+		same_inode = 1;
 
 	if (len == 0)
 		return 0;
 
-	btrfs_double_inode_lock(src, dst);
+	if (same_inode) {
+		mutex_lock(&src->i_mutex);
 
-	ret = extent_same_check_offsets(src, loff, &len, olen);
-	if (ret)
-		goto out_unlock;
+		ret = extent_same_check_offsets(src, loff, &len, olen);
+		if (ret)
+			goto out_unlock;
 
-	ret = extent_same_check_offsets(dst, dst_loff, &len, olen);
-	if (ret)
-		goto out_unlock;
+		/*
+		 * Single inode case wants the same checks, except we
+		 * don't want our length pushed out past i_size as
+		 * comparing that data range makes no sense.
+		 *
+		 * extent_same_check_offsets() will do this for an
+		 * unaligned length at i_size, so catch it here and
+		 * reject the request.
+		 *
+		 * This effectively means we require aligned extents
+		 * for the single-inode case, whereas the other cases
+		 * allow an unaligned length so long as it ends at
+		 * i_size.
+		 */
+		if (len != olen) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		/* Check for overlapping ranges */
+		if (dst_loff + len > loff && dst_loff < loff + len) {
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		same_lock_start = min_t(u64, loff, dst_loff);
+		same_lock_len = max_t(u64, loff, dst_loff) + len - same_lock_start;
+	} else {
+		btrfs_double_inode_lock(src, dst);
+
+		ret = extent_same_check_offsets(src, loff, &len, olen);
+		if (ret)
+			goto out_unlock;
+
+		ret = extent_same_check_offsets(dst, dst_loff, &len, olen);
+		if (ret)
+			goto out_unlock;
+	}
 
 	/* don't make the dst file partly checksummed */
 	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
@@ -3012,18 +3046,28 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	if (ret)
 		goto out_unlock;
 
-	btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
+	if (same_inode)
+		lock_extent_range(src, same_lock_start, same_lock_len);
+	else
+		btrfs_double_extent_lock(src, loff, dst, dst_loff, len);
 
 	/* pass original length for comparison so we stay within i_size */
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
 	if (ret == 0)
 		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
 
-	btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
+	if (same_inode)
+		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
+			      same_lock_start + same_lock_len - 1);
+	else
+		btrfs_double_extent_unlock(src, loff, dst, dst_loff, len);
 
 	btrfs_cmp_data_free(&cmp);
 out_unlock:
-	btrfs_double_inode_unlock(src, dst);
+	if (same_inode)
+		mutex_unlock(&src->i_mutex);
+	else
+		btrfs_double_inode_unlock(src, dst);
 
 	return ret;
 }
-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
  2015-06-22 22:47 [PATCH 0/5] btrfs: dedupe fixes, features V2 Mark Fasheh
                   ` (3 preceding siblings ...)
  2015-06-22 22:47 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
@ 2015-06-22 22:47 ` Mark Fasheh
  2015-06-23 15:11   ` David Sterba
  4 siblings, 1 reply; 18+ messages in thread
From: Mark Fasheh @ 2015-06-22 22:47 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

One issue users have reported is that dedupe changes mtime on files,
resulting in tools like rsync thinking that their contents have changed when
in fact the data is exactly the same. Clone still wants an mtime change, so
we special case this in the code.

With this patch an application can pass the BTRFS_SAME_NO_MTIME flag to a
dedupe request and the kernel will honor it by only changing ctime.

I have an updated version of the btrfs-extent-same test program with a
switch to provide this flag at the 'no_time' branch of:

https://github.com/markfasheh/duperemove/

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/ioctl.c           | 34 ++++++++++++++++++++++++----------
 include/uapi/linux/btrfs.h |  5 ++++-
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 83f4679..8cfc65f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
 
 
 static int btrfs_clone(struct inode *src, struct inode *inode,
-		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
+		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
+		       int no_mtime);
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2974,7 +2975,7 @@ static int extent_same_check_offsets(struct inode *inode, u64 off, u64 *plen,
 }
 
 static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
-			     struct inode *dst, u64 dst_loff)
+			     struct inode *dst, u64 dst_loff, int no_mtime)
 {
 	int ret;
 	u64 len = olen;
@@ -3054,7 +3055,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	/* pass original length for comparison so we stay within i_size */
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
 	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
+		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff,
+				  no_mtime);
 
 	if (same_inode)
 		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
@@ -3088,6 +3090,7 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
 	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
 	bool is_admin = capable(CAP_SYS_ADMIN);
 	u16 count;
+	int no_mtime = 0;
 
 	if (!(file->f_mode & FMODE_READ))
 		return -EINVAL;
@@ -3139,6 +3142,12 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
 	if (!S_ISREG(src->i_mode))
 		goto out;
 
+	ret = -EINVAL;
+	if (same->flags & ~BTRFS_SAME_FLAGS)
+		goto out;
+	if (same->flags & BTRFS_SAME_NO_MTIME)
+		no_mtime = 1;
+
 	/* pre-format output fields to sane values */
 	for (i = 0; i < count; i++) {
 		same->info[i].bytes_deduped = 0ULL;
@@ -3164,7 +3173,8 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
 			info->status = -EACCES;
 		} else {
 			info->status = btrfs_extent_same(src, off, len, dst,
-							info->logical_offset);
+							 info->logical_offset,
+							 no_mtime);
 			if (info->status == 0)
 				info->bytes_deduped += len;
 		}
@@ -3219,13 +3229,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 				     struct inode *inode,
 				     u64 endoff,
 				     const u64 destoff,
-				     const u64 olen)
+				     const u64 olen,
+				     int no_mtime)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	int ret;
 
 	inode_inc_iversion(inode);
-	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	if (no_mtime)
+		inode->i_ctime = CURRENT_TIME;
+	else
+		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
 	/*
 	 * We round up to the block size at eof when determining which
 	 * extents to clone above, but shouldn't round up the file size.
@@ -3316,7 +3330,7 @@ static void clone_update_extent_map(struct inode *inode,
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,
 		       const u64 off, const u64 olen, const u64 olen_aligned,
-		       const u64 destoff)
+		       const u64 destoff, int no_mtime)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_path *path = NULL;
@@ -3640,7 +3654,7 @@ process_slot:
 					      root->sectorsize);
 			ret = clone_finish_inode_update(trans, inode,
 							last_dest_end,
-							destoff, olen);
+							destoff, olen, no_mtime);
 			if (ret)
 				goto out;
 			if (new_key.offset + datal >= destoff + len)
@@ -3678,7 +3692,7 @@ process_slot:
 		clone_update_extent_map(inode, trans, NULL, last_dest_end,
 					destoff + len - last_dest_end);
 		ret = clone_finish_inode_update(trans, inode, destoff + len,
-						destoff, olen);
+						destoff, olen, no_mtime);
 	}
 
 out:
@@ -3808,7 +3822,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		btrfs_double_extent_lock(src, off, inode, destoff, len);
 	}
 
-	ret = btrfs_clone(src, inode, off, olen, len, destoff);
+	ret = btrfs_clone(src, inode, off, olen, len, destoff, 0);
 
 	if (same_inode) {
 		u64 lock_start = min_t(u64, off, destoff);
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index b6dec05..beeb51c 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -342,11 +342,14 @@ struct btrfs_ioctl_same_extent_info {
 	__u32 reserved;
 };
 
+#define	BTRFS_SAME_NO_MTIME	0x1
+#define	BTRFS_SAME_FLAGS	(BTRFS_SAME_NO_MTIME)
+
 struct btrfs_ioctl_same_args {
 	__u64 logical_offset;	/* in - start of extent in source */
 	__u64 length;		/* in - length of extent */
 	__u16 dest_count;	/* in - total elements in info array */
-	__u16 reserved1;
+	__u16 flags;		/* in - see BTRFS_SAME_FLAGS */
 	__u32 reserved2;
 	struct btrfs_ioctl_same_extent_info info[0];
 };
-- 
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] btrfs: fix clone / extent-same deadlocks
  2015-06-22 22:47 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
@ 2015-06-23 14:56   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2015-06-23 14:56 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jun 22, 2015 at 03:47:40PM -0700, Mark Fasheh wrote:
> Clone and extent same lock their source and target inodes in opposite order.
> In addition to this, the range locking in clone doesn't take ordering into
> account. Fix this by having clone use the same locking helpers as
> btrfs-extent-same.
> 
> In addition, I do a small cleanup of the locking helpers, removing a case
> (both inodes being the same) which was poorly accounted for and never
> actually used by the callers.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>

Reviewed-by: David Sterba <dsterba@suse.cz>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] btrfs: allow dedupe of same inode
  2015-06-22 22:47 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
@ 2015-06-23 14:59   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2015-06-23 14:59 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jun 22, 2015 at 03:47:41PM -0700, Mark Fasheh wrote:
> clone() supports cloning within an inode so extent-same can do
> the same now. This patch fixes up the locking in extent-same to
> know about the single-inode case. In addition to that, we add a
> check for overlapping ranges, which clone does not allow.
> 
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>

Reviewed-by: David Sterba <dsterba@suse.cz>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
  2015-06-22 22:47 ` [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same Mark Fasheh
@ 2015-06-23 15:11   ` David Sterba
  2015-06-23 17:11     ` Mark Fasheh
  2015-06-24 20:17     ` Zygo Blaxell
  0 siblings, 2 replies; 18+ messages in thread
From: David Sterba @ 2015-06-23 15:11 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Mon, Jun 22, 2015 at 03:47:42PM -0700, Mark Fasheh wrote:
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
>  
>  
>  static int btrfs_clone(struct inode *src, struct inode *inode,
> -		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> +		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> +		       int no_mtime);

Please make it 'flags' and pass the verified flags directly.

>  /* Mask out flags that are inappropriate for the given type of inode. */
>  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> @@ -2974,7 +2975,7 @@ static int extent_same_check_offsets(struct inode *inode, u64 off, u64 *plen,
>  }
>  
>  static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> -			     struct inode *dst, u64 dst_loff)
> +			     struct inode *dst, u64 dst_loff, int no_mtime)
>  {
>  	int ret;
>  	u64 len = olen;
> @@ -3054,7 +3055,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
>  	/* pass original length for comparison so we stay within i_size */
>  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
>  	if (ret == 0)
> -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> +		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff,
> +				  no_mtime);
>  
>  	if (same_inode)
>  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> @@ -3088,6 +3090,7 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
>  	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
>  	bool is_admin = capable(CAP_SYS_ADMIN);
>  	u16 count;
> +	int no_mtime = 0;
>  
>  	if (!(file->f_mode & FMODE_READ))
>  		return -EINVAL;
> @@ -3139,6 +3142,12 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
>  	if (!S_ISREG(src->i_mode))
>  		goto out;
>  
> +	ret = -EINVAL;
> +	if (same->flags & ~BTRFS_SAME_FLAGS)
> +		goto out;
> +	if (same->flags & BTRFS_SAME_NO_MTIME)
> +		no_mtime = 1;
> +
>  	/* pre-format output fields to sane values */
>  	for (i = 0; i < count; i++) {
>  		same->info[i].bytes_deduped = 0ULL;
> @@ -3164,7 +3173,8 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
>  			info->status = -EACCES;
>  		} else {
>  			info->status = btrfs_extent_same(src, off, len, dst,
> -							info->logical_offset);
> +							 info->logical_offset,
> +							 no_mtime);
>  			if (info->status == 0)
>  				info->bytes_deduped += len;
>  		}
> @@ -3219,13 +3229,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
>  				     struct inode *inode,
>  				     u64 endoff,
>  				     const u64 destoff,
> -				     const u64 olen)
> +				     const u64 olen,
> +				     int no_mtime)
>  {
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	int ret;
>  
>  	inode_inc_iversion(inode);
> -	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> +	if (no_mtime)
> +		inode->i_ctime = CURRENT_TIME;
> +	else
> +		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
>  	/*
>  	 * We round up to the block size at eof when determining which
>  	 * extents to clone above, but shouldn't round up the file size.
> @@ -3316,7 +3330,7 @@ static void clone_update_extent_map(struct inode *inode,
>   */
>  static int btrfs_clone(struct inode *src, struct inode *inode,
>  		       const u64 off, const u64 olen, const u64 olen_aligned,
> -		       const u64 destoff)
> +		       const u64 destoff, int no_mtime)
>  {
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	struct btrfs_path *path = NULL;
> @@ -3640,7 +3654,7 @@ process_slot:
>  					      root->sectorsize);
>  			ret = clone_finish_inode_update(trans, inode,
>  							last_dest_end,
> -							destoff, olen);
> +							destoff, olen, no_mtime);
>  			if (ret)
>  				goto out;
>  			if (new_key.offset + datal >= destoff + len)
> @@ -3678,7 +3692,7 @@ process_slot:
>  		clone_update_extent_map(inode, trans, NULL, last_dest_end,
>  					destoff + len - last_dest_end);
>  		ret = clone_finish_inode_update(trans, inode, destoff + len,
> -						destoff, olen);
> +						destoff, olen, no_mtime);
>  	}
>  
>  out:
> @@ -3808,7 +3822,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
>  		btrfs_double_extent_lock(src, off, inode, destoff, len);
>  	}
>  
> -	ret = btrfs_clone(src, inode, off, olen, len, destoff);
> +	ret = btrfs_clone(src, inode, off, olen, len, destoff, 0);
>  
>  	if (same_inode) {
>  		u64 lock_start = min_t(u64, off, destoff);
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index b6dec05..beeb51c 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h

> +#define	BTRFS_SAME_NO_MTIME	0x1
> +#define	BTRFS_SAME_FLAGS	(BTRFS_SAME_NO_MTIME)

The naming scheme used eg. for the send flags:

BTRFS_SEND_FLAG_MASK
BTRFS_SEND_FLAG_NO_FILE_DATA

So I suggest to follow it:

BTRFS_SAME_FLAG_MASK
BTRFS_SAME_FLAG_NO_MTIME

Though I'd rather see it named it with BTRFS_EXTENT_SAME_ prefix so it
(almost ...) matches the ioctl name (and is greppable).

This is going to be in a public interface so the naming is important.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
  2015-06-23 15:11   ` David Sterba
@ 2015-06-23 17:11     ` Mark Fasheh
  2015-06-24 20:17     ` Zygo Blaxell
  1 sibling, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2015-06-23 17:11 UTC (permalink / raw)
  To: dsterba, Chris Mason, Josef Bacik, linux-btrfs

On Tue, Jun 23, 2015 at 05:11:56PM +0200, David Sterba wrote:
> On Mon, Jun 22, 2015 at 03:47:42PM -0700, Mark Fasheh wrote:
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
> >  
> >  
> >  static int btrfs_clone(struct inode *src, struct inode *inode,
> > -		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> > +		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> > +		       int no_mtime);
> 
> Please make it 'flags' and pass the verified flags directly.

Sure.


> >  /* Mask out flags that are inappropriate for the given type of inode. */
> >  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> > @@ -2974,7 +2975,7 @@ static int extent_same_check_offsets(struct inode *inode, u64 off, u64 *plen,
> >  }
> >  
> >  static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> > -			     struct inode *dst, u64 dst_loff)
> > +			     struct inode *dst, u64 dst_loff, int no_mtime)
> >  {
> >  	int ret;
> >  	u64 len = olen;
> > @@ -3054,7 +3055,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> >  	/* pass original length for comparison so we stay within i_size */
> >  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> >  	if (ret == 0)
> > -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> > +		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff,
> > +				  no_mtime);
> >  
> >  	if (same_inode)
> >  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> > @@ -3088,6 +3090,7 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
> >  	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
> >  	bool is_admin = capable(CAP_SYS_ADMIN);
> >  	u16 count;
> > +	int no_mtime = 0;
> >  
> >  	if (!(file->f_mode & FMODE_READ))
> >  		return -EINVAL;
> > @@ -3139,6 +3142,12 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
> >  	if (!S_ISREG(src->i_mode))
> >  		goto out;
> >  
> > +	ret = -EINVAL;
> > +	if (same->flags & ~BTRFS_SAME_FLAGS)
> > +		goto out;
> > +	if (same->flags & BTRFS_SAME_NO_MTIME)
> > +		no_mtime = 1;
> > +
> >  	/* pre-format output fields to sane values */
> >  	for (i = 0; i < count; i++) {
> >  		same->info[i].bytes_deduped = 0ULL;
> > @@ -3164,7 +3173,8 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
> >  			info->status = -EACCES;
> >  		} else {
> >  			info->status = btrfs_extent_same(src, off, len, dst,
> > -							info->logical_offset);
> > +							 info->logical_offset,
> > +							 no_mtime);
> >  			if (info->status == 0)
> >  				info->bytes_deduped += len;
> >  		}
> > @@ -3219,13 +3229,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> >  				     struct inode *inode,
> >  				     u64 endoff,
> >  				     const u64 destoff,
> > -				     const u64 olen)
> > +				     const u64 olen,
> > +				     int no_mtime)
> >  {
> >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> >  	int ret;
> >  
> >  	inode_inc_iversion(inode);
> > -	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > +	if (no_mtime)
> > +		inode->i_ctime = CURRENT_TIME;
> > +	else
> > +		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> >  	/*
> >  	 * We round up to the block size at eof when determining which
> >  	 * extents to clone above, but shouldn't round up the file size.
> > @@ -3316,7 +3330,7 @@ static void clone_update_extent_map(struct inode *inode,
> >   */
> >  static int btrfs_clone(struct inode *src, struct inode *inode,
> >  		       const u64 off, const u64 olen, const u64 olen_aligned,
> > -		       const u64 destoff)
> > +		       const u64 destoff, int no_mtime)
> >  {
> >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> >  	struct btrfs_path *path = NULL;
> > @@ -3640,7 +3654,7 @@ process_slot:
> >  					      root->sectorsize);
> >  			ret = clone_finish_inode_update(trans, inode,
> >  							last_dest_end,
> > -							destoff, olen);
> > +							destoff, olen, no_mtime);
> >  			if (ret)
> >  				goto out;
> >  			if (new_key.offset + datal >= destoff + len)
> > @@ -3678,7 +3692,7 @@ process_slot:
> >  		clone_update_extent_map(inode, trans, NULL, last_dest_end,
> >  					destoff + len - last_dest_end);
> >  		ret = clone_finish_inode_update(trans, inode, destoff + len,
> > -						destoff, olen);
> > +						destoff, olen, no_mtime);
> >  	}
> >  
> >  out:
> > @@ -3808,7 +3822,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> >  		btrfs_double_extent_lock(src, off, inode, destoff, len);
> >  	}
> >  
> > -	ret = btrfs_clone(src, inode, off, olen, len, destoff);
> > +	ret = btrfs_clone(src, inode, off, olen, len, destoff, 0);
> >  
> >  	if (same_inode) {
> >  		u64 lock_start = min_t(u64, off, destoff);
> > diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> > index b6dec05..beeb51c 100644
> > --- a/include/uapi/linux/btrfs.h
> > +++ b/include/uapi/linux/btrfs.h
> 
> > +#define	BTRFS_SAME_NO_MTIME	0x1
> > +#define	BTRFS_SAME_FLAGS	(BTRFS_SAME_NO_MTIME)
> 
> The naming scheme used eg. for the send flags:
> 
> BTRFS_SEND_FLAG_MASK
> BTRFS_SEND_FLAG_NO_FILE_DATA
> 
> So I suggest to follow it:
> 
> BTRFS_SAME_FLAG_MASK
> BTRFS_SAME_FLAG_NO_MTIME
> 
> Though I'd rather see it named it with BTRFS_EXTENT_SAME_ prefix so it
> (almost ...) matches the ioctl name (and is greppable).
> 
> This is going to be in a public interface so the naming is important.

Sounds good, I'll fix up the patch with your suggestions and resend another
round shortly. Btw thanks for all the reviews David, it is appreciated!
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
  2015-06-23 15:11   ` David Sterba
  2015-06-23 17:11     ` Mark Fasheh
@ 2015-06-24 20:17     ` Zygo Blaxell
  2015-06-25 12:52       ` David Sterba
  1 sibling, 1 reply; 18+ messages in thread
From: Zygo Blaxell @ 2015-06-24 20:17 UTC (permalink / raw)
  To: dsterba, Mark Fasheh, Chris Mason, Josef Bacik, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 6006 bytes --]

On Tue, Jun 23, 2015 at 05:11:56PM +0200, David Sterba wrote:
> On Mon, Jun 22, 2015 at 03:47:42PM -0700, Mark Fasheh wrote:
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -87,7 +87,8 @@ struct btrfs_ioctl_received_subvol_args_32 {
> >  
> >  
> >  static int btrfs_clone(struct inode *src, struct inode *inode,
> > -		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
> > +		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
> > +		       int no_mtime);
> 
> Please make it 'flags' and pass the verified flags directly.
> 
> >  /* Mask out flags that are inappropriate for the given type of inode. */
> >  static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
> > @@ -2974,7 +2975,7 @@ static int extent_same_check_offsets(struct inode *inode, u64 off, u64 *plen,
> >  }
> >  
> >  static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> > -			     struct inode *dst, u64 dst_loff)
> > +			     struct inode *dst, u64 dst_loff, int no_mtime)
> >  {
> >  	int ret;
> >  	u64 len = olen;
> > @@ -3054,7 +3055,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
> >  	/* pass original length for comparison so we stay within i_size */
> >  	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
> >  	if (ret == 0)
> > -		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff);
> > +		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff,
> > +				  no_mtime);
> >  
> >  	if (same_inode)
> >  		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
> > @@ -3088,6 +3090,7 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
> >  	u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
> >  	bool is_admin = capable(CAP_SYS_ADMIN);
> >  	u16 count;
> > +	int no_mtime = 0;
> >  
> >  	if (!(file->f_mode & FMODE_READ))
> >  		return -EINVAL;
> > @@ -3139,6 +3142,12 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
> >  	if (!S_ISREG(src->i_mode))
> >  		goto out;
> >  
> > +	ret = -EINVAL;
> > +	if (same->flags & ~BTRFS_SAME_FLAGS)
> > +		goto out;
> > +	if (same->flags & BTRFS_SAME_NO_MTIME)
> > +		no_mtime = 1;
> > +
> >  	/* pre-format output fields to sane values */
> >  	for (i = 0; i < count; i++) {
> >  		same->info[i].bytes_deduped = 0ULL;
> > @@ -3164,7 +3173,8 @@ static long btrfs_ioctl_file_extent_same(struct file *file,
> >  			info->status = -EACCES;
> >  		} else {
> >  			info->status = btrfs_extent_same(src, off, len, dst,
> > -							info->logical_offset);
> > +							 info->logical_offset,
> > +							 no_mtime);
> >  			if (info->status == 0)
> >  				info->bytes_deduped += len;
> >  		}
> > @@ -3219,13 +3229,17 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> >  				     struct inode *inode,
> >  				     u64 endoff,
> >  				     const u64 destoff,
> > -				     const u64 olen)
> > +				     const u64 olen,
> > +				     int no_mtime)
> >  {
> >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> >  	int ret;
> >  
> >  	inode_inc_iversion(inode);
> > -	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > +	if (no_mtime)
> > +		inode->i_ctime = CURRENT_TIME;
> > +	else
> > +		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> >  	/*
> >  	 * We round up to the block size at eof when determining which
> >  	 * extents to clone above, but shouldn't round up the file size.
> > @@ -3316,7 +3330,7 @@ static void clone_update_extent_map(struct inode *inode,
> >   */
> >  static int btrfs_clone(struct inode *src, struct inode *inode,
> >  		       const u64 off, const u64 olen, const u64 olen_aligned,
> > -		       const u64 destoff)
> > +		       const u64 destoff, int no_mtime)
> >  {
> >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> >  	struct btrfs_path *path = NULL;
> > @@ -3640,7 +3654,7 @@ process_slot:
> >  					      root->sectorsize);
> >  			ret = clone_finish_inode_update(trans, inode,
> >  							last_dest_end,
> > -							destoff, olen);
> > +							destoff, olen, no_mtime);
> >  			if (ret)
> >  				goto out;
> >  			if (new_key.offset + datal >= destoff + len)
> > @@ -3678,7 +3692,7 @@ process_slot:
> >  		clone_update_extent_map(inode, trans, NULL, last_dest_end,
> >  					destoff + len - last_dest_end);
> >  		ret = clone_finish_inode_update(trans, inode, destoff + len,
> > -						destoff, olen);
> > +						destoff, olen, no_mtime);
> >  	}
> >  
> >  out:
> > @@ -3808,7 +3822,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
> >  		btrfs_double_extent_lock(src, off, inode, destoff, len);
> >  	}
> >  
> > -	ret = btrfs_clone(src, inode, off, olen, len, destoff);
> > +	ret = btrfs_clone(src, inode, off, olen, len, destoff, 0);
> >  
> >  	if (same_inode) {
> >  		u64 lock_start = min_t(u64, off, destoff);
> > diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> > index b6dec05..beeb51c 100644
> > --- a/include/uapi/linux/btrfs.h
> > +++ b/include/uapi/linux/btrfs.h
> 
> > +#define	BTRFS_SAME_NO_MTIME	0x1
> > +#define	BTRFS_SAME_FLAGS	(BTRFS_SAME_NO_MTIME)
> 
> The naming scheme used eg. for the send flags:
> 
> BTRFS_SEND_FLAG_MASK
> BTRFS_SEND_FLAG_NO_FILE_DATA
> 
> So I suggest to follow it:
> 
> BTRFS_SAME_FLAG_MASK
> BTRFS_SAME_FLAG_NO_MTIME
> 
> Though I'd rather see it named it with BTRFS_EXTENT_SAME_ prefix so it
> (almost ...) matches the ioctl name (and is greppable).

Is there any sane use case where we would _want_ EXTENT_SAME to change
the mtime?  We do a lot of work to make sure that none of the files
involved have any sort of content change.  Why do we need the flag at all?

> This is going to be in a public interface so the naming is important.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
  2015-06-24 20:17     ` Zygo Blaxell
@ 2015-06-25 12:52       ` David Sterba
  2015-06-25 13:10         ` Austin S Hemmelgarn
  2015-06-25 18:12         ` Mark Fasheh
  0 siblings, 2 replies; 18+ messages in thread
From: David Sterba @ 2015-06-25 12:52 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: dsterba, Mark Fasheh, Chris Mason, Josef Bacik, linux-btrfs

On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote:
> Is there any sane use case where we would _want_ EXTENT_SAME to change
> the mtime?  We do a lot of work to make sure that none of the files
> involved have any sort of content change.  Why do we need the flag at all?

Good point, I don't see the usecase for updating MTIME.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
  2015-06-25 12:52       ` David Sterba
@ 2015-06-25 13:10         ` Austin S Hemmelgarn
  2015-06-25 16:52           ` Zygo Blaxell
  2015-06-25 18:12         ` Mark Fasheh
  1 sibling, 1 reply; 18+ messages in thread
From: Austin S Hemmelgarn @ 2015-06-25 13:10 UTC (permalink / raw)
  To: dsterba, Zygo Blaxell, Mark Fasheh, Chris Mason, Josef Bacik,
	linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

On 2015-06-25 08:52, David Sterba wrote:
> On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote:
>> Is there any sane use case where we would _want_ EXTENT_SAME to change
>> the mtime?  We do a lot of work to make sure that none of the files
>> involved have any sort of content change.  Why do we need the flag at all?
>
> Good point, I don't see the usecase for updating MTIME.
Was the original intent possibly to make certain the CTIME got updated? 
  Because EXTENT_SAME _does_ update the metadata, and by logical 
extension that means that the CTIME should change.



[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2967 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
  2015-06-25 13:10         ` Austin S Hemmelgarn
@ 2015-06-25 16:52           ` Zygo Blaxell
  0 siblings, 0 replies; 18+ messages in thread
From: Zygo Blaxell @ 2015-06-25 16:52 UTC (permalink / raw)
  To: Austin S Hemmelgarn
  Cc: dsterba, Mark Fasheh, Chris Mason, Josef Bacik, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3288 bytes --]

On Thu, Jun 25, 2015 at 09:10:31AM -0400, Austin S Hemmelgarn wrote:
> On 2015-06-25 08:52, David Sterba wrote:
> >On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote:
> >>Is there any sane use case where we would _want_ EXTENT_SAME to change
> >>the mtime?  We do a lot of work to make sure that none of the files
> >>involved have any sort of content change.  Why do we need the flag at all?
> >
> >Good point, I don't see the usecase for updating MTIME.
> Was the original intent possibly to make certain the CTIME got
> updated?  Because EXTENT_SAME _does_ update the metadata, and by
> logical extension that means that the CTIME should change.

It updates the _extent_ metadata.  CTIME does not cover that, only _inode_
metadata changes.

Put another way, if we updated CTIME for extent metadata changes, then
a balance would imply rewriting every inode on the filesystem, which is
insane.  Same for defragment:  when we defragment files, we already
leave the MTIME and CTIME fields alone, and we use locking to protect
defrag from race conditions that might cause it to modify data.

You can confirm the behavior of balance and defrag on v4.0.5:

	# Here's a file:
	root@testhost:~# fiemap /media/usb7/vmlinuz
	File: /media/usb7/vmlinuz
	Log 0x0..0x654000 Phy 0xc10000..0x1264000 Flags FIEMAP_EXTENT_LAST
	root@testhost:~# ls --full -lc /media/usb7/vmlinuz
	-rw------- 1 root root 6634160 2015-06-25 12:37:16.265688748 -0400 /media/usb7/vmlinuz

	# Balance:
	root@testhost:~# btrfs balance start /media/usb7
	Done, had to relocate 5 out of 5 chunks

	# Confirm the file is an a new physical location:
	root@testhost:~# fiemap /media/usb7/vmlinuz
	File: /media/usb7/vmlinuz
	Log 0x0..0x654000 Phy 0x2b3d0000..0x2ba24000 Flags FIEMAP_EXTENT_LAST

	# We did not change the CTIME.
	root@testhost:~# ls --full -lc /media/usb7/vmlinuz
	-rw------- 1 root root 6634160 2015-06-25 12:37:16.265688748 -0400 /media/usb7/vmlinuz

	# Now let's try defrag!
	root@testhost:~# btrfs fi defrag -c /media/usb7/vmlinuz

	# File has been moved again, and parts of it are even compressed now
	root@testhost:~# fiemap /media/usb7/vmlinuz
	File: /media/usb7/vmlinuz
	Log 0x0..0x20000 Phy 0x2b390000..0x2b3b0000 Flags FIEMAP_EXTENT_ENCODED
	Log 0x20000..0x80000 Phy 0x2ba64000..0x2bac4000 Flags 0
	Log 0x80000..0x100000 Phy 0x2bac4000..0x2bb44000 Flags 0
	Log 0x100000..0x180000 Phy 0x2bb44000..0x2bbc4000 Flags 0
	Log 0x180000..0x200000 Phy 0x2bbc4000..0x2bc44000 Flags 0
	Log 0x200000..0x280000 Phy 0x2bc44000..0x2bcc4000 Flags 0
	Log 0x280000..0x300000 Phy 0x2bcc4000..0x2bd44000 Flags 0
	Log 0x300000..0x380000 Phy 0x2bd44000..0x2bdc4000 Flags 0
	Log 0x380000..0x400000 Phy 0x2bdc4000..0x2be44000 Flags 0
	Log 0x400000..0x480000 Phy 0x2be44000..0x2bec4000 Flags 0
	Log 0x480000..0x500000 Phy 0x2bec4000..0x2bf44000 Flags 0
	Log 0x500000..0x580000 Phy 0x2bf44000..0x2bfc4000 Flags 0
	Log 0x580000..0x600000 Phy 0x2bfc4000..0x2c044000 Flags 0
	Log 0x600000..0x654000 Phy 0x2c044000..0x2c098000 Flags FIEMAP_EXTENT_LAST

	# CTIME not changed (no changes to file content).
	root@testhost:~# ls --full -lc /media/usb7/vmlinuz
	-rw------- 1 root root 6634160 2015-06-25 12:37:16.265688748 -0400 /media/usb7/vmlinuz


> 
> 



[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same
  2015-06-25 12:52       ` David Sterba
  2015-06-25 13:10         ` Austin S Hemmelgarn
@ 2015-06-25 18:12         ` Mark Fasheh
  1 sibling, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2015-06-25 18:12 UTC (permalink / raw)
  To: dsterba, Zygo Blaxell, Chris Mason, Josef Bacik, linux-btrfs

On Thu, Jun 25, 2015 at 02:52:50PM +0200, David Sterba wrote:
> On Wed, Jun 24, 2015 at 04:17:32PM -0400, Zygo Blaxell wrote:
> > Is there any sane use case where we would _want_ EXTENT_SAME to change
> > the mtime?  We do a lot of work to make sure that none of the files
> > involved have any sort of content change.  Why do we need the flag at all?
> 
> Good point, I don't see the usecase for updating MTIME.

Yeah there isn't one and I doubt anyone will be upset if we just always
ignore the mtime update. I'll send some new patches shortly.

Thanks for the suggestion Zygo.
	--Mark

--
Mark Fasheh

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/5] btrfs: fix clone / extent-same deadlocks
  2015-06-30 21:42 [PATCH 0/5] btrfs: dedupe fixes, features V5 Mark Fasheh
@ 2015-06-30 21:42 ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2015-06-30 21:42 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

Clone and extent same lock their source and target inodes in opposite order.
In addition to this, the range locking in clone doesn't take ordering into
account. Fix this by having clone use the same locking helpers as
btrfs-extent-same.

In addition, I do a small cleanup of the locking helpers, removing a case
(both inodes being the same) which was poorly accounted for and never
actually used by the callers.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b899584..8d6887d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2831,8 +2831,7 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
 		swap(inode1, inode2);
 
 	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-	if (inode1 != inode2)
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
 }
 
 static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
@@ -2850,8 +2849,7 @@ static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
 		swap(loff1, loff2);
 	}
 	lock_extent_range(inode1, loff1, len);
-	if (inode1 != inode2)
-		lock_extent_range(inode2, loff2, len);
+	lock_extent_range(inode2, loff2, len);
 }
 
 struct cmp_pages {
@@ -3713,13 +3711,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		goto out_fput;
 
 	if (!same_inode) {
-		if (inode < src) {
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD);
-		} else {
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		}
+		btrfs_double_inode_lock(src, inode);
 	} else {
 		mutex_lock(&src->i_mutex);
 	}
@@ -3769,8 +3761,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		lock_extent_range(src, lock_start, lock_len);
 	} else {
-		lock_extent_range(src, off, len);
-		lock_extent_range(inode, destoff, len);
+		btrfs_double_extent_lock(src, off, inode, destoff, len);
 	}
 
 	ret = btrfs_clone(src, inode, off, olen, len, destoff);
@@ -3781,9 +3772,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		unlock_extent(&BTRFS_I(src)->io_tree, lock_start, lock_end);
 	} else {
-		unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
-		unlock_extent(&BTRFS_I(inode)->io_tree, destoff,
-			      destoff + len - 1);
+		btrfs_double_extent_unlock(src, off, inode, destoff, len);
 	}
 	/*
 	 * Truncate page cache pages so that future reads will see the cloned
@@ -3792,17 +3781,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	truncate_inode_pages_range(&inode->i_data, destoff,
 				   PAGE_CACHE_ALIGN(destoff + len) - 1);
 out_unlock:
-	if (!same_inode) {
-		if (inode < src) {
-			mutex_unlock(&src->i_mutex);
-			mutex_unlock(&inode->i_mutex);
-		} else {
-			mutex_unlock(&inode->i_mutex);
-			mutex_unlock(&src->i_mutex);
-		}
-	} else {
+	if (!same_inode)
+		btrfs_double_inode_unlock(src, inode);
+	else
 		mutex_unlock(&src->i_mutex);
-	}
 out_fput:
 	fdput(src_file);
 out_drop_write:
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] btrfs: fix clone / extent-same deadlocks
  2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
@ 2015-06-26 21:00 ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2015-06-26 21:00 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

Clone and extent same lock their source and target inodes in opposite order.
In addition to this, the range locking in clone doesn't take ordering into
account. Fix this by having clone use the same locking helpers as
btrfs-extent-same.

In addition, I do a small cleanup of the locking helpers, removing a case
(both inodes being the same) which was poorly accounted for and never
actually used by the callers.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b899584..8d6887d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2831,8 +2831,7 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
 		swap(inode1, inode2);
 
 	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-	if (inode1 != inode2)
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
 }
 
 static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
@@ -2850,8 +2849,7 @@ static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
 		swap(loff1, loff2);
 	}
 	lock_extent_range(inode1, loff1, len);
-	if (inode1 != inode2)
-		lock_extent_range(inode2, loff2, len);
+	lock_extent_range(inode2, loff2, len);
 }
 
 struct cmp_pages {
@@ -3713,13 +3711,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		goto out_fput;
 
 	if (!same_inode) {
-		if (inode < src) {
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD);
-		} else {
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		}
+		btrfs_double_inode_lock(src, inode);
 	} else {
 		mutex_lock(&src->i_mutex);
 	}
@@ -3769,8 +3761,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		lock_extent_range(src, lock_start, lock_len);
 	} else {
-		lock_extent_range(src, off, len);
-		lock_extent_range(inode, destoff, len);
+		btrfs_double_extent_lock(src, off, inode, destoff, len);
 	}
 
 	ret = btrfs_clone(src, inode, off, olen, len, destoff);
@@ -3781,9 +3772,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		unlock_extent(&BTRFS_I(src)->io_tree, lock_start, lock_end);
 	} else {
-		unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
-		unlock_extent(&BTRFS_I(inode)->io_tree, destoff,
-			      destoff + len - 1);
+		btrfs_double_extent_unlock(src, off, inode, destoff, len);
 	}
 	/*
 	 * Truncate page cache pages so that future reads will see the cloned
@@ -3792,17 +3781,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	truncate_inode_pages_range(&inode->i_data, destoff,
 				   PAGE_CACHE_ALIGN(destoff + len) - 1);
 out_unlock:
-	if (!same_inode) {
-		if (inode < src) {
-			mutex_unlock(&src->i_mutex);
-			mutex_unlock(&inode->i_mutex);
-		} else {
-			mutex_unlock(&inode->i_mutex);
-			mutex_unlock(&src->i_mutex);
-		}
-	} else {
+	if (!same_inode)
+		btrfs_double_inode_unlock(src, inode);
+	else
 		mutex_unlock(&src->i_mutex);
-	}
 out_fput:
 	fdput(src_file);
 out_drop_write:
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] btrfs: fix clone / extent-same deadlocks
  2015-06-23 21:28 [PATCH 0/5] btrfs: dedupe fixes, features V3 Mark Fasheh
@ 2015-06-23 21:28 ` Mark Fasheh
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Fasheh @ 2015-06-23 21:28 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, David Sterba, linux-btrfs, Mark Fasheh

Clone and extent same lock their source and target inodes in opposite order.
In addition to this, the range locking in clone doesn't take ordering into
account. Fix this by having clone use the same locking helpers as
btrfs-extent-same.

In addition, I do a small cleanup of the locking helpers, removing a case
(both inodes being the same) which was poorly accounted for and never
actually used by the callers.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
---
 fs/btrfs/ioctl.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b899584..8d6887d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2831,8 +2831,7 @@ static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
 		swap(inode1, inode2);
 
 	mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-	if (inode1 != inode2)
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
 }
 
 static void btrfs_double_extent_unlock(struct inode *inode1, u64 loff1,
@@ -2850,8 +2849,7 @@ static void btrfs_double_extent_lock(struct inode *inode1, u64 loff1,
 		swap(loff1, loff2);
 	}
 	lock_extent_range(inode1, loff1, len);
-	if (inode1 != inode2)
-		lock_extent_range(inode2, loff2, len);
+	lock_extent_range(inode2, loff2, len);
 }
 
 struct cmp_pages {
@@ -3713,13 +3711,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 		goto out_fput;
 
 	if (!same_inode) {
-		if (inode < src) {
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_CHILD);
-		} else {
-			mutex_lock_nested(&src->i_mutex, I_MUTEX_PARENT);
-			mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
-		}
+		btrfs_double_inode_lock(src, inode);
 	} else {
 		mutex_lock(&src->i_mutex);
 	}
@@ -3769,8 +3761,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		lock_extent_range(src, lock_start, lock_len);
 	} else {
-		lock_extent_range(src, off, len);
-		lock_extent_range(inode, destoff, len);
+		btrfs_double_extent_lock(src, off, inode, destoff, len);
 	}
 
 	ret = btrfs_clone(src, inode, off, olen, len, destoff);
@@ -3781,9 +3772,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 		unlock_extent(&BTRFS_I(src)->io_tree, lock_start, lock_end);
 	} else {
-		unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
-		unlock_extent(&BTRFS_I(inode)->io_tree, destoff,
-			      destoff + len - 1);
+		btrfs_double_extent_unlock(src, off, inode, destoff, len);
 	}
 	/*
 	 * Truncate page cache pages so that future reads will see the cloned
@@ -3792,17 +3781,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 	truncate_inode_pages_range(&inode->i_data, destoff,
 				   PAGE_CACHE_ALIGN(destoff + len) - 1);
 out_unlock:
-	if (!same_inode) {
-		if (inode < src) {
-			mutex_unlock(&src->i_mutex);
-			mutex_unlock(&inode->i_mutex);
-		} else {
-			mutex_unlock(&inode->i_mutex);
-			mutex_unlock(&src->i_mutex);
-		}
-	} else {
+	if (!same_inode)
+		btrfs_double_inode_unlock(src, inode);
+	else
 		mutex_unlock(&src->i_mutex);
-	}
 out_fput:
 	fdput(src_file);
 out_drop_write:
-- 
2.1.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-06-30 21:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-22 22:47 [PATCH 0/5] btrfs: dedupe fixes, features V2 Mark Fasheh
2015-06-22 22:47 ` [PATCH 1/5] btrfs: pass unaligned length to btrfs_cmp_data() Mark Fasheh
2015-06-22 22:47 ` [PATCH 2/5] btrfs: fix deadlock with extent-same and readpage Mark Fasheh
2015-06-22 22:47 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
2015-06-23 14:56   ` David Sterba
2015-06-22 22:47 ` [PATCH 4/5] btrfs: allow dedupe of same inode Mark Fasheh
2015-06-23 14:59   ` David Sterba
2015-06-22 22:47 ` [PATCH 5/5] btrfs: add no_mtime flag to btrfs-extent-same Mark Fasheh
2015-06-23 15:11   ` David Sterba
2015-06-23 17:11     ` Mark Fasheh
2015-06-24 20:17     ` Zygo Blaxell
2015-06-25 12:52       ` David Sterba
2015-06-25 13:10         ` Austin S Hemmelgarn
2015-06-25 16:52           ` Zygo Blaxell
2015-06-25 18:12         ` Mark Fasheh
2015-06-23 21:28 [PATCH 0/5] btrfs: dedupe fixes, features V3 Mark Fasheh
2015-06-23 21:28 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
2015-06-26 21:00 [PATCH 0/5] btrfs: dedupe fixes, features V4 Mark Fasheh
2015-06-26 21:00 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh
2015-06-30 21:42 [PATCH 0/5] btrfs: dedupe fixes, features V5 Mark Fasheh
2015-06-30 21:42 ` [PATCH 3/5] btrfs: fix clone / extent-same deadlocks Mark Fasheh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.