All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] xfs: various fixes for 5.3
@ 2019-08-08 14:46 Darrick J. Wong
  2019-08-08 14:46 ` [PATCH 1/3] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Darrick J. Wong @ 2019-08-08 14:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

Here are assorted fixes for 5.3.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs-5.3-fixes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] vfs: fix page locking deadlocks when deduping files
  2019-08-08 14:46 [PATCH 0/3] xfs: various fixes for 5.3 Darrick J. Wong
@ 2019-08-08 14:46 ` Darrick J. Wong
  2019-08-09 12:35   ` Bill O'Donnell
  2019-08-11 23:09   ` Dave Chinner
  2019-08-08 14:47 ` [PATCH 2/3] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
  2019-08-08 14:47 ` [PATCH 3/3] xfs: don't crash on null attr fork xfs_bmapi_read Darrick J. Wong
  2 siblings, 2 replies; 8+ messages in thread
From: Darrick J. Wong @ 2019-08-08 14:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

When dedupe wants to use the page cache to compare parts of two files
for dedupe, we must be very careful to handle locking correctly.  The
current code doesn't do this.  It must lock and unlock the page only
once if the two pages are the same, since the overlapping range check
doesn't catch this when blocksize < pagesize.  If the pages are distinct
but from the same file, we must observe page locking order and lock them
in order of increasing offset to avoid clashing with writeback locking.

Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/read_write.c |   36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)


diff --git a/fs/read_write.c b/fs/read_write.c
index 1f5088dec566..4dbdccffa59e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in,
 	return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL;
 }
 
-/*
- * Read a page's worth of file data into the page cache.  Return the page
- * locked.
- */
+/* Read a page's worth of file data into the page cache. */
 static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
 {
 	struct page *page;
@@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
 		put_page(page);
 		return ERR_PTR(-EIO);
 	}
-	lock_page(page);
 	return page;
 }
 
+/*
+ * Lock two pages, ensuring that we lock in offset order if the pages are from
+ * the same file.
+ */
+static void vfs_lock_two_pages(struct page *page1, struct page *page2)
+{
+	/* Always lock in order of increasing index. */
+	if (page1->index > page2->index)
+		swap(page1, page2);
+
+	lock_page(page1);
+	if (page1 != page2)
+		lock_page(page2);
+}
+
+/* Unlock two pages, being careful not to unlock the same page twice. */
+static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
+{
+	unlock_page(page1);
+	if (page1 != page2)
+		unlock_page(page2);
+}
+
 /*
  * Compare extents of two files to see if they are the same.
  * Caller must have locked both inodes to prevent write races.
@@ -1867,10 +1886,12 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 		dest_page = vfs_dedupe_get_page(dest, destoff);
 		if (IS_ERR(dest_page)) {
 			error = PTR_ERR(dest_page);
-			unlock_page(src_page);
 			put_page(src_page);
 			goto out_error;
 		}
+
+		vfs_lock_two_pages(src_page, dest_page);
+
 		src_addr = kmap_atomic(src_page);
 		dest_addr = kmap_atomic(dest_page);
 
@@ -1882,8 +1903,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 
 		kunmap_atomic(dest_addr);
 		kunmap_atomic(src_addr);
-		unlock_page(dest_page);
-		unlock_page(src_page);
+		vfs_unlock_two_pages(src_page, dest_page);
 		put_page(dest_page);
 		put_page(src_page);
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] xfs: remove more ondisk directory corruption asserts
  2019-08-08 14:46 [PATCH 0/3] xfs: various fixes for 5.3 Darrick J. Wong
  2019-08-08 14:46 ` [PATCH 1/3] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
@ 2019-08-08 14:47 ` Darrick J. Wong
  2019-08-08 16:05   ` Bill O'Donnell
  2019-08-08 14:47 ` [PATCH 3/3] xfs: don't crash on null attr fork xfs_bmapi_read Darrick J. Wong
  2 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2019-08-08 14:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Continue our game of replacing ASSERTs for corrupt ondisk metadata with
EFSCORRUPTED returns.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_da_btree.c  |   19 ++++++++++++-------
 fs/xfs/libxfs/xfs_dir2_node.c |    3 ++-
 2 files changed, 14 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index d1c77fd0815d..0bf56e94bfe9 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -487,10 +487,8 @@ xfs_da3_split(
 	ASSERT(state->path.active == 0);
 	oldblk = &state->path.blk[0];
 	error = xfs_da3_root_split(state, oldblk, addblk);
-	if (error) {
-		addblk->bp = NULL;
-		return error;	/* GROT: dir is inconsistent */
-	}
+	if (error)
+		goto out;
 
 	/*
 	 * Update pointers to the node which used to be block 0 and just got
@@ -505,7 +503,10 @@ xfs_da3_split(
 	 */
 	node = oldblk->bp->b_addr;
 	if (node->hdr.info.forw) {
-		ASSERT(be32_to_cpu(node->hdr.info.forw) == addblk->blkno);
+		if (be32_to_cpu(node->hdr.info.forw) != addblk->blkno) {
+			error = -EFSCORRUPTED;
+			goto out;
+		}
 		node = addblk->bp->b_addr;
 		node->hdr.info.back = cpu_to_be32(oldblk->blkno);
 		xfs_trans_log_buf(state->args->trans, addblk->bp,
@@ -514,15 +515,19 @@ xfs_da3_split(
 	}
 	node = oldblk->bp->b_addr;
 	if (node->hdr.info.back) {
-		ASSERT(be32_to_cpu(node->hdr.info.back) == addblk->blkno);
+		if (be32_to_cpu(node->hdr.info.back) != addblk->blkno) {
+			error = -EFSCORRUPTED;
+			goto out;
+		}
 		node = addblk->bp->b_addr;
 		node->hdr.info.forw = cpu_to_be32(oldblk->blkno);
 		xfs_trans_log_buf(state->args->trans, addblk->bp,
 				  XFS_DA_LOGRANGE(node, &node->hdr.info,
 				  sizeof(node->hdr.info)));
 	}
+out:
 	addblk->bp = NULL;
-	return 0;
+	return error;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index afcc6642690a..1fc44efc344d 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -741,7 +741,8 @@ xfs_dir2_leafn_lookup_for_entry(
 	ents = dp->d_ops->leaf_ents_p(leaf);
 
 	xfs_dir3_leaf_check(dp, bp);
-	ASSERT(leafhdr.count > 0);
+	if (leafhdr.count <= 0)
+		return -EFSCORRUPTED;
 
 	/*
 	 * Look up the hash value in the leaf entries.

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] xfs: don't crash on null attr fork xfs_bmapi_read
  2019-08-08 14:46 [PATCH 0/3] xfs: various fixes for 5.3 Darrick J. Wong
  2019-08-08 14:46 ` [PATCH 1/3] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
  2019-08-08 14:47 ` [PATCH 2/3] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
@ 2019-08-08 14:47 ` Darrick J. Wong
  2019-08-08 14:51   ` Bill O'Donnell
  2 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2019-08-08 14:47 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, Zorro Lang

From: Darrick J. Wong <darrick.wong@oracle.com>

Zorro Lang reported a crash in generic/475 if we try to inactivate a
corrupt inode with a NULL attr fork (stack trace shortened somewhat):

RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs]
RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51
RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012
RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef
R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004
R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001
FS:  00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0
Call Trace:
 xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs]
 xfs_da_read_buf+0xf5/0x2c0 [xfs]
 xfs_da3_node_read+0x1d/0x230 [xfs]
 xfs_attr_inactive+0x3cc/0x5e0 [xfs]
 xfs_inactive+0x4c8/0x5b0 [xfs]
 xfs_fs_destroy_inode+0x31b/0x8e0 [xfs]
 destroy_inode+0xbc/0x190
 xfs_bulkstat_one_int+0xa8c/0x1200 [xfs]
 xfs_bulkstat_one+0x16/0x20 [xfs]
 xfs_bulkstat+0x6fa/0xf20 [xfs]
 xfs_ioc_bulkstat+0x182/0x2b0 [xfs]
 xfs_file_ioctl+0xee0/0x12a0 [xfs]
 do_vfs_ioctl+0x193/0x1000
 ksys_ioctl+0x60/0x90
 __x64_sys_ioctl+0x6f/0xb0
 do_syscall_64+0x9f/0x4d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f11d39a3e5b

The "obvious" cause is that the attr ifork is null despite the inode
claiming an attr fork having at least one extent, but it's not so
obvious why we ended up with an inode in that state.

Reported-by: Zorro Lang <zlang@redhat.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index baf0b72c0a37..07aad70f3931 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3835,15 +3835,28 @@ xfs_bmapi_read(
 	XFS_STATS_INC(mp, xs_blk_mapr);
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
+	if (!ifp) {
+		/* No CoW fork?  Return a hole. */
+		if (whichfork == XFS_COW_FORK) {
+			mval->br_startoff = bno;
+			mval->br_startblock = HOLESTARTBLOCK;
+			mval->br_blockcount = len;
+			mval->br_state = XFS_EXT_NORM;
+			*nmap = 1;
+			return 0;
+		}
 
-	/* No CoW fork?  Return a hole. */
-	if (whichfork == XFS_COW_FORK && !ifp) {
-		mval->br_startoff = bno;
-		mval->br_startblock = HOLESTARTBLOCK;
-		mval->br_blockcount = len;
-		mval->br_state = XFS_EXT_NORM;
-		*nmap = 1;
-		return 0;
+		/*
+		 * A missing attr ifork implies that the inode says we're in
+		 * extents or btree format but failed to pass the inode fork
+		 * verifier while trying to load it.  Treat that as a file
+		 * corruption too.
+		 */
+#ifdef DEBUG
+		xfs_alert(mp, "%s: inode %llu missing fork %d",
+				__func__, ip->i_ino, whichfork);
+#endif /* DEBUG */
+		return -EFSCORRUPTED;
 	}
 
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] xfs: don't crash on null attr fork xfs_bmapi_read
  2019-08-08 14:47 ` [PATCH 3/3] xfs: don't crash on null attr fork xfs_bmapi_read Darrick J. Wong
@ 2019-08-08 14:51   ` Bill O'Donnell
  0 siblings, 0 replies; 8+ messages in thread
From: Bill O'Donnell @ 2019-08-08 14:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Zorro Lang

On Thu, Aug 08, 2019 at 07:47:08AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Zorro Lang reported a crash in generic/475 if we try to inactivate a
> corrupt inode with a NULL attr fork (stack trace shortened somewhat):
> 
> RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs]
> RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51
> RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012
> RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef
> R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004
> R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001
> FS:  00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0
> Call Trace:
>  xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs]
>  xfs_da_read_buf+0xf5/0x2c0 [xfs]
>  xfs_da3_node_read+0x1d/0x230 [xfs]
>  xfs_attr_inactive+0x3cc/0x5e0 [xfs]
>  xfs_inactive+0x4c8/0x5b0 [xfs]
>  xfs_fs_destroy_inode+0x31b/0x8e0 [xfs]
>  destroy_inode+0xbc/0x190
>  xfs_bulkstat_one_int+0xa8c/0x1200 [xfs]
>  xfs_bulkstat_one+0x16/0x20 [xfs]
>  xfs_bulkstat+0x6fa/0xf20 [xfs]
>  xfs_ioc_bulkstat+0x182/0x2b0 [xfs]
>  xfs_file_ioctl+0xee0/0x12a0 [xfs]
>  do_vfs_ioctl+0x193/0x1000
>  ksys_ioctl+0x60/0x90
>  __x64_sys_ioctl+0x6f/0xb0
>  do_syscall_64+0x9f/0x4d0
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f11d39a3e5b
> 
> The "obvious" cause is that the attr ifork is null despite the inode
> claiming an attr fork having at least one extent, but it's not so
> obvious why we ended up with an inode in that state.
> 
> Reported-by: Zorro Lang <zlang@redhat.com>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Makes sense, and alleviates the problem I occasionally see on g/475.

Reviewed-by: Bill O'Donnell <billodo@redhat.com>

> ---
>  fs/xfs/libxfs/xfs_bmap.c |   29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index baf0b72c0a37..07aad70f3931 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3835,15 +3835,28 @@ xfs_bmapi_read(
>  	XFS_STATS_INC(mp, xs_blk_mapr);
>  
>  	ifp = XFS_IFORK_PTR(ip, whichfork);
> +	if (!ifp) {
> +		/* No CoW fork?  Return a hole. */
> +		if (whichfork == XFS_COW_FORK) {
> +			mval->br_startoff = bno;
> +			mval->br_startblock = HOLESTARTBLOCK;
> +			mval->br_blockcount = len;
> +			mval->br_state = XFS_EXT_NORM;
> +			*nmap = 1;
> +			return 0;
> +		}
>  
> -	/* No CoW fork?  Return a hole. */
> -	if (whichfork == XFS_COW_FORK && !ifp) {
> -		mval->br_startoff = bno;
> -		mval->br_startblock = HOLESTARTBLOCK;
> -		mval->br_blockcount = len;
> -		mval->br_state = XFS_EXT_NORM;
> -		*nmap = 1;
> -		return 0;
> +		/*
> +		 * A missing attr ifork implies that the inode says we're in
> +		 * extents or btree format but failed to pass the inode fork
> +		 * verifier while trying to load it.  Treat that as a file
> +		 * corruption too.
> +		 */
> +#ifdef DEBUG
> +		xfs_alert(mp, "%s: inode %llu missing fork %d",
> +				__func__, ip->i_ino, whichfork);
> +#endif /* DEBUG */
> +		return -EFSCORRUPTED;
>  	}
>  
>  	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] xfs: remove more ondisk directory corruption asserts
  2019-08-08 14:47 ` [PATCH 2/3] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
@ 2019-08-08 16:05   ` Bill O'Donnell
  0 siblings, 0 replies; 8+ messages in thread
From: Bill O'Donnell @ 2019-08-08 16:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Aug 08, 2019 at 07:47:02AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Continue our game of replacing ASSERTs for corrupt ondisk metadata with
> EFSCORRUPTED returns.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good to me.
Reviewed-by: Bill O'Donnell <billodo@redhat.com>

> ---
>  fs/xfs/libxfs/xfs_da_btree.c  |   19 ++++++++++++-------
>  fs/xfs/libxfs/xfs_dir2_node.c |    3 ++-
>  2 files changed, 14 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index d1c77fd0815d..0bf56e94bfe9 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -487,10 +487,8 @@ xfs_da3_split(
>  	ASSERT(state->path.active == 0);
>  	oldblk = &state->path.blk[0];
>  	error = xfs_da3_root_split(state, oldblk, addblk);
> -	if (error) {
> -		addblk->bp = NULL;
> -		return error;	/* GROT: dir is inconsistent */
> -	}
> +	if (error)
> +		goto out;
>  
>  	/*
>  	 * Update pointers to the node which used to be block 0 and just got
> @@ -505,7 +503,10 @@ xfs_da3_split(
>  	 */
>  	node = oldblk->bp->b_addr;
>  	if (node->hdr.info.forw) {
> -		ASSERT(be32_to_cpu(node->hdr.info.forw) == addblk->blkno);
> +		if (be32_to_cpu(node->hdr.info.forw) != addblk->blkno) {
> +			error = -EFSCORRUPTED;
> +			goto out;
> +		}
>  		node = addblk->bp->b_addr;
>  		node->hdr.info.back = cpu_to_be32(oldblk->blkno);
>  		xfs_trans_log_buf(state->args->trans, addblk->bp,
> @@ -514,15 +515,19 @@ xfs_da3_split(
>  	}
>  	node = oldblk->bp->b_addr;
>  	if (node->hdr.info.back) {
> -		ASSERT(be32_to_cpu(node->hdr.info.back) == addblk->blkno);
> +		if (be32_to_cpu(node->hdr.info.back) != addblk->blkno) {
> +			error = -EFSCORRUPTED;
> +			goto out;
> +		}
>  		node = addblk->bp->b_addr;
>  		node->hdr.info.forw = cpu_to_be32(oldblk->blkno);
>  		xfs_trans_log_buf(state->args->trans, addblk->bp,
>  				  XFS_DA_LOGRANGE(node, &node->hdr.info,
>  				  sizeof(node->hdr.info)));
>  	}
> +out:
>  	addblk->bp = NULL;
> -	return 0;
> +	return error;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
> index afcc6642690a..1fc44efc344d 100644
> --- a/fs/xfs/libxfs/xfs_dir2_node.c
> +++ b/fs/xfs/libxfs/xfs_dir2_node.c
> @@ -741,7 +741,8 @@ xfs_dir2_leafn_lookup_for_entry(
>  	ents = dp->d_ops->leaf_ents_p(leaf);
>  
>  	xfs_dir3_leaf_check(dp, bp);
> -	ASSERT(leafhdr.count > 0);
> +	if (leafhdr.count <= 0)
> +		return -EFSCORRUPTED;
>  
>  	/*
>  	 * Look up the hash value in the leaf entries.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] vfs: fix page locking deadlocks when deduping files
  2019-08-08 14:46 ` [PATCH 1/3] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
@ 2019-08-09 12:35   ` Bill O'Donnell
  2019-08-11 23:09   ` Dave Chinner
  1 sibling, 0 replies; 8+ messages in thread
From: Bill O'Donnell @ 2019-08-09 12:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Aug 08, 2019 at 07:46:56AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When dedupe wants to use the page cache to compare parts of two files
> for dedupe, we must be very careful to handle locking correctly.  The
> current code doesn't do this.  It must lock and unlock the page only
> once if the two pages are the same, since the overlapping range check
> doesn't catch this when blocksize < pagesize.  If the pages are distinct
> but from the same file, we must observe page locking order and lock them
> in order of increasing offset to avoid clashing with writeback locking.
> 
> Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Bill O'Donnell <billodo@redhat.com>

> ---
>  fs/read_write.c |   36 ++++++++++++++++++++++++++++--------
>  1 file changed, 28 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 1f5088dec566..4dbdccffa59e 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in,
>  	return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL;
>  }
>  
> -/*
> - * Read a page's worth of file data into the page cache.  Return the page
> - * locked.
> - */
> +/* Read a page's worth of file data into the page cache. */
>  static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>  {
>  	struct page *page;
> @@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>  		put_page(page);
>  		return ERR_PTR(-EIO);
>  	}
> -	lock_page(page);
>  	return page;
>  }
>  
> +/*
> + * Lock two pages, ensuring that we lock in offset order if the pages are from
> + * the same file.
> + */
> +static void vfs_lock_two_pages(struct page *page1, struct page *page2)
> +{
> +	/* Always lock in order of increasing index. */
> +	if (page1->index > page2->index)
> +		swap(page1, page2);
> +
> +	lock_page(page1);
> +	if (page1 != page2)
> +		lock_page(page2);
> +}
> +
> +/* Unlock two pages, being careful not to unlock the same page twice. */
> +static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
> +{
> +	unlock_page(page1);
> +	if (page1 != page2)
> +		unlock_page(page2);
> +}
> +
>  /*
>   * Compare extents of two files to see if they are the same.
>   * Caller must have locked both inodes to prevent write races.
> @@ -1867,10 +1886,12 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>  		dest_page = vfs_dedupe_get_page(dest, destoff);
>  		if (IS_ERR(dest_page)) {
>  			error = PTR_ERR(dest_page);
> -			unlock_page(src_page);
>  			put_page(src_page);
>  			goto out_error;
>  		}
> +
> +		vfs_lock_two_pages(src_page, dest_page);
> +
>  		src_addr = kmap_atomic(src_page);
>  		dest_addr = kmap_atomic(dest_page);
>  
> @@ -1882,8 +1903,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>  
>  		kunmap_atomic(dest_addr);
>  		kunmap_atomic(src_addr);
> -		unlock_page(dest_page);
> -		unlock_page(src_page);
> +		vfs_unlock_two_pages(src_page, dest_page);
>  		put_page(dest_page);
>  		put_page(src_page);
>  
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/3] vfs: fix page locking deadlocks when deduping files
  2019-08-08 14:46 ` [PATCH 1/3] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
  2019-08-09 12:35   ` Bill O'Donnell
@ 2019-08-11 23:09   ` Dave Chinner
  1 sibling, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2019-08-11 23:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Aug 08, 2019 at 07:46:56AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When dedupe wants to use the page cache to compare parts of two files
> for dedupe, we must be very careful to handle locking correctly.  The
> current code doesn't do this.  It must lock and unlock the page only
> once if the two pages are the same, since the overlapping range check
> doesn't catch this when blocksize < pagesize.  If the pages are distinct
> but from the same file, we must observe page locking order and lock them
> in order of increasing offset to avoid clashing with writeback locking.
> 
> Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/read_write.c |   36 ++++++++++++++++++++++++++++--------
>  1 file changed, 28 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 1f5088dec566..4dbdccffa59e 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in,
>  	return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL;
>  }
>  
> -/*
> - * Read a page's worth of file data into the page cache.  Return the page
> - * locked.
> - */
> +/* Read a page's worth of file data into the page cache. */
>  static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>  {
>  	struct page *page;
> @@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>  		put_page(page);
>  		return ERR_PTR(-EIO);
>  	}
> -	lock_page(page);
>  	return page;
>  }
>  
> +/*
> + * Lock two pages, ensuring that we lock in offset order if the pages are from
> + * the same file.
> + */
> +static void vfs_lock_two_pages(struct page *page1, struct page *page2)
> +{
> +	/* Always lock in order of increasing index. */
> +	if (page1->index > page2->index)
> +		swap(page1, page2);
> +
> +	lock_page(page1);
> +	if (page1 != page2)
> +		lock_page(page2);
> +}
> +
> +/* Unlock two pages, being careful not to unlock the same page twice. */
> +static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
> +{
> +	unlock_page(page1);
> +	if (page1 != page2)
> +		unlock_page(page2);
> +}
> +
>  /*
>   * Compare extents of two files to see if they are the same.
>   * Caller must have locked both inodes to prevent write races.
> @@ -1867,10 +1886,12 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>  		dest_page = vfs_dedupe_get_page(dest, destoff);
>  		if (IS_ERR(dest_page)) {
>  			error = PTR_ERR(dest_page);
> -			unlock_page(src_page);
>  			put_page(src_page);
>  			goto out_error;
>  		}
> +
> +		vfs_lock_two_pages(src_page, dest_page);
> +

Locking looks fine now, but....

... don't we need to check for invalidation races on the source page
here because the src inode is only locked shared and so can race with
things like direct IO under shared inode locking doing invalidation?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-08-11 23:11 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-08 14:46 [PATCH 0/3] xfs: various fixes for 5.3 Darrick J. Wong
2019-08-08 14:46 ` [PATCH 1/3] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
2019-08-09 12:35   ` Bill O'Donnell
2019-08-11 23:09   ` Dave Chinner
2019-08-08 14:47 ` [PATCH 2/3] xfs: remove more ondisk directory corruption asserts Darrick J. Wong
2019-08-08 16:05   ` Bill O'Donnell
2019-08-08 14:47 ` [PATCH 3/3] xfs: don't crash on null attr fork xfs_bmapi_read Darrick J. Wong
2019-08-08 14:51   ` Bill O'Donnell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.