All of lore.kernel.org
 help / color / mirror / Atom feed
* reflink fixes
@ 2016-01-03 12:07 Christoph Hellwig
  2016-01-03 12:07 ` [PATCH 1/3] xfs: pass inode instead of file to xfs_reflink_dirty_range Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: xfs

Hi Darrick,

below are three patches I need to xfstests to pass reliably for the
reflink code.  I actually still see a lot of issues when testing
NFS clones on XFS due to extents hanging on the COW for too long,
but that's a separate issue I still need to investigate.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] xfs: pass inode instead of file to xfs_reflink_dirty_range
  2016-01-03 12:07 reflink fixes Christoph Hellwig
@ 2016-01-03 12:07 ` Christoph Hellwig
  2016-01-05  1:23   ` Darrick J. Wong
  2016-01-03 12:07 ` [PATCH 2/3] xfs: only end a COW operation in xfs_zero_remaining_bytes if we started one Christoph Hellwig
  2016-01-03 12:07 ` [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend Christoph Hellwig
  2 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: xfs

We don't actually need a file for write_begin/end, those can be passed
as NULL for disk based file systems.  This is important as we won't
even have a file pointer during a truncate operation, which gives a
guaranteed NULL pointer dererference with the current code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_reflink.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index da4a715..4a3f0ee 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1316,7 +1316,7 @@ out_error:
  */
 STATIC int
 xfs_reflink_dirty_range(
-	struct file		*filp,
+	struct inode		*inode,
 	xfs_off_t		pos,
 	xfs_off_t		len)
 {
@@ -1330,14 +1330,14 @@ xfs_reflink_dirty_range(
 	unsigned long		bytes;	/* Bytes to write to page */
 	void			*fsdata;
 
-	mapping = filp->f_mapping;
+	mapping = inode->i_mapping;
 	a_ops = mapping->a_ops;
 	flags = AOP_FLAG_UNINTERRUPTIBLE;
 	do {
 
 		offset = (pos & (PAGE_CACHE_SIZE - 1));
 		bytes = min_t(unsigned long, len, PAGE_CACHE_SIZE) - offset;
-		rpage = xfs_get_page(file_inode(filp), pos);
+		rpage = xfs_get_page(inode, pos);
 		if (IS_ERR(rpage)) {
 			error = PTR_ERR(rpage);
 			break;
@@ -1346,24 +1346,24 @@ xfs_reflink_dirty_range(
 			break;
 		}
 
-		error = a_ops->write_begin(filp, mapping, pos, bytes, flags,
+		error = a_ops->write_begin(NULL, mapping, pos, bytes, flags,
 					   &page, &fsdata);
 		page_cache_release(rpage);
 		if (error < 0)
 			break;
 
-		trace_xfs_reflink_unshare_page(file_inode(filp), page,
+		trace_xfs_reflink_unshare_page(inode, page,
 				pos, bytes);
 
 		if (!PageUptodate(page)) {
-			pr_err("%s: STALE? ino=%lu pos=%llu\n",
-				__func__, filp->f_inode->i_ino, pos);
+			pr_err("%s: STALE? ino=%llu pos=%llu\n",
+				__func__, XFS_I(inode)->i_ino, pos);
 			WARN_ON(1);
 		}
 		if (mapping_writably_mapped(mapping))
 			flush_dcache_page(page);
 
-		error = a_ops->write_end(filp, mapping, pos, bytes, bytes,
+		error = a_ops->write_end(NULL, mapping, pos, bytes, bytes,
 					 page, fsdata);
 		if (error < 0)
 			break;
@@ -1454,7 +1454,7 @@ xfs_reflink_dirty_extents(
 			flen = XFS_FSB_TO_B(mp, rlen);
 			if (fpos + flen > isize)
 				flen = isize - fpos;
-			error = xfs_reflink_dirty_range(filp, fpos, flen);
+			error = xfs_reflink_dirty_range(VFS_I(ip), fpos, flen);
 			xfs_ilock(ip, XFS_ILOCK_EXCL);
 			if (error)
 				goto out;
-- 
1.9.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] xfs: only end a COW operation in xfs_zero_remaining_bytes if we started one
  2016-01-03 12:07 reflink fixes Christoph Hellwig
  2016-01-03 12:07 ` [PATCH 1/3] xfs: pass inode instead of file to xfs_reflink_dirty_range Christoph Hellwig
@ 2016-01-03 12:07 ` Christoph Hellwig
  2016-01-05  1:28   ` Darrick J. Wong
  2016-01-03 12:07 ` [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend Christoph Hellwig
  2 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: xfs

Without these we can see NULL pointer dereferences due to a non-existing
COW fork during xfstests runs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 7bee3c7..e777095 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1226,13 +1226,15 @@ xfs_zero_remaining_bytes(
 
 		error = xfs_bwrite(bp);
 		xfs_buf_relse(bp);
-		if (error) {
-			err2 = xfs_reflink_end_cow_failed(ip, offset,
+		if (should_fork) {
+			if (error) {
+				err2 = xfs_reflink_end_cow_failed(ip, offset,
+						lastoffset - offset + 1);
+				return error;
+			}
+			error = xfs_reflink_end_cow(ip, offset,
 					lastoffset - offset + 1);
-			return error;
 		}
-		error = xfs_reflink_end_cow(ip, offset,
-				lastoffset - offset + 1);
 		if (error)
 			return error;
 	}
-- 
1.9.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-03 12:07 reflink fixes Christoph Hellwig
  2016-01-03 12:07 ` [PATCH 1/3] xfs: pass inode instead of file to xfs_reflink_dirty_range Christoph Hellwig
  2016-01-03 12:07 ` [PATCH 2/3] xfs: only end a COW operation in xfs_zero_remaining_bytes if we started one Christoph Hellwig
@ 2016-01-03 12:07 ` Christoph Hellwig
  2016-01-05  1:43   ` Darrick J. Wong
  2 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:07 UTC (permalink / raw)
  To: darrick.wong; +Cc: xfs

Otherwise we leak COW allocations done earlier in writepage.  This
can be reproduced fairly easily when we hit the non-blocking writeback
EAGAIN case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 185415a..9c69dc3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -588,6 +588,7 @@ xfs_cancel_ioend(
 {
 	xfs_ioend_t		*next;
 	struct buffer_head	*bh, *next_bh;
+	int			error;
 
 	do {
 		next = ioend->io_list;
@@ -605,6 +606,12 @@ xfs_cancel_ioend(
 			unlock_buffer(bh);
 		} while ((bh = next_bh) != NULL);
 
+		if (ioend->io_flags & XFS_IOEND_COW) {
+			error = xfs_reflink_end_cow_failed(
+					XFS_I(ioend->io_inode),
+					ioend->io_offset, ioend->io_size);
+			WARN_ON_ONCE(error);
+		}
 		mempool_free(ioend, xfs_ioend_pool);
 	} while ((ioend = next) != NULL);
 }
-- 
1.9.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] xfs: pass inode instead of file to xfs_reflink_dirty_range
  2016-01-03 12:07 ` [PATCH 1/3] xfs: pass inode instead of file to xfs_reflink_dirty_range Christoph Hellwig
@ 2016-01-05  1:23   ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2016-01-05  1:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Jan 03, 2016 at 01:07:51PM +0100, Christoph Hellwig wrote:
> We don't actually need a file for write_begin/end, those can be passed
> as NULL for disk based file systems.  This is important as we won't

I wasn't aware that one could /do/ that. :)

Looks good, though.  I think I can clean the filp arguments out of the reflink
code entirely, too.

--D

> even have a file pointer during a truncate operation, which gives a
> guaranteed NULL pointer dererference with the current code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_reflink.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index da4a715..4a3f0ee 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1316,7 +1316,7 @@ out_error:
>   */
>  STATIC int
>  xfs_reflink_dirty_range(
> -	struct file		*filp,
> +	struct inode		*inode,
>  	xfs_off_t		pos,
>  	xfs_off_t		len)
>  {
> @@ -1330,14 +1330,14 @@ xfs_reflink_dirty_range(
>  	unsigned long		bytes;	/* Bytes to write to page */
>  	void			*fsdata;
>  
> -	mapping = filp->f_mapping;
> +	mapping = inode->i_mapping;
>  	a_ops = mapping->a_ops;
>  	flags = AOP_FLAG_UNINTERRUPTIBLE;
>  	do {
>  
>  		offset = (pos & (PAGE_CACHE_SIZE - 1));
>  		bytes = min_t(unsigned long, len, PAGE_CACHE_SIZE) - offset;
> -		rpage = xfs_get_page(file_inode(filp), pos);
> +		rpage = xfs_get_page(inode, pos);
>  		if (IS_ERR(rpage)) {
>  			error = PTR_ERR(rpage);
>  			break;
> @@ -1346,24 +1346,24 @@ xfs_reflink_dirty_range(
>  			break;
>  		}
>  
> -		error = a_ops->write_begin(filp, mapping, pos, bytes, flags,
> +		error = a_ops->write_begin(NULL, mapping, pos, bytes, flags,
>  					   &page, &fsdata);
>  		page_cache_release(rpage);
>  		if (error < 0)
>  			break;
>  
> -		trace_xfs_reflink_unshare_page(file_inode(filp), page,
> +		trace_xfs_reflink_unshare_page(inode, page,
>  				pos, bytes);
>  
>  		if (!PageUptodate(page)) {
> -			pr_err("%s: STALE? ino=%lu pos=%llu\n",
> -				__func__, filp->f_inode->i_ino, pos);
> +			pr_err("%s: STALE? ino=%llu pos=%llu\n",
> +				__func__, XFS_I(inode)->i_ino, pos);
>  			WARN_ON(1);
>  		}
>  		if (mapping_writably_mapped(mapping))
>  			flush_dcache_page(page);
>  
> -		error = a_ops->write_end(filp, mapping, pos, bytes, bytes,
> +		error = a_ops->write_end(NULL, mapping, pos, bytes, bytes,
>  					 page, fsdata);
>  		if (error < 0)
>  			break;
> @@ -1454,7 +1454,7 @@ xfs_reflink_dirty_extents(
>  			flen = XFS_FSB_TO_B(mp, rlen);
>  			if (fpos + flen > isize)
>  				flen = isize - fpos;
> -			error = xfs_reflink_dirty_range(filp, fpos, flen);
> +			error = xfs_reflink_dirty_range(VFS_I(ip), fpos, flen);
>  			xfs_ilock(ip, XFS_ILOCK_EXCL);
>  			if (error)
>  				goto out;
> -- 
> 1.9.1
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xfs: only end a COW operation in xfs_zero_remaining_bytes if we started one
  2016-01-03 12:07 ` [PATCH 2/3] xfs: only end a COW operation in xfs_zero_remaining_bytes if we started one Christoph Hellwig
@ 2016-01-05  1:28   ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2016-01-05  1:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Jan 03, 2016 at 01:07:52PM +0100, Christoph Hellwig wrote:
> Without these we can see NULL pointer dereferences due to a non-existing
> COW fork during xfstests runs.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_bmap_util.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 7bee3c7..e777095 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1226,13 +1226,15 @@ xfs_zero_remaining_bytes(
>  
>  		error = xfs_bwrite(bp);
>  		xfs_buf_relse(bp);
> -		if (error) {
> -			err2 = xfs_reflink_end_cow_failed(ip, offset,
> +		if (should_fork) {
> +			if (error) {
> +				err2 = xfs_reflink_end_cow_failed(ip, offset,
> +						lastoffset - offset + 1);
> +				return error;
> +			}
> +			error = xfs_reflink_end_cow(ip, offset,
>  					lastoffset - offset + 1);
> -			return error;
>  		}

Good catch!  There's a buffer leak in the error case for xfs_map_cow_blocks()
so I'll fold them both into the original patch.

--D

> -		error = xfs_reflink_end_cow(ip, offset,
> -				lastoffset - offset + 1);
>  		if (error)
>  			return error;
>  	}
> -- 
> 1.9.1
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-03 12:07 ` [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend Christoph Hellwig
@ 2016-01-05  1:43   ` Darrick J. Wong
  2016-01-05 10:42     ` Christoph Hellwig
  2016-01-10 22:48     ` Darrick J. Wong
  0 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2016-01-05  1:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Jan 03, 2016 at 01:07:53PM +0100, Christoph Hellwig wrote:
> Otherwise we leak COW allocations done earlier in writepage.  This
> can be reproduced fairly easily when we hit the non-blocking writeback
> EAGAIN case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_aops.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 185415a..9c69dc3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -588,6 +588,7 @@ xfs_cancel_ioend(
>  {
>  	xfs_ioend_t		*next;
>  	struct buffer_head	*bh, *next_bh;
> +	int			error;
>  
>  	do {
>  		next = ioend->io_list;
> @@ -605,6 +606,12 @@ xfs_cancel_ioend(
>  			unlock_buffer(bh);
>  		} while ((bh = next_bh) != NULL);
>  
> +		if (ioend->io_flags & XFS_IOEND_COW) {
> +			error = xfs_reflink_end_cow_failed(
> +					XFS_I(ioend->io_inode),
> +					ioend->io_offset, ioend->io_size);
> +			WARN_ON_ONCE(error);
> +		}

Hmm.  This might be the cause of the occasional complaints I've been seeing
where allocated blocks remain in the COW fork when the inode is being cleared
out.  That said, the xfs_reflink_end_cow_failed() is apparently missing a
xfs_bunmapi_cow() to actually clean out the COW fork.

Good catch, in any case.  Thank you for the testing and patches! :)

--D

>  		mempool_free(ioend, xfs_ioend_pool);
>  	} while ((ioend = next) != NULL);
>  }
> -- 
> 1.9.1
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-05  1:43   ` Darrick J. Wong
@ 2016-01-05 10:42     ` Christoph Hellwig
  2016-01-07  0:32       ` Darrick J. Wong
  2016-01-10 22:48     ` Darrick J. Wong
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-01-05 10:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs

On Mon, Jan 04, 2016 at 05:43:10PM -0800, Darrick J. Wong wrote:
> Hmm.  This might be the cause of the occasional complaints I've been seeing
> where allocated blocks remain in the COW fork when the inode is being cleared
> out.  That said, the xfs_reflink_end_cow_failed() is apparently missing a
> xfs_bunmapi_cow() to actually clean out the COW fork.

I can still reproduce xfs_reflink_cancel_pending_cow tripping over
allocated blocks in the COW fork over NFS.  generic/154 reproduces
it 100% over NFS, although when adding a delay before the cleanup
it disappears.  I'm currently trying to figure out why.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-05 10:42     ` Christoph Hellwig
@ 2016-01-07  0:32       ` Darrick J. Wong
  2016-01-07 15:25         ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2016-01-07  0:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Christoph Hellwig, xfs

On Tue, Jan 05, 2016 at 02:42:14AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 04, 2016 at 05:43:10PM -0800, Darrick J. Wong wrote:
> > Hmm.  This might be the cause of the occasional complaints I've been seeing
> > where allocated blocks remain in the COW fork when the inode is being cleared
> > out.  That said, the xfs_reflink_end_cow_failed() is apparently missing a
> > xfs_bunmapi_cow() to actually clean out the COW fork.
> 
> I can still reproduce xfs_reflink_cancel_pending_cow tripping over
> allocated blocks in the COW fork over NFS.  generic/154 reproduces
> it 100% over NFS, although when adding a delay before the cleanup
> it disappears.  I'm currently trying to figure out why.

Ok.  I spent a couple of days trying to find all the places where we need to
delete CoW reservations (hole punch, truncate, etc.) and found some places
where the code was leaving reservations behind in the CoW fork (most notable
truncate).  I also made the inode eviction code purge any CoW leftovers, so
that should all go away.

I also wrote some more xfstests that try to hit all the CoW-cancelling code
paths (fpunch, fzero, fcollapse, finsert, truncate, -EIO) to smoke test all
that.  By the way, do you have a testcase handy for the "non-blocking writeback
EAGAIN" case?  I'm guessing that we could hit that pretty easily by lowering
dirty_background_* and dirtying a lot of pages while reflinking?

(Will push new code to github in a day or two.)

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-07  0:32       ` Darrick J. Wong
@ 2016-01-07 15:25         ` Christoph Hellwig
  2016-01-08 10:09           ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-01-07 15:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Wed, Jan 06, 2016 at 04:32:27PM -0800, Darrick J. Wong wrote:
> Ok.  I spent a couple of days trying to find all the places where we need to
> delete CoW reservations (hole punch, truncate, etc.) and found some places
> where the code was leaving reservations behind in the CoW fork (most notable
> truncate).  I also made the inode eviction code purge any CoW leftovers, so
> that should all go away.

Can you send that part out for NFS testing?

> I also wrote some more xfstests that try to hit all the CoW-cancelling code
> paths (fpunch, fzero, fcollapse, finsert, truncate, -EIO) to smoke test all
> that.  By the way, do you have a testcase handy for the "non-blocking writeback
> EAGAIN" case?  I'm guessing that we could hit that pretty easily by lowering
> dirty_background_* and dirtying a lot of pages while reflinking?

I did hit it pretty easily testing over NFS to a local nfs server with
xfstests.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-07 15:25         ` Christoph Hellwig
@ 2016-01-08 10:09           ` Darrick J. Wong
  2016-01-08 13:47             ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2016-01-08 10:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Jan 07, 2016 at 04:25:42PM +0100, Christoph Hellwig wrote:
> On Wed, Jan 06, 2016 at 04:32:27PM -0800, Darrick J. Wong wrote:
> > Ok.  I spent a couple of days trying to find all the places where we need to
> > delete CoW reservations (hole punch, truncate, etc.) and found some places
> > where the code was leaving reservations behind in the CoW fork (most notable
> > truncate).  I also made the inode eviction code purge any CoW leftovers, so
> > that should all go away.
> 
> Can you send that part out for NFS testing?

Ok, I've uploaded the latest kernel + xfstests to github:
https://github.com/djwong/linux/commits/for-dave
https://github.com/djwong/xfstests/tree/for-dave

(Big pile of new tests to try to exercise the cow cancellation code.)

> 
> > I also wrote some more xfstests that try to hit all the CoW-cancelling code
> > paths (fpunch, fzero, fcollapse, finsert, truncate, -EIO) to smoke test all
> > that.  By the way, do you have a testcase handy for the "non-blocking writeback
> > EAGAIN" case?  I'm guessing that we could hit that pretty easily by lowering
> > dirty_background_* and dirtying a lot of pages while reflinking?
> 
> I did hit it pretty easily testing over NFS to a local nfs server with
> xfstests.

Hmm.  If my latest round of fixes didn't get this one, I'll see next week if
I can get the NFS stuff working so I can test it directly.

--D

> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-08 10:09           ` Darrick J. Wong
@ 2016-01-08 13:47             ` Christoph Hellwig
  2016-01-09 21:17               ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-01-08 13:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

Thanks Darrick!

I've not seen the inode eviction asserts anymore, but I now hit a
corruption warnings in generic/168 reliably.  I did hit before as
well, but not very reliably.

generic/168 17s ...[  296.988867] XFS (vdc): Metadata corruption detected at
xfs_refcountbt_write_verify+0x3e/0x90, block 0xb0
[  296.990412] XFS (vdc): Unmount and run xfs_repair
[  296.991300] XFS (vdc): First 64 bytes of corrupted metadata buffer:
[  296.992349] ffff88007b755000: 52 33 46 43 00 01 00 01 ff ff ff ff ff ff ff
ff  R3FC............
[  296.993893] ffff88007b755010: 00 00 00 00 00 00 00 b0 00 00 00 00 00 00 00
00  ................
[  297.003293] ffff88007b755020: fe 94 c7 69 fd bd 4a 77 85 3d 9d 6d 7b 45 38
07  ...i..Jw.=.m{E8.
[  297.004442] ffff88007b755030: 00 00 00 00 00 00 00 00 00 00 0b 9c 00 00 7f
07  ................
[  297.005867] XFS (vdc): xfs_do_force_shutdown(0x8) called from line 1247 of
file fs/xfs/xfs_buf.c.  Return address = 0xffffffff814cce52

I'll investigate it in a little more detail.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-08 13:47             ` Christoph Hellwig
@ 2016-01-09 21:17               ` Darrick J. Wong
  2016-01-10  7:54                 ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2016-01-09 21:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jan 08, 2016 at 02:47:04PM +0100, Christoph Hellwig wrote:
> Thanks Darrick!
> 
> I've not seen the inode eviction asserts anymore, but I now hit a
> corruption warnings in generic/168 reliably.  I did hit before as
> well, but not very reliably.

I'll see if I can repro the 168 error; it's been running in a loop all
night and hasn't bombed yet.

In the meantime, I added some more tests and fixed a CoW corruption when an
xfs_io_overwrite extent has cow reservations in the middle of the extent.

I also restarted testing on arm64, ppc64{,el}, and i686; it seems
stable enough right now to pass all ~130 reflink xfstests here.

> generic/168 17s ...[  296.988867] XFS (vdc): Metadata corruption detected at
> xfs_refcountbt_write_verify+0x3e/0x90, block 0xb0
> [  296.990412] XFS (vdc): Unmount and run xfs_repair
> [  296.991300] XFS (vdc): First 64 bytes of corrupted metadata buffer:
> [  296.992349] ffff88007b755000: 52 33 46 43 00 01 00 01 ff ff ff ff ff ff ff
> ff  R3FC............
> [  296.993893] ffff88007b755010: 00 00 00 00 00 00 00 b0 00 00 00 00 00 00 00
> 00  ................

No LSN?  That's ... odd.

> [  297.003293] ffff88007b755020: fe 94 c7 69 fd bd 4a 77 85 3d 9d 6d 7b 45 38
> 07  ...i..Jw.=.m{E8.

Does this match the FS UUID?

> [  297.004442] ffff88007b755030: 00 00 00 00 00 00 00 00 00 00 0b 9c 00 00 7f
> 07  ................

Also no owner or CRC.  Hmmmm.

> [  297.005867] XFS (vdc): xfs_do_force_shutdown(0x8) called from line 1247 of
> file fs/xfs/xfs_buf.c.  Return address = 0xffffffff814cce52
> 
> I'll investigate it in a little more detail.

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-09 21:17               ` Darrick J. Wong
@ 2016-01-10  7:54                 ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-01-10  7:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs

On Sat, Jan 09, 2016 at 01:17:13PM -0800, Darrick J. Wong wrote:
> > 
> > I've not seen the inode eviction asserts anymore, but I now hit a
> > corruption warnings in generic/168 reliably.  I did hit before as
> > well, but not very reliably.
> 
> I'll see if I can repro the 168 error; it's been running in a loop all
> night and hasn't bombed yet.

Note that this is over nfs to a local server, not running on xfs directly,
which is doing fine.

> In the meantime, I added some more tests and fixed a CoW corruption when an
> xfs_io_overwrite extent has cow reservations in the middle of the extent.
> 
> I also restarted testing on arm64, ppc64{,el}, and i686; it seems
> stable enough right now to pass all ~130 reflink xfstests here.

I see pretty reliable failures in xfs/128 xfs/132 xfs/139, apparenly
due to content mismatches.

Re the verifier failure:

sees like we're hitting the

	if (level >= pag->pagf_refcount_level)
		return false;

case.  Together with the other garbage in it seems like we're
seeing a btree block that's not properly initialized in some way,
maybe after a split.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend
  2016-01-05  1:43   ` Darrick J. Wong
  2016-01-05 10:42     ` Christoph Hellwig
@ 2016-01-10 22:48     ` Darrick J. Wong
  1 sibling, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2016-01-10 22:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Jan 04, 2016 at 05:43:10PM -0800, Darrick J. Wong wrote:
> On Sun, Jan 03, 2016 at 01:07:53PM +0100, Christoph Hellwig wrote:
> > Otherwise we leak COW allocations done earlier in writepage.  This
> > can be reproduced fairly easily when we hit the non-blocking writeback
> > EAGAIN case.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_aops.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 185415a..9c69dc3 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -588,6 +588,7 @@ xfs_cancel_ioend(
> >  {
> >  	xfs_ioend_t		*next;
> >  	struct buffer_head	*bh, *next_bh;
> > +	int			error;
> >  
> >  	do {
> >  		next = ioend->io_list;
> > @@ -605,6 +606,12 @@ xfs_cancel_ioend(
> >  			unlock_buffer(bh);
> >  		} while ((bh = next_bh) != NULL);
> >  
> > +		if (ioend->io_flags & XFS_IOEND_COW) {
> > +			error = xfs_reflink_end_cow_failed(
> > +					XFS_I(ioend->io_inode),
> > +					ioend->io_offset, ioend->io_size);
> > +			WARN_ON_ONCE(error);
> > +		}

Actually, no, this isn't correct.  Even if we cancel the ioend, we must retain
the CoW reservation because the pages remain dirty and writepage will try
again.  If we delete the reservation, that second writepage will treat the
dirty page as a regular overwrite because there's no reservation, which is
wrong.

We need to keep something in the CoW fork; either we can leave the allocated
blocks or we could theoretically convert it back to a delalloc reservation.
For now I'll leave the mapping untouched since I've subsequently taught xfs to
clear out the CoW mappings when we truncate/punch/etc.  The reservation won't
hang around for long unless IO errors start piling up.

This causes file corruption in xfs/140 when blocksize < pagesize.

--D

> 
> Hmm.  This might be the cause of the occasional complaints I've been seeing
> where allocated blocks remain in the COW fork when the inode is being cleared
> out.  That said, the xfs_reflink_end_cow_failed() is apparently missing a
> xfs_bunmapi_cow() to actually clean out the COW fork.
> 
> Good catch, in any case.  Thank you for the testing and patches! :)
> 
> --D
> 
> >  		mempool_free(ioend, xfs_ioend_pool);
> >  	} while ((ioend = next) != NULL);
> >  }
> > -- 
> > 1.9.1
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-01-10 22:49 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-03 12:07 reflink fixes Christoph Hellwig
2016-01-03 12:07 ` [PATCH 1/3] xfs: pass inode instead of file to xfs_reflink_dirty_range Christoph Hellwig
2016-01-05  1:23   ` Darrick J. Wong
2016-01-03 12:07 ` [PATCH 2/3] xfs: only end a COW operation in xfs_zero_remaining_bytes if we started one Christoph Hellwig
2016-01-05  1:28   ` Darrick J. Wong
2016-01-03 12:07 ` [PATCH 3/3] xfs: cancel COW in xfs_cancel_ioend Christoph Hellwig
2016-01-05  1:43   ` Darrick J. Wong
2016-01-05 10:42     ` Christoph Hellwig
2016-01-07  0:32       ` Darrick J. Wong
2016-01-07 15:25         ` Christoph Hellwig
2016-01-08 10:09           ` Darrick J. Wong
2016-01-08 13:47             ` Christoph Hellwig
2016-01-09 21:17               ` Darrick J. Wong
2016-01-10  7:54                 ` Christoph Hellwig
2016-01-10 22:48     ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.