All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: for-next file collapse bug fixes
@ 2014-08-08 18:49 Brian Foster
  2014-08-08 18:49 ` [PATCH 1/2] xfs: don't log inode unless extent shift makes extent modifications Brian Foster
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Brian Foster @ 2014-08-08 18:49 UTC (permalink / raw)
  To: xfs

Hi all,

I've seen collapse range fall over during some recent stress testing.
I'm running fsx and 16 fsstress threads in parallel to reproduce. Note
that the fsstress workload doesn't need to be on the same fs (I suspect
a sync() is a trigger). These patches are what has fallen out so far...

The first patch stems from the fact that the error caused an fs shutdown
that appeared to be unnecessary. I was initially going to skip the inode
log on any error, but on closer inspection it seems like we expect to
abort/shutdown if something has in fact been changed, so this modifies
the code to reduce that shutdown window. The second patch deals with the
actual collapse failure by fixing up the locking.

Note that I still reproduced at least one collapse failure even with
these fixes, so there could be more at play here with the
implementation:

XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 5535 of file fs/xfs/libxfs/xfs_bmap.c.  Caller xfs_collapse_file_space+0x1af/0x280 [xfs]

This took significantly longer to reproduce and I don't yet have a feel
for how reproducible it is in general. In the meantime, these two seemed
relatively straightforward and incremental...

Brian

Brian Foster (2):
  xfs: don't log inode unless extent shift makes extent modifications
  xfs: hole the inode lock across a full file collapse

 fs/xfs/libxfs/xfs_bmap.c | 18 ++++++++++--------
 fs/xfs/xfs_bmap_util.c   |  5 +++--
 2 files changed, 13 insertions(+), 10 deletions(-)

-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] xfs: don't log inode unless extent shift makes extent modifications
  2014-08-08 18:49 [PATCH 0/2] xfs: for-next file collapse bug fixes Brian Foster
@ 2014-08-08 18:49 ` Brian Foster
  2014-08-11 18:03   ` Christoph Hellwig
  2014-08-08 18:49 ` [PATCH 2/2] xfs: hole the inode lock across a full file collapse Brian Foster
  2014-08-11 21:55 ` [PATCH 0/2] xfs: for-next file collapse bug fixes Dave Chinner
  2 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2014-08-08 18:49 UTC (permalink / raw)
  To: xfs

The file collapse mechanism uses xfs_bmap_shift_extents() to collapse
all subsequent extents down into the specified, previously punched out,
region. This function performs some validation, such as whether a
sufficient hole exists in the target region of the collapse, then shifts
the remaining exents downward.

The exit path of the function currently logs the inode unconditionally.
While we must log the inode (and abort) if an error occurs and the
transaction is dirty, the initial validation paths can generate errors
before the transaction has been dirtied. This creates an unnecessary
filesystem shutdown scenario, as the caller will cancel a transaction
that has been marked dirty.

Modify xfs_bmap_shift_extents() to OR the logflags bits as modifications
are made to the inode bmap. Only log the inode in the exit path if
logflags has been set. This ensures we only have to cancel a dirty
transaction if modifications have been made and prevents an unnecessary
filesystem shutdown otherwise.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index de2d26d..86df952 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5424,7 +5424,7 @@ xfs_bmap_shift_extents(
 	struct xfs_bmap_free	*flist,
 	int			num_exts)
 {
-	struct xfs_btree_cur		*cur;
+	struct xfs_btree_cur		*cur = NULL;
 	struct xfs_bmbt_rec_host	*gotp;
 	struct xfs_bmbt_irec            got;
 	struct xfs_bmbt_irec		left;
@@ -5435,7 +5435,7 @@ xfs_bmap_shift_extents(
 	int				error = 0;
 	int				i;
 	int				whichfork = XFS_DATA_FORK;
-	int				logflags;
+	int				logflags = 0;
 	xfs_filblks_t			blockcount = 0;
 	int				total_extents;
 
@@ -5478,16 +5478,11 @@ xfs_bmap_shift_extents(
 		}
 	}
 
-	/* We are going to change core inode */
-	logflags = XFS_ILOG_CORE;
 	if (ifp->if_flags & XFS_IFBROOT) {
 		cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
 		cur->bc_private.b.firstblock = *firstblock;
 		cur->bc_private.b.flist = flist;
 		cur->bc_private.b.flags = 0;
-	} else {
-		cur = NULL;
-		logflags |= XFS_ILOG_DEXT;
 	}
 
 	/*
@@ -5545,11 +5540,14 @@ xfs_bmap_shift_extents(
 			blockcount = left.br_blockcount +
 				got.br_blockcount;
 			xfs_iext_remove(ip, *current_ext, 1, 0);
+			logflags |= XFS_ILOG_CORE;
 			if (cur) {
 				error = xfs_btree_delete(cur, &i);
 				if (error)
 					goto del_cursor;
 				XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
+			} else {
+				logflags |= XFS_ILOG_DEXT;
 			}
 			XFS_IFORK_NEXT_SET(ip, whichfork,
 				XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
@@ -5575,6 +5573,7 @@ xfs_bmap_shift_extents(
 			got.br_startoff = startoff;
 		}
 
+		logflags |= XFS_ILOG_CORE;
 		if (cur) {
 			error = xfs_bmbt_update(cur, got.br_startoff,
 						got.br_startblock,
@@ -5582,6 +5581,8 @@ xfs_bmap_shift_extents(
 						got.br_state);
 			if (error)
 				goto del_cursor;
+		} else {
+			logflags |= XFS_ILOG_DEXT;
 		}
 
 		(*current_ext)++;
@@ -5597,6 +5598,7 @@ del_cursor:
 		xfs_btree_del_cursor(cur,
 			error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
 
-	xfs_trans_log_inode(tp, ip, logflags);
+	if (logflags)
+		xfs_trans_log_inode(tp, ip, logflags);
 	return error;
 }
-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] xfs: hole the inode lock across a full file collapse
  2014-08-08 18:49 [PATCH 0/2] xfs: for-next file collapse bug fixes Brian Foster
  2014-08-08 18:49 ` [PATCH 1/2] xfs: don't log inode unless extent shift makes extent modifications Brian Foster
@ 2014-08-08 18:49 ` Brian Foster
  2014-08-11 18:03   ` Christoph Hellwig
  2014-08-13 15:42   ` Brian Foster
  2014-08-11 21:55 ` [PATCH 0/2] xfs: for-next file collapse bug fixes Dave Chinner
  2 siblings, 2 replies; 10+ messages in thread
From: Brian Foster @ 2014-08-08 18:49 UTC (permalink / raw)
  To: xfs

A file collapse stress test workload reproduces collapse failures
mid-operation due to changes in the inode fork extent count across
extent shift cycles. xfs_collapse_file_space() currently calls
xfs_bmap_shift_extents() to shift one extent at a time per transaction.
The extent index is used to track the next extent to shift after each
iteration.

A concurrent fsx and fsstress workload reproduces a scenario where the
extent count changes during this sequence, causing the 'current_ext'
index to become inaccurate and possibly skip shifting an extent. The
likely result of this behavior is the subsequent shift attempt will not
find a hole in the area of the skipped extent and fail, leaving the file
in a partially collapsed state.

This occurs because the ilock is released and acquired across each
transaction and each individual extent shift. Tracepoint output shows
that once the ilock is released after an extent shift, a pending
blocking writeback (e.g., sync) can acquire the lock and proceed before
the next extent is shifted down. If the writeback converts part of a
delayed allocation earlier in the file, for example, it can insert a new
extent into the map. Tracing confirms a call to
xfs_bmap_add_extent_delay_real() in this particular instance.

To prevent this scenario, hold the ilock across the entire extent shift
loop in xfs_collapse_file_space().

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_bmap_util.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 2f1e30d..96eb97b 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1474,6 +1474,8 @@ xfs_collapse_file_space(
 	if (error)
 		return error;
 
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
 	while (!error && !done) {
 		tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
 		/*
@@ -1489,7 +1491,6 @@ xfs_collapse_file_space(
 			break;
 		}
 
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot,
 				ip->i_gdquot, ip->i_pdquot,
 				XFS_DIOSTRAT_SPACE_RES(mp, 0), 0,
@@ -1517,9 +1518,9 @@ xfs_collapse_file_space(
 			goto out;
 
 		error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	}
 
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 
 out:
-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] xfs: don't log inode unless extent shift makes extent modifications
  2014-08-08 18:49 ` [PATCH 1/2] xfs: don't log inode unless extent shift makes extent modifications Brian Foster
@ 2014-08-11 18:03   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2014-08-11 18:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: hole the inode lock across a full file collapse
  2014-08-08 18:49 ` [PATCH 2/2] xfs: hole the inode lock across a full file collapse Brian Foster
@ 2014-08-11 18:03   ` Christoph Hellwig
  2014-08-13 15:42   ` Brian Foster
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2014-08-11 18:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] xfs: for-next file collapse bug fixes
  2014-08-08 18:49 [PATCH 0/2] xfs: for-next file collapse bug fixes Brian Foster
  2014-08-08 18:49 ` [PATCH 1/2] xfs: don't log inode unless extent shift makes extent modifications Brian Foster
  2014-08-08 18:49 ` [PATCH 2/2] xfs: hole the inode lock across a full file collapse Brian Foster
@ 2014-08-11 21:55 ` Dave Chinner
  2 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2014-08-11 21:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Fri, Aug 08, 2014 at 02:49:24PM -0400, Brian Foster wrote:
> Hi all,
> 
> I've seen collapse range fall over during some recent stress testing.
> I'm running fsx and 16 fsstress threads in parallel to reproduce. Note
> that the fsstress workload doesn't need to be on the same fs (I suspect
> a sync() is a trigger). These patches are what has fallen out so far...
> 
> The first patch stems from the fact that the error caused an fs shutdown
> that appeared to be unnecessary. I was initially going to skip the inode
> log on any error, but on closer inspection it seems like we expect to
> abort/shutdown if something has in fact been changed, so this modifies
> the code to reduce that shutdown window. The second patch deals with the
> actual collapse failure by fixing up the locking.
> 
> Note that I still reproduced at least one collapse failure even with
> these fixes, so there could be more at play here with the
> implementation:
> 
> XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 5535 of file fs/xfs/libxfs/xfs_bmap.c.  Caller xfs_collapse_file_space+0x1af/0x280 [xfs]
> 
> This took significantly longer to reproduce and I don't yet have a feel
> for how reproducible it is in general. In the meantime, these two seemed
> relatively straightforward and incremental...

They look good, but it's too late for the 3.17 merge window.
However, given that we've got other fixes that need to go to 3.17
but are also too late (Chris Mason's direct IO invalidation fixes)
I'll plan these for 3.17-rc2 or so.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: hole the inode lock across a full file collapse
  2014-08-08 18:49 ` [PATCH 2/2] xfs: hole the inode lock across a full file collapse Brian Foster
  2014-08-11 18:03   ` Christoph Hellwig
@ 2014-08-13 15:42   ` Brian Foster
  2014-08-14  3:11     ` Dave Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: Brian Foster @ 2014-08-13 15:42 UTC (permalink / raw)
  To: xfs

On Fri, Aug 08, 2014 at 02:49:26PM -0400, Brian Foster wrote:
> A file collapse stress test workload reproduces collapse failures
> mid-operation due to changes in the inode fork extent count across
> extent shift cycles. xfs_collapse_file_space() currently calls
> xfs_bmap_shift_extents() to shift one extent at a time per transaction.
> The extent index is used to track the next extent to shift after each
> iteration.
> 
> A concurrent fsx and fsstress workload reproduces a scenario where the
> extent count changes during this sequence, causing the 'current_ext'
> index to become inaccurate and possibly skip shifting an extent. The
> likely result of this behavior is the subsequent shift attempt will not
> find a hole in the area of the skipped extent and fail, leaving the file
> in a partially collapsed state.
> 
> This occurs because the ilock is released and acquired across each
> transaction and each individual extent shift. Tracepoint output shows
> that once the ilock is released after an extent shift, a pending
> blocking writeback (e.g., sync) can acquire the lock and proceed before
> the next extent is shifted down. If the writeback converts part of a
> delayed allocation earlier in the file, for example, it can insert a new
> extent into the map. Tracing confirms a call to
> xfs_bmap_add_extent_delay_real() in this particular instance.
> 
> To prevent this scenario, hold the ilock across the entire extent shift
> loop in xfs_collapse_file_space().
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_bmap_util.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 2f1e30d..96eb97b 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1474,6 +1474,8 @@ xfs_collapse_file_space(
>  	if (error)
>  		return error;
>  
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +

I realized this moves the lock outside of the xfs_trans_reserve(), thus
opening a potential deadlock scenario with regard to the log. I suppose
this might be harder to hit in real life than a sync() causing the
operation to fall over mid-sequence, so I'm still Ok with keeping this
unless anybody objects. That said, we might still have to do something
else here longer term... perhaps sync the whole file and revert to the
original locking, or retain this locking scheme and use a rolling
transaction. Anyways, food for thought..

Brian

>  	while (!error && !done) {
>  		tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
>  		/*
> @@ -1489,7 +1491,6 @@ xfs_collapse_file_space(
>  			break;
>  		}
>  
> -		xfs_ilock(ip, XFS_ILOCK_EXCL);
>  		error = xfs_trans_reserve_quota(tp, mp, ip->i_udquot,
>  				ip->i_gdquot, ip->i_pdquot,
>  				XFS_DIOSTRAT_SPACE_RES(mp, 0), 0,
> @@ -1517,9 +1518,9 @@ xfs_collapse_file_space(
>  			goto out;
>  
>  		error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	}
>  
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
>  
>  out:
> -- 
> 1.8.3.1
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: hole the inode lock across a full file collapse
  2014-08-13 15:42   ` Brian Foster
@ 2014-08-14  3:11     ` Dave Chinner
  2014-08-14 19:09       ` Brian Foster
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2014-08-14  3:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Aug 13, 2014 at 11:42:29AM -0400, Brian Foster wrote:
> On Fri, Aug 08, 2014 at 02:49:26PM -0400, Brian Foster wrote:
> > A file collapse stress test workload reproduces collapse failures
> > mid-operation due to changes in the inode fork extent count across
> > extent shift cycles. xfs_collapse_file_space() currently calls
> > xfs_bmap_shift_extents() to shift one extent at a time per transaction.
> > The extent index is used to track the next extent to shift after each
> > iteration.

Right, it does so after writing back all the dirty pages and
invalidating the cache. This is done under the IOLOCK_EXCL, so we
should end up with nothing being able to newly dirty the inode while
the collapse range operation is in progress.
> > 
> > A concurrent fsx and fsstress workload reproduces a scenario where the
> > extent count changes during this sequence, causing the 'current_ext'
> > index to become inaccurate and possibly skip shifting an extent. The
> > likely result of this behavior is the subsequent shift attempt will not
> > find a hole in the area of the skipped extent and fail, leaving the file
> > in a partially collapsed state.
> > 
> > This occurs because the ilock is released and acquired across each
> > transaction and each individual extent shift. Tracepoint output shows
> > that once the ilock is released after an extent shift, a pending
> > blocking writeback (e.g., sync) can acquire the lock and proceed before
> > the next extent is shifted down. If the writeback converts part of a
> > delayed allocation earlier in the file, for example, it can insert a new
> > extent into the map. Tracing confirms a call to
> > xfs_bmap_add_extent_delay_real() in this particular instance.

Are we getting a dirty extent in the range being collapsed, or does
it exist outside the range being shifted? If outside, then shouldn't
we just sync the entire file first? i.e. treat it the same way as we
do xfs_swap_extents()?

Realistically, cur_extent is not valid once we drop the ILOCK.
Perhaps we should keep the offset of the extent we are up to in the
shift, and then look up the offset to get current extent map index
once we pick the ILOCK back up. That way we avoid issues with other
parts of the extent map changing between ILOCK hold contexts.

> > To prevent this scenario, hold the ilock across the entire extent shift
> > loop in xfs_collapse_file_space().
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_bmap_util.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 2f1e30d..96eb97b 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -1474,6 +1474,8 @@ xfs_collapse_file_space(
> >  	if (error)
> >  		return error;
> >  
> > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > +
> 
> I realized this moves the lock outside of the xfs_trans_reserve(), thus
> opening a potential deadlock scenario with regard to the log. I suppose
> this might be harder to hit in real life than a sync() causing the
> operation to fall over mid-sequence, so I'm still Ok with keeping this
> unless anybody objects.

We can't do that. Inode locking rules explicitly forbid it - we've
avoided needing to do break this rule for 20 years, so let's not
start making exceptions now just because it's a "simple fix".

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: hole the inode lock across a full file collapse
  2014-08-14  3:11     ` Dave Chinner
@ 2014-08-14 19:09       ` Brian Foster
  2014-08-14 22:30         ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2014-08-14 19:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Aug 14, 2014 at 01:11:36PM +1000, Dave Chinner wrote:
> On Wed, Aug 13, 2014 at 11:42:29AM -0400, Brian Foster wrote:
> > On Fri, Aug 08, 2014 at 02:49:26PM -0400, Brian Foster wrote:
> > > A file collapse stress test workload reproduces collapse failures
> > > mid-operation due to changes in the inode fork extent count across
> > > extent shift cycles. xfs_collapse_file_space() currently calls
> > > xfs_bmap_shift_extents() to shift one extent at a time per transaction.
> > > The extent index is used to track the next extent to shift after each
> > > iteration.
> 
> Right, it does so after writing back all the dirty pages and
> invalidating the cache. This is done under the IOLOCK_EXCL, so we
> should end up with nothing being able to newly dirty the inode while
> the collapse range operation is in progress.
> > > 
> > > A concurrent fsx and fsstress workload reproduces a scenario where the
> > > extent count changes during this sequence, causing the 'current_ext'
> > > index to become inaccurate and possibly skip shifting an extent. The
> > > likely result of this behavior is the subsequent shift attempt will not
> > > find a hole in the area of the skipped extent and fail, leaving the file
> > > in a partially collapsed state.
> > > 
> > > This occurs because the ilock is released and acquired across each
> > > transaction and each individual extent shift. Tracepoint output shows
> > > that once the ilock is released after an extent shift, a pending
> > > blocking writeback (e.g., sync) can acquire the lock and proceed before
> > > the next extent is shifted down. If the writeback converts part of a
> > > delayed allocation earlier in the file, for example, it can insert a new
> > > extent into the map. Tracing confirms a call to
> > > xfs_bmap_add_extent_delay_real() in this particular instance.
> 
> Are we getting a dirty extent in the range being collapsed, or does
> it exist outside the range being shifted? If outside, then shouldn't
> we just sync the entire file first? i.e. treat it the same way as we
> do xfs_swap_extents()?
> 

The dirty extent is prior to the range being collapsed, presumably
already dirty by the time we acquire the iolock. The free space portion
of the collapse currently only syncs from the start of the range to EOF.

Syncing the entire file is something I was planning to experiment with,
particularly since it seems like the remaining failure I mentioned in
the series intro appears to be related to hitting a delalloc extent
during the collapse. I don't yet know how that occurs, but my last test
dumped enough trace data to show a startblock value that includes the
mask used for delalloc extents. The collapse attempts to look up an
extent in the bmap btree with these values and falls over when it's not
found.

> Realistically, cur_extent is not valid once we drop the ILOCK.
> Perhaps we should keep the offset of the extent we are up to in the
> shift, and then look up the offset to get current extent map index
> once we pick the ILOCK back up. That way we avoid issues with other
> parts of the extent map changing between ILOCK hold contexts.
> 

I noticed that one or more of the other bmap functions do something like
this. Unless there's some other reason that we don't want to drop ilock
here, that sounds like a safer approach. Thanks.

> > > To prevent this scenario, hold the ilock across the entire extent shift
> > > loop in xfs_collapse_file_space().
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/xfs_bmap_util.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > index 2f1e30d..96eb97b 100644
> > > --- a/fs/xfs/xfs_bmap_util.c
> > > +++ b/fs/xfs/xfs_bmap_util.c
> > > @@ -1474,6 +1474,8 @@ xfs_collapse_file_space(
> > >  	if (error)
> > >  		return error;
> > >  
> > > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > > +
> > 
> > I realized this moves the lock outside of the xfs_trans_reserve(), thus
> > opening a potential deadlock scenario with regard to the log. I suppose
> > this might be harder to hit in real life than a sync() causing the
> > operation to fall over mid-sequence, so I'm still Ok with keeping this
> > unless anybody objects.
> 
> We can't do that. Inode locking rules explicitly forbid it - we've
> avoided needing to do break this rule for 20 years, so let's not
> start making exceptions now just because it's a "simple fix".
> 

Sounds good, let's drop this one then and I'll revisit it. Are we still
Ok with the first patch? I think that one is still warranted since it
limits the current flaws of collapse to the collapse operation itself.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] xfs: hole the inode lock across a full file collapse
  2014-08-14 19:09       ` Brian Foster
@ 2014-08-14 22:30         ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2014-08-14 22:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Aug 14, 2014 at 03:09:18PM -0400, Brian Foster wrote:
> Sounds good, let's drop this one then and I'll revisit it. Are we still
> Ok with the first patch? I think that one is still warranted since it
> limits the current flaws of collapse to the collapse operation itself.

The first patch looks fine.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-08-14 22:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-08 18:49 [PATCH 0/2] xfs: for-next file collapse bug fixes Brian Foster
2014-08-08 18:49 ` [PATCH 1/2] xfs: don't log inode unless extent shift makes extent modifications Brian Foster
2014-08-11 18:03   ` Christoph Hellwig
2014-08-08 18:49 ` [PATCH 2/2] xfs: hole the inode lock across a full file collapse Brian Foster
2014-08-11 18:03   ` Christoph Hellwig
2014-08-13 15:42   ` Brian Foster
2014-08-14  3:11     ` Dave Chinner
2014-08-14 19:09       ` Brian Foster
2014-08-14 22:30         ` Dave Chinner
2014-08-11 21:55 ` [PATCH 0/2] xfs: for-next file collapse bug fixes Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.