All of lore.kernel.org
 help / color / mirror / Atom feed
* block allocations for the refcount btree
@ 2016-02-10  9:30 Christoph Hellwig
  2016-02-10  9:50 ` Darrick J. Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2016-02-10  9:30 UTC (permalink / raw)
  To: Darrick J. Wong, Dave Chinner; +Cc: xfs

I've been chasing asserts about reserved blocks when testing the reflink
feature over NFS.  I was right with my suspicion that the full allocator
recursion from xfs_refcountbt_alloc_block was the culprit, but I should
have read the assert before jumping to conclusions:  we're running out
of space reservations, not log reservations.

The problem is that we need reserve the space for these normal allocator
block pools used by the refcount btree, which we may use up for allocations
and frees given that we use the normal space allocator for it.

This means we need a reservation for each transaction that truncates
extents, including those allocated in xfs_trans_roll.  The patch below
demonstrates a local hack that makes things work for me so far, but
I don't think it's a viable solution.

Darrick, Dave - can you explain the design decisions behind the
refcount btree block allocations to me?  This whole area still seems
to be in constant flux, so it seems there are lots of tradeoffs to make.

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 338c457..3b8a25e 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -967,7 +967,7 @@ xfs_free_eofblocks(
 			}
 		}
 
-		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
 		if (error) {
 			ASSERT(XFS_FORCED_SHUTDOWN(mp));
 			xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d1311ef..ec530d7 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1783,7 +1783,7 @@ xfs_inactive_truncate(
 	int			error;
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
 	if (error) {
 		ASSERT(XFS_FORCED_SHUTDOWN(mp));
 		xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ab3619c..a8c55bd 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -871,7 +871,7 @@ xfs_setattr_size(
 	truncate_setsize(inode, newsize);
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
 	if (error)
 		goto out_trans_cancel;
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 4d63836..a661430 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4144,7 +4144,7 @@ xlog_recover_process_efi(
 	}
 
 	tp = xfs_trans_alloc(mp, 0);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
 	if (error)
 		goto abort_error;
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 3640c6e..33ff98f 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -237,7 +237,7 @@ xfs_qm_scall_trunc_qfile(
 	xfs_ilock(ip, XFS_IOLOCK_EXCL);
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_TRUNCATE_FILE);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
 	if (error) {
 		xfs_trans_cancel(tp);
 		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 9503ccc..585a94a 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1026,7 +1026,7 @@ xfs_reflink_update_dest(
 		return 0;
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
 
 	/*
 	 * check for running out of space
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index b44284c..7538f39 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -456,7 +456,7 @@ xfs_inactive_symlink_rmt(
 	ASSERT(ip->i_d.di_nextents > 0 && ip->i_d.di_nextents <= 2);
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
 	if (error) {
 		xfs_trans_cancel(tp);
 		return error;
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 748b16a..2ef46de 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1065,7 +1065,7 @@ __xfs_trans_roll(
 	 * the prior and the next transactions.
 	 */
 	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
-	error = xfs_trans_reserve(trans, &tres, 0, 0);
+	error = xfs_trans_reserve(trans, &tres, 2, 0);
 	/*
 	 *  Ensure that the inode is in the new transaction and locked.
 	 */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-10  9:30 block allocations for the refcount btree Christoph Hellwig
@ 2016-02-10  9:50 ` Darrick J. Wong
  2016-02-10 19:07   ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2016-02-10  9:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

[past-1am posting, reader beware...]

On Wed, Feb 10, 2016 at 01:30:11AM -0800, Christoph Hellwig wrote:
> I've been chasing asserts about reserved blocks when testing the reflink
> feature over NFS.  I was right with my suspicion that the full allocator
> recursion from xfs_refcountbt_alloc_block was the culprit, but I should
> have read the assert before jumping to conclusions:  we're running out
> of space reservations, not log reservations.

Oh, there are ways to run out of log reservation too, and fixing that is
also on my todo lits.

> The problem is that we need reserve the space for these normal allocator
> block pools used by the refcount btree, which we may use up for allocations
> and frees given that we use the normal space allocator for it.
> 
> This means we need a reservation for each transaction that truncates
> extents, including those allocated in xfs_trans_roll.  The patch below
> demonstrates a local hack that makes things work for me so far, but
> I don't think it's a viable solution.

That's odd... I'd have thought that the AG reservation would always be able
to handle a refcount btree expansion, since it calculates how many blocks
are needed to handle the worst case of 1 record per extent.  There's also
a bug where we undercount the number of blocks already used, so it should
have an extra big reservation.

OTOH I've seen occasional ENOSPCs in generic/186 and generic/168 too, so I
guess something's going wrong.  Maybe the xfs_ag_resv* tracepoints can help?

> Darrick, Dave - can you explain the design decisions behind the
> refcount btree block allocations to me?  This whole area still seems
> to be in constant flux, so it seems there are lots of tradeoffs to make.

ISTR Dave told me to use the regular allocator (instead of the AGFL) for
the refcount btree because it's not a free space btree and exists at a higher
level than, say, the rmap btree.  The per-AG reservation code selectively hides
part of an AG's free space from the regular allocator so that regular file
requests can't eat up so much space in the AG that future cow/reflink/whatever
operations encounter ENOSPC during a btree split and take the FS offline.

------

WRT xfstests, I think I fixed all the things Dave was complaining about.
Running tests overnight to make sure I didn't break anything, and I'll be
ready to send another pull req (I can patchbomb too, but know that there
are even more tests now that I cleaned up the style problems) tomorrow
morning if nothing breaks.

--D

> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 338c457..3b8a25e 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -967,7 +967,7 @@ xfs_free_eofblocks(
>  			}
>  		}
>  
> -		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> +		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
>  		if (error) {
>  			ASSERT(XFS_FORCED_SHUTDOWN(mp));
>  			xfs_trans_cancel(tp);
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index d1311ef..ec530d7 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1783,7 +1783,7 @@ xfs_inactive_truncate(
>  	int			error;
>  
>  	tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> -	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
>  	if (error) {
>  		ASSERT(XFS_FORCED_SHUTDOWN(mp));
>  		xfs_trans_cancel(tp);
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index ab3619c..a8c55bd 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -871,7 +871,7 @@ xfs_setattr_size(
>  	truncate_setsize(inode, newsize);
>  
>  	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
> -	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
>  	if (error)
>  		goto out_trans_cancel;
>  
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 4d63836..a661430 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -4144,7 +4144,7 @@ xlog_recover_process_efi(
>  	}
>  
>  	tp = xfs_trans_alloc(mp, 0);
> -	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
>  	if (error)
>  		goto abort_error;
>  	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
> diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> index 3640c6e..33ff98f 100644
> --- a/fs/xfs/xfs_qm_syscalls.c
> +++ b/fs/xfs/xfs_qm_syscalls.c
> @@ -237,7 +237,7 @@ xfs_qm_scall_trunc_qfile(
>  	xfs_ilock(ip, XFS_IOLOCK_EXCL);
>  
>  	tp = xfs_trans_alloc(mp, XFS_TRANS_TRUNCATE_FILE);
> -	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
>  	if (error) {
>  		xfs_trans_cancel(tp);
>  		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 9503ccc..585a94a 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1026,7 +1026,7 @@ xfs_reflink_update_dest(
>  		return 0;
>  
>  	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
> -	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
>  
>  	/*
>  	 * check for running out of space
> diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
> index b44284c..7538f39 100644
> --- a/fs/xfs/xfs_symlink.c
> +++ b/fs/xfs/xfs_symlink.c
> @@ -456,7 +456,7 @@ xfs_inactive_symlink_rmt(
>  	ASSERT(ip->i_d.di_nextents > 0 && ip->i_d.di_nextents <= 2);
>  
>  	tp = xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> -	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 2, 0);
>  	if (error) {
>  		xfs_trans_cancel(tp);
>  		return error;
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 748b16a..2ef46de 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -1065,7 +1065,7 @@ __xfs_trans_roll(
>  	 * the prior and the next transactions.
>  	 */
>  	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> -	error = xfs_trans_reserve(trans, &tres, 0, 0);
> +	error = xfs_trans_reserve(trans, &tres, 2, 0);
>  	/*
>  	 *  Ensure that the inode is in the new transaction and locked.
>  	 */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-10  9:50 ` Darrick J. Wong
@ 2016-02-10 19:07   ` Christoph Hellwig
  2016-02-10 21:40     ` Dave Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2016-02-10 19:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Wed, Feb 10, 2016 at 01:50:10AM -0800, Darrick J. Wong wrote:
> That's odd... I'd have thought that the AG reservation would always be able
> to handle a refcount btree expansion, since it calculates how many blocks
> are needed to handle the worst case of 1 record per extent.  There's also
> a bug where we undercount the number of blocks already used, so it should
> have an extra big reservation.
> 
> OTOH I've seen occasional ENOSPCs in generic/186 and generic/168 too, so I
> guess something's going wrong.  Maybe the xfs_ag_resv* tracepoints can help?

I'm not seeing an ENOSPC, I run into:

[  640.924891] XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 315

Which asserts that the current transaction is using up more blocks from
its reservation than it reserved.  The AG reservation seems to operate
on a different level than that.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-10 19:07   ` Christoph Hellwig
@ 2016-02-10 21:40     ` Dave Chinner
  2016-02-11 14:09       ` Brian Foster
  2016-02-12 19:10       ` Christoph Hellwig
  0 siblings, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2016-02-10 21:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, Darrick J. Wong

On Wed, Feb 10, 2016 at 11:07:38AM -0800, Christoph Hellwig wrote:
> On Wed, Feb 10, 2016 at 01:50:10AM -0800, Darrick J. Wong wrote:
> > That's odd... I'd have thought that the AG reservation would always be able
> > to handle a refcount btree expansion, since it calculates how many blocks
> > are needed to handle the worst case of 1 record per extent.  There's also
> > a bug where we undercount the number of blocks already used, so it should
> > have an extra big reservation.
> > 
> > OTOH I've seen occasional ENOSPCs in generic/186 and generic/168 too, so I
> > guess something's going wrong.  Maybe the xfs_ag_resv* tracepoints can help?
> 
> I'm not seeing an ENOSPC, I run into:
> 
> [  640.924891] XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 315

I run into that from time to time (maybe once a month) on a vanilla
kernel.

IIRC, the problem is the delayed allocation extent split runs out of
it's reserved block count if you split it enough times. The case
I've seen is that  the indlen calculated in xfs_bmap_worst_indlen()
ends up too small for a subsequent allocation after we've called
xfs_bmap_del_extent() to delete the middle of a delalloc extent too
many times.

Brian had some patches that attempted to solve it - we may have
simply dropped the ball on this (again).

http://oss.sgi.com/archives/xfs/2014-09/msg00337.html

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-10 21:40     ` Dave Chinner
@ 2016-02-11 14:09       ` Brian Foster
  2016-02-11 20:21         ` Dave Chinner
  2016-02-12 19:10       ` Christoph Hellwig
  1 sibling, 1 reply; 21+ messages in thread
From: Brian Foster @ 2016-02-11 14:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Darrick J. Wong, xfs

On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> On Wed, Feb 10, 2016 at 11:07:38AM -0800, Christoph Hellwig wrote:
> > On Wed, Feb 10, 2016 at 01:50:10AM -0800, Darrick J. Wong wrote:
> > > That's odd... I'd have thought that the AG reservation would always be able
> > > to handle a refcount btree expansion, since it calculates how many blocks
> > > are needed to handle the worst case of 1 record per extent.  There's also
> > > a bug where we undercount the number of blocks already used, so it should
> > > have an extra big reservation.
> > > 
> > > OTOH I've seen occasional ENOSPCs in generic/186 and generic/168 too, so I
> > > guess something's going wrong.  Maybe the xfs_ag_resv* tracepoints can help?
> > 
> > I'm not seeing an ENOSPC, I run into:
> > 
> > [  640.924891] XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 315
> 
> I run into that from time to time (maybe once a month) on a vanilla
> kernel.
> 

Any idea which test reproduces? I see that generic/033 resulted from the
discussion below on the rfc. I don't currently reproduce with that test,
however. The test mentions it uses fzero because zero range doesn't do
writeback (comments ftw :) and thus allows splitting of delalloc
extents, but it looks like that might no longer be the case in the
kernel (since zero range was simplified to reuse punch/alloc).

> IIRC, the problem is the delayed allocation extent split runs out of
> it's reserved block count if you split it enough times. The case
> I've seen is that  the indlen calculated in xfs_bmap_worst_indlen()
> ends up too small for a subsequent allocation after we've called
> xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> many times.
> 
> Brian had some patches that attempted to solve it - we may have
> simply dropped the ball on this (again).
> 
> http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> 

I recall working on this, but not quite where it left off. If I dig back
to my old tree from before the oss.sgi.com->vger switchover, I have a v1
branch for this work that was posted here:

http://oss.sgi.com/archives/xfs/2014-10/msg00294.html

It looks like we just never got it reviewed and I since lost track of
it. I can resurrect it if warranted. I would like to nail down a current
reproducer though...

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-11 14:09       ` Brian Foster
@ 2016-02-11 20:21         ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2016-02-11 20:21 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, Darrick J. Wong, xfs

On Thu, Feb 11, 2016 at 09:09:37AM -0500, Brian Foster wrote:
> On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> > On Wed, Feb 10, 2016 at 11:07:38AM -0800, Christoph Hellwig wrote:
> > > On Wed, Feb 10, 2016 at 01:50:10AM -0800, Darrick J. Wong wrote:
> > > > That's odd... I'd have thought that the AG reservation would always be able
> > > > to handle a refcount btree expansion, since it calculates how many blocks
> > > > are needed to handle the worst case of 1 record per extent.  There's also
> > > > a bug where we undercount the number of blocks already used, so it should
> > > > have an extra big reservation.
> > > > 
> > > > OTOH I've seen occasional ENOSPCs in generic/186 and generic/168 too, so I
> > > > guess something's going wrong.  Maybe the xfs_ag_resv* tracepoints can help?
> > > 
> > > I'm not seeing an ENOSPC, I run into:
> > > 
> > > [  640.924891] XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 315
> > 
> > I run into that from time to time (maybe once a month) on a vanilla
> > kernel.
> > 
> 
> Any idea which test reproduces? I see that generic/033 resulted from the
> discussion below on the rfc. I don't currently reproduce with that test,
> however. The test mentions it uses fzero because zero range doesn't do
> writeback (comments ftw :) and thus allows splitting of delalloc
> extents, but it looks like that might no longer be the case in the
> kernel (since zero range was simplified to reuse punch/alloc).

It's usually one of the fsstress tests that triggers it. For some
reason generic/233 sticks in my mind, but it's a pretty rare failure
these days...

> > IIRC, the problem is the delayed allocation extent split runs out of
> > it's reserved block count if you split it enough times. The case
> > I've seen is that  the indlen calculated in xfs_bmap_worst_indlen()
> > ends up too small for a subsequent allocation after we've called
> > xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> > many times.
> > 
> > Brian had some patches that attempted to solve it - we may have
> > simply dropped the ball on this (again).
> > 
> > http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> > 
> 
> I recall working on this, but not quite where it left off. If I dig back
> to my old tree from before the oss.sgi.com->vger switchover, I have a v1
> branch for this work that was posted here:
> 
> http://oss.sgi.com/archives/xfs/2014-10/msg00294.html
> 
> It looks like we just never got it reviewed and I since lost track of
> it. I can resurrect it if warranted. I would like to nail down a current
> reproducer though...

*nod*. Not sure what we can use to trigger it, though.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-10 21:40     ` Dave Chinner
  2016-02-11 14:09       ` Brian Foster
@ 2016-02-12 19:10       ` Christoph Hellwig
  2016-02-13  2:33         ` Dave Chinner
  2016-03-01 18:18         ` Darrick J. Wong
  1 sibling, 2 replies; 21+ messages in thread
From: Christoph Hellwig @ 2016-02-12 19:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Darrick J. Wong, xfs

On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> I run into that from time to time (maybe once a month) on a vanilla
> kernel.
> 
> IIRC, the problem is the delayed allocation extent split runs out of
> it's reserved block count if you split it enough times. The case
> I've seen is that  the indlen calculated in xfs_bmap_worst_indlen()
> ends up too small for a subsequent allocation after we've called
> xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> many times.
> 
> Brian had some patches that attempted to solve it - we may have
> simply dropped the ball on this (again).
> 
> http://oss.sgi.com/archives/xfs/2014-09/msg00337.html

I'm pretty sure that is a separate issue.  With the refcount btree we may
allocate an extent (or rather just a single block) in xfs_alloc_ag_vextent
as called from xfs_refcountbt_alloc_block.  The reservation helps us to
ensure this block is always available, but we still need to account for
that in xfs_trans_reserve(), which we currently don't do for itruncate
transactions.  

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-12 19:10       ` Christoph Hellwig
@ 2016-02-13  2:33         ` Dave Chinner
  2016-02-13  4:44           ` Darrick J. Wong
  2016-02-13  7:48           ` Christoph Hellwig
  2016-03-01 18:18         ` Darrick J. Wong
  1 sibling, 2 replies; 21+ messages in thread
From: Dave Chinner @ 2016-02-13  2:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, xfs

On Fri, Feb 12, 2016 at 11:10:46AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> > I run into that from time to time (maybe once a month) on a vanilla
> > kernel.
> > 
> > IIRC, the problem is the delayed allocation extent split runs out of
> > it's reserved block count if you split it enough times. The case
> > I've seen is that  the indlen calculated in xfs_bmap_worst_indlen()
> > ends up too small for a subsequent allocation after we've called
> > xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> > many times.
> > 
> > Brian had some patches that attempted to solve it - we may have
> > simply dropped the ball on this (again).
> > 
> > http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> 
> I'm pretty sure that is a separate issue.  With the refcount btree we may
> allocate an extent (or rather just a single block) in xfs_alloc_ag_vextent
> as called from xfs_refcountbt_alloc_block.  The reservation helps us to
> ensure this block is always available, but we still need to account for
> that in xfs_trans_reserve(), which we currently don't do for itruncate
> transactions.  

Ok, so we may have two different issues with a similar failure
symptom. As it is, I don't think this is a show stopper - we're
expecting to find these sorts of issues as we go along (hence the
experimental tag on the feature) and I think, at this point, getting
review and an initial merge done is more important...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-13  2:33         ` Dave Chinner
@ 2016-02-13  4:44           ` Darrick J. Wong
  2016-02-13  8:02             ` Christoph Hellwig
  2016-02-13  7:48           ` Christoph Hellwig
  1 sibling, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2016-02-13  4:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Sat, Feb 13, 2016 at 01:33:10PM +1100, Dave Chinner wrote:
> On Fri, Feb 12, 2016 at 11:10:46AM -0800, Christoph Hellwig wrote:
> > On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> > > I run into that from time to time (maybe once a month) on a vanilla
> > > kernel.
> > > 
> > > IIRC, the problem is the delayed allocation extent split runs out of
> > > it's reserved block count if you split it enough times. The case
> > > I've seen is that  the indlen calculated in xfs_bmap_worst_indlen()
> > > ends up too small for a subsequent allocation after we've called
> > > xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> > > many times.
> > > 
> > > Brian had some patches that attempted to solve it - we may have
> > > simply dropped the ball on this (again).
> > > 
> > > http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> > 
> > I'm pretty sure that is a separate issue.  With the refcount btree we may
> > allocate an extent (or rather just a single block) in xfs_alloc_ag_vextent
> > as called from xfs_refcountbt_alloc_block.  The reservation helps us to
> > ensure this block is always available, but we still need to account for
> > that in xfs_trans_reserve(), which we currently don't do for itruncate
> > transactions.  

Hmm.  The per-AG reservation clients also had some problems counting the number
of tree blocks in use as part of setting up the reservation; that might have
had something to do with it.  Dunno, I'll go look at that part of the code
again when I finish interval query support for rmap.

> Ok, so we may have two different issues with a similar failure
> symptom. As it is, I don't think this is a show stopper - we're
> expecting to find these sorts of issues as we go along (hence the
> experimental tag on the feature) and I think, at this point, getting
> review and an initial merge done is more important...

I've noticed that I can trigger that assert (the log reservation one) fairly
regularly when running xfs/140 on a 800MB disk.

Brain dead, going to bed.  Next on my list is to rebase the more stable
parts of the patchset against Dave's for-4.6 branch and do another mass
mailing.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-13  2:33         ` Dave Chinner
  2016-02-13  4:44           ` Darrick J. Wong
@ 2016-02-13  7:48           ` Christoph Hellwig
  2016-02-14  0:21             ` Dave Chinner
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2016-02-13  7:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs, Darrick J. Wong

On Sat, Feb 13, 2016 at 01:33:10PM +1100, Dave Chinner wrote:
> > allocate an extent (or rather just a single block) in xfs_alloc_ag_vextent
> > as called from xfs_refcountbt_alloc_block.  The reservation helps us to
> > ensure this block is always available, but we still need to account for
> > that in xfs_trans_reserve(), which we currently don't do for itruncate
> > transactions.  
> 
> Ok, so we may have two different issues with a similar failure
> symptom. As it is, I don't think this is a show stopper - we're
> expecting to find these sorts of issues as we go along (hence the
> experimental tag on the feature) and I think, at this point, getting
> review and an initial merge done is more important...

This triggers 100% reproducible over NFS, and as outlined I'm
also pretty sure about the root cause.  I don't think this is something
to be ignored.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-13  4:44           ` Darrick J. Wong
@ 2016-02-13  8:02             ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2016-02-13  8:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Fri, Feb 12, 2016 at 08:44:00PM -0800, Darrick J. Wong wrote:
> Brain dead, going to bed.  Next on my list is to rebase the more stable
> parts of the patchset against Dave's for-4.6 branch and do another mass
> mailing.

Yes, it would be great the prep work into for-next ASAP.  Another
important thing would be to rebase on the writepage changes now that
they are basically ready to be merged.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-13  7:48           ` Christoph Hellwig
@ 2016-02-14  0:21             ` Dave Chinner
  0 siblings, 0 replies; 21+ messages in thread
From: Dave Chinner @ 2016-02-14  0:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, Darrick J. Wong

On Fri, Feb 12, 2016 at 11:48:27PM -0800, Christoph Hellwig wrote:
> On Sat, Feb 13, 2016 at 01:33:10PM +1100, Dave Chinner wrote:
> > > allocate an extent (or rather just a single block) in xfs_alloc_ag_vextent
> > > as called from xfs_refcountbt_alloc_block.  The reservation helps us to
> > > ensure this block is always available, but we still need to account for
> > > that in xfs_trans_reserve(), which we currently don't do for itruncate
> > > transactions.  
> > 
> > Ok, so we may have two different issues with a similar failure
> > symptom. As it is, I don't think this is a show stopper - we're
> > expecting to find these sorts of issues as we go along (hence the
> > experimental tag on the feature) and I think, at this point, getting
> > review and an initial merge done is more important...
> 
> This triggers 100% reproducible over NFS, and as outlined I'm
> also pretty sure about the root cause.  I don't think this is something
> to be ignored.

I didn't say we should to ignore it - I simply stated my priority
was getting the code reviewed and merged.  That doesn't stop you or
Darrick from working on a fix - if that happens after the initial
merge, we've still got the whole -rc cycle to find and fix issues
like this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-02-12 19:10       ` Christoph Hellwig
  2016-02-13  2:33         ` Dave Chinner
@ 2016-03-01 18:18         ` Darrick J. Wong
  2016-03-01 20:40           ` Christoph Hellwig
  1 sibling, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2016-03-01 18:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Feb 12, 2016 at 11:10:46AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> > I run into that from time to time (maybe once a month) on a vanilla
> > kernel.
> > 
> > IIRC, the problem is the delayed allocation extent split runs out of
> > it's reserved block count if you split it enough times. The case
> > I've seen is that  the indlen calculated in xfs_bmap_worst_indlen()
> > ends up too small for a subsequent allocation after we've called
> > xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> > many times.
> > 
> > Brian had some patches that attempted to solve it - we may have
> > simply dropped the ball on this (again).
> > 
> > http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> 
> I'm pretty sure that is a separate issue.  With the refcount btree we may
> allocate an extent (or rather just a single block) in xfs_alloc_ag_vextent
> as called from xfs_refcountbt_alloc_block.  The reservation helps us to
> ensure this block is always available, but we still need to account for
> that in xfs_trans_reserve(), which we currently don't do for itruncate
> transactions.  

One side effect of the per-ag block reservation code is that it reserves all
the blocks that the refcountbt will ever need at mount time, which includes
decreasing the incore fdblocks counter at mount and putting it back at unmount
time.  This /should/ eliminate the need for reserving blocks in truncate
transactions, though clearly this isn't being done correctly.  The AGresv code
as of a couple weeks ago tried to monkey with the transaction block reservation
counts after the allocator does its usual accounting, which as you observe,
doesn't work.

Dave suggested that I embed the AGresv structures directly into xfs_perag, and
I realized that we'll only ever need two of these things -- one to feed the
AGFL (rmapbt) and another to feed the higher level btrees (refcountbt).  At the
same time, I decided that because the AGresv code ultimately knows whether an
allocation request was satisfied from a reservation or from the free space
btree, it should also have a hand in deciding whether or not to update the
transaction's block reservation.

So what I'm saying is that I think this problem was with the AGresv code not
doing accounting correctly, and that I've fixed it in a subsequent rewrite of
the AGresv code.  I'll post it later, after I figure out why generic/333
regresses with the new code.

However, there's one thing to be aware of -- if the AGresv uses up all the
blocks that were preallocated at mount time, the allocator will grab any free
blocks available and charge the blocks to the transaction, just like before.
If this ever happens (in theory we reserve enough blocks so that we can have a
refcount record for every block in the AG) then we'll still have this problem.

The most cautious thing to do, I think, is to combine the AGresv fixes with
this patch that adds a block reservation to truncate transactions in case the
AGresv can't supply a block to the refcount btree.  The problem here is that
for most cases we'll have both the AGresv and the transaction reserving blocks
for the same purpose, which seems excessive.  Moreover, it introduces the
possibility of userspace seeing ENOSPC while truncating files even if there's
actually sufficient space to handle a refcountbt split.

<shrug> What does everyone else think?

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-03-01 18:18         ` Darrick J. Wong
@ 2016-03-01 20:40           ` Christoph Hellwig
  2016-03-02  5:24             ` Darrick J. Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2016-03-01 20:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Tue, Mar 01, 2016 at 10:18:09AM -0800, Darrick J. Wong wrote:
> One side effect of the per-ag block reservation code is that it reserves all
> the blocks that the refcountbt will ever need at mount time, which includes
> decreasing the incore fdblocks counter at mount and putting it back at unmount
> time.  This /should/ eliminate the need for reserving blocks in truncate
> transactions, though clearly this isn't being done correctly.

We're still accouting these blocks in t_blk_res_used through
xfs_alloc_vextent -> xfs_alloc_ag_vextent -> xfs_trans_mod_sb.

I don't really see how the reservation code changes anything about
that accounting.  It just ensures the allocation will succeed through
xfs_ag_resv_needed in xfs_alloc_ag_vextent, and then removes the
allocated block from the reservation using xfs_ag_resv_alloc_block.

Maybe we need to find a way to not account for these blocks.

> So what I'm saying is that I think this problem was with the AGresv code not
> doing accounting correctly, and that I've fixed it in a subsequent rewrite of
> the AGresv code.  I'll post it later, after I figure out why generic/333
> regresses with the new code.

Ok, let's see if the new version helps with the above issue.

> However, there's one thing to be aware of -- if the AGresv uses up all the
> blocks that were preallocated at mount time, the allocator will grab any free
> blocks available and charge the blocks to the transaction, just like before.
> If this ever happens (in theory we reserve enough blocks so that we can have a
> refcount record for every block in the AG) then we'll still have this problem.

It seems like we should simply avoid that this case ever happens.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-03-01 20:40           ` Christoph Hellwig
@ 2016-03-02  5:24             ` Darrick J. Wong
  2016-03-02  9:59               ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2016-03-02  5:24 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner; +Cc: xfs

On Tue, Mar 01, 2016 at 12:40:13PM -0800, Christoph Hellwig wrote:
> On Tue, Mar 01, 2016 at 10:18:09AM -0800, Darrick J. Wong wrote:
> > One side effect of the per-ag block reservation code is that it reserves all
> > the blocks that the refcountbt will ever need at mount time, which includes
> > decreasing the incore fdblocks counter at mount and putting it back at unmount
> > time.  This /should/ eliminate the need for reserving blocks in truncate
> > transactions, though clearly this isn't being done correctly.
> 
> We're still accouting these blocks in t_blk_res_used through
> xfs_alloc_vextent -> xfs_alloc_ag_vextent -> xfs_trans_mod_sb.
> 
> I don't really see how the reservation code changes anything about
> that accounting.  It just ensures the allocation will succeed through
> xfs_ag_resv_needed in xfs_alloc_ag_vextent, and then removes the
> allocated block from the reservation using xfs_ag_resv_alloc_block.
> 
> Maybe we need to find a way to not account for these blocks.
> 
> > So what I'm saying is that I think this problem was with the AGresv code not
> > doing accounting correctly, and that I've fixed it in a subsequent rewrite of
> > the AGresv code.  I'll post it later, after I figure out why generic/333
> > regresses with the new code.
> 
> Ok, let's see if the new version helps with the above issue.
> 
> > However, there's one thing to be aware of -- if the AGresv uses up all the
> > blocks that were preallocated at mount time, the allocator will grab any free
> > blocks available and charge the blocks to the transaction, just like before.
> > If this ever happens (in theory we reserve enough blocks so that we can have a
> > refcount record for every block in the AG) then we'll still have this problem.
> 
> It seems like we should simply avoid that this case ever happens.

I've rebased my trees and pushed them all to github.

The for-dave-for-4.6 kernel and progs branches are the giant piles of patches
against Dave's for-next integration trees which (I think) are being reviewed
for 4.6.

The for-dave branches are against upstream as they've always been.

New patches have been added on the end of the patchset.

I noticed that generic/139 crashes for-dave with a 1k block size due something
or other sending us bio->bi_bdev == NULL.  This seems to be sorted out somehow
in for-next.  Other than that I haven't seen any problems... but I've only
run against x64 on bare XFS.  Will run other arches/NFS/etc tonight/tomorrow.

The transaction block reservation complaints should be fixed now, and I
think the transaction reservations have been fixed too... or at least they
don't show up on the tinydisk test setup.  But all that means is that someone
else will find it, probably within the first 3 minutes of testing. :P

kernel:
for-dave-for-4.6: 31bb2360d45ae3920244a808caffd500ad85f2b6 -> b07b3e650891dc3bd439c5c296bfc22b90c1b9a5
for-dave: fc77dbd34c5c99bce46d40a2491937c3bcbd10af -> 163e86bdb2a2dc1ebb91867a247b4e671298fde7

xfsprogs:
for-dave-for-4.6: c062cfe7dbc568ec9ec308ef6a1bf4ef5cd9af2a -> b147a7ea86bf1f3697cea3bfa7909197ee374aba
for-dave: 1abecdabbc9190db2fa7c3dab15931b7b9ddbb0d -> cbda21d47a9ecde446660bb8ffd808e2ac008554

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-03-02  5:24             ` Darrick J. Wong
@ 2016-03-02  9:59               ` Christoph Hellwig
  2016-03-02 16:41                 ` Darrick J. Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2016-03-02  9:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs

On Tue, Mar 01, 2016 at 09:24:11PM -0800, Darrick J. Wong wrote:
> I've rebased my trees and pushed them all to github.
> 
> The for-dave-for-4.6 kernel and progs branches are the giant piles of patches
> against Dave's for-next integration trees which (I think) are being reviewed
> for 4.6.
> 
> The for-dave branches are against upstream as they've always been.

BTW, what's the point of for-dave vs for-dave-for-4.6 for xfsprogs?

> New patches have been added on the end of the patchset.
> 
> I noticed that generic/139 crashes for-dave with a 1k block size due something
> or other sending us bio->bi_bdev == NULL.  This seems to be sorted out somehow
> in for-next.  Other than that I haven't seen any problems... but I've only
> run against x64 on bare XFS.  Will run other arches/NFS/etc tonight/tomorrow.
> 
> The transaction block reservation complaints should be fixed now, and I
> think the transaction reservations have been fixed too... or at least they
> don't show up on the tinydisk test setup.  But all that means is that someone
> else will find it, probably within the first 3 minutes of testing. :P

Passes on NFS without hitting the space reservation issue, and passes
on XFS without new regression.  The odd transaction (not space)
reservation assert in xfs/140 that I started to myesteriously 100%
reproduce last week still is around on XFS.  I'll see if I can fix that
or at least triage it further..

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-03-02  9:59               ` Christoph Hellwig
@ 2016-03-02 16:41                 ` Darrick J. Wong
  2016-03-02 16:57                   ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2016-03-02 16:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Mar 02, 2016 at 01:59:32AM -0800, Christoph Hellwig wrote:
> On Tue, Mar 01, 2016 at 09:24:11PM -0800, Darrick J. Wong wrote:
> > I've rebased my trees and pushed them all to github.
> > 
> > The for-dave-for-4.6 kernel and progs branches are the giant piles of patches
> > against Dave's for-next integration trees which (I think) are being reviewed
> > for 4.6.
> > 
> > The for-dave branches are against upstream as they've always been.
> 
> BTW, what's the point of for-dave vs for-dave-for-4.6 for xfsprogs?

for-dave-for-4.6 = all the stuff I'm pushing to Dave for 4.6
for-dave = all the stuff from my dev tree minus the non-XFS stuff

("non XFS stuff" means all the ext4 fixes, etc.)

> > New patches have been added on the end of the patchset.
> > 
> > I noticed that generic/139 crashes for-dave with a 1k block size due something
> > or other sending us bio->bi_bdev == NULL.  This seems to be sorted out somehow
> > in for-next.  Other than that I haven't seen any problems... but I've only
> > run against x64 on bare XFS.  Will run other arches/NFS/etc tonight/tomorrow.
> > 
> > The transaction block reservation complaints should be fixed now, and I
> > think the transaction reservations have been fixed too... or at least they
> > don't show up on the tinydisk test setup.  But all that means is that someone
> > else will find it, probably within the first 3 minutes of testing. :P
> 
> Passes on NFS without hitting the space reservation issue, and passes
> on XFS without new regression.  The odd transaction (not space)
> reservation assert in xfs/140 that I started to myesteriously 100%
> reproduce last week still is around on XFS.  I'll see if I can fix that
> or at least triage it further..

Hmm, I'll give it a spin when I get in later.  Can you send me xfs_info
output so I can try to construct an equivalent reproducer setup?

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-03-02 16:41                 ` Darrick J. Wong
@ 2016-03-02 16:57                   ` Christoph Hellwig
  2016-03-02 21:21                     ` Darrick J. Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2016-03-02 16:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs

On Wed, Mar 02, 2016 at 08:41:02AM -0800, Darrick J. Wong wrote:
> Hmm, I'll give it a spin when I get in later.  Can you send me xfs_info
> output so I can try to construct an equivalent reproducer setup?

root@vm:~/xfstests# xfs_info /mnt/scratch/
meta-data=/dev/vdc               isize=512    agcount=320, agsize=8192 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-03-02 16:57                   ` Christoph Hellwig
@ 2016-03-02 21:21                     ` Darrick J. Wong
  2016-03-03 14:05                       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2016-03-02 21:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Mar 02, 2016 at 08:57:04AM -0800, Christoph Hellwig wrote:
> On Wed, Mar 02, 2016 at 08:41:02AM -0800, Darrick J. Wong wrote:
> > Hmm, I'll give it a spin when I get in later.  Can you send me xfs_info
> > output so I can try to construct an equivalent reproducer setup?
> 
> root@vm:~/xfstests# xfs_info /mnt/scratch/
> meta-data=/dev/vdc               isize=512    agcount=320, agsize=8192 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1 spinodes=0 rmapbt=0
>          =                       reflink=1
> data     =                       bsize=4096   blocks=2621440, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

Ok.  I think the problem is that making changes to the refcount btree eats
up our entire reservation in certain cases.  Can you try the following bandaid?
This should give us enough room to handle splitting the btree at both ends
of a range that we're refcount-changing.

Come to think of it, this might not be enough -- if we want to change the
refcount of an n-block extent, can we end up changing n/i_refc_mnr[0] blocks?
I suspect yes, but I'll think about that some more.

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index cfd8a3c..54c5c22 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -86,7 +86,7 @@ xfs_allocfree_log_count(
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 		blocks += num_ops * (2 * mp->m_rmap_maxlevels - 1);
 	if (xfs_sb_version_hasreflink(&mp->m_sb))
-		blocks += num_ops * (2 * mp->m_refc_maxlevels - 1);
+		blocks += 2 * num_ops * (2 * mp->m_refc_maxlevels - 1);
 	return blocks;
 }
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-03-02 21:21                     ` Darrick J. Wong
@ 2016-03-03 14:05                       ` Christoph Hellwig
  2016-03-04  1:36                         ` Darrick J. Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2016-03-03 14:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs

On Wed, Mar 02, 2016 at 01:21:01PM -0800, Darrick J. Wong wrote:
> Ok.  I think the problem is that making changes to the refcount btree eats
> up our entire reservation in certain cases.  Can you try the following bandaid?
> This should give us enough room to handle splitting the btree at both ends
> of a range that we're refcount-changing.

This seems to work in general.  I ran into one log related hang when
running -g auto on nfs, but it's not been reproducible so far.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: block allocations for the refcount btree
  2016-03-03 14:05                       ` Christoph Hellwig
@ 2016-03-04  1:36                         ` Darrick J. Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Darrick J. Wong @ 2016-03-04  1:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Thu, Mar 03, 2016 at 06:05:56AM -0800, Christoph Hellwig wrote:
> On Wed, Mar 02, 2016 at 01:21:01PM -0800, Darrick J. Wong wrote:
> > Ok.  I think the problem is that making changes to the refcount btree eats
> > up our entire reservation in certain cases.  Can you try the following bandaid?
> > This should give us enough room to handle splitting the btree at both ends
> > of a range that we're refcount-changing.
> 
> This seems to work in general.  I ran into one log related hang when
> running -g auto on nfs, but it's not been reproducible so far.

It still breaks on 1k block filesystems, which confirms my suspicions that
we really need to have enough space in the reservation to handle the worst
case that every block in a n-block truncation has a separate refcountbt
record. :( :(

(Or break up our truncation activities, I suppose...)

--D

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-03-04  1:36 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-10  9:30 block allocations for the refcount btree Christoph Hellwig
2016-02-10  9:50 ` Darrick J. Wong
2016-02-10 19:07   ` Christoph Hellwig
2016-02-10 21:40     ` Dave Chinner
2016-02-11 14:09       ` Brian Foster
2016-02-11 20:21         ` Dave Chinner
2016-02-12 19:10       ` Christoph Hellwig
2016-02-13  2:33         ` Dave Chinner
2016-02-13  4:44           ` Darrick J. Wong
2016-02-13  8:02             ` Christoph Hellwig
2016-02-13  7:48           ` Christoph Hellwig
2016-02-14  0:21             ` Dave Chinner
2016-03-01 18:18         ` Darrick J. Wong
2016-03-01 20:40           ` Christoph Hellwig
2016-03-02  5:24             ` Darrick J. Wong
2016-03-02  9:59               ` Christoph Hellwig
2016-03-02 16:41                 ` Darrick J. Wong
2016-03-02 16:57                   ` Christoph Hellwig
2016-03-02 21:21                     ` Darrick J. Wong
2016-03-03 14:05                       ` Christoph Hellwig
2016-03-04  1:36                         ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.