All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: fix indlen accounting error on partial delalloc conversion
@ 2017-05-03 18:46 Brian Foster
  2017-05-09 16:09 ` Darrick J. Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Brian Foster @ 2017-05-03 18:46 UTC (permalink / raw)
  To: linux-xfs

The delalloc -> real block conversion path uses an incorrect
calculation in the case where the middle part of a delalloc extent
is being converted. This is documented as a rare situation because
XFS generally attempts to maximize contiguity by converting as much
of a delalloc extent as possible.

If this situation does occur, the indlen reservation for the two new
delalloc extents left behind by the conversion of the middle range
is calculated and compared with the original reservation. If more
blocks are required, the delta is allocated from the global block
pool. This delta value can be characterized as the difference
between the new total requirement (temp + temp2) and the currently
available reservation minus those blocks that have already been
allocated (startblockval(PREV.br_startblock) - allocated).

The problem is that the current code does not account for previously
allocated blocks correctly. It subtracts the current allocation
count from the (new - old) delta rather than the old indlen
reservation. This means that more indlen blocks than have been
allocated end up stashed in the remaining extents and free space
accounting is broken as a result.

Fix up the calculation to subtract the allocated block count from
the original extent indlen and thus correctly allocate the
reservation delta based on the difference between the new total
requirement and the unused blocks from the original reservation.
Also remove a bogus assert that contradicts the fact that the new
indlen reservation can be larger than the original indlen
reservation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Hi all,

So while I fortunately was able to reproduce this problem and verify the
patch, I'm not able to do so in any way that facilitates a generic test
case. To reproduce, I had to hack up the allocation path to always do
single physical block allocations and otherwise not attempt to convert
whole delalloc extents, hack up an ioctl to do a
filemap_write_and_wait_range(), and run a test that repeatedly does 1M
buffered writes followed by fragmented, page-sized writebacks of that 1M
extent. This forces partial (middle) extent conversions until the extent
tree grows enough to require a bmbt block allocation. When that occurs,
I always end up with the following free space accounting inconsistency
reported via xfs_repair:

Phase 1 - find and verify superblock...     
Phase 2 - using internal log
        - zero log...               
        - scan filesystem freespace and inode maps...
sb_fdblocks 3929316, counted 3929314             
...

With this patch applied, the same test no longer reproduces the
corruption. This otherwise survives my xfstests testing (without the
aforementioned hacks). Thoughts, reviews, flames appreciated.

Brian

 fs/xfs/libxfs/xfs_bmap.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index f02eb76..8adb91b 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -2065,8 +2065,10 @@ xfs_bmap_add_extent_delay_real(
 		}
 		temp = xfs_bmap_worst_indlen(bma->ip, temp);
 		temp2 = xfs_bmap_worst_indlen(bma->ip, temp2);
-		diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) -
-			(bma->cur ? bma->cur->bc_private.b.allocated : 0));
+		diff = (int)(temp + temp2 -
+			     (startblockval(PREV.br_startblock) -
+			      (bma->cur ?
+			       bma->cur->bc_private.b.allocated : 0)));
 		if (diff > 0) {
 			error = xfs_mod_fdblocks(bma->ip->i_mount,
 						 -((int64_t)diff), false);
@@ -2123,7 +2125,6 @@ xfs_bmap_add_extent_delay_real(
 		temp = da_new;
 		if (bma->cur)
 			temp += bma->cur->bc_private.b.allocated;
-		ASSERT(temp <= da_old);
 		if (temp < da_old)
 			xfs_mod_fdblocks(bma->ip->i_mount,
 					(int64_t)(da_old - temp), false);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] xfs: fix indlen accounting error on partial delalloc conversion
  2017-05-03 18:46 [PATCH] xfs: fix indlen accounting error on partial delalloc conversion Brian Foster
@ 2017-05-09 16:09 ` Darrick J. Wong
  2017-05-09 16:32   ` Brian Foster
  0 siblings, 1 reply; 3+ messages in thread
From: Darrick J. Wong @ 2017-05-09 16:09 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Wed, May 03, 2017 at 02:46:26PM -0400, Brian Foster wrote:
> The delalloc -> real block conversion path uses an incorrect
> calculation in the case where the middle part of a delalloc extent
> is being converted. This is documented as a rare situation because
> XFS generally attempts to maximize contiguity by converting as much
> of a delalloc extent as possible.
> 
> If this situation does occur, the indlen reservation for the two new
> delalloc extents left behind by the conversion of the middle range
> is calculated and compared with the original reservation. If more
> blocks are required, the delta is allocated from the global block
> pool. This delta value can be characterized as the difference
> between the new total requirement (temp + temp2) and the currently
> available reservation minus those blocks that have already been
> allocated (startblockval(PREV.br_startblock) - allocated).
> 
> The problem is that the current code does not account for previously
> allocated blocks correctly. It subtracts the current allocation
> count from the (new - old) delta rather than the old indlen
> reservation. This means that more indlen blocks than have been
> allocated end up stashed in the remaining extents and free space
> accounting is broken as a result.
> 
> Fix up the calculation to subtract the allocated block count from
> the original extent indlen and thus correctly allocate the
> reservation delta based on the difference between the new total
> requirement and the unused blocks from the original reservation.
> Also remove a bogus assert that contradicts the fact that the new
> indlen reservation can be larger than the original indlen
> reservation.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Hi all,
> 
> So while I fortunately was able to reproduce this problem and verify the
> patch, I'm not able to do so in any way that facilitates a generic test
> case. To reproduce, I had to hack up the allocation path to always do
> single physical block allocations and otherwise not attempt to convert
> whole delalloc extents, hack up an ioctl to do a
> filemap_write_and_wait_range(), and run a test that repeatedly does 1M
> buffered writes followed by fragmented, page-sized writebacks of that 1M
> extent. This forces partial (middle) extent conversions until the extent
> tree grows enough to require a bmbt block allocation. When that occurs,
> I always end up with the following free space accounting inconsistency
> reported via xfs_repair:
> 
> Phase 1 - find and verify superblock...     
> Phase 2 - using internal log
>         - zero log...               
>         - scan filesystem freespace and inode maps...
> sb_fdblocks 3929316, counted 3929314             
> ...
> 
> With this patch applied, the same test no longer reproduces the
> corruption. This otherwise survives my xfstests testing (without the
> aforementioned hacks). Thoughts, reviews, flames appreciated.
> 
> Brian
> 
>  fs/xfs/libxfs/xfs_bmap.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index f02eb76..8adb91b 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -2065,8 +2065,10 @@ xfs_bmap_add_extent_delay_real(
>  		}
>  		temp = xfs_bmap_worst_indlen(bma->ip, temp);
>  		temp2 = xfs_bmap_worst_indlen(bma->ip, temp2);
> -		diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) -
> -			(bma->cur ? bma->cur->bc_private.b.allocated : 0));
> +		diff = (int)(temp + temp2 -
> +			     (startblockval(PREV.br_startblock) -
> +			      (bma->cur ?
> +			       bma->cur->bc_private.b.allocated : 0)));

<thinking aloud again>

Prior to this operation we made a delalloc reservation in which we
reserved a number of blocks for bmbt expansion by subtracting that
quantity from incore fdblocks; that is startblockval(PREV.br_startblock).
Then we did some bmbt magic which used up cur->bc_private.b.allocated
blocks from the reservation.  Since those blocks were already subtracted
from the incore fdblocks, we don't need to do that now.  After the magic
was over, we now have two smaller delalloc reservations (with a real
extent in the middle), each with their own reservations for bmbt shape
changing expansion.

Now we aim to make the accounting of incore fdblocks square again by
checking if we are now hanging on to more blocks than we were before and
subtracting the difference from fdblocks if so.  (We deal separately with
the possibility of hanging onto /fewer/ blocks than before.)

Ok, so first we reserved startblockval(PREV.br_startblock).

After splitting the da extent in the middle we now have two da extents;
the effective indlen reservation of the whole range we just touched is:
xfs_bmap_worst_indlen(bma->ip, temp) + xfs_bmap_worst_indlen(bma->ip, temp2);

Then we (may have) fed bma->cur->bc_private.b.allocated blocks to the bmbt.
The number of blocks we now know about is:

xfs_bmap_worst_indlen(bma->ip, temp) +
xfs_bmap_worst_indlen(bma->ip, temp2) +
bma->cur->bc_private.b.allocated

Therefore, the number of blocks we are over (if at all) is:

xfs_bmap_worst_indlen(bma->ip, temp) +
xfs_bmap_worst_indlen(bma->ip, temp2) +
bma->cur->bc_private.b.allocated -
startblockval(PREV.br_startblock)

Or, put another way:

(xfs_bmap_worst_indlen(bma->ip, temp) +
 xfs_bmap_worst_indlen(bma->ip, temp2)) -
(startblockval(PREV.br_startblock) -
 bma->cur->bc_private.b.allocated)

(Exactly what Brian wrote above.)

If that quantity is positive, we're now hanging on to more blocks than
we were before, so we subtract that many from fdblocks.  That part looks
ok, though I have one question -- why is diff an int?  temp/temp2 are
both xfs_filblks_t (__uint64_t), so the whole expression is promoted to
a 64-bit operation, then the results are squashed down into a 32-bit
int, which is then cast back to int64_t.  With our short MAXEXTLEN I
think it's impossible ever to have diff > 2^32, but the casting back and
forth caught my eye.

Aside from that, it looks ok...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

>  		if (diff > 0) {
>  			error = xfs_mod_fdblocks(bma->ip->i_mount,
>  						 -((int64_t)diff), false);
> @@ -2123,7 +2125,6 @@ xfs_bmap_add_extent_delay_real(
>  		temp = da_new;
>  		if (bma->cur)
>  			temp += bma->cur->bc_private.b.allocated;
> -		ASSERT(temp <= da_old);
>  		if (temp < da_old)
>  			xfs_mod_fdblocks(bma->ip->i_mount,
>  					(int64_t)(da_old - temp), false);
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] xfs: fix indlen accounting error on partial delalloc conversion
  2017-05-09 16:09 ` Darrick J. Wong
@ 2017-05-09 16:32   ` Brian Foster
  0 siblings, 0 replies; 3+ messages in thread
From: Brian Foster @ 2017-05-09 16:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, May 09, 2017 at 09:09:48AM -0700, Darrick J. Wong wrote:
> On Wed, May 03, 2017 at 02:46:26PM -0400, Brian Foster wrote:
> > The delalloc -> real block conversion path uses an incorrect
> > calculation in the case where the middle part of a delalloc extent
> > is being converted. This is documented as a rare situation because
> > XFS generally attempts to maximize contiguity by converting as much
> > of a delalloc extent as possible.
> > 
> > If this situation does occur, the indlen reservation for the two new
> > delalloc extents left behind by the conversion of the middle range
> > is calculated and compared with the original reservation. If more
> > blocks are required, the delta is allocated from the global block
> > pool. This delta value can be characterized as the difference
> > between the new total requirement (temp + temp2) and the currently
> > available reservation minus those blocks that have already been
> > allocated (startblockval(PREV.br_startblock) - allocated).
> > 
> > The problem is that the current code does not account for previously
> > allocated blocks correctly. It subtracts the current allocation
> > count from the (new - old) delta rather than the old indlen
> > reservation. This means that more indlen blocks than have been
> > allocated end up stashed in the remaining extents and free space
> > accounting is broken as a result.
> > 
> > Fix up the calculation to subtract the allocated block count from
> > the original extent indlen and thus correctly allocate the
> > reservation delta based on the difference between the new total
> > requirement and the unused blocks from the original reservation.
> > Also remove a bogus assert that contradicts the fact that the new
> > indlen reservation can be larger than the original indlen
> > reservation.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > Hi all,
> > 
> > So while I fortunately was able to reproduce this problem and verify the
> > patch, I'm not able to do so in any way that facilitates a generic test
> > case. To reproduce, I had to hack up the allocation path to always do
> > single physical block allocations and otherwise not attempt to convert
> > whole delalloc extents, hack up an ioctl to do a
> > filemap_write_and_wait_range(), and run a test that repeatedly does 1M
> > buffered writes followed by fragmented, page-sized writebacks of that 1M
> > extent. This forces partial (middle) extent conversions until the extent
> > tree grows enough to require a bmbt block allocation. When that occurs,
> > I always end up with the following free space accounting inconsistency
> > reported via xfs_repair:
> > 
> > Phase 1 - find and verify superblock...     
> > Phase 2 - using internal log
> >         - zero log...               
> >         - scan filesystem freespace and inode maps...
> > sb_fdblocks 3929316, counted 3929314             
> > ...
> > 
> > With this patch applied, the same test no longer reproduces the
> > corruption. This otherwise survives my xfstests testing (without the
> > aforementioned hacks). Thoughts, reviews, flames appreciated.
> > 
> > Brian
> > 
> >  fs/xfs/libxfs/xfs_bmap.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index f02eb76..8adb91b 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -2065,8 +2065,10 @@ xfs_bmap_add_extent_delay_real(
> >  		}
> >  		temp = xfs_bmap_worst_indlen(bma->ip, temp);
> >  		temp2 = xfs_bmap_worst_indlen(bma->ip, temp2);
> > -		diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) -
> > -			(bma->cur ? bma->cur->bc_private.b.allocated : 0));
> > +		diff = (int)(temp + temp2 -
> > +			     (startblockval(PREV.br_startblock) -
> > +			      (bma->cur ?
> > +			       bma->cur->bc_private.b.allocated : 0)));
> 
> <thinking aloud again>
> 
> Prior to this operation we made a delalloc reservation in which we
> reserved a number of blocks for bmbt expansion by subtracting that
> quantity from incore fdblocks; that is startblockval(PREV.br_startblock).
> Then we did some bmbt magic which used up cur->bc_private.b.allocated
> blocks from the reservation.  Since those blocks were already subtracted
> from the incore fdblocks, we don't need to do that now.  After the magic
> was over, we now have two smaller delalloc reservations (with a real
> extent in the middle), each with their own reservations for bmbt shape
> changing expansion.
> 
> Now we aim to make the accounting of incore fdblocks square again by
> checking if we are now hanging on to more blocks than we were before and
> subtracting the difference from fdblocks if so.  (We deal separately with
> the possibility of hanging onto /fewer/ blocks than before.)
> 
> Ok, so first we reserved startblockval(PREV.br_startblock).
> 
> After splitting the da extent in the middle we now have two da extents;
> the effective indlen reservation of the whole range we just touched is:
> xfs_bmap_worst_indlen(bma->ip, temp) + xfs_bmap_worst_indlen(bma->ip, temp2);
> 
> Then we (may have) fed bma->cur->bc_private.b.allocated blocks to the bmbt.
> The number of blocks we now know about is:
> 
> xfs_bmap_worst_indlen(bma->ip, temp) +
> xfs_bmap_worst_indlen(bma->ip, temp2) +
> bma->cur->bc_private.b.allocated
> 
> Therefore, the number of blocks we are over (if at all) is:
> 
> xfs_bmap_worst_indlen(bma->ip, temp) +
> xfs_bmap_worst_indlen(bma->ip, temp2) +
> bma->cur->bc_private.b.allocated -
> startblockval(PREV.br_startblock)
> 
> Or, put another way:
> 
> (xfs_bmap_worst_indlen(bma->ip, temp) +
>  xfs_bmap_worst_indlen(bma->ip, temp2)) -
> (startblockval(PREV.br_startblock) -
>  bma->cur->bc_private.b.allocated)
> 
> (Exactly what Brian wrote above.)
> 
> If that quantity is positive, we're now hanging on to more blocks than
> we were before, so we subtract that many from fdblocks.  That part looks
> ok, though I have one question -- why is diff an int?  temp/temp2 are
> both xfs_filblks_t (__uint64_t), so the whole expression is promoted to
> a 64-bit operation, then the results are squashed down into a 32-bit
> int, which is then cast back to int64_t.  With our short MAXEXTLEN I
> think it's impossible ever to have diff > 2^32, but the casting back and
> forth caught my eye.
> 

As to why it is signed, I'd guess that either it was coded that way to
avoid a separate conditional or perhaps diff was previously used
elsewhere in this function..? I'm not really sure why it uses 32-bit vs.
64-bit tbh. (I don't think it matters, but I'm fine if you want to use
an int64_t or something and clean up the casting).

> Aside from that, it looks ok...
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 

Thanks for review.

Brian

> --D
> 
> >  		if (diff > 0) {
> >  			error = xfs_mod_fdblocks(bma->ip->i_mount,
> >  						 -((int64_t)diff), false);
> > @@ -2123,7 +2125,6 @@ xfs_bmap_add_extent_delay_real(
> >  		temp = da_new;
> >  		if (bma->cur)
> >  			temp += bma->cur->bc_private.b.allocated;
> > -		ASSERT(temp <= da_old);
> >  		if (temp < da_old)
> >  			xfs_mod_fdblocks(bma->ip->i_mount,
> >  					(int64_t)(da_old - temp), false);
> > -- 
> > 2.7.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-05-09 16:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-03 18:46 [PATCH] xfs: fix indlen accounting error on partial delalloc conversion Brian Foster
2017-05-09 16:09 ` Darrick J. Wong
2017-05-09 16:32   ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.