Linux-XFS Archive on lore.kernel.org
* [PATCH 0/2] xfs: fixes for realtime file truncation
@ 2019-11-26 20:13 Omar Sandoval
  2019-11-26 20:13 ` [PATCH 1/2] xfs: fix realtime file data space leak Omar Sandoval
  2019-11-26 20:13 ` [PATCH 2/2] xfs: don't check for AG deadlock for realtime files in bunmapi Omar Sandoval
  0 siblings, 2 replies; 5+ messages in thread
From: Omar Sandoval @ 2019-11-26 20:13 UTC (permalink / raw)
  To: linux-xfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Hello,

These two patches fix bugs in a corner case of truncating realtime
files. We first hit this in production as a soft lockup while truncating
certain files; further investigation also turned up a space leak. The
lockup is caused by an interaction between the two bugs fixed by these
patches. I've also sent a reproducer for xfstests.

These patches are based on v5.4. Thanks!

Omar Sandoval (2):
  xfs: fix realtime file data space leak
  xfs: don't check for AG deadlock for realtime files in bunmapi

 fs/xfs/libxfs/xfs_bmap.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

-- 
2.24.0



* [PATCH 1/2] xfs: fix realtime file data space leak
  2019-11-26 20:13 [PATCH 0/2] xfs: fixes for realtime file truncation Omar Sandoval
@ 2019-11-26 20:13 ` Omar Sandoval
  2019-12-03  1:56   ` Darrick J. Wong
  2019-11-26 20:13 ` [PATCH 2/2] xfs: don't check for AG deadlock for realtime files in bunmapi Omar Sandoval
  1 sibling, 1 reply; 5+ messages in thread
From: Omar Sandoval @ 2019-11-26 20:13 UTC (permalink / raw)
  To: linux-xfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Realtime files in XFS allocate extents in rextsize units. However, the
written/unwritten state of those extents is still tracked in blocksize
units. Therefore, a realtime file can be split up into written and
unwritten extents that are not necessarily aligned to the realtime
extent size. __xfs_bunmapi() has some logic to handle these various
corner cases. Consider how it handles the following case:

1. The last extent is unwritten.
2. The last extent is smaller than the realtime extent size.
3. startblock of the last extent is not aligned to the realtime extent
   size, but startblock + blockcount is.
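
As a rough illustration (not kernel code), the three conditions above can
be written as one predicate over block numbers. The name
demo_is_problem_extent() and its parameters are made up for this sketch;
__xfs_bunmapi() tests these conditions piecemeal rather than in a single
helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if an extent matches the corner case
 * described above. All quantities are in filesystem blocks.
 */
static bool demo_is_problem_extent(bool unwritten, uint64_t startblock,
				   uint64_t blockcount, uint64_t rextsize)
{
	assert(rextsize != 0);

	return unwritten &&					/* 1. unwritten */
	       blockcount < rextsize &&				/* 2. shorter than one rt extent */
	       startblock % rextsize != 0 &&			/* 3a. front not aligned */
	       (startblock + blockcount) % rextsize == 0;	/* 3b. end aligned */
}
```

With rextsize = 16, an unwritten 4-block extent starting at block 2604
matches (2604 is not a multiple of 16, 2608 is), while a 16-block extent
starting at block 2592 does not.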

In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real()
to set the second-to-last extent to unwritten. This should merge the
last and second-to-last extents, so __xfs_bunmapi() moves on to the
second-to-last extent.

However, if the size of the last and second-to-last extents combined is
greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not
merge the two extents. When that happens, __xfs_bunmapi() skips past the
last extent without unmapping it, thus leaking the space.

Fix it by only unwriting the minimum amount needed to align the last
extent to the realtime extent size; that small range is guaranteed to
merge with the last extent.
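
For illustration only, the trimming arithmetic can be modeled in
userspace as below. This is a sketch under stated assumptions, not the
kernel code: struct demo_irec, demo_trim_prev() and demo_can_merge() are
made-up names, DEMO_MAXEXTLEN assumes the 21-bit MAXEXTLEN value, and
only the length half of the merge test is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of MAXEXTLEN (21-bit extent length field). */
#define DEMO_MAXEXTLEN	UINT64_C(0x1fffff)

/* Hypothetical, simplified stand-in for struct xfs_bmbt_irec. */
struct demo_irec {
	uint64_t startoff;	/* file offset, in blocks */
	uint64_t startblock;	/* device block number */
	uint64_t blockcount;	/* length, in blocks */
};

static uint64_t demo_max3(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t m = a > b ? a : b;

	return m > c ? m : c;
}

/* Two extents can only be combined if the result fits in one record. */
static bool demo_can_merge(uint64_t len_a, uint64_t len_b)
{
	return len_a + len_b <= DEMO_MAXEXTLEN;
}

/*
 * Trim @prev so that only the part needed to align @del gets converted
 * to unwritten. Returns the number of blocks cut off the front of @prev.
 */
static uint64_t demo_trim_prev(struct demo_irec *prev,
			       const struct demo_irec *del,
			       uint64_t start, uint64_t rextsize)
{
	uint64_t mod, unwrite_start, skip;

	assert(rextsize != 0);
	assert(del->startblock == prev->startblock + prev->blockcount);

	mod = del->startblock % rextsize;
	/*
	 * Sketch-only assumption: file offsets and block numbers share
	 * the same alignment, so the subtraction below cannot wrap.
	 */
	assert(del->startoff >= mod);

	unwrite_start = demo_max3(start, del->startoff - mod,
				  prev->startoff);
	skip = unwrite_start - prev->startoff;
	prev->startoff = unwrite_start;
	prev->startblock += skip;
	prev->blockcount -= skip;
	return skip;
}
```

Take rextsize = 16, a written 2097148-block extent starting at block 1600
and an unwritten 4-block extent right behind it. Converting the whole
previous extent would need a 2097152-block record, one block more than
the assumed MAXEXTLEN, so no merge happens. After trimming, 12 blocks
remain to convert and 12 + 4 = 16 merges without trouble.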

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 02469d59c787..6f8791a1e460 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5376,16 +5376,17 @@ __xfs_bunmapi(
 		}
 		div_u64_rem(del.br_startblock, mp->m_sb.sb_rextsize, &mod);
 		if (mod) {
+			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
+
 			/*
 			 * Realtime extent is lined up at the end but not
 			 * at the front.  We'll get rid of full extents if
 			 * we can.
 			 */
-			mod = mp->m_sb.sb_rextsize - mod;
-			if (del.br_blockcount > mod) {
-				del.br_blockcount -= mod;
-				del.br_startoff += mod;
-				del.br_startblock += mod;
+			if (del.br_blockcount > off) {
+				del.br_blockcount -= off;
+				del.br_startoff += off;
+				del.br_startblock += off;
 			} else if (del.br_startoff == start &&
 				   (del.br_state == XFS_EXT_UNWRITTEN ||
 				    tp->t_blk_res == 0)) {
@@ -5403,6 +5404,7 @@ __xfs_bunmapi(
 				continue;
 			} else if (del.br_state == XFS_EXT_UNWRITTEN) {
 				struct xfs_bmbt_irec	prev;
+				xfs_fileoff_t		unwrite_start;
 
 				/*
 				 * This one is already unwritten.
@@ -5416,12 +5418,13 @@ __xfs_bunmapi(
 				ASSERT(!isnullstartblock(prev.br_startblock));
 				ASSERT(del.br_startblock ==
 				       prev.br_startblock + prev.br_blockcount);
-				if (prev.br_startoff < start) {
-					mod = start - prev.br_startoff;
-					prev.br_blockcount -= mod;
-					prev.br_startblock += mod;
-					prev.br_startoff = start;
-				}
+				unwrite_start = max3(start,
+						     del.br_startoff - mod,
+						     prev.br_startoff);
+				mod = unwrite_start - prev.br_startoff;
+				prev.br_startoff = unwrite_start;
+				prev.br_startblock += mod;
+				prev.br_blockcount -= mod;
 				prev.br_state = XFS_EXT_UNWRITTEN;
 				error = xfs_bmap_add_extent_unwritten_real(tp,
 						ip, whichfork, &icur, &cur,
-- 
2.24.0



* [PATCH 2/2] xfs: don't check for AG deadlock for realtime files in bunmapi
  2019-11-26 20:13 [PATCH 0/2] xfs: fixes for realtime file truncation Omar Sandoval
  2019-11-26 20:13 ` [PATCH 1/2] xfs: fix realtime file data space leak Omar Sandoval
@ 2019-11-26 20:13 ` Omar Sandoval
  2019-11-27  0:36   ` Darrick J. Wong
  1 sibling, 1 reply; 5+ messages in thread
From: Omar Sandoval @ 2019-11-26 20:13 UTC (permalink / raw)
  To: linux-xfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Commit 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi") added
a check in __xfs_bunmapi() to stop early if we would touch multiple AGs
in the wrong order. However, this check doesn't apply to realtime
files, whose data blocks aren't allocated from AGs. In most cases, it
just makes us do unnecessary commits. But without the fix from the
previous commit ("xfs: fix realtime file data space leak"), if the last
and second-to-last extents also happen to have different "AG numbers",
then the break causes __xfs_bunmapi() to return without making any
progress, which sends xfs_itruncate_extents_flags() into an infinite
loop.
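
To make the effect of the one-line change concrete, here is a small
userspace model of the ordering check. It is a sketch, not the kernel
code: demo_ag_check_breaks() and DEMO_NULLAGNUMBER are made-up names, and
the "AG number" is derived by shifting the start block by an agblklog
parameter, which is what XFS_FSB_TO_AGNO() is assumed to do here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NULLAGNUMBER	UINT32_MAX	/* stand-in for NULLAGNUMBER */

/*
 * Model of the check with the fix applied. Returns true if the unmap
 * loop would break out early for this extent. *prev_agno carries the
 * AG number seen on the previous iteration.
 */
static bool demo_ag_check_breaks(bool wasdel, bool isrt,
				 uint32_t *prev_agno,
				 uint64_t startblock, unsigned int agblklog)
{
	assert(agblklog < 64);

	if (!wasdel && !isrt) {
		uint32_t agno = (uint32_t)(startblock >> agblklog);

		if (*prev_agno != DEMO_NULLAGNUMBER && *prev_agno > agno)
			return true;
		*prev_agno = agno;
	}
	/* Realtime and delalloc extents never trip the ordering check. */
	return false;
}
```

With agblklog = 10, walking from block 5000 ("AG 4") back to block 100
("AG 0") breaks out early for a regular file. For a realtime file the
same block numbers are realtime device blocks, the derived AG numbers are
meaningless, and the model never breaks.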

Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6f8791a1e460..a11b6e7cb35f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5300,7 +5300,7 @@ __xfs_bunmapi(
 		 * Make sure we don't touch multiple AGF headers out of order
 		 * in a single transaction, as that could cause AB-BA deadlocks.
 		 */
-		if (!wasdel) {
+		if (!wasdel && !isrt) {
 			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
 			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
 				break;
-- 
2.24.0



* Re: [PATCH 2/2] xfs: don't check for AG deadlock for realtime files in bunmapi
  2019-11-26 20:13 ` [PATCH 2/2] xfs: don't check for AG deadlock for realtime files in bunmapi Omar Sandoval
@ 2019-11-27  0:36   ` Darrick J. Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2019-11-27  0:36 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-xfs, kernel-team

On Tue, Nov 26, 2019 at 12:13:29PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Commit 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi") added
> a check in __xfs_bunmapi() to stop early if we would touch multiple AGs
> in the wrong order. However, this check doesn't apply to realtime
> files, whose data blocks aren't allocated from AGs. In most cases, it
> just makes us do unnecessary commits. But without the fix from the
> previous commit ("xfs: fix realtime file data space leak"), if the last
> and second-to-last extents also happen to have different "AG numbers",
> then the break causes __xfs_bunmapi() to return without making any
> progress, which sends xfs_itruncate_extents_flags() into an infinite
> loop.
> 
> Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Looks pretty straightforward,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D



* Re: [PATCH 1/2] xfs: fix realtime file data space leak
  2019-11-26 20:13 ` [PATCH 1/2] xfs: fix realtime file data space leak Omar Sandoval
@ 2019-12-03  1:56   ` Darrick J. Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2019-12-03  1:56 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-xfs, kernel-team

On Tue, Nov 26, 2019 at 12:13:28PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Realtime files in XFS allocate extents in rextsize units. However, the
> written/unwritten state of those extents is still tracked in blocksize
> units. Therefore, a realtime file can be split up into written and
> unwritten extents that are not necessarily aligned to the realtime
> extent size. __xfs_bunmapi() has some logic to handle these various
> corner cases. Consider how it handles the following case:
> 
> 1. The last extent is unwritten.
> 2. The last extent is smaller than the realtime extent size.
> 3. startblock of the last extent is not aligned to the realtime extent
>    size, but startblock + blockcount is.
> 
> In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real()
> to set the second-to-last extent to unwritten. This should merge the
> last and second-to-last extents, so __xfs_bunmapi() moves on to the
> second-to-last extent.
> 
> However, if the size of the last and second-to-last extents combined is
> greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not
> merge the two extents. When that happens, __xfs_bunmapi() skips past the
> last extent without unmapping it, thus leaking the space.
> 
> Fix it by only unwriting the minimum amount needed to align the last
> extent to the realtime extent size; that small range is guaranteed to
> merge with the last extent.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 02469d59c787..6f8791a1e460 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5376,16 +5376,17 @@ __xfs_bunmapi(
>  		}
>  		div_u64_rem(del.br_startblock, mp->m_sb.sb_rextsize, &mod);
>  		if (mod) {
> +			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
> +
>  			/*
>  			 * Realtime extent is lined up at the end but not
>  			 * at the front.  We'll get rid of full extents if
>  			 * we can.
>  			 */
> -			mod = mp->m_sb.sb_rextsize - mod;
> -			if (del.br_blockcount > mod) {
> -				del.br_blockcount -= mod;
> -				del.br_startoff += mod;
> -				del.br_startblock += mod;
> +			if (del.br_blockcount > off) {
> +				del.br_blockcount -= off;
> +				del.br_startoff += off;
> +				del.br_startblock += off;

Ok, so we make this change so that we no longer modify @mod once it's
set by the div_u64_rem() call...

>  			} else if (del.br_startoff == start &&
>  				   (del.br_state == XFS_EXT_UNWRITTEN ||
>  				    tp->t_blk_res == 0)) {
> @@ -5403,6 +5404,7 @@ __xfs_bunmapi(
>  				continue;
>  			} else if (del.br_state == XFS_EXT_UNWRITTEN) {
>  				struct xfs_bmbt_irec	prev;
> +				xfs_fileoff_t		unwrite_start;
>  
>  				/*
>  				 * This one is already unwritten.
> @@ -5416,12 +5418,13 @@ __xfs_bunmapi(
>  				ASSERT(!isnullstartblock(prev.br_startblock));
>  				ASSERT(del.br_startblock ==
>  				       prev.br_startblock + prev.br_blockcount);
> -				if (prev.br_startoff < start) {
> -					mod = start - prev.br_startoff;
> -					prev.br_blockcount -= mod;
> -					prev.br_startblock += mod;
> -					prev.br_startoff = start;
> -				}

...and here, we have a @del extent that is unwritten and a @prev extent
that is written.  We aim to trick xfs_bmap_add_extent_unwritten_real()
into extending @del towards startoff==0 and returning with @icur
pointing at @del (not @prev) so that the next time we go around the loop
we see an rextsize-aligned @del and simply unmap it...

> +				unwrite_start = max3(start,
> +						     del.br_startoff - mod,
> +						     prev.br_startoff);

...however, if @prev is too long to convert+combine with @del, the
conversion routine converts @prev to unwritten and returns with @icur
pointing to @prev, not @del.  That's how we leak @del.

This patch fixes that by capping the conversion at the start of the
realtime extent containing @del, which means that we can always merge
with @del and always return with @icur pointing at @del.  Ok, that's
exactly what the commit message says.

It was /really/ helpful to be able to use the test case to walk through
exactly what this patch is trying to fix.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +				mod = unwrite_start - prev.br_startoff;
> +				prev.br_startoff = unwrite_start;
> +				prev.br_startblock += mod;
> +				prev.br_blockcount -= mod;
>  				prev.br_state = XFS_EXT_UNWRITTEN;
>  				error = xfs_bmap_add_extent_unwritten_real(tp,
>  						ip, whichfork, &icur, &cur,
> -- 
> 2.24.0
> 

