* [PATCH 0/2] xfs: handle inode extent count mismatch
@ 2018-06-19  2:41 Dave Chinner
  2018-06-19  2:41 ` [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion Dave Chinner
  2018-06-19  2:41 ` [PATCH 2/2] xfs: More robust inode extent count validation Dave Chinner
  0 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2018-06-19  2:41 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

Wen Xu provided an image that caused a crash allocating an extent.
The AGFL was corrupt, as was the inode data fork extent count. The
combination of the two corruptions could lead to a delalloc extent
being allocated on write, and then when allocation fails because the
AGFL was corrupt, it would try to punch out the delalloc extent
which would then try to convert the format of the extent list in the
inode data fork without a transaction. This would crash.

The following two patches address this - the first makes
xfs_bunmapi() return EFSCORRUPTED if it tries to change the inode
fork format without a transaction context and avoids the crash. The
second makes the inode verifier detect this specific inode fork
corruption and prevents any attempt to access it with EFSCORRUPTED.

Comments?

Cheers,

Dave.



* [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-19  2:41 [PATCH 0/2] xfs: handle inode extent count mismatch Dave Chinner
@ 2018-06-19  2:41 ` Dave Chinner
  2018-06-19  4:54   ` Darrick J. Wong
  2018-06-19  2:41 ` [PATCH 2/2] xfs: More robust inode extent count validation Dave Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-06-19  2:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

If we are punching out a delalloc extent, xfs_bunmapi() does not
have a transaction context and should not ever need to convert the
on-disk extent format. If such a thing is attempted (e.g. via a
corrupt inode extent count in extent format) then we should abort
with an EFSCORRUPTED error. Unfortunately, we don't do that and
crash instead:

 XFS (loop0): page discard on page 0000000005fd24f3, inode 0x75e5, offset 0.
 ==================================================================
 BUG: KASAN: null-ptr-deref in xfs_alloc_get_freelist+0x115/0x350
 Read of size 8 at addr 0000000000000028 by task a.out/1406
 CPU: 0 PID: 1406 Comm: a.out Not tainted 4.17.0-rc4-kasan #2
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
 Call Trace:
  dump_stack+0x7b/0xb5
  kasan_report+0x10c/0x390
  __asan_load8+0x54/0x90
  xfs_alloc_get_freelist+0x115/0x350
  xfs_alloc_fix_freelist+0x35b/0x830
  xfs_alloc_vextent+0x215/0x990
  xfs_bmap_extents_to_btree+0x30d/0x940
.....

By returning an error here, we avoid such crashes when punching out
a delalloc page because we don't try to fix up an AG freelist
without a transaction. Hence we get an error like so:

XFS (loop0): page discard on page ffffea00040ae640, inode 0x75e5, offset 0.
XFS (loop0): page discard unable to remove delalloc mapping.

And the filesystem continues to operate and the stale mapping is
cleaned up when the inode is reclaimed.

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 01628f0c9a0c..6967ce8088d2 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5458,10 +5458,18 @@ __xfs_bunmapi(
 		*rlen = end - start + 1;
 
 	/*
-	 * Convert to a btree if necessary.
+	 * Convert the BMBT root format if necessary. This should only occur in
+	 * transaction contexts and not when removing delalloc extents from
+	 * the in-core extent tree. If we don't have a transaction, then we've
+	 * got some form of corruption somewhere, so return an error
+	 * immediately.
 	 */
 	if (xfs_bmap_needs_btree(ip, whichfork)) {
 		ASSERT(cur == NULL);
+		if (!tp) {
+			error = -EFSCORRUPTED;
+			goto error0;
+		}
 		error = xfs_bmap_extents_to_btree(tp, ip, firstblock, dfops,
 			&cur, 0, &tmp_logflags, whichfork);
 		logflags |= tmp_logflags;
@@ -5473,6 +5481,10 @@ __xfs_bunmapi(
 	 */
 	else if (xfs_bmap_wants_extents(ip, whichfork)) {
 		ASSERT(cur != NULL);
+		if (!tp) {
+			error = -EFSCORRUPTED;
+			goto error0;
+		}
 		error = xfs_bmap_btree_to_extents(tp, ip, cur, &tmp_logflags,
 			whichfork);
 		logflags |= tmp_logflags;
-- 
2.17.0
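
For readers skimming the diff, the guard added to both conversion
branches boils down to the pattern below. This is a minimal standalone
sketch and not kernel code: struct demo_trans, demo_convert_format()
and demo_bunmapi_tail() are hypothetical stand-ins for the transaction,
the two conversion helpers and the tail of __xfs_bunmapi(), and
DEMO_EFSCORRUPTED is a local placeholder value rather than the kernel's
EFSCORRUPTED definition.

```c
#include <stddef.h>

/* Placeholder error code for the sketch; not the kernel definition. */
#define DEMO_EFSCORRUPTED 117

/* Hypothetical transaction handle. */
struct demo_trans {
	int id;
};

/* Hypothetical conversion helper: it dereferences tp, so tp must be valid. */
static int demo_convert_format(struct demo_trans *tp)
{
	return tp->id >= 0 ? 0 : -1;
}

/*
 * Hypothetical tail of an unmap operation. If a format conversion is
 * needed but the caller has no transaction, report corruption instead
 * of handing a NULL transaction to the conversion helper.
 */
static int demo_bunmapi_tail(struct demo_trans *tp, int needs_conversion)
{
	if (!needs_conversion)
		return 0;
	if (!tp)
		return -DEMO_EFSCORRUPTED;
	return demo_convert_format(tp);
}
```

A transactionless caller that does not need a conversion still gets 0
back, so the common delalloc punch path is unaffected in this model;
only the "conversion wanted, no transaction" combination returns the
error.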



* [PATCH 2/2] xfs: More robust inode extent count validation
  2018-06-19  2:41 [PATCH 0/2] xfs: handle inode extent count mismatch Dave Chinner
  2018-06-19  2:41 ` [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion Dave Chinner
@ 2018-06-19  2:41 ` Dave Chinner
  2018-06-19  4:57   ` Darrick J. Wong
  2018-06-20  7:34   ` Christoph Hellwig
  1 sibling, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2018-06-19  2:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When the inode is in extent format, it can't have more extents than
fit in the inode fork. We don't currently check this, and so this
corruption goes unnoticed by the inode verifiers. This can lead to
crashes operating on invalid in-memory structures.

Attempts to access such an inode will now error out in the verifier
rather than allowing modification operations to proceed.

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h    |  3 ++
 fs/xfs/libxfs/xfs_inode_buf.c | 74 +++++++++++++++++++++--------------
 2 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 1c5a8aaf2bfc..1cb298fec274 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -962,6 +962,9 @@ typedef enum xfs_dinode_fmt {
 		XFS_DFORK_DSIZE(dip, mp) : \
 		XFS_DFORK_ASIZE(dip, mp))
 
+#define XFS_DFORK_MAXEXT(dip, mp, w) \
+	(XFS_DFORK_SIZE(dip, mp, w) / sizeof(xfs_bmbt_rec_t))
+
 /*
  * Return pointers to the data or attribute forks.
  */
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index d38d724534c4..a41b6e5519e0 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -374,6 +374,45 @@ xfs_log_dinode_to_disk(
 	}
 }
 
+static xfs_failaddr_t
+xfs_dinode_verify_fork(
+	struct xfs_dinode	*dip,
+	struct xfs_mount	*mp,
+	int			whichfork)
+{
+	uint32_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
+
+	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
+	case XFS_DINODE_FMT_LOCAL:
+		/*
+		 * no local regular files yet
+		 */
+		if (whichfork == XFS_DATA_FORK) {
+			if (S_ISREG(be16_to_cpu(dip->di_mode)))
+				return __this_address;
+			if (be64_to_cpu(dip->di_size) >
+					XFS_DFORK_SIZE(dip, mp, whichfork))
+				return __this_address;
+		}
+		if (di_nextents)
+			return __this_address;
+		/* fall through */
+	case XFS_DINODE_FMT_EXTENTS:
+		if (di_nextents > XFS_DFORK_MAXEXT(dip, mp, whichfork))
+			return __this_address;
+	case XFS_DINODE_FMT_BTREE:
+		if (whichfork == XFS_ATTR_FORK)
+			if (di_nextents > MAXAEXTNUM)
+				return __this_address;
+		else if (di_nextents > MAXEXTNUM)
+			return __this_address;
+		break;
+	default:
+		return __this_address;
+	}
+	return NULL;
+}
+
 xfs_failaddr_t
 xfs_dinode_verify(
 	struct xfs_mount	*mp,
@@ -441,24 +480,9 @@ xfs_dinode_verify(
 	case S_IFREG:
 	case S_IFLNK:
 	case S_IFDIR:
-		switch (dip->di_format) {
-		case XFS_DINODE_FMT_LOCAL:
-			/*
-			 * no local regular files yet
-			 */
-			if (S_ISREG(mode))
-				return __this_address;
-			if (di_size > XFS_DFORK_DSIZE(dip, mp))
-				return __this_address;
-			if (dip->di_nextents)
-				return __this_address;
-			/* fall through */
-		case XFS_DINODE_FMT_EXTENTS:
-		case XFS_DINODE_FMT_BTREE:
-			break;
-		default:
-			return __this_address;
-		}
+		fa = xfs_dinode_verify_fork(dip, mp, XFS_DATA_FORK);
+		if (fa)
+			return fa;
 		break;
 	case 0:
 		/* Uninitialized inode ok. */
@@ -468,17 +492,9 @@ xfs_dinode_verify(
 	}
 
 	if (XFS_DFORK_Q(dip)) {
-		switch (dip->di_aformat) {
-		case XFS_DINODE_FMT_LOCAL:
-			if (dip->di_anextents)
-				return __this_address;
-		/* fall through */
-		case XFS_DINODE_FMT_EXTENTS:
-		case XFS_DINODE_FMT_BTREE:
-			break;
-		default:
-			return __this_address;
-		}
+		fa = xfs_dinode_verify_fork(dip, mp, XFS_ATTR_FORK);
+		if (fa)
+			return fa;
 	} else {
 		/*
 		 * If there is no fork offset, this may be a freshly-made inode
-- 
2.17.0
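
One thing to watch in the XFS_DINODE_FMT_BTREE case of
xfs_dinode_verify_fork() above: C attaches an else to the nearest
unmatched if, regardless of indentation. Read that way, the
"else if (di_nextents > MAXEXTNUM)" in the hunk as posted pairs with the
inner "if (di_nextents > MAXAEXTNUM)", so the MAXEXTNUM test is only
reached for the attr fork. The standalone sketch below spells out both
readings with explicit braces. All demo_* and DEMO_* names are
hypothetical, and the limits are small made-up numbers, not the real
MAXEXTNUM and MAXAEXTNUM values.

```c
#include <stdbool.h>

#define DEMO_DATA_FORK	0
#define DEMO_ATTR_FORK	1
#define DEMO_MAXAEXTNUM	10	/* made-up attr fork limit */
#define DEMO_MAXEXTNUM	100	/* made-up data fork limit */

/*
 * How the compiler parses the unbraced form: the else belongs to the
 * inner if, so nothing is checked when whichfork is the data fork.
 * Returns true when the extent count is rejected.
 */
static bool demo_as_parsed(int whichfork, unsigned int nextents)
{
	if (whichfork == DEMO_ATTR_FORK) {
		if (nextents > DEMO_MAXAEXTNUM)
			return true;
		else if (nextents > DEMO_MAXEXTNUM)
			return true;
	}
	return false;
}

/* What the indentation suggests: one limit per fork. */
static bool demo_as_intended(int whichfork, unsigned int nextents)
{
	if (whichfork == DEMO_ATTR_FORK) {
		if (nextents > DEMO_MAXAEXTNUM)
			return true;
	} else if (nextents > DEMO_MAXEXTNUM) {
		return true;
	}
	return false;
}
```

With these made-up limits, a data fork claiming 1000 extents is accepted
by demo_as_parsed() and rejected by demo_as_intended(); the two agree
for the attr fork. Bracing the outer if, as in the second function, is
one way to remove the ambiguity.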



* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-19  2:41 ` [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion Dave Chinner
@ 2018-06-19  4:54   ` Darrick J. Wong
  2018-06-19  5:27     ` Dave Chinner
  2018-06-20  7:31     ` Christoph Hellwig
  0 siblings, 2 replies; 16+ messages in thread
From: Darrick J. Wong @ 2018-06-19  4:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 19, 2018 at 12:41:27PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If we are punching out a delalloc extent, xfs_bunmapi() does not
> have a transaction context and should not ever need to convert the
> on-disk extent format. If such a thing is attempted (e.g. via a
> corrupt inode extent count in extent format) then we should abort
> with an EFSCORRUPTED error. Unfortunately, we don't do that and
> crash instead:
> 
>  XFS (loop0): page discard on page 0000000005fd24f3, inode 0x75e5, offset 0.
>  ==================================================================
>  BUG: KASAN: null-ptr-deref in xfs_alloc_get_freelist+0x115/0x350
>  Read of size 8 at addr 0000000000000028 by task a.out/1406
>  CPU: 0 PID: 1406 Comm: a.out Not tainted 4.17.0-rc4-kasan #2
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
>  Call Trace:
>   dump_stack+0x7b/0xb5
>   kasan_report+0x10c/0x390
>   __asan_load8+0x54/0x90
>   xfs_alloc_get_freelist+0x115/0x350
>   xfs_alloc_fix_freelist+0x35b/0x830
>   xfs_alloc_vextent+0x215/0x990
>   xfs_bmap_extents_to_btree+0x30d/0x940
> .....
> 
> By returning an error here, we avoid such crashes when punching out
> a delalloc page because we don't try to fix up an AG freelist
> without a transaction. Hence we get an error like so:

Um, isn't erroring out here leaving a dirty bomb in the in-core metadata?
Like you say:

> XFS (loop0): page discard on page ffffea00040ae640, inode 0x75e5, offset 0.
> XFS (loop0): page discard unable to remove delalloc mapping.

We know the fs is corrupt, we might as well shut down now rather than
let this burp out later.

I get that people don't want to touch well seasoned code, but
xfs_bunmapi is this big unwieldy function that's crying out for a
refactor.  It's 330 lines long and can be called from various contexts
(data/attr fork, punch delalloc, etc.)...

...it's also weird that xfs_bmap_punch_delalloc_range calls xfs_bunmapi
with no transaction and a xfs_defer that we dump on the ground.

So yes, I think the patch does fix the crash, but it's kinda gross.

Thoughts?

--D

> And the filesystem continues to operate and the stale mapping is
> cleaned up when the inode is reclaimed.
> 
> Reported-by: Wen Xu <wen.xu@gatech.edu>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 01628f0c9a0c..6967ce8088d2 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5458,10 +5458,18 @@ __xfs_bunmapi(
>  		*rlen = end - start + 1;
>  
>  	/*
> -	 * Convert to a btree if necessary.
> +	 * Convert the BMBT root format if necessary. This should only occur in
> +	 * transaction contexts and not when removing delalloc extents from
> +	 * the in-core extent tree. If we don't have a transaction, then we've
> +	 * got some form of corruption somewhere, so return an error
> +	 * immediately.
>  	 */
>  	if (xfs_bmap_needs_btree(ip, whichfork)) {
>  		ASSERT(cur == NULL);
> +		if (!tp) {
> +			error = -EFSCORRUPTED;
> +			goto error0;
> +		}
>  		error = xfs_bmap_extents_to_btree(tp, ip, firstblock, dfops,
>  			&cur, 0, &tmp_logflags, whichfork);
>  		logflags |= tmp_logflags;
> @@ -5473,6 +5481,10 @@ __xfs_bunmapi(
>  	 */
>  	else if (xfs_bmap_wants_extents(ip, whichfork)) {
>  		ASSERT(cur != NULL);
> +		if (!tp) {
> +			error = -EFSCORRUPTED;
> +			goto error0;
> +		}
>  		error = xfs_bmap_btree_to_extents(tp, ip, cur, &tmp_logflags,
>  			whichfork);
>  		logflags |= tmp_logflags;
> -- 
> 2.17.0
> 


* Re: [PATCH 2/2] xfs: More robust inode extent count validation
  2018-06-19  2:41 ` [PATCH 2/2] xfs: More robust inode extent count validation Dave Chinner
@ 2018-06-19  4:57   ` Darrick J. Wong
  2018-06-19  5:29     ` Dave Chinner
  2018-06-20  7:34   ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2018-06-19  4:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 19, 2018 at 12:41:28PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When the inode is in extent format, it can't have more extents than
> fit in the inode fork. We don't currently check this, and so this
> corruption goes unnoticed by the inode verifiers. This can lead to
> crashes operating on invalid in-memory structures.
> 
> Attempts to access such an inode will now error out in the verifier
> rather than allowing modification operations to proceed.
> 
> Reported-by: Wen Xu <wen.xu@gatech.edu>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_format.h    |  3 ++
>  fs/xfs/libxfs/xfs_inode_buf.c | 74 +++++++++++++++++++++--------------
>  2 files changed, 48 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 1c5a8aaf2bfc..1cb298fec274 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -962,6 +962,9 @@ typedef enum xfs_dinode_fmt {
>  		XFS_DFORK_DSIZE(dip, mp) : \
>  		XFS_DFORK_ASIZE(dip, mp))
>  
> +#define XFS_DFORK_MAXEXT(dip, mp, w) \
> +	(XFS_DFORK_SIZE(dip, mp, w) / sizeof(xfs_bmbt_rec_t))
> +
>  /*
>   * Return pointers to the data or attribute forks.
>   */
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index d38d724534c4..a41b6e5519e0 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -374,6 +374,45 @@ xfs_log_dinode_to_disk(
>  	}
>  }
>  
> +static xfs_failaddr_t
> +xfs_dinode_verify_fork(
> +	struct xfs_dinode	*dip,
> +	struct xfs_mount	*mp,
> +	int			whichfork)
> +{
> +	uint32_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
> +
> +	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
> +	case XFS_DINODE_FMT_LOCAL:
> +		/*
> +		 * no local regular files yet
> +		 */
> +		if (whichfork == XFS_DATA_FORK) {
> +			if (S_ISREG(be16_to_cpu(dip->di_mode)))
> +				return __this_address;
> +			if (be64_to_cpu(dip->di_size) >
> +					XFS_DFORK_SIZE(dip, mp, whichfork))
> +				return __this_address;
> +		}
> +		if (di_nextents)
> +			return __this_address;
> +		/* fall through */

We could break here too, right?  There's no point in further checks of
di_nextents for local format forks.

> +	case XFS_DINODE_FMT_EXTENTS:
> +		if (di_nextents > XFS_DFORK_MAXEXT(dip, mp, whichfork))
> +			return __this_address;

Are we supposed to break here?

--D

> +	case XFS_DINODE_FMT_BTREE:
> +		if (whichfork == XFS_ATTR_FORK)
> +			if (di_nextents > MAXAEXTNUM)
> +				return __this_address;
> +		else if (di_nextents > MAXEXTNUM)
> +			return __this_address;
> +		break;
> +	default:
> +		return __this_address;
> +	}
> +	return NULL;
> +}
> +
>  xfs_failaddr_t
>  xfs_dinode_verify(
>  	struct xfs_mount	*mp,
> @@ -441,24 +480,9 @@ xfs_dinode_verify(
>  	case S_IFREG:
>  	case S_IFLNK:
>  	case S_IFDIR:
> -		switch (dip->di_format) {
> -		case XFS_DINODE_FMT_LOCAL:
> -			/*
> -			 * no local regular files yet
> -			 */
> -			if (S_ISREG(mode))
> -				return __this_address;
> -			if (di_size > XFS_DFORK_DSIZE(dip, mp))
> -				return __this_address;
> -			if (dip->di_nextents)
> -				return __this_address;
> -			/* fall through */
> -		case XFS_DINODE_FMT_EXTENTS:
> -		case XFS_DINODE_FMT_BTREE:
> -			break;
> -		default:
> -			return __this_address;
> -		}
> +		fa = xfs_dinode_verify_fork(dip, mp, XFS_DATA_FORK);
> +		if (fa)
> +			return fa;
>  		break;
>  	case 0:
>  		/* Uninitialized inode ok. */
> @@ -468,17 +492,9 @@ xfs_dinode_verify(
>  	}
>  
>  	if (XFS_DFORK_Q(dip)) {
> -		switch (dip->di_aformat) {
> -		case XFS_DINODE_FMT_LOCAL:
> -			if (dip->di_anextents)
> -				return __this_address;
> -		/* fall through */
> -		case XFS_DINODE_FMT_EXTENTS:
> -		case XFS_DINODE_FMT_BTREE:
> -			break;
> -		default:
> -			return __this_address;
> -		}
> +		fa = xfs_dinode_verify_fork(dip, mp, XFS_ATTR_FORK);
> +		if (fa)
> +			return fa;
>  	} else {
>  		/*
>  		 * If there is no fork offset, this may be a freshly-made inode
> -- 
> 2.17.0
> 


* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-19  4:54   ` Darrick J. Wong
@ 2018-06-19  5:27     ` Dave Chinner
  2018-06-19  6:06       ` Darrick J. Wong
  2018-06-20  7:31     ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-06-19  5:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 19, 2018 at 12:41:27PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > If we are punching out a delalloc extent, xfs_bunmapi() does not
> > have a transaction context and should not ever need to convert the
> > on-disk extent format. If such a thing is attempted (e.g. via a
> > corrupt inode extent count in extent format) then we should abort
> > with an EFSCORRUPTED error. Unfortunately, we don't do that and
> > crash instead:
> > 
> >  XFS (loop0): page discard on page 0000000005fd24f3, inode 0x75e5, offset 0.
> >  ==================================================================
> >  BUG: KASAN: null-ptr-deref in xfs_alloc_get_freelist+0x115/0x350
> >  Read of size 8 at addr 0000000000000028 by task a.out/1406
> >  CPU: 0 PID: 1406 Comm: a.out Not tainted 4.17.0-rc4-kasan #2
> >  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> >  Call Trace:
> >   dump_stack+0x7b/0xb5
> >   kasan_report+0x10c/0x390
> >   __asan_load8+0x54/0x90
> >   xfs_alloc_get_freelist+0x115/0x350
> >   xfs_alloc_fix_freelist+0x35b/0x830
> >   xfs_alloc_vextent+0x215/0x990
> >   xfs_bmap_extents_to_btree+0x30d/0x940
> > .....
> > 
> > By returning an error here, we avoid such crashes when punching out
> > a delalloc page because we don't try to fix up an AG freelist
> > without a transaction. Hence we get an error like so:
> 
> Um, isn't erroring out here leaving a dirty bomb in the in-core metadata?

Not that I can tell. We've already trashed the dirty page state by
this point, so the page cache can safely reclaim the page and the
delalloc range over it will never get written.  And the XFS inode
cleanup code didn't have any issues with the way the error was
handled, either, because the delalloc range was actually removed
before the fork format error was triggered.

IOWs, there is no dirty, stale page state or delalloc extents
hanging around if this error fires.

> Like you say:
> 
> > XFS (loop0): page discard on page ffffea00040ae640, inode 0x75e5, offset 0.
> > XFS (loop0): page discard unable to remove delalloc mapping.
> 
> We know the fs is corrupt, we might as well shut down now rather than
> let this burp out later.

xfs_bunmapi() doesn't do shutdowns - the higher level code does a
shutdown on error if it is necessary, otherwise it just propagates
the error. In this case it has cleaned up correctly, propagates the
error and it gets back to userspace on the next fsync, and we're
fine to continue onwards as there was no unrecoverable error....

> I get that people don't want to touch well seasoned code, but
> xfs_bunmapi is this big unwieldy function that's crying out for a
> refactor.  It's 330 lines long and can be called from various contexts
> (data/attr fork, punch delalloc, etc.)...
>
> ...it's also weird that xfs_bmap_punch_delalloc_range calls xfs_bunmapi
> with no transaction and a xfs_defer that we dump on the ground.

Yes, and yes.

> So yes, I think the patch does fix the crash, but it's kinda gross.

Yes, it is.

But OTOH, I don't want to risk a bunch of filesystem corrupting
regressions across the entire XFS userbase just to fix a trivially
simple crash that requires an extremely unlikely co-ordinated
corruption of an inode data fork and an AGFL, and to simultaneously
have ENOSPC in every other AGF in the filesystem.

Put "refactor xfs_bunmapi()" on the list of "things to do when
there's nothing else to do"...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/2] xfs: More robust inode extent count validation
  2018-06-19  4:57   ` Darrick J. Wong
@ 2018-06-19  5:29     ` Dave Chinner
  2018-06-19  6:07       ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-06-19  5:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jun 18, 2018 at 09:57:25PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 19, 2018 at 12:41:28PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When the inode is in extent format, it can't have more extents than
> > fit in the inode fork. We don't currently check this, and so this
> > corruption goes unnoticed by the inode verifiers. This can lead to
> > crashes operating on invalid in-memory structures.
> > 
> > Attempts to access such an inode will now error out in the verifier
> > rather than allowing modification operations to proceed.
> > 
> > Reported-by: Wen Xu <wen.xu@gatech.edu>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_format.h    |  3 ++
> >  fs/xfs/libxfs/xfs_inode_buf.c | 74 +++++++++++++++++++++--------------
> >  2 files changed, 48 insertions(+), 29 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > index 1c5a8aaf2bfc..1cb298fec274 100644
> > --- a/fs/xfs/libxfs/xfs_format.h
> > +++ b/fs/xfs/libxfs/xfs_format.h
> > @@ -962,6 +962,9 @@ typedef enum xfs_dinode_fmt {
> >  		XFS_DFORK_DSIZE(dip, mp) : \
> >  		XFS_DFORK_ASIZE(dip, mp))
> >  
> > +#define XFS_DFORK_MAXEXT(dip, mp, w) \
> > +	(XFS_DFORK_SIZE(dip, mp, w) / sizeof(xfs_bmbt_rec_t))
> > +
> >  /*
> >   * Return pointers to the data or attribute forks.
> >   */
> > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > index d38d724534c4..a41b6e5519e0 100644
> > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > @@ -374,6 +374,45 @@ xfs_log_dinode_to_disk(
> >  	}
> >  }
> >  
> > +static xfs_failaddr_t
> > +xfs_dinode_verify_fork(
> > +	struct xfs_dinode	*dip,
> > +	struct xfs_mount	*mp,
> > +	int			whichfork)
> > +{
> > +	uint32_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
> > +
> > +	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
> > +	case XFS_DINODE_FMT_LOCAL:
> > +		/*
> > +		 * no local regular files yet
> > +		 */
> > +		if (whichfork == XFS_DATA_FORK) {
> > +			if (S_ISREG(be16_to_cpu(dip->di_mode)))
> > +				return __this_address;
> > +			if (be64_to_cpu(dip->di_size) >
> > +					XFS_DFORK_SIZE(dip, mp, whichfork))
> > +				return __this_address;
> > +		}
> > +		if (di_nextents)
> > +			return __this_address;
> > +		/* fall through */
> 
> We could break here too, right?  There's no point in further checks of
> di_nextents for local format forks.
> 
> > +	case XFS_DINODE_FMT_EXTENTS:
> > +		if (di_nextents > XFS_DFORK_MAXEXT(dip, mp, whichfork))
> > +			return __this_address;
> 
> Are we supposed to break here?

They all fall through like they used to, but they could break, too.
The behaviour will be the same now.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
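
The equivalence described in the reply above can be checked with a
small standalone model. A local format fork that reaches the fall
through has already been verified to have zero extents, and zero can
never exceed the limits tested in the later cases; likewise an extent
format fork that passes the inline limit is below the absolute limit,
assuming the inline limit is the smaller of the two. The sketch is an
illustration only: the demo_* names, the enum and the limit values are
hypothetical stand-ins, not the kernel definitions.

```c
#include <stdbool.h>

enum demo_fmt { DEMO_FMT_LOCAL, DEMO_FMT_EXTENTS, DEMO_FMT_BTREE };

#define DEMO_MAXEXT_INLINE	4	/* made-up: extents that fit in the fork */
#define DEMO_MAXEXTNUM		100	/* made-up: absolute extent limit */

/* Returns true when the fork is rejected; cases fall through. */
static bool demo_reject_fallthrough(enum demo_fmt fmt, unsigned int nextents)
{
	switch (fmt) {
	case DEMO_FMT_LOCAL:
		if (nextents)
			return true;
		/* fall through */
	case DEMO_FMT_EXTENTS:
		if (nextents > DEMO_MAXEXT_INLINE)
			return true;
		/* fall through */
	case DEMO_FMT_BTREE:
		if (nextents > DEMO_MAXEXTNUM)
			return true;
		break;
	}
	return false;
}

/* Same checks, but every case ends with a break. */
static bool demo_reject_break(enum demo_fmt fmt, unsigned int nextents)
{
	switch (fmt) {
	case DEMO_FMT_LOCAL:
		if (nextents)
			return true;
		break;
	case DEMO_FMT_EXTENTS:
		if (nextents > DEMO_MAXEXT_INLINE)
			return true;
		break;
	case DEMO_FMT_BTREE:
		if (nextents > DEMO_MAXEXTNUM)
			return true;
		break;
	}
	return false;
}

/* Compare both variants over every format and a range of extent counts. */
static bool demo_variants_agree(void)
{
	for (int fmt = DEMO_FMT_LOCAL; fmt <= DEMO_FMT_BTREE; fmt++)
		for (unsigned int n = 0; n <= 2 * DEMO_MAXEXTNUM; n++)
			if (demo_reject_fallthrough((enum demo_fmt)fmt, n) !=
			    demo_reject_break((enum demo_fmt)fmt, n))
				return false;
	return true;
}
```

In this model demo_variants_agree() holds because each later check is
weaker than the one before it; the break variant just skips tests that
cannot fail.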


* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-19  5:27     ` Dave Chinner
@ 2018-06-19  6:06       ` Darrick J. Wong
  2018-06-19 23:33         ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2018-06-19  6:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 19, 2018 at 03:27:59PM +1000, Dave Chinner wrote:
> On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 19, 2018 at 12:41:27PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > If we are punching out a delalloc extent, xfs_bunmapi() does not
> > > have a transaction context and should not ever need to convert the
> > > on-disk extent format. If such a thing is attempted (e.g. via a
> > > corrupt inode extent count in extent format) then we should abort
> > > with an EFSCORRUPTED error. Unfortunately, we don't do that and
> > > crash instead:
> > > 
> > >  XFS (loop0): page discard on page 0000000005fd24f3, inode 0x75e5, offset 0.
> > >  ==================================================================
> > >  BUG: KASAN: null-ptr-deref in xfs_alloc_get_freelist+0x115/0x350
> > >  Read of size 8 at addr 0000000000000028 by task a.out/1406
> > >  CPU: 0 PID: 1406 Comm: a.out Not tainted 4.17.0-rc4-kasan #2
> > >  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > >  Call Trace:
> > >   dump_stack+0x7b/0xb5
> > >   kasan_report+0x10c/0x390
> > >   __asan_load8+0x54/0x90
> > >   xfs_alloc_get_freelist+0x115/0x350
> > >   xfs_alloc_fix_freelist+0x35b/0x830
> > >   xfs_alloc_vextent+0x215/0x990
> > >   xfs_bmap_extents_to_btree+0x30d/0x940
> > > .....
> > > 
> > > By returning an error here, we avoid such crashes when punching out
> > > a delalloc page because we don't try to fix up an AG freelist
> > > without a transaction. Hence we get an error like so:
> > 
> > Um, isn't erroring out here leaving a dirty bomb in the in-core metadata?
> 
> Not that I can tell. We've already trashed the dirty page state by
> this point, so the page cache can safely reclaim the page and the
> delalloc range over it will never get written.  And the XFS inode
> cleanup code didn't have any issues with the way the error was
> handled, either, because the delalloc range was actually removed
> before the fork format error was triggered.
> 
> IOWs, there is no dirty, stale page state or delalloc extents
> hanging around if this error fires.

Hmmm, well I guess I'll pull this one in and look for problems.

I wonder, is there a <cough> testcase for this?  Or a fuzz-o-matic to
turn all these things into regression tests?

(Yeah, I know there won't be one for syzbot, I dug through its code and
had to reset my brain by reading mballoc.c. :P)

> > Like you say:
> > 
> > > XFS (loop0): page discard on page ffffea00040ae640, inode 0x75e5, offset 0.
> > > XFS (loop0): page discard unable to remove delalloc mapping.
> > 
> > We know the fs is corrupt, we might as well shut down now rather than
> > let this burp out later.
> 
> xfs_bunmapi() doesn't do shutdowns - the higher level code does a
> shutdown on error if it is necessary, otherwise it just propagates
> the error. In this case it has cleaned up correctly, propagates the
> error and it gets back to userspace on the next fsync, and we're
> fine to continue onwards as there was no unrecoverable error....

Fair enough.

> > I get that people don't want to touch well seasoned code, but
> > xfs_bunmapi is this big unwieldy function that's crying out for a
> > refactor.  It's 330 lines long and can be called from various contexts
> > (data/attr fork, punch delalloc, etc.)...
> >
> > ...it's also weird that xfs_bmap_punch_delalloc_range calls xfs_bunmapi
> > with no transaction and a xfs_defer that we dump on the ground.
> 
> Yes, and yes.
> 
> > So yes, I think the patch does fix the crash, but it's kinda gross.
> 
> Yes, it is.
> 
> But OTOH, I don't want to risk a bunch of filesystem corrupting
> regressions across the entire XFS userbase just to fix a trivially
> simple crash that requires an extremely unlikely co-ordinated
> corruption of an inode data fork and an AGFL, and to simultaneously
> have ENOSPC in every other AGF in the filesystem.
> 
> Put "refactor xfs_bunmapi()" on the list of "things to do when
> there's nothing else to do"...

So in 2066 after the polar ice caps melt after the XFS LOGHAMMER attack
has finally been put down?  Ok. :)

(But no, seriously, if anyone's looking for a little refactoring +
domain knowledge enhancement of the bmapi code...)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 2/2] xfs: More robust inode extent count validation
  2018-06-19  5:29     ` Dave Chinner
@ 2018-06-19  6:07       ` Darrick J. Wong
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2018-06-19  6:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 19, 2018 at 03:29:31PM +1000, Dave Chinner wrote:
> On Mon, Jun 18, 2018 at 09:57:25PM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 19, 2018 at 12:41:28PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > When the inode is in extent format, it can't have more extents than
> > > fit in the inode fork. We don't currently check this, and so this
> > > corruption goes unnoticed by the inode verifiers. This can lead to
> > > crashes operating on invalid in-memory structures.
> > > 
> > > Attempts to access such an inode will now error out in the verifier
> > > rather than allowing modification operations to proceed.
> > > 
> > > Reported-by: Wen Xu <wen.xu@gatech.edu>
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_format.h    |  3 ++
> > >  fs/xfs/libxfs/xfs_inode_buf.c | 74 +++++++++++++++++++++--------------
> > >  2 files changed, 48 insertions(+), 29 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > > index 1c5a8aaf2bfc..1cb298fec274 100644
> > > --- a/fs/xfs/libxfs/xfs_format.h
> > > +++ b/fs/xfs/libxfs/xfs_format.h
> > > @@ -962,6 +962,9 @@ typedef enum xfs_dinode_fmt {
> > >  		XFS_DFORK_DSIZE(dip, mp) : \
> > >  		XFS_DFORK_ASIZE(dip, mp))
> > >  
> > > +#define XFS_DFORK_MAXEXT(dip, mp, w) \
> > > +	(XFS_DFORK_SIZE(dip, mp, w) / sizeof(xfs_bmbt_rec_t))
> > > +
> > >  /*
> > >   * Return pointers to the data or attribute forks.
> > >   */
> > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> > > index d38d724534c4..a41b6e5519e0 100644
> > > --- a/fs/xfs/libxfs/xfs_inode_buf.c
> > > +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> > > @@ -374,6 +374,45 @@ xfs_log_dinode_to_disk(
> > >  	}
> > >  }
> > >  
> > > +static xfs_failaddr_t
> > > +xfs_dinode_verify_fork(
> > > +	struct xfs_dinode	*dip,
> > > +	struct xfs_mount	*mp,
> > > +	int			whichfork)
> > > +{
> > > +	uint32_t		di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork);
> > > +
> > > +	switch (XFS_DFORK_FORMAT(dip, whichfork)) {
> > > +	case XFS_DINODE_FMT_LOCAL:
> > > +		/*
> > > +		 * no local regular files yet
> > > +		 */
> > > +		if (whichfork == XFS_DATA_FORK) {
> > > +			if (S_ISREG(be16_to_cpu(dip->di_mode)))
> > > +				return __this_address;
> > > +			if (be64_to_cpu(dip->di_size) >
> > > +					XFS_DFORK_SIZE(dip, mp, whichfork))
> > > +				return __this_address;
> > > +		}
> > > +		if (di_nextents)
> > > +			return __this_address;
> > > +		/* fall through */
> > 
> > We could break here too, right?  There's no point in further checks of
> > di_nextents for local format forks.
> > 
> > > +	case XFS_DINODE_FMT_EXTENTS:
> > > +		if (di_nextents > XFS_DFORK_MAXEXT(dip, mp, whichfork))
> > > +			return __this_address;
> > 
> > Are we supposed to break here?
> 
> They all fall through like they used to, but they could break, too.
> The behaviour will be the same now.

Fair enough.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-19  6:06       ` Darrick J. Wong
@ 2018-06-19 23:33         ` Dave Chinner
  2018-06-21 16:42           ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-06-19 23:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Jun 18, 2018 at 11:06:52PM -0700, Darrick J. Wong wrote:
> On Tue, Jun 19, 2018 at 03:27:59PM +1000, Dave Chinner wrote:
> > On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> > > On Tue, Jun 19, 2018 at 12:41:27PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > If we are punching out a delalloc extent, xfs_bunmapi() does not
> > > > have a transaction context and should not ever need to convert the
> > > > on-disk extent format. If such a thing is attempted (e.g. via a
> > > > corrupt inode extent count in extent format) then we should abort
> > > > with an EFSCORRUPTED error. Unfortunately, we don't do that and
> > > > crash instead:
> > > > 
> > > >  XFS (loop0): page discard on page 0000000005fd24f3, inode 0x75e5, offset 0.
> > > >  ==================================================================
> > > >  BUG: KASAN: null-ptr-deref in xfs_alloc_get_freelist+0x115/0x350
> > > >  Read of size 8 at addr 0000000000000028 by task a.out/1406
> > > >  CPU: 0 PID: 1406 Comm: a.out Not tainted 4.17.0-rc4-kasan #2
> > > >  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > >  Call Trace:
> > > >   dump_stack+0x7b/0xb5
> > > >   kasan_report+0x10c/0x390
> > > >   __asan_load8+0x54/0x90
> > > >   xfs_alloc_get_freelist+0x115/0x350
> > > >   xfs_alloc_fix_freelist+0x35b/0x830
> > > >   xfs_alloc_vextent+0x215/0x990
> > > >   xfs_bmap_extents_to_btree+0x30d/0x940
> > > > .....
> > > > 
> > > > By returning an error here, we avoid such crashes when punching out
> > > > a delalloc page because we don't try to fix up an AG freelist
> > > > without a transaction. Hence we get an error like so:
> > > 
> > > Um, isn't erroring out here leaving a dirty bomb in the in-core metadata?
> > 
> > Not that I can tell. We've already trashed the dirty page state by
> > this point, so the page cache can safely reclaim the page and the
> > delalloc range over it will never get written.  And the XFS inode
> > cleanup code didn't have any issues with the way the error was
> > handled, either, because the delalloc range was actually removed
> > before the fork format error was triggered.
> > 
> > IOWs, there is no dirty, stale page state or delalloc extents
> > hanging around if this error fires.
> 
> Hmmm, well I guess I'll pull this one in and look for problems.
> 
> I wonder, is there a <cough> testcase for this?  Or a fuzz-o-matic to
> turn all these things into regression tests?

No test case. Should be able to create one easily enough with
xfs_db, though I haven't tried. Do the inode fuzzer tests screw with
the extent count?

> > But OTOH, I don't want to risk a bunch of filesystem corrupting
> > regressions across the entire XFS userbase just to fix a trivially
> > simple crash that requires an extremely unlikely co-ordinated
> > corruption of an inode data fork and an AGFL, and to simultaneously
> > have ENOSPC in every other AGF in the filesystem.
> > 
> > Put "refactor xfs_bunmapi()" on the list of "things to do when
> > there's nothing else to do"...
> 
> So in 2066 after the polar ice caps melt after the XFS LOGHAMMER attack
> has finally been put down?  Ok. :)

I'm sure someone will have reason to factor it before then :P

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-19  4:54   ` Darrick J. Wong
  2018-06-19  5:27     ` Dave Chinner
@ 2018-06-20  7:31     ` Christoph Hellwig
  2018-06-21 22:34       ` Dave Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2018-06-20  7:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> ...it's also weird that xfs_bmap_punch_delalloc_range calls xfs_bunmapi
> with no transaction and a xfs_defer that we dump on the ground.
> 
> So yes, I think the patch does fix the crash, but it's kinda gross.
> 
> Thoughts?

I've got an alternative solution:

http://git.infradead.org/users/hch/xfs.git/commitdiff/a1c0685b2085b448cbe02f0f9ff0c8771e3f4496

The only bit that is missing is removing the now unused support for
a NULL tp in __xfs_bunmapi..

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] xfs: More robust inode extent count validation
  2018-06-19  2:41 ` [PATCH 2/2] xfs: More robust inode extent count validation Dave Chinner
  2018-06-19  4:57   ` Darrick J. Wong
@ 2018-06-20  7:34   ` Christoph Hellwig
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2018-06-20  7:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jun 19, 2018 at 12:41:28PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When the inode is in extent format, it can't have more extents than
> fit in the inode fork. We don't currently check this, and so this
> corruption goes unnoticed by the inode verifiers. This can lead to
> crashes operating on invalid in-memory structures.
> 
> Attempts to access such an inode will now error out in the verifier
> rather than allowing modification operations to proceed.
> 
> Reported-by: Wen Xu <wen.xu@gatech.edu>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_format.h    |  3 ++
>  fs/xfs/libxfs/xfs_inode_buf.c | 74 +++++++++++++++++++++--------------
>  2 files changed, 48 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 1c5a8aaf2bfc..1cb298fec274 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -962,6 +962,9 @@ typedef enum xfs_dinode_fmt {
>  		XFS_DFORK_DSIZE(dip, mp) : \
>  		XFS_DFORK_ASIZE(dip, mp))
>  
> +#define XFS_DFORK_MAXEXT(dip, mp, w) \
> +	(XFS_DFORK_SIZE(dip, mp, w) / sizeof(xfs_bmbt_rec_t))

struct xfs_bmbt_rec, please.

Also do we really need this macro instead of just open coding it?
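
For illustration only, the open-coded form could look roughly like the
sketch below. It assumes a standalone stand-in for the extent record
(struct sketch_bmbt_rec, two 64-bit words, 16 bytes) and a hypothetical
helper name, sketch_fork_maxext(); neither is the kernel's definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the on-disk extent record: two 64-bit words. */
struct sketch_bmbt_rec {
	uint64_t l0;
	uint64_t l1;
};

/* The division below relies on the record being exactly 16 bytes. */
static_assert(sizeof(struct sketch_bmbt_rec) == 16,
	      "extent record stand-in must be 16 bytes");

/*
 * Largest extent count that fits in a fork of fork_bytes bytes when the
 * fork is in extent format. Integer division rounds down, so a partial
 * trailing record is not counted.
 */
static inline uint32_t sketch_fork_maxext(uint32_t fork_bytes)
{
	return (uint32_t)(fork_bytes / sizeof(struct sketch_bmbt_rec));
}
```

With these assumptions a 160-byte fork holds at most 10 extent records,
so an extent-format fork claiming 11 would be flagged as corrupt.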

> +		if (di_nextents)
> +			return __this_address;
> +		/* fall through */

It seems weird to fall through when the next check is just for di_nextents
again.  I'd rather break out of the switch and have the common
validation after it.

But the basics of the patch look fine to me.
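
A rough sketch of the break-then-common-validation shape, not the actual
patch: all types and names here (struct sketch_fork, sketch_verify_fork,
the max_nextents field) are hypothetical stand-ins, and the common check
after the switch is only assumed to be an overall extent count limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
enum sketch_fmt { SKETCH_FMT_LOCAL, SKETCH_FMT_EXTENTS, SKETCH_FMT_BTREE };

struct sketch_bmbt_rec { uint64_t l0; uint64_t l1; };	/* 16-byte record */

struct sketch_fork {
	enum sketch_fmt	format;
	bool		is_data_fork;
	bool		is_regular_file;
	uint64_t	di_size;	/* inode size in bytes */
	uint32_t	fork_bytes;	/* space available to this fork */
	uint32_t	nextents;	/* on-disk extent count */
	uint32_t	max_nextents;	/* assumed overall limit for this fork */
};

/* Returns true if the fork passes the checks, false if it looks corrupt. */
static bool sketch_verify_fork(const struct sketch_fork *f)
{
	assert(f != NULL);

	switch (f->format) {
	case SKETCH_FMT_LOCAL:
		if (f->is_data_fork) {
			/* no local regular files yet */
			if (f->is_regular_file)
				return false;
			if (f->di_size > f->fork_bytes)
				return false;
		}
		if (f->nextents)
			return false;
		break;		/* nothing more to learn from nextents == 0 */
	case SKETCH_FMT_EXTENTS:
		if (f->nextents > f->fork_bytes / sizeof(struct sketch_bmbt_rec))
			return false;
		break;
	case SKETCH_FMT_BTREE:
		break;
	default:
		return false;
	}

	/* Common validation, applied once for every format. */
	return f->nextents <= f->max_nextents;
}
```

The result for a local-format fork is the same as with fall-through: it
only gets past its own case with nextents == 0, which can never exceed
the later limits.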

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-19 23:33         ` Dave Chinner
@ 2018-06-21 16:42           ` Darrick J. Wong
  0 siblings, 0 replies; 16+ messages in thread
From: Darrick J. Wong @ 2018-06-21 16:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Jun 20, 2018 at 09:33:17AM +1000, Dave Chinner wrote:
> On Mon, Jun 18, 2018 at 11:06:52PM -0700, Darrick J. Wong wrote:
> > On Tue, Jun 19, 2018 at 03:27:59PM +1000, Dave Chinner wrote:
> > > On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Jun 19, 2018 at 12:41:27PM +1000, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > If we are punching out a delalloc extent, xfs_bunmapi() does not
> > > > > have a transaction context and should not ever need to convert the
> > > > > on-disk extent format. If such a thing is attempted (e.g. via a
> > > > > corrupt inode extent count in extent format) then we should abort
> > > > > with an EFSCORRUPTED error. Unfortunately, we don't do that and
> > > > > crash instead:
> > > > > 
> > > > >  XFS (loop0): page discard on page 0000000005fd24f3, inode 0x75e5, offset 0.
> > > > >  ==================================================================
> > > > >  BUG: KASAN: null-ptr-deref in xfs_alloc_get_freelist+0x115/0x350
> > > > >  Read of size 8 at addr 0000000000000028 by task a.out/1406
> > > > >  CPU: 0 PID: 1406 Comm: a.out Not tainted 4.17.0-rc4-kasan #2
> > > > >  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > >  Call Trace:
> > > > >   dump_stack+0x7b/0xb5
> > > > >   kasan_report+0x10c/0x390
> > > > >   __asan_load8+0x54/0x90
> > > > >   xfs_alloc_get_freelist+0x115/0x350
> > > > >   xfs_alloc_fix_freelist+0x35b/0x830
> > > > >   xfs_alloc_vextent+0x215/0x990
> > > > >   xfs_bmap_extents_to_btree+0x30d/0x940
> > > > > .....
> > > > > 
> > > > > By returning an error here, we avoid such crashes when punching out
> > > > > a delalloc page because we don't try to fix up an AG freelist
> > > > > without a transaction. Hence we get an error like so:
> > > > 
> > > > Um, isn't erroring out here leaving a dirty bomb in the in-core metadata?
> > > 
> > > Not that I can tell. We've already trashed the dirty page state by
> > > this point, so the page cache can safely reclaim the page and the
> > > delalloc range over it will never get written.  And the XFS inode
> > > cleanup code didn't have any issues with the way the error was
> > > handled, either, because the delalloc range was actually removed
> > > before the fork format error was triggered.
> > > 
> > > IOWs, there is no dirty, stale page state or delalloc extents
> > > hanging around if this error fires.
> > 
> > Hmmm, well I guess I'll pull this one in and look for problems.
> > 
> > I wonder, is there a <cough> testcase for this?  Or a fuzz-o-matic to
> > turn all these things into regression tests?
> 
> No test case. Should be able to create one easily enough with
> xfs_db, though I haven't tried. Do the inode fuzzer tests screw with
> the extent count?

The existing set of fuzz tests won't catch this because they go straight
into repair attempts to see if scrub/repair will deal with bad nextents.
They don't try to modify the corrupted fs.

They also do it slowly because fuzzing nextents is simply a part of
fuzzing every field in an extents-format file inode, and I suspect that
we don't really want to make fuzz testing a regular part of xfstests
because that immediately triples the auto group runtime. :)

So, targeted test please? :)

I will also work on a fuzz series that skips scrub/repair and goes
straight to writing to the corrupted fs to see what happens.

> > > But OTOH, I don't want to risk a bunch of filesystem corrupting
> > > regressions across the entire XFS userbase just to fix a trivially
> > > simple crash that requires an extremely unlikely co-ordinated
> > > corruption of an inode data fork and an AGFL, and to simultaneously
> > > have ENOSPC in every other AGF in the filesystem.
> > > 
> > > Put "refactor xfs_bunmapi()" on the list of "things to do when
> > > there's nothing else to do"...
> > 
> > So in 2066 after the polar ice caps melt after the XFS LOGHAMMER attack
> > has finally been put down?  Ok. :)
> 
> I'm sure someone will have reason to factor it before then :P

I ... forgot that hch already did. :/

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-20  7:31     ` Christoph Hellwig
@ 2018-06-21 22:34       ` Dave Chinner
  2018-06-21 22:55         ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-06-21 22:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs

On Wed, Jun 20, 2018 at 12:31:42AM -0700, Christoph Hellwig wrote:
> On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> > ...it's also weird that xfs_bmap_punch_delalloc_range calls xfs_bunmapi
> > with no transaction and a xfs_defer that we dump on the ground.
> > 
> > So yes, I think the patch does fix the crash, but it's kinda gross.
> > 
> > Thoughts?
> 
> I've got an alternative solution:
> 
> http://git.infradead.org/users/hch/xfs.git/commitdiff/a1c0685b2085b448cbe02f0f9ff0c8771e3f4496
> 
> The only bit that is missing is removing the now unused support for
> a NULL tp in __xfs_bunmapi..

Ah, I forgot about that patch. Thanks for the reminder, Christoph!

Darrick, can we get Christoph's patch in as a standalone bug fix
rather than wait for the bufferhead removal to be merged?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-21 22:34       ` Dave Chinner
@ 2018-06-21 22:55         ` Darrick J. Wong
  2018-06-21 23:23           ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2018-06-21 22:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Fri, Jun 22, 2018 at 08:34:06AM +1000, Dave Chinner wrote:
> On Wed, Jun 20, 2018 at 12:31:42AM -0700, Christoph Hellwig wrote:
> > On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> > > ...it's also weird that xfs_bmap_punch_delalloc_range calls xfs_bunmapi
> > > with no transaction and a xfs_defer that we dump on the ground.
> > > 
> > > So yes, I think the patch does fix the crash, but it's kinda gross.
> > > 
> > > Thoughts?
> > 
> > I've got an alternative solution:
> > 
> > http://git.infradead.org/users/hch/xfs.git/commitdiff/a1c0685b2085b448cbe02f0f9ff0c8771e3f4496
> > 
> > The only bit that is missing is removing the now unused support for
> > a NULL tp in __xfs_bunmapi..
> 
> Ah, I forgot about that patch. Thanks for the reminder, Christoph!
> 
> Darrick, can we get Christoph's patch in as a standalone bug fix
> rather than wait for the bufferhead removal to be merged?

Ok, will do.  AFAICT if I merge that patch then I can drop this one,
right?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion
  2018-06-21 22:55         ` Darrick J. Wong
@ 2018-06-21 23:23           ` Dave Chinner
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2018-06-21 23:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Thu, Jun 21, 2018 at 03:55:08PM -0700, Darrick J. Wong wrote:
> On Fri, Jun 22, 2018 at 08:34:06AM +1000, Dave Chinner wrote:
> > On Wed, Jun 20, 2018 at 12:31:42AM -0700, Christoph Hellwig wrote:
> > > On Mon, Jun 18, 2018 at 09:54:05PM -0700, Darrick J. Wong wrote:
> > > > ...it's also weird that xfs_bmap_punch_delalloc_range calls xfs_bunmapi
> > > > with no transaction and a xfs_defer that we dump on the ground.
> > > > 
> > > > So yes, I think the patch does fix the crash, but it's kinda gross.
> > > > 
> > > > Thoughts?
> > > 
> > > I've got an alternative solution:
> > > 
> > > http://git.infradead.org/users/hch/xfs.git/commitdiff/a1c0685b2085b448cbe02f0f9ff0c8771e3f4496
> > > 
> > > The only bit that is missing is removing the now unused support for
> > > a NULL tp in __xfs_bunmapi..
> > 
> > Ah, I forgot about that patch. Thanks for the reminder, Christoph!
> > 
> > Darrick, can we get Christoph's patch in as a standalone bug fix
> > rather than wait for the bufferhead removal to be merged?
> 
> Ok, will do.  AFAICT if I merge that patch then I can drop this one,
> right?

Yup, should be able to, as this is the only vector into the problem
code.
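
For anyone reading the archive later, the guard this patch was adding
amounts to something like the sketch below. It is a simplified
standalone illustration, not the kernel code: struct sketch_trans and
sketch_bunmapi_finish() are hypothetical names, and the only borrowed
fact is that kernel XFS maps EFSCORRUPTED to EUCLEAN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction handle. */
struct sketch_trans {
	int unused;
};

/* Kernel XFS defines EFSCORRUPTED as EUCLEAN (Linux-specific errno). */
#define SKETCH_EFSCORRUPTED EUCLEAN

static_assert(SKETCH_EFSCORRUPTED > 0, "errno values are positive");

/*
 * Tail of an unmap operation: the extents have already been removed and
 * the only remaining question is whether the fork format must change.
 * Without a transaction that conversion cannot be done safely, so the
 * caller gets a corruption error instead of a crash.
 */
static int sketch_bunmapi_finish(struct sketch_trans *tp,
				 bool needs_format_change)
{
	if (!needs_format_change)
		return 0;
	if (tp == NULL)
		return -SKETCH_EFSCORRUPTED;

	/* With a transaction, the format conversion would proceed here. */
	return 0;
}
```

Passing a transaction in from the delalloc punch path, as Christoph's
patch does, means the NULL case never arises there in the first place.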

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-06-21 23:23 UTC | newest]

Thread overview: 16+ messages
2018-06-19  2:41 [PATCH 0/2] xfs: handle inode extent count mismatch Dave Chinner
2018-06-19  2:41 ` [PATCH 1/2] xfs: transactionless xfs_bunmapi shouldn't do format conversion Dave Chinner
2018-06-19  4:54   ` Darrick J. Wong
2018-06-19  5:27     ` Dave Chinner
2018-06-19  6:06       ` Darrick J. Wong
2018-06-19 23:33         ` Dave Chinner
2018-06-21 16:42           ` Darrick J. Wong
2018-06-20  7:31     ` Christoph Hellwig
2018-06-21 22:34       ` Dave Chinner
2018-06-21 22:55         ` Darrick J. Wong
2018-06-21 23:23           ` Dave Chinner
2018-06-19  2:41 ` [PATCH 2/2] xfs: More robust inode extent count validation Dave Chinner
2018-06-19  4:57   ` Darrick J. Wong
2018-06-19  5:29     ` Dave Chinner
2018-06-19  6:07       ` Darrick J. Wong
2018-06-20  7:34   ` Christoph Hellwig
