Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 0/6] xfs: fix buf log item memory corruption on non-amd64
@ 2020-01-14  6:31 Darrick J. Wong
  2020-01-14  6:31 ` [PATCH 1/6] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Darrick J. Wong @ 2020-01-14  6:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, hch

Hi all,

This patch series corrects a memory corruption problem that I noticed
when running fstests on i386 and on a 64k-page aarch64 machine.  The
root cause is the fact that on v5 filesystems, a remote xattribute value
can be allocated 128K of disk space (64k for the value, 64 bytes for the
header).

xattr invalidation will try to xfs_trans_binval the attribute value
buffer, which creates a (zeroed) buffer log item.  The dirty buffer in
the buffer log item isn't large enough to handle > 64k of dirty data and
we write past the end of the array, corrupting memory.  On amd64 the
compiler inserts an invisible padding area just past the end of the
dirty bitmap, which is why we don't see the problem on our laptops. :P

Since we don't ever log remote xattr values, we can fix this problem by
making sure that no part of the code that handles remote attr values
ever supplies a transaction context to a xfs_buf function.  Finish the
series by adding a few asserts so that we'll shut down the log if this
kind of overrun ever happens again.

This has been lightly tested with fstests.  Enjoy!
Comments and questions are, as always, welcome.

--D

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] xfs: refactor remote attr value buffer invalidation
  2020-01-14  6:31 [PATCH v3 0/6] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
@ 2020-01-14  6:31 ` Darrick J. Wong
  2020-01-14  8:30   ` Christoph Hellwig
  2020-01-14  6:31 ` [PATCH 2/6] xfs: fix memory corruption during " Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2020-01-14  6:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, hch

From: Darrick J. Wong <darrick.wong@oracle.com>

Hoist the code that invalidates remote extended attribute value buffers
into a separate helper function.  This prepares us for a memory
corruption fix in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   52 ++++++++++++++++++++++++---------------
 fs/xfs/libxfs/xfs_attr_remote.h |    2 ++
 2 files changed, 34 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index a6ef5df42669..df1ab0569481 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -552,6 +552,33 @@ xfs_attr_rmtval_set(
 	return 0;
 }
 
+/* Mark stale any incore buffers for the remote value. */
+int
+xfs_attr_rmtval_stale(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*map,
+	xfs_buf_flags_t		incore_flags)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_buf		*bp;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
+	if (XFS_IS_CORRUPT(mp, map->br_startblock == DELAYSTARTBLOCK) ||
+	    XFS_IS_CORRUPT(mp, map->br_startblock == HOLESTARTBLOCK))
+		return -EFSCORRUPTED;
+
+	bp = xfs_buf_incore(mp->m_ddev_targp,
+			XFS_FSB_TO_DADDR(mp, map->br_startblock),
+			XFS_FSB_TO_BB(mp, map->br_blockcount), incore_flags);
+	if (bp) {
+		xfs_buf_stale(bp);
+		xfs_buf_relse(bp);
+	}
+
+	return 0;
+}
+
 /*
  * Remove the value associated with an attribute by deleting the
  * out-of-line buffer that it is stored on.
@@ -560,7 +587,6 @@ int
 xfs_attr_rmtval_remove(
 	struct xfs_da_args	*args)
 {
-	struct xfs_mount	*mp = args->dp->i_mount;
 	xfs_dablk_t		lblkno;
 	int			blkcnt;
 	int			error;
@@ -575,9 +601,6 @@ xfs_attr_rmtval_remove(
 	blkcnt = args->rmtblkcnt;
 	while (blkcnt > 0) {
 		struct xfs_bmbt_irec	map;
-		struct xfs_buf		*bp;
-		xfs_daddr_t		dblkno;
-		int			dblkcnt;
 		int			nmap;
 
 		/*
@@ -588,22 +611,11 @@ xfs_attr_rmtval_remove(
 				       blkcnt, &map, &nmap, XFS_BMAPI_ATTRFORK);
 		if (error)
 			return error;
-		ASSERT(nmap == 1);
-		ASSERT((map.br_startblock != DELAYSTARTBLOCK) &&
-		       (map.br_startblock != HOLESTARTBLOCK));
-
-		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
-		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
-
-		/*
-		 * If the "remote" value is in the cache, remove it.
-		 */
-		bp = xfs_buf_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
-		if (bp) {
-			xfs_buf_stale(bp);
-			xfs_buf_relse(bp);
-			bp = NULL;
-		}
+		if (XFS_IS_CORRUPT(args->dp->i_mount, nmap != 1))
+			return -EFSCORRUPTED;
+		error = xfs_attr_rmtval_stale(args->dp, &map, XBF_TRYLOCK);
+		if (error)
+			return error;
 
 		lblkno += map.br_blockcount;
 		blkcnt -= map.br_blockcount;
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 9d20b66ad379..6fb4572845ce 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -11,5 +11,7 @@ int xfs_attr3_rmt_blocks(struct xfs_mount *mp, int attrlen);
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_set(struct xfs_da_args *args);
 int xfs_attr_rmtval_remove(struct xfs_da_args *args);
+int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
+		xfs_buf_flags_t incore_flags);
 
 #endif /* __XFS_ATTR_REMOTE_H__ */


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/6] xfs: fix memory corruption during remote attr value buffer invalidation
  2020-01-14  6:31 [PATCH v3 0/6] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
  2020-01-14  6:31 ` [PATCH 1/6] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
@ 2020-01-14  6:31 ` " Darrick J. Wong
  2020-01-14  8:40   ` Christoph Hellwig
  2020-01-14  6:31 ` [PATCH 3/6] xfs: clean up xfs_buf_item_get_format return value Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2020-01-14  6:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, hch

From: Darrick J. Wong <darrick.wong@oracle.com>

While running generic/103, I observed what looks like memory corruption
and (with slub debugging turned on) a slub redzone warning on i386 when
inactivating an inode with a 64k remote attr value.

On a v5 filesystem, maximally sized remote attr values require one block
more than 64k worth of space to hold both the remote attribute value
header (64 bytes).  On a 4k block filesystem this results in a 68k
buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
that even though we'll never use more than 65,600 bytes of this buffer,
XFS_MAX_BLOCKSIZE is 64k.

This is a problem because the definition of struct xfs_buf_log_format
allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
the dirty map, writing right off the end of the log item and corrupting
memory.  We've gotten away with this on x86_64 for years because the
compiler inserts a u32 padding on the end of struct xfs_buf_log_format.

Fortunately for us, remote attribute values are written to disk with
xfs_bwrite(), which is to say that they are not logged.  Fix the problem
by removing all places where we could end up creating a buffer log item
for a remote attribute value and leave a note explaining why.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   21 +++++++++++++-
 fs/xfs/xfs_attr_inactive.c      |   57 ++++++++++++---------------------------
 2 files changed, 37 insertions(+), 41 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index df1ab0569481..55e5b1d65449 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -25,6 +25,23 @@
 
 #define ATTR_RMTVALUE_MAPSIZE	1	/* # of map entries at once */
 
+/*
+ * Remote Attribute Values
+ * =======================
+ *
+ * Remote extended attribute values are conceptually simple -- they're written
+ * to data blocks mapped by an inode's attribute fork, and they have an upper
+ * size limit of 64k.  Setting a value does not involve the XFS log.
+ *
+ * However, on a v5 filesystem, maximally sized remote attr values require one
+ * block more than 64k worth of space to hold both the remote attribute value
+ * header (64 bytes).  On a 4k block filesystem this results in a 68k buffer;
+ * on a 64k block filesystem, this would be a 128k buffer.  Note that the log
+ * format can only handle a dirty buffer of XFS_MAX_BLOCKSIZE length (64k).
+ * Therefore, we /must/ ensure that remote attribute value buffers never touch
+ * the logging system and therefore never have a log item.
+ */
+
 /*
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
@@ -401,7 +418,7 @@ xfs_attr_rmtval_get(
 			       (map[i].br_startblock != HOLESTARTBLOCK));
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
-			error = xfs_trans_read_buf(mp, args->trans,
+			error = xfs_trans_read_buf(mp, NULL,
 						   mp->m_ddev_targp,
 						   dblkno, dblkcnt, 0, &bp,
 						   &xfs_attr3_rmt_buf_ops);
@@ -411,7 +428,7 @@ xfs_attr_rmtval_get(
 			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
 							&offset, &valuelen,
 							&dst);
-			xfs_trans_brelse(args->trans, bp);
+			xfs_buf_relse(bp);
 			if (error)
 				return error;
 
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index 5ff49523d8ea..853a5a8defc9 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -25,22 +25,19 @@
 #include "xfs_error.h"
 
 /*
- * Look at all the extents for this logical region,
- * invalidate any buffers that are incore/in transactions.
+ * Invalidate any incore buffers associated with this remote attribute value
+ * extent.   We never log remote attribute value buffers, which means that they
+ * won't be attached to a transaction and are therefore safe to mark stale.
+ * The actual bunmapi will be taken care of later.
  */
 STATIC int
-xfs_attr3_leaf_freextent(
-	struct xfs_trans	**trans,
+xfs_attr3_rmt_inactive(
 	struct xfs_inode	*dp,
-	xfs_dablk_t		blkno,
-	int			blkcnt)
+	struct xfs_attr_inactive_list *lp)
 {
 	struct xfs_bmbt_irec	map;
-	struct xfs_buf		*bp;
 	xfs_dablk_t		tblkno;
-	xfs_daddr_t		dblkno;
 	int			tblkcnt;
-	int			dblkcnt;
 	int			nmap;
 	int			error;
 
@@ -48,8 +45,8 @@ xfs_attr3_leaf_freextent(
 	 * Roll through the "value", invalidating the attribute value's
 	 * blocks.
 	 */
-	tblkno = blkno;
-	tblkcnt = blkcnt;
+	tblkno = lp->valueblk;
+	tblkcnt = lp->valuelen;
 	while (tblkcnt > 0) {
 		/*
 		 * Try to remember where we decided to put the value.
@@ -57,35 +54,19 @@ xfs_attr3_leaf_freextent(
 		nmap = 1;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)tblkno, tblkcnt,
 				       &map, &nmap, XFS_BMAPI_ATTRFORK);
-		if (error) {
+		if (error)
 			return error;
-		}
-		ASSERT(nmap == 1);
-		ASSERT(map.br_startblock != DELAYSTARTBLOCK);
+		if (XFS_IS_CORRUPT(dp->i_mount, nmap != 1))
+			return -EFSCORRUPTED;
 
 		/*
-		 * If it's a hole, these are already unmapped
-		 * so there's nothing to invalidate.
+		 * Mark any incore buffers for the remote value as stale.  We
+		 * never log remote attr value buffers, so the buffer should be
+		 * easy to kill.
 		 */
-		if (map.br_startblock != HOLESTARTBLOCK) {
-
-			dblkno = XFS_FSB_TO_DADDR(dp->i_mount,
-						  map.br_startblock);
-			dblkcnt = XFS_FSB_TO_BB(dp->i_mount,
-						map.br_blockcount);
-			bp = xfs_trans_get_buf(*trans,
-					dp->i_mount->m_ddev_targp,
-					dblkno, dblkcnt, 0);
-			if (!bp)
-				return -ENOMEM;
-			xfs_trans_binval(*trans, bp);
-			/*
-			 * Roll to next transaction.
-			 */
-			error = xfs_trans_roll_inode(trans, dp);
-			if (error)
-				return error;
-		}
+		error = xfs_attr_rmtval_stale(dp, &map, 0);
+		if (error)
+			return error;
 
 		tblkno += map.br_blockcount;
 		tblkcnt -= map.br_blockcount;
@@ -174,9 +155,7 @@ xfs_attr3_leaf_inactive(
 	 */
 	error = 0;
 	for (lp = list, i = 0; i < count; i++, lp++) {
-		tmp = xfs_attr3_leaf_freextent(trans, dp,
-				lp->valueblk, lp->valuelen);
-
+		tmp = xfs_attr3_rmt_inactive(dp, lp);
 		if (error == 0)
 			error = tmp;	/* save only the 1st errno */
 	}


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/6] xfs: clean up xfs_buf_item_get_format return value
  2020-01-14  6:31 [PATCH v3 0/6] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
  2020-01-14  6:31 ` [PATCH 1/6] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
  2020-01-14  6:31 ` [PATCH 2/6] xfs: fix memory corruption during " Darrick J. Wong
@ 2020-01-14  6:31 ` Darrick J. Wong
  2020-01-14  6:31 ` [PATCH 4/6] xfs: complain if anyone tries to create a too-large buffer log item Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2020-01-14  6:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, hch, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

The only thing that can cause a nonzero return from
xfs_buf_item_get_format is if the kmem_alloc fails, which it can't.
Get rid of all the unnecessary error handling.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf_item.c |   16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 3984779e5911..9737f177a49b 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -688,7 +688,7 @@ static const struct xfs_item_ops xfs_buf_item_ops = {
 	.iop_push	= xfs_buf_item_push,
 };
 
-STATIC int
+STATIC void
 xfs_buf_item_get_format(
 	struct xfs_buf_log_item	*bip,
 	int			count)
@@ -698,14 +698,11 @@ xfs_buf_item_get_format(
 
 	if (count == 1) {
 		bip->bli_formats = &bip->__bli_format;
-		return 0;
+		return;
 	}
 
 	bip->bli_formats = kmem_zalloc(count * sizeof(struct xfs_buf_log_format),
 				0);
-	if (!bip->bli_formats)
-		return -ENOMEM;
-	return 0;
 }
 
 STATIC void
@@ -731,7 +728,6 @@ xfs_buf_item_init(
 	struct xfs_buf_log_item	*bip = bp->b_log_item;
 	int			chunks;
 	int			map_size;
-	int			error;
 	int			i;
 
 	/*
@@ -760,13 +756,7 @@ xfs_buf_item_init(
 	 * Discontiguous buffer support follows the layout of the underlying
 	 * buffer. This makes the implementation as simple as possible.
 	 */
-	error = xfs_buf_item_get_format(bip, bp->b_map_count);
-	ASSERT(error == 0);
-	if (error) {	/* to stop gcc throwing set-but-unused warnings */
-		kmem_cache_free(xfs_buf_item_zone, bip);
-		return error;
-	}
-
+	xfs_buf_item_get_format(bip, bp->b_map_count);
 
 	for (i = 0; i < bip->bli_format_count; i++) {
 		chunks = DIV_ROUND_UP(BBTOB(bp->b_maps[i].bm_len),


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/6] xfs: complain if anyone tries to create a too-large buffer log item
  2020-01-14  6:31 [PATCH v3 0/6] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2020-01-14  6:31 ` [PATCH 3/6] xfs: clean up xfs_buf_item_get_format return value Darrick J. Wong
@ 2020-01-14  6:31 ` Darrick J. Wong
  2020-01-14  6:32 ` [PATCH 5/6] xfs: make struct xfs_buf_log_format have a consistent size Darrick J. Wong
  2020-01-14  6:32 ` [PATCH 6/6] xfs: check log iovec size to make sure it's plausibly a buffer log format Darrick J. Wong
  5 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2020-01-14  6:31 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, hch, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

Complain if someone calls xfs_buf_item_init on a buffer that is larger
than the dirty bitmap can handle, or tries to log a region that's past
the end of the dirty bitmap.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf_item.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)


diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 9737f177a49b..be691d1d9fad 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -763,6 +763,15 @@ xfs_buf_item_init(
 				      XFS_BLF_CHUNK);
 		map_size = DIV_ROUND_UP(chunks, NBWORD);
 
+		if (map_size > XFS_BLF_DATAMAP_SIZE) {
+			kmem_cache_free(xfs_buf_item_zone, bip);
+			xfs_err(mp,
+	"buffer item dirty bitmap (%u uints) too small to reflect %u bytes!",
+					map_size,
+					BBTOB(bp->b_maps[i].bm_len));
+			return -EFSCORRUPTED;
+		}
+
 		bip->bli_formats[i].blf_type = XFS_LI_BUF;
 		bip->bli_formats[i].blf_blkno = bp->b_maps[i].bm_bn;
 		bip->bli_formats[i].blf_len = bp->b_maps[i].bm_len;
@@ -795,6 +804,9 @@ xfs_buf_item_log_segment(
 	uint		end_bit;
 	uint		mask;
 
+	ASSERT(first < XFS_BLF_DATAMAP_SIZE * XFS_BLF_CHUNK * NBWORD);
+	ASSERT(last < XFS_BLF_DATAMAP_SIZE * XFS_BLF_CHUNK * NBWORD);
+
 	/*
 	 * Convert byte offsets to bit numbers.
 	 */


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5/6] xfs: make struct xfs_buf_log_format have a consistent size
  2020-01-14  6:31 [PATCH v3 0/6] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2020-01-14  6:31 ` [PATCH 4/6] xfs: complain if anyone tries to create a too-large buffer log item Darrick J. Wong
@ 2020-01-14  6:32 ` Darrick J. Wong
  2020-01-14  8:41   ` Christoph Hellwig
  2020-01-14  6:32 ` [PATCH 6/6] xfs: check log iovec size to make sure it's plausibly a buffer log format Darrick J. Wong
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2020-01-14  6:32 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, hch

From: Darrick J. Wong <darrick.wong@oracle.com>

Increase XFS_BLF_DATAMAP_SIZE by 1 to fill in the implied padding at the
end of struct xfs_buf_log_format.  This makes the size consistent so
that we can check it in xfs_ondisk.h, and will be needed once we start
logging attribute values.

On amd64 we get the following pahole:

struct xfs_buf_log_format {
        short unsigned int         blf_type;       /*     0     2 */
        short unsigned int         blf_size;       /*     2     2 */
        short unsigned int         blf_flags;      /*     4     2 */
        short unsigned int         blf_len;        /*     6     2 */
        long long int              blf_blkno;      /*     8     8 */
        unsigned int               blf_map_size;   /*    16     4 */
        unsigned int               blf_data_map[16]; /*    20    64 */
        /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */

        /* size: 88, cachelines: 2, members: 7 */
        /* padding: 4 */
        /* last cacheline: 24 bytes */
};

But on i386 we get the following:

struct xfs_buf_log_format {
        short unsigned int         blf_type;       /*     0     2 */
        short unsigned int         blf_size;       /*     2     2 */
        short unsigned int         blf_flags;      /*     4     2 */
        short unsigned int         blf_len;        /*     6     2 */
        long long int              blf_blkno;      /*     8     8 */
        unsigned int               blf_map_size;   /*    16     4 */
        unsigned int               blf_data_map[16]; /*    20    64 */
        /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */

        /* size: 84, cachelines: 2, members: 7 */
        /* last cacheline: 20 bytes */
};

Notice how the amd64 compiler inserts 4 bytes of padding to the end of
the structure to ensure 8-byte alignment.  Prior to "xfs: fix memory
corruption during remote attr value buffer invalidation" we would try to
write to blf_data_map[17], which is harmless on amd64 but really bad on
i386.

This shouldn't cause any changes in the ondisk logging formats because
the log code writes out the log vectors with the appropriate size for
the log item's map_size, and log recovery treats the data_map array as a
VLA.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_format.h |   19 ++++++++++++++-----
 fs/xfs/xfs_ondisk.h            |    1 +
 2 files changed, 15 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 8ef31d71a9c7..9bac0d2e56dc 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -462,11 +462,20 @@ static inline uint xfs_log_dinode_size(int version)
 #define	XFS_BLF_GDQUOT_BUF	(1<<4)
 
 /*
- * This is the structure used to lay out a buf log item in the
- * log.  The data map describes which 128 byte chunks of the buffer
- * have been logged.
- */
-#define XFS_BLF_DATAMAP_SIZE	((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) / NBWORD)
+ * This is the structure used to lay out a buf log item in the log.  The data
+ * map describes which 128 byte chunks of the buffer have been logged.
+ *
+ * The placement of blf_map_size causes blf_data_map to start at an odd
+ * multiple of sizeof(unsigned int) offset within the struct.  Because the data
+ * bitmap size will always be an even number, the end of the data_map (and
+ * therefore the structure) will also be at an odd multiple of sizeof(unsigned
+ * int).  Some 64-bit compilers will insert padding at the end of the struct to
+ * ensure 64-bit alignment of blf_blkno, but 32-bit ones will not.  Therefore,
+ * XFS_BLF_DATAMAP_SIZE must be an odd number to make the padding explicit and
+ * keep the structure size consistent between 32-bit and 64-bit platforms.
+ */
+#define __XFS_BLF_DATAMAP_SIZE	((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) / NBWORD)
+#define XFS_BLF_DATAMAP_SIZE	(__XFS_BLF_DATAMAP_SIZE + 1)
 
 typedef struct xfs_buf_log_format {
 	unsigned short	blf_type;	/* buf log item type indicator */
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index b6701b4f59a9..5f04d8a5ab2a 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -111,6 +111,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t,		10);
 
 	/* log structures */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format,	88);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dq_logformat,		24);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_efd_log_format_32,	28);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_efd_log_format_64,	32);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 6/6] xfs: check log iovec size to make sure it's plausibly a buffer log format
  2020-01-14  6:31 [PATCH v3 0/6] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
                   ` (4 preceding siblings ...)
  2020-01-14  6:32 ` [PATCH 5/6] xfs: make struct xfs_buf_log_format have a consistent size Darrick J. Wong
@ 2020-01-14  6:32 ` Darrick J. Wong
  2020-01-14  8:42   ` Christoph Hellwig
  5 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2020-01-14  6:32 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, hch

From: Darrick J. Wong <darrick.wong@oracle.com>

When log recovery is processing buffer log items, we should check that
the incoming iovec actually describes a region of memory large enough to
contain the log format and the dirty map.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_buf_item.c    |   17 +++++++++++++++++
 fs/xfs/xfs_buf_item.h    |    1 +
 fs/xfs/xfs_log_recover.c |    6 ++++++
 3 files changed, 24 insertions(+)


diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index be691d1d9fad..5be8973a452c 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -27,6 +27,23 @@ static inline struct xfs_buf_log_item *BUF_ITEM(struct xfs_log_item *lip)
 
 STATIC void	xfs_buf_do_callbacks(struct xfs_buf *bp);
 
+/* Is this log iovec plausibly large enough to contain the buffer log format? */
+bool
+xfs_buf_log_check_iovec(
+	struct xfs_log_iovec		*iovec)
+{
+	struct xfs_buf_log_format	*blfp = iovec->i_addr;
+	char				*bmp_end;
+	char				*item_end;
+
+	if (offsetof(struct xfs_buf_log_format, blf_data_map) > iovec->i_len)
+		return false;
+
+	item_end = (char *)iovec->i_addr + iovec->i_len;
+	bmp_end = (char *)&blfp->blf_data_map[blfp->blf_map_size];
+	return bmp_end <= item_end;
+}
+
 static inline int
 xfs_buf_log_format_size(
 	struct xfs_buf_log_format *blfp)
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 4a054b11011a..30114b510332 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -61,6 +61,7 @@ void	xfs_buf_iodone_callbacks(struct xfs_buf *);
 void	xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
 bool	xfs_buf_resubmit_failed_buffers(struct xfs_buf *,
 					struct list_head *);
+bool	xfs_buf_log_check_iovec(struct xfs_log_iovec *iovec);
 
 extern kmem_zone_t	*xfs_buf_item_zone;
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 99ec3fba4548..0d683fb96396 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1934,6 +1934,12 @@ xlog_recover_buffer_pass1(
 	struct list_head	*bucket;
 	struct xfs_buf_cancel	*bcp;
 
+	if (!xfs_buf_log_check_iovec(&item->ri_buf[0])) {
+		xfs_err(log->l_mp, "bad buffer log item size (%d)",
+				item->ri_buf[0].i_len);
+		return -EFSCORRUPTED;
+	}
+
 	/*
 	 * If this isn't a cancel buffer item, then just return.
 	 */


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] xfs: refactor remote attr value buffer invalidation
  2020-01-14  6:31 ` [PATCH 1/6] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
@ 2020-01-14  8:30   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-01-14  8:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

On Mon, Jan 13, 2020 at 10:31:35PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Hoist the code that invalidates remote extended attribute value buffers
> into a separate helper function.  This prepares us for a memory
> corruption fix in the next patch.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] xfs: fix memory corruption during remote attr value buffer invalidation
  2020-01-14  6:31 ` [PATCH 2/6] xfs: fix memory corruption during " Darrick J. Wong
@ 2020-01-14  8:40   ` Christoph Hellwig
  2020-01-14 23:02     ` Darrick J. Wong
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2020-01-14  8:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

> Fortunately for us, remote attribute values are written to disk with
> xfs_bwrite(), which is to say that they are not logged.  Fix the problem
> by removing all places where we could end up creating a buffer log item
> for a remote attribute value and leave a note explaining why.

This is stil missing a comment that you are using a suitable helper
for marking the buffer stale, and why rmeoving the HOLEBLOCK check
is safe (which I now tink it is based on looking at the caller).

> -			error = xfs_trans_read_buf(mp, args->trans,
> +			error = xfs_trans_read_buf(mp, NULL,
>  						   mp->m_ddev_targp,
>  						   dblkno, dblkcnt, 0, &bp,
>  						   &xfs_attr3_rmt_buf_ops);
> @@ -411,7 +428,7 @@ xfs_attr_rmtval_get(
>  			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
>  							&offset, &valuelen,
>  							&dst);
> -			xfs_trans_brelse(args->trans, bp);
> +			xfs_buf_relse(bp);

FYI, I don't think mixing xfs_trans_read_buf and xfs_buf_relse is a good
pattern.

> @@ -48,8 +45,8 @@ xfs_attr3_leaf_freextent(
>  	 * Roll through the "value", invalidating the attribute value's
>  	 * blocks.
>  	 */
> -	tblkno = blkno;
> -	tblkcnt = blkcnt;
> +	tblkno = lp->valueblk;
> +	tblkcnt = lp->valuelen;

Nit: these could be easily initialized on the declaration lines.  Or
even better if you keep the old calling conventions of passing the
blockno and count by value, in which case we don't need the extra local
variables at all.

> @@ -174,9 +155,7 @@ xfs_attr3_leaf_inactive(
>  	 */
>  	error = 0;
>  	for (lp = list, i = 0; i < count; i++, lp++) {
> -		tmp = xfs_attr3_leaf_freextent(trans, dp,
> -				lp->valueblk, lp->valuelen);
> -
> +		tmp = xfs_attr3_rmt_inactive(dp, lp);

So given that we don't touch the transaction I don't think we even
need the memory allocation to defer the marking stale of the buffer
until after the xfs_trans_brelse.  But that could be a separate
patch, especially if the block/count calling conventions are kept as-is.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/6] xfs: make struct xfs_buf_log_format have a consistent size
  2020-01-14  6:32 ` [PATCH 5/6] xfs: make struct xfs_buf_log_format have a consistent size Darrick J. Wong
@ 2020-01-14  8:41   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-01-14  8:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6] xfs: check log iovec size to make sure it's plausibly a buffer log format
  2020-01-14  6:32 ` [PATCH 6/6] xfs: check log iovec size to make sure it's plausibly a buffer log format Darrick J. Wong
@ 2020-01-14  8:42   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2020-01-14  8:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch

On Mon, Jan 13, 2020 at 10:32:09PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When log recovery is processing buffer log items, we should check that
> the incoming iovec actually describes a region of memory large enough to
> contain the log format and the dirty map.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] xfs: fix memory corruption during remote attr value buffer invalidation
  2020-01-14  8:40   ` Christoph Hellwig
@ 2020-01-14 23:02     ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2020-01-14 23:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Tue, Jan 14, 2020 at 12:40:11AM -0800, Christoph Hellwig wrote:
> > Fortunately for us, remote attribute values are written to disk with
> > xfs_bwrite(), which is to say that they are not logged.  Fix the problem
> > by removing all places where we could end up creating a buffer log item
> > for a remote attribute value and leave a note explaining why.
> 
> This is stil missing a comment that you are using a suitable helper
> for marking the buffer stale, and why rmeoving the HOLEBLOCK check
> is safe (which I now tink it is based on looking at the caller).

Oops, I forgot to update the changelog.


> > -			error = xfs_trans_read_buf(mp, args->trans,
> > +			error = xfs_trans_read_buf(mp, NULL,
> >  						   mp->m_ddev_targp,
> >  						   dblkno, dblkcnt, 0, &bp,
> >  						   &xfs_attr3_rmt_buf_ops);
> > @@ -411,7 +428,7 @@ xfs_attr_rmtval_get(
> >  			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
> >  							&offset, &valuelen,
> >  							&dst);
> > -			xfs_trans_brelse(args->trans, bp);
> > +			xfs_buf_relse(bp);
> 
> FYI, I don't think mixing xfs_trans_read_buf and xfs_buf_relse is a good
> pattern.

Yeah, you're right.  I didn't want to go opencoding the !bp or
bp->b_error > 0 cases that happen in xfs_trans_buf_read to make this bug
fix an even bigger pile of patches, but maybe it's just time to clean up
xfs_buf_read() to return error values like most everywhere else.

> > @@ -48,8 +45,8 @@ xfs_attr3_leaf_freextent(
> >  	 * Roll through the "value", invalidating the attribute value's
> >  	 * blocks.
> >  	 */
> > -	tblkno = blkno;
> > -	tblkcnt = blkcnt;
> > +	tblkno = lp->valueblk;
> > +	tblkcnt = lp->valuelen;
> 
> Nit: these could be easily initialized on the declaration lines.  Or
> even better if you keep the old calling conventions of passing the
> blockno and count by value, in which case we don't need the extra local
> variables at all.
> 
> > @@ -174,9 +155,7 @@ xfs_attr3_leaf_inactive(
> >  	 */
> >  	error = 0;
> >  	for (lp = list, i = 0; i < count; i++, lp++) {
> > -		tmp = xfs_attr3_leaf_freextent(trans, dp,
> > -				lp->valueblk, lp->valuelen);
> > -
> > +		tmp = xfs_attr3_rmt_inactive(dp, lp);
> 
> So given that we don't touch the transaction I don't think we even
> need the memory allocation to defer the marking stale of the buffer
> until after the xfs_trans_brelse.  But that could be a separate
> patch, especially if the block/count calling conventions are kept as-is.

These last two I'll clean up in a followup patch that gets rid of the
pointless local variables in the first function and the pointless memory
allocation in the second function.

--D

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, back to index

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-14  6:31 [PATCH v3 0/6] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
2020-01-14  6:31 ` [PATCH 1/6] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
2020-01-14  8:30   ` Christoph Hellwig
2020-01-14  6:31 ` [PATCH 2/6] xfs: fix memory corruption during " Darrick J. Wong
2020-01-14  8:40   ` Christoph Hellwig
2020-01-14 23:02     ` Darrick J. Wong
2020-01-14  6:31 ` [PATCH 3/6] xfs: clean up xfs_buf_item_get_format return value Darrick J. Wong
2020-01-14  6:31 ` [PATCH 4/6] xfs: complain if anyone tries to create a too-large buffer log item Darrick J. Wong
2020-01-14  6:32 ` [PATCH 5/6] xfs: make struct xfs_buf_log_format have a consistent size Darrick J. Wong
2020-01-14  8:41   ` Christoph Hellwig
2020-01-14  6:32 ` [PATCH 6/6] xfs: check log iovec size to make sure it's plausibly a buffer log format Darrick J. Wong
2020-01-14  8:42   ` Christoph Hellwig

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git