linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] xfs: fix buf log item memory corruption on non-amd64
@ 2020-01-09 18:44 Darrick J. Wong
  2020-01-09 18:44 ` [PATCH 1/5] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-09 18:44 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

Hi all,

This second series corrects a memory corruption problem that I noticed
when running fstests on i386 and on a 64k-page aarch64 machine.  The
root cause is the fact that on v5 filesystems, a remote xattribute value
can be allocated 128K of disk space (64k for the value, 64 bytes for the
header).

xattr invalidation will try to xfs_trans_binval the attribute value
buffer, which creates a (zeroed) buffer log item.  The dirty buffer in
the buffer log item isn't large enough to handle > 64k of dirty data and
we write past the end of the array, corrupting memory.  On amd64 the
compiler inserts an invisible padding area just past the end of the
dirty bitmap, which is why we don't see the problem on our laptops. :P

Since we don't ever log remote xattr values, we can fix this problem by
making sure that no part of the code that handles remote attr values
ever supplies a transaction context to a xfs_buf function.  Finish the
series by adding a few asserts so that we'll shut down the log if this
kind of overrun ever happens again.

This has been lightly tested with fstests.  Enjoy!
Comments and questions are, as always, welcome.

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] xfs: refactor remote attr value buffer invalidation
  2020-01-09 18:44 [PATCH v2 0/5] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
@ 2020-01-09 18:44 ` Darrick J. Wong
  2020-01-10 11:55   ` Christoph Hellwig
  2020-01-09 18:44 ` [PATCH 2/5] xfs: fix memory corruption during " Darrick J. Wong
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-09 18:44 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Hoist the code that invalidates remote extended attribute value buffers
into a separate helper function.  This prepares us for a memory
corruption fix in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   44 +++++++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_attr_remote.h |    1 +
 2 files changed, 29 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index a6ef5df42669..6babdaac6057 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -552,6 +552,32 @@ xfs_attr_rmtval_set(
 	return 0;
 }
 
+/* Mark stale any incore buffers for the remote value. */
+void
+xfs_attr_rmtval_stale(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*map)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_buf		*bp;
+	xfs_daddr_t		dblkno;
+	int			dblkcnt;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
+	dblkno = XFS_FSB_TO_DADDR(mp, map->br_startblock),
+	dblkcnt = XFS_FSB_TO_BB(mp, map->br_blockcount);
+
+	/*
+	 * If the "remote" value is in the cache, remove it.
+	 */
+	bp = xfs_buf_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
+	if (bp) {
+		xfs_buf_stale(bp);
+		xfs_buf_relse(bp);
+	}
+}
+
 /*
  * Remove the value associated with an attribute by deleting the
  * out-of-line buffer that it is stored on.
@@ -560,7 +586,6 @@ int
 xfs_attr_rmtval_remove(
 	struct xfs_da_args	*args)
 {
-	struct xfs_mount	*mp = args->dp->i_mount;
 	xfs_dablk_t		lblkno;
 	int			blkcnt;
 	int			error;
@@ -575,9 +600,6 @@ xfs_attr_rmtval_remove(
 	blkcnt = args->rmtblkcnt;
 	while (blkcnt > 0) {
 		struct xfs_bmbt_irec	map;
-		struct xfs_buf		*bp;
-		xfs_daddr_t		dblkno;
-		int			dblkcnt;
 		int			nmap;
 
 		/*
@@ -592,18 +614,8 @@ xfs_attr_rmtval_remove(
 		ASSERT((map.br_startblock != DELAYSTARTBLOCK) &&
 		       (map.br_startblock != HOLESTARTBLOCK));
 
-		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
-		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
-
-		/*
-		 * If the "remote" value is in the cache, remove it.
-		 */
-		bp = xfs_buf_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
-		if (bp) {
-			xfs_buf_stale(bp);
-			xfs_buf_relse(bp);
-			bp = NULL;
-		}
+		if (map.br_startblock != HOLESTARTBLOCK)
+			xfs_attr_rmtval_stale(args->dp, &map);
 
 		lblkno += map.br_blockcount;
 		blkcnt -= map.br_blockcount;
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 9d20b66ad379..ce7287ac5b1f 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -11,5 +11,6 @@ int xfs_attr3_rmt_blocks(struct xfs_mount *mp, int attrlen);
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_set(struct xfs_da_args *args);
 int xfs_attr_rmtval_remove(struct xfs_da_args *args);
+void xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map);
 
 #endif /* __XFS_ATTR_REMOTE_H__ */


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] xfs: fix memory corruption during remote attr value buffer invalidation
  2020-01-09 18:44 [PATCH v2 0/5] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
  2020-01-09 18:44 ` [PATCH 1/5] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
@ 2020-01-09 18:44 ` Darrick J. Wong
  2020-01-10 11:57   ` Christoph Hellwig
  2020-01-09 18:45 ` [PATCH 3/5] xfs: clean up xfs_buf_item_get_format return value Darrick J. Wong
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-09 18:44 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

While running generic/103, I observed what looks like memory corruption
and (with slub debugging turned on) a slub redzone warning on i386 when
inactivating an inode with a 64k remote attr value.

On a v5 filesystem, maximally sized remote attr values require one block
more than 64k worth of space to hold both the remote attribute value
header (64 bytes).  On a 4k block filesystem this results in a 68k
buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
that even though we'll never use more than 65,600 bytes of this buffer,
XFS_MAX_BLOCKSIZE is 64k.

This is a problem because the definition of struct xfs_buf_log_format
allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
the dirty map, writing right off the end of the log item and corrupting
memory.  We've gotten away with this on x86_64 for years because the
compiler inserts a u32 padding on the end of struct xfs_buf_log_format.

Fortunately for us, remote attribute values are written to disk with
xfs_bwrite(), which is to say that they are not logged.  Fix the problem
by removing all places where we could end up creating a buffer log item
for a remote attribute value and leave a note explaining why.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr_remote.c |   21 +++++++++++++++++++--
 fs/xfs/xfs_attr_inactive.c      |   34 ++++++----------------------------
 2 files changed, 25 insertions(+), 30 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 6babdaac6057..1765f5351f9e 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -25,6 +25,23 @@
 
 #define ATTR_RMTVALUE_MAPSIZE	1	/* # of map entries at once */
 
+/*
+ * Remote Attribute Values
+ * =======================
+ *
+ * Remote extended attribute values are conceptually simple -- they're written
+ * to data blocks mapped by an inode's attribute fork, and they have an upper
+ * size limit of 64k.  Setting a value does not involve the XFS log.
+ *
+ * However, on a v5 filesystem, maximally sized remote attr values require one
+ * block more than 64k worth of space to hold both the remote attribute value
+ * header (64 bytes).  On a 4k block filesystem this results in a 68k buffer;
+ * on a 64k block filesystem, this would be a 128k buffer.  Note that the log
+ * format can only handle a dirty buffer of XFS_MAX_BLOCKSIZE length (64k).
+ * Therefore, we /must/ ensure that remote attribute value buffers never touch
+ * the logging system and therefore never have a log item.
+ */
+
 /*
  * Each contiguous block has a header, so it is not just a simple attribute
  * length to FSB conversion.
@@ -401,7 +418,7 @@ xfs_attr_rmtval_get(
 			       (map[i].br_startblock != HOLESTARTBLOCK));
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			dblkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
-			error = xfs_trans_read_buf(mp, args->trans,
+			error = xfs_trans_read_buf(mp, NULL,
 						   mp->m_ddev_targp,
 						   dblkno, dblkcnt, 0, &bp,
 						   &xfs_attr3_rmt_buf_ops);
@@ -411,7 +428,7 @@ xfs_attr_rmtval_get(
 			error = xfs_attr_rmtval_copyout(mp, bp, args->dp->i_ino,
 							&offset, &valuelen,
 							&dst);
-			xfs_trans_brelse(args->trans, bp);
+			xfs_buf_relse(bp);
 			if (error)
 				return error;
 
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index 5ff49523d8ea..5c83dd3611af 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -30,17 +30,13 @@
  */
 STATIC int
 xfs_attr3_leaf_freextent(
-	struct xfs_trans	**trans,
 	struct xfs_inode	*dp,
 	xfs_dablk_t		blkno,
 	int			blkcnt)
 {
 	struct xfs_bmbt_irec	map;
-	struct xfs_buf		*bp;
 	xfs_dablk_t		tblkno;
-	xfs_daddr_t		dblkno;
 	int			tblkcnt;
-	int			dblkcnt;
 	int			nmap;
 	int			error;
 
@@ -64,28 +60,12 @@ xfs_attr3_leaf_freextent(
 		ASSERT(map.br_startblock != DELAYSTARTBLOCK);
 
 		/*
-		 * If it's a hole, these are already unmapped
-		 * so there's nothing to invalidate.
+		 * Mark any incore buffers for the remote value as stale.  We
+		 * never log remote attr value buffers, so the buffer should be
+		 * easy to kill.
 		 */
-		if (map.br_startblock != HOLESTARTBLOCK) {
-
-			dblkno = XFS_FSB_TO_DADDR(dp->i_mount,
-						  map.br_startblock);
-			dblkcnt = XFS_FSB_TO_BB(dp->i_mount,
-						map.br_blockcount);
-			bp = xfs_trans_get_buf(*trans,
-					dp->i_mount->m_ddev_targp,
-					dblkno, dblkcnt, 0);
-			if (!bp)
-				return -ENOMEM;
-			xfs_trans_binval(*trans, bp);
-			/*
-			 * Roll to next transaction.
-			 */
-			error = xfs_trans_roll_inode(trans, dp);
-			if (error)
-				return error;
-		}
+		if (map.br_startblock != HOLESTARTBLOCK)
+			xfs_attr_rmtval_stale(dp, &map);
 
 		tblkno += map.br_blockcount;
 		tblkcnt -= map.br_blockcount;
@@ -174,9 +154,7 @@ xfs_attr3_leaf_inactive(
 	 */
 	error = 0;
 	for (lp = list, i = 0; i < count; i++, lp++) {
-		tmp = xfs_attr3_leaf_freextent(trans, dp,
-				lp->valueblk, lp->valuelen);
-
+		tmp = xfs_attr3_leaf_freextent(dp, lp->valueblk, lp->valuelen);
 		if (error == 0)
 			error = tmp;	/* save only the 1st errno */
 	}


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] xfs: clean up xfs_buf_item_get_format return value
  2020-01-09 18:44 [PATCH v2 0/5] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
  2020-01-09 18:44 ` [PATCH 1/5] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
  2020-01-09 18:44 ` [PATCH 2/5] xfs: fix memory corruption during " Darrick J. Wong
@ 2020-01-09 18:45 ` Darrick J. Wong
  2020-01-10 11:58   ` Christoph Hellwig
  2020-01-09 18:45 ` [PATCH 4/5] xfs: complain if anyone tries to create a too-large buffer log item Darrick J. Wong
  2020-01-09 18:45 ` [PATCH 5/5] xfs: make struct xfs_buf_log_format have a consistent size Darrick J. Wong
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-09 18:45 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The only thing that can cause a nonzero return from
xfs_buf_item_get_format is if the kmem_alloc fails, which it can't.
Get rid of all the unnecessary error handling.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_buf_item.c |   16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 3984779e5911..9737f177a49b 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -688,7 +688,7 @@ static const struct xfs_item_ops xfs_buf_item_ops = {
 	.iop_push	= xfs_buf_item_push,
 };
 
-STATIC int
+STATIC void
 xfs_buf_item_get_format(
 	struct xfs_buf_log_item	*bip,
 	int			count)
@@ -698,14 +698,11 @@ xfs_buf_item_get_format(
 
 	if (count == 1) {
 		bip->bli_formats = &bip->__bli_format;
-		return 0;
+		return;
 	}
 
 	bip->bli_formats = kmem_zalloc(count * sizeof(struct xfs_buf_log_format),
 				0);
-	if (!bip->bli_formats)
-		return -ENOMEM;
-	return 0;
 }
 
 STATIC void
@@ -731,7 +728,6 @@ xfs_buf_item_init(
 	struct xfs_buf_log_item	*bip = bp->b_log_item;
 	int			chunks;
 	int			map_size;
-	int			error;
 	int			i;
 
 	/*
@@ -760,13 +756,7 @@ xfs_buf_item_init(
 	 * Discontiguous buffer support follows the layout of the underlying
 	 * buffer. This makes the implementation as simple as possible.
 	 */
-	error = xfs_buf_item_get_format(bip, bp->b_map_count);
-	ASSERT(error == 0);
-	if (error) {	/* to stop gcc throwing set-but-unused warnings */
-		kmem_cache_free(xfs_buf_item_zone, bip);
-		return error;
-	}
-
+	xfs_buf_item_get_format(bip, bp->b_map_count);
 
 	for (i = 0; i < bip->bli_format_count; i++) {
 		chunks = DIV_ROUND_UP(BBTOB(bp->b_maps[i].bm_len),


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] xfs: complain if anyone tries to create a too-large buffer log item
  2020-01-09 18:44 [PATCH v2 0/5] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2020-01-09 18:45 ` [PATCH 3/5] xfs: clean up xfs_buf_item_get_format return value Darrick J. Wong
@ 2020-01-09 18:45 ` Darrick J. Wong
  2020-01-10 11:58   ` Christoph Hellwig
  2020-01-09 18:45 ` [PATCH 5/5] xfs: make struct xfs_buf_log_format have a consistent size Darrick J. Wong
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-09 18:45 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Complain if someone calls xfs_buf_item_init on a buffer that is larger
than the dirty bitmap can handle, or tries to log a region that's past
the end of the dirty bitmap.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_buf_item.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)


diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 9737f177a49b..be691d1d9fad 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -763,6 +763,15 @@ xfs_buf_item_init(
 				      XFS_BLF_CHUNK);
 		map_size = DIV_ROUND_UP(chunks, NBWORD);
 
+		if (map_size > XFS_BLF_DATAMAP_SIZE) {
+			kmem_cache_free(xfs_buf_item_zone, bip);
+			xfs_err(mp,
+	"buffer item dirty bitmap (%u uints) too small to reflect %u bytes!",
+					map_size,
+					BBTOB(bp->b_maps[i].bm_len));
+			return -EFSCORRUPTED;
+		}
+
 		bip->bli_formats[i].blf_type = XFS_LI_BUF;
 		bip->bli_formats[i].blf_blkno = bp->b_maps[i].bm_bn;
 		bip->bli_formats[i].blf_len = bp->b_maps[i].bm_len;
@@ -795,6 +804,9 @@ xfs_buf_item_log_segment(
 	uint		end_bit;
 	uint		mask;
 
+	ASSERT(first < XFS_BLF_DATAMAP_SIZE * XFS_BLF_CHUNK * NBWORD);
+	ASSERT(last < XFS_BLF_DATAMAP_SIZE * XFS_BLF_CHUNK * NBWORD);
+
 	/*
 	 * Convert byte offsets to bit numbers.
 	 */


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] xfs: make struct xfs_buf_log_format have a consistent size
  2020-01-09 18:44 [PATCH v2 0/5] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2020-01-09 18:45 ` [PATCH 4/5] xfs: complain if anyone tries to create a too-large buffer log item Darrick J. Wong
@ 2020-01-09 18:45 ` Darrick J. Wong
  2020-01-10 11:59   ` Christoph Hellwig
  4 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-09 18:45 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Increase XFS_BLF_DATAMAP_SIZE by 1 to fill in the implied padding at the
end of struct xfs_buf_log_format.  This makes the size consistent so
that we can check it in xfs_ondisk.h, and will be needed once we start
logging attribute values.

On amd64 we get the following pahole:

struct xfs_buf_log_format {
        short unsigned int         blf_type;       /*     0     2 */
        short unsigned int         blf_size;       /*     2     2 */
        short unsigned int         blf_flags;      /*     4     2 */
        short unsigned int         blf_len;        /*     6     2 */
        long long int              blf_blkno;      /*     8     8 */
        unsigned int               blf_map_size;   /*    16     4 */
        unsigned int               blf_data_map[16]; /*    20    64 */
        /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */

        /* size: 88, cachelines: 2, members: 7 */
        /* padding: 4 */
        /* last cacheline: 24 bytes */
};

But on i386 we get the following:

struct xfs_buf_log_format {
        short unsigned int         blf_type;       /*     0     2 */
        short unsigned int         blf_size;       /*     2     2 */
        short unsigned int         blf_flags;      /*     4     2 */
        short unsigned int         blf_len;        /*     6     2 */
        long long int              blf_blkno;      /*     8     8 */
        unsigned int               blf_map_size;   /*    16     4 */
        unsigned int               blf_data_map[16]; /*    20    64 */
        /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */

        /* size: 84, cachelines: 2, members: 7 */
        /* last cacheline: 20 bytes */
};

Notice how the amd64 compiler inserts 4 bytes of padding to the end of
the structure to ensure 8-byte alignment.  Prior to "xfs: fix memory
corruption during remote attr value buffer invalidation" we would try to
write to blf_data_map[17], which is harmless on amd64 but really bad on
i386.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_log_format.h |   19 ++++++++++++++-----
 fs/xfs/xfs_ondisk.h            |    1 +
 2 files changed, 15 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 8ef31d71a9c7..9bac0d2e56dc 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -462,11 +462,20 @@ static inline uint xfs_log_dinode_size(int version)
 #define	XFS_BLF_GDQUOT_BUF	(1<<4)
 
 /*
- * This is the structure used to lay out a buf log item in the
- * log.  The data map describes which 128 byte chunks of the buffer
- * have been logged.
- */
-#define XFS_BLF_DATAMAP_SIZE	((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) / NBWORD)
+ * This is the structure used to lay out a buf log item in the log.  The data
+ * map describes which 128 byte chunks of the buffer have been logged.
+ *
+ * The placement of blf_map_size causes blf_data_map to start at an odd
+ * multiple of sizeof(unsigned int) offset within the struct.  Because the data
+ * bitmap size will always be an even number, the end of the data_map (and
+ * therefore the structure) will also be at an odd multiple of sizeof(unsigned
+ * int).  Some 64-bit compilers will insert padding at the end of the struct to
+ * ensure 64-bit alignment of blf_blkno, but 32-bit ones will not.  Therefore,
+ * XFS_BLF_DATAMAP_SIZE must be an odd number to make the padding explicit and
+ * keep the structure size consistent between 32-bit and 64-bit platforms.
+ */
+#define __XFS_BLF_DATAMAP_SIZE	((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) / NBWORD)
+#define XFS_BLF_DATAMAP_SIZE	(__XFS_BLF_DATAMAP_SIZE + 1)
 
 typedef struct xfs_buf_log_format {
 	unsigned short	blf_type;	/* buf log item type indicator */
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index b6701b4f59a9..5f04d8a5ab2a 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -111,6 +111,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t,		10);
 
 	/* log structures */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format,	88);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_dq_logformat,		24);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_efd_log_format_32,	28);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_efd_log_format_64,	32);


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] xfs: refactor remote attr value buffer invalidation
  2020-01-09 18:44 ` [PATCH 1/5] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
@ 2020-01-10 11:55   ` Christoph Hellwig
  2020-01-14  0:43     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2020-01-10 11:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_buf		*bp;
> +	xfs_daddr_t		dblkno;
> +	int			dblkcnt;
> +
> +	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> +
> +	dblkno = XFS_FSB_TO_DADDR(mp, map->br_startblock),
> +	dblkcnt = XFS_FSB_TO_BB(mp, map->br_blockcount);
> +
> +	/*
> +	 * If the "remote" value is in the cache, remove it.
> +	 */
> +	bp = xfs_buf_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);

Do we really need the dblkno and dblkcnt local variables here?

> @@ -592,18 +614,8 @@ xfs_attr_rmtval_remove(
>  		ASSERT((map.br_startblock != DELAYSTARTBLOCK) &&
>  		       (map.br_startblock != HOLESTARTBLOCK));
>  
> -		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
> -		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
> -
> -		/*
> -		 * If the "remote" value is in the cache, remove it.
> -		 */
> -		bp = xfs_buf_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
> -		if (bp) {
> -			xfs_buf_stale(bp);
> -			xfs_buf_relse(bp);
> -			bp = NULL;
> -		}
> +		if (map.br_startblock != HOLESTARTBLOCK)
> +			xfs_attr_rmtval_stale(args->dp, &map);

I don't think we need the HOLESTARTBLOCK check here, given that we have
the asserts above.  I also think the assert should move into
xfs_attr_rmtval_stale and be split into two asserts, one each for the
invalid values.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] xfs: fix memory corruption during remote attr value buffer invalidation
  2020-01-09 18:44 ` [PATCH 2/5] xfs: fix memory corruption during " Darrick J. Wong
@ 2020-01-10 11:57   ` Christoph Hellwig
  2020-01-14  0:59     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2020-01-10 11:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

> While running generic/103, I observed what looks like memory corruption
> and (with slub debugging turned on) a slub redzone warning on i386 when
> inactivating an inode with a 64k remote attr value.
> 
> On a v5 filesystem, maximally sized remote attr values require one block
> more than 64k worth of space to hold both the remote attribute value
> header (64 bytes).  On a 4k block filesystem this results in a 68k
> buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
> that even though we'll never use more than 65,600 bytes of this buffer,
> XFS_MAX_BLOCKSIZE is 64k.
> 
> This is a problem because the definition of struct xfs_buf_log_format
> allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
> invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
> the dirty map, writing right off the end of the log item and corrupting
> memory.  We've gotten away with this on x86_64 for years because the
> compiler inserts a u32 padding on the end of struct xfs_buf_log_format.
> 
> Fortunately for us, remote attribute values are written to disk with
> xfs_bwrite(), which is to say that they are not logged.  Fix the problem
> by removing all places where we could end up creating a buffer log item
> for a remote attribute value and leave a note explaining why.

I think this changelog needs an explanation why using
xfs_attr_rmtval_stale which just trylock and checks if the buffers are
in core only in xfs_attr3_leaf_freextent is fine.  And while the incore
part looks sane to me, I think the trylock is wrong and we need to pass
the locking flag to xfs_attr_rmtval_stale.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] xfs: clean up xfs_buf_item_get_format return value
  2020-01-09 18:45 ` [PATCH 3/5] xfs: clean up xfs_buf_item_get_format return value Darrick J. Wong
@ 2020-01-10 11:58   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2020-01-10 11:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jan 09, 2020 at 10:45:00AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> The only thing that can cause a nonzero return from
> xfs_buf_item_get_format is if the kmem_alloc fails, which it can't.
> Get rid of all the unnecessary error handling.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] xfs: complain if anyone tries to create a too-large buffer log item
  2020-01-09 18:45 ` [PATCH 4/5] xfs: complain if anyone tries to create a too-large buffer log item Darrick J. Wong
@ 2020-01-10 11:58   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2020-01-10 11:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jan 09, 2020 at 10:45:08AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Complain if someone calls xfs_buf_item_init on a buffer that is larger
> than the dirty bitmap can handle, or tries to log a region that's past
> the end of the dirty bitmap.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] xfs: make struct xfs_buf_log_format have a consistent size
  2020-01-09 18:45 ` [PATCH 5/5] xfs: make struct xfs_buf_log_format have a consistent size Darrick J. Wong
@ 2020-01-10 11:59   ` Christoph Hellwig
  2020-01-10 16:53     ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2020-01-10 11:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Jan 09, 2020 at 10:45:14AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Increase XFS_BLF_DATAMAP_SIZE by 1 to fill in the implied padding at the
> end of struct xfs_buf_log_format.  This makes the size consistent so
> that we can check it in xfs_ondisk.h, and will be needed once we start
> logging attribute values.

Isn't this an incompatible change in the on-disk format for 32-bit
systems?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] xfs: make struct xfs_buf_log_format have a consistent size
  2020-01-10 11:59   ` Christoph Hellwig
@ 2020-01-10 16:53     ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-10 16:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, Jan 10, 2020 at 03:59:38AM -0800, Christoph Hellwig wrote:
> On Thu, Jan 09, 2020 at 10:45:14AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Increase XFS_BLF_DATAMAP_SIZE by 1 to fill in the implied padding at the
> > end of struct xfs_buf_log_format.  This makes the size consistent so
> > that we can check it in xfs_ondisk.h, and will be needed once we start
> > logging attribute values.
> 
> Isn't this an incompatible change in the on-disk format for 32-bit
> systems?

AFAICT it isn't, because log recovery reads the on-disk log op head to
figure out the length of the log iovec, allocates that much memory for
the log buf, reads in the contents, and then casts i_addr to
xfs_buf_log_format.

On the log writing side, the logging code takes the buffer item and
writes out whatever's in the incore structure, assuming that the incore
structure is actually large enough to cover data_map[map_size - 1].

(Someone else might want to check that to make sure I haven't gone mad
from looking at the log code... :P)

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] xfs: refactor remote attr value buffer invalidation
  2020-01-10 11:55   ` Christoph Hellwig
@ 2020-01-14  0:43     ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-14  0:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, Jan 10, 2020 at 03:55:40AM -0800, Christoph Hellwig wrote:
> > +	struct xfs_mount	*mp = ip->i_mount;
> > +	struct xfs_buf		*bp;
> > +	xfs_daddr_t		dblkno;
> > +	int			dblkcnt;
> > +
> > +	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > +
> > +	dblkno = XFS_FSB_TO_DADDR(mp, map->br_startblock),
> > +	dblkcnt = XFS_FSB_TO_BB(mp, map->br_blockcount);
> > +
> > +	/*
> > +	 * If the "remote" value is in the cache, remove it.
> > +	 */
> > +	bp = xfs_buf_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
> 
> Do we really need the dblkno and dblkcnt local variables here?

Eh, not really.

> > @@ -592,18 +614,8 @@ xfs_attr_rmtval_remove(
> >  		ASSERT((map.br_startblock != DELAYSTARTBLOCK) &&
> >  		       (map.br_startblock != HOLESTARTBLOCK));
> >  
> > -		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
> > -		dblkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
> > -
> > -		/*
> > -		 * If the "remote" value is in the cache, remove it.
> > -		 */
> > -		bp = xfs_buf_incore(mp->m_ddev_targp, dblkno, dblkcnt, XBF_TRYLOCK);
> > -		if (bp) {
> > -			xfs_buf_stale(bp);
> > -			xfs_buf_relse(bp);
> > -			bp = NULL;
> > -		}
> > +		if (map.br_startblock != HOLESTARTBLOCK)
> > +			xfs_attr_rmtval_stale(args->dp, &map);
> 
> I don't think we need the HOLESTARTBLOCK check here, given that we have
> the asserts above.  I also think the assert should move into
> xfs_attr_rmtval_stale and be split into two asserts, one each for the
> invalid values.

<nod> I'll upgrade them to proper fs corruption messages while I'm at
it.

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] xfs: fix memory corruption during remote attr value buffer invalidation
  2020-01-10 11:57   ` Christoph Hellwig
@ 2020-01-14  0:59     ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2020-01-14  0:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, Jan 10, 2020 at 03:57:42AM -0800, Christoph Hellwig wrote:
> > While running generic/103, I observed what looks like memory corruption
> > and (with slub debugging turned on) a slub redzone warning on i386 when
> > inactivating an inode with a 64k remote attr value.
> > 
> > On a v5 filesystem, maximally sized remote attr values require one block
> > more than 64k worth of space to hold both the remote attribute value
> > header (64 bytes).  On a 4k block filesystem this results in a 68k
> > buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
> > that even though we'll never use more than 65,600 bytes of this buffer,
> > XFS_MAX_BLOCKSIZE is 64k.
> > 
> > This is a problem because the definition of struct xfs_buf_log_format
> > allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
> > invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
> > the dirty map, writing right off the end of the log item and corrupting
> > memory.  We've gotten away with this on x86_64 for years because the
> > compiler inserts a u32 padding on the end of struct xfs_buf_log_format.
> > 
> > Fortunately for us, remote attribute values are written to disk with
> > xfs_bwrite(), which is to say that they are not logged.  Fix the problem
> > by removing all places where we could end up creating a buffer log item
> > for a remote attribute value and leave a note explaining why.
> 
> I think this changelog needs an explanation why using
> xfs_attr_rmtval_stale which just trylock and checks if the buffers are
> in core only in xfs_attr3_leaf_freextent is fine.

Hmm, maybe the *function* needs an explanation for why it's valid to use
rmtval_stale here:

/*
 * Invalidate any incore buffers associated with this remote attribute
 * value extent.   We never log remote attribute value buffers, which
 * means that they won't be attached to a transaction and are therefore
 * safe to mark stale.  The actual bunmapi will be taken care of later.
 */
STATIC int
xfs_attr3_rmt_inactive(
	struct xfs_inode	*dp,
	struct xfs_attr_inactive_list *lp)

I say the function is also confusingly named and could have its
arguments list trimmed.

> And while the incore
> part looks sane to me, I think the trylock is wrong and we need to pass
> the locking flag to xfs_attr_rmtval_stale.

Hmm, yeah, it's definitely wrong for the inactivation case -- the
inactivation thread holds the only incore reference to this unlinked
inode, so the buffer had better not be locked by another thread.

I'm not even all that sure why XBF_TRYLOCK is necessary in the remove
case, since we hold ILOCK_EXCL and there shouldn't be anyone else
messing around inside the xattr data.

--D

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-01-14  0:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-09 18:44 [PATCH v2 0/5] xfs: fix buf log item memory corruption on non-amd64 Darrick J. Wong
2020-01-09 18:44 ` [PATCH 1/5] xfs: refactor remote attr value buffer invalidation Darrick J. Wong
2020-01-10 11:55   ` Christoph Hellwig
2020-01-14  0:43     ` Darrick J. Wong
2020-01-09 18:44 ` [PATCH 2/5] xfs: fix memory corruption during " Darrick J. Wong
2020-01-10 11:57   ` Christoph Hellwig
2020-01-14  0:59     ` Darrick J. Wong
2020-01-09 18:45 ` [PATCH 3/5] xfs: clean up xfs_buf_item_get_format return value Darrick J. Wong
2020-01-10 11:58   ` Christoph Hellwig
2020-01-09 18:45 ` [PATCH 4/5] xfs: complain if anyone tries to create a too-large buffer log item Darrick J. Wong
2020-01-10 11:58   ` Christoph Hellwig
2020-01-09 18:45 ` [PATCH 5/5] xfs: make struct xfs_buf_log_format have a consistent size Darrick J. Wong
2020-01-10 11:59   ` Christoph Hellwig
2020-01-10 16:53     ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).