linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] xfs: prevent spoofing of rtbitmap blocks when recovering buffers
@ 2021-07-27 19:15 Darrick J. Wong
  2021-07-27 21:46 ` Dave Chinner
  0 siblings, 1 reply; 2+ messages in thread
From: Darrick J. Wong @ 2021-07-27 19:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While reviewing the buffer item recovery code, the thought occurred to
me: in V5 filesystems we use log sequence number (LSN) tracking to avoid
replaying older metadata updates against newer log items.  However, we
use the magic number of the ondisk buffer to find the LSN of the ondisk
metadata, which means that if an attacker can control the layout of the
realtime device precisely enough that the start of an rt bitmap block
matches the magic and UUID of some other kind of block, they can control
the purported LSN of that spoofed block and thereby break log replay.

Since realtime bitmap and summary blocks don't have headers at all, we
have no way to tell if a block really should be replayed.  The best we
can do is replay unconditionally and hope for the best.

XXX: Won't this leave us with a corrupt rtbitmap if recovery also fails?
In other words, the usual problems that happen when you /don't/ track
buffer age with LSNs?  I've noticed that the recoveryloop tests get hung
up on incorrect frextents after a few iterations, but have not had time
to figure out if the rtbitmap recovery is wrong, or if there's something
broken with the old-style summary updates for rt counters.

XXXX: Maybe someone should fix the ondisk format to track the (magic,
blkno, lsn, uuid) like we do everything else in V5?  That's gonna suck
for 64-bit divisions...

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_buf_item_recover.c |   32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 05fd816edf59..a776bcfdf0c1 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -698,19 +698,29 @@ xlog_recover_do_inode_buffer(
 static xfs_lsn_t
 xlog_recover_get_buf_lsn(
 	struct xfs_mount	*mp,
-	struct xfs_buf		*bp)
+	struct xfs_buf		*bp,
+	struct xfs_buf_log_format *buf_f)
 {
 	uint32_t		magic32;
 	uint16_t		magic16;
 	uint16_t		magicda;
 	void			*blk = bp->b_addr;
 	uuid_t			*uuid;
-	xfs_lsn_t		lsn = -1;
+	uint16_t		blft;
+	xfs_lsn_t		lsn = NULLCOMMITLSN;
 
 	/* v4 filesystems always recover immediately */
 	if (!xfs_sb_version_hascrc(&mp->m_sb))
 		goto recover_immediately;
 
+	/*
+	 * realtime bitmap and summary file blocks do not have magic numbers or
+	 * UUIDs, so we must recover them immediately.
+	 */
+	blft = xfs_blft_from_flags(buf_f);
+	if (blft == XFS_BLFT_RTBITMAP_BUF || blft == XFS_BLFT_RTSUMMARY_BUF)
+		goto recover_immediately;
+
 	magic32 = be32_to_cpu(*(__be32 *)blk);
 	switch (magic32) {
 	case XFS_ABTB_CRC_MAGIC:
@@ -786,7 +796,13 @@ xlog_recover_get_buf_lsn(
 		break;
 	}
 
-	if (lsn != (xfs_lsn_t)-1) {
+	/*
+	 * ondisk buffers should never have a zero LSN, so recover those
+	 * buffers immediately.
+	 */
+	if (!lsn)
+		lsn = NULLCOMMITLSN;
+	if (lsn != NULLCOMMITLSN) {
 		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
@@ -805,7 +821,9 @@ xlog_recover_get_buf_lsn(
 		break;
 	}
 
-	if (lsn != (xfs_lsn_t)-1) {
+	if (!lsn)
+		lsn = NULLCOMMITLSN;
+	if (lsn != NULLCOMMITLSN) {
 		if (!uuid_equal(&mp->m_sb.sb_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
@@ -834,7 +852,7 @@ xlog_recover_get_buf_lsn(
 	/* unknown buffer contents, recover immediately */
 
 recover_immediately:
-	return (xfs_lsn_t)-1;
+	return NULLCOMMITLSN;
 
 }
 
@@ -920,8 +938,8 @@ xlog_recover_buf_commit_pass2(
 	 * the verifier will be reset to match whatever recover turns that
 	 * buffer into.
 	 */
-	lsn = xlog_recover_get_buf_lsn(mp, bp);
-	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
+	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
+	if (lsn != NULLCOMMITLSN && XFS_LSN_CMP(lsn, current_lsn) > 0) {
 		trace_xfs_log_recover_buf_skip(log, buf_f);
 		xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
 		goto out_release;

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH] xfs: prevent spoofing of rtbitmap blocks when recovering buffers
  2021-07-27 19:15 [RFC PATCH] xfs: prevent spoofing of rtbitmap blocks when recovering buffers Darrick J. Wong
@ 2021-07-27 21:46 ` Dave Chinner
  0 siblings, 0 replies; 2+ messages in thread
From: Dave Chinner @ 2021-07-27 21:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Jul 27, 2021 at 12:15:02PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While reviewing the buffer item recovery code, the thought occurred to
> me: in V5 filesystems we use log sequence number (LSN) tracking to avoid
> replaying older metadata updates against newer log items.  However, we
> use the magic number of the ondisk buffer to find the LSN of the ondisk
> metadata, which means that if an attacker can control the layout of the
> realtime device precisely enough that the start of an rt bitmap block
> matches the magic and UUID of some other kind of block, they can control
> the purported LSN of that spoofed block and thereby break log replay.

That'd take some effort, especially to crash the system at the point
where it is active in the log...

> Since realtime bitmap and summary blocks don't have headers at all, we
> have no way to tell if a block really should be replayed.  The best we
> can do is replay unconditionally and hope for the best.
> 
> XXX: Won't this leave us with a corrupt rtbitmap if recovery also fails?
> In other words, the usual problems that happen when you /don't/ track
> buffer age with LSNs?  I've noticed that the recoveryloop tests get hung
> up on incorrect frextents after a few iterations, but have not had time
> to figure out if the rtbitmap recovery is wrong, or if there's something
> broken with the old-style summary updates for rt counters.

If we don't track the age of the buffers then replay will always
replay over the top of newer metadata. In general, this isn't a huge
problem for buffer replay as long as recovery always completes
fully.  If recovery always completes fully, the result should be the
same or newer metadata on disk than was previously on disk, even if
we started from older metadata in the journal. Speaking generally,
the LSN ordering really only makes recovery a bit more efficient by
allowing it to elide writeback of modifications that are already on
disk.

That said there some complex interactions between different types of
objects and the recovery of them that LSN ordering addresses. Such
as inodes being logged by buffer at allocation and unlink, but by
log_dinodes for all other modifications. replacing the flush
iteration in inodes because they can be logged and written back at
the same time. And there were some rare, subtle recovery corruptions
around these interactions that were solved by ensuring LSN ordering
was enforced.

The LSN ordering allows us to do fault analysis across log
recovery when we have inconsistencies between related objects such
as inodes, BMBT blocks and free space. I've been using this latter
relationship between log recovery and metadata writeback a *lot*
over the past couple of weeks.

So, really, the LSN order tracking largely doesn't matter when
everything is working as it should, and so always recovering a
RT bitmap buffer should not actually introduce any corruptions
except when log recoveyr fails. In which case, we already have an
inconsistent filesystem and need to reapir it...

> XXXX: Maybe someone should fix the ondisk format to track the (magic,
> blkno, lsn, uuid) like we do everything else in V5?  That's gonna suck
> for 64-bit divisions...

I largely avoided changing the rt bitmap format to add headers and
CRCs largely because it requires expensive rework of the extent to
bitmap indexing mechanism. I'm still not sure if we gain anything
from adding it now.

> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_buf_item_recover.c |   32 +++++++++++++++++++++++++-------
>  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 05fd816edf59..a776bcfdf0c1 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -698,19 +698,29 @@ xlog_recover_do_inode_buffer(
>  static xfs_lsn_t
>  xlog_recover_get_buf_lsn(
>  	struct xfs_mount	*mp,
> -	struct xfs_buf		*bp)
> +	struct xfs_buf		*bp,
> +	struct xfs_buf_log_format *buf_f)
>  {
>  	uint32_t		magic32;
>  	uint16_t		magic16;
>  	uint16_t		magicda;
>  	void			*blk = bp->b_addr;
>  	uuid_t			*uuid;
> -	xfs_lsn_t		lsn = -1;
> +	uint16_t		blft;
> +	xfs_lsn_t		lsn = NULLCOMMITLSN;

I'd drop the -1 to NULLCOMMITLSN from the patch - it's not really
part of the bug fix...

Otherwise I think the change looks OK.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-07-27 21:46 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27 19:15 [RFC PATCH] xfs: prevent spoofing of rtbitmap blocks when recovering buffers Darrick J. Wong
2021-07-27 21:46 ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).