All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] xfs: detect corrupt inobt records better
@ 2018-04-17  6:39 Dave Chinner
  2018-04-17  6:39 ` [PATCH 1/2] xfs: validate cached inodes are free when allocated Dave Chinner
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Dave Chinner @ 2018-04-17  6:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: viro

Hi Folks,

Despite what the subject line says, this isn't a patch set that
looks at inobt records. What it does is catch bad allocations that
occur due to corrupt inobt records that cannot be detected until we
access the inode that has been selected for allocation.

The first patch fixes a dentry cache corruption vector, where we
overwrite an in-use inode via allocation. After scratching my head
about the whacky pathwalk oops, KASAN use-after-free traces and
deadlocks trying to lock inodes in xfs_iget(), Al Viro confirmed
that it smelt like an inode overwrite problem, so that's where I
looked next.

I just recently fixed a very similar inobt record corruption problem
that triggered inode list corruption. That one was on the "read from
disk" side of xfs_iget() - see commit ee457001ed6c ("xfs: catch
inode allocation state mismatch corruption") for details. The dentry
cache corruption was the same problem, just on the "cached in
memory" side of xfs_iget().  The same fix applies here and so I
factored those checks and applied them to both the cache hit and
miss cases.

To avoid the allocation deadlock problems, we have to ensure that
the directory we are allocating in isn't the inode that is selected
for allocation.  We already hold the directory inode locked, so the
attempt to lock it again in xfs_iget() can trigger a deadlock. Hence
we have to detect this before we try to retreive the inode from the
cache. This is the second patch, and it also catches attempts to
allocate special inodes we know are allocated (root inode, rt
inodes, quota inodes) just because it's easy and obviously wrong
to be allocating a known in-use inode.

-----

That's it for the patches, but there's a deeper problem here,
especially on v4 filesystems: we have no reliable way of detecting
inobt record corruption at runtime.

What I've put in place is basic cross checks that catch the result
of a bad allocation, but we still don't really know if the inobt
record was totally ok or not even if the inode on disk is free. It
just prevents us reallocating an inode that is already in use.

e.g. the inobt record could contain a stale inode chunk that has
been stamped with empty inode templates but is now considered free
space due to a crash and log recovery problem. We know the v4 format
has unfixable log recovery issues around inode chunk allocation
(fixed in the v5 format), so there's multiple layers of problems we
could trip over here.  i.e. our cross-check object can also be
corrupt or stale.

IOWs, once we stop trusting that internal filesystem indexes are
correct, we're in a world of pain because we cannot tell what is
valid and what has been corrupted or manipulated. Malicious damage
is a threat model most filesystem structures were never designed to
detect, let alone be robust against, and the v4 format has no
protections at all.

The problem is much more nuanced for v5 filesystems. For v5, we
don't read newly allocated inodes from disk - we implicitly trust
that the inobt record is valid. This is a fair assumption because
CRC and other on-disk validation structures reduce the chance of
undetected corruption substantially compared to v4 filesystems.
However, it means we can't detect inobt corruptions when they do
happen because we have no cross reference that can catch otherwise
undetected corruptions.

That said, with current mkfs defaults - v5 + finobt - we have
redundant metadata that will catch a inobt or finobt corruption
during allocation. Hence we're pretty safe from a single corruption
event going undetected, and the likelyhood that a concurrent
multiple sector corruption corrupts both the finobt and the inobt
records in exactly the same way is highly improbable. Hence v5 +
finobt is quite robust against inobt record corruption without
needing to read and trust the inode on disk to detect corruption.

However, v5 is still not safe against is malicious corruption, where
a bad actor intentionally modifies the on-disk structures to mark
inodes free in both the inobt and the finobt and then recalculates
the CRCs and other metadata. We could check the rmapbt if it's
configured, but an attacker can also manipulate that structure to
say that region does contain inodes. They can also manipulate the
free space btrees to say it's used space and so once we've chased
everything we can cross-check down we still can't reliably detect
malicious corruption attacks on the v5 filesystem structure.

IOWs, even with all the extra on-disk verification the v5 format
has, the only thing we can do to protect against propagation of
malicious corruption is the same thing as for v4: check the inode is
free on disk before allowing the allocation to proceed.

I have not done this, because I think malicious corruption is not
something we can defend against. Hence it makes no sense to add
checks that reduce performance but don't provide any extra
benefit to device-based corruption detection or propagation
prevention. IOWs, I don't think we should try to mitigate
malicious corruption attack scenarios.

I think we need to keep improving our bounds checking and structure
corruption detection, but we should not worry about things that take
multiple, highly improbably structure corruptions that are hihgly
expensive to detect and/or mitigate. i.e.  unprivileged mounts of
untrusted XFS filesystem images should never, ever be allowed.

What does everyone else think about this? Darrick, I'm sure you've
got some thoughts on this :)

Cheers,

Dave.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] xfs: validate cached inodes are free when allocated
  2018-04-17  6:39 [PATCH 0/3] xfs: detect corrupt inobt records better Dave Chinner
@ 2018-04-17  6:39 ` Dave Chinner
  2018-04-17  7:11   ` Christoph Hellwig
                     ` (2 more replies)
  2018-04-17  6:39 ` [PATCH 2/2] xfs: validate allocated inode number Dave Chinner
  2018-04-17  9:13 ` [PATCH 0/3] xfs: detect corrupt inobt records better Carlos Maiolino
  2 siblings, 3 replies; 13+ messages in thread
From: Dave Chinner @ 2018-04-17  6:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: viro

From: Dave Chinner <dchinner@redhat.com>

A recent fuzzed filesystem image cached random dcache corruption
when the reproducer was run. This often showed up as panics in
lookup_slow() on a null inode->i_ops pointer when doing pathwalks.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
....
Call Trace:
 lookup_slow+0x44/0x60
 walk_component+0x3dd/0x9f0
 link_path_walk+0x4a7/0x830
 path_lookupat+0xc1/0x470
 filename_lookup+0x129/0x270
 user_path_at_empty+0x36/0x40
 path_listxattr+0x98/0x110
 SyS_listxattr+0x13/0x20
 do_syscall_64+0xf5/0x280
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

but had many different failure modes including deadlocks trying to
lock the inode that was just allocated or KASAN reports of
use-after-free violations.

The cause of the problem was a corrupt INOBT on a v4 fs where the
root inode was marked as free in the inobt record. Hence when we
allocated an inode, it chose the root inode to allocate, found it in
the cache and re-initialised it.

We recently fixed a similar inode allocation issue caused by inobt
record corruption problem in xfs_iget_cache_miss() in commit
ee457001ed6c ("xfs: catch inode allocation state mismatch
corruption"). This change adds similar checks to the cache-hit path
to catch it, and turns the reproducer into a corruption shutdown
situation.

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 73 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 48 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 98b7a4ae15e4..fb37ada55710 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -309,6 +309,46 @@ xfs_reinit_inode(
 	return error;
 }
 
+/*
+ * If we are allocating a new inode, then check what was returned is
+ * actually a free, empty inode. If we are not allocating an inode,
+ * the check we didn't find a free inode.
+ *
+ * Returns:
+ *	0		if the inode free state matches the lookup context
+ *	-ENOENT		if the inode is free and we are not allocating
+ *	-EFSCORRUPTED	if there is any state mismatch at all
+ */
+static int
+xfs_iget_check_free_state(
+	struct xfs_inode	*ip,
+	int			flags)
+{
+	if (flags & XFS_IGET_CREATE) {
+		/* should be a free inode */
+		if (VFS_I(ip)->i_mode != 0) {
+			xfs_warn(ip->i_mount,
+"Corruption detected! Free inode 0x%llx not marked free! (mode 0x%x)",
+				ip->i_ino, VFS_I(ip)->i_mode);
+			return -EFSCORRUPTED;
+		}
+
+		if (ip->i_d.di_nblocks != 0) {
+			xfs_warn(ip->i_mount,
+"Corruption detected! Free inode 0x%llx has blocks allocated!",
+				ip->i_ino);
+			return -EFSCORRUPTED;
+		}
+		return 0;
+	}
+
+	/* should be an allocated inode */
+	if (VFS_I(ip)->i_mode == 0)
+		return -ENOENT;
+
+	return 0;
+}
+
 /*
  * Check the validity of the inode we just found it the cache
  */
@@ -358,12 +398,12 @@ xfs_iget_cache_hit(
 	}
 
 	/*
-	 * If lookup is racing with unlink return an error immediately.
+	 * Check the inode free state is valid. This also detects lookup
+	 * racing with unlinks.
 	 */
-	if (VFS_I(ip)->i_mode == 0 && !(flags & XFS_IGET_CREATE)) {
-		error = -ENOENT;
+	error = xfs_iget_check_free_state(ip, flags);
+	if (error)
 		goto out_error;
-	}
 
 	/*
 	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
@@ -486,29 +526,12 @@ xfs_iget_cache_miss(
 
 
 	/*
-	 * If we are allocating a new inode, then check what was returned is
-	 * actually a free, empty inode. If we are not allocating an inode,
-	 * the check we didn't find a free inode.
+	 * Check the inode free state is valid. This also detects lookup
+	 * racing with unlinks.
 	 */
-	if (flags & XFS_IGET_CREATE) {
-		if (VFS_I(ip)->i_mode != 0) {
-			xfs_warn(mp,
-"Corruption detected! Free inode 0x%llx not marked free on disk",
-				ino);
-			error = -EFSCORRUPTED;
-			goto out_destroy;
-		}
-		if (ip->i_d.di_nblocks != 0) {
-			xfs_warn(mp,
-"Corruption detected! Free inode 0x%llx has blocks allocated!",
-				ino);
-			error = -EFSCORRUPTED;
-			goto out_destroy;
-		}
-	} else if (VFS_I(ip)->i_mode == 0) {
-		error = -ENOENT;
+	error = xfs_iget_check_free_state(ip, flags);
+	if (error)
 		goto out_destroy;
-	}
 
 	/*
 	 * Preload the radix tree so we can insert safely under the
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] xfs: validate allocated inode number
  2018-04-17  6:39 [PATCH 0/3] xfs: detect corrupt inobt records better Dave Chinner
  2018-04-17  6:39 ` [PATCH 1/2] xfs: validate cached inodes are free when allocated Dave Chinner
@ 2018-04-17  6:39 ` Dave Chinner
  2018-04-17  7:12   ` Christoph Hellwig
                     ` (2 more replies)
  2018-04-17  9:13 ` [PATCH 0/3] xfs: detect corrupt inobt records better Carlos Maiolino
  2 siblings, 3 replies; 13+ messages in thread
From: Dave Chinner @ 2018-04-17  6:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: viro

From: Dave Chinner <dchinner@redhat.com>

When we have corrupted free inode btrees, we can attempt to
allocate inodes that we know are already allocated. Catch allocation
of these inodes and report corruption as early as possible to
prevent corruption propagation or deadlocks.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 7b764f746ff2..12e6c1aec386 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -790,6 +790,19 @@ xfs_ialloc(
 	}
 	ASSERT(*ialloc_context == NULL);
 
+	/*
+	 * Protect against obviously corrupt allocation btree records. Later
+	 * xfs_iget checks will catch re-allocation of other active in-memory
+	 * and on-disk inodes. If we don't catch reallocating the parent inode
+	 * here we will deadlock in xfs_iget() so we have to do these checks
+	 * first.
+	 */
+	if ((pip && ino == pip->i_ino) ||
+	    !xfs_verify_dir_ino(mp, ino)) {
+		xfs_alert(mp, "Allocated a known in-use inode 0x%llx!", ino);
+		return -EFSCORRUPTED;
+	}
+
 	/*
 	 * Get the in-core inode with the lock held exclusively.
 	 * This is because we're setting fields here we need
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] xfs: validate cached inodes are free when allocated
  2018-04-17  6:39 ` [PATCH 1/2] xfs: validate cached inodes are free when allocated Dave Chinner
@ 2018-04-17  7:11   ` Christoph Hellwig
  2018-04-17  9:00   ` Carlos Maiolino
  2018-04-18  0:10   ` Darrick J. Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2018-04-17  7:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, viro

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] xfs: validate allocated inode number
  2018-04-17  6:39 ` [PATCH 2/2] xfs: validate allocated inode number Dave Chinner
@ 2018-04-17  7:12   ` Christoph Hellwig
  2018-04-17  9:05   ` Carlos Maiolino
  2018-04-18  0:12   ` Darrick J. Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2018-04-17  7:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, viro

On Tue, Apr 17, 2018 at 04:39:16PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we have corrupted free inode btrees, we can attempt to
> allocate inodes that we know are already allocated. Catch allocation
> of these inodes and report corruption as early as possible to
> prevent corruption propagation or deadlocks.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_inode.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 7b764f746ff2..12e6c1aec386 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -790,6 +790,19 @@ xfs_ialloc(
>  	}
>  	ASSERT(*ialloc_context == NULL);
>  
> +	/*
> +	 * Protect against obviously corrupt allocation btree records. Later
> +	 * xfs_iget checks will catch re-allocation of other active in-memory
> +	 * and on-disk inodes. If we don't catch reallocating the parent inode
> +	 * here we will deadlock in xfs_iget() so we have to do these checks
> +	 * first.
> +	 */
> +	if ((pip && ino == pip->i_ino) ||
> +	    !xfs_verify_dir_ino(mp, ino)) {

Seems like the whole conditional would fit onto a single line.

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] xfs: validate cached inodes are free when allocated
  2018-04-17  6:39 ` [PATCH 1/2] xfs: validate cached inodes are free when allocated Dave Chinner
  2018-04-17  7:11   ` Christoph Hellwig
@ 2018-04-17  9:00   ` Carlos Maiolino
  2018-04-17 23:57     ` Dave Chinner
  2018-04-18  0:10   ` Darrick J. Wong
  2 siblings, 1 reply; 13+ messages in thread
From: Carlos Maiolino @ 2018-04-17  9:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

>  fs/xfs/xfs_icache.c | 73 +++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 48 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 98b7a4ae15e4..fb37ada55710 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -309,6 +309,46 @@ xfs_reinit_inode(
>  	return error;
>  }
>  
> +/*
> + * If we are allocating a new inode, then check what was returned is
> + * actually a free, empty inode. If we are not allocating an inode,
> + * the check we didn't find a free inode.

I know it's been like that for a while, but maybe it's a good time to fix this
typo? "then check we didn't find..." ?

Fixing the old typo or not, the patch looks good:

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

-- 
Carlos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] xfs: validate allocated inode number
  2018-04-17  6:39 ` [PATCH 2/2] xfs: validate allocated inode number Dave Chinner
  2018-04-17  7:12   ` Christoph Hellwig
@ 2018-04-17  9:05   ` Carlos Maiolino
  2018-04-18  0:12   ` Darrick J. Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Carlos Maiolino @ 2018-04-17  9:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, viro

On Tue, Apr 17, 2018 at 04:39:16PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we have corrupted free inode btrees, we can attempt to
> allocate inodes that we know are already allocated. Catch allocation
> of these inodes and report corruption as early as possible to
> prevent corruption propagation or deadlocks.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---

Looks good

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

>  fs/xfs/xfs_inode.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 7b764f746ff2..12e6c1aec386 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -790,6 +790,19 @@ xfs_ialloc(
>  	}
>  	ASSERT(*ialloc_context == NULL);
>  
> +	/*
> +	 * Protect against obviously corrupt allocation btree records. Later
> +	 * xfs_iget checks will catch re-allocation of other active in-memory
> +	 * and on-disk inodes. If we don't catch reallocating the parent inode
> +	 * here we will deadlock in xfs_iget() so we have to do these checks
> +	 * first.
> +	 */
> +	if ((pip && ino == pip->i_ino) ||
> +	    !xfs_verify_dir_ino(mp, ino)) {
> +		xfs_alert(mp, "Allocated a known in-use inode 0x%llx!", ino);
> +		return -EFSCORRUPTED;
> +	}
> +
>  	/*
>  	 * Get the in-core inode with the lock held exclusively.
>  	 * This is because we're setting fields here we need
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] xfs: detect corrupt inobt records better
  2018-04-17  6:39 [PATCH 0/3] xfs: detect corrupt inobt records better Dave Chinner
  2018-04-17  6:39 ` [PATCH 1/2] xfs: validate cached inodes are free when allocated Dave Chinner
  2018-04-17  6:39 ` [PATCH 2/2] xfs: validate allocated inode number Dave Chinner
@ 2018-04-17  9:13 ` Carlos Maiolino
  2018-04-17 22:28   ` Dave Chinner
  2 siblings, 1 reply; 13+ messages in thread
From: Carlos Maiolino @ 2018-04-17  9:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

> However, v5 is still not safe against is malicious corruption, where
> a bad actor intentionally modifies the on-disk structures to mark
> inodes free in both the inobt and the finobt and then recalculates
> the CRCs and other metadata. We could check the rmapbt if it's
> configured, but an attacker can also manipulate that structure to
> say that region does contain inodes. They can also manipulate the
> free space btrees to say it's used space and so once we've chased
> everything we can cross-check down we still can't reliably detect
> malicious corruption attacks on the v5 filesystem structure.
> 
> IOWs, even with all the extra on-disk verification the v5 format
> has, the only thing we can do to protect against propagation of
> malicious corruption is the same thing as for v4: check the inode is
> free on disk before allowing the allocation to proceed.
> 
> I have not done this, because I think malicious corruption is not
> something we can defend against. Hence it makes no sense to add
> checks that reduce performance but don't provide any extra
> benefit to device-based corruption detection or propagation
> prevention. IOWs, I don't think we should try to mitigate
> malicious corruption attack scenarios.
> 
> I think we need to keep improving our bounds checking and structure
> corruption detection, but we should not worry about things that take
> multiple, highly improbably structure corruptions that are hihgly
> expensive to detect and/or mitigate. i.e.  unprivileged mounts of
> untrusted XFS filesystem images should never, ever be allowed.
> 

I agree with it, recently, I had a discussion with Eric about something similar,
and well, this sort of attack requires some access to the filesystem image
itself, or the device where it lays on. And well, IMHO, in a system, if
somebody is ever allowed to do such modifications in the filesystem, the
security problem is way above the filesystem itself, either by allowing
unprivileged access to the FS image, or superuser being compromised.
Either way, I do not believe we will ever be able to 100% protect the superuser
against himself, or bad administrative policies even so if such protection would
be a performance trade-off.

Well, just my $0.02 :)


-- 
Carlos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] xfs: detect corrupt inobt records better
  2018-04-17  9:13 ` [PATCH 0/3] xfs: detect corrupt inobt records better Carlos Maiolino
@ 2018-04-17 22:28   ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2018-04-17 22:28 UTC (permalink / raw)
  To: linux-xfs

On Tue, Apr 17, 2018 at 11:13:41AM +0200, Carlos Maiolino wrote:
> > However, v5 is still not safe against is malicious corruption, where
> > a bad actor intentionally modifies the on-disk structures to mark
> > inodes free in both the inobt and the finobt and then recalculates
> > the CRCs and other metadata. We could check the rmapbt if it's
> > configured, but an attacker can also manipulate that structure to
> > say that region does contain inodes. They can also manipulate the
> > free space btrees to say it's used space and so once we've chased
> > everything we can cross-check down we still can't reliably detect
> > malicious corruption attacks on the v5 filesystem structure.
> > 
> > IOWs, even with all the extra on-disk verification the v5 format
> > has, the only thing we can do to protect against propagation of
> > malicious corruption is the same thing as for v4: check the inode is
> > free on disk before allowing the allocation to proceed.
> > 
> > I have not done this, because I think malicious corruption is not
> > something we can defend against. Hence it makes no sense to add
> > checks that reduce performance but don't provide any extra
> > benefit to device-based corruption detection or propagation
> > prevention. IOWs, I don't think we should try to mitigate
> > malicious corruption attack scenarios.
> > 
> > I think we need to keep improving our bounds checking and structure
> > corruption detection, but we should not worry about things that take
> > multiple, highly improbably structure corruptions that are hihgly
> > expensive to detect and/or mitigate. i.e.  unprivileged mounts of
> > untrusted XFS filesystem images should never, ever be allowed.
> > 
> 
> I agree with it, recently, I had a discussion with Eric about something similar,
> and well, this sort of attack requires some access to the filesystem image
> itself, or the device where it lays on. And well, IMHO, in a system, if
> somebody is ever allowed to do such modifications in the filesystem, the
> security problem is way above the filesystem itself, either by allowing
> unprivileged access to the FS image, or superuser being compromised.

*nod*

> Either way, I do not believe we will ever be able to 100% protect the superuser
> against himself, or bad administrative policies even so if such protection would
> be a performance trade-off.

It's not so much protecting the superuser from themselves, it's
laying out the case for the container environment as to why untrusted
user mounts of third party filesystem images won't be allowed for
XFS. i.e. we simply cannot verify the metadata is actually safe to
interpret in kernel space by parsing it in kernel space.

This goes for ext4 as well, and probably ever other Linux filesystem
that lacks a cryptographically verifiable on-disk structure.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] xfs: validate cached inodes are free when allocated
  2018-04-17  9:00   ` Carlos Maiolino
@ 2018-04-17 23:57     ` Dave Chinner
  2018-04-18  0:05       ` Darrick J. Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2018-04-17 23:57 UTC (permalink / raw)
  To: linux-xfs

On Tue, Apr 17, 2018 at 11:00:30AM +0200, Carlos Maiolino wrote:
> >  fs/xfs/xfs_icache.c | 73 +++++++++++++++++++++++++++++++++++------------------
> >  1 file changed, 48 insertions(+), 25 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 98b7a4ae15e4..fb37ada55710 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -309,6 +309,46 @@ xfs_reinit_inode(
> >  	return error;
> >  }
> >  
> > +/*
> > + * If we are allocating a new inode, then check what was returned is
> > + * actually a free, empty inode. If we are not allocating an inode,
> > + * the check we didn't find a free inode.
> 
> I know it's been like that for a while, but maybe it's a good time to fix this
> typo? "then check we didn't find..." ?
> 
> Fixing the old typo or not, the patch looks good:
> 
> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

Darrick, do want me to resend with the typo fix, or can you do it on
the way into your tree?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] xfs: validate cached inodes are free when allocated
  2018-04-17 23:57     ` Dave Chinner
@ 2018-04-18  0:05       ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2018-04-18  0:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 18, 2018 at 09:57:31AM +1000, Dave Chinner wrote:
> On Tue, Apr 17, 2018 at 11:00:30AM +0200, Carlos Maiolino wrote:
> > >  fs/xfs/xfs_icache.c | 73 +++++++++++++++++++++++++++++++++++------------------
> > >  1 file changed, 48 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index 98b7a4ae15e4..fb37ada55710 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > > @@ -309,6 +309,46 @@ xfs_reinit_inode(
> > >  	return error;
> > >  }
> > >  
> > > +/*
> > > + * If we are allocating a new inode, then check what was returned is
> > > + * actually a free, empty inode. If we are not allocating an inode,
> > > + * the check we didn't find a free inode.
> > 
> > I know it's been like that for a while, but maybe it's a good time to fix this
> > typo? "then check we didn't find..." ?
> > 
> > Fixing the old typo or not, the patch looks good:
> > 
> > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
> 
> Darrick, do want me to resend with the typo fix, or can you do it on
> the way into your tree?

I'll do it on the way into the tree.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] xfs: validate cached inodes are free when allocated
  2018-04-17  6:39 ` [PATCH 1/2] xfs: validate cached inodes are free when allocated Dave Chinner
  2018-04-17  7:11   ` Christoph Hellwig
  2018-04-17  9:00   ` Carlos Maiolino
@ 2018-04-18  0:10   ` Darrick J. Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2018-04-18  0:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, viro

On Tue, Apr 17, 2018 at 04:39:15PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> A recent fuzzed filesystem image cached random dcache corruption
> when the reproducer was run. This often showed up as panics in
> lookup_slow() on a null inode->i_ops pointer when doing pathwalks.
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> ....
> Call Trace:
>  lookup_slow+0x44/0x60
>  walk_component+0x3dd/0x9f0
>  link_path_walk+0x4a7/0x830
>  path_lookupat+0xc1/0x470
>  filename_lookup+0x129/0x270
>  user_path_at_empty+0x36/0x40
>  path_listxattr+0x98/0x110
>  SyS_listxattr+0x13/0x20
>  do_syscall_64+0xf5/0x280
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> 
> but had many different failure modes including deadlocks trying to
> lock the inode that was just allocated or KASAN reports of
> use-after-free violations.
> 
> The cause of the problem was a corrupt INOBT on a v4 fs where the
> root inode was marked as free in the inobt record. Hence when we
> allocated an inode, it chose the root inode to allocate, found it in
> the cache and re-initialised it.
> 
> We recently fixed a similar inode allocation issue caused by inobt
> record corruption problem in xfs_iget_cache_miss() in commit
> ee457001ed6c ("xfs: catch inode allocation state mismatch
> corruption"). This change adds similar checks to the cache-hit path
> to catch it, and turns the reproducer into a corruption shutdown
> situation.
> 
> Reported-by: Wen Xu <wen.xu@gatech.edu>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

Looks ok, will test...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_icache.c | 73 +++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 48 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 98b7a4ae15e4..fb37ada55710 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -309,6 +309,46 @@ xfs_reinit_inode(
>  	return error;
>  }
>  
> +/*
> + * If we are allocating a new inode, then check what was returned is
> + * actually a free, empty inode. If we are not allocating an inode,
> + * the check we didn't find a free inode.
> + *
> + * Returns:
> + *	0		if the inode free state matches the lookup context
> + *	-ENOENT		if the inode is free and we are not allocating
> + *	-EFSCORRUPTED	if there is any state mismatch at all
> + */
> +static int
> +xfs_iget_check_free_state(
> +	struct xfs_inode	*ip,
> +	int			flags)
> +{
> +	if (flags & XFS_IGET_CREATE) {
> +		/* should be a free inode */
> +		if (VFS_I(ip)->i_mode != 0) {
> +			xfs_warn(ip->i_mount,
> +"Corruption detected! Free inode 0x%llx not marked free! (mode 0x%x)",
> +				ip->i_ino, VFS_I(ip)->i_mode);
> +			return -EFSCORRUPTED;
> +		}
> +
> +		if (ip->i_d.di_nblocks != 0) {
> +			xfs_warn(ip->i_mount,
> +"Corruption detected! Free inode 0x%llx has blocks allocated!",
> +				ip->i_ino);
> +			return -EFSCORRUPTED;
> +		}
> +		return 0;
> +	}
> +
> +	/* should be an allocated inode */
> +	if (VFS_I(ip)->i_mode == 0)
> +		return -ENOENT;
> +
> +	return 0;
> +}
> +
>  /*
>   * Check the validity of the inode we just found it the cache
>   */
> @@ -358,12 +398,12 @@ xfs_iget_cache_hit(
>  	}
>  
>  	/*
> -	 * If lookup is racing with unlink return an error immediately.
> +	 * Check the inode free state is valid. This also detects lookup
> +	 * racing with unlinks.
>  	 */
> -	if (VFS_I(ip)->i_mode == 0 && !(flags & XFS_IGET_CREATE)) {
> -		error = -ENOENT;
> +	error = xfs_iget_check_free_state(ip, flags);
> +	if (error)
>  		goto out_error;
> -	}
>  
>  	/*
>  	 * If IRECLAIMABLE is set, we've torn down the VFS inode already.
> @@ -486,29 +526,12 @@ xfs_iget_cache_miss(
>  
>  
>  	/*
> -	 * If we are allocating a new inode, then check what was returned is
> -	 * actually a free, empty inode. If we are not allocating an inode,
> -	 * the check we didn't find a free inode.
> +	 * Check the inode free state is valid. This also detects lookup
> +	 * racing with unlinks.
>  	 */
> -	if (flags & XFS_IGET_CREATE) {
> -		if (VFS_I(ip)->i_mode != 0) {
> -			xfs_warn(mp,
> -"Corruption detected! Free inode 0x%llx not marked free on disk",
> -				ino);
> -			error = -EFSCORRUPTED;
> -			goto out_destroy;
> -		}
> -		if (ip->i_d.di_nblocks != 0) {
> -			xfs_warn(mp,
> -"Corruption detected! Free inode 0x%llx has blocks allocated!",
> -				ino);
> -			error = -EFSCORRUPTED;
> -			goto out_destroy;
> -		}
> -	} else if (VFS_I(ip)->i_mode == 0) {
> -		error = -ENOENT;
> +	error = xfs_iget_check_free_state(ip, flags);
> +	if (error)
>  		goto out_destroy;
> -	}
>  
>  	/*
>  	 * Preload the radix tree so we can insert safely under the
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] xfs: validate allocated inode number
  2018-04-17  6:39 ` [PATCH 2/2] xfs: validate allocated inode number Dave Chinner
  2018-04-17  7:12   ` Christoph Hellwig
  2018-04-17  9:05   ` Carlos Maiolino
@ 2018-04-18  0:12   ` Darrick J. Wong
  2 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2018-04-18  0:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, viro

On Tue, Apr 17, 2018 at 04:39:16PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we have corrupted free inode btrees, we can attempt to
> allocate inodes that we know are already allocated. Catch allocation
> of these inodes and report corruption as early as possible to
> prevent corruption propagation or deadlocks.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

Looks ok, will test...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_inode.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 7b764f746ff2..12e6c1aec386 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -790,6 +790,19 @@ xfs_ialloc(
>  	}
>  	ASSERT(*ialloc_context == NULL);
>  
> +	/*
> +	 * Protect against obviously corrupt allocation btree records. Later
> +	 * xfs_iget checks will catch re-allocation of other active in-memory
> +	 * and on-disk inodes. If we don't catch reallocating the parent inode
> +	 * here we will deadlock in xfs_iget() so we have to do these checks
> +	 * first.
> +	 */
> +	if ((pip && ino == pip->i_ino) ||
> +	    !xfs_verify_dir_ino(mp, ino)) {
> +		xfs_alert(mp, "Allocated a known in-use inode 0x%llx!", ino);
> +		return -EFSCORRUPTED;
> +	}
> +
>  	/*
>  	 * Get the in-core inode with the lock held exclusively.
>  	 * This is because we're setting fields here we need
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-04-18  0:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-17  6:39 [PATCH 0/3] xfs: detect corrupt inobt records better Dave Chinner
2018-04-17  6:39 ` [PATCH 1/2] xfs: validate cached inodes are free when allocated Dave Chinner
2018-04-17  7:11   ` Christoph Hellwig
2018-04-17  9:00   ` Carlos Maiolino
2018-04-17 23:57     ` Dave Chinner
2018-04-18  0:05       ` Darrick J. Wong
2018-04-18  0:10   ` Darrick J. Wong
2018-04-17  6:39 ` [PATCH 2/2] xfs: validate allocated inode number Dave Chinner
2018-04-17  7:12   ` Christoph Hellwig
2018-04-17  9:05   ` Carlos Maiolino
2018-04-18  0:12   ` Darrick J. Wong
2018-04-17  9:13 ` [PATCH 0/3] xfs: detect corrupt inobt records better Carlos Maiolino
2018-04-17 22:28   ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.