All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: fixes for inode allocation issues
@ 2018-03-21  7:52 Dave Chinner
  2018-03-21  7:52 ` [PATCH 1/2] xfs: catch inode allocation state mismatch corruption Dave Chinner
  2018-03-21  7:52 ` [PATCH 2/2] xfs: remove dead inode version setting code Dave Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2018-03-21  7:52 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

The first patch in this series catches an on disk corruption that
can cause an oops on v4 and v5 w/ ikeep filesystems during inode
allocation. It turns the oops into an EFSCORRUPTED error return, and
the higher level code handles it from there.

The second patch is a cleanup I noticed while looking for the bug
fixed in the first patch.

Cheers,

Dave.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] xfs: catch inode allocation state mismatch corruption
  2018-03-21  7:52 [PATCH 0/2] xfs: fixes for inode allocation issues Dave Chinner
@ 2018-03-21  7:52 ` Dave Chinner
  2018-03-21 17:06   ` Darrick J. Wong
  2018-03-23 12:55   ` Carlos Maiolino
  2018-03-21  7:52 ` [PATCH 2/2] xfs: remove dead inode version setting code Dave Chinner
  1 sibling, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2018-03-21  7:52 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We recently came across a V4 filesystem causing memory corruption
due to a newly allocated inode being setup twice and being added to
the superblock inode list twice. From code inspection, the only way
this could happen is if a newly allocated inode was not marked as
free on disk (i.e. di_mode wasn't zero).

Running the metadump on an upstream debug kernel fails during inode
allocation like so:

XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:114!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:assfail+0x28/0x30
RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
FS:  00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
Call Trace:
 xfs_ialloc+0x383/0x570
 xfs_dir_ialloc+0x6a/0x2a0
 xfs_create+0x412/0x670
 xfs_generic_create+0x1f7/0x2c0
 ? capable_wrt_inode_uidgid+0x3f/0x50
 vfs_mkdir+0xfb/0x1b0
 SyS_mkdir+0xcf/0xf0
 do_syscall_64+0x73/0x1a0
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Extracting the inode number we crashed on from an event trace and
looking at it with xfs_db:

xfs_db> inode 184452204
xfs_db> p
core.magic = 0x494e
core.mode = 0100644
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
.....

Confirms that it is not a free inode on disk. xfs_repair
also trips over this inode:

.....
zero length extent (off = 0, fsbno = 0) in ino 184452204
correcting nextents for inode 184452204
bad attribute fork in inode 184452204, would clear attr fork
bad nblocks 1 for inode 184452204, would reset to 0
bad anextents 1 for inode 184452204, would reset to 0
imap claims in-use inode 184452204 is free, would correct imap
would have cleared inode 184452204
.....
disconnected inode 184452204, would move to lost+found

And so we have a situation where the directory structure and the
inobt thinks the inode is free, but the inode on disk thinks it is
still in use. Where this corruption came from is not possible to
diagnose, but we can detect it and prevent the kernel from oopsing
on lookup. The reproducer now results in:

$ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
mkdir: cannot create directory ‘/mnt/scratch/00’: File exists
mkdir: cannot create directory ‘/mnt/scratch/01’: File exists
mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning
mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error
mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error
....

And this corruption shutdown:

[   54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk
[   54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x425/0x670
[   54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443
[   54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   54.852859] Call Trace:
[   54.853531]  dump_stack+0x85/0xc5
[   54.854385]  xfs_trans_cancel+0x197/0x1c0
[   54.855421]  xfs_create+0x425/0x670
[   54.856314]  xfs_generic_create+0x1f7/0x2c0
[   54.857390]  ? capable_wrt_inode_uidgid+0x3f/0x50
[   54.858586]  vfs_mkdir+0xfb/0x1b0
[   54.859458]  SyS_mkdir+0xcf/0xf0
[   54.860254]  do_syscall_64+0x73/0x1a0
[   54.861193]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[   54.862492] RIP: 0033:0x7fb73bddf547
[   54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[   54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73bddf547
[   54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffdaa55449a
[   54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a8670dd0
[   54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 00000000000001ff
[   54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffdaa553500
[   54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c.  Return address = ffffffff814cd050
[   54.882790] XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
[   54.884597] XFS (loop0): Please umount the filesystem and rectify the problem(s)

Note that this crash is only possible on v4 filesystemsi or v5
filesystems mounted with the ikeep mount option. For all other V5
filesystems, this problem cannot occur because we don't read inodes
we are allocating from disk - we simply overwrite them with the new
inode information.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 1dc37b72b6ea..98b7a4ae15e4 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -484,7 +484,28 @@ xfs_iget_cache_miss(
 
 	trace_xfs_iget_miss(ip);
 
-	if ((VFS_I(ip)->i_mode == 0) && !(flags & XFS_IGET_CREATE)) {
+
+	/*
+	 * If we are allocating a new inode, then check what was returned is
+	 * actually a free, empty inode. If we are not allocating an inode,
+	 * the check we didn't find a free inode.
+	 */
+	if (flags & XFS_IGET_CREATE) {
+		if (VFS_I(ip)->i_mode != 0) {
+			xfs_warn(mp,
+"Corruption detected! Free inode 0x%llx not marked free on disk",
+				ino);
+			error = -EFSCORRUPTED;
+			goto out_destroy;
+		}
+		if (ip->i_d.di_nblocks != 0) {
+			xfs_warn(mp,
+"Corruption detected! Free inode 0x%llx has blocks allocated!",
+				ino);
+			error = -EFSCORRUPTED;
+			goto out_destroy;
+		}
+	} else if (VFS_I(ip)->i_mode == 0) {
 		error = -ENOENT;
 		goto out_destroy;
 	}
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] xfs: remove dead inode version setting code
  2018-03-21  7:52 [PATCH 0/2] xfs: fixes for inode allocation issues Dave Chinner
  2018-03-21  7:52 ` [PATCH 1/2] xfs: catch inode allocation state mismatch corruption Dave Chinner
@ 2018-03-21  7:52 ` Dave Chinner
  2018-03-21 17:07   ` Darrick J. Wong
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2018-03-21  7:52 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We can only get into the branch if CRCs are enabled, so there's no
need to check inside the branch for CRCs being enabled....

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 4fe17b368316..b646bf633f66 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -564,10 +564,7 @@ xfs_iread(
 		/* initialise the on-disk inode core */
 		memset(&ip->i_d, 0, sizeof(ip->i_d));
 		VFS_I(ip)->i_generation = prandom_u32();
-		if (xfs_sb_version_hascrc(&mp->m_sb))
-			ip->i_d.di_version = 3;
-		else
-			ip->i_d.di_version = 2;
+		ip->i_d.di_version = 3;
 		return 0;
 	}
 
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] xfs: catch inode allocation state mismatch corruption
  2018-03-21  7:52 ` [PATCH 1/2] xfs: catch inode allocation state mismatch corruption Dave Chinner
@ 2018-03-21 17:06   ` Darrick J. Wong
  2018-03-21 21:10     ` Dave Chinner
  2018-03-23 12:55   ` Carlos Maiolino
  1 sibling, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2018-03-21 17:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Mar 21, 2018 at 06:52:47PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We recently came across a V4 filesystem causing memory corruption
> due to a newly allocated inode being setup twice and being added to
> the superblock inode list twice. From code inspection, the only way
> this could happen is if a newly allocated inode was not marked as
> free on disk (i.e. di_mode wasn't zero).
> 
> Running the metadump on an upstream debug kernel fails during inode
> allocation like so:
> 
> XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:114!
> invalid opcode: 0000 [#1] PREEMPT SMP
> CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> RIP: 0010:assfail+0x28/0x30
> RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
> RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
> RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
> R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
> FS:  00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
> Call Trace:
>  xfs_ialloc+0x383/0x570
>  xfs_dir_ialloc+0x6a/0x2a0
>  xfs_create+0x412/0x670
>  xfs_generic_create+0x1f7/0x2c0
>  ? capable_wrt_inode_uidgid+0x3f/0x50
>  vfs_mkdir+0xfb/0x1b0
>  SyS_mkdir+0xcf/0xf0
>  do_syscall_64+0x73/0x1a0
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> 
> Extracting the inode number we crashed on from an event trace and
> looking at it with xfs_db:
> 
> xfs_db> inode 184452204
> xfs_db> p
> core.magic = 0x494e
> core.mode = 0100644
> core.version = 2
> core.format = 2 (extents)
> core.nlinkv2 = 1
> core.onlink = 0
> .....
> 
> Confirms that it is not a free inode on disk. xfs_repair
> also trips over this inode:
> 
> .....
> zero length extent (off = 0, fsbno = 0) in ino 184452204
> correcting nextents for inode 184452204
> bad attribute fork in inode 184452204, would clear attr fork
> bad nblocks 1 for inode 184452204, would reset to 0
> bad anextents 1 for inode 184452204, would reset to 0
> imap claims in-use inode 184452204 is free, would correct imap
> would have cleared inode 184452204
> .....
> disconnected inode 184452204, would move to lost+found
> 
> And so we have a situation where the directory structure and the
> inobt thinks the inode is free, but the inode on disk thinks it is
> still in use. Where this corruption came from is not possible to
> diagnose, but we can detect it and prevent the kernel from oopsing
> on lookup. The reproducer now results in:
> 
> $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
> mkdir: cannot create directory ‘/mnt/scratch/00’: File exists
> mkdir: cannot create directory ‘/mnt/scratch/01’: File exists
> mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning
> mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error
> mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error
> ....
> 
> And this corruption shutdown:
> 
> [   54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk
> [   54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x425/0x670
> [   54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443
> [   54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [   54.852859] Call Trace:
> [   54.853531]  dump_stack+0x85/0xc5
> [   54.854385]  xfs_trans_cancel+0x197/0x1c0
> [   54.855421]  xfs_create+0x425/0x670
> [   54.856314]  xfs_generic_create+0x1f7/0x2c0
> [   54.857390]  ? capable_wrt_inode_uidgid+0x3f/0x50
> [   54.858586]  vfs_mkdir+0xfb/0x1b0
> [   54.859458]  SyS_mkdir+0xcf/0xf0
> [   54.860254]  do_syscall_64+0x73/0x1a0
> [   54.861193]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [   54.862492] RIP: 0033:0x7fb73bddf547
> [   54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
> [   54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73bddf547
> [   54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffdaa55449a
> [   54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a8670dd0
> [   54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 00000000000001ff
> [   54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffdaa553500
> [   54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c.  Return address = ffffffff814cd050
> [   54.882790] XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
> [   54.884597] XFS (loop0): Please umount the filesystem and rectify the problem(s)
> 
> Note that this crash is only possible on v4 filesystemsi or v5
> filesystems mounted with the ikeep mount option. For all other V5
> filesystems, this problem cannot occur because we don't read inodes
> we are allocating from disk - we simply overwrite them with the new
> inode information.

Got a test case for this scenario? :)

The patch looks ok, but I'd rather have some sort of reproducer so I can
verify that it works (and that scrub will notice the broken state) even
if we have to mkfs + db to check it.

> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 1dc37b72b6ea..98b7a4ae15e4 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -484,7 +484,28 @@ xfs_iget_cache_miss(
>  
>  	trace_xfs_iget_miss(ip);
>  
> -	if ((VFS_I(ip)->i_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> +
> +	/*
> +	 * If we are allocating a new inode, then check what was returned is
> +	 * actually a free, empty inode. If we are not allocating an inode,
> +	 * the check we didn't find a free inode.
> +	 */
> +	if (flags & XFS_IGET_CREATE) {
> +		if (VFS_I(ip)->i_mode != 0) {
> +			xfs_warn(mp,
> +"Corruption detected! Free inode 0x%llx not marked free on disk",
> +				ino);
> +			error = -EFSCORRUPTED;
> +			goto out_destroy;
> +		}
> +		if (ip->i_d.di_nblocks != 0) {
> +			xfs_warn(mp,
> +"Corruption detected! Free inode 0x%llx has blocks allocated!",
> +				ino);
> +			error = -EFSCORRUPTED;
> +			goto out_destroy;

I've a patch out for review that adds a xfs_inode_verifier_error
function that spits out a standardized corruption warning, a hex dump of
the bad dinode, and tells the user to run repair.  This seems like a
good candidate for that.

--D

> +		}
> +	} else if (VFS_I(ip)->i_mode == 0) {
>  		error = -ENOENT;
>  		goto out_destroy;
>  	}
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] xfs: remove dead inode version setting code
  2018-03-21  7:52 ` [PATCH 2/2] xfs: remove dead inode version setting code Dave Chinner
@ 2018-03-21 17:07   ` Darrick J. Wong
  0 siblings, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2018-03-21 17:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Mar 21, 2018 at 06:52:48PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We can only get into the branch if CRCs are enabled, so there's no
> need to check inside the branch for CRCs being enabled....
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_inode_buf.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 4fe17b368316..b646bf633f66 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -564,10 +564,7 @@ xfs_iread(
>  		/* initialise the on-disk inode core */
>  		memset(&ip->i_d, 0, sizeof(ip->i_d));
>  		VFS_I(ip)->i_generation = prandom_u32();
> -		if (xfs_sb_version_hascrc(&mp->m_sb))
> -			ip->i_d.di_version = 3;
> -		else
> -			ip->i_d.di_version = 2;
> +		ip->i_d.di_version = 3;
>  		return 0;
>  	}
>  
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] xfs: catch inode allocation state mismatch corruption
  2018-03-21 17:06   ` Darrick J. Wong
@ 2018-03-21 21:10     ` Dave Chinner
  2018-03-23 13:07       ` Carlos Maiolino
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2018-03-21 21:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Mar 21, 2018 at 10:06:40AM -0700, Darrick J. Wong wrote:
> On Wed, Mar 21, 2018 at 06:52:47PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We recently came across a V4 filesystem causing memory corruption
> > due to a newly allocated inode being setup twice and being added to
> > the superblock inode list twice. From code inspection, the only way
> > this could happen is if a newly allocated inode was not marked as
> > free on disk (i.e. di_mode wasn't zero).
.....
> > Note that this crash is only possible on v4 filesystemsi or v5
> > filesystems mounted with the ikeep mount option. For all other V5
> > filesystems, this problem cannot occur because we don't read inodes
> > we are allocating from disk - we simply overwrite them with the new
> > inode information.
> 
> Got a test case for this scenario? :)

No, because I haven't had time to build an xfs_db script to corrupt
an inode and allocate it reliably yet.  I've only been using the
supplied metadump image to reproduce it so far. i.e.  I posted the
patch the moment I confirmed that it fixes the problem....

> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 1dc37b72b6ea..98b7a4ae15e4 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -484,7 +484,28 @@ xfs_iget_cache_miss(
> >  
> >  	trace_xfs_iget_miss(ip);
> >  
> > -	if ((VFS_I(ip)->i_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> > +
> > +	/*
> > +	 * If we are allocating a new inode, then check what was returned is
> > +	 * actually a free, empty inode. If we are not allocating an inode,
> > +	 * the check we didn't find a free inode.
> > +	 */
> > +	if (flags & XFS_IGET_CREATE) {
> > +		if (VFS_I(ip)->i_mode != 0) {
> > +			xfs_warn(mp,
> > +"Corruption detected! Free inode 0x%llx not marked free on disk",
> > +				ino);
> > +			error = -EFSCORRUPTED;
> > +			goto out_destroy;
> > +		}
> > +		if (ip->i_d.di_nblocks != 0) {
> > +			xfs_warn(mp,
> > +"Corruption detected! Free inode 0x%llx has blocks allocated!",
> > +				ino);
> > +			error = -EFSCORRUPTED;
> > +			goto out_destroy;
> 
> I've a patch out for review that adds a xfs_inode_verifier_error
> function that spits out a standardized corruption warning, a hex dump of
> the bad dinode, and tells the user to run repair.  This seems like a
> good candidate for that.

Sure, but that can be done as a separate commit - this fix is almost
certainly going to be backported to a distro kernel or two, so
keeping the initial commit as just the fix makes that whole process
much easier...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] xfs: catch inode allocation state mismatch corruption
  2018-03-21  7:52 ` [PATCH 1/2] xfs: catch inode allocation state mismatch corruption Dave Chinner
  2018-03-21 17:06   ` Darrick J. Wong
@ 2018-03-23 12:55   ` Carlos Maiolino
  1 sibling, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2018-03-23 12:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

> 
> Note that this crash is only possible on v4 filesystemsi or v5
> filesystems mounted with the ikeep mount option. For all other V5
> filesystems, this problem cannot occur because we don't read inodes
> we are allocating from disk - we simply overwrite them with the new
> inode information.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>

I could reproduce this problem by manually setting an inode as free in the
inobt, and the patch fixes the bug.

The patch looks good too.

xfstests should hit the list soon.

Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>


> ---
>  fs/xfs/xfs_icache.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 1dc37b72b6ea..98b7a4ae15e4 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -484,7 +484,28 @@ xfs_iget_cache_miss(
>  
>  	trace_xfs_iget_miss(ip);
>  
> -	if ((VFS_I(ip)->i_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> +
> +	/*
> +	 * If we are allocating a new inode, then check what was returned is
> +	 * actually a free, empty inode. If we are not allocating an inode,
> +	 * the check we didn't find a free inode.
> +	 */
> +	if (flags & XFS_IGET_CREATE) {
> +		if (VFS_I(ip)->i_mode != 0) {
> +			xfs_warn(mp,
> +"Corruption detected! Free inode 0x%llx not marked free on disk",
> +				ino);
> +			error = -EFSCORRUPTED;
> +			goto out_destroy;
> +		}
> +		if (ip->i_d.di_nblocks != 0) {
> +			xfs_warn(mp,
> +"Corruption detected! Free inode 0x%llx has blocks allocated!",
> +				ino);
> +			error = -EFSCORRUPTED;
> +			goto out_destroy;
> +		}
> +	} else if (VFS_I(ip)->i_mode == 0) {
>  		error = -ENOENT;
>  		goto out_destroy;
>  	}
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] xfs: catch inode allocation state mismatch corruption
  2018-03-21 21:10     ` Dave Chinner
@ 2018-03-23 13:07       ` Carlos Maiolino
  0 siblings, 0 replies; 8+ messages in thread
From: Carlos Maiolino @ 2018-03-23 13:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Thu, Mar 22, 2018 at 08:10:22AM +1100, Dave Chinner wrote:
> On Wed, Mar 21, 2018 at 10:06:40AM -0700, Darrick J. Wong wrote:
> > On Wed, Mar 21, 2018 at 06:52:47PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > We recently came across a V4 filesystem causing memory corruption
> > > due to a newly allocated inode being setup twice and being added to
> > > the superblock inode list twice. From code inspection, the only way
> > > this could happen is if a newly allocated inode was not marked as
> > > free on disk (i.e. di_mode wasn't zero).
> .....
> > > Note that this crash is only possible on v4 filesystemsi or v5
> > > filesystems mounted with the ikeep mount option. For all other V5
> > > filesystems, this problem cannot occur because we don't read inodes
> > > we are allocating from disk - we simply overwrite them with the new
> > > inode information.
> > 
> > Got a test case for this scenario? :)
> 
> No, because I haven't had time to build an xfs_db script to corrupt
> an inode and allocate it reliably yet.  I've only been using the
> supplied metadump image to reproduce it so far. i.e.  I posted the
> patch the moment I confirmed that it fixes the problem....
> 

Oh, just read this, I was considering something like:

- mkfs.xfs the device with a fixed geometry.
- create a dir into the fs
- create a single file under this dir (using dd just to have a block allocated)
- using xfs_io to overwrite the inobt block into specified offset holding the
  inobt_record (there should be a single chunk only anyway).

The trickiest part would be to calculate the offsets, but I think this can be
done based on the geometry of the FS created. Well, at least for some reason we
change the Btree header/records size.

Does it look reasonable?


> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index 1dc37b72b6ea..98b7a4ae15e4 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > > @@ -484,7 +484,28 @@ xfs_iget_cache_miss(
> > >  
> > >  	trace_xfs_iget_miss(ip);
> > >  
> > > -	if ((VFS_I(ip)->i_mode == 0) && !(flags & XFS_IGET_CREATE)) {
> > > +
> > > +	/*
> > > +	 * If we are allocating a new inode, then check what was returned is
> > > +	 * actually a free, empty inode. If we are not allocating an inode,
> > > +	 * the check we didn't find a free inode.
> > > +	 */
> > > +	if (flags & XFS_IGET_CREATE) {
> > > +		if (VFS_I(ip)->i_mode != 0) {
> > > +			xfs_warn(mp,
> > > +"Corruption detected! Free inode 0x%llx not marked free on disk",
> > > +				ino);
> > > +			error = -EFSCORRUPTED;
> > > +			goto out_destroy;
> > > +		}
> > > +		if (ip->i_d.di_nblocks != 0) {
> > > +			xfs_warn(mp,
> > > +"Corruption detected! Free inode 0x%llx has blocks allocated!",
> > > +				ino);
> > > +			error = -EFSCORRUPTED;
> > > +			goto out_destroy;
> > 
> > I've a patch out for review that adds a xfs_inode_verifier_error
> > function that spits out a standardized corruption warning, a hex dump of
> > the bad dinode, and tells the user to run repair.  This seems like a
> > good candidate for that.
> 
> Sure, but that can be done as a separate commit - this fix is almost
> certainly going to be backported to a distro kernel or two, so
> keeping the initial commit as just the fix makes that whole process
> much easier...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-03-23 13:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-21  7:52 [PATCH 0/2] xfs: fixes for inode allocation issues Dave Chinner
2018-03-21  7:52 ` [PATCH 1/2] xfs: catch inode allocation state mismatch corruption Dave Chinner
2018-03-21 17:06   ` Darrick J. Wong
2018-03-21 21:10     ` Dave Chinner
2018-03-23 13:07       ` Carlos Maiolino
2018-03-23 12:55   ` Carlos Maiolino
2018-03-21  7:52 ` [PATCH 2/2] xfs: remove dead inode version setting code Dave Chinner
2018-03-21 17:07   ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.