linux-kernel.vger.kernel.org archive mirror
From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: akpm@linux-foundation.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	"Dave Chinner" <dchinner@redhat.com>,
	"Carlos Maiolino" <cmaiolino@redhat.com>
Subject: [PATCH 3.16 51/63] xfs: catch inode allocation state mismatch corruption
Date: Sat, 22 Sep 2018 01:15:42 +0100
Message-ID: <lsq.1537575342.261199041@decadent.org.uk>
In-Reply-To: <lsq.1537575341.194909669@decadent.org.uk>

3.16.58-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Chinner <dchinner@redhat.com>

commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream.

We recently came across a V4 filesystem causing memory corruption
due to a newly allocated inode being set up twice and being added to
the superblock inode list twice. From code inspection, the only way
this could happen is if a newly allocated inode was not marked as
free on disk (i.e. di_mode wasn't zero).

Running the metadump on an upstream debug kernel fails during inode
allocation like so:

XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838
 ------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:114!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:assfail+0x28/0x30
RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
FS:  00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
Call Trace:
 xfs_ialloc+0x383/0x570
 xfs_dir_ialloc+0x6a/0x2a0
 xfs_create+0x412/0x670
 xfs_generic_create+0x1f7/0x2c0
 ? capable_wrt_inode_uidgid+0x3f/0x50
 vfs_mkdir+0xfb/0x1b0
 SyS_mkdir+0xcf/0xf0
 do_syscall_64+0x73/0x1a0
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Extracting the inode number we crashed on from an event trace and
looking at it with xfs_db:

xfs_db> inode 184452204
xfs_db> p
core.magic = 0x494e
core.mode = 0100644
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
.....

This confirms that it is not a free inode on disk. xfs_repair
also trips over this inode:

.....
zero length extent (off = 0, fsbno = 0) in ino 184452204
correcting nextents for inode 184452204
bad attribute fork in inode 184452204, would clear attr fork
bad nblocks 1 for inode 184452204, would reset to 0
bad anextents 1 for inode 184452204, would reset to 0
imap claims in-use inode 184452204 is free, would correct imap
would have cleared inode 184452204
.....
disconnected inode 184452204, would move to lost+found

And so we have a situation where the directory structure and the
inobt think the inode is free, but the inode on disk thinks it is
still in use. Where this corruption came from is not possible to
diagnose, but we can detect it and prevent the kernel from oopsing
on lookup. The reproducer now results in:

$ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
mkdir: cannot create directory ‘/mnt/scratch/00’: File exists
mkdir: cannot create directory ‘/mnt/scratch/01’: File exists
mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning
mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error
mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error
....

And this corruption shutdown:

[   54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk
[   54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x425/0x670
[   54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443
[   54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   54.852859] Call Trace:
[   54.853531]  dump_stack+0x85/0xc5
[   54.854385]  xfs_trans_cancel+0x197/0x1c0
[   54.855421]  xfs_create+0x425/0x670
[   54.856314]  xfs_generic_create+0x1f7/0x2c0
[   54.857390]  ? capable_wrt_inode_uidgid+0x3f/0x50
[   54.858586]  vfs_mkdir+0xfb/0x1b0
[   54.859458]  SyS_mkdir+0xcf/0xf0
[   54.860254]  do_syscall_64+0x73/0x1a0
[   54.861193]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[   54.862492] RIP: 0033:0x7fb73bddf547
[   54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[   54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73bddf547
[   54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffdaa55449a
[   54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a8670dd0
[   54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 00000000000001ff
[   54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffdaa553500
[   54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c.  Return address = ffffffff814cd050
[   54.882790] XFS (loop0): Corruption of in-memory data detected.  Shutting down filesystem
[   54.884597] XFS (loop0): Please umount the filesystem and rectify the problem(s)

Note that this crash is only possible on v4 filesystems, or on v5
filesystems mounted with the ikeep mount option. For all other v5
filesystems, this problem cannot occur because we don't read inodes
we are allocating from disk; we simply overwrite them with the new
inode information.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[bwh: Backported to 3.16:
 - Look up mode in XFS inode, not VFS inode
 - Use positive error codes, and EIO instead of EFSCORRUPTED]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 fs/xfs/xfs_icache.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -293,7 +293,28 @@ xfs_iget_cache_miss(
 
 	trace_xfs_iget_miss(ip);
 
-	if ((ip->i_d.di_mode == 0) && !(flags & XFS_IGET_CREATE)) {
+
+	/*
+	 * If we are allocating a new inode, then check what was returned is
+	 * actually a free, empty inode. If we are not allocating an inode,
+	 * then check we didn't find a free inode.
+	 */
+	if (flags & XFS_IGET_CREATE) {
+		if (ip->i_d.di_mode != 0) {
+			xfs_warn(mp,
+"Corruption detected! Free inode 0x%llx not marked free on disk",
+				ino);
+			error = EIO;
+			goto out_destroy;
+		}
+		if (ip->i_d.di_nblocks != 0) {
+			xfs_warn(mp,
+"Corruption detected! Free inode 0x%llx has blocks allocated!",
+				ino);
+			error = EIO;
+			goto out_destroy;
+		}
+	} else if (ip->i_d.di_mode == 0) {
 		error = ENOENT;
 		goto out_destroy;
 	}

