[PATCH 0/1] Fix for xfs agfl wrap crash for stable kernels

* [PATCH 0/1] Fix for xfs agfl wrap crash for stable kernels
@ 2018-05-31 23:41 Dave Chiluk
  2018-05-31 23:41 ` [PATCH] xfs: detect agfl count corruption and reset agfl Dave Chiluk
  2018-06-01 17:07 ` [PATCH 0/1] Fix for xfs agfl wrap crash for stable kernels Greg KH
  0 siblings, 2 replies; 3+ messages in thread
From: Dave Chiluk @ 2018-05-31 23:41 UTC (permalink / raw)
  To: stable

When moving xfs volumes between kernels that have 96f859d52 and kernels
that do not, the filesystem can crash if the agfl has wrapped
(flfirst > fllast).  Depending on which filesystem is affected, this
can take down the whole machine.

Such is the case when upgrading from the stock CentOS 7 3.10 kernel to
the kernel.org stable kernels (via elrepo).  Another boundary crossing
that is likely to be common is from early Ubuntu v4.4 kernels to recent
v4.4.  We've been hitting this crash roughly once a week in our cloud,
and it produces the stack trace below.

The fix resets the agfl and leaks a few blocks rather than shutting
down the filesystem.  The leaked blocks can be recovered with
xfs_repair.
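
For anyone who wants to see the idea without reading the patch, here is a
rough userspace sketch of the kind of consistency check involved.  It is
not the patch itself: the function names and plain integer parameters are
made up for illustration, and the sizes in the note below are only example
values.  The point is that on a wrapped agfl the number of active slots
implied by flfirst/fllast depends on the agfl size, so two kernels that
disagree on the size also disagree on whether flcount is consistent.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of slots in use implied by the first/last
 * indices of a circular free list with `size` slots.
 */
static uint32_t agfl_active_slots(uint32_t first, uint32_t last,
				  uint32_t count, uint32_t size)
{
	if (count == 0)
		return 0;
	if (last >= first)		/* not wrapped */
		return last - first + 1;
	return size - first + last + 1;	/* wrapped: first > last */
}

/*
 * Hypothetical check: true if the on-disk count disagrees with what the
 * indices imply for this kernel's idea of the agfl size.
 */
static bool agfl_needs_reset(uint32_t first, uint32_t last,
			     uint32_t count, uint32_t size)
{
	/* Indices or count beyond the list size are never valid. */
	if (first >= size || last >= size)
		return true;
	if (count > size)
		return true;

	return agfl_active_slots(first, last, count, size) != count;
}
```

As an example, a wrapped list with flfirst=116, fllast=2 and flcount=6 is
consistent for a size of 119 slots, but not for a size of 118, where only
5 slots are implied.  A mismatch like that is what the reset is meant to
catch instead of letting the allocator trip over it later.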

The attached patch is a backport of a27ba2607, needed because these
kernels lack a78ee256c.  It is intended for and tested on the v4.4
stream, but should apply to all kernels that lack upstream a78ee256c.

Thanks,
Dave Chiluk

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file fs/xfs/libxfs/xfs_btree.c.  Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs]
CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro PIO-618U-TR4T+-ST031/X10DRU-i+, BIOS 2.0 12/17/2015
Call Trace:
dump_stack+0x63/0x87
xfs_error_report+0x3b/0x40 [xfs]
? xfs_free_ag_extent+0x35d/0x7a0 [xfs]
xfs_btree_insert+0x1b0/0x1c0 [xfs]
xfs_free_ag_extent+0x35d/0x7a0 [xfs]
xfs_free_extent+0xbb/0x150 [xfs]
xfs_trans_free_extent+0x4f/0x110 [xfs]
? xfs_trans_add_item+0x5d/0x90 [xfs] 
xfs_extent_free_finish_item+0x26/0x40 [xfs]
xfs_defer_finish+0x149/0x410 [xfs]
xfs_remove+0x281/0x330 [xfs]
xfs_vn_unlink+0x55/0xa0 [xfs]
vfs_rmdir+0xb6/0x130
do_rmdir+0x1b3/0x1d0
SyS_rmdir+0x16/0x20
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f85d8d92397
RSP: 002b:00007f85cef9b758 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 00007f858c00b4c0 RCX: 00007f85d8d92397
RDX: 00007f858c09ad70 RSI: 0000000000000000 RDI: 00007f858c09ad70
RBP: 00007f85cef9bc30 R08: 0000000000000001 R09: 0000000000000002
R10: 0000006f74656c67 R11: 0000000000000246 R12: 00007f85cef9c640
R13: 00007f85cef9bc50 R14: 00007f85cef9bcc0 R15: 00007f85cef9bc40
XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c.  Return address = 0xffffffffa028f087
XFS (dm-4): Corruption of in-memory data detected.  Shutting down filesystem
XFS (dm-4): Please umount the filesystem and rectify the problem(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
