All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141)
@ 2019-04-23 16:49 Darrick J. Wong
  2019-04-23 18:08 ` Brian Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2019-04-23 16:49 UTC (permalink / raw)
  To: xfs; +Cc: Brian Foster, Allison Collins

Hi all,

I saw a crash in xfs/141 last night while testing xfs for-next on a V4
filesystem.  I think the problem here is that:

 1. xfs_attr_set calls xfs_attr_set_args.

 2. xfs_attr_set_args calls xfs_attr_shortform_to_leaf and sets @leaf_bp.

 3. _set_args calls xfs_defer_finish.

 4. xfs_defer_finish calls xfs_defer_trans_roll calls xfs_trans_roll.

 5. xfs_trans_roll sets @args.trans to the new transaction created by
    xfs_trans_dup.

 6. xfs_trans_roll calls __xfs_trans_commit on the old transaction and
    fails because we injected a log error.

 7. _trans_roll passes the error code up to _defer_finish.

 8. _defer_finish passes the error code up to _attr_set without joining
    anything (in particular leaf_bp which we bhold'd so that we would
    retain the buffer lock) to the new transaction.  Note that
    @args.trans is still set to the new transaction; nobody's cancelled it.

 8. xfs_attr_set tries to bail out through out_release_leaf.  The
    leaf_bp is non-null and args.trans is non-null, so we call
    xfs_trans_brelse to release leaf_bp from args.trans

 9. leaf_bp isn't joined to any transaction at all, but we passed in a
    non-null args.trans so _trans_brelse hits the "bp->b_transp == tp"
    assert and then crashes trying to assert on the buffer log item
    which is null because leaf_bp isn't joined to any transaction.

I think the underlying problem here is that we haven't precisely defined
what's supposed to happen when the _trans_roll in _defer_trans_roll
errors out.  Looking at the three callers of trans_bhold (attr setting,
allocating disk space for a new dquot, and online repair) the handling
of this situation is not consistent.

(Embarassingly, the first of those three was RVB'd by me and the latter
two were written by me :P)

Should we attach the bholded buffers to the new transaction even if the
old transaction failed to commit so that callers can xfs_trans_brelse
the buffer and it all works out (like what the attr code does)?

Or should we cancel the new transaction and set *tpp to NULL, which will
also solve this problem for the attr case since args.trans will be null
after the old commit fails?  But then we run into the problem that now
every caller has to be prepared to deal with the transaction being null
if something went wrong.  All of those callers ought to be calling
xfs_trans_cancel if that's the case, so we might be able to get away
with adding a if (!tp) return; at the top of that function.

Or are the callers of _defer_finish/_trans_roll simply supposed to know
that the buffer might not be joined if the roll fails and deal with it
appropriately?

Extra complication: if the rolling succeeds but one of the
defer_ops->finish_item() calls fails then the buffer /will/ be joined to
the transaction being passed out.

I /think/ the easiest answer here is that we ought to join the buffers
to the new transaction even if committing the old one fails.  I also
suspect the out_release code in xrep_roll_ag_trans is wrong, but that's
my problem to fix. :)

--D

run fstests xfs/141 at 2019-04-23 00:01:13
XFS (sdc): Mounting V4 Filesystem
XFS (sdc): Ending clean mount
XFS (sdb): Mounting V4 Filesystem
XFS (sdb): Ending clean mount
XFS (sdb): Quotacheck needed: Please wait.
XFS (sdb): Quotacheck: Done.
XFS (sdb): Injecting error (false) at file fs/xfs/xfs_log.c, line 1924, on filesystem "sdb"
XFS (sdb): Intentionally corrupted log record at LSN 0x100000e0e. Shutdown imminent.
XFS (sdb): metadata I/O error in "xlog_iodone" at daddr 0x23762e len 64 error 0
XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c. Return address = ffffffffa064e48b
XFS: Assertion failed: bp->b_transp == tp, file: fs/xfs/xfs_trans_buf.c, line: 385
potentially unexpected fatal signal 6.
CPU: 1 PID: 1154887 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
RIP: 0033:0x7fdb91cd9e97
Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
RSP: 002b:00007ffdb0d0f080 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
RAX: 0000000000000000 RBX: 0000000003aed64a RCX: 00007fdb91cd9e97
RDX: 0000000000000000 RSI: 00007ffdb0d0f080 RDI: 0000000000000002
RBP: 0000000000458923 R08: 0000000000000000 R09: 00007ffdb0d0f080
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000005
R13: 000000000000031e R14: 00007ffdb0d0f3d0 R15: 000055d212696560
FS:  00007fdb928cdb80 GS:  0000000000000000
potentially unexpected fatal signal 6.
CPU: 1 PID: 1154888 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
RIP: 0033:0x7fdb91cd9e97
Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
RSP: 002b:00007ffdb0d0eaf0 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fdb91cd9e97
RDX: 0000000000000000 RSI: 00007ffdb0d0eaf0 RDI: 0000000000000002
RBP: 000055d2128a41c0 R08: 0000000000000000 R09: 00007ffdb0d0eaf0
R10: 0000000000000008 R11: 0000000000000246 R12: 000000000007a120
R13: 0000000051eb851f R14: 00007ffdb0d0f3d0 R15: 000055d21269ec60
FS:  00007fdb928cdb80 GS:  0000000000000000
XFS (sdb): Log I/O Error Detected. Shutting down filesystem
XFS (sdb): Please unmount the filesystem and rectify the problem(s)
WARNING: CPU: 0 PID: 1154885 at fs/xfs/xfs_message.c:104 assfail+0x27/0x2a [xfs]
Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
RIP: 0010:assfail+0x27/0x2a [xfs]
Code: 0f 0b c3 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 50 7c 6a a0 48 89 fa 31 ff e8 a6 f8 ff ff 80 3d 45 94 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8b b3 a8 02 00 00 48 c7 c7 e8 80 6a a0 c6 05 66 8f 0a
RSP: 0018:ffffc9000194bbb0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0699d13
RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdb9195c000 CR3: 00000000770bd000 CR4: 00000000000006b0
Call Trace:
 xfs_trans_brelse+0x185/0x210 [xfs]
 xfs_attr_set+0x2ce/0x300 [xfs]
 xfs_xattr_set+0x54/0xa0 [xfs]
 __vfs_setxattr+0x66/0x80
 __vfs_setxattr_noperm+0x54/0xf0
 vfs_setxattr+0x81/0xa0
 setxattr+0x146/0x1d0
 path_setxattr+0xd0/0xf0
 __x64_sys_setxattr+0x27/0x30
 do_syscall_64+0x50/0x160
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fdb91db6839
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
irq event stamp: 0
hardirqs last  enabled at (0): [<0000000000000000>]           (null)
hardirqs last disabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
softirqs last  enabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
softirqs last disabled at (0): [<0000000000000000>]           (null)
---[ end trace f2eeafdb05fd11a1 ]---
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
#PF error: [normal kernel read fault]
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
Call Trace:
 xfs_attr_set+0x2ce/0x300 [xfs]
 xfs_xattr_set+0x54/0xa0 [xfs]
 __vfs_setxattr+0x66/0x80
 __vfs_setxattr_noperm+0x54/0xf0
 vfs_setxattr+0x81/0xa0
 setxattr+0x146/0x1d0
 path_setxattr+0xd0/0xf0
 __x64_sys_setxattr+0x27/0x30
 do_syscall_64+0x50/0x160
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fdb91db6839
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
Dumping ftrace buffer:
   (ftrace buffer empty)
CR2: 0000000000000038
---[ end trace f2eeafdb05fd11a2 ]---
RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141)
  2019-04-23 16:49 RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141) Darrick J. Wong
@ 2019-04-23 18:08 ` Brian Foster
  2019-04-23 20:44   ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2019-04-23 18:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, Allison Collins

On Tue, Apr 23, 2019 at 09:49:47AM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> I saw a crash in xfs/141 last night while testing xfs for-next on a V4
> filesystem.  I think the problem here is that:
> 
>  1. xfs_attr_set calls xfs_attr_set_args.
> 
>  2. xfs_attr_set_args calls xfs_attr_shortform_to_leaf and sets @leaf_bp.
> 
>  3. _set_args calls xfs_defer_finish.
> 
>  4. xfs_defer_finish calls xfs_defer_trans_roll calls xfs_trans_roll.
> 
>  5. xfs_trans_roll sets @args.trans to the new transaction created by
>     xfs_trans_dup.
> 
>  6. xfs_trans_roll calls __xfs_trans_commit on the old transaction and
>     fails because we injected a log error.
> 
>  7. _trans_roll passes the error code up to _defer_finish.
> 
>  8. _defer_finish passes the error code up to _attr_set without joining
>     anything (in particular leaf_bp which we bhold'd so that we would
>     retain the buffer lock) to the new transaction.  Note that
>     @args.trans is still set to the new transaction; nobody's cancelled it.
> 
>  8. xfs_attr_set tries to bail out through out_release_leaf.  The
>     leaf_bp is non-null and args.trans is non-null, so we call
>     xfs_trans_brelse to release leaf_bp from args.trans
> 
>  9. leaf_bp isn't joined to any transaction at all, but we passed in a
>     non-null args.trans so _trans_brelse hits the "bp->b_transp == tp"
>     assert and then crashes trying to assert on the buffer log item
>     which is null because leaf_bp isn't joined to any transaction.
> 
> I think the underlying problem here is that we haven't precisely defined
> what's supposed to happen when the _trans_roll in _defer_trans_roll
> errors out.  Looking at the three callers of trans_bhold (attr setting,
> allocating disk space for a new dquot, and online repair) the handling
> of this situation is not consistent.
> 
> (Embarassingly, the first of those three was RVB'd by me and the latter
> two were written by me :P)
> 
> Should we attach the bholded buffers to the new transaction even if the
> old transaction failed to commit so that callers can xfs_trans_brelse
> the buffer and it all works out (like what the attr code does)?
> 
> Or should we cancel the new transaction and set *tpp to NULL, which will
> also solve this problem for the attr case since args.trans will be null
> after the old commit fails?  But then we run into the problem that now
> every caller has to be prepared to deal with the transaction being null
> if something went wrong.  All of those callers ought to be calling
> xfs_trans_cancel if that's the case, so we might be able to get away
> with adding a if (!tp) return; at the top of that function.
> 
> Or are the callers of _defer_finish/_trans_roll simply supposed to know
> that the buffer might not be joined if the roll fails and deal with it
> appropriately?
> 
> Extra complication: if the rolling succeeds but one of the
> defer_ops->finish_item() calls fails then the buffer /will/ be joined to
> the transaction being passed out.
> 
> I /think/ the easiest answer here is that we ought to join the buffers
> to the new transaction even if committing the old one fails.  I also
> suspect the out_release code in xrep_roll_ag_trans is wrong, but that's
> my problem to fix. :)
> 

That seems like the most simple and predictable approach to me. Then the
caller doesn't care whether it has the original transaction or a new
one, it just needs to unwind its actions and cancel the tx in the event
of an error.

BTW, a couple other things that look potentially suspicious in this same
codepath:

1.) If the referenced xfs_defer_finish() call succeeds and the
subsequent xfs_trans_roll_inode() fails similarly, it looks like we can
end up in a similar unowned buffer state in the caller.

2.) If we do end up returning to xfs_attr_set_args() with the buffer
held to the transaction, ISTM it's likely dirty and thus
xfs_trans_brelse() may not actually do anything with respect to the
buffer hold. This seems like a potential buffer lock/ref leak vector.

A quick look at xfs_buf_item_unlock() suggests we don't drop the hold in
the abort case, but I could have missed something. If not, perhaps we'd
also need to make sure we release the hold before we return from
xfs_attr_set_args().

Hmm, looking around a bit more outside of this attr case I see code like
the following in xfs_dquot_disk_alloc():

	...
        xfs_trans_bhold(tp, bp);
        error = xfs_defer_finish(tpp);
        tp = *tpp;
        if (error) {
                xfs_buf_relse(bp);
                return error;
	}
	...

That code looks like it actually expects the unowned buf state on a
failed finish as it explicitly releases it on error. Perhaps this is an
argument for treating a failed xfs_defer_finish() like a failed commit
and it is the responsibility of the caller to grok that (meaning the
caller now owns the buf, not the tx). Then again, I'm not sure how the
caller knows whether xfs_defer_finish() failed due to a tx roll (where
we currently haven't re-held the buf) or something later in dfops
processing (where we might have already re-held the buf), so that seems
fishy too... :/

Brian

> --D
> 
> run fstests xfs/141 at 2019-04-23 00:01:13
> XFS (sdc): Mounting V4 Filesystem
> XFS (sdc): Ending clean mount
> XFS (sdb): Mounting V4 Filesystem
> XFS (sdb): Ending clean mount
> XFS (sdb): Quotacheck needed: Please wait.
> XFS (sdb): Quotacheck: Done.
> XFS (sdb): Injecting error (false) at file fs/xfs/xfs_log.c, line 1924, on filesystem "sdb"
> XFS (sdb): Intentionally corrupted log record at LSN 0x100000e0e. Shutdown imminent.
> XFS (sdb): metadata I/O error in "xlog_iodone" at daddr 0x23762e len 64 error 0
> XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c. Return address = ffffffffa064e48b
> XFS: Assertion failed: bp->b_transp == tp, file: fs/xfs/xfs_trans_buf.c, line: 385
> potentially unexpected fatal signal 6.
> CPU: 1 PID: 1154887 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> RIP: 0033:0x7fdb91cd9e97
> Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> RSP: 002b:00007ffdb0d0f080 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> RAX: 0000000000000000 RBX: 0000000003aed64a RCX: 00007fdb91cd9e97
> RDX: 0000000000000000 RSI: 00007ffdb0d0f080 RDI: 0000000000000002
> RBP: 0000000000458923 R08: 0000000000000000 R09: 00007ffdb0d0f080
> R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000005
> R13: 000000000000031e R14: 00007ffdb0d0f3d0 R15: 000055d212696560
> FS:  00007fdb928cdb80 GS:  0000000000000000
> potentially unexpected fatal signal 6.
> CPU: 1 PID: 1154888 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> RIP: 0033:0x7fdb91cd9e97
> Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> RSP: 002b:00007ffdb0d0eaf0 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fdb91cd9e97
> RDX: 0000000000000000 RSI: 00007ffdb0d0eaf0 RDI: 0000000000000002
> RBP: 000055d2128a41c0 R08: 0000000000000000 R09: 00007ffdb0d0eaf0
> R10: 0000000000000008 R11: 0000000000000246 R12: 000000000007a120
> R13: 0000000051eb851f R14: 00007ffdb0d0f3d0 R15: 000055d21269ec60
> FS:  00007fdb928cdb80 GS:  0000000000000000
> XFS (sdb): Log I/O Error Detected. Shutting down filesystem
> XFS (sdb): Please unmount the filesystem and rectify the problem(s)
> WARNING: CPU: 0 PID: 1154885 at fs/xfs/xfs_message.c:104 assfail+0x27/0x2a [xfs]
> Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> RIP: 0010:assfail+0x27/0x2a [xfs]
> Code: 0f 0b c3 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 50 7c 6a a0 48 89 fa 31 ff e8 a6 f8 ff ff 80 3d 45 94 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8b b3 a8 02 00 00 48 c7 c7 e8 80 6a a0 c6 05 66 8f 0a
> RSP: 0018:ffffc9000194bbb0 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0699d13
> RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fdb9195c000 CR3: 00000000770bd000 CR4: 00000000000006b0
> Call Trace:
>  xfs_trans_brelse+0x185/0x210 [xfs]
>  xfs_attr_set+0x2ce/0x300 [xfs]
>  xfs_xattr_set+0x54/0xa0 [xfs]
>  __vfs_setxattr+0x66/0x80
>  __vfs_setxattr_noperm+0x54/0xf0
>  vfs_setxattr+0x81/0xa0
>  setxattr+0x146/0x1d0
>  path_setxattr+0xd0/0xf0
>  __x64_sys_setxattr+0x27/0x30
>  do_syscall_64+0x50/0x160
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7fdb91db6839
> Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> irq event stamp: 0
> hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> hardirqs last disabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> softirqs last  enabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> softirqs last disabled at (0): [<0000000000000000>]           (null)
> ---[ end trace f2eeafdb05fd11a1 ]---
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> #PF error: [normal kernel read fault]
> PGD 0 P4D 0 
> Oops: 0000 [#1] PREEMPT SMP
> CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> Call Trace:
>  xfs_attr_set+0x2ce/0x300 [xfs]
>  xfs_xattr_set+0x54/0xa0 [xfs]
>  __vfs_setxattr+0x66/0x80
>  __vfs_setxattr_noperm+0x54/0xf0
>  vfs_setxattr+0x81/0xa0
>  setxattr+0x146/0x1d0
>  path_setxattr+0xd0/0xf0
>  __x64_sys_setxattr+0x27/0x30
>  do_syscall_64+0x50/0x160
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7fdb91db6839
> Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> CR2: 0000000000000038
> ---[ end trace f2eeafdb05fd11a2 ]---
> RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141)
  2019-04-23 18:08 ` Brian Foster
@ 2019-04-23 20:44   ` Darrick J. Wong
  2019-04-23 21:33     ` Brian Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2019-04-23 20:44 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs, Allison Collins

On Tue, Apr 23, 2019 at 02:08:16PM -0400, Brian Foster wrote:
> On Tue, Apr 23, 2019 at 09:49:47AM -0700, Darrick J. Wong wrote:
> > Hi all,
> > 
> > I saw a crash in xfs/141 last night while testing xfs for-next on a V4
> > filesystem.  I think the problem here is that:
> > 
> >  1. xfs_attr_set calls xfs_attr_set_args.
> > 
> >  2. xfs_attr_set_args calls xfs_attr_shortform_to_leaf and sets @leaf_bp.
> > 
> >  3. _set_args calls xfs_defer_finish.
> > 
> >  4. xfs_defer_finish calls xfs_defer_trans_roll calls xfs_trans_roll.
> > 
> >  5. xfs_trans_roll sets @args.trans to the new transaction created by
> >     xfs_trans_dup.
> > 
> >  6. xfs_trans_roll calls __xfs_trans_commit on the old transaction and
> >     fails because we injected a log error.
> > 
> >  7. _trans_roll passes the error code up to _defer_finish.
> > 
> >  8. _defer_finish passes the error code up to _attr_set without joining
> >     anything (in particular leaf_bp which we bhold'd so that we would
> >     retain the buffer lock) to the new transaction.  Note that
> >     @args.trans is still set to the new transaction; nobody's cancelled it.
> > 
> >  8. xfs_attr_set tries to bail out through out_release_leaf.  The
> >     leaf_bp is non-null and args.trans is non-null, so we call
> >     xfs_trans_brelse to release leaf_bp from args.trans
> > 
> >  9. leaf_bp isn't joined to any transaction at all, but we passed in a
> >     non-null args.trans so _trans_brelse hits the "bp->b_transp == tp"
> >     assert and then crashes trying to assert on the buffer log item
> >     which is null because leaf_bp isn't joined to any transaction.
> > 
> > I think the underlying problem here is that we haven't precisely defined
> > what's supposed to happen when the _trans_roll in _defer_trans_roll
> > errors out.  Looking at the three callers of trans_bhold (attr setting,
> > allocating disk space for a new dquot, and online repair) the handling
> > of this situation is not consistent.
> > 
> > (Embarassingly, the first of those three was RVB'd by me and the latter
> > two were written by me :P)
> > 
> > Should we attach the bholded buffers to the new transaction even if the
> > old transaction failed to commit so that callers can xfs_trans_brelse
> > the buffer and it all works out (like what the attr code does)?
> > 
> > Or should we cancel the new transaction and set *tpp to NULL, which will
> > also solve this problem for the attr case since args.trans will be null
> > after the old commit fails?  But then we run into the problem that now
> > every caller has to be prepared to deal with the transaction being null
> > if something went wrong.  All of those callers ought to be calling
> > xfs_trans_cancel if that's the case, so we might be able to get away
> > with adding a if (!tp) return; at the top of that function.
> > 
> > Or are the callers of _defer_finish/_trans_roll simply supposed to know
> > that the buffer might not be joined if the roll fails and deal with it
> > appropriately?
> > 
> > Extra complication: if the rolling succeeds but one of the
> > defer_ops->finish_item() calls fails then the buffer /will/ be joined to
> > the transaction being passed out.
> > 
> > I /think/ the easiest answer here is that we ought to join the buffers
> > to the new transaction even if committing the old one fails.  I also
> > suspect the out_release code in xrep_roll_ag_trans is wrong, but that's
> > my problem to fix. :)
> > 
> 
> That seems like the most simple and predictable approach to me. Then the
> caller doesn't care whether it has the original transaction or a new
> one, it just needs to unwind its actions and cancel the tx in the event
> of an error.

<nod> I think it's the most straightforward option, but wanted to hear
someone else echo that.  Or tell me I'm a few cards short of a deck. :)

> BTW, a couple other things that look potentially suspicious in this same
> codepath:
> 
> 1.) If the referenced xfs_defer_finish() call succeeds and the
> subsequent xfs_trans_roll_inode() fails similarly, it looks like we can
> end up in a similar unowned buffer state in the caller.

Yep.  I think the bjoin has to go before the "if (error) return error;".

> 2.) If we do end up returning to xfs_attr_set_args() with the buffer
> held to the transaction, ISTM it's likely dirty and thus
> xfs_trans_brelse() may not actually do anything with respect to the
> buffer hold. This seems like a potential buffer lock/ref leak vector.

Hm.  I thought we joined and logged the buffer to the first transaction,
which should clear the dirty state when the transaction commits (or
doesn't)?  Then we add the held buffer to the new transaction, but it's
clean, so it should just release...?  Maybe I need to add an error
injection knob to make rolling fail to see what happens...

> A quick look at xfs_buf_item_unlock() suggests we don't drop the hold in
> the abort case, but I could have missed something. If not, perhaps we'd
> also need to make sure we release the hold before we return from
> xfs_attr_set_args().

We don't?  I would have thought:

	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);

would do that?  <shrug> Maybe I'm confused.

> Hmm, looking around a bit more outside of this attr case I see code like
> the following in xfs_dquot_disk_alloc():
> 
> 	...
>         xfs_trans_bhold(tp, bp);
>         error = xfs_defer_finish(tpp);
>         tp = *tpp;
>         if (error) {
>                 xfs_buf_relse(bp);
>                 return error;
> 	}
> 	...
> 
> That code looks like it actually expects the unowned buf state on a
> failed finish as it explicitly releases it on error. Perhaps this is an
> argument for treating a failed xfs_defer_finish() like a failed commit
> and it is the responsibility of the caller to grok that (meaning the
> caller now owns the buf, not the tx). Then again, I'm not sure how the
> caller knows whether xfs_defer_finish() failed due to a tx roll (where

I don't think the caller has any good way of finding out, aside from
probing xfs_buf.b_transp == *tpp and there's a buffer log item and
XFS_BLI_HOLD is set, which is kind of gross...

bool xfs_buf_is_still_held_by_me_or_did_we_screw_up(xfs_buf_t *bp);

> we currently haven't re-held the buf) or something later in dfops
> processing (where we might have already re-held the buf), so that seems
> fishy too... :/

Yeah.  I think the error handling here (in quota) is simply wrong; that
ought to be "xfs_trans_brelse(*tpp, bp);" not a regular xfs_buf_relse.

--D

> 
> Brian
> 
> > --D
> > 
> > run fstests xfs/141 at 2019-04-23 00:01:13
> > XFS (sdc): Mounting V4 Filesystem
> > XFS (sdc): Ending clean mount
> > XFS (sdb): Mounting V4 Filesystem
> > XFS (sdb): Ending clean mount
> > XFS (sdb): Quotacheck needed: Please wait.
> > XFS (sdb): Quotacheck: Done.
> > XFS (sdb): Injecting error (false) at file fs/xfs/xfs_log.c, line 1924, on filesystem "sdb"
> > XFS (sdb): Intentionally corrupted log record at LSN 0x100000e0e. Shutdown imminent.
> > XFS (sdb): metadata I/O error in "xlog_iodone" at daddr 0x23762e len 64 error 0
> > XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c. Return address = ffffffffa064e48b
> > XFS: Assertion failed: bp->b_transp == tp, file: fs/xfs/xfs_trans_buf.c, line: 385
> > potentially unexpected fatal signal 6.
> > CPU: 1 PID: 1154887 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > RIP: 0033:0x7fdb91cd9e97
> > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > RSP: 002b:00007ffdb0d0f080 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > RAX: 0000000000000000 RBX: 0000000003aed64a RCX: 00007fdb91cd9e97
> > RDX: 0000000000000000 RSI: 00007ffdb0d0f080 RDI: 0000000000000002
> > RBP: 0000000000458923 R08: 0000000000000000 R09: 00007ffdb0d0f080
> > R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000005
> > R13: 000000000000031e R14: 00007ffdb0d0f3d0 R15: 000055d212696560
> > FS:  00007fdb928cdb80 GS:  0000000000000000
> > potentially unexpected fatal signal 6.
> > CPU: 1 PID: 1154888 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > RIP: 0033:0x7fdb91cd9e97
> > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > RSP: 002b:00007ffdb0d0eaf0 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fdb91cd9e97
> > RDX: 0000000000000000 RSI: 00007ffdb0d0eaf0 RDI: 0000000000000002
> > RBP: 000055d2128a41c0 R08: 0000000000000000 R09: 00007ffdb0d0eaf0
> > R10: 0000000000000008 R11: 0000000000000246 R12: 000000000007a120
> > R13: 0000000051eb851f R14: 00007ffdb0d0f3d0 R15: 000055d21269ec60
> > FS:  00007fdb928cdb80 GS:  0000000000000000
> > XFS (sdb): Log I/O Error Detected. Shutting down filesystem
> > XFS (sdb): Please unmount the filesystem and rectify the problem(s)
> > WARNING: CPU: 0 PID: 1154885 at fs/xfs/xfs_message.c:104 assfail+0x27/0x2a [xfs]
> > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > RIP: 0010:assfail+0x27/0x2a [xfs]
> > Code: 0f 0b c3 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 50 7c 6a a0 48 89 fa 31 ff e8 a6 f8 ff ff 80 3d 45 94 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8b b3 a8 02 00 00 48 c7 c7 e8 80 6a a0 c6 05 66 8f 0a
> > RSP: 0018:ffffc9000194bbb0 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0699d13
> > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fdb9195c000 CR3: 00000000770bd000 CR4: 00000000000006b0
> > Call Trace:
> >  xfs_trans_brelse+0x185/0x210 [xfs]
> >  xfs_attr_set+0x2ce/0x300 [xfs]
> >  xfs_xattr_set+0x54/0xa0 [xfs]
> >  __vfs_setxattr+0x66/0x80
> >  __vfs_setxattr_noperm+0x54/0xf0
> >  vfs_setxattr+0x81/0xa0
> >  setxattr+0x146/0x1d0
> >  path_setxattr+0xd0/0xf0
> >  __x64_sys_setxattr+0x27/0x30
> >  do_syscall_64+0x50/0x160
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x7fdb91db6839
> > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > irq event stamp: 0
> > hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> > hardirqs last disabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > softirqs last  enabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > softirqs last disabled at (0): [<0000000000000000>]           (null)
> > ---[ end trace f2eeafdb05fd11a1 ]---
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > #PF error: [normal kernel read fault]
> > PGD 0 P4D 0 
> > Oops: 0000 [#1] PREEMPT SMP
> > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > Call Trace:
> >  xfs_attr_set+0x2ce/0x300 [xfs]
> >  xfs_xattr_set+0x54/0xa0 [xfs]
> >  __vfs_setxattr+0x66/0x80
> >  __vfs_setxattr_noperm+0x54/0xf0
> >  vfs_setxattr+0x81/0xa0
> >  setxattr+0x146/0x1d0
> >  path_setxattr+0xd0/0xf0
> >  __x64_sys_setxattr+0x27/0x30
> >  do_syscall_64+0x50/0x160
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x7fdb91db6839
> > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > Dumping ftrace buffer:
> >    (ftrace buffer empty)
> > CR2: 0000000000000038
> > ---[ end trace f2eeafdb05fd11a2 ]---
> > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141)
  2019-04-23 20:44   ` Darrick J. Wong
@ 2019-04-23 21:33     ` Brian Foster
  2019-04-24  3:30       ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2019-04-23 21:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, Allison Collins

On Tue, Apr 23, 2019 at 01:44:33PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 23, 2019 at 02:08:16PM -0400, Brian Foster wrote:
> > On Tue, Apr 23, 2019 at 09:49:47AM -0700, Darrick J. Wong wrote:
> > > Hi all,
> > > 
> > > I saw a crash in xfs/141 last night while testing xfs for-next on a V4
> > > filesystem.  I think the problem here is that:
> > > 
> > >  1. xfs_attr_set calls xfs_attr_set_args.
> > > 
> > >  2. xfs_attr_set_args calls xfs_attr_shortform_to_leaf and sets @leaf_bp.
> > > 
> > >  3. _set_args calls xfs_defer_finish.
> > > 
> > >  4. xfs_defer_finish calls xfs_defer_trans_roll calls xfs_trans_roll.
> > > 
> > >  5. xfs_trans_roll sets @args.trans to the new transaction created by
> > >     xfs_trans_dup.
> > > 
> > >  6. xfs_trans_roll calls __xfs_trans_commit on the old transaction and
> > >     fails because we injected a log error.
> > > 
> > >  7. _trans_roll passes the error code up to _defer_finish.
> > > 
> > >  8. _defer_finish passes the error code up to _attr_set without joining
> > >     anything (in particular leaf_bp which we bhold'd so that we would
> > >     retain the buffer lock) to the new transaction.  Note that
> > >     @args.trans is still set to the new transaction; nobody's cancelled it.
> > > 
> > >  8. xfs_attr_set tries to bail out through out_release_leaf.  The
> > >     leaf_bp is non-null and args.trans is non-null, so we call
> > >     xfs_trans_brelse to release leaf_bp from args.trans
> > > 
> > >  9. leaf_bp isn't joined to any transaction at all, but we passed in a
> > >     non-null args.trans so _trans_brelse hits the "bp->b_transp == tp"
> > >     assert and then crashes trying to assert on the buffer log item
> > >     which is null because leaf_bp isn't joined to any transaction.
> > > 
> > > I think the underlying problem here is that we haven't precisely defined
> > > what's supposed to happen when the _trans_roll in _defer_trans_roll
> > > errors out.  Looking at the three callers of trans_bhold (attr setting,
> > > allocating disk space for a new dquot, and online repair) the handling
> > > of this situation is not consistent.
> > > 
> > > (Embarassingly, the first of those three was RVB'd by me and the latter
> > > two were written by me :P)
> > > 
> > > Should we attach the bholded buffers to the new transaction even if the
> > > old transaction failed to commit so that callers can xfs_trans_brelse
> > > the buffer and it all works out (like what the attr code does)?
> > > 
> > > Or should we cancel the new transaction and set *tpp to NULL, which will
> > > also solve this problem for the attr case since args.trans will be null
> > > after the old commit fails?  But then we run into the problem that now
> > > every caller has to be prepared to deal with the transaction being null
> > > if something went wrong.  All of those callers ought to be calling
> > > xfs_trans_cancel if that's the case, so we might be able to get away
> > > with adding a if (!tp) return; at the top of that function.
> > > 
> > > Or are the callers of _defer_finish/_trans_roll simply supposed to know
> > > that the buffer might not be joined if the roll fails and deal with it
> > > appropriately?
> > > 
> > > Extra complication: if the rolling succeeds but one of the
> > > defer_ops->finish_item() calls fails then the buffer /will/ be joined to
> > > the transaction being passed out.
> > > 
> > > I /think/ the easiest answer here is that we ought to join the buffers
> > > to the new transaction even if committing the old one fails.  I also
> > > suspect the out_release code in xrep_roll_ag_trans is wrong, but that's
> > > my problem to fix. :)
> > > 
> > 
> > That seems like the most simple and predictable approach to me. Then the
> > caller doesn't care whether it has the original transaction or a new
> > one, it just needs to unwind its actions and cancel the tx in the event
> > of an error.
> 
> <nod> I think it's the most straightforward option, but wanted to hear
> someone else echo that.  Or tell me I'm a few cards short of a deck. :)
> 
> > BTW, a couple other things that look potentially suspicious in this same
> > codepath:
> > 
> > 1.) If the referenced xfs_defer_finish() call succeeds and the
> > subsequent xfs_trans_roll_inode() fails similarly, it looks like we can
> > end up in a similar unowned buffer state in the caller.
> 
> Yep.  I think the bjoin has to go before the "if (error) return error;".
> 

Yeah..

> > 2.) If we do end up returning to xfs_attr_set_args() with the buffer
> > held to the transaction, ISTM it's likely dirty and thus
> > xfs_trans_brelse() may not actually do anything with respect to the
> > buffer hold. This seems like a potential buffer lock/ref leak vector.
> 
> Hm.  I thought we joined and logged the buffer to the first transaction,
> which should clear the dirty state when the transaction commits (or
> doesn't)?  Then we add the held buffer to the new transaction, but it's
> clean, so it should just release...?  Maybe I need to add an error
> injection knob to make rolling fail to see what happens...
> 

I think it just depends where we fail. xfs_attr3_leaf_create() logs the
buffer and it looks like we have a couple error checks after that but
before we actually get to the defer finish code. If we fail there, ILTM
that we return with the original transaction with the leaf buffer joined
and dirty (but not yet held). If we do the bhold and make it into
xfs_defer_finish() and fail before we actually roll (it looks like the
only way this can happen is with a dfops held item overflow and thus
probably wouldn't happen here outside of memory corruption or
something), then I suppose we could technically end up in the caller
with a held+dirty buf from there.

Once we do roll, the buffer ends up clean+held as you note above, in
which case an xfs_trans_brelse() will cancel the hold. Though we're also
assuming that any deferred operation in progress doesn't dirty any held
buffers in the current transaction and then happen to fail and return to
the caller with the same buffer hold+dirty state. I don't think that
applies to the attr case (and it may not apply anywhere in the current
code, I haven't really looked), but it might be something to consider if
we're defining general behavior or error handling rules.

(Regardless I think sticking some error tags in some of this bhold error
handling code and seeing what actually happens is probably a good idea.)

> > A quick look at xfs_buf_item_unlock() suggests we don't drop the hold in
> > the abort case, but I could have missed something. If not, perhaps we'd
> > also need to make sure we release the hold before we return from
> > xfs_attr_set_args().
> 
> We don't?  I would have thought:
> 
> 	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
> 
> would do that?  <shrug> Maybe I'm confused.
> 

xfs_buf_item_unlock() samples the current state (into the hold, stale,
etc. bools), clears the bli flags in the line you posted above but then
acts on the original state:

	...
        if (hold || (stale && !released))
                return;

So AFAICT a commit/cancel/abort of a transaction with a held buffer
leaves the buffer lock/release as the responsibility of the caller.

> > Hmm, looking around a bit more outside of this attr case I see code like
> > the following in xfs_dquot_disk_alloc():
> > 
> > 	...
> >         xfs_trans_bhold(tp, bp);
> >         error = xfs_defer_finish(tpp);
> >         tp = *tpp;
> >         if (error) {
> >                 xfs_buf_relse(bp);
> >                 return error;
> > 	}
> > 	...
> > 
> > That code looks like it actually expects the unowned buf state on a
> > failed finish as it explicitly releases it on error. Perhaps this is an
> > argument for treating a failed xfs_defer_finish() like a failed commit
> > and it is the responsibility of the caller to grok that (meaning the
> > caller now owns the buf, not the tx). Then again, I'm not sure how the
> > caller knows whether xfs_defer_finish() failed due to a tx roll (where
> 
> I don't think the caller has any good way of finding out, aside from
> probing xfs_buf.b_transp == *tpp and there's a buffer log item and
> XFS_BLI_HOLD is set, which is kind of gross...
> 

Indeed...

> bool xfs_buf_is_still_held_by_me_or_did_we_screw_up(xfs_buf_t *bp);
> 
> > we currently haven't re-held the buf) or something later in dfops
> > processing (where we might have already re-held the buf), so that seems
> > fishy too... :/
> 
> Yeah.  I think the error handling here (in quota) is simply wrong; that
> ought to be "xfs_trans_brelse(*tpp, bp);" not a regular xfs_buf_relse.
> 

Yeah, I agree. The takeaway I think is just that if we fix up the
xfs_defer_finish() error path to have more consistent behavior then we
might want to audit some of these other codepaths that call with held
bufs/inodes.

Brian

> --D
> 
> > 
> > Brian
> > 
> > > --D
> > > 
> > > run fstests xfs/141 at 2019-04-23 00:01:13
> > > XFS (sdc): Mounting V4 Filesystem
> > > XFS (sdc): Ending clean mount
> > > XFS (sdb): Mounting V4 Filesystem
> > > XFS (sdb): Ending clean mount
> > > XFS (sdb): Quotacheck needed: Please wait.
> > > XFS (sdb): Quotacheck: Done.
> > > XFS (sdb): Injecting error (false) at file fs/xfs/xfs_log.c, line 1924, on filesystem "sdb"
> > > XFS (sdb): Intentionally corrupted log record at LSN 0x100000e0e. Shutdown imminent.
> > > XFS (sdb): metadata I/O error in "xlog_iodone" at daddr 0x23762e len 64 error 0
> > > XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c. Return address = ffffffffa064e48b
> > > XFS: Assertion failed: bp->b_transp == tp, file: fs/xfs/xfs_trans_buf.c, line: 385
> > > potentially unexpected fatal signal 6.
> > > CPU: 1 PID: 1154887 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > RIP: 0033:0x7fdb91cd9e97
> > > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > > RSP: 002b:00007ffdb0d0f080 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > > RAX: 0000000000000000 RBX: 0000000003aed64a RCX: 00007fdb91cd9e97
> > > RDX: 0000000000000000 RSI: 00007ffdb0d0f080 RDI: 0000000000000002
> > > RBP: 0000000000458923 R08: 0000000000000000 R09: 00007ffdb0d0f080
> > > R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000005
> > > R13: 000000000000031e R14: 00007ffdb0d0f3d0 R15: 000055d212696560
> > > FS:  00007fdb928cdb80 GS:  0000000000000000
> > > potentially unexpected fatal signal 6.
> > > CPU: 1 PID: 1154888 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > RIP: 0033:0x7fdb91cd9e97
> > > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > > RSP: 002b:00007ffdb0d0eaf0 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fdb91cd9e97
> > > RDX: 0000000000000000 RSI: 00007ffdb0d0eaf0 RDI: 0000000000000002
> > > RBP: 000055d2128a41c0 R08: 0000000000000000 R09: 00007ffdb0d0eaf0
> > > R10: 0000000000000008 R11: 0000000000000246 R12: 000000000007a120
> > > R13: 0000000051eb851f R14: 00007ffdb0d0f3d0 R15: 000055d21269ec60
> > > FS:  00007fdb928cdb80 GS:  0000000000000000
> > > XFS (sdb): Log I/O Error Detected. Shutting down filesystem
> > > XFS (sdb): Please unmount the filesystem and rectify the problem(s)
> > > WARNING: CPU: 0 PID: 1154885 at fs/xfs/xfs_message.c:104 assfail+0x27/0x2a [xfs]
> > > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > RIP: 0010:assfail+0x27/0x2a [xfs]
> > > Code: 0f 0b c3 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 50 7c 6a a0 48 89 fa 31 ff e8 a6 f8 ff ff 80 3d 45 94 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8b b3 a8 02 00 00 48 c7 c7 e8 80 6a a0 c6 05 66 8f 0a
> > > RSP: 0018:ffffc9000194bbb0 EFLAGS: 00010246
> > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007fdb9195c000 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > Call Trace:
> > >  xfs_trans_brelse+0x185/0x210 [xfs]
> > >  xfs_attr_set+0x2ce/0x300 [xfs]
> > >  xfs_xattr_set+0x54/0xa0 [xfs]
> > >  __vfs_setxattr+0x66/0x80
> > >  __vfs_setxattr_noperm+0x54/0xf0
> > >  vfs_setxattr+0x81/0xa0
> > >  setxattr+0x146/0x1d0
> > >  path_setxattr+0xd0/0xf0
> > >  __x64_sys_setxattr+0x27/0x30
> > >  do_syscall_64+0x50/0x160
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > RIP: 0033:0x7fdb91db6839
> > > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > > irq event stamp: 0
> > > hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> > > hardirqs last disabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > > softirqs last  enabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > > softirqs last disabled at (0): [<0000000000000000>]           (null)
> > > ---[ end trace f2eeafdb05fd11a1 ]---
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > #PF error: [normal kernel read fault]
> > > PGD 0 P4D 0 
> > > Oops: 0000 [#1] PREEMPT SMP
> > > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > Call Trace:
> > >  xfs_attr_set+0x2ce/0x300 [xfs]
> > >  xfs_xattr_set+0x54/0xa0 [xfs]
> > >  __vfs_setxattr+0x66/0x80
> > >  __vfs_setxattr_noperm+0x54/0xf0
> > >  vfs_setxattr+0x81/0xa0
> > >  setxattr+0x146/0x1d0
> > >  path_setxattr+0xd0/0xf0
> > >  __x64_sys_setxattr+0x27/0x30
> > >  do_syscall_64+0x50/0x160
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > RIP: 0033:0x7fdb91db6839
> > > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > > Dumping ftrace buffer:
> > >    (ftrace buffer empty)
> > > CR2: 0000000000000038
> > > ---[ end trace f2eeafdb05fd11a2 ]---
> > > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141)
  2019-04-23 21:33     ` Brian Foster
@ 2019-04-24  3:30       ` Darrick J. Wong
  2019-04-24 11:19         ` Brian Foster
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2019-04-24  3:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs, Allison Collins

iOn Tue, Apr 24, 2019 at 05:33:59PM -0400, Brian Foster wrote:
> On Tue, Apr 23, 2019 at 01:44:33PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 23, 2019 at 02:08:16PM -0400, Brian Foster wrote:
> > > On Tue, Apr 23, 2019 at 09:49:47AM -0700, Darrick J. Wong wrote:
> > > > Hi all,
> > > > 
> > > > I saw a crash in xfs/141 last night while testing xfs for-next on a V4
> > > > filesystem.  I think the problem here is that:
> > > > 
> > > >  1. xfs_attr_set calls xfs_attr_set_args.
> > > > 
> > > >  2. xfs_attr_set_args calls xfs_attr_shortform_to_leaf and sets @leaf_bp.
> > > > 
> > > >  3. _set_args calls xfs_defer_finish.
> > > > 
> > > >  4. xfs_defer_finish calls xfs_defer_trans_roll calls xfs_trans_roll.
> > > > 
> > > >  5. xfs_trans_roll sets @args.trans to the new transaction created by
> > > >     xfs_trans_dup.
> > > > 
> > > >  6. xfs_trans_roll calls __xfs_trans_commit on the old transaction and
> > > >     fails because we injected a log error.
> > > > 
> > > >  7. _trans_roll passes the error code up to _defer_finish.
> > > > 
> > > >  8. _defer_finish passes the error code up to _attr_set without joining
> > > >     anything (in particular leaf_bp which we bhold'd so that we would
> > > >     retain the buffer lock) to the new transaction.  Note that
> > > >     @args.trans is still set to the new transaction; nobody's cancelled it.
> > > > 
> > > >  8. xfs_attr_set tries to bail out through out_release_leaf.  The
> > > >     leaf_bp is non-null and args.trans is non-null, so we call
> > > >     xfs_trans_brelse to release leaf_bp from args.trans
> > > > 
> > > >  9. leaf_bp isn't joined to any transaction at all, but we passed in a
> > > >     non-null args.trans so _trans_brelse hits the "bp->b_transp == tp"
> > > >     assert and then crashes trying to assert on the buffer log item
> > > >     which is null because leaf_bp isn't joined to any transaction.
> > > > 
> > > > I think the underlying problem here is that we haven't precisely defined
> > > > what's supposed to happen when the _trans_roll in _defer_trans_roll
> > > > errors out.  Looking at the three callers of trans_bhold (attr setting,
> > > > allocating disk space for a new dquot, and online repair) the handling
> > > > of this situation is not consistent.
> > > > 
> > > > (Embarassingly, the first of those three was RVB'd by me and the latter
> > > > two were written by me :P)
> > > > 
> > > > Should we attach the bholded buffers to the new transaction even if the
> > > > old transaction failed to commit so that callers can xfs_trans_brelse
> > > > the buffer and it all works out (like what the attr code does)?
> > > > 
> > > > Or should we cancel the new transaction and set *tpp to NULL, which will
> > > > also solve this problem for the attr case since args.trans will be null
> > > > after the old commit fails?  But then we run into the problem that now
> > > > every caller has to be prepared to deal with the transaction being null
> > > > if something went wrong.  All of those callers ought to be calling
> > > > xfs_trans_cancel if that's the case, so we might be able to get away
> > > > with adding a if (!tp) return; at the top of that function.
> > > > 
> > > > Or are the callers of _defer_finish/_trans_roll simply supposed to know
> > > > that the buffer might not be joined if the roll fails and deal with it
> > > > appropriately?
> > > > 
> > > > Extra complication: if the rolling succeeds but one of the
> > > > defer_ops->finish_item() calls fails then the buffer /will/ be joined to
> > > > the transaction being passed out.
> > > > 
> > > > I /think/ the easiest answer here is that we ought to join the buffers
> > > > to the new transaction even if committing the old one fails.  I also
> > > > suspect the out_release code in xrep_roll_ag_trans is wrong, but that's
> > > > my problem to fix. :)
> > > > 
> > > 
> > > That seems like the most simple and predictable approach to me. Then the
> > > caller doesn't care whether it has the original transaction or a new
> > > one, it just needs to unwind its actions and cancel the tx in the event
> > > of an error.
> > 
> > <nod> I think it's the most straightforward option, but wanted to hear
> > someone else echo that.  Or tell me I'm a few cards short of a deck. :)
> > 
> > > BTW, a couple other things that look potentially suspicious in this same
> > > codepath:
> > > 
> > > 1.) If the referenced xfs_defer_finish() call succeeds and the
> > > subsequent xfs_trans_roll_inode() fails similarly, it looks like we can
> > > end up in a similar unowned buffer state in the caller.
> > 
> > Yep.  I think the bjoin has to go before the "if (error) return error;".
> > 
> 
> Yeah..
> 
> > > 2.) If we do end up returning to xfs_attr_set_args() with the buffer
> > > held to the transaction, ISTM it's likely dirty and thus
> > > xfs_trans_brelse() may not actually do anything with respect to the
> > > buffer hold. This seems like a potential buffer lock/ref leak vector.
> > 
> > Hm.  I thought we joined and logged the buffer to the first transaction,
> > which should clear the dirty state when the transaction commits (or
> > doesn't)?  Then we add the held buffer to the new transaction, but it's
> > clean, so it should just release...?  Maybe I need to add an error
> > injection knob to make rolling fail to see what happens...
> > 
> 
> I think it just depends where we fail. xfs_attr3_leaf_create() logs the
> buffer and it looks like we have a couple error checks after that but
> before we actually get to the defer finish code. If we fail there, ILTM
> that we return with the original transaction with the leaf buffer joined
> and dirty (but not yet held). If we do the bhold and make it into
> xfs_defer_finish() and fail before we actually roll (it looks like the
> only way this can happen is with a dfops held item overflow and thus
> probably wouldn't happen here outside of memory corruption or
> something), then I suppose we could technically end up in the caller
> with a held+dirty buf from there.
> 
> Once we do roll, the buffer ends up clean+held as you note above, in
> which case an xfs_trans_brelse() will cancel the hold. Though we're also
> assuming that any deferred operation in progress doesn't dirty any held
> buffers in the current transaction and then happen to fail and return to
> the caller with the same buffer hold+dirty state. I don't think that
> applies to the attr case (and it may not apply anywhere in the current
> code, I haven't really looked), but it might be something to consider if
> we're defining general behavior or error handling rules.
> 
> (Regardless I think sticking some error tags in some of this bhold error
> handling code and seeing what actually happens is probably a good idea.)
> 
> > > A quick look at xfs_buf_item_unlock() suggests we don't drop the hold in
> > > the abort case, but I could have missed something. If not, perhaps we'd
> > > also need to make sure we release the hold before we return from
> > > xfs_attr_set_args().
> > 
> > We don't?  I would have thought:
> > 
> > 	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
> > 
> > would do that?  <shrug> Maybe I'm confused.
> > 
> 
> xfs_buf_item_unlock() samples the current state (into the hold, stale,
> etc. bools), clears the bli flags in the line you posted above but then
> acts on the original state:
> 
> 	...
>         if (hold || (stale && !released))
>                 return;
> 
> So AFAICT a commit/cancel/abort of a transaction with a held buffer
> leaves the buffer lock/release as the responsibility of the caller.

Hm.  One thing that's been bugging me for a long time about the way we
maintain locks on objects across a defer roll is that we treat inodes
and buffers differently:

For inodes we ijoin them to a transaction along with ilock_flags that
tell the transaction what we want to unlock when the transaction commits
or cancels.  We interpret ilock_flags == 0 in the inode log item to mean
"let the caller unlock the inode when the transaction is done", which
means that the caller must xfs_iunlock and xfs_irele after disposing of
that new transaction.

However, held buffers are treated differently -- BLI_HOLD being set in
the buffer log item means the transaction commit (or cancel) maintains
the lock but always clears the hold.  Hence the defer roll rejoins and
reholds the buffer to the new transaction unless something goes wrong,
which is how we ended up in this weird place.

So what if we just change the buffer join rules to be more like inodes?
Setting BLI_HOLD means that the buffer will not be unlocked or released
as part of transaction commit or cancel; the defer roll will join the
buffer (and the hold state) to the new transaction if nothing went
wrong, and we make it the responsibility of the caller to xfs_buf_unlock
and xfs_buf_relee the buffer after disposing of the new transaction,
just like we do for inodes.

There are only four users of holds: the defer code itself; dquot
allocation; attr leaf conversion; inode allocation; and online repair,
so it won't be hard to audit and fix all four if we change the behavior.

--D

> 
> > > Hmm, looking around a bit more outside of this attr case I see code like
> > > the following in xfs_dquot_disk_alloc():
> > > 
> > > 	...
> > >         xfs_trans_bhold(tp, bp);
> > >         error = xfs_defer_finish(tpp);
> > >         tp = *tpp;
> > >         if (error) {
> > >                 xfs_buf_relse(bp);
> > >                 return error;
> > > 	}
> > > 	...
> > > 
> > > That code looks like it actually expects the unowned buf state on a
> > > failed finish as it explicitly releases it on error. Perhaps this is an
> > > argument for treating a failed xfs_defer_finish() like a failed commit
> > > and it is the responsibility of the caller to grok that (meaning the
> > > caller now owns the buf, not the tx). Then again, I'm not sure how the
> > > caller knows whether xfs_defer_finish() failed due to a tx roll (where
> > 
> > I don't think the caller has any good way of finding out, aside from
> > probing xfs_buf.b_transp == *tpp and there's a buffer log item and
> > XFS_BLI_HOLD is set, which is kind of gross...
> > 
> 
> Indeed...
> 
> > bool xfs_buf_is_still_held_by_me_or_did_we_screw_up(xfs_buf_t *bp);
> > 
> > > we currently haven't re-held the buf) or something later in dfops
> > > processing (where we might have already re-held the buf), so that seems
> > > fishy too... :/
> > 
> > Yeah.  I think the error handling here (in quota) is simply wrong; that
> > ought to be "xfs_trans_brelse(*tpp, bp);" not a regular xfs_buf_relse.
> > 
> 
> Yeah, I agree. The takeaway I think is just that if we fix up the
> xfs_defer_finish() error path to have more consistent behavior then we
> might want to audit some of these other codepaths that call with held
> bufs/inodes.
> 
> Brian
> 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > run fstests xfs/141 at 2019-04-23 00:01:13
> > > > XFS (sdc): Mounting V4 Filesystem
> > > > XFS (sdc): Ending clean mount
> > > > XFS (sdb): Mounting V4 Filesystem
> > > > XFS (sdb): Ending clean mount
> > > > XFS (sdb): Quotacheck needed: Please wait.
> > > > XFS (sdb): Quotacheck: Done.
> > > > XFS (sdb): Injecting error (false) at file fs/xfs/xfs_log.c, line 1924, on filesystem "sdb"
> > > > XFS (sdb): Intentionally corrupted log record at LSN 0x100000e0e. Shutdown imminent.
> > > > XFS (sdb): metadata I/O error in "xlog_iodone" at daddr 0x23762e len 64 error 0
> > > > XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c. Return address = ffffffffa064e48b
> > > > XFS: Assertion failed: bp->b_transp == tp, file: fs/xfs/xfs_trans_buf.c, line: 385
> > > > potentially unexpected fatal signal 6.
> > > > CPU: 1 PID: 1154887 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > RIP: 0033:0x7fdb91cd9e97
> > > > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > > > RSP: 002b:00007ffdb0d0f080 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > > > RAX: 0000000000000000 RBX: 0000000003aed64a RCX: 00007fdb91cd9e97
> > > > RDX: 0000000000000000 RSI: 00007ffdb0d0f080 RDI: 0000000000000002
> > > > RBP: 0000000000458923 R08: 0000000000000000 R09: 00007ffdb0d0f080
> > > > R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000005
> > > > R13: 000000000000031e R14: 00007ffdb0d0f3d0 R15: 000055d212696560
> > > > FS:  00007fdb928cdb80 GS:  0000000000000000
> > > > potentially unexpected fatal signal 6.
> > > > CPU: 1 PID: 1154888 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > RIP: 0033:0x7fdb91cd9e97
> > > > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > > > RSP: 002b:00007ffdb0d0eaf0 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > > > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fdb91cd9e97
> > > > RDX: 0000000000000000 RSI: 00007ffdb0d0eaf0 RDI: 0000000000000002
> > > > RBP: 000055d2128a41c0 R08: 0000000000000000 R09: 00007ffdb0d0eaf0
> > > > R10: 0000000000000008 R11: 0000000000000246 R12: 000000000007a120
> > > > R13: 0000000051eb851f R14: 00007ffdb0d0f3d0 R15: 000055d21269ec60
> > > > FS:  00007fdb928cdb80 GS:  0000000000000000
> > > > XFS (sdb): Log I/O Error Detected. Shutting down filesystem
> > > > XFS (sdb): Please unmount the filesystem and rectify the problem(s)
> > > > WARNING: CPU: 0 PID: 1154885 at fs/xfs/xfs_message.c:104 assfail+0x27/0x2a [xfs]
> > > > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > > > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > RIP: 0010:assfail+0x27/0x2a [xfs]
> > > > Code: 0f 0b c3 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 50 7c 6a a0 48 89 fa 31 ff e8 a6 f8 ff ff 80 3d 45 94 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8b b3 a8 02 00 00 48 c7 c7 e8 80 6a a0 c6 05 66 8f 0a
> > > > RSP: 0018:ffffc9000194bbb0 EFLAGS: 00010246
> > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 00007fdb9195c000 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > Call Trace:
> > > >  xfs_trans_brelse+0x185/0x210 [xfs]
> > > >  xfs_attr_set+0x2ce/0x300 [xfs]
> > > >  xfs_xattr_set+0x54/0xa0 [xfs]
> > > >  __vfs_setxattr+0x66/0x80
> > > >  __vfs_setxattr_noperm+0x54/0xf0
> > > >  vfs_setxattr+0x81/0xa0
> > > >  setxattr+0x146/0x1d0
> > > >  path_setxattr+0xd0/0xf0
> > > >  __x64_sys_setxattr+0x27/0x30
> > > >  do_syscall_64+0x50/0x160
> > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > RIP: 0033:0x7fdb91db6839
> > > > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > > > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > > > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > > > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > > > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > > > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > > > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > > > irq event stamp: 0
> > > > hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> > > > hardirqs last disabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > > > softirqs last  enabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > > > softirqs last disabled at (0): [<0000000000000000>]           (null)
> > > > ---[ end trace f2eeafdb05fd11a1 ]---
> > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > > #PF error: [normal kernel read fault]
> > > > PGD 0 P4D 0 
> > > > Oops: 0000 [#1] PREEMPT SMP
> > > > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > > > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > > > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > Call Trace:
> > > >  xfs_attr_set+0x2ce/0x300 [xfs]
> > > >  xfs_xattr_set+0x54/0xa0 [xfs]
> > > >  __vfs_setxattr+0x66/0x80
> > > >  __vfs_setxattr_noperm+0x54/0xf0
> > > >  vfs_setxattr+0x81/0xa0
> > > >  setxattr+0x146/0x1d0
> > > >  path_setxattr+0xd0/0xf0
> > > >  __x64_sys_setxattr+0x27/0x30
> > > >  do_syscall_64+0x50/0x160
> > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > RIP: 0033:0x7fdb91db6839
> > > > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > > > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > > > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > > > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > > > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > > > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > > > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > > > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > > > Dumping ftrace buffer:
> > > >    (ftrace buffer empty)
> > > > CR2: 0000000000000038
> > > > ---[ end trace f2eeafdb05fd11a2 ]---
> > > > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > > > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > > > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141)
  2019-04-24  3:30       ` Darrick J. Wong
@ 2019-04-24 11:19         ` Brian Foster
  2019-04-24 15:43           ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2019-04-24 11:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, Allison Collins

On Tue, Apr 23, 2019 at 08:30:26PM -0700, Darrick J. Wong wrote:
> iOn Tue, Apr 24, 2019 at 05:33:59PM -0400, Brian Foster wrote:
> > On Tue, Apr 23, 2019 at 01:44:33PM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 23, 2019 at 02:08:16PM -0400, Brian Foster wrote:
> > > > On Tue, Apr 23, 2019 at 09:49:47AM -0700, Darrick J. Wong wrote:
> > > > > Hi all,
> > > > > 
> > > > > I saw a crash in xfs/141 last night while testing xfs for-next on a V4
> > > > > filesystem.  I think the problem here is that:
> > > > > 
> > > > >  1. xfs_attr_set calls xfs_attr_set_args.
> > > > > 
> > > > >  2. xfs_attr_set_args calls xfs_attr_shortform_to_leaf and sets @leaf_bp.
> > > > > 
> > > > >  3. _set_args calls xfs_defer_finish.
> > > > > 
> > > > >  4. xfs_defer_finish calls xfs_defer_trans_roll calls xfs_trans_roll.
> > > > > 
> > > > >  5. xfs_trans_roll sets @args.trans to the new transaction created by
> > > > >     xfs_trans_dup.
> > > > > 
> > > > >  6. xfs_trans_roll calls __xfs_trans_commit on the old transaction and
> > > > >     fails because we injected a log error.
> > > > > 
> > > > >  7. _trans_roll passes the error code up to _defer_finish.
> > > > > 
> > > > >  8. _defer_finish passes the error code up to _attr_set without joining
> > > > >     anything (in particular leaf_bp which we bhold'd so that we would
> > > > >     retain the buffer lock) to the new transaction.  Note that
> > > > >     @args.trans is still set to the new transaction; nobody's cancelled it.
> > > > > 
> > > > >  8. xfs_attr_set tries to bail out through out_release_leaf.  The
> > > > >     leaf_bp is non-null and args.trans is non-null, so we call
> > > > >     xfs_trans_brelse to release leaf_bp from args.trans
> > > > > 
> > > > >  9. leaf_bp isn't joined to any transaction at all, but we passed in a
> > > > >     non-null args.trans so _trans_brelse hits the "bp->b_transp == tp"
> > > > >     assert and then crashes trying to assert on the buffer log item
> > > > >     which is null because leaf_bp isn't joined to any transaction.
> > > > > 
> > > > > I think the underlying problem here is that we haven't precisely defined
> > > > > what's supposed to happen when the _trans_roll in _defer_trans_roll
> > > > > errors out.  Looking at the three callers of trans_bhold (attr setting,
> > > > > allocating disk space for a new dquot, and online repair) the handling
> > > > > of this situation is not consistent.
> > > > > 
> > > > > (Embarassingly, the first of those three was RVB'd by me and the latter
> > > > > two were written by me :P)
> > > > > 
> > > > > Should we attach the bholded buffers to the new transaction even if the
> > > > > old transaction failed to commit so that callers can xfs_trans_brelse
> > > > > the buffer and it all works out (like what the attr code does)?
> > > > > 
> > > > > Or should we cancel the new transaction and set *tpp to NULL, which will
> > > > > also solve this problem for the attr case since args.trans will be null
> > > > > after the old commit fails?  But then we run into the problem that now
> > > > > every caller has to be prepared to deal with the transaction being null
> > > > > if something went wrong.  All of those callers ought to be calling
> > > > > xfs_trans_cancel if that's the case, so we might be able to get away
> > > > > with adding a if (!tp) return; at the top of that function.
> > > > > 
> > > > > Or are the callers of _defer_finish/_trans_roll simply supposed to know
> > > > > that the buffer might not be joined if the roll fails and deal with it
> > > > > appropriately?
> > > > > 
> > > > > Extra complication: if the rolling succeeds but one of the
> > > > > defer_ops->finish_item() calls fails then the buffer /will/ be joined to
> > > > > the transaction being passed out.
> > > > > 
> > > > > I /think/ the easiest answer here is that we ought to join the buffers
> > > > > to the new transaction even if committing the old one fails.  I also
> > > > > suspect the out_release code in xrep_roll_ag_trans is wrong, but that's
> > > > > my problem to fix. :)
> > > > > 
> > > > 
> > > > That seems like the most simple and predictable approach to me. Then the
> > > > caller doesn't care whether it has the original transaction or a new
> > > > one, it just needs to unwind its actions and cancel the tx in the event
> > > > of an error.
> > > 
> > > <nod> I think it's the most straightforward option, but wanted to hear
> > > someone else echo that.  Or tell me I'm a few cards short of a deck. :)
> > > 
> > > > BTW, a couple other things that look potentially suspicious in this same
> > > > codepath:
> > > > 
> > > > 1.) If the referenced xfs_defer_finish() call succeeds and the
> > > > subsequent xfs_trans_roll_inode() fails similarly, it looks like we can
> > > > end up in a similar unowned buffer state in the caller.
> > > 
> > > Yep.  I think the bjoin has to go before the "if (error) return error;".
> > > 
> > 
> > Yeah..
> > 
> > > > 2.) If we do end up returning to xfs_attr_set_args() with the buffer
> > > > held to the transaction, ISTM it's likely dirty and thus
> > > > xfs_trans_brelse() may not actually do anything with respect to the
> > > > buffer hold. This seems like a potential buffer lock/ref leak vector.
> > > 
> > > Hm.  I thought we joined and logged the buffer to the first transaction,
> > > which should clear the dirty state when the transaction commits (or
> > > doesn't)?  Then we add the held buffer to the new transaction, but it's
> > > clean, so it should just release...?  Maybe I need to add an error
> > > injection knob to make rolling fail to see what happens...
> > > 
> > 
> > I think it just depends where we fail. xfs_attr3_leaf_create() logs the
> > buffer and it looks like we have a couple error checks after that but
> > before we actually get to the defer finish code. If we fail there, ILTM
> > that we return with the original transaction with the leaf buffer joined
> > and dirty (but not yet held). If we do the bhold and make it into
> > xfs_defer_finish() and fail before we actually roll (it looks like the
> > only way this can happen is with a dfops held item overflow and thus
> > probably wouldn't happen here outside of memory corruption or
> > something), then I suppose we could technically end up in the caller
> > with a held+dirty buf from there.
> > 
> > Once we do roll, the buffer ends up clean+held as you note above, in
> > which case an xfs_trans_brelse() will cancel the hold. Though we're also
> > assuming that any deferred operation in progress doesn't dirty any held
> > buffers in the current transaction and then happen to fail and return to
> > the caller with the same buffer hold+dirty state. I don't think that
> > applies to the attr case (and it may not apply anywhere in the current
> > code, I haven't really looked), but it might be something to consider if
> > we're defining general behavior or error handling rules.
> > 
> > (Regardless I think sticking some error tags in some of this bhold error
> > handling code and seeing what actually happens is probably a good idea.)
> > 
> > > > A quick look at xfs_buf_item_unlock() suggests we don't drop the hold in
> > > > the abort case, but I could have missed something. If not, perhaps we'd
> > > > also need to make sure we release the hold before we return from
> > > > xfs_attr_set_args().
> > > 
> > > We don't?  I would have thought:
> > > 
> > > 	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
> > > 
> > > would do that?  <shrug> Maybe I'm confused.
> > > 
> > 
> > xfs_buf_item_unlock() samples the current state (into the hold, stale,
> > etc. bools), clears the bli flags in the line you posted above but then
> > acts on the original state:
> > 
> > 	...
> >         if (hold || (stale && !released))
> >                 return;
> > 
> > So AFAICT a commit/cancel/abort of a transaction with a held buffer
> > leaves the buffer lock/release as the responsibility of the caller.
> 
> Hm.  One thing that's been bugging me for a long time about the way we
> maintain locks on objects across a defer roll is that we treat inodes
> and buffers differently:
> 
> For inodes we ijoin them to a transaction along with ilock_flags that
> tell the transaction what we want to unlock when the transaction commits
> or cancels.  We interpret ilock_flags == 0 in the inode log item to mean
> "let the caller unlock the inode when the transaction is done", which
> means that the caller must xfs_iunlock and xfs_irele after disposing of
> that new transaction.
> 
> However, held buffers are treated differently -- BLI_HOLD being set in
> the buffer log item means the transaction commit (or cancel) maintains
> the lock but always clears the hold.  Hence the defer roll rejoins and
> reholds the buffer to the new transaction unless something goes wrong,
> which is how we ended up in this weird place.
> 

The tx commit clears the bli hold state flag, but that's because the bli
is no longer associated with the transaction (which has been destroyed).
This is similar to how xfs_inode_item_unlock() does the following:

        lock_flags = iip->ili_lock_flags;
        iip->ili_lock_flags = 0;
        if (lock_flags)
                xfs_iunlock(ip, lock_flags);

... to clear the ili state and act on lock_flags. In both cases (a held
buf or lock_flags == 0 inode), the object remains owned by the caller.

> So what if we just change the buffer join rules to be more like inodes?
> Setting BLI_HOLD means that the buffer will not be unlocked or released
> as part of transaction commit or cancel; the defer roll will join the
> buffer (and the hold state) to the new transaction if nothing went
> wrong, and we make it the responsibility of the caller to xfs_buf_unlock
> and xfs_buf_relee the buffer after disposing of the new transaction,
> just like we do for inodes.
> 

Hmmm not sure I follow... what you describe above sounds like the
current behavior of a BLI_HOLD buffer in a transaction. The
xfs_defer_finish() call carries that state forward across however many
transaction rolls it does and then the final xfs_trans_commit() leaves
the objects owned by the caller according to its bhold/lock_flags
semantics.

To put it another way, any caller that does xfs_trans_bhold() or
xfs_trans_ijoin(ip, 0) of an associated object currently expects to have
to release that object after trans commit. Are you suggesting that we
lift that behavior to xfs_defer_finish() such that after the defer call
said objects are owned by the caller at that point (i.e., the caller
would have to rejoin to the final transaction if that is desired)?

Brian

> There are only four users of holds: the defer code itself; dquot
> allocation; attr leaf conversion; inode allocation; and online repair,
> so it won't be hard to audit and fix all four if we change the behavior.
> 
> --D
> 
> > 
> > > > Hmm, looking around a bit more outside of this attr case I see code like
> > > > the following in xfs_dquot_disk_alloc():
> > > > 
> > > > 	...
> > > >         xfs_trans_bhold(tp, bp);
> > > >         error = xfs_defer_finish(tpp);
> > > >         tp = *tpp;
> > > >         if (error) {
> > > >                 xfs_buf_relse(bp);
> > > >                 return error;
> > > > 	}
> > > > 	...
> > > > 
> > > > That code looks like it actually expects the unowned buf state on a
> > > > failed finish as it explicitly releases it on error. Perhaps this is an
> > > > argument for treating a failed xfs_defer_finish() like a failed commit
> > > > and it is the responsibility of the caller to grok that (meaning the
> > > > caller now owns the buf, not the tx). Then again, I'm not sure how the
> > > > caller knows whether xfs_defer_finish() failed due to a tx roll (where
> > > 
> > > I don't think the caller has any good way of finding out, aside from
> > > probing xfs_buf.b_transp == *tpp and there's a buffer log item and
> > > XFS_BLI_HOLD is set, which is kind of gross...
> > > 
> > 
> > Indeed...
> > 
> > > bool xfs_buf_is_still_held_by_me_or_did_we_screw_up(xfs_buf_t *bp);
> > > 
> > > > we currently haven't re-held the buf) or something later in dfops
> > > > processing (where we might have already re-held the buf), so that seems
> > > > fishy too... :/
> > > 
> > > Yeah.  I think the error handling here (in quota) is simply wrong; that
> > > ought to be "xfs_trans_brelse(*tpp, bp);" not a regular xfs_buf_relse.
> > > 
> > 
> > Yeah, I agree. The takeaway I think is just that if we fix up the
> > xfs_defer_finish() error path to have more consistent behavior then we
> > might want to audit some of these other codepaths that call with held
> > bufs/inodes.
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > 
> > > > Brian
> > > > 
> > > > > --D
> > > > > 
> > > > > run fstests xfs/141 at 2019-04-23 00:01:13
> > > > > XFS (sdc): Mounting V4 Filesystem
> > > > > XFS (sdc): Ending clean mount
> > > > > XFS (sdb): Mounting V4 Filesystem
> > > > > XFS (sdb): Ending clean mount
> > > > > XFS (sdb): Quotacheck needed: Please wait.
> > > > > XFS (sdb): Quotacheck: Done.
> > > > > XFS (sdb): Injecting error (false) at file fs/xfs/xfs_log.c, line 1924, on filesystem "sdb"
> > > > > XFS (sdb): Intentionally corrupted log record at LSN 0x100000e0e. Shutdown imminent.
> > > > > XFS (sdb): metadata I/O error in "xlog_iodone" at daddr 0x23762e len 64 error 0
> > > > > XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c. Return address = ffffffffa064e48b
> > > > > XFS: Assertion failed: bp->b_transp == tp, file: fs/xfs/xfs_trans_buf.c, line: 385
> > > > > potentially unexpected fatal signal 6.
> > > > > CPU: 1 PID: 1154887 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > > RIP: 0033:0x7fdb91cd9e97
> > > > > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > > > > RSP: 002b:00007ffdb0d0f080 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > > > > RAX: 0000000000000000 RBX: 0000000003aed64a RCX: 00007fdb91cd9e97
> > > > > RDX: 0000000000000000 RSI: 00007ffdb0d0f080 RDI: 0000000000000002
> > > > > RBP: 0000000000458923 R08: 0000000000000000 R09: 00007ffdb0d0f080
> > > > > R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000005
> > > > > R13: 000000000000031e R14: 00007ffdb0d0f3d0 R15: 000055d212696560
> > > > > FS:  00007fdb928cdb80 GS:  0000000000000000
> > > > > potentially unexpected fatal signal 6.
> > > > > CPU: 1 PID: 1154888 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > > RIP: 0033:0x7fdb91cd9e97
> > > > > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > > > > RSP: 002b:00007ffdb0d0eaf0 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > > > > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fdb91cd9e97
> > > > > RDX: 0000000000000000 RSI: 00007ffdb0d0eaf0 RDI: 0000000000000002
> > > > > RBP: 000055d2128a41c0 R08: 0000000000000000 R09: 00007ffdb0d0eaf0
> > > > > R10: 0000000000000008 R11: 0000000000000246 R12: 000000000007a120
> > > > > R13: 0000000051eb851f R14: 00007ffdb0d0f3d0 R15: 000055d21269ec60
> > > > > FS:  00007fdb928cdb80 GS:  0000000000000000
> > > > > XFS (sdb): Log I/O Error Detected. Shutting down filesystem
> > > > > XFS (sdb): Please unmount the filesystem and rectify the problem(s)
> > > > > WARNING: CPU: 0 PID: 1154885 at fs/xfs/xfs_message.c:104 assfail+0x27/0x2a [xfs]
> > > > > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > > > > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > > RIP: 0010:assfail+0x27/0x2a [xfs]
> > > > > Code: 0f 0b c3 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 50 7c 6a a0 48 89 fa 31 ff e8 a6 f8 ff ff 80 3d 45 94 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8b b3 a8 02 00 00 48 c7 c7 e8 80 6a a0 c6 05 66 8f 0a
> > > > > RSP: 0018:ffffc9000194bbb0 EFLAGS: 00010246
> > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 00007fdb9195c000 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > > Call Trace:
> > > > >  xfs_trans_brelse+0x185/0x210 [xfs]
> > > > >  xfs_attr_set+0x2ce/0x300 [xfs]
> > > > >  xfs_xattr_set+0x54/0xa0 [xfs]
> > > > >  __vfs_setxattr+0x66/0x80
> > > > >  __vfs_setxattr_noperm+0x54/0xf0
> > > > >  vfs_setxattr+0x81/0xa0
> > > > >  setxattr+0x146/0x1d0
> > > > >  path_setxattr+0xd0/0xf0
> > > > >  __x64_sys_setxattr+0x27/0x30
> > > > >  do_syscall_64+0x50/0x160
> > > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > RIP: 0033:0x7fdb91db6839
> > > > > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > > > > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > > > > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > > > > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > > > > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > > > > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > > > > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > > > > irq event stamp: 0
> > > > > hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> > > > > hardirqs last disabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > > > > softirqs last  enabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > > > > softirqs last disabled at (0): [<0000000000000000>]           (null)
> > > > > ---[ end trace f2eeafdb05fd11a1 ]---
> > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > > > #PF error: [normal kernel read fault]
> > > > > PGD 0 P4D 0 
> > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > > > > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > > > > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > > Call Trace:
> > > > >  xfs_attr_set+0x2ce/0x300 [xfs]
> > > > >  xfs_xattr_set+0x54/0xa0 [xfs]
> > > > >  __vfs_setxattr+0x66/0x80
> > > > >  __vfs_setxattr_noperm+0x54/0xf0
> > > > >  vfs_setxattr+0x81/0xa0
> > > > >  setxattr+0x146/0x1d0
> > > > >  path_setxattr+0xd0/0xf0
> > > > >  __x64_sys_setxattr+0x27/0x30
> > > > >  do_syscall_64+0x50/0x160
> > > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > RIP: 0033:0x7fdb91db6839
> > > > > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > > > > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > > > > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > > > > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > > > > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > > > > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > > > > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > > > > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > > > > Dumping ftrace buffer:
> > > > >    (ftrace buffer empty)
> > > > > CR2: 0000000000000038
> > > > > ---[ end trace f2eeafdb05fd11a2 ]---
> > > > > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > > > > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > > > > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141)
  2019-04-24 11:19         ` Brian Foster
@ 2019-04-24 15:43           ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2019-04-24 15:43 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs, Allison Collins

On Wed, Apr 24, 2019 at 07:19:33AM -0400, Brian Foster wrote:
> On Tue, Apr 23, 2019 at 08:30:26PM -0700, Darrick J. Wong wrote:
> > iOn Tue, Apr 24, 2019 at 05:33:59PM -0400, Brian Foster wrote:
> > > On Tue, Apr 23, 2019 at 01:44:33PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Apr 23, 2019 at 02:08:16PM -0400, Brian Foster wrote:
> > > > > On Tue, Apr 23, 2019 at 09:49:47AM -0700, Darrick J. Wong wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > I saw a crash in xfs/141 last night while testing xfs for-next on a V4
> > > > > > filesystem.  I think the problem here is that:
> > > > > > 
> > > > > >  1. xfs_attr_set calls xfs_attr_set_args.
> > > > > > 
> > > > > >  2. xfs_attr_set_args calls xfs_attr_shortform_to_leaf and sets @leaf_bp.
> > > > > > 
> > > > > >  3. _set_args calls xfs_defer_finish.
> > > > > > 
> > > > > >  4. xfs_defer_finish calls xfs_defer_trans_roll calls xfs_trans_roll.
> > > > > > 
> > > > > >  5. xfs_trans_roll sets @args.trans to the new transaction created by
> > > > > >     xfs_trans_dup.
> > > > > > 
> > > > > >  6. xfs_trans_roll calls __xfs_trans_commit on the old transaction and
> > > > > >     fails because we injected a log error.
> > > > > > 
> > > > > >  7. _trans_roll passes the error code up to _defer_finish.
> > > > > > 
> > > > > >  8. _defer_finish passes the error code up to _attr_set without joining
> > > > > >     anything (in particular leaf_bp which we bhold'd so that we would
> > > > > >     retain the buffer lock) to the new transaction.  Note that
> > > > > >     @args.trans is still set to the new transaction; nobody's cancelled it.
> > > > > > 
> > > > > >  8. xfs_attr_set tries to bail out through out_release_leaf.  The
> > > > > >     leaf_bp is non-null and args.trans is non-null, so we call
> > > > > >     xfs_trans_brelse to release leaf_bp from args.trans
> > > > > > 
> > > > > >  9. leaf_bp isn't joined to any transaction at all, but we passed in a
> > > > > >     non-null args.trans so _trans_brelse hits the "bp->b_transp == tp"
> > > > > >     assert and then crashes trying to assert on the buffer log item
> > > > > >     which is null because leaf_bp isn't joined to any transaction.
> > > > > > 
> > > > > > I think the underlying problem here is that we haven't precisely defined
> > > > > > what's supposed to happen when the _trans_roll in _defer_trans_roll
> > > > > > errors out.  Looking at the three callers of trans_bhold (attr setting,
> > > > > > allocating disk space for a new dquot, and online repair) the handling
> > > > > > of this situation is not consistent.
> > > > > > 
> > > > > > (Embarassingly, the first of those three was RVB'd by me and the latter
> > > > > > two were written by me :P)
> > > > > > 
> > > > > > Should we attach the bholded buffers to the new transaction even if the
> > > > > > old transaction failed to commit so that callers can xfs_trans_brelse
> > > > > > the buffer and it all works out (like what the attr code does)?
> > > > > > 
> > > > > > Or should we cancel the new transaction and set *tpp to NULL, which will
> > > > > > also solve this problem for the attr case since args.trans will be null
> > > > > > after the old commit fails?  But then we run into the problem that now
> > > > > > every caller has to be prepared to deal with the transaction being null
> > > > > > if something went wrong.  All of those callers ought to be calling
> > > > > > xfs_trans_cancel if that's the case, so we might be able to get away
> > > > > > with adding a if (!tp) return; at the top of that function.
> > > > > > 
> > > > > > Or are the callers of _defer_finish/_trans_roll simply supposed to know
> > > > > > that the buffer might not be joined if the roll fails and deal with it
> > > > > > appropriately?
> > > > > > 
> > > > > > Extra complication: if the rolling succeeds but one of the
> > > > > > defer_ops->finish_item() calls fails then the buffer /will/ be joined to
> > > > > > the transaction being passed out.
> > > > > > 
> > > > > > I /think/ the easiest answer here is that we ought to join the buffers
> > > > > > to the new transaction even if committing the old one fails.  I also
> > > > > > suspect the out_release code in xrep_roll_ag_trans is wrong, but that's
> > > > > > my problem to fix. :)
> > > > > > 
> > > > > 
> > > > > That seems like the most simple and predictable approach to me. Then the
> > > > > caller doesn't care whether it has the original transaction or a new
> > > > > one, it just needs to unwind its actions and cancel the tx in the event
> > > > > of an error.
> > > > 
> > > > <nod> I think it's the most straightforward option, but wanted to hear
> > > > someone else echo that.  Or tell me I'm a few cards short of a deck. :)
> > > > 
> > > > > BTW, a couple other things that look potentially suspicious in this same
> > > > > codepath:
> > > > > 
> > > > > 1.) If the referenced xfs_defer_finish() call succeeds and the
> > > > > subsequent xfs_trans_roll_inode() fails similarly, it looks like we can
> > > > > end up in a similar unowned buffer state in the caller.
> > > > 
> > > > Yep.  I think the bjoin has to go before the "if (error) return error;".
> > > > 
> > > 
> > > Yeah..
> > > 
> > > > > 2.) If we do end up returning to xfs_attr_set_args() with the buffer
> > > > > held to the transaction, ISTM it's likely dirty and thus
> > > > > xfs_trans_brelse() may not actually do anything with respect to the
> > > > > buffer hold. This seems like a potential buffer lock/ref leak vector.
> > > > 
> > > > Hm.  I thought we joined and logged the buffer to the first transaction,
> > > > which should clear the dirty state when the transaction commits (or
> > > > doesn't)?  Then we add the held buffer to the new transaction, but it's
> > > > clean, so it should just release...?  Maybe I need to add an error
> > > > injection knob to make rolling fail to see what happens...
> > > > 
> > > 
> > > I think it just depends where we fail. xfs_attr3_leaf_create() logs the
> > > buffer and it looks like we have a couple error checks after that but
> > > before we actually get to the defer finish code. If we fail there, ILTM
> > > that we return with the original transaction with the leaf buffer joined
> > > and dirty (but not yet held). If we do the bhold and make it into
> > > xfs_defer_finish() and fail before we actually roll (it looks like the
> > > only way this can happen is with a dfops held item overflow and thus
> > > probably wouldn't happen here outside of memory corruption or
> > > something), then I suppose we could technically end up in the caller
> > > with a held+dirty buf from there.
> > > 
> > > Once we do roll, the buffer ends up clean+held as you note above, in
> > > which case an xfs_trans_brelse() will cancel the hold. Though we're also
> > > assuming that any deferred operation in progress doesn't dirty any held
> > > buffers in the current transaction and then happen to fail and return to
> > > the caller with the same buffer hold+dirty state. I don't think that
> > > applies to the attr case (and it may not apply anywhere in the current
> > > code, I haven't really looked), but it might be something to consider if
> > > we're defining general behavior or error handling rules.
> > > 
> > > (Regardless I think sticking some error tags in some of this bhold error
> > > handling code and seeing what actually happens is probably a good idea.)
> > > 
> > > > > A quick look at xfs_buf_item_unlock() suggests we don't drop the hold in
> > > > > the abort case, but I could have missed something. If not, perhaps we'd
> > > > > also need to make sure we release the hold before we return from
> > > > > xfs_attr_set_args().
> > > > 
> > > > We don't?  I would have thought:
> > > > 
> > > > 	bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);
> > > > 
> > > > would do that?  <shrug> Maybe I'm confused.
> > > > 
> > > 
> > > xfs_buf_item_unlock() samples the current state (into the hold, stale,
> > > etc. bools), clears the bli flags in the line you posted above but then
> > > acts on the original state:
> > > 
> > > 	...
> > >         if (hold || (stale && !released))
> > >                 return;
> > > 
> > > So AFAICT a commit/cancel/abort of a transaction with a held buffer
> > > leaves the buffer lock/release as the responsibility of the caller.
> > 
> > Hm.  One thing that's been bugging me for a long time about the way we
> > maintain locks on objects across a defer roll is that we treat inodes
> > and buffers differently:
> > 
> > For inodes we ijoin them to a transaction along with ilock_flags that
> > tell the transaction what we want to unlock when the transaction commits
> > or cancels.  We interpret ilock_flags == 0 in the inode log item to mean
> > "let the caller unlock the inode when the transaction is done", which
> > means that the caller must xfs_iunlock and xfs_irele after disposing of
> > that new transaction.
> > 
> > However, held buffers are treated differently -- BLI_HOLD being set in
> > the buffer log item means the transaction commit (or cancel) maintains
> > the lock but always clears the hold.  Hence the defer roll rejoins and
> > reholds the buffer to the new transaction unless something goes wrong,
> > which is how we ended up in this weird place.
> > 
> 
> The tx commit clears the bli hold state flag, but that's because the bli
> is no longer associated with the transaction (which has been destroyed).
> This is similar to how xfs_inode_item_unlock() does the following:
> 
>         lock_flags = iip->ili_lock_flags;
>         iip->ili_lock_flags = 0;
>         if (lock_flags)
>                 xfs_iunlock(ip, lock_flags);
> 
> ... to clear the ili state and act on lock_flags. In both cases (a held
> buf or lock_flags == 0 inode), the object remains owned by the caller.
> 
> > So what if we just change the buffer join rules to be more like inodes?
> > Setting BLI_HOLD means that the buffer will not be unlocked or released
> > as part of transaction commit or cancel; the defer roll will join the
> > buffer (and the hold state) to the new transaction if nothing went
> > wrong, and we make it the responsibility of the caller to xfs_buf_unlock
> > and xfs_buf_relee the buffer after disposing of the new transaction,
> > just like we do for inodes.
> > 
> 
> Hmmm not sure I follow... what you describe above sounds like the
> current behavior of a BLI_HOLD buffer in a transaction. The
> xfs_defer_finish() call carries that state forward across however many
> transaction rolls it does and then the final xfs_trans_commit() leaves
> the objects owned by the caller according to its bhold/lock_flags
> semantics.
> 
> To put it another way, any caller that does xfs_trans_bhold() or
> xfs_trans_ijoin(ip, 0) of an associated object currently expects to have
> to release that object after trans commit. Are you suggesting that we
> lift that behavior to xfs_defer_finish() such that after the defer call
> said objects are owned by the caller at that point (i.e., the caller
> would have to rejoin to the final transaction if that is desired)?

I ... think I just had a giant thinko last night.  If xfs_defer_finish
returns an error, we don't know much of anything about the state of the
transaction -- it could be a fresh tx with nothing joined because
committing the old tx failed, or it could be a dirty tx with a bunch of
things joined (our bhold'en buffers included).  For all we know the held
resources are joined and dirty, which means we have to cancel the
transaction and /then/ we can safely unlock and release the held inodes
and buffers.

--D

> 
> Brian
> 
> > There are only four users of holds: the defer code itself; dquot
> > allocation; attr leaf conversion; inode allocation; and online repair,
> > so it won't be hard to audit and fix all four if we change the behavior.
> > 
> > --D
> > 
> > > 
> > > > > Hmm, looking around a bit more outside of this attr case I see code like
> > > > > the following in xfs_dquot_disk_alloc():
> > > > > 
> > > > > 	...
> > > > >         xfs_trans_bhold(tp, bp);
> > > > >         error = xfs_defer_finish(tpp);
> > > > >         tp = *tpp;
> > > > >         if (error) {
> > > > >                 xfs_buf_relse(bp);
> > > > >                 return error;
> > > > > 	}
> > > > > 	...
> > > > > 
> > > > > That code looks like it actually expects the unowned buf state on a
> > > > > failed finish as it explicitly releases it on error. Perhaps this is an
> > > > > argument for treating a failed xfs_defer_finish() like a failed commit
> > > > > and it is the responsibility of the caller to grok that (meaning the
> > > > > caller now owns the buf, not the tx). Then again, I'm not sure how the
> > > > > caller knows whether xfs_defer_finish() failed due to a tx roll (where
> > > > 
> > > > I don't think the caller has any good way of finding out, aside from
> > > > probing xfs_buf.b_transp == *tpp and there's a buffer log item and
> > > > XFS_BLI_HOLD is set, which is kind of gross...
> > > > 
> > > 
> > > Indeed...
> > > 
> > > > bool xfs_buf_is_still_held_by_me_or_did_we_screw_up(xfs_buf_t *bp);
> > > > 
> > > > > we currently haven't re-held the buf) or something later in dfops
> > > > > processing (where we might have already re-held the buf), so that seems
> > > > > fishy too... :/
> > > > 
> > > > Yeah.  I think the error handling here (in quota) is simply wrong; that
> > > > ought to be "xfs_trans_brelse(*tpp, bp);" not a regular xfs_buf_relse.
> > > > 
> > > 
> > > Yeah, I agree. The takeaway I think is just that if we fix up the
> > > xfs_defer_finish() error path to have more consistent behavior then we
> > > might want to audit some of these other codepaths that call with held
> > > bufs/inodes.
> > > 
> > > Brian
> > > 
> > > > --D
> > > > 
> > > > > 
> > > > > Brian
> > > > > 
> > > > > > --D
> > > > > > 
> > > > > > run fstests xfs/141 at 2019-04-23 00:01:13
> > > > > > XFS (sdc): Mounting V4 Filesystem
> > > > > > XFS (sdc): Ending clean mount
> > > > > > XFS (sdb): Mounting V4 Filesystem
> > > > > > XFS (sdb): Ending clean mount
> > > > > > XFS (sdb): Quotacheck needed: Please wait.
> > > > > > XFS (sdb): Quotacheck: Done.
> > > > > > XFS (sdb): Injecting error (false) at file fs/xfs/xfs_log.c, line 1924, on filesystem "sdb"
> > > > > > XFS (sdb): Intentionally corrupted log record at LSN 0x100000e0e. Shutdown imminent.
> > > > > > XFS (sdb): metadata I/O error in "xlog_iodone" at daddr 0x23762e len 64 error 0
> > > > > > XFS (sdb): xfs_do_force_shutdown(0x2) called from line 1272 of file fs/xfs/xfs_log.c. Return address = ffffffffa064e48b
> > > > > > XFS: Assertion failed: bp->b_transp == tp, file: fs/xfs/xfs_trans_buf.c, line: 385
> > > > > > potentially unexpected fatal signal 6.
> > > > > > CPU: 1 PID: 1154887 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > > > RIP: 0033:0x7fdb91cd9e97
> > > > > > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > > > > > RSP: 002b:00007ffdb0d0f080 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > > > > > RAX: 0000000000000000 RBX: 0000000003aed64a RCX: 00007fdb91cd9e97
> > > > > > RDX: 0000000000000000 RSI: 00007ffdb0d0f080 RDI: 0000000000000002
> > > > > > RBP: 0000000000458923 R08: 0000000000000000 R09: 00007ffdb0d0f080
> > > > > > R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000005
> > > > > > R13: 000000000000031e R14: 00007ffdb0d0f3d0 R15: 000055d212696560
> > > > > > FS:  00007fdb928cdb80 GS:  0000000000000000
> > > > > > potentially unexpected fatal signal 6.
> > > > > > CPU: 1 PID: 1154888 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > > > RIP: 0033:0x7fdb91cd9e97
> > > > > > Code: c2 b8 ea 00 00 00 0f 05 48 3d 00 f0 ff ff 77 3b 41 89 c0 41 ba 08 00 00 00 31 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 <48> 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 00 44 89 c0 75 1f
> > > > > > RSP: 002b:00007ffdb0d0eaf0 EFLAGS: 00000246 ORIG_RAX: 000000000000000e
> > > > > > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00007fdb91cd9e97
> > > > > > RDX: 0000000000000000 RSI: 00007ffdb0d0eaf0 RDI: 0000000000000002
> > > > > > RBP: 000055d2128a41c0 R08: 0000000000000000 R09: 00007ffdb0d0eaf0
> > > > > > R10: 0000000000000008 R11: 0000000000000246 R12: 000000000007a120
> > > > > > R13: 0000000051eb851f R14: 00007ffdb0d0f3d0 R15: 000055d21269ec60
> > > > > > FS:  00007fdb928cdb80 GS:  0000000000000000
> > > > > > XFS (sdb): Log I/O Error Detected. Shutting down filesystem
> > > > > > XFS (sdb): Please unmount the filesystem and rectify the problem(s)
> > > > > > WARNING: CPU: 0 PID: 1154885 at fs/xfs/xfs_message.c:104 assfail+0x27/0x2a [xfs]
> > > > > > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > > > > > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > > > RIP: 0010:assfail+0x27/0x2a [xfs]
> > > > > > Code: 0f 0b c3 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 50 7c 6a a0 48 89 fa 31 ff e8 a6 f8 ff ff 80 3d 45 94 0a 00 00 74 02 0f 0b <0f> 0b c3 48 8b b3 a8 02 00 00 48 c7 c7 e8 80 6a a0 c6 05 66 8f 0a
> > > > > > RSP: 0018:ffffc9000194bbb0 EFLAGS: 00010246
> > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > > RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 00007fdb9195c000 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > > > Call Trace:
> > > > > >  xfs_trans_brelse+0x185/0x210 [xfs]
> > > > > >  xfs_attr_set+0x2ce/0x300 [xfs]
> > > > > >  xfs_xattr_set+0x54/0xa0 [xfs]
> > > > > >  __vfs_setxattr+0x66/0x80
> > > > > >  __vfs_setxattr_noperm+0x54/0xf0
> > > > > >  vfs_setxattr+0x81/0xa0
> > > > > >  setxattr+0x146/0x1d0
> > > > > >  path_setxattr+0xd0/0xf0
> > > > > >  __x64_sys_setxattr+0x27/0x30
> > > > > >  do_syscall_64+0x50/0x160
> > > > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > > RIP: 0033:0x7fdb91db6839
> > > > > > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > > > > > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > > > > > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > > > > > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > > > > > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > > > > > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > > > > > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > > > > > irq event stamp: 0
> > > > > > hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> > > > > > hardirqs last disabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > > > > > softirqs last  enabled at (0): [<ffffffff81057804>] copy_process+0x8d4/0x2220
> > > > > > softirqs last disabled at (0): [<0000000000000000>]           (null)
> > > > > > ---[ end trace f2eeafdb05fd11a1 ]---
> > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > > > > #PF error: [normal kernel read fault]
> > > > > > PGD 0 P4D 0 
> > > > > > Oops: 0000 [#1] PREEMPT SMP
> > > > > > CPU: 0 PID: 1154885 Comm: fsstress Tainted: G        W         5.1.0-rc5-djw #rc5
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
> > > > > > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > > > > > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > > > > > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > > > Call Trace:
> > > > > >  xfs_attr_set+0x2ce/0x300 [xfs]
> > > > > >  xfs_xattr_set+0x54/0xa0 [xfs]
> > > > > >  __vfs_setxattr+0x66/0x80
> > > > > >  __vfs_setxattr_noperm+0x54/0xf0
> > > > > >  vfs_setxattr+0x81/0xa0
> > > > > >  setxattr+0x146/0x1d0
> > > > > >  path_setxattr+0xd0/0xf0
> > > > > >  __x64_sys_setxattr+0x27/0x30
> > > > > >  do_syscall_64+0x50/0x160
> > > > > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > > RIP: 0033:0x7fdb91db6839
> > > > > > Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
> > > > > > RSP: 002b:00007ffdb0d0f318 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
> > > > > > RAX: ffffffffffffffda RBX: 000000000000005d RCX: 00007fdb91db6839
> > > > > > RDX: 000055d212bd8b60 RSI: 00007ffdb0d0f360 RDI: 000055d212bf59f0
> > > > > > RBP: 0000000000000001 R08: 0000000000000001 R09: 00007ffdb0d0f3d0
> > > > > > R10: 000000000000005d R11: 0000000000000246 R12: 0000000000000001
> > > > > > R13: 000000000000031a R14: 00007ffdb0d0f360 R15: 000055d212bd8b60
> > > > > > Modules linked in: dm_snapshot xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ext2 ext4 mbcache jbd2 dm_flakey ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_set_hash_net xt_set ip_set_hash_mac ip_set nfnetlink bfq mq_deadline ip6table_filter ip6_tables iptable_filter dax_pmem_compat device_dax dax_pmem_core nd_pmem sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
> > > > > > Dumping ftrace buffer:
> > > > > >    (ftrace buffer empty)
> > > > > > CR2: 0000000000000038
> > > > > > ---[ end trace f2eeafdb05fd11a2 ]---
> > > > > > RIP: 0010:xfs_trans_brelse+0x48/0x210 [xfs]
> > > > > > Code: 85 4a 01 00 00 48 85 ed 0f 84 8a 00 00 00 66 66 66 66 90 65 8b 05 a8 85 9a 5f 89 c0 48 0f a3 05 06 d4 aa e1 0f 82 ee 00 00 00 <81> 7b 38 3c 12 00 00 0f 85 52 01 00 00 8b 83 a8 00 00 00 85 c0 0f
> > > > > > RSP: 0018:ffffc9000194bbb8 EFLAGS: 00010286
> > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > > RDX: 0000000000000002 RSI: 000000000000000a RDI: ffffffffa0699d13
> > > > > > RBP: ffff88803c45cb00 R08: 0000000000000000 R09: 0000000000000000
> > > > > > R10: 000000000000000a R11: f000000000000000 R12: ffff888060e41b80
> > > > > > R13: 0000000000000010 R14: 000000000000005d R15: 0000000000000000
> > > > > > FS:  00007fdb928cdb80(0000) GS:ffff88803e800000(0000) knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 0000000000000038 CR3: 00000000770bd000 CR4: 00000000000006b0
> > > > > > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-04-24 15:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-23 16:49 RFC: what to do when trans_roll fails and we bheld buffers? (crash in xfs/141) Darrick J. Wong
2019-04-23 18:08 ` Brian Foster
2019-04-23 20:44   ` Darrick J. Wong
2019-04-23 21:33     ` Brian Foster
2019-04-24  3:30       ` Darrick J. Wong
2019-04-24 11:19         ` Brian Foster
2019-04-24 15:43           ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.