[PATCH 0/2] xfs: fix panics seen with error injection

* [PATCH 0/2] xfs: fix panics seen with error injection
@ 2018-11-07 20:10 Josef Bacik
  2018-11-07 20:10 ` [PATCH 1/2] xfs: change xfs_buf_ioapply_map to STATIC Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Josef Bacik @ 2018-11-07 20:10 UTC
  To: kernel-team, linux-xfs

I have been trying to debug an xfs hang that sometimes happens when NBD
disconnects, but when I try to do error injection the box falls over right
away with a panic while accessing an xfs_buf that has already been freed.  I hit
this consistently with the reproducer you can find here

https://github.com/josefbacik/debug-scripts/tree/master/xfs-hang

You need to have bcc installed, have the error injection stuff turned on, and
just run

./reproducer.sh

You'll want to modify test.sh to point at wherever your fsstress is, and
whatever device you want it to use.  It'll walk through functions injecting
errors and usually craps out when it hits xfs_btree_log_recs.

What the script does is trigger on whatever function you are looking at
(xfs_btree_log_recs for example); anything that dirties an xfs_buf in that path
then gets that xfs_buf saved for later.  When we later go to do
xfs_buf_ioapply_map on that buf (which eventually calls submit_bio) we fail
that bio.  Xfs errors out and things carry on.
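
The bookkeeping described above can be modelled in a few lines of plain Python.
This is only a rough sketch of the idea, not the bcc script from the repository:
the InjectionTracker class, its method names, and the one-shot failure behaviour
are all assumptions made for illustration.

```python
# Toy model of the reproducer's bookkeeping. NOT the real bcc script:
# the class, method names and one-shot behaviour are illustrative assumptions.
class InjectionTracker:
    def __init__(self, target):
        self.target = target    # function we trigger on
        self.depth = 0          # > 0 while we are inside the target function
        self.tracked = set()    # buffers dirtied while inside the target

    def enter(self, func):
        if func == self.target:
            self.depth += 1

    def leave(self, func):
        if func == self.target and self.depth > 0:
            self.depth -= 1

    def dirty(self, buf_id):
        # Only remember buffers dirtied underneath the target function.
        if self.depth > 0:
            self.tracked.add(buf_id)

    def should_fail_io(self, buf_id):
        # Called where xfs_buf_ioapply_map would run: fail tracked buffers once.
        if buf_id in self.tracked:
            self.tracked.discard(buf_id)
            return True
        return False


tracker = InjectionTracker("xfs_btree_log_recs")
tracker.dirty(0x1000)                    # outside the target path: ignored
tracker.enter("xfs_btree_log_recs")
tracker.dirty(0x2000)                    # inside the target path: remembered
tracker.leave("xfs_btree_log_recs")
results = {hex(b): tracker.should_fail_io(b) for b in (0x1000, 0x2000)}
print(results)
```

Only the buffer dirtied inside the target function has its I/O failed; the
other one goes through untouched.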

In my testing, however, it seems like we're dropping the ref on failed xfs_bufs
prematurely, so they get freed before we're able to add them to the delwri list
to be retried.  The 2/2 patch fixes this problem.  The 1/2 patch makes it
possible for the reproducer to work, as it relies on being able to attach a
kprobe/kretprobe at xfs_buf_ioapply_map.
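
To make the ordering problem concrete, here is a toy refcount model in Python.
It is not the xfs code and not what the 2/2 patch does; the Buf class, the
fail_io helper and the hold_first flag are hypothetical names used only to show
why dropping the last reference before queueing for retry leaves nothing to
queue.

```python
# Toy refcount model. NOT the xfs code or the actual fix: Buf, fail_io and
# hold_first are hypothetical names used only to illustrate the ordering issue.
class Buf:
    def __init__(self):
        self.refs = 1        # reference held by whoever submitted the I/O
        self.freed = False

    def hold(self):
        assert not self.freed, "use after free"
        self.refs += 1

    def release(self):
        assert not self.freed, "use after free"
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


def fail_io(buf, delwri, hold_first):
    """Handle a failed write and try to queue the buffer for retry."""
    if hold_first:
        buf.hold()           # retry list takes its own reference up front
    buf.release()            # failed I/O completion drops the submitter's ref
    if buf.freed:
        return False         # buffer is already gone, touching it is a UAF
    if not hold_first:
        buf.hold()
    delwri.append(buf)
    return True


early, late = [], []
queued_buggy = fail_io(Buf(), early, hold_first=False)
queued_fixed = fail_io(Buf(), late, hold_first=True)
print(queued_buggy, queued_fixed)
```

In the first call the buffer is freed before it can reach the retry list; in
the second it survives with exactly one reference, owned by the list.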

With these patches xfs doesn't fall over as soon as I start trying to reproduce
the hang I'm actually trying to find.  Thanks,

Josef
