linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] xfs: fix panics seen with error injection
@ 2018-11-07 20:10 Josef Bacik
  2018-11-07 20:10 ` [PATCH 1/2] xfs: change xfs_buf_ioapply_map to STATIC Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Josef Bacik @ 2018-11-07 20:10 UTC (permalink / raw)
  To: kernel-team, linux-xfs

I have been trying to debug a xfs hang that happens sometimes when NBD
disconnects, but when trying to do error injection the box just falls over right
away with a panic trying to access an xfs_buf that has been freed.  I hit this
consistently with the reproducer you can find here

https://github.com/josefbacik/debug-scripts/tree/master/xfs-hang

You need to have bcc installed, have the error injection stuff turned on, and
just run

./reproducer.sh

You'll want to modify test.sh to point at wherever your fsstress is, and
whatever device you want it to use.  It'll walk through functions injecting
errors and usually craps out when it hits xfs_btree_log_recs.

What the script does is triggers on whatever function you are looking at
(xfs_btree_log_recs for example) and then anything that dirties a xfs_buf in
that path will save that xfs_buf for later.  Then when we go to do
xfs_buf_ioapply_map on that buf (which eventually calls submit_bio) we'll fail
that bio.  Xfs errors out and things carry on.

In my testing however it seems like we're dropping the ref on failed xfs_buf's
prematurely, so they get freed before we're able to add them to the delwri list
to be retried.  The 2/2 patch fixes this problem.  The 1/2 patch makes it
possible for the reproducer to work, as it relies on being able to attach a
kprobe/kretprobe at xfs_buf_ioapply_map.

With this patch xfs doesn't fall over as soon as I start trying to reproduce the
hang I'm actually trying to find.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] xfs: change xfs_buf_ioapply_map to STATIC
  2018-11-07 20:10 [PATCH 0/2] xfs: fix panics seen with error injection Josef Bacik
@ 2018-11-07 20:10 ` Josef Bacik
  2018-11-15 10:19   ` Christoph Hellwig
  2018-11-07 20:10 ` [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push Josef Bacik
  2018-11-07 22:55 ` [PATCH 0/2] xfs: fix panics seen with error injection Dave Chinner
  2 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2018-11-07 20:10 UTC (permalink / raw)
  To: kernel-team, linux-xfs

In order to make error injection triggering easier make
xfs_buf_ioapply_map STATIC so it's noinline and can be kprobe'ed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/xfs/xfs_buf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index b21ea2ba768d..341bdb55b650 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1327,7 +1327,7 @@ xfs_buf_bio_end_io(
 	bio_put(bio);
 }
 
-static void
+STATIC void
 xfs_buf_ioapply_map(
 	struct xfs_buf	*bp,
 	int		map,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-07 20:10 [PATCH 0/2] xfs: fix panics seen with error injection Josef Bacik
  2018-11-07 20:10 ` [PATCH 1/2] xfs: change xfs_buf_ioapply_map to STATIC Josef Bacik
@ 2018-11-07 20:10 ` Josef Bacik
  2018-11-07 23:37   ` Dave Chinner
  2018-11-07 22:55 ` [PATCH 0/2] xfs: fix panics seen with error injection Dave Chinner
  2 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2018-11-07 20:10 UTC (permalink / raw)
  To: kernel-team, linux-xfs

If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
on the li, which clears the LI_FAILED flag, but also drops the ref on
the buf.  Since it isn't on a IO list at this point this could very well
be the last ref on the buf, which wreaks havoc when we go to add the buf
to the delwrite list.  Fix this by holding a ref on the buf before we
call xfs_buf_resubmit_failed_buffers in order to make sure the buf
doesn't disappear before we're able to clear the error and add it to the
delwri list.  This fixes the panics I was seeing with error injection.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/xfs/xfs_inode_item.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index fa1c4fe2ffbf..df49adf1989c 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -503,13 +503,16 @@ xfs_inode_item_push(
 	 * previously. Resubmit the buffer for IO.
 	 */
 	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
-		if (!xfs_buf_trylock(bp))
+		xfs_buf_hold(bp);
+		if (!xfs_buf_trylock(bp)) {
+			xfs_buf_rele(bp);
 			return XFS_ITEM_LOCKED;
+		}
 
 		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
 			rval = XFS_ITEM_FLUSHING;
 
-		xfs_buf_unlock(bp);
+		xfs_buf_relse(bp);
 		return rval;
 	}
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] xfs: fix panics seen with error injection
  2018-11-07 20:10 [PATCH 0/2] xfs: fix panics seen with error injection Josef Bacik
  2018-11-07 20:10 ` [PATCH 1/2] xfs: change xfs_buf_ioapply_map to STATIC Josef Bacik
  2018-11-07 20:10 ` [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push Josef Bacik
@ 2018-11-07 22:55 ` Dave Chinner
  2 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2018-11-07 22:55 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-xfs

On Wed, Nov 07, 2018 at 03:10:53PM -0500, Josef Bacik wrote:
> I have been trying to debug a xfs hang that happens sometimes when NBD
> disconnects, but when trying to do error injection the box just falls over right
> away with a panic trying to access an xfs_buf that has been freed.  I hit this
> consistently with the reproducer you can find here
> 
> https://github.com/josefbacik/debug-scripts/tree/master/xfs-hang
> 
> You need to have bcc installed, have the error injection stuff turned on, and
> just run
> 
> ./reproducer.sh

Oh, custom error injection, not the XFS error injection
infrastructure.

> You'll want to modify test.sh to point at wherever your fsstress is, and
> whatever device you want it to use.  It'll walk through functions injecting
> errors and usually craps out when it hits xfs_btree_log_recs.
> 
> What the script does is triggers on whatever function you are looking at
> (xfs_btree_log_recs for example) and then anything that dirties a xfs_buf in
> that path will save that xfs_buf for later. i

Ok, I see that xfs-log-paths.txt file that contains the trigger
functions contains basically every path that can modify a buffer.

> Then when we go to do
> xfs_buf_ioapply_map on that buf (which eventually calls submit_bio) we'll fail
> that bio.  Xfs errors out and things carry on.

IOWs, it shoots down every buffer write after a delay period.

Which basically is no different from just erroring out in
xfs_buf_iodone_callbacks() in buffer write IO completion. i.e.
there's no need to trigger on functions that modify the buffers,
because they all end up in the same function and error handling code
on IO completion.

In fact, we have error injection mechanisms in XFS that can be used
for this which are triggered by sysfs attributes on
CONFIG_XFS_DEBUG=y kernels. i.e:

$ ls -1 /sys/fs/xfs/vdb/errortag/
ag_resv_critical
bad_summary
bmap_finish_one
bmapifmt
btree_chk_lblk
btree_chk_sblk
buf_lru_ref
bulkstat
dareadbuf
diowrite
dirinovalid
drop_writes
force_repair
free_extent
iflush1
iflush2
iflush3
iflush4
iflush5
iflush6
itobp
iunlink
iunlinkrm
log_bad_crc
logiodone
log_item_pin
noerror
readagf
readagi
refcount_continue_update
refcount_finish_one
rmap_finish_one
stratcmpl
stratread
$

The injection tag of note is logiodone -  it fails log IO to
simulate log IO failure shutdowns.

static void
xlog_iodone(xfs_buf_t *bp)
{
        struct xlog_in_core     *iclog = bp->b_log_item;
        struct xlog             *l = iclog->ic_log;
        int                     aborted = 0;

        /*
         * Race to shutdown the filesystem if we see an error or the iclog is in
         * IOABORT state. The IOABORT state is only set in DEBUG mode to inject
         * CRC errors into log recovery.
         */
>>>>>    if (XFS_TEST_ERROR(bp->b_error, l->l_mp, XFS_ERRTAG_IODONE_IOERR) ||
	     iclog->ic_state & XLOG_STATE_IOABORT) {


So if you want to catch logged buffers that are being written to
error them out and hence test the error retry paths, then replicate
xlog_iodone injection into xfs_buf_iodone_callbacks(). All modified
buffers have this iodone function attached to them - it does all the
post-io completion error handling and log item cleaning. i.e. this
small change will basically do the same sort of testing:

void
xfs_buf_iodone_callbacks(
        struct xfs_buf          *bp)
{
        /*
         * If there is an error, process it. Some errors require us
         * to run callbacks after failure processing is done so we
         * detect that and take appropriate action.
         */
-	if (bp->b_error && xfs_buf_iodone_callback_error(bp))
+	if (XFS_TEST_ERROR(bp->b_error, bp->b_mount, XFS_ERRTAG_IODONE_IOERR) &&
+	    xfs_buf_iodone_callback_error(bp))
                return;

It would probably be better to add a separate "buffer_write" error
tag to sysfs so you don't shut down the filesystem by induced log Io
errors, but essentially this will do exactly what your custom error
injection is doing.

And, by using the built in error injection, you can write a fstest
to exercise the buffer write error handling....


> In my testing however it seems like we're dropping the ref on failed xfs_buf's
> prematurely, so they get freed before we're able to add them to the delwri list
> to be retried.  The 2/2 patch fixes this problem.  The 1/2 patch makes it
> possible for the reproducer to work, as it relies on being able to attach a
> kprobe/kretprobe at xfs_buf_ioapply_map.

Use the built in error injection, and there's no need for hacking
code just so you can place one-off debug probes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-07 20:10 ` [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push Josef Bacik
@ 2018-11-07 23:37   ` Dave Chinner
  2018-11-07 23:43     ` Josef Bacik
                       ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Dave Chinner @ 2018-11-07 23:37 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-xfs

On Wed, Nov 07, 2018 at 03:10:55PM -0500, Josef Bacik wrote:
> If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
> li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
> on the li, which clears the LI_FAILED flag, but also drops the ref on
> the buf.  Since it isn't on a IO list at this point this could very well
> be the last ref on the buf, which wreaks havoc when we go to add the buf
> to the delwrite list.  Fix this by holding a ref on the buf before we
> call xfs_buf_resubmit_failed_buffers in order to make sure the buf
> doesn't disappear before we're able to clear the error and add it to the
> delwri list.  This fixes the panics I was seeing with error injection.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/xfs/xfs_inode_item.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index fa1c4fe2ffbf..df49adf1989c 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -503,13 +503,16 @@ xfs_inode_item_push(
>  	 * previously. Resubmit the buffer for IO.
>  	 */
>  	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
> -		if (!xfs_buf_trylock(bp))
> +		xfs_buf_hold(bp);
> +		if (!xfs_buf_trylock(bp)) {
> +			xfs_buf_rele(bp);
>  			return XFS_ITEM_LOCKED;
> +		}
>  
>  		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
>  			rval = XFS_ITEM_FLUSHING;
>  
> -		xfs_buf_unlock(bp);
> +		xfs_buf_relse(bp);
>  		return rval;
>  	}

This just doesn't smell right to me.

/me rummages around in the code

Ok, so I think the bug is in xfs_buf_resubmit_failed_buffers() in
that it removes all the "failed" reference counts from the buffer
before it adds the delwri reference count back to the buffer. Taking
a high level reference count like above just papers over the
transient reference counting error in
xfs_buf_resubmit_failed_buffers().

It also fails to fix the other xfs_buf_resubmit_failed_buffers()
caller, which has exactly the same problem (dquot writeback).

Perhaps something like the patch below?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers

From: Dave Chinner <dchinner@redhat.com>

When retrying a failed inode or dquot buffer,
xfs_buf_resubmit_failed_buffers() clears all the failed flags from
the inode/dquot log items. In doing so, it also drops all the
reference counts on the buffer that the failed log items hold. This
means it can drop all the active references on the buffer and hence
free the buffer before it queues it for write again.

Putting the buffer on the delwri queue takes a reference to the
buffer (so that it hangs around until it has been written and
completed), but this goes bang if the buffer has already been freed.

Hence we need to add the buffer to the delwri queue before we remove
the failed flags from the log items attached to the buffer to ensure
it always remains referenced during the resubmit process.

Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 12d8455bfbb2..010db5f8fb00 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1233,9 +1233,23 @@ xfs_buf_iodone(
 }
 
 /*
- * Requeue a failed buffer for writeback
+ * Requeue a failed buffer for writeback.
  *
- * Return true if the buffer has been re-queued properly, false otherwise
+ * We clear the log item failed state here as well, but we have to be careful
+ * about reference counts because the only active reference counts on the buffer
+ * may be the failed log items. Hence if we clear the log item failed state
+ * before queuing the buffer for IO we can release all active references to
+ * the buffer and free it, leading to use after free problems in
+ * xfs_buf_delwri_queue. It makes no difference to the buffer or log items which
+ * order we process them in - the buffer is locked, and we own the buffer list
+ * so nothing on them is going to change while we are performing this action.
+ *
+ * Hence we can safely queue the buffer for IO before we clear the failed log
+ * item state, therefore  always having an active reference to the buffer and
+ * avoiding the transient zero-reference state that leads to use-after-free.
+ *
+ * Return true if the buffer was added to the buffer list, false if it was
+ * already on the buffer list.
  */
 bool
 xfs_buf_resubmit_failed_buffers(
@@ -1243,16 +1257,16 @@ xfs_buf_resubmit_failed_buffers(
 	struct list_head	*buffer_list)
 {
 	struct xfs_log_item	*lip;
+	bool			ret;
+
+	ret = xfs_buf_delwri_queue(bp, buffer_list);
 
 	/*
-	 * Clear XFS_LI_FAILED flag from all items before resubmit
-	 *
-	 * XFS_LI_FAILED set/clear is protected by ail_lock, caller  this
+	 * XFS_LI_FAILED set/clear is protected by ail_lock, caller of this
 	 * function already have it acquired
 	 */
 	list_for_each_entry(lip, &bp->b_li_list, li_bio_list)
 		xfs_clear_li_failed(lip);
 
-	/* Add this buffer back to the delayed write list */
-	return xfs_buf_delwri_queue(bp, buffer_list);
+	return ret;
 }

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-07 23:37   ` Dave Chinner
@ 2018-11-07 23:43     ` Josef Bacik
  2018-11-08  0:48       ` Dave Chinner
  2018-11-07 23:57     ` Josef Bacik
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2018-11-07 23:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Josef Bacik, kernel-team, linux-xfs

On Thu, Nov 08, 2018 at 10:37:40AM +1100, Dave Chinner wrote:
> On Wed, Nov 07, 2018 at 03:10:55PM -0500, Josef Bacik wrote:
> > If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
> > li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
> > on the li, which clears the LI_FAILED flag, but also drops the ref on
> > the buf.  Since it isn't on a IO list at this point this could very well
> > be the last ref on the buf, which wreaks havoc when we go to add the buf
> > to the delwrite list.  Fix this by holding a ref on the buf before we
> > call xfs_buf_resubmit_failed_buffers in order to make sure the buf
> > doesn't disappear before we're able to clear the error and add it to the
> > delwri list.  This fixes the panics I was seeing with error injection.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/xfs/xfs_inode_item.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > index fa1c4fe2ffbf..df49adf1989c 100644
> > --- a/fs/xfs/xfs_inode_item.c
> > +++ b/fs/xfs/xfs_inode_item.c
> > @@ -503,13 +503,16 @@ xfs_inode_item_push(
> >  	 * previously. Resubmit the buffer for IO.
> >  	 */
> >  	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
> > -		if (!xfs_buf_trylock(bp))
> > +		xfs_buf_hold(bp);
> > +		if (!xfs_buf_trylock(bp)) {
> > +			xfs_buf_rele(bp);
> >  			return XFS_ITEM_LOCKED;
> > +		}
> >  
> >  		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
> >  			rval = XFS_ITEM_FLUSHING;
> >  
> > -		xfs_buf_unlock(bp);
> > +		xfs_buf_relse(bp);
> >  		return rval;
> >  	}
> 
> This just doesn't smell right to me.
> 
> /me rummages around in the code
> 
> Ok, so I think the bug is in xfs_buf_resubmit_failed_buffers() in
> that it removes all the "failed" reference counts from the buffer
> before it adds the delwri reference count back to the buffer. Taking
> a high level reference count like above just papers over the
> transient reference counting error in
> xfs_buf_resubmit_failed_buffers().
> 
> It also fails to fix the other xfs_buf_resubmit_failed_buffers()
> caller, which has exactly the same problem (dquot writeback).
>

Hrm I thought it was weird cscope showed only one caller, turns out my cscope.db
was messed up from all the switching between branches.
 
> Perhaps something like the patch below?
> 

I thought about this, but I was worried that clearing the XFS_LI_FAILED may race
with submitting the IO and having it fail again, so we end up clearing it when
we need it set to resubmit again.  But you are the expert here, if that isn't
possible then I'm happy with this patch.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-07 23:37   ` Dave Chinner
  2018-11-07 23:43     ` Josef Bacik
@ 2018-11-07 23:57     ` Josef Bacik
  2018-11-08 19:36     ` Josef Bacik
  2018-11-12 14:23     ` Josef Bacik
  3 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2018-11-07 23:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Josef Bacik, kernel-team, linux-xfs

On Thu, Nov 08, 2018 at 10:37:40AM +1100, Dave Chinner wrote:
> On Wed, Nov 07, 2018 at 03:10:55PM -0500, Josef Bacik wrote:
> > If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
> > li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
> > on the li, which clears the LI_FAILED flag, but also drops the ref on
> > the buf.  Since it isn't on a IO list at this point this could very well
> > be the last ref on the buf, which wreaks havoc when we go to add the buf
> > to the delwrite list.  Fix this by holding a ref on the buf before we
> > call xfs_buf_resubmit_failed_buffers in order to make sure the buf
> > doesn't disappear before we're able to clear the error and add it to the
> > delwri list.  This fixes the panics I was seeing with error injection.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/xfs/xfs_inode_item.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > index fa1c4fe2ffbf..df49adf1989c 100644
> > --- a/fs/xfs/xfs_inode_item.c
> > +++ b/fs/xfs/xfs_inode_item.c
> > @@ -503,13 +503,16 @@ xfs_inode_item_push(
> >  	 * previously. Resubmit the buffer for IO.
> >  	 */
> >  	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
> > -		if (!xfs_buf_trylock(bp))
> > +		xfs_buf_hold(bp);
> > +		if (!xfs_buf_trylock(bp)) {
> > +			xfs_buf_rele(bp);
> >  			return XFS_ITEM_LOCKED;
> > +		}
> >  
> >  		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
> >  			rval = XFS_ITEM_FLUSHING;
> >  
> > -		xfs_buf_unlock(bp);
> > +		xfs_buf_relse(bp);
> >  		return rval;
> >  	}
> 
> This just doesn't smell right to me.
> 
> /me rummages around in the code
> 
> Ok, so I think the bug is in xfs_buf_resubmit_failed_buffers() in
> that it removes all the "failed" reference counts from the buffer
> before it adds the delwri reference count back to the buffer. Taking
> a high level reference count like above just papers over the
> transient reference counting error in
> xfs_buf_resubmit_failed_buffers().
> 
> It also fails to fix the other xfs_buf_resubmit_failed_buffers()
> caller, which has exactly the same problem (dquot writeback).
> 
> Perhaps something like the patch below?
>

The other question, is it possible for the buffer to be submitted in another
thread immediately after it is queued for IO?  Because then we have the same
problem, we could get to xfs_buf_unlock() and it would have been freed in the
other thread.  If that's not possible either then I'm still cool with your
patch.  Thanks,

Josef 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-07 23:43     ` Josef Bacik
@ 2018-11-08  0:48       ` Dave Chinner
  2018-11-08  1:43         ` Josef Bacik
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2018-11-08  0:48 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-xfs

[compendium reply]

On Wed, Nov 07, 2018 at 06:43:03PM -0500, Josef Bacik wrote:
> On Thu, Nov 08, 2018 at 10:37:40AM +1100, Dave Chinner wrote:
> > On Wed, Nov 07, 2018 at 03:10:55PM -0500, Josef Bacik wrote:
> > > If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
> > > li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
> > > on the li, which clears the LI_FAILED flag, but also drops the ref on
> > > the buf.  Since it isn't on a IO list at this point this could very well
> > > be the last ref on the buf, which wreaks havoc when we go to add the buf
> > > to the delwrite list.  Fix this by holding a ref on the buf before we
> > > call xfs_buf_resubmit_failed_buffers in order to make sure the buf
> > > doesn't disappear before we're able to clear the error and add it to the
> > > delwri list.  This fixes the panics I was seeing with error injection.
....
> > Perhaps something like the patch below?
> > 
> 
> I thought about this, but I was worried that clearing the XFS_LI_FAILED may race
> with submitting the IO and having it fail again, so we end up clearing it when
> we need it set to resubmit again.  But you are the expert here, if that isn't
> possible then I'm happy with this patch.  Thanks,

The buffer cannot be submitted while we are clearing the failed
flags because a) the caller holds the buffer locked and so owns it
completely, and b) the caller owns the buffer_list that the buffer
is queued to and so controls when the list of buffers is submitted
for IO.

IOWs, there is no possibility of racing with clearing the
XFS_LI_FAILED flags because we own everything in that context.

> The other question, is it possible for the buffer to be submitted in another
> thread immediately after it is queued for IO?

See a) above - you have to hold the buffer lock to submit it for IO.
Hence holding the buffer lock over queueing means nothing can submit
it for IO at the same time. And you have to hold the buffer lock to
submit it to the delwri list:


bool
xfs_buf_delwri_queue(
        struct xfs_buf          *bp,
        struct list_head        *list)
{
>>>>>   ASSERT(xfs_buf_islocked(bp));
        ASSERT(!(bp->b_flags & XBF_READ));

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-08  0:48       ` Dave Chinner
@ 2018-11-08  1:43         ` Josef Bacik
  0 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2018-11-08  1:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Josef Bacik, kernel-team, linux-xfs

On Thu, Nov 08, 2018 at 11:48:17AM +1100, Dave Chinner wrote:
> [compendium reply]
> 
> On Wed, Nov 07, 2018 at 06:43:03PM -0500, Josef Bacik wrote:
> > On Thu, Nov 08, 2018 at 10:37:40AM +1100, Dave Chinner wrote:
> > > On Wed, Nov 07, 2018 at 03:10:55PM -0500, Josef Bacik wrote:
> > > > If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
> > > > li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
> > > > on the li, which clears the LI_FAILED flag, but also drops the ref on
> > > > the buf.  Since it isn't on a IO list at this point this could very well
> > > > be the last ref on the buf, which wreaks havoc when we go to add the buf
> > > > to the delwrite list.  Fix this by holding a ref on the buf before we
> > > > call xfs_buf_resubmit_failed_buffers in order to make sure the buf
> > > > doesn't disappear before we're able to clear the error and add it to the
> > > > delwri list.  This fixes the panics I was seeing with error injection.
> ....
> > > Perhaps something like the patch below?
> > > 
> > 
> > I thought about this, but I was worried that clearing the XFS_LI_FAILED may race
> > with submitting the IO and having it fail again, so we end up clearing it when
> > we need it set to resubmit again.  But you are the expert here, if that isn't
> > possible then I'm happy with this patch.  Thanks,
> 
> The buffer cannot be submitted while we are clearing the failed
> flags because a) the caller holds the buffer locked and so owns it
> completely, and b) the caller owns the buffer_list that the buffer
> is queued to and so controls when the list of buffers is submitted
> for IO.
> 
> IOWs, there is no possibility of racing with clearing the
> XFS_LI_FAILED flags because we own everything in that context.
> 
> > The other question, is it possible for the buffer to be submitted in another
> > thread immediately after it is queued for IO?
> 
> See a) above - you have to hold the buffer lock to submit it for IO.
> Hence holding the buffer lock over queueing means nothing can submit
> it for IO at the same time. And you have to hold the buffer lock to
> submit it to the delwri list:
> 
> 
> bool
> xfs_buf_delwri_queue(
>         struct xfs_buf          *bp,
>         struct list_head        *list)
> {
> >>>>>   ASSERT(xfs_buf_islocked(bp));
>         ASSERT(!(bp->b_flags & XBF_READ));
>

Ah yeah duh, thanks,

Josef 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-07 23:37   ` Dave Chinner
  2018-11-07 23:43     ` Josef Bacik
  2018-11-07 23:57     ` Josef Bacik
@ 2018-11-08 19:36     ` Josef Bacik
  2018-11-12 14:23     ` Josef Bacik
  3 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2018-11-08 19:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Josef Bacik, kernel-team, linux-xfs

On Thu, Nov 08, 2018 at 10:37:40AM +1100, Dave Chinner wrote:
> On Wed, Nov 07, 2018 at 03:10:55PM -0500, Josef Bacik wrote:
> > If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
> > li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
> > on the li, which clears the LI_FAILED flag, but also drops the ref on
> > the buf.  Since it isn't on a IO list at this point this could very well
> > be the last ref on the buf, which wreaks havoc when we go to add the buf
> > to the delwrite list.  Fix this by holding a ref on the buf before we
> > call xfs_buf_resubmit_failed_buffers in order to make sure the buf
> > doesn't disappear before we're able to clear the error and add it to the
> > delwri list.  This fixes the panics I was seeing with error injection.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/xfs/xfs_inode_item.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > index fa1c4fe2ffbf..df49adf1989c 100644
> > --- a/fs/xfs/xfs_inode_item.c
> > +++ b/fs/xfs/xfs_inode_item.c
> > @@ -503,13 +503,16 @@ xfs_inode_item_push(
> >  	 * previously. Resubmit the buffer for IO.
> >  	 */
> >  	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
> > -		if (!xfs_buf_trylock(bp))
> > +		xfs_buf_hold(bp);
> > +		if (!xfs_buf_trylock(bp)) {
> > +			xfs_buf_rele(bp);
> >  			return XFS_ITEM_LOCKED;
> > +		}
> >  
> >  		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
> >  			rval = XFS_ITEM_FLUSHING;
> >  
> > -		xfs_buf_unlock(bp);
> > +		xfs_buf_relse(bp);
> >  		return rval;
> >  	}
> 
> This just doesn't smell right to me.
> 
> /me rummages around in the code
> 
> Ok, so I think the bug is in xfs_buf_resubmit_failed_buffers() in
> that it removes all the "failed" reference counts from the buffer
> before it adds the delwri reference count back to the buffer. Taking
> a high level reference count like above just papers over the
> transient reference counting error in
> xfs_buf_resubmit_failed_buffers().
> 
> It also fails to fix the other xfs_buf_resubmit_failed_buffers()
> caller, which has exactly the same problem (dquot writeback).
> 
> Perhaps something like the patch below?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> 
> xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> When retrying a failed inode or dquot buffer,
> xfs_buf_resubmit_failed_buffers() clears all the failed flags from
> the inode/dquot log items. In doing so, it also drops all the
> reference counts on the buffer that the failed log items hold. This
> means it can drop all the active references on the buffer and hence
> free the buffer before it queues it for write again.
> 
> Putting the buffer on the delwri queue takes a reference to the
> buffer (so that it hangs around until it has been written and
> completed), but this goes bang if the buffer has already been freed.
> 
> Hence we need to add the buffer to the delwri queue before we remove
> the failed flags from the log items attached to the buffer to ensure
> it always remains referenced during the resubmit process.
> 
> Reported-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

You can add Tested-by as well, this survives everything (ufortunately still no
hang, so it's back to the drawing board for that.)  Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-07 23:37   ` Dave Chinner
                       ` (2 preceding siblings ...)
  2018-11-08 19:36     ` Josef Bacik
@ 2018-11-12 14:23     ` Josef Bacik
  2018-11-13  5:12       ` Darrick J. Wong
  2018-11-14  8:10       ` Dave Chinner
  3 siblings, 2 replies; 14+ messages in thread
From: Josef Bacik @ 2018-11-12 14:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Josef Bacik, kernel-team, linux-xfs

On Thu, Nov 08, 2018 at 10:37:40AM +1100, Dave Chinner wrote:
> On Wed, Nov 07, 2018 at 03:10:55PM -0500, Josef Bacik wrote:
> > If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
> > li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
> > on the li, which clears the LI_FAILED flag, but also drops the ref on
> > the buf.  Since it isn't on a IO list at this point this could very well
> > be the last ref on the buf, which wreaks havoc when we go to add the buf
> > to the delwrite list.  Fix this by holding a ref on the buf before we
> > call xfs_buf_resubmit_failed_buffers in order to make sure the buf
> > doesn't disappear before we're able to clear the error and add it to the
> > delwri list.  This fixes the panics I was seeing with error injection.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/xfs/xfs_inode_item.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > index fa1c4fe2ffbf..df49adf1989c 100644
> > --- a/fs/xfs/xfs_inode_item.c
> > +++ b/fs/xfs/xfs_inode_item.c
> > @@ -503,13 +503,16 @@ xfs_inode_item_push(
> >  	 * previously. Resubmit the buffer for IO.
> >  	 */
> >  	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
> > -		if (!xfs_buf_trylock(bp))
> > +		xfs_buf_hold(bp);
> > +		if (!xfs_buf_trylock(bp)) {
> > +			xfs_buf_rele(bp);
> >  			return XFS_ITEM_LOCKED;
> > +		}
> >  
> >  		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
> >  			rval = XFS_ITEM_FLUSHING;
> >  
> > -		xfs_buf_unlock(bp);
> > +		xfs_buf_relse(bp);
> >  		return rval;
> >  	}
> 
> This just doesn't smell right to me.
> 
> /me rummages around in the code
> 
> Ok, so I think the bug is in xfs_buf_resubmit_failed_buffers() in
> that it removes all the "failed" reference counts from the buffer
> before it adds the delwri reference count back to the buffer. Taking
> a high level reference count like above just papers over the
> transient reference counting error in
> xfs_buf_resubmit_failed_buffers().
> 
> It also fails to fix the other xfs_buf_resubmit_failed_buffers()
> caller, which has exactly the same problem (dquot writeback).
> 
> Perhaps something like the patch below?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> 
> xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> When retrying a failed inode or dquot buffer,
> xfs_buf_resubmit_failed_buffers() clears all the failed flags from
> the inode/dquot log items. In doing so, it also drops all the
> reference counts on the buffer that the failed log items hold. This
> means it can drop all the active references on the buffer and hence
> free the buffer before it queues it for write again.
> 
> Putting the buffer on the delwri queue takes a reference to the
> buffer (so that it hangs around until it has been written and
> completed), but this goes bang if the buffer has already been freed.
> 
> Hence we need to add the buffer to the delwri queue before we remove
> the failed flags from the log items attached to the buffer to ensure
> it always remains referenced during the resubmit process.
> 
> Reported-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Dave,

Are you planning on sending this along as is?  I'm going to throw it in our
kernel if you are happy with it.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-12 14:23     ` Josef Bacik
@ 2018-11-13  5:12       ` Darrick J. Wong
  2018-11-14  8:10       ` Dave Chinner
  1 sibling, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2018-11-13  5:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Josef Bacik, kernel-team, linux-xfs

On Mon, Nov 12, 2018 at 09:23:48AM -0500, Josef Bacik wrote:
> On Thu, Nov 08, 2018 at 10:37:40AM +1100, Dave Chinner wrote:
> > On Wed, Nov 07, 2018 at 03:10:55PM -0500, Josef Bacik wrote:
> > > If we failed to writeout a xfs_buf we'll grab a ref for it and put it on
> > > li->li_buf.  Then when submitting the failed bufs we'll clear LI_FAILED
> > > on the li, which clears the LI_FAILED flag, but also drops the ref on
> > > the buf.  Since it isn't on a IO list at this point this could very well
> > > be the last ref on the buf, which wreaks havoc when we go to add the buf
> > > to the delwrite list.  Fix this by holding a ref on the buf before we
> > > call xfs_buf_resubmit_failed_buffers in order to make sure the buf
> > > doesn't disappear before we're able to clear the error and add it to the
> > > delwri list.  This fixes the panics I was seeing with error injection.
> > > 
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > ---
> > >  fs/xfs/xfs_inode_item.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> > > index fa1c4fe2ffbf..df49adf1989c 100644
> > > --- a/fs/xfs/xfs_inode_item.c
> > > +++ b/fs/xfs/xfs_inode_item.c
> > > @@ -503,13 +503,16 @@ xfs_inode_item_push(
> > >  	 * previously. Resubmit the buffer for IO.
> > >  	 */
> > >  	if (test_bit(XFS_LI_FAILED, &lip->li_flags)) {
> > > -		if (!xfs_buf_trylock(bp))
> > > +		xfs_buf_hold(bp);
> > > +		if (!xfs_buf_trylock(bp)) {
> > > +			xfs_buf_rele(bp);
> > >  			return XFS_ITEM_LOCKED;
> > > +		}
> > >  
> > >  		if (!xfs_buf_resubmit_failed_buffers(bp, buffer_list))
> > >  			rval = XFS_ITEM_FLUSHING;
> > >  
> > > -		xfs_buf_unlock(bp);
> > > +		xfs_buf_relse(bp);
> > >  		return rval;
> > >  	}
> > 
> > This just doesn't smell right to me.
> > 
> > /me rummages around in the code
> > 
> > Ok, so I think the bug is in xfs_buf_resubmit_failed_buffers() in
> > that it removes all the "failed" reference counts from the buffer
> > before it adds the delwri reference count back to the buffer. Taking
> > a high level reference count like above just papers over the
> > transient reference counting error in
> > xfs_buf_resubmit_failed_buffers().
> > 
> > It also fails to fix the other xfs_buf_resubmit_failed_buffers()
> > caller, which has exactly the same problem (dquot writeback).
> > 
> > Perhaps something like the patch below?
> > 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> > 
> > 
> > xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
> > 
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When retrying a failed inode or dquot buffer,
> > xfs_buf_resubmit_failed_buffers() clears all the failed flags from
> > the inode/dquot log items. In doing so, it also drops all the
> > reference counts on the buffer that the failed log items hold. This
> > means it can drop all the active references on the buffer and hence
> > free the buffer before it queues it for write again.
> > 
> > Putting the buffer on the delwri queue takes a reference to the
> > buffer (so that it hangs around until it has been written and
> > completed), but this goes bang if the buffer has already been freed.
> > 
> > Hence we need to add the buffer to the delwri queue before we remove
> > the failed flags from the log items attached to the buffer to ensure
> > it always remains referenced during the resubmit process.
> > 
> > Reported-by: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Dave,
> 
> Are you planning on sending this along as is?  I'm going to throw it in our
> kernel if you are happy with it.  Thanks,

I'd appreciate it if this patch could be sent to the mailing list in its
own thread to avoid "new patch buried in other thread" syndrome.

--D

> Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push
  2018-11-12 14:23     ` Josef Bacik
  2018-11-13  5:12       ` Darrick J. Wong
@ 2018-11-14  8:10       ` Dave Chinner
  1 sibling, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2018-11-14  8:10 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-xfs

On Mon, Nov 12, 2018 at 09:23:48AM -0500, Josef Bacik wrote:
> On Thu, Nov 08, 2018 at 10:37:40AM +1100, Dave Chinner wrote:
> > xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
> > 
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When retrying a failed inode or dquot buffer,
> > xfs_buf_resubmit_failed_buffers() clears all the failed flags from
> > the inode/dquot log items. In doing so, it also drops all the
> > reference counts on the buffer that the failed log items hold. This
> > means it can drop all the active references on the buffer and hence
> > free the buffer before it queues it for write again.
> > 
> > Putting the buffer on the delwri queue takes a reference to the
> > buffer (so that it hangs around until it has been written and
> > completed), but this goes bang if the buffer has already been freed.
> > 
> > Hence we need to add the buffer to the delwri queue before we remove
> > the failed flags from the log items attached to the buffer to ensure
> > it always remains referenced during the resubmit process.
> > 
> > Reported-by: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Dave,
> 
> Are you planning on sending this along as is?  I'm going to throw it in our
> kernel if you are happy with it.  Thanks,

I'm planning to, it's just my stack of fixes is growing faster than
I can QA them at the moment. 

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] xfs: change xfs_buf_ioapply_map to STATIC
  2018-11-07 20:10 ` [PATCH 1/2] xfs: change xfs_buf_ioapply_map to STATIC Josef Bacik
@ 2018-11-15 10:19   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2018-11-15 10:19 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-xfs

On Wed, Nov 07, 2018 at 03:10:54PM -0500, Josef Bacik wrote:
> In order to make error injection triggering easier make
> xfs_buf_ioapply_map STATIC so it's noinline and can be kprobe'ed.

So per the whole STATIC discussion how about throwing in an explicit
noinline instead?

Then again if you really are about error injection we should probably
have a real tag here.  Either explicit error injection like we have
in many places in XFS, or a trace point which should give a really
nice hook, too.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-11-15 20:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-07 20:10 [PATCH 0/2] xfs: fix panics seen with error injection Josef Bacik
2018-11-07 20:10 ` [PATCH 1/2] xfs: change xfs_buf_ioapply_map to STATIC Josef Bacik
2018-11-15 10:19   ` Christoph Hellwig
2018-11-07 20:10 ` [PATCH 2/2] xfs: take a ref on failed bufs in xfs_inode_item_push Josef Bacik
2018-11-07 23:37   ` Dave Chinner
2018-11-07 23:43     ` Josef Bacik
2018-11-08  0:48       ` Dave Chinner
2018-11-08  1:43         ` Josef Bacik
2018-11-07 23:57     ` Josef Bacik
2018-11-08 19:36     ` Josef Bacik
2018-11-12 14:23     ` Josef Bacik
2018-11-13  5:12       ` Darrick J. Wong
2018-11-14  8:10       ` Dave Chinner
2018-11-07 22:55 ` [PATCH 0/2] xfs: fix panics seen with error injection Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).