* [PATCH] xfs: fix buffer use after free on IO error
@ 2014-03-22  2:48 Eric Sandeen
  2014-03-25 12:58 ` Brian Foster
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2014-03-22  2:48 UTC (permalink / raw)
  To: xfs-oss

When testing exhaustion of dm snapshots, the following appeared
with CONFIG_DEBUG_OBJECTS_FREE enabled:

ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs]

indicating that we'd freed a buffer which still had a pending reference,
down this path:

[  190.867975]  [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270
[  190.880820]  [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370
[  190.892615]  [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs]
[  190.905629]  [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs]
[  190.911770]  [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs]

At issue is the fact that if IO fails in xfs_buf_iorequest,
we'll queue completion unconditionally, and then call
xfs_buf_rele; but if IO failed, there are no IOs remaining,
and xfs_buf_rele will free the bp while work is still queued.

Fix this by not scheduling completion if the buffer has
an error on it; run it immediately.  The rest is only comment
changes.

Thanks to dchinner for spotting the root cause.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

p.s. or maybe this could be moved into _xfs_buf_ioend ...
I think I see some nice cleanups for xfs_buf_ioend vs.
_xfs_buf_ioend w.r.t. when "schedule" is true, so maybe
I can clean it up then.
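
For context, a minimal sketch of what the "schedule" flag ends up
controlling inside xfs_buf_ioend() -- field and workqueue names here
are from memory, not verified against this tree, but
xfs_buf_iodone_work is the work function the ODEBUG report points at:

	if (schedule) {
		/*
		 * Deferred completion: the work_struct lives inside the
		 * buffer, so the buffer must stay allocated until the
		 * queued work runs.  This is the object ODEBUG sees as
		 * still active when the buffer gets freed.
		 */
		INIT_WORK(&bp->b_iodone_work, xfs_buf_iodone_work);
		queue_work(xfslogd_workqueue, &bp->b_iodone_work);
	} else {
		/* Immediate completion, run on the caller's stack. */
		xfs_buf_iodone_work(&bp->b_iodone_work);
	}

Running the completion immediately on the error path means nothing is
left queued to outlive the final xfs_buf_rele.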


diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 9c061ef..45eb5ef 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1361,21 +1361,29 @@ xfs_buf_iorequest(
 		xfs_buf_wait_unpin(bp);
 	xfs_buf_hold(bp);
 
-	/* Set the count to 1 initially, this will stop an I/O
+	/*
+	 * Set the count to 1 initially, this will stop an I/O
 	 * completion callout which happens before we have started
 	 * all the I/O from calling xfs_buf_ioend too early.
 	 */
 	atomic_set(&bp->b_io_remaining, 1);
 	_xfs_buf_ioapply(bp);
-	_xfs_buf_ioend(bp, 1);
+	/*
+	 * If _xfs_buf_ioapply failed, we'll get back here with
+	 * only the reference we took above.  _xfs_buf_ioend will
+	 * drop it to zero, so we'd better not queue it for later,
+	 * or we'll free it before it's done.
+	 */
+	_xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
 
 	xfs_buf_rele(bp);
 }
 
 /*
  * Waits for I/O to complete on the buffer supplied.  It returns immediately if
- * no I/O is pending or there is already a pending error on the buffer.  It
- * returns the I/O error code, if any, or 0 if there was no error.
+ * no I/O is pending or there is already a pending error on the buffer, in which
+ * case nothing will ever complete.  It returns the I/O error code, if any, or
+ * 0 if there was no error.
  */
 int
 xfs_buf_iowait(


* Re: [PATCH] xfs: fix buffer use after free on IO error
  2014-03-22  2:48 [PATCH] xfs: fix buffer use after free on IO error Eric Sandeen
@ 2014-03-25 12:58 ` Brian Foster
  2014-03-25 13:17   ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Brian Foster @ 2014-03-25 12:58 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

On Fri, Mar 21, 2014 at 09:48:50PM -0500, Eric Sandeen wrote:
> When testing exhaustion of dm snapshots, the following appeared
> with CONFIG_DEBUG_OBJECTS_FREE enabled:
> 
> ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs]
> 
> indicating that we'd freed a buffer which still had a pending reference,
> down this path:
> 
> [  190.867975]  [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270
> [  190.880820]  [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370
> [  190.892615]  [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs]
> [  190.905629]  [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs]
> [  190.911770]  [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs]
> 
> At issue is the fact that if IO fails in xfs_buf_iorequest,
> we'll queue completion unconditionally, and then call
> xfs_buf_rele; but if IO failed, there are no IOs remaining,
> and xfs_buf_rele will free the bp while work is still queued.
> 
> Fix this by not scheduling completion if the buffer has
> an error on it; run it immediately.  The rest is only comment
> changes.
> 
> Thanks to dchinner for spotting the root cause.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> p.s. or maybe this could be moved into _xfs_buf_ioend ...
> I think I see some nice cleanups for xfs_buf_ioend vs.
> _xfs_buf_ioend w.r.t. when "schedule" is true, so maybe
> I can clean it up then.
> 
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 9c061ef..45eb5ef 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1361,21 +1361,29 @@ xfs_buf_iorequest(
>  		xfs_buf_wait_unpin(bp);
>  	xfs_buf_hold(bp);
>  
> -	/* Set the count to 1 initially, this will stop an I/O
> +	/*
> +	 * Set the count to 1 initially, this will stop an I/O
>  	 * completion callout which happens before we have started
>  	 * all the I/O from calling xfs_buf_ioend too early.
>  	 */
>  	atomic_set(&bp->b_io_remaining, 1);
>  	_xfs_buf_ioapply(bp);
> -	_xfs_buf_ioend(bp, 1);
> +	/*
> +	 * If _xfs_buf_ioapply failed, we'll get back here with
> +	 * only the reference we took above.  _xfs_buf_ioend will
> +	 * drop it to zero, so we'd better not queue it for later,
> +	 * or we'll free it before it's done.
> +	 */
> +	_xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
>  

Out of curiosity, is there any major reason we don't use 0 here
unconditionally? Are we worried about I/O completing before we have a
chance to decrement the reference?

Looks good to me either way:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  	xfs_buf_rele(bp);
>  }
>  
>  /*
>   * Waits for I/O to complete on the buffer supplied.  It returns immediately if
> - * no I/O is pending or there is already a pending error on the buffer.  It
> - * returns the I/O error code, if any, or 0 if there was no error.
> + * no I/O is pending or there is already a pending error on the buffer, in which
> + * case nothing will ever complete.  It returns the I/O error code, if any, or
> + * 0 if there was no error.
>   */
>  int
>  xfs_buf_iowait(
> 

* Re: [PATCH] xfs: fix buffer use after free on IO error
  2014-03-25 12:58 ` Brian Foster
@ 2014-03-25 13:17   ` Christoph Hellwig
  2014-03-25 16:05     ` Eric Sandeen
  2014-03-25 18:08     ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2014-03-25 13:17 UTC (permalink / raw)
  To: Brian Foster; +Cc: Eric Sandeen, xfs-oss

> On Fri, Mar 21, 2014 at 09:48:50PM -0500, Eric Sandeen wrote:
> > +	/*
> > +	 * If _xfs_buf_ioapply failed, we'll get back here with
> > +	 * only the reference we took above.  _xfs_buf_ioend will
> > +	 * drop it to zero, so we'd better not queue it for later,
> > +	 * or we'll free it before it's done.
> > +	 */
> > +	_xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
> >  
> 
> Out of curiosity, is there any major reason we don't use 0 here
> unconditionally? Are we worried about I/O completing before we have a
> chance to decrement the reference?

I think this should unconditionally avoid the schedule, and while we're
at it we should kill _xfs_buf_ioend and opencode it here and at the
other callsite.

Also atomic_dec_and_test really just returns true/false - there should
be no need for the explicit == 1 in the conditional.
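
As a rough sketch of that shape (untested, just to illustrate), the
helper as I read it currently does:

	if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
		xfs_buf_ioend(bp, schedule);

which can simply become:

	if (atomic_dec_and_test(&bp->b_io_remaining))
		xfs_buf_ioend(bp, schedule);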


* Re: [PATCH] xfs: fix buffer use after free on IO error
  2014-03-25 13:17   ` Christoph Hellwig
@ 2014-03-25 16:05     ` Eric Sandeen
  2014-03-25 17:25       ` Christoph Hellwig
  2014-03-25 18:08     ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2014-03-25 16:05 UTC (permalink / raw)
  To: Christoph Hellwig, Brian Foster; +Cc: Eric Sandeen, xfs-oss

On 3/25/14, 6:17 AM, Christoph Hellwig wrote:
>> On Fri, Mar 21, 2014 at 09:48:50PM -0500, Eric Sandeen wrote:
>>> +	/*
>>> +	 * If _xfs_buf_ioapply failed, we'll get back here with
>>> +	 * only the reference we took above.  _xfs_buf_ioend will
>>> +	 * drop it to zero, so we'd better not queue it for later,
>>> +	 * or we'll free it before it's done.
>>> +	 */
>>> +	_xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
>>>  
>>
>> Out of curiosity, is there any major reason we don't use 0 here
>> unconditionally? Are we worried about I/O completing before we have a
>> chance to decrement the reference?
> 
> I think this should unconditionally avoid the schedule, and while we're
> at it we should kill _xfs_buf_ioend and opencode it here and at the
> other callsite.

And then remove the flag from xfs_buf_ioend which is always 0 at that
point ...

> Also atomic_dec_and_test really just returns true/false - there should
> ne no need for the explicit == 1 in the conditional.

Yeah I have a patch to do that as well; I wanted to separate the
bugfix from the more invasive cleanup, though - and I wanted to
get the fix out for review sooner.

But yeah, I was unsure about whether or not to schedule at all here.
We come here from a lot of callsites and I'm honestly not sure what
the implications are yet.

-Eric


* Re: [PATCH] xfs: fix buffer use after free on IO error
  2014-03-25 16:05     ` Eric Sandeen
@ 2014-03-25 17:25       ` Christoph Hellwig
  2014-03-25 17:39         ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2014-03-25 17:25 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Eric Sandeen, Brian Foster, xfs-oss

On Tue, Mar 25, 2014 at 09:05:04AM -0700, Eric Sandeen wrote:
> >> Out of curiosity, is there any major reason we don't use 0 here
> >> unconditionally? Are we worried about I/O completing before we have a
> >> chance to decrement the reference?
> > 
> > I think this should unconditionally avoid the schedule, and while we're
> > at it we should kill _xfs_buf_ioend and opencode it here and at the
> > other callsite.
> 
> And then remove the flag from xfs_buf_ioend which is always 0 at that
> point ...

Is it?  xfs_buf_bio_end_io should still be passing 1; the bio end_io
handler is the place we really need the workqueue for anyway.

> Yeah I have a patch to do that as well; I wanted to separate the
> bugfix from the more invasive cleanup, though - and I wanted to
> get the fix out for review sooner.

Sure, feel free to leave all the cleanups to another patch.

> But yeah, I was unsure about whether or not to schedule at all here.
> We come here from a lot of callsites and I'm honestly not sure what
> the implications are yet.

I think the delayed completion is always wrong from the submission
path.  The error path is just a special case of a completion happening
before _xfs_buf_ioapply returns.  The combination of incredibly fast
hardware and bad preemption could cause the same bug you observed.
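
Roughly what I have in mind for the submission path (a sketch only,
not tested):

	atomic_set(&bp->b_io_remaining, 1);
	_xfs_buf_ioapply(bp);
	if (atomic_dec_and_test(&bp->b_io_remaining))
		xfs_buf_ioend(bp, 0);	/* never defer from the submitter */
	xfs_buf_rele(bp);

i.e. if our initial reference turns out to be the last one, whether
because the apply failed or because all the bios already completed,
run the completion inline under the hold we still have.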


* Re: [PATCH] xfs: fix buffer use after free on IO error
  2014-03-25 17:25       ` Christoph Hellwig
@ 2014-03-25 17:39         ` Eric Sandeen
  2014-03-25 17:44           ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2014-03-25 17:39 UTC (permalink / raw)
  To: Christoph Hellwig, Eric Sandeen; +Cc: Brian Foster, xfs-oss

On 3/25/14, 10:25 AM, Christoph Hellwig wrote:
> On Tue, Mar 25, 2014 at 09:05:04AM -0700, Eric Sandeen wrote:
>>>> Out of curiosity, is there any major reason we don't use 0 here
>>>> unconditionally? Are we worried about I/O completing before we have a
>>>> chance to decrement the reference?
>>>
>>> I think this should unconditionally avoid the schedule, and while we're
>>> at it we should kill _xfs_buf_ioend and opencode it here and at the
>>> other callsite.
>>
>> And then remove the flag from xfs_buf_ioend which is always 0 at that
>> point ...
> 
> Is it?  xfs_buf_bio_end_io should still be passing 1; the bio end_io
> handler is the place we really need the workqueue for anyway.

These are the callers of xfs_buf_ioend:

  File              Function                 Line
0 xfs_buf.c         xfs_bioerror             1085 xfs_buf_ioend(bp, 0);
1 xfs_buf.c         _xfs_buf_ioend           1177 xfs_buf_ioend(bp, schedule);
2 xfs_buf_item.c    xfs_buf_item_unpin        494 xfs_buf_ioend(bp, 0);
3 xfs_buf_item.c    xfs_buf_iodone_callbacks 1138 xfs_buf_ioend(bp, 0);
4 xfs_inode.c       xfs_iflush_cluster       3015 xfs_buf_ioend(bp, 0);
5 xfs_log.c         xlog_bdstrat             1644 xfs_buf_ioend(bp, 0);
6 xfs_log_recover.c xlog_recover_iodone       386 xfs_buf_ioend(bp, 0);

so only _xfs_buf_ioend *might* pass something other than 0, and:

  File      Function           Line
0 xfs_buf.c xfs_buf_bio_end_io 1197 _xfs_buf_ioend(bp, 1);
1 xfs_buf.c xfs_buf_iorequest  1377 _xfs_buf_ioend(bp, bp->b_error ? 0 : 1);

At least up until now that was always called with "1"
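
For reference, that bio completion caller looks roughly like this
(simplified from memory; the exact body in this tree may differ).  It
runs in bio completion context, which is why it's the one place the
workqueue deferral is genuinely needed:

	STATIC void
	xfs_buf_bio_end_io(
		struct bio	*bio,
		int		error)
	{
		xfs_buf_t	*bp = (xfs_buf_t *)bio->bi_private;

		/* don't clobber an error already recorded by another bio */
		if (error)
			xfs_buf_ioerror(bp, error);

		/* can't run iodone processing in this context; defer it */
		_xfs_buf_ioend(bp, 1);
		bio_put(bio);
	}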

>> Yeah I have a patch to do that as well; I wanted to separate the
>> bugfix from the more invasive cleanup, though - and I wanted to
>> get the fix out for review sooner.
> 
> Sure, feel free to leave all the cleanups to another patch.
> 
>> But yeah, I was unsure about whether or not to schedule at all here.
>> We come here from a lot of callsites and I'm honestly not sure what
>> the implications are yet.
> 
> I think the delayed completion is always wrong from the submission
> path.  The error path is just a special case of a completion happening
> before _xfs_buf_ioapply returns.  The combination of incredibly fast
> hardware and bad preemption could cause the same bug you observed.

I wondered about that.

I'm not sure; I don't think it was the buf_rele inside xfs_buf_iorequest
that freed it, I think it was specifically the error path afterwards -
in my case, in xfs_trans_read_buf_map():

                                xfs_buf_iorequest(bp);
					// xfs_buf_iorequest code below
					xfs_buf_hold(bp);
					atomic_set(&bp->b_io_remaining, 1);
					_xfs_buf_ioapply(bp); <-- gets error
					if (atomic_dec_and_test(&bp->b_io_remaining))
						xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
					xfs_buf_rele(bp); <- releases our hold
                        }

                        error = xfs_buf_iowait(bp); <-- sees error; would have waited otherwise
                        if (error) {
                                xfs_buf_ioerror_alert(bp, __func__);
                                xfs_buf_relse(bp); <--- freed here ?

but my bp refcounting & lifetime knowledge is lacking :(

-Eric


* Re: [PATCH] xfs: fix buffer use after free on IO error
  2014-03-25 17:39         ` Eric Sandeen
@ 2014-03-25 17:44           ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2014-03-25 17:44 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Christoph Hellwig, Brian Foster, Eric Sandeen, xfs-oss

On Tue, Mar 25, 2014 at 10:39:31AM -0700, Eric Sandeen wrote:
> so only _xfs_buf_ioend *might* pass something other than 0, and:
> 
>   File      Function           Line
> 0 xfs_buf.c xfs_buf_bio_end_io 1197 _xfs_buf_ioend(bp, 1);
> 1 xfs_buf.c xfs_buf_iorequest  1377 _xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
> 
> At least up until now that was always called with "1"

Right, _was_.  But that changes to one caller always passing 1, and one
passing 0 or 1 with your patch.  Or one always passing 1 and one always
passing 0 with the suggestion from Brian and me.  Either way we'd still
have callers passing 1.


* Re: [PATCH] xfs: fix buffer use after free on IO error
  2014-03-25 13:17   ` Christoph Hellwig
  2014-03-25 16:05     ` Eric Sandeen
@ 2014-03-25 18:08     ` Dave Chinner
  2014-03-25 18:13       ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2014-03-25 18:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Sandeen, Brian Foster, xfs-oss

On Tue, Mar 25, 2014 at 06:17:05AM -0700, Christoph Hellwig wrote:
> > On Fri, Mar 21, 2014 at 09:48:50PM -0500, Eric Sandeen wrote:
> > > +	/*
> > > +	 * If _xfs_buf_ioapply failed, we'll get back here with
> > > +	 * only the reference we took above.  _xfs_buf_ioend will
> > > +	 * drop it to zero, so we'd better not queue it for later,
> > > +	 * or we'll free it before it's done.
> > > +	 */
> > > +	_xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
> > >  
> > 
> > Out of curiosity, is there any major reason we don't use 0 here
> > unconditionally? Are we worried about I/O completing before we have a
> > chance to decrement the reference?
> 
> I think this should unconditionally avoid the schedule, and while we're
> at it we should kill _xfs_buf_ioend and opencode it here and at the
> other callsite.

I thought we scheduled here because we can issue IO from IO
completion, and so we need to requeue the IO completion rather than
run it inline in the current IO completion that hasn't fully
completed its processing yet...

> Also atomic_dec_and_test really just returns true/false - there should
> be no need for the explicit == 1 in the conditional.

Right, there's several other cleanups around this code that can be
done as Eric has mentioned...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: fix buffer use after free on IO error
  2014-03-25 18:08     ` Dave Chinner
@ 2014-03-25 18:13       ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2014-03-25 18:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Eric Sandeen, Brian Foster, xfs-oss

On Wed, Mar 26, 2014 at 05:08:14AM +1100, Dave Chinner wrote:
> > I think this should unconditionally avoid the schedule, and while we're
> > at it we should kill _xfs_buf_ioend and opencode it here and at the
> > other callsite.
> 
> I thought we scheduled here because we can issue IO from IO
> completion, and so we need to requeue the IO completion rather than
> run it inline in the current IO completion that hasn't fully
> completed its processing yet...

Actually we get a recursive callchain there from the I/O error handling.
I switched from always 0 to always 1 here in:

commit 08023d6dbe840dc4271805a9ea376fcbdee9f744
Author: Christoph Hellwig <hch@infradead.org>
Date:   Mon Jul 2 06:00:04 2012 -0400

    xfs: prevent recursion in xfs_buf_iorequest

but I never got around fixing the root cause by cleaning up the way we
handle buffer I/O completions on a shut down filesystem.

So I guess Eric's fix is the best we can do for now.

