All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Re-dirty pages on I/O error
@ 2009-02-10  1:48 Lachlan McIlroy
  2009-02-10 10:01 ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Lachlan McIlroy @ 2009-02-10  1:48 UTC (permalink / raw)
  To: xfs-oss

[This patch seems to have slipped through the proverbial crack.]

If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
we throw away the dirty page without converting the delayed allocation.  This
leaves delayed allocations that can never be removed and confuses code that
expects a flush of the file to clear them.  We need to re-dirty the page on
error so we can try again later or report that the flush failed.


Index: xfs-fixes/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- xfs-fixes.orig/fs/xfs/linux-2.6/xfs_aops.c
+++ xfs-fixes/fs/xfs/linux-2.6/xfs_aops.c
@@ -1185,16 +1185,6 @@ error:
  	if (iohead)
  		xfs_cancel_ioend(iohead);

-	/*
-	 * If it's delalloc and we have nowhere to put it,
-	 * throw it away, unless the lower layers told
-	 * us to try again.
-	 */
-	if (err != -EAGAIN) {
-		if (!unmapped)
-			block_invalidatepage(page, 0);
-		ClearPageUptodate(page);
-	}
  	return err;
  }

@@ -1223,7 +1213,7 @@ xfs_vm_writepage(
  	struct page		*page,
  	struct writeback_control *wbc)
  {
-	int			error;
+	int			error = 0;
  	int			need_trans;
  	int			delalloc, unmapped, unwritten;
  	struct inode		*inode = page->mapping->host;
@@ -1269,19 +1259,16 @@ xfs_vm_writepage(
  	 * to real space and flush out to disk.
  	 */
  	error = xfs_page_state_convert(inode, page, wbc, 1, unmapped);
-	if (error == -EAGAIN)
-		goto out_fail;
  	if (unlikely(error < 0))
-		goto out_unlock;
+		goto out_fail;

  	return 0;

  out_fail:
  	redirty_page_for_writepage(wbc, page);
  	unlock_page(page);
-	return 0;
-out_unlock:
-	unlock_page(page);
+	if (error == -EAGAIN)
+		error = 0;
  	return error;
  }

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2009-02-10  1:48 [PATCH] Re-dirty pages on I/O error Lachlan McIlroy
@ 2009-02-10 10:01 ` Dave Chinner
  2009-02-10 23:33   ` Lachlan McIlroy
  2009-02-15 20:05   ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2009-02-10 10:01 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Tue, Feb 10, 2009 at 12:48:38PM +1100, Lachlan McIlroy wrote:
> [This patch seems to have slipped through the proverbial crack.]
>
> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
> we throw away the dirty page without converting the delayed allocation.  This
> leaves delayed allocations that can never be removed and confuses code that
> expects a flush of the file to clear them.  We need to re-dirty the page on
> error so we can try again later or report that the flush failed.

As discussed previously, the fix that is needed in the page
invalidation path. i.e.  the page invalidate path clears the
buffer_delay() flag (via discard_buffer) before we get to
->releasepage and so releasepage fails to convert the delalloc
extent before tossing the page....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2009-02-10 10:01 ` Dave Chinner
@ 2009-02-10 23:33   ` Lachlan McIlroy
  2009-02-15 20:05   ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Lachlan McIlroy @ 2009-02-10 23:33 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

Dave Chinner wrote:
> On Tue, Feb 10, 2009 at 12:48:38PM +1100, Lachlan McIlroy wrote:
>> [This patch seems to have slipped through the proverbial crack.]
>>
>> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
>> we throw away the dirty page without converting the delayed allocation.  This
>> leaves delayed allocations that can never be removed and confuses code that
>> expects a flush of the file to clear them.  We need to re-dirty the page on
>> error so we can try again later or report that the flush failed.
> 
> As discussed previously, the fix that is needed in the page
> invalidation path. i.e.  the page invalidate path clears the
> buffer_delay() flag (via discard_buffer) before we get to
> ->releasepage and so releasepage fails to convert the delalloc
> extent before tossing the page....

I'll leave it up to you to implement that then Dave since I think you
are the only one who understands your angle of attack.

Meanwhile we have this patch running on customer systems and it's fixed
the problems they were having with orphaned delayed allocations being
left behind due to pages being erroneously invalidated.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2009-02-10 10:01 ` Dave Chinner
  2009-02-10 23:33   ` Lachlan McIlroy
@ 2009-02-15 20:05   ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-02-15 20:05 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

On Tue, Feb 10, 2009 at 09:01:29PM +1100, Dave Chinner wrote:
> On Tue, Feb 10, 2009 at 12:48:38PM +1100, Lachlan McIlroy wrote:
> > [This patch seems to have slipped through the proverbial crack.]
> >
> > If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
> > we throw away the dirty page without converting the delayed allocation.  This
> > leaves delayed allocations that can never be removed and confuses code that
> > expects a flush of the file to clear them.  We need to re-dirty the page on
> > error so we can try again later or report that the flush failed.
> 
> As discussed previously, the fix that is needed in the page
> invalidation path. i.e.  the page invalidate path clears the
> buffer_delay() flag (via discard_buffer) before we get to
> ->releasepage and so releasepage fails to convert the delalloc
> extent before tossing the page....

And we need to get started on that fix rather sooner than later.

Any chance to get some testcase extrace from the scenarios at those
customer sites?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-16  4:01     ` Dave Chinner
@ 2008-09-16  6:30       ` Lachlan McIlroy
  0 siblings, 0 replies; 14+ messages in thread
From: Lachlan McIlroy @ 2008-09-16  6:30 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-dev, xfs-oss

Dave Chinner wrote:
> On Mon, Sep 15, 2008 at 01:22:22PM +1000, Lachlan McIlroy wrote:
>> Dave Chinner wrote:
>>> So we keep dirty pages around that we can't write back?
>> Yes.
>>
>>> If we are in a low memory situation and the block device
>>> has gone bad, that will prevent memory reclaim from making
>>> progress.
>> How do you differentiate "gone bad" from temporarily unavailable?
> 
> The only "temporary" error you can get in writeback is a path
> failure. IIRC, XVM will give an ENODEV on a path failure, but
> I don't think that dm-multipath does. Other than that, a write
> failure is unrecoverable. Any other error is permanent....
> 
>>> i.e. if we have a bad disk, a user can now take down the system
>>> by running it out of clean memory....
>> I'm sure there's many ways a malicious user could already do that.
> 
> That's no excuse for introducing a new way of taking down the
> system when a disk fails. Error handling in linux is bad enough
> without intentionally preventing the system from recovering from
> I/O errors...
> 
>> Would you rather have data corruption?
> 
> Data corruption as a result of an I/O error? What else can we
> be expected to do? Log the error and continue onwards....
> 
> Face it - if the drive is dead then we can't write the data
> anywhere, so keeping it around and potentially killing the system
> completely makes even less sense.  At some point we *have to give
> up* on data we can't write back....
> 
>> We've allowed the write() to succeed.  We've accepted the data.
>> We have an obligation to write it do disk.  Either we keep trying
>> in the face of errors or we take down the filesystem.
> 
> It's write-behind buffering. We give best effort, not guaranteed
> writeback. If the system crashes, that data is lost. If we get an
> I/O error, that data is lost. If the application cares, it uses
> fsync and it gets the error and can handle it.
> 
> .....
> 
>> The EAGAIN case can be exceptioned.  The error we are getting here
>> is ENOSPC because xfs_trans_reserve() is failing. 
> 
> <sigh>
> 
> Please - put that detail in the patch description. I'm getting a
> little tired of having to draw out the reasons for your patches
> one little bit at a time.
I intentionally left out that detail because that's not what I'm trying
to fix here.  Discarding data arbitrarily is wrong and needs to be fixed
regardless of the error.  I've already mentioned what the cause of the
ENOSPC is earlier in this thread.

> 
> So: why is xfs_trans_reserve() failing? Aren't all the transactions
> in the writeback path marked as XFS_TRANS_RESERVE? That allows the
> transaction reserve to succeed when at ENOSPC by dipping into the
> reserved blocks. Did we run out of reserved blocks (i.e. the reserve
> pool is not big enough)? Or is there some other case that leads to
> ENOSPC in the writeback path that we've never considered before?

Yes, xfs_trans_reserve() is failing, it is marked XFS_TRANS_RESERVE
and we ran out of the reserved pool.  We've tried bumping the pool
from 1024 blocks to 16384 blocks and we can still cause it to fail
so we'll need to make the default even higher.  This ENOSPC error is
not necessarily a permanent error and in this case we shouldn't
discard the page.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-15  3:22   ` Lachlan McIlroy
@ 2008-09-16  4:01     ` Dave Chinner
  2008-09-16  6:30       ` Lachlan McIlroy
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2008-09-16  4:01 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-dev, xfs-oss

On Mon, Sep 15, 2008 at 01:22:22PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> So we keep dirty pages around that we can't write back?
> Yes.
>
>> If we are in a low memory situation and the block device
>> has gone bad, that will prevent memory reclaim from making
>> progress.
> How do you differentiate "gone bad" from temporarily unavailable?

The only "temporary" error you can get in writeback is a path
failure. IIRC, XVM will give an ENODEV on a path failure, but
I don't think that dm-multipath does. Other than that, a write
failure is unrecoverable. Any other error is permanent....

>> i.e. if we have a bad disk, a user can now take down the system
>> by running it out of clean memory....
> I'm sure there's many ways a malicious user could already do that.

That's no excuse for introducing a new way of taking down the
system when a disk fails. Error handling in linux is bad enough
without intentionally preventing the system from recovering from
I/O errors...

> Would you rather have data corruption?

Data corruption as a result of an I/O error? What else can we
be expected to do? Log the error and continue onwards....

Face it - if the drive is dead then we can't write the data
anywhere, so keeping it around and potentially killing the system
completely makes even less sense.  At some point we *have to give
up* on data we can't write back....

> We've allowed the write() to succeed.  We've accepted the data.
> We have an obligation to write it do disk.  Either we keep trying
> in the face of errors or we take down the filesystem.

It's write-behind buffering. We give best effort, not guaranteed
writeback. If the system crashes, that data is lost. If we get an
I/O error, that data is lost. If the application cares, it uses
fsync and it gets the error and can handle it.

.....

> The EAGAIN case can be exceptioned.  The error we are getting here
> is ENOSPC because xfs_trans_reserve() is failing. 

<sigh>

Please - put that detail in the patch description. I'm getting a
little tired of having to draw out the reasons for your patches
one little bit at a time.

So: why is xfs_trans_reserve() failing? Aren't all the transactions
in the writeback path marked as XFS_TRANS_RESERVE? That allows the
transaction reserve to succeed when at ENOSPC by dipping into the
reserved blocks. Did we run out of reserved blocks (i.e. the reserve
pool is not big enough)? Or is there some other case that leads to
ENOSPC in the writeback path that we've never considered before?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-13  4:19 ` Dave Chinner
@ 2008-09-15  3:22   ` Lachlan McIlroy
  2008-09-16  4:01     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Lachlan McIlroy @ 2008-09-15  3:22 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-dev, xfs-oss

Dave Chinner wrote:
> On Thu, Sep 11, 2008 at 06:37:33PM +1000, Lachlan McIlroy wrote:
>> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
>> we throw away the dirty page without converting the delayed allocation.  This
>> leaves delayed allocations that can never be removed and confuses code that
>> expects a flush of the file to clear them.  We need to re-dirty the page on
>> error so we can try again later or report that the flush failed.
>>
>> --- a/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 16:32:11.000000000 +1000
>> +++ b/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 15:44:09.000000000 +1000
>> @@ -1147,16 +1147,6 @@ error:
>> 	if (iohead)
>> 		xfs_cancel_ioend(iohead);
>>
>> -	/*
>> -	 * If it's delalloc and we have nowhere to put it,
>> -	 * throw it away, unless the lower layers told
>> -	 * us to try again.
>> -	 */
>> -	if (err != -EAGAIN) {
>> -		if (!unmapped)
>> -			block_invalidatepage(page, 0);
>> -		ClearPageUptodate(page);
>> -	}
>> 	return err;
>> }
> 
> So we keep dirty pages around that we can't write back?
Yes.

> If we are in a low memory situation and the block device
> has gone bad, that will prevent memory reclaim from making
> progress.
How do you differentiate "gone bad" from temporarily unavailable?

> 
> i.e. if we have a bad disk, a user can now take down the system
> by running it out of clean memory....
I'm sure there's many ways a malicious user could already do that.
Would you rather have data corruption?
We've allowed the write() to succeed.  We've accepted the data.
We have an obligation to write it do disk.  Either we keep trying
in the face of errors or we take down the filesystem.

> 
> The EAGAIN case is for when we can't get the locks we
> need during non-blocking writeback, which is a common case if
> there is concurrent writes to this inode....
> 
>> @@ -1216,8 +1206,11 @@ xfs_vm_writepage(
>> 	 * then mark the page dirty again and leave the page
>> 	 * as is.
>> 	 */
>> -	if (current_test_flags(PF_FSTRANS) && need_trans)
>> -		goto out_fail;
>> +	if (current_test_flags(PF_FSTRANS) && need_trans) {
>> +		redirty_page_for_writepage(wbc, page);
>> +		unlock_page(page);
>> +		return -EAGAIN;
>> +	}
> 
> Should not return an error here - the redirty_page_for_writepage()
> call effective says "can't do this right away, but no error
> needs to be reported because it can be written again later".
> Happens all the time with non-blocking writes.
Christoph already pointed that one out.

> 
>> @@ -1231,20 +1224,14 @@ xfs_vm_writepage(
>> 	 * to real space and flush out to disk.
>> 	 */
>> 	error = xfs_page_state_convert(inode, page, wbc, 1, unmapped);
>> -	if (error == -EAGAIN)
>> -		goto out_fail;
>> -	if (unlikely(error < 0))
>> -		goto out_unlock;
>>
>> -	return 0;
>> +	if (error < 0) {
>> +		redirty_page_for_writepage(wbc, page);
>> +		unlock_page(page);
>> +		return error;
>> +	}
> 
> That needs the EAGAIN exception - that's not an error.
> For EIO, though, we should really be invalidating the
> page like we currently do. However, it should be silent as
> per the current behaviour - a rate-limited log warning is
> really needed here...

The EAGAIN case can be exceptioned.  The error we are getting here
is ENOSPC because xfs_trans_reserve() is failing.  It looks to me like
__writepage() and mapping_set_error() want to know about that error.
We also need that error to be propogated back to any callers of
xfs_flushinval_pages() and the only way to do that is to return the
actual error.  Just redirtying the page wont return an error all the
way back up the call stack.

Silently discarding data just doesn't make sense.  Issuing a log
message isn't much better - noone will look for it until after they
realise there's a problem and all their files are corrupt.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-11  8:37 Lachlan McIlroy
  2008-09-11 10:33 ` Christoph Hellwig
@ 2008-09-13  4:19 ` Dave Chinner
  2008-09-15  3:22   ` Lachlan McIlroy
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2008-09-13  4:19 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-dev, xfs-oss

On Thu, Sep 11, 2008 at 06:37:33PM +1000, Lachlan McIlroy wrote:
> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
> we throw away the dirty page without converting the delayed allocation.  This
> leaves delayed allocations that can never be removed and confuses code that
> expects a flush of the file to clear them.  We need to re-dirty the page on
> error so we can try again later or report that the flush failed.
>
> --- a/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 16:32:11.000000000 +1000
> +++ b/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 15:44:09.000000000 +1000
> @@ -1147,16 +1147,6 @@ error:
> 	if (iohead)
> 		xfs_cancel_ioend(iohead);
>
> -	/*
> -	 * If it's delalloc and we have nowhere to put it,
> -	 * throw it away, unless the lower layers told
> -	 * us to try again.
> -	 */
> -	if (err != -EAGAIN) {
> -		if (!unmapped)
> -			block_invalidatepage(page, 0);
> -		ClearPageUptodate(page);
> -	}
> 	return err;
> }

So we keep dirty pages around that we can't write back?
If we are in a low memory situation and the block device
has gone bad, that will prevent memory reclaim from making
progress.

i.e. if we have a bad disk, a user can now take down the system
by running it out of clean memory....

The EAGAIN case is for when we can't get the locks we
need during non-blocking writeback, which is a common case if
there is concurrent writes to this inode....

> @@ -1216,8 +1206,11 @@ xfs_vm_writepage(
> 	 * then mark the page dirty again and leave the page
> 	 * as is.
> 	 */
> -	if (current_test_flags(PF_FSTRANS) && need_trans)
> -		goto out_fail;
> +	if (current_test_flags(PF_FSTRANS) && need_trans) {
> +		redirty_page_for_writepage(wbc, page);
> +		unlock_page(page);
> +		return -EAGAIN;
> +	}

Should not return an error here - the redirty_page_for_writepage()
call effective says "can't do this right away, but no error
needs to be reported because it can be written again later".
Happens all the time with non-blocking writes.

> @@ -1231,20 +1224,14 @@ xfs_vm_writepage(
> 	 * to real space and flush out to disk.
> 	 */
> 	error = xfs_page_state_convert(inode, page, wbc, 1, unmapped);
> -	if (error == -EAGAIN)
> -		goto out_fail;
> -	if (unlikely(error < 0))
> -		goto out_unlock;
>
> -	return 0;
> +	if (error < 0) {
> +		redirty_page_for_writepage(wbc, page);
> +		unlock_page(page);
> +		return error;
> +	}

That needs the EAGAIN exception - that's not an error.
For EIO, though, we should really be invalidating the
page like we currently do. However, it should be silent as
per the current behaviour - a rate-limited log warning is
really needed here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-12  6:44     ` Lachlan McIlroy
@ 2008-09-12 13:17       ` Eric Sandeen
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2008-09-12 13:17 UTC (permalink / raw)
  To: lachlan; +Cc: Christoph Hellwig, xfs-dev, xfs-oss

Lachlan McIlroy wrote:
> Eric Sandeen wrote:
>> Christoph Hellwig wrote:
>>> On Thu, Sep 11, 2008 at 06:37:33PM +1000, Lachlan McIlroy wrote:
>>>> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
>>>> we throw away the dirty page without converting the delayed allocation.  This
>>>> leaves delayed allocations that can never be removed and confuses code that
>>>> expects a flush of the file to clear them.  We need to re-dirty the page on
>>>> error so we can try again later or report that the flush failed.
>>>>
>>>> --- a/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 16:32:11.000000000 +1000
>>>> +++ b/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 15:44:09.000000000 +1000
>>>> @@ -1147,16 +1147,6 @@ error:
>>>> 	if (iohead)
>>>> 		xfs_cancel_ioend(iohead);
>>>>
>>>> -	/*
>>>> -	 * If it's delalloc and we have nowhere to put it,
>>>> -	 * throw it away, unless the lower layers told
>>>> -	 * us to try again.
>>>> -	 */
>>>> -	if (err != -EAGAIN) {
>>>> -		if (!unmapped)
>>>> -			block_invalidatepage(page, 0);
>>>> -		ClearPageUptodate(page);
>>>> -	}
>>> While this always looked fishy to me we it needs a good explanation to
>>> kill this.  I try to remember why Steve did it this way long time ago.
>> Hrm some of that was my fault, ages ago.
>>
>> http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/pagebuf/Attic/page_buf_io.c.diff?r1=1.2;r2=1.3;hideattic=0
> That revision history doesn't match the ptools history - probably cause the
> file no longer exists.  Anyway it looks like the hack was mostly there before
> your change.

Before my change it just cleared the delalloc flag....

>> I don't remember the details fo why.... ah here's a clue
>>
>> http://oss.sgi.com/archives/xfs/2002-01/msg00475.html
> Geez, that's so long ago that I doubt it's still an issue.  So it sounds
> like this hack was added to prevent further issues after a forced shutdown
> has already occurred.  If we leave this hack in there's a good chance it
> will cause a forced shutdown.

I'm not arguing to keep it, just trying to figure out where it's from :)

>>> As _xfs_force_shutdown was written, it tried to schedule in an interrupt context
>>> and caused a BUG() to be thrown.
>>> Also, even if we didn't try to deal with leftover buffers in the interrupt,
>>> they subsequently had their delalloc flags removed, and thus queued up
>>> to clobber block 0 (1,2,3) on the disk, thus corrupting the filesystem.
>> so back then, delalloc buffers w/o a home would eventually slam into the
>> superblock, I guess.
> Really?  How do you figure?  Delayed allocations have a special block
> number of -1.  When that is converted to the physical disk address it
> should end up beyond the end of the volume.

Well, I don't remember how things worked back in 2002 for sure, but
pretty sure the net result was that we simply cleared the delalloc page
on the flag, and it eventually came out at block 0.  Back then.  A lot
has changed since!

-Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-11 21:21   ` Eric Sandeen
@ 2008-09-12  6:44     ` Lachlan McIlroy
  2008-09-12 13:17       ` Eric Sandeen
  0 siblings, 1 reply; 14+ messages in thread
From: Lachlan McIlroy @ 2008-09-12  6:44 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Christoph Hellwig, xfs-dev, xfs-oss

Eric Sandeen wrote:
> Christoph Hellwig wrote:
>> On Thu, Sep 11, 2008 at 06:37:33PM +1000, Lachlan McIlroy wrote:
>>> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
>>> we throw away the dirty page without converting the delayed allocation.  This
>>> leaves delayed allocations that can never be removed and confuses code that
>>> expects a flush of the file to clear them.  We need to re-dirty the page on
>>> error so we can try again later or report that the flush failed.
>>>
>>> --- a/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 16:32:11.000000000 +1000
>>> +++ b/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 15:44:09.000000000 +1000
>>> @@ -1147,16 +1147,6 @@ error:
>>> 	if (iohead)
>>> 		xfs_cancel_ioend(iohead);
>>>
>>> -	/*
>>> -	 * If it's delalloc and we have nowhere to put it,
>>> -	 * throw it away, unless the lower layers told
>>> -	 * us to try again.
>>> -	 */
>>> -	if (err != -EAGAIN) {
>>> -		if (!unmapped)
>>> -			block_invalidatepage(page, 0);
>>> -		ClearPageUptodate(page);
>>> -	}
>> While this always looked fishy to me we it needs a good explanation to
>> kill this.  I try to remember why Steve did it this way long time ago.
> 
> Hrm some of that was my fault, ages ago.
> 
> http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/pagebuf/Attic/page_buf_io.c.diff?r1=1.2;r2=1.3;hideattic=0
That revision history doesn't match the ptools history - probably cause the
file no longer exists.  Anyway it looks like the hack was mostly there before
your change.

> 
> I don't remember the details fo why.... ah here's a clue
> 
> http://oss.sgi.com/archives/xfs/2002-01/msg00475.html
Geez, that's so long ago that I doubt it's still an issue.  So it sounds
like this hack was added to prevent further issues after a forced shutdown
has already occurred.  If we leave this hack in there's a good chance it
will cause a forced shutdown.

> 
>> As _xfs_force_shutdown was written, it tried to schedule in an interrupt context
>> and caused a BUG() to be thrown.
> 
>> Also, even if we didn't try to deal with leftover buffers in the interrupt,
>> they subsequently had their delalloc flags removed, and thus queued up
>> to clobber block 0 (1,2,3) on the disk, thus corrupting the filesystem.
> 
> so back then, delalloc buffers w/o a home would eventually slam into the
> superblock, I guess.
Really?  How do you figure?  Delayed allocations have a special block
number of -1.  When that is converted to the physical disk address it
should end up beyond the end of the volume.

> 
> Anyway, if this is redirtied, will it ever go away for an IO error that
> persists?
For the case that I am trying to fix the error should not be persistent.
If there is a persistent error (and it doesn't result in a forced shutdown)
then I think we'll just keep trying to flush the data.

For errors that are ultimately persistant we wont know on the first I/O
that these are going to be persistant errors so I don't think it makes
sense to toss the data straight away - we may have temporarily lost our
link to the storage.

If there is a serious error (ie that results in a forced shutdown) then
we should end up tossing the data in xfs_sync_inodes().

> 
> -Eric
> 
>>> @@ -1216,8 +1206,11 @@ xfs_vm_writepage(
>>> 	 * then mark the page dirty again and leave the page
>>> 	 * as is.
>>> 	 */
>>> -	if (current_test_flags(PF_FSTRANS) && need_trans)
>>> -		goto out_fail;
>>> +	if (current_test_flags(PF_FSTRANS) && need_trans) {
>>> +		redirty_page_for_writepage(wbc, page);
>>> +		unlock_page(page);
>>> +		return -EAGAIN;
>> The redirty, unlock, return sequence is duplicated after your
>> patch, I think we should still keep the out_fail goto.  Also returning
>> -EGAIN from ->writepage is wrong.  The return values goes through
>> handle_write_error and mapping_set_error into the return value of e.g.
>> msync.  If you look at all similar writepage implementation they only
>> return a negative error for a real error condition and simply return 0
>> when just redirtying it due to transaction constraints or when trylocks
>> fail.
>>
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-11 10:33 ` Christoph Hellwig
  2008-09-11 21:21   ` Eric Sandeen
@ 2008-09-12  6:04   ` Lachlan McIlroy
  1 sibling, 0 replies; 14+ messages in thread
From: Lachlan McIlroy @ 2008-09-12  6:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs-dev, xfs-oss

Christoph Hellwig wrote:
> On Thu, Sep 11, 2008 at 06:37:33PM +1000, Lachlan McIlroy wrote:
>> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
>> we throw away the dirty page without converting the delayed allocation.  This
>> leaves delayed allocations that can never be removed and confuses code that
>> expects a flush of the file to clear them.  We need to re-dirty the page on
>> error so we can try again later or report that the flush failed.
>>
>> --- a/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 16:32:11.000000000 +1000
>> +++ b/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 15:44:09.000000000 +1000
>> @@ -1147,16 +1147,6 @@ error:
>> 	if (iohead)
>> 		xfs_cancel_ioend(iohead);
>>
>> -	/*
>> -	 * If it's delalloc and we have nowhere to put it,
>> -	 * throw it away, unless the lower layers told
>> -	 * us to try again.
>> -	 */
>> -	if (err != -EAGAIN) {
>> -		if (!unmapped)
>> -			block_invalidatepage(page, 0);
>> -		ClearPageUptodate(page);
>> -	}
> 
> While this always looked fishy to me we it needs a good explanation to
> kill this.  I try to remember why Steve did it this way long time ago.
We'd need an even better explanation to keep it.  It's just plain wrong
and has caused serious customer issues for us.

At ENOSPC calls to xfs_trans_reserve() can fail if the reserved space is
also exhausted (see m_resblks).  This causes conversion of delayed allocs
to fail with ENOSPC.  This dodgy code throws the page away and leaves the
delayed alloc.  A direct I/O then panics because it finds a delayed
allocation when it does not expect one.

This code has been there since the initial version of the file.

> 
>> @@ -1216,8 +1206,11 @@ xfs_vm_writepage(
>> 	 * then mark the page dirty again and leave the page
>> 	 * as is.
>> 	 */
>> -	if (current_test_flags(PF_FSTRANS) && need_trans)
>> -		goto out_fail;
>> +	if (current_test_flags(PF_FSTRANS) && need_trans) {
>> +		redirty_page_for_writepage(wbc, page);
>> +		unlock_page(page);
>> +		return -EAGAIN;
> 
> The redirty, unlock, return sequence is duplicated after your
> patch, I think we should still keep the out_fail goto.  Also returning
> -EGAIN from ->writepage is wrong.  The return values goes through
> handle_write_error and mapping_set_error into the return value of e.g.
> msync.  If you look at all similar writepage implementation they only
> return a negative error for a real error condition and simply return 0
> when just redirtying it due to transaction constraints or when trylocks
> fail.

I got rid of the goto because it looked messy but if you want it back I'll
change it.

As for the EAGAIN, I'll change that to 0.  I don't think it makes any
difference to the problem I'm trying to fix.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-11 10:33 ` Christoph Hellwig
@ 2008-09-11 21:21   ` Eric Sandeen
  2008-09-12  6:44     ` Lachlan McIlroy
  2008-09-12  6:04   ` Lachlan McIlroy
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2008-09-11 21:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Lachlan McIlroy, xfs-dev, xfs-oss

Christoph Hellwig wrote:
> On Thu, Sep 11, 2008 at 06:37:33PM +1000, Lachlan McIlroy wrote:
>> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
>> we throw away the dirty page without converting the delayed allocation.  This
>> leaves delayed allocations that can never be removed and confuses code that
>> expects a flush of the file to clear them.  We need to re-dirty the page on
>> error so we can try again later or report that the flush failed.
>>
>> --- a/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 16:32:11.000000000 +1000
>> +++ b/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 15:44:09.000000000 +1000
>> @@ -1147,16 +1147,6 @@ error:
>> 	if (iohead)
>> 		xfs_cancel_ioend(iohead);
>>
>> -	/*
>> -	 * If it's delalloc and we have nowhere to put it,
>> -	 * throw it away, unless the lower layers told
>> -	 * us to try again.
>> -	 */
>> -	if (err != -EAGAIN) {
>> -		if (!unmapped)
>> -			block_invalidatepage(page, 0);
>> -		ClearPageUptodate(page);
>> -	}
> 
> While this always looked fishy to me we it needs a good explanation to
> kill this.  I try to remember why Steve did it this way long time ago.

Hrm some of that was my fault, ages ago.

http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/pagebuf/Attic/page_buf_io.c.diff?r1=1.2;r2=1.3;hideattic=0

I don't remember the details fo why.... ah here's a clue

http://oss.sgi.com/archives/xfs/2002-01/msg00475.html

> As _xfs_force_shutdown was written, it tried to schedule in an interrupt context
> and caused a BUG() to be thrown.

> Also, even if we didn't try to deal with leftover buffers in the interrupt,
> they subsequently had their delalloc flags removed, and thus queued up
> to clobber block 0 (1,2,3) on the disk, thus corrupting the filesystem.

so back then, delalloc buffers w/o a home would eventually slam into the
superblock, I guess.

Anyway, if this is redirtied, will it ever go away for an IO error that
persists?

-Eric

>> @@ -1216,8 +1206,11 @@ xfs_vm_writepage(
>> 	 * then mark the page dirty again and leave the page
>> 	 * as is.
>> 	 */
>> -	if (current_test_flags(PF_FSTRANS) && need_trans)
>> -		goto out_fail;
>> +	if (current_test_flags(PF_FSTRANS) && need_trans) {
>> +		redirty_page_for_writepage(wbc, page);
>> +		unlock_page(page);
>> +		return -EAGAIN;
> 
> The redirty, unlock, return sequence is duplicated after your
> patch, I think we should still keep the out_fail goto.  Also returning
> -EGAIN from ->writepage is wrong.  The return values goes through
> handle_write_error and mapping_set_error into the return value of e.g.
> msync.  If you look at all similar writepage implementation they only
> return a negative error for a real error condition and simply return 0
> when just redirtying it due to transaction constraints or when trylocks
> fail.
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Re-dirty pages on I/O error
  2008-09-11  8:37 Lachlan McIlroy
@ 2008-09-11 10:33 ` Christoph Hellwig
  2008-09-11 21:21   ` Eric Sandeen
  2008-09-12  6:04   ` Lachlan McIlroy
  2008-09-13  4:19 ` Dave Chinner
  1 sibling, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2008-09-11 10:33 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-dev, xfs-oss

On Thu, Sep 11, 2008 at 06:37:33PM +1000, Lachlan McIlroy wrote:
> If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
> we throw away the dirty page without converting the delayed allocation.  This
> leaves delayed allocations that can never be removed and confuses code that
> expects a flush of the file to clear them.  We need to re-dirty the page on
> error so we can try again later or report that the flush failed.
>
> --- a/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 16:32:11.000000000 +1000
> +++ b/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 15:44:09.000000000 +1000
> @@ -1147,16 +1147,6 @@ error:
> 	if (iohead)
> 		xfs_cancel_ioend(iohead);
>
> -	/*
> -	 * If it's delalloc and we have nowhere to put it,
> -	 * throw it away, unless the lower layers told
> -	 * us to try again.
> -	 */
> -	if (err != -EAGAIN) {
> -		if (!unmapped)
> -			block_invalidatepage(page, 0);
> -		ClearPageUptodate(page);
> -	}

While this always looked fishy to me we it needs a good explanation to
kill this.  I try to remember why Steve did it this way long time ago.

> @@ -1216,8 +1206,11 @@ xfs_vm_writepage(
> 	 * then mark the page dirty again and leave the page
> 	 * as is.
> 	 */
> -	if (current_test_flags(PF_FSTRANS) && need_trans)
> -		goto out_fail;
> +	if (current_test_flags(PF_FSTRANS) && need_trans) {
> +		redirty_page_for_writepage(wbc, page);
> +		unlock_page(page);
> +		return -EAGAIN;

The redirty, unlock, return sequence is duplicated after your
patch, I think we should still keep the out_fail goto.  Also returning
-EGAIN from ->writepage is wrong.  The return values goes through
handle_write_error and mapping_set_error into the return value of e.g.
msync.  If you look at all similar writepage implementation they only
return a negative error for a real error condition and simply return 0
when just redirtying it due to transaction constraints or when trylocks
fail.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] Re-dirty pages on I/O error
@ 2008-09-11  8:37 Lachlan McIlroy
  2008-09-11 10:33 ` Christoph Hellwig
  2008-09-13  4:19 ` Dave Chinner
  0 siblings, 2 replies; 14+ messages in thread
From: Lachlan McIlroy @ 2008-09-11  8:37 UTC (permalink / raw)
  To: xfs-dev, xfs-oss

If we get an error in xfs_page_state_convert() - and it's not EAGAIN - then
we throw away the dirty page without converting the delayed allocation.  This
leaves delayed allocations that can never be removed and confuses code that
expects a flush of the file to clear them.  We need to re-dirty the page on
error so we can try again later or report that the flush failed.

--- a/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 16:32:11.000000000 +1000
+++ b/fs/xfs/linux-2.6/xfs_aops.c	2008-09-11 15:44:09.000000000 +1000
@@ -1147,16 +1147,6 @@ error:
 	if (iohead)
 		xfs_cancel_ioend(iohead);
 
-	/*
-	 * If it's delalloc and we have nowhere to put it,
-	 * throw it away, unless the lower layers told
-	 * us to try again.
-	 */
-	if (err != -EAGAIN) {
-		if (!unmapped)
-			block_invalidatepage(page, 0);
-		ClearPageUptodate(page);
-	}
 	return err;
 }
 
@@ -1216,8 +1206,11 @@ xfs_vm_writepage(
 	 * then mark the page dirty again and leave the page
 	 * as is.
 	 */
-	if (current_test_flags(PF_FSTRANS) && need_trans)
-		goto out_fail;
+	if (current_test_flags(PF_FSTRANS) && need_trans) {
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+		return -EAGAIN;
+	}
 
 	/*
 	 * Delay hooking up buffer heads until we have
@@ -1231,20 +1224,14 @@ xfs_vm_writepage(
 	 * to real space and flush out to disk.
 	 */
 	error = xfs_page_state_convert(inode, page, wbc, 1, unmapped);
-	if (error == -EAGAIN)
-		goto out_fail;
-	if (unlikely(error < 0))
-		goto out_unlock;
 
-	return 0;
+	if (error < 0) {
+		redirty_page_for_writepage(wbc, page);
+		unlock_page(page);
+		return error;
+	}
 
-out_fail:
-	redirty_page_for_writepage(wbc, page);
-	unlock_page(page);
 	return 0;
-out_unlock:
-	unlock_page(page);
-	return error;
 }
 
 STATIC int

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-02-15 20:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-10  1:48 [PATCH] Re-dirty pages on I/O error Lachlan McIlroy
2009-02-10 10:01 ` Dave Chinner
2009-02-10 23:33   ` Lachlan McIlroy
2009-02-15 20:05   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2008-09-11  8:37 Lachlan McIlroy
2008-09-11 10:33 ` Christoph Hellwig
2008-09-11 21:21   ` Eric Sandeen
2008-09-12  6:44     ` Lachlan McIlroy
2008-09-12 13:17       ` Eric Sandeen
2008-09-12  6:04   ` Lachlan McIlroy
2008-09-13  4:19 ` Dave Chinner
2008-09-15  3:22   ` Lachlan McIlroy
2008-09-16  4:01     ` Dave Chinner
2008-09-16  6:30       ` Lachlan McIlroy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.