All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs_buf_iodone_callbacks to force shutdown & resubmit buf in case of permanent failure
@ 2016-04-26 14:01 Shyam Kaushik
  2016-04-26 22:46 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Shyam Kaushik @ 2016-04-26 14:01 UTC (permalink / raw)
  To: xfs; +Cc: Alex Lyakas

When XFS underlying disk fails, it could take several milliseconds for the
FS to be marked
shutdown. xfs_buf_iodone_callbacks() retries buf upon first failure by
submitting it once
again. But if the buf fails 2nd time before FS is marked for shutdown, it
just releases the
buf with xfs_buf_relse(). This is flawed that nobody is releasing the
XFS_IFLOCK on the inode.
Because of this AIL tasks repeated effort to xfs_inode_item_push() will
see that xfs_iflock()
cannot be acquired. This blocks XFS from being unmounted as
xfs_ail_push_all_sync() will keep
looping without progress.

Fix this by marking the FS for shutdown if we have a permanent failure &
resubmit the buf.
xfs_buf_submit() will see FS marked for shutdown & invoke the callback
which releases
XFS_IFLOCK.

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 1a6c9b9..6f73ee0 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1100,7 +1100,12 @@ xfs_buf_iodone_callbacks(
                                       XBF_DONE | XBF_WRITE_FAIL;
                        xfs_buf_submit(bp);
                } else {
-                       xfs_buf_relse(bp);
+                       /*
+                       * if we have the buf fail 2nd time, force a FS
shutdown & resubmit
+                       * the buf for it to be failed back immediately
+                       */
+                       xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
+                       xfs_buf_submit(bp);
                }

                return;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs_buf_iodone_callbacks to force shutdown & resubmit buf in case of permanent failure
  2016-04-26 14:01 [PATCH] xfs_buf_iodone_callbacks to force shutdown & resubmit buf in case of permanent failure Shyam Kaushik
@ 2016-04-26 22:46 ` Dave Chinner
  2016-04-27  4:59   ` Shyam Kaushik
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2016-04-26 22:46 UTC (permalink / raw)
  To: Shyam Kaushik; +Cc: Alex Lyakas, xfs

On Tue, Apr 26, 2016 at 07:31:59PM +0530, Shyam Kaushik wrote:
> When XFS underlying disk fails, it could take several milliseconds for the
> FS to be marked
> shutdown. xfs_buf_iodone_callbacks() retries buf upon first failure by
> submitting it once
> again. But if the buf fails 2nd time before FS is marked for shutdown, it
> just releases the
> buf with xfs_buf_relse(). This is flawed that nobody is releasing the
> XFS_IFLOCK on the inode.
> Because of this AIL tasks repeated effort to xfs_inode_item_push() will
> see that xfs_iflock()
> cannot be acquired. This blocks XFS from being unmounted as
> xfs_ail_push_all_sync() will keep
> looping without progress.

Patch formatting first: commit messages should wrap at 68-72
columns...

> Fix this by marking the FS for shutdown if we have a permanent failure &
> resubmit the buf.
> xfs_buf_submit() will see FS marked for shutdown & invoke the callback
> which releases
> XFS_IFLOCK.
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 1a6c9b9..6f73ee0 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1100,7 +1100,12 @@ xfs_buf_iodone_callbacks(
>                                        XBF_DONE | XBF_WRITE_FAIL;
>                         xfs_buf_submit(bp);
>                 } else {
> -                       xfs_buf_relse(bp);
> +                       /*
> +                       * if we have the buf fail 2nd time, force a FS
> shutdown & resubmit
> +                       * the buf for it to be failed back immediately
> +                       */
> +                       xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> +                       xfs_buf_submit(bp);
>                 }
> 
>                 return;

... and patches should not be wrapped or have tabs converted to
spaces. See Documentation/SubmittingPatches.

Ok, so yes, it would seem there is a bug here, but I don't think
this is correct solution: this will shut down the filesystem
immediately on inode write IO errors. As i've said *many* times
before: this is might seem like a fix, but it is incorrect behaviour
as it does not give transient failures (e.g. multipath failover
causing timeouts) a chance to be resolved before shutting down the
filesystem.

That's the reason for all the retry behaviour on XFS metadata - the
system can continue to run logging new changes even if it can't
write back the changes immediately. As long as the log writes
continue to work, the filesystem can continue to function correctly.
Hence shutting down the filesystem immediately on any metadata write
error (other than a journal write) is premature and will lead to
spurious errors causing shutdowns rather than just logging a
warning.

We are in the process of making this error behaviour configurable -
Carlos is going to finish off the patchset I originally wrote to do
this, so the "shutdown immediately" option will be avaialable
through that set of interfaces. We'll still need to fix the unlock
case for retry here, though...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH] xfs_buf_iodone_callbacks to force shutdown & resubmit buf in case of permanent failure
  2016-04-26 22:46 ` Dave Chinner
@ 2016-04-27  4:59   ` Shyam Kaushik
  2016-04-27 23:53     ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Shyam Kaushik @ 2016-04-27  4:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Alex Lyakas, xfs

Hi Dave,

I am not sure how to do lock release in this code path. Is it possible
that you can take over this bug/patch? Thanks.

--Shyam

-----Original Message-----
From: Dave Chinner [mailto:david@fromorbit.com]
Sent: 27 April 2016 04:16
To: Shyam Kaushik
Cc: xfs@oss.sgi.com; Alex Lyakas
Subject: Re: [PATCH] xfs_buf_iodone_callbacks to force shutdown & resubmit
buf in case of permanent failure

On Tue, Apr 26, 2016 at 07:31:59PM +0530, Shyam Kaushik wrote:
> When XFS underlying disk fails, it could take several milliseconds for
the
> FS to be marked
> shutdown. xfs_buf_iodone_callbacks() retries buf upon first failure by
> submitting it once
> again. But if the buf fails 2nd time before FS is marked for shutdown,
it
> just releases the
> buf with xfs_buf_relse(). This is flawed that nobody is releasing the
> XFS_IFLOCK on the inode.
> Because of this AIL tasks repeated effort to xfs_inode_item_push() will
> see that xfs_iflock()
> cannot be acquired. This blocks XFS from being unmounted as
> xfs_ail_push_all_sync() will keep
> looping without progress.

Patch formatting first: commit messages should wrap at 68-72
columns...

> Fix this by marking the FS for shutdown if we have a permanent failure &
> resubmit the buf.
> xfs_buf_submit() will see FS marked for shutdown & invoke the callback
> which releases
> XFS_IFLOCK.
>
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 1a6c9b9..6f73ee0 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1100,7 +1100,12 @@ xfs_buf_iodone_callbacks(
>                                        XBF_DONE | XBF_WRITE_FAIL;
>                         xfs_buf_submit(bp);
>                 } else {
> -                       xfs_buf_relse(bp);
> +                       /*
> +                       * if we have the buf fail 2nd time, force a FS
> shutdown & resubmit
> +                       * the buf for it to be failed back immediately
> +                       */
> +                       xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> +                       xfs_buf_submit(bp);
>                 }
>
>                 return;

... and patches should not be wrapped or have tabs converted to
spaces. See Documentation/SubmittingPatches.

Ok, so yes, it would seem there is a bug here, but I don't think
this is correct solution: this will shut down the filesystem
immediately on inode write IO errors. As i've said *many* times
before: this is might seem like a fix, but it is incorrect behaviour
as it does not give transient failures (e.g. multipath failover
causing timeouts) a chance to be resolved before shutting down the
filesystem.

That's the reason for all the retry behaviour on XFS metadata - the
system can continue to run logging new changes even if it can't
write back the changes immediately. As long as the log writes
continue to work, the filesystem can continue to function correctly.
Hence shutting down the filesystem immediately on any metadata write
error (other than a journal write) is premature and will lead to
spurious errors causing shutdowns rather than just logging a
warning.

We are in the process of making this error behaviour configurable -
Carlos is going to finish off the patchset I originally wrote to do
this, so the "shutdown immediately" option will be avaialable
through that set of interfaces. We'll still need to fix the unlock
case for retry here, though...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xfs_buf_iodone_callbacks to force shutdown & resubmit buf in case of permanent failure
  2016-04-27  4:59   ` Shyam Kaushik
@ 2016-04-27 23:53     ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2016-04-27 23:53 UTC (permalink / raw)
  To: Shyam Kaushik; +Cc: Alex Lyakas, xfs

On Wed, Apr 27, 2016 at 10:29:29AM +0530, Shyam Kaushik wrote:
> Hi Dave,
> 
> I am not sure how to do lock release in this code path. Is it possible
> that you can take over this bug/patch? Thanks.

To tell the truth, I haven't even thought about how to fix it yet.
I suspect that the callbacks will need to be run, but with a new
parameter passed to xfs_iflush_done to say "still dirty, unlock
only" so that it simply unlocks and removes each object from the
buffer callback list.

I've got other things that need my attention right now (e.g agfl
size problem), so it's going to be next week before I get to this...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-04-27 23:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-26 14:01 [PATCH] xfs_buf_iodone_callbacks to force shutdown & resubmit buf in case of permanent failure Shyam Kaushik
2016-04-26 22:46 ` Dave Chinner
2016-04-27  4:59   ` Shyam Kaushik
2016-04-27 23:53     ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.