* [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path
@ 2013-07-19 22:31 Chandra Seetharaman
  2013-07-22 18:53 ` Ben Myers
  0 siblings, 1 reply; 7+ messages in thread
From: Chandra Seetharaman @ 2013-07-19 22:31 UTC (permalink / raw)
  To: XFS mailing list


While testing and rearranging pquota/gquota code, I stumbled
on a xfs_shutdown() during a mount. But the mount just hung.

Debugged and found that there is a deadlock involving
&log->l_cilp->xc_ctx_lock.

It is in a code path where &log->l_cilp->xc_ctx_lock is first
acquired in read mode and some levels down the same semaphore
is being acquired in write mode causing a deadlock.

This is the stack:
xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
  xlog_print_tic_res
    xfs_force_shutdown
      xfs_log_force_umount
        xlog_cil_force
          xlog_cil_force_lsn
            xlog_cil_push_foreground
              xlog_cil_push - tries to acquire same semaphore in write mode

This patch fixes the deadlock by changing the reason code for
xfs_force_shutdown in xlog_print_tic_res() to SHUTDOWN_LOG_IO_ERROR.

SHUTDOWN_LOG_IO_ERROR is the right reason code to be set since
we are in the log path.

Thanks to Dave for suggesting this solution.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
---
 fs/xfs/xfs_log.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index d852a2b..bf89eb9 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1941,7 +1941,7 @@ xlog_print_tic_res(
 
 	xfs_alert_tag(mp, XFS_PTAG_LOGRES,
 		"xlog_write: reservation ran out. Need to up reservation");
-	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+	xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
 }
 
 /*
-- 
1.7.1



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path
  2013-07-19 22:31 [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path Chandra Seetharaman
@ 2013-07-22 18:53 ` Ben Myers
  0 siblings, 0 replies; 7+ messages in thread
From: Ben Myers @ 2013-07-22 18:53 UTC (permalink / raw)
  To: Chandra Seetharaman; +Cc: XFS mailing list

On Fri, Jul 19, 2013 at 05:31:38PM -0500, Chandra Seetharaman wrote:
> 
> While testing and rearranging pquota/gquota code, I stumbled
> on a xfs_shutdown() during a mount. But the mount just hung.
> 
> Debugged and found that there is a deadlock involving
> &log->l_cilp->xc_ctx_lock.
> 
> It is in a code path where &log->l_cilp->xc_ctx_lock is first
> acquired in read mode and some levels down the same semaphore
> is being acquired in write mode causing a deadlock.
> 
> This is the stack:
> xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
>   xlog_print_tic_res
>     xfs_force_shutdown
>       xfs_log_force_umount
>         xlog_cil_force
>           xlog_cil_force_lsn
>             xlog_cil_push_foreground
>               xlog_cil_push - tries to acquire same semaphore in write mode
> 
> This patch fixes the deadlock by changing the reason code for
> xfs_force_shutdown in xlog_print_tic_res() to SHUTDOWN_LOG_IO_ERROR.
> 
> SHUTDOWN_LOG_IO_ERROR is the right reason code to be set since
> we are in the log path.
> 
> Thanks to Dave for suggesting this solution.
> 
> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>

Looks fine.

Reviewed-by: Ben Myers <bpm@sgi.com>

Applied.


* Re: [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path
  2013-07-18  2:24     ` Dave Chinner
@ 2013-07-18 19:54       ` Chandra Seetharaman
  0 siblings, 0 replies; 7+ messages in thread
From: Chandra Seetharaman @ 2013-07-18 19:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS mailing list

On Thu, 2013-07-18 at 12:24 +1000, Dave Chinner wrote:
> On Wed, Jul 17, 2013 at 04:32:55PM -0500, Chandra Seetharaman wrote:
> > On Tue, 2013-07-16 at 10:54 +1000, Dave Chinner wrote:
> > > On Mon, Jul 15, 2013 at 05:52:34PM -0500, Chandra Seetharaman wrote:
> > > > While testing and rearranging my pquota/gquota code, I stumbled
> > > > on a xfs_shutdown() during a mount. But the mount just hung.
> > > > 
> > > > I debugged and found that there is a deadlock involving
> > > > &log->l_cilp->xc_ctx_lock.
> > > > 
> > > > It is in a code path where &log->l_cilp->xc_ctx_lock is first
> > > > acquired in read mode and some levels down the same semaphore
> > > > is being acquired in write mode causing a deadlock.
> > > > 
> > > > This is the stack:
> > > > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > > >   xlog_print_tic_res
> > > >     xfs_force_shutdown
> > > >       xfs_log_force_umount
> > > >         xlog_cil_force
> > > >           xlog_cil_force_lsn
> > > >             xlog_cil_push_foreground
> > > >               xlog_cil_push - tries to acquire same semaphore in write mode
> > > > 
> > > > This patch fixes the deadlock by not calling xfs_force_shutdown() while
> > > > > holding the semaphore, instead calling it after dropping the semaphore.
> > > > 
> > > > Thanks to Dave for suggesting this solution.
> > > > 
> > > > Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
> > > > 
> > > > ---
> > > >  fs/xfs/xfs_log.c      |    6 +++---
> > > >  fs/xfs/xfs_log_cil.c  |   10 ++++++----
> > > >  fs/xfs/xfs_log_priv.h |    2 +-
> > > >  fs/xfs/xfs_trans.c    |    2 +-
> > > >  4 files changed, 11 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > > index d852a2b..b9fa2da 100644
> > > > --- a/fs/xfs/xfs_log.c
> > > > +++ b/fs/xfs/xfs_log.c
> > > > @@ -1837,7 +1837,7 @@ xlog_state_finish_copy(
> > > >   * print out info relating to regions written which consume
> > > >   * the reservation
> > > >   */
> > > > -void
> > > > +int
> > > >  xlog_print_tic_res(
> > > >  	struct xfs_mount	*mp,
> > > >  	struct xlog_ticket	*ticket)
> > > > @@ -1941,7 +1941,7 @@ xlog_print_tic_res(
> > > >  
> > > >  	xfs_alert_tag(mp, XFS_PTAG_LOGRES,
> > > >  		"xlog_write: reservation ran out. Need to up reservation");
> > > > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > > +	return EFSCORRUPTED;
> > > 
> > > Note the "SHUTDOWN_CORRUPT_INCORE" reason given here....
> > > 
> > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > > index 35a2299..d96022f 100644
> > > > --- a/fs/xfs/xfs_trans.c
> > > > +++ b/fs/xfs/xfs_trans.c
> > > > @@ -1547,7 +1547,7 @@ xfs_trans_commit(
> > > >  	xfs_trans_apply_dquot_deltas(tp);
> > > >  
> > > >  	error = xfs_log_commit_cil(mp, tp, &commit_lsn, flags);
> > > > -	if (error == ENOMEM) {
> > > > +	if (error) {
> > > >  		xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> > > 
> > > Which is different to the reason given here. The shutdown reason
> > > should be maintained for this particular error....
> > 
> > I see.
> 
> What I mean is that the code in xfs_trans_commit() should do
> something like:
> 
> 	if (error) {
> 		int reason = SHUTDOWN_LOG_IO_ERROR;
> 		if (error == EFSCORRUPTED)
> 			reason = SHUTDOWN_CORRUPT_INCORE;
> 		xfs_force_shutdown(mp, reason);
> 		....
> 	}
> 
> > 
> > Is it ok if the error reason is not propagated to the xlog_write() code
> > path ?
> 
> No - if we get a transaction overflow, we need to trigger a
> shutdown. That means the error needs to be caught by the
> xlog_write() path and the filesystem shut down.
> 
> Looking at it more deeply, you could probably just change the
> shutdown in xlog_print_tic_res() to use SHUTDOWN_LOG_IO_ERROR and
> the problem is solved as the shutdown won't try to force the
> log. i.e. this whole problem will go away with that one line fix...

I am confused.

In the previous response you mentioned that we have to propagate the
reason as-is in xfs_trans_commit() path. But, the new suggestion you are
making will change the behavior of all paths and they will not enter
xfs_log_force_umount().

Besides, IIUC, XFS_MOUNT_FS_SHUTDOWN is set only in
xfs_log_force_umount(), so the very first time we enter
xlog_print_tic_res(), even with SHUTDOWN_LOG_IO_ERROR we will call
xfs_log_force_umount() which can lead to the deadlock we are trying to
avoid.

  
> 
> Cheers,
> 
> Dave.



* Re: [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path
  2013-07-17 21:32   ` Chandra Seetharaman
@ 2013-07-18  2:24     ` Dave Chinner
  2013-07-18 19:54       ` Chandra Seetharaman
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2013-07-18  2:24 UTC (permalink / raw)
  To: Chandra Seetharaman; +Cc: XFS mailing list

On Wed, Jul 17, 2013 at 04:32:55PM -0500, Chandra Seetharaman wrote:
> On Tue, 2013-07-16 at 10:54 +1000, Dave Chinner wrote:
> > On Mon, Jul 15, 2013 at 05:52:34PM -0500, Chandra Seetharaman wrote:
> > > While testing and rearranging my pquota/gquota code, I stumbled
> > > on a xfs_shutdown() during a mount. But the mount just hung.
> > > 
> > > I debugged and found that there is a deadlock involving
> > > &log->l_cilp->xc_ctx_lock.
> > > 
> > > It is in a code path where &log->l_cilp->xc_ctx_lock is first
> > > acquired in read mode and some levels down the same semaphore
> > > is being acquired in write mode causing a deadlock.
> > > 
> > > This is the stack:
> > > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > >   xlog_print_tic_res
> > >     xfs_force_shutdown
> > >       xfs_log_force_umount
> > >         xlog_cil_force
> > >           xlog_cil_force_lsn
> > >             xlog_cil_push_foreground
> > >               xlog_cil_push - tries to acquire same semaphore in write mode
> > > 
> > > This patch fixes the deadlock by not calling xfs_force_shutdown() while
> > > > holding the semaphore, instead calling it after dropping the semaphore.
> > > 
> > > Thanks to Dave for suggesting this solution.
> > > 
> > > Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
> > > 
> > > ---
> > >  fs/xfs/xfs_log.c      |    6 +++---
> > >  fs/xfs/xfs_log_cil.c  |   10 ++++++----
> > >  fs/xfs/xfs_log_priv.h |    2 +-
> > >  fs/xfs/xfs_trans.c    |    2 +-
> > >  4 files changed, 11 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index d852a2b..b9fa2da 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -1837,7 +1837,7 @@ xlog_state_finish_copy(
> > >   * print out info relating to regions written which consume
> > >   * the reservation
> > >   */
> > > -void
> > > +int
> > >  xlog_print_tic_res(
> > >  	struct xfs_mount	*mp,
> > >  	struct xlog_ticket	*ticket)
> > > @@ -1941,7 +1941,7 @@ xlog_print_tic_res(
> > >  
> > >  	xfs_alert_tag(mp, XFS_PTAG_LOGRES,
> > >  		"xlog_write: reservation ran out. Need to up reservation");
> > > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > +	return EFSCORRUPTED;
> > 
> > Note the "SHUTDOWN_CORRUPT_INCORE" reason given here....
> > 
> > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > index 35a2299..d96022f 100644
> > > --- a/fs/xfs/xfs_trans.c
> > > +++ b/fs/xfs/xfs_trans.c
> > > @@ -1547,7 +1547,7 @@ xfs_trans_commit(
> > >  	xfs_trans_apply_dquot_deltas(tp);
> > >  
> > >  	error = xfs_log_commit_cil(mp, tp, &commit_lsn, flags);
> > > -	if (error == ENOMEM) {
> > > +	if (error) {
> > >  		xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> > 
> > Which is different to the reason given here. The shutdown reason
> > should be maintained for this particular error....
> 
> I see.

What I mean is that the code in xfs_trans_commit() should do
something like:

	if (error) {
		int reason = SHUTDOWN_LOG_IO_ERROR;
		if (error == EFSCORRUPTED)
			reason = SHUTDOWN_CORRUPT_INCORE;
		xfs_force_shutdown(mp, reason);
		....
	}
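
As a standalone illustration of the mapping suggested above, here is a minimal sketch. The flag values and `TOY_EFSCORRUPTED` are placeholders chosen for this example, not the kernel's definitions, and `shutdown_reason_for()` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder flag values for this sketch; not copied from the kernel headers. */
#define SHUTDOWN_LOG_IO_ERROR	0x1
#define SHUTDOWN_CORRUPT_INCORE	0x2

/* Stand-in for EFSCORRUPTED; the value is arbitrary and only used here. */
#define TOY_EFSCORRUPTED	990

/*
 * Pick the shutdown reason for a failed commit: default to a log I/O error,
 * but keep the "corrupt in-core" reason when the error says so.
 */
static int shutdown_reason_for(int error)
{
	assert(error != 0);	/* only meaningful on the error path */

	if (error == TOY_EFSCORRUPTED)
		return SHUTDOWN_CORRUPT_INCORE;
	return SHUTDOWN_LOG_IO_ERROR;
}
```

With this helper, a reservation overrun reported as `TOY_EFSCORRUPTED` keeps its original shutdown reason, while any other error such as `ENOMEM` falls back to `SHUTDOWN_LOG_IO_ERROR`.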

> 
> Is it ok if the error reason is not propagated to the xlog_write() code
> path ?

No - if we get a transaction overflow, we need to trigger a
shutdown. That means the error needs to be caught by the
xlog_write() path and the filesystem shut down.

Looking at it more deeply, you could probably just change the
shutdown in xlog_print_tic_res() to use SHUTDOWN_LOG_IO_ERROR and
the problem is solved as the shutdown won't try to force the
log. i.e. this whole problem will go away with that one line fix...
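
To make the one-line idea concrete, here is a toy model of why the reason code matters. It rests only on the statement above that a shutdown with `SHUTDOWN_LOG_IO_ERROR` will not try to force the log; the struct, the `toy_*` functions and the flag values are all hypothetical and are not the real `xfs_force_shutdown()` or `xfs_log_force_umount()` code.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this sketch; not copied from the kernel headers. */
#define SHUTDOWN_LOG_IO_ERROR	0x1
#define SHUTDOWN_CORRUPT_INCORE	0x2

struct toy_mount {
	bool	ctx_lock_read_held;	/* caller holds xc_ctx_lock in read mode */
	bool	shut_down;		/* shutdown has completed */
	bool	would_deadlock;		/* log force ran into the held lock */
};

/* Stand-in for the xlog_cil_force() ... xlog_cil_push() chain. */
static void toy_cil_force(struct toy_mount *mp)
{
	if (mp->ctx_lock_read_held)
		mp->would_deadlock = true;
}

/* Stand-in for xfs_force_shutdown() reaching the log shutdown code. */
static void toy_force_shutdown(struct toy_mount *mp, int reason)
{
	bool logerror = (reason & SHUTDOWN_LOG_IO_ERROR) != 0;

	assert(mp);
	if (!logerror)
		toy_cil_force(mp);	/* only non-log errors force the log */
	mp->shut_down = true;
}
```

In this model the log shutdown code is still entered for either reason, but only the non-log reason goes on to force the CIL, which is the step that needs the write lock.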

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path
  2013-07-16  0:54 ` Dave Chinner
@ 2013-07-17 21:32   ` Chandra Seetharaman
  2013-07-18  2:24     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Chandra Seetharaman @ 2013-07-17 21:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS mailing list

On Tue, 2013-07-16 at 10:54 +1000, Dave Chinner wrote:
> On Mon, Jul 15, 2013 at 05:52:34PM -0500, Chandra Seetharaman wrote:
> > While testing and rearranging my pquota/gquota code, I stumbled
> > on a xfs_shutdown() during a mount. But the mount just hung.
> > 
> > I debugged and found that there is a deadlock involving
> > &log->l_cilp->xc_ctx_lock.
> > 
> > It is in a code path where &log->l_cilp->xc_ctx_lock is first
> > acquired in read mode and some levels down the same semaphore
> > is being acquired in write mode causing a deadlock.
> > 
> > This is the stack:
> > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> >   xlog_print_tic_res
> >     xfs_force_shutdown
> >       xfs_log_force_umount
> >         xlog_cil_force
> >           xlog_cil_force_lsn
> >             xlog_cil_push_foreground
> >               xlog_cil_push - tries to acquire same semaphore in write mode
> > 
> > This patch fixes the deadlock by not calling xfs_force_shutdown() while
> > > holding the semaphore, instead calling it after dropping the semaphore.
> > 
> > Thanks to Dave for suggesting this solution.
> > 
> > Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
> > 
> > ---
> >  fs/xfs/xfs_log.c      |    6 +++---
> >  fs/xfs/xfs_log_cil.c  |   10 ++++++----
> >  fs/xfs/xfs_log_priv.h |    2 +-
> >  fs/xfs/xfs_trans.c    |    2 +-
> >  4 files changed, 11 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index d852a2b..b9fa2da 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -1837,7 +1837,7 @@ xlog_state_finish_copy(
> >   * print out info relating to regions written which consume
> >   * the reservation
> >   */
> > -void
> > +int
> >  xlog_print_tic_res(
> >  	struct xfs_mount	*mp,
> >  	struct xlog_ticket	*ticket)
> > @@ -1941,7 +1941,7 @@ xlog_print_tic_res(
> >  
> >  	xfs_alert_tag(mp, XFS_PTAG_LOGRES,
> >  		"xlog_write: reservation ran out. Need to up reservation");
> > -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > +	return EFSCORRUPTED;
> 
> Note the "SHUTDOWN_CORRUPT_INCORE" reason given here....
> 
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index 35a2299..d96022f 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -1547,7 +1547,7 @@ xfs_trans_commit(
> >  	xfs_trans_apply_dquot_deltas(tp);
> >  
> >  	error = xfs_log_commit_cil(mp, tp, &commit_lsn, flags);
> > -	if (error == ENOMEM) {
> > +	if (error) {
> >  		xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> 
> Which is different to the reason given here. The shutdown reason
> should be maintained for this particular error....

I see.

Is it ok if the error reason is not propagated to the xlog_write() code
path ?

> 
> Cheers,
> 
> Dave.



* Re: [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path
  2013-07-15 22:52 Chandra Seetharaman
@ 2013-07-16  0:54 ` Dave Chinner
  2013-07-17 21:32   ` Chandra Seetharaman
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2013-07-16  0:54 UTC (permalink / raw)
  To: Chandra Seetharaman; +Cc: XFS mailing list

On Mon, Jul 15, 2013 at 05:52:34PM -0500, Chandra Seetharaman wrote:
> While testing and rearranging my pquota/gquota code, I stumbled
> on a xfs_shutdown() during a mount. But the mount just hung.
> 
> I debugged and found that there is a deadlock involving
> &log->l_cilp->xc_ctx_lock.
> 
> It is in a code path where &log->l_cilp->xc_ctx_lock is first
> acquired in read mode and some levels down the same semaphore
> is being acquired in write mode causing a deadlock.
> 
> This is the stack:
> xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
>   xlog_print_tic_res
>     xfs_force_shutdown
>       xfs_log_force_umount
>         xlog_cil_force
>           xlog_cil_force_lsn
>             xlog_cil_push_foreground
>               xlog_cil_push - tries to acquire same semaphore in write mode
> 
> This patch fixes the deadlock by not calling xfs_force_shutdown() while
> holding the semaphore, instead calling it after dropping the semaphore.
> 
> Thanks to Dave for suggesting this solution.
> 
> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
> 
> ---
>  fs/xfs/xfs_log.c      |    6 +++---
>  fs/xfs/xfs_log_cil.c  |   10 ++++++----
>  fs/xfs/xfs_log_priv.h |    2 +-
>  fs/xfs/xfs_trans.c    |    2 +-
>  4 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index d852a2b..b9fa2da 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1837,7 +1837,7 @@ xlog_state_finish_copy(
>   * print out info relating to regions written which consume
>   * the reservation
>   */
> -void
> +int
>  xlog_print_tic_res(
>  	struct xfs_mount	*mp,
>  	struct xlog_ticket	*ticket)
> @@ -1941,7 +1941,7 @@ xlog_print_tic_res(
>  
>  	xfs_alert_tag(mp, XFS_PTAG_LOGRES,
>  		"xlog_write: reservation ran out. Need to up reservation");
> -	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> +	return EFSCORRUPTED;

Note the "SHUTDOWN_CORRUPT_INCORE" reason given here....

> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 35a2299..d96022f 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -1547,7 +1547,7 @@ xfs_trans_commit(
>  	xfs_trans_apply_dquot_deltas(tp);
>  
>  	error = xfs_log_commit_cil(mp, tp, &commit_lsn, flags);
> -	if (error == ENOMEM) {
> +	if (error) {
>  		xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);

Which is different to the reason given here. The shutdown reason
should be maintained for this particular error....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path
@ 2013-07-15 22:52 Chandra Seetharaman
  2013-07-16  0:54 ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Chandra Seetharaman @ 2013-07-15 22:52 UTC (permalink / raw)
  To: XFS mailing list

While testing and rearranging my pquota/gquota code, I stumbled
on a xfs_shutdown() during a mount. But the mount just hung.

I debugged and found that there is a deadlock involving
&log->l_cilp->xc_ctx_lock.

It is in a code path where &log->l_cilp->xc_ctx_lock is first
acquired in read mode and some levels down the same semaphore
is being acquired in write mode causing a deadlock.

This is the stack:
xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
  xlog_print_tic_res
    xfs_force_shutdown
      xfs_log_force_umount
        xlog_cil_force
          xlog_cil_force_lsn
            xlog_cil_push_foreground
              xlog_cil_push - tries to acquire same semaphore in write mode

This patch fixes the deadlock by not calling xfs_force_shutdown() while
holding the semaphore, instead calling it after dropping the semaphore.

Thanks to Dave for suggesting this solution.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>

---
 fs/xfs/xfs_log.c      |    6 +++---
 fs/xfs/xfs_log_cil.c  |   10 ++++++----
 fs/xfs/xfs_log_priv.h |    2 +-
 fs/xfs/xfs_trans.c    |    2 +-
 4 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index d852a2b..b9fa2da 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1837,7 +1837,7 @@ xlog_state_finish_copy(
  * print out info relating to regions written which consume
  * the reservation
  */
-void
+int
 xlog_print_tic_res(
 	struct xfs_mount	*mp,
 	struct xlog_ticket	*ticket)
@@ -1941,7 +1941,7 @@ xlog_print_tic_res(
 
 	xfs_alert_tag(mp, XFS_PTAG_LOGRES,
 		"xlog_write: reservation ran out. Need to up reservation");
-	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+	return EFSCORRUPTED;
 }
 
 /*
@@ -2215,7 +2215,7 @@ xlog_write(
 		ticket->t_curr_res -= sizeof(xlog_op_header_t);
 
 	if (ticket->t_curr_res < 0)
-		xlog_print_tic_res(log->l_mp, ticket);
+		return xlog_print_tic_res(log->l_mp, ticket);
 
 	index = 0;
 	lv = log_vector;
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 02b9cf3..93ba7bd 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -730,10 +730,6 @@ xfs_log_commit_cil(
 	/* xlog_cil_insert_items() destroys log_vector list */
 	xlog_cil_insert_items(log, log_vector, tp->t_ticket);
 
-	/* check we didn't blow the reservation */
-	if (tp->t_ticket->t_curr_res < 0)
-		xlog_print_tic_res(log->l_mp, tp->t_ticket);
-
 	/* attach the transaction to the CIL if it has any busy extents */
 	if (!list_empty(&tp->t_busy)) {
 		spin_lock(&log->l_cilp->xc_cil_lock);
@@ -742,6 +738,12 @@ xfs_log_commit_cil(
 		spin_unlock(&log->l_cilp->xc_cil_lock);
 	}
 
+	/* check we didn't blow the reservation */
+	if (tp->t_ticket->t_curr_res < 0) {
+		up_read(&log->l_cilp->xc_ctx_lock);
+		return xlog_print_tic_res(log->l_mp, tp->t_ticket);
+	}
+
 	tp->t_commit_lsn = *commit_lsn;
 	xfs_log_done(mp, tp->t_ticket, NULL, log_flags);
 	xfs_trans_unreserve_and_mod_sb(tp);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index b9ea262..4f2fa6d 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -576,7 +576,7 @@ xlog_write_adv_cnt(void **ptr, int *len, int *off, size_t bytes)
 	*off += bytes;
 }
 
-void	xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
+int	xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
 int
 xlog_write(
 	struct xlog		*log,
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 35a2299..d96022f 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1547,7 +1547,7 @@ xfs_trans_commit(
 	xfs_trans_apply_dquot_deltas(tp);
 
 	error = xfs_log_commit_cil(mp, tp, &commit_lsn, flags);
-	if (error == ENOMEM) {
+	if (error) {
 		xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
 		error = XFS_ERROR(EIO);
 		goto out_unreserve;


