All of lore.kernel.org
 help / color / mirror / Atom feed
* deadlock with &log->l_cilp->xc_ctx_lock semaphone
@ 2013-05-22 23:12 Chandra Seetharaman
  2013-05-22 23:41 ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Chandra Seetharaman @ 2013-05-22 23:12 UTC (permalink / raw)
  To: XFS mailing list

Hello,

While testing and rearranging my pquota/gquota code, I stumbled on a
xfs_shutdown() during a mount. But the mount just hung.

I debugged and found that it is in a code path where
&log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
down the same semaphore is being acquired in write mode causing a
deadlock.

This is the stack:
xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
  xlog_print_tic_res
    xfs_force_shutdown
      xfs_log_force_umount
        xlog_cil_force
          xlog_cil_force_lsn
            xlog_cil_push_foreground
              xlog_cil_push - tries to acquire same semaphore in write mode
-------------------------

Here is the actual stack from /var/log/messages:

-------------------------
schedule+0x29/0x70
rwsem_down_write_failed+0xf5/0x1a0
call_rwsem_down_write_failed+0x13/0x20
? down_write+0x31/0x40
xlog_cil_push+0xd6/0x3e0 [xfs]
? __cond_resched+0x2a/0x40
xlog_cil_force_lsn+0x154/0x160 [xfs]
? __xfs_printk+0x31/0x50 [xfs]
xfs_log_force_umount+0x5c/0x170 [xfs]
xfs_do_force_shutdown+0x6b/0x170 [xfs]
? xlog_print_tic_res+0x112/0x120 [xfs]
xlog_print_tic_res+0x112/0x120 [xfs]
xfs_log_commit_cil+0x336/0x4a0 [xfs]
xfs_trans_commit+0x79/0x270 [xfs]  
xfs_qm_write_sb_changes+0x61/0x90 [xfs]
xfs_qm_mount_quotas+0x82/0x180 [xfs]
xfs_mountfs+0x5f6/0x6b0 [xfs]
xfs_fs_fill_super+0x2af/0x330 [xfs]
mount_bdev+0x1be/0x200
? __free_pages+0x2d/0x40
? xfs_setup_devices+0xa0/0xa0 [xfs]
? selinux_sb_copy_data+0x14a/0x1e0 
xfs_fs_mount+0x15/0x20 [xfs]
mount_fs+0x43/0x1a0
vfs_kern_mount+0x72/0x100
do_mount+0x3e3/0x980
SyS_mount+0x90/0xe0
tracesys+0xdd/0xe2

--------------------------



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
  2013-05-22 23:12 deadlock with &log->l_cilp->xc_ctx_lock semaphone Chandra Seetharaman
@ 2013-05-22 23:41 ` Dave Chinner
  2013-05-23 18:09   ` Chandra Seetharaman
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Dave Chinner @ 2013-05-22 23:41 UTC (permalink / raw)
  To: Chandra Seetharaman; +Cc: XFS mailing list

On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> Hello,
> 
> While testing and rearranging my pquota/gquota code, I stumbled on a
> xfs_shutdown() during a mount. But the mount just hung.
> 
> I debugged and found that it is in a code path where
> &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> down the same semaphore is being acquired in write mode causing a
> deadlock.
> 
> This is the stack:
> xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
>   xlog_print_tic_res
>     xfs_force_shutdown
>       xfs_log_force_umount
>         xlog_cil_force
>           xlog_cil_force_lsn
>             xlog_cil_push_foreground
>               xlog_cil_push - tries to acquire same semaphore in write mode

Which means you had a transaction reservation overrun. Is it
reproducable? iDo you have the output from xlog_print_tic_res()?
Because:

> xfs_trans_commit+0x79/0x270 [xfs]  
> xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> xfs_qm_mount_quotas+0x82/0x180 [xfs]
> xfs_mountfs+0x5f6/0x6b0 [xfs]

This transaction only modifies the superblock, and it has a buffer
reservation for a superblock sized buffer, and hence should never
overrun.

IOWs, I'm ifar more concerned about the fact there was a
transaction overrun than they was a hang in the path that handles
the overrun. The fact this hang has been there since 2.6.35 tells
you how rare transactions overruns are....

FWIW, the fix for the hang is to make xlog_print_tic_res() return an
error and have the caller handle the shutdown.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
  2013-05-22 23:41 ` Dave Chinner
@ 2013-05-23 18:09   ` Chandra Seetharaman
  2013-05-23 23:42     ` Dave Chinner
  2013-05-24 21:44   ` Chandra Seetharaman
  2013-06-18 23:18   ` Chandra Seetharaman
  2 siblings, 1 reply; 8+ messages in thread
From: Chandra Seetharaman @ 2013-05-23 18:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS mailing list

On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > Hello,
> > 
> > While testing and rearranging my pquota/gquota code, I stumbled on a
> > xfs_shutdown() during a mount. But the mount just hung.
> > 
> > I debugged and found that it is in a code path where
> > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > down the same semaphore is being acquired in write mode causing a
> > deadlock.
> > 
> > This is the stack:
> > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> >   xlog_print_tic_res
> >     xfs_force_shutdown
> >       xfs_log_force_umount
> >         xlog_cil_force
> >           xlog_cil_force_lsn
> >             xlog_cil_push_foreground
> >               xlog_cil_push - tries to acquire same semaphore in write mode
> 
> Which means you had a transaction reservation overrun. Is it
> reproducable? iDo you have the output from xlog_print_tic_res()?
> Because:

Here it is:

May 23 10:48:52 test46 kernel: [   77.500728] XFS (sdh8): xlog_write: reservation summary:
May 23 10:48:52 test46 kernel: [   77.500728]   trans type  = QM_SBCHANGE (26)
May 23 10:48:52 test46 kernel: [   77.500728]   unit res    = 2740 bytes
May 23 10:48:52 test46 kernel: [   77.500728]   current res = -48 bytes
May 23 10:48:52 test46 kernel: [   77.500728]   total reg   = 0 bytes (o/flow = 0 bytes)
May 23 10:48:52 test46 kernel: [   77.500728]   ophdrs      = 0 (ophdr space = 0 bytes)
May 23 10:48:52 test46 kernel: [   77.500728]   ophdr + reg = 0 bytes
May 23 10:48:52 test46 kernel: [   77.500728]   num regions = 0
May 23 10:48:52 test46 kernel: [   77.500728]

Yes. I can readily reproduce the problem, but it is with my mangled up
patchsets :). There is a small change that makes this problem reproduce
consistently.
> 
> > xfs_trans_commit+0x79/0x270 [xfs]  
> > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > xfs_mountfs+0x5f6/0x6b0 [xfs]
> 
> This transaction only modifies the superblock, and it has a buffer
> reservation for a superblock sized buffer, and hence should never
> overrun.
> 
> IOWs, I'm ifar more concerned about the fact there was a
> transaction overrun than they was a hang in the path that handles

As I mentioned above, it may be a manipulation of my patch entanglement.

> the overrun. The fact this hang has been there since 2.6.35 tells
> you how rare transactions overruns are....
> 
> FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> error and have the caller handle the shutdown.
> 
> Cheers,
> 
> Dave.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
  2013-05-23 18:09   ` Chandra Seetharaman
@ 2013-05-23 23:42     ` Dave Chinner
  2013-05-24 21:41       ` Chandra Seetharaman
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2013-05-23 23:42 UTC (permalink / raw)
  To: Chandra Seetharaman; +Cc: XFS mailing list

On Thu, May 23, 2013 at 01:09:02PM -0500, Chandra Seetharaman wrote:
> On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> > On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > > Hello,
> > > 
> > > While testing and rearranging my pquota/gquota code, I stumbled on a
> > > xfs_shutdown() during a mount. But the mount just hung.
> > > 
> > > I debugged and found that it is in a code path where
> > > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > > down the same semaphore is being acquired in write mode causing a
> > > deadlock.
> > > 
> > > This is the stack:
> > > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > >   xlog_print_tic_res
> > >     xfs_force_shutdown
> > >       xfs_log_force_umount
> > >         xlog_cil_force
> > >           xlog_cil_force_lsn
> > >             xlog_cil_push_foreground
> > >               xlog_cil_push - tries to acquire same semaphore in write mode
> > 
> > Which means you had a transaction reservation overrun. Is it
> > reproducable? iDo you have the output from xlog_print_tic_res()?
> > Because:
> 
> Here it is:
> 
> May 23 10:48:52 test46 kernel: [   77.500728] XFS (sdh8): xlog_write: reservation summary:
> May 23 10:48:52 test46 kernel: [   77.500728]   trans type  = QM_SBCHANGE (26)
> May 23 10:48:52 test46 kernel: [   77.500728]   unit res    = 2740 bytes
> May 23 10:48:52 test46 kernel: [   77.500728]   current res = -48 bytes
> May 23 10:48:52 test46 kernel: [   77.500728]   total reg   = 0 bytes (o/flow = 0 bytes)
> May 23 10:48:52 test46 kernel: [   77.500728]   ophdrs      = 0 (ophdr space = 0 bytes)
> May 23 10:48:52 test46 kernel: [   77.500728]   ophdr + reg = 0 bytes
> May 23 10:48:52 test46 kernel: [   77.500728]   num regions = 0
> May 23 10:48:52 test46 kernel: [   77.500728]
> 
> Yes. I can readily reproduce the problem, but it is with my mangled up
> patchsets :). There is a small change that makes this problem reproduce
> consistently.

Interesting. That implies that the CIL stole the reservation for the
checkpoint headers from this reservation, and then it overran by 48
bytes. An increase in the number of quotas should not affect this.

What is the xfs_info output on the filesystem that is triggering
this?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
  2013-05-23 23:42     ` Dave Chinner
@ 2013-05-24 21:41       ` Chandra Seetharaman
  0 siblings, 0 replies; 8+ messages in thread
From: Chandra Seetharaman @ 2013-05-24 21:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS mailing list

On Fri, 2013-05-24 at 09:42 +1000, Dave Chinner wrote:
> On Thu, May 23, 2013 at 01:09:02PM -0500, Chandra Seetharaman wrote:
> > On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> > > On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > > > Hello,
> > > > 
> > > > While testing and rearranging my pquota/gquota code, I stumbled on a
> > > > xfs_shutdown() during a mount. But the mount just hung.
> > > > 
> > > > I debugged and found that it is in a code path where
> > > > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > > > down the same semaphore is being acquired in write mode causing a
> > > > deadlock.
> > > > 
> > > > This is the stack:
> > > > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > > >   xlog_print_tic_res
> > > >     xfs_force_shutdown
> > > >       xfs_log_force_umount
> > > >         xlog_cil_force
> > > >           xlog_cil_force_lsn
> > > >             xlog_cil_push_foreground
> > > >               xlog_cil_push - tries to acquire same semaphore in write mode
> > > 
> > > Which means you had a transaction reservation overrun. Is it
> > > reproducable? iDo you have the output from xlog_print_tic_res()?
> > > Because:
> > 
> > Here it is:
> > 
> > May 23 10:48:52 test46 kernel: [   77.500728] XFS (sdh8): xlog_write: reservation summary:
> > May 23 10:48:52 test46 kernel: [   77.500728]   trans type  = QM_SBCHANGE (26)
> > May 23 10:48:52 test46 kernel: [   77.500728]   unit res    = 2740 bytes
> > May 23 10:48:52 test46 kernel: [   77.500728]   current res = -48 bytes
> > May 23 10:48:52 test46 kernel: [   77.500728]   total reg   = 0 bytes (o/flow = 0 bytes)
> > May 23 10:48:52 test46 kernel: [   77.500728]   ophdrs      = 0 (ophdr space = 0 bytes)
> > May 23 10:48:52 test46 kernel: [   77.500728]   ophdr + reg = 0 bytes
> > May 23 10:48:52 test46 kernel: [   77.500728]   num regions = 0
> > May 23 10:48:52 test46 kernel: [   77.500728]
> > 
> > Yes. I can readily reproduce the problem, but it is with my mangled up
> > patchsets :). There is a small change that makes this problem reproduce
> > consistently.
> 
> Interesting. That implies that the CIL stole the reservation for the
> checkpoint headers from this reservation, and then it overran by 48
> bytes. An increase in the number of quotas should not affect this.
> 
> What is the xfs_info output on the filesystem that is triggering
> this?

I have the same set of patches, but it is not happening any more :(. I
will keep trying.
> 
> Cheers,
> 
> Dave.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
  2013-05-22 23:41 ` Dave Chinner
  2013-05-23 18:09   ` Chandra Seetharaman
@ 2013-05-24 21:44   ` Chandra Seetharaman
  2013-06-18 23:18   ` Chandra Seetharaman
  2 siblings, 0 replies; 8+ messages in thread
From: Chandra Seetharaman @ 2013-05-24 21:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS mailing list

On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > Hello,
> > 
> > While testing and rearranging my pquota/gquota code, I stumbled on a
> > xfs_shutdown() during a mount. But the mount just hung.
> > 
> > I debugged and found that it is in a code path where
> > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > down the same semaphore is being acquired in write mode causing a
> > deadlock.
> > 
> > This is the stack:
> > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> >   xlog_print_tic_res
> >     xfs_force_shutdown
> >       xfs_log_force_umount
> >         xlog_cil_force
> >           xlog_cil_force_lsn
> >             xlog_cil_push_foreground
> >               xlog_cil_push - tries to acquire same semaphore in write mode
> 
> Which means you had a transaction reservation overrun. Is it
> reproducable? iDo you have the output from xlog_print_tic_res()?
> Because:
> 
> > xfs_trans_commit+0x79/0x270 [xfs]  
> > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > xfs_mountfs+0x5f6/0x6b0 [xfs]
> 
> This transaction only modifies the superblock, and it has a buffer
> reservation for a superblock sized buffer, and hence should never
> overrun.
> 
> IOWs, I'm ifar more concerned about the fact there was a
> transaction overrun than they was a hang in the path that handles
> the overrun. The fact this hang has been there since 2.6.35 tells
> you how rare transactions overruns are....
> 
> FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> error and have the caller handle the shutdown.

Can I wait to call shutdown till I finish with the read mode semaphore ?

> 
> Cheers,
> 
> Dave.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
  2013-05-22 23:41 ` Dave Chinner
  2013-05-23 18:09   ` Chandra Seetharaman
  2013-05-24 21:44   ` Chandra Seetharaman
@ 2013-06-18 23:18   ` Chandra Seetharaman
  2013-06-20 21:06     ` Dave Chinner
  2 siblings, 1 reply; 8+ messages in thread
From: Chandra Seetharaman @ 2013-06-18 23:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS mailing list

On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > Hello,
> > 
> > While testing and rearranging my pquota/gquota code, I stumbled on a
> > xfs_shutdown() during a mount. But the mount just hung.
> > 
> > I debugged and found that it is in a code path where
> > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > down the same semaphore is being acquired in write mode causing a
> > deadlock.
> > 
> > This is the stack:
> > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> >   xlog_print_tic_res
> >     xfs_force_shutdown
> >       xfs_log_force_umount
> >         xlog_cil_force
> >           xlog_cil_force_lsn
> >             xlog_cil_push_foreground
> >               xlog_cil_push - tries to acquire same semaphore in write mode
> 
> Which means you had a transaction reservation overrun. Is it
> reproducable? iDo you have the output from xlog_print_tic_res()?
> Because:
> 
> > xfs_trans_commit+0x79/0x270 [xfs]  
> > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > xfs_mountfs+0x5f6/0x6b0 [xfs]
> 
> This transaction only modifies the superblock, and it has a buffer
> reservation for a superblock sized buffer, and hence should never
> overrun.
> 
> IOWs, I'm ifar more concerned about the fact there was a
> transaction overrun than they was a hang in the path that handles
> the overrun. The fact this hang has been there since 2.6.35 tells
> you how rare transactions overruns are....
> 
> FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> error and have the caller handle the shutdown.

Dave,

There are few ways this can be done, but each of them seem to change the
code behavior. Wanted to get your opinion on which is the correct way.

(1) - don't shutdown in xlog_print_tic_res();
    - upon return from xlog_print_tic_res(), do a up_read, and shutdown
    - at the end of fucntion, up_read only if we haven't done it already
  Behavior change: The protection offered by the semaphore is not 
      available to the code block from shutdown to end of function
(2) - don't shoudown in xlog_print_tic_res()
    - upon return just set a flag
    - at the end of function, after up_read, do shutdown
  Behavior change: Shutdown is delayed to a later point.

I prefer (2) since (1) drops the protection. But, do not know the
ramifications of delaying the shutdown. Can you comment ?

Regards,

Chandra
> 
> Cheers,
> 
> Dave.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
  2013-06-18 23:18   ` Chandra Seetharaman
@ 2013-06-20 21:06     ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2013-06-20 21:06 UTC (permalink / raw)
  To: Chandra Seetharaman; +Cc: XFS mailing list

On Tue, Jun 18, 2013 at 06:18:35PM -0500, Chandra Seetharaman wrote:
> On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> > On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > > Hello,
> > > 
> > > While testing and rearranging my pquota/gquota code, I stumbled on a
> > > xfs_shutdown() during a mount. But the mount just hung.
> > > 
> > > I debugged and found that it is in a code path where
> > > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > > down the same semaphore is being acquired in write mode causing a
> > > deadlock.
> > > 
> > > This is the stack:
> > > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > >   xlog_print_tic_res
> > >     xfs_force_shutdown
> > >       xfs_log_force_umount
> > >         xlog_cil_force
> > >           xlog_cil_force_lsn
> > >             xlog_cil_push_foreground
> > >               xlog_cil_push - tries to acquire same semaphore in write mode
> > 
> > Which means you had a transaction reservation overrun. Is it
> > reproducable? iDo you have the output from xlog_print_tic_res()?
> > Because:
> > 
> > > xfs_trans_commit+0x79/0x270 [xfs]  
> > > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > > xfs_mountfs+0x5f6/0x6b0 [xfs]
> > 
> > This transaction only modifies the superblock, and it has a buffer
> > reservation for a superblock sized buffer, and hence should never
> > overrun.
> > 
> > IOWs, I'm ifar more concerned about the fact there was a
> > transaction overrun than they was a hang in the path that handles
> > the overrun. The fact this hang has been there since 2.6.35 tells
> > you how rare transactions overruns are....
> > 
> > FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> > error and have the caller handle the shutdown.
> 
> Dave,
> 
> There are few ways this can be done, but each of them seem to change the
> code behavior. Wanted to get your opinion on which is the correct way.
> 
> (1) - don't shutdown in xlog_print_tic_res();
>     - upon return from xlog_print_tic_res(), do a up_read, and shutdown
>     - at the end of fucntion, up_read only if we haven't done it already
>   Behavior change: The protection offered by the semaphore is not 
>       available to the code block from shutdown to end of function

We don't want to do the rest of the work in the function if the
ticket check fails, so:

> (2) - don't shoudown in xlog_print_tic_res()
>     - upon return just set a flag
>     - at the end of function, after up_read, do shutdown
>   Behavior change: Shutdown is delayed to a later point.

A behaviour change is fine because we don't care about races here.

> I prefer (2) since (1) drops the protection. But, do not know the
> ramifications of delaying the shutdown. Can you comment ?

Basically, I think you can have xlog_print_tic_res() return
EFSCORRUPTED to the xfs_log_commit_cil(), then have it drop the
lock and return that error to xfs_trans_commit() immediately.

However, xfs_trans_commit() only catches the ENOMEM error, but that
was a temporary situation to handle the allocation error from
xfs_log_commit_cil() specially. Since we've removed the old old
logging code, this is the only error that can be returned so the
shutdown on error doesn't need to be specific to ENOMEM. hence if
you convert this to just a plain if (error) check, the shutdown will
happen from xfs_trans_commit() and the deadlock won't occur.

FWIW, to be on the safe side, I'd move the ticket check to after we
attach the busy extents to the CIL so that we can be certain that
the error path in xfs_trans_commit() does the right thing with them.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-06-20 21:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-22 23:12 deadlock with &log->l_cilp->xc_ctx_lock semaphone Chandra Seetharaman
2013-05-22 23:41 ` Dave Chinner
2013-05-23 18:09   ` Chandra Seetharaman
2013-05-23 23:42     ` Dave Chinner
2013-05-24 21:41       ` Chandra Seetharaman
2013-05-24 21:44   ` Chandra Seetharaman
2013-06-18 23:18   ` Chandra Seetharaman
2013-06-20 21:06     ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.