All of lore.kernel.org
 help / color / mirror / Atom feed
* XFS and USB Hang on 2.6.35.13
@ 2011-06-30 11:27 Amit Sahrawat
  2011-06-30 12:19 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Amit Sahrawat @ 2011-06-30 11:27 UTC (permalink / raw)
  To: xfs, Dave Chinner


[-- Attachment #1.1: Type: text/plain, Size: 8362 bytes --]

Hi All,
I encountered a hang on XFS during unplug.
*Test Case:*
#!/bin/sh
index=0
while [ "$?" == 0 ]
do
        index=$(($index+1))
        sync
        cp /mnt/1KB.txt /tmp/"$index".test
done
Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
XFS also.

During this operation, unplug the USB. I am getting HANG almost everytime I
unplug.

*Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
why am I not using TOT kernel - I tried but my PC does not boot up with the
latest one)

*Target=ARM*
*Logs Using Kernel Hung Task Feature*
# sh test.sh
usb 2-1: USB disconnect, address 2
sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 40 08 01 00 00 02 00
end_request: I/O error, dev sda, sector 4196353
sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 40 08 08 00 00 18 00
end_request: I/O error, dev sda, sector 4196360
end_request: I/O error, dev sda, sector 6293645
Device sda3, XFS metadata write error block 0x1 in sda3
xfs_force_shutdown(sda3,0x1) called from line 1031 of file
fs/xfs/xfs_buf_item.c.  Return address = 0xc0507b1c

*INFO: task khubd:*33 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
khubd         D c06c261c     0    33      2 0x00000000
Backtrace:
[<c06c2210>] (schedule+0x0/0x500) from [<c0523f4c>]
(_xfs_log_force+0x230/0x284)
[<c0523d1c>] (_xfs_log_force+0x0/0x284) from [<c0524194>]
(xfs_log_force+0x18/0x38)
[<c052417c>] (xfs_log_force+0x0/0x38) from [<c0544edc>]
(xfs_sync_data+0x48/0x58)
 r4:00000001 r3:00000005
[<c0544e94>] (xfs_sync_data+0x0/0x58) from [<c0544f44>]
(xfs_quiesce_data+0x24/0x80)
 r6:dbd0dd08 r5:00000001 r4:db46e000
[<c0544f20>] (xfs_quiesce_data+0x0/0x80) from [<c0542208>]
(xfs_fs_sync_fs+0x24/0xe0)
 r6:dbd0dd08 r5:00000001 r4:db46e000 r3:c05421e4
[<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0) from [<c048faf8>]
(__sync_filesystem+0x84/0xa0)
 r8:0000001f r7:db46eae0 r6:dbd0dd08 r5:00000001 r4:db470200
[<c048fa74>] (__sync_filesystem+0x0/0xa0) from [<c048fbe0>]
(sync_filesystem+0x58/0x60)
 r5:db470200 r4:db470200
[<c048fb88>] (sync_filesystem+0x0/0x60) from [<c0499124>]
(fsync_bdev+0x20/0x44)
 r4:db89e0a0 r3:00000006
[<c0499104>] (fsync_bdev+0x0/0x44) from [<c056c6a0>]
(invalidate_partition+0x20/0x3c)
 r5:dbdc1a00 r4:db89e0a0
[<c056c680>] (invalidate_partition+0x0/0x3c) from [<c04b8914>]
(del_gendisk+0x34/0x140)
 r5:dbdc1a00 r4:db470e00
[<c04b88e0>] (del_gendisk+0x0/0x140) from [<c05c78e8>] (sd_remove+0x48/0x84)
 r6:db44a5e8 r5:c07e8fc4 r4:db46e4c0
[<c05c78a0>] (sd_remove+0x0/0x84) from [<c05b2860>]
(__device_release_driver+0x6c/0xac)
 r6:60000013 r5:c07e901c r4:db46e4c0 r3:c05c78a0
[<c05b27f4>] (__device_release_driver+0x0/0xac) from [<c05b2978>]
(device_release_driver+0x24/0x30)
 r5:db46e4c0 r4:db46e4f4
[<c05b2954>] (device_release_driver+0x0/0x30) from [<c05b1e58>]
(bus_remove_device+0x7c/0x8c)
 r5:dad9d018 r4:db46e4c0
[<c05b1ddc>] (bus_remove_device+0x0/0x8c) from [<c05b03f0>]
(device_del+0x118/0x170)
 r4:db46e4c0 r3:00000000
[<c05b02d8>] (device_del+0x0/0x170) from [<c05c4da4>]
(__scsi_remove_device+0x48/0x90)
 r6:60000013 r5:db46e4c0 r4:db46e400 r3:c07e8760
[<c05c4d5c>] (__scsi_remove_device+0x0/0x90) from [<c05c2404>]
(scsi_forget_host+0x48/0x6c)
 r5:db46e400 r4:db46e800
[<c05c23bc>] (scsi_forget_host+0x0/0x6c) from [<c05bc408>]
(scsi_remove_host+0x7c/0x104)
 r5:db46e838 r4:db46e800
[<c05bc38c>] (scsi_remove_host+0x0/0x104) from [<c0612ffc>]
(quiesce_and_remove_host+0x68/0x9c)
 r6:db46ebb4 r5:db46e800 r4:db46eab8 r3:00000000
[<c0612f94>] (quiesce_and_remove_host+0x0/0x9c) from [<c06130d0>]
(usb_stor_disconnect+0x1c/0x28)
 r7:c07eb2f4 r6:db46ec00 r5:db49a200 r4:db46eab8
[<c06130b4>] (usb_stor_disconnect+0x0/0x28) from [<c0601668>]
(usb_unbind_interface+0x54/0xdc)
 r4:db49a220 r3:c06130b4
[<c0601614>] (usb_unbind_interface+0x0/0xdc) from [<c05b2860>]
(__device_release_driver+0x6c/0xac)
 r7:00000000 r6:db49a200 r5:c07eb2f4 r4:db49a220
[<c05b27f4>] (__device_release_driver+0x0/0xac) from [<c05b2978>]
(device_release_driver+0x24/0x30)
 r5:db49a220 r4:db49a254
[<c05b2954>] (device_release_driver+0x0/0x30) from [<c05b1e58>]
(bus_remove_device+0x7c/0x8c)
 r5:db46ec68 r4:db49a220
[<c05b1ddc>] (bus_remove_device+0x0/0x8c) from [<c05b03f0>]
(device_del+0x118/0x170)
 r4:db49a220 r3:00000000
[<c05b02d8>] (device_del+0x0/0x170) from [<c05ff0c4>]
(usb_disable_device+0x58/0xf8)
 r6:db49a200 r5:00000000 r4:db46ec00 r3:00000005
[<c05ff06c>] (usb_disable_device+0x0/0xf8) from [<c05fa978>]
(usb_disconnect+0x98/0xf4)
 r7:dbd70200 r6:db46ec68 r5:db46ec9c r4:db46ec00
[<c05fa8e0>] (usb_disconnect+0x0/0xf4) from [<c05fafbc>]
(hub_thread+0x3e4/0xd78)
[<c05fabd8>] (hub_thread+0x0/0xd78) from [<c041e6a0>] (kthread+0x84/0x8c)
[<c041e61c>] (kthread+0x0/0x8c) from [<c040c04c>] (do_exit+0x0/0x61c)
 r7:00000013 r6:c040c04c r5:c041e61c r4:dbc2df24
*INFO: task xfslogd/1*:40 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfslogd/1     D c06c261c     0    40      2 0x00000000
Backtrace:
[<c06c2210>] (schedule+0x0/0x500) from [<c0523f4c>]
(_xfs_log_force+0x230/0x284)
[<c0523d1c>] (_xfs_log_force+0x0/0x284) from [<c0524148>]
(xfs_log_force_umount+0x1a8/0x1dc)
[<c0523fa0>] (xfs_log_force_umount+0x0/0x1dc) from [<c053753c>]
(xfs_do_force_shutdown+0x70/0x164)
[<c05374cc>] (xfs_do_force_shutdown+0x0/0x164) from [<c0507b1c>]
(xfs_buf_iodone_callbacks+0x14c/0x184)
 r7:db469f20 r6:db46e000 r5:daf58968 r4:db445ec0
[<c05079d0>] (xfs_buf_iodone_callbacks+0x0/0x184) from [<c053b8e0>]
(xfs_buf_iodone_work+0x58/0x7c)
 r7:c0bed1d0 r6:c053b888 r5:db468000 r4:db445ec0
[<c053b888>] (xfs_buf_iodone_work+0x0/0x7c) from [<c041a8bc>]
(worker_thread+0x154/0x1e4)
 r4:c0bed1c0 r3:00000001
[<c041a768>] (worker_thread+0x0/0x1e4) from [<c041e6a0>] (kthread+0x84/0x8c)
[<c041e61c>] (kthread+0x0/0x8c) from [<c040c04c>] (do_exit+0x0/0x61c)
 r7:00000013 r6:c040c04c r5:c041e61c r4:dbc2decc
*INFO: task sync:*164 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sync          D c06c261c     0   164    136 0x00000000
Backtrace:
[<c06c2210>] (schedule+0x0/0x500) from [<c06c297c>]
(schedule_timeout+0x20/0x200)
[<c06c295c>] (schedule_timeout+0x0/0x200) from [<c06c20d0>]
(wait_for_common+0xd0/0x164)
 r7:db44534c r6:daed4000 r5:7fffffff r4:db445348
[<c06c2000>] (wait_for_common+0x0/0x164) from [<c06c220c>]
(wait_for_completion+0x18/0x1c)
[<c06c21f4>] (wait_for_completion+0x0/0x1c) from [<c053a74c>]
(xfs_buf_iowait+0x54/0x5c)
[<c053a6f8>] (xfs_buf_iowait+0x0/0x5c) from [<c053bb9c>]
(xfs_flush_buftarg+0x14c/0x180)
 r4:db4452c0 r3:db4452ec
[<c053ba50>] (xfs_flush_buftarg+0x0/0x180) from [<c0544f5c>]
(xfs_quiesce_data+0x3c/0x80)
 r7:c048fb14 r6:00000000 r5:00000001 r4:db46e000
[<c0544f20>] (xfs_quiesce_data+0x0/0x80) from [<c0542208>]
(xfs_fs_sync_fs+0x24/0xe0)
 r6:daed5f84 r5:00000001 r4:db46e000 r3:c05421e4
[<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0) from [<c048faf8>]
(__sync_filesystem+0x84/0xa0)
 r8:c0394264 r7:c048fb14 r6:daed5f84 r5:00000001 r4:db470200
[<c048fa74>] (__sync_filesystem+0x0/0xa0) from [<c048fb40>]
(sync_one_sb+0x2c/0x30)
 r5:db470240 r4:db470200
[<c048fb14>] (sync_one_sb+0x0/0x30) from [<c0470ca0>]
(iterate_supers+0x74/0xb8)
[<c0470c2c>] (iterate_supers+0x0/0xb8) from [<c048fa30>]
(sync_filesystems+0x20/0x2c)
 r7:00000024 r6:bef48db4 r5:00000001 r4:000f78be
[<c048fa10>] (sync_filesystems+0x0/0x2c) from [<c048fb68>]
(sys_sync+0x24/0x44)
[<c048fb44>] (sys_sync+0x0/0x44) from [<c03940e0>]
(ret_fast_syscall+0x0/0x30)
*On Analysis:*
I found that both the *"khubd"* and *"xfslogd"* are waiting for signal
(sv_wait() (xfs_log.c - _xfs_log_force()))
*In normal case:*
I found that xfslogd does the callback and signal that through
xlog_state_do_callback() ( sv_broadcast(&iclog->ic_force_wait) - xfs_log.c);
But during hang, xfslogd itself is waiting. So who does the callback in this
case?
Is xfslogd supposed to wait(sv_wait(&iclog->ic_force_wait, PINOD,
&log->l_icloglock, s);) ?

Please let me know in case of any issues also update if my analysis is right
or wrong. I do not expect a solution but please guide me in my findings.

Thanks & Regards,
Amit Sahrawat

[-- Attachment #1.2: Type: text/html, Size: 9794 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS and USB Hang on 2.6.35.13
  2011-06-30 11:27 XFS and USB Hang on 2.6.35.13 Amit Sahrawat
@ 2011-06-30 12:19 ` Dave Chinner
  2011-07-01  4:30   ` Amit Sahrawat
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2011-06-30 12:19 UTC (permalink / raw)
  To: Amit Sahrawat; +Cc: xfs

On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
> Hi All,
> I encountered a hang on XFS during unplug.
> *Test Case:*
> #!/bin/sh
> index=0
> while [ "$?" == 0 ]
> do
>         index=$(($index+1))
>         sync
>         cp /mnt/1KB.txt /tmp/"$index".test
> done
> Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
> XFS also.
> 
> During this operation, unplug the USB. I am getting HANG almost everytime I
> unplug.

Well, that's no surprise. The unplug appears to be losing IOs in
progress.

> *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
> why am I not using TOT kernel - I tried but my PC does not boot up with the
> latest one)
> 
> *Target=ARM*
> *Logs Using Kernel Hung Task Feature*
> # sh test.sh
> usb 2-1: USB disconnect, address 2
> sd 0:0:0:0: [sda] Unhandled error code
> sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 40 08 01 00 00 02 00
> end_request: I/O error, dev sda, sector 4196353
> sd 0:0:0:0: [sda] Unhandled error code
> sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 40 08 08 00 00 18 00
> end_request: I/O error, dev sda, sector 4196360
> end_request: I/O error, dev sda, sector 6293645
> Device sda3, XFS metadata write error block 0x1 in sda3
> xfs_force_shutdown(sda3,0x1) called from line 1031 of file
> fs/xfs/xfs_buf_item.c.  Return address = 0xc0507b1c

So the device was unplugged, there was a disconnect error, a few IO
errors and then a shutdown.

> 
> *INFO: task khubd:*33 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> khubd         D c06c261c     0    33      2 0x00000000
> Backtrace:
> [<c06c2210>] (schedule+0x0/0x500) from [<c0523f4c>]
> (_xfs_log_force+0x230/0x284)

You need to turn off line wrapping for stuff you paste into email.
The cleaned up (i.e. relevant part) trace is:

[<c06c2210>] (schedule+0x0/0x500)
[<c0523d1c>] (_xfs_log_force+0x0/0x284)
[<c052417c>] (xfs_log_force+0x0/0x38)
[<c0544e94>] (xfs_sync_data+0x0/0x58)
[<c0544f20>] (xfs_quiesce_data+0x0/0x80)
[<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0)
[<c048fa74>] (__sync_filesystem+0x0/0xa0)
[<c048fb88>] (sync_filesystem+0x0/0x60)
[<c0499104>] (fsync_bdev+0x0/0x44)
[<c056c680>] (invalidate_partition+0x0/0x3c)
[<c04b88e0>] (del_gendisk+0x0/0x140)
[<c05c78a0>] (sd_remove+0x0/0x84)
[<c05b27f4>] (__device_release_driver+0x0/0xac)
[<c05b2954>] (device_release_driver+0x0/0x30)
[<c05b1ddc>] (bus_remove_device+0x0/0x8c)
[<c05b02d8>] (device_del+0x0/0x170)
[<c05c4d5c>] (__scsi_remove_device+0x0/0x90)
[<c05c23bc>] (scsi_forget_host+0x0/0x6c)
[<c05bc38c>] (scsi_remove_host+0x0/0x104)
[<c0612f94>] (quiesce_and_remove_host+0x0/0x9c)
[<c06130b4>] (usb_stor_disconnect+0x0/0x28)
[<c0601614>] (usb_unbind_interface+0x0/0xdc)
[<c05b27f4>] (__device_release_driver+0x0/0xac)
[<c05b2954>] (device_release_driver+0x0/0x30)
[<c05b1ddc>] (bus_remove_device+0x0/0x8c)
[<c05b02d8>] (device_del+0x0/0x170)
[<c05ff06c>] (usb_disable_device+0x0/0xf8)
[<c05fa8e0>] (usb_disconnect+0x0/0xf4)
[<c05fabd8>] (hub_thread+0x0/0xd78)
[<c041e61c>] (kthread+0x0/0x8c)

Well, that just looks utterly braindamaged to me.

We just had the device containing the filesystem removed from the
system, so the error handling routine ends up trying to sync the
filesystem to the device that doesn't exist anymore. WTF?

Anyway, that's not the cause of the hang, but just an example of
someone not thinking through what their error handling actually
does...

> *INFO: task xfslogd/1*:40 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> xfslogd/1     D c06c261c     0    40      2 0x00000000
> Backtrace:

[<c06c2210>] (schedule+0x0/0x500)
[<c0523d1c>] (_xfs_log_force+0x0/0x284)
[<c0523fa0>] (xfs_log_force_umount+0x0/0x1dc)
[<c05374cc>] (xfs_do_force_shutdown+0x0/0x164)
[<c05079d0>] (xfs_buf_iodone_callbacks+0x0/0x184)
[<c053b888>] (xfs_buf_iodone_work+0x0/0x7c)
[<c041a768>] (worker_thread+0x0/0x1e4)
[<c041e61c>] (kthread+0x0/0x8c)

That's where the shutdown has hung - only the xfslogd can complete
the IO that will allow the log force to complete, and that is not
occurring because it is waiting for the log force to complete before
it can complete the IO that will complete the log force...

AFAICT, this problem doesn't exist in TOT - the conversion of the
xfslogd workqueue to CMWQ allows processing of other xfslogd
workqueue events to continue even though this one has gone to sleep.

You probably need to change the shutdown type to
SHUTDOWN_LOG_IO_ERROR to prevent a log flush from occurring in this
shutdown context.

>  r7:00000013 r6:c040c04c r5:c041e61c r4:dbc2decc
> *INFO: task sync:*164 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> sync          D c06c261c     0   164    136 0x00000000
> Backtrace:
[<c06c2210>] (schedule+0x0/0x500)
[<c06c295c>] (schedule_timeout+0x0/0x200)
[<c06c2000>] (wait_for_common+0x0/0x164)
[<c06c21f4>] (wait_for_completion+0x0/0x1c)
[<c053a6f8>] (xfs_buf_iowait+0x0/0x5c)
[<c053ba50>] (xfs_flush_buftarg+0x0/0x180)
[<c0544f20>] (xfs_quiesce_data+0x0/0x80)
[<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0)
[<c048fa74>] (__sync_filesystem+0x0/0xa0)
[<c048fb14>] (sync_one_sb+0x0/0x30)
[<c0470c2c>] (iterate_supers+0x0/0xb8)
[<c048fa10>] (sync_filesystems+0x0/0x2c)
[<c048fb44>] (sys_sync+0x0/0x44)

And that one is probably stuck waiting for the xfslogd to complete
the IO.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS and USB Hang on 2.6.35.13
  2011-06-30 12:19 ` Dave Chinner
@ 2011-07-01  4:30   ` Amit Sahrawat
  2011-07-01  9:03     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Amit Sahrawat @ 2011-07-01  4:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Jun 30, 2011 at 5:49 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
> > Hi All,
> > I encountered a hang on XFS during unplug.
> > *Test Case:*
> > #!/bin/sh
> > index=0
> > while [ "$?" == 0 ]
> > do
> >         index=$(($index+1))
> >         sync
> >         cp /mnt/1KB.txt /tmp/"$index".test
> > done
> > Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
> > XFS also.
> >
> > During this operation, unplug the USB. I am getting HANG almost everytime I
> > unplug.
>
> Well, that's no surprise. The unplug appears to be losing IOs in
> progress.
>
> > *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
> > why am I not using TOT kernel - I tried but my PC does not boot up with the
> > latest one)
> >
> > *Target=ARM*
> > *Logs Using Kernel Hung Task Feature*
> > # sh test.sh
> > usb 2-1: USB disconnect, address 2
> > sd 0:0:0:0: [sda] Unhandled error code
> > sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> > sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 40 08 01 00 00 02 00
> > end_request: I/O error, dev sda, sector 4196353
> > sd 0:0:0:0: [sda] Unhandled error code
> > sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> > sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 40 08 08 00 00 18 00
> > end_request: I/O error, dev sda, sector 4196360
> > end_request: I/O error, dev sda, sector 6293645
> > Device sda3, XFS metadata write error block 0x1 in sda3
> > xfs_force_shutdown(sda3,0x1) called from line 1031 of file
> > fs/xfs/xfs_buf_item.c.  Return address = 0xc0507b1c
>
> So the device was unplugged, there was a disconnect error, a few IO
> errors and then a shutdown.
>
> >
> > *INFO: task khubd:*33 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > khubd         D c06c261c     0    33      2 0x00000000
> > Backtrace:
> > [<c06c2210>] (schedule+0x0/0x500) from [<c0523f4c>]
> > (_xfs_log_force+0x230/0x284)
>
> You need to turn off line wrapping for stuff you paste into email.
> The cleaned up (i.e. relevant part) trace is:
>
> [<c06c2210>] (schedule+0x0/0x500)
> [<c0523d1c>] (_xfs_log_force+0x0/0x284)
> [<c052417c>] (xfs_log_force+0x0/0x38)
> [<c0544e94>] (xfs_sync_data+0x0/0x58)
> [<c0544f20>] (xfs_quiesce_data+0x0/0x80)
> [<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0)
> [<c048fa74>] (__sync_filesystem+0x0/0xa0)
> [<c048fb88>] (sync_filesystem+0x0/0x60)
> [<c0499104>] (fsync_bdev+0x0/0x44)
> [<c056c680>] (invalidate_partition+0x0/0x3c)
> [<c04b88e0>] (del_gendisk+0x0/0x140)
> [<c05c78a0>] (sd_remove+0x0/0x84)
> [<c05b27f4>] (__device_release_driver+0x0/0xac)
> [<c05b2954>] (device_release_driver+0x0/0x30)
> [<c05b1ddc>] (bus_remove_device+0x0/0x8c)
> [<c05b02d8>] (device_del+0x0/0x170)
> [<c05c4d5c>] (__scsi_remove_device+0x0/0x90)
> [<c05c23bc>] (scsi_forget_host+0x0/0x6c)
> [<c05bc38c>] (scsi_remove_host+0x0/0x104)
> [<c0612f94>] (quiesce_and_remove_host+0x0/0x9c)
> [<c06130b4>] (usb_stor_disconnect+0x0/0x28)
> [<c0601614>] (usb_unbind_interface+0x0/0xdc)
> [<c05b27f4>] (__device_release_driver+0x0/0xac)
> [<c05b2954>] (device_release_driver+0x0/0x30)
> [<c05b1ddc>] (bus_remove_device+0x0/0x8c)
> [<c05b02d8>] (device_del+0x0/0x170)
> [<c05ff06c>] (usb_disable_device+0x0/0xf8)
> [<c05fa8e0>] (usb_disconnect+0x0/0xf4)
> [<c05fabd8>] (hub_thread+0x0/0xd78)
> [<c041e61c>] (kthread+0x0/0x8c)
>
> Well, that just looks utterly braindamaged to me.
>
> We just had the device containing the filesystem removed from the
> system, so the error handling routine ends up trying to sync the
> filesystem to the device that doesn't exist anymore. WTF?
>

>>> This is what I think, why is syncing taking place when the device doesn't exist anymore. What is the gain in doing so? I will try and propose this feature.

> Anyway, that's not the cause of the hang, but just an example of
> someone not thinking through what their error handling actually
> does...
>
> > *INFO: task xfslogd/1*:40 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > xfslogd/1     D c06c261c     0    40      2 0x00000000
> > Backtrace:
>
> [<c06c2210>] (schedule+0x0/0x500)
> [<c0523d1c>] (_xfs_log_force+0x0/0x284)
> [<c0523fa0>] (xfs_log_force_umount+0x0/0x1dc)
> [<c05374cc>] (xfs_do_force_shutdown+0x0/0x164)
> [<c05079d0>] (xfs_buf_iodone_callbacks+0x0/0x184)
> [<c053b888>] (xfs_buf_iodone_work+0x0/0x7c)
> [<c041a768>] (worker_thread+0x0/0x1e4)
> [<c041e61c>] (kthread+0x0/0x8c)
>
> That's where the shutdown has hung - only the xfslogd can complete
> the IO that will allow the log force to complete, and that is not
> occurring because it is waiting for the log force to complete before
> it can complete the IO that will complete the log force...
>
> AFAICT, this problem doesn't exist in TOT - the conversion of the
>>> Again I have a problem which seems fixed in TOT :)

> xfslogd workqueue to CMWQ allows processing of other xfslogd
> workqueue events to continue even though this one has gone to sleep.
>
> You probably need to change the shutdown type to
> SHUTDOWN_LOG_IO_ERROR to prevent a log flush from occurring in this
> shutdown context.

>>> This will fix the error for this kernel version, I will give this a try.
>>> Is this the patchwork for CMWQ: http://patchwork.xfs.org/patch/2037/ (xfs: improve sync behaviour in face of aggressive dirtying) ? Please let me know.

>
> >  r7:00000013 r6:c040c04c r5:c041e61c r4:dbc2decc
> > *INFO: task sync:*164 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > sync          D c06c261c     0   164    136 0x00000000
> > Backtrace:
> [<c06c2210>] (schedule+0x0/0x500)
> [<c06c295c>] (schedule_timeout+0x0/0x200)
> [<c06c2000>] (wait_for_common+0x0/0x164)
> [<c06c21f4>] (wait_for_completion+0x0/0x1c)
> [<c053a6f8>] (xfs_buf_iowait+0x0/0x5c)
> [<c053ba50>] (xfs_flush_buftarg+0x0/0x180)
> [<c0544f20>] (xfs_quiesce_data+0x0/0x80)
> [<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0)
> [<c048fa74>] (__sync_filesystem+0x0/0xa0)
> [<c048fb14>] (sync_one_sb+0x0/0x30)
> [<c0470c2c>] (iterate_supers+0x0/0xb8)
> [<c048fa10>] (sync_filesystems+0x0/0x2c)
> [<c048fb44>] (sys_sync+0x0/0x44)
>
> And that one is probably stuck waiting for the xfslogd to complete
> the IO.

>>>Yes, Both are waiting for signal to come.
Waiting [_xfs_log_force()][3125][Name:khubd]
end_request: I/O error, dev sda, sector 4226672
end_request: I/O error, dev sda, sector 5283400
Device sda3, XFS metadata write error block 0x1 in sda3
xfs_force_shutdown(sda3,0x1) called from line 1031 of file
fs/xfs/xfs_buf_item.c.  Return address = 0xc0507b1c

Waiting [_xfs_log_force()][3125][Name:xfslogd/0]


> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

Thanks & Regards,
Amit Sahrawat

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS and USB Hang on 2.6.35.13
  2011-07-01  4:30   ` Amit Sahrawat
@ 2011-07-01  9:03     ` Dave Chinner
  2011-07-01 10:37       ` Amit Sahrawat
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2011-07-01  9:03 UTC (permalink / raw)
  To: Amit Sahrawat; +Cc: xfs

On Fri, Jul 01, 2011 at 10:00:54AM +0530, Amit Sahrawat wrote:
> On Thu, Jun 30, 2011 at 5:49 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
> > > Hi All,
> > > I encountered a hang on XFS during unplug.
> > > *Test Case:*
> > > #!/bin/sh
> > > index=0
> > > while [ "$?" == 0 ]
> > > do
> > >         index=$(($index+1))
> > >         sync
> > >         cp /mnt/1KB.txt /tmp/"$index".test
> > > done
> > > Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
> > > XFS also.
> > >
> > > During this operation, unplug the USB. I am getting HANG almost everytime I
> > > unplug.
> >
> > Well, that's no surprise. The unplug appears to be losing IOs in
> > progress.
> >
> > > *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
> > > why am I not using TOT kernel - I tried but my PC does not boot up with the
> > > latest one)
.....
> > > *INFO: task khubd:*33 blocked for more than 120 seconds.
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > khubd         D c06c261c     0    33      2 0x00000000
> > > Backtrace:
> > > [<c06c2210>] (schedule+0x0/0x500) from [<c0523f4c>]
> > > (_xfs_log_force+0x230/0x284)
> >
> > You need to turn off line wrapping for stuff you paste into email.
> > The cleaned up (i.e. relevant part) trace is:
> >
> > [<c06c2210>] (schedule+0x0/0x500)
> > [<c0523d1c>] (_xfs_log_force+0x0/0x284)
> > [<c052417c>] (xfs_log_force+0x0/0x38)
> > [<c0544e94>] (xfs_sync_data+0x0/0x58)
> > [<c0544f20>] (xfs_quiesce_data+0x0/0x80)
> > [<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0)
> > [<c048fa74>] (__sync_filesystem+0x0/0xa0)
> > [<c048fb88>] (sync_filesystem+0x0/0x60)
> > [<c0499104>] (fsync_bdev+0x0/0x44)
> > [<c056c680>] (invalidate_partition+0x0/0x3c)
> > [<c04b88e0>] (del_gendisk+0x0/0x140)
> > [<c05c78a0>] (sd_remove+0x0/0x84)
> > [<c05b27f4>] (__device_release_driver+0x0/0xac)
> > [<c05b2954>] (device_release_driver+0x0/0x30)
> > [<c05b1ddc>] (bus_remove_device+0x0/0x8c)
> > [<c05b02d8>] (device_del+0x0/0x170)
> > [<c05c4d5c>] (__scsi_remove_device+0x0/0x90)
> > [<c05c23bc>] (scsi_forget_host+0x0/0x6c)
> > [<c05bc38c>] (scsi_remove_host+0x0/0x104)
> > [<c0612f94>] (quiesce_and_remove_host+0x0/0x9c)
> > [<c06130b4>] (usb_stor_disconnect+0x0/0x28)
> > [<c0601614>] (usb_unbind_interface+0x0/0xdc)
> > [<c05b27f4>] (__device_release_driver+0x0/0xac)
> > [<c05b2954>] (device_release_driver+0x0/0x30)
> > [<c05b1ddc>] (bus_remove_device+0x0/0x8c)
> > [<c05b02d8>] (device_del+0x0/0x170)
> > [<c05ff06c>] (usb_disable_device+0x0/0xf8)
> > [<c05fa8e0>] (usb_disconnect+0x0/0xf4)
> > [<c05fabd8>] (hub_thread+0x0/0xd78)
> > [<c041e61c>] (kthread+0x0/0x8c)
> >
> > Well, that just looks utterly braindamaged to me.
> >
> > We just had the device containing the filesystem removed from the
> > system, so the error handling routine ends up trying to sync the
> > filesystem to the device that doesn't exist anymore. WTF?
> >
> 
> >>> This is what I think, why is syncing taking place when the

Amit, you don't need to quote your own reply. That just confuses
mail readers that understand the ">" quoting convention and
highlight appropriately, and made me wonder if you'd even
replied....

> This is what I think, why is syncing taking place when the
> device doesn't exist anymore. What is the gain in doing so?

I doubt the person who wrote the error handling even realised that
it ended up in such a mess.

> I
> will try and propose this feature.

Not sure what you mean by this....

....
> > AFAICT, this problem doesn't exist in TOT - the conversion of the
>
> Again I have a problem which seems fixed in TOT :)
> 
> > xfslogd workqueue to CMWQ allows processing of other xfslogd
> > workqueue events to continue even though this one has gone to sleep.
> >
> > You probably need to change the shutdown type to
> > SHUTDOWN_LOG_IO_ERROR to prevent a log flush from occurring in this
> > shutdown context.
> 
> This will fix the error for this kernel version, I will give this a try.
> Is this the patchwork for CMWQ:
> http://patchwork.xfs.org/patch/2037/ (xfs: improve sync behaviour
> in face of aggressive dirtying) ? Please let me know.

No. 2.6.35 doesn't have the CMWQ infrastructure, it was introduced
in 2.6.38 IIRC.

IOWs, there isn't a fix you can just backport - you're going to need
to write and test your own fix, and my suggestion for doing that is
above.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS and USB Hang on 2.6.35.13
  2011-07-01  9:03     ` Dave Chinner
@ 2011-07-01 10:37       ` Amit Sahrawat
  0 siblings, 0 replies; 5+ messages in thread
From: Amit Sahrawat @ 2011-07-01 10:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Jul 1, 2011 at 2:33 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Jul 01, 2011 at 10:00:54AM +0530, Amit Sahrawat wrote:
>> On Thu, Jun 30, 2011 at 5:49 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
>> > > Hi All,
>> > > I encountered a hang on XFS during unplug.
>> > > *Test Case:*
>> > > #!/bin/sh
>> > > index=0
>> > > while [ "$?" == 0 ]
>> > > do
>> > >         index=$(($index+1))
>> > >         sync
>> > >         cp /mnt/1KB.txt /tmp/"$index".test
>> > > done
>> > > Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
>> > > XFS also.
>> > >
>> > > During this operation, unplug the USB. I am getting HANG almost everytime I
>> > > unplug.
>> >
>> > Well, that's no surprise. The unplug appears to be losing IOs in
>> > progress.
>> >
>> > > *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
>> > > why am I not using TOT kernel - I tried but my PC does not boot up with the
>> > > latest one)
> .....
>> > > *INFO: task khubd:*33 blocked for more than 120 seconds.
>> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > > khubd         D c06c261c     0    33      2 0x00000000
>> > > Backtrace:
>> > > [<c06c2210>] (schedule+0x0/0x500) from [<c0523f4c>]
>> > > (_xfs_log_force+0x230/0x284)
>> >
>> > You need to turn off line wrapping for stuff you paste into email.
>> > The cleaned up (i.e. relevant part) trace is:
>> >
>> > [<c06c2210>] (schedule+0x0/0x500)
>> > [<c0523d1c>] (_xfs_log_force+0x0/0x284)
>> > [<c052417c>] (xfs_log_force+0x0/0x38)
>> > [<c0544e94>] (xfs_sync_data+0x0/0x58)
>> > [<c0544f20>] (xfs_quiesce_data+0x0/0x80)
>> > [<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0)
>> > [<c048fa74>] (__sync_filesystem+0x0/0xa0)
>> > [<c048fb88>] (sync_filesystem+0x0/0x60)
>> > [<c0499104>] (fsync_bdev+0x0/0x44)
>> > [<c056c680>] (invalidate_partition+0x0/0x3c)
>> > [<c04b88e0>] (del_gendisk+0x0/0x140)
>> > [<c05c78a0>] (sd_remove+0x0/0x84)
>> > [<c05b27f4>] (__device_release_driver+0x0/0xac)
>> > [<c05b2954>] (device_release_driver+0x0/0x30)
>> > [<c05b1ddc>] (bus_remove_device+0x0/0x8c)
>> > [<c05b02d8>] (device_del+0x0/0x170)
>> > [<c05c4d5c>] (__scsi_remove_device+0x0/0x90)
>> > [<c05c23bc>] (scsi_forget_host+0x0/0x6c)
>> > [<c05bc38c>] (scsi_remove_host+0x0/0x104)
>> > [<c0612f94>] (quiesce_and_remove_host+0x0/0x9c)
>> > [<c06130b4>] (usb_stor_disconnect+0x0/0x28)
>> > [<c0601614>] (usb_unbind_interface+0x0/0xdc)
>> > [<c05b27f4>] (__device_release_driver+0x0/0xac)
>> > [<c05b2954>] (device_release_driver+0x0/0x30)
>> > [<c05b1ddc>] (bus_remove_device+0x0/0x8c)
>> > [<c05b02d8>] (device_del+0x0/0x170)
>> > [<c05ff06c>] (usb_disable_device+0x0/0xf8)
>> > [<c05fa8e0>] (usb_disconnect+0x0/0xf4)
>> > [<c05fabd8>] (hub_thread+0x0/0xd78)
>> > [<c041e61c>] (kthread+0x0/0x8c)
>> >
>> > Well, that just looks utterly braindamaged to me.
>> >
>> > We just had the device containing the filesystem removed from the
>> > system, so the error handling routine ends up trying to sync the
>> > filesystem to the device that doesn't exist anymore. WTF?
>> >
>>
>> >>> This is what I think, why is syncing taking place when the
>
> Amit, you don't need to quote your own reply. That just confuses
> mail readers that understand the ">" quoting convention and
> highlight appropriately, and made me wonder if you'd even
> replied....
Ok will take care of this in future.
>
>> This is what I think, why is syncing taking place when the
>> device doesn't exist anymore. What is the gain in doing so?
>
> I doubt the person who wrote the error handling even realised that
> it ended up in such a mess.
That means there is no review going on for that path.
>
>> I
>> will try and propose this feature.
>
> Not sure what you mean by this....
I wanted to revise this error leg where-in sync is taking place. I can
only propose the suggestion for these error condition at the moment.
>
> ....
>> > AFAICT, this problem doesn't exist in TOT - the conversion of the
>>
>> Again I have a problem which seems fixed in TOT :)
>>
>> > xfslogd workqueue to CMWQ allows processing of other xfslogd
>> > workqueue events to continue even though this one has gone to sleep.
>> >
>> > You probably need to change the shutdown type to
>> > SHUTDOWN_LOG_IO_ERROR to prevent a log flush from occurring in this
>> > shutdown context.
>>
>> This will fix the error for this kernel version, I will give this a try.
>> Is this the patchwork for CMWQ:
>> http://patchwork.xfs.org/patch/2037/ (xfs: improve sync behaviour
>> in face of aggressive dirtying) ? Please let me know.
>
> No. 2.6.35 doesn't have the CMWQ infrastructure, it was introduced
> in 2.6.38 IIRC.
>
> IOWs, there isn't a fix you can just backport - you're going to need
> to write and test your own fix, and my suggestion for doing that is
> above.
Yes, I went through the lwn.net and the kernel patches survey, CMWQ is
new infrastructure and cannot be adopted to 2.6.35. At first I thought
changes might be related only to XFS, but it is not like that.

Regarding your fix - I tried out the change by setting the
flag(SHUTDOWN_LOG_IO_ERROR) in this condition and it is working fine.
There is a comment also in the function : xfs_do_force_shutdown()
which appropriately mentions the very same thing. But since in our
case it was returning due to flag not set, we ended up putting xfslogd
to infinite sleep.

Thanks for your help.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>

Regards,
Amit Sahrawat

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-07-01 10:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-30 11:27 XFS and USB Hang on 2.6.35.13 Amit Sahrawat
2011-06-30 12:19 ` Dave Chinner
2011-07-01  4:30   ` Amit Sahrawat
2011-07-01  9:03     ` Dave Chinner
2011-07-01 10:37       ` Amit Sahrawat

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.