* Storage server, hung tasks and tracebacks
@ 2012-05-02 18:44 Brian Candler
  2012-05-03 12:50 ` Stan Hoeppner
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Candler @ 2012-05-02 18:44 UTC (permalink / raw)
  To: xfs

I have a really weird problem with storage. The one thing that I *can* see
is dmesg output giving messages about hung tasks and XFS call tracebacks, so
I'm posting here in the hope that someone can help me interpret them - even
an answer like "this definitely means that your disks are not responding"
would be very helpful.

First, a bit of background: there are three storage servers. Each has 24 x
3TB SATA disks on LSI HBAs (1 x 16-port, 1 x 8-port), plus a boot disk
connected directly to the motherboard, and 8GB RAM.  They are running Ubuntu
11.10 server amd64, kernel 3.0.0-16-server #29-Ubuntu SMP.  Apart from the
boot disk, all filesystems are XFS mounted with inode64.

I had been testing them as gluster fileservers, and two of them locked up in
operation.  This gave the appearance of a hardware or BIOS fault - in
particular, a power-cycle at the front panel wouldn't bring them back to
life, but powering off using the physical switch on the PSU did.

So after upgrading the BIOS and iKVM firmware on the motherboards, I decided
to give them a soak test, running a couple of instances of bonnie++ talking
directly to the disks (so no gluster or network I/O involved).

When I came back to check on them, they had all locked up in similar ways.
On the two that I can still log in to, I see XFS call traces in dmesg.

The three servers have their disks configured slightly differently, and the
failure modes are slightly different, so I'll describe each one here.

storage1
--------

Configured as two RAID6 groups of 12 disks each:

    $ cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md127 : active raid6 sdy[11] sdx[10] sdw[9] sdv[8] sdu[7] sdt[6] sds[5] sdr[4] sdq[3] sdp[2] sdo[1] sdn[0]
          29302650880 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
          
    md126 : active raid6 sdj[8] sdk[9] sdd[2] sde[3] sdi[7] sdm[11] sdg[5] sdc[1] sdb[0] sdl[10] sdh[6] sdf[4]
          29302650880 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
          
    unused devices: <none>
    $ df -h | grep md
    /dev/md126             28T  105G   28T   1% /disk/data1
    /dev/md127             28T   86G   28T   1% /disk/data2

My soak test was to run four instances of bonnie++, on
/disk/data1/{test,testb} and /disk/data2/{test,testb}, using variants
of this command line:

    while [ 1 ]; do bonnie++ -d /disk/data1/test -s 16384k -n 98:800k:500k:1000; done
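
Each instance was started in its own screen session, i.e. roughly this
(session name illustrative):

    screen -S bonnie-data1
    # then, inside that session:
    while [ 1 ]; do bonnie++ -d /disk/data1/test -s 16384k -n 98:800k:500k:1000; done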

SYMPTOM: I can happily ssh into this box, but iostat shows that no disk
activity is going on. The bonnie++ processes (which I was running in
screen sessions) have apparently hung, not having completed even a single
cycle:

    brian@storage1:~$ while [ 1 ]; do bonnie++ -d /disk/data1/test -s 16384k -n 98:800k:500k:1000; done
    Writing a byte at a time...done
    Writing intelligently...done
    Rewriting...done
    Reading a byte at a time...done
    Reading intelligently...done
    start 'em...done...done...done...done...done...
    Create files in sequential order...done.
    Stat files in sequential order...done.
    Delete files in sequential order...done.
    Create files in random order...

Three of the bonnie++ processes stopped at this point. The other stopped at
"Delete files in sequential order..."

dmesg is apparently silent until I start getting the call traces, here's how
they begin:

...
[48899.342127]  disk 9, o:1, dev:sdw
[48899.342129]  disk 10, o:1, dev:sdx
[48899.342131]  disk 11, o:1, dev:sdy
[532203.415122] INFO: task bonnie++:28294 blocked for more than 120 seconds.
[532203.415489] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[532203.424964] bonnie++        D ffffffff81805120     0 28294  28242 0x00000000
[532203.424973]  ffff880212129cc8 0000000000000082 ffff880212129c68 ffffffff812ed3ee
[532203.424984]  ffff880212129fd8 ffff880212129fd8 ffff880212129fd8 0000000000012a40
[532203.424993]  ffffffff81c0b020 ffff88020f070000 ffff88021fc0de40 7fffffffffffffff
[532203.425003] Call Trace:
[532203.425016]  [<ffffffff812ed3ee>] ? radix_tree_lookup_slot+0xe/0x10
[532203.425025]  [<ffffffff816058cf>] schedule+0x3f/0x60
[532203.425031]  [<ffffffff81605f15>] schedule_timeout+0x2a5/0x320
[532203.425039]  [<ffffffff8102c3d3>] ? x2apic_send_IPI_mask+0x13/0x20
[532203.425046]  [<ffffffff81026558>] ? native_smp_send_reschedule+0x48/0x60
[532203.425054]  [<ffffffff8104a5d2>] ? ttwu_queue+0x92/0xd0
[532203.425060]  [<ffffffff81604f7f>] wait_for_common+0xdf/0x180
[532203.425068]  [<ffffffff81194610>] ? __sync_filesystem+0x90/0x90
[532203.425074]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[532203.425080]  [<ffffffff81194610>] ? __sync_filesystem+0x90/0x90
[532203.425087]  [<ffffffff816050fd>] wait_for_completion+0x1d/0x20
[532203.425093]  [<ffffffff8118e969>] writeback_inodes_sb_nr+0x79/0xa0
[532203.425099]  [<ffffffff8118ebc1>] writeback_inodes_sb+0x21/0x30
[532203.425104]  [<ffffffff811945c9>] __sync_filesystem+0x49/0x90
[532203.425110]  [<ffffffff8119462f>] sync_one_sb+0x1f/0x30
[532203.425117]  [<ffffffff8116b525>] iterate_supers+0xa5/0x100
[532203.425123]  [<ffffffff811946db>] sys_sync+0x2b/0x60
[532203.425131]  [<ffffffff8160fb82>] system_call_fastpath+0x16/0x1b
[532323.396630] INFO: task bonnie++:28239 blocked for more than 120 seconds.
[532323.401489] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[532323.411026] bonnie++        D ffffffff81805120     0 28239  26958 0x00000000
[532323.411033]  ffff88020fd85cc8 0000000000000086 ffff88020fd85c68 ffffffff812ed3ee
[532323.411038]  ffff88020fd85fd8 ffff88020fd85fd8 ffff88020fd85fd8 0000000000012a40
[532323.411042]  ffff880212d2dc80 ffff880207400000 ffff88021fd0de40 7fffffffffffffff
[532323.411050] Call Trace:
[532323.411062]  [<ffffffff812ed3ee>] ? radix_tree_lookup_slot+0xe/0x10
[532323.411072]  [<ffffffff816058cf>] schedule+0x3f/0x60
[532323.411079]  [<ffffffff81605f15>] schedule_timeout+0x2a5/0x320
[532323.411087]  [<ffffffff8102c3d3>] ? x2apic_send_IPI_mask+0x13/0x20
[532323.411094]  [<ffffffff81026558>] ? native_smp_send_reschedule+0x48/0x60
[532323.411102]  [<ffffffff8104a5d2>] ? ttwu_queue+0x92/0xd0
[532323.411109]  [<ffffffff81604f7f>] wait_for_common+0xdf/0x180
[532323.411117]  [<ffffffff81194610>] ? __sync_filesystem+0x90/0x90
[532323.411123]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[532323.411129]  [<ffffffff81194610>] ? __sync_filesystem+0x90/0x90
[532323.411136]  [<ffffffff816050fd>] wait_for_completion+0x1d/0x20
[532323.411142]  [<ffffffff8118e969>] writeback_inodes_sb_nr+0x79/0xa0
[532323.411149]  [<ffffffff8118ebc1>] writeback_inodes_sb+0x21/0x30
[532323.411155]  [<ffffffff811945c9>] __sync_filesystem+0x49/0x90
[532323.411166]  [<ffffffff8119462f>] sync_one_sb+0x1f/0x30
[532323.411170]  [<ffffffff8116b525>] iterate_supers+0xa5/0x100
[532323.411173]  [<ffffffff811946db>] sys_sync+0x2b/0x60
[532323.411177]  [<ffffffff8160fb82>] system_call_fastpath+0x16/0x1b
[535202.773094] INFO: task kswapd0:31 blocked for more than 120 seconds.
[535202.778172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[535202.788110] kswapd0         D 0000000000000003     0    31      2 0x00000000
[535202.788116]  ffff880212f03980 0000000000000046 ffff8800891939b0 000000000000001a
[535202.788121]  ffff880212f03fd8 ffff880212f03fd8 ffff880212f03fd8 0000000000012a40
[535202.788126]  ffff880211a0ae40 ffff88020fbfae40 ffff880212f03970 ffff880208215818
[535202.788136] Call Trace:
[535202.788148]  [<ffffffff816058cf>] schedule+0x3f/0x60
[535202.788178]  [<ffffffffa01c2861>] _xfs_log_force_lsn+0x171/0x2e0 [xfs]
[535202.788186]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[535202.788210]  [<ffffffffa01b8945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.788233]  [<ffffffffa01c29e8>] xfs_log_force_lsn+0x18/0x40 [xfs]
[535202.788252]  [<ffffffffa01b68a7>] xfs_iunpin_nowait+0x47/0xb0 [xfs]
[535202.788261]  [<ffffffffa01b8945>] xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.788265]  [<ffffffff8104652b>] ? try_wait_for_completion+0x4b/0x60
[535202.788277]  [<ffffffffa01e4352>] xfs_reclaim_inode+0xb2/0x2a0 [xfs]
[535202.788281]  [<ffffffff812ee45c>] ? radix_tree_gang_lookup_tag.part.7+0x6c/0xf0
[535202.788293]  [<ffffffffa01e4fec>] xfs_reclaim_inodes_ag+0x20c/0x330 [xfs]
[535202.788304]  [<ffffffffa01e51b3>] xfs_reclaim_inode_shrink+0xa3/0xc0 [xfs]
[535202.788309]  [<ffffffff8111b1dd>] shrink_slab+0x11d/0x190
[535202.788312]  [<ffffffff8111e42a>] balance_pgdat+0x52a/0x6f0
[535202.788315]  [<ffffffff8111e6da>] kswapd+0xea/0x1f0
[535202.788318]  [<ffffffff8111e5f0>] ? balance_pgdat+0x6f0/0x6f0
[535202.788321]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[535202.788325]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[535202.788328]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[535202.788331]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[535202.788343] INFO: task bonnie++:28609 blocked for more than 120 seconds.
[535202.793856] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[535202.805501] bonnie++        D ffffffff81805120     0 28609  28557 0x00000000
[535202.805506]  ffff8800a95993d8 0000000000000086 ffff8802129c8000 0000000000000002
[535202.805510]  ffff8800a9599fd8 ffff8800a9599fd8 ffff8800a9599fd8 0000000000012a40
[535202.805515]  ffffffff81c0b020 ffff8802129c8000 ffff8800a95993c8 000000000000038d
[535202.805519] Call Trace:
[535202.805527]  [<ffffffff816058cf>] schedule+0x3f/0x60
[535202.805552]  [<ffffffffa01c40f7>] xlog_cil_force_lsn+0xf7/0x120 [xfs]
[535202.805567]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[535202.805585]  [<ffffffffa01b8945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.805602]  [<ffffffffa01c2747>] _xfs_log_force_lsn+0x57/0x2e0 [xfs]
[535202.805609]  [<ffffffff8160772e>] ? _raw_spin_lock+0xe/0x20
[535202.805615]  [<ffffffff811510e8>] ? add_partial+0x58/0x90
[535202.805632]  [<ffffffffa01b8945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.805648]  [<ffffffffa01c29e8>] xfs_log_force_lsn+0x18/0x40 [xfs]
[535202.805665]  [<ffffffffa01b68a7>] xfs_iunpin_nowait+0x47/0xb0 [xfs]
[535202.805681]  [<ffffffffa01b8945>] xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.805687]  [<ffffffff8104652b>] ? try_wait_for_completion+0x4b/0x60
[535202.805707]  [<ffffffffa01e4352>] xfs_reclaim_inode+0xb2/0x2a0 [xfs]
[535202.805713]  [<ffffffff812ee45c>] ? radix_tree_gang_lookup_tag.part.7+0x6c/0xf0
[535202.805732]  [<ffffffffa01e4fec>] xfs_reclaim_inodes_ag+0x20c/0x330 [xfs]
[535202.805751]  [<ffffffffa01e51b3>] xfs_reclaim_inode_shrink+0xa3/0xc0 [xfs]
[535202.805758]  [<ffffffff8111b1dd>] shrink_slab+0x11d/0x190
[535202.805763]  [<ffffffff8111d978>] do_try_to_free_pages+0x288/0x2e0
[535202.805782]  [<ffffffffa01da59f>] ? xfs_buf_get+0x8f/0x1a0 [xfs]
[535202.805787]  [<ffffffff8111dcd4>] try_to_free_pages+0xb4/0x180
[535202.805793]  [<ffffffff81113260>] __alloc_pages_nodemask+0x4e0/0x800
[535202.805799]  [<ffffffff81149273>] alloc_pages_current+0xa3/0x110
[535202.805805]  [<ffffffff81151f5d>] allocate_slab+0xfd/0x160
[535202.805810]  [<ffffffff81151ff0>] new_slab+0x30/0x160
[535202.805816]  [<ffffffff815f5299>] __slab_alloc+0x127/0x222
[535202.805822]  [<ffffffff8117d447>] ? d_alloc+0x27/0x1d0
[535202.805827]  [<ffffffff811534d2>] kmem_cache_alloc+0x112/0x120
[535202.805832]  [<ffffffff8117d447>] d_alloc+0x27/0x1d0
[535202.805838]  [<ffffffff81172c4d>] d_alloc_and_lookup+0x2d/0x90
[535202.805844]  [<ffffffff81180485>] ? d_lookup+0x35/0x60
[535202.805850]  [<ffffffff81173946>] __lookup_hash.part.15+0x96/0xb0
[535202.805869]  [<ffffffffa01ce173>] ? xfs_trans_free_item_desc+0x33/0x40 [xfs]
[535202.805874]  [<ffffffff81174230>] lookup_hash+0x80/0xe0
[535202.805878]  [<ffffffff8117625b>] do_last+0x26b/0x730
[535202.805883]  [<ffffffff811778aa>] path_openat+0xca/0x3f0
[535202.805901]  [<ffffffffa01d2630>] ? xfs_free_eofblocks+0x1d0/0x240 [xfs]
[535202.805919]  [<ffffffffa01d2630>] ? xfs_free_eofblocks+0x1d0/0x240 [xfs]
[535202.805925]  [<ffffffff81177cd2>] do_filp_open+0x42/0xa0
[535202.805931]  [<ffffffff812f4cc1>] ? strncpy_from_user+0x31/0x40
[535202.805936]  [<ffffffff8117307a>] ? do_getname+0x10a/0x180
[535202.805942]  [<ffffffff8160772e>] ? _raw_spin_lock+0xe/0x20
[535202.805947]  [<ffffffff81184f87>] ? alloc_fd+0xf7/0x150
[535202.805953]  [<ffffffff8116795d>] do_sys_open+0xed/0x220
[535202.805958]  [<ffffffff81169a65>] ? fput+0x25/0x30
[535202.805964]  [<ffffffff81167ab0>] sys_open+0x20/0x30
[535202.805970]  [<ffffffff8160fb82>] system_call_fastpath+0x16/0x1b
[535202.805975] INFO: task bonnie++:28664 blocked for more than 120 seconds.
... snip rest, I have more if you want it ...

storage2
--------

This originally had all 24 disks in a software RAID0 array. Two of them
(sdr, sdy) started giving read errors, so I built a new 22-disk RAID0
without those two.  Then I ran two instances of bonnie++, on
/disk/scratch/test and /disk/scratch/testb.

This machine has locked up in such a way that I cannot ssh to it; it
just drops the connection:

    $ telnet storage2 22
    Trying storage2...
    Connected to 192.168.5.71.
    Escape character is '^]'.
    Connection closed by foreign host.

At the console I see a number of messages about blocked tasks (kworker,
bonnie++, sshd, cron, mv, kswapd0).  There is a login prompt, but if I type
my username I get no password prompt.

    ...
    [520326.168622] INFO: task kswapd0:30 blocked for more than 120 seconds.
    [520326.177738] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [520326.197088] INFO: task kworker/3:2:4834 blocked for more than 120 seconds.
    [520326.207151] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

    Ubuntu 11.10 storage2 tty1

    storage2 login: brian
    << no response past this point >>

storage3
--------

This has 24 disks in a software RAID0 array, and I was running two instances
of bonnie++ against it.

    $ cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md127 : active raid0 sdn[12] sdu[18] sdx[22] sdl[9] sdy[23] sdd[2] sdt[17] sdi[7] sdj[8] sdr[19] sdg[5] sdw[21] sdf[4] sdv[20] sds[16] sdk[10] sde[3] sdh[6] sdp[14] sdo[13] sdc[1] sdb[0] sdq[15] sdm[11]
          70326362112 blocks super 1.2 1024k chunks
          
    unused devices: <none>

SYMPTOM: Like storage1, I can ssh in but I find no I/O activity going on. I
also see call tracebacks in dmesg, although this time the first one is from
kworker rather than bonnie++.

...
[   38.659687] eth3: no IPv6 routers present
[   41.738989] eth2: no IPv6 routers present
[531843.502509] INFO: task kworker/2:1:29 blocked for more than 120 seconds.
[531843.502876] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[531843.512298] kworker/2:1     D 0000000000000002     0    29      2 0x00000000
[531843.512304]  ffff88020f493b80 0000000000000046 ffff880017213af8 ffff8800c93384c8
[531843.512309]  ffff88020f493fd8 ffff88020f493fd8 ffff88020f493fd8 0000000000012a40
[531843.512313]  ffff88020d80ae40 ffff88020f498000 0000000000000000 ffff88020e9c11e8
[531843.512318] Call Trace:
[531843.512327]  [<ffffffff816058cf>] schedule+0x3f/0x60
[531843.512332]  [<ffffffff816066c7>] __mutex_lock_slowpath+0xd7/0x150
[531843.512338]  [<ffffffff81606282>] mutex_lock+0x22/0x40
[531843.512371]  [<ffffffffa020806d>] xfs_reclaim_inodes_ag+0x28d/0x330 [xfs]
[531843.512395]  [<ffffffffa02081f0>] ? xfs_reclaim_inodes+0x20/0x20 [xfs]
[531843.512416]  [<ffffffffa02081ed>] xfs_reclaim_inodes+0x1d/0x20 [xfs]
[531843.512436]  [<ffffffffa0208212>] xfs_reclaim_worker+0x22/0x40 [xfs]
[531843.512443]  [<ffffffff8107bbba>] process_one_work+0x11a/0x480
[531843.512449]  [<ffffffff8107c965>] worker_thread+0x165/0x370
[531843.512455]  [<ffffffff8107c800>] ? manage_workers.isra.30+0x130/0x130
[531843.512461]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[531843.512468]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[531843.512475]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[531843.512481]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[531843.512485] INFO: task kswapd0:31 blocked for more than 120 seconds.
[531843.517282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[531843.526697] kswapd0         D ffffffff81805120     0    31      2 0x00000000
[531843.526701]  ffff880212f018f0 0000000000000046 0000000000000000 0000000000000000
[531843.526706]  ffff880212f01fd8 ffff880212f01fd8 ffff880212f01fd8 0000000000012a40
[531843.526710]  ffff880212d2dc80 ffff88020f49ae40 ffff880212f018e0 0000000000004978
[531843.526714] Call Trace:
[531843.526719]  [<ffffffff816058cf>] schedule+0x3f/0x60
[531843.526740]  [<ffffffffa01e70f7>] xlog_cil_force_lsn+0xf7/0x120 [xfs]
[531843.526744]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[531843.526764]  [<ffffffffa01db945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.526783]  [<ffffffffa01e5747>] _xfs_log_force_lsn+0x57/0x2e0 [xfs]
[531843.526802]  [<ffffffffa01db945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.526819]  [<ffffffffa01e59e8>] xfs_log_force_lsn+0x18/0x40 [xfs]
[531843.526837]  [<ffffffffa01d98a7>] xfs_iunpin_nowait+0x47/0xb0 [xfs]
[531843.526854]  [<ffffffffa01db945>] xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.526858]  [<ffffffff8104652b>] ? try_wait_for_completion+0x4b/0x60
[531843.526879]  [<ffffffffa0207352>] xfs_reclaim_inode+0xb2/0x2a0 [xfs]
[531843.526884]  [<ffffffff812ee45c>] ? radix_tree_gang_lookup_tag.part.7+0x6c/0xf0
[531843.526904]  [<ffffffffa0207fec>] xfs_reclaim_inodes_ag+0x20c/0x330 [xfs]
[531843.526910]  [<ffffffff8160772e>] ? _raw_spin_lock+0xe/0x20
[531843.526928]  [<ffffffffa02081b3>] xfs_reclaim_inode_shrink+0xa3/0xc0 [xfs]
[531843.526934]  [<ffffffff8111b1dd>] shrink_slab+0x11d/0x190
[531843.526937]  [<ffffffff8111e42a>] balance_pgdat+0x52a/0x6f0
[531843.526941]  [<ffffffff8111e6da>] kswapd+0xea/0x1f0
[531843.526944]  [<ffffffff8111e5f0>] ? balance_pgdat+0x6f0/0x6f0
[531843.526948]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[531843.526951]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[531843.526956]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[531843.526959]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[531843.526963] INFO: task kworker/3:1:52 blocked for more than 120 seconds.
[531843.532003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[531843.541890] kworker/3:1     D 0000000000000003     0    52      2 0x00000000
[531843.541894]  ffff88020f0b9450 0000000000000046 0000000000000000 0000000000000000
[531843.541899]  ffff88020f0b9fd8 ffff88020f0b9fd8 ffff88020f0b9fd8 0000000000012a40
[531843.541903]  ffff88020d808000 ffff88020f07ae40 ffff88020f0b9440 0000000000004978
[531843.541907] Call Trace:
[531843.541912]  [<ffffffff816058cf>] schedule+0x3f/0x60
[531843.541931]  [<ffffffffa01e70f7>] xlog_cil_force_lsn+0xf7/0x120 [xfs]
[531843.541938]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[531843.541959]  [<ffffffffa01db945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.541979]  [<ffffffffa01e5747>] _xfs_log_force_lsn+0x57/0x2e0 [xfs]
[531843.541999]  [<ffffffffa01bbefc>] ? xfs_bmapi+0x2ac/0x1150 [xfs]
[531843.542019]  [<ffffffffa01db945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.542039]  [<ffffffffa01e59e8>] xfs_log_force_lsn+0x18/0x40 [xfs]
[531843.542059]  [<ffffffffa01d98a7>] xfs_iunpin_nowait+0x47/0xb0 [xfs]
[531843.542078]  [<ffffffffa01db945>] xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.542084]  [<ffffffff8104652b>] ? try_wait_for_completion+0x4b/0x60
[531843.542107]  [<ffffffffa0207352>] xfs_reclaim_inode+0xb2/0x2a0 [xfs]
[531843.542114]  [<ffffffff812ee45c>] ? radix_tree_gang_lookup_tag.part.7+0x6c/0xf0
[531843.542136]  [<ffffffffa0207fec>] xfs_reclaim_inodes_ag+0x20c/0x330 [xfs]
[531843.542158]  [<ffffffffa02081b3>] xfs_reclaim_inode_shrink+0xa3/0xc0 [xfs]
[531843.542166]  [<ffffffff8111b1dd>] shrink_slab+0x11d/0x190
[531843.542171]  [<ffffffff8111d978>] do_try_to_free_pages+0x288/0x2e0
[531843.542178]  [<ffffffff8111dcd4>] try_to_free_pages+0xb4/0x180
[531843.542184]  [<ffffffff81113260>] __alloc_pages_nodemask+0x4e0/0x800
[531843.542192]  [<ffffffff8160772e>] ? _raw_spin_lock+0xe/0x20
[531843.542199]  [<ffffffff81149273>] alloc_pages_current+0xa3/0x110
[531843.542205]  [<ffffffff81151f5d>] allocate_slab+0xfd/0x160
[531843.542211]  [<ffffffff81151ff0>] new_slab+0x30/0x160
[531843.542218]  [<ffffffff815f5299>] __slab_alloc+0x127/0x222
[531843.542225]  [<ffffffff8110c545>] ? mempool_alloc_slab+0x15/0x20
[531843.542232]  [<ffffffff811534d2>] kmem_cache_alloc+0x112/0x120
[531843.542238]  [<ffffffff8110c545>] mempool_alloc_slab+0x15/0x20
[531843.542245]  [<ffffffff8110c863>] mempool_alloc+0x53/0x130
[531843.542251]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
[531843.542257]  [<ffffffff8119bfbe>] bio_alloc_bioset+0x3e/0xf0
[531843.542263]  [<ffffffff814a9350>] ? slot_store+0x2e0/0x2e0
[531843.542271]  [<ffffffff814a51f7>] bio_alloc_mddev+0x27/0x50
[531843.542276]  [<ffffffff814a93f8>] submit_flushes+0xa8/0x140
[531843.542282]  [<ffffffff8107bbba>] process_one_work+0x11a/0x480
[531843.542287]  [<ffffffff8107c965>] worker_thread+0x165/0x370
[531843.542293]  [<ffffffff8107c800>] ? manage_workers.isra.30+0x130/0x130
[531843.542299]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[531843.542305]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[531843.542312]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[531843.542318]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[531843.542325] INFO: task xfsaild/md127:1146 blocked for more than 120 seconds.
[531843.547840] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[531843.559458] xfsaild/md127   D 0000000000000003     0  1146      2 0x00000000
[531843.559461]  ffff8802119bb700 0000000000000046 ffff8802119bb6f0 ffffffff81079c25
[531843.559463]  ffff8802119bbfd8 ffff8802119bbfd8 ffff8802119bbfd8 0000000000012a40
[531843.559465]  ffff88020f07ae40 ffff8802129d4560 ffff8802119bb6f0 ffff8802129d4560
[531843.559467] Call Trace:
[531843.559469]  [<ffffffff81079c25>] ? __queue_work+0xe5/0x320
[531843.559471]  [<ffffffff816058cf>] schedule+0x3f/0x60
[531843.559473]  [<ffffffff814a9606>] md_flush_request+0x86/0x140
[531843.559475]  [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[531843.559481]  [<ffffffffa0010419>] raid0_make_request+0x119/0x1e0 [raid0]
[531843.559484]  [<ffffffff812df850>] ? blk_throtl_bio+0x1c0/0x510
[531843.559486]  [<ffffffff814a4886>] md_make_request+0xc6/0x200
[531843.559488]  [<ffffffff8110c545>] ? mempool_alloc_slab+0x15/0x20
[531843.559490]  [<ffffffff812ce48a>] generic_make_request.part.51+0x24a/0x510
[531843.559492]  [<ffffffff812ce795>] generic_make_request+0x45/0x60
[531843.559494]  [<ffffffff812ce837>] submit_bio+0x87/0x110
[531843.559504]  [<ffffffffa01fda82>] _xfs_buf_ioapply+0x162/0x1e0 [xfs]
[531843.559514]  [<ffffffffa01e1dea>] ? xlog_bdstrat+0x2a/0x60 [xfs]
[531843.559525]  [<ffffffffa01fdc5e>] xfs_buf_iorequest+0x5e/0x100 [xfs]
[531843.559535]  [<ffffffffa01e1dea>] xlog_bdstrat+0x2a/0x60 [xfs]
[531843.559545]  [<ffffffffa01e3249>] xlog_sync+0x1c9/0x3c0 [xfs]
[531843.559554]  [<ffffffffa01e48a5>] xlog_state_release_iclog+0xa5/0xd0 [xfs]
[531843.559563]  [<ffffffffa01e4cf7>] xlog_write_copy_finish.isra.12+0x147/0x170 [xfs]
[531843.559573]  [<ffffffffa01e50f8>] xlog_write+0x268/0x400 [xfs]
[531843.559583]  [<ffffffffa01f9667>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
[531843.559593]  [<ffffffffa01e67ec>] xlog_cil_push+0x20c/0x380 [xfs]
[531843.559603]  [<ffffffffa01e710e>] xlog_cil_force_lsn+0x10e/0x120 [xfs]
[531843.559606]  [<ffffffff81086bb6>] ? down_trylock+0x36/0x50
[531843.559617]  [<ffffffffa01fc881>] ? xfs_buf_rele+0x51/0x130 [xfs]
[531843.559627]  [<ffffffffa01fcd06>] ? _xfs_buf_find+0x56/0x280 [xfs]
[531843.559637]  [<ffffffffa01e5478>] _xfs_log_force+0x68/0x2a0 [xfs]
[531843.559647]  [<ffffffffa01fcef9>] ? _xfs_buf_find+0x249/0x280 [xfs]
[531843.559658]  [<ffffffffa01d91d9>] ? xfs_iunlock+0xe9/0x150 [xfs]
[531843.559668]  [<ffffffffa01dec87>] ? xfs_inode_item_pushbuf+0x87/0xe0 [xfs]
[531843.559678]  [<ffffffffa01e56c8>] xfs_log_force+0x18/0x40 [xfs]
[531843.559688]  [<ffffffffa01f37b4>] xfsaild_push+0x394/0x3a0 [xfs]
[531843.559699]  [<ffffffffa01f3822>] xfsaild+0x62/0xc0 [xfs]
[531843.559708]  [<ffffffffa01f37c0>] ? xfsaild_push+0x3a0/0x3a0 [xfs]
[531843.559711]  [<ffffffff81080dac>] kthread+0x8c/0xa0
[531843.559714]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[531843.559717]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[531843.559720]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[531843.559725] INFO: task bonnie++:15912 blocked for more than 120 seconds.
... etc

Additional testing
------------------

All three machines are still stuck in this state, and I can leave them like
this for the next few days if necessary, in case there are any tests or
anything else you can suggest I do to the systems while they're like this.

The only thing I could think of doing was a 'dd' to see if the underlying
drives are working:

root     23604  0.0  0.0  11064  1688 pts/4    D+   19:04   0:00 dd if=/dev/md127 of=/dev/null bs=1024k count=1024

That process hung. So I tried again with smaller ones:

strace dd if=/dev/md127 of=/dev/null bs=1024k count=1 #ok
strace dd if=/dev/md127 of=/dev/null bs=1024k count=2 #ok
strace dd if=/dev/md127 of=/dev/null bs=1024k count=3 #ok
strace dd if=/dev/md127 of=/dev/null bs=1024k count=4 #ok
strace dd if=/dev/md127 of=/dev/null bs=1024k count=5 #hung

The strace of the last process was like this:

open("/dev/md127", O_RDONLY)            = 3
dup2(3, 0)                              = 0
close(3)                                = 0
lseek(0, 0, SEEK_CUR)                   = 0
open("/dev/null", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1)                              = 1
close(3)                                = 0
mmap(NULL, 1060864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6d5a1c7000
read(0, "XFSB\0\0\20\0\0\0\0\4\27\361\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(1, "XFSB\0\0\20\0\0\0\0\4\27\361\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(0, "INA\355\2\1\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\4"..., 1048576) = 1048576
write(1, "INA\355\2\1\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\4"..., 1048576) = 1048576
read(0, "IN\201\244\2\2\0\0\0\0\3\350\0\0\3\350\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\16"..., 1048576) = 1048576
write(1, "IN\201\244\2\2\0\0\0\0\3\350\0\0\3\350\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\16"..., 1048576) = 1048576
read(0, "p1\17\343>\203)\321\272\1\357\0\0\0\272\252V\206\377~\300\203\376\351\3\366\35)Zh[\25"..., 1048576) = 1048576
write(1, "p1\17\343>\203)\321\272\1\357\0\0\0\272\252V\206\377~\300\203\376\351\3\366\35)Zh[\25"..., 1048576) = 1048576
read(0, 

That is, it read 4 chunks OK and hung reading the 5th.  Since the array uses
1024k chunks, each of those 1MB reads should land on a different member disk
in stripe order, so this seems suspiciously like one particular drive has
locked up - but there are no dmesg errors for sd[a-z] to suggest disk
problems, whereas when I had a real failed disk on storage2 I got messages
like this (captured previously):

    [ 1217.995083] sd 5:0:0:0: [sdr] Unhandled sense code
    [ 1217.995085] sd 5:0:0:0: [sdr]  Result: hostbyte=invalid driverbyte=DRIVER_SENSE
    [ 1217.995089] sd 5:0:0:0: [sdr]  Sense Key : Medium Error [current] 
    [ 1217.995095] sd 5:0:0:0: [sdr]  Add. Sense: Unrecovered read error
    [ 1217.995099] sd 5:0:0:0: [sdr] CDB: Read(10): 28 00 30 00 05 80 00 00 10 00
    [ 1217.995108] end_request: critical target error, dev sdr, sector 805307776

Furthermore, the following command, which makes a small transfer (a single
512-byte block, dd's default) to every disk, runs fine:

    # for i in sd{b..y}; do echo $i; dd if=/dev/$i of=/dev/null count=1; done

But this one, reading 1MB from each disk, gets stuck at sdk:

    # for i in sd{b..y}; do echo $i; dd if=/dev/$i of=/dev/null bs=1024k count=1; done
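
(A follow-up worth trying while it's stuck - commands purely illustrative -
is to read the same megabyte from sdk in smaller requests, and a 1MB request
from a different offset, to separate "large transfers hang" from "that
region of sdk hangs":)

    dd if=/dev/sdk of=/dev/null bs=4k count=256
    dd if=/dev/sdk of=/dev/null bs=1024k count=1 skip=100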

Final point: the dd and bonnie++ processes can't be killed even with -9.
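
(That's because they're in uninterruptible 'D' sleep, so signals never get
delivered.  Incidentally, assuming sysrq is enabled, the blocked-task dump
can be triggered on demand rather than waiting for the 120-second hung-task
timer:)

    # as root; the stack traces of all D-state tasks land in dmesg
    echo w > /proc/sysrq-trigger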

So, these machines are very sick, and in roughly the same way.  And the fact
that I can get simple dd's to lock up implies it's *not* anything to do with
XFS, so my apologies if this is the wrong forum.

Fortunately the problem appears fairly easy to reproduce.  The machines are
identical hardware, in terms of chassis, PSU, motherboard and BIOS version,
and they run the same OS.

I had already flashed the newest LSI firmware into the HBAs:

    $ sudo ./sas2flash -listall
    LSI Corporation SAS2 Flash Utility
    Version 13.00.00.00 (2012.02.17) 
    Copyright (c) 2008-2012 LSI Corporation. All rights reserved 

            Adapter Selected is a LSI SAS: SAS2116_1(B1) 

    Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
    ----------------------------------------------------------------------------

    0  SAS2116_1(B1)   13.00.01.00    0c.00.00.01    07.25.00.00     00:02:00:00
    1  SAS2008(B2)     13.00.01.00    0c.00.00.05    07.25.00.00     00:03:00:00

            Finished Processing Commands Successfully.
            Exiting SAS2Flash.

(Maybe "newest" wasn't a good idea - it was released Feb 17 2012)

I could try Ubuntu 12.04 with its newer kernel - I'd be surprised if the LSI
driver in 11.10 were as broken as this appears to be, but it's not
impossible. LSI themselves only seem to provide driver packages
for RH4/SLES9 and RH5/SLES10 at
http://www.lsi.com/channel/support/products/Pages/LSISAS9201-16i.aspx
(and I guess trying CentOS 5 is another option).

Any other suggestions (and of course interpretation of the kernel call
tracebacks) would be much appreciated.

Thanks,

Brian.


* Re: Storage server, hung tasks and tracebacks
  2012-05-02 18:44 Storage server, hung tasks and tracebacks Brian Candler
@ 2012-05-03 12:50 ` Stan Hoeppner
  2012-05-03 20:41   ` Brian Candler
  0 siblings, 1 reply; 14+ messages in thread
From: Stan Hoeppner @ 2012-05-03 12:50 UTC (permalink / raw)
  To: xfs

On 5/2/2012 1:44 PM, Brian Candler wrote:
> I have a really weird problem with storage. The one thing that I *can* see
> is dmesg output giving messages about hung tasks and XFS call tracebacks, so
> I'm posting here in the hope that someone can help me interpret them - even
> if only to say "this definitely means that your disks are not responding"
> would be very helpful.
...
> Any other suggestions (and of course interpretation of the kernel call
> tracebacks) would be much appreciated.

Which mainboards are these, Brian?  Make/model?

Make/model/count of all add in cards?

Make/model of PSU?

Make/model of chassis?

I'll sleuth around and see what I can find.  Could be some obscure
expansion card interaction.  Could be undersized PSUs, or backplane loads not
spread evenly across the 12v rails of a multi-rail PSU, etc, etc.

-- 
Stan


* Re: Storage server, hung tasks and tracebacks
  2012-05-03 12:50 ` Stan Hoeppner
@ 2012-05-03 20:41   ` Brian Candler
  2012-05-03 22:19     ` Stan Hoeppner
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Candler @ 2012-05-03 20:41 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Thu, May 03, 2012 at 07:50:00AM -0500, Stan Hoeppner wrote:
> > Any other suggestions (and of course interpretation of the kernel call
> > tracebacks) would be much appreciated.
> 
> Which mainboards are these Brian?  Make/model?

Tyan S5510, with Intel Xeon CPU E31225 @ 3.10GHz

Upgraded to BIOS 1.05a and iKVM 3.00

> Make/model/count of all add in cards?

1 x LSI SAS9201-16i
1 x LSI SAS9211-8i
1 x Intel X520-DA2 dual 10G NIC

although the 10G link wasn't being used for the most recent tests.

> Make/model of PSU?

Will have to check, I think it may be this one:
http://www.xcase.co.uk/XCASE-Power-Supply-p/psu-dolphin-900..htm

> Make model of chassis?

http://www.xcase.co.uk/24-bay-Hotswap-rackmount-chassis-norco-RPC-4224-p/case-xcase-rm424.htm

The drives are 24 x ST3000DM001 (I was hoping to get low-power Hitachi
drives but they weren't available at the time)

> I'll sleuth around and see what I can find.  Could be some obscure
> expansion card interaction.  Could be undersized PSUs or lack of
> backplanes spread evenly across the 12v rails of a multi-rail PSU, etc, etc.

Much appreciated.

However, last night I rebooted one box (the one which wouldn't let me ssh
in) and then upgraded it to Ubuntu 12.04.  It has been running a couple of
concurrent bonnie++ instances for over 24 hours without a hitch.

So maybe the mpt2sas driver is the difference:

[dmesg from Ubuntu 11.10]
    mpt2sas version 08.100.00.02

[dmesg from Ubuntu 12.04]
    mpt2sas version 10.100.00.00
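
(The same driver version string can be pulled without trawling dmesg, e.g.:)

    modinfo mpt2sas | grep -i '^version'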

Regards,

Brian.


* Re: Storage server, hung tasks and tracebacks
  2012-05-03 20:41   ` Brian Candler
@ 2012-05-03 22:19     ` Stan Hoeppner
  2012-05-04 16:32       ` Brian Candler
  0 siblings, 1 reply; 14+ messages in thread
From: Stan Hoeppner @ 2012-05-03 22:19 UTC (permalink / raw)
  To: Brian Candler; +Cc: xfs

On 5/3/2012 3:41 PM, Brian Candler wrote:
> On Thu, May 03, 2012 at 07:50:00AM -0500, Stan Hoeppner wrote:
>>> Any other suggestions (and of course interpretation of the kernel call
>>> tracebacks) would be much appreciated.
>>
>> Which mainboards are these Brian?  Make/model?
> 
> Tyan S5510, with Intel Xeon CPU E31225 @ 3.10GHz
> 
> Upgraded to BIOS 1.05a and iKVM 3.00
> 
>> Make/model/count of all add in cards?
> 
> 1 x LSI SAS9201-16i
> 1 x LSI SAS9211-8i
> 1 x Intel X520-DA2 dual 10G NIC
> 
> although the 10G link wasn't being used for the most recent tests.

I'm not finding anything down this avenue so far.

>> Make/model of PSU?
> 
> Will have to check, I think it may be this one:
> http://www.xcase.co.uk/XCASE-Power-Supply-p/psu-dolphin-900..htm
> 
>> Make model of chassis?
> 
> http://www.xcase.co.uk/24-bay-Hotswap-rackmount-chassis-norco-RPC-4224-p/case-xcase-rm424.htm
> 
> The drives are 24 x ST3000DM001 (I was hoping to get low-power Hitachi
> drives but they weren't available at the time)

Yuk.  Zero technical info available on this "house brand" PSU or
chassis.  That PSU is decidedly consumer oriented.  It may be fine, but
there seems to be no documentation available to verify its output
current or rail config.  No pic of the PSU sticker, no manual to be
found online.  Many of these "house brand" and cheap no-name Chinese
PSUs don't put out anywhere near the advertised power, nor necessarily
cleanly, which is why the UL sticker is so handy--it doesn't lie as the
printing on the box might.  No pic of the inside of that chassis, and no
manual here either, so I have no way of knowing the backplane
architecture.  I'd have to guess it's a 6 x 4-drive passive backplane
setup similar to the Norco chassis.

FWIW, I'd never buy a PSU with a "technical description" of:

    140mm Red Fan
    Strong Long Cables
    Strong outer Casing
    Quality Internal Components
    Low Noise - Big Heat Sinks

One can spend a few extra bucks (pounds) and get a SuperMicro or Sparkle
unit with all the tech specs and QC test data available online, such as:
 http://www.supermicro.com/products/powersupply/80plus/80plus_pws-865-pq.pdf

> However, last night I rebooted one box (the one which wouldn't let me ssh
> in) then upgraded it to ubuntu 12.04.  It has been running a couple of
> concurrent bonnie++ instances for over 24 hours without a hitch.
> 
> So maybe it's the mpt2sas driver which is the difference:
> 
> [dmesg from Ubuntu 11.10]
>     mpt2sas version 08.100.00.02
> 
> [dmesg from Ubuntu 12.04]
>     mpt2sas version 10.100.00.00

Glad to hear you've got one running somewhat stable.  Could be a driver
problem, but it's pretty rare for a SCSI driver to hard lock a box, isn't
it?  Keep us posted.

-- 
Stan


* Re: Storage server, hung tasks and tracebacks
  2012-05-03 22:19     ` Stan Hoeppner
@ 2012-05-04 16:32       ` Brian Candler
  2012-05-04 16:50         ` Stefan Ring
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Brian Candler @ 2012-05-04 16:32 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Thu, May 03, 2012 at 05:19:41PM -0500, Stan Hoeppner wrote:
> Glad to hear you've got one running somewhat stable.  Could be a driver
> problem, but it's pretty rare for a SCSI driver to hard lock a box isn't
> it?

Yes, that bothers me too.

> Keep us posted.

Last night I fired up two more instances of bonnie++ on that box, so there
were four at once.  Going back to the box now, I find that they have all
hung :-(

They are stuck at:

    Delete files in random order...
    Stat files in random order...
    Stat files in random order...
    Stat files in sequential order...

respectively.

iostat 5 shows no activity. There are 9 hung processes:

$ uptime
 17:23:35 up 1 day, 20:39,  1 user,  load average: 9.04, 9.08, 8.91
$ ps auxwww | grep " D" | grep -v grep
root        35  1.5  0.0      0     0 ?        D    May02  42:10 [kswapd0]
root      1179  0.0  0.0      0     0 ?        D    May02   1:50 [xfsaild/md126]
root      3127  0.0  0.0  25096   312 ?        D    16:55   0:00 /usr/lib/postfix/master
tomi     29138  1.1  0.0 378860  3708 pts/1    D+   12:43   3:06 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi     29390  1.0  0.0 378860  3560 pts/3    D+   12:52   2:53 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi     30356  1.1  0.0 378860  3512 pts/2    D+   13:32   2:36 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
root     31075  0.0  0.0      0     0 ?        D    14:00   0:04 [kworker/0:0]
tomi     31796  0.6  0.0 378860  3864 pts/4    D+   14:30   1:05 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
root     31922  0.0  0.0      0     0 ?        D    14:35   0:00 [kworker/1:0]

dmesg shows hung tasks and backtraces, starting with:

[150927.599920] INFO: task kswapd0:35 blocked for more than 120 seconds.
[150927.600263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[150927.600698] kswapd0         D ffffffff81806240     0    35      2 0x00000000
[150927.600704]  ffff880212389330 0000000000000046 ffff880212389320 ffffffff81082df5
[150927.600710]  ffff880212389fd8 ffff880212389fd8 ffff880212389fd8 0000000000013780
[150927.600715]  ffff8802121816f0 ffff88020e538000 ffff880212389320 ffff88020e538000
[150927.600719] Call Trace:
[150927.600728]  [<ffffffff81082df5>] ? __queue_work+0xe5/0x320
[150927.600733]  [<ffffffff8165a55f>] schedule+0x3f/0x60
[150927.600739]  [<ffffffff814e82c6>] md_flush_request+0x86/0x140
[150927.600745]  [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
[150927.600756]  [<ffffffffa0010419>] raid0_make_request+0x119/0x1c0 [raid0]
...

Now, the only other thing I have found by googling is a suggestion that LSI
drivers lock up when there is any SMART or hddtemp activity: see the end of
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/906873

On this system the smartmontools package is installed, but I have not
configured it, and smartd is not running.  I don't have hddtemp installed
either.
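
(For the record, checked roughly like this - no smartd process running, and
the hddtemp package simply isn't there:)

    pgrep -l smartd
    dpkg -l smartmontools hddtemp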

I am completely at a loss with all this... I've never seen a Unix/Linux
system behave so unreliably.  One of the company's directors has reminded me
that we have a Windows storage server with 48 disks which has been running
without incident for the last 3 or 4 years, and I don't have a good answer
for that :-(

Regards,

Brian.


* Re: Storage server, hung tasks and tracebacks
  2012-05-04 16:32       ` Brian Candler
@ 2012-05-04 16:50         ` Stefan Ring
  2012-05-07  1:53         ` Dave Chinner
       [not found]         ` <4FA4C321.2070105@hardwarefreak.com>
  2 siblings, 0 replies; 14+ messages in thread
From: Stefan Ring @ 2012-05-04 16:50 UTC (permalink / raw)
  To: Brian Candler; +Cc: Linux fs XFS

> I am completely at a loss with all this... I've never seen a Unix/Linux
> system behave so unreliably.

Fortunately, Linux doesn't have to prove itself anymore, these days.
Usually these things happen on really crappy hardware. I had a new HP
ProLiant blade server a few months ago, and it would randomly lock up
and produce NMIs for no apparent reason. It inflicted a decent
amount of pain on me. Finally, it was replaced with an identical
model, and all problems were instantly gone.


* Re: Storage server, hung tasks and tracebacks
       [not found]         ` <4FA4C321.2070105@hardwarefreak.com>
@ 2012-05-06  8:47           ` Brian Candler
  2012-05-15 14:02           ` Brian Candler
  1 sibling, 0 replies; 14+ messages in thread
From: Brian Candler @ 2012-05-06  8:47 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Sat, May 05, 2012 at 01:05:21AM -0500, Stan Hoeppner wrote:
> Did I already mention checking the AC lines feeding these 3 boxes?
> Assuming they're on UPS are the boost/buck features working properly?
> Is your area known for fantastic clean power or dirty power with lots of
> sags?  That's the first thing I'd suspect with 3 new machines all
> behaving this way.

We thought that too. These boxes are installed in a data centre; however, we
just took power to a new rack and these three machines are installed in it,
so it's not impossible that there's a problem with the feed to that rack.

We did have some machines on the A and B PDUs in this rack, but we may also
try powering one from an adjacent rack.

Many thanks for your helpful suggestions, and we are working through this
with the equipment supplier (who is extraordinarily helpful and
knowledgeable and as keen to get to the bottom of this as we are). I'll
certainly let you know when we have a resolution.

Regards,

Brian.


* Re: Storage server, hung tasks and tracebacks
  2012-05-04 16:32       ` Brian Candler
  2012-05-04 16:50         ` Stefan Ring
@ 2012-05-07  1:53         ` Dave Chinner
       [not found]         ` <4FA4C321.2070105@hardwarefreak.com>
  2 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2012-05-07  1:53 UTC (permalink / raw)
  To: Brian Candler; +Cc: Stan Hoeppner, xfs

On Fri, May 04, 2012 at 05:32:37PM +0100, Brian Candler wrote:
> On Thu, May 03, 2012 at 05:19:41PM -0500, Stan Hoeppner wrote:
> > Glad to hear you've got one running somewhat stable.  Could be a driver
> > problem, but it's pretty rare for a SCSI driver to hard lock a box isn't
> > it?

No. The hardware does something bad to the PCI bus, or DMAs
something over kernel memory, or won't de-assert an interrupt line,
or .... and the system will hard hang. Hell, if it just stops and
you run out of memory because IO is needed to clean and free memory,
then the system can hang there as well....

> > Keep us posted.
> 
> Last night I fired up two more instances of bonnie++ on that box, so there
> were four at once.  Going back to the box now, I find that they have all
> hung :-(
> 
> They are stuck at:
> 
>     Delete files in random order...
>     Stat files in random order...
>     Stat files in random order...
>     Stat files in sequential order...
> 
> respectively.
> 
> iostat 5 shows no activity. There are 9 hung processes:
> 
> $ uptime
>  17:23:35 up 1 day, 20:39,  1 user,  load average: 9.04, 9.08, 8.91
> $ ps auxwww | grep " D" | grep -v grep
> root        35  1.5  0.0      0     0 ?        D    May02  42:10 [kswapd0]
> root      1179  0.0  0.0      0     0 ?        D    May02   1:50 [xfsaild/md126]
> root      3127  0.0  0.0  25096   312 ?        D    16:55   0:00 /usr/lib/postfix/master
> tomi     29138  1.1  0.0 378860  3708 pts/1    D+   12:43   3:06 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> tomi     29390  1.0  0.0 378860  3560 pts/3    D+   12:52   2:53 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> tomi     30356  1.1  0.0 378860  3512 pts/2    D+   13:32   2:36 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
> root     31075  0.0  0.0      0     0 ?        D    14:00   0:04 [kworker/0:0]
> tomi     31796  0.6  0.0 378860  3864 pts/4    D+   14:30   1:05 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
> root     31922  0.0  0.0      0     0 ?        D    14:35   0:00 [kworker/1:0]
> 
> dmesg shows hung tasks and backtraces, starting with:
> 
> [150927.599920] INFO: task kswapd0:35 blocked for more than 120 seconds.
> [150927.600263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [150927.600698] kswapd0         D ffffffff81806240     0    35      2 0x00000000
> [150927.600704]  ffff880212389330 0000000000000046 ffff880212389320 ffffffff81082df5
> [150927.600710]  ffff880212389fd8 ffff880212389fd8 ffff880212389fd8 0000000000013780
> [150927.600715]  ffff8802121816f0 ffff88020e538000 ffff880212389320 ffff88020e538000
> [150927.600719] Call Trace:
> [150927.600728]  [<ffffffff81082df5>] ? __queue_work+0xe5/0x320
> [150927.600733]  [<ffffffff8165a55f>] schedule+0x3f/0x60
> [150927.600739]  [<ffffffff814e82c6>] md_flush_request+0x86/0x140
> [150927.600745]  [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
> [150927.600756]  [<ffffffffa0010419>] raid0_make_request+0x119/0x1c0 [raid0]

That's most likely a hardware or driver problem - the IO request
queue is full, which means that IO completions are not occurring or are
being delayed excessively. The problem is below the level of the
filesystem....
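
(For what it's worth, you can see that from userspace while the box is
wedged: if completions have stopped, the per-device in-flight request
counters in sysfs will sit non-zero and never drain.  Device glob is
illustrative:)

    grep . /sys/block/sd*/inflight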

> I am completely at a loss with all this... I've never seen a Unix/Linux
> system behave so unreliably.

If you are buying bottom of the barrel hardware, then you get the
reliability that you pay for. Spend a few more dollars and buy
something that is properly engineered - you've wasted more money
trying to diagnose this problem than you would have saved by buying
cheap hardware....

> One of the company's directors has reminded me
> that we have a Windows storage server with 48 disks which has been running
> without incident for the last 3 or 4 years, and I don't have a good answer
> for that :-(

If you buy bottom of the barrel hardware for Windows servers, then
you'll get similar results, only they'll be much harder to diagnose.
Software can't fix busted hardware...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Storage server, hung tasks and tracebacks
       [not found]         ` <4FA4C321.2070105@hardwarefreak.com>
  2012-05-06  8:47           ` Brian Candler
@ 2012-05-15 14:02           ` Brian Candler
  2012-05-20 16:35             ` Brian Candler
  2012-05-20 23:59             ` Dave Chinner
  1 sibling, 2 replies; 14+ messages in thread
From: Brian Candler @ 2012-05-15 14:02 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

Update:

After a week away, I am continuing to try to narrow down the problem of this
system hanging on I/O.

I can fairly reliably repeat the problem on a system with 24 disks, and I've
embarked on trying some different configs to see what's the simplest way I
can make it die.

During this, I found something of interest: I happened to leave an 'iostat
5' process running, and that hung too.  i.e. ps showed it in 'D+' state, and
it was unkillable.

root        34  0.6  0.0      0     0 ?        D    11:29   1:18 [kswapd0]
root      1258  0.0  0.0  15976   532 ?        Ds   11:29   0:00 /usr/sbin/irqbalance
root      1421  0.0  0.0      0     0 ?        D    12:49   0:01 [xfsaild/md127]
snmp      1430  0.0  0.0  48608  3440 ?        D    11:29   0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx      1614  1.1  0.0 378860  3812 pts/1    D+   12:50   1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1669  1.2  0.0 378860  3816 pts/2    D+   12:50   1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1727  0.5  0.0 383424   692 pts/3    Dl+  12:51   0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1782  1.2  0.0 378860  3824 pts/4    D+   12:51   1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1954  0.0  0.0   5912   544 pts/0    D+   12:58   0:00 iostat 5
root      2642  0.2  0.0      0     0 ?        D    13:25   0:09 [kworker/0:1]
root      3233  0.0  0.0   5044   168 ?        Ds   13:50   0:00 /usr/sbin/sshd -D -R
xxxx      4648  0.0  0.0   8104   936 pts/6    S+   14:41   0:00 grep --color=auto  D
root     29491  0.0  0.0      0     0 ?        D    12:45   0:00 [kworker/1:2]

I wonder whether iostat actually communicates with the device driver at all.
If not, then presumably it's reading some kernel data structure, and maybe a
lock on that is being held by something.

At the same time, I notice that 'cat /proc/diskstats' still works, and
starting a new 'iostat 5' process works too.
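
(As far as I can tell, iostat from sysstat never talks to the driver anyway -
it just reads /proc/diskstats and /proc/stat each interval - which would be
consistent with that.  Easy enough to confirm with something like:)

    strace -e trace=open,read -o /tmp/iostat.trace iostat 5 1
    grep diskstats /tmp/iostat.trace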

After issuing halt -p I get this:

root        34  0.6  0.0      0     0 ?        D    11:29   1:18 [kswapd0]
root      1258  0.0  0.0  15976   532 ?        Ds   11:29   0:00 /usr/sbin/irqbalance
root      1421  0.0  0.0      0     0 ?        D    12:49   0:01 [xfsaild/md127]
snmp      1430  0.0  0.0  48608  3440 ?        D    11:29   0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx      1614  1.0  0.0 378860  3812 pts/1    D+   12:50   1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1669  1.1  0.0 378860  3816 pts/2    D+   12:50   1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1727  0.5  0.0 383424   692 pts/3    Dl+  12:51   0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1782  1.1  0.0 378860  3824 pts/4    D+   12:51   1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx      1954  0.0  0.0   5912   544 pts/0    D+   12:58   0:00 iostat 5
root      2642  0.1  0.0      0     0 ?        D    13:25   0:09 [kworker/0:1]
root      3233  0.0  0.0   5044   168 ?        Ds   13:50   0:00 /usr/sbin/sshd -D -R
root      4753  0.0  0.0  15056   928 ?        D    14:42   0:00 umount /run/rpc_pipefs
root      4828  0.0  0.0   4296   348 ?        D    14:42   0:00 sync
root      4834  0.0  0.0   8100   624 pts/6    R+   14:50   0:00 grep --color=auto  D
root     29491  0.0  0.0      0     0 ?        D    12:45   0:00 [kworker/1:2]

I see that even umounting rpc_pipefs is hanging, which suggests there's some
sort of global lock involved.

Anyway, I just wonder whether this jogs anyone's memory as to why iostat
would hang in an unkillable way.

Regards,

Brian.


* Re: Storage server, hung tasks and tracebacks
  2012-05-15 14:02           ` Brian Candler
@ 2012-05-20 16:35             ` Brian Candler
  2012-05-22 13:14               ` Brian Candler
  2012-05-20 23:59             ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Brian Candler @ 2012-05-20 16:35 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

Another update:

I have been trying various combinations to see under what circumstances
I can make things lock up.

The main discovery: using ext4 instead of xfs, I cannot get the server to
lock up - after 36 hours of continuous testing anyway. With xfs and
everything else identical, it typically locks up within 10 minutes.

This is not to say that xfs is at fault. It may be that xfs generates a
higher peak load of I/O ops or something, and that tickles the problem.  In
any case I see a mixture of unkillable processes: not only bonnie++ and
xfsaild but I have also seen kswapd, kworker, irqbalance, even postfix
processes (which should not even be touching the 24-disk array; there is a
separate system disk directly connected to the motherboard's own SATA
controller)

The test is running four concurrent bonnie++ sessions in separate screen
sessions.

Some of the tests performed:

- 24 SATA disks, LSI HBAs, md RAID0, XFS: rapid lockup

- 24 SATA disks, LSI HBAs, md RAID0, ext4: no lockup seen so far

- 2 SATA disks, LSI HBAs, md RAID0, XFS: no lockup

- 1 system SATA disk, motherboard SATA, no RAID, ext4: no lockup

I did also write a ruby script to do lots of concurrent dd reads (at random
offsets) directly from the array. I wasn't able to replicate the problem
with that.
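
(Roughly the shell equivalent of what that script did - this is a sketch,
not the actual ruby code, and the offsets/counts are illustrative:)

    # 16 concurrent readers doing 16MB reads at random 1MB-aligned offsets
    # within the first ~4TB of the array
    for i in $(seq 1 16); do
        ( while true; do
              dd if=/dev/md127 of=/dev/null bs=1024k count=16 \
                 skip=$((RANDOM * 128)) 2>/dev/null
          done ) &
    done
    wait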

This is with Seagate 7200rpm drives, and the total I/O bandwidth I see is
quite high (see the iostat output below).  I can also replicate the problem
on a similar system with Hitachi "coolspin" (5940rpm?) drives, but there it
seems to take somewhat longer, maybe an hour or two, so perhaps the peak I/O
rate has something to do with it?

(These systems have only 8GB RAM, so I also wondered whether it is something
to do with a deadlock when allocating buffer space while not enough is
available.)
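
(A crude way to check that theory during the next run would be to keep an eye
on the memory and writeback counters while the bonnie++ sessions run, e.g.:)

    watch -n 5 'grep -E "MemFree|Dirty|Writeback|Slab" /proc/meminfo'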

Regards,

Brian.


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.55    0.00   80.67   12.63    0.00    5.15

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.40         0.00         2.40          0         12
sdf             187.80     17817.60    109416.80      89088     547084
sde             182.60     17817.60    112051.20      89088     560256
sdd             183.00     17817.60    108800.00      89088     544000
sdc             167.00     17612.80    105840.80      88064     529204
sdg             162.80     17612.80    107735.20      88064     538676
sdh             180.00     18022.40    112230.40      90112     561152
sdp             168.20     17408.00    107929.60      87040     539648
sdj             179.60     17614.40    111346.40      88072     556732
sdq             174.20     17408.00    108544.00      87040     542720
sdk             201.60     17612.80    111206.40      88064     556032
sdb             189.20     17819.20    108800.00      89096     544000
sdl             195.60     17542.40    110387.20      87712     551936
sdo             196.00     17408.00    111206.40      87040     556032
sdm             200.00     17408.00    110796.80      87040     553984
sdn             189.00     17408.00    108544.00      87040     542720
sdi             168.60     18022.40    112025.60      90112     560128
sdr             192.60     17819.20    111858.40      89096     559292
sdu             193.80     17612.80    108953.60      88064     544768
sdv             202.60     17612.80    108851.20      88064     544256
sdw             178.20     17612.80    108953.60      88064     544768
sdy             191.60     17612.80    110796.80      88064     553984
sdx             196.00     17612.80    111616.00      88064     558080
sds             182.80     17612.80    109158.40      88064     545792
sdt             191.60     17203.20    111219.20      86016     556096
md127          7569.80    415064.00   2620999.20    2075320   13104996


* Re: Storage server, hung tasks and tracebacks
  2012-05-15 14:02           ` Brian Candler
  2012-05-20 16:35             ` Brian Candler
@ 2012-05-20 23:59             ` Dave Chinner
  2012-05-21  9:58               ` Brian Candler
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-05-20 23:59 UTC (permalink / raw)
  To: Brian Candler; +Cc: Stan Hoeppner, xfs

On Tue, May 15, 2012 at 03:02:37PM +0100, Brian Candler wrote:
> Update:
> 
> After a week away, I am continuing to try to narrow down the problem of this
> system with hanging I/O.
> 
> I can fairly reliably repeat the problem on a system with 24 disks, and I've
> embarked on trying some different configs to see what's the simplest way I
> can make this die.
> 
> During this I found something of interest: I happened to leave an 'iostat 5'
> process running, and that hung too - ps showed it in 'D+' state and it was
> unkillable.
> 
> root        34  0.6  0.0      0     0 ?        D    11:29   1:18 [kswapd0]
> root      1258  0.0  0.0  15976   532 ?        Ds   11:29   0:00 /usr/sbin/irqbalance
> root      1421  0.0  0.0      0     0 ?        D    12:49   0:01 [xfsaild/md127]
> snmp      1430  0.0  0.0  48608  3440 ?        D    11:29   0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
> xxxx      1614  1.1  0.0 378860  3812 pts/1    D+   12:50   1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx      1669  1.2  0.0 378860  3816 pts/2    D+   12:50   1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx      1727  0.5  0.0 383424   692 pts/3    Dl+  12:51   0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx      1782  1.2  0.0 378860  3824 pts/4    D+   12:51   1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx      1954  0.0  0.0   5912   544 pts/0    D+   12:58   0:00 iostat 5
> root      2642  0.2  0.0      0     0 ?        D    13:25   0:09 [kworker/0:1]
> root      3233  0.0  0.0   5044   168 ?        Ds   13:50   0:00 /usr/sbin/sshd -D -R
> xxxx      4648  0.0  0.0   8104   936 pts/6    S+   14:41   0:00 grep --color=auto  D
> root     29491  0.0  0.0      0     0 ?        D    12:45   0:00 [kworker/1:2]
> 
> I wonder whether iostat actually communicates with the device driver at all.
> If not, then presumably it is reading some kernel data structure, and
> perhaps something is holding a lock on that.
> 
> At the same time, I notice that 'cat /proc/diskstats' still works, and
> starting a new 'iostat 5' process works too.
> 
> After issuing halt -p I get this:
> 
> root        34  0.6  0.0      0     0 ?        D    11:29   1:18 [kswapd0]
> root      1258  0.0  0.0  15976   532 ?        Ds   11:29   0:00 /usr/sbin/irqbalance
> root      1421  0.0  0.0      0     0 ?        D    12:49   0:01 [xfsaild/md127]
> snmp      1430  0.0  0.0  48608  3440 ?        D    11:29   0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
> xxxx      1614  1.0  0.0 378860  3812 pts/1    D+   12:50   1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx      1669  1.1  0.0 378860  3816 pts/2    D+   12:50   1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx      1727  0.5  0.0 383424   692 pts/3    Dl+  12:51   0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx      1782  1.1  0.0 378860  3824 pts/4    D+   12:51   1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx      1954  0.0  0.0   5912   544 pts/0    D+   12:58   0:00 iostat 5
> root      2642  0.1  0.0      0     0 ?        D    13:25   0:09 [kworker/0:1]
> root      3233  0.0  0.0   5044   168 ?        Ds   13:50   0:00 /usr/sbin/sshd -D -R
> root      4753  0.0  0.0  15056   928 ?        D    14:42   0:00 umount /run/rpc_pipefs
> root      4828  0.0  0.0   4296   348 ?        D    14:42   0:00 sync
> root      4834  0.0  0.0   8100   624 pts/6    R+   14:50   0:00 grep --color=auto  D
> root     29491  0.0  0.0      0     0 ?        D    12:45   0:00 [kworker/1:2]
> 
> Even unmounting rpc_pipefs is hanging, which suggests some sort of global
> lock is involved.
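
(On the iostat question above: sysstat's iostat reads /proc/diskstats and
/proc/stat rather than talking to the drivers, which fits the observation
that /proc/diskstats still responds.  A quick, illustrative way to confirm
that:)

    # show which /proc files iostat opens for its statistics
    strace -e trace=open,openat iostat 1 1 2>&1 | grep -E 'diskstats|/proc/stat'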

You need to provide the output of sysrq-w at this point ('echo w >
/proc/sysrq-trigger') so we can see where these tasks are hung.  The entire
dmesg output would also be useful.
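
In concrete terms that capture is something like (the output file name is
arbitrary):

    # make sure the sysrq interface is enabled, then dump blocked-task stack traces
    echo 1 > /proc/sys/kernel/sysrq
    echo w > /proc/sysrq-trigger
    # save the full kernel log, which now includes the sysrq-w output
    dmesg > dmesg-after-hang.txt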

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Storage server, hung tasks and tracebacks
  2012-05-20 23:59             ` Dave Chinner
@ 2012-05-21  9:58               ` Brian Candler
  2012-09-09  9:47                 ` Brian Candler
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Candler @ 2012-05-21  9:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Stan Hoeppner, xfs

[-- Attachment #1: Type: text/plain, Size: 6222 bytes --]

On Mon, May 21, 2012 at 09:59:03AM +1000, Dave Chinner wrote:
> You need to provide the output of sysrq-w at this point ('echo w >
> /proc/sysrq-trigger') so we can see where these tasks are hung.  The entire
> dmesg output would also be useful.

Thank you for this advice Dave.

Attached is the full dmesg output after another hang. The sysrq output is
near the end, at timestamp 250695.

For this test, I built a fresh XFS filesystem (you can see this at timestamp
246909) - I forgot to mount with "inode64" this time, but it doesn't seem to
have made a difference.  I also did "swapoff" before starting the test, to
ensure that swapping to sda was not part of the problem.
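
In outline, that preparation amounts to something like:

    mkfs.xfs -f /dev/md127            # fresh filesystem; this run was mounted without inode64
    mount /dev/md127 /disk/scratch
    swapoff -a                        # rule out swap traffic to sda during the test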

A quick status summary from the hung system:

    root@storage3:~# free
                 total       used       free     shared    buffers     cached
    Mem:       8156224    8052824     103400          0       4808    7399112
    -/+ buffers/cache:     648904    7507320
    Swap:            0          0          0
    root@storage3:~# uptime
     10:46:47 up 2 days, 21:43,  1 user,  load average: 10.00, 9.81, 8.16
    root@storage3:~# ps auxwww | grep -v ' S'
    root        34  2.9  0.0      0     0 ?        D    May18 122:14 [kswapd0]
    root      1387  0.0  0.0  15976   504 ?        Ds   May18   0:18 /usr/sbin/irqbalance
    root      5242  0.0  0.0      0     0 ?        D    09:39   0:02 [xfsaild/md127]
    tomi      6249  4.2  0.0 378860  3844 pts/1    D+   09:40   2:48 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6251  4.1  0.0 378860  3836 pts/2    D+   09:40   2:44 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6253  4.1  0.0 378860  3848 pts/3    D+   09:40   2:46 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6255  4.0  0.0 378860  3840 pts/4    D+   09:40   2:40 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    root      7795  0.1  0.0      0     0 ?        D    10:27   0:02 [kworker/0:3]
    root      8517  0.0  0.0  16876  1272 pts/0    R+   10:46   0:00 ps auxwww
    root     24420  0.0  0.0      0     0 ?        D    00:50   0:00 [kworker/3:0]
    root@storage3:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
    [raid10] 
    md127 : active raid0 sds[17] sdx[22] sdj[8] sdt[18] sdk[9] sdc[1] sdb[0]
    sdh[6] sdu[19] sdi[7] sdn[12] sdo[13] sdv[20] sdm[11] sdq[15] sdp[14]
    sdl[10] sdw[21] sdg[5] sdr[16] sde[3] sdy[23] sdd[2] sdf[4]
          70326362112 blocks super 1.2 1024k chunks
          
    unused devices: <none>
    root@storage3:~# mount
    /dev/sda1 on / type ext4 (rw,errors=remount-ro)
    proc on /proc type proc (rw,noexec,nosuid,nodev)
    sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
    none on /sys/fs/fuse/connections type fusectl (rw)
    none on /sys/kernel/debug type debugfs (rw)
    none on /sys/kernel/security type securityfs (rw)
    udev on /dev type devtmpfs (rw,mode=0755)
    devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
    tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
    none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
    none on /run/shm type tmpfs (rw,nosuid,nodev)
    rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
    /dev/md127 on /disk/scratch type xfs (rw)
    root@storage3:~# df
    Filesystem       1K-blocks      Used   Available Use% Mounted on
    /dev/sda1        967415188  16754388   902241696   2% /
    udev               4069104         4     4069100   1% /dev
    tmpfs              1631248       380     1630868   1% /run
    none                  5120         0        5120   0% /run/lock
    none               4078112         0     4078112   0% /run/shm
    /dev/md127     70324275200 258902416 70065372784   1% /disk/scratch
    root@storage3:~# 

(Aside: when the test started the load average was just above 4, for the
four bonnie++ processes)

"iostat 5" shows zero activity to the MD RAID.

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.00    0.00    0.00    0.05    0.00   99.95

    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    sda               0.20         1.60         0.00          8          0
    sdf               0.00         0.00         0.00          0          0
    sde               0.00         0.00         0.00          0          0
    sdd               0.00         0.00         0.00          0          0
    sdc               0.00         0.00         0.00          0          0
    sdg               0.00         0.00         0.00          0          0
    sdh               0.00         0.00         0.00          0          0
    sdp               0.00         0.00         0.00          0          0
    sdj               0.00         0.00         0.00          0          0
    sdq               0.00         0.00         0.00          0          0
    sdk               0.00         0.00         0.00          0          0
    sdb               0.00         0.00         0.00          0          0
    sdl               0.00         0.00         0.00          0          0
    sdo               0.00         0.00         0.00          0          0
    sdm               0.00         0.00         0.00          0          0
    sdn               0.00         0.00         0.00          0          0
    sdi               0.00         0.00         0.00          0          0
    sdr               0.00         0.00         0.00          0          0
    sdu               0.00         0.00         0.00          0          0
    sdv               0.00         0.00         0.00          0          0
    sdw               0.00         0.00         0.00          0          0
    sdy               0.00         0.00         0.00          0          0
    sdx               0.00         0.00         0.00          0          0
    sds               0.00         0.00         0.00          0          0
    sdt               0.00         0.00         0.00          0          0
    md127             0.00         0.00         0.00          0          0

Anything you can determine from this info would be much appreciated!

Regards,

Brian.

[-- Attachment #2: storage3-dmesg.txt.gz --]
[-- Type: application/x-gunzip, Size: 24023 bytes --]


* Re: Storage server, hung tasks and tracebacks
  2012-05-20 16:35             ` Brian Candler
@ 2012-05-22 13:14               ` Brian Candler
  0 siblings, 0 replies; 14+ messages in thread
From: Brian Candler @ 2012-05-22 13:14 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Sun, May 20, 2012 at 05:35:06PM +0100, Brian Candler wrote:
> The main discovery: using ext4 instead of xfs, I cannot get the server to
> lock up - after 36 hours of continuous testing anyway. With xfs and
> everything else identical, it typically locks up within 10 minutes.

I re-ran the test using btrfs (not that I would use it in production, but
just as another comparison).  No lockup after over 24 hours either.

So for now I am converting these boxes to ext4.  I should soon have another
test box with 24 disks that I can reformat as XFS, if something changes and
you'd like me to retest it.

I'd still like to use XFS at some point in the future because this is
running gluster on top, and the gluster people recommend XFS and do most of
their testing with it.

Regards,

Brian.


* Re: Storage server, hung tasks and tracebacks
  2012-05-21  9:58               ` Brian Candler
@ 2012-09-09  9:47                 ` Brian Candler
  0 siblings, 0 replies; 14+ messages in thread
From: Brian Candler @ 2012-09-09  9:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Stan Hoeppner, xfs

[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]

On Mon, May 21, 2012 at 10:58:30AM +0100, Brian Candler wrote:
> On Mon, May 21, 2012 at 09:59:03AM +1000, Dave Chinner wrote:
> > You need to provide the output of sysrq-w at this point ('echo w >
> > /proc/sysrq-trigger') so we can see where these tasks are hung.  The entire
> > dmesg output would also be useful.
> 
> Thank you for this advice Dave.
> 
> Attached is the full dmesg output after another hang. The sysrq output is
> near the end, at timestamp 250695.

It has recently been pointed out to me that the original attachment was
incomplete and didn't include the sysrq output.  Attached is the dmesg file
*with* the sysrq data at the given timestamp.

Unfortunately, I have been trying to reproduce this problem on an identical
box, and over 5 days it did not recur.  (Aside: that was until one of the
RAID0 drives failed, which is turning out to be a common occurrence with
these Seagates - but that is not the failure I was seeing before, which I
used to be able to reproduce in under an hour.  In this case XFS shut down
gracefully in the face of the I/O errors.)

This isn't one of the three boxes that actually locked up before, but it is
another box with an identical chassis and motherboard from the same
supplier.  It does, however, have an up-to-date Ubuntu 12.04 kernel, so it is
possible a driver fix has landed since my original tests.

I will continue to investigate.

Regards,

Brian.

[-- Attachment #2: storage3-sysreq.txt.gz --]
[-- Type: application/x-gunzip, Size: 28530 bytes --]


Thread overview: 14+ messages
2012-05-02 18:44 Storage server, hung tasks and tracebacks Brian Candler
2012-05-03 12:50 ` Stan Hoeppner
2012-05-03 20:41   ` Brian Candler
2012-05-03 22:19     ` Stan Hoeppner
2012-05-04 16:32       ` Brian Candler
2012-05-04 16:50         ` Stefan Ring
2012-05-07  1:53         ` Dave Chinner
     [not found]         ` <4FA4C321.2070105@hardwarefreak.com>
2012-05-06  8:47           ` Brian Candler
2012-05-15 14:02           ` Brian Candler
2012-05-20 16:35             ` Brian Candler
2012-05-22 13:14               ` Brian Candler
2012-05-20 23:59             ` Dave Chinner
2012-05-21  9:58               ` Brian Candler
2012-09-09  9:47                 ` Brian Candler
