* Storage server, hung tasks and tracebacks
@ 2012-05-02 18:44 Brian Candler
2012-05-03 12:50 ` Stan Hoeppner
0 siblings, 1 reply; 14+ messages in thread
From: Brian Candler @ 2012-05-02 18:44 UTC (permalink / raw)
To: xfs
I have a really weird problem with storage. The one thing that I *can* see
is dmesg output giving messages about hung tasks and XFS call tracebacks, so
I'm posting here in the hope that someone can help me interpret them - even
if only to say "this definitely means that your disks are not responding"
would be very helpful.
First, a bit of background: there are three storage servers. Each has 24 x
3TB SATA disks on LSI HBAs (1x16 port, 1x8 port), plus a boot disk attached
directly to the motherboard, and 8GB RAM. They are running Ubuntu 11.10
server amd64, kernel 3.0.0-16-server #29-Ubuntu SMP. Apart from the boot
disk, all filesystems are XFS mounted with inode64.
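For reference, the mount setup amounts to fstab entries like these (a sketch, not copied from the real fstab; devices and mount points are taken from the df output further down):

```shell
# Sketch of the corresponding /etc/fstab entries. inode64 lets XFS
# place inodes beyond the first 1TB of a large filesystem, which
# matters on 28TB volumes like these:
#
#   /dev/md126  /disk/data1  xfs  inode64  0  0
#   /dev/md127  /disk/data2  xfs  inode64  0  0
```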
I had been testing them as gluster fileservers, and two of them locked up in
operation. This gave the appearance of a hardware or BIOS fault - in
particular, a power-cycle at the front panel wouldn't bring them back to
life, but powering off using the physical switch on the PSU did.
So after upgrading the BIOS and iKVM firmware on the motherboards, I decided
to give them a soak test, running a couple of instances of bonnie++ talking
directly to the disks (so no gluster or network I/O involved).
When I came back to check on them, they had all locked up in similar ways.
On the two that I can still login to, I see the XFS call traces in dmesg.
The three servers have their disks configured slightly differently, and the
failure modes are slightly different, so I'll describe each in turn.
storage1
--------
Configured as two RAID6 groups of 12 disks each:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid6 sdy[11] sdx[10] sdw[9] sdv[8] sdu[7] sdt[6] sds[5] sdr[4] sdq[3] sdp[2] sdo[1] sdn[0]
29302650880 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
md126 : active raid6 sdj[8] sdk[9] sdd[2] sde[3] sdi[7] sdm[11] sdg[5] sdc[1] sdb[0] sdl[10] sdh[6] sdf[4]
29302650880 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
unused devices: <none>
$ df -h | grep md
/dev/md126 28T 105G 28T 1% /disk/data1
/dev/md127 28T 86G 28T 1% /disk/data2
My soak test was to run four instances of bonnie++, on
/disk/data1/{test,testb} and /disk/data2/{test,testb}, using variants
of this command line:
while [ 1 ]; do bonnie++ -d /disk/data1/test -s 16384k -n 98:800k:500k:1000; done
SYMPTOM: I can happily ssh into this box, but iostat shows that no disk
activity is going on. The bonnie++ processes (which I was running in
screen sessions) have apparently hung, not having completed even a single
cycle:
brian@storage1:~$ while [ 1 ]; do bonnie++ -d /disk/data1/test -s 16384k -n 98:800k:500k:1000; done
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...
Three of the bonnie++ processes stopped at this point. The other stopped at
"Delete files in sequential order..."
dmesg is apparently silent until I start getting the call traces; here's how
they begin:
...
[48899.342127] disk 9, o:1, dev:sdw
[48899.342129] disk 10, o:1, dev:sdx
[48899.342131] disk 11, o:1, dev:sdy
[532203.415122] INFO: task bonnie++:28294 blocked for more than 120 seconds.
[532203.415489] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[532203.424964] bonnie++ D ffffffff81805120 0 28294 28242 0x00000000
[532203.424973] ffff880212129cc8 0000000000000082 ffff880212129c68 ffffffff812ed3ee
[532203.424984] ffff880212129fd8 ffff880212129fd8 ffff880212129fd8 0000000000012a40
[532203.424993] ffffffff81c0b020 ffff88020f070000 ffff88021fc0de40 7fffffffffffffff
[532203.425003] Call Trace:
[532203.425016] [<ffffffff812ed3ee>] ? radix_tree_lookup_slot+0xe/0x10
[532203.425025] [<ffffffff816058cf>] schedule+0x3f/0x60
[532203.425031] [<ffffffff81605f15>] schedule_timeout+0x2a5/0x320
[532203.425039] [<ffffffff8102c3d3>] ? x2apic_send_IPI_mask+0x13/0x20
[532203.425046] [<ffffffff81026558>] ? native_smp_send_reschedule+0x48/0x60
[532203.425054] [<ffffffff8104a5d2>] ? ttwu_queue+0x92/0xd0
[532203.425060] [<ffffffff81604f7f>] wait_for_common+0xdf/0x180
[532203.425068] [<ffffffff81194610>] ? __sync_filesystem+0x90/0x90
[532203.425074] [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[532203.425080] [<ffffffff81194610>] ? __sync_filesystem+0x90/0x90
[532203.425087] [<ffffffff816050fd>] wait_for_completion+0x1d/0x20
[532203.425093] [<ffffffff8118e969>] writeback_inodes_sb_nr+0x79/0xa0
[532203.425099] [<ffffffff8118ebc1>] writeback_inodes_sb+0x21/0x30
[532203.425104] [<ffffffff811945c9>] __sync_filesystem+0x49/0x90
[532203.425110] [<ffffffff8119462f>] sync_one_sb+0x1f/0x30
[532203.425117] [<ffffffff8116b525>] iterate_supers+0xa5/0x100
[532203.425123] [<ffffffff811946db>] sys_sync+0x2b/0x60
[532203.425131] [<ffffffff8160fb82>] system_call_fastpath+0x16/0x1b
[532323.396630] INFO: task bonnie++:28239 blocked for more than 120 seconds.
[532323.401489] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[532323.411026] bonnie++ D ffffffff81805120 0 28239 26958 0x00000000
[532323.411033] ffff88020fd85cc8 0000000000000086 ffff88020fd85c68 ffffffff812ed3ee
[532323.411038] ffff88020fd85fd8 ffff88020fd85fd8 ffff88020fd85fd8 0000000000012a40
[532323.411042] ffff880212d2dc80 ffff880207400000 ffff88021fd0de40 7fffffffffffffff
[532323.411050] Call Trace:
[532323.411062] [<ffffffff812ed3ee>] ? radix_tree_lookup_slot+0xe/0x10
[532323.411072] [<ffffffff816058cf>] schedule+0x3f/0x60
[532323.411079] [<ffffffff81605f15>] schedule_timeout+0x2a5/0x320
[532323.411087] [<ffffffff8102c3d3>] ? x2apic_send_IPI_mask+0x13/0x20
[532323.411094] [<ffffffff81026558>] ? native_smp_send_reschedule+0x48/0x60
[532323.411102] [<ffffffff8104a5d2>] ? ttwu_queue+0x92/0xd0
[532323.411109] [<ffffffff81604f7f>] wait_for_common+0xdf/0x180
[532323.411117] [<ffffffff81194610>] ? __sync_filesystem+0x90/0x90
[532323.411123] [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[532323.411129] [<ffffffff81194610>] ? __sync_filesystem+0x90/0x90
[532323.411136] [<ffffffff816050fd>] wait_for_completion+0x1d/0x20
[532323.411142] [<ffffffff8118e969>] writeback_inodes_sb_nr+0x79/0xa0
[532323.411149] [<ffffffff8118ebc1>] writeback_inodes_sb+0x21/0x30
[532323.411155] [<ffffffff811945c9>] __sync_filesystem+0x49/0x90
[532323.411166] [<ffffffff8119462f>] sync_one_sb+0x1f/0x30
[532323.411170] [<ffffffff8116b525>] iterate_supers+0xa5/0x100
[532323.411173] [<ffffffff811946db>] sys_sync+0x2b/0x60
[532323.411177] [<ffffffff8160fb82>] system_call_fastpath+0x16/0x1b
[535202.773094] INFO: task kswapd0:31 blocked for more than 120 seconds.
[535202.778172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[535202.788110] kswapd0 D 0000000000000003 0 31 2 0x00000000
[535202.788116] ffff880212f03980 0000000000000046 ffff8800891939b0 000000000000001a
[535202.788121] ffff880212f03fd8 ffff880212f03fd8 ffff880212f03fd8 0000000000012a40
[535202.788126] ffff880211a0ae40 ffff88020fbfae40 ffff880212f03970 ffff880208215818
[535202.788136] Call Trace:
[535202.788148] [<ffffffff816058cf>] schedule+0x3f/0x60
[535202.788178] [<ffffffffa01c2861>] _xfs_log_force_lsn+0x171/0x2e0 [xfs]
[535202.788186] [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[535202.788210] [<ffffffffa01b8945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.788233] [<ffffffffa01c29e8>] xfs_log_force_lsn+0x18/0x40 [xfs]
[535202.788252] [<ffffffffa01b68a7>] xfs_iunpin_nowait+0x47/0xb0 [xfs]
[535202.788261] [<ffffffffa01b8945>] xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.788265] [<ffffffff8104652b>] ? try_wait_for_completion+0x4b/0x60
[535202.788277] [<ffffffffa01e4352>] xfs_reclaim_inode+0xb2/0x2a0 [xfs]
[535202.788281] [<ffffffff812ee45c>] ? radix_tree_gang_lookup_tag.part.7+0x6c/0xf0
[535202.788293] [<ffffffffa01e4fec>] xfs_reclaim_inodes_ag+0x20c/0x330 [xfs]
[535202.788304] [<ffffffffa01e51b3>] xfs_reclaim_inode_shrink+0xa3/0xc0 [xfs]
[535202.788309] [<ffffffff8111b1dd>] shrink_slab+0x11d/0x190
[535202.788312] [<ffffffff8111e42a>] balance_pgdat+0x52a/0x6f0
[535202.788315] [<ffffffff8111e6da>] kswapd+0xea/0x1f0
[535202.788318] [<ffffffff8111e5f0>] ? balance_pgdat+0x6f0/0x6f0
[535202.788321] [<ffffffff81080dac>] kthread+0x8c/0xa0
[535202.788325] [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[535202.788328] [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[535202.788331] [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[535202.788343] INFO: task bonnie++:28609 blocked for more than 120 seconds.
[535202.793856] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[535202.805501] bonnie++ D ffffffff81805120 0 28609 28557 0x00000000
[535202.805506] ffff8800a95993d8 0000000000000086 ffff8802129c8000 0000000000000002
[535202.805510] ffff8800a9599fd8 ffff8800a9599fd8 ffff8800a9599fd8 0000000000012a40
[535202.805515] ffffffff81c0b020 ffff8802129c8000 ffff8800a95993c8 000000000000038d
[535202.805519] Call Trace:
[535202.805527] [<ffffffff816058cf>] schedule+0x3f/0x60
[535202.805552] [<ffffffffa01c40f7>] xlog_cil_force_lsn+0xf7/0x120 [xfs]
[535202.805567] [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[535202.805585] [<ffffffffa01b8945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.805602] [<ffffffffa01c2747>] _xfs_log_force_lsn+0x57/0x2e0 [xfs]
[535202.805609] [<ffffffff8160772e>] ? _raw_spin_lock+0xe/0x20
[535202.805615] [<ffffffff811510e8>] ? add_partial+0x58/0x90
[535202.805632] [<ffffffffa01b8945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.805648] [<ffffffffa01c29e8>] xfs_log_force_lsn+0x18/0x40 [xfs]
[535202.805665] [<ffffffffa01b68a7>] xfs_iunpin_nowait+0x47/0xb0 [xfs]
[535202.805681] [<ffffffffa01b8945>] xfs_iunpin_wait+0x35/0xb0 [xfs]
[535202.805687] [<ffffffff8104652b>] ? try_wait_for_completion+0x4b/0x60
[535202.805707] [<ffffffffa01e4352>] xfs_reclaim_inode+0xb2/0x2a0 [xfs]
[535202.805713] [<ffffffff812ee45c>] ? radix_tree_gang_lookup_tag.part.7+0x6c/0xf0
[535202.805732] [<ffffffffa01e4fec>] xfs_reclaim_inodes_ag+0x20c/0x330 [xfs]
[535202.805751] [<ffffffffa01e51b3>] xfs_reclaim_inode_shrink+0xa3/0xc0 [xfs]
[535202.805758] [<ffffffff8111b1dd>] shrink_slab+0x11d/0x190
[535202.805763] [<ffffffff8111d978>] do_try_to_free_pages+0x288/0x2e0
[535202.805782] [<ffffffffa01da59f>] ? xfs_buf_get+0x8f/0x1a0 [xfs]
[535202.805787] [<ffffffff8111dcd4>] try_to_free_pages+0xb4/0x180
[535202.805793] [<ffffffff81113260>] __alloc_pages_nodemask+0x4e0/0x800
[535202.805799] [<ffffffff81149273>] alloc_pages_current+0xa3/0x110
[535202.805805] [<ffffffff81151f5d>] allocate_slab+0xfd/0x160
[535202.805810] [<ffffffff81151ff0>] new_slab+0x30/0x160
[535202.805816] [<ffffffff815f5299>] __slab_alloc+0x127/0x222
[535202.805822] [<ffffffff8117d447>] ? d_alloc+0x27/0x1d0
[535202.805827] [<ffffffff811534d2>] kmem_cache_alloc+0x112/0x120
[535202.805832] [<ffffffff8117d447>] d_alloc+0x27/0x1d0
[535202.805838] [<ffffffff81172c4d>] d_alloc_and_lookup+0x2d/0x90
[535202.805844] [<ffffffff81180485>] ? d_lookup+0x35/0x60
[535202.805850] [<ffffffff81173946>] __lookup_hash.part.15+0x96/0xb0
[535202.805869] [<ffffffffa01ce173>] ? xfs_trans_free_item_desc+0x33/0x40 [xfs]
[535202.805874] [<ffffffff81174230>] lookup_hash+0x80/0xe0
[535202.805878] [<ffffffff8117625b>] do_last+0x26b/0x730
[535202.805883] [<ffffffff811778aa>] path_openat+0xca/0x3f0
[535202.805901] [<ffffffffa01d2630>] ? xfs_free_eofblocks+0x1d0/0x240 [xfs]
[535202.805919] [<ffffffffa01d2630>] ? xfs_free_eofblocks+0x1d0/0x240 [xfs]
[535202.805925] [<ffffffff81177cd2>] do_filp_open+0x42/0xa0
[535202.805931] [<ffffffff812f4cc1>] ? strncpy_from_user+0x31/0x40
[535202.805936] [<ffffffff8117307a>] ? do_getname+0x10a/0x180
[535202.805942] [<ffffffff8160772e>] ? _raw_spin_lock+0xe/0x20
[535202.805947] [<ffffffff81184f87>] ? alloc_fd+0xf7/0x150
[535202.805953] [<ffffffff8116795d>] do_sys_open+0xed/0x220
[535202.805958] [<ffffffff81169a65>] ? fput+0x25/0x30
[535202.805964] [<ffffffff81167ab0>] sys_open+0x20/0x30
[535202.805970] [<ffffffff8160fb82>] system_call_fastpath+0x16/0x1b
[535202.805975] INFO: task bonnie++:28664 blocked for more than 120 seconds.
... snip rest, I have more if you want it ...
storage2
--------
This originally had all 24 disks in a software RAID0 array. Two of them
(sdr, sdy) started giving read errors, so I built a new 22-disk RAID0
without those two. Then I ran two instances of bonnie++, on
/disk/scratch/test and /disk/scratch/testb.
This machine has locked up in such a way that I cannot ssh to it; it
just drops the connection:
$ telnet storage2 22
Trying storage2...
Connected to 192.168.5.71.
Escape character is '^]'.
Connection closed by foreign host.
At the console I see a number of messages about blocked tasks (kworker,
bonnie++, sshd, cron, mv, kswapd0). There is a login prompt, but if I type
my username I get no password prompt.
...
[520326.168622] INFO: task kswapd0:30 blocked for more than 120 seconds.
[520326.177738] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[520326.197088] INFO: task kworker/3:2:4834 blocked for more than 120 seconds.
[520326.207151] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Ubuntu 11.10 storage2 tty1
storage2 login: brian
<< no response past this point >>
storage3
--------
This has 24 disks in a software RAID0 array, and I was running two instances
of bonnie++ against it.
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid0 sdn[12] sdu[18] sdx[22] sdl[9] sdy[23] sdd[2] sdt[17] sdi[7] sdj[8] sdr[19] sdg[5] sdw[21] sdf[4] sdv[20] sds[16] sdk[10] sde[3] sdh[6] sdp[14] sdo[13] sdc[1] sdb[0] sdq[15] sdm[11]
70326362112 blocks super 1.2 1024k chunks
unused devices: <none>
SYMPTOM: Like storage1, I can ssh in but I find no I/O activity going on. I
also see call tracebacks in dmesg, although this time the first one is
kworker, not bonnie++:
...
[ 38.659687] eth3: no IPv6 routers present
[ 41.738989] eth2: no IPv6 routers present
[531843.502509] INFO: task kworker/2:1:29 blocked for more than 120 seconds.
[531843.502876] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[531843.512298] kworker/2:1 D 0000000000000002 0 29 2 0x00000000
[531843.512304] ffff88020f493b80 0000000000000046 ffff880017213af8 ffff8800c93384c8
[531843.512309] ffff88020f493fd8 ffff88020f493fd8 ffff88020f493fd8 0000000000012a40
[531843.512313] ffff88020d80ae40 ffff88020f498000 0000000000000000 ffff88020e9c11e8
[531843.512318] Call Trace:
[531843.512327] [<ffffffff816058cf>] schedule+0x3f/0x60
[531843.512332] [<ffffffff816066c7>] __mutex_lock_slowpath+0xd7/0x150
[531843.512338] [<ffffffff81606282>] mutex_lock+0x22/0x40
[531843.512371] [<ffffffffa020806d>] xfs_reclaim_inodes_ag+0x28d/0x330 [xfs]
[531843.512395] [<ffffffffa02081f0>] ? xfs_reclaim_inodes+0x20/0x20 [xfs]
[531843.512416] [<ffffffffa02081ed>] xfs_reclaim_inodes+0x1d/0x20 [xfs]
[531843.512436] [<ffffffffa0208212>] xfs_reclaim_worker+0x22/0x40 [xfs]
[531843.512443] [<ffffffff8107bbba>] process_one_work+0x11a/0x480
[531843.512449] [<ffffffff8107c965>] worker_thread+0x165/0x370
[531843.512455] [<ffffffff8107c800>] ? manage_workers.isra.30+0x130/0x130
[531843.512461] [<ffffffff81080dac>] kthread+0x8c/0xa0
[531843.512468] [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[531843.512475] [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[531843.512481] [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[531843.512485] INFO: task kswapd0:31 blocked for more than 120 seconds.
[531843.517282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[531843.526697] kswapd0 D ffffffff81805120 0 31 2 0x00000000
[531843.526701] ffff880212f018f0 0000000000000046 0000000000000000 0000000000000000
[531843.526706] ffff880212f01fd8 ffff880212f01fd8 ffff880212f01fd8 0000000000012a40
[531843.526710] ffff880212d2dc80 ffff88020f49ae40 ffff880212f018e0 0000000000004978
[531843.526714] Call Trace:
[531843.526719] [<ffffffff816058cf>] schedule+0x3f/0x60
[531843.526740] [<ffffffffa01e70f7>] xlog_cil_force_lsn+0xf7/0x120 [xfs]
[531843.526744] [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[531843.526764] [<ffffffffa01db945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.526783] [<ffffffffa01e5747>] _xfs_log_force_lsn+0x57/0x2e0 [xfs]
[531843.526802] [<ffffffffa01db945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.526819] [<ffffffffa01e59e8>] xfs_log_force_lsn+0x18/0x40 [xfs]
[531843.526837] [<ffffffffa01d98a7>] xfs_iunpin_nowait+0x47/0xb0 [xfs]
[531843.526854] [<ffffffffa01db945>] xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.526858] [<ffffffff8104652b>] ? try_wait_for_completion+0x4b/0x60
[531843.526879] [<ffffffffa0207352>] xfs_reclaim_inode+0xb2/0x2a0 [xfs]
[531843.526884] [<ffffffff812ee45c>] ? radix_tree_gang_lookup_tag.part.7+0x6c/0xf0
[531843.526904] [<ffffffffa0207fec>] xfs_reclaim_inodes_ag+0x20c/0x330 [xfs]
[531843.526910] [<ffffffff8160772e>] ? _raw_spin_lock+0xe/0x20
[531843.526928] [<ffffffffa02081b3>] xfs_reclaim_inode_shrink+0xa3/0xc0 [xfs]
[531843.526934] [<ffffffff8111b1dd>] shrink_slab+0x11d/0x190
[531843.526937] [<ffffffff8111e42a>] balance_pgdat+0x52a/0x6f0
[531843.526941] [<ffffffff8111e6da>] kswapd+0xea/0x1f0
[531843.526944] [<ffffffff8111e5f0>] ? balance_pgdat+0x6f0/0x6f0
[531843.526948] [<ffffffff81080dac>] kthread+0x8c/0xa0
[531843.526951] [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[531843.526956] [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[531843.526959] [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[531843.526963] INFO: task kworker/3:1:52 blocked for more than 120 seconds.
[531843.532003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[531843.541890] kworker/3:1 D 0000000000000003 0 52 2 0x00000000
[531843.541894] ffff88020f0b9450 0000000000000046 0000000000000000 0000000000000000
[531843.541899] ffff88020f0b9fd8 ffff88020f0b9fd8 ffff88020f0b9fd8 0000000000012a40
[531843.541903] ffff88020d808000 ffff88020f07ae40 ffff88020f0b9440 0000000000004978
[531843.541907] Call Trace:
[531843.541912] [<ffffffff816058cf>] schedule+0x3f/0x60
[531843.541931] [<ffffffffa01e70f7>] xlog_cil_force_lsn+0xf7/0x120 [xfs]
[531843.541938] [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[531843.541959] [<ffffffffa01db945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.541979] [<ffffffffa01e5747>] _xfs_log_force_lsn+0x57/0x2e0 [xfs]
[531843.541999] [<ffffffffa01bbefc>] ? xfs_bmapi+0x2ac/0x1150 [xfs]
[531843.542019] [<ffffffffa01db945>] ? xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.542039] [<ffffffffa01e59e8>] xfs_log_force_lsn+0x18/0x40 [xfs]
[531843.542059] [<ffffffffa01d98a7>] xfs_iunpin_nowait+0x47/0xb0 [xfs]
[531843.542078] [<ffffffffa01db945>] xfs_iunpin_wait+0x35/0xb0 [xfs]
[531843.542084] [<ffffffff8104652b>] ? try_wait_for_completion+0x4b/0x60
[531843.542107] [<ffffffffa0207352>] xfs_reclaim_inode+0xb2/0x2a0 [xfs]
[531843.542114] [<ffffffff812ee45c>] ? radix_tree_gang_lookup_tag.part.7+0x6c/0xf0
[531843.542136] [<ffffffffa0207fec>] xfs_reclaim_inodes_ag+0x20c/0x330 [xfs]
[531843.542158] [<ffffffffa02081b3>] xfs_reclaim_inode_shrink+0xa3/0xc0 [xfs]
[531843.542166] [<ffffffff8111b1dd>] shrink_slab+0x11d/0x190
[531843.542171] [<ffffffff8111d978>] do_try_to_free_pages+0x288/0x2e0
[531843.542178] [<ffffffff8111dcd4>] try_to_free_pages+0xb4/0x180
[531843.542184] [<ffffffff81113260>] __alloc_pages_nodemask+0x4e0/0x800
[531843.542192] [<ffffffff8160772e>] ? _raw_spin_lock+0xe/0x20
[531843.542199] [<ffffffff81149273>] alloc_pages_current+0xa3/0x110
[531843.542205] [<ffffffff81151f5d>] allocate_slab+0xfd/0x160
[531843.542211] [<ffffffff81151ff0>] new_slab+0x30/0x160
[531843.542218] [<ffffffff815f5299>] __slab_alloc+0x127/0x222
[531843.542225] [<ffffffff8110c545>] ? mempool_alloc_slab+0x15/0x20
[531843.542232] [<ffffffff811534d2>] kmem_cache_alloc+0x112/0x120
[531843.542238] [<ffffffff8110c545>] mempool_alloc_slab+0x15/0x20
[531843.542245] [<ffffffff8110c863>] mempool_alloc+0x53/0x130
[531843.542251] [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
[531843.542257] [<ffffffff8119bfbe>] bio_alloc_bioset+0x3e/0xf0
[531843.542263] [<ffffffff814a9350>] ? slot_store+0x2e0/0x2e0
[531843.542271] [<ffffffff814a51f7>] bio_alloc_mddev+0x27/0x50
[531843.542276] [<ffffffff814a93f8>] submit_flushes+0xa8/0x140
[531843.542282] [<ffffffff8107bbba>] process_one_work+0x11a/0x480
[531843.542287] [<ffffffff8107c965>] worker_thread+0x165/0x370
[531843.542293] [<ffffffff8107c800>] ? manage_workers.isra.30+0x130/0x130
[531843.542299] [<ffffffff81080dac>] kthread+0x8c/0xa0
[531843.542305] [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[531843.542312] [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[531843.542318] [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[531843.542325] INFO: task xfsaild/md127:1146 blocked for more than 120 seconds.
[531843.547840] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[531843.559458] xfsaild/md127 D 0000000000000003 0 1146 2 0x00000000
[531843.559461] ffff8802119bb700 0000000000000046 ffff8802119bb6f0 ffffffff81079c25
[531843.559463] ffff8802119bbfd8 ffff8802119bbfd8 ffff8802119bbfd8 0000000000012a40
[531843.559465] ffff88020f07ae40 ffff8802129d4560 ffff8802119bb6f0 ffff8802129d4560
[531843.559467] Call Trace:
[531843.559469] [<ffffffff81079c25>] ? __queue_work+0xe5/0x320
[531843.559471] [<ffffffff816058cf>] schedule+0x3f/0x60
[531843.559473] [<ffffffff814a9606>] md_flush_request+0x86/0x140
[531843.559475] [<ffffffff810574b0>] ? try_to_wake_up+0x200/0x200
[531843.559481] [<ffffffffa0010419>] raid0_make_request+0x119/0x1e0 [raid0]
[531843.559484] [<ffffffff812df850>] ? blk_throtl_bio+0x1c0/0x510
[531843.559486] [<ffffffff814a4886>] md_make_request+0xc6/0x200
[531843.559488] [<ffffffff8110c545>] ? mempool_alloc_slab+0x15/0x20
[531843.559490] [<ffffffff812ce48a>] generic_make_request.part.51+0x24a/0x510
[531843.559492] [<ffffffff812ce795>] generic_make_request+0x45/0x60
[531843.559494] [<ffffffff812ce837>] submit_bio+0x87/0x110
[531843.559504] [<ffffffffa01fda82>] _xfs_buf_ioapply+0x162/0x1e0 [xfs]
[531843.559514] [<ffffffffa01e1dea>] ? xlog_bdstrat+0x2a/0x60 [xfs]
[531843.559525] [<ffffffffa01fdc5e>] xfs_buf_iorequest+0x5e/0x100 [xfs]
[531843.559535] [<ffffffffa01e1dea>] xlog_bdstrat+0x2a/0x60 [xfs]
[531843.559545] [<ffffffffa01e3249>] xlog_sync+0x1c9/0x3c0 [xfs]
[531843.559554] [<ffffffffa01e48a5>] xlog_state_release_iclog+0xa5/0xd0 [xfs]
[531843.559563] [<ffffffffa01e4cf7>] xlog_write_copy_finish.isra.12+0x147/0x170 [xfs]
[531843.559573] [<ffffffffa01e50f8>] xlog_write+0x268/0x400 [xfs]
[531843.559583] [<ffffffffa01f9667>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
[531843.559593] [<ffffffffa01e67ec>] xlog_cil_push+0x20c/0x380 [xfs]
[531843.559603] [<ffffffffa01e710e>] xlog_cil_force_lsn+0x10e/0x120 [xfs]
[531843.559606] [<ffffffff81086bb6>] ? down_trylock+0x36/0x50
[531843.559617] [<ffffffffa01fc881>] ? xfs_buf_rele+0x51/0x130 [xfs]
[531843.559627] [<ffffffffa01fcd06>] ? _xfs_buf_find+0x56/0x280 [xfs]
[531843.559637] [<ffffffffa01e5478>] _xfs_log_force+0x68/0x2a0 [xfs]
[531843.559647] [<ffffffffa01fcef9>] ? _xfs_buf_find+0x249/0x280 [xfs]
[531843.559658] [<ffffffffa01d91d9>] ? xfs_iunlock+0xe9/0x150 [xfs]
[531843.559668] [<ffffffffa01dec87>] ? xfs_inode_item_pushbuf+0x87/0xe0 [xfs]
[531843.559678] [<ffffffffa01e56c8>] xfs_log_force+0x18/0x40 [xfs]
[531843.559688] [<ffffffffa01f37b4>] xfsaild_push+0x394/0x3a0 [xfs]
[531843.559699] [<ffffffffa01f3822>] xfsaild+0x62/0xc0 [xfs]
[531843.559708] [<ffffffffa01f37c0>] ? xfsaild_push+0x3a0/0x3a0 [xfs]
[531843.559711] [<ffffffff81080dac>] kthread+0x8c/0xa0
[531843.559714] [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
[531843.559717] [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
[531843.559720] [<ffffffff81610ca0>] ? gs_change+0x13/0x13
[531843.559725] INFO: task bonnie++:15912 blocked for more than 120 seconds.
... etc
Additional testing
------------------
All three machines are still stuck in this state, and I can leave them like
this for the next few days if necessary, in case there are any tests you can
suggest I run on them while they're in this condition.
The only thing I could think of doing was a 'dd' to see if the underlying
drives are working:
root 23604 0.0 0.0 11064 1688 pts/4 D+ 19:04 0:00 dd if=/dev/md127 of=/dev/null bs=1024k count=1024
That process hung. So I tried again with smaller ones:
strace dd if=/dev/md127 of=/dev/null bs=1024k count=1 #ok
strace dd if=/dev/md127 of=/dev/null bs=1024k count=2 #ok
strace dd if=/dev/md127 of=/dev/null bs=1024k count=3 #ok
strace dd if=/dev/md127 of=/dev/null bs=1024k count=4 #ok
strace dd if=/dev/md127 of=/dev/null bs=1024k count=5 #hung
The strace of the last process was like this:
open("/dev/md127", O_RDONLY) = 3
dup2(3, 0) = 0
close(3) = 0
lseek(0, 0, SEEK_CUR) = 0
open("/dev/null", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1) = 1
close(3) = 0
mmap(NULL, 1060864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6d5a1c7000
read(0, "XFSB\0\0\20\0\0\0\0\4\27\361\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(1, "XFSB\0\0\20\0\0\0\0\4\27\361\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(0, "INA\355\2\1\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\4"..., 1048576) = 1048576
write(1, "INA\355\2\1\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\4"..., 1048576) = 1048576
read(0, "IN\201\244\2\2\0\0\0\0\3\350\0\0\3\350\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\16"..., 1048576) = 1048576
write(1, "IN\201\244\2\2\0\0\0\0\3\350\0\0\3\350\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\16"..., 1048576) = 1048576
read(0, "p1\17\343>\203)\321\272\1\357\0\0\0\272\252V\206\377~\300\203\376\351\3\366\35)Zh[\25"..., 1048576) = 1048576
write(1, "p1\17\343>\203)\321\272\1\357\0\0\0\272\252V\206\377~\300\203\376\351\3\366\35)Zh[\25"..., 1048576) = 1048576
read(0,
That is, it read 4 chunks OK and hung reading the 5th. This seems
suspiciously like a drive has locked up, but there are no dmesg errors for
sd[a-z] to suggest disk problems, whereas when I had a real failed disk
on storage2 I had messages like this (captured previously):
[ 1217.995083] sd 5:0:0:0: [sdr] Unhandled sense code
[ 1217.995085] sd 5:0:0:0: [sdr] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
[ 1217.995089] sd 5:0:0:0: [sdr] Sense Key : Medium Error [current]
[ 1217.995095] sd 5:0:0:0: [sdr] Add. Sense: Unrecovered read error
[ 1217.995099] sd 5:0:0:0: [sdr] CDB: Read(10): 28 00 30 00 05 80 00 00 10 00
[ 1217.995108] end_request: critical target error, dev sdr, sector 805307776
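The incremental dd probe above can be automated; this sketch grows the read size until a read stalls. It reads a plain temporary file here so the example is self-contained; on the failing box the target would be /dev/md127. Note that a task stuck in uninterruptible (D) sleep ignores even SIGKILL, so against a genuinely wedged device `timeout` itself may block rather than reap the dd.

```shell
# Grow the dd read size until a read stalls or fails. TARGET is a
# temp file here so this runs anywhere; substitute the block device
# under test when probing real hardware.
TARGET=$(mktemp)
dd if=/dev/zero of="$TARGET" bs=1024k count=8 2>/dev/null

for count in 1 2 3 4 5 6 7 8; do
    # Allow each read 10 seconds; a healthy read of a few MB is instant.
    if timeout 10 dd if="$TARGET" of=/dev/null bs=1024k count="$count" 2>/dev/null; then
        echo "count=$count ok"
    else
        echo "count=$count stalled or failed"
        break
    fi
done
rm -f "$TARGET"
```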
Furthermore, the following command, which makes a small transfer to every
disk, runs fine:
# for i in sd{b..y}; do echo $i; dd if=/dev/$i of=/dev/null count=1; done
But this one gets stuck at sdk:
# for i in sd{b..y}; do echo $i; dd if=/dev/$i of=/dev/null bs=1024k count=1; done
Final point: the dd and bonnie++ processes can't be killed even with -9.
So, these machines are very sick, and in roughly the same way. And the fact
that I can get simple dd's to lock up implies it's *not* anything to do with
XFS, so my apologies if this is the wrong forum.
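One way to gather more detail on such unkillable processes is to list every task in uninterruptible (D) sleep and read its in-kernel stack; this is essentially the same information the hung-task detector prints to dmesg. A sketch (needs root; /proc/PID/stack requires a kernel with stack tracing enabled, which the stock Ubuntu kernels have):

```shell
# List tasks in uninterruptible (D) sleep and dump their kernel stacks.
# Produces no output when nothing is stuck.
ps -eo pid=,stat=,comm= | awk '$2 ~ /^D/' | while read -r pid stat comm; do
    echo "=== $pid $comm ($stat) ==="
    cat "/proc/$pid/stack" 2>/dev/null
done
```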
Fortunately the problem appears fairly easy to reproduce. The machines are
identical hardware, in terms of chassis, PSU, motherboard, and BIOS version;
and they run the same OS.
I had already flashed the newest LSI firmware into the HBAs:
$ sudo ./sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 13.00.00.00 (2012.02.17)
Copyright (c) 2008-2012 LSI Corporation. All rights reserved
Adapter Selected is a LSI SAS: SAS2116_1(B1)
Num Ctlr FW Ver NVDATA x86-BIOS PCI Addr
----------------------------------------------------------------------------
0 SAS2116_1(B1) 13.00.01.00 0c.00.00.01 07.25.00.00 00:02:00:00
1 SAS2008(B2) 13.00.01.00 0c.00.00.05 07.25.00.00 00:03:00:00
Finished Processing Commands Successfully.
Exiting SAS2Flash.
(Maybe "newest" wasn't a good idea - it was released Feb 17 2012)
I could try Ubuntu 12.04 with its newer kernel - I'd be surprised if the LSI
driver in 11.10 was as broken as this appears to be, but it's not
impossible. LSI themselves only seem to provide driver packages
for RH4/SLES9 and RH5/SLES10 at
http://www.lsi.com/channel/support/products/Pages/LSISAS9201-16i.aspx
(and I guess trying CentOS 5 is another option)
Any other suggestions (and of course interpretation of the kernel call
tracebacks) would be much appreciated.
Thanks,
Brian.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Storage server, hung tasks and tracebacks
2012-05-02 18:44 Storage server, hung tasks and tracebacks Brian Candler
@ 2012-05-03 12:50 ` Stan Hoeppner
2012-05-03 20:41 ` Brian Candler
From: Stan Hoeppner @ 2012-05-03 12:50 UTC (permalink / raw)
To: xfs
On 5/2/2012 1:44 PM, Brian Candler wrote:
> I have a really weird problem with storage. The one thing that I *can* see
> is dmesg output giving messages about hung tasks and XFS call tracebacks, so
> I'm posting here in the hope that someone can help me interpret them - even
> if only to say "this definitely means that your disks are not responding"
> would be very helpful.
...
> Any other suggestions (and of course interpretation of the kernel call
> tracebacks) would be much appreciated.
Which mainboards are these, Brian? Make/model?
Make/model/count of all add-in cards?
Make/model of PSU?
Make/model of chassis?
I'll sleuth around and see what I can find. Could be some obscure
expansion card interaction. Could be undersized PSUs or lack of
backplanes spread evenly across the 12v rails of a multi-rail PSU, etc, etc.
--
Stan
* Re: Storage server, hung tasks and tracebacks
2012-05-03 12:50 ` Stan Hoeppner
@ 2012-05-03 20:41 ` Brian Candler
2012-05-03 22:19 ` Stan Hoeppner
From: Brian Candler @ 2012-05-03 20:41 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: xfs
On Thu, May 03, 2012 at 07:50:00AM -0500, Stan Hoeppner wrote:
> > Any other suggestions (and of course interpretation of the kernel call
> > tracebacks) would be much appreciated.
>
> Which mainboards are these Brian? Make/model?
Tyan S5510, with Intel Xeon CPU E31225 @ 3.10GHz
Upgraded to BIOS 1.05a and iKVM 3.00
> Make/model/count of all add in cards?
1 x LSI SAS9201-16i
1 x LSI SAS9211-8i
1 x Intel X520-DA2 dual 10G NIC
although the 10G link wasn't being used for the most recent tests.
> Make/model of PSU?
Will have to check, I think it may be this one:
http://www.xcase.co.uk/XCASE-Power-Supply-p/psu-dolphin-900..htm
> Make model of chassis?
http://www.xcase.co.uk/24-bay-Hotswap-rackmount-chassis-norco-RPC-4224-p/case-xcase-rm424.htm
The drives are 24 x ST3000DM001 (I was hoping to get low-power Hitachi
drives but they weren't available at the time)
> I'll sleuth around and see what I can find. Could be some obscure
> expansion card interaction. Could be undersized PSUs or lack of
> backplanes spread evenly across the 12v rails of a multi-rail PSU, etc, etc.
Much appreciated.
However, last night I rebooted one box (the one which wouldn't let me ssh
in), then upgraded it to Ubuntu 12.04. It has been running a couple of
concurrent bonnie++ instances for over 24 hours without a hitch.
So maybe it's the mpt2sas driver which is the difference:
[dmesg from Ubuntu 11.10]
mpt2sas version 08.100.00.02
[dmesg from Ubuntu 12.04]
mpt2sas version 10.100.00.00
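(A quick way to compare driver versions across the boxes, without grepping dmesg: /sys reports the loaded module's version and modinfo the on-disk one. A sketch; it prints a fallback line if mpt2sas isn't present at all.)

```shell
# Report the mpt2sas driver version: loaded module first, then the
# module file on disk, then a fallback if neither is available.
cat /sys/module/mpt2sas/version 2>/dev/null \
    || modinfo -F version mpt2sas 2>/dev/null \
    || echo "mpt2sas not loaded or installed"
```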
Regards,
Brian.
* Re: Storage server, hung tasks and tracebacks
2012-05-03 20:41 ` Brian Candler
@ 2012-05-03 22:19 ` Stan Hoeppner
2012-05-04 16:32 ` Brian Candler
0 siblings, 1 reply; 14+ messages in thread
From: Stan Hoeppner @ 2012-05-03 22:19 UTC (permalink / raw)
To: Brian Candler; +Cc: xfs
On 5/3/2012 3:41 PM, Brian Candler wrote:
> On Thu, May 03, 2012 at 07:50:00AM -0500, Stan Hoeppner wrote:
>>> Any other suggestions (and of course interpretation of the kernel call
>>> tracebacks) would be much appreciated.
>>
>> Which mainboards are these Brian? Make/model?
>
> Tyan S5510, with Intel Xeon CPU E31225 @ 3.10GHz
>
> Upgraded to BIOS 1.05a and iKVM 3.00
>
>> Make/model/count of all add in cards?
>
> 1 x LSI SAS9201-16i
> 1 x LSI SAS9211-8i
> 1 x Intel X520-DA2 dual 10G NIC
>
> although the 10G link wasn't being used for the most recent tests.
I'm not finding anything down this avenue so far.
>> Make/model of PSU?
>
> Will have to check, I think it may be this one:
> http://www.xcase.co.uk/XCASE-Power-Supply-p/psu-dolphin-900..htm
>
>> Make model of chassis?
>
> http://www.xcase.co.uk/24-bay-Hotswap-rackmount-chassis-norco-RPC-4224-p/case-xcase-rm424.htm
>
> The drives are 24 x ST3000DM001 (I was hoping to get low-power Hitachi
> drives but they weren't available at the time)
Yuk. Zero technical info available on this "house brand" PSU or
chassis. That PSU is decidedly consumer oriented. It may be fine, but
there seems to be no documentation available to verify its output
current or rail config. No pic of the PSU sticker, no manual to be
found online. Many of these "house brand" and cheap no-name Chinese
PSUs don't put out anywhere near the advertised power, nor necessarily
cleanly, which is why the UL sticker is so handy--it doesn't lie as the
printing on the box might. No pic of the inside of that chassis, and no
manual here either, so I have no way of knowing the backplane
architecture. I'd have to guess it's a 6x 4 drive passive backplane
setup similar to the Norco chassis.
FWIW, I'd never buy a PSU with a "technical description" of:
140mm Red Fan
Strong Long Cables
Strong outer Casing
Quality Internal Components
Low Noise - Big Heat Sinks
One can spend a few extra bucks (pounds) and get a SuperMicro or Sparkle
unit with all the tech specs and QC test data available online, such as:
http://www.supermicro.com/products/powersupply/80plus/80plus_pws-865-pq.pdf
> However, last night I rebooted one box (the one which wouldn't let me ssh
> in) then upgraded it to ubuntu 12.04. It has been running a couple of
> concurrent bonnie++ instances for over 24 hours without a hitch.
>
> So maybe it's the mpt2sas driver which is the difference:
>
> [dmesg from Ubuntu 11.10]
> mpt2sas version 08.100.00.02
>
> [dmesg from Ubuntu 12.04]
> mpt2sas version 10.100.00.00
Glad to hear you've got one running somewhat stable. Could be a driver
problem, but it's pretty rare for a SCSI driver to hard lock a box isn't
it? Keep us posted.
--
Stan
* Re: Storage server, hung tasks and tracebacks
2012-05-03 22:19 ` Stan Hoeppner
@ 2012-05-04 16:32 ` Brian Candler
2012-05-04 16:50 ` Stefan Ring
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Brian Candler @ 2012-05-04 16:32 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: xfs
On Thu, May 03, 2012 at 05:19:41PM -0500, Stan Hoeppner wrote:
> Glad to hear you've got one running somewhat stable. Could be a driver
> problem, but it's pretty rare for a SCSI driver to hard lock a box isn't
> it?
Yes, that bothers me too.
> Keep us posted.
Last night I fired up two more instances of bonnie++ on that box, so there
were four at once. Going back to the box now, I find that they have all
hung :-(
They are stuck at:
Delete files in random order...
Stat files in random order...
Stat files in random order...
Stat files in sequential order...
respectively.
iostat 5 shows no activity. There are 9 hung processes:
$ uptime
17:23:35 up 1 day, 20:39, 1 user, load average: 9.04, 9.08, 8.91
$ ps auxwww | grep " D" | grep -v grep
root 35 1.5 0.0 0 0 ? D May02 42:10 [kswapd0]
root 1179 0.0 0.0 0 0 ? D May02 1:50 [xfsaild/md126]
root 3127 0.0 0.0 25096 312 ? D 16:55 0:00 /usr/lib/postfix/master
tomi 29138 1.1 0.0 378860 3708 pts/1 D+ 12:43 3:06 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 29390 1.0 0.0 378860 3560 pts/3 D+ 12:52 2:53 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 30356 1.1 0.0 378860 3512 pts/2 D+ 13:32 2:36 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
root 31075 0.0 0.0 0 0 ? D 14:00 0:04 [kworker/0:0]
tomi 31796 0.6 0.0 378860 3864 pts/4 D+ 14:30 1:05 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
root 31922 0.0 0.0 0 0 ? D 14:35 0:00 [kworker/1:0]
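As a sketch, the same D-state scan can be done straight from /proc, avoiding grep's occasional false matches on command lines that merely contain " D" (the function name here is mine, not from the thread):

```shell
# list_d_tasks: print tasks in uninterruptible sleep by reading /proc
# directly -- the same thing `ps auxwww | grep " D"` extracts.
list_d_tasks() {
    local stat rest state pid
    for stat in /proc/[0-9]*/stat; do
        [ -r "$stat" ] || continue
        # /proc/<pid>/stat is "pid (comm) state ..."; comm may contain
        # spaces or parens, so strip up to the *last* closing paren.
        rest=$(sed 's/^.*) //' "$stat" 2>/dev/null) || continue
        state=${rest%% *}
        if [ "$state" = D ]; then
            pid=${stat#/proc/}; pid=${pid%/stat}
            printf '%s\t%s\n' "$pid" "$(cat /proc/$pid/comm 2>/dev/null)"
        fi
    done
}
list_d_tasks
```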
dmesg shows hung tasks and backtraces, starting with:
[150927.599920] INFO: task kswapd0:35 blocked for more than 120 seconds.
[150927.600263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[150927.600698] kswapd0 D ffffffff81806240 0 35 2 0x00000000
[150927.600704] ffff880212389330 0000000000000046 ffff880212389320 ffffffff81082df5
[150927.600710] ffff880212389fd8 ffff880212389fd8 ffff880212389fd8 0000000000013780
[150927.600715] ffff8802121816f0 ffff88020e538000 ffff880212389320 ffff88020e538000
[150927.600719] Call Trace:
[150927.600728] [<ffffffff81082df5>] ? __queue_work+0xe5/0x320
[150927.600733] [<ffffffff8165a55f>] schedule+0x3f/0x60
[150927.600739] [<ffffffff814e82c6>] md_flush_request+0x86/0x140
[150927.600745] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
[150927.600756] [<ffffffffa0010419>] raid0_make_request+0x119/0x1c0 [raid0]
...
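For reference, the detector behind that message is tunable; a sketch of the relevant knobs (the echo lines are illustrative and need root, so they are left commented out):

```shell
# Knobs for the hung-task detector (writes need root, hence commented out):
cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null || true   # 120s by default
# echo 0 > /proc/sys/kernel/hung_task_timeout_secs   # silence the warnings
# echo 1 > /proc/sys/kernel/hung_task_panic          # panic instead, for a crash dump
```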
Now, the only other thing I have found by googling is a suggestion that LSI
drivers lock up when there is any smart or hddtemp activity: see end of
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/906873
On this system the smartmontools package is installed, but I have not
configured it, and smartd is not running. I don't have hddtemp installed
either.
I am completely at a loss with all this... I've never seen a Unix/Linux
system behave so unreliably. One of the company's directors has reminded me
that we have a Windows storage server with 48 disks which has been running
without incident for the last 3 or 4 years, and I don't have a good answer
for that :-(
Regards,
Brian.
* Re: Storage server, hung tasks and tracebacks
2012-05-04 16:32 ` Brian Candler
@ 2012-05-04 16:50 ` Stefan Ring
2012-05-07 1:53 ` Dave Chinner
[not found] ` <4FA4C321.2070105@hardwarefreak.com>
2 siblings, 0 replies; 14+ messages in thread
From: Stefan Ring @ 2012-05-04 16:50 UTC (permalink / raw)
To: Brian Candler; +Cc: Linux fs XFS
> I am completely at a loss with all this... I've never seen a Unix/Linux
> system behave so unreliably.
Fortunately, Linux doesn't have to prove itself anymore, these days.
Usually these things happen on really crappy hardware. I had a new HP
Proliant Blade server a few months ago, and it would randomly lock up
and produce NMIs for no apparent reason. It has inflicted a decent
amount of pain on me. Finally, it was replaced with an identical
model, and all problems were instantly gone.
* Re: Storage server, hung tasks and tracebacks
[not found] ` <4FA4C321.2070105@hardwarefreak.com>
@ 2012-05-06 8:47 ` Brian Candler
2012-05-15 14:02 ` Brian Candler
1 sibling, 0 replies; 14+ messages in thread
From: Brian Candler @ 2012-05-06 8:47 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: xfs
On Sat, May 05, 2012 at 01:05:21AM -0500, Stan Hoeppner wrote:
> Did I already mention checking the AC lines feeding these 3 boxes?
> Assuming they're on UPS are the boost/buck features working properly?
> Is your area known for fantastic clean power or dirty power with lots of
> sags? That's the first thing I'd suspect with 3 new machines all
> behaving this way.
We thought that too. These boxes are installed in a data centre; however we
just took power to a new rack and these three machines are installed in
there, so it's not impossible there's a problem with the feed to that rack.
We did have some machines on the A and B PDUs in this rack, but we may also
try powering one from an adjacent rack.
Many thanks for your helpful suggestions, and we are working through this
with the equipment supplier (who is extraordinarily helpful and
knowledgeable and as keen to get to the bottom of this as we are). I'll
certainly let you know when we have a resolution.
Regards,
Brian.
* Re: Storage server, hung tasks and tracebacks
2012-05-04 16:32 ` Brian Candler
2012-05-04 16:50 ` Stefan Ring
@ 2012-05-07 1:53 ` Dave Chinner
[not found] ` <4FA4C321.2070105@hardwarefreak.com>
2 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2012-05-07 1:53 UTC (permalink / raw)
To: Brian Candler; +Cc: Stan Hoeppner, xfs
On Fri, May 04, 2012 at 05:32:37PM +0100, Brian Candler wrote:
> On Thu, May 03, 2012 at 05:19:41PM -0500, Stan Hoeppner wrote:
> > Glad to hear you've got one running somewhat stable. Could be a driver
> > problem, but it's pretty rare for a SCSI driver to hard lock a box isn't
> > it?
No. The hardware does something bad to the PCI bus, or DMAs
something over kernel memory, or won't de-assert an interrupt line,
or .... and the system will hard hang. Hell, if it just stops and
you run out of memory because IO is needed to clean and free memory,
then the system can hang there as well....
> > Keep us posted.
>
> Last night I fired up two more instances of bonnie++ on that box, so there
> were four at once. Going back to the box now, I find that they have all
> hung :-(
>
> They are stuck at:
>
> Delete files in random order...
> Stat files in random order...
> Stat files in random order...
> Stat files in sequential order...
>
> respectively.
>
> iostat 5 shows no activity. There are 9 hung processes:
>
> $ uptime
> 17:23:35 up 1 day, 20:39, 1 user, load average: 9.04, 9.08, 8.91
> $ ps auxwww | grep " D" | grep -v grep
> root 35 1.5 0.0 0 0 ? D May02 42:10 [kswapd0]
> root 1179 0.0 0.0 0 0 ? D May02 1:50 [xfsaild/md126]
> root 3127 0.0 0.0 25096 312 ? D 16:55 0:00 /usr/lib/postfix/master
> tomi 29138 1.1 0.0 378860 3708 pts/1 D+ 12:43 3:06 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> tomi 29390 1.0 0.0 378860 3560 pts/3 D+ 12:52 2:53 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> tomi 30356 1.1 0.0 378860 3512 pts/2 D+ 13:32 2:36 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
> root 31075 0.0 0.0 0 0 ? D 14:00 0:04 [kworker/0:0]
> tomi 31796 0.6 0.0 378860 3864 pts/4 D+ 14:30 1:05 bonnie++ -d /disk/scratch/testb -s 16384k -n 98:800k:500k:1000
> root 31922 0.0 0.0 0 0 ? D 14:35 0:00 [kworker/1:0]
>
> dmesg shows hung tasks and backtraces, starting with:
>
> [150927.599920] INFO: task kswapd0:35 blocked for more than 120 seconds.
> [150927.600263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [150927.600698] kswapd0 D ffffffff81806240 0 35 2 0x00000000
> [150927.600704] ffff880212389330 0000000000000046 ffff880212389320 ffffffff81082df5
> [150927.600710] ffff880212389fd8 ffff880212389fd8 ffff880212389fd8 0000000000013780
> [150927.600715] ffff8802121816f0 ffff88020e538000 ffff880212389320 ffff88020e538000
> [150927.600719] Call Trace:
> [150927.600728] [<ffffffff81082df5>] ? __queue_work+0xe5/0x320
> [150927.600733] [<ffffffff8165a55f>] schedule+0x3f/0x60
> [150927.600739] [<ffffffff814e82c6>] md_flush_request+0x86/0x140
> [150927.600745] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
> [150927.600756] [<ffffffffa0010419>] raid0_make_request+0x119/0x1c0 [raid0]
That's most likely a hardware or driver problem - the IO request
queue is full which means that IO completions are not occurring or
being delayed excessively. The problem is below the level of the
filesystem....
> I am completely at a loss with all this... I've never seen a Unix/Linux
> system behave so unreliably.
If you are buying bottom of the barrel hardware, then you get the
reliability that you pay for. Spend a few more dollars and buy
something that is properly engineered - you've wasted more money
trying to diagnose this problem than you would have saved by buying
cheap hardware....
> One of the company's directors has reminded me
> that we have a Windows storage server with 48 disks which has been running
> without incident for the last 3 or 4 years, and I don't have a good answer
> for that :-(
If you buy bottom of the barrel hardware for Windows servers, then
you'll get similar results, only they'll be much harder to diagnose.
Software can't fix busted hardware...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Storage server, hung tasks and tracebacks
[not found] ` <4FA4C321.2070105@hardwarefreak.com>
2012-05-06 8:47 ` Brian Candler
@ 2012-05-15 14:02 ` Brian Candler
2012-05-20 16:35 ` Brian Candler
2012-05-20 23:59 ` Dave Chinner
1 sibling, 2 replies; 14+ messages in thread
From: Brian Candler @ 2012-05-15 14:02 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: xfs
Update:
After a week away, I am continuing to try to narrow down the problem of this
system hanging on I/O.
I can fairly reliably repeat the problem on a system with 24 disks, and I've
embarked on trying some different configs to see what's the simplest way I
can make this die.
During this, I found something of interest: I happened to leave an 'iostat
5' process running, and that hung too. i.e. ps showed it in 'D+' state, and
it was unkillable.
root 34 0.6 0.0 0 0 ? D 11:29 1:18 [kswapd0]
root 1258 0.0 0.0 15976 532 ? Ds 11:29 0:00 /usr/sbin/irqbalance
root 1421 0.0 0.0 0 0 ? D 12:49 0:01 [xfsaild/md127]
snmp 1430 0.0 0.0 48608 3440 ? D 11:29 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx 1614 1.1 0.0 378860 3812 pts/1 D+ 12:50 1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1669 1.2 0.0 378860 3816 pts/2 D+ 12:50 1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1727 0.5 0.0 383424 692 pts/3 Dl+ 12:51 0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1782 1.2 0.0 378860 3824 pts/4 D+ 12:51 1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1954 0.0 0.0 5912 544 pts/0 D+ 12:58 0:00 iostat 5
root 2642 0.2 0.0 0 0 ? D 13:25 0:09 [kworker/0:1]
root 3233 0.0 0.0 5044 168 ? Ds 13:50 0:00 /usr/sbin/sshd -D -R
xxxx 4648 0.0 0.0 8104 936 pts/6 S+ 14:41 0:00 grep --color=auto D
root 29491 0.0 0.0 0 0 ? D 12:45 0:00 [kworker/1:2]
I wonder if iostat actually communicates with the device driver at all? If
not, then presumably it's looking at some kernel data structure. Maybe
there is a lock being kept open on that by someone/something.
At the same time, I notice that 'cat /proc/diskstats' still works, and
starting a new 'iostat 5' process works too.
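For what it's worth, sysstat's iostat derives its numbers from /proc/diskstats, so a crude stand-in can sample that file directly, which is useful if iostat itself wedges. A sketch (the 5-second interval mirrors `iostat 5`; the helper name is mine):

```shell
# Per-device I/Os completed over a 5 second window, straight from
# /proc/diskstats (field 3 = device name, 4 = reads completed,
# 8 = writes completed, per Documentation/iostats.txt).
sample_ios() { awk '{print $3, $4 + $8}' /proc/diskstats; }
sample_ios > /tmp/ds.1
sleep 5
sample_ios > /tmp/ds.2
awk 'NR == FNR { a[$1] = $2; next }
     ($1 in a) && $2 > a[$1] { print $1, $2 - a[$1], "ios" }' /tmp/ds.1 /tmp/ds.2
rm -f /tmp/ds.1 /tmp/ds.2
```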
After issuing halt -p I get this:
root 34 0.6 0.0 0 0 ? D 11:29 1:18 [kswapd0]
root 1258 0.0 0.0 15976 532 ? Ds 11:29 0:00 /usr/sbin/irqbalance
root 1421 0.0 0.0 0 0 ? D 12:49 0:01 [xfsaild/md127]
snmp 1430 0.0 0.0 48608 3440 ? D 11:29 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
xxxx 1614 1.0 0.0 378860 3812 pts/1 D+ 12:50 1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1669 1.1 0.0 378860 3816 pts/2 D+ 12:50 1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1727 0.5 0.0 383424 692 pts/3 Dl+ 12:51 0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1782 1.1 0.0 378860 3824 pts/4 D+ 12:51 1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
xxxx 1954 0.0 0.0 5912 544 pts/0 D+ 12:58 0:00 iostat 5
root 2642 0.1 0.0 0 0 ? D 13:25 0:09 [kworker/0:1]
root 3233 0.0 0.0 5044 168 ? Ds 13:50 0:00 /usr/sbin/sshd -D -R
root 4753 0.0 0.0 15056 928 ? D 14:42 0:00 umount /run/rpc_pipefs
root 4828 0.0 0.0 4296 348 ? D 14:42 0:00 sync
root 4834 0.0 0.0 8100 624 pts/6 R+ 14:50 0:00 grep --color=auto D
root 29491 0.0 0.0 0 0 ? D 12:45 0:00 [kworker/1:2]
I see even umount'ing rpc_pipefs is hanging. So this suggests there's some
sort of global lock involved.
Anyway, I just wonder if this jogs a memory in anyone, as to why iostat
would hang in an unkillable way.
Regards,
Brian.
* Re: Storage server, hung tasks and tracebacks
2012-05-15 14:02 ` Brian Candler
@ 2012-05-20 16:35 ` Brian Candler
2012-05-22 13:14 ` Brian Candler
2012-05-20 23:59 ` Dave Chinner
1 sibling, 1 reply; 14+ messages in thread
From: Brian Candler @ 2012-05-20 16:35 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: xfs
Another update:
I have been trying some various combinations to see under what circumstances
I can make things lock up.
The main discovery: using ext4 instead of xfs, I cannot get the server to
lock up - after 36 hours of continuous testing anyway. With xfs and
everything else identical, it typically locks up within 10 minutes.
This is not to say that xfs is at fault. It may be that xfs generates a
higher peak load of I/O ops or something, and that tickles the problem. In
any case I see a mixture of unkillable processes: not only bonnie++ and
xfsaild but I have also seen kswapd, kworker, irqbalance, even postfix
processes (which should not even be touching the 24-disk array; there is a
separate system disk directly connected to the motherboard's own SATA
controller)
The test is running four concurrent bonnie++ sessions in separate screen
sessions.
Some of the tests performed:
- 24 SATA disks, LSI HBAs, md RAID0, XFS: rapid lockup
- 24 SATA disks, LSI HBAs, md RAID0, ext4: no lockup seen so far
- 2 SATA disks, LSI HBAs, md RAID0, XFS: no lockup
- 1 system SATA disk, motherboard SATA, no RAID, ext4: no lockup
I did also write a ruby script to do lots of concurrent dd reads (at random
offsets) directly from the array. I wasn't able to replicate the problem
with that.
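The original load generator was a ruby script, so the following is only a reconstruction of the idea, not the actual test: concurrent dd reads at random 1MiB-aligned offsets (function name and defaults are mine):

```shell
# random_reads <file-or-device> [workers] [reads-per-worker]
random_reads() {
    target=$1; workers=${2:-8}; nreads=${3:-50}
    bs=$((1024 * 1024))                       # 1 MiB per read
    size=$(stat -c %s -- "$target")
    # stat reports 0 bytes for block devices; fall back to blockdev
    [ "$size" -gt 0 ] || size=$(( $(blockdev --getsz "$target") * 512 ))
    blocks=$(( size / bs ))
    [ "$blocks" -gt 0 ] || { echo "target smaller than 1MiB" >&2; return 1; }
    for w in $(seq "$workers"); do
        (
            for r in $(seq "$nreads"); do
                # pick a random 1MiB-aligned offset within the target
                skip=$(( $(od -An -N4 -tu4 /dev/urandom | tr -d ' ') % blocks ))
                dd if="$target" of=/dev/null bs="$bs" skip="$skip" count=1 2>/dev/null
            done
        ) &
    done
    wait
}
# e.g.: random_reads /dev/md127 24 1000
```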
This is with Seagate 7200rpm drives, and the total I/O bandwidth I can see
is quite a lot (see iostat below). I can also replicate the problem in a
similar system with Hitachi "coolspin" (5940rpm?) drives, but it seems to
take somewhat longer, maybe an hour or two, so perhaps the peak I/O ops
rate has something to do with it?
(These systems do have only 8GB RAM, so I also wondered if it was something
to do with deadlocking when allocating buffer space if not enough was
available)
Regards,
Brian.
avg-cpu: %user %nice %system %iowait %steal %idle
1.55 0.00 80.67 12.63 0.00 5.15
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.40 0.00 2.40 0 12
sdf 187.80 17817.60 109416.80 89088 547084
sde 182.60 17817.60 112051.20 89088 560256
sdd 183.00 17817.60 108800.00 89088 544000
sdc 167.00 17612.80 105840.80 88064 529204
sdg 162.80 17612.80 107735.20 88064 538676
sdh 180.00 18022.40 112230.40 90112 561152
sdp 168.20 17408.00 107929.60 87040 539648
sdj 179.60 17614.40 111346.40 88072 556732
sdq 174.20 17408.00 108544.00 87040 542720
sdk 201.60 17612.80 111206.40 88064 556032
sdb 189.20 17819.20 108800.00 89096 544000
sdl 195.60 17542.40 110387.20 87712 551936
sdo 196.00 17408.00 111206.40 87040 556032
sdm 200.00 17408.00 110796.80 87040 553984
sdn 189.00 17408.00 108544.00 87040 542720
sdi 168.60 18022.40 112025.60 90112 560128
sdr 192.60 17819.20 111858.40 89096 559292
sdu 193.80 17612.80 108953.60 88064 544768
sdv 202.60 17612.80 108851.20 88064 544256
sdw 178.20 17612.80 108953.60 88064 544768
sdy 191.60 17612.80 110796.80 88064 553984
sdx 196.00 17612.80 111616.00 88064 558080
sds 182.80 17612.80 109158.40 88064 545792
sdt 191.60 17203.20 111219.20 86016 556096
md127 7569.80 415064.00 2620999.20 2075320 13104996
* Re: Storage server, hung tasks and tracebacks
2012-05-15 14:02 ` Brian Candler
2012-05-20 16:35 ` Brian Candler
@ 2012-05-20 23:59 ` Dave Chinner
2012-05-21 9:58 ` Brian Candler
1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-05-20 23:59 UTC (permalink / raw)
To: Brian Candler; +Cc: Stan Hoeppner, xfs
On Tue, May 15, 2012 at 03:02:37PM +0100, Brian Candler wrote:
> Update:
>
> After a week away, I am continuing to try to narrow down the problem of this
> system hanging on I/O.
>
> I can fairly reliably repeat the problem on a system with 24 disks, and I've
> embarked on trying some different configs to see what's the simplest way I
> can make this die.
>
> During this, I found something of interest: I happened to leave an 'iostat
> 5' process running, and that hung too. i.e. ps showed it in 'D+' state, and
> it was unkillable.
>
> root 34 0.6 0.0 0 0 ? D 11:29 1:18 [kswapd0]
> root 1258 0.0 0.0 15976 532 ? Ds 11:29 0:00 /usr/sbin/irqbalance
> root 1421 0.0 0.0 0 0 ? D 12:49 0:01 [xfsaild/md127]
> snmp 1430 0.0 0.0 48608 3440 ? D 11:29 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
> xxxx 1614 1.1 0.0 378860 3812 pts/1 D+ 12:50 1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx 1669 1.2 0.0 378860 3816 pts/2 D+ 12:50 1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx 1727 0.5 0.0 383424 692 pts/3 Dl+ 12:51 0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx 1782 1.2 0.0 378860 3824 pts/4 D+ 12:51 1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx 1954 0.0 0.0 5912 544 pts/0 D+ 12:58 0:00 iostat 5
> root 2642 0.2 0.0 0 0 ? D 13:25 0:09 [kworker/0:1]
> root 3233 0.0 0.0 5044 168 ? Ds 13:50 0:00 /usr/sbin/sshd -D -R
> xxxx 4648 0.0 0.0 8104 936 pts/6 S+ 14:41 0:00 grep --color=auto D
> root 29491 0.0 0.0 0 0 ? D 12:45 0:00 [kworker/1:2]
>
> I wonder if iostat actually communicates with the device driver at all? If
> not, then presumably it's looking at some kernel data structure. Maybe
> there is a lock being kept open on that by someone/something.
>
> At the same time, I notice that 'cat /proc/diskstats' still works, and
> starting a new 'iostat 5' process works too.
>
> After issuing halt -p I get this:
>
> root 34 0.6 0.0 0 0 ? D 11:29 1:18 [kswapd0]
> root 1258 0.0 0.0 15976 532 ? Ds 11:29 0:00 /usr/sbin/irqbalance
> root 1421 0.0 0.0 0 0 ? D 12:49 0:01 [xfsaild/md127]
> snmp 1430 0.0 0.0 48608 3440 ? D 11:29 0:00 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
> xxxx 1614 1.0 0.0 378860 3812 pts/1 D+ 12:50 1:15 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx 1669 1.1 0.0 378860 3816 pts/2 D+ 12:50 1:21 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx 1727 0.5 0.0 383424 692 pts/3 Dl+ 12:51 0:37 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx 1782 1.1 0.0 378860 3824 pts/4 D+ 12:51 1:20 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
> xxxx 1954 0.0 0.0 5912 544 pts/0 D+ 12:58 0:00 iostat 5
> root 2642 0.1 0.0 0 0 ? D 13:25 0:09 [kworker/0:1]
> root 3233 0.0 0.0 5044 168 ? Ds 13:50 0:00 /usr/sbin/sshd -D -R
> root 4753 0.0 0.0 15056 928 ? D 14:42 0:00 umount /run/rpc_pipefs
> root 4828 0.0 0.0 4296 348 ? D 14:42 0:00 sync
> root 4834 0.0 0.0 8100 624 pts/6 R+ 14:50 0:00 grep --color=auto D
> root 29491 0.0 0.0 0 0 ? D 12:45 0:00 [kworker/1:2]
>
> I see even umount'ing rpc_pipefs is hanging. So this suggests there's some
> sort of global lock involved.
You need to provide the output of sysrq-W at this point ('echo w >
/proc/sysrq-trigger') so we can see where these are hung. the entire
dmesg would also be useful....
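Spelled out, the capture sequence is (guarded, since it must run as root):

```shell
# Capture blocked-task stacks for the list (must run as root):
if [ "$(id -u)" -eq 0 ] && [ -w /proc/sysrq-trigger ]; then
    echo w > /proc/sysrq-trigger                       # dump stacks of D-state tasks
    dmesg > dmesg-after-hang.txt 2>/dev/null || true   # traces land in the ring buffer
else
    echo "re-run as root to trigger sysrq-w" >&2
fi
```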
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Storage server, hung tasks and tracebacks
2012-05-20 23:59 ` Dave Chinner
@ 2012-05-21 9:58 ` Brian Candler
2012-09-09 9:47 ` Brian Candler
0 siblings, 1 reply; 14+ messages in thread
From: Brian Candler @ 2012-05-21 9:58 UTC (permalink / raw)
To: Dave Chinner; +Cc: Stan Hoeppner, xfs
[-- Attachment #1: Type: text/plain, Size: 6222 bytes --]
On Mon, May 21, 2012 at 09:59:03AM +1000, Dave Chinner wrote:
> You need to provide the output of sysrq-W at this point ('echo w >
> /proc/sysrq-trigger') so we can see where these are hung. the entire
> dmesg would also be useful....
Thank you for this advice Dave.
Attached is the full dmesg output after another hang. The sysrq output is
near the end, at timestamp 250695.
For this test, I built a fresh XFS filesystem (you can see this at timestamp
246909) - I forgot to mount with "inode64" this time, but it doesn't seem to
have made a difference. I also did "swapoff" before starting the test, to
ensure that swapping to sda was not part of the problem.
A quick status summary from the hung system:
root@storage3:~# free
total used free shared buffers cached
Mem: 8156224 8052824 103400 0 4808 7399112
-/+ buffers/cache: 648904 7507320
Swap: 0 0 0
root@storage3:~# uptime
10:46:47 up 2 days, 21:43, 1 user, load average: 10.00, 9.81, 8.16
root@storage3:~# ps auxwww | grep -v ' S'
root 34 2.9 0.0 0 0 ? D May18 122:14 [kswapd0]
root 1387 0.0 0.0 15976 504 ? Ds May18 0:18 /usr/sbin/irqbalance
root 5242 0.0 0.0 0 0 ? D 09:39 0:02 [xfsaild/md127]
tomi 6249 4.2 0.0 378860 3844 pts/1 D+ 09:40 2:48 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 6251 4.1 0.0 378860 3836 pts/2 D+ 09:40 2:44 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 6253 4.1 0.0 378860 3848 pts/3 D+ 09:40 2:46 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi 6255 4.0 0.0 378860 3840 pts/4 D+ 09:40 2:40 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
root 7795 0.1 0.0 0 0 ? D 10:27 0:02 [kworker/0:3]
root 8517 0.0 0.0 16876 1272 pts/0 R+ 10:46 0:00 ps auxwww
root 24420 0.0 0.0 0 0 ? D 00:50 0:00 [kworker/3:0]
root@storage3:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
[raid10]
md127 : active raid0 sds[17] sdx[22] sdj[8] sdt[18] sdk[9] sdc[1] sdb[0]
sdh[6] sdu[19] sdi[7] sdn[12] sdo[13] sdv[20] sdm[11] sdq[15] sdp[14]
sdl[10] sdw[21] sdg[5] sdr[16] sde[3] sdy[23] sdd[2] sdf[4]
70326362112 blocks super 1.2 1024k chunks
unused devices: <none>
root@storage3:~# mount
/dev/sda1 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
/dev/md127 on /disk/scratch type xfs (rw)
root@storage3:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 967415188 16754388 902241696 2% /
udev 4069104 4 4069100 1% /dev
tmpfs 1631248 380 1630868 1% /run
none 5120 0 5120 0% /run/lock
none 4078112 0 4078112 0% /run/shm
/dev/md127 70324275200 258902416 70065372784 1% /disk/scratch
root@storage3:~#
(Aside: when the test started the load average was just above 4, for the
four bonnie++ processes)
"iostat 5" shows zero activity to the MD RAID.
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 0.05 0.00 99.95
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.20 1.60 0.00 8 0
sdf 0.00 0.00 0.00 0 0
sde 0.00 0.00 0.00 0 0
sdd 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdg 0.00 0.00 0.00 0 0
sdh 0.00 0.00 0.00 0 0
sdp 0.00 0.00 0.00 0 0
sdj 0.00 0.00 0.00 0 0
sdq 0.00 0.00 0.00 0 0
sdk 0.00 0.00 0.00 0 0
sdb 0.00 0.00 0.00 0 0
sdl 0.00 0.00 0.00 0 0
sdo 0.00 0.00 0.00 0 0
sdm 0.00 0.00 0.00 0 0
sdn 0.00 0.00 0.00 0 0
sdi 0.00 0.00 0.00 0 0
sdr 0.00 0.00 0.00 0 0
sdu 0.00 0.00 0.00 0 0
sdv 0.00 0.00 0.00 0 0
sdw 0.00 0.00 0.00 0 0
sdy 0.00 0.00 0.00 0 0
sdx 0.00 0.00 0.00 0 0
sds 0.00 0.00 0.00 0 0
sdt 0.00 0.00 0.00 0 0
md127 0.00 0.00 0.00 0 0
Anything you can determine from this info much appreciated!
Regards,
Brian.
[-- Attachment #2: storage3-dmesg.txt.gz --]
[-- Type: application/x-gunzip, Size: 24023 bytes --]
* Re: Storage server, hung tasks and tracebacks
2012-05-20 16:35 ` Brian Candler
@ 2012-05-22 13:14 ` Brian Candler
0 siblings, 0 replies; 14+ messages in thread
From: Brian Candler @ 2012-05-22 13:14 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: xfs
On Sun, May 20, 2012 at 05:35:06PM +0100, Brian Candler wrote:
> The main discovery: using ext4 instead of xfs, I cannot get the server to
> lock up - after 36 hours of continuous testing anyway. With xfs and
> everything else identical, it typically locks up within 10 minutes.
I re-ran the test using btrfs (not that I would use it in production, but
just as another comparison). No lockup after over 24 hours either.
So for now I am converting these boxes to ext4. I should soon have another
test box with 24 disks that I can reformat as XFS, if something changes and
you'd like me to retest it.
I'd still like to use XFS at some point in the future because this is
running gluster on top, and the gluster people recommend XFS and do most of
their testing with it.
Regards,
Brian.
* Re: Storage server, hung tasks and tracebacks
2012-05-21 9:58 ` Brian Candler
@ 2012-09-09 9:47 ` Brian Candler
0 siblings, 0 replies; 14+ messages in thread
From: Brian Candler @ 2012-09-09 9:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: Stan Hoeppner, xfs
[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]
On Mon, May 21, 2012 at 10:58:30AM +0100, Brian Candler wrote:
> On Mon, May 21, 2012 at 09:59:03AM +1000, Dave Chinner wrote:
> > You need to provide the output of sysrq-W at this point ('echo w >
> > /proc/sysrq-trigger') so we can see where these are hung. the entire
> > dmesg would also be useful....
>
> Thank you for this advice Dave.
>
> Attached is the full dmesg output after another hang. The sysrq output is
> near the end, at timestamp 250695.
It has recently been pointed out to me that the original attachment was
incomplete and didn't include the sysrq output. Attached is the dmesg file
*with* the sysrq data at the given timestamp.
Unfortunately, I have been trying to reproduce this problem on an identical
box and over 5 days the problem didn't reoccur. (Aside: that was until one
of the RAID0 drives failed, which is turning out to be a common occurrence
with Seagates; but that's not the failure I was seeing before, which I used
to be able to replicate in less than an hour. In this case XFS shut down
gracefully in the face of I/O errors)
This isn't one of the three actual boxes which had locked up before, but
it's another box with identical chassis and motherboard bought from same
supplier. This however has an up-to-date Ubuntu 12.04 kernel on it, so it's
possible there has been some driver fix since my original tests.
I will continue to investigate.
Regards,
Brian.
[-- Attachment #2: storage3-sysreq.txt.gz --]
[-- Type: application/x-gunzip, Size: 28530 bytes --]
Thread overview: 14+ messages
2012-05-02 18:44 Storage server, hung tasks and tracebacks Brian Candler
2012-05-03 12:50 ` Stan Hoeppner
2012-05-03 20:41 ` Brian Candler
2012-05-03 22:19 ` Stan Hoeppner
2012-05-04 16:32 ` Brian Candler
2012-05-04 16:50 ` Stefan Ring
2012-05-07 1:53 ` Dave Chinner
[not found] ` <4FA4C321.2070105@hardwarefreak.com>
2012-05-06 8:47 ` Brian Candler
2012-05-15 14:02 ` Brian Candler
2012-05-20 16:35 ` Brian Candler
2012-05-22 13:14 ` Brian Candler
2012-05-20 23:59 ` Dave Chinner
2012-05-21 9:58 ` Brian Candler
2012-09-09 9:47 ` Brian Candler