* Hung in D state during fclose
@ 2013-02-11 23:36 Cheung, Norman
0 siblings, 0 replies; 14+ messages in thread
From: Cheung, Norman @ 2013-02-11 23:36 UTC (permalink / raw)
To: xfs
Every 3-4 days, my application hangs in D state at file close. Shortly after that, flush (for a different partition) gets locked in D state as well.
My application runs continuously: 5 threads write data at a rate of 400K decotr/sec to 5 different XFS partitions, each of which is a 2-disk RAID 0. In addition, I have other threads consuming 100% CPU at all times, and most of these threads are tied to their own CPUs.
The 5 data-writing threads are also pinned to specific CPUs (one CPU per thread), with priority set high (-2). The data-writing pattern is: each disk-writing thread writes a 1.5 GB file, then pauses for about 3 minutes, so we have 5 files of 1.5 GB each after one processing cycle. We keep 5 sets and delete the older ones.
After about 300-800 cycles, one or two of these disk-writing threads go into D state, and within a second the flush thread for another partition shows up in D state. Then, after 15 minutes of no activity, the parent task lowers the priority of all threads (back to normal, 20) and aborts them. In all cases, lowering the priority gets the threads out of D state. I have also tried running the disk-writing threads at normal priority (20); it hangs the same way.
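For what it's worth, the cycle can be approximated with a scaled-down shell sketch; all sizes, counts and paths below are placeholders (the real application is in-process C code writing 1.5 GB per file at nice -2, sleeping ~3 minutes between cycles):

```shell
# Scaled-down sketch of the write pattern: N writer loops, each creating a
# file per cycle and pruning older files. DEST stands in for the 5 XFS
# partitions; sizes/counts are tiny placeholders, not the real workload.
DEST="${DEST:-$(mktemp -d)}"
WRITERS=2        # real workload: 5 writers, one per partition
SIZE_MB=1        # real workload: ~1536 MB per file
CYCLES=2         # real workload: hangs after 300-800 cycles

writer() {
    i=0
    while [ "$i" -lt "$CYCLES" ]; do
        # write one file for this cycle (dd stands in for fwrite/fclose)
        dd if=/dev/zero of="$DEST/w$1.cycle$i" bs=1M count="$SIZE_MB" 2>/dev/null
        # keep only the 2 most recent files per writer (real code keeps 5 sets)
        ls -t "$DEST"/w"$1".cycle* | tail -n +3 | xargs rm -f
        i=$((i + 1))         # the real loop also sleeps ~3 minutes here
    done
}

for w in $(seq 1 "$WRITERS"); do writer "$w" & done
wait
ls "$DEST"
```

Scaling SIZE_MB and CYCLES back up, and pointing DEST at the XFS mount points, approximates one writer per partition.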
Thanks in advance,
Norman
Below is the sysrq-w output for the 2 offending threads.
1. The disk-writing thread hung in fclose
Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
Call Trace:
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs]
[<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
[<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff813c74ec>] ? down_write+0x1c/0x1d
[<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs]
[<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs]
[<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
[<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0
[<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
[<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0
[<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs]
[<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
[<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
[<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
[<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0
[<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
[<ffffffff8111630a>] ? vfs_write+0xaa/0x190
[<ffffffff81116657>] ? sys_write+0x47/0x90
[<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
2. The flush thread for another partition
flush-8:48 D 0000000000000000 0 4217 2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390
[<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
[<ffffffff81065a06>] ? queue_work_on+0x16/0x20
[<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
[<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
[<ffffffff813c7559>] ? __down+0x6c/0x99
[<ffffffff81070377>] ? down+0x37/0x40
[<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
[<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
[<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
[<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
[<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
[<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
[<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
[<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
[<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
[<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
[<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
[<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
[<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
[<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
[<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
[<ffffffff810c97ba>] ? __writepage+0xa/0x30
[<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
[<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
[<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
[<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
[<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
[<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
[<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
[<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
[<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
[<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
[<ffffffff8106b06e>] ? kthread+0x7e/0x90
[<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
[<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
[<ffffffff813cf540>] ? gs_change+0x13/0x13
kernel version: 3.0.13-0.27
xfsprogs version: 3.1.6 (from xfs_repair -V)
number of CPUs: 32
xfs_info /fastraid2
meta-data=/dev/sdd1 isize=256 agcount=4, agsize=17822144 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=71288576, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=34808, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
contents of /proc/meminfo
cat /proc/meminfo
MemTotal: 262429860 kB
MemFree: 146893200 kB
Buffers: 38600 kB
Cached: 336388 kB
SwapCached: 0 kB
Active: 4601364 kB
Inactive: 277852 kB
Active(anon): 4504236 kB
Inactive(anon): 9524 kB
Active(file): 97128 kB
Inactive(file): 268328 kB
Unevictable: 107520004 kB
Mlocked: 212048 kB
SwapTotal: 1051644 kB
SwapFree: 1051644 kB
Dirty: 140 kB
Writeback: 0 kB
AnonPages: 112024272 kB
Mapped: 2031420 kB
Shmem: 9528 kB
Slab: 123444 kB
SReclaimable: 24088 kB
SUnreclaim: 99356 kB
KernelStack: 3728 kB
PageTables: 229284 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 260857204 kB
Committed_AS: 112752960 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 712612 kB
VmallocChunk: 34224802044 kB
HardwareCorrupted: 0 kB
AnonHugePages: 111861760 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 133788 kB
DirectMap2M: 7178240 kB
DirectMap1G: 261095424 kB
contents of /proc/mounts
cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,relatime,nr_inodes=0,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sde3 / reiserfs rw,relatime,acl,user_xattr 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sde1 /boot reiserfs rw,relatime,acl,user_xattr 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.34.3.247/IMCRootFolder /home/klac/mount_path_Tool cifs rw,relatime,sec=ntlm,unc=\\10.34.3.247\IMCRootFolder,username=KTMfg,uid=1000,forceuid,gid=100,forcegid,addr=10.34.3.247,file_mode=0755,dir_mode=0755,nounix,rsize=16384,wsize=131007,actimeo=1 0 0
/dev/sda1 /edge xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdb1 /fastraid0 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdc1 /fastraid1 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /fastraid2 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdf1 /fastraid3 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdg1 /fastraid4 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdh1 /fastraid5 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
contents of /proc/partitions
cat /proc/partitions
major minor #blocks name
8 16 285155328 sdb
8 17 285154304 sdb1
8 32 285155328 sdc
8 33 285154304 sdc1
8 48 285155328 sdd
8 49 285154304 sdd1
8 80 285671424 sdf
8 81 285659136 sdf1
8 64 124485632 sde
8 65 1051648 sde1
8 66 1051648 sde2
8 67 122381312 sde3
8 96 285671424 sdg
8 97 285659136 sdg1
8 112 285671424 sdh
8 113 285659136 sdh1
8 0 142577664 sda
8 1 142576640 sda1
RAID layout (hardware and/or software)
Hardware RAID 0, 2 disks per array.
LVM configuration
fdisk -l
Disk /dev/sdb: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00073b0b
Device Boot Start End Blocks Id System
/dev/sdb1 2048 570310655 285154304 83 Linux
Disk /dev/sdc: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d3eba
Device Boot Start End Blocks Id System
/dev/sdc1 2048 570310655 285154304 83 Linux
Disk /dev/sdd: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003e548
Device Boot Start End Blocks Id System
/dev/sdd1 2048 570310655 285154304 83 Linux
Disk /dev/sdf: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009caf0
Device Boot Start End Blocks Id System
/dev/sdf1 2048 571320319 285659136 83 Linux
Disk /dev/sde: 127.5 GB, 127473287168 bytes
255 heads, 63 sectors/track, 15497 cylinders, total 248971264 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x674cfaee
Device Boot Start End Blocks Id System
/dev/sde1 * 2048 2105343 1051648 83 Linux
/dev/sde2 2105344 4208639 1051648 82 Linux swap / Solaris
/dev/sde3 4208640 248971263 122381312 83 Linux
Disk /dev/sdg: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000a1d4
Device Boot Start End Blocks Id System
/dev/sdg1 2048 571320319 285659136 83 Linux
Disk /dev/sdh: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006b7ff
Device Boot Start End Blocks Id System
/dev/sdh1 2048 571320319 285659136 83 Linux
Disk /dev/sda: 146.0 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders, total 285155328 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f1568
Device Boot Start End Blocks Id System
/dev/sda1 2048 285155327 142576640 83 Linux
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Hung in D state during fclose
2013-02-26 19:41 ` Cheung, Norman
@ 2013-02-26 20:31 ` Dave Chinner
0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2013-02-26 20:31 UTC (permalink / raw)
To: Cheung, Norman; +Cc: 'linux-xfs@oss.sgi.com'
On Tue, Feb 26, 2013 at 07:41:42PM +0000, Cheung, Norman wrote:
> I've been checking all the XFS patches to see if any might relate to my situation, and came across this:
> http://article.gmane.org/gmane.comp.file-systems.xfs.general/40349/
>
> From: Christoph Hellwig <hch <at> infradead.org>
> Subject: [PATCH, RFC] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY
> Newsgroups: gmane.comp.file-systems.xfs.general
> Date: 2011-08-27 06:14:09 GMT
> Right now ->write_inode has no way to safely return an EAGAIN without explicitly
> redirtying the inode, as we would lose the dirty state otherwise. Most
> filesystems get this wrong, but XFS makes heavy use of it to avoid blocking
> the flusher thread when ->write_inode hits contended inode locks. A
> contended ilock is something XFS can hit very easily when extending files, as
> the data I/O completion handler takes the lock to update the size, and the
> ->write_inode call can race with it fairly easily if writing enough data
> in one go so that the completion for the first write comes in just before
> we call ->write_inode.
>
> Change the handling of this case to use requeue_io for a quick retry instead
> of redirty_tail, which keeps moving out the dirtied_when data and thus keeps
> delaying the writeout more and more with every failed attempt to get the lock.
>
> I wonder if this could be what leaves my application waiting on xfs_ilock. I checked,
> and this patch is not in my kernel source (SUSE 11 SP2, Rel 3.0.13-0.27).
I have no idea whether it will help or not, because SuSE (like Red
Hat) ship a heavily patched kernel and so it's rather hard for
anyone here to triage and diagnose problems like this. Further, even
if we can find a potential cause, we still can't fix it for you.
Hence perhaps you are best advised to talk to your SuSE support
contact at this point?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* RE: Hung in D state during fclose
2013-02-14 4:53 ` Cheung, Norman
@ 2013-02-26 19:41 ` Cheung, Norman
2013-02-26 20:31 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-26 19:41 UTC (permalink / raw)
To: Dave Chinner; +Cc: 'linux-xfs@oss.sgi.com'
I've been checking all the XFS patches to see if any might relate to my situation, and came across this:
http://article.gmane.org/gmane.comp.file-systems.xfs.general/40349/
From: Christoph Hellwig <hch <at> infradead.org>
Subject: [PATCH, RFC] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY
Newsgroups: gmane.comp.file-systems.xfs.general
Date: 2011-08-27 06:14:09 GMT
Right now ->write_inode has no way to safely return an EAGAIN without explicitly
redirtying the inode, as we would lose the dirty state otherwise. Most
filesystems get this wrong, but XFS makes heavy use of it to avoid blocking
the flusher thread when ->write_inode hits contended inode locks. A
contended ilock is something XFS can hit very easily when extending files, as
the data I/O completion handler takes the lock to update the size, and the
->write_inode call can race with it fairly easily if writing enough data
in one go so that the completion for the first write comes in just before
we call ->write_inode.
Change the handling of this case to use requeue_io for a quick retry instead
of redirty_tail, which keeps moving out the dirtied_when data and thus keeps
delaying the writeout more and more with every failed attempt to get the lock.
I wonder if this could be what leaves my application waiting on xfs_ilock. I checked,
and this patch is not in my kernel source (SUSE 11 SP2, Rel 3.0.13-0.27).
I've attached the traces again below.
Many Thanks,
Norman
> > 1. the disk writing thread hung in fclose
> >
> >
> > Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
> > ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> > ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> > 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
> > Call Trace:
> > [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>]
> > ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ?
> > rwsem_down_failed_common+0xc5/0x150
> > [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> > [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ?
> > xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ?
> > __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ?
> > kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ?
> > alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ?
> > __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ?
> > xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ?
> > xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ?
> > block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ?
> > xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ?
> > generic_file_buffered_write+0xf8/0x250
> > [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
> > [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
> > [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ?
> > security_file_permission+0x24/0xc0
> > [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ?
> > sys_write+0x47/0x90 [<ffffffff813ce412>] ?
> > system_call_fastpath+0x16/0x1b
>
> So that is doing a write() from fclose, and it's waiting on the inode
> XFS_ILOCK_EXCL
>
> /me wishes that all distros compiled their kernels with frame pointers
> enabled so that analysing stack traces is better than "I'm guessing that the
> real stack trace is....
>
> [NLC] The last entry, zone_statistics, is called only with NUMA enabled; I
> wonder if I can work around this by turning off NUMA.
>
> > 2. flush from another partition
> >
> > flush-8:48 D 0000000000000000 0 4217 2 0x00000000
> > ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
> > ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
> > ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
> > Call Trace:
> > [<ffffffff810656f9>] ? __queue_work+0xc9/0x390 [<ffffffff811e3e3f>] ?
> > cfq_insert_request+0xaf/0x4f0 [<ffffffff81065a06>] ?
> > queue_work_on+0x16/0x20 [<ffffffff813c69cd>] ?
> > schedule_timeout+0x1dd/0x240 [<ffffffffa041a762>] ?
> > kmem_zone_zalloc+0x32/0x50 [xfs] [<ffffffff813c7559>] ?
> > __down+0x6c/0x99 [<ffffffff81070377>] ? down+0x37/0x40
> > [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
> > [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
> > [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
> > [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
> > [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
> > [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
> > [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
> > [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
> > [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
> > [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa03d23a4>]
> > ? xfs_alloc_vextent+0x184/0x4a0 [xfs] [<ffffffffa03dc348>] ?
> > xfs_bmap_btalloc+0x2d8/0x6d0 [xfs] [<ffffffffa03e0efd>] ?
> > xfs_bmapi+0x9bd/0x11a0 [xfs] [<ffffffffa03d9bbc>] ?
> > xfs_bmap_search_multi_extents+0xac/0x120 [xfs] [<ffffffffa040293c>] ?
> > xfs_iomap_write_allocate+0x17c/0x330 [xfs] [<ffffffffa041b20f>] ?
> > xfs_map_blocks+0x19f/0x1b0 [xfs] [<ffffffffa041c20e>] ?
> > xfs_vm_writepage+0x19e/0x470 [xfs] [<ffffffff810c97ba>] ?
> > __writepage+0xa/0x30 [<ffffffff810c9c4d>] ?
> > write_cache_pages+0x1cd/0x3d0 [<ffffffff810c97b0>] ?
> > bdi_set_max_ratio+0x90/0x90 [<ffffffff810c9e93>] ?
> > generic_writepages+0x43/0x70 [<ffffffff81139330>] ?
> > writeback_single_inode+0x160/0x300
> > [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
> > [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
> > [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 [<ffffffff813cedee>] ?
> > apic_timer_interrupt+0xe/0x20 [<ffffffff8113a10e>] ?
> > wb_check_old_data_flush+0x4e/0xa0 [<ffffffff8113a28b>] ?
> > wb_do_writeback+0x12b/0x160 [<ffffffff8113a332>] ?
> > bdi_writeback_thread+0x72/0x150 [<ffffffff8113a2c0>] ?
> > wb_do_writeback+0x160/0x160 [<ffffffff8106b06e>] ? kthread+0x7e/0x90
> > [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
> > [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
> > [<ffffffff813cf540>] ? gs_change+0x13/0x13
>
> That's the writeback thread waiting on an AGF buffer to be unlocked.
> IOWs, there are probably multiple concurrent allocations to the same AG, but
> it will be holding the XFS_ILOCK_EXCL lock that the other thread is waiting on.
>
> What thread is holding the AGF buffer is anyone's guess - it could be waiting
> on IO completion, which would indicate a problem in the storage layers
> below XFS. The output in dmesg from sysrq-w (echo w >
> /proc/sysrq-trigger) might help indicate other blocked threads that could be
> holding the AGF lock.
>
>
> On Wed, Feb 13, 2013 at 12:12:47AM +0000, Cheung, Norman wrote:
> > Dave,
> >
> > One other point I forgot to mention is that the parent thread
> > will wait for 5 minutes, then lower the thread priority (from -2
> > back to 20) and set a global variable to signal the threads to exit.
> > The blocked thread responded, exited D state, and fclose completed
> > with no error.
>
> So it's not hung - it's just very slow?
>
> [NLC] It will pause forever. I tried replacing the timeout with a sleep loop,
> and it still paused forever. That is how I got the stack trace.
>
> You have 256GB of memory. It's entirely possible that you've dirtied a large
> amount of memory and everything is simply stuck waiting for writeback to
> occur. Perhaps you should have a look at the utilisation of your disks when
> this stall occurs. 'iostat -x -d -m 5' will give you some insight into utilisation
> when a hang occurs...
>
> [NLC] I have set dirty_background_bytes to 40M and
> dirty_writeback_centisecs to 400, and watched /proc/meminfo. I don't see
> much dirty-memory accumulation -- 7 to 10 MB on average. The disk usage from
> the sar log was steady at 150 MB/sec for each disk, but a few seconds
> after the fclose hang all disk activity stopped. The CPU usage was quiet as well.
>
> This causes me to wonder whether some XFS threads and
> my application thread might be in a deadlock.
>
> Deadlocks are permanent, so what you are seeing is not a deadlock....
>
> [NLC] Could there be a priority inversion between my disk-writing
> threads and the XFS and flush threads? My application thread
> runs at -2 priority.
>
> Many thanks,
> Norman
>
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* RE: Hung in D state during fclose
2013-02-13 5:15 ` Dave Chinner
@ 2013-02-14 4:53 ` Cheung, Norman
2013-02-26 19:41 ` Cheung, Norman
0 siblings, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-14 4:53 UTC (permalink / raw)
To: Dave Chinner; +Cc: 'linux-xfs@oss.sgi.com'
-----Original Message-----
From: xfs-bounces@oss.sgi.com [mailto:xfs-bounces@oss.sgi.com] On Behalf Of Dave Chinner
Sent: Tuesday, February 12, 2013 9:16 PM
To: Cheung, Norman
Cc: 'linux-xfs@oss.sgi.com'
Subject: Re: Hung in D state during fclose
On Wed, Feb 13, 2013 at 12:12:47AM +0000, Cheung, Norman wrote:
> Dave,
>
> One other point I forgot to mention is that the parent thread
> will wait for 5 minutes, then lower the thread priority (from -2
> back to 20) and set a global variable to signal the threads to exit.
> The blocked thread responded, exited D state, and fclose completed
> with no error.
So it's not hung - it's just very slow?
[NLC] It will pause forever. I tried replacing the timeout with a sleep loop, and it still paused forever. That is how I got the stack trace.
You have 256GB of memory. It's entirely possible that you've dirtied a large amount of memory and everything is simply stuck waiting for writeback to occur. Perhaps you should have a look at the utilisation of your disks when this stall occurs. 'iostat -x -d -m 5' will give you some insight into utilisation when a hang occurs...
[NLC] I have set dirty_background_bytes to 40M and dirty_writeback_centisecs to 400, and watched /proc/meminfo. I don't see much dirty-memory accumulation -- 7 to 10 MB on average. The disk usage from the sar log was steady at 150 MB/sec for each disk, but a few seconds after the fclose hang all disk activity stopped. The CPU usage was quiet as well.
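(For reference, the counters being watched above can be sampled with a simple loop; the interval and sample count here are arbitrary choices for illustration:)

```shell
# Sample the dirty/writeback counters from /proc/meminfo a few times,
# the way the figures quoted above were gathered.
for i in 1 2 3; do
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sleep 1
done
```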
> This causes me to wonder whether some XFS threads and
> my application thread might be in a deadlock.
Deadlocks are permanent, so what you are seeing is not a deadlock....
[NLC] Could there be a priority inversion between my disk-writing threads and the XFS and flush threads? My application thread runs at -2 priority.
Many thanks,
Norman
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Hung in D state during fclose
2013-02-13 0:12 ` Cheung, Norman
@ 2013-02-13 5:15 ` Dave Chinner
2013-02-14 4:53 ` Cheung, Norman
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-02-13 5:15 UTC (permalink / raw)
To: Cheung, Norman; +Cc: 'linux-xfs@oss.sgi.com'
On Wed, Feb 13, 2013 at 12:12:47AM +0000, Cheung, Norman wrote:
> Dave,
>
> One other point I forgot to mention is that the parent
> thread will wait for 5 minutes and then lower the thread priority
> (from -2 back to 20) and set a global variable to signal the
> threads to exit. The blocked thread responded, exited
> D state, and fclose completed with no error.
So it's not hung - it's just very slow?
You have 256GB of memory. It's entirely possible that you've dirtied
a large amount of memory and everything is simply stuck waiting for
writeback to occur. Perhaps you should have a look at the
utilisation of your disks when this stall occurs. 'iostat -x -d -m
5' will give you some insight into utilisation when a hang occurs...
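(A sketch of capturing that iostat output to a log around a hang; the log path is a placeholder, and iostat comes from the sysstat package:)

```shell
# Log device-level utilisation samples so they can be inspected after a
# hang. The path and sample counts are placeholders for the sketch.
LOG="${LOG:-/tmp/iostat-hang.log}"
if command -v iostat >/dev/null 2>&1; then
    iostat -x -d -m 1 2 >"$LOG"      # 2 samples, 1 s apart, for the sketch
else
    echo "iostat not available: install sysstat" >"$LOG"
fi
```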
> This causes me to wonder whether some XFS threads
> and my application thread might be in a deadlock.
Deadlocks are permanent, so what you are seeing is not a deadlock....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* RE: Hung in D state during fclose
2013-02-12 21:06 ` Cheung, Norman
2013-02-12 22:21 ` Dave Chinner
@ 2013-02-13 0:12 ` Cheung, Norman
2013-02-13 5:15 ` Dave Chinner
1 sibling, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-13 0:12 UTC (permalink / raw)
To: 'Dave Chinner'; +Cc: 'linux-xfs@oss.sgi.com'
Dave,
One other point I forgot to mention is that the parent thread will wait for 5 minutes, then lower the thread priority (from -2 back to 20) and set a global variable to signal the threads to exit. The blocked thread responded, exited D state, and fclose completed with no error.
This causes me to wonder whether some XFS threads and my application thread might be in a deadlock.
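(The watchdog logic described above can be sketched in shell; a flag file stands in for the global variable, renice for the priority change, and the timeouts are shortened. The real code does this in-process in C:)

```shell
# Sketch of the parent watchdog: a stand-in "writer" polls a flag file;
# after a (shortened) timeout the watchdog renices it back toward normal
# priority and raises the flag so the writer exits cleanly.
FLAG="$(mktemp -u)"                  # path only; created when we signal

( # stand-in writer thread: loop until the exit flag appears
  while [ ! -e "$FLAG" ]; do sleep 1; done
  echo "writer: exit flag seen, closing file" ) &
WRITER=$!

sleep 1                              # stands in for the 5-minute wait
renice -n 19 -p "$WRITER" >/dev/null 2>&1 || true  # lower priority (raise niceness)
touch "$FLAG"                        # signal the writer to exit
wait "$WRITER"
rm -f "$FLAG"
```

Note that an unprivileged process can only raise niceness; returning from -2 to 20 as described requires the privileges the real application already runs with.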
Thanks
Norman
-----Original Message-----
From: xfs-bounces@oss.sgi.com [mailto:xfs-bounces@oss.sgi.com] On Behalf Of Dave Chinner
Sent: Tuesday, February 12, 2013 12:23 PM
To: Cheung, Norman
Cc: linux-xfs@oss.sgi.com
Subject: Re: Hung in D state during fclose
On Tue, Feb 12, 2013 at 04:39:48PM +0000, Cheung, Norman wrote:
> It's just as mangled. Write them to a file, make sure it is formatted correctly, and attach it to the email.
[NLC] Attached in a file, sorry for the trouble. I've also pasted the traces again below; hopefully they will come through better.
The file and the paste came through OK.
> > kernel version 3.0.13-0.27
>
> What distribution is that from?
> [NLC] SUSE
Yeah, looks to be a SLES kernel - have you talked to your SuSE support rep about this?
[NLC] I am in the process of making contact.
> 1. the disk writing thread hung in fclose
>
>
> Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
> Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>]
> ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ?
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ?
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ?
> __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ?
> kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ?
> alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ?
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ?
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ?
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ?
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ?
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ?
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ?
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ?
> sys_write+0x47/0x90 [<ffffffff813ce412>] ?
> system_call_fastpath+0x16/0x1b
So that is doing a write() from fclose, and it's waiting on the inode lock XFS_ILOCK_EXCL.
/me wishes that all distros compiled their kernels with frame pointers enabled so that analysing stack traces is better than "I'm guessing that the real stack trace is....
[NLC] The last entry, zone_statistics, is called only with NUMA enabled; I wonder if I can work around this by turning off NUMA.
> 2. flush from another partition
>
> flush-8:48 D 0000000000000000 0 4217 2 0x00000000
> ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
> ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
> ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
> Call Trace:
> [<ffffffff810656f9>] ? __queue_work+0xc9/0x390
> [<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
> [<ffffffff81065a06>] ? queue_work_on+0x16/0x20
> [<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
> [<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
> [<ffffffff813c7559>] ? __down+0x6c/0x99
> [<ffffffff81070377>] ? down+0x37/0x40
> [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
> [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
> [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
> [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
> [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
> [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
> [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
> [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
> [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
> [<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
> [<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
> [<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
> [<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
> [<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
> [<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
> [<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
> [<ffffffff810c97ba>] ? __writepage+0xa/0x30
> [<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
> [<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
> [<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
> [<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
> [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
> [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
> [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
> [<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
> [<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
> [<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
> [<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
> [<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
> [<ffffffff8106b06e>] ? kthread+0x7e/0x90
> [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
> [<ffffffff813cf540>] ? gs_change+0x13/0x13
That's the writeback thread waiting on an AGF buffer to be unlocked.
IOWs, there are probably multiple concurrent allocations to the same AG. But this thread will be holding the XFS_ILOCK_EXCL lock that the other thread is waiting on.
What thread is holding the AGF buffer is anyone's guess - it could be waiting on IO completion, which would indicate a problem in the storage layers below XFS. The output in dmesg from sysrq-w (echo w >
/proc/sysrq-trigger) might help indicate other blocked threads that could be holding the AGF lock.
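The capture step Dave describes can be wrapped in a small root-only helper, to be run the moment the next hang is observed. This is only a sketch: the function name and log path are made up here, and the commands need root.

```shell
# Dump all uninterruptible (D state) task stacks to the kernel log and
# save a copy for the list; must be run as root while the hang persists.
capture_blocked_tasks() {
    log="/tmp/sysrq-w.$(date +%s).txt"
    # sysrq-w asks the kernel to print every blocked task's stack trace
    echo w > /proc/sysrq-trigger || return 1
    dmesg > "$log" || return 1
    echo "blocked-task traces saved to $log"
}
```

Having all blocked tasks in one dump is what lets you spot the thread that actually holds the AGF buffer, rather than just the two waiters.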
[NLC] Only 2 threads in D state. I need to wait for the next hang to take another stack trace. Is there any workaround I can use to reduce the frequency of these hangs? What about reducing xfssyncd_centisecs? Or other knobs?
.....
> RAID layout (hardware and/or software) Hardware RAID 0 , 2 disks per RAID.
What RAID controller? SAS or SATA drives? Stripe chunk/segment size?
Any BBWC?
[NLC] There are 2 RAID controllers in the system, and it has hung on disks from both of them. One is a SuperMicro 2208 and the other is an LSI 9265-8I. I think both use the same chipset.
[NLC] 2 SAS disks on each RAID 0, 15K RPM.
[NLC] I am not sure of the segment size (strip size / no. of disks?) but the strip size is 512K (see dump below).
[NLC] No BBWC installed.
[NLC] My sunit=0 and swidth=0, and sectsz=512. Would it help to set these to match the stripe size?
Thanks,
Norman
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L1 -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0
Size : 271.945 GB
Parity Size : 0
State : Optimal
Strip Size : 512 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI
Is VD Cached: No
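For what it's worth, the alignment values implied by that dump can be worked out directly: a 512 KB strip per disk and 2 data disks in the RAID 0, against the bsize=4096 reported by xfs_info. The arithmetic below is a sketch of the usual su/sw convention, not something taken from this thread:

```shell
# Derive XFS stripe alignment from the MegaCli dump: su is one disk's
# strip, sw is the number of data disks. xfs_info reports sunit/swidth
# in filesystem blocks ("blks", 4096 bytes here).
strip_kb=512      # "Strip Size : 512 KB" from MegaCli
data_disks=2      # "Number Of Drives : 2", RAID 0 so all are data disks
fs_block_kb=4     # bsize=4096 from xfs_info
sunit_blks=$((strip_kb / fs_block_kb))
swidth_blks=$((sunit_blks * data_disks))
echo "mkfs.xfs -d su=${strip_kb}k,sw=${data_disks}"
echo "expected xfs_info: sunit=${sunit_blks} swidth=${swidth_blks} blks"
```

An existing filesystem can be given the same hint at mount time with `-o sunit=1024,swidth=2048` (those mount options are in 512-byte units); whether alignment has any bearing on this hang is a separate question.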
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Hung in D state during fclose
2013-02-12 21:06 ` Cheung, Norman
@ 2013-02-12 22:21 ` Dave Chinner
2013-02-13 0:12 ` Cheung, Norman
1 sibling, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2013-02-12 22:21 UTC (permalink / raw)
To: Cheung, Norman; +Cc: linux-xfs
On Tue, Feb 12, 2013 at 09:06:03PM +0000, Cheung, Norman wrote:
> [NLC] Only 2 threads in D state. I need to wait for the next
> hang to take another stack trace. Is there any workaround I can
> use to reduce the frequency of these hangs? What about reducing
> xfssyncd_centisecs? Or other knobs?
You need to isolate the cause of the hang before changing anything.
Twiddling random knobs won't improve the situation. Understand the
problem first, then take appropriate action.
> [NLC] 2 RAID controllers in the system, it hung from disks on
> both of them. One is SuperMicro 2208 and the other is LSI
> 9265-8I. I think both use the same chipset.
Does the filesystem that hangs always belong to the same controller?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* RE: Hung in D state during fclose
2013-02-12 20:22 ` Dave Chinner
@ 2013-02-12 21:06 ` Cheung, Norman
2013-02-12 22:21 ` Dave Chinner
2013-02-13 0:12 ` Cheung, Norman
0 siblings, 2 replies; 14+ messages in thread
From: Cheung, Norman @ 2013-02-12 21:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
-----Original Message-----
From: xfs-bounces@oss.sgi.com [mailto:xfs-bounces@oss.sgi.com] On Behalf Of Dave Chinner
Sent: Tuesday, February 12, 2013 12:23 PM
To: Cheung, Norman
Cc: linux-xfs@oss.sgi.com
Subject: Re: Hung in D state during fclose
On Tue, Feb 12, 2013 at 04:39:48PM +0000, Cheung, Norman wrote:
> It's just as mangled. Write them to a file, make sure it is formatted correctly, and attach it to the email.
> [NLC] attached in a file, sorry for the trouble. Also paste trace again below hopefully it will come thru better.
The file and the paste came through OK.
> > kernel version 3.0.13-0.27
>
> What distribution is that from?
> [NLC] SUSE
Yeah, looks to be a SLES kernel - have you talked to your SuSE support rep about this?
[NLC] In the process of making contact
> 1. the disk writing thread hung in fclose
>
>
> Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
> Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
> [<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs]
> [<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d
> [<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs]
> [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs]
> [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
> [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0
> [<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0
> [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
> [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
> [<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0
> [<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs]
> [<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0
> [<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190
> [<ffffffff81116657>] ? sys_write+0x47/0x90
> [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
So that is doing a write() from fclose, and it's waiting on the inode's XFS_ILOCK_EXCL.
/me wishes that all distros compiled their kernels with frame pointers enabled, so that analysing stack traces is better than "I'm guessing that the real stack trace is...."
[NLC] The last entry, zone_statistics, is called only with NUMA enabled; I wonder if I can work around this by turning NUMA off.
> 2. flush from another partition
>
> flush-8:48 D 0000000000000000 0 4217 2 0x00000000
> ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
> ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
> ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
> Call Trace:
> [<ffffffff810656f9>] ? __queue_work+0xc9/0x390
> [<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
> [<ffffffff81065a06>] ? queue_work_on+0x16/0x20
> [<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
> [<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
> [<ffffffff813c7559>] ? __down+0x6c/0x99
> [<ffffffff81070377>] ? down+0x37/0x40
> [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
> [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
> [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
> [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
> [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
> [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
> [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
> [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
> [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
> [<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
> [<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
> [<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
> [<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
> [<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
> [<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
> [<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
> [<ffffffff810c97ba>] ? __writepage+0xa/0x30
> [<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
> [<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
> [<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
> [<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
> [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
> [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
> [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
> [<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
> [<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
> [<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
> [<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
> [<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
> [<ffffffff8106b06e>] ? kthread+0x7e/0x90
> [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
> [<ffffffff813cf540>] ? gs_change+0x13/0x13
That's the writeback thread waiting on an AGF buffer to be unlocked.
IOWs, there are probably multiple concurrent allocations to the same AG. But this thread will be holding the XFS_ILOCK_EXCL lock that the other thread is waiting on.
What thread is holding the AGF buffer is anyone's guess - it could be waiting on IO completion, which would indicate a problem in the storage layers below XFS. The output in dmesg from sysrq-w (echo w >
/proc/sysrq-trigger) might help indicate other blocked threads that could be holding the AGF lock.
[NLC] Only 2 threads in D state. I need to wait for the next hang to take another stack trace. Is there any workaround I can use to reduce the frequency of these hangs? What about reducing xfssyncd_centisecs? Or other knobs?
.....
> RAID layout (hardware and/or software) Hardware RAID 0 , 2 disks per RAID.
What RAID controller? SAS or SATA drives? Stripe chunk/segment size?
Any BBWC?
[NLC] There are 2 RAID controllers in the system, and it has hung on disks from both of them. One is a SuperMicro 2208 and the other is an LSI 9265-8I. I think both use the same chipset.
[NLC] 2 SAS disks on each RAID 0 -15K RPM
[NLC] I am not sure the segment size (strip size / no. of disks?) but the strip size is 512K (see below dump)
[NLC] No BBWC installed
[NLC] My sunit=0 and swidth=0, and sectsz=512. Would it help to set these to match the stripe size?
Thanks,
Norman
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L1 -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0
Size : 271.945 GB
Parity Size : 0
State : Optimal
Strip Size : 512 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI
Is VD Cached: No
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Hung in D state during fclose
2013-02-12 16:39 ` Cheung, Norman
@ 2013-02-12 20:22 ` Dave Chinner
2013-02-12 21:06 ` Cheung, Norman
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-02-12 20:22 UTC (permalink / raw)
To: Cheung, Norman; +Cc: linux-xfs
On Tue, Feb 12, 2013 at 04:39:48PM +0000, Cheung, Norman wrote:
> It's just as mangled. Write them to a file, make sure it is formatted correctly, and attach it to the email.
> [NLC] attached in a file, sorry for the trouble. Also paste trace again below hopefully it will come thru better.
The file and the paste came through OK.
> > kernel version 3.0.13-0.27
>
> What distribution is that from?
> [NLC] SUSE
Yeah, looks to be a SLES kernel - have you talked to your SuSE
support rep about this?
> 1. the disk writing thread hung in fclose
>
>
> Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
> Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
> [<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs]
> [<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d
> [<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs]
> [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs]
> [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
> [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0
> [<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0
> [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
> [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
> [<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0
> [<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs]
> [<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0
> [<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190
> [<ffffffff81116657>] ? sys_write+0x47/0x90
> [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
So that is doing a write() from fclose, and it's waiting on the
inode's XFS_ILOCK_EXCL.
/me wishes that all distros compiled their kernels with frame
pointers enabled, so that analysing stack traces is better than "I'm
guessing that the real stack trace is...."
> 2. flush from another partition
>
> flush-8:48 D 0000000000000000 0 4217 2 0x00000000
> ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
> ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
> ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
> Call Trace:
> [<ffffffff810656f9>] ? __queue_work+0xc9/0x390
> [<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
> [<ffffffff81065a06>] ? queue_work_on+0x16/0x20
> [<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
> [<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
> [<ffffffff813c7559>] ? __down+0x6c/0x99
> [<ffffffff81070377>] ? down+0x37/0x40
> [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
> [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
> [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
> [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
> [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
> [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
> [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
> [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
> [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
> [<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
> [<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
> [<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
> [<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
> [<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
> [<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
> [<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
> [<ffffffff810c97ba>] ? __writepage+0xa/0x30
> [<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
> [<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
> [<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
> [<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
> [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
> [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
> [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
> [<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
> [<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
> [<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
> [<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
> [<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
> [<ffffffff8106b06e>] ? kthread+0x7e/0x90
> [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
> [<ffffffff813cf540>] ? gs_change+0x13/0x13
That's the writeback thread waiting on an AGF buffer to be unlocked.
IOWs, there are probably multiple concurrent allocations to the same
AG. But this thread will be holding the XFS_ILOCK_EXCL lock that the
other thread is waiting on.
What thread is holding the AGF buffer is anyone's guess - it could
be waiting on IO completion, which would indicate a problem in the
storage layers below XFS. The output in dmesg from sysrq-w (echo w >
/proc/sysrq-trigger) might help indicate other blocked threads that
could be holding the AGF lock.
.....
> RAID layout (hardware and/or software) Hardware RAID 0 , 2 disks per RAID.
What RAID controller? SAS or SATA drives? Stripe chunk/segment size?
Any BBWC?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* RE: Hung in D state during fclose
2013-02-12 10:20 ` Dave Chinner
@ 2013-02-12 16:39 ` Cheung, Norman
2013-02-12 20:22 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-12 16:39 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
[-- Attachment #1: Type: text/plain, Size: 6615 bytes --]
-----Original Message-----
From: Dave Chinner [mailto:david@fromorbit.com]
Sent: Tuesday, February 12, 2013 2:20 AM
To: Cheung, Norman
Cc: linux-xfs@oss.sgi.com
Subject: Re: Hung in D state during fclose
On Tue, Feb 12, 2013 at 07:01:59AM +0000, Cheung, Norman wrote:
> Sorry Dave. Let me copy and paste the trace below, hope it works.
> Thanks,
> Norman
>
> Below is the sysrq for the 2 offending threads.
>
> 1. the disk writing thread hung in fclose
>
>
> Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>]
> ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ?
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ?
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ?
> __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ?
> kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ?
> alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ?
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ?
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ?
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ?
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ?
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ?
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ?
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ?
> sys_write+0x47/0x90 [<ffffffff813ce412>] ?
> system_call_fastpath+0x16/0x1b
It's just as mangled. Write them to a file, make sure it is formatted correctly, and attach it to the email.
[NLC] Attached in a file, sorry for the trouble. I have also pasted the traces again below; hopefully they will come through better.
> kernel version 3.0.13-0.27
What distribution is that from?
[NLC] SUSE
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
[NLC]
1. the disk writing thread hung in fclose
Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
Call Trace:
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs]
[<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
[<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff813c74ec>] ? down_write+0x1c/0x1d
[<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs]
[<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs]
[<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
[<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0
[<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
[<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0
[<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs]
[<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
[<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
[<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
[<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0
[<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
[<ffffffff8111630a>] ? vfs_write+0xaa/0x190
[<ffffffff81116657>] ? sys_write+0x47/0x90
[<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
2. flush from another partition
flush-8:48 D 0000000000000000 0 4217 2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390
[<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
[<ffffffff81065a06>] ? queue_work_on+0x16/0x20
[<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
[<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
[<ffffffff813c7559>] ? __down+0x6c/0x99
[<ffffffff81070377>] ? down+0x37/0x40
[<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
[<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
[<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
[<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
[<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
[<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
[<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
[<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
[<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
[<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
[<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
[<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
[<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
[<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
[<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
[<ffffffff810c97ba>] ? __writepage+0xa/0x30
[<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
[<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
[<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
[<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
[<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
[<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
[<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
[<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
[<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
[<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
[<ffffffff8106b06e>] ? kthread+0x7e/0x90
[<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
[<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
[<ffffffff813cf540>] ? gs_change+0x13/0x13
[-- Attachment #2: Trace_fclose_Hangs.txt --]
[-- Type: text/plain, Size: 12123 bytes --]
Below is the sysrq for the 2 offending threads.
1. the disk writing thread hung in fclose
Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
Call Trace:
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs]
[<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
[<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff813c74ec>] ? down_write+0x1c/0x1d
[<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs]
[<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs]
[<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
[<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0
[<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
[<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0
[<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs]
[<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
[<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
[<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
[<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0
[<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
[<ffffffff8111630a>] ? vfs_write+0xaa/0x190
[<ffffffff81116657>] ? sys_write+0x47/0x90
[<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
2. flush from another partition
flush-8:48 D 0000000000000000 0 4217 2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390
[<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
[<ffffffff81065a06>] ? queue_work_on+0x16/0x20
[<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
[<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
[<ffffffff813c7559>] ? __down+0x6c/0x99
[<ffffffff81070377>] ? down+0x37/0x40
[<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
[<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
[<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
[<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
[<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
[<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
[<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
[<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
[<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
[<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
[<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
[<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
[<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
[<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
[<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
[<ffffffff810c97ba>] ? __writepage+0xa/0x30
[<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
[<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
[<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
[<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
[<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
[<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
[<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
[<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
[<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
[<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
[<ffffffff8106b06e>] ? kthread+0x7e/0x90
[<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
[<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
[<ffffffff813cf540>] ? gs_change+0x13/0x13
kernel version 3.0.13-0.27
xfsprogs version xfs_repair version 3.1.6
number of CPUs 32
xfs_info /fastraid2
meta-data=/dev/sdd1 isize=256 agcount=4, agsize=17822144 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=71288576, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=34808, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
contents of /proc/meminfo
cat /proc/meminfo
MemTotal: 262429860 kB
MemFree: 146893200 kB
Buffers: 38600 kB
Cached: 336388 kB
SwapCached: 0 kB
Active: 4601364 kB
Inactive: 277852 kB
Active(anon): 4504236 kB
Inactive(anon): 9524 kB
Active(file): 97128 kB
Inactive(file): 268328 kB
Unevictable: 107520004 kB
Mlocked: 212048 kB
SwapTotal: 1051644 kB
SwapFree: 1051644 kB
Dirty: 140 kB
Writeback: 0 kB
AnonPages: 112024272 kB
Mapped: 2031420 kB
Shmem: 9528 kB
Slab: 123444 kB
SReclaimable: 24088 kB
SUnreclaim: 99356 kB
KernelStack: 3728 kB
PageTables: 229284 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 260857204 kB
Committed_AS: 112752960 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 712612 kB
VmallocChunk: 34224802044 kB
HardwareCorrupted: 0 kB
AnonHugePages: 111861760 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 133788 kB
DirectMap2M: 7178240 kB
DirectMap1G: 261095424 kB
contents of /proc/mounts
cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,relatime,nr_inodes=0,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sde3 / reiserfs rw,relatime,acl,user_xattr 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sde1 /boot reiserfs rw,relatime,acl,user_xattr 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.34.3.247/IMCRootFolder /home/klac/mount_path_Tool cifs rw,relatime,sec=ntlm,unc=\\10.34.3.247\IMCRootFolder,username=KTMfg,uid=1000,forceuid,gid=100,forcegid,addr=10.34.3.247,file_mode=0755,dir_mode=0755,nounix,rsize=16384,wsize=131007,actimeo=1 0 0
/dev/sda1 /edge xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdb1 /fastraid0 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdc1 /fastraid1 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /fastraid2 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdf1 /fastraid3 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdg1 /fastraid4 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdh1 /fastraid5 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
contents of /proc/partitions
cat /proc/partitions
major minor #blocks name
8 16 285155328 sdb
8 17 285154304 sdb1
8 32 285155328 sdc
8 33 285154304 sdc1
8 48 285155328 sdd
8 49 285154304 sdd1
8 80 285671424 sdf
8 81 285659136 sdf1
8 64 124485632 sde
8 65 1051648 sde1
8 66 1051648 sde2
8 67 122381312 sde3
8 96 285671424 sdg
8 97 285659136 sdg1
8 112 285671424 sdh
8 113 285659136 sdh1
8 0 142577664 sda
8 1 142576640 sda1
RAID layout (hardware and/or software): hardware RAID 0, 2 disks per array.
LVM configuration
fdisk -l
Disk /dev/sdb: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00073b0b
Device Boot Start End Blocks Id System
/dev/sdb1 2048 570310655 285154304 83 Linux
Disk /dev/sdc: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d3eba
Device Boot Start End Blocks Id System
/dev/sdc1 2048 570310655 285154304 83 Linux
Disk /dev/sdd: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003e548
Device Boot Start End Blocks Id System
/dev/sdd1 2048 570310655 285154304 83 Linux
Disk /dev/sdf: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009caf0
Device Boot Start End Blocks Id System
/dev/sdf1 2048 571320319 285659136 83 Linux
Disk /dev/sde: 127.5 GB, 127473287168 bytes
255 heads, 63 sectors/track, 15497 cylinders, total 248971264 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x674cfaee
Device Boot Start End Blocks Id System
/dev/sde1 * 2048 2105343 1051648 83 Linux
/dev/sde2 2105344 4208639 1051648 82 Linux swap / Solaris
/dev/sde3 4208640 248971263 122381312 83 Linux
Disk /dev/sdg: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000a1d4
Device Boot Start End Blocks Id System
/dev/sdg1 2048 571320319 285659136 83 Linux
Disk /dev/sdh: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006b7ff
Device Boot Start End Blocks Id System
/dev/sdh1 2048 571320319 285659136 83 Linux
Disk /dev/sda: 146.0 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders, total 285155328 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f1568
Device Boot Start End Blocks Id System
/dev/sda1 2048 285155327 142576640 83 Linux
[-- Attachment #3: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Hung in D state during fclose
2013-02-12 7:01 ` Cheung, Norman
@ 2013-02-12 10:20 ` Dave Chinner
2013-02-12 16:39 ` Cheung, Norman
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-02-12 10:20 UTC (permalink / raw)
To: Cheung, Norman; +Cc: linux-xfs
On Tue, Feb 12, 2013 at 07:01:59AM +0000, Cheung, Norman wrote:
> Sorry Dave. Let me copy and paste the trace below, hope it works.
> Thanks,
> Norman
>
> Below is the sysrq for the 2 offending threads.
>
> 1. the disk writing thread hung in fclose
>
>
> Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? sys_write+0x47/0x90 [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
It's just as mangled. Write them to a file, make sure it is
formatted correctly, and attach it to the email.
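One way to produce such a file, sketched here with an illustrative output path (this is not a command from the thread), is to have the kernel dump its blocked tasks and save the kernel log verbatim to a file that can be attached as-is, so no mail client re-wraps the stack frames:

```shell
# Ask the kernel to dump blocked (D-state) tasks into the kernel log,
# then save that log to a file for attachment. Writing to
# /proc/sysrq-trigger requires root, so it is skipped when the file is
# not writable; the output path is illustrative.
OUT=/tmp/hung-task-traces.txt
[ -w /proc/sysrq-trigger ] && echo w > /proc/sysrq-trigger
dmesg > "$OUT" 2>/dev/null || touch "$OUT"   # dmesg may be restricted for non-root
echo "saved kernel log to $OUT"
```

Attaching the resulting file (rather than pasting it into the message body) preserves one stack frame per line.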
> kernel version 3.0.13-0.27
What distribution is that from?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* RE: Hung in D state during fclose
2013-02-12 6:55 ` Dave Chinner
@ 2013-02-12 7:01 ` Cheung, Norman
2013-02-12 10:20 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-12 7:01 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
Sorry Dave. Let me copy and paste the trace below, hope it works.
Thanks,
Norman
Below is the sysrq for the 2 offending threads.
1. the disk writing thread hung in fclose
Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
Call Trace:
 [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
 [<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs]
 [<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
 [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
 [<ffffffff813c74ec>] ? down_write+0x1c/0x1d
 [<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs]
 [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs]
 [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
 [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0
 [<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0
 [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
 [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
 [<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0
 [<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs]
 [<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
 [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
 [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
 [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0
 [<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
 [<ffffffff8111630a>] ? vfs_write+0xaa/0x190
 [<ffffffff81116657>] ? sys_write+0x47/0x90
 [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
2. flush from another partition
flush-8:48 D 0000000000000000 0 4217 2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
Call Trace:
 [<ffffffff810656f9>] ? __queue_work+0xc9/0x390
 [<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
 [<ffffffff81065a06>] ? queue_work_on+0x16/0x20
 [<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
 [<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
 [<ffffffff813c7559>] ? __down+0x6c/0x99
 [<ffffffff81070377>] ? down+0x37/0x40
 [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
 [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
 [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
 [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
 [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
 [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
 [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
 [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
 [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
 [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
 [<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
 [<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
 [<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
 [<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
 [<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
 [<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
 [<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
 [<ffffffff810c97ba>] ? __writepage+0xa/0x30
 [<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
 [<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
 [<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
 [<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
 [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
 [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
 [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
 [<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
 [<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
 [<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
 [<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
 [<ffffffff8106b06e>] ? kthread+0x7e/0x90
 [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
 [<ffffffff813cf540>] ? gs_change+0x13/0x13
kernel version 3.0.13-0.27
-----Original Message-----
From: Dave Chinner [mailto:david@fromorbit.com]
Sent: Monday, February 11, 2013 10:56 PM
To: Cheung, Norman
Cc: linux-xfs@oss.sgi.com
Subject: Re: Hung in D state during fclose
On Tue, Feb 12, 2013 at 06:17:04AM +0000, Norman Cheung wrote:
> I am not sure if this forum is the same as xfs@oss.sgi.com, if so, my
> apology for double posting. I would appreciate any insight or workaround on this.
>
> Every 3 - 4 days, my application will hang in D state at file close.
> And shortly after that flush (from a different partition) is locked in
> D state also.
>
> My application runs continuously, 5 threads are writing data at a rate
> of 1.5M/sec to 5 different XFS partitions. Each of these partitions
> is a 2 disk RAID 0. In addition, I have other threads consuming 100%
> CPU at all time, and most of these threads are tied to its own CPU.
>
> There are 5 data writing threads are also set to run in specific CPU
> (one CPU per thread), with priority set to high (-2). The data writing pattern is:
> each disk writing thread will write a file 1.5 Gig. Then the thread
> will pause for about 3 minutes. Hence we have 5 files of 1.5Gig each
> after one processing cycle. And we keep 5 sets and delete the older ones.
>
> After about 300 - 800 cycle, one or two of these disk writing threads
> will go into D state. And within a second flush of another partition
> will show up in D state. then after 15 minutes of no activities, the
> parent task will lower the priority of all threads (to normal 20) and
> abort the threads. In all cases, lowering the priority will get
> threads out of D states. I have also tried running the disk writing
> threads with normal priority (20). Same hangs. Also the fclose of
> all 5 files to 5 different partitions happens around the same time.
>
> Thanks in advance,
>
> Norman
>
>
> Below is the sysrq for the 2 offending threads.
>
> 1. the disk writing thread hung in fclose
>
>
> Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] ?
> xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ?
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ?
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ?
> __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ?
> kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ?
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ?
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ?
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ?
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ?
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ?
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ?
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ?
> sys_write+0x47/0x90 [<ffffffff813ce412>] ?
> system_call_fastpath+0x16/0x1b
Can you please post non-mangled traces? These have all the lines run together and then wrapped by your mailer.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Hung in D state during fclose
2013-02-12 6:17 Norman Cheung
@ 2013-02-12 6:55 ` Dave Chinner
2013-02-12 7:01 ` Cheung, Norman
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-02-12 6:55 UTC (permalink / raw)
To: Norman Cheung; +Cc: linux-xfs
On Tue, Feb 12, 2013 at 06:17:04AM +0000, Norman Cheung wrote:
> I am not sure if this forum is the same as xfs@oss.sgi.com, if so, my apology
> for double posting. I would appreciate any insight or workaround on this.
>
> Every 3 - 4 days, my application will hang in D state at file close. And
> shortly after that flush (from a different partition) is locked in D state
> also.
>
> My application runs continuously, 5 threads are writing data at a rate of
> 1.5M/sec to 5 different XFS partitions. Each of these partitions is a 2 disk
> RAID 0. In addition, I have other threads consuming 100% CPU at all time, and
> most of these threads are tied to its own CPU.
>
> There are 5 data writing threads are also set to run in specific CPU (one CPU
> per thread), with priority set to high (-2). The data writing pattern is:
> each disk writing thread will write a file 1.5 Gig. Then the thread will
> pause for about 3 minutes. Hence we have 5 files of 1.5Gig each after one
> processing cycle. And we keep 5 sets and delete the older ones.
>
> After about 300 - 800 cycle, one or two of these disk writing threads will go
> into D state. And within a second flush of another partition will show up in
> D state. then after 15 minutes of no activities, the parent task will lower
> the priority of all threads (to normal 20) and abort the threads. In all
> cases, lowering the priority will get threads out of D states. I have also
> tried running the disk writing threads with normal priority (20). Same
> hangs. Also the fclose of all 5 files to 5 different partitions happens
> around the same time.
>
> Thanks in advance,
>
> Norman
>
>
> Below is the sysrq for the 2 offending threads.
>
> 1. the disk writing thread hung in fclose
>
>
> Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] ?
> xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ?
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ?
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0
> [xfs] [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
> [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ?
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ?
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ?
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ?
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ?
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ?
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ?
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ?
> sys_write+0x47/0x90 [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
Can you please post non-mangled traces? These have all the lines run
together and then wrapped by your mailer.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Hung in D state during fclose
@ 2013-02-12 6:17 Norman Cheung
2013-02-12 6:55 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Norman Cheung @ 2013-02-12 6:17 UTC (permalink / raw)
To: linux-xfs
I am not sure if this forum is the same as xfs@oss.sgi.com; if so, my apology
for double posting. I would appreciate any insight or workaround on this.
Every 3-4 days, my application hangs in D state at file close, and shortly
after that a flush (from a different partition) becomes locked in D state as
well.
My application runs continuously: 5 threads write data at a rate of 1.5M/sec
to 5 different XFS partitions. Each of these partitions is a 2-disk RAID 0.
In addition, I have other threads consuming 100% CPU at all times, and most
of those threads are each tied to their own CPU.
The 5 data-writing threads are also set to run on specific CPUs (one CPU per
thread), with priority set high (-2). The data-writing pattern is: each
disk-writing thread writes a 1.5 GB file, then pauses for about 3 minutes.
Hence we have 5 files of 1.5 GB each after one processing cycle, and we keep
5 sets and delete the older ones.
After about 300-800 cycles, one or two of these disk-writing threads will go
into D state, and within a second a flush of another partition will show up
in D state. Then, after 15 minutes of no activity, the parent task lowers the
priority of all threads (to normal, 20) and aborts them. In all cases,
lowering the priority gets the threads out of D state. I have also tried
running the disk-writing threads at normal priority (20): same hangs. Also,
the fclose of all 5 files to the 5 different partitions happens at
around the same time.
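The write pattern described above can be sketched as a small shell script. Everything below is a scaled-down, illustrative stand-in, not the actual application: paths, file sizes, and cycle counts are placeholders, and the real workload writes 1.5 GB files from CPU-pinned threads at priority -2.

```shell
#!/bin/sh
# Scaled-down sketch of the reported write pattern: a CPU-pinned,
# high-priority writer produces one large file per cycle, and file
# sets older than the newest KEEP are deleted. Names and sizes are
# illustrative only.
MOUNT=${MOUNT:-/tmp/fastraid-sim}   # stand-in for /fastraid0../fastraid5
FILE_MB=${FILE_MB:-4}               # real workload: ~1536 MB per file
KEEP=${KEEP:-5}                     # keep 5 sets, delete older ones
mkdir -p "$MOUNT"
PIN=""
command -v taskset >/dev/null 2>&1 && PIN="taskset -c 0"   # pin to one CPU
cycle=1
while [ "$cycle" -le 3 ]; do        # real workload: 300-800 cycles
    # 'nice -n -2' mirrors the raised priority; without root, GNU nice
    # prints a warning and runs the command at normal priority.
    $PIN nice -n -2 dd if=/dev/zero of="$MOUNT/data_$cycle.bin" \
        bs=1M count="$FILE_MB" conv=fsync 2>/dev/null
    old=$((cycle - KEEP))
    [ "$old" -ge 1 ] && rm -f "$MOUNT/data_$old.bin"   # rotate old sets
    cycle=$((cycle + 1))
done
ls "$MOUNT"
```

The `conv=fsync` flag makes each cycle end with the file flushed to disk, loosely mirroring the fclose-at-end-of-cycle behaviour in the report.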
Thanks in advance,
Norman
Below is the sysrq for the 2 offending threads.
1. the disk writing thread hung in fclose
Tigris_IMC.exe D 0000000000000000 0 4197 4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
Call Trace:
 [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
 [<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs]
 [<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
 [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
 [<ffffffff813c74ec>] ? down_write+0x1c/0x1d
 [<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs]
 [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs]
 [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
 [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0
 [<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0
 [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
 [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
 [<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0
 [<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs]
 [<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
 [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
 [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
 [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0
 [<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
 [<ffffffff8111630a>] ? vfs_write+0xaa/0x190
 [<ffffffff81116657>] ? sys_write+0x47/0x90
 [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b
2. flush from another partition
flush-8:48 D 0000000000000000 0 4217 2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390
[<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
[<ffffffff81065a06>] ? queue_work_on+0x16/0x20
[<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
[<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
[<ffffffff813c7559>] ? __down+0x6c/0x99
[<ffffffff81070377>] ? down+0x37/0x40
[<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
[<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
[<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
[<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
[<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
[<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
[<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
[<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
[<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
[<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
[<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
[<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
[<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
[<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
[<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
[<ffffffff810c97ba>] ? __writepage+0xa/0x30
[<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
[<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
[<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
[<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
[<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
[<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
[<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
[<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
[<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
[<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
[<ffffffff8106b06e>] ? kthread+0x7e/0x90
[<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
[<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
[<ffffffff813cf540>] ? gs_change+0x13/0x13
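The trace above shows the flusher thread sleeping in xfs_buf_lock while reading an AGF buffer during delayed-allocation writeback. Before pulling a full SysRq-w dump, tasks stuck in uninterruptible sleep (state D) can be enumerated by scanning /proc; a minimal Linux-only sketch (the function name is illustrative, not from this thread):

```python
#!/usr/bin/env python3
# List PIDs currently in uninterruptible sleep (state "D") by reading
# /proc/<pid>/stat. Field layout is "pid (comm) state ..."; splitting on
# the last ")" tolerates parentheses inside the command name.
import os

def d_state_tasks():
    pids = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                fields = f.read().rsplit(")", 1)[1].split()
        except OSError:
            continue  # task exited while we were scanning
        if fields[0] == "D":
            pids.append(int(pid))
    return pids

print(d_state_tasks())
```

On a healthy system this usually prints an empty list; a PID that stays in the list across repeated runs is a candidate for a SysRq-w stack dump like the ones in this report.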
kernel version: 3.0.13-0.27
xfsprogs version: xfs_repair version 3.1.6
number of CPUs: 32
xfs_info /fastraid2
meta-data=/dev/sdd1 isize=256 agcount=4, agsize=17822144 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=71288576, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=34808, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
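Note that the xfs_info output shows sunit=0, swidth=0: this filesystem was made without telling XFS the stripe geometry of the underlying 2-disk hardware RAID 0, so allocations are not stripe-aligned. Geometry can be supplied at mkfs time; a hypothetical example only (the real chunk size must come from the RAID controller, and re-running mkfs destroys existing data):

```shell
# su = per-disk chunk size (64k here is an assumed value),
# sw = number of data disks in the stripe (2 for this RAID 0).
# DESTRUCTIVE: wipes the filesystem on /dev/sdd1.
mkfs.xfs -f -d su=64k,sw=2 /dev/sdd1
```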
contents of /proc/meminfo
cat /proc/meminfo
MemTotal: 262429860 kB
MemFree: 146893200 kB
Buffers: 38600 kB
Cached: 336388 kB
SwapCached: 0 kB
Active: 4601364 kB
Inactive: 277852 kB
Active(anon): 4504236 kB
Inactive(anon): 9524 kB
Active(file): 97128 kB
Inactive(file): 268328 kB
Unevictable: 107520004 kB
Mlocked: 212048 kB
SwapTotal: 1051644 kB
SwapFree: 1051644 kB
Dirty: 140 kB
Writeback: 0 kB
AnonPages: 112024272 kB
Mapped: 2031420 kB
Shmem: 9528 kB
Slab: 123444 kB
SReclaimable: 24088 kB
SUnreclaim: 99356 kB
KernelStack: 3728 kB
PageTables: 229284 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 260857204 kB
Committed_AS: 112752960 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 712612 kB
VmallocChunk: 34224802044 kB
HardwareCorrupted: 0 kB
AnonHugePages: 111861760 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 133788 kB
DirectMap2M: 7178240 kB
DirectMap1G: 261095424 kB
contents of /proc/mounts
cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,relatime,nr_inodes=0,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sde3 / reiserfs rw,relatime,acl,user_xattr 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sde1 /boot reiserfs rw,relatime,acl,user_xattr 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.34.3.247/IMCRootFolder /home/klac/mount_path_Tool cifs rw,relatime,sec=ntlm,unc=\\10.34.3.247\IMCRootFolder,username=KTMfg,uid=1000,forceuid,gid=100,forcegid,addr=10.34.3.247,file_mode=0755,dir_mode=0755,nounix,rsize=16384,wsize=131007,actimeo=1 0 0
/dev/sda1 /edge xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdb1 /fastraid0 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdc1 /fastraid1 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /fastraid2 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdf1 /fastraid3 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdg1 /fastraid4 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdh1 /fastraid5 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
contents of /proc/partitions
cat /proc/partitions
major minor #blocks name
8 16 285155328 sdb
8 17 285154304 sdb1
8 32 285155328 sdc
8 33 285154304 sdc1
8 48 285155328 sdd
8 49 285154304 sdd1
8 80 285671424 sdf
8 81 285659136 sdf1
8 64 124485632 sde
8 65 1051648 sde1
8 66 1051648 sde2
8 67 122381312 sde3
8 96 285671424 sdg
8 97 285659136 sdg1
8 112 285671424 sdh
8 113 285659136 sdh1
8 0 142577664 sda
8 1 142576640 sda1
RAID layout (hardware and/or software): hardware RAID 0, 2 disks per RAID.
LVM configuration
fdisk -l
Disk /dev/sdb: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00073b0b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048   570310655   285154304   83  Linux

Disk /dev/sdc: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d3eba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048   570310655   285154304   83  Linux

Disk /dev/sdd: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003e548

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048   570310655   285154304   83  Linux

Disk /dev/sdf: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009caf0

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048   571320319   285659136   83  Linux

Disk /dev/sde: 127.5 GB, 127473287168 bytes
255 heads, 63 sectors/track, 15497 cylinders, total 248971264 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x674cfaee

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1   *        2048     2105343     1051648   83  Linux
/dev/sde2         2105344     4208639     1051648   82  Linux swap / Solaris
/dev/sde3         4208640   248971263   122381312   83  Linux

Disk /dev/sdg: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000a1d4

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1            2048   571320319   285659136   83  Linux

Disk /dev/sdh: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006b7ff

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1            2048   571320319   285659136   83  Linux

Disk /dev/sda: 146.0 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders, total 285155328 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f1568

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   285155327   142576640   83  Linux
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Thread overview: 14+ messages
2013-02-11 23:36 Hung in D state during fclose Cheung, Norman
2013-02-12 6:17 Norman Cheung
2013-02-12 6:55 ` Dave Chinner
2013-02-12 7:01 ` Cheung, Norman
2013-02-12 10:20 ` Dave Chinner
2013-02-12 16:39 ` Cheung, Norman
2013-02-12 20:22 ` Dave Chinner
2013-02-12 21:06 ` Cheung, Norman
2013-02-12 22:21 ` Dave Chinner
2013-02-13 0:12 ` Cheung, Norman
2013-02-13 5:15 ` Dave Chinner
2013-02-14 4:53 ` Cheung, Norman
2013-02-26 19:41 ` Cheung, Norman
2013-02-26 20:31 ` Dave Chinner