* Hung in D state during fclose
@ 2013-02-11 23:36 Cheung, Norman
  0 siblings, 0 replies; 14+ messages in thread
From: Cheung, Norman @ 2013-02-11 23:36 UTC (permalink / raw)
  To: xfs

Every 3 to 4 days, my application hangs in D state at file close, and shortly afterwards flush (from a different partition) gets locked in D state as well.

My application runs continuously: 5 threads write data at a rate of 400K decotr/sec to 5 different XFS partitions, each of which is a 2-disk RAID 0. In addition, I have other threads consuming 100% CPU at all times, and most of these threads are pinned to their own CPUs.

The 5 data-writing threads are also pinned to specific CPUs (one CPU per thread), with priority set high (-2).  The writing pattern is: each disk-writing thread writes a 1.5 GB file and then pauses for about 3 minutes, so we end up with 5 files of 1.5 GB each after one processing cycle.  We keep 5 such sets and delete the older ones.

After about 300 to 800 cycles, one or two of these disk-writing threads go into D state, and within a second the flush thread of another partition shows up in D state as well.  After 15 minutes of no activity, the parent task lowers the priority of all threads (back to normal, 20) and aborts them.  In all cases, lowering the priority gets the threads out of D state.  I have also tried running the disk-writing threads at normal priority (20); the hangs are the same.
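
To make the pattern concrete, it can be approximated with a shell loop like the one below.  This is only an illustrative stand-in for the real application; the mount point, file names and the use of dd are assumptions based on the description above, not the actual code.

    # One such loop per XFS partition; the real application uses 5 pinned
    # threads instead.  Write a ~1.5 GB file, keep the 5 newest, pause ~3 min.
    mnt=/fastraid0
    i=0
    while true; do
        dd if=/dev/zero of=$mnt/data.$i bs=1M count=1536   # buffered write, then close
        ls -t $mnt/data.* | tail -n +6 | xargs -r rm -f    # delete the older sets
        sleep 180
        i=$((i+1))
    done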

Thanks in advance,

Norman 


Below is the sysrq output for the two offending threads.

1. the disk writing thread hung in fclose


Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
Call Trace:
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs]
[<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
[<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff813c74ec>] ? down_write+0x1c/0x1d
[<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs]
[<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs]
[<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130
[<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0
[<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs]
[<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0
[<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs]
[<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
[<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
[<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
[<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0
[<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
[<ffffffff8111630a>] ? vfs_write+0xaa/0x190
[<ffffffff81116657>] ? sys_write+0x47/0x90
[<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b

2. flush from another partition

flush-8:48      D 0000000000000000     0  4217      2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390
[<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0
[<ffffffff81065a06>] ? queue_work_on+0x16/0x20
[<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240
[<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs]
[<ffffffff813c7559>] ? __down+0x6c/0x99
[<ffffffff81070377>] ? down+0x37/0x40
[<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
[<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
[<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
[<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
[<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
[<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
[<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
[<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
[<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0
[<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs]
[<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs]
[<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs]
[<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs]
[<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs]
[<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs]
[<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs]
[<ffffffff810c97ba>] ? __writepage+0xa/0x30
[<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0
[<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90
[<ffffffff810c9e93>] ? generic_writepages+0x43/0x70
[<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
[<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
[<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310
[<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0
[<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160
[<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150
[<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160
[<ffffffff8106b06e>] ? kthread+0x7e/0x90
[<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
[<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
[<ffffffff813cf540>] ? gs_change+0x13/0x13


kernel version: 3.0.13-0.27

xfsprogs version: xfs_repair version 3.1.6

number of CPUs: 32

 xfs_info /fastraid2
meta-data=/dev/sdd1              isize=256    agcount=4, agsize=17822144 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=71288576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=34808, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


    contents of /proc/meminfo
cat /proc/meminfo
MemTotal:       262429860 kB
MemFree:        146893200 kB
Buffers:           38600 kB
Cached:           336388 kB
SwapCached:            0 kB
Active:          4601364 kB
Inactive:         277852 kB
Active(anon):    4504236 kB
Inactive(anon):     9524 kB
Active(file):      97128 kB
Inactive(file):   268328 kB
Unevictable:    107520004 kB
Mlocked:          212048 kB
SwapTotal:       1051644 kB
SwapFree:        1051644 kB
Dirty:               140 kB
Writeback:             0 kB
AnonPages:      112024272 kB
Mapped:          2031420 kB
Shmem:              9528 kB
Slab:             123444 kB
SReclaimable:      24088 kB
SUnreclaim:        99356 kB
KernelStack:        3728 kB
PageTables:       229284 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    260857204 kB
Committed_AS:   112752960 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      712612 kB
VmallocChunk:   34224802044 kB
HardwareCorrupted:     0 kB
AnonHugePages:  111861760 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      133788 kB
DirectMap2M:     7178240 kB
DirectMap1G:    261095424 kB


    contents of /proc/mounts

cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,relatime,nr_inodes=0,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sde3 / reiserfs rw,relatime,acl,user_xattr 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sde1 /boot reiserfs rw,relatime,acl,user_xattr 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.34.3.247/IMCRootFolder /home/klac/mount_path_Tool cifs rw,relatime,sec=ntlm,unc=\\10.34.3.247\IMCRootFolder,username=KTMfg,uid=1000,forceuid,gid=100,forcegid,addr=10.34.3.247,file_mode=0755,dir_mode=0755,nounix,rsize=16384,wsize=131007,actimeo=1 0 0
/dev/sda1 /edge xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdb1 /fastraid0 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdc1 /fastraid1 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /fastraid2 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdf1 /fastraid3 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdg1 /fastraid4 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdh1 /fastraid5 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0


    contents of /proc/partitions
 cat /proc/partitions
major minor  #blocks  name

   8       16  285155328 sdb
   8       17  285154304 sdb1
   8       32  285155328 sdc
   8       33  285154304 sdc1
   8       48  285155328 sdd
   8       49  285154304 sdd1
   8       80  285671424 sdf
   8       81  285659136 sdf1
   8       64  124485632 sde
   8       65    1051648 sde1
   8       66    1051648 sde2
   8       67  122381312 sde3
   8       96  285671424 sdg
   8       97  285659136 sdg1
   8      112  285671424 sdh
   8      113  285659136 sdh1
   8        0  142577664 sda
   8        1  142576640 sda1


    RAID layout (hardware and/or software)
Hardware RAID 0, 2 disks per RAID.
    LVM configuration

fdisk -l

Disk /dev/sdb: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00073b0b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048   570310655   285154304   83  Linux

Disk /dev/sdc: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d3eba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048   570310655   285154304   83  Linux

Disk /dev/sdd: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003e548

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048   570310655   285154304   83  Linux

Disk /dev/sdf: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009caf0

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048   571320319   285659136   83  Linux

Disk /dev/sde: 127.5 GB, 127473287168 bytes
255 heads, 63 sectors/track, 15497 cylinders, total 248971264 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x674cfaee

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1   *        2048     2105343     1051648   83  Linux
/dev/sde2         2105344     4208639     1051648   82  Linux swap / Solaris
/dev/sde3         4208640   248971263   122381312   83  Linux

Disk /dev/sdg: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000a1d4

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1            2048   571320319   285659136   83  Linux

Disk /dev/sdh: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006b7ff

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1            2048   571320319   285659136   83  Linux

Disk /dev/sda: 146.0 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders, total 285155328 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f1568

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   285155327   142576640   83  Linux

 



* Re: Hung in D state during fclose
  2013-02-26 19:41                   ` Cheung, Norman
@ 2013-02-26 20:31                     ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2013-02-26 20:31 UTC (permalink / raw)
  To: Cheung, Norman; +Cc: 'linux-xfs@oss.sgi.com'

On Tue, Feb 26, 2013 at 07:41:42PM +0000, Cheung, Norman wrote:
> I'd been checking all the XFS patches if any might relate to my situation and came across this 
> http://article.gmane.org/gmane.comp.file-systems.xfs.general/40349/
> 
> From: Christoph Hellwig <hch <at> infradead.org>
> Subject: [PATCH, RFC] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY
> Newsgroups: gmane.comp.file-systems.xfs.general
> Date: 2011-08-27 06:14:09 GMT (1 year, 26 weeks, 1 day, 13 hours and 17 minutes ago)
> Right now ->write_inode has no way to safely return an EAGAIN without explicitly
> redirtying the inode, as we would lose the dirty state otherwise.  Most
> filesystems get this wrong, but XFS makes heavy use of it to avoid blocking
> the flusher thread when ->write_inode hits contended inode locks.  A
> contended ilock is something XFS can hit very easily when extending files, as
> the data I/O completion handler takes the lock to update the size, and the
> ->write_inode call can race with it fairly easily if writing enough data
> in one go so that the completion for the first write comes in just before
> we call ->write_inode.
> 
> Change the handling of this case to use requeue_io for a quick retry instead
> of redirty_tail, which keeps moving out the dirtied_when data and thus keeps
> delaying the writeout more and more with every failed attempt to get the lock.
> 
> I wonder if this could have caused my application to wait on xfs_ilock.  I checked
> that this patch is not in my kernel source (SUSE 11 SP2, Rel 3.0.13-0.27).

I have no idea whether it will help or not, because SuSE (like Red
Hat) ship a heavily patched kernel and so it's rather hard for
anyone here to triage and diagnose problems like this. Further, even
if we can find a potential cause, we still can't fix it for you.
Hence perhaps you are best advised to talk to your SuSE support
contact at this point?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* RE: Hung in D state during fclose
  2013-02-14  4:53                 ` Cheung, Norman
@ 2013-02-26 19:41                   ` Cheung, Norman
  2013-02-26 20:31                     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-26 19:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: 'linux-xfs@oss.sgi.com'

I'd been checking the XFS patches to see if any might relate to my situation, and came across this:
http://article.gmane.org/gmane.comp.file-systems.xfs.general/40349/

From: Christoph Hellwig <hch <at> infradead.org>
Subject: [PATCH, RFC] writeback: avoid redirtying when ->write_inode failed to clear I_DIRTY
Newsgroups: gmane.comp.file-systems.xfs.general
Date: 2011-08-27 06:14:09 GMT (1 year, 26 weeks, 1 day, 13 hours and 17 minutes ago)
Right now ->write_inode has no way to safely return an EAGAIN without explicitly
redirtying the inode, as we would lose the dirty state otherwise.  Most
filesystems get this wrong, but XFS makes heavy use of it to avoid blocking
the flusher thread when ->write_inode hits contended inode locks.  A
contended ilock is something XFS can hit very easily when extending files, as
the data I/O completion handler takes the lock to update the size, and the
->write_inode call can race with it fairly easily if writing enough data
in one go so that the completion for the first write comes in just before
we call ->write_inode.

Change the handling of this case to use requeue_io for a quick retry instead
of redirty_tail, which keeps moving out the dirtied_when data and thus keeps
delaying the writeout more and more with every failed attempt to get the lock.

I wonder if this could have caused my application to wait on xfs_ilock.  I checked
that this patch is not in my kernel source (SUSE 11 SP2, Rel 3.0.13-0.27).

I have re-attached the traces below.

Many Thanks,
Norman

> > 1. the disk writing thread hung in fclose
> >
> >
> > Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
> > ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> > ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> > 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1
> > Call Trace:
> > [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>]
> > ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ?
> > rwsem_down_failed_common+0xc5/0x150
> > [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> > [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ?
> > xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ?
> > __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ?
> > kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ?
> > alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ?
> > __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ?
> > xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ?
> > xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ?
> > block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ?
> > xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ?
> > generic_file_buffered_write+0xf8/0x250
> > [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs]
> > [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs]
> > [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ?
> > security_file_permission+0x24/0xc0
> > [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ?
> > sys_write+0x47/0x90 [<ffffffff813ce412>] ?
> > system_call_fastpath+0x16/0x1b
> 
> So that is doing a write() from fclose, and it's waiting on the inode
> XFS_ILOCK_EXCL
> 
> /me wishes that all distros compiled their kernels with frame pointers
> enabled so that analysing stack traces is better than "I'm guessing that the
> real stack trace is....
> 
> [NLC] the last entry zone_statistics is called only with NUMA enabled, I
> wonder if I can work around it by turning off NUMA.
> 
> > 2. flush from another partition
> >
> > flush-8:48      D 0000000000000000     0  4217      2 0x00000000
> > ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
> > ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
> > ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9
> > Call Trace:
> > [<ffffffff810656f9>] ? __queue_work+0xc9/0x390 [<ffffffff811e3e3f>] ?
> > cfq_insert_request+0xaf/0x4f0 [<ffffffff81065a06>] ?
> > queue_work_on+0x16/0x20 [<ffffffff813c69cd>] ?
> > schedule_timeout+0x1dd/0x240 [<ffffffffa041a762>] ?
> > kmem_zone_zalloc+0x32/0x50 [xfs] [<ffffffff813c7559>] ?
> > __down+0x6c/0x99 [<ffffffff81070377>] ? down+0x37/0x40
> > [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs]
> > [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs]
> > [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs]
> > [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs]
> > [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs]
> > [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs]
> > [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs]
> > [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs]
> > [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs]
> > [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa03d23a4>]
> > ? xfs_alloc_vextent+0x184/0x4a0 [xfs] [<ffffffffa03dc348>] ?
> > xfs_bmap_btalloc+0x2d8/0x6d0 [xfs] [<ffffffffa03e0efd>] ?
> > xfs_bmapi+0x9bd/0x11a0 [xfs] [<ffffffffa03d9bbc>] ?
> > xfs_bmap_search_multi_extents+0xac/0x120 [xfs] [<ffffffffa040293c>] ?
> > xfs_iomap_write_allocate+0x17c/0x330 [xfs] [<ffffffffa041b20f>] ?
> > xfs_map_blocks+0x19f/0x1b0 [xfs] [<ffffffffa041c20e>] ?
> > xfs_vm_writepage+0x19e/0x470 [xfs] [<ffffffff810c97ba>] ?
> > __writepage+0xa/0x30 [<ffffffff810c9c4d>] ?
> > write_cache_pages+0x1cd/0x3d0 [<ffffffff810c97b0>] ?
> > bdi_set_max_ratio+0x90/0x90 [<ffffffff810c9e93>] ?
> > generic_writepages+0x43/0x70 [<ffffffff81139330>] ?
> > writeback_single_inode+0x160/0x300
> > [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0
> > [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140
> > [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 [<ffffffff813cedee>] ?
> > apic_timer_interrupt+0xe/0x20 [<ffffffff8113a10e>] ?
> > wb_check_old_data_flush+0x4e/0xa0 [<ffffffff8113a28b>] ?
> > wb_do_writeback+0x12b/0x160 [<ffffffff8113a332>] ?
> > bdi_writeback_thread+0x72/0x150 [<ffffffff8113a2c0>] ?
> > wb_do_writeback+0x160/0x160 [<ffffffff8106b06e>] ? kthread+0x7e/0x90
> > [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10
> > [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0
> > [<ffffffff813cf540>] ? gs_change+0x13/0x13
> 
> Thats the writeback thread waiting on an AGF buffer to be unlocked.
> IOWs, there are probably multiple concurrent allocations to the same AG. But
> that will be holding the XFS_ILOCK_EXCL lock that the other thread is waiting.
> 
> What thread is holding the AGF buffer is anyone's guess - it could be waiting
> on IO completion, which would indicate a problem in the storage layers
> below XFS. The output in dmesg from sysrq-w (echo w >
> /proc/sysrq-trigger) might help indicate other blocked threads that could be
> holding the AGF lock.
>

> 
> On Wed, Feb 13, 2013 at 12:12:47AM +0000, Cheung, Norman wrote:
> > Dave,
> >
> > One other point I have forgotten to mention is that the parent thread
> > will wait for 5 minutes and then lower the thread priority (from -2
> > back to 20) and set a global variable to signal the threads to exit.
> > The blocked thread responded well and exit from D state and fclose
> > completed with no error.
> 
> So it's not hung - it's just very slow?
> 
> [NLC] It will pause forever.  I tried replaced the timeout with a sleep loop,
> and it will pause forever.    That is how I got the stack trace.
> 
> You have 256GB of memory. It's entirely possible that you've dirtied a large
> amount of memory and everything is simply stuck waiting for writeback to
> occur. Perhaps you should have a look at the utilisation of your disks when
> this still occurs. 'iostat -x -d -m 5' will give you some insight into utilsation
> when a hang occurs...
> 
> [NLC] I  have set the dirty_background_bytes to 40M and
> dirty_writeback_centisecs to 400; and watched the Meminfo.  I don't get a
> lot of dirtied memory accumulation -- 7 to 10 M average.  The disk usage from
> the sar log was steady at 150M/sec for each of the disk; but a few seconds
> after the fclose all disk activities stopped. Also the CPU % was quiet as well.
> 
> > This cause me to wonder if it  is possible that  some XFS threads and
> > my application thread might be in a deadlock.
> 
> Deadlocks are permanent, so what you are seeing is not a deadlock....
> 
> [NLC] Would it be possible that there is priority inversion between my  disk
> writing threads and the XFS threads & flush threads?  My application thread
> runs at -2 priority.
> 
> Many thanks,
> Norman
> 
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> 

* RE: Hung in D state during fclose
  2013-02-13  5:15               ` Dave Chinner
@ 2013-02-14  4:53                 ` Cheung, Norman
  2013-02-26 19:41                   ` Cheung, Norman
  0 siblings, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-14  4:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: 'linux-xfs@oss.sgi.com'



-----Original Message-----
From: xfs-bounces@oss.sgi.com [mailto:xfs-bounces@oss.sgi.com] On Behalf Of Dave Chinner
Sent: Tuesday, February 12, 2013 9:16 PM
To: Cheung, Norman
Cc: 'linux-xfs@oss.sgi.com'
Subject: Re: Hung in D state during fclose

On Wed, Feb 13, 2013 at 12:12:47AM +0000, Cheung, Norman wrote:
> Dave,
> 
> One other point I have forgotten to mention is that the parent thread 
> will wait for 5 minutes and then lower the thread priority (from -2 
> back to 20) and set a global variable to signal the threads to exit.  
> The blocked thread responded well and exit from D state and fclose 
> completed with no error.

So it's not hung - it's just very slow?

[NLC] It will pause forever.  I tried replacing the timeout with a sleep loop, and it still pauses forever.  That is how I got the stack trace.

You have 256GB of memory. It's entirely possible that you've dirtied a large amount of memory and everything is simply stuck waiting for writeback to occur. Perhaps you should have a look at the utilisation of your disks when this still occurs. 'iostat -x -d -m 5' will give you some insight into utilisation when a hang occurs...

[NLC] I have set dirty_background_bytes to 40M and dirty_writeback_centisecs to 400, and watched /proc/meminfo.  I don't see a lot of dirty memory accumulating -- 7 to 10 MB on average.  The disk usage from the sar log was steady at 150 MB/sec for each of the disks, but a few seconds after the fclose all disk activity stopped.  The CPU % went quiet as well.
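
For reference, these are the writeback settings in effect; the echo commands below are assumed equivalents of how they were set, shown for completeness:

    # 40 MB background dirty threshold; flusher wakes every 4 seconds
    echo 41943040 > /proc/sys/vm/dirty_background_bytes
    echo 400 > /proc/sys/vm/dirty_writeback_centisecs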

> This cause me to wonder if it  is possible that  some XFS threads and 
> my application thread might be in a deadlock.

Deadlocks are permanent, so what you are seeing is not a deadlock....

[NLC] Would it be possible that there is priority inversion between my disk-writing threads and the XFS and flush threads?  My application thread runs at -2 priority.

Many thanks,
Norman


Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com


* Re: Hung in D state during fclose
  2013-02-13  0:12             ` Cheung, Norman
@ 2013-02-13  5:15               ` Dave Chinner
  2013-02-14  4:53                 ` Cheung, Norman
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-02-13  5:15 UTC (permalink / raw)
  To: Cheung, Norman; +Cc: 'linux-xfs@oss.sgi.com'

On Wed, Feb 13, 2013 at 12:12:47AM +0000, Cheung, Norman wrote:
> Dave,
> 
> One other point I have forgotten to mention is that the parent
> thread will wait for 5 minutes and then lower the thread priority
> (from -2 back to 20) and set a global variable to signal the
> threads to exit.  The blocked thread responded well and exit from
> D state and fclose completed with no error.

So it's not hung - it's just very slow?

You have 256GB of memory. It's entirely possible that you've dirtied
a large amount of memory and everything is simply stuck waiting for
writeback to occur. Perhaps you should have a look at the
utilisation of your disks when this still occurs. 'iostat -x -d -m
5' will give you some insight into utilisation when a hang occurs...
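
For example, something like the following could be left running until the next hang (the log path is only an example):

    # Extended per-device stats every 5 seconds; watch await, avgqu-sz and
    # %util on the RAID 0 devices (sdb..sdh) around the time of the hang.
    iostat -x -d -m 5 > /tmp/iostat.log 2>&1 &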

> This cause me to wonder if it  is possible that  some XFS threads
> and my application thread might be in a deadlock.  

Deadlocks are permanent, so what you are seeing is not a deadlock....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* RE: Hung in D state during fclose
  2013-02-12 21:06           ` Cheung, Norman
  2013-02-12 22:21             ` Dave Chinner
@ 2013-02-13  0:12             ` Cheung, Norman
  2013-02-13  5:15               ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-13  0:12 UTC (permalink / raw)
  To: 'Dave Chinner'; +Cc: 'linux-xfs@oss.sgi.com'

Dave,

One other point I forgot to mention is that the parent thread will wait for 5 minutes and then lower the thread priority (from -2 back to 20) and set a global variable to signal the threads to exit.  The blocked thread responded, exited D state, and the fclose completed with no error.

This causes me to wonder whether some XFS threads and my application thread might be in a deadlock.

Thanks
Norman


-----Original Message-----
From: xfs-bounces@oss.sgi.com [mailto:xfs-bounces@oss.sgi.com] On Behalf Of Dave Chinner
Sent: Tuesday, February 12, 2013 12:23 PM
To: Cheung, Norman
Cc: linux-xfs@oss.sgi.com
Subject: Re: Hung in D state during fclose

On Tue, Feb 12, 2013 at 04:39:48PM +0000, Cheung, Norman wrote:
> It's just as mangled. Write them to a file, make sure it is formatted correctly, and attach it to the email.
> [NLC] attached in a file, sorry for the trouble.  Also paste trace again below  hopefully it will  come thru better.

The file and the paste came through OK.

> > kernel version 3.0.13-0.27
> 
> What distribution is that from?
> [NLC] SUSE

Yeah, looks to be a SLES kernel - have you talked to your SuSE support rep about this?
[NLC] In the process of making contact

> 1. the disk writing thread hung in fclose
> 
> 
> Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 
> Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] 
> ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ?
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? 
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? 
> __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ? 
> kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ? 
> alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ? 
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? 
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? 
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? 
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ?
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? 
> sys_write+0x47/0x90 [<ffffffff813ce412>] ? 
> system_call_fastpath+0x16/0x1b

So that is doing a write() from fclose, and it's waiting on the inode XFS_ILOCK_EXCL 

/me wishes that all distros compiled their kernels with frame pointers enabled so that analysing stack traces is better than "I'm guessing that the real stack trace is....

[NLC] The last entry, zone_statistics, is called only with NUMA enabled; I wonder if I can work around this by turning off NUMA.

> 2. flush from another partition
> 
> flush-8:48      D 0000000000000000     0  4217      2 0x00000000
> ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
> ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
> ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9 
> Call Trace:
> [<ffffffff810656f9>] ? __queue_work+0xc9/0x390 [<ffffffff811e3e3f>] ? 
> cfq_insert_request+0xaf/0x4f0 [<ffffffff81065a06>] ? 
> queue_work_on+0x16/0x20 [<ffffffff813c69cd>] ? 
> schedule_timeout+0x1dd/0x240 [<ffffffffa041a762>] ? 
> kmem_zone_zalloc+0x32/0x50 [xfs] [<ffffffff813c7559>] ? 
> __down+0x6c/0x99 [<ffffffff81070377>] ? down+0x37/0x40 
> [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs] 
> [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs] 
> [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs] 
> [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs] 
> [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs] 
> [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs] 
> [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs] 
> [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs] 
> [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs] 
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa03d23a4>] 
> ? xfs_alloc_vextent+0x184/0x4a0 [xfs] [<ffffffffa03dc348>] ?
> xfs_bmap_btalloc+0x2d8/0x6d0 [xfs] [<ffffffffa03e0efd>] ? 
> xfs_bmapi+0x9bd/0x11a0 [xfs] [<ffffffffa03d9bbc>] ? 
> xfs_bmap_search_multi_extents+0xac/0x120 [xfs] [<ffffffffa040293c>] ? 
> xfs_iomap_write_allocate+0x17c/0x330 [xfs] [<ffffffffa041b20f>] ? 
> xfs_map_blocks+0x19f/0x1b0 [xfs] [<ffffffffa041c20e>] ? 
> xfs_vm_writepage+0x19e/0x470 [xfs] [<ffffffff810c97ba>] ? 
> __writepage+0xa/0x30 [<ffffffff810c9c4d>] ? 
> write_cache_pages+0x1cd/0x3d0 [<ffffffff810c97b0>] ? 
> bdi_set_max_ratio+0x90/0x90 [<ffffffff810c9e93>] ? 
> generic_writepages+0x43/0x70 [<ffffffff81139330>] ? 
> writeback_single_inode+0x160/0x300
> [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0 
> [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140 
> [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 [<ffffffff813cedee>] ?
> apic_timer_interrupt+0xe/0x20 [<ffffffff8113a10e>] ? 
> wb_check_old_data_flush+0x4e/0xa0 [<ffffffff8113a28b>] ? 
> wb_do_writeback+0x12b/0x160 [<ffffffff8113a332>] ? 
> bdi_writeback_thread+0x72/0x150 [<ffffffff8113a2c0>] ? 
> wb_do_writeback+0x160/0x160 [<ffffffff8106b06e>] ? kthread+0x7e/0x90 
> [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10 
> [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0 
> [<ffffffff813cf540>] ? gs_change+0x13/0x13

That's the writeback thread waiting on an AGF buffer to be unlocked.
IOWs, there are probably multiple concurrent allocations to the same AG. But that thread will be holding the XFS_ILOCK_EXCL lock that the other thread is waiting on.

What thread is holding the AGF buffer is anyone's guess - it could be waiting on IO completion, which would indicate a problem in the storage layers below XFS. The output in dmesg from sysrq-w (echo w >
/proc/sysrq-trigger) might help indicate other blocked threads that could be holding the AGF lock.

[NLC] Only 2 threads are in D state.  I need to wait for the next hang to take another stack trace.  Is there any workaround to reduce the frequency of these hangs?  What about reducing xfssyncd_centisecs, or other knobs?
.....
>     RAID layout (hardware and/or software) Hardware RAID 0 , 2 disks per RAID.  

What RAID controller? SAS or SATA drives? Stripe chunk/segment size?
Any BBWC?
[NLC]  2 RAID controllers in the system, it hung from disks on both of them.  One is SuperMicro 2208 and the other is LSI 9265-8I.  I think both use the same chipset.
[NLC] 2 SAS disks on each RAID 0 - 15K RPM
[NLC] I am not sure of the segment size (strip size / no. of disks?), but the strip size is 512K (see the dump below)
[NLC] No BBWC installed

[NLC] My sunit=0 and swidth=0, and sectsz=512.  Would it help to set these to match the stripe geometry?

Thanks,
Norman

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L1 -a0


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name                :
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                : 271.945 GB
Parity Size         : 0
State               : Optimal
Strip Size          : 512 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU Default Access Policy: Read/Write Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Bad Blocks Exist: No
PI type: No PI

Is VD Cached: No

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com


* Re: Hung in D state during fclose
  2013-02-12 21:06           ` Cheung, Norman
@ 2013-02-12 22:21             ` Dave Chinner
  2013-02-13  0:12             ` Cheung, Norman
  1 sibling, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2013-02-12 22:21 UTC (permalink / raw)
  To: Cheung, Norman; +Cc: linux-xfs

On Tue, Feb 12, 2013 at 09:06:03PM +0000, Cheung, Norman wrote:
> [NLC] Only 2 threads in D state.  I  need to wait for the next
> hang to take another stack trace.  Is there any workaround I can
> reduce the frequencies of this hang.  What about reducing the
> xfssyncd_centisecs? Or other knobs?

You need to isolate the cause of the hang before changing anything.
Twiddling random knobs won't improve the situation. Understand the
problem first, then take appropriate action.

> [NLC]  2 RAID controllers in the system, it hung from disks on
> both of them.  One is SuperMicro 2208 and the other is LSI
> 9265-8I.  I think both use the same chipset.

Does the filesystem that hangs always belong to the same controller?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* RE: Hung in D state during fclose
  2013-02-12 20:22         ` Dave Chinner
@ 2013-02-12 21:06           ` Cheung, Norman
  2013-02-12 22:21             ` Dave Chinner
  2013-02-13  0:12             ` Cheung, Norman
  0 siblings, 2 replies; 14+ messages in thread
From: Cheung, Norman @ 2013-02-12 21:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs



-----Original Message-----
From: xfs-bounces@oss.sgi.com [mailto:xfs-bounces@oss.sgi.com] On Behalf Of Dave Chinner
Sent: Tuesday, February 12, 2013 12:23 PM
To: Cheung, Norman
Cc: linux-xfs@oss.sgi.com
Subject: Re: Hung in D state during fclose

On Tue, Feb 12, 2013 at 04:39:48PM +0000, Cheung, Norman wrote:
> It's just as mangled. Write them to a file, make sure it is formatted correctly, and attach it to the email.
> [NLC] attached in a file, sorry for the trouble.  Also paste trace again below  hopefully it will  come thru better.

The file and the paste came through OK.

> > kernel version 3.0.13-0.27
> 
> What distribution is that from?
> [NLC] SUSE

Yeah, looks to be a SLES kernel - have you talked to your SuSE support rep about this?
[NLC] In the process of making contact

> 1. the disk writing thread hung in fclose
> 
> 
> Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 
> Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] 
> ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ? 
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? 
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? 
> __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ? 
> kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ? 
> alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ? 
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? 
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? 
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? 
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ? 
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? 
> sys_write+0x47/0x90 [<ffffffff813ce412>] ? 
> system_call_fastpath+0x16/0x1b

So that is doing a write() from fclose, and it's waiting on the inode XFS_ILOCK_EXCL 

/me wishes that all distros compiled their kernels with frame pointers enabled so that analysing stack traces is better than "I'm guessing that the real stack trace is....

[NLC] The last entry, zone_statistics, is called only with NUMA enabled; I wonder if I can work around this by turning off NUMA.
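
If I try that, it would only be a boot-time experiment, roughly as follows (assumed procedure, not something I have done yet):

    # Append numa=off to the kernel command line (e.g. in /boot/grub/menu.lst
    # on SLES 11), reboot, then confirm it took effect:
    cat /proc/cmdline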

> 2. flush from another partition
> 
> flush-8:48      D 0000000000000000     0  4217      2 0x00000000
> ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
> ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
> ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9 
> Call Trace:
> [<ffffffff810656f9>] ? __queue_work+0xc9/0x390 [<ffffffff811e3e3f>] ? 
> cfq_insert_request+0xaf/0x4f0 [<ffffffff81065a06>] ? 
> queue_work_on+0x16/0x20 [<ffffffff813c69cd>] ? 
> schedule_timeout+0x1dd/0x240 [<ffffffffa041a762>] ? 
> kmem_zone_zalloc+0x32/0x50 [xfs] [<ffffffff813c7559>] ? 
> __down+0x6c/0x99 [<ffffffff81070377>] ? down+0x37/0x40 
> [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs] 
> [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs] 
> [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs] 
> [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs] 
> [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs] 
> [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs] 
> [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs] 
> [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs] 
> [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs] 
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa03d23a4>] 
> ? xfs_alloc_vextent+0x184/0x4a0 [xfs] [<ffffffffa03dc348>] ? 
> xfs_bmap_btalloc+0x2d8/0x6d0 [xfs] [<ffffffffa03e0efd>] ? 
> xfs_bmapi+0x9bd/0x11a0 [xfs] [<ffffffffa03d9bbc>] ? 
> xfs_bmap_search_multi_extents+0xac/0x120 [xfs] [<ffffffffa040293c>] ? 
> xfs_iomap_write_allocate+0x17c/0x330 [xfs] [<ffffffffa041b20f>] ? 
> xfs_map_blocks+0x19f/0x1b0 [xfs] [<ffffffffa041c20e>] ? 
> xfs_vm_writepage+0x19e/0x470 [xfs] [<ffffffff810c97ba>] ? 
> __writepage+0xa/0x30 [<ffffffff810c9c4d>] ? 
> write_cache_pages+0x1cd/0x3d0 [<ffffffff810c97b0>] ? 
> bdi_set_max_ratio+0x90/0x90 [<ffffffff810c9e93>] ? 
> generic_writepages+0x43/0x70 [<ffffffff81139330>] ? 
> writeback_single_inode+0x160/0x300
> [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0 
> [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140 
> [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 [<ffffffff813cedee>] ? 
> apic_timer_interrupt+0xe/0x20 [<ffffffff8113a10e>] ? 
> wb_check_old_data_flush+0x4e/0xa0 [<ffffffff8113a28b>] ? 
> wb_do_writeback+0x12b/0x160 [<ffffffff8113a332>] ? 
> bdi_writeback_thread+0x72/0x150 [<ffffffff8113a2c0>] ? 
> wb_do_writeback+0x160/0x160 [<ffffffff8106b06e>] ? kthread+0x7e/0x90 
> [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10 
> [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0 
> [<ffffffff813cf540>] ? gs_change+0x13/0x13

That's the writeback thread waiting on an AGF buffer to be unlocked.
IOWs, there are probably multiple concurrent allocations to the same AG. But that thread will be holding the XFS_ILOCK_EXCL lock that the other thread is waiting on.

What thread is holding the AGF buffer is anyone's guess - it could be waiting on IO completion, which would indicate a problem in the storage layers below XFS. The output in dmesg from sysrq-w (echo w >
/proc/sysrq-trigger) might help indicate other blocked threads that could be holding the AGF lock.

[NLC] Only 2 threads are in D state.  I need to wait for the next hang to take another stack trace.  Is there any workaround to reduce the frequency of these hangs?  What about reducing xfssyncd_centisecs, or other knobs?
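
For reference, the knob I mean is this one (not changed yet; the value is in centiseconds and I believe the default is 3000):

    cat /proc/sys/fs/xfs/xfssyncd_centisecs
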
.....
>     RAID layout (hardware and/or software) Hardware RAID 0 , 2 disks per RAID.  

What RAID controller? SAS or SATA drives? Stripe chunk/segment size?
Any BBWC?
[NLC]  2 RAID controllers in the system, it hung from disks on both of them.  One is SuperMicro 2208 and the other is LSI 9265-8I.  I think both use the same chipset.
[NLC] 2 SAS disks on each RAID 0 -15K RPM
[NLC] I am not sure of the segment size (strip size / no. of disks?), but the strip size is 512K (see the dump below)
[NLC] No BBWC installed

[NLC] My sunit=0 and swidth=0, and sectsz=512.  Would it help to set these to match the stripe geometry (see the sketch after the MegaCli output below)?

Thanks,
Norman

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L1 -a0


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name                :
RAID Level          : Primary-0, Secondary-0, RAID Level Qualifier-0
Size                : 271.945 GB
Parity Size         : 0
State               : Optimal
Strip Size          : 512 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
Bad Blocks Exist: No
PI type: No PI

Is VD Cached: No
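
If alignment were worth trying, my reading of the geometry above (512 KB strip, 2 drives) is that it would translate into something like the following.  This is only a sketch: mkfs.xfs would mean re-making the filesystem, and the mount-time override is specified in 512-byte units.

    # Re-make with stripe alignment (su = strip size, sw = number of drives):
    mkfs.xfs -f -d su=512k,sw=2 /dev/sdd1

    # Or override on an existing filesystem at mount time
    # (512 KB = 1024 sectors of 512 bytes; swidth = 2 x sunit):
    mount -o sunit=1024,swidth=2048 /dev/sdd1 /fastraid2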

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com


* Re: Hung in D state during fclose
  2013-02-12 16:39       ` Cheung, Norman
@ 2013-02-12 20:22         ` Dave Chinner
  2013-02-12 21:06           ` Cheung, Norman
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-02-12 20:22 UTC (permalink / raw)
  To: Cheung, Norman; +Cc: linux-xfs

On Tue, Feb 12, 2013 at 04:39:48PM +0000, Cheung, Norman wrote:
> It's just as mangled. Write them to a file, make sure it is formatted correctly, and attach it to the email.
> [NLC] attached in a file, sorry for the trouble.  Also paste trace again below  hopefully it will  come thru better.

The file and the paste came through OK.

> > kernel version 3.0.13-0.27
> 
> What distribution is that from?
> [NLC] SUSE 

Yeah, looks to be a SLES kernel - have you talked to your SuSE
support rep about this?

> 1. the disk writing thread hung in fclose
> 
> 
> Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 
> Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 
> [<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] 
> [<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d 
> [<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs] 
> [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs] 
> [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130 
> [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 
> [<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0 
> [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] 
> [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] 
> [<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0 
> [<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs] 
> [<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 
> [<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 
> [<ffffffff81116657>] ? sys_write+0x47/0x90 
> [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b

So that is doing a write() from fclose, and it's waiting on the
inode XFS_ILOCK_EXCL 

/me wishes that all distros compiled their kernels with frame
pointers enabled so that analysing stack traces is better than "I'm
guessing that the real stack trace is....

> 2. flush from another partition
> 
> flush-8:48      D 0000000000000000     0  4217      2 0x00000000
> ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
> ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
> ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9 
> Call Trace:
> [<ffffffff810656f9>] ? __queue_work+0xc9/0x390 
> [<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0 
> [<ffffffff81065a06>] ? queue_work_on+0x16/0x20 
> [<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240 
> [<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs] 
> [<ffffffff813c7559>] ? __down+0x6c/0x99 
> [<ffffffff81070377>] ? down+0x37/0x40 
> [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs] 
> [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs] 
> [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs] 
> [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs] 
> [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs] 
> [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs] 
> [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs] 
> [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs] 
> [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs] 
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 
> [<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs] 
> [<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs] 
> [<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs] 
> [<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs] 
> [<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs] 
> [<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs] 
> [<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs] 
> [<ffffffff810c97ba>] ? __writepage+0xa/0x30 
> [<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0 
> [<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90 
> [<ffffffff810c9e93>] ? generic_writepages+0x43/0x70 
> [<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
> [<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0 
> [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140 
> [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 
> [<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20 
> [<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0 
> [<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160 
> [<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150 
> [<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160 
> [<ffffffff8106b06e>] ? kthread+0x7e/0x90 
> [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10 
> [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0 
> [<ffffffff813cf540>] ? gs_change+0x13/0x13

That's the writeback thread waiting on an AGF buffer to be unlocked.
IOWs, there are probably multiple concurrent allocations to the same
AG. But that thread will be holding the XFS_ILOCK_EXCL lock that the
other thread is waiting on.

What thread is holding the AGF buffer is anyone's guess - it could
be waiting on IO completion, which would indicate a problem in the
storage layers below XFS. The output in dmesg from sysrq-w (echo w >
/proc/sysrq-trigger) might help indicate other blocked threads that
could be holding the AGF lock.
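
For example (the output file name is just an example):

    # Dump all blocked (D state) tasks into the kernel log, then save it:
    echo w > /proc/sysrq-trigger
    dmesg > /tmp/blocked-tasks.txt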

.....
>     RAID layout (hardware and/or software) Hardware RAID 0 , 2 disks per RAID.  

What RAID controller? SAS or SATA drives? Stripe chunk/segment size?
Any BBWC?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* RE: Hung in D state during fclose
  2013-02-12 10:20     ` Dave Chinner
@ 2013-02-12 16:39       ` Cheung, Norman
  2013-02-12 20:22         ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-12 16:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

[-- Attachment #1: Type: text/plain, Size: 6615 bytes --]



-----Original Message-----
From: Dave Chinner [mailto:david@fromorbit.com] 
Sent: Tuesday, February 12, 2013 2:20 AM
To: Cheung, Norman
Cc: linux-xfs@oss.sgi.com
Subject: Re: Hung in D state during fclose

On Tue, Feb 12, 2013 at 07:01:59AM +0000, Cheung, Norman wrote:
> Sorry Dave.  Let me copy and paste the trace below, hope it works.
> Thanks,
> Norman
> 
> Below is the sysrq for the 2 offending threads.
> 
> 1. the disk writing thread hung in fclose
> 
> 
> Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] 
> ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ? 
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? 
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? 
> __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ? 
> kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ? 
> alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ? 
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? 
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? 
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? 
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ? 
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? 
> sys_write+0x47/0x90 [<ffffffff813ce412>] ? 
> system_call_fastpath+0x16/0x1b

It's just as mangled. Write them to a file, make sure it is formatted correctly, and attach it to the email.
[NLC] Attached in a file, sorry for the trouble.  I have also pasted the traces again below; hopefully they will come through better.

> kernel version 3.0.13-0.27

What distribution is that from?
[NLC] SUSE 

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

[NLC] 

1. the disk writing thread hung in fclose


Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 
Call Trace:
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 
[<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] 
[<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
[<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff813c74ec>] ? down_write+0x1c/0x1d 
[<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs] 
[<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs] 
[<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130 
[<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 
[<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0 
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] 
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] 
[<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0 
[<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs] 
[<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
[<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
[<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
[<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 
[<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
[<ffffffff8111630a>] ? vfs_write+0xaa/0x190 
[<ffffffff81116657>] ? sys_write+0x47/0x90 
[<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b

2. flush from another partition

flush-8:48      D 0000000000000000     0  4217      2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9 
Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390 
[<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0 
[<ffffffff81065a06>] ? queue_work_on+0x16/0x20 
[<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240 
[<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs] 
[<ffffffff813c7559>] ? __down+0x6c/0x99 
[<ffffffff81070377>] ? down+0x37/0x40 
[<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs] 
[<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs] 
[<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs] 
[<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs] 
[<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs] 
[<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs] 
[<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs] 
[<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs] 
[<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs] 
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 
[<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs] 
[<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs] 
[<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs] 
[<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs] 
[<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs] 
[<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs] 
[<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs] 
[<ffffffff810c97ba>] ? __writepage+0xa/0x30 
[<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0 
[<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90 
[<ffffffff810c9e93>] ? generic_writepages+0x43/0x70 
[<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0 
[<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140 
[<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 
[<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20 
[<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0 
[<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160 
[<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150 
[<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160 
[<ffffffff8106b06e>] ? kthread+0x7e/0x90 
[<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10 
[<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0 
[<ffffffff813cf540>] ? gs_change+0x13/0x13


[-- Attachment #2: Trace_fclose_Hangs.txt --]
[-- Type: text/plain, Size: 12123 bytes --]


Below is the sysrq for the 2 offending threads.

1. the disk writing thread hung in fclose


Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 
Call Trace:
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 
[<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] 
[<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
[<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff813c74ec>] ? down_write+0x1c/0x1d 
[<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs] 
[<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs] 
[<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130 
[<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 
[<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0 
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] 
[<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] 
[<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0 
[<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs] 
[<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
[<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
[<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
[<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 
[<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
[<ffffffff8111630a>] ? vfs_write+0xaa/0x190 
[<ffffffff81116657>] ? sys_write+0x47/0x90 
[<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b

2. flush from another partition

flush-8:48      D 0000000000000000     0  4217      2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9 
Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390 
[<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0 
[<ffffffff81065a06>] ? queue_work_on+0x16/0x20 
[<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240 
[<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs] 
[<ffffffff813c7559>] ? __down+0x6c/0x99 
[<ffffffff81070377>] ? down+0x37/0x40 
[<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs] 
[<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs] 
[<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs] 
[<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs] 
[<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs] 
[<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs] 
[<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs] 
[<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs] 
[<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs] 
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 
[<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs] 
[<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs] 
[<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs] 
[<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs] 
[<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs] 
[<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs] 
[<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs] 
[<ffffffff810c97ba>] ? __writepage+0xa/0x30 
[<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0 
[<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90 
[<ffffffff810c9e93>] ? generic_writepages+0x43/0x70 
[<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0 
[<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140 
[<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 
[<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20 
[<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0 
[<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160 
[<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150 
[<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160 
[<ffffffff8106b06e>] ? kthread+0x7e/0x90 
[<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10 
[<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0 
[<ffffffff813cf540>] ? gs_change+0x13/0x13


kernel version 3.0.13-0.27
    
xfsprogs version xfs_repair version 3.1.6

number of CPUs  32

 xfs_info /fastraid2
meta-data=/dev/sdd1              isize=256    agcount=4, agsize=17822144 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=71288576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=34808, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


    contents of /proc/meminfo
cat /proc/meminfo
MemTotal:       262429860 kB
MemFree:        146893200 kB
Buffers:           38600 kB
Cached:           336388 kB
SwapCached:            0 kB
Active:          4601364 kB
Inactive:         277852 kB
Active(anon):    4504236 kB
Inactive(anon):     9524 kB
Active(file):      97128 kB
Inactive(file):   268328 kB
Unevictable:    107520004 kB
Mlocked:          212048 kB
SwapTotal:       1051644 kB
SwapFree:        1051644 kB
Dirty:               140 kB
Writeback:             0 kB
AnonPages:      112024272 kB
Mapped:          2031420 kB
Shmem:              9528 kB
Slab:             123444 kB
SReclaimable:      24088 kB
SUnreclaim:        99356 kB
KernelStack:        3728 kB
PageTables:       229284 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    260857204 kB
Committed_AS:   112752960 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      712612 kB
VmallocChunk:   34224802044 kB
HardwareCorrupted:     0 kB
AnonHugePages:  111861760 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      133788 kB
DirectMap2M:     7178240 kB
DirectMap1G:    261095424 kB


    contents of /proc/mounts

cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,relatime,nr_inodes=0,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sde3 / reiserfs rw,relatime,acl,user_xattr 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sde1 /boot reiserfs rw,relatime,acl,user_xattr 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.34.3.247/IMCRootFolder /home/klac/mount_path_Tool cifs rw,relatime,sec=ntlm,unc=\\10.34.3.247\IMCRootFolder,username=KTMfg,uid=1000,forceuid,gid=100,forcegid,addr=10.34.3.247,file_mode=0755,dir_mode=0755,nounix,rsize=16384,wsize=131007,actimeo=1 0 0
/dev/sda1 /edge xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdb1 /fastraid0 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdc1 /fastraid1 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /fastraid2 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdf1 /fastraid3 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdg1 /fastraid4 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdh1 /fastraid5 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0


    contents of /proc/partitions
 cat /proc/partitions
major minor  #blocks  name

   8       16  285155328 sdb
   8       17  285154304 sdb1
   8       32  285155328 sdc
   8       33  285154304 sdc1
   8       48  285155328 sdd
   8       49  285154304 sdd1
   8       80  285671424 sdf
   8       81  285659136 sdf1
   8       64  124485632 sde
   8       65    1051648 sde1
   8       66    1051648 sde2
   8       67  122381312 sde3
   8       96  285671424 sdg
   8       97  285659136 sdg1
   8      112  285671424 sdh
   8      113  285659136 sdh1
   8        0  142577664 sda
   8        1  142576640 sda1


    RAID layout (hardware and/or software): Hardware RAID 0, 2 disks per RAID.
    LVM configuration

fdisk -l

Disk /dev/sdb: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00073b0b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048   570310655   285154304   83  Linux

Disk /dev/sdc: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000d3eba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048   570310655   285154304   83  Linux

Disk /dev/sdd: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0003e548

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048   570310655   285154304   83  Linux

Disk /dev/sdf: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0009caf0

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048   571320319   285659136   83  Linux

Disk /dev/sde: 127.5 GB, 127473287168 bytes
255 heads, 63 sectors/track, 15497 cylinders, total 248971264 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x674cfaee

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1   *        2048     2105343     1051648   83  Linux
/dev/sde2         2105344     4208639     1051648   82  Linux swap / Solaris
/dev/sde3         4208640   248971263   122381312   83  Linux

Disk /dev/sdg: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0000a1d4

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1            2048   571320319   285659136   83  Linux

Disk /dev/sdh: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0006b7ff

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1            2048   571320319   285659136   83  Linux

Disk /dev/sda: 146.0 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders, total 285155328 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000f1568

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   285155327   142576640   83  Linux



* Re: Hung in D state during fclose
  2013-02-12  7:01   ` Cheung, Norman
@ 2013-02-12 10:20     ` Dave Chinner
  2013-02-12 16:39       ` Cheung, Norman
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-02-12 10:20 UTC (permalink / raw)
  To: Cheung, Norman; +Cc: linux-xfs

On Tue, Feb 12, 2013 at 07:01:59AM +0000, Cheung, Norman wrote:
> Sorry Dave.  Let me copy and paste the trace below, hope it works.
> Thanks,
> Norman
> 
> Below is the sysrq for the 2 offending threads.
> 
> 1. the disk writing thread hung in fclose
> 
> 
> Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? sys_write+0x47/0x90 [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b

It's just as mangled. Write them to a file, make sure it is
formatted correctly, and attach it to the email.

> kernel version 3.0.13-0.27

What distribution is that from?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* RE: Hung in D state during fclose
  2013-02-12  6:55 ` Dave Chinner
@ 2013-02-12  7:01   ` Cheung, Norman
  2013-02-12 10:20     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Cheung, Norman @ 2013-02-12  7:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Sorry Dave.  Let me copy and paste the trace below, hope it works.
Thanks,
Norman

Below is the sysrq for the 2 offending threads.

1. the disk writing thread hung in fclose


Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] ? xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ? rwsem_down_failed_common+0xc5/0x150
[<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ? __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? generic_file_buffered_write+0xf8/0x250
[<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ? security_file_permission+0x24/0xc0
[<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? sys_write+0x47/0x90 [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b

2. flush from another partition

flush-8:48      D 0000000000000000     0  4217      2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9 Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390 [<ffffffff811e3e3f>] ? cfq_insert_request+0xaf/0x4f0 [<ffffffff81065a06>] ? queue_work_on+0x16/0x20 [<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240 [<ffffffffa041a762>] ? kmem_zone_zalloc+0x32/0x50 [xfs] [<ffffffff813c7559>] ? __down+0x6c/0x99 [<ffffffff81070377>] ? down+0x37/0x40 [<ffffffffa041d59d>] ? xfs_buf_lock+0x1d/0x40 [xfs] [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 [xfs] [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs] [<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs] [<ffffffffa04151af>] ? xfs_trans_read_buf+0x1bf/0x2f0 [xfs] [<ffffffffa03d06c0>] ? xfs_read_agf+0x60/0x1b0 [xfs] [<ffffffffa03cf3b7>] ? xfs_alloc_update+0x17/0x20 [xfs] [<ffffffffa03d0841>] ? xfs_alloc_read_agf+0x31/0xd0 [xfs] [<ffffffffa03d2083>] ? xfs_alloc_fix_freelist+0x433/0x4a0 [xfs] [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 [xfs] [<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x
 6d0 [xfs] [<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs] [<ffffffffa03d9bbc>] ? xfs_bmap_search_multi_extents+0xac/0x120 [xfs] [<ffffffffa040293c>] ? xfs_iomap_write_allocate+0x17c/0x330 [xfs] [<ffffffffa041b20f>] ? xfs_map_blocks+0x19f/0x1b0 [xfs] [<ffffffffa041c20e>] ? xfs_vm_writepage+0x19e/0x470 [xfs] [<ffffffff810c97ba>] ? __writepage+0xa/0x30 [<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0 [<ffffffff810c97b0>] ? bdi_set_max_ratio+0x90/0x90 [<ffffffff810c9e93>] ? generic_writepages+0x43/0x70 [<ffffffff81139330>] ? writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0 [<ffffffff81139cfd>] ? writeback_inodes_wb+0x8d/0x140 [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 [<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8113a10e>] ? wb_check_old_data_flush+0x4e/0xa0 [<ffffffff8113a28b>] ? wb_do_writeback+0x12b/0x160 [<ffffffff8113a332>] ? bdi_writeback_thread+0x72/0x150 [<ffffffff8113a2c0>] ? wb_do_writeback+0x160/0x160 [<ffffffff8106b06e>] ? kthread+0x7e/0x90 [<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8106aff0>] ? kthread_worker_fn+0x1a0/0x1a0 [<ffffffff813cf540>] ? gs_change+0x13/0x13


kernel version 3.0.13-0.27
    
xfsprogs version xfs_repair version 3.1.6

number of CPUs  32

 xfs_info /fastraid2
meta-data=/dev/sdd1              isize=256    agcount=4, agsize=17822144 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=71288576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=34808, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


    contents of /proc/meminfo
cat /proc/meminfo
MemTotal:       262429860 kB
MemFree:        146893200 kB
Buffers:           38600 kB
Cached:           336388 kB
SwapCached:            0 kB
Active:          4601364 kB
Inactive:         277852 kB
Active(anon):    4504236 kB
Inactive(anon):     9524 kB
Active(file):      97128 kB
Inactive(file):   268328 kB
Unevictable:    107520004 kB
Mlocked:          212048 kB
SwapTotal:       1051644 kB
SwapFree:        1051644 kB
Dirty:               140 kB
Writeback:             0 kB
AnonPages:      112024272 kB
Mapped:          2031420 kB
Shmem:              9528 kB
Slab:             123444 kB
SReclaimable:      24088 kB
SUnreclaim:        99356 kB
KernelStack:        3728 kB
PageTables:       229284 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    260857204 kB
Committed_AS:   112752960 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      712612 kB
VmallocChunk:   34224802044 kB
HardwareCorrupted:     0 kB
AnonHugePages:  111861760 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      133788 kB
DirectMap2M:     7178240 kB
DirectMap1G:    261095424 kB


    contents of /proc/mounts

cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,relatime,nr_inodes=0,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sde3 / reiserfs rw,relatime,acl,user_xattr 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sde1 /boot reiserfs rw,relatime,acl,user_xattr 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.34.3.247/IMCRootFolder /home/klac/mount_path_Tool cifs rw,relatime,sec=ntlm,unc=\\10.34.3.247\IMCRootFolder,username=KTMfg,uid=1000,forceuid,gid=100,forcegid,addr=10.34.3.247,file_mode=0755,dir_mode=0755,nounix,rsize=16384,wsize=131007,actimeo=1 0 0
/dev/sda1 /edge xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdb1 /fastraid0 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdc1 /fastraid1 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /fastraid2 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdf1 /fastraid3 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdg1 /fastraid4 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdh1 /fastraid5 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0


    contents of /proc/partitions
 cat /proc/partitions
major minor  #blocks  name

   8       16  285155328 sdb
   8       17  285154304 sdb1
   8       32  285155328 sdc
   8       33  285154304 sdc1
   8       48  285155328 sdd
   8       49  285154304 sdd1
   8       80  285671424 sdf
   8       81  285659136 sdf1
   8       64  124485632 sde
   8       65    1051648 sde1
   8       66    1051648 sde2
   8       67  122381312 sde3
   8       96  285671424 sdg
   8       97  285659136 sdg1
   8      112  285671424 sdh
   8      113  285659136 sdh1
   8        0  142577664 sda
   8        1  142576640 sda1


    RAID layout (hardware and/or software): Hardware RAID 0, 2 disks per RAID.
    LVM configuration

fdisk -l

Disk /dev/sdb: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00073b0b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048   570310655   285154304   83  Linux

Disk /dev/sdc: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000d3eba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048   570310655   285154304   83  Linux

Disk /dev/sdd: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0003e548

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048   570310655   285154304   83  Linux

Disk /dev/sdf: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0009caf0

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048   571320319   285659136   83  Linux

Disk /dev/sde: 127.5 GB, 127473287168 bytes
255 heads, 63 sectors/track, 15497 cylinders, total 248971264 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x674cfaee

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1   *        2048     2105343     1051648   83  Linux
/dev/sde2         2105344     4208639     1051648   82  Linux swap / Solaris
/dev/sde3         4208640   248971263   122381312   83  Linux

Disk /dev/sdg: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0000a1d4

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1            2048   571320319   285659136   83  Linux

Disk /dev/sdh: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0006b7ff

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1            2048   571320319   285659136   83  Linux

Disk /dev/sda: 146.0 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders, total 285155328 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x000f1568

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   285155327   142576640   83  Linux

-----Original Message-----
From: Dave Chinner [mailto:david@fromorbit.com] 
Sent: Monday, February 11, 2013 10:56 PM
To: Cheung, Norman
Cc: linux-xfs@oss.sgi.com
Subject: Re: Hung in D state during fclose

On Tue, Feb 12, 2013 at 06:17:04AM +0000, Norman Cheung wrote:
> I am not sure if this forum is the same as xfs@oss.sgi.com; if so, my
> apology for double posting.  I would appreciate any insight or
> workaround on this.
> 
> Every 3 - 4 days, my application will hang in D state at file close,
> and shortly after that flush (for a different partition) is locked in
> D state as well.
> 
> My application runs continuously: 5 threads are writing data at a rate
> of 1.5M/sec to 5 different XFS partitions.  Each of these partitions
> is a 2 disk RAID 0.  In addition, I have other threads consuming 100%
> CPU at all times, and most of these threads are tied to their own CPU.
> 
> The 5 data writing threads are also set to run on a specific CPU
> (one CPU per thread), with priority set high (-2).  The data writing
> pattern is: each disk writing thread writes a 1.5 Gig file, then
> pauses for about 3 minutes.  Hence we have 5 files of 1.5 Gig each
> after one processing cycle.  We keep 5 sets and delete the older ones.
> 
> After about 300 - 800 cycles, one or two of these disk writing threads
> will go into D state, and within a second flush for another partition
> will show up in D state.  Then, after 15 minutes of no activity, the
> parent task will lower the priority of all threads (to normal, 20) and
> abort them.  In all cases, lowering the priority gets the threads out
> of D state.  I have also tried running the disk writing threads with
> normal priority (20); same hangs.  Also, the fclose of all 5 files to
> 5 different partitions happens at around the same time.
> 
> Thanks in advance,
> 
> Norman
> 
> 
> Below is the sysrq for the 2 offending threads.
> 
> 1. the disk writing thread hung in fclose
> 
> 
> Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] ? 
> xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ? 
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? 
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? 
> __xfs_get_blocks+0x1db/0x3d0 [xfs] [<ffffffff81103340>] ? 
> kmem_cache_alloc+0x100/0x130 [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ?
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? 
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? 
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? 
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ?
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? 
> sys_write+0x47/0x90 [<ffffffff813ce412>] ? 
> system_call_fastpath+0x16/0x1b

Can you please post non-mangled traces? These have all the lines run together and then wrapped by your mailer...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com



* Re: Hung in D state during fclose
  2013-02-12  6:17 Norman Cheung
@ 2013-02-12  6:55 ` Dave Chinner
  2013-02-12  7:01   ` Cheung, Norman
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-02-12  6:55 UTC (permalink / raw)
  To: Norman Cheung; +Cc: linux-xfs

On Tue, Feb 12, 2013 at 06:17:04AM +0000, Norman Cheung wrote:
> I am not sure if this forum is the same as xfs@oss.sgi.com; if so, my
> apology for double posting.  I would appreciate any insight or
> workaround on this.
> 
> Every 3 - 4 days, my application will hang in D state at file close,
> and shortly after that flush (for a different partition) is locked in
> D state as well.
> 
> My application runs continuously: 5 threads are writing data at a rate
> of 1.5M/sec to 5 different XFS partitions.  Each of these partitions
> is a 2 disk RAID 0.  In addition, I have other threads consuming 100%
> CPU at all times, and most of these threads are tied to their own CPU.
> 
> The 5 data writing threads are also set to run on a specific CPU
> (one CPU per thread), with priority set high (-2).  The data writing
> pattern is: each disk writing thread writes a 1.5 Gig file, then
> pauses for about 3 minutes.  Hence we have 5 files of 1.5 Gig each
> after one processing cycle.  We keep 5 sets and delete the older ones.
> 
> After about 300 - 800 cycles, one or two of these disk writing threads
> will go into D state, and within a second flush for another partition
> will show up in D state.  Then, after 15 minutes of no activity, the
> parent task will lower the priority of all threads (to normal, 20) and
> abort them.  In all cases, lowering the priority gets the threads out
> of D state.  I have also tried running the disk writing threads with
> normal priority (20); same hangs.  Also, the fclose of all 5 files to
> 5 different partitions happens at around the same time.
> 
> Thanks in advance,
> 
> Norman 
> 
> 
> Below is the sysrq for the 2 offending threads.
> 
> 1. the disk writing thread hung in fclose
> 
> 
> Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
> ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
> ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
> 0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
> [<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] ? 
> xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ? 
> rwsem_down_failed_common+0xc5/0x150
> [<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
> [<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? 
> xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 
> [xfs] [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130 
> [<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ? 
> __block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? 
> xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? 
> block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? 
> xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? 
> generic_file_buffered_write+0xf8/0x250
> [<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
> [<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
> [<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ? 
> security_file_permission+0x24/0xc0
> [<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? 
> sys_write+0x47/0x90 [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b

Can you please post non-mangled traces? These have all the lines run
together and then wrapped by your mailer...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Hung in D state during fclose
@ 2013-02-12  6:17 Norman Cheung
  2013-02-12  6:55 ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Norman Cheung @ 2013-02-12  6:17 UTC (permalink / raw)
  To: linux-xfs

I am not sure if this forum is the same as xfs@oss.sgi.com; if so, my apology
for double posting.  I would appreciate any insight or workaround on this.

Every 3 - 4 days, my application will hang in D state at file close, and
shortly after that flush (for a different partition) is locked in D state
as well.

My application runs continuously: 5 threads are writing data at a rate of
1.5M/sec to 5 different XFS partitions.  Each of these partitions is a 2 disk
RAID 0.  In addition, I have other threads consuming 100% CPU at all times,
and most of these threads are tied to their own CPU.

The 5 data writing threads are also set to run on a specific CPU (one CPU
per thread), with priority set high (-2).  The data writing pattern is:
each disk writing thread writes a 1.5 Gig file, then pauses for about
3 minutes.  Hence we have 5 files of 1.5 Gig each after one processing
cycle.  We keep 5 sets and delete the older ones.

After about 300 - 800 cycles, one or two of these disk writing threads will
go into D state, and within a second flush for another partition will show
up in D state.  Then, after 15 minutes of no activity, the parent task will
lower the priority of all threads (to normal, 20) and abort them.  In all
cases, lowering the priority gets the threads out of D state.  I have also
tried running the disk writing threads with normal priority (20); same
hangs.  Also, the fclose of all 5 files to 5 different partitions happens
at around the same time.
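
Roughly, the per-thread pinning and priority described above amount to
doing the following from a shell (the TID and CPU number here are only
examples, and a negative nice value needs root):

  taskset -cp 3 4197     # pin one writer thread (TID 4197, for example) to CPU 3
  renice -n -2 -p 4197   # raise its priority to nice -2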

Thanks in advance,

Norman 


Below is the sysrq for the 2 offending threads.

1. the disk writing thread hung in fclose


Tigris_IMC.exe  D 0000000000000000     0  4197   4100 0x00000000
ffff881f3db921c0 0000000000000086 0000000000000000 ffff881f42eb8b80
ffff880861419fd8 ffff880861419fd8 ffff880861419fd8 ffff881f3db921c0
0000000000080000 0000000000000000 00000000000401e0 00000000061805c1 Call Trace:
[<ffffffff810d89ed>] ? zone_statistics+0x9d/0xa0 [<ffffffffa0402682>] ? 
xfs_iomap_write_delay+0x172/0x2b0 [xfs] [<ffffffff813c7e35>] ? 
rwsem_down_failed_common+0xc5/0x150
[<ffffffff811f32a3>] ? call_rwsem_down_write_failed+0x13/0x20
[<ffffffff813c74ec>] ? down_write+0x1c/0x1d [<ffffffffa03fba8e>] ? 
xfs_ilock+0x7e/0xa0 [xfs] [<ffffffffa041b64b>] ? __xfs_get_blocks+0x1db/0x3d0 
[xfs] [<ffffffff81103340>] ? kmem_cache_alloc+0x100/0x130 
[<ffffffff8113fa2e>] ? alloc_page_buffers+0x6e/0xe0 [<ffffffff81141cdf>] ? 
__block_write_begin+0x1cf/0x4d0 [<ffffffffa041b850>] ? 
xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffffa041b850>] ? 
xfs_get_blocks_direct+0x10/0x10 [xfs] [<ffffffff8114226b>] ? 
block_write_begin+0x4b/0xa0 [<ffffffffa041b8fb>] ? 
xfs_vm_write_begin+0x3b/0x70 [xfs] [<ffffffff810c0258>] ? 
generic_file_buffered_write+0xf8/0x250
[<ffffffffa04207b5>] ? xfs_file_buffered_aio_write+0xc5/0x130 [xfs] 
[<ffffffffa042099c>] ? xfs_file_aio_write+0x17c/0x2a0 [xfs] 
[<ffffffff81115b28>] ? do_sync_write+0xb8/0xf0 [<ffffffff8119daa4>] ? 
security_file_permission+0x24/0xc0
[<ffffffff8111630a>] ? vfs_write+0xaa/0x190 [<ffffffff81116657>] ? 
sys_write+0x47/0x90 [<ffffffff813ce412>] ? system_call_fastpath+0x16/0x1b

2. flush from another partition

flush-8:48      D 0000000000000000     0  4217      2 0x00000000
ffff883fc053f580 0000000000000046 ffff881f40f348f0 ffff881f40e2aa80
ffff883fabb83fd8 ffff883fabb83fd8 ffff883fabb83fd8 ffff883fc053f580
ffff883fc27654c0 ffff881f40dfc040 0000000000000001 ffffffff810656f9 Call Trace:
[<ffffffff810656f9>] ? __queue_work+0xc9/0x390 [<ffffffff811e3e3f>] ? 
cfq_insert_request+0xaf/0x4f0 [<ffffffff81065a06>] ? queue_work_on+0x16/0x20 
[<ffffffff813c69cd>] ? schedule_timeout+0x1dd/0x240 [<ffffffffa041a762>] ? 
kmem_zone_zalloc+0x32/0x50 [xfs] [<ffffffff813c7559>] ? __down+0x6c/0x99 
[<ffffffff81070377>] ? down+0x37/0x40 [<ffffffffa041d59d>] ? 
xfs_buf_lock+0x1d/0x40 [xfs] [<ffffffffa041d6a3>] ? _xfs_buf_find+0xe3/0x210 
[xfs] [<ffffffffa041dcb4>] ? xfs_buf_get+0x64/0x150 [xfs] 
[<ffffffffa041dfb2>] ? xfs_buf_read+0x12/0xa0 [xfs] [<ffffffffa04151af>] ? 
xfs_trans_read_buf+0x1bf/0x2f0 [xfs] [<ffffffffa03d06c0>] ? 
xfs_read_agf+0x60/0x1b0 [xfs] [<ffffffffa03cf3b7>] ? 
xfs_alloc_update+0x17/0x20 [xfs] [<ffffffffa03d0841>] ? 
xfs_alloc_read_agf+0x31/0xd0 [xfs] [<ffffffffa03d2083>] ? 
xfs_alloc_fix_freelist+0x433/0x4a0 [xfs] [<ffffffff810d89ed>] ? 
zone_statistics+0x9d/0xa0 [<ffffffffa03d23a4>] ? xfs_alloc_vextent+0x184/0x4a0 
[xfs] [<ffffffffa03dc348>] ? xfs_bmap_btalloc+0x2d8/0x6d0 [xfs] 
[<ffffffffa03e0efd>] ? xfs_bmapi+0x9bd/0x11a0 [xfs] [<ffffffffa03d9bbc>] ? 
xfs_bmap_search_multi_extents+0xac/0x120 [xfs] [<ffffffffa040293c>] ? 
xfs_iomap_write_allocate+0x17c/0x330 [xfs] [<ffffffffa041b20f>] ? 
xfs_map_blocks+0x19f/0x1b0 [xfs] [<ffffffffa041c20e>] ? 
xfs_vm_writepage+0x19e/0x470 [xfs] [<ffffffff810c97ba>] ? __writepage+0xa/0x30 
[<ffffffff810c9c4d>] ? write_cache_pages+0x1cd/0x3d0 [<ffffffff810c97b0>] ? 
bdi_set_max_ratio+0x90/0x90 [<ffffffff810c9e93>] ? 
generic_writepages+0x43/0x70 [<ffffffff81139330>] ? 
writeback_single_inode+0x160/0x300
[<ffffffff811397d4>] ? writeback_sb_inodes+0x104/0x1a0 [<ffffffff81139cfd>] ? 
writeback_inodes_wb+0x8d/0x140 [<ffffffff8113a05b>] ? wb_writeback+0x2ab/0x310 
[<ffffffff813cedee>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8113a10e>] ? 
wb_check_old_data_flush+0x4e/0xa0 [<ffffffff8113a28b>] ? 
wb_do_writeback+0x12b/0x160 [<ffffffff8113a332>] ? 
bdi_writeback_thread+0x72/0x150 [<ffffffff8113a2c0>] ? 
wb_do_writeback+0x160/0x160 [<ffffffff8106b06e>] ? kthread+0x7e/0x90 
[<ffffffff813cf544>] ? kernel_thread_helper+0x4/0x10 [<ffffffff8106aff0>] ? 
kthread_worker_fn+0x1a0/0x1a0 [<ffffffff813cf540>] ? gs_change+0x13/0x13


kernel version 3.0.13-0.27
    
xfsprogs version xfs_repair version 3.1.6

number of CPUs  32

 xfs_info /fastraid2
meta-data=/dev/sdd1              isize=256    agcount=4, agsize=17822144 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=71288576, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=34808, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


    contents of /proc/meminfo
cat /proc/meminfo
MemTotal:       262429860 kB
MemFree:        146893200 kB
Buffers:           38600 kB
Cached:           336388 kB
SwapCached:            0 kB
Active:          4601364 kB
Inactive:         277852 kB
Active(anon):    4504236 kB
Inactive(anon):     9524 kB
Active(file):      97128 kB
Inactive(file):   268328 kB
Unevictable:    107520004 kB
Mlocked:          212048 kB
SwapTotal:       1051644 kB
SwapFree:        1051644 kB
Dirty:               140 kB
Writeback:             0 kB
AnonPages:      112024272 kB
Mapped:          2031420 kB
Shmem:              9528 kB
Slab:             123444 kB
SReclaimable:      24088 kB
SUnreclaim:        99356 kB
KernelStack:        3728 kB
PageTables:       229284 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    260857204 kB
Committed_AS:   112752960 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      712612 kB
VmallocChunk:   34224802044 kB
HardwareCorrupted:     0 kB
AnonHugePages:  111861760 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      133788 kB
DirectMap2M:     7178240 kB
DirectMap1G:    261095424 kB


    contents of /proc/mounts

cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,relatime,nr_inodes=0,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sde3 / reiserfs rw,relatime,acl,user_xattr 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sde1 /boot reiserfs rw,relatime,acl,user_xattr 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
//10.34.3.247/IMCRootFolder /home/klac/mount_path_Tool cifs rw,relatime,sec=ntlm,unc=\\10.34.3.247\IMCRootFolder,username=KTMfg,uid=1000,forceuid,gid=100,forcegid,addr=10.34.3.247,file_mode=0755,dir_mode=0755,nounix,rsize=16384,wsize=131007,actimeo=1 0 0
/dev/sda1 /edge xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdb1 /fastraid0 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdc1 /fastraid1 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /fastraid2 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdf1 /fastraid3 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdg1 /fastraid4 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdh1 /fastraid5 xfs rw,noatime,nodiratime,attr2,delaylog,largeio,nobarrier,logbufs=8,logbsize=256k,noquota 0 0


    contents of /proc/partitions
 cat /proc/partitions
major minor  #blocks  name

   8       16  285155328 sdb
   8       17  285154304 sdb1
   8       32  285155328 sdc
   8       33  285154304 sdc1
   8       48  285155328 sdd
   8       49  285154304 sdd1
   8       80  285671424 sdf
   8       81  285659136 sdf1
   8       64  124485632 sde
   8       65    1051648 sde1
   8       66    1051648 sde2
   8       67  122381312 sde3
   8       96  285671424 sdg
   8       97  285659136 sdg1
   8      112  285671424 sdh
   8      113  285659136 sdh1
   8        0  142577664 sda
   8        1  142576640 sda1


    RAID layout (hardware and/or software): Hardware RAID 0, 2 disks per RAID.
    LVM configuration

fdisk -l

Disk /dev/sdb: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 
bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 
0x00073b0b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048   570310655   285154304   83  Linux

Disk /dev/sdc: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 
bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 
0x000d3eba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048   570310655   285154304   83  Linux

Disk /dev/sdd: 292.0 GB, 291999055872 bytes
255 heads, 63 sectors/track, 35500 cylinders, total 570310656 sectors Units = 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 
bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 
0x0003e548

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048   570310655   285154304   83  Linux

Disk /dev/sdf: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 
bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 
0x0009caf0

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048   571320319   285659136   83  Linux

Disk /dev/sde: 127.5 GB, 127473287168 bytes
255 heads, 63 sectors/track, 15497 cylinders, total 248971264 sectors Units = 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 
bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 
0x674cfaee

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1   *        2048     2105343     1051648   83  Linux
/dev/sde2         2105344     4208639     1051648   82  Linux swap / Solaris
/dev/sde3         4208640   248971263   122381312   83  Linux

Disk /dev/sdg: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 
bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 
0x0000a1d4

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1            2048   571320319   285659136   83  Linux

Disk /dev/sdh: 292.5 GB, 292527538176 bytes
255 heads, 63 sectors/track, 35564 cylinders, total 571342848 sectors Units = 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 
bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 
0x0006b7ff

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1            2048   571320319   285659136   83  Linux

Disk /dev/sda: 146.0 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders, total 285155328 sectors Units = 
sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 
bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 
0x000f1568

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   285155327   142576640   83  Linux

 




Thread overview: 14+ messages
2013-02-11 23:36 Hung in D state during fclose Cheung, Norman
2013-02-12  6:17 Norman Cheung
2013-02-12  6:55 ` Dave Chinner
2013-02-12  7:01   ` Cheung, Norman
2013-02-12 10:20     ` Dave Chinner
2013-02-12 16:39       ` Cheung, Norman
2013-02-12 20:22         ` Dave Chinner
2013-02-12 21:06           ` Cheung, Norman
2013-02-12 22:21             ` Dave Chinner
2013-02-13  0:12             ` Cheung, Norman
2013-02-13  5:15               ` Dave Chinner
2013-02-14  4:53                 ` Cheung, Norman
2013-02-26 19:41                   ` Cheung, Norman
2013-02-26 20:31                     ` Dave Chinner
